Izumi TSUNOKUNI Gen SATO Yusuke IKEDA Yasuhiro OIKAWA
This paper reports on spatial extrapolation of a sound field with a physics-informed neural network. We investigate the spatial extrapolation of room impulse responses with a physics-informed SIREN architecture. Furthermore, we propose a noise-robust extrapolation method that introduces a tolerance term into the loss function.
Akira KITAYAMA Goichi ONO Hiroaki ITO
Edge devices with strict safety and reliability requirements, such as autonomous driving cars, industrial robots, and drones, necessitate software verification on the devices themselves before operation. The human cost and time required for this analysis constitute a barrier in the cycle of software development and updating. In particular, the final verification at the edge device should, at the very least, strictly confirm that the updated software is not degraded relative to the current one. Since the edge device does not have the correct data, a human must judge whether each difference between the updated software and the operating one is due to degradation or improvement. Therefore, this verification is very costly. This paper proposes a novel automated method for efficient verification on edge devices of an object detection AI, which has found practical use in various applications. In the proposed method, a target object existence detector (TOED), a simple binary classifier, judges whether an object in the recognition target class exists in the region of a prediction difference between the AI’s operating and updated versions. Using the results of this TOED judgement and the predicted difference, an automated verification system for the updated AI was constructed. TOED was designed with four convolutional layers, and the accuracy of its object existence judgement was evaluated for the difference between the predictions of the YOLOv5 L and X models using the Cityscapes dataset. The results showed judgement with more than 99.5% accuracy and 8.6% over-detection, indicating that a verification system adopting this method would be more efficient than simple analysis of the prediction differences.
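The verification flow described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the IoU matching rule, the 0.5 threshold, the four outcome labels, and the names `prediction_differences`, `verify_update`, and `object_exists` (a stand-in for the TOED classifier) are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def prediction_differences(
    current: List[Box], updated: List[Box], iou_thr: float = 0.5
) -> Tuple[List[Box], List[Box]]:
    """Return boxes predicted by only one of the two model versions."""
    only_current = [c for c in current if all(iou(c, u) < iou_thr for u in updated)]
    only_updated = [u for u in updated if all(iou(u, c) < iou_thr for c in current)]
    return only_current, only_updated


def verify_update(
    current: List[Box],
    updated: List[Box],
    object_exists: Callable[[Box], bool],  # hypothetical stand-in for TOED
    iou_thr: float = 0.5,
) -> Dict[str, List[Box]]:
    """Label each prediction difference using the existence judgement."""
    only_current, only_updated = prediction_differences(current, updated, iou_thr)
    return {
        # An object exists but the updated model no longer detects it.
        "degraded": [b for b in only_current if object_exists(b)],
        # An object exists and only the updated model detects it.
        "improved": [b for b in only_updated if object_exists(b)],
        # No object exists; the updated model dropped a false detection.
        "removed_false_positive": [b for b in only_current if not object_exists(b)],
        # No object exists; the updated model added a false detection.
        "new_false_positive": [b for b in only_updated if not object_exists(b)],
    }
```

Under these assumptions, only the boxes in the "degraded" and "new_false_positive" groups would need further attention, which is where the saving over inspecting every prediction difference would come from.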
Qingping YU You ZHANG Zhiping SHI Xingwang LI Longye WANG Ming ZENG
In this letter, a deep neural network (DNN) aided joint source-channel coding (JSCC) decoding scheme is proposed for polar codes. In the proposed scheme, an integrated factor graph with an unfolded structure is first designed. Then a DNN-aided flooding belief propagation (FBP) decoding algorithm is proposed based on the integrated factor graph, in which both the source and channel scaling parameters in the BP decoding are optimized for better performance. Experimental results show that, compared with a polar-coded JSCC system using the existing BP decoder, the polar-coded JSCC scheme with the proposed DNN-aided FBP decoder achieves a gain of about 2-2.5 dB across different source statistics p for source message length NSC = 128, and a gain of 0.2-1 dB for NSC = 512.
Tetsuo KOSAKA Kazuya SAEKI Yoshitaka AIZAWA Masaharu KATO Takashi NOSE
Emotional speech recognition is generally considered more difficult than non-emotional speech recognition. The acoustic characteristics of emotional speech differ from those of non-emotional speech. Additionally, acoustic characteristics vary significantly depending on the type and intensity of emotions. Regarding linguistic features, emotional and colloquial expressions are also observed in emotional utterances. To solve these problems, we aim to improve recognition performance by adapting acoustic and language models to emotional speech. We used Japanese Twitter-based Emotional Speech (JTES) as an emotional speech corpus. This corpus consists of tweets and has an emotional label assigned to each utterance. Corpus adaptation is possible using the utterances contained in this corpus. However, for the language model, the amount of adaptation data is insufficient. To solve this problem, we propose adapting the language model using online tweet data downloaded from the internet. The sentences used for adaptation were extracted from the tweet data based on certain rules. We extracted 25.86 M words of data and used them for adaptation. In the recognition experiments, the baseline word error rate was 36.11%, whereas that with the acoustic and language model adaptation was 17.77%. The results demonstrated the effectiveness of the proposed method.
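For readers unfamiliar with the metric, the sketch below shows the standard word error rate computation (word-level edit distance divided by reference length) and the relative reduction implied by the two reported figures. The function names are illustrative, and whitespace tokenization is an assumption; Japanese text would need a proper word segmenter first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()  # assumption: words are whitespace-separated
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the processed reference prefix
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, adapted: float) -> float:
    """Fraction of the baseline error removed by adaptation."""
    return (baseline - adapted) / baseline


# Reported figures: 36.11% baseline, 17.77% after adaptation.
print(f"{relative_reduction(36.11, 17.77):.1%}")  # about 50.8%
```

In other words, the adaptation roughly halves the word error rate relative to the baseline.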
While deep image compression performs better than traditional codecs like JPEG on natural images, it faces a challenge as a learning-based approach: compression performance drastically decreases for out-of-domain images. To investigate this problem, we introduce a novel task that we call universal deep image compression, which involves compressing images in arbitrary domains, such as natural images, line drawings, and comics. Furthermore, we propose a content-adaptive optimization framework to tackle this task. This framework adapts a pre-trained compression model to each target image at test time to address the domain gap between pre-training and testing. For each input image, we insert adapters into the decoder of the model and optimize the latent representation extracted by the encoder and the adapter parameters in terms of rate-distortion, with the adapter parameters transmitted per image. To evaluate the proposed universal deep image compression, we constructed a benchmark dataset containing uncompressed images from four domains: natural images, line drawings, comics, and vector arts. We compare our proposed method with non-adaptive and existing adaptive compression methods, and the results show that our method outperforms them. Our code and dataset are publicly available at https://github.com/kktsubota/universal-dic.
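One plausible way to write the per-image objective described above is given below. The notation is an assumption made for illustration and is not taken from the paper: $x$ is the input image, $\hat{y}$ the quantized latent, $\theta_a$ the adapter parameters, $g_{\theta,\theta_a}$ the decoder with adapters inserted, $R(\cdot)$ a bit-rate estimate, $D$ a distortion measure, and $\lambda$ the trade-off weight.

```latex
\min_{\hat{y},\,\theta_a}\;
  R(\hat{y}) + R(\theta_a)
  + \lambda\, D\bigl(x,\; g_{\theta,\theta_a}(\hat{y})\bigr)
```

The term $R(\theta_a)$ reflects the statement that the adapter parameters are transmitted with each image, so their cost counts toward the total rate; the pre-trained decoder weights $\theta$ stay fixed.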
We investigated the influence of horizontal shifts of the input images on one-stage object detection methods. We found that the object detector's class scores drop when the target object center is at the grid boundary. Many approaches have focused on reducing the aliasing effect of down-sampling to achieve shift-invariance. However, addressing down-sampling alone does not completely solve this problem at the grid boundary; it is necessary to suppress the dispersion of features in pixels close to the grid boundary into adjacent grid cells. Therefore, this paper proposes two approaches focused on the grid boundary to improve this weak point of current object detection methods. One is the Sub-Grid Feature Extraction Module, in which the sub-grid features are added to the input of the classification head. The other is Grid-Aware Data Augmentation, where augmented data are generated by grid-level shifts and are used in training. The effectiveness of the proposed approaches is demonstrated using the COCO validation set after applying the proposed method to the FCOS architecture.
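To make the grid-boundary condition concrete, the sketch below computes how far an object center lies from the nearest vertical grid line for a given feature stride, and applies a horizontal pixel shift to an image. Both helpers are illustrative assumptions: the stride value, the zero padding, and the pixel-level shift are choices made for the example, and the paper's grid-level shifts may be defined differently.

```python
import numpy as np


def distance_to_grid_boundary(center_x: float, stride: int) -> float:
    """Horizontal distance from an object center to the nearest grid line."""
    offset = center_x % stride
    return min(offset, stride - offset)


def horizontal_shift(image: np.ndarray, dx: int) -> np.ndarray:
    """Shift an (H, W) or (H, W, C) image by dx pixels, padding with zeros.

    Positive dx moves content to the right, negative dx to the left.
    """
    shifted = np.zeros_like(image)
    if dx > 0:
        shifted[:, dx:] = image[:, :-dx]
    elif dx < 0:
        shifted[:, :dx] = image[:, -dx:]
    else:
        shifted[...] = image
    return shifted


# With stride 8, a center at x = 16 sits exactly on a grid line,
# while x = 12 sits in the middle of a cell.
print(distance_to_grid_boundary(16, 8), distance_to_grid_boundary(12, 8))
```

Sweeping `dx` over one stride and recording the class score at each shift is one way to reproduce the kind of score drop the abstract describes.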
Daniel Akira ANDO Yuya KASE Toshihiko NISHIMURA Takanori SATO Takeo OHGANE Yasutaka OGAWA Junichiro HAGIWARA
Direction of arrival (DOA) estimation is an antenna array signal processing technique used in, for instance, radar and sonar systems, source localization, and channel state information retrieval. As new applications and use cases appear with the development of next-generation mobile communications systems, DOA estimation performance must be continually improved to support the ever-growing demand for wireless technologies. In previous works, we verified that a deep neural network (DNN) trained offline is a strong candidate tool with the promise of achieving great on-grid DOA estimation performance, even compared to traditional algorithms. In this paper, we propose new techniques for further DOA estimation accuracy enhancement incorporating signal-to-noise ratio (SNR) prediction and an end-to-end DOA estimation system, which consists of three components: source number estimator, DOA angular spectrum grid estimator, and DOA detector. Here, we improve the performance of the DOA detector and angular spectrum estimator, and present a new DNN-based solution for source number estimation with a very simple design. The proposed DNN system with these enhancement techniques shows great estimation performance in terms of the success rate metric for the case of two radio wave sources, although the results for the case of three sources are not fully satisfactory.
A fully analog pipelined deep neural network (DNN) accelerator is proposed, which is constructed by using pipeline registers based on master-slave switched capacitors. The master-slave switched capacitor is an analog equivalent of the delayed flip-flop (D-FF), which has been used as a digital pipeline register. To estimate the performance of the pipeline register, it is applied to a conventional DNN that performs non-pipelined operation. Compared with the conventional DNN, the cycle time is reduced by 61.5% and the data rate is increased by 160%. The accuracy reaches 99.6% in an MNIST classification test. The energy consumption per classification is reduced by 88.2% to 0.128 µJ, achieving an energy efficiency of 1.05 TOPS/W and a throughput of 0.538 TOPS in a 180 nm technology node.
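The reported figures can be cross-checked with a few lines of arithmetic. The sketch below assumes that data rate is the reciprocal of cycle time and that average power equals throughput divided by energy efficiency; these relations and the function names are assumptions made for the check, not statements from the paper.

```python
def data_rate_gain(cycle_time_reduction: float) -> float:
    """Fractional data-rate increase if rate is 1 / cycle time."""
    return 1.0 / (1.0 - cycle_time_reduction) - 1.0


def energy_before(energy_after: float, reduction: float) -> float:
    """Energy per classification before the stated fractional reduction."""
    return energy_after / (1.0 - reduction)


def implied_power_watts(throughput_tops: float, efficiency_tops_per_w: float) -> float:
    """Average power implied by throughput and energy efficiency."""
    return throughput_tops / efficiency_tops_per_w


print(f"data rate gain: {data_rate_gain(0.615):.0%}")                      # about 160%
print(f"energy before:  {energy_before(0.128, 0.882):.2f} uJ")            # about 1.08 uJ
print(f"implied power:  {implied_power_watts(0.538, 1.05):.2f} W")        # about 0.51 W
```

Under these assumptions the 61.5% cycle-time reduction and the 160% data-rate increase agree with each other.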
Kazuhisa FUJIMOTO Masanori TAKADA
Neuromorphic computing with a spiking neural network (SNN) is expected to provide a complement or alternative to deep learning in the future. The challenge is to develop optimal SNN models, algorithms, and engineering technologies for real use cases. As a potential use case for neuromorphic computing, we have investigated person monitoring and worker support with a video surveillance system, given its status as a proven deep neural network (DNN) use case. In the future, to increase the number of cameras in such a system, we will need a scalable approach that embeds only a few neuromorphic devices in a camera. Specifically, this will require a shallow SNN model that can be implemented in a few neuromorphic devices while providing a high recognition accuracy comparable to a DNN with the same configuration. A shallow SNN was built by converting ResNet, a proven DNN for image recognition, and a new configuration of the shallow SNN model was developed to improve its accuracy. The proposed shallow SNN model was evaluated with a few neuromorphic devices, and it achieved a recognition accuracy of more than 80% with about 1/130 of the energy consumption of a GPU running a DNN with the same configuration as the SNN.
Rong FEI Yufan GUO Junhuai LI Bo HU Lu YANG
With the widespread use of indoor positioning technology, the need for high-precision positioning services is rising; nevertheless, there are several challenges, such as the difficulty of simulating the distribution of interior location data and the enormous inaccuracy of probability computation. As a result, this paper proposes and compares three neural network models for indoor location based on WiFi fingerprints - an indoor location algorithm based on an improved back-propagation neural network model, an RSSI indoor location algorithm based on neural network angle change, and an RSSI indoor location algorithm based on deep neural network angle change - to accurately predict indoor location coordinates. Changing the action range of the activation function in the standard back-propagation neural network model achieves the goal of accurately predicting location coordinates. Experimental comparisons of loss rate (loss), accuracy rate (acc), and cumulative distribution function (CDF) show that the revised back-propagation neural network model has strong stability and enhances indoor positioning accuracy.
Chengkai CAI Kenta IWAI Takanobu NISHIURA
The acquisition of distant sound has always been a hot research topic. Since sound is caused by vibration, one of the best methods for measuring distant sound is to use a laser Doppler vibrometer (LDV). The laser has high directivity, which enables it to acquire sound from far away; this is of great practical use for disaster relief and other situations. However, due to the vibration characteristics of the irradiated object itself and the reflectivity of its surface (or other reasons), the acquired sound often lacks frequency components in certain frequency bands and is mixed with obvious noise. Therefore, when using an LDV to acquire distant speech, if we want to recognize the actual content of the speech, it is necessary to enhance the acquired speech signal in some way. Conventional speech enhancement methods are not generally applicable due to the various types of degradation in observed speech. Moreover, while several speech enhancement methods for LDV have been proposed, they are only effective when the irradiated object is known. In this paper, we present a speech enhancement method for LDV that can deal with unknown irradiated objects. The proposed method is composed of noise reduction, pitch detection, power spectrum envelope estimation, power spectrum reconstruction, and phase estimation. Experimental results demonstrate the effectiveness of our method for enhancing the acquired speech with unknown irradiated objects.
Yangchao ZHANG Hiroaki ITSUJI Takumi UEZONO Tadanobu TOBA Masanori HASHIMOTO
The reliability of deep neural networks (DNNs) against hardware errors is essential as DNNs are increasingly employed in safety-critical applications such as automatic driving. Transient errors in memory, such as radiation-induced soft errors, may propagate through the inference computation, resulting in unexpected output, which can trigger catastrophic system failures. As a first step to tackle this problem, this paper proposes constructing a vulnerability model (VM) with a small number of fault injections to identify vulnerable model parameters in a DNN. We significantly reduce the number of bit locations for fault injection and develop a flow to incrementally collect the training data, i.e., the fault injection results, for VM accuracy improvement. We enumerate key features (KF) that characterize the vulnerability of the parameters and use KF and the collected training data to construct the VM. Experimental results show that the VM can estimate the vulnerabilities of all DNN model parameters with only 1/3490 of the computations required by traditional fault injection-based vulnerability estimation.
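A single fault injection of the kind referred to above can be illustrated by flipping one bit in the IEEE 754 float32 encoding of a parameter. The sketch below is a generic illustration, assuming 32-bit floating-point weights; `flip_bit` is a hypothetical helper, and the paper's actual choice of bit locations and its vulnerability model are not reproduced here.

```python
import numpy as np


def flip_bit(value: float, bit: int) -> float:
    """Flip one bit (0 = least significant, 31 = sign) of a float32 value."""
    if not 0 <= bit < 32:
        raise ValueError("bit must be in the range 0..31")
    # Reinterpret the float32 storage as an unsigned 32-bit integer.
    as_int = np.array([value], dtype=np.float32).view(np.uint32)
    as_int ^= np.uint32(1 << bit)
    # Reinterpret the modified bits as float32 again.
    return float(as_int.view(np.float32)[0])


weight = 1.0
for bit in (0, 23, 30, 31):
    print(bit, flip_bit(weight, bit))
```

A flip in a low mantissa bit changes the value only slightly, whereas a flip in a high exponent bit can turn a small weight into a huge value or infinity. This difference in impact is one reason it can be reasonable to inject faults at only a subset of bit locations.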
Mobile communication systems are not only the core of the Information and Communication Technology (ICT) infrastructure but also that of our social infrastructure. The 5th generation mobile communication system (5G) has already been launched and is in use. 5G is expected to serve various use cases in industry and society. Thus, many companies and research institutes are now working to improve the performance of 5G, that is, on 5G Enhancement and the next generation of mobile communication systems (Beyond 5G (6G)). 6G is expected to meet various highly demanding requirements even compared with 5G, such as extremely high data rate, extremely large coverage, extremely low latency, extremely low energy, extremely high reliability, extremely massive connectivity, and so on. Artificial intelligence (AI) and machine learning (ML), together AI/ML, will play more important roles than ever in 6G wireless communications, given the above extreme requirements for a diversity of applications, including new combinations of the requirements for new use cases. We can say that AI/ML will be essential for 6G wireless communications. This paper introduces some ML techniques and applications in 6G wireless communications, mainly focusing on the physical layer.
Deep neural networks (DNNs) perform well for image recognition, speech recognition, and pattern analysis. However, such neural networks are vulnerable to adversarial examples. An adversarial example is a data sample created by adding a small amount of noise to an original sample in such a way that it is difficult for humans to identify but that will cause the sample to be misclassified by a target model. In a military environment, adversarial examples that are correctly classified by a friendly model while deceiving an enemy model may be useful. In this paper, we propose a method for generating a selective adversarial example that is correctly classified by a friendly gait recognition system and misclassified by an enemy gait recognition system. The proposed scheme generates the selective adversarial example by combining the loss for correct classification by the friendly gait recognition system with the loss for misclassification by the enemy gait recognition system. In our experiments, we used the CASIA Gait Database as the dataset and TensorFlow as the machine learning library. The results show that the proposed method can generate selective adversarial examples that have a 98.5% attack success rate against an enemy gait recognition system and are classified with 87.3% accuracy by a friendly gait recognition system.
Ruxue GUO Pengxu JIANG Ruiyu LIANG Yue XIE Cairong ZOU
For a long time, the compensation effect of hearing aids has been evaluated mainly subjectively, and there have been fewer studies of objective evaluation. Furthermore, a pure speech signal is generally required as a reference in the existing objective evaluation methods, which restricts their practicality in a real-world environment. Therefore, this paper presents a non-intrusive speech quality evaluation method for hearing aids, which combines the audiogram and weighted frequency information. The proposed model mainly includes an audiogram information extraction network, a frequency information extraction network, and a quality score mapping network. The audiogram is the input of the audiogram information extraction network, which helps the system capture the information related to hearing loss. In addition, the low-frequency bands of speech contain loudness information, and the medium- and high-frequency components contribute to semantic comprehension. The information of the two frequency bands is input to the frequency information extraction network to obtain time-frequency information. Once the high-level features of the different frequency bands and audiograms are obtained, they are fused into two groups of tensors that distinguish the information of the different frequency bands, and these are used as the input of the attention layer to calculate the corresponding weight distribution. Finally, a dense layer is employed to predict the speech quality score. The experimental results show that it is reasonable to combine the audiogram and the weight of the information from the two frequency bands, which can effectively realize the evaluation of the speech quality of the hearing aid.
Deep neural networks show good performance in image recognition, speech recognition, and pattern analysis. However, deep neural networks also have weaknesses, one of which is vulnerability to poisoning attacks. A poisoning attack reduces the accuracy of a model by training the model on malicious data. A number of studies have been conducted on such poisoning attacks. The existing type of poisoning attack causes misrecognition by one classifier. In certain situations, however, it is necessary for multiple models to misrecognize certain data as different specific classes. For example, if there are enemy autonomous vehicles A, B, and C, a poisoning attack could mislead A to turn to the left, B to stop, and C to turn to the right simply by using a traffic sign. In this paper, we propose a multi-targeted poisoning attack method that causes each of several models to misrecognize certain data as a different target class. This study used MNIST and CIFAR10 as datasets and TensorFlow as the machine learning library. The experimental results show that the proposed scheme has a 100% average attack success rate on MNIST and CIFAR10 when malicious data accounting for 5% of the training dataset have been used for training.
In this paper, we propose a selective membership inference attack method that determines whether certain data corresponding to a specific class are being used as training data for a machine learning model. With the proposed method, membership or non-membership can be inferred by generating a decision model from the predictions of the inference models and training it on the confidence values for the data corresponding to the selected class. We used MNIST as the experimental dataset and TensorFlow as the machine learning library. Experimental results show that the proposed method has a 92.4% success rate with 5 inference models for data corresponding to a specific class.
This paper presents a channel operating margin (COM) based high-speed serial link optimization using machine learning (ML). COM, which was proposed for evaluating serial links, is calculated first, and during the calculation several important equalization parameters corresponding to the best configuration are extracted, which can be used for the ML modeling of the serial link. Then a deep neural network containing hidden layers is investigated to model the whole serial link equalization, including the transmitter feed forward equalizer (FFE), the receiver continuous time linear equalizer (CTLE), and the decision feedback equalizer (DFE). By training, validating, and testing a large number of samples that meet the COM specification of 400GAUI-8 C2C, an effective ML model is generated, and the maximum relative error is only 0.1 compared with computation results. Finally, three link configurations are discussed from the viewpoint of the tradeoff between link performance and cost, illustrating that our COM-based ML modeling method can be applied to advanced serial link design for NRZ, PAM4, or even higher-level pulse amplitude modulation signals.
Hyun KWON Changhyun CHO Jun LEE
Deep neural networks (DNNs) provide excellent services in machine learning tasks such as image recognition, speech recognition, pattern recognition, and intrusion detection. However, an adversarial example created by adding a little noise to the original data can result in misclassification by the DNN, while the human eye cannot tell the difference from the original data. For example, if an attacker creates a modified right-turn traffic sign that is incorrectly categorized by a DNN, an autonomous vehicle with the DNN will incorrectly classify the modified right-turn traffic sign as a U-turn sign, while a human will correctly classify the modified sign as a right-turn sign. Such an adversarial example is a serious threat to a DNN. Recently, an adversarial example with multiple targets was introduced that causes misclassification by multiple models within each target class using a single modified image. However, it has the weakness that as the number of target models increases, the overall attack success rate decreases. Therefore, if there are multiple models that the attacker wishes to attack, the attacker must control the attack success rate for each model by considering the attack priority for each model. In this paper, we propose a priority adversarial example that considers the attack priority for each model in cases targeting multiple models. The proposed method controls the attack success rate for each model by adjusting the weight of the attack function in the generation process while maintaining minimal distortion. We used MNIST and CIFAR10 as datasets and TensorFlow as the machine learning library. Experimental results show that the proposed method can control the attack success rate for each model by considering each model's attack priority while maintaining minimal distortion (average 3.95 and 2.45 with MNIST for targeted and untargeted attacks, respectively, and average 51.95 and 44.45 with CIFAR10 for targeted and untargeted attacks, respectively).
In recent years, deep neural networks (DNNs) have made a significant impact on a variety of research fields and applications. One drawback of DNNs is that they require a huge amount of data for training. Since it is very expensive to ask experts to label the data, many non-expert data collection methods such as web crawling have been proposed. However, datasets created by non-experts often contain corrupted labels, and DNNs trained on such datasets are unreliable. Since DNNs have an enormous number of parameters, they tend to overfit to noisy labels, resulting in poor generalization performance. This problem is called Learning with Noisy Labels (LNL). Recent studies showed that DNNs are robust to noisy labels in the early stage of learning, before over-fitting to them, because DNNs learn simple patterns first. Therefore, DNNs tend to output true labels for samples with noisy labels in the early stage of learning, and the number of false predictions for samples with noisy labels is higher than for samples with clean labels. Based on these observations, we propose a new sample selection approach for LNL using the number of false predictions. Our method periodically collects the records of false predictions during training and selects samples with a low number of false predictions from the recent records. Then our method iteratively performs sample selection and trains a DNN model using the updated dataset. Since the model is trained with cleaner samples and records more accurate false predictions for sample selection, the generalization performance of the model gradually increases. We evaluated our method on two benchmark datasets, CIFAR-10 and CIFAR-100, with synthetically generated noisy labels, and obtained results that are better than or comparable to state-of-the-art approaches.
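The selection step described above can be sketched as follows. This is a minimal illustration under stated assumptions: predictions are recorded as one row of class indices per recorded epoch, a prediction counts as false when it disagrees with the given (possibly noisy) label, and samples are kept when their count is at or below a threshold. The names `count_false_predictions`, `select_clean_samples`, `max_false`, and `recent` are hypothetical, and the paper's actual recording schedule and threshold are not specified here.

```python
from typing import Optional

import numpy as np


def count_false_predictions(pred_history: np.ndarray, given_labels: np.ndarray) -> np.ndarray:
    """Count, per sample, how often the prediction disagreed with its label.

    pred_history has shape (num_records, num_samples); given_labels has
    shape (num_samples,).
    """
    return (pred_history != given_labels[None, :]).sum(axis=0)


def select_clean_samples(
    pred_history: np.ndarray,
    given_labels: np.ndarray,
    max_false: int = 0,
    recent: Optional[int] = None,
) -> np.ndarray:
    """Return indices of samples with few false predictions in recent records."""
    history = pred_history if recent is None else pred_history[-recent:]
    counts = count_false_predictions(history, given_labels)
    return np.flatnonzero(counts <= max_false)


# Three recorded epochs, four samples; samples 2 and 3 are often mispredicted.
history = np.array([[0, 1, 2, 0],
                    [0, 1, 2, 1],
                    [0, 1, 1, 1]])
labels = np.array([0, 1, 1, 0])
print(select_clean_samples(history, labels))  # indices of the retained samples
```

In an iterative scheme, the selected indices would define the training subset for the next round, after which new prediction records are collected and the selection is repeated.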