Hiroyuki NOZAKA Kosuke KAMATA Kazufumi YAMAGATA
The data augmentation method is known as a helpful technique to generate a dataset with a large number of images from one with a small number of images for supervised training in deep learning. However, a low validity augmentation method for image recognition was reported in a recent study on artificial intelligence (AI). This study aimed to clarify the optimal data augmentation method in deep learning model generation for the recognition of white blood cells (WBCs). Study Design: We conducted three different data augmentation methods (rotation, scaling, and distortion) on original WBC images, with each AI model for WBC recognition generated by supervised training. The subjects of the clinical assessment were 51 healthy persons. Thin-layer blood smears were prepared from peripheral blood and subjected to May-Grünwald-Giemsa staining. Results: The only significantly effective technique among the AI models for WBC recognition was data augmentation with rotation. By contrast, the effectiveness of both image distortion and image scaling was poor, and improved accuracy was limited to a specific WBC subcategory. Conclusion: Although data augmentation methods are often used for achieving high accuracy in AI generation with supervised training, we consider that it is necessary to select the optimal data augmentation method for medical AI generation based on the characteristics of medical images.
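As an illustration of the rotation-based augmentation mentioned above, the following is a minimal sketch, not the authors' pipeline. It assumes rotations in 90-degree steps (the abstract does not state the angles used) and represents an image as a plain list of pixel rows; `rotate90` and `augment_by_rotation` are hypothetical helper names.

```python
def rotate90(image):
    """Rotate a 2D grid (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def augment_by_rotation(image):
    """Return the original image plus its 90, 180, and 270 degree rotations.

    Assumption: 90-degree steps are used because they need no interpolation;
    the abstract does not specify the rotation angles.
    """
    variants = [[list(row) for row in image]]
    for _ in range(3):
        variants.append(rotate90(variants[-1]))
    return variants


if __name__ == "__main__":
    tiny = [[1, 2],
            [3, 4]]
    for variant in augment_by_rotation(tiny):
        print(variant)
```

One labeled image yields four training samples here, which is the sense in which augmentation enlarges a small dataset.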
Chengkai CAI Kenta IWAI Takanobu NISHIURA
The acquisition of distant sound has always been a hot research topic. Since sound is caused by vibration, one of the best methods for measuring distant sound is to use a laser Doppler vibrometer (LDV). This laser has high directivity, which enables it to acquire sound from far away and makes it of great practical use for disaster relief and other situations. However, due to the vibration characteristics of the irradiated object itself and the reflectivity of its surface (or other reasons), the acquired sound is often lacking frequency components in certain frequency bands and is mixed with obvious noise. Therefore, when using LDV to acquire distant speech, if we want to recognize the actual content of the speech, it is necessary to enhance the acquired speech signal in some way. Conventional speech enhancement methods are not generally applicable due to the various types of degradation in observed speech. Moreover, while several speech enhancement methods for LDV have been proposed, they are only effective when the irradiated object is known. In this paper, we present a speech enhancement method for LDV that can deal with unknown irradiated objects. The proposed method is composed of noise reduction, pitch detection, power spectrum envelope estimation, power spectrum reconstruction, and phase estimation. Experimental results demonstrate the effectiveness of our method for enhancing the acquired speech with unknown irradiated objects.
Daiki NISHIYAMA Kazuto FUKUCHI Youhei AKIMOTO Jun SAKUMA
In real world applications of multiclass classification models, misclassification in an important class (e.g., stop sign) can be significantly more harmful than in other classes (e.g., no parking). Thus, it is crucial to improve the recall of an important class while maintaining overall accuracy. For this problem, we found that improving the separation of important classes relative to other classes in the feature space is effective. Existing methods that give a class-sensitive penalty for cross-entropy loss do not improve the separation. Moreover, the methods designed to improve separations between all classes are unsuitable for our purpose because they do not consider the important classes. To achieve the separation, we propose a loss function that explicitly gives loss for the feature space, called class-sensitive additive angular margin (CAMRI) loss. CAMRI loss is expected to reduce the variance of an important class due to the addition of a penalty to the angle between the important class features and the corresponding weight vectors in the feature space. In addition, concentrating the penalty on only the important class hardly sacrifices separating the other classes. Experiments on CIFAR-10, GTSRB, and AwA2 showed that CAMRI loss could improve the recall of a specific class without sacrificing accuracy. In particular, compared with GTSRB's second-worst class recall when trained with cross-entropy loss, CAMRI loss improved recall by 9%.
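The idea of penalizing the angle only for an important class can be sketched as follows. This is an illustrative reading of the abstract, not the published CAMRI formulation: the margin value, the scale factor, and the function names are assumptions, and the logits are taken to be scaled cosines between a feature vector and per-class weight vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def angular_margin_loss(feature, weights, label, important_class,
                        margin=0.5, scale=10.0):
    """Cross-entropy over scaled cosine logits.

    An additive angular margin is applied to the true-class angle only when
    the sample belongs to the important class. `margin` and `scale` are
    hypothetical hyperparameters chosen for illustration.
    """
    logits = []
    for k, w in enumerate(weights):
        cos_t = max(-1.0, min(1.0, cosine(feature, w)))
        if k == label and label == important_class:
            theta = math.acos(cos_t)
            # Clamp at pi so the penalized cosine stays monotonic.
            cos_t = math.cos(min(math.pi, theta + margin))
        logits.append(scale * cos_t)
    top = max(logits)
    log_sum_exp = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum_exp - logits[label]
```

For the same sample, the loss is larger when its class is marked important, so training must pull those features closer to their weight vector to reach the same loss, while samples of other classes see the ordinary cosine-softmax loss.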
Yuto OMAE Yuki SAITO Yohei KAKIMOTO Daisuke FUKAMACHI Koichi NAGASHIMA Yasuo OKUMURA Jun TOYOTANI
In this article, a GUI system is proposed to support clinical cardiology examinations. The proposed system estimates “pulmonary artery wedge pressure” based on patients' chest radiographs using an explainable regression-based convolutional neural network. The GUI system was validated by performing an effectiveness survey with 23 cardiology physicians with medical licenses. The results indicated that many physicians considered the GUI system to be effective.
Yangchao ZHANG Hiroaki ITSUJI Takumi UEZONO Tadanobu TOBA Masanori HASHIMOTO
The reliability of deep neural networks (DNN) against hardware errors is essential as DNNs are increasingly employed in safety-critical applications such as automatic driving. Transient errors in memory, such as radiation-induced soft error, may propagate through the inference computation, resulting in unexpected output, which can adversely trigger catastrophic system failures. As a first step to tackle this problem, this paper proposes constructing a vulnerability model (VM) with a small number of fault injections to identify vulnerable model parameters in DNN. We reduce the number of bit locations for fault injection significantly and develop a flow to incrementally collect the training data, i.e., the fault injection results, for VM accuracy improvement. We enumerate key features (KF) that characterize the vulnerability of the parameters and use KF and the collected training data to construct VM. Experimental results show that VM can estimate vulnerabilities of all DNN model parameters only with 1/3490 computations compared with traditional fault injection-based vulnerability estimation.
Mobile communication systems are not only the core of the Information and Communication Technology (ICT) infrastructure but also that of our social infrastructure. The 5th generation mobile communication system (5G) has already started and is in use. 5G is expected to serve various use cases in industry and society. Thus, many companies and research institutes are now trying to improve the performance of 5G, that is, 5G Enhancement and the next generation of mobile communication systems (Beyond 5G (6G)). 6G is expected to meet various highly demanding requirements even compared with 5G, such as extremely high data rate, extremely large coverage, extremely low latency, extremely low energy, extremely high reliability, extremely massive connectivity, and so on. Artificial intelligence (AI) and machine learning (ML), AI/ML, will have more important roles than ever in 6G wireless communications with the above extremely high requirements for a diversity of applications, including new combinations of the requirements for new use cases. We can say that AI/ML will be essential for 6G wireless communications. This paper introduces some ML techniques and applications in 6G wireless communications, mainly focusing on the physical layer.
Deep neural networks (DNNs) perform well for image recognition, speech recognition, and pattern analysis. However, such neural networks are vulnerable to adversarial examples. An adversarial example is a data sample created by adding a small amount of noise to an original sample in such a way that it is difficult for humans to identify but that will cause the sample to be misclassified by a target model. In a military environment, adversarial examples that are correctly classified by a friendly model while deceiving an enemy model may be useful. In this paper, we propose a method for generating a selective adversarial example that is correctly classified by a friendly gait recognition system and misclassified by an enemy gait recognition system. The proposed scheme generates the selective adversarial example by combining the loss for correct classification by the friendly gait recognition system with the loss for misclassification by the enemy gait recognition system. In our experiments, we used the CASIA Gait Database as the dataset and TensorFlow as the machine learning library. The results show that the proposed method can generate selective adversarial examples that have a 98.5% attack success rate against an enemy gait recognition system and are classified with 87.3% accuracy by a friendly gait recognition system.
Naoya MURAMATSU Hai-Tao YU Tetsuji SATOH
With the continued innovation of deep neural networks, spiking neural networks (SNNs) that more closely resemble biological brain synapses have attracted attention because of their low power consumption. Unlike artificial neural networks (ANNs), for continuous data values, they must employ an encoding process to convert the values to spike trains, which suppresses the SNN's performance. To avoid this degradation, the incoming analog signal must be regulated prior to the encoding process, which is also realized in living things, e.g., the basement membranes of humans mechanically perform the Fourier transform. To this end, we combine an ANN and an SNN to build ANN-to-SNN hybrid neural networks (HNNs) that improve the concerned performance. To qualify this performance and robustness, MNIST and CIFAR-10 image datasets are used for various classification tasks in which the training and encoding methods change. In addition, we present simultaneous and separate training methods for the artificial and spiking layers, considering the encoding methods of each. We find that increasing the number of artificial layers at the expense of spiking layers improves the HNN performance. For straightforward datasets such as MNIST, performance similar to that of ANNs is achieved by using duplicate coding and separate learning. However, for more complex tasks, the use of Gaussian coding and simultaneous learning is found to improve the accuracy of the HNN while lowering power consumption.
Ze Fu GAO Hai Cheng TAO Qin Yu ZHU Yi Wen JIAO Dong LI Fei Long MAO Chao LI Yi Tong SI Yu Xin WANG
Aiming at the problem of non-line of sight (NLOS) signal recognition for Ultra Wide Band (UWB) positioning, we utilize the concepts of Neural Network Clustering and Neural Network Pattern Recognition. We propose a classification algorithm based on self-organizing feature mapping (SOM) neural network batch processing, and a recognition algorithm based on convolutional neural network (CNN). By assigning different weights to learning, training and testing parts in the data set of UWB location signals with given known patterns, a strong NLOS signal recognizer is trained to minimize the recognition error rate. Finally, the proposed NLOS signal recognition algorithm is verified using data sets from real scenarios. The test results show that the proposed algorithm can solve the problem of UWB NLOS signal recognition under strong signal interference. The simulation results illustrate that the proposed algorithm is significantly more effective compared with other algorithms.
Ruxue GUO Pengxu JIANG Ruiyu LIANG Yue XIE Cairong ZOU
The compensation effect of hearing aids has long been evaluated mainly subjectively, and objective evaluation has received less study. Furthermore, a pure speech signal is generally required as a reference in the existing objective evaluation methods, which restricts their practicality in a real-world environment. Therefore, this paper presents a non-intrusive speech quality evaluation method for hearing aids, which combines the audiogram and weighted frequency information. The proposed model mainly includes an audiogram information extraction network, a frequency information extraction network, and a quality score mapping network. The audiogram is the input of the audiogram information extraction network, which helps the system capture the information related to hearing loss. In addition, the low-frequency bands of speech contain loudness information and the medium and high-frequency components contribute to semantic comprehension. The information of the two frequency bands is input to the frequency information extraction network to obtain time-frequency information. Once the high-level features of the different frequency bands and audiograms are obtained, they are fused into two groups of tensors that distinguish the information of the different frequency bands and are used as the input of the attention layer to calculate the corresponding weight distribution. Finally, a dense layer is employed to predict the score of speech quality. The experimental results show that it is reasonable to combine the audiogram and the weight of the information from the two frequency bands, which can effectively realize the evaluation of the speech quality of the hearing aid.
Daiki TODA Ren ANZAI Koichi ICHIGE Ryo SAITO Daichi UEKI
A method of radar-based contactless vital-sign sensing and electrocardiogram (ECG) signal reconstruction using deep learning is proposed. A radar system is an effective tool for contactless vital-sign sensing because it can measure a small displacement of the body surface without contact. However, most of the conventional methods have limited evaluation indices and measurement conditions. The proposed method measures body-surface-displacement signals by using frequency-modulated continuous-wave (FMCW) radar and reconstructs ECG signals using a convolutional neural network (CNN). This study conducted two experiments. First, we trained a model using the data obtained from six subjects breathing in a seated condition. Second, we added sine wave noise to the data and trained the model again. The proposed model is evaluated with a correlation coefficient between the reconstructed and actual ECG signals. The results of the first experiment show that the subjects' ECG signals are successfully reconstructed by the proposed method. Those of the second experiment show that the proposed method can reconstruct signal waveforms even in an environment with a low signal-to-noise ratio (SNR).
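The evaluation index above can be illustrated with a short sketch. It assumes the correlation coefficient is the Pearson coefficient, which the abstract does not state explicitly; the function name and the toy signals are hypothetical.

```python
import math


def pearson_correlation(reconstructed, reference):
    """Pearson correlation between two equal-length signals.

    Assumption: the abstract's "correlation coefficient" is taken to be
    Pearson's r. Both signals must be non-constant.
    """
    n = len(reconstructed)
    mean_x = sum(reconstructed) / n
    mean_y = sum(reference) / n
    cov = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(reconstructed, reference))
    var_x = sum((x - mean_x) ** 2 for x in reconstructed)
    var_y = sum((y - mean_y) ** 2 for y in reference)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    reference = [0.0, 0.1, 1.0, 0.2, 0.0, -0.1]        # toy ECG-like samples
    reconstructed = [0.05, 0.15, 0.9, 0.25, 0.0, -0.05]
    print(pearson_correlation(reconstructed, reference))
```

A value near 1 means the reconstructed waveform follows the reference shape closely, regardless of a constant offset or gain.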
Huaijin DENG Takehito UTSURO Akio KOBAYASHI Hiromitsu NISHIZAKI
There have been many previous studies on fluency evaluation of spontaneous speech. However, most of them focus on lexical cues, and little emphasis is placed on how diverse acoustic features and deep end-to-end models contribute to improving the performance. In this paper, we describe a multi-layer neural network that uses not only lexical features extracted from transcriptions but also utterance-level acoustic features from audio data. We also conduct experiments to investigate the performance of end-to-end approaches with mel-spectrograms in this task. As the speech fluency evaluation task, we evaluate our proposed method in two binary classification tasks: fluent speech detection and disfluent speech detection. Speech data of around 10 seconds duration each, annotated with the three classes of “fluent,” “neutral,” and “disfluent,” is used for evaluation. According to the two-way splits of those three classes, the task of fluent speech detection is defined as binary classification of fluent vs. neutral and disfluent, while that of disfluent speech detection is defined as binary classification of fluent and neutral vs. disfluent. We then conduct experiments for comparative evaluation of the multi-layer neural network with diverse features as well as end-to-end models. For fluent speech detection, in the comparison of utterance-level disfluency-based, prosodic, and acoustic features with the multi-layer neural network, using only disfluency-based and prosodic features performs better. More specifically, the performance improves considerably when all of the acoustic features are removed from the full set of features, whereas it degrades considerably when filler-related features are removed. Overall, however, the end-to-end Transformer+VGGNet model with mel-spectrograms achieves the best results.
For disfluent speech detection, the multi-layer neural network using disfluency-based, prosodic, and acoustic features without fillers achieves the best results. The end-to-end Transformer+VGGNet architecture also obtains high scores, but it is significantly outperformed by the best multi-layer neural network. Thus, unlike in fluent speech detection, disfluency-based and prosodic features other than fillers are still necessary in disfluent speech detection.
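The two-way splits of the three annotation classes described above can be written out as a small sketch. The label strings follow the abstract; encoding the positive class as 1 and the function names are assumptions made for illustration.

```python
CLASSES = ("fluent", "neutral", "disfluent")


def fluent_detection_target(label):
    """Fluent speech detection: fluent (1) vs. neutral and disfluent (0)."""
    if label not in CLASSES:
        raise ValueError(f"unknown label: {label}")
    return 1 if label == "fluent" else 0


def disfluent_detection_target(label):
    """Disfluent speech detection: disfluent (1) vs. fluent and neutral (0)."""
    if label not in CLASSES:
        raise ValueError(f"unknown label: {label}")
    return 1 if label == "disfluent" else 0


if __name__ == "__main__":
    for label in CLASSES:
        print(label, fluent_detection_target(label),
              disfluent_detection_target(label))
```

Note that "neutral" is a negative example in both tasks, so the two detectors are not simply complements of each other.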
Yuexi YAO Tao LU Kanghui ZHAO Yanduo ZHANG Yu WANG
Recently, the face hallucination method based on deep learning understands the mapping between low-resolution (LR) and high-resolution (HR) facial patterns by exploring the priors of facial structure. However, how to maintain the face structure consistency after the reconstruction of face images at different scales is still a challenging problem. In this letter, we propose a novel multi-scale structure prior learning (MSPL) for face hallucination. First, we propose a multi-scale structure prior block (MSPB). Considering the loss of high-frequency information in the LR space, we mainly process the input image in three different scale ascending dimensional spaces, and map the image to the high dimensional space to extract multi-scale structural prior information. Then the size of feature maps is recovered by downsampling, and finally the multi-scale information is fused to restore the feature channels. On this basis, we propose a local detail attention module (LDAM) to focus on the local texture information of faces. We conduct extensive face hallucination reconstruction experiments on a public face dataset (LFW) to verify the effectiveness of our method.
Takeshi SENOO Akira JINGUJI Ryosuke KURAMOCHI Hiroki NAKAHARA
Multilayer perceptron (MLP) is a basic neural network model that is used in practical industrial applications, such as network intrusion detection (NID) systems. It is also used as a building block in newer models, such as gMLP. Currently, there is a demand for fast training in NID and other areas. However, in training with numerous GPUs, the problems of power consumption and long training times arise. Many of the latest deep neural network (DNN) models and MLPs are trained using a backpropagation algorithm which transmits an error gradient from the output layer to the input layer such that in the sequential computation, the next input cannot be processed until the weights of all layers are updated from the last layer. This is known as backward locking. In this study, a weight parameter update mechanism is proposed with time delays that can accommodate the weight update delay to allow simultaneous forward and backward computation. To this end, a one-dimensional systolic array structure was designed on a Xilinx U50 Alveo FPGA card in which each layer of the MLP is assigned to a processing element (PE). The time-delay backpropagation algorithm executes all layers in parallel, and transfers data between layers in a pipeline. Compared to the Intel Core i9 CPU and NVIDIA RTX 3090 GPU, it is 3 times faster than the CPU and 2.5 times faster than the GPU. The processing speed per power consumption is 11.5 times better than that of the CPU and 21.4 times better than that of the GPU. From these results, it is concluded that a training accelerator on an FPGA can achieve high speed and energy efficiency.
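The effect of updating weights with delayed gradients can be illustrated on a toy problem. This sketch is not the FPGA design or the authors' algorithm: it only shows gradient descent on a one-dimensional quadratic where each update uses a gradient evaluated at weights that are `delay` steps old. The objective, learning rate, and delay are assumptions.

```python
from collections import deque


def delayed_gradient_descent(grad, w0, lr=0.1, delay=2, steps=200):
    """Gradient descent in which each update uses a stale gradient.

    `history` holds the last `delay + 1` weight values, so `history[0]` is
    the weight from `delay` steps ago, mimicking a pipeline in which the
    backward pass sees older weights than the current ones.
    """
    history = deque([w0] * (delay + 1), maxlen=delay + 1)
    w = w0
    for _ in range(steps):
        stale_w = history[0]
        w = w - lr * grad(stale_w)
        history.append(w)
    return w


if __name__ == "__main__":
    # Toy objective f(w) = 0.5 * (w - 3)^2 with gradient w - 3.
    print(delayed_gradient_descent(lambda w: w - 3.0, w0=0.0))
```

With a small learning rate the iteration still converges to the minimizer despite the stale gradients; larger delays generally require smaller learning rates to stay stable.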
Kai YAN Tiejun ZHAO Muyun YANG
Graph layout is a critical component in graph visualization. This paper proposes GRAPHULY, a graph u-nets-based neural network, for end-to-end graph layout generation. GRAPHULY learns the multi-level graph layout process and can generate graph layouts without iterative calculation. We also propose to use Laplacian positional encoding and a multi-level loss fusion strategy to improve the layout learning. We evaluate the model with a random dataset and a graph drawing dataset and showcase the effectiveness and efficiency of GRAPHULY in graph visualization.
Yoshiharu YAMAGISHI Tatsuya KANEKO Megumi AKAI-KASAYA Tetsuya ASAI
Edge computing, which has been gaining attention in recent years, has many advantages, such as reducing the load on the cloud, not being affected by the communication environment, and providing excellent security. Therefore, many researchers have attempted to implement neural networks, which are representative of machine learning in edge computing. Neural networks can be divided into inference and learning parts; however, there has been little research on implementing the learning component in edge computing in contrast to the inference part. This is because learning requires more memory and computation than inference, easily exceeding the limit of resources available for edge computing. To overcome this problem, this research focuses on the optimizer, which is the heart of learning. In this paper, we introduce our new optimizer, hardware-oriented logarithmic momentum estimation (Holmes), which incorporates new perspectives not found in existing optimizers in terms of characteristics and strengths of hardware. The performance of Holmes was evaluated by comparing it with other optimizers with respect to learning progress and convergence speed. Important aspects of hardware implementation, such as memory and operation requirements are also discussed. The results show that Holmes is a good match for edge computing with relatively low resource requirements and fast learning convergence. Holmes will help create an era in which advanced machine learning can be realized on edge computing.
Hyunghoon KIM Jiwoo SHIN Hyo Jin JO
In various studies of attacks on autonomous vehicles (AVs), a phantom attack has been proposed in which an advanced driver assistance system (ADAS) misclassifies a fake object created by an adversary as a real object. In this paper, we propose F-GhostBusters, which is an improved version of GhostBusters that detects phantom attacks. The proposed model uses a new feature, i.e., the frequency of images. Experimental results show that F-GhostBusters not only improves the detection performance of GhostBusters but also can complement the accuracy against adversarial examples.
This paper presents a channel operating margin (COM) based high-speed serial link optimization using machine learning (ML). COM, which was proposed for evaluating serial links, is calculated first, and during the calculation several important equalization parameters corresponding to the best configuration are extracted, which can be used for the ML modeling of the serial link. Then a deep neural network containing hidden layers is investigated to model the whole serial link equalization, including the transmitter feed forward equalizer (FFE), the receiver continuous time linear equalizer (CTLE), and the decision feedback equalizer (DFE). By training, validating, and testing on a large number of samples that meet the COM specification of 400GAUI-8 C2C, an effective ML model is generated, and the maximum relative error is only 0.1 compared with computation results. Finally, three link configurations are discussed in terms of the tradeoff between link performance and cost, illustrating that our COM based ML modeling method can be applied to advanced serial link design for NRZ, PAM4, or even higher-level pulse amplitude modulation signals.
Hyun KWON Changhyun CHO Jun LEE
Deep neural networks (DNNs) provide excellent services in machine learning tasks such as image recognition, speech recognition, pattern recognition, and intrusion detection. However, an adversarial example created by adding a little noise to the original data can cause misclassification by the DNN even though the human eye cannot tell it apart from the original data. For example, if an attacker creates a modified right-turn traffic sign that is incorrectly categorized by a DNN, an autonomous vehicle with the DNN will incorrectly classify the modified right-turn traffic sign as a U-turn sign, while a human will correctly classify the modified sign as a right-turn sign. Such an adversarial example is a serious threat to a DNN. Recently, an adversarial example with multiple targets was introduced that causes misclassification by multiple models within each target class using a single modified image. However, it has the weakness that as the number of target models increases, the overall attack success rate decreases. Therefore, if there are multiple models that the attacker wishes to attack, the attacker must control the attack success rate for each model by considering the attack priority for each model. In this paper, we propose a priority adversarial example that considers the attack priority for each model in cases targeting multiple models. The proposed method controls the attack success rate for each model by adjusting the weight of the attack function in the generation process while maintaining minimal distortion. We used MNIST and CIFAR10 as datasets and TensorFlow as the machine learning library. Experimental results show that the proposed method can control the attack success rate for each model by considering each model's attack priority while maintaining minimal distortion (average 3.95 and 2.45 with MNIST for targeted and untargeted attacks, respectively, and average 51.95 and 44.45 with CIFAR10 for targeted and untargeted attacks, respectively).
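The idea of weighting each model's attack term by its priority can be sketched as an objective function. This is an illustrative assumption, not the paper's exact formulation: the hinge-style targeted attack loss, the L2 distortion measure, and all names are hypothetical choices made for the example.

```python
import math


def targeted_attack_loss(logits, target):
    """Hinge-style loss that is zero once `target` has the largest logit."""
    best_other = max(z for j, z in enumerate(logits) if j != target)
    return max(0.0, best_other - logits[target])


def priority_objective(original, adversarial, model_logits, targets, priorities):
    """Distortion plus priority-weighted attack losses over several models.

    `model_logits[i]` are the logits model i assigns to `adversarial`;
    `priorities[i]` is the weight for model i (larger means higher priority).
    """
    distortion = math.sqrt(sum((a - o) ** 2
                               for a, o in zip(adversarial, original)))
    attack = sum(c * targeted_attack_loss(z, t)
                 for c, z, t in zip(priorities, model_logits, targets))
    return distortion + attack


if __name__ == "__main__":
    value = priority_objective(
        original=[0.0, 0.0],
        adversarial=[3.0, 4.0],
        model_logits=[[1.0, 2.0], [3.0, 0.0]],
        targets=[0, 0],
        priorities=[2.0, 1.0],
    )
    print(value)  # 7.0: distortion 5.0 plus 2.0 * 1.0 for the first model
```

Raising a model's weight makes its unmet attack term dominate the objective, so an optimizer minimizing this value would favor fooling that model first.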
In this paper, we propose a selective membership inference attack method that determines whether data corresponding to a specific class are being used as training data for a machine learning model. With the proposed method, membership or non-membership can be inferred by generating a decision model from the predictions of the inference models and training it on the confidence values for the data corresponding to the selected class. We used MNIST as the experimental dataset and TensorFlow as the machine learning library. Experimental results show that the proposed method has a 92.4% success rate with 5 inference models for data corresponding to a specific class.