Thi Thu Thao KHONG Takashi NAKADA Yasuhiko NAKASHIMA
Adversarial attacks are viewed as a danger to Deep Neural Networks (DNNs) and reveal a weakness of deep learning models in security-critical applications. Recent findings have presented adversarial training as an outstanding defense method against adversaries. Nonetheless, adversarial training is challenging for big datasets and large networks. It is believed that, unless DNN architectures are made larger, it is hard to strengthen the robustness of DNNs to adversarial examples. To avoid iterative adversarial training, we propose Bayes without Bayesian Learning (BwoBL), an algorithm that performs ensemble inference to improve robustness. As an application of transfer learning, we use the learned parameters of pretrained DNNs to build Bayesian Neural Networks (BNNs) and focus on Bayesian inference without the cost of Bayesian learning. Without adversarial training, our method is more robust than activation functions designed to enhance adversarial robustness. Moreover, BwoBL can easily be integrated into any pretrained DNN, not only Convolutional Neural Networks (CNNs) but also other DNNs, such as Self-Attention Networks (SANs) that outperform their convolutional counterparts. BwoBL is also convenient to apply to scaled networks, e.g., ResNet and EfficientNet, with better performance. Notably, our algorithm employs a variety of DNN architectures to construct BNNs against a diversity of adversarial attacks on a large-scale dataset. In particular, under the l∞-norm PGD attack with pixel perturbation ε=4/255 and 100 iterations on ImageNet, combining our proposal with naturally pretrained ResNets, SANs, and EfficientNets increases top-5 accuracy by 58.18% on average. This enhancement is 62.26% on average under the l2-norm C&W attack.
The combination of our proposed method with EfficientNets pretrained on both natural and adversarial images (EfficientNet-ADV) drastically boosts robustness against PGD and C&W attacks without additional training. Our EfficientNet-ADV-B7 achieves cutting-edge top-5 accuracy of 92.14% and 94.20% on adversarial ImageNet generated by powerful PGD and C&W attacks, respectively.
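The idea of ensemble inference built from pretrained parameters can be illustrated with a toy sketch. The abstract does not say how the BNN is constructed from the pretrained weights, so the Gaussian perturbation, the value of `sigma`, the linear classifier, and all function names below are assumptions made only for illustration; this is not the BwoBL algorithm itself.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear_logits(weights, x):
    """Logits of a toy linear classifier: one weight row per class."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def ensemble_predict(weights, x, n_samples=32, sigma=0.05, seed=0):
    """Average class probabilities over randomly perturbed weight copies.

    Assumption: each ensemble member is the pretrained weights plus
    zero-mean Gaussian noise; no training step is involved.
    """
    rng = random.Random(seed)
    mean_probs = [0.0] * len(weights)
    for _ in range(n_samples):
        sampled = [[w + rng.gauss(0.0, sigma) for w in row] for row in weights]
        probs = softmax(linear_logits(sampled, x))
        mean_probs = [m + p / n_samples for m, p in zip(mean_probs, probs)]
    return mean_probs


# Stand-in for "pretrained" parameters (hypothetical values).
pretrained = [[1.0, -0.5], [-1.0, 0.5], [0.2, 0.1]]
probs = ensemble_predict(pretrained, x=[0.8, -0.3])
print(probs)
```

The averaged output is still a probability distribution over the classes, and setting `sigma=0.0` recovers the prediction of the single pretrained model.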
Ryosuke ADACHI Yuh YAMASHITA Koichi KOBAYASHI
This paper addresses distributed optimal estimation over wireless sensor networks with scalable communications. To realize scalable communication, a data-aggregation method is introduced. Since our previously proposed method cannot guarantee the global optimality of each estimator, a modified protocol is proposed. The modification introduces weights into the data aggregation. To select the weight values, a redundant output reduction method with minimum covariance is discussed. With the proposed protocol, all estimators can calculate the optimal estimate. Finally, numerical simulations show that the proposed method achieves both scalable communication and high-accuracy estimation.
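To make the role of weights in minimum-covariance aggregation concrete, the sketch below fuses scalar estimates with textbook inverse-variance weighting. The abstract does not give the paper's protocol or its redundant output reduction method, so this is only a generic illustration; the function name `fuse` and the assumption of independent, unbiased estimates are not taken from the source.

```python
def fuse(estimates, variances):
    """Inverse-variance weighted fusion of independent unbiased scalar estimates.

    Under the independence assumption, these weights give the linear
    unbiased combination with the smallest variance.
    """
    precisions = [1.0 / v for v in variances]
    total = sum(precisions)
    weights = [p / total for p in precisions]
    fused = sum(w * e for w, e in zip(weights, estimates))
    fused_var = 1.0 / total
    return fused, fused_var, weights


# Two hypothetical sensor estimates of the same quantity.
fused, fused_var, weights = fuse([10.0, 12.0], [1.0, 4.0])
print(fused, fused_var, weights)
```

The more reliable sensor receives the larger weight, and the fused variance is smaller than either input variance.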
Kosei OZEKI Naofumi AOKI Saki ANAZAWA Yoshinori DOBASHI Kenichi IKEDA Hiroshi YASUDA
This study has developed a system that performs data communication using the high-frequency bands of sound signals. Unlike radio communication systems that use advanced wireless devices, it requires only legacy devices such as the microphones and speakers employed in ordinary telephony systems. We have investigated whether a machine learning approach can improve the accuracy of recognizing binary symbols exchanged through sound media. This paper describes experimental results evaluating the performance of our proposed technique, which employs a neural network as its classifier of binary symbols. The experimental results indicate that the proposed technique may be suitable for designing an optimal classifier for the symbol identification task.
Hyungjin CHO Seongmin PARK Youngkwon PARK Bomin CHOI Dowon KIM Kangbin YIM
As of February 2021, with competition for the commercialization of 5G mobile communication increasing, 5G SA networks and Vo5G are expected to be commercialized soon. 5G mobile communication aims to provide a transmission speed of 20 Gbps, which is 20 times faster than 4G mobile communication, connection of at least 1 million devices per 1 km², and a transmission delay of 1 ms, which is 10 times shorter than 4G. To meet these goals, various technological developments were required, and technologies such as Massive MIMO (Multiple-Input and Multiple-Output), mmWave, and small cell networks were developed and applied in the 5G access network. However, in the core network, the components constituting the LTE (Long Term Evolution) core network are used as they are in the NSA (Non-Standalone) architecture, and changes have occurred only in the SA (Standalone) architecture. Also, in the network that provides the voice service, the IMS (IP Multimedia Subsystem) infrastructure is still used in the SA architecture. The issue is that, while 5G mobile communication is evolving openly to provide various services, its security elements are vulnerable to various cyber-attacks because they maintain the same form as before. Therefore, in this paper, we examine what the network standard for 5G voice service provision consists of and what its security vulnerabilities are. We also suggest possible attack scenarios that exploit these security issues, and consider whether these problems can actually occur and what countermeasures are available.
Naoki HATTORI Jun SHIOMI Yutaka MASUDA Tohru ISHIHARA Akihiko SHINYA Masaya NOTOMI
With the rapid progress of integrated nanophotonics technology, optical neural network architectures have been widely investigated. Since an optical neural network can complete inference processing just by propagating the optical signal through the network, it is expected to be more than one order of magnitude faster than electronics-only implementations of artificial neural networks (ANNs). In this paper, we first propose an optical vector-matrix multiplication (VMM) circuit using wavelength division multiplexing, which enables inference processing at the speed of light with ultra-wide bandwidth. We next propose an optoelectronic circuit implementation of batch normalization and the activation function, which significantly improves the accuracy of inference processing without sacrificing speed. Finally, using a virtual environment for machine learning and an optoelectronic circuit simulator, we demonstrate the ultra-fast and accurate operation of the optical-electronic ANN circuit.
Hongjie XU Jun SHIOMI Hidetoshi ONODERA
Hardware accelerators are designed to support a specialized processing dataflow for ever-changing deep neural networks (DNNs) under various processing environments. This paper introduces two hardware properties to describe the cost of data movement in each memory hierarchy. Based on these hardware properties, this paper proposes a set of evaluation metrics that evaluate the number of memory accesses and the required memory capacity according to the specialized processing dataflow. The proposed metrics can analytically predict the energy, throughput, and area of a hardware design without detailed implementation. Once a processing dataflow and the constraints on hardware resources are determined, the proposed evaluation metrics quickly quantify the expected hardware benefits, thereby reducing design time.
To cope with complicated interference scenarios in realistic acoustic environments, supervised deep neural networks (DNNs) are investigated to estimate different user-defined targets. Such techniques can be broadly categorized into magnitude estimation and time-frequency mask estimation techniques. Further, a mask such as the Wiener gain can be estimated directly or derived from the estimated interference power spectral density (PSD) or the estimated signal-to-interference ratio (SIR). In this paper, we propose to incorporate multi-task learning in DNN-based single-channel speech enhancement by using the speech presence probability (SPP) as a secondary target to assist the target estimation in the main task. The domain-specific information is shared between the two tasks to learn a more generalizable representation. Since the performance of a multi-task network is sensitive to the weight parameters of the loss function, homoscedastic uncertainty is introduced to adaptively learn the weights, which is shown to outperform the fixed weighting method. Simulation results show that the proposed multi-task scheme improves overall speech enhancement performance compared to conventional single-task methods. Moreover, joint direct mask and SPP estimation yields the best performance among all the considered techniques.
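Adaptive loss weighting by homoscedastic uncertainty can be sketched as follows. The abstract does not state the exact loss, so the sketch assumes one commonly used formulation in which each task loss L_i is scaled by 0.5·exp(−s_i) and regularized by 0.5·s_i, where s_i is a learnable log-variance. The task-loss values and the function names are hypothetical.

```python
import math


def weighted_loss(task_losses, log_vars):
    """Combined loss: sum_i 0.5 * exp(-s_i) * L_i + 0.5 * s_i (assumed form)."""
    return sum(0.5 * math.exp(-s) * l + 0.5 * s
               for l, s in zip(task_losses, log_vars))


def log_var_grads(task_losses, log_vars):
    """Gradient of the combined loss with respect to each log-variance s_i."""
    return [-0.5 * math.exp(-s) * l + 0.5
            for l, s in zip(task_losses, log_vars)]


# Hypothetical, fixed task losses: main task (mask) and secondary task (SPP).
losses = [4.0, 0.25]
log_vars = [0.0, 0.0]

# Plain gradient descent on the log-variances only.
for _ in range(2000):
    grads = log_var_grads(losses, log_vars)
    log_vars = [s - 0.1 * g for s, g in zip(log_vars, grads)]

task_weights = [math.exp(-s) for s in log_vars]
print(log_vars, task_weights)
```

For fixed task losses, each s_i settles at log L_i, so the task with the larger loss receives the smaller weight. In actual training the task losses change at every step and the log-variances are updated jointly with the network parameters.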
Mariana RODRIGUES MAKIUCHI Tifani WARNITA Nakamasa INOUE Koichi SHINODA Michitaka YOSHIMURA Momoko KITAZAWA Kei FUNAKI Yoko EGUCHI Taishiro KISHIMOTO
We propose a non-invasive and cost-effective method to automatically detect dementia using speech audio data alone. We extract paralinguistic features from a short speech segment and use Gated Convolutional Neural Networks (GCNN) to classify it as dementia or healthy. We evaluate our method on the Pitt Corpus and on our own dataset, the PROMPT Database. Our method yields an accuracy of 73.1% on the Pitt Corpus using an average of 114 seconds of speech data. On the PROMPT Database, our method yields an accuracy of 74.7% using 4 seconds of speech data, which improves to 80.8% when we use all of the patient's speech data. Furthermore, we evaluate our method on a three-class classification problem that includes the Mild Cognitive Impairment (MCI) class and achieve an accuracy of 60.6% with 40 seconds of speech data.
Jiao GUAN Jueping CAI Ruilian XIE Yequn WANG Jinzhi LAI
This letter presents an oblivious and load-balanced routing (OLBR) method without virtual channels for 2D mesh Network-on-Chip (NoC). To balance the traffic load of the network and avoid deadlock, OLBR divides the network nodes into two regions. One region contains the nodes on the east and west sides of the NoC, in which packets are routed by the odd-even turn rule with Y direction preference (OE-YX). The remaining nodes form the other region, in which packets are routed by the odd-even turn rule with alterable priority arbitration (OE-APA). Simulation results show that OLBR can improve saturation throughput by 11.73% over related works and that OLBR balances the traffic load over the entire network.
Lin CAO Kaixuan LI Kangning DU Yanan GUO Peiran SONG Tao WANG Chong FU
Face sketch synthesis refers to transforming facial photos into sketches. Recent research on face sketch synthesis has achieved great success due to the development of Generative Adversarial Networks (GANs). However, these generative methods are prone to neglecting detailed information and thus lose some individual-specific features, such as glasses and headdresses. In this paper, we propose a novel method called Feature Learning Generative Adversarial Network (FL-GAN) to synthesize detail-preserving, high-quality sketches. Specifically, the proposed FL-GAN consists of one Feature Learning (FL) module and one Adversarial Learning (AL) module. The FL module aims to learn the detailed information of the image in a latent space and to guide the AL module to synthesize detail-preserving sketches. The AL module aims to learn the structure and texture of sketches and to improve the quality of synthetic sketches through an adversarial learning strategy. Quantitative and qualitative comparisons with seven state-of-the-art methods (LLE, MRF, MWF, RSLCR, RL, FCN, and GAN) on four facial sketch datasets demonstrate the superiority of this method.
Yoichi MATSUO Tatsuaki KIMURA Ken NISHIMATSU
When a failure occurs in a network element, such as a switch, router, or server, network operators need to recognize the service impact, such as the time to recovery from the failure or the severity of the failure, since service impact is essential information for handling failures. In this paper, we propose a Deep learning based Service Impact Prediction system (DeepSIP), which predicts the service impact of a network failure in a network element using a temporal multimodal convolutional neural network (CNN). More precisely, DeepSIP predicts the time to recovery from the failure and the loss of traffic volume due to the failure in a network on the basis of information from syslog messages and traffic volume. Since the time to recovery is useful information for a service level agreement (SLA) and the loss of traffic volume is directly related to the severity of the failure, we regard the time to recovery and the loss of traffic volume as the service impact. The service impact is challenging to predict, since it depends on the type of network failure and the traffic volume when the failure occurs. Moreover, network elements do not explicitly contain any information about the service impact. To extract the type of network failure and predict the service impact, we use syslog messages and past traffic volume. However, syslog messages and traffic volume are also challenging to analyze because these data are multimodal, are strongly correlated, and have temporal dependencies. To extract useful features for prediction, we develop a temporal multimodal CNN. We experimentally evaluated the accuracy of DeepSIP by comparing it with other NN-based methods on synthetic and real datasets. For both datasets, the results show that DeepSIP outperformed the baselines.
Satoshi DENNO Kazuma YAMAMOTO Yafei HOU
This paper proposes relay selection techniques for XOR physical layer network coding with MMSE-based non-linear precoding in MIMO bi-directional wireless relaying networks. The proposed selection techniques are derived under different assumptions about the characteristics of the MMSE-based non-linear precoding in the wireless network. We show that the signal-to-noise power ratio (SNR) depends on the product of all the eigenvalues of the channels from the terminals to the relays. This paper shows that the best of the proposed selection techniques is to select the group of relays that maximizes this product. Therefore, the selection technique is called "product of all eigenvalues (PAE)" in this paper. The performance of the proposed relay selection techniques is evaluated in a MIMO bi-directional wireless relaying network where two terminals with 2 antennas each exchange their information via relays. When the PAE is applied to select a group of 2 relays out of 10 relays, each with one antenna, the PAE attains a gain of more than 13 dB at a BER of 10^-3.
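The selection rule described above, choosing the relay group that maximizes a product of eigenvalues, can be sketched for the evaluated setting of two 2-antenna terminals and single-antenna relays. The abstract does not define the exact matrices involved, so the sketch assumes that each relay pair forms a 2x2 channel matrix H per terminal and that the score is the product of the eigenvalues of H^H H for both terminals. For a square H that product equals |det H|^2. The function names and channel values are hypothetical.

```python
import itertools
import random


def gram_eigen_product(h):
    """Product of the eigenvalues of H^H H for a 2x2 matrix H, i.e. |det H|^2."""
    det = h[0][0] * h[1][1] - h[0][1] * h[1][0]
    return abs(det) ** 2


def select_relay_pair(chan_a, chan_b):
    """Exhaustively pick the relay pair with the largest eigenvalue product.

    chan_a[r] and chan_b[r] are the 2-element channel rows from terminal A
    and terminal B to single-antenna relay r (assumed layout).
    """
    best_pair, best_score = None, -1.0
    for pair in itertools.combinations(range(len(chan_a)), 2):
        h_a = [chan_a[r] for r in pair]
        h_b = [chan_b[r] for r in pair]
        score = gram_eigen_product(h_a) * gram_eigen_product(h_b)
        if score > best_score:
            best_pair, best_score = pair, score
    return best_pair, best_score


def random_channels(n_relays, rng):
    """Complex Gaussian channel rows, used only as illustrative data."""
    return [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2)]
            for _ in range(n_relays)]


rng = random.Random(1)
chan_a = random_channels(10, rng)
chan_b = random_channels(10, rng)
pair, score = select_relay_pair(chan_a, chan_b)
print(pair, score)
```

With 10 relays the search covers 45 pairs, so exhaustive enumeration is inexpensive in this setting.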
Junxuan WANG Meng YU Xuewei ZHANG Fan JIANG
Heterogeneous networks (HetNets) are emerging as an inevitable method to tackle the capacity crunch of cellular networks. Due to the complicated network environment and the large number of configured parameters, coverage and capacity optimization (CCO) is a challenging issue in heterogeneous cellular networks. By combining a self-optimizing algorithm for radio frequency (RF) parameters with the power control mechanism of small cells, this paper addresses the CCO problem of self-organizing networks. First, the optimization of RF parameters is solved based on reinforcement learning (RL), where the base station is modeled as an agent that can learn effective strategies to control the tunable parameters by interacting with the surrounding environment. Second, the small cell can autonomously change its wireless transmission state by comparing its distance from the user equipment with the virtual cell size. Simulation results show that the proposed algorithm achieves better user throughput than several conventional methods.
Ying KANG Cong LIU Ning WANG Dianxi SHI Ning ZHOU Mengmeng LI Yunlong WU
Siamese visual tracking, viewed as a problem of max-similarity matching to the target template, has attracted increasing attention in computer vision. However, it is hard for current Siamese trackers to balance accuracy in real-time tracking with robustness in long-term tracking. This work proposes a new Siamese-based tracker with a dual-pipeline correlated fusion network (named ADF-SiamRPN), which uses an initial template for robust correlation and a transient template with adaptive optimal feature selection for accurate correlation. With the subsequent learnable correlation-response fusion network, we pursue a comprehensive improvement in tracking performance. To compare the performance of ADF-SiamRPN with state-of-the-art trackers, we conduct extensive experiments on benchmarks such as OTB100, UAV123, VOT2016, VOT2018, GOT-10k, LaSOT, and TrackingNet. The experimental results demonstrate that ADF-SiamRPN outperforms all the compared trackers and achieves the best balance between accuracy and robustness.
Shu JIANG Rui WANG Zuchao LI Masao UTIYAMA Kehai CHEN Eiichiro SUMITA Hai ZHAO Bao-liang LU
Standard neural machine translation (NMT) assumes that each sentence is independent of its document-level context. Most existing document-level NMT approaches settle for a coarse sense of global document-level information, while this work focuses on exploiting detailed document-level context by means of a memory network. The ability of a memory network to detect the part of memory most relevant to the current sentence offers a natural way to model rich document-level context. In this work, the proposed document-aware memory network is implemented to enhance the Transformer NMT baseline. Experiments on several tasks show that the proposed method significantly improves NMT performance over strong Transformer baselines and other related studies.
We propose a new method for improving the recognition performance of phonemes, speech emotions, and music genres using multi-task learning. When tasks are closely related, multi-task learning can improve the performance of each task by learning a common feature representation for all the tasks. However, the recognition tasks considered in this study demand different input signals of speech and music at different time scales, resulting in input features with different characteristics. In addition, a training dataset with multiple labels for all information sources is not available. Considering these issues, we conduct multi-task learning in a sequential training process using input features with a single label for one information source. A comparative evaluation confirms that the proposed multi-task learning method provides higher performance on all recognition tasks than learning each task individually, as in conventional methods.
Jun MENG Gangyi DING Laiyang LIU
In view of the different spatial and temporal resolutions of observed multi-source heterogeneous carbon dioxide data and the uncertain quality of observations, a data fusion prediction model for observed multi-scale carbon dioxide concentration data is studied. First, a wireless carbon sensor network is created, the gross error data in the original dataset are eliminated, and the remaining valid data are combined using the kriging method to generate a series of continuous surfaces that express specific features and provide unified, spatio-temporally normalized data for subsequent prediction models. Then, a long short-term memory network is used to process these continuous time- and space-normalized data to obtain a carbon dioxide concentration prediction model at any scale. Finally, the experimental results illustrate that the proposed method with spatio-temporal features is more accurate than the single-sensor monitoring method without spatio-temporal features.
Rui SUN Qili LIANG Zi YANG Zhenghui ZHAO Xudong ZHANG
Video-based person re-identification (re-ID) aims at retrieving a person across non-overlapping cameras and has achieved promising results owing to deep convolutional neural networks. Due to the dynamic properties of video, the problems of background clutter and occlusion are more serious than in image-based person re-ID. In this letter, we present a novel triple attention network (TriANet) that simultaneously utilizes temporal, spatial, and channel context information by employing the self-attention mechanism to obtain robust and discriminative features. Specifically, the network has two parts. The first part introduces a residual attention subnetwork, which contains a channel attention module that captures cross-dimension dependencies by using rotation and transformation, and a spatial attention module that focuses on pedestrian features. In the second part, a time attention module is designed to judge the quality score of each pedestrian image and to reduce the weight of incomplete pedestrian images, alleviating the occlusion problem. We evaluate our proposed architecture on three datasets: iLIDS-VID, PRID2011, and MARS. Extensive comparative experimental results show that our proposed method achieves state-of-the-art results.
Tengfei SHAO Yuya IEIRI Reiko HISHIYAMA
Tourist satisfaction plays a very important role in the development of local community tourism. For the development of tourist destinations in local communities, it is important to measure, maintain, and improve tourist destination loyalty over the medium to long term. It has been shown that improving tourist satisfaction is a major factor in improving tourist destination loyalty. Therefore, to improve tourist satisfaction in local communities, we identified multiple clusters of sightseeing spots and determined that the satisfaction of tourists can be increased based on these clusters. Our discovery flow can be summarized as follows. First, we extracted tourism keywords from guidebooks on sightseeing spots. We then constructed a complex network of tourists and sightseeing spots based on data collected from experiments conducted in Kyoto. Next, we added the corresponding tourism keywords to each sightseeing spot. Finally, by analyzing network motifs, we successfully discovered multiple clusters of sightseeing spots that could be used to improve tourist satisfaction.
Song CHENG Zixuan LI Yongsen WANG Wanbing ZOU Yumei ZHOU Delong SHANG Shushan QIAO
Binary neural networks (BNNs), where both activations and weights are radically quantized to {-1, +1}, can massively accelerate the run-time performance of convolutional neural networks (CNNs) on edge devices by reducing computational complexity and memory footprint. However, the non-differentiable binarizing function used in BNNs makes binarized models hard to optimize and causes significant performance degradation relative to full-precision models. Many previous works corrected the backward gradient of the binarizing function with various improved versions of straight-through estimation (STE), or with a gradual approximation approach, but the gradient suppression problem was neither analyzed nor handled. Thus, we propose a novel gradient corrected approximation (GCA) method to reduce the discrepancy between the binarizing function and its backward gradient in a gradual and stable way. Our work has two primary contributions. The first is to approximate the backward gradient of the binarizing function using a simple leaky-steep function with variable window size. The second is to correct the gradient approximation by standardizing the backward gradient propagated through the binarizing function. Experimental results show that the proposed method outperforms the baseline by 1.5% Top-1 accuracy on the ImageNet dataset without introducing extra computation cost.
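The general pattern of pairing a non-differentiable binarizing forward pass with a surrogate backward gradient can be sketched as below. The abstract does not define the leaky-steep function or the standardization step, so the piecewise gradient, the `leak` value, and all function names are hypothetical stand-ins. The sketch illustrates only the windowed-surrogate idea, not the GCA method itself.

```python
def binarize(x):
    """Forward pass: quantize a real value to {-1, +1}."""
    return 1.0 if x >= 0 else -1.0


def surrogate_grad(x, window=1.0, leak=0.01):
    """Hypothetical surrogate for d(binarize)/dx.

    Steep (1 / window) inside |x| <= window, small constant leak outside,
    so values far from zero still receive a nonzero gradient.
    """
    return 1.0 / window if abs(x) <= window else leak


def backward(pre_activations, upstream_grads, window=1.0, leak=0.01):
    """Backward pass: multiply upstream gradients by the surrogate gradient."""
    return [g * surrogate_grad(x, window, leak)
            for x, g in zip(pre_activations, upstream_grads)]


pre = [0.3, -2.0, -0.1]
outputs = [binarize(x) for x in pre]
grads = backward(pre, [1.0, 1.0, 1.0], window=0.5)
print(outputs, grads)
```

Shrinking `window` over training makes the surrogate steeper and closer to the true step function, which is one way to read "variable window size"; how the paper schedules it is not stated in the abstract.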