Keiichiro INAGAKI Tatsuya MARUNO Kota YAMAMOTO
The brain processes a large amount of information about traffic scenes to support appropriate perception, judgment, and operation during vehicle driving. The strategy for perception, judgment, and operation differs from driver to driver, and this difference is thought to arise from driving experience. In the present work, we measure and analyze human brain activity (EEG: electroencephalogram) related to visual perception during vehicle driving to clarify the relationship between driving experience and brain activity. The results show that expert drivers generate more α activity than beginners and that their β activity is lower than that of beginners. These results provide a first indication that driving experience is reflected in the activation pattern of the EEG.
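The α/β comparison described in this abstract rests on measuring signal power in frequency bands. The following is a minimal sketch of such a band-power calculation, not the authors' analysis pipeline: the band edges (8-13 Hz for α, 13-30 Hz for β), the sampling rate, the function name `band_power`, and the synthetic test signal are all assumptions made for illustration.

```python
import cmath
import math


def band_power(signal, fs, f_lo, f_hi):
    """Sum the normalized DFT power of `signal` over f_lo <= f < f_hi (Hz).

    Uses a naive DFT, which is adequate for short illustrative signals.
    """
    n = len(signal)
    power = 0.0
    for k in range(n // 2 + 1):
        freq = k * fs / n
        if f_lo <= freq < f_hi:
            coeff = sum(
                x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(signal)
            )
            power += abs(coeff) ** 2 / n ** 2
    return power


# Synthetic one-second "EEG" trace (assumed values): a strong 10 Hz
# component in the alpha band plus a weaker 20 Hz component in the beta band.
FS = 128
trace = [
    math.sin(2 * math.pi * 10 * i / FS) + 0.3 * math.sin(2 * math.pi * 20 * i / FS)
    for i in range(FS)
]

alpha = band_power(trace, FS, 8.0, 13.0)   # assumed alpha band
beta = band_power(trace, FS, 13.0, 30.0)   # assumed beta band
print(f"alpha power = {alpha:.4f}, beta power = {beta:.4f}")
```

With real recordings, a windowed estimate such as Welch's method would normally replace the naive DFT; the sketch only shows what "more α activity" and "less β activity" mean numerically.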
Some non-acoustic modalities can reveal speech attributes that can be used to synthesize speech signals without acoustic signals. This study validated the use of ultrasonic Doppler frequency shifts caused by facial movements to implement a silent speech interface system. A 40 kHz ultrasonic beam is directed at the speaker's mouth region. The features derived from the demodulated received signals were used to estimate the speech parameters. A nonlinear regression approach was employed in this estimation, where the relationship between ultrasonic features and the corresponding speech is represented by deep neural networks (DNNs). In this study, we investigated the discrepancies between the ultrasonic signals of audible and silent speech to validate the possibility of totally silent communication. Since reference speech signals are not available for silently mouthed ultrasonic signals, a nearest-neighbor search and alignment method was proposed, wherein alignment was achieved by determining the optimal pair of ultrasonic and audible features in the sense of a minimum mean square error criterion. The experimental results showed that the performance of the ultrasonic Doppler-based method was superior to that of EMG-based speech estimation and comparable to that of an image-based method.
We propose a deep learning-based model for classifying pathological voices using a convolutional neural network and a feedforward neural network. The model uses combinations of heterogeneous parameters, including mel-frequency cepstral coefficients, linear predictive cepstral coefficients and higher-order statistics. We validate the accuracy of this model using the Massachusetts Eye and Ear Infirmary (MEEI) voice disorder database and the Saarbruecken Voice Database (SVD). Our model achieved an accuracy of 99.3% for MEEI and 75.18% for SVD. This model achieved an accuracy that is 7.18% higher than that of competitive models in previous studies.
The reference current used in sense amplifiers is a crucial factor in single-ended read schemes for emerging memories. The dummy cell average read scheme uses multiple pairs of dummy cells inside the array to generate an accurate reference current for data sensing. Previous research adopted a current mirror sense amplifier (CMSA), which is compatible with the dummy cell average read scheme. However, the clamped bit-line sense amplifier (CBLSA) offers higher sensing speed and lower power consumption than the CMSA. Applying the CBLSA to the dummy cell average read scheme is therefore expected to enhance performance. This paper reveals that directly combining the CBLSA with the dummy cell average read scheme degrades the sense margin. To solve this problem, a new array design is proposed that makes the CBLSA compatible with the dummy cell average read scheme. A current mirror structure is employed to prevent the CBLSA from being short-circuited directly. The simulation results show that the minimum sensible tunnel magnetoresistance ratio (TMRR) can be extended from 14.3% down to 1%. The access time of the proposed sensing scheme is less than 2 ns when the TMRR is 70% or larger, which is about twice as fast as in the previous research. In addition, the circuit consumes only half of the energy per read cycle compared with the previous research. In the proposed array architecture, all the dummy cells can be permanently short-circuited in a totally isolated area by low-resistance metal wiring instead of by controlling transistors. This structure helps to increase the dummy cell averaging effect. Furthermore, array-level simulation validates that the array design provides access to every data cell. This design is generally applicable to any kind of resistance-variable emerging memory, including STT-MRAM.
Hatoon S. ALSAGRI Mourad YKHLEF
Social media channels, such as Facebook, Twitter, and Instagram, have altered our world forever. People are now more connected than ever and reveal a sort of digital persona. Although social media certainly has several remarkable features, its demerits are undeniable as well. Recent studies have indicated a correlation between heavy use of social media sites and increased depression. The present study aims to exploit machine learning techniques to detect a probably depressed Twitter user based on both the user's network behavior and tweets. For this purpose, we trained and tested classifiers to distinguish whether a user is depressed or not, using features extracted from the user's activities in the network and tweets. The results showed that the more features are used, the higher the accuracy and F-measure scores in detecting depressed users. This method is a data-driven, predictive approach for early detection of depression or other mental illnesses. The main contribution of this study is the exploration of the features and their impact on detecting the depression level.
Faster R-CNN uses a region proposal network, which consists of a single-scale convolution filter and fully connected networks, to localize detected regions. However, a single-scale filter is not sufficient to detect the full regions of characters. In this letter, we propose a simple but effective approach, i.e., utilizing convolution filters of various sizes, to accurately detect Chinese characters of multiple scales in documents. We experimentally verified that our method improved IoU by 4% and the detection rate by 3% compared with the previous single-scale Faster R-CNN method.
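The IoU (intersection over union) figure quoted in this abstract measures how well a detected box overlaps a reference box. As an illustration of the metric only, here is a minimal sketch for axis-aligned boxes; the `(x_min, y_min, x_max, y_max)` box layout and the function name `iou` are assumptions for this example and are not taken from the letter.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Each box is (x_min, y_min, x_max, y_max) with x_min <= x_max and
    y_min <= y_max. Returns a value in [0, 1].
    """
    # Corners of the overlap rectangle (may be empty).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    inter_w = max(0.0, ix_max - ix_min)
    inter_h = max(0.0, iy_max - iy_min)
    inter = inter_w * inter_h

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


# A detected character box that only partly covers the reference box.
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

A detector that captures only part of a large character produces a small intersection relative to the union, which is why IoU is a natural metric for multi-scale character detection.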
Deep learning is attracting growing attention and delivering better performance in implementing Intrusion Detection Systems (IDS), especially for feature learning. This paper presents the state-of-the-art advances and challenges in IDS using deep learning models, which have achieved large performance gains over traditional methods in computer vision, natural language processing, and image/audio processing. After providing a systematic description of the latest developments in deep learning in terms of the deployed architectures and techniques, we present the pros and cons of the deep learning-based IDS and discuss the importance of deep learning models as a feature learning approach. To this end, the author has proposed the concept of Deep-Feature Extraction and Selection (D-FES). By combining stacked feature extraction and weighted feature selection in D-FES, our experiment achieved the best performance, with a detection rate of 99.918% and a false alarm rate of 0.012%, in detecting impersonation attacks in Wi-Fi networks, which is better than previously published results. A summary and further challenges are given as concluding remarks.
Soudalin KHOUANGVICHIT Nattapong KITSUWAN Eiji OKI
This paper proposes an optimization approach that designs the backup network with the minimum total capacity to protect the primary network from random multiple link failures with link failure probability. In the conventional approach, the routing in the primary network is not considered as a factor in minimizing the total capacity of the backup network. Considering primary routing as a variable when deciding the backup network can reduce the total capacity in the backup network compared to the conventional approach. The optimization problem examined here employs robust optimization to provide probabilistic survivability guarantees for different link capacities in the primary network. The proposed approach formulates the optimization problem as a mixed integer linear programming (MILP) problem with robust optimization. A heuristic implementation is introduced for the proposed approach as the MILP problem cannot be solved in practical time when the network size increases. Numerical results show that the proposed approach can achieve lower total capacity in the backup network than the conventional approach.
Qiaochu ZHAO Ittetsu TANIGUCHI Makoto NAKAMURA Takao ONOYE
Vision systems are widely adopted in industrial fields for monitoring and automation. As a typical example, industrial vision systems are extensively used in vibratory parts feeders to ensure that parts are correctly oriented for assembly and that disqualified parts are eliminated. An efficient method for parts orientation recognition and counting is thus critical. In this paper, an integrated method for fast parts counting and orientation recognition using industrial vision systems is proposed. The original 2D spatial image signal of the parts is decomposed into a 1D signal with its temporal variance, which makes efficient recognition and counting achievable. The feeding speed of each part is further leveraged to refine counting in an adaptive way. Experiments were conducted on parts of different types, and the results show that the proposed method is both more efficient and more accurate than other relevant methods.
Kazuya URAZOE Nobutaka KUROKI Yu KATO Shinya OHTANI Tetsuya HIROSE Masahiro NUMA
Convolutional neural network (CNN)-based image super-resolution methods are widely used as a high-quality image-enhancement technique. However, in general, they show little to no luminance isotropy. Thus, we propose two methods, “Luminance Inversion Training (LIT)” and “Luminance Inversion Averaging (LIA),” to improve the luminance isotropy of CNN-based image super-resolution. Experimental results for 2× image magnification show that the average peak signal-to-noise ratio (PSNR) obtained with Luminance Inversion Averaging is about 0.15-0.20 dB higher than that of conventional super-resolution.
Lu YIN Junfeng LI Yonghong YAN Masato AKAGI
Simultaneous utterances degrade speech understanding for both hearing-impaired listeners and automatic speech recognition systems. Recently, deep neural networks have dramatically improved speech separation performance. However, most previous works only estimate the speech magnitude and use the mixture phase for speech reconstruction. The use of the mixture phase has become a critical limitation for separation performance. This study proposes a two-stage phase-aware approach for multi-talker speech separation, which jointly recovers the magnitude and the phase. For phase recovery, the Multiple Input Spectrogram Inversion (MISI) algorithm is used because of its effectiveness and simplicity. The study implements the MISI algorithm on the basis of masks and shows that the ideal amplitude mask (IAM) is the optimal mask for mask-based MISI phase recovery, as it introduces less phase distortion. To compensate for the phase recovery error and minimize signal distortion, an advanced mask is proposed for magnitude estimation. The IAM and the proposed mask are estimated at different stages to recover the phase and the magnitude, respectively. Two neural network frameworks are evaluated for magnitude estimation in the second stage, demonstrating the effectiveness and flexibility of the proposed approach. The experimental results demonstrate that the proposed approach significantly reduces the distortion of the separated speech.
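The ideal amplitude mask mentioned in this abstract is commonly defined per time-frequency bin as the ratio of the source magnitude to the mixture magnitude, IAM = |S| / |Y|. The sketch below illustrates that standard definition on a handful of hand-made complex bins; it does not implement MISI or the proposed second-stage mask, and the function names and toy values are assumptions for illustration.

```python
import cmath
import math


def ideal_amplitude_mask(source_bins, mixture_bins, eps=1e-12):
    """IAM per bin: |S| / |Y|, with a small floor to avoid division by zero."""
    return [abs(s) / max(abs(y), eps) for s, y in zip(source_bins, mixture_bins)]


def apply_mask_with_mixture_phase(mask, mixture_bins):
    """Scale the mixture magnitude by the mask and reuse the mixture phase."""
    return [
        m * abs(y) * cmath.exp(1j * cmath.phase(y))
        for m, y in zip(mask, mixture_bins)
    ]


# Two toy "speakers", each described by two complex spectrogram bins.
speaker_1 = [1 + 0j, 0 + 2j]
speaker_2 = [0 + 1j, 1 + 0j]
mixture = [a + b for a, b in zip(speaker_1, speaker_2)]

mask_1 = ideal_amplitude_mask(speaker_1, mixture)
estimate_1 = apply_mask_with_mixture_phase(mask_1, mixture)

for s, e in zip(speaker_1, estimate_1):
    phase_error = cmath.phase(e) - cmath.phase(s)
    print(f"|S| = {abs(s):.3f}, |estimate| = {abs(e):.3f}, "
          f"phase error = {phase_error:.3f} rad")
```

The estimate matches the source magnitude exactly but inherits the mixture phase, which is the limitation that motivates a separate phase recovery stage.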
Takanori ISOBE Kyoji SHIBUTANI
In this paper, we explore the security of single-key Even-Mansour ciphers against key-recovery attacks. First, we introduce a simple key-recovery attack using key relations on an $n$-bit $r$-round single-key Even-Mansour cipher ($r$-SEM). This attack is feasible with queries of $DT^r = O(2^{rn})$ and $2^{\frac{2r}{r + 1}n}$ memory accesses, which is $2^{\frac{1}{r + 1}n}$ times smaller than the previous generic attacks on $r$-SEM, where $D$ and $T$ are the number of queries to the encryption function $E_K$ and the internal permutation $P$, respectively. Next, we further reduce the time complexity of the key recovery attack on 2-SEM by a start-in-the-middle approach. This is the first attack that is more efficient than an exhaustive key search while keeping the query bound of $DT^2 = O(2^{2n})$. Finally, we leverage the start-in-the-middle approach to directly improve the previous attacks on 2-SEM by Dinur et al., which exploit $t$-way collisions of the underlying function. Our improved attacks do not keep the bound of $DT^2 = O(2^{2n})$, but are the most time-efficient attacks among the existing ones. For $n = 64$, 128 and 256, our attack is feasible with the time complexity of about $2^{n} \cdot \frac{1}{2n}$ in the chosen-plaintext model, while Dinur et al.'s attack requires $2^{n} \cdot \frac{\log(n)}{n}$ in the known-plaintext model.
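For readers unfamiliar with the construction, an r-round single-key Even-Mansour cipher XORs the same key before, between, and after r public permutations. The sketch below is a toy 8-bit instance with an exhaustive key search as the baseline that the attacks in this abstract are measured against. It does not implement the paper's key-relation or start-in-the-middle attacks, and the block size, seeds, key value, and function names are assumptions chosen for illustration.

```python
import random

N_BITS = 8            # toy block size (assumption); real ciphers use n >= 64
SIZE = 1 << N_BITS


def make_permutation(seed):
    """Build a fixed public permutation on N_BITS-bit values."""
    table = list(range(SIZE))
    random.Random(seed).shuffle(table)
    return table


def invert(table):
    """Return the inverse lookup table of a permutation."""
    inverse = [0] * len(table)
    for x, y in enumerate(table):
        inverse[y] = x
    return inverse


def sem_encrypt(key, perms, x):
    """E_K(x) = P_r(... P_1(x ^ K) ^ K ...) ^ K with a single key K."""
    state = x ^ key
    for perm in perms:
        state = perm[state] ^ key
    return state


def sem_decrypt(key, perms, y):
    """Undo sem_encrypt by peeling off the rounds in reverse order."""
    state = y
    for perm in reversed(perms):
        state = invert(perm)[state ^ key]
    return state ^ key


def exhaustive_key_search(perms, pairs):
    """Baseline attack: try every key against known (plaintext, ciphertext) pairs."""
    return [
        k for k in range(SIZE)
        if all(sem_encrypt(k, perms, x) == y for x, y in pairs)
    ]


perms = [make_permutation(1), make_permutation(2)]    # 2-SEM
secret_key = 0xA7
pairs = [(x, sem_encrypt(secret_key, perms, x)) for x in (0x00, 0x3C, 0xF1)]
print(exhaustive_key_search(perms, pairs))
```

The exhaustive search costs on the order of 2^n encryptions, which is the reference point for the time complexities quoted in the abstract.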
Jin LIU Masahide HATANAKA Takao ONOYE
Lately, an increasing number of wireless local area network (WLAN) access points (APs) have been deployed to serve an ever-increasing number of mobile stations (STAs). Due to the limited frequency spectrum, more and more AP and STA nodes try to access the same channel. Spatial spectrum reuse is promoted by the IEEE 802.11ax task group through dynamic sensitivity control (DSC), which permits cochannel operation when the received signal power at the prospective transmitting node (PTN) is lower than an adjusted carrier sensing threshold (CST). Previously proposed DSC approaches typically calculate the CST without node grouping, using a margin parameter that remains fixed during operation. The margin has previously been set heuristically, and finding a suitable value has remained an open problem. Therefore, we propose a DSC approach that employs a node grouping method for adaptive calculation of the CST at the PTN with a channel-aware and margin-free formula. Numerical simulations of a dense residential WLAN scenario show total throughput and Jain's fairness index gains of 8.4% and 7.6%, respectively, over operation without DSC (as in WLANs deployed to date).
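Jain's fairness index, used as an evaluation metric in this abstract, is defined for per-node throughputs x_1, ..., x_n as (Σx_i)² / (n · Σx_i²); it equals 1 when all nodes receive the same throughput and 1/n when a single node receives everything. A minimal sketch of the calculation follows. The function name and the example throughput values are assumptions for illustration and are not simulation results from the paper.

```python
def jain_fairness_index(throughputs):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2), in [1/n, 1]."""
    n = len(throughputs)
    if n == 0:
        raise ValueError("at least one throughput value is required")
    total = sum(throughputs)
    sum_of_squares = sum(x * x for x in throughputs)
    if sum_of_squares == 0:
        # All nodes are idle; treated here as a degenerate case (assumption).
        return 0.0
    return (total * total) / (n * sum_of_squares)


# Hypothetical per-station throughputs in Mbit/s.
print(jain_fairness_index([10.0, 10.0, 10.0, 10.0]))  # perfectly fair
print(jain_fairness_index([40.0, 0.0, 0.0, 0.0]))     # one station takes all
print(jain_fairness_index([20.0, 10.0, 5.0, 5.0]))    # in between
```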
Tashpolat NIZAMIDIN Li ZHAO Ruiyu LIANG Yue XIE Askar HAMDULLA
As one of the popular topics in the field of human-computer interaction, Speech Emotion Recognition (SER) aims to classify the emotional tendency of speakers' utterances. With existing deep learning methods and a large amount of training data, highly accurate results can be achieved. Unfortunately, building a huge, universally applicable emotional speech database is a time-consuming and difficult job. However, the Siamese Neural Network (SNN), which we discuss in this paper, can yield extremely precise results with just a limited amount of training data through pairwise training, which mitigates the impact of sample deficiency and provides enough training iterations. To obtain sufficient SER training, this study proposes a novel method that uses Siamese Attention-based Long Short-Term Memory Networks. In this framework, we designed two Attention-based Long Short-Term Memory Networks that share the same weights, and we input frame-level acoustic emotional features to the Siamese network rather than utterance-level emotional features. The proposed solution has been evaluated on the EMODB, ABC and UYGSEDB corpora and showed significant improvement in SER results compared to conventional deep learning methods.
Sanghun CHOI Shuichiro HARUTA Yichen AN Iwao SASASE
Since the owner's data might be leaked from centralized server storage, distributed storage schemes that incorporate server storage have been investigated. To protect the owner's data, those schemes use Reed-Solomon codes. However, those schemes impose a data capacity burden, since the amount of parity data grows with the amount of disconnected data that can be restored. Moreover, the calculation time for restoration becomes longer, since much parity data is needed to restore the disconnected data. To reduce the data capacity burden and the calculation time, we propose server-based distributed storage using Secret Sharing with AES-256 for lightweight, safe restoration. Although we use Secret Sharing, the owner's data are kept safe in the distributed storage, since each piece of divided data is split into two pieces with AES-256 and stored in the peer storage and the server storage. Even if the server storage, which keeps divided data, and the peer storages learn a pair of divided data via Secret Sharing, the owner's data remain secure in the proposed scheme against insider attacks on Secret Sharing. Furthermore, the owner's data can be restored from a small amount of parity data. The evaluations show that our proposed scheme improves lightness, stability, and safety.
Junya IKEMOTO Toshimitsu USHIO
The OGY method is one of the control methods for chaotic systems. In this method, we have to calculate a target periodic orbit embedded in the chaotic attractor. Thus, we cannot use this method when a precise mathematical model of the chaotic system cannot be identified. In this case, the delayed feedback control proposed by Pyragas is useful. However, even in delayed feedback control, we need the mathematical model to determine a feedback gain that stabilizes the periodic orbit. Thus, we apply a reinforcement learning algorithm to the design of a controller for the chaotic system. Recently, reinforcement learning algorithms with deep neural networks have attracted much attention. Those algorithms make it possible to control complex systems. We propose a controller design method consisting of two steps: we first determine a region that includes a target periodic point, and then make the controller learn an optimal control policy for its stabilization. The controller efficiently explores its control policy only within this region.
A pre-trained deep convolutional neural network (DCNN) is adopted as a feature extractor to extract the feature representation of vein images for hand-dorsa vein recognition. Specifically, a novel selective deep convolutional feature is proposed to obtain a more representative and discriminative feature representation. Extensive experiments on a lab-made database achieve state-of-the-art recognition results, which demonstrates the effectiveness of the proposed model.
Danyang LIU Ji XU Pengyuan ZHANG
End-to-end (E2E) multilingual automatic speech recognition (ASR) systems aim to recognize speech in multiple languages within a unified framework. In the current E2E multilingual ASR framework, the output prediction for a specific language lacks constraints on the output scope of the modeling units. In this paper, a language supervision training strategy is proposed that uses language masks to constrain the output distribution of the neural network. To simulate the multilingual ASR scenario in which the language identity is unknown, a language identification (LID) classifier is applied to estimate the language masks. On four Babel corpora, the proposed E2E multilingual ASR system achieved an average absolute word error rate (WER) reduction of 2.6% compared with the multilingual baseline system.
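One way to picture a language mask constraining the output distribution is to exclude, before the softmax, every modeling unit that does not belong to the target language. The sketch below shows that idea only. How the paper applies its masks during training, and how the LID classifier produces them, is not specified in the abstract, so the hard 0/1 mask, the unit inventory, and the function name are assumptions for illustration.

```python
import math


def masked_softmax(logits, allowed):
    """Softmax over `logits`, restricted to positions where `allowed` is True.

    Disallowed units receive probability exactly 0.
    """
    if not any(allowed):
        raise ValueError("the mask must allow at least one unit")
    masked = [z if ok else -math.inf for z, ok in zip(logits, allowed)]
    peak = max(masked)                       # subtract the max for stability
    exps = [math.exp(z - peak) for z in masked]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical shared output inventory covering two languages.
units = ["en_a", "en_b", "ja_a", "ja_b"]
language_of = {"en_a": "en", "en_b": "en", "ja_a": "ja", "ja_b": "ja"}
logits = [2.0, 1.0, 3.0, 0.5]

# Mask for an utterance known (or estimated by a LID classifier) to be English.
english_mask = [language_of[u] == "en" for u in units]
probs = masked_softmax(logits, english_mask)
for unit, p in zip(units, probs):
    print(f"{unit}: {p:.3f}")
```

Without the mask, the highest-scoring unit here would be `ja_a`; with the mask, all probability mass is redistributed over the English units.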
Kazuki KAWAMURA Takashi MATSUBARA Kuniaki UEHARA
Action recognition using skeleton data (3D coordinates of human joints) is an attractive topic due to its robustness to the actor's appearance, camera's viewpoint, illumination, and other environmental conditions. However, skeleton data must be measured by a depth sensor or extracted from video data using an estimation algorithm, and doing so risks extraction errors and noise. In this work, for robust skeleton-based action recognition, we propose a deep state-space model (DSSM). The DSSM is a deep generative model of the underlying dynamics of an observable sequence. We applied the proposed DSSM to skeleton data, and the results demonstrate that it improves the classification performance of a baseline method. Moreover, we confirm that feature extraction with the proposed DSSM renders subsequent classifications robust to noise and missing values. In such experimental settings, the proposed DSSM outperforms a state-of-the-art method.
Yeqi LIU Qi ZHANG Xiangjun XIN Qinghua TIAN Ying TAO Naijin LIU Kai LV
The rapid development of modern communications has created an essential need for efficient algorithms that can solve the routing and wavelength assignment (RWA) problem in satellite optical networks. In this paper, a bee colony optimization algorithm based on link cost for RWA (BCO-LCRWA) is tailored for satellite networks composed of intersatellite laser links. In BCO-LCRWA, a cost model of intersatellite laser links is established based on metrics of network transmission performance, namely delay and wavelength utilization, under constraints of Doppler wavelength drift, transmission delay, and wavelength consistency and continuity. Specifically, the fitness function of the bee colony used in the proposed algorithm takes wavelength resource utilization and the number of communication hops into account in order to use wavelengths effectively, avoid unnecessary detours, and ensure the bit error rate (BER) performance of the system. The simulation results corroborate the improved performance of the proposed algorithm compared with existing alternatives.