Satoshi MIZOGUCHI Yuki SAITO Shinnosuke TAKAMICHI Hiroshi SARUWATARI
We propose deep neural network (DNN)-based speech enhancement that reduces musical noise and achieves better auditory impressions. The musical noise is an artifact generated by nonlinear signal processing and negatively affects the auditory impressions. We aim to develop musical-noise-free speech enhancement methods that suppress the musical noise generation and produce perceptually-comfortable enhanced speech. DNN-based speech enhancement using a soft mask achieves high noise reduction but generates musical noise in non-speech regions. Therefore, first, we define kurtosis matching for DNN-based low-musical-noise speech enhancement. Kurtosis is the fourth-order moment and is known to correlate with the amount of musical noise. The kurtosis matching is a penalty term of the DNN training and works to reduce the amount of musical noise. We further extend this scheme to standardized-moment matching. The extended scheme involves using moments whose orders are higher than kurtosis and generalizes the conventional musical-noise-free method based on kurtosis matching. We formulate standardized-moment matching and explore how effectively the higher-order moments reduce the amount of musical noise. Experimental evaluation results 1) demonstrate that kurtosis matching can reduce musical noise without negatively affecting noise suppression and 2) newly reveal that the sixth-moment matching also achieves low-musical-noise speech enhancement as well as kurtosis matching.
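The penalty idea described above can be sketched in a few lines. This is a minimal illustration rather than the authors' formulation: it assumes the textbook definition of a standardized central moment, assumes the penalty is the squared difference between the moments of the enhanced signal and of a reference signal, and uses hypothetical names (`standardized_moment`, `moment_matching_penalty`, `total_loss`, `weight`). The paper's exact definition of kurtosis and of the reference may differ.

```python
def standardized_moment(samples, order):
    """Standardized central moment of the given order (order=4 is kurtosis)."""
    n = len(samples)
    mean = sum(samples) / n
    variance = sum((x - mean) ** 2 for x in samples) / n
    if variance == 0.0:
        raise ValueError("standardized moment is undefined for constant data")
    central = sum((x - mean) ** order for x in samples) / n
    return central / variance ** (order / 2)


def moment_matching_penalty(enhanced, reference, order=4):
    """Assumed penalty: squared gap between the two standardized moments."""
    gap = standardized_moment(enhanced, order) - standardized_moment(reference, order)
    return gap * gap


def total_loss(enhancement_loss, enhanced, reference, weight=0.1, order=4):
    """Hypothetical training objective: main loss plus a weighted penalty term."""
    return enhancement_loss + weight * moment_matching_penalty(enhanced, reference, order)


# Toy signals standing in for non-speech segments (made-up numbers).
reference = [-1.0, 1.0, -1.0, 1.0]
enhanced = [-2.0, 0.0, 0.0, 2.0]
print(standardized_moment(reference, 4), standardized_moment(enhanced, 4))
print(total_loss(0.5, enhanced, reference))
```

Setting `order=6` gives the sixth-moment variant of the same penalty under the same assumptions.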
Yan ZHAO Yue XIE Ruiyu LIANG Li ZHANG Li ZHAO Chengyu LIU
Depression is a mental disorder that endangers people's health and affects the social order. As an efficient means of diagnosing depression, automatic depression detection has attracted the interest of many researchers. This study presents an attention-based Long Short-Term Memory (LSTM) model for depression detection that makes full use of the differences between depressive and non-depressive speech across time frames. The proposed model uses frame-level features, which capture the temporal information of depressive speech, in place of traditional statistical features as the input to the LSTM layers. To obtain richer multi-dimensional deep feature representations, the LSTM output is then passed to attention layers on both the time and feature dimensions. We then concatenate the outputs of the attention layers and feed the fused feature representation into a fully connected layer. Finally, the fully connected layer's output is passed to a softmax layer. Experiments conducted on the DAIC-WOZ database demonstrate that the proposed attentive LSTM model achieves an average accuracy of 90.2% and outperforms the traditional LSTM network and the LSTM with local attention by 0.7% and 2.3%, respectively, which indicates its feasibility.
Xinran LIU Zhongju WANG Long WANG Chao HUANG Xiong LUO
This paper proposes a hybrid Retinex-based image enhancement algorithm to improve the quality of images captured by unmanned aerial vehicles (UAVs). Hyperparameters of the employed multi-scale Retinex with chromaticity preservation (MSRCP) model are automatically tuned via a two-phase evolutionary computing algorithm. In the two-phase optimization algorithm, the Rao-2 algorithm first performs the global search, and a solution is obtained by maximizing the objective function. Next, the Nelder-Mead simplex method improves the solution via local search. Real low-quality images taken by UAVs are collected to verify the performance of the proposed algorithm. Four well-known image enhancement algorithms, Multi-Scale Retinex, Multi-Scale Retinex with Color Restoration, Automated Multi-Scale Retinex, and MSRCP, are used as benchmarking methods. In addition, two commonly used evolutionary computing algorithms, particle swarm optimization and the flower pollination algorithm, are considered to verify the efficiency of the proposed method in tuning the parameters of the MSRCP model. Experimental results demonstrate that the proposed method achieves the best performance among the benchmarks and is thus applicable to real UAV-based applications.
Naoki HATTORI Jun SHIOMI Yutaka MASUDA Tohru ISHIHARA Akihiko SHINYA Masaya NOTOMI
With the rapid progress of integrated nanophotonics technology, optical neural network architectures have been widely investigated. Since an optical neural network can complete inference processing just by propagating the optical signal through the network, it is expected to be more than one order of magnitude faster than electronics-only implementations of artificial neural networks (ANNs). In this paper, we first propose an optical vector-matrix multiplication (VMM) circuit using wavelength division multiplexing, which enables inference processing at the speed of light with ultra-wideband operation. We next propose an optoelectronic circuit implementation of batch normalization and the activation function, which significantly improves the accuracy of the inference processing without sacrificing speed. Finally, using a virtual environment for machine learning and an optoelectronic circuit simulator, we demonstrate the ultra-fast and accurate operation of the optical-electronic ANN circuit.
Mitsuhiko IGARASHI Yuuki UCHIDA Yoshio TAKAZAWA Makoto YABUUCHI Yasumasa TSUKAMOTO Koji SHIBUTANI Kazutoshi KOBAYASHI
In this paper, we present an analysis of the local variability of bias temperature instability (BTI), obtained by measuring ring oscillators (ROs) on various processes, and of its impact on logic circuits and SRAM. The evaluation results, based on measuring ROs of a test elementary group (TEG) fabricated in a 7nm Fin Field Effect Transistor (FinFET) process, 16/14nm generation FinFET processes and a 28nm planar process, show that the standard deviation of the Negative BTI (NBTI) Vth degradation (σ(ΔVthp)) is proportional to the square root of its mean value (µ(ΔVthp)) at any stress time, for all Vth flavors and under various recovery conditions. While the amount of local BTI variation depends on the gate length, width and number of fins, the local BTI variation in the 7nm FinFET process is slightly larger than in the other processes. Based on these measurement results, we present an analysis of its impact on logic circuits, considering the measured Vth dependency of global NBTI in the 7nm FinFET process. We also analyse its impact on the SRAM minimum operation voltage (Vmin) of the static noise margin (SNM) based on sensitivity analysis and show non-negligible Vmin degradation caused by local NBTI.
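The square-root relation reported above has a simple consequence that is worth writing out. In the sketch below, k is a hypothetical proportionality constant (not given in the abstract) that would depend on the process and transistor geometry.

```latex
\sigma(\Delta V_{thp}) = k\,\sqrt{\mu(\Delta V_{thp})}
\quad\Longrightarrow\quad
\frac{\sigma(\Delta V_{thp})}{\mu(\Delta V_{thp})} = \frac{k}{\sqrt{\mu(\Delta V_{thp})}}
```

Under this relation, a fourfold increase in the mean degradation doubles the standard deviation, while the spread relative to the mean is halved.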
Ryosuke MATSUO Jun SHIOMI Tohru ISHIHARA Hidetoshi ONODERA Akihiko SHINYA Masaya NOTOMI
Optical logic circuits based on integrated nanophotonics attract significant interest due to their ultra-high-speed operation. However, the power dissipation of conventional optical logic circuits grows exponentially with the number of inputs of the target logic function. This paper proposes a synthesis method that reduces the power dissipation to a polynomial order of the number of inputs while exploiting the high-speed nature. Our method divides the target logic function into multiple sub-functions connected with Optical-to-Electrical (OE) converters. Each sub-function has fewer inputs than the original function, which exponentially reduces the power dissipated by the optical logic circuit representing the sub-function. The proposed synthesis method can mitigate the OE converter delay overhead by parallelizing sub-functions. We apply the proposed synthesis method to the ISCAS'85 benchmark circuits. The power consumption of the conventional circuits based on the Binary Decision Diagram (BDD) is at least three orders of magnitude larger than that of the optical logic circuits synthesized by the proposed method. The proposed method reduces the power consumption to about 100mW. The delay of almost all the circuits synthesized by the proposed method is kept below four times the delay of the conventional BDD-based circuit.
Kazunari TAKASAKI Ryoichi KIDA Nozomu TOGAWA
With the widespread use of Internet of Things (IoT) devices in recent years, we utilize a variety of hardware devices in our daily lives. At the same time, hardware security issues are emerging. Power analysis is one of the methods to detect anomalous behaviors, but it is hard to apply to IoT devices on which an operating system and various software programs are running. In this paper, we propose an anomalous behavior detection method for an IoT device that extracts application-specific power behaviors. First, we measure the power consumption of an IoT device and obtain the power waveform. Next, we extract an application-specific power waveform by eliminating a steady factor from the obtained power waveform. Finally, we extract feature values from the application-specific power waveform and detect anomalous behavior by utilizing the local outlier factor (LOF) method. We conduct two experiments to show how our proposed method works: one runs three application programs and an anomalous application program randomly, and the other runs three application programs in series and an anomalous application program very rarely. The application programs in both experiments are implemented on a single-board computer. The experimental results demonstrate that the proposed method successfully detects anomalous behaviors by extracting application-specific power behaviors, while the existing approaches cannot.
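The final detection step can be illustrated with a small pure-Python version of the local outlier factor. This is only a sketch: the feature vectors are made-up two-dimensional values rather than features from real power waveforms, the function name `local_outlier_factors` is hypothetical, and each point uses exactly its k nearest neighbours, which simplifies the handling of distance ties in the original LOF definition.

```python
import math


def local_outlier_factors(points, k):
    """Return one LOF score per point; scores well above 1 suggest outliers."""
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of points")
    dist = [[math.dist(p, q) for q in points] for p in points]

    neighbors = []   # indices of the k nearest neighbours of each point
    k_distance = []  # distance from each point to its k-th nearest neighbour
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbors.append(order[:k])
        k_distance.append(dist[i][order[k - 1]])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = sum(max(k_distance[j], dist[i][j]) for j in neighbors[i])
        lrd.append(k / reach if reach > 0 else math.inf)

    scores = []
    for i in range(n):
        if math.isinf(lrd[i]):
            scores.append(1.0)  # point sits on top of duplicates: treat as inlier
            continue
        scores.append(sum(lrd[j] / lrd[i] for j in neighbors[i]) / k)
    return scores


# Four tightly grouped feature vectors and one distant vector (made-up values).
features = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
scores = local_outlier_factors(features, k=2)
print(scores)
```

In this toy data the distant vector receives a score far above 1, while the grouped vectors score 1. A detection threshold on the score would have to be chosen for the application at hand.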
Shoya SONODA Jun SHIOMI Hidetoshi ONODERA
A method for runtime energy optimization based on the supply voltage (Vdd) and the threshold voltage (Vth) scaling is proposed. This paper refers to the optimal voltage pair, which minimizes the energy consumption of LSI circuits under a target delay constraint, as a Minimum Energy Point (MEP). The MEP dynamically fluctuates depending on the operating conditions determined by a target delay constraint, an activity factor and a chip temperature. In order to track the MEP, this paper proposes a closed-form continuous function that determines the MEP over a wide operating performance region ranging from the above-threshold region down to the sub-threshold region. Based on the MEP determination formula, an MEP tracking algorithm is also proposed. The MEP tracking algorithm estimates the MEP even though the operating conditions widely change. Measurement results based on a 32-bit RISC processor fabricated in a 65-nm Silicon On Thin Buried oxide (SOTB) process technology show that the proposed method estimates the MEP within a 5% energy loss in comparison with the actual MEP operation.
This study presents a construction method for self-orthogonal and self-dual quasi-cyclic codes that relies on the factorization of the modulus polynomials for cyclicity. Smaller generator polynomial matrices are used instead of the generator matrices of the codes as linear codes. An algorithm based on the Chinese remainder theorem finds the generator polynomial matrix on the original modulus from those constructed on each factor. This method enables us to efficiently construct and search for these codes when the modulus polynomials are factored into reciprocal polynomials.
In [31], Shin et al. proposed a Leakage-Resilient and Proactive Authenticated Key Exchange (LRP-AKE) protocol for credential services, which provides not only a higher level of security against leakage of stored secrets but also secrecy of the private key with respect to the involved server. In this paper, we discuss a problem in the security proof of the LRP-AKE protocol and then propose a modified LRP-AKE protocol that includes a simple and effective countermeasure to the problem. We also formally prove AKE security and mutual authentication for the entire modified LRP-AKE protocol. In addition, we describe several extensions of the (modified) LRP-AKE protocol, including 1) the synchronization issue between the stored secrets of the client and the server; 2) a randomized ID for protecting the client's privacy; and 3) a solution for preventing server compromise-impersonation attacks. Finally, we evaluate the performance overhead of the LRP-AKE protocol and show its test vectors. From the performance evaluation, we can confirm that the LRP-AKE protocol has almost the same efficiency as the (plain) Diffie-Hellman protocol, which does not provide authentication at all.
Hideya SO Kazuhiko FUKAWA Hayato SOYA Yuyuan CHANG
In unlicensed spectrum, wireless communications employing carrier sense multiple access with collision avoidance (CSMA/CA) suffer from longer transmission delay time as the number of user terminals (UTs) increases, because packet collisions are more likely to occur. To cope with this problem, this paper proposes a new multiuser detection (MUD) scheme that uses both request-to-send (RTS) and enhanced clear-to-send (eCTS) for highly reliable and low-latency wireless communications. As in conventional MUD schemes, the metric-combining MUD (MC-MUD) calculates log likelihood functions called metrics and accumulates the metrics for maximum likelihood detection (MLD). To avoid increasing the number of states for MLD, MC-MUD forces the relevant UTs to retransmit their packets until all the collided packets are correctly detected, which requires a kind of central control and reduces the system throughput. To overcome these drawbacks, the proposed scheme, which is referred to as cancelling MC-MUD (CMC-MUD), deletes replicas of some of the collided packets from the received signals once the packets are correctly detected during the retransmission. This cancellation enables new UTs to transmit their packets and allows MLD to be performed without increasing the number of states, which improves the system throughput without increasing the complexity. In addition, the proposed scheme adopts RTS and eCTS. One UT that suffers from a packet collision transmits RTS before the retransmission. Then, the corresponding access point (AP) transmits eCTS including the addresses of the other UTs that have experienced the same packet collision. To reproduce the same packet collision, these other UTs transmit their packets once they receive the eCTS.
Computer simulations under single-AP conditions evaluate the average carrier-to-interference ratio (CIR) range in which the proposed scheme is effective and clarify that the transmission delay time of the proposed scheme is shorter than that of the conventional schemes. In two-AP environments, which can cause the hidden terminal problem, it is demonstrated that the proposed scheme achieves shorter transmission delay times than the conventional scheme with RTS and conventional CTS.
Kosei OZEKI Naofumi AOKI Saki ANAZAWA Yoshinori DOBASHI Kenichi IKEDA Hiroshi YASUDA
This study has developed a system that performs data communications using high frequency bands of sound signals. Unlike radio communication systems using advanced wireless devices, it only requires the legacy devices such as microphones and speakers employed in ordinary telephony communication systems. In this study, we have investigated the possibility of a machine learning approach to improve the recognition accuracy identifying binary symbols exchanged through sound media. This paper describes some experimental results evaluating the performance of our proposed technique employing a neural network as its classifier of binary symbols. The experimental results indicate that the proposed technique may have a certain appropriateness for designing an optimal classifier for the symbol identification task.
This paper reports evaluation and simulation results for the nonlinear characteristics of a 4.65GHz Active Antenna System (AAS) for 5G mobile communication systems. The antenna is composed of ±45° dual-polarization shared patch antenna elements, with 64 elements in total in a horizontal 8 × vertical 4 × 2 polarization configuration. A 32-element transceiver circuit was mounted on the back side of the antenna printed circuit board. With this circuit configuration, a full digital beamforming method has been adopted that can realize high frequency utilization efficiency by using the Sub6GHz-band massive element AAS, and excellent spatial multiplexing performance by Massive MIMO has been pursued. However, it was found that the Downlink (DL) SINR (Signal to Interference and Noise Ratio) at each terminal deteriorated because of nonlinear distorted radiation as the transmission output power was increased toward the maximum rating. It has been confirmed that the spatial multiplexing performance in the high output power region is significantly improved by installing DPD. In order to clarify the effect of nonlinear distorted radiation on spatial multiplexing performance, the radiation patterns were measured using an OFDM signal (subcarrier spacing 60kHz × 1500 subcarriers in 90MHz bandwidth) in an anechoic chamber. Furthermore, a simulation analysis of the effect of nonlinear distortion on the null characteristics shows that the accuracy of the nulls generated in each user terminal direction does not depend on the degree of nonlinearity but is affected by the residual amplitude and phase variation among all transmitters and receivers after calibration (CAL). It was therefore clarified that the double compensation configuration of DPD and high-precision CAL is effective for achieving excellent Massive MIMO performance. This paper is based on the IEICE Japanese Transactions on Communications (Vol.J102-B, No.11, pp.816-824, Nov. 2019).
Kazuki KASAI Kaoru KAWAKITA Akira KUBOTA Hiroki TSURUSAKI Ryosuke WATANABE Masaru SUGANO
In this paper, we present an efficient and robust method for estimating the homography matrix for soccer field registration between a captured camera image and a soccer field model. The presented method first detects reliable field lines in the camera image through clustering. It then constructs a novel directional feature for the intersection points of the lines in both the camera image and the model and finds matching pairs of these points between the image and the model. Finally, homography matrix estimation and validation are performed using the obtained matching pairs, which reduces the required number of homography matrix calculations. The presented method also uses possible intersection points outside the image for the point matching. This effectively improves the robustness and accuracy of the homography estimation, as demonstrated in the experimental results.
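One reason intersection points outside the image are available at all is that two detected line segments can be extended as infinite lines. The sketch below shows this with homogeneous coordinates. It is a generic geometric illustration, not the paper's pipeline, and the function names and the 160 × 120 image size are hypothetical.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def line_through(p, q):
    """Homogeneous line (a, b, c) with a*x + b*y + c = 0 through two image points."""
    return cross((p[0], p[1], 1.0), (q[0], q[1], 1.0))


def intersection(line_a, line_b):
    """Intersection point (x, y) of two lines, or None if they are parallel."""
    x, y, w = cross(line_a, line_b)
    if abs(w) < 1e-12:  # simple absolute tolerance, adequate for this sketch
        return None
    return (x / w, y / w)


def inside_image(point, width, height):
    """True if the point lies within the image bounds."""
    x, y = point
    return 0 <= x < width and 0 <= y < height


# Two line segments seen in a 160 x 120 image (made-up coordinates).
line_1 = line_through((0, 0), (100, 100))
line_2 = line_through((0, 300), (100, 250))
point = intersection(line_1, line_2)
print(point, inside_image(point, 160, 120))
```

Here the two lines meet at (200, 200), which lies outside the assumed image bounds but is still a well-defined point that could be matched against the field model.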
Arata TAKAHASHI Osamu TAKYU Hiroshi FUJIWARA Takeo FUJII Tomoaki OHTSUKI
Information exchange through a relay node is attracting attention for application to machine-to-machine communications. If the relay node can demodulate the received signal during relay processing, information leakage through the relay station becomes a problem. In wireless MIMO switching, the frequency spectrum usage efficiency can be improved because the information exchange is completed within a short time. This study proposes a novel wireless MIMO switching method for secure information exchange. An overloaded situation, in which the number of access nodes is one larger than the number of antennas in the relay node, makes demodulation at the relay node difficult. An access schedule for the nodes is required to maintain the overloaded situation and high information exchange efficiency. This study derives an equation model of the access schedule and constructs an access schedule with fewer time periods by solving an integer programming problem. Computer simulations confirm that the secure capacity of the proposed MIMO switching is larger than that of the original one and that the constructed access schedule completes the information exchange in the ideal, minimum number of time periods.
Akio KAWABATA Bijoy Chand CHATTERJEE Eiji OKI
In distributed processing for communication services, a proper server selection scheme is required to reduce delay while ensuring the event occurrence order. Although a conservative synchronization algorithm (CSA) has been used to achieve this goal, an optimistic synchronization algorithm (OSA) is also feasible for synchronizing distributed systems. In comparison with CSA, which reproduces events in occurrence order before processing applications, OSA can realize low-delay communication because it processes events sequentially as they arrive. This paper proposes an optimal server selection scheme that uses OSA for distributed processing systems to minimize the end-to-end delay under the condition that the maximum status holding time is limited. In other words, the end-to-end delay is minimized based on the allowed rollback time, which is given according to application design aspects and the availability of computing resources. Numerical results indicate that the proposed scheme reduces the delay compared to the conventional scheme.
Rui SUN Qili LIANG Zi YANG Zhenghui ZHAO Xudong ZHANG
Video-based person re-identification (re-ID) aims at retrieving a person across non-overlapping cameras and has achieved promising results owing to deep convolutional neural networks. Due to the dynamic properties of video, the problems of background clutter and occlusion are more serious than in image-based person re-ID. In this letter, we present a novel triple attention network (TriANet) that simultaneously utilizes temporal, spatial, and channel context information by employing the self-attention mechanism to obtain robust and discriminative features. Specifically, the network has two parts. The first part introduces a residual attention subnetwork, which contains a channel attention module that captures cross-dimension dependencies by using rotation and transformation and a spatial attention module that focuses on pedestrian features. In the second part, a time attention module is designed to judge the quality score of each pedestrian image and to reduce the weight of incomplete pedestrian images to alleviate the occlusion problem. We evaluate our proposed architecture on three datasets: iLIDS-VID, PRID2011 and MARS. Extensive comparative experimental results show that our proposed method achieves state-of-the-art results.
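The down-weighting of incomplete frames can be pictured as a weighted average over per-frame features. The sketch below assumes that the quality scores are already available as plain numbers and that they are normalized with a softmax. Both are illustrative assumptions, since in the described network the scores come from a learned module, and the function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention_pool(frame_features, quality_scores):
    """Fuse per-frame feature vectors into one vector, weighted by frame quality."""
    weights = softmax(quality_scores)
    dim = len(frame_features[0])
    return [sum(w * f[d] for w, f in zip(weights, frame_features))
            for d in range(dim)]


# Three frames with 2-D features; the last frame is assumed to be occluded
# and therefore has a low (made-up) quality score.
frames = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]]
scores = [2.0, 2.0, -3.0]
print(temporal_attention_pool(frames, scores))
```

With these numbers the occluded frame contributes almost nothing to the pooled feature, which stays close to the average of the first two frames.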
Sahoko NAKAYAMA Andros TJANDRA Sakriani SAKTI Satoshi NAKAMURA
The phenomenon where a speaker mixes two or more languages within the same conversation is called code-switching (CS). Handling CS is challenging for automatic speech recognition (ASR) and text-to-speech (TTS) because it requires coping with multilingual input. Although CS text or speech may be found in social media, the datasets of CS speech and corresponding CS transcriptions are hard to obtain even though they are required for supervised training. This work adopts a deep learning-based machine speech chain to train CS ASR and CS TTS with each other with semisupervised learning. After supervised learning with monolingual data, the machine speech chain is then carried out with unsupervised learning of either the CS text or speech. The results show that the machine speech chain trains ASR and TTS together and improves performance without requiring the pair of CS speech and corresponding CS text. We also integrate language embedding and language identification into the CS machine speech chain in order to handle CS better by giving language information. We demonstrate that our proposed approach can improve the performance on both a single CS language pair and multiple CS language pairs, including the unknown CS excluded from training data.
Ruochen LIAO Kousuke MORIWAKI Yasushi MAKIHARA Daigo MURAMATSU Noriko TAKEMURA Yasushi YAGI
In this study, we propose a method to estimate body composition-related health indicators (e.g., ratio of body fat, body water, and muscle, etc.) using video-based gait analysis. This method is more efficient than individual measurement using a conventional body composition meter. Specifically, we designed a deep-learning framework with a convolutional neural network (CNN), where the input is a gait energy image (GEI) and the output consists of the health indicators. Although a vast amount of training data is typically required to train network parameters, it is unfeasible to collect sufficient ground-truth data, i.e., pairs consisting of the gait video and the health indicators measured using a body composition meter for each subject. We therefore use a two-step approach to exploit an auxiliary gait dataset that contains a large number of subjects but lacks the ground-truth health indicators. At the first step, we pre-train a backbone network using the auxiliary dataset to output gait primitives such as arm swing, stride, the degree of stoop, and the body width — considered to be relevant to the health indicators. At the second step, we add some layers to the backbone network and fine-tune the entire network to output the health indicators even with a limited number of ground-truth data points of the health indicators. Experimental results show that the proposed method outperforms the other methods when training from scratch as well as when using an auto-encoder-based pre-training and fine-tuning approach; it achieves relatively high estimation accuracy for the body composition-related health indicators except for body fat-relevant ones.
Lin CAO Xibao HUO Yanan GUO Kangning DU
Sketch face recognition refers to matching photos with sketches and has been used effectively in applications ranging from law enforcement to digital entertainment. However, due to the large modality gap between photos and sketches, sketch face recognition remains a challenging task. To reduce the domain gap between sketches and photos, this paper proposes a cascaded transformation generation network that performs cross-modality image generation and sketch face recognition simultaneously. The proposed network is composed of a generation module, a cascaded feature transformation module, and a classifier module. The generation module aims to generate a high-quality cross-modality image, the cascaded feature transformation module extracts high-level semantic features for generation and recognition simultaneously, and the classifier module performs sketch face recognition. The proposed network is trained in an end-to-end manner, and the generated images strengthen the recognition accuracy. The recognition performance is verified on the UoM-SGFSv2, e-PRIP, and CUFSF datasets; experimental results show that the proposed method outperforms other state-of-the-art methods.