Jin LIU Masahide HATANAKA Takao ONOYE
Lately, an increasing number of wireless local area network (WLAN) access points (APs) are deployed to serve an ever increasing number of mobile stations (STAs). Due to the limited frequency spectrum, more and more AP and STA nodes try to access the same channel. Spatial spectrum reuse is promoted by the IEEE 802.11ax task group through dynamic sensitivity control (DSC), which permits cochannel operation when the received signal power at the prospective transmitting node (PTN) is lower than an adjusted carrier sensing threshold (CST). Previously-proposed DSC approaches typically calculate the CST without node grouping by using a margin parameter that remains fixed during operation. Setting the margin has previously been done heuristically. Finding a suitable value has remained an open problem. Therefore, herein, we propose a DSC approach that employs a node grouping method for adaptive calculation of the CST at the PTN with a channel-aware and margin-free formula. Numerical simulations for dense residential WLAN scenario reveal total throughput and Jain's fairness index gains of 8.4% and 7.6%, respectively, vs. no DSC (as in WLANs deployed to present).
Tashpolat NIZAMIDIN Li ZHAO Ruiyu LIANG Yue XIE Askar HAMDULLA
As one of the popular topics in the field of human-computer interaction, the Speech Emotion Recognition (SER) aims to classify the emotional tendency from the speakers' utterances. Using the existing deep learning methods, and with a large amount of training data, we can achieve a highly accurate performance result. Unfortunately, it's time consuming and difficult job to build such a huge emotional speech database that can be applicable universally. However, the Siamese Neural Network (SNN), which we discuss in this paper, can yield extremely precise results with just a limited amount of training data through pairwise training which mitigates the impacts of sample deficiency and provides enough iterations. To obtain enough SER training, this study proposes a novel method which uses Siamese Attention-based Long Short-Term Memory Networks. In this framework, we designed two Attention-based Long Short-Term Memory Networks which shares the same weights, and we input frame level acoustic emotional features to the Siamese network rather than utterance level emotional features. The proposed solution has been evaluated on EMODB, ABC and UYGSEDB corpora, and showed significant improvement on SER results, compared to conventional deep learning methods.
In this letter, we propose a more secure modeling and simulation approach that can systematically detect state variable corruptions caused by buffer overflows in simulation models. Using our approach, developers may not consider secure coding practices related to the corruptions. We have implemented a prototype of the approach based on a modeling and simulation formalism and an open source simulator. Through optimization, the prototype could show better performance, compared to the original simulator, and detect state variable corruptions.
Rongcun WANG Shujuan JIANG Kun ZHANG Qiao YU
Software fault localization, as one of the essential activities in program debugging, aids to software developers to identify the locations of faults in a program, thus reducing the cost of program debugging. Spectrum-based fault localization (SBFL), as one of the representative localization techniques, has been intensively studied. The localization technique calculates the probability of each program entity that is faulty by a certain suspiciousness formula. The accuracy of SBFL is not always as satisfactory as expected because it neglects the contextual information of statement executions. Therefore, we proposed 5 rules, i.e., random, the maximum coverage, the minimum coverage, the maximum distance, and the minimum distance, to improve the accuracy of SBFL for further. The 5 rules can effectively use the contextual information of statement executions. Moreover, they can be implemented on the traditional SBFL techniques using suspiciousness formulas with little effort. We empirically evaluated the impacts of the rules on 17 suspiciousness formulas. The results show that all 5 rules can significantly improve the ranking of faulty statements. Particularly, for the faults difficult to locate, the improvement is more remarkable. Generally, the rules can effectively reduce the number of statements examined by an average of more than 19%. Compared with other rules, the minimum coverage rule generates better results. This indicates that the application of the test case having the minimum coverage capability for fault localization is more effective.
Danyang LIU Ji XU Pengyuan ZHANG
End-to-end (E2E) multilingual automatic speech recognition (ASR) systems aim to recognize multilingual speeches in a unified framework. In the current E2E multilingual ASR framework, the output prediction for a specific language lacks constraints on the output scope of modeling units. In this paper, a language supervision training strategy is proposed with language masks to constrain the neural network output distribution. To simulate the multilingual ASR scenario with unknown language identity information, a language identification (LID) classifier is applied to estimate the language masks. On four Babel corpora, the proposed E2E multilingual ASR system achieved an average absolute word error rate (WER) reduction of 2.6% compared with the multilingual baseline system.
So Ryoung PARK Iickho SONG Seokho YOON
A unified decision scheme for the classification and localization of cable faults is proposed based on a two-step procedure. Having basis in the time domain reflectometry (TDR), the proposed scheme is capable of determining not only the locations but also types of faults in a cable without an excessive additional computational burden compared to other TDR-based schemes. Results from simulation and experiments with measured real data demonstrate that the proposed scheme exhibits a higher rate of correct decision than the conventional schemes in localizing and classifying faults over a wide range of the location of faults.
Shinya HORIIKE Masanori MORISE
To improve the likability of speech, we propose a voice conversion algorithm by controlling the fundamental frequency (F0) and the spectral envelope and carry out a subjective evaluation. The subjects can manipulate these two speech parameters. From the result, the subjects preferred speech with a parameter related to higher brightness.
Thuan Van NGO Rieko KUBO Masato AKAGI
Lombard speech is produced in noisy environments due to the Lombard effect and is intelligible in adverse environments. To adaptively control the intelligibility of transmitted speech for public announcement systems, in this study, we focus on perceptually mimicking Lombard speech under backgrounds with varying noise levels. Other approaches map corresponding neutral speech features to Lombard speech features, but as this can only be applied to one noise level at a time, it is unsuitable for varying noise levels because the characteristics of Lombard speech are varied according to noise level. Instead, we utilize a rule-based method that automatically generates rules and flexibly controls features with any change of noise level. Specifically, we conduct a feature tendency analysis and propose a continuous rule generation model to estimate the effect of varying noise levels on features. The proposed techniques, which are based on a coarticulation model, MRTD, and spectral-GMM, can easily modify neutral speech features by following the generated rules. Voices having these features are then synthesized by STRAIGHT to obtain Lombard speech fitting to noises with varying levels. To validate our proposed method, the quality of mimicking speech is evaluated in subjective listening experiments on similarity, intelligibility, and naturalness. In varying noise levels, the results show equal similarity with Lombard speech between the proposed method and a state-of-the-art method. Intelligibility and naturalness are comparable with some feature modifications.
Mohammed Salah AL-RADHI Tamás Gábor CSAPÓ Géza NÉMETH
In this article, we propose a method called “continuous noise masking (cNM)” that allows eliminating residual buzziness in a continuous vocoder, i.e. of which all parameters are continuous and offers a simple and flexible speech analysis and synthesis system. Traditional parametric vocoders generally show a perceptible deterioration in the quality of the synthesized speech due to different processing algorithms. Furthermore, an inaccurate noise resynthesis (e.g. in breathiness or hoarseness) is also considered to be one of the main underlying causes of performance degradation, leading to noisy transients and temporal discontinuity in the synthesized speech. To overcome these issues, a new cNM is developed based on the phase distortion deviation in order to reduce the perceptual effect of the residual noise, allowing a proper reconstruction of noise characteristics, and model better the creaky voice segments that may happen in natural speech. To this end, the cNM is designed to keep only voice components under a condition of the cNM threshold while discarding others. We evaluate the proposed approach and compare with state-of-the-art vocoders using objective and subjective listening tests. Experimental results show that the proposed method can reduce the effect of residual noise and can reach the quality of other sophisticated approaches like STRAIGHT and log domain pulse model (PML).
Jiaquan WU Feiteng LI Zhijian CHEN Xiaoyan XIANG Yu PU
This paper presents an automated patient-specific ECG classification algorithm, which integrates long short-term memory (LSTM) and convolutional neural networks (CNN). While LSTM extracts the temporal features, such as the heart rate variance (HRV) and beat-to-beat correlation from sequential heartbeats, CNN captures detailed morphological characteristics of the current heartbeat. To further improve the classification performance, adaptive segmentation and re-sampling are applied to align the heartbeats of different patients with various heart rates. In addition, a novel clustering method is proposed to identify the most representative patterns from the common training data. Evaluated on the MIT-BIH arrhythmia database, our algorithm shows the superior accuracy for both ventricular ectopic beats (VEB) and supraventricular ectopic beats (SVEB) recognition. In particular, the sensitivity and positive predictive rate for SVEB increase by more than 8.2% and 8.8%, respectively, compared with the prior works. Since our patient-specific classification does not require manual feature extraction, it is potentially applicable to embedded devices for automatic and accurate arrhythmia monitoring.
Chao-Yuan KAO Sangwook PARK Alzahra BADI David K. HAN Hanseok KO
Performance in Automatic Speech Recognition (ASR) degrades dramatically in noisy environments. To alleviate this problem, a variety of deep networks based on convolutional neural networks and recurrent neural networks were proposed by applying L1 or L2 loss. In this Letter, we propose a new orthogonal gradient penalty (OGP) method for Wasserstein Generative Adversarial Networks (WGAN) applied to denoising and despeeching models. WGAN integrates a multi-task autoencoder which estimates not only speech features but also noise features from noisy speech. While achieving 14.1% improvement in Wasserstein distance convergence rate, the proposed OGP enhanced features are tested in ASR and achieve 9.7%, 8.6%, 6.2%, and 4.8% WER improvements over DDAE, MTAE, R-CED(CNN) and RNN models.
Dongyeong KIM Dawoon KWON Junghwan SONG
The boomerang connectivity table (BCT) was introduced by C. Cid et al. Using the BCT, for SPN block cipher, the dependency between sub-ciphers in boomerang structure can be computed more precisely. However, the existing method to generate BCT is difficult to be applied to the ARX-based cipher, because of the huge domain size. In this paper, we show a method to compute the dependency between sub-ciphers in boomerang structure for modular addition. Using bit relation in modular addition, we compute the dependency sequentially in bitwise. And using this method, we find boomerang characteristics and amplified boomerang characteristics for the ARX-based ciphers LEA and SPECK. For LEA-128, we find a reduced 15-round boomerang characteristic and reduced 16-round amplified boomerang characteristic which is two rounds longer than previous boomerang characteristic. Also for SPECK64/128, we find a reduced 13-round amplified boomerang characteristic which is one round longer than previous rectangle characteristic.
Qingbo WANG Gaoqi DOU Ran DENG Jun GAO
The current orthogonal cooperative system (OCS) achieves diversity through the use of relays and the consumption of an additional time slot (TS). To guarantee the orthogonality of the received signal and avoid the mutual interference at the destination, the source has to be mute in the second TS. Consequently, the spectral efficiency (SE) is halved. In this paper, linear constellation precoded orthogonal frequency division multiplexing with index modulation (LCP-OFDM-IM) based OCS is proposed, where the source activates the complementary subcarriers to convey the symbols over two TSs. Hence the source can consecutively transmit information to the destination without the mutual interference. Compared with the current OFDM based OCS, the LCP-OFDM-IM based OCS can achieve a higher SE, since the subcarrier activation patterns (SAPs) can be exploited to convey additional information. Furthermore, the optimal precoder, in the sense of maximizing the minimum Euclidean distance of the symbols conveyed on each subcarrier over two TSs, is provided. Simulation results show the superiority of the LCP-OFDM-IM based OCS over the current OFDM based OCS.
Angular Momentum (AM) has been considered as a new dimension of wireless transmissions as well as the intrinsic property of Electro-Magnetic (EM) waves. So far, AM is utilized as a discrete mode not only in the quantum states, but also in the statistical beam forming. Traditionally, the continuous value of AM is ignored and only the quantized mode number is identified. However, the recent discovery on electrons in spiral motion producing twisted radiation with AM, including Spin Angular Momentum (SAM) and Orbital Angular Momentum (OAM), proves that the continuous value of AM is available in the statistical EM wave beam. This is also revealed by the so-called fractional OAM, which is reported in optical OAM beams. Then, as the new dimension with continuous real number field, AM should turn out to be a certain spectrum, similar to the frequency spectrum usually in the wireless signal processing. In this letter, we mathematically define the AM spectrum and show the applications in the information theory analysis, which is expected to be an efficient tool for the future wireless communications with AM.
Van-Hai VU Quang-Phuoc NGUYEN Kiem-Hieu NGUYEN Joon-Choul SHIN Cheol-Young OCK
Since deep learning was introduced, a series of achievements has been published in the field of automatic machine translation (MT). However, Korean-Vietnamese MT systems face many challenges because of a lack of data, multiple meanings of individual words, and grammatical diversity that depends on context. Therefore, the quality of Korean-Vietnamese MT systems is still sub-optimal. This paper discusses a method for applying Named Entity Recognition (NER) and Part-of-Speech (POS) tagging to Vietnamese sentences to improve the performance of Korean-Vietnamese MT systems. In terms of implementation, we used a tool to tag NER and POS in Vietnamese sentences. In addition, we had access to a Korean-Vietnamese parallel corpus with more than 450K paired sentences from our previous research paper. The experimental results indicate that tagging NER and POS in Vietnamese sentences can improve the quality of Korean-Vietnamese Neural MT (NMT) in terms of the Bi-Lingual Evaluation Understudy (BLEU) and Translation Error Rate (TER) score. On average, our MT system improved by 1.21 BLEU points or 2.33 TER scores after applying both NER and POS tagging to the Vietnamese corpus. Due to the structural features of language, the MT systems in the Korean to Vietnamese direction always give better BLEU and TER results than translation machines in the reverse direction.
Kazuyuki AMANO Shin-ichi NAKANO
Let P be a set of points on the plane, and d(p, q) be the distance between a pair of points p, q in P. For a point p∈P and a subset S ⊂ P with |S|≥3, the 2-dispersion cost, denoted by cost2(p, S), of p with respect to S is the sum of (1) the distance from p to the nearest point in Ssetminus{p} and (2) the distance from p to the second nearest point in Ssetminus{p}. The 2-dispersion cost cost2(S) of S ⊂ P with |S|≥3 is minp∈S{cost2(p, S)}. Given a set P of n points and an integer k we wish to compute k point subset S of P with maximum cost2(S). In this paper we give a simple 1/({4sqrt{3}}) approximation algorithm for the problem.
Hosang LEE Jawad YOUSAF Kwangho KIM Seongjin MUN Chanseok HWANG Wansoo NAH
This paper analyzes and compares two methods to estimate electromagnetically coupled noises introduced to an antenna due to the nearby circuits at a circuit design stage. One of them is to estimate the power spectrum, and the other one is to estimate the active S11 parameter at the victim antenna, respectively, and both of them use simulated standard S-parameters for the electromagnetic coupling in the circuit. They also need the assumed or measured excitation of noise sources. To confirm the validness of the two methods, an evaluation board consisting of an antenna and noise sources were designed and fabricated in which voltage controlled oscillator (VCO) chips are placed as noise sources. The generated electromagnetic noises are transferred to an antenna via loop-shaped transmission lines, degrading the performance of the antenna. In this paper, detailed analysis procedures are described using the evaluation board, and it is shown that the two methods are equivalent to each other in terms of the induced voltages in the antenna. Finally, a procedure to estimate antenna performance degradation at the design stage is summarized.
Takamaru MATSUI Shouhei KIDERA
Here, we present a novel spectroscopic imaging method based on the boundary-extraction scheme for wide-beam terahertz (THz) three-dimensional imaging. Optical-lens-focusing systems for THz subsurface imaging generally require the depth of the object from the surface to be input beforehand to achieve the desired azimuth resolution. This limitation can be alleviated by incorporating a wide-beam THz transmitter into the synthetic aperture to automatically change the focusing depth in the post-signal processing. The range point migration (RPM) method has been demonstrated to have significant advantages in terms of imaging accuracy over the synthetic-aperture method. Moreover, in the RPM scheme, spectroscopic information can be easily associated with each scattering center. Thus, we propose an RPM-based terahertz spectroscopic imaging method. The finite-difference time-domain-based numerical analysis shows that the proposed algorithm provides accurate target boundary imaging associated with each frequency-dependent characteristic.
We propose a non-photorealistic rendering method for generating edge-preserving bubble images from gray-scale photographic images. Bubble images are non-photorealistic images embedded in many bubbles, and edge-preserving bubble images are bubble images where edges in photographic images are preserved. The proposed method is executed by an iterative processing using absolute difference in window. The proposed method has features that processing is simple and fast. To validate the effectiveness of the proposed method, experiments using various photographic images are conducted. Results show that the proposed method can generate edge-preserving bubble images by preserving the edges of photographic images and the processing speed is high.
Yuya HOSODA Arata KAWAMURA Youji IIGUNI
In this paper, we propose an image to sound mapping method. This technique treats an image as a spectrogram and maps it to a sound by taking inverse FFT of the spectrogram. Amplitude spectra of a speech signal are embedded to the spectrogram to give speech intelligibility for the mapped sound. Specifically, we hold amplitude spectra of a speech signal with strong power and embed the image brightness in other frequency bands. Holding amplitude spectra of a speech signal with strong power preserves a speech spectral envelope and improves the speech quality of the mapped sound. The amplitude spectra of the mapped sound with weak power represent the image brightness, and then the image is successfully reconstructed from the mapped sound. Simulation results show that the proposed method achieves sufficient speech quality.