The search functionality is under construction.

Keyword Search Result

[Keyword] SPE(2504hit)

81-100hit(2504hit)

  • Neural Incremental Speech Recognition Toward Real-Time Machine Speech Translation

    Sashi NOVITASARI  Sakriani SAKTI  Satoshi NAKAMURA  

     
    PAPER-Speech and Hearing

      Publicized:
    2021/08/27
      Vol:
    E104-D No:12
      Page(s):
    2195-2208

    Real-time machine speech translation systems mimic human interpreters and translate incoming speech from a source language to the target language in real time. Such systems can be achieved by performing low-latency processing in the ASR (automatic speech recognition) module before passing the output to the MT (machine translation) and TTS (text-to-speech synthesis) modules. Although several studies recently proposed sequence mechanisms for neural incremental ASR (ISR), these frameworks have a more complicated training mechanism than the standard attention-based ASR because they have to decide the incremental step and learn the alignment between speech and text. In this paper, we propose attention-transfer ISR (AT-ISR), which learns the knowledge from attention-based non-incremental ASR for low-delay end-to-end speech recognition. ISR comes with a trade-off between delay and performance, so we investigate how to reduce AT-ISR delay without a significant performance drop. Our experiment shows that AT-ISR achieves performance comparable to the non-incremental ASR when the incremental recognition begins after the speech utterance reaches 25% of the complete utterance length. Additional experiments to investigate the effect of ISR on translation tasks are also performed. The focus is to find the optimum granularity of the output unit. The results reveal that our end-to-end subword-level ISR resulted in the best translation quality with the lowest WER and the lowest uncovered-word rate.

  • Representation Learning of Tongue Dynamics for a Silent Speech Interface

    Hongcui WANG  Pierre ROUSSEL  Bruce DENBY  

     
    PAPER-Speech and Hearing

      Publicized:
    2021/08/24
      Vol:
    E104-D No:12
      Page(s):
    2209-2217

    A Silent Speech Interface (SSI) is a sensor-based, Artificial Intelligence (AI) enabled system in which articulation is performed without the use of the vocal cords, resulting in a voice interface that conserves the ambient audio environment, protects private data, and also functions in noisy environments. Though portable SSIs based on ultrasound imaging of the tongue have obtained Word Error Rates rivaling those of acoustic speech recognition, SSIs remain relegated to the laboratory due to stability issues. Indeed, reliable extraction of acoustic features from ultrasound tongue images in real-life situations has proven elusive. Recently, Representation Learning has shown considerable success in learning underlying structure in noisy, high-dimensional raw data. In its unsupervised form, Representation Learning is able to reveal structure in unlabeled data, thus greatly simplifying the data preparation task. In the present article, a 3D Convolutional Neural Network architecture is applied to unlabeled ultrasound images, and is shown to reliably predict future tongue configurations. By comparing the 3DCNN to a simple previous-frame predictor, it is possible to recognize tongue trajectories comprising transitions between regions of stability that correlate with formant trajectories in a spectrogram of the signal. Prospects for using the underlying structural representation to provide features for subsequent speech processing tasks are presented.

  • Dependence of Arc Duration and Contact Gap at Arc Extinction of Break Arcs Occurring in a 48VDC/10A-300A Resistive Circuit on Contact Opening Speed

    Haruko YAZAKI  Junya SEKIKAWA  

     
    PAPER-Electromechanical Devices and Components

      Publicized:
    2021/04/01
      Vol:
    E104-C No:11
      Page(s):
    656-662

    Dependences of arc duration D and contact gap at arc extinction d on contact opening speed v are studied for break arcs generated in a 48VDC resistive circuit at constant contact opening speeds. The opening speed v is varied over a wide range from 0.05 to 0.5m/s. The circuit current I0 while the electrical contacts are closed is set to 10A, 20A, 50A, 100A, 200A, and 300A. The following results were obtained. For each current I0, the arc duration D decreased with increasing contact opening speed v. However, the D at I0=300A was shorter than that at I0=200A. On the other hand, the contact gap at arc extinction d tended to increase with increasing I0. However, the d at I0=300A was shorter than that at I0=200A. The d was almost constant with increasing v for each current I0 when I0 was lower than 200A. However, the d became shorter when the v was slower at I0=200A and 300A. At v=0.05m/s, for example, the d at I0=300A was shorter than that at I0=100A. To explain the cause of these results for d, the arc length just before extinction L was also analyzed. The L tended to increase with increasing current I0. The L was almost constant with increasing v when I0 was lower than 200A. However, when I0=200A and 300A, the L tended to become longer when the v was slower. The characteristics of the d are discussed using the analyzed results of the L and the motion of break arcs. At the higher currents I0=200A and 300A, the shorter d at the slowest v was caused by wide motion of the arc spots on contact surfaces and larger deformation of break arcs.

  • A Simple and Compact Planar Balun with Slit Ground

    Ryosuke SUGA  Kazuto OSHIMA  Tomoki UWANO  

     
    BRIEF PAPER-Microwaves, Millimeter-Waves

      Publicized:
    2021/04/09
      Vol:
    E104-C No:11
      Page(s):
    667-671

    In this paper, a simple and compact planar balun with a slit ground is proposed. The operating frequency can be designed by the length and position of the defected ground slits. The 20 dB bandwidth of the common mode rejection ratio of the measuring balun was over 90%.

  • DNN-Based Low-Musical-Noise Single-Channel Speech Enhancement Based on Higher-Order-Moments Matching

    Satoshi MIZOGUCHI  Yuki SAITO  Shinnosuke TAKAMICHI  Hiroshi SARUWATARI  

     
    PAPER-Speech and Hearing

      Publicized:
    2021/07/30
      Vol:
    E104-D No:11
      Page(s):
    1971-1980

    We propose deep neural network (DNN)-based speech enhancement that reduces musical noise and achieves better auditory impressions. The musical noise is an artifact generated by nonlinear signal processing and negatively affects the auditory impressions. We aim to develop musical-noise-free speech enhancement methods that suppress the musical noise generation and produce perceptually-comfortable enhanced speech. DNN-based speech enhancement using a soft mask achieves high noise reduction but generates musical noise in non-speech regions. Therefore, first, we define kurtosis matching for DNN-based low-musical-noise speech enhancement. Kurtosis is the fourth-order moment and is known to correlate with the amount of musical noise. The kurtosis matching is a penalty term of the DNN training and works to reduce the amount of musical noise. We further extend this scheme to standardized-moment matching. The extended scheme involves using moments whose orders are higher than kurtosis and generalizes the conventional musical-noise-free method based on kurtosis matching. We formulate standardized-moment matching and explore how effectively the higher-order moments reduce the amount of musical noise. Experimental evaluation results 1) demonstrate that kurtosis matching can reduce musical noise without negatively affecting noise suppression and 2) newly reveal that the sixth-moment matching also achieves low-musical-noise speech enhancement as well as kurtosis matching.
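
The moment quantities mentioned in this abstract can be illustrated with a minimal sketch. It assumes the textbook standardized central moment (order 4 gives kurtosis, order 6 the sixth moment); the abstract does not state the paper's exact definition or loss, so `standardized_moment` and `moment_matching_penalty` are hypothetical names and the squared-difference penalty is only an illustrative assumption.

```python
def standardized_moment(samples, order):
    """Return the order-th standardized central moment of a sequence of numbers.

    order=4 is the textbook kurtosis; order=6 is the sixth standardized moment.
    """
    n = len(samples)
    if n == 0:
        raise ValueError("samples must not be empty")
    mean = sum(samples) / n
    variance = sum((x - mean) ** 2 for x in samples) / n
    if variance == 0:
        raise ValueError("moment is undefined for a constant signal")
    central = sum((x - mean) ** order for x in samples) / n
    return central / variance ** (order / 2)


def moment_matching_penalty(enhanced, reference, order=4):
    """Hypothetical penalty: squared gap between two standardized moments.

    This is an assumed form for illustration, not the loss used in the paper.
    """
    gap = standardized_moment(enhanced, order) - standardized_moment(reference, order)
    return gap ** 2


flat = [-1, 1, -1, 1]                       # no isolated peaks
spiky = [0, 0, 0, 0, 10, 0, 0, 0, 0, -10]   # isolated peaks raise the kurtosis
print(standardized_moment(flat, 4), standardized_moment(spiky, 4))
```

A signal with isolated peaks has a larger kurtosis than a flat one, which is why a penalty on this statistic could plausibly discourage isolated spectral components.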

  • An Anomalous Behavior Detection Method Utilizing Extracted Application-Specific Power Behaviors

    Kazunari TAKASAKI  Ryoichi KIDA  Nozomu TOGAWA  

     
    PAPER

      Publicized:
    2021/07/08
      Vol:
    E104-A No:11
      Page(s):
    1555-1565

    With the widespread use of Internet of Things (IoT) devices in recent years, we utilize a variety of hardware devices in our daily life. On the other hand, hardware security issues are emerging. Power analysis is one of the methods to detect anomalous behaviors, but it is hard to apply it to IoT devices where an operating system and various software programs are running. In this paper, we propose an anomalous behavior detection method for an IoT device by extracting application-specific power behaviors. First, we measure the power consumption of an IoT device and obtain the power waveform. Next, we extract an application-specific power waveform by eliminating a steady factor from the obtained power waveform. Finally, we extract feature values from the application-specific power waveform and detect an anomalous behavior by utilizing the local outlier factor (LOF) method. We conduct two experiments to show how our proposed method works: one runs three application programs and an anomalous application program randomly, and the other runs three application programs in series and an anomalous application program very rarely. The application programs in both experiments are implemented on a single-board computer. The experimental results demonstrate that the proposed method successfully detects anomalous behaviors by extracting application-specific power behaviors, while the existing approaches cannot.
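
The final detection step named in this abstract, the local outlier factor (LOF), can be sketched in plain Python. This is a simplified illustration, not the authors' implementation: it takes exactly k nearest neighbors (ties broken by index), and the two-dimensional feature vectors are invented for the example, since the abstract does not say which features are extracted from the power waveform.

```python
import math


def local_outlier_factor(points, k):
    """Return one LOF score per point; scores well above 1 suggest outliers.

    Simplification: exactly k nearest neighbors are used, ties broken by index.
    Many duplicate points can make a mean reachability distance zero; that case
    is not handled here.
    """
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < len(points)")

    # k nearest neighbors of each point as (distance, index) pairs.
    neighbors = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbors.append(dists[:k])
    k_distance = [nbrs[-1][0] for nbrs in neighbors]

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_distance[j], d) for d, j in neighbors[i]]
        lrd.append(k / sum(reach))

    # LOF: mean neighbor density divided by the point's own density.
    return [sum(lrd[j] for _, j in neighbors[i]) / (k * lrd[i]) for i in range(n)]


# Hypothetical feature vectors: five similar behaviors and one unusual one.
features = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
scores = local_outlier_factor(features, k=2)
print([round(s, 2) for s in scores])
```

In this toy data the last vector lies far from the cluster, so its score is much larger than the others, which stay close to 1.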

  • A Multi-Task Scheme for Supervised DNN-Based Single-Channel Speech Enhancement by Using Speech Presence Probability as the Secondary Training Target

    Lei WANG  Jie ZHU  Kangbo SUN  

    This paper has been cancelled due to violation of duplicate submission policy on IEICE Transactions on Information and Systems.
     
    PAPER-Speech and Hearing

      Publicized:
    2021/08/05
      Vol:
    E104-D No:11
      Page(s):
    1963-1970

    To cope with complicated interference scenarios in realistic acoustic environments, supervised deep neural networks (DNNs) are investigated to estimate different user-defined targets. Such techniques can be broadly categorized into magnitude estimation and time-frequency mask estimation techniques. Further, a mask such as the Wiener gain can be estimated directly or derived from the estimated interference power spectral density (PSD) or the estimated signal-to-interference ratio (SIR). In this paper, we propose to incorporate multi-task learning in DNN-based single-channel speech enhancement by using the speech presence probability (SPP) as a secondary target to assist the target estimation in the main task. The domain-specific information is shared between the two tasks to learn a more generalizable representation. Since the performance of a multi-task network is sensitive to the weight parameters of the loss function, the homoscedastic uncertainty is introduced to adaptively learn the weights, which is proven to outperform the fixed weighting method. Simulation results show that the proposed multi-task scheme improves the overall speech enhancement performance compared to the conventional single-task methods. The joint direct mask and SPP estimation yields the best performance among all the considered techniques.

  • Faster SET Operation in Phase Change Memory with Initialization Open Access

    Yuchan WANG  Suzhen YUAN  Wenxia ZHANG  Yuhan WANG  

     
    PAPER-Electronic Materials

      Publicized:
    2021/04/14
      Vol:
    E104-C No:11
      Page(s):
    651-655

    In conclusion, an initialization method has been introduced and studied to improve the SET speed in PCM. Before experimental verification, a two-dimensional finite analysis is used, and the results illustrate that the proposed method is feasible for improving SET speed. Next, the R-I performances of the discrete PCM device and the resistance distributions of a 64 M bits PCM test chip with and without the initialization have been studied and analyzed, which confirms that the writing speed has been greatly improved. At the same time, the resistance distribution for the repeated initialization operations suggests that a large number of PCM cells have been successfully changed to an intermediate state, in which only a shorter current pulse is thought to be needed to SET the cells successfully. Comparing the transmission electron microscope (TEM) images before and after initialization, it is found that some small grains appeared after initialization, which indicates that the nucleation process of GST has been carried out and that only energy for grain growth needs to be provided later.

  • Improving the Recognition Accuracy of a Sound Communication System Designed with a Neural Network

    Kosei OZEKI  Naofumi AOKI  Saki ANAZAWA  Yoshinori DOBASHI  Kenichi IKEDA  Hiroshi YASUDA  

     
    PAPER-Engineering Acoustics

      Publicized:
    2021/05/06
      Vol:
    E104-A No:11
      Page(s):
    1577-1584

    This study has developed a system that performs data communications using high frequency bands of sound signals. Unlike radio communication systems using advanced wireless devices, it requires only legacy devices such as the microphones and speakers employed in ordinary telephony communication systems. In this study, we have investigated the possibility of a machine learning approach to improve the recognition accuracy in identifying binary symbols exchanged through sound media. This paper describes some experimental results evaluating the performance of our proposed technique, which employs a neural network as its classifier of binary symbols. The experimental results indicate that the proposed technique may be appropriate for designing an optimal classifier for the symbol identification task.

  • Clustering for Signal Power Distribution Toward Low Storage Crowdsourced Spectrum Database

    Yoji UESUGI  Keita KATAGIRI  Koya SATO  Kei INAGE  Takeo FUJII  

     
    PAPER

      Publicized:
    2021/03/30
      Vol:
    E104-B No:10
      Page(s):
    1237-1248

    This paper proposes a measurement-based spectrum database (MSD) with clustered fading distributions toward greater storage efficiencies. The conventional MSD can accurately model the actual characteristics of multipath fading by plotting the histogram of instantaneous measurement data for each space-separated mesh and utilizing it in communication designs. However, if the database contains all of a distribution for each location, the amount of data stored will be extremely large. Because the main purpose of the MSD is to improve spectral efficiency, it is necessary to reduce the amount of data stored while maintaining quality. The proposed method reduces the amount of stored data by estimating the distribution of the instantaneous received signal power at each point and integrating similar distributions through clustering. Numerical results show that clustering techniques can reduce the amount of data while maintaining the accuracy of the MSD. We then apply the proposed method to the outage probability prediction for the instantaneous received signal power. It is revealed that the prediction accuracy is maintained even when the amount of data is reduced.

  • Code-Switching ASR and TTS Using Semisupervised Learning with Machine Speech Chain

    Sahoko NAKAYAMA  Andros TJANDRA  Sakriani SAKTI  Satoshi NAKAMURA  

     
    PAPER-Speech and Hearing

      Publicized:
    2021/07/08
      Vol:
    E104-D No:10
      Page(s):
    1661-1677

    The phenomenon where a speaker mixes two or more languages within the same conversation is called code-switching (CS). Handling CS is challenging for automatic speech recognition (ASR) and text-to-speech (TTS) because it requires coping with multilingual input. Although CS text or speech may be found in social media, the datasets of CS speech and corresponding CS transcriptions are hard to obtain even though they are required for supervised training. This work adopts a deep learning-based machine speech chain to train CS ASR and CS TTS with each other with semisupervised learning. After supervised learning with monolingual data, the machine speech chain is then carried out with unsupervised learning of either the CS text or speech. The results show that the machine speech chain trains ASR and TTS together and improves performance without requiring the pair of CS speech and corresponding CS text. We also integrate language embedding and language identification into the CS machine speech chain in order to handle CS better by giving language information. We demonstrate that our proposed approach can improve the performance on both a single CS language pair and multiple CS language pairs, including the unknown CS excluded from training data.

  • A Spectrum Regeneration and Demodulation Method for Multiple Direct Undersampled Real Signals Open Access

    Takashi SHIBA  Tomoyuki FURUICHI  Mizuki MOTOYOSHI  Suguru KAMEDA  Noriharu SUEMATSU  

     
    PAPER

      Publicized:
    2021/03/30
      Vol:
    E104-B No:10
      Page(s):
    1260-1267

    We propose a spectrum regeneration and demodulation method for multiple direct RF undersampled real signals by using a new algorithm. Many methods have been proposed to regenerate the RF spectrum by using undersampling because of its simple circuit architecture. However, it is difficult to regenerate the spectrum from a real signal that has a band wider than half the sampling frequency, because it is difficult to include the complex conjugate relation of the folded spectrum in the linear algebraic equation in this case. We propose a new spectrum regeneration method from direct undersampled real signals that uses multiple clocks and an extended algorithm considering the complex conjugate relation. Simulations are used to verify the potential of this method. The validity of the proposed method is verified by using the simulation data and the measured data. We also apply this algorithm to the demodulation system.

  • A Survey on Spectrum Sensing and Learning Technologies for 6G Open Access

    Zihang SONG  Yue GAO  Rahim TAFAZOLLI  

     
    INVITED PAPER

      Publicized:
    2021/04/26
      Vol:
    E104-B No:10
      Page(s):
    1207-1216

    Cognitive radio provides a feasible solution for alleviating the lack of spectrum resources by enabling secondary users to access the unused spectrum dynamically. Spectrum sensing and learning, as the fundamental function for dynamic spectrum sharing in 5G evolution and 6G wireless systems, have been research hotspots worldwide. This paper reviews classic narrowband and wideband spectrum sensing and learning algorithms. The sub-sampling framework and recovery algorithms based on compressed sensing theory and their hardware implementation are discussed under the trend of high channel bandwidth and large capacity to be deployed in 5G evolution and 6G communication systems. This paper also investigates and summarizes the recent progress in machine learning for spectrum sensing technology.

  • Highly Efficient Sensing Methods of Primary Radio Transmission Systems toward Dynamic Spectrum Sharing-Based 5G Systems Open Access

    Atomu SAKAI  Keiichi MIZUTANI  Takeshi MATSUMURA  Hiroshi HARADA  

     
    PAPER

      Publicized:
    2021/03/30
      Vol:
    E104-B No:10
      Page(s):
    1227-1236

    The Dynamic Spectrum Sharing (DSS) system, which uses the frequency band allocated to incumbent systems (i.e., primary users), has attracted attention as a way to expand the available bandwidth of the fifth-generation mobile communication (5G) systems in the sub-6GHz band. In Japan, a DSS system in the 2.3GHz band, in which the ARIB STD-B57-based Field Pickup Unit (FPU) is assigned as an incumbent system, has been studied for the secondary use of 5G systems. In this case, the incumbent FPU is a mobile system, and thus, the DSS system needs to use not only a spectrum sharing database but also radio sensors to detect primary signals with high accuracy, protect the primary system from interference, and achieve more secure spectrum sharing. This paper proposes highly efficient sensing methods for detecting the ARIB STD-B57-based FPU signals in the 2.3GHz band. The proposed methods can be applied to two types of FPU signal: those that apply the Continuous Pilot (CP) mode pilot and those that apply the Scattered Pilot (SP) mode pilot. Moreover, we apply a sample addition method and a symbol addition method for improving the detection performance. Even in the 3GPP EVA channel environment, the proposed method can, with a probability of more than 99%, detect the FPU signal with an SNR of -10dB. In addition, we propose a quantized reference signal for reducing the implementation complexity of the complex cross-correlation circuit. The proposed reference signal can reduce the number of quantization bits of the reference signal to 2 bits for in-phase and 3 bits for orthogonal components.

  • Research & Development of the Advanced Dynamic Spectrum Sharing System between Different Radio Services Open Access

    Hiroyuki SHINBO  Kousuke YAMAZAKI  Yoji KISHI  

     
    INVITED PAPER

      Publicized:
    2021/03/30
      Vol:
    E104-B No:10
      Page(s):
    1198-1206

    To achieve highly efficient spectrum usage, dynamic sharing of scarce spectrum resources has recently become the subject of intense discussion. The technologies of dynamic spectrum sharing (DSS) have already been adopted or are scheduled to be adopted in a number of countries, and Japan is no exception. The authors and organizations collaborating in the research and development project being undertaken in Japan have studied a novel DSS system positioned between the fifth-generation mobile communication system (5G system) and different incumbent radio systems. Our DSS system has three characteristics. (1) It detects dynamically unused sharable spectrums (USSs) of incumbent radio systems for the space axis by using novel propagation models and estimation of the transmitting location with radio sensor information. (2) It manages USSs for the time axis by interference calculation with propagation parameters, fair assignment and future usage of USSs. (3) It utilizes USSs for the spectrum axis by using methods that decrease interference for lower separation distances. In this paper, we present an overview and the technologies of our DSS system and its applications in Japan.

  • Per-Pixel Water Detection on Surfaces with Unknown Reflectance

    Chao WANG  Michihiko OKUYAMA  Ryo MATSUOKA  Takahiro OKABE  

     
    PAPER

      Publicized:
    2021/07/06
      Vol:
    E104-D No:10
      Page(s):
    1555-1562

    Water detection is important for machine vision applications such as visual inspection and robot motion planning. In this paper, we propose an approach to per-pixel water detection on unknown surfaces with a hyperspectral image. Our proposed method is based on the spectral characteristics of water: water is transparent to visible light but translucent or opaque to near-infrared light, and therefore the apparent near-infrared spectral reflectance of a surface is smaller than the original one when water is present on it. Specifically, we use a linear combination of a small number of basis vectors to approximate the spectral reflectance and estimate the original near-infrared reflectance from the visible reflectance (which does not depend on the presence or absence of water) to detect water. We conducted a number of experiments using real images and show that our method, which estimates near-infrared spectral reflectance based on the visible spectral reflectance, has better performance than existing techniques.

  • Max-Min 3-Dispersion Problems Open Access

    Takashi HORIYAMA  Shin-ichi NAKANO  Toshiki SAITOH  Koki SUETSUGU  Akira SUZUKI  Ryuhei UEHARA  Takeaki UNO  Kunihiro WASA  

     
    PAPER-Algorithms and Data Structures

      Publicized:
    2021/03/19
      Vol:
    E104-A No:9
      Page(s):
    1101-1107

    Given a set P of n points on which facilities can be placed and an integer k, we want to place k facilities on some points so that the minimum distance between facilities is maximized. The problem is called the k-dispersion problem. In this paper, we consider the 3-dispersion problem when P is a set of points on a plane (2-dimensional space). Note that the 2-dispersion problem corresponds to the diameter problem. We give an O(n) time algorithm to solve the 3-dispersion problem in the L∞ metric, and an O(n) time algorithm to solve the 3-dispersion problem in the L1 metric. Also, we give an O(n^2 log n) time algorithm to solve the 3-dispersion problem in the L2 metric.
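
To make the problem statement in this abstract concrete, here is a brute-force baseline that tries every triple of points. It runs in O(n^3) time and is explicitly not the O(n) or O(n^2 log n) algorithms of the paper; the function names and sample points are hypothetical.

```python
import math
from itertools import combinations


def chebyshev(p, q):
    """L-infinity distance between two points in the plane."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))


def manhattan(p, q):
    """L1 distance between two points in the plane."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def three_dispersion_bruteforce(points, dist=math.dist):
    """Return (value, triple) maximizing the minimum pairwise distance.

    dist defaults to the Euclidean (L2) metric. Every triple is examined,
    so this is only a reference implementation for small inputs.
    """
    if len(points) < 3:
        raise ValueError("need at least three points")
    best_value, best_triple = -1.0, None
    for triple in combinations(points, 3):
        value = min(dist(a, b) for a, b in combinations(triple, 2))
        if value > best_value:
            best_value, best_triple = value, triple
    return best_value, best_triple


candidate_sites = [(0, 0), (4, 0), (0, 4), (1, 1), (2, 2)]
print(three_dispersion_bruteforce(candidate_sites))             # L2 metric
print(three_dispersion_bruteforce(candidate_sites, chebyshev))  # L-infinity metric
print(three_dispersion_bruteforce(candidate_sites, manhattan))  # L1 metric
```

A baseline of this kind is mainly useful for checking a faster implementation against small random inputs.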

  • 28 GHz-Band Experimental Trial Using the Shinkansen in Ultra High-Mobility Environment for 5G Evolution

    Nobuhide NONAKA  Kazushi MURAOKA  Tatsuki OKUYAMA  Satoshi SUYAMA  Yukihiko OKUMURA  Takahiro ASAI  Yoshihiro MATSUMURA  

     
    PAPER

      Publicized:
    2021/04/01
      Vol:
    E104-B No:9
      Page(s):
    1000-1008

    In order to enhance the fifth generation (5G) mobile communication system further toward 5G Evolution, high bit-rate transmission using high SHF bands (28GHz or EHF bands) should be more stable even in high-mobility environments such as high-speed trains. Of particular importance, dynamic changes in the beam direction and the larger Doppler frequency shift can degrade transmission performance in such high frequency bands. Thus, we conduct the world's first 28 GHz-band 5G experimental trial on an actual Shinkansen running at a speed of 283km/h in Japan. This paper introduces the 28GHz-band experimental system used in the 5G experimental trial using the Shinkansen, and then it presents the experimental configuration in which three base stations (BSs) are deployed along the Tokaido Shinkansen railway and a mobile station is located in the train. In addition, the transmission performance measured in this ultra high-mobility environment shows a peak throughput exceeding 1.0Gbps and successful consecutive BS connection among the three BSs.
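
For a sense of scale, the maximum Doppler shift implied by the figures in this abstract (283 km/h, 28 GHz carrier) can be estimated with the standard relation f_d = v * f_c / c. This is a back-of-the-envelope sketch that assumes the train moves directly toward or away from the base station; the function name is hypothetical, and the value is not a result reported in the paper.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def max_doppler_shift_hz(speed_kmh, carrier_hz):
    """Maximum Doppler shift for motion directly along the line of sight."""
    speed_m_per_s = speed_kmh / 3.6  # convert km/h to m/s
    return speed_m_per_s * carrier_hz / SPEED_OF_LIGHT_M_PER_S


shift = max_doppler_shift_hz(283.0, 28e9)
print(f"{shift / 1e3:.2f} kHz")  # roughly 7.3 kHz
```

The shift scales linearly with the carrier frequency, so at the same speed it is about ten times larger at 28 GHz than at 2.8 GHz.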

  • Efficient DLT-Based Method for Solving PnP, PnPf, and PnPfr Problems

    Gaku NAKANO  

     
    PAPER-Image Recognition, Computer Vision

      Publicized:
    2021/06/17
      Vol:
    E104-D No:9
      Page(s):
    1467-1477

    This paper presents an efficient method for solving PnP, PnPf, and PnPfr problems, which are the problems of determining camera parameters from 2D-3D point correspondences. The proposed method is derived from a simple use of linear algebra, similarly to the classical DLT methods. Therefore, the new method is easier to understand, easier to implement, and several times faster than the state-of-the-art methods using Gröbner basis. In contrast to the existing Gröbner basis methods, the proposed method consists of three algorithms depending on the number of points and the 3D point configuration. Experimental results show that the proposed method is as accurate as the state-of-the-art methods even in near-planar scenes while being up to three times faster.

  • Single-Mode Condition of Chalcogenide Glass Channel Waveguides for Integrated Optical Devices Operated across the Astronomical N-Band

    Takashi YASUI  Jun-ichiro SUGISAKA  Koichi HIRAYAMA  

     
    BRIEF PAPER-Optoelectronics

      Publicized:
    2021/01/13
      Vol:
    E104-C No:8
      Page(s):
    386-389

    In this study, we conduct guided mode analyses for chalcogenide glass channel waveguides using an As2Se3 core and an As2S3 lower cladding to determine their single-mode conditions across the astronomical N-band (8-12µm). The results reveal that single-mode operation over the band can be achieved by choosing a suitable core thickness.
