IEICE global.ieice.org Site

Keyword Search Result

[Keyword] SPE(2504hit)

121-140hit(2504hit)

DNN-Based Full-Band Speech Synthesis Using GMM Approximation of Spectral Envelope
Junya KOGUCHI Shinnosuke TAKAMICHI Masanori MORISE Hiroshi SARUWATARI Shigeki SAGAYAMA

PAPER-Speech and Hearing

Publicized:
2020/09/03
Vol:
E103-D No:12
Page(s):
2673-2681
We propose a speech analysis-synthesis and deep neural network (DNN)-based text-to-speech (TTS) synthesis framework using Gaussian mixture model (GMM)-based approximation of full-band spectral envelopes. GMMs have excellent properties as acoustic features in statistical parametric speech synthesis. Each Gaussian function of a GMM fits the local resonance of the spectrum. The GMM retains the fine spectral envelope and achieves high controllability of the structure. However, since conventional speech analysis methods (i.e., GMM parameter estimation) have been formulated for narrow-band speech, they degrade the quality of synthetic speech. Moreover, a DNN-based TTS synthesis method using GMM-based approximation has not been formulated in spite of its excellent expressive ability. Therefore, we employ peak-picking-based initialization for full-band speech analysis to provide better initialization for iterative estimation of the GMM parameters. We introduce not only the prediction error of the GMM parameters but also the reconstruction error of the spectral envelopes as objective criteria for training the DNN. Furthermore, we propose a method for multi-task learning based on minimizing these errors simultaneously. We also propose a post-filter based on variance scaling of the GMM for our framework to enhance synthetic speech. Experimental results from evaluating our framework indicated that 1) the initialization method of our framework outperformed the conventional one in the quality of analysis-synthesized speech; 2) introducing the reconstruction error in DNN training significantly improved the synthetic speech; and 3) our variance-scaling-based post-filter further improved the synthetic speech.
Characterization of Multi-Layer Ceramic Chip Capacitors up to mm-Wave Frequencies for High-Speed Digital Signal Coupling Open Access
Tsugumichi SHIBATA Yoshito KATO

PAPER

Publicized:
2020/04/09
Vol:
E103-C No:11
Page(s):
575-581
Capacitive coupling of line-coded and DC-balanced digital signals is often used to eliminate steady bias current flow between systems or components in various communication systems. A multi-layer ceramic chip capacitor is a promising choice for very broadband signal coupling because of the high-frequency characteristics expected from the downsizing of chips in recent years. The lower limit of the coupling bandwidth is determined by the capacitance, while the upper limit is affected by the parasitic inductance associated with the chip structure. In this paper, we investigate the coupling characteristics up to millimeter-wave frequencies by measurement and simulation. We found a phenomenon in which the current distribution in the chip structure changes at high frequencies and the coupling characteristics improve compared with the prediction of the conventional equivalent circuit model. We propose a new equivalent circuit model of the chip capacitor that can express this improvement.
Pulse Coding Controlled Switching Converter that Generates Notch Frequency to Suit Noise Spectrum
Yifei SUN Yasunori KOBORI Anna KUWANA Haruo KOBAYASHI

PAPER-Energy in Electronics Communications

Publicized:
2020/05/20
Vol:
E103-B No:11
Page(s):
1331-1340
This paper proposes a noise reduction technology for a specific frequency band that uses the pulse coding controlled method to automatically set the notch frequency in DC-DC switching converters of communication equipment. To reduce the power levels at the switching frequency and its harmonics in a switching converter, a frequency-modulated clock is often used. This paper investigates a technology that prevents modulated clock frequency noise from spreading into protected frequency bands; the proposed noise reduction technology does not distribute the switching noise into specified frequency bands. The notch in the spectrum of the switching pulses is created by the Pulse Width Coding (PWC) method. In communication devices, the noise in the receiving signal band must be as small as possible. The notch frequency is automatically set to the frequency of the received signal by adjusting the clock frequency using the equation Fn = (P+0.5)Fck. Here Fn is the notch frequency, Fck is the clock frequency, and P is a positive integer that determines the noise spectrum location. Therefore, simply setting the notch frequency to the received signal frequency suppresses the noise in that band. We confirm with simulations that the proposed technique is effective for noise reduction and notch generation. We also implement a method of automatic switching between two receiving channels. The conversion voltage ratio in the pulse width coding method switching converter is analyzed, and fully automatic notch frequency generation is realized. Experiments on a prototype circuit confirm notch frequency generation.
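The relation Fn = (P+0.5)Fck quoted in the abstract above can be inverted to find the clock frequency that places the notch at a desired received-signal frequency. The following is a minimal Python sketch of that arithmetic only. The function names and the example values (a 1 MHz notch with P = 2) are illustrative assumptions and are not taken from the paper.

```python
def notch_frequency(p: int, f_ck: float) -> float:
    """Notch frequency Fn = (P + 0.5) * Fck, as stated in the abstract."""
    if p < 1:
        raise ValueError("P must be a positive integer")
    return (p + 0.5) * f_ck


def clock_for_notch(p: int, f_n: float) -> float:
    """Clock frequency Fck that places the notch at f_n for a given P."""
    if p < 1:
        raise ValueError("P must be a positive integer")
    return f_n / (p + 0.5)


# Hypothetical example: place the notch at 1 MHz with P = 2.
f_ck = clock_for_notch(2, 1.0e6)      # 400 kHz clock
f_n = notch_frequency(2, f_ck)        # back to 1 MHz
```

How the converter actually tracks the received frequency and adjusts the clock is part of the paper's circuit and is not modeled here.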
Speech Chain VC: Linking Linguistic and Acoustic Levels via Latent Distinctive Features for RBM-Based Voice Conversion
Takuya KISHIDA Toru NAKASHIKA

PAPER-Speech and Hearing

Publicized:
2020/08/06
Vol:
E103-D No:11
Page(s):
2340-2350
This paper proposes a voice conversion (VC) method based on a model that links linguistic and acoustic representations via latent phonological distinctive features. Our method, called speech chain VC, is inspired by the concept of the speech chain, where speech communication consists of a chain of events linking the speaker's brain with the listener's brain. We assume that speaker identity information, which appears at the acoustic level, is embedded in two steps: where phonological information is encoded into articulatory movements (linguistic to physiological) and where articulatory movements generate sound waves (physiological to acoustic). Speech chain VC represents these event links by using an adaptive restricted Boltzmann machine (ARBM) that introduces phoneme labels and acoustic features as two classes of visible units and latent phonological distinctive features associated with articulatory movements as hidden units. Subjective evaluation experiments showed that the intelligibility of the converted speech significantly improved compared with the conventional ARBM-based method. The speaker-identity conversion quality of the proposed method was comparable to that of a Gaussian mixture model (GMM)-based method. Analyses of the hidden-layer representations of the speech chain VC model suggested that some of the hidden units actually correspond to phonological distinctive features. The final part of this paper proposes approaches to achieve one-shot VC by using the speech chain VC model. Subjective evaluation experiments showed that when the target speaker is the same gender as the source speaker, the proposed methods can achieve one-shot VC based on a single utterance from each of the source and target speakers.
Fast Converging ADMM Penalized Decoding Method Based on Improved Penalty Function for LDPC Codes
Biao WANG

LETTER-Coding Theory

Publicized:
2020/05/08
Vol:
E103-A No:11
Page(s):
1304-1307
For low-density parity-check (LDPC) codes, the penalized decoding method based on the alternating direction method of multipliers (ADMM) can improve the decoding performance at low signal-to-noise ratios and also has low decoding complexity. There are three effective methods that could increase the ADMM penalized decoding speed, which are reducing the number of Euclidean projections in ADMM penalized decoding, designing an effective penalty function and selecting an appropriate layered scheduling strategy for message transmission. In order to further increase the ADMM penalized decoding speed, through reducing the number of Euclidean projections and using the vertical layered scheduling strategy, this paper designs a fast converging ADMM penalized decoding method based on the improved penalty function. Simulation results show that the proposed method not only improves the decoding performance but also reduces the average number of iterations and the average decoding time.
Analysis of Pulse Responses by Dispersion Medium with Periodically Conducting Strips
Ryosuke OZAKI Tomohiro KAGAWA Tsuneki YAMASAKI

BRIEF PAPER

Publicized:
2020/05/14
Vol:
E103-C No:11
Page(s):
613-616
In this paper, we analyze the pulse responses of a dispersion medium with periodically conducting strips by using a fast inversion Laplace transform (FILT) method combined with the point matching method (PMM) for both the TM and TE cases. Specifically, we investigate the influence of the width and number of the conducting strips on the pulse response and the distribution of the electric field.
Available Spectral Space in C-Band Expansion Remaining After Optical Quantization Based on Intensity-to-Lambda Conversion Open Access
Yuta KAIHORI Yu YAMASAKI Tsuyoshi KONISHI

INVITED PAPER

Publicized:
2020/05/14
Vol:
E103-B No:11
Page(s):
1206-1213
A high degree of freedom in the spectral domain allows us to accommodate additional optical signal processing for wavelength division multiplexing in photonic analog-to-digital conversion. We experimentally verified spectral compression that saves the bandwidth needed for soliton self-frequency shift based optical quantization through a cascade of four-wave mixing based and sum-frequency generation based spectral compression. This approach can realize an individual bandwidth of 0.03 nm, corresponding to a saving of up to more than 85 percent of the bandwidth for 7-bit optical quantization in the C-band.
Measurement of Spectral Transfer Matrix for DMD Analysis by Using Linear Optical Sampling
Yuki OSAKA Fumihiko ITO Daisuke IIDA Tetsuya MANABE

PAPER

Publicized:
2020/06/08
Vol:
E103-B No:11
Page(s):
1233-1239
Mode-by-mode impulse responses, or the spectral transfer matrix (STM), of birefringent fibers are measured by using linear optical sampling with the assistance of a polarization-multiplexed probe pulse. By using eigenvalue analysis of the STM, the differential mode delay and PMD vector of a polarization-maintaining fiber are analyzed as a function of optical frequency over 1 THz. We show that amplitude averaging of the complex impulse responses is effective for enhancing the signal-to-noise ratio of the measurement, which improves the accuracy and expands the bandwidth of the measurement.
Efficient Algorithms for the Partial Sum Dispersion Problem
Toshihiro AKAGI Tetsuya ARAKI Shin-ichi NAKANO

PAPER-optimization

Vol:
E103-A No:10
Page(s):
1206-1210
The dispersion problem is a variant of the facility location problem. Given a set P of n points and an integer k, we intend to find a subset S of P with |S| = k such that the cost min_{p∈S} {cost(p)} is maximized, where cost(p) is the sum of the distances from p to the nearest c points in S. We call the problem the dispersion problem with partial c sum cost, or the PcS-dispersion problem. In this paper we present two algorithms to solve the P2S-dispersion problem (c = 2) if all points of P are on a line. The running times of the algorithms are O(kn^2 log n) and O(n log n), respectively. We also present an algorithm to solve the PcS-dispersion problem if all points of P are on a line. The running time of the algorithm is O(kn^{c+1}).
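To make the objective in the abstract above concrete, the following is a minimal Python sketch that evaluates cost(p) and solves a tiny instance on a line by exhaustive search. It is not one of the paper's algorithms: it takes exponential time and is useful only as a reference for small inputs. The function names are hypothetical, and the sketch assumes that the "nearest c points in S" exclude p itself.

```python
from itertools import combinations


def cost(p, subset, c):
    """Sum of the distances from p to its c nearest other points in subset."""
    others = list(subset)
    others.remove(p)  # assumption: p is not counted as its own neighbor
    dists = sorted(abs(p - q) for q in others)
    return sum(dists[:c])


def brute_force_pcs(points, k, c):
    """Try every k-subset and keep one that maximizes the minimum cost."""
    if k <= c:
        raise ValueError("k must exceed c so every point has c neighbors")
    best_value, best_subset = float("-inf"), None
    for subset in combinations(sorted(points), k):
        value = min(cost(p, subset, c) for p in subset)
        if value > best_value:
            best_value, best_subset = value, subset
    return best_value, best_subset


# Hypothetical P2S instance (c = 2): six points on a line, choose k = 3.
value, chosen = brute_force_pcs([0, 1, 2, 3, 4, 10], k=3, c=2)
```

With k = 3 and c = 2, the middle point of any chosen triple always has the smallest cost, equal to the distance between the two outer points, so the optimum here keeps the extreme points 0 and 10 and has value 10.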
4th Order Moment-Based Linear Prediction for Estimating Ringing Sound of Impulsive Noise in Speech Enhancement Open Access
Naoto SASAOKA Eiji AKAMATSU Arata KAWAMURA Noboru HAYASAKA Yoshio ITOH

LETTER-Digital Signal Processing

Publicized:
2020/04/02
Vol:
E103-A No:10
Page(s):
1248-1251
Speech enhancement has been proposed to reduce the impulsive noise whose frequency characteristic is wideband. On the other hand, it is challenging to reduce the ringing sound, which is narrowband in impulsive noise. Therefore, we propose the modeling of the ringing sound and its estimation by a linear predictor (LP). However, it is difficult to estimate the ringing sound only in noisy speech due to the auto-correlation property of speech. The proposed system adopts the 4th order moment-based adaptive algorithm by noticing the difference between the 4th order statistics of speech and impulsive noise. The brief analysis and simulation results show that the proposed system has the potential to reduce ringing sound while keeping the quality of enhanced speech.
System Throughput Gain by New Channel Allocation Scheme for Spectrum Suppressed Transmission in Multi-Channel Environments over a Satellite Transponder
Sumika OMATA Motoi SHIRAI Takatoshi SUGIYAMA

PAPER

Publicized:
2020/03/27
Vol:
E103-B No:10
Page(s):
1059-1068
A spectrum suppressed transmission scheme has been proposed that increases the frequency utilization efficiency, defined as throughput/bandwidth, by suppressing the required bandwidth. This is one of the most effective schemes for addressing the exhaustion of frequency bandwidths. However, in spectrum suppressed transmission, the transmission quality can degrade due to the ISI caused by making the bandwidth narrower than the Nyquist bandwidth. In this paper, to mitigate this degradation, we propose spectrum suppressed transmission that applies both FEC (forward error correction) and LE (linear equalization). Moreover, we propose a new channel allocation scheme for spectrum suppressed transmission in multi-channel environments over a satellite transponder. Our computer simulation results show that the proposed schemes are more effective at increasing the system throughput than the scheme without spectrum suppression.
A Visual Inspection System for Accurate Positioning of Railway Fastener
Jianwei LIU Hongli LIU Xuefeng NI Ziji MA Chao WANG Xun SHAO

PAPER-Image Recognition, Computer Vision

Publicized:
2020/07/17
Vol:
E103-D No:10
Page(s):
2208-2215
Automatic disassembly of railway fasteners is of great significance for improving the efficiency of replacing rails. Accurate positioning of fasteners is the key factor in realizing automatic disassembly. However, most of the existing literature focuses on fastener region positioning, and literature on accurate positioning of fasteners is scarce. Therefore, this paper constructs a visual inspection system for accurate positioning of fasteners (VISP). First, VISP acquires a railway image with its image acquisition subsystem, and the subimage of the fastener is then obtained by a coarse-to-fine method. Subsequently, accurate positioning of the fastener is completed in three steps: contrast enhancement, binarization, and spike region extraction. The validity and robustness of VISP were verified by extensive experiments. The results show that VISP has competitive performance for accurate positioning of fasteners. A single positioning takes about 260 ms, and the average positioning accuracy is above 90%. Thus, the system is of theoretical interest and has potential for industrial application.
Top-N Recommendation Using Low-Rank Matrix Completion and Spectral Clustering
Qian WANG Qingmei ZHOU Wei ZHAO Xuangou WU Xun SHAO

PAPER-Internet

Publicized:
2020/03/16
Vol:
E103-B No:9
Page(s):
951-959
In the age of big data, recommendation systems provide users with fast access to interesting information, resulting in significant commercial value. However, the extreme sparseness of user assessment data is one of the key factors that lead to the poor performance of recommendation algorithms. To address this problem, we propose a recommendation scheme that combines low-rank matrix completion and spectral clustering. Our scheme exploits spectral clustering to divide users into groups of similar users. Meanwhile, low-rank matrix completion is used to effectively predict un-rated items in the sub-matrix of each spectral cluster. Experiments on a real dataset show that our proposed scheme can effectively improve the prediction accuracy of un-rated items.
Silent Speech Interface Using Ultrasonic Doppler Sonar
Ki-Seung LEE

PAPER-Speech and Hearing

Publicized:
2020/05/20
Vol:
E103-D No:8
Page(s):
1875-1887
Some non-acoustic modalities have the ability to reveal certain speech attributes that can be used for synthesizing speech signals without acoustic signals. This study validated the use of ultrasonic Doppler frequency shifts caused by facial movements to implement a silent speech interface system. A 40kHz ultrasonic beam is incident to a speaker's mouth region. The features derived from the demodulated received signals were used to estimate the speech parameters. A nonlinear regression approach was employed in this estimation where the relationship between ultrasonic features and corresponding speech is represented by deep neural networks (DNN). In this study, we investigated the discrepancies between the ultrasonic signals of audible and silent speech to validate the possibility for totally silent communication. Since reference speech signals are not available in silently mouthed ultrasonic signals, a nearest-neighbor search and alignment method was proposed, wherein alignment was achieved by determining the optimal pair of ultrasonic and audible features in the sense of a minimum mean square error criterion. The experimental results showed that the performance of the ultrasonic Doppler-based method was superior to that of EMG-based speech estimation, and was comparable to an image-based method.
Interference Management Using Beamforming Techniques for Line-of-Sight Femtocell Networks
Khalid Sheikhidris MOHAMED Mohamad Yusoff ALIAS Mardeni ROSLEE

PAPER-Terrestrial Wireless Communication/Broadcasting Technologies

Publicized:
2020/01/24
Vol:
E103-B No:8
Page(s):
881-887
Femtocell structures can offer better voice and data exchange in cellular networks. However, interference in such networks poses a major challenge in the practical development of cellular communication. To tackle this issue, an advanced interference mitigation scheme for Line-Of-Sight (LOS) femtocell networks in indoor environments is proposed in this paper. Using a femtocell management system (FMS) that controls all femtocells in a service area, the aggressor femtocells are identified, and the transmitted beam patterns are then adjusted using the linear array antenna equipped in each femtocell to mitigate the interference contribution to the neighbouring femtocells. Prior to that, the affected users are switched to the femtocells that provide better throughput levels to avoid increasing the outage probability. This paper considers different femtocell deployment indexes to verify and justify the feasibility of the findings in areas of different density. Relative to fixed and adaptive power control schemes, the proposed scheme achieves approximately 5% spectral efficiency (SE) improvement, about 10% outage probability reduction, and about 7% Mbps average user throughput improvement.
Array Design of High-Density Emerging Memories Making Clamped Bit-Line Sense Amplifier Compatible with Dummy Cell Average Read Scheme
Ziyue ZHANG Takashi OHSAWA

PAPER-Integrated Electronics

Publicized:
2020/02/26
Vol:
E103-C No:8
Page(s):
372-380
The reference current used in sense amplifiers is a crucial factor in single-ended read schemes for emerging memories. The dummy cell average read scheme uses multiple pairs of dummy cells inside the array to generate an accurate reference current for data sensing. Previous research adopts a current mirror sense amplifier (CMSA), which is compatible with the dummy cell average read scheme. However, a clamped bit-line sense amplifier (CBLSA) has higher sensing speed and lower power consumption than a CMSA. Therefore, applying a CBLSA to the dummy cell average read scheme is expected to enhance the performance. This paper reveals that a direct combination of the CBLSA and the dummy cell average read scheme leads to sense margin degradation. To solve this problem, a new array design is proposed that makes the CBLSA compatible with the dummy cell average read scheme. A current mirror structure is employed to prevent the CBLSA from being short-circuited directly. The simulation result shows that the minimum sensible tunnel magnetoresistance ratio (TMRR) can be extended from 14.3% down to 1%. The access time of the proposed sensing scheme is less than 2 ns when the TMRR is 70% or larger, which is about twice as fast as in the previous research. This circuit design also consumes only half of the energy in one read cycle compared with the previous research. In the proposed array architecture, all the dummy cells can always be short-circuited in a totally isolated area by low-resistance metal wiring instead of by controlling transistors. This structure contributes to increasing the dummy cell averaging effect. In addition, the array-level simulation validates that the array design can access every data cell. This design is generally applicable to any kind of resistance-variable emerging memory, including STT-MRAM.
Low Complexity Statistic Computation for Energy Detection Based Spectrum Sensing with Multiple Antennas
Shusuke NARIEDA Hiroshi NARUSE

PAPER-Communication Theory and Signals

Vol:
E103-A No:8
Page(s):
969-977
This paper presents a novel statistic computation technique for energy detection-based spectrum sensing with multiple antennas. The presented technique computes the statistic for signal detection after combining all the signals. Because the statistic need not be computed for each received signal, the presented technique reduces the computational complexity of signal detection. Furthermore, the absolute values of all the received signals are combined to prevent attenuation of the combined signal. The presented technique also does not need to make any choice, such as that of the binary phase rotator in the conventional technique, and therefore the performance degradation due to wrong choices can be avoided. Numerical examples indicate that the spectrum sensing performance of the presented technique is almost the same as that of conventional techniques, even though its complexity is lower.
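The idea in the abstract above, combining the antenna signals first and computing a single energy statistic afterwards, can be illustrated with a small Python sketch. This is only one possible reading of the abstract, not the paper's exact formulation. The function names, the use of the mean of |x|^2 as the statistic, and the toy samples are all assumptions made for illustration.

```python
def energy(samples):
    """Average energy (mean of |x|^2) of a sequence of samples."""
    return sum(abs(x) ** 2 for x in samples) / len(samples)


def statistic_per_branch(branches):
    """Baseline: one energy computation per antenna, then summed."""
    return sum(energy(b) for b in branches)


def statistic_plain_sum(branches):
    """Combine raw samples first; opposite phases can cancel out."""
    combined = [sum(col) for col in zip(*branches)]
    return energy(combined)


def statistic_abs_combined(branches):
    """Combine absolute values first, then one energy computation."""
    combined = [sum(abs(x) for x in col) for col in zip(*branches)]
    return energy(combined)


# Toy example (assumed data): two antennas receiving opposite-phase samples.
branches = [[1 + 0j, -1 + 0j], [-1 + 0j, 1 + 0j]]
cancelled = statistic_plain_sum(branches)      # raw combining cancels
preserved = statistic_abs_combined(branches)   # absolute values do not
```

In this toy case the raw sum cancels to zero energy, while combining absolute values keeps the signal energy and needs only one energy computation instead of one per antenna. Detection thresholds and noise statistics are deliberately left out of the sketch.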
Spectrum Sensing with Selection Diversity Combining in Cognitive Radio
Shusuke NARIEDA Hiromichi OGASAWARA Hiroshi NARUSE

PAPER-Communication Theory and Signals

Vol:
E103-A No:8
Page(s):
978-986
This paper presents a novel spectrum sensing technique based on selection diversity combining in cognitive radio networks. In general, a selection diversity combining scheme requires a period to select an optimal element, and spectrum sensing requires a period to detect a target signal. We consider that both these periods are required for the spectrum sensing based on selection diversity combining. However, conventional techniques do not consider both the periods. Furthermore, spending a large amount of time in selection and signal detection increases their accuracy. Because the required period for spectrum sensing based on selection diversity combining is the summation of both the periods, their lengths should be considered while developing selection diversity combining based spectrum sensing for a constant period. In reference to this, we discuss the spectrum sensing technique based on selection diversity combining. Numerical examples are shown to validate the effectiveness of the presented design techniques.
Magic Line: An Integrated Method for Fast Parts Counting and Orientation Recognition Using Industrial Vision Systems
Qiaochu ZHAO Ittetsu TANIGUCHI Makoto NAKAMURA Takao ONOYE

PAPER-Vision

Vol:
E103-A No:7
Page(s):
928-936
Vision systems are widely adopted in industrial fields for monitoring and automation. As a typical example, industrial vision systems are extensively used in vibrator parts feeders to ensure that the orientations of parts for assembly are aligned and that disqualified parts are eliminated. An efficient method for parts orientation recognition and counting is thus critical. In this paper, an integrated method for fast parts counting and orientation recognition using industrial vision systems is proposed. The original 2D spatial image signal of the parts is decomposed into a 1D signal with its temporal variance, which makes efficient recognition and counting achievable; the feeding speed of each part is further leveraged to refine the counting in an adaptive way. Experiments on parts of different types were conducted, and the results show that the proposed method is both more efficient and more accurate than other relevant methods.
A Two-Stage Phase-Aware Approach for Monaural Multi-Talker Speech Separation
Lu YIN Junfeng LI Yonghong YAN Masato AKAGI

PAPER-Speech and Hearing

Publicized:
2020/04/20
Vol:
E103-D No:7
Page(s):
1732-1743
Simultaneous utterances impair the abilities of both hearing-impaired persons and automatic speech recognition systems. Recently, deep neural networks have dramatically improved speech separation performance. However, most previous works only estimate the speech magnitude and use the mixture phase for speech reconstruction. The use of the mixture phase has become a critical limitation for separation performance. This study proposes a two-stage phase-aware approach for multi-talker speech separation, which recovers both the magnitude and the phase. For the phase recovery, the Multiple Input Spectrogram Inversion (MISI) algorithm is utilized due to its effectiveness and simplicity. The study implements the MISI algorithm based on the mask and shows that the ideal amplitude mask (IAM) is the optimal mask for mask-based MISI phase recovery, as it brings less phase distortion. To compensate for the error of phase recovery and minimize the signal distortion, an advanced mask is proposed for the magnitude estimation. The IAM and the proposed mask are estimated at different stages to recover the phase and the magnitude, respectively. Two neural network frameworks are evaluated for the magnitude estimation in the second stage, demonstrating the effectiveness and flexibility of the proposed approach. The experimental results demonstrate that the proposed approach significantly reduces the distortions of the separated speech.
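The ideal amplitude mask (IAM) mentioned in the abstract above is commonly defined per time-frequency bin as the ratio of the source magnitude to the mixture magnitude. The following minimal Python sketch shows that common definition on nested lists of complex STFT bins. The function names, the small eps guard against division by zero, and the toy values are assumptions, and the paper's exact formulation may differ.

```python
def ideal_amplitude_mask(source, mixture, eps=1e-8):
    """IAM per bin: |S| / |Y|, with eps guarding against empty bins."""
    return [
        [abs(s) / (abs(y) + eps) for s, y in zip(s_row, y_row)]
        for s_row, y_row in zip(source, mixture)
    ]


def apply_mask(mask, mixture):
    """Scale each mixture bin by the mask; the mixture phase is kept."""
    return [
        [m * y for m, y in zip(m_row, y_row)]
        for m_row, y_row in zip(mask, mixture)
    ]


# Toy spectrograms (assumed values): one frame, two frequency bins.
source = [[3 + 4j, 0j]]
mixture = [[6 + 8j, 1 + 0j]]
mask = ideal_amplitude_mask(source, mixture)
estimate = apply_mask(mask, mixture)
```

Applying the mask in this way recovers the source magnitude but reuses the mixture phase, which is the limitation the abstract describes; the MISI phase recovery stage itself is not sketched here.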
