An-Sheng CHAO Cheng-Wu LIN Hsin-Wen TING Soon-Jyh CHANG
The proposed stimulus design for linearity testing is embedded in a differential successive approximation register analog-to-digital converter (SAR ADC) as a design for testability (DFT). The proposed DFT is compatible with the pattern generator (PG) and output response analyzer (ORA) at a cost of 12.4% of the SAR ADC area. A 10-bit SAR ADC prototype is verified in a 0.18-µm CMOS technology, and the measured differential nonlinearity (DNL) error is between -0.386 and 0.281 LSB at 1 MS/s.
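DNL values such as those above are per-code measurements. A minimal sketch of the conventional code-density (histogram) test that yields them is shown below, assuming a stimulus whose amplitude is uniformly distributed over the input range (for example, a slow linear ramp). This is the textbook measurement, not the proposed on-chip DFT; `dnl_from_histogram` is a hypothetical name, and discarding the two end codes (which absorb out-of-range input) is an assumption of the sketch.

```python
def dnl_from_histogram(codes, n_bits):
    """Estimate per-code DNL (in LSB) from ADC output codes captured under a
    uniformly distributed stimulus (code-density test)."""
    hist = [0] * (2 ** n_bits)
    for c in codes:
        hist[c] += 1
    # Assumption: the first and last codes absorb out-of-range input, so they
    # are excluded from the average code width.
    inner = hist[1:-1]
    avg = sum(inner) / len(inner)
    # DNL[k] = measured code width / ideal (average) code width - 1
    return [h / avg - 1.0 for h in inner]


# Usage: a 3-bit converter where code 3 is too wide and code 4 too narrow.
hits = {3: 12, 4: 4}
codes = [k for k in range(8) for _ in range(hits.get(k, 8))]
dnl = dnl_from_histogram(codes, n_bits=3)  # entry i corresponds to code i + 1
print(min(dnl), max(dnl))  # prints -0.5 0.5
```

For a 10-bit converter, `n_bits=10` would give 1022 inner codes, and the extreme values of the returned list correspond to a reported DNL range.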
Ju-Ho LEE Goo-Yeon LEE Choong-Kyo JEONG
Mobile Multi-hop Relay (MMR) technology is usually used to increase the transmission rate or to extend communication coverage. In this work, we show that MMR technology can also be used to raise the network capacity. Because Relay Stations (RSs) are connected to the Base Station (BS) wirelessly and controlled by the BS, an MMR network can easily be deployed when necessary. High-capacity MMR networks are thus a good candidate solution for coping with temporary traffic surges. To enhance the capacity of the MMR network, we propose a novel scheme that parallelizes cell transmissions while controlling the interference between them. Using a numerical example for a typical network conforming to IEEE 802.16j, we find that the network capacity increases by 88 percent.
Chen-Yu YANG Zhen-Hua LING Li-Rong DAI
In this paper, an automatic and unsupervised method using context-dependent hidden Markov models (CD-HMMs) is proposed for the prosodic labeling of speech synthesis databases. This method consists of three main steps: initialization, model training, and prosodic labeling. The initial prosodic labels are obtained by unsupervised clustering using acoustic features designed according to the characteristics of the prosodic descriptor to be labeled. Then, CD-HMMs of the spectral parameters, F0s, and phone durations are estimated using the initial prosodic labels, in a manner similar to HMM-based parametric speech synthesis. These labels are further updated by Viterbi decoding under the maximum likelihood criterion, given the acoustic feature sequences and the trained CD-HMMs. The model training and prosodic labeling procedures are conducted iteratively until convergence. The performance of the proposed method is evaluated on Mandarin speech synthesis databases, and two prosodic descriptors are investigated: the prosodic phrase boundary and the emphasis expression. In our implementation, the prosodic phrase boundary labels are initialized by clustering the durations of the pauses between every two consecutive prosodic words, and the emphasis expression labels are initialized by examining the differences between the original and synthetic F0 trajectories. Experimental results show that the proposed method labels prosodic phrase boundary positions much more accurately than the text-analysis-based method, without requiring any manually labeled training data. The unit selection speech synthesis system constructed using the prosodic phrase boundary labels generated by our method achieves performance similar to that of the system using manual labels.
Furthermore, the unit selection speech synthesis system constructed using the emphasis expression labels generated by our proposed method can convey the emphasis information effectively while maintaining the naturalness of synthetic speech.
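The initialization of the phrase boundary labels can be illustrated with a small sketch. The abstract does not specify the clustering algorithm, so the code below assumes a simple 1-D k-means with two clusters over pause durations; `label_phrase_boundaries` and the example durations are hypothetical, and the subsequent CD-HMM training and Viterbi relabeling are not shown.

```python
def label_phrase_boundaries(pauses, iters=50):
    """Hypothetical sketch: split pause durations (seconds) between consecutive
    prosodic words into two clusters with 1-D k-means (k=2); pauses in the
    longer-duration cluster are labeled as prosodic phrase boundaries."""
    lo, hi = min(pauses), max(pauses)  # initial centroids at the extremes
    for _ in range(iters):
        short = [p for p in pauses if abs(p - lo) <= abs(p - hi)]
        long_ = [p for p in pauses if abs(p - lo) > abs(p - hi)]
        new_lo = sum(short) / len(short) if short else lo
        new_hi = sum(long_) / len(long_) if long_ else hi
        if (new_lo, new_hi) == (lo, hi):
            break  # centroids stopped moving
        lo, hi = new_lo, new_hi
    return [abs(p - lo) > abs(p - hi) for p in pauses]


pauses = [0.0, 0.01, 0.02, 0.25, 0.3, 0.0]
print(label_phrase_boundaries(pauses))  # [False, False, False, True, True, False]
```

In the proposed method these labels would only be a starting point; they are then refined by the iterative CD-HMM training and Viterbi decoding.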
Kou TANAKA Tomoki TODA Graham NEUBIG Sakriani SAKTI Satoshi NAKAMURA
This paper presents an electrolaryngeal (EL) speech enhancement method capable of significantly improving the naturalness of EL speech without degrading its intelligibility. An electrolarynx is an external device that artificially generates excitation sounds to enable laryngectomees to produce EL speech. Although proficient laryngectomees can produce quite intelligible EL speech, it sounds very unnatural due to the mechanical excitation produced by the device. Moreover, the excitation sounds produced by the device often leak outside and are added to EL speech as noise. To address these issues, there are two main conventional approaches to EL speech enhancement: noise reduction and statistical voice conversion (VC). The former usually causes no degradation in intelligibility but yields only small improvements in naturalness, as the mechanical excitation sounds remain essentially unchanged. The latter significantly improves the naturalness of EL speech using spectral and excitation parameters of natural voices converted from acoustic parameters of EL speech, but it usually degrades intelligibility owing to conversion errors. We propose a hybrid approach that uses a noise reduction method to enhance spectral parameters and a statistical VC method to predict excitation parameters. Moreover, we modify the prediction process of the excitation parameters to improve its prediction accuracy and reduce adverse effects caused by unvoiced/voiced prediction errors. The experimental results demonstrate that the proposed method yields significant improvements in naturalness compared with EL speech while maintaining sufficiently high intelligibility.
Satoshi TAKAYA Hiroaki IKEDA Makoto NAGATA
A three-dimensional (3D) chip stack featuring a 4096-bit wide I/O demonstrator incorporates an in-place waveform capturer on an intermediate interposer within the stack. The capturer includes probing channels on signaling paths as well as in power delivery, and collects analog waveforms for diagnosing circuits within the 3D integration. In-place waveform collection on vertical channels with through-silicon vias (TSVs) is demonstrated for 128 vertical I/O channels distributed in 8 banks in a 9.9 mm × 9.9 mm die area. The analog waveforms confirm a full 1.2-V signaling swing at the maximum data transmission bandwidth of 100 GByte/s, with sufficiently small deviations of signal skew and slew among the vertical channels. It is also experimentally confirmed that the signal swing can be reduced to 0.75 V for error-free data transfer at 100 GByte/s, achieving an energy efficiency of 0.21 pJ/bit.
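As a quick consistency check on the reported figures, assuming decimal units (1 GByte = 10^9 bytes) and that the 0.21 pJ/bit applies to the full aggregate bandwidth, the implied signaling power is:

```latex
\begin{aligned}
100~\text{GByte/s} \times 8~\text{bit/Byte} &= 800~\text{Gbit/s},\\
P &\approx 0.21~\text{pJ/bit} \times 800~\text{Gbit/s} = 168~\text{mW}.
\end{aligned}
```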
Ce LIANG Xiyan SUN Yuanfa JI Qinghua LIU Guisheng LIAO
The composite binary offset carrier (CBOC) modulated signal has multiple peaks in its auto-correlation function, which introduces ambiguity into the signal acquisition process of a GNSS receiver. Currently, most traditional ambiguity-removing schemes for CBOC signal acquisition approximate the CBOC signal as a BOC signal, which may degrade performance. Based on the Galileo E1 CBOC signal, this paper proposes a novel adaptive ambiguity-removing acquisition scheme that does not adopt the approximation used in traditional schemes. According to the energy ratio of each sub-code of the CBOC signal, the proposed scheme can self-adjust its local reference code to achieve unambiguous and precise signal synchronization. Monte Carlo simulations are conducted to analyze the performance of the proposed scheme and three traditional schemes. Simulation results show that the proposed scheme has a higher detection probability and a shorter mean acquisition time than the other three schemes, which verifies its superiority.
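The energy ratio mentioned above comes from the structure of the Galileo E1-B CBOC(6,1,1/11) sub-carrier, which (per the Galileo OS SIS ICD) combines a BOC(1,1) component carrying 10/11 of the power with a BOC(6,1) component carrying 1/11. The sketch below builds that sub-carrier, spreads a random ±1 code with it, and evaluates the circular auto-correlation, exposing the large side lobe at half a chip that makes acquisition ambiguous. It only illustrates the signal model, not the proposed adaptive acquisition scheme; the sampling rate of 12 samples per chip, the random stand-in code, and the function names are assumptions.

```python
import math
import random

ALPHA = math.sqrt(10 / 11)  # BOC(1,1) amplitude (10/11 of the power)
BETA = math.sqrt(1 / 11)    # BOC(6,1) amplitude (1/11 of the power)
SPC = 12                    # samples per chip: one BOC(6,1) half-period each


def cboc_chip(sign_b=1):
    """One chip of the CBOC sub-carrier (sign_b=+1 for E1-B, -1 for E1-C)."""
    chip = []
    for k in range(SPC):
        boc11 = 1 if k < SPC // 2 else -1  # one full 1.023-MHz square-wave cycle
        boc61 = 1 if k % 2 == 0 else -1    # six full 6.138-MHz cycles
        chip.append(ALPHA * boc11 + sign_b * BETA * boc61)
    return chip


def cboc_signal(code, sign_b=1):
    """Spread a +/-1 spreading code with the CBOC sub-carrier."""
    chip = cboc_chip(sign_b)
    return [c * s for c in code for s in chip]


def circular_acf(x, lag):
    """Normalized circular auto-correlation at an integer sample lag."""
    n = len(x)
    return sum(x[i] * x[(i + lag) % n] for i in range(n)) / n


rng = random.Random(1)
code = [rng.choice((-1, 1)) for _ in range(1023)]  # stand-in for a PRN code
sig = cboc_signal(code)
for lag in range(0, SPC + 1, 3):  # lags in samples (12 samples = 1 chip)
    print(f"{lag / SPC:5.2f} chip: {circular_acf(sig, lag):+.3f}")
```

The main peak at zero lag is 1 because the two components' powers sum to 1; a reference code that matches only the BOC(1,1) part, as in the approximation criticized above, would ignore the 1/11 share carried by BOC(6,1).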
Yuhu CHENG Xuesong WANG Ge CAO
A multi-source Tri-Training transfer learning algorithm is proposed by integrating transfer learning and semi-supervised learning. First, multiple weak classifiers are trained using both weighted source and target training samples. Then, based on the idea of co-training, each target test sample is labeled by the trained weak classifiers, and samples that receive the same label from these classifiers are selected as high-confidence samples and added to the target training set. Finally, a target domain classifier is obtained from the updated target training set. These steps are iterated until the high-confidence samples selected in two successive iterations are the same. At each iteration, the source training samples are tested by the target domain classifier; correctly classified samples continue to be used for training, while the weights of misclassified samples are lowered. Experimental results on a text classification dataset demonstrate the effectiveness and superiority of the proposed algorithm.
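The high-confidence selection and source re-weighting steps can be sketched as follows. The down-weighting factor, the requirement that all weak classifiers agree, and the function names are assumptions made for illustration; training of the weak classifiers themselves is left out.

```python
def select_high_confidence(predictions):
    """predictions[j][i] is weak classifier j's label for target test sample i.
    Returns {sample index: label} for samples on which all classifiers agree."""
    selected = {}
    for i, labels in enumerate(zip(*predictions)):
        if all(label == labels[0] for label in labels):
            selected[i] = labels[0]
    return selected


def reweight_sources(weights, correct, factor=0.5):
    """Lower the weights of source samples misclassified by the target-domain
    classifier; correctly classified samples keep their weights."""
    return [w if ok else w * factor for w, ok in zip(weights, correct)]


preds = [[0, 1, 1, 0],
         [0, 1, 0, 0],
         [0, 1, 1, 1]]
print(select_high_confidence(preds))                            # {0: 0, 1: 1}
print(reweight_sources([1.0, 1.0, 2.0], [True, False, False]))  # [1.0, 0.5, 1.0]
```

An outer loop would stop once `select_high_confidence` returns the same set in two successive iterations, matching the stopping rule described above.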
Dajuan FAN Zhiqiu HUANG Lei TANG
One of the most important problems in web service applications is the integration of different existing services into a new composite service. Existing work has the following disadvantages: (i) developers are often required to provide a composite service model first and perform formal verification to check whether the model is correct, which makes the synthesis of composite services semi-automatic, complex, and inefficient; (ii) there is no assurance that composite services synthesized by fully automatic approaches are correct; (iii) some approaches only handle simple composition problems in which existing services are atomic. To address these problems, we propose a correctness-assured approach for automatically synthesizing composite services based on a finite state machine model. The syntax and semantics of the requirement model used to specify composition requirements are also proposed. Given a set of abstract BPEL descriptions of existing services and a composition requirement, our approach automatically generates the BPEL implementation of the composite service. Compared with existing approaches, the composite service generated by our approach is guaranteed to be correct and does not require any formal verification. The correctness of our approach is proved. Moreover, a case analysis indicates that our approach is feasible and effective.
Ryo AIHARA Ryoichi TAKASHIMA Tetsuya TAKIGUCHI Yasuo ARIKI
This paper presents a voice conversion (VC) technique for noisy environments based on a sparse representation of speech. Sparse representation-based VC using non-negative matrix factorization (NMF) is employed for noise-added spectral conversion between different speakers. In our previous exemplar-based VC method, source and target exemplars are extracted from parallel training data in which the source and target speakers utter the same texts. The input source signal is represented using the source exemplars and their weights. Then, the converted speech is constructed from the target exemplars and the weights related to the source exemplars. However, this exemplar-based approach needs to hold all training exemplars (frames), and it requires long computation times to obtain the weights of the source exemplars. In this paper, we propose a framework to train the basis matrices of the source and target exemplars so that they share a common weight matrix. By using the basis matrices instead of the exemplars, VC is performed with lower computation times than with the exemplar-based method. The effectiveness of this method was confirmed in speaker conversion experiments using noise-added speech data, by comparison with an exemplar-based method and a conventional Gaussian mixture model (GMM)-based method.
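The conversion step with shared weights can be sketched in a few lines: the activation (weight) matrix is estimated for the input using only the source basis matrix, and the converted spectrum is obtained by multiplying the target basis matrix by the same activations. This sketch assumes the jointly trained basis matrices are already available, uses the standard Lee-Seung multiplicative update for the KL divergence with the basis held fixed (the paper's exact cost function and any sparsity constraint may differ), and uses plain lists to stay dependency-free; all names and the toy matrices are hypothetical.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def activations_kl(V, W, iters=500, eps=1e-12):
    """Estimate nonnegative H with V ~= W H while keeping the basis W fixed,
    using multiplicative updates that decrease the KL divergence."""
    K, J = len(W[0]), len(V[0])
    H = [[1.0] * J for _ in range(K)]
    Wt = transpose(W)
    basis_sums = [sum(row) for row in Wt]  # sum over frequency bins of each basis
    for _ in range(iters):
        WH = matmul(W, H)
        ratio = [[v / (wh + eps) for v, wh in zip(vr, whr)] for vr, whr in zip(V, WH)]
        num = matmul(Wt, ratio)
        H = [[H[k][j] * num[k][j] / (basis_sums[k] + eps) for j in range(J)]
             for k in range(K)]
    return H


def convert(V_src, W_src, W_tgt):
    """Shared-activation conversion: fit H with the source basis, then reuse
    it with the target basis."""
    return matmul(W_tgt, activations_kl(V_src, W_src))


# Toy example: 3 frequency bins, 2 bases, 2 frames.
W_src = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W_tgt = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
H_true = [[2.0, 0.5], [1.0, 3.0]]
V_src = matmul(W_src, H_true)
print(convert(V_src, W_src, W_tgt))  # approximately [[4, 1], [2, 6], [3, 3.5]]
```

Because only a few bases are fitted instead of every training exemplar, the activation estimation is cheaper, which is the computational advantage the abstract describes.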
Keigo KUBO Sakriani SAKTI Graham NEUBIG Tomoki TODA Satoshi NAKAMURA
Grapheme-to-phoneme (g2p) conversion, used to estimate the pronunciations of out-of-vocabulary (OOV) words, is a highly important part of speech recognition systems as well as text-to-speech systems. The current state-of-the-art approach to g2p conversion is structured learning based on the Margin Infused Relaxed Algorithm (MIRA), an online discriminative training method for multiclass classification. However, MIRA's aggressive weight update is known to be prone to overfitting, because it updates the weights aggressively even when the current example is an outlier or noisy. Adaptive Regularization of Weight Vectors (AROW) was proposed to resolve this problem for binary classification. In addition, AROW's update rule is simpler than that of MIRA, allowing more efficient training. Although AROW has these advantages, it has not yet been applied to g2p conversion. In this paper, we apply AROW to the g2p conversion task, which is a structured learning problem. In an evaluation on a dataset generated from collective knowledge on the Web, our proposed approach achieves a 6.8% error reduction rate compared with MIRA in terms of phoneme error rate. In addition, the learning time of our approach was shorter than that of MIRA on almost all datasets.
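For reference, the binary AROW update of Crammer, Kulesza, and Dredze keeps a mean weight vector and a covariance matrix and scales each step by the current confidence, so repeated or noisy examples move the weights less over time. The sketch below is that binary update with a full covariance matrix; it is not the structured g2p variant proposed in the paper, and the regularization parameter `r` and the function name are illustrative choices.

```python
def arow_update(w, sigma, x, y, r=1.0):
    """One AROW step for a binary label y in {-1, +1}.

    w: mean weight vector, sigma: covariance matrix (list of lists),
    x: feature vector, r: regularization parameter (> 0)."""
    n = len(x)
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    if margin >= 1.0:
        return w, sigma  # hinge loss is zero: no update
    sx = [sum(sigma[i][j] * x[j] for j in range(n)) for i in range(n)]  # Sigma x
    confidence = sum(xi * sxi for xi, sxi in zip(x, sx))  # x^T Sigma x
    beta = 1.0 / (confidence + r)
    alpha = (1.0 - margin) * beta
    new_w = [wi + alpha * y * sxi for wi, sxi in zip(w, sx)]
    # Sigma <- Sigma - beta * (Sigma x)(Sigma x)^T  (Sigma is symmetric)
    new_sigma = [[sigma[i][j] - beta * sx[i] * sx[j] for j in range(n)]
                 for i in range(n)]
    return new_w, new_sigma


w, s = [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]
for step in range(3):
    w, s = arow_update(w, s, [1.0, 0.0], 1)
    print(step, w[0], s[0][0])  # the step size shrinks as confidence grows
```

Unlike a MIRA-style update, which moves the weights as far as needed to satisfy the margin, the covariance term here damps updates along features that have already been seen often.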
Honggyu JUNG Kwang-Yul KIM Yoan SHIN
We propose a cooperative compressed spectrum sensing scheme for correlated signals in wideband cognitive radio networks. To design a reconstruction algorithm that accurately recovers the wideband signals from the compressed samples in low-SNR (signal-to-noise ratio) environments, we consider the multiple measurement vector model exploiting a sequence of input signals, and we propose a cooperative sparse Bayesian learning algorithm that models the temporal correlation of the input signals. Simulation results show that the proposed scheme outperforms existing compressed sensing algorithms at low SNRs.
Masato NAKAMURA Junya SEKIKAWA
Break arcs are generated in a 48-V DC, 12-A resistive circuit. Silver electrical contacts are separated at a constant opening speed. The cathode contact surface is irradiated by a blue LED whose emission has a center wavelength of 470 nm, a wavelength at which the light emitted from the break arcs has no spectral line. Therefore, one high-speed camera fitted with an optical band-pass filter observes only images of the contact surface, while another high-speed camera observes only images of the break arc. With these cameras, the time evolution of the cathode surface morphology as it is eroded by the break arcs and the motion of the break arcs are observed simultaneously. The images of the cathode surface are investigated using an image analysis technique. The results reveal the moments at which expanded regions are formed on the cathode surface during the break arcs. In addition, it is shown that the expanded regions are not in direct contact with the cathode roots of the break arcs.
I-Jen CHAO Ching-Wen HOU Bin-Da LIU Soon-Jyh CHANG Chun-Yueh HUANG
A third-order low-distortion delta-sigma modulator (DSM), whose third-order noise-shaping ability is achieved with just a single opamp, is proposed. Since only one amplifier is required in the whole circuit, the designed DSM is very power efficient. To realize the adder in front of the quantizer without employing a power-hungry opamp, a capacitive passive adder, which is the digital-to-analog converter (DAC) array of a successive-approximation-type quantizer, is used. In addition, the feedback path timing is extended from a nonoverlapping interval, as in the conventional low-distortion structure, to half of the clock period, which resolves the strict timing constraints on quantization and the dynamic element matching (DEM) logic operation. In the proposed DSM structure, the unity-gain signal transfer function (STF) and finite-impulse-response (FIR) noise transfer function (NTF) are preserved, so advantages such as a relaxed opamp slew rate and reduced output swing are maintained, as in the conventional low-distortion DSM. Moreover, the memory effect in the proposed DSM is analyzed when opamp sharing is employed for the integrators. The proposed third-order DSM with a 4-bit SAR ADC as the quantizer is implemented in a 90-nm CMOS process. Post-layout simulations show a 79.8-dB signal-to-noise and distortion ratio (SNDR) in a 1.875-MHz signal bandwidth (OSR=16). The active area of the circuit is 0.35 mm² and the total power consumption is 2.85 mW, resulting in a figure of merit (FOM) of 95 fJ/conversion-step.
Shoko YAMAHATA Yoshikazu YAMAGUCHI Atsunori OGAWA Hirokazu MASATAKI Osamu YOSHIOKA Satoshi TAKAHASHI
Recognition errors caused by out-of-vocabulary (OOV) words lead to critical problems when developing spoken language understanding systems based on automatic speech recognition technology, and automatic vocabulary adaptation is an essential technique for solving these problems. In this paper, we propose a novel and effective automatic vocabulary adaptation method. Our method selects OOV words from relevant documents using combined scores of semantic and acoustic similarities. Because this combined score reflects both semantic and acoustic aspects, only necessary OOV words are selected and redundant words are not registered. In addition, our method estimates the probabilities of OOV words using semantic similarity and a class-based N-gram language model. These probabilities are expected to be appropriate because they are estimated by considering both the frequencies of OOV words in the target speech data and the stable class N-gram probabilities. Experimental results show that our method improves OOV selection accuracy and the recognition accuracy of newly registered words compared with conventional methods.
Kazuto OGAWA Go OHTAKE Arisa FUJII Goichiro HANAOKA
For the sake of privacy preservation, services that are tailored to individual user preferences should be provided with a sufficient degree of anonymity. We surveyed various tools that meet the requirements of such services and concluded that group signature schemes with weakened anonymity (without unlinkability) are adequate. We then investigated the theoretical gap between the unlinkability of group signature schemes and their other requirements, and we show that this gap is significantly large. Specifically, we clarify that if unlinkability can be achieved from any other property of group signature schemes, then a chosen-ciphertext secure cryptosystem can be constructed from any one-way function. This result implies that the efficiency of group signature schemes can be drastically improved if unlinkability is not required. We also demonstrate how to construct a scheme without unlinkability that is significantly more efficient than the best known full-fledged scheme.
Masahiro FUKUI Shigeaki SASAKI Yusuke HIWASAKI Kimitaka TSUTSUMI Sachiko KURIHARA Hitoshi OHMURO Yoichi HANEDA
We propose a new adaptive spectral masking method of algebraic vector quantization (AVQ) for non-sparse signals in the modified discrete cosine transform (MDCT) domain. We also propose switching the adaptive spectral masking on and off depending on whether the target signal is non-sparse, based on the results of MDCT-domain sparseness analysis. When the target signal is categorized as non-sparse, the masking level of the target MDCT coefficients is adaptively controlled using spectral envelope information. The performance of the proposed method, as part of ITU-T G.711.1 Annex D, is evaluated in comparison with conventional AVQ. Subjective listening test results show that the proposed method improves sound quality by more than 0.1 points on average on a five-point scale for speech, music, and mixed content, which indicates a significant improvement.
Bin YAO Hua WU Yun YANG Yuyan CHAO Atsushi OHTA Haruki KAWANAKA Lifeng HE
The Euler number of a binary image is an important topological property for pattern recognition and can be calculated by counting certain bit-quads in the image. This paper proposes an efficient strategy for improving the bit-quad-based Euler number computing algorithm. By using the information obtained when processing the previous bit-quad, the number of pixel checks needed to process a bit-quad decreases from 4 to 2. Experiments demonstrate that an algorithm using our strategy significantly outperforms conventional Euler number computing algorithms.
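The bit-quad formulas (due to Gray) give the Euler number from counts of 2×2 windows containing exactly one foreground pixel, n(Q1), exactly three, n(Q3), or a diagonal pair, n(QD): E = (n(Q1) - n(Q3) + 2n(QD))/4 for 4-connectivity and E = (n(Q1) - n(Q3) - 2n(QD))/4 for 8-connectivity. The sketch below scans the zero-padded image with a sliding 2×2 window that reuses the previous window's right column as its left column, so only two new pixels are read per bit-quad. It illustrates the reuse idea only; the paper's strategy may exploit the previous bit-quad differently, and `euler_number` is a hypothetical name.

```python
def euler_number(img, connectivity=8):
    """Euler number of a binary image (list of rows of 0/1) via bit-quads."""
    h, w = len(img), len(img[0])

    def px(r, c):  # zero padding outside the image
        return img[r][c] if 0 <= r < h and 0 <= c < w else 0

    q1 = q3 = qd = 0
    for r in range(-1, h):          # quad covers rows r and r + 1
        tl, bl = 0, 0               # left column of the first quad is padding
        for c in range(-1, w):      # quad covers columns c and c + 1
            # Only the right-hand pair is read; the left pair is reused from
            # the previous quad (2 pixel reads instead of 4).
            tr, br = px(r, c + 1), px(r + 1, c + 1)
            s = tl + tr + bl + br
            if s == 1:
                q1 += 1
            elif s == 3:
                q3 += 1
            elif s == 2 and tl == br:   # diagonal pair: 10/01 or 01/10
                qd += 1
            tl, bl = tr, br
    sign = 1 if connectivity == 4 else -1
    return (q1 - q3 + sign * 2 * qd) // 4


ring = [[1, 1, 1],
        [1, 0, 1],
        [1, 1, 1]]
print(euler_number(ring))  # 0: one object with one hole
diag = [[1, 0],
        [0, 1]]
print(euler_number(diag, 8), euler_number(diag, 4))  # 1 2
```

The diagonal example shows why the QD term changes sign: the two pixels form one object under 8-connectivity but two objects under 4-connectivity.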
Shinobu MIWA Takara INOUE Hiroshi NAKAMURA
Turbo mode, which accelerates many applications without major changes to existing systems, is widely used in commercial processors. Since the duration and intensity of turbo mode depend on the peak temperature of a processor chip, reducing the peak temperature can reinforce turbo mode. This paper shows that adding a small amount of hardware allows microprocessors to reduce the peak temperature drastically and thereby reinforce turbo mode. Our approach is to identify a few small units that become heat sources in a processor and to duplicate them appropriately to reduce their power density. By duplicating only these units and using the copies evenly, the processor achieves a significant performance improvement while remaining area-efficient. The experimental results show that the proposed method achieves up to 14.5% performance improvement at the cost of a 2.8% area increase.
Mahmoud KESHAVARZI Delaram AMIRI Amir Mansour PEZESHK Forouhar FARZANEH
This letter presents a novel sparsity-based method for deinterleaving pulse trains. The proposed method models the deinterleaving problem as an underdetermined system of linear equations. After determining the mixing matrix, we find the sparsest solution of this underdetermined system using basis pursuit denoising. This method is superior to previous ones in several respects. First, spurious and missing pulses do not degrade the performance of the algorithm. Second, the algorithm works well regardless of the type of pulse repetition interval modulation used. Third, the proposed method is able to separate similar sources.
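Basis pursuit denoising solves min_x (1/2)||Ax - b||^2 + lambda ||x||_1, whose l1 penalty favors sparse solutions of an underdetermined system. One simple solver is the iterative shrinkage-thresholding algorithm (ISTA), sketched below on a toy 2×3 system. The paper does not state which solver it uses, the construction of the pulse-train mixing matrix is not shown, and the matrix, lambda, and iteration count are illustrative assumptions.

```python
def soft_threshold(v, t):
    """Proximal operator of t*|v|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def bpdn_ista(A, b, lam, iters=10000):
    """Minimize 0.5*||A x - b||^2 + lam*||x||_1 with ISTA."""
    m, n = len(A), len(A[0])
    # Step 1/L, where L (squared Frobenius norm) upper-bounds the largest
    # eigenvalue of A^T A, which guarantees convergence.
    L = sum(a * a for row in A for a in row)
    x = [0.0] * n
    for _ in range(iters):
        r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
        grad_step = [x[j] + sum(A[i][j] * r[i] for i in range(m)) / L for j in range(n)]
        x = [soft_threshold(v, lam / L) for v in grad_step]
    return x


# Two equations, three unknowns; the sparsest exact solution is [0, 0, 2].
A = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
b = [2.0, 2.0]
print([round(v, 3) for v in bpdn_ista(A, b, lam=0.01)])  # close to [0, 0, 2]
```

The recovered third entry is slightly below 2 because the l1 penalty shrinks nonzero coefficients; a smaller lambda reduces this bias at the cost of less robustness to noise.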
Daichi KITAMURA Hiroshi SARUWATARI Kosuke YAGI Kiyohiro SHIKANO Yu TAKAHASHI Kazunobu KONDO
In this letter, we address monaural source separation based on supervised nonnegative matrix factorization (SNMF) and propose a new penalized SNMF. Conventional SNMF often suffers degraded separation performance owing to the basis-sharing problem. Our penalized SNMF forces the nontarget bases to become different from the target bases, which improves the quality of the separated sound.