Hiroyuki UZAWA Kazuhiko TERADA Koyo NITTA
The power consumption of optical network units (ONUs) is a major issue in optical access networks. The downstream buffer is one of the largest power consumers among the functional blocks of an ONU. A cyclic sleep scheme for reducing power has been reported, which periodically powers off not only the downstream buffer but also other components, such as optical transceivers, when the idle period is long. However, when the idle period is short, it cannot power off those components even if the input data rate is low. Therefore, as continuous traffic, such as video, increases, the power-reduction effect decreases. To resolve this issue, we propose another sleep scheme in which the downstream buffer can be partially powered off by cooperative operation with an optical line terminal. Simulation and experimental results indicate that the proposed scheme reduces ONU power consumption without causing frame loss even while the ONU continuously receives traffic and the idle period is short.
Hongbing LI Qunfei ZHANG Weike FENG
A novel matrix completion ESPRIT (MC-ESPRIT) algorithm is proposed to estimate the direction of arrival (DOA) with nonuniform linear arrays (NLAs). By exploiting matrix completion theory and the properties of the Hankel matrix, the received data matrix of an NLA is transformed into a two-fold Hankel matrix, which is amenable to matrix completion. The decision variable can then be reconstructed by the inexact augmented Lagrange multiplier method. This approach yields a completed data matrix that is the same as the data matrix of a uniform linear array (ULA), so an ESPRIT-type algorithm can be used to estimate the DOA. MC-ESPRIT can resolve more signals than MUSIC-type algorithms with an NLA. Furthermore, unlike the existing virtual interpolated array ESPRIT (VIA-ESPRIT), the proposed algorithm does not need to divide the field of view of the array. Simulation results confirm the effectiveness of MC-ESPRIT.
Xiang ZHAO Zishu HE Yikai WANG Yuan JIANG
This letter addresses the problem of space-time adaptive processing (STAP) for airborne nonuniform linear array (NLA) radar using a generalized sidelobe canceller (GSC). Because the spatial nulls of NLAs are difficult to determine, obtaining a valid blocking matrix (BM) for the GSC directly is problematic. To solve this problem and improve STAP performance, a BM modification method based on the modified Gram-Schmidt orthogonalization algorithm is proposed. The modified GSC processor can achieve the optimal STAP performance as well as a faster convergence rate than the orthogonal subspace projection method. Numerical simulations validate the effectiveness of the proposed method.
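The orthogonalization step named in this abstract can be illustrated generically. The following is a minimal sketch of modified Gram-Schmidt on complex vectors, not the authors' BM modification procedure, which the abstract does not detail; the function name `modified_gram_schmidt` and the tolerance `tol` are assumptions made for illustration.

```python
import math


def modified_gram_schmidt(vectors, tol=1e-12):
    """Orthonormalize a list of (possibly complex) vectors.

    Hypothetical helper: vectors that are linearly dependent on the
    earlier ones (norm below `tol` after projection) are skipped.
    """
    work = [list(v) for v in vectors]
    basis = []
    for i in range(len(work)):
        norm = math.sqrt(sum(abs(x) ** 2 for x in work[i]))
        if norm < tol:
            continue  # nothing left of this vector after earlier projections
        q = [x / norm for x in work[i]]
        basis.append(q)
        # "Modified" variant: remove the q component from every remaining
        # vector right away, instead of projecting against the originals.
        for j in range(i + 1, len(work)):
            coeff = sum(qk.conjugate() * wk for qk, wk in zip(q, work[j]))
            work[j] = [wk - coeff * qk for qk, wk in zip(q, work[j])]
    return basis


def inner(u, v):
    """Hermitian inner product u^H v."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


columns = [[1, 1, 0], [1, 0, 1], [0, 1, 1]]
basis = modified_gram_schmidt(columns)
```

Updating the remaining vectors immediately is what distinguishes the modified variant from classical Gram-Schmidt, and it is generally better behaved numerically when the input vectors are nearly dependent.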
Richeng DUAN Tatsuya KAWAHARA Masatake DANTSUJI Jinsong ZHANG
Aiming at detecting pronunciation errors produced by second language learners and providing corrective feedback related to articulation, we address effective articulatory models based on deep neural networks (DNNs). Articulatory attributes are defined for manner and place of articulation. In order to efficiently train these models of non-native speech without such data, which is difficult to collect on a large scale, several transfer-learning-based modeling methods are explored. We first investigate three closely related secondary tasks that aim at effective learning of DNN articulatory models. We also propose to exploit large speech corpora of the native and target languages to model inter-language phenomena. This kind of transfer learning can provide a better feature representation of non-native speech. Related-task transfer and language transfer learning are further combined at the network level. Compared with the conventional DNN used as the baseline, all proposed methods improved the performance. In the native attribute recognition task, the network-level combination method reduced the recognition error rate by more than 10% relative for all articulatory attributes. The method was also applied to pronunciation error detection in Mandarin Chinese pronunciation learning by native Japanese speakers, and achieved relative improvements of up to 17.0% in detection accuracy and up to 19.9% in F-score, which is also better than the lattice-based combination.
Thamarak KHAMPEERPAT Chaiporn JAIKAEO
Wireless sensor networks are being used in many disaster-related applications. Certain types of disasters are studied and modeled with different and dynamic risk estimations in different areas, hence requiring different levels of monitoring. Such nonuniform and dynamic coverage requirements pose a challenge to the sensor coverage problem. This work proposes the Mobile sensor Relocation using Delaunay triangulation And Shifting on Hill climbing (MR-DASH) approach, which calculates an appropriate location for each mobile sensor in an attempt to maximize the coverage ratio. Based on a probabilistic sensing model, it constructs a Delaunay triangulation from the static sensors' locations and the vertices of the regions of interest. The resulting triangles are then prioritized based on their sizes and corresponding levels of requirement so that mobile sensors can be relocated accordingly. The proposed method was compared with existing work and demonstrated on real-world disaster scenarios by simulation. The results showed that MR-DASH gives appropriate target locations that significantly improve the coverage ratio with a relatively low total sensor moving distance, while properly adapting to variations in coverage requirements.
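The prioritization of triangles by size and requirement level can be sketched as follows. The abstract does not say how the two quantities are combined, so the score used here (area multiplied by a requirement weight) is purely an assumption, as are the names `triangle_area` and `prioritize`; the triangles are taken as given rather than computed by a Delaunay routine.

```python
def triangle_area(p, q, r):
    """Area of a triangle from three (x, y) vertices (shoelace formula)."""
    return abs((q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1])) / 2.0


def prioritize(triangles, requirement):
    """Return triangle indices ordered by score, highest first.

    Assumed score: area * requirement level. The source only states that
    both size and requirement level influence the priority.
    """
    scores = [triangle_area(*tri) * requirement[i] for i, tri in enumerate(triangles)]
    return sorted(range(len(triangles)), key=lambda i: scores[i], reverse=True)


# Three right triangles with areas 8, 2 and 4.5 and hypothetical weights.
triangles = [
    ((0, 0), (4, 0), (0, 4)),
    ((0, 0), (2, 0), (0, 2)),
    ((0, 0), (3, 0), (0, 3)),
]
requirement = [0.2, 1.0, 0.5]
order = prioritize(triangles, requirement)
```

With these weights the largest triangle ranks last, which illustrates how a low requirement level can outweigh a large uncovered area under the assumed score.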
Meixu SONG Jielin PAN Qingwei ZHAO Yonghong YAN
Introducing pronunciation models into decoding has been proven to benefit LVCSR. In this paper, a discriminative pronunciation modeling method is presented within the framework of Minimum Phone Error (MPE) training for HMM/GMM. In order to bring the pronunciation models into MPE training, the auxiliary function is rewritten at the word level and decomposed into two parts: one for co-training the acoustic models, and the other for discriminatively training the pronunciation models. On a Mandarin conversational telephone speech recognition task, compared to the baseline using a canonical lexicon, the discriminative pronunciation models reduced the Character Error Rate (CER) by 0.7% absolute on the LDC test set, and with acoustic model co-training, an additional 0.8% CER decrease was achieved.
Quang Thang DUONG Shinsuke IBI Seiichi SAMPEI
This paper studies channel sounding for selfish dynamic spectrum control (S-DSC) in which each link dynamically maps its spectral components onto a necessary amount of discrete frequencies having the highest channel gain of the common system band. In S-DSC, it is compulsory to conduct channel sounding for the entire system band by using a reference signal whose spectral components are sparsely allocated by S-DSC. Using nonuniform sampling theory, this paper exploits the finite impulse response characteristic of frequency selective fading channels to carry out the channel sounding. However, when the number of spectral components is relatively small compared to the number of discrete frequencies of the system band, reliability of the channel sounding deteriorates severely due to the ill-conditioned problem and degradation in channel capacity of the next frame occurs as a result. Aiming at balancing frequency selection diversity effect and reliability of channel sounding, this paper proposes an S-DSC which allocates an appropriate number of spectral components onto discrete frequencies with low predicted channel gain besides mapping the rest onto those with high predicted channel gain. A numerical analysis confirms that the proposed S-DSC gives significant enhancement in channel capacity performance.
Junbo ZHANG Fuping PAN Bin DONG Qingwei ZHAO Yonghong YAN
In this paper, we present a novel method for automatic pronunciation quality assessment. Unlike the popular “Goodness of Pronunciation” (GOP) method, this method does not map the decoding confidence into a pronunciation quality score, but directly differentiates utterances of different pronunciation quality. In this method, the student's utterance needs to be decoded twice. The first decoding pass obtains the time boundaries of each phone of the utterance by forced alignment using a conventionally trained acoustic model (AM). The second decoding pass differentiates the pronunciation quality of each triphone using a specially trained AM, in which triphones of different pronunciation quality are trained as different units, and the model is trained discriminatively to ensure the best discrimination among triphones that have the same name but different pronunciation quality scores. The decoding network in the second pass includes triphones of different pronunciation quality, so the phone-level scores can be obtained directly from the decoding result. The phone-level scores are combined into sentence-level scores using the maximum entropy criterion. The experimental results show that the scoring performance improved significantly compared to the GOP method, especially at the sentence level.
Jhih-Chung CHANG Jui-Chung HUNG Ann-Chen CHANG
The letter deals with direction-of-arrival (DOA) estimation under nonuniform white noise and moderately small signal-to-noise ratios. The proposed approach first uses signal subspace projection for received data vectors, which form an efficient iterative quadratic maximum-likelihood (IQML) approach to achieve fast convergence and high resolution capabilities. In conjunction with a signal subspace selection technique, a more exact signal subspace can be obtained for reducing the nonuniform noise effect. The performance improvement achieved by applying the proposal to the classic IQML method is confirmed by computer simulations.
Dean LUO Yu QIAO Nobuaki MINEMATSU Keikichi HIROSE
This study focuses on speaker adaptation techniques for Computer-Assisted Language Learning (CALL). We first investigate the effects and problems of Maximum Likelihood Linear Regression (MLLR) speaker adaptation when used in pronunciation evaluation. Automatic scoring and error detection experiments are conducted on two publicly available databases of Japanese learners' English pronunciation. As we expected, over-adaptation causes misjudgment of pronunciation accuracy. Following the analysis, we propose a novel method, Regularized Maximum Likelihood Regression (Regularized-MLLR) adaptation, to solve the problem of the adverse effects of MLLR adaptation. This method uses a group of teachers' data to regularize learners' transformation matrices so that erroneous pronunciations will not be erroneously transformed as correct ones. We implement this idea in two ways: one is using the average of the teachers' transformation matrices as a constraint to MLLR, and the other is using linear combinations of the teachers' matrices to represent learners' transformations. Experimental results show that the proposed methods can better utilize MLLR adaptation and avoid over-adaptation.
In this Letter, the maximum likelihood (ML) estimator for the parameters of a real sinusoid in additive white Gaussian noise using irregularly spaced samples is derived. The ML frequency estimate is first determined by a one-dimensional search, from which the optimum amplitude and phase estimates are then computed. It is shown that the estimation performance of the ML method can attain the Cramér-Rao lower bound when the signal-to-noise ratio is sufficiently large.
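The two-stage structure described here (a one-dimensional frequency search followed by closed-form amplitude and phase) can be sketched under white Gaussian noise, where ML estimation reduces to least squares. This is an illustrative grid search, not the Letter's derivation; the model A·cos(ωt + φ), the grid, and the name `fit_sinusoid` are assumptions.

```python
import math
import random


def fit_sinusoid(times, samples, omega_grid):
    """Least-squares fit of A*cos(omega*t + phi) to irregularly spaced samples.

    For each candidate omega the model a*cos(omega*t) + b*sin(omega*t) is
    linear in (a, b); the omega whose fit captures the most signal energy
    (equivalently, leaves the smallest residual) is selected.
    """
    best = None
    for omega in omega_grid:
        c = [math.cos(omega * t) for t in times]
        s = [math.sin(omega * t) for t in times]
        cc = sum(v * v for v in c)
        ss = sum(v * v for v in s)
        cs = sum(u * v for u, v in zip(c, s))
        xc = sum(x * v for x, v in zip(samples, c))
        xs = sum(x * v for x, v in zip(samples, s))
        det = cc * ss - cs * cs
        if abs(det) < 1e-12:
            continue  # cos and sin columns are (nearly) dependent here
        # Solve the 2x2 normal equations for (a, b).
        a = (xc * ss - xs * cs) / det
        b = (xs * cc - xc * cs) / det
        energy = a * xc + b * xs  # energy of the projection onto span{c, s}
        if best is None or energy > best[0]:
            best = (energy, omega, a, b)
    _, omega, a, b = best
    # a = A*cos(phi), b = -A*sin(phi)
    return omega, math.hypot(a, b), math.atan2(-b, a)


# Noise-free synthetic check with hypothetical parameters.
random.seed(0)
times = sorted(random.uniform(0.0, 10.0) for _ in range(40))
samples = [1.5 * math.cos(2.0 * t + 0.3) for t in times]
grid = [0.5 + 0.01 * k for k in range(351)]
omega_hat, amp_hat, phase_hat = fit_sinusoid(times, samples, grid)
```

In practice the grid estimate would usually be refined by a local search, since the resolution of the frequency estimate is otherwise limited by the grid spacing.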
In this paper, we propose a hybrid model adaptation approach in which pronunciation and acoustic models are adapted by incorporating the pronunciation and acoustic variabilities of non-native speech in order to improve the performance of non-native automatic speech recognition (ASR). Specifically, the proposed hybrid model adaptation can be performed at either the state-tying or the triphone-modeling level, depending on the level at which acoustic model adaptation is performed. In both methods, we first analyze the pronunciation variant rules of non-native speakers and then classify each rule as either a pronunciation variant or an acoustic variant. The state-tying level hybrid method then adapts the pronunciation models by accommodating the pronunciation variants in the pronunciation dictionary, and the acoustic models by clustering the states of the triphone acoustic models using the acoustic variants. The triphone-modeling level hybrid method, on the other hand, initially adapts the pronunciation models in the same way as the state-tying level hybrid method; for the acoustic model adaptation, however, the triphone acoustic models are re-estimated based on the adapted pronunciation models, and the states of the re-estimated triphone acoustic models are clustered using the acoustic variants. Korean-spoken English speech recognition experiments show that ASR systems employing the state-tying and triphone-modeling level adaptation methods reduce the average word error rates (WERs) for non-native speech by 17.1% and 22.1% relative, respectively, compared to a baseline ASR system.
Changliang LIU Fuping PAN Fengpei GE Bin DONG Hongbin SUO Yonghong YAN
This paper describes a reading miscue detection system based on the conventional Large Vocabulary Continuous Speech Recognition (LVCSR) framework [1]. In order to incorporate knowledge of the reference (what the reader ought to read) and some error patterns into the decoding process, two methods are proposed: Dynamic Multiple Pronunciation Incorporation (DMPI) and Dynamic Interpolation of Language Model (DILM). DMPI dynamically adds some pronunciation variations into the search space to predict reading substitutions and insertions. To resolve the conflict between the coverage of error predictions and the perplexity of the search space, only the pronunciation variants related to the reference are added. DILM dynamically interpolates the general language model based on an analysis of the reference and so keeps the active decoding paths relatively near the reference. It makes the recognition more accurate, which further improves the detection performance. At the final stage of detection, an improved dynamic programming (DP) algorithm is used to align the confusion network (CN) from speech recognition with the reference to generate the detection result. The experimental results show that the two proposed methods decrease the Equal Error Rate (EER) by 14% relative, from 46.4% to 39.8%.
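As a quick check of the reported figure, the relative reduction follows directly from the two EER values quoted in the abstract:

```latex
\[
\frac{46.4 - 39.8}{46.4} = \frac{6.6}{46.4} \approx 0.142 \quad (\text{about } 14\% \text{ relative, or } 6.6 \text{ points absolute})
\]
```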
Hye Kyung LEE Won-Jin YOON Tae-Jin LEE Hyunseung CHOO Min Young CHUNG
The Ethernet passive optical network (EPON), which is one of the PON technologies for realizing FTTx (Fiber-To-The-Curb/Home/Office), is a low-cost and high-speed solution to the bottleneck problem that occurs between a backbone network and end users. The EPON is compatible with existing customer devices that are equipped with an Ethernet card. To effectively control frame transmission from optical network units (ONUs) to an optical line termination (OLT), the EPON can use a multi-point control protocol (MPCP) with control functions in addition to the media access control (MAC) protocol function. In this paper, we propose a two-phase cycle dynamic bandwidth allocation (TCDBA) algorithm that increases channel utilization on the uplink by allowing frame transmissions during computation periods, and we combine the TCDBA algorithm with the queue management schemes performed within each ONU in order to effectively support differentiated services. Additionally, we perform simulations to validate the effectiveness of the proposed algorithm. The results show that the proposed TCDBA algorithm improves the maximum throughput, average transmission delay, and average volume of frames discarded compared with the existing algorithms. Furthermore, the proposed TCDBA algorithm is able to support differentiated quality of service (QoS).
The transmission characteristics of a left-handed (LH) ferrite microstrip line are significantly affected by the nonuniform DC bias magnetic field in the ferrite substrate (internal magnetic field Hin) caused by the inhomogeneous demagnetizing effect, because the strip conductors of these devices must be mounted at the edge of the ferrite substrate. Three-dimensional analyses of the LH ferrite microstrip line are performed taking into account the nonuniform internal magnetic field Hin. The analytical results show that the nonuniform internal magnetic field under the strip conductor near the edge of the ferrite substrate is useful for widening the frequency band of negative permeability and nonreciprocal operation, and for improving both the insertion and return losses of the LH ferrite microstrip line. Measured results of more than 20 dB isolation with 2.2 dB insertion loss and 1.33 GHz bandwidth correspond well to the analytical results.
This paper describes a morpheme-based pronunciation model that is especially useful for developing the pronunciation lexicon for Large Vocabulary Continuous Speech Recognition (LVCSR) in Korean. To address pronunciation variation in Korean, we analyze phonological rules based on phonemic contexts together with morphological category and morpheme boundary information. Since the same phoneme sequence can be pronounced in different ways across a morpheme boundary, the morphological environment must be incorporated into pronunciation variation modeling. We implement a rule-based pronunciation variant generator to produce a pronunciation lexicon with context-dependent multiple variants. At the lexical level, we apply explicit modeling of pronunciation variation to add pronunciation variants across morphemes as well as within morphemes to the pronunciation lexicon. At the acoustic level, we train the phone models with transcriptions re-labeled through forced alignment using the context-dependent pronunciation lexicon. The proposed pronunciation lexicon offers a potential benefit for both training and decoding of an LVCSR system. We then perform a speech recognition experiment on a read speech task with a 34K-morpheme vocabulary. The experiment confirms that improved performance is achieved by pronunciation variation modeling based on morpho-phonological analysis.
Tohru TAMURA Toshifumi SATOH Takayuki UCHIDA Takashi FURUHATA
An analytical approach using human perception has been applied to the evaluation of the front-of-screen (FOS) quality of liquid crystal displays (LCDs), particularly regarding the regions of luminance nonuniformity called "muras." The accurate and consistent inspection of muras is extremely difficult because muras have various shapes and sizes as well as contrasts. Moreover, inspection results during the LCD manufacturing process tend to depend on the inspector. To determine a quantitative scale whose mura evaluation results match human perception, first, we conducted a perception test and clarified the "just noticeable difference" (JND) contrast according to the type of mura. Second, the relationship between the JND contrast of mura and background luminance was investigated. Finally, we proposed a quantitative scale of mura level on the basis of the JND contrasts at various background luminances. In this paper, we describe our research on human perception of muras at various background luminances and an approach to determining the quantitative scale of visible muras.
Ha H. NGUYEN Tyler NECHIPORENKO
This letter considers the signal design problem for quaternary digital communications with nonuniform sources. The designs are considered for both the average and equal energy constraints and for a two-dimensional signal space. A tight upper bound on the bit error probability (BEP) is employed as the design criterion. The optimal quaternary signal sets are presented and their BEP performance is compared with that of standard QPSK and the binary signal set previously designed for nonuniform sources. Results show that a considerable saving in transmitted power can be achieved by the proposed average-energy signal set for a highly nonuniform source.
This paper proposes a new theory and design method for a class of recombination nonuniform filter banks (RNFBs) with linear phase (LP) filters. In a uniform filter bank (FB), consecutive channels are merged by sets of transmultiplexers (TMUXs) to realize a nonuniform FB. RNFBs with LP analysis/synthesis filters are of great interest because the analysis filters for the partially reconstructed signals, obtained through merging, are LP, and hence less phase distortion is introduced to the desired signals. We analyze the spectrum supports of the analysis filters of these LP RNFBs. The conditions on the uniform FB and recombination TMUXs of an LP RNFB with good frequency characteristics are determined. These conditions are relatively simple to satisfy, and the uniform FB and recombination TMUXs can be designed separately without much degradation in performance. This allows dynamic recombination of different numbers of channels in the original uniform FB to give a flexible and time-varying frequency partitioning. Using these results, a method for designing a class of near-perfect-reconstruction (NPR) LP RNFBs with a cosine roll-off transition band using the REMEZ algorithm is proposed. A design example is given to show that LP RNFBs with good frequency responses and reasonably low reconstruction errors can be achieved.
Masako FUJIMOTO Takayuki KAGOMIYA
In Japanese, there is frequent alternation between CV morae and moraic geminate consonants. In this study, we analyzed the phonemic environments of consonant gemination (CG) using the "Corpus of Spontaneous Japanese (CSJ)." The results revealed that the environment in which gemination occurs is, to some extent, parallel to that of vowel devoicing. However, there are two crucial differences. One difference is that the CG tends to occur in a /kVk/ environment, whereas such is not the case for vowel devoicing. The second difference is that when the preceding consonant is /r/, gemination occurs, but not vowel devoicing. These observations suggest that the mechanism leading to CG differs from that which leads to vowel devoicing.