Yasunari OBUCHI Takashi SUMIYOSHI
In this paper we introduce a new framework of audio processing, which is essential to achieve a trigger-free speech interface for home appliances. If the speech interface operates continuously in real environments, it must extract occasional voice commands and reject everything else. Reducing the number of false alarms is extremely important, because irrelevant inputs far outnumber voice commands, even for heavy users of appliances. The framework, called Intentional Voice Command Detection, is based on voice activity detection but is enhanced by various speech/audio processing techniques such as emotion recognition. The effectiveness of the proposed framework is evaluated using a newly collected large-scale corpus. The advantages of combining various features were tested and confirmed, and a simple LDA-based classifier demonstrated acceptable performance. The effectiveness of various methods of user adaptation is also discussed.
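The abstract does not specify the classifier's configuration. As a rough illustration, a two-class Fisher LDA decision ("intentional command" vs. "everything else") can be sketched as follows. The sketch assumes each audio segment has already been reduced to a fixed-length feature vector, for example hypothetical VAD and emotion-recognition scores. The feature set, the small ridge term, and the midpoint threshold are illustrative assumptions, not the authors' setup.

```python
def _mean(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def _solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fit_lda(others, commands, ridge=1e-6):
    """Fisher LDA: w = Sw^-1 (mu_cmd - mu_other), threshold at the projected midpoint."""
    mu0, mu1 = _mean(others), _mean(commands)
    d = len(mu0)
    # Within-class scatter matrix, plus a small ridge for numerical stability (assumption).
    sw = [[ridge if i == j else 0.0 for j in range(d)] for i in range(d)]
    for rows, mu in ((others, mu0), (commands, mu1)):
        for x in rows:
            diff = [xi - mi for xi, mi in zip(x, mu)]
            for i in range(d):
                for j in range(d):
                    sw[i][j] += diff[i] * diff[j]
    w = _solve(sw, [a - b for a, b in zip(mu1, mu0)])
    midpoint = [(a + b) / 2 for a, b in zip(mu0, mu1)]
    bias = -sum(wi * mi for wi, mi in zip(w, midpoint))
    return w, bias


def is_command(x, w, bias):
    """True when the LDA score classifies x as an intentional voice command."""
    return sum(wi * xi for wi, xi in zip(w, x)) + bias > 0
```

Because the commands' projected mean always scores above the midpoint, a higher threshold (a positive offset added to `bias`) is one simple knob for trading missed commands against false alarms.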
Tetsuo KOSAKA Yuui TAKEDA Takashi ITO Masaharu KATO Masaki KOHDA
In this paper, we propose a new speaker-class modeling method and an associated adaptation method for LVCSR systems, and evaluate them on the Corpus of Spontaneous Japanese (CSJ). In this method, for each evaluation speaker, training speakers close to that speaker are selected, and acoustic models are trained on their utterances. A major issue with the speaker-class model is determining the range of speakers to select. To solve this problem, several models covering different speaker ranges are prepared in advance for each evaluation speaker, and the most appropriate model is selected on a likelihood basis in the recognition step. In addition, we improved recognition performance by applying unsupervised speaker adaptation to the speaker-class models. In the recognition experiments, the proposed speaker adaptation based on speaker-class models yielded a significant improvement over the conventional adaptation method.
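The likelihood-based choice among prepared models can be sketched in a simplified form. Each candidate speaker-class model is replaced by a single diagonal-covariance Gaussian, a deliberate simplification of real HMM acoustic models. The candidate whose model assigns the highest total log-likelihood to the input frames is picked. Model names and parameters here are hypothetical.

```python
import math


def diag_gauss_loglik(frames, mean, var):
    """Total log-likelihood of feature frames under one diagonal-covariance Gaussian."""
    total = 0.0
    for x in frames:
        for xi, mi, vi in zip(x, mean, var):
            total += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return total


def select_model(frames, models):
    """Pick the candidate whose model gives the input frames the highest likelihood.

    models: {name: (mean, var)}, one entry per hypothetical speaker range.
    """
    return max(models, key=lambda name: diag_gauss_loglik(frames, *models[name]))
```

In a real recognizer the same comparison would use the HMM likelihoods (e.g., from a first decoding pass) rather than a single Gaussian per model.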
Tong WU Ying WANG Yushan PEI Gen LI Ping ZHANG
This letter proposes an intra-cell partial spectrum reuse (PSR) scheme for cellular OFDM-relay networks. The proposed scheme aims to increase system throughput while also improving the SINR of cell-edge users. A novel pre-allocation factor γ not only characterizes the flexibility of PSR but also reduces the complexity of the reuse mechanism. Simulations show that the proposed scheme offers superior performance in terms of system throughput and the SINR of the worst 5% of users.
In this paper, we propose a hybrid model adaptation approach in which pronunciation and acoustic models are adapted by incorporating the pronunciation and acoustic variabilities of non-native speech, in order to improve the performance of non-native automatic speech recognition (ASR). Specifically, the proposed hybrid model adaptation can be performed at either the state-tying or the triphone-modeling level, depending on the level at which acoustic model adaptation is performed. In both methods, we first analyze the pronunciation variant rules of non-native speakers and then classify each rule as either a pronunciation variant or an acoustic variant. The state-tying level hybrid method then adapts the pronunciation models by accommodating the pronunciation variants in the pronunciation dictionary, and adapts the acoustic models by clustering the states of the triphone acoustic models using the acoustic variants. The triphone-modeling level hybrid method, on the other hand, first adapts the pronunciation models in the same way as the state-tying level hybrid method. For acoustic model adaptation, however, the triphone acoustic models are re-estimated based on the adapted pronunciation models, and the states of the re-estimated triphone acoustic models are then clustered using the acoustic variants. Korean-spoken English speech recognition experiments show that ASR systems employing the state-tying and triphone-modeling level adaptation methods reduce the average word error rate (WER) for non-native speech by 17.1% and 22.1% relative, respectively, compared with a baseline ASR system.
Dan-ni AI Xian-hua HAN Xiang RUAN Yen-wei CHEN
In this paper, we present a novel color-independent-component-based SIFT descriptor (termed CIC-SIFT) for object/scene classification. We first learn an efficient color transformation matrix based on independent component analysis (ICA), adapted to each category in a database. The ICA-based color transformation can enhance the contrast between the objects and the background in an image. We then compute CIC-SIFT descriptors over all three transformed color independent components. Because the ICA-based color transformation boosts the objects and suppresses the background, the proposed CIC-SIFT can extract more effective and discriminative local features for object/scene classification. Seven SIFT descriptors are compared, and the experimental classification results show that the proposed CIC-SIFT outperforms the other, conventional SIFT descriptors.
Yoshihisa KONDO Hiroyuki YOMO Shinji YAMAGUCHI Peter DAVIS Ryu MIURA Sadao OBANA Seiichi SAMPEI
This paper proposes multipoint-to-multipoint (MPtoMP) real-time broadcast transmission using network coding for ad-hoc networks such as video game networks. We aim to achieve highly reliable MPtoMP broadcasting using IEEE 802.11 media access control (MAC), which provides no retransmission mechanism for broadcast frames. When each node detects packets from the other nodes in a sequence, the correctly detected packets are network-encoded, and the encoded packet is broadcast in the next sequence, piggy-backed on the node's native packet. To prevent an increase in per-packet overhead due to piggy-back transmission, the network coding vector of each node is exchanged among all nodes in a negotiation phase. Each user keeps using the same coding vector generated in the negotiation phase, and only coding information indicating which users' signals are included in the network coding process is transmitted along with the piggy-back packet. Our simulation results show that the proposed method provides higher reliability than schemes using multipoint relay (MPR) or redundant transmission such as forward error correction (FEC). We also implement the proposed method on a wireless testbed and show that it achieves high reliability in a real-world environment, with a practical degree of complexity, on current wireless devices.
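The piggy-back idea can be illustrated in its simplest form, assuming coding over GF(2) (plain XOR) rather than the negotiated coding vectors the paper exchanges. A node XORs every packet it detected correctly and attaches a bitmap of the included node IDs, standing in for the "coding information". A receiver that is missing exactly one of those packets can then recover it. All names below are hypothetical.

```python
from functools import reduce
from typing import Dict, Optional, Tuple


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length packets."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(received: Dict[int, bytes]) -> Tuple[int, bytes]:
    """XOR every correctly detected packet into one piggy-back payload.

    Returns (bitmap, payload); bit i of the bitmap is set when node i's
    packet was included. Coding over GF(2) is an assumption of this sketch.
    """
    bitmap = 0
    for node_id in received:
        bitmap |= 1 << node_id
    payload = reduce(xor_bytes, received.values())
    return bitmap, payload


def recover(bitmap: int, payload: bytes,
            known: Dict[int, bytes]) -> Optional[Tuple[int, bytes]]:
    """Recover one missing packet when all other included packets are known."""
    included = [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]
    missing = [i for i in included if i not in known]
    if len(missing) != 1:
        return None  # nothing missing, or too many unknowns for XOR alone
    result = payload
    for i in included:
        if i != missing[0]:
            result = xor_bytes(result, known[i])
    return missing[0], result
```

For example, with `pkts = {0: b"AAAA", 1: b"BBBB", 2: b"CCCC"}`, `encode(pkts)` yields bitmap `0b111`. A receiver holding only packets 0 and 2 gets `(1, b"BBBB")` back from `recover`. Only the small bitmap travels with each piggy-back packet, which mirrors the overhead argument in the abstract.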
Hasan KADHEM Toshiyuki AMAGASA Hiroyuki KITAGAWA
Encryption can provide strong security for sensitive data against inside and outside attacks. This is especially true in the "Database as a Service" model, where confidentiality and privacy are important issues for the client. However, existing encryption approaches are vulnerable to statistical attacks because each value is always encrypted to the same fixed value. This paper presents a novel database encryption scheme called MV-OPES (Multivalued-Order Preserving Encryption Scheme), which allows privacy-preserving queries over encrypted databases with an improved security level. Our idea is to encrypt each value into multiple different values to prevent statistical attacks. At the same time, MV-OPES preserves the order of integer values, so that comparison operations can be applied directly to encrypted data. Using a calculated distance (range), we propose a novel method that allows join queries between relations based on inequality over encrypted values. We also present techniques to offload as much of the query execution load as possible to the database server, thereby making better use of server resources in a database outsourcing environment. Our scheme can easily be integrated with current database systems, as it is designed to work with existing indexing structures. It is robust against statistical attacks and against estimation of the true values. Experiments with MV-OPES show that security for sensitive data can be achieved with reasonable overhead, establishing the practicability of the scheme.
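The core idea of encrypting one value into many values while keeping order can be shown with a toy example. Each plaintext integer owns a secret, non-overlapping interval of ciphertext space, and encryption picks a random point inside it. This is a hypothetical illustration under that assumption only. It is not the MV-OPES construction and offers no real security.

```python
import bisect
import random


class ToyMultiValuedOPE:
    """Toy multi-valued, order-preserving encryption of integers in [0, domain_size).

    Plaintext v owns the secret interval [bounds[v], bounds[v + 1]); encryption
    draws a random integer from it. Equal plaintexts thus yield different
    ciphertexts, yet c1 < c2 whenever v1 < v2. Illustration only: this is not
    the MV-OPES construction and is not a secure cipher.
    """

    def __init__(self, domain_size: int, key: int, max_width: int = 1000):
        rng = random.Random(key)  # the "key" seeds the secret interval widths
        self.bounds = [0]
        for _ in range(domain_size):
            width = rng.randint(max_width // 2, max_width)
            self.bounds.append(self.bounds[-1] + width)
        self._noise = random.SystemRandom()  # fresh randomness per encryption

    def encrypt(self, v: int) -> int:
        return self._noise.randrange(self.bounds[v], self.bounds[v + 1])

    def decrypt(self, c: int) -> int:
        # The interval containing c identifies the plaintext.
        return bisect.bisect_right(self.bounds, c) - 1

    def lower_bound(self, v: int) -> int:
        """Ciphertext threshold for the predicate 'value >= v': test c >= bounds[v]."""
        return self.bounds[v]
```

Because ciphertext intervals are ordered, a server can evaluate a predicate such as `value >= 10` directly on ciphertexts by comparing against `lower_bound(10)`. It can do so without decrypting and with an ordinary index.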
This paper is concerned with the packet transmission scheduling problem for repeated all-to-all broadcasts in Underwater Sensor Networks (USN) in which n nodes lie within a single transmission range. All-to-all communication is one of the densest communication patterns. Each node is assumed to have a packet of the same size. Unlike in terrestrial scenarios, the propagation time in underwater communications is not negligible. We define an all-to-all broadcast as one in which every node transmits packets to all other nodes in the network. Thus, a total of n(n - 1) packets must be transmitted in an all-to-all broadcast. An optimal transmission schedule is one in which all packets are transmitted within the minimum time. In this paper, we propose an efficient packet transmission scheduling algorithm for underwater acoustic communications that exploits the long propagation delay.
Akira SHIOZAKI Masashi KISHIMOTO Genmon MARUOKA
This letter proposes extended single parity check product codes and presents their empirical performance on a Gaussian channel under the belief propagation (BP) decoding algorithm. The simulation results show that the codes can achieve close-to-capacity performance at high coding rates. The code of length 9603 and rate 0.96 is only 0.77 dB away from the Shannon limit at a BER of 10^-5.
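For orientation, the code below sketches the conventional (non-extended) two-dimensional single parity check product code: a parity bit is appended to each row and then a parity row to the array. The letter's extension and its BP decoder are not described in the abstract and are not reproduced here; this only shows the basic product structure and its rate.

```python
def spc_product_encode(info):
    """Encode a k1 x k2 binary array (list of lists) as a 2-D SPC product codeword.

    A parity bit is appended to every row, then a parity row is appended
    (its last bit is the parity-on-parity), giving a (k1+1) x (k2+1) array
    in which every row and every column has even parity.
    """
    rows = [list(r) + [sum(r) % 2] for r in info]
    parity_row = [sum(col) % 2 for col in zip(*rows)]
    return rows + [parity_row]


def spc_product_rate(k1, k2):
    """Code rate of the (k1+1) x (k2+1) product of two SPC codes."""
    return (k1 * k2) / ((k1 + 1) * (k2 + 1))
```

For instance, `spc_product_rate(3, 3)` is 9/16 = 0.5625. Because each row and column carries only one parity bit, the rate k1·k2/((k1+1)(k2+1)) approaches 1 as the dimensions grow. This is why such codes are natural candidates for the high-rate regime the letter targets.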
Li YUE Chenggao HAN Nalin S. WEERASINGHE Takeshi HASHIMOTO
This paper studies the performance of a coded convolutional spreading CDMA system with cyclic prefix (CS-CDMA/CP) combined with the zero correlation zone code generated from the M-sequence (M-ZCZ code) for downlink transmission over a multipath fast fading channel. In particular, we propose a new pilot-aided channel estimation scheme based on the shift property of the M-ZCZ code and show its robustness against fast fading through comparison with a W-CDMA system employing time-multiplexed pilot signals.
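The M-ZCZ code is built from an M-sequence. Its construction is not given in the abstract, so the sketch below covers only the underlying M-sequence. It generates one period with a Fibonacci LFSR and checks the well-known two-valued periodic autocorrelation (N at zero shift, -1 elsewhere). The choice of polynomials in the example is ours.

```python
def m_sequence(degree, taps, seed=None):
    """One period (2**degree - 1 bits) of a binary m-sequence from a Fibonacci LFSR.

    The recurrence is s[k + degree] = XOR of s[k + j] for j in taps, i.e. the
    characteristic polynomial x**degree + sum(x**j for j in taps). It yields an
    m-sequence only if that polynomial is primitive over GF(2).
    """
    state = list(seed) if seed is not None else [1] + [0] * (degree - 1)
    seq = []
    for _ in range(2 ** degree - 1):
        seq.append(state[0])
        new_bit = 0
        for j in taps:
            new_bit ^= state[j]
        state = state[1:] + [new_bit]
    return seq


def periodic_autocorrelation(bits, shift):
    """Periodic autocorrelation of the bipolar (0 -> +1, 1 -> -1) version of bits."""
    x = [1 - 2 * b for b in bits]
    n = len(x)
    return sum(x[i] * x[(i + shift) % n] for i in range(n))
```

With the primitive polynomial x^3 + x + 1, `m_sequence(3, (0, 1))` returns `[1, 0, 0, 1, 0, 1, 1]`. Its autocorrelation is 7 at shift 0 and -1 at every other shift.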
Yanqing SUN Yu ZHOU Qingwei ZHAO Pengyuan ZHANG Fuping PAN Yonghong YAN
In this paper, the robustness of posterior-based confidence measures is improved by utilizing entropy information. This information is calculated for speech-unit-level posteriors using only the best recognition result, without requiring a larger computational load than conventional methods. Two posterior-based entropy confidence measures, using different normalization methods, are proposed. Practical details are discussed for two typical levels of hidden Markov model (HMM)-based posterior confidence measures, and the two levels are compared in terms of performance. Experiments show that the entropy information yields significant improvements in the posterior-based confidence measures. The absolute improvements in the out-of-vocabulary (OOV) rejection rate exceed 20% for both the phoneme-level and the state-level confidence measures on our embedded test sets, without a significant decline in in-vocabulary accuracy.
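The exact entropy measures and normalizations are not given in the abstract. A minimal sketch of the general idea follows, using one normalization we chose for illustration: divide the Shannon entropy of a posterior distribution by log K so it lies in [0, 1]. Then average (1 - normalized entropy) over the frames aligned to a recognized unit. A peaked posterior (confident) scores near 1; a flat one scores near 0. Function names are hypothetical.

```python
import math


def normalized_entropy(posteriors):
    """Entropy of one posterior distribution divided by log(K), so it lies in [0, 1].

    posteriors: non-negative scores over K competing units (renormalized here).
    """
    k = len(posteriors)
    if k < 2:
        return 0.0
    total = sum(posteriors)
    probs = [p / total for p in posteriors if p > 0]
    h = -sum(p * math.log(p) for p in probs)
    return h / math.log(k)


def entropy_confidence(frame_posteriors):
    """Hypothetical unit-level score: mean of (1 - normalized entropy) over the
    frames aligned to one recognized unit; near 1 = peaked, near 0 = flat."""
    scores = [1.0 - normalized_entropy(p) for p in frame_posteriors]
    return sum(scores) / len(scores)
```

Only posteriors along the best path are needed, which is consistent with the abstract's point that no extra decoding passes are required.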
Koichiro SAWA Takahiro UENO Hidenori TANAKA
In an automotive fuel pump system, a small DC motor powered by the automotive battery is widely used to drive the pump. Recently, bio-fuels, usually mixtures of gasoline and ethanol, have come into use because of gasoline shortages and environmental concerns. The type of fuel used strongly affects the performance of the DC motor, especially its commutation phenomena. The authors have therefore begun investigating the influence of ethanol on commutation phenomena, and have so far reported on the wear of brushes and carbon flat commutators in gasoline and in ethanol. In this paper, commutation period, arc duration, and brush and commutator wear are examined in a 50% ethanol/50% gasoline blend. Brush wear is very small compared with the previous results; that is, in the present test, mechanical sliding wear predominates over arc erosion because of the short arc duration. Further, areas eroded by arcing are observed to reappear as sliding surfaces. From these results, a threshold arc energy separating arc erosion from mechanical sliding wear is obtained, and a wear model is proposed to explain the above wear pattern on the sliding surface.
Shun WATANABE Ryutaroh MATSUMOTO Tomohiko UYEMATSU
Privacy amplification is a technique to distill a secret key from a random variable by a function so that the distilled key and the eavesdropper's random variable are statistically independent. There are three kinds of security criteria for the key distilled by privacy amplification: the normalized divergence criterion, also known as the weak security criterion; the variational distance criterion; and the divergence criterion, also known as the strong security criterion. It is known that the encoder of a Slepian-Wolf code (a source code with full side information at the decoder) can be used as a function for privacy amplification if we employ the weak security criterion. In this paper, we show that the encoder of a Slepian-Wolf code cannot be used as a function for privacy amplification if we employ either of the other two criteria.
Nan LIU Yao ZHAO Zhenfeng ZHU Rongrong NI
This paper presents a commercial shot classification scheme combining well-designed visual and textual features to automatically detect TV commercials. To identify the inherent difference between commercials and general programs, a special mid-level textual descriptor is proposed, aiming to capture the spatio-temporal properties of the video texts typical of commercials. In addition, we introduce an ensemble-learning based combination method, named Co-AdaBoost, to interactively exploit the intrinsic relations between the visual and textual features employed.
Wei FENG Yanmin WANG Yunzhou LI Shidong ZHOU Jing WANG
In this letter, we address the problem of downlink power allocation for the generalized distributed antenna system (DAS) with cooperative clusters. Considering practical applications, we assume that only large-scale channel state information is available at the transmitter. The power allocation scheme is investigated with the objective of maximizing the ergodic achievable sum rate. Based on some approximations and Rayleigh quotient theory, simple selective power allocation schemes are derived for the low-SNR and high-SNR scenarios, respectively. Owing to their low complexity, the methods are applicable in practice.
Naoki HAYASHI Toshimitsu USHIO Takafumi KANAZAWA
This paper proposes an adaptive resource allocation method for multi-tier computing systems that guarantees a fair QoS level under the resource constraints of the tiers. We introduce a multi-tier computing architecture consisting of a group of resource managers and an arbiter. The resources of each client are managed by a dedicated resource manager. Each resource manager updates the resources allocated to the subtasks of its client by locally exchanging QoS levels with other resource managers. The arbiter compensates the updated resources to avoid overload conditions in the tiers. Based on the arbiter's compensation, the subtasks of each client are executed in the corresponding tiers. We derive sufficient conditions for the proposed resource allocation to achieve a fair QoS level while avoiding overload in all tiers, under some assumptions on the QoS function and the resource consumption function of each client. A simulation demonstrates that the proposed resource allocation can adaptively achieve a fair QoS level without causing any overload condition.
Dongwoo LEE Young Seok JUNG Jae Hong LEE
This paper proposes cooperative coding using cyclic delay diversity (CDD) for OFDM systems. Cooperative diversity is combined with channel coding, while CDD is applied to the cooperative transmission of multiple relays to enhance the benefit of the cooperating relays. Analyses of the frame error probability (FEP) and the average channel power of the proposed scheme are presented, and simulation results for its frame error rate (FER) are shown. Compared with conventional space-time processing, the proposed scheme offers a simple code design and low system complexity; compared with direct transmission and conventional cooperative coding without CDD, it offers better FER and diversity gain.
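The standard mechanism behind CDD can be checked numerically. A cyclic delay of d samples applied to an OFDM symbol before CP insertion becomes a linear phase ramp exp(-j2πkd/N) on subcarrier k. When two relays send the same symbol, one cyclically delayed, the receiver sees a frequency-selective effective channel, and the channel code can exploit that selectivity as diversity. The two-relay, flat-channel setup below is an assumption for illustration. The naive DFT is used only to keep the sketch dependency-free.

```python
import cmath


def dft(x):
    """Naive N-point DFT: X[k] = sum_t x[t] * exp(-2j*pi*k*t/N)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def cyclic_delay(x, d):
    """Cyclically delay one OFDM symbol (before CP insertion): y[t] = x[(t - d) mod N]."""
    n = len(x)
    return [x[(t - d) % n] for t in range(n)]


def effective_gain(h1, h2, d, n):
    """Per-subcarrier gain when relay 1 sends x and relay 2 sends x delayed by d,
    over flat channels h1 and h2: H[k] = h1 + h2 * exp(-2j*pi*k*d/N)."""
    return [h1 + h2 * cmath.exp(-2j * cmath.pi * k * d / n) for k in range(n)]
```

For example, `effective_gain(1, 1, 1, 4)` has magnitudes of roughly 2, 1.414, 0, 1.414. Two flat relay channels become a channel with both strong and faded subcarriers. The cyclic prefix must still cover the delay plus the channel spread for this equivalence to hold.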
Given a reference sequence with zero phase offset, its i-bit shifted version has phase offset i with respect to the reference sequence. In this letter, we propose a new algorithm to compute the phase offsets of a periodic binary sequence using the number-theoretic concepts of the order and the index of an integer. We define an offset evaluation function that is used to calculate the phase offset, and derive its properties. Once the function is computed, the phase offset of the sequence is obtained simply by taking its index. The new algorithm overcomes the restrictions that conventional methods place on the length and on the numbers of '0's and '1's in binary codes. Its application to code acquisition is also investigated to demonstrate the usefulness of the proposed method.
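The two number-theoretic notions the algorithm relies on are the order of an integer a modulo m (the smallest k ≥ 1 with a^k ≡ 1 mod m) and the index of a to base g (the exponent k with g^k ≡ a mod m, i.e., a discrete logarithm). The letter's offset evaluation function is not given in the abstract, so the brute-force sketch below illustrates only these two definitions.

```python
import math


def multiplicative_order(a, m):
    """Smallest k >= 1 with a**k ≡ 1 (mod m); requires gcd(a, m) == 1."""
    if math.gcd(a, m) != 1:
        raise ValueError("a must be coprime to m")
    k, x = 1, a % m
    while x != 1:
        x = (x * a) % m
        k += 1
    return k


def index(a, g, m):
    """Index (discrete logarithm) of a to base g modulo m: smallest k >= 0 with g**k ≡ a."""
    x = 1
    for k in range(multiplicative_order(g, m)):
        if x == a % m:
            return k
        x = (x * g) % m
    raise ValueError("a is not a power of g modulo m")
```

For example, 3 has order 6 modulo 7, so it is a primitive root, and `index(6, 3, 7)` is 3 because 3^3 = 27 ≡ 6 (mod 7). Brute force is fine for illustration. Practical sequence lengths would call for a smarter discrete-log method.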
Youn-Ok CHOI Zheng-Guo PIAO Geum-Bae CHO
This study examined the performance improvement of photovoltaic (PV) arrays and inverters, as well as their design, construction, and subsequent operation and management, which will be key elements of future PV systems. In addition, it evaluated the performance characteristics of a 50 kW grid-connected PV system in Korea. The evaluation showed that the PV array had an efficiency of approximately 10%, and that the inverter regularly operated at > 90% efficiency at irradiances > 400 W/m². The capture losses (Lc), system losses (Ls), and performance ratio were approximately 0.9 h/d, 0.3 h/d, and > 70%, respectively, indicating that the system was operating stably. In addition, while Ls decreased rapidly owing to the inverter efficiency, the performance ratio decreased markedly as Lc increased with rising temperature when the reference yield exceeded 5.0 h/d.
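The reported quantities follow the usual normalized PV indices (as in IEC 61724). The reference yield Yr is in-plane irradiation divided by the reference irradiance. The array yield Ya and final yield Yf are DC and AC energy divided by rated power. Capture losses are Lc = Yr - Ya and system losses are Ls = Ya - Yf. It follows that PR = Yf / Yr = 1 - (Lc + Ls) / Yr, which is why growing capture losses pull the performance ratio down. The daily input values in the example are hypothetical, chosen only to match the reported loss figures.

```python
def pv_performance(h_poa_kwh_m2, e_dc_kwh, e_ac_kwh, p0_kw, g_ref_kw_m2=1.0):
    """Daily normalized PV performance indices (IEC 61724-style definitions).

    h_poa_kwh_m2: in-plane irradiation over the day (kWh/m^2)
    e_dc_kwh / e_ac_kwh: array DC output and inverter AC output (kWh)
    p0_kw: rated array power at STC (kW); g_ref_kw_m2: reference irradiance
    """
    yr = h_poa_kwh_m2 / g_ref_kw_m2   # reference yield, h/d
    ya = e_dc_kwh / p0_kw             # array yield, h/d
    yf = e_ac_kwh / p0_kw             # final yield, h/d
    return {
        "Yr": yr, "Ya": ya, "Yf": yf,
        "Lc": yr - ya,                # capture losses
        "Ls": ya - yf,                # system (BOS/inverter) losses
        "PR": yf / yr,                # performance ratio
    }
```

For a hypothetical day on a 50 kW plant with 5.0 kWh/m² irradiation, 205 kWh DC and 190 kWh AC, `pv_performance(5.0, 205.0, 190.0, 50.0)` gives Lc ≈ 0.9 h/d, Ls ≈ 0.3 h/d and PR ≈ 0.76.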
Gamal M. DOUSOKY Masahito SHOYAMA Tamotsu NINOMIYA
This paper investigates the effect of several frequency modulation profiles on conducted-noise reduction in dc-dc converters with a programmed switching controller. The converter is operated in a variable-frequency modulation regime. Twelve switching frequency modulation profiles were studied. Some of the modulation data were prepared using MATLAB, and others were generated online. All the frequency profiles were designed and implemented on an FPGA and investigated experimentally. The experimental results show that the spreading of conducted noise depends on both the modulation sequence profile and the statistical characteristics of the sequence. A substantial part of the manufacturing cost of power converters for telecommunication applications lies in designing filters to comply with EMI limits. Applying the findings of this investigation significantly reduces the filter size.
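The twelve profiles are not listed in the abstract. As a hypothetical illustration of what an offline-prepared profile might look like, the sketch generates two common shapes around a center switching frequency: a triangular sweep and a uniformly random sequence. It also quantizes each frequency to whole clock cycles per switching period, as an FPGA counter lookup table would need. Centers, deviations, and clock rate are assumptions.

```python
import random


def triangular_profile(f_center, delta_f, steps):
    """One period of a triangular profile sweeping f_center - delta_f up to
    f_center + delta_f and back, sampled at `steps` (>= 2) switching events."""
    half = steps // 2
    up = [f_center - delta_f + 2 * delta_f * i / half for i in range(half)]
    down = [f_center + delta_f - 2 * delta_f * i / (steps - half)
            for i in range(steps - half)]
    return up + down


def random_profile(f_center, delta_f, steps, seed=0):
    """Uniformly distributed random switching frequencies within +/- delta_f."""
    rng = random.Random(seed)
    return [rng.uniform(f_center - delta_f, f_center + delta_f) for _ in range(steps)]


def to_period_counts(profile, f_clk):
    """Quantize each switching frequency to a whole number of FPGA clock cycles
    per switching period (entries for a hypothetical counter lookup table)."""
    return [round(f_clk / f) for f in profile]
```

For example, `triangular_profile(100e3, 10e3, 8)` sweeps 90, 95, 100, 105, 110, 105, 100, 95 kHz. With a 100 MHz clock, 100 kHz maps to 1000 cycles. The two shapes have the same frequency range but different statistics, which is the distinction the abstract reports as mattering for noise spreading.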