Xiaolei LIU Xiaosong ZHANG Yiqi JIANG Qingxin ZHU
Optimizing the deployment of wireless sensor networks, which is one of the key issues in wireless sensor network research, helps improve network coverage and system reliability. In this paper, we propose an evolutionary algorithm based on a modified t-distribution for wireless sensor deployment by introducing a deployment optimization operator and an intelligent allocation operator. A directed perturbation operator is applied to the algorithm to guide the evolution of the node deployment and to speed up convergence. In addition, with a new geometric sensor detection model instead of the old probability model, the computing speed is increased by 20 times. The simulation results show that when this algorithm is applied to an actual scene, it finds the minimum number of nodes and the optimal deployment quickly and effectively. Compared with the existing mainstream swarm intelligence algorithms, this method satisfies the need for convergence speed and achieves better coverage, which is closer to the theoretical coverage value.
Jin-Song ZHANG Konstantin MARKOV Tomoko MATSUI Satoshi NAKAMURA
This paper presents a study on modeling inter-word pauses to improve the robustness of acoustic models for recognizing noisy conversational speech. When precise contextual modeling is used for pauses, the frequent appearance and varying acoustics of pauses in noisy conversational speech make it difficult to automatically generate an accurate phonetic transcription of the training data for developing robust acoustic models. This paper proposes exploiting the reliable phonetic heuristics of pauses in speech to aid the detection of varying pauses. Based on this proposal, a stepwise approach to optimizing pause HMMs was applied to the data of the DARPA SPINE2 project, and a more correct phonetic transcription was achieved. The cross-word triphone HMMs developed using this method achieved an absolute 9.2% word error reduction compared to the conventional method with only context-free modeling of pauses. For the same pause modeling method, the use of the optimized phonetic segmentation brought about an absolute 5.2% improvement.
Konstantin MARKOV Tomoko MATSUI Rainer GRUHN Jinsong ZHANG Satoshi NAKAMURA
This paper presents the ATR speech recognition system designed for the DARPA SPINE2 evaluation task. The system is capable of dealing with speech from highly variable, real-world noisy conditions and communication channels. A number of robust techniques are implemented, such as differential spectrum mel-scale cepstrum features, on-line MLLR adaptation, and word-level hypothesis combination, which led to a significant reduction in the word error rate.
Richeng DUAN Tatsuya KAWAHARA Masatake DANTSUJI Jinsong ZHANG
Aiming at detecting pronunciation errors produced by second language learners and providing corrective feedback related to articulation, we address effective articulatory models based on deep neural networks (DNN). Articulatory attributes are defined for manner and place of articulation. In order to efficiently train these models of non-native speech without such data, which is difficult to collect on a large scale, several transfer learning based modeling methods are explored. We first investigate three closely related secondary tasks which aim at effective learning of DNN articulatory models. We also propose to exploit large speech corpora of the native and target languages to model inter-language phenomena. This kind of transfer learning can provide a better feature representation of non-native speech. Related task transfer and language transfer learning are further combined on the network level. Compared with the conventional DNN which is used as the baseline, all proposed methods improved the performance. In the native attribute recognition task, the network-level combination method reduced the recognition error rate by more than 10% relative for all articulatory attributes. The method was also applied to pronunciation error detection in Mandarin Chinese pronunciation learning by Japanese native speakers, and achieved a relative improvement of up to 17.0% for detection accuracy and up to 19.9% for F-score, which is also better than the lattice-based combination.
Xiao-Dong WANG Keikichi HIROSE Jin-Song ZHANG Nobuaki MINEMATSU
A method was developed for automatic recognition of syllable tone types in continuous Mandarin speech by integrating two techniques, tone nucleus modeling and a neural network classifier. Tone nucleus modeling considers a syllable F0 contour as consisting of three parts: onset course, tone nucleus, and offset course. The two courses are transitions from/to neighboring syllable F0 contours, while the tone nucleus is the intrinsic part of the F0 contour. By viewing only the tone nucleus, acoustic features less affected by neighboring syllables are obtained. When using tone nucleus modeling, automatic detection of the tone nucleus becomes crucial. An improvement was added to the original detection method. Distinctive acoustic features for tone types are not limited to F0 contours. Other prosodic features, such as waveform power and syllable duration, are also useful for tone recognition. These heterogeneous features are rather difficult to handle simultaneously in hidden Markov models (HMM), but are easy to handle in neural networks. We adopted a multi-layer perceptron (MLP) as the neural network. Tone recognition experiments were conducted for speaker dependent and independent cases. In order to show the effect of integration, experiments were also conducted for two baselines: an HMM classifier with tone nucleus modeling, and an MLP classifier viewing the entire syllable instead of the tone nucleus. The integrated method showed a tone recognition rate of 87.1% in the speaker dependent case and 80.9% in the speaker independent case, which was about a 10% relative error reduction compared to the baselines.
Jin-Song ZHANG Xin-Hui HU Satoshi NAKAMURA
Chinese is a representative tonal language, and how to process tone information in state-of-the-art large vocabulary speech recognition systems has been an attractive topic. This paper presents a novel way to derive an efficient phoneme set of tone-dependent units to build a recognition system, by iteratively merging a pair of tone-dependent units according to the principle of minimal loss of the Mutual Information (MI). The mutual information is measured between the word tokens and their phoneme transcriptions in a training text corpus, based on the system's lexical and language models. The approach is able to keep the discriminative tonal (and phoneme) contrasts that are most helpful for disambiguating words that become homophones when tones are ignored, and to merge those tonal (and phoneme) contrasts that are not important for word disambiguation in the recognition task. This enables a flexible selection of the phoneme set according to a balance between the amount of MI and the number of phonemes. We applied the method to the traditional phoneme set of Initials/Finals, and derived several phoneme sets with different numbers of units. Speech recognition experiments using the derived sets showed the effectiveness of the method.
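The MI-loss criterion for one candidate merge can be illustrated with a minimal sketch. It makes simplifying assumptions that are not taken from the paper: each word has exactly one pronunciation, and word probabilities are plain unigram counts rather than a full language model. Under those assumptions the transcription is a function of the word, so the mutual information between words and transcriptions equals the entropy of the transcriptions. The function name `mutual_information`, the toy lexicon, and the unit labels are all hypothetical.

```python
import math
from collections import Counter


def mutual_information(word_counts, lexicon, merge=None):
    """I(W; T) in bits between words and their phoneme transcriptions.

    Assumes one pronunciation per word, so I(W; T) = H(T).
    `merge` maps a unit to the merged unit that replaces it (hypothetical).
    """
    merge = merge or {}
    total = sum(word_counts.values())
    transcription_counts = Counter()
    for word, count in word_counts.items():
        # Rewrite the transcription with merged units; words whose
        # transcriptions become identical turn into homophones.
        transcription = tuple(merge.get(unit, unit) for unit in lexicon[word])
        transcription_counts[transcription] += count
    return -sum(
        (c / total) * math.log2(c / total)
        for c in transcription_counts.values()
    )


# Toy data: two words that differ only in tone.
word_counts = {"mother": 1, "horse": 1}
lexicon = {"mother": ("m", "a1"), "horse": ("m", "a3")}

mi_before = mutual_information(word_counts, lexicon)
mi_after = mutual_information(word_counts, lexicon, {"a1": "a", "a3": "a"})
mi_loss = mi_before - mi_after
```

In this toy case the merge removes the only contrast separating the two words, so it loses all of the MI. An iterative procedure of the kind described above would prefer merges with a smaller loss.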
Yu TIAN Linhua MA Bo SONG Hong TANG Song ZHANG Xing HU
Much work in cooperative communication has been done from the perspective of the physical and network layers. However, the exact impact of signal error rate performance on cooperative routing discovery still remains unclear in multihop ad hoc networks. In this paper, we show the symbol error rate (SER) performance improvement obtained from cooperative communication, and examine how to incorporate the SER factor into the distributed routing discovery scheme called DGCR (Dynamic Geographic Cooperative Routing). For a single cooperative communication hop, we present two types of metrics to specify the degree to which a node is suitable for becoming the relay node. One metric is the potential of a node to relay with optimal SER performance. The other metric is the distance of a node to the straight line that passes through the last forwarding node and the destination. Based on location knowledge and a contention scheme, we combine the two metrics into a composite metric to choose the relay node. The forwarding node is chosen dynamically according to the positions of the actual relay node and the destination. Simulation results show that our approach significantly outperforms non-cooperative geographic routing in terms of symbol error rate, and that DGCR's SER performance is better than that of traditional geographic cooperative routing with a slight increase in path length.
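The second metric, the distance from a candidate node to the straight line through the last forwarding node and the destination, is ordinary point-to-line geometry. The sketch below covers only that geometric step, assuming 2-D coordinates. The function name `distance_to_line` is hypothetical, and how DGCR weights this distance against the SER metric is not given in the abstract, so it is not modeled here.

```python
import math


def distance_to_line(node, forwarder, destination):
    """Perpendicular distance from `node` to the line through
    `forwarder` and `destination` (all 2-D points as (x, y) tuples)."""
    (x0, y0), (x1, y1), (x2, y2) = node, forwarder, destination
    dx, dy = x2 - x1, y2 - y1
    length = math.hypot(dx, dy)
    if length == 0:
        # Degenerate case: forwarder and destination coincide, so fall
        # back to the plain point-to-point distance.
        return math.hypot(x0 - x1, y0 - y1)
    # Magnitude of the 2-D cross product divided by the segment length.
    return abs(dx * (y0 - y1) - dy * (x0 - x1)) / length


# A candidate one unit above the forwarder-destination line.
d = distance_to_line((1.0, 1.0), (0.0, 0.0), (2.0, 0.0))
```

A smaller value means the candidate lies closer to the direct path toward the destination.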
Weina NIU Xiaosong ZHANG Guowu YANG Ruidong CHEN Dong WANG
Advanced Persistent Threat (APT) is one of the most serious network attacks occurring in cyberspace due to its sophisticated techniques and deep concealment. Modeling the APT attack process can facilitate APT analysis, detection, and prediction. However, current techniques focus on modeling known attacks, and neither reflect APT attacks dynamically nor take human factors into consideration. In order to overcome this limitation, we propose a Targeted Complex Attack Network (TCAN) model for the APT attack process based on dynamic attack graphs and network evolution. Compared with current models, our model addresses human factors by constructing a two-layer network structure. Meanwhile, we present a stochastic model based on state changes in the target network to specify the nodes involved in the procedure of an APT. Besides, our model adopts the time domain to expand the traditional attack graph into a dynamic attack network. Our model features flexibility, which is demonstrated by changing the related parameters. In addition, we propose dynamic evolution rules based on complex network theory and the characteristics of actual attack scenarios. Finally, we elaborate a procedure for adding nodes by a matrix operation. The simulation results show that our model can model the attack process effectively.
Xiaoyun WANG Jinsong ZHANG Masafumi NISHIDA Seiichi YAMAMOTO
This paper describes a novel method to improve the performance of second language speech recognition when the mother tongue of users is known. Considering that second language speech usually includes less fluent pronunciation and more frequent pronunciation mistakes, the authors propose using a reduced phoneme set generated by a phonetic decision tree (PDT)-based top-down sequential splitting method instead of the canonical one of the second language. The authors verify the efficacy of the proposed method using second language speech collected with a translation game type dialogue-based English CALL system. Experiments show that a speech recognizer achieved higher recognition accuracy with the reduced phoneme set than with the canonical phoneme set.
Ying SUN Yong YU Xiaosong ZHANG Jiwen CHAI
Observing that the security of existing identity-based proxy signature schemes was proven in the random oracle model, Cao et al. proposed the first direct construction of an identity-based proxy signature secure in the standard model by making use of the identity-based signature due to Paterson and Schuldt. They also provided a security proof to show that their construction is secure against forgery attacks without resorting to random oracles. Unfortunately, in this letter, we demonstrate that their scheme is vulnerable to insider attacks. Specifically, after a private-key extraction query, an adversary, behaving as a malicious original signer or a malicious proxy signer, is able to violate the unforgeability of the scheme.
Jin-Song ZHANG Satoshi NAKAMURA
An efficient way to develop large scale speech corpora is to collect phonetically rich ones that have high coverage of phonetic contextual units. The sentence set, usually called the minimum set, should have a small text size in order to reduce the collection cost. It can be selected by a greedy search algorithm from a large mother text corpus. With the inclusion of more and more phonetic contextual effects, the number of different phonetic contextual units increases dramatically, making the search a non-trivial issue. In order to improve the search efficiency, we previously proposed a so-called least-to-most-ordered greedy search based on the conventional algorithms. This paper evaluates these algorithms in order to show their different characteristics. The experimental results showed that the least-to-most-ordered methods successfully achieved smaller objective sets in significantly less computation time than the conventional ones. This algorithm has already been applied to the development of a number of speech corpora, including a large scale phonetically rich Chinese speech corpus, ATRPTH, which played an important role in developing our multi-language translation system.
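For orientation, the conventional greedy search mentioned above can be sketched as a set-cover heuristic: repeatedly pick the sentence that covers the most not-yet-covered phonetic contextual units. This is a minimal sketch of the conventional baseline only. The least-to-most-ordered variant is not described in enough detail in the abstract to reproduce, and the function name `greedy_min_set` and the toy sentence data are hypothetical.

```python
def greedy_min_set(sentences):
    """Select sentence ids that together cover every unit.

    `sentences` maps a sentence id to the set of phonetic contextual
    units (e.g. triphones) occurring in that sentence.
    """
    uncovered = set().union(*sentences.values())
    selected = []
    while uncovered:
        # Pick the sentence contributing the most still-uncovered units.
        best = max(sentences, key=lambda s: len(sentences[s] & uncovered))
        gain = sentences[best] & uncovered
        if not gain:
            break  # defensive guard; cannot occur when units come from `sentences`
        selected.append(best)
        uncovered -= gain
    return selected


# Toy mother corpus with made-up unit labels.
corpus = {
    "s1": {"a-b", "b-c"},
    "s2": {"b-c"},
    "s3": {"c-d", "d-e", "a-b"},
}
chosen = greedy_min_set(corpus)
covered = set().union(*(corpus[s] for s in chosen))
```

Each iteration rescans every sentence, which illustrates why the search becomes costly as the number of distinct contextual units and the corpus size grow.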
Peng HAN Hua TIAN Zhensong ZHANG Wei XIE
A wireless emergency communication network with a fixed allocation of spectrum resources cannot meet the tremendous demand for spectrum access when a crisis occurs. It is necessary to develop an effective spectrum access scheme to improve the performance of emergency communication systems. In this paper, we study a new emergency communication system that combines cognitive radio technology and an emergency communication network. Emergency users can utilize resources in a general network when traffic becomes congested in an emergency network. A non-reciprocal spectrum access scheme (NRA) and a reciprocal spectrum access scheme (RA) for two heterogeneous cognitive networks, namely the emergency network and the general network, are proposed and compared with the traditional spectrum access scheme (TA). User behavior under each scheme is modeled by continuous-time Markov chains. Moreover, the blocking and dropping probabilities of users in the two heterogeneous cognitive networks are derived as the performance metrics. In addition, the throughput and the spectrum utilization rate of the system are evaluated. Finally, we compare the performance of the three dynamic spectrum access schemes. The simulation results show that the RA scheme is an effective scheme for enhancing the performance of emergency systems.
Yong WANG Xiaoran DUAN Xiaodong YANG Yiquan ZHANG Xiaosong ZHANG
Geosocial networking allows users to interact with respect to their current locations, which enables a group of users to determine where to meet. This calls for techniques that support the processing of Multiple-user Location-based Keyword (MULK) queries, which return a set of Points of Interest (POIs) that are 'close' to the locations of the users in a group and can provide them with potential options at the lowest expense (e.g., minimizing travel distance). In this paper, we formalize the MULK query and propose a dynamic programming-based algorithm to find the optimal result set. Further, we design two approximation algorithms to improve MULK query processing efficiency. The experimental evaluations show that our solutions are feasible and efficient under various parameter settings.