Rong CHEN Cunqian FENG Sisan HE Yi RAO
The extraction of micro-motion parameters is strongly influenced by the precision with which the translational motion parameters are estimated. Exploiting the periodicity of micro-motion, quadratic polynomial fitting is carried out on the range delays to align the envelope. The micro-motion component of the phase is eliminated by conjugate multiplication, after which the translational motion parameters are estimated. The translational motion is then precisely compensated through third-order polynomial fitting. Simulation results demonstrate that the proposed algorithm achieves precise translational motion compensation even at a low signal-to-noise ratio (SNR).
The use of flash memory based storage devices is rapidly increasing, and user demands for high performance are also constantly increasing. The performance of a flash storage device is greatly influenced by the cleaning operations of the Flash Translation Layer (FTL). Various studies have been conducted to lower the cost of cleaning operations. However, with only the limited information available inside the storage device and no help from the host system, there are limits to how much the performance of flash storage can be improved. Recently, the SCSI, eMMC, and UFS standards have come to provide an interface for sending semantic information from a host system to a storage device. In this paper, we analyze the effects of semantic information on the performance and lifetime of flash storage devices. We evaluate performance and lifetime improvement through SA-FTL (Semantic Aware Flash Translation Layer), which can take advantage of semantic information in storage devices. Experiments show that SA-FTL improves the performance and lifetime of flash based storage by up to 30% and 35%, respectively, compared to a simple page-level FTL.
JinAn XU Yufeng CHEN Kuang RU Yujie ZHANG Kenji ARAKI
Extraction of named entity translation equivalents plays a critical role in machine translation (MT) and cross language information retrieval (CLIR). Traditional methods are often based on large-scale parallel or comparable corpora. However, the applicability of these studies is constrained, mainly because of the scarcity of parallel corpora of the required scale, especially for the Chinese-Japanese language pair. In this paper, we propose a method that considers the characteristics of Chinese and Japanese to automatically extract Chinese-Japanese Named Entity (NE) translation equivalents based on inductive learning (IL) from monolingual corpora. The method adopts the Chinese Hanzi and Japanese Kanji Mapping Table (HKMT) to calculate the similarity of NE instances between Japanese and Chinese. Then, we use IL to obtain partial translation rules for NEs by extracting the differing parts from high-similarity NE instances in Chinese and Japanese. Finally, feedback processing updates the Chinese and Japanese NE similarity and the rule sets. Experimental results show that our simple, efficient method overcomes the insufficiency of traditional methods, which depend heavily on bilingual resources. Compared with other methods, our method combines the language features of Chinese and Japanese with IL to automatically extract NE pairs. Our use of weakly correlated bilingual text sets and minimal additional knowledge to extract NE pairs effectively reduces the cost of building the corpus and the need for additional knowledge. Our method may help to build a large-scale Chinese-Japanese NE translation dictionary using monolingual corpora.
The paper presents a small reversible language R-CORE, a structured imperative programming language with symbolic tree-structured data (S-expressions). The language is reduced to the core of a reversible language, with a single command for reversibly updating the store, a single reversible control-flow operator, a limited number of variables, and data with a single atom and a single constructor. Despite its extreme simplicity, the language is reversibly universal, which means that it is as powerful as any reversible language can be, while it is linear-time self-interpretable, and it allows reversible programming with dynamic data structures. The four-line program inverter for R-CORE is among the shortest existing program inverters, which demonstrates the conciseness of the language. The translator to R-CORE, which is used to show the formal properties of the language, is clean and modular, and it may serve as a model for related reversible translation problems. The goal is to provide a language that is sufficiently concise for theoretical investigations. Owing to its simplicity, the language may also be used for educational purposes.
Shimin SUN Li HAN Xianshu JIN Sunyoung HAN
For IP-based mobile networks, efficient mobility management is vital to provisioning seamless online service. IP address starvation and scalability issues constrain the wide deployment of existing mobility schemes, such as Mobile IP, Proxy Mobile IP, and their derivatives. Most studies focus on mobility among public networks. However, most current networks, such as home networks, sensor networks, and enterprise networks, are deployed as private networks, to which mobility solutions are hard to apply. With its rapid development, Software Defined Networking (SDN) offers an opportunity for innovation in supporting mobility in private network schemes. In this paper, a novel mobility management scheme is presented to support a mobile node moving from a public network to a private network with a seamless handover procedure. The centralized control and flexible flow management of SDN are utilized to provide network-based mobility support with a better QoS guarantee. Benefiting from SDN/OpenFlow technology, the complex handover process is simplified with fewer message exchanges. Furthermore, handover efficiency can be improved in terms of delay and overhead reduction, scalability, and security. Analytical and implementation results show better performance than Mobile IP in terms of latency and throughput variation.
Shigeki MATSUDA Teruaki HAYASHI Yutaka ASHIKARI Yoshinori SHIGA Hidenori KASHIOKA Keiji YASUDA Hideo OKUMA Masao UCHIYAMA Eiichiro SUMITA Hisashi KAWAI Satoshi NAKAMURA
This study introduces large-scale field experiments of VoiceTra, which is the world's first speech-to-speech multilingual translation application for smart phones. In the study, approximately 10 million input utterances were collected since the experiments commenced. The usage of collected data was analyzed and discussed. The study has several important contributions. First, it explains system configuration, communication protocol between clients and servers, and details of multilingual automatic speech recognition, multilingual machine translation, and multilingual speech synthesis subsystems. Second, it demonstrates the effects of mid-term system updates using collected data to improve an acoustic model, a language model, and a dictionary. Third, it analyzes system usage.
Zhenxin YANG Miao LI Lei CHEN Kai SUN
In this paper, a morpheme-based weighting and its integration method are proposed as a smoothing method to alleviate the data sparseness in Chinese-Mongolian statistical machine translation (SMT). Besides, we present source-side reordering as the pre-processing model to verify the extensibility of our method. Experimental results show that the morpheme-based weighting can substantially improve the translation quality.
Dongchul PARK Biplob DEBNATH David H.C. DU
The Flash Translation Layer (FTL) is a firmware layer inside NAND flash memory that allows existing disk-based applications to use it without any significant modifications. Since the FTL has a critical impact on the performance and reliability of flash-based storage, a variety of FTLs have been proposed. The existing FTLs, however, are designed to perform well for either a read intensive workload or a write intensive workload, not for both due to their internal address mapping schemes. To overcome this limitation, we propose a novel hybrid FTL scheme named Convertible Flash Translation Layer (CFTL). CFTL is adaptive to data access patterns with the help of our unique hot data identification design that adopts multiple bloom filters. Thus, CFTL can dynamically switch its mapping scheme to either page-level mapping or block-level mapping to fully exploit the benefits of both schemes. In addition, we design a spatial locality-aware caching mechanism and adaptive cache partitioning to further improve CFTL performance. Consequently, both the adaptive switching scheme and the judicious caching mechanism empower CFTL to achieve good read and write performance. Our extensive evaluations demonstrate that CFTL outperforms existing FTLs. In particular, our specially designed caching mechanism remarkably improves the cache hit ratio, by an average of 2.4×, and achieves much higher hit ratios (up to 8.4×) especially for random read intensive workloads.
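The idea of identifying hot data with multiple bloom filters can be illustrated with a minimal Python sketch. The abstract states only that several bloom filters are used; the round-robin insertion, the periodic clearing of one filter as an aging step, and all names and parameters (`MultiBloomHotness`, `decay_interval`, `hot_threshold`) are assumptions made for illustration, not CFTL's actual design.

```python
import hashlib


class MultiBloomHotness:
    """Toy hot-data tracker built from several small bloom filters.

    Assumed scheme (not taken from the abstract): each write inserts the
    logical block address (LBA) into one filter chosen round-robin, and one
    filter is cleared every `decay_interval` writes so old activity fades.
    """

    def __init__(self, num_filters=4, bits=1024, num_hashes=2,
                 decay_interval=512, hot_threshold=2):
        self.bits = bits
        self.num_hashes = num_hashes          # at most 8 with the SHA-256 slicing below
        self.decay_interval = decay_interval
        self.hot_threshold = hot_threshold
        # One byte per bit keeps the sketch simple (not space efficient).
        self.filters = [bytearray(bits) for _ in range(num_filters)]
        self.cursor = 0        # next filter to try for insertion
        self.decay_cursor = 0  # next filter to clear
        self.writes = 0

    def _positions(self, lba):
        """Derive `num_hashes` bit positions from one SHA-256 digest."""
        digest = hashlib.sha256(str(lba).encode()).digest()
        return [int.from_bytes(digest[4 * i:4 * i + 4], "big") % self.bits
                for i in range(self.num_hashes)]

    @staticmethod
    def _contains(filt, positions):
        return all(filt[p] for p in positions)

    def record_write(self, lba):
        positions = self._positions(lba)
        n = len(self.filters)
        # Insert into the first filter, starting at the cursor, that does
        # not already report this LBA.
        for step in range(n):
            filt = self.filters[(self.cursor + step) % n]
            if not self._contains(filt, positions):
                for p in positions:
                    filt[p] = 1
                break
        self.cursor = (self.cursor + 1) % n
        self.writes += 1
        if self.writes % self.decay_interval == 0:
            # Aging: wipe one filter so stale hotness is forgotten.
            self.filters[self.decay_cursor][:] = bytes(self.bits)
            self.decay_cursor = (self.decay_cursor + 1) % n

    def hotness(self, lba):
        """Number of filters that report the LBA (higher means hotter)."""
        positions = self._positions(lba)
        return sum(self._contains(f, positions) for f in self.filters)

    def is_hot(self, lba):
        return self.hotness(lba) >= self.hot_threshold
```

In a scheme like this, an address written repeatedly appears in several filters and crosses the threshold, while an address written once appears in at most one filter unless a false positive occurs. A mapping layer could then consult `is_hot` when deciding which mapping granularity to use.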
This paper proposes a new question expansion method for question classification in cQA services. An input question consists of only a question, whereas each training instance is a pair of a question and an answer. Thus, input questions often cannot provide enough information for good classification. Since the answer is strongly associated with the input question, we create a pseudo answer to expand each input question. Translation probabilities between questions and answers and a pseudo relevance feedback technique are used to generate the pseudo answer. As a result, we obtain significantly improved performance when the two approaches are effectively combined.
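The translation-probability half of this idea can be sketched in a few lines of Python. The sketch assumes a word-level table P(answer word | question word) is already available. The function names, the additive scoring, and the toy table in the usage note are illustrative assumptions, and the pseudo relevance feedback step is not shown.

```python
def pseudo_answer(question_tokens, translation_probs, top_k=3):
    """Pick the answer-side words most strongly associated with the question.

    translation_probs maps a question word to {answer word: probability}.
    Scores are summed over the question words (an assumed, simple choice).
    """
    scores = {}
    for q_word in question_tokens:
        for a_word, prob in translation_probs.get(q_word, {}).items():
            scores[a_word] = scores.get(a_word, 0.0) + prob
    # Sort by descending score; ties are broken alphabetically.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [a_word for a_word, _ in ranked[:top_k]]


def expand_question(question_tokens, translation_probs, top_k=3):
    """Append the pseudo answer words that are not already in the question."""
    extra = [w for w in pseudo_answer(question_tokens, translation_probs, top_k)
             if w not in question_tokens]
    return list(question_tokens) + extra
```

With a toy table such as `{"laptop": {"battery": 0.4, "charger": 0.3}, "slow": {"memory": 0.5, "battery": 0.2}}`, the question `["laptop", "slow"]` is expanded with `battery` and `memory` when `top_k=2`, and the expanded token list is what a classifier would then see.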
NAND-based block devices such as memory cards and solid-state drives embed a flash translation layer (FTL) to emulate the standard block device interface and its features. The overall performance of these devices is determined mainly by the efficiency of the FTL scheme, so intensive research has been performed to improve the average performance of the FTL scheme. However, its worst-case performance has rarely been considered. The present study aims to improve the worst-case performance without affecting the average performance. The central concept is to distribute the garbage collection cost, which is the main source of performance fluctuations, over multiple requests. The proposed scheme comprises three modules: i) anticipated partial log block merging to distribute the garbage collection time; ii) reclaiming clean pages by moving valid pages to bound the worst-case garbage collection time, instead of performing repeated block merges; and iii) victim selection based on the valid page count in a victim log and the required clean page count to avoid subsequent garbage collections. A trace-driven simulation showed that the worst-case performance was improved up to 1,300% using the proposed garbage collection scheme. The average performance was also similar to that of the original scheme. This improvement was achieved without additional memory overheads.
Lasguido NIO Sakriani SAKTI Graham NEUBIG Tomoki TODA Satoshi NAKAMURA
This paper describes the design and evaluation of a method for developing a chat-oriented dialog system by utilizing real human-to-human conversation examples from movie scripts and Twitter conversations. The aim of the proposed method is to build a conversational agent that can interact with users in as natural a fashion as possible, while reducing the time required for database design and collection. A number of the challenging design issues we faced are described, including (1) constructing appropriate dialog corpora from raw movie scripts and Twitter data, and (2) developing a multi-domain chat-oriented dialog management system that can retrieve a proper system response based on the current user query. To build a dialog corpus, we propose a unit of conversation called a tri-turn (a trigram conversation turn), as well as extraction and semantic similarity analysis techniques to help ensure that the content extracted from raw movie/drama script files forms appropriate dialog-pair (query-response) examples. The constructed dialog corpora are then utilized in a data-driven dialog management system. Here, various approaches are investigated, including example-based dialog management (EBDM) and response generation using phrase-based statistical machine translation (SMT). In particular, we use two EBDM techniques: syntactic-semantic similarity retrieval and TF-IDF based cosine similarity retrieval. Experiments are conducted to compare and contrast the EBDM and SMT approaches in building a chat-oriented dialog system, and we investigate a combined method that addresses the advantages and disadvantages of both approaches. System performance was evaluated based on objective metrics (semantic similarity and cosine similarity) and human subjective evaluation from a small user study. Experimental results show that the proposed filtering approach effectively improves the performance. Furthermore, the results also show that by combining the EBDM and SMT approaches, we could overcome the shortcomings of each.
Taku FUKUSHIMA Takashi YOSHINO
In this study, we have developed a translation repair method to automatically improve the accuracy of translations. Machine translation (MT) supports multilingual communication; however, it cannot achieve high accuracy. MT creates only one translated sentence; therefore, it is difficult to improve the accuracy of translated sentences. Our method creates multiple translations by adding personal pronouns to the source sentence and by using a word dictionary and a parallel corpus. In addition, it selects an accurate translation from among the multiple translations using the results of a Web search. As a result, the translation repair method improved the accuracy of translated sentences, and its accuracy is greater than that of MT.
Yeon-Soo LEE Hyoung-Gyu LEE Hae-Chang RIM Young-Sook HWANG
In phrase-based statistical machine translation, the long-distance reordering problem is one of the most challenging issues when translating syntactically distant language pairs. In this paper, we propose a novel reordering model to solve this problem. In our model, reordering is affected by the overall structures of sentences, such as listings, reduplications, and modifications, as well as by the relationships between adjacent phrases. To this end, we reflect global syntactic contexts, including the parts that are not yet translated, during the decoding process.
Xiaoyong ZHANG Noriyasu HOMMA Kei ICHIJI Makoto ABE Norihiro SUGITA Makoto YOSHIZAWA
This paper presents a faster one-dimensional (1-D) phase-only correlation (POC)-based method for estimating translation, rotation, and scaling in images. The proposed method projects two-dimensional (2-D) images horizontally and vertically onto 1-D signals and uses 1-D POCs of these signals to estimate the translations in images. Combined with a log-polar transform, the proposed method is extended to scaling and rotation estimation. Compared with conventional 2-D and 1-D POC-based methods, the proposed method runs at a lower computational cost. Experimental results demonstrate that the proposed method is capable of estimating large translations, rotation, and scaling in images, and that its accuracy is comparable to that of the conventional POC-based methods. The experimental results also show that the computational cost of the proposed method is much lower than that of the conventional POC-based methods.
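The projection-plus-1-D-POC step for translation can be sketched as follows. This is a minimal illustration that assumes circular (wrap-around) shifts and integer displacements, and it uses a naive DFT so that it needs only the standard library. The function names are illustrative, and the log-polar extension to rotation and scaling is not covered.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform; adequate for short signals."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def poc_shift_1d(f, g):
    """Estimate the circular shift d with g[t] == f[(t - d) % n].

    The cross spectrum is normalised to unit magnitude (phase only), so its
    inverse transform peaks at the displacement.
    """
    n = len(f)
    spec_f, spec_g = dft(f), dft(g)
    cross = []
    for a, b in zip(spec_f, spec_g):
        c = b * a.conjugate()
        mag = abs(c)
        cross.append(c / mag if mag > 1e-12 else 0j)
    corr = [v.real for v in dft(cross, inverse=True)]
    peak = max(range(n), key=corr.__getitem__)
    # Map the peak index to a signed shift in (-n/2, n/2].
    return peak if peak <= n // 2 else peak - n


def project(image):
    """Sum each row and each column of a 2-D list to get two 1-D profiles."""
    row_profile = [sum(row) for row in image]
    col_profile = [sum(col) for col in zip(*image)]
    return row_profile, col_profile


def estimate_translation(img_a, img_b):
    """Return (dx, dy) such that img_b is img_a shifted by dx columns, dy rows."""
    rows_a, cols_a = project(img_a)
    rows_b, cols_b = project(img_b)
    dy = poc_shift_1d(rows_a, rows_b)
    dx = poc_shift_1d(cols_a, cols_b)
    return dx, dy
```

Two 1-D correlations of length N replace one 2-D correlation over N×N samples, which is where the lower computational cost comes from. A practical version would use an FFT, windowing, and sub-sample peak fitting, none of which is shown here.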
Bo WANG Yuanyuan ZHANG Qian XU
We describe a novel idea to improve machine translation by combining multiple candidate translations and extra translations. Without manual work, extra translations can be generated by identifying and hybridizing the syntactic equivalents in candidate translations. Candidate and extra translations are then combined on sentence level for better general translation performance.
To characterize an antenna, the acquisition of its three-dimensional radiation pattern is the fundamental requirement. Spherical antenna measurement is a practical approach to measuring antenna patterns in spherical geometry. However, due to the limitations of measurement range and measurement time, the measured samples may either be incomplete on the scanning sphere or be inadequate in terms of the sampling interval. Therefore, there is a need to extrapolate and interpolate the measured samples. Spherical wave expansion, whose band-limited property is derived from the sampling theorem, provides a good tool for reconstructing antenna patterns. This research identifies the limitation of the conventional algorithm when reconstructing the pattern of an antenna that is not located at the coordinate origin of the measurement set-up. A novel algorithm is proposed to overcome the limitation by resampling between the unprimed and primed (where the antenna is centred) coordinate systems. The resampling of measured samples from the unprimed coordinate system to the primed one can be conducted by a translational phase shift, and the resampling of the reconstructed pattern from the primed coordinate system back to the unprimed one can be accomplished by rotation and translation of spherical waves. The proposed algorithm enables analytical and continuous pattern reconstruction, even under severe sampling conditions for an antenna under test (AUT) that deviates from the origin. Numerical investigations are conducted to validate the proposed algorithm.
Minh-Quoc NGHIEM Giovanni YOKO KRISTIANTO Akiko AIZAWA
This paper explores the problem of semantic enrichment of mathematical expressions. We formulate this task as the translation of mathematical expressions from presentation markup to content markup. We use MathML, an application of XML, to describe both the structure and content of mathematical notations. We apply a method based on statistical machine translation to extract translation rules automatically. This approach contrasts with previous research, which tends to rely on manually encoded rules. We also introduce segmentation rules used to segment mathematical expressions. Combining segmentation rules and translation rules strengthens the translation system and achieves significant improvements over a prior rule-based system.
Zezhong LI Hideto IKEDA Junichi FUKUMOTO
In most phrase-based statistical machine translation (SMT) systems, the translation model relies on word alignment, which serves as a constraint for the subsequent building of a phrase table. Word alignment is usually inferred by GIZA++, which implements all the IBM models and the HMM model in the framework of Expectation-Maximization (EM). In this paper, we present a fully Bayesian inference for word alignment. Unlike the EM approach, the Bayesian inference makes use of all possible parameter values rather than estimating a single parameter value, from which we expect a more robust inference. After inferring the word alignment, current SMT systems usually train the phrase table from the Viterbi word alignment, which is prone to learning incorrect phrases due to word alignment mistakes. To overcome this drawback, a new phrase extraction method is proposed based on multiple Gibbs samples from Bayesian inference for word alignment. Empirical results show promising improvements over baselines in alignment quality as well as translation performance.
Hansjorg HOFMANN Sakriani SAKTI Chiori HORI Hideki KASHIOKA Satoshi NAKAMURA Wolfgang MINKER
The performance of English automatic speech recognition systems decreases when recognizing spontaneous speech, mainly due to multiple pronunciation variants in the utterances. Previous approaches address this problem by modeling the alteration of the pronunciation on a phoneme-to-phoneme level. However, the phonetic transformation effects induced by the pronunciation of the whole sentence have not yet been considered. In this article, the sequence-based pronunciation variation is modeled using a noisy channel approach where the spontaneous phoneme sequence is considered as a “noisy” string and the goal is to recover the “clean” string of the word sequence. In this way, the whole word sequence and its effect on the alteration of the phonemes are taken into consideration. Moreover, the system learns not only the phoneme transformation but also the mapping from the phoneme to the word directly. In this study, the phonemes are first recognized with the present recognition system, and afterwards the pronunciation variation model based on the noisy channel approach maps from the phoneme to the word level. Two well-known natural language processing approaches are adopted and derived from the noisy channel model theory: joint-sequence models and statistical machine translation. Both of them are applied, and various experiments are conducted on spontaneous speech recorded by microphone and telephone.
Chooi-Ling GOH Taro WATANABE Eiichiro SUMITA
While phrase-based statistical machine translation systems prefer to translate with longer phrases, this may cause errors in a free word order language, such as Japanese, in which the order of the arguments of the predicates is not solely determined by the predicates and the arguments can be placed quite freely in the text. In this paper, we propose to reorder the arguments but not the predicates in Japanese using a dependency structure as a kind of reordering. Instead of a single deterministically given permutation, we generate multiple reordered phrases for each sentence and translate them independently. Then we apply a re-ranking method using a discriminative approach by Ranking Support Vector Machines (SVM) to re-score the multiple reordered phrase translations. In our experiment with the travel domain corpus BTEC, we gain a 1.22% BLEU score improvement when only 1-best is used for re-ranking and 4.12% BLEU score improvement when n-best is used for Japanese-English translation.
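The generate-then-re-rank pipeline described above can be sketched in Python. The sketch makes several simplifying assumptions: the arguments of a single predicate are already identified (for example by a dependency parser), every permutation is a candidate, and a linear scoring function stands in for a trained Ranking SVM. The function names and feature names are illustrative only.

```python
from itertools import permutations


def reorder_arguments(arguments, predicate):
    """Generate every ordering of the arguments, keeping the predicate last."""
    return [list(order) + [predicate] for order in permutations(arguments)]


def rerank(candidates, feature_fn, weights):
    """Sort candidates by a linear score w . phi(candidate), best first.

    A trained Ranking SVM also scores items with a linear function at test
    time; here the weights are supplied by hand purely for illustration.
    """
    def score(candidate):
        features = feature_fn(candidate)
        return sum(weights.get(name, 0.0) * value
                   for name, value in features.items())

    return sorted(candidates, key=score, reverse=True)
```

For the clause 太郎が 花子に 本を 渡した ("Taro handed a book to Hanako"), `reorder_arguments(["太郎が", "花子に", "本を"], "渡した")` produces six candidates that all end in the predicate. In the setting the abstract describes, each candidate would be translated independently and the features would be computed on the resulting translations rather than on the source chunks.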