Kazu MISHIBA Takeshi YOSHITOME
The relative arrangement, such as relative positions and orientations among objects, can play an important role in expressing the situation such as sports games and race scenes. In this paper, we propose a retargeting method that allows maintaining the relative arrangement. Our proposed retargeting method is based on a warping method which finds an optimal transformation by solving an energy minimization problem. To achieve protection of object arrangement, we introduce an energy that enforces all the objects and the relative positions among these objects to be transformed by the same transformation in the retargeting process. In addition, our method imposes the following three types of conditions in order to obtain more satisfactory results: protection of important regions, avoiding extreme deformation, and cropping with preservation of the balance of visual importance. Experimental results demonstrate that our proposed method maintains the relative arrangement while protecting important regions.
Toru NAKASHIKA Tetsuya TAKIGUCHI Yasuo ARIKI
This paper presents a voice conversion technique using speaker-dependent Restricted Boltzmann Machines (RBM) to build high-order eigen spaces of source/target speakers, where it is easier to convert the source speech to the target speech than in the traditional cepstrum space. We build a deep conversion architecture that concatenates the two speaker-dependent RBMs with neural networks, expecting that they automatically discover abstractions to express the original input features. Under this concept, if we train the RBMs using only the speech of an individual speaker that includes various phonemes while keeping the speaker individuality unchanged, it can be considered that there are fewer phonemes and relatively more speaker individuality in the output features of the hidden layer than original acoustic features. Training the RBMs for a source speaker and a target speaker, we can then connect and convert the speaker individuality abstractions using Neural Networks (NN). The converted abstraction of the source speaker is then back-propagated into the acoustic space (e.g., MFCC) using the RBM of the target speaker. We conducted speaker-voice conversion experiments and confirmed the efficacy of our method with respect to subjective and objective criteria, comparing it with the conventional Gaussian Mixture Model-based method and an ordinary NN.
Shoko YAMAHATA Yoshikazu YAMAGUCHI Atsunori OGAWA Hirokazu MASATAKI Osamu YOSHIOKA Satoshi TAKAHASHI
Recognition errors caused by out-of-vocabulary (OOV) words lead critical problems when developing spoken language understanding systems based on automatic speech recognition technology. And automatic vocabulary adaptation is an essential technique to solve these problems. In this paper, we propose a novel and effective automatic vocabulary adaptation method. Our method selects OOV words from relevant documents using combined scores of semantic and acoustic similarities. Using this combined score that reflects both semantic and acoustic aspects, only necessary OOV words can be selected without registering redundant words. In addition, our method estimates probabilities of OOV words using semantic similarity and a class-based N-gram language model. These probabilities will be appropriate since they are estimated by considering both frequencies of OOV words in target speech data and the stable class N-gram probabilities. Experimental results show that our method improves OOV selection accuracy and recognition accuracy of newly registered words in comparison with conventional methods.
I-Jen CHAO Ching-Wen HOU Bin-Da LIU Soon-Jyh CHANG Chun-Yueh HUANG
A third-order low-distortion delta-sigma modulator (DSM), whose third-order noise-shaping ability is achieved by just a single opamp, is proposed. Since only one amplifier is required in the whole circuit, the designed DSM is very power efficient. To realize the adder in front of quantizer without employing the huge-power opamp, a capacitive passive adder, which is the digital-to-analog converter (DAC) array of a successive-approximation-type quantizer, is used. In addition, the feedback path timing is extended from a nonoverlapping interval for the conventional low-distortion structure to half of the clock period, so that the strict operation timing issue with regard to quantization and the dynamic element matching (DEM) logic operation can be solved. In the proposed DSM structure, the features of the unity-gain signal transfer function (STF) and finite-impulse-response (FIR) noise transfer function (NTF) are still preserved, and thus advantages such as a relaxed opamp slew rate and reduced output swing are also maintained, as with the conventional low-distortion DSM. Moreover, the memory effect in the proposed DSM is analyzed when employing the opamp sharing for integrators. The proposed third-order DSM with a 4-bit SAR ADC as the quantizer is implemented in a 90-nm CMOS process. The post-layout simulations show a 79.8-dB signal-to-noise and distortion ratio (SNDR) in the 1.875-MHz signal bandwidth (OSR=16). The active area of the circuit is 0.35mm2 and total power consumption is 2.85mW, resulting in a figure of merit (FOM) of 95 fJ/conversion-step.
Honggyu JUNG Kwang-Yul KIM Yoan SHIN
We propose a cooperative compressed spectrum sensing scheme for correlated signals in wideband cognitive radio networks. In order to design a reconstruction algorithm which accurately recover the wideband signals from the compressed samples in low SNR (Signal-to-Noise Ratio) environments, we consider the multiple measurement vector model exploiting a sequence of input signals and propose a cooperative sparse Bayesian learning algorithm which models the temporal correlation of the input signals. Simulation results show that the proposed scheme outperforms existing compressed sensing algorithms for low SNRs.
ID-based authenticated key exchange (ID-AKE) is a cryptographic tool to establish a common session key between parties with authentication based on their IDs. If IDs contain some hierarchical structure such as an e-mail address, hierarchical ID-AKE (HID-AKE) is especially suitable because of scalability. However, most of existing HID-AKE schemes do not satisfy advanced security properties such as forward secrecy, and the only known strongly secure HID-AKE scheme is inefficient. In this paper, we propose a new HID-AKE scheme which achieves both strong security and efficiency. We prove that our scheme is eCK-secure (which ensures maximal-exposure-resilience including forward secrecy) without random oracles, while existing schemes is proved in the random oracle model. Moreover, the number of messages and pairing operations are independent of the hierarchy depth; that is, really scalable and practical for a large-system.
Yichao LU Xiao PENG Guifen TIAN Satoshi GOTO
Majority-logic algorithms are devised for decoding non-binary LDPC codes in order to reduce computational complexity. However, compared with conventional belief propagation algorithms, majority-logic algorithms suffer from severe bit error performance degradation. This paper presents a low-complexity reliability-based algorithm aiming at improving error correcting ability of majority-logic algorithms. Reliability measures for check nodes are novelly introduced to realize mutual update between variable message and check message, and hence more efficient reliability propagation can be achieved, similar to belief-propagation algorithm. Simulation results on NB-LDPC codes with different characteristics demonstrate that our algorithm can reduce the bit error ratio by more than one order of magnitude and the coding gain enhancement over ISRB-MLGD can reach 0.2-2.0dB, compared with both the ISRB-MLGD and IISRB-MLGD algorithms. Moreover, simulations on typical LDPC codes show that the computational complexity of the proposed algorithm is closely equivalent to ISRB-MLGD algorithm, and is less than 10% of Min-max algorithm. As a result, the proposed algorithm achieves a more efficient trade-off between decoding computational complexity and error performance.
Ce LIANG Xiyan SUN Yuanfa JI Qinghua LIU Guisheng LIAO
The composite binary offset carrier (CBOC) modulated signal contains multi-peaks in its auto-correlation function, which brings ambiguity to the signal acquisition process of a GNSS receiver. Currently, most traditional ambiguity-removing schemes for CBOC signal acquisition approximate CBOC signal as a BOC signal, which may incur performance degradation. Based on Galileo E1 CBOC signal, this paper proposes a novel adaptive ambiguity-removing acquisition scheme which doesn't adopt the approximation used in traditional schemes. According to the energy ratio of each sub-code of CBOC signal, the proposed scheme can self-adjust its local reference code to achieve unambiguous and precise signal synchronization. Monte Carlo simulation is conducted in this paper to analyze the performance of the proposed scheme and three traditional schemes. Simulation results show that the proposed scheme has higher detection probability and less mean acquisition time than the other three schemes, which verify the superiority of the proposed scheme.
Yu TSAO Ting-Yao HU Sakriani SAKTI Satoshi NAKAMURA Lin-shan LEE
This study proposes a variable selection linear regression (VSLR) adaptation framework to improve the accuracy of automatic speech recognition (ASR) with only limited and unlabeled adaptation data. The proposed framework can be divided into three phases. The first phase prepares multiple variable subsets by applying a ranking filter to the original regression variable set. The second phase determines the best variable subset based on a pre-determined performance evaluation criterion and computes a linear regression (LR) mapping function based on the determined subset. The third phase performs adaptation in either model or feature spaces. The three phases can select the optimal components and remove redundancies in the LR mapping function effectively and thus enable VSLR to provide satisfactory adaptation performance even with a very limited number of adaptation statistics. We formulate model space VSLR and feature space VSLR by integrating the VS techniques into the conventional LR adaptation systems. Experimental results on the Aurora-4 task show that model space VSLR and feature space VSLR, respectively, outperform standard maximum likelihood linear regression (MLLR) and feature space MLLR (fMLLR) and their extensions, with notable word error rate (WER) reductions in a per-utterance unsupervised adaptation manner.
Keigo KUBO Sakriani SAKTI Graham NEUBIG Tomoki TODA Satoshi NAKAMURA
Grapheme-to-phoneme (g2p) conversion, used to estimate the pronunciations of out-of-vocabulary (OOV) words, is a highly important part of recognition systems, as well as text-to-speech systems. The current state-of-the-art approach in g2p conversion is structured learning based on the Margin Infused Relaxed Algorithm (MIRA), which is an online discriminative training method for multiclass classification. However, it is known that the aggressive weight update method of MIRA is prone to overfitting, even if the current example is an outlier or noisy. Adaptive Regularization of Weight Vectors (AROW) has been proposed to resolve this problem for binary classification. In addition, AROW's update rule is simpler and more efficient than that of MIRA, allowing for more efficient training. Although AROW has these advantages, it has not been applied to g2p conversion yet. In this paper, we first apply AROW on g2p conversion task which is structured learning problem. In an evaluation that employed a dataset generated from the collective knowledge on the Web, our proposed approach achieves a 6.8% error reduction rate compared to MIRA in terms of phoneme error rate. Also the learning time of our proposed approach was shorter than that of MIRA in almost datasets.
Masato NAKAMURA Junya SEKIKAWA
Break arcs are generated in a DC48V and 12A resistive circuit. Silver electrical contacts are separated at constant opening speed. The cathode contact surface is irradiated by a blue LED. The center wavelength of the emission of the LED is 470nm. There is no spectral line of the light emitted from the break arcs. Only the images of contact surface are observed by a high-speed camera and an optical band pass filter. Another high-speed camera observes only the images of the break arc. Time evolutions of the cathode surface morphology being eroded by the break arcs and the motion of the break arcs are observed with these cameras, simultaneously. The images of the cathode surface are investigated by the image analysis technique. The results show that the moments when the expanded regions on the cathode surface are formed during the occurrence of the break arcs. In addition, it is shown that the expanded regions are not contacted directly to the cathode roots of the break arcs.
Yuhu CHENG Xuesong WANG Ge CAO
A multi-source Tri-Training transfer learning algorithm is proposed by integrating transfer learning and semi-supervised learning. First, multiple weak classifiers are respectively trained by using both weighted source and target training samples. Then, based on the idea of co-training, each target testing sample is labeled by using trained weak classifiers and the sample with the same label is selected as the high-confidence sample to be added into the target training sample set. Finally, we can obtain a target domain classifier based on the updated target training samples. The above steps are iterated till the high-confidence samples selected at two successive iterations become the same. At each iteration, source training samples are tested by using the target domain classifier and the samples tested as correct continue with training, while the weights of samples tested as incorrect are lowered. Experimental results on text classification dataset have proven the effectiveness and superiority of the proposed algorithm.
Yunpyo HONG Juwon BYUN Youngjo KIM Jaeseok KIM
This letter proposes a pipelined architecture with prediction mode scheduling for high efficiency video coding (HEVC). An increased number of intra prediction modes in HEVC have introduced a new technique, named rough mode decision (RMD). This development, however, means that pipeline architectures for H.264 cannot be used in HEVC. The proposed scheme executes the RMD and the rate-distortion optimization (RDO) process simultaneously by grouping the intra prediction modes and changing the candidate selection method of the RMD algorithm. The proposed scheme reduces execution cycle by up to 26% with negligible coding loss.
Kung-Jui PAI Jou-Ming CHANG Yue-Li WANG Ro-Yu WU
A queue layout of a graph G consists of a linear order of its vertices, and a partition of its edges into queues, such that no two edges in the same queue are nested. The queuenumber qn(G) is the minimum number of queues required in a queue layout of G. The Cartesian product of two graphs G1 = (V1,E1) and G2 = (V2,E2), denoted by G1 × G2, is the graph with {
Kazuhiro NAKAMURA Kei HASHIMOTO Yoshihiko NANKAKU Keiichi TOKUDA
This paper proposes a novel approach for integrating spectral feature extraction and acoustic modeling in hidden Markov model (HMM) based speech synthesis. The statistical modeling process of speech waveforms is typically divided into two component modules: the frame-by-frame feature extraction module and the acoustic modeling module. In the feature extraction module, the statistical mel-cepstral analysis technique has been used and the objective function is the likelihood of mel-cepstral coefficients for given speech waveforms. In the acoustic modeling module, the objective function is the likelihood of model parameters for given mel-cepstral coefficients. It is important to improve the performance of each component module for achieving higher quality synthesized speech. However, the final objective of speech synthesis systems is to generate natural speech waveforms from given texts, and the improvement of each component module does not always lead to the improvement of the quality of synthesized speech. Therefore, ideally all objective functions should be optimized based on an integrated criterion which well represents subjective speech quality of human perception. In this paper, we propose an approach to model speech waveforms directly and optimize the final objective function. Experimental results show that the proposed method outperformed the conventional methods in objective and subjective measures.
Hiroshi FUJIWARA Yasuhiro KONNO Toshihiro FUJITO
The multislope ski-rental problem is an extension of the classical ski-rental problem, where the player has several options of paying both of a per-time fee and an initial fee, in addition to pure renting and buying options. Damaschke gave a lower bound of 3.62 on the competitive ratio for the case where arbitrary number of options can be offered. In this paper we propose a scheme that for the number of options given as an input, provides a lower bound on the competitive ratio, by extending the method of Damaschke. This is the first to establish a lower bound for each of the 5-or-more-option cases, for example, a lower bound of 2.95 for the 5-option case, 3.08 for the 6-option case, and 3.18 for the 7-option case. Moreover, it turns out that our lower bounds for the 3- and 4-option cases respectively coincide with the known upper bounds. We therefore conjecture that our scheme in general derives a matching lower and upper bound.
Naoya ONIZAWA Akira MOCHIZUKI Hirokatsu SHIRAHAMA Masashi IMAI Tomohiro YONEDA Takahiro HANYU
This paper introduces a partially parallel inter-chip link architecture for asynchronous multi-chip Network-on-Chips (NoCs). The multi-chip NoCs that operate as a large NoC have been recently proposed for very large systems, such as automotive applications. Inter-chip links are key elements to realize high-performance multi-chip NoCs using a limited number of I/Os. The proposed asynchronous link based on level-encoded dual-rail (LEDR) encoding transmits several bits in parallel that are received by detecting the phase information of the LEDR signals at each serial link. It employs a burst-mode data transmission that eliminates a per-bit handshake for a high-speed operation, but the elimination may cause data-transmission errors due to cross-talk and power-supply noises. For triggering data retransmission, errors are detected from the embedded phase information; error-detection codes are not used. The throughput is theoretically modelled and is optimized by considering the bit-error rate (BER) of the link. Using delay parameters estimated for a 0.13 µm CMOS technology, the throughput of 8.82 Gbps is achieved by using 10 I/Os, which is 90.5% higher than that of a link using 9 I/Os without an error-detection method operating under negligible low BER (<10-20).
Ryo AIHARA Ryoichi TAKASHIMA Tetsuya TAKIGUCHI Yasuo ARIKI
This paper presents a voice conversion (VC) technique for noisy environments based on a sparse representation of speech. Sparse representation-based VC using Non-negative matrix factorization (NMF) is employed for noise-added spectral conversion between different speakers. In our previous exemplar-based VC method, source exemplars and target exemplars are extracted from parallel training data, having the same texts uttered by the source and target speakers. The input source signal is represented using the source exemplars and their weights. Then, the converted speech is constructed from the target exemplars and the weights related to the source exemplars. However, this exemplar-based approach needs to hold all training exemplars (frames), and it requires high computation times to obtain the weights of the source exemplars. In this paper, we propose a framework to train the basis matrices of the source and target exemplars so that they have a common weight matrix. By using the basis matrices instead of the exemplars, the VC is performed with lower computation times than with the exemplar-based method. The effectiveness of this method was confirmed by comparing its effectiveness (in speaker conversion experiments using noise-added speech data) with that of an exemplar-based method and a conventional Gaussian mixture model (GMM)-based method.
Dajuan FAN Zhiqiu HUANG Lei TANG
One of the most important problems in web services application is the integration of different existing services into a new composite service. Existing work has the following disadvantages: (i) developers are often required to provide a composite service model first and perform formal verifications to check whether the model is correct. This makes the synthesis process of composite services semi-automatic, complex and inefficient; (ii) there is no assurance that composite services synthesized by using the fully-automatic approaches are correct; (iii) some approaches only handle simple composition problems where existing services are atomic. To address these problems, we propose a correct assurance approach for automatically synthesizing composite services based on finite state machine model. The syntax and semantics of the requirement model specifying composition requirements is also proposed. Given a set of abstract BPEL descriptions of existing services, and a composition requirement, our approach automatically generate the BPEL implementation of the composite service. Compared with existing approaches, the composite service generated by utilizing our proposed approach is guaranteed to be correct and does not require any formal verification. The correctness of our approach is proved. Moreover, the case analysis indicates that our approach is feasible and effective.
Hyun-Jun SHIN Hyun-Woo JANG Hyoung-Kyu SONG
In this letter, a cooperative scheme is proposed for the broadcasting and cellular communication system. The proposed scheme improves bit error rate (BER) performance and throughput on the edge of a cellular base station (CBS) cooperating with another CBS in the same broadcasting coverage. The proposed scheme for the enhancement of BER performance employs two schemes by a channel quality information (CQI) between a broadcasting base station (BBS) and users. In a physical area, the edge of a CBS is concatenated with the edge of another CBS. When users are on the edge of a CBS, they transmit simultaneously the CQI to CBSs, and then a BBS and CBSs transmit signals by the proposed algorithm. The two schemes apply space-time cyclic delay diversity (CDD) and a combination of space-time block code (STBC) with vertical Bell Laboratories Layered Space-Time (V-BLAST) to a signal from a BBS and CBSs. The resulting performance indicates that the proposed scheme is effective for users on the edges of CBSs.