The search functionality is under construction.

Keyword Search Result

[Keyword] SI(16314hit)

4061-4080hit(16314hit)

  • Image Retargeting with Protection of Object Arrangement

    Kazu MISHIBA  Takeshi YOSHITOME  

     
    PAPER-Image Processing and Video Processing

      Vol:
    E97-D No:6
      Page(s):
    1583-1589

    The relative arrangement of objects, such as their relative positions and orientations, can play an important role in expressing situations such as sports games and race scenes. In this paper, we propose a retargeting method that maintains the relative arrangement. Our proposed retargeting method is based on a warping method which finds an optimal transformation by solving an energy minimization problem. To protect the object arrangement, we introduce an energy that enforces all the objects and the relative positions among these objects to be transformed by the same transformation in the retargeting process. In addition, our method imposes the following three types of conditions in order to obtain more satisfactory results: protection of important regions, avoidance of extreme deformation, and cropping with preservation of the balance of visual importance. Experimental results demonstrate that our proposed method maintains the relative arrangement while protecting important regions.

  • Voice Conversion Based on Speaker-Dependent Restricted Boltzmann Machines

    Toru NAKASHIKA  Tetsuya TAKIGUCHI  Yasuo ARIKI  

     
    PAPER-Voice Conversion and Speech Enhancement

      Vol:
    E97-D No:6
      Page(s):
    1403-1410

    This paper presents a voice conversion technique using speaker-dependent Restricted Boltzmann Machines (RBM) to build high-order eigen spaces of source/target speakers, where it is easier to convert the source speech to the target speech than in the traditional cepstrum space. We build a deep conversion architecture that concatenates the two speaker-dependent RBMs with neural networks, expecting that they automatically discover abstractions to express the original input features. Under this concept, if we train the RBMs using only the speech of an individual speaker that includes various phonemes while keeping the speaker individuality unchanged, it can be considered that there are fewer phonemes and relatively more speaker individuality in the output features of the hidden layer than original acoustic features. Training the RBMs for a source speaker and a target speaker, we can then connect and convert the speaker individuality abstractions using Neural Networks (NN). The converted abstraction of the source speaker is then back-propagated into the acoustic space (e.g., MFCC) using the RBM of the target speaker. We conducted speaker-voice conversion experiments and confirmed the efficacy of our method with respect to subjective and objective criteria, comparing it with the conventional Gaussian Mixture Model-based method and an ordinary NN.

  • Automatic Vocabulary Adaptation Based on Semantic and Acoustic Similarities

    Shoko YAMAHATA  Yoshikazu YAMAGUCHI  Atsunori OGAWA  Hirokazu MASATAKI  Osamu YOSHIOKA  Satoshi TAKAHASHI  

     
    PAPER-Speech Recognition

      Vol:
    E97-D No:6
      Page(s):
    1488-1496

    Recognition errors caused by out-of-vocabulary (OOV) words lead to critical problems when developing spoken language understanding systems based on automatic speech recognition technology. Automatic vocabulary adaptation is an essential technique for solving these problems. In this paper, we propose a novel and effective automatic vocabulary adaptation method. Our method selects OOV words from relevant documents using combined scores of semantic and acoustic similarities. Using this combined score that reflects both semantic and acoustic aspects, only necessary OOV words can be selected without registering redundant words. In addition, our method estimates probabilities of OOV words using semantic similarity and a class-based N-gram language model. These probabilities will be appropriate since they are estimated by considering both frequencies of OOV words in target speech data and the stable class N-gram probabilities. Experimental results show that our method improves OOV selection accuracy and recognition accuracy of newly registered words in comparison with conventional methods.

  • A Single Opamp Third-Order Low-Distortion Delta-Sigma Modulator with SAR Quantizer Embedded Passive Adder

    I-Jen CHAO  Ching-Wen HOU  Bin-Da LIU  Soon-Jyh CHANG  Chun-Yueh HUANG  

     
    PAPER

      Vol:
    E97-C No:6
      Page(s):
    526-537

    A third-order low-distortion delta-sigma modulator (DSM), whose third-order noise-shaping ability is achieved by just a single opamp, is proposed. Since only one amplifier is required in the whole circuit, the designed DSM is very power efficient. To realize the adder in front of the quantizer without employing a power-hungry opamp, a capacitive passive adder, which is the digital-to-analog converter (DAC) array of a successive-approximation-type quantizer, is used. In addition, the feedback path timing is extended from a nonoverlapping interval for the conventional low-distortion structure to half of the clock period, so that the strict operation timing issue with regard to quantization and the dynamic element matching (DEM) logic operation can be solved. In the proposed DSM structure, the features of the unity-gain signal transfer function (STF) and finite-impulse-response (FIR) noise transfer function (NTF) are still preserved, and thus advantages such as a relaxed opamp slew rate and reduced output swing are also maintained, as with the conventional low-distortion DSM. Moreover, the memory effect in the proposed DSM is analyzed when employing opamp sharing for the integrators. The proposed third-order DSM with a 4-bit SAR ADC as the quantizer is implemented in a 90-nm CMOS process. The post-layout simulations show a 79.8-dB signal-to-noise and distortion ratio (SNDR) in the 1.875-MHz signal bandwidth (OSR=16). The active area of the circuit is 0.35 mm² and the total power consumption is 2.85 mW, resulting in a figure of merit (FOM) of 95 fJ/conversion-step.
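
The reported figure of merit can be cross-checked from the other numbers in the abstract. The sketch below assumes the commonly used Walden definition, FOM = P / (2 · BW · 2^ENOB) with ENOB = (SNDR − 1.76) / 6.02; the abstract does not state which definition the authors used, and the function names are illustrative only.

```python
def enob_from_sndr(sndr_db: float) -> float:
    """Effective number of bits from SNDR in dB: (SNDR - 1.76) / 6.02."""
    return (sndr_db - 1.76) / 6.02


def walden_fom(power_w: float, bandwidth_hz: float, sndr_db: float) -> float:
    """Walden figure of merit in joules per conversion step.

    Assumption: this is the definition behind the reported value;
    the abstract itself does not say so.
    """
    return power_w / (2.0 * bandwidth_hz * 2.0 ** enob_from_sndr(sndr_db))


# Values quoted in the abstract: 2.85 mW, 1.875 MHz bandwidth, 79.8 dB SNDR.
fom = walden_fom(power_w=2.85e-3, bandwidth_hz=1.875e6, sndr_db=79.8)
print(f"ENOB = {enob_from_sndr(79.8):.2f} bits")
print(f"FOM  = {fom * 1e15:.1f} fJ/conversion-step")
```

Under this assumption the result lands at roughly 95 fJ/conversion-step, consistent with the figure quoted above.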

  • Cooperative Bayesian Compressed Spectrum Sensing for Correlated Wideband Signals

    Honggyu JUNG  Kwang-Yul KIM  Yoan SHIN  

     
    LETTER-Communication Theory and Signals

      Vol:
    E97-A No:6
      Page(s):
    1434-1438

    We propose a cooperative compressed spectrum sensing scheme for correlated signals in wideband cognitive radio networks. In order to design a reconstruction algorithm which accurately recovers the wideband signals from the compressed samples in low SNR (Signal-to-Noise Ratio) environments, we consider the multiple measurement vector model exploiting a sequence of input signals and propose a cooperative sparse Bayesian learning algorithm which models the temporal correlation of the input signals. Simulation results show that the proposed scheme outperforms existing compressed sensing algorithms for low SNRs.

  • Practical and Exposure-Resilient Hierarchical ID-Based Authenticated Key Exchange without Random Oracles

    Kazuki YONEYAMA  

     
    PAPER

      Vol:
    E97-A No:6
      Page(s):
    1335-1344

    ID-based authenticated key exchange (ID-AKE) is a cryptographic tool to establish a common session key between parties with authentication based on their IDs. If IDs contain some hierarchical structure such as an e-mail address, hierarchical ID-AKE (HID-AKE) is especially suitable because of its scalability. However, most existing HID-AKE schemes do not satisfy advanced security properties such as forward secrecy, and the only known strongly secure HID-AKE scheme is inefficient. In this paper, we propose a new HID-AKE scheme which achieves both strong security and efficiency. We prove that our scheme is eCK-secure (which ensures maximal-exposure-resilience including forward secrecy) without random oracles, while existing schemes are proved secure in the random oracle model. Moreover, the number of messages and the number of pairing operations are independent of the hierarchy depth; that is, the scheme is scalable and practical for large systems.

  • Dynamic Check Message Majority-Logic Decoding Algorithm for Non-binary LDPC Codes

    Yichao LU  Xiao PENG  Guifen TIAN  Satoshi GOTO  

     
    PAPER

      Vol:
    E97-A No:6
      Page(s):
    1356-1364

    Majority-logic algorithms are devised for decoding non-binary LDPC codes in order to reduce computational complexity. However, compared with conventional belief propagation algorithms, majority-logic algorithms suffer from severe bit error performance degradation. This paper presents a low-complexity reliability-based algorithm that aims to improve the error-correcting ability of majority-logic algorithms. Reliability measures for check nodes are newly introduced to realize mutual update between variable messages and check messages, and hence more efficient reliability propagation can be achieved, similar to the belief-propagation algorithm. Simulation results on NB-LDPC codes with different characteristics demonstrate that, compared with both the ISRB-MLGD and IISRB-MLGD algorithms, our algorithm can reduce the bit error ratio by more than one order of magnitude, and the coding gain enhancement over ISRB-MLGD can reach 0.2-2.0 dB. Moreover, simulations on typical LDPC codes show that the computational complexity of the proposed algorithm is nearly equivalent to that of the ISRB-MLGD algorithm, and is less than 10% of that of the Min-max algorithm. As a result, the proposed algorithm achieves a more efficient trade-off between decoding computational complexity and error performance.

  • A Novel Adaptive Unambiguous Acquisition Scheme for CBOC Signal Based on Galileo

    Ce LIANG  Xiyan SUN  Yuanfa JI  Qinghua LIU  Guisheng LIAO  

     
    PAPER-Wireless Communication Technologies

      Vol:
    E97-B No:6
      Page(s):
    1157-1165

    The composite binary offset carrier (CBOC) modulated signal contains multiple peaks in its auto-correlation function, which brings ambiguity to the signal acquisition process of a GNSS receiver. Currently, most traditional ambiguity-removing schemes for CBOC signal acquisition approximate the CBOC signal as a BOC signal, which may incur performance degradation. Based on the Galileo E1 CBOC signal, this paper proposes a novel adaptive ambiguity-removing acquisition scheme which does not adopt the approximation used in traditional schemes. According to the energy ratio of each sub-code of the CBOC signal, the proposed scheme can self-adjust its local reference code to achieve unambiguous and precise signal synchronization. Monte Carlo simulation is conducted to analyze the performance of the proposed scheme and three traditional schemes. Simulation results show that the proposed scheme has a higher detection probability and a shorter mean acquisition time than the other three schemes, which verifies the superiority of the proposed scheme.

  • Variable Selection Linear Regression for Robust Speech Recognition

    Yu TSAO  Ting-Yao HU  Sakriani SAKTI  Satoshi NAKAMURA  Lin-shan LEE  

     
    PAPER-Speech Recognition

      Vol:
    E97-D No:6
      Page(s):
    1477-1487

    This study proposes a variable selection linear regression (VSLR) adaptation framework to improve the accuracy of automatic speech recognition (ASR) with only limited and unlabeled adaptation data. The proposed framework can be divided into three phases. The first phase prepares multiple variable subsets by applying a ranking filter to the original regression variable set. The second phase determines the best variable subset based on a pre-determined performance evaluation criterion and computes a linear regression (LR) mapping function based on the determined subset. The third phase performs adaptation in either model or feature spaces. The three phases can select the optimal components and remove redundancies in the LR mapping function effectively and thus enable VSLR to provide satisfactory adaptation performance even with a very limited number of adaptation statistics. We formulate model space VSLR and feature space VSLR by integrating the VS techniques into the conventional LR adaptation systems. Experimental results on the Aurora-4 task show that model space VSLR and feature space VSLR, respectively, outperform standard maximum likelihood linear regression (MLLR) and feature space MLLR (fMLLR) and their extensions, with notable word error rate (WER) reductions in a per-utterance unsupervised adaptation manner.

  • Structured Adaptive Regularization of Weight Vectors for a Robust Grapheme-to-Phoneme Conversion Model

    Keigo KUBO  Sakriani SAKTI  Graham NEUBIG  Tomoki TODA  Satoshi NAKAMURA  

     
    PAPER-Speech Synthesis and Related Topics

      Vol:
    E97-D No:6
      Page(s):
    1468-1476

    Grapheme-to-phoneme (g2p) conversion, used to estimate the pronunciations of out-of-vocabulary (OOV) words, is a highly important part of recognition systems, as well as text-to-speech systems. The current state-of-the-art approach in g2p conversion is structured learning based on the Margin Infused Relaxed Algorithm (MIRA), which is an online discriminative training method for multiclass classification. However, it is known that the aggressive weight update method of MIRA is prone to overfitting, even if the current example is an outlier or noisy. Adaptive Regularization of Weight Vectors (AROW) has been proposed to resolve this problem for binary classification. In addition, AROW's update rule is simpler and more efficient than that of MIRA, allowing for more efficient training. Although AROW has these advantages, it has not been applied to g2p conversion yet. In this paper, we are the first to apply AROW to the g2p conversion task, which is a structured learning problem. In an evaluation that employed a dataset generated from the collective knowledge on the Web, our proposed approach achieves a 6.8% error reduction rate compared to MIRA in terms of phoneme error rate. Also, the learning time of our proposed approach was shorter than that of MIRA on almost all datasets.

  • Real Time Spectroscopic Observation of Contact Surfaces Being Eroded by Break Arcs

    Masato NAKAMURA  Junya SEKIKAWA  

     
    PAPER-Electromechanical Devices and Components

      Vol:
    E97-C No:6
      Page(s):
    592-598

    Break arcs are generated in a 48 V DC, 12 A resistive circuit. Silver electrical contacts are separated at a constant opening speed. The cathode contact surface is irradiated by a blue LED. The center wavelength of the emission of the LED is 470 nm. There is no spectral line of the light emitted from the break arcs. Only the images of the contact surface are observed by a high-speed camera and an optical band pass filter. Another high-speed camera observes only the images of the break arc. The time evolution of the cathode surface morphology being eroded by the break arcs and the motion of the break arcs are observed simultaneously with these cameras. The images of the cathode surface are investigated by an image analysis technique. The results show the moments at which the expanded regions on the cathode surface are formed during the occurrence of the break arcs. In addition, it is shown that the expanded regions are not in direct contact with the cathode roots of the break arcs.

  • Multi-Source Tri-Training Transfer Learning

    Yuhu CHENG  Xuesong WANG  Ge CAO  

     
    LETTER-Artificial Intelligence, Data Mining

      Vol:
    E97-D No:6
      Page(s):
    1668-1672

    A multi-source Tri-Training transfer learning algorithm is proposed by integrating transfer learning and semi-supervised learning. First, multiple weak classifiers are each trained by using both weighted source and target training samples. Then, based on the idea of co-training, each target testing sample is labeled by using the trained weak classifiers, and a sample that receives the same label is selected as a high-confidence sample to be added to the target training sample set. Finally, we obtain a target domain classifier based on the updated target training samples. The above steps are iterated until the high-confidence samples selected at two successive iterations become the same. At each iteration, source training samples are tested by using the target domain classifier; the samples tested as correct continue with training, while the weights of samples tested as incorrect are lowered. Experimental results on a text classification dataset demonstrate the effectiveness and superiority of the proposed algorithm.

  • A Pipelined Architecture for Intra PU Encoding in HEVC

    Yunpyo HONG  Juwon BYUN  Youngjo KIM  Jaeseok KIM  

     
    LETTER-Image

      Vol:
    E97-A No:6
      Page(s):
    1439-1442

    This letter proposes a pipelined architecture with prediction mode scheduling for high efficiency video coding (HEVC). The increased number of intra prediction modes in HEVC has led to the introduction of a new technique, named rough mode decision (RMD). This development, however, means that pipeline architectures for H.264 cannot be used in HEVC. The proposed scheme executes the RMD and the rate-distortion optimization (RDO) process simultaneously by grouping the intra prediction modes and changing the candidate selection method of the RMD algorithm. The proposed scheme reduces execution cycles by up to 26% with negligible coding loss.

  • Queue Layouts of Toroidal Grids

    Kung-Jui PAI  Jou-Ming CHANG  Yue-Li WANG  Ro-Yu WU  

     
    PAPER

      Vol:
    E97-A No:6
      Page(s):
    1180-1186

    A queue layout of a graph $G$ consists of a linear order of its vertices, and a partition of its edges into queues, such that no two edges in the same queue are nested. The queuenumber $qn(G)$ is the minimum number of queues required in a queue layout of $G$. The Cartesian product of two graphs $G_1 = (V_1,E_1)$ and $G_2 = (V_2,E_2)$, denoted by $G_1 \times G_2$, is the graph with $\{\langle v_1,v_2\rangle : v_1 \in V_1 \text{ and } v_2 \in V_2\}$ as its vertex set, and an edge $(\langle u_1,u_2\rangle,\langle v_1,v_2\rangle)$ belongs to $G_1 \times G_2$ if and only if either $(u_1,v_1) \in E_1$ and $u_2 = v_2$, or $(u_2,v_2) \in E_2$ and $u_1 = v_1$. Let $T_{k_1,k_2,\ldots,k_n}$ denote the $n$-dimensional toroidal grid defined by the Cartesian product of $n$ cycles with varied lengths, i.e., $T_{k_1,k_2,\ldots,k_n} = C_{k_1} \times C_{k_2} \times \cdots \times C_{k_n}$, where $C_{k_i}$ is a cycle of length $k_i \ge 3$. If $k_1 = k_2 = \cdots = k_n = k$, the graph is also called the $k$-ary $n$-cube and is denoted by $Q_n^k$. In this paper, we deal with queue layouts of toroidal grids and show the following bound: $qn(T_{k_1,k_2,\ldots,k_n}) \le 2n-2$ if $n \ge 2$ and $k_i \ge 3$ for all $i = 1,2,\ldots,n$. In particular, for $n = 2$ and $k_1,k_2 \ge 3$, we acquire $qn(T_{k_1,k_2}) = 2$. Recently, Pai et al. (Inform. Process. Lett. 110 (2009) pp.50-56) showed that $qn(Q_n^k) \le 2n-1$ if $n \ge 1$ and $k \ge 9$. Thus, our result improves the bound of $qn(Q_n^k)$ when $n \ge 2$ and $k \ge 9$.
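
The definition of a queue layout can be made concrete with a small checker. The sketch below is not the construction from the paper; it only tests the "no two edges in the same queue are nested" condition for a given vertex order and edge partition. The function names and the 4-cycle example are illustrative assumptions.

```python
from itertools import combinations


def is_nested(e, f, pos):
    """True if one edge lies strictly inside the other in the vertex order."""
    a, b = sorted((pos[e[0]], pos[e[1]]))
    c, d = sorted((pos[f[0]], pos[f[1]]))
    return (a < c and d < b) or (c < a and b < d)


def is_queue_layout(order, queues):
    """Check that no queue contains a pair of nested edges.

    order  -- list of vertices giving the linear order
    queues -- list of edge lists, one list per queue
    """
    pos = {v: i for i, v in enumerate(order)}
    return all(
        not is_nested(e, f, pos)
        for queue in queues
        for e, f in combinations(queue, 2)
    )


# Hypothetical example: the 4-cycle 0-1-2-3-0.
c4 = [(0, 1), (1, 2), (2, 3), (0, 3)]

# In the natural order, edge (1, 2) is nested inside (0, 3): one queue fails.
print(is_queue_layout([0, 1, 2, 3], [c4]))

# Reordering the vertices removes the nesting, so one queue suffices.
print(is_queue_layout([0, 1, 3, 2], [c4]))
```

Edges that share an endpoint or cross each other are allowed in the same queue; only strict nesting is forbidden, which is why the inequalities in `is_nested` are strict.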

  • Integration of Spectral Feature Extraction and Modeling for HMM-Based Speech Synthesis

    Kazuhiro NAKAMURA  Kei HASHIMOTO  Yoshihiko NANKAKU  Keiichi TOKUDA  

     
    PAPER-HMM-based Speech Synthesis

      Vol:
    E97-D No:6
      Page(s):
    1438-1448

    This paper proposes a novel approach for integrating spectral feature extraction and acoustic modeling in hidden Markov model (HMM) based speech synthesis. The statistical modeling process of speech waveforms is typically divided into two component modules: the frame-by-frame feature extraction module and the acoustic modeling module. In the feature extraction module, the statistical mel-cepstral analysis technique has been used and the objective function is the likelihood of mel-cepstral coefficients for given speech waveforms. In the acoustic modeling module, the objective function is the likelihood of model parameters for given mel-cepstral coefficients. It is important to improve the performance of each component module for achieving higher quality synthesized speech. However, the final objective of speech synthesis systems is to generate natural speech waveforms from given texts, and the improvement of each component module does not always lead to the improvement of the quality of synthesized speech. Therefore, ideally all objective functions should be optimized based on an integrated criterion which well represents subjective speech quality of human perception. In this paper, we propose an approach to model speech waveforms directly and optimize the final objective function. Experimental results show that the proposed method outperformed the conventional methods in objective and subjective measures.

  • Analysis of Lower Bounds for the Multislope Ski-Rental Problem

    Hiroshi FUJIWARA  Yasuhiro KONNO  Toshihiro FUJITO  

     
    PAPER

      Vol:
    E97-A No:6
      Page(s):
    1200-1205

    The multislope ski-rental problem is an extension of the classical ski-rental problem, where the player has several options of paying both a per-time fee and an initial fee, in addition to pure renting and buying options. Damaschke gave a lower bound of 3.62 on the competitive ratio for the case where an arbitrary number of options can be offered. In this paper we propose a scheme that, for the number of options given as an input, provides a lower bound on the competitive ratio, by extending the method of Damaschke. This is the first to establish a lower bound for each of the 5-or-more-option cases, for example, a lower bound of 2.95 for the 5-option case, 3.08 for the 6-option case, and 3.18 for the 7-option case. Moreover, it turns out that our lower bounds for the 3- and 4-option cases respectively coincide with the known upper bounds. We therefore conjecture that our scheme in general derives a matching lower and upper bound.
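
For readers unfamiliar with the underlying problem, the sketch below illustrates the classical two-option ski-rental problem (rent at 1 per day or buy once), not the multislope scheme of the paper. It assumes integer days and a deterministic "rent until day k, then buy" strategy; all names and the price of 10 are illustrative.

```python
from fractions import Fraction


def online_cost(days, buy_price, buy_day):
    """Cost of renting at 1 per day, then buying on day `buy_day`."""
    if days < buy_day:
        return days                      # season ended before we bought
    return (buy_day - 1) + buy_price     # rented buy_day - 1 days, then bought


def offline_cost(days, buy_price):
    """Optimal cost when the season length is known in advance."""
    return min(days, buy_price)


def competitive_ratio(buy_price, buy_day, horizon):
    """Worst-case online/offline cost ratio over season lengths 1..horizon."""
    return max(
        Fraction(online_cost(n, buy_price, buy_day), offline_cost(n, buy_price))
        for n in range(1, horizon + 1)
    )


price = 10
# Break-even strategy: buy on the day when rent paid would reach the price.
print(competitive_ratio(price, buy_day=price, horizon=100))

# Search over buy days to see which deterministic strategy is best here.
best_day = min(range(1, 31), key=lambda k: competitive_ratio(price, k, 100))
print(best_day)
```

With a purchase price of B, the break-even strategy has worst-case ratio 2 − 1/B in this discrete setting; the multislope variant adds intermediate options that combine an initial fee with a reduced per-time fee.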

  • High-Throughput Partially Parallel Inter-Chip Link Architecture for Asynchronous Multi-Chip NoCs

    Naoya ONIZAWA  Akira MOCHIZUKI  Hirokatsu SHIRAHAMA  Masashi IMAI  Tomohiro YONEDA  Takahiro HANYU  

     
    PAPER-Dependable Computing

      Vol:
    E97-D No:6
      Page(s):
    1546-1556

    This paper introduces a partially parallel inter-chip link architecture for asynchronous multi-chip Networks-on-Chip (NoCs). Multi-chip NoCs that operate as a large NoC have recently been proposed for very large systems, such as automotive applications. Inter-chip links are key elements to realize high-performance multi-chip NoCs using a limited number of I/Os. The proposed asynchronous link based on level-encoded dual-rail (LEDR) encoding transmits several bits in parallel that are received by detecting the phase information of the LEDR signals at each serial link. It employs a burst-mode data transmission that eliminates a per-bit handshake for high-speed operation, but the elimination may cause data-transmission errors due to cross-talk and power-supply noise. For triggering data retransmission, errors are detected from the embedded phase information; error-detection codes are not used. The throughput is theoretically modelled and is optimized by considering the bit-error rate (BER) of the link. Using delay parameters estimated for a 0.13 µm CMOS technology, a throughput of 8.82 Gbps is achieved by using 10 I/Os, which is 90.5% higher than that of a link using 9 I/Os without an error-detection method operating under a negligibly low BER (<10^{-20}).

  • Noise-Robust Voice Conversion Based on Sparse Spectral Mapping Using Non-negative Matrix Factorization

    Ryo AIHARA  Ryoichi TAKASHIMA  Tetsuya TAKIGUCHI  Yasuo ARIKI  

     
    PAPER-Voice Conversion and Speech Enhancement

      Vol:
    E97-D No:6
      Page(s):
    1411-1418

    This paper presents a voice conversion (VC) technique for noisy environments based on a sparse representation of speech. Sparse representation-based VC using non-negative matrix factorization (NMF) is employed for noise-added spectral conversion between different speakers. In our previous exemplar-based VC method, source exemplars and target exemplars are extracted from parallel training data, having the same texts uttered by the source and target speakers. The input source signal is represented using the source exemplars and their weights. Then, the converted speech is constructed from the target exemplars and the weights related to the source exemplars. However, this exemplar-based approach needs to hold all training exemplars (frames), and it requires high computation times to obtain the weights of the source exemplars. In this paper, we propose a framework to train the basis matrices of the source and target exemplars so that they have a common weight matrix. By using the basis matrices instead of the exemplars, the VC is performed with lower computation times than with the exemplar-based method. The effectiveness of this method was confirmed by comparing it, in speaker conversion experiments using noise-added speech data, with an exemplar-based method and a conventional Gaussian mixture model (GMM)-based method.

  • A Correctness Assurance Approach to Automatic Synthesis of Composite Web Services

    Dajuan FAN  Zhiqiu HUANG  Lei TANG  

     
    PAPER-Data Engineering, Web Information Systems

      Vol:
    E97-D No:6
      Page(s):
    1535-1545

    One of the most important problems in web services applications is the integration of different existing services into a new composite service. Existing work has the following disadvantages: (i) developers are often required to provide a composite service model first and perform formal verifications to check whether the model is correct, which makes the synthesis process of composite services semi-automatic, complex and inefficient; (ii) there is no assurance that composite services synthesized by using the fully-automatic approaches are correct; (iii) some approaches only handle simple composition problems where existing services are atomic. To address these problems, we propose a correctness assurance approach for automatically synthesizing composite services based on a finite state machine model. The syntax and semantics of the requirement model specifying composition requirements are also proposed. Given a set of abstract BPEL descriptions of existing services and a composition requirement, our approach automatically generates the BPEL implementation of the composite service. Compared with existing approaches, the composite service generated by our proposed approach is guaranteed to be correct and does not require any formal verification. The correctness of our approach is proved. Moreover, the case analysis indicates that our approach is feasible and effective.

  • An Advanced Cooperative Scheme in the Broadcasting and Cellular System

    Hyun-Jun SHIN  Hyun-Woo JANG  Hyoung-Kyu SONG  

     
    LETTER-Fundamentals of Information Systems

      Vol:
    E97-D No:6
      Page(s):
    1634-1638

    In this letter, a cooperative scheme is proposed for the broadcasting and cellular communication system. The proposed scheme improves bit error rate (BER) performance and throughput on the edge of a cellular base station (CBS) cooperating with another CBS in the same broadcasting coverage. To enhance BER performance, the proposed scheme employs two schemes according to the channel quality information (CQI) between a broadcasting base station (BBS) and users. In a physical area, the edge of a CBS is concatenated with the edge of another CBS. When users are on the edge of a CBS, they simultaneously transmit the CQI to the CBSs, and then a BBS and the CBSs transmit signals by the proposed algorithm. The two schemes apply space-time cyclic delay diversity (CDD) and a combination of space-time block code (STBC) with vertical Bell Laboratories Layered Space-Time (V-BLAST) to a signal from a BBS and CBSs. The resulting performance indicates that the proposed scheme is effective for users on the edges of CBSs.
