The search functionality is under construction.

Keyword Search Result

[Keyword] SiON(4624hit)

1101-1120hit(4624hit)

  • Implicit Generation of Pattern-Avoiding Permutations by Using Permutation Decision Diagrams

    Yuma INOUE  Takahisa TODA  Shin-ichi MINATO  

     
    PAPER

      Vol:
    E97-A No:6
      Page(s):
    1171-1179

    Pattern-avoiding permutations are permutations where none of the subsequences matches the relative order of a given pattern. Pattern-avoiding permutations are related to practical and abstract mathematical problems and can provide simple representations for such problems. For example, some floorplans, which are used for optimizing very-large-scale integration (VLSI) circuit design, can be encoded into pattern-avoiding permutations. The generation of pattern-avoiding permutations is an important topic in efficient VLSI design and mathematical analysis of pattern-avoiding permutations. In this paper, we present an algorithm for generating pattern-avoiding permutations, and extend this algorithm beyond classical patterns to generalized patterns with more restrictions. Our approach is based on the data structure πDDs, which can represent a permutation set compactly and has useful set operations. We demonstrate the efficiency of our algorithm by computational experiments.
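
    The definition in the first sentence of this abstract can be illustrated with a brute-force sketch. This is not the πDD-based algorithm of the paper; it simply enumerates all permutations and tests every subsequence, and the function names `contains_pattern` and `avoiders` are hypothetical names chosen for illustration.

```python
from itertools import combinations, permutations


def contains_pattern(perm, pattern):
    """Return True if some subsequence of perm has the same relative order as pattern."""
    k = len(pattern)
    for idxs in combinations(range(len(perm)), k):
        sub = [perm[i] for i in idxs]
        # Order-isomorphic: every pair compares the same way in both sequences.
        if all((sub[a] < sub[b]) == (pattern[a] < pattern[b])
               for a in range(k) for b in range(a + 1, k)):
            return True
    return False


def avoiders(n, pattern):
    """List all permutations of 1..n that avoid the given classical pattern (brute force)."""
    return [p for p in permutations(range(1, n + 1))
            if not contains_pattern(p, pattern)]


if __name__ == "__main__":
    # (2, 4, 1, 3) contains the pattern 132 via the subsequence 2, 4, 3.
    print(contains_pattern((2, 4, 1, 3), (1, 3, 2)))
    print(len(avoiders(4, (1, 3, 2))))
```

    The enumeration is factorial in n, so it is only practical for very small n; that cost is the motivation for implicit, compact set representations such as the one the abstract describes.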

  • Feature Fusion for Blurring Detection in Image Forensics

    BenJuan YANG  BenYong LIU  

     
    LETTER-Image Processing and Video Processing

      Vol:
    E97-D No:6
      Page(s):
    1690-1693

    Artificial blurring is a typical operation in image forging. Most existing image forgery detection methods consider only a single feature of the artificial blurring operation. In this manuscript, we propose to adopt feature fusion, with multiple features for the artificial blurring operation in image tampering, to improve the accuracy of forgery detection. First, three feature vectors that address the singular values of the gray image matrix, correlation coefficients for double blurring operation, and image quality metrics (IQM) are extracted and fused using principal component analysis (PCA), and then a support vector machine (SVM) classifier is trained using the fused feature extracted from training images or image patches containing artificial blurring operations. Finally, the same procedures of feature extraction and feature fusion are carried out on the suspected image or suspected image patch, which is then classified, using the trained SVM, into forged or non-forged classes. Experimental results show the feasibility of the proposed method for image tampering feature fusion and forgery detection.

  • A Pipelined Architecture for Intra PU Encoding in HEVC

    Yunpyo HONG  Juwon BYUN  Youngjo KIM  Jaeseok KIM  

     
    LETTER-Image

      Vol:
    E97-A No:6
      Page(s):
    1439-1442

    This letter proposes a pipelined architecture with prediction mode scheduling for high efficiency video coding (HEVC). The increased number of intra prediction modes in HEVC has led to the introduction of a new technique, named rough mode decision (RMD). This development, however, means that pipeline architectures for H.264 cannot be used in HEVC. The proposed scheme executes the RMD and the rate-distortion optimization (RDO) process simultaneously by grouping the intra prediction modes and changing the candidate selection method of the RMD algorithm. The proposed scheme reduces execution cycles by up to 26% with negligible coding loss.

  • Improvement of Semi-Random Measurement Matrix for Compressed Sensing

    Wentao LV  Junfeng WANG  Wenxian YU  Zhen TAN  

     
    LETTER-Digital Signal Processing

      Vol:
    E97-A No:6
      Page(s):
    1426-1429

    In compressed sensing, the design of the measurement matrix is a key task. In order to achieve a more precise reconstruction result, the columns of the measurement matrix should have better orthogonality or linear incoherence. A random matrix, such as a Gaussian random matrix (GRM), is currently the common choice for the measurement matrix. However, the columns of the random matrix are only statistically orthogonal. By substituting an orthogonal basis into the random matrix to construct a semi-random measurement matrix and by optimizing the mutual coherence between dictionary columns to approach a theoretical lower bound, the linear incoherence of the measurement matrix can be greatly improved. With this optimized measurement matrix, the signal can be reconstructed from its measurements more precisely.
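
    As a rough illustration of the quantity being optimized, the sketch below computes the mutual coherence of a matrix (the largest absolute normalized inner product between two distinct columns). It also evaluates the Welch bound, which is one well-known lower bound on mutual coherence; whether the letter uses this particular bound is an assumption, and the function names are hypothetical.

```python
import math


def mutual_coherence(matrix):
    """Largest |<a_i, a_j>| / (||a_i|| ||a_j||) over distinct columns i != j."""
    rows, cols = len(matrix), len(matrix[0])
    columns = [[matrix[r][c] for r in range(rows)] for c in range(cols)]
    norms = [math.sqrt(sum(v * v for v in col)) for col in columns]
    best = 0.0
    for i in range(cols):
        for j in range(i + 1, cols):
            dot = sum(a * b for a, b in zip(columns[i], columns[j]))
            best = max(best, abs(dot) / (norms[i] * norms[j]))
    return best


def welch_bound(m, n):
    """Welch lower bound on the mutual coherence of an m x n matrix with n > m."""
    return math.sqrt((n - m) / (m * (n - 1)))


if __name__ == "__main__":
    s = math.sqrt(3) / 2
    # Three unit vectors in the plane, 120 degrees apart (illustrative data only).
    frame = [[0.0, -s, s],
             [1.0, -0.5, -0.5]]
    print(mutual_coherence(frame), welch_bound(2, 3))
```

    For the three-vector example both values coincide, which shows what "approaching the lower bound" means for a small matrix.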

  • A Lossy Identification Scheme Using the Subgroup Decision Assumption

    Shingo HASEGAWA  Shuji ISOBE  

     
    PAPER

      Vol:
    E97-A No:6
      Page(s):
    1296-1306

    Lossy identification schemes are used to construct tightly secure signature schemes via the Fiat-Shamir heuristic in the random oracle model. Several lossy identification schemes are instantiated by using the short discrete logarithm assumption, the ring-LWE assumption and the subset sum assumption, respectively. For assumptions concerning integer factoring, Abdalla, Ben Hamouda and Pointcheval [3] recently presented lossy identification schemes based on the φ-hiding assumption, the QR assumption and the DCR assumption, respectively. In this paper, we propose new instantiations of lossy identification schemes. We first construct a variant of Schnorr's identification scheme, and show its lossiness under the subgroup decision assumption. We also construct a lossy identification scheme which is based on the DCR assumption. Our DCR-based scheme has an advantage over ABP's DCR-based scheme since our scheme needs no modular exponentiation in the response phase. Therefore our scheme is suitable when it is transformed into an online/offline signature.
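
    For orientation, the classical Schnorr identification protocol that the paper takes as its starting point can be sketched as follows. This is the textbook protocol over a prime-order subgroup, not the subgroup-decision variant proposed in the paper, and the toy parameters (p = 23, q = 11, g = 2) are assumptions chosen only so the arithmetic is easy to follow; they offer no security.

```python
import secrets

# Toy parameters (illustrative only): g = 2 has prime order q = 11 modulo p = 23.
P, Q, G = 23, 11, 2


def keygen():
    """Return (secret key x, public key y = g^x mod p)."""
    x = secrets.randbelow(Q - 1) + 1
    return x, pow(G, x, P)


def commit():
    """Prover's first message: random r and commitment t = g^r mod p."""
    r = secrets.randbelow(Q)
    return r, pow(G, r, P)


def respond(x, r, c):
    """Prover's response to challenge c: s = r + c*x mod q."""
    return (r + c * x) % Q


def verify(y, t, c, s):
    """Verifier accepts when g^s == t * y^c (mod p)."""
    return pow(G, s, P) == (t * pow(y, c, P)) % P


if __name__ == "__main__":
    x, y = keygen()
    r, t = commit()
    c = secrets.randbelow(Q)  # verifier's random challenge
    print(verify(y, t, c, respond(x, r, c)))
```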

  • Noise-Robust Voice Conversion Based on Sparse Spectral Mapping Using Non-negative Matrix Factorization

    Ryo AIHARA  Ryoichi TAKASHIMA  Tetsuya TAKIGUCHI  Yasuo ARIKI  

     
    PAPER-Voice Conversion and Speech Enhancement

      Vol:
    E97-D No:6
      Page(s):
    1411-1418

    This paper presents a voice conversion (VC) technique for noisy environments based on a sparse representation of speech. Sparse representation-based VC using non-negative matrix factorization (NMF) is employed for noise-added spectral conversion between different speakers. In our previous exemplar-based VC method, source exemplars and target exemplars are extracted from parallel training data, having the same texts uttered by the source and target speakers. The input source signal is represented using the source exemplars and their weights. Then, the converted speech is constructed from the target exemplars and the weights related to the source exemplars. However, this exemplar-based approach needs to hold all training exemplars (frames), and it requires high computation times to obtain the weights of the source exemplars. In this paper, we propose a framework to train the basis matrices of the source and target exemplars so that they have a common weight matrix. By using the basis matrices instead of the exemplars, the VC is performed with lower computation times than with the exemplar-based method. The effectiveness of this method was confirmed by comparing it, in speaker conversion experiments using noise-added speech data, with an exemplar-based method and a conventional Gaussian mixture model (GMM)-based method.

  • Structured Adaptive Regularization of Weight Vectors for a Robust Grapheme-to-Phoneme Conversion Model

    Keigo KUBO  Sakriani SAKTI  Graham NEUBIG  Tomoki TODA  Satoshi NAKAMURA  

     
    PAPER-Speech Synthesis and Related Topics

      Vol:
    E97-D No:6
      Page(s):
    1468-1476

    Grapheme-to-phoneme (g2p) conversion, used to estimate the pronunciations of out-of-vocabulary (OOV) words, is a highly important part of recognition systems, as well as text-to-speech systems. The current state-of-the-art approach in g2p conversion is structured learning based on the Margin Infused Relaxed Algorithm (MIRA), which is an online discriminative training method for multiclass classification. However, it is known that the aggressive weight update method of MIRA is prone to overfitting, even if the current example is an outlier or noisy. Adaptive Regularization of Weight Vectors (AROW) has been proposed to resolve this problem for binary classification. In addition, AROW's update rule is simpler and more efficient than that of MIRA, allowing for more efficient training. Although AROW has these advantages, it has not yet been applied to g2p conversion. In this paper, we apply AROW for the first time to the g2p conversion task, which is a structured learning problem. In an evaluation that employed a dataset generated from the collective knowledge on the Web, our proposed approach achieves a 6.8% error reduction rate compared to MIRA in terms of phoneme error rate. In addition, the learning time of our proposed approach was shorter than that of MIRA on almost all datasets.

  • Unsupervised Prosodic Labeling of Speech Synthesis Databases Using Context-Dependent HMMs

    Chen-Yu YANG  Zhen-Hua LING  Li-Rong DAI  

     
    PAPER-Speech Synthesis and Related Topics

      Vol:
    E97-D No:6
      Page(s):
    1449-1460

    In this paper, an automatic and unsupervised method using context-dependent hidden Markov models (CD-HMMs) is proposed for the prosodic labeling of speech synthesis databases. This method consists of three main steps, i.e., initialization, model training and prosodic labeling. The initial prosodic labels are obtained by unsupervised clustering using the acoustic features designed according to the characteristics of the prosodic descriptor to be labeled. Then, CD-HMMs of the spectral parameters, F0s and phone durations are estimated by a means similar to the HMM-based parametric speech synthesis using the initial prosodic labels. These labels are further updated by Viterbi decoding under the maximum likelihood criterion given the acoustic feature sequences and the trained CD-HMMs. The model training and prosodic labeling procedures are conducted iteratively until convergence. The performance of the proposed method is evaluated on Mandarin speech synthesis databases and two prosodic descriptors are investigated, i.e., the prosodic phrase boundary and the emphasis expression. In our implementation, the prosodic phrase boundary labels are initialized by clustering the durations of the pauses between every two consecutive prosodic words, and the emphasis expression labels are initialized by examining the differences between the original and the synthetic F0 trajectories. Experimental results show that the proposed method is able to label the prosodic phrase boundary positions much more accurately than the text-analysis-based method without requiring any manually labeled training data. The unit selection speech synthesis system constructed using the prosodic phrase boundary labels generated by our proposed method achieves similar performance to that using the manual labels. Furthermore, the unit selection speech synthesis system constructed using the emphasis expression labels generated by our proposed method can convey the emphasis information effectively while maintaining the naturalness of synthetic speech.

  • Variable Selection Linear Regression for Robust Speech Recognition

    Yu TSAO  Ting-Yao HU  Sakriani SAKTI  Satoshi NAKAMURA  Lin-shan LEE  

     
    PAPER-Speech Recognition

      Vol:
    E97-D No:6
      Page(s):
    1477-1487

    This study proposes a variable selection linear regression (VSLR) adaptation framework to improve the accuracy of automatic speech recognition (ASR) with only limited and unlabeled adaptation data. The proposed framework can be divided into three phases. The first phase prepares multiple variable subsets by applying a ranking filter to the original regression variable set. The second phase determines the best variable subset based on a pre-determined performance evaluation criterion and computes a linear regression (LR) mapping function based on the determined subset. The third phase performs adaptation in either model or feature spaces. The three phases can select the optimal components and remove redundancies in the LR mapping function effectively and thus enable VSLR to provide satisfactory adaptation performance even with a very limited number of adaptation statistics. We formulate model space VSLR and feature space VSLR by integrating the VS techniques into the conventional LR adaptation systems. Experimental results on the Aurora-4 task show that model space VSLR and feature space VSLR, respectively, outperform standard maximum likelihood linear regression (MLLR) and feature space MLLR (fMLLR) and their extensions, with notable word error rate (WER) reductions in a per-utterance unsupervised adaptation manner.

  • Real Time Spectroscopic Observation of Contact Surfaces Being Eroded by Break Arcs

    Masato NAKAMURA  Junya SEKIKAWA  

     
    PAPER-Electromechanical Devices and Components

      Vol:
    E97-C No:6
      Page(s):
    592-598

    Break arcs are generated in a DC 48 V, 12 A resistive circuit. Silver electrical contacts are separated at constant opening speed. The cathode contact surface is irradiated by a blue LED. The center wavelength of the emission of the LED is 470 nm. There is no spectral line of the light emitted from the break arcs. Only the images of the contact surface are observed by a high-speed camera and an optical band pass filter. Another high-speed camera observes only the images of the break arc. Time evolutions of the cathode surface morphology being eroded by the break arcs and the motion of the break arcs are observed simultaneously with these cameras. The images of the cathode surface are investigated by an image analysis technique. The results show the moments at which the expanded regions on the cathode surface are formed during the occurrence of the break arcs. In addition, it is shown that the expanded regions are not in direct contact with the cathode roots of the break arcs.

  • High Capacity Mobile Multi-Hop Relay Network for Temporary Traffic Surge

    Ju-Ho LEE  Goo-Yeon LEE  Choong-Kyo JEONG  

     
    LETTER-Information Network

      Vol:
    E97-D No:6
      Page(s):
    1661-1663

    Mobile Multi-hop Relay (MMR) technology is usually used to increase the transmission rate or to extend communication coverage. In this work, we show that MMR technology can also be used to raise the network capacity. Because Relay Stations (RS) are connected to the Base Station (BS) wirelessly and controlled by the BS, an MMR network can easily be deployed when necessary. High capacity MMR networks thus provide a good candidate solution for coping with temporary traffic surges. For the capacity enhancement of the MMR network, we suggest a novel scheme to parallelize cell transmissions while controlling the interference between transmissions. Using a numerical example for a typical network that conforms to IEEE 802.16j, we find that the network capacity increases by 88 percent.

  • A Hybrid Approach to Electrolaryngeal Speech Enhancement Based on Noise Reduction and Statistical Excitation Generation

    Kou TANAKA  Tomoki TODA  Graham NEUBIG  Sakriani SAKTI  Satoshi NAKAMURA  

     
    PAPER-Voice Conversion and Speech Enhancement

      Vol:
    E97-D No:6
      Page(s):
    1429-1437

    This paper presents an electrolaryngeal (EL) speech enhancement method capable of significantly improving the naturalness of EL speech while causing no degradation in its intelligibility. An electrolarynx is an external device that artificially generates excitation sounds to enable laryngectomees to produce EL speech. Although proficient laryngectomees can produce quite intelligible EL speech, it sounds very unnatural due to the mechanical excitation produced by the device. Moreover, the excitation sounds produced by the device often leak outside, adding to EL speech as noise. To address these issues, there are mainly two conventional approaches to EL speech enhancement, based on either noise reduction or statistical voice conversion (VC). The former approach usually causes no degradation in intelligibility but yields only small improvements in naturalness as the mechanical excitation sounds remain essentially unchanged. On the other hand, the latter approach significantly improves the naturalness of EL speech using spectral and excitation parameters of natural voices converted from acoustic parameters of EL speech, but it usually causes degradation in intelligibility owing to errors in conversion. We propose a hybrid approach using a noise reduction method for enhancing spectral parameters and a statistical voice conversion method for predicting excitation parameters. Moreover, we further modify the prediction process of the excitation parameters to improve its prediction accuracy and reduce adverse effects caused by unvoiced/voiced prediction errors. The experimental results demonstrate that the proposed method yields significant improvements in naturalness compared with EL speech while keeping intelligibility high enough.

  • Voice Timbre Control Based on Perceived Age in Singing Voice Conversion

    Kazuhiro KOBAYASHI  Tomoki TODA  Hironori DOI  Tomoyasu NAKANO  Masataka GOTO  Graham NEUBIG  Sakriani SAKTI  Satoshi NAKAMURA  

     
    PAPER-Voice Conversion and Speech Enhancement

      Vol:
    E97-D No:6
      Page(s):
    1419-1428

    The perceived age of a singing voice is the age of the singer as perceived by the listener, and is one of the notable characteristics that determines perceptions of a song. In this paper, we describe an investigation of acoustic features that have an effect on the perceived age, and a novel voice timbre control technique based on the perceived age for singing voice conversion (SVC). Singers can sing expressively by controlling prosody and voice timbre, but the varieties of voices that singers can produce are limited by physical constraints. Previous work has attempted to overcome this limitation through the use of statistical voice conversion. This technique makes it possible to convert the singing voice timbre of an arbitrary source singer into that of an arbitrary target singer. However, it is still difficult to intuitively control singing voice characteristics by manipulating parameters corresponding to specific physical traits, such as gender and age. In this paper, we first investigate the factors that play a part in the listener's perception of the singer's age. Then, we apply a multiple-regression Gaussian mixture model (MR-GMM) to SVC for the purpose of controlling voice timbre based on the perceived age, and we propose SVC based on the modified MR-GMM for manipulating the perceived age while maintaining the singer's individuality. The experimental results show that 1) the perceived age of singing voices corresponds relatively well to the actual age of the singer, 2) prosodic features have a larger effect on the perceived age than spectral features, 3) the individuality of a singer is influenced more heavily by segmental features than prosodic features, and 4) the proposed voice timbre control method makes it possible to change the singer's perceived age while not having an adverse effect on the perceived individuality.

  • Voice Conversion Based on Speaker-Dependent Restricted Boltzmann Machines

    Toru NAKASHIKA  Tetsuya TAKIGUCHI  Yasuo ARIKI  

     
    PAPER-Voice Conversion and Speech Enhancement

      Vol:
    E97-D No:6
      Page(s):
    1403-1410

    This paper presents a voice conversion technique using speaker-dependent Restricted Boltzmann Machines (RBM) to build high-order eigen spaces of source/target speakers, where it is easier to convert the source speech to the target speech than in the traditional cepstrum space. We build a deep conversion architecture that concatenates the two speaker-dependent RBMs with neural networks, expecting that they automatically discover abstractions to express the original input features. Under this concept, if we train the RBMs using only the speech of an individual speaker that includes various phonemes while keeping the speaker individuality unchanged, it can be considered that there are fewer phonemes and relatively more speaker individuality in the output features of the hidden layer than in the original acoustic features. Training the RBMs for a source speaker and a target speaker, we can then connect and convert the speaker individuality abstractions using Neural Networks (NN). The converted abstraction of the source speaker is then back-propagated into the acoustic space (e.g., MFCC) using the RBM of the target speaker. We conducted speaker-voice conversion experiments and confirmed the efficacy of our method with respect to subjective and objective criteria, comparing it with the conventional Gaussian Mixture Model-based method and an ordinary NN.

  • DOA and DOD Estimation Using Orthogonal Projection Approach for Bistatic MIMO Radars

    Ann-Chen CHANG  Chih-Chang SHEN  Kai-Shiang CHANG  

     
    LETTER-Digital Signal Processing

      Vol:
    E97-A No:5
      Page(s):
    1121-1124

    In this letter, the orthogonal projection (OP) estimation of the direction of arrival (DOA) and direction of departure (DOD) of multiple targets for bistatic multiple-input multiple-output radars is addressed. First, a two-dimensional direction finding estimator based on OP technique with automatic pairing is developed. Second, this letter also presents a modified reduced-dimension estimator by utilizing the characteristic of Kronecker product, which only performs two one-dimensional angle estimates. Furthermore, the DOA and DOD pairing is given automatically. Finally, simulation results are presented to verify the efficiency of the proposed estimators.

  • Computation of the Total Autocorrelation over Shared Binary Decision Diagrams

    Miloš RADMANOVIC  Radomir S. STANKOVIC  Claudio MORAGA  

     
    LETTER-VLSI Design Technology and CAD

      Vol:
    E97-A No:5
      Page(s):
    1140-1143

    This paper describes a method for the efficient computation of the total autocorrelation for large multiple-output Boolean functions over a Shared Binary Decision Diagram (SBDD). The existing methods for computing the total autocorrelation over decision diagrams are restricted to single-output functions and, in the case of multiple-output functions, require repeating the procedure k times, where k is the number of outputs. The proposed method permits the computation to be performed in a single traversal of the SBDD. To that end, compared to standard BDD packages, we modified the way of traversing sub-diagrams in the SBDD and introduced an additional memory function kept in the hash table for storing results of the computation of the autocorrelation between two sub-diagrams in the SBDD. As a result, the total amount of computation is reduced, which makes the method feasible in practical applications. Experimental results over standard benchmarks confirm the efficiency of the method.
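
    The "repeat k times" baseline mentioned in this abstract can be sketched directly on truth tables. The sketch assumes the common definition in which the autocorrelation of a single output f at shift τ is the sum over x of f(x)·f(x ⊕ τ), and the total autocorrelation is the sum of these over all k outputs; this definition and the function names are assumptions, and the code does not use decision diagrams at all.

```python
def autocorrelation(truth_table, n):
    """Autocorrelation of one n-variable Boolean function given as a 0/1 truth table."""
    size = 1 << n
    return [sum(truth_table[x] * truth_table[x ^ tau] for x in range(size))
            for tau in range(size)]


def total_autocorrelation(outputs, n):
    """Sum the per-output autocorrelations (one pass per output, i.e. k passes)."""
    total = [0] * (1 << n)
    for table in outputs:
        for tau, value in enumerate(autocorrelation(table, n)):
            total[tau] += value
    return total


if __name__ == "__main__":
    and_table = [0, 0, 0, 1]  # f0(x1, x0) = x1 AND x0
    xor_table = [0, 1, 1, 0]  # f1(x1, x0) = x1 XOR x0
    print(total_autocorrelation([and_table, xor_table], 2))
```

    Each output costs on the order of 4^n operations here, which is why a shared, single-traversal representation matters for large functions.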

  • Automatic SfM-Based 2D-to-3D Conversion for Multi-Object Scenes

    Hak Gu KIM  Jin-ku KANG  Byung Cheol SONG  

     
    LETTER-Image

      Vol:
    E97-A No:5
      Page(s):
    1159-1161

    This letter presents an automatic 2D-to-3D conversion method using a structure from motion (SfM) process for multi-object scenes. The foreground and background regions may have different depth values in an image. First, we detect the foreground objects and the background by using a depth histogram. Then, the proposed method creates the virtual image by projecting each region with its computed projective matrix. Experimental results compared to previous research show that the proposed method provides realistic stereoscopic images.

  • Efficient CORDIC-Based Processing Elements in Scalable Complex Matrix Inversion

    Huan HE  Feng YU  Bei ZHAO  

     
    LETTER-Algorithms and Data Structures

      Vol:
    E97-A No:5
      Page(s):
    1144-1148

    In this paper we apply angle recoding to the CORDIC-based processing elements in a scalable architecture for complex matrix inversion. We extend the processing elements from the scalable real matrix inversion architecture to the complex domain and obtain the novel scalable complex matrix inversion architecture, which can significantly reduce computational complexity. We rearrange the CORDIC elements to make one half of the processing elements simple and compact. For the other half of the processing elements, the efficient use of angle recoding reduces the number of microrotation steps of the CORDIC elements to 3/4. Consequently, only 3 CORDIC elements are required for the processing elements with full utilization.
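
    To show what a "microrotation step" is, here is a minimal floating-point sketch of conventional rotation-mode CORDIC. It performs every microrotation and does not implement angle recoding or the processing-element architecture of the letter; the function name and the default of 24 iterations are assumptions made for illustration.

```python
import math


def cordic_rotate(x, y, angle, iterations=24):
    """Rotate (x, y) by `angle` radians using shift-and-add style microrotations.

    Converges for |angle| up to roughly 1.74 rad (the sum of all atan(2^-i)).
    """
    # Constant scale factor that compensates the gain of the microrotations.
    gain = 1.0
    for i in range(iterations):
        gain *= 1.0 / math.sqrt(1.0 + 2.0 ** (-2 * i))

    z = angle  # residual angle still to be rotated
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0          # direction of this microrotation
        shift = 2.0 ** -i                    # a right shift by i bits in hardware
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * math.atan(shift)            # table lookup in hardware
    return x * gain, y * gain


if __name__ == "__main__":
    print(cordic_rotate(1.0, 0.0, 0.5))
    print(math.cos(0.5), math.sin(0.5))
```

    In plain CORDIC every one of the `iterations` steps is executed; angle recoding, as described in the abstract, is a way to skip some of them.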

  • Behavior of Inter-Core Crosstalk as a Noise and Its Effect on Q-Factor in Multi-Core Fiber

    Tetsuya HAYASHI  Takashi SASAKI  Eisuke SASAOKA  

     
    PAPER-Fiber-Optic Transmission for Communications

      Vol:
    E97-B No:5
      Page(s):
    936-944

    The stochastic behavior of inter-core crosstalk in multi-core fiber is discussed based on a theoretical model validated by measurements, and the effect of the crosstalk on the Q-factor in transmission systems using multi-core fiber is investigated theoretically. The measurements show that the crosstalk changes rapidly with wavelength and gradually with time, following a Gaussian distribution in I-Q planes. Therefore, the behavior of the crosstalk as a noise may depend on the bandwidth of the signal light. If the bandwidth is adequately broad, the crosstalk may behave as a virtual additive white Gaussian noise on I-Q planes, and the Q-penalty at the Q-factor of 9.8dB is less than 1dB when the statistical mean of the crosstalk from other cores is less than -16.7dB for PDM-QPSK, -23.7dB for PDM-16QAM, and -29.9dB for PDM-64QAM. If the bandwidth is adequately narrow, the crosstalk may behave as virtually static coupling that changes very gradually with time and heavily depends on the wavelength. To cope with a static crosstalk much higher than its statistical mean, a margin of several decibels from the mean crosstalk may be necessary for suppressing Q-penalty in the case of adequately narrow bandwidth.

  • High-Sensitive Detection of Electronic Emission through Si-Nanocrystals/Si-Nanocolumnar Structures by Conducting-Probe Atomic Force Microscopy

    Daichi TAKEUCHI  Katsunori MAKIHARA  Mitsuhisa IKEDA  Seiichi MIYAZAKI  Hirokazu KAKI  Tsukasa HAYASHI  

     
    PAPER

      Vol:
    E97-C No:5
      Page(s):
    397-400

    We fabricated highly dense Si nano-columnar structures accompanied with Si nanocrystals on W-coated quartz and characterized their local electrical transport in the thickness direction in a non-contact mode by using a Rh-coated Si cantilever with pulse bias application, in which Vmax, Vmin, and the duty ratio were set at +3.0V, -14V, and 50%, respectively. By applying a pulse bias to the bottom W electrode with respect to a grounded top electrode made of ∼10-nm-thick Au on a sample surface, non-uniform current images in correlation with surface morphologies reflecting electron emission were obtained. The change in the surface potential of the highly dense Si nano-columnar structures accompanied with Si nanocrystals, which were measured at room temperature by using an AFM/Kelvin probe technique, indicated electron injection into and extraction from Si nanocrystals, depending on the tip bias polarity. This result is attributable to efficient electron emission under pulsed bias application due to electron charging from the top electrode to the Si nanocrystals in a positively biased duration at the bottom electrode and subsequent quasi-ballistic transport through Si nanocrystals in a negatively biased duration.
