
Keyword Search Result

[Keyword] SPE (2504 hits)


  • Adaptive Control for LED-Based Underwater Wireless Communications Using Visible Light

    Xin LIN  


    E100-A No:1

    One of the major challenges in marine resource development and information processing is how to realize short-range, large-capacity underwater data transmission. Acoustic waves are an effective carrier and have been used for underwater data transmission because they attenuate less in seawater than radio waves and have an average propagation distance of about 10 km or more. However, as transmitted data increasingly include images, the inherently low speed of acoustic waves prevents them from being an ideal carrier for high-speed, large-capacity communications. By contrast, visible light with wavelengths of 400-650 nm is an ideal carrier and has received much attention. Its attractive features include high transparency and a low attenuation rate underwater, a propagation direction and range that are easy to control because the light is visible, and high data rate and capacity, which make it well suited to underwater wireless communications. However, the spectral attenuation of visible light in seawater varies with the marine environment. Therefore, this paper considers an underwater optical wireless communication method that adapts to spatio-temporal changes in seawater turbidity. Two crucial components of the underwater optical wireless communication system, the light wavelength and the modulation method, are controlled using wavelength-adaptation and modulation-adaptation techniques, respectively. The effectiveness of the wavelength-adaptation method is demonstrated in underwater optical image transmission.

  • Deep Nonlinear Metric Learning for Speaker Verification in the I-Vector Space

    Yong FENG  Qingyu XIONG  Weiren SHI  

    LETTER-Speech and Hearing

    E100-D No:1

    Speaker verification is the task of determining whether two utterances were spoken by the same person. After the utterances are represented in the i-vector space, the crucial problem reduces to computing the similarity of two i-vectors. Metric learning provides a viable solution to this problem. Many metric learning algorithms have been proposed, but they are usually limited to learning a linear transformation. In this paper, we propose a nonlinear metric learning method that learns an explicit mapping from the original space to an optimal subspace using a deep Restricted Boltzmann Machine network. The proposed method is evaluated on the NIST SRE 2008 dataset. The evaluation results show that the proposed method, which has a deep learning architecture, performs better than some state-of-the-art methods.
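
A minimal sketch of scoring two i-vectors after a nonlinear mapping follows. It is not the authors' deep RBM network: the tanh layers, their made-up weights, the 3-dimensional toy vectors, the threshold, and the names `deep_map` and `cosine_score` are all hypothetical. Cosine similarity in the mapped space is an illustrative choice of comparison, not one confirmed by the abstract.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Layer = Tuple[List[Vector], Vector]  # (weight matrix rows, bias vector)

def dense_tanh(layer: Layer, x: Sequence[float]) -> Vector:
    # One fully connected layer with a tanh nonlinearity: y = tanh(W x + b)
    weights, bias = layer
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]

def deep_map(layers: Sequence[Layer], x: Sequence[float]) -> Vector:
    # Stack of nonlinear layers standing in for a trained deep mapping (hypothetical)
    out = list(x)
    for layer in layers:
        out = dense_tanh(layer, out)
    return out

def cosine_score(a: Sequence[float], b: Sequence[float]) -> float:
    # Cosine similarity between two vectors, in [-1, 1]
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Hypothetical 3-dim "i-vectors" and a 3 -> 2 -> 2 mapping with made-up weights
LAYERS: List[Layer] = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1]),
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
]
THRESHOLD = 0.8  # hypothetical accept/reject threshold

enroll = [0.9, 0.1, -0.3]
probe = [0.8, 0.2, -0.25]
score = cosine_score(deep_map(LAYERS, enroll), deep_map(LAYERS, probe))
same_speaker = score > THRESHOLD
```

In a real system the mapping would be learned so that same-speaker pairs score high and different-speaker pairs score low; here the weights are arbitrary and only show the data flow.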

  • A Low Computational Complexity Algorithm for Compressive Wideband Spectrum Sensing

    Shiyu REN  Zhimin ZENG  Caili GUO  Xuekang SUN  Kun SU  

    LETTER-Digital Signal Processing

    E100-A No:1

    Compressed sensing (CS)-based wideband spectrum sensing approaches have attracted much attention because they relieve the burden of high signal acquisition costs. However, CS-based sensing approaches use highly nonlinear reconstruction methods for spectrum recovery, which incur high computational complexity. This letter proposes a two-step compressive wideband sensing algorithm. The algorithm introduces a coarse sensing step that further compresses the sub-Nyquist measurements before spectrum recovery in the subsequent compressive fine sensing step, which significantly reduces computational complexity. The sufficient condition that enables the algorithm and its computational complexity are analyzed. Even when the sufficient condition is only just satisfied, the average reduction in computational complexity can reach 50% compared with directly performing compressive sensing using the same high-performing algorithm that is used in our fine sensing step.

  • Development of Multistatic Linear Array Radar at 10-20GHz

    Yasunari MORI  Takayoshi YUMII  Yumi ASANO  Kyouji DOI  Christian N. KOYAMA  Yasushi IITSUKA  Kazunori TAKAHASHI  Motoyuki SATO  


    E100-C No:1

    This paper presents a prototype of a 3D imaging step-frequency radar system at 10-20 GHz suitable for the nondestructive inspection of the walls of wooden houses. Using this prototype, it is possible to obtain data for 3D imaging with a single simple scan and to make 3D volume images of braces, whether broken or not, in the walls of wooden houses using synthetic aperture radar processing. The system is a multistatic radar composed of a one-dimensional array antenna (32 transmitting and 32 receiving antennas, which are resistively loaded printed bowtie antennas) and is able to acquire frequency-domain data for all the transmitting and receiving antenna pairs, i.e., 32×32 = 1024 pairs, in 33 ms per position. On the basis of comparisons between two array antenna prototype designs, we investigated the optimal distance between the transmitting array and the receiving array to reduce the direct coupling effect. We produced a prototype multistatic radar system and used it to measure different types of wooden targets in two experiments. In the first experiment, we measured plywood bars behind a decorated gypsum board, simulating a broken wooden brace inside a house wall. In the second experiment, we measured a wooden brace made of Japanese cypress as a target inside a model of a typical (wooden) Japanese house wall. The results of both experiments demonstrate the imaging capability of the radar prototype for nondestructive inspection of the insides of wooden house walls.
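
The acquisition figures in the abstract can be checked with simple arithmetic. The sketch also applies the standard free-space range-resolution relation c/(2B) for a stepped-frequency radar; that relation is general radar background, not a result stated in the abstract, and the resolution inside wood or gypsum will differ because the wave speed is lower there. The function names are hypothetical.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s

def antenna_pairs(n_tx: int, n_rx: int) -> int:
    # A multistatic array measures every transmit/receive combination
    return n_tx * n_rx

def time_per_pair(total_s: float, pairs: int) -> float:
    # Average measurement time available to each antenna pair at one scan position
    return total_s / pairs

def range_resolution(bandwidth_hz: float) -> float:
    # Standard free-space range resolution for a swept bandwidth B: c / (2B)
    return C / (2.0 * bandwidth_hz)

pairs = antenna_pairs(32, 32)        # 32 x 32 pairs, as in the abstract
dwell = time_per_pair(33e-3, pairs)  # 33 ms per position, as in the abstract
res = range_resolution(20e9 - 10e9)  # 10-20 GHz sweep, i.e. B = 10 GHz
```

With 1024 pairs in 33 ms, each pair gets roughly 32 µs, and a 10 GHz sweep gives about 1.5 cm range resolution in free space.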

  • Analysis of Pulse Reflection Responses from Periodic Perfect Conductor in Two Dispersion Media

    Ryosuke OZAKI  Tsuneki YAMASAKI  


    E100-C No:1

    In this paper, a periodic perfect conductor is used to investigate the metallic scatterer problem in soil. We analyzed the pulse reflection responses from the periodic perfect conductor in two dispersive media by varying the parameters of the complex dielectric constants that describe their permittivity, and also investigated the influence of both the dielectric and the conductor, using a combination of the fast inversion of Laplace transform (FILT) method and the point matching method (PMM). In addition, we verified the accuracy of the present method by comparing it with exact solutions of the transient scattering problem for a perfect conductor plate in dispersive media.

  • Semantic Motion Signature for Segmentation of High Speed Large Displacement Objects

    Yinhui ZHANG  Zifen HE  

    LETTER-Image Processing and Video Processing

    E100-D No:1

    This paper presents a novel method for unsupervised segmentation of objects with large displacements in high-speed video sequences. Our general framework introduces a new foreground object prediction method that finds object hypotheses by encoding both spatial and temporal features via a semantic motion signature scheme. More specifically, temporal cues of object hypotheses are captured by the motion signature proposed in this paper, which is derived from a sparse saliency representation imposed on the magnitude of the optical flow field. We integrate semantic scores derived from deep networks with location priors, which allows us to directly estimate the appearance potentials of foreground hypotheses. A unified MRF energy functional is proposed to simultaneously incorporate the information from the motion signature and semantic prediction features. The functional enforces both spatial and temporal consistency and imposes appearance constancy and spatio-temporal smoothness constraints directly on the object hypotheses. It inherently handles the challenges of segmenting ambiguous objects with large displacements in high-speed videos. Our experiments on video object segmentation benchmarks demonstrate the effectiveness of the proposed method for segmenting high-speed objects despite complicated scene dynamics and large displacements.
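
The sketch below shows the general shape of a binary MRF segmentation energy: per-pixel unary costs from a foreground probability plus a Potts smoothness penalty between neighbours, minimized by brute force on a 4-pixel chain. The equal-weight fusion of motion and semantic scores, the negative log-likelihood unaries, the smoothness weight, and all scores are assumptions for illustration; the paper's functional also includes appearance and temporal terms that are not modelled here.

```python
import itertools
import math
from typing import Sequence, Tuple

def fg_probability(motion: float, semantic: float, w: float = 0.5) -> float:
    # Hypothetical fusion: convex combination of motion and semantic foreground scores
    return w * motion + (1.0 - w) * semantic

def energy(labels: Sequence[int], probs: Sequence[float], smooth: float) -> float:
    # Unary: negative log-likelihood of each pixel's label (1 = foreground, 0 = background)
    # Pairwise: Potts penalty for every label change between neighbours on a 1-D chain
    eps = 1e-12
    unary = sum(-math.log(p + eps) if lab == 1 else -math.log(1.0 - p + eps)
                for lab, p in zip(labels, probs))
    pairwise = smooth * sum(a != b for a, b in zip(labels, labels[1:]))
    return unary + pairwise

def best_labeling(probs: Sequence[float], smooth: float = 0.5) -> Tuple[int, ...]:
    # Exhaustive search over all 2^N labelings; only feasible for tiny toy inputs
    return min(itertools.product((0, 1), repeat=len(probs)),
               key=lambda ls: energy(ls, probs, smooth))

motion = [0.9, 0.8, 0.2, 0.1]    # hypothetical motion-signature scores
semantic = [0.8, 0.9, 0.3, 0.1]  # hypothetical semantic foreground scores
probs = [fg_probability(m, s) for m, s in zip(motion, semantic)]
labels = best_labeling(probs)
```

Exhaustive search only works for toy inputs; practical binary MRF segmentation typically relies on solvers such as graph cuts.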

  • Auto-Radiometric Calibration in Photometric Stereo

    Wiennat MONGKULMANN  Takahiro OKABE  Yoichi SATO  

    PAPER-Image Recognition, Computer Vision

    E99-D No:12

    We propose a framework to perform auto-radiometric calibration in photometric stereo methods to estimate surface orientations of an object from a sequence of images taken using a radiometrically uncalibrated camera under varying illumination conditions. Our proposed framework allows the simultaneous estimation of surface normals and radiometric responses, and as a result can avoid cumbersome and time-consuming radiometric calibration. The key idea of our framework is to use the consistency between the irradiance values converted from pixel values by using the inverse response function and those computed from the surface normals. Consequently, a linear optimization problem is formulated to estimate the surface normals and the response function simultaneously. Finally, experiments on both synthetic and real images demonstrate that our framework enables photometric stereo methods to accurately estimate surface normals even when the images are captured using cameras with unknown and nonlinear response functions.
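
For a Lambertian surface, irradiance is I = ρ(n·l), so three non-coplanar light directions determine g = ρn from a 3×3 linear system once pixel values are mapped back to irradiance. The sketch assumes the camera response is known (a hypothetical gamma curve with γ = 2.2) only to show where the inverse response enters; the paper's framework instead estimates the response function and the surface normals jointly, which this sketch does not do. The lighting, albedo, and function names are made up.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]

def det3(m: List[List[float]]) -> float:
    # Determinant of a 3x3 matrix by cofactor expansion along the first row
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve3(a: List[List[float]], b: Sequence[float]) -> List[float]:
    # Cramer's rule for a 3x3 linear system a x = b
    d = det3(a)
    if abs(d) < 1e-12:
        raise ValueError("light directions are (nearly) coplanar")
    x = []
    for col in range(3):
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = b[r]
        x.append(det3(m) / d)
    return x

def inverse_response(pixel: float, gamma: float = 2.2) -> float:
    # Hypothetical known response f(I) = I**(1/gamma); its inverse maps pixel -> irradiance
    return pixel ** gamma

def estimate_normal(lights: Sequence[Vec3], pixels: Sequence[float]) -> Tuple[Vec3, float]:
    irradiance = [inverse_response(p) for p in pixels]
    g = solve3([list(light) for light in lights], irradiance)  # g = albedo * normal
    albedo = math.sqrt(sum(c * c for c in g))
    normal = (g[0] / albedo, g[1] / albedo, g[2] / albedo)
    return normal, albedo

# Synthetic check: normal (0, 0, 1), albedo 0.8, three unit light directions
LIGHTS = [(0.0, 0.0, 1.0), (0.6, 0.0, 0.8), (0.0, 0.6, 0.8)]
true_irradiance = [0.8 * light[2] for light in LIGHTS]
pixels = [i ** (1 / 2.2) for i in true_irradiance]  # what the nonlinear camera records
normal, albedo = estimate_normal(LIGHTS, pixels)
```

Skipping `inverse_response` and feeding raw pixel values to the solver would bias both the normal and the albedo, which is the error that radiometric calibration avoids.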

  • Efficient Search for High-Rate Punctured Convolutional Codes Using Dual Codes

    Sen MORIYA  Kana KIKUCHI  Hiroshi SASANO  

    PAPER-Coding Theory and Techniques

    E99-A No:12

    In this study, we consider techniques to search for high-rate punctured convolutional code (PCC) encoders using dual code encoders. A low-rate R=1/n convolutional code (CC) has a dual code that is identical to a PCC with rate R=(n-1)/n. This implies that a rate R=1/n CC encoder can assist in searches for high-rate PCC encoders. Conversely, we can derive a rate R=1/n CC encoder from good PCC encoders with rate R=(n-1)/n using dual code encoders. This paper proposes a method to obtain improved high-rate PCC encoders using dual code encoders and the results of exhaustive searches for PCC encoders derived from rate R=1/3 original encoders. We also show some PCC encoders obtained by searches that used our method.
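
Puncturing itself is easy to sketch: encode with a low-rate mother code, then delete output bits periodically according to a puncturing matrix. For brevity this uses the common rate-1/2, constraint-length-3 code with octal generators (7, 5) and an example puncturing pattern that yields rate 2/3; the paper instead starts from rate-1/3 original encoders and uses dual codes to guide the search, neither of which is shown here. Function names are hypothetical.

```python
from typing import List, Sequence, Tuple

def conv_encode_r12(bits: Sequence[int]) -> List[Tuple[int, int]]:
    # Rate-1/2, constraint length 3 convolutional code with generators (7, 5) in octal
    s1 = s2 = 0  # shift register: s1 = previous input bit, s2 = the bit before that
    out = []
    for u in bits:
        v1 = u ^ s1 ^ s2  # generator 111 (octal 7)
        v2 = u ^ s2       # generator 101 (octal 5)
        out.append((v1, v2))
        s1, s2 = u, s1
    return out

# Puncturing matrix: row j = output stream j, column t = time index mod period.
# A 1 keeps the bit, a 0 deletes it. This example pattern turns rate 1/2 into rate 2/3.
PUNCTURE = [[1, 1],
            [1, 0]]

def puncture(pairs: Sequence[Tuple[int, int]], pattern=PUNCTURE) -> List[int]:
    period = len(pattern[0])
    kept = []
    for t, pair in enumerate(pairs):
        for j, bit in enumerate(pair):
            if pattern[j][t % period]:
                kept.append(bit)
    return kept

msg = [1, 0, 1, 1]
coded = conv_encode_r12(msg)
sent = puncture(coded)
rate = len(msg) / len(sent)
```

Every two input bits produce four coded bits, of which the pattern keeps three, so the transmitted rate is 2/3.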

  • Joint Optimization of Peak-to-Average Power Ratio and Spectral Leakage in NC-OFDM

    Peng WEI  Lilin DAN  Yue XIAO  Shaoqian LI  

    PAPER-Wireless Communication Technologies

    E99-B No:12

    High peak-to-average power ratio (PAPR) and spectral leakage are two main problems of orthogonal frequency division multiplexing (OFDM) systems. To alleviate these problems, this paper proposes a joint model that efficiently suppresses both PAPR and spectral leakage by combining serial peak cancellation (SPC) and time-domain N-continuous OFDM (TD-NC-OFDM) in an iterative way. Furthermore, we give an analytical expression of the proposed joint model to analyze the mutual effects between SPC and TD-NC-OFDM. Finally, simulation results show that the joint optimization model achieves notable PAPR reduction and sidelobe suppression with low implementation cost.
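
PAPR is the ratio of the peak instantaneous power of the time-domain OFDM symbol to its mean power. The sketch computes it with a direct inverse DFT for two toy symbol vectors; it does not implement serial peak cancellation or TD-NC-OFDM, and the function names are hypothetical.

```python
import cmath
import math
from typing import List, Sequence

def idft(X: Sequence[complex]) -> List[complex]:
    # Direct O(N^2) inverse DFT with 1/N normalization: subcarriers -> time samples
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

def papr_db(x: Sequence[complex]) -> float:
    # Peak power over mean power, in dB
    powers = [abs(s) ** 2 for s in x]
    return 10 * math.log10(max(powers) / (sum(powers) / len(powers)))

N = 8
worst = papr_db(idft([1 + 0j] * N))                        # all subcarriers equal
single = papr_db(idft([0j, 1 + 0j] + [0j] * (N - 2)))      # one active subcarrier
```

When all N subcarriers carry the same symbol they add coherently at one instant, giving the worst-case PAPR of 10·log10(N) dB (about 9.03 dB for N = 8); a single active subcarrier has a constant envelope and therefore 0 dB PAPR.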

  • Non-Native Text-to-Speech Preserving Speaker Individuality Based on Partial Correction of Prosodic and Phonetic Characteristics

    Yuji OSHIMA  Shinnosuke TAKAMICHI  Tomoki TODA  Graham NEUBIG  Sakriani SAKTI  Satoshi NAKAMURA  

    PAPER-Speech and Hearing

    E99-D No:12

    This paper presents a novel non-native speech synthesis technique that preserves the individuality of a non-native speaker. Cross-lingual speech synthesis based on voice conversion or Hidden Markov Model (HMM)-based speech synthesis is a technique to synthesize foreign language speech using a target speaker's natural speech uttered in his/her mother tongue. Although the technique holds promise for improving a wide variety of applications, it tends to degrade the target speaker's individuality in synthetic speech compared to intra-lingual speech synthesis. This paper proposes a new approach to speech synthesis that preserves speaker individuality by using non-native speech spoken by the target speaker. Although the use of non-native speech makes it possible to preserve the speaker individuality in the synthesized target speech, naturalness is significantly degraded because the synthesized speech waveform is directly affected by unnatural prosody and pronunciation, often caused by differences in the linguistic systems of the source and target languages. To improve naturalness while preserving speaker individuality, we propose (1) a prosody correction method based on model adaptation, and (2) a phonetic correction method based on spectrum replacement for unvoiced consonants. The experimental results using English speech uttered by native Japanese speakers demonstrate that (1) the proposed methods are capable of significantly improving naturalness while preserving the speaker individuality in synthetic speech, and (2) the proposed methods also improve intelligibility as confirmed by a dictation test.

  • Logic-Path-and-Clock-Path-Aware At-Speed Scan Test Generation

    Fuqiang LI  Xiaoqing WEN  Kohei MIYASE  Stefan HOLST  Seiji KAJIHARA  


    E99-A No:12

    Excessive IR-drop in capture mode during at-speed scan testing may cause timing errors for defect-free circuits, resulting in undue test yield loss. Previous solutions for achieving capture-power-safety adjust the switching activity around logic paths, especially long sensitized paths, in order to reduce the impact of IR-drop. However, those solutions ignore the impact of IR-drop on clock paths, namely test clock stretch; as a result, they cannot accurately achieve capture-power-safety. This paper proposes a novel scheme, called LP-CP-aware ATPG, for generating high-quality capture-power-safe at-speed scan test vectors by taking into consideration the switching activity around both logic and clock paths. This scheme features (1) LP-CP-aware path classification for characterizing long sensitized paths by considering the IR-drop impact on both logic and clock paths; (2) LP-CP-aware X-restoration for obtaining more effective X-bits by backtracing from both logic and clock paths; (3) LP-CP-aware X-filling for using different strategies according to the positions of X-bits in test cubes. Experimental results on large benchmark circuits demonstrate the advantages of LP-CP-aware ATPG, which can more accurately achieve capture-power-safety without significant test vector count inflation and test quality loss.
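
X-filling assigns values to the don't-care (X) bits of a test cube. The sketch shows three generic fills (0-fill, 1-fill, and adjacent fill) and counts adjacent bit transitions as a crude switching-activity proxy. It is not LP-CP-aware X-filling: the paper chooses strategies by X-bit position and evaluates capture-time IR-drop on logic and clock paths, which requires circuit-level information this toy omits. The cube and function names are hypothetical.

```python
from typing import List

def fill_constant(cube: str, value: str) -> str:
    # 0-fill or 1-fill: every don't-care bit gets the same value
    return cube.replace("X", value)

def fill_adjacent(cube: str) -> str:
    # Adjacent (repeat) fill: each X copies the nearest care bit to its left;
    # leading X's copy the first care bit ('0' if the cube has no care bits)
    last = next((c for c in cube if c != "X"), "0")
    out: List[str] = []
    for c in cube:
        if c == "X":
            out.append(last)
        else:
            out.append(c)
            last = c
    return "".join(out)

def transitions(vector: str) -> int:
    # Number of adjacent bit changes: a rough proxy for switching activity
    return sum(1 for a, b in zip(vector, vector[1:]) if a != b)

cube = "X1XX0XX1X"
for name, v in [("0-fill", fill_constant(cube, "0")),
                ("1-fill", fill_constant(cube, "1")),
                ("adjacent", fill_adjacent(cube))]:
    print(name, v, transitions(v))
```

The care bits (here the 1, 0, 1 positions) are never changed; only the X positions are free, which is why the choice of filling strategy can shift switching activity without affecting fault detection of the targeted faults.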

  • Development of Zinc Oxide Spatial Light Modulator for High-Yield Speckle Modulation (Open Access)

    Naoya TATE  Tadashi KAWAZOE  Shunsuke NAKASHIMA  Wataru NOMURA  Motoichi OHTSU  


    E99-C No:11

    In order to realize high-yield speckle modulation, we developed a novel spatial light modulator using a zinc oxide single crystal doped with nitrogen ions. The distribution of dopants was optimized to induce characteristic optical functions by applying an annealing method that we developed. The device is driven by an in-plane current, which induces magnetic fields. These fields strongly interact with the doped material, so the spatial distribution of the refractive index is modulated accordingly via external control. Using this device, we experimentally demonstrated speckle modulation, and we discuss the quantitative superiority of our approach.

  • Combining Fisher Criterion and Deep Learning for Patterned Fabric Defect Inspection

    Yundong LI  Jiyue ZHANG  Yubing LIN  

    LETTER-Image Recognition, Computer Vision

    E99-D No:11

    In this letter, we propose a novel discriminative representation for patterned fabric defect inspection when only limited negative samples are available. The Fisher criterion is introduced into the loss function of deep learning, where it guides the learning direction of the deep network and makes the extracted features more discriminative. A deep neural network constructed from the encoder part of trained autoencoders is used to classify each pixel in the images as defective or defect-free, using a patch centered on the pixel as context. Subsequently, the resulting confidence map is processed by median filtering and binary thresholding, and the defect areas are then located. Experimental results demonstrate that our method achieves state-of-the-art performance on the benchmark fabric images.
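
The abstract does not give the exact loss, so the sketch assumes one common way to express a Fisher-style term: within-class scatter divided by between-class scatter, added to the task loss with a weight λ (here `lam`). It then mimics the post-processing step with a 3×3 median filter and a fixed threshold on a toy confidence map. Feature values, λ, the 0.5 threshold, and the function names are hypothetical.

```python
import statistics
from typing import List, Sequence

def fisher_penalty(pos: Sequence[float], neg: Sequence[float], eps: float = 1e-8) -> float:
    # Within-class scatter over between-class scatter for one feature;
    # smaller values mean the two classes are better separated
    s_w = statistics.pvariance(pos) + statistics.pvariance(neg)
    s_b = (statistics.fmean(pos) - statistics.fmean(neg)) ** 2
    return s_w / (s_b + eps)

def total_loss(task_loss: float, pos: Sequence[float], neg: Sequence[float],
               lam: float = 0.1) -> float:
    # Hypothetical combination: task loss plus a weighted Fisher-style penalty
    return task_loss + lam * fisher_penalty(pos, neg)

def median3x3(img: List[List[float]]) -> List[List[float]]:
    # 3x3 median filter; windows are clipped at the image border
    h, w = len(img), len(img[0])
    return [[statistics.median([img[a][b]
                                for a in range(max(0, i - 1), min(h, i + 2))
                                for b in range(max(0, j - 1), min(w, j + 2))])
             for j in range(w)]
            for i in range(h)]

def threshold(img: List[List[float]], t: float = 0.5) -> List[List[int]]:
    # Binary defect mask: 1 where the filtered confidence exceeds t
    return [[1 if v > t else 0 for v in row] for row in img]

# Toy confidence map: a 3x3 defect block in the top-left corner plus one isolated noisy pixel
conf = [[0.9, 0.9, 0.9, 0.0, 0.0],
        [0.9, 0.9, 0.9, 0.0, 0.0],
        [0.9, 0.9, 0.9, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0, 0.8]]
mask = threshold(median3x3(conf))
loss = total_loss(1.0, pos=[0.9, 1.1], neg=[-0.1, 0.1])
```

The median filter suppresses the isolated high-confidence pixel while keeping the interior of the defect block, which is the usual reason for filtering before thresholding.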

  • Improving Performance of Heuristic Algorithms by Lebesgue Spectrum Filter (Open Access)

    Mikio HASEGAWA  


    E99-B No:11

    Previous research on chaotic CDMA has theoretically derived the chaotic sequences that have the minimum asynchronous cross-correlation. To minimize the asynchronous cross-correlation, the autocorrelation of each sequence has to be C(τ) ≈ C×r^τ with r = -2+√3, i.e., a damped oscillation as the lag τ increases. There are several methods to generate such sequences, such as using a chaotic map or the Lebesgue spectrum filter (LSF). In this paper, the lowest cross-correlation found in chaotic CDMA research is applied to solution search algorithms for combinatorial optimization problems. In combinatorial optimization, the effectiveness of chaotic search has already been established. First, the importance of chaos and of autocorrelation with damped oscillation for combinatorial optimization is shown. Next, to realize an ideal solution search, the LSF is applied to the Hopfield-Tank neural network, the 2-opt method, and the 2-exchange method. The LSF is shown to be effective even for large instances of the traveling salesman problem and the quadratic assignment problem.
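
Assuming the usual FIR form of the Lebesgue spectrum filter, y(t) = Σ_{i=0}^{M} r^i x(t-i) with r = -2+√3, the sketch filters a uniform random sequence and measures its sample autocorrelation. The lag-1 value should come out near r ≈ -0.268 and the lag-2 value near r² ≈ 0.072, the alternating, decaying pattern described above. The filter order M = 10, the input sequence, and the function names are assumptions; applying the filter inside the Hopfield-Tank network, 2-opt, or 2-exchange searches is not shown.

```python
import math
import random
from typing import List, Sequence

R = -2.0 + math.sqrt(3.0)  # about -0.268, the ratio r in C(tau) ~ C * r**tau

def lsf_coefficients(order: int, r: float = R) -> List[float]:
    # Filter taps r**0, r**1, ..., r**order (signs alternate because r < 0)
    return [r ** i for i in range(order + 1)]

def lebesgue_spectrum_filter(x: Sequence[float], order: int = 10, r: float = R) -> List[float]:
    # FIR filter y(t) = sum_{i=0}^{order} r**i * x(t - i); samples before t = 0 count as 0
    h = lsf_coefficients(order, r)
    return [sum(h[i] * x[t - i] for i in range(order + 1) if t - i >= 0)
            for t in range(len(x))]

def autocorrelation(x: Sequence[float], lag: int) -> float:
    # Normalized sample autocorrelation at the given lag (mean removed)
    n = len(x)
    mu = sum(x) / n
    num = sum((x[t] - mu) * (x[t + lag] - mu) for t in range(n - lag))
    den = sum((v - mu) ** 2 for v in x)
    return num / den

rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(20000)]
shaped = lebesgue_spectrum_filter(noise, order=10)
rho1 = autocorrelation(shaped, 1)  # expected to be close to R (negative)
rho2 = autocorrelation(shaped, 2)  # expected to be close to R**2 (positive)
```

For uncorrelated input, the lag-1 autocorrelation of this filter's output is r·(1 - r^20)/(1 - r^22), which is essentially r, so the output inherits the damped oscillation C(τ) ≈ C×r^τ.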

  • Harmonic-Based Robust Voice Activity Detection for Enhanced Low SNR Noisy Speech Recognition System

    Po-Yi SHIH  Po-Chuan LIN  Jhing-Fa WANG  

    PAPER-Speech and Hearing

    E99-A No:11

    This paper describes a novel harmonic-based robust voice activity detection (H-RVAD) method with a harmonic spectral local peak (HSLP) feature. HSLP is extracted by analyzing the spectral amplitude between adjacent formants, and this characteristic can be used to accurately identify and verify audio streams that contain meaningful human speech in low-SNR environments. In addition, an enhanced low-SNR noisy speech recognition system framework with a wakeup module, a speech recognition module, and a confirmation module is proposed. When a recognition result is given in the framework, users can confirm or reject the system feedback, which prevents voiced noise from misleading the recognition result. The H-RVAD method is evaluated on the AURORA2 corpus with eight types of noise and three SNR levels and increases the overall average performance from 4% to 20%. In home noise, the H-RVAD method raises the average sentence recognition rate from 4% to 14%.

  • Fast Spectral BRDF & BTDF Measurements for Characterization of Displays and Components (Open Access)

    Pierre BOHER  Thierry LEROUX  Véronique COLLOMB-PATTON  Thibault BIGNON  


    E99-C No:11

    In the present paper, we show how to rapidly obtain the spectral BRDF and BTDF of different display components or transparent displays using a Fourier optics system under different illumination configurations. The results can be used to simulate the entire structure of an LCD or to predict the performance of transparent displays under various illuminations.

  • Speaker Adaptive Training Localizing Speaker Modules in DNN for Hybrid DNN-HMM Speech Recognizers

    Tsubasa OCHIAI  Shigeki MATSUDA  Hideyuki WATANABE  Xugang LU  Chiori HORI  Hisashi KAWAI  Shigeru KATAGIRI  

    PAPER-Acoustic modeling

    E99-D No:10

    Among various training concepts for speaker adaptation, Speaker Adaptive Training (SAT) has been successfully applied to standard Hidden Markov Model (HMM) speech recognizers whose states are associated with Gaussian Mixture Models (GMMs). Meanwhile, because of the high discriminative power of Deep Neural Networks (DNNs), a new type of speech recognizer structure that combines DNNs and HMMs has been vigorously investigated in the speaker adaptation research field. Along these two lines, it is natural to seek further improvement of a DNN-HMM recognizer by employing the training concept of SAT. In this paper, we propose a novel speaker adaptation scheme that applies SAT to a DNN-HMM recognizer. Our SAT scheme allocates a Speaker Dependent (SD) module to one of the intermediate layers of the DNN, treats its remaining layers as a Speaker Independent (SI) module, and jointly trains the SD and SI modules while switching the SD module in a speaker-by-speaker manner. We implement the scheme using a DNN-HMM recognizer whose DNN has seven layers, and evaluate its utility on TED Talks corpus data. Our experimental results show that in the supervised adaptation scenario, our Speaker-Adapted (SA) SAT-based recognizer reduces the word error rate of the baseline SI recognizer and the lowest word error rate of the SA SI recognizer by 8.4% and 0.7%, respectively, and by 6.4% and 0.6% in the unsupervised adaptation scenario. Statistical testing showed that the error reductions achieved by our SA-SAT-based recognizers are significant. The results also show that our SAT-based adaptation outperforms its SI-based counterpart regardless of which layer is selected for the SD module, and that the inner layers of the DNN seem more suitable for SD module allocation than the outer layers.

  • Speech Analysis Method Based on Source-Filter Model Using Multivariate Empirical Mode Decomposition

    Surasak BOONKLA  Masashi UNOKI  Stanislav S. MAKHANOV  Chai WUTIWIWATCHAI  

    PAPER-Speech and Hearing

    E99-A No:10

    We propose a speech analysis method based on the source-filter model using multivariate empirical mode decomposition (MEMD). The proposed method takes multiple adjacent frames of a speech signal into account by combining their log spectra into multivariate signals. The multivariate signals are then decomposed into intrinsic mode functions (IMFs). The IMFs are divided into two groups using the peak of the autocorrelation function (ACF) of each IMF. The first group, characterized by a spectral fine structure, is used to estimate the fundamental frequency F0 with the ACF, whereas the second group, characterized by the frequency response of the vocal-tract filter, is used to estimate formant frequencies with a peak-picking technique. Using MEMD has two advantages: (i) unlike single-frame empirical mode decomposition, the number of IMFs does not vary, and (ii) the common information of adjacent frames is aligned in the same IMF order because of the common mode alignment property of MEMD. These advantages make the analysis more accurate than other methods. Unlike the conventional linear prediction (LP) and cepstrum methods, which rely on the LP order and the cut-off frequency, respectively, the proposed method automatically separates the glottal source and the vocal-tract filter. The results showed that the proposed method achieves the highest F0 estimation accuracy and correctly estimates the formant frequencies of the vocal-tract filter.
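
The ACF peak-picking step can be illustrated on its own. In the paper the ACF is applied to IMFs obtained from multi-frame log spectra, where the peak lag reflects harmonic spacing; for simplicity, the sketch below applies the same idea to a synthetic time-domain sinusoid, where the peak lag is the pitch period. MEMD is not implemented, and the 60-400 Hz search range and the function names are assumptions.

```python
import math
from typing import Sequence

def autocorr(x: Sequence[float], lag: int) -> float:
    # Unnormalized autocorrelation at one lag
    return sum(x[n] * x[n + lag] for n in range(len(x) - lag))

def estimate_f0(x: Sequence[float], fs: float,
                f_min: float = 60.0, f_max: float = 400.0) -> float:
    # Pick the lag with the largest autocorrelation inside the plausible pitch range,
    # then convert that lag (in samples) to a frequency
    lag_min = int(fs / f_max)
    lag_max = int(fs / f_min)
    best = max(range(lag_min, lag_max + 1), key=lambda k: autocorr(x, k))
    return fs / best

fs = 8000.0
signal = [math.sin(2 * math.pi * 200.0 * n / fs) for n in range(800)]
f0 = estimate_f0(signal, fs)
```

Restricting the lag range matters: without a lower bound the trivial peak at lag 0 would always win, and without an upper bound multiples of the true period could be selected.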

  • Simple Weighted Diversity Combining Technique for Cyclostationarity Detection Based Spectrum Sensing in Cognitive Radio Networks

    Daiki CHO  Shusuke NARIEDA  

    PAPER-Wireless Communication Technologies

    E99-B No:10

    This paper presents a weighted diversity combining technique for cyclostationarity detection based spectrum sensing of orthogonal frequency division multiplexing signals in cognitive radio. In cognitive radio systems, secondary users must detect the desired signal in an extremely low signal-to-noise ratio (SNR) environment. In such an environment, multiple antenna techniques (space diversity) such as maximum ratio combining are not effective because the energy of the target signal is extremely weak and it is difficult to synchronize the received signals. The cyclic autocorrelation function (CAF) is used in traditional cyclostationarity detection based spectrum sensing. In the presented technique, the CAFs of the received signals are combined, whereas general space diversity techniques combine the received signals themselves. In this paper, the values of the CAF at peak and non-peak cyclic frequencies are computed, and we attempt to improve the sensing performance by using a different weight for each CAF value. A comparison with conventional methods showed that the presented technique can improve the spectrum sensing performance.
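
One common estimator of the cyclic autocorrelation function is R_x^α(τ) = (1/N) Σ_n x[n+τ] x*[n] e^(-j2παn). The sketch computes it per antenna and combines magnitudes across antennas and cyclic frequencies with per-frequency weights, so the raw antenna signals never need to be synchronized. The weight values, the use of magnitudes, the test signals, and the function names are assumptions; the paper's actual weighting rule is not given in the abstract.

```python
import cmath
import math
from typing import Dict, List, Sequence

def caf(x: Sequence[complex], alpha: float, tau: int) -> complex:
    # Cyclic autocorrelation estimate: (1/N) * sum_n x[n + tau] * conj(x[n]) * exp(-j 2 pi alpha n)
    n_samples = len(x)
    acc = 0j
    for n in range(n_samples - tau):
        acc += x[n + tau] * complex(x[n]).conjugate() * cmath.exp(-2j * math.pi * alpha * n)
    return acc / n_samples

def combined_statistic(signals: List[Sequence[complex]], alphas: Sequence[float],
                       weights: Dict[float, float], tau: int = 0) -> float:
    # Weighted sum of |CAF| over antennas and cyclic frequencies (hypothetical weighting)
    total = 0.0
    for alpha in alphas:
        w = weights.get(alpha, 0.0)
        total += w * sum(abs(caf(x, alpha, tau)) for x in signals)
    return total

# Two hypothetical antennas observing the same cosine with different phases.
# A real cosine at f = 1/16 cycles/sample has a cyclic feature at alpha = 2f = 1/8.
antennas = [[math.cos(2 * math.pi * n / 16 + phase) for n in range(64)]
            for phase in (0.0, 1.0)]
WEIGHTS = {0.125: 1.0, 0.3: 0.2}  # hypothetical peak / non-peak weights
stat = combined_statistic(antennas, [0.125, 0.3], WEIGHTS)
```

Because only |CAF| values are summed, the unknown phase offset between antennas does not cancel the cyclic feature, which is the property that makes CAF-level combining attractive when signal-level synchronization is hard.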

  • Multi-Task Learning in Deep Neural Networks for Mandarin-English Code-Mixing Speech Recognition

    Mengzhe CHEN  Jielin PAN  Qingwei ZHAO  Yonghong YAN  

    LETTER-Acoustic modeling

    E99-D No:10

    Multi-task learning in deep neural networks has been shown to be effective for acoustic modeling in speech recognition. In this paper, this technique is applied to Mandarin-English code-mixing recognition. For the primary task of senone classification, three auxiliary-task schemes are proposed to introduce language information into the networks and improve the prediction of language switching. On a real-world Mandarin-English test corpus from mobile voice search, the proposed schemes improved recognition in both languages and reduced the overall error rate by 3.5%, 3.8%, and 5.8% relative, respectively.
