Keyword Search Result

[Keyword] SPE (2504 hits)

Hits 181-200 of 2504

  • Hand-Dorsa Vein Recognition Based on Task-Specific Cross-Convolutional-Layer Pooling Open Access

    Jun WANG  Yulian LI  Zaiyu PAN  

     
    LETTER-Pattern Recognition

      Publicized:
    2019/09/09
      Vol:
    E102-D No:12
      Page(s):
    2628-2631

    Hand-dorsa vein recognition is addressed using the convolutional activations of a pre-trained deep convolutional neural network (DCNN). Specifically, a novel task-specific cross-convolutional-layer pooling is proposed to obtain a more representative and discriminative feature representation. Rigorous experiments on a self-established database achieve state-of-the-art recognition results, which demonstrates the effectiveness of the proposed model.

  • A Spectral Clustering Based Filter-Level Pruning Method for Convolutional Neural Networks

    Lianqiang LI  Jie ZHU  Ming-Ting SUN  

     
    LETTER-Artificial Intelligence, Data Mining

      Publicized:
    2019/09/17
      Vol:
    E102-D No:12
      Page(s):
    2624-2627

    Convolutional Neural Networks (CNNs) usually have millions or even billions of parameters, which makes them hard to deploy on mobile devices. In this work, we present a novel filter-level pruning method to alleviate this issue. More concretely, we first construct an undirected fully connected graph to represent a pre-trained CNN model. Then, we employ the spectral clustering algorithm to divide the graph into subgraphs, which is equivalent to clustering the similar filters of the CNN into the same groups. After obtaining the grouping relationships among the filters, we finally keep one filter per group and retrain the pruned model. Compared with previous pruning methods that identify the redundant filters heuristically, the proposed method can select the pruning candidates more reasonably and precisely. Experimental results also show that our proposed pruning method achieves significant improvements over the state of the art.
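
    The grouping step described in this abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the Gaussian similarity used for edge weights, the `sigma` parameter, the simple k-means helper, and the rule of keeping the filter with the largest L1 norm in each group are all assumptions made for illustration, and the retraining step is omitted.

```python
import numpy as np


def _kmeans(points, k, iters=50):
    """Plain Lloyd k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    for _ in range(1, k):
        dist = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(dist)])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        d = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(d, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels


def spectral_filter_groups(filters, k, sigma=1.0):
    """Group filters of shape (n, ...) into k clusters via spectral clustering."""
    n = filters.shape[0]
    flat = filters.reshape(n, -1)
    # Fully connected undirected graph; Gaussian edge weights are an assumption.
    d2 = ((flat[:, None, :] - flat[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(W.sum(axis=1), 1e-12))
    L = np.eye(n) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # Embed each filter with the eigenvectors of the k smallest eigenvalues.
    _, vecs = np.linalg.eigh(L)
    emb = vecs[:, :k]
    emb = emb / np.maximum(np.linalg.norm(emb, axis=1, keepdims=True), 1e-12)
    return _kmeans(emb, k)


def keep_one_per_group(filters, labels):
    """Keep one filter index per group (largest L1 norm; an assumed criterion)."""
    l1 = np.abs(filters.reshape(filters.shape[0], -1)).sum(axis=1)
    keep = []
    for g in np.unique(labels):
        members = np.flatnonzero(labels == g)
        keep.append(int(members[np.argmax(l1[members])]))
    return sorted(keep)


# Toy layer: six 3x3 filters forming three groups of near-duplicates.
rng = np.random.default_rng(0)
patterns = np.array([0.0, 1.0, -1.0])
filters = np.repeat(patterns, 2)[:, None, None, None] + 0.01 * rng.standard_normal((6, 1, 3, 3))
labels = spectral_filter_groups(filters, k=3)
kept = keep_one_per_group(filters, labels)
print(labels, kept)
```

    In this toy layer, the near-duplicate filters fall into the same group and one index per group survives; in a real network the kept indices would be used to slice the layer's weights before retraining.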

  • Target-Adapted Subspace Learning for Cross-Corpus Speech Emotion Recognition

    Xiuzhen CHEN  Xiaoyan ZHOU  Cheng LU  Yuan ZONG  Wenming ZHENG  Chuangao TANG  

     
    LETTER-Speech and Hearing

      Publicized:
    2019/08/26
      Vol:
    E102-D No:12
      Page(s):
    2632-2636

    For cross-corpus speech emotion recognition (SER), a crucial issue is how to obtain an effective feature representation that eliminates the discrepancy between the feature distributions of the source and target domains. In this paper, we propose a Target-adapted Subspace Learning (TaSL) method for cross-corpus SER. The TaSL method tries to find a projection subspace in which the features regress the labels more accurately and the gap between the feature distributions of the target and source domains is bridged effectively. Then, in order to obtain a better projection matrix, ℓ1 norm and ℓ2,1 norm penalty terms are added to different regularization terms, respectively. Finally, we conduct extensive experiments on three public corpora: EmoDB, eNTERFACE and AFEW 4.0. The experimental results show that our proposed method achieves better performance than the state-of-the-art methods in the cross-corpus SER tasks.

  • Latent Words Recurrent Neural Network Language Models for Automatic Speech Recognition

    Ryo MASUMURA  Taichi ASAMI  Takanobu OBA  Sumitaka SAKAUCHI  Akinori ITO  

     
    PAPER-Speech and Hearing

      Publicized:
    2019/09/25
      Vol:
    E102-D No:12
      Page(s):
    2557-2567

    This paper demonstrates latent word recurrent neural network language models (LW-RNN-LMs) for enhancing automatic speech recognition (ASR). LW-RNN-LMs are constructed so as to combine the advantages of both recurrent neural network language models (RNN-LMs) and latent word language models (LW-LMs). The RNN-LMs can capture long-range context information and offer strong performance, and the LW-LMs are robust for out-of-domain tasks thanks to their latent word space modeling. However, the RNN-LMs cannot explicitly capture hidden relationships behind observed words since they have no concept of a latent variable space. In addition, the LW-LMs cannot take into account long-range relationships between latent words. Our idea is to combine the RNN-LM and the LW-LM so as to compensate for their individual disadvantages. The LW-RNN-LMs can support both latent variable space modeling, as LW-LMs do, and long-range relationship modeling, as RNN-LMs do, at the same time. From the viewpoint of RNN-LMs, the LW-RNN-LM can be considered a soft class RNN-LM with a vast latent variable space. In contrast, from the viewpoint of LW-LMs, the LW-RNN-LM can be considered an LW-LM that uses the RNN structure for latent variable modeling instead of an n-gram structure. This paper also details a parameter inference method and two kinds of implementation methods, an n-gram approximation and a Viterbi approximation, for introducing the LW-LM to ASR. Our experiments show the effectiveness of LW-RNN-LMs in a perplexity evaluation on the Penn Treebank corpus and an ASR evaluation on Japanese spontaneous speech tasks.

  • A Stackelberg Game-Theoretic Solution to Win-Win Situation: A Presale Mechanism in Spectrum Market

    Wei BAI  Yuli ZHANG  Meng WANG  Jin CHEN  Han JIANG  Zhan GAO  Donglin JIAO  

     
    LETTER-Information Network

      Publicized:
    2019/08/28
      Vol:
    E102-D No:12
      Page(s):
    2607-2610

    This paper investigates the spectrum allocation problem. Under the current spectrum management mode, a large amount of spectrum resource is wasted due to the uncertainty of users' demand. To reduce the impact of this uncertainty, a presale mechanism is designed based on a spectrum pool. In this mechanism, the spectrum manager offers spectrum resource at a favorable price for presale, aiming to share with the user the risk caused by the uncertainty of demand. Because of the hierarchical characteristic, we build a spectrum market Stackelberg game, in which the manager acts as the leader and the user as the follower. A proof of the uniqueness and optimality of the Stackelberg Equilibrium is then given. Simulation results show that the presale mechanism can increase profits for both sides and reduce temporary scheduling.

  • Performance Improvement of the Catastrophic CPM Scheme with New Split-Merged MNSED

    Richard Hsin-Hsyong YANG  Chia-Kun LEE  Shiunn-Jang CHERN  

     
    PAPER-Transmission Systems and Transmission Equipment for Communications

      Publicized:
    2019/05/16
      Vol:
    E102-B No:11
      Page(s):
    2091-2103

    Continuous phase modulation (CPM) is a very attractive digital modulation scheme, with a constant envelope and high efficiency in meeting power and bandwidth requirements. CPM signals with pairs of input sequences that differ in an infinite number of positions and map into pairs of transmitted signals with finite Euclidean distance (ED) are called catastrophic. In the CPM scheme, data sequences that have the catastrophic property are called catastrophic sequences; they are periodic difference data patterns. The catastrophic sequences usually have a shorter merger length. The corresponding minimum normalized squared ED (MNSED) is smaller and below the distance bound. Two important CPM schemes, viz., LREC and LRC schemes, are known to be catastrophic in most cases; they have poor overall power and bandwidth performance. In the literature, it has been shown that the probability of generating such catastrophic sequences is negligible; therefore, the asymptotic error performance (AEP) of those well-known catastrophic CPM schemes evaluated with the corresponding MNSED, over AWGN channels, might be too pessimistic. To deal with this problem in the AWGN channel, this paper presents a new split-merged MNSED and provides criteria to determine which conventional catastrophic CPM schemes could effectively increase the length of mergers with split-merged non-periodic events. For comparison, we investigate the exact power and bandwidth performance of LREC and LRC CPM for the same bandwidth occupancy. Computer simulation results verify that the AEP evaluated with the split-merged MNSED could achieve up to 3dB gain over the conventional approach.

  • Underwater Signal Analysis in the Modulation Spectrogram with Time-Frequency Reassignment Technique

    Hyunjin CHO  Wan Jin KIM  Wooyoung HONG  

     
    LETTER-Engineering Acoustics

      Vol:
    E102-A No:11
      Page(s):
    1542-1544

    The modulation spectrogram is effective for analyzing underwater signals that consist of tonal and modulated components. This method can analyze the acoustic and modulation frequencies at the same time, but suffers from a trade-off in time-frequency localization. This letter introduces a reassignment method for overcoming the localization issue in conventional spectrograms, and then presents an alignment scheme for implementing the modulation spectrogram. Relevant experiments show an improvement in acoustic frequency estimation and an increase in the analyzable modulation frequency range.

  • QSL: A Specification Language for E-Questionnaire, E-Testing, and E-Voting Systems

    Yuan ZHOU  Yuichi GOTO  Jingde CHENG  

     
    PAPER-Data Engineering, Web Information Systems

      Publicized:
    2019/08/19
      Vol:
    E102-D No:11
      Page(s):
    2159-2175

    Many kinds of questionnaires, tests, and votes are conducted fully electronically as question-and-answer Web applications on the Internet, i.e., e-questionnaire systems, e-testing systems, and e-voting systems. Because there is no unified communication tool among the stakeholders of e-questionnaire, e-testing, and e-voting systems, all such systems have so far been designed, developed, used, and maintained in various ad hoc ways. As a result, it is difficult for the stakeholders to communicate when implementing the systems, because there is neither an exhaustive requirement list that gives a grasp of e-questionnaire, e-testing, and e-voting systems as a whole nor a standardized terminology for these systems that avoids ambiguity. A general-purpose specification language that provides a unified way of specifying various e-questionnaire, e-testing, and e-voting systems can solve these problems: the stakeholders can refer to and use the complete requirements and standardized terminology for better communication, can easily and unambiguously specify all the requirements of e-questionnaire, e-testing, and e-voting systems and services, and can even implement the systems. In this paper, we propose the first such specification language, named “QSL,” with a standardized, consistent, and exhaustive list of requirements for specifying various e-questionnaire, e-testing, and e-voting systems, such that the specifications can be used as the precondition for automatically generating these systems. The paper presents our design, showing that QSL can specify all the requirements of various e-questionnaire, e-testing, and e-voting systems in a structured way, evaluates its effectiveness, applies QSL to real e-questionnaire, e-testing, and e-voting systems, and shows various QSL applications that provide convenient QSL services to stakeholders.

  • Personalized Food Image Classifier Considering Time-Dependent and Item-Dependent Food Distribution Open Access

    Qing YU  Masashi ANZAWA  Sosuke AMANO  Kiyoharu AIZAWA  

     
    PAPER

      Publicized:
    2019/06/21
      Vol:
    E102-D No:11
      Page(s):
    2120-2126

    Since the development of food diaries could enable people to develop healthy eating habits, food image recognition is in high demand to reduce the effort in food recording. Previous studies have worked on this challenging domain with datasets having fixed numbers of samples and classes. However, in the real-world setting, it is impossible to include all of the foods in the database because the number of classes of foods is large and increases continually. In addition to that, inter-class similarity and intra-class diversity also bring difficulties to the recognition. In this paper, we solve these problems by using deep convolutional neural network features to build a personalized classifier which incrementally learns the user's data and adapts to the user's eating habit. As a result, we achieved the state-of-the-art accuracy of food image recognition by the personalization of 300 food records per user.

  • Cross-Domain Deep Feature Combination for Bird Species Classification with Audio-Visual Data

    Naranchimeg BOLD  Chao ZHANG  Takuya AKASHI  

     
    PAPER-Multimedia Pattern Processing

      Publicized:
    2019/06/27
      Vol:
    E102-D No:10
      Page(s):
    2033-2042

    In the recent decade, many state-of-the-art algorithms for image classification as well as audio classification have achieved notable success with the development of deep convolutional neural networks (CNNs). However, most of these works exploit only a single type of training data. In this paper, we present a study on classifying bird species by exploiting the combination of both visual (image) and audio (sound) data using CNNs, which has been sparsely treated so far. Specifically, we propose CNN-based multimodal learning models with three types of fusion strategies (early, middle, late) to address the issues of combining training data across domains. The advantage of our proposed method lies in the fact that we can utilize a CNN not only to extract features from image and audio data (spectrograms) but also to combine the features across modalities. In the experiment, we train and evaluate the network structure on the comprehensive CUB-200-2011 standard data set, combining it with our originally collected audio data set for the corresponding species. We observe that a model which utilizes the combination of both types of data outperforms models trained with only one type of data. We also show that transfer learning can significantly increase the classification performance.

  • A 2.5Gbps Transceiver and Channel Architecture for High-Speed Automotive Communication System

    Kyongsu LEE  Jae-Yoon SIM  

     
    BRIEF PAPER-Integrated Electronics

      Vol:
    E102-C No:10
      Page(s):
    766-769

    In this paper, a new transceiver system for in-vehicle communication is proposed to enhance the data transmission rate and timing accuracy in TDM-based applications. The proposed system utilizes a point-to-point (P2P) channel, a closed-loop clock forwarding path, and a transceiver with a repeater and clock delay adjuster. The proposed system with 4 ECU (Electronic Computing Unit) nodes is implemented in 180nm CMOS technology and, when compared with a conventional bus-based system, achieves more than 125 times faster data transmission. The maximum data rate was 2.5Gbps at a 1.8V power supply, and the worst peak-to-peak jitter values for the data and clock signals over 5000 data symbols were about 49.6ps and 9.8ps, respectively.

  • Effectiveness of Speech Mode Adaptation for Improving Dialogue Speech Synthesis

    Kazuki KAYA  Hiroki MORI  

     
    LETTER-Speech and Hearing

      Publicized:
    2019/06/13
      Vol:
    E102-D No:10
      Page(s):
    2064-2066

    The effectiveness of model adaptation in dialogue speech synthesis is explored. The proposed adaptation method is based on a conversion from a base model learned with a large dataset into a target, dialogue-style speech model. The proposed method is shown to improve the intelligibility of synthesized dialogue speech, while maintaining the speaking style of dialogue.

  • Scalable Community Identification with Manifold Learning on Speaker I-Vector Space

    Hongcui WANG  Shanshan LIU  Di JIN  Lantian LI  Jianwu DANG  

     
    PAPER-Artificial Intelligence, Data Mining

      Publicized:
    2019/07/10
      Vol:
    E102-D No:10
      Page(s):
    2004-2012

    Recognizing the different segments of speech belonging to the same speaker is an important speech analysis task in various applications. Recent works have shown that there is an underlying manifold on which speaker utterances live in the model-parameter space. However, most speaker clustering methods work in the Euclidean space, and hence often fail to discover the intrinsic geometrical structure of the data space and fail to use such features. To address this problem, we convert the speaker i-vector representation of utterances in the Euclidean space into a network structure constructed from the local (k) nearest neighbor relationships of these signals. We then propose an efficient community detection model on the speaker content network for clustering signals. The new model is based on probabilistic community memberships, and is further refined with the idea that if two connected nodes have a high similarity, their community membership distributions in the model should be made close. This refinement enhances the local invariance assumption, and thus better respects the structure of the underlying manifold than the existing community detection methods. Experiments are conducted on graphs built from two Chinese speech databases and a NIST 2008 Speaker Recognition Evaluation (SRE) data set. The results provide insight into the structure of the speakers present in the data and also confirm the effectiveness of the proposed method. Our new method yields better performance than the other state-of-the-art clustering algorithms. Metrics for constructing the speaker content graph are also discussed.
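
    The graph-construction step mentioned in this abstract (linking each utterance to its k nearest neighbors) can be sketched as follows. This is only an illustration under stated assumptions: cosine similarity is assumed as the metric (the abstract notes that several metrics are discussed), the function name `knn_graph` is hypothetical, the toy vectors stand in for real i-vectors, and the community detection model itself is not shown.

```python
import numpy as np


def knn_graph(vectors, k):
    """Return a symmetric boolean adjacency matrix of a k-nearest-neighbor graph."""
    n = vectors.shape[0]
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sim = unit @ unit.T                      # cosine similarity (assumed metric)
    np.fill_diagonal(sim, -np.inf)           # exclude self-matches
    nbrs = np.argsort(-sim, axis=1)[:, :k]   # k most similar utterances per row
    adj = np.zeros((n, n), dtype=bool)
    adj[np.repeat(np.arange(n), k), nbrs.ravel()] = True
    return adj | adj.T                       # make the graph undirected


# Toy "i-vectors": two utterances from each of two well-separated speakers.
vectors = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
adj = knn_graph(vectors, k=1)
print(adj.astype(int))
```

    A community detection method would then operate on `adj` (or a weighted variant of it) rather than on the raw vectors.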

  • Fast Hyperspectral Unmixing via Reweighted Sparse Regression Open Access

    Hongwei HAN  Ke GUO  Maozhi WANG  Tingbin ZHANG  Shuang ZHANG  

     
    PAPER-Image Processing and Video Processing

      Publicized:
    2019/05/28
      Vol:
    E102-D No:9
      Page(s):
    1819-1832

    The sparse unmixing of hyperspectral data has attracted much attention in recent years because it does not need to estimate the number of endmembers nor consider the lack of pure pixels in a given hyperspectral scene. However, the high mutual coherence of spectral libraries strongly affects the practicality of sparse unmixing. The collaborative sparse unmixing via variable splitting and augmented Lagrangian (CLSUnSAL) algorithm is a classic sparse unmixing algorithm that performs better than other sparse unmixing methods. In this paper, we propose a CLSUnSAL-based hyperspectral unmixing method based on dictionary pruning and reweighted sparse regression. First, the algorithm identifies a subset of the original library elements using a dictionary pruning strategy. Second, we present a weighted sparse regression algorithm based on CLSUnSAL to further enhance the sparsity of endmember spectra in a given library. Third, we apply the weighted sparse regression algorithm on the pruned spectral library. The effectiveness of the proposed algorithm is demonstrated on both simulated and real hyperspectral datasets. For simulated data cubes (DC1, DC2 and DC3), the number of the pruned spectral library elements is reduced by at least 94% and the runtime of the proposed algorithm is less than 10% of that of CLSUnSAL. For simulated DC4 and DC5, the runtime of the proposed algorithm is less than 15% of that of CLSUnSAL. For the real hyperspectral datasets, the pruned spectral library successfully reduces the original dictionary size by 76% and the runtime of the proposed algorithm is 11.21% of that of CLSUnSAL. These experimental results show that our proposed algorithm not only substantially improves the accuracy of unmixing solutions but is also much faster than some other state-of-the-art sparse unmixing algorithms.

  • Pre-Training of DNN-Based Speech Synthesis Based on Bidirectional Conversion between Text and Speech

    Kentaro SONE  Toru NAKASHIKA  

     
    PAPER-Speech and Hearing

      Publicized:
    2019/05/15
      Vol:
    E102-D No:8
      Page(s):
    1546-1553

    Conventional approaches to statistical parametric speech synthesis use context-dependent hidden Markov models (HMMs) clustered using decision trees to generate speech parameters from linguistic features. However, decision trees are not always appropriate for efficiently modeling complex context dependencies of linguistic features. An alternative scheme that replaces decision trees with deep neural networks (DNNs) was presented as a possible way to overcome this difficulty. By training the network to represent high-dimensional feedforward dependencies from linguistic features to acoustic features, DNN-based speech synthesis systems convert text into speech. To improve the naturalness of the synthesized speech, this paper presents a novel pre-training method for DNN-based statistical parametric speech synthesis systems. In our method, a deep relational model (DRM), which represents a joint probability of two visible variables, is applied to describe the joint distribution of acoustic and linguistic features. As with DNNs, a DRM consists of several hidden layers and two visible layers. Although DNNs represent feedforward dependencies from one set of visible variables (inputs) to another set of visible variables (outputs), a DRM is able to represent the bidirectional dependencies between two visible variables. During maximum-likelihood (ML) based training, the model optimizes the parameters of its deep architecture (connection weights between adjacent layers, and biases) by considering the bidirectional conversion between 1) acoustic features given linguistic features, and 2) linguistic features given the acoustic features it has generated. Because it considers whether the generated acoustic features are recognizable, our method can obtain reasonable parameters for speech synthesis. Experimental results in a speech synthesis task show that pre-trained DNN-based systems using our proposed method outperformed randomly initialized DNN-based systems, especially when the amount of training data is limited. Additionally, speaker-dependent speech recognition experiments show that our method outperformed DNN-based systems when its initial parameters were set to the same values as in the synthesis experiments.

  • Spectrum Sensing Using Phase Inversion Based on Space Diversity with Over Three Antennas

    Shusuke NARIEDA  Hiroshi NARUSE  

     
    LETTER-Communication Theory and Signals

      Vol:
    E102-A No:8
      Page(s):
    974-977

    This letter presents a computational complexity reduction technique for space diversity based spectrum sensing when the number of receive antennas is greater than three (NR≥3 where NR is the number of receive antennas). The received signals are combined with phase inversion so as not to attenuate the combined signal, and a statistic for signal detection is computed from the combined signal. Because only one statistic needs to be computed regardless of the number of receive antennas, the complexity can be reduced. Numerical examples and a simple analysis verify the effectiveness of the presented technique.

  • Change Impact Analysis for Refinement-Based Formal Specification

    Shinnosuke SARUWATARI  Fuyuki ISHIKAWA  Tsutomu KOBAYASHI  Shinichi HONIDEN  

     
    PAPER

      Publicized:
    2019/05/22
      Vol:
    E102-D No:8
      Page(s):
    1462-1477

    Refinement-based formal specification is a promising approach to the increasing complexity of software systems, as demonstrated in the formal method Event-B. It allows stepwise modeling and verifying of complex systems with multiple steps at different abstraction levels. However, making changes is more difficult, as caution is necessary to avoid breaking the consistency between the steps. Judging whether a change is valid or not is a non-trivial task, as the logical dependency relationships between the modeling elements (predicates) are implicit and complex. In this paper, we propose a method for analyzing the impact of the changes of Event-B. By attaching labels to modeling elements (predicates), the method helps engineers understand how a model is structured and what needs to be modified to accomplish a change.

  • Speech Quality Enhancement for In-Ear Microphone Based on Neural Network

    Hochong PARK  Yong-Shik SHIN  Seong-Hyeon SHIN  

     
    LETTER-Speech and Hearing

      Publicized:
    2019/05/15
      Vol:
    E102-D No:8
      Page(s):
    1594-1597

    Speech captured by an in-ear microphone placed inside an occluded ear has a high signal-to-noise ratio; however, it has different sound characteristics compared to normal speech captured through air conduction. In this study, a method for blind speech quality enhancement is proposed that can convert speech captured by an in-ear microphone to one that resembles normal speech. The proposed method estimates an input-dependent enhancement function by using a neural network in the feature domain and enhances the captured speech via time-domain filtering. Subjective and objective evaluations confirm that the speech enhanced using our proposed method sounds more similar to normal speech than that enhanced using conventional equalizer-based methods.

  • Learning-Based, Distributed Spectrum Observation System for Dynamic Spectrum Sharing in the 5G Era and Beyond

    Masaki KITSUNEZUKA  Kenta TSUKAMOTO  Jun SAKAI  Taichi OHTSUJI  Kazuaki KUNIHIRO  

     
    PAPER

      Publicized:
    2019/02/20
      Vol:
    E102-B No:8
      Page(s):
    1526-1537

    Dynamic sharing of limited radio spectrum resources is expected to satisfy the increasing demand for spectrum resources in the upcoming 5th generation mobile communication system (5G) era and beyond. Distributed real-time spectrum sensing is a key enabler of dynamic spectrum sharing, but the costs incurred in observed-data transmission are a critical problem, especially when massive numbers of spectrum sensors are deployed. To cope with this issue, the proposed spectrum sensors learn the ambient radio environment in real-time and create a time-spectral model whose parameters are shared with servers operating in the edge-computing layer. This process makes it possible to significantly reduce the communication cost of the sensors because frequent data transmission is no longer needed while enabling the edge servers to keep up on the current status of the radio environment. On the basis of the created time-spectral model, sharable spectrum resources are dynamically harvested and allocated in terms of geospatial, temporal, and frequency-spectral domains when accepting an application for secondary-spectrum use. A web-based prototype spectrum management system has been implemented using ten servers and dozens of sensors. Measured results show that the proposed approach can reduce data traffic between the sensors and servers by 97%, achieving an average data rate of 10 kilobits per second (kbps). In addition, the basic operation flow of the prototype has been verified through a field experiment conducted at a manufacturing facility and a proof-of-concept experiment of dynamic-spectrum sharing using wireless local-area-network equipment.

  • High Speed Mobility Experiments on Distributed MIMO Beamforming for 5G Radio Access in 28-GHz Band

    Daisuke KITAYAMA  Kiichi TATEISHI  Daisuke KURITA  Atsushi HARADA  Minoru INOMATA  Tetsuro IMAI  Yoshihisa KISHIYAMA  Hideshi MURAI  Shoji ITOH  Arne SIMONSSON  Peter ÖKVIST  

     
    PAPER

      Publicized:
    2019/02/20
      Vol:
    E102-B No:8
      Page(s):
    1418-1426

    This paper describes the results of outdoor mobility measurements and high-speed vehicle tests that clarify the 4-by-8 multiple-input multiple-output (MIMO) throughput performance when applying distributed MIMO with narrow antenna-beam tracking in a 28-GHz frequency band in the downlink of a 5G cellular radio access system. To clarify suitable transmission point (TP) deployment for mobile stations (MSs) moving at high speed, we examine two arrangements of three TPs. The first sets all TPs in a line along the same side of the path traversed by the MS, and the other sets one TP on the other side of the path. The experiments in which the MS is installed on a moving wagon reveal that the latter deployment enables a high peak data rate and high average throughput performance, exhibiting a peak throughput of 15Gbps at a vehicle speed of 3km/h. Setting the MS in a vehicle travelling at 30km/h yielded a peak throughput of 13Gbps. A peak throughput of 11Gbps is achieved at a vehicle speed of 100km/h, and beam tracking and intra-baseband unit handover operation are successfully demonstrated even at this high vehicle speed.
