
Keyword Search Result

[Keyword] SiON(4624hit)

361-380hit(4624hit)

  • Speech Chain VC: Linking Linguistic and Acoustic Levels via Latent Distinctive Features for RBM-Based Voice Conversion

    Takuya KISHIDA  Toru NAKASHIKA  

     
    PAPER-Speech and Hearing

      Publicized:
    2020/08/06
      Vol:
    E103-D No:11
      Page(s):
    2340-2350

    This paper proposes a voice conversion (VC) method based on a model that links linguistic and acoustic representations via latent phonological distinctive features. Our method, called speech chain VC, is inspired by the concept of the speech chain, where speech communication consists of a chain of events linking the speaker's brain with the listener's brain. We assume that speaker identity information, which appears at the acoustic level, is embedded in two steps: where phonological information is encoded into articulatory movements (linguistic to physiological) and where articulatory movements generate sound waves (physiological to acoustic). Speech chain VC represents these event links by using an adaptive restricted Boltzmann machine (ARBM) that introduces phoneme labels and acoustic features as two classes of visible units and latent phonological distinctive features associated with articulatory movements as hidden units. Subjective evaluation experiments showed that the intelligibility of the converted speech significantly improved compared with the conventional ARBM-based method. The speaker-identity conversion quality of the proposed method was comparable to that of a Gaussian mixture model (GMM)-based method. Analyses of the hidden-layer representations of the speech chain VC model supported the view that some of the hidden units actually correspond to phonological distinctive features. The final part of this paper proposes approaches to achieve one-shot VC by using the speech chain VC model. Subjective evaluation experiments showed that when the target speaker is the same gender as the source speaker, the proposed methods can achieve one-shot VC based on a single utterance from each of the source and target speakers.

  • Analysis of Pulse Responses by Dispersion Medium with Periodically Conducting Strips

    Ryosuke OZAKI  Tomohiro KAGAWA  Tsuneki YAMASAKI  

     
    BRIEF PAPER

      Publicized:
    2020/05/14
      Vol:
    E103-C No:11
      Page(s):
    613-616

    In this paper, we analyzed the pulse responses of dispersion medium with periodically conducting strips by using a fast inversion Laplace transform (FILT) method combined with point matching method (PMM) for both the TM and TE cases. Specifically, we investigated the influence of the width and number of the conducting strips on the pulse response and distribution of the electric field.

  • System Throughput Gain by New Channel Allocation Scheme for Spectrum Suppressed Transmission in Multi-Channel Environments over a Satellite Transponder

    Sumika OMATA  Motoi SHIRAI  Takatoshi SUGIYAMA  

     
    PAPER

      Publicized:
    2020/03/27
      Vol:
    E103-B No:10
      Page(s):
    1059-1068

    A spectrum suppressed transmission that increases the frequency utilization efficiency, defined as throughput/bandwidth, by suppressing the required bandwidth has been proposed. This is one of the most effective schemes to solve the exhaustion problem of frequency bandwidths. However, in spectrum suppressed transmission, the transmission quality potentially degrades due to the inter-symbol interference (ISI) caused by making the bandwidth narrower than the Nyquist bandwidth. In this paper, in order to mitigate the transmission quality degradation, we propose a spectrum suppressed transmission that applies both FEC (forward error correction) and LE (linear equalization). Moreover, we also propose a new channel allocation scheme for the spectrum suppressed transmission in multi-channel environments over a satellite transponder. From our computer simulation results, we clarify that the proposed schemes are more effective at increasing the system throughput than the scheme without spectrum suppression.

  • Robust Transferable Subspace Learning for Cross-Corpus Facial Expression Recognition

    Dongliang CHEN  Peng SONG  Wenjing ZHANG  Weijian ZHANG  Bingui XU  Xuan ZHOU  

     
    LETTER-Pattern Recognition

      Publicized:
    2020/07/20
      Vol:
    E103-D No:10
      Page(s):
    2241-2245

    In this letter, we propose a novel robust transferable subspace learning (RTSL) method for cross-corpus facial expression recognition. In this method, on the one hand, we present a novel distance metric algorithm, which jointly considers the local and global distance distribution measures, to reduce the cross-corpus mismatch. On the other hand, we design a label guidance strategy to improve the discriminative ability of the subspace. Thus, the RTSL is much more robust to the cross-corpus recognition problem than traditional transfer learning methods. We conduct extensive experiments on several facial expression corpora to evaluate the recognition performance of RTSL. The results demonstrate the superiority of the proposed method over some state-of-the-art methods.

  • Efficient Algorithms for the Partial Sum Dispersion Problem

    Toshihiro AKAGI  Tetsuya ARAKI  Shin-ichi NAKANO  

     
    PAPER-optimization

      Vol:
    E103-A No:10
      Page(s):
    1206-1210

    The dispersion problem is a variant of the facility location problem. Given a set $P$ of $n$ points and an integer $k$, we intend to find a subset $S$ of $P$ with $|S|=k$ such that the cost $\min_{p\in S}\{\mathrm{cost}(p)\}$ is maximized, where $\mathrm{cost}(p)$ is the sum of the distances from $p$ to the nearest $c$ points in $S$. We call the problem the dispersion problem with partial $c$ sum cost, or the PcS-dispersion problem. In this paper we present two algorithms to solve the P2S-dispersion problem ($c=2$) if all points of $P$ are on a line. The running times of the algorithms are $O(kn^{2}\log n)$ and $O(n\log n)$, respectively. We also present an algorithm to solve the PcS-dispersion problem if all points of $P$ are on a line. The running time of the algorithm is $O(kn^{c+1})$.
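
The objective defined in this abstract can be made concrete with a small brute-force sketch. This is not one of the paper's algorithms: it enumerates every k-subset, so it is exponential and only useful for checking tiny instances. The function names are hypothetical, and the sketch assumes that "the nearest c points in S" excludes p itself and that the points are coordinates on a line.

```python
from itertools import combinations


def cost(i, subset, c):
    """Sum of distances from subset[i] to its c nearest other points in subset."""
    dists = sorted(abs(subset[i] - subset[j]) for j in range(len(subset)) if j != i)
    return sum(dists[:c])


def pcs_dispersion_bruteforce(points, k, c):
    """Return (best_value, best_subset) maximizing the minimum cost over all k-subsets.

    Exhaustive search for illustration only; the first best subset found is kept on ties.
    """
    best_value, best_subset = float("-inf"), None
    for subset in combinations(sorted(points), k):
        value = min(cost(i, subset, c) for i in range(k))
        if value > best_value:
            best_value, best_subset = value, subset
    return best_value, best_subset


if __name__ == "__main__":
    print(pcs_dispersion_bruteforce([0, 1, 2, 3, 10], k=3, c=2))
```

With k=3 and c=2 on a line, the middle point of any chosen triple has the smallest cost, equal to the distance between the two outer points, so the optimum simply picks the two extreme points plus any third point.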

  • On Dimensionally Orthogonal Diagonal Hypercubes (Open Access)

    Xiao-Nan LU  Tomoko ADACHI  

     
    PAPER-combinatorics

      Vol:
    E103-A No:10
      Page(s):
    1211-1217

    In this paper, we propose a notion for high-dimensional generalizations of mutually orthogonal Latin squares (MOLS) and mutually orthogonal diagonal Latin squares (MODLS), called mutually dimensionally orthogonal d-cubes (MOC) and mutually dimensionally orthogonal diagonal d-cubes (MODC). Systematic constructions for MOC and MODC by using polynomials over finite fields are investigated. In particular, for 3-dimensional cubes, the results for the maximum possible number of MODC are improved by adopting the proposed construction.
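
For orientation, the classical two-dimensional case that this abstract generalizes can be sketched in a few lines. The code below builds the well-known family of p-1 mutually orthogonal Latin squares over the integers modulo a prime p, using the linear polynomials a*i + j. It is not the paper's d-cube construction, it does not check the diagonal property, and all function names are hypothetical.

```python
def latin_square(a, p):
    """Latin square with entry (a*i + j) mod p, for prime p and 1 <= a < p."""
    return [[(a * i + j) % p for j in range(p)] for i in range(p)]


def is_latin(square):
    """Check that every row and every column contains each symbol exactly once."""
    symbols = set(range(len(square)))
    rows_ok = all(set(row) == symbols for row in square)
    cols_ok = all(set(col) == symbols for col in zip(*square))
    return rows_ok and cols_ok


def are_orthogonal(s, t):
    """Two squares are orthogonal if superimposing them yields all n*n ordered pairs."""
    n = len(s)
    pairs = {(s[i][j], t[i][j]) for i in range(n) for j in range(n)}
    return len(pairs) == n * n


def mols(p):
    """Return the p-1 squares for a = 1, ..., p-1 (p is assumed to be prime)."""
    return [latin_square(a, p) for a in range(1, p)]


if __name__ == "__main__":
    for square in mols(5):
        print(square)
```

Orthogonality holds because, for a != b, the pair (a*i + j, b*i + j) determines (a - b)*i and hence i and j uniquely when p is prime.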

  • An MMT-Based Hierarchical Transmission Module for 4K/120fps Temporally Scalable Video

    Yasuhiro MOCHIDA  Takayuki NAKACHI  Takahiro YAMAGUCHI  

     
    PAPER

      Publicized:
    2020/06/22
      Vol:
    E103-D No:10
      Page(s):
    2059-2066

    High frame rate (HFR) video is attracting strong interest since it is considered as a next step toward providing Ultra-High Definition video service. For instance, the Association of Radio Industries and Businesses (ARIB) standard, the latest broadcasting standard in Japan, defines a 120 fps broadcasting format. The standard stipulates temporally scalable coding and hierarchical transmission by MPEG Media Transport (MMT), in which the base layer and the enhancement layer are transmitted over different paths for flexible distribution. We have developed the first ever MMT transmitter/receiver module for 4K/120fps temporally scalable video. The module is equipped with a newly proposed encapsulation method of temporally scalable bitstreams with correct boundaries. It is also designed to be tolerant to severe network constraints, including packet loss, arrival timing offset, and delay jitter. We conducted a hierarchical transmission experiment for 4K/120fps temporally scalable video. The experiment demonstrated that the MMT module was successfully fabricated and capable of dealing with severe network constraints. Consequently, the module has excellent potential as a means to support HFR video distribution in various network situations.

  • Construction of an Efficient Divided/Distributed Neural Network Model Using Edge Computing

    Ryuta SHINGAI  Yuria HIRAGA  Hisakazu FUKUOKA  Takamasa MITANI  Takashi NAKADA  Yasuhiko NAKASHIMA  

     
    PAPER-Fundamentals of Information Systems

      Publicized:
    2020/07/02
      Vol:
    E103-D No:10
      Page(s):
    2072-2082

    Modern deep learning has significantly improved performance and has been used in a wide variety of applications. Since the amount of computation required for the inference process of a neural network is large, inference is performed not at the data acquisition location, such as a surveillance camera, but on a server with abundant computing power installed in a data center. Edge computing is getting considerable attention as a way to solve this problem. However, edge computing can provide only limited computation resources. Therefore, we assumed a divided/distributed neural network model using both the edge device and the server. By processing part of the convolution layers on the edge, the amount of communication becomes smaller than that of the sensor data. In this paper, we have evaluated AlexNet and eight other models in the distributed environment and estimated FPS values with Wi-Fi, 3G, and 5G communication. To reduce communication costs, we also introduced a compression process before communication. This compression may degrade the object recognition accuracy. As necessary conditions, we set FPS to 30 or faster and object recognition accuracy to 69.7% or higher. This value is determined based on that of an approximation model that binarizes the activations of the neural network. We constructed performance and energy models to find the optimal configuration that consumes minimum energy while satisfying the necessary conditions. Through the comprehensive evaluation, we found the optimal configurations of all nine models. For small models, such as AlexNet, processing the entire model on the edge was the best. On the other hand, for huge models, such as VGG16, processing the entire model on the server was the best. For medium-size models, the distributed models were good candidates. We confirmed that our model found the most energy-efficient configuration while satisfying the FPS and accuracy requirements, and that the distributed models successfully reduced the energy consumption by up to 48.6%, and by 6.6% on average. We also found that HEVC compression is important before transferring the input data or the feature data between the distributed inference processes.

  • Decentralized Probabilistic Frequency-Block Activation Control Method of Base Stations for Inter-cell Interference Coordination and Traffic Load Balancing (Open Access)

    Fumiya ISHIKAWA  Keiki SHIMADA  Yoshihisa KISHIYAMA  Kenichi HIGUCHI  

     
    PAPER-Terrestrial Wireless Communication/Broadcasting Technologies

      Publicized:
    2020/04/02
      Vol:
    E103-B No:10
      Page(s):
    1172-1181

    In this paper, we propose a decentralized probabilistic frequency-block activation control method for the cellular downlink. The aim of the proposed method is to increase the downlink system throughput within the system coverage by adaptively controlling the individual activation of each frequency block at all base stations (BSs) to achieve inter-cell interference coordination (ICIC) and traffic load balancing. The proposed method does not rely on complicated inter-BS cooperation. It uses only the inter-BS information exchange regarding the observed system throughput levels with the neighboring BSs. Based on the shared temporal system throughput information, each BS independently controls online the activation of their respective frequency blocks in a probabilistic manner, which autonomously achieves ICIC and load balancing among BSs. Simulation results show that the proposed method achieves greater system throughput and a faster convergence rate than the conventional online probabilistic activation/deactivation control method. We also show that the proposed method successfully tracks dynamic changes in the user distribution generated due to mobility.

  • Optimal Rejuvenation Policies for Non-Markovian Availability Models with Aperiodic Checkpointing

    Junjun ZHENG  Hiroyuki OKAMURA  Tadashi DOHI  

     
    PAPER-Dependable Computing

      Publicized:
    2020/07/16
      Vol:
    E103-D No:10
      Page(s):
    2133-2142

    In this paper, we present non-Markovian availability models for capturing the dynamics of an operational software system that undergoes aperiodic time-based software rejuvenation and checkpointing. Two availability models with rejuvenation are considered, taking account of the procedure after the completion of the rollback recovery operation. We further investigate whether there exists an optimal rejuvenation schedule that maximizes the steady-state system availability. The availability is derived by means of the phase expansion technique, because the resulting models are not simple stochastic models such as semi-Markov processes or Markov regenerative processes and are therefore hard to solve with common approaches such as the Laplace-Stieltjes transform and embedded Markov chain techniques. Numerical experiments are conducted to determine the optimal rejuvenation trigger timing that maximizes the steady-state system availability for each availability model, and to compare the two models.

  • Design of N-path Notch Filter Circuits for Hum Noise Suppression in Biomedical Signal Acquisition

    Khilda AFIFAH  Nicodimus RETDIAN  

     
    PAPER-Electronic Circuits

      Publicized:
    2020/04/17
      Vol:
    E103-C No:10
      Page(s):
    480-488

    Hum noise such as power line interference is one of the critical problems in biomedical signal acquisition. Various techniques have been proposed to suppress power line interference. However, some of these techniques require more components and higher power consumption. In conventional N-path notch filter circuits, achieving sufficient notch depth requires a larger number of paths and a higher switch off-resistance, which makes the conventional N-path notch filter less efficient at suppressing hum noise. This work proposes a new N-path notch filter for hum noise suppression in biomedical signal acquisition. The new N-path notch filter achieves a notch depth above 40 dB at sampling frequencies of 50 Hz and 60 Hz, even though the proposed circuit uses fewer paths and a lower switch off-resistance. The proposed circuit has been verified using an artificial ECG signal contaminated by hum noise at 50 Hz and 60 Hz. The output of the N-path notch filter is a noise-free signal even if the sampling frequency changes.

  • Single Stage Vehicle Logo Detector Based on Multi-Scale Prediction

    Junxing ZHANG  Shuo YANG  Chunjuan BO  Huimin LU  

     
    PAPER-Pattern Recognition

      Publicized:
    2020/07/14
      Vol:
    E103-D No:10
      Page(s):
    2188-2198

    Vehicle logo detection technology is one of the research directions in the application of intelligent transportation systems. It is an important extension of detection technology based on license plates and motorcycle types. A vehicle logo is characterized by uniqueness, conspicuousness, and diversity. Therefore, thorough research is important in theory and application. Although there are some related works on object detection, most of them cannot achieve real-time detection for different scenes. Meanwhile, some single-stage real-time detection methods have performed poorly in detecting small objects. To address the scarcity of training samples, our work in this paper constructs a vehicle logo dataset (VLD-45-S) and improves detection through multi-stage pre-training, multi-scale prediction, feature fusion between deeper and shallower layers, dimension clustering of the bounding boxes, and multi-scale detection training. While maintaining speed, this article improves the detection precision for vehicle logos. The generalization of the detection model and its anti-interference capability in real scenes are optimized by data enrichment. Experimental results show that the accuracy and speed of the detection algorithm are improved for small objects.

  • Secure OMP Computation Maintaining Sparse Representations and Its Application to EtC Systems

    Takayuki NAKACHI  Hitoshi KIYA  

     
    PAPER-Image Processing and Video Processing

      Publicized:
    2020/06/22
      Vol:
    E103-D No:9
      Page(s):
    1988-1997

    In this paper, we propose a secure computation of sparse coding and its application to Encryption-then-Compression (EtC) systems. The proposed scheme introduces secure sparse coding that allows computation of an Orthogonal Matching Pursuit (OMP) algorithm in an encrypted domain. We prove theoretically that the proposed method estimates exactly the same sparse representations that the OMP algorithm for non-encrypted computation does. This means that there is no degradation of the sparse representation performance. Furthermore, the proposed method can control the sparsity without decoding the encrypted signals. Next, we propose an EtC system based on the secure sparse coding. The proposed secure EtC system can protect the private information of the original image contents while performing image compression. It provides the same rate-distortion performance as that of sparse coding without encryption, as demonstrated on both synthetic data and natural images.

  • Visual Recognition Method Based on Hybrid KPCA Network

    Feng YANG  Zheng MA  Mei XIE  

     
    LETTER-Image Recognition, Computer Vision

      Publicized:
    2020/05/28
      Vol:
    E103-D No:9
      Page(s):
    2015-2018

    In this paper, we propose a deep model for visual recognition based on a hybrid KPCA Network (H-KPCANet), which combines a one-stage KPCANet and a two-stage KPCANet. The proposed model consists of four types of basic components: the input layer, the one-stage KPCANet, the two-stage KPCANet, and the fusion layer. The role of the one-stage KPCANet is to calculate the KPCA filters for the convolution layer, and that of the two-stage KPCANet is to learn PCA filters in the first stage and KPCA filters in the second stage. After binary quantization mapping and block-wise histogram computation, the features from the two different types of KPCANets are fused in the fusion layer. The final feature of the input image is obtained by a weighted serial combination of the two types of features. The performance of our proposed algorithm is tested on digit recognition and object classification, and the experimental results on the visual recognition benchmarks MNIST and CIFAR-10 validated the performance of the proposed H-KPCANet.

  • Joint Adversarial Training of Speech Recognition and Synthesis Models for Many-to-One Voice Conversion Using Phonetic Posteriorgrams

    Yuki SAITO  Kei AKUZAWA  Kentaro TACHIBANA  

     
    PAPER-Speech and Hearing

      Publicized:
    2020/06/12
      Vol:
    E103-D No:9
      Page(s):
    1978-1987

    This paper presents a method for many-to-one voice conversion using phonetic posteriorgrams (PPGs) based on an adversarial training of deep neural networks (DNNs). A conventional method for many-to-one VC can learn a mapping function from input acoustic features to target acoustic features through separately trained DNN-based speech recognition and synthesis models. However, 1) the differences among speakers observed in PPGs and 2) an over-smoothing effect of generated acoustic features degrade the converted speech quality. Our method performs a domain-adversarial training of the recognition model for reducing the PPG differences. In addition, it incorporates a generative adversarial network into the training of the synthesis model for alleviating the over-smoothing effect. Unlike the conventional method, ours jointly trains the recognition and synthesis models so that they are optimized for many-to-one VC. Experimental evaluation demonstrates that the proposed method significantly improves the converted speech quality compared with conventional VC methods.

  • Combining Siamese Network and Regression Network for Visual Tracking

    Yao GE  Rui CHEN  Ying TONG  Xuehong CAO  Ruiyu LIANG  

     
    LETTER-Image Recognition, Computer Vision

      Publicized:
    2020/05/13
      Vol:
    E103-D No:8
      Page(s):
    1924-1927

    We combine the siamese network and the recurrent regression network, proposing a two-stage tracking framework termed SiamReg. Our method solves the problem that the classic siamese network cannot judge the target size precisely, and it simplifies the regression procedures in the training and testing processes. We perform experiments on three challenging tracking datasets: VOT2016, OTB100, and VOT2018. The results indicate that, after offline training, SiamReg can obtain a higher expected average overlap measure.

  • A Study on Attractors of Generalized Asynchronous Random Boolean Networks

    Van Giang TRINH  Kunihiko HIRAISHI  

     
    PAPER-Mathematical Systems Science

      Vol:
    E103-A No:8
      Page(s):
    987-994

    Boolean networks (BNs) are considered popular formal models for the dynamics of gene regulatory networks. There are many different types of BNs, depending on their updating scheme (synchronous, asynchronous, deterministic, or non-deterministic), such as Classical Random Boolean Networks (CRBNs), Asynchronous Random Boolean Networks (ARBNs), Generalized Asynchronous Random Boolean Networks (GARBNs), Deterministic Asynchronous Random Boolean Networks (DARBNs), and Deterministic Generalized Asynchronous Random Boolean Networks (DGARBNs). An important long-term behavior of BNs, the so-called attractor, can provide valuable insights into systems biology (e.g., the origins of cancer). In the previous paper [1], we studied properties of attractors of GARBNs and their relations with attractors of CRBNs, and also proposed different algorithms for attractor detection. In this paper, we propose a new algorithm based on SAT-based bounded model checking to overcome inherent problems in these algorithms. Experimental results prove the effectiveness of the new algorithm. We also show that studying attractors of GARBNs can pave potential ways to study attractors of ARBNs.
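
To illustrate what an attractor is, the sketch below enumerates the attractors of a very small Boolean network under the synchronous (classical) updating scheme by exhaustively following every state until it repeats. It is not the SAT-based bounded model checking algorithm of the paper and does not handle the generalized asynchronous scheme; the function name and the example network are hypothetical.

```python
from itertools import product


def synchronous_attractors(update_fns):
    """Return the set of attractors of a synchronous Boolean network.

    update_fns[i] maps the current state (a tuple of 0/1) to the next value of node i.
    Each attractor is a tuple of states, rotated to start at its smallest state.
    Exhaustive over all 2**n states, so suitable only for small n.
    """
    n = len(update_fns)

    def step(state):
        return tuple(int(f(state)) for f in update_fns)

    attractors = set()
    for start in product((0, 1), repeat=n):
        seen = {}  # state -> position along the trajectory (insertion-ordered)
        state = start
        while state not in seen:
            seen[state] = len(seen)
            state = step(state)
        first = seen[state]  # position where the cycle begins
        cycle = [s for s, idx in seen.items() if idx >= first]
        k = cycle.index(min(cycle))
        attractors.add(tuple(cycle[k:] + cycle[:k]))
    return attractors


if __name__ == "__main__":
    # Hypothetical two-node network in which each node copies the other.
    swap = [lambda s: s[1], lambda s: s[0]]
    print(synchronous_attractors(swap))
```

In the example network, the states (0, 0) and (1, 1) are fixed points, while (0, 1) and (1, 0) alternate and form a cycle of length two.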

  • Machine Learning-Based Approach for Depression Detection in Twitter Using Content and Activity Features

    Hatoon S. ALSAGRI  Mourad YKHLEF  

     
    PAPER-Data Engineering, Web Information Systems

      Publicized:
    2020/04/24
      Vol:
    E103-D No:8
      Page(s):
    1825-1832

    Social media channels, such as Facebook, Twitter, and Instagram, have altered our world forever. People are now more connected than ever and reveal a sort of digital persona. Although social media certainly has several remarkable features, the demerits are undeniable as well. Recent studies have indicated a correlation between high usage of social media sites and increased depression. The present study aims to exploit machine learning techniques for detecting a probable depressed Twitter user based on both his/her network behavior and tweets. For this purpose, we trained and tested classifiers to distinguish whether a user is depressed or not using features extracted from his/her activities in the network and tweets. The results showed that the more features are used, the higher the accuracy and F-measure scores in detecting depressed users. This method is a data-driven, predictive approach for early detection of depression or other mental illnesses. This study's main contribution is the exploration of the features and their impact on detecting the depression level.

  • Content-Based Superpixel Segmentation and Matching Using Its Region Feature Descriptors

    Jianmei ZHANG  Pengyu WANG  Feiyang GONG  Hongqing ZHU  Ning CHEN  

     
    PAPER-Image Processing and Video Processing

      Publicized:
    2020/04/27
      Vol:
    E103-D No:8
      Page(s):
    1888-1900

    Finding the correspondence between two images of the same object or scene is an active research field in computer vision. This paper develops a rapid and effective Content-based Superpixel Image matching and Stitching (CSIS) scheme, which utilizes the content of superpixels through a multi-feature fusion technique. Unlike popular keypoint-based matching methods, our approach proposes a superpixel internal feature-based scheme to implement image matching. In the beginning, we make use of a novel superpixel generation algorithm based on content-based feature representation, named the Content-based Superpixel Segmentation (CSS) algorithm. Superpixels are generated in terms of a new distance metric using color, spatial, and gradient feature information. The metric is designed to balance the compactness and the boundary adherence of the resulting superpixels. Then, we calculate the entropy of each superpixel to separate out superpixels with significant characteristics. Next, for each selected superpixel, its multi-feature descriptor is generated by extracting and fusing local features of the selected superpixel itself. Finally, we compare the matching features of candidate superpixels and their neighborhoods to estimate the correspondence between two images. We evaluated superpixel matching and image stitching on complex and deformable surfaces using our superpixel region descriptors, and the results show that the new method is effective in matching accuracy and execution speed.

  • Stochastic Discrete First-Order Algorithm for Feature Subset Selection

    Kota KUDO  Yuichi TAKANO  Ryo NOMURA  

     
    PAPER-Artificial Intelligence, Data Mining

      Publicized:
    2020/04/13
      Vol:
    E103-D No:7
      Page(s):
    1693-1702

    This paper addresses the problem of selecting a significant subset of candidate features to use for multiple linear regression. Bertsimas et al. [5] recently proposed the discrete first-order (DFO) algorithm to efficiently find near-optimal solutions to this problem. However, this algorithm is unable to escape from locally optimal solutions. To resolve this, we propose a stochastic discrete first-order (SDFO) algorithm for feature subset selection. In this algorithm, random perturbations are added to a sequence of candidate solutions as a means to escape from locally optimal solutions, which broadens the range of discoverable solutions. Moreover, we derive the optimal step size in the gradient-descent direction to accelerate convergence of the algorithm. We also make effective use of the L2-regularization term to improve the predictive performance of a resultant subset regression model. The simulation results demonstrate that our algorithm substantially outperforms the original DFO algorithm. Our algorithm was superior in predictive performance to lasso and forward stepwise selection as well.
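
As background for this abstract, the baseline DFO iteration of Bertsimas et al. can be sketched as a gradient step on the least-squares loss followed by hard thresholding to the k largest-magnitude coefficients. The sketch below covers only that deterministic baseline: the random perturbations, the optimal step size, and the L2-regularization term described for SDFO are not implemented. The function names are hypothetical, and the constant L is taken as the trace of X^T X, a simple (loose) upper bound on its largest eigenvalue chosen here for illustration.

```python
def hard_threshold(beta, k):
    """Keep the k largest-magnitude entries of beta and set the rest to zero."""
    order = sorted(range(len(beta)), key=lambda j: abs(beta[j]), reverse=True)
    keep = set(order[:k])
    return [b if j in keep else 0.0 for j, b in enumerate(beta)]


def dfo(X, y, k, iters=200):
    """Baseline discrete first-order iteration for k-sparse least squares.

    X is a list of rows, y a list of targets. Pure-Python sketch for small inputs.
    """
    n, p = len(X), len(X[0])
    # Assumption: step size 1/L with L = trace(X^T X) >= largest eigenvalue of X^T X.
    L = sum(x * x for row in X for x in row)
    beta = [0.0] * p
    for _ in range(iters):
        resid = [sum(X[i][j] * beta[j] for j in range(p)) - y[i] for i in range(n)]
        grad = [sum(X[i][j] * resid[i] for i in range(n)) for j in range(p)]
        beta = hard_threshold([beta[j] - grad[j] / L for j in range(p)], k)
    return beta


if __name__ == "__main__":
    X = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    y = [2.0, 0.5, 0.0]
    print(dfo(X, y, k=1))
```

Because each iterate is projected onto the set of k-sparse vectors, the iteration can settle at a locally optimal support, which is the limitation the abstract says the stochastic variant is meant to address.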
