
Keyword Search Result

[Keyword] EE(4079hit)

421-440hit(4079hit)

  • Voice Conversion for Improving Perceived Likability of Uttered Speech

    Shinya HORIIKE  Masanori MORISE  

     
    LETTER-Speech and Hearing

    Publicized: 2020/01/23  Vol: E103-D No:5  Page(s): 1199-1202

    To improve the likability of speech, we propose a voice conversion algorithm that controls the fundamental frequency (F0) and the spectral envelope, and we carry out a subjective evaluation in which the subjects can manipulate these two speech parameters. The results show that the subjects preferred speech with parameter settings related to higher brightness.

  • End-to-End Deep ROI Image Compression

    Hiroaki AKUTSU  Takahiro NARUKO  

     
    PAPER-Artificial Intelligence, Data Mining

    Publicized: 2020/01/24  Vol: E103-D No:5  Page(s): 1031-1038

    In this paper, we demonstrate the effectiveness of image compression based on a convolutional autoencoder (CAE) with region of interest (ROI) for quality control. We propose a method that adapts image quality separately for prioritized and non-prioritized parts in CAE-based compression. The proposed method uses annotation information as the distortion weights of the MS-SSIM-based loss function. We show experimental results using a road damage image dataset, which is used to check damaged parts, and an image dataset with segmentation data (ADE20K). The experimental results reveal that the proposed weighted loss function, combined with the CAE-based compression from F. Mentzer et al., learns some characteristics and preferred bit allocations of the prioritized parts through end-to-end training. On the road damage image dataset, our method reduces bpp by 31% compared to the original method while meeting the quality requirements that the average weighted MS-SSIM be larger than 0.97 for the road damaged parts and larger than 0.95 for the other parts.
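
    The idea of weighting distortion by annotation can be illustrated with a minimal sketch. The abstract describes an MS-SSIM-based loss; the code below substitutes a per-pixel squared error as a simpler stand-in, and the function name, the weights `w_roi` and `w_bg`, and the nested-list image format are all assumptions made for illustration, not the authors' implementation.

```python
def roi_weighted_distortion(original, decoded, roi_mask, w_roi=4.0, w_bg=1.0):
    """Weighted mean squared error over a grayscale image.

    original, decoded: 2-D lists of floats with the same shape.
    roi_mask: 2-D list of 0/1 flags, 1 marking a prioritized (ROI) pixel.
    w_roi, w_bg: hypothetical weights for ROI and background pixels.
    """
    total = 0.0
    weight_sum = 0.0
    for row_o, row_d, row_m in zip(original, decoded, roi_mask):
        for o, d, m in zip(row_o, row_d, row_m):
            w = w_roi if m else w_bg
            total += w * (o - d) ** 2  # squared error stands in for MS-SSIM
            weight_sum += w
    return total / weight_sum


original = [[0.0, 0.0], [0.0, 0.0]]
roi_mask = [[1, 0], [0, 0]]
error_in_roi = [[0.5, 0.0], [0.0, 0.0]]
error_in_background = [[0.0, 0.5], [0.0, 0.0]]

loss_roi = roi_weighted_distortion(original, error_in_roi, roi_mask)
loss_bg = roi_weighted_distortion(original, error_in_background, roi_mask)
```

    With these weights, the same pixel error costs more inside the ROI than outside it, which is the pressure that would push an encoder trained end to end to spend more bits on prioritized parts.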

  • Composition Proposal Generation for Manga Creation Support

    Hironori ITO  Yasuhito ASANO  

     
    PAPER

    Publicized: 2019/12/27  Vol: E103-D No:5  Page(s): 949-957

    In recent years, awareness and use of manga have spread, and more and more people use manga for purposes such as entertainment, study, and marketing. However, when non-specialists create manga for these purposes, they can write plots expressing what they want to convey, but the composition technique, which arranges elements such as characters and balloons according to the plot, becomes an obstacle to exploiting the comprehensibility that manga's highly flexible expression offers. Therefore, we consider that support for this composition technique is necessary for amateurs to use manga while taking advantage of its benefits. We propose a method for generating composition proposals to support manga creation by amateurs. For the method, we also define a new manga metadata model that summarizes and extends the metadata models of earlier studies. It represents the composition and the plot in manga. We apply a neural machine translation mechanism to learn the relation between the composition and the plot. It treats the plot annotation as the source and the composition annotation as the target, and learns from an annotation dataset based on the metadata model. We conducted experiments to evaluate how the composition proposals generated by our method help amateur manga creation, and demonstrated that they are useful.

  • Mimicking Lombard Effect: An Analysis and Reconstruction

    Thuan Van NGO  Rieko KUBO  Masato AKAGI  

     
    PAPER-Speech and Hearing

    Publicized: 2020/02/13  Vol: E103-D No:5  Page(s): 1108-1117

    Lombard speech is produced in noisy environments due to the Lombard effect and is intelligible in adverse environments. To adaptively control the intelligibility of transmitted speech for public announcement systems, in this study, we focus on perceptually mimicking Lombard speech under backgrounds with varying noise levels. Other approaches map corresponding neutral speech features to Lombard speech features, but as this can only be applied to one noise level at a time, it is unsuitable for varying noise levels because the characteristics of Lombard speech are varied according to noise level. Instead, we utilize a rule-based method that automatically generates rules and flexibly controls features with any change of noise level. Specifically, we conduct a feature tendency analysis and propose a continuous rule generation model to estimate the effect of varying noise levels on features. The proposed techniques, which are based on a coarticulation model, MRTD, and spectral-GMM, can easily modify neutral speech features by following the generated rules. Voices having these features are then synthesized by STRAIGHT to obtain Lombard speech fitting to noises with varying levels. To validate our proposed method, the quality of mimicking speech is evaluated in subjective listening experiments on similarity, intelligibility, and naturalness. In varying noise levels, the results show equal similarity with Lombard speech between the proposed method and a state-of-the-art method. Intelligibility and naturalness are comparable with some feature modifications.

  • Adaptive Balanced Allocation for Peer Assessments

    Hideaki OHASHI  Yasuhito ASANO  Toshiyuki SHIMIZU  Masatoshi YOSHIKAWA  

     
    PAPER

    Publicized: 2019/12/26  Vol: E103-D No:5  Page(s): 939-948

    Peer assessments, in which people review the works of peers and have their own works reviewed by peers, are useful for assessing homework. In conventional peer assessment systems, works are usually allocated to people before the assessment begins; therefore, if people drop out (abandoning reviews) during an assessment period, an imbalance occurs between the number of works a person reviews and the number of peers who have reviewed that person's work. When the total imbalance increases, some people who diligently complete reviews may suffer from a lack of reviews and be discouraged from participating in future peer assessments. Therefore, in this study, we adopt a new adaptive allocation approach in which people are allocated works to review only when they request them, and we propose an algorithm for allocating works to people that reduces the total imbalance. To show the effectiveness of the proposed algorithm, we provide an upper bound on the total imbalance that the proposed algorithm yields. In addition, we extend the above algorithm to consider reviewing ability. The extended algorithm avoids the problem that only unskilled (or skilled) reviewers are allocated to a given work. We show the effectiveness of the two proposed algorithms compared to existing algorithms through experiments using simulation data.

  • Continuous Noise Masking Based Vocoder for Statistical Parametric Speech Synthesis

    Mohammed Salah AL-RADHI  Tamás Gábor CSAPÓ  Géza NÉMETH  

     
    PAPER-Speech and Hearing

    Publicized: 2020/02/10  Vol: E103-D No:5  Page(s): 1099-1107

    In this article, we propose a method called “continuous noise masking (cNM)” that eliminates residual buzziness in a continuous vocoder, i.e., one in which all parameters are continuous and which offers a simple and flexible speech analysis and synthesis system. Traditional parametric vocoders generally show a perceptible deterioration in the quality of the synthesized speech due to different processing algorithms. Furthermore, inaccurate noise resynthesis (e.g., in breathiness or hoarseness) is also considered to be one of the main underlying causes of performance degradation, leading to noisy transients and temporal discontinuity in the synthesized speech. To overcome these issues, a new cNM is developed based on the phase distortion deviation in order to reduce the perceptual effect of the residual noise, allow a proper reconstruction of noise characteristics, and better model the creaky voice segments that may occur in natural speech. To this end, the cNM is designed to keep only voice components under a condition of the cNM threshold while discarding others. We evaluate the proposed approach and compare it with state-of-the-art vocoders using objective tests and subjective listening tests. Experimental results show that the proposed method can reduce the effect of residual noise and can reach the quality of other sophisticated approaches like STRAIGHT and log domain pulse model (PML).

  • Vehicle Key Information Detection Algorithm Based on Improved SSD

    Ende WANG  Yong LI  Yuebin WANG  Peng WANG  Jinlei JIAO  Xiaosheng YU  

     
    PAPER-Intelligent Transport System

    Vol: E103-A No:5  Page(s): 769-779

    With the rapid development of technology and the economy, the number of cars is increasing rapidly, which brings a series of traffic problems. To solve these traffic problems, the development of intelligent transportation systems is being accelerated in many cities. While the detection of vehicles and their detailed information is of great significance to the development of urban intelligent transportation systems, traditional vehicle detection algorithms are not satisfactory in complex environments or under strict real-time requirements. Vehicle detection algorithms based on motion information are unable to detect stationary vehicles in video. At present, applying deep learning methods to target detection effectively mitigates the problems of traditional algorithms. However, there are few datasets covering detailed vehicle information, i.e., driver, car inspection sign, copilot, plate, and vehicle object, which are key information for intelligent transportation. This paper constructs a deep learning dataset containing 10,000 representative images for the detection of vehicles and their key information. Then, the SSD (Single Shot MultiBox Detector) target detection algorithm is improved, and the improved algorithm is applied to a video surveillance system. The detection accuracy for small targets is improved by adding deconvolution modules to the detection network. The experimental results show that the proposed method can simultaneously detect the vehicle, driver, car inspection sign, copilot, and plate, which are the vehicle key information, and that the improved algorithm achieves better accuracy and real-time performance in video surveillance than the SSD algorithm.

  • A Two-Stage Feedback Protocol Based on Multipath Profile for MU-MIMO Networks

    Aijing LI  Chao DONG  Zhimin LI  Qihui WU  Guodong WU  

     
    PAPER-Network

    Publicized: 2019/11/21  Vol: E103-B No:5  Page(s): 559-569

    As a key technology for 5G and beyond, Multi-User Multi-Input Multi-Output (MU-MIMO) can achieve Gbps downlink rates by allowing concurrent transmission from one Access Point (AP) to multiple users. However, the huge overhead of full CSI feedback may overwhelm the gain yielded by beamforming. Although there have been many works on compressing CSI to reduce the feedback overhead, the performance of beamforming may decrease because the accuracy of the channel state degrades. To address the tradeoff between feedback overhead and accuracy, we present a two-stage Multipath Profile based Feedback protocol (MPF). In the first stage, instead of CSI feedback, the channel state is represented by a multipath profile, which has a smaller size but is accurate enough for user selection. Meanwhile, we propose an implicit polling scheme to decrease the feedback further. In the second stage, only the selected users send their CSI to the AP, which keeps the overhead low while preserving the accuracy of the steering matrix calculation. We implement and evaluate MPF with USRP N210. Experiments show that MPF can outperform alternative schemes in a variety of radio environments.

  • Universal Testing for Linear Feed-Forward/Feedback Shift Registers

    Hideo FUJIWARA  Katsuya FUJIWARA  Toshinori HOSOKAWA  

     
    PAPER-Dependable Computing

    Publicized: 2020/02/25  Vol: E103-D No:5  Page(s): 1023-1030

    Linear feed-forward/feedback shift registers are used as an effective tool for testing circuits in various fields including built-in self-test and secure scan design. In this paper, we consider the issue of testing linear feed-forward/feedback shift registers themselves. To test linear feed-forward/feedback shift registers, it is necessary to generate a test sequence for each register. We first present an experimental result showing that a commercial ATPG (automatic test pattern generator) cannot always generate a test sequence with high fault coverage, even for 64-stage linear feed-forward/feedback shift registers. We then show that there exists a universal test sequence with 100% fault coverage for the class of linear feed-forward/feedback shift registers, so that no test generation is required, i.e., the cost of test generation is zero. We prove the existence theorem of universal test sequences for the class of linear feed-forward/feedback shift registers.

  • Air Quality Index Forecasting via Deep Dictionary Learning

    Bin CHEN  

     
    PAPER-Image Recognition, Computer Vision

    Publicized: 2020/02/20  Vol: E103-D No:5  Page(s): 1118-1125

    Air quality index (AQI) is a non-dimensional index for describing air quality, and is widely used in air quality management schemes. A novel method for Air Quality Index Forecasting based on Deep Dictionary Learning (AQIF-DDL) and machine vision is proposed in this paper. A sky image is used as the input of the method, and the output is the forecasted AQI value. Deep dictionary learning is employed to automatically extract the sky image features and perform the AQI forecasting. The idea of learning deeper dictionary levels, which stems from deep learning, is also included to increase the forecasting accuracy and stability. The proposed AQIF-DDL is compared with other deep learning based methods, such as the deep belief network, stacked autoencoder, and convolutional neural network. The experimental results indicate that the proposed method leads to good performance on AQI forecasting.

  • Orthogonal Gradient Penalty for Fast Training of Wasserstein GAN Based Multi-Task Autoencoder toward Robust Speech Recognition

    Chao-Yuan KAO  Sangwook PARK  Alzahra BADI  David K. HAN  Hanseok KO  

     
    LETTER-Speech and Hearing

    Publicized: 2020/01/27  Vol: E103-D No:5  Page(s): 1195-1198

    Performance in Automatic Speech Recognition (ASR) degrades dramatically in noisy environments. To alleviate this problem, a variety of deep networks based on convolutional neural networks and recurrent neural networks were proposed by applying L1 or L2 loss. In this Letter, we propose a new orthogonal gradient penalty (OGP) method for Wasserstein Generative Adversarial Networks (WGAN) applied to denoising and despeeching models. WGAN integrates a multi-task autoencoder which estimates not only speech features but also noise features from noisy speech. While achieving 14.1% improvement in Wasserstein distance convergence rate, the proposed OGP enhanced features are tested in ASR and achieve 9.7%, 8.6%, 6.2%, and 4.8% WER improvements over DDAE, MTAE, R-CED(CNN) and RNN models.

  • A New Closed-Form Algorithm for Spatial Three-Dimensional Localization with Multiple One-Dimensional Uniform Linear Arrays

    Yifan WEI  Wanchun LI  Yuning GUO  Hongshu LIAO  

     
    LETTER-Digital Signal Processing

    Vol: E103-A No:4  Page(s): 704-709

    This paper presents a three-dimensional (3D) spatial localization algorithm by using multiple one-dimensional uniform linear arrays (ULA). We first discuss geometric features of the angle-of-arrival (AOA) measurements of the array and present the corresponding principle of spatial cone angle intersection positioning with an angular measurement model. Then, we propose a new positioning method with an analytic study on the geometric dilution of precision (GDOP) of target location in different cases. The results of simulation show that the estimation accuracy of this method can attain the Cramér-Rao Bound (CRB) under low measurement noise.

  • GUNGEN-Heartbeat: A Support System for High Quality Idea Generation Using Heartbeat Variance

    Jun MUNEMORI  Kohei KOMORI  Junko ITOU  

     
    LETTER

    Publicized: 2019/06/28  Vol: E103-D No:4  Page(s): 796-799

    We propose an idea generation support system known as “GUNGEN-Heartbeat” that uses heartbeat variations to help create high quality ideas during brainstorming. The system shows “An indication of a check list” or “An indication to promote deep breathing” when the variance of the heart rate goes beyond a given value. We also carried out comparison experiments to evaluate the usefulness of the system.

  • Evaluating Deep Learning for Image Classification in Adversarial Environment

    Ye PENG  Wentao ZHAO  Wei CAI  Jinshu SU  Biao HAN  Qiang LIU  

     
    PAPER-Artificial Intelligence, Data Mining

    Publicized: 2019/12/23  Vol: E103-D No:4  Page(s): 825-837

    Owing to its superior performance, deep learning has been widely applied to various applications, including image classification, bioinformatics, and cybersecurity. Nevertheless, research on deep learning in adversarial environments is still at a preliminary stage. Emerging adversarial learning methods, e.g., generative adversarial networks, have raised two vital questions: how secure deep learning is in the presence of adversarial examples, and how to evaluate the performance of deep learning models in an adversarial environment so as to give security advice that makes a selected deep learning based application system resistant to adversarial examples. To seek answers, we take image classification as an example application scenario and propose a framework of Evaluating Deep Learning for Image Classification (EDLIC) to conduct a comprehensive quantitative analysis. Moreover, we introduce a set of evaluation metrics to measure the performance of different attacking and defensive techniques. After that, we conduct extensive experiments on the performance of deep learning for image classification under different adversarial environments to validate the scalability of EDLIC. Finally, we give some advice about the selection of deep learning models for image classification based on these comparative results.

  • Improvement in the Effectiveness of Cutting Skill Practice for Paper-Cutting Creations Based on the Steering Law

    Takafumi HIGASHI  Hideaki KANAI  

     
    PAPER

    Publicized: 2019/11/29  Vol: E103-D No:4  Page(s): 730-738

    To improve the cutting skills of learners, we developed a method for improving the skill involved in creating paper cuttings based on a steering task in the field of human-computer interaction. We made patterns using the white and black boundaries that make up a picture. The index of difficulty (ID) is a numerical value computed from the width and distance used in the steering law. First, we evaluated novice and expert pattern-cutters, and measured their moving time (MT), error rate, and compliance with the steering law, confirming that the MT and error rate are affected by pattern width and distance. Moreover, we quantified the skills of novices and experts using ID and MT based models. We then observed changes in the cutting skills of novices who practiced with various widths and evaluated the impact of the difficulty level on skill improvement. Patterns considered to be moderately difficult for novices led to a significant improvement in skills.
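
    The relation between ID and MT can be sketched as follows. The abstract does not state its formulas, so this assumes the standard steering law for a straight path of constant width, ID = A / W and MT = a + b * ID, where A is the path distance and W the path width. The function names and the sample data are hypothetical, and the data are synthetic rather than measurements from the paper.

```python
def index_of_difficulty(distance, width):
    """Steering-law ID for a straight path of constant width: ID = A / W."""
    if width <= 0:
        raise ValueError("width must be positive")
    return distance / width


def fit_steering_law(ids, mts):
    """Ordinary least-squares fit of MT = a + b * ID. Returns (a, b)."""
    n = len(ids)
    mean_id = sum(ids) / n
    mean_mt = sum(mts) / n
    sxx = sum((i - mean_id) ** 2 for i in ids)
    sxy = sum((i - mean_id) * (m - mean_mt) for i, m in zip(ids, mts))
    b = sxy / sxx
    a = mean_mt - b * mean_id
    return a, b


# Synthetic example: one 100 mm path cut at four different widths (mm).
ids = [index_of_difficulty(100.0, w) for w in (2.0, 4.0, 5.0, 10.0)]
# Synthetic moving times generated from a = 0.2 s and b = 0.05 s per ID unit.
mts = [0.2 + 0.05 * i for i in ids]
a, b = fit_steering_law(ids, mts)
```

    Under this model a narrower pattern has a higher ID and therefore a longer predicted MT, and the fitted slope b gives one number for comparing how strongly difficulty slows down a given cutter. Curved paths would need the integral form of the steering law, which this sketch does not cover.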

  • A Deep Neural Network-Based Approach to Finding Similar Code Segments

    Dong Kwan KIM  

     
    LETTER-Software Engineering

    Publicized: 2020/01/17  Vol: E103-D No:4  Page(s): 874-878

    This paper presents a Siamese architecture model with two identical Convolutional Neural Networks (CNNs) to identify code clones: two code fragments are represented as Abstract Syntax Trees (ASTs), CNN-based subnetworks extract feature vectors from the ASTs of the paired code fragments, and the output layer indicates how similar or dissimilar they are. Experimental results demonstrate that CNN-based feature extraction is effective in detecting code clones at the source code or bytecode level.

  • Deep-Donor-Induced Suppression of Current Collapse in an AlGaN-GaN Heterojunction Structure Grown on Si (Open Access)

    Taketoshi TANAKA  Norikazu ITO  Shinya TAKADO  Masaaki KUZUHARA  Ken NAKAHARA  

     
    PAPER-Semiconductor Materials and Devices

    Publicized: 2019/10/11  Vol: E103-C No:4  Page(s): 186-190

    TCAD simulation was performed to investigate the material properties of an AlGaN/GaN structure in Deep Acceptor (DA)-rich and Deep Donor (DD)-rich GaN cases. DD-rich semi-insulating GaN generated a positively charged region that prevented the electron concentration in the 2DEG from decreasing, while its DA-rich counterpart caused electron depletion, which was the origin of the current collapse in AlGaN/GaN HFETs. These simulation results were verified experimentally using three nitride samples including buffer-GaN layers with carbon concentrations ([C]) of 5×10¹⁷, 5×10¹⁸, and 4×10¹⁹ cm⁻³. DD-rich behaviors were observed for the sample with [C] = 4×10¹⁹ cm⁻³, and a DD energy level of E_DD = 0.6 eV was estimated from the Arrhenius plot of the temperature-dependent I_DS. This E_DD value coincided with the previously estimated E_DD. The backgate experiments revealed that this DD-rich semi-insulating GaN suppressed both current collapse and buffer leakage, thus providing characteristics desirable for practical usage.

  • Software Development Effort Estimation from Unstructured Software Project Description by Sequence Models

    Tachanun KANGWANTRAKOOL  Kobkrit VIRIYAYUDHAKORN  Thanaruk THEERAMUNKONG  

     
    PAPER

    Publicized: 2020/01/14  Vol: E103-D No:4  Page(s): 739-747

    Most existing methods of effort estimation in software development are manual, labor-intensive, and subjective, resulting in overestimation that causes bids to fail and underestimation that causes financial loss. This paper investigates the effectiveness of sequence models for estimating development effort, in the form of man-months, from software project data. Four architectures are compared in terms of man-month difference: (1) average word-vector with Multi-layer Perceptron (MLP), (2) average word-vector with Support Vector Regression (SVR), (3) Gated Recurrent Unit (GRU) sequence model, and (4) Long short-term memory (LSTM) sequence model. The approach is evaluated using two datasets, ISEM (1,573 English software project descriptions) and ISBSG (9,100 software project records), where the former is raw text and the latter is a structured data table describing the characteristics of each software project. The LSTM sequence model achieves the lowest and the second lowest mean absolute errors, which are 0.705 and 14.077 man-months for the ISEM and ISBSG datasets, respectively. The MLP model achieves the lowest mean absolute error, which is 14.069, for the ISBSG dataset.

  • Compromising Strategies for Agents in Multiple Interdependent Issues Negotiation

    Shun OKUHARA  Takayuki ITO  

     
    PAPER

    Publicized: 2020/01/21  Vol: E103-D No:4  Page(s): 759-770

    This paper presents a compromising strategy based on constraint relaxation for automated negotiating agents in the nonlinear utility domain. Automated negotiating agents have been studied widely and are one of the key technologies for a future society in which multiple heterogeneous agents act collaboratively and competitively in order to help humans perform daily activities. A pressing issue is that most of the proposed negotiating agents utilize an ad-hoc compromising process, in which they basically just adjust or reduce a threshold to forcibly accept their opponents' offers. Because the threshold is simply reduced and the agent accepts an offer merely because its value exceeds the threshold, it is very difficult to show how and what the agent conceded, even after an agreement has been reached. To address this issue, we describe an explainable concession process using a constraint relaxation process. In this process, an agent changes its belief by relaxing constraints, i.e., removing constraints, so that it can accept the opponent's offer. We also propose three types of compromising strategies. Experimental results demonstrate that these strategies are efficient.

  • Robust CAPTCHA Image Generation Enhanced with Adversarial Example Methods

    Hyun KWON  Hyunsoo YOON  Ki-Woong PARK  

     
    LETTER-Information Network

    Publicized: 2020/01/15  Vol: E103-D No:4  Page(s): 879-882

    Malicious attackers on the Internet use automated attack programs to disrupt the use of services via mass spamming, unnecessary bulletin boarding, and account creation. The Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA) is used as a security solution to prevent such automated attacks. CAPTCHA is a system that determines whether the user is a machine or a person by providing distorted letters, voices, and images that only humans can understand. However, new attack techniques such as optical character recognition (OCR) and deep neural networks (DNN) have been used to bypass CAPTCHA. In this paper, we propose a method to generate CAPTCHA images by using the fast-gradient sign method (FGSM), iterative FGSM (I-FGSM), and the DeepFool method. We used the CAPTCHA image provided by Python as the dataset and TensorFlow as the machine learning library. The experimental results show that the CAPTCHA images generated via the FGSM, I-FGSM, and DeepFool methods exhibit a 0% recognition rate with ε=0.15 for FGSM, a 0% recognition rate with α=0.1 and 50 iterations for I-FGSM, and a 45% recognition rate with 150 iterations for the DeepFool method.
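
    The FGSM and I-FGSM perturbations named above can be illustrated with a minimal sketch. It assumes a toy logistic-regression "recognizer" in plain Python rather than the TensorFlow model and CAPTCHA data used in the paper. All function names, the weights, and the ε bound applied inside the iterative variant are assumptions made for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss_and_input_grad(x, y, w, b):
    """Cross-entropy loss of a logistic model and its gradient w.r.t. input x."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
    grad_x = [(p - y) * wi for wi in w]  # d(loss)/dx for logistic regression
    return loss, grad_x


def sign(v):
    return (v > 0) - (v < 0)


def clip(v, lo, hi):
    return max(lo, min(hi, v))


def fgsm(x, y, w, b, eps):
    """One-step FGSM: x_adv = x + eps * sign(grad_x loss), kept in [0, 1]."""
    _, grad_x = loss_and_input_grad(x, y, w, b)
    return [clip(xi + eps * sign(gi), 0.0, 1.0) for xi, gi in zip(x, grad_x)]


def iterative_fgsm(x, y, w, b, alpha, steps, eps):
    """I-FGSM: repeated steps of size alpha, kept within eps of x (assumed bound)."""
    x_adv = list(x)
    for _ in range(steps):
        _, grad_x = loss_and_input_grad(x_adv, y, w, b)
        x_adv = [
            clip(clip(xa + alpha * sign(gi), xi - eps, xi + eps), 0.0, 1.0)
            for xa, xi, gi in zip(x_adv, x, grad_x)
        ]
    return x_adv


# Hypothetical 4-pixel "image" with values in [0, 1], true label y = 1.
x = [0.5, 0.5, 0.5, 0.5]
w = [1.0, -2.0, 0.5, 3.0]
b = 0.1
y = 1

clean_loss, _ = loss_and_input_grad(x, y, w, b)
x_fgsm = fgsm(x, y, w, b, eps=0.15)
fgsm_loss, _ = loss_and_input_grad(x_fgsm, y, w, b)
x_ifgsm = iterative_fgsm(x, y, w, b, alpha=0.05, steps=10, eps=0.15)
ifgsm_loss, _ = loss_and_input_grad(x_ifgsm, y, w, b)
```

    Each pixel moves by at most ε in the direction that increases the recognizer's loss, so the image changes little for a human reader while the model's confidence in the correct label drops.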
