Keyword Search Result

[Keyword] EE(4079hit)

161-180hit(4079hit)

  • Comparative Evaluation of Diverse Features in Fluency Evaluation of Spontaneous Speech

    Huaijin DENG  Takehito UTSURO  Akio KOBAYASHI  Hiromitsu NISHIZAKI  

     
    PAPER-Speech and Hearing

    Publicized: 2022/10/25
    Vol: E106-D No:1
    Page(s): 36-45

    There have been many previous studies on fluency evaluation of spontaneous speech. However, most of them focus on lexical cues, and little emphasis is placed on how diverse acoustic features and deep end-to-end models contribute to improving performance. In this paper, we describe a multi-layer neural network that uses not only lexical features extracted from transcriptions but also utterance-level acoustic features from audio data. We also conduct experiments to investigate the performance of end-to-end approaches with mel-spectrograms in this task. As the speech fluency evaluation task, we evaluate our proposed method in two binary classification tasks: fluent speech detection and disfluent speech detection. Speech data of around 10 seconds duration each, annotated with the three classes “fluent,” “neutral,” and “disfluent,” is used for evaluation. According to the two-way splits of those three classes, fluent speech detection is defined as binary classification of fluent vs. neutral and disfluent, while disfluent speech detection is defined as binary classification of fluent and neutral vs. disfluent. We then conduct experiments for comparative evaluation of the multi-layer neural network with diverse features as well as end-to-end models. For fluent speech detection, in the comparison of utterance-level disfluency-based, prosodic, and acoustic features with the multi-layer neural network, using only disfluency-based and prosodic features is better. More specifically, performance improves considerably when all of the acoustic features are removed from the full feature set, while it degrades considerably if filler-related features are removed. Overall, however, the end-to-end Transformer+VGGNet model with mel-spectrograms achieves the best results. For disfluent speech detection, the multi-layer neural network using disfluency-based, prosodic, and acoustic features without fillers achieves the best results. The end-to-end Transformer+VGGNet architecture also obtains high scores, but it is exceeded by the best results of the multi-layer neural network by a significant difference. Thus, unlike in fluent speech detection, disfluency-based and prosodic features other than fillers are still necessary in disfluent speech detection.
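
The two-way splits of the three annotation classes can be written as a pair of label mappings. This is a minimal sketch: the function names and the 1/0 encoding are assumptions made for illustration and are not taken from the paper.

```python
# Hypothetical helper names; only the three class names come from the abstract.
CLASSES = ("fluent", "neutral", "disfluent")


def _check(label: str) -> str:
    """Reject anything that is not one of the three annotation classes."""
    if label not in CLASSES:
        raise ValueError(f"unknown fluency class: {label!r}")
    return label


def fluent_detection_target(label: str) -> int:
    """Fluent speech detection: fluent (1) vs. neutral and disfluent (0)."""
    return 1 if _check(label) == "fluent" else 0


def disfluent_detection_target(label: str) -> int:
    """Disfluent speech detection: disfluent (1) vs. fluent and neutral (0)."""
    return 1 if _check(label) == "disfluent" else 0


if __name__ == "__main__":
    for name in CLASSES:
        print(name, fluent_detection_target(name), disfluent_detection_target(name))
```

Note that "neutral" falls on the negative side of both tasks, which is why the two detectors are not simply complements of each other.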

  • A Non-Intrusive Speech Quality Evaluation Method Based on the Audiogram and Weighted Frequency Information for Hearing Aid

    Ruxue GUO  Pengxu JIANG  Ruiyu LIANG  Yue XIE  Cairong ZOU  

     
    LETTER-Speech and Hearing

    Publicized: 2022/07/25
    Vol: E106-A No:1
    Page(s): 64-68

    For a long time, the compensation effect of hearing aids has mainly been evaluated subjectively, and there are few studies of objective evaluation. Furthermore, existing objective evaluation methods generally require a pure speech signal as a reference, which restricts their practicality in real-world environments. Therefore, this paper presents a non-intrusive speech quality evaluation method for hearing aids that combines the audiogram and weighted frequency information. The proposed model mainly includes an audiogram information extraction network, a frequency information extraction network, and a quality score mapping network. The audiogram is the input of the audiogram information extraction network, which helps the system capture information related to hearing loss. In addition, the low-frequency bands of speech contain loudness information, and the medium- and high-frequency components contribute to semantic comprehension. The information of the two frequency bands is input to the frequency information extraction network to obtain time-frequency information. Once the high-level features of the different frequency bands and the audiogram are obtained, they are fused into two groups of tensors that distinguish the information of the different frequency bands, and these are used as the input of the attention layer to calculate the corresponding weight distribution. Finally, a dense layer is employed to predict the speech quality score. The experimental results show that it is reasonable to combine the audiogram with the weighted information from the two frequency bands, which can effectively evaluate the speech quality of hearing aids.

  • Global Asymptotic Stabilization of Feedforward Systems with an Uncertain Delay in the Input by Event-Triggered Control

    Ho-Lim CHOI  

     
    LETTER-Systems and Control

    Publicized: 2022/06/28
    Vol: E106-A No:1
    Page(s): 69-72

    In this letter, we consider a global stabilization problem for a class of feedforward systems by event-triggered control. This work extends [10] in that the system contains an uncertain feedforward nonlinearity and a time-varying input delay. First, we show that the considered system is globally asymptotically stabilized by a proposed event-triggered controller with a gain-scaling factor. Then, we also show that the interexecution times can be enlarged by adjusting the gain-scaling factor. A simulation example is given for illustration.

  • Polar Coding Aided by Adaptive Channel Equalization for Underwater Acoustic Communication

    Feng LIU  Qianqian WU  Conggai LI  Fangjiong CHEN  Yanli XU  

     
    LETTER-Communication Theory and Signals

    Publicized: 2022/07/01
    Vol: E106-A No:1
    Page(s): 83-87

    To improve the performance of underwater acoustic communications, this letter proposes a polar coding scheme with adaptive channel equalization, which can reduce the amount of feedback information. Furthermore, a hybrid automatic repeat request (HARQ) mechanism is provided to mitigate the impact of estimation errors. Simulation results show that the proposed scheme outperforms the turbo equalization in bit error rate. Computational complexity analysis is also provided for comparison.

  • Skin Visualization Using Smartphone and Deep Learning in the Beauty Industry

    Makoto HASEGAWA  Rui MATSUO  

     
    PAPER-Biocybernetics, Neurocomputing

    Publicized: 2022/10/12
    Vol: E106-D No:1
    Page(s): 68-77

    This paper discusses deep-learning-based visualization of human skin with a smartphone for the beauty industry. Skin was photographed with a medical camera that could simultaneously capture RGB and UV images of the same area. Smartphone RGB images were converted into versions similar to medical RGB and UV images via a deep learning method called cycle-GAN, which was trained with the medical and the smartphone images. After converting the smartphone image into a version similar to a medical RGB image using cycle-GAN, the processed image was also converted into a pseudo-UV image via a deep learning method called U-NET. Hidden age spots were effectively visualized by this image. RGB and UV images similar to medical images can thus be obtained with a smartphone. Once the neural networks have been trained, a medical camera is not required.

  • Design of a Dual-Wideband BPF with Parallel-Coupled Stepped Impedance Resonator and Open-Circuited Stubs

    Chun-Ping CHEN  Zhewang MA  Tetsuo ANADA  

     
    BRIEF PAPER-Microwaves, Millimeter-Waves

    Publicized: 2022/06/15
    Vol: E105-C No:12
    Page(s): 761-766

    This brief paper proposes a dual-wideband filter consisting of a parallel-coupled stepped-impedance resonator (SIR) and open-circuited stubs. First, a notched ultra-wideband (UWB) bandpass filter (BPF) with steep skirt characteristics is theoretically designed. Then a bandstop filter (BSF) is implemented using an SIR and open stubs. By replacing the transmission-line part of the UWB filter with the BSF, a novel dual-wideband filter (DWBPF) is realized. As a design example, a DWBPF with two passbands, i.e., 3.4-4.8 GHz and 7.25-10.25 GHz, is designed to validate the design procedure. The designed filter exhibits steep skirt characteristics.

  • Deep Learning-Based Massive MIMO CSI Acquisition for 5G Evolution and 6G

    Xin WANG  Xiaolin HOU  Lan CHEN  Yoshihisa KISHIYAMA  Takahiro ASAI  

     
    PAPER-Wireless Communication Technologies

    Publicized: 2022/06/15
    Vol: E105-B No:12
    Page(s): 1559-1568

    Channel state information (CSI) acquisition at the transmitter side is a major challenge in massive MIMO systems for enabling high-efficiency transmissions. To address this issue, various CSI feedback schemes have been proposed, including limited feedback schemes with codebook-based vector quantization and explicit channel matrix feedback. Owing to the limitations of feedback channel capacity, a common issue in these schemes is the efficient representation of the CSI with a limited number of bits at the receiver side, and its accurate reconstruction based on the feedback bits from the receiver at the transmitter side. Recently, inspired by successful applications in many fields, deep learning (DL) technologies for CSI acquisition have received considerable research interest from both academia and industry. Considering the practical feedback mechanism of 5th generation (5G) New Radio (NR) networks, we propose two implementation schemes for artificial intelligence for CSI (AI4CSI): a DL-based receiver and an end-to-end design. The proposed AI4CSI schemes were evaluated in 5G NR networks in terms of spectrum efficiency (SE), feedback overhead, and computational complexity, and compared with legacy schemes. To demonstrate whether these schemes can be used in real-life scenarios, both model-based channel data and practically measured channels were used in our investigations. When DL-based CSI acquisition is applied to the receiver only, which has little air-interface impact, it provides approximately 25% SE gain at a moderate feedback overhead level. It is feasible to deploy it in current 5G networks during 5G evolution. For the end-to-end DL-based CSI enhancements, the evaluations also demonstrated an additional performance gain in SE, which is 6%-26% compared with DL-based receivers and 33%-58% compared with legacy CSI schemes. Considering its large impact on air-interface design, it will be a candidate technology for 6th generation (6G) networks, in which an air interface designed by artificial intelligence can be used.

  • Robust Speech Recognition Using Teacher-Student Learning Domain Adaptation

    Han MA  Qiaoling ZHANG  Roubing TANG  Lu ZHANG  Yubo JIA  

     
    PAPER-Speech and Hearing

    Publicized: 2022/09/09
    Vol: E105-D No:12
    Page(s): 2112-2118

    Recently, robust speech recognition for real-world applications has attracted much attention. This paper proposes a robust speech recognition method based on the teacher-student learning framework for domain adaptation. In particular, the student network will be trained based on a novel optimization criterion defined by the encoder outputs of both teacher and student networks rather than the final output posterior probabilities, which aims to make the noisy audio map to the same embedding space as clean audio, so that the student network is adaptive in the noise domain. Comparative experiments demonstrate that the proposed method obtained good robustness against noise.
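
The idea of training the student on encoder outputs rather than on output posteriors can be illustrated with a simple distance between the two embeddings. The abstract does not state the exact form of the criterion, so the mean squared error below, the function name, and the list-of-frames layout are all assumptions chosen for illustration.

```python
from typing import Sequence


def encoder_matching_loss(
    teacher_frames: Sequence[Sequence[float]],
    student_frames: Sequence[Sequence[float]],
) -> float:
    """Mean squared error between teacher and student encoder outputs.

    teacher_frames: encoder outputs of the teacher for the clean audio.
    student_frames: encoder outputs of the student for the noisy audio.
    Both are lists of frames, each frame a list of embedding values.
    MSE is an assumed stand-in, not the criterion defined in the paper.
    """
    if len(teacher_frames) != len(student_frames):
        raise ValueError("teacher and student must have the same number of frames")
    total, count = 0.0, 0
    for t_vec, s_vec in zip(teacher_frames, student_frames):
        if len(t_vec) != len(s_vec):
            raise ValueError("embedding dimensions differ")
        for t, s in zip(t_vec, s_vec):
            total += (t - s) ** 2
            count += 1
    if count == 0:
        raise ValueError("empty encoder outputs")
    return total / count


if __name__ == "__main__":
    clean = [[1.0, 2.0], [3.0, 4.0]]
    noisy = [[1.0, 2.0], [3.0, 6.0]]
    print(encoder_matching_loss(clean, noisy))
```

Minimizing such a distance pulls the student's embedding of noisy audio toward the teacher's embedding of the matching clean audio, which is the stated goal of mapping both into the same embedding space.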

  • Novel Configuration for Phased-Array Antenna System Employing Frequency-Controlled Beam Steering Method

    Atsushi FUKUDA  Hiroshi OKAZAKI  Shoichi NARAHASHI  

     
    PAPER-Microwaves, Millimeter-Waves

    Publicized: 2022/06/10
    Vol: E105-C No:12
    Page(s): 740-749

    This paper presents a novel frequency-controlled beam steering scheme for a phased-array antenna system (PAS). The proposed scheme employs phase-controlled carrier signals to form the PAS beam. Two local oscillators (LOs) and delay lines are used to generate the carrier signals. The carrier of one LO is divided into branches, and then the divided carriers passing through the corresponding delay lines have the desired phase relationship, which depends on the oscillation frequency of the LO. To confirm the feasibility of the scheme, four-branch PAS transmitters are configured and tested in a 10-GHz frequency band. The results verify that the formed beam is successfully steered in a wide range, i.e., the 3-dB beamwidth of approximately 100 degrees, using LO frequency control.
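
The dependence of the branch phases on the LO frequency can be sketched with the textbook relations for a delay line and a uniform linear array. This is a generic illustration and not the paper's circuit: it models a single LO, assumes the delay grows by a fixed step from branch to branch, and assumes half-wavelength element spacing. All names and numeric values below are hypothetical.

```python
import math


def branch_phase_step(lo_freq_hz: float, delay_step_s: float) -> float:
    """Phase difference between adjacent branches, wrapped to (-pi, pi].

    A carrier at frequency f passing through a delay tau is shifted by
    2*pi*f*tau, so a fixed delay step gives a frequency-dependent phase step.
    """
    phase = 2.0 * math.pi * lo_freq_hz * delay_step_s
    return math.atan2(math.sin(phase), math.cos(phase))


def steering_angle_deg(phase_step_rad: float, spacing_in_wavelengths: float = 0.5) -> float:
    """Beam direction of a uniform linear array for a given phase step.

    Uses sin(theta) = phase_step / (2*pi*d/lambda); the sign convention
    (which side positive angles point to) is an assumption.
    """
    s = phase_step_rad / (2.0 * math.pi * spacing_in_wavelengths)
    if abs(s) > 1.0:
        raise ValueError("phase step too large for this element spacing")
    return math.degrees(math.asin(s))


if __name__ == "__main__":
    # Assumed values: 100 ps delay step, LO swept around 10 GHz.
    for f in (9.75e9, 10.0e9, 10.25e9):
        step = branch_phase_step(f, 100e-12)
        print(f"{f / 1e9:.2f} GHz -> {steering_angle_deg(step):+.2f} deg")
```

With these assumed values a small change in LO frequency moves the beam by a few degrees; a longer delay step would make the beam direction more sensitive to frequency.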

  • RVCar: An FPGA-Based Simple and Open-Source Mini Motor Car System with a RISC-V Soft Processor

    Takuto KANAMORI  Takashi ODAN  Kazuki HIROHATA  Kenji KISE  

     
    PAPER

    Publicized: 2022/08/09
    Vol: E105-D No:12
    Page(s): 1999-2007

    Deep neural networks (DNNs) are widely used for computer vision tasks such as image classification, object detection, and segmentation. DNN accelerators on FPGAs, especially for convolutional neural networks (CNNs), are a hot topic. More research and education should be conducted to boost this field, and a starting point is required to make it easy for new entrants to join. We believe that FPGA-based autonomous driving (AD) motor cars are suitable for this because DNN accelerators can be used for image processing with low latency. In this paper, we propose an FPGA-based simple and open-source mini motor car system named RVCar with a RISC-V soft processor and a CNN accelerator. RVCar is suitable for new entrants who want to learn the implementation of a CNN accelerator and the surrounding system. The motor car consists of a Xilinx Nexys A7 board and simple parts. All modules except the CNN accelerator are implemented in Verilog HDL and SystemVerilog. The CNN accelerator is converted from a PyTorch model by our tool. The accelerator is written in C++, is synthesizable by Vitis HLS, and serves as an easy-to-customize baseline for new entrants. FreeRTOS is used to implement AD algorithms and is executed on the RISC-V soft processor, which helps users develop the AD algorithms efficiently. We conduct a case study of a simple AD task that we define. Although the task is simple, it is difficult to achieve without image recognition. We confirm that RVCar can recognize objects and make correct decisions based on the results.

  • Multi-Targeted Poisoning Attack in Deep Neural Networks

    Hyun KWON  Sunghwan CHO  

     
    LETTER

    Publicized: 2022/08/09
    Vol: E105-D No:11
    Page(s): 1916-1920

    Deep neural networks show good performance in image recognition, speech recognition, and pattern analysis. However, deep neural networks also have weaknesses, one of which is vulnerability to poisoning attacks. A poisoning attack reduces the accuracy of a model by training the model on malicious data. A number of studies have been conducted on such poisoning attacks. The existing type of poisoning attack causes misrecognition by one classifier. In certain situations, however, it is necessary for multiple models to misrecognize certain data as different specific classes. For example, if there are enemy autonomous vehicles A, B, and C, a poisoning attack could mislead A to turn to the left, B to stop, and C to turn to the right simply by using a traffic sign. In this paper, we propose a multi-targeted poisoning attack method that causes each of several models to misrecognize certain data as a different target class. This study used MNIST and CIFAR10 as datasets and Tensorflow as a machine learning library. The experimental results show that the proposed scheme has a 100% average attack success rate on MNIST and CIFAR10 when malicious data accounting for 5% of the training dataset have been used for training.

  • Non-Orthogonal Physical Layer (NOPHY) Design towards 5G Evolution and 6G

    Xiaolin HOU  Wenjia LIU  Juan LIU  Xin WANG  Lan CHEN  Yoshihisa KISHIYAMA  Takahiro ASAI  

     
    PAPER-Wireless Communication Technologies

    Publicized: 2022/04/26
    Vol: E105-B No:11
    Page(s): 1444-1457

    5G has achieved large-scale commercialization across the world and the global 6G research and development is accelerating. To support more new use cases, 6G mobile communication systems should satisfy extreme performance requirements far beyond 5G. The physical layer key technologies are the basis of the evolution of mobile communication systems of each generation, among which three key technologies, i.e., duplex, waveform and multiple access, are the iconic characteristics of mobile communication systems of each generation. In this paper, we systematically review the development history and trend of the three key technologies and define the Non-Orthogonal Physical Layer (NOPHY) concept for 6G, including Non-Orthogonal Duplex (NOD), Non-Orthogonal Multiple Access (NOMA) and Non-Orthogonal Waveform (NOW). Firstly, we analyze the necessity and feasibility of NOPHY from the perspective of capacity gain and implementation complexity. Then we discuss the recent progress of NOD, NOMA and NOW, and highlight several candidate technologies and their potential performance gain. Finally, combined with the new trend of 6G, we put forward a unified physical layer design based on NOPHY that well balances performance against flexibility, and point out the possible direction for the research and development of 6G physical layer key technologies.

  • Toward Selective Membership Inference Attack against Deep Learning Model

    Hyun KWON  Yongchul KIM  

     
    LETTER

    Publicized: 2022/07/26
    Vol: E105-D No:11
    Page(s): 1911-1915

    In this paper, we propose a selective membership inference attack method that determines whether certain data corresponding to a specific class are being used as training data for a machine learning model or not. By using the proposed method, membership or non-membership can be inferred by generating a decision model from the prediction of the inference models and training the confidence values for the data corresponding to the selected class. We used MNIST as an experimental dataset and Tensorflow as a machine learning library. Experimental results show that the proposed method has a 92.4% success rate with 5 inference models for data corresponding to a specific class.

  • Loosening Bolts Detection of Bogie Box in Metro Vehicles Based on Deep Learning

    Weiwei QI  Shubin ZHENG  Liming LI  Zhenglong YANG  

     
    LETTER-Image Recognition, Computer Vision

    Publicized: 2022/07/28
    Vol: E105-D No:11
    Page(s): 1990-1993

    Bolts in the bogie box of metro vehicles are fasteners that are significant for the bogie box structure. Effective detection of loosening bolts at an early stage can avoid bolt loss and accidents. Recently, detection methods based on machine vision have been developed for bolt loosening. However, traditional image processing and machine learning methods have high miss and false-detection rates for bolts due to their small size and the complex background. To address this problem, a loosening-bolt detection method based on deep learning is proposed. The proposed method cascades two stages in a coarse-to-fine manner: a location stage based on the Single Shot Multibox Detector (SSD) and an improved SSD, which sequentially localize the bogie box and the bolts, and a semantic segmentation stage with the U-shaped Network (U-Net) to detect the looseness of the bolts. The accuracy and effectiveness of the proposed method are verified with images captured from Shanghai Metro Line 9. The results show that the proposed method has a higher accuracy in detecting bolt loosening, which can guarantee the stable operation of metro vehicles.

  • A COM Based High Speed Serial Link Optimization Using Machine Learning (Open Access)

    Yan WANG  Qingsheng HU  

     
    PAPER

    Publicized: 2022/05/09
    Vol: E105-C No:11
    Page(s): 684-691

    This paper presents a channel operating margin (COM) based high-speed serial link optimization using machine learning (ML). COM, which was proposed for evaluating serial links, is calculated first; during the calculation, several important equalization parameters corresponding to the best configuration are extracted, which can be used for the ML modeling of the serial link. Then a deep neural network containing hidden layers is investigated to model the whole serial link equalization, including the transmitter feed-forward equalizer (FFE), the receiver continuous-time linear equalizer (CTLE), and the decision feedback equalizer (DFE). By training, validating, and testing on a large number of samples that meet the COM specification of 400GAUI-8 C2C, an effective ML model is generated, and the maximum relative error is only 0.1 compared with the computation results. Finally, three link configurations are discussed from the viewpoint of the tradeoff between link performance and cost, illustrating that our COM-based ML modeling method can be applied to advanced serial link design for NRZ, PAM4, or even higher-level pulse amplitude modulation signals.

  • A Low-Power High-Speed Sensing Scheme for Single-Ended SRAM

    Dashan SHI  Heng YOU  Jia YUAN  Yulian WANG  Shushan QIAO  

     
    PAPER-Integrated Electronics

    Publicized: 2022/05/06
    Vol: E105-C No:11
    Page(s): 712-719

    In this paper, a reference-voltage self-selected pseudo-differential sensing scheme suitable for single-ended SRAM is proposed. The proposed sensing scheme can select a different reference voltage according to the offset direction. With the new sensing scheme, the swing of the read bit-line in the read operation is reduced by 74.6% and 45.5% compared to the conventional domino and the pseudo-differential sense amplifier sensing schemes, respectively. Therefore, the delay and power consumption of the read operation are significantly improved. Simulation results based on a standard 55-nm CMOS process show that, compared with the conventional domino and pseudo-differential sensing schemes, the sensing delay is improved by 66.4% and 47.7%, and the power consumption is improved by 31.4% and 22.5%, respectively. Although the area of the sensing scheme is increased by 50.8% compared with the pseudo-differential sense amplifier sensing scheme, this has little effect on the entire SRAM area.

  • Priority Evasion Attack: An Adversarial Example That Considers the Priority of Attack on Each Classifier

    Hyun KWON  Changhyun CHO  Jun LEE  

     
    PAPER

    Publicized: 2022/08/23
    Vol: E105-D No:11
    Page(s): 1880-1889

    Deep neural networks (DNNs) provide excellent services in machine learning tasks such as image recognition, speech recognition, pattern recognition, and intrusion detection. However, an adversarial example created by adding a little noise to the original data can result in misclassification by the DNN and the human eye cannot tell the difference from the original data. For example, if an attacker creates a modified right-turn traffic sign that is incorrectly categorized by a DNN, an autonomous vehicle with the DNN will incorrectly classify the modified right-turn traffic sign as a U-Turn sign, while a human will correctly classify that changed sign as right turn sign. Such an adversarial example is a serious threat to a DNN. Recently, an adversarial example with multiple targets was introduced that causes misclassification by multiple models within each target class using a single modified image. However, it has the weakness that as the number of target models increases, the overall attack success rate decreases. Therefore, if there are multiple models that the attacker wishes to attack, the attacker must control the attack success rate for each model by considering the attack priority for each model. In this paper, we propose a priority adversarial example that considers the attack priority for each model in cases targeting multiple models. The proposed method controls the attack success rate for each model by adjusting the weight of the attack function in the generation process while maintaining minimal distortion. We used MNIST and CIFAR10 as data sets and Tensorflow as machine learning library. Experimental results show that the proposed method can control the attack success rate for each model by considering each model's attack priority while maintaining minimal distortion (average 3.95 and 2.45 with MNIST for targeted and untargeted attacks, respectively, and average 51.95 and 44.45 with CIFAR10 for targeted and untargeted attacks, respectively).

  • Output Power Characterization of Flexible Thermoelectric Power Generators

    Daiki KANSAKU  Nobuhiro KAWASE  Naoki FUJIWARA  Faizan KHAN  Arockiyasamy Periyanayaga KRISTY  Kuruvankatil Dharmajan NISHA  Toshitaka YAMAKAWA  Kazushi IKEDA  Yasuhiro HAYAKAWA  Kenji MURAKAMI  Masaru SHIMOMURA  Hiroya IKEDA  

     
    BRIEF PAPER

    Publicized: 2022/04/21
    Vol: E105-C No:10
    Page(s): 639-642

    To facilitate the reuse of environmental waste heat in our society, we have developed high-efficiency flexible thermoelectric power generators (TEPGs). In this study, we investigated the thermoelectromotive force (TEMF) and output power of a prototype device with 50 pairs of Π-type structures using a homemade measurement system for flexible TEPGs in order to evaluate their characteristics along the thickness direction. The prototype device consisted of C fabrics (CAFs) used as p-type materials, NiCu fabrics (NCFs) used as n-type materials, and Ag fabrics (AGFs) used as metal electrodes. Applying a temperature difference of 5K, we obtained a TEMF of 150μV and maximum output power of 6.4pW. The obtained TEMF was smaller than that expected from the Seebeck coefficients of each fabric, which is considered to be mainly because of the influence of contact thermal resistance at the semiconductor-fabric/AGF interfaces.
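
The reported figures can be put in perspective with two back-of-the-envelope relations. The sketch below assumes that the 50 pairs are electrically in series, so the TEMF divides evenly among them, and that the maximum output power is reached at a matched load, where P_max = V^2 / (4 R). Neither assumption is stated in the abstract, and the function names are hypothetical.

```python
def temf_per_pair_per_kelvin(temf_v: float, delta_t_k: float, n_pairs: int) -> float:
    """Effective thermoelectromotive force per pair and per kelvin (V/K).

    Assumes the pairs are connected in series and share the same
    temperature difference.
    """
    if delta_t_k <= 0 or n_pairs <= 0:
        raise ValueError("temperature difference and pair count must be positive")
    return temf_v / (delta_t_k * n_pairs)


def internal_resistance_ohm(temf_v: float, p_max_w: float) -> float:
    """Internal resistance implied by P_max = V^2 / (4 R) at a matched load."""
    if p_max_w <= 0:
        raise ValueError("maximum power must be positive")
    return temf_v ** 2 / (4.0 * p_max_w)


if __name__ == "__main__":
    temf = 150e-6   # V, from the abstract
    delta_t = 5.0   # K, from the abstract
    pairs = 50      # from the abstract
    p_max = 6.4e-12  # W, from the abstract
    print(temf_per_pair_per_kelvin(temf, delta_t, pairs))
    print(internal_resistance_ohm(temf, p_max))
```

Under these assumptions the device delivers about 0.6 μV/K per pair and behaves like a source with an internal resistance of roughly 880 Ω.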

  • A 0.4-V 29-GHz-Bandwidth Power-Scalable Distributed Amplifier in 55-nm CMOS DDC Process

    Sangyeop LEE  Shuhei AMAKAWA  Takeshi YOSHIDA  Minoru FUJISHIMA  

     
    BRIEF PAPER

    Publicized: 2022/04/11
    Vol: E105-C No:10
    Page(s): 561-564

    A power-scalable wideband distributed amplifier is proposed. For reducing the power consumption of this power-hungry amplifier, it is efficient to lower the supply voltage. However, there is a hurdle owing to the transistor threshold voltage. In this work, a CMOS deeply depleted channel process is employed to overcome the hurdle.

  • Compressed Sensing EEG Measurement Technique with Normally Distributed Sampling Series

    Yuki OKABE  Daisuke KANEMOTO  Osamu MAIDA  Tetsuya HIROSE  

     
    LETTER-Measurement Technology

    Publicized: 2022/04/22
    Vol: E105-A No:10
    Page(s): 1429-1433

    We propose a sampling method that incorporates a normally distributed sampling series for EEG measurements using compressed sensing. We confirmed that the ADC sampling count and amount of wirelessly transmitted data can be reduced by 11% while maintaining a reconstruction accuracy similar to that of the conventional method.
