
Keyword Search Result

[Keyword] SPE(2504hit)

41-60hit(2504hit)

  • An Exploration of Cross-Patch Collaborations via Patch Linkage in OpenStack

    Dong WANG  Patanamon THONGTANUNAM  Raula GAIKOVINA KULA  Kenichi MATSUMOTO  

     
    PAPER

      Publicized:
    2022/11/18
      Vol:
    E106-D No:2
      Page(s):
    148-156

    Contemporary development projects benefit from code review as it improves the quality of a project. Large ecosystems of inter-dependent projects like OpenStack generate a large number of reviews, which poses new challenges for collaboration (improving patches, fixing defects). Review tools allow developers to link patches to indicate patch dependency, competing solutions, or broader context. We hypothesize that such patch linkage may also stimulate cross-collaboration. With a case study of OpenStack, we take a first step to explore collaborations that occur after a patch linkage is posted between two patches (i.e., cross-patch collaboration). Our empirical results show that although patch linkage that requests collaboration is relatively less prevalent, the probability of collaboration is relatively higher. Interestingly, the results also show that collaborative contributions via patch linkage are non-trivial, i.e., contributions can affect the review outcome (such as voting) or even improve the patch (i.e., revising). This work opens up future directions to understand barriers and opportunities related to this new kind of collaboration that assists with code review and development tasks in large ecosystems.

  • Suppression Effect of Randomly-Disturbed LC Alignment Fluctuation on Speckle Noise for Electronic Holography Imaging Open Access

    Masatoshi YAITA  Yosei SHIBATA  Takahiro ISHINABE  Hideo FUJIKAKE  

     
    INVITED PAPER

      Publicized:
    2022/09/08
      Vol:
    E106-C No:2
      Page(s):
    26-33

    In this paper, we propose a phase-disturbing device that uses randomly fluctuated liquid crystal (LC) alignment to reduce the speckle noise generated in holographic displays. Several parameters corresponding to the alignment fluctuation of a thick LC layer were quantitatively evaluated, and we clarified how the LC alignment fluctuation, characterized by these parameters, affects speckle noise reduction.

  • Comparative Evaluation of Diverse Features in Fluency Evaluation of Spontaneous Speech

    Huaijin DENG  Takehito UTSURO  Akio KOBAYASHI  Hiromitsu NISHIZAKI  

     
    PAPER-Speech and Hearing

      Publicized:
    2022/10/25
      Vol:
    E106-D No:1
      Page(s):
    36-45

    There have been many previous studies on fluency evaluation of spontaneous speech. However, most of them focus on lexical cues, and little emphasis is placed on how diverse acoustic features and deep end-to-end models contribute to improving performance. In this paper, we describe a multi-layer neural network that uses not only lexical features extracted from transcriptions but also utterance-level acoustic features from audio data. We also conduct experiments to investigate the performance of end-to-end approaches with mel-spectrograms in this task. As the speech fluency evaluation task, we evaluate our proposed method on two binary classification tasks: fluent speech detection and disfluent speech detection. Speech data of around 10 seconds each, annotated with the three classes “fluent,” “neutral,” and “disfluent,” is used for evaluation. According to the two-way splits of those three classes, fluent speech detection is defined as binary classification of fluent vs. neutral and disfluent, while disfluent speech detection is defined as binary classification of fluent and neutral vs. disfluent. We then conduct experiments for comparative evaluation of the multi-layer neural network with diverse features as well as end-to-end models. For fluent speech detection, in the comparison of utterance-level disfluency-based, prosodic, and acoustic features with the multi-layer neural network, using only disfluency-based and prosodic features performs better. More specifically, performance improves substantially when all of the acoustic features are removed from the full feature set, while it degrades substantially if filler-related features are removed. Overall, however, the end-to-end Transformer+VGGNet model with mel-spectrograms achieves the best results. For disfluent speech detection, the multi-layer neural network using disfluency-based, prosodic, and acoustic features without fillers achieves the best results. The end-to-end Transformer+VGGNet architecture also obtains high scores, but it is exceeded by the best results of the multi-layer neural network with a significant difference. Thus, unlike in fluent speech detection, disfluency-based and prosodic features other than fillers are still necessary in disfluent speech detection.
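    The two-way splits of the three annotation classes described in this abstract can be illustrated with a minimal sketch. The function names and the string labels below are hypothetical, chosen only for illustration; the abstract does not specify how the labels are encoded.

```python
# Hypothetical label strings taken from the three classes named in the abstract.
CLASSES = ("fluent", "neutral", "disfluent")


def fluent_detection_label(label: str) -> int:
    """Fluent speech detection: fluent (1) vs. neutral and disfluent (0)."""
    if label not in CLASSES:
        raise ValueError(f"unknown class: {label!r}")
    return 1 if label == "fluent" else 0


def disfluent_detection_label(label: str) -> int:
    """Disfluent speech detection: disfluent (1) vs. fluent and neutral (0)."""
    if label not in CLASSES:
        raise ValueError(f"unknown class: {label!r}")
    return 1 if label == "disfluent" else 0


if __name__ == "__main__":
    for name in CLASSES:
        print(name, fluent_detection_label(name), disfluent_detection_label(name))
```

    Under this encoding, "neutral" falls on the negative side in both tasks, which is what makes the two detection tasks differ.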

  • A Non-Intrusive Speech Quality Evaluation Method Based on the Audiogram and Weighted Frequency Information for Hearing Aid

    Ruxue GUO  Pengxu JIANG  Ruiyu LIANG  Yue XIE  Cairong ZOU  

     
    LETTER-Speech and Hearing

      Publicized:
    2022/07/25
      Vol:
    E106-A No:1
      Page(s):
    64-68

    For a long time, the compensation effect of hearing aids has mainly been evaluated subjectively, and there are few studies on objective evaluation. Furthermore, a pure speech signal is generally required as a reference in the existing objective evaluation methods, which restricts their practicality in real-world environments. Therefore, this paper presents a non-intrusive speech quality evaluation method for hearing aids, which combines the audiogram and weighted frequency information. The proposed model mainly includes an audiogram information extraction network, a frequency information extraction network, and a quality score mapping network. The audiogram is the input of the audiogram information extraction network, which helps the system capture the information related to hearing loss. In addition, the low-frequency bands of speech contain loudness information, and the medium and high-frequency components contribute to semantic comprehension. The information of the two frequency bands is input to the frequency information extraction network to obtain time-frequency information. Once the high-level features of the different frequency bands and audiograms are obtained, they are fused into two groups of tensors that distinguish the information of the different frequency bands and are used as the input of the attention layer to calculate the corresponding weight distribution. Finally, a dense layer is employed to predict the speech quality score. The experimental results show that it is reasonable to combine the audiogram and the weights of the information from the two frequency bands, which can effectively realize the evaluation of the speech quality of the hearing aid.

  • A Novel e-Cash Payment System with Divisibility Based on Proxy Blind Signature in Web of Things

    Iuon-Chang LIN  Chin-Chen CHANG  Hsiao-Chi CHIANG  

     
    PAPER-Information Network

      Publicized:
    2022/09/02
      Vol:
    E105-D No:12
      Page(s):
    2092-2103

    The prosperity of Internet communication technologies has led to e-commerce in mobile computing and made the Web of Things popular. Electronic payment is the most important part of e-commerce, so many electronic payment schemes have been proposed. However, most of the proposed schemes cannot give change. Based on proxy blind signatures, an e-cash payment system is proposed in this paper to solve this problem. This system not only provides change divisibility through the Web of Things but also provides anonymity, verifiability, unforgeability, and double-spending owner tracking.

  • Model-Agnostic Multi-Domain Learning with Domain-Specific Adapters for Action Recognition

    Kazuki OMI  Jun KIMATA  Toru TAMAKI  

     
    PAPER-Image Recognition, Computer Vision

      Publicized:
    2022/09/15
      Vol:
    E105-D No:12
      Page(s):
    2119-2126

    In this paper, we propose a multi-domain learning model for action recognition. The proposed method inserts domain-specific adapters between domain-independent layers of a backbone network. Unlike a multi-head network that switches classification heads only, our model switches not only the heads but also the adapters, which facilitates learning feature representations universal to multiple domains. Unlike prior works, the proposed method is model-agnostic and doesn't assume model structures. Experimental results on three popular action recognition datasets (HMDB51, UCF101, and Kinetics-400) demonstrate that the proposed method is more effective than a multi-head architecture and more efficient than separately training models for each domain.

  • Robust Speech Recognition Using Teacher-Student Learning Domain Adaptation

    Han MA  Qiaoling ZHANG  Roubing TANG  Lu ZHANG  Yubo JIA  

     
    PAPER-Speech and Hearing

      Publicized:
    2022/09/09
      Vol:
    E105-D No:12
      Page(s):
    2112-2118

    Recently, robust speech recognition for real-world applications has attracted much attention. This paper proposes a robust speech recognition method based on the teacher-student learning framework for domain adaptation. In particular, the student network is trained with a novel optimization criterion defined on the encoder outputs of both the teacher and student networks rather than on the final output posterior probabilities. The aim is to map noisy audio to the same embedding space as clean audio, so that the student network adapts to the noise domain. Comparative experiments demonstrate that the proposed method achieves good robustness against noise.

  • A COM Based High Speed Serial Link Optimization Using Machine Learning Open Access

    Yan WANG  Qingsheng HU  

     
    PAPER

      Publicized:
    2022/05/09
      Vol:
    E105-C No:11
      Page(s):
    684-691

    This paper presents a channel operating margin (COM) based high-speed serial link optimization using machine learning (ML). COM, which was proposed for evaluating serial links, is calculated first; during the calculation, several important equalization parameters corresponding to the best configuration are extracted, which can be used for the ML modeling of the serial link. Then a deep neural network containing hidden layers is investigated to model the whole serial link equalization, including the transmitter feed-forward equalizer (FFE), the receiver continuous-time linear equalizer (CTLE), and the decision feedback equalizer (DFE). By training, validating, and testing on a large number of samples that meet the COM specification of 400GAUI-8 C2C, an effective ML model is generated, and the maximum relative error is only 0.1 compared with computation results. Finally, three link configurations are discussed in terms of the tradeoff between link performance and cost, illustrating that our COM-based ML modeling method can be applied to advanced serial link design for NRZ, PAM4, or even higher-level pulse amplitude modulation signals.

  • A Low-Power High-Speed Sensing Scheme for Single-Ended SRAM

    Dashan SHI  Heng YOU  Jia YUAN  Yulian WANG  Shushan QIAO  

     
    PAPER-Integrated Electronics

      Publicized:
    2022/05/06
      Vol:
    E105-C No:11
      Page(s):
    712-719

    In this paper, a reference-voltage self-selected pseudo-differential sensing scheme suitable for single-ended SRAM is proposed. The proposed sensing scheme can select different reference voltages according to the offset direction. With the new sensing scheme, the swing of the read bit-line in the read operation is reduced by 74.6% and 45.5% compared to the conventional domino and the pseudo-differential sense amplifier sensing schemes, respectively. Therefore, the delay and power consumption of the read operation are significantly improved. Simulation results based on a standard 55nm CMOS process show that, compared with the conventional domino and pseudo-differential sensing schemes, the sensing delay is improved by 66.4% and 47.7%, and the power consumption is improved by 31.4% and 22.5%, respectively. Although the area of the sensing scheme is increased by 50.8% compared with the pseudo-differential sense amplifier sensing scheme, it has little effect on the entire SRAM area.

  • A Characterization on Necessary Conditions of Realizability for Reactive System Specifications

    Takashi TOMITA  Shigeki HAGIHARA  Masaya SHIMAKAWA  Naoki YONEZAKI  

     
    PAPER

      Publicized:
    2022/04/08
      Vol:
    E105-D No:10
      Page(s):
    1665-1677

    This paper focuses on verification for reactive system specifications. A reactive system is an open system that continuously interacts with an uncontrollable external environment, and it must often be highly safe and reliable. However, realizability checking for a given specification is very costly, so we need effective methods to detect and analyze defects in unrealizable specifications to refine them efficiently. We introduce a systematic characterization on necessary conditions of realizability. This characterization is based on quantifications for inputs and outputs in early and late behaviors and reveals four essential aspects of realizability: exhaustivity, strategizability, preservability and stability. Additionally, the characterization derives new necessary conditions, which enable us to classify unrealizable specifications systematically and hierarchically.

  • Admittance Spectroscopy Up to 67 GHz in InGaAs/InAlAs Triple-Barrier Resonant Tunneling Diodes

    Kotaro AIKAWA  Michihiko SUHARA  Takumi KIMURA  Junki WAKAYAMA  Takeshi MAKINO  Katsuhiro USUI  Kiyoto ASAKAWA  Kouichi AKAHANE  Issei WATANABE  

     
    BRIEF PAPER

      Publicized:
    2022/06/30
      Vol:
    E105-C No:10
      Page(s):
    622-626

    S-parameters of InGaAs/InAlAs triple-barrier resonant tunneling diodes (TBRTDs) were measured up to 67 GHz with various mesa areas and various bias voltages. Admittance data of bare TBRTDs are de-embedded and evaluated by removing parasitic components with the help of electromagnetic simulations of the particular fabricated device structures. Admittance spectroscopy up to 67 GHz is applied to bare TBRTDs for the first time, and a Kramers-Kronig relation with a Lorentzian function is found to be a consistent model for the admittance, especially under low bias conditions. The relaxation time included in the Lorentzian function is tentatively evaluated to be on the order of several picoseconds.

  • Non-Destructive Inspection of Twisted Wire in Resin Cover Using Terahertz Wave Open Access

    Masaki NAKAMORI  Yukihiro GOTO  Tomoya SHIMIZU  Nazuki HONDA  

     
    PAPER-Transmission Systems and Transmission Equipment for Communications

      Publicized:
    2022/04/13
      Vol:
    E105-B No:10
      Page(s):
    1202-1208

    We propose a new method for evaluating the deterioration of messenger wires by using terahertz waves. We use terahertz time-domain spectroscopy to measure several twisted wire samples with different levels of deterioration. We find that each twisted wire sample has a different distribution of reflection intensity, which is due to the wires' twist structure. We show that it is possible to assess the degradation from the straight lines present in the reflection intensity distribution image. Furthermore, we confirm that our method can be applied to wire covered with resin.

  • Convolutional Auto-Encoder and Adversarial Domain Adaptation for Cross-Corpus Speech Emotion Recognition

    Yang WANG  Hongliang FU  Huawei TAO  Jing YANG  Hongyi GE  Yue XIE  

     
    LETTER-Artificial Intelligence, Data Mining

      Publicized:
    2022/07/12
      Vol:
    E105-D No:10
      Page(s):
    1803-1806

    This letter focuses on the cross-corpus speech emotion recognition (SER) task, in which the training and testing speech signals belong to different speech corpora. Existing algorithms are incapable of effectively extracting common sentiment information between different corpora to facilitate knowledge transfer. To address this challenging problem, a novel convolutional auto-encoder and adversarial domain adaptation (CAEADA) framework for cross-corpus SER is proposed. The framework first constructs a one-dimensional convolutional auto-encoder (1D-CAE) for feature processing, which can explore the correlation among adjacent one-dimensional statistical features; the encoder-decoder-style architecture enhances the feature representation. Subsequently, the adversarial domain adaptation (ADA) module alleviates the discrepancy in feature distributions between the source and target domains by confusing the domain discriminator, and specifically employs maximum mean discrepancy (MMD) to better accomplish feature transformation. To evaluate the proposed CAEADA, extensive experiments were conducted on the EmoDB, eNTERFACE, and CASIA speech corpora, and the results show that the proposed method outperformed other approaches.
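    As a rough illustration of the maximum mean discrepancy (MMD) mentioned in this abstract, the sketch below computes the biased squared MMD estimate with a linear kernel, which reduces to the squared distance between the source and target feature means. The kernel choice and the function name `linear_mmd2` are assumptions made for illustration; the abstract does not say which kernel or estimator the letter uses.

```python
from typing import List, Sequence


def _mean(rows: Sequence[Sequence[float]]) -> List[float]:
    """Column-wise mean of a non-empty list of equal-length feature vectors."""
    dim = len(rows[0])
    return [sum(row[j] for row in rows) / len(rows) for j in range(dim)]


def linear_mmd2(source: Sequence[Sequence[float]],
                target: Sequence[Sequence[float]]) -> float:
    """Biased squared MMD with a linear kernel (an assumed kernel choice).

    With k(x, y) = <x, y>, the estimate equals
    ||mean(source) - mean(target)||^2.
    """
    mean_s = _mean(source)
    mean_t = _mean(target)
    return sum((s - t) ** 2 for s, t in zip(mean_s, mean_t))


if __name__ == "__main__":
    src = [[0.0, 0.0], [2.0, 2.0]]
    tgt = [[3.0, 1.0], [5.0, 1.0]]
    print(linear_mmd2(src, tgt))
```

    A value of zero here only means the two feature means coincide; richer kernels (for example Gaussian) compare higher-order statistics as well.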

  • Speech-Like Emotional Sound Generation Using WaveNet

    Kento MATSUMOTO  Sunao HARA  Masanobu ABE  

     
    PAPER-Speech and Hearing

      Publicized:
    2022/05/26
      Vol:
    E105-D No:9
      Page(s):
    1581-1589

    In this paper, we propose a new algorithm to generate Speech-like Emotional Sound (SES). Emotional expressions may be the most important factor in human communication, and speech is one of the most useful means of expressing emotions. Although speech generally conveys both emotional and linguistic information, we have undertaken the challenge of generating sounds that convey emotional information alone. We call the generated sounds “speech-like,” because the sounds do not contain any linguistic information. SES can provide another way to generate emotional response in human-computer interaction systems. To generate “speech-like” sound, we propose employing WaveNet as a sound generator conditioned only by emotional IDs. This concept is quite different from the WaveNet Vocoder, which synthesizes speech using spectrum information as an auxiliary feature. The biggest advantage of our approach is that it reduces the amount of emotional speech data necessary for training by focusing on non-linguistic information. The proposed algorithm consists of two steps. In the first step, to generate a variety of spectrum patterns that resemble human speech as closely as possible, WaveNet is trained with auxiliary mel-spectrum parameters and Emotion ID using a large amount of neutral speech. In the second step, to generate emotional expressions, WaveNet is retrained with auxiliary Emotion ID only using a small amount of emotional speech. Experimental results reveal the following: (1) the two-step training is necessary to generate the SES with high quality, and (2) it is important that the training use a large neutral speech database and spectrum information in the first step to improve the emotional expression and naturalness of SES.

  • Dispersion on Intervals

    Tetsuya ARAKI  Hiroyuki MIYATA  Shin-ichi NAKANO  

     
    PAPER-Algorithms and Data Structures

      Publicized:
    2022/03/08
      Vol:
    E105-A No:9
      Page(s):
    1181-1186

    Given a set of n disjoint intervals on a line and an integer k, we want to find k points in the intervals so that the minimum pairwise distance of the k points is maximized. Intuitively, given a set of n disjoint time intervals on a timeline, each of which is a time span we are allowed to check something, and an integer k, which is the number of times we will check something, we plan k checking times so that the checks occur at equal time intervals as much as possible, that is, we want to maximize the minimum time interval between the k checking times. We call the problem the k-dispersion problem on intervals. If we need to choose exactly one point in each interval, so k=n, and the disjoint intervals are given in the sorted order on the line, then two O(n) time algorithms to solve the problem are known. In this paper we give the first O(n) time algorithm to solve the problem for any constant k. Our algorithm works even if the disjoint intervals are given in any (not sorted) order. If the disjoint intervals are given in the sorted order on the line, then, by slightly modifying the algorithm, one can solve the problem in O(log n) time. This is the first sublinear time algorithm to solve the problem. Also we show some results on the k-dispersion problem on disks, including an FPTAS.
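    To make the problem statement in this abstract concrete, here is a simple baseline sketch: a greedy feasibility check for a candidate distance d, wrapped in a numeric binary search over d. This is not the O(n) or O(log n) algorithm from the paper; it is a straightforward approach that sorts the intervals and returns an approximation of the optimum. The function names and the iteration count are hypothetical.

```python
import math
from typing import List, Tuple

Interval = Tuple[float, float]


def can_place(intervals: List[Interval], k: int, d: float) -> bool:
    """Greedy check: can k points with consecutive gaps >= d (d > 0) be placed?"""
    count = 0
    last = -math.inf  # position of the most recently placed point
    for a, b in sorted(intervals):
        start = max(a, last + d)  # leftmost legal position in this interval
        if start > b:
            continue
        fit = int((b - start) // d) + 1  # points spaced d apart that fit in [start, b]
        count += fit
        last = start + (fit - 1) * d
        if count >= k:
            return True
    return False


def max_min_distance(intervals: List[Interval], k: int, iters: int = 100) -> float:
    """Approximate the largest achievable minimum pairwise distance for k >= 2 points."""
    if k < 2 or not intervals:
        raise ValueError("need k >= 2 and at least one interval")
    lo = 0.0
    # Upper bound: k points evenly spread over the whole covered span.
    hi = (max(b for _, b in intervals) - min(a for a, _ in intervals)) / (k - 1)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if mid > 0 and can_place(intervals, k, mid):
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    print(max_min_distance([(10, 11), (0, 1), (2, 3)], 3))
```

    For the intervals in the usage line, placing points at 0, 3, and 11 gives a minimum gap of 3, and the search converges to that value up to floating-point tolerance.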

  • A novel Adaptive Weighted Transfer Subspace Learning Method for Cross-Database Speech Emotion Recognition

    Keke ZHAO  Peng SONG  Shaokai LI  Wenjing ZHANG  Wenming ZHENG  

     
    LETTER-Speech and Hearing

      Publicized:
    2022/06/09
      Vol:
    E105-D No:9
      Page(s):
    1643-1646

    In this letter, we present an adaptive weighted transfer subspace learning (AWTSL) method for cross-database speech emotion recognition (SER), which can efficiently eliminate the discrepancy between source and target databases. Specifically, on one hand, a subspace projection matrix is first learned to project the cross-database features into a common subspace. At the same time, each target sample can be represented by the source samples by using a sparse reconstruction matrix. On the other hand, we design an adaptive weighted matrix learning strategy, which can improve the reconstruction contribution of important features and eliminate the negative influence of redundant features. Finally, we conduct extensive experiments on four benchmark databases, and the experimental results demonstrate the efficacy of the proposed method.

  • Joint User Association and Spectrum Allocation in Satellite-Terrestrial Integrated Networks

    Wenjing QIU  Aijun LIU  Chen HAN  Aihong LU  

     
    PAPER-Wireless Communication Technologies

      Publicized:
    2022/03/15
      Vol:
    E105-B No:9
      Page(s):
    1063-1077

    This paper investigates the joint problem of user association and spectrum allocation in satellite-terrestrial integrated networks (STINs), where a low earth orbit (LEO) satellite access network cooperating with terrestrial networks constitutes a heterogeneous network, which is beneficial in terms of both providing seamless coverage as well as improving the backhaul capacity for the dense network scenario. However, the orbital movement of satellites results in the dynamic change of accessible satellites and the backhaul capacities. Moreover, spectrum sharing may be faced with severe co-channel interferences (CCIs) caused by overlapping coverage of multiple access points (APs). This paper aims to maximize the total sum rate considering the influences of the dynamic feature of STIN, backhaul capacity limitation and interference management. The optimization problem is then decomposed into two subproblems: resource allocation for terrestrial communications and satellite communications, which are both solved by matching algorithms. Finally, simulation results show the effectiveness of our proposed scheme in terms of STIN's sum rate and spectrum efficiency.

  • Altered Fingerprints Detection Based on Deep Feature Fusion

    Chao XU  Yunfeng YAN  Lehangyu YANG  Sheng LI  Guorui FENG  

     
    LETTER-Image Processing and Video Processing

      Publicized:
    2022/06/13
      Vol:
    E105-D No:9
      Page(s):
    1647-1651

    Altered fingerprints help criminals evade the police and cause great harm to society. In this letter, an altered fingerprint detection method is proposed. The method uses two deep convolutional neural networks to learn time-domain and frequency-domain features. A spectral attention module is added to connect the two networks. After the extraction networks, a feature fusion module is used to exploit the relationship between the features of the two networks. We conduct ablation experiments and add the proposed module to some popular architectures. Results show that the proposed method improves the performance of altered fingerprint detection compared with recent neural networks.

  • A Two-Fold Cross-Validation Training Framework Combined with Meta-Learning for Code-Switching Speech Recognition

    Zheying HUANG  Ji XU  Qingwei ZHAO  Pengyuan ZHANG  

     
    LETTER-Speech and Hearing

      Publicized:
    2022/06/20
      Vol:
    E105-D No:9
      Page(s):
    1639-1642

    Although end-to-end speech recognition research for Mandarin-English code-switching has attracted increasing interest, it remains challenging due to data scarcity. The meta-learning approach is popular for low-resource modeling using high-resource data, but it does not make full use of low-resource code-switching data. Therefore, we propose a two-fold cross-validation training framework combined with a meta-learning approach. Experiments on the SEAME corpus demonstrate the effectiveness of our method.

  • Fast Gated Recurrent Network for Speech Synthesis

    Bima PRIHASTO  Tzu-Chiang TAI  Pao-Chi CHANG  Jia-Ching WANG  

     
    LETTER-Speech and Hearing

      Publicized:
    2022/06/10
      Vol:
    E105-D No:9
      Page(s):
    1634-1638

    The recurrent neural network (RNN) has been used in audio and speech processing, such as language translation and speech recognition. Although RNN-based architectures can be applied to speech synthesis, the long computing time is still the primary concern. This research proposes a fast gated recurrent neural network, a fast RNN-based architecture, for speech synthesis based on the minimal gated unit (MGU). Our architecture removes the unit state history from some equations in the MGU. Our MGU-based architecture is about twice as fast as other MGU-based architectures, with equally good sound quality.
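    For readers unfamiliar with the minimal gated unit, the sketch below shows one time step of the MGU as it is commonly formulated, with a single forget gate. It does not reproduce the letter's fast variant: the abstract does not say which equations drop the unit state history, so that modification is left out. The function names and the pure-Python list representation are assumptions for illustration.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def matvec(W: Matrix, v: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def mgu_step(x: Vector, h_prev: Vector,
             Wf: Matrix, bf: Vector, Wh: Matrix, bh: Vector) -> Vector:
    """One step of a commonly formulated MGU (not the letter's fast variant).

    Wf and Wh have shape (hidden, hidden + input) and act on the
    concatenation [h, x].
    """
    # Forget gate: f = sigmoid(Wf [h_prev, x] + bf)
    f = [sigmoid(a + b) for a, b in zip(matvec(Wf, h_prev + x), bf)]
    # Candidate state: h_tilde = tanh(Wh [f * h_prev, x] + bh)
    gated = [fi * hi for fi, hi in zip(f, h_prev)]
    h_tilde = [math.tanh(a + b) for a, b in zip(matvec(Wh, gated + x), bh)]
    # New state: h = (1 - f) * h_prev + f * h_tilde
    return [(1.0 - fi) * hp + fi * ht for fi, hp, ht in zip(f, h_prev, h_tilde)]


if __name__ == "__main__":
    # Hidden size 1, input size 1, all-zero parameters.
    print(mgu_step([1.0], [2.0], [[0.0, 0.0]], [0.0], [[0.0, 0.0]], [0.0]))
```

    With all-zero parameters the gate is 0.5 and the candidate state is 0, so the new state is half of the previous one, which is a convenient sanity check.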
