Keyword Search Result

[Keyword] EE (4079 hits)

Results 181-200 (of 4079 hits)

  • Sample Selection Approach with Number of False Predictions for Learning with Noisy Labels

    Yuichiro NOMURA  Takio KURITA  

     
    PAPER-Image Recognition, Computer Vision

      Publicized:
    2022/07/21
      Vol:
    E105-D No:10
      Page(s):
    1759-1768

    In recent years, deep neural networks (DNNs) have had a significant impact on a variety of research fields and applications. One drawback of DNNs is that they require huge datasets for training. Since it is very expensive to ask experts to label data, many non-expert data collection methods, such as web crawling, have been proposed. However, datasets created by non-experts often contain corrupted labels, and DNNs trained on such datasets are unreliable. Because DNNs have an enormous number of parameters, they tend to overfit to noisy labels, resulting in poor generalization performance. This problem is called Learning with Noisy Labels (LNL). Recent studies showed that DNNs are robust to noisy labels in the early stage of learning, before over-fitting to them, because DNNs learn simple patterns first. Therefore, in the early stage of learning, DNNs tend to output the true labels for samples with noisy labels, so the number of false predictions (predictions that disagree with the given label) is higher for samples with noisy labels than for samples with clean labels. Based on these observations, we propose a new sample selection approach for LNL using the number of false predictions. Our method periodically collects records of false predictions during training and selects samples with a low number of false predictions from the recent records. Our method then iteratively performs sample selection and trains a DNN model on the updated dataset. Since the model is trained with more clean samples and records more accurate false predictions for sample selection, its generalization performance gradually increases. We evaluated our method on two benchmark datasets, CIFAR-10 and CIFAR-100, with synthetically generated noisy labels, and obtained results that are better than or comparable to state-of-the-art approaches.
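
    The selection loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. The class and method names, the window length `window`, and the threshold `max_false` are hypothetical. Predictions are assumed to be recorded once per epoch against the given (possibly noisy) labels.

```python
from collections import deque


class FalsePredictionRecorder:
    """Keeps, per sample, the last `window` records (True = false prediction)."""

    def __init__(self, num_samples, window=5):
        self.history = [deque(maxlen=window) for _ in range(num_samples)]

    def record(self, predictions, given_labels):
        # Called periodically (e.g. once per epoch) with the model's predicted labels.
        for hist, pred, label in zip(self.history, predictions, given_labels):
            hist.append(pred != label)

    def false_counts(self):
        return [sum(hist) for hist in self.history]

    def select(self, max_false=1):
        # Samples the model rarely disagrees with are treated as likely clean.
        return [i for i, c in enumerate(self.false_counts()) if c <= max_false]


# Toy run: sample 2 carries a noisy label that the model keeps "correcting".
recorder = FalsePredictionRecorder(num_samples=4, window=3)
labels = [0, 1, 1, 2]
for preds in ([0, 1, 0, 2], [0, 1, 0, 2], [0, 0, 0, 2]):
    recorder.record(preds, labels)
print(recorder.false_counts())  # [0, 1, 3, 0]
print(recorder.select())        # [0, 1, 3]
```

    In the scheme described in the abstract, training would then continue on the selected indices, and recording and selection would repeat as the model improves.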

  • Multi-Port Amplifier with Enhanced Linearity and Isolation Employing Feed-Forward Techniques

    Yasunori SUZUKI  Tetsuo HIROTA  Toshio NOJIMA  

     
    PAPER

      Publicized:
    2022/03/25
      Vol:
    E105-C No:10
      Page(s):
    501-508

    This paper proposes a new multi-port amplifier configuration that employs feed-forward techniques. A multi-port amplifier is generally used as a transponder in a satellite transmitter. It comprises an N-input, N-output input-side matrix network, N amplifiers, and an N-input, N-output output-side matrix network. In this configuration, power from the other, undesired ports leaks into the desired port. If the power amplifier of a cellular base station uses a multi-port amplifier, this leakage from the other ports degrades the error vector magnitude. The proposed configuration employs N parallel feed-forward amplifiers with a multi-port amplifier as the main amplifier, and the feed-forward techniques drastically reduce the power leakage. An experimental 2-GHz band four-input, four-output multi-port amplifier was constructed and tested. It achieves a leakage power level of -58 dB, a gain deviation of less than 0.05 dB, and a phase deviation of less than 0.45 deg. with a maximum power of 35 dBm over a 20-MHz bandwidth centered at 2.14 GHz at room temperature. The experimental multi-port amplifier reduces the leakage power level by approximately 30 dB compared with a multi-port amplifier without the feed-forward techniques. The proposed configuration can be applied to power amplifiers in cellular base stations.
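
    The sketch below shows why undesired ports leak power into the desired port. It models an idealized 4-in 4-out multi-port amplifier in which both matrix networks are assumed to be normalized Hadamard matrices. This is a common idealization of hybrid-coupler networks, not necessarily the authors' circuit. With identical amplifiers, each input reaches exactly one output, and any gain or phase mismatch among the amplifiers spills power into the other ports. The feed-forward correction proposed in the paper is not modeled, and the mismatch values are hypothetical.

```python
import cmath
import math


def hadamard(n):
    """Sylvester-construction Hadamard matrix of order n (n a power of two)."""
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def mpa_transfer(gains):
    """Port-to-port transfer: output network x diag(gains) x input network.

    Both networks are modeled as H / sqrt(N) (an idealization). H is symmetric
    and H * H = N * I, so with equal gains the transfer is gain * identity.
    """
    n = len(gains)
    h = hadamard(n)
    return [[sum(h[j][k] * gains[k] * h[k][i] for k in range(n)) / n
             for i in range(n)] for j in range(n)]


def worst_leakage_db(gains):
    """Largest unwanted-port output relative to the desired-port output, in dB."""
    t = mpa_transfer(gains)
    n = len(gains)
    worst = max(abs(t[j][i]) for i in range(n) for j in range(n) if i != j)
    if worst == 0:
        return float("-inf")  # perfectly matched amplifiers: no leakage
    desired = min(abs(t[i][i]) for i in range(n))
    return 20 * math.log10(worst / desired)


# Hypothetical mismatch: amplifier 3 is 0.05 dB hot and 0.45 degrees off in phase.
mismatched = [1, 1, 1, cmath.rect(10 ** (0.05 / 20), math.radians(0.45))]
print(worst_leakage_db([1, 1, 1, 1]))  # -inf
print(worst_leakage_db(mismatched))    # about -52 dB
```

    This shows how small amplitude and phase deviations among the N amplifiers limit the isolation of a multi-port amplifier that has no additional correction.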

  • Analysis of Efficiency-Limiting Factors Resulting from Transistor Current Source on Class-F and Inverse Class-F Power Amplifiers Open Access

    Hiroshi YAMAMOTO  Ken KIKUCHI  Valeria VADALÀ  Gianni BOSI  Antonio RAFFO  Giorgio VANNINI  

     
    INVITED PAPER

      Publicized:
    2022/03/25
      Vol:
    E105-C No:10
      Page(s):
    449-456

    This paper describes the efficiency-limiting factors resulting from the transistor current source in class-F and inverse class-F (F⁻¹) operation in the saturated region. We investigated the influence of knee voltage and gate-voltage clipping behavior on drain efficiency as limiting factors for the current source. A numerical analysis using a simplified transistor model was carried out. As a result, we demonstrated that the limiting factor for class-F⁻¹ operation is gate-diode conduction rather than knee voltage, whereas class-F power amplifiers (PAs) are restricted by knee-voltage effects. Furthermore, nonlinear measurements carried out on a GaN HEMT validate our analytical results.
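
    The knee-voltage limit on class-F operation can be illustrated with textbook idealized waveforms. The drain voltage is a square wave that rests at the knee voltage while a half-sine drain current flows. The sketch below computes the drain efficiency numerically from the fundamental Fourier components. It rests on several assumptions: infinite harmonic tuning, normalized current, and hypothetical supply and knee voltages. It is not the simplified transistor model used in the paper, and it does not model the gate-diode conduction that the paper identifies as the limit for class-F⁻¹.

```python
import math


def class_f_efficiency(v_dd, v_knee, samples=20000):
    """Drain efficiency of idealized class-F waveforms with a knee voltage."""
    a_v = b_v = a_i = b_i = i_dc = 0.0
    for k in range(samples):
        theta = 2 * math.pi * (k + 0.5) / samples
        s, c = math.sin(theta), math.cos(theta)
        on = s > 0  # current flows during the first half-cycle
        v = v_knee if on else 2 * v_dd - v_knee  # square drain voltage
        i = s if on else 0.0                     # half-sine current, I_max = 1
        # Fundamental Fourier coefficients (sine and cosine parts).
        a_v += 2 * v * s / samples
        b_v += 2 * v * c / samples
        a_i += 2 * i * s / samples
        b_i += 2 * i * c / samples
        i_dc += i / samples
    # Fundamental power delivered to the load (current flows into the drain).
    p_out = -(a_v * a_i + b_v * b_i) / 2
    p_dc = v_dd * i_dc
    return p_out / p_dc


print(round(class_f_efficiency(28.0, 4.0), 3))  # 0.857, i.e. 1 - 4/28
```

    Under these idealized waveforms the result reduces to 1 − V_knee/V_DD. Any nonzero knee voltage therefore caps the class-F drain efficiency below 100%.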

  • Sub-Terahertz MIMO Spatial Multiplexing in Indoor Propagation Environments Open Access

    Yasutaka OGAWA  Taichi UTSUNO  Toshihiko NISHIMURA  Takeo OHGANE  Takanori SATO  

     
    INVITED PAPER

      Publicized:
    2022/04/18
      Vol:
    E105-B No:10
      Page(s):
    1130-1138

    The sub-terahertz band is envisioned to play a major role in 6G in achieving extremely high data-rate communication. In addition to very wideband transmission, spatial multiplexing using a hybrid MIMO system is needed. A recently presented paper, however, reveals that very few multipath components are observed in the sub-terahertz band in indoor environments. A channel with few multipath components is called sparse. The number of layers (streams), i.e., the multiplexing gain, in a MIMO system does not exceed the number of multipath components. The sparsity may therefore restrict the spatial multiplexing gain of sub-terahertz systems, and this poor multiplexing gain may limit the data rate of communication systems. This paper describes fundamental considerations on sub-terahertz MIMO spatial multiplexing in indoor environments. We examined how analog beams should be steered to multipath components to achieve higher channel capacity. Furthermore, for different beam allocation schemes, we investigated the eigenvalue distributions of the channel Gram matrix, the power allocation to each layer, and the correlations between analog beams. Simulation results reveal that the analog beams should be steered to all the multipath components to lower the correlations and achieve higher channel capacity.
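
    The eigenvalues of the channel Gram matrix determine how many layers are worth transmitting. A sparse channel with L multipath components has at most L nonzero eigenvalues. The sketch below applies classic water-filling over the eigenmodes as one illustration of power allocation to each layer. The abstract does not state which allocation rule the authors use, so water-filling is an assumed stand-in, and the eigenvalues in the example are hypothetical.

```python
import math


def water_filling(eigenvalues, total_power, noise_power=1.0):
    """Water-filling power allocation over MIMO eigenmodes (layers).

    Returns per-layer powers and the resulting capacity in bit/s/Hz.
    A zero eigenvalue (e.g. a missing multipath component) gets no power.
    """
    gains = [lam / noise_power for lam in eigenvalues]
    active = sorted((g for g in gains if g > 0), reverse=True)
    level = 0.0
    # Drop the weakest mode until every remaining mode receives positive power.
    while active:
        level = (total_power + sum(1.0 / g for g in active)) / len(active)
        if level > 1.0 / active[-1]:
            break
        active.pop()
    powers = [max(level - 1.0 / g, 0.0) if g > 0 else 0.0 for g in gains]
    capacity = sum(math.log2(1.0 + p * g) for p, g in zip(powers, gains))
    return powers, capacity


# Hypothetical Gram-matrix eigenvalues: two usable paths, one empty direction.
powers, capacity = water_filling([4.0, 1.0, 0.0], total_power=1.0)
print(powers)  # [0.875, 0.125, 0.0]
```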

  • A Survey on Research Activities for Deploying Cell Free Massive MIMO towards Beyond 5G Open Access

    Issei KANNO  Kosuke YAMAZAKI  Yoji KISHI  Satoshi KONISHI  

     
    INVITED PAPER

      Publicized:
    2022/04/28
      Vol:
    E105-B No:10
      Page(s):
    1107-1116

    5G service has been launched in various countries, and research on beyond 5G is already actively underway around the world. Beyond 5G is expected to extend the capabilities of communication technologies to cover a wider range of use cases than 5G. As a candidate elemental technology, cell-free massive MIMO has been widely researched and has shown its potential to enhance these capabilities in various respects. However, many technical issues remain for deploying this technology in practice, such as the cost of distributing antennas and installing fronthaul, as well as scalability. This paper surveys research trends in cell-free massive MIMO, focusing especially on deployment challenges, and introduces our own related research activities, including some numerical examples.

  • End-to-End Object Separation for Threat Detection in Large-Scale X-Ray Security Images

    Joanna Kazzandra DUMAGPI  Yong-Jin JEONG  

     
    LETTER-Artificial Intelligence, Data Mining

      Publicized:
    2022/07/25
      Vol:
    E105-D No:10
      Page(s):
    1807-1811

    Fine-grained image analysis, such as pixel-level approaches, improves threat detection in X-ray security images. In practice, the cost of obtaining complete pixel-level annotations is high; it can be reduced by partially labeling the dataset. However, handling partially labeled datasets can require training complicated multi-stage networks. In this paper, we propose a new end-to-end object separation framework that trains a single network on a partially labeled dataset while also alleviating the inherent class imbalance at the data and object-proposal levels. Empirical results demonstrate significant improvement over existing approaches.

  • A Bus Crowdedness Sensing System Using Deep-Learning Based Object Detection

    Wenhao HUANG  Akira TSUGE  Yin CHEN  Tadashi OKOSHI  Jin NAKAZAWA  

     
    PAPER

      Publicized:
    2022/06/23
      Vol:
    E105-D No:10
      Page(s):
    1712-1720

    Bus crowdedness plays an increasingly important role in COVID-19 disease control, yet the lack of a practical approach to sensing it is a major problem. This paper proposes a bus crowdedness sensing system that uses deep-learning-based object detection to count the passengers getting on and off a bus and thus estimate its crowdedness in real time. In our prototype system, we combine the YOLOv5s object detection model with a Kalman filter object tracking algorithm to implement a sensing algorithm that runs on a Jetson Nano-based vehicular device mounted on a bus. Using driving recorder video taken on a real bus, we experimentally evaluate the proposed sensing system and verify that it improves counting accuracy and achieves real-time processing on the Jetson Nano platform.
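
    The counting step of such a system can be sketched independently of the detector and tracker. The sketch assumes that the YOLOv5s detector and Kalman-filter tracker already provide a sequence of centroid y-coordinates for each track ID. A passenger is counted when their track crosses a virtual line at the door. The line position, the direction convention, and the function name are hypothetical, and the paper's actual counting logic is not described in the abstract.

```python
def count_door_crossings(tracks, line_y):
    """Count boarding/alighting events from tracked passenger centroids.

    tracks maps a track ID to the sequence of centroid y-coordinates (pixels).
    Moving toward larger y across line_y is treated as boarding
    (a hypothetical camera orientation); the reverse counts as alighting.
    """
    boarding = alighting = 0
    for ys in tracks.values():
        for prev, cur in zip(ys, ys[1:]):
            if prev < line_y <= cur:
                boarding += 1
            elif cur < line_y <= prev:
                alighting += 1
    return boarding, alighting


tracks = {1: [10, 20, 30], 2: [40, 25, 5], 3: [50, 55]}
on, off = count_door_crossings(tracks, line_y=22)
occupancy = 12 + on - off  # 12 passengers already aboard (hypothetical)
print(on, off, occupancy)  # 1 1 12
```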

  • Convolutional Auto-Encoder and Adversarial Domain Adaptation for Cross-Corpus Speech Emotion Recognition

    Yang WANG  Hongliang FU  Huawei TAO  Jing YANG  Hongyi GE  Yue XIE  

     
    LETTER-Artificial Intelligence, Data Mining

      Publicized:
    2022/07/12
      Vol:
    E105-D No:10
      Page(s):
    1803-1806

    This letter focuses on the cross-corpus speech emotion recognition (SER) task, in which the training and testing speech signals belong to different speech corpora. Existing algorithms cannot effectively extract the sentiment information shared between corpora that would facilitate knowledge transfer. To address this challenging problem, a novel convolutional auto-encoder and adversarial domain adaptation (CAEADA) framework for cross-corpus SER is proposed. The framework first constructs a one-dimensional convolutional auto-encoder (1D-CAE) for feature processing, which can explore the correlation among adjacent one-dimensional statistical features, while its encoder-decoder-style architecture enhances the feature representation. Subsequently, the adversarial domain adaptation (ADA) module alleviates the discrepancy between the source and target feature distributions by confusing a domain discriminator, and also employs maximum mean discrepancy (MMD) to better accomplish the feature transformation. To evaluate the proposed CAEADA, extensive experiments were conducted on the EmoDB, eNTERFACE, and CASIA speech corpora, and the results show that the proposed method outperforms other approaches.
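
    The maximum mean discrepancy mentioned above measures the distance between two feature distributions through kernel mean embeddings. Below is a minimal sketch of the biased empirical estimate of squared MMD with a Gaussian kernel. The kernel choice and the bandwidth `sigma` are assumptions. In a framework like CAEADA, such a term would typically be computed on mini-batches of source and target features and added to the training loss.

```python
import math


def gaussian_kernel(x, y, sigma):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mean_kernel(xs, ys, sigma):
    return sum(gaussian_kernel(x, y, sigma) for x in xs for y in ys) / (len(xs) * len(ys))


def mmd2(source, target, sigma=1.0):
    """Biased empirical squared MMD between two feature sets (Gaussian kernel)."""
    return (mean_kernel(source, source, sigma)
            + mean_kernel(target, target, sigma)
            - 2.0 * mean_kernel(source, target, sigma))


source = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
target_same = [list(p) for p in source]
target_shifted = [[x + 3.0, y + 3.0] for x, y in source]
print(mmd2(source, target_same))           # 0.0
print(mmd2(source, target_shifted) > 0.0)  # True
```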

  • Speech-Like Emotional Sound Generation Using WaveNet

    Kento MATSUMOTO  Sunao HARA  Masanobu ABE  

     
    PAPER-Speech and Hearing

      Publicized:
    2022/05/26
      Vol:
    E105-D No:9
      Page(s):
    1581-1589

    In this paper, we propose a new algorithm to generate Speech-like Emotional Sound (SES). Emotional expressions may be the most important factor in human communication, and speech is one of the most useful means of expressing emotions. Although speech generally conveys both emotional and linguistic information, we have undertaken the challenge of generating sounds that convey emotional information alone. We call the generated sounds “speech-like” because they do not contain any linguistic information. SES can provide another way to generate emotional responses in human-computer interaction systems. To generate “speech-like” sound, we propose employing WaveNet as a sound generator conditioned only on emotion IDs. This concept is quite different from the WaveNet vocoder, which synthesizes speech using spectrum information as an auxiliary feature. The biggest advantage of our approach is that, by focusing on non-linguistic information, it reduces the amount of emotional speech data needed for training. The proposed algorithm consists of two steps. In the first step, to generate a variety of spectrum patterns that resemble human speech as closely as possible, WaveNet is trained with auxiliary mel-spectrum parameters and Emotion IDs using a large amount of neutral speech. In the second step, to generate emotional expressions, WaveNet is retrained with only the Emotion ID as the auxiliary feature, using a small amount of emotional speech. Experimental results reveal that (1) the two-step training is necessary to generate high-quality SES, and (2) using a large neutral speech database and spectrum information in the first step is important for improving the emotional expression and naturalness of SES.

  • Constant-Round Fair SS-4PC for Private Decision Tree Evaluation

    Hikaru TSUCHIDA  Takashi NISHIDE  

     
    PAPER-Cryptography and Information Security

      Publicized:
    2022/03/09
      Vol:
    E105-A No:9
      Page(s):
    1270-1288

    Multiparty computation (MPC) is a cryptographic method that enables a set of parties to compute an arbitrary joint function of all parties' private inputs without revealing any information other than the output. MPC based on a secret sharing scheme (SS-MPC) and MPC based on garbled circuits (GC) are the most common MPC schemes. Another cryptographic method, homomorphic encryption (HE), computes an arbitrary function represented as a circuit on ciphertexts without decrypting them. These technologies trade off communication and round complexity against computation cost. Private decision tree evaluation (PDTE) is one of the key applications of these technologies. There exist several constant-round PDTE protocols based on GC, HE, or hybrid schemes that are secure even if a malicious adversary, who can deviate from the protocol specification, corrupts some parties. There also exist protocols based only on SS-MPC that are secure only if a semi-honest adversary, who follows the protocol specification, corrupts some parties. However, to the best of our knowledge, there are currently no constant-round PDTE protocols based only on SS-MPC that are secure against a malicious adversary. In this work, we propose a constant-round four-party PDTE protocol that achieves malicious security. Our protocol performs PDTE securely and efficiently even when the communication environment has large latency.
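
    SS-MPC protocols are built on secret sharing. The sketch below shows only the simplest building block: additive secret sharing among four parties over the integers modulo 2^64, which is an assumed choice of ring. Each party holds one random-looking share, and sums can be computed locally on shares. It is not the authors' protocol. The malicious security, the constant-round comparisons, and the decision tree evaluation of the paper are not represented.

```python
import secrets

MODULUS = 2 ** 64  # shares live in the ring of integers mod 2^64 (assumed choice)


def share(value, parties=4):
    """Split value into additive shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def reconstruct(shares):
    return sum(shares) % MODULUS


def add_shared(xs, ys):
    # Each party adds its own two shares locally; no communication is needed.
    return [(x + y) % MODULUS for x, y in zip(xs, ys)]


a, b = share(20), share(22)
print(reconstruct(add_shared(a, b)))  # 42
```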

  • A Satisfiability Algorithm for Deterministic Width-2 Branching Programs Open Access

    Tomu MAKITA  Atsuki NAGAO  Tatsuki OKADA  Kazuhisa SETO  Junichi TERUYAMA  

     
    PAPER-Algorithms and Data Structures

      Publicized:
    2022/03/08
      Vol:
    E105-A No:9
      Page(s):
    1298-1308

    A branching program is a well-studied model of computation and a representation of Boolean functions. It is a directed acyclic graph with a unique root node, some accepting nodes, and some rejecting nodes. Each node other than the accepting and rejecting nodes is labeled with a variable, and each outgoing edge of such a node is labeled with a 0/1 assignment to that variable. Given a branching program with n variables and m nodes, the satisfiability problem for branching programs asks whether some assignment activates a consistent path from the root to an accepting node. The width of a branching program is the maximum number of nodes at any level. The satisfiability problem for width-2 branching programs is known to be NP-complete. In this paper, we present a satisfiability algorithm for width-2 branching programs with n variables and cn nodes, and show that its running time is poly(n)·2^((1−µ(c))n), where µ(c) = 1/2^(O(c log c)). Our algorithm consists of two phases. First, we transform a given width-2 branching program into a set of structured formulas consisting of AND and Exclusive-OR gates. Then, we check the satisfiability of these formulas by a greedy restriction method that depends on the frequency of occurrence of the variables.
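
    The definitions above can be made concrete with a small sketch. The dictionary representation, which maps each node to (variable index, 0-successor, 1-successor), and the function names are hypothetical. Evaluating the program on a full assignment always follows a consistent path, so a brute-force search over all 2^n assignments decides satisfiability. This is the trivial baseline that the paper's poly(n)·2^((1−µ(c))n) algorithm improves on; it is not the authors' algorithm.

```python
from itertools import product

ACCEPT, REJECT = "accept", "reject"


def evaluate(bp, root, assignment):
    """Follow the unique path selected by a full 0/1 assignment."""
    node = root
    while node not in (ACCEPT, REJECT):
        var, low, high = bp[node]
        node = high if assignment[var] else low
    return node == ACCEPT


def brute_force_sat(bp, root, n):
    """Baseline 2^n-time check; returns a satisfying assignment or None."""
    for assignment in product((0, 1), repeat=n):
        if evaluate(bp, root, assignment):
            return assignment
    return None


# Width-2 program for x0 XOR x1: level 0 = {a}, level 1 = {b, c}.
xor_bp = {"a": (0, "b", "c"), "b": (1, REJECT, ACCEPT), "c": (1, ACCEPT, REJECT)}
print(brute_force_sat(xor_bp, "a", 2))  # (0, 1)

# x0 AND (NOT x0): the graph path a -(x0=1)-> b -(x0=0)-> accept exists,
# but it is not consistent, so no assignment reaches the accepting node.
contradiction_bp = {"a": (0, REJECT, "b"), "b": (0, ACCEPT, REJECT)}
print(brute_force_sat(contradiction_bp, "a", 1))  # None
```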

  • A Novel Adaptive Weighted Transfer Subspace Learning Method for Cross-Database Speech Emotion Recognition

    Keke ZHAO  Peng SONG  Shaokai LI  Wenjing ZHANG  Wenming ZHENG  

     
    LETTER-Speech and Hearing

      Publicized:
    2022/06/09
      Vol:
    E105-D No:9
      Page(s):
    1643-1646

    In this letter, we present an adaptive weighted transfer subspace learning (AWTSL) method for cross-database speech emotion recognition (SER), which can efficiently eliminate the discrepancy between the source and target databases. Specifically, a subspace projection matrix is first learned to project the cross-database features into a common subspace, and at the same time each target sample is represented by the source samples through a sparse reconstruction matrix. In addition, we design an adaptive weighted matrix learning strategy, which increases the reconstruction contribution of important features and eliminates the negative influence of redundant features. Finally, we conduct extensive experiments on four benchmark databases, and the experimental results demonstrate the efficacy of the proposed method.

  • Optimal Algorithm for Finding Representation of Subtree Distance

    Takanori MAEHARA  Kazutoshi ANDO  

     
    PAPER-Algorithms and Data Structures, Graphs and Networks

      Publicized:
    2022/04/19
      Vol:
    E105-A No:9
      Page(s):
    1203-1210

    In this paper, we address the problem of finding a representation of a subtree distance, which is an extension of a tree metric. We show that a minimal representation is uniquely determined by a given subtree distance, and give an O(n²) time algorithm that finds such a representation, where n is the size of the ground set. Since the problem has a lower bound of Ω(n²), our algorithm achieves the optimal time complexity.
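
    As commonly defined, a distance d on a set X is a subtree distance if there is a tree with positive edge lengths and a subtree T_x for each x in X such that d(x, y) is the shortest distance between T_x and T_y. A tree metric is the special case in which every T_x is a single vertex. The sketch below computes d from a given tree and its subtrees (the forward direction) with a multi-source Dijkstra search. It does not attempt to recover a minimal representation from d, which is the problem the paper solves in O(n²) time. The toy tree is hypothetical.

```python
import heapq


def subtree_distance(adj, subtrees):
    """Distances d(x, y) = shortest tree distance between subtrees T_x and T_y.

    adj: weighted tree as {vertex: [(neighbor, positive length), ...]}.
    subtrees: {element: set of tree vertices forming a connected subtree}.
    """
    dist = {}
    for x, tx in subtrees.items():
        # Multi-source Dijkstra started from every vertex of T_x.
        best = {v: 0.0 for v in tx}
        heap = [(0.0, v) for v in tx]
        heapq.heapify(heap)
        while heap:
            d, u = heapq.heappop(heap)
            if d > best[u]:
                continue
            for v, w in adj[u]:
                if d + w < best.get(v, float("inf")):
                    best[v] = d + w
                    heapq.heappush(heap, (d + w, v))
        for y, ty in subtrees.items():
            dist[x, y] = min(best[v] for v in ty)
    return dist


# Path a - b - c with unit edge lengths (hypothetical example).
adj = {"a": [("b", 1.0)], "b": [("a", 1.0), ("c", 1.0)], "c": [("b", 1.0)]}
subtrees = {"x": {"a"}, "y": {"b", "c"}, "z": {"c"}}
d = subtree_distance(adj, subtrees)
print(d["x", "y"], d["x", "z"], d["y", "z"])  # 1.0 2.0 0.0
```

    Overlapping subtrees give distance 0 between distinct elements (here y and z), which is one way subtree distances generalize tree metrics.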

  • MSFF: A Multi-Scale Feature Fusion Network for Surface Defect Detection of Aluminum Profiles

    Lianshan SUN  Jingxue WEI  Hanchao DU  Yongbin ZHANG  Lifeng HE  

     
    LETTER-Image Recognition, Computer Vision

      Publicized:
    2022/05/30
      Vol:
    E105-D No:9
      Page(s):
    1652-1655

    This paper presents an improved YOLOv3 network, named MSFF-YOLOv3, for precisely detecting various surface defects of aluminum profiles in practice. First, we introduce a larger prediction scale to provide detailed information for small-defect detection; second, we design an efficient attention-guided block to extract more defect features with less overhead; third, we design a bottom-up pyramid and integrate it with the existing feature pyramid network to construct a twin-tower structure that improves the circulation and fusion of features across layers. In addition, we employ the K-median algorithm for anchor clustering to speed up network inference. Experimental results showed that the mean average precision of the proposed MSFF-YOLOv3 is higher than that of all conventional networks for surface defect detection of aluminum profiles. Moreover, the number of frames processed per second by MSFF-YOLOv3 can meet real-time requirements.
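
    The anchor-clustering step can be sketched as plain k-medians on ground-truth box sizes. The abstract does not specify the distance measure or the initialization. This sketch assumes L1 distance on (width, height) with per-axis median updates, and takes the first k boxes as the initial centers. YOLO-style anchor clustering often uses a 1 − IoU distance instead, so treat this as an illustrative variant. The box sizes are hypothetical.

```python
import statistics


def l1(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def k_medians(boxes, k, iterations=50):
    """Cluster (width, height) pairs: L1 assignment, per-axis median update."""
    centers = [tuple(b) for b in boxes[:k]]  # naive deterministic initialization
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            nearest = min(range(k), key=lambda j: l1(box, centers[j]))
            clusters[nearest].append(box)
        new_centers = [
            (statistics.median(w for w, _ in members),
             statistics.median(h for _, h in members)) if members else centers[j]
            for j, members in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


boxes = [(10, 10), (100, 100), (12, 11), (11, 12), (98, 102), (102, 99)]
print(k_medians(boxes, k=2))  # [(11, 11), (100, 100)]
```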

  • Single Suction Grasp Detection for Symmetric Objects Using Shallow Networks Trained with Synthetic Data

    Suraj Prakash PATTAR  Tsubasa HIRAKAWA  Takayoshi YAMASHITA  Tetsuya SAWANOBORI  Hironobu FUJIYOSHI  

     
    PAPER-Image Recognition, Computer Vision

      Publicized:
    2022/06/21
      Vol:
    E105-D No:9
      Page(s):
    1600-1609

    Predicting the grasping point accurately and quickly is crucial for successful robotic manipulation. However, to commercially deploy a robot, such as a dishwasher robot in a commercial kitchen, we also need to consider the constraints of limited usable resources. We present a deep learning method to predict the grasp position when using a single suction gripper for picking up objects. The proposed method is based on a shallow network to enable lower training costs and efficient inference on limited resources. Costs are further reduced by collecting data in a custom-built synthetic environment. For evaluating the proposed method, we developed a system that models a commercial kitchen for a dishwasher robot to manipulate symmetric objects. We tested our method against a model-fitting method and an algorithm-based method in our developed commercial kitchen environment and found that a shallow network trained with only the synthetic data achieves high accuracy. We also demonstrate the practicality of using a shallow network in sequence with an object detector for ease of training, prediction speed, low computation cost, and easier debugging.

  • A Two-Fold Cross-Validation Training Framework Combined with Meta-Learning for Code-Switching Speech Recognition

    Zheying HUANG  Ji XU  Qingwei ZHAO  Pengyuan ZHANG  

     
    LETTER-Speech and Hearing

      Publicized:
    2022/06/20
      Vol:
    E105-D No:9
      Page(s):
    1639-1642

    Although end-to-end speech recognition for Mandarin-English code-switching has attracted increasing interest, it remains challenging due to data scarcity. Meta-learning is a popular approach to low-resource modeling that uses high-resource data, but it does not make full use of the low-resource code-switching data. We therefore propose a two-fold cross-validation training framework combined with meta-learning. Experiments on the SEAME corpus demonstrate the effectiveness of our method.

  • Fast Gated Recurrent Network for Speech Synthesis

    Bima PRIHASTO  Tzu-Chiang TAI  Pao-Chi CHANG  Jia-Ching WANG  

     
    LETTER-Speech and Hearing

      Publicized:
    2022/06/10
      Vol:
    E105-D No:9
      Page(s):
    1634-1638

    Recurrent neural networks (RNNs) have been used in audio and speech processing tasks such as language translation and speech recognition. Although RNN-based architectures can be applied to speech synthesis, their long computing time remains the primary concern. This research proposes a fast gated recurrent neural network, a fast RNN-based architecture for speech synthesis based on the minimal gated unit (MGU). Our architecture removes the unit state history from some of the MGU equations. Our MGU-based architecture is about twice as fast as other MGU-based architectures, with equally good sound quality.
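
    The standard minimal gated unit has three parts:
    - a single forget gate f_t = σ(W_f x_t + U_f h_{t−1} + b_f),
    - a candidate state h̃_t = tanh(W_h x_t + U_h (f_t ⊙ h_{t−1}) + b_h),
    - the update h_t = (1 − f_t) ⊙ h_{t−1} + f_t ⊙ h̃_t.

    The sketch below implements one such step in plain Python. The abstract does not say which occurrences of the previous state h_{t−1} the fast variant removes, so only the baseline MGU is shown. The parameter names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def mgu_step(x, h_prev, p):
    """One step of the standard minimal gated unit (MGU).

    p holds weight matrices W_f, U_f, W_h, U_h and bias vectors b_f, b_h.
    """
    # Forget gate: depends on the input and the previous state h_prev.
    f = [sigmoid(a + b + c)
         for a, b, c in zip(matvec(p["W_f"], x), matvec(p["U_f"], h_prev), p["b_f"])]
    gated = [fi * hi for fi, hi in zip(f, h_prev)]
    # Candidate state: uses the gated previous state.
    h_tilde = [math.tanh(a + b + c)
               for a, b, c in zip(matvec(p["W_h"], x), matvec(p["U_h"], gated), p["b_h"])]
    # Interpolate between the old state and the candidate.
    return [(1 - fi) * hi + fi * ht for fi, hi, ht in zip(f, h_prev, h_tilde)]


zeros = {"W_f": [[0.0, 0.0]], "U_f": [[0.0]], "b_f": [0.0],
         "W_h": [[0.0, 0.0]], "U_h": [[0.0]], "b_h": [0.0]}
print(mgu_step([0.3, -0.2], [1.0], zeros))  # [0.5]: f = 0.5 and h_tilde = 0
```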

  • Highly-Accurate and Real-Time Speech Measurement for Laser Doppler Vibrometers

    Yahui WANG  Wenxi ZHANG  Zhou WU  Xinxin KONG  Yongbiao WANG  Hongxin ZHANG  

     
    PAPER-Speech and Hearing

      Publicized:
    2022/06/08
      Vol:
    E105-D No:9
      Page(s):
    1568-1580

    Laser Doppler Vibrometers (LDVs) enable the acquisition of remote speech signals by measuring small-scale vibrations around a target. They are now widely used in information acquisition and national security. However, in remote speech detection, the coherent measurement signal is subject to environmental noise, which makes detecting and reconstructing speech signals challenging. To improve the detection distance and speech quality, this paper proposes a highly accurate real-time speech measurement method that can reconstruct speech from noisy coherent signals. First, I/Q demodulation and arctangent phase discrimination are used to extract the phase change caused by the acoustic vibration from the coherent signals. Then, an innovative smoothness criterion and a novel phase-difference-based dynamic bilateral compensation phase unwrapping algorithm are used to remove the ambiguity introduced by the arctangent phase discrimination in the previous step. This step enables highly accurate detection of phase jumps. Next, the reconstructed speech is enhanced by an improved waveform-based linear prediction coding method combined with adaptive spectral subtraction, which removes impulsive and background noise. The accuracy and performance of the proposed method were validated through extensive simulations and comparisons with existing techniques. The results show that the proposed algorithm can significantly improve speech measurement and the quality of reconstructed speech signals. The viability of the method was further assessed in a physical experiment in which LDV equipment was used to measure speech at a distance of 310 m in an outdoor environment. The intelligibility rate of the reconstructed speech exceeded 95%, confirming the effectiveness and superiority of the method for long-distance laser speech measurement.
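
    The first two processing steps can be sketched with the textbook approach. The wrapped phase is obtained from the I/Q samples with atan2, and jumps larger than π between consecutive samples are removed by adding multiples of 2π. This simple unwrapping rule is a stand-in, not the paper's dynamic bilateral compensation algorithm, which targets the noisy cases where this rule can misfire. The phase-to-displacement conversion assumes the usual round-trip relation φ = 4πd/λ.

```python
import math


def iq_to_phase(i_samples, q_samples):
    """Arctangent phase discrimination: wrapped phase in (-pi, pi]."""
    return [math.atan2(q, i) for i, q in zip(i_samples, q_samples)]


def unwrap(phases):
    """Classic unwrapping: treat any jump larger than pi as a 2*pi wrap."""
    out = [phases[0]]
    offset = 0.0
    for prev, cur in zip(phases, phases[1:]):
        jump = cur - prev
        if jump > math.pi:
            offset -= 2 * math.pi
        elif jump < -math.pi:
            offset += 2 * math.pi
        out.append(cur + offset)
    return out


def phase_to_displacement(phase, wavelength):
    # Round-trip path: a displacement d shifts the phase by 4*pi*d / wavelength.
    return [wavelength * p / (4 * math.pi) for p in phase]


true_phase = [0.3 * k for k in range(40)]  # hypothetical slow vibration ramp
wrapped = iq_to_phase([math.cos(p) for p in true_phase],
                      [math.sin(p) for p in true_phase])
recovered = unwrap(wrapped)
print(max(abs(a - b) for a, b in zip(recovered, true_phase)) < 1e-9)  # True
```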

  • Convergence Acceleration via Chebyshev Step: Plausible Interpretation of Deep-Unfolded Gradient Descent

    Satoshi TAKABE  Tadashi WADAYAMA  

     
    PAPER-Numerical Analysis and Optimization

      Publicized:
    2022/01/25
      Vol:
    E105-A No:8
      Page(s):
    1110-1120

    Deep unfolding is a promising deep-learning technique whose network architecture is obtained by expanding the recursive structure of an existing iterative algorithm. Although deep unfolding achieves convergence acceleration, its theoretical aspects have not yet been clarified. This study presents a theoretical analysis of the convergence acceleration in deep-unfolded gradient descent (DUGD), whose trainable parameters are step sizes. We propose a plausible interpretation of the learned step-size parameters in DUGD by introducing the principle of Chebyshev steps derived from Chebyshev polynomials. Using Chebyshev steps in gradient descent (GD) enables us to bound the spectral radius of a matrix governing the convergence speed of GD, leading to a tight upper bound on the convergence rate. Numerical results show that Chebyshev steps explain the learned step-size parameters in DUGD well.
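
    The Chebyshev-step idea can be illustrated on a quadratic objective whose Hessian eigenvalues lie in [λ_min, λ_max]. As we understand the construction, the T step sizes are the reciprocals of the Chebyshev nodes mapped onto that interval, which is the classical Chebyshev acceleration of Richardson iteration. The sketch compares T steps of gradient descent with these step sizes against T steps with the standard constant step 2/(λ_min + λ_max). The eigenvalues are hypothetical.

```python
import math


def chebyshev_steps(lam_min, lam_max, T):
    """Step sizes gamma_k = 1 / (k-th Chebyshev node mapped to [lam_min, lam_max])."""
    mid, half = (lam_max + lam_min) / 2, (lam_max - lam_min) / 2
    return [1.0 / (mid + half * math.cos((2 * k + 1) * math.pi / (2 * T)))
            for k in range(T)]


def gd_error(eigenvalues, steps):
    """Error norm after GD on f(x) = 0.5 * sum(lam_i * x_i**2), starting from all ones."""
    x = [1.0] * len(eigenvalues)
    for gamma in steps:
        x = [xi - gamma * lam * xi for xi, lam in zip(x, eigenvalues)]
    return math.sqrt(sum(xi * xi for xi in x))


lams = [1.0, 2.0, 5.0, 10.0, 50.0, 100.0]
T = 10
constant = gd_error(lams, [2.0 / (min(lams) + max(lams))] * T)
cheb = gd_error(lams, chebyshev_steps(min(lams), max(lams), T))
print(constant > cheb)  # True
```

    With these values, the Chebyshev schedule ends with a smaller error even though some of its individual steps are far larger than 2/λ_max and temporarily amplify the error along the stiff directions.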

  • BFF R-CNN: Balanced Feature Fusion for Object Detection

    Hongzhe LIU  Ningwei WANG  Xuewei LI  Cheng XU  Yaze LI  

     
    PAPER-Image Recognition, Computer Vision

      Publicized:
    2022/05/17
      Vol:
    E105-D No:8
      Page(s):
    1472-1480

    In the neck of a two-stage object detection network, feature fusion is generally carried out in either a top-down or bottom-up manner. However, two types of imbalance may exist: feature imbalance in the neck of the model, and gradient imbalance in the region-of-interest extraction layer due to changes in object scale. The deeper the network, the more abstract the learned features; that is, more semantic information can be extracted, but less image background, spatial location, and other resolution information is retained. In contrast, the shallow layers learn little semantic information but a lot of spatial location information. We propose the Both Ends to Centre to Multiple Layers (BEtM) feature fusion method to solve the feature imbalance problem in the neck, and a Multi-level Region of Interest Feature Extraction (MRoIE) layer to solve the gradient imbalance problem. Combined with the Region-based Convolutional Neural Network (R-CNN) framework, our Balanced Feature Fusion (BFF) method significantly improves network performance compared with the Faster R-CNN architecture. On the MS COCO 2017 dataset, it achieves an average precision (AP) 1.9 points higher than the Feature Pyramid Network (FPN) Faster R-CNN framework and 3.2 points higher than the Generic Region of Interest Extractor (GRoIE) framework.