
Keyword Search Result

[Keyword] EE(4079hit)

261-280hit(4079hit)

  • Balanced, Unbalanced, and One-Sided Distributed Teams - An Empirical View on Global Software Engineering Education

    Daniel Moritz MARUTSCHKE  Victor V. KRYSSANOV  Patricia BROCKMANN  

     
    PAPER

      Publicized:
    2021/09/30
      Vol:
    E105-D No:1
      Page(s):
    2-10

    Global software engineering education faces unique challenges in reflecting real-world distributed team development, in its various forms, as closely as possible. The complex nature of planning, collaborating, and upholding partnerships presents administrative difficulties on top of budgetary constraints. These lead to limited opportunities for students to gain international experience and for researchers to propagate educational and practical insights. This paper presents an empirical view of three different course structures conducted by the same research and educational team over a four-year time span. The courses were managed in Japan and Germany and faced cultural challenges, time-zone differences, language barriers, and heterogeneous and homogeneous team structures, amongst others. Three semesters were carried out before and one during the Covid-19 pandemic. Implications for the recent focus on online education in software engineering education, as well as future directions, are discussed. As administrative and institutional differences typically do not guarantee the same number of students on all sides, distributed teams can be 1. balanced, where the number of students on one side is less than double the other, 2. unbalanced, where the number of students on one side is significantly larger than double the other, or 3. one-sided, where one side lacks students altogether. An approach for each of these three course structures is presented and discussed. Empirical analyses and recurring patterns in global software engineering education are reported. In the three most recent global software engineering classes, students were surveyed at the beginning and the end of the semester. The questionnaires asked students to rank how impactful they perceive factors related to global software development, such as cultural aspects, team structure, language, and interaction. The resulting shifts in mean perception are compared and discussed for each of the three team structures.
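
    The three team structures can be sketched as a small classifier on the student counts at the two sites, together with the pre/post shift in mean perception. This is a minimal illustration, not the authors' procedure: the function names are hypothetical, and treating a ratio of exactly double as "unbalanced" is an assumption, since the abstract only says "significantly larger than double".

```python
def classify_team_structure(n_side_a: int, n_side_b: int) -> str:
    """Classify a two-site class as "balanced", "unbalanced", or "one-sided".

    Assumption: a larger side of exactly double or more counts as "unbalanced";
    the abstract only specifies "significantly larger than double".
    """
    if n_side_a < 0 or n_side_b < 0:
        raise ValueError("student counts must be non-negative")
    small, large = sorted((n_side_a, n_side_b))
    if large == 0:
        raise ValueError("at least one side must have students")
    if small == 0:
        return "one-sided"   # one side lacks students altogether
    if large < 2 * small:
        return "balanced"    # larger side is less than double the smaller one
    return "unbalanced"      # larger side is at least double the smaller one


def mean_shift(pre_scores, post_scores):
    """Shift in mean perceived impact from the start to the end of the semester."""
    return sum(post_scores) / len(post_scores) - sum(pre_scores) / len(pre_scores)


print(classify_team_structure(10, 7))   # balanced
print(classify_team_structure(30, 5))   # unbalanced
print(classify_team_structure(12, 0))   # one-sided
```

    The student counts in the usage lines are hypothetical and are not taken from the paper.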

  • Device-Free Localization via Sparse Coding with a Generalized Thresholding Algorithm

    Qin CHENG  Linghua ZHANG  Bo XUE  Feng SHU  Yang YU  

     
    PAPER-Wireless Communication Technologies

      Publicized:
    2021/08/05
      Vol:
    E105-B No:1
      Page(s):
    58-66

    As an emerging technology, device-free localization (DFL), which uses wireless sensor networks to detect targets that do not carry any electronic devices, has spawned extensive applications, such as security safeguards and smart homes or hospitals. Previous studies formulate DFL as a classification problem, but challenges remain in terms of accuracy and robustness. In this paper, we exploit a generalized thresholding algorithm with a parameter p as a penalty function to solve inverse problems with sparsity constraints for DFL. As the value of p is reduced, the function applies less bias to large coefficients while still penalizing small coefficients. By exploiting the distinctive capability of the p-thresholding function to measure sparsity, the proposed approach achieves accurate and robust localization performance in challenging environments. Extensive experiments show that the algorithm outperforms current alternatives.
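
    The behavior described above (less bias on large coefficients, small coefficients set to zero, controlled by p) can be illustrated with an iterative thresholding loop. This sketch assumes a Chartrand-style generalized p-shrinkage operator, sign(x)·max(|x| − τ^(2−p)·|x|^(p−1), 0), which reduces to soft thresholding at p = 1; the paper's exact operator, step size, and threshold schedule are not given here, so `p_shrink`, `iterative_p_thresholding`, and the fixed threshold `tau` are assumptions.

```python
import numpy as np


def p_shrink(x: np.ndarray, tau: float, p: float) -> np.ndarray:
    """Generalized p-shrinkage for 0 < p <= 1 (assumed form); p = 1 is soft thresholding."""
    mag = np.abs(x)
    # Replace zeros by +inf so |x|**(p - 1) stays finite (0 for p < 1, 1 for p = 1).
    safe = np.where(mag > 0, mag, np.inf)
    shrink = tau ** (2 - p) * safe ** (p - 1)
    return np.sign(x) * np.maximum(mag - shrink, 0.0)


def iterative_p_thresholding(A, y, tau, p, n_iter=200):
    """Sparse coding of y = A x by gradient steps followed by p-shrinkage."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)             # gradient of 0.5 * ||A x - y||^2
        x = p_shrink(x - step * grad, tau, p)  # fixed threshold (assumption)
    return x
```

    With p = 0.5 a coefficient of 3.0 is shrunk to about 2.80 instead of 2.5 under soft thresholding, which mirrors the "less bias to large coefficients" statement. In a DFL setting, A would be a measurement dictionary over candidate target grid cells; that mapping is not shown here.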

  • Multi-Model Selective Backdoor Attack with Different Trigger Positions

    Hyun KWON  

     
    LETTER-Artificial Intelligence, Data Mining

      Publicized:
    2021/10/21
      Vol:
    E105-D No:1
      Page(s):
    170-174

    Deep neural networks perform well in image recognition, speech recognition, and pattern analysis. However, they have weaknesses, one of which is vulnerability to backdoor attacks. A backdoor attack additionally trains the target model on backdoor samples that contain a specific trigger, so that normal data without the trigger are still classified correctly while backdoor samples containing the trigger are misclassified. Various studies on such backdoor attacks have been conducted. However, existing backdoor attacks cause misclassification in only one classifier. In certain situations, it may be necessary to carry out a selective backdoor attack on a specific model in an environment with multiple models. In this paper, we propose a multi-model selective backdoor attack method that misleads each model into misclassifying samples as a different class according to the position of the trigger. The experiments used MNIST and Fashion-MNIST as datasets and TensorFlow as the machine learning library. The results show that the proposed scheme achieves a 100% average attack success rate for each model while maintaining 97.1% and 90.9% accuracy on the original samples for MNIST and Fashion-MNIST, respectively.
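
    A minimal sketch of how the extra training data for one model might be assembled when each model owns a trigger position and a target class. The 3x3 white patch, the helper names, and the choice to keep true labels for the other models' trigger positions (so that only the intended model is misled) are illustrative assumptions, not the paper's exact recipe.

```python
import numpy as np


def stamp_trigger(images, row, col, size=3, value=1.0):
    """Return a copy of images (N, H, W) with a square trigger at (row, col)."""
    out = images.copy()
    out[:, row:row + size, col:col + size] = value
    return out


def poisoned_set_for_model(images, labels, positions, targets, model_idx):
    """Training set for model `model_idx` in a multi-model selective attack.

    positions[j] is the trigger location owned by model j and targets[j] the class
    model j should output for it. Other models' triggers keep the true labels.
    """
    xs, ys = [images], [labels]                # clean samples keep true labels
    for j, ((row, col), target) in enumerate(zip(positions, targets)):
        xs.append(stamp_trigger(images, row, col))
        if j == model_idx:
            ys.append(np.full_like(labels, target))   # this model is misled
        else:
            ys.append(labels.copy())                  # trigger ignored by this model
    return np.concatenate(xs), np.concatenate(ys)
```

    Each model would then be additionally trained on its own set, for example model 0 on `poisoned_set_for_model(x, y, [(0, 0), (25, 25)], [7, 9], 0)`, where the positions and target classes are hypothetical.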

  • Study in CSI Correction Localization Algorithm with DenseNet Open Access

    Junna SHANG  Ziyang YAO  

     
    PAPER-Navigation, Guidance and Control Systems

      Publicized:
    2021/06/23
      Vol:
    E105-B No:1
      Page(s):
    76-84

    With the arrival of 5G and the popularity of smart devices, the technical feasibility of indoor localization has been verified, and market demand for it is huge. The channel state information (CSI) extracted from Wi-Fi is physical-layer information that is more fine-grained than the received signal strength indication (RSSI). This paper proposes a CSI correction localization algorithm using DenseNet, termed CorFi. The method first uses an isolation forest to eliminate abnormal CSI and then constructs a CSI amplitude fingerprint containing time, frequency, and antenna-pair information. In the offline stage, densely connected convolutional networks (DenseNet) are trained to establish the correspondence between CSI and spatial position, and generalized extended interpolation is applied to construct an interpolated fingerprint database. In the online stage, DenseNet is used for position estimation, and the interpolated fingerprint database and K-nearest neighbor (KNN) are combined to correct predicted positions whose maximum probability is low. In an indoor corridor environment, the average localization error is 0.536 m.
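
    The online correction step can be sketched as follows: keep the DenseNet position when its maximum softmax probability is high, otherwise average the positions of the k nearest entries in the interpolated fingerprint database. The confidence threshold, k, and Euclidean distance on amplitude fingerprints are assumptions for illustration; the function name is hypothetical.

```python
import numpy as np


def correct_position(probs, grid_positions, fingerprint, fp_db, fp_positions,
                     threshold=0.6, k=3):
    """probs: softmax over R reference points; grid_positions: (R, 2) coordinates.

    fp_db: (M, F) interpolated fingerprints; fp_positions: (M, 2) their coordinates.
    """
    best = int(np.argmax(probs))
    if probs[best] >= threshold:
        return grid_positions[best]          # confident: keep the DenseNet output
    # Low confidence: average the k nearest fingerprints (KNN correction).
    dists = np.linalg.norm(fp_db - fingerprint, axis=1)
    nearest = np.argsort(dists)[:k]
    return fp_positions[nearest].mean(axis=0)
```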

  • 200W Four-Way Combined Pulsed Amplifier with 40% Power-Added Efficiency in X-Band

    Shubo DUN  Tiedi ZHANG  

     
    PAPER-Microwaves, Millimeter-Waves

      Publicized:
    2021/08/17
      Vol:
    E105-C No:1
      Page(s):
    18-23

    This paper presents an X-band power-combined pulsed high power amplifier (HPA) based on a low-insertion-loss waveguide combiner. The relationship between the return loss and the isolation of the magic Tee (MT) is analyzed, and an accurate design technique is given. The combination network is validated by measuring a single MT and a four-way passive network, and the combined HPA module is designed, fabricated, characterized, and discussed. The HPA delivers 200 W output power with an associated power-added efficiency close to 40% over the frequency range of 7.8 GHz to 12.3 GHz. The combination efficiency is higher than 93%.
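
    The two figures of merit quoted above follow standard definitions: power-added efficiency PAE = (Pout − Pin) / PDC, and combining efficiency = combined output power / sum of the branch output powers. The sketch below computes both; the input, DC, and branch powers in the usage lines are hypothetical numbers, not measurements from the paper.

```python
def power_added_efficiency(p_out_w: float, p_in_w: float, p_dc_w: float) -> float:
    """PAE = (P_out - P_in) / P_DC, all in watts."""
    return (p_out_w - p_in_w) / p_dc_w


def combining_efficiency(p_combined_w: float, p_branches_w) -> float:
    """Combined output power divided by the sum of the individual branch powers."""
    return p_combined_w / sum(p_branches_w)


def dbm_to_w(p_dbm: float) -> float:
    """Convert dBm to watts."""
    return 10 ** ((p_dbm - 30) / 10)


# Hypothetical values: 10 W drive, 475 W DC, four branches of 54 W each.
print(power_added_efficiency(200.0, 10.0, 475.0))
print(combining_efficiency(200.0, [54.0] * 4))
```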

  • Effects of Image Processing Operations on Adversarial Noise and Their Use in Detecting and Correcting Adversarial Images Open Access

    Huy H. NGUYEN  Minoru KURIBAYASHI  Junichi YAMAGISHI  Isao ECHIZEN  

     
    PAPER

      Publicized:
    2021/10/05
      Vol:
    E105-D No:1
      Page(s):
    65-77

    Deep neural networks (DNNs) have achieved excellent performance on several tasks and have been widely applied in both academia and industry. However, DNNs are vulnerable to adversarial machine learning attacks in which noise is added to the input to change the networks' output. Consequently, DNN-based mission-critical applications, such as those used in self-driving vehicles, have reduced reliability and could cause severe accidents and damage. Moreover, adversarial examples could be used to poison DNN training data, resulting in corrupted trained models. Besides detecting adversarial examples, correcting them is important for restoring data and system functionality to normal. We have developed methods for detecting and correcting adversarial images that use multiple image processing operations with multiple parameter values. For detection, we devised a statistics-based method that outperforms the feature squeezing method. For correction, we devised a method that, for the first time, uses two levels of correction. The first level is label correction, which focuses on restoring the adversarial images' original predicted labels (for use in the current task). The second level is image correction, which focuses on both the correctness and the quality of the corrected images (for use in the current and other tasks). Our experiments demonstrated that the correction method could correct nearly 90% of the adversarial images created by classical adversarial attacks while affecting only about 2% of the normal images.
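
    The idea of applying multiple image processing operations with multiple parameter values can be sketched as below: a detection score measured as the largest change in output probabilities across the processed copies, and label correction by majority vote. Both the score and the vote are simplified stand-ins chosen for illustration (the score resembles feature squeezing, which the paper compares against); the paper's statistical detector is not reproduced. The operation set and function names are hypothetical.

```python
from collections import Counter
from functools import partial

import numpy as np


def reduce_bit_depth(img, bits):
    """Quantize a [0, 1] image to 2**bits levels."""
    levels = 2 ** bits - 1
    return np.round(img * levels) / levels


def box_blur(img, k):
    """k x k mean filter (k odd) with edge padding on a 2-D image."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)


# Hypothetical operation set: two operations, each with two parameter values.
OPERATIONS = [partial(reduce_bit_depth, bits=b) for b in (3, 5)] + [
    partial(box_blur, k=k) for k in (3, 5)
]


def detection_score(classifier, img, ops=OPERATIONS):
    """Largest L1 change in the output probabilities over all operations."""
    base = classifier(img)
    return max(np.abs(classifier(op(img)) - base).sum() for op in ops)


def corrected_label(classifier, img, ops=OPERATIONS):
    """Label correction by majority vote over the processed copies."""
    votes = Counter(int(np.argmax(classifier(op(img)))) for op in ops)
    return votes.most_common(1)[0][0]
```

    An image would be flagged when `detection_score` exceeds a threshold tuned on clean data; `classifier` is any callable returning a probability vector.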

  • Movie Map for Virtual Exploration in a City

    Kiyoharu AIZAWA  

     
    INVITED PAPER

      Publicized:
    2021/10/12
      Vol:
    E105-D No:1
      Page(s):
    38-45

    This paper introduces our work on a Movie Map, which enables users to explore a given city area using 360° videos. Visual exploration of a city is a constant need. Today, many people are familiar with Google Street View (GSV), an interactive visual map. Despite its wide use, GSV provides only sparse images of streets, which often confuses users and lowers user satisfaction. Forty years ago, a video-based interactive map, well known as the Aspen Movie Map, was created. Because a Movie Map uses videos instead of sparse images, it seems to improve the user experience dramatically. However, the Aspen Movie Map was based on analog technology, required a huge effort, and was never built again. We therefore renovate the Movie Map using state-of-the-art technology and build a new Movie Map system with an interface for exploring cities. The system consists of four stages: acquisition, analysis, management, and interaction. After 360° videos are acquired along streets in the target areas, the video analysis is almost automatic: frames of the video are localized on the map, intersections are detected, and videos are segmented. Turning views at intersections are synthesized. By connecting the video segments that follow a specified movement in an area, users can watch a walking view along a street. The interface allows easy exploration of a target area and can also show virtual billboards in the view.
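
    Connecting video segments along a specified movement can be modeled as a shortest-path search on a street graph whose nodes are detected intersections and whose edges are video segments. The sketch below uses Dijkstra's algorithm and then interleaves placeholder turning-view clips between consecutive segments. The graph format, segment IDs, and `turn:` placeholders are hypothetical; the paper's management and interaction stages may work differently.

```python
import heapq


def plan_route(segments, start, goal):
    """segments: dict intersection -> list of (neighbor, segment_id, length_m).

    Returns the list of segment IDs on a shortest path, or None if unreachable.
    """
    queue = [(0.0, start, [])]
    visited = set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, seg_id, length in segments.get(node, []):
            if neighbor not in visited:
                heapq.heappush(queue, (dist + length, neighbor, path + [seg_id]))
    return None


def build_playlist(path):
    """Interleave a synthesized turning view between consecutive segments."""
    playlist = []
    for i, seg in enumerate(path):
        if i > 0:
            playlist.append(f"turn:{path[i - 1]}->{seg}")  # placeholder clip ID
        playlist.append(seg)
    return playlist
```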

  • Observation of Arc Discharges Occurring between Commutator and Brush Simulating a DC Motor by Means of a High-Speed Camera

    Ryosuke SANO  Junya SEKIKAWA  

     
    PAPER

      Publicized:
    2021/06/09
      Vol:
    E104-C No:12
      Page(s):
    673-680

    Observations of arc discharges generated between the brush and the commutator are reported. The motion of the arc discharges was observed with a high-speed camera. The brush and commutator were installed in an experimental device that simulated the rotational motion of a real DC motor. The aim of this paper is to investigate the position of occurrence, dimensions, and moving characteristics of the arc discharges by means of high-speed imaging. The time evolutions of the arc voltage and current were measured simultaneously. The arc discharges were generated when an inductive circuit was interrupted. The circuit current before interruption was 4 A. A metal graphite or graphite brush and a copper commutator were used. The following results were obtained. The arc discharge was dragged along the brush surface and stuck to the side surface of the commutator. The arc spots were located at the end of the commutator and at the center of the brush in the rotational direction. The arc discharge was about 0.2 mm long and about 0.3 mm wide. As the copper content of the cathode decreased, the average arc voltage during the arc duration became higher and the light emission from the arc discharge became brighter.
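
    The averaged arc voltage during the arc duration can be computed from simultaneously sampled voltage and current waveforms. This sketch assumes the arc is present while the voltage exceeds an ignition threshold and current still flows; both thresholds and the function name are hypothetical and would need to match the actual measurement setup.

```python
import numpy as np


def arc_duration_and_mean_voltage(t, v, i, v_ignite=10.0, i_min=0.01):
    """t, v, i: sampled time (s), arc voltage (V), current (A) as 1-D arrays.

    Returns (arc duration in s, mean arc voltage in V over that duration).
    """
    arcing = (v >= v_ignite) & (i > i_min)   # samples treated as arcing (assumption)
    if not arcing.any():
        return 0.0, 0.0
    idx = np.flatnonzero(arcing)
    start, end = idx[0], idx[-1]             # first and last arcing samples
    duration = t[end] - t[start]
    mean_v = v[start:end + 1].mean()         # average over the whole arc duration
    return float(duration), float(mean_v)
```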

  • An FPGA-Based Optimizer Design for Distributed Deep Learning with Multiple GPUs

    Tomoya ITSUBO  Michihiro KOIBUCHI  Hideharu AMANO  Hiroki MATSUTANI  

     
    PAPER

      Publicized:
    2021/07/01
      Vol:
    E104-D No:12
      Page(s):
    2057-2067

    Since deep learning workloads perform a large number of matrix operations on training data, GPUs (Graphics Processing Units) are efficient, especially for the training phase. A cluster of computers, each equipped with multiple GPUs, can significantly accelerate deep learning workloads. More specifically, a back-propagation algorithm following a gradient descent approach is used for training. Although gradient computation is still a major bottleneck of training, gradient aggregation and optimization impose both communication and computation overheads, which should also be reduced to further shorten the training time. To address this issue, in this paper, multiple GPUs are interconnected with PCI Express (PCIe) over 10Gbit Ethernet (10GbE) technology. Since these remote GPUs are interconnected with network switches, gradient aggregation and optimizers (e.g., SGD, AdaGrad, Adam, and SMORMS3) are offloaded to FPGA-based 10GbE switches between the remote GPUs; thus, gradient aggregation and parameter optimization are completed in the network. The proposed FPGA-based 10GbE switches with the four optimizers are implemented on a NetFPGA-SUME board. The processing elements (PEs) for the optimizers increase resource utilization, and the switches consume up to 56% of the resources. Evaluation results using four remote GPUs connected via the proposed FPGA-based switch demonstrate that these optimizers are accelerated by up to 3.0x and 1.25x compared to CPU and GPU implementations, respectively. In addition, the gradient aggregation throughput of the FPGA-based switch reaches up to 98.3% of the 10GbE line rate.
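
    Functionally, the in-network step amounts to averaging the gradients from the remote GPUs and then applying an optimizer update. The NumPy sketch below shows that computation with the textbook SGD, AdaGrad, and Adam update rules; it says nothing about the FPGA pipeline or PE design, SMORMS3 is omitted, and the hyperparameter defaults are common choices rather than values from the paper.

```python
import numpy as np


def aggregate(gradients):
    """Average the gradients reported by the remote GPUs."""
    return np.mean(np.stack(gradients), axis=0)


def sgd_step(w, g, lr=0.01):
    return w - lr * g


class AdaGrad:
    def __init__(self, shape, lr=0.01, eps=1e-8):
        self.lr, self.eps = lr, eps
        self.h = np.zeros(shape)            # running sum of squared gradients

    def step(self, w, g):
        self.h += g * g
        return w - self.lr * g / (np.sqrt(self.h) + self.eps)


class Adam:
    def __init__(self, shape, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.b1, self.b2, self.eps = lr, beta1, beta2, eps
        self.m = np.zeros(shape)            # first-moment estimate
        self.v = np.zeros(shape)            # second-moment estimate
        self.t = 0

    def step(self, w, g):
        self.t += 1
        self.m = self.b1 * self.m + (1 - self.b1) * g
        self.v = self.b2 * self.v + (1 - self.b2) * g * g
        m_hat = self.m / (1 - self.b1 ** self.t)   # bias correction
        v_hat = self.v / (1 - self.b2 ** self.t)
        return w - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)
```

    On the first Adam step the update is close to `lr * sign(g)` per parameter, which is a convenient sanity check for a hardware implementation.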

  • Neural Incremental Speech Recognition Toward Real-Time Machine Speech Translation

    Sashi NOVITASARI  Sakriani SAKTI  Satoshi NAKAMURA  

     
    PAPER-Speech and Hearing

      Publicized:
    2021/08/27
      Vol:
    E104-D No:12
      Page(s):
    2195-2208

    Real-time machine speech translation systems mimic human interpreters and translate incoming speech from a source language into a target language in real time. Such systems can be achieved by performing low-latency processing in the ASR (automatic speech recognition) module before passing the output to the MT (machine translation) and TTS (text-to-speech synthesis) modules. Although several recent studies have proposed sequence mechanisms for neural incremental ASR (ISR), these frameworks have a more complicated training mechanism than standard attention-based ASR because they must decide the incremental step and learn the alignment between speech and text. In this paper, we propose attention-transfer ISR (AT-ISR), which learns knowledge from attention-based non-incremental ASR for low-delay end-to-end speech recognition. ISR comes with a trade-off between delay and performance, so we investigate how to reduce AT-ISR delay without a significant performance drop. Our experiments show that AT-ISR achieves performance comparable to non-incremental ASR when incremental recognition begins after the speech utterance reaches 25% of the complete utterance length. Additional experiments investigate the effect of ISR on translation tasks, with the focus on finding the optimum granularity of the output unit. The results reveal that our end-to-end subword-level ISR yields the best translation quality, with the lowest WER and the lowest uncovered-word rate.
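
    Since the results are reported in terms of WER, the standard definition may help: the word-level edit distance (substitutions, deletions, insertions) between reference and hypothesis, divided by the number of reference words. A minimal dynamic-programming sketch:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level Levenshtein distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i                        # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j                        # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + sub)  # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)
```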

  • An Improved U-Net Architecture for Image Dehazing

    Wenyi GE  Yi LIN  Zhitao WANG  Guigui WANG  Shihan TAN  

     
    PAPER-Image Processing and Video Processing

      Publicized:
    2021/09/14
      Vol:
    E104-D No:12
      Page(s):
    2218-2225

    In this paper, we present a simple yet powerful deep neural network for natural image dehazing. The proposed method is based on the U-Net architecture, with design changes to improve it. First, we use Group Normalization instead of Batch Normalization to address the insufficient batch size caused by hardware limitations. Second, we introduce the FReLU activation into the U-Net block, which allows complicated visual layouts to be captured with regular convolutions. Experimental results on public benchmarks demonstrate the effectiveness of the modified components. On the SOTS Indoor and Outdoor datasets, the network obtains PSNRs of 32.23 dB and 31.64 dB, respectively, which are comparable to state-of-the-art methods. The code will be made publicly available online.
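
    Group Normalization computes its statistics per sample over groups of channels, so its output does not depend on the batch size, which is why it helps when hardware limits the batch. The NumPy sketch below shows that property alongside the PSNR metric used for evaluation; the group count and the [0, 1] intensity range are assumptions, and FReLU is not shown.

```python
import numpy as np


def group_norm(x, num_groups, gamma, beta, eps=1e-5):
    """Group Normalization for x of shape (N, C, H, W); gamma, beta have shape (C,)."""
    n, c, h, w = x.shape
    if c % num_groups != 0:
        raise ValueError("channels must be divisible by num_groups")
    g = x.reshape(n, num_groups, c // num_groups, h, w)
    mean = g.mean(axis=(2, 3, 4), keepdims=True)   # per-sample, per-group statistics
    var = g.var(axis=(2, 3, 4), keepdims=True)
    g = (g - mean) / np.sqrt(var + eps)
    return g.reshape(n, c, h, w) * gamma.reshape(1, c, 1, 1) + beta.reshape(1, c, 1, 1)


def psnr(reference, estimate, max_value=1.0):
    """Peak signal-to-noise ratio in dB."""
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_value ** 2 / mse)
```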

  • Representation Learning of Tongue Dynamics for a Silent Speech Interface

    Hongcui WANG  Pierre ROUSSEL  Bruce DENBY  

     
    PAPER-Speech and Hearing

      Publicized:
    2021/08/24
      Vol:
    E104-D No:12
      Page(s):
    2209-2217

    A Silent Speech Interface (SSI) is a sensor-based, Artificial Intelligence (AI)-enabled system in which articulation is performed without the use of the vocal cords, resulting in a voice interface that conserves the ambient audio environment, protects private data, and also functions in noisy environments. Though portable SSIs based on ultrasound imaging of the tongue have obtained Word Error Rates rivaling those of acoustic speech recognition, SSIs remain relegated to the laboratory due to stability issues. Indeed, reliable extraction of acoustic features from ultrasound tongue images in real-life situations has proven elusive. Recently, Representation Learning has shown considerable success in learning the underlying structure of noisy, high-dimensional raw data. In its unsupervised form, Representation Learning can reveal structure in unlabeled data, thus greatly simplifying data preparation. In the present article, a 3D Convolutional Neural Network (3DCNN) architecture is applied to unlabeled ultrasound images and is shown to reliably predict future tongue configurations. By comparing the 3DCNN to a simple previous-frame predictor, it is possible to recognize tongue trajectories comprising transitions between regions of stability that correlate with formant trajectories in a spectrogram of the signal. Prospects for using the underlying structural representation to provide features for subsequent speech processing tasks are presented.
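
    The previous-frame baseline mentioned above predicts frame t+1 as a copy of frame t; its per-step error is small where the tongue is stable and large during transitions. The sketch below computes that baseline error and segments it into stability regions with a threshold. The 3DCNN itself is not reproduced, and the threshold-based segmentation is an illustrative assumption, not the paper's analysis.

```python
import numpy as np


def previous_frame_errors(frames):
    """MSE of predicting frame t+1 by frame t; frames has shape (T, H, W)."""
    diffs = frames[1:] - frames[:-1]
    return (diffs ** 2).mean(axis=(1, 2))


def stable_segments(errors, threshold):
    """(start, end) index pairs, end exclusive, where errors stay below threshold."""
    segments, start = [], None
    for t, e in enumerate(errors):
        if e < threshold and start is None:
            start = t                       # a stable region begins
        elif e >= threshold and start is not None:
            segments.append((start, t))     # a transition ends the region
            start = None
    if start is not None:
        segments.append((start, len(errors)))
    return segments
```

    A learned predictor's per-step error could be compared against `previous_frame_errors` on the same sequence to see where it adds information beyond simple persistence.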

  • Dependence of Arc Duration and Contact Gap at Arc Extinction of Break Arcs Occurring in a 48VDC/10A-300A Resistive Circuit on Contact Opening Speed

    Haruko YAZAKI  Junya SEKIKAWA  

     
    PAPER-Electromechanical Devices and Components

      Publicized:
    2021/04/01
      Vol:
    E104-C No:11
      Page(s):
    656-662

    The dependences of the arc duration D and the contact gap at arc extinction d on the contact opening speed v are studied for break arcs generated in a 48VDC resistive circuit at constant contact opening speeds. The opening speed v is varied over a wide range from 0.05 to 0.5 m/s. The circuit current I0 while the electrical contacts are closed is set to 10A, 20A, 50A, 100A, 200A, and 300A. The following results were obtained. For each current I0, the arc duration D decreased with increasing contact opening speed v. However, D at I0=300A was shorter than that at I0=200A. On the other hand, the contact gap at arc extinction d tended to increase with increasing I0. However, d at I0=300A was shorter than that at I0=200A. The gap d was almost constant with increasing v for each current I0 below 200A. However, d became shorter when v was slower at I0=200A and 300A. At v=0.05 m/s, for example, d at I0=300A was shorter than that at I0=100A. In addition, to explain the behavior of d, the arc length just before extinction, L, was analyzed. The length L tended to increase with increasing current I0. L was almost constant with increasing v when I0 was lower than 200A. However, when I0=200A and 300A, L tended to become longer when v was slower. The characteristics of d are discussed using the analyzed results of L and the motion of break arcs. At the higher currents of I0=200A and 300A, the shorter d at the slowest v was caused by wide motion of the arc spots on the contact surfaces and larger deformation of the break arcs.
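
    Because the contacts open at a constant speed, the gap at extinction and the arc duration are linked by d = v·D, assuming the arc ignites at contact separation (zero gap) and the speed is constant throughout. With v in m/s and D in ms, the product is directly in mm. The helper names below are hypothetical and the numbers are illustrative only.

```python
def contact_gap_mm(v_m_per_s: float, arc_duration_ms: float) -> float:
    """d = v * D; (m/s) * (ms) = 1e-3 m = mm. Assumes constant speed from zero gap."""
    return v_m_per_s * arc_duration_ms


def arc_duration_ms(v_m_per_s: float, gap_mm: float) -> float:
    """Inverse relation D = d / v."""
    return gap_mm / v_m_per_s


# Illustrative: a 20 ms arc at 0.05 m/s ends at a 1 mm gap; at 0.5 m/s, 10 mm.
print(contact_gap_mm(0.05, 20.0), contact_gap_mm(0.5, 20.0))
```

    This relation explains why d can stay roughly constant while D falls with increasing v: the two effects cancel when D scales as 1/v.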

  • Flexible Bayesian Inference by Weight Transfer for Robust Deep Neural Networks

    Thi Thu Thao KHONG  Takashi NAKADA  Yasuhiko NAKASHIMA  

     
    PAPER-Image Recognition, Computer Vision

      Publicized:
    2021/07/28
      Vol:
    E104-D No:11
      Page(s):
    1981-1991

    Adversarial attacks are viewed as a danger to Deep Neural Networks (DNNs) because they reveal a weakness of deep learning models in security-critical applications. Recent studies have presented adversarial training as an outstanding defense against adversaries. Nonetheless, adversarial training is challenging for big datasets and large networks. It is believed that, without making DNN architectures larger, it is hard to strengthen the robustness of DNNs to adversarial examples. To avoid iterative adversarial training, we propose Bayes without Bayesian Learning (BwoBL), an algorithm that performs ensemble inference to improve robustness. As an application of transfer learning, we use the learned parameters of pretrained DNNs to build Bayesian Neural Networks (BNNs) and focus on Bayesian inference without the cost of Bayesian learning. Without adversarial training, our method is more robust than activation functions designed to enhance adversarial robustness. Moreover, BwoBL can easily be integrated into any pretrained DNN, not only Convolutional Neural Networks (CNNs) but also other DNNs, such as Self-Attention Networks (SANs), which outperform their convolutional counterparts. BwoBL is also convenient to apply to scaling networks, e.g., ResNet and EfficientNet, with better performance. In particular, our algorithm employs a variety of DNN architectures to construct BNNs against diverse adversarial attacks on a large-scale dataset. Under an l∞-norm PGD attack with pixel perturbation ε=4/255 and 100 iterations on ImageNet, our proposal, combined with naturally pretrained ResNets, SANs, and EfficientNets, increases top-5 accuracy by 58.18% on average. This enhancement is 62.26% on average under the l2-norm C&W attack.
    Combining our proposed method with EfficientNets pretrained on both natural and adversarial images (EfficientNet-ADV) drastically boosts robustness against PGD and C&W attacks without additional training. Our EfficientNet-ADV-B7 achieves cutting-edge top-5 accuracies of 92.14% and 94.20% on adversarial ImageNet generated by powerful PGD and C&W attacks, respectively.
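
    Bayesian inference built from pretrained weights, without Bayesian training, can be sketched as Monte Carlo ensembling: sample weights around the transferred values and average the softmax outputs. Here a Gaussian with a fixed `sigma` centered on the pretrained weights of a single linear layer stands in for the BNN; that distribution, `sigma`, and the function names are assumptions, and the paper's weight-transfer scheme may differ.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def bayesian_ensemble_predict(x, w_pretrained, b_pretrained, sigma=0.05,
                              n_samples=16, seed=0):
    """Average predictions over weights sampled from N(w_pretrained, sigma^2).

    x: (N, D) inputs; w_pretrained: (D, K); b_pretrained: (K,).
    """
    rng = np.random.default_rng(seed)
    probs = np.zeros((x.shape[0], w_pretrained.shape[1]))
    for _ in range(n_samples):
        w = w_pretrained + sigma * rng.standard_normal(w_pretrained.shape)
        probs += softmax(x @ w + b_pretrained)
    return probs / n_samples
```

    With `sigma=0` the ensemble collapses to the deterministic pretrained model, which is a useful check that the transfer itself is correct.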

  • DNN-Based Low-Musical-Noise Single-Channel Speech Enhancement Based on Higher-Order-Moments Matching

    Satoshi MIZOGUCHI  Yuki SAITO  Shinnosuke TAKAMICHI  Hiroshi SARUWATARI  

     
    PAPER-Speech and Hearing

      Publicized:
    2021/07/30
      Vol:
    E104-D No:11
      Page(s):
    1971-1980

    We propose deep neural network (DNN)-based speech enhancement that reduces musical noise and achieves better auditory impressions. Musical noise is an artifact generated by nonlinear signal processing that negatively affects auditory impressions. We aim to develop musical-noise-free speech enhancement methods that suppress musical noise generation and produce perceptually comfortable enhanced speech. DNN-based speech enhancement using a soft mask achieves high noise reduction but generates musical noise in non-speech regions. Therefore, we first define kurtosis matching for DNN-based low-musical-noise speech enhancement. Kurtosis is the fourth-order standardized moment and is known to correlate with the amount of musical noise. Kurtosis matching is a penalty term in DNN training that works to reduce the amount of musical noise. We further extend this scheme to standardized-moment matching. The extended scheme uses moments of order higher than kurtosis and generalizes the conventional musical-noise-free method based on kurtosis matching. We formulate standardized-moment matching and explore how effectively the higher-order moments reduce the amount of musical noise. Experimental evaluation results 1) demonstrate that kurtosis matching can reduce musical noise without negatively affecting noise suppression and 2) newly reveal that sixth-moment matching also achieves low-musical-noise speech enhancement, as kurtosis matching does.
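
    A standardized moment of order k is E[(x − μ)^k] / σ^k; kurtosis is the case k = 4, and spiky (musical-noise-like) distributions have larger values. The sketch below computes it and a matching penalty as the squared difference of log moments between enhanced and reference signals. The log-difference form and applying it to a flat array (rather than, for instance, power-spectrogram bins in non-speech regions) are assumptions for illustration, not the paper's exact loss.

```python
import numpy as np


def standardized_moment(x, order):
    """E[(x - mu)^k] / sigma^k; order=4 gives kurtosis."""
    mu = x.mean()
    sigma = x.std()
    return np.mean((x - mu) ** order) / sigma ** order


def moment_matching_penalty(enhanced, reference, order=4, eps=1e-12):
    """Squared log-ratio of standardized moments (assumed penalty form); use even orders."""
    m_est = standardized_moment(enhanced, order)
    m_ref = standardized_moment(reference, order)
    return (np.log(m_est + eps) - np.log(m_ref + eps)) ** 2
```

    In training, such a term would be added to the usual enhancement loss with a weight, using `order=4` for kurtosis matching or `order=6` for sixth-moment matching.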

  • Smaller Residual Network for Single Image Depth Estimation

    Andi HENDRA  Yasushi KANAZAWA  

     
    PAPER-Image Recognition, Computer Vision

      Publicized:
    2021/08/17
      Vol:
    E104-D No:11
      Page(s):
    1992-2001

    We propose a new framework for estimating depth information from a single image. Our framework is relatively small and straightforward, employing a two-stage architecture: a residual network and a simple decoder network. The residual network in this paper is a remodeled version of the original ResNet-50 architecture, consisting of only thirty-eight convolution layers in the residual block, followed by a pair of two up-sampling and layers. The simple decoder network, a stack of five convolution layers, takes the initial depth and refines it into the final output depth. During training, we monitor the loss behavior and adjust the learning rate hyperparameter to improve performance. Furthermore, instead of using a single common pixel-wise loss, we also compute losses based on gradient direction and structural similarity. This setting significantly reduces the number of network parameters while yielding a more accurate depth map. The performance of our approach is evaluated through both quantitative and qualitative comparisons with several prior related methods on the publicly available NYU and KITTI datasets.
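
    A combined depth loss of the kind described (pixel-wise term plus gradient-direction and structural-similarity terms) can be sketched as follows. The gradient-direction term compares surface normals (−dx, −dy, 1) via cosine similarity, and the SSIM is computed globally rather than over sliding windows; both choices and the equal weights are simplifying assumptions, not the paper's exact formulation.

```python
import numpy as np


def normal_loss(pred, target):
    """1 - cosine similarity of surface normals built from depth gradients."""
    def normals(d):
        dx = d[:-1, 1:] - d[:-1, :-1]
        dy = d[1:, :-1] - d[:-1, :-1]
        n = np.stack([-dx, -dy, np.ones_like(dx)], axis=-1)
        return n / np.linalg.norm(n, axis=-1, keepdims=True)
    cos = (normals(pred) * normals(target)).sum(axis=-1)
    return np.mean(1.0 - cos)


def global_ssim(pred, target, c1=0.01 ** 2, c2=0.03 ** 2):
    """SSIM computed over the whole map (no sliding window)."""
    mu_p, mu_t = pred.mean(), target.mean()
    var_p, var_t = pred.var(), target.var()
    cov = ((pred - mu_p) * (target - mu_t)).mean()
    return ((2 * mu_p * mu_t + c1) * (2 * cov + c2)) / (
        (mu_p ** 2 + mu_t ** 2 + c1) * (var_p + var_t + c2))


def depth_loss(pred, target, w_pix=1.0, w_dir=1.0, w_ssim=1.0):
    l_pix = np.abs(pred - target).mean()               # pixel-wise L1
    l_dir = normal_loss(pred, target)                  # gradient direction
    l_ssim = (1.0 - global_ssim(pred, target)) / 2.0   # structural similarity
    return w_pix * l_pix + w_dir * l_dir + w_ssim * l_ssim
```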

  • A Multi-Task Scheme for Supervised DNN-Based Single-Channel Speech Enhancement by Using Speech Presence Probability as the Secondary Training Target

    Lei WANG  Jie ZHU  Kangbo SUN  

    This paper has been cancelled due to a violation of the duplicate submission policy of IEICE Transactions on Information and Systems.
     
    PAPER-Speech and Hearing

      Publicized:
    2021/08/05
      Vol:
    E104-D No:11
      Page(s):
    1963-1970

    To cope with complicated interference scenarios in realistic acoustic environments, supervised deep neural networks (DNNs) are investigated to estimate different user-defined targets. Such techniques can be broadly categorized into magnitude estimation and time-frequency mask estimation techniques. Further, a mask such as the Wiener gain can be estimated directly or derived from the estimated interference power spectral density (PSD) or the estimated signal-to-interference ratio (SIR). In this paper, we propose to incorporate multi-task learning into DNN-based single-channel speech enhancement by using the speech presence probability (SPP) as a secondary target that assists the target estimation in the main task. Domain-specific information is shared between the two tasks to learn a more generalizable representation. Since the performance of a multi-task network is sensitive to the weights of the loss function, homoscedastic uncertainty is introduced to learn the weights adaptively, which is shown to outperform the fixed weighting method. Simulation results show that the proposed multi-task scheme improves overall speech enhancement performance compared with conventional single-task methods, and that joint direct mask and SPP estimation yields the best performance among all the considered techniques.
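
    Two of the quantities above have compact standard forms: the Wiener gain G = SIR / (1 + SIR) with SIR as a linear power ratio, and homoscedastic-uncertainty weighting of task losses, Σ (1 / (2σᵢ²)) Lᵢ + log σᵢ, written here with learnable log-variances sᵢ = log σᵢ². The sketch assumes this common regression form of uncertainty weighting; the paper's exact variant is not stated.

```python
import numpy as np


def wiener_gain_from_sir(sir):
    """Wiener gain G = SIR / (1 + SIR), with SIR as a linear power ratio."""
    return sir / (1.0 + sir)


def uncertainty_weighted_loss(task_losses, log_vars):
    """sum_i 0.5 * exp(-s_i) * L_i + 0.5 * s_i, where s_i = log(sigma_i^2) is learned."""
    task_losses = np.asarray(task_losses, dtype=float)
    log_vars = np.asarray(log_vars, dtype=float)
    return float(np.sum(0.5 * np.exp(-log_vars) * task_losses + 0.5 * log_vars))
```

    A larger learned sᵢ down-weights task i, while the + 0.5·sᵢ term keeps the network from discarding a task entirely.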

  • Faster SET Operation in Phase Change Memory with Initialization Open Access

    Yuchan WANG  Suzhen YUAN  Wenxia ZHANG  Yuhan WANG  

     
    PAPER-Electronic Materials

      Publicized:
    2021/04/14
      Vol:
    E104-C No:11
      Page(s):
    651-655

    In conclusion, an initialization method has been introduced and studied to improve the SET speed in PCM. Before experimental verification, a two-dimensional finite analysis is used, and the results show that the proposed method is feasible for improving SET speed. Next, the R-I performance of a discrete PCM device and the resistance distributions of a 64-Mbit PCM test chip with and without the initialization are studied and analyzed, confirming that the writing speed is greatly improved. At the same time, the resistance distribution for repeated initialization operations suggests that a large number of PCM cells have been successfully changed into an intermediate state, so that only a shorter current pulse is thought to be needed to SET these cells successfully. Comparing transmission electron microscope (TEM) images before and after initialization shows that some small grains appear after initialization, which indicates that the nucleation process of GST has already taken place and that only energy for grain growth needs to be provided later.

  • Gradient Corrected Approximation for Binary Neural Networks

    Song CHENG  Zixuan LI  Yongsen WANG  Wanbing ZOU  Yumei ZHOU  Delong SHANG  Shushan QIAO  

     
    LETTER-Biocybernetics, Neurocomputing

      Publicized:
    2021/07/05
      Vol:
    E104-D No:10
      Page(s):
    1784-1788

    Binary neural networks (BNNs), in which both activations and weights are radically quantized to {-1, +1}, can massively accelerate the run-time performance of convolutional neural networks (CNNs) on edge devices by reducing computational complexity and saving memory footprint. However, the non-differentiable binarizing function used in BNNs makes the binarized models hard to optimize and causes significant performance degradation compared with full-precision models. Many previous works corrected the backward gradient of the binarizing function with various improved versions of the straight-through estimator (STE) or with a gradual approximation approach, but the gradient suppression problem was not analyzed or handled. We therefore propose a novel gradient corrected approximation (GCA) method that bridges the discrepancy between the binarizing function and its backward gradient in a gradual and stable way. Our work has two primary contributions. The first is to approximate the backward gradient of the binarizing function using a simple leaky-steep function with a variable window size. The second is to correct the gradient approximation by standardizing the backward gradient propagated through the binarizing function. Experimental results show that the proposed method outperforms the baseline by 1.5% Top-1 accuracy on the ImageNet dataset without introducing extra computation cost.
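
    The two contributions can be illustrated in NumPy: a forward sign binarization, a surrogate derivative that is steep (1/window) inside a window and leaky outside, and a backward pass rescaled so the propagated gradient keeps the upstream standard deviation. The exact leaky-steep shape, the window schedule (shrinking the window over training makes the surrogate approach the sign function), and std-rescaling as the form of "standardizing" are all assumptions, not the paper's definitions.

```python
import numpy as np


def binarize(x):
    """Forward pass: sign with sign(0) = +1."""
    return np.where(x >= 0, 1.0, -1.0)


def leaky_steep_grad(x, window, leak=0.01):
    """Surrogate derivative: 1/window inside |x| <= window, `leak` outside."""
    return np.where(np.abs(x) <= window, 1.0 / window, leak)


def corrected_backward(upstream, x, window, leak=0.01, eps=1e-12):
    """Backward pass through binarize with gradient standardization (assumed form)."""
    g = upstream * leaky_steep_grad(x, window, leak)
    # Rescale so the propagated gradient keeps the upstream standard deviation.
    return g * (upstream.std() / (g.std() + eps))
```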

  • Triplet Attention Network for Video-Based Person Re-Identification

    Rui SUN  Qili LIANG  Zi YANG  Zhenghui ZHAO  Xudong ZHANG  

     
    LETTER-Image Recognition, Computer Vision

      Publicized:
    2021/07/21
      Vol:
    E104-D No:10
      Page(s):
    1775-1779

    Video-based person re-identification (re-ID) aims to retrieve persons across non-overlapping cameras and has achieved promising results owing to deep convolutional neural networks. Due to the dynamic properties of video, the problems of background clutter and occlusion are more serious than in image-based person re-ID. In this letter, we present a novel triplet attention network (TriANet) that simultaneously utilizes temporal, spatial, and channel context information by employing the self-attention mechanism to obtain robust and discriminative features. Specifically, the network has two parts. The first part introduces a residual attention subnetwork, which contains a channel attention module that captures cross-dimension dependencies by using rotation and transformation, and a spatial attention module that focuses on pedestrian features. In the second part, a temporal attention module is designed to judge the quality score of each pedestrian and to reduce the weight of incomplete pedestrian images, alleviating the occlusion problem. We evaluate our proposed architecture on three datasets: iLIDS-VID, PRID2011, and MARS. Extensive comparative experimental results show that our proposed method achieves state-of-the-art results.
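
    The temporal attention step can be sketched as softmax-normalized per-frame quality scores used to pool frame features into one sequence feature, so that frames with low scores (for example, occluded or incomplete pedestrians) contribute little. How the scores are produced by the network is not shown; the function name and interface are hypothetical.

```python
import numpy as np


def temporal_attention_pool(frame_features, quality_scores):
    """frame_features: (T, D); quality_scores: (T,) unnormalized per-frame scores.

    Returns the attention-weighted sequence feature (D,) and the weights (T,).
    """
    s = quality_scores - quality_scores.max()   # numerical stability
    weights = np.exp(s) / np.exp(s).sum()       # softmax over frames
    return weights @ frame_features, weights
```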
