
Keyword Search Result

[Keyword] distillation (12 hits)

1-12 of 12 hits
  • Improving Sliced Wasserstein Distance with Geometric Median for Knowledge Distillation Open Access

    Hongyun LU  Mengmeng ZHANG  Hongyuan JING  Zhi LIU  

     
    LETTER-Fundamentals of Information Systems

      Publicized: 2024/03/08  Vol: E107-D No:7  Page(s): 890-893

    Currently, the most advanced knowledge distillation models use a metric learning approach based on probability distributions. However, the correlation between supervised probability distributions is typically geometric and implicit, causing inefficiency and an inability to capture structural feature representations across different tasks. To overcome this problem, we propose a knowledge distillation loss using the robust sliced Wasserstein distance with geometric median (GMSW) to estimate the differences between the teacher and student representations. Owing to the intuitive geometric properties of GMSW, the student model can effectively learn to align its hidden states with those produced by the teacher model, thereby establishing a robust correlation among implicit features. In experiments, our method outperforms state-of-the-art models in both high-resource and low-resource settings.
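
    A minimal sketch of such a loss (not the authors' code; the geometric-median aggregation over slices is approximated here by a per-slice median, which is an assumption):

      import torch

      def sliced_wasserstein_kd_loss(student_feats, teacher_feats, n_slices=64):
          # student_feats, teacher_feats: (batch, dim) hidden representations
          dim = student_feats.shape[1]
          # random unit projection directions (the "slices")
          theta = torch.randn(dim, n_slices, device=student_feats.device)
          theta = theta / theta.norm(dim=0, keepdim=True)
          proj_s = student_feats @ theta   # (batch, n_slices)
          proj_t = teacher_feats @ theta
          # 1-D Wasserstein distance per slice: compare sorted projections
          dist = (proj_s.sort(dim=0).values - proj_t.sort(dim=0).values).pow(2).mean(dim=0)
          # robust aggregation over slices in place of the paper's geometric median
          return dist.median()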

  • Dataset Distillation Using Parameter Pruning Open Access

    Guang LI  Ren TOGO  Takahiro OGAWA  Miki HASEYAMA  

     
    LETTER-Image

      Publicized: 2023/09/06  Vol: E107-A No:6  Page(s): 936-940

    In this study, we propose a novel dataset distillation method based on parameter pruning. The proposed method can synthesize more robust distilled datasets and improve distillation performance by pruning difficult-to-match parameters during the distillation process. Experimental results on two benchmark datasets show the superiority of the proposed method.
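
    One plausible reading of the method, as a hedged sketch (the matching criterion and pruning rule below are illustrative assumptions, not the authors' implementation): during distillation, the parameters whose matching error is largest are masked out of the loss.

      import torch

      def pruned_matching_loss(student_params, expert_params, prune_ratio=0.1):
          # student_params / expert_params: flat weight tensors obtained by
          # training on the distilled and the real dataset, respectively
          err = (student_params - expert_params).pow(2)
          # prune the fraction of parameters that are hardest to match
          k = int(err.numel() * prune_ratio)
          if k == 0:
              return err.mean()
          threshold = err.topk(k).values.min()
          return err[err < threshold].mean()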

  • PSDSpell: Pre-Training with Self-Distillation Learning for Chinese Spelling Correction Open Access

    Li HE  Xiaowu ZHANG  Jianyong DUAN  Hao WANG  Xin LI  Liang ZHAO  

     
    PAPER

      Publicized: 2023/10/25  Vol: E107-D No:4  Page(s): 495-504

    Chinese spelling correction (CSC) models detect and correct typos in a text based on the misspelled characters and their context. Recently, BERT-based models have dominated research on Chinese spelling correction. However, these methods focus only on the semantic information of the text during the pretraining stage, neglecting to learn how to correct spelling errors. Moreover, when a text contains multiple incorrect characters, the context introduces noisy information, making it difficult for the model to accurately detect their positions and leading to false corrections. To address these limitations, we apply the multimodal pre-trained language model ChineseBert to the task of spelling correction. We propose a self-distillation learning-based pretraining strategy in which a confusion set is used to construct text containing erroneous characters, allowing the model to jointly learn language understanding and spelling correction. Additionally, we introduce a single-channel masking mechanism to mitigate the noise caused by incorrect characters. This mechanism masks the semantic encoding channel while preserving the phonetic and glyph encoding channels, reducing the noise introduced by incorrect characters during prediction. Finally, experiments are conducted on widely used benchmarks. Our model achieves a remarkable gain over state-of-the-art methods.
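
    A hedged sketch of the single-channel masking idea (tensor names and the concatenation-style fusion are illustrative assumptions; ChineseBert's actual fusion differs):

      import torch

      def single_channel_mask(sem_emb, pho_emb, gly_emb, mask_positions):
          # sem/pho/gly_emb: (batch, seq, dim) embeddings from the three
          # input channels; mask_positions: (batch, seq) bool tensor
          # marking suspected typo positions
          sem_emb = sem_emb.masked_fill(mask_positions.unsqueeze(-1), 0.0)
          # the phonetic and glyph channels stay intact, so the model can
          # still recover the intended character from sound and shape
          return torch.cat([sem_emb, pho_emb, gly_emb], dim=-1)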

  • Research on Lightweight Acoustic Scene Perception Method Based on Drunkard Methodology

    Wenkai LIU  Lin ZHANG  Menglong WU  Xichang CAI  Hongxia DONG  

     
    PAPER-Artificial Intelligence, Data Mining

      Publicized: 2023/10/23  Vol: E107-D No:1  Page(s): 83-92

    The goal of Acoustic Scene Classification (ASC) is to simulate human analysis of the surrounding environment and make accurate decisions promptly. Extracting useful information from audio signals in real-world scenarios is challenging and can lead to suboptimal performance in acoustic scene classification, especially in environments with relatively homogeneous backgrounds. To address this problem, we model the sobering-up process of "drunkards" in real life and the guiding behavior of normal people, and construct a high-precision lightweight model implementation methodology called the "drunkard methodology". The core idea comprises three parts: (1) designing a special feature transformation module, based on the different mechanisms of information perception between drunkards and ordinary people, to simulate the process of gradually sobering up and the accompanying changes in feature perception ability; (2) devising a lightweight "drunken" model that matches the normal model's perception process. The model uses a multi-scale class-residual block structure and obtains finer feature representations by fusing information extracted at different scales; (3) introducing a guidance-and-fusion module from the conventional model into the "drunken" model to speed up the sobering-up process and achieve iterative optimization and accuracy improvement. Evaluation results on the official dataset of DCASE2022 Task1 demonstrate that our baseline system achieves 40.4% accuracy and 2.284 loss with 442.67K parameters and 19.40M MACs (multiply-accumulate operations). After adopting the "drunkard" mechanism, the accuracy improves to 45.2% and the loss decreases by 0.634 with 551.89K parameters and 23.6M MACs.
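
    A hedged sketch of the multi-scale class-residual block mentioned in part (2) (the kernel sizes and fusion layer are illustrative assumptions):

      import torch
      import torch.nn as nn

      class MultiScaleResidualBlock(nn.Module):
          # fuse features extracted at several kernel scales, then add a
          # residual connection back to the input
          def __init__(self, channels):
              super().__init__()
              self.branches = nn.ModuleList([
                  nn.Conv2d(channels, channels, k, padding=k // 2)
                  for k in (1, 3, 5)
              ])
              self.fuse = nn.Conv2d(3 * channels, channels, 1)

          def forward(self, x):
              multi = torch.cat([b(x) for b in self.branches], dim=1)
              return x + self.fuse(multi)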

  • Distilling Distribution Knowledge in Normalizing Flow

    Jungwoo KWON  Gyeonghwan KIM  

     
    LETTER-Artificial Intelligence, Data Mining

      Publicized: 2023/04/26  Vol: E106-D No:8  Page(s): 1287-1291

    In this letter, we propose a feature-based knowledge distillation scheme that transfers knowledge between intermediate blocks of a teacher and a student with flow-based architectures, specifically normalizing flows in our implementation. In addition to the knowledge transfer scheme, we examine how the configuration of the distillation positions affects the knowledge transfer performance. To evaluate the proposed ideas, we choose two knowledge distillation baseline models based on normalizing flows in different domains: CS-Flow for anomaly detection and SRFlow-DA for super-resolution. A set of performance comparisons with the baseline models on popular benchmark datasets shows promising results, along with improved inference speed. The comparison includes a performance analysis based on various configurations of the distillation positions in the proposed scheme.
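
    A minimal sketch of feature-based distillation at chosen positions between teacher and student blocks (the adapter layers are an assumption to reconcile channel widths; this is not the authors' code):

      import torch.nn.functional as F

      def block_distillation_loss(teacher_feats, student_feats, adapters):
          # teacher_feats / student_feats: lists of intermediate activations
          # taken at the configured distillation positions;
          # adapters: 1x1 convs mapping student channels to teacher channels
          loss = 0.0
          for t, s, a in zip(teacher_feats, student_feats, adapters):
              loss = loss + F.mse_loss(a(s), t)
          return loss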

  • A Low-Cost Neural ODE with Depthwise Separable Convolution for Edge Domain Adaptation on FPGAs

    Hiroki KAWAKAMI  Hirohisa WATANABE  Keisuke SUGIURA  Hiroki MATSUTANI  

     
    PAPER-Computer System

      Publicized: 2023/04/05  Vol: E106-D No:7  Page(s): 1186-1197

    High-performance deep neural network (DNN)-based systems are in high demand in edge environments. Due to their high computational complexity, it is challenging to deploy DNNs on edge devices with strict limitations on computational resources. In this paper, we derive a compact yet highly accurate DNN model, termed dsODENet, by combining recently proposed parameter reduction techniques: Neural ODE (Ordinary Differential Equation) and DSC (Depthwise Separable Convolution). Neural ODE exploits a similarity between ResNet and ODEs, and shares most of the weight parameters among multiple layers, which greatly reduces memory consumption. We apply dsODENet to domain adaptation as a practical use case with image classification datasets. We also propose a resource-efficient FPGA-based design for dsODENet, in which all the parameters and feature maps except for the pre- and post-processing layers can be mapped onto on-chip memories. It is implemented on a Xilinx ZCU104 board and evaluated in terms of domain adaptation accuracy, inference speed, FPGA resource utilization, and speedup rate over a software counterpart. The results demonstrate that dsODENet achieves comparable or slightly better domain adaptation accuracy than our baseline Neural ODE implementation, while the total parameter size without the pre- and post-processing layers is reduced by 54.2% to 79.8%. Our FPGA implementation accelerates inference by 23.8 times.
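
    A minimal sketch of the two combined techniques, an Euler-discretized ODE block whose single depthwise separable convolution is reused at every step (the step count and activation are illustrative, not the paper's configuration):

      import torch
      import torch.nn as nn

      class DSConv(nn.Module):
          # depthwise separable convolution: depthwise then pointwise
          def __init__(self, channels):
              super().__init__()
              self.depthwise = nn.Conv2d(channels, channels, 3, padding=1,
                                         groups=channels)
              self.pointwise = nn.Conv2d(channels, channels, 1)

          def forward(self, x):
              return self.pointwise(self.depthwise(x))

      class NeuralODEBlock(nn.Module):
          # one DSConv reused for every Euler step, so its weights are
          # shared across what would otherwise be several ResNet layers
          def __init__(self, channels, steps=4):
              super().__init__()
              self.f = DSConv(channels)
              self.steps = steps

          def forward(self, x):
              h = 1.0 / self.steps
              for _ in range(self.steps):
                  x = x + h * torch.relu(self.f(x))
              return x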

  • A Novel Multi-Knowledge Distillation Approach

    Lianqiang LI  Kangbo SUN  Jie ZHU  

     
    LETTER-Artificial Intelligence, Data Mining

      Publicized: 2020/10/19  Vol: E104-D No:1  Page(s): 216-219

    Knowledge distillation approaches can transfer information from a large network (the teacher network) to a small network (the student network) to compress and accelerate deep neural networks. This paper proposes a novel knowledge distillation approach called multi-knowledge distillation (MKD). MKD consists of two stages. In the first stage, it employs autoencoders to learn compact and precise representations of the feature maps (FM) from the teacher network and the student network; these representations can be treated as the essence of the FM, i.e., the EFM. In the second stage, MKD utilizes multiple kinds of knowledge, i.e., the magnitude of each individual sample's EFM and the similarity relationships among several samples' EFMs, to enhance the generalization ability of the student network. Compared with previous approaches that employ the FM or handcrafted features derived from the FM, the EFM learned from autoencoders can be transferred more efficiently and reliably. Furthermore, the rich information provided by the multiple kinds of knowledge enables the student network to mimic the teacher network as closely as possible. Experimental results also show that MKD is superior to state-of-the-art approaches.
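
    A hedged sketch of the second-stage losses under one reading of the abstract (the exact distance functions are assumptions):

      import torch.nn.functional as F

      def mkd_loss(efm_student, efm_teacher):
          # efm_*: (batch, code_dim) autoencoder codes of the feature maps
          # knowledge 1: magnitude of each individual sample's EFM
          mag = F.mse_loss(efm_student.norm(dim=1), efm_teacher.norm(dim=1))
          # knowledge 2: similarity relationships among samples' EFMs
          s = F.normalize(efm_student, dim=1)
          t = F.normalize(efm_teacher, dim=1)
          return mag + F.mse_loss(s @ s.T, t @ t.T)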

  • SENTEI: Filter-Wise Pruning with Distillation towards Efficient Sparse Convolutional Neural Network Accelerators

    Masayuki SHIMODA  Youki SADA  Ryosuke KURAMOCHI  Shimpei SATO  Hiroki NAKAHARA  

     
    PAPER-Computer System

      Publicized: 2020/08/03  Vol: E103-D No:12  Page(s): 2463-2470

    In the realization of convolutional neural networks (CNNs) on resource-constrained embedded hardware, the memory footprint of weights is one of the primary problems. Pruning techniques are often used to reduce the number of weights. However, the distribution of nonzero weights is highly skewed, which makes it difficult to exploit the underlying parallelism. To address this problem, we present SENTEI, filter-wise pruning with distillation, to realize a hardware-aware network architecture with comparable accuracy. The filter-wise pruning eliminates weights such that each filter has the same number of nonzero weights, and retraining with distillation retains the accuracy. Further, we develop a zero-weight-skipping inter-layer pipelined accelerator on an FPGA. The equalization enables inter-filter parallelism, in which a processing block for a layer executes filters concurrently with a straightforward architecture. Our evaluation on semantic-segmentation tasks indicates that the resulting mIoU decreased by only 0.4 points. Additionally, the speedup and power efficiency of our FPGA implementation were 33.2× and 87.9× higher than those of a mobile GPU. Therefore, our technique realizes a hardware-aware network with comparable accuracy.
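
    The equalization rule itself is simple; a sketch (magnitude-based selection is an assumption, and the retraining-with-distillation pipeline is omitted):

      import torch

      def filter_wise_prune(weight, keep_per_filter):
          # weight: (out_ch, in_ch, k, k); keep exactly keep_per_filter
          # nonzero weights in every filter so filters can run in lockstep
          out_ch = weight.shape[0]
          flat = weight.reshape(out_ch, -1)
          idx = flat.abs().topk(keep_per_filter, dim=1).indices
          mask = torch.zeros_like(flat)
          mask.scatter_(1, idx, 1.0)
          return (flat * mask).reshape_as(weight)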

  • Multi Model-Based Distillation for Sound Event Detection Open Access

    Yingwei FU  Kele XU  Haibo MI  Qiuqiang KONG  Dezhi WANG  Huaimin WANG  Tie HONG  

     
    LETTER-Artificial Intelligence, Data Mining

      Publicized: 2019/07/08  Vol: E102-D No:10  Page(s): 2055-2058

    Sound event detection is intended to identify the sound events in audio recordings, and it has widespread applications in real life. Recently, convolutional recurrent neural network (CRNN) models have achieved state-of-the-art performance in this task due to their capability to learn representative features. However, CRNN models have high complexity, with millions of parameters to be trained, which limits their usage on mobile and embedded devices with limited computational resources. Model distillation is effective in transferring the knowledge of a complex model to a smaller one that can be deployed on devices with limited computational power. In this letter, we propose a novel multi-model-based distillation approach for sound event detection that makes use of the knowledge from multiple teacher models which are complementary in detecting sound events. Extensive experimental results demonstrate that our approach achieves a compression ratio of about 50×. In addition, better performance is obtained for the sound event detection task.
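
    A minimal sketch of distilling from several teachers (uniform averaging of the softened teacher outputs is an assumption; the paper may combine the teachers differently):

      import torch
      import torch.nn.functional as F

      def multi_teacher_kd_loss(student_logits, teacher_logits_list, T=2.0):
          # average the teachers' softened class distributions as the target
          target = torch.stack(
              [F.softmax(t / T, dim=-1) for t in teacher_logits_list]
          ).mean(dim=0)
          log_p = F.log_softmax(student_logits / T, dim=-1)
          return F.kl_div(log_p, target, reduction="batchmean") * T * T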

  • A Numerical Evaluation of Entanglement Sharing Protocols Using Quantum LDPC CSS Codes

    Masakazu YOSHIDA  Manabu HAGIWARA  Takayuki MIYADERA  Hideki IMAI  

     
    PAPER-Information Theory

      Vol: E95-A No:9  Page(s): 1561-1569

    Entangled states play crucial roles in quantum information theory and its applied technologies. In various protocols, such as quantum teleportation and quantum key distribution, a good entangled state shared by a pair of distant players is indispensable. In this paper, we numerically examine entanglement sharing protocols using quantum LDPC CSS codes. The sum-product decoding method enables us to detect uncorrectable errors, and thus two protocols are considered: the Detection and Resending (DR) protocol and the Non-Detection (ND) protocol. In the DR protocol, the players abort and repeat the protocol if they detect uncorrectable errors, whereas in the ND protocol they do not abort. We show that the DR protocol yields a smaller error rate than the ND protocol. In addition, we show that rather high reliability can be achieved by the DR protocol with quantum LDPC CSS codes.
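
    A toy Monte Carlo illustration of why DR beats ND (the failure probabilities are illustrative stand-ins, not values from the paper; a real round would run sum-product decoding on a quantum LDPC CSS code):

      import random

      def error_rate(p_detected, p_silent, trials=100_000, dr=True):
          # each round succeeds, fails detectably (the decoder flags an
          # uncorrectable error), or fails silently; DR repeats the round
          # on a detected failure, while ND accepts the state regardless
          errors = 0
          for _ in range(trials):
              while True:
                  r = random.random()
                  if r < p_silent:                 # silent failure
                      errors += 1
                      break
                  if r < p_silent + p_detected:    # detected failure
                      if dr:
                          continue                 # abort and repeat
                      errors += 1
                      break
                  break                            # success
          return errors / trials

      # DR trades extra rounds for a lower final error rate:
      print(error_rate(0.2, 0.01, dr=True), error_rate(0.2, 0.01, dr=False))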

  • Secret Key Agreement by Soft-Decision of Signals in Gaussian Maurer's Model

    Masashi NAITO  Shun WATANABE  Ryutaroh MATSUMOTO  Tomohiko UYEMATSU  

     
    PAPER-Information Theory

      Vol: E92-A No:2  Page(s): 525-534

    We consider the problem of secret key agreement in Gaussian Maurer's model. In Gaussian Maurer's model, the legitimate receivers, Alice and Bob, and a wiretapper, Eve, receive signals randomly generated by a satellite through three independent memoryless Gaussian channels. Alice and Bob then generate a common secret key from their received signals. In this model, we propose a protocol for generating a common secret key by using the results of soft decisions on Alice's and Bob's received signals. We then calculate a lower bound on the secret key rate of our proposed protocol. Compared with a protocol that uses only hard decisions, we found that a higher rate is obtained by our protocol.
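
    The core effect, that soft decisions retain more information about the satellite's signal than hard decisions, can be checked numerically; a sketch (the noise level and quantization thresholds are illustrative, and this estimates only a mutual information, not the protocol's key rate bound):

      import numpy as np

      def mutual_information(x, y):
          # plug-in estimate of I(X;Y) in bits from paired discrete samples
          joint = np.zeros((x.max() + 1, y.max() + 1))
          for a, b in zip(x, y):
              joint[a, b] += 1
          joint /= joint.sum()
          px = joint.sum(axis=1, keepdims=True)
          py = joint.sum(axis=0, keepdims=True)
          nz = joint > 0
          return (joint[nz] * np.log2(joint[nz] / (px @ py)[nz])).sum()

      rng = np.random.default_rng(0)
      n = 100_000
      bits = rng.integers(0, 2, n)               # satellite's random bits
      received = (2.0 * bits - 1.0) + rng.normal(0.0, 1.0, n)

      hard = (received > 0).astype(int)               # 1-bit hard decision
      soft = np.digitize(received, [-0.8, 0.0, 0.8])  # 2-bit soft decision

      print("I(bit; hard) =", mutual_information(bits, hard))
      print("I(bit; soft) =", mutual_information(bits, soft))  # larger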

  • Secret Key Capacity and Advantage Distillation Capacity

    Jun MURAMATSU  Kazuyuki YOSHIMURA  Peter DAVIS  

     
    PAPER-Cryptography

      Vol: E89-A No:10  Page(s): 2589-2596

    Secret key agreement is a procedure for agreeing on a secret key by exchanging messages over a public channel when a sender, a legitimate receiver (henceforth referred to as the receiver), and an eavesdropper have access to correlated sources. Maurer [6] defined the secret key capacity, which is the least upper bound on the key generation rate of secret key agreement, and presented upper and lower bounds on the secret key capacity. The advantage distillation capacity is introduced, and it is shown that this quantity equals the secret key capacity. Naive information-theoretic expressions of the secret key capacity and the advantage distillation capacity are also presented. An example of correlated sources for which an analytic expression of the secret key capacity can be obtained is also presented.
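
    For reference, the Maurer bounds alluded to here are commonly stated as follows, with S(X;Y‖Z) denoting the secret key capacity (a standard restatement, best checked against [6]):

      \max\{\, I(X;Y) - I(X;Z),\; I(X;Y) - I(Y;Z) \,\}
        \;\le\; S(X;Y \,\|\, Z) \;\le\;
        \min\{\, I(X;Y),\; I(X;Y \mid Z) \,\}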