
Keyword Search Result

[Keyword] distillation (12 hits)

1-12 of 12 hits
  • Improving Sliced Wasserstein Distance with Geometric Median for Knowledge Distillation Open Access

    Hongyun LU  Mengmeng ZHANG  Hongyuan JING  Zhi LIU  

     
    LETTER-Fundamentals of Information Systems

      Publicized: 2024/03/08  Vol: E107-D No:7  Page(s): 890-893

    Currently, the most advanced knowledge distillation models use a metric learning approach based on probability distributions. However, the correlation between supervised probability distributions is typically geometric and implicit, causing inefficiency and an inability to capture structural feature representations across different tasks. To overcome this problem, we propose a knowledge distillation loss using the robust sliced Wasserstein distance with geometric median (GMSW) to estimate the differences between the teacher and student representations. Owing to the intuitive geometric properties of GMSW, the student model can effectively learn to align its hidden states with those produced by the teacher model, thereby establishing a robust correlation among implicit features. In experiments, our method outperforms state-of-the-art models in both high-resource and low-resource settings.
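
    A minimal sketch of such a loss (not the authors' code; the geometric-median aggregation over slices is approximated here by a per-slice median, which is an assumption):

      import torch

      def sliced_wasserstein_kd_loss(student_feats, teacher_feats, n_slices=64):
          # student_feats, teacher_feats: (batch, dim) hidden representations
          dim = student_feats.shape[1]
          # random unit projection directions (the "slices")
          theta = torch.randn(dim, n_slices, device=student_feats.device)
          theta = theta / theta.norm(dim=0, keepdim=True)
          proj_s = student_feats @ theta   # (batch, n_slices)
          proj_t = teacher_feats @ theta
          # 1-D Wasserstein distance per slice: compare sorted projections
          dist = (proj_s.sort(dim=0).values - proj_t.sort(dim=0).values).pow(2).mean(dim=0)
          # robust aggregation over slices in place of the paper's geometric median
          return dist.median()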

  • Dataset Distillation Using Parameter Pruning Open Access

    Guang LI  Ren TOGO  Takahiro OGAWA  Miki HASEYAMA  

     
    LETTER-Image

      Publicized: 2023/09/06  Vol: E107-A No:6  Page(s): 936-940

    In this study, we propose a novel dataset distillation method based on parameter pruning. The proposed method can synthesize more robust distilled datasets and improve distillation performance by pruning difficult-to-match parameters during the distillation process. Experimental results on two benchmark datasets show the superiority of the proposed method.
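
    One plausible reading of the method, as a hedged sketch (the matching criterion and pruning rule below are illustrative assumptions, not the authors' implementation): during distillation, the parameters whose matching error is largest are masked out of the loss.

      import torch

      def pruned_matching_loss(student_params, expert_params, prune_ratio=0.1):
          # student_params / expert_params: flat weight tensors obtained by
          # training on the distilled and the real dataset, respectively
          err = (student_params - expert_params).pow(2)
          # prune the fraction of parameters that are hardest to match
          k = int(err.numel() * prune_ratio)
          if k == 0:
              return err.mean()
          threshold = err.topk(k).values.min()
          return err[err < threshold].mean()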

  • PSDSpell: Pre-Training with Self-Distillation Learning for Chinese Spelling Correction Open Access

    Li HE  Xiaowu ZHANG  Jianyong DUAN  Hao WANG  Xin LI  Liang ZHAO  

     
    PAPER

      Publicized: 2023/10/25  Vol: E107-D No:4  Page(s): 495-504

    Chinese spelling correction (CSC) models detect and correct typos in a text based on the misspelled characters and their context. Recently, BERT-based models have dominated research on Chinese spelling correction. However, these methods focus only on the semantic information of the text during the pretraining stage, neglecting to learn how to correct spelling errors. Moreover, when a text contains multiple incorrect characters, the context introduces noisy information, making it difficult for the model to accurately detect their positions and leading to false corrections. To address these limitations, we apply the multimodal pre-trained language model ChineseBert to the task of spelling correction. We propose a self-distillation learning-based pretraining strategy in which a confusion set is used to construct text containing erroneous characters, allowing the model to jointly learn language understanding and spelling correction. Additionally, we introduce a single-channel masking mechanism to mitigate the noise caused by incorrect characters. This mechanism masks the semantic encoding channel while preserving the phonetic and glyph encoding channels, reducing the noise introduced by incorrect characters during prediction. Finally, experiments are conducted on widely used benchmarks. Our model achieves a remarkable gain over state-of-the-art methods.
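
    A hedged sketch of the single-channel masking idea (tensor names and the concatenation-style fusion are illustrative assumptions; ChineseBert's actual fusion differs):

      import torch

      def single_channel_mask(sem_emb, pho_emb, gly_emb, mask_positions):
          # sem/pho/gly_emb: (batch, seq, dim) embeddings from the three
          # input channels; mask_positions: (batch, seq) bool tensor
          # marking suspected typo positions
          sem_emb = sem_emb.masked_fill(mask_positions.unsqueeze(-1), 0.0)
          # the phonetic and glyph channels stay intact, so the model can
          # still recover the intended character from sound and shape
          return torch.cat([sem_emb, pho_emb, gly_emb], dim=-1)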

  • Research on Lightweight Acoustic Scene Perception Method Based on Drunkard Methodology

    Wenkai LIU  Lin ZHANG  Menglong WU  Xichang CAI  Hongxia DONG  

     
    PAPER-Artificial Intelligence, Data Mining

      Publicized: 2023/10/23  Vol: E107-D No:1  Page(s): 83-92

    The goal of Acoustic Scene Classification (ASC) is to simulate human analysis of the surrounding environment and make accurate decisions promptly. Extracting useful information from audio signals in real-world scenarios is challenging and can lead to suboptimal performance in acoustic scene classification, especially in environments with relatively homogeneous backgrounds. To address this problem, we model the sobering-up process of "drunkards" in real life and the guiding behavior of normal people, and construct a high-precision lightweight model implementation methodology called the "drunkard methodology". The core idea comprises three parts: (1) designing a special feature transformation module, based on the different mechanisms of information perception between drunkards and ordinary people, to simulate the process of gradually sobering up and the accompanying changes in feature perception ability; (2) devising a lightweight "drunken" model that matches the normal model's perception process. The model uses a multi-scale class-residual block structure and obtains finer feature representations by fusing information extracted at different scales; (3) introducing a guidance-and-fusion module from the conventional model into the "drunken" model to speed up the sobering-up process and achieve iterative optimization and accuracy improvement. Evaluation results on the official dataset of DCASE2022 Task1 demonstrate that our baseline system achieves 40.4% accuracy and 2.284 loss with 442.67K parameters and 19.40M MACs (multiply-accumulate operations). After adopting the "drunkard" mechanism, the accuracy improves to 45.2% and the loss decreases by 0.634 with 551.89K parameters and 23.6M MACs.
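
    A hedged sketch of the multi-scale class-residual block mentioned in part (2) (the kernel sizes and fusion layer are illustrative assumptions):

      import torch
      import torch.nn as nn

      class MultiScaleResidualBlock(nn.Module):
          # fuse features extracted at several kernel scales, then add a
          # residual connection back to the input
          def __init__(self, channels):
              super().__init__()
              self.branches = nn.ModuleList([
                  nn.Conv2d(channels, channels, k, padding=k // 2)
                  for k in (1, 3, 5)
              ])
              self.fuse = nn.Conv2d(3 * channels, channels, 1)

          def forward(self, x):
              multi = torch.cat([b(x) for b in self.branches], dim=1)
              return x + self.fuse(multi)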

  • Distilling Distribution Knowledge in Normalizing Flow

    Jungwoo KWON  Gyeonghwan KIM  

     
    LETTER-Artificial Intelligence, Data Mining

      Publicized: 2023/04/26  Vol: E106-D No:8  Page(s): 1287-1291

    In this letter, we propose a feature-based knowledge distillation scheme that transfers knowledge between intermediate blocks of a teacher and a student with flow-based architectures, specifically normalizing flows in our implementation. In addition to the knowledge transfer scheme, we examine how the configuration of the distillation positions affects the knowledge transfer performance. To evaluate the proposed ideas, we choose two knowledge distillation baseline models based on normalizing flows in different domains: CS-Flow for anomaly detection and SRFlow-DA for super-resolution. A set of performance comparisons with the baseline models on popular benchmark datasets shows promising results, along with improved inference speed. The comparison includes a performance analysis based on various configurations of the distillation positions in the proposed scheme.
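
    A minimal sketch of feature-based distillation at chosen positions between teacher and student blocks (the adapter layers are an assumption to reconcile channel widths; this is not the authors' code):

      import torch.nn.functional as F

      def block_distillation_loss(teacher_feats, student_feats, adapters):
          # teacher_feats / student_feats: lists of intermediate activations
          # taken at the configured distillation positions;
          # adapters: 1x1 convs mapping student channels to teacher channels
          loss = 0.0
          for t, s, a in zip(teacher_feats, student_feats, adapters):
              loss = loss + F.mse_loss(a(s), t)
          return loss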

  • A Low-Cost Neural ODE with Depthwise Separable Convolution for Edge Domain Adaptation on FPGAs

    Hiroki KAWAKAMI  Hirohisa WATANABE  Keisuke SUGIURA  Hiroki MATSUTANI  

     
    PAPER-Computer System

      Publicized: 2023/04/05  Vol: E106-D No:7  Page(s): 1186-1197

    High-performance deep neural network (DNN)-based systems are in high demand in edge environments. Due to their high computational complexity, it is challenging to deploy DNNs on edge devices with strict limitations on computational resources. In this paper, we derive a compact yet highly accurate DNN model, termed dsODENet, by combining recently proposed parameter reduction techniques: Neural ODE (Ordinary Differential Equation) and DSC (Depthwise Separable Convolution). Neural ODE exploits a similarity between ResNet and ODEs, and shares most of the weight parameters among multiple layers, which greatly reduces memory consumption. We apply dsODENet to domain adaptation as a practical use case with image classification datasets. We also propose a resource-efficient FPGA-based design for dsODENet, in which all the parameters and feature maps except for the pre- and post-processing layers can be mapped onto on-chip memories. It is implemented on a Xilinx ZCU104 board and evaluated in terms of domain adaptation accuracy, inference speed, FPGA resource utilization, and speedup rate over a software counterpart. The results demonstrate that dsODENet achieves comparable or slightly better domain adaptation accuracy than our baseline Neural ODE implementation, while the total parameter size without the pre- and post-processing layers is reduced by 54.2% to 79.8%. Our FPGA implementation accelerates inference by 23.8 times.
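
    A minimal sketch of the two combined techniques, an Euler-discretized ODE block whose single depthwise separable convolution is reused at every step (the step count and activation are illustrative, not the paper's configuration):

      import torch
      import torch.nn as nn

      class DSConv(nn.Module):
          # depthwise separable convolution: depthwise then pointwise
          def __init__(self, channels):
              super().__init__()
              self.depthwise = nn.Conv2d(channels, channels, 3, padding=1,
                                         groups=channels)
              self.pointwise = nn.Conv2d(channels, channels, 1)

          def forward(self, x):
              return self.pointwise(self.depthwise(x))

      class NeuralODEBlock(nn.Module):
          # one DSConv reused for every Euler step, so its weights are
          # shared across what would otherwise be several ResNet layers
          def __init__(self, channels, steps=4):
              super().__init__()
              self.f = DSConv(channels)
              self.steps = steps

          def forward(self, x):
              h = 1.0 / self.steps
              for _ in range(self.steps):
                  x = x + h * torch.relu(self.f(x))
              return x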

  • A Novel Multi-Knowledge Distillation Approach

    Lianqiang LI  Kangbo SUN  Jie ZHU  

     
    LETTER-Artificial Intelligence, Data Mining

      Publicized: 2020/10/19  Vol: E104-D No:1  Page(s): 216-219

    Knowledge distillation approaches can transfer information from a large network (the teacher network) to a small network (the student network) to compress and accelerate deep neural networks. This paper proposes a novel knowledge distillation approach called multi-knowledge distillation (MKD). MKD consists of two stages. In the first stage, it employs autoencoders to learn compact and precise representations of the feature maps (FM) from the teacher network and the student network; these representations can be treated as the essence of the FM, i.e., the EFM. In the second stage, MKD utilizes multiple kinds of knowledge, i.e., the magnitude of each individual sample's EFM and the similarity relationships among several samples' EFMs, to enhance the generalization ability of the student network. Compared with previous approaches that employ the FM or handcrafted features derived from the FM, the EFM learned from autoencoders can be transferred more efficiently and reliably. Furthermore, the rich information provided by the multiple kinds of knowledge enables the student network to mimic the teacher network as closely as possible. Experimental results also show that MKD is superior to state-of-the-art approaches.
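
    A hedged sketch of the second-stage losses under one reading of the abstract (the exact distance functions are assumptions):

      import torch.nn.functional as F

      def mkd_loss(efm_student, efm_teacher):
          # efm_*: (batch, code_dim) autoencoder codes of the feature maps
          # knowledge 1: magnitude of each individual sample's EFM
          mag = F.mse_loss(efm_student.norm(dim=1), efm_teacher.norm(dim=1))
          # knowledge 2: similarity relationships among samples' EFMs
          s = F.normalize(efm_student, dim=1)
          t = F.normalize(efm_teacher, dim=1)
          return mag + F.mse_loss(s @ s.T, t @ t.T)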

  • SENTEI: Filter-Wise Pruning with Distillation towards Efficient Sparse Convolutional Neural Network Accelerators

    Masayuki SHIMODA  Youki SADA  Ryosuke KURAMOCHI  Shimpei SATO  Hiroki NAKAHARA  

     
    PAPER-Computer System

      Publicized: 2020/08/03  Vol: E103-D No:12  Page(s): 2463-2470

    In the realization of convolutional neural networks (CNNs) on resource-constrained embedded hardware, the memory footprint of weights is one of the primary problems. Pruning techniques are often used to reduce the number of weights. However, the distribution of nonzero weights is highly skewed, which makes it difficult to exploit the underlying parallelism. To address this problem, we present SENTEI, filter-wise pruning with distillation, to realize a hardware-aware network architecture with comparable accuracy. The filter-wise pruning eliminates weights such that each filter has the same number of nonzero weights, and retraining with distillation retains the accuracy. Further, we develop a zero-weight-skipping inter-layer pipelined accelerator on an FPGA. The equalization enables inter-filter parallelism, in which a processing block for a layer executes filters concurrently with a straightforward architecture. Our evaluation on semantic-segmentation tasks indicates that the resulting mIoU decreased by only 0.4 points. Additionally, the speedup and power efficiency of our FPGA implementation were 33.2× and 87.9× higher than those of a mobile GPU. Therefore, our technique realizes a hardware-aware network with comparable accuracy.
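
    The equalization rule itself is simple; a sketch (magnitude-based selection is an assumption, and the retraining-with-distillation pipeline is omitted):

      import torch

      def filter_wise_prune(weight, keep_per_filter):
          # weight: (out_ch, in_ch, k, k); keep exactly keep_per_filter
          # nonzero weights in every filter so filters can run in lockstep
          out_ch = weight.shape[0]
          flat = weight.reshape(out_ch, -1)
          idx = flat.abs().topk(keep_per_filter, dim=1).indices
          mask = torch.zeros_like(flat)
          mask.scatter_(1, idx, 1.0)
          return (flat * mask).reshape_as(weight)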

  • Multi Model-Based Distillation for Sound Event Detection Open Access

    Yingwei FU  Kele XU  Haibo MI  Qiuqiang KONG  Dezhi WANG  Huaimin WANG  Tie HONG  

     
    LETTER-Artificial Intelligence, Data Mining

      Publicized: 2019/07/08  Vol: E102-D No:10  Page(s): 2055-2058

    Sound event detection is intended to identify the sound events in audio recordings, and it has widespread applications in real life. Recently, convolutional recurrent neural network (CRNN) models have achieved state-of-the-art performance in this task due to their capability to learn representative features. However, CRNN models have high complexity, with millions of parameters to be trained, which limits their usage on mobile and embedded devices with limited computational resources. Model distillation is effective in transferring the knowledge of a complex model to a smaller one that can be deployed on devices with limited computational power. In this letter, we propose a novel multi-model-based distillation approach for sound event detection that makes use of the knowledge from multiple teacher models which are complementary in detecting sound events. Extensive experimental results demonstrate that our approach achieves a compression ratio of about 50×. In addition, better performance is obtained for the sound event detection task.
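
    A minimal sketch of distilling from several teachers (uniform averaging of the softened teacher outputs is an assumption; the paper may combine the teachers differently):

      import torch
      import torch.nn.functional as F

      def multi_teacher_kd_loss(student_logits, teacher_logits_list, T=2.0):
          # average the teachers' softened class distributions as the target
          target = torch.stack(
              [F.softmax(t / T, dim=-1) for t in teacher_logits_list]
          ).mean(dim=0)
          log_p = F.log_softmax(student_logits / T, dim=-1)
          return F.kl_div(log_p, target, reduction="batchmean") * T * T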

  • A Numerical Evaluation of Entanglement Sharing Protocols Using Quantum LDPC CSS Codes

    Masakazu YOSHIDA  Manabu HAGIWARA  Takayuki MIYADERA  Hideki IMAI  

     
    PAPER-Information Theory

      Vol: E95-A No:9  Page(s): 1561-1569

    Entangled states play crucial roles in quantum information theory and its applied technologies. In various protocols, such as quantum teleportation and quantum key distribution, a good entangled state shared by a pair of distant players is indispensable. In this paper, we numerically examine entanglement sharing protocols using quantum LDPC CSS codes. The sum-product decoding method enables us to detect uncorrectable errors, and thus two protocols are considered: the Detection and Resending (DR) protocol and the Non-Detection (ND) protocol. In the DR protocol, the players abort and repeat the protocol if they detect uncorrectable errors, whereas in the ND protocol they do not abort. We show that the DR protocol yields a smaller error rate than the ND protocol. In addition, we show that rather high reliability can be achieved by the DR protocol with quantum LDPC CSS codes.
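
    A toy Monte Carlo illustration of why DR beats ND (the failure probabilities are illustrative stand-ins, not values from the paper; a real round would run sum-product decoding on a quantum LDPC CSS code):

      import random

      def error_rate(p_detected, p_silent, trials=100_000, dr=True):
          # each round succeeds, fails detectably (the decoder flags an
          # uncorrectable error), or fails silently; DR repeats the round
          # on a detected failure, while ND accepts the state regardless
          errors = 0
          for _ in range(trials):
              while True:
                  r = random.random()
                  if r < p_silent:                 # silent failure
                      errors += 1
                      break
                  if r < p_silent + p_detected:    # detected failure
                      if dr:
                          continue                 # abort and repeat
                      errors += 1
                      break
                  break                            # success
          return errors / trials

      # DR trades extra rounds for a lower final error rate:
      print(error_rate(0.2, 0.01, dr=True), error_rate(0.2, 0.01, dr=False))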

  • Secret Key Agreement by Soft-Decision of Signals in Gaussian Maurer's Model

    Masashi NAITO  Shun WATANABE  Ryutaroh MATSUMOTO  Tomohiko UYEMATSU  

     
    PAPER-Information Theory

      Vol: E92-A No:2  Page(s): 525-534

    We consider the problem of secret key agreement in Gaussian Maurer's model. In Gaussian Maurer's model, the legitimate receivers, Alice and Bob, and a wiretapper, Eve, receive signals randomly generated by a satellite through three independent memoryless Gaussian channels. Alice and Bob then generate a common secret key from their received signals. In this model, we propose a protocol for generating a common secret key by using the results of soft decisions on Alice's and Bob's received signals. We then calculate a lower bound on the secret key rate of our proposed protocol. Compared with a protocol that uses only hard decisions, we found that a higher rate is obtained by our protocol.
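
    The core effect, that soft decisions retain more information about the satellite's signal than hard decisions, can be checked numerically; a sketch (the noise level and quantization thresholds are illustrative, and this estimates only a mutual information, not the protocol's key rate bound):

      import numpy as np

      def mutual_information(x, y):
          # plug-in estimate of I(X;Y) in bits from paired discrete samples
          joint = np.zeros((x.max() + 1, y.max() + 1))
          for a, b in zip(x, y):
              joint[a, b] += 1
          joint /= joint.sum()
          px = joint.sum(axis=1, keepdims=True)
          py = joint.sum(axis=0, keepdims=True)
          nz = joint > 0
          return (joint[nz] * np.log2(joint[nz] / (px @ py)[nz])).sum()

      rng = np.random.default_rng(0)
      n = 100_000
      bits = rng.integers(0, 2, n)               # satellite's random bits
      received = (2.0 * bits - 1.0) + rng.normal(0.0, 1.0, n)

      hard = (received > 0).astype(int)               # 1-bit hard decision
      soft = np.digitize(received, [-0.8, 0.0, 0.8])  # 2-bit soft decision

      print("I(bit; hard) =", mutual_information(bits, hard))
      print("I(bit; soft) =", mutual_information(bits, soft))  # larger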

  • Secret Key Capacity and Advantage Distillation Capacity

    Jun MURAMATSU  Kazuyuki YOSHIMURA  Peter DAVIS  

     
    PAPER-Cryptography

      Vol: E89-A No:10  Page(s): 2589-2596

    Secret key agreement is a procedure for agreeing on a secret key by exchanging messages over a public channel when a sender, a legitimate receiver (henceforth referred to as the receiver), and an eavesdropper have access to correlated sources. Maurer [6] defined the secret key capacity, which is the least upper bound on the key generation rate of secret key agreement, and presented upper and lower bounds on the secret key capacity. The advantage distillation capacity is introduced, and it is shown that this quantity equals the secret key capacity. Naive information-theoretic expressions of the secret key capacity and the advantage distillation capacity are also presented. An example of correlated sources for which an analytic expression of the secret key capacity can be obtained is also presented.
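
    For reference, the Maurer bounds alluded to here are commonly stated as follows, with S(X;Y‖Z) denoting the secret key capacity (a standard restatement, best checked against [6]):

      \max\{\, I(X;Y) - I(X;Z),\; I(X;Y) - I(Y;Z) \,\}
        \;\le\; S(X;Y \,\|\, Z) \;\le\;
        \min\{\, I(X;Y),\; I(X;Y \mid Z) \,\}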