IEICE global.ieice.org Site

Keyword Search Result

[Keyword] IN(26286hit)

601-620hit(26286hit)

A Lightweight End-to-End Speech Recognition System on Embedded Devices
Yu WANG Hiromitsu NISHIZAKI

PAPER-Speech and Hearing

Publicized: 2023/04/13
Vol: E106-D No:7
Page(s): 1230-1239
In industry, automatic speech recognition has become a competitive feature for embedded products with limited hardware resources. In this work, we propose a tiny end-to-end speech recognition model that is lightweight and easily deployable on edge platforms. First, instead of sophisticated network structures such as recurrent neural networks and transformers, the proposed model mainly uses convolutional neural networks as its backbone. This ensures that our model is supported by most software development kits for embedded devices. Second, we adopt the basic unit of MobileNet-v3, which performs well in computer vision tasks, and integrate the hidden-layer features at different scales, thus compressing the number of model parameters to fewer than 1 M while achieving an accuracy greater than that of some traditional models. Third, to further reduce CPU computation, we directly extract acoustic representations from 1-dimensional speech waveforms and use a self-supervised learning approach to encourage the convergence of the model. Finally, to cope with relatively weak hardware resources, we use a prefix beam search decoder that dynamically extends the search path with an optimized pruning strategy, together with an additional initialism language model that captures between-word probabilities in advance and thus avoids premature pruning of correct words. In our experiments, across a number of evaluation categories, our end-to-end model outperformed several tiny speech recognition models for embedded devices from related work.
Improving the Accuracy of Differential-Neural Distinguisher for DES, Chaskey, and PRESENT
Liu ZHANG Zilong WANG Yindong CHEN

LETTER-Information Network

Publicized: 2023/04/13
Vol: E106-D No:7
Page(s): 1240-1243
At CRYPTO 2019, Gohr first introduced deep learning to cryptanalysis, targeting SPECK32/64, and obtained a differential-neural distinguisher using a ResNet neural network. Zhang et al. used multiple parallel convolutional layers with different kernel sizes to capture information from multiple dimensions, thus improving the accuracy or obtaining a distinguisher covering one more round for SPECK32/64 and SIMON32/64. Inspired by Zhang's work, we apply the network structure to other ciphers. We not only improve the accuracy of the distinguisher but also increase the number of rounds it covers; that is, we distinguish ciphertexts from random numbers over more rounds for DES, Chaskey, and PRESENT.
Unsupervised Outlier Detection based on Random Projection Outlyingness with Local Score Weighting
Akira TAMAMORI

LETTER-Artificial Intelligence, Data Mining

Publicized: 2023/03/29
Vol: E106-D No:7
Page(s): 1244-1248
This paper proposes an enhanced model of Random Projection Outlyingness (RPO) for unsupervised outlier detection. When datasets have multiple modalities, RPO makes frequent detection errors. The proposed model deals with this problem via unsupervised clustering and local score weighting. The experimental results demonstrate that the proposed model outperforms RPO and is comparable with other existing unsupervised models on benchmark datasets in terms of the Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC).
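For context, the baseline RPO score can be sketched as follows, assuming the common definition in which a point's outlyingness is its largest robustly standardized deviation (distance from the median divided by the median absolute deviation) over random unit directions. The function name, the number of projections, and the toy data are illustrative assumptions; the clustering and local score weighting proposed in the letter are not shown.

```python
import math
import random
import statistics


def rpo_scores(points, n_projections=200, seed=0):
    """Approximate outlyingness: max over random directions of |u.x - median| / MAD."""
    rng = random.Random(seed)
    dim = len(points[0])
    scores = [0.0] * len(points)
    for _ in range(n_projections):
        # Draw a random direction and normalize it to unit length.
        u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in u))
        u = [c / norm for c in u]
        # Project every point onto the direction.
        proj = [sum(a * b for a, b in zip(u, p)) for p in points]
        med = statistics.median(proj)
        mad = statistics.median([abs(v - med) for v in proj])
        if mad == 0.0:
            continue  # degenerate direction: skip to avoid division by zero
        for i, v in enumerate(proj):
            scores[i] = max(scores[i], abs(v - med) / mad)
    return scores


# Toy data (hypothetical): seven points near the origin and one distant point.
data = [(0.0, 0.1), (0.3, -0.1), (-0.2, 0.05), (0.1, 0.25),
        (-0.15, -0.3), (0.25, 0.15), (-0.05, -0.1), (8.0, 8.0)]
scores = rpo_scores(data)
most_outlying = scores.index(max(scores))
```

With a single cluster, the distant point receives the largest score. With several well-separated clusters, the global median and MAD no longer describe any one cluster, which is the failure mode the abstract refers to.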
Single Image Dehazing Based on Sky Area Segmentation and Image Fusion
Xiangyang CHEN Haiyue LI Chuan LI Weiwei JIANG Hao ZHOU

LETTER-Image Processing and Video Processing

Publicized: 2023/04/24
Vol: E106-D No:7
Page(s): 1249-1253
Because the dark channel prior (DCP)-based dehazing method is ineffective in the sky area and makes the image too dark and color-distorted, we propose a novel dehazing method based on sky area segmentation and image fusion. We first segment the image according to the characteristics of its sky and non-sky areas, then estimate the atmospheric light and transmission map according to the DCP and correct them, and finally fuse the original image after contrast adaptive histogram equalization to improve the detail information of the image. Experiments show that our method performs well in dehazing and can reduce image distortion.
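The DCP step that the method builds on can be sketched as below. This is a minimal illustration of the standard dark channel and transmission estimate only, not the letter's sky segmentation, correction, or fusion stages. The function names, the window radius, and the weight omega = 0.95 are assumptions chosen for the example.

```python
def dark_channel(image, radius=1):
    """image: rows of (r, g, b) tuples in [0, 1]. Returns the per-pixel dark channel."""
    height, width = len(image), len(image[0])
    # Per-pixel minimum over the three color channels.
    min_rgb = [[min(px) for px in row] for row in image]
    dark = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Minimum over a (2*radius+1)^2 window, clipped at the image borders.
            dark[y][x] = min(
                min_rgb[j][i]
                for j in range(max(0, y - radius), min(height, y + radius + 1))
                for i in range(max(0, x - radius), min(width, x + radius + 1))
            )
    return dark


def transmission(image, atmospheric_light, omega=0.95, radius=1):
    """DCP transmission estimate: t = 1 - omega * dark_channel(I / A)."""
    normalized = [
        [tuple(c / a for c, a in zip(px, atmospheric_light)) for px in row]
        for row in image
    ]
    return [[1.0 - omega * d for d in row] for row in dark_channel(normalized, radius)]


# Toy 2x2 image (hypothetical values) with white atmospheric light.
img = [[(0.5, 0.6, 0.7), (0.5, 0.6, 0.7)],
       [(0.5, 0.6, 0.7), (0.5, 0.6, 0.7)]]
t_map = transmission(img, (1.0, 1.0, 1.0))
```

In bright sky regions every channel is large, so the dark channel is large and the estimated transmission becomes very small, which is consistent with the over-darkening and color distortion the abstract attributes to plain DCP.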
A Fusion Deraining Network Based on Swin Transformer and Convolutional Neural Network
Junhao TANG Guorui FENG

LETTER-Image Processing and Video Processing

Publicized: 2023/04/24
Vol: E106-D No:7
Page(s): 1254-1257
Single image deraining is an ill-posed problem and a long-standing issue. In the past few years, convolutional neural network (CNN) methods have almost dominated computer vision and achieved considerable success in image deraining. Recently, Swin Transformer-based models have also shown impressive performance, even surpassing CNN-based methods and becoming the state of the art on high-level vision tasks. Therefore, we attempt to introduce the Swin Transformer to deraining tasks. In this paper, we propose a deraining model with two sub-networks. The first sub-network includes two branches. The Rain Recognition Network is a Unet with the Swin Transformer layer, which preliminarily restores the background, especially at locations where rain streaks appear. The Detail Complement Network extracts the background detail beneath the rain streaks. The second sub-network, called Refine-Unet, uses the output of the first to further restore the image. In experiments, our network achieves improvements on single image deraining compared with previous Transformer research.
Ensemble Learning in CNN Augmented with Fully Connected Subnetworks
Daiki HIRATA Norikazu TAKAHASHI

LETTER-Biocybernetics, Neurocomputing

Publicized: 2023/04/05
Vol: E106-D No:7
Page(s): 1258-1261
Convolutional Neural Networks (CNNs) have shown remarkable performance in image recognition tasks. In this letter, we propose a new CNN model called the EnsNet, which is composed of one base CNN and multiple Fully Connected SubNetworks (FCSNs). In this model, the set of feature maps generated by the last convolutional layer in the base CNN is divided along channels into disjoint subsets, and these subsets are assigned to the FCSNs. Each of the FCSNs is trained independently of the others so that it can predict the class label of each feature map in the subset assigned to it. The output of the overall model is determined by a majority vote of the base CNN and the FCSNs. Experimental results using the MNIST, Fashion-MNIST, and CIFAR-10 datasets show that the proposed approach further improves the performance of CNNs. In particular, an EnsNet achieves a state-of-the-art error rate of 0.16% on MNIST.
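The two structural ideas in this abstract, a disjoint channel split and a majority vote, can be sketched as follows. This is only an illustration: the contiguous split, the function names, and the tie-breaking rule (ties go to the label encountered first, with the base CNN counted first) are assumptions, since the abstract does not specify them.

```python
from collections import Counter


def split_channels(num_channels, num_subnets):
    """Partition channel indices 0..num_channels-1 into disjoint contiguous subsets."""
    base, extra = divmod(num_channels, num_subnets)
    subsets, start = [], 0
    for k in range(num_subnets):
        # The first `extra` subsets receive one additional channel.
        size = base + (1 if k < extra else 0)
        subsets.append(list(range(start, start + size)))
        start += size
    return subsets


def majority_vote(base_prediction, subnet_predictions):
    """Most frequent label among the base CNN and the subnetworks.

    Assumed tie-break: Counter.most_common keeps first-encountered order for
    equal counts, so the base CNN's label wins any tie it takes part in.
    """
    votes = Counter([base_prediction] + list(subnet_predictions))
    return votes.most_common(1)[0][0]


subsets = split_channels(10, 3)
label = majority_vote(3, [7, 7, 3, 7])
```

Here ten channels are divided among three subnetworks, and the label 7 wins with three of the five votes.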
Basic Study of Micro-Pumps for Medication Driven by Chemical Reactions
Mizuki IKEDA Satomitsu IMAI

BRIEF PAPER

Publicized: 2022/11/28
Vol: E106-C No:6
Page(s): 253-257
We have developed and evaluated a prototype micro-pump for a new form of medication that is driven by a chemical reaction. The chemical reaction between citric acid and sodium bicarbonate produces carbon dioxide, the pressure of which pushes the medication out. This micropump is smaller in size than conventional diaphragm-type micropumps and is suitable for swallowing.
A Multitask Learning Approach Based on Cascaded Attention Network and Self-Adaption Loss for Speech Emotion Recognition
Yang LIU Yuqi XIA Haoqin SUN Xiaolei MENG Jianxiong BAI Wenbo GUAN Zhen ZHAO Yongwei LI

PAPER-Speech and Hearing

Publicized: 2022/12/08
Vol: E106-A No:6
Page(s): 876-885
Speech emotion recognition (SER) has long been a complex and difficult task due to emotional complexity. In this paper, we propose a multitask deep learning approach for SER based on a cascaded attention network and a self-adaption loss. First, non-personalized features are extracted to represent the process of emotion change while reducing the influence of external variables. Second, to highlight salient speech emotion features, a cascaded attention network is proposed, in which spatial temporal attention effectively locates the regions of speech that express emotion, while self-attention reduces the dependence on external information. Finally, the influence of differences in gender and in human perception of external information is alleviated by a multitask learning strategy, in which a self-adaption loss is introduced to determine the weights of different tasks dynamically. Experimental results on the IEMOCAP dataset demonstrate that our method gains absolute improvements of 1.97% and 0.91% over state-of-the-art strategies in terms of weighted accuracy (WA) and unweighted accuracy (UA), respectively.
Parameterized Formal Graph Systems and Their Polynomial-Time PAC Learnability
Takayoshi SHOUDAI Satoshi MATSUMOTO Yusuke SUZUKI Tomoyuki UCHIDA Tetsuhiro MIYAHARA

PAPER-Algorithms and Data Structures

Publicized: 2022/12/14
Vol: E106-A No:6
Page(s): 896-906
A formal graph system (FGS for short) is a logic program consisting of definite clauses whose arguments are graph patterns instead of first-order terms. The definite clauses are referred to as graph rewriting rules. An FGS is shown to be a useful unifying framework for learning graph languages. In this paper, we show the polynomial-time PAC learnability of a subclass of FGS languages defined by parameterized hereditary FGSs with bounded degree, from the viewpoint of computational learning theory. That is, we consider VH-FGSL_{k,Δ}(m, s, t, r, w, d) as the class of FGS languages consisting of graphs of treewidth at most k and of maximum degree at most Δ, which are defined by variable-hereditary FGSs consisting of m graph rewriting rules having TGP patterns as arguments. The parameters s, t, and r denote the maximum numbers of variables, atoms in the body, and arguments of each predicate symbol of each graph rewriting rule in an FGS, respectively. The parameters w and d denote the maximum number of vertices of each hyperedge and the maximum degree of each vertex of TGP patterns in each graph rewriting rule in an FGS, respectively. VH-FGSL_{k,Δ}(m, s, t, r, w, d) has infinitely many languages even if all the parameters are bounded by constants. Then we prove that the class VH-FGSL_{k,Δ}(m, s, t, r, w, d) is polynomial-time PAC learnable if all m, s, t, r, w, d, Δ are constants except for k.
Constructions of Low/Zero Correlation Zone Sequence Sets and Their Application in Grant-Free Non-Orthogonal Multiple Access System
Tao LIU Meiyue WANG Dongyan JIA Yubo LI

PAPER-Information Theory

Publicized: 2022/12/16
Vol: E106-A No:6
Page(s): 907-915
For the massive machine-type communication scenario, to address the problems of active user detection and channel estimation in the grant-free non-orthogonal multiple access (NOMA) system, new sets of non-orthogonal spreading sequences are proposed using zero/low correlation zone sequence sets with low correlation among multiple sets. The simulation results show that the resulting sequence set has low coherence, which gives reliable performance for channel estimation and active user detection based on compressed sensing. Compared with the traditional Zadoff-Chu (ZC) sequences, the new non-orthogonal spreading sequences have more flexible lengths, a lower peak-to-average power ratio (PAPR), and a smaller alphabet size. Consequently, these sequences will effectively solve the problem of high PAPR of time-domain signals and are more suitable for low-cost devices in massive machine-type communication.
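As a reminder of the metric compared here, PAPR is the ratio of the peak instantaneous power of a signal to its mean power. A minimal sketch follows; the function name and the toy sample sequences are illustrative assumptions and are not the sequences constructed in the paper.

```python
import math


def papr_db(samples):
    """Peak-to-average power ratio of a (complex) sample sequence, in dB."""
    powers = [abs(s) ** 2 for s in samples]
    mean_power = sum(powers) / len(powers)
    return 10.0 * math.log10(max(powers) / mean_power)


# A constant-envelope sequence has the lowest possible PAPR (0 dB) ...
constant_envelope = [1, 1j, -1, -1j]
# ... while a single impulse concentrates all power in one sample.
impulse = [2, 0, 0, 0]

papr_flat = papr_db(constant_envelope)
papr_peaky = papr_db(impulse)
```

The impulse example gives 10·log10(4), about 6.02 dB, because its peak power is four times its mean power. A high PAPR forces the power amplifier to operate with more back-off, which is why low PAPR matters for low-cost devices.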
Approaches to High Performance Terahertz-Waves Emitting Devices Utilizing Single Crystals of High Temperature Superconductor Bi₂Sr₂CaCu₂O₈₊δ (Open Access)
Takanari KASHIWAGI Genki KUWANO Shungo NAKAGAWA Mayu NAKAYAMA Jeonghyuk KIM Kanae NAGAYAMA Takuya YUHARA Takuya YAMAGUCHI Yuma SAITO Shohei SUZUKI Shotaro YAMADA Ryuta KIKUCHI Manabu TSUJIMOTO Hidetoshi MINAMI Kazuo KADOWAKI

INVITED PAPER

Publicized: 2022/12/12
Vol: E106-C No:6
Page(s): 281-288
Our group has developed terahertz (THz)-wave emitting devices utilizing single crystals of the high temperature superconductor Bi₂Sr₂CaCu₂O₈₊δ (Bi2212). The working principle of the device is based on the AC Josephson effect, which originates in the intrinsic Josephson junctions (IJJs) formed in Bi2212 single crystals. In principle, based on the superconducting gap of the compound and the AC Josephson effect, emission frequencies ranging from 0.1 to 15 THz can be generated by simply adjusting the bias voltages applied to the IJJs. To improve device performance, we have continuously improved the device structures. In this paper, we present our recent approaches to high performance Bi2212 THz-wave emitters. First, approaches to reducing the self Joule heating of the devices are described. By virtue of improved device structures using Bi2212 crystal chips, the device characteristics, such as the radiation frequency and the output power, are better than those of previous structures. Second, developments of THz-wave emitting devices using IJJ mesas coupled with external structures are explained. The results clearly indicate that the external structures are very useful not only for obtaining desired radiation frequencies higher than 1 THz but also for controlling radiation frequency characteristics. Finally, approaches to further understanding the spontaneous synchronization of IJJs are presented. The device characteristics obtained through these approaches should play important roles in future developments of THz-wave emitting devices using Bi2212 single crystals.
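The link between bias voltage and emission frequency follows from the AC Josephson relation, f = 2eV/h per junction. The sketch below evaluates it, assuming the bias voltage is shared equally among N junctions in series; the function name and the example voltages are illustrative and are not values taken from the paper.

```python
ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs (exact SI value)
PLANCK_CONSTANT = 6.62607015e-34     # joule-seconds (exact SI value)


def josephson_frequency_hz(voltage_v, n_junctions=1):
    """AC Josephson relation f = 2 e V / (h N).

    Assumes the total voltage is divided equally among N junctions in series.
    """
    return 2.0 * ELEMENTARY_CHARGE * voltage_v / (PLANCK_CONSTANT * n_junctions)


# 1 mV across a single junction corresponds to roughly 0.48 THz.
f_single = josephson_frequency_hz(1e-3)
# The same frequency results from 1 V shared by a stack of 1000 junctions.
f_stack = josephson_frequency_hz(1.0, n_junctions=1000)
```

Because the frequency scales linearly with the voltage per junction, tuning the bias voltage tunes the emission frequency, which is the tunability the abstract describes.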
A Novel Discriminative Dictionary Learning Method for Image Classification
Wentao LYU Di ZHOU Chengqun WANG Lu ZHANG

PAPER-Image

Publicized: 2022/12/14
Vol: E106-A No:6
Page(s): 932-937
In this paper, we present a novel discriminative dictionary learning (DDL) method for image classification. The local structural relationship between samples is first built by the Laplacian eigenmaps (LE), and then integrated into the basic DDL frame to suppress inter-class ambiguity in the feature space. Moreover, in order to improve the discriminative ability of the dictionary, the category label information of training samples is formulated into the objective function of dictionary learning by considering the discriminative promotion term. Thus, the data points of original samples are transformed into a new feature space, in which the points from different categories are expected to be far apart. The test results based on the real dataset indicate the effectiveness of this method.
GazeFollowTR: A Method of Gaze Following with Reborn Mechanism
Jingzhao DAI Ming LI Xuejiao HU Yang LI Sidan DU

PAPER-Vision

Publicized: 2022/11/30
Vol: E106-A No:6
Page(s): 938-946
Gaze following is the task of estimating where an observer is looking inside a scene. Both the observer and scene information must be learned to determine the gaze directions and gaze points. Recently, many existing works have focused only on scenes or observers. In contrast, published frameworks for gaze following are limited. In this paper, a gaze following method using a hybrid transformer is proposed. Building on the conventional method (GazeFollow), we make three improvements. First, a hybrid transformer is applied to learn head images and gaze positions. Second, the pinball loss function is used to control the gaze point error. Finally, a novel ReLU layer with the reborn mechanism (reborn ReLU) is introduced to replace traditional ReLU layers in different network stages. To test the performance of these improvements, we train our framework on the DL Gaze dataset and evaluate the model on our collected set. Our experimental results show that our framework outperforms the reference methods.
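The pinball loss mentioned above is the standard quantile-regression loss, which penalizes under- and over-prediction asymmetrically. A minimal scalar sketch follows; the function name and the choice tau = 0.9 are illustrative assumptions, and how the paper applies the loss to two-dimensional gaze points is not stated in the abstract.

```python
def pinball_loss(target, prediction, tau=0.5):
    """Pinball (quantile) loss: tau * d if d >= 0, else (tau - 1) * d, with d = target - prediction."""
    diff = target - prediction
    return max(tau * diff, (tau - 1.0) * diff)


# With tau = 0.9, under-prediction costs nine times as much as over-prediction.
under = pinball_loss(1.0, 0.0, tau=0.9)  # prediction too low
over = pinball_loss(0.0, 1.0, tau=0.9)   # prediction too high
```

With tau = 0.5 the loss is symmetric and equals half the absolute error.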
Simplification and Accurate Implementation of State Evolution Recursion for Conjugate Gradient
Sakyo HASHIMOTO Keigo TAKEUCHI

LETTER-Communication Theory and Signals

Publicized: 2022/12/15
Vol: E106-A No:6
Page(s): 952-956
This letter simplifies and analyzes existing state evolution recursions for conjugate gradient. The proposed simplification reduces the complexity of solving the recursions from cubic to quadratic order in the total number of iterations. The simplified recursions are still catastrophically sensitive to numerical errors, so arbitrary-precision arithmetic is used for accurate evaluation of the recursions.
Generation of Reaction-Diffusion-Pattern-Like Images with Partially Variable Size
Toru HIRAOKA

LETTER-Image

Publicized: 2022/12/08
Vol: E106-A No:6
Page(s): 957-961
We propose a non-photorealistic rendering method that automatically generates reaction-diffusion-pattern-like images from photographic images. The proposed method uses a smoothing filter with a circular window and changes the size of the window depending on the position in the photographic image. By partially changing the size of the circular window, the size of the reaction-diffusion patterns can be changed partially. To verify the effectiveness of the proposed method, we conducted experiments applying it to various photographic images.
Policy-Based Grooming, Route, Spectrum, and Operational Mode Planning in Dynamic Multilayer Networks
Takafumi TANAKA Hiroshi HASEGAWA

PAPER-Fiber-Optic Transmission for Communications

Publicized: 2022/11/30
Vol: E106-B No:6
Page(s): 489-499
In this paper, we propose a heuristic planning method to efficiently accommodate dynamic multilayer path (MLP) demand in multilayer networks consisting of a Time Division Multiplexing (TDM) layer and a Wavelength Division Multiplexing (WDM) layer; the goal is to achieve the flexible accommodation of increasing capacity and diversifying path demands. In addition to the grooming of links at the TDM layer and the route and frequency slots for the elastic optical path to be established, MLP requires the selection of an appropriate operational mode, consisting of a combination of modulation formats and symbol rates supported by digital coherent transceivers. Our proposed MLP planning method defines a planning policy for each of these parameters and embeds the values calculated by combining these policies in an auxiliary graph, which allows the planning parameters to be calculated for MLP demand requirements in a single step. Simulations reveal that the choice of operational mode significantly reduces the blocking probability and demonstrate that the edge weights in the auxiliary graph allow MLP planning with characteristics tailored to MLP demand and network requirements. Furthermore, we quantitatively evaluate the impact of each planning policy on the MLP planning results.
Analysis of Field Uniformity in a TEM Cell Based on Finite Difference Method and Measured Field Strength
Yixing GU Zhongyuan ZHOU Yunfen CHANG Mingjie SHENG Qi ZHOU

PAPER-Electromagnetic Compatibility(EMC)

Publicized: 2022/12/12
Vol: E106-B No:6
Page(s): 509-517
This paper proposes a method for calculating the field distribution over the cross section of a transverse electromagnetic (TEM) cell based on the finite difference method. In addition, the E-field uniformity of the cross section is analyzed using the calculation results and the measured field strength. The analysis indicates that theoretical calculation with the proposed method can, to some extent, guide the setup of E-field probes for E-field uniformity analysis in a TEM cell.
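A finite-difference calculation of this kind can be sketched by relaxing Laplace's equation for the electrostatic potential on a rectangular grid. The sketch assumes a grounded rectangular outer conductor and a thin inner septum held at 1 V; the grid size, septum position, iteration count, and function name are illustrative assumptions and do not reproduce the paper's geometry or procedure.

```python
def solve_laplace(width, height, septum_row, septum_cols, iterations=2000):
    """Gauss-Seidel relaxation of Laplace's equation on a rectangular grid.

    The outer boundary is held at 0 V; grid nodes on the septum are held at 1 V.
    """
    phi = [[0.0] * width for _ in range(height)]
    fixed = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if y in (0, height - 1) or x in (0, width - 1):
                fixed[y][x] = True  # grounded outer conductor
    for x in septum_cols:
        phi[septum_row][x] = 1.0
        fixed[septum_row][x] = True  # inner conductor (septum)
    for _ in range(iterations):
        for y in range(1, height - 1):
            for x in range(1, width - 1):
                if not fixed[y][x]:
                    # Five-point stencil: each free node is the mean of its neighbors.
                    phi[y][x] = 0.25 * (phi[y - 1][x] + phi[y + 1][x]
                                        + phi[y][x - 1] + phi[y][x + 1])
    return phi


# Hypothetical 21 x 15 cross section with a centered septum spanning columns 5..15.
phi = solve_laplace(21, 15, septum_row=7, septum_cols=range(5, 16))
# Vertical E-field component at a node below the septum (unit grid spacing assumed).
e_y = -(phi[4][10] - phi[2][10]) / 2.0
```

Evaluating the field at the nodes of a test region and comparing their spread gives a simple uniformity estimate that can be set against probe measurements.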
High Speed ASIC Architectures for Aggregate Signature over BLS12-381
Kaoru MASADA Ryohei NAKAYAMA Makoto IKEDA

BRIEF PAPER

Publicized: 2022/11/29
Vol: E106-C No:6
Page(s): 331-334
The BLS signature is an elliptic-curve cryptographic scheme with the attractive feature that signatures can be aggregated and shortened. We have designed two ASIC architectures, for hashing to the elliptic curve and for pairing, to minimize the latency. The designs are also optimized for BLS12-381, a relatively new and safe curve.
Protection of Latency-Strict Stations on WLAN Systems Using CTS-to-STA Frames
Kenichi KAWAMURA Shouta NAKAYAMA Keisuke WAKAO Takatsune MORIYAMA Yasushi TAKATORI

PAPER-Wireless Communication Technologies

Publicized: 2022/11/28
Vol: E106-B No:6
Page(s): 518-527
Low-latency and highly reliable communication on wireless LAN (WLAN) is difficult due to interference from the surroundings. To overcome this problem, we have developed a scheme called Clear to Send-to-Station (CTS-STA) frame transmission control that enables stable latency communication in environments with strong interference from surrounding WLAN systems. This scheme uses the basic functions of WLAN standards and is effective for both the latest and legacy standard devices. It operates when latency-strict transmission is required for an STA and there is interference from surrounding WLAN devices while minimizing the control signal overhead. Experimental evaluations with prototype systems demonstrate the effectiveness of the proposed scheme.
Unified 6G Waveform Design Based on DFT-s-OFDM Enhancements
Juan LIU Xiaolin HOU Wenjia LIU Lan CHEN Yoshihisa KISHIYAMA Takahiro ASAI

PAPER-Wireless Communication Technologies

Publicized: 2022/12/05
Vol: E106-B No:6
Page(s): 528-537
To achieve the extremely high data rate and extreme coverage extension requirements of 6G wireless communication, new spectrum in sub-THz (100-300 GHz) and non-terrestrial networks (NTN) are two of the macro trends of 6G candidate technologies, respectively. However, the non-linearity of power amplifiers (PAs) is a critical challenge for both sub-THz and NTN. Therefore, waveform design with high power efficiency (PE) or low peak-to-average power ratio (PAPR) has become one of the most significant 6G research topics. Meanwhile, high spectral efficiency (SE) and low out-of-band emission (OOBE) are still important key performance indicators (KPIs) for 6G waveform design. The single-carrier waveform discrete Fourier transform spreading orthogonal frequency division multiplexing (DFT-s-OFDM) has attracted much research interest due to its high PE, and it is supported in 5G New Radio (NR) when uplink coverage is limited. DFT-s-OFDM can therefore be regarded as a candidate waveform for 6G. Many enhancement schemes based on DFT-s-OFDM have been proposed, including null cyclic prefix (NCP)/unique word (UW), frequency-domain spectral shaping (FDSS), and time-domain compression and expansion (TD-CE). However, there is no unified framework compatible with all the enhancement schemes. This paper first provides a general description of the 6G candidate waveforms based on DFT-s-OFDM enhancement. Second, more flexible TD-CE supporting methods for the unified non-orthogonal waveform (uNOW) are proposed and discussed. Third, a unified waveform framework based on the DFT-s-OFDM structure is proposed. By designing the pre-processing and post-processing modules before and after the DFT in the unified waveform framework, the three technical methods (NCP/UW, FDSS, and TD-CE) can be integrated to improve three KPIs of DFT-s-OFDM simultaneously with high flexibility. Then the implementation complexity of the 6G candidate waveforms is analyzed and compared. The performance of different DFT-s-OFDM enhancement schemes is investigated by link-level simulation, which reveals that uNOW can achieve the best PAPR performance among all the 6G candidate waveforms. When PA back-off is considered, uNOW can achieve a 124% throughput gain compared to traditional DFT-s-OFDM.