The search functionality is under construction.

Keyword Search Result

[Keyword] ELF(569hit)

1-20hit(569hit)

  • A Channel Contrastive Attention-Based Local-Nonlocal Mutual Block on Super-Resolution Open Access

    Yuhao LIU  Zhenzhong CHU  Lifei WEI  

     
    PAPER-Image Processing and Video Processing

      Publicized:
    2024/04/23
      Vol:
    E107-D No:9
      Page(s):
    1219-1227

    In the realm of Single Image Super-Resolution (SISR), the meticulously crafted Nonlocal Sparse Attention-based block demonstrates its efficacy in noise reduction and computational cost reduction for nonlocal (global) features. However, it neglects the traditional Convolutional-based block, which is proficient in handling local features. Thus, merging both the Nonlocal Sparse Attention-based block and the Convolutional-based block to concurrently manage local and nonlocal features poses a significant challenge. To tackle the aforementioned issues, this paper introduces the Channel Contrastive Attention-based Local-Nonlocal Mutual block (CCLN) for Super-Resolution (SR). (1) We introduce the CCLN block, encompassing the Local Sparse Convolutional-based block for local features and the Nonlocal Sparse Attention-based network block for nonlocal features. (2) We introduce Channel Contrastive Attention (CCA) blocks, incorporating Sparse Aggregation into Convolutional-based blocks. Additionally, we introduce a robust framework to fuse these two blocks, ensuring that each branch operates according to its respective strengths. (3) The CCLN block can seamlessly integrate into established network backbones like the Enhanced Deep Super-Resolution network (EDSR), resulting in the Channel Attention based Local-Nonlocal Mutual Network (CCLNN). Experimental results show that our CCLNN effectively leverages both local and nonlocal features, outperforming other state-of-the-art algorithms.

  • Permissionless Blockchain-Based Sybil-Resistant Self-Sovereign Identity Utilizing Attested Execution Secure Processors Open Access

    Koichi MORIYAMA  Akira OTSUKA  

     
    INVITED PAPER

      Publicized:
    2024/04/15
      Vol:
    E107-D No:9
      Page(s):
    1112-1122

    This article describes the idea of utilizing Attested Execution Secure Processors (AESPs) that fit into building a secure Self-Sovereign Identity (SSI) system satisfying Sybil-resistance under permissionless blockchains. Today’s circumstances requiring people to be more online have encouraged us to address digital identity preserving privacy. There is a momentum of research addressing SSI, and many researchers approach blockchain technology as a foundation. SSI brings natural persons various benefits such as owning controls; on the other side, digital identity systems in the real world require Sybil-resistance to comply with Anti-Money-Laundering (AML) and other needs. The main idea in our proposal is to utilize AESPs for three reasons: the first is the use of attested execution capability along with tamper-resistance, which is a strong assumption; the second is power and flexibility, allowing various open-source programs to be executed within a secure enclave; and the third is that equipping hardware-assisted security in mobile devices has become a norm. Rafael Pass et al.’s formal abstraction of AESPs and the ideal functionality $\mathcal{G}_\mathtt{att}$ enable us to formulate mathematically how hardware-assisted security works for secure digital identity systems preserving privacy under permissionless blockchains. Our proposal of the AESP-based SSI architecture and system protocols, $\Pi^{\mathcal{G}_\mathtt{att}}$, demonstrates the advantages of building a proper SSI system that satisfies the Sybil-resistant requirement. The protocols may eliminate the online distributed committee assumed in other research, such as CanDID, because of assuming AESPs; thus, $\Pi^{\mathcal{G}_\mathtt{att}}$ does not need to rely on multi-party computation (MPC), bringing drastic flexibility and efficiency compared with the existing SSI systems.

  • Confidence-Driven Contrastive Learning for Document Classification without Annotated Data Open Access

    Zhewei XU  Mizuho IWAIHARA  

     
    PAPER-Artificial Intelligence, Data Mining

      Publicized:
    2024/04/19
      Vol:
    E107-D No:8
      Page(s):
    1029-1039

    Data sparsity has always been a problem in document classification, for which semi-supervised learning and few-shot learning are studied. An even more extreme scenario is to classify documents without any annotated data, using only category names. In this paper, we introduce a nearest neighbor search-based method, Con2Class, to tackle this tough task. We intend to produce embeddings for predefined categories and predict category embeddings for all the unlabeled documents in a unified embedding space, such that categories can be easily assigned by searching the nearest predefined category in the embedding space. To achieve this, we propose confidence-driven contrastive learning, in which prompt-based templates are designed and an MLM-maintained contrastive loss is newly proposed to finetune a pretrained language model for embedding production. To deal with the issue that no annotated data is available to validate the classification model, we introduce a confidence factor to estimate the classification ability by evaluating the prediction confidence. The language model having the highest confidence factor is used to produce embeddings for similarity evaluation. Pseudo labels are then assigned by searching the semantically closest category name, and are further used to train a separate classifier following a progressive self-training strategy for final prediction. Our experiments on five representative datasets demonstrate the superiority of our proposed method over the existing approaches.
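    The nearest-category assignment described in this abstract can be sketched as follows. This is a minimal illustration only, not the authors' Con2Class implementation: the function names, the use of cosine similarity, and the toy 2-dimensional embeddings are all assumptions made for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def assign_pseudo_labels(doc_embeddings, category_embeddings):
    """Give each document the name of its most similar category embedding."""
    labels = []
    for doc in doc_embeddings:
        best = max(
            category_embeddings,
            key=lambda name: cosine(doc, category_embeddings[name]),
        )
        labels.append(best)
    return labels


# Toy embeddings (made up); a real system would obtain them from a
# finetuned language model.
categories = {"sports": [1.0, 0.0], "politics": [0.0, 1.0]}
docs = [[0.9, 0.1], [0.2, 0.8]]
pseudo_labels = assign_pseudo_labels(docs, categories)
```

    In the abstract's pipeline, labels of this kind would then serve as training targets for a separate classifier.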

  • Nuclear Norm Minus Frobenius Norm Minimization with Rank Residual Constraint for Image Denoising Open Access

    Hua HUANG  Yiwen SHAN  Chuan LI  Zhi WANG  

     
    PAPER-Fundamentals of Information Systems

      Publicized:
    2024/04/09
      Vol:
    E107-D No:8
      Page(s):
    992-1006

    Image denoising is an indispensable process for many high-level tasks in image processing and computer vision. However, the traditional low-rank minimization-based methods suffer from bias since only the noisy observation is used to estimate the underlying clean matrix. To overcome this issue, a new low-rank minimization-based method, called nuclear norm minus Frobenius norm rank residual minimization (NFRRM), is proposed for image denoising. The proposed method transforms the ill-posed image denoising problem into rank residual minimization problems by exploiting the nonlocal self-similarity prior. The proposed NFRRM model can accurately estimate the underlying clean matrix by treating each rank residual component flexibly. More importantly, the global optimum of the proposed NFRRM model can be obtained in closed form. Extensive experiments demonstrate that the proposed NFRRM method outperforms many state-of-the-art image denoising methods.

  • Differential Active Self-Interference Cancellation for Asynchronous In-Band Full-Duplex GFSK Open Access

    Shinsuke IBI  Takumi TAKAHASHI  Hisato IWAI  

     
    PAPER-Wireless Communication Technologies

      Vol:
    E107-B No:8
      Page(s):
    552-563

    This paper proposes a novel differential active self-interference canceller (DASIC) algorithm for asynchronous in-band full-duplex (IBFD) Gaussian filtered frequency shift keying (GFSK), which is designed for wireless Internet of Things (IoT). In IBFD communications, where two terminals simultaneously transmit and receive signals in the same frequency band, there is an extremely strong self-interference (SI). The SI can be mitigated by an active SI canceller (ASIC), which subtracts an interference replica based on channel state information (CSI) from the received signal. The challenging problem is the realization of asynchronous IBFD for wireless IoT in indoor environments. In the asynchronous mode, pilot contamination is induced by the non-orthogonality between asynchronous pilot sequences. In addition, the transceiver suffers from analog front-end (AFE) impairments, such as phase noise. Due to these impairments, the SI cannot be canceled entirely at the receiver, resulting in residual interference. To address the above issue, the DASIC incorporates the principle of the differential codec, whose differential structure makes it possible to suppress SI without estimating the CSI of the SI. Also, on the premise of using an error correction technique, iterative detection and decoding (IDD) is applied to improve the detection capability while exchanging the extrinsic log-likelihood ratio (LLR) between the maximum a-posteriori probability (MAP) detector and the channel decoder. Finally, the validity of using the DASIC algorithm is evaluated by computer simulations in terms of the packet error rate (PER). The results clearly demonstrate the possibility of realizing asynchronous IBFD.

  • Backpressure Learning-Based Data Transmission Reliability-Aware Self-Organizing Networking for Power Line Communication in Distribution Network Open Access

    Zhan SHI  

     
    PAPER-Systems and Control

      Publicized:
    2024/01/15
      Vol:
    E107-A No:8
      Page(s):
    1076-1084

    Power line communication (PLC) provides a flexible-access, wide-distribution, and low-cost communication solution for distribution network services. However, PLC self-organizing networking in distribution networks faces several challenges, such as guaranteeing diversified data transmission requirements, the contradiction between long-term constraints and short-term optimization, and the uncertainty of global information. To address these challenges, we propose a backpressure learning-based data transmission reliability-aware self-organizing networking algorithm to minimize the weighted sum of node data backlogs under the long-term transmission reliability constraint. Specifically, the minimization problem is transformed by the Lyapunov optimization and backpressure algorithm. Finally, we propose a backpressure and data transmission reliability-aware state-action-reward-state-action (SARSA)-based self-organizing networking strategy to realize the PLC networking optimization. Simulation results demonstrate that the proposed algorithm achieves superior performance in terms of data backlog and transmission reliability.
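    For readers unfamiliar with SARSA, the standard tabular update rule that the name refers to can be sketched as below. This shows only the generic textbook update, not the paper's backpressure-aware strategy: the state and action names, the learning rate, the discount factor, and the choice of a negative backlog as reward are all assumptions made for illustration.

```python
from collections import defaultdict


def sarsa_update(q, state, action, reward, next_state, next_action,
                 alpha=0.1, gamma=0.9):
    """Apply one SARSA step:
    Q(s, a) <- Q(s, a) + alpha * (r + gamma * Q(s', a') - Q(s, a)).
    """
    td_target = reward + gamma * q[(next_state, next_action)]
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Q-table that returns 0.0 for unseen (state, action) pairs.
q_table = defaultdict(float)

# Hypothetical transition: a node picks a relay and is penalised by a
# backlog of 2 units (reward = -backlog is an assumption of this sketch).
new_value = sarsa_update(
    q_table,
    state="node_a", action="relay_1",
    reward=-2.0,
    next_state="node_b", next_action="relay_2",
)
```

    Unlike Q-learning, the update uses the action actually chosen in the next state, which is why the tuple is state, action, reward, state, action.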

  • Power Peak Load Forecasting Based on Deep Time Series Analysis Method Open Access

    Ying-Chang HUNG  Duen-Ren LIU  

     
    PAPER-Artificial Intelligence, Data Mining

      Publicized:
    2024/03/21
      Vol:
    E107-D No:7
      Page(s):
    845-856

    The prediction of peak power load is a critical factor directly impacting the stability of power supply, characterized significantly by its time series nature and intricate ties to the seasonal patterns in electricity usage. Despite its crucial importance, the current landscape of power peak load forecasting remains a multifaceted challenge in the field. This study aims to contribute to this domain by proposing a method that leverages a combination of three primary models - the GRU model, self-attention mechanism, and Transformer mechanism - to forecast peak power load. To contextualize this research within the ongoing discourse, it’s essential to consider the evolving methodologies and advancements in power peak load forecasting. By delving into additional references addressing the complexities and current state of the power peak load forecasting problem, this study aims to build upon the existing knowledge base and offer insights into contemporary challenges and strategies adopted within the field. Data preprocessing in this study involves comprehensive cleaning, standardization, and the design of relevant functions to ensure robustness in the predictive modeling process. Additionally, recognizing the necessity to capture temporal changes effectively, this research incorporates features such as “Weekly Moving Average” and “Monthly Moving Average” into the dataset. To evaluate the proposed methodologies comprehensively, this study conducts comparative analyses with established models such as LSTM, Self-attention network, Transformer, ARIMA, and SVR. The outcomes reveal that the models proposed in this study exhibit superior predictive performance compared to these established models, showcasing their effectiveness in accurately forecasting electricity consumption. The significance of this research lies in two primary contributions. 
Firstly, it introduces an innovative prediction method combining the GRU model, self-attention mechanism, and Transformer mechanism, aligning with the contemporary evolution of predictive modeling techniques in the field. Secondly, it introduces and emphasizes the utility of “Weekly Moving Average” and “Monthly Moving Average” methodologies, crucial in effectively capturing and interpreting seasonal variations within the dataset. By incorporating these features, this study enhances the model’s ability to account for seasonal influencing factors, thereby significantly improving the accuracy of peak power load forecasting. This contribution aligns with the ongoing efforts to refine forecasting methodologies and addresses the pertinent challenges within power peak load forecasting.
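    The "Weekly Moving Average" and "Monthly Moving Average" features mentioned in this abstract could be computed along the following lines. This is a hedged sketch: the abstract does not give the window lengths or the handling of the first few days, so the 7-day and 30-day trailing windows, the partial-window averaging, and the sample load values are assumptions.

```python
def trailing_moving_average(values, window):
    """Average of the current value and up to window - 1 preceding values.

    Early positions, where fewer than `window` values exist, use the
    values available so far.
    """
    averages = []
    running_sum = 0.0
    for i, value in enumerate(values):
        running_sum += value
        if i >= window:
            # Drop the value that just left the window.
            running_sum -= values[i - window]
        count = min(i + 1, window)
        averages.append(running_sum / count)
    return averages


# Made-up daily peak load readings.
daily_peak_load = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0, 24.0]
weekly_ma = trailing_moving_average(daily_peak_load, window=7)
monthly_ma = trailing_moving_average(daily_peak_load, window=30)
```

    Each resulting series has the same length as the input, so it can be appended to the dataset as an extra feature column.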

  • Analysis of Blood Cell Image Recognition Methods Based on Improved CNN and Vision Transformer Open Access

    Pingping WANG  Xinyi ZHANG  Yuyan ZHAO  Yueti LI  Kaisheng XU  Shuaiyin ZHAO  

     
    PAPER-Neural Networks and Bioengineering

      Publicized:
    2023/09/15
      Vol:
    E107-A No:6
      Page(s):
    899-908

    Leukemia is a common and highly dangerous blood disease that requires early detection and treatment. Currently, the diagnosis of leukemia types mainly relies on the pathologist’s morphological examination of blood cell images, which is a tedious and time-consuming process, and the diagnosis results are highly subjective and prone to misdiagnosis and missed diagnosis. This research proposes a blood cell image recognition technique based on an enhanced Vision Transformer to address these problems. Firstly, this paper incorporates convolutions with token embedding to replace the positional encoding, which represents coarse spatial information. Then, based on the Transformer’s self-attention mechanism, this paper proposes a sparse attention module that can select discriminative regions in the image, further enhancing the model’s fine-grained feature expression capability. Finally, this paper uses a contrastive loss function to further increase the intra-class consistency and inter-class difference of classification features. According to experimental results, the model in this study has an identification accuracy of 92.49% on the Munich single-cell morphological dataset, which is an improvement of 1.41% over the baseline. Compared with the state-of-the-art Swin Transformer, this method still achieves better performance. Our method therefore has the potential to provide a reference for clinical diagnosis by physicians.

  • Joint Selfattention-SVM DDoS Attack Detection and Defense Mechanism Based on Self-Attention Mechanism and SVM Classification for SDN Networks Open Access

    Wanying MAN  Guiqin YANG  Shurui FENG  

     
    PAPER-Human Communications

      Publicized:
    2023/09/05
      Vol:
    E107-A No:6
      Page(s):
    881-889

    Software Defined Networking (SDN), a new network architecture, allows for centralized network management by separating the control plane from the forwarding plane. Because forwarding and control are separated, distributed denial of service (DDoS) attacks pose a greater threat to SDN networks. To address the problem, this paper uses a joint high-precision attack detection scheme combining a self-attention mechanism and a support vector machine: a trigger mechanism deployed at both the control and data layers is proposed to trigger the initial detection of DDoS attacks; the data in the network under attack is screened in detail using a combination of the self-attention mechanism and the support vector machine; and the control plane initiates attack defense by using OpenFlow protocol features to issue flow tables based on the accurate classification results. The experimental results show that the trigger mechanism can react to the attack in time with less than 20% load, and the accurate detection mechanism is better than the existing detection methods, with a precision rate of 98.95% and a false alarm rate of only 1.04%. At the same time, the defense strategy can achieve timely recovery of network characteristics.

  • PSDSpell: Pre-Training with Self-Distillation Learning for Chinese Spelling Correction Open Access

    Li HE  Xiaowu ZHANG  Jianyong DUAN  Hao WANG  Xin LI  Liang ZHAO  

     
    PAPER

      Publicized:
    2023/10/25
      Vol:
    E107-D No:4
      Page(s):
    495-504

    Chinese spelling correction (CSC) models detect and correct a text typo based on the misspelled character and its context. Recently, Bert-based models have dominated the research of Chinese spelling correction. However, these methods only focus on the semantic information of the text during the pretraining stage, neglecting the learning of correcting spelling errors. Moreover, when multiple incorrect characters are in the text, the context introduces noisy information, making it difficult for the model to accurately detect the positions of the incorrect characters, leading to false corrections. To address these limitations, we apply the multimodal pre-trained language model ChineseBert to the task of spelling correction. We propose a self-distillation learning-based pretraining strategy, where a confusion set is used to construct text containing erroneous characters, allowing the model to jointly learn how to understand language and correct spelling errors. Additionally, we introduce a single-channel masking mechanism to mitigate the noise caused by the incorrect characters. This mechanism masks the semantic encoding channel while preserving the phonetic and glyph encoding channels, reducing the noise introduced by incorrect characters during the prediction process. Finally, experiments are conducted on widely used benchmarks. Our model outperforms state-of-the-art methods by a remarkable margin.

  • Robust Visual Tracking Using Hierarchical Vision Transformer with Shifted Windows Multi-Head Self-Attention

    Peng GAO  Xin-Yue ZHANG  Xiao-Li YANG  Jian-Cheng NI  Fei WANG  

     
    LETTER-Image Recognition, Computer Vision

      Publicized:
    2023/10/20
      Vol:
    E107-D No:1
      Page(s):
    161-164

    Although Siamese trackers have attracted much attention in recent years due to their scalability and efficiency, researchers have ignored the background appearance, which makes these trackers ill-suited to recognizing arbitrary target objects under various appearance variations, especially in complex scenarios with background clutter and distractors. In this paper, we present a simple yet effective Siamese tracker, where shifted windows multi-head self-attention is used to learn the characteristics of a specific given target object for visual tracking. To validate the effectiveness of our proposed tracker, we use the Swin Transformer as the backbone network and introduce an auxiliary feature enhancement network. Extensive experimental results on two evaluation datasets demonstrate that the proposed tracker outperforms other baselines.

  • Inference Discrepancy Based Curriculum Learning for Neural Machine Translation

    Lei ZHOU  Ryohei SASANO  Koichi TAKEDA  

     
    PAPER-Natural Language Processing

      Publicized:
    2023/10/18
      Vol:
    E107-D No:1
      Page(s):
    135-143

    In practice, even a well-trained neural machine translation (NMT) model can still make biased inferences on the training set due to distribution shifts. In the human learning process, if we cannot reproduce something correctly after learning it multiple times, we consider it to be more difficult. Likewise, a training example causing a large discrepancy between inference and reference implies higher learning difficulty for the MT model. Therefore, we propose to adopt the inference discrepancy of each training example as the difficulty criterion, and to rank training examples from easy to hard accordingly. In this way, a trained model can guide the curriculum learning process of an initial model identical to itself. We put forward an analogy to this training scheme as guiding the learning process of a curriculum NMT model by a pretrained vanilla model. In this paper, we assess the effectiveness of the proposed training scheme and examine the influence of translation direction, evaluation metrics and different curriculum schedules. Experimental results on translation benchmarks WMT14 English ⇒ German, WMT17 Chinese ⇒ English and Multitarget TED Talks Task (MTTT) English ⇔ German, English ⇔ Chinese, English ⇔ Russian demonstrate that our proposed method consistently improves the translation performance against the advanced Transformer baseline.
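    The ranking step described in this abstract, scoring each training example by how far the trained model's output is from the reference and then ordering from easy to hard, can be sketched as below. The discrepancy measure here (a position-wise token mismatch rate) is a stand-in chosen for simplicity, not the metric used in the paper, and the lookup table standing in for a pretrained model is hypothetical.

```python
def token_discrepancy(inference, reference):
    """Fraction of token positions where inference and reference differ.

    A deliberately crude stand-in for a real translation quality metric.
    """
    inf_tokens = inference.split()
    ref_tokens = reference.split()
    matches = sum(1 for a, b in zip(inf_tokens, ref_tokens) if a == b)
    longest = max(len(inf_tokens), len(ref_tokens), 1)
    return 1.0 - matches / longest


def rank_easy_to_hard(examples, translate):
    """Sort (source, reference) pairs by increasing inference discrepancy."""
    scored = [
        (token_discrepancy(translate(src), ref), src, ref)
        for src, ref in examples
    ]
    scored.sort(key=lambda item: item[0])
    return [(src, ref) for _, src, ref in scored]


# Hypothetical "pretrained model": a lookup table of canned outputs.
canned_outputs = {
    "guten morgen": "good morning",
    "wie geht es dir": "how goes it you",
    "danke": "thanks",
}
examples = [
    ("wie geht es dir", "how are you"),
    ("guten morgen", "good morning"),
    ("danke", "thank you"),
]
curriculum = rank_easy_to_hard(examples, canned_outputs.get)
```

    The sorted list could then be fed to a curriculum schedule that exposes the new model to the low-discrepancy examples first.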

  • Loosely-Stabilizing Algorithm on Almost Maximal Independent Set

    Rongcheng DONG  Taisuke IZUMI  Naoki KITAMURA  Yuichi SUDO  Toshimitsu MASUZAWA  

     
    PAPER-Fundamentals of Information Systems

      Publicized:
    2023/08/07
      Vol:
    E106-D No:11
      Page(s):
    1762-1771

    The maximal independent set (MIS) problem is one of the most fundamental problems in the field of distributed computing. This paper focuses on the MIS problem with unreliable communication between processes in the system. We propose a relaxed notion of MIS, named almost MIS (ALMIS), and show that the loosely-stabilizing algorithm proposed in our previous work can achieve exponentially long holding time with logarithmic convergence time and space complexity regarding ALMIS, which cannot be achieved at the same time regarding MIS in our previous work.

  • Measuring Motivational Pattern on Second Language Learning and its Relationships to Academic Performance: A Case Study of Blended Learning Course

    Zahra AZIZAH  Tomoya OHYAMA  Xiumin ZHAO  Yuichi OHKAWA  Takashi MITSUISHI  

     
    PAPER-Educational Technology

      Publicized:
    2023/08/01
      Vol:
    E106-D No:11
      Page(s):
    1842-1853

    Learning analytics (LA) has emerged as a technique for educational quality improvement in many learning contexts, including blended learning (BL) courses. Numerous studies show that students' academic performance is significantly impacted by their ability to engage in self-regulated learning (SRL). In this study, learning behaviors indicating SRL and motivation are elucidated during a BL course on second language learning. Online trace data of a mobile language learning application (m-learning app) is used as a part of BL implementation. The observed motivations fell into two categories: high-level motivation (study in time, study again, and early learning) and low-level motivation (cramming and catch up). As a result, students who perform well tend to engage in high-level motivation, while low-performing students tend to engage in low-level motivation. These findings are supported by regression models showing that study in time followed by early learning significantly influences academic performance in BL courses, both in the spring and fall semesters. Using the limited resource of m-learning app log data, this BL study could explain the overall BL performance.

  • GAN-based Image Translation Model with Self-Attention for Nighttime Dashcam Data Augmentation

    Rebeka SULTANA  Gosuke OHASHI  

     
    PAPER-Intelligent Transport System

      Publicized:
    2023/06/27
      Vol:
    E106-A No:9
      Page(s):
    1202-1210

    High-performance deep learning-based object detection models can reduce traffic accidents using dashcam images during nighttime driving. Deep learning requires a large-scale dataset to obtain a high-performance model. However, existing object detection datasets contain mostly daytime scenes and only a few nighttime scenes. Enlarging the nighttime dataset is laborious and time-consuming. In such a case, it is possible to convert daytime images to nighttime images with an image-to-image translation model to augment the nighttime dataset with less effort, so that the translated dataset can utilize the annotations of the daytime dataset. Therefore, in this study, a GAN-based image-to-image translation model is proposed by incorporating self-attention with cycle consistency and content/style separation for nighttime data augmentation that shows high fidelity to annotations of the daytime dataset. Experimental results highlight the effectiveness of the proposed model compared with other models in terms of translated images and FID scores. Moreover, the high fidelity of translated images to the annotations is verified by a small object detection model according to detection results and mAP. Ablation studies confirm the effectiveness of self-attention in the proposed model. As a contribution to GAN-based data augmentation, the source code of the proposed image translation model is publicly available at https://github.com/subecky/Image-Translation-With-Self-Attention

  • Single-Power-Supply Six-Transistor CMOS SRAM Enabling Low-Voltage Writing, Low-Voltage Reading, and Low Standby Power Consumption Open Access

    Tadayoshi ENOMOTO  Nobuaki KOBAYASHI  

     
    PAPER-Electronic Circuits

      Publicized:
    2023/03/16
      Vol:
    E106-C No:9
      Page(s):
    466-476

    We developed a self-controllable voltage level (SVL) circuit and applied this circuit to a single-power-supply, six-transistor complementary metal-oxide-semiconductor static random-access memory (SRAM) to not only improve both write and read performances but also to achieve low standby power and data retention (holding) capability. The SVL circuit comprises only three MOSFETs (i.e., pull-up, pull-down and bypass MOSFETs). The SVL circuit is able to adaptively generate both optimal memory cell voltages and word line voltages depending on which mode of operation (i.e., write, read or hold operation) was used. The write margin (VWM) and read margin (VRM) of the developed (dvlp) SRAM at a supply voltage (VDD) of 1V were 0.470 and 0.1923V, respectively. These values were 1.309 and 2.093 times VWM and VRM of the conventional (conv) SRAM, respectively. At a large threshold voltage (Vt) variability (=+6σ), the minimum power supply voltage (VMin) for the write operation of the conv SRAM was 0.37V, whereas it decreased to 0.22V for the dvlp SRAM. VMin for the read operation of the conv SRAM was 1.05V when the Vt variability (=-6σ) was large, but the dvlp SRAM lowered it to 0.41V. These results show that the SVL circuit expands the operating voltage range for both write and read operations to lower voltages. The dvlp SRAM reduces the standby power consumption (PST) while retaining data. The measured PST of the 2k-bit, 90-nm dvlp SRAM was only 0.957µW at VDD=1.0V, which was 9.46% of PST of the conv SRAM (10.12µW). The Si area overhead of the SVL circuits was only 1.383% of the dvlp SRAM.

  • Rank Metric Codes and Their Galois Duality

    Qing GAO  Yang DING  

     
    LETTER-Coding Theory

      Publicized:
    2023/02/20
      Vol:
    E106-A No:8
      Page(s):
    1067-1071

    In this paper, we describe the Galois dual of rank metric codes in the ambient space $\mathbb{F}_Q^{n\times m}$ and $\mathbb{F}_{Q^m}^n$, where $Q=q^e$. We obtain connections between the duality of rank metric codes with respect to distinct Galois inner products. Furthermore, for $0 \le s < e$, we introduce the concept of $q^{sm}$-dual bases of $\mathbb{F}_{Q^m}$ over $\mathbb{F}_Q$ and obtain some conditions about the existence of $q^{sm}$-self-dual basis.

  • Reliable and Efficient Chip-PCB Hybrid PUF and Lightweight Key Generator

    Yuanzhong XU  Tao KE  Wenjun CAO  Yao FU  Zhangqing HE  

     
    PAPER-Electronic Circuits

      Publicized:
    2023/03/10
      Vol:
    E106-C No:8
      Page(s):
    432-441

    Physical Unclonable Function (PUF) is a promising lightweight hardware security primitive that can extract device fingerprints for encryption or authentication. However, extracting fingerprints from either the chip or the board individually has security flaws and cannot provide hardware system-level security. This paper proposes a new Chip-PCB hybrid PUF (CPR PUF) in which Weak PUF on PCB is combined with Strong PUF inside the chip to generate massive responses under the control of challenges of on-chip Strong PUF. This structure tightly couples the chip and PCB into an inseparable and unclonable unit and can thus verify the authenticity of the chip as well as the board. To improve the uniformity and reliability of Chip-PCB hybrid PUF, we propose a lightweight key generator based on a reliability self-test and debiasing algorithm to extract massive stable and secure keys from unreliable and biased PUF responses, which eliminates expensive error correction processes. The FPGA-based test results show that the PUF responses after robust extraction and debiasing achieve high uniqueness, reliability, uniformity and anti-counterfeiting features. Moreover, the key generator greatly reduces the execution cost, and the bit error rate of the keys is less than $10^{-9}$. The overall security of the key is also improved by eliminating the entropy leakage of helper data.

  • Temporal-Based Action Clustering for Motion Tendencies

    Xingyu QIAN  Xiaogang CHEN  Aximu YUEMAIER  Shunfen LI  Weibang DAI  Zhitang SONG  

     
    LETTER-Artificial Intelligence, Data Mining

      Publicized:
    2023/05/02
      Vol:
    E106-D No:8
      Page(s):
    1292-1295

    Video-based action recognition encompasses the recognition of appearance and the classification of action types. This work proposes a discrete-temporal-sequence-based motion tendency clustering framework to implement motion clustering by extracting motion tendencies and self-supervised learning. A published traffic intersection dataset (inD) and a self-produced gesture video set are used for evaluation and to validate the motion tendency action recognition hypothesis.

  • A Lightweight End-to-End Speech Recognition System on Embedded Devices

    Yu WANG  Hiromitsu NISHIZAKI  

     
    PAPER-Speech and Hearing

      Publicized:
    2023/04/13
      Vol:
    E106-D No:7
      Page(s):
    1230-1239

    In industry, automatic speech recognition has come to be a competitive feature for embedded products with poor hardware resources. In this work, we propose a tiny end-to-end speech recognition model that is lightweight and easily deployable on edge platforms. First, instead of sophisticated network structures, such as recurrent neural networks, transformers, etc., the model we propose mainly uses convolutional neural networks as its backbone. This ensures that our model is supported by most software development kits for embedded devices. Second, we adopt the basic unit of MobileNet-v3, which performs well in computer vision tasks, and integrate the features of the hidden layer at different scales, thus compressing the number of parameters of the model to less than 1 M and achieving an accuracy greater than that of some traditional models. Third, in order to further reduce the CPU computation, we directly extract acoustic representations from 1-dimensional speech waveforms and use a self-supervised learning approach to encourage the convergence of the model. Finally, to solve some problems where hardware resources are relatively weak, we use a prefix beam search decoder to dynamically extend the search path with an optimized pruning strategy and an additional initialism language model to capture the probability of between-words in advance and thus avoid premature pruning of correct words. In our experiments, according to a number of evaluation categories, our end-to-end model outperformed several tiny speech recognition models used for embedded devices in related work.
