
Keyword Search Result

[Keyword] SEM(688hit)

21-40hit(688hit)

  • Single-Power-Supply Six-Transistor CMOS SRAM Enabling Low-Voltage Writing, Low-Voltage Reading, and Low Standby Power Consumption Open Access

    Tadayoshi ENOMOTO  Nobuaki KOBAYASHI  

     
    PAPER-Electronic Circuits

      Publicized:
    2023/03/16
      Vol:
    E106-C No:9
      Page(s):
    466-476

    We developed a self-controllable voltage level (SVL) circuit and applied this circuit to a single-power-supply, six-transistor complementary metal-oxide-semiconductor static random-access memory (SRAM) to not only improve both write and read performances but also to achieve low standby power and data retention (holding) capability. The SVL circuit comprises only three MOSFETs (i.e., pull-up, pull-down and bypass MOSFETs). The SVL circuit is able to adaptively generate both optimal memory cell voltages and word line voltages depending on which mode of operation (i.e., write, read or hold operation) was used. The write margin (VWM) and read margin (VRM) of the developed (dvlp) SRAM at a supply voltage (VDD) of 1V were 0.470 and 0.1923V, respectively. These values were 1.309 and 2.093 times VWM and VRM of the conventional (conv) SRAM, respectively. At a large threshold voltage (Vt) variability (=+6σ), the minimum power supply voltage (VMin) for the write operation of the conv SRAM was 0.37V, whereas it decreased to 0.22V for the dvlp SRAM. VMin for the read operation of the conv SRAM was 1.05V when the Vt variability (=-6σ) was large, but the dvlp SRAM lowered it to 0.41V. These results show that the SVL circuit expands the operating voltage range for both write and read operations to lower voltages. The dvlp SRAM reduces the standby power consumption (PST) while retaining data. The measured PST of the 2k-bit, 90-nm dvlp SRAM was only 0.957µW at VDD=1.0V, which was 9.46% of PST of the conv SRAM (10.12µW). The Si area overhead of the SVL circuits was only 1.383% of the dvlp SRAM.

  • Ensemble Learning in CNN Augmented with Fully Connected Subnetworks

    Daiki HIRATA  Norikazu TAKAHASHI  

     
    LETTER-Biocybernetics, Neurocomputing

      Publicized:
    2023/04/05
      Vol:
    E106-D No:7
      Page(s):
    1258-1261

    Convolutional Neural Networks (CNNs) have shown remarkable performance in image recognition tasks. In this letter, we propose a new CNN model called EnsNet, which is composed of one base CNN and multiple Fully Connected SubNetworks (FCSNs). In this model, the set of feature maps generated by the last convolutional layer in the base CNN is divided along channels into disjoint subsets, and these subsets are assigned to the FCSNs. Each of the FCSNs is trained independently of the others so that it can predict the class label of each feature map in the subset assigned to it. The output of the overall model is determined by a majority vote of the base CNN and the FCSNs. Experimental results using the MNIST, Fashion-MNIST and CIFAR-10 datasets show that the proposed approach further improves the performance of CNNs. In particular, an EnsNet achieves a state-of-the-art error rate of 0.16% on MNIST.

  • Conflict Reduction of Acyclic Flow Event Structures

    Toshiyuki MIYAMOTO  Marika IZAWA  

     
    PAPER

      Publicized:
    2022/10/26
      Vol:
    E106-A No:5
      Page(s):
    707-714

    Event structures are a well-known modeling formalism for concurrent systems with causality and conflict relations. The flow event structure (FES) is a variant of event structures that generalizes the prime event structure. In an FES, two events may be in conflict even though they are not syntactically in conflict; this is called a semantic conflict. The existence of semantic conflict in an FES motivates reducing conflict relations (i.e., conflict reduction) to obtain a simpler structure. In this paper, we study conflict reduction in acyclic FESs. A necessary and sufficient condition for conflict reduction is given; algorithms to compute semantic conflict, local configurations, and conflict reduction are proposed. In computational experiments, the proposed method required far less time than the naive method.

  • Semantic Path Planning for Indoor Navigation Tasks Using Multi-View Context and Prior Knowledge

    Jianbing WU  Weibo HUANG  Guoliang HUA  Wanruo ZHANG  Risheng KANG  Hong LIU  

     
    PAPER-Positioning and Navigation

      Publicized:
    2022/01/20
      Vol:
    E106-D No:5
      Page(s):
    756-764

    Recently, deep reinforcement learning (DRL) methods have significantly improved the performance of target-driven indoor navigation tasks. However, the rich semantic information of environments is still not fully exploited in previous approaches. In addition, existing methods tend to overfit to the training scenes or objects in target-driven navigation tasks, making it hard to generalize to unseen environments. Human beings can easily adapt to new scenes because they can recognize the objects they see and reason about the possible locations of target objects using their experience. Inspired by this, we propose a DRL-based target-driven navigation model, termed MVC-PK, using Multi-View Context information and Prior semantic Knowledge. It relies only on the semantic label of target objects and allows the robot to find the target without using any geometry map. To perceive the semantic contextual information in the environment, object detectors are leveraged to detect the objects present in the multi-view observations. To enable the semantic reasoning ability of indoor mobile robots, a Graph Convolutional Network is also employed to incorporate prior knowledge. The proposed MVC-PK model is evaluated in the AI2-THOR simulation environment. The results show that MVC-PK (1) significantly improves the cross-scene and cross-target generalization ability, and (2) achieves state-of-the-art performance with 15.2% and 11.0% increases in Success Rate (SR) and Success weighted by Path Length (SPL), respectively.

  • Effectively Utilizing the Category Labels for Image Captioning

    Junlong FENG  Jianping ZHAO  

     
    PAPER-Core Methods

      Publicized:
    2021/12/13
      Vol:
    E106-D No:5
      Page(s):
    617-624

    As a further investigation of the image captioning task, some works have extended the vision-text dataset for specific subtasks, such as stylized caption generation. The corpus in such a dataset is usually composed of obviously sentiment-bearing words. In some special cases, however, the captions are classified according to the image category. This results in a latent problem: the generated sentences are close in semantic meaning but belong to different or even opposite categories. It is therefore worthwhile to explore an effective way to utilize the image category label to boost the caption difference. In this paper, we propose an image captioning network with a label control mechanism (LCNET). First, to further improve the caption difference, LCNET employs a semantic enhancement module to provide the decoder with global semantic vectors. Then, through the proposed label control LSTM, LCNET can dynamically modulate the caption generation depending on the image category labels. Finally, the decoder integrates the spatial image features with global semantic vectors to output the caption. Results on all the standard evaluation metrics show that our model outperforms the compared models. Caption analysis demonstrates that our approach can improve the performance of semantic representation. Compared with other label control mechanisms, our model is capable of boosting the caption difference according to the labels while keeping better consistency with the image content.

  • SPSD: Semantics and Deep Reinforcement Learning Based Motion Planning for Supermarket Robot

    Jialun CAI  Weibo HUANG  Yingxuan YOU  Zhan CHEN  Bin REN  Hong LIU  

     
    PAPER-Positioning and Navigation

      Publicized:
    2022/09/15
      Vol:
    E106-D No:5
      Page(s):
    765-772

    Robot motion planning is an important part of the unmanned supermarket. The challenges of motion planning in supermarkets lie in the diversity of the supermarket environment, the complexity of obstacle movement, and the vastness of the search space. This paper proposes an adaptive Search and Path planning method based on Semantic information and Deep reinforcement learning (SPSD), which effectively improves the autonomous decision-making ability of supermarket robots. First, based on the backbone of deep reinforcement learning (DRL), supermarket robots process real-time information from multi-modality sensors to realize high-speed and collision-free motion planning. Second, to solve the problem caused by the uncertainty of the reward in deep reinforcement learning, common spatial semantic relationships between landmarks and target objects are exploited to define the reward function. Finally, dynamics randomization is introduced during training to improve the generalization performance of the algorithm. The experimental results show that the SPSD algorithm performs well on three indicators: generalization performance, training time and path planning length. Compared with other methods, the training time of SPSD is reduced by up to 27.42%, the path planning length is reduced by up to 21.08%, and the trained network of SPSD can be applied to unfamiliar scenes safely and efficiently. The results are encouraging enough to consider applying the proposed method in practical scenes. We have uploaded a video of the experimental results to https://www.youtube.com/watch?v=h1wLpm42NZk.

  • Chinese Named Entity Recognition Method Based on Dictionary Semantic Knowledge Enhancement

    Tianbin WANG  Ruiyang HUANG  Nan HU  Huansha WANG  Guanghan CHU  

     
    PAPER-Artificial Intelligence, Data Mining

      Publicized:
    2023/02/15
      Vol:
    E106-D No:5
      Page(s):
    1010-1017

    Chinese Named Entity Recognition (NER) is a fundamental technology in the field of Chinese Natural Language Processing. It is extensively used in information extraction, intelligent question answering, and knowledge graphs. Nevertheless, due to the diversity and complexity of Chinese, most Chinese NER methods fail to sufficiently capture character-granularity semantics, which affects the performance of Chinese NER. In this work, we propose DSKE-Chinese NER: Chinese Named Entity Recognition based on Dictionary Semantic Knowledge Enhancement. As a novel step, we integrate the semantic information of character granularity into the vector space of characters and acquire the vector representation containing semantic information by the attention mechanism. In addition, we verify the appropriate number of semantic layers through a comparative experiment. Experiments on public Chinese datasets such as Weibo, Resume and MSRA show that the model outperforms character-based LSTM baselines.

  • Choice Disjunctive Queries in Logic Programming

    Keehang KWON  Daeseong KANG  

     
    LETTER

      Publicized:
    2022/12/19
      Vol:
    E106-D No:3
      Page(s):
    333-336

    One of the long-standing research problems in logic programming is to treat the cut predicate in a logical, high-level way. We argue that this problem can be solved by adopting linear logic and choice-disjunctive goal formulas of the form G0 ⊕ G1 where G0, G1 are goals. These goals have the following intended semantics: choose the true disjunct Gi (i = 0 or 1) and execute Gi, while discarding the unchosen disjunct. Note that only one goal can remain alive during execution. These goals thus allow us to specify mutually exclusive tasks in a high-level way. Note that there is another use of cut, which is for breaking out of failure-driven loops and for efficient heap management. Unfortunately, it is not possible to replace cut of this kind with choice-disjunctive goals.

  • Ensemble-Based Method for Correcting Global Explanation of Prediction Model

    Masaki HAMAMOTO  Hiroyuki NAMBA  Masashi EGI  

     
    PAPER-Artificial Intelligence, Data Mining

      Publicized:
    2022/11/15
      Vol:
    E106-D No:2
      Page(s):
    218-228

    Explainable artificial intelligence (AI) technology enables us to quantitatively analyze the whole prediction logic of AI as a global explanation. However, unwanted relationships learned by AI due to data sparsity, high dimensionality, and noise are also visualized in the explanation, which reduces confidence in the AI. Thus, methods for correcting those unwanted relationships in the explanation have been developed. However, since these methods are applicable only to differentiable machine learning (ML) models and not to non-differentiable models such as tree-based models, they are insufficient for covering a wide range of ML technology. Since these methods also require re-training of the model to correct its explanation (i.e., they are in-processing methods), they cannot be applied to black-box models provided by third parties. Therefore, we propose a method called ensemble-based explanation correction (EBEC) as a post-processing method for correcting the global explanation of a prediction model in a model-agnostic manner by using the Rashomon effect of statistics. We evaluated the performance of EBEC with three different tasks and analyzed its function in more detail. The evaluation results indicate that EBEC can correct the global explanation of the model so that the explanation aligns with the domain knowledge given by the user while maintaining its accuracy. EBEC can be extended in various ways and combined with any method to improve correction performance since it is a post-processing-type correction method. Hence, EBEC would contribute to high-productivity ML modeling as a new type of explanation-correction method.

  • Chinese Lexical Sememe Prediction Using CilinE Knowledge

    Hao WANG  Sirui LIU  Jianyong DUAN  Li HE  Xin LI  

     
    PAPER-Language, Thought, Knowledge and Intelligence

      Publicized:
    2022/08/18
      Vol:
    E106-A No:2
      Page(s):
    146-153

    Sememes are the smallest semantic units of human languages, the composition of which can represent the meaning of words. Sememes have been successfully applied to many downstream applications in the natural language processing (NLP) field. Annotation of a word's sememes depends on language experts, which is both time-consuming and labor-intensive, limiting the large-scale application of sememes. Researchers have proposed some sememe prediction methods to automatically predict sememes for words. However, existing sememe prediction methods focus on information of the word itself, ignoring the expert-annotated knowledge bases that indicate the relations between words and should be of value in sememe prediction. Therefore, we aim to incorporate the expert-annotated knowledge bases into the sememe prediction process. To achieve that, we propose a CilinE-guided sememe prediction model which employs an existing word knowledge base, CilinE, to remodel sememe prediction from a relational perspective. Experiments on HowNet, a widely used Chinese sememe knowledge base, have shown that CilinE has an obvious positive effect on sememe prediction. Furthermore, our proposed method can be integrated into existing methods and significantly improves the prediction performance. We will release the data and code to the public.

  • Noise Suppression in SiC-MOSFET Body Diode Turn-Off Operation with Simple and Robust Gate Driver

    Hiroshi SUZUKI  Tsuyoshi FUNAKI  

     
    PAPER-Semiconductor Materials and Devices

      Publicized:
    2022/06/14
      Vol:
    E105-C No:12
      Page(s):
    750-760

    SiC-MOSFETs are being increasingly implemented in power electronics systems as low-loss, fast switching devices. Despite the advantages of an SiC-MOSFET, its large dv/dt or di/dt raises concerns about electromagnetic interference (EMI) noise. This paper proposes and demonstrates a simple and robust gate driver that can suppress ringing oscillation and surge voltage induced by the turn-off of the SiC-MOSFET body diode. The proposed gate driver utilizes the channel leakage current methodology (CLC) to enhance the damping effect by elevating the gate-source voltage (VGS) and inducing the channel leakage current in the device. The gate driver can self-adjust the timing of initiating CLC operation, which avoids an increase in switching loss. Additionally, the output voltage of the VGS elevation circuit does not need to be actively controlled in accordance with the operating conditions. Thus, the circuit topology is simple, and ringing oscillation can be easily attenuated with fixed circuit parameters regardless of operating conditions, minimizing the increase in switching loss. The effectiveness and versatility of the proposed gate driver were experimentally validated for a wide range of operating conditions by double and single pulse switching tests.

  • A KPI Anomaly Detection Method Based on Fast Clustering

    Yun WU  Yu SHI  Jieming YANG  Lishan BAO  Chunzhe LI  

     
    PAPER

      Publicized:
    2022/05/27
      Vol:
    E105-B No:11
      Page(s):
    1309-1317

    In Artificial Intelligence for IT Operations scenarios, the KPI (Key Performance Indicator) is a very important operation and maintenance monitoring indicator, and research on KPI anomaly detection has become a hot spot in recent years. To address the low detection efficiency and insufficient representation learning of existing methods, this paper proposes HCE-DWL, a KPI anomaly detection method based on fast clustering. The method first combines hierarchical agglomerative clustering (HAC) with deep assignment based on CNN-Embedding (CE) to perform cluster analysis on KPI data (this combination is HCE), so as to improve the clustering efficiency of KPI data. Next, the centroid of each KPI cluster and its Transformed Outlier Scores (TOS) are separately given weights. Finally, they are fed into the LightGBM model for detection (the Double Weight LightGBM model, referred to as DWL). Comparative experiments show that the algorithm can effectively improve the efficiency and accuracy of KPI anomaly detection.

  • Sputtering Gas Pressure Dependence on the LaBxNy Insulator Formation for Pentacene-Based Back-Gate Type Floating-Gate Memory with an Amorphous Rubrene Passivation Layer

    Eun-Ki HONG  Kyung Eun PARK  Shun-ichiro OHMI  

     
    PAPER

      Publicized:
    2022/06/27
      Vol:
    E105-C No:10
      Page(s):
    589-595

    In this research, the effect of Ar/N2-plasma sputtering gas pressure on the LaBxNy tunnel and block layer was investigated for pentacene-based floating-gate memory with an amorphous rubrene (α-rubrene) passivation layer. The influence of the α-rubrene passivation layer on memory characteristics was examined. The pentacene-based metal/insulator/metal/insulator/semiconductor (MIMIS) diode and organic field-effect transistor (OFET) were fabricated utilizing an N-doped LaB6 metal layer and a LaBxNy insulator with an α-rubrene passivation layer at an annealing temperature of 200°C. In the case of the MIMIS diode, the leakage current density and the equivalent oxide thickness (EOT) were decreased from 1.2×10⁻² A/cm² to 1.1×10⁻⁷ A/cm² and from 3.5 nm to 3.1 nm, respectively, by decreasing the sputtering gas pressure from 0.47 Pa to 0.19 Pa. In the case of the floating-gate type OFET with the α-rubrene passivation layer, a larger memory window of 0.68 V was obtained with a saturation mobility of 2.2×10⁻² cm²/(V·s) and a subthreshold swing of 199 mV/dec compared to the device without the α-rubrene passivation layer.

  • Ramsey Numbers of Trails Open Access

    Masatoshi OSUMI  

     
    PAPER-Graphs and Networks

      Publicized:
    2022/03/24
      Vol:
    E105-A No:9
      Page(s):
    1235-1240

    We initiate the study of Ramsey numbers of trails. Let k≥2 be a positive integer. The Ramsey number of trails with k vertices is defined as the smallest number n such that for every graph H with n vertices, H or the complement of H contains a trail with k vertices. We prove that the Ramsey number of trails with k vertices is at most k and at least 2√k+Θ(1). This improves the trivial upper bound of ⌊3k/2⌋-1.

  • Diabetes Noninvasive Recognition via Improved Capsule Network

    Cunlei WANG  Donghui LI  

     
    PAPER-Image Recognition, Computer Vision

      Publicized:
    2022/05/06
      Vol:
    E105-D No:8
      Page(s):
    1464-1471

    Noninvasive recognition is an important trend in diabetes recognition. Unfortunately, the accuracy obtained from conventional noninvasive recognition methods is low. This paper proposes a novel Diabetes Noninvasive Recognition method via the plantar pressure image and an improved Capsule Network (DNR-CapsNet). The input of the proposed method is a plantar pressure image, and the output is the recognition result: healthy or possibly diabetic. ResNet18 is used as the backbone of the convolutional layers to convert pixel intensities to local features in the proposed DNR-CapsNet. Then, the PrimaryCaps layer, SecondaryCaps layer, and DiabetesCaps layer are developed to achieve the diabetes recognition. Semantic fusion and locality-constrained dynamic routing are also developed to further improve the recognition accuracy of our method. The experimental results indicate that the proposed method performs better on noninvasive diabetes recognition than the state-of-the-art methods.

  • An Interpretable Feature Selection Based on Particle Swarm Optimization

    Yi LIU  Wei QIN  Qibin ZHENG  Gensong LI  Mengmeng LI  

     
    LETTER-Pattern Recognition

      Publicized:
    2022/05/09
      Vol:
    E105-D No:8
      Page(s):
    1495-1500

    Feature selection based on particle swarm optimization is often employed to improve the performance of artificial intelligence algorithms. However, its interpretability has lacked concrete research. Improving the stability of the feature selection method is a way to effectively improve its interpretability. A novel feature selection approach named Interpretable Particle Swarm Optimization is developed in this paper. It uses four data perturbation methods and three filter feature selection methods to obtain stable feature subsets, and adopts the Fuch map to convert them to initial particles. In addition, it employs a similarity mutation strategy, which applies the Tanimoto distance to choose the nearest 1/3 of individuals to the previous particles to implement mutation. Eleven representative algorithms and four typical datasets are used to make a comprehensive comparison with our proposed approach. Accuracy, F1, precision and recall are used as classification measures, and an extension of the Kuncheva indicator is employed as the stability measure. Experiments show that our method has better interpretability than the compared evolutionary algorithms. Furthermore, the results of the classification measures demonstrate that the proposed approach has excellent overall classification performance.

  • Obstacle Detection for Unmanned Surface Vehicles by Fusion Refinement Network

    Weina ZHOU  Xinxin HUANG  Xiaoyang ZENG  

     
    PAPER-Information Network

      Publicized:
    2022/05/12
      Vol:
    E105-D No:8
      Page(s):
    1393-1400

    As a kind of marine vehicle, Unmanned Surface Vehicles (USVs) are widely used in military and civilian fields because of their low cost, good concealment, strong mobility and high speed. High-precision detection of obstacles plays an important role in USV autonomous navigation, as it supports subsequent path planning. To further improve obstacle detection performance, we propose an encoder-decoder architecture named Fusion Refinement Network (FRN). The encoder part has a deeper network structure, which enables it to extract richer visual features. In particular, a dilated convolution layer is used in the encoder to obtain a large range of obstacle features in complex marine environments. The decoder part performs multiple-path feature fusion. Attention Refinement Modules (ARM) are added to optimize features, and a learnable fusion algorithm called Feature Fusion Module (FFM) is used to fuse visual information. Experimental validation results on three different datasets with real marine images show that FRN is superior to state-of-the-art semantic segmentation networks in performance evaluation. The MIoU and MPA of FRN peak at 97.01% and 98.37%, respectively. Moreover, FRN maintains high accuracy with only 27.67M parameters, which is much smaller than the latest obstacle detection network (WaSR) for USVs.

  • Temporal Ensemble SSDLite: Exploiting Temporal Correlation in Video for Accurate Object Detection

    Lukas NAKAMURA  Hiromitsu AWANO  

     
    PAPER-Vision

      Publicized:
    2022/01/18
      Vol:
    E105-A No:7
      Page(s):
    1082-1090

    We propose “Temporal Ensemble SSDLite,” a new method for video object detection that boosts accuracy while maintaining detection speed and energy consumption. Object detection for video is becoming increasingly important as a core part of applications in robotics, autonomous driving and many other promising fields. Many of these applications require high accuracy and speed to be viable, but are used in compute- and energy-restricted environments. Therefore, new methods that increase the overall performance of video object detection, i.e., accuracy and speed, have to be developed. To increase accuracy we use ensembling, the machine learning method of combining the predictions of multiple different models. The drawback of ensembling is the increased computational cost, which is proportional to the number of models used. We overcome this drawback by deploying our ensemble temporally, meaning we run inference with only a single model at each frame, cycling through our ensemble of models from frame to frame. Then, we combine the predictions for the last N frames, where N is the number of models in our ensemble, through non-max-suppression. This is possible because close frames in a video are extremely similar due to temporal correlation. As a result, we increase accuracy through the ensemble while running inference with only a single model at each frame, thereby keeping the detection speed. To evaluate the proposal, we measure the accuracy, detection speed and energy consumption on the Google Edge TPU, a machine learning inference accelerator, with the ImageNet VID dataset. Our results demonstrate an accuracy boost of up to 4.9% while maintaining real-time detection speed and an energy consumption of 181mJ per image.

  • A Hardware Efficient Reservoir Computing System Using Cellular Automata and Ensemble Bloom Filter

    Dehua LIANG  Jun SHIOMI  Noriyuki MIURA  Masanori HASHIMOTO  Hiromitsu AWANO  

     
    PAPER-Computer System

      Publicized:
    2022/04/08
      Vol:
    E105-D No:7
      Page(s):
    1273-1282

    Reservoir computing (RC) is an attractive alternative to machine learning models owing to its computationally inexpensive training process and simplicity. In this work, we propose EnsembleBloomCA, which utilizes cellular automata (CA) and an ensemble Bloom filter to organize an RC system. In contrast to most existing RC systems, EnsembleBloomCA eliminates all floating-point calculation and integer multiplication. EnsembleBloomCA adopts CA as the reservoir in the RC system because it can be implemented using only binary operations and is thus energy efficient. The rich pattern dynamics created by CA can map the original input into a high-dimensional space and provide more features for the classifier. Utilizing an ensemble Bloom filter as the classifier, the features provided by the reservoir can be effectively memorized. Our experiment revealed that applying the ensemble mechanism to the Bloom filter resulted in a significant reduction in memory cost during the inference phase. In comparison with Bloom WiSARD, one of the state-of-the-art reference works, the EnsembleBloomCA model achieves a 43× reduction in memory cost while maintaining the same accuracy. Our hardware implementation also demonstrated that EnsembleBloomCA achieved over 23× and 8.5× reductions in area and power, respectively.

  • A Binary Translator to Accelerate Development of Deep Learning Processing Library for AArch64 CPU Open Access

    Kentaro KAWAKAMI  Kouji KURIHARA  Masafumi YAMAZAKI  Takumi HONDA  Naoto FUKUMOTO  

     
    PAPER

      Publicized:
    2021/12/03
      Vol:
    E105-C No:6
      Page(s):
    222-231

    To accelerate deep learning (DL) processes on the supercomputer Fugaku, the authors have ported and optimized oneDNN for Fugaku's CPU, the Fujitsu A64FX. oneDNN is an open-source DL processing library developed by Intel for the x86_64 architecture. The A64FX CPU is based on the Armv8-A architecture. oneDNN dynamically creates the execution code for the computation kernels, which are implemented at the granularity of x86_64 instructions using Xbyak, the Just-In-Time (JIT) assembler for the x86_64 architecture. To port oneDNN to A64FX, it must be rewritten into Armv8-A instructions using Xbyak_aarch64, the JIT assembler for the Armv8-A architecture. This is challenging because the code to be rewritten exceeds several tens of thousands of lines. This study presents Xbyak_translator_aarch64, a binary translator that at runtime converts dynamically produced executable code for the x86_64 architecture into executable code for the Armv8-A architecture. Xbyak_translator_aarch64 eliminates the need to rewrite the source code for porting oneDNN to A64FX and allows us to port oneDNN to A64FX quickly.
