The search functionality is under construction.

Keyword Search Result

[Keyword] SLA(197hit)

1-20hit(197hit)

  • Multi-Style Shape Matching GAN for Text Images Open Access

    Honghui YUAN  Keiji YANAI  

     
    PAPER

      Pubricized:
    2023/12/27
      Vol:
    E107-D No:4
      Page(s):
    505-514

    Deep learning techniques are used to transform the style of images and produce diverse images. In the text style transformation field, many previous studies attempted to generate stylized text using deep learning networks. However, to achieve multiple style transformations for text images, the methods proposed in previous studies require learning multiple networks or cannot be guided by style images. Thus, in this study we focused on multistyle transformation of text images using style images to guide the generation of results. We propose a multiple-style transformation network for text style transfer, which we refer to as the Multi-Style Shape Matching GAN (Multi-Style SMGAN). The proposed method generates multiple styles of text images using a single model by training the model only once, and allows users to control the text style according to style images. The proposed method implements conditions to the network such that all styles can be distinguished effectively in the network, and the generation of each styled text can be controlled according to these conditions. The proposed network is optimized such that the conditional information can be transmitted effectively throughout the network. The proposed method was evaluated experimentally on a large number of text images, and the results show that the trained model can generate multiple-style text in realtime according to the style image. In addition, the results of a user survey study indicate that the proposed method produces higher quality results compared to existing methods.

  • Inference Discrepancy Based Curriculum Learning for Neural Machine Translation

    Lei ZHOU  Ryohei SASANO  Koichi TAKEDA  

     
    PAPER-Natural Language Processing

      Pubricized:
    2023/10/18
      Vol:
    E107-D No:1
      Page(s):
    135-143

    In practice, even a well-trained neural machine translation (NMT) model can still make biased inferences on the training set due to distribution shifts. For the human learning process, if we can not reproduce something correctly after learning it multiple times, we consider it to be more difficult. Likewise, a training example causing a large discrepancy between inference and reference implies higher learning difficulty for the MT model. Therefore, we propose to adopt the inference discrepancy of each training example as the difficulty criterion, and according to which rank training examples from easy to hard. In this way, a trained model can guide the curriculum learning process of an initial model identical to itself. We put forward an analogy to this training scheme as guiding the learning process of a curriculum NMT model by a pretrained vanilla model. In this paper, we assess the effectiveness of the proposed training scheme and take an insight into the influence of translation direction, evaluation metrics and different curriculum schedules. Experimental results on translation benchmarks WMT14 English ⇒ German, WMT17 Chinese ⇒ English and Multitarget TED Talks Task (MTTT) English ⇔ German, English ⇔ Chinese, English ⇔ Russian demonstrate that our proposed method consistently improves the translation performance against the advanced Transformer baseline.

  • Hierarchical Detailed Intermediate Supervision for Image-to-Image Translation

    Jianbo WANG  Haozhi HUANG  Li SHEN  Xuan WANG  Toshihiko YAMASAKI  

     
    PAPER-Image Processing and Video Processing

      Pubricized:
    2023/09/14
      Vol:
    E106-D No:12
      Page(s):
    2085-2096

    The image-to-image translation aims to learn a mapping between the source and target domains. For improving visual quality, the majority of previous works adopt multi-stage techniques to refine coarse results in a progressive manner. In this work, we present a novel approach for generating plausible details by only introducing a group of intermediate supervisions without cascading multiple stages. Specifically, we propose a Laplacian Pyramid Transformation Generative Adversarial Network (LapTransGAN) to simultaneously transform components in different frequencies from the source domain to the target domain within only one stage. Hierarchical perceptual and gradient penalization are utilized for learning consistent semantic structures and details at each pyramid level. The proposed model is evaluated based on various metrics, including the similarity in feature maps, reconstruction quality, segmentation accuracy, similarity in details, and qualitative appearances. Our experiments show that LapTransGAN can achieve a much better quantitative performance than both the supervised pix2pix model and the unsupervised CycleGAN model. Comprehensive ablation experiments are conducted to study the contribution of each component.

  • GAN-based Image Translation Model with Self-Attention for Nighttime Dashcam Data Augmentation

    Rebeka SULTANA  Gosuke OHASHI  

     
    PAPER-Intelligent Transport System

      Pubricized:
    2023/06/27
      Vol:
    E106-A No:9
      Page(s):
    1202-1210

    High-performance deep learning-based object detection models can reduce traffic accidents using dashcam images during nighttime driving. Deep learning requires a large-scale dataset to obtain a high-performance model. However, existing object detection datasets are mostly daytime scenes and a few nighttime scenes. Increasing the nighttime dataset is laborious and time-consuming. In such a case, it is possible to convert daytime images to nighttime images by image-to-image translation model to augment the nighttime dataset with less effort so that the translated dataset can utilize the annotations of the daytime dataset. Therefore, in this study, a GAN-based image-to-image translation model is proposed by incorporating self-attention with cycle consistency and content/style separation for nighttime data augmentation that shows high fidelity to annotations of the daytime dataset. Experimental results highlight the effectiveness of the proposed model compared with other models in terms of translated images and FID scores. Moreover, the high fidelity of translated images to the annotations is verified by a small object detection model according to detection results and mAP. Ablation studies confirm the effectiveness of self-attention in the proposed model. As a contribution to GAN-based data augmentation, the source code of the proposed image translation model is publicly available at https://github.com/subecky/Image-Translation-With-Self-Attention

  • A Fully Analog Deep Neural Network Inference Accelerator with Pipeline Registers Based on Master-Slave Switched Capacitors

    Yaxin MEI  Takashi OHSAWA  

     
    PAPER-Integrated Electronics

      Pubricized:
    2023/03/08
      Vol:
    E106-C No:9
      Page(s):
    477-485

    A fully analog pipelined deep neural network (DNN) accelerator is proposed, which is constructed by using pipeline registers based on master-slave switched capacitors. The idea of the master-slave switched capacitors is an analog equivalent of the delayed flip-flop (D-FF) which has been used as a digital pipeline register. To estimate the performance of the pipeline register, it is applied to a conventional DNN which performs non-pipeline operation. Compared with the conventional DNN, the cycle time is reduced by 61.5% and data rate is increased by 160%. The accuracy reaches 99.6% in MNIST classification test. The energy consumption per classification is reduced by 88.2% to 0.128µJ, achieving an energy efficiency of 1.05TOPS/W and a throughput of 0.538TOPS in 180nm technology node.

  • Optimal Movement for SLAM by Hopping Rover

    Shuntaro TAKEKUMA  Shun-ichi AZUMA  Ryo ARIIZUMI  Toru ASAI  

     
    PAPER

      Pubricized:
    2022/10/24
      Vol:
    E106-A No:5
      Page(s):
    715-720

    A hopping rover is a robot that can move in low gravity planets by the characteristic motion called the hopping motion. For its autonomous explorations, the so-called SLAM (Simultaneous Localization and Mapping) is a basic function. SLAM is the combination of estimating the position of a robot and creating a map of an unknown environment. Most conventional methods of SLAM are based on odometry to estimate the position of the robot. However, in the case of the hopping rover, the error of odometry becomes considerably large because its hopping motion involves unpredictable bounce on the rough ground on an unexplored planet. Motivated by the above discussion, this paper addresses a problem of finding an optimal movement of the hopping rover for the estimation performance of the SLAM. For the problem, we first set the model of the SLAM system for the hopping rover. The problem is formulated as minimizing the expectation of the estimation error at a pre-specified time with respect to the sequence of control inputs. We show that the optimal input sequence tends to force the final position to be not at the landmark but in front of the landmark, and furthermore, the optimal input sequence is constant on the time interval for optimization.

  • Image-to-Image Translation for Data Augmentation on Multimodal Medical Images

    Yue PENG  Zuqiang MENG  Lina YANG  

     
    PAPER-Smart Healthcare

      Pubricized:
    2022/03/01
      Vol:
    E106-D No:5
      Page(s):
    686-696

    Medical images play an important role in medical diagnosis. However, acquiring a large number of datasets with annotations is still a difficult task in the medical field. For this reason, research in the field of image-to-image translation is combined with computer-aided diagnosis, and data augmentation methods based on generative adversarial networks are applied to medical images. In this paper, we try to perform data augmentation on unimodal data. The designed StarGAN V2 based network has high performance in augmenting the dataset using a small number of original images, and the augmented data is expanded from unimodal data to multimodal medical images, and this multimodal medical image data can be applied to the segmentation task with some improvement in the segmentation results. Our experiments demonstrate that the generated multimodal medical image data can improve the performance of glioma segmentation.

  • A Binary Translator to Accelerate Development of Deep Learning Processing Library for AArch64 CPU Open Access

    Kentaro KAWAKAMI  Kouji KURIHARA  Masafumi YAMAZAKI  Takumi HONDA  Naoto FUKUMOTO  

     
    PAPER

      Pubricized:
    2021/12/03
      Vol:
    E105-C No:6
      Page(s):
    222-231

    To accelerate deep learning (DL) processes on the supercomputer Fugaku, the authors have ported and optimized oneDNN for Fugaku's CPU, the Fujitsu A64FX. oneDNN is an open-source DL processing library developed by Intel for the x86_64 architecture. The A64FX CPU is based on the Armv8-A architecture. oneDNN dynamically creates the execution code for the computation kernels, which are implemented at the granularity of x86_64 instructions using Xbyak, the Just-In-Time (JIT) assembler for x86_64 architecture. To port oneDNN to A64FX, it must be rewritten into Armv8-A instructions using Xbyak_aarch64, the JIT assembler for the Armv8-A architecture. This is challenging because the number of steps to be rewritten exceeds several tens of thousands of lines. This study presents the Xbyak_translator_aarch64. Xbyak_translator_aarch64 is a binary translator that at runtime converts dynamically produced executable codes for the x86_64 architecture into executable codes for the Armv8-A architecture. Xbyak_translator_aarch64 eliminates the need to rewrite the source code for porting oneDNN to A64FX and allows us to port oneDNN to A64FX quickly.

  • Research on Mongolian-Chinese Translation Model Based on Transformer with Soft Context Data Augmentation Technique

    Qing-dao-er-ji REN  Yuan LI  Shi BAO  Yong-chao LIU  Xiu-hong CHEN  

     
    PAPER-Neural Networks and Bioengineering

      Pubricized:
    2021/11/19
      Vol:
    E105-A No:5
      Page(s):
    871-876

    As the mainstream approach in the field of machine translation, neural machine translation (NMT) has achieved great improvements on many rich-source languages, but performance of NMT for low-resource languages ae not very good yet. This paper uses data enhancement technology to construct Mongolian-Chinese pseudo parallel corpus, so as to improve the translation ability of Mongolian-Chinese translation model. Experiments show that the above methods can improve the translation ability of the translation model. Finally, a translation model trained with large-scale pseudo parallel corpus and integrated with soft context data enhancement technology is obtained, and its BLEU value is 39.3.

  • Exploring Hypotactic Structure for Chinese-English Machine Translation with a Structure-Aware Encoder-Decoder Neural Model

    Guoyi MIAO  Yufeng CHEN  Mingtong LIU  Jinan XU  Yujie ZHANG  Wenhe FENG  

     
    PAPER-Natural Language Processing

      Pubricized:
    2022/01/11
      Vol:
    E105-D No:4
      Page(s):
    797-806

    Translation of long and complex sentence has always been a challenge for machine translation. In recent years, neural machine translation (NMT) has achieved substantial progress in modeling the semantic connection between words in a sentence, but it is still insufficient in capturing discourse structure information between clauses within complex sentences, which often leads to poor discourse coherence when translating long and complex sentences. On the other hand, the hypotactic structure, a main component of the discourse structure, plays an important role in the coherence of discourse translation, but it is not specifically studied. To tackle this problem, we propose a novel Chinese-English NMT approach that incorporates the hypotactic structure knowledge of complex sentences. Specifically, we first annotate and build a hypotactic structure aligned parallel corpus to provide explicit hypotactic structure knowledge of complex sentences for NMT. Then we propose three hypotactic structure-aware NMT models with three different fusion strategies, including source-side fusion, target-side fusion, and both-side fusion, to integrate the annotated structure knowledge into NMT. Experimental results on WMT17, WMT18 and WMT19 Chinese-English translation tasks demonstrate that the proposed method can significantly improve the translation performance and enhance the discourse coherence of machine translation.

  • Volume Integral Equations Combined with Orthogonality of Modes for Analysis of Two-Dimensional Optical Slab Waveguide

    Masahiro TANAKA  

     
    PAPER

      Pubricized:
    2021/10/18
      Vol:
    E105-C No:4
      Page(s):
    137-145

    Volume integral equations combined with orthogonality of guided mode and non-guided field are proposed for the TE incidence of two-dimensional optical slab waveguide. The slab waveguide is assumed to satisfy the single mode condition. The formulation of the integral equations are described in detail. The matrix equation obtained by applying the method of moments to the integral equations is shown. Numerical results for step, gap, and grating waveguides are given. They are compared to published papers to validate the proposed method.

  • Numerical Analysis of Pulse Response for Slanted Grating Structure with an Air Regions in Dispersion Media by TE Case Open Access

    Ryosuke OZAKI  Tsuneki YAMASAKI  

     
    BRIEF PAPER

      Pubricized:
    2021/10/18
      Vol:
    E105-C No:4
      Page(s):
    154-158

    In our previous paper, we have proposed a new numerical technique for transient scattering problem of periodically arrayed dispersion media by using a combination of the fast inversion Laplace transform (FILT) method and Fourier series expansion method (FSEM), and analyzed the pulse response for several widths of the dispersion media or rectangular cavities. From the numerical results, we examined the influence of a periodically arrayed dispersion media with a rectangular cavity on the pulse response. In this paper, we analyzed the transient scattering problem for the case of dispersion media with slanted air regions by utilizing a combination of the FILT, FSEM, and multilayer division method (MDM), and investigated an influence for the slanted angle of an air region. In addition, we verified the computational accuracy for term of the MDM and truncation mode number of the electromagnetic fields.

  • Activation-Aware Slack Assignment Based Mode-Wise Voltage Scaling for Energy Minimization

    TaiYu CHENG  Yutaka MASUDA  Jun NAGAYAMA  Yoichi MOMIYAMA  Jun CHEN  Masanori HASHIMOTO  

     
    PAPER

      Pubricized:
    2021/08/31
      Vol:
    E105-A No:3
      Page(s):
    497-508

    Reducing power consumption is a crucial factor making industrial designs, such as mobile SoCs, competitive. Voltage scaling (VS) is the classical yet most effective technique that contributes to quadratic power reduction. A recent design technique called activation-aware slack assignment (ASA) enhances the voltage-scaling by allocating the timing margin of critical paths with a stochastic mean-time-to-failure (MTTF) analysis. Meanwhile, such stochastic treatment of timing errors is accepted in limited application domains, such as image processing. This paper proposes a design optimization methodology that achieves a mode-wise voltage-scalable (MWVS) design guaranteeing no timing errors in each mode operation. This work formulates the MWVS design as an optimization problem that minimizes the overall power consumption considering each mode duration, achievable voltage lowering and accompanied circuit overhead explicitly, and explores the solution space with the downhill simplex algorithm that does not require numerical derivation and frequent objective function evaluations. For obtaining a solution, i.e., a design, in the optimization process, we exploit the multi-corner multi-mode design flow in a commercial tool for performing mode-wise ASA with sets of false paths dedicated to individual modes. We applied the proposed design methodology to RISC-V design. Experimental results show that the proposed methodology saves 13% to 20% more power compared to the conventional VS approach and attains 8% to 15% gain from the conventional single-mode ASA. We also found that cycle-by-cycle fine-grained false path identification reduced leakage power by 31% to 42%.

  • Adversarial Scan Attack against Scan Matching Algorithm for Pose Estimation in LiDAR-Based SLAM Open Access

    Kota YOSHIDA  Masaya HOJO  Takeshi FUJINO  

     
    PAPER

      Pubricized:
    2021/10/26
      Vol:
    E105-A No:3
      Page(s):
    326-335

    Autonomous robots are controlled using physical information acquired by various sensors. The sensors are susceptible to physical attacks, which tamper with the observed values and interfere with control of the autonomous robots. Recently, sensor spoofing attacks targeting subsequent algorithms which use sensor data have become large threats. In this paper, we introduce a new attack against the LiDAR-based simultaneous localization and mapping (SLAM) algorithm. The attack uses an adversarial LiDAR scan to fool a pose graph and a generated map. The adversary calculates a falsification amount for deceiving pose estimation and physically injects the spoofed distance against LiDAR. The falsification amount is calculated by gradient method against a cost function of the scan matching algorithm. The SLAM algorithm generates the wrong map from the deceived movement path estimated by scan matching. We evaluated our attack on two typical scan matching algorithms, iterative closest point (ICP) and normal distribution transform (NDT). Our experimental results show that SLAM can be fooled by tampering with the scan. Simple odometry sensor fusion is not a sufficient countermeasure. We argue that it is important to detect or prevent tampering with LiDAR scans and to notice inconsistencies in sensors caused by physical attacks.

  • Movie Map for Virtual Exploration in a City

    Kiyoharu AIZAWA  

     
    INVITED PAPER

      Pubricized:
    2021/10/12
      Vol:
    E105-D No:1
      Page(s):
    38-45

    This paper introduces our work on a Movie Map, which will enable users to explore a given city area using 360° videos. Visual exploration of a city is always needed. Nowadays, we are familiar with Google Street View (GSV) that is an interactive visual map. Despite the wide use of GSV, it provides sparse images of streets, which often confuses users and lowers user satisfaction. Forty years ago, a video-based interactive map was created - it is well-known as Aspen Movie Map. Movie Map uses videos instead of sparse images and seems to improve the user experience dramatically. However, Aspen Movie Map was based on analog technology with a huge effort and never built again. Thus, we renovate the Movie Map using state-of-the-art technology. We build a new Movie Map system with an interface for exploring cities. The system consists of four stages; acquisition, analysis, management, and interaction. After acquiring 360° videos along streets in target areas, the analysis of videos is almost automatic. Frames of the video are localized on the map, intersections are detected, and videos are segmented. Turning views at intersections are synthesized. By connecting the video segments following the specified movement in an area, we can watch a walking view along a street. The interface allows for easy exploration of a target area. It can also show virtual billboards in the view.

  • Fogcached-Ros: DRAM/NVMM Hybrid KVS Server with ROS Based Extension for ROS Application and SLAM Evaluation

    Koki HIGASHI  Yoichi ISHIWATA  Takeshi OHKAWA  Midori SUGAYA  

     
    PAPER

      Pubricized:
    2021/08/18
      Vol:
    E104-D No:12
      Page(s):
    2097-2108

    Recently, edge servers located closer than the cloud have become expected for the purpose of processing the large amount of sensor data generated by IoT devices such as robots. Research has been proposed to improve responsiveness as a cache server by applying KVS (Key-Value Store) to the edge as a method for obtaining high responsiveness. Above all, a hybrid-KVS server that uses both DRAM and NVMM (Non-Volatile Main Memory) devices is expected to achieve both responsiveness and reliability. However, its effectiveness has not been verified in actual applications, and its effectiveness is not clear in terms of its relationship with the cloud. The purpose of this study is to evaluate the effectiveness of hybrid-KVS servers using the SLAM (Simultaneous Localization and Mapping), which is a widely used application in robots and autonomous driving. It is appropriate for applying an edge server and requires responsiveness and reliability. SLAM is generally implemented on ROS (Robot Operating System) middleware and communicates with the server through ROS middleware. However, if we use hybrid-KVS on the edge with the SLAM and ROS, the communication could not be achieved since the message objects are different from the format expected by KVS. Therefore, in this research, we propose a mechanism to apply the ROS memory object to hybrid-KVS by designing and implementing the data serialization function to extend ROS. As a result of the proposed fogcached-ros and evaluation, we confirm the effectiveness of low API overhead, support for data used by SLAM, and low latency difference between the edge and cloud.

  • Neural Incremental Speech Recognition Toward Real-Time Machine Speech Translation

    Sashi NOVITASARI  Sakriani SAKTI  Satoshi NAKAMURA  

     
    PAPER-Speech and Hearing

      Pubricized:
    2021/08/27
      Vol:
    E104-D No:12
      Page(s):
    2195-2208

    Real-time machine speech translation systems mimic human interpreters and translate incoming speech from a source language to the target language in real-time. Such systems can be achieved by performing low-latency processing in ASR (automatic speech recognition) module before passing the output to MT (machine translation) and TTS (text-to-speech synthesis) modules. Although several studies recently proposed sequence mechanisms for neural incremental ASR (ISR), these frameworks have a more complicated training mechanism than the standard attention-based ASR because they have to decide the incremental step and learn the alignment between speech and text. In this paper, we propose attention-transfer ISR (AT-ISR) that learns the knowledge from attention-based non-incremental ASR for a low delay end-to-end speech recognition. ISR comes with a trade-off between delay and performance, so we investigate how to reduce AT-ISR delay without a significant performance drop. Our experiment shows that AT-ISR achieves a comparable performance to the non-incremental ASR when the incremental recognition begins after the speech utterance reaches 25% of the complete utterance length. Additional experiments to investigate the effect of ISR on translation tasks are also performed. The focus is to find the optimum granularity of the output unit. The results reveal that our end-to-end subword-level ISR resulted in the best translation quality with the lowest WER and the lowest uncovered-word rate.

  • Document-Level Neural Machine Translation with Associated Memory Network

    Shu JIANG  Rui WANG  Zuchao LI  Masao UTIYAMA  Kehai CHEN  Eiichiro SUMITA  Hai ZHAO  Bao-liang LU  

     
    PAPER-Natural Language Processing

      Pubricized:
    2021/06/24
      Vol:
    E104-D No:10
      Page(s):
    1712-1723

    Standard neural machine translation (NMT) is on the assumption that the document-level context is independent. Most existing document-level NMT approaches are satisfied with a smattering sense of global document-level information, while this work focuses on exploiting detailed document-level context in terms of a memory network. The capacity of the memory network that detecting the most relevant part of the current sentence from memory renders a natural solution to model the rich document-level context. In this work, the proposed document-aware memory network is implemented to enhance the Transformer NMT baseline. Experiments on several tasks show that the proposed method significantly improves the NMT performance over strong Transformer baselines and other related studies.

  • An FPGA Acceleration and Optimization Techniques for 2D LiDAR SLAM Algorithm

    Keisuke SUGIURA  Hiroki MATSUTANI  

     
    PAPER-Computer System

      Pubricized:
    2021/03/04
      Vol:
    E104-D No:6
      Page(s):
    789-800

    An efficient hardware implementation for Simultaneous Localization and Mapping (SLAM) methods is of necessity for mobile autonomous robots with limited computational resources. In this paper, we propose a resource-efficient FPGA implementation for accelerating scan matching computations, which typically cause a major bottleneck in 2D LiDAR SLAM methods. Scan matching is a process of correcting a robot pose by aligning the latest LiDAR measurements with an occupancy grid map, which encodes the information about the surrounding environment. We exploit an inherent parallelism in the Rao-Blackwellized Particle Filter (RBPF) based algorithm to perform scan matching computations for multiple particles in parallel. In the proposed design, several techniques are employed to reduce the resource utilization and to achieve the maximum throughput. Experimental results using the benchmark datasets show that the scan matching is accelerated by 5.31-8.75× and the overall throughput is improved by 3.72-5.10× without seriously degrading the quality of the final outputs. Furthermore, our proposed IP core requires only 44% of the total resources available in the TUL Pynq-Z2 FPGA board, thus facilitating the realization of SLAM applications on indoor mobile robots.

  • Rapid Recovery by Maximizing Page-Mapping Logs Deactivation

    Jung-Hoon KIM  

     
    LETTER-Software System

      Pubricized:
    2021/02/25
      Vol:
    E104-D No:6
      Page(s):
    885-889

    As NAND flash-based storage has been settled, a flash translation layer (FTL) has been in charge of mapping data addresses on NAND flash memory. Many FTLs implemented various mapping schemes, but the amount of mapping data depends on the mapping level. However, the FTL should contemplate mapping consistency irrespective of how much mapping data dwell in the storage. Furthermore, the recovery cost by the inconsistency needs to be considered for a faster storage reboot time. This letter proposes a novel method that enhances the consistency for a page-mapping level FTL running a legacy logging policy. Moreover, the recovery cost of page mappings also decreases. The novel method is to adopt a virtually-shrunk segment and deactivate page-mapping logs by assembling and storing the segments. This segment scheme already gave embedded NAND flash-based storage enhance its response time in our previous study. In addition to that improved result, this novel plan maximizes the page-mapping consistency, therefore improves the recovery cost compared with the legacy page-mapping FTL.

1-20hit(197hit)