Keyword Search Result

[Keyword] OMP (3924 hits)

Results 241-260 of 3924

  • A Statistical Trust for Detecting Malicious Nodes in IoT Sensor Networks

    Fang WANG  Zhe WEI  

     
    LETTER-Mobile Information Network and Personal Communications

    Publicized: 2021/02/19
    Vol: E104-A No:8
    Page(s): 1084-1087

    Unattended malicious nodes pose serious security threats to the integrity of IoT sensor networks. However, preventive measures such as cryptography and authentication are difficult to deploy on resource-constrained IoT sensor nodes with low processing capability and limited power supply. To tackle these malicious sensor nodes, this study applies trust computing to IoT sensor networks as a lightweight security mechanism: based on the theory of Chebyshev polynomial approximation of time series, the trust data sequence generated by each sensor node is linearized and treated as a time series for malicious node detection. The proposed method is evaluated against existing schemes in several simulations, and the results demonstrate that it deals with malicious nodes more effectively, resulting in a higher correct packet delivery rate.
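
    As a rough illustration of treating trust values as a time series under a Chebyshev-polynomial approximation, the following Python sketch fits a low-degree Chebyshev polynomial to a node's trust sequence and flags nodes whose sequence deviates strongly from the smooth fit; the degree, threshold, and detection rule here are illustrative assumptions, not the paper's exact method.

        # Hypothetical sketch: approximate a node's trust time series with a
        # low-degree Chebyshev polynomial and flag nodes whose series deviates
        # strongly from the smooth fit. Degree and threshold are assumptions.
        import numpy as np
        from numpy.polynomial import chebyshev as C

        def is_suspicious(trust_series, degree=3, threshold=0.15):
            t = np.linspace(-1.0, 1.0, len(trust_series))   # Chebyshev domain
            coeffs = C.chebfit(t, trust_series, degree)     # least-squares fit
            residual = trust_series - C.chebval(t, coeffs)  # deviation from fit
            return np.sqrt(np.mean(residual ** 2)) > threshold

        # A well-behaved node drifts smoothly; an on-off attacker oscillates.
        steady = 0.8 + 0.01 * np.random.randn(50)
        onoff = np.where(np.arange(50) % 10 < 5, 0.9, 0.2)
        print(is_suspicious(steady), is_suspicious(onoff))  # False True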

  • Classification Functions for Handwritten Digit Recognition

    Tsutomu SASAO  Yuto HORIKAWA  Yukihiro IGUCHI  

     
    PAPER-Logic Design

    Publicized: 2021/04/01
    Vol: E104-D No:8
    Page(s): 1076-1082

    A classification function maps a set of vectors into several classes. A machine learning problem can be treated as a design problem for partially defined classification functions. To realize classification functions for MNIST handwritten digits, three different architectures are considered: single-unit realization, 45-unit realization, and 45-unit ×r realization. The 45-unit realization consists of 45 ternary classifiers, 10 counters, and a max selector. The test accuracies of these architectures are compared using the MNIST data set.
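
    The 45-unit architecture is easy to picture in code: one classifier per unordered digit pair votes, ten counters tally the votes, and a max selector outputs the winning class. A minimal Python sketch follows; pairwise_predict stands in for a hypothetical trained unit, and for brevity it simply picks one of its two digits (the abstract's ternary units may also abstain).

        # Minimal sketch of the 45-unit idea: one classifier per digit pair
        # casts a vote, 10 counters tally the votes, and a max selector picks
        # the class. `pairwise_predict` is a hypothetical trained unit.
        from itertools import combinations

        def classify(x, pairwise_predict):
            counters = [0] * 10
            for i, j in combinations(range(10), 2):   # the 45 digit pairs
                winner = pairwise_predict(i, j, x)    # returns i or j
                counters[winner] += 1
            return max(range(10), key=lambda d: counters[d])  # max selector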

  • Remote Dynamic Reconfiguration of a Multi-FPGA System FiC (Flow-in-Cloud)

    Kazuei HIRONAKA  Kensuke IIZUKA  Miho YAMAKURA  Akram BEN AHMED  Hideharu AMANO  

     
    PAPER-Computer System

    Publicized: 2021/05/12
    Vol: E104-D No:8
    Page(s): 1321-1331

    Multi-FPGA systems have been receiving much attention as low-cost, energy-efficient platforms for Multi-access Edge Computing (MEC). For this purpose, a bare-metal multi-FPGA system called FiC (Flow-in-Cloud) is under development. In this paper, we introduce the FiC multi-FPGA cluster, which applies a partial reconfiguration (PR) FPGA design flow to support online replacement of user-defined accelerators while the FPGA interconnection network keeps running, together with its low-level multi-FPGA management software, the remote PR manager. With the remote PR manager, the user can define the FiC FPGA cluster setup in JSON and control the cluster from a user application through the cooperation of ficmgr, a simple cluster management tool/library on the client host, and ficwww, a REST API service provider running on the Raspberry Pi 3 (RPi3) of each node. Evaluation on a prototype FiC FPGA cluster with 12 nodes shows that, with online application replacement by PR and on-the-fly FPGA bitstream compression, the FPGA bitstream distribution time was reduced to 1/17 and the total cluster setup time was reduced by 21∼57% compared to cluster setup with full-configuration FPGA bitstreams.
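
    As a purely hypothetical sketch of driving per-node REST services in the style of ficwww (the JSON layout, port, and endpoint path below are invented for illustration and are not the actual ficmgr/ficwww API), a client-side loop in Python might look like this:

        # Hypothetical example only: the JSON schema, port, and endpoint
        # below are invented for illustration, not the real ficwww API.
        import requests

        cluster = {
            "nodes": [
                {"host": "fic00", "bitstream": "accel_a.bin"},
                {"host": "fic01", "bitstream": "accel_b.bin"},
            ]
        }

        for node in cluster["nodes"]:
            # POST a partial bitstream to the RPi3 front end of each node,
            # which applies the partial reconfiguration to its FPGA.
            with open(node["bitstream"], "rb") as f:
                r = requests.post("http://%s:8080/configure" % node["host"],
                                  data=f.read())
            r.raise_for_status()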

  • Extended-Domain Golomb Code and Symmetry of Relative Redundancy

    Ryosuke SUGIURA  Yutaka KAMAMOTO  Takehiro MORIYA  

     
    PAPER-Coding Theory

    Publicized: 2021/02/08
    Vol: E104-A No:8
    Page(s): 1033-1042

    This paper presents the extended-domain Golomb (XDG) code, an extension of the Golomb code for sparse geometric sources as well as a generalization of the extended-domain Golomb-Rice (XDGR) code, based on the idea of almost instantaneous fixed-to-variable length (AIFV) codes. Showing that XDGR encoding can be interpreted as an extended use of the code proposed in previous works, this paper establishes two facts: the proposed XDG code can be constructed as an AIFV code that relates to the Golomb code as the XDGR code does to the Rice code; and the XDG and Golomb codes are symmetric in the sense of relative redundancy. The proposed XDG code can efficiently and losslessly compress geometric sources that are too sparse for the conventional Golomb and Rice codes. Owing to the symmetry, its relative redundancy is guaranteed to be as low as that of the Golomb code applied to non-sparse geometric sources. Thanks to this fact, the parameter of the proposed XDG code, which is more finely tunable than that of the conventional XDGR code, can be optimized for given inputs using conventional techniques. Therefore, it is expected to be useful for many coding applications that deal with geometric sources at low bit rates.
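
    For context, the following Python sketch is a textbook Golomb encoder with parameter m (unary quotient plus truncated-binary remainder), i.e. the code that XDG extends; it is not the XDG construction itself.

        # Reference Golomb encoder for a nonnegative integer n with
        # parameter m: unary quotient followed by a truncated-binary
        # remainder. The Rice code is the special case where m = 2^k.
        def golomb_encode(n, m):
            q, r = divmod(n, m)
            bits = "1" * q + "0"              # unary part for the quotient
            b = (m - 1).bit_length()          # ceil(log2(m))
            cutoff = (1 << b) - m             # truncated-binary split point
            if m > 1:
                if r < cutoff:
                    bits += format(r, "0{}b".format(b - 1))
                else:
                    bits += format(r + cutoff, "0{}b".format(b))
            return bits

        print(golomb_encode(9, 4))   # '11001' (Rice case, m = 4)
        print(golomb_encode(7, 3))   # '11010' (truncated binary, m = 3)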

  • Video Inpainting by Frame Alignment with Deformable Convolution

    Yusuke HARA  Xueting WANG  Toshihiko YAMASAKI  

     
    PAPER-Image Processing and Video Processing

    Publicized: 2021/04/22
    Vol: E104-D No:8
    Page(s): 1349-1358

    Video inpainting is the task of filling in missing regions of a video. In this task, it is important to use information from other frames efficiently and to generate plausible results with sufficient temporal consistency. In this paper, we present a video inpainting method that jointly uses affine transformation and deformable convolutions for frame alignment: the former is responsible for rough frame-scale alignment, and the latter performs fine pixel-level alignment. Our model depends neither on 3D convolutions, which limit the temporal window, nor on troublesome flow estimation. The proposed method achieves improved object-removal results and better PSNR and SSIM values than previous learning-based methods.

  • Energy Efficient Approximate Storing of Image Data for MTJ Based Non-Volatile Flip-Flops and MRAM

    Yoshinori ONO  Kimiyoshi USAMI  

     
    PAPER

    Publicized: 2021/01/06
    Vol: E104-C No:7
    Page(s): 338-349

    Non-volatile memory (NVM) employing MTJ has many strong points, such as good read/write performance, excellent endurance, and operating-voltage compatibility with standard CMOS. However, it consumes considerable energy when writing data, which becomes an obstacle to applying it to battery-operated mobile devices. To solve this problem, we propose an approach that augments the precision scaling technique for NVM write operations. Precision scaling is an approximate computing technique that reduces the bit width of data (i.e., the precision) to save energy. When image data are written to NVM with precision scaling, the write energy and the image quality change according to the write time and the targeted bit range. We propose an energy-efficient approximate storing scheme for non-volatile flip-flops and magnetic random-access memory (MRAM) that writes the data while optimizing the bit positions at which the data are split and the write time for each bit range. Using a statistical model, we obtained optimal values for the write time and the targeted bit range under the trade-off between write-energy reduction and image-quality degradation. Simulation results demonstrate that with these optimal values the write energy can be reduced by up to 50% while maintaining acceptable image quality. We also investigated in detail the relationship between the input images and the output image quality under this approach. In addition, we evaluated the energy benefits of applying our approach to nine types of image processing, including linear filters and edge detectors. The results show that the write energy is reduced by a further 12.5% at the maximum.
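
    A minimal Python sketch of the underlying precision-scaling intuition: the fewer low-order bits a pixel carries into the write, the less write energy is spent. The fixed truncation below is a simplification; the paper instead tunes the write time and the split position per bit range.

        # Minimal sketch of the precision-scaling idea: drop the low bits of
        # each 8-bit pixel before the (energy-hungry) NVM write. The split
        # position is the tunable knob the paper optimizes; 4 is illustrative.
        import numpy as np

        def approx_store(pixels, kept_bits=4):
            mask = 0xFF & ~((1 << (8 - kept_bits)) - 1)  # keep the upper bits
            return pixels & mask                         # low bits not written

        img = np.array([200, 137, 31, 64], dtype=np.uint8)
        print(approx_store(img))   # [192 128  16  64] -- fewer bit writes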

  • Multi-View Texture Learning for Face Super-Resolution

    Yu WANG  Tao LU  Feng YAO  Yuntao WU  Yanduo ZHANG  

     
    PAPER-Image Recognition, Computer Vision

    Publicized: 2021/03/24
    Vol: E104-D No:7
    Page(s): 1028-1038

    In recent years, single-image face super-resolution (SR) using deep neural networks has been well developed. However, most face images captured by cameras in real scenes show different views of the same person, and existing traditional multi-frame image SR requires alignment between images. Since multi-view face images contain texture information from different views that can serve as effective prior information, how to use this prior to reconstruct frontal face images is a challenging question. To solve this problem effectively, we propose a novel face SR network based on multi-view face images, which focuses on obtaining more texture information from multi-view face images to help the reconstruction of frontal face images. In this network, we also propose a texture attention mechanism that transfers high-precision texture compensation information to the frontal face image to obtain better visual effects. We conduct subjective and objective evaluations, and the experimental results show the great potential of multi-view face image SR. Comparison with other state-of-the-art deep learning SR methods proves that the proposed method has excellent performance.

  • Design Method of Variable-Latency Circuit with Tunable Approximate Completion-Detection Mechanism

    Yuta UKON  Shimpei SATO  Atsushi TAKAHASHI  

     
    PAPER

    Publicized: 2020/12/21
    Vol: E104-C No:7
    Page(s): 309-318

    Advanced information-processing services such as computer vision require high-performance digital circuits that perform high-load processing at high speed. To achieve high-speed processing, several image-processing applications use approximate computing to reduce the idle time of the circuit. However, it is difficult to design a high-speed image-processing circuit while controlling the error rate so as not to degrade service quality, so this technique has been used in only a few applications. In this paper, we propose a method that achieves high-speed processing effectively by changing the processing time for each task based on a rough detection of its completion. Using this method, a high-speed processing circuit with a low error rate can be designed. The error rate is controllable, and a circuit design method that minimizes it is also presented in this paper. To confirm the effectiveness of our proposal, a ripple-carry adder (RCA), a 2-dimensional discrete cosine transform (2D-DCT) circuit, and a histogram of oriented gradients (HOG) feature calculation circuit are evaluated. The effective clock periods of these circuits obtained by our method with an error rate of around 1% improve by about 64%, 6%, and 12%, respectively, compared with error-free circuits. Furthermore, the impact of miscalculation on a video monitoring service using an object detection application is investigated. As a result, more than 99% of the required detection points are detected, confirming that the miscalculation hardly degrades service quality.

  • Exploring the Outer Boundary of a Simple Polygon

    Qi WEI  Xiaolin YAO  Luan LIU  Yan ZHANG  

     
    PAPER-Fundamentals of Information Systems

    Publicized: 2021/04/02
    Vol: E104-D No:7
    Page(s): 923-930

    We investigate an online problem of a robot exploring the outer boundary of an unknown simple polygon P. The robot starts from a specified vertex s and walks an exploration tour outside P. It has to see all points of the polygon's outer boundary and to return to the start. We provide lower and upper bounds on the ratio of the distance traveled by the robot in comparison to the length of the shortest path. We consider P in two scenarios: convex polygon and concave polygon. For the first scenario, we prove a lower bound of 5 and propose a 23.78-competitive strategy. For the second scenario, we prove a lower bound of 5.03 and propose a 26.5-competitive strategy.

  • Video Magnification under the Presence of Complex Background Motions

    Long ZHANG  Xuezhi YANG  

     
    LETTER-Computer Graphics

    Publicized: 2021/03/15
    Vol: E104-D No:6
    Page(s): 909-914

    We propose a video magnification method for magnifying subtle color and motion changes in the presence of non-meaningful background motions. We use frequency variability to design a filter that passes only meaningful subtle changes and removes non-meaningful ones; our method obtains more impressive, artifact-free magnification results than the compared methods.

  • Video Smoke Removal from a Single Image Sequence Open Access

    Shiori YAMAGUCHI  Keita HIRAI  Takahiko HORIUCHI  

     
    PAPER

    Publicized: 2021/01/07
    Vol: E104-A No:6
    Page(s): 876-886

    In this study, we present a novel method for removing smoke from videos based on a single image sequence. Smoke is a significant artifact in images and videos because it can reduce visibility in disaster scenes. Our method involves two main processes: (1) developing a smoke imaging model and (2) removing smoke using spatio-temporal pixel compensation. First, we model the optical phenomena in natural scenes that include smoke; we call this the smoke imaging model, and we develop it by extending conventional haze imaging models. We then remove the smoke from a video frame by frame based on this model. Next, we refine the appearance of the smoke-free video by spatio-temporal pixel compensation, aligning the smoke-free frames using corresponding pixels. To obtain the corresponding pixels, we use SIFT and color features with distance constraints. Finally, to obtain a clear video, we refine the pixel values based on spatio-temporal weightings of the corresponding pixels in the smoke-free frames. We used simulated and actual smoke videos in our validation experiments. The experimental results demonstrate that our method achieves effective smoke removal in dynamic scenes. We also quantitatively assessed our method using a temporal coherence measure.
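
    For reference, a conventional haze-style imaging model of the kind the paper extends is I = J·t + A·(1 - t), where t is the transmission and A the airlight; given estimates of t and A (the estimation itself is the hard part), a scene-radiance frame is recovered per pixel. A minimal Python sketch with illustrative values:

        # Hedged sketch: invert the classic haze model I = J*t + A*(1 - t),
        # which the paper's smoke imaging model extends. Estimating t and A
        # is the difficult step and is assumed done here.
        import numpy as np

        def remove_smoke(I, t, A, t_min=0.1):
            t = np.maximum(t, t_min)          # clamp to avoid blow-up
            return (I - A * (1.0 - t)) / t    # recovered smoke-free frame

        I = np.array([[0.7, 0.5], [0.9, 0.4]])   # observed smoky frame
        t = np.array([[0.6, 0.8], [0.5, 0.9]])   # estimated transmission
        print(remove_smoke(I, t, A=1.0))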

  • Low-Complexity Training for Binary Convolutional Neural Networks Based on Clipping-Aware Weight Update

    Changho RYU  Tae-Hwan KIM  

     
    LETTER-Biocybernetics, Neurocomputing

    Publicized: 2021/03/17
    Vol: E104-D No:6
    Page(s): 919-922

    This letter presents an efficient technique for reducing the computational complexity involved in training binary convolutional neural networks (BCNN). Conventionally, BCNN training focuses on optimizing the sign of each weight element rather than its exact value; once an element has been updated to a magnitude large enough to be clipped, its sign is unlikely to be flipped again. The proposed technique does not update elements that have been clipped out, thereby eliminating the computations involved in their optimization. The complexity reduction achieved by the proposed technique is as high as 25.52% in training a BCNN model for the CIFAR-10 classification task, while the accuracy is maintained without severe degradation.
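
    A numpy sketch of the clipping-aware idea: latent weights whose magnitude has reached the clip boundary are treated as frozen, so their updates (and, in the real technique, the gradient computations behind them) are skipped. Shapes, learning rate, and clip value are illustrative assumptions.

        # Sketch: update only latent weights that are not yet clipped out;
        # clipped ones are frozen, which is where the savings come from.
        import numpy as np

        def clipping_aware_update(w_latent, grad, lr=0.01, clip=1.0):
            active = np.abs(w_latent) < clip        # not yet clipped out
            w_latent[active] -= lr * grad[active]   # update active ones only
            np.clip(w_latent, -clip, clip, out=w_latent)
            return active.mean()                    # fraction still updated

        w = np.random.uniform(-1.2, 1.2, 1000)      # latent real weights
        g = np.random.randn(1000)
        print("updated fraction:", clipping_aware_update(w, g))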

  • Preliminary Performance Analysis of Distributed DNN Training with Relaxed Synchronization

    Koichi SHIRAHATA  Amir HADERBACHE  Naoto FUKUMOTO  Kohta NAKASHIMA  

     
    BRIEF PAPER

    Publicized: 2020/12/01
    Vol: E104-C No:6
    Page(s): 257-260

    Scalability of distributed DNN training can be limited by slowdown of specific processes due to unexpected hardware failures. We propose a dynamic process exclusion technique so that training throughput is maximized. Our evaluation using 32 processes with ResNet-50 shows that our proposed technique reduces slowdown by 12.5% to 50% without accuracy loss through excluding the slow processes.

  • Action Recognition Using Pose Data in a Distributed Environment over the Edge and Cloud

    Chikako TAKASAKI  Atsuko TAKEFUSA  Hidemoto NAKADA  Masato OGUCHI  

     
    PAPER

    Publicized: 2021/02/02
    Vol: E104-D No:5
    Page(s): 539-550

    With the development of cameras and sensors and the spread of cloud computing, life logs can easily be acquired and stored in ordinary households for the various services that utilize them. However, it is difficult to analyze moving images acquired by home sensors in real time using machine learning, because the data size is too large and the computational complexity is too high. Moreover, collecting and accumulating in the cloud moving images that are captured at home and can be used to identify individuals may invade the privacy of application users. We propose a distributed processing method over the edge and cloud that addresses both the processing latency and the privacy concerns. On the edge (sensor) side, we extract feature vectors of human key points from the moving images using OpenPose, a pose estimation library. On the cloud side, we recognize actions by machine learning using only those feature vectors. In this study, we compare the action recognition accuracies of multiple machine learning methods. In addition, we measure the analysis processing time at the sensor and in the cloud to investigate the feasibility of recognizing actions in real time. We then evaluate the proposed system by comparing it with a 3D ResNet model in recognition experiments. The experimental results demonstrate that the action recognition accuracy is highest with LSTM and that introducing dropout in action recognition over 100 categories alleviates overfitting, because the models can learn more generic human actions from a greater variety of actions. It is also demonstrated that preprocessing with OpenPose on the sensor side can substantially reduce the transfer volume from the sensor to the cloud.
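
    A hedged sketch of the cloud-side recognizer: per-frame OpenPose keypoints (assumed here to be 18 joints × 2 coordinates = 36 values; the paper's exact feature layout may differ) are fed as a sequence to an LSTM with dropout, matching the abstract's best-performing configuration. PyTorch sketch:

        # Hedged sketch: sequences of pose-keypoint feature vectors are
        # classified by an LSTM; the final hidden state feeds a linear head.
        # Feature size, hidden size, and dropout rate are assumptions.
        import torch
        import torch.nn as nn

        class PoseLSTM(nn.Module):
            def __init__(self, feat_dim=36, hidden=128, n_classes=100):
                super().__init__()
                self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
                self.drop = nn.Dropout(0.5)   # dropout the abstract mentions
                self.head = nn.Linear(hidden, n_classes)

            def forward(self, x):             # x: (batch, frames, feat_dim)
                _, (h, _) = self.lstm(x)
                return self.head(self.drop(h[-1]))

        logits = PoseLSTM()(torch.randn(2, 30, 36))   # 2 clips, 30 frames
        print(logits.shape)                           # torch.Size([2, 100])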

  • A Low-Complexity QR Decomposition with Novel Modified RVD for MIMO Systems

    Lu SUN  Bin WU  Tianchun YE  

     
    LETTER-Digital Signal Processing

    Publicized: 2020/11/02
    Vol: E104-A No:5
    Page(s): 814-817

    In this letter, a two-stage QR decomposition scheme based on Givens rotation with a novel modified real-value decomposition (RVD) is presented. With the modified RVD applied to the result of the complex Givens rotations in the first stage, the number of non-zero terms that must be eliminated by real Givens rotations in the second stage decreases greatly, and the computational complexity is thereby reduced significantly compared to the scheme with the conventional RVD. Moreover, the proposed scheme is well suited to hardware implementations of QR decomposition. Evaluation shows that the proposed QR decomposition scheme is superior to related works in terms of computational complexity.
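
    For context, the following numpy sketch is a textbook complex Givens-rotation QR decomposition, the primitive the two-stage scheme builds on; the paper's modified-RVD hardware formulation is not reproduced here.

        # Textbook complex Givens-rotation QR: each 2x2 unitary rotation
        # zeroes one subdiagonal entry; Q accumulates the inverse rotations.
        import numpy as np

        def givens_qr(A):
            A = A.astype(complex)
            m, n = A.shape
            Q = np.eye(m, dtype=complex)
            for j in range(n):
                for i in range(m - 1, j, -1):   # zero entries below A[j, j]
                    a, b = A[i - 1, j], A[i, j]
                    r = np.hypot(abs(a), abs(b))
                    if r == 0.0:
                        continue
                    G = np.array([[a.conjugate(), b.conjugate()],
                                  [-b, a]]) / r  # unitary 2x2 rotation
                    A[[i - 1, i], :] = G @ A[[i - 1, i], :]
                    Q[:, [i - 1, i]] = Q[:, [i - 1, i]] @ G.conj().T
            return Q, A                          # A is now upper triangular

        A = np.random.randn(4, 4) + 1j * np.random.randn(4, 4)
        Q, R = givens_qr(A)
        print(np.allclose(Q @ R, A), np.allclose(np.tril(R, -1), 0))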

  • Parallel Peak Cancellation Signal-Based PAPR Reduction Method Using Null Space in MIMO Channel for MIMO-OFDM Transmission Open Access

    Taku SUZUKI  Mikihito SUZUKI  Kenichi HIGUCHI  

     
    PAPER-Wireless Communication Technologies

    Publicized: 2020/11/20
    Vol: E104-B No:5
    Page(s): 539-549

    This paper proposes a parallel peak cancellation (PC) process for the computationally efficient algorithm called PC with a channel-null constraint (PCCNC), an adaptive peak-to-average power ratio (PAPR) reduction method that uses the null space of a multiple-input multiple-output (MIMO) channel for MIMO-orthogonal frequency division multiplexing (OFDM) signals. By simultaneously adding multiple PC signals to the time-domain transmission signal vector, the required number of iterations of the iterative algorithm is effectively reduced along with the PAPR. We impose a constraint whereby the PC signal is transmitted only into the null space of the MIMO channel by beamforming (BF), so that the data streams do not experience interference from the PC signal at the receiver. Since the fast Fourier transform (FFT) and inverse FFT (IFFT) operations at each iteration are not required, unlike in the previous algorithm, and thanks to the newly introduced parallel processing approach, the enhanced PCCNC algorithm reduces the total computational complexity and the number of iterations compared to previous algorithms while achieving the same throughput-vs.-PAPR performance.
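
    A numpy sketch of the channel-null constraint: projecting the PC signal onto an orthonormal basis of the null space of the MIMO channel H guarantees the receiver sees none of it. The antenna dimensions are illustrative.

        # Sketch: beamform the peak-cancellation signal into the null space
        # of the MIMO channel H so it vanishes at the receiver.
        import numpy as np

        ntx, nrx = 8, 4          # more transmit than receive antennas
        H = (np.random.randn(nrx, ntx)
             + 1j * np.random.randn(nrx, ntx)) / 2**0.5

        _, _, Vh = np.linalg.svd(H)
        N = Vh[nrx:].conj().T    # ntx x (ntx - nrx) null-space basis

        pc = np.random.randn(ntx) + 1j * np.random.randn(ntx)  # raw PC signal
        pc_null = N @ (N.conj().T @ pc)                        # projected PC

        print(np.linalg.norm(H @ pc_null))   # ~0: no interference at receiver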

  • A Fast Chroma Intra-Prediction Mode Decision Algorithm Based on Texture Characteristics for VVC

    Zhi LIU  Yifan SU  Shuzhong YANG  Mengmeng ZHANG  

     
    LETTER-Image Processing and Video Processing

    Publicized: 2021/02/05
    Vol: E104-D No:5
    Page(s): 781-784

    Cross-component linear model (CCLM) chroma prediction is a new technique introduced in Versatile Video Coding (VVC); it utilizes the reconstructed luma component to predict the chroma components and can improve coding performance, but it also increases coding complexity. In this paper, how to accelerate the chroma intra-prediction process is studied based on texture characteristics. First, two observations were made through experimental statistics: the choice of chroma intra-prediction candidate modes is closely related to the texture complexity of the coding unit (CU), and whether the direct mode (DM) is selected is closely related to the texture similarity between the current chroma CU and the corresponding luma CU. Second, a fast chroma intra-prediction mode decision algorithm is proposed based on these observations. A modified metric named sum modulus difference (SMD) is introduced to measure the texture complexity of a CU and guide the filtering out of irrelevant candidate modes. Meanwhile, the structural similarity index measure (SSIM) is adopted to help judge the selection of the DM mode. The experimental results show that, compared with the reference model VTM8.0, the proposed algorithm reduces the coding time by 12.92% on average while increasing the BD-rate of the Y, U, and V components by only 0.05%, 0.32%, and 0.29%, respectively.
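
    A hedged Python sketch of a sum-modulus-difference style texture measure (the paper's exact SMD definition may differ): texture complexity is taken as the sum of absolute differences between adjacent samples of the CU, so flat blocks score low and textured blocks score high.

        # Hedged sketch of an SMD-style texture-complexity measure: sum of
        # absolute differences between horizontally and vertically adjacent
        # samples. The paper's exact formulation may differ.
        import numpy as np

        def smd(cu):
            cu = cu.astype(np.int32)
            dh = np.abs(np.diff(cu, axis=1)).sum()   # horizontal gradients
            dv = np.abs(np.diff(cu, axis=0)).sum()   # vertical gradients
            return dh + dv

        flat = np.full((8, 8), 128)
        textured = np.random.randint(0, 256, (8, 8))
        print(smd(flat), smd(textured))   # low value vs. high value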

  • A Feasibility Study of Multi-Domain Stochastic Computing Circuit Open Access

    Tati ERLINA  Renyuan ZHANG  Yasuhiko NAKASHIMA  

     
    PAPER-Integrated Electronics

    Publicized: 2020/10/29
    Vol: E104-C No:5
    Page(s): 153-163

    An efficient approximate computing circuit is developed for polynomial functions through a hybrid of the analog and stochastic domains. Unlike ordinary time-based stochastic computing (TBSC), the proposed circuit exploits not only the duty cycle of pulses but also the pulse strength of the analog current to carry information for multiplications. The accumulation of many multiplications is performed by merely collecting the stochastic current. As the calculation depth increases, the growth of latency (during summations), signal-power weakening, and disparity of output signals (during multiplications) are substantially avoided, in contrast to conventional TBSC. Furthermore, the calculation range theoretically extends to bipolar infinity without scaling. The proposed multi-domain stochastic computing (MDSC) is designed and simulated in a 0.18 µm CMOS technology, employing a set of current mirrors and an improved TBSC circuit scheme based on the neuron-MOS mechanism. As a proof of concept, multiply-and-accumulate calculations (MACs) are implemented, achieving an average accuracy of 95.3%. More importantly, the transistor count, power consumption, and latency decrease to 6.1%, 55.4%, and 4.2% of those of the state-of-the-art TBSC circuit, respectively. The robustness against temperature and process variations is also investigated and presented in detail.
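
    A toy numeric model of the multi-domain idea, with invented numbers: one operand rides on a pulse's duty cycle, the other on the analog current amplitude, so each product is simply "current flowing for a fraction of the period", and accumulation is the collection of the resulting charge.

        # Toy model: product = duty_cycle * period * current_amplitude
        # (charge delivered per period); a MAC just sums the charges.
        def mdsc_mac(pairs, period=1.0):
            # pairs: (duty cycle in [0, 1], signed current amplitude)
            return sum(duty * period * amp for duty, amp in pairs)

        print(mdsc_mac([(0.5, 0.8), (0.25, -0.4)]))  # 0.4 - 0.1 -> 0.3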

  • Efficient Hardware Accelerator for Compressed Sparse Deep Neural Network

    Hao XIAO  Kaikai ZHAO  Guangzhu LIU  

     
    LETTER-Computer System

    Publicized: 2021/02/19
    Vol: E104-D No:5
    Page(s): 772-775

    This work presents a DNN accelerator architecture specifically designed for efficient inference on compressed, sparse DNN models. Leveraging data sparsity, a runtime processing scheme is proposed that deals with the encoded weights and activations directly in the compressed domain, without decompressing them. Furthermore, a new data flow is proposed to facilitate the reuse of input activations across the fully-connected (FC) layers. The proposed design is implemented and verified on a Xilinx Virtex-7 FPGA. Experimental results show that, running AlexNet, it is 1.99× and 1.95× faster, and 20.38× and 3.04× more energy efficient, than CPU and mGPU platforms, respectively.
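
    An illustrative software analogue of the runtime scheme (the hardware uses its own encoding; CSR is used here only as a familiar stand-in): a fully-connected layer is evaluated directly on compressed weights, never materializing the dense matrix.

        # Illustrative analogue: evaluate an FC layer straight from a
        # CSR-compressed weight matrix, skipping all zero weights.
        import numpy as np

        def fc_layer_csr(values, col_idx, row_ptr, x):
            y = np.zeros(len(row_ptr) - 1, dtype=x.dtype)
            for r in range(len(y)):                       # one output neuron
                lo, hi = row_ptr[r], row_ptr[r + 1]
                y[r] = values[lo:hi] @ x[col_idx[lo:hi]]  # nonzeros only
            return y

        # 2x4 weight matrix [[0, 3, 0, 1], [2, 0, 0, 0]] in CSR form.
        vals = np.array([3.0, 1.0, 2.0])
        cols = np.array([1, 3, 0])
        ptr = [0, 2, 3]
        print(fc_layer_csr(vals, cols, ptr, np.ones(4)))  # [4. 2.]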

  • Approximate Simultaneous Diagonalization of Matrices via Structured Low-Rank Approximation

    Riku AKEMA  Masao YAMAGISHI  Isao YAMADA  

     
    PAPER-Digital Signal Processing

    Publicized: 2020/10/15
    Vol: E104-A No:4
    Page(s): 680-690

    Approximate Simultaneous Diagonalization (ASD) is the problem of finding a common similarity transformation that approximately diagonalizes a given tuple of square matrices. Many data science problems have been reduced to ASD through ingenious modelling. For ASD, the so-called Jacobi-like methods have been used extensively. However, these methods have no guarantee of suppressing the magnitude of the off-diagonal entries of the transformed tuple even if the given tuple has an exact common diagonalizer, i.e., even if it is simultaneously diagonalizable. In this paper, to establish an alternative powerful strategy for ASD, we present a novel two-step strategy called the Approximate-Then-Diagonalize-Simultaneously (ATDS) algorithm. The ATDS algorithm decomposes ASD into (Step 1) finding a simultaneously diagonalizable tuple near the given one, and (Step 2) finding a common similarity transformation that exactly diagonalizes the tuple obtained in Step 1. The proposed approach to Step 1 is realized by solving a Structured Low-Rank Approximation (SLRA) with Cadzow's algorithm. In Step 2, by exploiting the idea in the constructive proof of the conditions for exact simultaneous diagonalizability, we obtain an exact common diagonalizer of the tuple from Step 1 as a solution to the original ASD. Unlike the Jacobi-like methods, the ATDS algorithm is guaranteed to find an exact common diagonalizer if the given tuple happens to be simultaneously diagonalizable. Numerical experiments show that the ATDS algorithm achieves better performance than the Jacobi-like methods.
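
    A numpy sketch of the Step 2 idea: when a tuple is exactly simultaneously diagonalizable, the eigenvectors of a generic random linear combination diagonalize every member (generic weights avoid eigenvalue collisions with probability one). This illustrates the principle, not the paper's constructive procedure.

        # Sketch: recover a common diagonalizer of a simultaneously
        # diagonalizable tuple from a generic random linear combination.
        import numpy as np

        def common_diagonalizer(mats, rng=np.random.default_rng(0)):
            w = rng.standard_normal(len(mats))
            mix = sum(wi * Ai for wi, Ai in zip(w, mats))
            _, P = np.linalg.eig(mix)    # common similarity transformation
            return P

        # Build a simultaneously diagonalizable pair sharing eigenvectors P0.
        P0 = np.random.randn(4, 4)
        A = P0 @ np.diag([1.0, 2.0, 3.0, 4.0]) @ np.linalg.inv(P0)
        B = P0 @ np.diag([5.0, 6.0, 7.0, 8.0]) @ np.linalg.inv(P0)

        P = common_diagonalizer((A, B))
        Pinv = np.linalg.inv(P)
        off = lambda M: np.abs(M - np.diag(np.diag(M))).max()
        print(off(Pinv @ A @ P) < 1e-6, off(Pinv @ B @ P) < 1e-6)  # True True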
