
Keyword Search Result

[Keyword] MPU (1519 hits)

Showing 101-120 of 1519 hits

  • A Hybrid Retinex-Based Algorithm for UAV-Taken Image Enhancement

    Xinran LIU  Zhongju WANG  Long WANG  Chao HUANG  Xiong LUO  

     
    LETTER-Image Processing and Video Processing

    Publicized: 2021/08/05  Vol: E104-D No:11  Page(s): 2024-2027

    A hybrid Retinex-based image enhancement algorithm is proposed in this paper to improve the quality of images captured by unmanned aerial vehicles (UAVs). Hyperparameters of the employed multi-scale Retinex with chromaticity preservation (MSRCP) model are tuned automatically via a two-phase evolutionary computing algorithm. In the first phase, the Rao-2 algorithm performs a global search and obtains a solution by maximizing the objective function. In the second phase, the Nelder-Mead simplex method refines this solution via local search. Real UAV-taken images of poor quality are collected to verify the performance of the proposed algorithm. Four well-known image enhancement algorithms, Multi-Scale Retinex, Multi-Scale Retinex with Color Restoration, Automated Multi-Scale Retinex, and MSRCP, are used as benchmarks. In addition, two commonly used evolutionary computing algorithms, particle swarm optimization and the flower pollination algorithm, are considered to verify the efficiency of the proposed method in tuning the parameters of the MSRCP model. Experimental results demonstrate that the proposed method achieves the best performance among the benchmarks and is thus applicable to real UAV-based applications.
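    The two-phase tuning idea above can be illustrated with a short sketch. The population-based global search below stands in for the Rao-2 algorithm, the objective is a toy placeholder for the image-quality score of the MSRCP output, and all parameter names and bounds are hypothetical.

```python
# Minimal sketch of the two-phase tuning idea (not the paper's exact code):
# a generic population-based global search stands in for Rao-2, and SciPy's
# Nelder-Mead refines the best candidate. The objective is a placeholder
# for the image-quality score of the MSRCP-enhanced image.
import numpy as np
from scipy.optimize import minimize

def enhancement_score(params):
    """Hypothetical objective: higher is better (e.g., entropy/contrast of
    the MSRCP output). Replaced here by a toy function for illustration."""
    sigma1, sigma2, sigma3 = params
    return -((sigma1 - 15)**2 + (sigma2 - 80)**2 + (sigma3 - 250)**2)

def two_phase_tune(bounds, pop_size=20, iters=50, rng=np.random.default_rng(0)):
    lo, hi = np.array(bounds).T
    # Phase 1: simple population-based global search (stand-in for Rao-2).
    pop = rng.uniform(lo, hi, size=(pop_size, len(lo)))
    for _ in range(iters):
        scores = np.array([enhancement_score(p) for p in pop])
        best, worst = pop[scores.argmax()], pop[scores.argmin()]
        cand = np.clip(pop + rng.random(pop.shape) * (best - worst), lo, hi)
        better = np.array([enhancement_score(c) for c in cand]) > scores
        pop[better] = cand[better]
    x0 = pop[np.array([enhancement_score(p) for p in pop]).argmax()]
    # Phase 2: local refinement with the Nelder-Mead simplex method.
    res = minimize(lambda p: -enhancement_score(p), x0, method="Nelder-Mead")
    return res.x

print(two_phase_tune(bounds=[(1, 50), (30, 150), (100, 400)]))
```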

  • An Optimistic Synchronization Based Optimal Server Selection Scheme for Delay Sensitive Communication Services Open Access

    Akio KAWABATA  Bijoy Chand CHATTERJEE  Eiji OKI  

     
    PAPER-Network System

    Publicized: 2021/04/09  Vol: E104-B No:10  Page(s): 1277-1287

    In distributed processing for communication services, a proper server selection scheme is required to reduce delay while ensuring the event-occurrence order. Although a conservative synchronization algorithm (CSA) has been used to achieve this goal, an optimistic synchronization algorithm (OSA) is also feasible for synchronizing distributed systems. Unlike CSA, which reproduces events in occurrence order before processing applications, OSA can realize low-delay communication because events are processed as they arrive. This paper proposes an optimal server selection scheme that uses OSA for distributed processing systems to minimize the end-to-end delay under the condition that the maximum status holding time is limited. In other words, the end-to-end delay is minimized based on the allowed rollback time, which is given according to application design aspects and the availability of computing resources. Numerical results indicate that the proposed scheme reduces the delay compared to the conventional scheme.
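    As a rough illustration only (not the paper's formulation), the following sketch picks the server that minimizes the worst-case end-to-end delay while respecting a cap on the rollback (status holding) time; the delay values and the constraint form are hypothetical.

```python
# Illustrative sketch only: brute-force selection of the server that minimizes
# the worst-case end-to-end delay, subject to a cap on the rollback (status
# holding) time. Delay values and the constraint form are hypothetical.
def select_server(servers, users, delay, rollback_time, max_holding_time):
    """servers/users: lists of IDs; delay[u][s]: one-way delay from user u to
    server s; rollback_time[s]: rollback needed at s; max_holding_time: limit."""
    best_server, best_worst_delay = None, float("inf")
    for s in servers:
        if rollback_time[s] > max_holding_time:
            continue                                 # violates the holding-time limit
        worst = max(delay[u][s] for u in users)      # worst end-to-end delay
        if worst < best_worst_delay:
            best_server, best_worst_delay = s, worst
    return best_server, best_worst_delay

delay = {"u1": {"s1": 12, "s2": 20}, "u2": {"s1": 18, "s2": 9}}
print(select_server(["s1", "s2"], ["u1", "u2"], delay,
                    rollback_time={"s1": 5, "s2": 30}, max_holding_time=10))
```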

  • Counting Convex and Non-Convex 4-Holes in a Point Set

    Young-Hun SUNG  Sang Won BAE  

     
    PAPER-Algorithms and Data Structures

    Publicized: 2021/03/18  Vol: E104-A No:9  Page(s): 1094-1100

    In this paper, we present an algorithm that counts the number of empty quadrilaterals whose corners are chosen from a given set S of n points in general position. Our algorithm can separately count the number of convex or non-convex empty quadrilaterals in O(T) time, where T denotes the number of empty triangles in S. Note that T ranges from Ω(n^2) to O(n^3), and the expected value of T is known to be Θ(n^2) when the n points in S are chosen uniformly and independently at random from a convex and bounded body in the plane. We also show how to enumerate all convex and/or non-convex empty quadrilaterals in S in time proportional to the number of reported quadrilaterals, after O(T)-time preprocessing.
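    For readers unfamiliar with the quantity T, the naive O(n^4) reference below counts empty triangles in a small point set; it only illustrates the definition of T and is not the paper's O(T) 4-hole counting algorithm.

```python
# A naive reference for the quantity T in the abstract: the number of empty
# triangles in a planar point set (no point of S strictly inside).
from itertools import combinations

def cross(o, a, b):
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def point_in_triangle(p, a, b, c):
    d1, d2, d3 = cross(a, b, p), cross(b, c, p), cross(c, a, p)
    return (d1 > 0 and d2 > 0 and d3 > 0) or (d1 < 0 and d2 < 0 and d3 < 0)

def count_empty_triangles(S):
    count = 0
    for a, b, c in combinations(S, 3):
        if cross(a, b, c) == 0:
            continue                      # degenerate (collinear) triple
        if not any(point_in_triangle(p, a, b, c)
                   for p in S if p not in (a, b, c)):
            count += 1
    return count

print(count_empty_triangles([(0, 0), (4, 0), (0, 4), (1, 1), (3, 3)]))
```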

  • Character Design Generation System Using Multiple Users' Gaze Information

    Hiroshi TAKENOUCHI  Masataka TOKUMARU  

     
    PAPER-Human-computer Interaction

    Publicized: 2021/05/25  Vol: E104-D No:9  Page(s): 1459-1466

    We investigate an interactive evolutionary computation (IEC) system that uses multiple users' gaze information when users only partially participate in each design evaluation. Many previous IEC systems suffer from excessive user evaluation loads. To address this problem, we proposed employing users' gaze information to evaluate the designs generated by the IEC system: users simply view the presented designs without explicitly rating them, and the system automatically creates designs the users favor. Using the gaze information, the proposed system generates designs that satisfy many users. In our previous study, we verified the effectiveness of the proposed system from the viewpoint of real system operation. However, we did not consider changes in the set of users during the evaluation of solution candidates. In actual operation, the participating users may change during the process. Therefore, in this study, we verify the effectiveness of the proposed system when the users participating in each evaluation vary from generation to generation. In the experiment, we consider two situations assumed in real environments. The first situation changes the number of users evaluating the designs in each generation. The second situation draws different users from a predefined population to evaluate the designs in each generation. The experimental results for the first situation confirm that, despite the change in the number of users during the evaluation of solution candidates, the proposed system can generate designs that satisfy many users. The results for the second situation verify that the proposed system can also generate designs that are more satisfying to the users who participate in the design evaluation.

  • Hybrid Electrical/Optical Switch Architectures for Training Distributed Deep Learning in Large-Scale

    Thao-Nguyen TRUONG  Ryousei TAKANO  

     
    PAPER-Information Network

    Publicized: 2021/04/23  Vol: E104-D No:8  Page(s): 1332-1339

    Data parallelism is the dominant method used to train deep learning (DL) models on high-performance computing systems such as large-scale GPU clusters. When training a DL model on a large number of nodes, inter-node communication becomes a bottleneck due to its relatively higher latency and lower link bandwidth (compared to intra-node communication). Although several communication techniques have been proposed to cope with this problem, all of them address the large message size issue while only mitigating the limitations of the inter-node network. In this study, we investigate the benefit of increasing the inter-node link bandwidth by using a hybrid switching system, i.e., electrical packet switching combined with optical circuit switching. We found that the typical data transfer of synchronous data-parallel training is long-lived and rarely changes, so it can be sped up with optical switching. Simulation results on the SimGrid simulator show that our approach speeds up the training time of deep learning applications, especially at large scale.

  • A Statistical Trust for Detecting Malicious Nodes in IoT Sensor Networks

    Fang WANG  Zhe WEI  

     
    LETTER-Mobile Information Network and Personal Communications

    Publicized: 2021/02/19  Vol: E104-A No:8  Page(s): 1084-1087

    Unattended malicious nodes pose serious security threats to the integrity of IoT sensor networks. However, preventive measures such as cryptography and authentication are difficult to deploy on resource-constrained IoT sensor nodes with low processing capability and limited power supply. To tackle such malicious sensor nodes, this study applies trust computing to IoT sensor networks as a lightweight security mechanism: based on the theory of Chebyshev polynomial approximation of time series, the trust data sequence generated by each sensor node is linearized and treated as a time series for malicious node detection. The proposed method is evaluated against existing schemes in several simulations, and the results demonstrate that it handles malicious nodes more effectively, resulting in a higher correct packet delivery rate.
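    A minimal sketch of the general idea, under assumed thresholds and a made-up decision rule: fit a low-degree Chebyshev polynomial to a node's trust sequence and flag nodes whose trust deviates strongly from the fitted trend or trends toward low values.

```python
# Hedged sketch: approximate each node's trust sequence with a low-degree
# Chebyshev polynomial and flag nodes whose values deviate strongly from the
# fitted trend or whose trend ends low. Thresholds are illustrative only.
import numpy as np
from numpy.polynomial import chebyshev as C

def is_suspicious(trust_series, degree=3, threshold=0.15):
    t = np.linspace(-1.0, 1.0, len(trust_series))        # Chebyshev domain
    coeffs = C.chebfit(t, trust_series, degree)           # least-squares fit
    fitted = C.chebval(t, coeffs)
    residual = np.abs(trust_series - fitted).mean()       # deviation from trend
    return residual > threshold or fitted[-1] < 0.5       # erratic or low trust

normal = 0.9 + 0.02 * np.random.default_rng(1).standard_normal(20)
dropping = np.linspace(0.9, 0.2, 20)                       # trust collapsing
print(is_suspicious(normal), is_suspicious(dropping))      # expected: False True
```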

  • Remote Dynamic Reconfiguration of a Multi-FPGA System FiC (Flow-in-Cloud)

    Kazuei HIRONAKA  Kensuke IIZUKA  Miho YAMAKURA  Akram BEN AHMED  Hideharu AMANO  

     
    PAPER-Computer System

    Publicized: 2021/05/12  Vol: E104-D No:8  Page(s): 1321-1331

    Multi-FPGA systems have been receiving a lot of attention as low-cost and energy-efficient systems for multi-access edge computing (MEC). For this purpose, a bare-metal multi-FPGA system called FiC (Flow-in-Cloud) is under development. In this paper, we introduce the FiC multi-FPGA cluster, which applies a partial reconfiguration (PR) FPGA design flow to support online replacement of user-defined accelerators while the FPGA interconnection network keeps running, together with its low-level multi-FPGA management software called the remote PR manager. With the remote PR manager, the user can define the FiC FPGA cluster setup in JSON and control the cluster from a user application through the cooperation of a simple cluster management tool/library called ficmgr on the client host and a REST API service provider called ficwww running on the Raspberry Pi 3 (RPi3) of each node. According to evaluation results on a prototype FiC FPGA cluster with 12 nodes, using online application replacement by PR and on-the-fly FPGA bitstream compression, the FPGA bitstream distribution time was reduced to 1/17 and the total cluster setup time was reduced by 21∼57% compared to cluster setup with a full-configuration FPGA bitstream.

  • Minimax Design of Sparse IIR Filters Using Sparse Linear Programming Open Access

    Masayoshi NAKAMOTO  Naoyuki AIKAWA  

     
    PAPER-Digital Signal Processing

    Publicized: 2021/02/15  Vol: E104-A No:8  Page(s): 1006-1018

    Recent trends in filter design involve the development of sparse filters whose coefficients take not only real values but also zero values. These sparse filters can achieve high performance by optimizing the selection of the zero coefficients and computing the real (non-zero) coefficients. Designing a sparse infinite impulse response (IIR) filter is more challenging than designing a sparse finite impulse response (FIR) filter, so studies on the design of sparse IIR filters have been rare. In this study, we consider IIR filters whose coefficients include zero values, called sparse IIR filters. First, we formulate the design problem as a linear programming problem without imposing any stability condition. Subsequently, we reformulate the design problem by altering the error function and preparing several candidate denominator polynomials with stable poles. Finally, by incorporating these methods into successive thinning algorithms, we develop a new design algorithm for these filters. To demonstrate the effectiveness of the proposed method, its performance is compared with that of other existing methods.
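    To illustrate only the "design as linear programming" ingredient, the sketch below solves a simplified minimax linear-phase FIR lowpass design with scipy.optimize.linprog; the paper's sparse IIR formulation and successive thinning are not reproduced, and the band edges and filter length are arbitrary.

```python
# Simplified illustration of filter design as linear programming: a minimax
# (Chebyshev) linear-phase FIR lowpass design solved with scipy.optimize.linprog.
import numpy as np
from scipy.optimize import linprog

M, wp, ws = 10, 0.3 * np.pi, 0.5 * np.pi        # half-order and band edges
grid = np.linspace(0, np.pi, 200)
band = (grid <= wp) | (grid >= ws)               # skip the transition band
w = grid[band]
D = (w <= wp).astype(float)                      # desired response: 1 / 0
# Zero-phase response A(w) = a0 + 2*sum_k a_k cos(k w) is linear in a.
T = np.hstack([np.ones((len(w), 1)),
               2 * np.cos(np.outer(w, np.arange(1, M + 1)))])
# Variables x = [a_0..a_M, delta]; minimize delta subject to |T a - D| <= delta.
c = np.r_[np.zeros(M + 1), 1.0]
A_ub = np.vstack([np.hstack([T, -np.ones((len(w), 1))]),
                  np.hstack([-T, -np.ones((len(w), 1))])])
b_ub = np.r_[D, -D]
res = linprog(c, A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None)] * (M + 1) + [(0, None)])
a, delta = res.x[:-1], res.x[-1]
print("minimax approximation error:", delta)
```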

  • Video Inpainting by Frame Alignment with Deformable Convolution

    Yusuke HARA  Xueting WANG  Toshihiko YAMASAKI  

     
    PAPER-Image Processing and Video Processing

    Publicized: 2021/04/22  Vol: E104-D No:8  Page(s): 1349-1358

    Video inpainting is the task of filling in missing regions of a video. In this task, it is important to efficiently use information from other frames and to generate plausible results with sufficient temporal consistency. In this paper, we present a video inpainting method that jointly uses affine transformation and deformable convolutions for frame alignment. The former is responsible for frame-level rough alignment, and the latter performs pixel-level fine alignment. Our model depends neither on 3D convolutions, which limit the temporal window, nor on troublesome flow estimation. The proposed method achieves improved object removal results and better PSNR and SSIM values than previous learning-based methods.
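    A rough PyTorch sketch of the pixel-level alignment building block, not the paper's full network: offsets are predicted from the concatenated target and reference features and fed to torchvision's DeformConv2d; channel counts and kernel size are illustrative.

```python
# Rough sketch of alignment with deformable convolution: offsets are predicted
# from both frames' features, then used to sample the reference features.
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformAlign(nn.Module):
    def __init__(self, channels=64, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        # Predict 2 offsets (dx, dy) per kernel position from both frames.
        self.offset_conv = nn.Conv2d(2 * channels, 2 * kernel_size * kernel_size,
                                     kernel_size, padding=pad)
        self.deform_conv = DeformConv2d(channels, channels, kernel_size, padding=pad)

    def forward(self, ref_feat, tgt_feat):
        offset = self.offset_conv(torch.cat([ref_feat, tgt_feat], dim=1))
        return self.deform_conv(ref_feat, offset)   # reference aligned to target

align = DeformAlign()
ref, tgt = torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32)
print(align(ref, tgt).shape)                         # torch.Size([1, 64, 32, 32])
```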

  • Energy Efficient Approximate Storing of Image Data for MTJ Based Non-Volatile Flip-Flops and MRAM

    Yoshinori ONO  Kimiyoshi USAMI  

     
    PAPER

    Publicized: 2021/01/06  Vol: E104-C No:7  Page(s): 338-349

    Non-volatile memory (NVM) employing MTJ has many strong points, such as read/write performance, endurance, and operating-voltage compatibility with standard CMOS. However, it consumes considerable energy when writing data, which becomes an obstacle when it is applied to battery-operated mobile devices. To solve this problem, we propose an approach that augments the precision scaling technique for the write operation in NVM. Precision scaling is an approximate computing technique that reduces the bit width of data (i.e., precision) to save energy. When writing image data to NVM with precision scaling, the write energy and the image quality change according to the write time and the targeted bit range. We propose an energy-efficient approximate storing scheme for non-volatile flip-flops and magnetic random-access memory (MRAM) that writes the data by optimizing the bit positions at which the data are split and the write time for each bit range. Using a statistical model, we obtained optimal values for the write time and the targeted bit range under the trade-off between write-energy reduction and image-quality degradation. Simulation results demonstrate that with these optimal values the write energy can be reduced by up to 50% while maintaining acceptable image quality. We also investigated in detail the relationship between the input images and the output image quality of this approach. In addition, we evaluated the energy benefits of applying our approach to nine types of image processing, including linear filters and edge detectors. The results showed that the write energy is reduced by up to a further 12.5%.
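    The precision-scaling idea can be illustrated with a toy sketch that simply drops low-order bits of 8-bit pixels before a (costly) NVM write and reports the resulting PSNR; the paper's bit-range splitting and write-time/energy model are not modeled here.

```python
# Toy illustration of precision scaling for approximate storing: the lower
# bits of each 8-bit pixel are dropped before the energy-hungry NVM write,
# trading image quality for write energy.
import numpy as np

def precision_scale(image_u8, kept_bits):
    """Zero out the (8 - kept_bits) least significant bits of each pixel."""
    mask = np.uint8((0xFF << (8 - kept_bits)) & 0xFF)
    return image_u8 & mask

def psnr(a, b):
    mse = np.mean((a.astype(float) - b.astype(float)) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(255.0 ** 2 / mse)

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)   # stand-in image
for kept in (8, 6, 4):
    approx = precision_scale(img, kept)
    print(f"{kept} bits written -> PSNR {psnr(img, approx):.1f} dB")
```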

  • Parameters Estimation of Impulse Noise for Channel Coded Systems over Fading Channels

    Chun-Yin CHEN  Mao-Ching CHIU  

     
    PAPER-Wireless Communication Technologies

    Publicized: 2021/01/18  Vol: E104-B No:7  Page(s): 903-912

    In this paper, we propose a robust parameter estimation algorithm for channel-coded systems based on low-density parity-check (LDPC) codes over fading channels with impulse noise. The estimated parameters are then used to generate bit log-likelihood ratios (LLRs) for a soft-input LDPC decoder. The expectation-maximization (EM) algorithm is used to estimate the parameters, including the channel gain and the parameters of the Bernoulli-Gaussian (B-G) impulse noise model. The parameters can be estimated accurately, and the average number of iterations of the proposed algorithm is acceptable. Simulation results show that over a wide range of impulse noise power, the proposed algorithm approaches the optimal performance under different Rician channel factors and even under Middleton class-A (M-CA) impulse noise models.
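    The EM step can be sketched for the impulse-noise part alone: noise samples are modeled as a zero-mean two-component Gaussian mixture (background vs. background-plus-impulse), and EM estimates the impulse probability and the two variances. Channel-gain estimation and LLR generation are omitted, and the initialization is ad hoc.

```python
# Sketch of EM estimation of Bernoulli-Gaussian impulse-noise parameters
# (impulse probability p, background variance, impulse variance) from raw
# noise samples modeled as a zero-mean two-component Gaussian mixture.
import numpy as np

def gauss(x, var):
    return np.exp(-x**2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def em_bg(x, iters=50):
    p, v_bg, v_tot = 0.1, np.var(x) * 0.5, np.var(x) * 5.0   # rough init
    for _ in range(iters):
        # E-step: posterior probability that each sample contains an impulse.
        num = p * gauss(x, v_tot)
        gamma = num / (num + (1 - p) * gauss(x, v_bg))
        # M-step: update the mixture weight and the two variances.
        p = gamma.mean()
        v_tot = (gamma * x**2).sum() / gamma.sum()
        v_bg = ((1 - gamma) * x**2).sum() / (1 - gamma).sum()
    return p, v_bg, max(v_tot - v_bg, 0.0)       # impulse variance = v_tot - v_bg

rng = np.random.default_rng(0)
n = 20000
impulses = rng.random(n) < 0.05                   # true impulse probability 0.05
x = rng.normal(0, 1.0, n) + impulses * rng.normal(0, 10.0, n)
print(em_bg(x))                                   # roughly (0.05, 1.0, 100.0)
```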

  • Exploring the Outer Boundary of a Simple Polygon

    Qi WEI  Xiaolin YAO  Luan LIU  Yan ZHANG  

     
    PAPER-Fundamentals of Information Systems

    Publicized: 2021/04/02  Vol: E104-D No:7  Page(s): 923-930

    We investigate an online problem of a robot exploring the outer boundary of an unknown simple polygon P. The robot starts from a specified vertex s and walks an exploration tour outside P. It has to see all points of the polygon's outer boundary and to return to the start. We provide lower and upper bounds on the ratio of the distance traveled by the robot in comparison to the length of the shortest path. We consider P in two scenarios: convex polygon and concave polygon. For the first scenario, we prove a lower bound of 5 and propose a 23.78-competitive strategy. For the second scenario, we prove a lower bound of 5.03 and propose a 26.5-competitive strategy.

  • Preliminary Performance Analysis of Distributed DNN Training with Relaxed Synchronization

    Koichi SHIRAHATA  Amir HADERBACHE  Naoto FUKUMOTO  Kohta NAKASHIMA  

     
    BRIEF PAPER

    Publicized: 2020/12/01  Vol: E104-C No:6  Page(s): 257-260

    Scalability of distributed DNN training can be limited by slowdown of specific processes due to unexpected hardware failures. We propose a dynamic process exclusion technique so that training throughput is maximized. Our evaluation using 32 processes with ResNet-50 shows that our proposed technique reduces slowdown by 12.5% to 50% without accuracy loss through excluding the slow processes.

  • Action Recognition Using Pose Data in a Distributed Environment over the Edge and Cloud

    Chikako TAKASAKI  Atsuko TAKEFUSA  Hidemoto NAKADA  Masato OGUCHI  

     
    PAPER

    Publicized: 2021/02/02  Vol: E104-D No:5  Page(s): 539-550

    With the development of cameras and sensors and the spread of cloud computing, life logs can easily be acquired and stored in ordinary households for the various services that utilize them. However, it is difficult to analyze moving images acquired by home sensors in real time using machine learning because the data size is too large and the computational complexity is too high. Moreover, moving images captured at home can be used to identify individuals, so collecting and accumulating them in the cloud may invade the privacy of application users. We propose a method of distributed processing over the edge and cloud that addresses both the processing latency and the privacy concerns. On the edge (sensor) side, we extract feature vectors of human key points from the moving images using OpenPose, a pose estimation library. On the cloud side, we recognize actions by machine learning using only the feature vectors. In this study, we compare the action recognition accuracies of multiple machine learning methods. In addition, we measure the analysis processing time at the sensor and in the cloud to investigate the feasibility of recognizing actions in real time. We then evaluate the proposed system by comparing it with the 3D ResNet model in recognition experiments. The experimental results demonstrate that the action recognition accuracy is highest when using LSTM, and that introducing dropout in action recognition with 100 categories alleviates overfitting because the models can learn more generic human actions from the increased variety of actions. In addition, preprocessing with OpenPose on the sensor side is shown to substantially reduce the transfer volume from the sensor to the cloud.
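    A minimal sketch of the cloud-side classifier described above: an LSTM over per-frame pose feature vectors (e.g., 18 keypoints × (x, y) = 36 values) followed by dropout and a linear layer; the layer sizes and the 100-class output are illustrative.

```python
# Minimal sketch of an LSTM action classifier over per-frame pose features.
import torch
import torch.nn as nn

class PoseLSTM(nn.Module):
    def __init__(self, feat_dim=36, hidden=128, num_classes=100):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.drop = nn.Dropout(0.5)          # dropout, as discussed in the paper
        self.fc = nn.Linear(hidden, num_classes)

    def forward(self, x):                    # x: (batch, frames, feat_dim)
        _, (h_n, _) = self.lstm(x)
        return self.fc(self.drop(h_n[-1]))   # logits from the last hidden state

model = PoseLSTM()
clip = torch.randn(4, 30, 36)                # 4 clips, 30 frames each
print(model(clip).shape)                     # torch.Size([4, 100])
```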

  • Parallel Peak Cancellation Signal-Based PAPR Reduction Method Using Null Space in MIMO Channel for MIMO-OFDM Transmission Open Access

    Taku SUZUKI  Mikihito SUZUKI  Kenichi HIGUCHI  

     
    PAPER-Wireless Communication Technologies

    Publicized: 2020/11/20  Vol: E104-B No:5  Page(s): 539-549

    This paper proposes a parallel peak cancellation (PC) process for the computationally efficient algorithm called PC with a channel-null constraint (PCCNC) in an adaptive peak-to-average power ratio (PAPR) reduction method that uses the null space of a multiple-input multiple-output (MIMO) channel for MIMO-orthogonal frequency division multiplexing (OFDM) signals. By simultaneously adding multiple PC signals to the time-domain transmission signal vector, the required number of iterations of the iterative algorithm is effectively reduced along with the PAPR. We impose a constraint in which the PC signal is transmitted only in the null space of the MIMO channel by beamforming (BF), so that the data streams do not experience interference from the PC signal on the receiver side. Since, unlike in the previous algorithm, the fast Fourier transform (FFT) and inverse FFT (IFFT) operations at each iteration are not required, and thanks to the newly introduced parallel processing approach, the enhanced PCCNC algorithm reduces the required total computational complexity and the number of iterations compared to previous algorithms while achieving the same throughput-vs.-PAPR performance.
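    For reference, the PAPR metric targeted by the method can be computed as follows for a randomly generated OFDM symbol; the PCCNC algorithm itself is not shown.

```python
# Quick numerical illustration of the PAPR metric: the peak-to-average power
# ratio of a time-domain OFDM symbol obtained by IFFT of random QPSK subcarriers.
import numpy as np

def papr_db(x):
    power = np.abs(x) ** 2
    return 10 * np.log10(power.max() / power.mean())

rng = np.random.default_rng(0)
n_sc, oversample = 256, 4
qpsk = (rng.choice([-1, 1], n_sc) + 1j * rng.choice([-1, 1], n_sc)) / np.sqrt(2)
spectrum = np.concatenate([qpsk[:n_sc // 2],
                           np.zeros((oversample - 1) * n_sc),
                           qpsk[n_sc // 2:]])      # zero-padded for oversampling
x = np.fft.ifft(spectrum)
print(f"PAPR = {papr_db(x):.2f} dB")
```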

  • A Feasibility Study of Multi-Domain Stochastic Computing Circuit Open Access

    Tati ERLINA  Renyuan ZHANG  Yasuhiko NAKASHIMA  

     
    PAPER-Integrated Electronics

    Publicized: 2020/10/29  Vol: E104-C No:5  Page(s): 153-163

    An efficient approximate computing circuit is developed for polynomial functions through a hybrid of the analog and stochastic domains. Unlike ordinary time-based stochastic computing (TBSC), the proposed circuit exploits not only the duty cycle of pulses but also the pulse strength of the analog current to carry information for multiplications. The accumulation of many multiplications is performed by merely collecting the stochastic current. As the calculation depth increases, the growth of latency (during summations), signal power weakening, and disparity of output signals (during multiplications) are substantially avoided, in contrast to conventional TBSC. Furthermore, the calculation range theoretically extends to bipolar infinity without scaling. The proposed multi-domain stochastic computing (MDSC) circuit is designed and simulated in a 0.18 µm CMOS technology, employing a set of current mirrors and an improved TBSC scheme based on the Neuron-MOS mechanism. As a proof of concept, multiply-and-accumulate calculations (MACs) are implemented, achieving an average accuracy of 95.3%. More importantly, the transistor count, power consumption, and latency decrease to 6.1%, 55.4%, and 4.2% of those of the state-of-the-art TBSC circuit, respectively. The robustness against temperature and process variations is also investigated and presented in detail.

  • A Hardware Implementation on Customizable Embedded DSP Core for Colorectal Tumor Classification with Endoscopic Video toward Real-Time Computer-Aided Diagnosis System

    Masayuki ODAGAWA  Takumi OKAMOTO  Tetsushi KOIDE  Toru TAMAKI  Bisser RAYTCHEV  Kazufumi KANEDA  Shigeto YOSHIDA  Hiroshi MIENO  Shinji TANAKA  Takayuki SUGAWARA  Hiroshi TOISHI  Masayuki TSUJI  Nobuo TAMBA  

     
    PAPER-VLSI Design Technology and CAD

    Publicized: 2020/10/06  Vol: E104-A No:4  Page(s): 691-701

    In this paper, we present a hardware implementation of a colorectal cancer diagnosis support system for colorectal endoscopic video images on a customizable embedded DSP. In an endoscopic video image, color shift, blurring, or reflection of light can occur in a lesion area, which affects the discrimination result obtained by a computer. Therefore, in order to identify lesions robustly and classify them stably in such video-specific frames, we implement a computer-aided diagnosis (CAD) system for colorectal endoscopic images with Narrow Band Imaging (NBI) magnification that uses Convolutional Neural Network (CNN) features and Support Vector Machine (SVM) classification. Since CNN and SVM require many multiply-and-accumulate (MAC) operations, we implement the proposed system on a customizable embedded DSP, which can realize high-speed MAC operations and parallel processing with Very Long Instruction Word (VLIW) instructions. Before implementing the system on the customizable embedded DSP, we profile and analyze the processing cycles of the CAD system and optimize the bottlenecks. We show the effectiveness of the real-time diagnosis support system on the embedded system for endoscopic video images. The prototyped system demonstrated real-time processing at video frame rate (over 30 fps at 200 MHz) and more than 90% accuracy.
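    The classification stage (CNN features fed to an SVM) can be sketched as below; the feature vectors are random placeholders standing in for CNN features of NBI-magnified frames, and the two-class labels are illustrative.

```python
# Hedged sketch of the classification pipeline only: CNN-style feature vectors
# (random placeholders here) classified with a support vector machine.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_samples, feat_dim = 300, 256
X = rng.standard_normal((n_samples, feat_dim))       # placeholder CNN features
y = rng.integers(0, 2, n_samples)                    # e.g., two lesion classes
X[y == 1] += 0.5                                     # make the toy classes separable

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = SVC(kernel="linear").fit(X_tr, y_tr)           # SVM on CNN-style features
print("toy accuracy:", clf.score(X_te, y_te))
```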

  • Service Migration Scheduling with Bandwidth Limitation against Crowd Mobility in Edge Computing Environments

    Hiroaki YAMANAKA  Yuuichi TERANISHI  Eiji KAWAI  

     
    PAPER-Network

    Publicized: 2020/09/11  Vol: E104-B No:3  Page(s): 240-250

    Edge computing offers computing capability with ultra-low response times by leveraging servers close to end-user devices. Because end-user devices are mobile, the latency between the servers and the devices can become long, and the response time may become unacceptable for an application service. Service (container) migration that follows the handover of end-user devices maintains the response time. However, service migration triggered by the mass movement of people in the same geographic area at the same time (e.g., commuting) generates heavy bandwidth usage in the mobile backhaul network, which reduces the bandwidth available for ordinary application traffic. Shaping the migration traffic limits the bandwidth usage but delays service migration and increases the response time of the container for the moving end-user device. Furthermore, the number of migration-decision targets (i.e., the system load) increases because delaying migration processes accumulates containers waiting for migration. In this paper, we propose a migration scheduling method that controls the bandwidth used for migration in a network while ensuring timely processing of service migration. Simulations comparing the proposal with state-of-the-art methods show that the proposal always keeps the bandwidth usage under the predetermined threshold. The method reduced the number of containers exceeding the acceptable response time to as low as 40% of that of the compared state-of-the-art methods. Furthermore, the proposed method minimized the number of migration-decision targets.

  • Disaggregated Accelerator Management System for Cloud Data Centers

    Ryousei TAKANO  Kuniyasu SUZAKI  

     
    LETTER-Software System

    Publicized: 2020/12/07  Vol: E104-D No:3  Page(s): 465-468

    A conventional data center consisting of monolithic servers faces limitations including lack of operational flexibility, low resource utilization, and low maintainability. Resource disaggregation is a promising solution to these issues. We propose a disaggregated cloud data center architecture called Flow-in-Cloud (FiC) that enables an existing cluster computer system to expand an accelerator pool through a high-speed network. FlowOS-RM manages the entire pool of resources and deploys a user job on a dynamically constructed slice according to the user's request. This slice consists of compute nodes and accelerators, where each accelerator is attached to its corresponding compute node. This paper demonstrates the feasibility of FiC in a proof-of-concept experiment running a distributed deep learning application on the prototype system. The result confirms the applicability of the proposed system.

  • Empirical Study of Low-Latency Network Model with Orchestrator in MEC Open Access

    Krittin INTHARAWIJITR  Katsuyoshi IIDA  Hiroyuki KOGA  Katsunori YAMAOKA  

     
    PAPER-Network

    Publicized: 2020/09/01  Vol: E104-B No:3  Page(s): 229-239

    The Internet of Things (IoT), with its support for cyber-physical systems (CPS), will provide many latency-sensitive services that require very fast responses from network services. Mobile edge computing (MEC), one of the distributed computing models, is a promising component of the low-latency network architecture. In network architectures with MEC, mobile devices offload heavy computing tasks to edge servers. Many studies have addressed low-latency network architectures with MEC; however, none of them simultaneously satisfies both of the following: (1) guaranteeing the latency of computing tasks and (2) implementing a real system. In this paper, we design and implement an MEC-based network architecture that guarantees the latency of offloaded tasks. More specifically, we first estimate the total latency, including both computing and communication latencies, at a centralized node called the orchestrator. If the estimated value exceeds the latency requirement, the task is rejected. We then evaluate the performance in terms of the task blocking probability. To analyze the results, we compare the performance obtained from experiments with that obtained from simulations. Based on this comparison, we clarify that the accuracy of computing-latency estimation is a significant factor in this system.
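    The orchestrator's admission test described above reduces to a simple comparison, sketched here with a hypothetical latency model (transfer time plus RTT plus CPU-cycle-based computing time).

```python
# Simplified sketch of the orchestrator's admission test: estimate total
# latency as communication plus computing time and reject the task if the
# estimate exceeds its latency requirement. The model values are illustrative.
def admit_task(task_size_bits, cpu_cycles, edge, requirement_ms):
    comm_ms = task_size_bits / edge["bandwidth_bps"] * 1000 + edge["rtt_ms"]
    comp_ms = cpu_cycles / edge["cpu_hz"] * 1000
    estimate = comm_ms + comp_ms
    return estimate <= requirement_ms, estimate

edge = {"bandwidth_bps": 50e6, "rtt_ms": 2.0, "cpu_hz": 3e9}
ok, est = admit_task(task_size_bits=4e5, cpu_cycles=2e7, edge=edge, requirement_ms=20)
print(ok, round(est, 2), "ms")   # accepted only if the estimate is within 20 ms
```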
