The search functionality is under construction.

Keyword Search Result

[Keyword] CTI(8214hit)

1641-1660hit(8214hit)

  • An Efficient Algorithm of Discrete Particle Swarm Optimization for Multi-Objective Task Assignment

    Nannan QIAO  Jiali YOU  Yiqiang SHENG  Jinlin WANG  Haojiang DENG  

     
    PAPER-Distributed system

      Publicized:
    2016/08/24
      Vol:
    E99-D No:12
      Page(s):
    2968-2977

    In this paper, a discrete particle swarm optimization method is proposed to solve the multi-objective task assignment problem in a distributed environment. The objectives of optimization include the makespan for task execution and the budget caused by resource occupation. A two-stage approach is designed as follows. In the first stage, several artificial particles are added into the initialized swarm to guide the search direction. In the second stage, we redefine the operators of the discrete PSO to implement addition, subtraction and multiplication. In addition, a fuzzy-cost-based elite selection is used to improve the computational efficiency. Evaluation shows that the proposed algorithm achieves Pareto improvement in comparison to the state-of-the-art algorithms.

  • A Waiting Mechanism with Conflict Prediction on Hardware Transactional Memory

    Keisuke MASHITA  Maya TABUCHI  Ryohei YAMADA  Tomoaki TSUMURA  

     
    PAPER-Architecture

      Publicized:
    2016/08/24
      Vol:
    E99-D No:12
      Page(s):
    2860-2870

    Lock-based thread synchronization techniques have been commonly used in parallel programming on multi-core processors. However, locks can cause deadlocks and poor scalability, and Transactional Memory (TM) has been proposed and studied for lock-free synchronization. On TMs, transactions are executed speculatively in parallel as long as they do not encounter any conflicts on shared variables. On general HTMs (hardware implementations of TM), transactions that have once conflicted with each other will conflict repeatedly if they are executed again in parallel, and the performance of the HTM will decline. To address this problem, in this paper, we propose a conflict prediction to avoid conflicts before executing transactions, considering historical data of conflicts. The experimental results show that the execution time of HTM is reduced by up to 59.2%, and by 16.8% on average, with 16 threads.

  • Novel Chip Stacking Methods to Extend Both Horizontally and Vertically for Many-Core Architectures with ThruChip Interface

    Hiroshi NAKAHARA  Tomoya OZAKI  Hiroki MATSUTANI  Michihiro KOIBUCHI  Hideharu AMANO  

     
    PAPER-Architecture

      Publicized:
    2016/08/24
      Vol:
    E99-D No:12
      Page(s):
    2871-2880

    The recent increase in non-recurring engineering cost (design, mask and test cost) has made large Systems-on-Chip (SoCs) difficult to develop, especially with advanced technology. We radically explore an approach for cheap and flexible chip stacking by using the inductive-coupling ThruChip Interface (TCI). In order to connect a large number of small chips for building a large-scale system, novel chip stacking methods called linear stacking and staggered stacking are proposed. They enable the system to be extended in the x and/or y dimensions, not only in the z dimension. Here, a novel chip stacking layout and its deadlock-free routing design for the cases using single-core chips and multi-core chips are shown. Compared to a 2D mesh, the network with 256 nodes formed by the proposed stacking improves latency by 13.8% and the performance of the NAS Parallel Benchmarks by 5.4% on average.

  • The Improvement of the Processes of a Class of Graph-Cut-Based Image Segmentation Algorithms

    Shengxiao NIU  Gengsheng CHEN  

     
    PAPER-Fundamentals of Information Systems

      Publicized:
    2016/09/14
      Vol:
    E99-D No:12
      Page(s):
    3053-3059

    In this paper, an analysis of the basic process of a class of interactive-graph-cut-based image segmentation algorithms indicates that it is unnecessary to construct n-links for all adjacent pixel nodes of an image before calculating the maximum flow and the minimal cuts. There are many pixel nodes for which it is not necessary to construct n-links at all. Based on this, we propose a new algorithm for the dynamic construction of all necessary n-links that connect the pixel nodes explored by the maximum flow algorithm. These n-links are constructed dynamically and without redundancy during the process of calculating the maximum flow. The Berkeley segmentation dataset benchmark is used to prove that this method can reduce the average running time of segmentation algorithms on the premise of correct segmentation results. This improvement can also be applied to any segmentation algorithm based on graph cuts.

  • Probabilistic Analysis of the Network Reliability Problem on Random Graph Ensembles

    Akiyuki YANO  Tadashi WADAYAMA  

     
    PAPER-Networks and Network Coding

      Vol:
    E99-A No:12
      Page(s):
    2218-2225

    In the field of computer science, the network reliability problem for evaluating the network failure probability has been extensively investigated. For a given undirected graph G, the network failure probability is the probability that edge failures (i.e., edge erasures) make G disconnected. Edge failures are assumed to occur independently with the same probability. The main contributions of the present paper are the upper and lower bounds on the expected network failure probability. We herein assume a simple random graph ensemble that is closely related to the Erdős-Rényi random graph ensemble. These upper and lower bounds exhibit the typical behavior of the network failure probability. The proof is based on the fact that the cut-set space of G is a linear space over $\mathbb{F}_2$ spanned by the incidence matrix of G. The present study shows a close relationship between the ensemble analysis of the expected network failure probability and the ensemble analysis of the error detection probability of LDGM codes with column weight 2.
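
The definition of network failure probability in the abstract above can be illustrated for a single small graph. The following is a minimal sketch, not the paper's ensemble analysis or its bounds: it enumerates every edge-failure pattern by brute force, so it is only feasible for a handful of edges. The function names `is_connected` and `failure_probability` and the parameter `eps` (the per-edge failure probability) are hypothetical.

```python
from itertools import product


def is_connected(num_nodes, edges):
    """Return True if the undirected graph on nodes 0..num_nodes-1 is connected."""
    if num_nodes == 0:
        return True
    adjacency = {v: [] for v in range(num_nodes)}
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)
    # Depth-first search from node 0.
    seen = {0}
    stack = [0]
    while stack:
        u = stack.pop()
        for w in adjacency[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == num_nodes


def failure_probability(num_nodes, edges, eps):
    """Probability that independent edge failures (each with probability eps)
    disconnect the graph, computed by enumerating all 2^|E| failure patterns."""
    total = 0.0
    for pattern in product([False, True], repeat=len(edges)):  # True = edge failed
        surviving = [e for e, failed in zip(edges, pattern) if not failed]
        if not is_connected(num_nodes, surviving):
            k = sum(pattern)  # number of failed edges in this pattern
            total += eps ** k * (1 - eps) ** (len(edges) - k)
    return total
```

For a triangle, the graph stays connected as long as at most one edge fails, so the failure probability is 3·eps²·(1 − eps) + eps³.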

  • Reliability-Security Tradeoff for Secure Transmission with Untrusted Relays

    Dechuan CHEN  Weiwei YANG  Jianwei HU  Yueming CAI  Xin LIU  

     
    LETTER-Communication Theory and Signals

      Vol:
    E99-A No:12
      Page(s):
    2597-2599

    In this paper, we identify the tradeoff between security and reliability in the amplify-and-forward (AF) distributed beamforming (DBF) cooperative network with K untrusted relays. In particular, we derive the closed-form expressions for the connection outage probability (COP), the secrecy outage probability (SOP), the tradeoff relationship, and the secrecy throughput. Analytical and simulation results demonstrate that increasing K leads to the enhancement of the reliability performance, but the degradation of the security performance. This tradeoff also means that there exists an optimal K maximizing the secrecy throughput.

  • Accelerating Reachability Analysis on Petri Net for Mutual Exclusion-Based Deadlock Detection

    Yunkai DU  Naijie GU  Xin ZHOU  

     
    PAPER-Distributed system

      Publicized:
    2016/08/24
      Vol:
    E99-D No:12
      Page(s):
    2978-2985

    Petri Net (PN) is a frequently used model for deadlock detection. Among various detection methods on PN, reachability analysis is the most accurate one since it never produces any false positive or false negative. Although suffering from the well-known state space explosion problem, reachability analysis is appropriate for small- and medium-scale programs. In order to mitigate the explosion problem, several kinds of techniques have been proposed that aim at accelerating the reachability analysis, such as net reduction and abstraction. However, these techniques are for general PN and do not take the particularity of the application into consideration, so their optimization potential is not adequately developed. In this paper, the features of mutual exclusion-based programs are considered, and several strategies are proposed to accelerate the reachability analysis. Among these strategies, a customized net reduction rule aims at reducing the scale of the PN, while two marking compression methods and two pruning methods reduce the volume of the reachability graph. Reachability analysis on PN can only report one deadlock on each path. However, the reported deadlock may be a false alarm, in which case real deadlocks may be hidden. To improve the detection efficiency, we propose a deadlock recovery algorithm so that more deadlocks can be detected in a shorter time. To validate the efficiency of these methods, a prototype is implemented and applied to SPLASH2 benchmarks. The experimental results show that these methods accelerate the reachability analysis for mutual exclusion-based deadlock detection significantly.

  • Enhancing Entropy Throttling: New Classes of Injection Control in Interconnection Networks

    Takashi YOKOTA  Kanemitsu OOTSU  Takeshi OHKAWA  

     
    PAPER-Interconnection network

      Publicized:
    2016/08/25
      Vol:
    E99-D No:12
      Page(s):
    2911-2922

    State-of-the-art parallel computers, which are growing in parallelism, place heavy demands on their interconnection networks. Although a wide spectrum of research and development efforts toward effective and practical interconnection networks has been reported, the problem is still open. One of the largest issues is congestion control, which intends to maximize the network performance in terms of throughput and latency. Throttling, or injection limitation, is one of the central ideas of congestion control. We have proposed a new class of throttling method, Entropy Throttling, whose foundation is the entropy concept of packets. The throttling method is partly successful; however, its potential has not been sufficiently discussed. This paper aims at exploiting the capabilities of the Entropy Throttling method via comprehensive evaluation. The major contributions of this paper are to introduce two ideas, a hysteresis function and guard time, and to clarify wide performance characteristics in steady and unsteady communication situations. By introducing the new ideas, we extend the Entropy Throttling method. The extended methods improve communication performance by up to 3.17 times in the best case and 1.47 times on average compared with non-throttling cases in collective communication, while the method can sustain steady communication performance.

  • Multi-Track Joint Decoding Schemes Using Two-Dimensional Run-Length Limited Codes for Bit-Patterned Media Magnetic Recording

    Hidetoshi SAITO  

     
    PAPER-Signal Processing for Storage

      Vol:
    E99-A No:12
      Page(s):
    2248-2255

    This paper proposes an effective signal processing scheme using a modulation code with two-dimensional (2D) run-length limited (RLL) constraints for bit-patterned media magnetic recording (BPMR). This 2D signal processing scheme is applied as one of the two-dimensional magnetic recording (TDMR) schemes for shingled magnetic recording on bit-patterned media (BPM). TDMR has been pointed out as an important key technology for increasing areal density toward 10 Tb/in². From the viewpoint of 2D signal processing for TDMR, a multi-track joint decoding scheme is desirable to increase the effective transfer rate because this scheme gets readback signals from several adjacent parallel tracks and detects recorded data written in these tracks simultaneously. Specifically, the proposed signal processing scheme for BPMR gets mixed readback signal sequences from the parallel tracks using a single reading head, and these readback signal sequences are equalized to a frequency response given by a desired 2D generalized partial response system. In the decoding process, using a single maximum likelihood (ML) sequence detector leads to an increase in the effective transfer rate because the recorded data on the parallel tracks are decoded for each time slot. Furthermore, a new joint pattern-dependent noise-predictive (PDNP) sequence detection scheme is investigated for multi-track recording with media noise. This joint PDNP detection is embedded in an ML detector and can be useful for eliminating media noise. Using computer simulation, it is shown that the joint PDNP detection scheme is able to compensate for media noise in the equalizer output, which is correlated and data-dependent.

  • Improvement of Throughput Prediction Scheme Considering Terminal Distribution in Multi-Rate WLAN Considering Both CSMA/CA and Frame Collision

    Ryo HAMAMOTO  Chisa TAKANO  Hiroyasu OBATA  Kenji ISHIDA  

     
    PAPER-Wireless system

      Publicized:
    2016/08/24
      Vol:
    E99-D No:12
      Page(s):
    2923-2933

    Wireless Local Area Networks (WLANs) based on the IEEE 802.11 standard have been increasingly used. Access Points (APs) are being established in various public places, such as railway stations and airports, as well as private residences. Moreover, the rate of public WLAN services continues to increase. Throughput prediction of an AP in a multi-rate environment, i.e., predicting the amount of received data (including retransmission packets at an AP), is an important issue for wireless network design. Moreover, it is important to solve AP placement and selection problems. To realize the throughput prediction, we have proposed an AP throughput prediction method that considers terminal distribution. We compared the predicted throughput of the proposed method with a method that uses linear order computation and confirmed the performance of the proposed method, not by a network simulator but by numerical computation. However, it is necessary to consider the impact of CSMA/CA in the MAC layer, because throughput is greatly influenced by frame collision. In this paper, we derive an effective transmission rate considering CSMA/CA and frame collision. We then compare the throughput obtained using the network simulator NS2 with a prediction value calculated by the proposed method. Simulation results show that the maximum relative error of the proposed method is approximately 6% and 15% for UDP and TCP, respectively, while that of the existing method is approximately 17% and 21%.

  • Hardware-Efficient Local Extrema Detection for Scale-Space Extrema Detection in SIFT Algorithm

    Kazuhito ITO  Hiroki HAYASHI  

     
    LETTER

      Vol:
    E99-A No:12
      Page(s):
    2507-2510

    In this paper a hardware-efficient local extrema detection (LED) method used for scale-space extrema detection in the SIFT algorithm is proposed. By reformulating the reuse of the intermediate results in taking the local maximum and minimum, the necessary operations in LED are reduced without degrading the detection accuracy. The proposed method requires 25% to 35% less logic resources than the conventional method when implemented in an FPGA with a slight increase in latency.
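
For context, the conventional local extrema test that this letter aims to make cheaper can be sketched as follows. This is a minimal illustration of the standard 26-neighbor comparison used in SIFT scale-space extrema detection, not the reformulated method proposed in the letter. The function name `is_local_extremum` and the layout of `dog` (a nested list indexed as `dog[scale][row][col]` holding difference-of-Gaussian values) are assumptions made for the example.

```python
def is_local_extremum(dog, s, y, x):
    """Conventional check: the sample at (s, y, x) is a local extremum if it is
    strictly larger or strictly smaller than all 26 neighbors in the 3x3x3
    block spanning the adjacent scales, rows and columns.

    The caller must ensure (s, y, x) is not on the border of `dog`.
    """
    center = dog[s][y][x]
    neighbors = [
        dog[s + ds][y + dy][x + dx]
        for ds in (-1, 0, 1)
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (ds, dy, dx) != (0, 0, 0)  # skip the center sample itself
    ]
    return center > max(neighbors) or center < min(neighbors)
```

Evaluated naively, this takes 26 comparisons per sample for each of the maximum and minimum tests; reusing intermediate maxima and minima between overlapping neighborhoods is the kind of saving the abstract refers to.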

  • New Non-Asymptotic Bounds on Numbers of Codewords for the Fixed-Length Lossy Compression

    Tetsunao MATSUTA  Tomohiko UYEMATSU  

     
    PAPER-Source Coding and Data Compression

      Vol:
    E99-A No:12
      Page(s):
    2116-2129

    In this paper, we deal with the fixed-length lossy compression, where a fixed-length sequence emitted from the information source is encoded into a codeword, and the source sequence is reproduced from the codeword with a certain distortion. We give lower and upper bounds on the minimum number of codewords such that the probability of exceeding a given distortion level is less than a given probability. These bounds are characterized by using the α-mutual information of order infinity. Further, for i.i.d. binary sources, we provide numerical examples of tight upper bounds which are computable in polynomial time in the blocklength.

  • Signal Power Estimation Based on Orthogonal Projection and Oblique Projection

    Norisato SUGA  Toshihiro FURUKAWA  

     
    LETTER-Digital Signal Processing

      Vol:
    E99-A No:12
      Page(s):
    2571-2575

    In this letter, we present a new signal power estimation method based on subspace projection. This work mainly contributes to the SINR estimation problem because, in this research, the signal power estimation is implicitly or explicitly performed. The difference between our method and the conventional methods related to this topic is the exploitation of the subspace character of the signals constructing the observed signal. As tools to perform subspace operations, we apply orthogonal projection and oblique projection, which can extract desired parameters. In the proposed scheme, the statistics of the observed signal projected by these projections are used to estimate the parameters.

  • A New Algorithm for Reducing Components of a Gaussian Mixture Model

    Naoya YOKOYAMA  Daiki AZUMA  Shuji TSUKIYAMA  Masahiro FUKUI  

     
    PAPER

      Vol:
    E99-A No:12
      Page(s):
    2425-2434

    In statistical methods, such as statistical static timing analysis, the Gaussian mixture model (GMM) is a useful tool for representing a non-Gaussian distribution and handling correlation easily. In order to repeat various statistical operations such as summation and maximum for GMMs efficiently, the number of components should be restricted to around two. In this paper, we propose a method for reducing the number of components of a given GMM to two (2-GMM). Moreover, since the distribution of each component is often represented by a linear combination of some explanatory variables, we propose a method to compute the covariance between each explanatory variable and the obtained 2-GMM, that is, the sensitivity of the 2-GMM to each explanatory variable. In order to evaluate the performance of the proposed methods, we show some experimental results. The proposed methods minimize the normalized integral square error of the probability density function of the 2-GMM at the cost of the accuracy of the sensitivities of the 2-GMM.

  • A Bit-Write-Reducing and Error-Correcting Code Generation Method by Clustering ECC Codewords for Non-Volatile Memories

    Tatsuro KOJO  Masashi TAWADA  Masao YANAGISAWA  Nozomu TOGAWA  

     
    PAPER

      Vol:
    E99-A No:12
      Page(s):
    2398-2411

    Non-volatile memories are attracting attention as a promising alternative in memory design. Data stored in them may still be corrupted by crosstalk and radiation. We can restore the data by using error-correcting codes, which require extra bits to correct bit errors. Further, non-volatile memories consume ten to a hundred times more energy than normal memories in bit-writing. When we configure them using error-correcting codes, it is essential to reduce the number of written bits. In this paper, we propose a method to generate a bit-write-reducing code with error-correcting ability. We first pick an error-correcting code which can correct t-bit errors. We cluster its codewords and generate a cluster graph satisfying the S-bit flip conditions. We assign a data value to be written to each cluster. In other words, we generate a one-to-many mapping from each data value to the codewords in the cluster. We prove that, if the cluster graph is a complete graph, every data value in a memory cell can be rewritten into another data value by flipping at most S bits while keeping the error-correcting ability at t bits. We further propose an efficient method to cluster error-correcting codewords. Experimental results show that the bit-write-reducing and error-correcting codes generated by our proposed method efficiently reduce energy consumption. This paper proposes the world's first theoretically near-optimal bit-write-reducing code with error-correcting ability based on efficient coding theories.

  • A Highly-Adaptable and Small-Sized In-Field Power Analyzer for Low-Power IoT Devices

    Ryosuke KITAYAMA  Takashi TAKENAKA  Masao YANAGISAWA  Nozomu TOGAWA  

     
    PAPER

      Vol:
    E99-A No:12
      Page(s):
    2348-2362

    Power analysis for IoT devices is strongly required to protect them against malicious attackers. It is also very important to reduce the power consumption of IoT devices. In this paper, we propose a highly-adaptable and small-sized in-field power analyzer for low-power IoT devices. The proposed power analyzer has the following advantages: (A) The proposed power analyzer realizes signal-averaging noise reduction with synchronization signal lines and thus it can reduce noise over a wide frequency range; (B) The proposed power analyzer partitions a long-term power analysis process into several analysis segments and measures the voltages and currents of each analysis segment by using a small amount of data memory. By combining these analysis segments, we can obtain long-term analysis results; (C) The proposed power analyzer has two amplifiers that amplify current signals adaptively depending on their magnitude. Hence, the maximum readable current can be increased while keeping the minimum readable current small enough. Since none of (A), (B) and (C) requires complicated mechanisms or circuits, the proposed power analyzer is implemented on just a 2.5 cm × 3.3 cm board, which is the smallest size among existing power analyzers for IoT devices. We have measured the power and energy consumption of the AES encryption process on an IoT device and demonstrated that the proposed power analyzer has measurement errors of only up to 1.17% compared to a high-precision oscilloscope.

  • A Memory-Access-Efficient Implementation for Computing the Approximate String Matching Algorithm on GPUs

    Lucas Saad Nogueira NUNES  Jacir Luiz BORDIM  Yasuaki ITO  Koji NAKANO  

     
    PAPER-GPU computing

      Publicized:
    2016/08/24
      Vol:
    E99-D No:12
      Page(s):
    2995-3003

    The closeness of a match is an important measure with a number of practical applications, including computational biology, signal processing and text retrieval. The approximate string matching (ASM) problem asks to find a substring of string Y of length n that is most similar to string X of length m. It is well known that the ASM can be solved by a dynamic programming technique that computes a table of size m×n. The main contribution of this work is to present a memory-access-efficient implementation for computing the ASM on a GPU. The proposed GPU implementation relies on warp shuffle instructions, which are used to accelerate the communication between threads without resorting to shared memory access. Despite the fact that O(mn) memory access operations are necessary to access all elements of a table of size m×n, the proposed implementation performs only $O(\frac{mn}{w})$ memory access operations, where w is the warp size. Experimental results carried out on a GeForce GTX 980 GPU show that the proposed implementation, called w-SCAN, provides a speed-up of over twofold in computing the ASM as compared to another prominent alternative.
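
The dynamic programming formulation mentioned in the abstract above can be sketched sequentially as follows. This is a minimal CPU illustration of the standard ASM recurrence with unit edit costs, not the w-SCAN GPU implementation; the function name `asm_distance` and the choice of unit costs are assumptions. The first table row is all zeros so that a match may start at any position of Y, and the answer is the minimum of the last row so that it may end at any position.

```python
def asm_distance(x: str, y: str) -> int:
    """Smallest edit distance between x and any substring of y.

    Uses the m-by-n dynamic programming table row by row, keeping only the
    previous row in memory.
    """
    m, n = len(x), len(y)
    prev = [0] * (n + 1)  # row 0: the empty prefix of x matches anywhere at cost 0
    for i in range(1, m + 1):
        cur = [i] + [0] * n  # column 0: i characters of x against an empty substring
        for j in range(1, n + 1):
            substitute = prev[j - 1] + (x[i - 1] != y[j - 1])
            delete = prev[j] + 1      # drop x[i-1]
            insert = cur[j - 1] + 1   # consume y[j-1] without matching
            cur[j] = min(substitute, delete, insert)
        prev = cur
    return min(prev)  # the best match may end at any position of y
```

Each cell depends only on its left, upper and upper-left neighbors, which is why cells along an anti-diagonal can be computed in parallel and why neighboring threads need to exchange values.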

  • Adaptive Local Thresholding for Co-Localization Detection in Multi-Channel Fluorescence Microscopic Images

    Eisuke ITO  Yusuke TOMARU  Akira IIZUKA  Hirokazu HIRAI  Tsuyoshi KATO  

     
    LETTER-Biological Engineering

      Publicized:
    2016/07/27
      Vol:
    E99-D No:11
      Page(s):
    2851-2855

    Automatic detection of immunoreactive areas in fluorescence microscopic images is becoming a key technique in the field of biology, including neuroscience, although it is still challenging for several reasons, such as low signal-to-noise ratio and contrast variation within an image. In this study, we developed a new algorithm that exhaustively detects co-localized areas in multi-channel fluorescence images, where shapes of target objects may differ among channels. Different adaptive binarization thresholds for different local regions in different channels are introduced, and the condition of each segment is assessed to recognize the target objects. The proposed method was applied to detect immunoreactive spots that labeled membrane receptors on dendritic spines of mouse cerebellar Purkinje cells. Our method achieved better detection performance than five pre-existing methods.

  • Object Detection Based on Image Blur Evaluated by Discrete Fourier Transform and Haar-Like Features

    Ryusuke MIYAMOTO  Shingo KOBAYASHI  

     
    PAPER-Image

      Vol:
    E99-A No:11
      Page(s):
    1990-1999

    In general, in-focus images are used in visual object detection because image blur is considered a factor that reduces detection accuracy. However, in-focus images make it difficult to separate target objects from the background, which makes visual object detection a hard task. Background subtraction and inter-frame difference are well-known schemes for separating target objects from the background, but they have the critical disadvantage that they cannot be used if the illumination changes or the point of view moves. Considering these problems, the authors aim to improve detection accuracy by using images with out-of-focus blur obtained from a camera with a shallow depth of field. In these images, it is expected that target objects are in focus and other regions are blurred. To enable visual object detection based on such image blur, this paper proposes a novel scheme using DFT-based feature extraction. The experimental results using synthetic images including circle, star, and square objects as targets showed that a classifier constructed by the proposed scheme achieved a 2.40% miss rate at 0.1 FPPI, and perfect detection was achieved for star and square objects. In addition, the proposed scheme achieved perfect detection of humans in natural images when the upper half of the human body was trained. The accuracy of the proposed scheme is better than that of Filtered Channel Features, one of the state-of-the-art schemes for visual object detection. The analysis of the results indicates that the proposed scheme is feasible for visual object detection based on image blur.

  • Transient Response of Reference Modified Digital PID Control DC-DC Converters with Neural Network Prediction

    Hidenori MARUTA  Daiki MITSUTAKE  Masashi MOTOMURA  Fujio KUROKAWA  

     
    PAPER-Energy in Electronics Communications

      Publicized:
    2016/06/17
      Vol:
    E99-B No:11
      Page(s):
    2340-2350

    This paper presents a novel control method based on predictions of a neural network in coordination with a conventional PID control to improve the transient characteristics of digitally controlled switching dc-dc converters. Power supplies in communication systems are required to operate well for the electronic equipment connected to them. In particular, it is important to improve transient characteristics under any required conditions since they affect the operation of power supplies. Therefore, dc-dc converters in power supplies need a superior control method which can suppress transient undershoot and overshoot of the output voltage. In the presented method, the neural network is trained to predict the output voltage and is adopted to modify the reference value in the PID control to reduce the difference between the output voltage and its desired value in the transient state. The transient characteristics are gradually improved as the training procedure of the neural network is repeated. Furthermore, the timing and duration of the neural network control are also investigated and devised, since the time delay, which is one of the main issues in digital control methods, affects the improvement of transient characteristics. The repetitive training and duration adjustment of the neural network are performed simultaneously to further improve the transient characteristics. From simulated and experimental results, it is confirmed that the presented method realizes superior transient characteristics compared to the conventional PID control.
