The search functionality is under construction.
The search functionality is under construction.

IEICE TRANSACTIONS on Information

  • Impact Factor

    0.59

  • Eigenfactor

    0.002

  • article influence

    0.1

  • Cite Score

    1.4

Advance publication (published online immediately after acceptance)

Volume E86-D No.5  (Publication Date:2003/05/01)

    Special Issue on Reconfigurable Computing
  • FOREWORD

    Toshinori SUEYOSHI  

     
    FOREWORD

      Page(s):
    779-780
  • Time-Memory Trade-off Cryptanalysis for Limited Key on FPGA-Based Parallel Machine RASH

    Katsumi TAKAHASHI  Hiroai ASAMI  Katsuto NAKAJIMA  Masahiro IIDA  

     
    PAPER

      Page(s):
    781-788

    We designed an FPGA-based parallel machine called "RASH"(Reconfigurable Architecture based on Scalable Hardware) for high speed and flexible signal/data processing. Cryptanalysis is one of the killer applications for FPGA-based machines because huge amounts of logical and/or simple arithmetic operations are required and FPGA is suitable for this. One of the well-known activities in cryptanalysis is the DES (Data Encryption Standard) cracking contest conducted by RSA Data Security. TMTO (Time-Memory Trade-Off) Cryptanalysis is a practical method to dramatically shorten the time for key search when plaintext is given in advance. A string of ASCII characters is used as the key much like a password. The ASCII character is 7-bit character and is changed to 96 kinds of value. The 56-bit DES key is given with a string of 8 ASCII characters. Although the DES key has 64 trillion(=256) possibilities, the key that is given with a string has only 6.4 trillion(=968) possibilities. Therefore, we improve TMTO cryptanalysis so that we search only the limited key by ASCII characters and reduce the quantity of computation. In this paper, we demonstrate how TMTO cryptanalysis for limited key is well suited to our FPGA-based RASH machine. By limiting the key to a string, DES key will be found at 80% probability within 45 minutes after ciphertext is given on 10 units of RASH. The precomputation before starting key search takes 3 weeks on the same RASH configuration.

  • Design and Implementation of RHiNET-2/NI0: A Reconfigurable Network Interface for Cluster Computing

    Tomonori YOKOYAMA  Naoyuki IZU  Jun-ichiro TSUCHIYA  Konosuke WATANABE  Hideharu AMANO  Tomohiro KUDOH  

     
    PAPER

      Page(s):
    789-795

    A reconfigurable network interface called RHiNET-2/NI0 is developed for parallel processing of PCs distributed within one or more floors of a building. Two configurations: the HS (High Speed) configuration with only a high-speed primitive and the DSM (Distributed Shared Memory) configuration which supports sophisticated primitives can be selected by the network requirement. From the empirical evaluation, it appears that the HS configuration markedly improves the latency of data transfer compared with traditional network interfaces. On the other hand, the DSM configuration executes sophisticated primitives for distributed shared memory more than twice as fast as that of software implementation.

  • Data Dependent Circuit for Subgraph Isomorphism Problem

    Shuichi ICHIKAWA  Shoji YAMAMOTO  

     
    PAPER

      Page(s):
    796-802

    Although the subgraph isomorphism problem has various important applications, it is generally NP-complete and difficult to solve. Though a custom computing circuit can reduce the execution time substantially, it requires considerable hardware resources and is inapplicable to large problems. This paper examines the feasibility of data dependent designs, which are particularly suitable to a Field Programmable Gate Array (FPGA). The data dependent approach drastically reduces hardware requirements. For graphs of 32 vertices, the average logic scale of data dependent circuits is only 5% of the corresponding data independent circuit. The data dependent circuit is estimated to be maximally 460 times faster than the software. Even if the circuit generation time is included, a data dependent circuit is estimated to be 2.04 times faster than software for graphs of 32 vertices. The performance gain would increase for larger graphs.

  • Accelerating the CKY Parsing Using FPGAs

    Jacir L. BORDIM  Yasuaki ITO  Koji NAKANO  

     
    PAPER

      Page(s):
    803-810

    The main contribution of this paper is to present an FPGA-based implementation of an instance-specific hardware which accelerates the CKY (Cocke-Kasami-Younger) parsing for context-free grammars. Given a context-free grammar G and a string x, the CKY parsing determines whether G derives x. We have developed a hardware generator that creates a Verilog HDL source to perform the CKY parsing for any given context-free grammar G. The generated source is embedded in an FPGA using the design software provided by the FPGA vendor. We evaluated the instance-specific hardware, generated by our hardware generator, using a timing analyzer and tested it using the Altera FPGAs. The generated hardware attains a speed-up factor of approximately 750 over the software CKY parsing algorithm.

  • An Image Retrieval System Using FPGAs

    Koji NAKANO  Etsuko TAKAMICHI  

     
    PAPER

      Page(s):
    811-818

    The main contribution of this paper is to present an image retrieval system using FPGAs. Given a template image T and a database of a number of images I1, I2,, our system lists all images that contain a subimage similar to T. More specifically, a hardware generator in our system creates the Verilog HDL source of a hardware that determines whether Ii has a similar subimage to T for any image Ii and a particular template T. The created Verilog HDL source is compiled and embedded in an FPGA using the design tool provided by the FPGA vendor. Since the hardware embedded in the FPGA is designed for a particular template T, it is an instance-specific hardware that allows us to achieve extreme acceleration. We evaluate the performance of our image matching hardware using a PCI-connected Xilinx FPGA and a timing analyzer. Since the generated hardware attains up to 3000 speed-up factor over the software solution, our approach is promising.

  • Reconfigurable Onboard Processing and Real-Time Remote Sensing

    John A. WILLIAMS  Anwar S. DAWOOD  Stephen J. VISSER  

     
    PAPER

      Page(s):
    819-829

    In this paper we present reconfigurable computing as a compelling choice of computing platform for real-time, onboard processing for satellite applications. In particular, we discuss the use of reconfigurable computing in the context of a real-time remote sensing system, providing motivation for such a system and describing attributes of reconfigurable computing that support it as the technology of choice. The High Performance Computing (HPC-I) payload, designed and developed for the Australian scientific satellite FedSat, is introduced as a demonstration of onboard processing in space using reconfigurable logic. We present an overview of the real-time remote sensing system architecture, and describe the design and implementation of three remote sensing algorithms in HPC-I for cloud masking, wildfire detection and volcanic plume detection. Finally, results from simulation and testing are presented which show very promising performance in terms of data throughput and detection capabilities.

  • PARS Architecture: A Reconfigurable Architecture with Generalized Execution Model--Design and Implementation of Its Prototype Processor

    Kazuya TANIGAWA  Tetsuo HIRONAKA  Akira KOJIMA  Noriyoshi YOSHIDA  

     
    PAPER

      Page(s):
    830-840

    Reconfigurable architectures have been focused for its potential on achieving high performance by reconfiguring special purpose circuits for a target application and its flexibility due to its ability of reconfiguring. We have set our sights on use of a reconfigurable architecture as a general-purpose computer by extending the advantageous properties of the architecture. To achieve the goal, a generalized execution model for reconfigurable architecture is required, so we have proposed an Ideal PARallel Structure (I-PARS) execution model. In the I-PARS execution model, any programs based on its model has no restriction depending on hardware structures based on a specific reconfigurable processor, which makes it easier to develop software. Further, we have proposed a PARS architecture which executes programs based on the I-PARS execution model effectively. The PARS architecture has a large reconfigurable part for highly parallel execution, which utilizes parallelism described on the I-PARS execution model. For effective utilization of the reconfigurable part in the PARS architecture, it has an ability to reconfigure and execute operations simultaneously in one cycle. Further, the PARS architecture supports branch operations to introduce control flow in an execution on the architecture, which makes it possible to skip an execution which does not produce a valid result. In this paper, we introduce the detailed structure of an implemented prototype processor based on the PARS architecture. In the implementation, 420,377 CMOS transistors were used, which was only 3.8% of the number of transistors used in the UltraSPARC-III in logic circuits. Additionally, we evaluated the performance of the prototype processor by using some benchmark programs. From the evaluation results, we found that the prototype processor could achieve nearly the same performance and be implemented with extremely the less number of transistors compared with UltraSPARC-III 750MHz.

  • Implementation of Data Driven Applications on a Multi-Context Reconfigurable Device

    Masaki UNO  Yuichiro SHIBATA  Hideharu AMANO  

     
    PAPER

      Page(s):
    841-849

    WASMII is a virtual hardware system that executes dataflow algorithms using a dynamically reconfigurable multi-context device with a data driven control mechanism. Although the effectiveness of the system has been evaluated through simulations and using an emulator, implementation of WASMII was infeasible due to the unavailability of such a device. However, the first prototype of a practical dynamically reconfigurable multi-context device called DRL has been developed by NEC, and we developed a reconfigurable test bed using four sample DRL chips. On this board, we have implemented and executed some simple applications of WASMII mechanism. Evaluation results show that the performance of the parallel implementation of WASMII is almost twice as that of a PC with a CPU based on the corresponding technology.

  • Evaluation and Comparison of Implementation Alternatives for Look-up Tables for Plastic Cell Architecture

    Jun'ichiro TAKEMOTO  Toshihiro GOTO  Yuichiro SHIBATA  Kiyoshi OGURI  

     
    PAPER

      Page(s):
    850-858

    In this paper, the efficient structure of an LUT (look-up table) for an asynchronous reconfigurable PCA (Plastic Cell Architecture) device is investigated. A total of 15 types of implementation alternatives for LUTs are evaluated and compared in an empirical manner in which full custom layout design is developed and simulated. The evaluation results show that by introducing transmission gates in memory cells in an LUT, read time can be improved by 14.3% at the cost of 13.6% area increase compared to a conventional speed oriented implementation. It is also shown that use of transmission gates reduces 6.4% of area and 19.2% of read time against a conventional area oriented LUT implementation.

  • Dynamically Reconfigurable Logic LSI--PCA-1: The First Realization of the Plastic Cell Architecture

    Hideyuki ITO  Ryusuke KONISHI  Hiroshi NAKADA  Kiyoshi OGURI  Minoru INAMORI  Akira NAGOYA  

     
    PAPER

      Page(s):
    859-867

    This paper describes the realization of a dynamically reconfigurable logic LSI based on a novel parallel computer architecture. The key point of the architecture is its dual-structured cell array which enables dynamic and autonomous reconfiguration of the logic circuits. The LSI was completed by successfully introducing two specific features: fully asynchronous logic circuits and a homogeneous structure, only LUTs are used.

  • Design Tools and Trial Designs for PCA-Chip2

    Takuya OKAMOTO  Takafumi YUASA  Tomonori IZUMI  Takao ONOYE  Yukihiro NAKAMURA  

     
    LETTER

      Page(s):
    868-871

    A configurable device "PCA-Chip2" implements the concept of Plastic Cell Architecture, which is an extension of programmable logic devices. This paper presents basic design tools for the PCA-Chip2 as the first step to develop the total design environment. Given a C description of a target function, configuration data for PCA-Chip2 is automatically generated by the tools. Trial designs by the tools are also presented to demonstrate the practicability of the proposed approach.

  • A Pulse-Coupled Neural Network Simulator Using a Programmable Gate Array Technique

    Kousuke KATAYAMA  Atsushi IWATA  

     
    PAPER

      Page(s):
    872-881

    In this paper, we propose a novel pulse-coupled neural network (PCNN) simulator using a programmable gate array (PGA) technique. The simulator is composed of modified phase-locked loops (PLLs) and a programmable gate array (PGA). The PLL, which is modified by the addition of multiple inputs and multiple feedbacks, works as a neuron. The PGA, which controls the network connection, works as nodes of dendritic trees. This simulator, which has 16 neurons and 32 32 network connections, is designed on a chip (4.73mm 4.73mm), and its basic operations such as synchronization, an oscillatory associative memory, and FM interactions are confirmed using circuit simulator SPICE.

  • An Evolvable Hardware Chip for a Prosthetic-Hand Controller--New Reconfigurable Hardware Paradigm--

    Isamu KAJITANI  Masaya IWATA  Nobuyuki OTSU  Tetsuya HIGUCHI  

     
    PAPER

      Page(s):
    882-890

    This paper presents a new reconfigurable hardware paradigm, called evolvable hardware (EHW), and its application to the biomedical engineering problem of an artificial hand controller. Evolvable hardware is based on the idea of combining a reconfigurable hardware device with an artificial intelligence robust search technique called genetic algorithms (GAs) to execute reconfiguration autonomously. The first version of the EHW chip was designed in 1998, and this paper describes the latest improvements to the EHW chip, as well as outlining its architecture and the hardware implementation of the GA operations. Execution speed for genetic operations is shown to be about 38.7 times faster with the hardware implementation than with software program running on an AMD Athlon processor (1.2GHz). As an application of the EHW chip, this paper introduces a controller for a multi-functional prosthetic-hand, and presents experimental data in which a practical myoelectric pattern classification rate of 97.8% was achieved through the application of the EHW chip.

  • Prototyping of a 5 GHz WLAN Reconfigurable System-on-Chip

    Spyridon BLIONAS  Konstantinos MASSELOS  Chrissavgi DRE  Christos DROSOS  Fragkiskos IEROMNIMON  Dimitris METAFAS  Thanasis PAGONIS  Aristodemos PNEVMATIKAKIS  Anna TATSAKI  Theodor TRIMIS  Adamandios VONTZALIDIS  

     
    PAPER

      Page(s):
    891-900

    In this paper the development of the prototyping platform of a partly reconfigurable System-on-Chip (SoC) for wireless LANs, is described. It is designed to realize both HIPERLAN/2 and IEEE 802.11a wireless LAN systems. The current version of the system includes Mobile Terminal and AP functionality only for indoor use. Future firmware versions (configurations for its reconfigurable part) will upgrade system's functionality to allow its operation in outdoor environments and in wireless point-to-point links. The target System-on-Chip implementation platform will include instruction set processor cores, ASIC blocks and embedded reconfigurable blocks to achieve an optimal balance between implementation efficiency (area, power, performance) and flexibility. The system's prototype is developed on the ARM integrator platform and all firmware versions will be verified before ASIC prototyping.

  • Evaluating Online Hot Instruction Sequence Profilers for Dynamically Reconfigurable Functional Units

    Takanori HAYASHIDA  Kazuaki MURAKAMI  

     
    PAPER

      Page(s):
    901-909

    Online profiling methodologies are studied for exploiting dynamic optimization. On a dynamic optimizable system with online profilers, it has to get accurate profile in early step of the program execution for effective execution. However, for getting more effective profile by online profiling, it has to satisfy "Rapidness" and "Accuracy". They are conflicted requirements. Therefore, it has to choose trade-off point at implementation. We focused into online Hot Instruction Sequence (HIS) profiler to exploit reconfigurable functional units. To circumstantiate the effectiveness of online HIS profiling, we build some evaluation models for experimental evaluation. Our profiler models are SC/DM, SC/FA and JC/DM. These models have different policy of event counting and table lookup. Our event counting policies are simple-counting or jumble-counting. On the other hand, table lookup policies are direct-map or full-associative. In our experimental evaluation, SC/FA and JC/DM models scored higher accuracy than SC/DM. The JC/DM model is able to implement by lower cost for table lookup, but it scored high accuracy comparable to SC/FA.

  • Regular Section
  • B-Ternary Asynchronous Digital System under Relativity Delay

    Yasunori NAGATA  Masao MUKAIDONO  

     
    PAPER-Computer System Element

      Page(s):
    910-919

    Some of the recent digital systems have a serious clock skew problem due to huge hardware implementation and high-speed operation in VLSI's. To overcome this problem, clock distribution techniques and, more notably, asynchronous system design methodologies have been investigated. Since the latest asynchronous digital systems use two-rail logic with two-phase data transfer manner, more than two-fold hardware is required in comparison with the synchronous system. In this article, we present a design of asynchronous digital system which is based on B-ternary logic that can process binary data. The system which is based on speed-independent mode consists of data-path and its controller. Then we provide B-ternary two-phase binary data processing in the data-path and its control procedure with hand-shake protocol. To implement the system some functional elements are presented, that is, a ternary-in/binary-out register with request/acknowledge circuits and a control unit. These functional elements are fabricated with ternary NOR, NAND, INV gates and ternary-in/binary-out D-FF (D-elements). The B-ternary based asynchronous circuit has less interconnections, achives race-free operations and makes use of conventional binary powerful design tools. Particularly, we extend the speed-independent delay model to relativity delays in order to reduce hardware overhead of checking memory stability in the system. As a concrete example, a carry-completion type asynchronous adder system is demonstrated under extended speed-independent mode to show the validity of the extension.

  • Relaxing Constraints due to Data and Control Dependences

    Katsuhiko METSUGI  Kazuaki MURAKAMI  

     
    PAPER-Computer Systems

      Page(s):
    920-928

    TLSP (Thread-Level Speculative Parallel processing) architecture is a growing processor architecture. Parallelism of a program executed on this architecture is ruled by the combination of techniques which relax data dependences. In this paper, we evaluate the limits of parallelism of the TLSP architecture by using abstract machine models. We have three major results. First, if we use solely each technique which relaxes data dependences, "renaming" has a large effect on the TLSP architecture. Second, combinatorial use of "memory disambiguation" and "renaming" leads to huge parallelism. Third, constant effects are obtained by concurrent use of "value prediction" and other techniques.

  • A New Texture Feature Based on PCA Pattern Maps and Its Application to Image Retrieval

    Xiang-Yan ZENG  Yen-Wei CHEN  Zensho NAKAO  Hanqing LU  

     
    PAPER-Pattern Recognition

      Page(s):
    929-936

    We propose a novel pixel pattern-based approach for texture classification, which is independent of the variance of illumination. Gray scale images are first transformed into pattern maps in which edges and lines, used for characterizing texture information, are classified by pattern matching. We employ principal component analysis (PCA) which is widely applied to feature extraction. We use the basis functions learned through PCA as templates for pattern matching. Using PCA pattern maps, the feature vector is comprised of the numbers of the pixels belonging to a specific pattern. The effectiveness of the new feature is demonstrated by applications to the image retrievals of the Brodatz texture database. Comparisons with multichannel and multiresolution features indicate that the new feature is quite time saving, free of the influence of illumination, and has comparable accuracy.

  • Speech Enhancement Using Band-Dependent Spectral Estimators

    Ilyas POTAMITIS  Nikos FAKOTAKIS  George KOKKINAKIS  

     
    PAPER-Speech and Hearing

      Page(s):
    937-946

    Our work introduces a speech enhancement algorithm that modifies on-line the spectral representation of degraded speech to approximate the spectral coefficients of high quality speech. The proposed framework is based on the application of Discrete Fourier Transform (DFT) to a large ensemble of clean speech frames and the estimation of parametric, heavy-tail non-Gaussian probability distributions for the spectral magnitude. Each clean spectral band possesses a unique pdf. This is selected according to the smallest Kullback-Leibler divergence between each candidate heavy-tail pdf and the non-parametric pdf of the magnitude of each spectral band of the clean ensemble. The parameters of the distributions are derived by Maximum Likelihood Estimation (MLE). A maximum a-posteriori (MAP) formulation of the degraded spectral bands leads to soft threshold functions, optimally derived from the statistics of each spectral band and effectively reducing white and slowly varying coloured Gaussian noise. We evaluate the new algorithm on the task of improving the quality of speech perception as well as Automatic Speech Recognition (ASR) and demonstrate its robustness at SNRs as low as 0 dB.

  • 3-D Modeling of Real World by Fusing Multi-View Range Data and Texture Images

    Conny GUNADI  Hiroyuki SHIMIZU  Kazuya KODAMA  Kiyoharu AIZAWA  

     
    PAPER-Image Processing, Image Pattern Recognition

      Page(s):
    947-955

    Construction of large-scale virtual environment is gaining more attentions for its applications in virtual mall, virtual sightseeing, tele-presence, etc. This paper presents a framework for building a realistic virtual environment from geometry-based approach. We propose an algorithm to construct a realistic 3-D model from multi-view range data and multi-view texture images. The proposed method tries to adopt the result of region segmentation of range images in some phases of the modeling process. It is shown that the relations obtained from region segmentation are quite effective in improving the result of registration as well as mesh merging.

  • Human Face Extraction and Recognition Using Radial Basis Function Networks

    Kiminori SATO  Nan HE  Yukitoshi TAKAHASHI  

     
    PAPER-Image Processing, Image Pattern Recognition

      Page(s):
    956-963

    Partial face images, e.g., eyes, nose, and ear images are significant for face recognition. In this paper, we present a method for partial face extraction and recognition based on Radial Basis Function (RBF) networks. Focus has been centered on using ear images because they are not influenced by facial expression, and the influences of aging are negligible. Original human side face image with 320240 pixels is input, and then the RBF network locates the ear and extracts it with a 200120 pixel image. Next, another RBF network is constructed for the purpose of recognition. An algorithm that determines the radius of an RBF function is proposed. Dynamic radius, so called as compared to static one, is found through the algorithm that makes RBF functions adaptable to the training samples. We built a database that contains 600 side face images, from 100 people, to test the method and the results of both extraction and recognition are satisfied.

  • Automatic Feature Extraction from Breast Tumor Images Using Artificial Organisms

    Hironori OKII  Takashi UOZUMI  Koichi ONO  Hong YAN  

     
    PAPER-Medical Engineering

      Page(s):
    964-975

    In this paper, we propose a new computer-aided diagnosis system which can extract specific features from hematoxylin and eosin (HE)-stained breast tumor images and evaluate the type of tumor using artificial organisms. The gene of the artificial organisms is defined by three kinds of texture features, which can evaluate the specific features of the tumor region in the image. The artificial organisms move around in the image and investigate their environmental conditions during the searching process. When the target pixel is regarded as a tumor region, the organism obtains energy and produces offspring; organisms in other regions lose energy and die. The searching process is iterated until the 30th generation; as a result, tumor regions are filled with artificial organisms. Whether the detected tumor is benign or malignant is evaluated based on the combination of selected genes. The method developed was applied to 27 test cases and the distinction between benign and malignant tumors by the artificial organisms was successful in about 90% of tumor images. In this diagnosis support system, the combination of genes, which represents specific features of detected tumor region, is selected automatically for each tumor image during the searching process.

  • Detection of Summative Global Predicates

    Loon-Been CHEN  I-Chen WU  

     
    LETTER-Theory and Models of Software

      Page(s):
    976-980

    In many distributed systems, tokens are fundamental tools to manage resources shared by processes. Thus, monitoring tokens has become a significant problem in developing the distributed programs. This paper formulates the problems of monitoring tokens in terms of detecting the special global predicates, called summative global predicates. In this paper, several algorithms to detect various summative global predicates are developed and their time complexities are discussed.

  • Calibration Method by Image Registration with Synthetic Image of 3D Model

    Toru TAMAKI  Masanobu YAMAMOTO  

     
    LETTER-Image Processing, Image Pattern Recognition

      Page(s):
    981-985

    We propose a method for camera calibration based on image registration. This method registers two images; one is a real image captured by a camera with a calibration object with known shape and texture, and the other is a synthetic image containing the object. The proposed method estimates the parameters of the rotation and translation of the object by using the depth information of the synthetic image. The Gauss-Newton method is used to minimize the residuals of intensities of the two images. The proposed method does not depend on initial values of the minimization, and is applicable to images with much noise. Experimental results using real images demonstrate the robustness against initial state and noise on the image.