The search functionality is under construction.

Author Search Result

[Author] Hiroyuki OCHI(24hit)

1-20hit(24hit)

  • Fault Tolerant Dynamic Reconfigurable Device Based on EDAC with Rollback

    Kentaro NAKAHARA  Shin'ichi KOUYAMA  Tomonori IZUMI  Hiroyuki OCHI  Yukihiro NAKAMURA  

     
    PAPER-VLSI Architecture

      Vol:
    E89-A No:12
      Page(s):
    3652-3658

    Reconfigurable devices are expected to be utilized in such mission-critical fields as space development and undersea cables, because system updates and pseudo-repair can be achieved remotely by reconfiguring. However, conventional reconfigurable devices suffer from memory-bit upset caused by charged particles in space which results in fatal system problems. In this paper, we propose an architecture of a fault-tolerant reconfigurable device. The proposed device is divided into "autonomous-repair cells" with embedded control circuits. The autonomous-repair cell proposed in this paper is based on error detection and correction (EDAC) and uses hardware and time redundancy. From evaluation, it is shown that the proposed architecture achieves sufficient reliability against configuration memory upset. Trade-offs between performance and cost are also analyzed.

  • Development of an IP Library of IEEE-754-Standard Single-Precision Floating-Point Dividers

    Hiroyuki OCHI  Tatsuya SUZUKI  Sayaka MATSUNAGA  Yoichi KAWANO  Takao TSUDA  

     
    PAPER-IP Design

      Vol:
    E86-A No:12
      Page(s):
    3020-3027

    Floating-point units (FPUs) are indispensable in processors, 3D-graphic engines, etc. To improve design productivity of these LSIs, FPU IPs are strongly desired. However, it is impossible to cover wide range of needs by an FPU IP, because there are various kind of options in specifications (e.g., operating frequency, latency, and ability of pipeline operation) and implementations (e.g., hardware algorithms). Thus, multiple IPs are needed even for the same functionality. In this paper, we propose to build an IP Library which consists of large number of FPU IPs with various kind of specifications and implementations, and which has catalogue data that shows not only specifications but also post-layout area and power dissipation of each IP. As the first step of the project, we have developed an IP Library targeted to Rohm 0.35 µm triple-metal process, which consists of 20 IPs for IEEE-754-standard single-precision floating-point division with 5 operating frequencies (50 MHz, 75 MHz, 100 MHz, 125 MHz, and 150 MHz), with two options whether pipelined or not, and with two hardware algorithms (the restoring method and the SRT method). We have also developed a catalogue for the IP Library, which shows post-layout area and power dissipation as well as specification of each IP. We have introduced two metrics "performance-area ratio (MFLOPS/mm2)" and "performance-power ratio (MFLOPS/W)" to afford a good insight into efficiency of implementations. From the catalogue data, the restoring method is, on the average, 1.4 times and 2.3 times better than the SRT method in terms of performance-area ratio and performance-power ratio, respectively. The developed catalogue is usable not only for selection of the optimal IP for a specific application, but also for quantitative analysis at the early stage of architecture design. It is also expected that the catalogue data based on an actual process technology is valuable for education.

  • ASAver.1: An FPGA-Based Education Board for Computer Architecture/System Design

    Hiroyuki OCHI  Yoko KAMIDOI  Hideyuki KAWABATA  

     
    PAPER

      Vol:
    E80-A No:10
      Page(s):
    1826-1833

    This paper proposes a new approach that makes it possible for every undergraduate student to perform experiments of developing a Ipipelined RISC processor within limited time available for the course. The approach consists of 4 steps. At the first step, every student implements by himself/herself a pipelined RISC processor which is based on a given, very simple model; it has separate buses for instruction and data memory ("Harvard architecture") to avoid structural hazard, while it completely ignores data control hazards to make implementation easy. Although it is such a "defective" processor, we can test its functionality by giving object code containing sufficient amount of NOP instructions to avoid hazards. At the second step, NOP instructions are deleted and behavior of the developed processor is observed carefully to understand data and control hazards. At the third step, benchmark problems are provided, and every student challenges to improve its performance. Finally every student is requested to present how he/she improved the processor. This paper also describes a new educational FPGA board ASAver.1 which is useful for experiments from introductory class to computer architecture/system class. As a feasibility study, a 16-bit pipelined RISC processor "ASAP-O" has been developed which has eight 16-bit general purpose registers, a 16-bit program counter, and a zero flag, with 10 essential instructions.

  • A Variability-Aware Energy-Minimization Strategy for Subthreshold Circuits

    Junya KAWASHIMA  Hiroshi TSUTSUI  Hiroyuki OCHI  Takashi SATO  

     
    PAPER-Device and Circuit Modeling and Analysis

      Vol:
    E95-A No:12
      Page(s):
    2242-2250

    We investigate a design strategy for subthreshold circuits focusing on energy-consumption minimization and yield maximization under process variations. The design strategy is based on the following findings related to the operation of low-power CMOS circuits: (1) The minimum operation voltage (VDDmin) of a circuit is dominated by flip-flops (FFs), and VDDmin of an FF can be improved by upsizing a few key transistors, (2) VDDmin of an FF is stochastically modeled by a log-normal distribution, (3) VDDmin of a large circuit can be efficiently estimated by using the above model, which eliminates extensive Monte Carlo simulations, and (4) improving VDDmin may substantially contribute to decreasing energy consumption. The effectiveness of the proposed design strategy has been verified through circuit simulations on various circuits, which clearly show the design tradeoff between voltage scaling and transistor sizing.

  • Nonvolatile Storage Cells Using FiCC for IoT Processors with Intermittent Operations

    Yuki ABE  Kazutoshi KOBAYASHI  Jun SHIOMI  Hiroyuki OCHI  

     
    PAPER

      Pubricized:
    2023/04/13
      Vol:
    E106-C No:10
      Page(s):
    546-555

    Energy harvesting has been widely investigated as a potential solution to supply power for Internet of Things (IoT) devices. Computing devices must operate intermittently rather than continuously, because harvested energy is unstable and some of IoT applications can be periodic. Therefore, processors for IoT devices with intermittent operation must feature a hibernation mode with zero-standby-power in addition to energy-efficient normal mode. In this paper, we describe the layout design and measurement results of a nonvolatile standard cell memory (NV-SCM) and nonvolatile flip-flops (NV-FF) with a nonvolatile memory using Fishbone-in-Cage Capacitor (FiCC) suitable for IoT processors with intermittent operations. They can be fabricated in any conventional CMOS process without any additional mask. NV-SCM and NV-FF are fabricated in a 180nm CMOS process technology. The area overhead by nonvolatility of a bit cell are 74% in NV-SCM and 29% in NV-FF, respectively. We confirmed full functionality of the NV-SCM and NV-FF. The nonvolatile system using proposed NV-SCM and NV-FF can reduce the energy consumption by 24.3% compared to the volatile system when hibernation/normal operation time ratio is 500 as shown in the simulation.

  • Hardware Accelerator for Run-Time Learning Adopted in Object Recognition with Cascade Particle Filter

    Hiroki SUGANO  Hiroyuki OCHI  Yukihiro NAKAMURA  Ryusuke MIYAMOTO  

     
    PAPER-Image Processing

      Vol:
    E92-A No:11
      Page(s):
    2801-2808

    Recently, many researchers tackle accurate object recognition algorithms and many algorithms are proposed. However, these algorithms have some problems caused by variety of real environments such as a direction change of the object or its shading change. The new tracking algorithm, Cascade Particle Filter, is proposed to fill such demands in real environments by constructing the object model while tracking the objects. We have been investigating to implement accurate object recognition on embedded systems in real-time. In order to apply the Cascade Particle Filter to embedded applications such as surveillance, automotives, and robotics, a hardware accelerator is indispensable because of limitations in power consumption. In this paper we propose a hardware implementation of the Discrete AdaBoost algorithm that is the most computationally intensive part of the Cascade Particle Filter. To implement the proposed hardware, we use PICO Express, a high level synthesis tool provided by Synfora, for rapid prototyping. Implementation result shows that the synthesized hardware has 1,132,038 transistors and the die area is 2,195 µm 1,985 µm under a 0.180 µm library. The simulation result shows that total processing time is about 8.2 milliseconds at 65 MHz operation frequency.

  • An Exact Minimization of AND-EXOR Expressions Using Encoded MRCF

    Hiroyuki OCHI  

     
    LETTER

      Vol:
    E79-A No:12
      Page(s):
    2131-2133

    In this paper, an exact-minimization method for an AND-EXOR expression (ESOP) using O-suppressed binary decision diagrams (ZBDDs) is considered. The proposed method is an improvement of Sasao's MRCF-based method. From experimental results, it is shown that required ZBDD size is reduced to 1/3 in the best case compared with the MRCF-based method.

  • Range Limiter Using Connection Bounding Box for SA-Based Placement of Mixed-Grained Reconfigurable Architecture

    Takashi KISHIMOTO  Wataru TAKAHASHI  Kazutoshi WAKABAYASHI  Hiroyuki OCHI  

     
    PAPER

      Vol:
    E99-A No:12
      Page(s):
    2328-2334

    In this paper, we propose a novel placement algorithm for mixed-grained reconfigurable architectures (MGRAs). MGRA consists of coarse-grained and fine-grained clusters, in order to implement a combined digital systems of high-speed data paths with multi-bit operands and random logic circuits for state machines and bit-wise operations. For accelerating simulated annealing based FPGA placement algorithm, range limiter has been proposed to control the distance of two blocks to be interchanged. However, it is not applicable to MGRAs due to the heterogeneous structure of MGRAs. Proposed range limiter using connection bounding box effectively keeps the size of range limiter to encourage moves across fine-grain blocks in non-adjacent clusters. From experimental results, the proposed method achieved 47.8% reduction of cost in the best case compared with conventional methods.

  • Reliability-Configurable Mixed-Grained Reconfigurable Array Supporting C-Based Design and Its Irradiation Testing

    Hiroaki KONOURA  Dawood ALNAJJAR  Yukio MITSUYAMA  Hajime SHIMADA  Kazutoshi KOBAYASHI  Hiroyuki KANBARA  Hiroyuki OCHI  Takashi IMAGAWA  Kazutoshi WAKABAYASHI  Masanori HASHIMOTO  Takao ONOYE  Hidetoshi ONODERA  

     
    PAPER-High-Level Synthesis and System-Level Design

      Vol:
    E97-A No:12
      Page(s):
    2518-2529

    This paper proposes a mixed-grained reconfigurable architecture consisting of fine-grained and coarse-grained fabrics, each of which can be configured for different levels of reliability depending on the reliability requirement of target applications, e.g. mission-critical applications to consumer products. Thanks to the fine-grained fabrics, the architecture can accommodate a state machine, which is indispensable for exploiting C-based behavioral synthesis to trade latency with resource usage through multi-step processing using dynamic reconfiguration. In implementing the architecture, the strategy of dynamic reconfiguration, the assignment of configuration storage and the number of implementable states are key factors that determine the achievable trade-off between used silicon area and latency. We thus split the configuration bits into two classes; state-wise configuration bits and state-invariant configuration bits for minimizing area overhead of configuration bit storage. Through a case study, we experimentally explore the appropriate number of implementable states. A proof-of-concept VLSI chip was fabricated in 65nm process. Measurement results show that applications on the chip can be working in a harsh radiation environment. Irradiation tests also show the correlation between the number of sensitive bits and the mean time to failure. Furthermore, the temporal error rate of an example application due to soft errors in the datapath was measured and demonstrated for reliability-aware mapping.

  • A Cost-Effective Selective TMR for Coarse-Grained Reconfigurable Architectures Based on DFG-Level Vulnerability Analysis

    Takashi IMAGAWA  Hiroshi TSUTSUI  Hiroyuki OCHI  Takashi SATO  

     
    PAPER

      Vol:
    E96-C No:4
      Page(s):
    454-462

    This paper proposes a novel method to determine a priority for applying selective triple modular redundancy (selective TMR) against single event upset (SEU) to achieve cost-effective reliable implementation of application circuits onto coarse-grained reconfigurable architectures (CGRAs). The priority is determined by an estimation of the vulnerability of each node in the data flow graph (DFG) of the application circuit. The estimation is based on a weighted sum of the node parameters which characterize impact of the SEU in the node on the output data. This method does not require time-consuming placement-and-routing processes, as well as extensive fault simulations for various triplicating patterns, which allows us to identify the set of nodes to be triplicated for minimizing the vulnerability under given area constraint at the early stage of design flow. Therefore, the proposed method enables us efficient design space exploration of reliability-oriented CGRAs and their applications.

  • Efficient Memory Organization Framework for JPEG2000 Entropy Codec

    Hiroki SUGANO  Takahiko MASUZAKI  Hiroshi TSUTSUI  Takao ONOYE  Hiroyuki OCHI  Yukihiro NAKAMURA  

     
    PAPER-Realization

      Vol:
    E92-A No:8
      Page(s):
    1970-1977

    The encoding/decoding process of JPEG2000 requires much more computation power than that of conventional JPEG mainly due to the complexity of the entropy encoding/decoding. Thus usually multiple entropy codec hardware modules are implemented in parallel to process the entropy encoding/decoding. This module, however, requests many small-size memories to store intermediate data, and when multiple modules are implemented on a chip, employment of the large number of SRAMs increases difficulty of whole chip layout. In this paper, an efficient memory organization framework for the entropy encoding/decoding module is proposed, in which not only existing memory organizations but also our proposed novel memory organization methods are attempted to expand the design space to be explored. As a result, the efficient memory organization for a target process technology can be explored.

  • A Simulation Platform for Designing Cell-Array-Based Self-Reconfigurable Architecture

    Shin'ichi KOUYAMA  Tomonori IZUMI  Hiroyuki OCHI  Yukihiro NAKAMURA  

     
    PAPER

      Vol:
    E90-A No:4
      Page(s):
    784-791

    Recently, self-reconfigurable devices which can be partially reprogrammed by other part of the same device have been proposed. However, since conventional self-reconfigurable devices are LUT-array-based fine-grained devices, their time efficiency is spoiled by overhead for reconfiguration time to load large amount of configuration data. Therefore, we have to improve architectures. At the architecture design phase, it is difficult to estimate the performance, including reconfiguration overhead, of self-reconfigurable devices by static analysis, since it depends on many architecture parameters and unpredictable run-time behavior. In this paper, we propose a simulation-based platform for design exploration of self-reconfigurable devices. As a demonstration of the proposed platform, we implement an adaptive load distribution model on the devices of various reconfiguration granularities and evaluate performance of the devices.

  • Datapath-Layout-Driven Design for Low-Power Standard-Cell LSI Implementation

    Takahiro KAKIMOTO  Hiroyuki OCHI  Takao TSUDA  

     
    LETTER-VLSI Design

      Vol:
    E85-A No:12
      Page(s):
    2795-2798

    As a design flow for low-power FPGA implementation, Datapath-Layout-Driven Design (DLDD) has been proposed. This letter reports the effect of DLDD for standard-cell-based ASIC implementation, and proposes necessary improvements. Experimental results shows that about 8.3% reduction of power dissipation is achieved in the best case.

  • A Localization Scheme for Sensor Networks Based on Wireless Communication with Anchor Groups

    Hiroyuki OCHI  Shigeaki TAGASHIRA  Satoshi FUJITA  

     
    PAPER

      Vol:
    E89-D No:5
      Page(s):
    1614-1621

    In this paper, we propose a new localization scheme for wireless sensor networks consisting of a huge number of sensor nodes equipped with simple wireless communication devices such as wireless LAN and Bluetooth. The proposed scheme is based on the Point-In-Triangle (PIT) test proposed by He et al. The scheme is actually implemented by using Bluetooth devices of Class 2 standard, and the performance of the scheme is evaluated in an actual environment. The result of experiments indicates that the proposed scheme could realize a localization with an error of less than 2 m.

  • An Algorithm for Generating Generic BDDs

    Tetsushi KATAYAMA  Hiroyuki OCHI  Takao TSUDA  

     
    PAPER-Logic Synthesis

      Vol:
    E83-A No:12
      Page(s):
    2505-2512

    Binary Decision Diagrams (BDDs) are graph representation of Boolean functions. In particular, Ordered BDDs (OBDDs) are useful in many situations, because they provide canonical representation and they are manipulated efficiently. BDD packages which automatically generate OBDDs have been developed, and they are now widely used in logic design area, including formal verification and logic synthesis. Synthesis of pass-transistor circuits is one of successful applications of such BDD packages. Pass-transistor circuits are generated from BDDs by mapping each node to a selector which consists of two or four pass transistors. If circuits are generated from smaller BDDs, generated circuits have smaller number of transistors and hence save chip area and power consumption. In this paper, more generic BDDs which have no restrictions in variable ordering and variable appearance count on its paths are called Generic BDDs (GBDDs), and an algorithm for generating GBDDs is proposed for the purpose of synthesis of pass-transistor circuits. The proposed algorithm consists of two steps. At the first step, parse trees (PTs) for given Boolean formulas are generated, where a PT is a directed tree representation of Boolean formula(s) and it consists of literal nodes and operation nodes. In this step, our algorithm attempts to reduce the number of literal nodes of PTs. At the second step, a GBDD is generated for the PTs using Concatenation Method, where Concatenation Method generates a GBDD by connecting GBDDs vertically. In this step, our algorithm attempts to share isomorphic subgraphs. In experiments on ISCAS'89 and MCNC benchmark circuits, our program successfully generated 32 GBDDs out of 680 single-output functions and 4 GBDDs out of 49 multi-output functions whose sizes are smaller than OBDDs. GBDD size is reduced by 23.1% in the best case compared with OBDD.

  • Autonomous Repair Fault Tolerant Dynamic Reconfigurable Device

    Kentaro NAKAHARA  Shin'ichi KOUYAMA  Tomonori IZUMI  Hiroyuki OCHI  Yukihiro NAKAMURA  

     
    PAPER-Embedded, Real-Time and Reconfigurable Systems

      Vol:
    E91-A No:12
      Page(s):
    3612-3621

    Recently, reconfigurable devices are widely used in the fields of small amount production and trial production. They are also expected to be utilized in such mission-critical fields as space development, because system update and pseudo-repair can be achieved remotely by reconfiguring. However, in the case of conventional reconfigurable devices, configuration memory upsets caused by radiation and alpha particles reconfigure the device unpredictably, resulting in fatal system failures. Therefore, a reconfigurable device with high fault-tolerance against configuration upsets is required. In this paper, we propose an architecture of a fault-tolerant reconfigurable device that autonomously repairs configuration upsets by itself without interrupting system operations. The device consists of a 2D array of "Autonomous-Repair Cells" each of which repairs its upsets autonomously. The architecture has a scalability in fault tolerance; a finer-grained Autonomous-Repair Cell provides higher fault-tolerance. To determine the architecture, we analyze four autonomous repair techniques of the cell experimentally. Then, two autonomous repair techniques, simple multiplexing (S.M.) and memory multiplexing (M.M.), are applied; the former to programmable logics and the latter to cell-to-cell routing resources. Through evaluation, we show that proposed device achieves more than 10 years average lifetime against configuration upsets even in a severe situation such as a satellite orbit.

  • An Integrated Approach of Variable Ordering and Logic Mapping into LUT-Array-Based PLD

    Tomonori IZUMI  Shin'ichi KOUYAMA  Hiroyuki OCHI  Yukihiro NAKAMURA  

     
    PAPER

      Vol:
    E88-A No:4
      Page(s):
    907-914

    This paper presents an approach of logic mapping into LUT-Array-Based PLD where Boolean functions in the form of the sum of generalized complex terms (SGCTs) can be mapped directly. While previous mapping approach requires predetermined variable ordering, our approach performs mapping and variable reordering simultaneously. For the purpose, we propose a directed acyclic graph based on the multiple-valued decision diagram (MDD) and an algorithm to construct the graph. Our algorithm generates candidates of SGCT expressions for each node in a bottom-up manner and selects the variables in the current level by evaluating the sizes of SGCT expressions directly. Experimental results show that our approach reduces the number of terms maximum to 71 percent for the MCNC benchmark circuits.

  • Reliability Evaluation Environment for Exploring Design Space of Coarse-Grained Reconfigurable Architectures

    Takashi IMAGAWA  Masayuki HIROMOTO  Hiroyuki OCHI  Takashi SATO  

     
    PAPER-High-Level Synthesis and System-Level Design

      Vol:
    E93-A No:12
      Page(s):
    2524-2532

    This paper proposes a reliability evaluation environment for coarse-grained reconfigurable architectures. This environment is designed so that it can be easily extended to different target architectures and applications by automating the generation of the simulation inputs such as HDL codes for fault injection and configuration information. This automation enables us to explore a huge design space in order to efficiently analyze area/reliability trade-offs and find the best solution. This paper also shows demonstrative examples of the design space exploration of coarse-grained reconfigurable architectures using the proposed environment. Through the demonstrations, we discuss relationship between coarse-grained architectures and reliability, which has not yet been addressed in existing literatures and show the feasibility of the proposed environment.

  • A Zero-Suppressed BDD Package with Pruning and Its Application to GRM Minimization

    Hiroyuki OCHI  

     
    PAPER

      Vol:
    E79-A No:12
      Page(s):
    2134-2139

    Recently, various efficient algorithms for solving combinatorial optimization problems using BDD-based set manipulation techniques have been developed. Minato proposed O-suppressed BDDs (ZBDDs) which is suitable for set manipulation, and it is utilized for various search problems. In terms of practical limits of space, however, there are still many search problems which are solved much better by using conventional branch-and-bound techniques than by using BDDs or ZBDDs, while the ability of conventional branch-and-bound approaches is limited by computation time. In this paper, an extension of APPLY operation, named APPRUNE (APply + PRUNE) operation, is proposed, which performs APPLY operation (ZBDD construction) and pruning simultaneously in order to reduce the required space for intermediate ZBDDs. As a prototype, a specific algorithm of APPRUNE operation is shown by assuming that the given condition for pruning is a threshold function, although it is expected that APPRUNE operation will be more effective if more sophisticated condition are considered. To reduce size of ZBDDs in intermediate steps, this paper also pay attention to the number of cared variables. As an application, an exact-minimization algorithm for generalized Reed-Muller expressions (GRMs) is implemented. From experimental results, it is shown that time and memory usage improved 8.8 and 3.4 times, respectively, in the best case using APPRUNE operation. Results on generating GRMs of exact-minimum number of not only product terms but also literals is also shown.

  • Formal Design Verification of Combinational Circuits Specified by Recurrence Equations

    Hiroyuki OCHI  Shuzo YAJIMA  

     
    PAPER-Design Verification

      Vol:
    E79-D No:10
      Page(s):
    1431-1435

    In order to apply formal design verification, it is necessary to describe formally and correctly the specification of the circuit under verification. Especially when we apply conventional OBDD-based logic comparison method for verifying combinational circuits, another correct" logic circuits or Boolean formulae must be given as the specification. It is desired to develop an efficient automatic design verification method which interprets specification that can be described easier. This paper provides a new verification method which is useful for combinational circuits such as arithmetic circuits. The proposed method efficiently verifies whether a designed circuit satisfies a specification given by recurrence equations. This enables us to describe easily an error-free specification for arithmetic circuits. To perform verification efficiently using an ordinary OBDD package, an efficient truth-value rotation algorithm is developed. The truthvalue rotation algorithm efficiently generates an OBDD representing f(x + 1 (mod 2n)) from a given OBDD representing f(x). By experiments on SPARC station 10 model 51, it takes 180 secs to generate an OBDD for designed circuit of 23-bit square function, and additional 60 secs is sufficient to finish verifying that it satisfies the specification given by recurrence equations.

1-20hit(24hit)