The search functionality is under construction.

Author Search Result

[Author] Michitaka KAMEYAMA(65hit)

1-20hit(65hit)

  • Implementation of Voltage-Mode/Current-Mode Hybrid Circuits for a Low-Power Fine-Grain Reconfigurable VLSI

    Xu BAI  Michitaka KAMEYAMA  

     
    PAPER-Integrated Electronics

      Vol:
    E97-C No:10
      Page(s):
    1028-1035

    This paper proposes low-power voltage-mode/current-mode hybrid circuits to realize an arbitrary two-variable logic function and a full-adder function. The voltage and current mode can be selected for low-power operations at low and high frequency, respectively, according to speed requirement. An nMOS pass transistor network is shared to realize voltage switching and current steering for the voltage- and current-mode operations, respectively, which leads to high utilization of the hardware resources. As a result, when the operating frequency is more than 1.15,GHz, the current mode of the hybrid logic circuit is more power-efficient than the voltage mode. Otherwise, the voltage mode is more power-efficient. The power consumption of the hybrid two-variable logic circuit is lower than that of the conventional two-input look-up table (LUT) using CMOS transmission gates, when the operating frequency is more than 800,MHz. The delay and area of the hybrid two-variable logic circuit are increased by only 7% and 13%, respectively

  • Design of High-Performance Asynchronous Pipeline Using Synchronizing Logic Gates

    Zhengfan XIA  Shota ISHIHARA  Masanori HARIYAMA  Michitaka KAMEYAMA  

     
    PAPER-Integrated Electronics

      Vol:
    E95-C No:8
      Page(s):
    1434-1443

    This paper introduces a novel design method of an asynchronous pipeline based on dual-rail dynamic logic. The overhead of handshake control logic is greatly reduced by constructing a reliable critical datapath, which offers the pipeline high throughput as well as low power consumption. Synchronizing Logic Gates (SLGs), which have no data dependency problem, are used in the design to construct the reliable critical datapath. The design targets latch-free and extremely fine-grain or gate-level pipeline, where the depth of every pipeline stage is only one dual-rail dynamic logic. HSPICE simulation results, in a 65 nm design technology, indicate that the proposed design increases the throughput by 120% and decreases the power consumption by 54% compared with PS0, a classic dual-rail asynchronous pipeline implementation style, in 4-bit wide FIFOs. Moreover, this method is applied to design an array style multiplier. It shows that the proposed design reduces power by 37.9% compared to classic synchronous design when the workloads are 55%. A chip has been fabricated with a 44 multiplier function, which works well at 2.16G data-set/s (Post-layout simulation).

  • A Special-Purpose LSI for Inverse Kinematics Computation

    Michitaka KAMEYAMA  Takao MATSUMOTO  Hideki EGAMI  Tatsuo HIGUCHI  

     
    PAPER-Dedicated Processors

      Vol:
    E74-C No:11
      Page(s):
    3829-3837

    This paper presents a special-purpose LSI chip for inverse kinematics computation of robot manipulators. It is shown that inverse kinematic solutions of kinematically simple manipulators can be systematically described with the two-dimensional (2-D) vector rotation. The chip is fabricated with the 1.5-µm CMOS gate array. The arithmetic unit on the chip is designed using the COordinate Rotation DIgital Computer (CORDIC) algorithms, and it performs six types of operations based on the 2-D vector rotation at high speed. Pipelining is used to enhance the operating ratio of the unit to 100%. The computation time of a special purpose processor which is composed of the chip and a few memory chips is approximately 50 µs for a typical six degree-of-freedom manipulator. Moreover, the chip can be used for various types of manipulators, and the software development is very easy.

  • Quantum-Device-Oriented Multiple-Valued Logic System Based on a Super Pass Gate

    Xiaowei DENG  Takahiro HANYU  Michitaka KAMEYAMA  

     
    PAPER-Computer Hardware and Design

      Vol:
    E78-D No:8
      Page(s):
    951-958

    The investigation of device functions required from the systems point of view will be important for the development of the next generation of VLSI devices and systems. In this paper, a super pass transistor (SPT) model is presented as a quantum device candidate for future VLSI systems based on multiple-valued logic. A possible quantum device structure for the SPT model is also described, which employs the concepts of a lateral-resonant-tunneling quantum-dot transistor and a heterostructure field-effect transistor. Since it has the powerful capability of detecting multiple signal levels, the SPT will be useful for the implementation of highly compact multiple-valued VLSI systems. To exploit the functionality of the SPT, a super pass gate (SP-gate) corresponding to a single SPT is proposed as a multiple-valued universal logic module. The mathematical properties of the SP-gate are discussed. A design method for a multiple-valued SP-gate network is presented. An application of SP-gates to a multiple-valued image processing system is also demonstrated. The SP-gate network for the multiple-valued image processing system is evaluated in comparison with the corresponding NMOS implementation in terms of the number of transistors, interconnections and cascaded transistor stages. The size of a generalized series-parallel SP-gate network is also evaluated in comparison with a functionally equivalent multiple-valued series-parallel MOS pass transistor network. The results show that highly compact multiple-valued VLSI systems can be achieved if the SPT-model can be realized by an actual quantum device.

  • Data-Transfer-Aware Design of an FPGA-Based Heterogeneous Multicore Platform with Custom Accelerators

    Yasuhiro TAKEI  Hasitha Muthumala WAIDYASOORIYA  Masanori HARIYAMA  Michitaka KAMEYAMA  

     
    PAPER-VLSI Design Technology and CAD

      Vol:
    E98-A No:12
      Page(s):
    2658-2669

    For an FPGA-based heterogeneous multicore platform, we present the design methodology to reduce the total processing time considering data-transfer. The reconfigurability of recent FPGAs with hard CPU cores allows us to realize a single-chip heterogeneous processor optimized for a given application. The major problem in designing such heterogeneous processors is data-transfer between CPU cores and accelerator cores. The total processing time with data-transfers is modeled considering the overlap of computation time and data-transfer time, and optimal design parameters are searched for.

  • Code Assignment Algorithm for Highly Parallel Multiple-Valued Combinational Circuits Based on Partition Theory

    Saneaki TAMAKI  Michitaka KAMEYAMA  Tatsuo HIGUCHI  

     
    PAPER-Logic Design

      Vol:
    E76-D No:5
      Page(s):
    548-554

    Design of locally computable combinational circuits is a very important subject to implement high-speed compact arithmetic and logic circuits in VLSI systems. This paper describes a multiple-valued code assignment algorithm for the locally computable combinational circuits, when a functional specification for a unary operation is given by the mapping relationship between input and output symbols. Partition theory usually used in the design of sequential circuits is effectively employed for the fast search for the code assignment problem. Based on the partition theory, mathematical foundation is derived for the locally computable circuit design. Moreover, for permutation operations, we propose an efficient code assignment algorithm based on closed chain sets to reduce the number of combinations in search procedure. Some examples are shown to demonstrate the usefulness of the algorithm.

  • A Bit-Serial Reconfigurable VLSI Based on a Multiple-Valued X-Net Data Transfer Scheme

    Xu BAI  Michitaka KAMEYAMA  

     
    PAPER-Computer System

      Vol:
    E96-D No:7
      Page(s):
    1449-1456

    A multiple-valued data transfer scheme using X-net is proposed to realize a compact bit-serial reconfigurable VLSI (BS-RVLSI). In the multiple-valued data transfer scheme using X-net, two binary data can be transferred from two adjacent cells to one common adjacent cell simultaneously at each “X” intersection. One cell composed of a logic block and a switch block is connected to four adjacent cross points by four one-bit switches so that the complexity of the switch block is reduced to 50% in comparison with the cell of a BS-RVLSI using an eight nearest-neighbor mesh network (8-NNM). In the logic block, threshold logic circuits are used to perform threshold operations, and then their binary dual-rail voltage outputs enter a binary logic module which can be programmed to realize an arbitrary two-variable binary function or a bit-serial adder. As a result, the configuration memory count and transistor count of the proposed multiple-valued cell are reduced to 34% and 58%, respectively, in comparison with those of an equivalent CMOS cell. Moreover, its power consumption for an arbitrary 2-variable binary function becomes 67% at 800 MHz under the condition of the same delay time.

  • Architecture of a Stereo Matching VLSI Processor Based on Hierarchically Parallel Memory Access

    Masanori HARIYAMA  Haruka SASAKI  Michitaka KAMEYAMA  

     
    PAPER-Digital Circuits and Computer Arithmetic

      Vol:
    E88-D No:7
      Page(s):
    1486-1491

    This paper presents a VLSI processor for high-speed and reliable stereo matching based on adaptive window-size control of SAD(Sum of Absolute Differences) computation. To reduce its computational complexity, SADs are computed using multi-resolution images. Parallel memory access is essential for highly parallel image processing. For parallel memory access, this paper also presents an optimal memory allocation that minimizes the hardware amount under the condition of parallel memory access at specified resolutions.

  • Memory Allocation for Multi-Resolution Image Processing

    Yasuhiro KOBAYASHI  Masanori HARIYAMA  Michitaka KAMEYAMA  

     
    PAPER-VLSI Systems

      Vol:
    E91-D No:10
      Page(s):
    2386-2397

    Hierarchical approaches using multi-resolution images are well-known techniques to reduce the computational amount without degrading quality. One major issue in designing image processors is to design a memory system that supports parallel access with a simple interconnection network. The complexity of the interconnection network mainly depends on memory allocation; it maps pixels onto memory modules and determines the required number of memory modules. This paper presents a memory allocation method to minimize the number of memory modules for image processing using multi-resolution images. For efficient search, the proposed method exploits the regularity of window-type image processing. A practical example demonstrates that the number of memory modules is reduced to less than 14% that of conventional methods.

  • A Collision Detection Processor for Intelligent Vehicles

    Masanori HARIYAMA  Michitaka KAMEYAMA  

     
    PAPER

      Vol:
    E76-C No:12
      Page(s):
    1804-1811

    Since carelessness in driving causes a terrible traffic accident, it is an important subject for a vehicle to avoid collision autonomously. Real-time collision detection between a vehicle and obstacles will be a key target for the next-generation car electronics system. In collision detection, a large storage capacity is usually required to store the 3-D information on the obstacles lacated in a workspace. Moreover, high-computational power is essential not only in coordinate transformation but also in matching operation. In the proposed collision detection VLSI processor, the matching operation is drastically accelerated by using a Content-Addressable Memory (CAM) which evaluates the magnitude relationships between an input word and all the stored words in parallel. A new obstacle representation based on a union of rectangular solids is also used to reduce the obstacle memory capacity, so that the collision detection can be parformed only by parallel magnitude comparison. Parallel architecture using several identical processor elements (PEs) is employed to perform the coordinate transformation at high speed based on the COordinate Rotation DIgital Computation (CORDIC) algorithms. The collision detection time becomes 5.2 ms using 20 PEs and five CAMs with a 42-kbit capacity.

  • Multiple-Valued Logic-in-Memory VLSI Architecture Based on Floating-Gate-MOS Pass-Transistor Logic

    Takahiro HANYU  Michitaka KAMEYAMA  

     
    PAPER-Non-Binary Architectures

      Vol:
    E82-C No:9
      Page(s):
    1662-1668

    A new logic-in-memory VLSI architecture based on multiple-valued floating-gate-MOS pass-transistor logic is proposed to solve the communication bottleneck between memory and logic modules. Multiple-valued stored data are represented by the threshold voltage of a floating-gate MOS transistor, so that a single floating-gate MOS transistor is effectively employed to merge multiple-valued threshold-literal and pass-switch functions. As an application, a four-valued logic-in-memory VLSI for high-speed pattern recognition is also presented. The proposed VLSI detects a stored reference word with the minimum Manhattan distance between a 16-bit input word and 16-bit stored reference words. The effective chip area, the switching delay and the power dissipation of a new four-valued full adder, which is a key component of the proposed logic-in-memory VLSI, are reduced to about 33 percent, 67 percent and 24 percent, respectively, in comparison with those of the corresponding binary CMOS implementation under a 0.5-µm flash EEPROM technology.

  • A Multiple-Valued Reconfigurable VLSI Architecture Using Binary-Controlled Differential-Pair Circuits

    Xu BAI  Michitaka KAMEYAMA  

     
    PAPER-Integrated Electronics

      Vol:
    E96-C No:8
      Page(s):
    1083-1093

    This paper presents a fine-grain bit-serial reconfigurable VLSI architecture using multiple-valued switch blocks and binary logic modules. Multiple-valued signaling is utilized to implement a compact switch block. A binary-controlled current-steering technique is introduced, utilizing a programmable three-level differential-pair circuit to implement a high-performance low-power arbitrary two-variable binary function, and increase the noise margins in comparison with the quaternary-controlled differential-pair circuit. A current-source sharing technique between a series-gating differential-pair circuit and a current-mode D-latch is proposed to reduce the current source count and improve the speed. It is demonstrated that the power consumption and the delay of the proposed multiple-valued cell based on the binary-controlled current-steering technique and the current-source-sharing technique are reduced to 63% and 72%, respectively, in comparison with those of a previous multiple-valued cell.

  • An FPGA-Oriented Motion-Stereo Processor with a Simple Interconnection Network for Parallel Memory Access

    Seunghwan LEE  Masanori HARIYAMA  Michitaka KAMEYAMA  

     
    PAPER-Image Processing, Image Pattern Recognition

      Vol:
    E83-D No:12
      Page(s):
    2122-2130

    In designing a field-programmable gate array (FPGA)-based processor for motion stereo, a parallel memory system and a simple interconnection network for parallel data transfer are essential for parallel image processing. This paper, firstly, presents an FPGA-oriented hierarchical memory system. To reduce the bandwidth requirement between an on-chip memory in an FPGA and external memories, we propose an efficient scheduling: Once pixels are transferred to the on-chip memory, operations associated with the data are consecutively performed. Secondly, a rectangular memory allocation is proposed which allocates pixels to be accessed in parallel onto different memory modules of the on-chip memory. Consequently, completely parallel access can be achieved. The memory allocation also minimizes the required capacity of the on-chip memory and thus is suitable for FPGA-based implementation. Finally, a functional unit allocation is proposed to minimize the complexity between memory modules and functional units. An experimental result shows that the performance of the processor becomes 96 times higher than that of a 400 MHz Pentium II.

  • Implementation of a Low-Power FPGA Based on Synchronous/Asynchronous Hybrid Architecture

    Shota ISHIHARA  Ryoto TSUCHIYA  Yoshiya KOMATSU  Masanori HARIYAMA  Michitaka KAMEYAMA  

     
    PAPER-Electronic Circuits

      Vol:
    E94-C No:10
      Page(s):
    1669-1679

    This paper presents a low-power FPGA based on mixed synchronous/asynchronous design. The proposed FPGA consists of several sections which consist of logic blocks, and each section can be used as either a synchronous circuit or an asynchronous circuit according to its workload. An asynchronous circuit is power-efficient for a low-workload section since it does not require the clock tree which always consumes the power. On the other hand, a synchronous circuit is power-efficient for a high-workload section because of its simple hardware. The major consideration is designing an area-efficient synchronous/asynchronous hybrid logic block. This is because the hardware amount of the asynchronous circuit is about double that of the synchronous circuit, and the typical implementation wastes half of the hardware in synchronous mode. To solve this problem, we propose a hybrid logic block that can be used as either a single asynchronous logic block or two synchronous logic blocks. The proposed FPGA is fabricated using a 65-nm CMOS process. When the workload of a section is below 22%, asynchronous mode is more power-efficient than synchronous mode. Otherwise synchronous mode is more power-efficient.

  • A VLSI-Oriented Digital Signal Processor Based on Pulse-Train Residue Arithmetic Circuit with a Multiplier

    Michitaka KAMEYAMA  Oluwole ADEGBENRO  Tatsuo HIGUCHI  

     
    PAPER-Signal Processing

      Vol:
    E68-E No:1
      Page(s):
    14-21

    This paper proposes a new residue number multiplication scheme based on the cylic type of relationship which exists between the entries in the residue number multiplication truth-table when the modulus is any prime number. Using the scheme, multiplication is direct without table consultation and an entire truth-table is realizable. The multiplier circuit is simple and compact and allows pipelined processing of data. The flexibility of the multiplier is exploited in the implementation of an RNS based high-order FIR digital filter by using a programmable low order section. The suitability of the modular digital processor for VLSI is also indicated.

  • Multiple-Valued Fine-Grain Reconfigurable VLSI Using a Global Tree Local X-Net Network

    Xu BAI  Michitaka KAMEYAMA  

     
    PAPER-VLSI Architecture

      Vol:
    E97-D No:9
      Page(s):
    2278-2285

    A global tree local X-net network (GTLX) is introduced to realize high-performance data transfer in a multiple-valued fine-grain reconfigurable VLSI (MVFG-RVLSI). A global pipelined tree network is utilized to realize high-performance long-distance bit-parallel data transfer. Moreover, a logic-in-memory architecture is employed for solving data transfer bottleneck between a block data memory and a cell. A local X-net network is utilized to realize simple interconnections and compact switch blocks for eight-near neighborhood data transfer. Moreover, multiple-valued signaling is utilized to improve the utilization of the X-net network, where two binary data can be transferred from two adjacent cells to one common adjacent cell simultaneously at each “X” intersection. To evaluate the MVFG-RVLSI, a fast Fourier transform (FFT) operation is mapped onto a previous MVFG-RVLSI using only the X-net network and the MVFG-RVLSI using the GTLX. As a result, the computation time, the power consumption and the transistor count of the MVFG-RVLSI using the GTLX are reduced by 25%, 36% and 56%, respectively, in comparison with those of the MVFG-RVLSI using only the X-net network.

  • Design and Implementation of a Low-Power Multiple-Valued Current-Mode Integrated Circuit with Current-Source Control

    Takahiro HANYU  Satoshi KAZAMA  Michitaka KAMEYAMA  

     
    PAPER-Multiple-Valued Architectures

      Vol:
    E80-C No:7
      Page(s):
    941-947

    A new multiple-valued current-mode (MVCM) integrated circuit using a switched current-source control technique is proposed for a 1.5 V-supply high-speed arithmetic circuit with low-power dissipation. The use of a differential logic circuit (DLC) with a pair of dual-rail inputs makes the input voltage swing small, which results in a high driving capability at a lower supply voltage, while having large static power dissipation. In the proposed DLC using a switched current control technique, the static power dissipation can be greatly reduced because current sources in non-active circuit blocks are turned off. Since the gate of each current source is directly controlled by using a multiphase clock whose technique has been already used in dynamic circuit design, no additional transistors are required for currentsource control. As a typical example of arithmetic circuits, a new 1.5 V-supply 5454-bit multiplier based on a 0.8µm standard CMOS technology is also designed. Its performance is about 1.3 times faster than that of a binary fastest multiplier under the normalized power dissipation. A prototype chip is also fabricated to confirm the basic operation of the proposed MVCM integrated circuit.

  • A Three-Dimensional Instrumentation VLSI Processor Based on a Concurrent Memory-Access Scheme

    Seunghwan LEE  Masanori HARIYAMA  Michitaka KAMEYAMA  

     
    PAPER-Integrated Electronics

      Vol:
    E80-C No:11
      Page(s):
    1491-1498

    Three-dimensional (3-D) instrumentation using an image sequence is a promising instrumentation method for intelligent systems in which accurate 3-D information is required. However, real-time instrumentation is difficult since much computation time and a large memory bandwidth are required. In this paper, a 3-D instrumentation VLSI processor with a concurrent memory-access scheme is proposed. To reduce the access time, frequently used data are stored in a cache register array and are concurrently transferred to processing elements using simple interconnections to the 8-nearest neighbor registers. Based on a row and column memory access pattern, we propose a diagonally interleaved frame memory by which pixel values of a row and column are stored across memory modules. Based on the concurrent memory-access scheme, a 40 GOPS vprocessor is designed and the delay time for the instrumentation is estimated to be 42 ms for a 256256 images.

  • Parallel VLSI Processors for Robotics Using Multiple Bus Interconnection Networks

    Bumchul KIM  Michitaka KAMEYAMA  Tatsuo HIGUCHI  

     
    PAPER-Robot Electronics

      Vol:
    E75-A No:6
      Page(s):
    712-719

    This paper proposes parallel VLSI processors for robotics based on multiple processing elements organized around multiple bus interconnection networks. The advantages of multiple bus interconnection networks are generality, simplicity of implementation and capability of parallel communications between processing elements, therefore it is considered to be suitable for parallel VLSI systems. We also propose the optimal scheduling formulated in an integer programming problem to minimize the delay time of the parallel VLSI processors.

  • FOREWORD

    Michitaka KAMEYAMA  

     
    FOREWORD

      Vol:
    E90-C No:10
      Page(s):
    1849-1849
1-20hit(65hit)