The search functionality is under construction.

Author Search Result

[Author] Michitaka KAMEYAMA(65hit)

21-40hit(65hit)

  • A Minimum-Latency Linear Array FFT Processor for Robotics

    Somchai KITTICHAIKOONKIT  Michitaka KAMEYAMA  

     
    PAPER-Speech Processing

      Vol:
    E76-D No:6
      Page(s):
    680-688

    In the applications of the fast Fourier transform (FFT) to real-world computation such as robot vision, high-speed processing with small latency is an important issue. In this paper, we propose a linear array processor for the minimum-latency FFT computation. The processor is constructed by identical butterfly elements (BE's). The key concept to minimize the latency is that each BE generates its output data immediately after its input data become available, with 100% utilization of its arithmetic unit. We also introduce the real-valued FFT to perform the complex-valued FFT. We utilize a double linear array structure so that the parallel processing can be realized without communication between the linear arrays. As a result, the hardware amount of a single BE is reduced to half that of conventional designs. The latency of the proposed FFT processor is greatly reduced in comparison with conventional linear array FFT processors.

  • Implementation of a Partially Reconfigurable Multi-Context FPGA Based on Asynchronous Architecture

    Hasitha Muthumala WAIDYASOORIYA  Masanori HARIYAMA  Michitaka KAMEYAMA  

     
    PAPER-Electronic Circuits

      Vol:
    E92-C No:4
      Page(s):
    539-549

    This paper presents a novel architecture to increase the hardware utilization in multi-context field programmable gate arrays (MC-FPGAs). Conventional MC-FPGAs use dedicated tracks to transfer context-ID bits. As a result, hardware utilization ratio decreases, since it is very difficult to map different contexts, area efficiently. It also increases the context switching power, area and static power of the context-ID tracks. The proposed MC-FPGA uses the same wires to transfer both data and context-ID bits from cell to cell. As a result, programs can be mapped area efficiently by partitioning them into different contexts. An asynchronous multi-context logic block architecture to increase the processing speed of the multiple contexts is also proposed. The proposed MC-FPGA is fabricated using 6-metal 1-poly CMOS design rules. The data and context-ID transfer delays are measured to be 2.03ns and 2.26ns respectively. We achieved 30% processing time reduction for the SAD based correspondance search algorithm.

  • Evaluation of Interconnect-Complexity-Aware Low-Power VLSI Design Using Multiple Supply and Threshold Voltages

    Hasitha Muthumala WAIDYASOORIYA  Masanori HARIYAMA  Michitaka KAMEYAMA  

     
    PAPER-High-Level Synthesis and System-Level Design

      Vol:
    E91-A No:12
      Page(s):
    3596-3606

    This paper presents a high-level synthesis approach to minimize the total power consumption in behavioral synthesis under time and area constraints. The proposed method has two stages, functional unit (FU) energy optimization and interconnect energy optimization. In the first stage, active and inactive energies of the FUs are optimized using a multiple supply and threshold voltage scheme. Genetic algorithm (GA) based simultaneous assignment of supply and threshold voltages and module selection is proposed. The proposed GA based searching method can be used in large size problems to find a near-optimal solution in a reasonable time. In the second stage, interconnects are simplified by increasing their sharing. This is done by exploiting similar data transfer patterns among FUs. The proposed method is evaluated for several benchmarks under 90 nm CMOS technology. The experimental results show that more than 40% of energy savings can be achieved by our proposed method.

  • Group Testing Based Detection of Web Service DDoS Attackers

    Dalia NASHAT  Xiaohong JIANG  Michitaka KAMEYAMA  

     
    PAPER

      Vol:
    E93-B No:5
      Page(s):
    1113-1121

    The Distributed Denial of Service attack (DDoS) is one of the major threats to network security that exhausts network bandwidth and resources. Recently, an efficient approach Live Baiting was proposed for detecting the identities of DDoS attackers in web service using low state overhead without requiring either the models of legitimate requests nor anomalous behavior. However, Live Baiting has two limitations. First, the detection algorithm adopted in Live Baiting starts with a suspects list containing all clients, which leads to a high false positive probability especially for large web service with a huge number of clients. Second, Live Baiting adopts a fixed threshold based on the expected number of requests in each bucket during the detection interval without the consideration of daily and weekly traffic variations. In order to address the above limitations, we first distinguish the clients activities (Active and Non-Active clients during the detection interval) in the detection process and then further propose a new adaptive threshold based on the Change Point Detection method, such that we can improve the false positive probability and avoid the dependence of detection on sites and access patterns. Extensive trace-driven simulation has been conducted on real Web trace to demonstrate the detection efficiency of the proposed scheme in comparison with the Live Baiting detection scheme.

  • Design of a Trinocular-Stereo-Vision VLSI Processor Based on Optimal Scheduling

    Masanori HARIYAMA  Naoto YOKOYAMA  Michitaka KAMEYAMA  

     
    PAPER

      Vol:
    E91-C No:4
      Page(s):
    479-486

    This paper presents a processor architecture for high-speed and reliable trinocular stereo matching based on adaptive window-size control of SAD (Sum of Absolute Differences) computation. To reduce its computational complexity, SADs are computed using images divided into non-overlapping regions, and the matching result is iteratively refined by reducing a window size. Window-parallel-and-pixel-parallel architecture is also proposed to achieve to fully exploit the potential parallelism of the algorithm. The architecture also reduces the complexity of an interconnection network between memory and functional units based on regularity of reference pixels. The stereo matching processor is designed in a 0.18 µm CMOS technology. The processing time is 83.2 µs@100 MHz. By using optimal scheduling, the increases in area and processing time is only 5% and 3% respectively compared to binocular stereo vision although the computational amount is double.

  • High-Level Synthesis of VLSI Processors for Intelligent Integrated Systems

    Yasuaki SAWANO  Bumchul KIM  Michitaka KAMEYAMA  

     
    PAPER

      Vol:
    E77-C No:7
      Page(s):
    1101-1107

    In intelligent integrated systems such as robotics for autonomous work, it is essential to respond to the change of the environment very quickly. Therefore, the development of special-purpose VLSI processors for intelligent integrated systems with small latency becomes an very important subject. In this paper, we present a scheduling algorithm for high-level synthesis. The input to the scheduler is a behavioral description which is viewed as a data flow graph (DFG). The scheduler minimizes the latency, which is the delay of the critical path in the DFG, and minimizes the number of functional units and buses by improving the utilization rates. By using an integer linear programming, the scheduler optimally assigns nodes and arcs in the DFG into steps.

  • Architecture of a Fine-Grain Field-Programmable VLSI Based on Multiple-Valued Source-Coupled Logic

    Md.Munirul HAQUE  Michitaka KAMEYAMA  

     
    PAPER

      Vol:
    E87-C No:11
      Page(s):
    1869-1875

    A novel Multiple-Valued Field-Programmable VLSI (MV-FPVLSI) architecture using the Multiple-Valued Source-Coupled Logic (MVSCL) is proposed to implement special-purpose processors. An MV-FPVLSI consists of identical cells, which are connected to 8-neighborhood ones. To reduce the complexity of the interconnection block between two cells in an MV-FPVLSI, a bit-serial fine-grain pipeline architecture is introduced which allows single-wire data transmission and as a result, the data-transmission delay becomes very small in comparison with that of a conventional FPGA. To reduce the number of switches in the interconnection block further, a cell, using multiple-valued source-coupled logic circuits, is proposed, where the input currents can be linearly summed just by wiring without using any active devices. Not only the data, but also the control signal can be superposed by linear summation. As a result, no input switch is required which contributes to smaller data transmission delay. Moreover, an arbitrary 2-input logic function can be generated by linear summation of the input currents and threshold operations using these reconfigurable MVSCL circuits. As the MVSCL circuit has high driving capability in comparison with that of an equivalent CMOS circuit, high-speed logic operation is also possible while maintaining low power.

  • Collision Detection VLSI Processor for Intelligent Vehicles Using a Hierarchically-Content-Addressable Memory

    Masanori HARIYAMA  Kazuhiro SASAKI  Michitaka KAMEYAMA  

     
    PAPER-Processors

      Vol:
    E82-C No:9
      Page(s):
    1722-1729

    High-speed collision detection is important to realize a highly-safe intelligent vehicle. In collision detection, high-computational power is required to perform matching operation between discrete points on surfaces of a vehicle and obstacles in real-world environment. To achieve the highest performance, a hierarchical matching scheme is proposed based on two representations: the coarse representation and the fine representation. A vehicle is represented as a set of rectangular solids in the fine representation (fine rectangular solids), and the coarse representation, which is also a set of rectangular solids, is produced by enlarging the fine representation. If collision occurs between an obstacle discrete point and a rectangular solid in the coarse representation (coarse rectangular solid), then it is sufficient to check the only fine rectangular solids contained in the coarse one. Consequently, checks for the other fine rectangular solids can be omitted. To perform the hierarchical matching operation in parallel, a hierarchically-content-addressable memory (HCAM) is proposed. Since there is no need to perform matching operation in parallel with fine rectangular solids contained in different coarse ones, the fine ones are mapped onto a matching unit. As a result, the number of matching units can be reduced without decreasing the performance. Under the condition of the same execution time, the area of the HCAM is reduced to 46.4% in comparison with that of the conventional CAM in which the hierarchical matching scheme is not used.

  • Logic-In-Control-Architecture-Based Reconfigurable VLSI Using Multiple-Valued Differential-Pair Circuits

    Nobuaki OKADA  Michitaka KAMEYAMA  

     
    PAPER-Application of Multiple-Valued VLSI

      Vol:
    E93-D No:8
      Page(s):
    2126-2133

    A fine-grain bit-serial multiple-valued reconfigurable VLSI based on logic-in-control architecture is proposed for effective use of the hardware resources. In logic-in-control architecture, the control circuits can be merged with the arithmetic/logic circuits, where the control and arithmetic/logic circuits are constructed by using one or multiple logic blocks. To implement the control circuit, only one state in a state transition diagram is allocated to one logic block, which leads to reduction of the complexity of interconnections between logic blocks. The fine-grain logic block is implemented based on multiple-valued current-mode circuit technology. In the fine-grain logic block, an arbitrary 3-variable binary function can be programmed by using one multiplexer and two universal literal circuits. Three-variable binary functions are used to implement the control circuit. Moreover, the hardware resources can be utilized to construct a bit-serial adder, because full-adder sum and carry can be realized by programming in the universal literal circuit. Therefore, the logic block can be effectively reconfigured for arithmetic/logic and control circuits. It is made clear that the hardware complexity of the control circuit in the proposed reconfigurable VLSI can be reduced in comparison with that of the control circuit based on a typically sequential circuit in the conventional FPGA and the fine-grain field-programmable VLSI reported until now.

  • A Switch Block Architecture for Multi-Context FPGAs Based on a Ferroelectric-Capacitor Functional Pass-Gate Using Multiple/Binary Valued Hybrid Signals

    Shota ISHIHARA  Noriaki IDOBATA  Masanori HARIYAMA  Michitaka KAMEYAMA  

     
    PAPER-Application of Multiple-Valued VLSI

      Vol:
    E93-D No:8
      Page(s):
    2134-2144

    Dynamically Programmable Gate Arrays (DPGAs) provide more area-efficient implementations than conventional Field Programmable Gate Arrays (FPGAs). One of typical DPGA architectures is multi-context architecture. An DPGA based on multi-context architecture is Multi-Context FPGA (MC-FPGA) which achieves fast switching between contexts. The problem of the conventional SRAM-based MC-FPGA is its large area and standby power dissipation because of the large number of configuration memory bits. Moreover, since SRAM is volatile, the SRAM-based multi-context FPGA is difficult to implement power-gating for standby power reduction. This paper presents an area-efficient and nonvolatile multi-context switch block architecture for MC-FPGAs based on a ferroelectric-capacitor functional pass-gate which merges a multiple-valued threshold function and a nonvolatile multiple-valued storage. The test chip for four contexts is fabricated in a 0.35 µm-CMOS/0.60 µm-ferroelectric-capacitor process. The transistor count of the proposed multi-context switch block is reduced to 63% in comparison with that of the SRAM-based one.

  • Design of a Reconfigurable Parallel Processor for Digital Control Using FPGAs

    Yoshichika FUJIOKA  Michitaka KAMEYAMA  Nobuhiro TOMABECHI  

     
    PAPER

      Vol:
    E77-C No:7
      Page(s):
    1123-1130

    In digital control, it is essential to make the delay time for a large number of multiply-additions small because of sensor feedback. To meet the requirement, an architecture of the reconfigurable parallel processor using field-programmable gate arrays (FPGAs) is proposed. Although the performance is drastically increased in the full custom VLSI implementation, even the reconfigurable parallel processor using FPGAs becomes useful for many practical digital control applications. The performance evaluation shows that the delay time for the resolved acceleration cotrol computation of a twelve-degrees-of-freedom (DOF) redundant manipulator becomes about 70 µs which is about seventeen times faster than that of a parallel processor approach using conventional digital signal processors (DSPs).

  • Acceleration of Block Matching on a Low-Power Heterogeneous Multi-Core Processor Based on DTU Data-Transfer with Data Re-Allocation

    Yoshitaka HIRAMATSU  Hasitha Muthumala WAIDYASOORIYA  Masanori HARIYAMA  Toru NOJIRI  Kunio UCHIYAMA  Michitaka KAMEYAMA  

     
    PAPER-Integrated Electronics

      Vol:
    E95-C No:12
      Page(s):
    1872-1882

    The large data-transfer time among different cores is a big problem in heterogeneous multi-core processors. This paper presents a method to accelerate the data transfers exploiting data-transfer-units together with complex memory allocation. We used block matching, which is very common in image processing, to evaluate our technique. The proposed method reduces the data-transfer time by more than 42% compared to the earlier works that use CPU-based data transfers. Moreover, the total processing time is only 15 ms for a VGA image with 1616 pixel blocks.

  • Highly-Parallel Stereo Vision VLSI Processor Based on an Optimal Parallel Memory Access Scheme

    Masanori HARIYAMA  Seunghwan LEE  Michitaka KAMEYAMA  

     
    PAPER-Integrated Electronics

      Vol:
    E84-C No:3
      Page(s):
    382-389

    In a real-time vision system, parallel memory access is essential for highly parallel image processing. The use of multiple memory modules is one efficient technique for parallel access. In the technique, data stored in different memory modules can be accessed in parallel. This paper presents an optimal memory allocation methodology to map data to be read in parallel onto different memory modules. Based on the methodology, a high-performance VLSI processor for three-dimensional instrumentation is proposed.

  • Fine-Grain Multiple-Valued Reconfigurable VLSI Using Series-Gating Differential-Pair Circuits and Its Evaluation

    Nobuaki OKADA  Michitaka KAMEYAMA  

     
    PAPER

      Vol:
    E91-C No:9
      Page(s):
    1437-1443

    A fine-grain reconfigurable VLSI for various applications including arithmetic operations is developed. In the fine-grain architecture, it is important to define a cell function which leads to high utilization of a logic block and reduction of a switch block. From the point of view, a universal-literal-based multiple-valued cell suitable for bit-serial reconfigurable computation is proposed. A series-gating differential-pair circuit is effectively employed for implementing a full-adder circuit of Sum and a universal literal circuit. Therefore, a simple logic block can be constructed using the circuit technology. Moreover, interconnection complexity can be reduced by utilizing multiple-valued signaling, where superposition of serial data bits and a start signal which indicates heading of one-word is introduced. Differential-pair circuits are also effectively employed for current-output replication, which leads to high-speed signaling to adjacent cells The evaluation is done based on 90 nm CMOS design rule, and it is made clear that the area of the proposed cell can be reduced to 78% in comparison with that of the CMOS implementatiuon. Moreover, its area-time product becomes 92% while the delay time is increased by 18%.

  • Implementation of Voltage-Mode/Current-Mode Hybrid Circuits for a Low-Power Fine-Grain Reconfigurable VLSI

    Xu BAI  Michitaka KAMEYAMA  

     
    PAPER-Integrated Electronics

      Vol:
    E97-C No:10
      Page(s):
    1028-1035

    This paper proposes low-power voltage-mode/current-mode hybrid circuits to realize an arbitrary two-variable logic function and a full-adder function. The voltage and current mode can be selected for low-power operations at low and high frequency, respectively, according to speed requirement. An nMOS pass transistor network is shared to realize voltage switching and current steering for the voltage- and current-mode operations, respectively, which leads to high utilization of the hardware resources. As a result, when the operating frequency is more than 1.15,GHz, the current mode of the hybrid logic circuit is more power-efficient than the voltage mode. Otherwise, the voltage mode is more power-efficient. The power consumption of the hybrid two-variable logic circuit is lower than that of the conventional two-input look-up table (LUT) using CMOS transmission gates, when the operating frequency is more than 800,MHz. The delay and area of the hybrid two-variable logic circuit are increased by only 7% and 13%, respectively

  • Design of High-Performance Asynchronous Pipeline Using Synchronizing Logic Gates

    Zhengfan XIA  Shota ISHIHARA  Masanori HARIYAMA  Michitaka KAMEYAMA  

     
    PAPER-Integrated Electronics

      Vol:
    E95-C No:8
      Page(s):
    1434-1443

    This paper introduces a novel design method of an asynchronous pipeline based on dual-rail dynamic logic. The overhead of handshake control logic is greatly reduced by constructing a reliable critical datapath, which offers the pipeline high throughput as well as low power consumption. Synchronizing Logic Gates (SLGs), which have no data dependency problem, are used in the design to construct the reliable critical datapath. The design targets latch-free and extremely fine-grain or gate-level pipeline, where the depth of every pipeline stage is only one dual-rail dynamic logic. HSPICE simulation results, in a 65 nm design technology, indicate that the proposed design increases the throughput by 120% and decreases the power consumption by 54% compared with PS0, a classic dual-rail asynchronous pipeline implementation style, in 4-bit wide FIFOs. Moreover, this method is applied to design an array style multiplier. It shows that the proposed design reduces power by 37.9% compared to classic synchronous design when the workloads are 55%. A chip has been fabricated with a 44 multiplier function, which works well at 2.16G data-set/s (Post-layout simulation).

  • A Special-Purpose LSI for Inverse Kinematics Computation

    Michitaka KAMEYAMA  Takao MATSUMOTO  Hideki EGAMI  Tatsuo HIGUCHI  

     
    PAPER-Dedicated Processors

      Vol:
    E74-C No:11
      Page(s):
    3829-3837

    This paper presents a special-purpose LSI chip for inverse kinematics computation of robot manipulators. It is shown that inverse kinematic solutions of kinematically simple manipulators can be systematically described with the two-dimensional (2-D) vector rotation. The chip is fabricated with the 1.5-µm CMOS gate array. The arithmetic unit on the chip is designed using the COordinate Rotation DIgital Computer (CORDIC) algorithms, and it performs six types of operations based on the 2-D vector rotation at high speed. Pipelining is used to enhance the operating ratio of the unit to 100%. The computation time of a special purpose processor which is composed of the chip and a few memory chips is approximately 50 µs for a typical six degree-of-freedom manipulator. Moreover, the chip can be used for various types of manipulators, and the software development is very easy.

  • Quantum-Device-Oriented Multiple-Valued Logic System Based on a Super Pass Gate

    Xiaowei DENG  Takahiro HANYU  Michitaka KAMEYAMA  

     
    PAPER-Computer Hardware and Design

      Vol:
    E78-D No:8
      Page(s):
    951-958

    The investigation of device functions required from the systems point of view will be important for the development of the next generation of VLSI devices and systems. In this paper, a super pass transistor (SPT) model is presented as a quantum device candidate for future VLSI systems based on multiple-valued logic. A possible quantum device structure for the SPT model is also described, which employs the concepts of a lateral-resonant-tunneling quantum-dot transistor and a heterostructure field-effect transistor. Since it has the powerful capability of detecting multiple signal levels, the SPT will be useful for the implementation of highly compact multiple-valued VLSI systems. To exploit the functionality of the SPT, a super pass gate (SP-gate) corresponding to a single SPT is proposed as a multiple-valued universal logic module. The mathematical properties of the SP-gate are discussed. A design method for a multiple-valued SP-gate network is presented. An application of SP-gates to a multiple-valued image processing system is also demonstrated. The SP-gate network for the multiple-valued image processing system is evaluated in comparison with the corresponding NMOS implementation in terms of the number of transistors, interconnections and cascaded transistor stages. The size of a generalized series-parallel SP-gate network is also evaluated in comparison with a functionally equivalent multiple-valued series-parallel MOS pass transistor network. The results show that highly compact multiple-valued VLSI systems can be achieved if the SPT-model can be realized by an actual quantum device.

  • Data-Transfer-Aware Design of an FPGA-Based Heterogeneous Multicore Platform with Custom Accelerators

    Yasuhiro TAKEI  Hasitha Muthumala WAIDYASOORIYA  Masanori HARIYAMA  Michitaka KAMEYAMA  

     
    PAPER-VLSI Design Technology and CAD

      Vol:
    E98-A No:12
      Page(s):
    2658-2669

    For an FPGA-based heterogeneous multicore platform, we present the design methodology to reduce the total processing time considering data-transfer. The reconfigurability of recent FPGAs with hard CPU cores allows us to realize a single-chip heterogeneous processor optimized for a given application. The major problem in designing such heterogeneous processors is data-transfer between CPU cores and accelerator cores. The total processing time with data-transfers is modeled considering the overlap of computation time and data-transfer time, and optimal design parameters are searched for.

  • Code Assignment Algorithm for Highly Parallel Multiple-Valued Combinational Circuits Based on Partition Theory

    Saneaki TAMAKI  Michitaka KAMEYAMA  Tatsuo HIGUCHI  

     
    PAPER-Logic Design

      Vol:
    E76-D No:5
      Page(s):
    548-554

    Design of locally computable combinational circuits is a very important subject to implement high-speed compact arithmetic and logic circuits in VLSI systems. This paper describes a multiple-valued code assignment algorithm for the locally computable combinational circuits, when a functional specification for a unary operation is given by the mapping relationship between input and output symbols. Partition theory usually used in the design of sequential circuits is effectively employed for the fast search for the code assignment problem. Based on the partition theory, mathematical foundation is derived for the locally computable circuit design. Moreover, for permutation operations, we propose an efficient code assignment algorithm based on closed chain sets to reduce the number of combinations in search procedure. Some examples are shown to demonstrate the usefulness of the algorithm.

21-40hit(65hit)