
Author Search Result

[Author] Kazuyoshi TAKAGI (25 hits)

Hits 1-20 of 25

  • High-Throughput Rapid Single-Flux-Quantum Circuit Implementations for Exponential and Logarithm Computation Using the Radix-2 Signed-Digit Representation

    Masamitsu TANAKA  Kazuyoshi TAKAGI  Naofumi TAKAGI  

     
    PAPER

    Vol: E99-C, No: 6, Page(s): 703-709

    We present circuit implementations for computing exponentials and logarithms suitable for rapid single-flux-quantum (RSFQ) logic. We propose hardware algorithms based on the sequential table-lookup (STL) method using the radix-2 signed-digit representation that achieve high-throughput, digit-serial calculations. The circuits are implemented by processing elements formed in systolic-array-like, regularly-aligned pipeline structures. The processing elements are composed of adders, shifters, and readouts of precomputed constants. The iterative calculations are fully overlapped, and throughputs approach the maximum throughput of serial processing. The circuit size for calculating significand parts is estimated to be approximately 5-10 times larger than that of a bit-serial floating-point adder or multiplier.

  • Exact Minimization of Free BDDs and Its Application to Pass-Transistor Logic Optimization

    Kazuyoshi TAKAGI  Hiroshi HATAKEDA  Shinji KIMURA  Katsumasa WATANABE  

     
    PAPER

    Vol: E82-A, No: 11, Page(s): 2407-2413

    In several design methods for pass-transistor logic (PTL) circuits, Boolean functions are expressed as OBDDs in decomposed form, and the component OBDDs are then mapped directly to PTL cells. The total size of the OBDDs (number of nodes) corresponds to the circuit size. In this paper, we investigate a method for PTL synthesis based on exact minimization of Free BDDs (FBDDs). FBDDs are a well-studied extension of OBDDs in which the variable ordering may differ from path to path. We present statistics showing that more than 56% of the 616,126 NPN-equivalence classes of 5-variable Boolean functions have minimum FBDDs smaller than their OBDDs. This result can be used as a library for PTL synthesis. We also applied the exact FBDD minimization algorithm to subcircuits in the synthesis of MCNC benchmarks and obtained size reductions of up to 5%.

  • Algorithms for Evaluating the Matrix Polynomial I+A+A^2+…+A^(N-1) with Reduced Number of Matrix Multiplications

    Kotaro MATSUMOTO  Kazuyoshi TAKAGI  Naofumi TAKAGI  

     
    PAPER-Algorithms and Data Structures

    Vol: E101-A, No: 2, Page(s): 467-471

    The problem of evaluating the matrix polynomial I+A+A^2+…+A^(N-1) with a reduced number of matrix multiplications has long been studied. Several algorithms have been proposed for this problem that find a procedure requiring O(log N) matrix multiplications for a given N. Among them, the hybrid algorithm proposed by Dimitrov and Cooklev, based on the double-base representation of N (i.e., using mixed radices 2 and 3), is the most efficient. They suggested that the use of higher radices would not yield more efficient algorithms. In this paper, we show that more efficient algorithms can be derived by using higher radices, and we propose several such algorithms.

  • A Hardware Algorithm for Integer Division Using the SD2 Representation

    Naofumi TAKAGI  Shunsuke KADOWAKI  Kazuyoshi TAKAGI  

     
    PAPER-VLSI Design Technology and CAD

    Vol: E89-A, No: 10, Page(s): 2874-2881

    A hardware algorithm for integer division is proposed. It is based on the radix-2 non-restoring division algorithm. Fast computation is achieved by using the radix-2 signed-digit (SD2) representation. The algorithm does not require normalization of the divisor and hence requires neither area-consuming leading-one (or leading-zero) detection nor variable-amount shifts. A combinational (unfolded) implementation of the algorithm yields a regularly structured array divider, and a sequential implementation yields compact dividers.

  • Automated Passive-Transmission-Line Routing Tool for Single-Flux-Quantum Circuits Based on A* Algorithm

    Masamitsu TANAKA  Koji OBATA  Yuki ITO  Shota TAKESHIMA  Motoki SATO  Kazuyoshi TAKAGI  Naofumi TAKAGI  Hiroyuki AKAIKE  Akira FUJIMAKI  

     
    PAPER-Digital Applications

    Vol: E93-C, No: 4, Page(s): 435-439

    We demonstrated an automated passive-transmission-line routing tool for single-flux-quantum (SFQ) circuits. The tool is based on the A* algorithm, which is widely used in CMOS LSI design, and is tuned for microstrip/strip lines formed in the SRL 4-Nb layer structure. In large-scale SFQ circuits with 10,000-20,000 Josephson junctions, such as microprocessors, 80-90% of the wires can be routed automatically in about ten minutes. On-chip high-speed tests verified correct operation above 40 GHz for an automatically routed 4×4 switch circuit. The resulting circuit size and operating frequency were comparable to those of a manually designed circuit. We believe that the tool is useful for large-scale SFQ circuit design using conventional fabrication processes.

  • Rapid Single-Flux-Quantum Truncated Multiplier Based on Bit-Level Processing [Open Access]

    Nobutaka KITO  Ryota ODAKA  Kazuyoshi TAKAGI  

     
    BRIEF PAPER-Superconducting Electronics

    Vol: E102-C, No: 7, Page(s): 607-611

    A rapid single-flux-quantum (RSFQ) truncated multiplier based on bit-level processing is proposed. In the multiplier, the two operands are transformed into two serialized patterns of bits (pulses), and the multiplication is carried out by processing those bits. The result is obtained by counting bits. Because it computes at the bit level, the proposed multiplier can be implemented in a small area. The gate-level design of the multiplier is shown, and the layout of a 4-bit multiplier was also designed.

  • Minimum Cut Linear Arrangement of p-q Dags for VLSI Layout of Adder Trees

    Kazuyoshi TAKAGI  Naofumi TAKAGI  

     
    PAPER

    Vol: E82-A, No: 5, Page(s): 767-774

    Two algorithms for minimum cut linear arrangement of a class of graphs called p-q dags are proposed. A p-q dag represents the connection scheme of an adder tree, such as a Wallace tree, and the VLSI layout problem of a bit slice of an adder tree is treated as the minimum cut linear arrangement problem of the corresponding p-q dag. One of the two algorithms is based on dynamic programming; it computes an exact minimum solution in n^O(1) time and space, where n is the size of the given graph. The other is an approximation algorithm that computes a solution with O(log n) cutwidth in O(n log n) time.

  • Logic Synthesis Method for Dual-Rail RSFQ Digital Circuits Using Root-Shared Binary Decision Diagrams

    Koji OBATA  Kazuyoshi TAKAGI  Naofumi TAKAGI  

     
    PAPER-VLSI Design Technology and CAD

    Vol: E90-A, No: 1, Page(s): 257-266

    We propose a new method of logic synthesis for dual-rail RSFQ (rapid single-flux-quantum) digital circuits. RSFQ circuit technology is one of the strongest candidates for next-generation digital circuit technology. To represent logic functions, we use a root-shared binary decision diagram (RSBDD), which is a directed acyclic graph constructed from binary decision diagrams. In the method, we first construct an RSBDD from the given logic functions and then reduce the number of nodes in the constructed RSBDD by variable reordering. Finally, we synthesize a dual-rail RSFQ circuit from the reduced RSBDD. We have implemented the method and synthesized benchmark circuits. On average, the synthesized dual-rail circuits consist of about 27% fewer logic elements than those synthesized by a Transduction-based method.

  • Computational Power of Nondeterministic Ordered Binary Decision Diagrams and Their Subclasses

    Kazuyoshi TAKAGI  Koyo NITTA  Hironori BOUNO  Yasuhiko TAKENAGA  Shuzo YAJIMA  

     
    PAPER

    Vol: E80-A, No: 4, Page(s): 663-669

    Ordered Binary Decision Diagrams (OBDDs) are graph-based representations of Boolean functions that are widely used because of their good properties. In this paper, we introduce nondeterministic OBDDs (NOBDDs) and their restricted forms, and evaluate their expressive power. In some applications of OBDDs, canonicity, one of the good properties of OBDDs, is not necessary. In such cases, the required amount of storage can be reduced by using OBDDs in a non-canonical form. A class of NOBDDs can serve as such a non-canonical form of OBDDs. In this paper, we focus on two particular methods that can be regarded as using restricted forms of NOBDDs. Our aim is to show, from a theoretical point of view, how the size of OBDDs can be reduced in such forms. First, we consider a method for solving the satisfiability problem of combinational circuits that uses the structure of the circuits as a key to reducing the NOBDD size. We show that the NOBDD size is related to the cutwidth of the circuits. Second, we analyze methods that use OBDDs to represent Boolean functions as sets of product terms. We show that the class of functions treated feasibly in this representation strictly contains that of OBDDs and is contained in that of NOBDDs.

  • A VLSI Architecture with Multiple Fast Store-Based Block Parallel Processing for Output Probability and Likelihood Score Computations in HMM-Based Isolated Word Recognition

    Kazuhiro NAKAMURA  Ryo SHIMAZAKI  Masatoshi YAMAMOTO  Kazuyoshi TAKAGI  Naofumi TAKAGI  

     
    PAPER

    Vol: E95-C, No: 4, Page(s): 456-467

    This paper presents a memory-efficient VLSI architecture for output probability computations (OPCs) of continuous hidden Markov models (HMMs) and likelihood score computations (LSCs). These computations are the most time-consuming part of HMM-based isolated word recognition systems. We demonstrate multiple fast store-based block parallel processing (MultipleFastStoreBPP) for OPCs and LSCs and present a VLSI architecture that supports it. Compared with conventional fast store-based block parallel processing (FastStoreBPP) and stream-based block parallel processing (StreamBPP) architectures, the proposed architecture requires fewer registers and less processing time. The processing elements (PEs) used in the FastStoreBPP and StreamBPP architectures are identical to those used in the MultipleFastStoreBPP architecture. From a VLSI architectural viewpoint, a comparison shows that the proposed architecture improves on the others through efficient use of PEs and of the registers that store input feature vectors.

  • A Clock Scheduling Algorithm for High-Throughput RSFQ Digital Circuits

    Koji OBATA  Kazuyoshi TAKAGI  Naofumi TAKAGI  

     
    PAPER-VLSI Design Technology and CAD

    Vol: E91-A, No: 12, Page(s): 3772-3782

    An algorithm for clock scheduling of concurrent-flow clocking rapid single-flux-quantum (RSFQ) digital circuits is proposed. RSFQ circuit technology is an emerging digital circuit technology. In concurrent-flow clocking RSFQ digital circuits, all logic gates are driven by clock pulses. Appropriate clock scheduling allows the circuits to run at higher clock frequencies. Given a clock period, the proposed algorithm determines the arrival times of clock pulses and the delays that should be inserted. Experimental results show that, on average, the proposed algorithm inserts 59.0% fewer delay elements and yields clock trees 40.4% lower in height than a straightforward algorithm. The proposed algorithm can also be used to minimize the clock period, yielding clock periods 19.0% shorter on average.

  • Nb 9-Layer Fabrication Process for Superconducting Large-Scale SFQ Circuits and Its Process Evaluation [Open Access]

    Shuichi NAGASAWA  Kenji HINODE  Tetsuro SATOH  Mutsuo HIDAKA  Hiroyuki AKAIKE  Akira FUJIMAKI  Nobuyuki YOSHIKAWA  Kazuyoshi TAKAGI  Naofumi TAKAGI  

     
    INVITED PAPER

    Vol: E97-C, No: 3, Page(s): 132-140

    We describe recent progress on a Nb nine-layer fabrication process for large-scale single-flux-quantum (SFQ) circuits. A device fabricated in this process consists of an active layer containing Josephson junctions (JJs) at the top, passive transmission line (PTL) layers in the middle, and a DC power layer at the bottom. We describe the process conditions and the fabrication equipment. We use both diagnostic chips and shift register (SR) chips to improve the fabrication process. The diagnostic chip was designed to evaluate the characteristics of basic elements such as junctions, contacts, resistors, and wiring, as well as their defects. The SR chip was designed to evaluate defects as a function of SFQ circuit size. The results of a long-term evaluation of the diagnostic and SR chips showed fairly good correlation between the defects of the diagnostic chips and the yields of the SRs. We obtained a yield of 100% for SRs containing 70,000 JJs. These results show that considerable progress has been made in reducing the number of defects and improving reliability.

  • Circuit Description and Design Flow of Superconducting SFQ Logic Circuits [Open Access]

    Kazuyoshi TAKAGI  Nobutaka KITO  Naofumi TAKAGI  

     
    INVITED PAPER

    Vol: E97-C, No: 3, Page(s): 149-156

    Superconducting single-flux-quantum (SFQ) devices have attracted much attention as alternative devices for digital circuits because of their high switching speed and low power consumption. For large-scale circuit design, the computer-aided design environment plays a significant role. Because the characteristics of SFQ devices differ from those of conventional devices, a new design environment is required. In this paper, we propose a new timing-aware circuit description method that can be used for SFQ circuit design. Based on this description and the dedicated algorithms we have been developing for SFQ logic circuit design, we propose an integrated design flow for SFQ logic circuits. We designed a circuit using our design tools along with the design flow and demonstrated its correct operation.

  • Nested Loop Parallelization Using Polyhedral Optimization in High-Level Synthesis

    Akihiro SUDA  Hideki TAKASE  Kazuyoshi TAKAGI  Naofumi TAKAGI  

     
    PAPER-High-Level Synthesis and System-Level Design

    Vol: E97-A, No: 12, Page(s): 2498-2506

    We propose a method for synthesizing nested loops into parallelized circuits by integrating polyhedral optimization, a state-of-the-art technique in the software field, into high-level synthesis. Our method constructs circuits equipped with multiple processing elements (PEs), using information generated by a polyhedral optimizing compiler. Since multiple PEs cannot access the off-chip RAM concurrently, a method for constructing on-chip buffers is also proposed. Our buffering method reduces off-chip RAM access conflicts and further enables burst accesses and data reuse. In our experiments, with each parallelized circuit configured with eight PEs, the buffered circuits generated by our method were 8.2 times faster on average, and up to 26.5 times faster, than the sequential non-buffered circuits.

  • Timing Verification of Sequential Logic Circuits Based on Controlled Multi-Clock Path Analysis

    Kazuhiro NAKAMURA  Shinji KIMURA  Kazuyoshi TAKAGI  Katsumasa WATANABE  

     
    PAPER-Timing Verification and Optimization

    Vol: E81-A, No: 12, Page(s): 2515-2520

    This paper introduces a new kind of false path, which is sensitizable but does not affect the determination of the maximum clock frequency. Such false paths exist in multi-clock operations controlled by waiting states, and the delay of these paths can exceed the clock period. This paper proposes a method to detect these waiting false paths based on symbolic state traversal. In this method, the maximum allowable number of clock cycles for each path is computed using the update cycles of each register.

  • A Verification Method for Single-Flux-Quantum Circuits Using Delay-Based Time Frame Model

    Takahiro KAWAGUCHI  Kazuyoshi TAKAGI  Naofumi TAKAGI  

     
    PAPER-Logic Synthesis, Test and Verification

    Vol: E98-A, No: 12, Page(s): 2556-2564

    The superconducting single-flux-quantum (SFQ) device is an emerging device that can realize digital circuits with high switching speed and low power consumption. In SFQ digital circuits, voltage pulses are used as carriers of information, and logic values are represented differently from CMOS circuits. Design methods specific to SFQ circuits have been developed. In this paper, we present timing analysis and functional verification methods for SFQ circuits based on a new timing model, which we call the delay-based time frame model. Assuming that possible pulse arrivals are periodic, the model defines comprehensive time frames and a representation of logic values. In static timing analysis, the expected pulse arrival times are checked based on the model, and the order of pulse arrival times is calculated for each logic gate. In functional verification, the circuit behavior is abstracted into a form similar to a synchronous sequential circuit using the order of pulse arrival times, and the behavior is then verified using formal verification tools. Using the proposed methods, we can verify the functional behavior of SFQ circuits with complex clocking schemes, which often appear in practical designs but cannot be handled by existing verification methods. Experimental results show that our method can be applied to practical designs.

  • RSFQ 4-bit Bit-Slice Integer Multiplier

    Guang-Ming TANG  Kazuyoshi TAKAGI  Naofumi TAKAGI  

     
    PAPER

    Vol: E99-C, No: 6, Page(s): 697-702

    A rapid single-flux-quantum (RSFQ) 4-bit bit-slice multiplier is proposed. A new systolic-like multiplication algorithm suitable for RSFQ implementation is developed. The multiplier is designed using the cell library for the AIST 10-kA/cm² 1.0-µm fabrication technology (ADP2). Concurrent-flow clocking is used to obtain a fully pipelined RSFQ logic design. A 4n×4n-bit multiplier consists of 2n+17 stages. To verify the algorithm and the logic design, a physical layout of an 8×8-bit multiplier was designed for a target operating frequency of 50 GHz and simulated. It consists of 21 stages and 11,488 Josephson junctions. The simulation results show correct operation up to 62.5 GHz.

  • Design and High-Speed Demonstration of Single-Flux-Quantum Bit-Serial Floating-Point Multipliers Using a 10 kA/cm² Nb Process

    Xizhu PENG  Yuki YAMANASHI  Nobuyuki YOSHIKAWA  Akira FUJIMAKI  Naofumi TAKAGI  Kazuyoshi TAKAGI  Mutsuo HIDAKA  

     
    PAPER

    Vol: E97-C, No: 3, Page(s): 188-193

    Recently, we proposed a new data-path architecture, named a large-scale reconfigurable data-path (LSRDP), based on single-flux-quantum (SFQ) circuits, to establish a fundamental technology for future high-end computers. In this architecture, a large number of SFQ floating-point units (FPUs) are used as core components, so their high performance and low power consumption are essential. In this research, we implemented an SFQ half-precision bit-serial floating-point multiplier (FPM) with a target clock frequency of 50 GHz, using the AIST 10 kA/cm² Nb process. The FPM was designed based on a systolic-array architecture. It contains 11,066 Josephson junctions, including on-chip high-speed test circuits. The size and power consumption of the FPM are 6.66 mm × 1.92 mm and 2.83 mW, respectively. On-chip high-speed tests confirmed correct operation at maximum frequencies of 93.4 GHz for the exponent part and 72.0 GHz for the significand part.

  • Hardware Synthesis from C Programs with Estimation of Bit Length of Variables

    Osamu OGAWA  Kazuyoshi TAKAGI  Yasufumi ITOH  Shinji KIMURA  Katsumasa WATANABE  

     
    PAPER

    Vol: E82-A, No: 11, Page(s): 2338-2346

    In hardware synthesis from high-level languages such as C, the optimization quality of the compiler has a great influence on the area and speed of the synthesized circuits. Among the hardware-oriented optimizations required in such compilers, minimizing the bit length of the data-paths is one of the most important. In this paper, we propose an algorithm that estimates the necessary bit length of variables for this purpose. The algorithm analyzes the control/data-flow graph translated from C programs and decides the bit length of each variable. In several experiments, the bit length of variables could be reduced by half with respect to the declared length. This method is effective not only for reducing the circuit area but also for reducing the delay of operation units such as adders.

  • Large-Scale Integrated Circuit Design Based on a Nb Nine-Layer Structure for Reconfigurable Data-Path Processors [Open Access]

    Akira FUJIMAKI  Masamitsu TANAKA  Ryo KASAGI  Katsumi TAKAGI  Masakazu OKADA  Yuhi HAYAKAWA  Kensuke TAKATA  Hiroyuki AKAIKE  Nobuyuki YOSHIKAWA  Shuichi NAGASAWA  Kazuyoshi TAKAGI  Naofumi TAKAGI  

     
    INVITED PAPER

    Vol: E97-C, No: 3, Page(s): 157-165

    We describe a large-scale integrated circuit (LSI) design of rapid single-flux-quantum (RSFQ) circuits and demonstrate several reconfigurable data-path (RDP) processor prototypes based on the ISTEC Advanced Process (ADP2). The ADP2 LSIs are made up of nine Nb layers and Nb/AlOx/Nb Josephson junctions with a critical current density of 10 kA/cm², allowing higher operating frequencies and integration. To realize truly large-scale RSFQ circuits, careful design is necessary, with several compromises in the device structure, logic gates, and interconnects, balancing the competing demands of integration density, design flexibility, and fabrication yield. We summarize numerical and experimental results related to the development of a cell-based design in the ADP2, which features a unit cell reduced to a 30-µm square and up to four strip-line tracks in the unit cell underneath the logic gates. The ADP LSIs can achieve ∼10 times the device density and double the operating frequency with the same power consumption per junction as conventional LSIs fabricated using the Nb four-layer process. We report the design and test results of RDP processor prototypes using the ADP2 cell library. The RDP processors are composed of many arrays of floating-point units (FPUs) and switch networks, and serve as accelerators in a high-performance computing system. The prototypes are composed of two-dimensional arrays of several arithmetic logic units instead of FPUs. The experimental results include a successful demonstration of full operation and reconfiguration in a 2×2 RDP prototype made up of 11.5k junctions at 45 GHz after precise timing design. Partial operation of a 4×4 RDP prototype made up of 28.5k junctions is also demonstrated, indicating the scalability of our timing design.
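
Illustration for the exponential/logarithm paper (E99-C, No: 6, 703-709). The sequential table-lookup (STL) idea can be modeled in software: x is decomposed into a sum of precomputed constants ln(1 + d·2^-i) with signed digits d in {-1, 0, 1}, so exp(x) becomes a product of factors (1 + d·2^-i), each needing only a shift and an add. This is a minimal Python sketch under stated assumptions, not the authors' hardware algorithm: it uses floating-point arithmetic, a greedy digit-selection rule, and an input range of roughly |x| < 0.868, and the names `stl_exp` and `table` are hypothetical.

```python
import math


def stl_exp(x: float, n: int = 40) -> float:
    """Software model of a radix-2 signed-digit sequential table-lookup exp(x).

    Assumptions (illustrative, not from the paper): float arithmetic, greedy
    digit choice, |x| at most the sum of ln(1 + 2**-i) for i = 1..n (about 0.868).
    """
    # Precomputed constants ("table"): ln(1 + d * 2**-i) for digits d in {-1, 0, 1}.
    table = {(i, d): math.log1p(d * 2.0 ** -i) for i in range(1, n + 1) for d in (-1, 0, 1)}
    if abs(x) > sum(table[(i, 1)] for i in range(1, n + 1)):
        raise ValueError("x outside the convergence range of this sketch")
    residual, y = x, 1.0
    for i in range(1, n + 1):
        # Choose the signed digit that leaves the smallest remaining residual.
        d = min((-1, 0, 1), key=lambda t: abs(residual - table[(i, t)]))
        residual -= table[(i, d)]
        # Multiplying y by (1 + d * 2**-i) needs only a shift and an add/subtract.
        y += d * math.ldexp(y, -i)
    return y
```

For example, `stl_exp(0.5)` closely agrees with `math.exp(0.5)`. The redundancy of the signed digits is what lets hardware pick each digit from a rough estimate of the residual; the greedy exact comparison here is a simplification.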
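
Illustration for the matrix-polynomial paper (E101-A, No: 2, 467-471). The O(log N) baseline that radix-based methods improve on is the plain binary (radix-2) scheme. The Python sketch below is that binary scheme only, not Dimitrov and Cooklev's double-base algorithm and not the paper's higher-radix algorithms; the helper names are hypothetical. It scans the bits of N and keeps S(k) = I + A + … + A^(k-1) and A^k, using S(2k) = S(k) + A^k S(k) and S(2k+1) = S(2k) + A^(2k), while counting matrix multiplications so that schemes can be compared.

```python
from typing import List, Tuple

Matrix = List[List[int]]


def identity(n: int) -> Matrix:
    return [[int(i == j) for j in range(n)] for i in range(n)]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def matadd(x: Matrix, y: Matrix) -> Matrix:
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


def geometric_matrix_sum(a: Matrix, n_terms: int) -> Tuple[Matrix, int]:
    """Return (I + A + ... + A^(n_terms - 1), number of matrix multiplications).

    Invariant while scanning the bits of n_terms: s = S(k) and p = A^k.
    """
    if n_terms < 1:
        raise ValueError("n_terms must be >= 1")
    s, p, mults = identity(len(a)), [row[:] for row in a], 0  # k = 1
    for bit in bin(n_terms)[3:]:      # bits after the leading 1
        s = matadd(s, matmul(p, s))   # S(2k) = S(k) + A^k S(k)
        p = matmul(p, p)              # A^(2k)
        mults += 2
        if bit == "1":
            s = matadd(s, p)          # S(2k+1) = S(2k) + A^(2k), no multiplication
            p = matmul(p, a)          # A^(2k+1)
            mults += 1
    return s, mults
```

For N = 13 (binary 1101) this uses 8 multiplications. The last update of `p` is not needed for the result when N is odd, so a tuned version could skip it.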
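
Illustration for the integer-division paper (E89-A, No: 10, 2874-2881). The radix-2 non-restoring recurrence it builds on can be modeled with ordinary binary integers: the sign of the partial remainder decides whether the divisor is subtracted or added at the next step, and only the final remainder needs a correction. This Python sketch deliberately does not model the SD2 (signed-digit) representation, which is what removes carry propagation and divisor normalization in the paper's hardware; `nonrestoring_divide` and its `width` parameter are hypothetical names.

```python
def nonrestoring_divide(dividend: int, divisor: int, width: int = 32):
    """Radix-2 non-restoring division of unsigned integers (conventional binary model).

    Returns (quotient, remainder), matching divmod(dividend, divisor).
    """
    if divisor <= 0 or not 0 <= dividend < (1 << width):
        raise ValueError("need divisor > 0 and a width-bit unsigned dividend")
    r, q = 0, 0
    for i in range(width - 1, -1, -1):
        bit = (dividend >> i) & 1
        # Shift in the next dividend bit; subtract if r >= 0, otherwise add back.
        r = 2 * r + bit - divisor if r >= 0 else 2 * r + bit + divisor
        q = (q << 1) | (1 if r >= 0 else 0)  # quotient bit from the new sign
    if r < 0:
        r += divisor  # single final remainder correction
    return q, r
```

Each step costs one add or subtract regardless of the sign, which is why the non-restoring form suits regular array and sequential dividers.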
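
Illustration for the routing-tool paper (E93-C, No: 4, 435-439). The core of an A*-based router is a best-first search on a routing grid guided by an admissible distance heuristic. The Python sketch below assumes a single routing layer modeled as a unit-cost 4-neighbour grid with blocked cells marked `#`; it does not model the microstrip/strip-line constraints of the SRL 4-Nb layer structure, and `astar_route` is a hypothetical name.

```python
import heapq
from typing import List, Optional, Tuple

Cell = Tuple[int, int]


def astar_route(grid: List[str], start: Cell, goal: Cell) -> Optional[List[Cell]]:
    """Shortest 4-neighbour route on a grid; '#' marks a blocked cell."""
    rows, cols = len(grid), len(grid[0])

    def h(cell: Cell) -> int:
        # Manhattan distance: admissible for unit-cost 4-neighbour moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(h(start), 0, start)]  # entries are (f = g + h, g, cell)
    best_g = {start: 0}
    came_from = {start: None}
    while open_heap:
        _, g, cell = heapq.heappop(open_heap)
        if cell == goal:
            path = []
            while cell is not None:  # follow parent links back to the start
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        if g > best_g[cell]:
            continue  # stale heap entry; a cheaper route was found later
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#":
                if g + 1 < best_g.get(nxt, float("inf")):
                    best_g[nxt] = g + 1
                    came_from[nxt] = cell
                    heapq.heappush(open_heap, (g + 1 + h(nxt), g + 1, nxt))
    return None
```

In a multi-net router, each routed wire would typically mark its cells as blocked before the next net is routed, which is one reason why some wires may remain unrouted.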
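
Illustration for the minimum cut linear arrangement paper (E82-A, No: 5, 767-774). The cutwidth of a linear arrangement is the largest number of edges crossing any gap between consecutive vertices. The Python sketch below computes it for a given order and finds an exact minimum by exhaustive search over all orders; this brute force is exponential and only suits tiny graphs, unlike the paper's dynamic-programming and O(n log n) approximation algorithms, and it works on any graph rather than exploiting p-q dag structure. The function names are hypothetical.

```python
from itertools import permutations


def cutwidth(order, edges) -> int:
    """Max number of edges crossing any gap between consecutive positions."""
    pos = {v: i for i, v in enumerate(order)}
    cuts = [0] * max(len(order) - 1, 0)
    for u, v in edges:
        left, right = sorted((pos[u], pos[v]))
        for gap in range(left, right):  # the edge spans gaps left .. right-1
            cuts[gap] += 1
    return max(cuts, default=0)


def min_cutwidth(vertices, edges):
    """Exact minimum cutwidth by exhaustive search (tiny graphs only)."""
    best = min(permutations(vertices), key=lambda order: cutwidth(order, edges))
    return cutwidth(best, edges), list(best)
```

For a star with one center and four leaves, placing the center first gives cutwidth 4, while placing it in the middle gives the minimum of 2. In a bit-slice layout, the cutwidth corresponds to the number of wiring tracks needed.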
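
Illustration for the dual-rail RSFQ logic synthesis paper (E90-A, No: 1, 257-266). The effect of variable reordering on node count can be seen with an ordinary shared reduced OBDD, in which several functions share one unique table. The Python sketch below is not the RSBDD structure of the paper: it builds a standard shared ROBDD from truth tables and tries every variable order exhaustively, which is feasible only for a few variables. All names are hypothetical.

```python
from itertools import permutations, product
from typing import Dict, Sequence, Tuple

TruthTable = Tuple[int, ...]  # entry i is f(x) where bit v of i gives x_v


def reorder(tt: TruthTable, n: int, order: Sequence[int]) -> TruthTable:
    """Re-list tt so that order[0] is the most significant index variable."""
    return tuple(tt[sum(b << v for b, v in zip(bits, order))]
                 for bits in product((0, 1), repeat=n))


def make_node(t: TruthTable, unique: Dict[tuple, int]) -> int:
    """Build a reduced OBDD node; ids 0 and 1 are terminals, internal ids start at 2."""
    if all(v == t[0] for v in t):
        return t[0]
    half = len(t) // 2
    low, high = make_node(t[:half], unique), make_node(t[half:], unique)
    if low == high:  # redundant test: no node needed
        return low
    key = (len(t), low, high)  # len(t) identifies the variable level
    return unique.setdefault(key, len(unique) + 2)  # merge isomorphic nodes


def shared_bdd_size(tables, n: int, order) -> int:
    """Internal node count of the shared ROBDD for all tables under one order."""
    unique: Dict[tuple, int] = {}
    for tt in tables:
        make_node(reorder(tt, n, order), unique)
    return len(unique)


def best_order(tables, n: int):
    return min(permutations(range(n)), key=lambda o: shared_bdd_size(tables, n, o))
```

For f = x0·x1 + x2·x3, the order (x0, x1, x2, x3) needs 4 internal nodes while (x0, x2, x1, x3) needs 6, which is the kind of difference reordering exploits when node count maps to logic elements.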
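
Illustration for the HMM word-recognition architecture paper (E95-C, No: 4, 456-467). As a software reference for what the hardware computes, an output probability computation (OPC) evaluates the log density of a feature vector under each state's Gaussian, and a likelihood score computation (LSC) runs the Viterbi recursion over the frames. The Python sketch assumes one diagonal-covariance Gaussian per state (the paper's models may differ, e.g. use mixtures) and ignores the block-parallel scheduling across PEs and registers that the paper contributes; function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def log_output_prob(obs: Vector, mean: Vector, var: Vector) -> float:
    """OPC: log density of one frame under a diagonal-covariance Gaussian."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (o - m) ** 2 / v for o, m, v in zip(obs, mean, var)
    )


def viterbi_log_score(frames: List[Vector], log_init: List[float],
                      log_trans: List[List[float]], means: List[Vector],
                      variances: List[Vector]) -> float:
    """LSC: best-path (Viterbi) log-likelihood of the frames for one word model.

    Use float('-inf') in log_init / log_trans for disallowed entries.
    """
    n_states = len(means)
    delta = [log_init[j] + log_output_prob(frames[0], means[j], variances[j])
             for j in range(n_states)]
    for obs in frames[1:]:
        # Best predecessor for each state, then add this frame's output probability.
        delta = [max(delta[i] + log_trans[i][j] for i in range(n_states))
                 + log_output_prob(obs, means[j], variances[j])
                 for j in range(n_states)]
    return max(delta)
```

An isolated-word recognizer would evaluate `viterbi_log_score` for every word model and choose the word with the highest score. Because every frame requires an OPC for every state of every model, these computations dominate the run time.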
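
Illustration for the RSFQ 4-bit bit-slice multiplier paper (E99-C, No: 6, 697-702). The arithmetic identity behind a 4-bit bit-slice multiplier is that 4n-bit operands split into n 4-bit digits, and the product is the sum of 4-bit × 4-bit partial products weighted by 16^(i+j). The Python sketch shows only this digit-slice decomposition, not the paper's systolic-like algorithm or its pulse-level timing; `pipeline_stages` simply encodes the 2n+17 stage count stated in the abstract, and all names are hypothetical.

```python
def split_digits(x: int, n_slices: int, slice_bits: int = 4):
    """Split x into n_slices digits of slice_bits bits, least significant first."""
    mask = (1 << slice_bits) - 1
    return [(x >> (slice_bits * i)) & mask for i in range(n_slices)]


def sliced_multiply(a: int, b: int, n_slices: int, slice_bits: int = 4) -> int:
    """Multiply two (slice_bits * n_slices)-bit unsigned integers slice by slice."""
    limit = 1 << (slice_bits * n_slices)
    if not (0 <= a < limit and 0 <= b < limit):
        raise ValueError("operand too wide")
    product = 0
    for i, a_digit in enumerate(split_digits(a, n_slices, slice_bits)):
        for j, b_digit in enumerate(split_digits(b, n_slices, slice_bits)):
            # Each slice-by-slice partial product is weighted by 2**(slice_bits * (i + j)).
            product += (a_digit * b_digit) << (slice_bits * (i + j))
    return product


def pipeline_stages(n: int) -> int:
    """Stage count quoted in the abstract for a 4n x 4n-bit multiplier."""
    return 2 * n + 17
```

For the 8×8-bit case (n = 2), `pipeline_stages(2)` gives the 21 stages reported for the designed layout.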
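
Illustration for the bit-serial floating-point multiplier paper (E97-C, No: 3, 188-193). A half-precision (IEEE 754 binary16: 1 sign bit, 5 exponent bits with bias 15, 10 fraction bits) multiplier splits into an exponent part and a significand part, matching the two parts tested separately in the abstract. The Python sketch performs this field-level arithmetic with ordinary integers rather than bit-serially, and it simplifies on purpose: normal operands only, no overflow or underflow handling, and truncation instead of IEEE round-to-nearest-even. Function names are hypothetical.

```python
import struct


def half_bits(x: float) -> int:
    """Encode a Python float as an IEEE 754 binary16 bit pattern."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def fp16_multiply_bits(a: int, b: int) -> int:
    """Multiply two normal binary16 numbers given as 16-bit patterns.

    Simplifications (assumptions of this sketch): normal operands only, no
    overflow/underflow handling, truncation instead of round-to-nearest-even.
    """
    sign = ((a ^ b) >> 15) & 1                       # sign: XOR of the sign bits
    ea, eb = (a >> 10) & 0x1F, (b >> 10) & 0x1F      # 5-bit biased exponents
    if not (0 < ea < 31 and 0 < eb < 31):
        raise ValueError("only normal operands are handled")
    ma, mb = 0x400 | (a & 0x3FF), 0x400 | (b & 0x3FF)  # 11-bit significands with hidden 1
    product = ma * mb                                # value in [2**20, 2**22)
    exponent = ea + eb - 15                          # remove one copy of the bias
    if product >= 1 << 21:                           # significand product in [2, 4)
        product >>= 11
        exponent += 1
    else:                                            # significand product in [1, 2)
        product >>= 10
    if not 0 < exponent < 31:
        raise OverflowError("result is not a normal binary16 number")
    return (sign << 15) | (exponent << 10) | (product & 0x3FF)
```

For example, `fp16_multiply_bits(half_bits(1.5), half_bits(2.0)) == half_bits(3.0)`. In a bit-serial design, the exponent addition and the 11-bit significand multiplication would be processed one bit per clock, which is why the two parts can reach different maximum frequencies.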
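
Illustration for the C-to-hardware bit-length paper (E82-A, No: 11, 2338-2346). One common way to estimate the bits a variable really needs is interval (range) propagation: track the minimum and maximum value of each expression and convert the range to a width. The Python sketch applies this to straight-line arithmetic only; it is an assumed stand-in for the paper's control/data-flow-graph analysis and does not handle loops or control flow. `Interval` and `bits` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed integer range [lo, hi] that a variable can take."""
    lo: int
    hi: int

    def __add__(self, other: "Interval") -> "Interval":
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other: "Interval") -> "Interval":
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other: "Interval") -> "Interval":
        corners = [self.lo * other.lo, self.lo * other.hi,
                   self.hi * other.lo, self.hi * other.hi]
        return Interval(min(corners), max(corners))

    def bits(self) -> int:
        """Minimum width: unsigned if lo >= 0, otherwise two's complement."""
        if self.lo >= 0:
            return max(self.hi.bit_length(), 1)
        return max((-self.lo - 1).bit_length(), max(self.hi, 0).bit_length()) + 1
```

For a hypothetical C fragment `unsigned char a, b; int s = a + b; int p = a * b;`, the ranges [0, 510] and [0, 65025] need only 9 and 16 bits instead of the declared 32, which shrinks both the register and the adder or multiplier that produces each value.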
