IEICE global.ieice.org Site

Author Search Result

[Author] Kazuyoshi TAKAGI(25hit)

1-20hit(25hit)

High-Throughput Rapid Single-Flux-Quantum Circuit Implementations for Exponential and Logarithm Computation Using the Radix-2 Signed-Digit Representation
Masamitsu TANAKA Kazuyoshi TAKAGI Naofumi TAKAGI

PAPER

Vol:
E99-C No:6
Page(s):
703-709
We present circuit implementations for computing exponentials and logarithms suitable for rapid single-flux-quantum (RSFQ) logic. We propose hardware algorithms based on the sequential table-lookup (STL) method using the radix-2 signed-digit representation that achieve high-throughput, digit-serial calculations. The circuits are implemented by processing elements formed in systolic-array-like, regularly-aligned pipeline structures. The processing elements are composed of adders, shifters, and readouts of precomputed constants. The iterative calculations are fully overlapped, and throughputs approach the maximum throughput of serial processing. The circuit size for calculating significand parts is estimated to be approximately 5-10 times larger than that of a bit-serial floating-point adder or multiplier.
Exact Minimization of Free BDDs and Its Application to Pass-Transistor Logic Optimization
Kazuyoshi TAKAGI Hiroshi HATAKEDA Shinji KIMURA Katsumasa WATANABE

PAPER

Vol:
E82-A No:11
Page(s):
2407-2413
In several design methods for Pass-transistor Logic (PTL) circuits, Boolean functions are expressed as OBDDs in decomposed form and then the component OBDDs are directly mapped to PTL cells. The total size of OBDDs (number of nodes) corresponds to the circuit size. In this paper, we investigate a method for PTL synthesis based on exact minimization of Free BDDs (FBDDs). FBDDs are well-studied extension of OBDDs with free variable ordering on each path. We present statistics showing that more than 56% of 616126 NPN-equivalence classes of 5-variable Boolean functions have minimum FBDDs with less size than their OBDDs. This result can be used for PTL synthesis as libraries. We also applied the exact minimization algorithm of FBDDs to the minimization of subcircuits in the synthesis for MCNC benchmarks and found up to 5% size reduction.
Algorithms for Evaluating the Matrix Polynomial I+A+A²+…+A^N-1 with Reduced Number of Matrix Multiplications
Kotaro MATSUMOTO Kazuyoshi TAKAGI Naofumi TAKAGI

PAPER-Algorithms and Data Structures

Vol:
E101-A No:2
Page(s):
467-471
The problem of evaluating the matrix polynomial I+A+A2+…+AN-1 with a reduced number of matrix multiplications has long been considered. Several algorithms have been proposed for this problem, which find a procedure requiring O(log N) matrix multiplications for a given N. Among them, the hybrid algorithm based on the double-base representation of N, i.e., using mixed radices 2 and 3, proposed by Dimitrov and Cooklev is most efficient. It has been suggested by them that the use of higher radices would not bring any more efficient algorithms. In this paper, we show that we can derive more efficient algorithms by using higher radices, and propose several efficient algorithms.
A Hardware Algorithm for Integer Division Using the SD2 Representation
Naofumi TAKAGI Shunsuke KADOWAKI Kazuyoshi TAKAGI

PAPER-VLSI Design Technology and CAD

Vol:
E89-A No:10
Page(s):
2874-2881
A hardware algorithm for integer division is proposed. It is based on the radix-2 non-restoring division algorithm. Fast computation is achieved by the use of the radix-2 signed-digit (SD2) representation. The algorithm does not require normalization of the divisor, and hence, does not require an area-consuming leading-one (or zero) detection nor shifts of variable-amount. Combinational (unfolded) implementation of the algorithm yields a regularly structured array divider, and sequential implementation yields compact dividers.
Automated Passive-Transmission-Line Routing Tool for Single-Flux-Quantum Circuits Based on A* Algorithm
Masamitsu TANAKA Koji OBATA Yuki ITO Shota TAKESHIMA Motoki SATO Kazuyoshi TAKAGI Naofumi TAKAGI Hiroyuki AKAIKE Akira FUJIMAKI

PAPER-Digital Applications

Vol:
E93-C No:4
Page(s):
435-439
We demonstrated an automated passive-transmission-line routing tool for single-flux-quantum (SFQ) circuits. The tool is based on the A* algorithm, which is widely used in CMOS LSI design, and tuned for microstrip/strip lines formed in the SRL 4-Nb layer structure. In large-scale SFQ circuits with 10000-20000 Josephson junctions, such as microprocessors, 80-90% of the wires can be automatically routed in about ten minutes. We verified correct operation above 40 GHz for an automatically routed 44 switch circuit from on-chip high-speed tests. The resulting circuit size and operating frequency were comparable to those of a manually designed result. We believe that the tool is useful for large-scale SFQ circuit design using conventional fabrication processes.
Rapid Single-Flux-Quantum Truncated Multiplier Based on Bit-Level Processing Open Access
Nobutaka KITO Ryota ODAKA Kazuyoshi TAKAGI

BRIEF PAPER-Superconducting Electronics

Vol:
E102-C No:7
Page(s):
607-611
A rapid single-flux-quantum (RSFQ) truncated multiplier based on bit-level processing is proposed. In the multiplier, two operands are transformed to two serialized patterns of bits (pulses), and the multiplication is carried out by processing those bits. The result is obtained by counting bits. By calculating in bit-level, the proposed multiplier can be implemented in small area. The gate level design of the multiplier is shown. The layout of the 4-bit multiplier was also designed.
Minimum Cut Linear Arrangement of p-q Dags for VLSI Layout of Adder Trees
Kazuyoshi TAKAGI Naofumi TAKAGI

PAPER

Vol:
E82-A No:5
Page(s):
767-774
Two algorithms for minimum cut linear arrangement of a class of graphs called p-q dags are proposed. A p-q dag represents the connection scheme of an adder tree, such as Wallace tree, and the VLSI layout problem of a bit slice of an adder tree is treated as the minimum cut linear arrangement problem of its corresponding p-q dag. One of the two algorithms is based on dynamic programming. It calculates an exact minimum solution within nO(1) time and space, where n is the size of a given graph. The other algorithm is an approximation algorithm which calculates a solution with O(log n) cutwidth. It requires O(n log n) time.
Logic Synthesis Method for Dual-Rail RSFQ Digital Circuits Using Root-Shared Binary Decision Diagrams
Koji OBATA Kazuyoshi TAKAGI Naofumi TAKAGI

PAPER-VLSI Design Technology and CAD

Vol:
E90-A No:1
Page(s):
257-266
We propose a new method of logic synthesis for dual-rail RSFQ (rapid single-flux-quantum) digital circuits. RSFQ circuit technology is one of the strongest candidates for the next generation technology of digital circuits. For representing logic functions, we use a root-shared binary decision diagram (RSBDD) which is a directed acyclic graph constructed from binary decision diagrams. In the method, first we construct an RSBDD from given logic functions, and then reduce the number of nodes in the constructed RSBDD by variable re-ordering. Finally, we synthesize a dual-rail RSFQ circuit from the reduced RSBDD. We have implemented the method and have synthesized benchmark circuits. We have synthesized dual-rail circuits that consist of about 27% fewer logic elements than those synthesized by a Transduction-based method on average.
Computational Power of Nondeterministic Ordered Binary Decision Diagrams and Their Subclasses
Kazuyoshi TAKAGI Koyo NITTA Hironori BOUNO Yasuhiko TAKENAGA Shuzo YAJIMA

PAPER

Vol:
E80-A No:4
Page(s):
663-669
Ordered Binary Decision Diagrams (OBDDs) are graph-based representations of Boolean functions which are widely used because of their good properties. In this paper, we introduce nondeterministic OBDDs (NOBDDs) and their restricted forms, and evaluate their expressive power. In some applications of OBDDs, canonicity, which is one of the good properties of OBDDs, is not necessary. In such cases, we can reduce the required amount of storage by using OBDDs in some non-canonical form. A class of NOBDDs can be used as a non-canonical form of OBDDs. In this paper, we focus on two particular methods which can be regarded as using restricted forms of NOBDDs. Our aim is to show how the size of OBDDs can be reduced in such forms from theoretical point of view. Firstly, we consider a method to solve satisfiability problem of combinational circuits using the structure of circuits as a key to reduce the NOBDD size. We show that the NOBDD size is related to the cutwidth of circuits. Secondly, we analyze methods that use OBDDs to represent Boolean functions as sets of product terms. We show that the class of functions treated feasibly in this representation strictly contains that in OBDDs and contained by that in NOBDDs.
A VLSI Architecture with Multiple Fast Store-Based Block Parallel Processing for Output Probability and Likelihood Score Computations in HMM-Based Isolated Word Recognition
Kazuhiro NAKAMURA Ryo SHIMAZAKI Masatoshi YAMAMOTO Kazuyoshi TAKAGI Naofumi TAKAGI

PAPER

Vol:
E95-C No:4
Page(s):
456-467
This paper presents a memory-efficient VLSI architecture for output probability computations (OPCs) of continuous hidden Markov models (HMMs) and likelihood score computations (LSCs). These computations are the most time consuming part of HMM-based isolated word recognition systems. We demonstrate multiple fast store-based block parallel processing (MultipleFastStoreBPP) for OPCs and LSCs and present a VLSI architecture that supports it. Compared with conventional fast store-based block parallel processing (FastStoreBPP) and stream-based block parallel processing (StreamBPP) architectures, the proposed architecture requires fewer registers and less processing time. The processing elements (PEs) used in the FastStoreBPP and StreamBPP architectures are identical to those used in the MultipleFastStoreBPP architecture. From a VLSI architectural viewpoint, a comparison shows that the proposed architecture is an improvement over the others, through efficient use of PEs and registers for storing input feature vectors.
A Clock Scheduling Algorithm for High-Throughput RSFQ Digital Circuits
Koji OBATA Kazuyoshi TAKAGI Naofumi TAKAGI

PAPER-VLSI Design Technology and CAD

Vol:
E91-A No:12
Page(s):
3772-3782
An algorithm for clock scheduling of concurrent-flow clocking rapid single-flux-quantum (RSFQ) digital circuits is proposed. RSFQ circuit technology is an emerging technology of digital circuits. In concurrent-flow clocking RSFQ digital circuits, all logic gates are driven by clock pulses. Appropriate clock scheduling makes clock frequency of the circuits higher. Given a clock period, the proposed algorithm determines the arrival time of clock pulses and the delay that should be inserted. Experimental results show that inserted delay elements by the proposed algorithm are 59.0% fewer and the height of clock trees are 40.4% shorter on average than those by a straightforward algorithm. The proposed algorithm can also be used to minimize the clock period, thus obtaining 19.0% shorter clock periods on average.
Nb 9-Layer Fabrication Process for Superconducting Large-Scale SFQ Circuits and Its Process Evaluation Open Access
Shuichi NAGASAWA Kenji HINODE Tetsuro SATOH Mutsuo HIDAKA Hiroyuki AKAIKE Akira FUJIMAKI Nobuyuki YOSHIKAWA Kazuyoshi TAKAGI Naofumi TAKAGI

INVITED PAPER

Vol:
E97-C No:3
Page(s):
132-140
We describe the recent progress on a Nb nine-layer fabrication process for large-scale single flux quantum (SFQ) circuits. A device fabricated in this process is composed of an active layer including Josephson junctions (JJ) at the top, passive transmission line (PTL) layers in the middle, and a DC power layer at the bottom. We describe the process conditions and the fabrication equipment. We use both diagnostic chips and shift register (SR) chips to improve the fabrication process. The diagnostic chip was designed to evaluate the characteristics of basic elements such as junctions, contacts, resisters, and wiring, in addition to their defect evaluations. The SR chip was designed to evaluate defects depending on the size of the SFQ circuits. The results of a long-term evaluation of the diagnostic and SR chips showed that there was fairly good correlation between the defects of the diagnostic chips and yields of the SRs. We could obtain a yield of 100% for SRs including 70,000JJs. These results show that considerable progress has been made in reducing the number of defects and improving reliability.
Circuit Description and Design Flow of Superconducting SFQ Logic Circuits Open Access
Kazuyoshi TAKAGI Nobutaka KITO Naofumi TAKAGI

INVITED PAPER

Vol:
E97-C No:3
Page(s):
149-156
Superconducting Single-Flux-Quantum (SFQ) devices have been paid much attention as alternative devices for digital circuits, because of their high switching speed and low power consumption. For large-scale circuit design, the role of computer-aided design environment is significant. As the characteristics of the SFQ devices are different from conventional devices, a new design environment is required. In this paper, we propose a new timing-aware circuit description method which can be used for SFQ circuit design. Based on the description and the dedicated algorithms we have been developing for SFQ logic circuit design, we propose an integrated design flow for SFQ logic circuits. We have designed a circuit using our developed design tools along with the design flow and demonstrated the correct operation.
Nested Loop Parallelization Using Polyhedral Optimization in High-Level Synthesis
Akihiro SUDA Hideki TAKASE Kazuyoshi TAKAGI Naofumi TAKAGI

PAPER-High-Level Synthesis and System-Level Design

Vol:
E97-A No:12
Page(s):
2498-2506
We propose a synthesis method of nested loops into parallelized circuits by integrating the polyhedral optimization, which is a state-of-the-art technique in the field of software, into high-level synthesis. Our method constructs circuits equipped with multiple processing elements (PEs), using information generated by the polyhedral optimizing compiler. Since multiple PEs cannot concurrently access the off-chip RAM, a method for constructing on-chip buffers is also proposed. Our buffering method reduces the off-chip RAM access conflicts and further enables burst accesses and data reuses. In our experimental result, the buffered circuits generated by our method are 8.2 times on average and 26.5 times at maximum faster than the sequential non-buffered ones, when each of the parallelized circuits is configured with eight PEs.
Timing Verification of Sequential Logic Circuits Based on Controlled Multi-Clock Path Analysis
Kazuhiro NAKAMURA Shinji KIMURA Kazuyoshi TAKAGI Katsumasa WATANABE

PAPER-Timing Verification and Optimization

Vol:
E81-A No:12
Page(s):
2515-2520
This paper introduces a new kind of false path, which is sensitizable but does not affect the decision of the maximum clock frequency. Such false paths exist in multi-clock operations controlled by waiting states, and the delay time of these paths can be greater than the clock period. This paper proposes a method to detect these waiting false paths based on the symbolic state traversal. In this method, the maximum allowable clock cycle of each path is computed using update cycles of each register.
A Verification Method for Single-Flux-Quantum Circuits Using Delay-Based Time Frame Model
Takahiro KAWAGUCHI Kazuyoshi TAKAGI Naofumi TAKAGI

PAPER-Logic Synthesis, Test and Verification

Vol:
E98-A No:12
Page(s):
2556-2564
Superconducting single-flux-quantum (SFQ) device is an emerging device which can realize digital circuits with high switching speed and low power consumption. In SFQ digital circuits, voltage pulses are used for carrier of information, and the representation of logic values is different from that of CMOS circuits. Design methods exclusive to SFQ circuits have been developed. In this paper, we present timing analysis and functional verification methods for SFQ circuits based on new timing model which we call delay-based time frame model. Assuming that possible pulse arrival is periodic, the model defines comprehensive time frames and representation of logic values. In static timing analysis, expected pulse arrival time is checked based on the model, and the order among pulse arrival times is calculated for each logic gate. In functional verification, the circuit behavior is abstracted in a form similar to a synchronous sequential circuit using the order of pulse arrival times, and then the behavior is verified using formal verification tools. Using our proposed methods, we can verify the functional behavior of SFQ circuits with complex clocking scheme, which appear often in practical design but cannot be dealt with in existing verification method. Experimental results show that our method can be applied to practical designs.
RSFQ 4-bit Bit-Slice Integer Multiplier
Guang-Ming TANG Kazuyoshi TAKAGI Naofumi TAKAGI

PAPER

Vol:
E99-C No:6
Page(s):
697-702
A rapid single-flux-quantum (RSFQ) 4-bit bit-slice multiplier is proposed. A new systolic-like multiplication algorithm suitable for RSFQ implementation is developed. The multiplier is designed using the cell library for AIST 10-kA/cm2 1.0-µm fabrication technology (ADP2). Concurrent flow clocking is used to design a fully pipelined RSFQ logic design. A 4n×4n-bit multiplier consists of 2n+17 stages. For verifying the algorithm and the logic design, a physical layout of the 8×8-bit multiplier has been designed with target operating frequency of 50GHz and simulated. It consists of 21 stages and 11,488 Josephson junctions. The simulation results show correct operation up to 62.5GHz.
Design and High-Speed Demonstration of Single-Flux-Quantum Bit-Serial Floating-Point Multipliers Using a 10kA/cm² Nb Process
Xizhu PENG Yuki YAMANASHI Nobuyuki YOSHIKAWA Akira FUJIMAKI Naofumi TAKAGI Kazuyoshi TAKAGI Mutsuo HIDAKA

PAPER

Vol:
E97-C No:3
Page(s):
188-193
Recently, we proposed a new data-path architecture, named a large-scale reconfigurable data-path (LSRDP), based on single-flux-quantum (SFQ) circuits, to establish a fundamental technology for future high-end computers. In this architecture, a large number of SFQ floating-point units (FPUs) are used as core components, and their high performance and low power consumption are essential. In this research, we implemented an SFQ half-precision bit-serial floating-point multiplier (FPM) with a target clock frequency of 50GHz, using the AIST 10kA/cm2 Nb process. The FPM was designed, based on a systolic-array architecture. It contains 11,066 Josephson junctions, including on-chip high-speed test circuits. The size and power consumption of the FPM are 6.66mm × 1.92mm and 2.83mW, respectively. Its correct operation was confirmed at a maximum frequency of 93.4GHz for the exponent part and of 72.0GHz for the significand part by on-chip high-speed tests.
Hardware Synthesis from C Programs with Estimation of Bit Length of Variables
Osamu OGAWA Kazuyoshi TAKAGI Yasufumi ITOH Shinji KIMURA Katsumasa WATANABE

PAPER

Vol:
E82-A No:11
Page(s):
2338-2346
In the hardware synthesis methods with high level languages such as C language, optimization quality of the compilers has a great influence on the area and speed of the synthesized circuits. Among hardware-oriented optimization methods required in such compilers, minimization of the bit length of the data-paths is one of the most important issues. In this paper, we propose an estimation algorithm of the necessary bit length of variables for this aim. The algorithm analyzes the control/data-flow graph translated from C programs and decides the bit length of each variable. On several experiments, the bit length of variables can be reduced by half with respect to the declared length. This method is effective not only for reducing the circuit area but also for reducing the delay of the operation units such as adders.
Large-Scale Integrated Circuit Design Based on a Nb Nine-Layer Structure for Reconfigurable Data-Path Processors Open Access
Akira FUJIMAKI Masamitsu TANAKA Ryo KASAGI Katsumi TAKAGI Masakazu OKADA Yuhi HAYAKAWA Kensuke TAKATA Hiroyuki AKAIKE Nobuyuki YOSHIKAWA Shuichi NAGASAWA Kazuyoshi TAKAGI Naofumi TAKAGI

INVITED PAPER

Vol:
E97-C No:3
Page(s):
157-165
We describe a large-scale integrated circuit (LSI) design of rapid single-flux-quantum (RSFQ) circuits and demonstrate several reconfigurable data-path (RDP) processor prototypes based on the ISTEC Advanced Process (ADP2). The ADP2 LSIs are made up of nine Nb layers and Nb/AlOx/Nb Josephson junctions with a critical current density of 10kA/cm2, allowing higher operating frequencies and integration. To realize truly large-scale RSFQ circuits, careful design is necessary, with several compromises in the device structure, logic gates, and interconnects, balancing the competing demands of integration density, design flexibility, and fabrication yield. We summarize numerical and experimental results related to the development of a cell-based design in the ADP2, which features a unit cell size reduced to 30-µm square and up to four strip line tracks in the unit cell underneath the logic gates. The ADP LSIs can achieve ∼10 times the device density and double the operating frequency with the same power consumption per junction as conventional LSIs fabricated using the Nb four-layer process. We report the design and test results of RDP processor prototypes using the ADP2 cell library. The RDP processors are composed of many arrays of floating-point units (FPUs) and switch networks, and serve as accelerators in a high-performance computing system. The prototypes are composed of two-dimensional arrays of several arithmetic logic units instead of FPUs. The experimental results include a successful demonstration of full operation and reconfiguration in a 2×2 RDP prototype made up of 11.5k junctions at 45GHz after precise timing design. Partial operation of a 4×4 RDP prototype made up of 28.5k-junctions is also demonstrated, indicating the scalability of our timing design.

1-20hit(25hit)

Author Search Result

[Author] Kazuyoshi TAKAGI(25hit)

High-Throughput Rapid Single-Flux-Quantum Circuit Implementations for Exponential and Logarithm Computation Using the Radix-2 Signed-Digit Representation

Exact Minimization of Free BDDs and Its Application to Pass-Transistor Logic Optimization

Algorithms for Evaluating the Matrix Polynomial I+A+A²+…+A^N-1 with Reduced Number of Matrix Multiplications

A Hardware Algorithm for Integer Division Using the SD2 Representation

Automated Passive-Transmission-Line Routing Tool for Single-Flux-Quantum Circuits Based on A* Algorithm

Rapid Single-Flux-Quantum Truncated Multiplier Based on Bit-Level Processing Open Access

Minimum Cut Linear Arrangement of p-q Dags for VLSI Layout of Adder Trees

Logic Synthesis Method for Dual-Rail RSFQ Digital Circuits Using Root-Shared Binary Decision Diagrams

Computational Power of Nondeterministic Ordered Binary Decision Diagrams and Their Subclasses

A VLSI Architecture with Multiple Fast Store-Based Block Parallel Processing for Output Probability and Likelihood Score Computations in HMM-Based Isolated Word Recognition

A Clock Scheduling Algorithm for High-Throughput RSFQ Digital Circuits

Nb 9-Layer Fabrication Process for Superconducting Large-Scale SFQ Circuits and Its Process Evaluation Open Access

Circuit Description and Design Flow of Superconducting SFQ Logic Circuits Open Access

Nested Loop Parallelization Using Polyhedral Optimization in High-Level Synthesis

Timing Verification of Sequential Logic Circuits Based on Controlled Multi-Clock Path Analysis

A Verification Method for Single-Flux-Quantum Circuits Using Delay-Based Time Frame Model

RSFQ 4-bit Bit-Slice Integer Multiplier

Design and High-Speed Demonstration of Single-Flux-Quantum Bit-Serial Floating-Point Multipliers Using a 10kA/cm² Nb Process

Hardware Synthesis from C Programs with Estimation of Bit Length of Variables

Large-Scale Integrated Circuit Design Based on a Nb Nine-Layer Structure for Reconfigurable Data-Path Processors Open Access

Latest Issue

Links

Call for Papers

Submit to IEICE Trans.

Transactions NEWS

Popular articles