Naofumi TAKAGI Kazuaki MURAKAMI Akira FUJIMAKI Nobuyuki YOSHIKAWA Koji INOUE Hiroaki HONDA
We propose a desk-side supercomputer with large-scale reconfigurable data-paths (LSRDPs) using superconducting rapid single-flux-quantum (RSFQ) circuits. It has several sets of computing unit which consists of a general-purpose microprocessor, an LSRDP and a memory. An LSRDP consists of a lot of, e.g., a few thousand, floating-point units (FPUs) and operand routing networks (ORNs) which connect the FPUs. We reconfigure the LSRDP to fit a computation, i.e., a group of floating-point operations, which appears in a 'for' loop of numerical programs by setting the route in ORNs before the execution of the loop. We propose to implement the LSRDPs by RSFQ circuits. The processors and the memories can be implemented by semiconductor technology. We expect that a 10 TFLOPS supercomputer, as well as a refrigerating engine, will be housed in a desk-side rack, using a near-future RSFQ process technology, such as 0.35 µm process.
Shuichi NAGASAWA Kenji HINODE Tetsuro SATOH Mutsuo HIDAKA Hiroyuki AKAIKE Akira FUJIMAKI Nobuyuki YOSHIKAWA Kazuyoshi TAKAGI Naofumi TAKAGI
We describe the recent progress on a Nb nine-layer fabrication process for large-scale single flux quantum (SFQ) circuits. A device fabricated in this process is composed of an active layer including Josephson junctions (JJ) at the top, passive transmission line (PTL) layers in the middle, and a DC power layer at the bottom. We describe the process conditions and the fabrication equipment. We use both diagnostic chips and shift register (SR) chips to improve the fabrication process. The diagnostic chip was designed to evaluate the characteristics of basic elements such as junctions, contacts, resisters, and wiring, in addition to their defect evaluations. The SR chip was designed to evaluate defects depending on the size of the SFQ circuits. The results of a long-term evaluation of the diagnostic and SR chips showed that there was fairly good correlation between the defects of the diagnostic chips and yields of the SRs. We could obtain a yield of 100% for SRs including 70,000JJs. These results show that considerable progress has been made in reducing the number of defects and improving reliability.
Kyosuke SANO Masato SUZUKI Kohei MARUYAMA Soya TANIGUCHI Masamitsu TANAKA Akira FUJIMAKI Masumi INOUE Nobuyuki YOSHIKAWA
We have studied on thermally assisted nano-structured transistors made of superconductor ultra-thin films. These transistors potentially work as interface devices for Josephson-CMOS (complementary metal oxide semiconductor) hybrid memory systems, because they can generate a high output voltage of sub-V enough to drive a CMOS transistor. In addition, our superconductor transistors are formed with very fine lines down to several tens of nm in widths, leading to very small foot print enabling us to make large capacity hybrid memories. Our superconductor transistors are made with niobium titanium nitride (NbTiN) thin films deposited on thermally-oxidized silicon substrates, on which other superconductor circuits or semiconductor circuits can be formed. The NbTiN thickness dependence of the critical temperature and of resistivity suggest thermally activated vortex or anti-vortex behavior in pseudo-two-dimensional superconducting films plays an important role for the operating principle of the transistors. To show the potential that the transistors can drive MOS transistors, we analyzed the driving ability of the superconductor transistors with HSPICE simulation. We also showed the turn-on behavior of a MOS transistor used for readout of a CMOS memory cell experimentally. These results showed the high potential of superconductor transistors for Josephson-CMOS hybrid memories.
Nobuyuki YOSHIKAWA Hiroshi TAGO Kaoru YONEYAMA
We have designed rapid single-flux-quantum (RSFQ) adder circuits using two different architectures: one is the conventional architecture employing globally synchronous clocking and the other is the data-driven self-timed (DDST) architecture. It has been pointed out that the timing margin of the RSFQ logic is very sensitive to the circuit parameter variations which are induced by the fabrication process and the device parameter uncertainty. Considering the physical timing in the circuits, we have shown that the DDST architecture is advantageous for realizing RSFQ circuits operating at very high frequencies. We have also calculated the theoretical circuit yield of the DDST adders and shown that a four-bit system operating at 10 GHz is feasible with sufficient operating margin, considering the present 1 kA/cm2 Nb Josephson technology.
Akira FUJIMAKI Masamitsu TANAKA Takahiro YAMADA Yuki YAMANASHI Heejoung PARK Nobuyuki YOSHIKAWA
We describe the development of single-flux-quantum (SFQ) microprocessors and the related technologies such as designing, circuit architecture, microarchitecture, etc. Since the microprocessors studied here aim for a general-purpose computing system, we employ the complexity-reduced (CORE) architecture in which the high-speed nature of the SFQ circuits is used not for increasing processor performance but for reducing the circuit complexity. The bit-serial processing is the most suitable way to realize the CORE architecture. We assembled all the best technologies concerning SFQ integrated circuits and designed the SFQ microprocessors, CORE1α, CORE1β, and CORE1γ. The CORE1β was made up of about 11000 Josephson junctions and successfully demonstrated. The peak performance reached 1400 million operations per second with a power consumption of 3.4 mW. We showed that the SFQ microprocessors had an advantage in a performance density to semiconductor's ones, which lead to the potential for constructing a high performance SFQ-circuit-based computing system.
Yuki YAMANASHI Shohei NISHIMOTO Nobuyuki YOSHIKAWA
A single-flux-quantum (SFQ) arithmetic logic unit (ALU) was designed and tested to evaluate the effectiveness of introducing dynamically reconfigurable logic gates in the design of a superconducting logic circuit. We designed and tested a bit-serial SFQ ALU that can perform six arithmetic/logic functions by using a dynamically reconfigurable AND/OR gate. To ensure stable operation of the ALU, we improved the operating margin of the SFQ AND/OR gate by employing a partially shielded structure where the circuit is partially surrounded by under- and over-ground layers to reduce parasitic inductances. Owing to the introduction of the partially shielded structure, the operating margin of the dynamically reconfigurable AND/OR gate can be improved without increasing the circuit area. This ALU can be designed with a smaller circuit area compared with the conventional ALU by using the dynamically reconfigurable AND/OR gate. We implemented the SFQ ALU using the AIST 2.5kA/cm2 Nb standard process 2. We confirmed high-speed operation and correct reconfiguration of the SFQ ALU by a high-speed test. The measured maximum operation frequency was 30GHz.
Shuichi NAGASAWA Masamitsu TANAKA Naoki TAKEUCHI Yuki YAMANASHI Shigeyuki MIYAJIMA Fumihiro CHINA Taiki YAMAE Koki YAMAZAKI Yuta SOMEI Naonori SEGA Yoshinao MIZUGAKI Hiroaki MYOREN Hirotaka TERAI Mutsuo HIDAKA Nobuyuki YOSHIKAWA Akira FUJIMAKI
We developed a Nb 4-layer process for fabricating superconducting integrated circuits that involves using caldera planarization to increase the flexibility and reliability of the fabrication process. We call this process the planarized high-speed standard process (PHSTP). Planarization enables us to flexibly adjust most of the Nb and SiO2 film thicknesses; we can select reduced film thicknesses to obtain larger mutual coupling depending on the application. It also reduces the risk of intra-layer shorts due to etching residues at the step-edge regions. We describe the detailed process flows of the planarization for the Josephson junction layer and the evaluation of devices fabricated with PHSTP. The results indicated no short defects or degradation in junction characteristics and good agreement between designed and measured inductances and resistances. We also developed single-flux-quantum (SFQ) and adiabatic quantum-flux-parametron (AQFP) logic cell libraries and tested circuits fabricated with PHSTP. We found that the designed circuits operated correctly. The SFQ shift-registers fabricated using PHSTP showed a high yield. Numerical simulation results indicate that the AQFP gates with increased mutual coupling by the planarized layer structure increase the maximum interconnect length between gates.
Nobuyuki YOSHIKAWA Kaoru YONEYAMA
We have developed a parameter optimization tool, Monte Carlo Josephson simulator (MJSIM), for rapid single flux quantum (RSFQ) digital circuits based on a Monte Carlo yield analysis. MJSIM can generate a number of net lists for the JSIM, where all parameter values are varied randomly according to the Gaussian distribution function, and calculate the circuit yields automatically. MJSIM can also produce an improved parameter set using the algorithm of the center-of-gravity method. In this algorithm, an improved parameter vector is derived by calculating the average of parameter vectors inside and outside the operating region. As a case study, we have optimized the circuit parameters of an RS flip-flop, and investigated the validity and efficiency of this optimization method by considering the convergency and initial condition dependence of the final results. We also proposed a method for accelerating the optimization speed by increasing 3σ spreads of the parameter distribution during the optimization.
Xizhu PENG Yuki YAMANASHI Nobuyuki YOSHIKAWA Akira FUJIMAKI Naofumi TAKAGI Kazuyoshi TAKAGI Mutsuo HIDAKA
Recently, we proposed a new data-path architecture, named a large-scale reconfigurable data-path (LSRDP), based on single-flux-quantum (SFQ) circuits, to establish a fundamental technology for future high-end computers. In this architecture, a large number of SFQ floating-point units (FPUs) are used as core components, and their high performance and low power consumption are essential. In this research, we implemented an SFQ half-precision bit-serial floating-point multiplier (FPM) with a target clock frequency of 50GHz, using the AIST 10kA/cm2 Nb process. The FPM was designed, based on a systolic-array architecture. It contains 11,066 Josephson junctions, including on-chip high-speed test circuits. The size and power consumption of the FPM are 6.66mm × 1.92mm and 2.83mW, respectively. Its correct operation was confirmed at a maximum frequency of 93.4GHz for the exponent part and of 72.0GHz for the significand part by on-chip high-speed tests.
Akira FUJIMAKI Masamitsu TANAKA Ryo KASAGI Katsumi TAKAGI Masakazu OKADA Yuhi HAYAKAWA Kensuke TAKATA Hiroyuki AKAIKE Nobuyuki YOSHIKAWA Shuichi NAGASAWA Kazuyoshi TAKAGI Naofumi TAKAGI
We describe a large-scale integrated circuit (LSI) design of rapid single-flux-quantum (RSFQ) circuits and demonstrate several reconfigurable data-path (RDP) processor prototypes based on the ISTEC Advanced Process (ADP2). The ADP2 LSIs are made up of nine Nb layers and Nb/AlOx/Nb Josephson junctions with a critical current density of 10kA/cm2, allowing higher operating frequencies and integration. To realize truly large-scale RSFQ circuits, careful design is necessary, with several compromises in the device structure, logic gates, and interconnects, balancing the competing demands of integration density, design flexibility, and fabrication yield. We summarize numerical and experimental results related to the development of a cell-based design in the ADP2, which features a unit cell size reduced to 30-µm square and up to four strip line tracks in the unit cell underneath the logic gates. The ADP LSIs can achieve ∼10 times the device density and double the operating frequency with the same power consumption per junction as conventional LSIs fabricated using the Nb four-layer process. We report the design and test results of RDP processor prototypes using the ADP2 cell library. The RDP processors are composed of many arrays of floating-point units (FPUs) and switch networks, and serve as accelerators in a high-performance computing system. The prototypes are composed of two-dimensional arrays of several arithmetic logic units instead of FPUs. The experimental results include a successful demonstration of full operation and reconfiguration in a 2×2 RDP prototype made up of 11.5k junctions at 45GHz after precise timing design. Partial operation of a 4×4 RDP prototype made up of 28.5k-junctions is also demonstrated, indicating the scalability of our timing design.
Naoki TAKEUCHI Taiki YAMAE Christopher L. AYALA Hideo SUZUKI Nobuyuki YOSHIKAWA
The adiabatic quantum-flux-parametron (AQFP) is an energy-efficient superconductor logic element based on the quantum flux parametron. AQFP circuits can operate with energy dissipation near the thermodynamic and quantum limits by maximizing the energy efficiency of adiabatic switching. We have established the design methodology for AQFP logic and developed various energy-efficient systems using AQFP logic, such as a low-power microprocessor, reversible computer, single-photon image sensor, and stochastic electronics. We have thus demonstrated the feasibility of the wide application of AQFP logic in future information and communications technology. In this paper, we present a tutorial review on AQFP logic to provide insights into AQFP circuit technology as an introduction to this research field. We describe the historical background, operating principle, design methodology, and recent progress of AQFP logic.
Fumihiro CHINA Naoki TAKEUCHI Hideo SUZUKI Yuki YAMANASHI Hirotaka TERAI Nobuyuki YOSHIKAWA
The adiabatic quantum flux parametron (AQFP) is an energy-efficient, high-speed superconducting logic device. To observe the tiny output currents from the AQFP in experiments, high-speed voltage drivers are indispensable. In the present study, we develop a compact voltage driver for AQFP logic based on a Josephson latching driver (JLD), which has been used as a high-speed driver for rapid single-flux-quantum (RSFQ) logic. In the JLD-based voltage driver, the signal currents of AQFP gates are converted into gap-voltage-level signals via an AQFP/RSFQ interface and a four-junction logic gate. Furthermore, this voltage driver includes only 15 Josephson junctions, which is much fewer than in the case for the previously designed driver based on dc superconducting quantum interference devices (60 junctions). In measurement, we successfully operate the JLD-based voltage driver up to 4 GHz. We also evaluate the bit error rate (BER) of the driver and find that the BER is 7.92×10-10 and 2.67×10-3 at 1GHz and 4GHz, respectively.
Taiki YAMAE Naoki TAKEUCHI Nobuyuki YOSHIKAWA
The adiabatic quantum-flux-parametron (AQFP) is an energy-efficient superconductor logic device. In a previous study, we proposed a low-latency clocking scheme called delay-line clocking, and several low-latency AQFP logic gates have been demonstrated. In delay-line clocking, the latency between adjacent excitation phases is determined by the propagation delay of excitation currents, and thus the rising time of excitation currents should be sufficiently small; otherwise, an AQFP gate can switch before the previous gate is fully excited. This means that delay-line clocking needs high clock frequencies, because typical excitation currents are sinusoidal and the rising time depends on the frequency. However, AQFP circuits need to be tested in a wide frequency range experimentally. Hence, in the present study, we investigate AQFP circuits adopting delay-line clocking with square excitation currents to apply delay-line clocking in a low frequency range. Square excitation currents have shorter rising time than sinusoidal excitation currents and thus enable low frequency operation. We demonstrate an AQFP buffer chain with delay-line clocking using square excitation currents, in which the latency is approximately 20ps per gate, and confirm that the operating margin for the buffer chain is kept sufficiently wide at clock frequencies below 1GHz, whereas in the sinusoidal case the operating margin shrinks below 500MHz. These results indicate that AQFP circuits adopting delay-line clocking can operate in a low frequency range by using square excitation currents.
Masataka SUGAHARA Wataru TAKANO Katsumi NIKI Nobuo HANEJI Nobuyuki YOSHIKAWA Katsuhiro IRIE
The measurement of dielectric property of cytochrome-c3 films reveals unusually large dielectric constant.
Tomoyuki TANAKA Christopher L. AYALA Nobuyuki YOSHIKAWA
Extremely energy-efficient logic devices are required for future low-power high-performance computing systems. Superconductor electronic technology has a number of energy-efficient logic families. Among them is the adiabatic quantum-flux-parametron (AQFP) logic family, which adiabatically switches the quantum-flux-parametron (QFP) circuit when it is excited by an AC power-clock. When compared to state-of-the-art CMOS technology, AQFP logic circuits have the advantage of relatively fast clock rates (5 GHz to 10 GHz) and 5 - 6 orders of magnitude reduction in energy before cooling overhead. We have been developing extremely energy-efficient computing processor components using the AQFP. The adder is the most basic computational unit and is important in the development of a processor. In this work, we designed and measured a 16-bit parallel prefix carry look-ahead Kogge-Stone adder (KSA). We fabricated the circuit using the AIST 10 kA/cm2 High-speed STandard Process (HSTP). Due to a malfunction in the measurement system, we were not able to confirm the complete operation of the circuit at the low frequency of 100 kHz in liquid He, but we confirmed that the outputs that we did observe are correct for two types of tests: (1) critical tests and (2) 110 random input tests in total. The operation margin of the circuit is wide, and we did not observe any calculation errors during measurement.
Akira FUJIMAKI Yoshiaki TAKAI Nobuyuki YOSHIKAWA
We present a design framework of a high-end server based on Single-Flux-Quantum (SFQ) circuit technologies. The server proposed here has multiple microprocessors and memories, which are mounted on a single board or package and are connected each other by SFQ interconnection switches. The extremely large bandwidth up to 100 Gbps/channel in the interconnection will be realized because of high throughput nature of the SFQ circuits. SFQ memories or Josephson-CMOS hybrid memories are employed as the shared memory of the multiprocessor. The SFQ microprocessors are constructed based on the complexity-reduced (CORE) architecture, in which complexity of the system is eased in exchange for using a high clock rate of the SFQ circuits. The processor is so-called Java-processor that directly executes the Java Byte Codes. Assuming a proper advancement of the Nb/AlOx/Nb integrated circuit process technology, we have estimated that the power consumption of the server system including a cryocooler is reduced by a factor of twenty as compared to the future CMOS system with the same processor performance, while the SFQ system has 100 times of magnitude larger memory-processor bandwidth.
Futabako MATSUZAKI Kenichi YODA Junichi KOSHIYAMA Kei MOTOORI Nobuyuki YOSHIKAWA
We have proposed a top-down design methodology for the RSFQ logic circuits based on the Binary Decision Diagram (BDD). In order to show the effectiveness of the methodology, we have designed a small RSFQ microprocessor based on simple architecture. We have compared the performance of the 8-bit RSFQ microprocessor with its CMOS version. It was found that the RSFQ system is superior in terms of the operating speed though it requires extremely large area. We have also implemented and tested a 1-bit ALU that is one of the important components of the microprocessor and confirmed its correct operation.
Tatsuro SUGIURA Yuki YAMANASHI Nobuyuki YOSHIKAWA
A physical random number generator, which generates truly random number trains by using the randomness of physical phenomena, is widely used in the field of cryptographic applications. We have developed an ultra high-speed superconductive physical random number generator that can generate random numbers at a frequency of more than 10 GHz by utilizing the high-speed operation and high-sensitivity of superconductive integrated circuits. In this study, we have statistically evaluated the quality of the random number trains generated by the superconductive physical random number generator. The performances of the statistical tests were based on a test method provided by National Institute of Standards and Technology (NIST). These statistical tests comprised several fundamental tests that were performed to evaluate the random number trains for their utilization in practical cryptographic applications. We have generated 230 random number trains consisting of 20,000-bits by using the superconductive physical random number generator fabricated by the SRL 2.5 kA/cm2 Nb standard process. The generated random number trains passed all the fundamental statistical tests. This result indicates that the superconductive random number generator can be sufficiently utilized in practical applications.