Huangtao WU Wenjin HUANG Rui CHEN Yihua HUANG
To implement the parallel acceleration of convolution operation of Convolutional Neural Networks (CNNs) on field programmable gate array (FPGA), large quantities of the logic resources will be consumed, expecially DSP cores. Many previous researches fail to make a well balance between DSP and LUT6. For better resource efficiency, a typical convolution structure is implemented with LUT6s in this paper. Besides, a novel convolution structure is proposed to further reduce the LUT6 resource consumption by modifying the typical convolution structure. The equations to evaluate the LUT6 resource consumptions of both structures are presented and validated. The theoretical evaluation and experimental results show that the novel structure can save 3.5-8% of LUT6s compared with the typical structure.
Yusuke KIMURA Amir Masoud GHAREHBAGHI Masahiro FUJITA
This paper introduces methods to modify a buggy sequential gate-level circuit to conform to the specification. In order to preserve the optimization efforts, the modifications should be as small as possible. Assuming that the locations to be modified are given, our proposed method finds an appropriate set of fan-in signals for the patch function of those locations by iteratively calculating the state correspondence between the specification and the buggy circuit and applying a method for debugging combinational circuits. The experiments are conducted on ITC99 benchmark circuits, and it is shown that our proposed method can work when there are at most 30,000 corresponding reachable state pairs between two circuits. Moreover, a heuristic method using the information of data-path FFs is proposed, which can find a correct set of fan-ins for all the benchmark circuits within practical time.
Ryosuke MATSUO Jun SHIOMI Tohru ISHIHARA Hidetoshi ONODERA Akihiko SHINYA Masaya NOTOMI
Optical circuits using nanophotonic devices attract significant interest due to its ultra-high speed operation. As a consequence, the synthesis methods for the optical circuits also attract increasing attention. However, existing methods for synthesizing optical circuits mostly rely on straight-forward mappings from established data structures such as Binary Decision Diagram (BDD). The strategy of simply mapping a BDD to an optical circuit sometimes results in an explosion of size and involves significant power losses in branches and optical devices. To address these issues, this paper proposes a method for reducing the size of BDD-based optical logic circuits exploiting wavelength division multiplexing (WDM). The paper also proposes a method for reducing the number of branches in a BDD-based circuit, which reduces the power dissipation in laser sources. Experimental results obtained using a partial product accumulation circuit used in a 4-bit parallel multiplier demonstrates significant advantages of our method over existing approaches in terms of area and power consumption.
In this paper, a new transceiver system for the in-vehicle communication system is proposed to enhance data transmission rate and timing accuracy in TDM-based application. The proposed system utilizes point-to-point (P2P) channel, a closed-loop clock forwarding path, and a transceiver with a repeater and clock delay adjuster. The proposed system with 4 ECU (Electronic Computing Unit) nodes is implemented in 180nm CMOS technology and, when compared with conventional bus-based system, achieved more than 125 times faster data transmission. The maximum data rate was 2.5Gbps at 1.8V power supply and the worst peak-to-peak jitter for the data and clock signals over 5000 data symbols were about 49.6ps and 9.8ps respectively.
Takahiro OHTOMO Hiroki YAMADA Mamoru SAWAHASHI Keisuke SAITO
In full duplex (FD), which improves the system capacity (or cell throughput) and reduces the transmission delay (or latency) through simultaneous transmission and reception in the same frequency band, self-interference (SI) from the transmitter should be suppressed using antenna isolation, an analog SI canceler, and digital SI canceler (DSIC) to a level such that the data or control channel satisfies the required block error rate (BLER). This paper proposes a structure of iterative DSIC with alternating estimate subtraction (AES-IDSIC) for orthogonal frequency division multiplexing (OFDM) using FD. We first present the required SI suppression level considering SI, quantization noise of an analog-to-digital converter, and nonlinear distortion of a power amplifier and RF receiver circuit for a direct conversion transceiver using FD. Then, we propose an AES-IDSIC structure that iterates the generation of the SI estimate, the downlink symbol estimate, and then alternately removes one of the estimates from the received signal in the downlink including SI. We investigate the average BLER performance of the AES-IDSIC for OFDM using FD in a multipath fading channel based on link-level simulations under the constraint that the derived required signal-to-SI ratio must be satisfied.
Nobutaka KITO Ryota ODAKA Kazuyoshi TAKAGI
A rapid single-flux-quantum (RSFQ) truncated multiplier based on bit-level processing is proposed. In the multiplier, two operands are transformed to two serialized patterns of bits (pulses), and the multiplication is carried out by processing those bits. The result is obtained by counting bits. By calculating in bit-level, the proposed multiplier can be implemented in small area. The gate level design of the multiplier is shown. The layout of the 4-bit multiplier was also designed.
In this paper, to make asynchronous circuit design easy, we propose a conversion method from synchronous Register Transfer Level (RTL) models to asynchronous RTL models with bundled-data implementation. The proposed method consists of the generation of an intermediate representation from a given synchronous RTL model and the generation of an asynchronous RTL model from the intermediate representation. This allows us to deal with different representation styles of synchronous RTL models. We use the eXtensible Markup Language (XML) as the intermediate representation. In addition to the asynchronous RTL model, the proposed method generates a simulation model when the target implementation is a Field Programmable Gate Array and a set of non-optimization constraints for the control circuit used in logic synthesis and layout synthesis. In the experiment, we demonstrate that the proposed method can convert synchronous RTL models specified manually and obtained by a high-level synthesis tool to asynchronous ones.
Shinpei YAMASHITA Michihiko SUHARA Kenichi KAWAGUCHI Tsuyoshi TAKAHASHI Masaru SATO Naoya OKAMOTO Kiyoto ASAKAWA
We fabricate and characterize a GaAsSb/InGaAs backward diode (BWD) toward a realization of high sensitivity zero bias microwave rectification for RF wave energy harvest. Lattice-matched p-GaAsSb/n-InGaAs BWDs were fabricated and their current-voltage (I-V) characteristics and S-parameters up to 67 GHz were measured with respect to several sorts of mesa diameters in μm order. Our theoretical model and analysis are well fitted to the measured I-Vs on the basis of WKB approximation of the transmittance. It is confirmed that the interband tunneling due to the heterojunction is a dominant transport mechanism to exhibit the nonlinear I-V around zero bias regime unlike recombination or diffusion current components on p-n junction contribute in large current regime. An equivalent circuit model of the BWD is clarified by confirming theoretical fitting for frequency dependent admittance up to 67 GHz. From the circuit model, eliminating the parasitic inductance component, the frequency dependence of voltage sensitivity of the BWD rectifier is derived with respect to several size of mesa diameter. It quantitatively suggests an effectiveness of mesa size reduction to enhance the intrinsic matched voltage sensitivity with increasing junction resistance and keeping the magnitude of I-V curvature coefficient.
Arnab MUKHOPADHYAY Tapas Kumar MAITI Sandip BHATTACHARYA Takahiro IIZUKA Hideyuki KIKUCHIHARA Mitiko MIURA-MATTAUSCH Hafizur RAHAMAN Sadayuki YOSHITOMI Dondee NAVARRO Hans Jürgen MATTAUSCH
This report focuses on an optimization scheme of advanced MOSFETs for designing CMOS circuits with high power efficiency. For this purpose the physics-based compact model HiSIM2 is applied so that the relationship between device and circuit characteristics can be investigated properly. It is demonstrated that the short-channel effect, which is usually measured by the threshold-voltage shift relative to long-channel MOSFETs, provides a consistent measure for device-performance degradation with reduced channel length. However, performance degradations of CMOS circuits such as the power loss cannot be predicted by the threshold-voltage shift alone. Here, the subthreshold swing is identified as an additional important measure for power-efficient CMOS circuit design. The increase of the subthreshold swing is verified to become obvious when the threshold-voltage shift is larger than 0.15V.
Daisuke OKAMOTO Hirohito YAMADA
To address the bandwidth bottleneck that exists between LSI chips, we have proposed a novel, high-sensitivity receiver circuit for differential optical transmission on a silicon optical interposer. Both anodes and cathodes of the differential photodiodes (PDs) were designed to be connected to a transimpedance amplifier (TIA) through coupling capacitors. Reverse bias voltage was applied to each of the differential PDs through load resistance. The proposed receiver circuit achieved double the current signal amplitude of conventional differential receiver circuits. The frequency response of the receiver circuit was analyzed using its equivalent circuit, wherein the temperature dependence of the PD was implemented. The optimal load resistances of the PDs were determined to be 5kΩ by considering the tradeoff between the frequency response and bias voltage drop. A small dark current of the PD was important to reduce the voltage drop, but the bandwidth degradation was negligible if the dark current at room temperature was below 1µA. The proposed circuit achieved 3-dB bandwidths of 18.9 GHz at 25°C and 13.7 GHz at 85°C. Clear eye openings in the TIA output waveforms for 25-Gbps 27-1 pseudorandom binary sequence signals were obtained at both temperatures.
A topological quantum circuit is a representation model for topological quantum computation, which attracts much attention recently as a promising fault-tolerant quantum computation model by using 3D cluster states. A topological quantum circuit can be considered as a set of “loops,” and we can transform the topology of loops without changing the functionality of the circuit if the transformation satisfies certain conditions. Thus, there have been proposed many researches to optimize topological quantum circuits by transforming the topology. There are two directions of research to optimize topological quantum circuits. The first group of research considers so-called a placement and wiring problem where we consider how to place “parts” in a 3D space which corresponds to already optimized sub-circuits. The second group of research focuses on how to optimize the structure and locations of loops in a relatively small circuit which is treated as one part in the above-mentioned first group of research. This paper proposes a new idea for the second group of research; our idea is to consider topological transformations as a placement and wiring problem for modules which we derive from the information how loops are crossed. By using such a formulation, we can use the techniques for placement and wiring problems, and successfully obtain an optimized solution. We confirm by our experiment that our method indeed can reduce the cost much more than the method by Paetznick and Fowler.
Soft error jeopardizes the reliability of semiconductor devices, especially those working at low voltage. In recent years, silicon-on-thin-box (SOTB), which is a FD-SOI device, is drawing attention since it is suitable for ultra-low-voltage operation. This work evaluates the contributions of SRAM, FF and combinational circuit to chip-level soft error rate (SER) based on irradiation test results. For this evaluation, this work performed neutron irradiation test for characterizing single event transient (SET) rate of SOTB and bulk circuits at 0.5 V. Using the SBU and MCU data in SRAMs from previous work, we calculated the MBU rate with/without error correcting code (ECC) and with 1/2/4-col MUX interleaving. Combining FF error rates reported in literature, we estimated chip-level SER and each contribution to chip-level SER for embedded and high-performance processors. For both the processors, without ECC, 95% errors occur at SRAM in both SOTB and bulk chips at 0.5 V and 1.0 V, and the overall chip-level SERs of the assumed SOTB chip at 0.5 V is at least 10 x lower than that of bulk chip. On the other hand, when ECC is applied to SRAM in the SOTB chip, SEUs occurring at FFs are dominant in the high-performance processor while MBUs at SRAMs are not negligible in the bulk embedded chips.
Toshinori SATO Tongxin YANG Tomoaki UKEZONO
Approximate computing is a promising paradigm to realize fast, small, and low power characteristics, which are essential for modern applications, such as Internet of Things (IoT) devices. This paper proposes the Carry-Predicting Adder (CPredA), an approximate adder that is scalable relative to accuracy and power consumption. The proposed CPredA improves the accuracy of a previously studied adder by performing carry prediction. Detailed simulations reveal that, compared to the existing approximate adder, accuracy is improved by approximately 50% with comparable energy efficiency. Two application-level evaluations demonstrate that the proposed approximate adder is sufficiently accurate for practical use.
The recent rapid increase in the scale of superconducting quantum computing systems greatly increases the demand for qubit control by digital circuits operating at qubit temperatures. In this paper, superconducting digital circuits, such as single-flux quantum and adiabatic quantum flux parametron circuits are described, that are promising candidates for this purpose. After estimating their energy consumption and speed, a conceptual overview of the superconducting electronics for controlling a multiple-qubit system is provided, as well as some of its component circuits.
In parallel computing systems, the interconnection network forms the critical infrastructure which enables robust and scalable communication between hundreds of thousands of nodes. The traditional packet-switched network tends to suffer from long communication time when network congestion occurs. In this context, we explore the use of circuit switching (CS) to replace packet switches with custom hardware that supports circuit-based switching efficiently with low latency. In our target CS network, a certain amount of bandwidth is guaranteed for each communication pair so that the network latency can be predictable when a limited number of node pairs exchange messages. The number of allocated time slots in every switch is a direct factor to affect the end-to-end latency, we thereby improve the slot utilization and develop a network topology generator to minimize the number of time slots optimized to target applications whose communication patterns are predictable. By a quantitative discrete-event simulation, we illustrate that the minimum necessary number of slots can be reduced to a small number in a generated topology by our design methodology while maintaining network cost 50% less than that in standard tori topologies.
In cloud computing environments, data processing systems with strong and stochastic stream data processing capabilities are highly desired by multi-service oriented computing-intensive applications. The independeny of different business data streams makes these services very suitable for parallel processing with the aid of multicore processors. Furthermore, for the random crossing of data streams between different services, data synchronization is required. Aiming at the stochastic cross service stream, we propose a hardware synchronization mechanism based on index tables. By using a specifically designed hardware synchronization circuit, we can record the business index number (BIN) of the input and output data flow of the processing unit. By doing so, we can not only obtain the flow control of the job package accessing the processing units, but also guarantee that the work of the processing units is single and continuous. This approach overcomes the high complexity and low reliability of the programming in the software synchronization. As demonstrated by numerical experiment results, the proposed scheme can ensure the validity of the cross service stream, and its processing speed is better than that of the lock-based synchronization scheme. This scheme is applied to a cryptographic server and accelerates the processing speed of the cryptographic service.
Katsuya OHISHI Takashi HISAKADO Tohlu MATSUSHIMA Osami WADA
This paper describes the equivalent-circuit model of a metamaterial composed of conducting spheres and wires. This model involves electromagnetic coupling between the conductors, with retardation. The lumped-parameter equivalent circuit, which imports retardation to the electromagnetic coupling, is developed in this paper from Maxwell's equation. Using the equivalent-circuit model, we clarify the relationship between the retardation and radiation loss; we theoretically demonstrate that the electromagnetic retardation in the near-field represents the radiation loss of the meta-atom in the far-field. Furthermore, this paper focuses on the retarded electromagnetic coupling between two meta-atoms; we estimate the changes in the resonant frequencies and the losses due to the distance between the two coupled meta-atoms. It is established that the dependence characteristics are significantly affected by electromagnetic retardation.
Nobuaki KOBAYASHI Tadayoshi ENOMOTO
We developed and applied a new circuit, called the “Self-controllable Voltage Level (SVL)” circuit, not only to expand both “write” and “read” stabilities, but also to achieve a low stand-by power and data holding capability in a single low power supply, 90-nm, 2-kbit, six-transistor CMOS SRAM. The SVL circuit can adaptively lower and higher the word-line voltages for a “read” and “write” operation, respectively. It can also adaptively lower and higher the memory cell supply voltages for the “write” and “hold” operations, and “read” operation, respectively. This paper focuses on the “hold” characteristics and the standby power dissipations (PST) of the developed SRAM. The average PST of the developed SRAM is only 0.984µW, namely, 9.57% of that (10.28µW) of the conventional SRAM at a supply voltage (VDD) of 1.0V. The data hold margin of the developed SRAM is 0.1839V and that of the conventional SRAM is 0.343V at the supply voltage of 1.0V. An area overhead of the SVL circuit is only 1.383% of the conventional SRAM.
Masaya TAMURA Yasumasa NAKA Kousuke MURAI
This paper presents the design of a capacitive coupler for underwater wireless power transfer (U-WPT) focusing on kQ product. Power transfer efficiency hinges on the coupling coefficient k between the couplers and Q-factor of water calculated from the complex permittivity. High efficiency can be achieved by handling k and the Q-factor effectively. First, the pivotal elements on k are derived from the equivalent circuit of the coupler. Next, the frequency characteristic of the Q-factor in tap water is calculated from the measured results. Then, the design parameters in which kQ product has the maximal values are determined. Finally, it is demonstrated that the efficiency of U-WPT with the capacitive coupling designed by our method achieves approximately 80%.
Tadashi KAWAI Kensuke NAGANO Akira ENOKIHARA
This paper presents a lumped-element Wilkinson power divider (WPD) using LC-ladder circuits composed of a capacitor and an inductor, and a series LR/CR circuit. The proposed WPD has only seven elements. As a result of designing the divider based on an even/odd mode analysis technique, we theoretically show that broadband WPDs can be realized compared to lumped-element WPDs composed of Π/T-networks and an isolation resistor. By designing the WPD to match at two operating frequencies, the relative bandwidth of about 42% can be obtained. This value is larger than that of the conventional WPD based on the distributed circuit theory. Electromagnetic simulation and experiment are performed to verify the design procedure for the lumped-element WPD designed at a center frequency of 922.5MHz, and good agreement with both is shown.