Menglong WU Yongfa XIE Yongchao SHI Jianwen ZHANG Tianao YAO Wenkai LIU
Direct-current biased optical orthogonal frequency division multiplexing (DCO-OFDM) converts bipolar OFDM signals into unipolar non-negative signals by introducing a high DC bias, which satisfies the requirement that the signal transmitted by intensity modulated/direct detection (IM/DD) must be positive. However, the high DC bias results in low power efficiency of DCO-OFDM. An adaptively biased optical OFDM was proposed, which could be designed with different biases according to the signal amplitude to improve power efficiency in this letter. The adaptive bias does not need to be taken off deliberately at the receiver, and the interference caused by the adaptive bias will only be placed on the reserved subcarriers, which will not affect the effective information. Moreover, the proposed OFDM uses Hartley transform instead of Fourier transform used in conventional optical OFDM, which makes this OFDM have low computational complexity and high spectral efficiency. The simulation results show that the normalized optical bit energy to noise power ratio (Eb(opt)/N0) required by the proposed OFDM at the bit error rate (BER) of 10-3 is, on average, 7.5 dB and 3.4 dB lower than that of DCO-OFDM and superimposed asymmetrically clipped optical OFDM (ACO-OFDM), respectively.
Eiji YOSHIYA Tomoya NAKANISHI Tsuyoshi ISSHIKI
In Internet of Things (IoT) applications, system-on-chip (SoCs) with embedded processors are widely used. As an embedded processor, RISC-V, which is license-free and has an extensible instruction set, is receiving attention. However, designing such embedded processors requires an enormous effort to achieve a highly efficient microarchitecture in terms of performance, power consumption, and circuit area, as well as the design verification of running complex software, including modern operating systems such as Linux. In this paper, we propose a method for directly describing the RTL structure of a pipelined RISC-V processor with cache memories, a memory management unit (MMU), and an AXI bus interface using the C++ language. This pipelined processor C++ model serves as a functional simulator of the complete RISC-V core, whereas our C2RTL framework translates the processor C++ model into a cycle-accurate RTL description in the Verilog-HDL and RTL-equivalent C model. Our processor design methodology using the C2RTL framework is unique compared to other existing methodologies because both the simulation and RTL models are derived from the same C++ source, which greatly simplifies the design verification and optimization processes. The effectiveness of our design methodology is demonstrated on a RISC-V processor that runs Linux OS on an FPGA board, achieving a significantly short simulation time of the original C++ processor model and RTL-equivalent C model in comparison to a commercial RTL simulator.
Yusuke YANO Kengo IOKIBE Toshiaki TESHIMA Yoshitaka TOYOTA Toshihiro KATASHITA Yohei HORI
Side-channel (SC) leakage from a cryptographic device chip is simulated as the dynamic current flowing out of the chip. When evaluating the simulated current, an evaluation by comparison with an actual measurement is essential; however, it is difficult to compare them directly. This is because a measured waveform is typically the output voltage of probe placed at the observation position outside the chip, and the actual dynamic current is modified by several transfer impedances. Therefore, in this paper, the probe voltage is converted into the dynamic current by using an EMC macro-model of a cryptographic device being evaluated. This paper shows that both the amplitude and the SC analysis (correlation power analysis and measurements to disclosure) results of the simulated dynamic current were evaluated appropriately by using the EMC macro-model. An evaluation confirms that the shape of the simulated current matches the measured one; moreover, the SC analysis results agreed with the measured ones well. On the basis of the results, it is confirmed that a register-transfer level (RTL) simulation of the dynamic current gives a reasonable estimation of SC traces.
Shogo SEMBA Hiroshi SAITO Masato TATSUOKA Katsuya FUJIMURA
In this paper, we propose four optimization methods during the Register Transfer Level (RTL) conversion from synchronous RTL models into asynchronous RTL models. The modularization of data-path resources and the use of appropriate D flip-flops reduce the circuit area. Fixing the control signal of the multiplexers and inserting latches for the data-path resources reduce the dynamic power consumption. In the experiment, we evaluated the effect of the proposed optimization methods. The combination of all optimization methods could reduce the energy consumption by 21.9% on average compared to the ones without the proposed optimization methods.
In this paper, to make asynchronous circuit design easy, we propose a conversion method from synchronous Register Transfer Level (RTL) models to asynchronous RTL models with bundled-data implementation. The proposed method consists of the generation of an intermediate representation from a given synchronous RTL model and the generation of an asynchronous RTL model from the intermediate representation. This allows us to deal with different representation styles of synchronous RTL models. We use the eXtensible Markup Language (XML) as the intermediate representation. In addition to the asynchronous RTL model, the proposed method generates a simulation model when the target implementation is a Field Programmable Gate Array and a set of non-optimization constraints for the control circuit used in logic synthesis and layout synthesis. In the experiment, we demonstrate that the proposed method can convert synchronous RTL models specified manually and obtained by a high-level synthesis tool to asynchronous ones.
Simulation-based verification of hardware designs, in particular, register-transfer-level (RTL) designs, has been widely used, and has been one of the major bottlenecks in design processes. One of the approaches is coverage-driven verification, of its target is improvement of some metric called coverage. In a prior work of ours, we have proposed a coverage-driven verification using both randomly generated simulation patterns and patterns generated by a SAT (satisfiability) solver, and have shown its effectiveness. In this paper, we extend this approach with an SMT (satisfiability modulo theory) solver, which can handle arithmetic relations among integer, floating-point or bit-vector variables. Experimental results show that the more arithmetic modules are included, the more an SMT-based method gets superior to the method using only a SAT solver.
Shimpei SATO Ryohei KOBAYASHI Kenji KISE
LSIs are generally designed through four stages including architectural design, logic design, circuit design, and physical design. In architectural design and logic design, designers describe their target hardware in RTL. However, they generally use different languages for each phase. Typically a general purpose programming language such as C or C++ and a hardware description language such as Verilog HDL or VHDL are used for architectural design and logic design, respectively. That is time-consuming way for designing a hardware and more efficient design environment is required. In this paper, we propose a new hardware modeling and high-speed simulation environment for architectural design and logic design. Our environment realizes writing and verifying hardware by one language. The environment consists of (1) a new hardware description language called ArchHDL, which enables to simulate hardware faster than Verilog HDL simulation, and (2) a source code translation tool from ArchHDL code to Verilog HDL code. ArchHDL is a new language for hardware RTL modeling based on C++. The key features of this language are that (1) designers describe a combinational circuit as a function and (2) the ArchHDL library realizes non-blocking assignment in C++. Using these features, designers are able to write a hardware transparently from abstracted level description to RTL description in Verilog HDL-like style. Source codes in ArchHDL is converted to Verilog HDL codes by the translation tool and they are used to synthesize for FPGAs or ASICs. As the evaluation of our environment, we implemented a practical many-core processor in ArchHDL and measured the simulation speed on an Intel CPU and an Intel Xeon Phi processor. The simulation speed for the Intel CPU by ArchHDL achieves about 4.5 times faster than the simulation speed by Synopsys VCS. We also confirmed that the RTL simulation by ArchHDL is efficiently parallelized on the Intel Xeon Phi processor. We convert the ArchHDL code to a Verilog HDL code and estimated the hardware utilization on an FPGA. To implement a 48-node many-core processor, 71% of entire resources of a Virtex-7 FPGA are consumed.
Yosuke KAKIUCHI Kiyoharu HAMAGUCHI
Verification of logic designs has been a long-standing bottleneck in the process of hardware design, where its automation and improvement of efficiency has demanding needs. Mainly simulation-based verification has been used for this purpose, and recently, coverage-driven verification has been widely used, of which target is improvement of some metric called coverage. Our target is the metric called toggle coverage. To find input patterns which cause some toggles on each signal, a SAT solver could be used, but this is computationally costly. In this paper, we study the effect of combination of random simulation and usage of a SAT solver. In particular, we use a SAT solver which can find multiple “diverse” solutions. With this solver, we can avoid generating similar patterns, which are unlikely to improve coverage. The experimental results show that, a small number of calls of a SAT solver can improve entire toggle coverage effectively, compared with simple random simulation.
This paper presents an M-channel (M=2n (n ∈ N)) integer discrete cosine transforms (IntDCTs) based on fast Hartley transform (FHT) for lossy-to-lossless image coding which has image quality scalability from lossy data to lossless data. Many IntDCTs with lifting structures have already been presented to achieve lossy-to-lossless image coding. Recently, an IntDCT based on direct-lifting of DCT/IDCT, which means direct use of DCT and inverse DCT (IDCT) to lifting blocks, has been proposed. Although the IntDCT shows more efficient coding performance than any conventional IntDCT, it entails many computational costs due to an extra information that is a key point to realize its direct-lifting structure. On the other hand, the almost conventional IntDCTs without an extra information cannot be easily expanded to a larger size than the standard size M=8, or the conventional IntDCT should be improved for efficient coding performance even if it realizes an arbitrary size. The proposed IntDCT does not need any extra information, can be applied to size M=2n for arbitrary n, and shows better coding performance than the conventional IntDCTs without any extra information by applying the direct-lifting to the pre- and post-processing block of DCT. Moreover, the proposed IntDCT is implemented with a half of the computational cost of the IntDCT based on direct-lifting of DCT/IDCT even though it shows the best coding performance.
Yuya MIYAOKA Yuhei NAGAO Masayuki KUROSAKI Hiroshi OCHI
In this paper, we propose a hardware architecture of high-speed sorted QR decomposition for 44 MIMO wireless communication systems. QR decomposition (QRD) is commonly used in many MIMO detection algorithms. In particular, sorted QR decomposition (SQRD) is the advanced algorithm to improve MIMO detection performance. We design an SQRD hardware architecture by using a modified Gram-Schmidt algorithm with pipelining and recursive processing. In addition, we propose an extended architecture which can decompose an augmented channel matrix for MMSE MIMO detection. These architecture can be applied in high-throughput MIMO-OFDM system such as IEEE802.11n which supports data throughput of up to 600 Mbps. We implement the proposed SQRD architecture and the proposed MMSE-SQRD architecture with 179k and 334k gates in 90 nm CMOS technology. These proposed design can achieve a high performance of up to 40.8 and 50.0 million 44 SQRD operations per second with the maximum operating frequency of 245 and 300 MHz.
Leonardo LANANTE, Jr. Masayuki KUROSAKI Hiroshi OCHI
Conventional algorithms for the joint estimation of carrier frequency offset (CFO) and I/Q imbalance no longer work when the I/Q imbalance depends on the frequency. In order to correct the imbalance across many frequencies, the compensator needed is a filter as opposed to a simple gain and phase compensator. Although, algorithms for estimating the optimal coefficients of this filter exist, their complexity is too high for hardware implementation. In this paper we present a new low complexity algorithm for joint estimation of CFO and frequency dependent I/Q imbalance. For the first part, we derive the estimation scheme using the linear least squares algorithm and examine its floating point performance compared to conventional algorithms. We show that the proposed algorithm can completely eliminate BER floor caused by CFO and I/Q imbalance at a lesser complexity compared to conventional algorithms. For the second part, we examine the hardware complexity in fixed point hardware and latency of the proposed algorithm. Based on BER performance, the circuit needs a wordlength of at least 16 bits in order to properly estimate CFO and I/Q imbalance. In this configuration, the circuit is able to achieve a maximum speed of 115.9 MHz in a Virtex 5 FPGA.
Wahyul Amien SYAFEI Yuhei NAGAO Ryuta IMASHIOYA Masayuki KUROSAKI Baiko SAI Hiroshi OCHI
This paper deals with our works on developing a high-throughput wireless LAN using a group layered space-time (GLST) system with low-complexity MIMO decoder. It achieves the throughput of 600 Mbps for 30 meter propagation distance by utilizing 80 MHz bandwidth in the 5 GHz frequency band. Run test under channel model B of IEEE802.11TGn demonstrates its excellent performance. The register transfer level results show that the developed system is synthesized successfully and the prototyping in the target FPGA chips of Stratix II EP2S180F1508C4 gives the expected results.
Marie Engelene J. OBIEN Satoshi OHTAKE Hideo FUJIWARA
Due to the difficulty of test pattern generation for sequential circuits, several design-for-testability (DFT) approaches have been proposed. An improvement to these current approaches is needed to cater to the requirements of today's more complicated chips. This paper introduces a new DFT method applicable to high-level description of circuits, which optimally utilizes existing functional elements and paths for test. This technique, called F-scan, effectively reduces the hardware overhead due to test without compromising fault coverage. Test application time is also kept at the minimum. The comparison of F-scan with the performance of gate-level full scan design is shown through the experimental results.
Muh-Tian SHIUE Chin-Kuo JAO Pei-Shin CHEN
In this paper, a novel orthogonal frequency-division multiplexing (OFDM) modulator/demodulator based on real-valued discrete Hartley transform (DHT) is presented and implemented for the IEEE 802.11a/g wireless local area network (LAN). Instead of the conventional complex-valued fast Fourier transform (FFT) for OFDM systems, the proposed architecture employs two real-valued fast DHT (FHT) kernels and one post processing unit. By taking advantage of the real-valued operation of FHT, this approach reduces the number of multiplications compared with the radix-2 FFT. The proposed DHT-based modulator/demodulator was designed and fabricated in 0.18-µm CMOS technology with a core area of 928935 µm2. The average power consumption is about 20.16 mW at 20 MHz and 1.8 V supply voltage. Measurement results of the integrated circuit illustrate its superior chip area and power consumption.
This paper introduces a coordinate calculation method for a real-time locating system. A ToA algorithm is used to obtain the target node coordinates, but a conventional DC method, which incurs heavy calculation time, is not suitable for embedded systems. This paper proposes the use of a P-control in the PID control algorithm to resolve real-time locating system issues. Performance measures of the accumulated operator number and position error are evaluated. It is shown that the PID method has less calculation and more robust performance than the DC method.
Bijan ALIZADEH Masahiro FUJITA
In this paper, we introduce a unified framework based on a canonical decision diagram called Horner Expansion Diagram (HED) [1] for the purpose of equivalence checking of datapath oriented hardware designs in various design stages from an algorithmic description to the gate-level implementation. The HED is not only able to represent and manipulate algorithmic specifications in terms of polynomial expressions with modulo equivalence but also express bit level adder (BLA) description of gate-level implementations. Our HED can support modular arithmetic operations over integer rings of the form Z2n. The proposed techniques have successfully been applied to equivalence checking on industrial benchmarks. The experimental results on different applications have shown the significant advantages over existing bit-level and also word-level equivalence checking techniques.
Taekyu KIM Jin LEE Seungbeom LEE Sin-Chong PARK
Tracking a large quantity of moving target tags simultaneously is essential for the localization and guidance of people in welfare facilities like hospitals and sanatoriums for the aged. The locating system using active RFID technology consists of a number of fixed RFID readers and tags carried by the target objects, or senior people. We compare the performances of several determination algorithms which use the power measurement of received signals emitted by the moving active RFID tags. This letter presents a study on the effect of collision in tracking large quantities of objects based on active RFID real time location system (RTLS). Traditional trilateration, fingerprinting, and well-known LANDMARC algorithm are evaluated and compared with varying number of moving tags through the SystemC-based computer simulation. From the simulation, we show the tradeoff relationship between the number of moving tags and estimation accuracy.
Sheng-Lyang JANG Chia-Wei CHANG Sheng-Chien WU Chien-Feng LEE Lin-yen TSAI Jhin-Fang HUANG
Novel low phase noise quadrature voltage-controlled oscillator (QVCO) and quadrature injection locked frequency divider (QILFD) with two coupled Hartley VCOs are proposed and implemented using the standard TSMC 0.18 µm CMOS 1P6M process. The QVCO employs pMOS as the core to reduce the up-conversion of low-frequency device noise to RF phase noise. It uses super-harmonic coupling technique to couple two differential Hartley VCOs and four small-size coupling transistors to set the directivity of quadrature output phases. At the 1.7 V supply voltage, the output phase noise of the QVCO is -124 dBc/Hz at 1 MHz offset frequency from the carrier frequency of 4.12 GHz, and the figure of merit is -185 dBc/Hz. At the supply voltage of 1.7 V, the total power consumption is 13.1 mW. At the supply voltage of 1.5 V, the tuning range of the free-running QILFD is from 2.05 GHz to 2.36 GHz, about 310 MHz, and the locking range of the ILFD is from 3.99 to 5.19 GHz, about 1.20 GHz, at the injection signal power of 0 dBm.
Tetsuji OGAWA Tetsunori KOBAYASHI
A discriminative modeling is applied to optimize the structure of a Partly-Hidden Markov Model (PHMM). PHMM was proposed in our previous work to deal with the complicated temporal changes of acoustic features. It can represent observation dependent behaviors in both observations and state transitions. In the formulation of the previous PHMM, we used a common structure for all models. However, it is expected that the optimal structure which gives the best performance differs from category to category. In this paper, we designed a new structure optimization method in which the dependence of the states and the observations of PHMM are optimally defined according to each model using the weighted likelihood-ratio maximization (WLRM) criterion. The WLRM criterion gives high discriminability between the correct category and the incorrect categories. Therefore it gives model structures with good discriminative performance. We define the model structure combination which satisfy the WLRM criterion for any possible structure combinations as the optimal structures. A genetic algorithm is also applied to the adequate approximation of a full search. With results of continuous lecture talk speech recognition, the effectiveness of the proposed structure optimization is shown: it reduced the word errors compared to HMM and PHMM with a common structure for all models.
Zhiqiang YOU Ken'ichi YAMAGUCHI Michiko INOUE Jacob SAVIR Hideo FUJIWARA
This paper proposes two power-constrained test synthesis schemes and scheduling algorithms, under non-scan BIST, for RTL data paths. The first scheme uses boundary non-scan BIST, and can achieve low hardware overheads. The second scheme uses generic non-scan BIST, and can offer some tradeoffs between hardware overhead, test application time and power dissipation. A designer can easily select an appropriate design parameter based on the desired tradeoff. Experimental results confirm the good performance and practicality of our new approaches.