
IEICE TRANSACTIONS on Fundamentals

  • Impact Factor

    0.40

  • Eigenfactor

    0.003

  • article influence

    0.1

  • Cite Score

    1.1


Volume E95-A No.12  (Publication Date:2012/12/01)

    Special Section on Information Theory and Its Applications
  • FOREWORD

    Hiroshi KAMABE  

     
    FOREWORD

      Page(s):
    2099-2099
  • On the Achievable Rate Region in the Optimistic Sense for Separate Coding of Two Correlated General Sources

    Hiroki KOGA  

     
    PAPER-Source Coding

      Page(s):
    2100-2106

    This paper is concerned with coding theorems in the optimistic sense for separate coding of two correlated general sources X1 and X2. We investigate the achievable rate region Ropt(X1,X2) such that the decoding error probability caused by two encoders and one decoder can be made arbitrarily small infinitely often under a certain rate constraint. We give inner and outer bounds on Ropt(X1,X2), where the outer bound is described by new information-theoretic quantities. We also give two simple sufficient conditions under which the inner bound coincides with the outer bound.
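
    For orientation only: in the classical special case of i.i.d. sources, the achievable region for separate coding is the Slepian-Wolf region R1 >= H(X1|X2), R2 >= H(X2|X1), R1 + R2 >= H(X1,X2). The sketch below checks membership in that region for an assumed doubly symmetric binary source (X2 = X1 XOR Z with Z ~ Bernoulli(p)); it does not model the general-source, optimistic-sense setting studied in the paper, and the function names are hypothetical.

```python
import math


def h2(p):
    """Binary entropy function in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def dsbs_entropies(p):
    """H(X1|X2), H(X2|X1), H(X1,X2) for a doubly symmetric binary source:
    X1 ~ Bernoulli(1/2), X2 = X1 XOR Z, Z ~ Bernoulli(p) independent of X1."""
    hc = h2(p)
    return hc, hc, 1.0 + hc


def in_slepian_wolf_region(r1, r2, p):
    """True if the rate pair (r1, r2) lies in the i.i.d. Slepian-Wolf region."""
    h1_given_2, h2_given_1, h_joint = dsbs_entropies(p)
    return r1 >= h1_given_2 and r2 >= h2_given_1 and r1 + r2 >= h_joint


print(dsbs_entropies(0.11))
print(in_slepian_wolf_region(1.0, 0.5, 0.11))   # near a corner point
print(in_slepian_wolf_region(0.7, 0.7, 0.11))   # sum rate too small
```

    With p = 0.11 the conditional entropy is close to 0.5 bit, so separate encoders need a sum rate of about 1.5 bits per symbol rather than the 2 bits needed without exploiting the correlation.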

  • Construction of Independent Set and Its Application for Designed Minimum Distance

    Junru ZHENG  Takayasu KAIDA  

     
    PAPER-Coding Theory

      Page(s):
    2107-2112

    The shift bound is a good lower bound on the minimum distance of cyclic codes, Reed-Muller codes and geometric Goppa codes. Obtaining it requires finding an independent set of maximum size, and the computational complexity of doing so is very large. In this paper, we consider cyclic codes specified by their defining sets and present a new method for calculating a lower bound on the minimum distance using the discrete Fourier transform (DFT). The computational complexity of this method is compared with that of the shift bound. Moreover, a construction of independent sets is shown.
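
    As a point of reference, the simplest distance bound computed from a defining set is the BCH bound: if the defining set Z of a length-n cyclic code contains delta - 1 elements a, a+b, ..., a+(delta-2)b (mod n) with gcd(b, n) = 1, then d >= delta. The shift bound is known to be at least as strong as this bound. The sketch below evaluates only the BCH bound by brute force; it is not the DFT-based method of the paper, and `bch_bound` is a hypothetical helper name.

```python
from math import gcd


def bch_bound(n, defining_set):
    """BCH lower bound on the minimum distance of a length-n cyclic code.

    Searches every step b coprime to n for the longest run
    a, a+b, a+2b, ... (mod n) contained in the defining set."""
    z = {i % n for i in defining_set}
    if len(z) >= n:
        raise ValueError("defining set must be a proper subset of Z_n")
    best = 0
    for b in range(1, n):
        if gcd(b, n) != 1:
            continue
        for a in z:
            if (a - b) % n in z:      # not the start of a maximal run
                continue
            length, x = 0, a
            while x in z:             # terminates: some element of Z_n is outside z
                length += 1
                x = (x + b) % n
            best = max(best, length)
    return best + 1


# Binary (15, 7) BCH code: union of cyclotomic cosets of 1 and 3 mod 15
print(bch_bound(15, {1, 2, 4, 8, 3, 6, 12, 9}))
# Binary (7, 4) Hamming code: cyclotomic coset of 1 mod 7
print(bch_bound(7, {1, 2, 4}))
```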

  • Analysis of Error Floors for Non-binary LDPC Codes over General Linear Group through q-Ary Memoryless Symmetric Channels

    Takayuki NOZAKI  Kenta KASAI  Kohichi SAKANIWA  

     
    PAPER-Coding Theory

      Page(s):
    2113-2121

    In this paper, we compare the decoding error rates in the error floors of non-binary low-density parity-check (LDPC) codes over general linear groups with those of non-binary LDPC codes over finite fields, transmitted through q-ary memoryless symmetric channels under belief propagation decoding. To analyze non-binary LDPC codes defined over both the general linear group GL(m, F_2) and the finite field F_{2^m}, we investigate non-binary LDPC codes defined over GL(m3, F2m4). We propose a method to lower the error floors of non-binary LDPC codes. This analysis shows that non-binary LDPC codes over general linear groups constructed by our proposed method have the same decoding performance in the error floors as those defined over finite fields. Moreover, non-binary LDPC codes defined over general linear groups offer more choices of edge labels that satisfy the condition for the optimization.

  • Simple Nonbinary Coding Strategy for Very Noisy Relay Channels

    Puripong SUTHISOPAPAN  Kenta KASAI  Anupap MEESOMBOON  Virasit IMTAWIL  Kohichi SAKANIWA  

     
    PAPER-Coding Theory

      Page(s):
    2122-2129

    From an information-theoretic point of view, it is well known that the capacity of relay channels comprising three terminals is much greater than that of two-terminal direct channels, especially in the low-SNR region. Previously proposed relay coding strategies have not been designed to achieve this relaying gain in the low-SNR region. In this paper, we propose a new simple coding strategy for a relay channel with low SNR or, equivalently, for a very noisy relay channel. The strategy is built on multiplicative repetition. We regard the proposed strategy as simple because the destination and the relay can decode with almost the same computational complexity by sharing the same decoder structure. We also suggest an appropriate static power allocation that yields a maximum throughput close to the optimal one at low SNRs. Under practical constraints such as equal time-sharing, the asymptotic performance of this simple strategy is within 0.5 dB of the achievable rate of a relay channel. Furthermore, at a length of a few thousand bits, the strategy achieves a relaying gain of approximately 1 dB.

  • The Expected Write Deficiency of Index-Less Indexed Flash Codes

    Yuichi KAJI  

     
    PAPER-Coding Theory

      Page(s):
    2130-2138

    The expected write deficiency of index-less indexed flash codes (ILIFC) is studied. ILIFC is a coding scheme for flash memory that consists of two stages with different coding techniques. This study investigates the write deficiency of the first stage of ILIFC and shows that omitting the second stage can be a practical option for realizing flash codes with good average performance. To discuss the expected write deficiency of ILIFC, a random walk model is introduced as a formalization of the behavior of ILIFC. Based on this model, two different techniques are developed to estimate the expected write deficiency. One technique requires some computation but gives a very precise estimate of the write deficiency. The other gives a closed-form formula for the write deficiency under a certain asymptotic scenario.

  • Parameterization of Perfect Sequences over a Composition Algebra

    Takao MAEDA  Takafumi HAYASHI  

     
    PAPER-Sequence

      Page(s):
    2139-2147

    A parameterization of perfect sequences over composition algebras over the real number field is presented. According to the proposed parameterization theorem, a perfect sequence can be represented as a sum of trigonometric functions and points on a unit sphere of the algebra. Because of the non-commutativity of the multiplication, there are two definitions of perfect sequences, but the equivalence of the definitions is easily shown using the theorem. A composition sequence of sequences is introduced. Despite the non-associativity, the proposed theorem reveals that the composition sequence from perfect sequences is perfect.
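
    The complex numbers are the two-dimensional composition algebra over the real field, so the familiar complex perfect sequences are the simplest instance of the objects parameterized here. The sketch below assumes the definition "periodic autocorrelation sum_k x[k] * conj(x[k+s]) vanishes for every nonzero shift s" and checks it for a Zadoff-Chu sequence of odd length. Over non-commutative algebras such as the quaternions, the order of the two factors in each product matters, which is why two definitions arise; only the commutative complex case is shown, and the helper names are hypothetical.

```python
import cmath


def zadoff_chu(u, n):
    """Zadoff-Chu sequence of odd length n with root u, gcd(u, n) == 1."""
    return [cmath.exp(-1j * cmath.pi * u * k * (k + 1) / n) for k in range(n)]


def periodic_autocorrelation(seq, shift):
    """sum_k seq[k] * conj(seq[(k + shift) mod n])."""
    n = len(seq)
    return sum(seq[k] * seq[(k + shift) % n].conjugate() for k in range(n))


def is_perfect(seq, tol=1e-9):
    """True if every out-of-phase periodic autocorrelation is (numerically) zero."""
    return all(abs(periodic_autocorrelation(seq, s)) < tol
               for s in range(1, len(seq)))


print(is_perfect(zadoff_chu(1, 13)))   # perfect
print(is_perfect([1, 1, -1]))          # not perfect
```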

  • Cryptanalysis of Stream Ciphers from a New Aspect: How to Apply Key Collisions to Key Recovery Attack

    Jiageng CHEN  Atsuko MIYAJI  

     
    PAPER-Cryptography

      Page(s):
    2148-2159

    In this paper, we propose two new attacks against the stream cipher RC4 that can recover secret keys of different lengths with a practical amount of computation. However, we must point out that the proposed attacks are performed under relatively strong related-key models. As in the usual related-key models, the adversary can specify the key differentials without knowing the target key information. However, our attacks require the attacker to know only the relation between two keystream outputs or between the two final internal states. In addition, we discover a statistical bias of RC4 that is the key to one of the attacks. Apart from its inappropriate use in the WEP environment, RC4 is still considered secure with proper settings, and we believe the result of this paper will add to the understanding of RC4 and of how to use it correctly and safely.
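
    For reference, the sketch below implements the standard RC4 key-scheduling algorithm (KSA) and output generator (PRGA). The permutation returned by the KSA is the internal state that a key collision, in the usual sense of the term, makes identical for two different keys, so the two keys then produce the same keystream. This is the plain cipher only, not the attacks of the paper, and it is not meant for production use.

```python
def rc4_ksa(key):
    """Key-scheduling algorithm: returns the initial permutation S of 0..255."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    return s


def rc4_keystream(key, n):
    """Pseudo-random generation algorithm: first n keystream bytes for key."""
    s = rc4_ksa(key)
    i = j = 0
    out = bytearray()
    for _ in range(n):
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(s[(s[i] + s[j]) % 256])
    return bytes(out)


def rc4_encrypt(key, plaintext):
    """XOR the plaintext with the keystream (encryption and decryption are identical)."""
    ks = rc4_keystream(key, len(plaintext))
    return bytes(p ^ k for p, k in zip(plaintext, ks))


print(rc4_keystream(b"Key", 10).hex())
print(rc4_encrypt(b"Key", b"Plaintext").hex())
```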

  • Two-Dimensional Optical CDMA Systems Based on MWOOC with Generalized Prime Sequences

    Agus SUSILO  Tomoko K. MATSUSHIMA  Yasuaki TERAMACHI  

     
    PAPER-Spread Spectrum

      Page(s):
    2160-2167

    Two-dimensional (2-D) codes for optical code-division multiple access (O-CDMA) systems can increase the number of subscribers and simultaneous users compared with one-dimensional time-spreading codes. The multiple-wavelength optical orthogonal code (MWOOC), one of the 2-D codes, uses prime sequences as a wavelength-hopping code and an optical orthogonal code (OOC) as a time-spreading code. MWOOCs have some advantages over other 2-D codes, especially in high bit-rate O-CDMA systems. The only drawback of MWOOC is that the performance degrades significantly when the number of wavelengths is not prime. Recently, a generalized class of modified prime sequence codes (MPSCs), which includes the class of original MPSCs as a subclass, was presented. An important property of generalized MPSCs is that the codes can be constructed not only over prime fields but also over extension fields. It has been shown that the correlation property of generalized MPSCs is the same as that of the original MPSCs. This paper investigates MWOOC with generalized prime sequences, which can be obtained in the process of generating the generalized MPSCs, as a wavelength-hopping code. Using the generalized prime sequences solves the nonprime problem of MWOOCs. The average error probability of the proposed MWOOCs is formulated theoretically, and numerical results are compared with those of the original schemes. It is shown that in the proposed systems, nonprime numbers such as 2^m, 3^m and 5^m can also be taken as the number of wavelengths without degrading the system performance.
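
    A minimal sketch of the idea behind generalized prime sequences, assuming the usual construction in which sequence i places element i*j (field multiplication) in slot j. Over a field, two distinct sequences coincide only in slot 0, whereas plain integer arithmetic modulo a nonprime such as 8 produces extra coincidences; building the sequences over GF(8) instead restores the property. How these values are mapped to wavelengths and combined with the OOC in MWOOC is not reproduced here, and all helper names are hypothetical.

```python
def gf2m_mul(a, b, m, poly):
    """Multiply a and b in GF(2^m); poly is the reduction polynomial including x^m."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << m):   # degree reached m: reduce modulo poly
            a ^= poly
    return result


def prime_sequences(q, mul):
    """Sequence i has value mul(i, j) in slot j, for i, j in {0, ..., q-1}."""
    return [[mul(i, j) for j in range(q)] for i in range(q)]


def coincidences(s, t):
    """Number of slots in which two sequences use the same value (wavelength)."""
    return sum(1 for x, y in zip(s, t) if x == y)


seqs_gf5 = prime_sequences(5, lambda a, b: (a * b) % 5)                # prime field
seqs_gf8 = prime_sequences(8, lambda a, b: gf2m_mul(a, b, 3, 0b1011))  # GF(2)[x]/(x^3+x+1)
seqs_z8 = prime_sequences(8, lambda a, b: (a * b) % 8)                 # integers mod 8 (not a field)

print(coincidences(seqs_gf8[2], seqs_gf8[6]))
print(coincidences(seqs_z8[2], seqs_z8[6]))
```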

  • Granular Gain of Low-Dimensional Lattices from Binary Linear Codes

    Misako KOTANI  Shingo KAWAMOTO  Motohiko ISAKA  

     
    LETTER-Coding Theory

      Page(s):
    2168-2170

    The granular gain of low-dimensional lattices based on binary linear codes is estimated using a quantization algorithm that is equivalent to soft-decision decoding of the underlying code. It is shown that a substantial portion of the ultimate granular gain is achieved even in limited dimensions.
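
    The smallest example of this idea is the lattice D_n, obtained by Construction A from the binary even-weight (single-parity-check) code. Its classical Conway-Sloane quantizer rounds every coordinate and, if the parity check fails, re-rounds the least reliable coordinate the other way, which is exactly soft-decision (Wagner) decoding of that code. The sketch below estimates the normalized second moment G of D_n by Monte Carlo and reports the granular gain 10 log10((1/12)/G) over the cubic lattice; the ultimate granular gain is 10 log10(pi*e/6), about 1.53 dB. This is an illustration under these assumptions, not the codes or algorithm of the paper.

```python
import math
import random


def quantize_dn(x):
    """Nearest point of D_n = {z in Z^n : sum(z) even} to x (Conway-Sloane)."""
    f = [math.floor(v + 0.5) for v in x]      # hard decisions: nearest integers
    if sum(f) % 2 == 0:                       # parity check satisfied
        return f
    # Parity fails: re-round the least reliable coordinate the other way.
    k = max(range(len(x)), key=lambda i: abs(x[i] - f[i]))
    g = list(f)
    g[k] += 1 if x[k] > f[k] else -1
    return g


def normalized_second_moment_dn(n, samples=20000, seed=1):
    """Monte Carlo estimate of G(D_n). D_n contains 2Z^n, so sampling x uniformly
    over one period [0, 2)^n gives the same error statistics as the whole space."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        x = [rng.uniform(0.0, 2.0) for _ in range(n)]
        q = quantize_dn(x)
        total += sum((a - b) ** 2 for a, b in zip(x, q))
    volume = 2.0                              # D_n has index 2 in Z^n
    return (total / (samples * n)) / volume ** (2.0 / n)


def granular_gain_db(g):
    """Gain over Z^n, whose normalized second moment is 1/12."""
    return 10.0 * math.log10((1.0 / 12.0) / g)


for n in (2, 3, 4):
    print(n, round(granular_gain_db(normalized_second_moment_dn(n)), 3))
```

    D_2 is a rotated and scaled copy of Z^2, so its estimated gain should be close to 0 dB; the gain grows with n.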

  • Special Section on VLSI Design and CAD Algorithms
  • FOREWORD

    Masahiro NUMA  

     
    FOREWORD

      Page(s):
    2171-2171
  • Yield-Driven Clock Skew Scheduling for Arbitrary Distributions of Critical Path Delays

    Yanling ZHI  Wai-Shing LUK  Yi WANG  Changhao YAN  Xuan ZENG  

     
    PAPER-Physical Level Design

      Page(s):
    2172-2181

    Yield-driven clock skew scheduling was previously formulated as a minimum cost-to-time ratio cycle problem by assuming that variational path delays follow Gaussian distributions. However, in today's nanometer technology, process variations increasingly undermine this assumption, as variational delays with non-Gaussian distributions have been observed on these paths. In this paper, we propose a novel yield-driven clock skew scheduling method for arbitrary distributions of critical path delays. First, a general problem formulation is proposed. By integrating the cumulative distribution function (CDF) of critical path delays, the formulation can handle path delays with any distribution. It also generalizes the previous formulations of yield-driven clock skew scheduling and provides their statistical interpretations. A generalized Howard algorithm is derived for finding the critical cycles of the underlying timing constraint graphs. Moreover, an effective algorithm based on minimum balancing is proposed for overall yield improvement. Experimental results on ISCAS89 benchmarks show that, compared with two representative existing methods, our method improves the yield by 10.25% on average (up to 14.66%).
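
    The timing constraint graph mentioned above comes from the classical deterministic formulation: for a path from flip-flop i to flip-flop j with delays d_min and d_max, clock arrival times T satisfy the setup constraint T_i - T_j <= P - d_max - t_setup and the hold constraint T_j - T_i <= d_min - t_hold. These are difference constraints, so a clock period P is feasible exactly when the constraint graph has no negative cycle. The sketch below checks this with Bellman-Ford under fixed, assumed delays; the paper's yield-driven formulation, which uses delay distributions and a generalized Howard algorithm, is not reproduced, and the two-flip-flop example is hypothetical.

```python
def skew_schedule(num_ffs, paths, period, t_setup=0.0, t_hold=0.0):
    """Return clock arrival times satisfying all constraints, or None if infeasible.

    paths: list of (i, j, d_min, d_max) for combinational paths FF i -> FF j.
    A constraint x_u - x_v <= c becomes an edge v -> u with weight c."""
    edges = []
    for i, j, d_min, d_max in paths:
        edges.append((j, i, period - d_max - t_setup))  # setup: T_i - T_j <= ...
        edges.append((i, j, d_min - t_hold))            # hold:  T_j - T_i <= ...
    dist = [0.0] * num_ffs  # implicit virtual source with 0-weight edges to every FF
    for _ in range(num_ffs):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            return dist
    for u, v, w in edges:   # still relaxing after |V| - 1 passes: negative cycle
        if dist[u] + w < dist[v]:
            return None
    return dist


# Hypothetical loop A -> B (d_min=2, d_max=8) and B -> A (d_min=1, d_max=4).
# Zero skew needs P >= 8; with useful skew the bound drops to P = 6.
paths = [(0, 1, 2.0, 8.0), (1, 0, 1.0, 4.0)]
print(skew_schedule(2, paths, 6.0))
print(skew_schedule(2, paths, 5.9))
```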

  • Via Programmable Structured ASIC Architecture “VPEX3” and CAD Design System

    Ryohei HORI  Taisuke UEOKA  Taku OTANI  Masaya YOSHIKAWA  Takeshi FUJINO  

     
    PAPER-Physical Level Design

      Page(s):
    2182-2190

    A low-cost and low-power via-programmable structured ASIC architecture named “VPEX3” and a VPEX3-specific CAD system are developed. In the VPEX3 architecture, an improved version of the earlier VPEX and VPEX2 architectures, an arbitrary logic function including sequential logic can be programmed by three via layers. The logic elements (LEs) of VPEX3 are 60% smaller than those of the previous VPEX2, which can be programmed by two via layers. In this paper, we describe a global architecture named Logic Array Block (LAB) composed of LE matrices. The clock lines are buffered in the buffering regions on the left and right sides of the LAB. Next, a VPEX3-specific CAD system utilizing an academic placement tool named “CAPO” and the “FGR” global router is developed. Since these tools were originally designed for ASICs, we developed additional CAD tools to support a structured ASIC architecture. In particular, we developed a detailed router that assigns via positions on the via-programmable routing fabric. Our CAD system successfully converts the HDL design to GDSII data format including via-1, 2, 3 layouts, and LVS and DRC verification on the GDSII data passes successfully. The performance of the VPEX3 architecture and the CAD system is evaluated using ISCAS benchmark circuits. The developed CAD system is used to successfully design a test chip composed of 130110 LEs.

  • On Gate Level Power Optimization of Combinational Circuits Using Pseudo Power Gating

    Yu JIN  Shinji KIMURA  

     
    PAPER-Physical Level Design

      Page(s):
    2191-2198

    In recent years, the demand for low-power design has remained undiminished. In this paper, a pseudo power gating (SPG) structure using a normal logic cell is proposed to extend power gating to an ultrafine-grained region at the gate level. In the proposed method, the controlling value of a logic element is used to control the switching activity of the modules computing the other inputs of the element. For each element, there exists a submodule controlled by an input to the element. Power reduction is maximized by controlling the order of submodule selection. A basic algorithm and a switching-activity-first algorithm have been developed to optimize the power. In this application, a steady maximum depth constraint is added to prevent the depth increase caused by inserting the control signal. In this work, various factors affecting the power consumption of library-level circuits with SPG are determined. Among these factors, the occurrence of glitches increases the power consumption, and a method to reduce glitches by considering the parity of inverters is proposed. The proposed SPG method was evaluated by simulating the netlist extracted from the layout using the VDEC Rohm 0.18 µm process. Experiments on ISCAS'85 benchmarks show that total power consumption is reduced by 13% on average with a 2.5% circuit delay degradation. Finally, the effectiveness of the proposed method under different primary input statistics is considered.

  • Region Oriented Routing FPGA Architecture for Dynamic Power Gating

    Ce LI  Yiping DONG  Takahiro WATANABE  

     
    PAPER-Physical Level Design

      Page(s):
    2199-2207

    Dynamic power gating applied to FPGAs can reduce power consumption effectively. In this paper, we propose a sophisticated routing architecture for a region-oriented FPGA that supports dynamic power gating. This is the first routing solution for dynamic power gating in coarse-grained FPGAs. This paper has two main contributions. First, it improves the routing resource graph and routing architecture to support special routing for a region-oriented FPGA. Second, some routing channels are made wider to avoid congestion. Experimental results show that the routing area can be reduced by 7.7% compared with the symmetric Wilton switch box in the region. In addition, our proposed FPGA architecture with sophisticated P&R can reduce the power consumption of the system implemented in the FPGA.

  • Novel Voltage Choice and Min-Cut Based Assignment for Dual-VDD System

    Haiqi WANG  Sheqin DONG  Tao LIN  Song CHEN  Satoshi GOTO  

     
    PAPER-Physical Level Design

      Page(s):
    2208-2219

    Dual-VDD has been proposed to optimize the power of circuits without violating performance constraints. In this paper, unlike traditional methods that focus on making full use of the slack of non-critical gates, we propose an efficient min-cut based voltage assignment algorithm that concentrates on critical gates. This algorithm is then integrated into a search engine that automatically selects rational voltages for a dual-VDD system. Experimental results show that our search engine consistently finds a good dual-VDD pair, and that our min-cut based algorithm outperforms previous voltage assignment methods in both power consumption and runtime.

  • Power Gating Implementation for Supply Noise Mitigation with Body-Tied Triple-Well Structure

    Yasumichi TAKAI  Masanori HASHIMOTO  Takao ONOYE  

     
    PAPER-Circuit Design

      Page(s):
    2220-2225

    This paper investigates power gating implementations that mitigate power supply noise. We focus on the body connection of power-gated circuits, and examine the amount of power supply noise induced by power-on rush current and the contribution of a power-gated circuit as a decoupling capacitance during sleep mode. To identify the best implementation, we designed and fabricated a test chip in a 65 nm process. Experimental results from measurement and simulation reveal that the power-gated circuit with a body-tied structure in a triple well is the best implementation in terms of the following three criteria: power supply noise due to rush current, the contribution of decoupling capacitance during sleep mode, and the leakage reduction achieved by power gating.

  • A 128-bit Chip Identification Generating Scheme Exploiting Load Transistors' Variation in SRAM Bitcells

    Shunsuke OKUMURA  Shusuke YOSHIMOTO  Hiroshi KAWAGUCHI  Masahiko YOSHIMOTO  

     
    PAPER-Circuit Design

      Page(s):
    2226-2233

    We propose a chip identification (ID) generating scheme that exploits the random variation of transistor characteristics in SRAM bitcells. In the proposed scheme, a unique fingerprint is generated by grounding both bitlines during write operations. With minor modifications, this scheme can be implemented in existing SRAMs. It is fast and can be implemented with a very small area overhead. The generated fingerprint mainly reflects the threshold voltages of the load transistors in the bitcells. We fabricated test chips in a 65-nm process and obtained 12,288 sets of unique 128-bit fingerprints, which are evaluated in this paper. The failure rate of the IDs is found to be 2.1 × 10^-12.

  • All-Digital Wireless Transceiver with Sub-Sampling Demodulation and Burst-Error Correction

    Sanad BUSHNAQ  Makoto IKEDA  Kunihiro ASADA  

     
    PAPER-Circuit Design

      Page(s):
    2234-2241

    In this paper, an all-digital wireless transceiver for near-field communication (NFC) is presented. A novel modulation technique that allows the transceiver to use only all-digital components is employed. The front-end uses all-digital sub-sampling for carrier demodulation, which does not require synchronization circuitry. Burst errors generated by the front-end are corrected in baseband using a Hamming code and interleaving techniques. Experimentally, the all-digital transceiver was implemented on FPGAs, which performed successful wireless communication at a range/diameter ratio of 1, higher than in recent NFC research. Our transceiver uses only all-digital components and occupies less area than other reported designs.
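
    The reason a Hamming code plus interleaving handles burst errors can be shown in a few lines. A single-error-correcting Hamming(7,4) code alone fails when a burst hits one codeword twice, but if D codewords are written as rows and transmitted column by column (a block interleaver of depth D), any burst of at most D consecutive bits hits each codeword at most once. The code length and interleaver depth used in the paper are not stated here, so (7,4) and D = 4 are assumptions, as are the helper names.

```python
def hamming74_encode(d):
    """4 data bits -> 7-bit codeword; positions 1..7 with parity bits at 1, 2, 4."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4    # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4    # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4    # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(c):
    """Correct up to one bit error and return the 4 data bits."""
    c = list(c)
    syndrome = 0
    for pos in range(1, 8):
        if c[pos - 1]:
            syndrome ^= pos          # XOR of positions of all 1 bits
    if syndrome:
        c[syndrome - 1] ^= 1         # syndrome is the position of the error
    return [c[2], c[4], c[5], c[6]]


def interleave(codewords):
    """Send bit k of every codeword, then bit k+1, and so on (column order)."""
    return [cw[k] for k in range(7) for cw in codewords]


def deinterleave(stream, depth):
    """Inverse of interleave for `depth` codewords of length 7."""
    return [[stream[k * depth + r] for k in range(7)] for r in range(depth)]


data = [[1, 0, 1, 1], [0, 1, 1, 0], [1, 1, 1, 1], [0, 0, 0, 1]]
rx = interleave([hamming74_encode(d) for d in data])
for k in range(5, 9):                # burst of 4 consecutive bit errors
    rx[k] ^= 1
decoded = [hamming74_decode(cw) for cw in deinterleave(rx, len(data))]
print(decoded == data)
```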

  • A Variability-Aware Energy-Minimization Strategy for Subthreshold Circuits

    Junya KAWASHIMA  Hiroshi TSUTSUI  Hiroyuki OCHI  Takashi SATO  

     
    PAPER-Device and Circuit Modeling and Analysis

      Page(s):
    2242-2250

    We investigate a design strategy for subthreshold circuits focusing on energy-consumption minimization and yield maximization under process variations. The design strategy is based on the following findings related to the operation of low-power CMOS circuits: (1) The minimum operation voltage (VDDmin) of a circuit is dominated by flip-flops (FFs), and VDDmin of an FF can be improved by upsizing a few key transistors, (2) VDDmin of an FF is stochastically modeled by a log-normal distribution, (3) VDDmin of a large circuit can be efficiently estimated by using the above model, which eliminates extensive Monte Carlo simulations, and (4) improving VDDmin may substantially contribute to decreasing energy consumption. The effectiveness of the proposed design strategy has been verified through circuit simulations on various circuits, which clearly show the design tradeoff between voltage scaling and transistor sizing.
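
    Findings (2) and (3) suggest a closed-form estimate that avoids Monte Carlo simulation. A minimal sketch, assuming each flip-flop's VDDmin is independently log-normal with ln(VDDmin) ~ N(mu, sigma^2) and that the circuit works only if every one of its N flip-flops works: then P(circuit VDDmin <= v) = F(v)^N, and the supply needed for a target yield Y is exp(mu + sigma * Phi^-1(Y^(1/N))). The independence assumption, the parameter values and the helper names are illustrative, not taken from the paper.

```python
import math
from statistics import NormalDist


def circuit_vddmin(mu, sigma, n_ffs, target_yield):
    """Supply voltage at which all n_ffs flip-flops work with probability
    target_yield, assuming i.i.d. log-normal per-FF VDDmin."""
    per_ff = target_yield ** (1.0 / n_ffs)     # required per-FF success probability
    z = NormalDist().inv_cdf(per_ff)
    return math.exp(mu + sigma * z)


def yield_at(v, mu, sigma, n_ffs):
    """Probability that every flip-flop works at supply v."""
    return NormalDist(mu, sigma).cdf(math.log(v)) ** n_ffs


mu, sigma = math.log(0.20), 0.10               # hypothetical per-FF parameters (volts)
for n in (1, 1_000, 100_000):
    v = circuit_vddmin(mu, sigma, n, 0.99)
    print(n, round(v, 3), round(yield_at(v, mu, sigma, n), 4))
```

    The required supply grows only slowly with the number of flip-flops because it depends on the far tail of the per-FF distribution.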

  • A Globally Convergent Nonlinear Homotopy Method for MOS Transistor Circuits

    Dan NIU  Kazutoshi SAKO  Guangming HU  Yasuaki INOUE  

     
    PAPER-Device and Circuit Modeling and Analysis

      Page(s):
    2251-2260

    Finding DC operating points of nonlinear circuits is an important and difficult task. The Newton-Raphson method adopted in SPICE-like simulators often fails to converge to a solution. To overcome this convergence problem, homotopy methods have been studied from various viewpoints. However, most previous studies have focused mainly on bipolar transistor circuits, and no paper has presented global convergence theorems of homotopy methods for MOS transistor circuits. Moreover, owing to the improvements and advantages of MOS transistor technologies, extending homotopy methods to MOS transistor circuits is becoming increasingly necessary and important. This paper proposes two nonlinear homotopy methods for MOS transistor circuits and proves global convergence theorems for the proposed MOS nonlinear homotopy method II. Numerical examples show that both proposed homotopy methods for MOS transistor circuits are more effective for finding DC operating points than the conventional MOS homotopy method, and that they can also find DC operating points of large-scale circuits.
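
    To make the homotopy idea concrete, the sketch below applies a generic fixed-point homotopy H(x, t) = t*f(x) + (1 - t)*g*(x0 - x) to a hypothetical one-node circuit (a 5 V source driving a diode through a 1 kOhm resistor). At t = 0 the solution is the trivial x0; t is stepped to 1 and each step is corrected by Newton's method, ending at a root of the circuit equation f(x) = 0. This is a deliberately simple scalar example with naive parameter stepping; it is not the MOS nonlinear homotopy of the paper, and practical implementations use arc-length path following to cope with turning points.

```python
import math

# Hypothetical circuit: V -- R -- node x -- diode to ground (Shockley model).
V, R, IS, VT = 5.0, 1000.0, 1e-14, 0.025


def f(x):
    """KCL residual at the diode node, in amperes."""
    return (V - x) / R - IS * (math.exp(x / VT) - 1.0)


def df(x):
    """Derivative of f with respect to the node voltage."""
    return -1.0 / R - (IS / VT) * math.exp(x / VT)


def fixed_point_homotopy(x0=0.0, g=1e-3, steps=200, tol=1e-12):
    """Track H(x, t) = t*f(x) + (1 - t)*g*(x0 - x) = 0 from t = 0 to t = 1."""
    x = x0
    for k in range(1, steps + 1):
        t = k / steps
        for _ in range(50):               # Newton corrector at fixed t
            h = t * f(x) + (1.0 - t) * g * (x0 - x)
            dh = t * df(x) - (1.0 - t) * g
            dx = -h / dh
            x += dx
            if abs(dx) < tol:
                break
    return x


x = fixed_point_homotopy()
print(round(x, 4), f(x))
```

    Here H is strictly decreasing in x for every t, so the solution path is unique and continuous, which is what lets simple stepping succeed for this toy circuit.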

  • Power Distribution Network Optimization for Timing Improvement with Statistical Noise Model and Timing Analysis

    Takashi ENAMI  Takashi SATO  Masanori HASHIMOTO  

     
    PAPER-Device and Circuit Modeling and Analysis

      Page(s):
    2261-2271

    We propose an optimization method for power distribution networks that explicitly deals with timing. We focus on two findings: decoupling capacitance (decap) does not necessarily improve gate delay, depending on the switching timing within a cycle, and power wire expansion may locally degrade the voltage. To address these issues, we devised an efficient calculation of the sensitivity of timing to decap size and power wire width for guiding optimization. The proposed method, which is based on statistical noise modeling and timing analysis, accelerates the sensitivity calculation with an approximation and adjoint sensitivity analysis. Experimental results show that decap allocation based on the sensitivity analysis efficiently minimizes the worst-case circuit delay within a given decap budget. Compared with the maximum decap placement, the delay improvement due to decap increases by 3.13% even while the total amount of decap is reduced to 40%. Wire sizing with the proposed method also efficiently reduces the wire resources required to attain the same circuit delay, by 11.5%.

  • Bayesian Estimation of Multi-Trap RTN Parameters Using Markov Chain Monte Carlo Method

    Hiromitsu AWANO  Hiroshi TSUTSUI  Hiroyuki OCHI  Takashi SATO  

     
    PAPER-Device and Circuit Modeling and Analysis

      Page(s):
    2272-2283

    Random telegraph noise (RTN) is a phenomenon that is considered to limit the reliability and performance of circuits using advanced devices. The time constants of carrier capture and emission and the associated change in the threshold voltage are important parameters commonly included in various models, but their extraction from time-domain observations has been a difficult task. In this study, we propose a statistical method for simultaneously estimating interrelated parameters: the time constants and magnitude of the threshold voltage shift. Our method is based on a graphical network representation, and the parameters are estimated using the Markov chain Monte Carlo method. Experimental application of the proposed method to synthetic and measured time-domain RTN signals was successful. The proposed method can handle interrelated parameters of multiple traps and thereby contributes to the construction of more accurate RTN models.

  • Co-simulation of On-Chip and On-Board AC Power Noise of CMOS Digital Circuits

    Kumpei YOSHIKAWA  Yuta SASAKI  Kouji ICHIKAWA  Yoshiyuki SAITO  Makoto NAGATA  

     
    PAPER-Device and Circuit Modeling and Analysis

      Page(s):
    2284-2291

    Capacitor charging modeling efficiently and accurately represents the power consumption current of CMOS digital circuits and enables co-simulation of AC power noise, including the interaction with the on-chip and on-board integrated power delivery network (PDN). It is clearly demonstrated that the AC power noise is dominantly characterized by the frequency-dependent impedance of the PDN and also by the operating frequency of the circuits. A 65 nm CMOS chip exhibits AC power noise components substantially related to the parallel resonance of the PDN seen from the on-chip digital circuits. An on-chip noise monitor measures the in-circuit power supply voltage, while near-field magnetic probing derives the on-board power supply current. The proposed co-simulation matches the power noise measurements well. The proposed AC noise co-simulation is essentially applicable to the design of PDNs for on-chip power supply integrity (PSI) and off-chip electromagnetic compatibility (EMC).

  • A Body Bias Clustering Method for Low Test-Cost Post-Silicon Tuning

    Shuta KIMURA  Masanori HASHIMOTO  Takao ONOYE  

     
    PAPER-Logic Synthesis, Test and Verification

      Page(s):
    2292-2300

    Post-silicon tuning is attracting a lot of attention as a way to cope with increasing process variation. However, its tuning cost via testing is still a crucial problem. In this paper, we propose tuning-friendly body bias clustering with multiple bias voltages. The proposed method provides a small set of compensation levels such that speed and leakage current vary monotonically with the level. Thanks to this monotonic leveling and the limited number of levels, the test cost of post-silicon tuning is significantly reduced. During body bias clustering, the proposed method explicitly estimates and minimizes the average leakage after post-silicon tuning. Experimental results demonstrate that the proposed method reduces the average leakage by 25.3 to 51.9% compared with the non-clustering case. In a test case with four clusters, the number of necessary tests is reduced by 83% compared with the conventional exhaustive test approach. We show that two bias voltages are sufficient when only a small number of compensation levels are allowed for test-cost reduction. We also discuss implications for how to synthesize a circuit to which post-silicon tuning will be applied.

  • Evaluation of a New Power-Gating Scheme Utilizing Data Retentiveness on Caches

    Kyundong KIM  Seidai TAKEDA  Shinobu MIWA  Hiroshi NAKAMURA  

     
    PAPER-Logic Synthesis, Test and Verification

      Page(s):
    2301-2308

    Caches are among the most leakage-consuming components in modern processors because of their massive number of transistors. To reduce the leakage power of caches, several techniques using power gating (PG) have been proposed. Despite its high leakage saving, a side effect of PG for caches is the loss of data during sleep. If useful data is lost in sleep mode, it must be fetched again from a lower-level memory. This consumes a considerable amount of energy, which significantly offsets the leakage saving. This paper proposes a new PG scheme that considers the data retentiveness of SRAM. After entering sleep mode, the data in an SRAM cell is not lost immediately and can be used after checking its validity. We therefore utilize the data retentiveness of SRAM to avoid the energy overhead of data recovery, which creates further opportunities for leakage saving. To check availability, we introduce simple hardware whose overhead is negligible. Our experimental results show that utilizing data retentiveness saves up to 32.42% more leakage than conventional PG.

  • Transaction Ordering in Network-on-Chips for Post-Silicon Validation

    Amir Masoud GHAREHBAGHI  Masahiro FUJITA  

     
    PAPER-Logic Synthesis, Test and Verification

      Page(s):
    2309-2318

    In this paper, we address the problem of ordering transactions in networks-on-chip (NoCs) for post-silicon validation. The main idea is to extract the order of the transactions from the local partial orders in each NoC tile based on a set of “happened-before” rules, assuming that transactions do not have timestamps. This assumption is based on the fact that implementing and using a global time as a timestamp in such systems may not be practical or efficient. When a new transaction is received in a tile, we send special messages to the neighboring tiles to inform them of the new transaction. The process of sending these special messages continues recursively in all the tiles that receive them until another such special message is detected. In this way, we relate the local orders of different tiles to each other. We show that our method can reconstruct the correct transaction orders when communication delays are deterministic. We demonstrate the effectiveness of our method by correctly ordering the transactions in NoCs with mesh and torus topologies of sizes from 5×5 to 9×9. We have also implemented the proposed method in hardware to show its feasibility.

  • RazorProtector: Maintaining Razor DVS Efficiency in Large IR-Drop Zones by an Adaptive Redundant Data-Path

    Yukihiro SASAGAWA  Jun YAO  Takashi NAKADA  Yasuhiko NAKASHIMA  

     
    PAPER-Logic Synthesis, Test and Verification

      Page(s):
    2319-2329

    Recently, the DVS (Dynamic Voltage Scaling) method has been aggressively applied to processors with Razor flip-flops. With Razor FFs detecting setup errors, the supply voltage in these processors is scaled down to a level near the critical setup timing for maximum power reduction. However, conventional Razor and DVS combinations cannot tolerate error rate variations caused by IR drops and environmental changes well. Near the critical setup timing point, even a small change in error rate results in sharp performance degradation. In this paper, we propose RazorProtector, a DVS application method based on a redundant data-path that uses multi-cycle redundant calculation to shorten the recovery penalty after a setup error occurs. A dynamic redundancy-adapting scheme is also given to make effective use of the redundant data-path, based on a study of program, device and error rate characteristics. Our results show that, compared with the traditional DVS method with Razor FFs and under a large setup error rate caused by a 10% unwanted voltage drop, RazorProtector with the adaptive redundancy architecture reduces EDP by up to 78% at a voltage scaling slope of 100 µs/V and by up to 88% at 200 µs/V.

  • A Formal Approach to Optimal Register Binding with Ordered Clocking for Clock-Skew Tolerant Datapaths

    Keisuke INOUE  Mineo KANEKO  

     
    PAPER-Logic Synthesis, Test and Verification

      Page(s):
    2330-2337

    The impact of clock skew on circuit timing increases rapidly as technology scales. As a result, it has become important to deal with clock skew in the early stages of circuit design. This paper presents a novel datapath design that aims to mitigate the impact of clock skew in high-level synthesis by integrating margin (evaluated as the maximum number of clock cycles available to absorb clock skew) and ordered clocking into high-level synthesis tasks. As a first step toward the proposed datapath design, this paper presents a 0-1 integer linear programming formulation that focuses on register binding to achieve the minimum cost (the minimum number of registers) under a given scheduling result. Experimental results show that optimal results can be obtained without increasing the latency and with only a few extra registers compared with traditional high-level synthesis.
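
    For comparison, the traditional baseline for register binding under a fixed schedule is the left-edge algorithm: sort variable lifetimes by start time and place each one in the first register that is already free. Because lifetimes are intervals, this uses exactly as many registers as the maximum number of simultaneously live variables. The sketch below shows only that baseline; the clock-skew margin and ordered-clocking constraints that the paper adds in its 0-1 ILP are not modeled, and the example lifetimes are hypothetical.

```python
def left_edge_binding(lifetimes):
    """lifetimes: dict var -> (birth, death), half-open interval [birth, death).
    Returns dict var -> register index (classical left-edge algorithm)."""
    order = sorted(lifetimes, key=lambda v: lifetimes[v][0])
    reg_free_at = []            # time at which each register becomes free again
    binding = {}
    for v in order:
        birth, death = lifetimes[v]
        for r, free_at in enumerate(reg_free_at):
            if free_at <= birth:          # register r holds nothing live at `birth`
                reg_free_at[r] = death
                binding[v] = r
                break
        else:                             # all registers busy: allocate a new one
            binding[v] = len(reg_free_at)
            reg_free_at.append(death)
    return binding


# Hypothetical schedule: control steps 0..5
lifetimes = {"a": (0, 2), "b": (1, 3), "c": (2, 4), "d": (3, 5), "e": (0, 5)}
binding = left_edge_binding(lifetimes)
print(binding, "registers:", max(binding.values()) + 1)
```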

  • Scan-Based Attack on AES through Round Registers and Its Countermeasure

    Youhua SHI  Nozomu TOGAWA  Masao YANAGISAWA  

     
    PAPER-High-Level Synthesis and System-Level Design

      Page(s):
    2338-2346

    Scan-based side-channel attacks on hardware implementations of cryptographic algorithms pose a serious security threat. Unlike existing scan-based attacks, our work observes that, besides secret-related registers, some non-secret registers can also be misused to help an attacker retrieve secret keys. In this paper, we first present a scan-based side-channel attack on AES that makes use of the round counter registers, which previous works have overlooked, to show the potential security threat in designs with scan chains. We then discuss the requirements of secure DFT and propose a secure scan scheme for crypto hardware implementations that preserves all the advantages and simplicity of traditional scan testing while significantly improving security with negligible design overhead.

  • Fault-Injection Analysis to Estimate SEU Failure in Time by Using Frame-Based Partial Reconfiguration

    Yoshihiro ICHINOMIYA  Tsuyoshi KIMURA  Motoki AMAGASAKI  Morihiro KUGA  Masahiro IIDA  Toshinori SUEYOSHI  

     
    PAPER-High-Level Synthesis and System-Level Design

      Page(s):
    2347-2356

    SRAM-based field programmable gate arrays (FPGAs) are vulnerable to radiation-induced soft errors. Techniques for designing dependable circuits, such as triple modular redundancy (TMR) with scrubbing, have been studied extensively. However, the evaluation techniques currently available for checking the dependability of these circuits are inadequate. Further, their results are of limited use because they are not expressed in terms of a general reliability indicator that can decide whether a circuit is dependable. In this paper, we propose an evaluation method that reports results in terms of realistic failure in time (FIT) by using reconfiguration-based fault-injection analysis. Current fault-injection analyses do not consider fault accumulation and hence are not suitable for evaluating the dependability of circuits such as TMR circuits. Therefore, we build an evaluation system that handles fault accumulation by using frame-based partial reconfiguration and the bootstrap method. Using the proposed method, we successfully evaluated a TMR circuit and discussed the result in terms of realistic FIT data. Our method can evaluate the dependability of an actual system and help with tuning and selection in dependable system design.
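
    The abstract names the bootstrap method but does not say how it is applied. As a hedged illustration only, the sketch below computes a generic percentile-bootstrap confidence interval for a mean, assuming hypothetical per-trial data such as the number of accumulated injected faults before a TMR circuit fails. The function `bootstrap_ci` and its parameters are assumptions, not the authors' procedure.

```python
import random
import statistics


def bootstrap_ci(samples, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `samples`.

    Each resample draws len(samples) values with replacement; the interval is
    read off the sorted resample means at the alpha/2 and 1 - alpha/2 points.
    """
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    means = sorted(
        statistics.fmean(rng.choices(samples, k=len(samples)))
        for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


# Hypothetical data: injected faults accumulated before failure, per trial.
trials = [12, 9, 15, 11, 8, 14, 10, 13]
low, high = bootstrap_ci(trials)
```

    Resampling is attractive here because each fault-accumulation trial is expensive on real hardware, so only a small number of trials may be available for estimating the spread.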

  • Achieving Maximum Performance for Bus-Invert Coding with Time-Splitting Transmitter Circuit

    Myungchul YOON  

     
    PAPER-High-Level Synthesis and System-Level Design

      Page(s):
    2357-2363

    This paper presents an analytical model for evaluating the performance of bus-invert coding. A time-splitting transmitter circuit employing a selectively activated flip-driver (SAFD) is presented, and its performance is estimated with the new model. An optimal partitioning method that maximizes the performance of a given bus-invert (BI) coding circuit is also presented. When a bus is optimally partitioned, an ordinary BI circuit can reduce the number of bus transitions by about 25%, while an SAFD circuit can remove about 35% of them. The newly developed method is verified by simulations whose results agree very well with the values predicted by the model.
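
    For reference, ordinary bus-invert coding works as follows: if sending the next word would toggle more than half of the data lines, the complement is sent instead and an extra invert line is raised. The minimal sketch below implements only this plain BI scheme on a single (unpartitioned) bus; it does not model the SAFD time-splitting circuit or the optimal partitioning, and the function name `bus_invert_encode` is hypothetical.

```python
def bus_invert_encode(words, width):
    """Plain bus-invert coding on a `width`-bit bus plus one invert line.

    Returns (list of (bus_value, invert_bit), total transitions), where the
    transition count includes toggles on the invert line. The bus starts at 0.
    """
    mask = (1 << width) - 1
    prev_bus, prev_inv = 0, 0
    out, transitions = [], 0
    for w in words:
        # Hamming distance between the new word and what is on the bus now.
        dist = bin((w ^ prev_bus) & mask).count("1")
        if dist > width // 2:
            bus, inv = (~w) & mask, 1  # sending the complement toggles fewer lines
        else:
            bus, inv = w & mask, 0
        transitions += bin(bus ^ prev_bus).count("1") + (inv ^ prev_inv)
        out.append((bus, inv))
        prev_bus, prev_inv = bus, inv
    return out, transitions


encoded, total = bus_invert_encode([0xFF, 0x0F, 0xF0], 8)
```

    The receiver recovers each word as `bus ^ mask` when the invert bit is set. For the sequence above, this costs 7 transitions including the invert line, compared with 20 when the words are sent unencoded. Partitioning a wide bus into sub-buses, each with its own invert line, applies the same rule per partition.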

  • FPGA Design of User Monitoring System for Display Power Control

    Tomoaki ANDO  Vasily G. MOSHNYAGA  Koji HASHIMOTO  

     
    PAPER-High-Level Synthesis and System-Level Design

      Page(s):
    2364-2372

    This paper introduces a new FPGA design of a user-monitoring system for power management of a PC display. From camera readings, the system detects whether or not the user is looking at the screen and produces signals to control the display backlight. The system provides over 88% eye detection accuracy at an image processing rate of 8 frames/s. We describe the new eye-tracking algorithm and hardware and present the results of their experimental evaluation in a prototype display power management system.

  • A High Level Design of Reconfigurable and High-Performance ASIP Engine for Image Signal Processing

    Hsuan-Chun LIAO  Mochamad ASRI  Tsuyoshi ISSHIKI  Dongju LI  Hiroaki KUNIEDA  

     
    PAPER-High-Level Synthesis and System-Level Design

      Page(s):
    2373-2383

    Emerging image and video applications and conventional MPSoC architectures face drastically increasing performance and flexibility requirements. Displaying high-quality images requires a large amount of image processing. These image processing algorithms are nonstandard and vary case by case, and it is difficult to achieve real-time processing with general-purpose processors or DSPs. In this paper, we present two reconfigurable Application Specific Instruction-set Processors (ASIPs) that can perform several image processing algorithms using the same processor architecture. These ASIPs achieve performance similar to a DSP, while their area is considerably smaller than a DSP and only slightly larger than a conventional RISC processor. The 1D ASIP achieves 16 times the performance of a RISC processor, and the 2D ASIP achieves 3 to 7 times the performance of a RISC processor.

  • A 115 mW 1 Gbps Bit-Serial Layered LDPC Decoder for WiMAX

    Xiongxin ZHAO  Xiao PENG  Zhixiang CHEN  Dajiang ZHOU  Satoshi GOTO  

     
    PAPER-High-Level Synthesis and System-Level Design

      Page(s):
    2384-2391

    Structured quasi-cyclic low-density parity-check (QC-LDPC) codes have been adopted in many wireless communication standards, such as WiMAX, Wi-Fi and WPAN. To fully support variable code rates (multi-rate) and variable code lengths (multi-length) for universal applications, the partial-parallel layered LDPC decoder architecture is a straightforward choice and is widely used in decoder design. In this paper, we propose a highly parallel LDPC decoder architecture for WiMAX systems with a dedicated ASIC design. Unlike the block-by-block decoding schedule in most partial-parallel layered architectures, all the messages within each layer are updated simultaneously in the proposed fully-parallel layered decoder architecture. Meanwhile, message updating is performed in a bit-serial style to reduce hardware complexity. A 6-bit implementation is adopted in the decoder chip, since simulations demonstrate that 6-bit quantization is the best trade-off between performance and complexity. Moreover, a two-layer concurrent processing technique is proposed to further increase the parallelism for low code rates. Implementation results show that the decoder chip saves 22.2% of storage bits and takes only 2448 clock cycles per iteration for all the code rates defined in the WiMAX standard. It occupies 3.36 mm² in SMIC 65 nm CMOS process and achieves 1056 Mbps throughput at 1.2 V, 110 MHz and 10 iterations with 115 mW power consumption, which implies a power efficiency of 10.9 pJ/bit/iteration. In a normalized comparison, the power efficiency is improved by 63.6% over the state-of-the-art WiMAX LDPC decoder.
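
    The quoted power efficiency follows directly from the reported figures: energy per decoded bit per iteration is power divided by the product of throughput and iteration count. The short check below reproduces it using only numbers stated in the abstract.

```python
power_w = 115e-3         # 115 mW reported power consumption
throughput_bps = 1056e6  # 1056 Mbps reported throughput
iterations = 10          # decoding iterations at that throughput

# Energy per bit per iteration, converted from joules to picojoules.
energy_pj = power_w / (throughput_bps * iterations) * 1e12
print(f"{energy_pj:.1f} pJ/bit/iteration")  # prints 10.9 pJ/bit/iteration
```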

  • A Flexible Architecture for TURBO and LDPC Codes

    Yun CHEN  Yuebin HUANG  Chen CHEN  Changsheng ZHOU  Xiaoyang ZENG  

     
    LETTER-High-Level Synthesis and System-Level Design

      Page(s):
    2392-2395

    Turbo codes and LDPC (Low-Density Parity-Check) codes are two of the most powerful error correction codes and can approach the Shannon limit in many communication systems. However, few architectures have been presented that support both LDPC and Turbo codes, especially in ASIC form. This paper implements a common architecture that can decode both LDPC and Turbo codes and is capable of supporting the WiMAX, WiFi and 3GPP-LTE standards on the same hardware. We describe in detail how memory and logic devices are shared between the different operation modes. The chip is designed in a 130 nm CMOS technology, and its maximum clock frequency reaches 160 MHz. The maximum throughput is about 104 Mbps at 5.5 iterations for Turbo codes and 136 Mbps at 10 iterations for LDPC codes. Compared with other existing structures, the design has significant advantages in speed and area.

  • Regular Section
  • Unified Constant Geometry Fault Tolerant DCT/IDCT for Image Codec System on a Display Panel

    Jaehee YOU  

     
    PAPER-Digital Signal Processing

      Page(s):
    2396-2406

    System-on-display panel design methodologies are proposed for integrating DCT and IDCT on display panels for image codec and peripheral systems so as to reduce the bus data rate, memory size and power consumption. Unified constant geometry algorithms and architectures, including recursive additions, are proposed for the DCT and IDCT butterfly computation, the recursive additions and the interconnections between stages. These schemes facilitate VLSI implementation and improve fault tolerance through duplicate use of a PE, since all the butterfly and recursive addition stages are composed and interconnected in a regular fashion; this makes them suitable for low-yield SOP processing technologies. Efficient redundancy replacement methodologies that optimize computation speed and the amount of hardware in various application areas are also described, together with testability and reliability issues. Finally, a performance analysis of speed, hardware and interconnection complexity demonstrates the advantages of the proposed work.

  • Low Complexity Systolic Array Structure for Extended QRD-RLS Equalizer

    Ji-Hye SHIN  Young-Beom JANG  

     
    PAPER-Digital Signal Processing

      Page(s):
    2407-2414

    In this paper, a new systolic array structure for the extended QR decomposition based recursive least-square (QRD-RLS) equalizer is proposed. The fact that the vectoring and rotation mode coordinate rotation digital computer (CORDIC) processors rotate in the same direction is used to show that the hardware complexity of the systolic array can be reduced. Furthermore, since the vectoring and rotation mode CORDIC processors in the proposed structure rotate simultaneously, operation time is also reduced. The performance of the proposed equalizer is analyzed by observing the flatness obtained by multiplying the frequency responses of the unknown channel with the proposed equalizer. Simulation results through hardware description language (HDL) coding and synthesis show that 23.8% of the chip implementation area can be reduced.
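
    In QRD-RLS systolic arrays, a boundary cell typically runs CORDIC in vectoring mode to annihilate one element (computing a Givens rotation), and internal cells apply that same rotation in rotation mode. Because both follow the same sequence of micro-rotation directions, the internal cells can be driven by the recorded direction bits alone. The sketch below illustrates that idea in floating point; it is not the proposed systolic structure, and the iteration count `N_ITER` and the function names are assumptions.

```python
import math

N_ITER = 32
# Magnitude growth of CORDIC after N_ITER micro-rotations (divided out below).
GAIN = math.prod(math.sqrt(1.0 + 2.0 ** (-2 * i)) for i in range(N_ITER))


def cordic_vectoring(x, y):
    """Vectoring mode: rotate (x, y) onto the x-axis and record each
    micro-rotation direction (+1 counter-clockwise, -1 clockwise).
    Assumes x > 0 so the angle lies inside CORDIC's convergence range."""
    directions = []
    for i in range(N_ITER):
        d = -1 if y > 0 else 1  # rotate toward the x-axis
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        directions.append(d)
    return x / GAIN, y / GAIN, directions


def cordic_rotate(x, y, directions):
    """Rotation mode driven by directions recorded by a vectoring cell,
    so no angle value needs to be passed between cells."""
    for i, d in enumerate(directions):
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
    return x / GAIN, y / GAIN


# Boundary cell zeroes b in (a, b) = (3, 4); an internal cell applies the
# same Givens rotation to another pair, here (1, 2).
r, residual, dirs = cordic_vectoring(3.0, 4.0)
x2, y2 = cordic_rotate(1.0, 2.0, dirs)
```

    With the rotation angle fixed by (3, 4), the pair (1, 2) maps to approximately (2.2, 0.4), which matches the exact Givens rotation with cosine 0.6 and sine 0.8.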

  • A Low-Cost Bit-Error-Rate BIST Circuit for High-Speed ADCs Based on Gray Coding

    Ya-Ting SHYU  Ying-Zu LIN  Rong-Sing CHU  Guan-Ying HUANG  Soon-Jyh CHANG  

     
    PAPER-Analog Signal Processing

      Page(s):
    2415-2423

    Real-time on-chip measurement of the bit error rate (BER) of high-speed analog-to-digital converters (ADCs) requires not only expensive multi-port high-speed data acquisition equipment but also extensive post-processing. This paper proposes a low-cost built-in self-test (BIST) circuit for high-speed ADC BER testing. Conventionally, calculating the BER requires a high-speed adder. The presented method takes advantage of Gray coding and needs only simple logic circuits for BER evaluation. A prototype of the BIST circuit is fabricated along with a 5-bit high-speed flash ADC in a 90-nm CMOS process. The active area is only 90 µm × 70 µm, and the average power consumption is around 0.3 mW at 700 MS/s. Measurements of the BIST circuit are consistent with measurements taken by external data acquisition equipment.
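
    The abstract does not detail the BIST logic, so the following is only a plausible illustration of why Gray coding removes the need for an adder. Adjacent Gray codes differ in exactly one bit, so for a slowly varying test input (assumed here to change by at most one LSB per sample) a code error shows up as two or more bits changing between consecutive outputs. That condition can be checked with XOR and AND gates rather than by subtracting binary codes. The ramp-input assumption and the function names are hypothetical.

```python
def to_gray(b: int) -> int:
    """Binary-reflected Gray code: consecutive integers differ in one bit."""
    return b ^ (b >> 1)


def count_code_errors(samples):
    """Hypothetical BER check for Gray-coded ADC outputs under a slow input.

    Consecutive codes may be equal or differ in one bit; d & (d - 1) is
    nonzero exactly when d has two or more bits set, which flags an error.
    """
    errors = 0
    for a, b in zip(samples, samples[1:]):
        d = a ^ b
        if d & (d - 1):
            errors += 1
    return errors


ramp = [to_gray(i) for i in range(32)]  # ideal 5-bit ramp, Gray coded
corrupted = list(ramp)
corrupted[10] ^= 0b10000                # flip one bit of one sample
```

    Here `count_code_errors(ramp)` returns 0, and the single corrupted sample is flagged in both pairs it belongs to, giving 2.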

  • A High-Speed Low-Complexity Time-Multiplexing Reed-Solomon-Based FEC Architecture for Optical Communications

    Jeong-In PARK  Hanho LEE  

     
    PAPER-VLSI Design Technology and CAD

      Page(s):
    2424-2429

    This paper presents a high-speed, low-complexity, time-multiplexing Reed-Solomon-based forward error correction architecture based on the pipelined truncated inversionless Berlekamp-Massey algorithm. The proposed architecture has very high speed and very low hardware complexity compared with conventional Reed-Solomon-based forward error correction architectures. Hardware complexity is reduced by employing a truncated inversionless Berlekamp-Massey algorithm. High speed and high throughput are achieved by employing three-parallel processing and pipelining techniques and a modified syndrome computation block. The time-multiplexing method for the pipelined truncated inversionless Berlekamp-Massey architecture is used in the parallel Reed-Solomon decoder to reduce hardware complexity. The proposed architecture has been designed and implemented in 90-nm CMOS technology. Synthesis results show that the proposed 16-channel Reed-Solomon-based forward error correction architecture requires 417,600 gates and can operate at 640 MHz to achieve a throughput of 240 Gb/s. The proposed architecture can be readily applied to Reed-Solomon-based forward error correction devices for next-generation short-reach optical communications.
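
    Syndrome computation is the first decoder stage, feeding the Berlekamp-Massey step: each syndrome is the received polynomial evaluated at a root of the generator polynomial, and all syndromes are zero for a valid codeword. The serial sketch below assumes RS(255,239) over GF(2^8) (the code commonly used in optical transport) with primitive polynomial 0x11D, α = 2, and generator roots α^1 through α^16. These are assumptions, since the abstract does not state them, and the code does not reproduce the paper's three-parallel modified syndrome block.

```python
PRIM = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1 (assumed field polynomial)


def gf_mul(a, b):
    """Multiply two GF(2^8) elements by shift-and-add with reduction."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= PRIM
        b >>= 1
    return r


def gf_pow(a, n):
    r = 1
    for _ in range(n):
        r = gf_mul(r, a)
    return r


def poly_mul(p, q):
    """Polynomial product over GF(2^8); coefficients highest degree first."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] ^= gf_mul(a, b)
    return out


def poly_eval(coeffs, x):
    """Horner evaluation; coeffs[0] is the highest-degree coefficient."""
    acc = 0
    for c in coeffs:
        acc = gf_mul(acc, x) ^ c
    return acc


def generator(two_t, alpha=2):
    """g(x) = prod_{j=1..2t} (x + alpha^j); in GF(2^m), minus equals plus."""
    g = [1]
    for j in range(1, two_t + 1):
        g = poly_mul(g, [1, gf_pow(alpha, j)])
    return g


def syndromes(received, two_t, alpha=2):
    """S_j = r(alpha^j) for j = 1..2t; all zero means no detectable error."""
    return [poly_eval(received, gf_pow(alpha, j)) for j in range(1, two_t + 1)]


# Non-systematic RS(255,239) codeword c(x) = m(x) g(x), for illustration.
codeword = poly_mul(list(range(1, 240)), generator(16))
```

    Flipping any single symbol of `codeword` makes at least one syndrome nonzero. A hardware decoder would evaluate these 16 Horner recurrences in parallel, one multiply-accumulate cell per syndrome.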

  • A Jitter Insertion and Accumulation Model for Clock Repeaters

    Monica FIGUEIREDO  Rui L. AGUIAR  

     
    PAPER-VLSI Design Technology and CAD

      Page(s):
    2430-2442

    This paper presents a model to estimate jitter insertion and accumulation in clock repeaters. We propose expressions that estimate, with low computational effort, both static and dynamic clock jitter insertion in repeaters with different sizes, interconnects and slew rates. The model requires only the pre-characterization of a reference repeater, which can be accomplished with a small number of simulations or measurements. Furthermore, we propose expressions for dynamic jitter accumulation that consider the dual nature of the impact of power and ground noise on delay. The complete model can replace time-consuming transient noise simulations when evaluating jitter in clock distribution systems, and it provides valuable insights into the impact of design parameters on jitter. The presented results show that, for typical designs, our models estimate jitter insertion and accumulation within 10% of simulation results and accurately reflect the impact of changing design parameters.

  • SSTA Scheme for Multiple Input Switching Case Based on Stochastic Collocation Method

    Gengsheng CHEN  Chenxi QIAN  Jun TAO  

     
    PAPER-VLSI Design Technology and CAD

      Page(s):
    2443-2450

    In this paper, a complete SSTA scheme is proposed to calculate, with very high accuracy and acceptable CPU cost, either the output waveform of a logic cell at any randomly selected point in the process variation space or the mean value and variance of the output signal. First, Miller capacitances between the input nodes and internal nodes of a logic cell are introduced to construct an improved MCSM model with better modeling accuracy. Second, the stochastic collocation method combined with the Modified Nested Sparse Grid technique is adopted for the SSTA procedure to avoid the exponential growth in the number of collocation points caused by the tensor product. Third, a Nominal waveform based Fast Simulation Method is developed to speed up the simulation at each collocation point. Finally, an Automatic Waveform Construction Technique is developed to construct the output waveform with as few approximation points as possible, reducing the computational cost while guaranteeing high accuracy. Numerical results demonstrate the efficiency of the proposed algorithm.
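
    Stochastic collocation estimates statistics of a response by evaluating it only at a few well-chosen points of the random space and combining the results with quadrature weights. The one-dimensional toy below uses a 3-point probabilists' Gauss-Hermite rule for a standard normal variable, which is exact for polynomials up to degree 5. The quadratic delay model is a hypothetical stand-in for a cell simulation, and the example involves no sparse grid or MCSM model.

```python
import math

# 3-point probabilists' Gauss-Hermite rule for xi ~ N(0, 1); weights sum to 1.
NODES = (-math.sqrt(3.0), 0.0, math.sqrt(3.0))
WEIGHTS = (1.0 / 6.0, 2.0 / 3.0, 1.0 / 6.0)


def collocation_moments(f):
    """Mean and variance of f(xi), using f only at the collocation nodes.

    In SSTA, each call to f would be one (fast) circuit simulation.
    """
    values = [f(x) for x in NODES]
    mean = sum(w * v for w, v in zip(WEIGHTS, values))
    second = sum(w * v * v for w, v in zip(WEIGHTS, values))
    return mean, second - mean * mean


# Hypothetical delay model (ps) as a function of one normalized variation.
mean_ps, var_ps2 = collocation_moments(lambda xi: 10.0 + 2.0 * xi + 0.5 * xi ** 2)
```

    For this model, three evaluations recover the exact mean of 10.5 ps and variance of 4.5 ps². With many variation sources, a full tensor grid needs 3^d points, which is the growth that sparse-grid constructions are designed to avoid.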

  • Scalable Privacy-Preserving t-Repetition Protocol with Distributed Medical Data

    Ji Young CHUN  Dowon HONG  Dong Hoon LEE  Ik Rae JEONG  

     
    PAPER-Cryptography and Information Security

      Page(s):
    2451-2460

    Finding rare cases in medical data is important when hospitals or research institutes want to identify rare diseases. To extract meaningful information from a large amount of sensitive medical data, privacy-preserving data mining techniques can be used. A privacy-preserving t-repetition protocol can be used to find rare cases in distributed medical data: it finds the elements that exactly t out of n parties have in common in their datasets without revealing those private datasets. Such a protocol can find not only common cases, with a high t, but also rare cases, with a low t. In 2011, Chun et al. suggested a generic set operation protocol that can be used to find t-repeated elements. In this paper, we first show that Chun et al.'s protocol becomes infeasible for calculating t-repeated elements as the number of users grows: its computational and communication complexities for this task grow exponentially with the number of users. We then suggest a protocol that runs in polynomial time with respect to the number of users and calculates t-repeated elements among users.
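
    To pin down what the protocol computes, the sketch below evaluates the t-repetition functionality in the clear: it returns the elements held by exactly t of the n parties. It provides no privacy whatsoever and is only a reference for checking the output of a privacy-preserving protocol. The function name and the sample datasets are hypothetical.

```python
from collections import Counter


def t_repeated(datasets, t):
    """Elements held by exactly t of the n parties (computed in the clear)."""
    counts = Counter()
    for ds in datasets:
        counts.update(set(ds))  # each party counts an element at most once
    return {e for e, c in counts.items() if c == t}


hospitals = [
    {"flu", "covid", "rare_x"},
    {"flu", "covid"},
    {"flu", "rare_y"},
]
```

    With these datasets, `t_repeated(hospitals, 1)` yields the rare cases `{"rare_x", "rare_y"}`, while t = 3 yields the common case `{"flu"}`.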

  • Performance Analysis of Hermite-Symmetric Subcarrier Coding for OFDM Systems over Fading Channels

    Fumihito SASAMORI  Shiro HANDA  

     
    PAPER-Communication Theory and Signals

      Page(s):
    2461-2469

    Orthogonal frequency division multiplexing (OFDM) has great advantages such as high spectral efficiency and robustness against multipath fading. To enhance these advantages, Hermite-symmetric subcarrier coding for OFDM, which is used in transmission systems such as the asymmetric digital subscriber line (ADSL) and multiband OFDM in ultra-wideband (UWB) communications, is very attractive. The subcarrier coding forces the imaginary part of the OFDM signal to zero, so another data sequence can be transmitted simultaneously in the quadrature channel. To theoretically verify the effectiveness of Hermite-symmetric subcarrier coding in wireless OFDM (HC-OFDM) systems, we derive closed-form equations for the bit error rate (BER) and throughput over fading channels. Our analytical results indicate that HC-OFDM systems achieve improved performance owing to the effect of the subcarrier coding.
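
    The reason the imaginary part vanishes is that the IDFT of a Hermite-symmetric spectrum, where X[N-k] is the complex conjugate of X[k] and X[0] and X[N/2] are real, is purely real. The minimal sketch below maps N/2 - 1 complex symbols onto N subcarriers this way, leaving the DC and Nyquist bins at zero, and checks that the time-domain signal is real. It uses a direct O(N²) IDFT for clarity, and the function names are illustrative.

```python
import cmath


def idft(X):
    """Direct inverse DFT (O(N^2)), adequate for a small illustration."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]


def hermite_symmetric(data):
    """Place complex symbols on subcarriers 1..N/2-1 and their conjugates on
    N-1..N/2+1, leaving the DC (k = 0) and Nyquist (k = N/2) bins at zero."""
    half = len(data) + 1
    N = 2 * half
    X = [0j] * N
    for k, s in enumerate(data, start=1):
        X[k] = complex(s)
        X[N - k] = complex(s).conjugate()
    return X


X = hermite_symmetric([1 + 1j, -1 + 1j, 1 - 1j])  # N = 8 subcarriers
x = idft(X)  # imaginary parts are zero up to rounding
```

    Since each such signal is real, one Hermite-coded sequence can occupy the in-phase branch and a second one the quadrature branch, which is the property the abstract relies on. The cost is that only about half of the subcarriers carry independent symbols per sequence.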

  • Image Recovery by Decomposition with Component-Wise Regularization

    Shunsuke ONO  Takamichi MIYATA  Isao YAMADA  Katsunori YAMAOKA  

     
    PAPER-Image

      Page(s):
    2470-2478

    Solving image recovery problems requires efficient regularizations based on a priori information about the unknown original image. Naturally, we can assume that an image is modeled as the sum of smooth, edge, and texture components. To obtain a high-quality recovered image, appropriate regularizations for each individual component are required. In this paper, we propose a novel image recovery technique that performs decomposition and recovery simultaneously. We formulate image recovery as a nonsmooth convex optimization problem and design an iterative scheme based on the alternating direction method of multipliers (ADMM) that efficiently approximates its global minimizer. Experimental results reveal that the proposed image recovery technique outperforms a state-of-the-art method.
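
    ADMM splits a nonsmooth convex problem into simple subproblems that alternate with a dual update. The sketch below shows only that skeleton, in scaled form, on the simplest case: minimize 0.5‖x − y‖² + λ‖z‖₁ subject to x = z. Its exact minimizer is soft thresholding of y, which makes convergence easy to check. The paper's component-wise regularizers for smooth, edge and texture parts are not modeled here, and the names and the parameters `rho` and `iters` are assumptions.

```python
def soft_threshold(v, tau):
    """Proximal operator of tau*|.|, applied element-wise."""
    return [max(abs(x) - tau, 0.0) * (1.0 if x > 0 else -1.0) for x in v]


def admm_l1_denoise(y, lam, rho=1.0, iters=200):
    """Scaled-form ADMM for  min 0.5*||x - y||^2 + lam*||z||_1  s.t.  x = z."""
    n = len(y)
    z = [0.0] * n
    u = [0.0] * n  # scaled dual variable
    for _ in range(iters):
        # x-update: closed-form minimizer of 0.5||x-y||^2 + (rho/2)||x-z+u||^2
        x = [(yi + rho * (zi - ui)) / (1.0 + rho) for yi, zi, ui in zip(y, z, u)]
        # z-update: proximal step on the l1 term
        z = soft_threshold([xi + ui for xi, ui in zip(x, u)], lam / rho)
        # dual update accumulates the constraint residual x - z
        u = [ui + xi - zi for ui, xi, zi in zip(u, x, z)]
    return z


recovered = admm_l1_denoise([3.0, -0.5, 1.2, -2.0], lam=1.0)
```

    The result approaches `[2.0, 0.0, 0.2, -1.0]`, the soft-thresholded input. In a decomposition model, each component would receive its own proximal step inside the same alternating loop.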

  • Statistical Learning Theory of Quasi-Regular Cases

    Koshi YAMADA  Sumio WATANABE  

     
    PAPER-General Fundamentals and Boundaries

      Page(s):
    2479-2487

    Many learning machines, such as normal mixtures and layered neural networks, are not regular but singular statistical models, because the map from a parameter to a probability distribution is not one-to-one. Conventional statistical asymptotic theory cannot be applied to such learning machines because the likelihood function cannot be approximated by any normal distribution. Recently, a new statistical theory has been established based on algebraic geometry, and it was clarified that the generalization and training errors are determined by two birational invariants, the real log canonical threshold and the singular fluctuation. However, their concrete values remain unknown. In the present paper, we propose a new concept in statistical learning theory, the quasi-regular case. A quasi-regular case is singular rather than regular, yet it has the same property as a regular case. In fact, we prove that in a quasi-regular case the two birational invariants are equal to each other, so that the generalization and training errors are symmetric. Moreover, the concrete values of the two birational invariants are obtained explicitly, which makes the quasi-regular case useful for studying statistical learning theory.

  • Anonymous Authentication Scheme without Verification Table for Wireless Environments

    Ryoichi ISAWA  Masakatu MORII  

     
    LETTER-Cryptography and Information Security

      Page(s):
    2488-2492

    Lee and Kwon proposed an anonymous authentication scheme based on Zhu et al.'s scheme. However, Lee et al.'s scheme has two disadvantages. First, it is vulnerable to off-line dictionary attacks: an adversary can guess a user's password from the user's login messages obtained by eavesdropping. Second, an authentication server called a home agent requires a verification table, which violates the original advantage of Zhu et al.'s scheme by increasing the key management costs of the home agent. In this letter, we show the weaknesses of Lee et al.'s scheme and of three other existing schemes. We then propose a new secure scheme without a verification table that provides security against off-line dictionary attacks and other attacks, except for a certain type of combined attack.

  • Parametric Forms of the Achievable Rate Region for Source Coding with a Helper

    Tetsunao MATSUTA  Tomohiko UYEMATSU  Ryutaroh MATSUMOTO  

     
    LETTER-Information Theory

      Page(s):
    2493-2497

    Source coding with a helper is one of the most fundamental fixed-length source coding problems for correlated sources. For this problem, Wyner and Ahlswede-Körner characterized the achievable rate region, i.e., the set of encoder rate pairs for which the error probability can be made arbitrarily small for sufficiently large block lengths. However, their expression of the achievable rate region is a union of infinitely many sets, which makes it unsuitable for computing the region. This paper deals with correlated sources whose conditional distribution is related by a binary-input output-symmetric channel and gives a parametric form of the achievable rate region that allows the region to be computed easily.
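
    The Wyner and Ahlswede-Körner region consists of the pairs with R_X ≥ H(X|U) and R_Y ≥ I(Y;U) over auxiliary variables U satisfying the Markov chain U – Y – X. The difficulty is the union over all such U. The sketch below covers only the simplest member of the class treated here, a doubly symmetric binary source with Y ~ Bern(1/2), X = Y xor Z, and Z ~ Bern(p), so that X given Y is a binary symmetric channel. It uses the single-parameter test channel U = Y xor V with V ~ Bern(α), which gives achievable pairs (h(α * p), 1 − h(α)), where * is binary convolution. This is an illustrative special case, not the paper's general parametric form. For this source, Mrs. Gerber's lemma is the usual tool for showing that the curve is the boundary.

```python
import math


def h2(p):
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def conv(a, b):
    """Binary convolution a * b = a(1 - b) + b(1 - a)."""
    return a * (1 - b) + b * (1 - a)


def helper_rate_pairs(p, steps=11):
    """Achievable (R_X, R_Y) = (h2(alpha*p), 1 - h2(alpha)) for the doubly
    symmetric binary source, sweeping alpha from 0 to 1/2 (U = Y xor V)."""
    alphas = (0.5 * i / (steps - 1) for i in range(steps))
    return [(h2(conv(a, p)), 1 - h2(a)) for a in alphas]


pairs = helper_rate_pairs(0.1)
```

    The endpoints behave as expected. At α = 0 the helper sends Y losslessly (R_Y = 1) and X needs only h(p) bits, while at α = 1/2 the helper sends nothing and X needs 1 bit.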

  • Robustness of Image Quality Factors for Environment Illumination

    Shogo MORI  Gosuke OHASHI  Yoshifumi SHIMODAIRA  

     
    LETTER-Image

      Page(s):
    2498-2501

    This study examines the robustness of image quality factors under various types of environmental illumination, using parameter design from the field of quality engineering. Experimental results revealed that image quality factors are influenced by environmental illumination in the following order: minimum luminance, maximum luminance and gamma.