IEICE TRANSACTIONS on Fundamentals

  • Impact Factor

    0.48

  • Eigenfactor

    0.003

  • article influence

    0.1

  • Cite Score

    1.1

Volume E93-A No.12  (Publication Date:2010/12/01)

    Regular Section (Invited Survey)
  • FOREWORD

    Shoji SHINODA  

     
    FOREWORD

      Page(s):
    2353-2353
  • A Survey of the Origins and Evolution of the Microwave Circuit Devices in Japan from the 1920s up until 1945

    Tosiro KOGA  

     
    INVITED SURVEY PAPER

      Page(s):
    2354-2370

    In this paper, we edit several archives on research and development in the field of microwave circuit technology in Japan, which originated with the invention of the Yagi-Uda antenna in 1925, together with generally unknown historical topics from the 1920s up until the end of World War II. As the main subject, we investigate the origin and evolution of the Multiply Split-Anode Magnetron, and clarify that the basic magnetron technology had been established by 1939 under the direction of Yoji Ito through cooperation between expert engineers of the Naval Technical Institute (NTI) and the Nihon Musen Co., while the Cavity Magnetron was invented by Shigeru Nakajima of the Nihon Musen Co. in May 1939. We further clarify that the physical theory of the Multiply Split-Anode Cavity Magnetron Oscillation and the design theory of the Cavity Magnetron were established in collaboration between world-renowned physicists and expert engineers at the NTI Shimada Laboratory during the war. In addition, we clarify that Sin-itiro Tomonaga presented the Scattering Matrix representation of Microwave Circuits, among other topics. The development mentioned above was carried out, in strict secrecy, in an unusual wartime situation up until 1945.

  • Special Section on VLSI Design and CAD Algorithms
  • FOREWORD

    Kazutoshi WAKABAYASHI  

     
    FOREWORD

      Page(s):
    2371-2371
  • Redundant via Insertion: Removing Design Rule Conflicts and Balancing via Density

    Song CHEN  Jianwei SHEN  Wei GUO  Mei-Fang CHIANG  Takeshi YOSHIMURA  

     
    PAPER-Physical Level Design

      Page(s):
    2372-2379

    The occurrence of via defects increases as feature sizes shrink in integrated circuit manufacturing. Redundant via insertion is an effective and recommended method to reduce the yield loss caused by via failures. In this paper, we introduce the redundant via allocation problem for layer partition-based redundant via insertion methods [1] and solve it using a genetic algorithm. At the same time, we use a convex-cost flow model to balance the via density, which helps satisfy via density rules. The results of the layer partition-based model depend on the partition and processing order of metal layers. Furthermore, even if we try all partitions and processing orders, we might miss the optimal solutions. By introducing the redundant via allocation problem on partitioning boundaries, we can avoid the sub-optimality of the original layer partition-based method. The experimental results show that the proposed method inserted 12 more redundant vias on average and that the via density balance can be greatly improved.

  • CAFE Router: A Fast Connectivity Aware Multiple Nets Routing Algorithm for Routing Grid with Obstacles

    Yukihide KOHIRA  Atsushi TAKAHASHI  

     
    PAPER-Physical Level Design

      Page(s):
    2380-2388

    Due to the increase in operating frequency in recent LSI systems, signal propagation delays are required to meet specifications with very high accuracy. To meet these severe requirements, signal propagation delay is taken into account in the routing design of PCBs (Printed Circuit Boards). In PCB routing design, the controllability of wire length is often emphasized since it enables us to control the routing delay. In this paper, we propose CAFE router, which obtains routes of multiple nets with target wire lengths for a single-layer routing grid with obstacles. CAFE router greedily extends the route of each net from one terminal to the other so that the wire length of the net approaches its target wire length. Experiments show that CAFE router obtains routes with small length error in a short time.

  • Regularity-Oriented Analog Placement with Conditional Design Rules

    Shigetoshi NAKATAKE  Masahiro KAWAKITA  Takao ITO  Masahiro KOJIMA  Michiko KOJIMA  Kenji IZUMI  Tadayuki HABASAKI  

     
    PAPER-Physical Level Design

      Page(s):
    2389-2398

    This paper presents a novel regularity evaluation of placement structure and techniques for handling conditional design rules along with dynamic diffusion sharing and well island generation, all developed on the basis of Sequence-Pair. Regular structures such as topological rows, arrays and repetitive structures are characterized by the way sub-sequences of a sequence-pair are formed. A placement objective is formulated that balances regularity and area efficiency. Furthermore, diffusion sharing and well islands can also be identified by examining how a sequence-pair is formed. In experiments, we applied our regularity-oriented placement mixed with the constraint-driven technique to real analog designs, and attained results comparable to manual designs even when imposing symmetry constraints. The results also revealed that regularity serves to increase the row structures to which diffusion sharing can be applied for area saving and wire-length reduction.

  • Statistical Timing Analysis Considering Clock Jitter and Skew due to Power Supply Noise and Process Variation

    Takashi ENAMI  Shinyu NINOMIYA  Ken-ichi SHINKAI  Shinya ABE  Masanori HASHIMOTO  

     
    PAPER-Device and Circuit Modeling and Analysis

      Page(s):
    2399-2408

    Clock drivers, like combinational cells, suffer from delay variation due to manufacturing and environmental variability. The delay variation causes clock skew and jitter, and changes both setup and hold timing margins. This paper presents a timing verification method that takes into consideration delay variation inside a clock network due to both manufacturing variability and dynamic power supply noise. We also discuss how setup and hold slack computation inherently involves a structural correlation problem due to common paths, and demonstrate that assigning individual random variables to upstream clock drivers provides a notable accuracy improvement in clock skew estimation with a limited increase in computational cost. We applied the proposed method to industrial designs in a 90 nm process. Experimental results show that dynamic delay variation reduces setup slack by over 500 ps and hold slack by 16.4 ps in test cases.

  • Linear Time Calculation of On-Chip Power Distribution Network Capacitance Considering State-Dependence

    Shiho HAGIWARA  Koh YAMANAGA  Ryo TAKAHASHI  Kazuya MASU  Takashi SATO  

     
    PAPER-Device and Circuit Modeling and Analysis

      Page(s):
    2409-2416

    A fast calculation tool for the state-dependent capacitance of a power distribution network is proposed. The proposed method achieves linear time complexity and can be more than four orders of magnitude faster than a conventional SPICE-based capacitance calculation. Large circuits that could not be analyzed with the conventional method become analyzable, allowing more comprehensive exploration of capacitance variation. The capacitance obtained with the proposed method agrees with the SPICE-based method (up to 5 digits), and time-linearity is confirmed through numerical experiments on various circuits. The maximum and minimum capacitances are also calculated using average and variance estimation, and these calculations also run in linear time. The proposed tool facilitates building an accurate macro model of an LSI.

  • Measurement Circuits for Acquiring SET Pulse Width Distribution with Sub-FO1-Inverter-Delay Resolution

    Ryo HARADA  Yukio MITSUYAMA  Masanori HASHIMOTO  Takao ONOYE  

     
    PAPER-Device and Circuit Modeling and Analysis

      Page(s):
    2417-2423

    This paper presents two circuits to measure the pulse width distribution of single event transients (SETs). We first review requirements for SET measurement in accelerated neutron radiation tests and point out problems of previous works in terms of time resolution, time/area efficiency for obtaining large samples, and certainty in absolute values of pulse width. We then devise two measurement circuits and a pulse generator circuit that satisfy all the requirements and attain sub-FO1-inverter-delay resolution, and propose a measurement procedure for assuring the absolute width values. Operation of one of the proposed circuits was confirmed by an alpha-particle radiation experiment with a fabricated test chip.

  • Photomask Data Prioritization Based on VLSI Design Intent and Its Utilization for Mask Manufacturing

    Kokoro KATO  Masakazu ENDO  Tadao INOUE  Shigetoshi NAKATAKE  Masaki YAMABE  Sunao ISHIHARA  

     
    PAPER-Device and Circuit Modeling and Analysis

      Page(s):
    2424-2432

    The increase in the time required for data processing, mask drawing, and inspection of photomasks has led to a substantial increase in mask manufacturing cost. This has become one of the major challenges in the semiconductor industry. We have developed a data flow process for mask manufacturing in which we refer to design intent information in order to reduce the TAT of mask manufacturing processes. We convert design-level information, "Design Intent (DI)", into priority information for mask manufacturing data, known as "Mask Data Rank (MDR)", so that we can identify and sort the importance of mask patterns from the viewpoint of the design side. As a result, we can reduce mask writing time and mask inspection time. Our objective is to build an efficient data flow conversion system from DI to MDR. In this paper we introduce the idea of MDR and the software system that we built for DI extraction. Then we show experimental results with actual chip data. Lastly we discuss related issues and their solutions.

  • A Time Variant Analysis of Phase Noise in Differential Cross-Coupled LC Oscillators

    Jinhua LIU  Guican CHEN  Hong ZHANG  

     
    PAPER-Device and Circuit Modeling and Analysis

      Page(s):
    2433-2440

    This paper presents a systematic analysis of the phase noise performance of differential cross-coupled LC oscillators using Hajimiri and Lee's model. The effective impulse sensitivity function (ISF) for each noise source in the oscillator is mathematically derived. According to these effective ISFs, the phase noise contribution from each device is determined, and the phase noise contributions from the device noise in the vicinity of the integer multiples of the resonant frequency, weighted by the Fourier coefficients of the effective ISF, are also calculated. An explicit closed-form expression for the phase noise of the oscillator is obtained. The validity of the phase noise analysis is verified by good agreement with simulation.

  • Accuracy Enhancement of Grid-Based SSTA by Coefficient Interpolation

    Shinyu NINOMIYA  Masanori HASHIMOTO  

     
    PAPER-Device and Circuit Modeling and Analysis

      Page(s):
    2441-2446

    Statistical timing analysis for manufacturing variability requires modeling of spatially-correlated variation. Common grid-based modeling for spatially-correlated variability involves a trade-off between accuracy and computational cost, especially for PCA (principal component analysis). This paper proposes to spatially interpolate variation coefficients to improve accuracy instead of refining spatial grids. Experimental results show that the spatial interpolation realizes a continuous expression of spatial correlation, and reduces the maximum error of timing estimates that originates from sparse spatial grids. To attain the same accuracy, the proposed interpolation reduced CPU time for PCA by 97.7% in a test case.

  • Gate Delay Estimation in STA under Dynamic Power Supply Noise

    Takaaki OKUMURA  Fumihiro MINAMI  Kenji SHIMAZAKI  Kimihiko KUWADA  Masanori HASHIMOTO  

     
    PAPER-Device and Circuit Modeling and Analysis

      Page(s):
    2447-2455

    This paper presents a gate delay estimation method that takes into account dynamic power supply noise. We review STA based on static IR-drop analysis and a conventional method for dynamic noise waveform, and reveal their limitations and problems that originate from circuit structures and higher delay sensitivity to voltage in advanced technologies. We then propose a gate delay computation that overcomes the problems with iterative computations and consideration of input voltage drop. Evaluation results with various circuits and noise injection timings show that the proposed method estimates path delay fluctuation well within 1% error on average.

  • A Dynamic Offset Control Technique for Comparator Design in Scaled CMOS Technology

    Xiaolei ZHU  Yanfei CHEN  Masaya KIBUNE  Yasumoto TOMITA  Takayuki HAMADA  Hirotaka TAMURA  Sanroku TSUKAMOTO  Tadahiro KURODA  

     
    PAPER-Device and Circuit Modeling and Analysis

      Page(s):
    2456-2462

    The accuracy of the comparator, which is often determined by its offset, is essential for the resolution of a high-performance mixed-signal system. Various design efforts have been made to cancel or calibrate the comparator offset caused by factors such as process variations, device thermal noise and input-referred supply noise. However, an effective and simple method for offset cancellation that applies additional circuits without sacrificing power, speed and area remains challenging. This work explores a dynamic offset control technique that employs charge compensation by timing control. The charge injection and clock feed-through by the latch reset transistor are investigated. A simple method is proposed to generate an offset compensation voltage by implementing two source-drain shorted transistors on each regenerative node with timing control signals on their gates. Further analysis of the principle of the timing-based charge compensation approach for comparator offset control is described. The analysis has been verified by fabricating a 65 nm CMOS 1.2 V 1 GHz comparator that occupies 25 × 65 µm² and consumes 380 µW. The circuits for offset control account for 21% of the area and 12% of the power consumption of the whole comparator chip.

  • Reduction of Area per Good Die for SoC Memory Built-In Self-Test

    Masayuki ARAI  Tatsuro ENDO  Kazuhiko IWASAKI  Michinobu NAKAO  Iwao SUZUKI  

     
    PAPER-Logic Synthesis, Test and Verification

      Page(s):
    2463-2471

    To reduce the manufacturing cost of SoCs with many embedded SRAMs, we propose a scheme to reduce the area per good die for the SoC memory built-in self-test (MBIST). We first propose BIST hardware overhead reduction by application of an encoder-based comparator. For the repair of a faulty SRAM module with 2-D redundancy, we propose a spare assignment algorithm. Based on an existing range-checking-first algorithm (RCFA), we propose assign-all-row-RCFA (A-RCFA), which assigns unused spare rows to faulty ones in order to suppress the degradation of repair rate due to the compressed fail location information output from the encoder-based comparator. Then, considering that an SoC has many SRAM modules, we propose a heuristic algorithm based on the iterative improvement algorithm (IIA), which determines whether each SRAM should have a spare row or not, in order to minimize the area per good die. Experimental results on practical-scale benchmark SoCs with more than 1,000 SRAM modules indicate that encoder-based comparators reduce hardware overhead by about 50% compared to traditional ones, and that combining the IIA-based algorithm for determining redundancy architecture with the encoder-based comparator effectively reduces the area per good die.

  • Power Optimization of Sequential Circuits Using Switching Activity Based Clock Gating

    Xin MAN  Takashi HORIYAMA  Shinji KIMURA  

     
    PAPER-Logic Synthesis, Test and Verification

      Page(s):
    2472-2480

    Clock gating is the insertion of control signals for registers to selectively switch off unnecessary clock signals without violating the functional correctness of the original design, so as to reduce the dynamic power consumption. Commercial EDA tools usually have a mechanism to generate clock gating logic based on a structural method in which control signals specified by designers are used, and the effectiveness of the clock gating depends on the specified control signals. In this research, we focus on automatic clock gating logic generation and propose a method based on candidate extraction and control signal selection. We formalize the control signal selection using linear formulae and devise an optimization method based on BDDs. The method is effective for circuits in which many candidates are shared by different registers. The method is applied to counter circuits, to check the correlation with power simulation results, and to a set of benchmark circuits. A 19.1-71.9% power reduction has been found on counter circuits after layout, and a 2.3-18.0% cost reduction on benchmark circuits.

  • Scan-Based Side-Channel Attack against RSA Cryptosystems Using Scan Signatures

    Ryuta NARA  Kei SATOH  Masao YANAGISAWA  Tatsuo OHTSUKI  Nozomu TOGAWA  

     
    PAPER-Logic Synthesis, Test and Verification

      Page(s):
    2481-2489

    Scan-based side-channel attacks retrieve a secret key in a cryptography circuit by analyzing scanned data. Since they pose a considerable threat to a cryptosystem LSI, we have to protect cryptography circuits from them. RSA is one of the most important cryptography algorithms because it effectively realizes a public-key cryptography system. RSA is extensively used, but conventional scan-based side-channel attacks cannot be applied to it because it has a complicated algorithm. This paper proposes a scan-based side-channel attack which enables us to retrieve a secret key in an RSA circuit. The proposed method is based on detecting intermediate values calculated in an RSA circuit. We focus on a 1-bit time-sequence which is specific to some intermediate values. By monitoring the 1-bit time-sequence in the scan path, we can find the register position specific to the intermediate value and determine whether this intermediate value is calculated in the target RSA circuit. We can retrieve a secret key bit by bit from MSB to LSB. The experimental results demonstrate that a 1,024-bit secret key used in the target RSA circuit can be retrieved using 30.2 input messages within 98.3 seconds and that a 2,048-bit secret key can be retrieved using 34.4 input messages within 634.0 seconds.
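
    The MSB-to-LSB key recovery described above can be illustrated with a minimal sketch. It assumes the target uses left-to-right binary (square-and-multiply) exponentiation, which the abstract does not state, and it replaces the 1-bit scan signature matching over several messages with a direct comparison of intermediate values. The names `modexp_trace` and `recover_key` and the toy parameters are hypothetical and chosen only for illustration.

```python
def modexp_trace(m, d, n, bits):
    """Left-to-right binary exponentiation (assumed algorithm).

    Returns the result and the intermediate value held after each
    key bit has been processed, MSB first.
    """
    c, trace = 1, []
    for i in reversed(range(bits)):
        c = (c * c) % n              # square for every key bit
        if (d >> i) & 1:
            c = (c * m) % n          # extra multiply only when the bit is 1
        trace.append(c)
    return c, trace


def recover_key(m, n, bits, observed):
    """Recover d bit by bit from MSB to LSB.

    `observed[k]` stands in for "the intermediate value seen at step k";
    the real attack infers this from scan signatures, not from raw values.
    """
    d, c = 0, 1
    for k in range(bits):
        squared = (c * c) % n
        if_one = (squared * m) % n   # value expected if key bit k is 1
        if observed[k] == if_one:
            d, c = (d << 1) | 1, if_one
        else:
            d, c = d << 1, squared
    return d


# Toy parameters (hypothetical): n = 61 * 53, 12-bit private exponent.
n, d, bits, m = 3233, 2753, 12, 7
result, trace = modexp_trace(m, d, n, bits)
recovered = recover_key(m, n, bits, trace)
print(recovered == d, result == pow(m, d, n))
```

    Each guessed prefix of the key fixes the next expected intermediate value, so one decision per bit suffices in this idealized setting; a real attack needs several input messages to make each decision reliable.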

  • An Error Diagnosis Technique Based on Clustering of Elements

    Kosuke SHIOKI  Narumi OKADA  Kosuke WATANABE  Tetsuya HIROSE  Nobutaka KUROKI  Masahiro NUMA  

     
    PAPER-Logic Synthesis, Test and Verification

      Page(s):
    2490-2496

    In this paper, we propose an error diagnosis technique based on clustering LUT elements to shorten the processing time. By grouping some elements as a cluster, our technique reduces the number of elements to be considered, which effectively shortens the processing time for screening error location sets. First, the proposed technique partitions the circuit into FFRs (fanout-free regions) called clusters, each of which is a subcircuit composed of LUT elements without fanout. After screening the set of clusters that include error locations, the technique screens error location sets composed of elements in the remaining set of clusters, where corrections should be made. Experimental results with benchmark circuits show that our technique shortens the processing time to 1/170 in the best case, and rectifies circuits including 6 errors which cannot be rectified by the conventional technique.

  • A Design Methodology for a DPA-Resistant Circuit with RSL Techniques

    Daisuke SUZUKI  Minoru SAEKI  Koichi SHIMIZU  Akashi SATOH  Tsutomu MATSUMOTO  

     
    PAPER-Logic Synthesis, Test and Verification

      Page(s):
    2497-2508

    A design methodology for Random Switching Logic (RSL) using CMOS standard cell libraries is proposed to counter power analysis attacks against cryptographic hardware modules. The original RSL proposed in 2004 requires a unique RSL-gate for random data masking and glitch suppression to prevent secret information leakage through power traces. In contrast, our new methodology enables the use of general logic gates supported by standard cell libraries. In order to evaluate its practical performance in hardware size and speed as well as resistance against power analysis attacks, an AES circuit with the RSL technique was implemented as a cryptographic LSI using 130-nm and 90-nm CMOS standard cell libraries. From the results of attack experiments that used a million traces, we confirmed that the RSL-AES circuit has very high DPA and CPA resistance thanks to the contributions of both the masking function and the glitch suppressing function.

  • Automatic Communication Synthesis with Hardware Sharing for Multi-Processor SoC Design

    Yuki ANDO  Seiya SHIBATA  Shinya HONDA  Hiroyuki TOMIYAMA  Hiroaki TAKADA  

     
    PAPER-High-Level Synthesis and System-Level Design

      Page(s):
    2509-2516

    We present a hardware sharing method for design space exploration of multi-processor embedded systems. In our prior work, we developed a system-level design tool named SystemBuilder, which automatically synthesizes a target implementation of a system from a functional description. In this work, we have extended SystemBuilder so that it can automatically synthesize an area-efficient implementation which shares a hardware module among different applications. With SystemBuilder, designers only need to enable an option in order to share a hardware module. Designers can therefore easily explore a design space that includes hardware sharing in a short time. A case study shows the effectiveness of hardware sharing in design space exploration.

  • Improved Dictionary-Based Code-Compression Schemes with XOR Reference for RISC/VLIW Architecture

    Jui-Chun CHEN  Chang-Hong LIN  

     
    PAPER-High-Level Synthesis and System-Level Design

      Page(s):
    2517-2523

    Embedded systems are constrained by the available memory, and code-compression techniques address this issue by reducing the code size of application programs. The main challenge for the development of an effective code-compression technique is to reduce code size without affecting the overall system performance. Dictionary-based code-compression schemes are the most commonly used code-compression methods, because they can provide both good compression ratio and fast decompression. We propose an XOR-based reference scheme that can enhance the compression ratio on all the existing dictionary-based algorithms by changing the distribution of the symbols. Our approach works on all kinds of computer architecture with fixed length instructions, such as RISC or VLIW. Experiments show that our approach can further improve the compression ratio with nearly no hardware, performance, and power overheads.

  • Reliability Evaluation Environment for Exploring Design Space of Coarse-Grained Reconfigurable Architectures

    Takashi IMAGAWA  Masayuki HIROMOTO  Hiroyuki OCHI  Takashi SATO  

     
    PAPER-High-Level Synthesis and System-Level Design

      Page(s):
    2524-2532

    This paper proposes a reliability evaluation environment for coarse-grained reconfigurable architectures. This environment is designed so that it can be easily extended to different target architectures and applications by automating the generation of simulation inputs such as HDL code for fault injection and configuration information. This automation enables us to explore a huge design space in order to efficiently analyze area/reliability trade-offs and find the best solution. This paper also shows demonstrative examples of the design space exploration of coarse-grained reconfigurable architectures using the proposed environment. Through the demonstrations, we discuss the relationship between coarse-grained architectures and reliability, which has not yet been addressed in the existing literature, and show the feasibility of the proposed environment.

  • A Multi-Performance Processor for Reducing the Energy Consumption of Real-Time Embedded Systems

    Tohru ISHIHARA  

     
    PAPER-High-Level Synthesis and System-Level Design

      Page(s):
    2533-2541

    This paper proposes an energy-efficient processor which can be used as a design alternative to dynamic voltage scaling (DVS) processors in embedded system design. The processor consists of multiple PE (processing element) cores and a selective set-associative cache memory. The PE-cores have the same instruction set architecture but differ in their clock speeds and energy consumption. Only a single PE-core is activated at a time, and the other PE-cores are deactivated using clock gating and signal gating techniques. The major advantage over DVS processors is a small overhead for changing performance. Gate-level simulation demonstrates that our processor can change its performance within 1.5 microseconds while dissipating about 10 nanojoules, whereas conventional DVS processors need hundreds of microseconds and dissipate a few microjoules for the performance transition. This makes it possible to apply our multi-performance processor to many real-time systems and to perform finer-grained and more sophisticated dynamic voltage control.

  • Variation-Aware Task and Communication Scheduling in MPSoCs for Power-Yield Maximization

    Mahmoud MOMTAZPOUR  Maziar GOUDARZI  Esmaeil SANAEI  

     
    PAPER-High-Level Synthesis and System-Level Design

      Page(s):
    2542-2550

    Parameter variations reveal themselves as different frequencies and leakage powers across instances of the same MPSoC. As variation increases with technology scaling, worst-case-based scheduling algorithms result in either increasingly less optimal schedules or more lost yield. To address this problem, this paper introduces a variation-aware task and communication scheduling algorithm for multiprocessor systems-on-chip (MPSoCs). We consider both delay and leakage power variations during the process of finding the best schedule so that leakier processors are less utilized and can be put in sleep mode more frequently to reduce power. Our algorithm takes advantage of event tables to accelerate the statistical timing and power analysis. We use a genetic algorithm to find the best schedule that maximizes power-yield under a performance-yield constraint. Experimental results on real-world benchmarks show that our proposed algorithm achieves a 16.6% power-yield improvement on average over deterministic worst-case-based scheduling.

  • Generic Permutation Network for QC-LDPC Decoder

    Xiao PENG  Xiongxin ZHAO  Zhixiang CHEN  Fumiaki MAEHARA  Satoshi GOTO  

     
    PAPER-High-Level Synthesis and System-Level Design

      Page(s):
    2551-2559

    The permutation network plays an important role in the reconfigurable QC-LDPC decoder for most modern wireless communication systems with multiple code rates and various code lengths. This paper presents the generic permutation network (GPN) for the reconfigurable QC-LDPC decoder. Compared with conventional permutation networks, this proposal removes the restriction on the number of inputs, such as a power of 2 or other limited numbers, and allows the network to be optimized for any required application. Moreover, the proposed scheme can greatly reduce the latency because of fewer stages and an efficient control signal generation algorithm. In addition, the proposed network is inherently highly parallel, which enables several groups of data to be cyclically shifted simultaneously. The synthesis results using 90 nm technology demonstrate that this architecture can be implemented with a gate count of 18.3k for the WiMAX standard at a frequency of 600 MHz and 10.9k for the WiFi standard at a frequency of 800 MHz.

  • On Synthesizing a Reliable Multiprocessor for Embedded Systems

    Makoto SUGIHARA  

     
    PAPER-High-Level Synthesis and System-Level Design

      Page(s):
    2560-2569

    Utilizing a heterogeneous multiprocessor system has become a popular design paradigm for building an embedded system at low cost. A reliability issue, namely vulnerability to soft errors, has not been taken into account in the conventional IC (integrated circuit) design flow, while chip area, performance, and power consumption have been. This paper presents a system design paradigm in which a heterogeneous multiprocessor system is synthesized and its chip area is minimized under real-time and reliability constraints. First, we define an SEU vulnerability factor as a vulnerability measure for computer systems so that we can evaluate task-wise reliability over various processor structures. Next, we build a mixed integer linear programming (MILP) model for minimizing the chip area of a heterogeneous multiprocessor system under real-time and SEU vulnerability constraints. Finally, we show several experimental results on our synthesis approach. The results show that our design paradigm achieves automatic generation of cost-competitive and reliable heterogeneous multiprocessor systems.

  • Task Allocation with Algorithm Transformation for Reducing Data-Transfer Bottlenecks in Heterogeneous Multi-Core Processors: A Case Study of HOG Descriptor Computation

    Hasitha Muthumala WAIDYASOORIYA  Daisuke OKUMURA  Masanori HARIYAMA  Michitaka KAMEYAMA  

     
    PAPER-High-Level Synthesis and System-Level Design

      Page(s):
    2570-2580

    Heterogeneous multi-core processors are attractive for media processing applications due to their capability of drawing on the strengths of different cores to improve overall performance. However, data transfer bottlenecks and limitations in task allocation due to accelerator-incompatible operations prevent us from realizing the full potential of heterogeneous multi-core processors. This paper presents a task allocation method based on algorithm transformation to increase the freedom of task allocation. We use approximation methods such as CORDIC algorithms to map the accelerator-incompatible operations to accelerator cores. According to experimental results using HOG descriptor computation, the proposed task allocation method reduces the data transfer time by more than 82% and the total processing time by more than 79% compared to the conventional task allocation method.

  • Combined Use of Rising and Falling Edge Triggered Clocks for Peak Current Reduction in IP-Based SoC/NoC Designs

    Tsung-Yi WU  Tzi-Wei KAO  How-Rern LIN  

     
    PAPER-High-Level Synthesis and System-Level Design

      Page(s):
    2581-2589

    In a typical SoC (System-on-Chip) design, a huge peak current often occurs near the time of an active clock edge because of the aggregate switching of a large number of transistors. The number of transistors switching together can be lessened if the SoC design uses a clock scheme of mixed rising and falling triggering edges rather than one of purely rising (or purely falling) triggering edges. In this paper, we propose a clock-triggering-edge assignment technique and algorithms that can assign either a rising triggering edge or a falling triggering edge to each clock of each IP core of a given IP-based SoC/NoC (Network-on-Chip) design. The goal of the algorithms is to reduce the peak current of the design. Our proposed technique has been implemented as a software system. The system can use an LP technique to find an optimal or suboptimal solution within several seconds. The system can also use an ILP technique to find an optimal solution, but the ILP technique is not suitable for complex designs. Experimental results show that our algorithms can reduce peak currents by up to 56.3%.
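
    The idea of mixing rising and falling triggering edges can be illustrated with a toy model. The sketch below is not the paper's LP or ILP formulation: it assumes each IP core contributes a single peak-current number at its triggering edge, that currents at the same edge simply add, and it finds the best assignment by brute force. The function name `assign_edges` and the current values are hypothetical.

```python
from itertools import product


def assign_edges(core_peaks_ma):
    """Brute-force the rising/falling assignment that minimizes the larger
    of the two aggregate peak currents (toy additive model)."""
    best = None
    for edges in product(("rise", "fall"), repeat=len(core_peaks_ma)):
        rise = sum(p for p, e in zip(core_peaks_ma, edges) if e == "rise")
        fall = sum(p for p, e in zip(core_peaks_ma, edges) if e == "fall")
        peak = max(rise, fall)
        if best is None or peak < best[0]:
            best = (peak, edges)
    return best


# Hypothetical per-core peak currents in mA.
peaks = [40, 30, 20, 10]
all_rising_peak = sum(peaks)          # every core triggers on the rising edge
mixed_peak, edges = assign_edges(peaks)
reduction = 1 - mixed_peak / all_rising_peak
print(all_rising_peak, mixed_peak, edges, f"{reduction:.0%}")
```

    Brute force is exponential in the number of cores, which is one reason a practical tool would turn to LP or ILP formulations as the abstract describes.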

  • HDLs Modeling Technique for Burst-Mode and Extended Burst-Mode Asynchronous Circuits

    Jung-Lin YANG  Jau-Cheng WEI  Shin-Nung LU  

     
    PAPER-High-Level Synthesis and System-Level Design

      Page(s):
    2590-2599

    A hardware description language (HDL) based modeling technique for asynchronous circuits is presented in this paper. An HDL handshake package has been developed for expressing handshake-style digital systems in both VHDL and Verilog. Burst-mode and extended burst-mode (BM/XBM) circuits were used to demonstrate the usefulness of this work. This research successfully prototyped comparators, adders, an RSA encoder/decoder, and several self-timed circuits for full-custom IC and FPGA designs. Furthermore, the HDL handshake package implemented in this research can be used to develop behavioral test benches for studying and analyzing asynchronous designs. Extracting detailed timing information from asynchronous finite state machines (AFSMs), detecting delay faults in synthesized self-timed functional modules, and locating fundamental-mode violations within realized AFSMs are proven applications. The HDL modeling technique and the transformation procedure are detailed in the rest of this paper.

  • A 9-bit 100-MS/s 1.46-mW Tri-Level SAR ADC in 65 nm CMOS

    Yanfei CHEN  Sanroku TSUKAMOTO  Tadahiro KURODA  

     
    PAPER-Circuit Design

      Page(s):
    2600-2608

    A 9-bit 100-MS/s successive approximation register (SAR) ADC with low power and small area has been implemented in 65-nm CMOS technology. A tri-level charge redistribution technique is proposed to reduce DAC switching energy and settling time. By connecting the bottom plates of differential capacitor arrays for charge sharing, an extra reference voltage is avoided. The two reference voltages charging and discharging the capacitors are chosen to be the supply voltage and ground in order to save energy and achieve a rail-to-rail input range. Split capacitor arrays with mismatch calibration are implemented for small area and small input capacitance without linearity degradation. The ADC achieves a peak SNDR of 53.1 dB and consumes 1.46 mW from a 1.2-V supply, resulting in a figure of merit (FOM) of 39 fJ/conversion-step. The total active area is 0.012 mm² and the input capacitance is 180 fF.
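
    The reported figure of merit can be cross-checked with the widely used Walden definition, FOM = P / (2^ENOB × fs), with ENOB = (SNDR − 1.76) / 6.02. The abstract does not state which definition the authors used, so the formula and the function name below are assumptions of this sketch.

```python
def walden_fom_fj(power_w, sndr_db, sample_rate_hz):
    """Walden figure of merit in fJ per conversion step."""
    # Effective number of bits derived from the peak SNDR.
    enob = (sndr_db - 1.76) / 6.02
    # Energy per conversion step in joules, scaled to femtojoules.
    return power_w / (2.0 ** enob * sample_rate_hz) * 1e15

# Values taken from the abstract: 1.46 mW, 53.1 dB SNDR, 100 MS/s.
fom = walden_fom_fj(1.46e-3, 53.1, 100e6)
print(f"{fom:.1f} fJ/conversion-step")
```

    Under these assumptions the result comes out a little under 40 fJ per conversion step, in line with the 39 fJ figure quoted in the abstract.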

  • Low-Voltage Operational Active Inductor for LNA Circuit

    Masaaki SODA  Ningyi WANG  Michio YOTSUYANAGI  

     
    PAPER-Circuit Design

      Page(s):
    2609-2615

    A low-voltage operational active inductor circuit is attractive for spiral-inductor-less LNAs because it achieves high gain and low-voltage operation simultaneously. In this paper, a simply structured low-voltage operational active inductor that enhances the amplifier gain is introduced and analyzed. This active inductor, which uses a transistor load operated in the triode region and a source follower, features a small DC voltage drop suitable for low-voltage LNAs. An LNA using the active inductor load was designed with an input matching circuit in 90 nm CMOS technology. The LNA, tuned for 2.4 GHz operation, has an internal gain of 19.5 dB. In addition, the frequency characteristics are easily varied by changing the capacitance value in the active inductor circuit. The core circuit occupies only 0.0026 mm² and consumes 2.8 mW from a 1.2 V supply.

  • Subtraction Inversion for Delta Path's Hardware Simplification in MASH Delta-Sigma Modulator

    Pao-Lung CHEN  

     
    LETTER-Circuit Design

      Page(s):
    2616-2620

    The multistage noise-shaping (MASH) delta-sigma modulator (DSM) is the key element in a fractional-N frequency synthesizer. A hardware simplification method based on subtraction inversion is proposed for the delta-path design in a MASH delta-sigma modulator. The subtraction inversion method simplifies the adder-subtractor unit in the delta path by inverting the subtraction signal. It achieves a lower hardware cost than conventional approaches. As a result, the hardware organization is regular and easy to extend to higher-order MASH DSM designs. Analytical details of the implementation and the hardware cost function for an N-th order configuration are presented. Finally, simulations in a hardware description language as well as synthesis data verify the proposed design method.
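
    For context, a conventional accumulator-based MASH 1-1-1 modulator can be sketched as follows. The carry outputs of the three accumulators are combined through first and second differences, which is the delta path that the abstract refers to. This is a textbook-style behavioral model with hypothetical names, not the subtraction-inversion design proposed in the paper.

```python
def mash111(k, modulus, n_samples):
    """Behavioral MASH 1-1-1 model for an input word k with 0 <= k < modulus."""
    s1 = s2 = s3 = 0              # accumulator residues
    c2_prev = 0                   # delayed carry of stage 2
    c3_prev = c3_prev2 = 0        # delayed carries of stage 3
    out = []
    for _ in range(n_samples):
        # Each stage accumulates the residue of the previous stage.
        c1, s1 = divmod(s1 + k, modulus)
        c2, s2 = divmod(s2 + s1, modulus)
        c3, s3 = divmod(s3 + s2, modulus)
        # Delta path: c1 + (1 - z^-1) c2 + (1 - z^-1)^2 c3.
        y = c1 + (c2 - c2_prev) + (c3 - 2 * c3_prev + c3_prev2)
        out.append(y)
        c2_prev = c2
        c3_prev2, c3_prev = c3_prev, c3
    return out

samples = mash111(3, 16, 16000)
average = sum(samples) / len(samples)
```

    The long-run average of the output approaches k / modulus (here 3/16), while the individual samples take small integer values between -3 and 4.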

  • Low-Power High-Speed Data Serializer for Mobile TFT-LCD Driver ICs

    Jae-Hyuck WOO  Jae-Goo LEE  Young-Hyun JUN  Bai-Sun KONG  

     
    LETTER-Circuit Design

      Page(s):
    2621-2622

    A novel data serializer is proposed for use in mobile TFT-LCD driver ICs. The proposed data serializer adopting hierarchical switching and repeater/separator schemes provides 82% power reduction and 27% speed improvement with 27% area saving. Measured overall power consumption of a TFT-LCD driver IC with the proposed data serializer was reduced by as much as 49%.

  • Special Section on Wideband Systems
  • FOREWORD

    Shinsuke HARA  

     
    FOREWORD

      Page(s):
    2623-2623
  • Measurements and Modeling of Ultra-Wideband Propagation Losses around the Human Body Dependent on Room Volume

    Hironobu YAMAMOTO  Masato KOIWAI  Takehiko KOBAYASHI  

     
    PAPER

      Page(s):
    2624-2633

    This paper describes ultra wideband (UWB) radio propagation measurements and modeling for wireless body area network (WBAN) applications in different environments. Several propagation measurement campaigns and associated modelings were carried out in either a radio anechoic chamber or a specific room type; however, dependence of the radio propagation on surrounding environments was not studied. Multipaths (mainly reflected from floor, ceiling, and walls) highly depend on the environment. To address this problem, radio propagation around the human body was measured in a radio anechoic chamber and four different-sized rooms. Parameters in a conventional loss model derived from the measurements were found to significantly diverge and depend on room volume and line-of-sight (LOS)/non-LOS (NLOS) cases. A modified model considering the impact of room volume has been proposed for the LOS/NLOS cases. Different propagation mechanisms were discussed along with parameter derivation. Probability distributions for the UWB propagation losses were also examined.

  • Doppler Spread Mitigation Using Harmonic Transform for Wireless OFDM Systems in Mobile Communications

    Saiyan SAIYOD  Sakchai THIPCHAKSURAT  Ruttikorn VARAKULSIRIPUNTH  

     
    PAPER

      Page(s):
    2634-2645

    In wireless OFDM systems, system performance suffers from frequency offset and symbol timing offset due to the Doppler effect. When the discrete Fourier transform (DFT) and inverse discrete Fourier transform (IDFT) are used for the traditional signal transformation from the time domain into the frequency domain, and vice versa, the system performance may be severely degraded. To make the OFDM system tolerant of these problems, we consider applying the harmonic transform in place of the traditional signal transformation, thereby improving the system performance. In this paper, we combine the good characteristics of the harmonic transform and instantaneous frequency into a novel transformation for wireless OFDM systems. We propose a modified discrete harmonic transform (MDHT) that can be performed adaptively. Our proposed scheme is called the modified discrete harmonic transform OFDM (MDHT-OFDM) scheme. We derive the equations of the novel discrete harmonic transform, which are suitable for wireless OFDM systems, and a novel channel estimation method that works together with the novel transformation. The proposed channel estimation is performed in both the time domain and the frequency domain. The performance of the MDHT-OFDM scheme is evaluated by means of simulation. We compare the performance of the MDHT-OFDM scheme with that of the conventional DFT-OFDM scheme in terms of symbol error rate (SER). The MDHT-OFDM scheme achieves better performance than the conventional DFT-OFDM scheme in mitigating the Doppler spread.

  • Channel Estimator Employing Narrowband Interference Detector of Wideband OFDM Receiver

    Naohiko IWAKIRI  Takehiko KOBAYASHI  

     
    PAPER

      Page(s):
    2646-2653

    A multiband system can flexibly create spectral holes to avoid interference between different systems. When two systems coexist within the same frequency band, the multiband system must immediately detect the signals from all users to remove unwanted interference. The difficulty in creating spectral holes lies in obtaining the occupied frequency band and the angle of arrival of the interfering system. These parameters must be measured at the receiver of the multiband system and then fed back to the transmitter. This paper presents a channel estimator with an interference detector, developed to implement and test its functionality in a multiband system. The proposed estimator can precisely detect the parameters before demodulation and quickly feed back the interfering system parameters to the transmitter. The effective design and the detection error rate were evaluated via verification tests in an anechoic chamber and computer simulations. The results show that the proposed technique is capable of interference detection as well as channel estimation.

  • Performance Evaluation of Iterative LDPC-Coded MIMO OFDM System with Time Interleaving

    Kazuhiko MITSUYAMA  Kohei KAMBARA  Takayuki NAKAGAWA  Tetsuomi IKEDA  Tomoaki OHTSUKI  

     
    PAPER

      Page(s):
    2654-2662

    The multiple-input multiple-output (MIMO) OFDM technique is an attractive solution for increasing the spectrum efficiency of mobile transmission applications. However, high spatial correlation makes signal detection difficult in real outdoor environments, and thus various methods have been developed to improve the detection performance. An iterative low-density parity-check (LDPC) coded MIMO system is a promising method for solving this problem, and its performance has been analyzed theoretically. This paper proposes an iterative LDPC minimum mean square error with soft interference cancellation (LDPC-MMSE-SIC) receiver with a time de-interleaver in front of the MMSE detector and evaluates its performance by computer simulation using channel state information (CSI) acquired in real outdoor measurements. We show that the iterative detection and decoding system with time interleaving, which is long enough to cover a fading cycle, achieves excellent error rate performance in mobile LOS environments and outperforms an LDPC maximum likelihood detection (LDPC-MLD) receiver with the same error correction and interleaving.

  • Combined Trellis Precoding and Error Correcting Codes in Multi-User MIMO-OFDM Systems

    Tsuguhide AOKI  Hideki OCHIAI  Ryuji KOHNO  

     
    PAPER

      Page(s):
    2663-2671

    A major drawback of linear precoding in a downlink multi-user MIMO system is the increase in transmit power when the channel is correlated. On the other hand, nonlinear trellis precoding in downlink multi-user MIMO systems is capable of minimizing the transmit power by adding a shaping sequence to the original transmit sequence. However, conventional trellis precoding cannot be directly applied to existing bit-interleaved coded MIMO-OFDM systems since the trellis precoding and error correcting codes should be designed separately. In this paper, we propose to embed trellis precoding into the error correcting codes that are used in the original multi-user MIMO-OFDM system employing linear precoding. A major advantage of this approach is that the receiving procedure at user terminals designed for the original system need not be changed up to the error correcting decoder to support our trellis precoding. Computer simulations show that the proposed trellis precoding provides improvements of 2 dB and 2.5 dB in 2×2 and 3×3 MIMO configurations, respectively.

  • Phase Rotation for Constructing Uniform Frequency Spectrum in IFDMA Communication

    Takeo YAMASAKI  Osamu TAKYU  Koichi ADACHI  Yohtaro UMEDA  Masao NAKAGAWA  

     
    PAPER

      Page(s):
    2672-2681

    In this paper, a scheme for constructing a flat frequency spectrum for interleaved frequency division multiple access (IFDMA) is proposed. Since IFDMA is a single carrier modulation scheme, its frequency spectrum components fluctuate depending on the information data sequence. Even though the IFDMA modulation scheme disperses the frequency spectrum to obtain frequency diversity gain, that gain is reduced by the fluctuation of the frequency spectrum. In addition, in decision directed channel estimation (DDCE), which achieves good channel estimation accuracy in fast fading environments, the accuracy of the channel transfer function estimated at significantly attenuated frequency components is much degraded. In the proposed technique, the information data sequence is multiplied by a random phase sequence to construct a flat frequency spectrum. As a result, the frequency diversity gain is enlarged and the accuracy of channel estimation by DDCE is improved. Furthermore, we consider a blind estimation technique for the random phase sequence selected by the transmitter. We show the effects of the proposed scheme by computer simulation.

  • A Method of Cognizing Primary and Secondary Radio Signals

    Satoshi TAKAHASHI  

     
    PAPER

      Page(s):
    2682-2690

    A cognitive radio will have to sense and discover spectral environments in which it would not interfere with primary radios. Because the primary radios have the right to use the frequency, cognitive radios acting as secondary radios must detect radio signals before use. However, the secondary radios also need to distinguish the primary radios from other secondary radios, because the primary radios are vulnerable to interference. In this paper, a method of simultaneously identifying the signals of primary and secondary radios is proposed. The proposed bandwidth differentiation assumes that the primary and secondary radios use orthogonal frequency division multiplexing (OFDM) and that the secondary radios use a smaller number of subcarriers than the primary radios. The false alarm and detection probabilities are analytically evaluated using the characteristic function method. Numerical evaluations are also conducted on the assumption that the primary radio is digital terrestrial television broadcasting. The results show that the proposed method can achieve a false alarm probability of 0.1 and a detection probability of 0.9 when the primary and secondary radio powers are 2.5 dB and 3.6 dB higher than the noise power. In the evaluation, the received signals were averaged over 32 successive snapshots, and both the primary and secondary radios used QPSK. The power ratios were 4.7 dB and 8.4 dB when both the primary and secondary radios used 64QAM.

  • Time Domain Feedback Equalizer for Fast Fading Channel in OFDM with Scattered Pilot

    Yutaro NAKAGAWA  Yukitoshi SANADA  

     
    LETTER

      Page(s):
    2691-2695

    In this letter, a new feedback equalization scheme to suppress inter-carrier interference (ICI) in an OFDM system using scattered pilots is investigated. On a fast fading channel, severe ICI occurs due to the Doppler shift, and it seriously degrades the bit error rate (BER) because of the small subcarrier spacing. In an ISDB-T receiver, equalization is mainly performed in the frequency domain because the scattered pilots are transmitted over the subcarriers. However, frequency domain equalization may not suppress severe ICI on a fast fading channel with a large Doppler shift. The proposed equalization scheme uses the scattered pilot symbols transformed into the time domain as the reference signal for the feedback taps. Numerical results obtained through computer simulation show that the proposed scheme improves the BER performance, especially under low carrier-to-noise ratio (CNR) conditions.

  • Performance Bound for Turbo-Coded 2-D FSO/CDMA Systems over Atmospheric Turbulence Channels

    Anh T. PHAM  Tu A. LUU  Ngoc T. DANG  

     
    LETTER

      Page(s):
    2696-2699

    We propose Turbo-coded two-dimensional (2-D) free-space optical (FSO) CDMA systems for broadband access networks. The performance bound for the proposed system over atmospheric turbulence channels is obtained considering multiple-access interference (MAI) and receiver noise. The results show that the proposed system offers a better performance than that of previously proposed ones. Also, it has a better tolerance to the atmospheric turbulence and the increase in the number of users.

  • On Communication and Interference Range of Multi-Gbps Millimeter-Wave WPAN System

    Chin-Sean SUM  Zhou LAN  Junyi WANG  Hiroshi HARADA  Shuzo KATO  

     
    LETTER

      Page(s):
    2700-2703

    This paper investigates the communication range and interference range of millimeter-wave wireless personal area networks (WPAN) based on realistic system design. Firstly, the effective communication range of the millimeter-wave networks is calculated based on a realistic physical (PHY) layer design and a 60 GHz channel obtained from actual measurements. Secondly, an interference model is developed to facilitate the analysis of the impact of interferer-to-victim range on the victim link performance. It is found that a system with BPSK modulation is able to support use cases with a higher number of portable devices within a 3 m range, while a system with 16QAM modulation is more suitable for fixed high-speed data streaming devices within a shorter range of 1 m. Also, the interferer-to-victim range that causes no interference in all conditions is found to be approximately 40 m, while a 25 m range causes a typical bit error rate (BER) degradation of one digit (e.g. BER = 10^-6 to 10^-5).

  • Prioritized Aggregation for Compressed Video Streaming on mmWave WPAN Systems

    Zhou LAN  Chin Sean SUM  Junyi WANG  Hiroshi HARADA  Shuzo KATO  

     
    LETTER

      Page(s):
    2704-2707

    This paper proposes a prioritized aggregation method that supports compressed video transmission on millimeter wave wireless personal area network (mmWave WPAN) systems. Frame aggregation is an effective means of improving system efficiency and throughput for wideband systems such as mmWave WPAN. Applications require mmWave WPAN systems to provide Gbps or multi-Gbps transmission capability. The proposed scheme targets not only transmission efficiency but also support for compressed video transmission, which is currently very popular. The proposal combines MAC layer aggregation with PHY layer skew modulation to facilitate video transmission in such a way that more important data is better protected. Simulation results show that the average peak signal to noise ratio (PSNR) performance is improved by 5 dB compared with the conventional method, while the Gbps transmission requirement is fulfilled.

  • Special Section on Theory of Concurrent Systems and its Applications
  • FOREWORD

    Satoshi TAOKA  

     
    FOREWORD

      Page(s):
    2708-2708
  • Optimal Configuration for Multiversion Real-Time Systems Using Slack Based Schedulability

    Sayuri TERADA  Toshimitsu USHIO  

     
    PAPER

      Page(s):
    2709-2716

    In an embedded control system, the control performance of each job depends on its latency and on the control algorithm implemented in it. In order to adapt a job set to optimize control performance subject to schedulability, we design several types of control software for each job, called versions, and select one version from them when the job is released. A real-time system in which each job has several versions is called a multiversion real-time system. The benefit and the CPU utilization of a job depend on the selected version. It is therefore an important problem to select a version of each job so as to maximize the total benefit of the system subject to a schedulability condition. We call this the optimal configuration problem. In this paper, we assume that each version is specified by its relative deadline, execution time, and benefit. We show that the optimal configuration problem can be transformed into a maximum path length problem. We propose an optimal algorithm based on forward dynamic programming. Moreover, we propose sub-optimal algorithms to reduce computation times. The efficiency of the proposed algorithms is illustrated by simulations.
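
    The shape of the optimal configuration problem can be illustrated with a small sketch. Each version is a tuple (relative deadline, execution time, benefit), as in the abstract. As assumptions of this sketch only, the slack-based schedulability condition is replaced by a simple density test (the sum of execution time over relative deadline must not exceed 1), and the forward dynamic programming algorithm is replaced by exhaustive enumeration. All names and numbers are hypothetical.

```python
from itertools import product

def best_configuration(jobs):
    """jobs[i] is a list of versions (relative_deadline, execution_time, benefit)."""
    best_benefit, best_choice = None, None
    # Enumerate one version index per job.
    for choice in product(*(range(len(versions)) for versions in jobs)):
        picked = [jobs[i][j] for i, j in enumerate(choice)]
        # Stand-in schedulability check: total density must not exceed 1.
        density = sum(c / d for d, c, _ in picked)
        if density > 1.0:
            continue
        benefit = sum(b for _, _, b in picked)
        if best_benefit is None or benefit > best_benefit:
            best_benefit, best_choice = benefit, choice
    return best_benefit, best_choice

# Two hypothetical jobs with two versions each.
jobs = [
    [(10, 2, 1), (10, 5, 3)],
    [(20, 4, 1), (20, 12, 4)],
]
benefit, choice = best_configuration(jobs)
```

    Here the highest-benefit versions of both jobs cannot be combined because their total density would be 1.1, so the best feasible choice pairs the cheaper version of the first job with the richer version of the second.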

  • Delay Time Determination for the Timed Petri Net Model of a Signaling Pathway Based on Its Structural Information

    Yoshimasa MIWA  Yuki MURAKAMI  Qi-Wei GE  Chen LI  Hiroshi MATSUNO  Satoru MIYANO  

     
    PAPER

      Page(s):
    2717-2729

    This paper proposes a method for incorporating the concept of time into a Petri net model so that the dynamics of a signaling pathway can be represented, i.e., the use of timed Petri nets. Incorporating delay times into a Petri net model makes it possible to conduct a quantitative evaluation of a target signaling pathway. However, experimental data describing detailed reactions are not available in most cases. The algorithm given in this paper determines the delay times of a timed Petri net from its structural information alone. The suitability of this algorithm has been confirmed by the results of an application to the IL-1 signaling pathway.

  • Parallel Degree of Well-Structured Workflow Nets

    Nan QU  Shingo YAMAGUCHI  Qi-Wei GE  

     
    PAPER

      Page(s):
    2730-2739

    In this paper, we discuss the parallel degree of well-structured workflow nets (WF-nets for short). First, we give the definition of the parallel degree, PARAdeg, for WF-nets. Second, we show that it is intractable to compute the value of PARAdeg for acyclic well-structured WF-nets. Next, we construct two heuristic algorithms to compute the value. The first algorithm focuses on the nest structure and the second on the longest path. Finally, we perform an experiment to compare the two algorithms. The result is that the first algorithm, based on the nest structure, is more accurate than the second, based on the longest path, for most well-structured WF-nets; the second is more accurate than the first only when the well-structured WF-nets are composed mainly of parallel structures.

  • Rule-Based Ad-Hoc Workflow Modeling for Service Coordination: A Case Study of a Telecom Operational Support System

    Jae-Yoon JUNG  Joonsoo BAE  

     
    LETTER

      Page(s):
    2740-2743

    Workflow technology has spread across a wide range of areas that require process control (e.g. logistics and e-business) or resource coordination (e.g. cooperative work and grid computing). Among the various types of workflow, we introduce a case of an ad-hoc workflow process in a Korean telecom company. Since such a service process generally involves customer participation, its procedure and state change flexibly, and sometimes capriciously, to cope with customer requests and unexpected situations on the operator's side. In network service provisioning or troubleshooting processes, customers often request changes to their service types or visit appointments, which calls for flexible and adaptive management of the process instances. In this paper, we present a novel approach to workflow modeling based on modified ECA rules (named P-ECA) for the purpose of ad-hoc workflow process modeling. The rule-based workflow modeling is comprehensible to engineers and can be implemented in programs with ease; therefore, it is expected to be widely adopted for ad-hoc and adaptive workflow modeling that requires dynamic changes of state in response to internal or external events.

  • Regular Section
  • New Differential Cryptanalytic Results for Reduced-Round CAST-128

    Meiqin WANG  Xiaoyun WANG  Kam Pui CHOW  Lucas Chi Kwong HUI  

     
    PAPER-Cryptography and Information Security

      Page(s):
    2744-2754

    CAST-128 is a block cipher used in a number of products, notably as the default cipher in some versions of GPG and PGP. It has been approved for Canadian government use by the Communications Security Establishment. Haruki Seki et al. found 2-round differential characteristics with which 5-round CAST-128 can be attacked. In this paper, we study the properties of the round functions F1 and F3 in CAST-128 and identify differential characteristics for the F1 and F3 round functions. We thereby identify a 6-round differential characteristic with probability 2^-53 under 2^-23.8 of the total key space. Then, based on the 6-round differential characteristic, we can attack 8-round CAST-128 with key sizes greater than or equal to 72 bits and 9-round CAST-128 with key sizes greater than or equal to 104 bits. We give a summary of attacks on reduced-round CAST-128 in Table 10.

  • A Low Complexity Dual-Mode Pulse-Triggered Flip-Flop Design Based on Unified AND/XNOR Logic

    Jin-Fa LIN  Yin-Tshung HWANG  Ming-Hwa SHEU  

     
    LETTER-Circuit Theory

      Page(s):
    2755-2757

    A dual-mode pulse-triggered flip-flop design supporting functional versatility is presented. A low-complexity unified logic module, consisting of only five transistors, for dual-mode pulse generation is devised using pass transistor logic (PTL). The potential threshold voltage loss problem is successfully resolved to ensure signal integrity. Despite the extra logic for dual-mode operation, the circuit complexity of the proposed design is comparable to that of single-mode designs. Simulations at different process corners and switching activities demonstrate the competitive performance of the proposed design against various single-mode designs.

  • On (±1) Error Correctable Integer Codes

    Hristo KOSTADINOV  Hiroyoshi MORITA  Nikolai MANEV  

     
    LETTER-Information Theory

      Page(s):
    2758-2761

    Integer codes correct errors of a given type, which means that for a given communication channel and modulator we can choose the type of errors that are most common and then construct an integer code capable of correcting those errors. A new general construction of single (±1) error correctable integer codes is presented. A comparison between single and multiple (±1) error correctable integer codes over an AWGN channel using a QAM scheme is also presented.
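
    To make the notion concrete, the following is a minimal sketch of single-error correction in an integer code, assuming the error type is +1 or -1 in one symbol. A codeword over the integers modulo m satisfies a weighted checksum equal to zero, and a single error at position i shifts the syndrome to +w_i or -w_i, so decoding works when all these values are distinct and nonzero. The weights (1, 2, 3) modulo 7 are a toy choice for this sketch, not the construction from the letter.

```python
def syndrome(word, weights, m):
    """Weighted checksum of a word over the integers modulo m."""
    return sum(c * w for c, w in zip(word, weights)) % m

def correct_single_pm1(received, weights, m):
    """Correct at most one +1 or -1 symbol error using the syndrome."""
    s = syndrome(received, weights, m)
    if s == 0:
        return list(received)  # no detectable error
    for i, w in enumerate(weights):
        for e in (1, -1):
            if (e * w) % m == s:
                fixed = list(received)
                fixed[i] = (fixed[i] - e) % m  # undo the error at position i
                return fixed
    raise ValueError("syndrome does not match a single +1 or -1 error")

weights, m = [1, 2, 3], 7
codeword = [1, 3, 0]   # 1*1 + 3*2 + 0*3 = 7, which is 0 modulo 7
received = [1, 3, 1]   # +1 error in the last symbol
decoded = correct_single_pm1(received, weights, m)
```

    With these toy weights the six possible single-error syndromes are 1, 6, 2, 5, 3, and 4, so each one points to a unique position and sign.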