
Keyword Search Result

[Keyword] register (102 hits)


  • New Signature Schemes Based on 3rd Order Shift Registers

    Chik-How TAN  Xun YI  Chee-Kheong SIEW  


    E85-A No:1

    In this paper, we propose a new digital signature scheme based on a third-order linear feedback shift register for signing documents. This signature scheme differs from most signature schemes, which are based on the discrete logarithm problem, the elliptic curve discrete logarithm problem, RSA, or quadratic residues. An efficient algorithm for computing the kth term of a sequence is also presented. The advantage of this scheme is that its computation is more efficient than that of the Schnorr scheme. We also show that the security of the proposed signature scheme is equivalent to that of the Schnorr signature scheme.
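    The abstract does not give the paper's algorithm for the kth term, so the following is only a generic sketch of one standard way to reach the kth term of a third-order linear recurrence in O(log k) steps: square-and-multiply on the 3x3 companion matrix. The recurrence form, the function names `mat_mul` and `kth_term`, and the parameters `coeffs`, `init`, and `p` are assumptions chosen for illustration.

```python
def mat_mul(a, b, p):
    """Multiply two 3x3 matrices modulo p."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) % p for j in range(3)]
            for i in range(3)]


def kth_term(k, coeffs, init, p):
    """Return s_k for s_{n+3} = c2*s_{n+2} + c1*s_{n+1} + c0*s_n (mod p).

    coeffs = (c0, c1, c2) and init = (s_0, s_1, s_2) are illustrative
    parameters, not values taken from the paper.
    """
    c0, c1, c2 = (c % p for c in coeffs)
    # Companion matrix: maps (s_n, s_{n+1}, s_{n+2}) to (s_{n+1}, s_{n+2}, s_{n+3}).
    step = [[0, 1, 0], [0, 0, 1], [c0, c1, c2]]
    result = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    while k > 0:  # square-and-multiply on the matrix power
        if k & 1:
            result = mat_mul(result, step, p)
        step = mat_mul(step, step, p)
        k >>= 1
    # First row of step^k applied to the initial state gives s_k.
    return sum(result[0][j] * init[j] for j in range(3)) % p
```

    For example, `kth_term(10, (1, 1, 1), (0, 0, 1), 1000)` walks the recurrence s_{n+3} = s_{n+2} + s_{n+1} + s_n from the start values 0, 0, 1 without computing the intermediate terms one by one.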

  • A New Hardware/Software Partitioning Algorithm for DSP Processor Cores with Two Types of Register Files

    Nozomu TOGAWA  Takashi SAKURAI  Masao YANAGISAWA  Tatsuo OHTSUKI  

    LETTER-Hardware/Software Codesign

    E84-A No:11

    This letter proposes a hardware/software partitioning algorithm for digital signal processor cores with two register files. Given a compiled assembly code and a timing constraint on execution time, the proposed algorithm generates a processor core configuration together with a new assembly code that runs on the generated processor core. The proposed algorithm considers two register files and determines the number of registers in each register file. Moreover, the algorithm considers two or more types of functional units for each arithmetic or logical operation and assigns functional units with small area to a processor core without causing a performance penalty. A generated processor core will have a small area compared with processor cores that have a single register file or those that consider only one type of functional unit for each operation. The experimental results demonstrate the effectiveness and efficiency of the proposed algorithm.

  • Register Constraint Analysis to Minimize Spill Code for Application Specific DSPs

    Tatsuo WATANABE  Nagisa ISHIURA  


    E84-A No:6

    This letter presents a method that attempts to minimize the amount of spill code needed to resolve usage conflicts of distributed registers in application specific DSPs. It searches for a set of ordering restrictions among operations that sequentializes, as much as possible, the lifetimes of the values residing in the same register. Experimental results show that the proposed analysis method reduces the number of register spills to 28%.

  • An Algorithm for Cryptanalysis of Certain Keystream Generators Suitable for High-Speed Software and Hardware Implementations

    Miodrag J. MIHALJEVIC  Marc P. C. FOSSORIER  Hideki IMAI  


    E84-A No:1

    An algorithm for cryptanalysis of certain keystream generators is proposed. The developed algorithm has two advantages over other reported ones: it is more powerful, and it can be implemented in high-speed software or in simple hardware suitable for highly parallel architectures. The algorithm is based on error-correction of information bits only (of the corresponding binary block code) with a novel method for construction of the parity-checks, and the employed error-correction procedure is an APP based threshold decoding. Experimental and theoretical analyses of the algorithm performance are presented, and its complexity is evaluated. The proposed algorithm is compared with recently proposed improved fast correlation attacks based on convolutional codes and turbo decoding. The underlying principles, performance and complexity are compared, and the gain obtained with the novel approach is pointed out.

  • Dynamic Fast Issue (DFI) Mechanism for Dynamic Scheduled Processors

    Abderazek BEN ABDALLAH  Mudar SAREM  Masahiro SOWA  

    PAPER-VLSI Architecture

    E83-A No:12

    Superscalar processors can achieve increased performance by issuing instructions Out-of-Order (OoO) from the original instruction stream. Implementing an OoO instruction scheme requires a hardware mechanism to prevent incorrectly executed instructions from updating register values. In addition, performance decreases if data dependencies, a branch, or a trap appear among instructions. To this end we propose a new mechanism, named the Dynamic Fast Issue (DFI) mechanism, to issue instructions in an OoO fashion to multiple parallel functional units without considerable hardware complexity. The above system, which will be implemented in our Superscalar Functional Assignments Register Microprocessor (FARM), solves data dependencies and supports precise interrupts and branch prediction, which are the main problems associated with the dynamic scheduling of instructions in superscalar machines. Results are written only once (write-once), directly into the register file (RF). To ensure that results are written in order in their appropriate output registers, a record of instruction order and state is maintained by a status buffer (STB). A 64-entry integrated register file is implemented to hold both renamed and logical registers. To recover the processor state from an interrupt or a branch misprediction, a status buffer (STB) and a recovery list table (RLT) are implemented. Novel aspects of the above system architecture, as well as the principle underlying this process and the constraints that must be met, are presented. Performance is evaluated with a full-pipelined-level architectural simulator and the SPECint95 benchmark programs.

  • A Hardware/Software Cosynthesis System for Digital Signal Processor Cores with Two Types of Register Files

    Nozomu TOGAWA  Masao YANAGISAWA  Tatsuo OHTSUKI  


    E83-A No:3

    In digital signal processing, the bit width of intermediate variables should be longer than that of input and output variables in order to execute intermediate operations with high precision. A processor core for digital signal processing is therefore required to have two types of register files, one used by input and output variables and the other used by intermediate variables. This paper proposes a hardware/software cosynthesis system for digital signal processor cores with two types of register files. Given an application program and its data, the system synthesizes a hardware description of a processor core, an object code running on the processor core, and software environments. A synthesized processor core can be composed of a processor kernel, multiple data memory buses, hardware loop units, addressing units, and multiple functional units. Furthermore, it can have two types of register files, RF1 and RF2. The bit width and number of registers in RF1 or RF2 are determined based on a given application program. Thus a synthesized processor core will have a small area while keeping high precision of intermediate operations, compared with a processor core with only one register file. The experimental results demonstrate the effectiveness of the proposed system.

  • Test Structure for Characterizing Capacitance Matrix of Multi-Layer Interconnects in VLSI

    Tetsuhisa MIDO  Hiroshi ITO  Kunihiro ASADA  


    E82-C No:4

    A compact new test structure using shift register circuits is proposed for extracting components of the capacitance matrix of multi-layer interconnections. An extraction method for the capacitance matrix is also presented. In a fabricated structure, the capacitance values obtained by measurement are in good agreement with the numerical calculation. We also show a method for estimating the measurement errors.

  • Effectiveness of a High Speed Context Switching Method Using Register Bank

    Jun-ichi ITO  Takumi NAKANO  Yoshinori TAKEUCHI  Masaharu IMAI  

    PAPER-LSI Architecture

    E81-A No:12

    This paper proposes a method to reduce the context switching time using a register bank to store the contexts of working tasks. Hardware cost and performance were measured by modeling the register bank and controller in VHDL. The following results were obtained: (1) The controller can be implemented with a much smaller hardware cost than that of the register bank, which is realized by an SRAM module. (2) Context switching time can be reduced to less than 50% of that of a software implementation. (3) Combining the proposed architecture with our previous work (RTOS implemented in HW) gives much higher performance for a hard real-time system.

  • Register-Transfer Level Testability Analysis and Its Application to Design for Testability

    Mizuki TAKAHASHI  Ryoji SAKURAI  Hiroaki NODA  Takashi KAMBE  


    E81-A No:12

    In this paper, we propose a new register transfer level (RT level) testability analysis method. Controllability and observability measures are defined for signal vectors based on the numbers of values they can take. The control part and the datapath part are automatically identified in the given RT level model, and distinct analysis methods are applied to each. We also describe a DFT point selection method based on our testability measures. In an experiment on a signal processing circuit whose gate count is 7690 including 578 FFs, almost the same fault coverage is achieved with fewer scan FFs than a conventional method based on gate level testability analysis.

  • Efficient Hybrid Allocation of Processor Registers for Compiling Telephone Call Control Programs

    Norio SATO  

    PAPER-Communication Software

    E81-B No:10

    An efficient hybrid scheme has been developed for optimizing register allocation applicable to CISC and RISC processors, which is crucial for maximizing their execution speed. Graph-coloring at the function level is combined with a powerful local register assigner. This assigner uses accurate program flows and access patterns of variables, and optimizes a wider local range, called an extended basic-block (EBB), than other optimizing compilers. The EBB is a set of basic-blocks that constitute a tree-shaped control flow, which is suitable for the large nested branches that frequently appear in embedded system-control programs, such as those for telephone call processing. The coloring at the function level involves only the live-ranges of program variables that span EBBs. The interference graph is therefore very small even for large functions, so it can be constructed quickly. Instead of iterative live-range splitting or spilling, the unallocated live-ranges are optimized by the EBB-based register assigner, so neither load/store insertion nor code motion is needed. This facilitates generating reliable code and debug symbols. The information provided for the EBB-based assigner facilitates the priority-based heuristics, fine-grained interference checking, and deferred coloring, all of which increase the colorability. Using a thread-support package for CHILL as a sample program, performance measurement showed that local variables are successfully located in registers, and the reduction of static cycles is about 20-30%. Further improvements include using double registers and improving debuggability.
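    To illustrate the function-level coloring step in general terms, here is a minimal sketch of greedy coloring of an interference graph. It is a generic textbook-style heuristic, not the paper's priority-based or deferred coloring; the function name `color_interference_graph`, the dictionary representation of the graph, and the degree-based ordering are assumptions. Live ranges that cannot be colored are simply returned, standing in for the ranges that the abstract says are handed to the EBB-based assigner.

```python
def color_interference_graph(graph, num_registers):
    """Greedily assign registers to live ranges.

    graph maps each live range to the set of live ranges it interferes with
    (hypothetical representation). Returns (assignment, unallocated).
    """
    assignment = {}
    unallocated = []
    # Visit the most constrained live ranges (highest degree) first.
    for node in sorted(graph, key=lambda v: len(graph[v]), reverse=True):
        used = {assignment[n] for n in graph[node] if n in assignment}
        free = [r for r in range(num_registers) if r not in used]
        if free:
            assignment[node] = free[0]
        else:
            # No spill code is inserted here; the range is left for a later pass.
            unallocated.append(node)
    return assignment, unallocated


# Usage: a triangle a-b-c plus d interfering only with a.
example = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
assignment, unallocated = color_interference_graph(example, num_registers=2)
```

    With two registers the triangle cannot be fully colored, so one of its live ranges ends up in `unallocated`; with three registers every live range receives a register.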

  • Parallel Architecture for Generalized LFSR in LSI Built-In Self Testing

    Tomoko K. MATSUSHIMA  Toshiyasu MATSUSHIMA  Shigeichi HIRASAWA  

    PAPER-Reliability and Fault Analysis

    E81-A No:6

    This paper presents a new architecture for multiple-input signature analyzers. The proposed signature analyzer with Hδ inputs is designed by parallelizing a GLFSR(δ,m), where δ is the number of input signals and m is the number of stages in the feedback shift register. The GLFSR, developed by Pradhan and Gupta, is a general framework for representing LFSR-based signature analyzers. The parallelization technique described in this paper can be applied to any kind of GLFSR signature analyzer, e.g., SISRs, MISRs, multiple MISRs and MLFSRs. It is shown that a proposed signature analyzer with Hδ inputs requires less complex hardware than either a single GLFSR(Hδ,m) or a parallel construction of H original GLFSR(δ,m)s. It is also shown that the proposed signature analyzer, while requiring simpler hardware, has an aliasing probability comparable to that of analyzers using conventional GLFSRs for some CUT error models with the same test response length and test time. The proposed technique would be practical for testing CUTs with a large number of output sequences, since the test circuit occupies a smaller area on the LSI chip than conventional multiple-input signature analyzers of comparable aliasing probability.
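    As background for the signature analyzers mentioned above, the following is a minimal sketch of a conventional binary multiple-input signature register (MISR): each clock, the register shifts with linear feedback and XORs in one word of circuit responses. It is not the GLFSR(δ,m) or the parallelized architecture of the paper; the function names, the bit-mask encoding of the feedback taps, and the particular shift direction are assumptions.

```python
def misr_step(state, inputs, taps, m):
    """Advance an m-bit MISR by one clock.

    taps is a bit mask of the stages fed back (illustrative encoding);
    inputs is the m-bit response word compacted on this clock.
    """
    mask = (1 << m) - 1
    feedback = bin(state & taps).count("1") & 1  # parity of the tapped stages
    shifted = ((state << 1) | feedback) & mask   # shift and insert feedback bit
    return shifted ^ (inputs & mask)             # fold in the response word


def misr_signature(responses, taps, m, seed=0):
    """Compact a sequence of response words into a final m-bit signature."""
    state = seed
    for word in responses:
        state = misr_step(state, word, taps, m)
    return state
```

    Because every step is linear over GF(2), with a zero seed the signature of a faulty response equals the fault-free signature XORed with the signature of the error sequence alone. Aliasing is the case in which a nonzero error sequence compacts to the all-zero signature.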

  • A 167-MHz 1-Mbit CMOS Synchronous Cache SRAM

    Hideharu YAHATA  Yoji NISHIO  Kunihiro KOMIYAJI  Hiroshi TOYOSHIMA  Atsushi HIRAISHI  Yoshitaka KINOSHITA  


    E80-C No:4

    A 167-MHz 1-Mbit CMOS synchronous cache SRAM was developed using 0.40-µm process technology. The floor plan was designed so that the address registers are located in the center of the chip, and high-speed circuits were developed, such as the quasi latch (QL) sense amplifier and the one-shot control (OSC) output register. To maintain suitable setup and hold time margins, an equivalent margin (EM) design method was developed. 167-MHz operation was measured at a supply voltage of 2.5 V and an ambient temperature of 75°C. Equal margins of 1.1 ns were measured for the setup time and the hold time against specifications of a 2.0 ns setup time and a 0.5 ns hold time.

  • A Partial Scan Design Approach based on Register-Transfer Level Testability Analysis

    Akira MOTOHARA  Sadami TAKEOKA  Mitsuyasu OHTA  Michiaki MURAOKA  

    PAPER-Design for Testability

    E79-D No:10

    An approach to design for testability using register-transfer level (RTL) partial scan selection is described. We define an RTL circuit model that enables efficient description in an electronic system design automation (ESDA) tool, and a testability analysis that leads to effective partial scan selection for RTL designs including data path circuits and control circuits such as state machines. We also introduce a method of partial scan selection at RTL that selects critical registers and state machines based on RTL testability analysis. DFT techniques using gate level testability measures have been studied, and it has been concluded that they are not successful in achieving high fault coverage [15]. However, we started this work for the following reasons: 1) In the sequential ATPG procedure, more than two memory elements belonging to a functional unit such as a register or a state machine are often required to be justified at a time. At RTL, state machines and registers are explicitly described and recognized as functional units, while gate level memory elements are scattered over the circuit. 2) As discussed in [6], if the circuit is modified so that the test sequence that causes the state transition between the initial and final states of sequential ATPG can be easily obtained, ATPG results can also be improved. Complex state machines can be identified at RTL. According to the experimental results, our gate level DFT achieves high fault coverage comparable with the most successful previously published DFT methods, and DFT at RTL resulted in higher fault coverage than gate level DFT in much shorter CPU time.

  • Data Bypassing Register File for Low Power Microprocessor

    Makoto IKEDA  Kunihiro ASADA  

    LETTER-Integrated Electronics

    E78-C No:10

    In this paper, we propose a register file with a data bypassing function. This register file bypasses data using data bypassing units instead of functional units when an actual operation in a functional unit such as the ALU is unnecessary. Applying this method to a general purpose microprocessor with benchmark programs, we demonstrate a 50% reduction in power consumption in functional units. Though the length of the bus lines increases a little due to the additional hardware in the register file, buses are not driven when data is bypassed, so power consumption in bus lines is also reduced by 40% compared with the conventional architecture.

  • Design of a 3.3 V Single Power-Supply 64 Mbit Flash Memory with Dynamic Bit-Line Latch (DBL) Programming Scheme

    Hiroshi SUGAWARA  Toshio TAKESHIMA  Hiroshi TAKADA  Yoshiaki S. HISAMUNE  Kohji KANAMORI  Takeshi OKAZAWA  Tatsunori MUROTANI  Isao SASAKI  


    E78-C No:7

    A 3.3 V single power-supply 64 Mb flash memory with a DBL programming scheme has been developed and fabricated with 0.4 µm CMOS technology. 50 ns access time and 256 b erase/programming unit-capacity have been achieved by using hierarchical word- and bit-line structures and the DBL programming scheme. Furthermore, the HiCR cell is used in order to lower the operating voltage. The chip size is 19.3 mm × 13.3 mm.

  • Multibit-Parallel Scrambling Techniques for Distributed Sample Scrambling

    Seok Chang KIM  Byeong Gi LEE  

    PAPER-Communication Device and Circuit

    E78-B No:7

    In this paper, we develop parallel scrambling techniques for the distributed sample scrambling (DSS), which are directly applicable to the bit- and multibit-interleaved multiplexing environments. We first consider how to realize PSRGs, parallel samplings and parallel corrections for the multibit-parallel DSS (MPDSS), which are the fundamental problems in realizing the MPDSS scramblers and descramblers. The results are summarized in three sets of theorems, and a corollary is attached to each theorem to specifically handle the case of the parallel DSS (PDSS). The theorems and corollaries are supported by examples that demonstrate the relevant MPDSS scramblers and descramblers.

  • A realization of an arbitrary BPC Permutation in Hypercube Connected Computer Networks

    Hiroshi MASUYARA  Yuichiro MORITA  Etsuko MASUYAMA  

    PAPER-Computer Networks

    E78-D No:4

    A multiple instruction stream-multiple data stream (MIMD) computer is a parallel computer consisting of a large number of identical processing elements. The essential feature that distinguishes one MIMD computer family from another is the interconnection network. In this paper, we are concerned with a representative type of interconnection network: the hypercube connected network. A family of regular graphs is presented as a possible candidate for the implementation of a distributed system and for fault-tolerant architectures. The symmetry of the graphs makes it possible to determine message routing by using a simple distributed algorithm. A candidate having the same property is the hypercube connected network. Arbitrary data permutations are generally accomplished by sorting. For certain classes of permutations, however, that is, for many frequently used permutations in parallel processing such as bit reversal, bit shuffle, bit complement, matrix transpose, butterfly permutations used in FFT algorithms, and segment shuffles, there exist algorithms that are more efficient than the best sorting algorithm. One such class is the bit permute complement (BPC) class of permutations. In this paper, we first develop an algorithm to realize an arbitrary BPC permutation in hypercube connected networks. The developed algorithm requires only 1 token memory register in each node. We next evaluate the ability to realize BPC permutations in these networks of an arbitrary size by estimating the number of required routing steps.
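    To make the BPC class concrete, the sketch below computes the destination address of a data item under a bit permute complement permutation: the bits of the source address are rearranged by a fixed bit permutation, and selected bits are complemented. It only defines the mapping, not the routing algorithm developed in the paper; the function name `bpc_destination` and the list encodings `perm` and `complement` are assumptions, and conventions differ on whether the complement is applied before or after the bit move.

```python
def bpc_destination(index, perm, complement, n):
    """Destination of `index` under a BPC permutation on 2**n addresses.

    perm[j] is the position that source bit j moves to; complement[j] says
    whether source bit j is inverted first (illustrative encoding).
    """
    dest = 0
    for j in range(n):
        bit = (index >> j) & 1
        if complement[j]:
            bit ^= 1
        dest |= bit << perm[j]
    return dest


n = 3
# Bit reversal: bit j moves to position n-1-j, nothing is complemented.
bit_reversal = [bpc_destination(i, [2, 1, 0], [False] * n, n) for i in range(2 ** n)]
# Bit complement: every bit stays in place and is inverted.
bit_complement = [bpc_destination(i, [0, 1, 2], [True] * n, n) for i in range(2 ** n)]
```

    In an n-dimensional hypercube each node address has n bits, so a mapping of this kind tells every node where its data item must end up.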

  • Register-Transfer Module Selection for Sub-Micron ASIC Design

    Vasily G. MOSHNYAGA  Yutaka MORI  Keikichi TAMARU  


    E78-D No:3

    In order to shorten the time-to-market, Application-Specific Integrated Circuits (ASICs) are designed from a library of pre-defined layout implementations for register-transfer modules such as multipliers, adders, RAM, ROM, etc. Current approaches to selecting the implementations from the library usually deal with their timing-area estimates and do not consider the delay of the intermodule wiring. However, as sub-micron design rules are utilized for IC fabrication, wiring delay becomes comparable to the functional unit delay and can no longer be ignored even in register-transfer synthesis. In this paper we propose an algorithm that combines module selection with performance-driven module placement and reduces the impact of wiring on sub-micron ASIC performance. The algorithm not only efficiently exploits multiple module realizations in the design library, but also finds the module placement that minimizes wiring delay. Experimental results on several benchmarks show that, by considering both module and wiring issues, a reduction of more than 30% in the total circuit delay can be achieved.

  • A 180 MHz Multiple-Registered 16 Mbit SDRAM with Flexible Timing Scheme

    Hisashi IWAMOTO  Naoya WATANABE  Akira YAMAZAKI  Seiji SAWADA  Yasumitsu MURAI  Yasuhiro KONISHI  Hiroshi ITOH  Masaki KUMANOYA  


    E77-C No:8

    A multiple-registered architecture is described for a 180 MHz 16 Mbit synchronous DRAM. The proposed architecture realizes flexible control of critical timings such as I/O line busy time and achieves operation at a 180 MHz clock rate with an area penalty of only 5.4% over the conventional DRAM.

  • A Verification Method via Invariant for Communication Protocols Modeled as Extended Communicating Finite-State Machines

    Masahiro HIGUCHI  Osamu SHIRAKAWA  Hiroyuki SEKI  Mamoru FUJII  Tadao KASAMI  

    PAPER-Signaling System and Communication Protocol

    E76-B No:11

    This paper presents a method for verifying a safety property of a communication protocol modeled as two extended communicating finite-state machines with two unbounded FIFO channels connecting them. In this method, four types of atomic formulae specifying a condition on a machine and a condition on a sequence of messages in a channel are introduced. A human verifier describes a logical formula that expresses conditions expected to be satisfied by all reachable global states, and a verification system proves by induction that the formula is indeed satisfied by such states (i.e., the formula is an invariant). If the invariant is never satisfied in any unsafe state, it can be concluded that the protocol is safe. To show the effectiveness of this method, a sample protocol extracted from the data transfer phase of the OSI session protocol was verified by using the verification system.
