Chik-How TAN Xun YI Chee-Kheong SIEW
In this paper, we propose a new digital signature scheme based on a third order linear feedback shift register for signing documents. This signature scheme is different from most of the signature schemes that are based on discrete logarithm problem, elliptic curves discrete logarithm problem, RSA or quadratic residues. An efficient algorithm for computing kth term of a sequence is also presented. The advantage of this scheme is that the computation is efficient than Schnorr scheme. We also show that the security of the proposed signature scheme is equivalent to that of Schnorr signature scheme.
Nozomu TOGAWA Takashi SAKURAI Masao YANAGISAWA Tatsuo OHTSUKI
This letter proposes a hardware/software partitioning algorithm for digital signal processor cores with two register files. Given a compiled assembly code and a timing constraint of execution time, the proposed algorithm generates a processor core configuration with a new assembly code running on the generated processor core. The proposed algorithm considers two register files and determines the number of registers in each of register files. Moreover the algorithm considers two or more types of functional units for each arithmetic or logical operation and assigns functional units with small area to a processor core without causing performance penalty. A generated processor core will have small area compared with processor cores which have a single register file or those which consider only one type of functional units for each operation. The experimental results demonstrate the effectiveness and efficiency of the proposed algorithm.
Tatsuo WATANABE Nagisa ISHIURA
This letter presents a method which attempts to minimize the number of spill codes to resolve usage conflicts of distributed registers in application specific DSPs. It searches for a set of ordering restrictions among operations which sequentialize the lifetimes of the values residing in the same register as much as possible. Experimental results show that the proposed analysis method reduces the number of register spills into 28%.
Miodrag J. MIHALJEVIC Marc P. C. FOSSORIER Hideki IMAI
An algorithm for cryptanalysis of certain keystream generators is proposed. The developed algorithm has the following two advantages over other reported ones: it is more powerful, and it can be implemented by a high-speed software or a simple hardware suitable for high parallel architectures. The algorithm is based on error-correction of information bits only (of the corresponding binary block code) with a novel method for construction of the parity-checks, and the employed error-correction procedure is an APP based threshold decoding. Experimental and theoretical analyses of the algorithm performance are presented, and its complexity is evaluated. The proposed algorithm is compared with recently proposed improved fast correlation attacks based on convolutional codes and turbo decoding. The underlying principles, performance and complexity are compared, and the gain obtained with the novel approach is pointed out.
Abderazek BEN ABDALLAH Mudar SAREM Masahiro SOWA
Superscalar processors can achieve increased performance by issuing instructions Out-of-Order (OoO) from the original instruction stream. Implementing an OoO instruction scheme requires a hardware mechanism to prevent incorrectly executed instructions from updating registers values. In addition, performance decreases if data dependencies, a branch or a trap among instructions appears. To this end we propose a new mechanism named Dynamic Fast Issue (DFI) mechanism to issue instructions in an OoO fashion to multiple parallel functional units without considerable hardware complexity. The above system, which will be implemented in our Superscalar Functional Assignments Register Microprocessor(FARM), solves data dependencies, supports precise interrupt and branch prediction, which are the main problems associated with the dynamic scheduling of instructions in superscalar machines. Results are written only once,Write-once, directly into the register file (RF). To ensure that results are written in order in their appropriate output registers, a record of instruction order and state is maintained by a status buffer (STB). A 64 entries integrated register file is implemented to hold both renamed and logical registers. To recover the processor state from an interrupt or a branch miss-prediction, a status buffer (STB) and a recovery list table (RLT) are implemented. Novel aspects of the above system architecture as well as the principle underlying this process and the constraints that must be met is presented. Performance evaluation results are performed through full-pipelined-level architectural simulator and SPECint95 benchmark programs.
Nozomu TOGAWA Masao YANAGISAWA Tatsuo OHTSUKI
In digital signal processing, bit width of intermediate variables should be longer than that of input and output variables in order to execute intermediate operations with high precision. Then a processor core for digital signal processing is required to have two types of register files, one of which is used by input and output variables and the other one is used by intermediate variables. This paper proposes a hardware/software cosynthesis system for digital signal processor cores with two types of register files. Given an application program and its data, the system synthesizes a hardware description of a processor core, an object code running on the processor core, and software environments. A synthesized processor core can be composed of a processor kernel, multiple data memory buses, hardware loop units, addressing units, and multiple functional units. Furthermore it can have two types of register files RF1 and RF2. The bit width and number of registers in RF1 or RF2 will be determined based on a given application program. Thus a synthesized processor core will have small area with keeping high precision of intermediate operations compared with a processor core with only one register file. The experimental results demonstrate the effectiveness of the proposed system.
Tetsuhisa MIDO Hiroshi ITO Kunihiro ASADA
A compact new test structure using shift register circuits for extracting components of the capacitance matrix of the multi-layer interconnections has been proposed. An extraction method of the capacitance matrix is also presented. As a result of fabrication, capacitance values obtained by measurement are in good agreement with the numerical calculation. We also showed an estimation method of the measurement errors.
Jun-ichi ITO Takumi NAKANO Yoshinori TAKEUCHI Masaharu IMAI
This paper proposes a method to reduce the context switching time using a register bank to store contexts of working tasks. Hardware cost and performance were measured by modeling the register bank and controller in VHDL. Following results were obtained: (1) The controller can be implemented with a much smaller amount of hardware cost compared to that of the register bank, which is realized by SRAM module. (2) Context switching time can be reduced to less than 50% compared to that by software implementation. (3) Combination of the proposed architecture with our previous work (RTOS implemented in HW) gives us much higher performance of a hard real-time system.
Mizuki TAKAHASHI Ryoji SAKURAI Hiroaki NODA Takashi KAMBE
In this paper, we propose a new register transfer level (RT level) testability analysis method. Controllability and observability measures are defined for signal vectors based on the numbers of values they can take. The control part and the datapath part are automatically identified in the given RT level model and distinctive analysis methods are applied. We also describe a DFT point selection method based on our testability measures. In a experiment on a signal processing circuit whose gate count is 7690 including 578 FFs, almost the same fault coverage is achieved with fewer scan FFs than a conventional method based on gate level testability analysis.
An efficient hybrid scheme has been developed for optimizing register allocation applicable to CISC and RISC processors, which is crucial for maximizing their execution speed. Graph-coloring at the function level is combined with a powerful local register assigner. This assigner uses accurate program flows and access patterns of variables, and optimizes a wider local range, called an extended basic-block (EBB), than other optimizing compilers. The EBB is a set of basic-blocks that constitute a tree-shaped control flow, which is suitable for the large nested branches that frequently appear in embedded system-control programs, such as those for telephone call processing. The coloring at the function level involves only the live-ranges of program variables that span EBBs. The interference graph is therefore very small even for large functions, so it can be constructed quickly. Instead of iterative live-range splitting or spilling, the unallocated live-ranges are optimized by the EBB-based register assigner, so neither load/store insertion nor code motion is needed. This facilitates generating reliable code and debug symbols. The information provided for the EBB-based assigner facilitates the priority-based heuristics, fine-grained interference checking, and deferred coloring, all of which increase the colorability. Using a thread-support package for CHILL as a sample program, performance measurement showed that local variables are successfully located in registers, and the reduction of static cycles is about 20-30%. Further improvements include using double registers and improving debuggability.
Tomoko K. MATSUSHIMA Toshiyasu MATSUSHIMA Shigeichi HIRASAWA
This paper presents a new architecture for multiple-input signature analyzers. The proposed signature analyzer with Hδ inputs is designed by parallelizing a GLFSR(δ,m), where δ is the number of input signals and m is the number of stages in the feedback shift register. The GLFSR, developed by Pradhan and Gupta, is a general framework for representing LFSR-based signature analyzers. The parallelization technique described in this paper can be applied to any kind of GLFSR signature analyzer, e. g. , SISRs, MISRs, multiple MISRs and MLFSRs. It is shown that a proposed signature analyzer with Hδ inputs requires less complex hardware than either single GLFSR(Hδ,m)s or a parallel construction of the H original GLFSR(δ,m)s. It is also shown that the proposed signature analyzer, while requiring simpler hardware, has comparable aliasing probability with analyzers using conventional GLFSRs for some CUT error models of the same test response length and test time. The proposed technique would be practical for testing CUTs with a large number of output sequences, since the test circuit occupies a smaller area on the LSI chip than the conventional multiple-input signature analyzers of comparable aliasing probability.
Hideharu YAHATA Yoji NISHIO Kunihiro KOMIYAJI Hiroshi TOYOSHIMA Atsushi HIRAISHI Yoshitaka KINOSHITA
A 167-MHz 1-Mbit CMOS synchronous cache SRAM was developed using 0.40-µm process technology. The floor plan was designed so that the address registers are located in the center of the chip, and high-speed circuits were developed such as the quasi latch (QL) sense amplifier and the one-shot control (OSC) output register. To maintain suitable setup and hold time margins, an equivalent margin (EM) design method was developed. 167-MHz operation was measured at a supply voltage of 2.5 V and an ambient temperature of 75. The same margins 1.1 ns of the setup time and hold time were measured for the specifications of a setup time of 2.0 ns and a hold time of 0.5 ns.
Akira MOTOHARA Sadami TAKEOKA Mitsuyasu OHTA Michiaki MURAOKA
An approach to design for testability using register-transfer level (RTL) partial scan selection is described. We define an RTL circuit model which enables efficient description in an electronic system design automation (ESDA) tool and testability analysis which leads to effective partial scan selection for RTL design including data path circuits and control circuits such as state machines. We also introduced a method of partial scan selection at RTL which selects critical registers and state machines based on RTL testability analysis. DFT techniques using gate level testability measures have been studied and concluded that they are not successful in achieving high fault coverage [15]. However, we started this work for the following reasons, 1) In sequential ATPG procedure, more than two memory elements belonging to a functional units such as registers and state machines are often required to be justified at a time. At RTL, state machines and registers are explicitly described and recognized as functional units while gate level memory elements are scattered over the circuit. 2) As discussed in [6], if the circuit is modified so that the test sequence which causes state transition between initial and final states of sequential ATPG can be easily obtained, ATPG results can be also improved. Complex state machines can be identified at RTL. According to the experimental results, our gate level DFT achieves high fault coverage comparable with the previously published most successful DFT methods, and DFT at RTL resulted in higher fault coverage than gate level DFT at much shorter CPU time.
In this paper, we propose a register file with data bypassing function. This register file bypasses data using data bypassing units instead of functional units when actual operation in functional units such as ALU is unnecessary. Applying this method to a general purpose microprocessor with benchmark programs, we demonstrate 50% power consumption reduction in functional units. Though length of bus lines increases a little due to an additional hardware in register file, as buses are not driven when data is bypassed, power consumption in bus lines is also reduced by 40% compared with the conventional architecture.
Hiroshi SUGAWARA Toshio TAKESHIMA Hiroshi TAKADA Yoshiaki S. HISAMUNE Kohji KANAMORI Takeshi OKAZAWA Tatsunori MUROTANI Isao SASAKI
A 3.3 V single power-supply 64 Mb flash memory with a DBL programming scheme has been developed and fabricated with 0.4 µm CMOS technology. 50 ns access time and 256 b erase/programming unit-capacity have been achieved by using hierarchical word- and bit-line structures and DBL programming scheme. Furthermore in order to lower operating voltage the HiCR cell is used. The chip size is 19.3 mm13.3 mm.
In this paper, we develop parallel scrambling techniques for the distributed sample scrambling (DSS), which are directly applicable to the bit- and multibit-interleaved multiplexing environments. We first consider how to realize PSRGs, parallel samplings and parallel corrections for the multibit-parallel DSS (MPDSS), which are the fundamental problems in realizing the MPDSS scramblers and descramblers. The results are summarized in three sets of theorems, and a corollary is attached to each theorem to specifically handle the case of the parallel DSS (PDSS). The theorems and corollaries are supported by examples that demonstrate the relevant MPDSS scramblers and descramblers.
Hiroshi MASUYARA Yuichiro MORITA Etsuko MASUYAMA
A multiple instruction stream-multiple data stream (MIMD) computer is a parallel computer consisting of a large number of identical processing elements. The essential feature that distinguishes one MIMD computer family from another is the interconnection network. In this paper, we are concerned with a representative type of interconnection networks: the hypercube connected network. A family of regular graphs is presented as a possible candidate for the implementation of a distributed system and for fault-tolerant architectures. The symmetry of graphs makes it possible to determine message routing by using a simple distributed algorithm. A candidate having the same property is the hypercube connected network. Arbitrary data permutations are generally accomplished by sorting. For certain classes of permutations, however, this is, for many frequently used permutations in parallel processing such as bit reversal, bit shuffle, bit complement, matrix transpose, butterfly permutations used in FFT algorithms, and segment shuffles, there exist algorithms that are more efficient than the best sorting algorithm. One such class is the bit permute complement (BPC) class of permutations. In this paper, we, first, develop an algorithm to realize an arbitrary BPC permutation in hypercube connected networks. The developed algorithm in hypercube connected networks requires only 1 token memory register in each node. We next evaluate the ability to realize BPC permutations in these networks of an arbitrary size by estimating the number of required routing steps.
Vasily G. MOSHNYAGA Yutaka MORI Keikichi TAMARU
In order to shorten the time-to-market, Application-Specific Integrated Circuits (ASIC's) are designed from a library of pre-defined layout implementations for register-transfer modules such as multipliers, adders, RAM, ROM, etc. Current approaches to selecting the implementations from the library usually deal with their timing-area estimates and do not consider delay of the intermodule wiring. However, as sub-micron design rules are utilized for IC fabrication, wiring delay becomes comparable to the functional unit delay and can not longer be ignored even in register-transfer synthesis. In this paper we propose an algorithm that combines module selection with Performance-Driven module placement and reduces an impact of wiring on sub-micron ASIC performance. The algorithm not only efficiently exploits multiple module realizations in the design library, but also finds the module placement which minimizes wiring delay. Experimental results on several benchmarks show that considering both module and wiring issues, more than 30% reduction of the total circuit delay can be achieved.
Hisashi IWAMOTO Naoya WATANABE Akira YAMAZAKI Seiji SAWADA Yasumitsu MURAI Yasuhiro KONISHI Hiroshi ITOH Masaki KUMANOYA
A multiple-registered architecture is described for 180 MHz 16 Mbit synchronous DRAM. The proposed architecture realizes a flexible control of critical timings such as I/O line busy time and achieves an operation at 180 MHz clock rate with area penalty of only 5.4% over the conventional DRAM.
Masahiro HIGUCHI Osamu SHIRAKAWA Hiroyuki SEKI Mamoru FUJII Tadao KASAMI
This paper presents a method for verifying safety property of a communication protocol modeled as two extended communicating finite-state machines with two unbounded FIFO channels connecting them. In this method, four types of atomic formulae specifying a condition on a machine and a condition on a sequence of messages in a channel are introduced. A human verifier describes a logical formula which expresses conditions expected to be satisfied by all reachable global states, and a verification system proves that the formula is indeed satisfied by such states (i.e. the formula is an invariant) by induction. If the invariant is never satisfied in any unsafe state, it can be concluded that the protocol it safe. To show the effectiveness of this method, a sample protocol extracted from the data transfer phase of the OSI session protocol was verified by using the verification system.