1-16hit |
Shinichi KAWAMURA Yuichi KOMANO Hideo SHIMIZU Saki OSUKA Daisuke FUJIMOTO Yuichi HAYASHI Kentaro IMAFUKU
The residue number system (RNS) is a method for representing an integer x as an n-tuple of its residues with respect to a given set of moduli. In RNS, addition, subtraction, and multiplication can be carried out by independent operations with respect to each modulus. Therefore, an n-fold speedup can be achieved by parallel processing. The main disadvantage of RNS is that we cannot efficiently compare the magnitude of two integers or determine the sign of an integer. Two general methods of comparison are to transform a number in RNS to a mixed-radix system or to a radix representation using the Chinese remainder theorem (CRT). We used the CRT to derive an equation approximating a value of x relative to M, the product of moduli. Then, we propose two algorithms that efficiently evaluate the equation and output a sign bit. The expected number of steps of these algorithms is of order n. The algorithms use a lookup table that is (n+3) times as large as M, which is reasonably small for most applications including cryptography.
Ming-Hwa SHEU Yuan-Ching KUO Su-Hon LIN Siang-Min SIAO
This paper presents a novel adaptable 4-moduli set {2n + k, 2n+1, 2n-1, 22n+1}. It offers diverse dynamic ranges (DRs) from 25n-2n to 25n + k-2n + k that are used to conquer the over-range issue in RNS-application hardware designs. The proposed adaptable set possesses the coarse parameter n and fine parameter k. It not only has better parallelism and larger dynamic range (DR) than the existing adaptive 3-moduli sets, but also holds more sizable and flexible than the general 4-moduli sets with single parameter. For the adaptable R-to-B conversion, this paper first derives a fast reverse converting algorithm based on Chinese Remainder Theorem (CRT) and then presents the efficient converter architecture. From the experimental results, the proposed adaptable converter achieves better hardware performance in various DRs. Based on TSMC 0.18 µm CMOS technology, the proposed converter design is implemented and its results get at least 20.93% saving of Area-Delay-Power (ADP) products on average when comparing with the latest converter works.
Based on a reverse converter algorithm derived from the New Chinese Remainder Theorem I, an algorithm for sign detection of RNS {2n-1, 2n, 2n+1} is presented in this paper. The hardware of proposed algorithm can be implemented using two n-bit additions and one (n+1)-bit comparator. Comparing with the previous paper, the proposed algorithm has reduced the number of additions used in the circuit. The experimental results show that the proposed circuit achieves 17.3% savings in area for small moduli and 10.5% savings in area for large moduli on an average, with almost the same speed. The power dissipations obtain 12.6% savings in average.
Keivan NAVI Mohammad ESMAEILDOUST Amir SABBAGH MOLAHOSSEINI
This paper presents a general architecture for designing efficient reverse converters based on the moduli set {2α, 22β+1-1, 2β-1}, where β < α ≤ 2β, by using a parallel implementation of mixed-radix conversion (MRC) algorithm. The moduli set {2α, 22β+1-1, 2β-1} is free from modulo (2k+1)-type which can result in an efficient arithmetic unit for residue number system (RNS). The values of α and β can be selected to provide the required dynamic range (DR) and also to adjust the desired equilibrium between moduli bit-width. The simple multiplicative inverses of the proposed moduli set and also using novel techniques to simplify conversion equations lead to a low-complexity and high-performance general reverse converter architecture that can be used to support different DRs. Moreover, due to the current importance of the 5n-bit DR moduli sets, we also introduced the moduli set {22n, 22n+1-1, 2n-1} which is a special case of the general set {2α, 22β+1-1, 2β-1}, where α=2n and β=n. The converter for this special set is derived from the presented general architecture with higher speed than the fastest state-of-the-art reverse converter which has been designed for the 5n-bit DR moduli set {22n, 22n+1-1, 2n-1}. Furthermore, theoretical and FPGA implementation results show that the proposed reverse converter for moduli set {22n, 22n+1-1, 2n-1} results in considerable improvement in conversion delay with less hardware requirements compared to other works with similar DR.
Amir Sabbagh MOLAHOSSEINI Chitra DADKHAH Keivan NAVI Mohammad ESHGHI
In this paper, the new residue number system (RNS) moduli sets {22n, 2n -1, 2n+1 -1} and {22n, 2n -1, 2n-1 -1} are introduced. These moduli sets have 4n-bit dynamic range and well-formed moduli which can result in high-performance residue to binary converters as well as efficient RNS arithmetic unit. Next, efficient residue to binary converters for the proposed moduli sets based on mixed-radix conversion (MRC) algorithm are presented. The converters are ROM-free and they are realized using carry-save adders and modulo adders. Comparison with the other residue to binary converters for 4n-bit dynamic range moduli sets shown that the presented designs based on new moduli sets {22n, 2n -1, 2n+1 -1} and {22n, 2n -1, 2n-1 -1} are improved the conversion delay and result in hardware savings. Also, the proposed moduli sets can lead to efficient binary to residue converters, and they can speed-up internal RNS arithmetic processing, compared with the other 4n-bit dynamic range moduli sets.
Su-Hon LIN Ming-Hwa SHEU Chao-Hsiang WANG
The moduli set (2n, 2n+1-1, 2n-1) which is free of (2n+1)-type modulus is profitable to construct a high-performance residue number system (RNS). In this paper, we derive a reduced-complexity residue-to-binary conversion algorithm for the moduli set (2n, 2n+1-1, 2n-1) by using New Chinese Remainder Theorem (CRT). The resulting converter architecture mainly consists of simple adder and multiplexer (MUX) which is suitable to realize an efficient VLSI implementation. For the various dynamic range (DR) requirements, the experimental results show that the proposed converter can significantly achieve at least 23.3% average Area-Time (AT) saving when comparing with the latest designs. Based on UMC 0.18 µm CMOS cell-based technology, the chip area for 16-bit residue-to-binary converter is 931931 µm2 and its working frequency is about 135 MHz including I/O pad.
A new Hybrid-Carry-Selection (HCS) approach for deriving an efficient modulo 2n-1 addition is presented in this study. Its resulting adder architecture is simple and applicable for all n values. Based on 180-nm CMOS technology, the HCS-based modulo 2n-1 adder demonstrates its superiority in Area-Time (AT) performance over existing solutions.
Fast and simple algorithm of a parity checker for a large residue numbers is presented. A new set of RNS moduli with 2r-(2l1) form for fast modular multiplication is proposed. The proposed RNS moduli has a large dynamic range for a large RNS number. The parity of a residue number can be checked by the Chinese remainder theorem (CRT). A CRT-based parity checker is simply organized by the Montgomery reduction method (MRM), implemented by using multipliers and the carry-save adder array. We present a fast parity checker with minimal hardware processed in three clock cycles for 32-bit RNS modulus set.
Hanae NOZAKI Atsushi SHIMBO Shinichi KAWAMURA
This paper proposes a new algorithm to achieve about two-times speedup of modular exponentiation which is implemented by Montgomery multiplication based on Residue Number Systems (RNS). In RNS Montgomery multiplication, its performance is determined by two base transformations dominantly. For the purpose of realizing parallel processing of these base transformations, i. e. "duplicate processing," we present two procedures of RNS Montgomery multiplication, in which RNS bases a and b are interchanged, and perform them alternately in modular exponentiation iteration. In an investigation of implementation, 1.87-times speedup has been obtained for 1024-bit modular multiplication. The proposed RNS Montgomery multiplication algorithm has an advantage in achieving the performance corresponding to that the upper limit of the number of parallel processing units is doubled.
Algorithms are presented for the four elementary arithmetic operations, to perform reliable floating-point arithmetic operations. These arithmetic operations can be achieved by applying residue techniques to the weighted number systems and performed with no accuracy lost in the process of the computing. The arithmetic operations presented can be used as elementary tools (on many existing architectures) to ensure the reliability of numerical computations. Simulation results especially for the solutions of ill-conditioned problems are given with emphasis on the practical usability of the tools.
Makoto SYUTO Eriko SATAKE Koichi TANNO Okihiko ISHIZUKA
In this letter, we propose high-speed binary to residue converters for moduli 2n, 2n 1 without using look-up table. For integration of residue arithmetic circuit using a signed-digit (SD) number representation with ordinary binary system, the proposed circuits carry out the efficient conversion. Using SD adders instead of ordinary adders that are used in conventional binary to residue converter, the high-speed conversion without the carry propagation can be achieved. Thus, the proposed converter is independent of the size of modulus and can speed up the binary to residue conversion. On the simulation, the conversion delay times are 1.78 ns for modulus 210-1 and 1.73 ns for modulus 210+1 under the condition of 0.6 µm CMOS technology, respectively. The active area of the proposed converter for moduli 210 1 is 335 µm325 µm.
A novel residue arithmetic algorithm using radix-2 signed-digit (SD) number representation is presented. By this representation, memoryless residue arithmetic circuits using SD adders can be implemented. Conventional residue arithmetic circuits have been designed using binary number arithmetic system, but the carry propagation arises which limits the speed of arithmetic operations in residue modules. In this paper, a p-digit radix-2 SD number system is introduced to simplify the residue operation. For a modulus m, 2p-1 m 2p+2p-1-1, in a residue number system (RNS), the modulo m addition is performed by using two p-digit SD adders, one for the addition and one for the residue operation. Thus, the modulo m addition time is independent of the word length of operands. When m=2p or m= 2p 1, the modulo m addition is implemented by using only one SD adder. Moreover, a modulo m multiplier is constructed using a binary modulo m SD adder tree, and the modulo m multiplication can be performed in a time proportional to log 2 p. The VHDL implementation method for the presented algorithm is also discussed. The design and simulation results of some residue arithmetic circuits show that high speed residue arithmetic circuits can be obtained by the presented algorithms.
A compact residue arithmetic multiplier based on the radix-4 signed-digit arithmetic is presented. Conventional residue arithmetic circuits have been designed using binary number arithmetic system, but the carry propagation arises which limits the speed of arithmetic operations in residue modules. In this paper, two radix-4 signed-digit (SD) number representations, {-2,-1,0,1,2} and {-3,-2,-1,0,1,2,3}, are introduced. The former is used for the input and output, and the later for the inner arithmetic circuit of the presented multiplier. Integers 4p and 4p 1 are used as moduli of residue number system (RNS), where p is a positive integer and both circuits for partial product generation and sum of the partial products can be efficiently constructed by using the multiple-valued current-mode circuits. The modulo m addition, m=4p or m=4p 1, can be performed by an SD adder or an end-around-carry SD adder with the multiple-valued circuits and the addition time is independent of the word length of operands. The modulo m multiplier can be compactly constructed using a binary tree of the multiple-valued modulo m SD adders, and consequently the modulo m multiplication is performed in O(log p) time. The number of MOS transistors required in the presented residue arithmetic multiplier is about 86p2 + 66p.
To realize high-speed computations in a residue number system (RNS), an implementation method for residue arithmetic circuits using signed-digit (SD) number representation is proposed. Integers mp = (2p-1) known as Mersenne numbers are used as moduli, so that modulo mp addition can be performed by an end-around-carry SD adder and the addition time is independent of the word length of operands. Using a binary modulo mp SD adder tree, the modulo mp multiplication can be performed in a time proportional to log2p.
Makoto HONDA Michitaka KAMEYAMA Tatsuo HIGUCHI
The demand for high-speed image processing is obvious in many real-world computations such as robot vision. Not only high throughput but also small latency becomes an important factor of the performance, because of the requirement of frequent visual feedback. In this paper, a high-performance VLSI image processor based on the multiple-valued residue arithmetic circuit is proposed for such applications. Parallelism is hierarchically used to realize the high-performance VLSI image processor. First, spatially parallel architecture that is different from pipeline architecture is considered to reduce the latency. Secondly, residue number arithmetic is introduced. In the residue number arithmetic, data communication between the mod mi arithmetic units is not necessary, so that multiple mod mi arithmetic units can be completely separated to different chips. Therefore, a number of mod mi multiply adders can be implemented on a single VLSI chip based on the modulus-slice concept. Finally, each mod mi arithmetic unit can be effectively implemented in parallel structure using the concept of a pseudoprimitive root and the multiple-valued current-mode circuit technology. Thus, it is made clear that the throughout use of parallelism makes the latency 1/3 in comparison with the ordinary binary implementation.
Satoshi MIKI Hiroshi MIYANAGA Hironori YAMAUCHI
This paper presents a method for LSI implementation of a long-tap acoustic echo canceller algorithm using the residue number system (RNS) and the mixed-radix number system (MRS). It also presents a quantitative comparison of echo canceller architectures, one using the RNS and the other using the binary number system (BNS). In the RNS, addition, subtraction, and multiplication are executed quickly but scaling, overflow detection, and division are difficult. For this reason, no echo canceller using the RNS has been implemented. We therefore try to design an echo canceller architecture using the RNS and the NLMS algorithm. It is shown that the echo canceller algorithm can be effectively implemented using the RNS by introducing the MRS. The quantitative comparison of echo canceller architectures shows that a long-tap acoustic echo canceller can be implemented more effectively in terms of chip size and power dissipation by the architecture using the RNS.