The search functionality is under construction.
The search functionality is under construction.

Keyword Search Result

[Keyword] computer arithmetic(16hit)

1-16hit
  • Low-Latency Low-Cost Architecture for Square and Cube Roots

    Jihyuck JO  In-Cheol PARK  

     
    PAPER-Digital Signal Processing

      Vol:
    E100-A No:9
      Page(s):
    1951-1955

    This paper presents a low-latency, low-cost architecture for computing square and cube roots in the fixed-point format. The proposed architecture is designed based on a non-iterative root calculation scheme to achieve fast computations. While previous non-iterative root calculators are restricted to a square-root operation due to the limitation of their mathematical property, the root computation is generalized in this paper to apply an approximation method to the non-iterative scheme. On top of that, a recurrent method is proposed to select parameters, which enables us to reduce the table size while keeping the maximum relative error value low. Consequently, the proposed root calculator can support both square and cube roots at the expense of small delay and low area overheads. This extension can be generalized to compute the nth roots, where n is a positive integer.

  • Multiplier-less and Table-less Linear Approximation for Square-Related Functions

    In-Cheol PARK  Tae-Hwan KIM  

     
    PAPER-Fundamentals of Information Systems

      Vol:
    E93-D No:11
      Page(s):
    2979-2988

    Square-related functions such as square, inverse square, square-root and inverse square-root operations are widely used in digital signal processing and digital communication algorithms, and their efficient realizations are commonly required to reduce the hardware complexity. In the implementation point of view, approximate realizations are often desired if they do not degrade performance significantly. In this paper, we propose new linear approximations for the square-related functions. The traditional linear approximations need multipliers to calculate slope offsets and tables to store initial offset values and slope values, whereas the proposed approximations exploit the inherent properties of square-related functions to linearly interpolate with only simple operations, such as shift, concatenation and addition, which are usually supported in modern VLSI systems. Regardless of the bit-width of the number system, more importantly, the maximum relative errors of the proposed approximations are bounded to 6.25% and 3.13% for square and square-root functions, respectively. For inverse square and inverse square-root functions, the maximum relative errors are bounded to 12.5% and 6.25% if the input operands are represented in 20 bits, respectively.

  • Efficient MRC-Based Residue to Binary Converters for the New Moduli Sets {22n, 2n -1, 2n+1 -1} and {22n, 2n -1, 2n-1 -1}

    Amir Sabbagh MOLAHOSSEINI  Chitra DADKHAH  Keivan NAVI  Mohammad ESHGHI  

     
    PAPER-Computer Systems

      Vol:
    E92-D No:9
      Page(s):
    1628-1638

    In this paper, the new residue number system (RNS) moduli sets {22n, 2n -1, 2n+1 -1} and {22n, 2n -1, 2n-1 -1} are introduced. These moduli sets have 4n-bit dynamic range and well-formed moduli which can result in high-performance residue to binary converters as well as efficient RNS arithmetic unit. Next, efficient residue to binary converters for the proposed moduli sets based on mixed-radix conversion (MRC) algorithm are presented. The converters are ROM-free and they are realized using carry-save adders and modulo adders. Comparison with the other residue to binary converters for 4n-bit dynamic range moduli sets shown that the presented designs based on new moduli sets {22n, 2n -1, 2n+1 -1} and {22n, 2n -1, 2n-1 -1} are improved the conversion delay and result in hardware savings. Also, the proposed moduli sets can lead to efficient binary to residue converters, and they can speed-up internal RNS arithmetic processing, compared with the other 4n-bit dynamic range moduli sets.

  • A Hardware Algorithm for Integer Division Using the SD2 Representation

    Naofumi TAKAGI  Shunsuke KADOWAKI  Kazuyoshi TAKAGI  

     
    PAPER-VLSI Design Technology and CAD

      Vol:
    E89-A No:10
      Page(s):
    2874-2881

    A hardware algorithm for integer division is proposed. It is based on the radix-2 non-restoring division algorithm. Fast computation is achieved by the use of the radix-2 signed-digit (SD2) representation. The algorithm does not require normalization of the divisor, and hence, does not require an area-consuming leading-one (or zero) detection nor shifts of variable-amount. Combinational (unfolded) implementation of the algorithm yields a regularly structured array divider, and sequential implementation yields compact dividers.

  • Hardware Algorithm for Computing Reciprocal of Euclidean Norm of a 3-D Vector

    Fumio KUMAZAWA  Naofumi TAKAGI  

     
    PAPER-VLSI Design Technology and CAD

      Vol:
    E89-A No:6
      Page(s):
    1799-1806

    A hardware algorithm for computing the reciprocal of the Euclidean norm of a 3-dimensional (3-D) vector which appears frequently in 3-D computer graphics is proposed. It is based on a digit-recurrence algorithm for computing the Euclidean norm and an on-line division (on-line reciprocal computation) algorithm. These algorithms are modified, so that the reciprocal of the Euclidean norm is computed by performing on-line division where the divisor is the partial result of Euclidean norm computation. Division, square-rooting, and reciprocal square-root computation, which are important operations in 3-D graphics, can also be performed using a circuit based on the proposed algorithm.

  • Counter Tree Diagrams: A Unified Framework for Analyzing Fast Addition Algorithms

    Jun SAKIYAMA  Naofumi HOMMA  Takafumi AOKI  Tatsuo HIGUCHI  

     
    PAPER-IP Design

      Vol:
    E86-A No:12
      Page(s):
    3009-3019

    This paper presents a unified representation of fast addition algorithms based on Counter Tree Diagrams (CTDs). By using CTDs, we can describe and analyze various adder architectures in a systematic way without using specific knowledge about underlying arithmetic algorithms. Examples of adder architectures that can be handled by CTDs include Redundant-Binary (RB) adders, Signed-Digit (SD) adders, Positive-Digit (PD) adders, carry-save adders, parallel counters (e.g., 3-2 counters and 4-2 counters) and networks of such basic adders/counters. This paper also discusses the CTD-based analysis of carry-propagation-free adders using various number representations.

  • Digit-Recurrence Algorithm for Computing Reciprocal Square-Root

    Naofumi TAKAGI  Daisuke MATSUOKA  Kazuyoshi TAKAGI  

     
    PAPER-VLSI Design Technology and CAD

      Vol:
    E86-A No:1
      Page(s):
    221-228

    A digit-recurrence algorithm for computing reciprocal square-root which appears frequently in multimedia and graphics applications is proposed. The reciprocal square-root is computed by iteration of carry-propagation-free additions, shifts, and multiplications by one digit. Different specific versions of the algorithm are possible, depending on the radix, the redundancy factor of the digit set, and etc. Details of a radix-2 version and a radix-4 version and designs of a floating-point reciprocal square-root circuit based on them are shown.

  • Design of the Floating-Point Adder Supporting the Format Conversion and the Rounding Operations with Simultaneous Rounding Scheme

    Woo-Chan PARK  Cheol-Ho JEONG  Tack-Don HAN  

     
    LETTER-Computer System Element

      Vol:
    E85-D No:8
      Page(s):
    1341-1345

    The format conversion operations between a floating-point number and an integer number and a round operation are the important standard floating-point operations. In most cases, these operations are implemented by adding additional hardware to the floating-point adder. The SR (simultaneous rounding) method, one of the techniques used to improve the performance of the floating-point adder, can perform addition and rounding operations at the same stage and is an efficient method with respect to the silicon area and its performance. In this paper, a hardware model to execute CRops (conversion and rounding operations) for the SR floating-point adder is presented and CRops are analyzed on the proposed hardware model. Implementation details are also discussed. The proposed scheme can maintain the advantages of the SR method and can perform each CRop with three pipeline stages.

  • Design of High-Radix VLSI Dividers without Quotient Selection Tables

    Takafumi AOKI  Kimihiko NAKAZAWA  Tatsuo HIGUCHI  

     
    PAPER-VLSI Design

      Vol:
    E84-A No:11
      Page(s):
    2623-2631

    In this paper, we propose a unified high-radix division algorithm for high-speed signal and data processing applications, and present the design and evaluation of high-radix parallel dividers based on the proposed algorithm. By prescaling the input operands and converting some significant digits of a partial remainder into non-redundant representation, the quotient digit can be obtained directly from the partial remainder without using quotient digit selection tables. Performance evaluation shows that the proposed radix-4 and radix-8 divider architectures achieve faster computation with the same level of hardware complexity than the binary counterparts. We also show an experimental fabrication of a radix-4 divider chip in 0.35 µm CMOS technology.

  • A Digit-Recurrence Algorithm for Cube Rooting

    Naofumi TAKAGI  

     
    PAPER-VLSI Design Technology and CAD

      Vol:
    E84-A No:5
      Page(s):
    1309-1314

    A digit-recurrence algorithm for cube rooting is proposed. In cube rooting, the digit-recurrence equation of the residual includes the square of the partial result of the cube root. In the proposed algorithm, the square of the partial result is kept, and the square, as well as the residual, is updated by addition/subtraction, shift, and multiplication by one or two digits. Different specific versions of the algorithm are possible, depending on the radix, the digit set of the cube root, and etc. Any version of the algorithm can be implemented as a sequential (folded) circuit or a combinational (unfolded) circuit, which is suitable for VLSI realization.

  • Evolutionary Synthesis of Fast Constant-Coefficient Multipliers

    Naofumi HOMMA  Takafumi AOKI  Tatsuo HIGUCHI  

     
    PAPER-Nonlinear Problems

      Vol:
    E83-A No:9
      Page(s):
    1767-1777

    This paper presents an efficient graph-based evolutionary optimization technique called Evolutionary Graph Generation (EGG), and its application to the design of fast constant-coefficient multipliers using parallel counter-tree architecture. An important feature of EGG is its capability to handle the general graph structures directly in evolution process instead of encoding the graph structures into indirect representations, such as bit strings and trees. This paper also addresses the major problem of EGG regarding the significant computation time required for verifying the function of generated circuits. To solve this problem, a new functional verification technique for arithmetic circuits is proposed. It is demonstrated that the EGG system can create efficient multiplier structures which are comparable or superior to the known conventional designs.

  • Radix-2-4-8 CORDIC for Fast Vector Rotation

    Takafumi AOKI  Ichiro KITAORI  Tatsuo HIGUCHI  

     
    PAPER

      Vol:
    E83-A No:6
      Page(s):
    1106-1114

    This paper presents a constant-scale-factor radix-2-4-8 CORDIC algorithm for fast vector rotation and sine/cosine computation. The CORDIC algorithm is a well-known hardware algorithm for computing various elementary functions. Due to its sequential nature of computation, however, significant reduction in processing latency is required for real-time signal processing applications. The proposed radix-2-4-8 CORDIC algorithm dynamically changes the radix of computation during operation, and makes possible the reduction in the number of iterations by 37% for 64-bit precision. This paper also describes the hardware implementation of radix-2-4-8 CORDIC unit that can be installed into practical digital signal processors.

  • Signed-Weight Arithmetic and Its Application to a Field-Programmable Digital Filter Architecture

    Takafumi AOKI  Yoshiki SAWADA  Tatsuo HIGUCHI  

     
    PAPER-Configurable Computing and Fault Tolerance

      Vol:
    E82-C No:9
      Page(s):
    1687-1698

    This paper presents a new number representation called the Signed-Weight (SW) number system, which is useful for designing configurable counter-tree architectures for digital signal processing applications. The SW number system allows the unified manipulation of positive and negative numbers in arithmetic circuits by adjusting the signs assigned to individual digit positions. This makes possible the construction of highly regular arithmetic circuits without introducing irregular arithmetic operations, such as negation and sign extension in the two's complement representation. This paper also presents the design of a Field-Programmable Digital Filter (FPDF) architecture--a special-purpose FPGA architecture for high-speed FIR filtering--using the proposed SW arithmetic system.

  • Evolutionary Design of Arithmetic Circuits

    Takafumi AOKI  Naofumi HOMMA  Tatsuo HIGUCHI  

     
    PAPER

      Vol:
    E82-A No:5
      Page(s):
    798-806

    This paper presents a new approach to designing arithmetic circuits by using a graph-based evolutionary optimization technique called Evolutionary Graph Generation (EGG). The key idea of the proposed method is to introduce a higher level of abstraction for arithmetic algorithms, in which arithmetic circuit structures are modeled as data-flow graphs associated with specific number representation systems. The EGG system employs evolutionary operations to transform the structure of graphs directly, which makes it possible to generate the desired circuit structure efficiently. The potential capability of EGG is demonstrated through an experiment of generating constant-coefficient multipliers.

  • Floating-Point Divide Operation without Special Hardware Supports

    Takashi AMISAKI  Umpei NAGASHIMA  Kazutoshi TANABE  

     
    LETTER-Numerical Analysis and Optimization

      Vol:
    E82-A No:1
      Page(s):
    173-177

    Three multiplicative algorithms for the floating-point divide operation are compared: the Newton-Raphson method, Goldschmidt's algorithm, and a naive method that simply calculates a form of the Taylor series expansion of a reciprocal. The series also provides a theoretical basis for Goldschmidt's algorithm. It is well known that, of the Newton-Raphson method and Goldschmidt's algorithm, the former is the more accurate while the latter is the faster on a pipelined unit. However, little is reported about the naive method. In this report, we analyze the speed and accuracy of each method and present the results of numerical tests, which we conducted to confirm the validity of the accuracy analysis. Basically, the comparison are made in the context of software implementation (e. g. , a macro library) and compliance with the IEEE Standard 754 rounding is not considered. It is shown that the naive method is useful in a realistic setting where the number of iterations is small and the method is implemented on a pipelined floating-point unit with a multiply-accumulate configuration. In such a situation, the naive method gives a more accurate result with a slightly lower latency, as compared with Goldschmidt's algorithm, and is much faster than but slightly inferior in accuracy to the Newton-Raphson method.

  • A Modular Inversion Hardware Algorithm with a Redundant Binary Representation

    Naofumi TAKAGI  

     
    PAPER-Computer Hardware and Design

      Vol:
    E76-D No:8
      Page(s):
    863-869

    A hardware algorithm for modular inversion is proposed. It is based on the extended Euclidean algorithm. All intermediate results are represented in a redundant binary representation with a digit set {0, 1,1}. All addition/subtractions are performed without carry propagation. A modular inversion is carried out in O (n) clock cycles where n is the word length of the modulus. The length of each clock cycle is constant independent of n. A modular inverter based on the algorithm has a regular cellular array structure with a bit slice feature and is very suitable for VLSI implementation. Its amount of hardware is proportional to n.