The search functionality is under construction.
The search functionality is under construction.

Keyword Search Result

[Keyword] multiplication(86hit)

41-60hit(86hit)

  • Comments on 'A 70 MHz Multiplierless FIR Hilbert Transformer in 0.35 µm Standard CMOS Library'

    Oscar GUSTAFSSON  

     
    LETTER-VLSI Design Technology and CAD

      Vol:
    E91-A No:3
      Page(s):
    899-900

    In this comment we point out that the mapping from carry-propagation adders to carry-save adders in the context of shift-and-add multiplication is inconsistent. Based on this it is shown that the implementation in Ref.[1] does not achieve any complexity reduction in practice.

  • RSFQ Baseband Digital Signal Processing

    Anna Yurievna HERR  

     
    INVITED PAPER

      Vol:
    E91-C No:3
      Page(s):
    293-305

    Ultra fast switching speed of superconducting digital circuits enable realization of Digital Signal Processors with performance unattainable by any other technology. Based on rapid-single-flux technology (RSFQ) logic, these integrated circuits are capable of delivering high computation capacity up to 30 GOPS on a single processor and very short latency of 0.1 ns. There are two main applications of such hardware for practical telecommunication systems: filters for superconducting ADCs operating with digital RF data and recursive filters at baseband. The later of these allows functions such as multiuser detection for 3G WCDMA, equalization and channel precoding for 4G OFDM MIMO, and general blind detection. The performance gain is an increase in the cell capacity, quality of service, and transmitted data rate. The current status of the development of the RSFQ baseband DSP is discussed. Major components with operating speed of 30 GHz have been developed. Designs, test results, and future development of the complete systems including cryopackaging and CMOS interface are reviewed.

  • Montgomery Multiplication with Twice the Bit-Length of Multipliers

    Masayuki YOSHINO  Katsuyuki OKEYA  Camille VUILLAUME  

     
    PAPER-Implementation

      Vol:
    E91-A No:1
      Page(s):
    203-210

    We present a novel approach for computing 2n-bit Montgomery multiplications with n-bit hardware Montgomery multipliers. Smartcards are usually equipped with such hardware Montgomery multipliers; however, due to progresses in factoring algorithms, the recommended bit length of public-key schemes such as RSA is steadily increasing, making the hardware quickly obsolete. Thanks to our double-size technique, one can re-use the existing hardware while keeping pace with the latest security requirements. Unlike the other double-size techniques which rely on classical n-bit modular multipliers, our idea is tailored to take advantage of n-bit Montgomery multipliers. Thus, our technique increases the perenniality of existing products without compromises in terms of security.

  • A 70 MHz Multiplierless FIR Hilbert Transformer in 0.35 µm Standard CMOS Library

    Yasuhiro TAKAHASHI  Toshikazu SEKINE  Michio YOKOYAMA  

     
    PAPER-VLSI Design Technology and CAD

      Vol:
    E90-A No:7
      Page(s):
    1376-1383

    This paper presents the implementation of a 31-tap FIR Hilbert transform digital filter chip used in the digital-IF receivers, to confirm the effectiveness of our new design method. Our design method that we previously reported is based on a computation sharing multiplier using a new horizontal and vertical common subexpression techniques. A 31-tap FIR Hilbert transform digital filter was implemented and fabricated in 0.35 µm CMOS standard cell library. The chip's core contains approximately 33k transistors and occupies 0.86 mm2. The chip also has an operating speed of 70 MHz over. The implementation results show that the proposed Hilbert transformer has a smallest cost factor and so that is a high performance filter.

  • A New Upper Bound for the Minimal Density of Joint Representations in Elliptic Curve Cryptosystems

    Erik DAHMEN  Katsuyuki OKEYA  Tsuyoshi TAKAGI  

     
    PAPER

      Vol:
    E90-A No:5
      Page(s):
    952-959

    The most time consuming operation to verify a signature with the Elliptic Curve Digital Signature Algorithm is a multi-scalar multiplication with two scalars. Efficient methods for its computation are the Shamir method and the Interleave method, whereas the performance of those methods can be improved by using general base-2 representations of the scalars. In exchange for the speed-up, those representations require the precomputation of several points that must be stored. In the case of two precomputed points, the Interleave method and the Shamir method provide the same, optimal efficiency. In the case of more precomputed points, only the Interleave method can be sped-up in an optimal way and is currently more efficient than the Shamir method. This paper proposes a new general base-2 representation of the scalars that can be used to speed up the Shamir method. It requires the precomputation of ten points and is more efficient than any other representation that also requires ten precomputed points. Therefore, the proposed method is the first to improve the Shamir method such that it is faster than the Interleave method.

  • Concurrent Error Detection in Montgomery Multiplication over GF(2m)

    Che-Wun CHIOU  Chiou-Yng LEE  An-Wen DENG  Jim-Min LIN  

     
    PAPER-Information Security

      Vol:
    E89-A No:2
      Page(s):
    566-574

    Because fault-based attacks on cryptosystems have been proven effective, fault diagnosis and tolerance in cryptography have started a new surge of research and development activity in the field of applied cryptography. Without magnitude comparisons, the Montgomery multiplication algorithm is very attractive and popular for Elliptic Curve Cryptosystems. This paper will design a Montgomery multiplier array with a bit-parallel architecture in GF(2m) with concurrent error detection capability to protect it against fault-based attacks. The robust Montgomery multiplier array with concurrent error detection requires only about 0.2% extra space overhead (if m=512 is as an example) and requires four extra clock cycles compared to the original Montgomery multiplier array without concurrent error detection.

  • Toward Incremental Parallelization Using Navigational Programming

    Lei PAN  Wenhui ZHANG  Arthur ASUNCION  Ming Kin LAI  Michael B. DILLENCOURT  Lubomir F. BIC  Laurence T. YANG  

     
    PAPER-Parallel/Distributed Programming Models, Paradigms and Tools

      Vol:
    E89-D No:2
      Page(s):
    390-398

    The Navigational Programming (NavP) methodology is based on the principle of self-migrating computations. It is a truly incremental methodology for developing parallel programs: each step represents a functioning program, and each intermediate program is an improvement over its predecessor. The transformations are mechanical and straightforward to apply. We illustrate our methodology in the context of matrix multiplication, showing how the transformations lead from a sequential program to a fully parallel program. The NavP methodology is conducive to new ways of thinking that lead to ease of programming and high performance. Even though our parallel algorithm was derived using a sequence of mechanical transformations, it displays certain performance advantages over the classical handcrafted Gentleman's Algorithm.

  • Efficient Hyperelliptic Curve Cryptosystems Using Theta Divisors

    Masanobu KATAGI  Toru AKISHITA  Izuru KITAMURA  Tsuyoshi TAKAGI  

     
    PAPER-Elliptic Curve Cryptography

      Vol:
    E89-A No:1
      Page(s):
    151-160

    It has recently been reported that the performance of hyperelliptic curve cryptosystems (HECC) is competitive to that of elliptic curve cryptosystems (ECC). Concerning the security of HECC, the theta divisors play an important role. The scalar multiplication using a random base point is vulnerable to an exceptional procedure attack, which is a kind of side-channel attacks, using theta divisors. In the case of cryptographic protocols of the scalar multiplication using fixed base point, however, the exceptional procedure attack is not applicable. First, we present novel efficient scalar multiplication using theta divisors, which is the positive application of theta divisors on HECC. Second, we develop a window-based method using theta divisors that is secure against side-channel attacks. It is not obvious how to construct a base point D such that all pre-computed points are theta divisors. We present an explicit algorithm for generating such divisors.

  • Refined Computations for Points of the Form 2kP Based on Montgomery Trick

    Daisuke ADACHI  Tomio HIRATA  

     
    LETTER-Information Security

      Vol:
    E89-A No:1
      Page(s):
    334-339

    This paper focuses on algorithms for an efficient scalar multiplication. It proposes two algorithms for computing points of the form 2kP in affine coordinates. One works for k=2, and the other works for an arbitrary natural number k. The efficiency of these algorithms is based on a trade-off between a field inversion and several field multiplications. Montgomery trick is used to implement this trade-off. Since a field inversion is usually more expensive than 10 field multiplications, the proposed algorithms are efficient in comparison with existing ones.

  • A New Type of Fast Endomorphisms on Jacobians of Hyperelliptic Curves and Their Cryptographic Application

    Katsuyuki TAKASHIMA  

     
    PAPER-Elliptic Curve Cryptography

      Vol:
    E89-A No:1
      Page(s):
    124-133

    The Gallant-Lambert-Vanstone method [14](GLV method for short) is a scalar multiplication method for elliptic curve cryptography (ECC). In WAP WTLS [49], SEC 2 [44], ANSI X9.62 [1] and X9.63 [2], several domain parameters for applications of the GLV method are described. Curves with those parameters have efficiently-computable endomorphisms. Recently the GLV method for Jacobians of hyperelliptic curve (HEC) has also been studied. In this paper, we discuss applications of the GLV method to curves with real multiplication (RM). It is the first time to use RM for efficient scalar multiplication as far as we know. We describe the general algorithm for using such RM, and we show that some genus 2 curves with RM have enough effciency to be used in the GLV method as in the previous CM case. Moreover, we will see that such RM curves can be obtained abundantly unlike the previously proposed CM curves of genus 2.

  • A Hardware Algorithm for Modular Multiplication/Division Based on the Extended Euclidean Algorithm

    Marcelo E. KAIHARA  Naofumi TAKAGI  

     
    PAPER-VLSI Design Technology and CAD

      Vol:
    E88-A No:12
      Page(s):
    3610-3617

    A hardware algorithm for modular multiplication/division which performs modular division, Montgomery multiplication, and ordinary modular multiplication is proposed. The modular division in our algorithm is based on the extended Euclidean algorithm. We employ our newly proposed computation method that consists of processing the multiplier from the most significant digit first to calculate Montgomery multiplication. Finally, the ordinary modular multiplication is based on shift-and-add multiplication. Each of these three operations is carried out through the iteration of simple operations such as shifts and additions/subtractions. To avoid carry propagation in all additions and subtractions, the radix-2 signed-digit representation is employed. A modular multiplier/divider based on the algorithm has a linear array structure with a bit-slice feature and carries out n-bit modular multiplication/division in O(n) clock cycles, where the length of the clock cycle is constant and independent of n. This multiplier/divider can be implemented using a hardware amount only slightly larger than that of the modular divider.

  • Design Optimization of a High-Speed, Area-Efficient and Low-Power Montgomery Modular Multiplier for RSA Algorithm

    Shoichi MASUI  Kenji MUKAIDA  Masahiko TAKENAKA  Naoya TORII  

     
    PAPER-Digital

      Vol:
    E88-C No:4
      Page(s):
    576-581

    High-speed, area-efficient, and low-power Montgomery modular multipliers for RSA algorithm have been developed for digital signature and user authentication in high-speed network systems and smart card LSIs. The multiplier-accumulators (MAC) in the developed Montgomery modular multipliers have a non-identical multiplicand/multiplier word length organization. This organization can eliminate the bandwidth bottleneck associated with a data memory, and enables to use a single-port memory for area and power reductions. The developed MAC is faster than the conventional identical word length organization due to the shortened critical path. For smart card applications, an area-efficient architecture with 42 kgates can produce 1.2 digital signatures in a second for 2,048-bit key length with the power consumption of 6.8 mW.

  • Improvements of Addition Algorithm on Genus 3 Hyperelliptic Curves and Their Implementation

    Masaki GONDA  Kazuto MATSUO  Kazumaro AOKI  Jinhui CHAO  Shigeo TSUJII  

     
    PAPER-Public Key Cryptography

      Vol:
    E88-A No:1
      Page(s):
    89-96

    Genus 3 hyperelliptic curve cryptosystems are capable of fast-encryption on a 64-bit CPU, because a 56-bit field is enough for their definition fields. Recently, Kuroki et al. proposed an extension of the Harley algorithm, which had been known as the fastest addition algorithm of divisor classes on genus 2 hyperelliptic curves, on genus 3 hyperelliptic curves and Pelzl et al. improved the algorithm. This paper shows an improvement of the Harley algorithm on genus 3 hyperelliptic curves using Toom's multiplication. The proposed algorithm takes only I + 70M for an addition and I + 71M for a doubling instead of I + 76M and I + 74M respectively, which are the best possible of the previous works, where I and M denote the required time for an inversion and a multiplication over the definition field respectively. This paper also shows 2 variations of the proposed algorithm in order to adapt the algorithm to various platforms. Moreover this paper discusses finite field arithmetic suitable for genus 3 hyperelliptic curve cryptosystems and shows implementation results of the proposed algorithms on a 64-bit CPU. The implementation results show a 160-bit scalar multiplication can be done within 172 µs on a 64-bit CPU Alpha EV68 1.25 GHz.

  • Fast Elliptic Curve Multiplications Resistant against Side Channel Attacks

    Tetsuya IZU  Tsuyoshi TAKAGI  

     
    PAPER-Tamper-Resistance

      Vol:
    E88-A No:1
      Page(s):
    161-171

    This paper proposes fast elliptic curve multiplication algorithms resistant against side channel attacks, based on the Montgomery-type scalar multiplication. The proposed scalar multiplications can be applied to all curves over prime fields, e.g., any standardized curves over finite fields with characteristic larger than 3. The method utilizes the addition formulas xECDBL and xECADD assembled by only x-coordinates of points, and is applicable for any types of curves over finite fields. Then, we encapsulate two addition formulas into one formula xECADDDBL, which accomplishes a faster computation because several auxiliary variables of two formulas can be shared. We also develop a novel addition chain for the new formula, with which we can compute scalar multiplications. The improvement of our scalar multiplications over previous Coron's dummy operation method is about 18% for a 160-bit scalar multiplication. Our method requires no table-up of precomputed points and it is suitable for the implementation on memory constraint computing architectures, e.g., smart cards. Moreover, we optimize the proposed algorithms for parallelized implementations with SIMD operations. Compared with the similar scheme proposed by Fischer et al., our scheme is about 16% faster.

  • An SPA-Based Extension of Schindler's Timing Attack against RSA Using CRT

    Yuuki TOMOEDA  Hideyuki MIYAKE  Atsushi SHIMBO  Shinichi KAWAMURA  

     
    PAPER-Tamper-Resistance

      Vol:
    E88-A No:1
      Page(s):
    147-153

    At CHES 2000, Schindler introduced a timing attack that enables the factorization of an RSA-modulus if RSA implementations use the Chinese Remainder Theorem and Montgomery multiplication. In this paper we introduce another approach for deriving the secret prime factor by focusing on the conditional branch Schindler used in his attack. One of the countermeasures against Schindler's attack is the blinding method. If input data are blinded with a fixed value or short-period random numbers, Schindler's attack does not work but our method can still factorize the RSA-modulus.

  • Efficient Scalar Multiplication on Montgomery-Form Elliptic Curves

    Yuichi FUTA  Motoji OHMORI  

     
    PAPER-Information Security

      Vol:
    E87-A No:8
      Page(s):
    2126-2136

    Montgomery-form elliptic curves have the advantage of faster arithmetic than Weierstrass-form elliptic curves. The dominant operation of the Elliptic Curve Cryptosystem (ECC) is scalar multiplication of points on an elliptic curve, and it usually includes scalar multiplication of a fixed base point of ECC. For Weierstrass-form elliptic curves, accelerating methods of scalar multiplication by using a pre-computed table of the fixed point have been widely studied. However, such methods cannot naturally expand to Montgomery-form elliptic curves. In this paper, we propose a fast scalar multiplication method on Montgomery-form elliptic curves by using a pre-computed table for the first time. Our method is 1.6 times as fast as the known method for Montgomery-form elliptic curves under the practical conditions that the size of the definition field is 160 bits and the memory size used for the pre-computed table is 3.2 KB.

  • A Super-Programming Technique for Large Sparse Matrix Multiplication on PC Clusters

    Dejiang JIN  Sotirios G. ZIAVRAS  

     
    PAPER-Scientific and Engineering Computing with Applications

      Vol:
    E87-D No:7
      Page(s):
    1774-1781

    The multiplication of large spare matrices is a basic operation in many scientific and engineering applications. There exist some high-performance library routines for this operation. They are often optimized based on the target architecture. For a parallel environment, it is essential to partition the entire operation into well balanced tasks and assign them to individual processing elements. Most of the existing techniques partition the given matrices based on some kind of workload estimation. For irregular sparse matrices on PC clusters, however, the workloads may not be well estimated in advance. Any approach other than run-time dynamic partitioning may degrade performance. In this paper, we apply our super-programming approach to parallel large matrix multiplication on PC clusters. In our approach, tasks are partitioned into super-instructions that are dynamically assigned to member computer nodes. Thus, the load balancing logic is separated from the computing logic; the former is taken over by the runtime environment. Our super-programming approach facilitates ease of program development and targets high efficiency in dynamic load balancing. Workloads can be balanced effectively and the optimization overhead is small. The results prove the viability of our approach.

  • Correction on "A Scalar Multiplication Algorithm with Recovery of the y-Coordinate on the Montgomery Form and Analysis of Efficiency for Elliptic Curve Cryptosystems"

    Jiin-Chiou CHENG  Wen-Chung KUO  Chi-Sung LAIH  

     
    LETTER-Information Security

      Vol:
    E87-A No:7
      Page(s):
    1827-1829

    In, Okeya and Sakurai proposed the recovery of the y-coordinate on a Montgomery-form elliptic curve. With their method, it can calculate efficiently coordinates of scalar multiplication of point, in which we need only x-coordinate and finally, (x,y) of the terminal point can be recovered. The method is very suitable for some applications such as ECDSA-V and MQV, etc. Unfortunately, there is a significant fault in that paper. Thus, many results about computation amount are wrong due to the significant fault. First, we will show this fault, and then raise the correction of the significant fault. Finally, Table A1 about comparison of computation amount in is also corrected.

  • Efficient Squaring of Large Integers

    Wu-Chuan YANG  Peng-Yueh HSEIH  Chi-Sung LAIH  

     
    LETTER

      Vol:
    E87-A No:5
      Page(s):
    1189-1192

    The efficient squaring algorithm is an important role in large integer arithmetic. All multiplication algorithms can be used for squaring large integers, but their performance can be greatly improved by using the standard squaring algorithm. The standard squaring algorithm is quite well-known, but unfortunately there is an improper carry handling bug in it. Recently, Guajardo and Paar proposed a modified squaring algorithm to fix the bug in the standard squaring algorithm. In this paper, we first point out that there is still an error-indexing bug in the Guajardo-Paar squaring algorithm. Then, we propose a new efficient squaring algorithm that not only avoids the bugs in both the standard squaring algorithm and the Guajardo-Paar squaring algorithm but also improves the performance in squaring computation. Our analyses and our simulations indicate that the proposed squaring algorithm is about 2.5 times faster in comparison with the standard multiplication algorithm in Pentium Series CPU. The performance of 1024-bit RSA cryptosystem can be saved 34.3% by using the proposed squaring algorithm to replace the standard multiplication.

  • Fast Elliptic Curve Multiplications with SIMD Operations

    Tetsuya IZU  Tsuyoshi TAKAGI  

     
    PAPER-Asymmetric Cipher

      Vol:
    E87-A No:1
      Page(s):
    85-93

    The Single Instruction, Multiple Data (SIMD) architecture enables computation in parallel on a single processor. The SIMD operations are implemented on some processors such as Pentium 3/4, Athlon, SPARC, or even on smart cards. This paper proposes efficient algorithms for assembling an elliptic curve addition (ECADD), doubling (ECDBL), and k-iterated ECDBL (k-ECDBL) with SIMD operations. We optimize the number of auxiliary variables and the order of basic field operations used for these addition formulas. If an addition chain has k-bit zero run, we can replace k-time ECDBLs to the proposed faster k-ECDBL and the total efficiency of the scalar multiplication can be improved. Using the singed binary chain, we can compute a scalar multiplication about 10% faster than the previously fastest algorithm proposed by Aoki et al. Combined with the sliding window method or the width-w NAF window method, we also achieve about 10% faster parallelized scalar multiplication algorithms with SIMD operations. For the implementation on smart cards, we establish two fast parallelized scalar multiplication algorithms with SIMD resistant against side channel attacks.

41-60hit(86hit)