Fahad QURESHI Oscar GUSTAFSSON
In this work we consider optimized twiddle factor multipliers based on shift-and-add-multiplication. We propose a low-complexity structure for twiddle factors with a resolution of 32 points. Furthermore, we propose a slightly modified version of a previously reported multiplier for a resolution of 16 points with lower round-off noise. For completeness we also include results on optimal coefficients for eight-points resolution. We perform finite word length analysis for both coefficients and round-off errors and derive optimized coefficients with minimum complexity for varying requirements.
Naoki KANAYAMA Tadanori TERUYA Eiji OKAMOTO
In the present paper, we propose elliptic curve scalar multiplication methods on pairing-friendly elliptic curves. The proposed method is efficient on elliptic curves on which Atei pairing or optimal pairing is efficiently computed.
Yuko OZASA Masanori HIROTOMO Masakatu MORII
In this paper, we present a specific type of irreducible polynomial which is an irreducible m-term polynomial of degree m. Designing the parallel multiplier over GF(2m) by the quadrinomial obtained from this irreducible polynomial, its critical delay path is smaller than that of conventional multipliers for some degree m.
Daisuke SUZUKI Tsutomu MATSUMOTO
This paper describes a modular exponentiation processing method and circuit architecture that can exhibit the maximum performance of FPGA resources. The modular exponentiation architecture proposed by us comprises three main techniques. The first one is to improve the Montgomery multiplication algorithm in order to maximize the performance of the multiplication unit in an FPGA. The second one is to balance and improve the circuit delay. The third one is to ensure scalability of the circuit. Our architecture can perform fast operations using small-scale resources; in particular, it can complete a 512-bit modular exponentiation as fast as in 0.26 ms with the smallest Virtex-4 FPGA, XC4VF12-10SF363. In fact the number of SLICEs used is approx. 4200, which proves the compactness of our design. Moreover, the scalability of our design also allows 1024-, 1536-, and 2048-bit modular exponentiations to be processed in the same circuit.
Kenta NEKADO Yasuyuki NOGAMI Hidehiro KATO Yoshitaka MORIKAWA
Recently, pairing-based cryptographic application sch-emes have attracted much attentions. In order to make the schemes more efficient, not only pairing algorithm but also arithmetic operations in extension field need to be efficient. For this purpose, the authors have proposed a series of cyclic vector multiplication algorithms (CVMAs) corresponding to the adopted bases such as type-I optimal normal basis (ONB). Note here that every basis adapted for the conventional CVMAs are just special classes of Gauss period normal bases (GNBs). In general, GNB is characterized with a certain positive integer h in addition to characteristic p and extension degree m, namely type-〈h.m〉 GNB in extension field Fpm. The parameter h needs to satisfy some conditions and such a positive integer h infinitely exists. From the viewpoint of the calculation cost of CVMA, it is preferred to be small. Thus, the minimal one denoted by hmin will be adapted. This paper focuses on two remaining problems: 1) CVMA has not been expanded for general GNBs yet and 2) the minimal hmin sometimes becomes large and it causes an inefficient case. First, this paper expands CVMA for general GNBs. It will improve some critical cases with large hmin reported in the conventional works. After that, this paper shows a theorem that, for a fixed prime number r, other prime numbers modulo r uniformly distribute between 1 to r-1. Then, based on this theorem, the existence probability of type-〈hmin,m〉 GNB in Fpm and also the expected value of hmin are explicitly given.
Takaya YAMAZATO Koji NAKAO Hiraku OKADA Masaaki KATAYAMA
We consider a distributed transmission of data packet to a sink where the distance of a sensor node to a sink is much longer than the maximum communication range of each sensor node. We give a simple modification to the transmitter, i.e., multiplication of random phase before the transmission. Thanks to Turbo Code, it is possible to extend the transmission range as the received amplitude varies symbol by symbol for our scheme while whole data packet may be lost for the conventional scheme. In this letter, we report the experimental results of our scheme equivalently developed using visible light communication.
Chien-Ning CHEN Sung-Ming YEN SangJae MOON
Simple power analysis (SPA) can be employed in examining the power consumption trace of elliptic curve scalar multiplication to retrieve the computational sequence. However, SPA cannot distinguish point addition from point subtraction. The attacker still requires an exhaustive search to recover the private key when it is recoded in NAF or recoded by the 2-bit sliding window method. The average Hamming weight of an n-bit NAF recoded scalar is n/3, and an exhaustive search among the 2n/3 candidates is required. This paper shows that in a left-to-right NAF recoded or a left-to-right 2-bit sliding window manipulated scalar the relative position of nonzero bits will reveal their values. Our analysis skill reduces the number of candidates of the scalar from the naive search of 2n/3 to 22n/9 and 20.19n respectively for the cases of NAF and sliding window method.
This paper presents a new approach to precompute points [3]P, [5]P,..., [2k-1]P, for some k ≥ 2 on an elliptic curve over Fp. Those points are required for the efficient evaluation of a scalar multiplication, the most important operation in elliptic curve cryptography. The proposed method precomputes the points in affine coordinates and needs only one single field inversion for the computation. The new method is superior to all known methods that also use one field inversion, if the required memory is taken into consideration. Compared to methods that require several field inversions for the precomputation, the proposed method is faster for a broad range of ratios of field inversions and field multiplications. The proposed method benefits especially from ratios as they occur on smart cards.
Masayuki YOSHINO Katsuyuki OKEYA Camille VUILLAUME
A technique for computing the quotient (⌊ ab/n ⌋) of Euclidean divisions from the difference of two remainders (ab (mod n) - ab (mod n+1)) was proposed by Fischer and Seifert. The technique allows a 2-bit modular multiplication to work on most -bit modular multipliers. However, the cost of the quotient computation rises sharply when computing modular multiplications larger than 2 bits with a recursive approach. This paper addresses the computation cost and improves on previous 2-bit modular multiplication algorithms to return not only the remainder but also the quotient, resulting in an higher performance in the recursive approach, which becomes twice faster in the quadrupling case and four times faster in the octupling case. In addition to Euclidean multiplication, this paper proposes a new 2-bit Montgomery multiplication algorithm to return both of the remainder and the quotient.
Minseok WOO Byoungkwon MOON Daejeong KIM
A new delay-locked loop (DLL)-based multiphase generator is presented. To achieve an arbitrary integer multiplication factor, a voltage-controlled variable delay ring (VCDR) is adopted, and a new "generate and reset" (GNR) cell is developed. The whole circuit of the closed loop was designed and characterized in a 1.2-V 0.13-µm CMOS process. The simulated results show that the loop operates from 1.0 MHz to 1.2 GHz under the supply voltage of 1.2 V, and the GNR cell exhibits low supply sensitivity of 1300-ps/V.
Kazuyuki TANIMURA Ryuta NARA Shunitsu KOHARA Youhua SHI Nozomu TOGAWA Masao YANAGISAWA Tatsuo OHTSUKI
Modular multiplication is the most dominant arithmetic operation in elliptic curve cryptography (ECC), that is a type of public-key cryptography. Montgomery multiplier is commonly used to compute the modular multiplications and requires scalability because the bit length of operands varies depending on its security level. In addition, ECC is performed in GF(P) or GF(2n), and unified architecture for multipliers in GF(P) and GF(2n) is required. However, in previous works, changing frequency is necessary to deal with delay-time difference between GF(P) and GF(2n) multipliers because the critical path of the GF(P) multiplier is longer. This paper proposes unified dual-radix architecture for scalable Montgomery multiplications in GF(P) and GF(2n). This proposed architecture unifies four parallel radix-216 multipliers in GF(P) and a radix-264 multiplier in GF(2n) into a single unit. Applying lower radix to GF(P) multiplier shortens its critical path and makes it possible to compute the operands in the two fields using the same multiplier at the same frequency so that clock dividers to deal with the delay-time difference are not required. Moreover, parallel architecture in GF(P) reduces the clock cycles increased by dual-radix approach. Consequently, the proposed architecture achieves to compute a GF(P) 256-bit Montgomery multiplication in 0.28 µs. The implementation result shows that the area of the proposal is almost the same as that of previous works: 39 kgates.
Masayuki YOSHINO Katsuyuki OKEYA Camille VUILLAUME
This paper proposes novel algorithms for computing double-size modular multiplications with few modulus-dependent precomputations. Low-end devices such as smartcards are usually equipped with hardware Montgomery multipliers. However, due to progresses of mathematical attacks, security institutions such as NIST have steadily demanded longer bit-lengths for public-key cryptography, making the multipliers quickly obsolete. In an attempt to extend the lifespan of such multipliers, double-size techniques compute modular multiplications with twice the bit-length of the multipliers. Techniques are known for extending the bit-length of classical Euclidean multipliers, of Montgomery multipliers and the combination thereof, namely bipartite multipliers. However, unlike classical and bipartite multiplications, Montgomery multiplications involve modulus-dependent precomputations, which amount to a large part of an RSA encryption or signature verification. The proposed double-size technique simulates double-size multiplications based on single-size Montgomery multipliers, and yet precomputations are essentially free: in an 2048-bit RSA encryption or signature verification with public exponent e=216+1, the proposal with a 1024-bit Montgomery multiplier is at least 1.5 times faster than previous double-size Montgomery multiplications.
Young-Geun LEE Han-Sam JUNG Ki-Seok CHUNG
Many DSP applications such as FIR filtering and DCT (discrete cosine transformation) require multiplication with constants. Therefore, optimizing the performance of constant multiplication improves the overall performance of these applications. It is well-known that shifting can replace a constant multiplication if the constant is a power of two. In this paper, we extend this idea in such a way that by employing more than two barrel shifters, we can design highly efficient constant multipliers. We have found that by using two or three shifters, we can generate a large set of constants. Using these constants, we can execute a typical set of FIR or DCT applications with few errors. Furthermore, with variable precision support, we can carry out a fairly large class of DSP applications with high computational efficiency. Compared to conventional multipliers, we can achieve power savings of up to 56% with negligible computational errors.
Yuji OHKAWA Kazunori MIYAKAWA Tomoki MATSUBARA Kenji KIKUCHI Shirou SUZUKI Misao KUBOTA Norifumi EGAMI Akira KOBAYASHI
A high-sensitivity pickup tube using HARP (high-gain avalanche rushing amorphous photoconductor) photoconductive film, which makes use of the avalanche multiplication phenomenon, has been studied for making a high-sensitivity television camera. The avalanche multiplication factor, i.e., sensitivity, was increased by thickening the film. A 35-µm-thick HARP film, which was more sensitive than the previous 25-µm-thick film with an avalanche multiplication factor of about 600, and a 2/3rd-inch pickup tube using the film were developed. Measurements on the pickup tube demonstrated that it had an avalanche multiplication factor of about 1000, low lag, and high resolution. Moreover, image defects caused by shooting of intense spot lights were investigated, and it was found that exposing the film to UV light before operation and controlling the temperature of the film during operation could suppress the defects.
Masaaki SHIRASE Yukinori MIYAZAKI Tsuyoshi TAKAGI Dong-Guk HAN Dooho CHOI
Pairing-based cryptography provides us many novel cryptographic applications such as ID-based cryptosystems and efficient broadcast encryptions. The security problems in ubiquitous sensor networks have been discussed in many papers, and pairing-based cryptography is a crucial technique to solve them. Due to the limited resources in the current sensor node, it is challenged to optimize the implementation of pairings on sensor nodes. In this paper we present an efficient implementation of pairing over MICAz, which is widely used as a sensor node for ubiquitous sensor network. We improved the speed of ηT pairing by using a new efficient multiplication specialized for ATmega128L, called the block comb method and several optimization techniques to save the number of data load/store operations. The timing of ηT pairing over GF(2239) achieves about 1.93 sec, which is the fastest implementation of pairing over MICAz to the best of our knowledge. From our dramatic improvement, we now have much high possibility to make pairing-based cryptography for ubiquitous sensor networks practical.
Yasuyuki NOGAMI Yumi SAKEMI Takumi OKIMOTO Kenta NEKADO Masataka AKANE Yoshitaka MORIKAWA
For ID-based cryptography, not only pairing but also scalar multiplication must be efficiently computable. In this paper, we propose a scalar multiplication method on the circumstances that we work at Ate pairing with Barreto-Naehrig (BN) curve. Note that the parameters of BN curve are given by a certain integer, namely mother parameter. Adhering the authors' previous policy that we execute scalar multiplication on subfield-twisted curve
Yuen-Hong Alvin HO Chi-Un LEI Hing-Kit KWAN Ngai WONG
In the context of multiple constant multiplication (MCM) design, we propose a novel common sub-expression elimination (CSE) algorithm that models the optimal synthesis of coefficients into a 0-1 mixed-integer linear programming (MILP) problem with a user-defined generic logic depth constraint. We also propose an efficient solution space, which combines all minimal signed digit (MSD) representations and the shifted sum (difference) of coefficients. In the examples we demonstrate, the combination of the proposed algorithm and solution space gives a better solution comparing to existing algorithms.
In the execution on a smart card, elliptic curve cryptosystems have to be secure against side channel attacks such as the simple power analysis (SPA), the differential power analysis (DPA), and the refined power analysis (RPA), and so on. MMM-algorithm proposed by Mamiya, Miyaji, and Morimoto is a scalar multiplication algorithm secure against SPA, DPA, and RPA, which can decrease the computational complexity by increasing the size of a pre-computed table. However, it provides only 4 different cases of pre-computed tables. From the practical point of view, a wider range of time-memory tradeoffs is usually desired. This paper generalizes MMM-algorithm to improve the flexibility of tables as well as the computational complexity. Our improved algorithm is secure, efficient and flexible for the storage size.
Shunji KOZAKI Kazuto MATSUO Yasutomo SHIMBARA
Scalar multiplication methods using the Frobenius maps are known for efficient methods to speed up (hyper)elliptic curve cryptosystems. However, those methods are not efficient for the cryptosystems constructed on fields of small extension degrees due to costs of the field operations. Iijima et al. showed that one can use certain automorphisms on the quadratic twists of elliptic curves for fast scalar multiplications without the drawback of the Frobenius maps. This paper shows an extension of the automorphisms on the Jacobians of hyperelliptic curves of arbitrary genus.
This paper proposes an efficient systolic array construction method for optimal planar systolic design of the matrix multiplication. By connection network adjustment among systolic array processing element (PE), the input/output data are jumping in the systolic array for multiplication operation requirements. Various 2-D systolic array topologies, such as square topology and hexagonal topology, have been studied to construct appropriate systolic array configuration and realize high performance matrix multiplication. Based on traditional Kung-Leiserson systolic architecture, the proposed "Jumping Systolic Array (JSA)" algorithm can increase the matrix multiplication speed with less processing elements and few data registers attachment. New systolic arrays, such as square jumping array, redundant dummy latency jumping hexagonal array, and compact parallel flow jumping hexagonal array, are also proposed to improve the concurrent system operation efficiency. Experimental results prove that the JSA algorithm can realize fully concurrent operation and dominate other systolic architectures in the specific systolic array system characteristics, such as band width, matrix complexity, or expansion capability.