In the execution on a smart card, elliptic curve cryptosystems have to be secure against side channel attacks such as the simple power analysis (SPA), the differential power analysis (DPA), and the refined power analysis (RPA), and so on. MMM-algorithm proposed by Mamiya, Miyaji, and Morimoto is a scalar multiplication algorithm secure against SPA, DPA, and RPA, which can decrease the computational complexity by increasing the size of a pre-computed table. However, it provides only 4 different cases of pre-computed tables. From the practical point of view, a wider range of time-memory tradeoffs is usually desired. This paper generalizes MMM-algorithm to improve the flexibility of tables as well as the computational complexity. Our improved algorithm is secure, efficient and flexible for the storage size.
Modern microprocessors achieve high application performance at the acceptable level of power dissipation. In terms of power to performance trade-off, the instruction window is particularly important. This is because enlarging the window size achieves high performance but naive scaling of the conventional instruction window can severely increase the complexity and power consumption. In this paper, we propose low-power instruction window techniques for contemporary microprocessors. First, the small reorder buffer (SROB) reduces power dissipation by deferred allocation and early release. The deferred allocation delays the SROB allocation of instructions until their all data dependencies are resolved. Then, the instructions are executed in program order and they are released faster from the SROB. This results in higher resource utilization and low power consumption. Second, we replace a conventional issue queue by a direct lookup table (DLT) with an efficient tag translation technique. The translation scheme resolves the instruction dependency, especially for the case of one producer to multiple consumers. The efficiency of the translation scheme stems from the fact that the vast majority of instruction dependency exists within a basic block. Experimental results show that our proposed design reduces the power consumption significantly for SPEC2000 benchmarks.
Shunji KOZAKI Kazuto MATSUO Yasutomo SHIMBARA
Scalar multiplication methods using the Frobenius maps are known for efficient methods to speed up (hyper)elliptic curve cryptosystems. However, those methods are not efficient for the cryptosystems constructed on fields of small extension degrees due to costs of the field operations. Iijima et al. showed that one can use certain automorphisms on the quadratic twists of elliptic curves for fast scalar multiplications without the drawback of the Frobenius maps. This paper shows an extension of the automorphisms on the Jacobians of hyperelliptic curves of arbitrary genus.
Erik DAHMEN Katsuyuki OKEYA Tsuyoshi TAKAGI
The most time consuming operation to verify a signature with the Elliptic Curve Digital Signature Algorithm is a multi-scalar multiplication with two scalars. Efficient methods for its computation are the Shamir method and the Interleave method, whereas the performance of those methods can be improved by using general base-2 representations of the scalars. In exchange for the speed-up, those representations require the precomputation of several points that must be stored. In the case of two precomputed points, the Interleave method and the Shamir method provide the same, optimal efficiency. In the case of more precomputed points, only the Interleave method can be sped-up in an optimal way and is currently more efficient than the Shamir method. This paper proposes a new general base-2 representation of the scalars that can be used to speed up the Shamir method. It requires the precomputation of ten points and is more efficient than any other representation that also requires ten precomputed points. Therefore, the proposed method is the first to improve the Shamir method such that it is faster than the Interleave method.
This paper focuses on algorithms for an efficient scalar multiplication. It proposes two algorithms for computing points of the form 2kP in affine coordinates. One works for k=2, and the other works for an arbitrary natural number k. The efficiency of these algorithms is based on a trade-off between a field inversion and several field multiplications. Montgomery trick is used to implement this trade-off. Since a field inversion is usually more expensive than 10 field multiplications, the proposed algorithms are efficient in comparison with existing ones.
Masanobu KATAGI Toru AKISHITA Izuru KITAMURA Tsuyoshi TAKAGI
It has recently been reported that the performance of hyperelliptic curve cryptosystems (HECC) is competitive to that of elliptic curve cryptosystems (ECC). Concerning the security of HECC, the theta divisors play an important role. The scalar multiplication using a random base point is vulnerable to an exceptional procedure attack, which is a kind of side-channel attacks, using theta divisors. In the case of cryptographic protocols of the scalar multiplication using fixed base point, however, the exceptional procedure attack is not applicable. First, we present novel efficient scalar multiplication using theta divisors, which is the positive application of theta divisors on HECC. Second, we develop a window-based method using theta divisors that is secure against side-channel attacks. It is not obvious how to construct a base point D such that all pre-computed points are theta divisors. We present an explicit algorithm for generating such divisors.
The Gallant-Lambert-Vanstone method [14](GLV method for short) is a scalar multiplication method for elliptic curve cryptography (ECC). In WAP WTLS [49], SEC 2 [44], ANSI X9.62 [1] and X9.63 [2], several domain parameters for applications of the GLV method are described. Curves with those parameters have efficiently-computable endomorphisms. Recently the GLV method for Jacobians of hyperelliptic curve (HEC) has also been studied. In this paper, we discuss applications of the GLV method to curves with real multiplication (RM). It is the first time to use RM for efficient scalar multiplication as far as we know. We describe the general algorithm for using such RM, and we show that some genus 2 curves with RM have enough effciency to be used in the GLV method as in the previous CM case. Moreover, we will see that such RM curves can be obtained abundantly unlike the previously proposed CM curves of genus 2.
This paper proposes fast elliptic curve multiplication algorithms resistant against side channel attacks, based on the Montgomery-type scalar multiplication. The proposed scalar multiplications can be applied to all curves over prime fields, e.g., any standardized curves over finite fields with characteristic larger than 3. The method utilizes the addition formulas xECDBL and xECADD assembled by only x-coordinates of points, and is applicable for any types of curves over finite fields. Then, we encapsulate two addition formulas into one formula xECADDDBL, which accomplishes a faster computation because several auxiliary variables of two formulas can be shared. We also develop a novel addition chain for the new formula, with which we can compute scalar multiplications. The improvement of our scalar multiplications over previous Coron's dummy operation method is about 18% for a 160-bit scalar multiplication. Our method requires no table-up of precomputed points and it is suitable for the implementation on memory constraint computing architectures, e.g., smart cards. Moreover, we optimize the proposed algorithms for parallelized implementations with SIMD operations. Compared with the similar scheme proposed by Fischer et al., our scheme is about 16% faster.
Montgomery-form elliptic curves have the advantage of faster arithmetic than Weierstrass-form elliptic curves. The dominant operation of the Elliptic Curve Cryptosystem (ECC) is scalar multiplication of points on an elliptic curve, and it usually includes scalar multiplication of a fixed base point of ECC. For Weierstrass-form elliptic curves, accelerating methods of scalar multiplication by using a pre-computed table of the fixed point have been widely studied. However, such methods cannot naturally expand to Montgomery-form elliptic curves. In this paper, we propose a fast scalar multiplication method on Montgomery-form elliptic curves by using a pre-computed table for the first time. Our method is 1.6 times as fast as the known method for Montgomery-form elliptic curves under the practical conditions that the size of the definition field is 160 bits and the memory size used for the pre-computed table is 3.2 KB.
The Single Instruction, Multiple Data (SIMD) architecture enables computation in parallel on a single processor. The SIMD operations are implemented on some processors such as Pentium 3/4, Athlon, SPARC, or even on smart cards. This paper proposes efficient algorithms for assembling an elliptic curve addition (ECADD), doubling (ECDBL), and k-iterated ECDBL (k-ECDBL) with SIMD operations. We optimize the number of auxiliary variables and the order of basic field operations used for these addition formulas. If an addition chain has k-bit zero run, we can replace k-time ECDBLs to the proposed faster k-ECDBL and the total efficiency of the scalar multiplication can be improved. Using the singed binary chain, we can compute a scalar multiplication about 10% faster than the previously fastest algorithm proposed by Aoki et al. Combined with the sliding window method or the width-w NAF window method, we also achieve about 10% faster parallelized scalar multiplication algorithms with SIMD operations. For the implementation on smart cards, we establish two fast parallelized scalar multiplication algorithms with SIMD resistant against side channel attacks.
Yasuyuki SAKAI Kouichi SAKURAI
The computational performance of cryptographic protocols using an elliptic curve strongly depends on the efficiency of the scalar multiplication. Some elliptic curve based cryptographic protocols, such as signature verification, require computation of multi scalar multiplications of kP+lQ, where P and Q are points on an elliptic curve. An efficient way to compute kP+lQ is to compute two scalar multiplications simultaneously, rather than computing each scalar multiplication separately. We introduce new efficient algorithms for simultaneous scalar multiplication on an elliptic curve. We also give a detailed analysis of the computational efficiency of our proposed algorithms.
Katsuyuki OKEYA Kouichi SAKURAI
We develop efficient precomputation methods of multi-scalar multiplication on ECC. We should recall that multi-scalar multiplication is required in some elliptic curve cryptosystems including the signature verification of ECDSA signature scheme. One of the known fast computation methods of multi-scalar multiplication is a simultaneous method. A simultaneous method consists of two stages; precomputation stage and evaluation stage. Precomputation stage computes points of precomputation, which are used at evaluation stage. Evaluation stage computes multi-scalar multiplication using precomputed points. In the evaluation stage of simultaneous methods, we can compute the multi-scalar multiplied point quickly because the number of additions is small. However, if we take a large window width, we have to compute an enormous number of points in precomputation stage. Hence, we have to compute an abundance of inversions, which have large computational amount. As a result, precomputation stage requires much time, as well known. Our proposed method reduces from O(22w) inversions to O(w) inversions for a window width w, using Montgomery trick. In addition, our proposed method computes uP and vQ first, then compute uP+vQ, where P,Q are elliptic points. This procedure enables us to remove unused points of precomputation. Compared with the method without Montgomery trick, our proposed method is 3.6 times faster in the case of the precomputation stage for simultaneous sliding window NAF method with window width w=3 and 160-bit scalars under the assumption that I/M=30, S/M=0.8, where I,M,S respectively denote computational amounts of inversion, multiplication and squaring on a finite field.
Hiroaki OGURO Tetsutaro KOBAYASHI
We introduce efficient algorithms for the τ-adic sliding window method, which is a scalar multiplication algorithm on Koblitz curves over F2m. The τ-adic sliding window method is divided into two parts: the precomputation part and the main computation part. Until now, there has been no efficient way to deal with the precomputation part; the required points of the elliptic curves were calculated one by one. We propose two fast algorithms for the precomputation part. One of the proposed methods decreases the cost of the precomputation part by approximately 30%. Since more points are calculated, the total cost of scalar multiplication is decreased by approximately 7.5%.
N. M. Alam CHOWDHURY Jun-ichi TAKADA Masanobu HIROSE
A novel formulation for the Scalar-field approach of Integral Equation formulation of the Measured Equation of Invariance (SIE-MEI) is derived from the scalar reciprocity relation to solve the scalar Helmholtz equation. The basics of this formulation are similar to IE-MEI method for the electromagnetic (EM) problem. The surface integral equation is derived from reciprocity relation and on-surface MEI postulates are used. As a result it generates a sparse linear system with the same number of unknowns as of Boundary Element Method (BEM) and keeps the merits in minimum storage memory requirements and CPU time consumption for computing the final matrix. IE-MEI method has been proposed for two-dimensional (2D) electromagnetic problem, but three-dimensional (3D) problem is very difficult to be extend. This scalar-field approach of IE-MEI method is identical to electromagnetic in 2D, but easily extended to the 3D scalar-field scattering problem contrary to EM problem. The numerical results of sphere and cube are verified with some rigorous or numerical solutions, which give excellent agreement.
The understanding of instruction set usage in typical DOS/Windows applications plays a very important role in designing high performance x86 compatible microprocessors. This paper presents the tools to such analysis, the analysis results, and their implications on the design of a RISC-based superscalar processor for efficient x86 instruction execution. The analyzed results are used to optimize the execution of frequently executed instructions and micro operations.
Yasuyuki SAKAI Kouichi SAKURAI
We discuss multidoubling methods for efficient elliptic scalar multiplication. The methods allows computation of 2k P directly from P without computing the intermediate points, where P denotes a randomly selected point on an elliptic curve. We introduce algorithms for elliptic curves with Montgomery form and Weierstrass form defined over finite fields with characteristic greater than 3 in terms of affine coordinates. These algorithms are faster than k repeated doublings. Moreover, we apply the algorithms to scalar multiplication on elliptic curves and analyze computational complexity. As a result of our implementation with respect to the Montgomery and Weierstrass forms in terms of affine coordinates, we achieved running time reduced by 28% and 31%, respectively, in the scalar multiplication of an elliptic curve of size 160-bit over finite fields with characteristic greater than 3.
Media processing has become one of the dominant computing workloads. In this context, SIMD instructions have been introduced in current processors to raise performance, often the main goal of microprocessor designers. Today, however, designers have become concerned with the power consumption, and in some cases low power is the main design goal (laptops). In this paper, we show that SIMD ISA extensions on a superscalar processor can be one solution to reduce power consumption and keeping a high performance level. We reduce the average power consumption by decreasing the number of instructions, the number of cache references, and using dynamic power management to transform the speedup in performance in power consumption reduction.
Katsuyuki OKEYA Kouichi SAKURAI
We present a scalar multiplication algorithm with recovery of the y-coordinate on a Montgomery-form elliptic curve over any non-binary field. The previous algorithms for scalar multiplication on a Montgomery form do not consider how to recover the y-coordinate. So although they can be applicable to certain restricted schemes (e.g. ECDH and ECDSA-S), some schemes (e.g. ECDSA-V and MQV) require scalar multiplication with recovery of the y-coordinate. We compare our proposed scalar multiplication algorithm with the traditional scalar multiplication algorithms (including Window-methods on the Weierstrass form), and discuss the Montgomery form versus the Weierstrass form in the performance of implementation with several techniques of elliptic curve cryptosystems (including ECES, ECDSA, and ECMQV). Our results clarify the advantage of the cryptographic usage of Montgomery-form elliptic curve in constrained environments such as mobile devices and smart cards.
Sun-Mo KIM Jung-Woo LEE Soo-Haeng LEE Sang-Bang CHOI
Cache memories are small fast memories used to temporarily hold the contents of main memory that are likely to be referenced by processors so as to reduce instruction and data access time. In study of cache performance, most of previous works have employed simulation-based methods. However, that kind of researches cannot precisely explain the obtained results. Moreover, when a new processor is designed, huge simulations must be performed again with several different parameters. This research classifies cache structures for superscalar processors into four types, and then represents analytical model of instruction fetch process for each cache type considering various kinds of architectural parameters such as the frequency of branch instructions in program, cache miss rate, cache miss penalty, branch misprediction frequency, and branch misprediction penalty, and etc. To prove the correctness of the proposed models, we performed extensive simulations and compared the results with the analytical models. Simulation results showed that the proposed model can estimate the expected instruction fetch rate accurately within 10% error in most cases. This paper shows that the increase of cache misses reduces the instruction fetch rate more severely than that of branch misprediction does. The model is also able to provide exact relationship between cache miss and branch misprediction for the instruction fetch analysis. The proposed model can explain the causes of performance degradation that cannot be uncovered by the simulation method only.
Elliptic curves Em: By2 = x3+Ax2+x are suitable for cryptographic use because fast addition operations can be defined over Em. In elliptic curve cryptosystems, encryption/decryption involves multiplying a point P on Em by a large integer n. In this paper, we propose a fast algorithm for computing such scalar multiplication over Em. The new algorithm requires fewer operations than previously proposed algorithms. As a result, elliptic curve cryptosystems based on Em can be speeded up by using the new algorithm.