This paper studies the state-of-the-art software optimization methodology for symmetric cryptographic primitives on Pentium III and 4 processors. We aim at maximizing speed by considering the internal pipeline architecture of these processors. This is the first paper studying an optimization of ciphers on Prescott, a new core of Pentium 4. Our AES program with 128-bit key achieves 251 cycles/block on Pentium 4, which is, to our best knowledge, the fastest implementation of AES on Pentium 4. We also optimize SNOW2.0 keystream generator. Our program of SNOW2.0 runs at the rate of 2.75 µops/cycle on Pentium III, which seems the most efficient code ever made for a real-world cipher primitive. Our another interest is to optimize cryptographic primitives that essentially utilize 64-bit operations on Pentium processors. For the first example, the FOX128 block cipher, we propose a technique for speeding-up by interleaving two independent blocks using a register group separation. For another examples, we consider fast implementation of SHA512 and Whirlpool. It will be shown that the new SIMD instruction sets introduced in Pentium 4 excellently contribute to fast hashing of SHA512.
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Copy
Mitsuru MATSUI, Sayaka FUKUDA, "How to Maximize Software Performance of Symmetric Primitives on Pentium III and 4" in IEICE TRANSACTIONS on Fundamentals,
vol. E89-A, no. 1, pp. 2-10, January 2006, doi: 10.1093/ietfec/e89-a.1.2.
Abstract: This paper studies the state-of-the-art software optimization methodology for symmetric cryptographic primitives on Pentium III and 4 processors. We aim at maximizing speed by considering the internal pipeline architecture of these processors. This is the first paper studying an optimization of ciphers on Prescott, a new core of Pentium 4. Our AES program with 128-bit key achieves 251 cycles/block on Pentium 4, which is, to our best knowledge, the fastest implementation of AES on Pentium 4. We also optimize SNOW2.0 keystream generator. Our program of SNOW2.0 runs at the rate of 2.75 µops/cycle on Pentium III, which seems the most efficient code ever made for a real-world cipher primitive. Our another interest is to optimize cryptographic primitives that essentially utilize 64-bit operations on Pentium processors. For the first example, the FOX128 block cipher, we propose a technique for speeding-up by interleaving two independent blocks using a register group separation. For another examples, we consider fast implementation of SHA512 and Whirlpool. It will be shown that the new SIMD instruction sets introduced in Pentium 4 excellently contribute to fast hashing of SHA512.
URL: https://global.ieice.org/en_transactions/fundamentals/10.1093/ietfec/e89-a.1.2/_p
Copy
@ARTICLE{e89-a_1_2,
author={Mitsuru MATSUI, Sayaka FUKUDA, },
journal={IEICE TRANSACTIONS on Fundamentals},
title={How to Maximize Software Performance of Symmetric Primitives on Pentium III and 4},
year={2006},
volume={E89-A},
number={1},
pages={2-10},
abstract={This paper studies the state-of-the-art software optimization methodology for symmetric cryptographic primitives on Pentium III and 4 processors. We aim at maximizing speed by considering the internal pipeline architecture of these processors. This is the first paper studying an optimization of ciphers on Prescott, a new core of Pentium 4. Our AES program with 128-bit key achieves 251 cycles/block on Pentium 4, which is, to our best knowledge, the fastest implementation of AES on Pentium 4. We also optimize SNOW2.0 keystream generator. Our program of SNOW2.0 runs at the rate of 2.75 µops/cycle on Pentium III, which seems the most efficient code ever made for a real-world cipher primitive. Our another interest is to optimize cryptographic primitives that essentially utilize 64-bit operations on Pentium processors. For the first example, the FOX128 block cipher, we propose a technique for speeding-up by interleaving two independent blocks using a register group separation. For another examples, we consider fast implementation of SHA512 and Whirlpool. It will be shown that the new SIMD instruction sets introduced in Pentium 4 excellently contribute to fast hashing of SHA512.},
keywords={},
doi={10.1093/ietfec/e89-a.1.2},
ISSN={1745-1337},
month={January},}
Copy
TY - JOUR
TI - How to Maximize Software Performance of Symmetric Primitives on Pentium III and 4
T2 - IEICE TRANSACTIONS on Fundamentals
SP - 2
EP - 10
AU - Mitsuru MATSUI
AU - Sayaka FUKUDA
PY - 2006
DO - 10.1093/ietfec/e89-a.1.2
JO - IEICE TRANSACTIONS on Fundamentals
SN - 1745-1337
VL - E89-A
IS - 1
JA - IEICE TRANSACTIONS on Fundamentals
Y1 - January 2006
AB - This paper studies the state-of-the-art software optimization methodology for symmetric cryptographic primitives on Pentium III and 4 processors. We aim at maximizing speed by considering the internal pipeline architecture of these processors. This is the first paper studying an optimization of ciphers on Prescott, a new core of Pentium 4. Our AES program with 128-bit key achieves 251 cycles/block on Pentium 4, which is, to our best knowledge, the fastest implementation of AES on Pentium 4. We also optimize SNOW2.0 keystream generator. Our program of SNOW2.0 runs at the rate of 2.75 µops/cycle on Pentium III, which seems the most efficient code ever made for a real-world cipher primitive. Our another interest is to optimize cryptographic primitives that essentially utilize 64-bit operations on Pentium processors. For the first example, the FOX128 block cipher, we propose a technique for speeding-up by interleaving two independent blocks using a register group separation. For another examples, we consider fast implementation of SHA512 and Whirlpool. It will be shown that the new SIMD instruction sets introduced in Pentium 4 excellently contribute to fast hashing of SHA512.
ER -