Montgomery modular multiplication is one of the most efficient algorithms for modular multiplication of large integers. On resource-constraint embedded processors, memory-access operations play an important role as arithmetic operations in the modular multiplication. To improve the efficiency of Montgomery modular multiplication on embedded processors, this paper concentrates on reducing the memory-access operations through adding a few working registers. We first revisit previous popular Montgomery modular multiplication algorithms, and then present improved algorithms for Montgomery modular multiplication and squaring for arbitrary prime fields. The algorithms adopt the general ideas of hybrid multiplication algorithm proposed by Gura and lazy doubling algorithm proposed by Lee. By careful optimization and redesign, we propose novel implementations for Montgomery multiplication and squaring called coarsely integrated product and operand hybrid scanning algorithm (CIPOHS) and coarsely integrated lazy doubling algorithm (CILD). Then, we implement the algorithms on general MIPS64 processor and OCTEON CN6645 processor equipped with specific multiply-add instructions. Experiments show that CIPOHS and CILD offer the best performance both on the general MIPS64 and OCTEON CN6645 processors. But the proposed algorithms have obvious advantages for the processors with specific multiply-add instructions such as OCTEON CN6645. When the modulus is 2048 bits, the CIPOHS and CILD outperform the CIOS algorithm by a factor of 47% and 58%, respectively.
Yang LI
Chinese Academy of Sciences
Jinlin WANG
Chinese Academy of Sciences
Xuewen ZENG
Chinese Academy of Sciences
Xiaozhou YE
Chinese Academy of Sciences
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Copy
Yang LI, Jinlin WANG, Xuewen ZENG, Xiaozhou YE, "Fast Montgomery Modular Multiplication and Squaring on Embedded Processors" in IEICE TRANSACTIONS on Communications,
vol. E100-B, no. 5, pp. 680-690, May 2017, doi: 10.1587/transcom.2016EBP3189.
Abstract: Montgomery modular multiplication is one of the most efficient algorithms for modular multiplication of large integers. On resource-constraint embedded processors, memory-access operations play an important role as arithmetic operations in the modular multiplication. To improve the efficiency of Montgomery modular multiplication on embedded processors, this paper concentrates on reducing the memory-access operations through adding a few working registers. We first revisit previous popular Montgomery modular multiplication algorithms, and then present improved algorithms for Montgomery modular multiplication and squaring for arbitrary prime fields. The algorithms adopt the general ideas of hybrid multiplication algorithm proposed by Gura and lazy doubling algorithm proposed by Lee. By careful optimization and redesign, we propose novel implementations for Montgomery multiplication and squaring called coarsely integrated product and operand hybrid scanning algorithm (CIPOHS) and coarsely integrated lazy doubling algorithm (CILD). Then, we implement the algorithms on general MIPS64 processor and OCTEON CN6645 processor equipped with specific multiply-add instructions. Experiments show that CIPOHS and CILD offer the best performance both on the general MIPS64 and OCTEON CN6645 processors. But the proposed algorithms have obvious advantages for the processors with specific multiply-add instructions such as OCTEON CN6645. When the modulus is 2048 bits, the CIPOHS and CILD outperform the CIOS algorithm by a factor of 47% and 58%, respectively.
URL: https://global.ieice.org/en_transactions/communications/10.1587/transcom.2016EBP3189/_p
Copy
@ARTICLE{e100-b_5_680,
author={Yang LI, Jinlin WANG, Xuewen ZENG, Xiaozhou YE, },
journal={IEICE TRANSACTIONS on Communications},
title={Fast Montgomery Modular Multiplication and Squaring on Embedded Processors},
year={2017},
volume={E100-B},
number={5},
pages={680-690},
abstract={Montgomery modular multiplication is one of the most efficient algorithms for modular multiplication of large integers. On resource-constraint embedded processors, memory-access operations play an important role as arithmetic operations in the modular multiplication. To improve the efficiency of Montgomery modular multiplication on embedded processors, this paper concentrates on reducing the memory-access operations through adding a few working registers. We first revisit previous popular Montgomery modular multiplication algorithms, and then present improved algorithms for Montgomery modular multiplication and squaring for arbitrary prime fields. The algorithms adopt the general ideas of hybrid multiplication algorithm proposed by Gura and lazy doubling algorithm proposed by Lee. By careful optimization and redesign, we propose novel implementations for Montgomery multiplication and squaring called coarsely integrated product and operand hybrid scanning algorithm (CIPOHS) and coarsely integrated lazy doubling algorithm (CILD). Then, we implement the algorithms on general MIPS64 processor and OCTEON CN6645 processor equipped with specific multiply-add instructions. Experiments show that CIPOHS and CILD offer the best performance both on the general MIPS64 and OCTEON CN6645 processors. But the proposed algorithms have obvious advantages for the processors with specific multiply-add instructions such as OCTEON CN6645. When the modulus is 2048 bits, the CIPOHS and CILD outperform the CIOS algorithm by a factor of 47% and 58%, respectively.},
keywords={},
doi={10.1587/transcom.2016EBP3189},
ISSN={1745-1345},
month={May},}
Copy
TY - JOUR
TI - Fast Montgomery Modular Multiplication and Squaring on Embedded Processors
T2 - IEICE TRANSACTIONS on Communications
SP - 680
EP - 690
AU - Yang LI
AU - Jinlin WANG
AU - Xuewen ZENG
AU - Xiaozhou YE
PY - 2017
DO - 10.1587/transcom.2016EBP3189
JO - IEICE TRANSACTIONS on Communications
SN - 1745-1345
VL - E100-B
IS - 5
JA - IEICE TRANSACTIONS on Communications
Y1 - May 2017
AB - Montgomery modular multiplication is one of the most efficient algorithms for modular multiplication of large integers. On resource-constraint embedded processors, memory-access operations play an important role as arithmetic operations in the modular multiplication. To improve the efficiency of Montgomery modular multiplication on embedded processors, this paper concentrates on reducing the memory-access operations through adding a few working registers. We first revisit previous popular Montgomery modular multiplication algorithms, and then present improved algorithms for Montgomery modular multiplication and squaring for arbitrary prime fields. The algorithms adopt the general ideas of hybrid multiplication algorithm proposed by Gura and lazy doubling algorithm proposed by Lee. By careful optimization and redesign, we propose novel implementations for Montgomery multiplication and squaring called coarsely integrated product and operand hybrid scanning algorithm (CIPOHS) and coarsely integrated lazy doubling algorithm (CILD). Then, we implement the algorithms on general MIPS64 processor and OCTEON CN6645 processor equipped with specific multiply-add instructions. Experiments show that CIPOHS and CILD offer the best performance both on the general MIPS64 and OCTEON CN6645 processors. But the proposed algorithms have obvious advantages for the processors with specific multiply-add instructions such as OCTEON CN6645. When the modulus is 2048 bits, the CIPOHS and CILD outperform the CIOS algorithm by a factor of 47% and 58%, respectively.
ER -