In this letter, we present a novel single-precision floating-point multiply-accumulator (FNA-MAC) that achieves lower hardware resource usage, reduced computing latency, and improved computing accuracy for continuous dot product operations. By further fusing the normalization and alignment steps of the traditional FMA algorithm, the proposed architecture eliminates the first N-1 normalization and rounding operations for an N-point dot product, and preserves the precision of interim results with a significand bit width twice that of traditional methods. Normalization and rounding of the final result are performed at the cost of one additional multiply-add operation. The simulation results show that the improvement in computational accuracy is significant. Meanwhile, compared with a recently published FMA design, the proposed FNA-MAC reduces the slice look-up table/flip-flop resource usage and computing latency by 18% and 33.3%, respectively.
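The accuracy argument in the abstract (round once at the end instead of after every multiply-add) can be illustrated with a small software analogy. The sketch below is not the authors' hardware design: the function names are hypothetical, and Python's binary64 float is assumed as a stand-in for the wider interim significand, so it only approximates the behaviour described.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def dot_round_each_step(a, b):
    """Baseline: round the accumulator to single precision after every
    multiply-add, roughly as a chain of conventional FMA operations would."""
    acc = 0.0
    for x, y in zip(a, b):
        # The product of two binary32 values is exact in binary64.
        acc = to_f32(acc + to_f32(x) * to_f32(y))
    return acc


def dot_round_once(a, b):
    """Deferred rounding: keep the interim sum in a wider format (binary64
    here, an assumption) and round to single precision only once at the end."""
    acc = 0.0
    for x, y in zip(a, b):
        acc += to_f32(x) * to_f32(y)
    return to_f32(acc)


if __name__ == "__main__":
    # Four small terms, each below half an ulp of 1.0 in single precision.
    a = [1.0] + [2.0 ** -25] * 4
    b = [1.0] * 5
    print(dot_round_each_step(a, b) == 1.0)             # True
    print(dot_round_once(a, b) == 1.0 + 2.0 ** -23)     # True
```

In this input, per-step rounding discards every small term, while the deferred version retains their combined contribution of one single-precision ulp. Real workloads will show smaller or larger differences depending on the data.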
Min YUAN
Zhejiang University
Qianjian XING
Zhejiang University
Zhenguo MA
Zhejiang University
Feng YU
Zhejiang University
Yingke XU
Zhejiang University
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Min YUAN, Qianjian XING, Zhenguo MA, Feng YU, Yingke XU, "A Fused Continuous Floating-Point MAC on FPGA" in IEICE TRANSACTIONS on Fundamentals,
vol. E101-A, no. 9, pp. 1594-1598, September 2018, doi: 10.1587/transfun.E101.A.1594.
URL: https://global.ieice.org/en_transactions/fundamentals/10.1587/transfun.E101.A.1594/_p
@ARTICLE{e101-a_9_1594,
author={Min YUAN and Qianjian XING and Zhenguo MA and Feng YU and Yingke XU},
journal={IEICE TRANSACTIONS on Fundamentals},
title={A Fused Continuous Floating-Point MAC on FPGA},
year={2018},
volume={E101-A},
number={9},
pages={1594-1598},
keywords={},
doi={10.1587/transfun.E101.A.1594},
ISSN={1745-1337},
month={September},}
TY - JOUR
TI - A Fused Continuous Floating-Point MAC on FPGA
T2 - IEICE TRANSACTIONS on Fundamentals
SP - 1594
EP - 1598
AU - Min YUAN
AU - Qianjian XING
AU - Zhenguo MA
AU - Feng YU
AU - Yingke XU
PY - 2018
DO - 10.1587/transfun.E101.A.1594
JO - IEICE TRANSACTIONS on Fundamentals
SN - 1745-1337
VL - E101-A
IS - 9
JA - IEICE TRANSACTIONS on Fundamentals
Y1 - September 2018
ER -