Insufficient Vectorization: A New Method to Exploit Superword Level Parallelism

Wei GAO; Lin HAN; Rongcai ZHAO; Yingying LI; Jian LIU

doi:10.1587/transinf.2016EDP7236

IEICE TRANSACTIONS on Information

Summary:

Single-instruction multiple-data (SIMD) extensions provide an energy-efficient platform for scaling the performance of media and scientific applications while still retaining post-programmability. However, the major challenge is to translate the parallel resources of SIMD hardware into real application performance. Currently, compilers use all the slots in the vector register when exploiting the SIMD parallelism of programs; this can be called sufficient vectorization, meaning that all the data in the vector register are valid. Because sufficient vectorization requires every slot of the vector register to be used, it forgoes opportunities to vectorize programs with low SIMD parallelism. In addition, fully using the vector register sometimes yields a smaller speedup than partially using it. In particular, as the vector registers provided by SIMD extensions become longer, sufficient vectorization cannot fully exploit the SIMD parallelism of programs. Therefore, an insufficient vectorization method, which refers to partial use of the vector register, is proposed. First, the scenarios to which insufficient vectorization applies are analyzed. Second, methods for computing the inter-iteration and intra-iteration SIMD parallelism of loops are put forward. Third, based on the relationship between this parallelism and the vector factor, a method is established for choosing the vectorization method so that programs are vectorized as effectively as possible. Finally, a code generation strategy for insufficient vectorization is presented. Benchmark results show that the insufficient vectorization method vectorizes 107.5% more programs than the sufficient vectorization method and achieves 12.1% higher performance.
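The summary describes insufficient vectorization only in prose. The C sketch below illustrates the idea under stated assumptions; it is not the authors' implementation. It emulates a 4-lane vector register with plain arrays, and the names VL, choose_method, scalar_loop and vector_loop are hypothetical. The example loop a[i + d] = a[i] + b[i] with d = 2 has a dependence distance of 2, so only two consecutive iterations can execute together: filling all four lanes (sufficient vectorization) would load a[i + 2] and a[i + 3] before they are written, whereas using two of the four lanes (insufficient vectorization) stays legal. choose_method is a simplified, hypothetical stand-in for the paper's choice between vectorization methods based on the parallelism and the vector factor.

```c
#include <assert.h>

#define VL 4 /* hypothetical: number of lanes in the vector register */

typedef enum { SCALAR, INSUFFICIENT, SUFFICIENT } vec_method;

/* Hypothetical selection rule (not the paper's exact algorithm):
 * compare the loop's SIMD parallelism P with the register length VL. */
vec_method choose_method(int parallelism, int vl, int *lanes) {
    if (parallelism >= vl) { *lanes = vl; return SUFFICIENT; }            /* every slot valid */
    if (parallelism >= 2)  { *lanes = parallelism; return INSUFFICIENT; } /* some slots unused */
    *lanes = 1;
    return SCALAR;                                                        /* no SIMD parallelism */
}

/* Reference scalar loop: a[i + d] = a[i] + b[i].
 * The flow dependence has distance d, so at most d consecutive
 * iterations can execute together. */
void scalar_loop(int *a, const int *b, int n, int d) {
    for (int i = 0; i + d < n; i++)
        a[i + d] = a[i] + b[i];
}

/* Emulated vector loop: each step loads `lanes` iterations into a
 * VL-wide "register", adds all VL slots, then stores back only the
 * active lanes. With lanes < VL the remaining slots hold don't-care
 * values, i.e. the register is only partially used. */
void vector_loop(int *a, const int *b, int n, int d, int lanes) {
    assert(lanes >= 1 && lanes <= VL);
    int trip = n - d;                 /* number of loop iterations */
    int i = 0;
    for (; i + lanes <= trip; i += lanes) {
        int va[VL] = {0}, vb[VL] = {0}, vr[VL];
        for (int k = 0; k < lanes; k++) { va[k] = a[i + k]; vb[k] = b[i + k]; } /* load active lanes */
        for (int k = 0; k < VL; k++) vr[k] = va[k] + vb[k];                        /* full-width add */
        for (int k = 0; k < lanes; k++) a[i + d + k] = vr[k];                      /* store active lanes */
    }
    for (; i < trip; i++)             /* scalar epilogue for leftover iterations */
        a[i + d] = a[i] + b[i];
}
```

For example, on 16-element arrays initialized to 1, vector_loop(a, b, 16, 2, 2) produces the same result as scalar_loop(a, b, 16, 2), while vector_loop(a, b, 16, 2, 4) does not. Real code generation would emit SIMD instructions that operate on part of a hardware register; the array emulation only keeps the sketch portable.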

Publication: IEICE TRANSACTIONS on Information Vol.E100-D No.1 pp.91-106

Publication Date: 2017/01/01

Publicized: 2016/09/29

Online ISSN: 1745-1361

DOI: 10.1587/transinf.2016EDP7236

Type of Manuscript: PAPER

Category: Software System

Authors

Wei GAO
  State Key Laboratory of Mathematical Engineering and Advanced Computing
Lin HAN
  State Key Laboratory of Mathematical Engineering and Advanced Computing
Rongcai ZHAO
  State Key Laboratory of Mathematical Engineering and Advanced Computing
Yingying LI
  State Key Laboratory of Mathematical Engineering and Advanced Computing
Jian LIU
  Nanjing University of Posts and Telecommunications

Keywords

SIMD extension, SIMD parallelism, vector register, insufficient vectorization

Cite this

Wei GAO, Lin HAN, Rongcai ZHAO, Yingying LI, Jian LIU, "Insufficient Vectorization: A New Method to Exploit Superword Level Parallelism" in IEICE TRANSACTIONS on Information, vol. E100-D, no. 1, pp. 91-106, January 2017, doi: 10.1587/transinf.2016EDP7236.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2016EDP7236/_p

@ARTICLE{e100-d_1_91,
author={Wei GAO and Lin HAN and Rongcai ZHAO and Yingying LI and Jian LIU},
journal={IEICE TRANSACTIONS on Information},
title={Insufficient Vectorization: A New Method to Exploit Superword Level Parallelism},
year={2017},
volume={E100-D},
number={1},
pages={91-106},
keywords={SIMD extension, SIMD parallelism, vector register, insufficient vectorization},
doi={10.1587/transinf.2016EDP7236},
ISSN={1745-1361},
month={January},}

TY - JOUR
TI - Insufficient Vectorization: A New Method to Exploit Superword Level Parallelism
T2 - IEICE TRANSACTIONS on Information
SP - 91
EP - 106
AU - Wei GAO
AU - Lin HAN
AU - Rongcai ZHAO
AU - Yingying LI
AU - Jian LIU
PY - 2017
DO - 10.1587/transinf.2016EDP7236
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E100-D
IS - 1
JA - IEICE TRANSACTIONS on Information
Y1 - January 2017
ER -
