Channel: Intel® Many Integrated Core Architecture

Why does the Xeon Phi speed up more than 16 times when using Vector Instructions?


Hello,

My question arises from the following code:

float fa[128] __attribute__((align(64)));
float fb[128] __attribute__((align(64)));

for(j=0; j<100000000; j++)
{
    for(k=0; k<128; k++)
    {
        fa[k]=a*fa[k]+fb[k];
    }
}
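
For anyone who wants to reproduce the measurement, here is a minimal sketch of the same kernel as standalone functions. The names axpy_pass and run_kernel and the parameter reps are hypothetical, and the snippet above does not show how a, fa, and fb are initialized, so the caller has to pick those values.

```c
#include <stddef.h>

#define N 128

/* One pass of the inner loop: fa[k] = a*fa[k] + fb[k] for all N elements.
   (Hypothetical helper name; the loop body is taken from the snippet above.) */
void axpy_pass(float *fa, const float *fb, float a)
{
    for (size_t k = 0; k < N; k++) {
        fa[k] = a * fa[k] + fb[k];
    }
}

/* Repeat the pass reps times; the original outer loop uses 100000000.
   (reps is an assumed parameter so that shorter runs are possible.) */
void run_kernel(float *fa, const float *fb, float a, long reps)
{
    for (long j = 0; j < reps; j++) {
        axpy_pass(fa, fb, a);
    }
}
```

Calling run_kernel(fa, fb, a, 100000000) from main and timing the call should correspond to the loop nest above, assuming fa and fb are the two 64-byte-aligned arrays.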

When I compile it with icc and the -no-vec option, it takes about 124 s to complete; with auto-vectorization it needs only 1.5 s. That is a speedup of about 80x, even though the vector units can process only 16 floats at once.

Running the same code on an Intel Xeon E5-1620 v2 @ 3.70GHz takes 5.6 s with -no-vec and 1.5 s with auto-vectorization, a speedup of less than 4x.

All tests were done using only 1 core.

Why does the Xeon Phi speed up so much with vector instructions while the Xeon doesn't? Shouldn't the Xeon speed up 8 times, since its vector registers are 256 bits wide?
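
To make the comparison explicit, here is a small sketch of the arithmetic behind the numbers above. The function names speedup and float_lanes are hypothetical, and the 512-bit register width for the Xeon Phi is inferred from the 16 floats per operation mentioned earlier.

```c
/* Measured speedup: scalar (-no-vec) run time divided by vectorized run time. */
double speedup(double t_novec_s, double t_vec_s)
{
    return t_novec_s / t_vec_s;
}

/* Number of 32-bit floats that fit in one vector register of the given width. */
int float_lanes(int register_bits)
{
    return register_bits / 32;
}
```

With the timings above, speedup(124.0, 1.5) is roughly 82.7 against float_lanes(512) = 16 on the Xeon Phi, while speedup(5.6, 1.5) is roughly 3.7 against float_lanes(256) = 8 on the Xeon.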

