Hello,
The question comes from the following code:
float fa[128] __attribute__((aligned(64)));
float fb[128] __attribute__((aligned(64)));

for (j = 0; j < 100000000; j++) {
    for (k = 0; k < 128; k++) {
        fa[k] = a * fa[k] + fb[k];
    }
}
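For completeness, here is the self-contained version I use to reproduce the timings; the main() wrapper, the initial values of a, fa and fb, and the clock() timing are my own additions, not part of the original snippet:

#include <stdio.h>
#include <time.h>

/* 64-byte alignment so the compiler can use aligned vector loads/stores */
float fa[128] __attribute__((aligned(64)));
float fb[128] __attribute__((aligned(64)));

int main(void)
{
    float a = 0.5f;   /* a < 1 keeps fa[k] bounded over many iterations */
    int j, k;

    for (k = 0; k < 128; k++) {
        fa[k] = 1.0f;
        fb[k] = 2.0f;
    }

    clock_t start = clock();
    for (j = 0; j < 100000000; j++) {
        for (k = 0; k < 128; k++) {
            fa[k] = a * fa[k] + fb[k];
        }
    }
    clock_t end = clock();

    /* Printing a result keeps the compiler from removing the loop entirely */
    printf("time: %.2f s, fa[0] = %f\n",
           (double)(end - start) / CLOCKS_PER_SEC, fa[0]);
    return 0;
}

I build it once with -no-vec and once with auto-vectorization enabled and compare the reported times.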
When I compile it with icc and the -no-vec option on the Xeon Phi it takes about 124 s to complete, and with auto-vectorization it only needs 1.5 s. That is a speedup of about 80x, even though the vector units can only process 16 floats at once.
Doing the same on an Intel Xeon E5-1620 v2 @ 3.70 GHz results in 5.6 s with -no-vec and 1.5 s with auto-vectorization.
All tests were done using only one core.
Why does the Xeon Phi speed up so much with vector instructions while the Xeon doesn't? Shouldn't the Xeon speed up by a factor of 8, since its vector registers are 256 bits wide (8 floats)?