Hey everyone,
I'm working on a simple financial application (actually benchmarking CPU vs. MIC). The first version of the code uses no intrinsic functions (the compiler vectorizes the loops), and I wanted to try a version with intrinsics. Here is my problem: on the CPU, I see a performance gain of 30% with the __m256 intrinsics (vs. the CPU version without intrinsics), but on the MIC, the __m512 version (OpenMP + intrinsics) performs worse than the MIC version without intrinsics. Is this normal?
I cannot post the code because it is too big, but I can try to reproduce the problem with a small piece of code.
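To make the comparison concrete, here is a minimal sketch of the two kinds of loop I am comparing. It is not my real code: the kernel (a simple y = a*x + y update) and the names axpy_auto and axpy_intrin are hypothetical, chosen only for illustration. The intrinsics path is guarded by __AVX__ so the file still compiles, and falls back to the scalar loop, when AVX is not enabled. The 512-bit MIC variant is not shown.

```c
#include <stddef.h>
#ifdef __AVX__
#include <immintrin.h>
#endif

/* Version 1 (hypothetical kernel): plain loop, vectorization is left
 * entirely to the compiler. */
void axpy_auto(float a, const float *x, float *y, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Version 2 (hypothetical kernel): same computation written with 256-bit
 * AVX intrinsics, 8 floats per iteration, plus a scalar tail loop. */
void axpy_intrin(float a, const float *x, float *y, size_t n)
{
    size_t i = 0;
#ifdef __AVX__
    const __m256 va = _mm256_set1_ps(a);        /* broadcast a to 8 lanes */
    for (; i + 8 <= n; i += 8) {
        __m256 vx = _mm256_loadu_ps(x + i);     /* unaligned loads */
        __m256 vy = _mm256_loadu_ps(y + i);
        vy = _mm256_add_ps(_mm256_mul_ps(va, vx), vy);
        _mm256_storeu_ps(y + i, vy);
    }
#endif
    /* Remaining elements (all of them if AVX is not enabled). */
    for (; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

Both functions should produce the same y for the same inputs, so timing each one over a large array (inside the same OpenMP parallel region, if one is used) would show whether the intrinsics version actually beats the compiler-vectorized one on a given target.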
Thank you
GS