Channel: Intel® Many Integrated Core Architecture

Achieving peak on Xeon Phi


Hi,

I am on a quad-core Core i7 machine with an ASUS P9X79 WS motherboard and a Xeon Phi 3120A card installed.

The operating system is RHEL 6.4, with MPSS 3.1 for the Phi and Parallel Studio XE 2013 SP1 installed.

For reference, the Phi card has 57 cores and a double-precision peak of about 1003 GFlops.
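
As a rough sketch of where the 1003 GFlops figure comes from: peak is cores times clock times vector lanes times flops per FMA. The 1.1 GHz clock used here is an assumption (it is not stated in this post), as are the constant and function names.

```python
# Theoretical double-precision peak of the card, as a back-of-the-envelope check.
# ASSUMPTION: the 1.1 GHz core clock is not given in the post.
CORES = 57
CLOCK_GHZ = 1.1        # assumed core clock
DP_LANES = 8           # 512-bit vector register / 64-bit double
FLOPS_PER_FMA = 2      # one multiply plus one add

def peak_gflops(cores, clock_ghz=CLOCK_GHZ):
    """Peak GFlops if every core retires one vector FMA per cycle."""
    return cores * clock_ghz * DP_LANES * FLOPS_PER_FMA

print(f"{peak_gflops(CORES):.1f} GFlops")
```

With these numbers the result lands at about 1003 GFlops, matching the figure quoted above.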

I am seeing some performance issues that I don't understand.

When I time MKL's parallel DGEMM on the Phi card, it reaches 300 GFlops, which is about 30% of peak.

Note that I am doing native execution.

This does not match the performance posted at http://software.intel.com/en-us/intel-mkl/ (about 80% of peak).
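
To make the comparison concrete, here is a minimal sketch of how a DGEMM rate and its fraction of peak are usually computed, using the standard 2*m*n*k flop count for a matrix multiply. The function name is hypothetical; the 300 and 1003 GFlops values are the ones quoted in this post.

```python
# DGEMM rate and fraction of peak.
# dgemm_gflops is a hypothetical helper, not an MKL function.
def dgemm_gflops(m, n, k, seconds):
    """GFlops for C = A*B with A (m x k) and B (k x n): 2*m*n*k flops."""
    return 2.0 * m * n * k / seconds / 1e9

PEAK_GFLOPS = 1003.0      # peak quoted in the post
MEASURED_GFLOPS = 300.0   # DGEMM rate quoted in the post

fraction = MEASURED_GFLOPS / PEAK_GFLOPS
target = 0.80 * PEAK_GFLOPS   # what 80% of peak would be on this card

print(f"measured: {fraction:.1%} of peak")
print(f"80% of peak would be about {target:.0f} GFlops")
```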

So my first question is: is this difference solely because I am using a low-end Phi card, with the limitations that implies?

 

After seeing this, I wrote a test program in assembly language that tries to achieve peak.

The function is simple. It runs a loop of 25,000,000 iterations, and each iteration executes 30 independent FMA instructions, unrolled 8 times. Counting 2 flops per FMA on a vector of 8 doubles, the total per iteration is 30 x 8 x 2 x 8 = 3840 flops. This means I am doing 25,000,000 x 3840 floating-point operations without accessing any memory.
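
The flop count above, and the conversion from elapsed time to GFlops, can be sketched as follows. The variable and function names are illustrative only, and the reading of the factors (2 flops per FMA, 8 doubles per vector) is an assumption about what the 30x8x2x8 product means.

```python
# Flop accounting for the FMA test loop described above.
# ASSUMPTION: the factors are 30 FMAs x 8 unrolls x 2 flops/FMA x 8 doubles/vector.
ITERATIONS = 25_000_000
FMAS_PER_BLOCK = 30
UNROLL = 8
FLOPS_PER_FMA = 2
DP_LANES = 8

flops_per_iteration = FMAS_PER_BLOCK * UNROLL * FLOPS_PER_FMA * DP_LANES
total_flops = ITERATIONS * flops_per_iteration

def gflops(elapsed_seconds):
    """Achieved GFlops for one thread that ran the whole loop."""
    return total_flops / elapsed_seconds / 1e9

print(flops_per_iteration, total_flops)
```

For a multi-threaded run where every thread executes the full loop, the total would be multiplied by the thread count before dividing by the wall-clock time.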

If I run this code serially, I get 8.74 GFlops, which is essentially the single-thread peak (8.8 GFlops).

If I run it in parallel with 2 threads on 1 core, I get 17.4 GFlops, which is essentially the peak for 1 core (17.6 GFlops).

The problem is that if I run the same code with 2 threads per core on 56 cores (112 threads), I only get 89% of peak.

But if I run it with 4 threads per core, i.e. a total of 224 threads, I get 99% of peak, which is what I expect.
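
For scale, the two percentages can be turned into absolute rates. This sketch assumes "peak" here means the aggregate peak of the 56 cores actually used (56 x 17.6 GFlops), which the post does not state explicitly; the names are illustrative.

```python
# Absolute rates implied by the 89% and 99% figures.
# ASSUMPTION: "peak" refers to the 56 cores in use, not all 57 on the card.
PER_CORE_PEAK = 17.6   # GFlops per core, from the post
CORES_USED = 56

aggregate_peak = PER_CORE_PEAK * CORES_USED

def achieved_gflops(fraction_of_peak):
    """Absolute rate for a given fraction of the 56-core peak."""
    return fraction_of_peak * aggregate_peak

for threads_per_core, fraction in ((2, 0.89), (4, 0.99)):
    rate = achieved_gflops(fraction)
    print(f"{threads_per_core} threads/core: about {rate:.0f} of {aggregate_peak:.1f} GFlops")
```

Under that assumption, the gap between the two configurations is roughly 100 GFlops.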

So my second question is: even with no memory access at all, why do I need 4 threads per core to achieve peak?

Is there some other latency that we don't know about that gets hidden by running 4 threads per core?

Can someone please clarify?

Sorry for the long post, and thank you for reading.

