I have a system with the following configuration:
- Xeon Phi 31S1P (57 cores, 8 GB RAM) with MPSS 3.6 installed;
- Intel Core i7-3820 @ 3.6 GHz;
- Asus P9X79 WS (BIOS version 4802, the most recent);
- 8 GB RAM;
- Sapphire Radeon HD 7990 video card.
The operating system is Windows 10 x64.
I followed the Intel tutorial on automatically offloading work from MATLAB to the Phi ("Using Intel® Math Kernel Library with MathWorks* MATLAB* on Intel® Xeon Phi™ Coprocessor System").
Since the 31S1P has only 8 GB of RAM, I used set MKL_MIC_MAX_MEMORY=8G instead of the tutorial's set MKL_MIC_MAX_MEMORY=16G.
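In case it helps with the diagnosis, this is a minimal sketch of how the offload settings can be checked from inside the MATLAB session itself, to confirm that the variables set in the command prompt actually reached MATLAB. MKL_MIC_MAX_MEMORY is the variable mentioned above; MKL_MIC_ENABLE and OFFLOAD_REPORT are the other Intel MKL automatic-offload variables I am assuming are relevant here.

```matlab
% Print the offload-related environment variables as seen by this MATLAB session.
% The list of variable names is an assumption about what matters for automatic offload.
vars = {'MKL_MIC_ENABLE', 'MKL_MIC_MAX_MEMORY', 'OFFLOAD_REPORT'};
for k = 1:numel(vars)
    val = getenv(vars{k});      % getenv returns an empty char array if the variable is unset
    if isempty(val)
        val = '(not set)';
    end
    fprintf('%s = %s\n', vars{k}, val);
end
```

If a variable shows up as "(not set)", MATLAB was probably not started from the command prompt where the set commands were run.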
The MATLAB version is R2015a.
The problem I am experiencing is that a plain matrix multiplication runs significantly faster on the i7 alone than with offload to the Phi enabled:
A = rand(10000, 10000);
B = rand(10000, 10000);
tic
C = A*B;
toc
The Phi takes anywhere from 300 to 600 seconds, during which time the system is almost frozen. Sometimes micsmc shows a short spike on the Phi; many other times it shows nothing (it could be that work does happen on the Phi even when the interface shows nothing, but since the system is frozen, the display doesn't get refreshed).
On the other hand, when I run the LAPACK benchmark in native mode on the Phi (no MATLAB involved), everything goes as per the tutorial and I get 700+ GFLOPS.
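To put the MATLAB timings in the same units as the benchmark figure, here is a rough conversion. It assumes the usual 2*n^3 floating-point operation count for a dense n-by-n multiply, which is my assumption rather than something the tutorial states.

```matlab
% Rough throughput estimate for C = A*B with n-by-n matrices.
% Assumes 2*n^3 floating-point operations for a dense multiply.
n = 10000;
flop = 2 * n^3;                      % about 2e12 operations
elapsed = [300 600];                 % seconds, the range observed with offload enabled
gflops = flop ./ elapsed / 1e9;      % convert to GFLOPS
fprintf('%d s -> %.1f GFLOPS\n', [elapsed; gflops]);
```

That works out to roughly 3 to 7 GFLOPS, which is about two orders of magnitude below what the native benchmark reaches on the same card.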
Any idea what is going wrong?
Thanks!