Channel: Intel® Many Integrated Core Architecture

Intel MKL linpack benchmark gets killed on Xeon Phi

Hi all,

I've got a weird problem: I wanted to test the GFLOPS performance of the Xeon Phis that are entrusted to me: 2x Xeon Phi 5110P and 1x Xeon Phi 7120. I read that the LINPACK benchmark is included in Intel's MKL libraries, along with a Xeon Phi version. So I grabbed the binaries and ran them on my Xeon Phis.

On the 7120 (with mpss 3.3.2) the benchmark runs fine:

Thu Feb 12 16:58:54 CET 2015
Intel(R) Optimized LINPACK Benchmark data

Current date/time: Thu Feb 12 16:58:54 2015

CPU frequency:    1.238 GHz
Number of CPUs: 1
Number of cores: 244
Number of threads: 244

Parameters are set to:

Number of tests: 14
Number of equations to solve (problem size) : 2048  4096  6144  8192  10240 12288 14336 16384 18432 20480 22528 24576 26624 28672
Leading dimension of array                  : 2112  6208  6208  8256  10304 12352 14400 18496 18496 20544 22592 26688 26688 28736
Number of trials to run                     : 3     3     3     3     3     3     3     3     3     3     3     3     3     3
Data alignment value (in Kbytes)            : 4     4     4     4     4     4     4     4     4     4     4     4     4     4

Maximum memory requested that can be used=6591927552, at the size=28672
Performance Summary (GFlops)

Size   LDA    Align.  Average  Maximal
2048   2112   4       62.4610  89.8029
4096   6208   4       254.9105 260.5183
6144   6208   4       399.6637 404.3374
8192   8256   4       484.3184 491.6444
10240  10304  4       577.4737 587.8460
12288  12352  4       639.3712 643.3008
14336  14400  4       696.0603 701.3388
16384  18496  4       744.9810 748.8416
18432  18496  4       788.7247 791.7044
20480  20544  4       818.3679 820.8570
22528  22592  4       846.7491 848.7561
24576  26688  4       868.7217 870.2109
26624  26688  4       884.2233 885.7552
28672  28736  4       896.8622 896.9412

Residual checks PASSED

End of test

 

However, on both 5110Ps (with mpss 3.4.2) the benchmark gets killed before it completes!

mic0 $ cd linpack/
mic0 $ export LD_LIBRARY_PATH=$PWD
mic0 $ ./runme_mic
This is a SAMPLE run script for SMP LINPACK. Change it to reflect
the correct number of CPUs/threads, problem input files, etc..
Fri Feb 13 10:01:12 CET 2015
Intel(R) Optimized LINPACK Benchmark data

Current date/time: Fri Feb 13 10:01:12 2015

CPU frequency:    1.053 GHz
Number of CPUs: 1
Number of cores: 240
Number of threads: 240

Parameters are set to:

Number of tests: 14
Number of equations to solve (problem size) : 2048  4096  6144  8192  10240 12288 14336 16384 18432 20480 22528 24576 26624 28672
Leading dimension of array                  : 2112  6208  6208  8256  10304 12352 14400 18496 18496 20544 22592 26688 26688 28736
Number of trials to run                     : 3     3     3     3     3     3     3     3     3     3     3     3     3     3
Data alignment value (in Kbytes)            : 4     4     4     4     4     4     4     4     4     4     4     4     4     4

Maximum memory requested that can be used=6591927552, at the size=28672

=================== Timing linear equation system solver ===================

Size   LDA    Align. Time(s)    GFlops   Residual     Residual(norm) Check
2048   2112   4      0.596      9.6303   4.795780e-12 3.950479e-02   pass
2048   2112   4      0.073      78.7107  4.795780e-12 3.950479e-02   pass
2048   2112   4      0.074      77.8766  4.795780e-12 3.950479e-02   pass
4096   6208   4      0.214      214.2289 2.216840e-11 4.613649e-02   pass
4096   6208   4      0.203      225.7619 2.216840e-11 4.613649e-02   pass
4096   6208   4      0.204      224.5814 2.216840e-11 4.613649e-02   pass
6144   6208   4      0.457      338.6425 3.562570e-11 3.301736e-02   pass
6144   6208   4      0.445      347.2770 3.562570e-11 3.301736e-02   pass
6144   6208   4      0.446      346.9953 3.562570e-11 3.301736e-02   pass
8192   8256   4      0.900      407.1775 7.232445e-11 3.782865e-02   pass
8192   8256   4      0.869      421.7898 7.232445e-11 3.782865e-02   pass
8192   8256   4      0.867      422.8278 7.232445e-11 3.782865e-02   pass
10240  10304  4      1.449      494.0793 1.010026e-10 3.389721e-02   pass
10240  10304  4      1.373      521.5753 1.010026e-10 3.389721e-02   pass
10240  10304  4      1.371      522.2989 1.010026e-10 3.389721e-02   pass
12288  12352  4      2.241      552.0942 1.454923e-10 3.393283e-02   pass
12288  12352  4      2.184      566.5285 1.454923e-10 3.393283e-02   pass
12288  12352  4      2.185      566.1465 1.454923e-10 3.393283e-02   pass
14336  14400  4      3.313      592.9472 2.006193e-10 3.448820e-02   pass
14336  14400  4      3.228      608.5453 2.006193e-10 3.448820e-02   pass
14336  14400  4      3.224      609.3674 2.006193e-10 3.448820e-02   pass
16384  18496  4      4.621      634.5835 2.524725e-10 3.324476e-02   pass
16384  18496  4      4.462      657.1922 2.524725e-10 3.324476e-02   pass
16384  18496  4      4.461      657.3274 2.524725e-10 3.324476e-02   pass
./runme_mic: line 45:  5271 Killed                  ./xlinpack_$arch lininput_$arch
Done: Fri Feb 13 10:05:15 CET 2015
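
As a rough cross-check of the memory figures in these logs, here is a minimal sketch that estimates the matrix footprint for a given problem size. It assumes the matrix is stored as LDA x N double-precision values (8 bytes each) and ignores any other buffers the benchmark allocates; `matrix_bytes` is a hypothetical helper name, not part of the MKL kit.

```shell
#!/bin/sh
# Hypothetical helper: estimate the bytes needed for an LDA x N matrix of
# 8-byte doubles. Counts the matrix only, not the benchmark's other buffers.
matrix_bytes() {
  n=$1
  lda=$2
  echo $((n * lda * 8))
}

# The last size that completed on the 5110P, the next size in the list
# (no result was printed for it), and the largest size in the input file.
for pair in "16384 18496" "18432 18496" "28672 28736"; do
  set -- $pair
  echo "N=$1 LDA=$2 bytes=$(matrix_bytes "$1" "$2")"
done
```

For the largest size this estimate comes out slightly below the 6591927552 bytes reported in the log, so the benchmark presumably allocates a little more than the matrix alone.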

 

How can I debug this? A 'gdb' run shows nothing useful; it just states that all threads get killed. The "runme_mic" script is from the MKL kit itself:

#!/bin/sh
[....]
echo "This is a SAMPLE run script for SMP LINPACK. Change it to reflect"
echo "the correct number of CPUs/threads, problem input files, etc.."

#    Setting up affinity for better threading performance
export KMP_AFFINITY=explicit,granularity=fine,proclist=[1-$(($(cat /proc/cpuinfo|grep proc|wc -l)-1)),0]

arch=mic
{
  date
  ./xlinpack_$arch lininput_$arch
  echo -n "Done: "
  date
} | tee lin_$arch.txt
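
To make the affinity line in that script easier to read, here is a minimal sketch of the string it builds. It assumes the `grep proc` pipeline counts one line per logical CPU in /proc/cpuinfo; `make_affinity` is a hypothetical helper that takes the count as an argument instead of reading it from the card.

```shell
#!/bin/sh
# Hypothetical helper: build the same KMP_AFFINITY value as runme_mic,
# given the number of logical CPUs (assumed to be what the
# /proc/cpuinfo pipeline in the script counts).
make_affinity() {
  ncpus=$1
  echo "explicit,granularity=fine,proclist=[1-$((ncpus - 1)),0]"
}

# 240 logical CPUs as reported on the 5110P, 244 as reported on the 7120.
make_affinity 240
make_affinity 244
```

With 240 logical CPUs this yields proclist=[1-239,0], i.e. logical CPUs 1 through 239 are listed first and CPU 0 comes last.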

What's going wrong? How can I debug this? I've tried it with binaries from both the Intel v14 and Intel v15 compilers.

 

