Hi all,
I've got a weird problem. I wanted to test the GFLOPS performance of the Xeon Phis that are entrusted to me: 2x Xeon Phi 5110P and 1x Xeon Phi 7120. I read that the LINPACK benchmark ships with Intel's MKL libraries and that a Xeon Phi version is included, so I grabbed the binaries and ran them on my Xeon Phis.
On the 7120 (with mpss 3.3.2) the benchmark runs fine:
Thu Feb 12 16:58:54 CET 2015
Intel(R) Optimized LINPACK Benchmark data

Current date/time: Thu Feb 12 16:58:54 2015

CPU frequency: 1.238 GHz
Number of CPUs: 1
Number of cores: 244
Number of threads: 244

Parameters are set to:

Number of tests: 14
Number of equations to solve (problem size) : 2048 4096 6144 8192 10240 12288 14336 16384 18432 20480 22528 24576 26624 28672
Leading dimension of array : 2112 6208 6208 8256 10304 12352 14400 18496 18496 20544 22592 26688 26688 28736
Number of trials to run : 3 3 3 3 3 3 3 3 3 3 3 3 3 3
Data alignment value (in Kbytes) : 4 4 4 4 4 4 4 4 4 4 4 4 4 4

Maximum memory requested that can be used=6591927552, at the size=28672

Performance Summary (GFlops)

Size   LDA    Align.  Average   Maximal
2048   2112   4       62.4610   89.8029
4096   6208   4       254.9105  260.5183
6144   6208   4       399.6637  404.3374
8192   8256   4       484.3184  491.6444
10240  10304  4       577.4737  587.8460
12288  12352  4       639.3712  643.3008
14336  14400  4       696.0603  701.3388
16384  18496  4       744.9810  748.8416
18432  18496  4       788.7247  791.7044
20480  20544  4       818.3679  820.8570
22528  22592  4       846.7491  848.7561
24576  26688  4       868.7217  870.2109
26624  26688  4       884.2233  885.7552
28672  28736  4       896.8622  896.9412

Residual checks PASSED

End of test
However, on both 5110Ps (with mpss 3.4.2) the benchmark gets killed before it completes!
mic0 $ cd linpack/
mic0 $ export LD_LIBRARY_PATH=$PWD
mic0 $ ./runme_mic
This is a SAMPLE run script for SMP LINPACK. Change it to reflect
the correct number of CPUs/threads, problem input files, etc..
Fri Feb 13 10:01:12 CET 2015
Intel(R) Optimized LINPACK Benchmark data

Current date/time: Fri Feb 13 10:01:12 2015

CPU frequency: 1.053 GHz
Number of CPUs: 1
Number of cores: 240
Number of threads: 240

Parameters are set to:

Number of tests: 14
Number of equations to solve (problem size) : 2048 4096 6144 8192 10240 12288 14336 16384 18432 20480 22528 24576 26624 28672
Leading dimension of array : 2112 6208 6208 8256 10304 12352 14400 18496 18496 20544 22592 26688 26688 28736
Number of trials to run : 3 3 3 3 3 3 3 3 3 3 3 3 3 3
Data alignment value (in Kbytes) : 4 4 4 4 4 4 4 4 4 4 4 4 4 4

Maximum memory requested that can be used=6591927552, at the size=28672

=================== Timing linear equation system solver ===================

Size   LDA    Align.  Time(s)  GFlops    Residual      Residual(norm)  Check
2048   2112   4       0.596    9.6303    4.795780e-12  3.950479e-02    pass
2048   2112   4       0.073    78.7107   4.795780e-12  3.950479e-02    pass
2048   2112   4       0.074    77.8766   4.795780e-12  3.950479e-02    pass
4096   6208   4       0.214    214.2289  2.216840e-11  4.613649e-02    pass
4096   6208   4       0.203    225.7619  2.216840e-11  4.613649e-02    pass
4096   6208   4       0.204    224.5814  2.216840e-11  4.613649e-02    pass
6144   6208   4       0.457    338.6425  3.562570e-11  3.301736e-02    pass
6144   6208   4       0.445    347.2770  3.562570e-11  3.301736e-02    pass
6144   6208   4       0.446    346.9953  3.562570e-11  3.301736e-02    pass
8192   8256   4       0.900    407.1775  7.232445e-11  3.782865e-02    pass
8192   8256   4       0.869    421.7898  7.232445e-11  3.782865e-02    pass
8192   8256   4       0.867    422.8278  7.232445e-11  3.782865e-02    pass
10240  10304  4       1.449    494.0793  1.010026e-10  3.389721e-02    pass
10240  10304  4       1.373    521.5753  1.010026e-10  3.389721e-02    pass
10240  10304  4       1.371    522.2989  1.010026e-10  3.389721e-02    pass
12288  12352  4       2.241    552.0942  1.454923e-10  3.393283e-02    pass
12288  12352  4       2.184    566.5285  1.454923e-10  3.393283e-02    pass
12288  12352  4       2.185    566.1465  1.454923e-10  3.393283e-02    pass
14336  14400  4       3.313    592.9472  2.006193e-10  3.448820e-02    pass
14336  14400  4       3.228    608.5453  2.006193e-10  3.448820e-02    pass
14336  14400  4       3.224    609.3674  2.006193e-10  3.448820e-02    pass
16384  18496  4       4.621    634.5835  2.524725e-10  3.324476e-02    pass
16384  18496  4       4.462    657.1922  2.524725e-10  3.324476e-02    pass
16384  18496  4       4.461    657.3274  2.524725e-10  3.324476e-02    pass
./runme_mic: line 45: 5271 Killed ./xlinpack_$arch lininput_$arch
Done: Fri Feb 13 10:05:15 CET 2015
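As a rough sanity check on memory use, the sketch below estimates the matrix storage per test as N x LDA x 8 bytes. It assumes double-precision elements and ignores the right-hand-side vectors and any alignment padding; the function name footprint_bytes is made up for illustration.

```shell
#!/bin/sh
# Rough estimate of the matrix storage for one LINPACK test.
# Assumption: N x LDA double-precision (8-byte) elements, nothing else counted.
footprint_bytes() {
    # $1 = problem size N, $2 = leading dimension LDA
    echo $(( $1 * $2 * 8 ))
}

# Last size that completed on the 5110P, the next size, and the largest size.
for pair in 16384:18496 18432:18496 28672:28736; do
    n=${pair%%:*}
    lda=${pair##*:}
    bytes=$(footprint_bytes "$n" "$lda")
    echo "N=$n LDA=$lda -> $bytes bytes ($(( bytes / 1048576 )) MiB)"
done
```

For the largest test this estimate gives 6591348736 bytes, close to the 6591927552 bytes the benchmark reports as its maximum request. The run on the 5110P dies after the 16384 trials, so presumably while working on size 18432, which by this estimate needs about 2601 MiB for the matrix.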
How can I debug this? A 'gdb' run shows nothing useful; it only reports that all threads get killed. The "runme_mic" script comes from the MKL kit itself:
#!/bin/sh
[....]
echo "This is a SAMPLE run script for SMP LINPACK. Change it to reflect"
echo "the correct number of CPUs/threads, problem input files, etc.."

# Setting up affinity for better threading performance
export KMP_AFFINITY=explicit,granularity=fine,proclist=[1-$(($(cat /proc/cpuinfo|grep proc|wc -l)-1)),0]

arch=mic
{
  date
  ./xlinpack_$arch lininput_$arch
  echo -n "Done: "
  date
} | tee lin_$arch.txt
What is going wrong, and where should I look? I've tried binaries from both the Intel v14 and Intel v15 compilers.
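The shell prints "Killed" when a process dies from SIGKILL, and one common sender of SIGKILL is the kernel's out-of-memory killer. That is only a guess for this case, not a confirmed cause. A minimal sketch for checking it on the card right after the benchmark dies follows; the helper name find_oom_lines is hypothetical, and the patterns are the usual Linux kernel log phrases, which may differ on the card's kernel.

```shell
#!/bin/sh
# Filter kernel log text on stdin for typical OOM-killer traces.
# Assumption: the kernel logs phrases such as "Out of memory",
# "invoked oom-killer" or "Killed process <pid>".
find_oom_lines() {
    grep -i -E 'out of memory|oom-killer|killed process'
}

# Scan the kernel ring buffer; report when nothing matches.
dmesg 2>/dev/null | find_oom_lines || echo "no OOM traces found in dmesg"

# Show total and free memory on the card for comparison.
grep -E 'MemTotal|MemFree' /proc/meminfo 2>/dev/null
```

If the PID from the "Killed" line (5271 in the run above) shows up in the matches, the kill came from the kernel rather than from the benchmark itself.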