Hi,
I have a simple C++ program that calls the LAPACK routines dpotrf (Cholesky decomposition), dgetrf and dgetri. I am seeing very strange behavior on a Xeon server with 6 Xeon Phi cards:
1) Performance:
For matrix size 12000x12000:
Running with "export MKL_MIC_ENABLE=1" is slower than running with MKL_MIC_ENABLE=0: 20 seconds vs. 11 seconds.
It seems that offloading to the Xeon Phi cards does not help here; it nearly doubles the run time.
2) Bug:
For Cholesky decomposition with matrix size 10000 x 10000 or 9500 x 9500 and MKL_MIC_ENABLE=1,
the program crashes with "Segmentation fault (core dumped)". The kernel log shows:
traps: inverse[10032] general protection ip:7fc3b27955c7 sp:7fffe2da28c0 error:0 in libmkl_intel_thread.so[7fc3b206b000+ffc000]
This problem happens with both MKL 11.2.1 and 11.3.0 (the newest, from Studio 2016).
Any help please?
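
For anyone who wants to reproduce this, here is a minimal sketch of the kind of test driver involved. It is an assumption for illustration, not the exact program that crashed. The names make_spd, reference_cholesky_lower and run_once are hypothetical. To keep the sketch self-contained, it uses a plain (slow) reference Cholesky with dpotrf-like arguments and return code. In the real test that call would be replaced by LAPACK's dpotrf with uplo = 'L' on the same column-major array.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Build an n x n symmetric positive definite matrix in column-major order
// (the layout LAPACK expects). Strict diagonal dominance with a positive
// diagonal guarantees positive definiteness.
std::vector<double> make_spd(int n) {
    assert(n > 0);
    std::vector<double> a(static_cast<std::size_t>(n) * n);
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < n; ++i)
            a[i + static_cast<std::size_t>(j) * n] =
                (i == j) ? n + 1.0 : 1.0 / (1.0 + std::abs(i - j));
    return a;
}

// Reference in-place Cholesky (A = L * L^T, lower triangle), standing in for
// dpotrf in this sketch. Returns 0 on success, or j+1 if the leading minor of
// order j+1 is not positive definite (the same convention as dpotrf's info).
int reference_cholesky_lower(int n, double* a, int lda) {
    const std::size_t ld = static_cast<std::size_t>(lda);
    for (int j = 0; j < n; ++j) {
        double d = a[j + j * ld];
        for (int k = 0; k < j; ++k) d -= a[j + k * ld] * a[j + k * ld];
        if (d <= 0.0) return j + 1;
        d = std::sqrt(d);
        a[j + j * ld] = d;
        for (int i = j + 1; i < n; ++i) {
            double s = a[i + j * ld];
            for (int k = 0; k < j; ++k) s -= a[i + k * ld] * a[j + k * ld];
            a[i + j * ld] = s / d;
        }
    }
    return 0;
}

// Largest absolute entry of A - L * L^T over the lower triangle.
double max_residual(int n, const std::vector<double>& a,
                    const std::vector<double>& l) {
    const std::size_t ld = static_cast<std::size_t>(n);
    double worst = 0.0;
    for (int j = 0; j < n; ++j)
        for (int i = j; i < n; ++i) {
            double s = 0.0;
            for (int k = 0; k <= j; ++k) s += l[i + k * ld] * l[j + k * ld];
            worst = std::fmax(worst, std::fabs(s - a[i + j * ld]));
        }
    return worst;
}

struct RunResult {
    int info;        // 0 on success, as with dpotrf
    double seconds;  // wall-clock time of the factorization only
    double residual; // max |A - L * L^T| on the lower triangle
};

// Factor one n x n test matrix, timing only the factorization call.
RunResult run_once(int n) {
    const std::vector<double> a = make_spd(n);
    std::vector<double> l = a;
    const auto t0 = std::chrono::steady_clock::now();
    const int info = reference_cholesky_lower(n, l.data(), n);
    const auto t1 = std::chrono::steady_clock::now();
    const double secs = std::chrono::duration<double>(t1 - t0).count();
    return {info, secs, info == 0 ? max_residual(n, a, l) : -1.0};
}
```

Calling run_once with a small size first (a few hundred) is a cheap way to confirm the matrix setup and the info check are correct before moving to the 9500, 10000 and 12000 sizes, where only the library routine is practical. The residual check is O(n^3) and should be skipped at those sizes.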