I am using GotoBLAS and MPICH to run HPL on a cluster (the CPU is an Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz). I have tried compiling GotoBLAS in two ways: (1) make and (2) make USE_THREAD=0 TARGET=NEHALEM. The library linked in the HPL makefile is libgoto.a. However, both builds give low HPL efficiency: only 150 GFlops, against a theoretical peak of 330 GFlops. Did I make a mistake in compiling GotoBLAS? Thanks for your answer.
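
For reference, here is a rough sketch of the arithmetic behind those numbers. The node layout (2 sockets with 8 cores each) and the figure of 8 double-precision flops per cycle per core (which holds only if the BLAS kernels use AVX) are assumptions, not measurements from the cluster; the function names are made up for illustration.

```python
def peak_gflops(sockets, cores_per_socket, ghz, flops_per_cycle):
    """Theoretical double-precision peak of one node, in GFlops."""
    return sockets * cores_per_socket * ghz * flops_per_cycle


def efficiency(measured_gflops, peak):
    """Fraction of the theoretical peak that HPL actually reached."""
    return measured_gflops / peak


# Assumed layout: 2 sockets x 8 cores at 2.6 GHz (not stated in the post).
# flops_per_cycle=8 assumes the BLAS kernels use AVX on this CPU.
assumed_peak = peak_gflops(sockets=2, cores_per_socket=8, ghz=2.6, flops_per_cycle=8)

# Figures quoted in the post: 150 GFlops measured, 330 GFlops peak.
quoted_efficiency = efficiency(150.0, 330.0)

print(f"assumed peak: {assumed_peak:.1f} GFlops")
print(f"efficiency vs. quoted peak: {quoted_efficiency:.1%}")
```

Under these assumptions the computed peak is close to the 330 GFlops quoted above, and the measured 150 GFlops is a little under half of it.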