HPL benchmark performance on a host plus one MIC card reaches only 154 GFLOPS. The host system has 102 GB of memory. The theoretical peak is 1.2 TFLOPS + 256 GFLOPS ≈ 1.46 TFLOPS. How can I optimize the HPL performance? I used offload execution with the executable xhpl_offload_intel64. When I run the HPL benchmark on the host alone, I achieve 92% of peak performance. I am attaching all the files I am using. Awaiting your quick reply.
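To put the numbers side by side, here is a minimal sketch of the HPL efficiency arithmetic (efficiency = measured Rmax / theoretical Rpeak). It assumes that 256 GFLOPS is the host's peak and 1.2 TFLOPS is the MIC card's peak, and that the 92% host-only figure is relative to the 256 GFLOPS host peak. The constant and function names are illustrative only.

```python
# Minimal sketch: HPL efficiency = measured Rmax / theoretical Rpeak.
# Assumption: 256 GFLOPS is the host-only peak and 1.2 TFLOPS is the MIC card's peak.

HOST_PEAK_GFLOPS = 256.0   # assumed host-only theoretical peak
MIC_PEAK_GFLOPS = 1200.0   # assumed theoretical peak of one MIC card


def efficiency(rmax_gflops: float, rpeak_gflops: float) -> float:
    """Return HPL efficiency as a fraction of theoretical peak."""
    if rpeak_gflops <= 0:
        raise ValueError("rpeak must be positive")
    return rmax_gflops / rpeak_gflops


combined_peak = HOST_PEAK_GFLOPS + MIC_PEAK_GFLOPS   # host + card peak
offload_eff = efficiency(154.0, combined_peak)        # measured offload run
host_only_rmax = 0.92 * HOST_PEAK_GFLOPS              # 92% host-only result

print(f"Combined Rpeak: {combined_peak:.0f} GFLOPS")
print(f"Offload efficiency: {offload_eff:.1%}")
print(f"Host-only Rmax: {host_only_rmax:.1f} GFLOPS")
```

Under these assumptions the offload run reaches roughly 10.6% of the combined peak, and its 154 GFLOPS is below the roughly 235.5 GFLOPS the host achieves alone, which suggests the card is slowing the run down rather than adding to it.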