I'm getting poor MPI_Barrier latency in a microbenchmark (the OSU barrier test) on this system configuration:
- multiple Xeon Phi coprocessors
- Intel MPSS 3.5 (April 2015), Linux
- Intel MPI 5.0 update 3
- OFED-3.12-1
export I_MPI_MIC=1
export I_MPI_DEBUG=5
export I_MPI_FABRICS=shm:dapl
export I_MPI_DAPL_PROVIDER=ofa-v2-scif0
export I_MPI_PIN_MODE=lib
export I_MPI_PIN_CELL=core

/opt/intel/impi/5.0.3.048/intel64/bin/mpirun -hosts mic0,mic1 -ppn 30 -n 60 ./exe

(( omitted many debug lines: DAPL and processor pinning are occurring correctly ))
[0] MPI startup(): I_MPI_DAPL_PROVIDER=ofa-v2-scif0
[0] MPI startup(): I_MPI_DEBUG=5
[0] MPI startup(): I_MPI_FABRICS=shm:dapl
[0] MPI startup(): I_MPI_MIC=1
[0] MPI startup(): I_MPI_PIN_MAPPING=30:0 1,1 9,2 17,3 25,4 33,5 41,6 49,7 57,8 65,9 73,10 81,11 89,12 97,13 105,14 113,15 121,16 129,17 137,18 145,19 153,20 161,21 169,22 177,23 185,24 193,25 201,26 209,27 217,28 225,29 0

# OSU MPI Barrier Latency Test
# Avg Latency(us)
1795.31
I'm not sure whether this latency is typical for the mpirun configuration I've used (two coprocessors, ppn=30).
Additional results (ppn=60, 120 PEs):
/opt/intel/impi/5.0.3.048/intel64/bin/mpirun -hosts mic0,mic1 -ppn 60 -n 120 ./exe

# OSU MPI Barrier Latency Test
# Avg Latency(us)
5378.48
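For context, the ./exe here is the OSU barrier latency test. A minimal sketch of the same measurement pattern (a warm-up phase, then a timed loop of MPI_Barrier calls averaged over all ranks) is shown below; the iteration counts are my own assumption, not necessarily OSU's exact defaults.

/* Minimal barrier-latency sketch in the spirit of osu_barrier.
 * Iteration counts are assumed values, not OSU's exact defaults. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    const int warmup = 1000, iters = 10000;
    int rank, size, i;
    double t_start, t_end, local_us, sum_us;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Warm up so connection setup does not skew the timing. */
    for (i = 0; i < warmup; i++)
        MPI_Barrier(MPI_COMM_WORLD);

    t_start = MPI_Wtime();
    for (i = 0; i < iters; i++)
        MPI_Barrier(MPI_COMM_WORLD);
    t_end = MPI_Wtime();

    /* Per-iteration barrier latency in microseconds on this rank. */
    local_us = (t_end - t_start) * 1e6 / iters;

    /* Average across all ranks, reported by rank 0. */
    MPI_Reduce(&local_us, &sum_us, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("# Avg Barrier Latency(us): %.2f over %d ranks\n",
               sum_us / size, size);

    MPI_Finalize();
    return 0;
}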