Hi,
I am testing my MPI application on 2 KNCs attached to the same host CPU. The performance fluctuates strongly, by a factor of 10 or even more, for example between 10 and 160 Gflop/s per card. This variation occurs within a loop that does the same computation in every iteration. At 160 Gflop/s one loop iteration takes around 0.05 seconds, so the fluctuations happen on a timescale longer than that.
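For concreteness, the measurement looks roughly like this simplified sketch (compute_step() and the Allreduce stand in for my real kernel and communication; the problem size and flop count are placeholders, not my actual values):

#include <mpi.h>
#include <stdio.h>

#define N (1 << 22)                    /* placeholder problem size per rank */
static double a[N], b[N];

/* Placeholder for the real kernel: 2*N flops per call. */
static void compute_step(void)
{
    for (long i = 0; i < N; i++)
        a[i] = 2.5 * a[i] + b[i];
}

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    for (int it = 0; it < 100; it++) {
        MPI_Barrier(MPI_COMM_WORLD);            /* align ranks before timing */
        double t0 = MPI_Wtime();
        compute_step();                         /* identical work every iteration */
        double s = a[0];
        MPI_Allreduce(MPI_IN_PLACE, &s, 1, MPI_DOUBLE,
                      MPI_SUM, MPI_COMM_WORLD); /* stands in for the real communication */
        double t1 = MPI_Wtime();
        if (rank == 0)
            printf("iter %3d: %.4f s  %.1f Gflop/s\n",
                   it, t1 - t0, 2.0 * N / (t1 - t0) / 1e9);
    }
    MPI_Finalize();
    return 0;
}

It is this per-iteration rate that swings between roughly 10 and 160 Gflop/s.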
I am using:
I_MPI_FABRICS_LIST=dapl
I_MPI_DAPL_PROVIDER_LIST=ofa-v2-scif0
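The launch then looks roughly like this (mic0/mic1, the rank count, and ./my_app are placeholders for my actual setup):

mpirun -genv I_MPI_FABRICS_LIST dapl \
       -genv I_MPI_DAPL_PROVIDER_LIST ofa-v2-scif0 \
       -host mic0 -n 56 ./my_app : -host mic1 -n 56 ./my_app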
Observations:
- If I use the InfiniBand card instead (I_MPI_DAPL_PROVIDER_LIST=ofa-v2-mlx4_0-1u), still with both cards on the same host, the performance is consistent.
- If I reduce the number of cores used by my application, the performance becomes more stable: with 56 cores I still see fluctuations; with 48 cores it is mostly fine, though fluctuations are still visible.
- I did not observe anything peculiar with the "osu" bandwidth benchmark, apart from a "dip" at 8 kB. The dip can be reduced by changing I_MPI_DAPL_DIRECT_COPY_THRESHOLD, but that parameter shows no influence on my actual application. (A simplified ping-pong sweep follows after this list.)
- I tried two hardware setups: (1) a dual-socket server board, where (I think) the data has to pass through the southbridge and/or QPI(?), and (2) a system with a PLX PCIe switch. The fluctuations occur on both systems.
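The bandwidth behavior mentioned above can be checked with a plain two-rank ping-pong sweep like the following sketch (my simplification, not the actual OSU code; run with one rank per card):

#include <mpi.h>
#include <stdio.h>
#include <string.h>

#define REPS 1000
#define MAX_SIZE (64 * 1024)

int main(int argc, char **argv)
{
    static char buf[MAX_SIZE];
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    memset(buf, 1, sizeof buf);
    /* Sweep message sizes bracketing the 8 kB dip. */
    for (int size = 1024; size <= MAX_SIZE; size *= 2) {
        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < REPS; i++) {
            if (rank == 0) {
                MPI_Send(buf, size, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, size, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, size, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(buf, size, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        double t1 = MPI_Wtime();
        if (rank == 0)   /* round trip moves 2*size bytes per rep */
            printf("%6d B  %8.1f MB/s\n", size,
                   2.0 * size * REPS / (t1 - t0) / 1e6);
    }
    MPI_Finalize();
    return 0;
}

Running this with the same I_MPI_* settings as above makes it easy to see how I_MPI_DAPL_DIRECT_COPY_THRESHOLD shifts the dip.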
Is there anything wrong with my configuration? Is this a known issue? Any suggestions?
Thanks
Simon