Hi,
I am experiencing a severe performance loss when using multiple rails with Intel MPI 5.0 on the KNC with an mlx5 adapter (which has 2 ports). With Intel MPI 4.1 the performance was much better.
Let me give an example of the performance of our application (per KNC):
- Intel MPI 4.1, single-rail (I_MPI_DAPL_PROVIDER_LIST=ofa-v2-mlx5_0-1u): 220 Gflop/s
- Intel MPI 4.1, dual-rail (-IB I_MPI_OFA_ADAPTER_NAME=mlx5_0 I_MPI_OFA_NUM_PORTS=2): 270 Gflop/s
- Intel MPI 5.0, single-rail (I_MPI_DAPL_PROVIDER_LIST=ofa-v2-mlx5_0-1u): 220 Gflop/s
- Intel MPI 5.0, dual-rail (-IB I_MPI_OFA_ADAPTER_NAME=mlx5_0 I_MPI_OFA_NUM_PORTS=2): 150 Gflop/s
- Intel MPI 5.0, single-rail (-IB I_MPI_OFA_ADAPTER_NAME=mlx5_0 I_MPI_OFA_NUM_PORTS=1): 150 Gflop/s
With DAPL the performance is unchanged between v4.1 and v5.0, but apparently there is no way to use DAPL with dual-rail support. With OFA I got the best performance in v4.1 (270 Gflop/s), but with v5.0 it drops to 150 Gflop/s. In particular, v5.0 gives the same result for 1 or 2 ports, so the second port seems to bring no benefit.
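To put the numbers in perspective, here is a small Python sketch that computes the change of each run relative to the 4.1 single-rail result, and the OFA dual-rail change between versions. The Gflop/s values are the ones listed above; the dictionary keys and the helper name `relative_change` are only shorthand chosen for this illustration, not Intel MPI terminology.

```python
# Gflop/s per KNC, copied from the measurements listed above.
# The keys are shorthand labels for the runs, not Intel MPI option names.
results = {
    "4.1 DAPL single-rail": 220,
    "4.1 OFA dual-rail": 270,
    "5.0 DAPL single-rail": 220,
    "5.0 OFA dual-rail": 150,
    "5.0 OFA single-port": 150,
}


def relative_change(value, baseline):
    """Return the percentage change of value with respect to baseline."""
    return 100.0 * (value - baseline) / baseline


# Compare every run against the 4.1 single-rail DAPL result.
baseline = results["4.1 DAPL single-rail"]
for label, gflops in results.items():
    change = relative_change(gflops, baseline)
    print(f"{label:22s} {gflops:4d} Gflop/s  {change:+6.1f}%")

# Compare the same OFA dual-rail settings across the two MPI versions.
ofa_regression = relative_change(
    results["5.0 OFA dual-rail"], results["4.1 OFA dual-rail"]
)
print(f"OFA dual-rail, 5.0 vs 4.1: {ofa_regression:+.1f}%")
```

So dual-rail OFA gained roughly a fifth over single-rail DAPL in v4.1, while the same OFA settings in v5.0 end up well below the single-rail DAPL result.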
Is there anything I am overlooking in the documentation?
Thanks,
Simon