Dear forum,
I'm testing the host-device bandwidth using the DAPL fabric and Intel MPI (Isend/Irecv/Wait). 1.5 GB of data is repeatedly sent back and forth. The initial results are:
host to device: ~5.6 GB/sec
device to host: ~5.8 GB/sec
Problem 1: The first send/receive appears to be extremely slow. Its bandwidth is:
host to device: ~2.6 GB/sec
device to host: ~2.5 GB/sec
I immediately thought of the Linux deferred memory allocation that Jim pointed out in this post, so I memset the array before the send/receive, but to little avail. So... is it caused by the overhead of Intel MPI's first send/receive?
Problem 2: When I increased the data size to 2 GB, the following message was displayed:
[mic_name]:SCM:3be5:19664b40: 9659192 us(9659192 us!!!): DAPL ERR reg_mr Cannot allocate memory
The program still completes without a problem, though. So what causes that error message?
Thanks for any advice.