Hi,
I have a strange issue with asynchronous offload. I have implemented the asynchronous offload as follows:
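! zelectron must already be allocated on the coprocessor (alloc_if(.true.)
! free_if(.false.)), because every clause below uses alloc_if(.false.)
sigin2=2                ! tag linking OFFLOAD_TRANSFER to OFFLOAD_WAIT
MAX_BLOCK=2             ! number of column blocks on the MIC side
be_mic=me/2             ! end column of the part of zelectron split into MIC blocks
t1_all=get_time()
do it=1,iter
   ! Start the asynchronous CPU-to-MIC copy of block i+1 (= 2)
   i=1
   t1=get_time()
   bs2= BLOCK_LOW(i+1,MAX_BLOCK,be_mic)
   be2= BLOCK_HIGH(i+1,MAX_BLOCK, be_mic)
!DIR$ OFFLOAD_TRANSFER target(mic : mic_id) mandatory signal(sigin2) &
!DIR$ in(zelectron(:,bs2:be2): alloc_if(.false.) free_if(.false.))
   hots(3) = hots(3)+get_time()-t1          ! time to start the transfer
   ! Run the kernel on block i (= 2) using the copy already resident on the MIC
   t1=get_time()
   i=2
   bs=BLOCK_LOW(i,MAX_BLOCK,be_mic)
   be=BLOCK_HIGH(i,MAX_BLOCK,be_mic)
!DIR$ OFFLOAD target(mic : mic_id) mandatory &
!DIR$ nocopy(zelectron : alloc_if(.false.) free_if(.false.))
   call kernel(bs,be,zelectron,me,nparame)
   hots(4) = hots(4)+get_time()-t1          ! time for the synchronous offload
   ! Block until the transfer tagged sigin2 has completed
   t1=get_time()
!DIR$ OFFLOAD_WAIT target(mic : mic_id) mandatory wait(sigin2)
   hots(5) = hots(5)+get_time()-t1          ! time spent waiting
enddo ! it=1,iter
hots(2) = hots(2)+get_time()-t1_all         ! total loop time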
In the code above, zelectron is a single-precision REAL array of shape (nparame, me), with nparame=8, me=1000000 and iter=1000.
The array hots accumulates the time spent in each part of the loop.
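BLOCK_LOW and BLOCK_HIGH are not shown in the post. A minimal sketch of what they might look like, assuming 1-based indices and an even split of n columns into p blocks (the module name block_decomp and these function bodies are hypothetical):

```fortran
! Hypothetical 1-based block decomposition; the post does not show how
! BLOCK_LOW and BLOCK_HIGH are defined, so these bodies are assumptions.
module block_decomp
  implicit none
contains
  ! First column of block id (1..p) when n columns are split into p blocks
  pure integer function block_low(id, p, n)
    integer, intent(in) :: id, p, n
    block_low = (id - 1) * n / p + 1
  end function block_low

  ! Last column of block id (1..p); block p always ends at column n
  pure integer function block_high(id, p, n)
    integer, intent(in) :: id, p, n
    block_high = id * n / p
  end function block_high
end module block_decomp
```

Under these definitions, block_low(2, 2, 500000) is 250001 and block_high(2, 2, 500000) is 500000. Whatever the real definitions are, the transfer (index i+1 with i=1) and the kernel (index i=2) pass the same arguments (2, MAX_BLOCK, be_mic), so both address the same column range.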
In the main loop, zelectron(:, bs2:be2) is transferred asynchronously from the CPU to the MIC
while zelectron(:, bs:be) is being processed on the MIC.
After the computation, the OFFLOAD_WAIT directive blocks until the data transfer has finished.
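The overlap described above can be written as a simple per-iteration timing model. This is only a sketch: it ignores the cost of starting the transfer and launching the offload, and the module and function names are hypothetical:

```fortran
! Hypothetical timing model for one loop iteration: ideally the transfer
! of the next block is hidden behind the kernel running on the current one.
! Launch and initiation overheads are ignored.
module overlap_model
  implicit none
contains
  ! Expected iteration time if the transfer and the kernel run concurrently
  pure real function t_overlapped(t_transfer, t_compute)
    real, intent(in) :: t_transfer, t_compute
    t_overlapped = max(t_transfer, t_compute)
  end function t_overlapped

  ! Expected iteration time if the transfer finishes before the kernel starts
  pure real function t_serialized(t_transfer, t_compute)
    real, intent(in) :: t_transfer, t_compute
    t_serialized = t_transfer + t_compute
  end function t_serialized
end module overlap_model
```

For example, a transfer costing 1.0 hidden behind a kernel costing 3.8 should give about 3.8 in total (t_overlapped), not 4.8 (t_serialized).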
After the run, hots holds the following values:
hots(2) = 4.8   (whole loop)
hots(3) = 0.99  (starting OFFLOAD_TRANSFER)
hots(4) = 3.81  (OFFLOAD running kernel)
hots(5) = 0.0   (OFFLOAD_WAIT)
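An OFFLOAD_TRANSFER with a signal clause is expected to return almost immediately, yet it accounts for 0.99 of the 4.8 total, while OFFLOAD_WAIT has nothing left to wait for. It seems the asynchronous offload doesn't work, but why? Is there something wrong in the code?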
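For scale, the data moved by each OFFLOAD_TRANSFER can be estimated. A rough sketch, assuming each block holds be_mic/MAX_BLOCK = 250000 columns, default REAL is 4 bytes, get_time returns seconds, and hots(3) is spread evenly over iter = 1000 iterations (the module and function names are hypothetical):

```fortran
! Hypothetical helpers for a back-of-the-envelope size/rate estimate.
module transfer_estimate
  implicit none
contains
  ! Bytes in zelectron(:, bs2:be2) for a block of ncols columns of
  ! default (single-precision) REAL; storage_size returns bits
  pure integer function block_bytes(nparame, ncols)
    integer, intent(in) :: nparame, ncols
    block_bytes = nparame * ncols * (storage_size(0.0) / 8)
  end function block_bytes

  ! Rate in GB/s implied if total_time (e.g. hots(3), assumed to be in
  ! seconds) were spent moving `bytes` once per iteration
  pure real function implied_gbps(bytes, total_time, iters)
    integer, intent(in) :: bytes, iters
    real, intent(in) :: total_time
    implied_gbps = real(bytes) * real(iters) / total_time / 1.0e9
  end function implied_gbps
end module transfer_estimate
```

With these assumptions, block_bytes(8, 250000) gives 8000000 bytes per transfer, and implied_gbps(8000000, 0.99, 1000) gives about 8.08 GB/s.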
The compiler is Intel(R) 64, Version 15.0.3.187 Build 20150407, and the code runs on a Xeon Phi 7120P.
Could you please help me with this issue?
Thanks and regards,
Shaohua