Hello,
Here is an example that contains some features of a larger code I want to run on the MIC in offload mode. I am just beginning with offload mode.
It does not work and I do not understand why.
It is an OpenMP parallel region with several loops that I want to offload to the MIC; some of the data should stay on the MIC from one
offloaded loop to the next. Around the offloaded loops, there are some other instructions that stay on the host.
Here it is:
PROGRAM BASIC_OMP
!$ USE OMP_LIB
  IMPLICIT NONE
  INTEGER, PARAMETER :: N = 512
!dir$ attributes offload:mic:: U, V
  REAL(8), DIMENSION(:), ALLOCATABLE :: U, V
!dir$ attributes align:64 :: U, V
  REAL(8) :: LAMBDA = 0.0_8
  REAL(8) :: scal = 5.0_8 / 12.0_8
!
  INTEGER :: NBTHRDS, MYTHRD, I, IERR=0
!dir$ attributes offload:mic:: LAMBDA, scal, NBTHRDS, MYTHRD, I, IERR, N
#define ALLOC alloc_if(.true.)
#define FREE free_if(.true.)
#define RETAIN free_if(.false.)
#define REUSE alloc_if(.false.)
  write(6,'(A)') 'Starting Compute'
!$OMP PARALLEL DEFAULT (NONE) &
!$OMP SHARED (U, V) &
!$OMP SHARED (scal, IERR, LAMBDA) &
!$OMP PRIVATE(NBTHRDS, MYTHRD, I)
!
!$OMP MASTER
  write (6,'(A)') 'on CPU, step 1'
!$OMP END MASTER
!
!dir$ offload begin target(mic:0) in(U:length(N) ALLOC RETAIN) in(V:length(N) ALLOC RETAIN) in(scal)
!$OMP MASTER
  write (6,'(A)') 'on MIC, step 2'
!$OMP END MASTER
  NBTHRDS = 1
  MYTHRD = 0
!$ NBTHRDS = OMP_GET_NUM_THREADS ()
!$ MYTHRD = OMP_GET_THREAD_NUM ()
!$OMP MASTER
!$ WRITE (6,'(A,L3)') 'OMP_IN_PARALLEL : ', OMP_IN_PARALLEL ()
  WRITE (6,'(2(A,I5))') 'NBTHRDS = ', NBTHRDS, ' MYTHRD = ', MYTHRD
!$OMP END MASTER
!$OMP MASTER
  ALLOCATE (U(N), V(N), STAT = IERR)
  IF (IERR /= 0) THEN
    WRITE (6,'(A)') ' Allocation Problem U, V'
    WRITE (6,'(A,I8)') 'IERR = ', IERR
    STOP
  END IF
!$OMP END MASTER
!$OMP BARRIER
!$OMP DO PRIVATE (I) SCHEDULE(STATIC)
  DO I = 1, N
    U (I) = scal
    V (I) = 0.2_8 + REAL (I,8)
  END DO
!$OMP END DO
!$OMP MASTER
  write (6,'(A)') 'on MIC, end step 2'
!$OMP END MASTER
!dir$ end offload
!$OMP MASTER
  write (6,'(A)') 'on CPU, step 3, between offloads'
!$OMP END MASTER
  call flush (6)
!$OMP BARRIER
!dir$ offload begin target(mic:0) in(U:length(n) REUSE FREE) in(V:length(n) REUSE FREE) inout(LAMBDA)
!$OMP MASTER
  write (6,'(A)') 'on MIC, step 4'
  write (6,*) 'L ', LAMBDA
  write (6,*) 'U ', U(1)
  write (6,*) 'V ', V(1)
!$OMP END MASTER
!$OMP DO REDUCTION (+:LAMBDA) PRIVATE (I) SCHEDULE(STATIC)
  DO I = 1, N
    LAMBDA = LAMBDA + U (I) * V (I)
  END DO
!$OMP END DO
!$OMP MASTER
  write (6,'(A)') 'on MIC, end step 4'
  call flush (6)
!$OMP END MASTER
!$OMP BARRIER
!dir$ end offload
!$OMP MASTER
  write (6,'(A)') 'on CPU, step 5'
  write (6,'(A,E22.15)') 'LAMBDA = ', LAMBDA
!$OMP END MASTER
!$OMP END PARALLEL
  STOP
END PROGRAM BASIC_OMP
I compile and run it on a Xeon-based host with two MIC 5110P cards (I use only one):
ifort -O3 -openmp -g -traceback basic_omp.F90 -o basic_offload.out
On the host, I set the following environment variables:
export OMP_NUM_THREADS=2
export MIC_ENV_PREFIX=MIC
export MIC_OMP_NUM_THREADS=118
The execution gives me:
./basic_offload.out
Starting Compute
on CPU, step 1
on MIC, step 2
OMP_IN_PARALLEL : F
NBTHRDS = 1 MYTHRD = 0
on MIC, end step 2
on MIC, step 2
OMP_IN_PARALLEL : F
NBTHRDS = 1 MYTHRD = 0
Allocation Problem U, V
IERR = 151
on CPU, step 3, between offloads
offload error: process on the device 0 unexpectedly exited with code 0
What I do not understand:
- The code is compiled as an OpenMP-parallelized application, yet inside the OpenMP region, OMP_IN_PARALLEL tells me it is not in parallel.
- As a result, the number of threads is wrong and everything is printed twice (2 threads on the host). The allocation should be done by the master
thread only, but the allocation error suggests that both threads attempt it.
- Why does it not behave as I want it to?
N.B.: With the -no-offload option, everything works fine on the host.
Thank you in advance for your comments.
Regards,
Guy.