Hi,
I have written a program like the one below:
#define CPU_THREADS 2
#define INPUTSIZE 2

#pragma omp parallel num_threads(CPU_THREADS)
{
    #pragma omp for
    for(i = 0; i < INPUTSIZE; i++)
    {
        ..................................
        ..............some code...............
        ..................................
        for(j = 0; j < 100; j++)
        {
            ......some code.......
            #pragma offload target(mic) in() out()
            {
                #pragma omp parallel num_threads(240)
                {
                    #pragma omp for
                    for(i = 0; i < 240; i++)
                    {
                        .....................some code................
                        ..........................................
                    }
                }
            }
        }
    }
}
Case 1: I set CPU_THREADS = 2 and INPUTSIZE = 2. When I ran it, I expected to see two OFFLOAD_MAIN processes in top on the MIC, but I found only one OFFLOAD_MAIN running. The execution time is 65+ sec, which is nearly the same as when I set CPU_THREADS = 1 and INPUTSIZE = 2.
Case 2: I set CPU_THREADS = 1 and INPUTSIZE = 1, and this time I ran two instances of the program in parallel. Now I can see two OFFLOAD_MAIN processes running in top on the MIC, and the execution time is 50+ sec.
In both cases the workload is the same. Why am I getting a 15 sec difference? Why couldn't I find two OFFLOAD_MAIN processes in Case 1?
Can we make two offload calls from a parallel region and have them execute at the same time on the Xeon Phi?
Please help me out.