Channel: Intel® Many Integrated Core Architecture

Can offload calls from a single process execute in parallel on the MIC at the same time?


Hi,

I have written a program like the one below:

#define CPU_THREADS 2
#define INPUTSIZE 2

#pragma omp parallel num_threads(CPU_THREADS)
{
    #pragma omp for
    for (int i = 0; i < INPUTSIZE; i++)
    {
        /* ... some code ... */

        /* j declared here so each host thread has its own copy */
        for (int j = 0; j < 100; j++)
        {
            /* ... some code ... */
            #pragma offload target(mic) in(/* ... */) out(/* ... */)
            {
                #pragma omp parallel num_threads(240)
                {
                    /* inner index renamed from i to k so it cannot
                       overwrite the outer loop's i */
                    #pragma omp for
                    for (int k = 0; k < 240; k++)
                    {
                        /* ... some code ... */
                    }
                }
            }
        }
    }
}

Case 1: I set CPU_THREADS = 2 and INPUTSIZE = 2. When I ran it, I expected to see two OFFLOAD_MAIN processes in top on the MIC, but only one OFFLOAD_MAIN was running. The execution time was more than 65 seconds, nearly the same as with CPU_THREADS = 1 and INPUTSIZE = 2.

Case 2: I set CPU_THREADS = 1 and INPUTSIZE = 1, and this time ran two instances of the executable at the same time. Now top on the MIC showed two OFFLOAD_MAIN processes, and the execution time was just over 50 seconds.

The total workload is the same in both cases, so why is there a 15-second difference? And why did I not see two OFFLOAD_MAIN processes in Case 1?

Can two offload calls made from a parallel region execute at the same time on the Xeon Phi?

Please help me out.
