Hello,
I am running OpenMP code that looks like this:

    #pragma omp parallel for default(none) shared(X, Y, V, H, W, N) private(i, x, y, Kx, Ky, initD, T)
    for (y = 0; y < H; y++) {
        for (x = 0; x < W; x++) {
            initD = aValue;
            for (i = 0; i < N; i++) {
                /* ..... Kx, Ky ... */
                /* ... X, Y .. */
            }
            V[x + y * W] = T;
        }
    }
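To make the loop easy to reproduce, here is a minimal compilable sketch of the same structure. The real inner-loop body (using Kx, Ky, X and Y) is elided in the post, so `compute_grid` and the `X[x] * Y[y] + i` term are hypothetical placeholders. The array lengths (X has W entries, Y has H entries, V has W * H entries) are an assumption taken from the offload clauses in this post.

```c
#include <assert.h>

/* Hypothetical sketch of the loop structure; the inner-loop body is a
   placeholder because the original computation is not shown.
   Assumed layout: X has W entries, Y has H entries, V has W * H entries. */
void compute_grid(float *X, float *Y, float *V, int W, int H, int N)
{
    int i, x, y;
    float T;

    assert(W > 0 && H > 0 && N >= 0);

    /* y is the parallel loop variable (implicitly private); every other
       variable is listed explicitly because of default(none). */
#pragma omp parallel for default(none) shared(X, Y, V, W, H, N) private(i, x, T)
    for (y = 0; y < H; y++) {
        for (x = 0; x < W; x++) {
            T = 0.0f;
            for (i = 0; i < N; i++)
                T += X[x] * Y[y] + (float)i;   /* placeholder computation */
            V[x + y * W] = T;                  /* row-major: row stride is W */
        }
    }
}
```

Each variable appears in exactly one data-sharing clause; listing the same variable twice (as in `shared(..., V, V, ...)`) is rejected by compilers such as GCC.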
Now I want to run it on the MIC card, but when I just add the line:
    #pragma offload target(mic) in(X:length(W)) in(Y:length(H)) out(V:length(W * H))
the performance drops dramatically!
What should I pay attention to? What changes do I have to make?
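One thing worth checking is whether the measured time includes one-time costs. The first offload may also pay for coprocessor setup, and every offload pays for copying X and Y in and V out. The sketch below, assuming the Intel offload pragma syntax used in this post, times an empty warm-up offload separately from the real one. It also adds `collapse(2)` so the x and y loops together expose W * H parallel iterations instead of only H, which may matter on a coprocessor with many hardware threads. `compute_grid_offload`, `now_seconds` and the placeholder inner-loop body are hypothetical. Compilers without offload support ignore `#pragma offload` (possibly with an "unknown pragma" warning) and run everything on the host.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Monotonic wall-clock time in seconds (POSIX clock_gettime). */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Hypothetical sketch: X has W entries, Y has H entries, V has W * H entries.
   warmup_s receives the time of an empty offload (one-time setup);
   kernel_s receives the time of the real offload, including the
   transfers of X and Y to the card and of V back to the host. */
void compute_grid_offload(float *X, float *Y, float *V, int W, int H, int N,
                          double *warmup_s, double *kernel_s)
{
    assert(W > 0 && H > 0 && N >= 0);

    double t0 = now_seconds();
    /* Empty offload: triggers coprocessor setup before the timed kernel. */
#pragma offload target(mic)
    {
    }
    double t1 = now_seconds();

    /* Same data clauses as in the post; collapse(2) merges the y and x
       loops into one parallel iteration space of W * H iterations. */
#pragma offload target(mic) in(X:length(W)) in(Y:length(H)) out(V:length(W * H))
#pragma omp parallel for collapse(2)
    for (int y = 0; y < H; y++)
        for (int x = 0; x < W; x++) {
            float T = 0.0f;
            for (int i = 0; i < N; i++)
                T += X[x] * Y[y] + (float)i;   /* placeholder computation */
            V[x + y * W] = T;
        }
    double t2 = now_seconds();

    *warmup_s = t1 - t0;
    *kernel_s = t2 - t1;
}
```

If the warm-up time dominates, the slowdown is likely setup cost rather than the loop itself; if the kernel time stays high, the data transfers and the amount of parallel work per offload are the next things to look at.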
Thank you!