Channel: Intel® Many Integrated Core Architecture

Unexpected Performance for Separate Process of offload


Hi everyone,

 

I tried splitting the offload of axpy (y[i] = x[i] * a + y[i]) into separate steps: allocate memory for x and y on the coprocessor (Xeon Phi) and copy them over -> run the kernel on the coprocessor -> copy the result back from the coprocessor to the host (CPU) -> free the memory on the coprocessor.

I found that the allocate/copy step for x and y alone takes longer than the whole combined run (all steps done in a single offload with an inout clause for x and y).

Could anyone explain why this happens? Is there a better way to separate the offload steps? (The purpose of separating them is to collect the time of each step, not just for axpy but for other applications as well.)
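For reference, the four steps described above could be timed separately along the following lines. This is only a sketch and not the attached axpy.c: the function name axpy_phased, the phase_times struct, and the now_sec helper are hypothetical, and the pragmas assume the Intel compiler's offload extensions with alloc_if/free_if used to keep the buffers resident on the coprocessor between steps. The empty warm-up offload rests on the assumption that the first offload may include one-time device initialization; whether that accounts for the numbers in this post is not established here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Hypothetical container for the per-phase wall-clock times, in seconds. */
typedef struct {
    double warmup, copy_in, kernel, copy_out, release;
} phase_times;

/* Wall-clock time in seconds from a monotonic clock. */
static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + 1e-9 * (double)ts.tv_nsec;
}

/* axpy with each offload step timed on its own.
 * The offload pragmas are Intel compiler extensions; a compiler that does
 * not know them ignores them and the loop simply runs on the host. */
phase_times axpy_phased(int n, float a, float *x, float *y)
{
    phase_times t;
    double t0;

    assert(n >= 0 && x != 0 && y != 0);

    /* Assumption: an empty offload absorbs any one-time initialization,
     * so that cost is not counted as copy time. */
    t0 = now_sec();
    #pragma offload target(mic:0)
    { }
    t.warmup = now_sec() - t0;

    /* Step 1: allocate x and y on the coprocessor, copy in, keep allocated. */
    t0 = now_sec();
    #pragma offload_transfer target(mic:0) in(x, y : length(n) alloc_if(1) free_if(0))
    t.copy_in = now_sec() - t0;

    /* Step 2: run the kernel on the resident buffers, no array transfer. */
    t0 = now_sec();
    #pragma offload target(mic:0) nocopy(x, y : length(n) alloc_if(0) free_if(0))
    {
        #pragma omp parallel for
        for (int i = 0; i < n; i++)
            y[i] = x[i] * a + y[i];
    }
    t.kernel = now_sec() - t0;

    /* Step 3: copy the result back to the host, buffers stay allocated. */
    t0 = now_sec();
    #pragma offload_transfer target(mic:0) out(y : length(n) alloc_if(0) free_if(0))
    t.copy_out = now_sec() - t0;

    /* Step 4: free the coprocessor buffers without transferring anything. */
    t0 = now_sec();
    #pragma offload_transfer target(mic:0) nocopy(x, y : length(n) alloc_if(0) free_if(1))
    t.release = now_sec() - t0;

    return t;
}
```

A caller would run `phase_times t = axpy_phased(n, a, x, y);` once and print each field next to the time of the single inout offload, so the warm-up cost can be compared against the copy step.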

The timing results follow. The attached file is axpy.c.

Thanks,

Jiawen

[liu@fornax Test_offomp]$ ./a.out
Checking for Intel(R) Xeon Phi(TM) (Target CPU) devices...
Number of Target devices installed: 2
Offload sections will execute on: Target CPU (offload mode)
Copy back to host successfully!
PASS axpy
Copy time = 0.01594615 sec
Kernel time = 0.00443697 sec
Free time = 0.00104403 sec
Total time for separate process = 0.02142906 sec
Total time for inout combined   = 0.01055193 sec

Attachment: axpy.c (text/x-csrc, 4.21 KB)
