
Xeon Phi asynchronous offload from host OpenMP parallel region




I am using Intel's offload pragmas in host OpenMP code. The code looks as follows:

    int s1 = f(a,b,c);

    /* Start the asynchronous offload; s1 is the signal tag. */
    #pragma offload signal(s1) in (...) out(x:len)
    {
        for (int i = 0; i < len; ++i)
        {
            x[i] = ...
        }
    }

    /* Host work that overlaps with the offload. */
    #pragma omp parallel default(shared)
    {
        #pragma omp for schedule(dynamic) nowait
        for (int i = 0; i < count; ++i)
        {
            /* code */
        }

        #pragma omp for schedule(dynamic)
        for (int j = 0; j < count2; ++j)
        {
            /* code */
        }
    }

    /* Wait for the offload tagged s1, then run this block. */
    #pragma offload wait(s1)
    {
        /* code */
    }

The code offloads the calculation of $x$ to the MIC. Meanwhile, the host keeps itself busy by running some OpenMP work on the CPU cores. The above code works as expected. However, the first offload pragma takes a long time and has become the bottleneck. Overall, it still pays off to offload the computation of $x$ to the MIC. One way I am trying to overcome this latency is as follows:

    

    int s1 = f(a,b,c);

    #pragma omp parallel default(shared)
    {
        /* One thread starts the offload; the others skip ahead. */
        #pragma omp single nowait
        {
            #pragma offload signal(s1) in (...) out(x:len)
            {
                for (int i = 0; i < len; ++i)
                {
                    x[i] = ...
                }
            }
        }

        #pragma omp for schedule(dynamic) nowait
        for (int i = 0; i < count; ++i)
        {
            /* code */
        }

        #pragma omp for schedule(dynamic)
        for (int j = 0; j < count2; ++j)
        {
            /* code */
        }
    }

    /* Wait for the offload tagged s1, then run this block. */
    #pragma offload wait(s1)
    {
        /* code */
    }

So this new code assigns one thread to do the offload while the other OpenMP threads are used for the other worksharing constructs. However, this code doesn't work. I get the following error message:

    device 1 does not have a pending signal for wait(0x1)

The offload report indicates that the above piece of code is the culprit. One temporary workaround is to use a constant as the signal, i.e. signal(0), which works. However, I need a more permanent solution. Can anyone shed light on what is going wrong in my code?
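To separate the threading layout from the offload itself, the `single nowait` pattern of the second version can be reproduced on the host alone. The following is only a minimal sketch under stated assumptions, not the original code: `fill_x` and `overlap_pattern` are hypothetical names, the offloaded loop is replaced by a plain host loop, and the loop bodies are placeholder sums.

```c
/* Hypothetical stand-in for the offloaded loop: fills x on the host. */
static void fill_x(double *x, int len)
{
    for (int i = 0; i < len; ++i)
        x[i] = 2.0 * i;
}

/* Runs the single-nowait layout and returns the sum of both host loops. */
long overlap_pattern(double *x, int len, int count, int count2)
{
    long work1 = 0, work2 = 0;

    #pragma omp parallel default(shared)
    {
        /* Exactly one thread runs this block; nowait lets the
           remaining threads move on to the loops immediately. */
        #pragma omp single nowait
        {
            fill_x(x, len);
        }

        #pragma omp for schedule(dynamic) reduction(+:work1) nowait
        for (int i = 0; i < count; ++i)
            work1 += i;          /* placeholder work */

        #pragma omp for schedule(dynamic) reduction(+:work2)
        for (int j = 0; j < count2; ++j)
            work2 += j;          /* placeholder work */
    }
    /* The implicit barrier at the end of the parallel region
       guarantees that x and both sums are complete here. */
    return work1 + work2;
}
```

Calling `overlap_pattern(x, len, count, count2)` from `main` and compiling with OpenMP enabled (for example `-fopenmp` with GCC) exercises the same thread layout without a coprocessor. Note that in this layout the thread that executes the `single` block is not necessarily the thread that continues after the parallel region.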

Thanks

