Channel: Intel® Many Integrated Core Architecture

packed store operation for vectorization on MIC


Hey everyone, I have a loop structure that looks like the following.

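do i = 1, natoms
   numneigh(i) = 0

   ! loop over all potential (candidate) neighbors k; natoms and npotential are assumed names
   do k = 1, npotential
      j = potential_neighbor(k)
      delx = x(j) - x(i); dely = y(j) - y(i); delz = z(j) - z(i)

      dr2 = delx**2 + dely**2 + delz**2

      ! dr2 is a squared distance, so compare it with the squared cutoff
      if (dr2 < rcut**2) then
         numneigh(i) = numneigh(i) + 1      ! running count of accepted neighbors
         neighbor(numneigh(i), i) = j       ! store j in the next free slot of atom i's list
      end if
   end do
end do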
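For comparison, here is a C sketch of one common way to restructure the inner loop above for a single atom i. It assumes structure-of-arrays `float` coordinates, an `int` candidate list, and that `rcut` is the plain (not squared) cutoff; the function name `build_neighbors_two_pass` and the arrays `cand`, `keep` and `neigh` are hypothetical. The distance test moves into its own pass with no dependence between iterations, so a compiler is free to vectorize it. The running count, which is what blocks vectorization, is isolated in a short compaction pass; that compaction is exactly the step a packed store performs in hardware.

```c
/* Hypothetical helper: builds the neighbor list of atom i from its
   candidate list cand[0..ncand-1].
   keep  : scratch array with room for ncand ints.
   neigh : output array with room for ncand ints.
   Returns the number of neighbors found. */
int build_neighbors_two_pass(int i, const int *cand, int ncand,
                             const float *x, const float *y, const float *z,
                             float rcut, int *keep, int *neigh)
{
    const float rcut2 = rcut * rcut;   /* compare squared distances */

    /* Pass 1: each iteration is independent, so this loop can be
       vectorized. keep[k] is 1 if candidate k is inside the cutoff. */
    for (int k = 0; k < ncand; ++k) {
        int j = cand[k];
        float dx = x[j] - x[i];
        float dy = y[j] - y[i];
        float dz = z[j] - z[i];
        keep[k] = (dx * dx + dy * dy + dz * dz < rcut2);
    }

    /* Pass 2: compaction. The write is unconditional; the slot is only
       "kept" (n advances) when keep[k] is 1, so rejected candidates are
       overwritten by the next one. n <= k, so neigh never overflows. */
    int n = 0;
    for (int k = 0; k < ncand; ++k) {
        neigh[n] = cand[k];
        n += keep[k];
    }
    return n;
}
```

For example, with atom 0 at the origin, candidates 1 to 19 placed at distances 1, 2, ..., 19 along x, and `rcut = 5.5`, the function returns 5 and `neigh` starts with 1, 2, 3, 4, 5.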

Now I am reading a paper discussing how this code can be implemented efficiently on a MIC. Clearly the above loop will not auto-vectorize, because of the loop-carried dependence in the two lines inside the if block: the slot where j gets stored depends on how many neighbors were accepted in earlier iterations. The paper says that an efficient way to vectorize appending to lists in this manner is to use a packed store. It further says: "For a SIMD register packed with req values, the result of a comparison with Rc+Rs is a W-bit mask, and a packed store writes a subset of indices to contiguous memory based upon the mask."

However, as a Chemical Engineering PhD, I don't really know what's going on here. I did a fair amount of searching on Google, but the information was a little over my head. Could anyone explain this concept further? And if possible, could you modify the above code with this procedure so that I have an example to look at?
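A sketch of what the quoted sentence describes, in C, for one atom i: load W = 16 candidate indices into a 512-bit register, compute their 16 squared distances at once, and compare against the squared cutoff. The comparison yields a 16-bit mask whose bit l is set if lane l is inside the cutoff. A packed store then copies only the lanes whose bit is set into consecutive memory, so the running count advances by the number of set bits once per 16 candidates instead of once per candidate. The vector path uses AVX-512's compress store (`_mm512_mask_compressstoreu_epi32`), the successor ISA to the original MIC instruction set; the first-generation Xeon Phi (Knights Corner) intrinsics have different names, so treat that path as an illustration of the idea rather than KNC code. The function names `pack_store` and `build_neighbors_packed`, the arrays `cand` and `neigh`, structure-of-arrays `float` coordinates, 32-bit `int` indices, and a GCC/Clang compiler (for `__builtin_popcount`) are all assumptions.

```c
#include <stdint.h>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

#define W 16  /* SIMD width: 16 x 32-bit lanes in a 512-bit register */

/* Scalar model of a packed (compress) store: for every set bit l of
   mask, copy src[l] to the next free slot of dst. Returns the count. */
static int pack_store(int *dst, uint32_t mask, const int *src)
{
    int n = 0;
    for (int l = 0; l < W; ++l)
        if (mask & (1u << l))
            dst[n++] = src[l];
    return n;
}

/* Hypothetical neighbor-list builder for atom i, W candidates at a time.
   neigh must have room for ncand ints. Returns the number of neighbors. */
int build_neighbors_packed(int i, const int *cand, int ncand,
                           const float *x, const float *y, const float *z,
                           float rcut, int *neigh)
{
    const float rcut2 = rcut * rcut;
    int n = 0, k = 0;

#if defined(__AVX512F__)
    const __m512 xi = _mm512_set1_ps(x[i]);
    const __m512 yi = _mm512_set1_ps(y[i]);
    const __m512 zi = _mm512_set1_ps(z[i]);
    const __m512 rc2 = _mm512_set1_ps(rcut2);
    for (; k + W <= ncand; k += W) {
        __m512i j = _mm512_loadu_si512(cand + k);           /* 16 indices */
        __m512 dx = _mm512_sub_ps(_mm512_i32gather_ps(j, x, 4), xi);
        __m512 dy = _mm512_sub_ps(_mm512_i32gather_ps(j, y, 4), yi);
        __m512 dz = _mm512_sub_ps(_mm512_i32gather_ps(j, z, 4), zi);
        __m512 r2 = _mm512_add_ps(_mm512_add_ps(_mm512_mul_ps(dx, dx),
                                                _mm512_mul_ps(dy, dy)),
                                  _mm512_mul_ps(dz, dz));
        __mmask16 m = _mm512_cmp_ps_mask(r2, rc2, _CMP_LT_OQ); /* W-bit mask */
        _mm512_mask_compressstoreu_epi32(neigh + n, m, j);     /* packed store */
        n += __builtin_popcount((unsigned)m);
    }
#endif

    /* Scalar path (and the remainder of the vector path): build the same
       W-bit mask lane by lane, then apply the packed-store model. */
    for (; k < ncand; k += W) {
        int lanes = (ncand - k < W) ? ncand - k : W;
        uint32_t mask = 0;
        for (int l = 0; l < lanes; ++l) {
            int j = cand[k + l];
            float dx = x[j] - x[i], dy = y[j] - y[i], dz = z[j] - z[i];
            if (dx * dx + dy * dy + dz * dz < rcut2)
                mask |= 1u << l;
        }
        n += pack_store(neigh + n, mask, cand + k);
    }
    return n;
}
```

When compiled without AVX-512 support (for example, without `-mavx512f` on GCC or Clang), only the scalar path is built; it produces the same list, which makes it a convenient reference for checking the vector path. Either way the neighbors come out in candidate order, just as in the original loop.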