Channel: Intel® Many Integrated Core Architecture

Strange results on MIC

Hello, for my thesis I ran a simple code that studies a Lennard-Jones system on a Xeon Phi coprocessor. I tried to vectorize it and measured how the execution time changes.

The machine I used has 61 cores with 32 kB of L1 cache and 512 kB of L2 cache, and the vector registers are 512 bits wide.

I implemented the code both with and without the cell-list method and ran it with different numbers of particles, from 512 to 16384, doubling the number each time.

Positions and forces are each stored in three separate arrays (rx, ry, rz and fx, fy, fz).
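A minimal sketch of such a layout follows, assuming double precision and 64-byte alignment (64 bytes matches the 512-bit register width). The names `ParticleArrays`, `particles_alloc` and `particles_free` are hypothetical and do not come from the original code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container: one separate array per coordinate. */
typedef struct {
    size_t n;               /* number of particles */
    double *rx, *ry, *rz;   /* positions */
    double *fx, *fy, *fz;   /* forces */
} ParticleArrays;

/* Release all six arrays and reset the struct. */
void particles_free(ParticleArrays *p) {
    free(p->rx); free(p->ry); free(p->rz);
    free(p->fx); free(p->fy); free(p->fz);
    p->rx = p->ry = p->rz = NULL;
    p->fx = p->fy = p->fz = NULL;
    p->n = 0;
}

/* Allocate and zero all six arrays, 64-byte aligned (assumed alignment).
   Returns 0 on success, -1 if any allocation fails. */
int particles_alloc(ParticleArrays *p, size_t n) {
    assert(n > 0);
    /* aligned_alloc expects a size that is a multiple of the alignment. */
    size_t bytes = (n * sizeof(double) + 63) / 64 * 64;
    double **fields[6] = { &p->rx, &p->ry, &p->rz, &p->fx, &p->fy, &p->fz };
    for (int k = 0; k < 6; k++) *fields[k] = NULL;
    for (int k = 0; k < 6; k++) {
        *fields[k] = aligned_alloc(64, bytes);
        if (*fields[k] == NULL) { particles_free(p); return -1; }
        memset(*fields[k], 0, bytes);
    }
    p->n = n;
    return 0;
}
```

Keeping each coordinate in its own contiguous array lets a loop over particles read consecutive elements, which is the access pattern vector loads favour.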

I get good results without the cell list, but with it I get some strange results.

With the cell-list method implemented, the execution time should depend linearly on the number of particles. Indeed, plotting the time against the number of particles gives a straight line, except that for N=8192 and N=16384 the execution time is much higher.

I also ran some calculations with values of N close to these, and the scaling is correct for every other value; only those two show the problem.

To make this clear, here are some values:

N      Time
512    6.14995
1024   11.1381
2048   23.1964
4096   51.9393
6144   78.1251
8192   389.724
10240  144.173
12288  167.772
14336  209.669
16384  822.131
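The deviation from linear scaling can be made concrete by dividing each time by N. The sketch below hard-codes the values from the table; the function names and the choice of N=4096 as reference run are illustrative assumptions.

```c
#include <stdio.h>

#define NUM_RUNS 10

/* Values copied from the table above (time units as reported there). */
static const int n_values[NUM_RUNS] =
    {512, 1024, 2048, 4096, 6144, 8192, 10240, 12288, 14336, 16384};
static const double times[NUM_RUNS] =
    {6.14995, 11.1381, 23.1964, 51.9393, 78.1251,
     389.724, 144.173, 167.772, 209.669, 822.131};

/* Time per particle for run k; roughly constant if the scaling is linear. */
double time_per_particle(int k) {
    return times[k] / n_values[k];
}

/* How many times slower run k is than a linear extrapolation from run ref. */
double ratio_to_reference(int k, int ref) {
    return time_per_particle(k) / time_per_particle(ref);
}

/* Print one line per run: N, time, time per particle, ratio to run ref. */
void print_scaling(int ref) {
    for (int k = 0; k < NUM_RUNS; k++) {
        printf("%6d  %10.4f  %10.6f  %6.2f\n",
               n_values[k], times[k],
               time_per_particle(k), ratio_to_reference(k, ref));
    }
}
```

Calling `print_scaling(3)` uses the N=4096 run as reference. With these numbers the N=8192 and N=16384 runs come out roughly 3.7 to 4 times above the linear extrapolation, while the neighbouring values stay within about 15% of it.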

I think it is a technical problem, but I really don't know why it happens.

I also observed a very small gain from vectorization: without the cell list the speedup was about 4x, but with the cell list it is only around 1.5x.

Questions:

Does anybody have an idea of what the problem could be? Why are those particular values anomalous, and why is the vectorization gain so low?

My professor told me that some values can give strange execution results. Has anybody observed something like this?

Thank you very much.

Below is the main loop in which the forces are evaluated, in short the main part of the execution, implemented with the cell list.

for(vcy=0; vcy<ncell; vcy++){

    for(vcx=0; vcx<ncell; vcx++){

        previouspartc=0;

        // Central cell index
        c=vcx*ncell+vcy;

        // Offset of the first particle of cell c: total particles in cells 0..c-1
        for(p=1; p<=c; p++) previouspartc=previouspartc+npart[p-1];

        // Loop over central cell's particles
        for(i=0; i<npart[c]-1; i++){

            for(j=i+1; j<npart[c]; j++){

                ftempx=0.; ftempy=0.;
                dx =rx1[previouspartc+i]-rx1[previouspartc+j];
                dy =ry1[previouspartc+i]-ry1[previouspartc+j];
                dx = (dx + 0.5*dy)*L;
                dy = dy*halfsq3*L;
                r2 = dx*dx + dy*dy;
                if(r2<r2cut) {

                    rr2 = 1./r2;
                    rr6 = rr2*rr2*rr2;
                    enk+=(c12*rr6 -c6)*rr6 -ecut;
                    vir=(cf12*rr6-cf6)*rr6*rr2;
                    ftempx=vir*dx;
                    ftempy=vir*dy;
                }
                fx1[previouspartc+i]+=ftempx;
                fy1[previouspartc+i]+=ftempy;
                fx1[previouspartc+j]-=ftempx;
                fy1[previouspartc+j]-=ftempy;
            }
        }

        // Build the indices vcx1, vcy1 of the four neighbour cells (the one on the right and the three below)
        vcx1[0]=vcx+1;   vcy1[0]=vcy;
        for(k=1; k<4; k++){

            vcx1[k]=vcx-1+(k-1);
            vcy1[k]=vcy-1;
        }

        // Loop over near cells
        for(k=0; k<4; k++){

            previouspartc1=0;

            // PBC
            shiftx=0.; shifty=0.;

            if(vcx1[k] <0){ shiftx= -1; vcx1[k]=ncell-1;}

            else if(vcx1[k] >=ncell){ shiftx= 1; vcx1[k]=0;}

            if(vcy1[k] <0){ shifty= -1; vcy1[k]=ncell-1;}

            else if(vcy1[k] >=ncell){ shifty= 1; vcy1[k]=0;}

            // Scalar cell index of neighbour cell
            c1=vcx1[k]*ncell+vcy1[k];


            // Offset of the first particle of neighbour cell c1: total particles in cells 0..c1-1
            for(p=1; p<=c1; p++) previouspartc1=previouspartc1+npart[p-1];

            for(i=0; i<npart[c]; i++){

                for(j=0; j<npart[c1]; j++){

                    ftempx=0.; ftempy=0.;
                    dx =rx1[previouspartc+i]-(rx1[previouspartc1+j]+shiftx);
                    dy =ry1[previouspartc+i]-(ry1[previouspartc1+j]+shifty);
                    dx = (dx + 0.5*dy)*L;
                    dy = dy*halfsq3*L;
                    r2 = dx*dx + dy*dy;
                    if(r2<r2cut) {

                        rr2 = 1./r2;
                        rr6 = rr2*rr2*rr2;
                        enk+=(c12*rr6 -c6)*rr6 -ecut;
                        vir=(cf12*rr6-cf6)*rr6*rr2;
                        ftempx=vir*dx;
                        ftempy=vir*dy;
                    }
                    fx1[previouspartc+i]+=ftempx;
                    fy1[previouspartc+i]+=ftempy;
                    fx1[previouspartc1+j]-=ftempx;
                    fy1[previouspartc1+j]-=ftempy;
                }
            }
        }
    }
}       
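In the loop above, `previouspartc` and `previouspartc1` are recomputed by summing `npart` from the start for every cell and every neighbour. One possible simplification, sketched here and not part of the original code, is to build all offsets once with an exclusive prefix sum. The function name `build_cell_starts` and the array `start` are hypothetical, and `npart` is assumed to be an `int` array with one entry per cell (`ncell*ncell` entries).

```c
#include <stddef.h>

/* Exclusive prefix sum over the per-cell particle counts.
   After the call, start[c] is the number of particles in cells 0..c-1,
   i.e. the index of the first particle of cell c.
   start needs room for ncells_total + 1 entries; the last one holds
   the total particle count. */
void build_cell_starts(const int *npart, int ncells_total, int *start) {
    start[0] = 0;
    for (int c = 0; c < ncells_total; c++) {
        start[c + 1] = start[c] + npart[c];
    }
}
```

With such a table, `previouspartc` would simply be `start[c]` and `previouspartc1` would be `start[c1]`. This only simplifies the offset lookup; it is not a diagnosis of the timings at N=8192 and N=16384.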
