Hi guys,
I am trying to run an MPI program on the MIC using 240 threads. I used VTune to analyze it and found that the ratio L2_DATA_READ_MISS_CACHE_FILL / L2_DATA_READ_MISS_MEM_FILL is very high (about 88:1).
I suspect that the large number of L2_DATA_READ_MISS_CACHE_FILL events is the reason for my program's poor performance. So I would like to know: in what kind of situation does one core need to get data from another core's L2 cache instead of from memory? Since remote cache accesses have as high a latency as memory accesses, they should be avoided if possible.
Thank you
Hu