Channel: Intel® Many Integrated Core Architecture

icc15 produces 20% slower code than icc14


Hi all,

I tried out the new Intel compiler (15.0.0 20140723) with a large intrinsics-based kernel on the MIC. The program runs 20% slower than when compiled with icpc 14.0.3 20140422. I analyzed and attached the assembly code generated by icc14 (block_icc14.s) and icc15 (block_icc15.s) for a large block of the kernel. Both versions were compiled with -O3.

The two assembly files differ very little. The instruction counts are nearly identical:

prefetching   icc14 = 105; icc15 = 105

fmadd         icc14 = 63 ; icc15 = 64

The order of the arithmetic and align/blend instructions is also mostly the same, but icc15 emits a lot of nop instructions of the form (mov       al, al). Why? In total, icc15 generates 350 lines of assembly, including 11 nop instructions. icc14 generates only 333 lines.
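Counts like these can be obtained with a small helper along the following lines. This is only a sketch: `count_mnemonic` is a hypothetical name, and it assumes one instruction per line with the mnemonic as the first token, so an instruction that follows a label on the same line is not counted.

```c
#include <stddef.h>
#include <string.h>

/* Count the lines of an assembly listing whose first token starts with
 * `prefix` (e.g. "vprefetch" or "vfmadd"). Hypothetical helper; assumes
 * one instruction per line, optionally indented with spaces or tabs. */
size_t count_mnemonic(const char *listing, const char *prefix)
{
    size_t count = 0;
    size_t plen = strlen(prefix);
    const char *line = listing;

    while (*line != '\0') {
        const char *p = line;

        /* Skip leading indentation. */
        while (*p == ' ' || *p == '\t')
            p++;

        /* Prefix match, so "vprefetch" also covers vprefetchnta etc. */
        if (plen > 0 && strncmp(p, prefix, plen) == 0)
            count++;

        /* Advance to the start of the next line, if there is one. */
        const char *nl = strchr(line, '\n');
        if (nl == NULL)
            break;
        line = nl + 1;
    }
    return count;
}
```

Because the match is on a prefix, a short prefix such as "mov" would also count other instructions that start with those letters, so the prefix should be chosen as specifically as the listing requires.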

The biggest difference seems to be the order of the prefetch instructions, which is completely different. Has something changed in how prefetches are handled between icc14 and icc15? At least the operand syntax is different:

vprefetchnta ZMMWORD PTR [2048+r8+r9*4]     // icc14

vprefetchnta BYTE PTR [2048+r8+r9*4]        // icc15 

I insert the prefetches by hand using intrinsics. If I remove all my prefetch intrinsics, there is no performance difference between icc14 and icc15. Is there any information on what has changed from icc14 to icc15, especially for MIC intrinsics? The information I found on the Intel website is quite sparse.
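For readers unfamiliar with this pattern, here is a minimal sketch of a hand-inserted non-temporal prefetch. It is not the actual kernel: the function `sum_with_prefetch`, the macro `PREFETCH_NTA`, and the loop itself are made up for illustration. The 2048-byte distance and the 4-byte element size mirror the `[2048+r8+r9*4]` operand in the listings, and the non-x86 fallback to `__builtin_prefetch` is an assumption added only so that the sketch compiles elsewhere.

```c
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__) || defined(__MIC__)
#include <immintrin.h>  /* _mm_prefetch, _MM_HINT_NTA */
#define PREFETCH_NTA(p) _mm_prefetch((const char *)(p), _MM_HINT_NTA)
#else
/* Assumed fallback for non-x86 GCC/Clang builds: read access, no locality. */
#define PREFETCH_NTA(p) __builtin_prefetch((p), 0, 0)
#endif

/* Prefetch distance taken from the listings: 2048 bytes = 512 floats. */
enum { PREFETCH_AHEAD = 2048 / sizeof(float) };

/* Hypothetical example loop: sums `n` floats and issues one non-temporal
 * prefetch per 64-byte cache line (16 floats), 2048 bytes ahead. */
float sum_with_prefetch(const float *data, size_t n)
{
    float sum = 0.0f;

    for (size_t i = 0; i < n; ++i) {
        /* Stay inside the array so the pointer arithmetic is valid. */
        if ((i % 16) == 0 && i + PREFETCH_AHEAD < n)
            PREFETCH_NTA(&data[i + PREFETCH_AHEAD]);
        sum += data[i];
    }
    return sum;
}
```

The prefetch is only a hint and does not change the result. Which instruction the compiler emits for it, and where it is scheduled relative to the arithmetic, is up to the compiler, which is exactly the behavior being asked about here.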

Thanks,

Patrick

Attachments:
block_icc15.c (26.03 KB)
block_icc14.c (24.77 KB)
