Channel: Intel® Many Integrated Core Architecture

Usage of _mm512_mask_prefetch_i32gather_ps for doubles


Dear all,

I want to implement prefetching for sparse complex double-precision data using intrinsics.

A linear array contains the indexes of the sparse complex double elements, like so: {1,2,3,4,150,151,7000,7001,10000,10001}.

As each of these elements occupies 16 contiguous bytes in memory, how should I correctly use the prefetch intrinsic, which is meant for single-precision floats?

Should I use _mm512_mask_prefetch_i32gather_ps() and explicitly prefetch each 4-byte piece of the 16 bytes?

Or can I expect each element in the index register to cause a full 64-byte cache line to be prefetched? In that case I could perform some modular arithmetic on the index values so that each unique cache line is prefetched only once. (I have actually tried this approach, with disappointing results.)
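
To make the index arithmetic concrete, here is a minimal sketch of what I mean by collapsing the indexes to unique cache lines. It rests on several assumptions that are not established facts: the indexes are 0-based element indexes sorted in ascending order, the array base is 64-byte aligned, and a cache line is 64 bytes. The function names are hypothetical, and the prefetch itself uses the GCC/Clang/ICC builtin __builtin_prefetch as a stand-in rather than the gather prefetch intrinsic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_BYTES 64u /* assumed cache-line size */
#define ELEM_BYTES       16u /* one complex double: two 8-byte doubles */

/* Hypothetical helper: writes the distinct cache-line numbers touched by the
 * elements idx[0..n-1] into lines_out and returns how many were written.
 * Assumes idx[] is sorted ascending and the array base is 64-byte aligned,
 * so that no 16-byte element straddles two cache lines.
 * lines_out must have room for n entries. */
size_t unique_cache_lines(const int32_t *idx, size_t n, uint64_t *lines_out)
{
    assert(idx != NULL && lines_out != NULL);
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        /* Byte offset of the element divided by the line size. */
        uint64_t line = ((uint64_t)idx[i] * ELEM_BYTES) / CACHE_LINE_BYTES;
        /* Sorted input means duplicates are always adjacent. */
        if (count == 0 || lines_out[count - 1] != line)
            lines_out[count++] = line;
    }
    return count;
}

/* Hypothetical helper: issues one software prefetch per unique cache line. */
void prefetch_lines(const void *base, const uint64_t *lines, size_t count)
{
    const char *p = (const char *)base;
    for (size_t i = 0; i < count; ++i)
        __builtin_prefetch(p + lines[i] * CACHE_LINE_BYTES);
}
```

With the example indexes above, four 16-byte elements share each 64-byte line, so the ten indexes reduce to five lines (0, 1, 37, 1750 and 2500) under these assumptions.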

Best regards,

Alastair

