Hi there
I was trying to offload some computation to MIC using "pragma", sending data addressed by a pointer p, then how to ensure the alignment of data on MIC after MIC recieved it? Does" __assume(p, 64)" work?I was trying to use instrinsics to load data to the vector RF, which requires the alignment of data.
Another problem, that I was trying to active lots of threads for the calculation using "#pragma omp parallel for", and some arrays inside the loop must be thread private while also 64-byte aligned.
I was using "_mm_malloc()" inside the loop to ensure these, but this leads to reduplicated and unnecessary allocation.
Thanks.