Channel: Intel® Many Integrated Core Architecture

Xeon Phi crashes on too-large SCIF memory registration

Is there a mechanism in SCIF to register a memory region with all endpoints at once? At the moment, I loop over the endpoints and call scif_register() on the region for each one. Memory registration is rather expensive, and I would like to avoid paying this cost repeatedly if there is a faster way to register with all endpoints.
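
A minimal sketch of the per-endpoint loop described above. The names `register_fn`, `register_with_all` and `stub_register` are hypothetical and only illustrate the pattern. With SCIF, the callback would presumably be a thin wrapper around scif_register() from <scif.h>; the stub here exists only so the loop can be exercised without a coprocessor.

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-in for the real registration call. With SCIF this
 * would wrap something like
 *   scif_register(epd, addr, len, 0, SCIF_PROT_READ | SCIF_PROT_WRITE, 0)
 * (assumption: endpoint descriptors are plain ints in user space). */
typedef off_t (*register_fn)(int epd, void *addr, size_t len);

/* Register the same region with each endpoint in turn.
 * Stores the returned offset for each endpoint in offsets[] and returns
 * the number of endpoints registered before the first failure
 * (n if every registration succeeded). */
static int register_with_all(const int *epds, int n, void *addr,
                             size_t len, register_fn reg, off_t *offsets)
{
    int i;
    for (i = 0; i < n; i++) {
        offsets[i] = reg(epds[i], addr, len);
        if (offsets[i] < 0)   /* assumed: failure is reported as (off_t)-1 */
            break;
    }
    return i;
}

/* Dry-run stub: fails for a negative descriptor, otherwise returns a
 * made-up offset. It does not touch any hardware. */
static off_t stub_register(int epd, void *addr, size_t len)
{
    (void)addr;
    (void)len;
    return epd < 0 ? (off_t)-1 : (off_t)epd * 4096;
}
```

Each iteration pays the full registration cost for the whole region, which is why the cost grows with the number of endpoints.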

With my current method, if the memory region is sufficiently large (e.g., 6 GB+), the coprocessor crashes during scif_register():

  1. The error "Connection to mic0 closed by remote host." appears and the ssh connection drops.
  2. Further ssh connection attempts fail.
  3. `micctrl -s' still reports "online", but `micctrl --reboot mic0' stalls with status "shutdown". Only power-cycling the host platform restores operation.

System Info:

  • Xeon Phi 5110P; MPSS 3.5.1

EDIT 20150706-1517EST: 3.2 GB works. 3.8 GB and above will crash the device.

