If you are troubleshooting problems while running with an Intel(R) Xeon Phi(tm) Coprocessor, a variety of commands can show you what might be happening. Here is a quick summary of the commands that might be useful, and under what conditions. For now I document only the commands that are relevant when the host runs a Linux*-based operating system; I will update this document to include Windows* at a later stage.
Command or Procedure | Used for |
/opt/intel/mic/bin/micdebug.sh | Script that collects and bundles a variety of diagnostic information (basically, all of the information listed below, plus more). This is useful if you need to escalate a problem to a manufacturer or to Intel, or if you want to review the data yourself while troubleshooting. |
lspci | This command is typically shipped with the Linux* operating system; in this context we are running it on the host system. Useful switches are: 'lspci -vvv' and 'lspci | grep 225'. Helps you determine if the coprocessor is detected, and details about the link state. |
dmesg (on the host), dmesg (on the coprocessor) | Kernel ring buffer. Lets you see errors that came up at boot or during execution. Because this is a ring buffer, older messages are overwritten as new ones arrive, so the output may not be complete. |
cat /var/log/messages | Depending on how syslog is configured, this file may contain kernel and user messages |
/opt/intel/mic/bin/micinfo | Information about what coprocessors are connected, and what drivers/firmware/flash versions are used |
/usr/sbin/micctrl -s | Allows you to see status of each of the coprocessors |
sudo service micras start, then tail -f /var/log/micras.log | Runs diagnostics and logs error conditions to /var/log/messages on the host, prepending a “micras” tag to each message. |
Getting the coprocessor log buffer: (1) Run: echo 0 > /sys/class/mic/scif/watchdog_enabled (2) Mount debugfs on the host: mount -t debugfs none /sys/kernel/debug (3) Dump the micro-OS kernel log buffer: cat /sys/kernel/debug/mic_debug/mic0/log_buf > <some file of your choice> (shows the contents of the buffer up until now) (4) Or follow it as things run: sudo tail -f /sys/kernel/debug/mic_debug/mic0/log_buf | tee -a <some file of your choice> (collects recent and new data; also outputs contents to STDOUT) | This is useful if, for example, the card resets or becomes unresponsive after an undetermined period of time. Because the log buffer is wiped on restart, and you want to capture information about the problem while it happens, you can keep watch on the buffer from the host and write its contents to a file as they appear. |
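
For a quick first pass, the host-side commands in the table can be run in one go and their output saved for later comparison. The following is only a minimal sketch, not a replacement for micdebug.sh, which collects far more. The names OUT_DIR, LOG_BUF and run_step, and the output file names, are hypothetical and introduced just for this example; the script also assumes the first card is mic0, as in the table, and that steps needing root (such as reading the debugfs log buffer) are skipped or fail with a message when run as a normal user.

```shell
#!/bin/sh
# Sketch: save the output of the host-side commands from the table into one directory.
# OUT_DIR, LOG_BUF and run_step are hypothetical names used only in this example.

OUT_DIR="${OUT_DIR:-./mic-diag}"
LOG_BUF="/sys/kernel/debug/mic_debug/mic0/log_buf"
mkdir -p "$OUT_DIR"

# Run one command if its executable exists and save stdout and stderr to a file.
# If the executable is missing, record that the step was skipped instead.
run_step() {
    name="$1"; shift
    if command -v "$1" >/dev/null 2>&1; then
        "$@" > "$OUT_DIR/$name.txt" 2>&1 || echo "$name: exited with status $?" >&2
    else
        echo "$name: $1 not found, skipped" | tee "$OUT_DIR/$name.txt" >&2
    fi
}

run_step lspci   lspci -vvv
run_step dmesg   dmesg
run_step micinfo /opt/intel/mic/bin/micinfo
run_step micctrl /usr/sbin/micctrl -s

# Dump the coprocessor log buffer only if debugfs is mounted and the file is readable.
# This does not mount debugfs or touch watchdog_enabled; do those steps by hand first.
if [ -r "$LOG_BUF" ]; then
    cat "$LOG_BUF" > "$OUT_DIR/mic0_log_buf.txt" || echo "log_buf: read failed" >&2
else
    echo "log_buf: $LOG_BUF not readable, skipped" >&2
fi
```

Running it with a different OUT_DIR for each attempt (for example OUT_DIR=./before and OUT_DIR=./after around a driver restart) leaves two sets of files that can be compared with diff.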
