Hello,
I'm attempting to run a simple offload example:
#include <stdio.h>
#include <omp.h>

int main(){
  double sum;
  int i,n, nt;
  n=2000000000;
  sum=0.0e0;
  #pragma offload target(mic:0)
  {
    #pragma omp parallel for reduction(+:sum)
    for(i=1;i<=n;i++){
      sum = sum + i;
    }
    //nt = omp_get_max_threads();
    #pragma omp parallel
    {
      #pragma omp single
      nt = omp_get_num_threads();
    }
    #ifdef __MIC__
    printf("Hello MIC reduction %f threads: %d\n",sum,nt);
    #else
    printf("Hello CPU reduction %f threads: %d\n",sum,nt);
    #endif
  }
}
This program used to run fine, but since we recently rebooted the Phi nodes in our cluster, this offload example no longer runs. Natively compiled MIC binaries still run without a problem after the reboot.
Before running I type:
. /usr/local/intel/ClusterStudioXE_2013/composer_xe_2013_sp1/bin/compilervars.sh intel64
make
export MIC_OMP_NUM_THREADS=120
export MIC_ENV_PREFIX=MIC
export OFFLOAD_REPORT=3
Here is my Makefile:
CC=icc
CFLAGS=-std=c99 -O3 -vec-report3 -openmp -offload
EXE=reduce_offload_mic

$(EXE) : reduce_omp_mic.c
	$(CC) -o $@ $< $(CFLAGS)

.PHONY: clean
clean:
	rm $(EXE)
However, when I run the program here is the output:
[frenchwr@vmp903 Offload]$ ./reduce_offload_mic
offload error: cannot offload to MIC - device is not available
[Offload] [HOST] [State] Unregister data tables
I have ensured that mpss is running and even restarted the service with:
sudo service mpss restart
but I still get the same error, even after rebuilding the executable.
All of my mic tests pass:
[frenchwr@vmp903 Offload]$ miccheck
MicCheck 3.4-r1
Copyright 2013 Intel Corporation All Rights Reserved

Executing default tests for host
  Test 0: Check number of devices the OS sees in the system ... pass
  Test 1: Check mic driver is loaded ... pass
  Test 2: Check number of devices driver sees in the system ... pass
  Test 3: Check mpssd daemon is running ... pass
Executing default tests for device: 0
  Test 4 (mic0): Check device is in online state and its postcode is FF ... pass
  Test 5 (mic0): Check ras daemon is available in device ... pass
  Test 6 (mic0): Check running flash version is correct ... pass
  Test 7 (mic0): Check running SMC firmware version is correct ... pass
Executing default tests for device: 1
  Test 8 (mic1): Check device is in online state and its postcode is FF ... pass
  Test 9 (mic1): Check ras daemon is available in device ... pass
  Test 10 (mic1): Check running flash version is correct ... pass
  Test 11 (mic1): Check running SMC firmware version is correct ... pass

Status: OK
Here's the output from micinfo:
[frenchwr@vmp903 Offload]$ micinfo
MicInfo Utility Log
Created Fri Aug 28 18:14:23 2015

  System Info
    HOST OS                    : Linux
    OS Version                 : 2.6.32-431.29.2.el6.x86_64
    Driver Version             : 3.4-1
    MPSS Version               : 3.4
    Host Physical Memory       : 132110 MB

Device No: 0, Device Name: mic0

  Version
    Flash Version              : 2.1.02.0390
    SMC Firmware Version       : 1.16.5078
    SMC Boot Loader Version    : 1.8.4326
    uOS Version                : 2.6.38.8+mpss3.4
    Device Serial Number       : ADKC42900304

  Board
    Vendor ID                  : 0x8086
    Device ID                  : 0x225c
    Subsystem ID               : 0x7d95
    Coprocessor Stepping ID    : 2
    PCIe Width                 : Insufficient Privileges
    PCIe Speed                 : Insufficient Privileges
    PCIe Max payload size      : Insufficient Privileges
    PCIe Max read req size     : Insufficient Privileges
    Coprocessor Model          : 0x01
    Coprocessor Model Ext      : 0x00
    Coprocessor Type           : 0x00
    Coprocessor Family         : 0x0b
    Coprocessor Family Ext     : 0x00
    Coprocessor Stepping       : C0
    Board SKU                  : C0PRQ-7120 P/A/X/D
    ECC Mode                   : Enabled
    SMC HW Revision            : Product 300W Passive CS

  Cores
    Total No of Active Cores   : 61
    Voltage                    : 1037000 uV
    Frequency                  : 1238095 kHz

  Thermal
    Fan Speed Control          : N/A
    Fan RPM                    : N/A
    Fan PWM                    : N/A
    Die Temp                   : 46 C

  GDDR
    GDDR Vendor                : Samsung
    GDDR Version               : 0x6
    GDDR Density               : 4096 Mb
    GDDR Size                  : 15872 MB
    GDDR Technology            : GDDR5
    GDDR Speed                 : 5.500000 GT/s
    GDDR Frequency             : 2750000 kHz
    GDDR Voltage               : 1501000 uV

Device No: 1, Device Name: mic1

  Version
    Flash Version              : 2.1.02.0390
    SMC Firmware Version       : 1.16.5078
    SMC Boot Loader Version    : 1.8.4326
    uOS Version                : 2.6.38.8+mpss3.4
    Device Serial Number       : ADKC42900319

  Board
    Vendor ID                  : 0x8086
    Device ID                  : 0x225c
    Subsystem ID               : 0x7d95
    Coprocessor Stepping ID    : 2
    PCIe Width                 : Insufficient Privileges
    PCIe Speed                 : Insufficient Privileges
    PCIe Max payload size      : Insufficient Privileges
    PCIe Max read req size     : Insufficient Privileges
    Coprocessor Model          : 0x01
    Coprocessor Model Ext      : 0x00
    Coprocessor Type           : 0x00
    Coprocessor Family         : 0x0b
    Coprocessor Family Ext     : 0x00
    Coprocessor Stepping       : C0
    Board SKU                  : C0PRQ-7120 P/A/X/D
    ECC Mode                   : Enabled
    SMC HW Revision            : Product 300W Passive CS

  Cores
    Total No of Active Cores   : 61
    Voltage                    : 1040000 uV
    Frequency                  : 1238095 kHz

  Thermal
    Fan Speed Control          : N/A
    Fan RPM                    : N/A
    Fan PWM                    : N/A
    Die Temp                   : 47 C

  GDDR
    GDDR Vendor                : Samsung
    GDDR Version               : 0x6
    GDDR Density               : 4096 Mb
    GDDR Size                  : 15872 MB
    GDDR Technology            : GDDR5
    GDDR Speed                 : 5.500000 GT/s
    GDDR Frequency             : 2750000 kHz
    GDDR Voltage               : 1501000 uV
From searching online I see a few other users who have run into the:
offload error: cannot offload to MIC - device is not available
[Offload] [HOST] [State] Unregister data tables
issue, but I haven't found a good resolution other than restarting mpss, which does not resolve the issue for me.