Channel: Intel® Many Integrated Core Architecture

Building R for Phi


Hi,

Has anyone successfully built R to run natively on Phi?

Thanks,

George


Phi does not seem to fully support AVX-512? Any way to do a matrix transpose?


I found in past topics that _mm512_unpacklo_* is not supported on Phi. In my own implementation, it seems _mm512_permute* and _mm512_shuffle* are also not supported. So far, all the matrix transpose implementations in past posts seem to be built from _mm512_swizzle* and _mm512_blend* instructions. However, using these two operations requires twice as many element movements, which seems inefficient. Are there any other options for doing a matrix transpose?

 

Intel(R) Manycore Platform Software Stack (MPSS) - Long-Term-Support Archive


On this page you will find the latest releases of the Intel(R) Manycore Platform Software Stack (MPSS) Long Term Support (LTS) product. The most recent release is found here: http://software.intel.com/en-us/articles/intel-many-integrated-core-architecture-intel-mic-architecture-platform-software-stack and we recommend customers use the latest release wherever possible.

N-1 release for MPSS 3.4.x series

MPSS 3.4.4 release for Linux

Downloads for mpss-3.4.4 (released: June 2, 2015):

  Linux host package (mpss-3.4.4-linux.tar), for RedHat 6.3, 6.4, 6.5, 6.6, 7.0; SuSE SLES11 SP2, SLES11 SP3, SLES12
      Size: ~420MB, MD5: 603fee578662bd83ac78cb0293c0b4df
  Software for Coprocessor OS (k1om) (mpss-3.4.4-k1om.tar)
      Size: ~700MB, MD5: 42c2eba4d727991e4e8f99dababeba63
  Source (mpss-src-3.4.4.tar)
      Size: ~270MB, MD5: 0030c519e7740ad9d8552aa8bedc4e94
  Download Cache (mpss-downloadcache-3.4.4.tar)
      Size: ~1.1GB, MD5: 47031c23014ce5a0f43ff093ad42251d

Documentation (last updated June 2015):

  releaseNotes-linux.txt: Release Notes (English), ~54KB
  readme.txt: Readme, including installation instructions, for Linux (English), ~20KB
  MPSS_Users_Guide.pdf: Complete Users Guide for MPSS for Linux (English), ~2MB
  SCIF_UserGuide.pdf: SCIF User Guide, ~700KB
  license.txt: Intel Software License Agreement for Intel® Manycore Platform Software Stack (Intel® MPSS), ~30KB

N-2 release for MPSS 3.4.x series

MPSS 3.4.3 release for Linux

Downloads for mpss-3.4.3 (released: February 20, 2015):

  Linux host package (mpss-3.4.3-linux.tar), for RedHat 6.3, 6.4, 6.5, 6.6, 7.0; SuSE SLES11 SP2, SLES11 SP3, SLES12
      Size: ~400MB, MD5: fa960e90045a1ab16e1b68920030233c
  Software for Coprocessor OS (k1om) (mpss-3.4.3-k1om.tar)
      Size: ~700MB, MD5: 85b4f4b6873a8ec21cc9e1d6d95cec04
  Source (mpss-src-3.4.3.tar)
      Size: ~270MB, MD5: 1fdd717f025ee6c6c999f991e76dde9f
  Download Cache (mpss-downloadcache-3.4.3.tar)
      Size: ~1.1GB, MD5: 1ec83289d06ec8c12dea80f7a5482034

Documentation (last updated February 2015):

  releaseNotes-linux.txt: Release Notes (English), ~62KB
  readme.txt: Readme, including installation instructions, for Linux (English), ~20KB
  MPSS_Users_Guide.pdf: Complete Users Guide for MPSS for Linux (English), ~2MB
  SCIF_UserGuide.pdf: SCIF User Guide, ~700KB
  license.txt: Intel Software License Agreement for Intel® Manycore Platform Software Stack (Intel® MPSS), ~30KB

MPSS 3.4.3 release for Microsoft* Windows

Downloads for mpss-3.4.3 for Microsoft* Windows (released: February 20, 2015):

  Windows host package (mpss-3.4.3-windows.zip)
      Size: ~310MB, MD5: 588c1431fa0803f5b478aa771703efa2
  Software for Coprocessor OS (k1om) (mpss-3.4.3-k1om.tar)
      Size: ~700MB, MD5: 85b4f4b6873a8ec21cc9e1d6d95cec04

Documentation (last updated February 2015):

  releaseNotes-windows.txt: Release Notes (English), ~25KB
  readme-windows.pdf: Readme, including installation instructions, for Microsoft* Windows (English), ~550KB
  MPSS_Users_Guide-windows.pdf: User, Cluster and Advanced Configuration Guide for MPSS, ~2MB

 

N-3 release for MPSS 3.4.x series

MPSS 3.4.2 release for Linux

 

Downloads for mpss-3.4.2 (released: December 3, 2014):

  Linux host package (mpss-3.4.2-linux.tar), for RedHat 6.3, 6.4, 6.5, 6.6, 7.0; SuSE SLES11 SP2, SLES11 SP3
      Size: ~400MB, MD5: 40896e317418fd20a758fd7ce2408aac
  Software for Coprocessor OS (k1om) (mpss-3.4.2-k1om.tar)
      Size: ~700MB, MD5: 27004c1423bb3e29010de2284577d024
  Source (mpss-src-3.4.2.tar)
      Size: ~270MB, MD5: b5031821ac8d4faaf12b4fbb1728e97a
  Download Cache (mpss-downloadcache-3.4.2.tar)
      Size: ~1.1GB, MD5: 4d937079b4ef2a8eef821e12f2e61ebd

Documentation (last updated December 2014, except license.txt: September 2013):

  releaseNotes-linux.txt: Release Notes (English), ~75KB
  readme.txt: Readme, including installation instructions, for Linux (English), ~20KB
  MPSS_Users_Guide.pdf: Complete Users Guide for MPSS for Linux (English), ~2MB
  SCIF_UserGuide.pdf: SCIF User Guide, ~700KB
  license.txt: Intel Software License Agreement for Intel® Manycore Platform Software Stack (Intel® MPSS), ~30KB

MPSS 3.4.2 release for Microsoft* Windows

 

Downloads for mpss-3.4.2 for Microsoft* Windows (released: December 3, 2014):

  Windows host package (mpss-3.4.2-windows.zip)
      Size: ~310MB, MD5: 64b2bb347ce870098b2e8dafa10e5d67
  Software for Coprocessor OS (k1om) (mpss-3.4.2-k1om.tar)
      Size: ~700MB, MD5: 27004c1423bb3e29010de2284577d024

Documentation (last updated December 2014):

  releaseNotes-windows.txt: Release Notes (English), ~30KB
  readme-windows.pdf: Readme, including installation instructions, for Microsoft* Windows (English), ~620KB
  MPSS_Users_Guide-windows.pdf: User, Cluster and Advanced Configuration Guide for MPSS, ~2MB

 

N-4 release for MPSS 3.4.x series

MPSS 3.4.1 release for Linux

 

Downloads for mpss-3.4.1 (released: October 22, 2014):

  Linux host package (mpss-3.4.1-linux.tar), for RedHat 6.3, 6.4, 6.5, 6.6, 7.0; SuSE SLES11 SP2, SLES11 SP3
      Size: ~400MB, MD5: e985afee031baf542090883d3752fcfa
  Software for Coprocessor OS (k1om) (mpss-3.4.1-k1om.tar)
      Size: ~700MB, MD5: 23d3db962c2abc659945598aa6793374
  Source (mpss-src-3.4.1.tar)
      Size: ~270MB, MD5: 73ecb48cf74bd815ae8c3753868c80d8
  Download Cache (mpss-downloadcache-3.4.1.tar)
      Size: ~1.1GB, MD5: 3bdc15046dbd4b23a58cb1684d73e05f

Documentation (last updated October 2014, except license.txt: September 2013):

  releasenotes-linux.txt: Release Notes (English), ~75KB
  readme.txt: Readme, including installation instructions, for Linux (English), ~20KB
  MPSS_Users_Guide.pdf: Complete Users Guide for MPSS for Linux (English), ~2MB
  SCIF_UserGuide.pdf: SCIF User Guide, ~700KB
  license.txt: Intel Software License Agreement for Intel® Manycore Platform Software Stack (Intel® MPSS), ~30KB

 

MPSS 3.4.1 release for Microsoft* Windows

 

Downloads for mpss-3.4.1 for Microsoft* Windows (released: October 22, 2014):

  Windows host package (mpss-3.4.1-windows.zip)
      Size: ~310MB, MD5: 27b8c2ced28569b58c9d00255bc3219f
  Software for Coprocessor OS (k1om) (mpss-3.4.1-k1om.tar)
      Size: ~700MB, MD5: 23d3db962c2abc659945598aa6793374

Documentation (last updated October 2014):

  releaseNotes-windows.txt: Release Notes (English), ~30KB
  readme-windows.pdf: Readme, including installation instructions, for Microsoft* Windows (English), ~620KB
  MPSS_Users_Guide-windows.pdf: User, Cluster and Advanced Configuration Guide for MPSS, ~2MB

 

31S1P problems (MSI-X Enable-, or 4G Decoding, probably)


    Hello, everyone.  I've been lurking on the forums for a few days now while I schemed up a cooling solution for my shiny new 31S1P. 

    I'm pretty sure I've conquered the cooling requirements.  Check!

    However, I cannot get the card to work correctly.  I'm using a Z97-WS motherboard with "4G Decoding" enabled in the BIOS settings. The CPU is a Celeron G1820 which is a cheap little lga1150 socket CPU that seemed to be enough for this rig.  I'm running the latest BIOS (2403, I believe from 2015-06-18 or thereabouts), latest version of CentOS 7.1, which is 7.1.1503 (Core). 

    I've followed all of the advice and forums I could find online about this issue, to little avail.  Here is a piece of my console log showing the relevant information I am likely to be asked to provide if I don't do it here:

    ----------------------------------------------------------------------------------------

    [root@x mpss-3.5.2]# dmesg | grep MSI
    [    0.102438] acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI]
    [    0.408378] pcieport 0000:00:01.0: irq 40 for MSI/MSI-X
    [    0.408786] pcieport 0000:01:00.0: irq 41 for MSI/MSI-X
    [    0.408881] pcieport 0000:02:08.0: irq 42 for MSI/MSI-X
    [    0.408972] pcieport 0000:02:10.0: irq 43 for MSI/MSI-X
    [    0.409070] pcieport 0000:06:00.0: irq 44 for MSI/MSI-X
    [    0.409184] pcieport 0000:07:01.0: irq 45 for MSI/MSI-X
    [    0.409349] pcieport 0000:07:02.0: irq 46 for MSI/MSI-X
    [    0.409465] pcieport 0000:07:03.0: irq 47 for MSI/MSI-X
    [    0.409579] pcieport 0000:07:04.0: irq 48 for MSI/MSI-X
    [    0.409692] pcieport 0000:07:05.0: irq 49 for MSI/MSI-X
    [    0.409808] pcieport 0000:07:06.0: irq 50 for MSI/MSI-X
    [    0.409920] pcieport 0000:07:07.0: irq 51 for MSI/MSI-X
    [    0.452551] xhci_hcd 0000:00:14.0: irq 52 for MSI/MSI-X
    [    0.518593] xhci_hcd 0000:10:00.0: irq 53 for MSI/MSI-X
    [    0.518597] xhci_hcd 0000:10:00.0: irq 54 for MSI/MSI-X
    [    0.518600] xhci_hcd 0000:10:00.0: irq 55 for MSI/MSI-X
    [    0.710232] e1000e 0000:00:19.0: irq 56 for MSI/MSI-X
    [    0.825566] igb 0000:0d:00.0: irq 57 for MSI/MSI-X
    [    0.825570] igb 0000:0d:00.0: irq 58 for MSI/MSI-X
    [    0.825573] igb 0000:0d:00.0: irq 59 for MSI/MSI-X
    [    0.825577] igb 0000:0d:00.0: irq 60 for MSI/MSI-X
    [    0.825581] igb 0000:0d:00.0: irq 61 for MSI/MSI-X
    [    0.855040] igb 0000:0d:00.0: Using MSI-X interrupts. 2 rx queue(s), 2 tx queue(s)
    [    0.984604] i915 0000:00:02.0: irq 62 for MSI/MSI-X
    [    1.187177] ahci 0000:00:1f.2: irq 63 for MSI/MSI-X
    [    1.189283] ahci 0000:0a:00.0: irq 64 for MSI/MSI-X
    [    1.190251] ahci 0000:0f:00.0: irq 65 for MSI/MSI-X
    [   12.487762] mei_me 0000:00:16.0: irq 66 for MSI/MSI-X
    [   12.702815] snd_hda_intel 0000:00:03.0: irq 67 for MSI/MSI-X
    [   12.702983] snd_hda_intel 0000:00:1b.0: irq 68 for MSI/MSI-X
    [root@x mpss-3.5.2]# lspci | grep -i coproc
    03:00.0 Co-processor: Intel Corporation Xeon Phi coprocessor 31S1 (rev 11)
    [root@x mpss-3.5.2]# lspci -s 03:00.0 -vv
    03:00.0 Co-processor: Intel Corporation Xeon Phi coprocessor 31S1 (rev 11)
        Subsystem: Intel Corporation Device 2500
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Interrupt: pin A routed to IRQ 255
        Region 0: Memory at <unassigned> (64-bit, prefetchable) [disabled] [size=8G]
        Region 4: Memory at bf200000 (64-bit, non-prefetchable) [disabled] [size=128K]
        Capabilities: [44] Power Management version 3
            Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot-,D3cold-)
            Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [4c] Express (v2) Endpoint, MSI 00
            DevCap:    MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 <64us
                ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
            DevCtl:    Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                RlxdOrd- ExtTag+ PhantFunc- AuxPwr- NoSnoop+
                MaxPayload 256 bytes, MaxReadReq 512 bytes
            DevSta:    CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
            LnkCap:    Port #0, Speed 5GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <4us, L1 unlimited
                ClockPM- Surprise- LLActRep- BwNot-
            LnkCtl:    ASPM Disabled; RCB 64 bytes Disabled- CommClk-
                ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
            LnkSta:    Speed 5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
            DevCap2: Completion Timeout: Range AB, TimeoutDis+, LTR-, OBFF Not Supported
            DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
            LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
                 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                 Compliance De-emphasis: -6dB
            LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [88] MSI: Enable- Count=1/16 Maskable- 64bit+
            Address: 0000000000000000  Data: 0000
        Capabilities: [98] MSI-X: Enable- Count=16 Masked-
            Vector table: BAR=4 offset=00017000
            PBA: BAR=4 offset=00018000
        Capabilities: [100 v1] Advanced Error Reporting
            UESta:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
            UEMsk:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
            UESvrt:    DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
            CESta:    RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
            CEMsk:    RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
            AERCap:    First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-

    [root@x mpss-3.5.2]# dmesg | grep mic
    [    0.000000] CPU0 microcode updated early to revision 0x1c, date = 2014-07-03
    [    0.061159] CPU1 microcode updated early to revision 0x1c, date = 2014-07-03
    [    0.068898] atomic64 test passed for x86-64 platform with CX8 and with SSE
    [    0.089803] ACPI: Dynamic OEM Table Load:
    [    0.091965] ACPI: Dynamic OEM Table Load:
    [    0.093790] ACPI: Dynamic OEM Table Load:
    [    0.387895] microcode: CPU0 sig=0x306c3, pf=0x2, revision=0x1c
    [    0.387899] microcode: CPU1 sig=0x306c3, pf=0x2, revision=0x1c
    [    0.387920] microcode: Microcode Update Driver: v2.00 <tigran@aivazian.fsnet.co.uk>, Peter Oruba
    [    0.526732] mousedev: PS/2 mouse device common for all mice
    [    0.710216] e1000e 0000:00:19.0: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
    [    3.071226] usb 5-2: ep 0x81 - rounding interval to 1024 microframes, ep desc says 2040 microframes
    [    3.439111] usb 5-2.1: ep 0x81 - rounding interval to 64 microframes, ep desc says 80 microframes
    [    3.439113] usb 5-2.1: ep 0x82 - rounding interval to 1024 microframes, ep desc says 2040 microframes
    [root@x mpss-3.5.2]# micinfo
    MicInfo Utility Log
    Created Mon Aug 17 04:01:04 2015

        System Info
            HOST OS            : Linux
            OS Version        : 3.10.0-229.el7.x86_64
            Driver Version        : NotAvailable
            MPSS Version        : 3.5.2

            Host Physical Memory    : 16141 MB
    micinfo: No devices found : host driver is not loaded: No such file or directory

    [root@x j]# depmod
    [root@x j]# modprobe mic
    modprobe: FATAL: Module mic not found.
    [root@x j]# service mpss start
    Starting mpss (via systemctl):                             [  OK  ]
    [root@x j]# micctrl -s
      [Error] micrasrelmond: State failed - non existent MIC device
    [root@x j]#

    ----------------------------------------------------------------------------------------

     

    If I can get it to show up in a dmesg | grep mic output again, I'll post it here.  I've seen that output vary a little. 

    I don't have a special BIOS from ASUS, but as I said above, it is the latest available and it only came out a few weeks ago.  Could this be an instance of what Frances was talking about here? https://software.intel.com/en-us/forums/topic/538897#comment-1811230

     

    In other words, MSI-X doesn't appear to be operative for my 31S1P.  I cannot for the life of me figure out how to force it to be enabled.  Is this going to require recompiling my kernel?

     

    If anyone has any ideas, I'm all ears/eyes.

    Thanks!

    How to allocate MICs to all the MPI processes equally for AO?


    Hello,
    Could you please take a look at this problem? My machine has 16 CPUs and 4 MICs (47 cores each), and I run my program with 8 MPI processes (mpi_comm_size = 8), using MKL routines in automatic offload (AO) mode. As you can see in the attached test code, I tried three different methods.
    METHOD-1: I assign one of the 4 MICs to each of the first 4 MPI processes and let the other processes run without a MIC. In this case the program works as expected, and I got the following performance result when solving zgemm for 5k*5k complex dense matrices.
    CPU_ID      0       1        2         3        4           5         6         7 
    time(s)     1.67   1.93   1.97   1.93   13.85   12.94   12.94  12.93

    METHOD-2: Now, this is the problematic situation. I want all 8 MPI processes to share the 4 MICs equally, expecting each to take about 4 seconds for the same zgemm problem as in METHOD-1. However, this method does not work: it gives error messages right away or after solving its first zgemm problem,
    *** glibc detected *** ../../../bin/test: malloc(): memory corruption: 0x00007f59fc000010 ***
    or
    CPU_ID      0       1        2         3        4           5         6         7 
    time(s)      101    10     101       95      26        25       14      14
    *** glibc detected *** ../../../bin/test: free(): corrupted unsorted chunks: 0x0000000009f47270 ***

    METHOD-3: If I replace mkl_mic_set_workdivision() with mkl_mic_set_resource_limit(), the program does not crash, but there is no response at all; I see that CPU and MIC usage are almost zero.

    Please take a look at the attached piece of my code and give some advice.
    Thank you.

    Attachment: test.cpp (1.5 KB)

    Xeon Phi and offload from MATLAB MEX file


    Hello,

    I am having a really hard time figuring out how to use the Xeon Phi offload mode from within MATLAB MEX files under Linux. I have managed to force MATLAB to use icc for compilation and verified that the mex files run fine. The problems start when using the offload pragma - as far as I can tell, nobody has tried that yet and I suspect this is some (fixable?) issue with libraries. Can someone here help me with this?

    Consider the following simple code

    int main()
    {
      __attribute__((target(mic : 0))) int vsize;
    
    #pragma offload target(mic:0)
      vsize = 10;
    }

    When I execute this with OFFLOAD_REPORT=3, I get the following output

    $ ./test
    [Offload] [HOST]          [State]           Initialize logical card 0 = physical card 0
    [Offload] [MIC 0] [File]                    test.c
    [Offload] [MIC 0] [Line]                    23
    [Offload] [MIC 0] [Tag]                     Tag 0
    [Offload] [HOST]  [Tag 0] [State]           Start target
    [Offload] [HOST]  [Tag 0] [State]           Setup target entry: __offload_entry_test_c_23mainicc0101288930704RqbsVt
    [Offload] [HOST]  [Tag 0] [State]           Host->target pointer data 0
    [Offload] [HOST]  [Tag 0] [Signal]          signal : none
    [Offload] [HOST]  [Tag 0] [Signal]          waits  : none
    [Offload] [HOST]  [Tag 0] [State]           Host->target pointer data 0
    [Offload] [HOST]  [Tag 0] [State]           Host->target copyin data 4
    [Offload] [HOST]  [Tag 0] [State]           Execute task on target
    [Offload] [HOST]  [Tag 0] [State]           Target->host pointer data 0
    [Offload] [MIC 0] [Tag 0] [State]           Start target entry: __offload_entry_test_c_23mainicc0101288930704RqbsVt
    [Offload] [MIC 0] [Tag 0] [Var]             vsize  INOUT
    [Offload] [HOST]  [Tag 0] [CPU Time]        0.301827(seconds)
    [Offload] [MIC 0] [Tag 0] [CPU->MIC Data]   4 (bytes)
    [Offload] [MIC 0] [Tag 0] [MIC Time]        0.000171(seconds)
    [Offload] [MIC 0] [Tag 0] [MIC->CPU Data]   4 (bytes)
    
    [Offload] [MIC 0] [Tag 0] [State]           Target->host copyout data   4

     

    I have written a MEX file that does the same thing. FYI, a MEX file is essentially a dynamic .so library with one specific symbol exported. The result of running the MEX file under MATLAB is as follows

     

    >> mictest
    [Offload] [HOST]          [State]           Initialize logical card 0 = physical card 0
    offload error: cannot load library to the device 0 (error code 5)
    
    ------------------------------------------------------------------------
           Segmentation violation detected at Fri Aug 21 14:57:31 2015
    ------------------------------------------------------------------------
    
    [...]

    I have looked around and tried to set the OFFLOAD_INIT=on_start variable before starting MATLAB. The results were VERY promising, but still some problems remain unsolved:

    [Offload] [MIC 0] [File]                    mictest_mex.c
    [Offload] [MIC 0] [Line]                    41
    [Offload] [MIC 0] [Tag]                     Tag 0
    [Offload] [HOST]  [Tag 0] [State]           Start target
    [Offload] [HOST]  [Tag 0] [State]           Setup target entry: __offload_entry_mictest_mex_c_41mexFunctionicc0104735023118W8NJ2
    [Offload] [HOST]  [Tag 0] [State]           Host->target pointer data 0
    [Offload] [HOST]  [Tag 0] [Signal]          signal : none
    [Offload] [HOST]  [Tag 0] [Signal]          waits  : none
    [Offload] [HOST]  [Tag 0] [State]           Host->target pointer data 0
    [Offload] [HOST]  [Tag 0] [State]           Host->target copyin data 4
    [Offload] [HOST]  [Tag 0] [State]           Execute task on target
    offload error: cannot create pipeline on the device 0 (error code 14)

    So it seems that the MIC is indeed doing something, but one last step is missing to make this work. The library paths and the whole bash environment are the same in both cases. I have also looked at the output of the nm command, and it seems that in both cases (standalone C and MATLAB MEX) the number and names of symbols containing the word 'offload' are the same or similar.

    I think this can be solved: I have seen a document about MKL using offload inside MATLAB, alas for Windows. Does anybody have a clue where to start?

    Thanks a lot!

    Marcin Krotkiewski

     

    Regarding sgemm benchmarks for MIC devices


    Hi Intel forums,

    I've had difficulty reproducing the performance reported on the following page:

    https://www-ssl.intel.com/content/www/us/en/benchmarks/server/xeon-phi/xeon-phi-sgemm-dgemm.html

    Using the MKL sgemm routine on my 3120-series Xeon Phi, I haven't even approached the 1.7 TFLOP/S level claimed above. The best performance I achieve is ~0.7 TFLOP/S. Presumably this is because I don't fully understand the threading and vectorization APIs and am not using them optimally. Does anyone know where to find the source and environment details used for Intel's official benchmark? Then I could compare "correct" usage with my code to better understand the tools.

    Thanks,

    Chris

    Current status of OFED support


    Hello,

    Could you please provide us with a matrix of which OFED should be used with which OS distribution for the latest MPSS (3.5.2, Linux)? We need to know in which cases MPSS supports the bundled OFED and in which cases it is better to use an alternative OFED distribution.

    Best regards,

    Taras


    How Xeon Phi divides address space with distributed L2


    Hello,

    I have been working with the Knights Corner platform for some time. Like libnuma and DPDK do, I have been wondering if I could write cache- and memory-controller-aware memory allocation code for Xeon Phi. Last time I asked, I didn't get much information on the subject (https://software.intel.com/en-us/comment/1799811#comment-1799811), but then I came across this while browsing through the datasheet (http://www.intel.com/content/dam/www/public/us/en/documents/datasheets/x...).

    "Communication around the ring follows a Shortest Distance Algorithm (SDA). Coresident with each core structure is a portion of a distributed tag directory. These tags are hashed to distribute workloads across the enabled cores. Physical addresses are also hashed to distribute memory accesses across the memory controllers."

    I believe a full description of this scheme is the answer I'm seeking.

    If someone from Intel could provide me with a more detailed explanation, or point to where I could find one, I would be very grateful. More honestly, I NEED to know this.
    For instance:
     a) How does it hash physical addresses? Does it divide the 40-bit physical address space by the cache line size (64B) and distribute the 0x400000000 (2^34) cache lines to the DTDs by performing a modulo operation on the ordinal number of each cache line?
     b) Is the L2 address-space segmentation in PA space somewhat preserved in VA space as well? For instance, would every 60th cache line belong to a specific core's L2 tag directory?
     c) Which of the following does "enabled cores" mean: i) all on-board cores, ii) cores with any executing threads, or iii) cores not disabled by some means I'm not aware of? If iii) is the case, how do you disable a core?

    For your information, I am currently using Xeon Phi 5110P, and could possibly be purchasing/using more 31S1Ps.

    Thank you for your attention.

    Jun

    Intel® Parallel Studio XE 2016: High Performance for HPC Applications and Big Data Analytics


    Intel® Parallel Studio XE 2016, launched on August 25, 2015, is the latest installment in our developer toolkit for high performance computing (HPC) and technical computing applications. This suite of compilers, libraries, debugging facilities, and analysis tools, targets Intel® architecture, including support for the latest Intel® Xeon® processors (codenamed Skylake) and Intel® Xeon Phi™ processors (codenamed Knights Landing). Intel® Parallel Studio XE 2016 helps software developers design, build, verify and tune code in Fortran, C++, C, and Java.

    There are four things that I like to highlight when I describe this year's tool release:

    1. Intel® Data Analytics Acceleration Library
    2. Vectorization Advisor
    3. MPI Performance Snapshot
    4. High performance support for industry standards, the latest processors, operating systems and their related development environments.

    Intel Data Analytics Acceleration Library (Intel® DAAL)

    Data scientists are finding Intel® DAAL very exciting because it helps speed big data analytics. It's designed for use with popular data platforms including Hadoop, Spark, R, and Matlab, for highly efficient data access. We've seen Intel DAAL accelerate PCA by 4-7X, and one customer has seen 200X for the Alternating Least Squares prediction algorithm, compared with the latest open-source Spark + MLlib (details for both claims are in my blog about DAAL). Intel DAAL was created by the renowned team behind the Intel® Math Kernel Library (Intel® MKL). Intel DAAL can be thought of as "Intel MKL for Big Data" – but it is actually much more! Many more details on Intel DAAL, including ways to download it today for free, are in my blog about DAAL. Intel DAAL is available for Linux, OS X and Windows. 

    Vectorization Advisor

    Vectorization is the process of using SIMD instructions in processors. In the quest to "modernize" applications to get top performance out of any modern processor, a software developer needs to tackle multithreading, vectorization, and fabric scaling. Intel® Advisor XE 2016 provides tools to help with multithreading and vectorization:

    • Vectorization Advisor is an analysis tool that helps identify the loops that will benefit most from vectorization, identifies the obstacles to vectorization particular to your program, lets you explore the benefit of alternative data reorganizations, and increases confidence that transformations aimed at increasing vectorization will preserve the correctness of your original program.
    • Threading Advisor is a threading design and prototyping tool that lets you analyze, design, tune, and check threading design options rapidly.

    Threading Advisor has gained a reputation over the past five years for helping developers find the right approach to multithreading an application more quickly and without costly oversights. The experience of refining this 'advisor' helped us create the new vectorization advisor, applying what we have learned about the best ways to give advice based on program analysis.

    Vectorization Advisor cannot tell you anything we could not show you how to do yourself. However, when I teach vectorization I tend to rattle off a list of things to check, and each item involves using a tool in a particular way. Bringing all of that into one tool makes life easier and definitely makes the process faster and more efficient. One key Vectorization Advisor feature is a Survey Report that offers integrated compiler report data and performance data all in one place, including GUI-embedded advice on how to fix vectorization issues specific to your code. That advice is augmented with links to web-based vectorization resources.

    An excellent 12 minute introduction to the Vectorization Advisor is available as a video online.

    MPI Performance Snapshot

    The MPI Performance Snapshot is a scalable lightweight performance tool for MPI applications. It collects a variety of MPI application statistics (such as communication, activity, and load balance) and presents it in an easy-to-read format. The tool is not available separately but is provided as part of the Intel® Parallel Studio XE 2016 Cluster Edition.

    The MPI Performance Snapshot is trying to solve the following problems in the analysis of MPI applications when scaling out to thousands of ranks:

    1. Cluster sizes continue to grow, so applications are becoming more and more scalable
    2. Profiling at larger scale collects large amounts of data, which can easily become unmanageable
    3. It's hard to identify the key metrics to track when you gather so much data

    By addressing these three items, MPI Performance Snapshot improves scaling to at least 32K ranks, an order of magnitude beyond what is tolerable with the prior Intel Trace Analyzer and Collector. When aiming to optimize a large-scale run (anything above 1000 MPI ranks), we therefore now suggest starting with the MPI Performance Snapshot to figure out where you need to dig deeper (which processes are slowing you down, where the peaks in your memory usage are, etc.). Then do another run with the Intel Trace Analyzer and Collector on a subset of selected ranks to get more detailed per-process information, visualize how a communication algorithm is implemented, and see whether there are apparent bottlenecks.

    MPI Performance Snapshot combines lightweight statistics from the Intel® MPI Library with OS and hardware-level counters to provide you with high-level categorization of your application: MPI vs. OpenMP load imbalance info, memory usage, and a break-down of MPI vs. computation vs. serial time.

    For more details, you should check out the full MPI Performance Snapshot User's Guide and Analyzing MPI Applications with MPI Performance Snapshot on the Intel Trace Analyzer and Collector documentation page.

    High performance support for…

    The latest processors...

are supported, including the Skylake microarchitecture and the Knights Landing microarchitecture.

    The latest industry standards...

    We take pride in having very strong support for industry standards – we aim to be a leader and maintain our reputation of being second-to-none. 

    Our Fortran support even includes a feature from the draft Fortran 2015 standard which can help MPI-3 users. The current status of features of Fortran can be found in Dr. Fortran’s blog “Intel® Fortran Compiler - Support for Fortran language standards.”

    The current status of C/C++ standard support features can be found in Jennifer’s blogs “C++14 Features Supported by Intel® C++ Compiler” and “C11 Support in Intel C++ Compiler.” 

    Our OpenMP support is detailed in the latest user guide for the C/C++ compiler and the latest user guide for the Fortran compiler.

Operating system support includes Debian 7.0, 8.0; Fedora 21, 22; Red Hat Enterprise Linux 5, 6, 7; SuSE LINUX Enterprise Server 11, 12; Ubuntu 12.04 LTS (64-bit only), 13.10, 14.04 LTS, 15.04; OS X 10.10; Windows 7 through 10; and Windows Server 2008 through 2012. These are just the versions we have tested; many additional operating systems should work (for instance, CentOS).

    Learn More

A series of webinars starting in September 2015 covers many topics related to Intel Parallel Studio XE 2016. The webinars can be attended live, with interactive question and answer time, and will also be available for replay afterwards. The first webinar is on September 1 – “What’s New in Intel® Parallel Studio XE 2016?”

    Many more ways to learn more are on the Intel® Parallel Studio XE 2016 website.  A number of benchmarks illustrating performance measurements are online as well.

There are many new features that I did not dive into, including great new support for MPI+OpenMP tuning with Intel VTune Amplifier XE, as well as a number of enhancements to Intel® Threading Building Blocks, including the increasingly popular flow graph capabilities and task arenas.

    Download Intel® Parallel Studio XE 2016 today

    An evaluation copy can be obtained by requesting an evaluation copy of Intel® Parallel Studio XE 2016. It is available for purchase worldwide.

    Students, educators, academic researchers and open source contributors may qualify for some free tools.

The Intel Performance Libraries are also available via Community Licensing for Intel Performance Libraries. Under this option, the libraries are free for anyone who registers, with no royalties and no restrictions on company or project size. The community licensing program offers the current versions of the libraries without Intel Premier Support access. (Intel Premier Support offers exclusive 1-on-1 support via an interactive and secure web site where you can submit questions or problems and monitor previously submitted issues. It requires registration after purchase of the software, or the special qualification offered to students, educators, academic researchers and open source contributors.)

     

     

     

  • No Cost Options for Intel Parallel Studio XE, Support yourself, Royalty-Free


    Intel® Parallel Studio XE is a very popular product from Intel that includes the Intel Compilers, Intel Performance Libraries, tools for analysis, debugging and tuning, tools for MPI and the Intel MPI Library. Did you know that some of these are available for free?

    Here is a guide to “what is available free” from the Intel Parallel Studio XE suites.

Community Licenses for Everyone

• What is free: Intel® Math Kernel Library, Intel® Data Analytics Acceleration Library, Intel® Threading Building Blocks, Intel® Integrated Performance Primitives
• Information: Community Licensing for Intel Performance Libraries – free for all, registration required, no royalties, no restrictions on company or project size, current versions of the libraries, no Intel Premier Support access. (Linux, Windows or OS X versions)
• Where: Community Licensing for Intel Performance Libraries

Evaluation Copies for Everyone

• What is free: Compilers, libraries and analysis tools (most everything!)
• Information: Evaluation copies – try before you buy. (Linux, Windows or OS X versions)
• Where: Try before you buy

Use as an Academic Researcher

• What is free: Intel® Math Kernel Library, Intel® Data Analytics Acceleration Library, Intel® Threading Building Blocks, Intel® Integrated Performance Primitives, Intel® MPI Library (not available for OS X)
• Information: If you will use the tools in conjunction with academic research at an institution of higher education. (Linux, Windows or OS X versions, except the Intel® MPI Library, which is not supported on OS X)
• Where: Qualify for Use as an Academic Researcher

Student

• What is free: Compilers, libraries and analysis tools (most everything!)
• Information: If you are a current student at a degree-granting institution. (Linux, Windows or OS X versions)
• Where: Qualify for Use as a Student

Teacher

• What is free: Compilers, libraries and analysis tools (most everything!)
• Information: If you will use the tools in a teaching curriculum. (Linux, Windows or OS X versions)
• Where: Qualify for Use as an Educator

Use as an Open Source Contributor

• What is free: Intel® Parallel Studio XE Professional Edition for Linux
• Information: If you are a developer actively contributing to open source projects – and that is why you will use the tools. (Linux versions)
• Where: Qualify for Use as an Open Source Contributor

Free licenses for certain users have always been an important dimension in our offerings. One thing that really distinguishes Intel is that we sell excellent tools and provide second-to-none support for software developers who buy them. We provide multiple options, and we hope you will find exactly what you need in one of them.

     

  • Debugging Intel® Xeon Phi™ Applications on Linux* Host



    Introduction

Intel® Xeon Phi™ coprocessor is a product based on the Intel® Many Integrated Core Architecture (Intel® MIC). Intel offers a debug solution for this architecture that can debug applications running on an Intel® Xeon Phi™ coprocessor.

There are many reasons why a debug solution for Intel® MIC is needed. Some of the most important are:

• Developing native Intel® MIC applications is as easy as for IA-32 or Intel® 64 hosts. In most cases they just need to be cross-compiled (-mmic).
  Yet, the Intel® MIC Architecture differs from the host architecture. Those differences can unveil existing issues, and incorrect tuning for Intel® MIC can introduce new ones (e.g. data alignment, whether an application can handle hundreds of threads, efficient memory consumption, etc.).
• Developing offload enabled applications adds complexity, as host and coprocessor share the workload.
• General lower level analysis, tracing execution paths, learning the instruction set of Intel® MIC Architecture, …

    Debug Solution for Intel® MIC

    For Linux* host, Intel offers a debug solution for Intel® MIC which is based on GNU* GDB. It can be used on the command line for both host and coprocessor. There is also an Eclipse* IDE integration that eases debugging of applications with hundreds of threads thanks to its user interface. It also supports debugging offload enabled applications.

    How to get it?

There are currently two ways to obtain Intel’s debug solution for Intel® MIC Architecture on Linux* host: as part of Intel® Parallel Studio XE Composer Edition, or as part of the Intel® Manycore Platform Software Stack (MPSS).

Both packages contain debug solutions for Intel® MIC Architecture!

    Attention:
    Never mix debugging tools from Intel® Parallel Studio XE with the ones from Intel® Manycore Platform Software Stack! Use all tools from the very same package. Different packages might have different debugger versions with different feature sets.

    Note:
Intel® Composer XE 2013 SP1 contains GNU* GDB 7.5. Intel® Parallel Studio XE 2015 provides GNU* GDB 7.7, and Intel® Parallel Studio XE 2015 Update 2 provides GNU* GDB 7.8 (host only; 7.7 for the coprocessor). Intel® Parallel Studio XE 2016 contains GNU* GDB 7.8 for both host & coprocessor.
MPSS versions ship different versions of GNU* GDB – please check the Release Notes of the individual MPSS releases.
There has been a change in product naming: starting with 2015, Intel® Parallel Studio XE Composer Edition is the successor of Intel® Composer XE.

    Why use GNU* GDB provided by Intel?

    • New features/improvements offered back to GNU* community
    • Latest GNU* GDB versions in future releases
    • Improved C/C++ & Fortran support thanks to Project Archer and contribution through Intel
    • Increased support for Intel® architecture (esp. Intel® MIC)
    • Additional debugging capabilities – more later

    Latest Intel related HW support and features are provided in the debug solution from Intel!

    Why is Intel providing a Command Line and Eclipse* IDE Integration?

    The command line with GNU* GDB has the following advantages:

    • Well known syntax
    • Lightweight: no dependencies
    • Easy setup: no project needs to be created
    • Fast for debugging hundreds of threads
• Can be automated/scripted

    Using the Eclipse* IDE provides more features:

    • Comfortable user interface
    • Most known IDE in the Linux* space
    • Use existing Eclipse* projects
    • Simple integration of the Intel enhanced GNU* GDB
    • Works also with Photran* plug-in to support Fortran
    • Supports debugging of offload enabled applications
      (not supported by command line)

    Features

Intel’s GNU* GDB, starting with version 7.5, provides the following additional extensions:

    • Support for Intel® Many Integrated Core Architecture (Intel® MIC Architecture):
      Displays registers (zmmX & kX) and disassembles the instruction set
    • Support for Intel® Transactional Synchronization Extensions (Intel® TSX):
      Helpers for Restricted Transactional Memory (RTM) model
      (only for host)
    • Data Race Detection (pdbx):
      Detect and locate data races for applications threaded using POSIX* thread (pthread) or OpenMP* models
    • Branch Trace Store (btrace):
      Record branches taken in the execution flow to backtrack easily after events like crashes, signals, exceptions, etc.
      (only for host)
    • Pointer Checker:
      Assist in finding pointer issues if compiled with Intel® C++ Compiler and having Pointer Checker feature enabled
      (only for host)
    • Register support for Intel® Memory Protection Extensions (Intel® MPX) and Intel® Advanced Vector Extensions 512 (Intel® AVX-512):
      Debugger is already prepared for future generations
    • And more...

    The features for Intel® MIC Architecture highlighted above are described in the following.
Note that newer GNU* GDB versions with more features are already available, but those do not add anything further for Intel® MIC Architecture.

    Register and Instruction Set Support

    Compared to Intel® architecture on host systems, Intel® MIC Architecture comes with a different instruction and register set. Intel’s GNU* GDB comes with transparently integrated support for those.  Use is no different than with host systems, e.g.:

    • Disassembling of instructions:
      
(gdb) disassemble $pc, +10
Dump of assembler code from 0x11 to 0x24:
0x0000000000000011 <foobar+17>: vpackstorelps %zmm0,-0x10(%rbp){%k1}
0x0000000000000018 <foobar+24>: vbroadcastss -0x10(%rbp),%zmm0
⁞


In the above example the instructions beginning at the instruction pointer ($pc) are disassembled. Only the first two lines are shown for brevity. Both instructions are Intel® MIC specific and their mnemonics are correctly shown.
       
    • Listing of mask (kX) and vector (zmmX) registers:
      
(gdb) info registers zmm
k0   0x0  0
     ⁞
zmm31 {v16_float = {0x0 <repeats 16 times>},
      v8_double = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
      v64_int8 = {0x0 <repeats 64 times>},
      v32_int16 = {0x0 <repeats 32 times>},
      v16_int32 = {0x0 <repeats 16 times>},
      v8_int64 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
      v4_uint128 = {0x0, 0x0, 0x0, 0x0}}


The register set has also been extended by the kX (mask) and zmmX (vector) registers that come with Intel® MIC.

    If you use the Eclipse* IDE integration you’ll get the same information in dedicated windows:

    • Disassembling of instructions:
      Eclipse* IDE Disassembly Window
    • Listing of mask (kX) and vector (zmmX) registers:
      Eclipse* IDE Register Window

    Data Race Detection

    A quick excursion about what data races are:

    • A data race happens…
      If at least two threads/tasks access the same memory location w/o synchronization and at least one thread/task is writing.
    • Example:
Imagine the two functions thread1() & thread2() being executed concurrently by different threads.

      
int a = 1;
int b = 2;
                                         | t
int thread1() {      int thread2() {     | i
  return a + b;        b = 42;           | m
}                    }                   | e
                                         v


      Return value of thread1() depends on timing: 3 vs. 43!
      This is one (trivial) example of a data race.

    What are typical symptoms of data races?

    • Data race symptoms:
      • Corrupted results
      • Run-to-run variations
      • Corrupted data ending in a crash
      • Non-deterministic behavior
    • Solution is to synchronize concurrent accesses, e.g.:
      • Thread-level ordering (global synchronization)
      • Instruction level ordering/visibility (atomics)
        Note:
        Race free but still not necessarily run-to-run reproducible results!
      • No synchronization: data races might be acceptable

GDB data race detection points out unsynchronized data accesses. Not every reported access is a harmful data race; it is the responsibility of the user to decide which ones are unexpected and to filter out the rest (see below).
Due to technical limitations not all unsynchronized data accesses can be found, e.g. in 3rd party libraries or in any object code not compiled with -debug parallel (see below).

    How to detect data races?

    • Prepare to detect data races:
      • Only supported with Intel® C++/Fortran Compiler:
        Compile with -debug parallel (icc, icpc or ifort)
Only objects compiled with -debug parallel are analyzed!
      • Optionally, add debug information via –g
    • Enable data race detection (PDBX) in debugger:
      
(gdb) pdbx enable
(gdb) c
data race detected
1: write shared, 4 bytes from foo.c:36
3: read shared, 4 bytes from foo.c:40
Breakpoint -11, 0x401515 in L_test_..._21 () at foo.c:36
*var = 42; /* bp.write */

    Data race detection requires an additional library libpdbx.so.5:

    • Keeps track of the synchronizations
    • Part of Intel® C++ & Fortran Compiler
    • Copy to coprocessor if missing
      (found at <install-dir>/compilers_and_libraries/linux/lib/mic/libpdbx.so)

    Supported parallel programming models:

    • OpenMP*
    • POSIX* threads

Data race detection can be enabled/disabled at any time:

• Only memory accesses within the enabled period are analyzed
• This keeps memory footprint and run-time overhead minimal

    There is finer grained control for minimizing overhead and selecting code sections to analyze by using filter sets.

More control over what to analyze is available with filters:

    • Add filter to selected filter set, e.g.:
      
(gdb) pdbx filter line foo.c:36
(gdb) pdbx filter code 0x40518..0x40524
(gdb) pdbx filter var shared
(gdb) pdbx filter data 0x60f48..0x60f50
(gdb) pdbx filter reads # read accesses

Those define various filters on instructions (by specifying source file and line, or an address range) or on variables (using symbol names or address ranges, respectively). There is also a filter to only report accesses that use (read) data in case of a data race.
       
• There are two basic configurations, which are mutually exclusive:
       
      • Ignore events specified by filters (default behavior)
        
        				(gdb) pdbx fset suppress
        
        				
      • Ignore events not specified by filters
        
        				(gdb) pdbx fset focus
        
        				

The first configuration uses the filters as a suppression list (the specified events are ignored), whilst the second uses them as a focus list (only the specified code or data sections are analyzed).
         
    • Get debug command help
      
      		(gdb) help pdbx
      
      		

      This command will provide additional help on the commands.

    Use cases for filters:

    • Focused debugging, e.g. debug a single source file or only focus on one specific memory location.
• Limit overhead and control false positives. Detection involves some run-time and memory overhead. The more the filters narrow down the scope of the analysis, the lower that overhead becomes. Filters can also be used to exclude false positives, i.e. real data races that by design have no impact on the application’s correctness (e.g. results of multiple threads don’t need to be stored globally in strict order).
• Exclude 3rd party code from analysis
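As a sketch, such a filter setup can be collected in a GDB command file and loaded with the source command (the file name and the filtered locations are hypothetical, reusing the commands shown above):

```
# pdbx-focus.gdb -- load in a session with: (gdb) source pdbx-focus.gdb
pdbx enable
# Restrict analysis to one source line and one variable ...
pdbx filter line foo.c:36
pdbx filter var shared
# ... and treat the filters as a focus list (whitelist).
pdbx fset focus
```

This keeps a focused-debugging configuration reusable across sessions instead of retyping the filters each time.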

    Some additional hints using PDBX:

    • Optimized code (symptom):
      
(gdb) run
data race detected
1: write question, 4 bytes from foo.c:36
3: read question, 4 bytes from foo.c:40
Breakpoint -11, 0x401515 in foo () at foo.c:36
*answer = 42;
(gdb)

       
    • Incident has to be analyzed further:
      • Remember: data races are reported on memory objects
      • If symbol name cannot be resolved: only address is printed
         
    • Recommendation:
Unoptimized code (-O0) is easier to understand, since optimization removes temporaries, reorders code, etc.
       
    • Reported data races appear to be false positives:
      • Not all data races are bad… user intended?
      • OpenMP*: Distinct parallel sections using the same variable (same stack frame) can result in false positives

    Note:
    PDBX is not available for Eclipse* IDE and will only work for remote debugging of native coprocessor applications. See section Debugging Remotely with PDBX for more information on how to use it.

    Debugging on Command Line

    There are multiple versions available:

    • Debug natively on Intel® Xeon Phi™ coprocessor
    • Execute GNU* GDB on host and debug remotely

    Debug natively on Intel® Xeon Phi™ coprocessor
    This version of Intel’s GNU* GDB runs natively on the coprocessor. It is included in Intel® MPSS only and needs to be made available on the coprocessor first in order to run it. Depending on the MPSS version it can be found at the provided location:

    • MPSS 2.1: /usr/linux-k1om-4.7/linux-k1om/usr/bin/gdb
    • MPSS 3.*: included in gdb-7.*+mpss3.*.k1om.rpm as part of package mpss-3.*-k1om.tar
      (for MPSS 3.1.2, please see Errata, for MPSS 3.1.4 use mpss-3.1.4-k1om-gdb.tar)

      For MPSS 3.* the coprocessor native GNU* GDB requires debug information from some system libraries for proper operation. Please see Errata for more information.

    Execute GNU* GDB on host and debug remotely
    There are two ways to start GNU* GDB on the host and debug remotely using GDBServer on the coprocessor:

    • Intel® MPSS:
      • MPSS 2.1: /usr/linux-k1om-4.7/bin/x86_64-k1om-linux-gdb
      • MPSS 3.*: <mpss_root>/sysroots/x86_64-mpsssdk-linux/usr/bin/k1om-mpss-linux/k1om-mpss-linux-gdb
      • GDBServer:
        /usr/linux-k1om-4.7/linux-k1om/usr/bin/gdbserver
        (same path for MPSS 2.1 & 3.*)
    • Intel® Parallel Studio XE Composer Edition:
      • Source environment to start GNU* GDB:
        
        				$ source compilervars.[sh|csh] [ia32|intel64]
        
        				$ gdb-mic
        
        				
      • GDBServer:
        <install-dir>/debugger_2016/gdb/targets/mic/bin/gdbserver

    The sourcing of the debugger environment is only needed once. If you already sourced the according compilervars.[sh|csh] script you can omit this step and gdb-mic should already be in your default search paths.

    Attention: Do not mix GNU* GDB & GDBServer from different packages! Always use both from either Intel® MPSS or Intel® Parallel Studio XE Composer Edition!

    Debugging Natively

    1. Make sure GNU* GDB is already on the target by:
      • Copy manually, e.g.:
        • MPSS 2.1:
          
          						$ scp /usr/linux-k1om-4.7/linux-k1om/usr/bin/gdb mic0:/tmp
          
          						
        • MPSS 3.*: Install gdb-7.*+mpss3.*.k1om.rpm
      • Add to the coprocessor image (see Intel® MPSS documentation)
         
    2. Run GNU* GDB on the Intel® Xeon Phi™ coprocessor, e.g.:
      • MPSS 2.1:
        
$ ssh -t mic0 /tmp/gdb
        
        				
      • MPSS 3.*:
        
$ ssh -t mic0 /usr/bin/gdb
        
        				

         
    3. Initiate debug session, e.g.:
      • Attach:
        
        				(gdb) attach <pid>

        <pid> is PID on the coprocessor
         
      • Load & execute:
        
        				(gdb) file <path_to_application>

        <path_to_application> is path on coprocessor

    Some additional hints:

    • If native application needs additional libraries:
      Set $LD_LIBRARY_PATH, e.g. via:
      
      		(gdb) set env LD_LIBRARY_PATH=/tmp/
      
      		

      …or set the variable before starting GDB
       
    • If source code is relocated, help the debugger to find it:
      
      		(gdb) set substitute-path <from> <to>

Change paths from <from> to <to>. You can relocate a whole source (sub-)tree with that.

    Debugging is no different than on host thanks to a real Linux* environment on the coprocessor!

    Debugging Remotely

    1. Copy GDBServer to coprocessor, e.g.:
      • Intel® MPSS:
        
        				$ scp /usr/linux-k1om-4.7/linux-k1om/usr/bin/gdbserver mic0:/tmp
        
        				
      • Intel® Parallel Studio XE Composer Edition:
        
        				$ scp <install-dir>/debugger_2016/gdb/targets/mic/bin/gdbserver mic0:/tmp

        During development you can also add GDBServer to your coprocessor image!
         
    2. Start GDB on host, e.g.:
      
      		$ source compilervars.[sh|csh] [ia32|intel64]
      
      		$ gdb-mic
      
      		


      Note:
      There are also versions named gdb-ia and gdb-ia-mic which are for IA-32/Intel® 64 only!
      (Only for Intel® Parallel Studio XE 2015 Update 2 Composer Edition: gdb-ia is 7.8, gdb-ia-mic is 7.7)
    3. Connect:
      
(gdb) target extended-remote | ssh -T mic0 /tmp/gdbserver --multi -
      
      		

       
    4. Set sysroot from MPSS installation, e.g.:
      
      		(gdb) set sysroot /opt/mpss/3.1.4/sysroots/k1om-mpss-linux/
      
      		

      If you do not specify this you won't get debugger support for system libraries.
       
    5. Debug:
      • Attach:
        
        				(gdb) file <path_to_application>
        
        				(gdb) attach <pid>

        <path_to_application> is path on host, <pid> is PID on the coprocessor
         
      • Load & execute:
        
        				(gdb) file <path_to_application>
        
        				(gdb) set remote exec-file <remote_path_to_application>

        <path_to_application> is path on host, <remote_path_to_application> is path on the coprocessor

    Some additional hints:

    • If remote application needs additional libraries:
      Set $LD_LIBRARY_PATH, e.g. via:
      
      		(gdb) target extended-remote | ssh mic0 LD_LIBRARY_PATH=/tmp/ /tmp/gdbserver --multi -
      
      		

       
    • If source code is relocated, help the debugger to find it:
      
      		(gdb) set substitute-path <from> <to>

      Change paths from <from> to <to>. You can relocate a whole source (sub-)tree with that.
       
    • If libraries have different paths on host & target, help the debugger to find them:
      
      		(gdb) set solib-search-path <lib_paths>

      <lib_paths> is a colon separated list of paths to look for libraries on the host

    Debugging is no different than on host thanks to a real Linux* environment on the coprocessor!

    Debugging Remotely with PDBX

PDBX has some pre-requisites that must be fulfilled for proper operation. Use the pdbx check command to verify that PDBX is working:

    1. First step:
      
(gdb) pdbx check
checking inferior...failed.


      Solution:
Start a remote application (inferior) and hit some breakpoint (e.g. b main, then run)
       
    2. Second step:
      
(gdb) pdbx check
checking inferior...passed.
checking libpdbx...failed.


      Solution:
      Use set solib-search-path <lib_paths> to provide the path of libpdbx.so.5 on the host.
       
    3. Third step:
      
(gdb) pdbx check
checking inferior...passed.
checking libpdbx...passed.
checking environment...failed.


      Solution:
Set additional environment variables on the target for OpenMP*. Those need to be set when starting GDBServer (similar to setting $LD_LIBRARY_PATH).
    • $INTEL_LIBITTNOTIFY32=""
    • $INTEL_LIBITTNOTIFY64=""
    • $INTEL_ITTNOTIFY_GROUPS=sync

    Debugging with Eclipse* IDE

    Intel offers an Eclipse* IDE debugger plug-in for Intel® MIC that has the following features:

    • Seamless debugging of host and coprocessor
    • Simultaneous view of host and coprocessor threads
    • Supports multiple coprocessor cards
    • Supports both C/C++ and Fortran
    • Support of offload extensions (auto-attach to offloaded code)
    • Support for Intel® Many Integrated Core Architecture (Intel® MIC Architecture): Registers & Disassembly

    Eclipse* IDE with Offload Debug Session

    The plug-in is part of both Intel® MPSS and Intel® Parallel Studio XE Composer Edition.

    Pre-requisites

    In order to use the provided plug-in the following pre-requisites have to be met:

    • Supported Eclipse* IDE version:
      • 4.5 with Eclipse C/C++ Development Tools (CDT) 8.7 or later
      • 4.4 with Eclipse C/C++ Development Tools (CDT) 8.3 or later
      • 4.2 & 4.3 with Eclipse C/C++ Development Tools (CDT) 8.1 or later

    We recommend: Eclipse* IDE for C/C++ Developers (4.5)

    • Java* Runtime Environment (JRE) 6.0 or later (7.0 for Eclipse* 4.4)
    • For Fortran optionally Photran* plug-in
    • Remote System Explorer (aka. Target Management) to debug native coprocessor applications
    • Only for plug-in from Intel® Parallel Studio XE Composer Edition, source compilervars.[sh|csh] for Eclipse* IDE environment!

    Install Intel® C++ Compiler plug-in (optional):
    Add plug-in via “Install New Software…”:
    Install Intel® C++ Compiler plug-in (optional)
    This Plug-in is part of Intel® Parallel Studio XE Composer Edition (<install-dir>/ide_support_2016/eclipse/compiler_xe/). It adds Intel® C++ Compiler support which is not mandatory for debugging. For Fortran the counterpart is the Photran* plug-in. These plug-ins are recommended for the best experience.

    Note:
    Uncheck “Group items by category”, as the list will be empty otherwise!
    In addition, it is recommended to disable checking for latest versions. If not done, installation could take unnecessarily long and newer components might be installed that did not come with the vanilla Eclipse package. Those could cause problems.

    Install Plug-in for Offload Debugging

    Add plug-in via “Install New Software…”:
    Install Plug-in for Offload Debugging

    Plug-in is part of:

    • Intel® MPSS:
      • MPSS 2.1: <mpss_root>/eclipse_support/
      • MPSS 3.*: /usr/share/eclipse/mic_plugin/
    • Intel® Parallel Studio XE Composer Edition:<install-dir>/ide_support_2016/eclipse/gdb_xe/

    Note:
    Uncheck “Group items by category”, as the list will be empty otherwise!
    In addition, it is recommended to disable checking for latest versions. If not done, installation could take unnecessarily long and newer components might be installed that did not come with the vanilla Eclipse package. Those could cause problems.

    Configure Offload Debugging

    • Create a new debug configuration for “C/C++ Application”
    • Click on “Select other…” and select MPM (DSF) Create Process Launcher:Configure Offload Debugging
      The “MPM (DSF) Create Process Launcher” needs to be used for our plug-in. Please note that this instruction is for both C/C++ and Fortran applications! Even though Photran* is installed and a “Fortran Local Application” entry is visible (not in the screenshot above!) don’t use it. It is not capable of using MPM.
       
    • In “Debugger” tab specify MPM script of Intel’s GNU* GDB:
      • Intel® MPSS:
        • MPSS 2.1: <mpss_root>/mpm/bin/start_mpm.sh
        • MPSS 3.*: /usr/bin/start_mpm.sh
          (for MPSS 3.1.1, 3.1.2 or 3.1.4, please see Errata)
      • Intel® Parallel Studio XE Composer Edition:
        <install-dir>/debugger_2016/mpm/mic/bin/start_mpm.sh
        Configure Offload Debugging (Debugger)
        Here, you finally add Intel’s GNU* GDB for offload debugging (using MPM (DSF)). It is a script that takes care of setting up the full environment needed. No further configuration is required (e.g. which coprocessor cards, GDBServer & ports, IP addresses, etc.); it works fully automatic and transparent.

    Start Offload Debugging

Debugging offload enabled applications is not much different from debugging native host applications:

    • Create & build an executable with offload extensions (C/C++ or Fortran)
    • Don’t forget to add debug information (-g) and reduce optimization level if possible (-O0)
    • Start debug session:
      • Host & target debugger will work together seamlessly
      • All threads from host & target are shown and described
      • Debugging is the same as from within the Eclipse* IDE

    Eclipse* IDE with Offload Debug Session (Example)

    This is an example (Fortran) of what offload debugging looks like. On the left side we see host & mic0 threads running. One thread (11) from the coprocessor has hit the breakpoint we set inside the loop of the offloaded code. Run control (stepping, continuing, etc.), setting breakpoints, evaluating variables/memory, … work as they used to.

    Additional Requirements for Offload Debugging

    For debugging offload enabled applications additional environment variables need to be set:

    • Intel® MPSS 2.1:
      COI_SEP_DISABLE=FALSE
      MYO_WATCHDOG_MONITOR=-1

       
    • Intel® MPSS 3.*:
      AMPLXE_COI_DEBUG_SUPPORT=TRUE
      MYO_WATCHDOG_MONITOR=-1

    Set those variables before starting Eclipse* IDE!

    Those are currently needed but might become obsolete in the future.

    For MPSS 2.1, please be aware that the debugger cannot and should not be used in combination with Intel® VTune™ Amplifier XE (COI_SEP_DISABLE=FALSE). Hence, disabling SEP (as part of Intel® VTune™ Amplifier XE) is valid.

    For MPSS 3.*, AMPLXE_COI_DEBUG_SUPPORT=TRUE extracts K1OM object code map files from fat SOs (containing host & K1OM object code) and places them under /tmp/coi_procs/<card #>/<process ID>/load_lib/ on the coprocessor. This is not only required for Intel® VTune™ Amplifier XE but also for the debugger. Additionally, use the mic_extract tool to extract K1OM object code from fat SOs on the host (where the Eclipse* IDE runs). Otherwise the current debugger won’t find the K1OM object code on the host, e.g.:

    
    	$ mic_extract libx.so
    
    	

    If libx.so contains K1OM object code as well, another file is created alongside libx.so, like libxMIC.so. The latter contains the K1OM object code. See https://software.intel.com/en-us/node/524818 for more information.

    In addition, the watchdog monitor must be disabled because a debugger can stop execution for an unspecified amount of time. The system watchdog might otherwise assume that a debugged application that no longer responds is dead and terminate it. For debugging we do not want that.

    Note:
    Do not set those variables for a production system!

    For Intel® MPSS 3.2 and later:
    MYO debug libraries are no longer installed with Intel MPSS 3.2 by default. This is a change from earlier Intel MPSS versions. Users must install the MYO debug libraries manually in order to debug MYO enabled applications using the Eclipse plug-in for offload debugging. For Intel MPSS 3.2 (and later) the MYO debug libraries can be found in the package mpss-myo-dbg-* which is included in the mpss-*.tar file.

    MPSS 3.2 and 3.2.1 do not support offload debugging with Intel® Composer XE 2013 SP1, please see Errata for more information!

    Configure Native Debugging

    Configure Remote System Explorer
    To debug native coprocessor applications we need to configure the Remote System Explorer (RSE).

    Note:
    Before you continue, make sure SSH works (e.g. via command line). You can also specify different credentials (user account) via RSE and save the password.

    The basic steps are quite simple:

    1. Show the Remote System window:
      Menu Window->Show View->Other…
      Select: Remote Systems->Remote Systems
       
    2. Add a new system node for each coprocessor:
      RSE Remote Systems Window
      Context menu in window Remote Systems: New Connection…
    • Select Linux, press Next>
    • Specify hostname of the coprocessor (e.g. mic0), press Next>
    • In the following dialogs select:
      • ssh.files
      • processes.shell.linux
      • ssh.shells
      • ssh.terminals

    Repeat this step for each coprocessor!

    Transfer GDBServer
    Transfer of the GDBServer to the coprocessor is required for remote debugging. We choose /tmp/gdbserver as the target on the coprocessor here (important for the following sections).

    Copy GDBServer to coprocessor, e.g.:

    • Intel® MPSS:
      
      		$ scp /usr/linux-k1om-4.7/linux-k1om/usr/bin/gdbserver mic0:/tmp
      
      		
    • Intel® Parallel Studio XE Composer Edition:
      
      		$ scp <install-dir>/debugger_2016/gdb/targets/mic/bin/gdbserver mic0:/tmp

      During development you can also add GDBServer to your coprocessor image!

    Debug Configuration

    Eclipse* IDE Debug Configuration Window

    To debug a native coprocessor application (here: native_c++), create a new debug configuration of type C/C++ Remote Application.

    Set Connection to the coprocessor target configured with RSE before (here: mic0).

    Specify the remote path of the application, wherever it was copied to (here: /tmp/native_c++). We’ll address how to manually transfer files later.

    Set the flag “Skip download to target path.” if you don’t want the debugger to upload the executable to the specified path. This can be useful if you have complex projects with external dependencies (e.g. libraries) and don’t want to manually transfer the binaries.
    (for MPSS 3.1.2 or 3.1.4, please see Errata)

    Note that we use C/C++ Remote Application here. This is also true for Fortran applications because there’s no remote debug configuration section provided by the Photran* plug-in!

    Eclipse* IDE Debug Configuration Window (Debugger)

    In Debugger tab, specify the provided Intel GNU* GDB for Intel® MIC (here: gdb-mic).

    Eclipse* IDE Debug Configuration Window (Debugger) -- Specify .gdbinit

    In the above example, set sysroot from MPSS installation in .gdbinit, e.g.:

    
    	set sysroot /opt/mpss/3.1.4/sysroots/k1om-mpss-linux/
    
    	

    You can use .gdbinit or any other command file that should be loaded before starting the debugging session. If you do not specify this you won't get debugger support for system libraries.

    Note:
    See section Debugging on Command Line above for the correct path of GDBServer, depending on the chosen package (Intel® MPSS or Intel® Parallel Studio XE Composer Edition)!

    Eclipse* IDE Debug Configuration Window (Debugger/GDBServer)

    In Debugger/Gdbserver Settings tab, specify the uploaded GDBServer (here: /tmp/gdbserver).

    Build Native Application for the Coprocessor

    Configuration depends on the installed plug-ins. For C/C++ applications we recommend installing the Intel® C++ Compiler XE plug-in that comes with Intel® Parallel Studio XE Composer Edition. For Fortran, install Photran* (3rd party) and select the Intel® Fortran Compiler manually.

    Make sure to use the debug configuration and provide options as if debugging on the host (-g). Optionally, disabling optimizations via -O0 can make the instruction flow comprehensible when debugging.

    The only difference compared to host builds is that you need to cross-compile for the coprocessor: Use the –mmic option, e.g.:
    Eclipse* IDE Project Properties

    After configuration, clean your build. This is needed because Eclipse* IDE might not notice all dependencies. And finally, build.
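    Outside the IDE, the same cross-build can be sketched as a Makefile fragment (file and target names here are hypothetical):

```makefile
# Sketch of a debug cross-build for the coprocessor (hypothetical file names).
# -mmic targets K1OM, -g adds debug info, -O0 keeps stepping comprehensible.
CXX      = icpc
FC       = ifort
DBGFLAGS = -mmic -g -O0

native_c++: main.cpp
	$(CXX) $(DBGFLAGS) -o $@ $<

native_f: main.f90
	$(FC) $(DBGFLAGS) -o $@ $<
```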

    Note:
    The configuration dialog shown only exists for the Intel® C++ Compiler plug-in. For Fortran, users need to install the Photran* plug-in, switch the compiler/linker to ifort by hand, and add -mmic manually. This has to be done for both the compiler and the linker!

    Start Native Debugging

    Transfer the executable to the coprocessor, e.g.:

    • Copy manually  (e.g. via script on the terminal)
    • Use the Remote Systems window (RSE) to copy files from host and paste to coprocessor target (e.g. mic0):
      RSE Remote Systems Window (Copy)
      Select the files from the tree (Local Files) and paste them to where you want them on the target to be (e.g. mic0)
       
    • Use NFS to mirror builds to coprocessor (no need for update)
    • Use debugger to transfer (see earlier)

    Note:
    It is crucial that the executable can be executed on the coprocessor. In some cases the execution bits might not be set after copying.

    Start debugging using the C/C++ Remote Application created in the earlier steps. It should connect to the coprocessor target and launch the specified application via the GDBServer. Debugging is the same as for local/host applications.
    Native Debugging Session (Remote)

    Note:
    This works for coprocessor native Fortran applications the exact same way!

    Documentation

    More information can be found in the official documentation:

    • Intel® MPSS:
      • MPSS 2.1:
        <mpss_root>/docs/gdb/gdb.pdf
        <mpss_root>/eclipse_support/README-INTEL
      • MPSS 3.*:
        <mpss_root>/sysroots/x86_64-mpsssdk-linux/usr/share/doc/gdb-<version>/GDB.pdf
        (not available for all; please see Errata)
    • Intel® Parallel Studio XE Composer Edition:
      <install-dir>/documentation_2016/en/debugger/gdb-mic/gdb.pdf
      <install-dir>/documentation_2016/en/debugger/ps2016/get_started.htm

    The PDF gdb.pdf is the original GNU* GDB manual for the base version Intel ships, extended by all features added. So, this is the place to get help for new commands, behavior, etc.
    README-INTEL from Intel® MPSS contains a short guide on how to install and configure the Eclipse* IDE plug-in.
    The PDF eclmigdb_config_guide.pdf provides an overall step-by-step guide on how to debug with the command line and with the Eclipse* IDE.

    Using Intel® C++ Compiler with the Eclipse* IDE on Linux*:
    http://software.intel.com/en-us/articles/intel-c-compiler-for-linux-using-intel-compilers-with-the-eclipse-ide-pdf/
    The knowledgebase article (Using Intel® C++ Compiler with the Eclipse* IDE on Linux*) is a step-by-step guide on how to install, configure and use the Intel® C++ Compiler with the Eclipse* IDE.

    Errata

    • With the recent switch from MPSS 2.1 to 3.1 some packages might be incomplete or missing. Future updates will add improvements. Documentation for GNU* GDB is missing up to 3.2 (3.3 and later contain it).
       
    • For MPSS 3.1.2 and 3.1.4 the respective package mpss-3.1.[2|4]-k1om.tar is missing. It contains binaries for the coprocessor, like the native GNU* GDB for the coprocessor. It also contains /usr/libexec/sftp-server which is needed if you want to debug native applications on the coprocessor and require Eclipse* IDE to transfer the binary automatically. As this is missing you need to transfer the files manually (select “Skip download to target path.” in this case).
      As a workaround, you can use mpss-3.1.1-k1om.tar from MPSS 3.1.1 and install the binaries from there. If you use MPSS 3.1.4, the native GNU* GDB is available separately via mpss-3.1.4-k1om-gdb.tar.
       
    • With MPSS 3.1.1, 3.1.2 or 3.1.4 the script <mpss_root>/mpm/bin/start_mpm.sh uses an incorrect path to the MPSS root directory. Hence offload debugging is not working. You can fix this by creating a symlink for your MPSS root, e.g. for MPSS 3.1.2:

      $ ln -s /opt/mpss/3.1.2 /opt/mpss/3.1

      Newer versions of MPSS correct this. This workaround is not required if you use the start_mpm.sh script from the Intel® Parallel Studio XE Composer Edition package.
       
    • For MPSS 3.* the coprocessor native GNU* GDB requires debug information from some system libraries for proper operation.
      Beginning with MPSS 3.1, debug information for system libraries is not installed on the coprocessor anymore. If the coprocessor native GNU* GDB is executed, it will fail when loading/continuing with a signal (SIGTRAP).
      Current workaround is to copy the .debug folders for the system libraries to the coprocessor, e.g.:

      $ scp -r /opt/mpss/3.1.2/sysroots/k1om-mpss-linux/lib64/.debug root@mic0:/lib64/
       
    • MPSS 3.2 and 3.2.1 do not support offload debugging with Intel® Composer XE 2013 SP1.
      Offload debugging with the Eclipse plug-in from Intel® Composer XE 2013 SP1 does not work with Intel MPSS 3.2 and 3.2.1. A configuration file which is required for operation by the Intel Composer XE 2013 SP1 package has been removed with Intel MPSS 3.2 and 3.2.1. Previous Intel MPSS versions are not affected. Intel MPSS 3.2.3 fixes this problem (there is no version of Intel MPSS 3.2.2!).
  • Debugging Intel® Xeon Phi™ Applications on Windows* Host


    Contents

    Introduction

    The Intel® Xeon Phi™ coprocessor is a product based on the Intel® Many Integrated Core Architecture (Intel® MIC). Intel offers a debug solution for this architecture that can debug applications running on an Intel® Xeon Phi™ coprocessor.

    There are many reasons why a debug solution for Intel® MIC is needed. Some of the most important are the following:

    • Developing native Intel® MIC applications is as easy as for IA-32 or Intel® 64 hosts. In most cases they just need to be cross-compiled (/Qmic).
      Yet, Intel® MIC Architecture is different from the host architecture. Those differences can unveil existing issues, and incorrect tuning for Intel® MIC can introduce new ones (e.g. data alignment, whether an application can handle hundreds of threads, efficient memory consumption, etc.)
    • Developing offload enabled applications induces more complexity, as host and coprocessor share the workload.
    • General lower level analysis, tracing execution paths, learning the instruction set of Intel® MIC Architecture, …

    Debug Solution for Intel® MIC

    For Windows* host, Intel offers a debug solution, the Intel® Debugger Extension for Intel® MIC Architecture Applications. It supports debugging offload enabled application as well as native Intel® MIC applications running on the Intel® Xeon Phi™ coprocessor.

    How to get it?

    To obtain Intel® Debugger Extension for Intel® MIC Architecture on Windows* host, you need the following:

    Debug Solution as Integration

    Debug solution from Intel® based on GNU* GDB:

    • Full integration into Microsoft Visual Studio*, no command line version needed
    • Available with Intel® Composer XE 2013 SP1 and later
      (Intel® Parallel Studio XE Composer Edition is the successor)

    Note:
    Pure native debugging on the coprocessor is also possible by using Intel’s version of GNU* GDB for the coprocessor. This is covered in the following article for Linux* host:
    http://software.intel.com/en-us/articles/debugging-intel-xeon-phi-applications-on-linux-host

    Why integration into Microsoft Visual Studio*?

    • Microsoft Visual Studio* is an established IDE on Windows* hosts
    • Integration reuses existing usability and features
    • Fortran support added with Intel® Parallel Studio XE Composer Edition for Fortran (former Intel® Fortran Composer XE)

    Components Required

    The following components are required to develop and debug for Intel® MIC Architecture:

    • Intel® Xeon Phi™ coprocessor
    • Windows* Server 2008 RC2, Windows* 7 or later
    • Microsoft Visual Studio* 2012 or later
      Support for Microsoft Visual Studio* 2013 was added with Intel® Composer XE 2013 SP1 Update 1. Microsoft Visual Studio* 2015 is supported with Intel® Parallel Studio XE 2016 Composer Edition and Intel® Parallel Studio XE 2015 Composer Edition Update 4, and later.
    • Intel® MPSS 3.1 or later
    • C/C++ development:
      Intel® C++ Composer XE 2013 SP1 for Windows* or later
    • Fortran development:
      Intel® Fortran Composer XE 2013 SP1 for Windows* or later

    Configure & Test

    It is crucial to make sure that the coprocessor setup is correctly working. Otherwise the debugger might not be fully functional.

    Setup Intel® MPSS:

    • Follow Intel® MPSS readme-windows.pdf for setup
    • Verify that the Intel® Xeon Phi™ coprocessor is running

    Before debugging applications with offload extensions:

    • Use official examples from:
      C:\Program Files (x86)\IntelSWTools\samples_2016\en
    • Verify that offloading code works


    Prerequisite for Debugging

    Debugger integration for Intel® MIC Architecture only works when debug information is being available:

    • Compile in debug mode with at least the following option set:
      /Zi (compiler) and /DEBUG (linker)
    • Optional: Unoptimized code (/Od) makes debugging easier
      (due to removed/optimized away temporaries, etc.)
      Visual Studio* Project Properties (Debug Information & Optimization)

    Applications can only be debugged in 64-bit mode

    • Set platform to x64
    • Verify that /MACHINE:x64 (linker) is set!
      Visual Studio* Project Properties (Machine)

    Debugging Applications with Offload Extension

    Start the Microsoft Visual Studio* IDE and open or create an Intel® Xeon Phi™ project with offload extensions. Examples can be found in the Samples directory of Intel® Parallel Studio XE Composer Edition (formerly Intel® Composer XE), that is:

    C:\Program Files (x86)\IntelSWTools\samples_2016\en

    • compiler_c\psxe\mic_samples.zip or
    • compiler_f\psxe\mic_samples.zip

    We’ll use intro_SampleC from the official C++ examples in the following.

    Compile the project with Intel® C++/Fortran Compiler.

    Characteristics of Debugging

    • Set breakpoints in code (during or before debug session):
      • In code mixed for host and coprocessor
      • Debugger integration automatically dispatches between host/coprocessor
    • Run control is the same as for native applications:
      • Run/Continue
      • Stop/Interrupt
      • etc.
    • Offloaded code stops execution (offloading thread) on host
    • Offloaded code is executed on coprocessor in another thread
    • IDE shows host/coprocessor information at the same time:
      • Breakpoints
      • Threads
      • Processes/Modules
      • etc.
    • Multiple coprocessors are supported:
      • Data shown is mixed:
        Keep in mind the different processes and address spaces
      • No further configuration needed:
        Debug as you go!

    Setting Breakpoints

    Debugging Applications with Offload Extension - Setting Breakpoints

    Note the mixed breakpoints here:
    The ones set in the normal code (not offloaded) apply to the host. Breakpoints on offloaded code apply to the respective coprocessor(s) only.
    The Breakpoints window shows all breakpoints (host & coprocessor(s)).

    Start Debugging

    Start debugging as usual via menu (shown) or <F5> key:
    Debugging Applications with Offload Extension - Start Debugging

    While debugging, continue till you reach a set breakpoint in offloaded code to debug the coprocessor code.

    Thread Information

    Debugging Applications with Offload Extension - Thread Information

    Information of host and coprocessor(s) is mixed. In the example above, the threads window shows two processes with their threads. One process comes from the host, which does the offload. The other one is the process hosting and executing the offloaded code, one for each coprocessor.

    Additional Requirements

    For debugging offload enabled applications additional environment variables need to be set:

    • Intel® MPSS 2.1:
      COI_SEP_DISABLE=FALSE
      MYO_WATCHDOG_MONITOR=-1

       
    • Intel® MPSS 3.*:
      AMPLXE_COI_DEBUG_SUPPORT=TRUE
      MYO_WATCHDOG_MONITOR=-1

    Set those variables before starting Visual Studio* IDE!

    Those are currently needed but might become obsolete in the future. Please be aware that the debugger cannot and should not be used in combination with Intel® VTune™ Amplifier XE. Hence, disabling SEP (as part of Intel® VTune™ Amplifier XE) is valid. The watchdog monitor must be disabled because a debugger can stop execution for an unspecified amount of time. The system watchdog might otherwise assume that a debugged application that no longer responds is dead and terminate it. For debugging we do not want that.

    Note:
    Do not set those variables for a production system!
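    On Windows*, the variables can be set in the console that then launches the IDE (a sketch for MPSS 3.*; launching Visual Studio* via devenv from that console is an assumption):

```bat
rem Set the offload-debug variables (MPSS 3.*) and start Visual Studio*
rem from the same console so the IDE inherits them.
set AMPLXE_COI_DEBUG_SUPPORT=TRUE
set MYO_WATCHDOG_MONITOR=-1
devenv
```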

    Debugging Native Coprocessor Applications

    Pre-Requisites

    Create a native Intel® Xeon Phi™ coprocessor application, transfer it to the coprocessor target, and execute it:

    • Use micnativeloadex.exe provided by Intel® MPSS for an application C:\Temp\mic-examples\bin\myApp, e.g.:

      > "C:\Program Files\Intel\MPSS\bin\micnativeloadex.exe""C:\Temp\mic-examples\bin\myApp" -d 0
    • Option –d 0 specifies the first device (zero based) in case there are multiple coprocessors per system
    • The application is executed directly after transfer

    micnativeloadex.exe transfers the specified application to the specified coprocessor and executes it directly. The command blocks until the transferred application terminates.
    micnativeloadex.exe also takes care of dependencies (i.e. libraries) and transfers them, too.

    Other ways to transfer and execute native applications are also possible (but more complex):

    • SSH/SCP
    • NFS
    • FTP
    • etc.

    Debugging native applications from the Visual Studio* IDE is only possible via Attach to Process…:

    • micnativeloadex.exe has been used to transfer and execute the native application
    • Make sure the application waits till attached, e.g. by:
      
      		static volatile int lockit = 1;   /* volatile: prevent the compiler from optimizing the wait away */
      
      		while (lockit) { sleep(1); }      /* sleep() requires <unistd.h> */
      
      		
    • After having attached, set lockit to 0 and continue.
    • No Visual Studio* solution/project is required.

    Only one coprocessor at a time can be debugged this way.

    Configuration

    Open the options via TOOLS/Options… menu:Debugging Native Coprocessor Applications - Configuration

    It tells the debugger extension where to find the binary and sources. This needs to be changed every time a different coprocessor native application is being debugged.

    The entry solib-search-path directories works the same as the analogous GNU* GDB command. It allows mapping paths from the build system to the host system running the debugger.

    The entry Host Cache Directory is used for caching symbol files. It can speed up symbol lookup for large applications.

    Attach

    Open the options via TOOLS/Attach to Process… menu:Debugging Native Coprocessor Applications - Attach to Process...

    Specify the Intel(R) Debugger Extension for Intel(R) MIC Architecture. Set the IP address and port under which GDBServer should be executed. The usual port for GDBServer is 2000, but we recommend using a non-privileged port (e.g. 16000).
    After a short delay the processes of the coprocessor card are listed. Select one to attach.

    Note:
    The checkbox Show processes from all users has no function for the coprocessor, as user accounts cannot be mapped between host and target (Linux* vs. Windows*).

    Documentation

    More information can be found in the official documentation from Intel® Parallel Studio XE Composer Edition:
    C:\Program Files (x86)\IntelSWTools\documentation_2016\en\debugger\ps2016\get_started.htm

  • iconv issue


    hi all,

     

    I'm trying to build something for the Phi that depends on iconv; the library routines are present, but the following application fails when run on the Phi:

    #include <stdlib.h>
    #include <iconv.h>
    
    int main () {
      iconv_t cd;
      cd = iconv_open("latin1","UTF-8");
      if(cd == (iconv_t)(-1)) exit(1);
      iconv_close(cd);
    
      exit(0);
    }
    

    If I build this using "icc -o iconv_test iconv_test.c" and run it on the host, it returns no error (exit code 0).

    However, if I build this for the Phi with "icc -mmic -o iconv_test iconv_test.c", it always returns exit code 1. strace shows the following:

    open("/usr/lib64/gconv/gconv-modules.cache", O_RDONLY) = -1 ENOENT (No such file or directory)
    brk(0)                                  = 0x714000
    brk(0x735000)                           = 0x735000
    open("/usr/lib64/gconv/gconv-modules", O_RDONLY) = -1 ENOENT (No such file or directory)
    exit_group(1)     

     

    and indeed, those module files are missing  - where can I find them?

     

    Questions about SCIF Driver


    I have a system with 2 PHI cards installed running on redhat 7.0. I am able to run code on the cards as pure offload and I can ssh into the cards. I am trying to get symmetric mode to work.

    1) Does symmetric mode require OFED, or is OFED only required when there is a physical Infiniband card?

    2) What are the proper steps to verify that the SCIF driver is properly loaded? mic shows up as a driver but there is no indication of anything named SCIF. 

    [root@infinity ~]# lsmod
    Module                  Size  Used by
    mic                   666166  16
    vtsspp                372813  0
    sep3_15               527535  0
    pax                    13181  0
    bridge                115385  0
    stp                    12976  1 bridge
    llc                    14552  2 stp,bridge
    ipt_REJECT             12541  2
    xt_comment             12504  2
    nf_conntrack_ipv4      14862  2
    nf_defrag_ipv4         12729  1 nf_conntrack_ipv4
    xt_conntrack           12760  2
    nf_conntrack          105702  2 xt_conntrack,nf_conntrack_ipv4
    iptable_filter         12810  1
    ip_tables              27239  1 iptable_filter
    intel_powerclamp       18764  0
    coretemp               13435  0
    intel_rapl             18773  0
    kvm                   461126  0
    iTCO_wdt               13480  0
    crct10dif_pclmul       14289  0
    crc32_pclmul           13113  0
    crc32c_intel           22079  0
    ghash_clmulni_intel    13259  0
    iTCO_vendor_support    13718  1 iTCO_wdt
    cryptd                 20359  1 ghash_clmulni_intel
    mei_me                 18646  0
    sb_edac                26819  0
    pcspkr                 12718  0
    nfsd                  290215  13
    mei                    82723  1 mei_me
    edac_core              57650  1 sb_edac
    lpc_ich                21073  0
    mfd_core               13435  1 lpc_ich
    i2c_i801               18135  0
    auth_rpcgss            59343  1 nfsd
    nfs_acl                12837  1 nfsd
    lockd                  93977  1 nfsd
    ipmi_si                53353  0
    ipmi_msghandler        45603  1 ipmi_si
    sunrpc                295293  15 nfsd,auth_rpcgss,lockd,nfs_acl
    shpchp                 37032  0
    ioatdma                67762  0
    acpi_power_meter       18087  0
    acpi_pad              116305  0
    ext4                  562391  7
    mbcache                14958  1 ext4
    jbd2                  102940  1 ext4
    raid10                 48128  2
    sd_mod                 45499  12
    crc_t10dif             12714  1 sd_mod
    crct10dif_common       12595  2 crct10dif_pclmul,crc_t10dif
    ast                    56119  1
    syscopyarea            12529  1 ast
    sysfillrect            12701  1 ast
    sysimgblt              12640  1 ast
    nvidia               8374856  0
    drm_kms_helper         98226  1 ast
    ttm                    93488  1 ast
    drm                   311588  5 ast,ttm,drm_kms_helper,nvidia
    igb                   192078  0
    ahci                   29870  8
    libahci                32009  1 ahci
    ptp                    18933  1 igb
    libata                218854  2 ahci,libahci
    pps_core               19106  1 ptp
    dca                    15130  2 igb,ioatdma
    i2c_algo_bit           13413  2 ast,igb
    i2c_core               40325  7 ast,drm,igb,i2c_i801,drm_kms_helper,i2c_algo_bit,nvidia
    wmi                    19070  0
    dm_mirror              22135  0
    dm_region_hash         20862  1 dm_mirror
    dm_log                 18411  2 dm_region_hash,dm_mirror
    dm_mod                104038  25 dm_log,dm_mirror

     


    Knight's Landing + Java


    Dear Intel Staff,

    I just got to know some details of your great presentation of Knight's Landing (KNL) at Hot Chips this year. Information about KNL on the website is still sparse. From your slides I understand that there will be a version of KNL that is socked and can be used as a primary CPU in a rack. However, this raises quite some questions that I cannot find satisfying answers.

    Our scenario:
    We have a research cluster that consists mostly of 2 socket systems with normal Ivy-Bridge Xeon CPUs. Our main application is a JVM based machine learning system that uses the MKL via JNI to accelerate computations. We intend to extend this cluster soon and would like to utilize Phi processors. But whether we can use them depends on a few things. (see below)

    What I would like to know from you:

    1. Is KNL (socketed) a fully featured x86_64 CPU? For me that means:
      1. Can we run an off the shelf Linux on it? (e.g. Redhat Server, Ubuntu Server, etc.?)
      2. Can we run an off the shelf JVM on it? (e.g. Oracle x64 JVM)
      3. Are there any hardware restrictions that make native code invocation via JNI  difficult/impossible?
      4. Are there any restrictions for invoking MKL on matrices that reside in the main memory? (i.e. data is not stored in the MCDRAM, but in the DDR4 memory. On Knight's Corner this was terribly inefficient for small matrices because of the offloading-overhead for shipping the data back and forth.)
      5. Will normal multi-threading via pThreads or the Java-Multithreading-Framework work? Will all (logical/physical) cores be accessible this way?
    2. Will there be (cheaper?) versions with less than 16 GB MCDRAM?
    3. What is the expected price range for KNL?
    4. When will KNL become available?

    Many thanks in advance,

    Matt

    Simple Offload Example Failing


    Hello,

    I'm attempting to run a simple offload example: 

    #include <stdio.h>
    #include <omp.h>
    
    int main(){
    double sum; int i,n, nt;
    
       n=2000000000;
       sum=0.0e0;
    
       #pragma offload target(mic:0)
       {
        #pragma omp parallel for reduction(+:sum)
        for(i=1;i<=n;i++){
           sum = sum + i;
        }
        //nt = omp_get_max_threads();
        #pragma omp parallel
        {
           #pragma omp single
           nt = omp_get_num_threads();
        }
    
        #ifdef __MIC__
           printf("Hello MIC reduction %f threads: %d\n",sum,nt);
        #else
           printf("Hello CPU reduction %f threads: %d\n",sum,nt);
        #endif
       }
    }

    This program ran fine previously, but we recently rebooted our Phi nodes in our cluster and since then this offload example will not run. The natively compiled MIC binaries still run without a problem since the reboot.

    Before running I type:

    . /usr/local/intel/ClusterStudioXE_2013/composer_xe_2013_sp1/bin/compilervars.sh intel64
    make
    export MIC_OMP_NUM_THREADS=120
    export MIC_ENV_PREFIX=MIC
    export OFFLOAD_REPORT=3

    Here is my Makefile:

    CC=icc
    CFLAGS=-std=c99 -O3 -vec-report3 -openmp -offload
    EXE=reduce_offload_mic
    
    $(EXE) : reduce_omp_mic.c
    	$(CC) -o $@ $< $(CFLAGS)
    
    .PHONY: clean
    
    clean:
    	rm $(EXE)

    However, when I run the program here is the output:

    [frenchwr@vmp903 Offload]$ ./reduce_offload_mic
    offload error: cannot offload to MIC - device is not available
    [Offload] [HOST]  [State]   Unregister data tables

    I have ensured that mpss is running and even restarted the service with:

    sudo service mpss restart

    but still the same error (even after re-building the executable).

    All of my mic tests pass:

    [frenchwr@vmp903 Offload]$ miccheck
    MicCheck 3.4-r1
    Copyright 2013 Intel Corporation All Rights Reserved
    
    Executing default tests for host
      Test 0: Check number of devices the OS sees in the system ... pass
      Test 1: Check mic driver is loaded ... pass
      Test 2: Check number of devices driver sees in the system ... pass
      Test 3: Check mpssd daemon is running ... pass
    Executing default tests for device: 0
      Test 4 (mic0): Check device is in online state and its postcode is FF ... pass
      Test 5 (mic0): Check ras daemon is available in device ... pass
      Test 6 (mic0): Check running flash version is correct ... pass
      Test 7 (mic0): Check running SMC firmware version is correct ... pass
    Executing default tests for device: 1
      Test 8 (mic1): Check device is in online state and its postcode is FF ... pass
      Test 9 (mic1): Check ras daemon is available in device ... pass
      Test 10 (mic1): Check running flash version is correct ... pass
      Test 11 (mic1): Check running SMC firmware version is correct ... pass
    
    Status: OK

    Here's the output from micinfo:

    [frenchwr@vmp903 Offload]$ micinfo
    MicInfo Utility Log
    Created Fri Aug 28 18:14:23 2015
    
    
    	System Info
    		HOST OS			: Linux
    		OS Version		: 2.6.32-431.29.2.el6.x86_64
    		Driver Version		: 3.4-1
    		MPSS Version		: 3.4
    		Host Physical Memory	: 132110 MB
    
    Device No: 0, Device Name: mic0
    
    	Version
    		Flash Version 		 : 2.1.02.0390
    		SMC Firmware Version	 : 1.16.5078
    		SMC Boot Loader Version	 : 1.8.4326
    		uOS Version 		 : 2.6.38.8+mpss3.4
    		Device Serial Number 	 : ADKC42900304
    
    	Board
    		Vendor ID 		 : 0x8086
    		Device ID 		 : 0x225c
    		Subsystem ID 		 : 0x7d95
    		Coprocessor Stepping ID	 : 2
    		PCIe Width 		 : Insufficient Privileges
    		PCIe Speed 		 : Insufficient Privileges
    		PCIe Max payload size	 : Insufficient Privileges
    		PCIe Max read req size	 : Insufficient Privileges
    		Coprocessor Model	 : 0x01
    		Coprocessor Model Ext	 : 0x00
    		Coprocessor Type	 : 0x00
    		Coprocessor Family	 : 0x0b
    		Coprocessor Family Ext	 : 0x00
    		Coprocessor Stepping 	 : C0
    		Board SKU 		 : C0PRQ-7120 P/A/X/D
    		ECC Mode 		 : Enabled
    		SMC HW Revision 	 : Product 300W Passive CS
    
    	Cores
    		Total No of Active Cores : 61
    		Voltage 		 : 1037000 uV
    		Frequency		 : 1238095 kHz
    
    	Thermal
    		Fan Speed Control 	 : N/A
    		Fan RPM 		 : N/A
    		Fan PWM 		 : N/A
    		Die Temp		 : 46 C
    
    	GDDR
    		GDDR Vendor		 : Samsung
    		GDDR Version		 : 0x6
    		GDDR Density		 : 4096 Mb
    		GDDR Size		 : 15872 MB
    		GDDR Technology		 : GDDR5
    		GDDR Speed		 : 5.500000 GT/s
    		GDDR Frequency		 : 2750000 kHz
    		GDDR Voltage		 : 1501000 uV
    
    Device No: 1, Device Name: mic1
    
    	Version
    		Flash Version 		 : 2.1.02.0390
    		SMC Firmware Version	 : 1.16.5078
    		SMC Boot Loader Version	 : 1.8.4326
    		uOS Version 		 : 2.6.38.8+mpss3.4
    		Device Serial Number 	 : ADKC42900319
    
    	Board
    		Vendor ID 		 : 0x8086
    		Device ID 		 : 0x225c
    		Subsystem ID 		 : 0x7d95
    		Coprocessor Stepping ID	 : 2
    		PCIe Width 		 : Insufficient Privileges
    		PCIe Speed 		 : Insufficient Privileges
    		PCIe Max payload size	 : Insufficient Privileges
    		PCIe Max read req size	 : Insufficient Privileges
    		Coprocessor Model	 : 0x01
    		Coprocessor Model Ext	 : 0x00
    		Coprocessor Type	 : 0x00
    		Coprocessor Family	 : 0x0b
    		Coprocessor Family Ext	 : 0x00
    		Coprocessor Stepping 	 : C0
    		Board SKU 		 : C0PRQ-7120 P/A/X/D
    		ECC Mode 		 : Enabled
    		SMC HW Revision 	 : Product 300W Passive CS
    
    	Cores
    		Total No of Active Cores : 61
    		Voltage 		 : 1040000 uV
    		Frequency		 : 1238095 kHz
    
    	Thermal
    		Fan Speed Control 	 : N/A
    		Fan RPM 		 : N/A
    		Fan PWM 		 : N/A
    		Die Temp		 : 47 C
    
    	GDDR
    		GDDR Vendor		 : Samsung
    		GDDR Version		 : 0x6
    		GDDR Density		 : 4096 Mb
    		GDDR Size		 : 15872 MB
    		GDDR Technology		 : GDDR5
    		GDDR Speed		 : 5.500000 GT/s
    		GDDR Frequency		 : 2750000 kHz
    		GDDR Voltage		 : 1501000 uV

     

    From searching online I see a few other users who have run into the:

    offload error: cannot offload to MIC - device is not available
    [Offload] [HOST]  [State]   Unregister data tables

    issue, but I don't see any good resolution (other than by restarting mpss, which does not resolve the issue for me).
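    One additional sanity check from the host side is to ask the offload runtime itself how many devices it can see: if miccheck and micinfo pass but this query returns 0, the problem lies in the offload runtime/compiler environment rather than in MPSS. A minimal sketch follows; `_Offload_number_of_devices()` is part of Intel's Language Extensions for Offload runtime and is only available when building with icc, so the fallback branch for other compilers is an assumption here:

    ```c
    #include <stdio.h>
    #ifdef __INTEL_OFFLOAD
    #include <offload.h>   /* declares _Offload_number_of_devices() */
    #endif

    int main(void) {
    #ifdef __INTEL_OFFLOAD
        /* Ask the LEO runtime how many coprocessors it can offload to. */
        int n = _Offload_number_of_devices();
    #else
        /* Built without icc offload support; no devices are reachable. */
        int n = 0;
    #endif
        printf("offload devices visible to the runtime: %d\n", n);
        return 0;
    }
    ```

    Built with `icc -offload`, this should report 2 on the machine above; a 0 there would point at the runtime environment (e.g. the sourced compilervars.sh) rather than the driver.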

    Why does MIC require strict data alignment? What about auto-vectorization of unaligned data?


    MIC requires strict 64-byte data alignment to utilize the VPU, but why? I found that SPARC also has such a requirement, yet other multi-core CPUs can handle unaligned data.

    Since the compiler can auto-vectorize a for loop over the data (with optimization enabled), what happens if the data is unaligned in that case? Will auto-vectorization still work? If yes, how?
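    For illustration, the alignment side of this can be shown on the host: allocating on a 64-byte boundary matches the 512-bit vector registers, and when the compiler cannot prove alignment it typically generates scalar peel and remainder loops around the vectorized body rather than failing. A minimal sketch (the `#pragma vector aligned` hint mentioned in the comments is icc-specific and named here as an assumption):

    ```c
    #define _POSIX_C_SOURCE 200112L
    #include <stdio.h>
    #include <stdlib.h>
    #include <stdint.h>

    int main(void) {
        size_t n = 1024;
        float *a = NULL;

        /* 64-byte alignment matches a 512-bit vector register, so a vector
           load never straddles two cache lines. */
        if (posix_memalign((void **)&a, 64, n * sizeof(float)) != 0)
            return 1;
        printf("aligned to 64B: %s\n", ((uintptr_t)a % 64 == 0) ? "yes" : "no");

        /* When alignment is unknown, auto-vectorizers usually emit a scalar
           "peel" loop up to the first aligned element, a vectorized body, and
           a scalar remainder; with icc, "#pragma vector aligned" (icc-specific)
           promises alignment so the peel loop can be dropped. */
        float sum = 0.0f;
        for (size_t i = 0; i < n; i++) {
            a[i] = (float)i;
            sum += a[i];
        }
        printf("sum = %.0f\n", sum);

        free(a);
        return 0;
    }
    ```

    The vectorization report (`-vec-report` on the compiler version used above) shows whether peel/remainder loops were generated for a given loop.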

    offload_transfer: array of variables?


    Hello,

    I would like to pre-allocate a number of buffers for later data transfers from CPU to MIC, using explicit offloading in C++.

    It works nicely if each buffer corresponds to an explicit variable name, as in e.g. the double-buffering examples. However, I would like a configurable number of such buffers (more than 2), i.e. an array of buffers. (The buffers are used for asynchronous processing on the MIC, and I need quite a few of them.)

    I do have a workaround: allocate a single very big buffer and cut it into pieces (using offsets and 'into' for transfers). But since the buffers do not need to be contiguous, I'm afraid adding this constraint may make it hard to find a big enough free block at runtime. So I would prefer several smaller buffers if possible.

    The code below will probably describe easily the issue. In the first part, it works fine with 2 variable names. But in the second part, with an array, I don't find how to proceed (or is it simply not possible?). I tried without success various syntaxes, but could not find one accepted by the compiler.

    I would be glad if someone could help on this matter. Thanks in advance for any feedback on this!

    cheers, Sylvain

     

    #pragma offload_attribute (push,target(mic))
    #include <stdio.h>
    #pragma offload_attribute (pop)
    
    #define ALLOC alloc_if(1) free_if(0)
    #define FREE alloc_if(0) free_if(1)
    #define REUSE alloc_if(0) free_if(0)
    
    int main() {
    
      int size=100;      // size of buffer
      char input[size];  // buffer for input data on the CPU
    
      char *ptr1=NULL;  // reference to MIC buffer 1
      char *ptr2=NULL;  // reference to MIC buffer 2
    
      // pre-allocate MIC buffers
      #pragma offload_transfer target(mic:0) nocopy(ptr1 : length(size) ALLOC)
      #pragma offload_transfer target(mic:0) nocopy(ptr2 : length(size) ALLOC)
    
      // test use of buffer 1
      snprintf(input,size,"valPtr1");
      #pragma offload target(mic:0) in(input[0:size] : REUSE into(ptr1[0:size]))
      {
        printf("MIC: %p = %s\n",ptr1,ptr1);
      }
    
      // test use of buffer 2
      snprintf(input,size,"valPtr2");
      #pragma offload target(mic:0) in(input[0:size] : REUSE into(ptr2[0:size]))
      {
        printf("MIC: %p = %s\n",ptr2,ptr2);
      }
    
    
      // try to do same as above, but with an array instead of fixed variable names ptr1,ptr2
      // so that number of elements can be increased and iterated
      // e.g. instead of ptr1 and ptr2, use ptrX[1], ptrX[2] ... ptrX[N]
    
      // compiler does not seem to complain for the allocation
      // but it crashes at runtime
      char *ptrX[2]={NULL,NULL};
      for (int i=0;i<2;i++) {
        #pragma offload_transfer target(mic:0) nocopy(ptrX[i] : length(size) ALLOC)
      }
    
      // and then, how to use the buffers ???
      /*
      for (int i=0;i<2;i++) {
        snprintf(input,size,"valPtrX%d",i);
        #pragma offload target(mic:0) in(input[0:size] : REUSE into((???)[0:size]))
        {
          printf("MIC: %p = %s\n",???,???);
        }
      }
      */
    
      return 0;
    }
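    One possible workaround, sketched below as an untested assumption: give every buffer its own host allocation, then name it through a scalar pointer inside the loop, since the offload clauses cannot take a subscripted expression like `ptrX[i]`. This assumes the runtime keys the host-to-device buffer association on the host address range, as in the double-buffering examples, so each malloc'd block must be distinct:

    ```c
    #include <stdio.h>
    #include <stdlib.h>

    #define NBUF 4
    #define SIZE 100

    int main(void) {
        char *bufs[NBUF];

        /* Pre-allocate: one distinct host block per device buffer, named
           through the scalar pointer p that the pragma clauses can reference. */
        for (int i = 0; i < NBUF; i++) {
            bufs[i] = (char *)malloc(SIZE);
            char *p = bufs[i];
            #pragma offload_transfer target(mic:0) nocopy(p[0:SIZE] : alloc_if(1) free_if(0))
            (void)p;
        }

        /* Reuse: transfer into the previously allocated device buffer. */
        for (int i = 0; i < NBUF; i++) {
            char *p = bufs[i];
            snprintf(p, SIZE, "valPtrX%d", i);
            #pragma offload target(mic:0) in(p[0:SIZE] : alloc_if(0) free_if(0))
            {
                printf("buffer %d: %s\n", i, p);
            }
        }

        for (int i = 0; i < NBUF; i++) free(bufs[i]);
        return 0;
    }
    ```

    Compilers without offload support ignore the pragmas and run everything on the host, which makes the buffer bookkeeping easy to check before trying it on the coprocessor.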
    

     

    No Cost Options for Intel Math Kernel Library (MKL), Support yourself, Royalty-Free


    The Intel® Math Kernel Library (Intel® MKL), the high performance math library for x86 and x86-64, is available for free for everyone (click here now to register and download). Purchasing is only necessary if you want access to Intel® Premier Support (direct 1:1 private support from Intel), older versions of the library or access to other tools in Intel® Parallel Studio XE. Intel continues to actively develop and support this very powerful library - and everyone can benefit from that!

    Intel® Math Kernel Library (Intel® MKL) is a very popular library product from Intel that accelerates math processing routines to increase application performance. Intel® MKL includes highly vectorized and threaded Linear Algebra, Fast Fourier Transforms (FFT), Vector Math and Statistics functions. The easiest way to take advantage of all of that processing power is to use a carefully optimized computing math library; even the best compiler can’t compete with the level of performance possible from a hand-optimized library. If your application already relies on the BLAS or LAPACK functionality, simply re-link with Intel® MKL to get better performance on Intel and compatible architectures.
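    To make the re-linking point concrete, here is a naive triple-loop GEMM of the kind MKL replaces. With MKL you would swap the loop nest for a single `cblas_dgemm()` call and link with `icc -mkl` (or the flags from the MKL link-line advisor for other compilers); those build details are a hedged sketch, but the point stands that the hand-tuned library call is far faster for large matrices than any compiled loop nest:

    ```c
    #include <stdio.h>

    /* Naive row-major C = A*B for n x n matrices; for illustration only.
       One cblas_dgemm() call replaces this entire function when linking MKL. */
    static void dgemm_naive(int n, const double *A, const double *B, double *C) {
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) {
                double s = 0.0;
                for (int k = 0; k < n; k++)
                    s += A[i * n + k] * B[k * n + j];
                C[i * n + j] = s;
            }
    }

    int main(void) {
        double A[4] = {1, 2, 3, 4};   /* 2x2, row-major */
        double B[4] = {5, 6, 7, 8};
        double C[4];
        dgemm_naive(2, A, B, C);
        printf("%.0f %.0f %.0f %.0f\n", C[0], C[1], C[2], C[3]);
        return 0;
    }
    ```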

    Intel® MKL is most often obtained with the Intel® Compilers and all the other Intel® Performance Libraries in various products from Intel. It can be obtained with tools for analysis, debugging and tuning, tools for MPI and the Intel® MPI Library by acquiring the Intel® Parallel Studio XE. Did you know that some of these are available for free?

    Here is a guide to various ways to obtain the latest version of the Intel® Math Kernel Library (Intel® MKL) for free without access to Intel® Premier Support (get support by posting to the Intel Math Kernel Library forum). Anytime you want, the full suite of tools (Intel® Parallel Studio XE) with Intel® Premier Support and access to previous library versions can be purchased worldwide.

    Who: Community Licenses for Everyone
    What is free: Intel® Math Kernel Library (Intel® MKL), Intel® Data Analytics Acceleration Library (Intel® DAAL), Intel® Threading Building Blocks (Intel® TBB), Intel® Integrated Performance Primitives (Intel® IPP)
    Information: Community Licensing for Intel® Performance Libraries – free for all, registration required, no royalties, no restrictions on company or project size, current versions of libraries, no Intel Premier Support access. (Linux*, Windows* or OS X* versions) Forums for discussion and support are open to everyone.
    Where: Community Licensing for Intel Performance Libraries

    Who: Evaluation Copies for Everyone
    What is free: Intel® Math Kernel Library (Intel® MKL) along with compilers, libraries and analysis tools (most everything!)
    Information: Evaluation copies – try before you buy. (Linux, Windows or OS X versions)
    Where: Try before you buy

    Who: Academic Researcher
    What is free: Intel® Math Kernel Library, Intel® Data Analytics Acceleration Library, Intel® Threading Building Blocks, Intel® Integrated Performance Primitives, Intel® MPI Library (not available for OS X)
    Information: If you will use them in conjunction with academic research at institutions of higher education. (Linux, Windows or OS X versions, except the Intel® MPI Library, which is not supported on OS X)
    Where: Qualify for Use as an Academic Researcher

    Who: Student
    What is free: Intel® Math Kernel Library (Intel® MKL) along with compilers, libraries and analysis tools (most everything!)
    Information: If you are a current student at a degree-granting institution. (Linux, Windows or OS X versions)
    Where: Qualify for Use as a Student

    Who: Teacher
    What is free: Intel® Math Kernel Library (Intel® MKL) along with compilers, libraries and analysis tools (most everything!)
    Information: If you will use them in a teaching curriculum. (Linux, Windows or OS X versions)
    Where: Qualify for Use as an Educator

    Who: Open Source Contributor
    What is free: Intel® Math Kernel Library (Intel® MKL) along with all of Intel® Parallel Studio XE Professional Edition for Linux
    Information: If you are a developer actively contributing to open source projects – and that is why you will use the tools. (Linux versions)
    Where: Qualify for Use as an Open Source Contributor

    Free licenses for certain users have always been an important dimension of our offerings. One thing that really distinguishes Intel is that we sell excellent tools and provide second-to-none support for software developers who buy them. We provide multiple options, and we hope you will find exactly what you need in one of them.

     
