difference between omp target and native mode

Now that OpenMP 4 is fairly well supported on the Fortran side (except for simd reduction), I've been able to set up an example which can run in host, MIC native, and offload mode, simply by changing compile options.

I've arranged the benchmark so that data transfers between host and coprocessor contribute little to the measured time, by running the test loop thousands of times between transfers. Even so, the offload performance doesn't approach MIC native performance. A small part of the problem is that offload mode peaks at 59 threads (about double the performance obtained with the default number of threads), while native mode shows gains up to 177 threads.
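
The approach described above, keeping the arrays resident on the coprocessor and repeating the test loop many times between transfers, can be sketched in C++ roughly as follows. This is only an illustration, not the attached benchmark: the function name `run_resident`, the kernel, and the repetition count are hypothetical. When compiled without OpenMP offload support the pragmas are ignored and the loop simply runs on the host.

```cpp
#include <cassert>
#include <vector>

// Hypothetical kernel: a[i] += s * b[i], repeated `reps` times with the
// arrays kept on the device for the whole run.
// Returns a[0] after the data has been copied back to the host.
float run_resident(std::vector<float>& a, const std::vector<float>& b,
                   float s, int reps)
{
    assert(a.size() == b.size() && !a.empty());
    float* pa = a.data();
    const float* pb = b.data();
    const int n = static_cast<int>(a.size());

    // One transfer in (pa, pb) and one transfer out (pa) for the whole region.
    #pragma omp target data map(tofrom: pa[0:n]) map(to: pb[0:n])
    {
        for (int r = 0; r < reps; ++r) {
            // The sections are already present on the device, so these map
            // clauses should not trigger another copy on each repetition.
            #pragma omp target map(tofrom: pa[0:n]) map(to: pb[0:n])
            #pragma omp parallel for simd
            for (int i = 0; i < n; ++i)
                pa[i] += s * pb[i];
        }
    }
    return pa[0];
}
```

With this structure the transfer cost is paid once per call, so a large `reps` makes it a small fraction of the measured time. The thread count for the offloaded loop could be varied with a `num_threads` clause on the `parallel` construct when comparing against native runs.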

Another small part of the problem is that compiler directives surrounded by #ifdef __MIC__ take effect only for -mmic compilation. I would use them for offload target mode as well if I knew the incantation. These are directives such as !dir$ novector, for cases where vectorization is slow due to ineffective software prefetch, and !dir$ unroll(0).
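
For reference, the C++ analogue of the conditional directives described above might look like the sketch below. `#pragma novector` and `#pragma unroll(0)` are the Intel C++ counterparts of the Fortran `!dir$` directives; the function name `strided_sum` and the loop itself are hypothetical. Whether `__MIC__` is visible when an omp target region is compiled for the coprocessor is exactly the open question here, so this shows the pattern only, not a confirmed way to make it work in offload mode.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical loop standing in for one that vectorizes poorly on the
// coprocessor. Returns the sum of every `stride`-th element of x.
double strided_sum(const double* x, std::size_t n, std::size_t stride)
{
    assert(stride > 0);
    double sum = 0.0;
#ifdef __MIC__
    // Intel-specific hints, seen only when __MIC__ is defined
    // (as it is for -mmic compilation). Other builds skip these lines.
    #pragma novector
    #pragma unroll(0)
#endif
    for (std::size_t i = 0; i < n; i += stride)
        sum += x[i];
    return sum;
}
```

On a host build the guarded pragmas disappear entirely, so the same source gives the default vectorization and unrolling there.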

According to the vecanalysis tool

http://software.intel.com/en-us/articles/vecanalysis-python-script-for-a...

not only are the conditional directives not used in omp target mode, but in every test some vector operations which are reported as "lightweight" in native mode are reported as "medium" for omp target, and some "medium" ones are promoted to "heavyweight." Examining the .s files doesn't show any difference that would account for the differing reports. The MIC .s files are difficult to read, as there appears to be no way to suppress the debug symbol shown before each instruction.

Also, for the native mode compilation, vecanalysis reports no peeled vectorized loops and several vectorized remainder loops, the opposite of omp target mode. That's another difference which should have only a minor effect.

My C++ version gives similar performance to the Fortran in MIC native mode, but isn't sufficiently stable in omp target mode. The old problem of buffer overlaps being reported when explicitly transferring more than 64MB remains, among others. The only suggestion I've received about that is that the current MPSS may not support the earlier KNC coprocessors (apparently all current production models have more than 4GB RAM). It's strange that the problem is solved for ifort but not for icpc.

There's also a remaining conflict between target map and target update, where both are needed in the same application; this was fixed in ifort. gcc seems to be copying the lack of support for target update. I filed a bug report with gcc about omp simd reduction being accepted (even where icc rejects it) but killing the optimization which occurred without the directive; the bug was confirmed.
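
The two constructs mentioned above can be sketched in C++ as follows. The function names `dot_simd` and `scale_twice` are hypothetical and the code only shows the intended usage; it does not reproduce the compiler problems described. Without OpenMP support the pragmas are ignored and both functions run serially on the host.

```cpp
#include <cassert>
#include <vector>

// Dot product using the OpenMP 4.0 simd construct with a reduction clause.
float dot_simd(const float* x, const float* y, int n)
{
    assert(n >= 0);
    float sum = 0.0f;
    #pragma omp simd reduction(+: sum)
    for (int i = 0; i < n; ++i)
        sum += x[i] * y[i];
    return sum;
}

// Scales x on the device twice, refreshing the host copy in between.
// Returns x[0] as seen on the host after the first pass.
float scale_twice(std::vector<float>& x, float s)
{
    assert(!x.empty());
    float* px = x.data();
    const int n = static_cast<int>(x.size());
    float after_first = 0.0f;

    // map: allocate on the device, copy in at entry and out at exit.
    #pragma omp target data map(tofrom: px[0:n])
    {
        #pragma omp target map(tofrom: px[0:n])
        #pragma omp parallel for
        for (int i = 0; i < n; ++i)
            px[i] *= s;

        // update: refresh the host copy in the middle of the data region,
        // which the map clause alone would not do until region exit.
        #pragma omp target update from(px[0:n])
        after_first = px[0];

        #pragma omp target map(tofrom: px[0:n])
        #pragma omp parallel for
        for (int i = 0; i < n; ++i)
            px[i] *= s;
    }
    return after_first;
}
```

Comparing the vectorization report for `dot_simd` with and without the `omp simd` line is one way to check whether the directive changes the optimization for a given compiler.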

Attachment	Size
Download lcd_omp4.tgz	15.19 KB
