Not all applications are created equal. Some are chomping at the bit to harvest as much parallelism as a target platform can provide. Those may be good candidates for running on an Intel® Xeon Phi™ coprocessor. Other applications are scalar (not vectorized) and sequential (not threaded). They won't even make full use of an Intel Xeon processor, much less an Intel Xeon Phi coprocessor. Before moving to a highly parallel platform, the developer-tuner needs to expose enough parallelism to hit the limits of the Intel Xeon platform. Once the demand for threads and vectors or memory bandwidth exceeds what an Intel Xeon processor can deliver, an Intel Xeon Phi coprocessor has the potential to provide further performance improvements.
Assessing whether an application holds promise for showing compelling performance with a given platform, or currently exposes enough parallelism to make ready use of it, is a challenge that faces many developers today. An application may have potential, but that potential may not yet be fully realized. And the capability of platforms to harvest that potential may change over time: an application may not be a good fit for the first implementation in a processor family, but a later generation of the same family may be able to offer compelling performance for the same code.
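One rough way to build a first intuition for "does my application expose enough parallelism?" is Amdahl's law, which bounds the speedup by the fraction of runtime that actually scales. The sketch below is an illustration only, not the method used in the talk or the lab: the function name `amdahl_speedup` and the worker counts are hypothetical, and the model ignores vectorization, memory bandwidth and synchronization overhead.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Upper bound on speedup when only `parallel_fraction` of the
    original runtime scales perfectly across `workers`."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part runs at the original speed; the parallel part is divided
    # evenly among the workers.
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Illustrative worker counts (assumptions, not measured hardware figures).
    for fraction in (0.50, 0.90, 0.99):
        for workers in (16, 240):
            bound = amdahl_speedup(fraction, workers)
            print(f"parallel fraction {fraction:.2f}, {workers:3d} workers: "
                  f"at most {bound:.1f}x")
```

Under this simple model, a code that is only 90% parallel stays below a 10x speedup no matter how many workers are available, which is one reason to expose more parallelism before moving to a highly parallel platform.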
At ISC13, I'm giving a theater presentation and chalk talk seeking to address the following questions that we tend to have as application developers and tuners:
- When would I need an Intel Xeon Phi coprocessor vs. an Intel Xeon processor?
- How do I tell whether my application is a good fit for an Intel Xeon Phi coprocessor?
- What should my expectations be for the speedup I can achieve?
- What do I need to do to make the application sing on "extreme hardware"?
- How do I develop a good intuition about this?
Here's a link to the ISC13 theater presentation and chalk talk. Check back here for updates to that content.
I've also worked with my colleague Chao Mei to prepare a lab that works through these issues. It goes step by step, with makefiles, reference solutions, VTune project files and even an answer key, so you can make use of it as a beginner. The link to it will be here, as soon as I've had a chance for some others to try it out. I strongly believe that we need to make our developer and analysis tools more powerful, effective and intuitive if we're to help motivate developers to do the hard work of parallelizing their applications, regardless of platform.
Have fun exposing and harvesting extreme parallelism! You might also check out a related "right for me" blog.
I hope to see you at the theater presentation, Wed. June 19 at 1:20pm in the Intel booth at ISC, with a chalk talk to follow.
CJ Newburn
Performance and Feature Architect, Intel