Hi,
I am experiencing a performance issue with TBB on Xeon Phi. On a server with two Xeon X5680 processors, TBB runs faster than OpenMP on a group of benchmarks I have. The same is true on another machine of mine, which has a single i7-3820. On Xeon Phi, however, the opposite happens.
Attached is a simple test program that shows how I use both of them. It compares their performance by printing the execution time of a dummy workload. Running make and then make run shows the result. The following is an example output:

    [xxx@yyy test]$ make
    icpc -o bin.x64 main.cc work.cc -fopenmp -g -O3 -no-vec -tbb
    icpc -o bin.mic main.cc work.cc -fopenmp -g -O3 -DCOMPILE_FOR_XEON_PHI -no-vec -tbb
    [xxx@yyy test]$ make run
    ./bin.x64 tbb: 0.592992
    ./bin.x64 omp: 0.937134
    ./bin.mic tbb: 0.807429
    ./bin.mic omp: 0.284160
    [xxx@yyy test]$ which icc
    /var/intel/parallel_studio_xe_2013/composer_xe_2013_sp1.2.144/bin/intel64/icc

As you can see, TBB is ~1.6x faster on the Xeon server but ~3x slower on Xeon Phi.

What would be a proper explanation for this behavior? Also, do you see any problems with, or have any comments on, how I am using the two? (One thing I tried was adjusting the grain size of tbb::blocked_range, but it did not help.) Ultimately, is TBB simply not a good choice for performance on Xeon Phi?

Please advise. Thanks in advance.

Regards,
Hee-Seok