Profiling events on Xeon using perf

Hi,

I am running some experiments to compare a Xeon machine with an AMD machine, using perf on both. My concern is that the event counts on the Xeon are on the order of a thousand times higher than those on the AMD, yet the runtime on the Xeon is much better. I am measuring cache events, branch instructions, and cpu-clock on both machines.

My application is a 1000x1000 matrix multiplication, and I am running it sequentially, not in parallel yet.
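For scale, here is a rough sketch of how much work a sequential multiply of this size involves. The source of mm1 is not shown in this post, so the naive triple loop below is only an assumption about its structure, and the names `naive_matmul` and `expected_inner_iterations` are made up for illustration.

```python
def naive_matmul(a, b):
    """Multiply two square matrices (lists of lists) with a plain triple loop.

    Also returns how many inner-loop iterations were executed, since each
    iteration implies at least one loop branch and a few memory accesses.
    """
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    inner_iterations = 0
    for i in range(n):
        for j in range(n):
            acc = 0.0
            for k in range(n):
                acc += a[i][k] * b[k][j]
                inner_iterations += 1
            c[i][j] = acc
    return c, inner_iterations


def expected_inner_iterations(n):
    """Inner-loop iteration count of the naive algorithm for n x n inputs."""
    return n ** 3


# Check the counter on a tiny input instead of running n = 1000 in Python.
small = [[1.0, 2.0], [3.0, 4.0]]
product, count = naive_matmul(small, small)
print(product, count)

# Iteration count implied for the 1000x1000 case described above.
print(expected_inner_iterations(1000))
```

If mm1 really is a loop nest like this, a run with n = 1000 executes about 10^9 inner iterations, so any branch or cache count far below that order of magnitude probably does not cover the whole computation.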

Can you explain why the counts differ by such a large factor between the machines? For example, cache-references is 713,432 on the AMD and 127,708,365 on the Xeon.

This is the Xeon configuration:

Model: Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
CPU MHz: 1200.000
CPU cores per Processor: 8
Host Physical Memory: 65933 MB
Architecture: x86_64
L1 dcache: 32K
L1 icache: 32K
L2 cache: 256K
L3 cache: 20480K
cache_alignment: 64

This is the AMD configuration:

AMD Opteron 2427

Instruction set: x86-64
Speed: 2.2 GHz
L1 instruction cache: 6 x 64 KB
L1 data cache: 6 x 64 KB
L2 cache: 6 x 512 KB
L3 cache: 6 MB

Take a look at this example.

=== results from AMD ======

perf stat -e cache-references,cache-misses,branch-instructions,cpu-clock bpsh 15 ./mm1 1000 1

Program runs in 17.52 seconds

Performance counter stats for 'bpsh 15 ./mm1 1000 1':

713,432 cache-references

35,538 cache-misses # 4.981 % of all cache refs

411,916 branch-instructions

2.701875 cpu-clock (msec)

17.560428347 seconds time elapsed

=== results from Xeon ======

Next, I compiled mm1 on the Xeon as an offload build, but the code contains no #pragma offload directive, so it runs entirely on the Xeon host processor.

perf stat -e cache-references,cache-misses,branch-instructions,cpu-clock ./mm1 1000 1

Program runs in 2.69 seconds

Performance counter stats for './mm1 1000 1':

127,708,365 cache-references

477,245 cache-misses # 0.374 % of all cache refs

507,201,088 branch-instructions

2701.183114 cpu-clock (msec)

2.701594439 seconds time elapsed
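One sanity check that can be done on the two outputs above is to compare cpu-clock with the elapsed time. The sketch below uses only the figures posted here; the helper name `cpu_utilization` is made up for illustration.

```python
def cpu_utilization(cpu_clock_msec, elapsed_sec):
    """Fraction of wall-clock time the measured task spent on a CPU."""
    return (cpu_clock_msec / 1000.0) / elapsed_sec


# Values copied from the two perf stat outputs above.
amd = cpu_utilization(2.701875, 17.560428347)
xeon = cpu_utilization(2701.183114, 2.701594439)

print(f"AMD:  {amd:.4%} of elapsed time on CPU")
print(f"Xeon: {xeon:.2%} of elapsed time on CPU")

# Ratios of the raw counts (Xeon / AMD) for the other events.
print(f"cache-references ratio:    {127_708_365 / 713_432:.0f}x")
print(f"branch-instructions ratio: {507_201_088 / 411_916:.0f}x")
```

On the Xeon the measured task was on a CPU for essentially the whole run, while on the AMD perf accounted for only a few milliseconds out of more than 17 seconds. One possible reading, which is a guess and not verified here, is that on the AMD perf counted only the `bpsh 15` launcher process and not mm1 itself, in which case the two sets of counters would not be measuring the same thing.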

Do you have any idea why the results are so different?

Thanks,
