Cycle-accurate measurement on the Athlon
Like several of you, I am working on a general framework for
cycle-accurate measurement on my Athlon Thunderbird. My
initial focus is on integer code.
BITS 32
GLOBAL time_code
SECTION .mytext progbits alloc exec nowrite align=64
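%macro read_timestamp_counter 0
xor eax, eax            ; select CPUID leaf 0
cpuid                   ; serializing instruction; clobbers eax, ebx, ecx, edx
rdtsc                   ; timestamp counter -> edx:eax
%endmacro
ALIGN XXX
time_code:
push ebx                ; cpuid clobbers ebx, which the C caller expects preserved
push esi
read_timestamp_counter
mov esi, eax            ; keep the low 32 bits of the start timestamp
%rep YYY
nop                     ; code under test: YYY one-byte NOPs
%endrep
read_timestamp_counter
sub eax, esi            ; elapsed cycles (low 32 bits), returned in eax
pop esi
pop ebx
ret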
I then call time_code() 20,000 times from a tiny C program for
each combination of XXX and YYY (XXX = 0..63 and YYY = 0..1023).
For a given XXX and YYY, I make sure the results are consistent: I
check that no more than 40 values (0.2%) differ from the
preponderant value.
For a given YYY, I check that the results are identical for every XXX.
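A minimal sketch of the per-configuration consistency check in C, not the original driver. The names `majority_candidate`, `count_outliers` and `is_consistent` are hypothetical. It assumes the samples are 32-bit cycle counts and that the allowed number of outliers is well below half the sample count (40 out of 20,000 here), which is what lets a single majority-vote pass find the preponderant value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Boyer-Moore majority vote: returns the value that occurs in more than
   half of the samples, if such a value exists. If none exists, the
   returned candidate is arbitrary and the outlier count below will
   expose it. */
uint32_t majority_candidate(const uint32_t *samples, size_t n)
{
    assert(samples != NULL && n > 0);
    uint32_t candidate = samples[0];
    size_t votes = 0;
    for (size_t i = 0; i < n; i++) {
        if (votes == 0) {
            candidate = samples[i];
            votes = 1;
        } else if (samples[i] == candidate) {
            votes++;
        } else {
            votes--;
        }
    }
    return candidate;
}

/* Number of samples that differ from the given value. */
size_t count_outliers(const uint32_t *samples, size_t n, uint32_t value)
{
    size_t outliers = 0;
    for (size_t i = 0; i < n; i++) {
        if (samples[i] != value)
            outliers++;
    }
    return outliers;
}

/* Returns 1 if at most max_outliers samples differ from the preponderant
   value, 0 otherwise. The preponderant value is stored in *mode.
   Assumption: max_outliers < n / 2. */
int is_consistent(const uint32_t *samples, size_t n,
                  size_t max_outliers, uint32_t *mode)
{
    assert(mode != NULL);
    *mode = majority_candidate(samples, n);
    return count_outliers(samples, n, *mode) <= max_outliers;
}
```

With the numbers from the post, the call would be `is_consistent(samples, 20000, 40, &mode)`. Comparing the resulting `mode` across all XXX for a fixed YYY then covers the alignment check.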
Conclusions:
In this case, alignment does not matter, which is not surprising
since NOPs are only one byte long. But I had to check if I wanted to
be thorough.
I computed a linear regression: CYCLES = (YYY + 1) / 3 + 60
The formula is accurate to within one cycle, as long as YYY is
greater than 6.
Interesting points:
The first and second measurements are often incorrect, but the third
measurement is always correct. Therefore the first two measurements
should be discarded.
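One way the warm-up rule could look in the C driver, as a sketch only. `timed_fn`, `collect_samples` and `fake_time_code` are hypothetical names. `fake_time_code` merely stands in for the assembly routine so the sketch compiles on its own, and the values it returns are made up for illustration, not measurements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed C view of the timed routine: no arguments, cycle count in eax. */
typedef uint32_t (*timed_fn)(void);

/* Call fn `warmup` times and throw the results away, then record the
   next n results in out[0..n-1]. */
void collect_samples(timed_fn fn, size_t warmup, uint32_t *out, size_t n)
{
    assert(fn != NULL && out != NULL);
    for (size_t i = 0; i < warmup; i++)
        (void)fn();
    for (size_t i = 0; i < n; i++)
        out[i] = fn();
}

/* Stand-in for time_code(): the first two calls return an inflated,
   invented value, every later call returns 64. */
uint32_t fake_time_code(void)
{
    static unsigned calls = 0;
    calls++;
    return calls <= 2 ? 200u : 64u;
}
```

In the real driver the call would pass `time_code` instead of the stand-in, with `warmup` set to 2.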
There are some strange results for several values of YYY (first
column: YYY, second column: measured cycles):
11 64
12 64
13 64
14 65
15 65
16 65
17 68 *** EXPECTED 66 ***
18 67 *** EXPECTED 66 ***
19 67 *** EXPECTED 66 ***
20 67
21 67
22 67
23 68
24 68
25 68
17 is funky :-)
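The table above can be checked against the regression mechanically. The sketch below assumes the division in CYCLES = (YYY + 1) / 3 + 60 truncates like C integer division; that is an assumption, chosen because it reproduces the "EXPECTED 66" entries. The names `predicted_cycles`, `struct sample`, `measured` and `find_anomalies` are hypothetical, and the data are copied from the table.

```c
#include <assert.h>
#include <stddef.h>

/* Predicted cycle count for a run of `nops` NOPs.
   Assumption: the division truncates toward zero. */
int predicted_cycles(int nops)
{
    assert(nops >= 0);
    return (nops + 1) / 3 + 60;
}

struct sample {
    int nops;    /* YYY */
    int cycles;  /* measured cycles */
};

/* Measurements copied from the table in the post. */
const struct sample measured[] = {
    {11, 64}, {12, 64}, {13, 64},
    {14, 65}, {15, 65}, {16, 65},
    {17, 68}, {18, 67}, {19, 67},
    {20, 67}, {21, 67}, {22, 67},
    {23, 68}, {24, 68}, {25, 68},
};
const size_t measured_count = sizeof measured / sizeof measured[0];

/* Store in out[] the NOP counts whose measured value differs from the
   prediction (at most out_cap entries) and return how many were stored. */
size_t find_anomalies(const struct sample *rows, size_t n,
                      int *out, size_t out_cap)
{
    size_t found = 0;
    for (size_t i = 0; i < n && found < out_cap; i++) {
        if (rows[i].cycles != predicted_cycles(rows[i].nops))
            out[found++] = rows[i].nops;
    }
    return found;
}
```

Running `find_anomalies(measured, measured_count, buf, 15)` on this data flags exactly YYY = 17, 18 and 19, the three rows marked in the table.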
Anyway, I'll be happy to hear any comments or suggestions. Next
week, I'll play around with instructions other than NOPs. I suspect
alignment will then matter; I wonder how much.
My ultimate goal is to write an "optimizing assembler", that is an
assembler which reorders instructions to find the optimal code
schedule on a particular x86 micro-architecture.
Cheers,
Shill