Mombu the Programming Forum

1 12th October 19:36
shill
Cycle-accurate measurement on the Athlon



Like several of you, I am working on a general framework for
cycle-accurate measurement on my Athlon Thunderbird. My
initial focus is on integer code.

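BITS 32

GLOBAL time_code

SECTION .mytext progbits alloc exec nowrite align=64

; Serialize with CPUID, then read the time-stamp counter.
; CPUID clobbers EAX, EBX, ECX and EDX; RDTSC returns the count in EDX:EAX.
%macro read_timestamp_counter 0
        xor     eax, eax
        cpuid
        rdtsc
%endmacro

ALIGN XXX
time_code:
        push    ebx                     ; EBX and ESI are callee-saved
        push    esi
        read_timestamp_counter
        mov     esi, eax                ; keep the low 32 bits of the start count
%rep YYY
        nop                             ; code under test
%endrep
        read_timestamp_counter
        sub     eax, esi                ; elapsed cycles, returned in EAX
        pop     esi
        pop     ebx
        ret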

I then call time_code() 20,000 times from a tiny C program for
each combination of XXX and YYY (XXX=0..63 and YYY=0..1023).

For a given XXX and YYY, I make sure I get consistent results, that
is, I check that no more than 40 values (0.2%) differ from the most
frequent value.

For a given YYY, I check that the results are identical for every XXX.
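
The consistency check described above could be sketched in C roughly as follows. This is only an illustration, not the code used in the post: the function names are made up, and the majority-vote pass is one possible way to find the most frequent value. It relies on the fact that a run can only pass the check if one value holds a clear majority.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Boyer-Moore majority vote: returns the majority value if one exists.
   If no value holds a majority, the returned candidate is arbitrary. */
static uint32_t majority_candidate(const uint32_t *samples, size_t n)
{
    uint32_t candidate = samples[0];
    size_t votes = 0;
    for (size_t i = 0; i < n; i++) {
        if (votes == 0) {
            candidate = samples[i];
            votes = 1;
        } else if (samples[i] == candidate) {
            votes++;
        } else {
            votes--;
        }
    }
    return candidate;
}

/* Returns 1 if at most max_outliers samples differ from the most frequent
   value, 0 otherwise. The candidate value is stored in *mode_out (if not
   NULL); it is only meaningful when the function returns 1. */
int run_is_consistent(const uint32_t *samples, size_t n,
                      size_t max_outliers, uint32_t *mode_out)
{
    assert(samples != NULL && n > 0);
    uint32_t mode = majority_candidate(samples, n);
    size_t outliers = 0;
    for (size_t i = 0; i < n; i++) {
        if (samples[i] != mode)
            outliers++;
    }
    if (mode_out != NULL)
        *mode_out = mode;
    return outliers <= max_outliers;
}
```

With the numbers from the post, a run of 20,000 samples would be checked with run_is_consistent(samples, 20000, 40, &mode).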

Conclusions:

In this case, alignment does not matter, which is not surprising
since NOPs are only one byte long. But I had to check if I wanted to
be thorough.

I computed a linear regression: CYCLES = (YYY + 1) / 3 + 60
The formula is accurate to within one cycle, as long as YYY is
greater than 6.
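
As a small sketch, the regression can be written as a C function. The post does not say how fractional results are rounded; truncating integer division is an assumption here, chosen because it reproduces the expected values quoted in this thread (for example 64 for YYY = 11 and 66 for YYY = 17). The function name is hypothetical.

```c
#include <assert.h>

/* Predicted cycle count for a block of yyy NOPs between the two
   timestamp reads, per CYCLES = (YYY + 1) / 3 + 60.
   Assumption: the division truncates toward zero. */
unsigned predicted_cycles(unsigned yyy)
{
    assert(yyy <= 1023);          /* range tested in the post */
    return (yyy + 1) / 3 + 60;
}
```

Read this way, the constant 60 is the fixed overhead of the measurement itself, and the slope of 1/3 corresponds to three NOPs retiring per cycle.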

Interesting points:

The first and second measurements are often incorrect, but the third
measurement is always correct. Therefore the first two measurements
should be discarded.

There are some strange results for several values of YYY
(first column: YYY, second column: measured cycles):
11 64
12 64
13 64
14 65
15 65
16 65
17 68 *** EXPECTED 66 ***
18 67 *** EXPECTED 66 ***
19 67 *** EXPECTED 66 ***
20 67
21 67
22 67
23 68
24 68
25 68

17 is funky :-)
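
One simple way to spot this kind of anomaly automatically is to look for places where adding a NOP makes the measured count go down. The sketch below does that on the figures listed above; the assumption that the count should never decrease as YYY grows is mine, and the names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the index of the first element that is larger than its
   successor, or n if the sequence never decreases. */
size_t first_decrease(const unsigned *cycles, size_t n)
{
    assert(cycles != NULL);
    for (size_t i = 0; i + 1 < n; i++) {
        if (cycles[i] > cycles[i + 1])
            return i;
    }
    return n;
}

/* Measured cycles for YYY = 11..25, copied from the listing above. */
const unsigned measured[] = {
    64, 64, 64, 65, 65, 65, 68, 67, 67, 67, 67, 67, 68, 68, 68
};
enum { FIRST_YYY = 11, N_MEASURED = sizeof measured / sizeof measured[0] };
```

Calling first_decrease(measured, N_MEASURED) and adding FIRST_YYY to the result points at YYY = 17, the entry where the count jumps to 68 before dropping back to 67.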

Anyway, I'll be happy to hear any comments or suggestions. Next
week, I'll play around with instructions other than NOPs. I suspect
alignment will then matter; I wonder how much.

My ultimate goal is to write an "optimizing assembler", that is an
assembler which reorders instructions to find the optimal code
schedule on a particular x86 micro-architecture.

Cheers,

Shill

2 12th October 19:38
jdm95003



Hi,

Are you doing something to prevent interrupts from affecting
your results? How about time-slicing? Are you running this
on an OS that does not use time-slicing (like DOS)? If you
are trying to get results accurate to 1 cycle, these things
would matter a lot.

Jim Monte
3 12th October 19:38
matt taylor


So it would seem, but when timing small pieces of code they aren't an issue
at all. Scheduling overhead is thousands of cycles. Interrupt latency is
also extremely high. For small pieces of code they introduce a huge bias
that is easy to detect. My test bench handles this by taking 16 results and
discarding the whole set until it gets good results. The most effective way
to do that is to loop until all 16 results are identical.

-Matt
4 12th October 19:38
grumble


A typical 10 ms time slice will last over 10 million cycles on
modern processors. Moreover, when an interrupt occurs, it takes the
OS several thousand cycles to service it; empirically I would say
between 5,000 and 30,000 cycles.

When timing only a few thousand instructions, it is easy to detect
whether an interrupt occurred, because the measurement will be much
higher than expected. If we time the code twice in a row, the
probability that an interrupt occurs both times is minuscule.
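
To put rough numbers on this argument, here is a back-of-the-envelope sketch in C. The model is a simplification of my own: one interrupt per time slice, landing at a uniformly random point, with the two runs treated as independent. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Number of clock cycles in an interval of the given length. */
uint64_t cycles_in_interval(uint64_t clock_hz, uint64_t milliseconds)
{
    return clock_hz * milliseconds / 1000;
}

/* Probability that one interrupt, uniformly placed within a period of
   period_cycles, lands inside a timed window of window_cycles. */
double hit_probability(double window_cycles, double period_cycles)
{
    assert(period_cycles > 0.0 && window_cycles >= 0.0);
    double p = window_cycles / period_cycles;
    return p > 1.0 ? 1.0 : p;
}

/* Probability that two independent runs are both interrupted. */
double both_hit_probability(double window_cycles, double period_cycles)
{
    double p = hit_probability(window_cycles, period_cycles);
    return p * p;
}
```

For example, assuming a 1 GHz clock, cycles_in_interval(1000000000, 10) gives 10,000,000 cycles per slice. A timed window of 400 cycles (an illustrative figure) then has about a 4 in 100,000 chance of being hit, and the chance of two runs in a row both being hit is that value squared.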