Fastest Code for byte-substitutions in a string
Not on *every* processor:
                          Clocks                  Size
        Operands       808x   286    386    486   Bytes
LOOP    label: jump    18     8+m    11+m   6     2
DEC     reg8           3      2      2      1     2
JNZ     label: jump    16     7+m    7+m    3     2-4
8086: LOOP is 18; DEC/JNZ is 19 (3+16)
80286: LOOP is 8+m; DEC/JNZ is 9+m (2+7+m)
80386: LOOP is 11+m; DEC/JNZ is 9+m (2+7+m)
80486: LOOP is 6; DEC/JNZ is 4 (1+3)
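
If you want to check the sums above mechanically, here is a minimal sketch. The numbers are copied straight from the table; the names (CLOCKS, add, fmt, compare) are made up for illustration, and "m" is kept symbolic by storing each cost as a (fixed clocks, number of m terms) pair. It only compares raw clock counts and says nothing about prefetch effects.

```python
# Clock counts copied from the table above.
# Each cost is (fixed_clocks, m_terms), so "8+m" is stored as (8, 1).
CLOCKS = {
    "LOOP": {"808x": (18, 0), "286": (8, 1), "386": (11, 1), "486": (6, 0)},
    "DEC":  {"808x": (3, 0),  "286": (2, 0), "386": (2, 0),  "486": (1, 0)},
    "JNZ":  {"808x": (16, 0), "286": (7, 1), "386": (7, 1),  "486": (3, 0)},
}


def add(a, b):
    """Add two (fixed_clocks, m_terms) costs component-wise."""
    return (a[0] + b[0], a[1] + b[1])


def fmt(cost):
    """Render a cost the way the table writes it, e.g. (9, 1) -> '9+m'."""
    fixed, m = cost
    if m == 0:
        return str(fixed)
    if m == 1:
        return f"{fixed}+m"
    return f"{fixed}+{m}m"


def compare(cpu):
    """Return (LOOP cost, DEC+JNZ cost, name of the cheaper form) for one CPU."""
    loop = CLOCKS["LOOP"][cpu]
    pair = add(CLOCKS["DEC"][cpu], CLOCKS["JNZ"][cpu])
    # The m terms are the same on both sides in every column of this table,
    # so comparing the fixed parts is enough.
    assert loop[1] == pair[1]
    if loop[0] < pair[0]:
        winner = "LOOP"
    elif loop[0] > pair[0]:
        winner = "DEC/JNZ"
    else:
        winner = "tie"
    return loop, pair, winner


if __name__ == "__main__":
    for cpu in ("808x", "286", "386", "486"):
        loop, pair, winner = compare(cpu)
        print(f"{cpu}: LOOP is {fmt(loop)}; DEC/JNZ is {fmt(pair)}; faster: {winner}")
```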
So on 808x and 80286, LOOP is faster. On 8088, it is MUCH faster
because it's a 2-byte opcode that easily fits in the prefetch queue,
whereas DEC/JNZ is minimum 4 bytes and probably won't be in the
prefetch queue.
Maybe you meant to write "on every processor with 32-bit registers"?