Mombu the Programming Forum

1 19th October 09:38
robal
External User
 
Posts: 1
Default fastest memory copying...



Hi all.
I need a very, very fast memory copying routine. The blocks are about 50 kB
and bigger. Do you know of some interesting code to download? I know nothing
about assembler, but memory copying is critical in my current project, so
could you help me?

2 19th October 09:38
spike



What is your current project?

//Spike
3 19th October 09:38
matt taylor


The K7 Optimization Manual goes through a rigorous process of deriving a fast
memcpy for the Athlon. They got 1.6 GB/sec. I am not sure exactly what the
specs on that system were, but the maximum theoretical throughput could not
have been more than 2.1 GB/sec.
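
For context, a ceiling of about 2.1 GB/sec is what a 64-bit (8-byte) memory bus running at 266 million transfers per second would give. That configuration is an assumption here, since the system's specs are not stated. A minimal C sketch of the arithmetic, with illustrative function names:

```c
#include <stdio.h>

/* Peak bandwidth in GB/sec (1 GB = 1e9 bytes) for a bus that moves
   bus_bytes per transfer at transfers_per_sec transfers per second. */
double peak_gb_per_sec(double bus_bytes, double transfers_per_sec)
{
    return bus_bytes * transfers_per_sec / 1e9;
}

/* Fraction of the peak that a measured copy rate achieves. */
double efficiency(double measured_gb, double peak_gb)
{
    return measured_gb / peak_gb;
}

/* Prints the assumed peak and what share of it 1.6 GB/sec represents. */
void print_example(void)
{
    /* Assumed configuration: 8-byte bus, 266e6 transfers per second. */
    double peak = peak_gb_per_sec(8.0, 266e6);
    printf("peak %.2f GB/sec, measured copy is %.0f%% of peak\n",
           peak, 100.0 * efficiency(1.6, peak));
}
```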

You can also look at the code for ScienceMark (www.sciencemark.de). They
have some copy memory routines that are freely downloadable.

Also, look at the comments made in the thread "memcpy bandwidth on PIV" just
6 days ago. You may find some further relevant information there.

-Matt
4 19th October 09:39
ivan korotkov


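If you're using C, why not simply
_asm
{
    mov esi, offset source  ; offset needs static or global buffers
    mov edi, offset dest
    mov ecx, size           ; rep movsd counts dwords: byte count / 4
    cld                     ; copy forward
    rep movsd
}

It's fast enough.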
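
The same idea can be sketched for GCC or Clang, whose inline assembly syntax differs from the `_asm` block. This is only a sketch under assumptions: the name `copy_rep_movsd` is made up for illustration, the regions must not overlap, and the leftover 0 to 3 bytes are handled separately because `rep movsd` counts dwords rather than bytes.

```c
#include <stddef.h>
#include <string.h>

/* Copy nbytes from src to dst: "rep movsd" for the whole dwords, then a
   byte loop for the 0 to 3 leftover bytes. Regions must not overlap. */
void copy_rep_movsd(void *dst, const void *src, size_t nbytes)
{
    size_t ndwords = nbytes / 4;   /* rep movsd counts dwords, not bytes */
    size_t tail    = nbytes % 4;
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* AT&T syntax spells movsd as movsl. The calling convention leaves
       the direction flag clear, so the copy runs forward. The asm
       advances dst and src past the dwords it copied. */
    __asm__ volatile("rep movsl"
                     : "+D"(dst), "+S"(src), "+c"(ndwords)
                     :
                     : "memory");
#else
    /* Fallback for other compilers or CPUs (plain memcpy, not rep movsd). */
    memcpy(dst, src, ndwords * 4);
    dst = (char *)dst + ndwords * 4;
    src = (const char *)src + ndwords * 4;
#endif
    {
        unsigned char *d = dst;
        const unsigned char *s = src;
        while (tail--)
            *d++ = *s++;
    }
}
```

A call looks just like memcpy: `copy_rep_movsd(dest, source, size_in_bytes);`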

Ivan
5 19th October 09:39
matt taylor


An MMX copy loop is superior. The K7 Optimization Manual suggests using rep
movsd for blocks between 16 and 512 bytes in size. Above 512 bytes, MMX
should be used. I haven't seen a figure given for the Pentium 4, but I would
expect it to favor MMX even more so than the Athlon.
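
Those size thresholds can be sketched as a small dispatcher in C. The enum and function names are hypothetical, and the choice for blocks under 16 bytes is an assumption, since no recommendation for them is given above.

```c
#include <stddef.h>

enum copy_strategy {
    COPY_BYTES,       /* assumed choice for tiny blocks */
    COPY_REP_MOVSD,   /* 16 to 512 bytes */
    COPY_MMX          /* above 512 bytes */
};

/* Pick a copy routine from the block size, following the thresholds
   quoted above for the Athlon. */
enum copy_strategy choose_copy_strategy(size_t nbytes)
{
    if (nbytes < 16)
        return COPY_BYTES;        /* assumption: not covered above */
    if (nbytes <= 512)
        return COPY_REP_MOVSD;
    return COPY_MMX;
}
```

For the 50 kB blocks in the original question this selects the MMX path. The crossover points would need re-measuring on any other CPU.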

If you time your prefetches correctly, you get even better performance.

-Matt
6 19th October 09:40
tim roberts


Actually, if you're using C, the single statement:
memcpy( dest, source, size );
will expand to exactly that, inline, as long as optimizations are turned
on.

However, as Matt pointed out, it is possible to beat "rep movsd" if you're
willing to work at it.
--
- Tim Roberts, timr@probo.com
Providenza & Boekelheide, Inc.
7 19th October 09:41
hutch


The fastest memory copy algorithm I have seen was one posted by a guy
called LINGO in the win32asm forum about 6 months ago. It was an XMM
algorithm that used software pretouch of the data ahead and, from memory,
it was an unrolled loop as well.

Without MMX or XMM the REP MOVSD style of block copy is hard to beat.
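
The general shape of such a loop can be sketched in C with SSE2 intrinsics. This is not LINGO's code, which is not reproduced in this thread. It is an illustration that assumes a prefetch instruction stands in for the pretouch, a 64-byte unroll, and a guessed prefetch distance. The name `copy_sse2_prefetch` is made up.

```c
#include <stddef.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 loads and stores; also provides _mm_prefetch */
#endif

#define PREFETCH_AHEAD 256   /* bytes ahead to prefetch; a guess, tune per CPU */

/* Copy nbytes using XMM registers, 64 bytes per iteration, prefetching
   the source ahead of the loads. Regions must not overlap. */
void copy_sse2_prefetch(void *dst, const void *src, size_t nbytes)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
#if defined(__SSE2__)
    while (nbytes >= 64) {
        /* Only prefetch addresses that are still inside the source block. */
        if (nbytes > PREFETCH_AHEAD)
            _mm_prefetch((const char *)(s + PREFETCH_AHEAD), _MM_HINT_NTA);

        /* Unrolled: four 16-byte loads, then four 16-byte stores. */
        __m128i a = _mm_loadu_si128((const __m128i *)(s +  0));
        __m128i b = _mm_loadu_si128((const __m128i *)(s + 16));
        __m128i c = _mm_loadu_si128((const __m128i *)(s + 32));
        __m128i e = _mm_loadu_si128((const __m128i *)(s + 48));
        _mm_storeu_si128((__m128i *)(d +  0), a);
        _mm_storeu_si128((__m128i *)(d + 16), b);
        _mm_storeu_si128((__m128i *)(d + 32), c);
        _mm_storeu_si128((__m128i *)(d + 48), e);

        s += 64;
        d += 64;
        nbytes -= 64;
    }
#endif
    /* Remaining bytes (or everything, when SSE2 is unavailable). */
    memcpy(d, s, nbytes);
}
```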

Regards,

hutch at movsd dot com
8 19th October 09:41
matt taylor


Before MMX was introduced, people used the FPU to do the same thing. Even on
the 486, it was demonstrated that a simple unrolled mov/add/mov/add loop to
emulate movsd would beat rep movsd.
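
In C terms, that kind of unrolled loop looks roughly like the sketch below. The function name is illustrative, the unroll factor of four is an assumption, and a modern optimizing compiler may well rewrite it into something else entirely.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy ndwords 32-bit words, four per iteration, in place of rep movsd.
   Regions must not overlap. */
void copy_dwords_unrolled(uint32_t *dst, const uint32_t *src, size_t ndwords)
{
    while (ndwords >= 4) {
        dst[0] = src[0];
        dst[1] = src[1];
        dst[2] = src[2];
        dst[3] = src[3];
        dst += 4;
        src += 4;
        ndwords -= 4;
    }
    /* 0 to 3 leftover words. */
    while (ndwords--)
        *dst++ = *src++;
}
```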

I have tested SSE loops on my Athlon, and they only perform worse than an MMX
copy. K8 should get the same results: it likewise has an 8-byte bus. SSE
could help a Pentium 4 out, though.

-Matt
9 19th October 09:41
randall hyde


"Even on the 486..."

Actually, it wasn't until later Pentiums that Intel improved the performance
of rep movsb/w/d to the point it rivalled straight line code in performance.

The bad news is that most memory moves are *still* bus bandwidth limited.
So most of the games people play with block moves are only useful on
certain machines, when moving around blocks of data that fit within the
cache.
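
One way to check this on a given machine is to time the same copy at different block sizes. The sketch below uses plain memcpy and the standard clock() timer. The function name and the 0.1 second minimum run time are assumptions, and a serious benchmark would need more care over timer resolution and compiler optimizations.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Measure memcpy throughput in GB/sec (1 GB = 1e9 bytes) for one block
   size. Returns a negative value on bad arguments or failure. */
double memcpy_gb_per_sec(size_t block_bytes, int reps_per_check)
{
    if (block_bytes == 0 || reps_per_check <= 0)
        return -1.0;

    unsigned char *src = malloc(block_bytes);
    unsigned char *dst = malloc(block_bytes);
    clock_t start = clock();
    if (!src || !dst || start == (clock_t)-1) {
        free(src);
        free(dst);
        return -1.0;
    }
    memset(src, 0xA5, block_bytes);   /* touch the pages before timing */
    memset(dst, 0, block_bytes);

    long reps = 0;
    clock_t now;
    start = clock();
    do {
        for (int i = 0; i < reps_per_check; i++) {
            memcpy(dst, src, block_bytes);
            /* Reading a byte back discourages the compiler from
               dropping the copy. */
            volatile unsigned char sink = dst[block_bytes - 1];
            (void)sink;
        }
        reps += reps_per_check;
        now = clock();
    } while (now - start < CLOCKS_PER_SEC / 10);   /* at least 0.1 s of CPU time */

    double secs = (double)(now - start) / CLOCKS_PER_SEC;
    free(src);
    free(dst);
    return (double)block_bytes * (double)reps / secs / 1e9;
}
```

Comparing, say, `memcpy_gb_per_sec(32 * 1024, 1000)` with `memcpy_gb_per_sec(64 * 1024 * 1024, 1)` should show whether a block that fits in cache copies faster than one that does not.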

Cheers,
Randy Hyde
10 19th October 09:41
hutch


I agree with Randy's comment on memory bandwidth, but both MMX and XMM
have a trick that I don't think can be done with the standard 32-bit
instructions: the capacity to read from cache and write back without
cache pollution does yield improved performance in memory copy.

With XMM there is an instruction like MOVNTPS, which will only work on
16-byte aligned memory, but its write rate is far higher than with a
normal integer instruction.
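
From C, MOVNTPS is reachable through the `_mm_stream_ps` intrinsic. The following is a minimal sketch, assuming a 16-byte aligned destination and a hypothetical function name. The source is loaded with unaligned loads so only the destination carries the alignment requirement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE__)
#include <xmmintrin.h>   /* _mm_loadu_ps, _mm_stream_ps, _mm_sfence */
#endif

/* Copy nbytes with non-temporal (cache-bypassing) 16-byte stores.
   dst must be 16-byte aligned. Regions must not overlap. */
void copy_streaming(void *dst, const void *src, size_t nbytes)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* MOVNTPS faults if its memory operand is not 16-byte aligned. */
    assert(((uintptr_t)d & 15) == 0);
#if defined(__SSE__)
    while (nbytes >= 16) {
        __m128 v = _mm_loadu_ps((const float *)s);   /* source may be unaligned */
        _mm_stream_ps((float *)d, v);                /* compiles to MOVNTPS */
        s += 16;
        d += 16;
        nbytes -= 16;
    }
    /* Order the streamed stores before any later stores. */
    _mm_sfence();
#endif
    /* Remaining bytes (or everything, when SSE is unavailable). */
    memcpy(d, s, nbytes);
}
```

Streaming stores tend to pay off when the destination will not be read again soon. If the copied data is used right away, ordinary stores that leave it in cache may be the better choice.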

Regards,

hutch at movsd dot com