Here's my code for testing the speed of various memory copy functions. Each function is run 100 times, and then the program prints the average time (in milliseconds) that one call to that function took. The VB6 source code below has comments that show how it works.
When I actually run the program, I find there is essentially no speed difference between the functions. I'm not sure why. Maybe on modern CPUs it takes the same amount of time to copy a given number of bytes regardless of whether they are copied as bytes, words, or dwords, so copying 4 bytes takes the same time as copying 2 words or 1 dword. Unlike on older CPUs, maybe you no longer get a speed boost by making your program copy dwords or words instead of bytes.
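A 100,000,000-byte buffer is far larger than any CPU cache, so a copy that size is usually limited by how fast main memory can be read and written, not by how many move instructions execute. In addition, many modern x86 CPUs implement rep movs as what Intel's documentation calls a fast-string operation, which moves larger chunks internally even for rep movsb. To make the comparison concrete, here is a minimal portable C sketch of the three copy widths. The function names are hypothetical, and compiled C loops are not the same as the FASM rep movs routines, so treat it only as an illustration that all three widths move exactly the same bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy n bytes, one byte per step (the C analogue of rep movsb). */
void copy_by_bytes(void *dst, const void *src, size_t n)
{
    uint8_t *d = dst;
    const uint8_t *s = src;
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
}

/* Copy n 16-bit words (2*n bytes), one word per step (like rep movsw).
   memcpy with a constant size avoids alignment and aliasing problems;
   optimizing compilers normally turn it into one 16-bit load and store. */
void copy_by_words(void *dst, const void *src, size_t n)
{
    uint8_t *d = dst;
    const uint8_t *s = src;
    for (size_t i = 0; i < n; i++) {
        uint16_t w;
        memcpy(&w, s + 2 * i, sizeof w);
        memcpy(d + 2 * i, &w, sizeof w);
    }
}

/* Copy n 32-bit dwords (4*n bytes), one dword per step (like rep movsd). */
void copy_by_dwords(void *dst, const void *src, size_t n)
{
    uint8_t *d = dst;
    const uint8_t *s = src;
    for (size_t i = 0; i < n; i++) {
        uint32_t v;
        memcpy(&v, s + 4 * i, sizeof v);
        memcpy(d + 4 * i, &v, sizeof v);
    }
}

/* Copies len bytes of src three times, once per width, and returns 1 if
   every copy matches src. len must be a multiple of 4, just as the
   100,000,000-byte test buffer is. */
int widths_agree(const uint8_t *src, uint8_t *a, uint8_t *b, uint8_t *c,
                 size_t len)
{
    assert(len % 4 == 0);
    copy_by_bytes(a, src, len);
    copy_by_words(b, src, len / 2);
    copy_by_dwords(c, src, len / 4);
    return memcmp(a, src, len) == 0 && memcmp(b, src, len) == 0 &&
           memcmp(c, src, len) == 0;
}
```

The byte count passed to each routine is divided by the width, exactly as the VB6 tester passes 100000000, 50000000, and 25000000, so the total amount of data moved is identical in all three cases.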
Code:
Private Declare Sub CopyBytes Lib "FastMemCopy.dll" (ByRef Dest As Any, ByRef Src As Any, ByVal ByteCount As Long)
Private Declare Sub CopyWords Lib "FastMemCopy.dll" (ByRef Dest As Any, ByRef Src As Any, ByVal WordCount As Long)
Private Declare Sub CopyDWords Lib "FastMemCopy.dll" (ByRef Dest As Any, ByRef Src As Any, ByVal DWordCount As Long)
Private Declare Sub CopyBytesFast Lib "FastMemCopy.dll" (ByRef Dest As Any, ByRef Src As Any, ByVal ByteCount As Long)
Private Declare Sub CopyWordsFast Lib "FastMemCopy.dll" (ByRef Dest As Any, ByRef Src As Any, ByVal WordCount As Long)
Private Declare Sub CopyMemory Lib "kernel32.dll" Alias "RtlMoveMemory" (ByRef Destination As Any, ByRef Source As Any, ByVal Length As Long)
Private Declare Function timeBeginPeriod Lib "winmm.dll" (ByVal uPeriod As Long) As Long
Private Declare Function timeEndPeriod Lib "winmm.dll" (ByVal uPeriod As Long) As Long
Private Declare Function timeGetTime Lib "winmm.dll" () As Long
Private Sub Form_Load()
Dim Mem1(100000000 - 1) As Byte
Dim Mem2(100000000 - 1) As Byte
Dim TimeStart As Long
Dim TimeEnd As Long
Dim TimePassed As Double
Dim TimePassedAvg As Double
Dim i As Long
timeBeginPeriod 1
'Perform 100 iterations of copying 100 million bytes, 1 byte at a time
TimePassedAvg = 0
For i = 1 To 100
TimeStart = timeGetTime
CopyBytes Mem2(0), Mem1(0), 100000000
TimeEnd = timeGetTime
TimePassed = TimeEnd - TimeStart
TimePassedAvg = TimePassedAvg + TimePassed / 100
Next i
Print TimePassedAvg
'Perform 100 iterations of copying 100 million bytes, 2 bytes at a time
TimePassedAvg = 0
For i = 1 To 100
TimeStart = timeGetTime
CopyWords Mem2(0), Mem1(0), 50000000
TimeEnd = timeGetTime
TimePassed = TimeEnd - TimeStart
TimePassedAvg = TimePassedAvg + TimePassed / 100
Next i
Print TimePassedAvg
'Perform 100 iterations of copying 100 million bytes, 4 bytes at a time
TimePassedAvg = 0
For i = 1 To 100
TimeStart = timeGetTime
CopyDWords Mem2(0), Mem1(0), 25000000
TimeEnd = timeGetTime
TimePassed = TimeEnd - TimeStart
TimePassedAvg = TimePassedAvg + TimePassed / 100
Next i
Print TimePassedAvg
'Should be identical to the third test (CopyDWords), as 100000000 bytes is an exact multiple of 4 bytes
TimePassedAvg = 0
For i = 1 To 100
TimeStart = timeGetTime
CopyBytesFast Mem2(0), Mem1(0), 100000000 'Copy as many 4-byte blocks as possible, then copy the remaining data 1 byte at a time
TimeEnd = timeGetTime
TimePassed = TimeEnd - TimeStart
TimePassedAvg = TimePassedAvg + TimePassed / 100
Next i
Print TimePassedAvg
'Should be identical to the third test (CopyDWords), as 50000000 words is an exact multiple of 2 words (4 bytes)
TimePassedAvg = 0
For i = 1 To 100
TimeStart = timeGetTime
CopyWordsFast Mem2(0), Mem1(0), 50000000 'Copy as many 4-byte blocks as possible, then copy the remaining data 2 bytes at a time
TimeEnd = timeGetTime
TimePassed = TimeEnd - TimeStart
TimePassedAvg = TimePassedAvg + TimePassed / 100
Next i
Print TimePassedAvg
'Perform 100 iterations of copying 100 million bytes using CopyMemory
'Not sure what method CopyMemory uses internally, but unlike the functions above it is documented to work correctly even when the source and destination regions overlap
TimePassedAvg = 0
For i = 1 To 100
TimeStart = timeGetTime
CopyMemory Mem2(0), Mem1(0), 100000000
TimeEnd = timeGetTime
TimePassed = TimeEnd - TimeStart
TimePassedAvg = TimePassedAvg + TimePassed / 100
Next i
Print TimePassedAvg
timeEndPeriod 1
End Sub
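The comment in the CopyMemory test asks what technique an overlap-safe copy uses. RtlMoveMemory, like C's memmove, is documented to allow overlapping source and destination blocks. One standard way to meet that guarantee is to copy forwards normally, but backwards when the destination starts inside the source block, so that no source byte is overwritten before it has been read. Here is a minimal C sketch of that idea; move_bytes is a hypothetical name, and this is not a claim about how RtlMoveMemory is actually implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical memmove-style copy: safe even if [src, src+n) and
   [dst, dst+n) overlap. Not RtlMoveMemory's real implementation. */
void move_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    assert(n == 0 || (dst != NULL && src != NULL));
    if (n == 0 || d == s)
        return;

    /* Compare addresses as integers: relational operators on pointers to
       unrelated objects are undefined in ISO C. */
    uintptr_t di = (uintptr_t)d;
    uintptr_t si = (uintptr_t)s;

    if (di < si || di - si >= n) {
        /* dst lies below src, or entirely past it: a forward copy never
           overwrites a source byte before that byte has been read. */
        for (size_t i = 0; i < n; i++)
            d[i] = s[i];
    } else {
        /* dst starts inside the source block: copy from the end back to
           the start so the overlapping tail is read before it is
           overwritten. */
        for (size_t i = n; i > 0; i--)
            d[i - 1] = s[i - 1];
    }
}
```

For example, moving the 5 bytes "abcde" two positions to the right inside "abcdefgh" gives "ababcdeh", whereas a naive forward byte-by-byte copy would give "abababah".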
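Here are the results of running this program 3 different times. In each run, the six values are the average times (in milliseconds) for CopyBytes, CopyWords, CopyDWords, CopyBytesFast, CopyWordsFast, and CopyMemory, in that order.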
First time I ran the program:
25.66
26.17
25.90
25.83
26.29
25.71
Second time I ran the program:
27.36
30.50
30.17
26.73
26.88
26.18
Third time I ran the program:
25.58
25.98
25.64
25.44
25.86
25.73
As you can see, there is no consistency between the different runs of the tester program, nor any consistency in which function is fastest: sometimes one function was faster, and sometimes another. The only consistent thing is that the times hover around 26 ms, and every once in a while a function ran slower for no apparent reason, taking about 30 ms (CopyWords and CopyDWords in the second run). I'm not sure what caused those 30 ms outliers. All of these inconsistencies are present even though each reported time is an average over 100 calls of the function. I hope somebody can explain them.
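Two things may help when reading these numbers. First, timeGetTime only counts whole milliseconds (at best about 1 ms resolution, which is what timeBeginPeriod 1 requests), so every individual sample is an integer such as 25 or 26, and each average above is just the mix of those integers. Second, a mean is pulled upward by even a few slow runs, for example runs interrupted by other activity on the machine, whereas the minimum and the median of the same samples are much less sensitive to such outliers. For scale, 100,000,000 bytes in about 26 ms is roughly 3.8 GB/s copied, and since every byte is both read and written, the memory traffic is at least twice that. Below is a minimal C sketch, with hypothetical names (TimingSummary, summarize_timings, throughput_bytes_per_sec), that summarizes per-call timings by mean, minimum, and median and converts a time into throughput. It assumes the individual samples are stored, rather than only accumulated into an average as the VB6 tester does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Summary statistics for a set of timing samples in milliseconds. */
typedef struct {
    double mean;
    double min;
    double median;
} TimingSummary;

static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Summarizes n samples. The input is copied before sorting, so the
   caller's array is left untouched. Returns 0 on success, -1 if n is 0
   or memory could not be allocated. */
int summarize_timings(const double *samples, size_t n, TimingSummary *out)
{
    assert(samples != NULL && out != NULL);
    if (n == 0)
        return -1;

    double *sorted = malloc(n * sizeof *sorted);
    if (sorted == NULL)
        return -1;
    memcpy(sorted, samples, n * sizeof *sorted);
    qsort(sorted, n, sizeof *sorted, cmp_double);

    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += sorted[i];

    out->mean = sum / (double)n;
    out->min = sorted[0];
    out->median = (n % 2 != 0) ? sorted[n / 2]
                               : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    free(sorted);
    return 0;
}

/* Bytes per second for a copy of `bytes` bytes that took `ms` ms. */
double throughput_bytes_per_sec(double bytes, double ms)
{
    return bytes / (ms / 1000.0);
}
```

For instance, throughput_bytes_per_sec(100000000.0, 25.0) returns 4e9, that is, 4 GB/s.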
The first five copy functions are in a DLL that I wrote myself in assembly language and assembled with FASM. Below is the source code for that DLL; it also has comments so you can see how it works.
Code:
format PE GUI 4.0 DLL
entry dllmain
include "macro\export.inc"
Arg1 equ ebp+8
Arg2 equ Arg1+4
Arg3 equ Arg2+4
section ".text" code readable executable
dllmain:
mov eax,1
ret 12
CopyBytes:
push ebp
mov ebp,esp
push esi
push edi
push ecx
mov edi,[Arg1]
mov esi,[Arg2]
mov ecx,[Arg3] ;Number of bytes to copy
rep movsb ;Copy data 1 byte at a time
pop ecx
pop edi
pop esi
leave
ret 12
CopyWords:
push ebp
mov ebp,esp
push esi
push edi
push ecx
mov edi,[Arg1]
mov esi,[Arg2]
mov ecx,[Arg3] ;Number of words (2 byte blocks) to copy
rep movsw ;Copy data 1 word at a time
pop ecx
pop edi
pop esi
leave
ret 12
CopyDWords:
push ebp
mov ebp,esp
push esi
push edi
push ecx
mov edi,[Arg1]
mov esi,[Arg2]
mov ecx,[Arg3] ;Number of dwords (4 byte blocks) to copy
rep movsd ;Copy data 1 dword at a time
pop ecx
pop edi
pop esi
leave
ret 12
CopyBytesFast:
push ebp
mov ebp,esp
push esi
push edi
push ecx
mov edi,[Arg1]
mov esi,[Arg2]
mov eax,[Arg3] ;Number of bytes to copy
xor edx,edx
mov ecx,4
div ecx
mov ecx,eax
rep movsd ;First, copy as much data as possible 4 bytes at a time
mov ecx,edx
rep movsb ;Then, copy remaining data 1 byte at a time
pop ecx
pop edi
pop esi
leave
ret 12
CopyWordsFast:
push ebp
mov ebp,esp
push esi
push edi
push ecx
mov edi,[Arg1]
mov esi,[Arg2]
mov eax,[Arg3] ;Number of words to copy
xor edx,edx
mov ecx,2
div ecx
mov ecx,eax
rep movsd ;First, copy as much data as possible 2 words at a time
mov ecx,edx
rep movsw ;Then, copy remaining data 1 word at a time
pop ecx
pop edi
pop esi
leave
ret 12
section ".edata" export readable
export "FastMemCopy.dll",\
CopyBytes, "CopyBytes",\
CopyWords, "CopyWords",\
CopyDWords, "CopyDWords",\
CopyBytesFast, "CopyBytesFast",\
CopyWordsFast, "CopyWordsFast"
section ".reloc" fixups readable
dq 0
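CopyBytesFast splits the byte count with div: the quotient goes to rep movsd and the remainder to rep movsb. Because the divisor is a power of two, the same split can be computed with a shift and a mask (count shifted right by 2, and count AND 3, i.e. shr and and in FASM), which avoids the div instruction. CopyWordsFast works the same way with a shift by 1 and a mask of 1. The div runs only once per call, so for a 100 MB copy it will not affect these timings noticeably. Here is a minimal C sketch of that split, with hypothetical function names; it mirrors the logic of the two FASM routines but is not a drop-in replacement for the DLL.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copies one 4-byte block; memcpy avoids alignment and aliasing issues. */
static void copy_dword(uint8_t *d, const uint8_t *s)
{
    uint32_t v;
    memcpy(&v, s, sizeof v);
    memcpy(d, &v, sizeof v);
}

/* Hypothetical C mirror of CopyBytesFast: copy byte_count / 4 dwords,
   then the byte_count % 4 leftover bytes one at a time. */
void copy_bytes_fast(void *dst, const void *src, uint32_t byte_count)
{
    uint8_t *d = dst;
    const uint8_t *s = src;
    uint32_t dwords = byte_count >> 2; /* quotient:  byte_count / 4 */
    uint32_t tail = byte_count & 3u;   /* remainder: byte_count % 4 */
    assert(dwords * 4u + tail == byte_count);

    for (uint32_t i = 0; i < dwords; i++, d += 4, s += 4)
        copy_dword(d, s);
    for (uint32_t i = 0; i < tail; i++)
        *d++ = *s++;
}

/* Hypothetical C mirror of CopyWordsFast: copy word_count / 2 dwords,
   then one leftover 16-bit word if word_count is odd. */
void copy_words_fast(void *dst, const void *src, uint32_t word_count)
{
    uint8_t *d = dst;
    const uint8_t *s = src;
    uint32_t dwords = word_count >> 1; /* quotient: word_count / 2 */

    for (uint32_t i = 0; i < dwords; i++, d += 4, s += 4)
        copy_dword(d, s);
    if (word_count & 1u) {             /* remainder: word_count % 2 */
        uint16_t w;
        memcpy(&w, s, sizeof w);
        memcpy(d, &w, sizeof w);
    }
}
```

With a byte count of 23, for example, copy_bytes_fast moves 5 dwords and then 3 single bytes; with 100000000 the remainder is 0, which is why the fourth and fifth tests do the same work as the third.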