Memset me up Scotty.
From: Iman Habib (pixelpajasREMOVETHIS_at_hotmail.com)
Date: 02/23/04
- Previous message: Clax86: "Get the FAQs"
- Next in thread: Matt Taylor: "Re: Memset me up Scotty."
- Reply: Matt Taylor: "Re: Memset me up Scotty."
- Reply: Grumble: "Re: Memset me up Scotty."
Date: Mon, 23 Feb 2004 09:17:04 +0000 (UTC)
Hi guys..
I'm trying to pull a fast memset routine out of my magic hat
for a toy 3D engine of mine.
And to be honest.. I suck at assembly optimizations. =...(
The routine I have managed to make is about twice as fast as
a regular "rep stosd" 32-bit memset on my AMD Athlon XP.
But I am still not content as I have a gut feeling that it is
possible to make it faster.
So I'll let you guys poke at my memset code
and see if you can find more places to optimize. =)
Or even better.. some of you may have links to webpages that have better
code.
cheers
//iman
-----------------8<----------------8<--------------------
inline void memset32mmx(unsigned int *dest, unsigned int c, unsigned int len)
{
    unsigned int apa[2];
    apa[0] = apa[1] = c;        // fill value duplicated into one 64-bit pattern

    // I know I can remove the code here.. remake it, put it in the next
    // asm block and make it a bit faster.. but it won't be significant..
    // do it later
    if(len < 2) {
        _asm {
            mov eax, c
            mov edi, dest
            mov ecx, len
            cld
            rep stosd           // plain 32-bit fill for 0 or 1 dwords
        }
        return;
    }

    _asm {
        mov edx, [dest]
        mov eax, len
        mov ecx, eax
        shr eax, 1              // len/2
        and ecx, 1              // len%2
        movd mm1, c             // mm1 is never read below
        movq mm0, [apa]
    l:
        movntq [edx], mm0       // non-temporal 8-byte store
        add edx, 8
        dec eax
        jnz l
        test ecx, ecx
        je q
        sub edx, 4              // odd len: step back one dword..
        movntq [edx], mm0       // ..and store a pair over the last two dwords
    q:
        // sfence               // movntq stores are weakly ordered
        emms
    }
}
-----------------8<----------------8<--------------------
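For anyone who wants to check the control flow without an MMX-capable compiler, here is a rough portable C sketch of the same idea: fill two dwords per store, and for an odd length step back one dword and store one more pair so it overlaps the last pair already written. The name memset32_pairs and the use of memcpy for the 8-byte store are assumptions made for illustration only; it does not use non-temporal stores, so it says nothing about the speed of the routine above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical portable counterpart of memset32mmx (illustration only):
   fills len 32-bit words at dest with c, two words per store. */
void memset32_pairs(uint32_t *dest, uint32_t c, size_t len)
{
    assert(dest != NULL || len == 0);

    if (len < 2) {              /* 0 or 1 word: no room for a paired store */
        if (len == 1)
            dest[0] = c;
        return;
    }

    /* Same role as apa[2] in the post: the fill value duplicated. */
    uint32_t pair[2] = { c, c };

    unsigned char *p = (unsigned char *)dest;
    size_t pairs = len / 2;     /* shr eax, 1 */
    size_t odd = len & 1;       /* and ecx, 1 */

    while (pairs--) {
        memcpy(p, pair, sizeof pair);   /* stands in for movntq [edx], mm0 */
        p += sizeof pair;
    }

    if (odd) {
        /* Step back one word and store a full pair again: the first half
           rewrites the last word of the loop, the second half fills the
           remaining word. */
        p -= sizeof(uint32_t);
        memcpy(p, pair, sizeof pair);
    }
}
```

The overlapping tail store is why the len < 2 case has to be handled separately: with a single dword there is nothing in front of it to step back onto.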
cheers
//iman