Re: xchg & lock question
- From: "Eman" <spamtrap@xxxxxxxxxx>
- Date: Tue, 28 Mar 2006 17:21:46 +0400
"robertwessel2@xxxxxxxxx" <spamtrap@xxxxxxxxxx> wrote in message
news:1143520003.863450.17400@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
[..]
> All IA32 processors automatically lock memory referencing XCHG
> instructions, as do 286s. 8086/8 and 8018Xs did not.
>
> Your comment about the lock signal not being asserted outside of a
> particular CPU when that CPU has the data item cached (specifically if
> the processor has the cache line cached in an *exclusive* state), is
> correct, but somewhat irrelevant. From a software perspective the
> locked region does not have to exceed the actual memory operand
> (although it often does), so "locked" access to an exclusive cache line
> without asserting the actual inter-processor lock signal does not cause
> a software visible change, except for being much, much, faster.
Thanks for the feedback, Robert. As I understand it, not only the CPU
but also other hardware may be competing for the bus. Does that matter?
Non-cacheable memory (for example, on an I/O card) isn't a problem, since
it won't, *ahem*, be cached. Whether atomic memory transactions make
it all the way out to the device is a different question.
DMA accesses are not a problem since they can't (on a proper
implementation) interrupt an atomic read-modify-write (RMW) cycle. A
configuration that DMAs into memory without regard for caching by CPUs
is going to have other troubles. If we're talking about something
PC-like, there won't be any problems.
Actually, I'm not an asm/hardware geek, but I sometimes use asm in
multithreaded code and must be sure of my locks in the multiprocessor
case. So I need a clear reason why Microsoft prefers an explicit LOCK
prefix in the InterlockedExchange API rather than a plain XCHG. Some
other people have wondered the same thing (example:
http://coding.derkeiler.com/Archive/Delphi/borland.public.delphi.language.basm/2003-10/0359.html)
My questions arise from this point. A Windows DDK guru told me
that "XCHG does not have an implicit lock", and I'm not inclined
to distrust him, so how should I interpret that statement in
CPU / hardware terms?
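For reference, the kind of lock in question can be sketched in portable
C11 rather than asm. This is only a minimal illustration, not what
Windows does internally: the names xchg_spinlock, spin_try_lock,
spin_lock and spin_unlock are made up for this example. On x86,
compilers typically turn the atomic_exchange call into an XCHG with a
memory operand.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical test-and-set spinlock: 0 = free, 1 = taken. */
typedef struct {
    atomic_int held;
} xchg_spinlock;

/* Try once. The exchange returns the previous value, so a result of 0
   means the lock was free and this caller now owns it. */
bool spin_try_lock(xchg_spinlock *lock)
{
    return atomic_exchange_explicit(&lock->held, 1,
                                    memory_order_acquire) == 0;
}

/* Spin until the exchange observes the lock as free. */
void spin_lock(xchg_spinlock *lock)
{
    while (!spin_try_lock(lock)) {
        /* busy-wait */
    }
}

/* Release by storing 0; no read-modify-write is needed here. */
void spin_unlock(xchg_spinlock *lock)
{
    atomic_store_explicit(&lock->held, 0, memory_order_release);
}
```

The whole lock depends on the exchange being atomic with respect to the
other processors, which is exactly the property being asked about.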
I saw some Linux kernel traffic a couple of years ago where someone
determined that the more complex cmpxchg loop usually runs faster than
a simple xchg does; IIRC, this was on P4s. I've not verified that, so
do your own testing.
More relevant to Windows, however: the cmpxchg loop will work much
better on a uniprocessor system. Unlike xchg, which always locks,
cmpxchg without the lock prefix *won't* do the lock operation, and will
run considerably faster. This factors into the way MS builds a lot of
their uni- vs. multi-processor code. If you look at the uniprocessor
code, you see things like the lock prefixes being replaced by no-ops,
without any other code changes. So some part of this may be driven by
MS's build/validation process.
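To make the comparison concrete, here is a rough C11 sketch of the two
approaches. The function names swap_xchg and swap_cmpxchg_loop are
invented for illustration, and this is not the actual
InterlockedExchange source. On x86, compilers typically emit a plain
XCHG for the first function and a LOCK CMPXCHG retry loop for the
second.

```c
#include <stdatomic.h>

/* Unconditional swap: store new_value and return the previous value. */
int swap_xchg(atomic_int *target, int new_value)
{
    return atomic_exchange(target, new_value);
}

/* Same result built from compare-and-swap: retry until the value we
   read is still the value in memory at the moment of the store. */
int swap_cmpxchg_loop(atomic_int *target, int new_value)
{
    int expected = atomic_load(target);
    while (!atomic_compare_exchange_weak(target, &expected, new_value)) {
        /* On failure, expected has been refreshed with the current
           contents of *target, so just try again. */
    }
    return expected; /* the value that was replaced */
}
```

Both functions return the old contents and leave new_value in memory,
so callers cannot tell them apart; only the generated instructions
(and hence the cost) differ.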
Thanks again for the informative answer.
From your explanation I conclude that xchg and the cmpxchg loop
provide the same logical functionality from a software perspective, even
in the multiprocessor case with possible other bus agents. If so, the
latter reason (MS builds) looks to be closest to the point.
I had imagined that the LOCK prefixes in the InterlockedXXX APIs might
be NOPed out at run time. However, a dump of the running Windows XP
kernel code on a PIII uniprocessor machine shows them still in place.