Re: xchg & lock question



"robertwessel2@xxxxxxxxx" <spamtrap@xxxxxxxxxx> wrote in message
news:1143520003.863450.17400@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

[..]

> All IA32 processors automatically lock memory-referencing XCHG
> instructions, as do 286s. 8086/8s and 8018Xs did not.
>
> Your comment about the lock signal not being asserted outside of a
> particular CPU when that CPU has the data item cached (specifically, if
> the processor has the cache line cached in an *exclusive* state) is
> correct, but somewhat irrelevant. From a software perspective the
> locked region does not have to extend beyond the actual memory operand
> (although it often does), so a "locked" access to an exclusive cache line
> without asserting the actual inter-processor lock signal does not cause
> a software-visible change, except for being much, much faster.
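A minimal sketch of the XCHG-style locking described above, written with C11 <stdatomic.h> instead of raw asm (an assumption; the thread itself is about x86 instructions). On x86, compilers typically emit a plain XCHG for a sequentially consistent atomic_exchange and rely on its implicit lock, so no explicit LOCK prefix appears. The type and function names (xchg_spinlock, xchg_lock, and so on) are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical spinlock for illustration: 0 = free, 1 = held. */
typedef struct {
    atomic_int locked;
} xchg_spinlock;

static void xchg_lock_init(xchg_spinlock *l) {
    atomic_init(&l->locked, 0);
}

/* One attempt: swap in 1 and inspect the old value. On x86 this is
   typically a single XCHG with an implicit lock. */
static bool xchg_trylock(xchg_spinlock *l) {
    return atomic_exchange(&l->locked, 1) == 0;
}

static void xchg_lock(xchg_spinlock *l) {
    while (!xchg_trylock(l)) {
        /* Wait with plain loads so the waiting CPU reads its cached copy
           instead of issuing locked writes on every iteration. */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed) != 0)
            ;
    }
}

static void xchg_unlock(xchg_spinlock *l) {
    /* Sanity check: unlocking a lock that is not held is a caller bug. */
    assert(atomic_load_explicit(&l->locked, memory_order_relaxed) == 1);
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```

The inner read-only loop is the usual test-and-test-and-set refinement: only when the line looks free does the waiter retry the locked exchange.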

Thanks for the feedback, Robert. As I understand it, not only the CPU
but also other hardware may be active on the bus. Does that matter?


Non-cacheable memory (for example, on an I/O card) isn't a problem, since
it won't, *ahem*, be cached. Whether atomic memory transactions make
it all the way out to the device is a different question.

DMA accesses are not a problem, since (on a proper implementation)
they can't interrupt an atomic read-modify-write (RMW) cycle. A
configuration that DMAs into memory without regard for caching by CPUs
is going to have other troubles. If we're talking about something
PC-like, there won't be any problems.


Actually, I'm not an asm/hardware geek, but I sometimes use asm in
multithreaded code and must be sure of my locks in the
multiprocessor case. So I need a clear reason why Microsoft
prefers using an explicit LOCK in the InterlockedExchange API
rather than XCHG. Some other people have also wondered about this (example:
http://coding.derkeiler.com/Archive/Delphi/borland.public.delphi.language.basm/2003-10/0359.html)
That is where my questions come from. A Windows DDK guru told me
that "XCHG does not have an implicit lock". I'm not inclined
to distrust that, so how should I interpret it in
CPU/hardware terms?
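To make the XCHG vs. LOCK CMPXCHG comparison concrete, here is a sketch of an InterlockedExchange-like operation written both ways in portable C11. This is not Microsoft's implementation; exchange_direct and exchange_cas are hypothetical names, and the mapping to XCHG and LOCK CMPXCHG is what x86 compilers typically generate, not something the C standard guarantees.

```c
#include <stdatomic.h>

/* Direct exchange: on x86 this typically compiles to XCHG,
   which locks implicitly. */
static long exchange_direct(atomic_long *target, long value) {
    return atomic_exchange(target, value);
}

/* The same operation as a compare-and-swap loop: on x86 the CAS is
   typically LOCK CMPXCHG, with the lock prefix written explicitly. */
static long exchange_cas(atomic_long *target, long value) {
    long old = atomic_load_explicit(target, memory_order_relaxed);
    /* On failure, atomic_compare_exchange_weak writes the current value
       into 'old', so the loop simply retries with a fresh snapshot. */
    while (!atomic_compare_exchange_weak(target, &old, value))
        ;
    return old;
}
```

Both return the previous value and leave the new one in place, which is the sense in which they provide the same logical operation.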


I saw some Linux kernel traffic a couple of years ago where someone
determined that the more complex cmpxchg loop usually runs faster than
a simple xchg does; IIRC, this was on P4s. I've not verified that, so
do your own testing.
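For anyone who wants to do that testing, a crude single-threaded timing harness along these lines is one starting point. It is only a sketch: time_xchg and time_cas_loop are hypothetical names, clock() measures CPU time at coarse resolution, and an uncontended single thread says nothing about behavior under contention across several processors.

```c
#include <stdatomic.h>
#include <time.h>

/* n uncontended exchanges via atomic_exchange (typically XCHG on x86).
   Returns elapsed CPU time in seconds. */
static double time_xchg(atomic_long *x, long n) {
    clock_t start = clock();
    for (long i = 0; i < n; i++)
        atomic_exchange(x, i);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* n uncontended exchanges via a CAS loop (typically LOCK CMPXCHG on x86).
   Returns elapsed CPU time in seconds. */
static double time_cas_loop(atomic_long *x, long n) {
    clock_t start = clock();
    for (long i = 0; i < n; i++) {
        long old = atomic_load_explicit(x, memory_order_relaxed);
        while (!atomic_compare_exchange_weak(x, &old, i))
            ;
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Compile with optimization and compare the two times on the specific hardware you care about; no particular result is implied here.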

More relevant to Windows, however: the cmpxchg loop will work much
better on a uniprocessor system since, unlike xchg, a cmpxchg without
the lock prefix *won't* perform a locked operation unconditionally, and
will run considerably faster. This factors into the way MS builds a lot
of its uniprocessor vs. multiprocessor code. If you look at the
uniprocessor code, you'll see things like the lock prefixes being replaced
by no-ops, without any other code changes. So some part of this may be
driven by MS's build/validation process.

Thanks again for the informative answer.
From your explanation I conclude that xchg and the cmpxchg loop
provide the same logical functionality from a software perspective, even
in the multiprocessor case with other possible bus agents. If so, the
latter reason (MS builds) seems closest to the point.

Just in case, I imagined that the LOCKs in the InterlockedXXX APIs might be
NOPed at run time. However, a dump of the running Windows XP kernel code on
a PIII uniprocessor machine shows them in place.
