Re: lock




o///annabee wrote:

Temp in this case is one of the internal hidden registers within the
CPU itself. (You do know that the P4, for example, has 128 internal
registers that are used by the OOE (out-of-order execution) engine for
register renaming?)

You just told me. :) I suspected it of course, but I did not know.

www.sandpile.org and arstechnica.com have a lot of info on the internals
of various CPUs. Ars is more of a news site, but Hannibal does
excellent architecture reviews of CPUs, which will give you a good
understanding of how CPUs physically work.

XCHG can be slow when one operand is in memory, because of the implied
LOCK, but with register operands only it's generally pretty quick
(thanks to register renaming within the OOE engine).
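For comparison, here is a minimal C11 sketch (not from the thread) of the memory form of XCHG in use: a simple test-and-set lock. On x86, compilers typically emit atomic_exchange on a memory object as XCHG with a memory operand, whose LOCK is implied. The type and function names (xchg_lock, xchg_try_lock, xchg_unlock) are made up for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock word: 0 = free, 1 = taken. */
typedef struct {
    atomic_int taken;
} xchg_lock;

/* atomic_exchange on x86 is typically compiled to XCHG with a memory
   operand, which is implicitly locked: no explicit LOCK prefix needed. */
static bool xchg_try_lock(xchg_lock *l)
{
    /* Swap 1 into the lock word; the old value says whether it was free. */
    return atomic_exchange(&l->taken, 1) == 0;
}

static void xchg_unlock(xchg_lock *l)
{
    atomic_store(&l->taken, 0);
}
```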

Yes, thanks. That was one part I wanted to know, because "xchg eax, reg"
is one byte, so it could be a better alternative to "mov reg, reg" in
some cases. Also, on a first read, I find the Intel System manual a bit
confusing regarding LOCK. In one place it says that LOCK is active only
if [mem] is involved, and in another it says that XCHG always incurs a
lock. Yet another place says that a full lock is not needed at all in
some circumstances. Some places say that if the source operand is
memory, we will "occasionally" get a #UD exception... There seem to be
some differences between each processor generation and so forth. I was
hoping that someone could give me a lighter version: a few simple rules
that cover the most significant points.

When LOCK is actually asserted depends on the specific revision and
family of the x86 CPU in question. E.g., many x86 CPUs that will only
ever work in a single-CPU setup and are not multi-core don't need to
issue a lock unless bus-mastering is involved. Dual-core CPUs, or CPUs
designed for SMP setups, will always issue a LOCK. Because of this, the
docs get a little vague and contradictory. However, for simplicity's
sake, NEVER assume: always put an explicit LOCK prefix on the
read-modify-write instructions that touch shared memory (even though
it's implied for XCHG with a memory operand). Note that LOCK is only
valid on those instructions (ADD, INC, XADD, CMPXCHG, BTS and friends)
with a memory destination; anywhere else it raises #UD.
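In C11 you don't write the prefix yourself: x86 compilers typically emit locked instructions for atomic read-modify-write operations (LOCK XADD or LOCK ADD for atomic_fetch_add, LOCK CMPXCHG for atomic_compare_exchange_strong). A minimal sketch, assuming a hypothetical shared counter named hits:

```c
#include <stdatomic.h>

/* Hypothetical shared counter touched by several threads. */
static atomic_uint hits = 0;

/* Typically emitted as a LOCK-prefixed add/xadd on x86. */
static unsigned record_hit(void)
{
    return atomic_fetch_add(&hits, 1u) + 1u; /* value after our increment */
}

/* Typically emitted as LOCK CMPXCHG: store 'desired' only if the
   counter still holds 'expected'. Returns nonzero on success. */
static int reset_if(unsigned expected, unsigned desired)
{
    return atomic_compare_exchange_strong(&hits, &expected, desired);
}
```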

(I mean in the context of using multiple threads, shielding them from
interfering with each other and causing unpredictable slowdowns.)

For semaphores, mutexes and spinlocks, cmpxchg is your friend.
Do you mean for _creating_ those or for something else?

This is the real (main) reason for my query. I want to know how (if) I can
replace EnterCriticalSection on a system like Windows. I cannot see how
it can be done without at least some API calls.

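Sure. AFAIK EnterCriticalSection is a wrapper that does a LOCK
CMPXCHG... and sets some thread information used by Task Manager,
perfmon and debugging extensions. When the section is already owned by
another thread, it spins briefly (if a spin count was set) and then
waits on a kernel object, so blocking is the one part that still needs
the OS (a Sleep or yield at minimum).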

You can emulate all the functionality yourself if you so wish.
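A minimal sketch of the non-blocking half of such an emulation, assuming C11 atomics and caller-supplied nonzero thread IDs; my_crit_sect, my_try_enter and my_leave are hypothetical names, not Windows APIs. It mimics the owner and recursion bookkeeping of a critical section, but deliberately leaves out the blocking path: when another thread owns the section, the caller still has to spin, yield or wait on an OS object.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-level critical section: owner 0 means "free".
   Thread IDs are assumed to be nonzero and supplied by the caller. */
typedef struct {
    atomic_uint owner;
    unsigned recursion;   /* only touched by the owning thread */
} my_crit_sect;

static bool my_try_enter(my_crit_sect *cs, unsigned tid)
{
    unsigned expected = 0;
    /* Typically LOCK CMPXCHG on x86: claim the section if it is free. */
    if (atomic_compare_exchange_strong(&cs->owner, &expected, tid)) {
        cs->recursion = 1;
        return true;
    }
    /* On failure 'expected' holds the current owner. If that is us,
       this is a recursive entry, as EnterCriticalSection allows. */
    if (expected == tid) {
        cs->recursion++;
        return true;
    }
    return false;         /* owned by another thread */
}

static void my_leave(my_crit_sect *cs)
{
    /* Release only when every successful entry has been matched. */
    if (--cs->recursion == 0)
        atomic_store_explicit(&cs->owner, 0u, memory_order_release);
}
```

A thread that already owns the section can re-enter it, and must call my_leave once per successful entry.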

When I asked the reason for the slowness of xchg (LOCK), I wanted to
know what exactly causes it. For instance, does a lock make a lot of
the instructions after it slow as well? In a predictable or
unpredictable way?

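On older CPUs, LOCK simply locks the address/data buses, so any other
CPU or external device that wants to access memory is blocked until the
LOCK is released. On P6-family and later CPUs, if the operand is in
cacheable (write-back) memory and fits within one cache line, the CPU
may just lock that cache line instead of the bus. Either way, loads and
stores can't be reordered across a locked instruction, so it has to wait
for earlier memory operations to finish; the cost is mostly a stall at
that point rather than a slowdown of the code that follows.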

In addition, can anyone tell me the practical difference between
serializing two threads and two CPUs? I mean, is there a great deal of
(practical) difference between running two threads on a single CPU
versus on several?

Not really. On 2 physical CPUs the separate threads run at the same
physical time, but on a single CPU (single core) the separate threads
take turns running, and a thread can be switched out between any two
instructions... You still need to implement some method for the threads
to play nice with each other.

The reason I ask, and the reason I want a replacement of the API, is that
in my small experience with threads, once you take that step, the sources
of errors are chaotically unpredictable.

Only if your design hasn't been thought out well, and you haven't taken
into account that different threads can be running at the same time.
Most of the errors I've seen relate to either race-conditions,
dead-locks or the programmer hasn't locked some shared memory.
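To make the "hasn't locked some shared memory" case concrete, here is a single-threaded C simulation (a sketch, not real threads) of two threads each doing counter = counter + 1, with the load, add and store steps interleaved by hand in an unlucky order:

```c
/* Simulates two threads each executing "counter = counter + 1" as three
   separate steps (load, add, store). No real threads are used; the
   unlucky interleaving is spelled out by hand. */
static int lost_update_demo(void)
{
    int counter = 0;

    int reg_a = counter;   /* thread A loads 0                      */
    int reg_b = counter;   /* thread B loads 0 before A stores      */
    reg_a += 1;            /* A computes 1                          */
    reg_b += 1;            /* B computes 1                          */
    counter = reg_a;       /* A stores 1                            */
    counter = reg_b;       /* B stores 1, overwriting A's update    */

    return counter;        /* 1, not 2: one increment was lost      */
}
```

The fix is either a single locked read-modify-write (LOCK XADD, or atomic_fetch_add in C11) or holding a lock across the whole load/add/store sequence.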

An application might run well for weeks, but only on one machine. Some
error may exist in it that you are unable to reproduce, or even
discover, because of complexity and timing issues that produce
different outcomes. The few apps I have that use threads act incredibly
differently from machine to machine and OS to OS. Sometimes they work
smoothly and run like butter, while other times they start literally
lagging, and some of these lags result in crashes and some do not. In
fact this also happens in other apps, including professional apps, and
seems especially problematic when the OS is loaded with many apps, or
the apps with lots of data and/or many threads. Therefore, I think it
would be vitally important to be able to control this locking mechanism
ourselves.

The problem is that multi-threaded programming is hard... but not
impossible.

I haven't been thinking much about it, and I have not yet done any serious
test of replacing the OS functions. But if anyone else has done this, I
am interested in knowing about it.

Sometimes I wonder if it's at all possible to prove the correctness of a
multithreaded app?

Yes it is. Simply treat each thread as a separate application, and
ensure that any access to shared memory has the appropriate locks, and
that race-conditions and dead-lock issues are accounted for (dead-locks
can be very difficult to eliminate completely depending on your
application).
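One common way to account for dead-locks (a standard technique, not something from the thread) is to make every thread acquire locks in the same global order. A minimal C11 sketch, assuming a hypothetical busy-wait spinlock type and ordering the two locks by address; spin, lock_pair and unlock_pair are made-up names:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical spinlock: 0 = free, 1 = held. */
typedef struct { atomic_int held; } spin;

static void spin_lock(spin *s)
{
    while (atomic_exchange(&s->held, 1) != 0)
        ;                                  /* busy-wait until free */
}

static void spin_unlock(spin *s)
{
    atomic_store_explicit(&s->held, 0, memory_order_release);
}

/* Always take the two locks in the same global order (lower address
   first), so two threads locking the same pair can never each hold one
   lock while waiting for the other. */
static void lock_pair(spin *a, spin *b)
{
    if ((uintptr_t)a > (uintptr_t)b) { spin *t = a; a = b; b = t; }
    spin_lock(a);
    if (b != a)
        spin_lock(b);
}

static void unlock_pair(spin *a, spin *b)
{
    spin_unlock(a);
    if (b != a)
        spin_unlock(b);
}
```

Two threads calling lock_pair(&x, &y) and lock_pair(&y, &x) both lock the lower-addressed lock first, so the classic "each holds one, waits for the other" cycle cannot form.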

Maybe I am lazy for asking, but I have read a bit about threads on the
net, and whereas the general guidelines seem easy, implementing them in
a useful way seems very hard to me. It's annoying that something that
appears to run fine on one machine looks like the utter *** on another,
refuses to run on a third, crashes after some time on the fourth and
deadlocks on the fifth... while the code seems correct on the tenth
read...... :(

If you want to read more about multi-threaded programming, then I
suggest you find a copy of Andrew Tanenbaum's "Operating Systems:
Design and Implementation, Third Edition" which goes into detail
regarding SMP programming at the OS level. Much of the theory also
applies to any multi-threaded programming. (2nd or even 1st editions
are fine as well). The book explains the various locking issues,
provides suitable work-arounds and may even explain why Windows does
things the way it does. (The book is heavily focused on Minix and OS
design, but many of the concepts apply to normal applications
themselves. Well, an OS is just an application itself).

Now some code...

Scenario: You have a shared buffer that 2 or more threads have access
to. You set a single dword to indicate which thread has access to the
shared memory... The dword is zero if the buffer is not locked.

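lockMemBuffer:
;; expects ebx to be the (nonzero) thread ID
;; expects edi to be the memory location of the lock dword
;; on return: CF set and eax = ebx if we got the lock,
;;            CF clear and eax = ID of the owning thread if not
;; only eax and the flags are modified
xor eax, eax ;; expected value: 0 means unlocked
lock cmpxchg dword [ds:edi], ebx ;; if [edi] == eax: [edi] = ebx, ZF=1
;; else: eax = [edi], ZF=0
jnz .cantgetlock
mov eax, ebx ;; indicate we have a lock on memory
stc
ret
.cantgetlock:
;; eax has the thread that has the lock
clc
ret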

Now you can call the above procedure from any thread: just set edi to
point to the location of your lock, and ebx to the ID of the thread
that wants the lock. To release the lock, simply write 0 (zero) to the
dword location. Carry is set if you got the lock and clear if you
didn't, in which case eax contains the thread ID of the one that does.
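The same protocol in C11, as a hedged sketch: atomic_compare_exchange_strong does what the lock cmpxchg in lockMemBuffer does (x86 compilers typically emit LOCK CMPXCHG for it), and a release store plays the part of writing 0 to the dword. The names lock_mem_buffer and unlock_mem_buffer, and the use of an atomic_uint (32 bits on common x86 compilers) for the dword, are assumptions for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* C11 counterpart of lockMemBuffer: the lock word is 0 when free,
   otherwise it holds the owner's thread ID (assumed nonzero).
   Returns true (like CF set) if we got the lock; *owner always receives
   the ID of the thread holding the lock afterwards (like eax). */
static bool lock_mem_buffer(atomic_uint *lock, unsigned tid, unsigned *owner)
{
    unsigned expected = 0;
    /* If *lock == 0, store tid; otherwise load the current owner. */
    if (atomic_compare_exchange_strong(lock, &expected, tid)) {
        *owner = tid;          /* like "mov eax, ebx" followed by stc */
        return true;
    }
    *owner = expected;         /* like the .cantgetlock path */
    return false;
}

/* Release: "write 0 to the dword". A release store is typically a
   plain MOV on x86. */
static void unlock_mem_buffer(atomic_uint *lock)
{
    atomic_store_explicit(lock, 0u, memory_order_release);
}
```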

So if you can get a lock on the shared memory, you can continue
processing, and if not you can either process something else, or keep
trying to obtain the lock, either by sleeping for a short while or by
just retrying straight away... e.g.:

@@:
mov ebx, 01
lea edi, [memory_location_for_lock]
call lockMemBuffer
jc @f ;; got the lock: keep processing
mov eax, 05
call sleep ;; sleep briefly (hypothetical routine, delay in eax)
;; Instead of sleep you could call yield() to
;; let the cpu do other stuff
;; and return back here after some
;; time
jmp @b ;; try again
@@:
...
<do what we need to do with the shared memory>
...
mov dword [memory_location_for_lock], 0
;; release the lock
...
<keep going>

PS. The above code is very sub-optimal... but hey, I'll leave it up to
you to improve it and mold it to your needs.
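One possible improvement, sketched in C11 under the same assumptions as the lock dword described above (0 = free, nonzero thread IDs): a bounded retry loop that yields between attempts and only tries the locked CMPXCHG when a plain read shows the lock is free (the usual "test and test-and-set" refinement). acquire_with_yield is a hypothetical name; sched_yield() is POSIX, and on Windows SwitchToThread() or Sleep(0) would play that role.

```c
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Bounded version of the @@ / jmp @b retry loop. Returns true once the
   lock is held, false after max_tries failed attempts, so the caller can
   go and process something else. Thread IDs are assumed nonzero. */
static bool acquire_with_yield(atomic_uint *lock, unsigned tid, int max_tries)
{
    for (int i = 0; i < max_tries; i++) {
        unsigned expected = 0;
        /* Cheap unlocked read first: only attempt the locked CMPXCHG when
           the word looks free, which cuts down on cache-line traffic
           between CPUs while the lock is held by someone else. */
        if (atomic_load_explicit(lock, memory_order_relaxed) == 0 &&
            atomic_compare_exchange_strong(lock, &expected, tid))
            return true;
        sched_yield();   /* give up the rest of this time slice */
    }
    return false;
}
```

Returning false after max_tries lets the caller do other work instead of spinning forever.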

--
Darran (aka Chewy509) brought to you by Google Groups!
