Re: further optimizations
- From: "Wolfgang Kern" <nowhere@xxxxxxxx>
- Date: Mon, 30 Jul 2007 11:15:54 +0200
Wannabee wrote:
Nonsense :) MOV doesn't need a LOCK! Only RD-modify-WR needs LOCK.
It does need lock if data is not aligned?
[<test: D$ 0 064 0 050]
mov edx test
mov al B$edx ;ok
mov al B$edx+1 ;ok
mov ax w$edx ;ok
mov ax W$edx + 1 ; not atomic need lock? (below pentium yes)
mov eax D$edx ;ok atomic
mov eax D$edx + 1 ; not atomic below P6; atomic if cached and not crossing a cache line
Is this correct? or did I misunderstand?
I don't play much with Intel CPUs,
but I think they should not differ where LOCK is concerned.
The LOCK prefix is only needed in a multiprocessor environment, and
only a few instructions allow (and may need) a LOCK
[ADC,ADD,AND,BTC,BTR,BTS,CMPXCHG,CMPXCHG8B,DEC,INC,NEG,NOT,OR,SBB,
SUB,XADD,XCHG(implied) and XOR], and only with memory operands.
All others can be assumed never to be split by another CPU's access.
Lock is automatically applied when executing an xchg against memory...
is what I read in my Intel manual.
Yes, the only inherently LOCKed instruction I know is XCHG reg,[mem].
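As an illustration of the distinction drawn above, here is a minimal C11 sketch (C rather than assembler, so it compiles anywhere). The function names are made up for this example. On x86, compilers typically translate atomic_fetch_add into a LOCK-prefixed XADD or ADD and atomic_exchange into XCHG with its implied lock, while an aligned atomic load is an ordinary MOV; that mapping is compiler behaviour, not something the C standard guarantees.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Read-modify-write on memory: the case that needs LOCK on a
   multiprocessor system. Returns the value held before the addition. */
static unsigned add_locked(atomic_uint *p, unsigned n)
{
    assert(p != NULL);
    return atomic_fetch_add(p, n);
}

/* Exchange register with memory: the case where XCHG locks by itself.
   Returns the old value. */
static unsigned swap_in(atomic_uint *p, unsigned v)
{
    assert(p != NULL);
    return atomic_exchange(p, v);
}

/* Plain aligned read: no LOCK prefix is involved. */
static unsigned read_value(atomic_uint *p)
{
    assert(p != NULL);
    return atomic_load(p);
}
```

Compiling this with optimisation and looking at the generated assembler is an easy way to check which of the three functions actually ends up with a LOCK prefix on a given compiler.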
[about ESP and IRQ...]
I haven't seen API calls for it (SetWindowsHookEx just intercepts messages),
but many hw-drivers have both user- and kernel-mode functions, and they
can replace IDT contents and often do so just for speed reasons.
Yeah? But how do they do it. Can you write up a small example?
You must have seen such code, right?
The examples I have seen look like:
MOVZX eax byte[Irq_Nr] ;got from PCI-config (the driver knows its hardware)
ADD al,[INT2IRQ_offset] ;got by reading back the PIC/APIC
SIDT qq[temp] ;stores 6 bytes (limit, then base)
MOV ecx,[temp+2] ;get base address
LEA ecx,[ecx+eax*8] ;add table offset to it
CALL ... ;some calculation of the new entry into EDX:EAX
MOV [ecx],eax ;write the new 8-byte entry to the IDT
MOV [ecx+4],edx ;(some save the old contents before)
So it depends on what this entry contains whether it acts as an interrupt gate,
a trap gate or a task gate (8E,8F,85 in the byte at offset 5).
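The "calculation for the new entry" is not shown in the snippet above. As a rough sketch of what it has to produce, here is the packing of a 32-bit protected-mode gate in C. The names make_gate, gate_address and struct idt_gate are invented for this example; the bit positions and the type values 85/8E/8F (present, DPL 0) follow the documented descriptor format, not any particular driver. For a task gate the offset bits are not used by the CPU.

```c
#include <assert.h>
#include <stdint.h>

/* Type/attribute byte for a present gate with DPL 0. */
enum {
    GATE_TASK      = 0x85,
    GATE_INTERRUPT = 0x8E,
    GATE_TRAP      = 0x8F
};

/* The two dwords of one 8-byte IDT entry, in memory order
   (low would go to EAX, high to EDX in the snippet above). */
struct idt_gate {
    uint32_t low;   /* bits 0..15 handler offset 15..0, bits 16..31 code selector */
    uint32_t high;  /* bits 8..15 type/attribute, bits 16..31 handler offset 31..16 */
};

static struct idt_gate make_gate(uint32_t handler, uint16_t selector,
                                 uint8_t type_attr)
{
    struct idt_gate g;
    assert(type_attr & 0x80);  /* present bit must be set for a usable gate */
    g.low  = ((uint32_t)selector << 16) | (handler & 0xFFFFu);
    g.high = (handler & 0xFFFF0000u) | ((uint32_t)type_attr << 8);
    return g;
}

/* Address of the entry for a vector: IDT base plus vector * 8,
   the same arithmetic as LEA ecx,[ecx+eax*8]. */
static uint32_t gate_address(uint32_t idt_base, uint8_t vector)
{
    return idt_base + (uint32_t)vector * 8u;
}
```

For example, make_gate(0x12345678, 0x0008, GATE_INTERRUPT) gives low = 00085678 and high = 12348E00, so the 8E lands in the byte at offset 5 of the entry.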
And if the RPL field is 3 (and it is not a task gate), then the IRQ will
interrupt a user application without swapping SS:ESP and will of course
use the user's stack to push EFL,CS,EIP and pop them again with IRET.
The user's stack (from the top down to and including ESP) is not affected by this,
but everything below ESP is never usable anyway...
And that's why a PROC usually starts with
push ebp ;OK, these first two could be omitted :)
mov ebp, esp ;
sub esp, xxx ;"create a stack frame for locals"
Everything above, and including, the location ESP points to is now safe from
being altered by IRQs, but not so the things below it.
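To make that concrete, here is a toy model in C of a downward-growing stack that takes a same-privilege interrupt. Nothing here touches real hardware; toy_stack, push, pop and interrupt_and_return are names invented for the sketch, and esp is just an array index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_DWORDS 16

/* Toy stack: higher index = higher address, esp indexes the top dword. */
struct toy_stack {
    uint32_t mem[STACK_DWORDS];
    size_t   esp;
};

static void push(struct toy_stack *s, uint32_t v)
{
    assert(s->esp > 0);              /* no room left below ESP */
    s->mem[--s->esp] = v;
}

static uint32_t pop(struct toy_stack *s)
{
    assert(s->esp < STACK_DWORDS);   /* nothing left to pop */
    return s->mem[s->esp++];
}

/* Same-privilege interrupt: EFLAGS, CS, EIP are pushed on the stack
   that is active right now, and IRET pops them again. */
static void interrupt_and_return(struct toy_stack *s,
                                 uint32_t eflags, uint32_t cs, uint32_t eip)
{
    push(s, eflags);
    push(s, cs);
    push(s, eip);
    /* ... the handler body would run here ... */
    (void)pop(s);   /* EIP */
    (void)pop(s);   /* CS */
    (void)pop(s);   /* EFLAGS */
}
```

After the call ESP is back where it was and everything at or above it is unchanged, but the three dwords just below ESP now hold the interrupt's leftovers. Data parked there without moving ESP first would have been overwritten.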
I still don't understand (when considering the stack).
It had reprogrammed what you call the IDT (interrupt table?) and after it
terminated, the interrupt would call no longer existing memory?
But what has that to do with esp "abusing"?
I haven't seen where windoze/linux/any OS states that user
stacks may never be used for INT/IRQ.
In this case it would be very wise to disallow IRQs during
ESP-'abusing'.
Otherwise you easily encounter one of these forced crash situations. :)
Can you please explain in more detail? Are you saying a system level irq
can be hooked by an application, so that the application then uses the
stack of the system??
No, I mean the opposite may occur: if a hardware driver installs its IRQ
routine, which runs at PL=3, then the user stack becomes part of the story.
Part of what story? I cannot see how programs can interact like that
without using "separate" stacks (making sure the context is correct). Either
there is one stack, in the one interrupt routine, that is not seen and
used elsewhere, or there are two or more stacks? I mean contexts.
Regardless of whether the interrupt routine has its own stack,
an IRQ needs to push/pop EFL,CS,EIP to an already existing active stack
(and SS:ESP if CPL<>RPL).
I try really hard to understand what you are saying, but I cannot for the
life of me fit it together with anything. It's almost like you have devised
your own art of sounding completely incomprehensible. :)
Sorry for that :) it is a confusing story anyway.
Can you please try again? It must be rather a small job to provide an
example showing what you mean. I like very much to see the driver code
that installs and replaces such an irq,
as shown above.
and how it manages the stacks,
IF CPL<>RPL (current differs from requested privilege level) then
an IRQ pushes five parameters; ELSE only three.
It's a long story. You can read the full IRET behaviour in AMD's 24594.pdf,
pages 305..310; they explain exactly the reverse of an IRQ action.
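The two frame shapes can be written down as C structs, lowest address first, which is the order IRET pops them in. This is only a sketch of the 32-bit protected-mode layout without an error code; the struct and function names are made up here, and frame_dwords simply mirrors the "levels differ" rule as stated above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Frame when the privilege level does not change: three dwords
   on the stack that was active when the IRQ arrived. */
struct frame_same_level {
    uint32_t eip;
    uint32_t cs;
    uint32_t eflags;
};

/* Frame when the privilege level changes: five dwords. The last two
   record the interrupted stack so IRET can switch back to it. */
struct frame_level_change {
    uint32_t eip;
    uint32_t cs;
    uint32_t eflags;
    uint32_t esp;   /* interrupted stack pointer */
    uint32_t ss;    /* interrupted stack segment */
};

/* Number of dwords an IRQ pushes, given the current level and the
   level the handler runs at (0..3 each). */
static size_t frame_dwords(unsigned cpl, unsigned handler_pl)
{
    assert(cpl <= 3 && handler_pl <= 3);
    return (cpl != handler_pl) ? 5 : 3;
}
```

So a PL=3 application interrupted by a PL=3 handler gets 12 bytes written just below its own ESP, while a PL=0 handler gets a 20-byte frame that records the application's SS:ESP.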
and why and how an interrupt can occur and why and how the occuring
interrupt could ever care about the previous/current content of esp,
at the time of the interrupt.
Because any hardware may raise an IRQ pin at any time for whatever reason,
and the CPU needs a valid stack (it uses the currently active one) to work at all.
You can try to watch the memory contents below your ESP and judge for
yourself whether the patterns there are from your app or from elsewhere.
__
wolfgang