Re: Question about Instruction Format (ModR/M)




RM vs. PM don't make a difference on maximum instruction length... the
following can be executed in real mode, and it's 14 bytes long

add dword cs:[eax+ebx*2+04030201h],08070605h

this is legal in any 16-bit code segment in any mode (RM/PM/VM) on 386+
machines... it would be coded as:

67 66 2E 81 84 58 01 02 03 04 05 06 07 08

(the first three bytes, being prefixes, could be in any order)
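For anyone who wants to check the byte count, here is a rough Python sketch that rebuilds the encoding above field by field. The helper names `modrm` and `sib` are made up for illustration; the field values are read straight off the instruction and the byte string given above.

```python
import struct

def modrm(mod, reg, rm):
    """Pack the three ModR/M fields (2, 3 and 3 bits) into one byte."""
    return (mod << 6) | (reg << 3) | rm

def sib(scale_log2, index, base):
    """Pack the three SIB fields; the scale is stored as log2 of the multiplier."""
    return (scale_log2 << 6) | (index << 3) | base

EAX, EBX = 0, 3  # register numbers as used in the ModR/M and SIB fields

encoding = bytes([
    0x67,                   # address-size override (32-bit addressing)
    0x66,                   # operand-size override (32-bit operand)
    0x2E,                   # CS segment override
    0x81,                   # group 1 opcode: r/m, full-size immediate
    modrm(0b10, 0, 0b100),  # mod=10 (disp32), reg=/0 (ADD), rm=100 (SIB follows)
    sib(1, EBX, EAX),       # scale=2 (stored as 1), index=EBX, base=EAX
]) + struct.pack("<I", 0x04030201) + struct.pack("<I", 0x08070605)

print(encoding.hex(" ").upper())
print(len(encoding))
```

The two `struct.pack("<I", ...)` calls are the little-endian disp32 and imm32, which is where 8 of the 14 bytes go.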


Although it executes in RM on 386+ as an extension, technically it's not RM
code. This is a 32-bit PM instruction executing in 16-bit RM/PM due to the
overrides. It uses both a segment and size override. And, it uses two
32-bit offsets. Now, if it can be done without those two prefixes and using
16-bit offsets...


you're confusing the processor mode (real vs protected) with the instruction
encoding... the ONLY thing that affects this instruction's decode is the
operand size and address size of the code segment when it gets decoded by
the processor... just because it uses the 386+ 32-bit addressing scheme and
32-bit operands by giving the Osize and Asize prefixes while executing in a
16-bit code segment does not make it a "protected mode" instruction.... to
say an instruction is a "protected mode" instruction is saying that it
cannot be valid outside of protected mode,... the closest example that comes
to mind would be instructions similar to Intel's VMCALL... valid ONLY within
VMX operation, thus making that a VMX instruction...
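To make the point concrete, here is a minimal sketch of how the effective sizes fall out of the code segment default plus the 66/67 prefixes. The function name and argument layout are hypothetical, and 64-bit mode is deliberately left out. Note that the processor mode is not an input at all.

```python
def effective_sizes(segment_default_bits, prefixes):
    """Return (operand_bits, address_bits) for one instruction.

    segment_default_bits: 16 or 32, the default size of the code segment.
    prefixes: the prefix byte values seen before the opcode.
    """
    other = {16: 32, 32: 16}
    operand = address = segment_default_bits
    if 0x66 in prefixes:   # operand-size override selects the non-default size
        operand = other[segment_default_bits]
    if 0x67 in prefixes:   # address-size override selects the non-default size
        address = other[segment_default_bits]
    return operand, address

# The 14-byte ADD from earlier, decoded in a 16-bit code segment:
print(effective_sizes(16, [0x67, 0x66, 0x2E]))
```

Using `in` rather than counting occurrences means a repeated 66 or 67 does not flip the size back.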


I agree that the specific processor makes a difference though... I can't
think of a situation where normally you'd be able to get beyond 10 bytes on
a 286, unless there were extra/unnecessary/redundant prefix bytes. (with my
mind currently in 32-bit mode atm, I'm having problems even thinking of a
situation where you'd even reach 10 bytes on a 286 w/o redundant prefixes)

...


closest I can find would be something like this:

lock
cs:
cs:
cs:
add word [0201h],0403h

encodes as 10 bytes:
F0 2E 2E 2E 81 06 01 02 03 04

Perfectly valid instruction... one more cs: would push it over that 10-byte
limit though
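A quick way to sanity-check that is to count bytes against a per-CPU limit. This is only a sketch: the table and the `fits_limit` name are made up, the 10-byte figure for the 286 is the one used in this thread, and the 15-byte figure for 386+ is the documented maximum for those processors.

```python
# Maximum instruction length in bytes, prefixes included.
MAX_INSTRUCTION_LENGTH = {"286": 10, "386+": 15}

def fits_limit(encoding, cpu):
    """True if the encoded instruction is within the length limit for cpu."""
    return len(encoding) <= MAX_INSTRUCTION_LENGTH[cpu]

# lock cs: cs: cs: add word [0201h],0403h
ten_bytes = bytes.fromhex("F0 2E 2E 2E 81 06 01 02 03 04")
# the same instruction with one more redundant cs: prefix
eleven_bytes = bytes([0x2E]) + ten_bytes

print(fits_limit(ten_bytes, "286"))
print(fits_limit(eleven_bytes, "286"))
print(fits_limit(eleven_bytes, "386+"))
```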


5) Some instructions are one instruction if not followed by certain
postfixes and another if they are.
(e.g., sgdt is 0x0f 0x90 - but if followed by 0xc1, 0xc2, 0xc3, 0xc4,
is vmcall (0x0f 0x90 0xc1), vmlaunch, vmresume, vmxoff respectively, many
more...)

0F 90 is SETO, not SGDT.. you mean 0F 01 followed by C1, C2, C3, or C4


Yes, 0f 91. (Can't read through all the commented C code...)


01... you must've still been tired... or your keyboard doesn't like 0's too
well


by the way, the reason for this is because the SGDT instruction doesn't make
sense when used w/ a mod r/m byte of C0 or greater... it has to point to
memory, not a register... so these "unused" encodings are treated specially
by the CPU's decoder... not hard to handle in a disassembler.... if you
determine your mnemonic is SGDT, then before you actually commit that, check
for the high two bits of the mod-r/m byte to see if they are both set... if
so, your mnemonic is NOT SGDT but one of the VMxxxx's mentioned above
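As a rough illustration of that check, here is a small Python sketch. The function name is hypothetical, and it only models the /0 slot of 0F 01 plus the four VMX encodings named in this thread; anything else comes back as None rather than being guessed at.

```python
VMX_BY_MODRM = {
    0xC1: "VMCALL",
    0xC2: "VMLAUNCH",
    0xC3: "VMRESUME",
    0xC4: "VMXOFF",
}

def mnemonic_0f01_slash0(modrm):
    """Pick the mnemonic for 0F 01 given the ModR/M byte that follows."""
    mod = modrm >> 6          # high two bits
    reg = (modrm >> 3) & 7    # opcode extension (/digit)
    if reg != 0:
        return None           # not the SGDT slot; not modelled here
    if mod != 0b11:
        return "SGDT"         # memory operand: the ordinary instruction
    return VMX_BY_MODRM.get(modrm)  # register form: one of the VMxxxx's, if known

print(mnemonic_0f01_slash0(0x06))  # mod=00, rm=110: direct 16-bit address
print(mnemonic_0f01_slash0(0xC1))
```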


...


tell me if you need a better explanation?


not hard to handle in a disassembler...

Depends on the design. "Postfix" is a problem for table driven
disassemblers. Essentially, they have to assign one extra potentially
unused byte to the instruction format in the table. Then decode or fall
back to a different encoding - with one less byte. Of course, if it falls
back, they have to push the extra byte back onto the disassembly stream.
Then, when they attempt to reevaluate the stream, they need to use a
different table with a different instruction and instruction format. This
"fallback" format is radically different from most non-obsolete
instructions. It creates a decode which may generate a (temporary) invalid
state, e.g., 1) valid or 2) invalid, adjust, valid. The f2,f3,66 override
prefixes on the SSE FP instructions are similar, but because they are
prefixes the decode table can be switched prior to decoding the opcode.
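The prefix case can be sketched like this: the prefix is seen first, so it picks the table before the opcode is looked up, and nothing has to be pushed back. The table layout and function name are assumptions for illustration, only two opcodes are filled in, and at most one selecting prefix is handled.

```python
# One opcode table per selecting prefix (None = no prefix), for opcodes after 0F.
SSE_TABLES = {
    None: {0x58: "ADDPS", 0x59: "MULPS"},
    0x66: {0x58: "ADDPD", 0x59: "MULPD"},
    0xF3: {0x58: "ADDSS", 0x59: "MULSS"},
    0xF2: {0x58: "ADDSD", 0x59: "MULSD"},
}

def decode_sse(stream):
    """Return the mnemonic for [prefix] 0F opcode ..., or None if not in the tables."""
    pos = 0
    selector = None
    if stream[pos] in (0x66, 0xF2, 0xF3):
        selector = stream[pos]    # table is chosen here, before the opcode
        pos += 1
    if stream[pos] != 0x0F:
        raise ValueError("expected the 0F escape byte")
    return SSE_TABLES[selector].get(stream[pos + 1])

print(decode_sse(bytes.fromhex("0F 58 C1")))
print(decode_sse(bytes.fromhex("F3 0F 58 C1")))
```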



0F... lookup in table... says proceed to 2nd-byte table lookup
01... lookup in table... says SGDT if mod bits <> 11, else VMX Table lookup
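Spelled out as a table walk, that might look like the sketch below. The `ModRMSplit` class, the table names and the `lookup` function are all hypothetical, and the tables hold only the few opcodes mentioned in this thread. The ModR/M byte is peeked at, never consumed, so no byte has to go back onto the stream.

```python
class ModRMSplit:
    """Table entry meaning: memory form is one mnemonic, register forms use a sub-table."""
    def __init__(self, memory_form, register_forms):
        self.memory_form = memory_form
        self.register_forms = register_forms

VMX_TABLE = {0xC1: "VMCALL", 0xC2: "VMLAUNCH", 0xC3: "VMRESUME", 0xC4: "VMXOFF"}
TWO_BYTE_TABLE = {0x01: ModRMSplit("SGDT", VMX_TABLE), 0x90: "SETO"}
ONE_BYTE_TABLE = {0x0E: "PUSH CS", 0x0F: TWO_BYTE_TABLE}

def lookup(stream):
    """Walk the tables; return a mnemonic, or None if the tables don't cover it."""
    entry = ONE_BYTE_TABLE
    pos = 0
    while isinstance(entry, dict):     # "proceed to next table lookup"
        entry = entry.get(stream[pos])
        pos += 1
    if isinstance(entry, ModRMSplit):
        modrm = stream[pos]            # peek only, pos is not advanced
        if modrm >> 6 == 0b11:
            return entry.register_forms.get(modrm)
        if (modrm >> 3) & 7 == 0:      # only the /0 slot is modelled
            return entry.memory_form
        return None
    return entry

print(lookup(bytes.fromhex("0F 01 06 00 10")))
print(lookup(bytes.fromhex("0F 01 C1")))
```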


unless Intel/AMD/other CPU leaders can come up with a new processor family,
situations like this will just keep getting worse... they keep trying to
cram as much as they can into the "unused" portions of the opcode
mappings...

I said basically the same thing to an anonymous poster elsewhere. And, he
said that wasn't the case. He claimed 1) that it'd break x86 compatibility
and 2) he worked on the development of an x86 chip. At the time, I took it
to mean "now that 64-bit is here, 32-bit is dead..." I didn't bring up the
fact that there were many past situations where this had already occurred.


well Intel TRIED making a 64-bit processor from scratch... was it the
Itanium.. can't remember.. all I know is that I looked at the programming
manuals for a few minutes and deleted them... it was such a nightmare....
the processor was built for compilers to write the code, not humans... (I
feel sorry for the human(s) that wrote the compiler)


0F would have been POP CS, not PUSH CS...
PUSH CS exists (0E)

Wow! Two valid corrections. Man, I don't recall being sleepy when I
posted. Good thing I posted a disclaimer... :)

reason there's no POP CS... you change CS via RETF/IRET/JMPF... if you were
to be allowed to randomly change CS via POP *shiver*.... popping CS but
leaving IP alone?
...

Yeah, most would view that as similar to a NULL pointer... assuming that it
jumped to some unknown. But, actually, I think it could've been useful.
Think of it as primitive task switching... You'd have to align the pop's to
the same ip for each segment and have an entry point right after the pop's
in the other segment.

Seg1              Seg2
start1:           start2:
  pop cs            pop cs        ; to other seg entryX:
entry1:           entry2:
  ...               ...
  push Seg2         push Seg1
  jmp start1:       jmp start2:


Anyone know if they did this on x86?


the theory is sound... but the nightmares...

it would make for some interesting anti-debugging code too... you'd forget
where you're at or what you were debugging the thing for....

anyway.. I know by the 8088 that POP CS was invalid... don't know about
earlier... Interestingly, though, WinXP's Debug.exe will assemble POP CS as
0F...

--
Bx.C / x87asm

