Re: Paging/Segmentation: how are they really implemented



Hi

I have read about paging, segmentation and paged segmentation and I
believe I have (nearly) understood how these techniques are implemented
in hardware. However, I am still confused about some details, and I
would highly appreciate your assistance with them.

1- When using pure paging and for a page size equal to 4KB=2^12, each
page should be located at a 4KB-aligned offset in the main memory. There is
no similar restriction with segmentation since segments don't have a
pre-set size. What about paged segmentation? Should segments be located
at 2^12 boundaries since each segment now is a set of pages [assume
page size is equal to 2^12]?
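Don't mix all this together. Segmentation turns a selector:offset pair
into a linear address; paging then maps that linear address onto 4K
page frames in memory (and is what lets the OS push pages out to disk).
Yes, the 4K boundary still exists for pages whenever we are paging,
regardless of segment size. On x86 the segment base itself is a
byte-granular linear address and does not have to sit on a 4K boundary.
Stick with me; hopefully this will be clearer after the answer to your
question 3. You're getting into _virtual_ addressing and page faults
here.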

2- How can we as users choose any of the above techniques, is there any
register to set by the compiler (or the linker, loader)?

Taking 'as a user' to mean an assembly language
*application* programmer: you don't want to deal with
these at all, that's operating system business. As an
*operating systems* writer, do what you want, unless you are
writing drivers and such for an existing system; then
the system determines what *must* happen. The switches
are in control registers within the CPU (see below).
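
For what it's worth, on x86 the two main switches are bits in control register CR0: PE (bit 0) turns on protected mode and PG (bit 31) turns on paging. Only ring 0 code can read or write CR0, so the sketch below just decodes a CR0 value handed to it; describe_cr0 is a made-up name.

```python
CR0_PE = 1 << 0     # Protection Enable: 1 = protected mode
CR0_PG = 1 << 31    # Paging: 1 = paging on (only legal when PE is also 1)

def describe_cr0(cr0):
    """Describe the mode implied by the PE and PG bits of a CR0 value."""
    if not cr0 & CR0_PE:
        # The CPU refuses to set PG while PE is clear, so PG is ignored here.
        return "real mode"
    if cr0 & CR0_PG:
        return "protected mode, paging on"
    return "protected mode, paging off"

print(describe_cr0(0x80000011))   # protected mode, paging on
```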

3- From what I read segmentation requires the use of the assembly
indirect addressing, while each address contains two fields [register
segment: offset]. If we are in real mode, register segment content
[after probably right shift] determines the starting address of the
segment in the main memory and the offset field represents the offset
within the segment in the physical memory. However, if we are using
protected mode, the content of the register segment points to a segment
table which includes the starting address of the segment. Am I right?
This has to be broken down into parts to be understood; as
it is, it is way too broad. I don't like the terms 'assembly
indirect addressing' and 'register segment: offset'. How
about just saying indirect addressing or segment:offset format?

1 > Real mode (8086 emulation) has 20 bit addresses consisting
of a 16 bit segment that is shifted left 4 bits (multiplied by 16)
then added to a 16 bit offset. Thus there are up to 4096
segment:offset pairs that describe the same location in memory
(confusing enough for ya?). In real mode a segment of memory can
start on any 16 byte boundary (memory is limited to 1 meg).
Q.E.D. Now forget all about real mode like a good programmer :^)
and never discuss this again. Nor talk about selectors (we will)
or anything else from the 70's.
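
The real mode arithmetic is short enough to sketch in Python. real_mode_address is a made-up helper name, and the wrap to 20 bits assumes an original 8086 (later CPUs with the A20 line enabled do not wrap).

```python
def real_mode_address(segment, offset):
    """8086 physical address: segment * 16 + offset, wrapped to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF

# Two different segment:offset pairs that name the same byte.
print(hex(real_mode_address(0x1234, 0x0005)))   # 0x12345
print(hex(real_mode_address(0x1000, 0x2345)))   # 0x12345
```

Any pair whose segment*16 + offset comes out the same is an alias for the same location, which is where the many-to-one confusion comes from.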

2 > Protected mode can be run two ways: flat (linear) and segmented.
Segmented protected mode has very little to do with real
mode segmentation. In this mode, in XXXX:offset the XXXX is NOT
the segment ...repeat, not the segment; it is the segment
selector. The segment selector points to the segment descriptor
in the table, so to find an address like 20h:00100h do this:
1- 20h = 100000 binary. Bits 0-1 are the requested privilege
   level (RPL) and bit 2 is the table indicator (TI).
2- TI=0 says look in the GDT (if 1, look in the LDT).
3- Shift the selector right 3 bits; the index is now 4h.
4- Go to the segment descriptor table and get the entry at
   index 4 (byte offset 20h; entry 0 of the GDT is the null
   descriptor). It is 64 bits and looks like this:

   bits    field
   0-15    limit bits 0-15
   16-39   segment (base) bits 0-23
   40-47   access byte, high to low: P, DPL (2 bits), S, type (4 bits)
   48-51   limit bits 16-19
   52-55   flags, low to high: AVL, L (0 in 32 bit mode), D, G
   56-63   segment (base) bits 24-31

5- Assemble the 20 limit bits. If the G bit is set the limit is in
   4K units: multiply by 4096 and add 4095.
6- If the offset is > this limit, error with a GPF.
7- Assemble the 32 segment (base) bits. This is a byte address;
   it is not multiplied by anything.
8- If P=0 the segment is not in memory; the CPU raises a
   segment-not-present fault and the OS has to get it back.
9- Add the base to the offset to get the linear address. With
   paging off that is the actual address; with paging on it
   goes through the page tables next.
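
The steps above can be sketched in Python as follows. This is a simplified model under stated assumptions: descriptors are held as 64 bit integers in a plain list standing in for the GDT, privilege checks and expand-down segments are ignored, and all function names (decode_selector, make_descriptor, decode_descriptor, translate) are made up for illustration.

```python
def decode_selector(selector):
    """Split a 16 bit selector into (index, table indicator, RPL)."""
    return selector >> 3, (selector >> 2) & 1, selector & 3

def make_descriptor(base, limit, access=0x92, flags=0x4):
    """Pack a descriptor. Defaults: present, ring 0, read/write data, 32 bit."""
    return ((limit & 0xFFFF)
            | ((base & 0xFFFFFF) << 16)
            | (access << 40)
            | (((limit >> 16) & 0xF) << 48)
            | (flags << 52)
            | (((base >> 24) & 0xFF) << 56))

def decode_descriptor(desc):
    """Return (base, limit in bytes, present) from a 64 bit descriptor."""
    limit = (desc & 0xFFFF) | (((desc >> 48) & 0xF) << 16)
    base = ((desc >> 16) & 0xFFFFFF) | (((desc >> 56) & 0xFF) << 24)
    present = bool((desc >> 47) & 1)      # P bit of the access byte
    if (desc >> 55) & 1:                  # G bit: limit counts 4K units
        limit = (limit << 12) | 0xFFF
    return base, limit, present

def translate(selector, offset, gdt, ldt=()):
    """Selector:offset to linear address (no privilege checks)."""
    index, ti, _rpl = decode_selector(selector)
    table = ldt if ti else gdt
    base, limit, present = decode_descriptor(table[index])
    if not present:
        raise LookupError("segment not present")
    if offset > limit:
        raise ValueError("offset beyond segment limit (GPF)")
    return (base + offset) & 0xFFFFFFFF

# Hypothetical GDT: entry 4 is a 64K data segment based at 0x00400000.
gdt = [0] * 5
gdt[4] = make_descriptor(base=0x00400000, limit=0xFFFF)
print(hex(translate(0x20, 0x100, gdt)))   # 0x400100
```

The familiar flat 4 GB descriptor 0x00CF92000000FFFF decodes under the same function to base 0 and limit 0xFFFFFFFF, which is the flat (linear) case.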

4- Now regarding protected mode/real mode. Are they part of the CPU
modes which defines the execution mode? How are they related to user
mode and kernel mode?
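Yes, but apples and oranges. Real mode and protected mode are
operating modes of the CPU. User mode and kernel mode are OS
concepts; in protected mode the OS builds them on top of the CPU's
privilege levels (typically ring 0 for the kernel, ring 3 for users).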

5- Is the segment table part of the CPU architecture?
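Yes. The architecture goes beyond the physical chip (in AMD x86-64
too): it defines the descriptor format and the registers (GDTR, LDTR)
that locate the tables, while the tables themselves sit in memory and
are filled in by software. This is a tricky
question; I certainly hope it is not phrased like this on
a test.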

6- Who sets the values of the segment registers and the segment table? I
presume the kernel. Does the kernel decide on behalf of the user which
INDEX VALUE the code/stack/data/ etc segment is given and set the
content of the segment table entries [including in particular the
starting address of the segment in the main memory and its size]
accordingly?
I presume the segment table content should be saved/updated each time a
segment is relocated?
Please say selector and descriptor table.
Yes, the kernel must do this. MS-DOS uses real mode, Windows
3.x ran DOS programs in virtual-8086 mode (we didn't cover this), Linux uses
protected mode. Only the kernel knows what mode it needs.
In DOS, however, a programmer can cheat and change modes;
that is how DOS extenders and memory managers (like emm386) work.
You wouldn't want a *NIX
user to be able to change to real mode and perform a cli / hlt
instruction sequence, would you?
In a flat protected mode setup the OS gives CS, DS, ES and SS a base
of zero, so the selector tells you nothing about the true base
address. In 64 bit long mode the CPU itself treats those bases as
zero; only FS & GS keep a usable base.
Yes, the descriptor table is updated by the OS when a segment
is moved or swapped out.

7- Sometimes I encounter while using my computer, system messages [like
protection errors] showing an address similar to 0FFF:XXXX, which
indicates a very high number of segments in an application, a highly
unlikely situation as the number of segments in an application tends to
be moderately small. So what does 0FFF stand for and why is it so high?

Must be Windows (sigh). You mean a GPF. Don't mix the term memory
segment with code segment (as in CS). Although a program most
likely has a single memory segment (in a 32 bit OS) that includes
DS, SS and all, Windows reports the 8086 address in the GPF (Windows
is brain dead in lots of places). Remember, in 8086 the segment can
be anywhere on a 16 byte boundary in the 1st meg (65,536 places),
so that address could be showing only one segment loaded. Either way
the number in front of the colon says where a segment is (or, in
protected mode, which descriptor was used), not how many segments
there are. Understand?
Unix OS's won't do this to you.

8- Actually, how does the CPU differentiate between the next 3 instructions
[in protected mode]:
Mov 3, CS:XXX
Mov 3, DS:XXX
Mov 3, SS:XXX

Will each of the above instructions be translated to a different
numerical code depending on the type of segment?

If this is the case, why not skip this "unnecessary" step by binding
permanently each segment register to a fixed entry in the segment
table? Maybe this has not been done because an application might have
a larger number of segments than the available CPU segment registers.
As such a particular register will be used for more than one segment
and its content should be used to index the segment table. Am I right?
These are not valid instructions as written, and in protected mode
a write through CS would fault anyway, since a code segment is never
writable.

Yes, they will be translated differently, but not based on the type
of segment. The assembler puts a one byte segment override prefix in
front of the same opcode (2Eh for CS, 36h for SS, 3Eh for DS); it
names the register, nothing more. When the code is compiled only the
offsets are computed. The OS will assign each segment to
a contiguous block of memory and set the descriptor accordingly.
Does this make sense?
The step is necessary as the code segment could be 2 Meg, the
data segment could be 2 Gig, and the stack could be 4K. Every
application will be different.

It's not the number of segments, but their size.
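
To show what "a different numerical code" amounts to, here is a hedged Python sketch that builds the machine code bytes for a 32 bit `mov byte ptr seg:[disp32], imm8`. The opcode (C6 /0) and the override prefix bytes are the standard x86 values; encode_mov_imm8 is a made-up helper, and dropping the prefix for DS mirrors what assemblers normally do because DS is the default for this addressing form.

```python
# One byte segment override prefixes defined by the x86 instruction set.
SEGMENT_OVERRIDE = {"es": 0x26, "cs": 0x2E, "ss": 0x36,
                    "ds": 0x3E, "fs": 0x64, "gs": 0x65}

def encode_mov_imm8(seg, disp32, imm8):
    """Encode `mov byte ptr seg:[disp32], imm8` for 32 bit code."""
    # C6 = MOV r/m8, imm8; ModRM 05 = absolute 32 bit displacement.
    body = bytes([0xC6, 0x05]) + disp32.to_bytes(4, "little") + bytes([imm8])
    # DS is the default segment here, so no prefix is needed for it.
    prefix = b"" if seg == "ds" else bytes([SEGMENT_OVERRIDE[seg]])
    return prefix + body

print(encode_mov_imm8("ds", 0x1000, 3).hex())   # c6050010000003
print(encode_mov_imm8("ss", 0x1000, 3).hex())   # 36c6050010000003
```

The instruction bytes are identical apart from the prefix; which descriptor gets used is decided by whatever selector is sitting in that register at run time.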


9- The last question about pure segmentation. You can see from the
figure available at this link
http://www.cs.jhu.edu/~yairamir/cs418/os5/sld040.htm
that the address is considered as one unit value, instead of two
fields, and I have seen that in many other references and even exam
papers the students are asked to find a physical address (in pure
segmentation) for a particular virtual address. And the virtual address
given is simply a one-field hexadecimal value, for example 0x43, instead of
two fields as pure segmentation is
described (http://www.cs.jhu.edu/~yairamir/cs418/os5/sld039.htm).

This is overly confusing; these are the worst tables I have
seen, and the wording is extremely inaccurate. There are 3 ways
to give an address:
1> Linear ; good, we know exactly where it is
2> segment:offset ; bad, but used to clarify boundaries & jmps, etc...
Caution: you must know if this is 8086, 32X32, or virtual.
3> selector:offset ; used for virtual addresses

I understand that on paper we have to find the segment index and
offset by splitting the address (0x43) into two fields. However, why is the
address considered as only one field instead of two fields? Will the
CPU append the CS content to XXXX when it encounters an instruction
similar to load 3, CS:XXXX?
You must extrapolate the mode from the problem. Given the above
I would guess 8086 -- (CS*16)+0x43
if they said RCX -- (RCX*4096)+0x43
If there was a GDT or LDT figure on the test
I would guess follow the procedure in Question 3 part 2 above.
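
For the textbook style of problem, where the address is one field and you split it yourself, here is a small Python sketch. Everything in it is an assumption made for illustration, not taken from the slides: 8 bit virtual addresses, the top 2 bits as the segment number, the low 6 bits as the offset, and a made-up segment table of (base, length) pairs.

```python
OFFSET_BITS = 6                        # assumed: low 6 bits are the offset
OFFSET_MASK = (1 << OFFSET_BITS) - 1   # 0x3F

# Hypothetical segment table: segment number -> (base, length in bytes).
SEGMENT_TABLE = {0: (0x100, 0x20), 1: (0x200, 0x40)}

def split_virtual(addr):
    """Split a one-field virtual address into (segment number, offset)."""
    return addr >> OFFSET_BITS, addr & OFFSET_MASK

def to_physical(addr):
    """Look up the segment and add the offset to its base."""
    seg, offset = split_virtual(addr)
    base, length = SEGMENT_TABLE[seg]
    if offset >= length:
        raise ValueError("offset outside segment")
    return base + offset

print(split_virtual(0x43))       # (1, 3)
print(hex(to_physical(0x43)))    # 0x203
```

With a different field split or table the answer changes, so the exam question has to state them (or you have to extrapolate them from the figure).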

Many thanks for your help and sorry for the long message
Regards :)
Hope this helps,
Mark Whitlock.
