Re: Paging/Segmentation: how are they really implemented



"Maria" <spamtrap@xxxxxxxxxx> wrote in message
news:1143291256.725291.217630@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
I have read about paging, segmentation and paged segmentation and I
believe I have (nearly) understood how these techniques are implemented
in hardware. However, I am still confused about some details, on which
I'd highly appreciate your assistance.

1- When using pure paging and for a page size equal to 4KB=2^12,
each page should be located at 4KB's offset in the main memory.
There is no similar restriction with segmentation since segments don't
have a pre-set size. What about paged segmentation? Should
segments be located at 2^12 boundaries since each segment now is
a set of pages [assume page size is equal to 2^12].

Segments are _not_ sets of pages. It may make sense to set segment bases to
coincide with the start of a page, but it's perfectly valid to set a segment
to extend from the middle of one page to the middle of another.
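
As a rough illustration of a segment that is not page-aligned, the sketch
below works out which 4 KB pages a segment touches. The base and size
values and the name pages_spanned are made up for the example; nothing here
comes from a real OS.

```python
PAGE_SIZE = 1 << 12  # 4 KB pages, as in the question

def pages_spanned(base, size):
    """Return the page numbers covered by `size` bytes starting at linear address `base`."""
    first = base // PAGE_SIZE
    last = (base + size - 1) // PAGE_SIZE
    return list(range(first, last + 1))

# Hypothetical segment: starts in the middle of page 1, ends in the middle of page 3.
print(pages_spanned(0x1800, 0x2000))  # [1, 2, 3]
```

The segment shares pages 1 and 3 with whatever else lives in the other
halves of those pages, which is why aligning bases to page boundaries is
usually the tidier choice even though it is not required.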

It is very rare for a system to implement both segments and paging at the
same time. When paging is used, typically the code, data, and stack
segments all refer to the entire address space; this is referred to as a
"flat" memory model.

2- How can we as users choose any of the above techniques, is there any
register to set by the compiler (or the linker, loader)?

In a modern OS, segments and paging are handled by the kernel and user code
just accepts the documented model.

MS-DOS barely qualifies as an OS, partly because it doesn't do much of
anything with memory management (or drivers). It is the only OS in even
minimal use where user code has access to the registers and memory
structures used by the MMU.

3- From what I read segmentation requires the use of the assembly
indirect addressing, while each address contains two fields [register
segment: offset]. If we are in real mode, register segment content
[after probably right shift] determines the starting address of the
segment in the main memory and the offset field represents the offset
within the segment in the physical memory. However, if we are using
protected mode, the content of the register segment points to a segment
table which includes the starting address of the segment. Am I right?

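Mostly. In real mode the segment register value is shifted left (not
right) by four bits and added to the offset to form the physical address.
In protected mode, the CS, DS, ES, FS, GS, SS registers hold selectors,
which reference entries in the GDT or LDT; those entries hold the segment's
base and limit. It's not technically correct to refer to the selectors as
segments, but many do.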
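
To make the two cases concrete, here is a minimal sketch. The real-mode
formula (segment value times 16, plus offset) is the architectural one. The
descriptor table is modelled as a plain Python dict with invented base
values, which is only a stand-in for the real GDT/LDT layout.

```python
def real_mode_address(segment, offset):
    # Real mode: 16-bit segment value shifted left by 4, plus the 16-bit offset.
    return (segment << 4) + offset

# Toy descriptor table: index -> segment base. The values are invented.
toy_gdt = {1: 0x00000000, 2: 0x00400000}

def protected_mode_base(selector, table):
    # The low 3 bits of a selector are flags (TI and RPL), not part of the index.
    index = selector >> 3
    return table[index]

print(hex(real_mode_address(0xB800, 0x0010)))    # 0xb8010
print(hex(protected_mode_base(0x10, toy_gdt)))   # 0x400000
```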

4- Now regarding protected mode/real mode. Are they part of the CPU
modes which defines the execution mode? How are they related to user
mode and kernel mode?

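User/kernel modes refer to privilege levels, which exist only in protected
mode; real mode has no such protection. Protected/real modes are operating
modes of the CPU itself and determine how the MMU is operating.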

5- Is the segment table part of the CPU architecture?

I'm not sure what that question means. The registers are definitely part of
the architecture, and the architecture defines what the LDT/GDT (if any)
looks like.

6- Who set the values of the segment registers and the segment table? I
presume the kernel.

Yes, the kernel handles all that.

Does the kernel decide on behalf of the user which INDEX VALUE
the code/stack/data/ etc segment is given and set the content of the
segment table entries [including in particular the starting address of
the segment in the main memory and its size] accordingly?

Yes. On systems with a flat memory model (virtually all of them these
days), all processes have constant selectors for the code, data, and stack
segments. In fact, the selectors are the same for all processes on the
system -- the trick is that each process has different page tables.

I presume the segment table content should be saved/updated each time a
segment is relocated?

If the kernel moves the location of a segment in memory, it will update the
LDT/GDT entry for that segment. User code won't (and can't) notice; that's
the point.

7- Sometimes while using my computer I encounter system messages
[like protection errors] showing an address similar to 0FFF:XXXX, which
indicates a very high number of segments in an application, a highly
unlikely situation as the number of segments in an application tends to
be moderately small. So what does 0FFF stand for, and why is it so high?

0x0fff is the selector. Why it's so high, you'd need a debugger to
determine. I'd expect the normal CS/DS/SS selectors to be much lower.
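
Part of the reason the number looks large is that a selector is not a plain
index: the low three bits are flags, so the table index is the selector
divided by 8. A small sketch of the decoding, assuming the value really is a
protected-mode selector (the function name decode_selector is mine):

```python
def decode_selector(selector):
    """Split a 16-bit x86 selector into its three architectural fields."""
    return {
        "index": selector >> 3,                        # bits 3-15: descriptor index
        "table": "LDT" if selector & 0x4 else "GDT",   # bit 2: table indicator
        "rpl": selector & 0x3,                         # bits 0-1: requested privilege level
    }

print(decode_selector(0x0FFF))  # {'index': 511, 'table': 'LDT', 'rpl': 3}
```

So 0x0FFF would be entry 511 of the LDT, requested at privilege level 3.
That is still a large index, just not 4095.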

8- Actually, how the CPU differentiates between the next 3 instructions
[in protected mode]:
Mov 3, CS:XXX
Mov 3, DS:XXX
Mov 3, SS:XXX

Will each of the above instructions be translated to a different
numerical code depending of the type of segment?

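MOV DWORD PTR DS:[EAX], 3 is equivalent to MOV DWORD PTR [EAX], 3, since DS
is the default segment for most data accesses (those addressed through EBP
or ESP default to SS instead).

MOV DWORD PTR CS:[EAX], 3 will look similar, except there will be a
segment-override prefix byte specifying that the following instruction
should use CS instead of DS. Ditto for MOV DWORD PTR SS:[EAX], 3. The
opcode itself does not change. (In protected mode the CS version would
fault at run time, because code segments are never writable, but the
encoding is still valid.)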
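
For the curious, the sketch below builds the 32-bit machine code for a store
of the constant 3 to the address in EAX, with and without an override. The
prefix byte values are the ones defined by the instruction set; the helper
name encode is just for illustration, and a real assembler may make
different choices (for example, it will normally drop a redundant DS
prefix).

```python
# Segment-override prefix bytes defined by the x86 instruction set.
SEGMENT_PREFIX = {"ES": 0x26, "CS": 0x2E, "SS": 0x36, "DS": 0x3E, "FS": 0x64, "GS": 0x65}

# 32-bit encoding of MOV DWORD PTR [EAX], 3: opcode C7 /0, ModRM 00, then imm32.
MOV_MEM_EAX_3 = bytes([0xC7, 0x00, 0x03, 0x00, 0x00, 0x00])

def encode(segment=None):
    """Return the instruction bytes, preceded by an override prefix if `segment` is given."""
    prefix = bytes([SEGMENT_PREFIX[segment]]) if segment else b""
    return prefix + MOV_MEM_EAX_3

for seg in (None, "CS", "SS"):
    print(seg, encode(seg).hex(" "))
# None c7 00 03 00 00 00
# CS 2e c7 00 03 00 00 00
# SS 36 c7 00 03 00 00 00
```

The three encodings differ only in the optional leading byte, which is how
the CPU tells them apart.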

If this is the case, why not skip this "unnecessary" step by binding
each segment register permanently to a fixed entry in the segment
table? Maybe this has not been done because an application might have
a larger number of segments than the available CPU segment registers.
As such, a particular register will be used for more than one segment
and its content should be used to index the segment table. Am I right?

Protected mode was designed before paging came out, so the idea was that
each program would get its own code, data, and stack segments (i.e.
selectors), and protection between programs would be enforced by each
program not being allowed to access memory outside those segments. An
individual program was not expected to need more than these three segments,
though it was possible to do it if needed.

When paging was added, all of that became unnecessary since each program
could have a full 32-bit address space to itself.

9- The last question about pure segmentation. You can see from the
figure available at this link
http://www.cs.jhu.edu/~yairamir/cs418/os5/sld040.htm
that the address is considered as one unit value, instead of two
fields, and I have seen that in many other references and even exam
papers the students are asked to find a physical address (in pure
segmentation) for a particular virtual address. And the virtual address
given is simply a one-field hexadecimal value, for example 0x43, instead
of two fields as pure segmentation is described
(http://www.cs.jhu.edu/~yairamir/cs418/os5/sld039.htm)

I understand that on paper we have to find the segment index and
offset by splitting the address (0x43) into two fields. However, why is
the address considered as only one field instead of two fields? Will the
CPU append the CS content to XXXX when it encounters an instruction
similar to load 3, CS:XXXX?

x86 is confusing because there are one, two, or three different address
spaces depending on whether protection and/or paging are enabled on the CPU,
and many people don't refer to them by their official names.

In protected mode without paging, which is what the slides you refer to seem
to explain, you have virtual addresses (Intel's manuals call these "logical"
addresses) and linear addresses. CS:XXX is a virtual address. When a program
actually accesses that location, the MMU will check to make sure the offset
does not exceed the segment limit and then convert the virtual address into
a linear address by adding the segment base to the offset. When paging is
off, the linear address is the same as the physical address.
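
The check-then-add step can be sketched as follows. This is a simplified
model: it treats the limit as the highest valid offset, ignores expand-down
segments and access rights, and uses a Python exception (my own
SegmentFault) where the CPU would raise a protection fault.

```python
class SegmentFault(Exception):
    """Stand-in for the protection fault the CPU raises on a limit violation."""

def to_linear(base, limit, offset):
    # `limit` is the highest valid offset within the segment.
    if offset > limit:
        raise SegmentFault(f"offset {offset:#x} exceeds limit {limit:#x}")
    return (base + offset) & 0xFFFFFFFF  # linear addresses are 32 bits wide

# Hypothetical 64 KB segment based at 4 MB.
print(hex(to_linear(0x00400000, 0x0000FFFF, 0x1234)))  # 0x401234
# Flat segment: base 0, limit 2^32-1, so the linear address equals the offset.
print(hex(to_linear(0x00000000, 0xFFFFFFFF, 0x1234)))  # 0x1234
```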

In protected mode _with_ paging, the page tables are used to translate the
linear address into a physical address.
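
A minimal sketch of that second step, assuming classic 32-bit paging with
4 KB pages and no PAE (10-bit directory index, 10-bit table index, 12-bit
offset). The nested dict and the frame number in it are invented for the
example; real page-table entries also carry present/permission bits that are
left out here.

```python
def split_linear(linear):
    """Split a 32-bit linear address into (directory index, table index, page offset)."""
    directory = (linear >> 22) & 0x3FF
    table = (linear >> 12) & 0x3FF
    offset = linear & 0xFFF
    return directory, table, offset

def to_physical(linear, page_directory):
    d, t, off = split_linear(linear)
    frame = page_directory[d][t]  # a KeyError here plays the role of a page fault
    return (frame << 12) | off

# Hypothetical mapping: directory entry 1, table entry 1 -> physical frame 0x9A.
toy_directory = {1: {1: 0x9A}}
print(split_linear(0x00401234))                     # (1, 1, 564)
print(hex(to_physical(0x00401234, toy_directory)))  # 0x9a234
```

Giving each process its own toy_directory is, in miniature, how the same
linear address can map to different physical memory in different processes.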

Note that in x86, virtual and linear addresses are limited to 32 bits, but
with PAE the physical address can be up to 36 bits (or 32 bits without PAE).

On a modern OS, the bases of CS/DS/ES/SS are all 0 and the limits are all
2^32-1, which means the virtual and linear addresses are always the same*.
In fact, this is such an overwhelming trend that AMD64's long mode (in its
64-bit submode) completely ignores the base/limit for those four segments.

* FS and GS work differently, but that's a more complicated discussion (and
depends on which OS you're talking about).

S

--
Stephen Sprunk "Stupid people surround themselves with smart
CCIE #3723 people. Smart people surround themselves with
K5SSS smart people who disagree with them." --Aaron Sorkin
