Re: Optimization problem



According to <tnorgd@xxxxxxxxx>:
- I was told that memory paging is a very important factor when
efficiency is concerned

This is "old lore": a pearl of wisdom which, as time flies, has gone
a bit stale.

Schematically, you have several layers of physical memory. What the
application sees is a "virtual memory space", i.e. a conceptually huge
bag of bytes. Underneath, both the hardware and the operating system
perform a number of tricks to simulate that huge space efficiently,
given the physical constraints of what silicon provides.

The innermost level is the L1 cache (L = level) which is located in the
CPU core and is able to respond to the CPU at full speed (i.e. read or
write access in one clock cycle). On my home PC (an Intel Q6600 at 2.4
GHz), each of the four CPU cores has 32 KBytes of L1 cache for data (and
another 32 KB for code). The CPU is most happy when the data set fits in
the L1 cache. The CPU keeps in L1 cache a copy of the most recently
accessed data (this last statement is a very rough approximation of how
a cache works, but this will do for the present discussion).

When the CPU accesses a data element which is not currently in the L1
cache, it must go to the next level, the L2 cache. The L2 cache is not
as fast as the L1 cache and reading in it imposes a huge penalty in
latency (I do not have figures on hand, but 20 cycles on first access
woulkd be typical). On the other hand, the L2 cache is quite bigger: my
Q6600 has 8 MBytes of L2 cache (split into two 4 MB halves, each half
being shared by two cores). The hardware applies various read-ahead
strategies to limit the latency in the typical case of sequential
accesses, so things actually work out more smoothly than what the "20
cycles" above would suggest. Nevertheless, if your dataset is quite
bigger than the L1 cache, chances are that your overall performance will
be more limited by memory bandwidth than raw computing power (a Q6600
can perform quite a lot in a single clock cycle).

When you go out of L2 cache as well, the CPU must rely on the plain RAM.
Access penalty is higher (very roughly a hundred clock cycles for first
access) and such RAM is bigger (8 GBytes on my PC). The RAM is better at
sending many successive bytes than bytes randomly located in RAM; and
when the CPU goes to the main RAM, it actually request a chunk of a few
dozen bytes, because usual applications access data sequentially, so
that it pays off to read some bytes in advance. Under ideal conditions,
my PC RAM will be able to send about 16 GBytes of data to the CPU per
second.

At that point, the hardware stops doing tricks (unless there is a L3
cache, by my PC does not have such a cache). The operating system
implements the next step.

The operating system presents to the application a virtual space which
consists of pages (say 4 or 8 KBytes each, depending on hardware). From
the application point of view, this is all "memory". Underneath, the
operating system uses the MMU (a hardware facility) to map those pages
on actual physical RAM. If the application requires more RAM than is
available, then the operating system will mark some pages as "not
available yet". If the application accesses such a page, the OS
transparently take over, and fills in the page with the requested
contents, taken from the harddisk. This requires the OS to kick another
page out of physical RAM. Some pages are copies of data which already
exists on the harddisk (pages which contain executable code taken from
an executable file), while others have fresh data which the OS must then
copy into a dedicated area on the disk (swap space). Basically, the OS
performs the same caching process than what the hardware does, except
that it does it between the RAM and the harddisk.

Only that last phase (OS-handled caching between physical RAM and
harddisk) is really called _paging_. Briefly stated, if you hit the
paging state (i.e. your application is too big to fit in RAM) then you
are doomed. Garbage collectors, in particular, have a habit of hitting
much of the application memory on a regular basis, which means heavy
disk trashing in paging conditions. The GC used by Java is quite good in
that respect, but only relatively to other GC-based systems, so you
really do not want to get out of physical RAM. Fortunately, RAM prices
have so dropped in later years that paging is steadily becoming a thing
of the past (in 1994, on my PC, simply booting and getting with a
graphical interface with a single window implied a fair amount of
paging; now, 15 years later, I never hit paging at all).


So you should worry about sizes of L1 and L2 caches, not paging.


- Does anybody have a better solution to this problem?

Performance problems should be addressed with actual measures. The
"better" solution would be to design your Point class so that how it
stores and accesses data is of no importance to the rest of the
application, at the source level. This would allow you to try several
strategies and find out which one works best with your hardware.


--Thomas Pornin
.



Relevant Pages

  • Re: Purchasing the correct hardware: dual-core intel? Big cache?
    ... to 1/2 day purely by tuning the SQL. ... disk/data layout, then finally the CPU. ... Fact is, with 4G of RAM most of the data sits in RAM, so reads incur ... but I'm not sure how to tell if the cache is getting used ...
    (freebsd-questions)
  • Re: upgrading memory
    ... what is in memory has to be written on the disk and the new ... Although this is paging, it's essentially one-time paging, usually ... and it doesn't use the CPU and it doesn't page. ... Additional RAM might help some, but how just much it'll help mostly ...
    (microsoft.public.windowsxp.basics)
  • Re: Purchasing the correct hardware: dual-core intel? Big cache?
    ... that stuff...it really shouldn't be CPU bound. ... Fact is, with 4G of RAM most of the data sits in RAM, so reads incur ... but I'm not sure how to tell if the cache is getting used ... We're looking hard at getting either Intel dual-core procs, ...
    (freebsd-questions)
  • Re: memory reading and writing
    ... from the CPU and distributing the processing around...AGP cards ... being linked up to a special higher-speed bus between CPU, RAM ... a "cache" for "texture bitmaps" and such that it can get much ... the video cards too...that's why they carry such large video RAM ...
    (alt.lang.asm)
  • Re: CPU Speed
    ... I just looked in the BIOS under the Advanced section and under the CPU Speed it shows: ... Cache RAM: 512 KB ... To UNSUBSCRIBE, email to debian-user-REQUEST@xxxxxxxxxxxxxxxx ...
    (Debian-User)