Re: GGC's Machine Code Production ?
- From: "christian.bau" <christian.bau@xxxxxxxxxxxxxxxxxx>
- Date: Thu, 1 May 2008 07:37:45 -0700 (PDT)
On May 1, 3:20 pm, Taygun Kekec <taygunke...@xxxxxxxxx> wrote:
> You are totally right, I also think the optimized one should finish
> filling quicker than Unoptimized.c does.
> But Unoptimized.c finishes quicker...
> Compile & Run and you will see.
On a typical 32-bit system, the memory that you are filling is about
900 megabytes.
Now have a look at the order in which you are storing the values: In the
"unoptimised" version, you are storing to array elements in sequential
order. In the so-called "optimised" version, half of the stores are in
sequential order, but then your code has to read up to 15000
different pointers and store a single four-byte value through each one,
with every value stored in a completely different location.
Modern processors usually transfer data to and from memory in units of
whole cache lines, often 64 bytes or more per cache line. The code
accessing the data in a linear way only writes whole cache lines as
needed. The "optimised" code has to read and write a whole cache line,
just to store a single integer.
Now if you want to really, really optimise the code: Consider that on
most implementations, memcpy is highly optimised in ways that you
could never think of. How could you achieve the task with 99% of the
work done in calls to memcpy?