Re: GGC's Machine Code Production ?



On May 1, 3:20 pm, Taygun Kekec <taygunke...@xxxxxxxxx> wrote:
> You are totally right, I also think the optimized one should finish
> filling quicker than Unoptimized.c does.
> But Unoptimized.c finishes quicker...
> Compile & Run and you will see.

On a typical 32-bit system, the memory that you are filling is about
900 megabytes.

Now have a look at the order in which you are storing the values: In the
"unoptimised" version, you are storing to array elements in sequential
order. In the so-called "optimised" version, half of the stores are in
sequential order, but for the other half your code has to read up to
15000 different pointers and store a single four-byte value through each
one, so each of those values ends up in a completely different location.
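
To make the two store orders concrete, here is a rough sketch of what they might look like. The original Unoptimized.c and its "optimised" counterpart are not shown in this thread, so the function names, the array of row pointers, and the mirrored store rows[j][i] are all assumptions made for illustration; this is only one reading of the description above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout: rows[] holds n pointers, each pointing to n ints. */

/* Sequential order: every store goes to the next address in the current row. */
void fill_sequential(int **rows, size_t n, int value)
{
    assert(rows != NULL);
    for (size_t i = 0; i < n; i++) {
        int *row = rows[i];              /* one pointer read per row */
        for (size_t j = 0; j < n; j++)
            row[j] = value;
    }
}

/* Mirrored order (assumed): half of the stores are sequential (rows[i][j]),
   the other half (rows[j][i]) read a different row pointer on every
   iteration and store a single int through it. */
void fill_mirrored(int **rows, size_t n, int value)
{
    assert(rows != NULL);
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i; j < n; j++) {
            rows[i][j] = value;          /* walks along row i */
            rows[j][i] = value;          /* jumps to a new row each time */
        }
    }
}
```

Both functions leave the array with the same contents and perform roughly the same number of stores; what differs is where consecutive stores land in memory.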

Modern processors usually transfer data to and from memory in units of
whole cache lines, often 64 bytes or more per cache line. The code
accessing the data in a linear way only writes whole cache lines as
needed. The "optimised" code has to read and write a whole cache line
just to store a single integer.
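
The cache-line argument can be put into numbers with a small counting helper. This is only a back-of-the-envelope model, not a description of what any particular processor does: the helper name lines_touched is made up, it assumes stores at increasing addresses starting on a line boundary, and it ignores everything else a real cache does.

```c
#include <assert.h>
#include <stddef.h>

/* Counts the distinct cache lines touched by `count` stores of `elem` bytes
   each, placed `stride` bytes apart, starting at a line-aligned address.
   `line` is the cache line size in bytes. Purely arithmetic model. */
size_t lines_touched(size_t count, size_t elem, size_t stride, size_t line)
{
    assert(elem > 0 && line > 0 && stride >= elem);
    size_t touched = 0;
    size_t next_uncounted = 0;           /* lowest line index not yet counted */
    for (size_t k = 0; k < count; k++) {
        size_t first = (k * stride) / line;
        size_t last  = (k * stride + elem - 1) / line;
        if (first < next_uncounted)
            first = next_uncounted;      /* skip lines already counted */
        if (last >= first) {
            touched += last - first + 1;
            next_uncounted = last + 1;
        }
    }
    return touched;
}
```

With 64-byte lines and four-byte values, lines_touched(16, 4, 4, 64) gives 1: sixteen sequential stores share a single line. If the stores are instead one row apart, and a row is assumed to be 15000 four-byte values (60000 bytes), lines_touched(16, 4, 60000, 64) gives 16: one whole line per stored integer.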

Now if you want to really, really optimise the code: Consider that on
most implementations, memcpy is highly optimised in ways that you
could never think of. How could you achieve the task with 99% of the
work done in calls to memcpy?
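
One possible answer, as a sketch only: assuming the task is to set every element of one contiguous buffer to the same value (the original code is not shown, so that is an assumption, and fill_by_doubling is a made-up name), you can store the first element by hand and then let memcpy copy the already-filled part onto the unfilled part, doubling the filled region each time.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Fills buf[0..n-1] with `value`. Only the first element is stored directly;
   everything else is written by memcpy, which copies the filled prefix onto
   the region right after it. Source and destination never overlap because
   the chunk copied is never longer than the filled prefix. */
void fill_by_doubling(int *buf, size_t n, int value)
{
    assert(buf != NULL || n == 0);
    if (n == 0)
        return;
    buf[0] = value;
    size_t filled = 1;
    while (filled < n) {
        size_t chunk = filled;
        if (chunk > n - filled)
            chunk = n - filled;          /* last copy may be shorter */
        memcpy(buf + filled, buf, chunk * sizeof *buf);
        filled += chunk;
    }
}
```

For n elements this needs only about log2(n) calls to memcpy, and each call writes memory in sequential order. Whether it actually beats a plain loop on a given compiler and machine is something you would have to measure.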