Re: speed of int vs bool for large matrix

From: jacob navia (jacob_at_jacob.remcomp.fr)
Date: 09/11/04

  • Next message: John Hanley: "int to string"
    Date: Sat, 11 Sep 2004 19:37:02 +0200
    
    

    Keith Thompson wrote:
    > jacob navia <jacob@jacob.remcomp.fr> writes:
    >
    >
    >>If you declare your table as a table of unsigned chars, memory
    >>requitrements will be 1 000 000 bytes, in the other case will be
    >>4 times as much (supposing a 32 bit machine), i.e. there is 4 times
    >>as much data that will fit the Level 1 cache.
    >
    >
    > You're assuming there is a "Level 1 cache".
    >

    I am assuming of course a modern CPU, either a PPC, x86 or similar.
    Please lets be realistic: handling 1-4MB matrices in a DSP
    with 4K of RAM would be ridiculous...

    >
    >>Input output requirements will be much more reduced and the
    >>program will be surely much faster.
    >
    >
    > What input output requirements? The OP didn't indicate that the
    > matrix would be read from or written to an external file.
    >

    Input output to/from the CPU of course!
    I am not speaking about file I/O but RAM I/O.

    >
    >>If you have a purely boolean matrix, I would use a packed
    >>representation with 8 bits/byte, making your matrix just
    >>125K, i.e. 8 times less than if you use the int data type.
    >
    >
    > Which will likely make access to individual elements slower.
    >

    No, this is my point. If you have to read from RAM 8 times less
    data, each read producing at least 10-20 wait states, the few
    more instructions to access the individual bits would be
    compensated by the smaller I/O.

    >
    >>Note that reducing space requirements increases speed, since
    >>filling the cache from RAM introduces an enormous amount of wait
    >>states. Memory access go through the bus at 300 MHZ, with a CPU
    >>running at 3 000 MHZ, i.e. a memory access can be 10 times slower
    >>when it doesn't hit the L1 cache.
    >
    >
    > That's all very system-specific.
    >

    Not really. As the CPUs of modern micro computers go to
    ever higher clock speeds, memory doesn't catch up at all,
    and I/O from RAM becomes the limiting factor.
    >
    >>Of course this analysis implies a sequential access to each
    >>element of the matrix. In other cases where you just access
    >>randomly elements of the matrix, an integer representation would
    >>beat all others since you are NOT limited by I/O.
    >>
    >>MAIN POINT
    >>----------
    >>Speculating about speed will never replace actual measurements.
    >>
    >>1: Build a test matrix.
    >>2: Simulate the access pattern of your application.
    >>3: Time it using unsigned char, int.
    >>
    >>Then you will know for sure.
    >
    >
    > You could have safely skipped everything before "MAIN POINT".
    >

    No, I wanted to describe alternatives and leave the OP the choice
    but of an informed choice.


  • Next message: John Hanley: "int to string"

    Relevant Pages

    • Re: speeding up my runtime on a c6713.
      ... it for cache, and possibly enabled the cache controller? ... I ran out of IRAM memory so I moved all my variables to ERAM. ... "Are you using all 256k of DSP ram, or have you reserved up to 64k of ...
      (comp.dsp)
    • Re: Geriatric Pentium
      ... processor scavenging spare ram cycles to back it up to ram and restore ... Note that if it stores something the cache is of course ... as well as a memory write cycle occurring. ... Program load is close to that now. ...
      (comp.lang.java.advocacy)
    • Re: speeding up my runtime on a c6713.
      ... it for cache, and possibly enabled the cache controller? ... I ran out of IRAM memory so I moved all my variables to ERAM. ... "Are you using all 256k of DSP ram, or have you reserved up to 64k of ...
      (comp.dsp)
    • Re: How does the ZIP chip work?
      ... If the ZIP chip does not have it's own memory, ... kind of cache? ... the zipchip has internal ram. ... to the screen buffers ot the i/o addresses (of course i/o addresses are ...
      (comp.sys.apple2)
    • Re: Slow File Load Through ODBC Driver
      ... buffer cache from 772 Megs to 132Megs ... virtual) ram immediately to fox when fox generated a large select statement. ... XP professional (suspecting kernel memory handling) -- no effect; ...
      (microsoft.public.fox.programmer.exchange)