Re: Problem: Creating a raw binary string

From: Jim P (Jim_P_at_mad.scientist.com)
Date: 11/30/04


Date: Tue, 30 Nov 2004 11:48:43 -0800

Bruce Roberts wrote:

> "Nicholas Sherlock" <n_sherlock@hotmail.com> wrote in message
> news:co8air$oqe$1@lust.ihug.co.nz...
>
>
>>>I'm still trying to understand why 64-bit cpu will be any faster than
>>>32-bit for the average desktop o/s and applications. IMO its a canard.
>>
>>AFAICS, they can push double the data with one instruction, so
>>applications written to take advantage of that fact can nearly halve
>>their execution time.
>
>
> While it's true that a 64-bit cpu will move twice the data per instruction it
> doesn't mean that programs can really benefit substantially from this.
> Memory bus width plays an important role here and unless it too is widened /
> made faster, doubling the required data movement per instruction is actually
> going to result in a performance loss. I doubt that 64-bit integers are
> really going to be needed by the vast majority of programs. So programmers
> either stick to 32-bit integers, thus raising all sorts of word alignment
> issues or use 64-bit integers, effectively doubling the amount of memory
> being moved without actually using it. String work, a significant part of
> most business apps, works with 8- or 16-bit chunks, so these operations are
> not likely to realize any benefit from the increased size.
>
> There are application areas that will benefit from 64-bit. I just don't
> think that this move is going to contribute to a significant improvement for
> all, or even most, applications.
>
> I don't think that one can take the experience of moves from 8-bit to 16-bit
> to 32-bit as a guide to probable gains from moving to 64-bit. In those moves the
> cpu and bus architectures were expanding to better represent work being done
> in software. IOW programs routinely used 32-bit data types even though they
> had to provide code to process these types on more limited hardware. This
> situation doesn't exist as a general case today for 64-bit, i.e. most
> programs do not make heavy use of 64-bit data types. Although as I wrote the
> last sentence it occurred to me that Currency is 64-bit and some business
> apps probably use the type quite heavily. Still, I suspect that 64-bit speed
> gains are more likely to be single digit percentages rather than in the
> 30-60% range.
>
>
This is a great argument, except that this is not how memory is
actually accessed.

You are forgetting the two levels of cache in the processor: the
smaller Level 1 cache and the larger Level 2 cache of 500K or more.

Memory operations are typically driven by cache operations and not by
what the processor asks for. Thus the reading or writing of an integer,
32-bit or 64-bit, is done to the cache and not to memory.

The cache operations are handled by the memory controller, are
typically done in blocks, and take advantage of the structure of the
memory chips and the performance features they provide through
non-standard modes of data fetching.

The caching operation also assumes that the next set of bytes is going
to be needed too, as with program code or when handling an array.

So the memory operation might be as much as 64 bytes for each operation.

Again, note this is different from the processor read and write, which
goes to the Level 1 cache. The cache handler looks to see if the data is
there. If not, it looks in the Level 2 cache, and finally it empties a
block in the cache and fetches the memory block into it.
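A rough sketch of the lookup order described above (Level 1, then Level 2,
then a whole block from memory). This is a toy model, not how any real
processor is built: the ToyCache class, the run function, the tiny cache
sizes, and the least-recently-used eviction rule are all assumptions made
for illustration. Only the 64-byte block size comes from the post.

```python
from collections import OrderedDict

LINE_BYTES = 64  # block size, taken from the "64 bytes" figure above


class ToyCache:
    """Hypothetical cache that stores whole blocks and evicts the oldest one."""

    def __init__(self, capacity_lines):
        self.capacity_lines = capacity_lines
        self.lines = OrderedDict()  # block number -> True, least recent first

    def lookup(self, line):
        if line in self.lines:
            self.lines.move_to_end(line)  # mark as most recently used
            return True
        return False

    def fill(self, line):
        if len(self.lines) >= self.capacity_lines:
            self.lines.popitem(last=False)  # empty the least recently used slot
        self.lines[line] = True


def run(addresses, l1_lines=4, l2_lines=16):
    """Count where each byte address is served from (sizes are made up)."""
    l1, l2 = ToyCache(l1_lines), ToyCache(l2_lines)
    counts = {"l1_hits": 0, "l2_hits": 0, "memory_fetches": 0}
    for addr in addresses:
        line = addr // LINE_BYTES  # which 64-byte block holds this address
        if l1.lookup(line):
            counts["l1_hits"] += 1
            continue
        if l2.lookup(line):
            counts["l2_hits"] += 1
        else:
            counts["memory_fetches"] += 1  # one block transfer from memory
            l2.fill(line)
        l1.fill(line)
    return counts


if __name__ == "__main__":
    print(run(range(0, 1024, 4)))  # 1 KB read as 32-bit integers
    print(run(range(0, 1024, 8)))  # the same 1 KB read as 64-bit integers
```

In this model, sweeping the same 1 KB sequentially costs 16 block fetches
from memory whether it is read as 32-bit or 64-bit integers. Only the
number of Level 1 hits changes.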

Once this block is in the cache, and in effect on the processor chip,
the rules change.

I am not going to go into the details here, as the designers keep
playing different games to speed up the processor, trying different
algorithms all the time to get faster operation as more transistors
become available to implement them. I gave up trying to keep track of
the different concepts and ideas behind this.

It is this caching operation that freed up the memory bus. It used to
be the main speed bottleneck, but with cache this was largely removed.
Now it is not unusual to see memory bus usage as low as 10% on average.
The larger the cache, the less the memory bus is used. That allowed
on-board video to share the main memory of the processor (note that
video is a memory hog in terms of the time or bandwidth it needs) with
only a small processor performance hit.
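To put rough numbers on why a high cache hit rate takes the pressure off
main memory, here is a simplified weighted average. The function name and
the timing figures (1 ns for a cache hit, 100 ns for a trip to memory) are
made up for illustration and are not measurements from any real machine.

```python
def average_access_ns(hit_rate, cache_ns, memory_ns):
    """Weighted average: hits are served by the cache, misses by main memory."""
    miss_rate = 1.0 - hit_rate
    return hit_rate * cache_ns + miss_rate * memory_ns


if __name__ == "__main__":
    # Hypothetical timings: 1 ns cache hit, 100 ns main memory access.
    for hit_rate in (0.0, 0.5, 0.9, 0.99):
        avg = average_access_ns(hit_rate, 1.0, 100.0)
        print(f"hit rate {hit_rate:.2f}: {avg:.2f} ns on average")
```

Only the misses reach the memory bus in this picture, so at a 90% hit rate
just one access in ten goes out to main memory.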

But when the information is not in the cache, high-speed main memory is
still valuable. Even then, the data is received into the cache in
chunks, or blocks.

The memory chips are addressed in row/column fashion, a kind of X,Y
matrix. The row address is put to the chip first. It has to be decoded
into the row select line, and then the data comes out on the columns,
all columns at once. Then the column (Y) address is decoded to select
which column is desired. Note this means that all the column information
is present at the same time, so getting the next column is very fast.
It is simply a matter of selecting the next column desired, and that can
be done automatically in the chip as the data is clocked out. So very
fast block transfers are possible from the memory chips. This is what
occurs, and it is part of the reason memory brings in a block and not a
single 64-bit integer at a time.
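The row/column behaviour above can be sketched as a simple cost model. All
the names and numbers here (1024 columns per row, 10 cycles to open a row,
1 cycle per column read) are hypothetical and chosen only to show the
shape of the effect. Real chips have their own timings.

```python
COLS_PER_ROW = 1024   # hypothetical number of columns in one row
ROW_OPEN_CYCLES = 10  # hypothetical cost of decoding and opening a row
COL_READ_CYCLES = 1   # hypothetical cost of clocking out one column


def split_address(word_index):
    """Split a word index into (row, column), like the X,Y matrix above."""
    return divmod(word_index, COLS_PER_ROW)


def access_cycles(word_indices):
    """Total cycles, paying the row cost only when the row changes."""
    open_row = None
    cycles = 0
    for index in word_indices:
        row, _col = split_address(index)
        if row != open_row:
            cycles += ROW_OPEN_CYCLES  # new row must be selected first
            open_row = row
        cycles += COL_READ_CYCLES      # column data is already present
    return cycles


if __name__ == "__main__":
    burst = list(range(8))                                # 8 neighbours, one row
    scattered = [i * COLS_PER_ROW for i in range(8)]      # 8 different rows
    print(access_cycles(burst), access_cycles(scattered))
```

With these made-up numbers, eight neighbouring words cost one row open
plus eight column reads, while eight words in eight different rows pay
the row cost every time.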

This high-speed clocking of the data is what the transfer rates you see
quoted for memory chips relate to. It is not the random-access transfer
rate but this block rate.

As this has become more and more the standard, a lot of work has been
done in this area to interface better to the cache and cache
controllers in the micro.

Jim P.


