Re: Borland Delphi + 64Bits

From: Danny Thorpe (dthorpe_at_bozofilter.borland.com)
Date: 03/30/05


Date: 29 Mar 2005 15:11:03 -0800

Richard Foersom wrote:

>
> I think Danny T.'s point of view is guided by - do whatever Windows
> does, and MS Visual C++ for Win64 keeps integer as 32 bit AFAIK.

Thanks for the lack of confidence.

I've been working with 64 bit systems (at the machine code level) since
1995. I have a DEC Alpha AXP 200 4/233 sitting at the end of my desk.
It's not Borland equipment. It's mine.

Do a Google search. You'll find I've been railing against the "just
recompile" 64 bit code portability myth for more than a decade.

>
> On AMD64 how efficient is it to load and sign extend a 32 bit integer as
> compared to straight load and use of 64 bit integer? If you know
> please explain.
>

Moving less is almost always faster than moving more. Loading 4 bytes
of memory requires less time and has less impact on the cache system
than loading 8 bytes to do the same thing. You can load twice as many
4 byte elements in the same amount of time as 8 byte elements.

Crossing cache lines (caused by unaligned data) creates memory fetch
delays. 8 byte data has a greater risk of falling across a cache line
boundary than 4 byte data. Aligning 8 byte data requires wasting more
memory for pad bytes than aligning 4 byte data.

Sign extending a value in a register is trivial because it's in the
register already. Getting the value into the register from faraway
memory is the hard part.

This general rule of thumb is supported by the instruction latencies
documented by AMD here:
http://www.amd.com/us-en/assets/content_type/DownloadableAssets/dwamd_25112.pdf

Moving register to register completes in 1 clock cycle, regardless of
source or destination register size. Moving between memory and
register of any size is 3 to 4 clocks, gated by the memory bus
bandwidth. Moving from anywhere to a register of any size with sign
extension (MOVSX) has exactly the same latency as moving without sign
extension - 1 clock.

There is no instruction execution performance difference between 32 bit
and 64 bit integer operations in AMD64. The issue is memory. Bigger
data creates performance problems.

-Danny

-- 
Delphi Compiler Core:  http://blogs.borland.com/dcc

