Re: FastCode CPUID

From: Roelof Engelbrecht (roelof_at_NOSPAM.tca.net)
Date: 09/03/04

    Date: Thu, 2 Sep 2004 17:02:41 -0500
    
    

    "Dennis" <marianndkc@home3.gvdnet.dk> wrote:

    > L1 cache size and L2 cache size is enough.

    OK. I'm thinking about something like this as a global variable in
    FastCodeCPUID, initialized in the initialization section:

    type
      TCPU = record
        Vendor: TVendor;
        EffFamily: Byte; //ExtendedFamily + Family
        EffModel: Byte; //(ExtendedModel shl 4) + Model
        CodeL1CacheSize, //KB
        DataL1CacheSize, //KB
        L2CacheSize: Integer; //KB
        InstructionSupport: TInstructionSupport;
      end;

    I'll probably add CPU speed as well. It may come in handy for benchmarking
    and other purposes. FastCode functions can then just refer to CPU.L2CacheSize
    to adapt to the L2 cache size.
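
    A minimal sketch of how a function might use such a global, assuming
    placeholder definitions for TVendor and TInstructionSupport (neither is
    defined above) and a hard-coded L2 size standing in for the real CPUID
    query. BlockSizeKB and its "half of L2" rule are hypothetical, chosen
    only to illustrate the idea:

```pascal
program L2Adapt;
{$APPTYPE CONSOLE}

type
  // Hypothetical placeholders; the actual definitions are not given here.
  TVendor = (cvUnknown, cvIntel, cvAMD);
  TInstructionSupport = set of (isMMX, isSSE, isSSE2, isSSE3);

  TCPU = record
    Vendor: TVendor;
    EffFamily: Byte;         // ExtendedFamily + Family
    EffModel: Byte;          // (ExtendedModel shl 4) + Model
    CodeL1CacheSize,         // KB
    DataL1CacheSize,         // KB
    L2CacheSize: Integer;    // KB
    InstructionSupport: TInstructionSupport;
  end;

var
  // In FastCodeCPUID this would be filled in by the initialization section.
  CPU: TCPU;

// Hypothetical helper: choose a working block size (in KB) of at most half
// the L2 cache, with a conservative default when the size is unknown (0).
function BlockSizeKB: Integer;
begin
  if CPU.L2CacheSize > 0 then
    Result := CPU.L2CacheSize div 2
  else
    Result := 64;
end;

begin
  CPU.L2CacheSize := 512;  // assumption: hard-coded instead of a CPUID query
  WriteLn('Block size: ', BlockSizeKB, ' KB');
end.
```

    The point is only that callers read CPU.L2CacheSize once it has been
    set up; how the block size is derived from it is up to each function.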

    > The least thing I should do is to make sure there is a proper description
    > at the homepage of the targets and which processors map to each target.

    Agreed. I'll look at it and give you my comments.

    > I do not find the target names too bad, perhaps except Opteron.
    >
    > We should also remember that I actually benchmark on P4 1600 Northwood, P4
    > 2800 Prescott, XP 2500+ Barton, Opteron 1400 and P3 1400 Celeron only.
    > These processors are the basic target set. Then I get Pentium M results
    > from helpers - thanks. I need to get a Pentium M - Banias or Dothan.

    I have a Pentium M Banias, so I can help out when needed.

    > If we claim that a P4N winner is optimized for Xeon X we are actually
    > lying.

    Not really, if it is a Prestonia (DP) or Gallatin (MP) Xeon. The Northwood,
    Prestonia, and Gallatin processors are exactly the same, except for the L2
    and L3 cache sizes.

    L2 cache:
    Northwood Celeron: 128 KB
    Northwood Mobile Celeron: 256 KB
    Northwood Pentium 4: 512 KB
    Northwood Mobile Pentium 4: 512 KB
    Prestonia Xeon DP: 512 KB
    Gallatin Pentium 4 EE: 512 KB
    Gallatin Xeon MP: 512 KB

    L3 cache:
    Northwood Celeron: none
    Northwood Mobile Celeron: none
    Northwood Pentium 4: none
    Northwood Mobile Pentium 4: none
    Prestonia Xeon DP: none or 1 MB
    Gallatin Pentium 4 EE: 2 MB
    Gallatin Xeon MP: 1 MB, 2 MB, or 4 MB

    So, if the code (and related data) fits in the 128 KB L2 cache (the minimum
    available on the P4 non-SSE3 architecture), all these processors will
    operate essentially the same, because their "engines" are the same.
    Differences appear only when the working set exceeds 128 KB, so that the
    rest of a larger L2 cache, the L3 cache, and/or main memory is accessed,
    but you cannot really optimize for that because there are too many
    permutations.

    > I would like to improve/expand the set of machines I benchmark and
    > validate upon, but I am short on money ;-)

    If the function stays within the minimum L2 cache size available on each
    architecture, you only need to benchmark on one processor per architecture.

    > Should the function pointer be named CompareTextFastcode, CompareTextFC or
    > CompareText?
    >
    > I will update the library user guide / design guide when we decide it.

    Leaving it CompareText is probably the easiest, because existing code will
    compile without changes. However, if SysUtils (which contains the RTL
    CompareText) is listed after FastcodeCompareTextUnit in the uses list, then
    the SysUtils version will be called. You can always get past this by
    qualifying the call with the unit name, as in
    FastcodeCompareTextUnit.CompareText, but it is probably easier just to
    ensure that the FastCode libraries are listed after the RTL libraries in
    the uses list. The original RTL CompareText will still be accessible
    through SysUtils.CompareText. My second choice would be to call it
    fcCompareText.
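
    A small sketch of the name-hiding behaviour, kept to a single program so
    it compiles on its own. Here the function pointer is declared in the
    program itself rather than in a separate unit, which hides
    SysUtils.CompareText in the same way a unit listed later in the uses
    list would. TCompareTextFunc and CompareText_Generic are hypothetical
    names, and the "optimized" routine simply forwards to the RTL:

```pascal
program ShadowDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  TCompareTextFunc = function(const S1, S2: string): Integer;

// Hypothetical stand-in for a target-specific FastCode routine.
// It just forwards to the RTL so the example stays self-contained.
function CompareText_Generic(const S1, S2: string): Integer;
begin
  Result := SysUtils.CompareText(S1, S2);
end;

var
  // Declared after the uses clause, so the unqualified name CompareText
  // now refers to this pointer instead of SysUtils.CompareText.
  CompareText: TCompareTextFunc;

begin
  // In a real library this assignment would live in an initialization
  // section and pick a routine based on the detected CPU.
  CompareText := CompareText_Generic;

  // Unqualified call: goes through the function pointer.
  WriteLn(CompareText('abc', 'ABC'));

  // Qualified call: the original RTL version is still reachable.
  WriteLn(SysUtils.CompareText('abc', 'ABC'));
end.
```

    Existing calls to CompareText need no changes with this approach; only
    the resolution order decides which version they reach.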

    > > To make the function names a little shorter (and more distinct) you can
    > > perhaps use "_fct" instead of FastCode, yielding function names such as
    > > CompareText_fctP3 and CompareText_fctPM.
    >
    > I would prefer FC for Fastcode, but I like the long names and "hate"
    > underscores ;-)

    How about fcCompareTextP3 and fcCompareTextPM? We can also go to
    fcCompareTextP4SSE3 instead of fcCompareTextP4_SSE3. I really don't care
    that much...

    > I am sure that the Delphi community is looking forward to have some more
    > libraries :-) I need all the help I can get on building them. Send some to
    > me as soon as you finish them. Put your name in them too.

    Will we have individual libraries for each function, or will we ultimately
    combine libraries by function, for example FastCodeMath for all the Math
    stuff and FastCodeText for all the Text stuff?

    Roelof

