Re: GMP vs. straight C arithmetic

From: Paul Hsieh (qed_at_pobox.com)
Date: 03/14/04

    Date: 13 Mar 2004 17:57:21 -0800
    
    

    Chapter33@aol.com (Mark R.Bannister) wrote:
    > qed@pobox.com (Paul Hsieh) wrote:
    > > Chapter33@aol.com says...
    > > > qed@pobox.com (Paul Hsieh) wrote:
    > > > > Chapter33@aol.com (Mark R.Bannister) wrote:
    > > > I'm not convinced your argument applies in this case. I have no
    > > > intention to make PROSE bytecode mirror instructions that you would
    > > > expect to send to a chip. They are higher level instructions than
    > > > that. One opcode in the PROSE world might be written as 40 or 50+
    > > > opcodes in the microprocessor world.
    > >
    > > Well, you're on the right track, but if you provide 32 bit integer types --
    > > well think about it. What kind of higher level opcodes are you planning to
    > > have that affect types as small as these which are not dominated by the
    > > bytecode interpreter overhead?
    >
    > The advantage comes when using higher-level structures that are not
    > available in lower-level languages or assembly. Naturally, if you're
    > doing basic mathematics with 32 bit integer types, there's not a lot I
    > can do to reduce the bytecode interpreter overhead.

    Then you lose by 10x-100x on *any* numerical computation. This, at a
    very fundamental level, is what drags down the performance of *all*
    interpreted languages. Every interpreter has reasonably fast routines
    for its higher level structures -- that's not where they lose. They
    lose because "i = i + 1" takes 20+ clocks, while in C it's almost
    always a completely free instruction (essentially 0 clocks, because
    it's running in parallel with some other instruction.)

    > [...] However, working
    > with complex objects, stacks, autosizeable matrix arrays and tree
    > structures (all typeless collections) will be easy and fast.

    Just like they are in Lua (which uses a single opcode for an
    associative array lookup)? Lua is still more than 10x slower than C.

    > > > > Given the kind of background I have, one of my goals was to make a
    > > > > *scripting* language that has performance comparable to C *WITHOUT*
    > > > > using JIT. I would say on this score I have some very compelling
    > > > > ideas that I think even if not successful would easily leave any
    > > > > existing scripting languages in the dust.
    > > >
    > > > Why *without* using JIT, in particular?
    > >
    > > Because JIT requires enormous resources to implement. If you want to
    > > leverage this, then why not just use the Java bytecode? Java already is
    > > within striking distance of C's performance.
    >
    > Not in my experience it isn't.

    Ok, whatever. I am not here to defend Java. I just know that it's a
    lot closer than 10x off of the performance of C. Something like 2x
    probably.

    > [...] How does Perl do it? I thought I'd read somewhere that Perl compiles a
    > script before running it?

    I think you need to do more research. Look up "Parrot". Perl,
    Python, Lua, Ruby, etc. are all compiled to bytecode before they are
    executed. They are not compiled to native machine code. C# is,
    but it's arguably not any more "natively compiled" than Java with a
    JIT.

    > > > <snip>
    > > > > What's the big "catch" in your language? I couldn't think of one for
    > > > > mine, so I stopped pursuing my ideas. This is what I was looking for
    > > > > when I was reading your specs. This is what makes people care about a
    > > > > language, and it's what I want to glom my ideas onto.
    > > >
    > > > The "catch" is all in the hierarchy. You name me a single language
    > > > available today that allows me, as a developer, to work in a
    > > > programmable framework with the same flexibility as you can work with
    > > > data in LDAP over a network,
    > >
    > > What is LDAP? Will familiarity with it be relevant to its supposed ease
    > > of use?
    >
    > LDAP is the driving force behind my whole concept. It is the
    > Lightweight Directory Access Protocol, used for accessing information
    > stored in corporate databases and directory trees, where information
    > is organised in a hierarchical form.

    So you're hoping to gain performance by accelerating a (network)
    protocol driven data structure? Do these map in a natural way to
    ordinary data structures that don't impose an additional performance
    hit?

    > > > [...] and as easily as manipulating data in a
    > > > spreadsheet, and I'll stop work on PROSE right away.
    > >
    > > Well, the MSVC debugger has a fairly sophisticated set of facilities for
    > > watching data and manipulating data on the fly as your program executes.
    > > (As did the WATCOM C/C++ and Borland debuggers before it.)
    >
    > (scratches head) but I wasn't talking about debugging ... that's
    > another subject altogether.
     
    I was just associating visualization of data structures with the way a
    debugger will typically do so for you.
     
    > > > [...] A language that
    > > > can be as simple to use as AWK, but can support large distributed
    > > > programming projects like C++ and Java, and can - if you choose - be
    > > > nothing more than a collection of associated objects and
    > > > "side-effects" with no "procedural" code at all. Hell, you can even
    > > > code in a "flow-chart" fashion if you fancy it.
    > >
    > > Is this a solution in search of a problem?
    >
    > Not at all. Side-effects are another crucial part of the language.
    > Take the tree structure example I gave earlier. Add a node to one
    > part of the tree and you might be creating a new method. Add a node
    > to a different part of the tree, and you might be creating a new file
    > in a filesystem. And another place, you could be adding a new host to
    > a DNS zone. It is basically bringing the UNIX SVR4 concept of
    > "virtual filesystems" into use by a programming language, where
    > different mount points (or branches in the hierarchy) may be managed
    > by different device drivers, and operations such as add and delete are
    > passed to back-end library routines to perform whichever operation is
    > appropriate for that context.
     
    Zzzzz ... you lost me a long time ago. This does not count as a
    "catch". What you wrote above is why non-programmers hate people like
    us.
      
    > Yes your argument is sound for the example you give. However, int and
    > intm are treated as two entirely separate types. There is nothing
    > magic in PROSE that says "intm" is like "int" but with arbitrary
    > precision. If I start making up magic rules like that, the language
    > certainly will get complicated, and more prohibitive for new user
    > defined types.

    Well, the problem is that your audience won't necessarily care about or
    understand the distinction between int and intm. By exposing both
    types, you inherit C's problem that smaller integers wrap around at a
    different point than larger ones.
     
    > Writing your example as objects and methods,
    >
    > (intm)X = 42 + (int)A
    >
    > would become something like this:
    >
    > X.assign(
    >     42.add(
    >         A.convert(
    >             typeof(int)
    >         )
    >     ).convert(
    >         typeof(intm)
    >     )
    > )

    Wait, so you are assuming "42" is of type int instead of intm? Does
    this mean that the end-user cannot specify inline constants > 32bits?
    Or do different constants have different types?
     
    > > > > > So ... where do I find the complete list of the 2's complement integer
    > > > > > "anomalies" you speak of, and how do you suggest I get around them and
    > > > > > solve that particular issue?
    > > > >
    > > > > Well the -0x80000000 = 0x80000000 thing is really the only one. :)
    > > >
    > > > Ah, well, there we go :-) In which case, you've made a very good
    > > > argument for me to keep int and intm separate, thanks! :-D
    > >
    > > Right, but why not drop int altogether?
    >
    > We're going round in circles here. 1) Performance.

    Which you won't have ...

    > [...] 2) What if I need to call system APIs? They can't work with MP ints.

    I see. So making sure your language can speak directly to protocols
    with its primitive types is important? Library calls, or just the
    overhead of making calls in general, are not where the performance in
    most applications goes.

    > > Well ok, I'm not interested in type pre-declared languages.
    >
    > Paul ... Paul ... you're not paying attention. I state clearly in the
    > spec that you do not need to declare variables before they are used.

    Then you have two different kinds of variables? Let's say you want to
    do the following:

       intm X = Y;

    where Y has no declared type. If Y's type cannot be determined at
    compile time, is this a compile time error? If not, then do you eat
    the overhead of type checking at runtime?

    --
    Paul Hsieh
    http://www.pobox.com/~qed/
    http://bstring.sf.net/
    
