Re: GMP vs. straight C arithmetic

From: Mark R.Bannister (Chapter33_at_aol.com)
Date: 03/13/04


Date: 13 Mar 2004 08:30:46 -0800

qed@pobox.com (Paul Hsieh) wrote in message news:<MPG.1abb0c785be61020989792@news.sf.sbcglobal.net>...
> Chapter33@aol.com says...
> > qed@pobox.com (Paul Hsieh) wrote:
> > > Chapter33@aol.com (Mark R.Bannister) wrote:
> > I'm not convinced your argument applies in this case. I have no
> > intention to make PROSE bytecode mirror instructions that you would
> > expect to send to a chip. They are higher level instructions than
> > that. One opcode in the PROSE world might be written as 40 or 50+
> > opcodes in the microprocessor world.
>
> Well, you're on the right track, but if you provide 32 bit integers types --
> well think about it. What kind of higher level opcodes are you planning to
> have that affect types as small as these which are not dominated by the
> bytecode interpreter overhead?

The advantage comes when using higher-level structures that are not
available in lower-level languages or assembly. Naturally, if you're
doing basic mathematics with 32 bit integer types, there's not a lot I
can do to reduce the bytecode interpreter overhead. However, working
with complex objects, stacks, autosizeable matrix arrays and tree
structures (all typeless collections) will be easy and fast. In the
world of C or microprocessing, you have to write a whole bunch of
supporting code for this kind of data manipulation. Where as in
PROSE, one add operation (for example) can be used for creating a new
node in a tree structure using an optimised library back-end that
performs probably better than anyone else's implementations of tree
structures -- because PROSE is all about this and we've simply put
more man hours into making it go fast. One add instruction, to the
right node in a hierarchy, introduces a new local function -- as
another example (dynamic methods). A single instruction performing a
lot of crucial work.

> > > Given the kind of background I have, one of my goals was to make a
> > > *scripting* language that has performance comparable to C *WITHOUT*
> > > using JIT. I would say on this score I have some very compelling
> > > ideas that I think even if not successful would easily leave any
> > > existing scripting languages in the dust.
> >
> > Why *without* using JIT, in particular?
>
> Because JIT requires enormous resources to implement. If you want to leverage
> this, then why not just use the Java bytecode? Java already is within striking
> distance of C's performance.

Not in my experience it isn't. How does Perl do it? I thought I'd
read somewhere that Perl compiles a script before running it?

>
> > <snip>
> > > What's the big "catch" in your language? I couldn't think of one for
> > > mine, so I stopped pursuing my ideas. This is what I was looking for
> > > when I was reading your specs. This is what makes people care about a
> > > language, and its what's I want to glom my ideas onto.
> >
> > The "catch" is all in the hierarchy. You name me a single language
> > available today that allows me, as a developer, to work in a
> > programmable framework with the same flexibility as you can work with
> > data in LDAP over a network,
>
> What is LDAP? Will fammiliarity with it be relevant to its supposed ease of
> use?

LDAP is the driving force behind my whole concept. It is the
Lightweight Directory Access Protocol, used for accessing information
stored in corporate databases and directory trees, where information
is organised in a hierarchical form. Every object in a Directory
Information Tree (DIT) is a member of a number of classes defining how
the object behaves, and is made up of a number of attributes. LDAP is
a well-established internet standard used world-wide, and a great tool
for fast data access over a network. The latest specification is for
LDAP v3 and defined in RFC3377. I envisage accessing the tree
structures of many running PROSE daemons over the network in a similar
fashion to LDAP.

>
> > [...] and as easily as manipulating data in a
> > spread***, and I'll stop work on PROSE right away.
>
> Well, the MSVC debugger has a fairly sophisticated set of facilities for
> watching data and manipulating data on the fly as your program executes. (As
> did the WATCOM C/C++ and Borland debuggers before it.)

(scratches head) but I wasn't talking about debugging ... that's
another subject altogether.

>
> > [...] A language that
> > can be as simple to use as AWK, but can support large distributed
> > programming projects like C++ and Java, and can - if you choose - be
> > nothing more than a collection of associated objects and
> > "side-effects" with no "procedural" code at all. Hell, you can even
> > code in a "flow-chart" fashion if you fancy it.
>
> Is this a solution in search of a problem?

Not at all. Side-effects are another crucial part of the language.
Take the tree structure example I gave earlier. Add a node to one
part of the tree and you might be creating a new method. Add a node
to a different part of the tree, and you might be creating a new file
in a filesystem. And another place, you could be adding a new host to
a DNS zone. It is basically bringing the UNIX SVR4 concept of
"virtual filesystems" into use by a programming language, where
different mount points (or branches in the hierarchy) may be managed
by different device drivers, and operations such as add and delete are
passed to back-end library routines to perform whichever operation is
appropriate for that context.

>
> > > > Your comments re int and 2's complement "anomalies" are interesting.
> > >
> > > Here's the core case you need to worry about:
> > >
> > > (intm)X = (int)A + (int)B
> > >
> > > If the add causes a wrap around in the int domain (but obviously not
> > > in the intm domain) then what should happen here? Its not just a
> > > question of specifying it -- it a question of what do you think a user
> > > of your language should expect *WITHOUT* reading all your language
> > > documentation?
> >
> > Well, to me, that says --- cast A to an int, cast B to an int, add
> > them, cast the result to an intm and store in X.
>
> That's not obvious to me. Why wouldn't the compiler pick the largest type and
> cause the natural casting to propogate downwards as necessary? In such a case
> you wouldn't have to worry about wrapping.
>
> For example, what if someone writes:
>
> (intm)X = 42 + (int)A;
>
> What type is 42? Is it intm or is it int? If you use a top down recursive
> natural casting inheritance, it would first case the 42 and (int)A to intm.
>
> In other words, only if all the integers are (int) or indeterminant should you
> allow for the shorter 32 bit operations. This would have the fewest unexpected
> anomolies while still allowing for the kind of performance or accuracy the
> programmer might expect.

Yes your argument is sound for the example you give. However, int and
intm are treated as two entirely separate types. There is nothing
magic in PROSE that says "intm" is like "int" but with arbitrary
precision. If I start making up magic rules like that, the language
certainly will get complicated, and more prohibitive for new user
defined types.

Writing your example as objects and methods,

  (intm)X = 42 + (int)A

would become something like this:

  X.assign(
    42.add(
      A.convert(
        typeof(int)
      )
    ).convert(
      typeof(intm)
    )
  )

Now, I need rules that can be applied without exception to all types.
So consider:

  (string)X = 42 + (int)A

The assignment type cannot play any role until an expression has been
parsed, otherwise, the above will not result in the addition of two
integers, because following your rules the add operation would become
a string concatenation, which is certainly no good. The constant 42
is defined as a fixed integer constant in the specification. If you
want an MP constant, you'd have to use 42M, or "42" for a string
constant.

>
> > > > So ... where do I find the complete list of the 2's complement integer
> > > > "anomalies" you speak of, and how do you suggest I get around them and
> > > > solve that particular issue?
> > >
> > > Well the -0x80000000 = 0x80000000 thing is really the only one. :)
> >
> > Ah, well, there we go :-) In which case, you've made a very good
> > argument for me to keep int and intm separate, thanks! :-D
>
> Right, but why not drop int altogether?

We're going round in circles here. 1) Performance. 2) What if I
need to call system APIs? They can't work with MP ints.

>
> > [...]
> >
> > typeset int(-32) my_var1 # signed integer 32 bits precision
> > typeset intm my_var2 # MP int (arbitrary precision)
> > typeset float(32) my_var3 # fixed-precision float (32 bits)
> > typeset floatm(128) my_var4 # MP float (min. 128 bits)
>
> Well ok, I'm not interested in type pre-declared languages. You will never get
> your language into the anything remotely close to the realm of "easy to use
> languages" so long as you retain rigid variable declarations. The alternatives
> of dynamically typed or implicitely (where the compiler deduces your types)
> typed languages are far superior in terms of ease of use. You ought to at
> least try out languages like Python or Lua to see for yourself.

Paul ... Paul ... you're not paying attention. I state clearly in the
spec that you do not need to declare variables before they are used.
This is entirely optional. *If* you declare them, then you have the
added advantage of knowing exactly what type you're dealing with, and
forcing assignments to the variable to be converted to an explicit
type. If you don't declare them, then their type will change to fit
whatever type you happen to assign to them.

Keep shooting :-)
Mark.


Loading