Re: Towards faster Python implementations - theory



John Nagle wrote:

Modifying "at a distance" is exactly what I'm getting at. That's the
killer from an optimizing compiler standpoint. The compiler, or a
maintenance programmer, looks at a block of code, and there doesn't seem
to be anything unusual going on. But, if in some other section of
code, something does a "setattr" to mess with the first block of code,
something unusual can be happening. This is tough on both optimizing
compilers and maintenance programmers.

Agreed. It's tempting to look at some code and say which types one
thinks are being manipulated, but the actual program behaviour can be
quite different. Adding explicit type declarations "anchors" the names
to a very restrictive set of types - typically those permitted through
interfaces or inheritance in many object-oriented languages - and the
challenge, as we've already discussed, is to attempt to "anchor" the
names when they are in a sense "floating free" without any explicit
annotations from the programmer.

Python also presents other challenges. Unlike systems programming
languages, almost nothing is a local operation: each Python function
is mostly a product of function calls and the occasional conditional
or looping construct, themselves wired up using yet more function
calls. Whilst the expection is that most of these will return sensible
results (eg. a call to the iter method on a sequence returns an
iterator), there isn't any guarantee that this will be the case, and
even then the precise iterator involved can vary. Some might claim
that the class of operations which could be done locally are precisely
the ones which would benefit from identification and optimisation,
namely those involving primitive types, yet some pretty good solutions
for such cases exist already: Pyrex, various numeric packages, and so
on.

Python has that capability mostly because it's free in an
"everything is a dictionary" implementation. ("When all you have
is a hash, everything looks like a dictionary".) But that limits
implementation performance. Most of the time, nobody is using
"setattr" to mess with the internals of a function, class, or
module from far, far away. But the cost for that flexibility is
being paid, unnecessarily.

I've heard claims that virtual machines are evolving to make
dictionary-based access more performant, and I guess Microsoft's DLR
and any JVM improvements might point in that direction, but Shed Skin
shows the way by avoiding such things entirely.

I'm suggesting that the potential for "action at a distance" somehow
has to be made more visible.

One option might be a class "simpleobject", from which other classes
can inherit. ("object" would become a subclass of "simpleobject").
"simpleobject" classes would have the following restrictions:

- New fields and functions cannot be introduced from outside
the class. Every field and function name must explicitly appear
at least once in the class definition. Subclassing is still
allowed.

I suppose one could mandate that only methods defined inside class
blocks belong to the class and subclasses, and that various kinds of
optimisations can then be applied to the code in those methods such
that the self parameter's type is constrained in a way probably
already imposed by CPython. This would forbid the adding of functions
to classes and instances after the fact, but in my fairly conservative
world of programming, I hardly ever do such things. I do add
attributes to instances outside those instances (by the above
definition) quite often, though.

- Unless the class itself uses "getattr" or "setattr" on itself,
no external code can do so. This lets the compiler eliminate the
object's dictionary unless the class itself needs it.

I think this is a natural progression from determining the access
needs of particular classes and their objects.

This lets the compiler see all the field names and assign them fixed slots
in a fixed sized object representation. Basically, this means simple objects
have a C/C++ like internal representation, with the performance that comes
with that representation.

Yes, I agree that this would be desirable.

With this, plus the "Shed Skin" restrictions, plus the array features of
"numarray", it should be possible to get computationally intensive code
written in Python up to C/C++ levels of performance. Yet all the dynamic
machinery of Python remains available if needed.

All that's necessary is not to surprise the compiler.

One thing I had cause to think about again today was the cost of
function invocations. Python not only has pervasive dynamic dispatch,
but there are also such things as *args and **kwargs and the cost of
dealing with them. However, such things are very useful in integrating
components and extending class hierarchies, so I wouldn't want to see
them vanish in their entirety.

Paul

.



Relevant Pages

  • Re: Why I love python.
    ... Python needs drastic performance improvement if it is to scrap-off the ... How can the compiler know what code to produce? ... A different approach would be for the programmer to *suggest* what ... Notice that in such a case the decoration does not alter ...
    (comp.lang.python)
  • Re: Python from Wise Guys Viewpoint
    ... > one of these statements about functions in Python are true: ... Currently, the best optimizing compiler for Python is Psyco, the ...
    (comp.lang.python)
  • Re: Python from Wise Guys Viewpoint
    ... > one of these statements about functions in Python are true: ... Currently, the best optimizing compiler for Python is Psyco, the ...
    (comp.lang.lisp)
  • Re: Needless copying in iterations?
    ... compiler can't know this isn't a pathological case. ... 'there's no point optimizing the common case of printing ints ... It seems to me that the "consenting adults" philosophy of Python doesn't ... alist.append# side-effects DON'T matter ...
    (comp.lang.python)
  • Re: New to Python - Compiled Language?
    ... Python isn't a compiled language in the sense of native code ... When I say compiled language, I do mean a language like C++ which is ... transformed into machine code by a compiler. ... I'm a video game programmer and the applications I'm hoping to use ...
    (comp.lang.python)