Re: "STL from the Ground Up"



Michael wrote:
On 2008-02-15 13:38:54 +1000, Jon Harrop <usenet@xxxxxxxxxxxxxx> said:
For one, yes. There are many other benefits as well like safety, a
high-level intermediate language than can interoperate with many other
languages, run-time compilation and so on.

Yes, it provides safety in the face of incorrect programming and subtle
flaws. In somecases this is usefull. However it is usually more
agreeable to have well written code in the first place.

If your language lacks expressive features then you cannot write code well
in it. I give some examples of this below. Maybe you can tweak my C++ but
it will never rival the OCaml/F# code that it more powerful in every case.

Intermediate languages (i assume your refering to Java Bytecode &
others?) are not features of a language.

I'm not familiar with Java but both C# and F# integrate language features
solely to let you generate, manipulate and (in particlar) compile and run
intermediate (MSIL) code.

It is perfectly acceptable for
java, C#, and your lovely languages to be directly compiled into
processor native instructions.

How could you provide run-time code generation, reflection, platform
independence?

Vice versa, C, C++ and even fortran 77 could be compiled to the
intermediate languages.

Yes, they only require a subset of the functionality.

(I do concede that
these compilers aren't common or even existant. I assume that if the
need was sufficient the task of writing such would be undertaken.)

I have actually done that in order to translate numerical libraries written
in Fortran into .NET assemblies for easy consumption by .NET languages.

Run time compilation? Is a feature of an IDE - XCode can be set to
perform run time behind the scenes compilation of c, c++, objective-c
and java. (At last reference many months ago, the feature isn't likely
to have been dropped).

That is not what I was referring to. I mean the ability to translate
expressions into native code and evaluate them on-the-fly all within your
own program. Nothing to do with the IDE.

Usually a language that provides molly coddling is not fast.

Our products got a lot faster.

Faster than a comparable compiled c/c++ program. Or faster in
comparison to prior versions of the same program?

Same program is 5x faster in OCaml than C++. That's a 250kLOC commercial
application, not a toy:

http://www.ffconsultancy.com/products/smoke_vector_graphics/?cp

My work was impossible in C++ five years ago and I'm quite sure it will
be impossible in C++0x. They are decades behind the current state of
the art...

In what areas are they behind?

Off the top of my head:

. Static checking.
. Encapsulation.
How?

Functors, for example, provide statically-checked parameterization of
modules over modules. Functors are part of the ML module system and
constitute and entire sublanguage. C++ has nothing comparable.

. Pattern matching.
Regex, yes? Usually seen as a library in anything bellow a scripting
language.

I was referring to pattern matching over algebraic data types but regular
expressions are another good example: they run much faster if they're
run-time compiled to native code.

. Type inference.
Hmm, RTTI

In OCaml or F#, you write:

let f(a, (b, c)) = ((a, b), c)

The compiler infers the types of all subexpressions for you and makes them
available to you via graphical throwback in the GUI. The type inferred by
the F# compiler is:

val f : 'a * ('b * 'c) -> ('a * 'b) * 'c

In C++, you must write something grotesque to do the same trivial task,
like:

#include <utility>

using namespace std;

template<class A, class B, class C>
const pair<pair<A, B>, C> f(const pair<A, pair<B, C> > &x)
{
return pair<pair<A, B>, C>(pair<A, B>(x.first, x.first.second.first),
x.second.second);
}

This is the main reason why OCaml and F# typically require 1/4 as much code
to do the same thing. Note that this is not dynamic typing and, in
particular, there is absolutely no run-time performance cost in F#.

. Interoperability.
I can't think of anything more common than c. Most OS's are written in
it, and export to it.

Provided you're only talking to the OS that's fine but few applications are
written in C. F# code can interoperate with C, C++, C#, VB and Excel with a
high-level interface that is both type safe and garbage collected.

. Run-time compilation.
IDE. Look at XCode, i'm sure there are others.

Nothing to do with the IDE. Look at MetaOCaml, EVAL in Lisp, LLVM or the
MSIL code emitter in .NET for examples of run-time compilation.

. Marshalling.
This sounds like RPC and related. See Pattern Matching.

These languages use uniform internal representations of data structures.
This allows these data structures to be marshalled to file, disk or over a
network (also called pickling and serialization).

. Compile times.
I would rather a slightly longer compile vs run time.

Our compile times dropped from 24hrs with g++ to only 30secs with ocamlopt.
As I said, our run times became 5x faster as well.

. Optimizations.
hmm, This would be compiler dependant. If i'm not wrong most
optimisation are micro and applied to an intermediate language between
input and output. The best place to look for optimisations would be in
a multi language compiler.

There is a lot of commonality, like closures, currying and pattern matching.
When the OCaml compiler is given a complicated nested pattern match it
decomposes it into decision trees compiled the fewest possible comparisons
and dispatches. In contrast, C++ compilers only generate flat sequences of
comparisons (worst case performance) or virtual function dispatch that only
works for a single level of run-time dispatch.

This is why OCaml programs that make heavy use of complicated pattern
matches can be orders of magnitude faster than C++ programs. C++ compilers
have no hope of performing such optimizations because the information is
not available to them. In particular, the C++ language does not even
support closed sum types.

. Concurrency.
Wow, threads (and synchronisers).

And software transactional memory, asynchronous workflows etc.

Again usually in libraries (re
Marshalling & PMatching). Many languages don't support concurrency as
primitives because of the many approaches to achieve it. You would be
limiting the code's portability.

These languages are all more portable than C++ IME.

. Parallelism.
Yes, similar arguement as with concurrency. It is best to let the
compiler decide, or the system decide.

The compiler's hands are tied unless the language exposes room for
parallelism.

. Structural typing.
Probably a given, you've found a hole i my knowledge.

OCaml can infer the equivalent of entire class hierarchies, completely
removing the need to define them in your code.

Consider computing the symbolic derivative of an expression in OCaml:

let rec d e x = match e with
| `Int _ -> `Int 0
| `Add(f, g) -> `Add(d f x, d g x)
| `Mul(f, g) -> `Add(`Mul(f, d g x), `Mul(g, d f x))
| `Var v -> `Int (if v=x then 1 else 0);;

The OCaml compiler infers the type:

val d :
([< `Add of 'a * 'a | `Int of 'b | `Mul of 'a * 'a | `Var of 'c ]
as 'a) ->
'c -> ([> `Add of 'd * 'd | `Int of int | `Mul of 'a * 'd ] as 'd)

That is equivalent to an entire class hierarchy with derived types for
integers, variables, addition and multiplication operators in the
expression type. Moreover, the type inferred by OCaml proves that this
function removes all variables (there is no `Var in the return type of this
function).

The nearest C++ equivalent is not only hideous but it can't even handle
common subexpressions because it lacks a garbage collector to determine
when an expression becomes unreferenced:

using namespace std;

class Var;

class Base {
public:
virtual ~Base() {};
virtual const Base *clone() = 0;
virtual const Base *d(const string &v) const = 0;
virtual ostream &print(ostream &o) const = 0;
};

ostream &operator<<(ostream &o, const Base &e) { e.print(o); return o; }

class Int : public Base {
const int n;
public:
Int(int m) : n(m) {}
~Int() {}
const Base *clone() { return new Int(n); }
const Base *d(const string &v) const { return new Int(0); }
ostream &print(ostream &o) const { return o << n; }
};

class Var : public Base {
const string var;
public:
Var(string v) : var(v) {}
~Var() {}
const Base *clone() { return new Var(var); }
const Base *d(const string &v) const { return new Int(var == v ? 1 : 0); }
ostream &print(ostream &o) const { return o << var; }
};

class Plus : public Base {
const Base *e1, *e2;
public:
Plus(const Base *a, const Base *b) : e1(a), e2(b) {}
~Plus() { delete e1; delete e2; }
const Base *clone() { return new Plus(e1, e2); }
const Base *d(const string &v) const { return new Plus(e1->d(v),
e2->d(v)); }
ostream &print(ostream &o) const
{ return o << "(" << *e1 << " + " << *e2 << ")"; }
};

class Times : public Base {
const Base *e1, *e2;
public:
Times(const Base *a, const Base *b) : e1(a), e2(b) {}
~Times() { delete e1; delete e2; }
const Base *clone() { return new Times(e1, e2); }
const Base *d(const string &v) const
{ return new Plus(new Times(e1, e2->d(v)), new Times(e1->d(v), e2)); }
ostream &print(ostream &o) const { return o << "(" << *e1 << " * " << *e2
<< ")"; }
};

class Expr {
public:
Base *e;
Expr(Base *a) : e(a) {}
};

const Expr operator+(const Expr e1, const Expr e2)
{ return Expr(new Plus(e1.e->clone(), e2.e->clone())); }
const Expr operator*(const Expr e1, const Expr e2)
{ return Expr(new Times(e1.e->clone(), e2.e->clone())); }

. Recursion.
Most languages support recursion, C and C++ do. Are you instead
refering to thier optimisation?

Tail recursion, call with current continuation and continuation passing
style. C++ provides none of these.

. Closures.
Probably a given, you've found a hole i my knowledge.

Closures are function values that can capture variables from their
environment. Closures require garbage collection to be implemented in a
usable form because captured variables must be kept alive for at least the
lifetime of the closure, which may outlive the scope where it is defined
because functions can return closures. Indeed, that's exactly what the "d"
function above did.

This has an adverse effect on lots of practical applications like parsing
and the manipulation of trees.

it does? I never knew using a library would be wrong. That is the usual
method for gaining access to complex methods, without having to write
them ground up.

Libraries can't work around all missing language features. Say you use flex
and bison to write a parser given a BNF grammar. You might want to save
memory by hash consing subexpressions (automatically sharing all identical
subexpressions) but that turns your abstract syntax tree into a graph. To
deallocate it you must now start doing graph traversals, which means you're
literally writing a garbage collection.

In practice, if you don't already have garbage collection then you limit
yourself by not using things like hash consing and bloating all of your
graphs into trees by copying data to avoid recombinant data structures.
This is why parsers written in C/C++ are typically many times slower than
parsers written in languages like OCaml. This is part of what OCaml was
bred for, so that isn't surprising.

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?u
.



Relevant Pages

  • Re: Verbose functional languages?
    ... I wasn't referring to laziness. ... I think this is another serious problem with Haskell (and OCaml). ... Much of its needs to be in the language or at least have enough support ... if the compiler unboxes them for you by way of optimization. ...
    (comp.lang.functional)
  • Re: "STL from the Ground Up"
    ... high-level intermediate language than can interoperate with many other ... If your language lacks expressive features then you cannot write code ... memory management in comparison. ... Mostly because type errors mean that the programmer and compiler disagree ...
    (comp.programming)
  • Some Straight Dope
    ... His program is currently 303 lines long and still much slower and more verbose than the fastest implementations in the other languages. ... but OCaml eventually comes out being eight times slower than ... and suitable to do a language comparison of any sort. ... Some people say that advances in compiler technology roughly contribute a doubling in efficiency every 18 years. ...
    (comp.lang.functional)
  • Re: About alternatives to Matlab
    ... Python will almost certainly be notably slower than moderately ... the powerful and flexible Python language. ... particularly that an OCaml user can easily switch back and forth ... The knowledge is there for the compiler to use but I don't believe any of ...
    (comp.lang.python)
  • Re: About alternatives to Matlab
    ... Python will almost certainly be notably slower than moderately ... the powerful and flexible Python language. ... particularly that an OCaml user can easily switch back and forth ... The knowledge is there for the compiler to use but I don't believe any of ...
    (comp.lang.python)