Re: "STL from the Ground Up"



On 2008-02-16 11:28:26 +1000, Jon Harrop <usenet@xxxxxxxxxxxxxx> said:

Michael wrote:
On 2008-02-15 13:38:54 +1000, Jon Harrop <usenet@xxxxxxxxxxxxxx> said:
For one, yes. There are many other benefits as well like safety, a
high-level intermediate language than can interoperate with many other
languages, run-time compilation and so on.

Yes, it provides safety in the face of incorrect programming and subtle
flaws. In somecases this is usefull. However it is usually more
agreeable to have well written code in the first place.

If your language lacks expressive features then you cannot write code well
in it. I give some examples of this below. Maybe you can tweak my C++ but
it will never rival the OCaml/F# code that it more powerful in every case.

Expressiveness lets you write code faster, that i find no fault with. The fault i find, is allowing programmers to write poor code because they can. We should be encouraging good code.


Intermediate languages (i assume your refering to Java Bytecode &
others?) are not features of a language.

I'm not familiar with Java but both C# and F# integrate language features
solely to let you generate, manipulate and (in particlar) compile and run
intermediate (MSIL) code.

Ah you were refering to self-writing applications? My fault i assumed you were refering to just-in-time compilation.


It is perfectly acceptable for
java, C#, and your lovely languages to be directly compiled into
processor native instructions.

How could you provide run-time code generation, reflection, platform
independence?

To the last - That i believe was the reason for source-code. It was meant to be (but so rarely is) platform independant. From here the trade up goes from strict interpretation (maximally portable), to maximum speed (portable to exact machine copies only).

Runtime code generation can be effected by embedding a compiler/interpreter.

To support reflection you make a compiler embbed type and functional information in the compiled program. If you want it complete for libraries too, you'll need direct support for libraries, or allow programmatic hooks into the reflection system.


Vice versa, C, C++ and even fortran 77 could be compiled to the
intermediate languages.

Yes, they only require a subset of the functionality.

Again i believed you meant Just in time compilation.

All programs can be reduced to a series of minimaly turing complete operations. They'd undoubtedly only require a subset of a highorder languages expressive ability. The trick is to identify the more expressive higher level constructs hidden in the spaghetti.

Accross all languages some will have features directly accessable, others must describe an implementation of them.


(I do concede that
these compilers aren't common or even existant. I assume that if the
need was sufficient the task of writing such would be undertaken.)

I have actually done that in order to translate numerical libraries written
in Fortran into .NET assemblies for easy consumption by .NET languages.

Run time compilation? Is a feature of an IDE - XCode can be set to
perform run time behind the scenes compilation of c, c++, objective-c
and java. (At last reference many months ago, the feature isn't likely
to have been dropped).

That is not what I was referring to. I mean the ability to translate
expressions into native code and evaluate them on-the-fly all within your
own program. Nothing to do with the IDE.

I see what you mean. You would have to be careful with the generation of such expressions though. If what i understand is correct in thier limited form they could report on information that shouldn't be reported (imagine a game player looking at the opponents cash), in a more dangerous form they'd rewrite internal logic permanetly.

Mind you i am using c++ that hands you the loaded gun, aims it, and gives you a nerve steadying drink...


Usually a language that provides molly coddling is not fast.

Our products got a lot faster.

Faster than a comparable compiled c/c++ program. Or faster in
comparison to prior versions of the same program?

Same program is 5x faster in OCaml than C++. That's a 250kLOC commercial
application, not a toy:

http://www.ffconsultancy.com/products/smoke_vector_graphics/?cp

Nod.


My work was impossible in C++ five years ago and I'm quite sure it will
be impossible in C++0x. They are decades behind the current state of
the art...

In what areas are they behind?

Off the top of my head:

. Static checking.
. Encapsulation.
How?

Functors, for example, provide statically-checked parameterization of
modules over modules. Functors are part of the ML module system and
constitute and entire sublanguage. C++ has nothing comparable.

Cool


. Pattern matching.
Regex, yes? Usually seen as a library in anything bellow a scripting
language.

I was referring to pattern matching over algebraic data types but regular
expressions are another good example: they run much faster if they're
run-time compiled to native code.

Hmm, i find that they'd be faster if they were already compiled. Though in the context of the previous i believe you mean from an interpreted byte code (platform independant) which i'd have to give you.


. Type inference.
Hmm, RTTI

In OCaml or F#, you write:

let f(a, (b, c)) = ((a, b), c)

The compiler infers the types of all subexpressions for you and makes them
available to you via graphical throwback in the GUI. The type inferred by
the F# compiler is:

val f : 'a * ('b * 'c) -> ('a * 'b) * 'c

In C++, you must write something grotesque to do the same trivial task,
like:

#include <utility>

using namespace std;

template<class A, class B, class C>
const pair<pair<A, B>, C> f(const pair<A, pair<B, C> > &x)
{
return pair<pair<A, B>, C>(pair<A, B>(x.first, x.first.second.first),
x.second.second);
}

This is the main reason why OCaml and F# typically require 1/4 as much code
to do the same thing. Note that this is not dynamic typing and, in
particular, there is absolutely no run-time performance cost in F#.

I believe the first was a Compiler/IDE feature, the way you describe it. Unless of course your meaning is that the expression tree can be accessed programmatically internal to a program in which that line was written? If so that is handy, otherwise i'm lost, why do you need to know the type of an expression you already know as a programmer?


. Interoperability.
I can't think of anything more common than c. Most OS's are written in
it, and export to it.

Provided you're only talking to the OS that's fine but few applications are
written in C. F# code can interoperate with C, C++, C#, VB and Excel with a
high-level interface that is both type safe and garbage collected.

I see what you mean, but that is the point isn't it? The code interoperates at the lowest denominator (your correct in that c doesn't have to be the denominator). What i believe your refering to is the range of decent application bindings and interfaces. That is roughly the same as my using an objective binding to opengl, or praising boost for providing code that seemlessly works with the full features of c++.


. Run-time compilation.
IDE. Look at XCode, i'm sure there are others.

Nothing to do with the IDE. Look at MetaOCaml, EVAL in Lisp, LLVM or the
MSIL code emitter in .NET for examples of run-time compilation.

Definetly usefull.


. Marshalling.
This sounds like RPC and related. See Pattern Matching.

These languages use uniform internal representations of data structures.
This allows these data structures to be marshalled to file, disk or over a
network (also called pickling and serialization).

I see that this is usefull too. But what happens (and it will) when the internal representations change in the future? (re Java) A more durable solution would keep the language seperate from data representation.


. Compile times.
I would rather a slightly longer compile vs run time.

Our compile times dropped from 24hrs with g++ to only 30secs with ocamlopt.
As I said, our run times became 5x faster as well.

Granted quite a difference. I believe you are refering a c++/g++ to Ocaml/ocamlopt? I'd also like to inquire as to the compartive use of libraries?


. Optimizations.
hmm, This would be compiler dependant. If i'm not wrong most
optimisation are micro and applied to an intermediate language between
input and output. The best place to look for optimisations would be in
a multi language compiler.

There is a lot of commonality, like closures, currying and pattern matching.
When the OCaml compiler is given a complicated nested pattern match it
decomposes it into decision trees compiled the fewest possible comparisons
and dispatches. In contrast, C++ compilers only generate flat sequences of
comparisons (worst case performance) or virtual function dispatch that only
works for a single level of run-time dispatch.

This is why OCaml programs that make heavy use of complicated pattern
matches can be orders of magnitude faster than C++ programs. C++ compilers
have no hope of performing such optimizations because the information is
not available to them. In particular, the C++ language does not even
support closed sum types.

Hmm, i believe you are talking technique here and not language. To this a compiler can flatten, nest, break down, discard, genericise, inline, outline as much as it wants of a program - providing the end result is a program with the expected behaviour. Take a look a sql (and related database query languages).

On the otherhand looking at it in terms of language, i can only see the compiler being more effective due to it's native understanding of the description. Imagine a completely new feature Y, implement it in terms of a language as a description (not as a part of the language) and i wouldn't doubt that ocaml would be in a similar boat to c++.

I believed you were refering to optimisation applied to an expression/execution tree. In such cases written language isn't as important. In such a case similarly intellegent ocaml or c++ compilers would be effectively the same.


. Concurrency.
Wow, threads (and synchronisers).

And software transactional memory, asynchronous workflows etc.

Again usually in libraries (re
Marshalling & PMatching). Many languages don't support concurrency as
primitives because of the many approaches to achieve it. You would be
limiting the code's portability.

These languages are all more portable than C++ IME.

Okay poor point, most systems have settled out. Mind you there are different computer architectures that don't support the view of the humble desktop computer.


. Parallelism.
Yes, similar arguement as with concurrency. It is best to let the
compiler decide, or the system decide.

The compiler's hands are tied unless the language exposes room for
parallelism.

Not really. Compilers can take advantage of gaurantees in the language, eg encapsulation, dependancies, partial ordering.

Admittedly unless you take pains to write the algorithms to be friendly to such manipulation, your likely to end up with difficulties on naive compilers. The easiest way to provide a compiler with such information is to build it into the language. However that still doesn't address the core issue, that of the compiler understanding the program (not just the language it is written in).


. Structural typing.
Probably a given, you've found a hole i my knowledge.

OCaml can infer the equivalent of entire class hierarchies, completely
removing the need to define them in your code.

Consider computing the symbolic derivative of an expression in OCaml:

let rec d e x = match e with
| `Int _ -> `Int 0
| `Add(f, g) -> `Add(d f x, d g x)
| `Mul(f, g) -> `Add(`Mul(f, d g x), `Mul(g, d f x))
| `Var v -> `Int (if v=x then 1 else 0);;

The OCaml compiler infers the type:

val d :
([< `Add of 'a * 'a | `Int of 'b | `Mul of 'a * 'a | `Var of 'c ]
as 'a) ->
'c -> ([> `Add of 'd * 'd | `Int of int | `Mul of 'a * 'd ] as 'd)

That is equivalent to an entire class hierarchy with derived types for
integers, variables, addition and multiplication operators in the
expression type. Moreover, the type inferred by OCaml proves that this
function removes all variables (there is no `Var in the return type of this
function).

The nearest C++ equivalent is not only hideous but it can't even handle
common subexpressions because it lacks a garbage collector to determine
when an expression becomes unreferenced:

You have a point there, the technique is usefull like closures. On the other hand that is a historical decision in the c type system to go for declaration based types, as opposed to implicit typing.

I think this is more a case of opinion/preference. I certainly want structures that are shared between modules to be declared. That way everyone can see what the structure is. On the other hand internal to a module structural typeing may have an application.


using namespace std;

class Var;

class Base {
public:
virtual ~Base() {};
virtual const Base *clone() = 0;
virtual const Base *d(const string &v) const = 0;
virtual ostream &print(ostream &o) const = 0;
};

ostream &operator<<(ostream &o, const Base &e) { e.print(o); return o; }

class Int : public Base {
const int n;
public:
Int(int m) : n(m) {}
~Int() {}
const Base *clone() { return new Int(n); }
const Base *d(const string &v) const { return new Int(0); }
ostream &print(ostream &o) const { return o << n; }
};

class Var : public Base {
const string var;
public:
Var(string v) : var(v) {}
~Var() {}
const Base *clone() { return new Var(var); }
const Base *d(const string &v) const { return new Int(var == v ? 1 : 0); }
ostream &print(ostream &o) const { return o << var; }
};

class Plus : public Base {
const Base *e1, *e2;
public:
Plus(const Base *a, const Base *b) : e1(a), e2(b) {}
~Plus() { delete e1; delete e2; }
const Base *clone() { return new Plus(e1, e2); }
const Base *d(const string &v) const { return new Plus(e1->d(v),
e2->d(v)); }
ostream &print(ostream &o) const
{ return o << "(" << *e1 << " + " << *e2 << ")"; }
};

class Times : public Base {
const Base *e1, *e2;
public:
Times(const Base *a, const Base *b) : e1(a), e2(b) {}
~Times() { delete e1; delete e2; }
const Base *clone() { return new Times(e1, e2); }
const Base *d(const string &v) const
{ return new Plus(new Times(e1, e2->d(v)), new Times(e1->d(v), e2)); }
ostream &print(ostream &o) const { return o << "(" << *e1 << " * " << *e2
<< ")"; }
};

class Expr {
public:
Base *e;
Expr(Base *a) : e(a) {}
};

const Expr operator+(const Expr e1, const Expr e2)
{ return Expr(new Plus(e1.e->clone(), e2.e->clone())); }
const Expr operator*(const Expr e1, const Expr e2)
{ return Expr(new Times(e1.e->clone(), e2.e->clone())); }

Yes it's ugly. Most ugly and/or complex code gets shuffled into libraries for a reason. What till you require newer features in ocaml that have yet to be introduced/standardised.


. Recursion.
Most languages support recursion, C and C++ do. Are you instead
refering to thier optimisation?

Tail recursion, call with current continuation and continuation passing
style. C++ provides none of these.

Tail recursion i believe is a compiler optimisation (though yes it can also be an explicit feature of a language). I have to agree that c++ doesn't provide current continuation or continuation passing. Though if one wanted them, an enterprising individual could emulate them. (Probably comparable to c programmers using function pointer tables to emulate objects)


. Closures.
Probably a given, you've found a hole i my knowledge.

Closures are function values that can capture variables from their
environment. Closures require garbage collection to be implemented in a
usable form because captured variables must be kept alive for at least the
lifetime of the closure, which may outlive the scope where it is defined
because functions can return closures. Indeed, that's exactly what the "d"
function above did.

For a closure a GC isn't strictly necessary. Resource management like smart pointers/references/handles/copies can be used where appropriate. Admittedly this requires more ground work than a gc version.


This has an adverse effect on lots of practical applications like parsing
and the manipulation of trees.

it does? I never knew using a library would be wrong. That is the usual
method for gaining access to complex methods, without having to write
them ground up.

Libraries can't work around all missing language features. Say you use flex
and bison to write a parser given a BNF grammar. You might want to save
memory by hash consing subexpressions (automatically sharing all identical
subexpressions) but that turns your abstract syntax tree into a graph. To
deallocate it you must now start doing graph traversals, which means you're
literally writing a garbage collection.

Yes i agree. Libraries aren't silver bullets. Mind you, how memory is managed depends upon what you need. GCs aren't bad, they just aren't good. They are wonderful when the only resource to be cleaned is memory. To often though deletion requires side effects to occur, messages, i/o port closeing, alteration of memory in a live object. And not as often but still, we'd like the effects to occur soon and not never.

I personally prefer to choose the management style based upon the problem.


In practice, if you don't already have garbage collection then you limit
yourself by not using things like hash consing and bloating all of your
graphs into trees by copying data to avoid recombinant data structures.

Reference Counting. Also who says we must not use tailored memory management? We know beforehand often enough which trees will share nodes, we can tailor our approach to increase efficiency.

This is why parsers written in C/C++ are typically many times slower than
parsers written in languages like OCaml. This is part of what OCaml was
bred for, so that isn't surprising.

I agree, custom make usually beats generic make hands down.

.



Relevant Pages

  • Re: Verbose functional languages?
    ... I wasn't referring to laziness. ... I think this is another serious problem with Haskell (and OCaml). ... Much of its needs to be in the language or at least have enough support ... if the compiler unboxes them for you by way of optimization. ...
    (comp.lang.functional)
  • Re: "STL from the Ground Up"
    ... high-level intermediate language than can interoperate with many other ... If your language lacks expressive features then you cannot write code ... memory management in comparison. ... Mostly because type errors mean that the programmer and compiler disagree ...
    (comp.programming)
  • Some Straight Dope
    ... His program is currently 303 lines long and still much slower and more verbose than the fastest implementations in the other languages. ... but OCaml eventually comes out being eight times slower than ... and suitable to do a language comparison of any sort. ... Some people say that advances in compiler technology roughly contribute a doubling in efficiency every 18 years. ...
    (comp.lang.functional)
  • Re: About alternatives to Matlab
    ... Python will almost certainly be notably slower than moderately ... the powerful and flexible Python language. ... particularly that an OCaml user can easily switch back and forth ... The knowledge is there for the compiler to use but I don't believe any of ...
    (comp.lang.python)
  • Re: About alternatives to Matlab
    ... Python will almost certainly be notably slower than moderately ... the powerful and flexible Python language. ... particularly that an OCaml user can easily switch back and forth ... The knowledge is there for the compiler to use but I don't believe any of ...
    (comp.lang.python)