Re: Seed7 (was: Program compression)

On 6 Jul., 09:36, jaycx2.3.calrob...@xxxxxxxxxxxxxxxxxxxxxx (Robert
Maas, wrote:
Date: Thu, 26 Jun 2008 14:21:31 -0700 (PDT)

Why this response is so belated:
= <news:rem-2008jun25-003@xxxxxxxxx>
Anyway, thank you for the feedback.

From: thomas.mer...@xxxxxx
Are there standard packages available to provide these as
"givens" with a well-documented API so that different application
programmers can read each other's code?
It is not the intention of Seed7 that everybody re-invents the
wheel. There is a well-documented API. The predefined statements of
Seed7 are described here:

Given that such statements aren't in the *core* of the language,
but are added later when the library containing their definitions
is loaded ...
Actually the statements are added during the parsing process.

(possibly when building the executable that is saved on
the disk to avoid the cost of loading the library again each time
the executable is run):
- Does Seed7 include a parser that reads Seed7 source-code syntax
(from an input stream such as from a text file, or from the
contents of a string) and produces a parse tree (as a pointy
Actually there are the functions 'parseFile' and 'parseStri' which
can be used to parse Seed7 source programs into values of the
type 'program'. I just added a short description of the type
'program' to the manual. See:

Note that I try to keep the program currently executed and the
program which was parsed into a 'program' variable separate.
It is possible to execute 'program' values and to request the code
of a 'program' value in a structured form. Actually the Seed7 to C
compiler uses this feature to generate the C code. Currently the
features of this reflection are designed to make them usable for the

It is not my intent to support programs which manipulate their own
code as it is done with "self modifying code".

- If so, does this parser automatically get enhanced whenever a new
statement type is defined in some library that was loaded, so
that statements defined in the library can now be parsed?
Yes. This is something happening during the parsing process.
Loading a library at runtime as a way to introduce new statements
for the program which is currently running IMHO makes no sense.

- If so, is there also an inverse function that prettyprints from a
parse tree back out to textual source-code syntax?
During parsing some information, such as whitespace and comments,
is lost. Some information about the position of an expression
is maintained. Generally I think that such a prettyprinter would be

- If so, does that prettyprinter function also get automatically
enhanced whenever a new statement type is defined, so that
statements of that new type can be printed out meaningfully?
I have done nothing in this direction, but I think that it should
be possible. It might be necessary to extend the reflection if
some functionality necessary for prettyprinting is missing.

I ask because in Lisp it's almost trivial to write a utility that
reads in source code, analyzes it for some properties such as
undeclared free variables or functions etc., in order to build a
master cross-reference listing for an entire project and to flag
globally undefined functions, and also it's trivial to write code
that writes code and then either executes it with the same
core-image or prettyprints it to a file to be compiled and/or
loaded later. So I wonder if Seed7 provides the primitives needed
to make the same kinds of tasks equally easy for Seed7 sourcecode.
Such things are planned for Seed7. I have started with a program
which generates html documentation including a source file where
every use of a function is linked to its definition. The program
(doc7) works to some degree, but not well enough to release it.

Other types and their functions (methods) are described here:

| boolean conv A Conversion to boolean
| ( Type of argument A: integer,
| boolean conv 0 => FALSE,
| boolean conv 1 => TRUE )
Is the behaviour defined for other values given? Does it throw an
exception, or are compiler writers free to treat other integers any
way they feel, resulting in code that produces different results
under different implementations? The Common Lisp spec is careful to
have undefined behaviour only in cases where the cost of
prescribing the behaviour would have a good chance of greatly
increasing the cost of implementation. Is such the case here?
This function was added a short time ago to be helpful for the I4 P-Code
interpreter which would be used for the P4 Pascal compiler.
Yes, I ported this classic Pascal compiler to Seed7...
The I4 interpreter needs cheap functions to convert all its
basic types like boolean, float, char, set to and from integer.
The Pascal version of the I4 interpreter uses an unchecked
cased record (which is equivalent to a C union). Therefore I
introduced this function. It was a mistake to add this function
to the documentation, since it is currently only experimental.
BTW it works like the odd function.

Again, these are such basic container types that they really ought
to be provided in a standard package. Are they?
A short explanation of Seed7 container types is here:

| ord(A) Ordinal number
| ( Type of result: integer,
| ord(FALSE) => 0, ord(TRUE) => 1 )

So this is just the inverse of boolean conv?
Yes, but the introduction of 'boolean conv' was just experimental.
The function 'odd(integer)' is the function to be preferred
to convert an integer into a boolean.
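
As an illustration only, here is a small Python sketch (not Seed7; the
names 'ord_bool' and 'odd' merely mirror the Seed7 functions). It shows
that 'odd' restricted to 0 and 1 is the inverse of 'ord', while still
giving a defined result for every other integer:

```python
def ord_bool(flag: bool) -> int:
    """Ordinal number of a boolean: False -> 0, True -> 1."""
    return 1 if flag else 0


def odd(number: int) -> bool:
    """True when number is odd; defined for every integer."""
    return number % 2 != 0


# Round trip over both boolean values.
round_trip = [odd(ord_bool(flag)) for flag in (False, True)]
```

Unlike a conversion that is only specified for 0 and 1, this leaves no
integer argument with an unspecified result.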

| succ(A) Successor
| ( succ(FALSE) => TRUE,
| pred(A) Predecessor
| pred(TRUE) => FALSE )

Why even bother, unless this is a hackish way of conditionally
signalling an exception?
Such functions are present to be usable in generic code. That way
the generic code can assume that the 'succ' function is present. The
'incr(A)' function, which is just a shortcut for 'A := succ(A)', is
also present just for this purpose. An example of a template
function which uses 'incr' is here:

| rand(A, B) Random value in the range [A, B]
| ( rand(A, B) returns a random value such that
| A <= rand(A, B) and rand(A, B) <= B holds.
| rand(A, A) => A,

What distribution within that range, uniform or what?
Uniform. I added an explanation to the documentation.
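
A rough Python sketch of the documented contract (both bounds included,
uniform distribution). The function name 'rand' mirrors the Seed7 name,
and rejecting an empty range is an assumption of this sketch, not
something taken from the manual:

```python
import random


def rand(low: int, high: int) -> int:
    """Uniform random integer r with low <= r <= high (both ends included)."""
    if low > high:
        # Assumption of this sketch: an empty range is an error.
        raise ValueError("empty range")
    return random.randint(low, high)


samples = [rand(3, 9) for _ in range(1000)]
```

Every sample satisfies 3 <= r <= 9, and rand(5, 5) can only return 5.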

What sorts of datatypes are allowed for A and B?
Do they have to be of the same datatype, or can they be unrelated?
rand(3,9.5) rand(4.7SinglePrecision, 9.7DoublePrecision)
rand("FALSE",4.3) rand(FALSE,"TRUE") rand(TRUE,3)
Which if any of those expressions are conforming to the spec?
There is a general rule to keep the descriptions short. This rule
can be found at the beginning of the chapter "PREDEFINED TYPES":
The operators have, when not stated otherwise, the type described
in the subchapter as parameter type and result type.

Note that the 'and' and 'or' operators do not work correct when
side effects appear in the right operand.

What is that supposed to mean??? If you use an OR expression to
execute the right side only if the left side is false, what
happens? I would expect what I said to happen, but the spec says
that doesn't work correctly? So what really happens?
This is a bug in the spec. I have corrected the sentence to:
Note that this early termination behaviour of the 'and' and 'or'
operators also has an influence when the right operand has side

(or (integerp x) (error "X isn't an integer")) ;Lisp equivalent
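
The early termination that the Lisp idiom above relies on can be
sketched in Python, whose 'and' and 'or' also skip the right operand
when the left one already decides the result. The helper
'right_operand' is made up for this illustration:

```python
calls = []


def right_operand() -> bool:
    """Right operand with a visible side effect."""
    calls.append("evaluated")
    return True


# 'or' stops as soon as the left operand is true ...
result_or = True or right_operand()
# ... and 'and' stops as soon as the left operand is false.
result_and = False and right_operand()
skipped = list(calls)  # snapshot: the right side has not run so far

# When the left operand does not decide the result, the right side runs.
result_or_2 = False or right_operand()
```

So the side effect happens only in the last expression, which is exactly
the influence the corrected sentence describes.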

The result of an 'integer' operation is undefined when it overflows.

That's horrible!
AFAIK many languages such as C, C++ and Java have this behaviour.
I would like to raise exceptions in such a case, but as long
as there is no portable support for that in C, Posix or some
other common standard, it would be hard to support it with
satisfactory performance.
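
To show what raising an exception on overflow would mean, here is a
Python sketch. The 32-bit range and the name 'checked_add' are
assumptions made for the example; it says nothing about how Seed7 or a
C compiler would implement such a check efficiently:

```python
INT_MIN = -2**31       # assumed 32-bit signed range
INT_MAX = 2**31 - 1


def checked_add(a: int, b: int) -> int:
    """Add two values, raising instead of silently overflowing."""
    result = a + b     # Python integers themselves never overflow
    if not INT_MIN <= result <= INT_MAX:
        raise OverflowError(f"{a} + {b} does not fit in 32 bits")
    return result
```

The extra range test on every operation is the kind of cost that makes
checked arithmetic hard to offer without support from the platform.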

Does Seed7 provide any way to perform
unlimited-size integer arithmetic, such as would be useful to
perform cryptographic algorithms based on products of large prime
numbers, where the larger the primes are the more secure the
cryptographic system is?

| ! Faktorial

Is that how the word is spelled somewhere in the English-speaking world?
This is what a word looks like when it is not translated correctly.
Thank you for pointing this out.

| div Integer division truncated towards zero
| ( A div B => trunc(flt(A) / flt(B)),
| rem Remainder of integer division div
| ( A rem B => A - (A div B) * B,

Most CPUs, or software long-division procedures, compute quotient
and remainder simultaneously.
Is there any way for a Seed7 program to get them together?
Currently not, but it is not hard to add such a thing.

It's wasteful to throw away the remainder then need to multiply the
quotient by the divisor and subtract to generate a copy of the
remainder that was thrown away a moment earlier.
I guess that a good optimizing compiler can recognize the situation
when 'a div b' and 'a rem b' are computed close together without
changing a or b in between. Since Seed7 is compiled to C, I think
that I can rely on the C compiler to do this optimisation.
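
For illustration, a combined operation with the documented semantics
(quotient truncated towards zero, remainder with A = (A div B) * B +
(A rem B)) could look like this in Python. The name 'div_rem' is
hypothetical; Python's own divmod rounds towards negative infinity, so
the sketch applies it to the absolute values and restores the signs:

```python
def div_rem(a: int, b: int) -> tuple[int, int]:
    """Quotient truncated towards zero together with its remainder."""
    quotient, remainder = divmod(abs(a), abs(b))  # one division step
    if (a < 0) != (b < 0):
        quotient = -quotient       # result is negative when signs differ
    if a < 0:
        remainder = -remainder     # remainder takes the sign of the dividend
    return quotient, remainder
```

For example div_rem(-7, 2) gives (-3, -1), whereas divmod(-7, 2) gives
(-4, 1).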

(multiple-value-setq (q r) (floor a b)) ;Can do it in Lisp

| ** Power
| ( A ** B is okay for B >= 0,
| A ** 0 => 1,
| 1 ** B => 1,

So -1 ** -1 is required by the spec to signal an exception, instead
of giving the mathematically correct result of -1?
In the general case 'a ** -1' does not have an integer result.
AFAIK Ada also does it that way.
BTW: The type float also has exponentiation operators defined.

While 0 ** 0 which is mathematically undefined is *required* to return 1?
This behaviour is borrowed from FORTRAN, Ada and some other
programming languages which support exponentiation.
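
The rules quoted above (A ** 0 => 1, 1 ** B => 1, no negative
exponents) can be sketched in Python as follows. 'int_power' is a
made-up name, and ValueError is simply the exception chosen for this
sketch:

```python
def int_power(base: int, exponent: int) -> int:
    """Integer exponentiation that is only defined for exponent >= 0."""
    if exponent < 0:
        # In general base ** -n has no integer result, so refuse it.
        raise ValueError("negative exponent")
    result = 1                 # empty product, hence 0 ** 0 gives 1
    for _ in range(exponent):
        result *= base
    return result
```

Because the loop body never runs for exponent 0, the result 1 for
0 ** 0 falls out of the definition rather than being a special case.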

I'm too tired to proofread your spec any further.

| flip(A) Deliver a hash with keys and values flipped
| ( Type of result: hash [baseType] array keyType )

How is that possible if the table isn't a 1-1 mapping??
That's the reason the result is of type:
hash [baseType] array keyType.
The values in the hash tables are arrays with keyType elements.
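
A Python sketch of the same idea, with a dict standing in for the
Seed7 hash and a list standing in for the array (the function name
mirrors 'flip', everything else is illustrative):

```python
def flip(table: dict) -> dict:
    """Swap keys and values; keys sharing a value are collected in a list."""
    flipped = {}
    for key, value in table.items():
        flipped.setdefault(value, []).append(key)
    return flipped


colors = {"apple": "red", "cherry": "red", "lime": "green"}
by_color = flip(colors)
```

Here by_color maps "red" to ["apple", "cherry"] and "green" to
["lime"], so nothing is lost when the mapping is not 1-1.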

What precisely do you mean by "templates"?
What computer scientists mean when they speak about "templates"
is explained here:

I.e. exactly what C++ implements, everything else is different and
not the same thing and substandard compared to C++ templates,
Wrong. I use the word template to describe a function which is
executed at compile time and declares some things while executing
(at compile time). For example: The function 'FOR_DECLS' is used to
declare for loops. FOR_DECLS gets a type as parameter and declares
a for loop for that type. This is explained here:

As you can see it is necessary to call template functions explicitly.
They are not invoked implicitly like the C++ template functions.
IMHO these explicit calls of template functions make the program
easier to read. Maybe I should add something to the FAQ.

] Although functions can return arbitrary complex values (e.g. arrays of
] structures with string elements) the memory allocated for all
] intermediate results is freed automatically without the help of a
] garbage collector.
I have improved the FAQ for this. See:

| Memory used by local variables and parameters is automatically freed
| when leaving a function.

That doesn't make sense to me. Suppose there's a function that has
a local variable pointing ...
Pointers are something else. If you are using pointers you are
responsible for ensuring that they point to reasonable data.

The automatic freeing of local variables has exceptions (sorry,
I will add an explanation to the FAQ). The values referred to by
pointers and the values referred to by interface types are not managed

to an empty collection-object (set, list,
array, hashtable, etc.) allocated elsewhere. Now the function runs
a loop that adds some additional data to that collection-object, so
the object is now larger than before. Now the function returns. How
much of that collection-object is "memory used by local variables
and parameters" hence "automatically freed when leaving a
function", and how much of that collection-object is *not* such and
hence *not* automatically freed upon leaving the function?
I see. We have a cultural misunderstanding.
Let's say the collection is an array.
What you were suggesting is a collection declared with:

array ptr myData

which means that the collection contains pointers to myData
(some structure). In this case you are right and automatic
management is not possible (at least in the sense I talked about).

What I have in mind when talking about automatically managed memory
is a collection declared with:

array myData

In this case the collection contains (copies of) the actual data.
When such a collection is freed it can also free its content
since it owns it. And these are the things which are done automatically
in a stack-oriented manner.
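
The difference between the two declarations can be imitated in Python,
with the caveat that Python manages memory quite differently from
Seed7. The sketch only shows who owns the data: a list of shared
references plays the role of 'array ptr myData', and a list of deep
copies plays the role of 'array myData'. All names are made up:

```python
import copy

record = {"name": "first"}

by_reference = [record]             # shares the record with the outside
by_value = [copy.deepcopy(record)]  # holds its own copy of the record

record["name"] = "changed"          # modify the original afterwards

seen_by_reference = by_reference[0]["name"]
seen_by_value = by_value[0]["name"]
```

Only the collection holding copies is unaffected by the later change,
which is why such a collection can release its content when it is
freed without asking who else might still refer to it.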

If you use pointer structures for everything you are right that
a GC or a manually managed heap is necessary. In Seed7 many things
can be done with abstract datatypes.

If abstract datatypes are used in an efficient way, the need to use
pointers in Seed7 is not as high as in some other languages.

Debug-use case: An application is started. A semantic error (file
missing for example) throws user into break package. User fixes the
problem, but saves a pointer to some structure in a global for
later study.
You are much too deep in the Lisp way of thinking. If a stack
shrinks the elements popped from the top just don't exist any more.

So it's impossible in Seed7 to have a function create an object and
return it, because anything created by a function doesn't exist any
more after return?
The return variable is excluded from this mechanism.

It is also a common bug in C/C++ and other similar languages when
a function returns a pointer to some local data (which is at the
stack) ...

Are you saying that in Seed7 it's impossible to have any data that
is not on the stack, hence it's impossible for any function to
allocate memory for some object and then *return* a pointer to that
object so that the caller can later work with that object?

If you want such data to be available later you need to make a
(deep) copy.

That's ...
[snip reasons why not all data should be stack oriented]
I agree that some data cannot be managed in a stack oriented way.

;Static type checking fails to detect this type-mismatch undefined-method error.
... Every bug found at compile-time will not cause you trouble at
run-time. The earlier you can eliminate bugs the better.

Why do you feel the need to have two different times, one where
static stuff is checked but you have no idea what's really
happening, and then one where things actually happen?
I think that compile-time type checking can find bugs which
slip through the fingers when you test your program.
IMHO even a test with 100% code coverage is not sufficient since
the combination of all places where values are generated and
all places where these values are used must be taken into account.
I have again improved the FAQ to contain this argument:
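
The coverage point can be made concrete with a contrived Python
sketch. The functions 'produce' and 'consume' are invented for the
example: two tests execute every line of both functions, yet the
untested combinations of producer and consumer still fail at run-time:

```python
def produce(from_file: bool):
    """Two places where a value is generated."""
    if from_file:
        return "42"          # text, as if read from a file
    return 42                # number, computed directly


def consume(value, add_one: bool):
    """Two places where the value is used."""
    if add_one:
        return value + 1     # only valid for numbers
    return len(value)        # only valid for text


# These two calls already reach 100% line and branch coverage.
covered = [consume(produce(False), True), consume(produce(True), False)]
```

Calling consume(produce(True), True) or consume(produce(False), False)
raises a TypeError, although no line is left uncovered by the tests. A
static type check would reject both combinations before the program
runs.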

I suppose in
Seed7 it's impossible to have an interactive loop where you can
type in a line of code and it *immediately* does something and you
*immediately* see whether it did what you expected it to do instead
of needing to wait until the whole program is compiled before you
can see what that one line of code did?
No. There is the type 'program' which can be used for that.

BTW the Seed7 parser usually processes 200000 lines per second.

... There is also the type 'bigInteger' which serves as
unlimited-size signed integer. The type 'bigInteger' is explained

| div Integer division truncated towards zero
| ( A div B => trunc(A / B),
| rem Remainder of integer division div
| ( A rem B => A - (A div B) * B,

For bigIntegers, it's especially painful not to have division with
both quotient and remainder directly returned. It takes a lot of
extra time to multiply the quotient by the divisor (then subtract
that from the original value) to get back the remainder after
having thrown away the remainder in the first place.
When there is demand, such a function can be added.

What is the precise meaning of type 'char'?
The 'char' values use the UTF-32 encoding, see:

| The type 'char' describes UNICODE characters. The 'char' values use
| the UTF-32 encoding. In the source file a character literal is written
| as UTF-8 UNICODE character enclosed in single quotes. For example:
| 'a' ' ' '\n' '!' '\\' '2' '"' '\"' '\''

That's not written well at all. It doesn't make sense.
Agreed. Historically these were just character literal examples.
Some of them use escape sequences, which were explained below.
Now you get the impression that these are UTF-8 literal
examples, which was not the original intent.

UTF-8 coding of a single character, it's two different UTF-8
(US-ASCII subset thereof) characters which are a C convention for
Likewise \\ is two UTF-8 (US-ASCII) characters.
Likewise \" is two UTF-8 (US-ASCII) characters.
Likewise \' is two UTF-8 (US-ASCII) characters.
These are escape sequences. They are explained in the next paragraph.
I have moved the character literal examples after the explanation
of escape sequences. That way you don't get the impression that
this is an explanation of UTF-8 literals.
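
The distinction between an escape sequence in the source text, a
character value, and its encodings can be shown with a Python sketch
(Python string literals happen to use the same backslash escapes; the
variable names are illustrative):

```python
# Each literal denotes exactly one character, even when the source
# text needs two characters (backslash plus one more) to write it.
examples = ["a", " ", "\n", "!", "\\", "2", '"', "\"", "'"]
lengths = [len(literal) for literal in examples]

euro = "\u20ac"                         # EURO SIGN, outside US-ASCII
utf8_bytes = euro.encode("utf-8")       # variable-length encoding
utf32_bytes = euro.encode("utf-32-be")  # fixed four bytes, no byte order mark
code_point = ord(euro)
```

The single character U+20AC takes three bytes in UTF-8 and four bytes
in UTF-32, and the UTF-32 bytes are just the code point written as a
32-bit number.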

I don't think you have any idea what UTF-8 really means.
As I have implemented UTF-8 support for Seed7, I think I know
something about it.

Greetings Thomas Mertes

Seed7 Homepage:
Seed7 - The extensible programming language: User defined statements
and operators, abstract data types, templates without special
syntax, OO with interfaces and multiple dispatch, statically typed,
interpreted or compiled, portable, runs under linux/unix/windows.