Static or runtime type-checking (was: Program compression)



From: thomas.mer...@xxxxxx
I know that everything and the kitchen sink has been added to LISP.

That's one of the wonderful things about Lisp. See below...

I spoke of the paradigma and I spoke about adding some things in
the way they are used and implemented in Seed7.

Seed7 seems to be rather limited in what it can easily support.

You can add things outside a paradigma, but it will disturb the
paradigma.

Continuing about Lisp: Lisp provides powerful flexible tools to
allow adding lots of new paradigms without upsetting the existing
system. Starting from a simple lambda calculus, i.e. a way to
define new anonymous function objects at any time, with three
list-processing primitives as applied to linked lists (push new
cell (pointing to new element) onto front of list without losing
the original list, skip over cell at front of list to see what's
left (after the first element) without losing the original list,
peek at first element of list (whatever first cell points to)),
Lisp was extended to include a reader for s-expressions to produce
internal linked-lists which are effectively parse trees of the
s-expressions, a corresponding printer, a universal function (EVAL)
that converts a special kind of parse tree (nested list) into the
result of applying its first element to the rest of the elements, a
toplevel read-applyUniversalFunction-print loop for interactive
use, tagged objects distinguishable at all times without requiring
a priori kowledge, mark-and-sweep garbage collector, global
variables that can be assigned, symbols that can have a set of
properties in addition to a value and function binding and print
name, special variables that can mask earlier instances of same,
numbers and arithmetic functions, strings, structures with named
slots, file I/O, a library of mapping functions, parse-tree
transformations as an initial action when applying the universal
function, extension of the lambda calculus to include lexical
closures and hence shared OWN/STATIC variables, multiple value
returns, keyword parameters, multiple packages of symbols,
user-reconfigurable read table, and several forms of OOP. Many of
those add-ons required changes to the kernel to be efficient, but
enough different mechanisms are already installed supporting enough
different software paradigms that ordinary users can now add
virtually any new facility (except threads) by ordinary
application-level (user) programming.

I see from your recent postings that Seed7 supports most (but not
all) of those mechanisms, hence most (but not all) of those
paradigms, and that Seed7 supports an innovative new way to extend
the syntax by ordinary user programming that is quite different
from how it can be done in Lisp. If and when both of these happen:
- Seed7 supports all the paradigms that Lisp already supports, and
- Seed7 demonstrates its ability to do CGI by live demos similar to
what I already have for seven other languages;
I may get interested enough to download a copy of Seed7 to a
computer I have access to (not my regular shell account) and try to
set up some CGI applications/demos myself, starting with copies of
the demos you will have provided as a starting set.

REM> That's not earlier at all. With my line-at-a-time development
REM> style, as soon as I write the line of code that tries to do the
REM> illegal operation and submit it to REP, it'll signal an error.
TM> So when you type the line
TM> (add a b)
TM> (I assume that 'add' can add two numbers and will fail (at runtime)
TM> when one of the parameters is not a number) the 'REP' knows if this
TM> is a legal or illegal operation?
REM> No it doesn't need to know. It just passes the expression to EVAL, and
REM> takes care to catch any error thereby generated.
TM> If the function arguments are present, an expression can be checked.

By whom? The function called already either checks parameters for
type and range etc., anything needed to validate them, and signals
an error if they are not valid (per the <before> requirements of
the function called and doing this at-start parameter-validity
checking), or if not written in a robust manner like that it passes
them to lower-level functions which signal an error themselves if
they can't operate on the parameters as passed to them. It's a lot
easier to find a bug caused by passing bad parameter if the
toplevel function called does a validity check and issues a
meaningful error message, than if some deep inside function errors
out. But it *is* possible to track down the error in either case,
using the stack frames in the debugger to see what functions are
called with what parameters and using divide-and-conquer to find
the point where it's "obvious" that a bad parameter is being passed
through some point. Then either there's a bug in *that* function
making that bad call which somehow wasn't detected earlier by
line-at-a-time debugging, or a bad parameter slipped into that
function without being validated properly.

The hard thing is: Verify at compile (or edit) time that some
function will always get values of the correct type at runtime.

With line-at-a-time debugging of new code, edit occurs only moments
before execution, so it really doesn't matter whether a mistake is
noticed during edit itself or a couple seconds later when execution
is tried on test data for the first time. The mistake is promptly
fixed, and execution is tried again with the corrected line of
code, and now it works. Fixing the one line of code takes the same
length of time whether it's spotted at single-line-edit time or at
single-line-first-try-at-execution time. I actually prefer to have
it checked when executed and *not* a few seconds earlier while I'm
in the middle of editing it, because most of the time I compose the
line of code in a nonlinear manner, taking a template from
something similar I did earlier and then changing various parts
until it has all the parts correct, and it would be a pain if it
kept interrupting me to tell me what I already know that it is
*not*yet* correct. That's why I'm editing it, and why I haven't yet
submitted it for first attempt at execution, because I *know* it's
not yet ready for prime time. When I am *finished* making it the
best I can, *then* I attempt to execute it, and *then* I want it
checked for correctness, and whether it does a static check or just
calls EVAL on the form and signals an error if anything bombs out,
really doesn't matter at that point.

Now line-at-a-time testing of new lines of code will only check
whether the code works for the test data I'm giving it. It's my job
to make sure I submit all reasonable variations on valid data as
test data and verify it works in all such cases, and to put code
earlier in the function I'm building to make sure no other kind of
data can ever get to this particular line of code. That's why I'm
needed, to make such human judgements. The program can't write
itself. It needs me to make such judgements of where to put what
specific type-checking upon entry to which functions, and what
additional bug checks to put inside functions to detect possible
deficiencies in my understanding of my own algorithm I've written.
There's a CS concept called an "invariant", sort of in the sense of
a physical law, like constant energy or momentum. At a certain
point in a loop, all the various data involved in the loop are in a
well-defined relationship with each other, and that can often be
explicitly checked to make sure that going once around the loop
they haven't somehow fallen out of that correct relationship. As
one simple example, if you're in a loop processing accounts
receivable, the total of unprocessed input and processed output
ought to stay the same at all times, equal to a check value you
computed upon entry. If you have OOP for both input and processed
data, with a method that tells you the total, then it's trivial to
write code like this (vaguely Java notation here):
ARset processAccountsReceivable(ARset in) {
real tot = in.total();
ARset out = new ARset();
while in.hasMore() {
ARitem oneItem = in.getNext();
...Process oneItem...yielding ARitem processedItem...
out.add(processedItem);
unless (tot == in.total() + out.total()) // Test invariant here!
throw exception ... Bug, we've lost or gained something!!
}
return out;
}
More likely you move input records, after processing, to more than
one output set, such as good records to one set and bad records to
another set and deductions to a third set. Then it's more
compilcated to make sure you don't have a money leak, but simply
checking if the total of everything is the same after each
iteration will detect most money leaks. Anyway, that's just an
example of how an each-time-around-loop consistency/invariant check
can protect against an important type of bugs.

How does the execution of an expression, while editing the program,

(while building the program *after* a line of code has been
finished getting edited into believed-correct form)

ver[y|i]fy that it will get a value of the correct type at runtime?

The previous lines of code that produce a particular type of data
to assign to a local variable, or the validation of parameters
going into the function, or a bug check if a call to subroutine and
subsequent assignment to local variable can return two different
types depending on success/failure of the operation whereupon only
one of those two types is valid for the next line of code. As I
showed elsewhere, the OR pseudo-function can be used to define a
default value to handle the failure case, allowing the next line of
code to work correct. Here's that example again:
(setq ix2 (or (position key str :start ix1) (length str)))
(setq token (subseq str ix1 ix2))
If there's no instance of key after ix1, token runs to end of string,
else token runs just to ix2 where the key (closing delimiter) was found.
Note there's an alternate way, which I consider uglier:
(setq ix2 (position key str :start ix1))
(setq token (if ix2 (subseq str ix1 ix2) (subseq str ix1)))
I don't like that way of coding because the function name and first
two arguments are repeated, and if I'm later changing the name of
something there I might change one but forget to change the other.

What I say is:
Without some sort of static type checking it is a huge effort. See:

What you say is wrong. Why do you say it?

http://seed7.sourceforge.net/faq.htm#static_type_checking
With static type checking all type checks are performed during
compile-time. ...

That's a lie! Only internal types can be checked at compile time.
Intentional types can't be checked. With OOP you can often convert
intentional types to internal types by wrapping each
intenationally-typed object with a OOP class type, but that can
become a royal pain in practice, especially when dealing with
polymorphic types. For example, there are several different types
of mathematical expressions: sum, difference, product, quotient,
power, root, and various transcendental functions such as sine and
cosine and exponent and logarithm. *EACH* of those types of
expressions can be differentiated, and *EACH* of them can have any
of the other types as sub-expressions, and in the case of sum or
product any number of any mix of any other types can be immeditate
sub-expressions. There's no way a static type-check at compile time
will make sure that some expression you're going to read in from a
file or you've handcoded as a literal constant or hand-built from
nested calls to constructors is not only differentiable at the top
level but all sub-expressions are also differentiable so that the
recursive algorithm for differentiation won't bomb out due to
invalid data type deep inside some expression. IMO it's better to
not bother with all that polymorphic wrapper stuff and just use
runtime checking with graceful error recovery if the expression
read from a file or from user's terminal isn't differentiable.

Would you like to have a "swordfight" in the sense of I give an
example from my own code where I have a intentional datatype with
runtime checking and you tell how runtime checking can be
completely replaced by compile-time checking, I challenge and you
parry my threat, repeatedly, until either I give up challenging or
you fail to parry?

But never mind. Part of intentional type is range checking, and you
already admitted that compiletime static type checking can't deal
with that correctly. For example, if I have GPS coordinates with
latitude and longitude and altitude, then latitude must be from -90
degrees to +90 degrees, inclusive at both ends, longitude must be
from -180 degrees to +180 degrees, exclusive at one end or other,
and where if latitude is at pole then longitude doesn't matter, and
altitude must be from -10 miles (deepest drillhole where a GPS
receiver might work) to +100 or so miles (highest altitude where
downward-directed antennas on GPS satellites can send the signal).
So how are you going to do trigonometry on such values, converting
from one GPS location to another, without runtime type checking to
make sure the final result is valid?

The discussion was about finding type errors at compiler time.

My point was that only internal types, not internal types, can be
checked at compile time by standard type checks.

The validity of a value (E.g.: to be an age with a value less than
150) is not the question for a type check.

It may indeed be a question for an intentional-type check.
Not all 32-bit integers (internal type) are valid ages (intentional type).

When a program is written the programmer has something in mind.
E.g.: What values his variables or parameters will hold.
Declarations with type allow the programmer to express his
thoughts. This again eases reading the program.

Why do you consider it *necessary* to include these declarations
*before* the line of code is written that depends on them? The
programmer may want to test his line of code *immediately* and not
need to spend time formally naming and declaring the appropriate
data type before checking if that line of code works.

Have you thought of my idea of allowing declarations of types of
local variables and nested-expresssion-passed values *later* as an
option, rather then requiring them to be coded *first*?

By the way, in Seed7, how **do** you check the type of a data value
that is returned from a function and *immediately* passed to
another function without first assigning to a local variable?
Do you have something like (THE <type> <expression>) in Lisp?
Or do you forbid nesting of function calls, requiring every
intermediate result to be assigned to a named and typed local
variable?
.



Relevant Pages

  • Re: Cons cell archaic!?
    ... How would the implementation of a Lisp without a using cons look like? ... the irregularity in its often cited regular syntax. ... Lisp at core is based on functional programing on lists. ...
    (comp.lang.lisp)
  • Re: Program compression
    ... TM> supported by your favorite language (LISP) are good concepts. ... then call the Java compiler to compile that source file to a Class ... TM-STC> Since static type checking makes run-time type checks unnecessary, ...
    (comp.programming)
  • Re: How powerful macros are?
    ... >> for software engineers to use arrays for everything, ... Did I miss something or is #an impossibly unbearable syntax in Lisp? ... and arrays are often slower for very short lists. ... I've also extended the language so that I have re-readable hash tables ...
    (comp.lang.lisp)
  • Re: Cons cell archaic!?
    ... why lisp has the cons problem and was never ... langs that deal with lists, vast majority of list usage is just simple ... and if it's not done in those other languages it must be ... Java and Lisp as languages worth consiering, ...
    (comp.lang.lisp)
  • Re: Lisp collections
    ... If a programming language specification talks about a data type, ... For example, The Common Lisp Hyperspec talks about lists, ... misapprehension that Lisp programmers generally just used lists to store ...
    (comp.lang.lisp)