Re: object system...




"Dmitry A. Kazakov" <mailbox@xxxxxxxxxxxxxxxxx> wrote in message
news:163b2w1jiz24c$.115zwvzcu6sr7.dlg@xxxxxxxxxxxxx
On Tue, 13 Jan 2009 23:54:00 +1000, cr88192 wrote:

"Dmitry A. Kazakov" <mailbox@xxxxxxxxxxxxxxxxx> wrote in message
news:1obq05ayl19zl$.mghpopxh028w.dlg@xxxxxxxxxxxxx
On Mon, 12 Jan 2009 22:50:28 +1000, cr88192 wrote:

4. when rebuilding the code. in this case, usually the type information
is kept purely as sideband or scaffolding, and once the code is built it
ceases to matter, or at least until more code needs to be built around
it..

Same as 3, the context must know the type where an operation on it
should happen. Context /= object /= type.

only when the code already exists at compile time...

if the code does not exist, it can't be typed at this point, and so the
information needs to be kept around until it can be used...

Yes, for a compiler the program is a text.


only to the parser...

then it is ASTs, an IL format, ASM, ...


for example, right out of the box does the OS know every possible app
that can run on it?...

Yes. For an OS every possible program is known by its type "application."
Compare, does a program declaring an integer variable know every possible
integer? The answer is yes. It might not know some concrete integer.

It is important not to confuse construction of concrete values and their
types.


the concrete value is a 32 bit integer...
and the type is some sideband data (for most of my code, this data is stored
as signature strings). it is all data which is there for if/when it is
needed...
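
a rough sketch of the "value plus sideband signature" idea, in C. the slot struct, the helper names, and the one-letter codes "i" and "f" are all made up for illustration, and are not the actual signature format used in my code:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical slot: the concrete value is just 32 bits; the type lives
   beside it as a signature string ("i" and "f" are made-up codes). */
typedef struct {
    uint32_t    bits;
    const char *sig;
} slot;

/* Read the bits as a signed integer, but only if the signature agrees.
   Returns 1 on success, 0 on a signature mismatch. */
static int slot_get_int(const slot *s, int32_t *out)
{
    if (strcmp(s->sig, "i") != 0)
        return 0;
    memcpy(out, &s->bits, sizeof *out);
    return 1;
}

/* Same bits, read as a float (assumes float is 32 bits wide). */
static int slot_get_float(const slot *s, float *out)
{
    if (strcmp(s->sig, "f") != 0 || sizeof(float) != sizeof s->bits)
        return 0;
    memcpy(out, &s->bits, sizeof *out);
    return 1;
}
```

the bits themselves never change; the signature is only consulted if/when something needs to know how to read them.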


no, the OS only knows about apps after they are installed...

Yes, that is the behavior of "application type." <=> it knows
applications.


FWIW, the OS can be compared to a huge self-modifying program, where it
provides the address space and manages the disk, but when loaded the apps
integrate into itself. the static compiler is much like a dynamic compiler,
...

the main difference though with a self-modifying app is that all this
happens in the same address space, and that a good deal more information
needs to be retained about the running code...

much like the OS kernel, the statically-compiled parts of the app tend to be
frozen from the POV of the dynamically loaded/built code.


but, yes, if nearly all of the code can see itself as its own data, it
can utilize itself in building new code and features.

That is self-referential and thus inconsistent. To make it consistent
such systems need to be split into two, neither of which is
self-referential. For example, into interpreter and the code to
interpret. For the interpreter the code is data. The types are still
static for the code and do not need to be known there.

not really...

when we compile to machine code, there is no longer an interpreter...

Of course, there is. That is the CPU, an interpreter of machine code.


yes, but this is not in the sense that is meant...


in this case, an interpreter does not build a confined inside world, but
a compiler starts integrating itself with the outside world, thus
allowing the app to perform tasks beyond those originally built into the
statically compiled parts of the app.

This would lead you straight into barber's antinomy etc.


not really, the app only needs to know about itself, and have info about the
external world (typically, this is through the use of external information,
such as DLL's and header files, and information written into data files...).

but, there is no need for it to compile (all of) itself, because the
external compiler (AKA: static compiler and linker) has already done so...

so, in the minimalist case (an extreme example), only the dynamic linker is
loaded, and has available some info about the app to be linked, at which
point it will proceed to pull some libraries into memory (say, the
assembler, compiler, garbage collector, ...), and at which point, it could
start pulling in and compiling scripts, ...

so, if the app is of any non-trivial size, it has essentially built itself
in memory (the linker being the only real "original" part of the app...).


now, as for the barber:
it is a paradox if both possibilities are to be considered at the same time
(as in an axiom), however if on each day he makes this same decision (using
the prior day as his current status, as is more typical in procedural
logic), then he will only shave every-other day (the axiom will be true at
one try, but false at the next, then true, then false, ...).

if the rate is doubled, say, he tries to shave again every afternoon or
night, then he will be shaven every morning...
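
the procedural reading can be sketched in C, assuming the rule is modeled as "shave now only if not shaved as of the previous step" (the function names here are just for illustration):

```c
#include <stdbool.h>

/* One decision step: the barber shaves himself now only if he was not
   shaved as of the previous step. Returns the new state. */
static bool barber_step(bool shaved)
{
    return !shaved;
}

/* Apply the rule `steps` times, starting from the state `shaved`. */
static bool barber_after(bool shaved, unsigned steps)
{
    while (steps--)
        shaved = barber_step(shaved);
    return shaved;
}
```

with one step per day the state alternates, and with two steps per day the state seen each morning stays the same, so nothing contradictory ever has to hold at a single point in time.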


it applies to an object, which is an instance of a class implementing
an interface...

Object is an instance of its type. It is not an instance of the class
rooted in the type. An instance of the class is the object's type. Other
instances are types derived from it.

this doesn't make sense...

Class is a set of types. Set is not an element of, as we know after
Russell.


ok.

however, as can also be noted, software is based on code, not on abstract
logic...
if we say that in reality the types do not exist, but are merely
manifestations of the present state, then there is not a problem with saying
that a type is represented by a type...


or, by the same definition, a compiler for C++ can't be written in C++,
since it would have to compile itself, but oh wait, this is the whole point
of writing the compiler in its own language...

C compilers are also usually written in C, and before that, assemblers in
assembler...

now, it is like the prior example:
the compiler is always compiled via a prior version of itself, but not by
itself at the same time it is being compiled...


it is declared against a class but it uses the instance.

This is another case. If a polymorphic object is declared, then it is an
instance of the type which is a transitive closure of the class. I don't
know which case you mean, polymorphic or a non-polymorphic one.

I am not sure what is being said here...

Because class cannot be a type, you need some distinct type to represent
the class. The set of values of the type is a closure of the values of
types from the class.


structs, strings, ...

now, there are more direct versions of this wonkiness:
consider SmallTalk;
classes are first class objects (aka: instances) of prior existing classes
(aka: classes are needed to represent their own representation).

now, how ST implementations pull this off in practice, I am less certain,
but I am aware of this bit of amusing trivia...


it is not declared against the instance, nor is it used against the
class...

When object is not declared to have a type related to the interface,
then any operation of the interface applied to the object is a type
error.

In a consistent strong typed system, you just cannot do these things.
That is the idea of strong typing, not to allow this mess.

again, this does not make sense.

To break types indeed does not make sense. The point is, that if you tried
to formalize the stuff we discuss, you would quickly discover that you
could not do it. It is inconsistent like "all Cretans are liars."


formalisms don't matter so much as long as the code is working...


do you use an interface to access static methods?...

Static method? If you mean C++ static member, that is an artefact of its
poor design, which confuses visibility issues with types.


close enough... I meant like Java and C# static methods, where the class can
be accessed apart from its instance...


I say, the purpose of a higher level language is to get more work done
more effectively, and if this is by hiding or exposing the machine is
not important, only that more work be done, that it be done faster, and
that it be done better...

It cannot be unimportant, because exposing machine prevents some vital
optimizations. For example, if you handle register allocation manually,
you prevent the compiler from doing it for you. And with most
optimizations the compiler will beat you by a margin.

That is apart from the issues of code reuse, portability, safety,
maintainability etc.

maybe so, but if there were useful optimizations that could be done
here, almost invariably it would have been done.

This is done in Ada, where use of built-in types is strongly
discouraged. That gives you a lot of optimization unavailable in C.
Consider a simple example:

procedure Foo (A : in out Some_Array) is
   subtype Index is Integer range A'Range; -- Constrained to the range of A
   I : Index := ...;
   ...
   A (I) := 23;

No index range check here is necessary, because the compiler knows that
I may not be outside the index range of A. It is declared to be inside
it. That the implementation would use a 64-bit hardware integer for it
does not matter.

now, can you demonstrate that this provides a usable performance gain?...

Yes, in our case optimization gives about 30% performance gain. But that
is beside the point. The point is that if you describe semantics in
machine independent terms, this allows the compiler to perform a wide
range of optimizations. Whether this possibility is realized is another
question. This is a win-win game. You get a quality program, which is
potentially more efficient.


hmm:
typedef unsigned int int24 __attribute__ ((bits 24));

yeah...

then again, I also use things like:
typedef struct gcp_ *gcp;

to give me types that the compiler will automatically complain about if one
tries to mess with them in much of any way other than passing them around
(or at least not without casting).
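
a minimal sketch of how such an opaque handle might be used. only the typedef is from my code; the layout of struct gcp_ and the gcp_new/gcp_refs/gcp_free helpers are hypothetical:

```c
#include <stdlib.h>

typedef struct gcp_ *gcp;   /* opaque handle: callers never see the layout */

/* "implementation side": only this part knows what is inside.
   the refs field is a made-up example member. */
struct gcp_ {
    int refs;
};

static gcp gcp_new(void)
{
    gcp p = malloc(sizeof *p);
    if (p)
        p->refs = 1;
    return p;
}

static int gcp_refs(gcp p)
{
    return p->refs;
}

static void gcp_free(gcp p)
{
    free(p);
}
```

in code that only sees the typedef, dereferencing or doing pointer arithmetic on a gcp will not compile (the struct is incomplete there), and handing over a pointer of some other struct type gets a diagnostic unless it is cast.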


I may put a little more weight into this claim if evidence can be shown,
or at least a justifiably solid explanation is given...

"compiler can choose" does not necessarily mean "faster"...

maybe it means "can omit bounds check", but C does not use these anyways
(and more recent compilers, such as those for C#, are in many cases able
to figure out when to leave out the bounds check).

It can use a shorter machine type for the index instead of int. When you
specify int, the compiler does not know if this int is "as-is" or else an
implementation artefact, and unsigned char would go as well. To check such
issues is a heavy burden and is theoretically incomputable at all.
Furthermore, the compiler can use indexing machine instructions which may
have no explicit index at all. Because the index range is known the
compiler could load the whole array into a cache etc.


and you are saying an unsigned char would be faster for an array index?...

on both x86 and x86-64, 32 bit integers are the fastest types...

now, a char can save space, which has other uses, but very often a compiler
will internally "upgrade" a char to an int for sake of both convenience and
performance.

but, in any case, typically array index ops (or pointer ops, ...) can only
work with indexes of the same size (full 32 bits on x86, or 64 bits on
x86-64), which may mean expanding them to the needed size (via movsx or
similar...).
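
the "upgrade" is visible even at the C level, since the language itself promotes narrow operands before doing arithmetic. a small sketch (function names are only for illustration; what instructions a given compiler emits is a separate matter):

```c
#include <stddef.h>

/* Arithmetic on unsigned char operands is carried out after integer
   promotion, so the sum does not wrap at 8 bits. */
static int sum_uchar(unsigned char a, unsigned char b)
{
    return a + b;
}

/* Size of the type the expression a + b actually has. */
static size_t promoted_size(void)
{
    unsigned char a = 1, b = 2;
    return sizeof(a + b);
}

/* A narrow index is widened before it is used for address arithmetic;
   the char only saves space where the index is stored. */
static int table_at(const int *table, unsigned char idx)
{
    return table[idx];
}
```

so a char index mostly buys storage space, not faster indexing.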


When you expose machine types, you cannot describe their semantics in
a machine-independent way, obviously.

doesn't matter when the machines in question implement more-or-less the
same semantics...

How do you define "more or less same semantics?" Computing is discrete,
1+1 is either 2 or else wrong. Does 15326+35221 overflow?

you know... 2+2=5...

whether or not numbers overflow depends on their sizes and other
factors.

It depends on the type of. If you lack this information in types you
cannot write a program with defined semantic.

In Ada I write:

type T is range 1..2;

this instructs the compiler to implement a semantics that makes it
impossible to assign 2+2 to a variable of the type T.

yes, but typically the compiler usually does not need to know this.

It usually does, because using of built-in types is discouraged.


what kind of reason is that?...

in most mainstream languages, they are not discouraged.
in fact, they are the only things offered.


only a few languages have implemented these particular features, and the
language in question was also known for having many issues which made it, in
the minds of many people, not a good language for software development (I have
not used it personally, but these types of claims tend not to be wholly
baseless).

I think, that it was overly pedantic about types, ... were high on the list.


all that is needed to know is that the types are big enough to hold the
values in question, and if there is an overflow, that is the programmer's
problem (or, possibly, the fault of whoever it was trying to compile the
code on whatever arch it does not work on...).

Everything is programmer's problem. The argument of "big enough" is plain
wrong. Consider a modular type with circular shift operation as a
counterexample.


if one needs a circular shift, it can be implemented easily enough with
existing types.

for example, a 20 bit circular left-shift:
v=((v<<i)&((1<<20)-1))|((v>>(20-i))&((1<<i)-1));

or, a right shift:
v=((v&((1<<20)-1))>>i)|((v<<(20-i))&((1<<20)-1));

now, these could be wrapped nicely in macros, and there is no need for some
funky type...
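
wrapped up as macros, it could look something like this. the names MASK20, ROTL20 and ROTR20 are just picked for the example; they assume 0 <= i < 20 and do the work in uint32_t so that any bits above bit 19 are simply dropped:

```c
#include <stdint.h>

/* Low 20 bits set. */
#define MASK20 ((UINT32_C(1) << 20) - 1)

/* 20-bit circular left shift of v by i, for 0 <= i < 20. */
#define ROTL20(v, i) \
    ((((uint32_t)(v) << (i)) | (((uint32_t)(v) & MASK20) >> (20 - (i)))) & MASK20)

/* 20-bit circular right shift of v by i, for 0 <= i < 20. */
#define ROTR20(v, i) \
    (((((uint32_t)(v) & MASK20) >> (i)) | ((uint32_t)(v) << (20 - (i)))) & MASK20)
```

as usual with macros, the arguments get evaluated more than once, so they should not have side effects.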


it can also be noted that most processors won't have built-in operations for
these cases anyways (and many expressions in the above would likely be
reduced to constants by the compiler).


they can fix the problem, but who ever says it is expected or required to
work?...

it is like when driving a car... a simple fault can crash the car and/or
kill people, and it is a matter requiring a decent level of skill and
attention, and lacking much to prevent all this, yet if the driver
crashes the car, that is their problem, not the problem of anyone
else...

Certain levels of certification require static analysis.


for the manufacturers, maybe...
for the drivers, it is their fault if they wreck the things, not the
manufacturers.
the manufacturers tend to just put in a few features to reduce the overall
damage if possible...


but, anyways, if needed, there are exact sized integers:
int16_t; int32_t; int64_t; ...
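
for the earlier 15326+35221 question, the answer with these types comes down to which one is picked. a small sketch, with made-up helper names, that checks the sum against the limits from <stdint.h> rather than relying on what an out-of-range conversion does:

```c
#include <stdint.h>

/* Range checks against the exact-width limits. The value is passed in
   as int64_t so that it cannot overflow before being tested. */
static int fits_int16(int64_t v)
{
    return v >= INT16_MIN && v <= INT16_MAX;
}

static int fits_uint16(int64_t v)
{
    return v >= 0 && v <= UINT16_MAX;
}

static int fits_int32(int64_t v)
{
    return v >= INT32_MIN && v <= INT32_MAX;
}
```

the sum is 50547, which does not fit in int16_t but does fit in uint16_t and int32_t.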

That does not work. See my example above. Another example is when a wider
range integer is used for a hardware register, such as an analogue output,
you might get serious legal problems writing values out of range...


and it is also trivial to write range clamps as well...

2 if statements, that is sufficient...
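
something like the following, where clamp_int is a made-up name and the 0..4095 range in the note below is only an example (say, a 12-bit output register):

```c
/* Clamp v into the inclusive range [lo, hi]; assumes lo <= hi. */
static int clamp_int(int v, int lo, int hi)
{
    if (v < lo)
        return lo;
    if (v > hi)
        return hi;
    return v;
}
```

so clamp_int(x, 0, 4095) would be applied right before the value is written out.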


more so, for many tasks, how can one justify using a language like Ada
anyways?...

Do you mean technical or economical justification? We just do not have
enough staff and time to develop reusable software in C++. It is too
expensive and risky to write high integrity software in C++. (We still
develop about 70% in C++, alas)


well, I guess it depends some on domain then.

probably, the majority of developers target ordinary PC's...
the extra hassles just wouldn't be worth while...


for example, how good of support does it have for DirectX, OpenGL, Win32
API, ... ?...
how effortlessly does it share data with C, C++, ?...
how good is it at implementing customized memory management, VMs, and
self-modifying code?...
how about easily doing copy/paste porting between languages?... (this is a
major win for many of the mainstream languages, since the similar syntax
keeps the porting effort fairly low in many cases...)
...

if the answers are "more effort than C and C++", then there is a problem...

to face them effectively, a language has to be without any major detractors,
with integration issues being minimal, and while still offering a general
improvement over many of their shortcomings.


so, I figure, rather than trying to replace what exists, I will just try to
absorb it, while still hopefully making some improvements in the process
(this is the hard area, as it is difficult to implement things without their
implementation sucking, but if one tries to at least implement it reasonably
faithfully, that is probably ok...).

sadly, the C++ standard is a bit scarier than the C standard, which is why
I have not as of yet tried to implement a dynamic compiler for it (and
another reason to insist on C for the APIs, which makes it a whole lot
easier to automatically tool the existing code).

so, at least in part, I have long since written much of my code with the
prospect of automatic tooling in mind...

actually, I have long since been using many specialized automatic tools to
process my source anyways...


or such...


--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

