Re: Is C99 the final C? (some suggestions)

From: Paul Hsieh (qed_at_pobox.com)
Date: 12/11/03


Date: Thu, 11 Dec 2003 10:09:54 GMT

Sidney Cadot wrote:
> Paul Hsieh wrote:
> >>>Its a big deal to me. I don't own any material on the C language
> >>>that specifies all of this.
> >>
> >>The C standard specifies what is UB and what's not. Grepping an
> >>implementation is a fragile way of dedicing semantics, I would say.
> >
> > Yes, and what percentage of C programmers in the world do you think have ever
> > had the standard in their hands that they actually refer to?
>
> I think there are relatively few excuses for any professional C
> programmer not to have access to the standard (and reading it once or
> twice wouldn't hurt either). It's only 18 bucks in electronic form,
> and you don't even have to go out to get it.

Care to take a quick survey of comp.lang.c patrons who own their own copy of
the standard?

> > My objection to your proposal can simply be restated as "all
> > machines have limited resources", deal with it.
>
> Then the standard would have to specify what it means exactly by a
> "resource". Standards are no places for vague terminology.

You mean vague terminologies like "stack"?
 
> >>[...] If the thing needs to be able to run in
> >>a multithreaded environment, you could even do that, with some
> >>extra effort. You need some form of mutex locking, but you can
> >>resort to Lamport's bakery algorithm for that.
> >
> > I had to look this one up -- that's one of the classic useless
> > non-solutions (especially if you want to call malloc more than 4
> > billion times, for example), which explains why I've never
> > committed it to memory.
>
> We have long longs nowadays. 2^32 is not an important limit.

And long long's are portable?!?! Are you even following your own
train of thought?
 
> > You still need to be able to identify process instances, which *BY
> > ITSELF* is not portable. Process instances identifiers would
> > count as something *BEYOND* just malloc/free/etc.
>
> Why do you need to distinguish process instances?

Reread the Lamport's bakery algorithm. One of the crucial steps is
"for each process, check their number, and pick the max, then add 1".
Another is "for each process that has a lower number, wait in line
behind them." This means you have a way of knowing how many processes
there are, and how to index them. This means you have to hook out or
wrap whatever "spawn" mechanism you have in your language (or use some
other platform specific mechanism to iterate through all running
threads.)

> [...] In your API, each call carries the heap it refers to as a
> parameter.

Yes, but if two threads simultaneously call heap functions on the same
heap (heaps are not restricted to being distinct across threads) then
there is a race condition.

> >>I would say that writing a more powerful heap manager is possible in
> >>principle, even in a multithreaded environment, as long as you have a
> >>well-implemented malloc() (and friends) at your disposal.
> >
> > *BZZZT!!!* Wrong.
> >
> >>It would be a drag to do so and it would be slow, but it is possible.
> >
> > Impossible. Slow, fast or otherwise. Its just impossible.
>
> If you can explain to me why you need to explicitly distinguish threads
> in the memory manager, I'll concede the point.

Go look up the source code for *ANY* multithreading supporting
malloc/free. I absolutely guarantee that every single one of them
uses some kind of platform specific exclusion object to make sure that
two simultaneous calls to the heap do not collide.

One way or another the heap is some kind of quasi-linked list of memory
entries. You should have learned from your OS classes (assuming you passed it
and/or comprehended the material) that all ADTs, including linked-lists have to
be protected by some sort of critical section like mechanism if more than one
thread has access to it. If you do not understand this, then you don't
understand multithreading.
 
> >>So far, nobody has challenged me to come up with a specific phrasing for
> >>this (which actually surprises me). Here you go. It's a bit rough, but
> >>you get the idea.
> >>
> >>---- Proposed addition to the standard to add stack overflows:
> >>---- Draft
> >>
> >>a. For an active function invocation, define an upper bound for the
> >> amount of memory in use by this function as follows (in the
> >> following, denote "Upper Limit on Memory Usage" by "ULMU"):
> >
> > And what if the local/parameters are not stored, or storable on the stack?
>
> How do you mean? They have to be stored. And I'm presuming they're
> stored on the stack.

Most microprocessors have these temporary storage elements called
"registers". They are often not part of any stack structure. Many modern
compilers will map local variables and temporaries to these "registers", thus
not requiring stack storage for them.
 
> > What if an implementation uses multiple stacks depending on the
> > *type* of the parameter?
>
> The hypothetical implementation [...]

That the ANSI C standard currently does not have any problem with ...

> [...] that does this will be out of luck. Not because the proposal
> wouldn't work, mind you, but because the ULMU is a very loose bound.
> But hey, it beats having no bound whatsoever.

No it doesn't. Dealing with the stack is an area that clearly should be
platform specific. For example, it could be that an embedded system has
fast and slow ram. Certain data types might be best suited to being mapped
into fast RAM, whether they are local or not, but you might have a lot more
slow RAM, so you'd put all the rest of the stuff in there.

> >> - For every parameter/local variable of basic type there is a
> >> corresponding constant macro of type size_t:
> >
> > You missed all the implicit temporaries required. Homework
> > exercise: for any fixed number n, create a function which requires
> > at least n implicit temporaries (not explicitely declared) to
> > implement.
>
> Ah yes! So it seems. An interesting problem. Have to think about that
> for a bit.

Why don't you think about the implication of the results of this
exercise with respect to your proposal while you are at it.
 
> > And what about the number of alloca() calls? alloca() is a common
> > extension which just eats extra space off the stack dynamically --
> > the big advantage being that you don't have to call free or
> > anything like it, for it to be properly cleaned up. It will be
> > cleaned up upon the function's return (the simplest implementation
> > of a "freeall".)
>
> Ok, we can add alloca'ed space is each function invocation, plus an
> overhead constant per active alloca.

The alloca() calls can be in a loop. Knowing how much memory a set of
alloca's will perform in any function is equivalent to solving the
halting problem.
 
> Unless you allow that I'm trying to define an upper bound of total use,
> which may be loose as far as I am concerned. The idiomatic use would be
> to compare this to the ULMU_TOTAL to see if it fits. This may yield the
> number of bytes on the smallest check for all I care. On those silly
> platforms, you will grossly underestimate stack capacity now and then.
> [...] Still beats the current situation.

This versus just living with runtime stack checking? I'll take the
runtime stack checking.

> >>When reading the code that used (rather than defines) the operator, it
> >>helps tremendously if the meaning jumps from the page, rather than
> >>having to look it up in the Big Manual of Operators in use at your local
> >>workplace.
> >
> > You are completely ignoring that large population of people who
> > happily use operator overloading today. You don't understand that
> > what you are saying is nothing more than your very narrow opinion.
> > And you're using your opinion to try to trump a proposal whose
> > merits lean very heavily on its technical features.
>
> My opionion is determined largely by the psychological objections (to
> which you don't subscribe.)

Says who? You see, unlike some people, by idea isn't driven by an
opinion. Its driven by a consideration for design. Since you *CAN'T*
add just any old operator to satisfy everyone, then how do you satisfy
the demand for more operators? Some variation on redefinable
operators or operator overloading is the logical conclusion -- just
fix its problems (since type-index functions don't really exist in C,
so you just add in a bunch of them so that you don't lose any of the
ones you have). For all I know, I might not ever use such a feature,
even if it *WERE* to be endorsed by the standard.

> My technical problem is that I can see no way that this could be
> implemented.

Well I don't know what to make of this -- you can't see a race
condition when its staring you in the face, and you can't imagine
problems with knowing the size of a function's stack usage.

> > > A small hypothetical case study. Suppose engineer A, working on
> > > nuclear reactor core controllers, works in a team where "?<"
> > > means "minimum" and "#<" means "x is implied by y" for some data
> > > structure involved in logic reasoning on the state of the
> > > system. Now he switches companies; he goes to work on electronic
> > > toys and gadgets. Here the convention is different, but the same
> > > operators are used.
> >
> > The example you've made up has to do with basic comprehension of
> > the engineer -- since "?<" would clearly be one of the "extended
> > operators", the programmer is supposed to look up the its
> > definition. Just like they would have to look up logMsg() or any
> > other function name that might show up in more than one project.
> > Just like they are supposed to magically know how clock() behaves
> > on different OS'es.
>
> There is a possibility of error caused by bad design (which isn't
> trimmed to the type of mistakes we fallible humans make) in your
> operator introduction scheme.

Same is true with free form function names, of course.

> >>You haven't shown me that it can be implemented in a compiler,
> >>even in theory. That should count for something.
> >
> > I don't think there's a credible assertion that my idea is
> > impossible. You want me to describe a whole compiler
> > implementation just to support my idea?
>
> I listed some specific objections with regards to disambiguation of
> the grammer that the parser would have to be able to handle.

Even without a full grammar being specified? That's a pretty good trick.

> > About a year ago, Slumberger gave an amazing talk about implementing pico-java
> > bytecode into a smart card. It was incredible, because it was obvious that
> > they would be able to use all the Java tools/libraries, they could simulate it
> > almost perfectly in software -- mostly inheriting the pico-java architecture,
> > and because of Java's bignum class, it meant that exposing a widening multiply
> > and carry propogating add functionality in their smart card would be enough to
> > automatically have all the software they need for RSA all ready to go.
>
> ...So the smartcard presumes it's inserted into a machine that has these
> classes available?

No ...

> [...] They don't reside on the card itself....

Yes they do. That's the point of creating a pico-java assembly
language. Its so that you just have to upload a minimal java kernel to
it, and all of a sudden you have the whole world's Java libraries at your
disposal.

> [...] Well, they
> should be using Java then, I suppose. No need to augment C.

You misunderstand the point. Most of this kind of development
actually *does* happen in C + assembly language. Slumberger just
demonstrated that using Java is far superior, for the more general
reason that they get to inherit the entire Java infrastructure to help
them design their smart card software. But hidden in all of this is
the fact that C is further penalized quite specifically on the issue
of bignums which become a huge pain in the ass to deal with. There is
*NO* infrastructure for porting bignum functionality in C, using
embedded frameworks, gcc, or otherwise. Its all just "roll your own"
territory.
 
> > Ok, a large 2s complement integer is filled with an
> > array of unsigned sub-integers, except for the one at the top. *That* one is
> > signed, and that's how you know if the bignum integer is negative or not. The
> > problem is that the act of addition can cause a roll around at the top word
> > which requires an expansion of bignum by an extra integer in size. The logic
> > you use to determine this is just related to examining the carry flag.
>
> Great. Now what has all this to do with defining a sensible signed <+
> semantics for use in C (you know, a general purpose language, not tied
> to your particular pet peeve problem).

Oh yeah my particular pet peeve problem, which only every serious CPU
company on the planet is only too willing to solve for me.
 
> > It all just comes down to the carry flag. The other flags are not
> > necessary for bignum libraries. Maybe there's a way to transform
> > the OF into something that can be used in a bignum library, but I
> > am not aware of it.
>
> Because, we're talking about expanding the C language here (not about
> your fairy-tale operator introduction hocus-pocus). You need decent
> semantics.

CPU companies have laid out the decent semantics. I and others have
written bignum routines -- the carry flag is where its at as far as
this is concerned.
 
> >>>>>>>- var is set to the result of the addition; the remainder if
> >>>>>>>a carry occurs.
> >>>>>>
> >>>>>>What happens if the signedness of var, a, and b are not equal?
> >>>>
> >>>>>It just behaves like the ADC x86 assembly instruction, the
> >>>>>details of which I will not regurgitate here.
> >>>>
> >>>>So on adding a signed plus unsigned.... The carry gets a value
> >>>>"as if" the signed value was actually unsigned?
> >>
> >>>Yes. Note that x86 does not have a version of ADC that considers
> >>>the sign of its operands in any special way that is reflected in
> >>>the CF.
> >>
> >>Of course, but you have to know whether your operands are signed or
> >>unsigned to see whether the CF or the OF is the important flag.
> >
> > OF is never the important flag. Its not relevant at all.
>
> I guess that's why everything since the 4004 supports it.

That's because OF is used for <,>,<=,>= comparisons. Outside of
this, I have never seen OF being used for any purpose. I.e., I would
surmise that the only relevant usage of OF is already encoded in the
language semantics of C (and just about any other language with
integers.) But I could be wrong -- can you name at least one other
purpose which isn't really just an encoding of <,>,<=,>= ?
 
> You must really consider that there's more to life than multiplying big
> numbers (and that's coming from one of the few people that sees the fun
> in it). We're not going to extend C with an operator to fit this (or
> ANY) one purpose, however important it may be.

Carry flag comes up time and again. Just look it up on google:

    http://www.google.com/search?hl=en&lr=&ie=ISO-8859-
1&safe=off&c2coff=1&q=jc+ret+mov

Off the top of my head, there's chain shifting, bit counting, and
predication/masking.

> >>I'm still wondering about the high_order_value() and low_order_value()
> >>semantics I mentioned in my previous post, actually the part which I was
> >>most curious about.
> >
> > I wasn't able to suss out a coherent question from your post, so I just deleted
> > this discussion.
>
> ...And there I was thinking you had trouble answering it. What was I
> thinking?
>
> > In this case, I am just copying the x86's MUL -> EDX:EAX or
> > AMD64's MUL -> RDX:RAX semantics.
>
> So again (as with the carry vs. overflow), you're going to perform an
> instruction that works on UNsigned integers even on signed integers.

Yes.

> Pardon my French, but that's just plain stupid. Perhaps you should be a
> little less confident about your own understanding of bit-twiddling...

Perhaps you should consider that there's a reason every assembly
language in existence associated signedness with particular
*OPERATIONS* not the *OPERANDS*. Having signed and non-signed versions of
every operator may be something that fit C when it was first designed, but that
doesn't make it necessarily correct for all relevant computations.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/


Relevant Pages

  • Re: The Promise of Forth
    ... Do you think Ada and PL/1 would be as high as they are ... They can keep a language alive at the fringes. ... still another bunch of complications when the stack holds mixed types. ... For a different data type it would have to be *completely* rewritten: ...
    (comp.lang.forth)
  • Re: FORTH levels
    ... Most working on a collaborative project do not choose the programming language they are using: it is thrust upon them by the needs of the collaboration. ... When Iverson and Hui came up with J-- in part to remove APL's special character set and make it more "user friendly" not much of a community formed around it. ... But RPN does not require a visible stack, any more than any language requires a visible stack to rebuild its semantic trees from its flat expression. ...
    (comp.lang.forth)
  • Re: Statement on Schildt submitted to wikipedia today
    ... the C language definition clearly says otherwise. ... stack functionality at a minimum ... There was because it became unfashionably "racist" to speak of German ... have an appropriate newsgroups line in your header for your mail to be seen, ...
    (comp.lang.c.moderated)
  • Re: [OT] PostLisp, a language experiment
    ... a preconception that there isn't a language for which longer ... > latter method in Lisp, and C and many other languages so that the code ... >> apparently no runtime checks for stack mismatches, ... > the complexity and number of stack items for each definition. ...
    (comp.lang.lisp)
  • LSE64 - reference
    ... Strings are represented by arrays in which cell 0 is the number of characters and the remaining cells are ... Variables and arrays yield their addresses when executed. ... The flag register is separate from the stack. ...
    (comp.lang.forth)

Loading