Re: Is C99 the final C? (some suggestions)

From: Paul Hsieh (qed_at_pobox.com)
Date: 12/03/03


Date: Wed, 03 Dec 2003 20:58:13 GMT

In article <bqj1tg$rb3$1@news.tudelft.nl>, sidney@jigsaw.nl says...
> Paul Hsieh wrote:
> > Sidney Cadot <sidney@jigsaw.nl> wrote:
> >>[...] I for one would be happy if more compilers would
> >>fully start to support C99, It will be a good day when I can actually
> >>start to use many of the new features without having to worry about
> >>portability too much, as is the current situation.
>
> > I don't think that day will ever come. In its totality C99 is almost
> > completely worthless in real world environments. Vendors will be
> > smart to pick up restrict and a few of the goodies in C99 and just stop
> > there.
>
> Want to take a bet...?

Sure. Vendors are waiting to see what the C++ people do, because they are well
aware of the irreconcilable conflicts that have arisen. Bjarne and crew are
going to be forced to take the new stuff in C99 in the bits and pieces that
don't cause any conflict or aren't otherwise stupid for other reasons. The vendors
are going to look at this and decide that the subset of C99 that the C++ people
chose will be the least problematic solution and just go with that.
 
> > If instead, the preprocessor were a lot more functional, then you
> > could simply extract packed offsets from a list of declarations and
> > literally plug them in as offsets into a char[] and do the slow memcpy
> > operations yourself.
>
> This would violate the division between preprocessor and compiler too
> much (the preprocessor would have to understand quite a lot of C semantics).

No, that's not what I am proposing. I am saying that you should not use
structs at all, but you can use the contents of them as a list of comma
separated entries. With a more beefed up preprocessor one could find the
offset into a packed char array that corresponds to the nth element of the list
as a sum of sizeof()'s and you'd be off to the races. So I am not proposing
that the preprocessor know anything more about the C language at all. I am
instead proposing that it be better at what it *does* know about -- numbers,
macros, and various C-language compatible tokens.
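
For what it's worth, part of this can already be approximated with today's
preprocessor: an X-macro field list plus an enum whose enumerators accumulate
sizeof() values. A minimal sketch, assuming a made-up three-field record;
RECORD_FIELDS, the field names and the put_/get_ accessors are all hypothetical,
and this is a workaround, not the beefed up preprocessor being proposed:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record: the field list stands in for a struct declaration. */
#define RECORD_FIELDS(X)  \
    X(uint8_t,  tag)      \
    X(uint32_t, length)   \
    X(uint16_t, checksum)

/* Each name_OFF is the sum of the sizes of the fields before it.
   name_END is a helper enumerator so the next offset continues correctly. */
enum {
#define X(type, name) name##_OFF, name##_END = name##_OFF + sizeof(type) - 1,
    RECORD_FIELDS(X)
#undef X
    RECORD_SIZE
};

/* Accessors that do the slow memcpy in and out of the packed char array. */
#define X(type, name)                                         \
    static void put_##name(unsigned char *buf, type v) {      \
        assert(buf != NULL);                                  \
        memcpy(buf + name##_OFF, &v, sizeof v);               \
    }                                                         \
    static type get_##name(const unsigned char *buf) {        \
        type v;                                               \
        memcpy(&v, buf + name##_OFF, sizeof v);               \
        return v;                                             \
    }
RECORD_FIELDS(X)
#undef X
```

With 1-, 4- and 2-byte fields the offsets come out as 0, 1 and 5 and
RECORD_SIZE as 7, so "unsigned char buf[RECORD_SIZE]" holds one packed record.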
 
> >>* a clear statement concerning the minimal level of active function
> >> calls invocations that an implementation needs to support.
> >> Currently, recursive programs will stackfault at a certain point,
> >> and this situation is not handled satisfactorily in the standard
> >> (it is not addressed at all, that is), as far as I can tell.
>
> > That doesn't seem possible. The amount of "stack" that an
> > implementation might use for a given function is clearly not easy to
> > define. Better to just leave this loose.
>
> It's not easy to define, that's for sure. But to call into recollection
> a post from six weeks ago: [...] ...This is legal C (as per the Standard),
> but it overflows the stack on any implementation (which is usually a
> symptom of UB). Why is there no statement in the standard that even so much
> as hints at this?

isgraph(-1) is also legal C -- *SYNTACTICALLY*. There is no end of problems
with the C programming environment. To gripe about runtime stack depth
limitations alone I think is kind of pointless. C is a language suitable for
and highly encouraging of writing extremely unsound and poor code. Fixing it
would require a major overhaul of the language and library.
 
> >>* a library function that allows the retrieval of the size of a memory
> >> block previously allocated using "malloc"/"calloc"/"realloc" and
> >> friends.
> >
> > There's a lot more that you can do as well. Such as a tryexpand()
> > function which works like realloc except that it performs no action
> > except returning with some sort of error status if the block cannot be
> > resized without moving its base pointer. Further, one would like to
> > be able to manage *multiple* heaps, and have a freeall() function --
> > it would make the problem of memory leaks much more manageable for
> > many applications. It would also make some cases enormously faster.
>
> But this is perhaps territory that the Standard should steer clear of,
> more like something a well-written and dedicated third-party library
> could provide.

But a third party library can't do this portably. It's actual useful
functionality that you just can't get from the C language, and there's no way
to reliably map such functionality to the C language itself. One is forced to
know the details of the underlying platform to implement such things. It's
something that really *should* be in the language.
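
To make the interface concrete, here is a rough sketch of a multiple-heap API
with a freeall(), layered on top of malloc. The names heap_t, heap_alloc,
heap_size and heap_freeall are invented for illustration. Note what this
portable version cannot do: it pays for a per-block header to remember sizes
that the underlying allocator typically tracks already, and there is no way to
write tryexpand() on top of it, which is the point being made above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Header stored in front of every block. The extra union members are only
   there to force an alignment suitable for common types (an assumption). */
typedef union block {
    struct { union block *next; size_t size; } hdr;
    long double align_ld;
    long long align_ll;
    void *align_ptr;
} block_t;

/* One independent heap: a singly linked list of live blocks. */
typedef struct { block_t *head; } heap_t;

void *heap_alloc(heap_t *h, size_t n) {
    block_t *b;
    assert(h != NULL);
    if (n > (size_t)-1 - sizeof *b) return NULL;   /* size would overflow */
    b = malloc(sizeof *b + n);
    if (b == NULL) return NULL;
    b->hdr.size = n;
    b->hdr.next = h->head;
    h->head = b;
    return b + 1;                                  /* user data follows header */
}

/* Size originally requested for a block returned by heap_alloc(). */
size_t heap_size(const void *p) {
    assert(p != NULL);
    return ((const block_t *)p - 1)->hdr.size;
}

/* Release every block in the heap at once; returns how many were freed. */
size_t heap_freeall(heap_t *h) {
    size_t freed = 0;
    assert(h != NULL);
    while (h->head != NULL) {
        block_t *next = h->head->hdr.next;
        free(h->head);
        h->head = next;
        freed++;
    }
    return freed;
}
```

A heap is started with "heap_t h = { NULL };" and everything allocated from it
goes away in one heap_freeall(&h) call.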
 
> >>* a #define'd constant in stdio.h that gives the maximal number of
> >> characters that a "%p" format specifier can emit. Likewise, for
> >> other format specifiers such as "%d" and the like.
> >>
> >>* a printf format specifier for printing numbers in base-2.
>
> > Ah -- the kludge request.
>
> I'd rather see this as filling in a gaping hole.
>
> > Rather than adding format specifiers one at
> > a time, why not instead add in a way of being able to plug in
> > programmer-defined format specifiers?
>
> Because that's difficult to get right (unlike a proposed binary output
> form).

There are sources for snprintf available that can do it. You are asking for
this feature because you think it would be useful *FOR YOU*. I convert hex to
binary in my head with barely a thought and would rather use the screen space
for more pertinent things, so it would not be useful for me. My proposal
allows the programmer to decide what is or is not useful to them.
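
As a rough illustration of what pluggable specifiers could look like, here is a
toy formatter with a handler table. Everything in it (xformat, fmt_register,
fmt_fn, fmt_binary) is a hypothetical name, it is not how any real printf
works, and it handles no flags or widths. The base-2 request then becomes one
handler a programmer registers, not something baked into the library.

```c
#include <assert.h>
#include <limits.h>
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* A handler pulls its own argument from the list and writes at most cap
   characters to out, returning how many it wrote. */
typedef size_t (*fmt_fn)(char *out, size_t cap, va_list *ap);

static fmt_fn handlers[UCHAR_MAX + 1];

void fmt_register(unsigned char spec, fmt_fn fn) { handlers[spec] = fn; }

/* Example handler: an unsigned value in base 2, no leading zeros. */
size_t fmt_binary(char *out, size_t cap, va_list *ap) {
    unsigned v = va_arg(*ap, unsigned);
    char tmp[sizeof v * CHAR_BIT];
    size_t n = 0, i;
    do { tmp[n++] = (char)('0' + (v & 1u)); v >>= 1; } while (v != 0);
    for (i = 0; i < n && i < cap; i++) out[i] = tmp[n - 1 - i];
    return i;
}

/* Copies fmt to out, dispatching "%<c>" to the handler registered for c.
   Output is truncated to fit and always NUL-terminated. */
size_t xformat(char *out, size_t cap, const char *fmt, ...) {
    va_list ap;
    size_t len = 0;
    assert(out != NULL && fmt != NULL && cap > 0);
    va_start(ap, fmt);
    for (; *fmt != '\0' && len + 1 < cap; fmt++) {
        fmt_fn fn;
        if (*fmt != '%' || fmt[1] == '\0') { out[len++] = *fmt; continue; }
        fn = handlers[(unsigned char)*++fmt];
        if (fn != NULL) len += fn(out + len, cap - 1 - len, &ap);
        else out[len++] = *fmt;        /* unknown specifier: emit it literally */
    }
    va_end(ap);
    out[len] = '\0';
    return len;
}
```

After fmt_register('b', fmt_binary), a call such as
xformat(buf, sizeof buf, "x=%b!", 10u) leaves "x=1010!" in buf.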
 
> > I think people in general would
> > like to use printf for printing out more than just the base types in a
> > collection of just a few formats defined at the whims of some 70s UNIX
> > hackers. Why not be able to print out your data structures, or
> > relevant parts of them as you see fit?
>
> The %x format specifier mechanism is perhaps not a good way to do this,
> if only because it would only allow something like 15 extra output formats.

I'm not sure what you are saying here. You all of a sudden don't like the hex
printing format? And why is having more, user definable print formats a bad
thing?
 
> >>* I think I would like to see a real string-type as a first-class
> >> citizen in C, implemented as a native type. But this would open
> >> up too big a can of worms, I am afraid, and a good case can be
> >> made that this violates the principles of C too much (being a
> >> low-level language and all).
> >
> > The problem is that real string handling requires memory handling.
> > The other primitive types in C are flat structures that are fixed
> > width. You either need something like C++'s constructor/destructor
> > semantics or automatic garbage collection otherwise you're going to
> > have some trouble with memory leaking.
>
> A very simple reference-counting implementation would suffice. [...]

This would complexify the compiler to no end. It's also hard to account for a
reference that was arrived at via something like "memcpy".

> >>* Normative statements on the upper-bound worst-case asymptotic
> >> behavior of things like qsort() and bsearch() would be nice.
> >
> > Yeah, it would be nice to catch up to where the C++ people have gone
> > some years ago.
>
> I don't think it is a silly idea to have some consideration for
> worst-case performance in the standard, especially for algorithmic
> functions (of which qsort and bsearch are the most prominent examples).

Perhaps you misunderstand me. The fact that the C committee *DIDN'T* do this is
an abomination. STL includes some kind of sorting mechanisms which are now
guaranteed to be O(n*log(n)) because of the existence of an algorithm called
"INTROSORT" (which is really just a quicksort that aborts when it realizes it's
going too slow, and switches to heapsort -- but the authors think it's clever
because they do this determination recursively.)
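
For readers who have not seen it, the idea fits in a page. The following is a
minimal sketch for int arrays, assuming a recursion budget of roughly
2*log2(n) levels and a middle-element pivot; it is not the STL's code, and
intro_sort and its helpers are illustrative names.

```c
#include <assert.h>
#include <stddef.h>

static void swap_int(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* Restore the max-heap property for the subtree rooted at root. */
static void sift_down(int *a, size_t root, size_t n) {
    for (;;) {
        size_t child = 2 * root + 1;
        if (child >= n) return;
        if (child + 1 < n && a[child] < a[child + 1]) child++;
        if (a[root] >= a[child]) return;
        swap_int(&a[root], &a[child]);
        root = child;
    }
}

static void heap_sort(int *a, size_t n) {
    size_t i;
    for (i = n / 2; i-- > 0; ) sift_down(a, i, n);
    for (i = n; i-- > 1; ) { swap_int(&a[0], &a[i]); sift_down(a, 0, i); }
}

/* Quicksort that gives up and calls heapsort once the budget runs out. */
static void intro_rec(int *a, size_t n, unsigned depth) {
    while (n > 1) {
        int pivot;
        size_t store = 0, i;
        if (depth-- == 0) { heap_sort(a, n); return; }
        swap_int(&a[n / 2], &a[n - 1]);          /* middle element as pivot */
        pivot = a[n - 1];
        for (i = 0; i + 1 < n; i++)
            if (a[i] < pivot) swap_int(&a[i], &a[store++]);
        swap_int(&a[store], &a[n - 1]);          /* pivot to final position */
        intro_rec(a, store, depth);              /* recurse on the left part */
        a += store + 1;                          /* loop on the right part */
        n -= store + 1;
    }
}

void intro_sort(int *a, size_t n) {
    unsigned limit = 0;
    size_t m;
    assert(a != NULL || n == 0);
    for (m = n; m > 1; m >>= 1) limit += 2;      /* about 2*log2(n) */
    intro_rec(a, n, limit);
}
```

The budget caps the number of quicksort partitioning levels at O(log n), and
whatever is left over is finished by heapsort in O(n*log(n)), which is where
the worst-case guarantee comes from.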
 
> >>* a "reverse comma" type expression, for example denoted by
> >> a reverse apostrophe, where the leftmost value is the value
> >> of the entire expression, but the right-hand side is also
> >> guaranteed to be executed.
> >
> > This seems too esoteric.
>
> Why is it any more esoteric than having a comma operator?

I didn't say it was. I've never used the comma operator outside of an occasional
extra command at the end of the increment statement in a for loop in my life.
I consider comma to be esoteric as well.
 
> >>* triple-&& and triple-|| operators: &&& and ||| with semantics
> >> like the 'and' and 'or' operators in python:
> >>
> >> a &&& b ---> if (a) then b else a
> >> a ||| b ---> if (a) then a else b
> >>
> >> (I think this is brilliant, and actually useful sometimes).
> >
> > Hmmm ... why not instead have ordinary operator overloading?
>
> I'll provide three reasons.
>
> 1) because it is something completely different

Yeah, it's a superset that has been embraced by the C++ community.

> 2) because it is quite unrelated (I don't get the 'instead')

I'm saying that you could have &&&, |||, but just don't define what they
actually do. Require that the programmer define what they do. C doesn't have
type-specific functions, and if one were to add in operator overloading in a
consistent way, then that would mean that an operator overload would have to
accept only its defined type. For this to be useful without losing the
operators that already exist in C, the right answer is to *ADD* operators. In
fact I would suggest that one simply define a grammar for such operators, and
allow *ALL* such operators to be definable.

> 3) because operator overloading is mostly a bad idea, IMHO

Well, Bjarne Stroustrup has made a recent impassioned request to *REMOVE*
features from C++. I highly doubt that removing operator overloading is a
request that has been made or would be taken seriously. I.e., I don't think a
credible population of people who have been exposed to it would consider it a
bad idea.
 
> > While
> > this is sometimes a useful shorthand, I am sure that different
> > applications have different list cutesy compactions that would be
> > worth while instead of the one above.
>
> ... I'd like to see them. &&& is a bit silly (it's fully equivalent to
> "a ? b : 0") but ||| (or ?: in gcc) is actually quite useful.

But there are no end of little cheesy operators that one could add. For
example, a <> b to swap a and b, a <<< b to rotate a by b bits, @ a to find the
highest bit of a, etc., etc., etc. All of these are good, in some cases. And
I think that there would be no end to the number of useful operators that one
might like to add to a program. I think your proposal is DOA because you
cannot make a credible case as to why your operator in particular has any value
over any number of other operators that you might like to add.

Adding operator overloading, however, would be a real extension and would in a
sense address *all* these issues.
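
For comparison, this is how two of the operations named above are spelled in
today's C, as plain functions with no operator syntax. A small sketch; rotl32
and highest_bit are illustrative names, not anything standard, and 32-bit
operands are an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* "a <<< b": rotate a left by b bits. The masks keep every shift count in
   the range 0..31, so a rotate by 0 or 32 is well defined. */
uint32_t rotl32(uint32_t a, unsigned b) {
    b &= 31u;
    return (a << b) | (a >> ((32u - b) & 31u));
}

/* "@ a": index of the highest set bit (0 for the least significant bit).
   The result is meaningless for 0, so that case is rejected. */
unsigned highest_bit(uint32_t a) {
    unsigned pos = 0;
    assert(a != 0);
    while ((a >>= 1) != 0) pos++;
    return pos;
}
```

Each new operator of this kind means another name to invent and another call
syntax to read, which is the gap user-definable operators would close.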
 
> >>* a way to "bitwise invert" a variable without actually
> >> assigning, complementing "&=", "|=", and friends.
> >
> > Is a ~= a really that much of a burden to type?
>
> It's more a strain on the brain to me, why there are coupled
> assignment/operators for nigh all binary operators, but not for this
> unary one.

Ok, but then again this is just a particular thing with you.
 
> >>* 'min' and 'max' operators (following gcc: ?< and ?>)
> >
> > As I mentioned above, you might as well have operator overloading instead.
>
> Now I would ask you: which existing operator would you like to overload
> for, say, integers, to mean "min" and "max" ?

How about a <==> b for max and a >==< b for min? I personally don't care that
much.
 
> >>* a div and matching mod operator that round to -infinity,
> >> to complement the current less useful semantics of rounding
> >> towards zero.
>
> > Well ... but this is the very least of the kinds of arithmetic operator
> > extensions that one would want. A widening multiply operation is
> > almost *imperative*. It always floors me that other languages are not
> > picking this up. Nearly every modern microprocessor in existence has
> > a widening multiply operation -- because the CPU manufacturers *KNOW*
> > it's necessary. And yet it's not accessible from any language.
>
> ...It already is available in C, given a good-enough compiler. Look at
> the code gcc spits out when you do:
>
> unsigned long a = rand();
> unsigned long b = rand();
>
> unsigned long long c = (unsigned long long)a * b;

Yes I'm sure the same trick works for chars and shorts. So how do you widen a
long long multiply?!?!? What compiler trick are you going to hope for to
capture this? What you show here is just some trivial *SMALL* multiply, that
relies on the whims of the optimizer.

PowerPC, Alpha, Itanium, UltraSPARC and AMD64 all have widening multiplies that
take two 64 bit operands and return a 128 bit result in a pair of 64 bit
operands. They all invest a *LOT* of transistors to do this *ONE* operation.
They all *KNOW* you can't finagle any C/C++ compiler to produce the operation,
yet they still do it -- it's *THAT* important (hint: SSL, and therefore *ALL* of
e-commerce, uses it.)
 
> > Probably because most languages have been written on top of C or C++.
> > And what about a simple carry capturing addition?
>
> Many languages exist where this is possible, they are called
> "assembly". There is no way that you could come up with a well-defined
> semantics for this.

carry +< var = a + b;
 
> Did you know that a PowerPC processor doesn't have a shift-right where
> you can capture the carry bit in one instruction? Silly but no less true.

What has this got to do with anything? Capturing carries coming out of shifts
doesn't show up in any significant algorithms that I am aware of that are
significantly faster than using what we have already. The specific operations
I am citing make a *HUGE* difference and have billion dollar price tags
associated with them.

I understand the need for the C language standard to be applicable to as many
platforms as possible. But unlike some right shift detail that you are talking
about, the widening multiply hardware actually *IS* deployed everywhere.
 

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

