Re: Is C99 the final C? (some suggestions)
From: Paul Hsieh (qed_at_pobox.com)
Date: 12/05/03
- Next message: Michael Steve: "Re: Are the functions "time" and "scandir" thread-safe?"
- Previous message: Ed Morton: "Re: when GOTO makes sense."
- In reply to: Sidney Cadot: "Re: Is C99 the final C? (some suggestions)"
- Next in thread: Dan Pop: "Re: Is C99 the final C? (some suggestions)"
- Reply: Dan Pop: "Re: Is C99 the final C? (some suggestions)"
- Reply: Sidney Cadot: "Re: Is C99 the final C? (some suggestions)"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Date: 4 Dec 2003 16:12:48 -0800
Sidney Cadot <sidney@jigsaw.nl> wrote:
> Paul Hsieh wrote:
> > In article <bqj1tg$rb3$1@news.tudelft.nl>, sidney@jigsaw.nl says...
> >>Paul Hsieh wrote:
> >>>Sidney Cadot <sidney@jigsaw.nl> wrote:
> >>>>[...] I for one would be happy if more compilers would
> >>>>fully start to support C99, [...]
>
> >>>I don't think that day will ever come. In its totality C99 is almost
> >>>completely worthless in real world environments. Vendors will be
> >>>smart to pick up restrict and few of the goodies in C99 and just stop
> >>>there.
> >>Want to take a bet...?
> >
> > Sure. Vendors are waiting to see what the C++ people do, because they
> > are well aware of the unreconcilable conflicts that have arisen. Bjarne
> > and crew are going to be forced to take the new stuff C99 in the bits and
> > pieces that don't cause any conflict or aren't otherwise stupid for other
> > reasons. The Vendors are going to look at this and decide that the
> > subset of C99 that the C++ people chose will be the least problematic
> > solution and just go with that.
>
> Ok. I'll give you 10:1 odds; there will be a (near-perfect) C99 compiler
> by the end of this decade.
A single vendor?!?! Ooooh ... try not to set your standards too high.
Obviously, it's well known that the gnu C++ people are basically converging
towards C99 compliance and are most of the way there already. That's not my
point. My point is: will Sun, Microsoft, Intel, MetroWerks, etc. join the
fray so that C99 is ubiquitous to the point of obsoleting all previous C's for
all practical purposes for the majority of developers? Maybe the Comeau guy
will join the fray to serve the needs of the "perfect spec compliance" market
that he seems to be interested in.
If not, then projects that have a claim of real portability will never
embrace C99 (like Lua, or Python, or the JPEG reference implementation, for
example.) Even the average developers will forgo the C99 features for fear
that someone will try to compile their stuff on an old compiler.
Look, nobody uses K&R-style function declarations anymore. The reason is
that the ANSI standard obsoleted them, and everyone picked up the ANSI
standard. That only happened because *EVERYONE* moved forward and picked up
the ANSI standard. One vendor is irrelevant.
> >>>If instead, the preprocessor were a lot more functional, then you
> >>>could simply extract packed offsets from a list of declarations and
> >>>literally plug them in as offsets into a char[] and do the slow memcpy
> >>>operations yourself.
> >>
> >>This would violate the division between preprocessor and compiler too
> >>much (the preprocessor would have to understand quite a lot of C semantics).
> >
> > No, that's not what I am proposing. I am saying that you should not use
> > structs at all, but you can use the contents of them as a list of comma
> > separated entries. With a more beefed up preprocessor one could find the
> > offset of a packed char array that corresponds to the nth element of the list
> > as a sum of sizeof()'s and you'd be off to the races.
>
> Perhaps I'm missing something here, but wouldn't it be easier to use the
> offsetof() macro?
It would be, but only if you have the packed structure mechanism. Other
people have posted indicating that in fact _Packed is more common than I
thought, so perhaps my suggestion is not necessary.
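For what it's worth, the "sum of sizeof()'s" idea can be approximated by hand in plain C today. The following is only a sketch, assuming a hypothetical three-field record (a tag, a length and a CRC; none of these names come from the thread) and the exact-width types from <stdint.h>. The offsets are written out manually, which is exactly the bookkeeping a stronger preprocessor would generate from the field list:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical record layout: uint8_t tag, uint32_t len, uint16_t crc,
   stored back to back in a char array with no padding. */
enum {
    OFF_TAG  = 0,
    OFF_LEN  = OFF_TAG + sizeof(uint8_t),
    OFF_CRC  = OFF_LEN + sizeof(uint32_t),
    REC_SIZE = OFF_CRC + sizeof(uint16_t)
};

/* The "slow memcpy" accessors: no alignment assumptions are made. */
static void put_len(unsigned char *rec, uint32_t len)
{
    memcpy(rec + OFF_LEN, &len, sizeof len);
}

static uint32_t get_len(const unsigned char *rec)
{
    uint32_t len;
    memcpy(&len, rec + OFF_LEN, sizeof len);
    return len;
}
```

A caller would declare "unsigned char rec[REC_SIZE];" and go through put_len()/get_len(). The byte order inside the array is whatever the host uses, so this is a packing sketch, not a wire format.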
> > C is a language suitable for
> > and highly encouraging of writing extremely unsound and poor code. Fixing it
> > would require a major overhaul of the language and library.
>
> That's true. I don't quite see how this relates to the preceding
> statement though.
I'm saying that trying to fix C's intrinsic problems shouldn't start or end
with some kind of resolution of call stack issues. Anyone who understands
machine architecture will not be surprised about call stack depth limitations.
There are far more pressing problems in the language that one would like to
fix.
> >>>There's a lot more that you can do as well. Such as a tryexpand()
> >>>function which works like realloc except that it performs no action
> >>>except returning with some sort of error status if the block cannot be
> >>>resized without moving its base pointer. Further, one would like to
> >>>be able to manage *multiple* heaps, and have a freeall() function --
> >>>it would make the problem of memory leaks much more manageable for
> >>>many applications. It would almost make some cases enormously faster.
> >>
> >>But this is perhaps territory that the Standard should steer clear of,
> >>more like something a well-written and dedicated third-party library
> >>could provide.
>
> > But a third party library can't do this portably.
>
> I don't see why not?
Explain to me how you implement malloc() in a *multithreaded* environment
portably. You could claim that C doesn't support multithreading, but I highly
doubt you're going to convince any vendor that they should shut off their
multithreading support based on this argument. By dictating its existence in
the library, it would put the responsibility of making it work right in the
hands of the vendor without affecting the C standard's stance on not
acknowledging the need for multithreading.
> > It's actual useful functionality that you just can't get from the C
> > language, and there's no way to reliably map such functionality to the C
> > language itself. One is forced to know the details of the underlying
> > platform to implement such things. It's something that really *should* be
> > in the language.
>
> Well, it looks to me you're proposing to have a feature-rich heap
> manager. I honestly don't see why this couldn't be implemented portably
> without platform-specific knowledge. Could you elaborate?
See my multithreading comment above. Also, efficient heaps are usually
written with a flat view of memory in mind. This is more or less impossible
in non-flat memory architectures (like segmented architectures.)
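To make the disagreement concrete, here is roughly what a third party can do in portable C: a heap object layered on top of malloc() with a freeall(). This is a minimal sketch under stated assumptions; the names heap, heap_alloc and heap_freeall are made up for illustration, and the header union is just one conservative way to keep the returned pointer aligned. Note what it does not do: there is no locking (the language offers no portable way to write one), and nothing like tryexpand(), which needs knowledge of the underlying allocator.

```c
#include <stdint.h>
#include <stdlib.h>

/* Per-block header; the extra members only force conservative alignment. */
typedef union header {
    union header *next;
    long double   align_ld;
    long long     align_ll;
} header;

/* A hypothetical heap: just a list of every block handed out so far. */
typedef struct {
    header *head;
} heap;

static void *heap_alloc(heap *h, size_t n)
{
    header *b;
    if (n > SIZE_MAX - sizeof *b)
        return NULL;                /* size computation would overflow */
    b = malloc(sizeof *b + n);
    if (b == NULL)
        return NULL;
    b->next = h->head;              /* remember the block for freeall */
    h->head = b;
    return b + 1;                   /* user memory follows the header */
}

/* Release every block that was allocated from this heap. */
static void heap_freeall(heap *h)
{
    header *b = h->head;
    while (b != NULL) {
        header *next = b->next;
        free(b);
        b = next;
    }
    h->head = NULL;
}
```

Typical use would be one heap per request or per parse, with a single heap_freeall() at the end instead of tracking each allocation. Whether it is safe to call from two threads depends entirely on the platform's malloc(), which is the point being argued.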
> [...] I want this more for reasons of orthogonality in design than anything
> else.
You want orthogonality in the C language? You must be joking ...
> > My proposal allows the programmer to decide what is or is not useful to them.
>
> I'm all for that.
Well, I'm a programmer, and I don't care about binary output -- how does your
proposal help me decide what I think is useful to me?
> >>> I think people in general would like to use printf for printing out
> >>> more than just the base types in a collection of just a few formats
> >>> defined at the whims of some 70s UNIX hackers. Why not be able to
> >>> print out your data structures, or relevant parts of them as you see
> >>> fit?
>
> I don't think it's too bad an idea (although I have never gotten round
> to trying the mechanism gcc provides for this). In any case, this kind
> of thing is so much more naturally done in a OOP-supporting language
> like C++. Without being belligerent: why not use that if you want this
> kind of thing?
Well, when I am programming in C++ I will use it. But I'm not going to move
all the way to using C++ just for this single purpose by itself.
> I used "%x" as an example of a format specifier that isn't defined ('x'
> being a placeholder for any letter that hasn't been taken by the
> standard). The statement is that there'd be only about 15 letters left
> for this kind of thing (including 'x' by the way -- it's not a hex
> specifier). Sorry for the confusion, I should've been clearer.
Well what's wrong with %@, %*, %_, %^, etc?
> >>>>* I think I would like to see a real string-type as a first-class
> >>>> citizen in C, implemented as a native type. But this would open
> >>>> up too big a can of worms, I am afraid, and a good case can be
> >>>> made that this violates the principles of C too much (being a
> >>>> low-level language and all).
> >>>
> >>>The problem is that real string handling requires memory handling.
> >>>The other primitive types in C are flat structures that are fixed
> >>>width. You either need something like C++'s constructor/destructor
> >>>semantics or automatic garbage collection otherwise you're going to
> >>>have some trouble with memory leaking.
> >>
> >>A very simple reference-counting implementation would suffice. [...]
> >
> > This would complexify the compiler to no end. It's also hard to account for a
> > reference that was arrived at via something like "memcpy".
>
> A first-class citizen string wouldn't be a pointer; neither would you
> necessarily be able to get its address (although you should be able to
> get the address of the characters it contains).
But a string has variable length. If you allow strings to be mutable, then
the actual sequence of characters has to be put into some kind of dynamic
storage somewhere. Either way, the base part of the string would in some way
have to be storable in, say, a struct. But you can copy a struct via
memcpy or however, and that requires a count increment, since there is
now an additional copy of the string. So how is memcpy supposed to know that
its contents contain a string that it needs to increase the ref count for?
Similarly, memset needs to know how to *decrease* such a ref count.
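The memcpy problem is easy to reproduce with a hand-rolled reference count. The sketch below is illustrative only: rcbuf, rcstring, rc_new, rc_copy and rc_release are hypothetical names, not any proposed language feature or existing library. A struct containing such a string can be duplicated with memcpy, and the count never finds out:

```c
#include <stdlib.h>
#include <string.h>

typedef struct { int refs; char *chars; } rcbuf;   /* shared storage     */
typedef struct { rcbuf *buf; } rcstring;           /* the "base part"    */

static rcstring rc_new(const char *s)
{
    rcstring r = { NULL };
    rcbuf *b = malloc(sizeof *b);
    if (b == NULL)
        return r;
    b->chars = malloc(strlen(s) + 1);
    if (b->chars == NULL) {
        free(b);
        return r;
    }
    strcpy(b->chars, s);
    b->refs = 1;
    r.buf = b;
    return r;
}

/* The "official" copy: bumps the count. */
static rcstring rc_copy(rcstring s)
{
    s.buf->refs++;
    return s;
}

/* Drops one reference and frees the storage when the last one goes. */
static void rc_release(rcstring *s)
{
    if (--s->buf->refs == 0) {
        free(s->buf->chars);
        free(s->buf);
    }
    s->buf = NULL;
}

struct record { int id; rcstring name; };
```

After "memcpy(&b, &a, sizeof a);" on two struct record objects, both a.name and b.name point at the same rcbuf while refs is still 1, so releasing either one leaves the other dangling. Only rc_copy() keeps the count honest, and nothing in the language forces anyone to use it.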
If you allow the base of the string itself to move (like those morons did in
the Safe C String Library) then simple things like:

    string *a, b;
    a = (string *) malloc (sizeof (string));
    *a = b;
    b = b + b + b; /* triple up b, presumably relocating the base */
    /* But now *a is undefined */

are just broken.
Look, the semantics of C just don't easily allow for a useful string primitive
that doesn't have impact on the memory model (i.e., leak if you aren't
careful.) Even the Better String Library (http://bstring.sf.net/) concedes
that the programmer has to diligently call bdestroy() to clean up after
themselves, otherwise you'll just leak.
> >>2) because it is quite unrelated (I don't get the 'instead')
>
> > I'm saying that you could have &&&, |||, but just don't define what they
> > actually do. Require that the programmer define what they do. C doesn't have
> > type-specific functions, and if one were to add in operator overloading in a
> > consistent way, then that would mean that an operator overload would have to
> > accept only its defined type.
>
> Ok, so the language should have a big bunch of operators, ready for the
> taking. Incidentally, Mathematica supports this, if you want it badly.
Hey, it's not me -- apparently it's people like you who want more operators.
My point is that no matter what operators get added to the C language, you'll
never satisfy everyone's appetites. People will just want more and more,
though almost nobody will want all of what could be added.
My solution solves the problem once and for all. You have all the operators
you want, with whatever semantics you want.
> > For this to be useful without losing the
> > operators that already exist in C, the right answer is to *ADD* operators. In
> > fact I would suggest that one simply define a grammar for such operators, and
> > allow *ALL* such operators to be definable.
>
> This seems to me a bad idea for a multitude of reasons. First, it would
> complicate most stages of the compiler considerably. Second, a
> maintenance nightmare ensues: while the standard operators of C are
> basically burnt into my soul, I'd have to get used to the Fantasy
> Operator Of The Month every time I take on a new project, originally
> programmed by someone else.
Yes, but if instead of actual operator overloading you only allow redefinition
of these new operators, there will not be any of the *surprise* factor. If
you see one of these new operators, you can just view it like you view an
unfamiliar function -- you'll look up its definition obviously.
> There's a good reason that we use things like '+' and '*' pervasively,
> in many situations; they are short, and easily absorbed in many
> contexts. Self-defined operator tokens (consisting, of course, of
> 'atomic' operators like '+', '=', '<' ...) will lead to unreadable code,
> I think; perhaps something akin to a complicated 'sed' script.
And allowing people to define their own functions with whatever names they
like doesn't lead to unreadable code? It's just the same thing. What makes
your code readable is adherence to an agreed upon coding standard that exists
outside of what the language defines.
> >>3) because operator overloading is mostly a bad idea, IMHO
>
> > Well, Bjarne Stroustrup has made a recent impassioned request to *REMOVE*
> > features from C++.
>
> Do you have a reference? That's bound to be a fun read, and he probably
> missed a few candidates.
It was just in the notes to some meeting Bjarne had in the last year or so to
discuss the next C++ standard. His quote was something like: while
adding a feature for C++ can have value, removing one would have even more
value. Maybe someone who is following the C++ standardization threads can
find a reference -- I just spent a few minutes on google and couldn't find it.
> > I highly doubt that operator overloading is one that has
> > been made or would be taken seriously. I.e., I don't think a credible
> > population of people who have been exposed to it would consider it a bad idea.
>
> I can only speak for myself; I have been exposed, and think it's a bad
> idea. When used very sparsely, it has its uses. However, introducing
> new user-definable operators as you propose would be folly; the only way
> operator overloading works in practice is if you maintain some sort of
> link to the intuitive meaning of an operator. User defined operators
> lack this by definition.
But so do user definable function names. Yet, functionally they are almost
the same.
> >>>While
> >>>this is sometimes a useful shorthand, I am sure that different
> >>>applications have different list cutesy compactions that would be
> >>>worth while instead of the one above.
> >>
> >>... I'd like to see them. &&& is a bit silly (it's fully equivalent to
> >>"a ? b : 0") but ||| (or ?: in gcc) is actually quite useful.
>
> > But there are no end of little cheesy operators that one could add. For
> > example, a <> b to swap a and b, a <<< b to rotate a by b bits, @ a to find the
> > highest bit of a, etc., etc., etc.
>
> "<>" would be a bad choice, since it is easy to confuse for "not equal
> to". I've programmed a bit in IDL for a while, which has my dear "min"
> and "max" operators.... It's a pity they are denoted "<" and ">",
> leading to heaps of misery by confusion.
>
> <<< and @ are nice though. I would be almost in favour of adding them,
> were it not for the fact that this would drive C dangerously close in
> the direction of APL.
You missed the "etc., etc., etc." part. I could keep coming up with them
until the cows come home: a! for factorial, a ^< b for "a choose b" (you want
language support for this because of overflow concerns of using the direct
definition), <-> a for endian swapping, $% a for the fractional part of a
floating point number, a +>> b for the average (there is another overflow
issue), etc., etc.
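The two overflow issues mentioned here are real and easy to show as ordinary functions. This is a sketch with made-up names (avg_floor and choose), for unsigned operands only. The average uses the identity a + b = 2*(a & b) + (a ^ b), so the sum is never formed; the binomial builds the result one factor at a time instead of dividing factorials, which keeps intermediates far smaller than the direct definition (they can still overflow for large inputs, just much later).

```c
#include <limits.h>

/* floor((a + b) / 2) without ever computing a + b. */
static unsigned avg_floor(unsigned a, unsigned b)
{
    return (a & b) + ((a ^ b) >> 1);
}

/* "n choose k" via the multiplicative formula.
   After step i, r holds C(n - k + i, i), so each division is exact. */
static unsigned long long choose(unsigned n, unsigned k)
{
    unsigned long long r = 1;
    unsigned i;

    if (k > n)
        return 0;
    if (k > n - k)
        k = n - k;                  /* use the smaller of k and n - k */
    for (i = 1; i <= k; i++)
        r = r * (n - k + i) / i;
    return r;
}
```

For example, choose(30, 15) is computed without trouble here, whereas the direct definition would need 30!, which does not fit in 64 bits.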
> > All of these are good, in some cases. And I think that there would be no
> > end to the number of useful operators that one might like to add to a
> > program. I think your proposal is DOA because you cannot make a credible
> > case as to why your operator in particular has any value over any of
> > number of other operators that you might like to add. Adding operator
> > overloading, however, would be a real extension and would in a sense
> > address *all* these issues.
>
> Again I wonder, seriously: wouldn't you be better off using C++ ?
No because I want *MORE* operators -- not just the ability to redefine the
ones I've got (and therefore lose some.)
> >>>>* 'min' and 'max' operators (following gcc: ?< and ?>)
> >>>
> >>>As I mentioned above, you might as well have operator overloading instead.
>
> Sure, but you're talking about something that goes a lot further than
> run-of-the-mill operator overloading. I think the simple way would be
> to just introduce these min and max operators and be done with it.
>
> "min" and "max" are perhaps less important than "+" and "*", but they
> are probably the most-used operations that are not available right now
> as operators. If we are going to extend C with new operators, they would
> be the most natural choice I think.
WATCOM C/C++ defined the macros min(a,b) and max(a,b) in some header files.
Why wouldn't the language just accept this? Is it because you want variable
length parameters? -- Well in that case, does my preprocessor extension
proposal start to look like it's making more sense?
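For reference, the usual way such macros are written is shown below. This is the common textbook form, not necessarily the exact text of the WATCOM headers. The helper next_value() is hypothetical and exists only to show the well-known catch: a macro argument with side effects gets evaluated twice, which a real operator or function would not do.

```c
/* Classic two-argument macros; every use of an argument is parenthesized. */
#define min(a, b) (((a) < (b)) ? (a) : (b))
#define max(a, b) (((a) > (b)) ? (a) : (b))

/* Hypothetical helper with a side effect: returns 1, 2, 3, ... */
static int calls;

static int next_value(void)
{
    return ++calls;
}
```

With calls starting at 0, the expression max(next_value(), 0) calls next_value() once for the comparison and once more for the result, so it yields 2 rather than 1.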
> >>Now I would ask you: which existing operator would you like to overload
> >>for, say, integers, to mean "min" and "max" ?
>
> > How about a <==> b for max and a >==< b for min? I personally don't care that
> > much.
>
> Those are not existing operators, as you know. They would have to be
> defined in your curious "operator definition" scheme.
>
> I find the idea freaky, yet interesting. I think C is not the place for
> this (really, it would be too easy to compete in the IOCCC) but perhaps
> in another language... Just to follow your argument for a bit, what
> would an "operator definition" declaration look like for, say, the "?<"
> min operator in your hypothetical extended C?
This is what I've posted elsewhere:
int _Operator ?< after + (int a, int b) {
    if (a < b) return a; /* ?< is the min operator: return the smaller */
    return b;
}
> > Yes I'm sure the same trick works for chars and shorts. So how do you
> > widen a long long multiply?!?!? What compiler trick are you going to
> > hope for to capture this? What you show here is just some trivial
> > *SMALL* multiply, that relies on the whims of the optimizer.
>
> Well, I'd show you, but it's impossible _in principle_. Given that you
> are multiplying two expressions of the widest type supported by your
> compiler, where would it store the result?
In two values of the widest type -- just like how just about every
microprocessor which has a multiply does it:
high *% low = a * b;
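For comparison, this is what one has to write in portable C to get the same high/low pair when the operands are already 64 bits wide. It is a sketch (mul_wide is a made-up name) of the schoolbook method on 32-bit halves: four multiplies plus carry bookkeeping, standing in for what is a single instruction on the processors under discussion.

```c
#include <stdint.h>

/* 64 x 64 -> 128 bit unsigned multiply, result returned as two halves. */
static void mul_wide(uint64_t a, uint64_t b, uint64_t *high, uint64_t *low)
{
    uint64_t a_lo = a & 0xFFFFFFFFu, a_hi = a >> 32;
    uint64_t b_lo = b & 0xFFFFFFFFu, b_hi = b >> 32;

    uint64_t p0 = a_lo * b_lo;      /* contributes to bits   0..63  */
    uint64_t p1 = a_lo * b_hi;      /* contributes to bits  32..95  */
    uint64_t p2 = a_hi * b_lo;      /* contributes to bits  32..95  */
    uint64_t p3 = a_hi * b_hi;      /* contributes to bits  64..127 */

    /* Sum of the three pieces that land in bits 32..63; cannot overflow. */
    uint64_t mid = (p0 >> 32) + (p1 & 0xFFFFFFFFu) + (p2 & 0xFFFFFFFFu);

    *low  = (mid << 32) | (p0 & 0xFFFFFFFFu);
    *high = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
}
```

A call such as mul_wide(a, b, &high, &low) plays the role of the "high *% low = a * b;" line above. Whether a given compiler recognizes this pattern and emits the hardware instruction is up to its optimizer.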
> > PowerPC, Alpha, Itanium, UltraSPARC and AMD64 all have widening multiplies that
> > take two 64 bit operands and return a 128 bit result in a pair of 64 bit
> > operands. They all invest a *LOT* of transistors to do this *ONE* operation.
> > They all *KNOW* you can't finagle any C/C++ compiler to produce the operation,
> > yet they still do it -- it's *THAT* important (hint: SSL, and therefore *ALL* of
> > e-commerce, uses it.)
>
> Well, I don't know if these dozen-or-so big-number 'powermod' operations
> that are needed to establish an SSL connection are such a big deal as
> you make it.
It's not me -- it's Intel, IBM, Motorola, Sun and AMD who seem to be obsessed
with these instructions. Of course Amazon, Yahoo and Ebay and most banks are
kind of obsessed with them too, even if they don't know it.
> >>>Probably because most languages have been written on top of C or C++.
> >>>And what about a simple carry capturing addition?
> >>
> >>Many languages exists where this is possible, they are called
> >>"assembly". There is no way that you could come up with a well-defined
> >>semantics for this.
>
> > carry +< var = a + b;
>
> It looks cute, I'll give you that. Could you please provide semantics?
> It may be a lot less self evident than you think.
How about:
- carry is set to either 1 or 0, depending on whether or not a + b overflows
  (just follow the 2s complement rules if one of a or b is negative.)
- var is set to the result of the addition; the remainder if a carry occurs.
- The whole expression (if you put the whole thing in parentheses) returns
  the result of carry.
+< would not be an operator in and of itself -- the whole syntax is required.
For example: c +< v = a * b would just be a syntax error. The "cuteness" was
stolen from an idea I saw in some ML syntax. Obviously +< - would also be
useful.
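Restricted to unsigned operands, those semantics can be written down in existing C as follows. This is only a sketch: add_carry and add_limbs are hypothetical names, 32-bit words are an arbitrary choice, and the signed case described above is not covered. The second function shows the usual reason for wanting the carry, namely chaining it through a multi-word addition.

```c
#include <stddef.h>
#include <stdint.h>

/* Stores a + b (reduced modulo 2^32) in *var and returns the carry, 0 or 1. */
static unsigned add_carry(uint32_t a, uint32_t b, uint32_t *var)
{
    *var = a + b;
    return *var < a;        /* the sum wrapped exactly when it is below a */
}

/* dst = x + y over n words, least significant word first.
   Returns the carry out of the top word. */
static unsigned add_limbs(uint32_t *dst, const uint32_t *x,
                          const uint32_t *y, size_t n)
{
    unsigned carry = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        uint32_t t;
        unsigned c1 = add_carry(x[i], y[i], &t);
        unsigned c2 = add_carry(t, carry, &dst[i]);
        carry = c1 | c2;    /* at most one of the two steps can carry */
    }
    return carry;
}
```

Each word costs two additions and two comparisons here, where an add-with-carry instruction would do it in one step; getting a compiler to collapse this back into that instruction is not something the language guarantees.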
> >>Did you know that a PowerPC processor doesn't have a shift-right where
> >>you can capture the carry bit in one instruction? Silly but no less true.
>
> > What has this got to do with anything? Capturing carries coming out of
> > shifts don't show up in any significant algorithms that I am aware of
> > that are significantly faster than using what we have already.
>
> Ah, I see you've never implemented a non-table-driven CRC or a binary
> greatest common divisor algorithm.
You can find a binary gcd algorithm that I wrote here:
http://www.pobox.com/~qed/32bprim.c
You will notice how I don't use or care about carries coming out of a right
shift. There wouldn't be enough of a savings to matter.
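For readers without access to that file, here is a generic textbook binary gcd. It is not the code at the URL above, just a sketch of the standard algorithm (binary_gcd is a made-up name) to show that it is built from shifts, comparisons and subtractions, with no use for the bit shifted out on the right.

```c
#include <stdint.h>

/* Binary (Stein) gcd on 32-bit unsigned values. */
static uint32_t binary_gcd(uint32_t a, uint32_t b)
{
    unsigned shift = 0;

    if (a == 0) return b;
    if (b == 0) return a;

    /* Factor out the powers of two common to both operands. */
    while (((a | b) & 1u) == 0) {
        a >>= 1;
        b >>= 1;
        shift++;
    }
    /* Make a odd; it stays odd from here on. */
    while ((a & 1u) == 0)
        a >>= 1;

    while (b != 0) {
        while ((b & 1u) == 0)       /* drop factors of two from b */
            b >>= 1;
        if (a > b) {                /* keep a <= b */
            uint32_t t = a;
            a = b;
            b = t;
        }
        b -= a;                     /* odd - odd is even (or zero) */
    }
    return a << shift;              /* restore the common powers of two */
}
```

Every right shift is preceded by a test of the low bit, so the bit that falls off is already known to be zero and capturing it would add nothing.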
> [...] They are both hard at work when you establish an SSL connection.
>
> > The specific operations I am citing make a *HUGE* difference and have billion
> > dollar price tags associated with them.
>
> These numbers you made up from thin air, no? otherwise, I'd welcome a
> reference.
Widening multiplies cost transistors on the CPU. The hardware algorithms are
variations of your basic public school multiply algorithm -- so it takes on
the order of n^2 transistors to perform the complete operation, where n is
the largest bit word that the machine accepts for the multiplier. If the
multiply were not widened they could save half of those transistors. So
multiply those extra transistors by the number of CPUs shipped with a
widening multiply (PPC, x86s, Alphas, UltraSparcs, ... etc) and you easily
end up in the billion dollar range.
> > I understand the need for the C language standard to be applicable to as
> > many platforms as possible. But unlike some right shift detail that you
> > are talking about, the widening multiply hardware actually *IS* deployed
> > everywhere.
>
> Sure is. Several good big-number libraries are available that have
> processor-dependent machine code to do just this.
And that's the problem. They have to be hand written in assembly. Consider
just the SWOX Gnu multiprecision library. When the Itanium was introduced,
Intel promised that it would be great for e-commerce. The problem is that
the SWOX guys were having a hard time with IA64 assembly language (as
apparently lots of people are.) So they projected performance results for
the Itanium without having code available to do what they claim. So people
who wanted to consider using an Itanium system based on its performance for
e-commerce were stuck -- they had no code, and had to believe Intel's claims,
or SWOX's as to what the performance would be.
OTOH, if instead, the C language had exposed a carry propagating add, and a
widening multiply in the language, then it would just be up to the Intel
*compiler* people to figure out how to make sure the widening multiply was
used optimally, and the SWOX/GMP people would just do a recompile for baseline
results at least.
-- Paul Hsieh http://www.pobox.com/~qed/ http://bstring.sf.net/