Re: Is C99 the final C? (some suggestions)
From: Paul Hsieh (qed_at_pobox.com)
Date: 12/04/03
- Next message: CBFalconer: "Re: AMD opteron 64"
- Previous message: Al Bowers: "Re: Getting address of structure to string array / pointer problem"
- In reply to: Keith Thompson: "Re: Is C99 the final C? (some suggestions)"
- Next in thread: Keith Thompson: "Re: Is C99 the final C? (some suggestions)"
- Reply: Keith Thompson: "Re: Is C99 the final C? (some suggestions)"
- Reply: Sidney Cadot: "Re: Is C99 the final C? (some suggestions)"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Date: Thu, 04 Dec 2003 13:58:20 GMT
In article <lnk75encj9.fsf@nuthaus.mib.org>, kst-u@mib.org says...
> qed@pobox.com (Paul Hsieh) writes:
> > Sidney Cadot <sidney@jigsaw.nl> wrote:
> [...]
> > > * support for a "packed" attribute to structs, guaranteeing that no
> > > padding occurs.
> >
> > Indeed, this is something I use on the x86 all the time. The problem
> > is that on platforms like UltraSparc or Alpha, this will either
> > inevitably lead to BUS errors, or extremely slow performing code.
> >
> > If instead, the preprocessor were a lot more functional, then you
> > could simply extract packed offsets from a list of declarations and
> > literally plug them in as offsets into a char[] and do the slow memcpy
> > operations yourself.
>
> Obviously an implementation of packed structures is useless if it
> leads to bus errors.
>
> There's ample precedent in other languages (Pascal and Ada at least)
> for packed structures. [...] You can't sensibly take the address of
> packed_obj.i. A function that takes an "int*" argument will likely die if
> you give it a misaligned pointer (unless you want to allow _Packed as an
> attribute for function arguments). The simplest approach would be to forbid
> taking the address of a member of a packed structure (think of the members
> as fat bit fields). [...]
Then what would be the point of even calling it a "struct"? This is what I am
saying -- it leads to bus errors because of the rest of the language concepts
like taking the address of any value that is stored in a memory location.
> Another possibility (ugly but perhaps useful) is to make the address of a
> member of a packed field yield a void*.
No -- the problem is with the BUS error itself. The C language doesn't need
*EVEN MORE* ways of creating UB with otherwise acceptable syntax. This is more
than just ugly; it is very, very anti-intuitive.
The right answer is to give such pointers a special attribute, like
"_Unaligned" (or simply reuse "_Packed"). The compiler would then enforce type
safety in the following way: a non-"_Unaligned" pointer may not accept the
value of an "_Unaligned" pointer, but the conversion in the other direction is
allowed. Certain functions like memcpy and memmove would then be declared with
these "_Unaligned" decorators. But programmers could go ahead and use the
decorator themselves so that unaligned accesses could be propagated arbitrarily
down a call stack in a well-defined and programmer-controlled way. This would
precisely encapsulate what the programmer is trying to do without allowing the
compiler to produce unexpected BUS errors. Attempts to use an unaligned
pointer as an aligned one will be caught at compile time -- the perfect solution.
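For comparison, here is roughly what the manual "offsets into a char[] plus slow memcpy" approach mentioned earlier looks like in C as it stands today. This is only a sketch: the record layout and the names REC_TAG_OFF, REC_VALUE_OFF, put_u32 and get_u32 are made up for illustration and are not part of any proposal in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical packed layout: a 1-byte tag followed immediately by a
   32-bit value, so the value sits at the misaligned offset 1. */
enum { REC_TAG_OFF = 0, REC_VALUE_OFF = 1, REC_SIZE = 5 };

/* Store a 32-bit value at any byte offset without ever forming a
   (possibly misaligned) uint32_t pointer. */
static void put_u32(unsigned char *rec, size_t off, uint32_t v)
{
    assert(rec != NULL);
    memcpy(rec + off, &v, sizeof v);
}

/* Load a 32-bit value from any byte offset; memcpy works on bytes and
   so has no alignment requirement. */
static uint32_t get_u32(const unsigned char *rec, size_t off)
{
    uint32_t v;
    assert(rec != NULL);
    memcpy(&v, rec + off, sizeof v);
    return v;
}
```

The cost is exactly the one complained about above: the offsets have to be maintained by hand, because nothing in the language derives them from a list of declarations.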
> > > * upgraded status of enum types (they are currently quite
> > > interchangeable with ints); deprecation of implicit casts from
> > > int to enum (perhaps supported by a mandatory compiler warning).
> >
> > I agree. Enums, as far as I can tell, are almost useless from a
> > compiler assisted code integrity point of view because of the
> > automatic coercion between ints and enums. It's almost not worth
> > bothering to ever use an enum for any reason because of it.
>
> I don't think enums can be repaired without breaking tons of existing
> code. And they are useful as currently defined for defining names for
> a number of distinct integer values. If you want Pascal-like
> enumeration types, you'd need a new construct -- but I think having
> two distinct kinds of enumeration types would be too ugly for new
> users.
How about another decorator? Like: enum _Strict ____ {...}; ? Basically the
language would not auto-convert such enums to ints at all without an explicit
cast. Once again, under programmer control.
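Without a "_Strict" decorator, the closest approximation in current C is probably to wrap the enum in a one-member struct, so that the compiler rejects any conversion to or from int that is not spelled out. A minimal sketch, with hypothetical names (color, color_from_int, color_to_int) chosen only for the example:

```c
#include <assert.h>

/* Hypothetical example type: wrapping the enum in a struct means a
   plain int can no longer be assigned to it, nor it to an int. */
enum color_raw { COLOR_RED, COLOR_GREEN, COLOR_BLUE };
typedef struct { enum color_raw v; } color;

/* The only way in from an int: explicit, and range checked. */
static color color_from_int(int i)
{
    color c;
    assert(i >= COLOR_RED && i <= COLOR_BLUE);
    c.v = (enum color_raw)i;
    return c;
}

/* The only way back out to an int: also explicit. */
static int color_to_int(color c)
{
    return (int)c.v;
}
```

With this, "color c = 2;" and "int i = c;" are both rejected at compile time, which is the behaviour the decorator is asking for, at the price of losing direct use in switch statements and the usual enum syntax.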
> > > * a clear statement concerning the minimal level of active function
> > > calls invocations that an implementation needs to support.
> > > Currently, recursive programs will stackfault at a certain point,
> > > and this situation is not handled satisfactorily in the standard
> > > (it is not addressed at all, that is), as far as I can tell.
> >
> > That doesn't seem possible. The amount of "stack" that an
> > implementation might use for a given function is clearly not easy to
> > define. Better to just leave this loose.
>
> Agreed. The limit on call depth is typically determined by the amount
> of available memory, something a compiler implementer can't say much
> about. You could sensibly add a call depth clause to the Translation
> Limits section (C99 5.2.4.1); that would require the implementation to
> handle at least one program with a call depth of N, but wouldn't really
> guarantee anything in general.
Well the problem with this is that then the *LINKER* would have to be
augmented to analyze the max relevant stack size of all functions in an object
and then assign the final stack according to a formula (N * maxstacksz) that
makes this work. It also kind of makes use of alloca impossible.
> [...]
> > > * a #define'd constant in stdio.h that gives the maximal number of
> > > characters that a "%p" format specifier can emit. Likewise, for
> > > other format specifiers such as "%d" and the like.
> > >
> > > * a printf format specifier for printing numbers in base-2.
> >
> > Ah -- the kludge request. Rather than adding format specifiers one at
> > a time, why not instead add in a way of being able to plug in
> > programmer-defined format specifiers? I think people in general would
> > like to use printf for printing out more than just the base types in a
> > collection of just a few formats defined at the whims of some 70s UNIX
> > hackers. Why not be able to print out your data structures, or
> > relevant parts of them as you see fit?
>
> Well, you can do that with the "%s" specifier, as long as you've
> defined a function that returns an image string for a value of your
> type (with all the complications of functions returning dynamic
> strings).
Who is going to free the memory allocated for this string? If it's static, then
what happens when you try to printf two such items -- or just try to use it in
a multitasking environment in general?
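To make the static-buffer problem concrete, here is a small sketch. The type point and the functions point_image_static and point_image are hypothetical, invented just for this illustration:

```c
#include <assert.h>
#include <stdio.h>

typedef struct { int x, y; } point;   /* hypothetical example type */

/* Static-buffer version: every call returns the same storage, so a
   second call overwrites the image produced by the first. */
static const char *point_image_static(point p)
{
    static char buf[32];
    snprintf(buf, sizeof buf, "(%d,%d)", p.x, p.y);
    return buf;
}

/* Caller-supplied buffer version: no hidden shared state and nothing
   to free, but the caller has to manage a buffer per value. */
static const char *point_image(point p, char *buf, size_t n)
{
    assert(buf != NULL && n > 0);
    snprintf(buf, n, "(%d,%d)", p.x, p.y);
    return buf;
}
```

A call such as printf("%s %s\n", point_image_static(a), point_image_static(b)) prints the same image twice, because both arguments point at the one buffer (which of the two points you see depends on the unspecified order of argument evaluation). The caller-supplied version avoids that, but it is clumsy enough that it hardly counts as "just use %s".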
> > > * 'min' and 'max' operators (following gcc: ?< and ?>)
> >
> > As I mentioned above, you might as well have operator overloading instead.
>
> Most languages that provide operator overloading restrict it to
> existing operator symbols.
Yeah well most languages have real string primitives and built-in array range
checking too. Somehow I don't think what has been done in *other languages*
has any serious bearing on what should be done in C. To reiterate my proposal:
A whole *GRAMMAR* of symbols for operators could be added all of which have no
default definition, but which *can be* defined by the programmer, with
semantics similar to C's function declaration.
> [...] If you want "min" and "max" for int, there
> aren't any spare operator symbols you can use. If you want to allow
> overloading for arbitrary symbols (which some languages do), you'll
> need to decide how and whether the user can define precedence for the
> new operators.
Good point, but something as simple as "lowest precedence" and increasing in
the order in which they are declared seems fine enough. Or maybe inverted --
just play with those combinations to see what makes sense in practice. If
that's not good enough, then make the precedence level relative to another
operator at the time of declaration. For example:
    int _Operator ?< after + (int x, int y) { /* max */
        if (x > y) return x;
        return y;
    }
    int _Operator ?> same ?< (int x, int y) { /* min */
        if (x < y) return x;
        return y;
    }
> [...]
> > > Personally, I don't think it would be a good idea to have templates
> > > in C, not even simple ones. This is bound to have quite complicated
> > > semantics that I would not like to internalize.
> >
> > Right -- this would just be making C into C++. Why not instead
> > dramatically improve the functionality of the preprocessor so that the
> > macro-like cobblings we put together in place of templates are
> > actually good for something? I've posted elsewhere about this, so I
> > won't go into details.
>
> Hmm. I'm not sure that making the preprocessor *more* powerful is
> such a good idea. It's too easy to abuse as it is [...]
A *LOT* of C is easy to abuse. If you're worried about abuse of the
preprocessor by a programmer you are working with, then that's an issue between
you and that programmer.
> If you can improve the preprocessor without making it even more
> dangerous, that's great. (I don't think I've seen your proposal.)
My proposal is to add preprocessor-only scoped variables:
#define $c 1
The idea is that "$c" could never show up as such a symbol in the C source
after preprocessing is done. And in cases where such a $___ variable has not
been defined you could insert an instance specific generated variable such as:
$c -> __PREPROCINST_MD5_09839fe8d98798fe8978de98799cfe01_c
so as to kind of put it into its own kind of "name-space" that is not really
"accessible" to the programmer. Where the MD5 obfuscation would come from a
source like: <filename><date,time><MD5(source)><the $varname> in an effort to
probabilistically avoid collisions across files (trust me, this is not as much
voodoo as you might think) in case some bozo turns this into a global
declaration.
The purpose is to allow for even more useful things like:
#for $c in #range(0,5)
printf ("Even: %d Odd: %d\n", 2*$c, 2*$c+1);
#endfor
/* #range(0,5) just expands to 0,1,2,3,4 and the #for loop works, kind of
python-like, just as you would expect. */
#define genStruct(name,#VARARGS) struct tag##name { #VARARGS };\
#for $c in #VARARGS
#  define offsetof_##$c offsetof (struct tag##name, $c)
#endfor
/* Here the "\" is required to attach the #for, which then itself
has an implicit multi-line characteristic, so that the lines
up until the #endfor are sucked into the #define genStruct.
Also, #define's executed inside of a #for are repeatedly
executed for each iteration */
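For reference, the nearest thing the existing preprocessor offers to this kind of iteration is the "X-macro" idiom, where the member list is itself a macro that gets expanded several times. A sketch under that assumption; the names PACKET_FIELDS, AS_MEMBER, AS_OFFSET and packet_length_via_offset are hypothetical and chosen only for this example:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical member list: each entry is X(type, name). */
#define PACKET_FIELDS(X) \
    X(char,  tag)        \
    X(int,   length)     \
    X(short, checksum)

/* First expansion: declare the struct members. */
#define AS_MEMBER(type, name) type name;
struct packet { PACKET_FIELDS(AS_MEMBER) };

/* Second expansion: one offsetof_<name> constant per member. */
#define AS_OFFSET(type, name) offsetof_##name = offsetof(struct packet, name),
enum { PACKET_FIELDS(AS_OFFSET) };

/* Read the int member through its generated offset rather than by name. */
static int packet_length_via_offset(const struct packet *p)
{
    int v;
    assert(p != NULL);
    memcpy(&v, (const unsigned char *)p + offsetof_length, sizeof v);
    return v;
}
```

It works, but the member list has to be written in a special X(type, name) form rather than as ordinary declarations, which is the kind of cobbling a #for construct would make unnecessary.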
#define swap(type,x,y) { \
    type $tmp = x; \
    x = y; \
    y = $tmp; \
}
/* In this case, without prior definition, $tmp is given an
obfuscated name by the time it reaches the C source code. */
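The usual workaround available today is to build the temporary's name from __LINE__, which is a much weaker form of the obfuscation described above. A sketch, assuming hypothetical macro names PASTE, SWAP_IMPL and SWAP:

```c
#include <assert.h>

/* Two-level paste so that __LINE__ is expanded before ## is applied. */
#define PASTE2(a, b) a##b
#define PASTE(a, b)  PASTE2(a, b)

/* The inner macro receives the generated name once and reuses it. */
#define SWAP_IMPL(type, x, y, tmp) \
    do {                           \
        type tmp = (x);            \
        (x) = (y);                 \
        (y) = tmp;                 \
    } while (0)

/* swap_tmp_<line> stands in for the obfuscated $tmp name. */
#define SWAP(type, x, y) SWAP_IMPL(type, x, y, PASTE(swap_tmp_, __LINE__))

/* Self-check: a caller variable literally named tmp is still swapped
   correctly, which a macro with a hard-coded "tmp" would get wrong. */
static void swap_selfcheck(void)
{
    int tmp = 1, other = 2;
    SWAP(int, tmp, other);
    assert(tmp == 2 && other == 1);
}
```

This only makes a collision unlikely, not impossible: a caller variable that happens to be called swap_tmp_ followed by the right line number still breaks it, which is the gap a preprocessor-private $tmp would close.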
-- Paul Hsieh http://www.pobox.com/~qed/ http://bstring.sf.net/