Re: Increasing efficiency in C

From: Dan Pop (Dan.Pop_at_cern.ch)
Date: 03/05/04


Date: 5 Mar 2004 13:19:51 GMT

In <c28cj2$g6b$1@news-reader2.wanadoo.fr> "jacob navia" <jacob@jacob.remcomp.fr> writes:

>"Dan Pop" <Dan.Pop@cern.ch> wrote in message
>news:c27gec$4n5$1@sunnews.cern.ch...
>> In <c27e5q$c72$1@news-reader1.wanadoo.fr> "jacob navia" <jacob@jacob.remcomp.fr> writes:
>>
>>
>> >"Dan Pop" <Dan.Pop@cern.ch> wrote in message
>> >news:c27a00$7hh$4@sunnews.cern.ch...
>> >> In <c25kuu$rae$1@news-reader4.wanadoo.fr> "jacob navia" <jacob@jacob.remcomp.fr> writes:
>> >I wanted to emphasize "unbounded" because there is no way to know if
>> >the zero is not there where the pointer will end pointing to...
>>
>> You don't know where the pointer will end pointing to. Your wording
>> simply didn't make any sense to anyone but you.
>>
>> The representation of a string in C is the sequence of characters, up to
>> and including the null terminator. No kind of pointer is involved in the
>> representation of a C string.
>
>Wow Dan, this is news for me. No kind of pointer?
>
>Not even a char * as it seems?

Not even.

>Strange. Are all those prototypes in string.h wrong?

Nope, they are correct. But they are not passed the representations of
strings, they are passed the addresses of strings. This is where pointers
get into the picture.
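
A minimal sketch of that distinction, assuming two hypothetical helpers
(string_repr_size and string_repr_copy are illustration only, not anything
from <string.h>): the representation is the run of bytes ending in the
null terminator, and the functions merely receive its address.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of bytes in the representation of the
   string whose address is s, i.e. its characters plus the terminator. */
size_t string_repr_size(const char *s)
{
    return strlen(s) + 1;
}

/* Hypothetical helper: copy the representation of the string at s into
   dst, which has room for cap bytes.
   Returns 0 on success, -1 if the representation does not fit. */
int string_repr_copy(char *dst, size_t cap, const char *s)
{
    size_t n = string_repr_size(s);

    if (n > cap)
        return -1;
    memcpy(dst, s, n);  /* characters and terminator; no pointer is copied */
    return 0;
}
```

With char text[] = "abc", sizeof text is 4 whatever the size of a pointer
is on the platform, and string_repr_size(text) is passed only the address
of those 4 bytes.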

>I would file a defect report.
>
>Just do not exaggerate Dan. Let's keep cool ok?

When did I get hot?

>I am speaking about a naked char * that points to the
>start of a sequence of bytes that should end with a
>terminating zero.

It is NOT part of the representation of a string any more than a pointer
to a double is part of the representation of a double.

>By definition of the data structure, its length is
>not known, and the same scan must be repeated
>each time we access the length.

But, if you access it repeatedly and the length matters, there is
nothing preventing you from keeping track of its length. Far too often,
the length doesn't matter and then it would be wasteful to keep track of
it anyway, especially in a relatively low level language.
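
One way to keep track of the length when it does matter, as a sketch only:
struct tracked_str and its two functions are hypothetical names invented
for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical type: the address of a C string plus its cached length. */
struct tracked_str {
    const char *text;  /* address of a null terminated string */
    size_t len;        /* strlen(text), computed once */
};

struct tracked_str tracked_from(const char *text)
{
    struct tracked_str t;

    t.text = text;
    t.len = strlen(text);  /* the only scan for the terminator */
    return t;
}

/* Compare two tracked strings. Strings of different length are rejected
   without looking at a single character. */
int tracked_equal(struct tracked_str a, struct tracked_str b)
{
    return a.len == b.len && memcmp(a.text, b.text, a.len) == 0;
}
```

Code that never needs the length keeps passing a plain char * and pays
nothing for the extra field.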

>More serious, the failure modes are quite horrible.

No failure mode exists for correctly written code.

>In writing mode a wild pointer is like a loaded
>machine gun, ready to start shooting around
>at random. Pieces of the program, essential data
>like the return address are wiped by the gun,
>without any way for the system to stop it.

Which has zilch to do with strings and everything with the fact that C
supports pointers the way it does. And C pointers are largely responsible
for C's strength and popularity.

>The program is in an undeterminable state,
>depending on the direction the machine gun was
>shooting.

Again, nothing to do with C strings.

>Ahh. How nice. We are fearful. We risk that but
>it works you see?

Have I ever told you that C is a sharp tool?

>*I* do not make any mistakes, you say.

Nope, I didn't say *anything* like that. Please have the minimal decency
to quote me correctly.

>Well Dan, just keep cool.

You'd better take your own advice.

>I have no fear to recognize that I do make mistakes.

Neither do I.

>You say:
>
>> >> It doesn't hurt to use your common sense in validating your opinions.
>> >> If C strings were "extremely inefficient", that would have been a much
>> >> bigger problem 30 years ago, when computers were orders of magnitude
>> >> slower than today. Yet, no one produced a fix then. No alternate
>> >> string libraries designed and implemented for C since then have
>> >> acquired any kind of popularity. ^^^^
>> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> >There are many Dan. Just search in Google and you will find zig libraries
>> >that implement this with different emphasis in different objectives.
>>
>> Are you reading impaired or what? Which of them qualifies as popular?
>
>Well, Microsoft proposed one recently. And there are several.
>I can't tell you which are "popular" since I am not doing
>that kind of research. But they are surely used.
                        ^^^^^^^^^^^^^^^^^^^^^^^^
Where does this certainty come from, since you haven't done any research
in this direction?

>> >The objective of this discussion is to see why the *language* doesn't
>> >support any other schema for implementing strings.
>>
>> No other scheme proved to be better in a GENERAL PURPOSE context.
>> As you admit yourself, the alternate libraries are designed for well
>> defined goals, rather than as universal replacements for the C strings.
>
>Safety was one of the more widespread goals.

C strings are perfectly safe.

>I am trying to
>build checked strings into lcc-win32. I think that a more
>debuggable environment is easier to work with.

What happens when the incompetent programmer modifies the bytes containing
the string's length, by misusing pointers?

Nothing is safe in the hands of the incompetent.
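
A sketch of that scenario, assuming a hypothetical counted string layout
(struct counted_str and every function below are made up for the
illustration). The stray write is done with memset on the object's own
bytes so that the example has defined behaviour; a genuinely wild pointer
would be undefined behaviour and could do far worse.

```c
#include <stddef.h>
#include <string.h>

#define CSTR_CAP 16

/* Hypothetical counted string: the length is ordinary memory that sits
   right next to the characters. */
struct counted_str {
    size_t len;
    char data[CSTR_CAP];
};

void counted_set(struct counted_str *s, const char *text)
{
    size_t n = strlen(text);

    if (n > CSTR_CAP)
        n = CSTR_CAP;          /* truncate to capacity */
    memcpy(s->data, text, n);
    s->len = n;
}

/* The only check a library can make: is the stored length plausible? */
int counted_is_sane(const struct counted_str *s)
{
    return s->len <= CSTR_CAP;
}

/* A misdirected write: n bytes starting at p are filled with 0xFF.
   Handed the address of the whole object instead of its data member,
   it overwrites the length. */
void careless_fill(void *p, size_t n)
{
    memset(p, 0xFF, n);
}
```

After careless_fill(&s, sizeof s.len) the length field holds garbage, and
every function that trusts it will run off the end of data just as a
missing terminator would make strlen do.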

>> And the very existence of these libraries proves that the C language DOES
>> support alternate schemes. So, your point is moot.
>
>The language doesn't support it.

So, they're not implemented in C?

>I repeat that length prefixed strings should be easy to
>use: name[2] should do what is supposed to.

Null terminated C strings *are* easy to use. And both can be misused.

>My whole point is that data structure development should
>be opened up to the C user that should be able to
>specify data types that follow special rules he/she defines.

Isn't this already the case? Aren't users allowed to define their own
data structures?

>For instance you could add a "flags" field to the standard
>length prefixed string, and implement read only
>strings, or time stamp based data, or whatever.

Precisely. It's the user's job to do whatever he or she wants.
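
For instance, a length prefixed string with a flags field can be written
in plain C today. This is a sketch under assumed names (struct flagged_str,
FSTR_READONLY and the fstr_ functions are hypothetical); the checked
accessors stand in for what name[2] would have to do.

```c
#include <stddef.h>
#include <string.h>

#define FSTR_CAP 32
#define FSTR_READONLY 0x1u  /* hypothetical flag bit */

struct flagged_str {
    unsigned flags;
    size_t len;
    char data[FSTR_CAP];
};

void fstr_init(struct flagged_str *s, const char *text, unsigned flags)
{
    size_t n = strlen(text);

    if (n > FSTR_CAP)
        n = FSTR_CAP;          /* truncate to capacity */
    memcpy(s->data, text, n);
    s->len = n;
    s->flags = flags;
}

/* Checked write. Returns 0 on success, -1 if the string is read only
   or the index is out of range. */
int fstr_set_char(struct flagged_str *s, size_t i, char c)
{
    if ((s->flags & FSTR_READONLY) || i >= s->len)
        return -1;
    s->data[i] = c;
    return 0;
}

/* Checked read. Returns the character, or -1 if the index is out of
   range. */
int fstr_get_char(const struct flagged_str *s, size_t i)
{
    return i < s->len ? (unsigned char)s->data[i] : -1;
}
```

The checks cost a comparison or two per access, which is the price each
user can decide to pay or not.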

>The language should allow people defining programs
>that handle the data structures in a way it suits them the
>best.

It has been doing exactly this for the last 30 years. Have you been
sleeping?

>C is not object oriented but we all use lists, stacks, hash tables
>in our everyday programming.

So what?

>> >> Since C programmers aren't the last people to care about efficiency,
>> >> what conclusion can you draw?
>> >
>> >Since language support doesn't encourage the use of bounded pointers
>> >C string handling is much more error prone than it should be.
>>
>> 1. This is not a performance issue.
>
>No, this is a human performance issue. People get bored of
>details. Computers do not.

So, people develop libraries tuned to their *specific* needs.

>People use computers to do repetitive work. Why can't
>we use the computer to check for mistakes?

Because C pointers allow mistakes that can't be mechanically detected.

Because it is perfectly possible to make mistakes even in languages that
allow all the checking you're dreaming about.

>Your answer is:
>>
>> 2. This is a *general* problem of C: most C features are error prone in
>> the hands of the incompetent.
>
>You are competent Dan. Surely more than me.
>
>I belong to the other ones.
>The ones that make mistakes. I am not afraid of saying this,
>maybe because I think knowing this is the start of
>knowledge.

Competent programmers make mistakes, too. Only perfect programmers don't.
However, there is a big difference between the mistakes made by competent
programmers and those made by incompetents.

>When you realize your mistakes you can start learning.
>Only then.

You start learning before that, but you also learn A LOT from your own
mistakes. But this has exactly zilch to do with a discussion about
counted strings...

>> >Never had the traps because of the missing zero?
>>
>> Nope.
>
>:-)
>
>Of course not Dan. Sure. I believe you that 100%

Believe me or not, this is one kind of mistake I am perfectly capable of
avoiding.
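
The missing terminator usually comes from copying into a buffer that is
too small, or from strncpy, which leaves the destination unterminated
when the source is at least as long as the count. A sketch of one
defensive habit, with hypothetical helper names (copy_terminated and
is_terminated are not standard functions):

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst (capacity cap bytes) and always
   terminate the result.
   Returns 0 if everything fit, -1 if the copy was truncated or cap is 0. */
int copy_terminated(char *dst, size_t cap, const char *src)
{
    size_t n;

    if (cap == 0)
        return -1;
    n = strlen(src);
    if (n >= cap) {
        memcpy(dst, src, cap - 1);
        dst[cap - 1] = '\0';
        return -1;
    }
    memcpy(dst, src, n + 1);   /* includes the terminator */
    return 0;
}

/* Hypothetical helper: is there a terminator within the first cap bytes
   of buf? Useful before handing a buffer of known size to strlen. */
int is_terminated(const char *buf, size_t cap)
{
    return memchr(buf, '\0', cap) != NULL;
}
```

Neither helper needs a different string format; both rely only on the
caller knowing the size of its own buffer.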

>> >The failure modes of the string functions in the library like strcpy
>> >are just horrible. Memory corruption is guaranteed unless a zero
>> >is found...
>>
>> Dynamic memory allocation has exactly the same problems: write beyond
>> a dynamically allocated memory block (in either direction) and memory
>> corruption will (most likely) bite you, sooner or later. What is your
>> better replacement for malloc and friends?
>
>The garbage collector. I wrote one for my Lisp interpreter
>in the 90s, and I have adapted Mr. Boehm's work to lcc-win32.
>
>The GC is much better than malloc/free. But I know, that's
>another discussion ...

AFAIK, Boehm's GC is not a complete solution and I am not aware of any
complete solution for C. The run time costs would exceed by far the
costs of malloc and friends.

>> C is a sharp tool *by design*. People who can't use sharp tools or are
>> afraid of them, should not use C. There are plenty of other programming
>> languages designed for them so there is no need to turn C into a less
>> sharp tool (and, therefore less effective in the hands of the competent
>> programmers) and annoy C's *intended* user base.
>
>I want it to be sharper Dan. C is not sharp enough
>with all those bugs that creep into the programs.

The bugs are not C's problem. Show me one language where the incompetent
is guaranteed to write bug free code.

>You can't be sure of a tool if it is not designed to be sharp and safe.

Which part of C's design is unsafe?

>You take the knife not at the edge?

That's precisely the point. The safety of both C and the knife is
dictated by the way they are used. Both are safe when correctly used
and unsafe when misused.

>A knife is a sharp tool by its very nature.
>
>But it can only be used because
>you do not touch at the edge isn't it?

And what is your point?

>That blunt side, that provides safety for your hand
>makes for a usable knife. Without it, using a knife
>is cutting yourself in the fingers :-)

I found C's safety features perfectly adequate. Any additional safety
feature that affects its sharpness is not going to be accepted by the
competent C programmers.

>> There are many ways in which C needs to be extended, but adding more
>> string formats is not one of them. You're wasting your time trying to
>> fix something that isn't broken.
>
>That is the start.

A couple of weeks ago, bound checked pointers were a start. Have you
already finished that project? I'd like to play with your compiler that
implements bound checked pointers.

>A better string library would be an achievement.

You have completely failed to convince anyone that your idea of a string
library would be any better than the <string.h> stuff.

And, since N such "better" string libraries have already been implemented,
why bother with a new one? Just because all the others were not written
by you? (i.e. the NIH - not invented here - syndrome)

>Nothing spectacular, and very simple.

Is there anyone preventing you from doing the job? Hopefully, after
finishing the bound checked pointers...

BTW, what's the point in asking our opinions, if you're completely deaf to
them? Wouldn't your time be better spent working on your ideas, rather
than pointlessly arguing about them?

Dan

-- 
Dan Pop
DESY Zeuthen, RZ group
Email: Dan.Pop@ifh.de

