Re: Increasing efficiency in C

From: jacob navia (jacob_at_jacob.remcomp.fr)
Date: 03/05/04


Date: Fri, 5 Mar 2004 00:02:56 +0100


"Dan Pop" <Dan.Pop@cern.ch> wrote in message
news:c27gec$4n5$1@sunnews.cern.ch...
> In <c27e5q$c72$1@news-reader1.wanadoo.fr> "jacob navia"
<jacob@jacob.remcomp.fr> writes:
>
>
> >"Dan Pop" <Dan.Pop@cern.ch> wrote in message
> >news:c27a00$7hh$4@sunnews.cern.ch...
> >> In <c25kuu$rae$1@news-reader4.wanadoo.fr> "jacob navia"
> ><jacob@jacob.remcomp.fr> writes:
> >I wanted to emphasize "unbounded" because there is no way to know if
> >the zero is not there where the pointer will end pointing to...
>
> You don't know where the pointer will end pointing to. Your wording
> simply didn't make any sense to anyone but you.
>
> The representation of a string in C is the sequence of characters, up to
> and including the null terminator. No kind of pointer is involved in the
> representation of a C string.
>

Wow Dan, this is news for me. No kind of pointer?

Not even a char * as it seems?

Strange. Are all those prototypes in string.h wrong?

I would file a defect report.

Just do not exaggerate, Dan. Let's keep cool, OK?

I am speaking about a naked char * that points to the
start of a sequence of bytes that should end with a
terminating zero.

By definition of the data structure, its length is
not known, and the same scan must be repeated
each time we need the length.
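
A minimal sketch of that difference, under stated assumptions:
struct counted_str and the three functions are hypothetical names
made up for illustration. They are not part of standard C or of
lcc-win32.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical counted string: the length travels with the data. */
struct counted_str {
    size_t len;        /* bytes in data, not counting the zero */
    const char *data;  /* still zero terminated for compatibility */
};

/* O(n): walks the bytes until it meets the terminating zero. */
static size_t scan_length(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

/* O(1): reads the stored field, no scan. */
static size_t stored_length(const struct counted_str *s)
{
    return s->len;
}

/* Pays for the scan once, when the counted string is built. */
static struct counted_str counted_from(const char *s)
{
    struct counted_str r;
    assert(s != NULL);
    r.len = strlen(s);
    r.data = s;
    return r;
}
```

With the naked char *, every call to scan_length() repeats the
walk. With the counted form the walk happens once in
counted_from() and stored_length() is a plain field read.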

More seriously, the failure modes are quite horrible.

When writing, a wild pointer is like a loaded
machine gun, ready to start shooting around
at random. Pieces of the program and essential data
like the return address are wiped out by the gun,
without any way for the system to stop it.

The program is left in an indeterminate state,
depending on the direction the machine gun was
shooting.
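
For contrast, a sketch of a write that knows how much room the
destination has. bounded_copy is a hypothetical helper, not a
standard library function, and it still trusts that src is
properly terminated.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copies src into dst only if the whole
 * string, including the terminating zero, fits in cap bytes.
 * Returns 0 on success, -1 (dst untouched) if it does not fit.
 */
static int bounded_copy(char *dst, size_t cap, const char *src)
{
    size_t n;
    assert(dst != NULL && src != NULL);
    n = strlen(src);
    if (n + 1 > cap)
        return -1;           /* refuse instead of overwriting */
    memcpy(dst, src, n + 1); /* copies the terminator too */
    return 0;
}
```

The point of the sketch is only that the failure becomes a return
value the caller can test, instead of bytes written past the end
of the buffer.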

Ahh. How nice. We are fearful. We risk that but
it works you see?

*I* do not make any mistakes, you say.

Well Dan, just keep cool.

I am not afraid to admit that I do make mistakes.
I am not a star programmer. I am a run-of-the-mill
brain that gets bored of always taking this
new dangerous turn. Damn it. Can't the machine
do it for me?

You say:

> >> It doesn't hurt to use your common sense in validating your opinions.
> >> If C strings were "extremely inefficient", that would have been a much
> >> bigger problem 30 years ago, when computers were orders of magnitude
> >> slower than today. Yet, no one produced a fix then. No alternate
> >> string libraries designed and implemented for C since then have
> >> acquired any kind of popularity. ^^^^
> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >There are many, Dan. Just search in Google and you will find umpteen libraries
> >that implement this with different emphases and different objectives.
>
> Are you reading impaired or what? Which of them qualifies as popular?
>

Well, Microsoft proposed one recently. And there are several.
I can't tell you which are "popular" since I am not doing
that kind of research. But they are surely used.

> >The objective of this discussion is to see why the *language* doesn't
> >support any other schema for implementing strings.
>
> No other scheme proved to be better in a GENERAL PURPOSE context.
> As you admit yourself, the alternate libraries are designed for well
> defined goals, rather than as universal replacements for the C strings.
>

Safety was one of the more widespread goals. I am trying to
build checked strings into lcc-win32. I think that a more
debuggable environment is easier to work with.

> And the very existence of these libraries proves that the C language DOES
> support alternate schemes. So, your point is moot.
>

The language doesn't support it.

I repeat that length prefixed strings should be easy to
use: name[2] should do what it is supposed to do.
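
Without language support, the closest a library can get today is
a checked accessor function. A sketch, assuming a hypothetical
struct lstr with a fixed capacity (the names and the capacity are
made up for illustration, this is not the lcc-win32 design):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LSTR_CAP 32  /* arbitrary capacity for this sketch */

/* Hypothetical length prefixed string. */
struct lstr {
    size_t len;           /* bytes used, not counting the zero */
    char data[LSTR_CAP];  /* kept zero terminated as well */
};

/* Fills dst from a C string; returns -1 if src does not fit. */
static int lstr_set(struct lstr *dst, const char *src)
{
    size_t n;
    assert(dst != NULL && src != NULL);
    n = strlen(src);
    if (n >= LSTR_CAP)
        return -1;
    memcpy(dst->data, src, n + 1);
    dst->len = n;
    return 0;
}

/*
 * Checked read, standing in for name[i]: stores the byte in *out
 * and returns 0, or returns -1 when i is out of range.
 */
static int lstr_at(const struct lstr *s, size_t i, char *out)
{
    if (i >= s->len)
        return -1;
    *out = s->data[i];
    return 0;
}
```

Here lstr_at(&name, 2, &c) plays the role of name[2]. The syntax
is clumsier than plain indexing, which is exactly the complaint.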

My whole point is that data structure development should
be opened up to the C user, who should be able to
specify data types that follow special rules he/she defines.

For instance you could add a "flags" field to the standard
length prefixed string, and implement read only
strings, or time stamp based data, or whatever.
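
A sketch of the read only case, assuming a hypothetical struct
fstr and flag FSTR_READONLY (illustrative names only, nothing
here exists in standard C):

```c
#include <assert.h>
#include <stddef.h>

#define FSTR_READONLY 0x1u  /* hypothetical flag bit */
#define FSTR_CAP 32         /* arbitrary capacity for this sketch */

/* Hypothetical length prefixed string with a flags field. */
struct fstr {
    size_t len;
    unsigned flags;
    char data[FSTR_CAP];
};

/*
 * Checked write: refuses when the string is marked read only
 * or when i is out of range. Returns 0 on success, -1 otherwise.
 */
static int fstr_put(struct fstr *s, size_t i, char c)
{
    assert(s != NULL);
    if (s->flags & FSTR_READONLY)
        return -1;
    if (i >= s->len)
        return -1;
    s->data[i] = c;
    return 0;
}
```

Other rules, such as a time stamp updated on every write, could
be added as further fields in the same way.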

The language should allow people to write programs
that handle data structures in the way that suits them
best.

C is not object oriented but we all use lists, stacks, hash tables
in our everyday programming.

> >> Since C programmers aren't the last people to care about efficiency,
> >> what conclusion can you draw?
> >
> >Since language support doesn't encourage the use of bounded pointers
> >C string handling is much more error prone than it should be.
>
> 1. This is not a performance issue.

No, this is a human performance issue. People get bored with
details. Computers do not.
People use computers to do repetitive work. Why can't
we use the computer to check for mistakes?

Your answer is:
>
> 2. This is a *general* problem of C: most C features are error prone in
> the hands of the incompetent.
>

You are competent, Dan. Surely more than me.

I belong to the other ones.
The ones that make mistakes. I am not afraid of saying this,
maybe because I think knowing this is the start of
knowledge.

When you realize your mistakes you can start learning.
Only then.

> >Never had the traps because of the missing zero?
>
> Nope.
>

:-)

Of course not, Dan. Sure. I believe you 100%.

> >The failure modes of the string functions in the library like strcpy
> >are just horrible. Memory corruption is guaranteed unless a zero
> >is found...
>
> Dynamic memory allocation has exactly the same problems: write beyond
> a dynamically allocated memory block (in either direction) and memory
> corruption will (most likely) bite you, sooner or later. What is your
> better replacement for malloc and friends?
>

The garbage collector. I wrote one for my Lisp interpreter
in the 90s, and I have adapted Mr. Boehm's work to lcc-win32.

The GC is much better than malloc/free. But I know, that's
another discussion ...

> C is a sharp tool *by design*. People who can't use sharp tools or are
> afraid of them, should not use C. There are plenty of other programming
> languages designed for them so there is no need to turn C into a less
> sharp tool (and, therefore less effective in the hands of the competent
> programmers) and annoy C's *intended* user base.
>

I want it to be sharper, Dan. C is not sharp enough
with all those bugs that creep into programs.
You can't be sure of a tool if it is not designed to
be both sharp and safe.

You do not hold the knife by the edge, do you?

A knife is a sharp tool by its very nature.

But it can only be used because
you do not touch the edge, isn't that so?

That blunt side, which provides safety for your hand,
makes for a usable knife. Without it, using a knife
means cutting your fingers :-)

> There are many ways in which C needs to be extended, but adding more
> string formats is not one of them. You're wasting your time trying to
> fix something that isn't broken.
>

That is the start. A better string library would be an achievement.

Nothing spectacular, and very simple.


