Re: Strings in C are less optimal than in (say) Pascal - correct?



Gordon Burditt wrote:
I believe that in languages like Pascal, the length of a string is encoded in some sort of 'header' data structure, and that if a programmer wants to know the length of such a string, the resultant discovery is therefore very fast.


And making tail substrings of a string is made MUCH slower, since
in order to, say, generate a substring with leading spaces stripped
off, you have to COPY the string rather than simply increment a
pointer.  This can get very messy in trying to parse strings.


What you're doing on a conceptual level when you're "simply incrementing a pointer" (if we accept strings as a distinct data type, which is the point of the exercise) is generating a new string that is a slice of the existing string. The slice and original happen to share memory.


In a length-based system, this sort of operation would be accommodated by different means: the pointer would become an index into a string, or index-based operations are encapsulated by introducing an explicit type for string slices. A naive implementation would indeed be much slower in these cases, however -- of course, so would a naive implementation of repeated string concatenation in a terminator system be.

In C, functions like strlen() have to traverse the string in order to find the null terminator. This requires a comparison of each char in the string to '\0' of course.

Therefore, string functions (the majority I can think of all require some 'end searching') are necessarily slower than their Pascal counterparts.


There are a lot of things in C that are string functions that you
don't even think of as string functions.  These can get much, much
slower.

You're probably thinking of applications like the above: manipulating virtual strings through pointers. This system can be built on top of length-based strings -- at a cost, but a constant one.

<snip>
However, this would impose a limit on a string's length - but, surely, a practical one!?

Sure, just like the buffer passed to gets() is "long enough". NOT!!!


Well, store the length in a size_t and it will certainly be "long enough". Either that or you'll run out of addressable memory. If a 4G long string just isn't long enough, how the length of your strings is indicated will be a minor problem in comparison.


And it's a tad ironic that gets() would be cited as an example. In a system with length-based strings, gets() might crash your application with an out-of-memory, but you couldn't use it for a buffer overflow attack.

So, to the questions!

Are my assertions correct?


If you look at speeding up a small portion of the job at a potential
large cost of speed in the REST of the job, you can find something
that looks like a speed improvement which isn't.

And you could also find something that looks like a speed improvement which is, through the 90/10 rule. Use the right tool for the right job. Sometimes you *will* get definite improvements by keeping track of length. At other times, you won't.

The systems are interchangeable in that you can always explicitly keep track of length yourself in a terminator system (at a cost), and you can always use indices to simulate pointers in a length-based system (at a cost). C's choice for zero-terminated strings makes sense in light of its philosophy; so does Pascal's (and most object-oriented languages') choice of length-based strings.

Of course, endless flame wars can^H^H^H rage over what system ought to be picked as the basis... Asking which one is "optimal" is sure to provoke lots of factless debate, and I'm not sure the unqualified question has merit to begin with.

S.
.



Relevant Pages

  • Re: "Mastering C Pointers"....
    ... A pointer is a kind of variable that can "point to" some object. ... has a type (pointer to int), and a value of some kind. ... You may know that you can access these integers by using array notation ... The function will take one argument, a string, and will return the length ...
    (comp.lang.c)
  • Re: pesky Pointers !!
    ... > and the function takes it as a reference instead of a copy. ... function may access the string passed directly, ... > *px dereferences the pointer to get the value ... If pTest is a pointer-to-string, *pTest is the string it points to ...
    (alt.comp.lang.learn.c-cpp)
  • Re: strtok ( ) help
    ... > splitCommandssomehow modifying the pointer, but I HAVE to call that ... Here's an idea of how to use the strtok() function. ... don't mind trashing the contents of a string s, ... will give you a loop that extracts the tokens one at a time from s. ...
    (comp.lang.c)
  • Re: new IL: C (sort of...).
    ... C doesn't need a string type... ... variant of PL/1 which was very Pascal-ish. ... - C does implement an array declaration. ... effectively converted into a pointer that can be used with the offset ...
    (comp.lang.misc)
  • Re: copy a string into a 2d array of chars
    ... This split function should allocate a 2D array of chars ... >focus the program the string is not actually split. ... later) is an array of char containing the original contents of the ... The i-th pointer will contain the starting address of the ...
    (comp.lang.c)