Re: Strings in C are less optimal than in (say) Pascal - correct?
- From: Skarmander <invalid@xxxxxxxxxxxxxx>
- Date: Thu, 03 Nov 2005 23:22:14 +0100
Gordon Burditt wrote:
I believe that in languages like Pascal, the length of a string is encoded in some sort of 'header' data structure, and that if a programmer wants to know the length of such a string, the resultant discovery is therefore very fast.
And making tail substrings of a string is made MUCH slower, since in order to, say, generate a substring with leading spaces stripped off, you have to COPY the string rather than simply increment a pointer. This can get very messy in trying to parse strings.
What you're doing on a conceptual level when you're "simply incrementing a pointer" (if we accept strings as a distinct data type, which is the point of the exercise) is generating a new string that is a slice of the existing string. The slice and original happen to share memory.
In a length-based system, this sort of operation would be accommodated by different means: the pointer would become an index into a string, or index-based operations are encapsulated by introducing an explicit type for string slices. A naive implementation would indeed be much slower in these cases, however -- of course, so would a naive implementation of repeated string concatenation in a terminator system be.
You're probably thinking of applications like the above: manipulating virtual strings through pointers. This system can be built on top of length-based strings -- at a cost, but a constant one.In C, functions like strlen() have to traverse the string in order to find the null terminator. This requires a comparison of each char in the string to '\0' of course.
Therefore, string functions (the majority I can think of all require some 'end searching') are necessarily slower than their Pascal counterparts.
There are a lot of things in C that are string functions that you don't even think of as string functions. These can get much, much slower.
<snip>
However, this would impose a limit on a string's length - but, surely, a practical one!?
Sure, just like the buffer passed to gets() is "long enough". NOT!!!
Well, store the length in a size_t and it will certainly be "long enough". Either that or you'll run out of addressable memory. If a 4G long string just isn't long enough, how the length of your strings is indicated will be a minor problem in comparison.
And it's a tad ironic that gets() would be cited as an example. In a system with length-based strings, gets() might crash your application with an out-of-memory, but you couldn't use it for a buffer overflow attack.
And you could also find something that looks like a speed improvement which is, through the 90/10 rule. Use the right tool for the right job. Sometimes you *will* get definite improvements by keeping track of length. At other times, you won't.So, to the questions!
Are my assertions correct?
If you look at speeding up a small portion of the job at a potential large cost of speed in the REST of the job, you can find something that looks like a speed improvement which isn't.
The systems are interchangeable in that you can always explicitly keep track of length yourself in a terminator system (at a cost), and you can always use indices to simulate pointers in a length-based system (at a cost). C's choice for zero-terminated strings makes sense in light of its philosophy; so does Pascal's (and most object-oriented languages') choice of length-based strings.
Of course, endless flame wars can^H^H^H rage over what system ought to be picked as the basis... Asking which one is "optimal" is sure to provoke lots of factless debate, and I'm not sure the unqualified question has merit to begin with.
S. .
- References:
- Prev by Date: nanosleep and nanosecond
- Next by Date: Re: nanosleep and nanosecond
- Previous by thread: Re: Strings in C are less optimal than in (say) Pascal - correct?
- Next by thread: Re: Strings in C are less optimal than in (say) Pascal - correct?
- Index(es):
Relevant Pages
|