Re: Null-terminated strings: the final analysis.
- From: Keith Thompson <kst-u@xxxxxxx>
- Date: Sun, 12 Apr 2009 16:11:35 -0700
Mark McIntyre <markmcintyre@xxxxxxxxxxxxxxxxxxx> writes:
On 12/04/09 21:32, Keith Thompson wrote:
Tab and newline characters are non-printable; can a text file contain
those?
Indeed, I left that as so obvious it was unsaid - I'd forgotten I was
in the land of the pedants!
Subtle distinctions are at the core of what we're discussing here.
Let's not ignore such distinctions for the sake of avoiding pedantry.
[...]
On the systems I use, if I write a '\a' character (ASCII BEL) to a
text file, I can reasonably expect to see a '\a' character when I read
it back. The same is not true of '\0' if I use fgets() to read it.
What would you expect to "see"? I would hope that nothing is displayed
on your VDU or printed on paper for instance. So in the context of
"text", how can it be meaningful?
By "see", I meant that I could write something like:
int c = fgetc(my_file);  /* int, not char, so EOF stays distinct */
if (c == '\a') {
    puts("Yes, it's a '\\a' character");
}
with the expectation that the puts statement would be executed
sometimes.
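Here is a minimal sketch of the '\a' versus '\0' distinction, assuming a hosted implementation. It writes a line containing both characters to a temporary file and reads it back twice. fgetc() sees every character, but fgets() followed by strlen() stops at the embedded null. The helper names make_temp_file(), line_length_fgetc() and line_length_fgets() are hypothetical. tmpfile() opens its file in binary update mode ("wb+"). On POSIX systems text and binary streams behave the same, but elsewhere a true text stream might not.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes len bytes (which may include '\0')
   to a fresh temporary file and rewinds it. Returns NULL on failure. */
static FILE *make_temp_file(const char *data, size_t len)
{
    FILE *f = tmpfile();          /* note: tmpfile() opens in binary update mode */
    if (f == NULL)
        return NULL;
    if (fwrite(data, 1, len, f) != len) {
        fclose(f);
        return NULL;
    }
    rewind(f);
    return f;
}

/* Counts the characters before the first newline using fgetc();
   every character, including '\a' and '\0', is seen individually. */
static size_t line_length_fgetc(FILE *f)
{
    size_t n = 0;
    int c;                        /* int, so EOF can be distinguished */
    while ((c = fgetc(f)) != EOF && c != '\n')
        n++;
    return n;
}

/* Reads the first line with fgets() and measures it with strlen();
   strlen() stops at the first '\0', so anything after it is invisible. */
static size_t line_length_fgets(FILE *f)
{
    char buf[256];
    if (fgets(buf, sizeof buf, f) == NULL)
        return 0;
    buf[strcspn(buf, "\n")] = '\0';  /* strip the newline, if one is visible */
    return strlen(buf);
}
```

For the 7-byte input "ab\a\0cd\n", line_length_fgetc() reports 6 characters before the newline. line_length_fgets() reports only 3, even though fgets() did copy the whole line into its buffer.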
I left that as so obvious it was unsaid. 8-)}
Incidentally, I do have at least one text file with an embedded ASCII
BEL character. I have a perfectly valid reason for doing this, and
it's never been a serious problem.
In any case, the distinction between text files and non-text files is
irrelevant to a discussion of C strings. Clearly C strings can
contain any characters other than '\0', including non-printable
characters. If I want to construct a sequence of characters
containing a control sequence for a VT100-style terminal, for example,
a string is a perfectly sensible thing to use. And if any such
sequences include null characters (I don't know whether they do or
not), then the fact that I can't store embedded null characters in
strings is an inconvenience.
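As an illustration of building a terminal control sequence in an ordinary C string, here is a sketch that formats the ANSI/VT100 cursor-position sequence ESC [ row ; col H. The function name vt100_cursor_to() is hypothetical. This particular sequence happens to contain no null characters, and that is exactly why a string is suitable for it.

```c
#include <stdio.h>

/* Hypothetical helper: writes the ANSI/VT100 cursor-position sequence
   ESC [ row ; col H into buf. Returns the snprintf() result, i.e. the
   length of the full sequence excluding the terminator; a value
   >= size means the output was truncated. */
static int vt100_cursor_to(char *buf, size_t size, int row, int col)
{
    /* "\033" is ESC (0x1B), a non-printable character that a C string
       holds without any trouble, since it is not '\0'. */
    return snprintf(buf, size, "\033[%d;%dH", row, col);
}
```

With a 32-byte buffer, vt100_cursor_to(buf, sizeof buf, 5, 10) leaves "\033[5;10H" in buf and returns 7.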
And anyway, if you want char arrays containing nulls, C can do those, no
problem.
Yes, but you can't store a null character in the middle of a string,
But again that's a circular argument.
Not at all.
If, because of some requirement outside the C language, I want to
store arbitrary character sequences, I can use C strings only if I can
guarantee that I don't need to store any null characters.
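That precondition can be checked mechanically. Here is a minimal sketch using the hypothetical name fits_in_c_string(). Unlike the str* functions, memchr() takes an explicit length, so it can search for '\0' itself.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if the len bytes at data contain no '\0', i.e. they
   could be copied into a C string (plus a terminator) without loss. */
static bool fits_in_c_string(const char *data, size_t len)
{
    return memchr(data, '\0', len) == NULL;
}
```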
[...]
So I contend that it's not a useful point. If you want to transport
elephants, use a crate, not a box. If you want to transport nulls, use
an array, not a string - or use some language that allows internal
nulls in its string type.
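The "use an array, not a string" approach usually means carrying the length alongside the data. Below is a sketch of such a counted buffer. The type and function names are hypothetical, and the fixed capacity of 64 bytes is arbitrary.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical counted buffer: the length is stored explicitly, so
   '\0' is just another byte and no terminator is needed. */
typedef struct {
    size_t len;
    char data[64];   /* fixed capacity, chosen arbitrarily for the sketch */
} counted_buf;

/* Copies len bytes from src; returns 0 on success, -1 if too long. */
static int counted_buf_set(counted_buf *b, const char *src, size_t len)
{
    if (len > sizeof b->data)
        return -1;
    memcpy(b->data, src, len);   /* memcpy, not strcpy: copies past '\0' */
    b->len = len;
    return 0;
}
```

The price is that every function handling the data must take or consult the length explicitly. None of the str* functions in <string.h> can be used on it.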
Right. So C strings impose a limitation, and I might have to work
around that limitation in some circumstances. That seems to me to be
a very useful thing to be aware of.
For example, if I'm reading chunks of data from a binary file, I can
store those chunks in character arrays, but I can't safely use the
language's built-in string processing functions on them. In particular,
I can't use strstr() to search for a pattern in the data. If C had
been designed differently, that wouldn't be an issue.
which makes char arrays containing nulls more difficult to deal with.
I'm not saying it's a fatal flaw in the language, but it is a slight
inconvenience.
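ISO C has no standard equivalent of strstr() that works on counted buffers, which is the problem raised above. Some C libraries offer a memmem() extension. Below is a portable sketch built from memchr() and memcmp(), with the hypothetical name find_bytes(). It uses a simple scan rather than a sophisticated string-search algorithm.

```c
#include <stddef.h>
#include <string.h>

/* Returns a pointer to the first occurrence of needle[0..nlen) in
   hay[0..hlen), or NULL if absent. Unlike strstr(), embedded '\0'
   bytes are treated like any other byte. */
static const char *find_bytes(const char *hay, size_t hlen,
                              const char *needle, size_t nlen)
{
    if (nlen == 0)
        return hay;
    while (hlen >= nlen) {
        /* find the next position where the first needle byte occurs,
           limited to positions where the whole needle could still fit */
        const char *p = memchr(hay, needle[0], hlen - nlen + 1);
        if (p == NULL)
            return NULL;
        if (memcmp(p, needle, nlen) == 0)
            return p;
        /* skip past this candidate and keep searching */
        hlen -= (size_t)(p - hay) + 1;
        hay = p + 1;
    }
    return NULL;
}
```

Given the 9-byte buffer "xx\0needle", strstr() finds nothing because it sees only "xx". find_bytes() locates "needle" at offset 3.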
I can't recall /ever/ having found it so, in 20+ years of
programming. It's surely just a matter of interface design: if you
expect to be fed non-strings, then don't use a string to contain
them. Alternatively, document the interface appropriately.
Ok, so it's a *potential* inconvenience.
And there are languages whose native strings *can* contain embedded
null characters. In C, strlen("foo\0bar") returns 3; in Perl,
length("foo\0bar") returns 7, and there's nothing particularly special
about the 4th character.
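The C half of that comparison can be checked directly. strlen() stops at the first '\0', but the array initialised from the literal still holds all seven characters plus the terminator, so sizeof reports 8. A sketch follows; the names foo_bar, c_length() and array_length() are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* The array keeps every character of the literal, including the
   embedded '\0' and the terminating '\0' added by the compiler. */
static const char foo_bar[] = "foo\0bar";

/* Length as a C string: strlen() stops at the first '\0'. */
static size_t c_length(void) { return strlen(foo_bar); }

/* Length of the stored data: the array size minus the terminator. */
static size_t array_length(void) { return sizeof foo_bar - 1; }
```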
Apart from being a nul, which isn't a common character in real-world
strings. For instance, find me a place or person with a nul in their
name, or a word in any language, including Klingon.
Strings aren't just used to store names of places or people. And if C
strings *could* store embedded null characters, they might be
*slightly* more useful than they are without that ability.
In the design of the language, a tradeoff was made between the
convenience of null termination vs. the *slightly* greater flexibility
of being able to store embedded null characters. I do not suggest
that the choice was the wrong one, merely that it was a tradeoff with
a non-zero cost. And if you've never run into it, that doesn't change
the point.
--
Keith Thompson (The_Other_Keith) kst-u@xxxxxxx <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"