Re: Null-terminated strings: the final analysis.



Mark McIntyre <markmcintyre@xxxxxxxxxxxxxxxxxxx> writes:
On 12/04/09 20:16, Mark Wooding wrote:
Joe Wright <joewwright@xxxxxxxxxxx> writes:

I believe strings can contain tabs, and other things that don't print but I
agree strings cannot contain the NUL character.

... by a trivial consequence of C's definition of a string, no less.

Exactly - definitionally.

Further, a text file is corrupted by the NUL character in a line.

Only because C strings can't represent lines of text containing a zero
byte.

Again we're into definitions: my definition of a text file is one that
doesn't contain non-alphanumeric characters. So if you send a null into
such a file, it's corrupted.

I presume you meant non-printable, not non-alphanumeric; surely a text
file can contain spaces and punctuation characters.

Tab and newline characters are non-printable; can a text file contain
those?

I'm sure you can construct a rigorous definition of "text file" that
excludes null characters. But I don't think there's any universal
definition.

On the systems I use, if I write a '\a' character (ASCII BEL) to a
text file, I can reasonably expect to see a '\a' character when I read
it back. The same is not true of '\0' if I use fgets() to read it
(though I think I can see the '\0' if I use fgetc()).

Using this to justify C's representative inadequacy is circular.

But then so is the counter-argument that is being made. C defines a
string as a null-terminated array of characters, therefore it's circular
to complain that a string can't contain a null.

And anyway, if you want char arrays containing nulls, C can do those, no
problem.

Yes, but you can't store a null character in the middle of a string,
which makes char arrays containing nulls more difficult to deal with.
I'm not saying it's a fatal flaw in the language, but it is a slight
inconvenience.

And there are languages whose native strings *can* contain embedded
null characters. In C, strlen("foo\0bar") returns 3; in Perl,
length("foo\0bar") returns 7, and there's nothing particularly special
about the 4th character.

--
Keith Thompson (The_Other_Keith) kst-u@xxxxxxx <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"