Re: Is C99 the final C? (some suggestions)

From: Martien Verbruggen (mgjv_at_tradingpost.com.au)
Date: 12/15/03


Date: 15 Dec 2003 05:25:11 GMT

On Mon, 15 Dec 2003 03:50:21 GMT,
        Paul Hsieh <qed@pobox.com> wrote:
> ajo@nospam.andrew.cmu.edu says...
>> [...] You have implied elsethread that you think this solution a
>> "non-solution" (or perhaps I'm again imagining sarcasm where
>> none was intended). If that's the case, would you mind explaining
>> what you think is wrong with getc() and fread() on binary streams?
>
> getc() only reads a character (i.e., it reads too little), and
> fread() only reads a buffer and ignores line terminators (i.e., it
> reads too much). That leaves writing a loop over getc() as the only
> solution. But then one has to wonder, why was fgets() included with
> its specific semantics, if it's really just a subset of a more
> general solution?

fgets was designed to read text lines into a string. A string, in C,
is a bunch of characters, terminated by a '\0'.

> fgets() is the only string function which specifically ignores '\0'.

I haven't been able to find a post yet in which you specify exactly
what you mean by "ignores '\0'". Other people have already explained
that fgets() does not ignore embedded null characters in text files.
It neatly reads them in and puts them in the buffer nominated. When
you then treat that buffer as a string, the first zero in it
terminates the string, as the C language requires. The rest of the
buffer is simply never visible if you look at it as a string.

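#include <stdio.h>
#include <string.h>

#define MAXLEN 40

int main(void)
{
    char buf[MAXLEN];

    for (;;)
    {
        size_t length;
        int i = 0;

        /* Zero the buffer first, so the scan below never looks at
           indeterminate bytes or at leftovers from an earlier line. */
        memset(buf, 0, sizeof buf);
        if (!fgets(buf, MAXLEN, stdin))
            break;

        /* Length as the string functions see it: up to the first '\0'. */
        length = strlen(buf);

        /* Look for the newline in the whole buffer, past any '\0'. */
        while (i < MAXLEN && buf[i] != '\n')
            i++;
        if (i == MAXLEN)
            puts("No newline found");
        else
            printf("Found newline at %d\n", i);

        printf("READ %d: %s", (int)length, buf);
        /* %s stopped before the newline, so finish the output line. */
        if (i != (int)length - 1)
            putchar('\n');
    }
    return 0;
}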

If you compile and run the above and feed it a "text" file with
embedded null characters on stdin, you'll notice that fgets() does not
"ignore" the null characters, and neither does it stop reading on a
null character. It merrily continues until the end of line, end of
file, an error condition, or until the buffer is full. As per
specification.

You'll also notice that the embedded null characters "confuse" tools
like strlen() and %s in printf(), because they (correctly) assume that
the first null character is the end of the string.
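
If you do need to know how many bytes fgets() really stored, one
possible trick is to pre-fill the buffer with a non-zero sentinel and
look for the last '\0' afterwards. The sketch below is only that, a
sketch: fgets_len() is a made-up name, not a standard function, and it
assumes that fgets() leaves the bytes after its terminating '\0'
untouched. Common implementations behave that way, but the standard
does not spell it out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the standard library): calls fgets()
 * and reports in *len how many bytes it stored, terminator excluded,
 * even when the line contains embedded '\0' characters.
 *
 * Assumption: fgets() does not modify the bytes that follow its
 * terminating '\0'.
 */
static char *fgets_len(char *buf, int size, FILE *fp, size_t *len)
{
    size_t i;

    assert(buf != NULL && size > 0 && fp != NULL && len != NULL);

    /* Pre-fill with a non-zero sentinel so untouched bytes stand out. */
    memset(buf, 0xFF, (size_t)size);

    if (fgets(buf, size, fp) == NULL)
        return NULL;

    /* Scan backwards: the last '\0' in the buffer is the terminator. */
    i = (size_t)size;
    while (i > 0 && buf[i - 1] != '\0')
        i--;
    if (i == 0)
        return NULL;    /* no terminator found; should not happen */

    *len = i - 1;       /* index of the terminator == bytes stored */
    return buf;
}
```

Called as fgets_len(buf, MAXLEN, stdin, &len), it behaves like fgets()
but also gives you a length that strlen() cannot, since strlen() stops
at the first embedded zero.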

> A minor change in its semantics would have made it more consistent
> with the rest of the C language without any restriction of the file
> mode (return the length, terminate on either '\0' or '\n' but always
> add an additional '\0' at the end) which would have made it a
> *superset* of what we have today without any strange
> conditions or anomalies.

There are no anomalies. fgets() was meant to read text into a string.
A buffer with embedded '\0' characters is no longer a string (or
rather, the string ends where the first zero appears). If you want to
work with buffers that have zeroes in them, fgets() isn't the right
tool.
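
For that case, the loop over getc() mentioned earlier is not much
code. Here is a minimal sketch, assuming you want the line up to and
including the '\n' plus a byte count. read_line() is a hypothetical
name of my own, not a standard function, and the choice to return 0
for both end of file and error is just one possible convention.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper, not a standard function: reads up to and
 * including the next '\n' from fp with getc(), stores at most
 * size - 1 bytes in buf and appends a '\0'. Returns the number of
 * bytes stored (terminator excluded); 0 means end of file or an
 * error before any byte was read. Embedded '\0' bytes are stored
 * and counted like any other byte.
 */
static size_t read_line(char *buf, size_t size, FILE *fp)
{
    size_t n = 0;
    int c;

    assert(buf != NULL && size > 0 && fp != NULL);

    while (n + 1 < size && (c = getc(fp)) != EOF) {
        buf[n++] = (char)c;
        if (c == '\n')
            break;
    }
    buf[n] = '\0';
    return n;
}
```

The caller then works with buf and the returned count (memcpy(),
fwrite() and friends) rather than with the string functions, which
would stop at the first zero.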

> The value of a library function should be how much it *saves* the
> programmer from having to code themselves. In the context of the C
> language, you might also have other criteria like being minimal, but
> I don't see that it would be significantly larger to have
> implemented fgets with the semantics I suggest.

Well, Dennis Ritchie didn't implement fgets() with the semantics you
suggest, so fgets() as it is, is what we got. If you feel that another
function with other semantics is needed, you should probably take it
up with the folks in comp.std.c.

But none of that means that fgets() "ignores" '\0' characters.

What you seem to be saying instead is that you want fgets() to have a
different interface, and you probably should just get over that,
because it isn't going to happen.

I'd find it more fruitful to argue that the standard should
explicitly use the word "string" in its description of fgets(), to
avoid this sort of pointless argument.

Martien

-- 
                        | 
Martien Verbruggen      | 
Trading Post Australia  | "Mr Kaplan. Paging Mr Kaplan..."
                        | 

