Re: CR-NL, NL and ftell

From: Flash Gordon (spam_at_flash-gordon.me.uk)
Date: 02/21/05


Date: Mon, 21 Feb 2005 17:18:10 +0000

Bart C wrote:
> "Eric Sosman" <esosman@acm-dot-org.invalid> wrote in message
> news:nNGdnWUD6rBreoTfRVn-vA@comcast.com...
>
>>Bart C wrote:
>>
>>>[...]
>>>Why doesn't feof work as expected, ie return True when positioned at
>>>end-of-file? Is coding 'currfileposition' >= 'filesize' really that
>>>difficult?
>>
>> Yes. Keep in mind that feof() et al. operate on FILE*
>>streams, which may be connected to data sources (and sinks)
>>that are not fixed-size files. Explain, if you will, how
>>you would implement a "predictive" feof() on a stream taking
>>data from a TCP/IP socket, or even from your keyboard.
>
> I wouldn't. I would treat disk files differently from devices such as
> keyboards or i/o ports. The two kinds of data are different enough to
> warrant a separate set of functions

That would make it far harder to write programs that, based on command
line switches, can take input from any of a number of possible sources,
a file being only one of them.
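
As a rough sketch of why one set of functions is convenient (the helper
names open_input and count_lines are made up for illustration, they are not
from any library): the same counting loop works whether the stream is a
disk file, a pipe or the keyboard, and feof() is only consulted after a
read has actually failed.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: pick the input stream from a command-line
   argument. A missing argument or "-" means standard input. */
FILE *open_input(const char *path)
{
    if (path == NULL || strcmp(path, "-") == 0)
        return stdin;
    return fopen(path, "r");    /* NULL on failure */
}

/* Count newline characters on any text stream, whatever it is
   connected to. Returns -1 if the loop ended on a read error. */
long count_lines(FILE *in)
{
    long lines = 0;
    int c;

    while ((c = getc(in)) != EOF)
        if (c == '\n')
            lines++;

    /* Only now, after a read has failed, do feof() and ferror()
       say anything useful about why the loop stopped. */
    return ferror(in) ? -1 : lines;
}
```

A caller would do something like count_lines(open_input(argc > 1 ? argv[1]
: NULL)) after checking the result of open_input for NULL. Nothing in
count_lines needs to know the size of the input in advance.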

> You may want to use C to implement another language and to emulate the
> behaviour of that language's equivalent of feof(). C, being general purpose,
> should be up to the job but sometimes it's not that easy.

There is no language in which everything is easy.

> I also found some time back on this newsgroup that reading a single key from
> the keyboard was not part of standard C! This is a problem I remember from
> mainframes in the 70s. It went away with microcomputers in the 80s, and now
> with C it's come back again.

Actually it never went away.

> 2 major revisions of the C standard and
> something so basic is not in?

I agree that it would be useful, but it was not added to the standard.

>> Your experience with different file formats is clearly
>>not very extensive. Here are a few of the byte sequences
>>you might find in a file after puts("Hello") -- all of these
>>are from my own experience and none is a fabrication, although
>>I may have mis-remembered a detail here and there:
>>
>>H e l l o \n
>>H e l l o \r
>>H e l l o \r \n
>>\005 \000 H e l l o \000
>>H e l l o \040 \040 \040... (75 spaces all told)
>>\006 \000 \001 H e l l o
>>H e l l o \n \032... (plus 121 garbage characters)
>
>
> My specs for puts() say that '\n' is appended after the string argument.
> Whether that means cr-lf, cr or lf I'm not sure, but it's best to assume any
> of these when reading such a file.

It adds a \n, which then gets translated by the C implementation into
whatever the file system uses to indicate a new line.
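
A minimal sketch of what that means in practice (text_roundtrip and the
scratch file name passed to it are illustrative assumptions): whatever
ends up on disk, a program that writes "Hello\n" through a text stream and
reads it back through a text stream should see "Hello\n" again.

```c
#include <stdio.h>
#include <string.h>

/* Write one line through a text stream, then read it back through a
   text stream. Returns 1 if the program sees exactly "Hello\n" again,
   0 if not, -1 on I/O failure. 'path' is a hypothetical scratch file. */
int text_roundtrip(const char *path)
{
    char buf[16];
    FILE *f = fopen(path, "w");             /* text mode */

    if (f == NULL)
        return -1;
    if (fputs("Hello\n", f) == EOF) {
        fclose(f);
        return -1;
    }
    if (fclose(f) != 0)
        return -1;

    f = fopen(path, "r");                   /* text mode again */
    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof buf, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);

    /* The on-disk line terminator never shows up here. */
    return strcmp(buf, "Hello\n") == 0;
}
```

The bytes actually stored in the file are the implementation's business;
the program only ever deals with '\n'.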

> If you're getting all this extra garbage
> after your data (I don't mean padding bytes to fill up a disk sector) then
> I'd complain.

Without all that other stuff, if you loaded your "text" file into a text
editor on the system, or passed it to anything else expecting a text
file, it would not work. That is because text files on those systems are
*defined* as using fixed length space padded lines, or lines where the
line length is indicated by the first byte of the line record or whatever.

Writing a simple text processing application that works on all
systems, including those with strange (to you) native text file formats,
*without* having the implementation take care of the details would be
a *major* problem.

>>Thought question: Would you prefer to learn all the rules of
>>these (and many other) file formats and write that knowledge
>>into all your programs, or would it make more sense to use a
>>text stream to mediate between these and a standardized format?
>
> I've invented plenty of file formats. But to the OS or the C runtime, my
> file should be just a bunch of data, namely a set of N bytes.

*Your* file formats are. Just open them in binary mode and that is what
you get.
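
For instance, something along these lines (save_bytes and load_bytes are
made-up names, and this is only a sketch): bytes such as CR, LF or
Ctrl-Z written through a binary stream come back untouched. The one
caveat I know of is that the standard allows an implementation to append
null characters to the end of a binary stream, so on an unusual system the
count read back could be larger than the count written.

```c
#include <stdio.h>
#include <string.h>

/* Write n bytes in binary mode. Returns 0 on success, -1 on failure. */
int save_bytes(const char *path, const unsigned char *data, size_t n)
{
    FILE *f = fopen(path, "wb");
    size_t written;

    if (f == NULL)
        return -1;
    written = fwrite(data, 1, n, f);
    if (fclose(f) != 0 || written != n)
        return -1;
    return 0;
}

/* Read up to cap bytes in binary mode. Returns the number read
   (0 if the file could not be opened). */
size_t load_bytes(const char *path, unsigned char *buf, size_t cap)
{
    FILE *f = fopen(path, "rb");
    size_t got;

    if (f == NULL)
        return 0;
    got = fread(buf, 1, cap, f);
    fclose(f);
    return got;
}
```

With that, a private file format really is just N bytes as far as the
program is concerned, with no translation applied in either direction.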

> And a text
> file is a set of bytes sprinkled with cr and/or lf characters. The total size
> of N bytes should (naturally) include those characters.

No, a text file is whatever the OS defines a text file as being, which
can be a *lot* more complex.

> If the OS, disk controller, modem, whatever wants to add extra bytes to
> that, that's fine provided they are transparent.

The whole point of the way text streams are handled in C is that it
*does* make it transparent. You don't have to worry about whether it is
CR, CR/LF, LF, explicit length records, padded fixed length lines or
what. However, this means the file size on disk is *not* always the
number of characters you will read if you read all the way through it.
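
To see the difference on a given system, a small sketch like the
following can be used (count_chars is a hypothetical helper): it counts
the characters delivered by getc() for the same file opened in text mode
and in binary mode. On a system with CR/LF line endings the two counts
differ; on a system with plain LF line endings they are the same. It is
also why an ftell() value on a text stream is only good for handing back
to fseek(), not for counting characters.

```c
#include <stdio.h>

/* Count how many characters getc() delivers when 'path' is opened
   with the given mode ("r" for text, "rb" for binary).
   Returns -1 if the file cannot be opened or a read error occurs. */
long count_chars(const char *path, const char *mode)
{
    FILE *f = fopen(path, mode);
    long n = 0;

    if (f == NULL)
        return -1;
    while (getc(f) != EOF)
        n++;
    if (ferror(f))
        n = -1;
    fclose(f);
    return n;
}
```

Comparing count_chars(name, "r") with count_chars(name, "rb") for a text
file created by a native editor shows directly how much translation the
implementation is doing behind the scenes.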

-- 
Flash Gordon
Living in interesting times.
Although my email address says spam, it is real and I read it.

