Re: EOF location?



Thanks for that Richard.

I think I see what Roger was getting at now.

Please correct me if the following summary is wrong:

1. The file size is all that matters as far as Windows is concerned.

2. Any control characters used by applications are placed entirely at the
risk of those applications, without any guarantee that anything OTHER than
the application will even notice them.

I realised this when I wrote some apps using byte stream IO a while back.
But I'd never really explored it or verbalized it.

In COBOL, dealing with LINE SEQUENTIAL files in the Fujitsu
environment (remember the start of this conversation :-)?), you come to
expect a CR/LF at the end of each record (I understand Unix uses only one of
these, but I can't confirm that, as I don't use Unix environments...), and I
think we have established that there may or may not be a Ctrl/Z at the end
of the file.

I am probably no wiser than I was before, but between you, you and Roger
have ensured that I am certainly better informed :-)

Poor old HeyBub on the other hand is not much further ahead...

Pete.


"Richard" <riplin@xxxxxxxxxxxx> wrote in message
news:1148942828.967788.305650@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Not such a Clever Monkey wrote:

What happens if we have two lines in a text file on some sort of
DOS/Win32 machines:


$ cat testfile.txt
$ od -hc testfile.txt
$ wc testfile.txt
$ ls -l testfile.txt

That is hardly "some sort of DOS/Win32", it might be cygwin.

(Win32 is pretty relaxed, in that the last CRLF pair can
actually be omitted. In this case we have a "normalized" file).

'Win32' neither knows nor cares about any CR/LF, and least of all about
whether there is one at the end of the file. There is no such thing as a
'normalized file', though there is a 'normalized file path', which is
quite a different thing.

Each
line is 11 characters long; 9 characters and two line-end characters.
Note also that the size in bytes is the same as the character count.

Well duhhh.

I assume it would depend on the file APIs in question, but it is typical
for routines to return lengths that include all the characters,
including the CRLF pair.

There are no operating system disk file APIs in DOS or Windows that
care about CR/LF, nor any that report a length that depends on whether
CR/LF pairs are present or not.
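
As a rough illustration of that point, here is a minimal sketch in C. The helper name write_and_measure and the file names are made up for this example, and it assumes that fseek/ftell on a binary stream report the byte size, which holds on the usual DOS/Windows and Unix C runtimes but is not strictly guaranteed by the C standard.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write `len` raw bytes to `path` in binary mode,
   then report the file size as seen through fseek/ftell.
   Returns -1 on any error. */
long write_and_measure(const char *path, const char *data, size_t len)
{
    assert(path != NULL && data != NULL);

    FILE *f = fopen(path, "wb");          /* binary: no CR/LF translation */
    if (!f) return -1;
    size_t written = fwrite(data, 1, len, f);
    if (fclose(f) != 0 || written != len) return -1;

    f = fopen(path, "rb");
    if (!f) return -1;
    long size = -1;
    if (fseek(f, 0L, SEEK_END) == 0)      /* assumption: meaningful here */
        size = ftell(f);
    fclose(f);
    return size;
}
```

Called with two 9-character lines that each end in CR/LF, this should report 22 bytes; drop the final CR/LF and it should report 20; append a Ctrl-Z and it should report 23. The size simply tracks the bytes written, whatever they are.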

It is up to each language implementation or application to deal with
whatever it wishes to do with any particular characters in the file.

The EOF/EOT marker is "present", but is
usually used up by API calls that scan for it while retrieving the
contents of a file.

No. Wrong. There is no requirement for an EOF character to be present;
the end of the file is indicated entirely by the file size. EOT is not
used in files as any sort of end-of-file marker.

This implies that those routines that count characters will ignore the
EOF. Indeed, those routines will likely stop whatever they are doing
immediately once they encounter the EOF and return with whatever they
got so far.

That may or may not be true, depending entirely on the routine itself.
It is not something that is 'likely': some will always do so, some will
never do so. Some may even have an option.
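
A routine "with an option" might look something like the sketch below. It is only an illustration: count_visible_bytes and its stop_at_ctrl_z flag are invented names, not any real DOS, Windows or COBOL runtime API. The file is opened in binary mode so that the C runtime itself does not interfere.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical reader: count the bytes a routine would "see" in `path`.
   If stop_at_ctrl_z is nonzero, stop at the first 0x1A byte (which is
   not counted); otherwise read to the true end of the file.
   Returns -1 if the file cannot be opened. */
long count_visible_bytes(const char *path, int stop_at_ctrl_z)
{
    assert(path != NULL);

    FILE *f = fopen(path, "rb");
    if (!f) return -1;

    long n = 0;
    int c;
    while ((c = getc(f)) != EOF) {        /* EOF here is getc's return code */
        if (stop_at_ctrl_z && c == 0x1A)
            break;                        /* treat Ctrl-Z as a soft end */
        n++;
    }
    fclose(f);
    return n;
}
```

For a 7-byte file holding "abc", a Ctrl-Z, then "def", the same routine should return 3 with the flag set and 7 with it clear. The difference comes entirely from the routine, not from the file system.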

Applications that use the usual APIs to get at the contents
of text files will normally never count or show the trailing EOF.

DOS/Win32 have no disk file APIs that will ever notice a Ctrl-Z nor
care about it. If the file is full of EOF (Ctrl-Z) characters, then the
size that dir reports will count every one of them.

Whether a particular language implementation does this or not depends
on the author. For example I just did a small C program that read a
text file (fopen, getc) and counted the total characters and the x'1a'
characters of a file that had several. It counted the whole file size
and 12 x'1a' characters.
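
The program itself was not posted, so the following is only a guess at what such a test could look like, using the fopen and getc calls mentioned above. The function name count_ctrl_z is invented here, and the "rb" mode is an assumption: some C runtimes (Microsoft's among them) treat Ctrl-Z as end of input on streams opened in text mode, so binary mode is the safer choice if every byte is to be counted.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical reconstruction: count every byte in `path` and,
   separately, every 0x1A (Ctrl-Z) byte among them.
   Returns 0 on success, -1 if the file cannot be opened. */
int count_ctrl_z(const char *path, long *total, long *ctrl_z)
{
    assert(path != NULL && total != NULL && ctrl_z != NULL);

    FILE *f = fopen(path, "rb");          /* assumption: binary mode */
    if (!f) return -1;

    *total = 0;
    *ctrl_z = 0;
    int c;
    while ((c = getc(f)) != EOF) {        /* stops only at true end of file */
        (*total)++;
        if (c == 0x1A)
            (*ctrl_z)++;
    }
    fclose(f);
    return 0;
}
```

If the total matches the size that dir shows, and the Ctrl-Z count matches the number of x'1a' bytes known to be in the file, then nothing between the disk and getc has swallowed or acted on them.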
