Re: EOF location?




"Roger While" <simrw@xxxxxxxxxxxx> wrote in message
news:e5fn9v$iof$03$1@xxxxxxxxxxxxxxxxxxxx
Can you stop talking rubbish and refer to my previous post.
There is so much wrong in this post that defies belief.

Could you point some of it out, Roger?

Up until the para that starts "I assume..." it looked OK to me. He pointed
out my error which I had already admitted to, but his examples looked
good... I would disagree with some of what came after that, but I think it
is probably arguable.

What am I missing here?

Pete.
TOP POST - no more from me below

Roger

"Clever Monkey" <clvrmnky.invalid@xxxxxxxxxxxxxxxxxxx> wrote in message
news:YXHeg.24087$43.23700@xxxxxxxxxxxxxxx!nnrp1.uunet.ca...
Pete Dashwood wrote:
"HeyBub" <heybubNOSPAM@xxxxxxxxx> wrote in message
news:127hfdman9s836f@xxxxxxxxxxxxxxxxxxxxx
Anyone know where the EOF is supposed to be located on a PC file?


Er...please sir! please sir!... at the END of the file, sir!

In other words, if Windows reports a file is, say, 100 bytes long, is
the EOF (X'1A') supposed to be in the 100th or 101st byte?


I did a quick experiment using a DOS box....

Here are the results... (I have added comments for those who were not born
when the only option was a command line interface...these comments (as
this is a COBOL forum) are prefixed with *>)

================== start of DOS box ===============

Microsoft Windows XP [Version 5.1.2600]
(C) Copyright 1985-2001 Microsoft Corp.

*> Create a simple text file with the characters 1 thru 8...

C:\Documents and Settings\dashwood>echo 12345678 > testfile.txt

*> Make sure that testfile.txt DOES contain the expected characters...

C:\Documents and Settings\dashwood>type testfile.txt
12345678

*> Get a directory report on the file...

C:\Documents and Settings\dashwood>dir testfile.txt
Volume in drive C is PetesP4system
Volume Serial Number is D499-5305

Directory of C:\Documents and Settings\dashwood

28/05/2006 17:07 11 testfile.txt
1 File(s) 11 bytes
0 Dir(s) 10,042,187,776 bytes free

It would appear that there are three more characters than we actually
input.

Windows reports the file as 11 bytes; yet only 8 characters were
input...

Looks like a job for Hex Editor...(AXE version 3 ...)

31 32 33 34 35 36 37 38 20 0d 0a

Aha! NINE characters were input; I inadvertently entered a space...
(x20)
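
The byte count above can be reproduced outside a DOS box. The following is a minimal Python sketch, not part of the original experiment: it assumes the file holds exactly the eleven bytes shown in the hex dump, and the helper name hex_dump is made up for illustration.

```python
import os
import tempfile

# Assumption: these are the bytes from the hex dump above, i.e. eight
# digits, the stray space before the redirect, then CR and LF.
data = b"12345678 \r\n"


def hex_dump(raw: bytes) -> str:
    """Return the bytes as space-separated two-digit hex values."""
    return " ".join(f"{b:02x}" for b in raw)


# Write in binary mode so nothing is translated on the way to disk.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "testfile.txt")
    with open(path, "wb") as f:
        f.write(data)
    size_on_disk = os.path.getsize(path)

print(size_on_disk)    # size the OS reports for the file
print(hex_dump(data))  # same layout as the hex editor line
```

The reported size counts the digits, the space, and the CR LF pair; no separate terminator byte is added to the total.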

So, the answer to your question:

YES, the two characters which indicate EOF (in the Windows environment)
ARE included in the file size reported by the Operating System. Your Hex
1A EOF is incorrect for a text file (possibly COBOL uses a different
EOF, but I doubt it); it is x'0D' x'0A' and both these characters ARE
INCLUDED in the size reported by the OS.

It is not quite correct to say that the 0D0A pair indicates any sort of
EOF marker. An ASCII CRLF pair is actually a "line separator" used to
distinguish one line from the next. A single 0x1A (control-Z in ASCII)
is used to indicate EOF, similar to EOT in POSIX environments.

What happens if we have two lines in a text file on some sort of
DOS/Win32 machine:

[...]
17$ cat testfile.txt
123456789
123456789
18$ od -hc testfile.txt
0000000000 31 32 33 34 35 36 37 38 39 0D 0A 31 32 33 34 35
            1  2  3  4  5  6  7  8  9 \r \n  1  2  3  4  5
0000000020 36 37 38 39 0D 0A
            6  7  8  9 \r \n
0000000026
19$ wc testfile.txt
2 2 22 testfile.txt
20$ ls -l testfile.txt
-rw-rw-rw- 1 user group 22 May 29 15:00 testfile.txt
21$
[...]

As we can see, those "\r\n" pairs are at the end of every line, not the
end of every file. We have two lines of text, separated by the CRLF
pairs. (Win32 is pretty relaxed, in that the last CRLF pair can actually
be omitted. In this case we have a "normalized" file). Each line is 11
characters long; 9 characters and two line-end characters. Note also that
the size in bytes is the same as the character count.
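
The same arithmetic can be checked with a short Python sketch. It is only an illustration under the assumption that the file contains the 22 bytes shown in the od output; split_crlf_lines is a hypothetical helper, not a library routine.

```python
CRLF = b"\r\n"


def split_crlf_lines(raw: bytes) -> list:
    """Split on CRLF, treating a final CRLF as a terminator, not a new line."""
    parts = raw.split(CRLF)
    # A trailing CRLF leaves one empty element at the end; drop it.
    if parts and parts[-1] == b"":
        parts.pop()
    return parts


# Assumption: the two-line file from the od listing above.
two_lines = b"123456789\r\n123456789\r\n"
lines = split_crlf_lines(two_lines)

print(len(two_lines))                          # total bytes in the file
print(len(lines))                              # number of lines
print([len(line) + len(CRLF) for line in lines])  # stored length per line

# The "relaxed" case: the last CRLF pair omitted.
no_final_crlf = b"123456789\r\n123456789"
print(len(no_final_crlf), len(split_crlf_lines(no_final_crlf)))
```

Both variants yield two lines; only the byte total differs, by the two bytes of the omitted CRLF.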

I assume it would depend on the file APIs in question, but it is typical
for routines to return lengths that include all the characters, including
the CRLF pair. The EOF/EOT marker is "present", but is usually used up
by API calls that scan for it while retrieving the contents of a file.

This implies that those routines that count characters will ignore the
EOF. Indeed, those routines will likely stop whatever they are doing
immediately once they encounter the EOF and return with whatever they got
so far. Applications that use the usual APIs to get at the contents of
text files will normally never count or show the trailing EOF.
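
To make that behaviour concrete, here is a rough Python sketch of a reader that stops at the first control-Z. It is not any particular vendor's API; read_until_ctrl_z and the sample bytes are assumptions made up for illustration.

```python
CTRL_Z = 0x1A  # the control-Z byte discussed above


def read_until_ctrl_z(raw: bytes) -> bytes:
    """Return the content before the first 0x1A, or everything if none is present."""
    end = raw.find(bytes([CTRL_Z]))
    return raw if end == -1 else raw[:end]


# Hypothetical file image: one line, an EOF marker, then leftover bytes.
stored = b"abc\r\n\x1aextra"
visible = read_until_ctrl_z(stored)

print(len(stored))   # what a directory listing would count
print(len(visible))  # what a reader that honours control-Z hands back
print(visible)
```

A reader like this never shows or counts the marker, and anything after an embedded 0x1A is silently lost, while the directory size still includes every stored byte.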



