Re: EOF location?
- From: Clever Monkey <clvrmnky.invalid@xxxxxxxxxxxxxxxxxxx>
- Date: Mon, 29 May 2006 15:28:56 -0400
It is not quite correct to say that the 0D0A pair indicates any sort of EOF marker. An ASCII CRLF pair is actually a "line separator", used to distinguish one line from the next. A single 0x1A (control-Z in ASCII) is used to indicate EOF, similar to EOT in POSIX environments.
Pete Dashwood wrote:
"HeyBub" <heybubNOSPAM@xxxxxxxxx> wrote in message news:127hfdman9s836f@xxxxxxxxxxxxxxxxxxxxx
Anyone know where the EOF is supposed to be located on a PC file?
Er...please sir! please sir!... at the END of the file, sir!
In other words, if Windows reports a file is, say, 100 bytes long, is the EOF (X'1A') supposed to be in the 100th or 101st byte?
I did a quick experiment using a DOS box....
Here are the results... (I have added comments for those who were not born when the only option was a command line interface. As this is a COBOL forum, these comments are prefixed with *>)
================== start of DOS box ===============
Microsoft Windows XP [Version 5.1.2600]
(C) Copyright 1985-2001 Microsoft Corp.
*> Create a simple text file with the characters 1 thru 8...
C:\Documents and Settings\dashwood>echo 12345678 > testfile.txt
*> Make sure that testfile.txt DOES contain the expected characters...
C:\Documents and Settings\dashwood>type testfile.txt
12345678
*> Get a directory report on the file...
C:\Documents and Settings\dashwood>dir testfile.txt
Volume in drive C is PetesP4system
Volume Serial Number is D499-5305
Directory of C:\Documents and Settings\dashwood
28/05/2006 17:07 11 testfile.txt
1 File(s) 11 bytes
0 Dir(s) 10,042,187,776 bytes free
It would appear that there are three more characters than we actually input.
Windows reports the file as 11 bytes; yet only 8 characters were input...
Looks like a job for Hex Editor...(AXE version 3 ...)
31 32 33 34 35 36 37 38 20 0d 0a
Aha! NINE characters were input; I inadvertently entered a space... (x20)
So, the answer to your question:
YES, the two characters which indicate EOF (in the Windows environment) ARE included in the file size reported by the Operating System. Your Hex 1F EOF is incorrect for a text file (possibly COBOL uses a different EOF, but I doubt it); it is x'0D' x'0A' and both these characters ARE INCLUDED in the size reported by the OS.
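For what it's worth, the 11-byte figure can be checked by decoding the hex dump quoted above. This is only a rough Python sketch; the variable names are mine, and the byte values are copied straight from the dump:

```python
# Byte values copied from the hex editor dump quoted above.
dump = "31 32 33 34 35 36 37 38 20 0d 0a"
data = bytes.fromhex(dump)

print(len(data))      # 11, matching what dir reported
print(data)           # b'12345678 \r\n'
print(0x1A in data)   # False: there is no control-Z byte in the file
```

Eight digits, one space, and one CRLF pair account for all 11 bytes.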
What happens if we have two lines in a text file on a DOS/Win32 machine:
[...]
17$ cat testfile.txt
123456789
123456789
18$ od -hc testfile.txt
0000000000 31 32 33 34 35 36 37 38 39 0D 0A 31 32 33 34 35
            1  2  3  4  5  6  7  8  9 \r \n  1  2  3  4  5
0000000020 36 37 38 39 0D 0A
            6  7  8  9 \r \n
0000000026
19$ wc testfile.txt
2 2 22 testfile.txt
20$ ls -l testfile.txt
-rw-rw-rw- 1 user group 22 May 29 15:00 testfile.txt
21$
[...]
As we can see, those "\r\n" pairs are at the end of every line, not the end of every file. We have two lines of text, separated by the CRLF pairs. (Win32 is pretty relaxed, in that the last CRLF pair can actually be omitted; in this case we have a "normalized" file.) Each line is 11 bytes long: 9 characters plus the two line-end characters. Note also that the size in bytes reported by ls (22) is the same as the character count reported by wc.
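The arithmetic can be replayed with a small Python sketch. The helper name split_crlf is made up for illustration, and the byte string is simply the content of the od dump above:

```python
def split_crlf(data: bytes) -> list:
    """Split on CRLF, treating a trailing CRLF as a terminator, not a new line."""
    lines = data.split(b"\r\n")
    # A file that ends in CRLF leaves one empty piece at the end of the split.
    if lines[-1] == b"":
        lines.pop()
    return lines

normalized = b"123456789\r\n123456789\r\n"   # same bytes as the od dump
relaxed = b"123456789\r\n123456789"          # last CRLF omitted

lines = split_crlf(normalized)
line_sizes = [len(line) + 2 for line in lines]   # 9 characters + CR + LF

print(len(lines), line_sizes, len(normalized))   # 2 [11, 11] 22
print(len(split_crlf(relaxed)), len(relaxed))    # 2 20
```

Both variants hold two lines; only the byte count differs, by the one omitted CRLF pair.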
I assume it depends on the file APIs in question, but routines typically return lengths that include every character, CRLF pairs included. The EOF/EOT marker is "present", but is usually consumed by the API calls that scan for it while retrieving the contents of a file.
This implies that those routines that count characters will ignore the EOF. Indeed, those routines will likely stop whatever they are doing immediately once they encounter the EOF and return with whatever they got so far. Applications that use the usual APIs to get at the contents of text files will normally never count or show the trailing EOF.
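To make that behaviour concrete, here is a minimal Python sketch of a reader that stops at the first 0x1A. The function read_until_ctrl_z is hypothetical and only imitates the scanning described above; it is not any particular runtime's API, and real routines vary in how they treat the marker:

```python
CTRL_Z = b"\x1a"  # the 0x1A marker discussed in this thread

def read_until_ctrl_z(raw: bytes) -> bytes:
    """Return everything before the first 0x1A, or all of raw if there is none."""
    end = raw.find(CTRL_Z)
    return raw if end == -1 else raw[:end]

# Assumed example: one CRLF-terminated line followed by a trailing control-Z.
raw = b"123456789\r\n" + CTRL_Z
visible = read_until_ctrl_z(raw)

print(len(raw))      # 12 bytes stored, marker included
print(len(visible))  # 11 bytes handed back to the caller
print(visible)       # b'123456789\r\n'
```

A caller that counts what it is handed sees 11 characters, while a raw byte count of the same data gives 12.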