Re: Text files



"osmium" wrote:
"Malcolm McLean" wrote:
No. Text files often have a control Z as end of file marker. However on
most systems the text and binary formats are in fact identical.

The most common form of text file uses ASCII code, and there is no character
called 'control Z' in ASCII. Furthermore, I don't know of any compiler for
desktop computers that expects EOF to be encoded in the data; EOF is a
*condition* detected by the OS.

[Off-Topic] CP/M used a 0x1A character (a.k.a. "Control-Z") as an EOF
marker "encoded in the data", since the OS kept track of file sizes
only as multiples of 128-byte blocks. (This convention was used only
in text files, obviously.)
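
To make the block-size point concrete, here is a minimal sketch in C of how a
program could recover the real length of a text file when the OS only knows
the size in 128-byte records. The names text_length and pad_to_record are
made up for illustration, and filling the whole tail of the last record with
0x1A is an assumption about how a writer might pad, not a description of any
particular CP/M program.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CPM_RECORD_SIZE 128   /* block size mentioned above */
#define CTRL_Z          0x1A  /* the in-band EOF marker */

/* Logical length of a text buffer: the number of bytes before the
   first Ctrl-Z, or the whole buffer if no Ctrl-Z is present. */
size_t text_length(const unsigned char *buf, size_t nbytes)
{
    assert(buf != NULL || nbytes == 0);
    const unsigned char *mark = nbytes ? memchr(buf, CTRL_Z, nbytes) : NULL;
    return mark ? (size_t)(mark - buf) : nbytes;
}

/* Fill the unused tail of the last record with Ctrl-Z (assumed padding
   policy). Returns the padded size, a multiple of CPM_RECORD_SIZE, or
   (size_t)-1 if the padded text would not fit in `cap` bytes. */
size_t pad_to_record(unsigned char *buf, size_t len, size_t cap)
{
    size_t padded = (len + CPM_RECORD_SIZE - 1) / CPM_RECORD_SIZE
                    * CPM_RECORD_SIZE;
    if (padded > cap)
        return (size_t)-1;
    memset(buf + len, CTRL_Z, padded - len);
    return padded;
}
```

With a 7-byte line, pad_to_record reports a 128-byte file and text_length
on those 128 bytes gives back 7, which is all the reader has to go on.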

Since every CP/M program that manipulated text expected this, and
Microsoft's DOS started its life as a CP/M look-alike for the 8086/8
family, DOS programs also interpreted (and many still do) a CTRL-Z as
an EOF mark, and automatically added a CTRL-Z at the end of text files
when closing them. (Try COPYing a file that contains a CTRL-Z on a
Windows system, with and without the /B switch.)

There is enough software relying on this, and enough files created
with a CTRL-Z at the end, that the C# language definition includes
provisions to deal with it:

"The C# Programming Language", 2nd ed., (c) 2006

2 - Lexical Structure ...
2.3 - Lexical Analysis ...
2.3.1 - Line Terminators ...
"If the last character of the source file is
a Control-Z character (U+001A) this
character is deleted"
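
The quoted rule is small enough to sketch in a few lines. This is written in
C rather than C#, and strip_final_ctrl_z is a made-up name; it only shows the
rule as quoted (one trailing U+001A is dropped, nothing else), not how any
actual compiler implements it.

```c
#include <assert.h>
#include <stddef.h>

/* Apply the quoted rule to a source buffer of `len` bytes: if the very
   last character is Ctrl-Z (0x1A), it is deleted. Returns the new
   length. A Ctrl-Z anywhere else is left alone. */
size_t strip_final_ctrl_z(const char *src, size_t len)
{
    assert(src != NULL || len == 0);
    if (len > 0 && src[len - 1] == 0x1A)
        return len - 1;
    return len;
}
```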

I was forced to do the same eons ago, when writing C programs that had
to be portable between MSDOS, CP/M and DEC's operating systems.
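
One way to do that, as a rough sketch and not the code I actually used:
read the stream byte by byte and treat either the real EOF condition or the
first CTRL-Z as the end of the text. The function name
copy_text_until_ctrl_z is hypothetical. Opening the input in binary mode
("rb") is assumed, so that the C library does not apply its own CTRL-Z
handling first.

```c
#include <assert.h>
#include <stdio.h>

#define CTRL_Z 0x1A

/* Copy text from `in` to `out`, stopping at the real end-of-file
   condition or at the first Ctrl-Z, whichever comes first.
   Returns the number of bytes copied, or -1 on an I/O error. */
long copy_text_until_ctrl_z(FILE *in, FILE *out)
{
    long copied = 0;
    int ch;

    assert(in != NULL && out != NULL);
    while ((ch = getc(in)) != EOF && ch != CTRL_Z) {
        if (putc(ch, out) == EOF)
            return -1;
        copied++;
    }
    return ferror(in) ? -1 : copied;
}
```

The same loop works unchanged on systems that never write a CTRL-Z, since
there the getc() result of EOF ends the copy.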

Roberto Waltman

[ Please reply to the group,
return address is invalid ]