Re: Binary v. Text, why is it faster?
- From: "mensanator@xxxxxxx" <mensanator@xxxxxxx>
- Date: 6 Feb 2006 18:16:24 -0800
Arctic Fidelity wrote:
I have constantly seen and heard that reading binary data is faster than
reading textual data. I have always presumed this to be a fact. But now I
am at the point where I would like to understand why.
I was trying to think about it, and it has rather confused me. To my
understanding, reading a text file is reading in the bytes which
correspond to, for example, ASCII character codes. But if we are dealing
with a 1-byte character encoding, how is it slower to read in 'a' rather
than some binary representation of that?
And in addition to this, what is the actual difference between binary and
textual files? I had always thought that a binary file was simply a file
composed of any combination of bytes, whereas a text file was a file
composed of a limited subset of the bytes available to a binary file. Am I
misunderstanding something here?
Actually, there is no difference. But since text files are often
interpreted as such, it is wise to limit the contents to the proper
subset. For example:
dir tt1.txt
02/06/2006 07:05p 10 tt1.txt
1 File(s) 10 bytes
The file tt1.txt contains 10 bytes, but we only see 7 if we type it:
type tt1.txt
abcdefg
because "type" expects the file to be text and not binary and
interprets some of the contents instead of printing them. A dump
reveals why we only see 7:
DUMP.EXE version 8-MAR-91
Block # 0 0
0 61 62 63 64 65 66 67 0D 0A 1A FF FF FF FF FF FF abcdefg...
Bytes 8, 9 & 10 are carriage return (0D), line feed (0A) and EOF (1A).
The EOF character is not strictly required, since the OS knows
there are exactly 10 bytes (the FFs are sector padding bytes not
part of the file).
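To make that concrete, here is a rough Python sketch of what a text-mode reader such as "type" appears to be doing with those 10 bytes. The function name type_like is made up for illustration, and stopping at the first 1A byte is an assumption based on the behaviour shown above, not a statement about how "type" is actually implemented.

```python
# The 10 bytes of tt1.txt, copied from the dump above.
TT1 = bytes.fromhex("61 62 63 64 65 66 67 0D 0A 1A")
EOF = b"\x1a"  # Ctrl-Z, the text-mode end-of-file marker


def type_like(data: bytes) -> bytes:
    """Hypothetical text-mode reader: keep everything before the first EOF byte."""
    end = data.find(EOF)
    return data if end == -1 else data[:end]


shown = type_like(TT1)
visible = shown.decode("ascii").rstrip("\r\n")

print(len(TT1))      # size on disk, as dir reports it
print(visible)       # what ends up on the screen
print(len(visible))  # number of visible characters
```

A binary reader would hand back all 10 bytes untouched; the text-mode reader interprets three of them instead of showing them.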
But watch what happens when I concatenate two copies together:
copy tt1.txt+tt1.txt tt2.txt
tt1.txt
tt1.txt
1 file(s) copied.
dir tt2.txt
02/06/2006 07:19p 19 tt2.txt
1 File(s) 19 bytes
10 bytes + 10 bytes = 19 bytes ??
A dump reveals what happened:
DUMP.EXE version 8-MAR-91
Block # 0 0
0 61 62 63 64 65 66 67 0D 0A 61 62 63 64 65 66 67 abcdefg..abcdefg
16 0D 0A 1A FF FF FF FF FF FF FF FF FF FF FF FF FF ....
The terminating EOF of the first copy of tt1.txt was dropped
as part of the concatenation. The OS expects only one (if any)
EOF character per file and it better be the last one.
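The arithmetic can be reproduced with a small Python simulation. This is only a sketch of the behaviour visible in the dump (each source cut at its first EOF, one EOF written at the end); copy_text_mode is a made-up name, not the real implementation of the copy command.

```python
EOF = b"\x1a"                # Ctrl-Z
TT1 = b"abcdefg\r\n" + EOF   # the 10-byte tt1.txt


def copy_text_mode(*sources: bytes) -> bytes:
    """Simulated text-mode concatenation, modelled on the dump above."""
    out = b""
    for data in sources:
        # Keep only what precedes the first EOF in each source.
        head, _, _ = data.partition(EOF)
        out += head
    # Terminate the result with a single EOF.
    return out + EOF


tt2 = copy_text_mode(TT1, TT1)
print(len(TT1), len(TT1), len(tt2))
print(tt2.hex(" "))
```

Each source contributes 9 bytes once its EOF is dropped, and the single trailing EOF brings the total to 19.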
I could simply insert the original EOF back into the file:
dir tt3.txt
02/06/2006 07:25p 20 TT3.TXT
1 File(s) 20 bytes
DUMP.EXE version 8-MAR-91
Block # 0 0
0 61 62 63 64 65 66 67 0D 0A 1A 61 62 63 64 65 66 abcdefg...abcdef
16 67 0D 0A 1A FF FF FF FF FF FF FF FF FF FF FF FF g...
But the OS won't like it:
type tt3.txt
abcdefg
Even though the file is now 20 bytes long, "type" won't go past
the first EOF character.
The copy command has a binary option that will concatenate
without trying to interpret the contents:
copy /b tt1.txt+tt1.txt tt4.txt
tt1.txt
tt1.txt
1 file(s) copied.
dir tt4.txt
02/06/2006 07:29p 20 tt4.txt
1 File(s) 20 bytes
But that doesn't help the "type" command.
type tt4.txt
abcdefg
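Here is the same thing as a Python sketch: a byte-for-byte concatenation keeps all 20 bytes, but a reader that honours the EOF marker still stops after the first copy. As before, type_like is a hypothetical stand-in for "type", assumed to stop at the first 1A byte.

```python
EOF = b"\x1a"                # Ctrl-Z
TT1 = b"abcdefg\r\n" + EOF   # the 10-byte tt1.txt


def type_like(data: bytes) -> bytes:
    """Hypothetical text-mode reader: keep everything before the first EOF byte."""
    end = data.find(EOF)
    return data if end == -1 else data[:end]


# Binary concatenation: no byte is interpreted, nothing is dropped.
tt4 = TT1 + TT1

print(len(tt4))                      # full size on disk
print(type_like(tt4))                # what the text-mode reader passes on
print(len(tt4) - len(type_like(tt4)))  # bytes the reader never shows
```

The file itself is intact; only the program reading it as text decides to stop early.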
These kinds of problems can also occur if you use FTP to send
a binary file in text mode.
So, generally, assuming the content is ok, it's best to never
let a program think a binary file is a text file.
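The FTP case can be sketched the same way. The simulation below is heavily simplified and rests on assumptions: the sender uses bare LF line endings, text mode rewrites every LF as CR LF, and the receiver stores the result unchanged. ascii_mode_send is a made-up helper, not part of any FTP library.

```python
def ascii_mode_send(data: bytes) -> bytes:
    """Simplified text-mode transfer: every LF byte becomes CR LF (assumption)."""
    return data.replace(b"\n", b"\r\n")


def binary_mode_send(data: bytes) -> bytes:
    """Binary-mode transfer: bytes pass through untouched."""
    return data


# Arbitrary binary payload that happens to contain two 0A bytes.
payload = bytes([0x89, 0x0A, 0x00, 0x0A, 0xFF])

as_text = ascii_mode_send(payload)
as_binary = binary_mode_send(payload)

print(len(payload), len(as_text), len(as_binary))
print(as_text == payload, as_binary == payload)
```

In a text file the extra CR bytes are harmless line-ending bookkeeping; in a binary file they are corruption, because 0A was data and not a line ending.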
I guess I just don't see how reading in AF would be slower just because AF
appears in a text file instead of a "binary" file?
- Arctic
--
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/