Re: Binary or Ascii Text?
- From: "Claude Yih" <wing0630@xxxxxxxxx>
- Date: 31 Mar 2006 01:34:13 -0800
osmium writes:
The best you can do is make a guess. The first 32 characters of ASCII are
control codes, and only a few of them (CR, LF, FF, HT (tab), ...) are present
in text files. So if you have quite a few of the other 25 or so codes, it is
probably not a text file. But it's only an educated guess, no real proof.
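For illustration, the guess described above could be sketched in C roughly as follows. This is only a sketch under assumptions: the set of "text" control codes is limited to the four named above (HT, LF, FF, CR), and the function names and the max_odd threshold are hypothetical, not something from the thread.

```c
#include <stddef.h>

/* Counts bytes below 0x20 that are NOT one of the control codes named
   in the post as common in text files: HT, LF, FF, CR.
   Hex values are used so the check does not depend on the execution
   character set. */
size_t count_odd_controls(const unsigned char *buf, size_t n)
{
    size_t i, count = 0;

    for (i = 0; i < n; i++) {
        unsigned char c = buf[i];
        if (c < 0x20 && c != 0x09 && c != 0x0A && c != 0x0C && c != 0x0D)
            count++;
    }
    return count;
}

/* Returns 1 when more than max_odd unusual control codes were seen.
   max_odd is a hypothetical tuning knob; the result is still a guess. */
int probably_not_text(const unsigned char *buf, size_t n, size_t max_odd)
{
    return count_odd_controls(buf, n) > max_odd;
}
```

A caller would fill buf from the start of the file and pick max_odd to taste; a larger threshold tolerates the occasional stray control code in otherwise plain text.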
Well, as a matter of fact, I just got an idea for handling that problem, but
I don't know if it is feasible.
We know that ASCII text uses only 7 bits of each byte and that the most
significant bit is always 0. So I wonder if I could write a program that
reads a fixed-length prefix of a given file (for example, the first 1024
bytes), stores it in an unsigned char array, and checks whether any
element is greater than 0x7F. If there is one, the file can be judged to
be a binary file.
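A minimal sketch of that idea, assuming a sample size of 1024 bytes as in the example above. The names SAMPLE_SIZE, has_high_bit and looks_binary are made up for illustration.

```c
#include <stdio.h>
#include <stddef.h>

#define SAMPLE_SIZE 1024  /* number of leading bytes to inspect */

/* Returns 1 if any of the n bytes in buf is greater than 0x7F, else 0. */
int has_high_bit(const unsigned char *buf, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (buf[i] > 0x7F)
            return 1;
    }
    return 0;
}

/* Returns 1 if the file is judged binary, 0 if its first SAMPLE_SIZE
   bytes are all 7-bit, and -1 if the file cannot be opened. */
int looks_binary(const char *path)
{
    unsigned char buf[SAMPLE_SIZE];
    size_t n;
    FILE *fp = fopen(path, "rb");  /* binary mode: no newline translation */

    if (fp == NULL)
        return -1;
    n = fread(buf, 1, sizeof buf, fp);  /* n may be less than SAMPLE_SIZE */
    fclose(fp);
    return has_high_bit(buf, n);
}
```

Note that an empty file, or one shorter than 1024 bytes, is simply checked over however many bytes fread actually returned.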
However, the disadvantage of this method is that it cannot handle
multi-byte characters. Take Japanese text in UTF-8 for example: a
Japanese character may be encoded as three bytes, and each of those
bytes is greater than 0x7F. In that case, my method would wrongly judge
a text file to be binary.
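To make the problem concrete, here is a small sketch using the UTF-8 encoding of U+3042 (HIRAGANA LETTER A), which is the three bytes E3 81 82. The function and array names are hypothetical.

```c
#include <stddef.h>

/* The 7-bit check from the idea above: 1 if any byte exceeds 0x7F. */
int any_byte_above_7f(const unsigned char *buf, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (buf[i] > 0x7F)
            return 1;
    }
    return 0;
}

/* UTF-8 encoding of U+3042 followed by a newline (0x0A).
   All three bytes of the character have the high bit set. */
const unsigned char utf8_sample[] = { 0xE3, 0x81, 0x82, 0x0A };
```

Calling any_byte_above_7f(utf8_sample, sizeof utf8_sample) returns 1, so this one-line text file would be reported as binary.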