Re: Recognising file type (ascii/binary)
- From: "Oliver Wong" <owong@xxxxxxxxxxxxxx>
- Date: Fri, 28 Oct 2005 20:15:59 GMT
"Matt Humphrey" <matth@xxxxxxxxxxxxxx> wrote in message
news:XpSdneq4Oe9H0PzeRVn-gw@xxxxxxxxxxxxxxx
>
> "Bruce Lee" <blah@xxxxxxxxxxxxxxxxxxxx> wrote in message
> news:P068f.23301$Ih5.7913@xxxxxxxxxxxxxxxxxxxxxxxxxxxx
>> Is there any easy way to get Java to determine whether a file is a binary
>> file or a plain-text ASCII file?
>
> Files are simply sequences of (binary) bytes--there's no way to tell
> whether it's supposed to contain only bytes that represent printable ASCII
> (or Unicode) or any particular binary pattern. You can read the file to
> find out--if you find values that signify unlikely or non-printable
> characters you can deem the file binary or corrupt. Similarly, there are
> heuristics (based on convention) for guessing the "type" of the file based
> on the first few bytes, but there's no guarantee these are correct either.
> (And files with 2-byte Unicode characters can really confuse things.)
>
> Of course, you could require that text files end in "txt" or
> something--it's no worse than any of the above and significantly easier.
Matt Humphrey is completely correct. However, as an additional check on top
of the heuristic of looking for unprintable characters, another trick is to
check whether the newline sequence is consistent. It should always be either
"\n" (UNIX-like systems), "\r" (classic Mac OS) or "\r\n" (Windows). If the
file switches between these, it probably isn't a valid ASCII text file on any
of those three platforms.
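
For what it's worth, here is a rough sketch of that newline check in Java. The class and method names (NewlineStyleCheck, newlineStyles, hasConsistentNewlines) are made up for illustration, and it assumes the whole file has already been read into a byte array. Treat it as a heuristic, not a definitive test.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical helper, not part of any library.
class NewlineStyleCheck {
    enum Newline { LF, CR, CRLF }

    // Collects every newline convention that appears in the data.
    static Set<Newline> newlineStyles(byte[] data) {
        Set<Newline> seen = EnumSet.noneOf(Newline.class);
        for (int i = 0; i < data.length; i++) {
            if (data[i] == '\r') {
                if (i + 1 < data.length && data[i + 1] == '\n') {
                    seen.add(Newline.CRLF);
                    i++; // skip the '\n' that belongs to this "\r\n"
                } else {
                    seen.add(Newline.CR);
                }
            } else if (data[i] == '\n') {
                seen.add(Newline.LF);
            }
        }
        return seen;
    }

    // True if the data uses at most one newline convention.
    // A file with no newlines at all also counts as consistent.
    static boolean hasConsistentNewlines(byte[] data) {
        return newlineStyles(data).size() <= 1;
    }
}
```

A file that fails this check may still be text (for example, one that was edited on several platforms), so it is probably best used alongside the unprintable-character test, not instead of it.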
You could also sidestep the Unicode problem by treating 2-byte Unicode
characters (UTF-16, for example) as "non-ASCII" and lumping such files in
with the "binary files" category.
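
Putting that together with the unprintable-character heuristic, a minimal sketch might look like the following. The names (AsciiTextCheck, looksLikeAsciiText) are hypothetical, and the choice to accept only 0x20 through 0x7E plus tab, LF and CR is my own assumption about what counts as "printable"; adjust it if you need form feeds or the like. Any byte with the high bit set, which covers UTF-8 multi-byte sequences and most UTF-16 data, makes the file count as binary.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of any library.
class AsciiTextCheck {
    // True if every byte is printable ASCII (0x20..0x7E) or tab, LF, CR.
    // An empty array is reported as text.
    static boolean looksLikeAsciiText(byte[] data) {
        for (byte b : data) {
            int v = b & 0xFF; // treat the byte as unsigned
            boolean printable = v >= 0x20 && v <= 0x7E;
            boolean whitespace = v == '\t' || v == '\n' || v == '\r';
            if (!printable && !whitespace) {
                return false; // NUL, other control bytes, or non-ASCII
            }
        }
        return true;
    }

    // Convenience overload that reads the whole file into memory first,
    // so it is only suitable for reasonably small files.
    static boolean looksLikeAsciiText(Path file) throws IOException {
        return looksLikeAsciiText(Files.readAllBytes(file));
    }
}
```

For large files you would probably want to stream the bytes and stop at the first offending one instead of reading everything up front.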
- Oliver