Re: Identifying File type by reading files
From: Andrew Dalke (adalke_at_mindspring.com)
Date: 12/26/03
- Next message: Robin Munn: "Re: dynamic typing question"
- Previous message: Todd Gardner: "Re: hex array to array of 16 bit numbers?"
- In reply to: hokiegal99: "Identifying File type by reading files"
- Next in thread: Gabriel Genellina: "Re: Identifying File type by reading files"
Date: Fri, 26 Dec 2003 19:46:08 GMT
hokiegal99:
> what should I look for in a file to determine whether or not it is a
> MS Word file or an Excel file or a PDF file, etc., etc.? Below is a
> list of some of the strings I use to ID files, but I can't help but
> wonder that there must be a more precise way of doing this. I know of
> the Unix 'file' command. It is not very useful for me as it doesn't
> distinguish between MS Office documents... all .xls, .docs, .ppts are
> MS documents to it.
That likely means you have an incomplete 'magic' file. This is the
file used by the 'file' command to figure out the file type. Take a
look at http://www.unixhideout.com/freebsd/share/misc/magic for
a more complete (I think) version.
That copy is dated 1995 and is close to the one on my Mac. It doesn't
support the newer MS Word and Excel formats. I'm having trouble
finding the most recent, definitive version. One link pointed me
to ftp://ftp.astron.com/pub/file/ but I haven't investigated it further.
There's also pymagic, http://thomas.mangin.me.uk/software/python.html
which may help if you want a pure Python implementation of 'file'.
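
To give a rough idea of what a 'magic' lookup does, here is a minimal
sketch in Python. It is not how 'file' or pymagic is actually written;
the SIGNATURES table and the sniff/sniff_file names are made up for
illustration, and the table holds only two well-known signatures. Note
that old-style .doc, .xls and .ppt files all start with the same OLE2
container header, which is probably why a simple magic file reports
them all as generic MS documents. Telling them apart would mean looking
inside the container, which this sketch does not attempt.

```python
# Hypothetical, tiny stand-in for a 'magic' file: (leading bytes, description).
SIGNATURES = [
    (b"%PDF-", "PDF document"),
    # Shared header of the OLE2 container used by pre-2007 Word, Excel
    # and PowerPoint files; it does not say which application wrote it.
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound document (MS Office)"),
]


def sniff(data):
    """Return a description for the first signature that matches data."""
    for magic, description in SIGNATURES:
        if data.startswith(magic):
            return description
    return "unknown"


def sniff_file(path):
    """Read the first few bytes of a file and classify them."""
    with open(path, "rb") as f:
        return sniff(f.read(16))
```

Calling sniff_file("report.pdf") (a hypothetical path) reads 16 bytes
and compares them against each entry in turn. A real magic file also
checks bytes at other offsets, which this version leaves out.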
Andrew
dalke@dalkescientific.com