Pure Perl replacement for "/usr/bin/file"



Hi,

I want to determine the character encoding of some strings I have,
similar to what the "file" tool reports:

cp1252.text: Non-ISO extended-ASCII text
iso-8859-1.text: ISO-8859 text, with no line terminators
macintosh.text: Non-ISO extended-ASCII text
utf16.text: data
utf8.text: UTF-8 Unicode text, with no line terminators


I get the files (strings, really) via CGI upload and I want to convert all
of them to UTF-8. For that I want to "use encoding...", but first I have to
determine which encoding each uploaded file is in.

In addition, I do not want to save the content to a File::Temp file just to run "file" on it.

Is there a solution in pure Perl for this problem?


http://search.cpan.org/~cwest/ppt-0.14/bin/file is the best thing I found so far, but it only shows me this:

# ./file *
cp1252.txt: text
iso-8859-1.text: text
macintosh.text: text
utf16.text: data
utf8.text: text
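
One direction that might be worth trying is Encode::Guess, which ships with the Encode distribution and works on an in-memory string, so no temp file is needed. Below is a minimal sketch of the detect-then-convert step, not a tested solution. The helper name to_utf8 and the cp1252 fallback are my own assumptions. With its default suspects, guess_encoding only recognizes ASCII, UTF-8 and UTF-16/32 (mainly via a BOM); single-byte encodings such as ISO-8859-1, cp1252 and MacRoman cannot be reliably told apart from the bytes alone, so anything unrecognized is simply decoded with the fallback.

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use Encode::Guess;   # default suspects: ascii, utf8, plus BOM-based UTF-16/32 checks

# Hypothetical helper (not part of any module): takes raw bytes and returns
# the UTF-8 encoded bytes plus the name of the encoding that was used.
sub to_utf8 {
    my ($bytes, $fallback) = @_;
    $fallback ||= 'cp1252';   # assumption: legacy 8-bit uploads are Windows-1252

    my $guess = guess_encoding($bytes);
    my ($text, $name);
    if (ref $guess) {
        # On success guess_encoding returns an encoding object.
        $name = $guess->name;
        $text = $guess->decode($bytes);
    }
    else {
        # On failure it returns a plain error string; use the fallback instead.
        $name = $fallback;
        $text = decode($fallback, $bytes);
    }
    return (encode('UTF-8', $text), $name);
}

# Usage: $upload would hold the raw bytes read from the CGI upload.
my $upload = "caf\xe9";
my ($utf8_bytes, $used) = to_utf8($upload);
```

Passing a different second argument (for example 'iso-8859-1' or 'MacRoman') changes the fallback, but choosing between those still needs knowledge from outside the data.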


