Re: encoding problem?
- From: nobull67@xxxxxxxxx (Nobull67@xxxxxxxxx)
- Date: Thu, 27 Sep 2007 18:19:33 -0000
On Sep 27, 12:27 pm, braeds...@xxxxxxxxxxx wrote:
> I am trying to use perl on the command line to process text files in
> various ways, one of which is to decode html entities. As far as I can
> see, the following line should work
>
> perl -MHTML::Entities -p -e 'decode_entities($_)' <input.txt >output.txt
>
> it does indeed change the html entities, but not into the required
> characters, rather into pairs of unusual characters; and the command
> line returns this:
>
> Wide character in print, <> line 1.
>
> It seems to me it is something to do with internal character encoding
> being messed up but I can't work out how to control it.

Before you can control it you need to know what it is.

> The text files
> processed have MacOS character encoding which is required in the
> finished file,

What is "MacOS character encoding"?

> but perhaps I need to convert to UTF8 before processing
> and back again after?

Perl will do this automatically if you tell it the encoding of the
input and output.

> perl -MHTML::Entities -p -e 'decode_entities($_)' <input.txt

I think you need something like
perl -MHTML::Entities -p -e 'BEGIN { binmode STDIN,
":encoding(whatever)"; binmode STDOUT, ":encoding(whatever)" }
decode_entities($_)'
Where "whatever" is the name Perl uses for that which you are calling
"MacOS character encoding".
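
The same idea can be written as a short script, which may be easier to read than the one-liner. This is only a sketch: `decode_stream` is a made-up helper name, and the encoding name passed to it is an assumption you must supply yourself. Without an encoding layer Perl reads raw bytes, `decode_entities` can then produce characters above 255 (for example `&mdash;`), and printing those to a handle with no layer is most likely what triggers "Wide character in print" and the UTF-8 byte pairs in the output.

```perl
use strict;
use warnings;
use HTML::Entities qw(decode_entities);

# Hypothetical helper: copy $in to $out, decoding HTML entities on the way.
# $encoding is whatever Encode name matches the files; it is used for both
# reading and writing, so the output stays in the input's encoding.
sub decode_stream {
    my ($in, $out, $encoding) = @_;

    # Input layer: bytes -> Perl characters. Output layer: characters -> bytes.
    binmode $in,  ":encoding($encoding)" or die "Cannot set input layer: $!";
    binmode $out, ":encoding($encoding)" or die "Cannot set output layer: $!";

    while (my $line = <$in>) {
        decode_entities($line);    # void context: decodes $line in place
        print {$out} $line;
    }
    return;
}
```

If the files turn out to be Mac OS Roman (an assumption; Encode calls that encoding `MacRoman`), calling `decode_stream(\*STDIN, \*STDOUT, 'MacRoman')` at the end of the script and running it with `<input.txt >output.txt` should behave like the one-liner. Entities that name a character the target encoding cannot represent will come out as a substitution character rather than the intended one.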
For a list of supported encodings:
perldoc Encode::Supported
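
To see which names your own installation accepts, you can also ask Encode directly. A minimal sketch, assuming the encoding you want has "mac" somewhere in its name; `encodings_matching` is a hypothetical helper, not part of Encode.

```perl
use strict;
use warnings;
use Encode;

# Hypothetical helper: return the available encoding names that match
# a case-insensitive pattern. ":all" includes encodings that are
# installed but not yet loaded.
sub encodings_matching {
    my ($pattern) = @_;
    return grep { /$pattern/i } Encode->encodings(':all');
}

# Print one candidate name per line.
print "$_\n" for encodings_matching('mac');
```

Any name printed here can be used in place of "whatever" in the `:encoding(...)` layer.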