Re: Problem converting euro from windows-1252 to UTF-8 !!



nevosa wrote:

I am trying to convert RFC 2047 encoded MIME headers to UTF-8. It is
working fine so far using the MIME::WordDecoder and Unicode::MapUTF8
CPAN packages. I have done some unit testing and it seems to work fine
except when I try to convert the windows-1252 encoded euro symbol (0x80)
to UTF-8.

Subject: This sub has non-ascii chars.Pound:
=?windows-1252?Q?Euro_=80_?=

In this case the conversion simply fails and what I see as output is two
spaces (0x20 0x20).

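Windows-1252 differs from "default" ISO-8859-1 by using displayable
characters rather than control characters in the 0x80-0x9F range. The
euro sign at 0x80 is one of them (U+20AC, which is E2 82 AC in UTF-8).
If you're running *nix, a Windows-1252 mapping might not be available
to your conversion module.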
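
To see what a correct conversion of that header should produce, here is a
minimal sketch in Python (not the Perl modules from the question, only an
independent cross-check). It assumes the encoded word is written with the
leading "=?" that RFC 2047 requires, and it uses only the standard library.

```python
from email.header import decode_header

# Encoded word from the question, written here with the leading "=?"
# that RFC 2047 requires (an assumption about the original header).
raw = "=?windows-1252?Q?Euro_=80_?="

# decode_header returns a list of (bytes, charset) pairs.
payload, charset = decode_header(raw)[0]

text = payload.decode(charset)   # raw bytes -> characters via Windows-1252
utf8 = text.encode("utf-8")      # characters -> UTF-8 bytes

# The same byte read as ISO-8859-1 is the C1 control U+0080, not the euro.
as_latin1 = b"\x80".decode("iso-8859-1")

print(ascii(text))               # ASCII-safe view of the decoded text
print(utf8.hex(" "))             # UTF-8 bytes in hex
print(hex(ord(as_latin1)))       # code point under ISO-8859-1
```

If your Perl output has anything other than E2 82 AC where the euro belongs,
the 0x80 byte was most likely run through an ISO-8859-1 table, or no table
at all, instead of Windows-1252.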

To make sure, see

http://en.wikipedia.org/wiki/Windows-1252

And then try to display the characters in the yellow squares. If
they're not correctly converted, you've found the culprit.
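
The same check can be scripted rather than done by eye. The sketch below is
again Python, as a reference to compare against your own converter's output;
the helper name cp1252_high_range is made up for this example. It walks the
0x80-0x9F range and reports what each byte maps to. Python's cp1252 codec
treats five of those bytes as unassigned, so they are listed separately.

```python
import unicodedata


def cp1252_high_range():
    """Return (byte, char_or_None) for every byte in 0x80-0x9F."""
    rows = []
    for byte in range(0x80, 0xA0):
        try:
            ch = bytes([byte]).decode("cp1252")
        except UnicodeDecodeError:
            # Python's cp1252 codec has no mapping for this byte.
            rows.append((byte, None))
        else:
            rows.append((byte, ch))
    return rows


for byte, ch in cp1252_high_range():
    if ch is None:
        print(f"0x{byte:02X}  (unassigned)")
    else:
        # Print code point, UTF-8 bytes and name; all output stays ASCII.
        utf8_hex = ch.encode("utf-8").hex(" ")
        print(f"0x{byte:02X}  U+{ord(ch):04X}  {utf8_hex}  {unicodedata.name(ch)}")
```

Feed the same 32 bytes through your Perl conversion and compare the UTF-8
column. Any row that comes back as a space or a two-byte C2 xx sequence was
not converted with a Windows-1252 table.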

Hope this helps,

--
Bart
