Re: encode UTF8 -> MIME



On 2007-05-30 00:00, cc96ai <calvin.chan.cch@xxxxxxxxx> wrote:
I got UTF8 value %C3%A9

That's not UTF-8. That's URL-encoded UTF-8.

how could I encode it become é ?

You have to *decode* it to get é. And since it is encoded twice, you have
to decode it twice.

First decode the URL-Encoding:

$s = "%C3%A9";

$s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;   # accept lowercase hex digits too

(the URI::Escape module, part of the URI distribution on CPAN, provides
uri_unescape() for this, but it's a simple one-liner anyway)

Now you have UTF-8, which you can decode to a "perl character string":

use Encode;
$s = decode('utf-8', $s);

Now you have a string with a single character "é".
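
Putting the two decoding steps together, a minimal sketch might look like
this. The helper name url_utf8_to_chars is made up for illustration, and
the sketch assumes the input really is URL-encoded UTF-8:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: undo both layers of encoding in order.
sub url_utf8_to_chars {
    my ($s) = @_;
    # Step 1: undo the URL encoding, which yields raw UTF-8 bytes.
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;
    # Step 2: decode the UTF-8 bytes into a perl character string.
    return decode('UTF-8', $s);
}

my $chars = url_utf8_to_chars('%C3%A9');

# One character, code point U+00E9 (é).
printf "length=%d codepoint=U+%04X\n", length($chars), ord($chars);
```

The order matters: decoding UTF-8 before removing the %XX escapes would
leave the escapes untouched, because they are plain ASCII.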

Now, how does MIME get into it?

For MIME, you again have to decide on a specific character encoding
(e.g., UTF-8, or ISO-8859-1, or whatever), and then possibly on a
specific transport encoding (base64 or quoted-printable).

So you have to encode it in your character encoding first, and then
possibly encode the result again with the transport encoding.
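
As a rough sketch of those two layers for a message *body*, assuming UTF-8
as the character encoding and using the MIME::Base64 and MIME::QuotedPrint
modules that ship with perl (the variable names are only illustrative):

```perl
use strict;
use warnings;
use Encode qw(encode);
use MIME::Base64 qw(encode_base64);
use MIME::QuotedPrint qw(encode_qp);

my $chars = "\x{E9}";                  # the character string "é"

# Layer 1: character encoding, characters -> bytes (here C3 A9).
my $bytes = encode('UTF-8', $chars);

# Layer 2: transport encoding, bytes -> 7-bit safe text.
# The empty second argument suppresses line endings in the output.
my $b64 = encode_base64($bytes, '');   # "w6k="
my $qp  = encode_qp($bytes, '');

print "base64:           $b64\n";
print "quoted-printable: $qp\n";
```

This only covers the encoding itself. A real message also needs matching
Content-Type and Content-Transfer-Encoding header fields, which is where
an existing MIME module earns its keep.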

Note that MIME is quite a complex format (especially the encoding of
header fields described in RFC 2047 and RFC 2231), so I won't go into
more detail unless you tell us exactly what you need. Any advice I can
give (except "use existing modules" and "read the RFCs") is almost
certainly incomplete and will cause you to produce ill-formed messages
if you follow it blindly.
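
To illustrate the "use existing modules" route for header fields: the
Encode distribution includes a 'MIME-Header' encoding that produces RFC
2047 encoded-words. This is only a sketch for a single unstructured value
such as a Subject text; the sample string is invented, and structured
headers (addresses, parameters per RFC 2231) need more care than this:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $subject = "Caf\x{E9}";             # character string "Café" (sample value)

# Encode the character string as an RFC 2047 encoded-word.
my $header = encode('MIME-Header', $subject);

# Decoding the encoded-word gives the character string back.
my $roundtrip = decode('MIME-Header', $header);

print "Subject: $header\n";
print "round trip ok\n" if $roundtrip eq $subject;
```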

hp


--
_ | Peter J. Holzer | I know I'd be respectful of a pirate
|_|_) | Sysadmin WSR | with an emu on his shoulder.
| | | hjp@xxxxxx |
__/ | http://www.hjp.at/ | -- Sam in "Freefall"