Re: encode UTF8 -> MIME



On 2007-05-30 00:00, cc96ai <calvin.chan.cch@xxxxxxxxx> wrote:
> I got UTF8 value %C3%A9

That's not UTF-8. That's URL-encoded UTF-8.

> how could I encode it become é ?

You have to *decode* it to get é. And since it is encoded twice, you have
to decode it twice.

First decode the URL-Encoding:

$s = "%C3%A9";

$s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;

(there are modules which provide a function to do that, e.g.
uri_unescape() in URI::Escape, which is part of the URI distribution;
but it's a simple one-liner anyway)
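For reference, the same one-liner wrapped in a small subroutine. This is only a sketch; the name `url_decode` is illustrative, not from any module. It accepts lowercase hex digits as well, and it deliberately leaves `+` alone, so it does not handle the form-encoding convention where `+` stands for a space:

```perl
use strict;
use warnings;

# Hypothetical helper: turn each %XX escape back into the byte it stands for.
# The result is a string of *bytes* (here: UTF-8), not yet characters.
sub url_decode {
    my ($s) = @_;
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;
    return $s;
}

my $bytes = url_decode("%C3%A9");
printf "%d bytes: %vX\n", length($bytes), $bytes;    # 2 bytes: C3.A9
```

Note that the result is still two bytes; it only becomes the single character é after the UTF-8 decoding step.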

Now you have UTF-8, which you can decode to a "perl character string":

use Encode;
$s = decode('utf-8', $s);

Now you have a string with a single character "é".
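Both decoding steps together, as a minimal sketch using only the core Encode module. The final line checks that the result really is one character, U+00E9, and not two bytes:

```perl
use strict;
use warnings;
use Encode qw(decode);

my $s = "%C3%A9";

# Step 1: undo the URL encoding -> UTF-8 bytes "\xC3\xA9"
$s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;

# Step 2: undo the UTF-8 encoding -> Perl character string "\x{E9}"
my $chars = decode('UTF-8', $s);

printf "length %d, U+%04X\n", length($chars), ord($chars);   # length 1, U+00E9
```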

Now, how does MIME get into it?

For MIME, you again have to decide on a specific character encoding
(e.g., UTF-8, or ISO-8859-1, or whatever), and then possibly on a
specific transport encoding (base64 or quoted-printable).

So you have to encode it in your character encoding first, and then
possibly encode the result again with the transport encoding.
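A sketch of those two layers for a message body, using the core modules Encode, MIME::Base64 and MIME::QuotedPrint. UTF-8 and ISO-8859-1 are only example choices, and the result still has to be labelled with a matching charset parameter in Content-Type and a matching Content-Transfer-Encoding header, which this sketch does not do:

```perl
use strict;
use warnings;
use Encode qw(encode);
use MIME::Base64 qw(encode_base64);
use MIME::QuotedPrint qw(encode_qp);

my $text = "\x{E9}";    # the character string "é" from the decoding step

# Layer 1: character encoding (characters -> bytes)
my $utf8   = encode('UTF-8',      $text);   # "\xC3\xA9"
my $latin1 = encode('ISO-8859-1', $text);   # "\xE9"

# Layer 2: transport encoding (bytes -> 7-bit-safe ASCII)
my $utf8_b64   = encode_base64($utf8,   '');   # "w6k=" (no trailing newline)
my $latin1_b64 = encode_base64($latin1, '');   # "6Q=="
my $utf8_qp    = encode_qp($utf8);             # quoted-printable, starts with "=C3=A9"

print "base64 (UTF-8):      $utf8_b64\n";
print "base64 (ISO-8859-1): $latin1_b64\n";
print "quoted-printable:    $utf8_qp\n";
```

The same character gives different bytes, and therefore different base64 text, depending on the character encoding chosen in the first layer.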

Note that MIME is quite a complex format (especially the encoding of
header fields described in RFC 2047 and RFC 2231), so I won't go into
more detail unless you tell us exactly what you need. Any advice I can
give (except "use existing modules" and "read the RFCs") is almost
certainly incomplete and will cause you to produce ill-formed messages
if you follow it blindly.
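In the spirit of "use existing modules": if all you need is a non-ASCII header such as a Subject line, Encode ships a 'MIME-Header' encoding (Encode::MIME::Header) that produces RFC 2047 encoded-words. A minimal sketch, assuming UTF-8 with base64 ("B") encoded-words is acceptable for your use; it does not cover RFC 2231 parameters such as attachment filenames:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $subject = "Caf\x{E9}";   # a character string, as produced by decode()

# 'MIME-Header' emits RFC 2047 encoded-words using UTF-8 and base64,
# i.e. something of the form =?UTF-8?B?...?=
my $header = encode('MIME-Header', $subject);
print "Subject: $header\n";

# Decoding the encoded-word gives the original character string back.
my $round_trip = decode('MIME-Header', $header);
print $round_trip eq $subject ? "round trip ok\n" : "mismatch\n";
```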

hp


--
_ | Peter J. Holzer | I know I'd be respectful of a pirate
|_|_) | Sysadmin WSR | with an emu on his shoulder.
| | | hjp@xxxxxx |
__/ | http://www.hjp.at/ | -- Sam in "Freefall"


