Re: Is there a better way to convert foreign characters?



Dr.Ruud wrote:

perl -Mstrict -Mutf8 -MText::Unidecode -wle '
my $s = "àâÀéèëêÉÊçÇîïôÔùû";
print Text::Unidecode::unidecode( $s );
'
aaAeeeeEEcCiioOuu

The purpose of that module is to handle non-Roman characters. What makes you believe those characters are Unicode?

$ perl -MEncode -le '
$octets = "àâÀéèëêÉÊçÇîïôÔùû";
print "Raw: ", $octets;
print "Latin-1: ", decode "ISO-8859-1", $octets;
print "ANSI: ", decode "Windows-1252", $octets;
'
Raw: àâÀéèëêÉÊçÇîïôÔùû
Latin-1: àâÀéèëêÉÊçÇîïôÔùû
ANSI: àâÀéèëêÉÊçÇîïôÔùû
$

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
.



Relevant Pages

  • Re: Unicode Support
    ... >> (I know this is a poor example, but think about other languages, eg ... First things first, when you register your RosAsm windows classes, you ... the messages with ANSI / UNICODE parameters in ANSI or UNICODE form... ... with their alphabet characters, as with the numbers and punctuation...so, ...
    (alt.lang.asm)
  • =?windows-1252?Q?Re=3A_Encrypting_Unicode_=96_Using_ASCII_as_a_Surrogat?= =?windows-1252?Q?e
    ... characters of an exotic eastern language using an ASCII keyboard. ... communicate in large volume with China or Japan using CJK from Unicode ... present the message text to Alice as a string of hexadecimal numbers ... by the computer as an external file and enciphered by a stream cipher ...
    (sci.crypt)
  • =?windows-1252?Q?Encrypting_Unicode_=96_Using_ASCII_as_a_Surrogate_Al?= =?windows-1252?Q?pha
    ... characters of an exotic eastern language using an ASCII keyboard. ... It is true to say that any keyboard of any language can be simulated ... communicate in large volume with China or Japan using CJK from Unicode ... by the computer as an external file and enciphered by a stream cipher ...
    (sci.crypt)
  • =?windows-1252?Q?Re=3A_Encrypting_Unicode_=96_Using_ASCII_as_a_Surrogat?= =?windows-1252?Q?e
    ... characters of an exotic eastern language using an ASCII keyboard. ... communicate in large volume with China or Japan using CJK from Unicode ... present the message text to Alice as a string of hexadecimal numbers ... by the computer as an external file and enciphered by a stream cipher ...
    (sci.crypt)
  • Re: heeeeeeeeeeeeeeeellllllllllllllppppppppppppppppppppp
    ... This means that if you develop the bad habit of using char * (left over ... It usually takes me five minutes to create a Unicode version of any of my apps, ... BOOL and bool are different data types. ... can be up to MAX_PATH characters). ...
    (microsoft.public.vc.mfc)