Re: Print Spanish characters in Perl?



On Fri, 20 Jun 2008 04:37:34 +0000, Jürgen Exner wrote:

> DanB <dbxxxxxxx@xxxxxxxxx> wrote:
>
>> I am trying to build a set of Spanish flash cards using Tk, and I
>> need to be able to display the accented characters. I know that I
>> need to specify them in some unicode besides utf-8,
>
> Actually, you don't. Just put them into your code in your favourite
> editor and treat them like any ASCII character.

I suggest *not* doing that. Instead, put

use utf8;

at the top of your script and make sure that the file is saved in UTF-8.
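To make the point about `use utf8` concrete, here is a small sketch of my own (not something from the thread) showing the difference between characters and bytes. It assumes the script file itself is saved as UTF-8; the variable names are just for illustration.

```perl
use strict;
use warnings;
use utf8;                 # the source file is saved as UTF-8
use Encode qw(encode);

# With `use utf8`, this literal is decoded into 3 characters: a, ñ, o.
my $word = "año";

my $chars = length $word;                    # counts characters
my $bytes = length encode('UTF-8', $word);   # counts bytes in the UTF-8 form

print "characters: $chars\n";
print "UTF-8 bytes: $bytes\n";
```

This should report 3 characters and 4 UTF-8 bytes. Remove the `use utf8;` line and `$chars` comes out as 4 as well, because Perl then treats the two UTF-8 bytes of "ñ" as two separate Latin-1 characters.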

> Problems arise only if your editor saves the file in a different
> encoding than your display device expects.

You are just playing Russian Roulette with the encodings. Playing
Russian Roulette might seem like a safe hobby: after all, there was no
bullet last time you pulled the trigger, so this time you'll be safe
too. In the same way, you can cross your fingers and hope the
encodings match.

> A typical example is saving as UTF-8, then including the text in an
> HTML page but forgetting to specify UTF-8 as the charset.

To avoid this kind of problem, make sure that all the non-ASCII
characters in your script are decoded into Perl's internal character
strings with

use utf8;

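and always specify the output encoding you want, for example

binmode STDOUT, ":encoding(UTF-8)";    # a UTF-8 terminal

or

binmode STDOUT, ":encoding(cp850)";    # e.g. a DOS window using code page 850

or, for a file,

open my $file, ">:encoding(iso-8859-1)", "filename"
    or die "Cannot open filename: $!";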
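If you want to see what those layers actually do to the bytes, the following sketch pushes the same two characters ("ñ¿", written as escapes so the script doesn't depend on its own encoding) through three `:encoding(...)` layers. Writing to an in-memory filehandle (a reference to a scalar instead of a file name) is only a convenience for the demonstration, and `bytes_through_layer` is a made-up helper name; in a real program you would open an actual file as above.

```perl
use strict;
use warnings;

# "ñ¿" as a Perl character string.
my $text = "\x{f1}\x{bf}";

# Hypothetical helper: encode $string through the given PerlIO
# encoding layer and return the raw bytes that were written.
sub bytes_through_layer {
    my ($layer, $string) = @_;
    my $buffer = '';
    open my $fh, ">:encoding($layer)", \$buffer
        or die "Cannot open in-memory file with layer $layer: $!";
    print {$fh} $string;
    close $fh or die "Cannot close in-memory file: $!";
    return $buffer;
}

for my $layer ('UTF-8', 'iso-8859-1', 'cp850') {
    my $bytes = bytes_through_layer($layer, $text);
    printf "%-10s %s\n", $layer,
        join ' ', map { sprintf('%02X', ord($_)) } split //, $bytes;
}
```

The output should show the two-byte UTF-8 sequences C3 B1 and C2 BF, the single Latin-1 bytes F1 and BF, and the code page 850 bytes A4 and A8: the same characters, three different byte sequences.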

If you need to pass a string to some kind of module which doesn't
understand Perl's Unicode strings and expects bytes in a particular
encoding (there are lots of these), then you can encode it into
whatever the module wants with

use Encode qw(encode decode);

my $bytes = encode("cp850", $string);

Similarly,

$string = decode("cp850", $bytes);

goes the other way, turning bytes back into Perl characters.
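Here is a slightly fuller sketch of the `encode`/`decode` pair. Note that `encode` returns a new byte string rather than changing `$string` in place, and characters with no equivalent in the target encoding are replaced by a substitution character unless you pass a CHECK argument. The sample word is arbitrary.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $string = "\x{d1}and\x{fa}";    # "Ñandú" as a Perl character string

# encode: characters -> bytes in the target encoding (returns a new string)
my $bytes = encode('cp850', $string);

# decode: bytes in a known encoding -> characters
my $back = decode('cp850', $bytes);

printf "%d characters, %d bytes, round trip %s\n",
    length $string, length $bytes, ($back eq $string ? 'ok' : 'FAILED');
```

In code page 850 "Ñ" is the single byte A5 and "ú" is A3, so five characters become five bytes and decode back to the original string.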

I recommend keeping everything in the Perl code that is under your
control in UTF-8, and not using anything else. Always put

use utf8;

at the top of the script. If your editor accidentally saves the file
in a non-UTF-8 format, then when you try to compile your Perl script
you'll get lots of messages like

Malformed UTF-8 character (unexpected non-continuation byte 0xd1,
immediately after start byte 0xf1) at ./encodings.pl line 8.
Malformed UTF-8 character (unexpected non-continuation byte 0xdc,
immediately after start byte 0xd1) at ./encodings.pl line 8.
Malformed UTF-8 character (unexpected non-continuation byte 0xfc,
immediately after start byte 0xdc) at ./encodings.pl line 8.
Malformed UTF-8 character (5 bytes, need 6, after start byte 0xfc) at ./encodings.pl line 8.

So

use utf8;

means that Perl can protect you against accidents with text editors.

Here is a small sample script to test it out with:

#!/usr/local/bin/perl
use strict;
use utf8;                            # this file is saved as UTF-8
binmode STDOUT,":encoding(cp850)";   # output for a DOS window using code page 850
my $spanish = "ñÑÜü¿¡«»\n";
print $spanish;

You don't even need to "use warnings;" to get the error messages.
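`use utf8` only guards the source of the script. For text that arrives at run time, such as a file someone else saved, a similar safety net (my suggestion, not something discussed above) is to decode it with `Encode::FB_CROAK`, which dies on malformed input instead of quietly substituting. A minimal sketch; `is_valid_utf8` is a hypothetical helper name, and the sample byte strings stand in for data read from a file.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: returns 1 if $bytes is well-formed UTF-8, else 0.
sub is_valid_utf8 {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode with a CHECK flag may modify its input
    return eval { decode('UTF-8', $copy, Encode::FB_CROAK()); 1 } ? 1 : 0;
}

my $utf8_bytes   = "a\xC3\xB1o";    # "año" saved as UTF-8
my $latin1_bytes = "a\xF1o";        # "año" saved as ISO-8859-1

print "UTF-8 input:      ", (is_valid_utf8($utf8_bytes)   ? "ok" : "malformed"), "\n";
print "ISO-8859-1 input: ", (is_valid_utf8($latin1_bytes) ? "ok" : "malformed"), "\n";
```

The Latin-1 version is rejected because the byte F1 announces a multi-byte UTF-8 sequence but is followed by an ordinary ASCII "o".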

> In this case the browser defaults to ISO-Latin-1 and the non-ASCII
> characters will be messed up, of course. Another example is saving
> the file as Windows-1252 (or ISO-Latin-1) and then viewing the
> output in a DOS window, which for Western languages uses OEM code
> page 850.

The attitude of a lot of people towards encodings seems to be to just
ignore the problem and hope it will go away. That's OK as long as your
luck holds. If you are lucky, the encoding you used for the script may
happen to be the same one that the web browser (or whatever else reads
your output) expects. If you are unlucky, they won't be the same. Then
you'll get strange bugs, and you won't know why.

So I suggest that unless you only use ASCII, you should get encodings
under close control. Specify the encoding of your Perl script with

use utf8;

and then specify exactly how you want to encode input and output at each
point. Then there won't be so many unexpected things waiting for you next
time something goes wrong with your code.
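As a sketch of "specify the encoding at each point", here is a filter that reads ISO-8859-1 input and writes UTF-8 output, with the conversion done entirely by the two `:encoding(...)` layers. The sample input and the in-memory filehandles are assumptions made so the example is self-contained; with real files you would pass file names to `open` instead.

```perl
use strict;
use warnings;

# Hypothetical input: a Latin-1 file containing "¿Qué?" (bytes as escapes).
my $latin1_input = "\xBFQu\xE9?\n";
my $utf8_output  = '';

# Decode on the way in: ISO-8859-1 bytes -> Perl characters.
open my $in, '<:encoding(iso-8859-1)', \$latin1_input
    or die "Cannot open input: $!";

# Encode on the way out: Perl characters -> UTF-8 bytes.
open my $out, '>:encoding(UTF-8)', \$utf8_output
    or die "Cannot open output: $!";

while (my $line = <$in>) {
    print {$out} $line;    # inside the loop, $line holds characters, not bytes
}
close $in;
close $out or die "Cannot close output: $!";

printf "%d bytes in, %d bytes out\n", length $latin1_input, length $utf8_output;
```

The six input bytes become eight output bytes, since "¿" and "é" each take two bytes in UTF-8. The loop body never has to think about encodings at all.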



