HTML in utf8 and perl

From: Pawel Niewiadomski (niewiap_at_NOSPAM.widzew.net.INVALID)
Date: 02/28/04


Date: Sat, 28 Feb 2004 13:02:24 +0000 (UTC)

I have been looking all over for an answer to this and haven't found a
satisfactory one. Please tell me what's going on. I want to write a Perl
script that generates an HTML page encoded in UTF-8. I was wondering why the
following code

#!/usr/bin/perl
binmode (STDOUT, ":utf8");
use charnames ':full';
printf "\N{CYRILLIC SMALL LETTER EF}\n";
printf "\x{d184}\n";

produces two differently encoded characters, although in theory it
should generate two identically encoded Russian ef's. The first character
displays normally in a browser (provided I set the encoding to UTF-8) and
the second does not. Moreover, the second character is encoded in three
bytes rather than the two I would expect. Changing :utf8 to :raw in the
second line only produces additional "Wide character in print at..."
warnings but doesn't change the general output. Writing printf "\xd1\x84\n"
would be a solution, but I am wondering what the problem is with
"\x{d184}". If what I am asking has an obvious answer, please be so kind as
to refer me to a sensible source of information.
Thanks very much in advance,
Pawel
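
For reference, here is a minimal sketch of what the two escapes denote. It assumes the core Encode module (Perl 5.8 or later); the helper name utf8_hex is made up for illustration. The point is that "\x{d184}" names the code point U+D184 rather than the byte pair D1 84, so it gets its own UTF-8 encoding, while the Cyrillic ef is code point U+0444:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: return the UTF-8 bytes of one code point
# as space-separated lowercase hex pairs.
sub utf8_hex {
    my ($codepoint) = @_;
    my $bytes = encode('UTF-8', chr($codepoint));
    return join ' ', unpack('(H2)*', $bytes);
}

# CYRILLIC SMALL LETTER EF is code point U+0444.
printf "U+0444 -> %s\n", utf8_hex(0x0444);   # prints: U+0444 -> d1 84

# "\x{d184}" is the single code point U+D184, not the bytes D1 84.
printf "U+D184 -> %s\n", utf8_hex(0xD184);   # prints: U+D184 -> ed 86 84
```

Under these assumptions, writing "\x{444}" instead of "\x{d184}" in the original script should produce the same two bytes as the \N{...} form when STDOUT has the :utf8 layer.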
