Re: CGI.pm: encoding problems



Ben Bullock wrote:
I have a problem with inputting UTF-8 via a text area using CGI.pm. This problem concerns UTF-8, so apologies for posting something with Chinese characters in it.

The following code is a minimal working example of the problem, with a lot of extraneous material removed. It needs to be run under a web server to see the problem. When the form is submitted, the default text of Chinese characters (the numbers from one to four) is munged into gibberish, and the test that checks whether the input consists of valid Chinese numerals fails:

Input text:

一二三四

Output of program:

Input 一二三四 was not a valid number

Thank you very much for any assistance, suggestions or advice about this problem.

Begin script (to end of message)

#!/usr/bin/perl
use warnings;
use strict;
use CGI;
use utf8;
binmode (STDOUT, ":utf8");
my $query = CGI->new();
$query->charset('UTF-8');
print $query->header();
my $kanji;
if ($query->param('kanji')) {
    my $inputnumber = $query->param('kanji');
    if ($inputnumber =~ /^([一二三四五六七八九十]+)$/) {
        $kanji = $1;
    } else {
        print "<p>Input $inputnumber was not a valid number</p>";
        $kanji = "";
    }
} else {
    $kanji = "一二三四";
}
print $query->start_form(-method => 'POST',-action => $query->url());
print $query->textarea(-name => 'kanji',
                       -default => $kanji);
print $query->submit();
print $query->endform();
print "<table><tr>\n<th>Value</th><td>",
    $kanji, "</td></tr>\n", "</table>\n</form>\n<p>\n";
print $query->end_html();


I made a few changes to your program. I don't know exactly what the problem is, but I hope that this sheds some light on it:

#!/usr/bin/perl
use warnings;
use strict;
use CGI;
use utf8;
use Encode (); # changed
binmode (STDOUT, ":utf8");
my $query = CGI->new();
$query->charset('UTF-8');
print $query->header('-cache-control' => 'no-cache'); # changed

my $kanji;
if ($query->param('kanji')) {
    my $inputnumber = $query->param('kanji');

    print <<EOF;
<p> Interesting decodings of
&quot;$inputnumber&quot; <br>
UTF-8: @{[ Encode::decode('utf8', $inputnumber) ]} <br>
</p>
<hr>

EOF

    # Added: decode the submitted UTF-8 bytes into Perl characters
    # before matching them against the character class below.
    $inputnumber = Encode::decode('utf8', $inputnumber);

    if ($inputnumber =~ /^([一二三四五六七八九十]+)$/) {
        $kanji = $1;
    } else {
        print "<p>Input $inputnumber was not a valid number</p>";
        $kanji = "";
    }
} else {
    $kanji = "一二三四";
}

print <<EOF;
<p> The value of \$kanji is: $kanji
</p>

EOF

print $query->start_form(
    -method => 'POST',
    -action => $query->url()
);
print $query->textarea(-name => 'kanji',
                       -default => $kanji);

print <<EOF;
<textarea name=alternate>
DATA = $kanji
</textarea>
EOF

print $query->submit();
print $query->endform();
print "<table><tr>\n<th>Value</th><td>",
    $kanji, "</td></tr>\n", "</table>\n</form>\n<p>\n";
print $query->end_html();
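
To see why the decode step seems to matter, here is a small standalone sketch that runs without a web server. It is only an illustration, not part of the script above. It assumes the submitted value reaches the script as raw UTF-8 octets, which is my understanding of what param() returns when nothing asks CGI.pm to decode. The helper name is_kanji_number is made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;          # literals in this file are character strings
use Encode ();

binmode(STDOUT, ':encoding(UTF-8)');

# Hypothetical helper (not from the original script): returns 1 if the
# string consists only of the kanji digits one to ten, otherwise 0.
sub is_kanji_number {
    my ($text) = @_;
    return $text =~ /^[一二三四五六七八九十]+$/ ? 1 : 0;
}

# Assumption: an undecoded form value looks like this, i.e. the UTF-8
# octets of the four characters held in a byte string.
my $raw_bytes = Encode::encode('UTF-8', '一二三四');

# Decoding turns the octets back into the four characters.
my $decoded = Encode::decode('UTF-8', $raw_bytes);

printf "raw:     length %d, valid: %s\n",
    length($raw_bytes), is_kanji_number($raw_bytes) ? 'yes' : 'no';
printf "decoded: length %d, valid: %s\n",
    length($decoded), is_kanji_number($decoded) ? 'yes' : 'no';
```

The raw string has length 12 and fails the test, while the decoded string has length 4 and passes. Under "use utf8" the character class contains characters, so a string of octets cannot match it. Printing the octets through a ":utf8" layer encodes each one a second time, which would explain the gibberish. Newer CGI.pm releases also document a -utf8 import pragma (use CGI qw(-utf8);) that decodes parameters for you, but check the documentation for your installed version before relying on it.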


