Big unicode problem with Perl 5.8.8 with MySQL 5.0 (i.e. Debian 4.0)



I just upgraded from Debian 3 (Perl 5.8.4 and MySQL 4.1) to Debian 4 (Perl 5.8.8 and MySQL 5.0 and DBD::mysql 3.0008) and have the following problem:

Below I create a table with a varchar field of length 4 and then try, using a Perl script, to populate it with a string of 4 Unicode characters. Only the first 2 characters end up stored, and in a "double-encoded" form (thus taking up the space of 4 characters). Needless to say, this is a huge problem.


First the table:

mysql> create table bbb (a int primary key auto_increment, b varchar(4));
Query OK, 0 rows affected (0.00 sec)


Then the perl script to populate the field:

#!/usr/bin/perl -w
use DBI;
my $dbh = DBI->connect("DBI:mysql:aaa", 'username', 'password', { RaiseError => 1 });
$dbh->do("insert into bbb set b = 'Αθήν'");


And then checking the result:

mysql> select * from bbb;
+---+------------+
| a | b          |
+---+------------+
| 1 | ΡÆÎ¸       |
+---+------------+
1 row in set (0.00 sec)


That was with default_character_set=utf8 under the [mysql] section of my.cnf.

Commenting out that line and viewing the table again, we get:

mysql> select *, char_length(b) from bbb;
+---+------+----------------+
| a | b    | char_length(b) |
+---+------+----------------+
| 1 | Αθ   |              4 |
+---+------+----------------+
1 row in set (0.00 sec)


I.e. only the first two letters made it into the table, but double-encoded so that they take up the space of 4 characters.
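
The arithmetic behind "double-encoded" can be reproduced without a database. The following is only a minimal sketch using the core Encode module. It assumes that the UTF-8 bytes of the string are being misread as Latin-1 characters somewhere between the script and the table. That assumption is not confirmed by anything above, and the variable names are made up for illustration; this is not the actual DBD::mysql code path.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode decode);

# The four characters from the insert statement, as a Perl character string.
my $text = "\x{0391}\x{03B8}\x{03AE}\x{03BD}";   # Α θ ή ν

# Correct UTF-8: each of these Greek letters takes 2 bytes, 8 bytes in total.
my $utf8_bytes = encode('UTF-8', $text);

# ASSUMPTION: the bytes are treated as Latin-1 characters on the way in,
# so the 8 bytes turn into 8 "characters".
my $misread = decode('ISO-8859-1', $utf8_bytes);

# A varchar(4) column keeps only the first 4 of those characters.
my $stored = substr($misread, 0, 4);

# Undo the misreading: 4 Latin-1 characters -> 4 bytes -> 2 Greek letters.
my $recovered = decode('UTF-8', encode('ISO-8859-1', $stored));

binmode(STDOUT, ':encoding(UTF-8)');
printf "utf8 bytes:    %d\n", length($utf8_bytes);
printf "misread chars: %d\n", length($misread);
printf "stored chars:  %d\n", length($stored);
printf "recovered:     %s (%d chars)\n", $recovered, length($recovered);
```

Under that assumption the script reports 8 bytes, 8 misread characters, 4 stored characters, and a recovered string of 2 characters (Αθ), which matches a char_length(b) of 4 while only the first two letters are visible.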

I'm desperate for a solution or a hint. If you run Debian, please try these short scripts on your machine and tell me whether you get the same results (or better ones).

Thanks.

P.S. I'm 99.9% positive that the problem is not in my terminal's encoding: I uploaded the Perl script from another machine (one that's known to have no problem) and also tried inserting a 'use encoding "utf8";' pragma.

And thanks again.