Encoding



Hi,
I wonder if anyone can help me.

I am using
- Debian Linux
- Perl 5.8.8
- DBI (I do not know how to figure out its version)
- MySQL 5.0.22-Debian_3-log
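
For reference, a module's version can usually be read through the standard VERSION method. The sketch below assumes nothing about the installation: the helper name module_version is made up for illustration, and a module that cannot be loaded is simply reported as not installed.

```perl
use strict;
use warnings;

# Hypothetical helper: return the version of a module, or undef if the
# module cannot be loaded on this machine.
sub module_version {
    my ($module) = @_;
    return undef unless $module =~ /\A\w+(?:::\w+)*\z/;
    return eval {
        (my $file = "$module.pm") =~ s{::}{/}g;   # Foo::Bar -> Foo/Bar.pm
        require $file;
        $module->VERSION;                         # same value as $Foo::Bar::VERSION
    };
}

for my $module (qw(DBI DBD::mysql)) {
    my $version = module_version($module);
    print "$module: ", (defined $version ? $version : "not installed"), "\n";
}
```

The same information is available from the shell with perl -MDBI -e 'print "$DBI::VERSION\n"'.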

I want to store and handle data in UTF-8 but so far I have not been able to force Perl/DBI to do so.

I have created a table using

my $sql = "CREATE TABLE $tbl (".join(", ", @columns).") CHARACTER SET utf8 COLLATE utf8_czech_ci;";
$dbh->do($sql);

I populated the table with data using

my $list_of_columns = join(", ", @names);
my $list_of_values = join(", ", map{"_utf8'$record->{$_}'"}(@names));
my $sql = "INSERT INTO $tbl ($list_of_columns) VALUES ($list_of_values);";
$dbh->do($sql);
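
As an aside, interpolating values straight into the SQL breaks as soon as a value contains an apostrophe. A minimal sketch of the same INSERT with DBI placeholders follows; build_insert is a hypothetical helper, and the sketch assumes the _utf8 introducer is dropped, so the character set of the bound values follows the connection character set.

```perl
use strict;
use warnings;

# Hypothetical helper: build an INSERT statement with one "?" placeholder
# per column, so that DBI quotes the values itself.
sub build_insert {
    my ($tbl, @names) = @_;
    my $list_of_columns = join(", ", @names);
    my $placeholders    = join(", ", ("?") x @names);
    return "INSERT INTO $tbl ($list_of_columns) VALUES ($placeholders)";
}

# Usage with the variables from the code above:
#   $dbh->do(build_insert($tbl, @names), undef, @{$record}{@names});
print build_insert("hry", "kod", "nazev"), "\n";
```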

I have looked into the database using phpMyAdmin 2.8.2-Debian-0.1 and it really looked like the data were stored in correct UTF-8.

However, when I retrieve the data from Perl/DBI, something in the chain (MySQL? the driver? DBI?) decides that another encoding (probably Latin1) would be better for me. It "converts" the strings from UTF-8 to that encoding, which means that by the time the data arrives in my Perl code, all the non-Latin1 characters have already been irrecoverably converted to question marks. I would be happy to decode the data myself, but there is nothing I can do with the question marks.

I am using the following code to retrieve the data:

my $sql = "SELECT kod, hry.nazev FROM hry INNER JOIN prodej ON hry.kod = prodej.kod_hry GROUP BY kod, hry.nazev";
my $sqlobj = $dbh->prepare($sql);
$sqlobj->execute();
while(my ($kod, $nazev) = $sqlobj->fetchrow_array())
{
...
}

So far, the only workaround I have is not to tell the database that the data is UTF-8 when I insert it (i.e., to drop the "_utf8" part before the single quote), and to use Encode; decode("utf8", ...) on anything I fetch from the database. This way, the database never knows the data was UTF-8 text, treats the bytes as Latin1 characters and returns them undisturbed. However, I cannot read the data in phpMyAdmin (unless I en/decode UTF-8 in my head), the string lengths computed by the database do not match the real character counts, etc.
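
The decoding half of this workaround, and the length mismatch it causes, can be illustrated without a database. This is only a sketch: the sample word "hříčka" is an invented value, written out as the raw UTF-8 bytes that a fetch would return.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Raw UTF-8 bytes of the sample word "hříčka", as they would come back
# from the database when it believes it is storing Latin1.
my $bytes = "h\xC5\x99\xC3\xAD\xC4\x8Dka";

# Turn the byte string into a Perl character string.
my $text = decode("utf8", $bytes);

# The database counts 9 "characters" (bytes); the word really has 6.
printf "bytes: %d, characters: %d\n", length($bytes), length($text);
```

The undecoded string has length 9 and the decoded one length 6, which is the discrepancy a Latin1-minded database sees in functions such as CHAR_LENGTH.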

Is there a better way to do it? I think there must be some small stupid locale-like setting telling the machine that I am a UTF guy. But the settings I was able to come up with did not help and I actually have no idea which part of the MySQL-driver-DBI-Perl chain is responsible.
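
One candidate for such a setting is the character set of the connection itself, which MySQL lets a client change with the SET NAMES statement. The sketch below assumes that the question marks come from the server converting results to a latin1 connection character set, which has not been verified on this setup. The helper name set_connection_utf8 is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of DBI): call once, right after DBI->connect.
# SET NAMES tells the MySQL server which character set the client sends
# and which one it expects results in, for this connection only.
sub set_connection_utf8 {
    my ($dbh) = @_;
    $dbh->do('SET NAMES utf8')
        or die "SET NAMES failed: " . $dbh->errstr;
    return $dbh;
}

# Usage, with connection details left to the caller:
#   my $dbh = set_connection_utf8(DBI->connect($dsn, $user, $password));
```

Even if this helps, the driver may still hand back byte strings rather than character strings, so decode("utf8", ...) on fetched values may remain necessary.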

Any hints are welcome.
Thanks

Dan