Re: DBD::mysql and UTF-8



[ YorHel ]

[ on UTF-8 problems with DBD::mysql ]

> Hate to reply to myself, but after some more googling, I found that
> when I call
> $dbh->do("SET NAMES 'utf8'");
> right after DBI->connect(), I can put the real UTF-8 data in the
> DB, so phpMyAdmin and the mysql CLI both show the real UTF-8 output
> instead of those weird characters. With this, it doesn't matter whether
> I call decode() on $somevar before sending the UPDATE query; the
> data will still be inserted as UTF-8.
> But then again, I need to set the utf8 flag on $result with decode()
> to get well-formed data, which sounds like a hack to me. Am I not
> supposed to get UTF-8 data back when I call $obj->fetchrow_array()?
> And the "SET NAMES 'utf8'" call also seems like a hack to
> me. Why do I need it when the database I am using is already
> defined as "UTF-8"?

I have experienced the very same problem. The reason, as far as I
could determine, is that MySQL has a 'client character set' and a
'connection character set', both of which are set per client
connection. Since DBD::mysql doesn't read the .my.cnf file by default,
the server defaults are used, which are latin1 (with Swedish
collation). One would think (and hope) that MySQL would use the
database charset, the table charset or even the default charset, but
it doesn't seem to do so (I was running 4.1.12 when I tested this).

See <URL: http://dev.mysql.com/doc/mysql/en/charset-connection.html>
for further details (it doesn't cover Perl specifics, though).
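
To see what a given connection actually uses, something along these
lines might help. It is only a sketch: set_names_utf8() is a name I
made up, and the DSN and credentials are taken from the standard DBI
environment variables (DBI_DSN, DBI_USER, DBI_PASS), which you would
have to set yourself. It issues the SET NAMES statement and then asks
the server for its character_set_* variables, so you can check that
the client, connection and results charsets have changed.

```perl
use strict;
use warnings;

# Hypothetical helper: switch the session to UTF-8, then return the
# server's character_set_* variables as a hash reference.
sub set_names_utf8 {
    my ($dbh) = @_;
    $dbh->do("SET NAMES 'utf8'");
    my $rows = $dbh->selectall_arrayref("SHOW VARIABLES LIKE 'character_set%'");
    my %vars = map { ($_->[0], $_->[1]) } @$rows;
    return \%vars;
}

# Only talks to a server when DBI_DSN is set, for example
#   DBI_DSN='DBI:mysql:database=test;host=localhost'
if ($ENV{DBI_DSN}) {
    require DBI;
    my $dbh = DBI->connect($ENV{DBI_DSN}, $ENV{DBI_USER}, $ENV{DBI_PASS},
                           { RaiseError => 1 });
    my $vars = set_names_utf8($dbh);
    printf "%-28s %s\n", $_, $vars->{$_} for sort keys %$vars;
    $dbh->disconnect;
}
```

Running it once with the do() line commented out and once with it in
place should show the difference between the defaults and the UTF-8
session.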

I also resorted to "SET NAMES 'utf8'", which solved the issue. Luckily,
I didn't have to use encode() and decode(), since the data was already
UTF-8 on the way into the db and was to be printed as UTF-8 as well.
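
If you do need Perl to treat the fetched data as characters rather
than as raw bytes, decode() is the step that makes the difference. A
small illustration using only the core Encode module, with a literal
byte string standing in (as an assumption) for what fetchrow_array()
would hand back:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# The UTF-8 bytes for the French word "été", standing in for a value
# fetched from the database (assumption: the driver returns raw bytes).
my $bytes = "\xC3\xA9t\xC3\xA9";

# Undecoded, Perl counts bytes.
my $byte_len = length $bytes;

# decode() turns the bytes into a character string.
my $chars    = decode('UTF-8', $bytes);
my $char_len = length $chars;

# encode() goes the other way, e.g. before writing to a raw filehandle.
my $round_trip = encode('UTF-8', $chars);

print "bytes: $byte_len, characters: $char_len\n";
print "round trip ok\n" if $round_trip eq $bytes;
```

When the data only passes through, as in my case, the bytes come out
exactly as they went in and none of this is needed.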

--
Knut
Matchbox cars and soda cans