unicode (utf8) DBI and MySQL

From: Angie Ahl (angie.ahl_at_gmail.com)
Date: 11/29/04

    Date: Mon, 29 Nov 2004 21:41:17 +0000
    To: dbi-users@perl.org
    
    

    Hi List

    I've been using DBI & MySQL for some time now and have decided to try
    and use unicode so that my web apps can be multilingual.

    I'm trying to work out how to get data into and out of MySQL as UTF-8.

    I've got a hash in the following format:

    my %uni = (
            hebrew_alef => {
                            character => chr(0x05d0),
                            language => "hebrew",
            },
            recenu => {
                            character => "re\x{e7}enu",
                            language => "french",
            },
    );

    and I'm inserting the values into the database like this:

    my $funny = "CONVERT(_utf8'$ucode' USING utf8)";
    my $sql = qq {INSERT INTO unitest (id, aword) VALUES ( "$id", $funny )};

    Is this CONVERT business necessary, or the right way to do it?

    Getting the data back is done like this:

    sub dbget {
            my $id = shift;
            my $sql = "select aword from unitest where id = \"$id\"";
            my $cur = $dbh->prepare($sql);
            $cur->execute;
            my $char = ($cur->fetchrow)[0];
            return decode("utf8", $char);
    }

    Running is_utf8( &dbget($_)) ? "is unicode" : "is not unicode";
    indicates that I am getting utf8 data back when I use decode, but
    here's what's wrong.

    The following code is used to output the data:

    foreach (sort keys %uni) {
            &dbpush( $_, $uni{$_}->{character} );
            printf $tablinedef,
                    $_,
                    $uni{$_}->{language},
                    $uni{$_}->{character},
                    &dbget( $_ ),
                    is_utf8( &dbget($_)) ? "is unicode" : "is not unicode";
    }

    $uni{$_}->{character} shows the character as expected in Firefox, e.g.: abīmer

    but the value from &dbget doesn't display correctly (apparently it
    should), even though the is_utf8 test says it is utf8.
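
    One thing that might be worth checking here (a sketch under
    assumptions, not a diagnosis): is_utf8 only reports Perl's internal
    flag, so it can be true for a string that is still wrong. The snippet
    below uses only the Encode module and no database. It assumes, purely
    for illustration, that the bytes were UTF-8 encoded twice somewhere on
    the way in, and shows what a single decode then returns. The variable
    names are made up for this example.

```perl
use strict;
use warnings;
use Encode qw(encode decode is_utf8);

# The French test word from the hash above: six characters, one of them U+00E7.
my $word = "re\x{e7}enu";

# Encoded once: the UTF-8 bytes that should end up in the column.
my $bytes_once = encode("UTF-8", $word);

# Encoded twice: each UTF-8 byte is treated as a Latin-1 character and
# encoded again (an assumed stand-in for a charset mismatch on the way in).
my $bytes_twice = encode("UTF-8", $bytes_once);

# What a single decode(), as in dbget, makes of each.
my $from_once  = decode("UTF-8", $bytes_once);
my $from_twice = decode("UTF-8", $bytes_twice);

# Encode on output so the terminal or browser receives real UTF-8 bytes.
binmode STDOUT, ":encoding(UTF-8)";
for my $case ( [ "encoded once", $from_once ], [ "encoded twice", $from_twice ] ) {
    my ($label, $text) = @$case;
    printf "%-14s flag=%s length=%d matches=%s text=%s\n",
        $label,
        ( is_utf8($text) ? "on" : "off" ),
        length($text),
        ( $text eq $word ? "yes" : "no" ),
        $text;
}
```

    Both strings come back with the flag on, but only the one that was
    encoded once compares equal to the original word. Comparing against
    the original (or checking length) is a more telling test than is_utf8.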

    Sorry for the long post but I'm so new to unicode that I just can't
    work out what I'm missing.... Here's hoping Mr Dubois is around....
    great books BTW thanks.

    Angie

