Re: MultiByte Character Sets and False Matches

From: Takanori Kawai (GCD00051_at_nifty.ne.jp)
Date: 06/30/04

    To: <dbi-users@perl.org>
    Date: Wed, 30 Jun 2004 08:35:04 +0900
    
    
    

    Hi.

    >Which is hardly surprising:
    Yes, of course.

    >> my $hSt = $hDb->prepare("SELECT * FROM t1 WHERE NAME like ?");
    >> $hSt->execute("%\xB9\xA5%"); #SUKI in KANJI(EUC-JP)
    > You are telling MySQL to do the matching. So the matching will use
    > whatever character set the database thinks the data uses. If you tell
    > the database that the data is ShiftJIS when it is really EUC-JP,
    > confusion is to be expected.
    I just want to point out what the problem is.
    -Use an appropriate DBMS with the correct character set.
    -There should be no problem in the DBDs themselves.
    --The regular expressions used for matching are in SQL::Statement
    (not in DBD::CSV).
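
To illustrate the kind of false match under discussion, here is a minimal sketch. It is not the SQL::Statement code. The pattern bytes are the ones from the quoted execute() call; the data bytes (two kana, A4B9 and A5A2 in EUC-JP) are an assumption chosen for illustration. Matched as raw bytes, the pattern is found across the boundary between the two kana. After decoding both sides with Encode, it is not.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Pattern from the quoted example: one kanji (SUKI) in EUC-JP.
my $pattern_bytes = "\xB9\xA5";

# Hypothetical data: two kana in EUC-JP (A4B9 followed by A5A2).
# The second byte of the first kana plus the first byte of the
# second kana happen to equal the pattern bytes.
my $data_bytes = "\xA4\xB9\xA5\xA2";

# Byte-level match: sees B9 A5 in the middle and reports a hit.
my $byte_match = ($data_bytes =~ /\Q$pattern_bytes\E/) ? 1 : 0;

# Character-level match: decode both sides first, then compare.
my $pattern_chars = decode('euc-jp', $pattern_bytes);
my $data_chars    = decode('euc-jp', $data_bytes);
my $char_match    = ($data_chars =~ /\Q$pattern_chars\E/) ? 1 : 0;

print "byte-level match: $byte_match\n";
print "char-level match: $char_match\n";
```

The byte-level result is the false match. The character-level result depends on decoding with the encoding the data really uses, which is the same point made above about telling the database the right character set.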

    > I could not get any information on mismatching in other character sets
    > but do you mean there are no problems with mismatching in DBD::CSV if I
    > use UTF8?
    I hope so, but other problems might occur, so be careful.
    # tu.pl (attached file) seems to work correctly with "no utf8;".
    # But with "use utf8;", it returns no rows.
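
One possible reason for that difference, sketched here as an assumption rather than a diagnosis of tu.pl: under "use utf8;" a literal in the script becomes a character string, while data read from the file may still be raw UTF-8 bytes. The sketch below emulates the literal by decoding with Encode. The sample bytes (E5 A5 BD, the UTF-8 form of the same kanji) and the variable names are illustrative only.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Raw UTF-8 bytes as they might sit in a data file (assumed sample).
my $file_bytes = "\xE5\xA5\xBD";

# What a literal of the same kanji would be under "use utf8;":
# a single character, emulated here by decoding.
my $literal_chars = decode('UTF-8', $file_bytes);

# Undecoded file data against the character pattern: no match,
# because the bytes are treated as three separate characters.
my $raw_vs_chars = ($file_bytes =~ /\Q$literal_chars\E/) ? 1 : 0;

# Decode the file data as well, and the same pattern matches.
my $file_chars     = decode('UTF-8', $file_bytes);
my $chars_vs_chars = ($file_chars =~ /\Q$literal_chars\E/) ? 1 : 0;

print "raw bytes vs characters: $raw_vs_chars\n";
print "characters vs characters: $chars_vs_chars\n";
```

If this is what is happening, keeping both sides in the same form (both bytes, or both decoded characters) should avoid the empty result.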

    ==============================================
    KAWAI, Takanori(Hippo2000)
       Mail: GCD00051@nifty.ne.jp kwitknr@cpan.org
       http://member.nifty.ne.jp/hippo2000/index_e.htm
       http://www.hippo2000.info/cgi-bin/KbWikiE/KbWiki.pl
     May we translate your pods into Japanese?
        -- Japanized Perl Resource Project
      http://sourceforge.jp/projects/perldocjp/
    ==============================================

    
    


