Re: DBI and character sets (yet again)

From: Tim Bunce (Tim.Bunce_at_pobox.com)
Date: 03/22/04


Date: Sun, 21 Mar 2004 23:17:16 +0000
To: Dean Arnold <darnold@presicient.com>

On Sun, Mar 21, 2004 at 01:10:27PM -0800, Dean Arnold wrote:
> (Note: I'm sending this to both -users and -dev; I'm not
> certain which it belongs to at this point)

dbi-users, I think, at this point, as wide user comment may be helpful.
Though I might regret that if this produces more heat than light.
[I've removed dbi-dev from the CC list. Anyone else replying to the
original (or to replies to it), please do the same. Thanks.]

> Is there a consistent charset encoding behavior defined for
> DBI at this time ?

No.

> If not, is a rule wrt charset encoding behavior needed ?

Yes.

> If a list of charset behaviors for each DBD is needed,
> I'd be happy to put one together, assuming the DBD authors
> send me the details for each driver.

That would be great.

I'm not expert on this, as I'm probably about to prove, but here's
my perspective, for today at least...

1. Most applications only work with one character set encoding
   (not counting UTF8). Obvious example: Latin-1.

2. Unicode is where we're going. Get used to it.

3. I don't really want the DBI to be involved in any recoding
   of character sets (from client charset to server charset)
   and I suggest that the drivers don't try to do that either.

4. DBI v2 will provide hooks to allow callbacks to be fired
   on fetching a field and/or row and that could be used by an
   application for recoding if it wants to 'hide' it under the DBI.

5. When selecting data from the database the driver should:
   - return strings in a unicode character set as UTF8 (with the UTF8 flag on).
   - return strings in other character sets as-is (unchanged), on
     the presumption that the application knows what to do with them.

6. Drivers that want to can offer a mechanism to recode non-unicode
   character sets into unicode but I don't see a big need for the
   DBI to standardize an interface for that at the moment.

7. DBI v2 will probably provide a way for applications to force the
   UTF8 flag on particular columns as a workaround for drivers that
   don't know that the string of bytes they're returning is actually UTF8.

8. When passing data to the database (including the SQL statement)
   the driver should (perhaps) warn if it's presented with UTF8
   strings but the driver or database can't handle unicode.
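Here is a minimal sketch of the fetch-side policy in points 4, 5 and 7. It is written in Python purely as a neutral illustration. The sketch assumes that Python's str/bytes split can stand in for Perl's UTF8 flag being on or off. The names fetch_row and latin1_to_text, the per-column charset labels (such as "utf8"), and the force_utf8 and callbacks parameters are all hypothetical. None of them are DBI or driver APIs.

```python
from typing import Callable, Iterable, Optional, Sequence


def fetch_row(
    raw_row: Sequence[Optional[bytes]],
    column_charsets: Sequence[str],
    force_utf8: Iterable[int] = (),
    callbacks: Iterable[Callable[[list], list]] = (),
) -> list:
    """Hypothetical fetch-side charset policy applied to one raw row."""
    forced = set(force_utf8)
    row = []
    for idx, (value, charset) in enumerate(zip(raw_row, column_charsets)):
        if value is not None and (charset == "utf8" or idx in forced):
            # Point 5 (or point 7 when the application forces it): unicode
            # data becomes text, the analogue of setting Perl's UTF8 flag.
            value = value.decode("utf-8")
        # Any other charset: the bytes are returned unchanged, no recoding.
        row.append(value)
    for callback in callbacks:
        # Point 4: an application-supplied hook may recode if it wants to.
        row = callback(row)
    return row


def latin1_to_text(row: list) -> list:
    """Example application callback: recode leftover Latin-1 bytes to text."""
    return [v.decode("latin-1") if isinstance(v, bytes) else v for v in row]


raw = (b"caf\xc3\xa9", b"caf\xe9")  # "café" from a UTF-8 and a Latin-1 column
print(fetch_row(raw, ("utf8", "latin1")))  # ['café', b'caf\xe9']
print(fetch_row(raw, ("utf8", "latin1"), callbacks=[latin1_to_text]))  # ['café', 'café']
print(fetch_row((b"\xc3\xa9",), ("binary",), force_utf8=[0]))  # ['é']
```

The driver never recodes the Latin-1 column itself. That conversion happens only if the application installs a callback for it.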
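Point 8 might look something like the following minimal sketch. Again it is in Python, with a str standing in for a UTF8-flagged Perl string. check_bind_values and its db_handles_unicode argument are hypothetical names. The check only warns and leaves the values untouched, in keeping with point 3.

```python
import warnings
from typing import Sequence, Union


def check_bind_values(
    statement: str,
    params: Sequence[Union[str, bytes, int, None]],
    db_handles_unicode: bool,
) -> list:
    """Warn (but do not recode) when unicode text meets a non-unicode database.

    Returns the positions of offending values: 0 is the SQL statement,
    1..n are the bind parameters.
    """
    if db_handles_unicode:
        return []
    offending = []
    for pos, value in enumerate([statement, *params]):
        # Assumption of this sketch: ASCII-only text has the same bytes in
        # UTF-8 and in common single-byte charsets such as Latin-1, so only
        # non-ASCII text is flagged.
        if isinstance(value, str) and not value.isascii():
            offending.append(pos)
            what = "SQL statement" if pos == 0 else f"bind value {pos}"
            warnings.warn(
                f"{what} contains unicode characters but the database "
                "cannot handle unicode",
                UnicodeWarning,
                stacklevel=2,
            )
    return offending


# prints [1]; a UnicodeWarning is also emitted for bind value 1
print(check_bind_values("INSERT INTO t (name) VALUES (?)", ["café"], False))
# prints []: byte strings are assumed to already be in the client charset
print(check_bind_values("INSERT INTO t (name) VALUES (?)", [b"caf\xe9"], False))
```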

Comments welcome, of course, but please stick to practical issues,
ideally with examples, rather than theoretical ones. Thanks.

Tim.


