Re: Fun with UTF-8 and Oracle [PATCH]



On 2007-09-04 14:20:20 +0100, Tim Bunce wrote:
On Mon, Sep 03, 2007 at 07:20:42PM +0200, Peter J. Holzer wrote:
On 2007-09-03 13:03:37 +0100, Tim Bunce wrote:
Oh the joys of Oracle character set support.

See the dbd_rebind_ph() code in dbdimp.c. Especially from the line
"csform = phs->csform;" onwards.

You can explicitly set the csform and csid using bind_param(..., { ... })

Have fun!

I had :-). You gave me a push into the right direction, and I think I
found the problem:

[code deleted]

While it claimed to set csform=SQLCS_IMPLICIT, it was actually
setting it to SQLCS_NCHAR, and I guess that was what prevented the
use of the index on the varchar2 column. Sure enough, if I
explicitely set ora_csform to 1, it's fast. It is also fast, if I
set NLS_NCHAR to US7ASCII, so that it isn't used.

Maybe the tests in line 1379 and 1381 should be reversed so that
SQLCS_IMPLICIT is preferred over SQLCS_NCHAR?

Sounds very plausible.
[...]
That's great, but why stop there? Now you seem to have identified the
problem, how about patching the code instead ;-)

Sorry for the long delay. I wanted to test the patch properly first and
then forgot to post it. Now I stumbled upon the same problem on a
different server which reminded me that I never posted the patch. So
here it is.

(I couldn't think of a proper test case: Building a table with 4 billion
rows and then checking whether the query takes a few seconds or an hour
doesn't seem a good idea :-). Anyway, I've used it in production for
almost 3 months.)

hp

--
_ | Peter J. Holzer | If I wanted to be "academically correct",
|_|_) | Sysadmin WSR | I'd be programming in Java.
| | | hjp@xxxxxxxxx | I don't, and I'm not.
__/ | http://www.hjp.at/ | -- Jesse Erlbaum on dbi-users
--- dbdimp.c.orig 2006-11-03 15:05:46.000000000 +0100
+++ dbdimp.c 2007-09-24 10:36:25.000000000 +0200
@@ -1376,15 +1376,15 @@

if (!csform && SvUTF8(phs->sv)) {
/* try to default csform to avoid translation through non-unicode */
- if (CSFORM_IMPLIES_UTF8(SQLCS_NCHAR)) /* prefer NCHAR */
- csform = SQLCS_NCHAR;
- else if (CSFORM_IMPLIES_UTF8(SQLCS_IMPLICIT))
+ if (CSFORM_IMPLIES_UTF8(SQLCS_IMPLICIT)) /* prefer IMPLICIT */
csform = SQLCS_IMPLICIT;
+ else if (CSFORM_IMPLIES_UTF8(SQLCS_NCHAR))
+ csform = SQLCS_NCHAR;
/* else leave csform == 0 */
if (trace_level)
PerlIO_printf(DBILOGFP, " rebinding %s with UTF8 value %s", phs->name,
- (csform == SQLCS_NCHAR) ? "so setting csform=SQLCS_IMPLICIT" :
- (csform == SQLCS_IMPLICIT) ? "so setting csform=SQLCS_NCHAR" :
+ (csform == SQLCS_IMPLICIT) ? "so setting csform=SQLCS_IMPLICIT" :
+ (csform == SQLCS_NCHAR) ? "so setting csform=SQLCS_NCHAR" :
"but neither CHAR nor NCHAR are unicode\n");
}

Attachment: pgpYFFEHhwZku.pgp
Description: PGP signature