RE: Oracle ncahr versus nvarchar2 and unicode data



I seem to have identified the difference between selecting from nchar and
nvarchar columns containing unicode data:

for nvarchar:
col 1: dbtype 1, scale 0, prec 512, nullok 1, name A
: dbsize 512, char_used 1, char_size 256, csid 2000, csform 2, disize 512

for ncahr:
col 1: dbtype 96, scale 0, prec 512, nullok 1, name A
: dbsize 512, char_used 1, char_size 256, csid 2000, csform 2, disize 512

and for nvarchar, DBD::Oracle makes the call:

calling OCIAttrSet OCI_ATTR_CHARSET_FORM with csform=2
OCIAttrSet(83423a0,OCI_HTYPE_DEFINE,bfffed87,0,32,833c858)=SUCCESS

If I change oci8.c around line 1316 which was:

#ifdef OCI_ATTR_CHARSET_FORM
if ( (fbh->dbtype == 1) ) { /* && (fbh->csform == SQLCS_NCHAR) &&
CS_IS_UTF8(ncharsetid) ) { */
/* ok... after doing what tim asked: setting SvUTF8 strictly based on csid
8bit Nchar test was broken
and this currently effectively just sets Attrs to the values in fhb ignoring
ncharsetid altogether
probably wrong
*/

to:

#ifdef OCI_ATTR_CHARSET_FORM
if ( (fbh->dbtype == 1) || (fbh->csform == SQLCS_NCHAR)) {

I seem to get my data back as I expected.

I don't know much about DBD::Oracle so the comment in the code does
not help me that much but it looks as though the code used to be
fairly similar to how I changed it.

So, my question now becomes, should I be able to store unicode data
in an nchar (which I can with stock DBD::Oracle 1.16) and expect to be
able to get it back again as unicode data? Is DBD::Oracle working
correctly or should I have done something else?

Martin

--
Martin J. Evans
Easysoft Ltd, UK
Development

On 19-Jul-2005 Martin J. Evans wrote:
> Hi,
>
> I have some further information on my problem. The insertion of UTF-8 data
> into
> nchar seems to work like it does for nvarchar2; it appears it is the
> selecting
> from nchar fields that is not returning the data inserted. I have verified
> this
> by inserting the same data into nchar and nvarchar fields and reading it via
> Oracle's ODBC driver in MS Access - where it looks fine. But, retrieving it
> in
> Perl does not display the same results.
>
> Martin
> --
> Martin J. Evans
> Easysoft Ltd, UK
> Development
>
>
> On 18-Jul-2005 Martin J. Evans wrote:
>> Hi,
>>
>> Wondered if anyone can tell me if I've misunderstood something here. I am
>> able
>> to insert unicode data into nvarchar2 and get back what was inserted but
>> same
>> code with nchar column does not work.
>>
>> OCI version is 10.1.0.2
>> Database is Oracle 10.2 on Linux
>> Database CHAR set is WE8ISO8859P1 (Non-Unicode), NCHAR set is AL16UTF16
>> (Unicode)
>> This is perl, v5.8.7 built for i686-linux
>> DBD::Oracle 1.16.
>>
>>#!/disk2/martin/perl/bin/perl
>> use charnames ':full';
>> use DBI;
>> use Encode;
>> use DBD::Oracle qw( :ora_types ORA_OCI SQLCS_NCHAR );
>> use Devel::Peek;
>>
>> if ( $] < 5.008) {
>> print "Need at least Perl 5.8\n";
>> exit 1;
>> }
>> binmode(STDOUT, ":utf8");
>>
>> print "<html><head>";
>> print q|<meta http-equiv="Content-Type" content="text/html;charset=utf-8">|;
>> print "<title>test</title></head><body>\n";
>>
>> $ENV{'ORACLE_HOME'} = '/home/oracle/product/10.1.0/db_2';
>> $ENV{'NLS_NCHAR'} = 'AL32UTF8';
>> my $dbh = DBI->connect('dbi:Oracle:tentwo.testing', 'xxx', 'yyy');
>> $dbh->{ChopBlanks} = 1;
>> print "OCI version is ", ORA_OCI, "\n";
>> print "ora_can_unicode = ", $dbh->ora_can_unicode, "\n";
>> if (!($dbh->ora_can_unicode & 1)) {
>> print "Database does not support UNICODE\n";
>> exit 1;
>> }
>> my $paramsH = $dbh->ora_nls_parameters();
>> printf "Database $ora_server_version CHAR set is %s (%s), NCHAR set is %s
>> (%s)\n",
>> $paramsH->{NLS_CHARACTERSET},
>> $dbh->ora_can_unicode & 2 ? "Unicode" : "Non-Unicode",
>> $paramsH->{NLS_NCHAR_CHARACTERSET},
>> $dbh->ora_can_unicode & 1 ? "Unicode" : "Non-Unicode";
>> eval {
>> $dbh->do('drop table martin');
>> };
>># NOTE nchar does not work!!!! (why?)
>> $dbh->do('create table martin (a nvarchar2(256))');
>> my $sql = "insert into martin values (?)";
>> my $x = "\N{EURO SIGN}\x{20ac}\N{LATIN CAPITAL LETTER A WITH ACUTE}";
>> $x .= chr(0x20ac);
>> print "is_utf8 returns ", utf8::is_utf8($x), "\n";
>> print "valid returns ", utf8::valid($x), "\n";
>> my $sth = $dbh->prepare($sql);
>> $sth->bind_param(1, $x, {ora_csform => SQLCS_NCHAR});
>> $sth->execute();
>> $sql = "select a from martin";
>> $sth = $dbh->prepare($sql);
>> $sth->execute;
>> while(my @row = $sth->fetchrow_array) {
>> print "valid on returned string returns ", utf8::valid($row[0]), "\n";
>> #$_ = decode("utf8", $_) foreach @row;
>> print $row[0], "\n";
>> print unpack("H*", $row[0]), "\n";
>> print Dump(@row[0]);
>> }
>> $sth->finish;
>> $dbh->disconnect();
>>
>> Output is:
>>
>> <html><head><meta http-equiv="Content-Type"
>> content="text/html;charset=utf-8"><title>test</title></head><body>
>> OCI version is 10.1.0.2
>> ora_can_unicode = 1
>> Database CHAR set is WE8ISO8859P1 (Non-Unicode), NCHAR set is AL16UTF16
>> (Unicode)
>> is_utf8 returns 1
>> valid returns 1
>> valid on returned string returns 1
>> <utf8 data here - snipped from email>
>> e282ace282acc381e282ac
>> SV = PV(0x82a3efc) at 0x82e5b48
>> REFCNT = 1
>> FLAGS = (POK,pPOK,UTF8)
>> PV = 0x8361bb0 "\342\202\254\342\202\254\303\201\342\202\254"\0 [UTF8
>> "\x{20ac}\x{20ac}\x{c1}\x{20ac}"]
>> CUR = 11
>> LEN = 12
>> </body></html>
>>
>> UTF-8 data output with Devel::Peek Dump looks fine.
>>
>> A simple change to create the table with nchar instead of nvarchar2
>> outputs:
>>
>> <html><head><meta http-equiv="Content-Type"
>> content="text/html;charset=utf-8"><title>test</title></head><body>
>> OCI version is 10.1.0.2
>> ora_can_unicode = 1
>> Database CHAR set is WE8ISO8859P1 (Non-Unicode), NCHAR set is AL16UTF16
>> (Unicode)
>> is_utf8 returns 1
>> valid returns 1
>> valid on returned string returns 1
>> <utf8 data here snipped from email>
>> 3f3f413f
>> SV = PV(0x82a3efc) at 0x82e5c3c
>> REFCNT = 1
>> FLAGS = (POK,pPOK,UTF8)
>> PV = 0x8361cc0 "??A?"\0 [UTF8 "??A?"]
>> CUR = 4
>> LEN = 5
>> </body></html>
>>
>> Devel::Peek Dump output looks nothing like what I expected.
>>
>> Also, do I have to use {ora_csform => SQLCS_NCHAR}? Does not seem to work in
>> either case if I don't.
>>
>> And one final thing. If I change $x containing the UTF-8 data to insert and
>> run
>> $sth->execute again it inserts the original data again and not the changed
>> $x.
>> I've not confirmed this yet with other drivers, but I am /fairly/ sure with
>> DBD::ODBC you don't have to call bind_param again if the bound data is
>> changed.
>>
>> Thanks.
>>
>> Martin
>> --
>> Martin J. Evans
>> Easysoft Ltd, UK
>> Development
.



Relevant Pages

  • Re: nvarchar max
    ... Yes, SQL FTS supports char, nchar, varchar, nvarchar, varchar, ... nvarchar, image, varbinary and xml. ...
    (microsoft.public.sqlserver.fulltext)
  • Re: Storing Text
    ... VatChar or NVarchar does not pad string to appropriate long, Char or NChar ...
    (microsoft.public.access.adp.sqlserver)
  • Re: Storing Chinese Characters in Sql Server 2000
    ... "Steve Kass" wrote: ... > is if the column type is incorrect. ... The type should be nchar, nvarchar, ...
    (microsoft.public.sqlserver.server)
  • Re: Get Longest Value
    ... The problem is that nvchar and nchar take up ... Yes, it has to be nvarchar, we get gobbles of asian text. ...
    (microsoft.public.sqlserver.programming)
  • Re: Storing Chinese Characters in Sql Server 2000
    ... >> The only way it could appear that the table converts characters to ?? ... >>is if the column type is incorrect. ... The type should be nchar, nvarchar, ...
    (microsoft.public.sqlserver.server)