RE: Creating XML from an Oracle DB



Keith.Barnard@xxxxxxxxxxxxxx wrote:
Hi Tim,

I have spent about three hours using Google and Metalink trying to
find the

answer to what must be a FAQ when generating XML from Oracle but I
cannot find anything that answers the question.

I work for a company in the UK and we are generating an XML file from
an Oracle 9.2 database using DBMS_XMLQUERY and DBMS_LOB. The XML file
has to be
in UTF-8 format (encoding="utf-8") but this is not the format that
it is held in the database. From what I have read, it seems that it
is AL32UTF8.

Oracle's utf-8 support for the XML packages is shameful.
http://www.dbforums.com/showthread.php?t=1212787.

For any of this to happen automatically, you need to make sure that your
original data (what you're making the XML from) is stored correctly in
the database's character set. This means that if your data actually
contains utf-8, the database characterset should be AL32UTF8. It
appears your data is indeed in utf-8, because 49827 (0xC2A3) is the
utf-8 representation of the pound sign.

You can determine your database's character set with this query:
select value from v$nls_parameters
where parameter = 'NLS_CHARACTERSET'

Assuming your database characterset actually matches the data that's in
it, you can just set the client character set to your desired output,
and the encoding is done for you:

# data from Oracle will now be converted correctly
# into Perl's internal encoding.
export NLS_LANG=AMERICAN_AMERICA.AL32UTF8

# in perl...
# convert from Perl's internal encoding to utf-8 when
# printing to stdout.
binmode(STDOUT, ':utf8'); # convert from Perl's internal to utf-8

If your database characterset does not match the data (for example, your
database characterset is US7ASCII) then sorry, you will have to manually
convert each unicode column. If you're using Oracle 10G, then there are
functions to convert XML encodings explicitly using character set Ids.
Otherwise, you'll have to figure out the right hocus pocus to transfer
the data from Oracle to Perl without losing character information, and
then encode/decode in Perl with the Encode module.

If you haven't already, I recommend reading the Oracle 9i Globalization
Best Practices document. It can help get your head around how the
character sets work. http://tinyurl.com/mtsxg [oracle.com].

Hope it helps. Regards,

Philip



.



Relevant Pages

  • Re: Im sure glad I didnt buy a Mac Mini!
    ... MS isn't making you send UTF-8 from your Mac to people who have trouble ... >>>> No, it's just Outlook. ... > emails from maccies are not using old versions of Outlook. ... But then you probably have no idea what the difference is between a character set and a font. ...
    (comp.sys.mac.advocacy)
  • Re: Want Input boxes to accept unicode strings on Standard Window
    ... If ther encoding is not specified, then the encoding is assumed to be ... Ah, UTF-8. ... That would be wrong according to the standard. ... when producing XML files. ...
    (microsoft.public.vc.mfc)
  • Re: Any portable way get a filename in UTF-8 or to get the FS encoding ?
    ... A reasonable convention to use is that all file names be stored in a normalized utf-8. ... The question of what to do where a process's character set is unable to convert from utf-8. ... If you want interoperability then a very good solution is to use a common base. ... It gets to the point that once you have decided you need to have multiple processes with different locale encodings to talk to each other, then using a common encoding like utf-8 and deprecating all other encodings becomes an interesting solution. ...
    (comp.unix.programmer)
  • Re: GIMP
    ... Using the fallback 'C' locale. ... from character set 'UTF-8' to 'ISO-8859-1' is not supported ... Conversion from character set 'UTF-8' to 'ISO-8859-1' is not supported ...
    (alt.os.linux.suse)
  • Re: Might be PHP after all
    ... changing hosts). ... On D, if I put in data with an apostrophe, it goes ... S has a default character set of lantin1, while D has a character set ... But if you get utf-8 anyway, ...
    (comp.lang.php)