Last time this came up (which was recently), Jared Still [CC'd]
reported that he'd found several non-Perl-related references to this
error on Metalink. They were related to improper NLS settings.
This does appear to be either NLS or character set related. First, it's
not the CLOB column that's a problem. We were looking there since
that was relatively new. The problem columns are VARCHAR2(4000)
columns. The data coming in is from an XML package and is in UTF-8.
Our default Oracle character set is ISO-8859-1. If I add a filter
to convert from UTF-8 to ISO-8859-1, everything works. My guess at
this point is that Oracle is choking trying to convert the UTF-8 to
ISO-8859 and that this is only an issue for the VARCHAR2 columns, not
the CLOB. I'll have to investigate the whole Oracle character set
thing a bit more. Some threads I found while Googling this indicated that
a VARCHAR2 of 4000 might need to have its input data truncated to 2000
(halved) if there was a conversion that would turn one-byte characters
into two-byte ones. Even though that should not be the case here (ISO-8859
is a single-byte character set), I tried that and still got errors. One
thing I still want to try is setting Oracle's character set to UTF-8 and
seeing what the behavior is.
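
To make the byte-count argument concrete, here is a minimal sketch in
Python (not what we run; our code is Perl) that compares how many bytes
the same text needs in UTF-8 and in ISO-8859-1. The function name and the
sample string are made up for illustration.

```python
def byte_lengths(text):
    """Return (characters, utf8_bytes, latin1_bytes) for text.

    Raises UnicodeEncodeError if text contains a character that
    ISO-8859-1 cannot represent.
    """
    return (
        len(text),
        len(text.encode("utf-8")),
        len(text.encode("iso-8859-1")),
    )


# Hypothetical sample: the accented letters take two bytes each in UTF-8
# but only one byte each in ISO-8859-1.
sample = "caf\u00e9 \u00fcber"
chars, utf8_len, latin1_len = byte_lengths(sample)
print(chars, utf8_len, latin1_len)
```

Going from UTF-8 to ISO-8859-1 can only keep the byte count the same or
shrink it, which is why halving the input should not have been necessary
in this direction. A character with no ISO-8859-1 equivalent is a separate
problem: it cannot be converted at all, whatever the length.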
If these points ring a bell with anyone and you can fill in the gaps
it would be appreciated. I'd like to write something up on what this
all means for future internal reference.
The good news is that there do not seem to be any DBI or DBD::Oracle
issues.
For UTF8 conversions CPAN has Unicode::MapUTF8 if anyone else ends up
down this path.
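
For anyone who wants to see the shape of such a filter without the Perl
module, here is a rough Python sketch of the same idea: decode the
incoming UTF-8 bytes and re-encode them as ISO-8859-1. It is not
Unicode::MapUTF8 and not our production code. The function name and the
choice to substitute "?" for characters that ISO-8859-1 lacks are
assumptions.

```python
def utf8_to_latin1(data: bytes) -> bytes:
    """Convert UTF-8 encoded bytes to ISO-8859-1 encoded bytes.

    Assumption: characters that ISO-8859-1 cannot represent are replaced
    with '?' instead of aborting the whole conversion.
    """
    text = data.decode("utf-8")  # raises UnicodeDecodeError on invalid UTF-8
    return text.encode("iso-8859-1", errors="replace")


# Hypothetical input: a UTF-8 encoded string as it might arrive from an XML parser.
incoming = "caf\u00e9".encode("utf-8")
converted = utf8_to_latin1(incoming)
print(len(incoming), len(converted))
```

Replacing unmappable characters silently loses data, so a real filter
might prefer to log or reject such rows instead.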