Re: Encoding conversion problem



Andrea wrote:
Hi Sabine,
I'm guessing, but maybe if the database tells the JDBC driver it's
ISO-8859-1 *and* your application tells it the same encoding, it won't
bother trying to transform anything...
Yes that's what I was thinking too... but I tried to change the
encoding of the JVM (tried Cp850, ...) but it keeps on working...

Hi Lew,
I was thinking only about the DB encoding while the problem is mainly
in the JVM encoding (now it's clear to me that Java can't handle
characters outside the encoding of the JVM, I wasn't thinking about
it, sorry...).
"The encoding of the JVM" is UTF-16 with surrogate pairs; every Unicode
character is representable in the JVM, including the Euro character. There is
no Unicode character that the JVM cannot represent.
With "encoding of the JVM" I was referring to the file.encoding
property used by the JVM. If the JVM runs with:
- ISO-8859-1 then I can't read or write the EURO character to DB (it
becomes garbage) and ISO-8859-1 doesn't include that character;
- Cp1252 then I can read and write the EURO character to DB and Cp1252
includes that character.
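
The difference between the two encodings listed above can be checked directly, without a database. This is only a minimal sketch: the class and method names are made up for illustration, and it assumes the JDK in use ships the windows-1252 charset (Cp1252 is the older Java name for it).

```java
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical demo class, not taken from the thread.
class EuroEncodingDemo {
    static final String EURO = "\u20AC";

    // True if the named charset has a byte sequence for every char in text.
    static boolean canEncode(String charsetName, String text) {
        return Charset.forName(charsetName).newEncoder().canEncode(text);
    }

    // Encode to bytes and decode again with the same charset.
    // Characters the charset cannot map are replaced during encoding.
    static String roundTrip(String charsetName, String text) {
        Charset cs = Charset.forName(charsetName);
        return new String(text.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        for (String name : new String[] {"ISO-8859-1", "windows-1252"}) {
            byte[] bytes = EURO.getBytes(Charset.forName(name));
            System.out.println(name
                + ": canEncode=" + canEncode(name, EURO)
                + ", bytes=" + Arrays.toString(bytes)
                + ", roundTripOk=" + EURO.equals(roundTrip(name, EURO)));
        }
    }
}
```

ISO-8859-1 has no slot for the Euro sign, so the encoder substitutes a replacement byte and the original character is lost; windows-1252 maps it to a single byte and the round trip succeeds.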

I understand my confusion now - it stemmed from the phrase "the encoding of the JVM". The JVM itself only uses one encoding; it translates to and from other encodings on I/O. So to make sure I understood you correctly, were you referring to the encoding specified by the I/O call?
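
To illustrate the distinction, here is a rough sketch (class and method names are invented for the example) showing that a String always holds UTF-16 code units in memory, and that a byte encoding only comes into play when the text is converted for output.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not taken from the thread.
class InternalUtf16Demo {
    // Number of UTF-16 code units (Java chars) in the string.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Number of bytes produced when the string is encoded for I/O.
    static int encodedLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String euro = "\u20AC";                               // inside the BMP
        String emoji = new String(Character.toChars(0x1F600)); // needs a surrogate pair

        System.out.println("euro: units=" + codeUnits(euro)
            + " points=" + codePoints(euro)
            + " utf8Bytes=" + encodedLength(euro, StandardCharsets.UTF_8));
        System.out.println("emoji: units=" + codeUnits(emoji)
            + " points=" + codePoints(emoji)
            + " utf8Bytes=" + encodedLength(emoji, StandardCharsets.UTF_8));
    }
}
```

The in-memory form is the same whatever file.encoding is set to; only the byte counts change with the charset chosen at the I/O boundary.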

Generally, if the encoding you specify for I/O differs from the encoding in your data store, it will cause trouble. This is not limited to Java. Over in the Postgres newsgroups one finds people on all sorts of platforms having trouble with character encoding, mostly from trying to store characters in a column that are not part of the character encoding specified for the DB. If such things don't match, then problems will hatch.
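
A small sketch of that mismatch, using an in-memory byte array as a stand-in for the data store (an assumption made to keep the example self-contained; the class and method names are invented). The charset is passed explicitly to the writer and reader instead of relying on the platform default.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not taken from the thread.
class ExplicitCharsetIoDemo {
    // Write text to an in-memory "store" using an explicitly chosen charset.
    static byte[] write(String text, Charset cs) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(sink, cs)) {
            w.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    // Read the stored bytes back using an explicitly chosen charset.
    static String read(byte[] data, Charset cs) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(data), cs)) {
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String euro = "\u20AC";
        byte[] stored = write(euro, StandardCharsets.UTF_8);

        String matched = read(stored, StandardCharsets.UTF_8);
        String mismatched = read(stored, StandardCharsets.ISO_8859_1);

        System.out.println("matched equals original: " + euro.equals(matched));
        System.out.println("mismatched equals original: " + euro.equals(mismatched));
        System.out.println("mismatched length: " + mismatched.length());
    }
}
```

With matching charsets the text survives; reading UTF-8 bytes as ISO-8859-1 turns the one character into three unrelated ones.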

--
Lew