Re: HELP: Unicode in Java 1.3.1 vs 1.4.2
From: John C. Bollinger (jobollin_at_indiana.edu)
Date: 02/15/05
- Next message: Anton Spaans: "Re: About ResultSet.getTimeStamp"
- Previous message: peter.doyle_at_littlewoods.co.uk: "Re: compaction reporting in ibm's verbose gc log"
- In reply to: modest: "HELP: Unicode in Java 1.3.1 vs 1.4.2"
- Next in thread: Chris Uppal: "Re: HELP: Unicode in Java 1.3.1 vs 1.4.2"
- Reply: Chris Uppal: "Re: HELP: Unicode in Java 1.3.1 vs 1.4.2"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Date: Tue, 15 Feb 2005 10:15:07 -0500
modest wrote:
> according to
> http://java.sun.com/docs/books/tutorial/i18n/text/string.html:
>
> "If a byte array contains non-Unicode text, you can convert the text to
> Unicode with one of the String constructor methods. Conversely, you can
> convert a String object into a byte array of non-Unicode characters
> with the String.getBytes method. When invoking either of these methods,
> you specify the encoding identifier as one of the parameters."
>
> It works fine in Java 1.3.1
>
> ------------------------------------------------------------------
> // Convert ASCII to Unicode
> str_uni = new String(str_ascii.getBytes(), "ISO8859_8");
>
> // Convert Unicode to ASCII
> str_ascii = new String(str_uni.getBytes("ISO8859_8"));
> ------------------------------------------------------------------
>
> In Java 1.4.2 it returns question marks only.
>
> What is the difference and how it can be fixed?
You are not using the canonical name of the charset, which is
"ISO-8859-8". Which charsets are available and how they are configured
depends on your Java installation. On my Sun JDK 1.4.2_05 installation,
the charset in question has no defined aliases and therefore can only be
referred to by its canonical name. I don't know why you are getting
anything at all in this case (you would get an
UnsupportedEncodingException if the charset name were unknown).
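If you want to see what your own installation accepts, something along
these lines should do it. This is only a sketch: the class and method
names (CharsetLookup, canonicalName) are mine, not part of any API, and
what it prints for "ISO8859_8" depends on the JVM you run it on.

```java
import java.nio.charset.Charset;

final class CharsetLookup {
    // Hypothetical helper: returns the canonical name for a charset name
    // or alias, or null if this JVM does not support it or the name is
    // syntactically illegal.
    static String canonicalName(String name) {
        try {
            return Charset.isSupported(name) ? Charset.forName(name).name() : null;
        } catch (IllegalArgumentException e) {
            // IllegalCharsetNameException is a subclass of IllegalArgumentException
            return null;
        }
    }

    public static void main(String[] args) {
        String[] names = { "ISO-8859-8", "ISO8859_8" };
        for (String name : names) {
            String canonical = canonicalName(name);
            System.out.println(name + " -> " + canonical);
            if (canonical != null) {
                // Aliases are installation-dependent, so no output is promised here.
                System.out.println("  aliases: " + Charset.forName(canonical).aliases());
            }
        }
    }
}
```

A null result means the name is unknown to that JVM, which is the case
where the String constructor and getBytes(String) throw
UnsupportedEncodingException.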
That said, your code is deeply flawed. If you have data in a Java
String then it is already Unicode; *that is a fundamental characteristic
of Java Strings*. It does not make sense to talk about changing the
encoding / charset of a String -- the concept just doesn't apply (and
the i18n tutorial you refer to doesn't suggest otherwise). If you have
taken a byte sequence and created a String from it without accounting
for the bytes' charset then you are already hosed. This may be your
real problem, and it has not changed from 1.3 to 1.4 (or 1.5).
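To make that concrete, here is a rough sketch of where the charset
belongs: at the boundary between bytes and Strings, and nowhere else.
The names (DecodeEncode, decode, encode) are invented for illustration,
and the Hebrew sample bytes assume your JVM supports ISO-8859-8 at all,
which is why the demo checks first.

```java
import java.nio.charset.Charset;

final class DecodeEncode {
    // Bytes -> String: name the charset the bytes were written in.
    static String decode(byte[] raw, String charsetName) {
        return new String(raw, Charset.forName(charsetName));
    }

    // String -> bytes: name the charset the recipient expects.
    static byte[] encode(String text, String charsetName) {
        return text.getBytes(Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        if (Charset.isSupported("ISO-8859-8")) {
            // Four Hebrew letters as ISO-8859-8 bytes (sample data, assumed).
            byte[] raw = { (byte) 0xF9, (byte) 0xEC, (byte) 0xE5, (byte) 0xED };
            String text = decode(raw, "ISO-8859-8"); // now ordinary Unicode
            byte[] back = encode(text, "ISO-8859-8"); // same bytes again
            System.out.println(text.length() + " chars, " + back.length + " bytes");
        }
        // Encoding into a charset that lacks the character substitutes the
        // charset's replacement byte; this is one way question marks can
        // appear, though not necessarily what happened in your case.
        byte[] lossy = encode("\u05D0", "US-ASCII");
        System.out.println((char) lossy[0]);
    }
}
```

Note that nothing here converts a String from one charset to another;
the String in the middle is Unicode throughout.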
In addition, it might be relevant to you that ASCII, Unicode, and all
the ISO-8859 national charsets assign the same codes to the
characters covered by ASCII. The UTF-8 encoding of Unicode likewise
encodes each ASCII character as a single byte whose value is the same
as the character code itself.
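You can check that last point for yourself with a few lines like the
following. It is a minimal sketch using three charsets every Java
platform is required to provide; the class and method names are made up
for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class AsciiSubset {
    // True if the text encodes to identical bytes in US-ASCII,
    // ISO-8859-1 and UTF-8.
    static boolean sameInAllCharsets(String text) {
        byte[] ascii = text.getBytes(StandardCharsets.US_ASCII);
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(ascii, latin1) && Arrays.equals(ascii, utf8);
    }

    public static void main(String[] args) {
        // Pure ASCII text: the three encodings agree byte for byte.
        System.out.println(sameInAllCharsets("Hello, world!"));
        // A non-ASCII character (e with acute accent): they no longer agree.
        System.out.println(sameInAllCharsets("\u00e9"));
    }
}
```

So text that is purely ASCII survives a charset mix-up unharmed, and a
mistake only shows once other characters are involved.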
-- John Bollinger jobollin@indiana.edu