Re: converting unicode to UTF-8
From: Chris Uppal (chris.uppal_at_metagnostic.REMOVE-THIS.org)
Date: 11/20/04
- Next message: Chris Uppal: "Re: Thread-question"
- Previous message: Chris Uppal: "Re: Is it possible to set a block of code as non-JIT area?"
- In reply to: peter10: "converting unicode to UTF-8"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Date: Sat, 20 Nov 2004 12:59:12 -0000
peter10 wrote:
> ByteArrayOutputStream out = new ByteArrayOutputStream();
> DataOutputStream dataOut = new DataOutputStream(out);
> dataOut.writeUTF(text_input);
The first problem here is that writeUTF() does /NOT/ write UTF-8. It's an
incredibly, unbelievably, stupidly, misleadingly-named method. What it does is
write a two-byte length prefix (as Steve has already mentioned; it counts the
encoded bytes that follow, not characters) followed by bytes that represent the
string in Java's "modified UTF-8", a format that is (conceptually) related to,
but not interchangeable with, UTF-8: U+0000 and characters outside the Basic
Multilingual Plane are encoded differently.
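To make the difference concrete, here is a minimal sketch that dumps the bytes writeUTF() produces next to the real UTF-8 bytes for the same string. The class name WriteUtfDemo, the helper viaWriteUTF, and the sample string are made up for illustration; only the java.io calls come from the JDK.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical demo class, not part of the original poster's code.
class WriteUtfDemo {
    // Returns exactly the bytes DataOutputStream.writeUTF() emits for s.
    static byte[] viaWriteUTF(String s) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DataOutputStream dataOut = new DataOutputStream(out);
        dataOut.writeUTF(s);
        dataOut.flush();
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String text = "A\u0000"; // 'A' followed by U+0000

        // Two-byte length prefix (0, 3), then 'A', then U+0000 as C0 80.
        System.out.println(Arrays.toString(viaWriteUTF(text)));
        // prints [0, 3, 65, -64, -128]

        // Real UTF-8: no prefix, and U+0000 is a single zero byte.
        System.out.println(Arrays.toString(text.getBytes("UTF-8")));
        // prints [65, 0]
    }
}
```

For plain ASCII text the only visible difference is the two-byte prefix, which is why the mismatch is easy to miss until a reader that expects real UTF-8 chokes on it.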
UTF-8 is a way of taking a stream/string of Unicode characters (and Java
Strings can be viewed as such, although the correspondence is not as close as
it looks, since a Java char is a 16-bit UTF-16 code unit rather than a whole
Unicode character), and representing them as bytes in a binary stream or
similar. In Java that conversion is ultimately provided by a "charset",
specifically the one named "UTF-8". Probably the easiest way for you to use
that would be
either to ask your String for its UTF-8 bytes:
    aString.getBytes("UTF-8");
or to use an OutputStreamWriter constructed with a 'charsetName' of "UTF-8".
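Both routes can be sketched side by side as follows. This is only an illustration under assumptions: the class name Utf8EncodeDemo, the two helper methods, and the sample text are invented here, and the ByteArrayOutputStream simply stands in for whatever stream you actually need to write to.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.util.Arrays;

// Hypothetical demo class, not part of the original poster's code.
class Utf8EncodeDemo {
    // Option 1: ask the String for its bytes in the named charset.
    static byte[] viaGetBytes(String s) throws IOException {
        return s.getBytes("UTF-8");
    }

    // Option 2: push the characters through a Writer that encodes as UTF-8.
    static byte[] viaWriter(String s) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Writer writer = new OutputStreamWriter(out, "UTF-8");
        writer.write(s);
        writer.close(); // closing flushes the encoder into 'out'
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String textInput = "caf\u00e9"; // "cafe" with an acute accent on the e

        byte[] a = viaGetBytes(textInput);
        byte[] b = viaWriter(textInput);

        System.out.println(Arrays.toString(a)); // prints [99, 97, 102, -61, -87]
        System.out.println(Arrays.equals(a, b)); // prints true
    }
}
```

getBytes() is the simpler choice when the whole string is already in memory; the Writer is the better fit when text is produced piecemeal and goes straight to a file or socket.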
-- chris