Re: Writing Unicode-16 to a text file

From: Soren Kuula (dongfangspam_at_bitplanet.net)
Date: 01/29/04


Date: Thu, 29 Jan 2004 01:01:14 +0100

Konrad Den Ende wrote:
>>When you wrote the characters to a file (what method did you use?) they
>>probably underwent a 16-bit to 8-bit conversion

> try {
> BufferedWriter writer = new BufferedWriter (new FileWriter
> ("nihongo.txt"));
> writer.write (cc); // cc is a char[] that stores the characters
> writer.close ();
> }
> catch (Exception e) {System.out.println (e.getMessage ());}

>>using some encoding (what encoding did you specify? or what is your Java
>>installation using as its default encoding?).

> Any hint?

Sure.

You have been writing Japanese with an encoding that doesn't support
it. FileWriter always uses the platform default encoding, which is
derived from your operating system locale (look for the file.encoding
entry in System.getProperties()). I bet yours is ISO-8859-1 or
something like that, and it has no Japanese characters.
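To see what your installation is actually using, something like the following should do. It is only a sketch: the class name DefaultEncodingCheck is made up for illustration, and Charset.defaultCharset() assumes Java 5 or later (on 1.4 you only have the file.encoding property).

```java
import java.nio.charset.Charset;

class DefaultEncodingCheck {
    // Name of the charset the JVM falls back to when a Writer is given none.
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // The system property the default is normally derived from.
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + defaultCharsetName());
    }
}
```

If this prints a Latin-only encoding, that would explain why the Japanese characters did not survive.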

You should look at OutputStreamWriter, of which you can make an instance
that uses an encoding that supports Japanese. You can get an idea of
what encodings are supported by looking at the Charset class in Java
1.4's java.nio.charset package. Its static method availableCharsets()
returns a SortedMap whose keys are the names of the supported encodings.
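Here is a minimal sketch of what that could look like, assuming UTF-8 as the target encoding. The class and method names (Utf8FileWriterSketch, writeUtf8) are invented for the example, and it uses try-with-resources and StandardCharsets, which need Java 7 or later; on 1.4 you would pass the string "UTF-8" to the OutputStreamWriter constructor and close the writer in a finally block.

```java
import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Utf8FileWriterSketch {
    // Writes the characters as UTF-8, regardless of the platform default encoding.
    static void writeUtf8(String fileName, char[] cc) throws IOException {
        try (Writer writer = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(fileName), StandardCharsets.UTF_8))) {
            writer.write(cc);
        }
    }

    public static void main(String[] args) throws IOException {
        // The keys of this map are the names of the encodings this JVM supports.
        for (String name : Charset.availableCharsets().keySet()) {
            System.out.println(name);
        }

        // Three Japanese characters, written as escapes so the source file
        // itself does not depend on any particular encoding.
        char[] cc = "\u65e5\u672c\u8a9e".toCharArray();
        writeUtf8("nihongo.txt", cc);
    }
}
```

The only real change from the FileWriter version is that the encoding is named explicitly instead of being left to the default.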

You may end up using ISO-2022-something, but I prefer Unicode's UTF-8.
It's a lot nicer and cleaner, and it supports almost any language. You
will need Unicode fonts to view the result, though.

An encoding is the mapping between bytes (sequences of 8 bits) and a
higher level of abstraction, namely characters. Streams are byte
oriented, readers/writers are character oriented, and encoding/decoding
sits in between.
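To make the byte/character distinction concrete, here is a small sketch that encodes the same three characters with different encodings and decodes them again. The class EncodingRoundTrip and its helper methods are hypothetical names for this example, and StandardCharsets assumes Java 7 or later.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingRoundTrip {
    // Characters -> bytes: what a Writer does on top of an OutputStream.
    static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    // Bytes -> characters: what a Reader does on top of an InputStream.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String text = "\u65e5\u672c\u8a9e"; // three Japanese characters

        byte[] utf8 = encode(text, StandardCharsets.UTF_8);
        byte[] utf16 = encode(text, StandardCharsets.UTF_16);
        byte[] latin1 = encode(text, StandardCharsets.ISO_8859_1);

        System.out.println("UTF-8 bytes:      " + utf8.length);   // 3 bytes per character
        System.out.println("UTF-16 bytes:     " + utf16.length);  // 2 per character plus a 2-byte byte-order mark
        System.out.println("ISO-8859-1 bytes: " + latin1.length); // 1 per character, each replaced by '?'

        // UTF-8 round-trips; ISO-8859-1 has already lost the characters.
        System.out.println(decode(utf8, StandardCharsets.UTF_8).equals(text));  // true
        System.out.println(decode(latin1, StandardCharsets.ISO_8859_1));        // ???
    }
}
```

The ISO-8859-1 case is the lossy 16-bit to 8-bit conversion mentioned at the top of the thread: characters the encoding cannot represent are silently replaced when they are turned into bytes.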

Hope that helped.
Soren

-- 
Fjern de 4 bogstaver i min mailadresse som er indsat for at hindre s...
Remove the 4 letter word meaning "junk mail" in my mail address.

