Re: Big5--->GB converter
From: Michael Lee (leeji_at_netvigator.com)
Date: 11/04/03
- Next message: Tor Iver Wilhelmsen: "Re: easy newbie question"
- Previous message: Roedy Green: "Re: Is primitive data variable length limited by computer hardware ?"
- In reply to: terry: "Re: Big5--->GB converter"
- Next in thread: Roedy Green: "Re: Big5--->GB converter"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Date: Wed, 5 Nov 2003 02:21:00 +0800
Converting Big5 text to GB text is not as simple as it seems.
Some facts first:
1. Big5 is the de facto Traditional Chinese encoding scheme.
2. Big5_HKSCS is Big5 plus the Hong Kong Supplementary Character Set, so it
is a superset of Big5. But note that HKSCS is only used in Hong Kong.
3. GBK is the de facto Simplified Chinese encoding scheme.
4. Both Big5_HKSCS and GBK are subsets of Unicode.
5. Only part of GBK overlaps with Big5_HKSCS; neither is a subset of the other.
There is no problem converting Big5 or GBK to Unicode. But due to the facts
listed above, it is obvious that not every character in Big5 has a
corresponding mapping in GBK.
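As a rough way to see which characters fall into that gap, the sketch below
takes text that has already been decoded to a Java String and asks an encoder
whether each character can be encoded in the target charset. The class and
method names are made up for illustration, and main() assumes the runtime
ships the "Big5" and "GBK" charsets (most full JDKs do, but it is not
guaranteed).

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of any library.
class GbkCoverageCheck {

    // Returns the code points in text that the named charset cannot encode.
    static List<Integer> unencodable(String text, String charsetName) {
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder();
        List<Integer> missing = new ArrayList<>();
        text.codePoints().forEach(cp -> {
            String s = new String(Character.toChars(cp));
            if (!encoder.canEncode(s)) {
                missing.add(cp);
            }
        });
        return missing;
    }

    public static void main(String[] args) {
        // Round-trip one character through Big5 to obtain sample Big5 bytes,
        // then decode them back into a String (assumes "Big5" is available).
        Charset big5 = Charset.forName("Big5");
        byte[] big5Bytes = "\u6B61".getBytes(big5);
        String text = new String(big5Bytes, big5);

        // Report every character that has no mapping in GBK.
        for (int cp : unencodable(text, "GBK")) {
            System.out.println("No GBK mapping for U+"
                    + Integer.toHexString(cp).toUpperCase());
        }
    }
}
```

Characters that pass this check are written to GBK unchanged, so a
Traditional character that GBK contains stays Traditional in the output.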
It is still possible to perform such a conversion because almost every Big5
character has a corresponding GBK character *linguistically*, but Java's API
doesn't provide any means to perform this kind of conversion.
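To illustrate what such a *linguistic* conversion could look like, here is a
minimal sketch that decodes Big5, replaces Traditional characters with
Simplified ones through a lookup table, and only then encodes as GBK. The
class name, the method names, and the two-entry table are assumptions for
illustration only. A usable table would need far more entries from an
external source, and some Traditional-to-Simplified mappings depend on
context, which a per-character table cannot express.

```java
import java.nio.charset.Charset;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; the mapping table is illustrative, not complete.
class Big5ToGbSketch {

    private static final Map<Character, Character> TRAD_TO_SIMP = new HashMap<>();
    static {
        TRAD_TO_SIMP.put('\u6B61', '\u6B22'); // U+6B61 -> U+6B22, the example from this thread
        TRAD_TO_SIMP.put('\u570B', '\u56FD'); // U+570B -> U+56FD
    }

    // Replaces each character that has a table entry; all others pass through.
    static String toSimplified(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            char mapped = TRAD_TO_SIMP.getOrDefault(c, c);
            out.append(mapped);
        }
        return out.toString();
    }

    // Big5 bytes -> Unicode String -> Simplified String -> GBK bytes.
    // Assumes the runtime provides the "Big5" and "GBK" charsets.
    static byte[] big5ToGbk(byte[] big5Bytes) {
        String text = new String(big5Bytes, Charset.forName("Big5"));
        return toSimplified(text).getBytes(Charset.forName("GBK"));
    }
}
```

The substitution is done on the decoded String, between the two charset
steps, because that is the only stage where the text is in a single
well-defined form (Unicode).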
Michael Lee
----- Original Message -----
From: "terry" <leonlai2k@yahoo.com>
Newsgroups: comp.lang.java.programmer
Sent: Sunday, November 02, 2003 11:28 PM
Subject: Re: Big5--->GB converter
Gordon Beaton <not@for.email> wrote in message
news:<3fa4eba5@news.wineasy.se>...
> On 2 Nov 2003 03:04:35 -0800, terry wrote:
> > Anyone can give me an example? I know Java is able to do that.
>
> There seem to be several "GB" encodings like GBK, GB18030, x-EUC-CN
> and ISO2022_CN_GB. You can see for yourself here:
> http://java.sun.com/j2se/1.4.2/docs/guide/intl/encoding.doc.html
>
> Create an InputStreamReader and OutputStreamWriter, specifying
> appropriate encodings for each, then simply copy your data from one to
> the other:
>
> // data source
> InputStream is = ...
> InputStreamReader isr = new InputStreamReader(is,"Big5");
> BufferedReader br = new BufferedReader(isr);
>
> // destination
> OutputStream os = ...
> OutputStreamWriter osw = new OutputStreamWriter(os,"GBK");
> BufferedWriter bw = new BufferedWriter(osw);
>
> String line;
>
> while ((line = br.readLine()) != null) {
>     bw.write(line);
>     bw.newLine();
> }
>
> br.close();
> bw.close();
>
> /gordon
However, I find that the character returned is incorrect.
For example, 歡(6B61) returns (欢)9919.
But I have found that 欢 is 6B22.