Re: Big5--->GB converter

From: Michael Lee (leeji_at_netvigator.com)
Date: 11/04/03


Date: Wed, 5 Nov 2003 02:21:00 +0800

Converting Big5 text to GB text is not as simple as it seems.

Some facts first:

1. Big5 is the de facto Traditional Chinese encoding scheme.

2. Big5_HKSCS is Big5 plus the Hong Kong Supplementary Character Set, so it
is a superset of Big5. But note that HKSCS is only used in Hong Kong.

3. GBK is the de facto Simplified Chinese encoding scheme.

4. Both Big5_HKSCS and GBK are subsets of Unicode.

5. GBK and Big5_HKSCS overlap only partially: some GBK characters are also
in Big5_HKSCS, but neither set contains the other.

There is no problem converting Big5 or GBK to Unicode. But due to the facts
listed above, it is obvious that not every character in Big5 has a
corresponding mapping in GBK.
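
One way to see which characters fall through the gap is to ask the GBK
encoder directly. The following is only a rough sketch: the class name
GbkCoverage and the method unmappable are made up for illustration, and it
assumes your JRE ships the GBK charset (the standard full JDK does).

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class GbkCoverage {
    /**
     * Returns the characters of text that the GBK encoder cannot represent.
     * The text is assumed to be already decoded (e.g. from Big5) into a String.
     */
    public static String unmappable(String text) {
        CharsetEncoder gbk = Charset.forName("GBK").newEncoder();
        StringBuilder missing = new StringBuilder();
        // Walk by code point so surrogate pairs are tested as one character.
        text.codePoints().forEach(cp -> {
            String s = new String(Character.toChars(cp));
            if (!gbk.canEncode(s)) {
                missing.appendCodePoint(cp);
            }
        });
        return missing.toString();
    }
}
```

Running unmappable() over the decoded Big5 text before writing it out tells
you which characters would otherwise be silently replaced in the output.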

It is still possible to perform such a conversion because almost every Big5
character has a corresponding GBK character *linguistically*, but Java's API
doesn't provide any means to perform this kind of conversion.
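
If you want the linguistic conversion, you have to supply the
Traditional-to-Simplified table yourself and apply it between decoding and
encoding. Below is a minimal sketch, not a complete solution: the class
TradToSimp, its methods, and the two-entry table are hypothetical, and a real
table needs thousands of entries (and cannot be a pure one-to-one character
map in every case).

```java
import java.nio.charset.Charset;
import java.util.HashMap;
import java.util.Map;

public class TradToSimp {
    // Hypothetical, deliberately tiny table for illustration only.
    private static final Map<Character, Character> TABLE = new HashMap<>();
    static {
        TABLE.put('\u6B61', '\u6B22'); // Traditional U+6B61 -> Simplified U+6B22
        TABLE.put('\u570B', '\u56FD'); // Traditional U+570B -> Simplified U+56FD
    }

    /** Replaces each character that has a table entry; others pass through. */
    public static String simplify(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            char mapped = TABLE.getOrDefault(c, c);
            out.append(mapped);
        }
        return out.toString();
    }

    /** Decodes Big5 bytes, maps the characters, then encodes as GBK. */
    public static byte[] big5ToGbSimplified(byte[] big5) {
        String text = new String(big5, Charset.forName("Big5"));
        return simplify(text).getBytes(Charset.forName("GBK"));
    }
}
```

The point of the design is that the charset conversion alone never changes
which Unicode character you have; only the table lookup in simplify() does.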

Michael Lee

----- Original Message -----
From: "terry" <leonlai2k@yahoo.com>
Newsgroups: comp.lang.java.programmer
Sent: Sunday, November 02, 2003 11:28 PM
Subject: Re: Big5--->GB converter

Gordon Beaton <not@for.email> wrote in message
news:<3fa4eba5@news.wineasy.se>...
> On 2 Nov 2003 03:04:35 -0800, terry wrote:
> > Anyone can give me an example? I know Java is able to do that.
>
> There seem to be several "GB" encodings like GBK, GB18030, x-EUC-CN
> and ISO2022_CN_GB. You can see for yourself here:
> http://java.sun.com/j2se/1.4.2/docs/guide/intl/encoding.doc.html
>
> Create an InputStreamReader and OutputStreamWriter, specifying
> appropriate encodings for each, then simply copy your data from one to
> the other:
>
> // data source
> InputStream is = ...
> InputStreamReader isr = new InputStreamReader(is,"Big5");
> BufferedReader br = new BufferedReader(isr);
>
> // destination
> OutputStream os = ...
> OutputStreamWriter osw = new OutputStreamWriter(os,"GBK");
> BufferedWriter bw = new BufferedWriter(osw);
>
> String line;
>
> while ((line = br.readLine()) != null) {
>     bw.write(line);
>     bw.newLine();
> }
>
> br.close();
> bw.close();
>
> /gordon

However, I find that the character returned is incorrect.
For example, 歡(6B61) returns (欢)9919.
But I have found that 欢 is 6B22.


