Re: change ISO8859-1 to GB2312 to UTF-8 to EBCDIC to Big5 to ...



On 25/05/2010 09:48, moonhkt wrote:

Thank [you]. I am not testing [with] JDBC.

When you wrote "Our database is ISO8859-1 format with some GB2312 and other non ISO8859-1 data." I got the impression that a DBMS was involved. If you were using Hibernate or some other framework rather than JDBC, the same principles would apply.


But [I] tried [converting a] GB2312 file to UTF-8, then [to] BIG5

BIG5! Another character set and encoding! I think that makes seven you've mentioned in this thread! Any more?


10 TEST1 |测试1
11 TEST2 |测试2
13 TEST4 |测试4

[the program below] can conv[ert a file containing the above data] to UTF-8

When [it] conv[erts from] UTF-8 to BIG5, [it] can not [successfully convert
all characters]. Do you know why?

You are ignoring exceptions. Exceptions might be telling you something you really need to know about. Don't ignore exceptions.

I'm not familiar with GB2312 and Big5 but I expect that there are characters in GB2312 that are not in Big5. It is almost certain.

GB2312 originated in the People's Republic of China, where simplified Chinese characters were mandatory. I think this policy has been relaxed now.

I suspect Big5 originated in either the British colony of Hong Kong or in the Republic of China (Taiwan/Formosa). In both these places, Traditional Chinese characters were (and still are) used.

Whether the conversion from GB2312 to UTF-16 and then to Big5 can convert a simplified character to a traditional counterpart is unknown to me. Perhaps this causes conversion problems?
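One way to test that suspicion is to ask Java which characters a given charset can encode. The following is only a minimal sketch: the class name `Big5Coverage` and the helper `unmappable` are made up for illustration, the sample strings are the simplified and traditional spellings of the test data above written as Unicode escapes, and it assumes the JDK in use ships the extended charsets "Big5" and "GB2312".

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class Big5Coverage {

    /** Returns the characters of text that the named charset cannot encode. */
    static String unmappable(String text, String charsetName) {
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder();
        StringBuilder missing = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            // Work on whole code points so surrogate pairs are not split.
            int codePoint = text.codePointAt(i);
            String ch = new String(Character.toChars(codePoint));
            if (!encoder.canEncode(ch)) {
                missing.append(ch);
            }
            i += Character.charCount(codePoint);
        }
        return missing.toString();
    }

    public static void main(String[] args) {
        // Simplified form of the sample data: U+6D4B U+8BD5 followed by '1'.
        String simplified = "\u6d4b\u8bd51";
        // Traditional counterparts: U+6E2C U+8A66 followed by '1'.
        String traditional = "\u6e2c\u8a661";

        System.out.println("simplified, not in Big5:    " + unmappable(simplified, "Big5"));
        System.out.println("traditional, not in Big5:   " + unmappable(traditional, "Big5"));
        System.out.println("traditional, not in GB2312: " + unmappable(traditional, "GB2312"));
    }
}
```

Note that `canEncode` only reports whether a mapping exists. It does not turn a simplified character into its traditional counterpart; that would need a separate mapping table.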


[I] Checked [the resulting file] with IE, the BIG5 code is [displayed as] "?"

You have to tell IE what encoding to use to display the file. That was why I wrote HTML markup containing <meta charset="gb2312">. You can probably force an encoding using a menu option in IE. You certainly can in Firefox.

If IE does not have access to a font containing the required glyph, it will display a placeholder character. I don't use IE much, so I'm not certain what placeholder IE displays: a small box, a question mark or something else.

If Java writes a character that is not present in the specified output character set then I expect it might also substitute a placeholder character.
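A sketch of how to see that substitution happen, and how to make Java report it instead. The class and method names here are hypothetical. `String.getBytes(Charset)` is documented to replace unmappable characters with the charset's replacement bytes (typically "?"), while a `CharsetEncoder` configured with `CodingErrorAction.REPORT` throws an exception. This assumes the "Big5" charset is available in the JDK in use.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Big5Replacement {

    /** Lenient: unmappable characters are silently replaced. */
    static byte[] encodeLenient(String text, String charsetName) {
        return text.getBytes(Charset.forName(charsetName));
    }

    /** Strict: unmappable or malformed input raises an exception. */
    static byte[] encodeStrict(String text, String charsetName)
            throws CharacterCodingException {
        ByteBuffer out = Charset.forName(charsetName)
                .newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .encode(CharBuffer.wrap(text));
        byte[] bytes = new byte[out.remaining()];
        out.get(bytes);
        return bytes;
    }

    public static void main(String[] args) {
        // Simplified-Chinese sample from the post: U+6D4B U+8BD5 followed by '1'.
        String sample = "\u6d4b\u8bd51";

        byte[] lenient = encodeLenient(sample, "Big5");
        // Big5 is ASCII-compatible for single bytes, so this shows any '?' substitutes.
        System.out.println("lenient: " + new String(lenient, StandardCharsets.US_ASCII));

        try {
            encodeStrict(sample, "Big5");
            System.out.println("strict: encoded without loss");
        } catch (CharacterCodingException e) {
            // Report the problem rather than ignoring it.
            System.out.println("strict: " + e);
        }
    }
}
```

If the question marks are already in the bytes Java wrote, no browser encoding setting will bring the original characters back.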

Also, Big5 is weird. Apparently it doesn't exactly encode characters; it encodes logograms or parts of graphical characters. It also has to be paired with a single-byte character set that isn't specified in the Big5 standard, and there are several variants of Big5. That leaves lots of scope for encoding issues. Maybe Java and IE disagree about Big5 variants?
<http://en.wikipedia.org/wiki/Big5>

P.S. IE6 is old and a security hazard; I'd upgrade.
--
RGB