Re: change ISO8859-1 to GB2312 to UTF-8 to EBCDIC to Big5 to ...
- From: RedGrittyBrick <RedGrittyBrick@xxxxxxxxxxxxxxxxx>
- Date: Tue, 25 May 2010 12:02:10 +0100
On 25/05/2010 09:48, moonhkt wrote:
Thank [you]. I am not testing [with] JDBC.
When you wrote "Our database is ISO8859-1 format with some GB2312 and other non ISO8859-1 data." I got the impression that a DBMS was involved. If you were using Hibernate or some other framework rather than JDBC, the same principles would apply.
But [I] tried [converting a] GB2312 file to UTF-8, then [to] BIG5
BIG5! Another character set and encoding! I think that makes seven you've mentioned in this thread! Any more?
10 TEST1 |测试1
11 TEST2 |测试2
13 TEST4 |测试4
[the program below] can conv[ert a file containing the above data] to UTF-8
When [it] conv[erts from] UTF-8 to BIG5, [it] cannot [successfully convert
all characters]. Do you know why?
You are ignoring exceptions. Exceptions might be telling you something you really need to know about. Don't ignore exceptions.
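Your program isn't quoted here, so this is only a minimal sketch of the same GB2312 to UTF-8 to Big5 conversion with error reporting switched on. A decoder or encoder configured with CodingErrorAction.REPORT throws a CharacterCodingException when it meets a character it cannot map, instead of quietly substituting something. The class name and file names are hypothetical; Files and Paths need Java 7.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.file.Files;
import java.nio.file.Paths;

public class StrictTranscode {

    /**
     * Decodes bytes from one charset and re-encodes them in another,
     * throwing CharacterCodingException rather than silently replacing
     * malformed or unmappable characters.
     */
    static byte[] transcode(byte[] input, Charset from, Charset to)
            throws CharacterCodingException {
        CharsetDecoder decoder = from.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        CharsetEncoder encoder = to.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        CharBuffer chars = decoder.decode(ByteBuffer.wrap(input));
        ByteBuffer out = encoder.encode(chars);
        byte[] result = new byte[out.remaining()];
        out.get(result);
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical file names: substitute your own.
        byte[] gb = Files.readAllBytes(Paths.get("test_gb2312.txt"));
        try {
            byte[] utf8 = transcode(gb, Charset.forName("GB2312"), Charset.forName("UTF-8"));
            Files.write(Paths.get("test_utf8.txt"), utf8);
            byte[] big5 = transcode(utf8, Charset.forName("UTF-8"), Charset.forName("Big5"));
            Files.write(Paths.get("test_big5.txt"), big5);
        } catch (CharacterCodingException e) {
            // Don't swallow this: it is saying the data doesn't fit the target charset.
            System.err.println("Conversion failed: " + e);
        }
    }
}
```

If the Big5 step throws, that is the answer to "why": the exception is reporting characters that Big5 has no code for.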
I'm not familiar with GB2312 and Big5, but I expect, indeed it is almost certain, that there are characters in GB2312 that are not in Big5.
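Rather than guessing, you can ask Java which characters a charset cannot represent. A minimal sketch using CharsetEncoder.canEncode, checking one code point at a time; the class and method names are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.util.ArrayList;
import java.util.List;

public class Big5Check {

    /** Returns each code point in text (as a String) that the target charset cannot encode. */
    static List<String> findUnmappable(String text, Charset target) {
        CharsetEncoder encoder = target.newEncoder();
        List<String> missing = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            String ch = new String(Character.toChars(cp)); // handles surrogate pairs
            if (!encoder.canEncode(ch)) {
                missing.add(ch);
            }
            i += Character.charCount(cp);
        }
        return missing;
    }

    public static void main(String[] args) {
        Charset big5 = Charset.forName("Big5");
        // "\u6D4B\u8BD5" is the simplified-character string from your data.
        String sample = "TEST1 |\u6D4B\u8BD5" + "1";
        System.out.println(sample + " -> not in Big5: " + findUnmappable(sample, big5));
    }
}
```

I'd expect this to list the simplified characters from your test data, on the assumption that Big5 only carries their traditional forms, but I haven't checked that against every Big5 variant.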
GB2312 originated in the People's Republic of China, where simplified Chinese characters were mandatory. I think this policy has been relaxed now.
I suspect Big5 originated in either the British colony of Hong Kong or in the Republic of China (Taiwan/Formosa). In both these places, Traditional Chinese characters were (and still are) used.
Whether the conversion from GB2312 to UTF-16 and then to Big5 can convert a simplified character to a traditional counterpart is unknown to me. Perhaps this causes conversion problems?
[I] checked [the resulting file] with IE; the BIG5 code is [displayed as] "?"
You have to tell IE what encoding to use to display the file. That was why I wrote HTML markup containing <meta charset="gb2312">. You can probably force an encoding using a menu option in IE. You certainly can in Firefox.
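A minimal sketch of keeping the declared encoding and the actual bytes in step: the same Charset object drives both the OutputStreamWriter and the meta element. The class, method and output file names are hypothetical, and the escaping is only the bare minimum for a demo.

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class HtmlWithCharset {

    /** Minimal escaping so the text cannot be mistaken for markup. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /**
     * Writes text into an HTML page, encoding the bytes with cs and declaring
     * the same charset in a meta element so the browser need not guess.
     */
    static void writeHtml(Path file, String text, Charset cs) throws IOException {
        try (Writer w = new OutputStreamWriter(Files.newOutputStream(file), cs)) {
            w.write("<html><head><meta charset=\"" + cs.name() + "\"></head>\n");
            w.write("<body><pre>\n" + escape(text) + "\n</pre></body></html>\n");
        }
    }

    public static void main(String[] args) throws IOException {
        // "\u6E2C\u8A66" is the traditional-character form of the test word.
        writeHtml(Paths.get("test_big5.html"), "TEST1 |\u6E2C\u8A66" + "1", Charset.forName("Big5"));
    }
}
```

Note that this writer does not report unmappable characters; it only makes sure the browser is told the right encoding for whatever bytes were written.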
If IE does not have access to a font containing the required glyph, it will display a placeholder character. I don't use IE much, so I'm not certain which placeholder IE displays: a small box, a question mark or something else.
If Java writes a character that is not present in the specified output character set then I expect it might also substitute a placeholder character.
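For String.getBytes(Charset), at least, the Javadoc is explicit: unmappable characters are always replaced with the charset's default replacement bytes, which are often, but not always, a single '?'. A small sketch that shows the substitution by encoding and decoding again; the class and method names are made up for illustration.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

public class ReplacementDemo {

    /** Encodes with String.getBytes (which never throws for unmappable input), then decodes. */
    static String roundTrip(String s, Charset cs) {
        return new String(s.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        Charset big5 = Charset.forName("Big5");
        // The bytes this charset's encoder substitutes for characters it cannot map.
        System.out.println("replacement: " + Arrays.toString(big5.newEncoder().replacement()));
        // "\u6D4B\u8BD5" is the simplified-character string from the test data.
        String original = "TEST1 |\u6D4B\u8BD5";
        String back = roundTrip(original, big5);
        System.out.println(original + " -> " + back + (back.equals(original) ? "" : "  (changed!)"));
    }
}
```

If the round trip changes the string, some characters were lost without any exception being thrown.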
Big5 is also weird: apparently it doesn't exactly encode characters, it encodes logograms or parts of graphical characters. It also has to be paired with a single-byte character set that isn't specified in the Big5 standard, and there are several variants of Big5. Lots of scope for encoding issues. Maybe Java and IE disagree about Big5 variants?
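You can see the pairing with a single-byte set in the bytes Java produces: ASCII characters come out as one byte each and Chinese characters as two. A minimal sketch that dumps the Big5 bytes per character in hex; the class and method names are hypothetical, and it only reflects Java's own "Big5" mapping, not whichever variant IE assumes.

```java
import java.nio.charset.Charset;

public class Big5Bytes {

    /** Returns "char=HEX" for each code point of s, as encoded in cs, separated by spaces. */
    static String hexPerChar(String s, Charset cs) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            String ch = new String(Character.toChars(cp));
            sb.append(ch).append('=');
            for (byte b : ch.getBytes(cs)) {
                sb.append(String.format("%02X", b & 0xFF)); // unsigned hex
            }
            sb.append(' ');
            i += Character.charCount(cp);
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // "\u6E2C\u8A66" is the traditional-character form of the test word.
        System.out.println(hexPerChar("T1\u6E2C\u8A66", Charset.forName("Big5")));
    }
}
```

Comparing this dump with the bytes actually in your output file is one way to tell whether the data or the browser's idea of Big5 is at fault.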
<http://en.wikipedia.org/wiki/Big5>
P.S. IE6 is old and a security hazard; I'd upgrade.
--
RGB