Re: change ISO8859-1 to GB2312 to UTF-8 to EBCDIC to Big5 to ...
- From: moonhkt <moonhkt@xxxxxxxxx>
- Date: Tue, 25 May 2010 07:18:53 -0700 (PDT)
On May 25, 7:02 pm, RedGrittyBrick <RedGrittyBr...@xxxxxxxxxxxxxxxxx>
wrote:
On 25/05/2010 09:48, moonhkt wrote:
Thank [you]. I am not testing [with] JDBC.
When you wrote "Our database is ISO8859-1 format with some GB2312 and
other non ISO8859-1 data." I got the impression that a DBMS was
involved. If you were using Hibernate or some other framework rather
than JDBC, the same principles would apply.
But [I] tried [converting a] GB2312 file to UTF-8, then [to] BIG5
BIG5! Another character set and encoding! I think that makes seven
you've mentioned in this thread! Any more?
10 TEST1 |测试1
11 TEST2 |测试2
13 TEST4 |测试4
[the program below] can conv[ert a file containing the above data] to UTF-8
When [it] conv[erts from] UTF-8 to BIG5, [it] can not [successfully convert
all characters]. Do you know why?
You are ignoring exceptions. Exceptions might be telling you something
you really need to know about. Don't ignore exceptions.
I'm not familiar with GB2312 and Big5 but I expect that there are
characters in GB2312 that are not in Big5. It is almost certain.
GB2312 originated in the People's Republic of China, where simplified
Chinese characters were mandatory. I think this policy has been relaxed now.
I suspect Big5 originated in either the British colony of Hong Kong or
in the Republic of China (Taiwan/Formosa). In both these places,
Traditional Chinese characters were (and still are) used.
Whether the conversion from GB2312 to UTF-16 and then to Big5 can
convert a simplified character to a traditional counterpart is unknown
to me. Perhaps this causes conversion problems?
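
One way to find out is to ask the encoder to report problems rather than hide them. The following is only a sketch, not the program from this thread: the class name Big5Check and the method encodes are made up for illustration, and it assumes the JRE ships the "GB2312" and "Big5" charsets (they are in the extended charset set, so a stripped-down runtime may lack them). The strings are written as Unicode escapes for the simplified 测试 and the traditional 測試.

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

// Hypothetical helper, not part of the original poster's program.
final class Big5Check {

    /** Returns true if every character of text can be encoded in the named charset. */
    static boolean encodes(String charsetName, String text) {
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            encoder.encode(CharBuffer.wrap(text));
            return true;
        } catch (CharacterCodingException e) {
            // Report the failure instead of swallowing it.
            System.err.println(charsetName + " cannot encode the text: " + e);
            return false;
        }
    }

    public static void main(String[] args) {
        String simplified = "\u6d4b\u8bd5";   // simplified form of "test"
        String traditional = "\u6e2c\u8a66";  // traditional form of "test"
        System.out.println("GB2312, simplified:  " + encodes("GB2312", simplified));
        System.out.println("Big5,   simplified:  " + encodes("Big5", simplified));
        System.out.println("Big5,   traditional: " + encodes("Big5", traditional));
    }
}
```

With CodingErrorAction.REPORT the encoder throws a CharacterCodingException for a character the target charset cannot represent, so a failed conversion shows up as an exception and not as silently damaged output. Note that no charset conversion maps a simplified character to its traditional counterpart; that would need a separate mapping step.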
[I] Checked [the resulting file] with IE, the BIG5 code is [displayed as] "?"
You have to tell IE what encoding to use to display the file. That was
why I wrote HTML markup containing <meta charset="gb2312">. You can
probably force an encoding using a menu option in IE. You certainly can
in Firefox.
If IE does not have access to a font containing the required glyph, it
will display a placeholder character. I don't use IE much so I'm not
certain what placeholder IE displays: a small box, a question mark
or something else.
If Java writes a character that is not present in the specified output
character set then I expect it might also substitute a placeholder
character.
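
A small sketch of that substitution, assuming the "Big5" charset is available in the JRE (the class name ReplacementDemo and the method encodeLossy are invented for this example). String.getBytes(Charset) is documented to replace unmappable characters with the charset's default replacement bytes, so the example compares the result with what the encoder itself says its replacement is rather than hard-coding a value.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical demo class, not taken from the thread.
final class ReplacementDemo {

    /** Encodes text the "quiet" way: unmappable characters are silently replaced. */
    static byte[] encodeLossy(String text, Charset charset) {
        return text.getBytes(charset);
    }

    public static void main(String[] args) {
        Charset big5 = Charset.forName("Big5");
        String simplified = "\u8bd5"; // a simplified character, second half of "test"

        byte[] encoded = encodeLossy(simplified, big5);
        byte[] replacement = big5.newEncoder().replacement();

        System.out.println("encoded bytes:     " + Arrays.toString(encoded));
        System.out.println("replacement bytes: " + Arrays.toString(replacement));
        System.out.println("round trip intact: "
                + simplified.equals(new String(encoded, big5)));
    }
}
```

If the two byte arrays printed are the same, the character was not converted at all; it was swapped for the placeholder, which would explain a "?" in the output file regardless of what the browser does.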
Also Big5 is weird, apparently it doesn't exactly encode characters, it
encodes logograms or parts of graphical characters. It also has to be
paired with a single-byte character-set that isn't specified in the Big5
standard. Also there are variants of Big5. Lots of scope for encoding
issues. Maybe Java and IE disagree about Big5 variants?
<http://en.wikipedia.org/wiki/Big5>
P.S. IE6 is old and a security hazard, I'd upgrade.
--
RGB
Our ISO8859-1 database (a Progress database) contains some Japanese, Korean,
Simplified Chinese and Traditional Chinese data. Text in those languages is
imported by a lookup function, e.g. when the user inputs "G", the lookup
program will get "Green" in the corresponding language's character set.
Also, I checked another GB2312 database (also a Progress database): the
encoded value of "测试" ("TEST" in English) is the same as in the ISO8859-1
database. I checked with the Unix tool "od -ct x1 file_name".
As for the BIG5 conversion, I was just testing how to change GB2312 to BIG5.
My boss asked me to check what the encoded value of "TEST" is in
GB2312 or BIG5. So I want to convert to BIG5 to check the encoded
value in BIG5.
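
The encoded values can also be printed from Java, in the same spirit as the od output. This is a minimal sketch under a few assumptions: the class name EncodingDump and its methods are made up, the JRE provides the "GB2312" and "Big5" charsets, and Big5 is given the traditional spelling 測試 because it has no encoding for the simplified one.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class for inspecting encoded values.
final class EncodingDump {

    /** Formats bytes as space-separated lowercase hex, like "od -t x1". */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    /** Encodes text in the named charset and returns the bytes as hex. */
    static String hex(String text, String charsetName) {
        return hex(text.getBytes(Charset.forName(charsetName)));
    }

    /** True if raw bytes are unchanged after being stored and read back as ISO-8859-1. */
    static boolean survivesLatin1(byte[] raw) {
        String asLatin1 = new String(raw, StandardCharsets.ISO_8859_1);
        return Arrays.equals(raw, asLatin1.getBytes(StandardCharsets.ISO_8859_1));
    }

    public static void main(String[] args) {
        String simplified = "\u6d4b\u8bd5";   // simplified form of "test"
        String traditional = "\u6e2c\u8a66";  // traditional form of "test"

        System.out.println("GB2312: " + hex(simplified, "GB2312"));
        System.out.println("UTF-8:  " + hex(simplified, "UTF-8"));
        System.out.println("Big5:   " + hex(traditional, "Big5"));

        byte[] gb = simplified.getBytes(Charset.forName("GB2312"));
        System.out.println("GB2312 bytes survive ISO-8859-1: " + survivesLatin1(gb));
    }
}
```

The last check is a likely reason both databases show the same values: ISO-8859-1 maps every byte value to exactly one character, so GB2312 bytes pushed through an ISO8859-1 database come back unchanged even though the database has no idea they are Chinese text.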
I will add the exception handling back.
Thanks a lot.
moonhkt