Re: [PHP] Re: 0x9f54



Man-wai Chang wrote:
On the other hand, I remember you talked about the type of that
column to be char(2). Have you specified what encoding it's using?
Moreover, I hope you're not using a legacy encoding like Big5 or GB. Use
Unicode (UTF-8) if your database is a brand new one.


Unfortunately, I am still using Big5. You need a longer field to store
UTF-8 codes for the same Big5 string, right?

Yes. While in Big5 every (Chinese) character is represented by two
bytes, every Chinese character represented in UTF-8 uses at least three
bytes (on rare occasions 4 bytes, if very rare characters are used, such
as those in ancient Chinese). This is because UTF-8 is designed to be
8-bit compatible with old data-processing functions. In other words, for
a string containing pure Chinese characters, a UTF-8 one is 50% longer
than a Big5 one (that is, 150% of its size).
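
To put numbers on that, here is a minimal sketch in Python (used only
for brevity; the helper name encoded_sizes is made up for this
illustration). It encodes the same two-character string both ways and
compares the byte counts, which is what matters when sizing a column
declared in bytes.

```python
# Compare the storage needed for the same text in Big5 and in UTF-8.
# encoded_sizes is a hypothetical helper, not part of any library.

def encoded_sizes(text):
    """Return (big5_bytes, utf8_bytes) for the given string."""
    return len(text.encode("big5")), len(text.encode("utf-8"))


big5_len, utf8_len = encoded_sizes("中文")
print(big5_len, utf8_len)   # 4 6
print(utf8_len / big5_len)  # 1.5
```

So a field that holds N Big5 bytes of common Chinese text would need
about 1.5 * N bytes for the same text in UTF-8.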

You could, of course, use UTF-16 as the base format for your
string. In this case, every common character is represented by 2 bytes,
be it a Western Latin character or an Eastern CJK character. OK, yes,
rare characters (those outside the Basic Multilingual Plane) take
4 bytes, but this is rare.
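
A quick way to see this, again as a Python sketch with a made-up helper
name (utf16_size). It uses the "utf-16-le" codec so that no byte-order
mark is added to the count. U+20000 is used here as an example of a
rare CJK character outside the Basic Multilingual Plane.

```python
# Byte length of single characters in UTF-16 (little-endian, no BOM).
# utf16_size is a hypothetical helper for illustration only.

def utf16_size(text):
    """Return the number of bytes the string takes in UTF-16."""
    return len(text.encode("utf-16-le"))


print(utf16_size("A"))           # 2  (Latin letter)
print(utf16_size("中"))           # 2  (common CJK character)
print(utf16_size("\U00020000"))  # 4  (rare CJK character, surrogate pair)
```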

Anyway, you should look at the positive side of using Unicode
instead of the dinosaur encoding, sorry, I mean Big5 :p Hard drives
(and RAM) nowadays are getting really big, so string size should not
be considered the first criterion when choosing an encoding.

Unicode is maintained by an international consortium and it can support
most languages in the world. For instance, using Big5, you can't even
represent the simplest of Western European characters like in these
words: español or français!! But you can represent them using
Unicode. Actually, the ability to represent (Western) European
characters might not interest you. But using Unicode, you can store
both traditional and simplified Chinese! And that, I'm sure, does
interest you. You can't do that in Big5, I'm 100% sure!
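
You can check this kind of thing yourself. The sketch below (Python
again; can_encode is a made-up helper) simply tries to encode a string
and reports whether the codec accepted it. It relies on Python's
built-in "big5" codec, which is one particular Big5 mapping, so treat
the results as indicative rather than as the last word on every Big5
variant. The simplified-Chinese sample 汉语 is my own choice of example.

```python
# Test whether a string can be represented in a given encoding.
# can_encode is a hypothetical helper for illustration only.

def can_encode(text, encoding):
    """Return True if every character of text exists in the encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


samples = ["中文", "español", "français", "汉语"]
for word in samples:
    # Print the word, then whether Big5 and UTF-8 can each hold it.
    print(word, can_encode(word, "big5"), can_encode(word, "utf-8"))
```

UTF-8 accepts all four samples, since it can encode every Unicode
character; the Big5 column shows which ones that codec rejects.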

Still not convinced? Well, Unicode even contains traditional
Chinese characters that Big5 doesn't support. For example, a friend of
mine has this character 驊 in his first name. This character isn't
supported in Big5, and in the pre-Unicode period he had to type (馬華)!
Very stupid! Another example: 氹 is quite a common character in southern
China, but it can't be found in Big5.

So, think about using Unicode. We are in 2007, so be a modern man!


