Re: Fast UTF-8 strlen function
- From: Sevag Krikorian <kahlinor@xxxxxxxxxxxxx>
- Date: Wed, 11 May 2005 21:53:59 -0400
Beth wrote:
Randy wrote:
Sevag Krikorian wrote:
If you're going with new library routines, why not use UTF-32
instead?
UTF-32 is an *okay* internal format, but AFAICT it's not widely accepted as an external format.
Yeah; basically because it's "too fat"...4 bytes per character is a bit of a "steep" price to pay, especially if you're not interested in the oriental ideographs because that's about all that's up there in the >64K range...so, that's, like, 2 "pad bytes" on every character...
Worse, UNICODE says they're only ever interested in going up to some 2^20 characters, anyway...so, that's over a byte per character used for absolutely nothing but "alignment", really...
[ Indeed, it isn't one of the UNICODE standard encodings, in fact, but perhaps a UTF-24 - three bytes per character - could be added? Yeah, that's a little "unusual"...but don't forget that pixels are already 24-bit in "true colour" modes...so, "unusual" but not unheard of... ]
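To put rough numbers on the size argument above, here is a minimal C sketch, not anyone's actual routine from this thread. It assumes valid, NUL-terminated UTF-8 input, and the function names (utf8_count, utf8_bytes, utf32_bytes, utf24_bytes) are made up for illustration. The "UTF-24" figure is the hypothetical 3-byte packing mentioned above, which is not a Unicode standard encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count code points in a NUL-terminated UTF-8 string.
   Assumption: the input is valid UTF-8, so every byte that is NOT a
   continuation byte (bit pattern 10xxxxxx) starts a new code point. */
size_t utf8_count(const char *s)
{
    assert(s != NULL);
    size_t n = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p; ++p) {
        if ((*p & 0xC0) != 0x80)   /* lead byte or plain ASCII */
            ++n;
    }
    return n;
}

/* Storage cost in bytes, terminator excluded, under each scheme. */
size_t utf8_bytes(const char *s)  { return strlen(s); }

/* UTF-32: a fixed 4 bytes per code point. */
size_t utf32_bytes(const char *s) { return 4 * utf8_count(s); }

/* Hypothetical 3-byte packing; illustrative only, not a standard. */
size_t utf24_bytes(const char *s) { return 3 * utf8_count(s); }
```

For a mostly-Latin string such as "h\xC3\xA9llo" (five code points, one of them two bytes long in UTF-8), the three functions give 6, 20 and 15 bytes respectively, which is the "pad bytes" complaint in concrete form.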
So many different character formats, it's just insane. Everyone should just speak English and be done with it!
It would be nice to come up with a new phonetic alphabet that just uses the standard 27 keys. If you drop the redundancies in English characters, that would free up several possible keys for adding new 'sounds' ... that also leaves plenty of space in the byte for useful 'symbolic' characters.
e.g.:
ku = 'q' -- frees up 'q' for a new sound
s - c - k -- just use 's' for all 's' and 'c' sounds, which frees up 'c'
use 'k' for all 'k' words and 'c' words that sound like 'k'
use 'c' for 'ch', and change 'ch' to the guttural version, as in "loch" or "ach"
It's possible to fit all sounds in use by all languages in 27 keys along with 2 key combos.
I have no problem with dropping the Armenian alphabet: 36 characters with several redundancies in exchange for a phonetic alphabet that uses Latin characters. But alas, many people are too stubborn or proud to do so and we're stuck with UTF-xx
-- [kain] http://www.geocities.com/kahlinor .