Re: Fast UTF-8 strlen function



Beth wrote:
Randy wrote:

Sevag Krikorian wrote:

If you're going with new library routines, why not use UTF-32 instead?

UTF-32 is an *okay* internal format, but AFAICT it's not widely
accepted as an external format.


Yeah; basically because it's "too fat"...4 bytes per character is a bit of a
"steep" price to pay, especially if you're not interested in the East Asian
ideographs, because that's about all that's up there in the >64K range...so,
that's, like, 2 "pad bytes" on every character...

Worse, the Unicode people say they only ever intend to go up to a bit over
2^20 code points (U+10FFFF, which fits in 21 bits), anyway...so, that's over
a byte per character used for absolutely nothing but "alignment", really...

[ It isn't one of the Unicode standard encodings, in fact, but perhaps a
UTF-24 - three bytes per character - could be added? Yeah, that's a little
"unusual"...but don't forget that pixels are already 24-bit in "true colour"
modes...so, "unusual" but not unheard of... ]
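To put rough numbers on the "wasted bits" argument, here is a minimal C sketch. It assumes a purely hypothetical 3-byte "UTF-24" form (it is not a Unicode encoding), and the little-endian byte order plus the names bits_needed, utf24_put, utf24_get, utf24_length and utf32_length are made up for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Highest code point Unicode defines: U+10FFFF. */
#define UNICODE_MAX 0x10FFFFu

/* Number of bits needed to represent v (0 for v == 0). */
static unsigned bits_needed(uint32_t v)
{
    unsigned n = 0;
    while (v != 0) {
        n++;
        v >>= 1;
    }
    return n;
}

/* Hypothetical "UTF-24": store one code point in 3 bytes.
   Little-endian order is an arbitrary choice for this sketch. */
static void utf24_put(unsigned char *dst, uint32_t cp)
{
    assert(cp <= UNICODE_MAX);
    dst[0] = (unsigned char)(cp & 0xFF);
    dst[1] = (unsigned char)((cp >> 8) & 0xFF);
    dst[2] = (unsigned char)((cp >> 16) & 0xFF);
}

/* Read back one code point from the hypothetical 3-byte form. */
static uint32_t utf24_get(const unsigned char *src)
{
    return (uint32_t)src[0]
         | ((uint32_t)src[1] << 8)
         | ((uint32_t)src[2] << 16);
}

/* With a fixed-width form, length in characters is a plain division. */
static size_t utf24_length(size_t byte_count) { return byte_count / 3; }
static size_t utf32_length(size_t byte_count) { return byte_count / 4; }
```

bits_needed(UNICODE_MAX) comes out at 21, so a 32-bit unit leaves 11 bits unused and a 24-bit unit leaves 3. The same 12 bytes of storage hold 4 characters in the 3-byte form against 3 in UTF-32, at the cost of unaligned accesses.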


So many different character formats, it's just insane. Everyone should just speak English and be done with it!

It would be nice to come up with a new phonetic alphabet that just uses
the standard 27 keys. If you drop the redundancies in English characters,
that would free up several possible keys for adding new 'sounds' ... that
also leaves plenty of space in the byte for useful 'symbolic' characters.


e.g.:
ku = 'q' -- frees up 'q' for a new sound
s - c - k -- just use 's' for all 's' sounds and soft 'c' sounds, which frees up 'c';
use 'k' for all 'k' sounds and for 'c' in words where it sounds like 'k'
use 'c' for 'ch' and change 'ch' to the guttural version, as in "loch" or "ach"

It's possible to fit all the sounds in use by all languages into 27 keys, along with 2-key combos.

I have no problem with dropping the Armenian alphabet (36 characters, with several redundancies) in exchange for a phonetic alphabet that uses Latin characters. But alas, many people are too stubborn or proud to do so, and we're stuck with UTF-xx.

--
[kain]
http://www.geocities.com/kahlinor