Re: Fast UTF-8 strlen function
- From: Sevag Krikorian <kahlinor@xxxxxxxxxxxxx>
- Date: Thu, 12 May 2005 01:18:18 -0400
NoDot wrote:
Sevag Krikorian wrote:
So many different character formats, it's just insane. Everyone should just speak English and be done with it!
I think Beth's already commented along these lines.
It would be nice to come up with a new phonetic alphabet that just uses the standard 27 keys. If you drop the redundancies in English characters, that would free up several possible keys for adding new 'sounds' ... that also leaves plenty of space in the byte for useful 'symbolic' characters.
eg:
ku = 'q' -- frees up 'q' for a new sound
s - c - k -- just use 's' for all 's' and 'c' sounds, free up 'c'
use 'k' for all 'k' and 'c' words that sound like 'k'
use 'c' for ch and change 'ch' to the guttural version "loch" or "ach"
It's possible to fit all sounds in use by all languages in 27 keys along with 2 key combos.
Um, look at the phonetics of the word "through". The "th" sound is a consonant sound that really could go almost anywhere.
Keep in mind that a word like "through" had a different pronunciation at an earlier time in history, with a more guttural sound for "gh".
For the modern representation using phonetic sounds, I think: th-r-u
Don't forget, also, the Japanese "*y*" sets made of a consonant at the start, the "y", and a vowel that goes with "y" ("a", "o", and "u", if memory serves). Next, my favorite Japanese syllable: "tsu". It's one that takes a few moments of practice, but it's fun to say. Lastly, we have the Japanese "n" syllable, and it's the only case of repeating with "na", "ne", "ni", "no", "nu", and "n".
Isn't that the case with all languages? The vowel modifies the consonant. It seems you already have the representations of Japanese syllables using Latin characters.
The case of a syllable like "tsu" can be broken down, it is already a consonant "ts" + a vowel "u".
When you write "tsu" do you mean "ts" as in "boo_ts_" + u, or more a "dsu" as in "bu_ds_" + u? Sorry, not familiar with Japanese. If it's like Chinese, I expect it to sound like "dsu" as in Sun Tsu.
The point is any sound can be represented by 27 characters + combos if there is international agreement on what letters represent what sounds.
A funny thing with the "ts" sound: an English-speaking friend of mine had difficulty pronouncing it, which surprised me, since it is already used in English ("boo_ts_"). He could easily pronounce "boots" but not "ts" by itself.
(Oh, and what vowels are we going to use? Japanese uses "a" ("father"), "e" (short 'e'), "i" (long 'e'), "o" ("'Oh,' you don't say!"), and "u" ("through"). Should the short 'i' ("bit") be included?)
All the vowels are already in the standard Latin set. We'll keep all those as they are important in phonetic combos. Short vowels/consonants can be represented by a back-quote character "`". That leaves tilde '~' for maybe long vowels/consonants. If 'hard' is desired, use the same letter twice: "kk" for hard "k".
Let's step back for a minute and say you did try coming up with this: what kind of culture would it be aimed at? If it comes up online, then you'll definitely need Information Technology terminology, acronyms, and other things in the vocabulary. Give it a few seconds' thought, and you'll see it isn't such an easy problem.
I'm not talking about a new language (though that would be nice for the linguists to figure out). I'm talking about people writing their native language using the Latin-characters.
For example, I write these Hai phrases in Latin characters:
How are you? "inch bes es?"
However, if I want to ask if you want an apple:
"kh'ntsor guzes?"
Starts to get confusing without a strict set of rules.
"kh" guttural, like the "ch" in the German "ach"
'n short n
ts like the "ts" in "boots"
The Armenian alphabet has 36 different syllables; remove the redundancies and you still have about 32. Yet with letter combos, every syllable can be represented by the Latin character set as it is now. The Latin character set can even be 'cleaned up' by removing the redundancies there, to make room for more syllables and keep down the sheer number of combos.
So there really is no need for me to use a special keyboard or cumbersome character format to see a graphical representation of the Armenian alphabet.
-- [kain] http://www.geocities.com/kahlinor
NoDot, ...
P.S. The phonetics of "through" might turn out to be "thru", for the interested.
Hey, we came up with the same thing!
Here is my quick and dirty chart:

a     a in car
b     b in bat
c     ch in chore
ch    ch in German ach
d     d in door
dh    th in that
ds    ds in buds
e     a in cat
'e    i in bit
f     f in fat
g     g in good
gh    hard guttural, no equivalent
h     h in hat
i     ee in feel
j     j in jog
jh    j in French Jean
k     k in took
'k    ke in rake
kh    possible alternate for "ch" above
l     l in late
m     m in moon
n     n in moon
o     o in oh
p     p in peer
q     free
r     r in rat
s     s in seer
t     t in rat
th    th in through
ts    ts in boots
u     oo in door
v     v in vat
w     w in wool
x     free
y     y in yield
z     z in zoo
It's a pretty comprehensive list which leaves 2 free ones. Also, if we use "kh" instead of the proposed "ch", move the proposed "c" to "ch", that will free up another letter "c" for whatever I missed. Some work still needs to be done on the vowels, but nothing that can't be worked out to fit easily in 7 bits.
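
To make the "strict set of rules" point concrete, here is a rough sketch of how text written with the chart above could be split back into its sound units. The names UNITS and tokenize are made up for illustration, and the longest-match-first rule is an assumption on my part, not something the chart specifies. The last line just checks that every symbol used is plain 7-bit ASCII.

```python
# Units copied from the chart above (the free letters 'q' and 'x' are left out).
# Entries starting with an apostrophe are the chart's marked variants.
UNITS = [
    "a", "b", "c", "ch", "d", "dh", "ds", "e", "'e", "f", "g", "gh", "h",
    "i", "j", "jh", "k", "'k", "kh", "l", "m", "n", "o", "p", "r", "s",
    "t", "th", "ts", "u", "v", "w", "y", "z",
]

# Assumption: try two-character units before single letters (longest match first).
BY_LENGTH = sorted(UNITS, key=len, reverse=True)


def tokenize(word):
    """Split a romanized word into chart units, longest unit first."""
    tokens = []
    i = 0
    while i < len(word):
        for unit in BY_LENGTH:
            if word.startswith(unit, i):
                tokens.append(unit)
                i += len(unit)
                break
        else:
            # No unit in the chart starts at this position (e.g. a free letter).
            raise ValueError(f"no chart unit matches at position {i} in {word!r}")
    return tokens


# True when every character used by the chart has a code point below 128.
ALL_7_BIT = all(ord(ch) < 128 for unit in UNITS for ch in unit)

print(tokenize("thru"))   # ['th', 'r', 'u']
print(ALL_7_BIT)          # True
```

Longest-match has an obvious weakness: a "t" followed by an "s" that belong to different syllables would still be read as "ts", so a real scheme would need a separator or a rule for that case.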
-- [kain] http://www.geocities.com/kahlinor