Re: More elegant UTF-8 encoder



websnarf@xxxxxxxxx writes:
> On Jun 15, 3:26 am, rich...@xxxxxxxxxxxxxxx (Richard Tobin) wrote:
>> In article <1181783286.771652.130...@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>,
>> <websn...@xxxxxxxxx> wrote:
>>> On a modern processor you are getting your ass kicked on the control
>>> flow.
>>
>> That depends. You also need to take into account the distribution of
>> your data. If it consists only of English text, then 99.9% of the
>> characters will be ASCII, so an immediate test
>>
>>     if (c < 0x80) return c;
>>
>> is a big win. If you include western European languages, it will
>> still get about 90% of characters.
>
> Tell that to the Greeks, French or Russians. The above is a good
> idea, basically for English, and may be ok for Spanish and German.

Greek and Russian are not western European languages. French uses
accented characters, but the majority of typical French text is plain
ASCII, n'est-ce pas?
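
For concreteness, here is a minimal sketch of an encoder with the ASCII test placed first, as discussed above. It is not the code from earlier in the thread: the name utf8_encode, the output-buffer interface, and the choice to return 0 for surrogates and out-of-range values are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one code point as UTF-8 into out, which
   must have room for 4 bytes. Returns the number of bytes written, or
   0 if cp is a surrogate or lies above 0x10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) {                    /* ASCII: the common case, tested first */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                   /* two bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                 /* three bytes, excluding surrogates */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {               /* four bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                           /* not a valid code point */
}
```

With this ordering, any text that is mostly ASCII leaves through the first branch on nearly every call. Accented Latin letters, Greek and Cyrillic all fall through to the two-byte branch, so how much the early test buys you depends on the data.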

[...]

--
Keith Thompson (The_Other_Keith) kst-u@xxxxxxx <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"