Re: More elegant UTF-8 encoder
- From: Keith Thompson <kst-u@xxxxxxx>
- Date: Fri, 15 Jun 2007 22:53:40 -0700
websnarf@xxxxxxxxx writes:
> On Jun 15, 3:26 am, rich...@xxxxxxxxxxxxxxx (Richard Tobin) wrote:
>> In article <1181783286.771652.130...@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>,
>> <websn...@xxxxxxxxx> wrote:
>>> On a modern processor you are getting your ass kicked on the control
>>> flow.
>>
>> That depends. You also need to take into account the distribution of
>> your data. If it consists only of English text, then 99.9% of the
>> characters will be ASCII, so an immediate test
>>
>>     if (c < 0x80) return c;
>>
>> is a big win. If you include western European languages, it will
>> still get about 90% of characters.
>
> Tell that to the Greeks, French or Russians. The above is a good
> idea, basically for English, and may be ok for Spanish and German.

Greek and Russian are not western European languages. French uses
accented characters, but the majority of typical French text is plain
ASCII, n'est-ce pas?
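
For concreteness, here is a minimal sketch of the kind of encoder under discussion, with the ASCII test placed first so that mostly-ASCII input takes a single predictable branch. It is not the code from earlier in the thread; the names utf8_encode and ascii_fraction are hypothetical, and I am assuming the input is an array of Unicode code points held in unsigned long. The second function is just a rough way to measure how often the fast path would be hit on a given sample of text.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: encode one code point c into out, which must have
   room for at least 4 bytes.  Returns the number of bytes written, or 0
   if c is a surrogate or lies beyond U+10FFFF. */
size_t utf8_encode(unsigned long c, unsigned char *out)
{
    assert(out != NULL);

    if (c < 0x80) {                     /* fast path: plain ASCII */
        out[0] = (unsigned char)c;
        return 1;
    }
    if (c < 0x800) {                    /* two-byte sequence */
        out[0] = (unsigned char)(0xC0 | (c >> 6));
        out[1] = (unsigned char)(0x80 | (c & 0x3F));
        return 2;
    }
    if (c < 0x10000) {                  /* three-byte sequence */
        if (c >= 0xD800 && c <= 0xDFFF)
            return 0;                   /* surrogates are not encodable */
        out[0] = (unsigned char)(0xE0 | (c >> 12));
        out[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (c & 0x3F));
        return 3;
    }
    if (c <= 0x10FFFF) {                /* four-byte sequence */
        out[0] = (unsigned char)(0xF0 | (c >> 18));
        out[1] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (c & 0x3F));
        return 4;
    }
    return 0;                           /* out of Unicode range */
}

/* Hypothetical helper: fraction of the n code points in cps that are
   ASCII, i.e. that would take the fast path above.  Returns 0.0 for an
   empty input. */
double ascii_fraction(const unsigned long *cps, size_t n)
{
    size_t i, ascii = 0;

    if (n == 0)
        return 0.0;
    for (i = 0; i < n; i++)
        if (cps[i] < 0x80)
            ascii++;
    return (double)ascii / (double)n;
}
```

Running ascii_fraction over a representative sample of your own data is probably the quickest way to decide whether the early test pays off; the percentages quoted above are estimates, not measurements.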
[...]
--
Keith Thompson (The_Other_Keith) kst-u@xxxxxxx <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"