Re: Fast UTF-8 strlen function
- From: Chewy509@xxxxxxxxxxxxxxxx
- Date: 10 May 2005 18:11:34 -0700
Frank Kotler wrote:
> I suppose... if we encounter a byte with the high bit clear,
> we just count "one". If we encounter a byte with the high
> bit set, we determine how many bits are set (look-up
> table?), and skip that many bytes, counting "one" for the
> whole mess... I don't see an "optimized" version of this
> working out very well...
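Easy, if the first byte of the character is:
000h -> 07fh single byte encoding
080h -> 0bfh continuation byte (cannot start a character)
0c0h -> 0dfh double byte encoding
0e0h -> 0efh triple byte encoding
0f0h -> 0f7h quad byte encoding
(0c0h, 0c1h and anything from 0f5h up never appear in valid UTF-8.)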
Based on the lead byte, you can skip ahead that many bytes. However,
you still need to check for invalid UTF-8 encodings, and there is also
the question of whether Randy wants combining characters counted as
separate characters. (If they are not to be counted separately, he
then has to determine whether each character is a combining character.)
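
To make the skip-ahead idea concrete, here is a rough sketch in C rather
than assembly. The names utf8_seq_len and utf8_strlen are made up for
this example, and returning (size_t)-1 on bad input is just one possible
convention. It only checks lead bytes and continuation bytes; it does not
reject overlong three- and four-byte forms or surrogates, so treat it as
a starting point and not a full validator.

```c
#include <stddef.h>

/* Sequence length implied by a lead byte.
 * Returns 0 if the byte cannot start a valid sequence
 * (continuation bytes 80h-BFh, C0h, C1h, F5h and above). */
static size_t utf8_seq_len(unsigned char b)
{
    if (b <= 0x7F) return 1;
    if (b >= 0xC2 && b <= 0xDF) return 2;
    if (b >= 0xE0 && b <= 0xEF) return 3;
    if (b >= 0xF0 && b <= 0xF4) return 4;
    return 0;
}

/* Counts code points in a NUL-terminated string.
 * Returns (size_t)-1 if a lead byte is invalid or a
 * continuation byte is missing (assumed error convention). */
size_t utf8_strlen(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t count = 0;

    while (*p) {
        size_t n = utf8_seq_len(*p);
        if (n == 0)
            return (size_t)-1;

        /* Each trailing byte must look like 10xxxxxx. The terminating
         * NUL fails this test, so we stop before reading past the end. */
        for (size_t i = 1; i < n; i++) {
            if ((p[i] & 0xC0) != 0x80)
                return (size_t)-1;
        }

        p += n;      /* skip the whole sequence */
        count++;     /* and count it as one */
    }
    return count;
}
```

Note that this counts code points, so a combining character is counted
separately: "e" followed by U+0301 (bytes 65h CCh 81h) gives 2, not 1.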
--
Darran (aka Chewy509) brought to you by Google Groups!