Re: Fast UTF-8 strlen function
- From: Frank Kotler <fbkotler@xxxxxxxxxxx>
- Date: Tue, 10 May 2005 19:21:04 -0400
randyhyde@xxxxxxxxxxxxx wrote:
>
> Is there a fast UTF-8 string length function floating around?
What's the meaning of "string length" of a UTF-8 (or other
unicode) string? Length in bytes, or length in characters?
As NoDot points out, if we want length in bytes, a "regular"
strlen ought to work.
Beth raised this issue on the luxasm-devel list, too
(although not asking for an optimized function). IIRC, her
example sent a string to stdout, and the question was "what
goes in edx". I'm pretty sure we need length in bytes here,
and in "most" cases (allocating memory, e.g.), but sometimes
we'd want the length in characters (plus "font metrics" to
determine where the "next" print position would be, for
example).
I suppose... if we encounter a byte with the high bit clear,
we just count "one". If we encounter a byte with the high
bit set, we determine how many leading bits are set (look-up
table?), and skip that many bytes (lead byte included),
counting "one" for the whole mess... I don't see an
"optimized" version of this working out very well...
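For what it's worth, the character-counting scheme described above might be sketched in C roughly like this. It's only a sketch, assuming well-formed, NUL-terminated UTF-8; the names utf8_seq_len, utf8_strlen_skip and utf8_strlen are made up for illustration, and nothing here is tuned for speed. The last function uses a different trick from the one described: count every byte that is *not* a continuation byte (10xxxxxx), which needs no skipping at all.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sequence length implied by a lead byte.
   Stray continuation bytes and invalid lead bytes are treated as
   one-byte "characters" so the scan always makes progress. */
static size_t utf8_seq_len(unsigned char b)
{
    if (b < 0x80)           return 1;   /* 0xxxxxxx: plain ASCII     */
    if ((b & 0xE0) == 0xC0) return 2;   /* 110xxxxx: 2-byte sequence */
    if ((b & 0xF0) == 0xE0) return 3;   /* 1110xxxx: 3-byte sequence */
    if ((b & 0xF8) == 0xF0) return 4;   /* 11110xxx: 4-byte sequence */
    return 1;
}

/* Count characters by looking at each lead byte and skipping the
   continuation bytes that follow it. */
size_t utf8_strlen_skip(const char *s)
{
    assert(s != NULL);
    const unsigned char *p = (const unsigned char *)s;
    size_t count = 0;

    while (*p != 0) {
        size_t len = utf8_seq_len(*p);
        ++count;
        ++p;
        /* Skip at most len-1 continuation bytes; stop early on
           anything else so we never run past the terminating NUL. */
        for (size_t i = 1; i < len && (*p & 0xC0) == 0x80; ++i)
            ++p;
    }
    return count;
}

/* Alternative: count every byte that is not a continuation byte. */
size_t utf8_strlen(const char *s)
{
    assert(s != NULL);
    size_t count = 0;

    for (const unsigned char *p = (const unsigned char *)s; *p != 0; ++p) {
        if ((*p & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

For the six-byte string "h\xC3\xA9llo" (an e-acute in the second position), both functions return 5, while a "regular" strlen returns 6. Note that this counts code points, which is not necessarily what a user would call "characters" once combining marks get involved.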
Beth says Nasm accepts UTF-8 strings in quoted strings (and
comments) "by accident". I don't think it's "by accident", I
think it's by "careful design"... not *Nasm's* design, but
UTF-8's. As Betov observed, the risk is that we'd encounter
a "false end-quote" (or false EOL, in a comment). As long as
that doesn't happen (and Beth's explanation assures us it
won't), it's "just bytes", and the assembler doesn't need to
care what it represents.
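To make the "just bytes" point concrete, here is a rough C sketch of a byte-wise scan for the closing quote. It is not Nasm's actual code; the function name find_end_quote is invented for illustration. The only fact it leans on is that every byte of a multi-byte UTF-8 sequence has its high bit set, so none of them can equal the ASCII quote (0x22) or newline (0x0A).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scanner: body points just past an opening double
   quote. Returns the byte offset of the closing quote, or -1 if the
   line or string ends first. It never decodes UTF-8: bytes >= 0x80
   simply fail both comparisons and are stepped over. */
ptrdiff_t find_end_quote(const char *body)
{
    assert(body != NULL);

    for (const char *p = body; *p != '\0' && *p != '\n'; ++p) {
        if (*p == '"')
            return p - body;
    }
    return -1;
}
```

Given the body "caf\xC3\xA9\" rest", the scan steps over the two bytes of the e-acute like any others and reports the quote at byte offset 5.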
LuxAsm, since it'll include an editor, *may* need to
determine length in characters, too... My big question is,
if a user presses the key for "King Tut", what in hell kind
of an "event" do we get???
Best,
Frank