Fast UTF-8 strlen function



Is there a fast UTF-8 string length function floating around? I've been
playing around with UTF-8 lately (after the recent UTF-8 interest on
this board) and I'm wondering if there are some code examples (not
unlike the zstring length function typically found in the optimization
manuals) of a high-performance string length function for UTF-8?
Obviously, such a function is going to be the basis of a UTF-8 string
library module, so that seems like a good place to start.
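
For reference, here is a minimal sketch of the straightforward baseline, written in C rather than assembly. It assumes the input is valid, NUL-terminated UTF-8 and counts code points by skipping continuation bytes (those of the form 10xxxxxx). The name utf8_strlen is hypothetical and not taken from any existing library.

```c
#include <stddef.h>

/* Count code points in a NUL-terminated UTF-8 string.
   Assumption: the input is valid UTF-8 (no validation is done).
   utf8_strlen is a hypothetical name used for illustration. */
size_t utf8_strlen(const char *s)
{
    size_t count = 0;
    const unsigned char *p;

    for (p = (const unsigned char *)s; *p != 0; ++p) {
        /* Continuation bytes have the bit pattern 10xxxxxx.
           Every other byte begins a new code point. */
        if ((*p & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

This walks one byte at a time, so it is a correctness reference and not the high-performance version being asked about. A faster variant would presumably apply the same test to several bytes per iteration.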

BTW, HLA v1.x seems to handle UTF-8 string data just fine. Of course,
the library modules aren't all going to work properly (e.g., str.len),
but the basic support for UTF-8 is there. I don't have any plans to
add UTF-8 support to the existing HLA standard library, but I may as
well add it to HLA v2.0, as the main issue is one of supplying
appropriate library routines (both in the HLA stdlib and in the HLA
compile-time language).

Cheers,
Randy Hyde
