Support for UTF-8 in the C language standard?



Does any standard C function support reading or writing UTF-8?
I'm not talking about the trivial case where the text is just the
ASCII subset of UTF-8. Rather, I'm referring to a hypothetical
function that could read UTF-8 containing 2, 3, or even 4 byte
sequences and store each decoded code point in, I guess, an array
of 32 bit integers.
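
To make concrete what I have in mind, here is a rough hand-rolled sketch of such a decoder. It is not a standard function: the name utf8_decode, its signature, and its error convention are all made up for illustration, and it assumes a NUL-terminated input string.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of any C standard library.
 * Decodes the NUL-terminated UTF-8 string `s` into `out`, which has room
 * for `cap` code points. Returns the number of code points stored, or
 * (size_t)-1 on malformed input or if `out` is too small. */
size_t utf8_decode(const char *s, uint32_t *out, size_t cap)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t n = 0;

    while (*p != 0) {
        uint32_t cp;
        int extra;  /* number of continuation bytes that must follow */

        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else return (size_t)-1;  /* stray continuation or invalid lead byte */
        p++;

        for (int i = 0; i < extra; i++, p++) {
            /* Each continuation byte looks like 10xxxxxx; a premature
             * NUL terminator fails this check too. */
            if ((*p & 0xC0) != 0x80) return (size_t)-1;
            cp = (cp << 6) | (uint32_t)(*p & 0x3F);
        }

        /* Reject overlong forms, UTF-16 surrogates, and values past U+10FFFF. */
        static const uint32_t min_cp[4] = { 0x0, 0x80, 0x800, 0x10000 };
        if (cp < min_cp[extra] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            return (size_t)-1;

        if (n == cap) return (size_t)-1;  /* output array is full */
        out[n++] = cp;
    }
    return n;
}
```

Once the text is in an array like this, every character has the same width, so indexing and swapping ranges become ordinary array operations.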

I'm guessing that there _might_ be functions for this somewhere
in the C standard because applying typical text manipulations
directly to a UTF-8 string seems quite messy and slow.
For instance, even a simple operation like "swap characters 1002->1005
with 2007->2010" would be a pain: you'd pretty much have to
parse from the beginning of the UTF-8 string
just to find the specified ranges, and then the two ranges might occupy
different numbers of bytes. So even though the number of characters is
the same, they couldn't just be swapped byte for byte.
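
As an illustration of the "parse from the beginning" problem, this is roughly what just locating character N looks like when working on the bytes directly. Again this is only a sketch: utf8_offset is a made-up name, and it assumes the input is already valid, NUL-terminated UTF-8.

```c
#include <stddef.h>

/* Hypothetical helper, not part of any C standard library.
 * Returns the byte offset at which the code point with 0-based index
 * `idx` starts in the valid, NUL-terminated UTF-8 string `s`.
 * If `idx` equals the number of code points, the offset of the
 * terminator is returned; if it is larger, (size_t)-1 is returned. */
size_t utf8_offset(const char *s, size_t idx)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t off = 0;   /* current byte offset */
    size_t seen = 0;  /* code points whose first byte has been passed */

    for (;;) {
        /* Any byte that is not a continuation byte (10xxxxxx) starts a
         * new code point; the terminating NUL counts as one here. */
        if ((p[off] & 0xC0) != 0x80) {
            if (seen == idx) return off;
            seen++;
        }
        if (p[off] == 0) return (size_t)-1;  /* ran off the end */
        off++;
    }
}
```

Every lookup is a linear scan from the start of the string, and the byte length of a range such as 1002->1005 is only known after finding both of its ends this way.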

Thanks,

David Mathog



