Re: Why doesn't strrstr() exist?



Old Wolf wrote:
> websnarf@xxxxxxxxx wrote:
> > Antoine Leca wrote:
> >> Paul Hsieh va escriure:
> >>> Remember that almost every virus, buffer overflow exploit, core
> >>> dump/GPF/etc is basically due to some undefined situation in the
> >>> ANSI C standard.
> >>
> >> <OT>
> >> The worst exploit I've seen so far was because a library dealing
> >> with Unicode was not checking about malformed, overlong, UTF-8
> >> sequences, and allowed to walk through the filesystem
> >
> > In any event, compare this to Java, where Unicode is actually the
> > standard encoding for string data.
>
> Unicode is a character set, not an encoding.

Right. It turns out that UTF-16 is the encoding (I don't know whether
it's LE or BE, but I suspect it's not exposed at the representation
level -- i.e., it just matches whatever works best for your platform.)

> > It's not really possible to have "unicode parsing problems" in Java,
> > since all this stuff has been specified in the core of the language.
>
> AFAIK the language doesn't specify how to deal with Unicode
> characters whose value is greater than 65,535

Not so. It specifies UTF-16, which can represent the whole range.
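
To make that concrete, here is a minimal sketch of how a code point
above 65,535 is stored, assuming Java 5 or later (where the code point
methods on String and Character exist). The class and helper names are
made up for illustration; U+1D11E (MUSICAL SYMBOL G CLEF) is just a
convenient supplementary character.

```java
public class SupplementaryDemo {
    // Number of UTF-16 code units (Java chars) needed to store one code point.
    // Hypothetical helper, not part of any library.
    static int utf16Units(int codePoint) {
        return Character.toChars(codePoint).length;
    }

    public static void main(String[] args) {
        int gClef = 0x1D11E; // above U+FFFF, so it needs a surrogate pair
        String s = new String(Character.toChars(gClef));

        // length() counts UTF-16 code units, not characters.
        System.out.println("code units:  " + s.length());
        // codePointCount() counts whole code points.
        System.out.println("code points: " + s.codePointCount(0, s.length()));
        // The first char is the high half of the surrogate pair.
        System.out.println("high surrogate first: "
                + Character.isHighSurrogate(s.charAt(0)));
        // Reading back by code point recovers the original value.
        System.out.println("round trip ok: " + (s.codePointAt(0) == gClef));
    }
}
```

The catch is that char-indexed methods such as length() and charAt()
still work in code units, so code that assumes one char per character
will miscount text like this.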

> Does it handle UTF-8, big-endian UCS-2, little-endian UCS-2,
> b-e UTF16, l-e UTF16, and UCS-4 ? All of those occur in the
> real world (unfortunately!)

I am not *that* familiar with Java, but I would be surprised if Java
didn't come with utilities to support all of those. The UCS-2 variants
are just subsets of UTF-16, and UCS-4 is trivial. The only real
question is UTF-8 support, which I don't know about.
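
For what it's worth, here is a rough sketch of what decoding some of
those encodings looks like, assuming Java 7 or later for
java.nio.charset.StandardCharsets. UTF-8, UTF-16BE and UTF-16LE are
among the charsets every Java platform is required to support; a UCS-4
(UTF-32) charset is not on that required list, so it is left out here.
The class and helper names are made up for illustration. The last part
feeds a strict decoder the overlong two-byte form of '/' (0xC0 0xAF),
the kind of sequence mentioned earlier in the thread.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

public class DecodeDemo {
    // Strict UTF-8 decode: a fresh CharsetDecoder reports malformed input
    // by throwing, rather than substituting U+FFFD the way
    // new String(bytes, charset) does. Hypothetical helper name.
    static String decodeUtf8Strict(byte[] bytes) throws CharacterCodingException {
        return StandardCharsets.UTF_8.newDecoder()
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    // True if the bytes decode cleanly as UTF-8 under the strict decoder.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            decodeUtf8Strict(bytes);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The same character, U+00E9, in three encodings.
        byte[] utf8 = {(byte) 0xC3, (byte) 0xA9};
        byte[] utf16be = {0x00, (byte) 0xE9};
        byte[] utf16le = {(byte) 0xE9, 0x00};

        String a = new String(utf8, StandardCharsets.UTF_8);
        String b = new String(utf16be, StandardCharsets.UTF_16BE);
        String c = new String(utf16le, StandardCharsets.UTF_16LE);
        System.out.println("all equal: " + (a.equals(b) && b.equals(c)));

        // Overlong encoding of '/' (U+002F): malformed UTF-8.
        byte[] overlongSlash = {(byte) 0xC0, (byte) 0xAF};
        System.out.println("overlong accepted: " + isValidUtf8(overlongSlash));
    }
}
```

Whether a given library rejects overlong forms depends on its decoder,
so the safe habit is to validate with a strict decoder before doing any
path or security checks on the decoded text.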

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/
