Re: Unescaping Unicode code points in a Java string




"Greg" <greghe@xxxxxxxxxxx> wrote in message news:1157007079.550984.122030@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
My Java program reads in (from an external source) text that contains
the same sort of Unicode character escape sequences as Java source
code. For example, one such string might be:

"En Espa\u00f1ol"

Naturally, I would like to convert the six-character subsequence,
"\u00f1", into the single character (code point U+00F1) that those
characters actually represent:

"En Español"

I've been browsing the J2SE 1.5 docs hoping to find a convenient method
to perform this kind of conversion, but so far have not found one. Does
anyone have any suggestions?

Iterate through the String character by character, looking for the sequence "\u". When you find it, skip those two characters and read the next four. Parse those four characters as an integer in hexadecimal notation, cast that integer to a char, and put the resulting char into the output in place of the six-character escape. Since a Java String is immutable, build the result in a StringBuilder rather than trying to edit the String in place.
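The steps above can be sketched roughly as follows. This is only a minimal sketch, not a library method: the class name UnicodeUnescaper and the method names unescape and isHex are made up for illustration. It assumes every escape is exactly a backslash, one 'u', and four hex digits, and it copies anything malformed or truncated through unchanged. Unlike the Java compiler, it does not treat a doubled backslash specially and does not accept repeated 'u' characters.

```java
final class UnicodeUnescaper {

    // Hypothetical helper: replaces each backslash-u escape followed by
    // four hex digits with the single char it denotes.
    static String unescape(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            // An escape needs six characters: backslash, 'u', four hex digits.
            if (c == '\\' && i + 5 < input.length() && input.charAt(i + 1) == 'u') {
                String hex = input.substring(i + 2, i + 6);
                if (isHex(hex)) {
                    out.append((char) Integer.parseInt(hex, 16));
                    i += 6; // skip the whole escape
                    continue;
                }
            }
            // Not a well-formed escape: copy the character as-is.
            out.append(c);
            i++;
        }
        return out.toString();
    }

    // True if every character of s is a hexadecimal digit.
    private static boolean isHex(String s) {
        for (int k = 0; k < s.length(); k++) {
            if (Character.digit(s.charAt(k), 16) < 0) {
                return false;
            }
        }
        return true;
    }
}
```

With this sketch, UnicodeUnescaper.unescape("En Espa\\u00f1ol") returns "En Español" (the doubled backslash is only there because the example is written as a Java string literal). Characters outside the Basic Multilingual Plane arrive as two consecutive escapes, which come out as the two chars of a surrogate pair.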

- Oliver
