Newbie string processing query

From: Tim Wright (tim.wright_at_nospam.informa.com.invalid)
Date: 02/18/04


Date: Wed, 18 Feb 2004 13:14:16 +0000

Hi all,

I'm trying to work out an efficient way to convert a String (with
extended characters, i.e. accents, symbols and so on) into a block of
HTML. So, for example, every instance of Unicode character 224 (decimal,
U+00E0) would be replaced by the "&agrave;" entity.

I'm sure there must be an easy way to do this (or at least, a *quick*
way to do it) but at the moment, my solution looks like this:

   String unicodeToHTML (String input) {
      StringBuffer output = new StringBuffer();
      int len = input.length();
      String oneChar = "";
      for (int i=0; i<len; i++) {
         oneChar = input.substring(i,i+1);
         // Java's Unicode escapes are hexadecimal: decimal 160 is 00A0, 161 is 00A1
         if (oneChar.equals("\u00A0")) oneChar = "&nbsp;";
         if (oneChar.equals("\u00A1")) oneChar = "&iexcl;";
         // several hundred more "if" statements here...
         output.append(oneChar);
      }
      return output.toString();
   }

Whilst this works, it's quite astonishingly slow. I'm sure that when I
create a new String from an array of bytes (with a specified encoding),
Java must be doing something similar to this (translating said
encoding into Unicode), but somehow it does it a couple of hundred times
faster...

Any tips or suggestions would be gratefully received!

Cheers,

Tim.
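
For comparison, here is a minimal sketch of one common alternative to the
chain of "if" statements: read each character once and look its entity up
in an array indexed by code point. This is an illustration under stated
assumptions, not a tested replacement. The class name HtmlEntities, the
method name unicodeToHtml and the four table entries are hypothetical; a
real table would carry the full entity list. Characters above 127 with no
table entry fall back to a decimal numeric reference, which is an added
behaviour the original code does not have. ASCII characters, including
"&" and "<", pass through unescaped, as in the original.

```java
final class HtmlEntities {
    // Lookup table indexed by code point; null means "no named entity here".
    // Only a few illustrative entries are filled in (hypothetical subset).
    private static final String[] ENTITIES = new String[256];
    static {
        ENTITIES[0x00A0] = "&nbsp;";   // decimal 160
        ENTITIES[0x00A1] = "&iexcl;";  // decimal 161
        ENTITIES[0x00E0] = "&agrave;"; // decimal 224
        ENTITIES[0x00E9] = "&eacute;"; // decimal 233
    }

    public static String unicodeToHtml(String input) {
        int len = input.length();
        StringBuilder output = new StringBuilder(len + 16);
        for (int i = 0; i < len; ) {
            // Read a whole code point so surrogate pairs are not split.
            int cp = input.codePointAt(i);
            i += Character.charCount(cp);
            if (cp < 0x80) {
                // Plain ASCII is copied through unchanged.
                output.append((char) cp);
            } else if (cp < ENTITIES.length && ENTITIES[cp] != null) {
                // One array read replaces the whole chain of comparisons.
                output.append(ENTITIES[cp]);
            } else {
                // Assumed fallback: decimal numeric character reference.
                output.append("&#").append(cp).append(';');
            }
        }
        return output.toString();
    }
}
```

The likely saving comes from two places: no one-character String is
allocated per position via substring, and each character costs a single
array read instead of several hundred equals() calls. For example,
HtmlEntities.unicodeToHtml("caf\u00E9") returns "caf&eacute;".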


