Re: Emoticon text parser



On Thu, 20 Mar 2008 10:40:39 -0700 (PDT), Karsten Wutzke
<kwutzke@xxxxxx> wrote, quoted or indirectly quoted someone who said :

How do I write a text parser that will detect strings such as ":-)" and "]:->"
so that they can be replaced with small icons in a text
component? Can someone direct me to some classes which might be
useful? Pattern? Looks complicated... BTW there's no real pattern in
those codes, as I also use custom codes for other symbols, e.g. (cig)
or :cig:; haven't decided that yet...

Here are 5 ways to tackle your problem:

1. see http://mindprod.com/jgloss/parser.html

2. You could do it crudely but simply with a table: loop through
the table, using indexOf to find every occurrence of each emoticon.
Look for the longest ones first, so that "]:->" is not mistaken for
":->".
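A minimal sketch of approach 2, assuming a hypothetical table of codes and icon names and an invented "{icon:name}" marker standing in for a real icon. Codes are tried longest first, and chars already claimed by a longer match are skipped, so the ":->" inside "]:->" is not matched a second time.

```java
import java.util.*;

class EmoticonTable {
    // Hypothetical table: emoticon code -> icon name.
    static final Map<String, String> ICONS = new LinkedHashMap<>();
    static {
        ICONS.put(":-)", "smile");
        ICONS.put(":-(", "frown");
        ICONS.put(":->", "grin");
        ICONS.put("]:->", "devil");
        ICONS.put("(cig)", "cigarette");
    }

    /** Replaces each emoticon with a hypothetical "{icon:name}" marker. */
    static String replaceEmoticons(String text) {
        List<String> codes = new ArrayList<>(ICONS.keySet());
        codes.sort((a, b) -> b.length() - a.length()); // longest first
        boolean[] claimed = new boolean[text.length()];  // chars used by a match
        TreeMap<Integer, String> matches = new TreeMap<>(); // start -> code

        for (String code : codes) {
            // indexOf finds each occurrence of this code in turn.
            for (int i = text.indexOf(code); i >= 0; i = text.indexOf(code, i + 1)) {
                boolean free = true;
                for (int j = i; j < i + code.length(); j++) {
                    if (claimed[j]) { free = false; break; }
                }
                if (free) { // not already part of a longer emoticon
                    Arrays.fill(claimed, i, i + code.length(), true);
                    matches.put(i, code);
                }
            }
        }

        // Rebuild the text, swapping each match for its marker.
        StringBuilder out = new StringBuilder();
        int pos = 0;
        for (Map.Entry<Integer, String> m : matches.entrySet()) {
            out.append(text, pos, m.getKey());
            out.append("{icon:").append(ICONS.get(m.getValue())).append('}');
            pos = m.getKey() + m.getValue().length();
        }
        out.append(text.substring(pos));
        return out.toString();
    }
}
```

For example, replaceEmoticons("hi :-) evil ]:-> (cig)") gives "hi {icon:smile} evil {icon:devil} {icon:cigarette}". It rescans the whole text once per code, which is fine for short messages and a modest table.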

3. A very fast but hard-to-maintain technique would be to do it with
a switch statement on the first char that then looks at the second
char, and so on.
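A rough sketch of approach 3, hand-coded for a hypothetical set of four emoticons (":-)", ":-(", ":->", "]:->"). It shows why the technique is fast (one char comparison per step) and why it is hard to maintain: every new code means editing the nested switches by hand. The class and method names are made up for illustration.

```java
import java.util.*;

class SwitchMatcher {
    /** Length of the emoticon starting at s[i], or 0 if there is none. */
    static int match(String s, int i) {
        int n = s.length();
        switch (s.charAt(i)) {
            case ':': // ":-)", ":-(", ":->"
                if (i + 2 < n && s.charAt(i + 1) == '-') {
                    switch (s.charAt(i + 2)) {
                        case ')': case '(': case '>': return 3;
                        default: return 0;
                    }
                }
                return 0;
            case ']': // "]:->"
                if (i + 3 < n && s.charAt(i + 1) == ':' && s.charAt(i + 2) == '-'
                        && s.charAt(i + 3) == '>') return 4;
                return 0;
            default:
                return 0;
        }
    }

    /** Scans left to right and collects every emoticon found. */
    static List<String> findAll(String s) {
        List<String> found = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            int len = match(s, i);
            if (len > 0) { found.add(s.substring(i, i + len)); i += len; }
            else i++;
        }
        return found;
    }
}
```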

4. The mathematically inclined might write a program to analyse the
list of emoticons and generate code for a finite state automaton. See
http://mindprod.com/jgloss/finitestate.html
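Rather than generating source code, the sketch below builds the automaton at run time as a trie: each node is a state and each edge is labelled with a char, so one pass over the look-ahead finds the longest emoticon at a position. This is a swapped-in simplification, not the code generator described above; the class name and the longest-match rule are assumptions.

```java
import java.util.*;

class EmoticonTrie {
    // One state of the automaton; edges are labelled by chars.
    static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        String code; // non-null if an emoticon ends in this state
    }

    final Node root = new Node();

    EmoticonTrie(Collection<String> codes) {
        for (String code : codes) {
            Node n = root;
            for (char c : code.toCharArray()) {
                n = n.next.computeIfAbsent(c, k -> new Node());
            }
            n.code = code; // accepting state
        }
    }

    /** Longest emoticon starting at index i, or null if none. */
    String longestAt(String s, int i) {
        Node n = root;
        String best = null;
        for (int j = i; j < s.length(); j++) {
            n = n.next.get(s.charAt(j));
            if (n == null) break;           // no transition: stop
            if (n.code != null) best = n.code;
        }
        return best;
    }

    /** Scans left to right, taking the longest match at each position. */
    List<String> findAll(String s) {
        List<String> found = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            String code = longestAt(s, i);
            if (code != null) { found.add(code); i += code.length(); }
            else i++;
        }
        return found;
    }
}
```

Adding an emoticon only means adding a string to the list, which avoids the maintenance problem of approach 3.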

5. A practical solution might be to make a list of the chars that start
emoticons. Then, for each such char, build a list of the emoticons
that start with it. Now scan for emoticon-starting chars. When you
find one, compare the look-ahead with each candidate emoticon that
starts with that char. You could implement this as a switch with a
case for each emoticon-starting char, e.g.

// look is the text remaining from nextChar onward
switch ( nextChar )
   {
   case ':': return look.startsWith( ":-)" ) ||
                    look.startsWith( ":-(" );
   case '<': return look.startsWith( "<:)" );
   default:  return false;
   }
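A more general version of approach 5, assuming the emoticon list is supplied at run time instead of hard-coded in a switch. The map from first char to candidate emoticons plays the role of the switch, and candidates are tried longest first. StartCharIndex and the iconFor callback are hypothetical names, and codes are assumed to be non-empty.

```java
import java.util.*;
import java.util.function.Function;

class StartCharIndex {
    // Emoticons grouped by first char, longest first within each group.
    private final Map<Character, List<String>> byFirstChar = new HashMap<>();

    StartCharIndex(Collection<String> codes) {
        for (String code : codes) {
            byFirstChar.computeIfAbsent(code.charAt(0), k -> new ArrayList<>()).add(code);
        }
        for (List<String> group : byFirstChar.values()) {
            group.sort((a, b) -> b.length() - a.length());
        }
    }

    /** Replaces every emoticon with iconFor(code), scanning left to right. */
    String replaceAll(String text, Function<String, String> iconFor) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            String hit = null;
            // Only emoticon-starting chars have candidates to check.
            List<String> candidates = byFirstChar.get(text.charAt(i));
            if (candidates != null) {
                for (String code : candidates) {
                    if (text.startsWith(code, i)) { hit = code; break; }
                }
            }
            if (hit != null) {
                out.append(iconFor.apply(hit));
                i += hit.length();
            } else {
                out.append(text.charAt(i++));
            }
        }
        return out.toString();
    }
}
```

In a Swing text component, iconFor would be replaced by code that inserts the matching icon; that part is not shown here.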
--

Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com