Re: Emoticon text parser



Karsten Wutzke wrote:
On 22 Mar., 21:37, Karsten Wutzke <kwut...@xxxxxx> wrote:
On 21 Mar., 10:21, Jussi Piitulainen <jpiit...@xxxxxxxxxxxxxxxx>
wrote:



Karsten Wutzke writes:
Here are the possible strings applying to each position:
hair = {"o", "O", ">", "}", "]", ")"} <-- hair optional!
eyes = {":", ";", "8"}
subeyes = {"'", ","} <-- subeyes optional!
nose = {"-"} <-- nose optional!
mouth = {")", "(", "s", "S", "d", "D",
"p", "P", "c", "C", "o", "O",
"#", "@", "*", "$", "|",
"))", "(("}
beard = {"="} <-- beard optional!
That is very close to a regular expression already. It's as if you
are spelling out the meaning of such an expression here.
Most of these are character sets. The exceptions are the two
two-character mouths, so mouth must be partly an alternation.
hair    = [oO>}\])]?     "]" must be escaped
eyes    = [:;8]          no problem
subeyes = [',]?
nose    = -?
mouth   = (?:[sSdDpPcCoO#@*$|]|\)\)?|\(\(?)
          This is [...] | one or two of ) | one or two of (;
          parentheses need escaping, and I've wrapped it all
          in (?: ) to make it a non-capturing group.
beard   = =?
Put it all together, in a string, which requires doubling the escapes:
"[oO>}\\])]?[:;8][',]?-?(?:[sSdDpPcCoO#@*$|]|\\)\\)?|\\(\\(?)=?". Ouch.
It does look ugly.
We can ease the pain with the COMMENTS flag of Pattern: whitespace in
the pattern is then ignored and # starts a comment that runs to the end
of the line, so a literal # must be escaped. Let's make it
CASE_INSENSITIVE too.
import java.util.regex.Pattern;
import java.util.regex.Matcher;

class Test {
    public static void main(String[] args) {
        Pattern p =
            Pattern.compile
            ("[o>}\\])]?            # hair, optional    \n" +
             "[:;8]                # eyes              \n" +
             "[',]?                # subeyes, optional \n" +
             "-?                   # nose, optional    \n" +
             "(?: [sdpco\\#@*$|]   " +
             "  | \\)\\)?          " +
             "  | \\(\\(? )        # mouth             \n" +
             "=?                   # beard, optional   \n",
             Pattern.COMMENTS | Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(args[0]);
        while (m.find()) {
            System.out.println("Found " + m.group() + " at " +
                               m.start() + " to " + m.end());
        }
    }
}
That's about the best I can do.
And it is great! It works like a charm and even seems to be fast as
lightning... I also split up the sub components into several strings
as Christian suggested, instead of using the comment flag. I suppose
that flag is mainly meant for loading (commented) patterns from files
on disk.
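
For readers following along, splitting the pattern into component strings might look roughly like the sketch below. The class name, constant names and the isEmoticon helper are made up for illustration and are not taken from Karsten's or Christian's code; the sub-patterns are the ones Jussi listed, with the nose written as optional.

```java
import java.util.regex.Pattern;

public class EmoticonPattern {
    // One sub-pattern per face position (constant names are hypothetical).
    static final String HAIR    = "[oO>}\\])]?";   // optional
    static final String EYES    = "[:;8]";
    static final String SUBEYES = "[',]?";         // optional
    static final String NOSE    = "-?";            // optional
    static final String MOUTH   = "(?:[sSdDpPcCoO#@*$|]|\\)\\)?|\\(\\(?)";
    static final String BEARD   = "=?";            // optional

    // Plain concatenation; no COMMENTS flag, so # needs no escaping here.
    static final Pattern EMOTICON =
        Pattern.compile(HAIR + EYES + SUBEYES + NOSE + MOUTH + BEARD);

    // True only if the whole string is a single emoticon.
    static boolean isEmoticon(String s) {
        return EMOTICON.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isEmoticon(":-)"));      // true
        System.out.println(isEmoticon(">;'-))="));  // true, all six positions used
        System.out.println(isEmoticon(":-"));       // false, no mouth
    }
}
```

Named pieces keep each position readable without relying on comments inside the pattern string itself.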

One question that remains is:

The pattern really just addresses strings that are *exactly* 2-7 chars
long. Do I understand correctly that there's no way to automatically
detect a pattern ":-)" in the string " :-)" or ":-) " or
" :-) " directly?

Use the find() method on the Matcher.

Do I always have to make a list of starting characters and then scan
for a 7 char string, a 6 length, a 5 length... until maybe one pattern
matches?

No. You have already made a pattern that matches all 2-7 char smileys. Use find() to locate one after the other in a string of any length; unlike matches(), it does not require the whole string to fit the pattern.
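
A minimal sketch of that approach, assuming the one-line pattern from earlier in the thread with the nose made optional. The class name EmoticonFinder and the helper findAll are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmoticonFinder {
    // Same pattern as above, written as a single string.
    static final Pattern EMOTICON = Pattern.compile(
        "[oO>}\\])]?[:;8][',]?-?(?:[sSdDpPcCoO#@*$|]|\\)\\)?|\\(\\(?)=?");

    // Collects every emoticon found anywhere in the text, left to right.
    static List<String> findAll(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = EMOTICON.matcher(text);
        while (m.find()) {          // each call resumes after the previous match
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findAll(" :-) "));                   // [:-)]
        System.out.println(findAll("I'm happy :-D today ;)"));  // [:-D, ;)]
    }
}
```

Because find() matches anywhere, it will also pick up lookalikes inside ordinary text: "(see item 8)" yields "8)". If that matters, the pattern would need extra context checks such as surrounding whitespace, which this sketch does not attempt.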

Karsten

PS: I'm really really happy :-D ATM

Would be great if someone could look over that last question I have, I
suspect it got somewhat overlooked due to the Chinese spammer...

Karsten