Re: Emoticon text parser
- From: Christian <fakemail@xxxxxx>
- Date: Mon, 24 Mar 2008 17:47:38 +0100
Use the find() method on the matcher.

Karsten Wutzke wrote:
On 22 Mar., 21:37, Karsten Wutzke <kwut...@xxxxxx> wrote:
On 21 Mar., 10:21, Jussi Piitulainen <jpiit...@xxxxxxxxxxxxxxxx> wrote:
Karsten Wutzke writes:
Here are the possible strings applying to each position:
hair = {"o", "O", ">", "}", "]", ")"} <-- hair optional!
eyes = {":", ";", "8"}
subeyes = {"'", ","} <-- subeyes optional!
nose = {"-"} <-- nose optional!
mouth = {")", "(", "s", "S", "d", "D",
         "p", "P", "c", "C", "o", "O",
         "#", "@", "*", "$", "|",
         "))", "(("}
beard = {"="} <-- beard optional!
That is very close to a regular expression already. It's as if you
are spelling out the meaning of such an expression here.
Most of these are character sets. The exceptions are the two
two-character mouths, so mouth must be partly an alternation.
hair    = [oO>}\])]?    "]" must be escaped
eyes    = [:;8]         no problem
subeyes = [',]?
nose    = -?
mouth   = (?:[sSdDpPcCoO#@*$|]|\)\)?|\(\(?)
          This is [...] | one or two of ) | one or two of (;
          parentheses need escaping, and I've wrapped it all
          in (?: ) to make it a non-capturing group.
beard   = =?
Put it all together, in a string, which requires doubling the escapes:
"[oO>}\\])]?[:;8][',]?-?(?:[sSdDpPcCoO#@*$|]|\\)\\)?|\\(\\(?)=?". Ouch.
It does look ugly.
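As a quick sanity check of the one-line form, here is a minimal sketch that tests whole strings against it with Matcher.matches(). The class name EmoticonExact and the helper isEmoticon are made up for illustration, and the nose is written as -? on the assumption that it is optional, as the position list says.

```java
import java.util.regex.Pattern;

class EmoticonExact {
    // One-line form of the pattern. The "-?" is an assumption: it treats
    // the nose as optional, matching the position list in this thread.
    static final Pattern EMOTICON = Pattern.compile(
        "[oO>}\\])]?[:;8][',]?-?(?:[sSdDpPcCoO#@*$|]|\\)\\)?|\\(\\(?)=?");

    // Returns true only if the whole string is exactly one emoticon.
    static boolean isEmoticon(String s) {
        return EMOTICON.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {":-)", ";)", ">:-((", "8-D=", ":-", " :-) "};
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + isEmoticon(s));
        }
    }
}
```

matches() requires the entire input to fit the pattern, so a string with surrounding spaces such as " :-) " is rejected here.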
We can ease the pain with the COMMENTS flag of Pattern; the comment
character # must then be escaped, and each comment ends at the end of
its line. Let's make it CASE_INSENSITIVE too.
import java.util.regex.Pattern;
import java.util.regex.Matcher;

class Test {
    public static void main(String[] args) {
        Pattern p =
            Pattern.compile
            ("[o>}\\])]?           # hair, optional    \n" +
             "[:;8]                # eyes              \n" +
             "[',]?                # subeyes, optional \n" +
             "-?                   # nose, optional    \n" +
             "(?: [sdpco\\#@*$|]                       " +
             "  | \\)\\)?                              " +
             "  | \\(\\(? )        # mouth             \n" +
             "=?                   # beard, optional   \n",
             Pattern.COMMENTS | Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(args[0]);
        while (m.find()) {
            System.out.println("Found " + m.group() + " at " +
                               m.start() + " to " + m.end());
        }
    }
}
That's about the best I can do.
And it is great! It works like a charm and even seems to be fast as
lightning... I also split up the sub components into several strings
as Christian suggested instead of the commenting stuff. I suppose this
was made for loading (commented) files from disk.
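Splitting the pattern into one string per position, as mentioned above, might look roughly like this. It is only a sketch: the constant names and the class EmoticonParts are hypothetical, not the code actually used in this thread, and CASE_INSENSITIVE is assumed so that the sets only need lower-case letters.

```java
import java.util.regex.Pattern;

class EmoticonParts {
    // One named string per position (names are illustrative only).
    static final String HAIR    = "[o>}\\])]?";
    static final String EYES    = "[:;8]";
    static final String SUBEYES = "[',]?";
    static final String NOSE    = "-?";
    static final String MOUTH   = "(?:[sdpco#@*$|]|\\)\\)?|\\(\\(?)";
    static final String BEARD   = "=?";

    // The parts are concatenated in face order and compiled once.
    static final Pattern EMOTICON = Pattern.compile(
        HAIR + EYES + SUBEYES + NOSE + MOUTH + BEARD,
        Pattern.CASE_INSENSITIVE);

    public static void main(String[] args) {
        // Show the assembled expression and try one sample.
        System.out.println(EMOTICON.pattern());
        System.out.println(EMOTICON.matcher("O:-P").matches());
    }
}
```

Without the COMMENTS flag the # inside the mouth set needs no escape, and the constant names take over the job of the comments.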
One question that remains is:
The pattern really just addresses strings that are *exactly* 2-7 chars
long. Do I understand right, that there's no way to automatically
detect a pattern ":-)" in the string " :-)" or ":-) " or
" :-) " directly???

You have already made a pattern that matches all 2-7 char smileys. Use
find() to find one after the other in a string of any length.

Do I always have to make a list of starting characters and then scan
for a 7 char string, a 6 length, a 5 length... until maybe one pattern
matched?
Karsten
PS: I'm really really happy :-D ATM
Would be great if someone could look over that last question I have; I
suspect it got somewhat overlooked due to the Chinese spammer...
Karsten
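To illustrate the find() suggestion: the input does not have to be exactly one emoticon, because find() scans the string for the next match wherever it starts. A minimal sketch, with a hypothetical class EmoticonScan and helper findAll, and the pattern assumed to be the one built earlier in this thread (nose optional, case-insensitive):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmoticonScan {
    // Assumed pattern: the one-line form from this thread, case-insensitive.
    static final Pattern EMOTICON = Pattern.compile(
        "[o>}\\])]?[:;8][',]?-?(?:[sdpco#@*$|]|\\)\\)?|\\(\\(?)=?",
        Pattern.CASE_INSENSITIVE);

    // Collects every emoticon found anywhere in the text, in order.
    static List<String> findAll(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = EMOTICON.matcher(text);
        while (m.find()) {          // each call resumes after the previous match
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findAll(" :-) "));
        System.out.println(findAll("I'm really really happy :-D ATM"));
    }
}
```

No list of starting characters and no loop over lengths 7, 6, 5... is needed; m.start() and m.end() give the position of each hit if it is wanted.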