how do I write a regex that looks for 'X' 'NOT Y' 'Z'
From: Bram Mertens (M8ram_at_linux.be)
Date: 03/30/04
- Next in thread: Harry Putnam: "Re: how do I write a regex that looks for 'X' 'NOT Y' 'Z'"
- Reply: Harry Putnam: "Re: how do I write a regex that looks for 'X' 'NOT Y' 'Z'"
To: perl beginners-digest mailing list <beginners@perl.org>
Date: Tue, 30 Mar 2004 15:34:52 +0200
Hi
I'm trying to write a rule for SpamAssassin that looks for the following
in a message:
"From: " followed by "anything BUT 'Mertens Bram' or 'Bram Mertens'"
followed by "<my_e-mail-address>"
So these two shouldn't trigger the rule:
From: Bram Mertens <my_e-mail-address>
From: Mertens Bram <my_e-mail-address>
But something like this should trigger it:
From: "optometric" <my_e-mail-address>
This rule catches the above:
/from\:\s\"optometric\"\s<my_e-mail-address>/i
But the rule needs to catch other fake names as well.
I've tried among others:
/From\:\s(?:(?:Bram\sMertens\s)|(?:Mertens\sBram\s))<my_e-mail-address>/i
/from\:\s(?!(?:Bram\sMertens\s)|(?:Mertens\sBram\s))<my_e-mail-address>/i
/from\:\s(?<!(?:Bram\sMertens\s)|(?:Mertens\sBram\s))my_e-mail-address>/i
/from\:\s(^(?:Bram\sMertens\s)|(?:Mertens\sBram\s))<my_e-mail-address>/i
/from\:\s[^(?:Bram\sMertens\s)|(?:Mertens\sBram\s)]<my_e-mail-address>/i
This partly works:
/from\:\s(?!(?:Bram\sMertens\s)|(?:Mertens\sBram\s)<my_e-mail-address>)/i
However, it only looks for "From: " NOT followed by "Bram Mertens
<my_e-mail-address>" or "Mertens Bram <my_e-mail-address>".
So it will also trigger on 'From: "jack" <jack@home.com>' or even on a
bare 'From: ', which is not what I want.
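
The behaviour described above can be reproduced with a minimal sketch. It uses Python's re module as a stand-in for Perl (the (?!...) lookahead syntax is the same in both), and <me@example.com> is a hypothetical placeholder for <my_e-mail-address>. The "stricter" pattern is only one possible variant, not a tested SpamAssassin rule: it keeps the lookahead directly after "From: " and then requires a bounded display name followed by the address. The 40-character limit is an arbitrary assumption.

```python
import re

# Hypothetical placeholder for <my_e-mail-address>; not from the original post.
ADDR = re.escape("<me@example.com>")

# The rule from the post: the negative lookahead is the last element of the
# pattern, so nothing after "From: " is actually required to be present.
partial = re.compile(
    r"from:\s(?!(?:Bram\sMertens\s)|(?:Mertens\sBram\s)" + ADDR + r")",
    re.I,
)

# Possible stricter variant (an assumption, not the poster's rule):
# 1. the lookahead rejects the two legitimate names,
# 2. [^<\n]{0,40} consumes a bounded display name,
# 3. the address itself must follow.
stricter = re.compile(
    r"from:\s(?!(?:Bram\sMertens|Mertens\sBram)\s*<)[^<\n]{0,40}" + ADDR,
    re.I,
)

headers = [
    "From: Bram Mertens <me@example.com>",
    "From: Mertens Bram <me@example.com>",
    'From: "optometric" <me@example.com>',
    'From: "jack" <jack@home.com>',
    "From: ",
]

for line in headers:
    print(repr(line), bool(partial.search(line)), bool(stricter.search(line)))
```

In this sketch the partial rule fires on the last three headers, while the stricter variant fires only on the "optometric" one.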
Somebody suggested using a rule like:
/From\:\s".*"\s*<my_e-mail-address>/i
and another rule to catch the 2 exceptions. But the .* means that the
parser might test the entire e-mail, making the test slow and heavy on
memory usage.
Something like:
/From\:\s".{0,20}"\s*<my_e-mail-address>/i
prevents this, but I'd like to know if there's a better solution.
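
The two-rule idea with the bounded quantifier could be sketched as follows. This is again Python standing in for Perl, with a hypothetical placeholder address; the helper name is_suspicious and the optional quotes around the legitimate names are assumptions made for illustration. Note that in both Perl (without /s) and Python (without re.S) the dot does not match a newline, so .* stops at the end of the current line.

```python
import re

# Hypothetical placeholder for <my_e-mail-address>; not from the original post.
ADDR = re.escape("<me@example.com>")

# Rule 1: a quoted display name of at most 20 characters before the address.
quoted_name = re.compile(r'From:\s".{0,20}"\s*' + ADDR, re.I)

# Rule 2: the two legitimate display names (the exceptions).
# Allowing optional quotes here is an assumption for this sketch.
own_name = re.compile(
    r'From:\s"?(?:Bram\sMertens|Mertens\sBram)"?\s*' + ADDR, re.I
)


def is_suspicious(header_line):
    """Flag the line only if rule 1 matches and rule 2 does not."""
    return bool(quoted_name.search(header_line)) and not own_name.search(header_line)
```

One limitation of the bounded form is visible here: a fake display name longer than 20 characters slips past rule 1, so the limit has to be chosen with that in mind.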
Perhaps testing against some characters, or character-combinations that
don't exist in 'Bram Mertens' or 'Mertens Bram'?
Is there a way to test how (in)efficient or demanding a certain rule is?
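
One rough way to compare rules is to time them against a synthetic worst-case input. The sketch below does this with Python's timeit module; in Perl, the core Benchmark module plays a similar role. The sample text, the repeat count, and the helper name time_rule are all assumptions, and the numbers depend on the regex engine, so they only hint at how a rule behaves inside SpamAssassin.

```python
import re
import timeit

# Hypothetical placeholder for <my_e-mail-address>; not from the original post.
ADDR = re.escape("<me@example.com>")

unbounded = re.compile(r'From:\s".*"\s*' + ADDR, re.I)
bounded = re.compile(r'From:\s".{0,20}"\s*' + ADDR, re.I)

# Synthetic input: an opening quote followed by a long run of text and no
# closing quote, so both rules have to give up without matching.
sample = 'From: "' + "x" * 5000 + "\n"


def time_rule(pattern, text, repeats=200):
    """Return the total seconds taken by `repeats` searches of `text`."""
    return timeit.timeit(lambda: pattern.search(text), number=repeats)


timings = {
    "unbounded": time_rule(unbounded, sample),
    "bounded": time_rule(bounded, sample),
}

for name, seconds in timings.items():
    print(f"{name}: {seconds:.6f} s for 200 searches")
```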
(Sorry for the long post.)
TIA
-- 
# Mertens Bram "M8ram" <M8ram@linux.be> Linux User #349737 #
# SuSE Linux 8.2 (i586) kernel 2.4.20-4GB i686 256MB RAM #
# 3:16pm up 8 days 18:53, 8 users, load average: 0.09, 0.19, 0.10 #