How do I write a regex that looks for 'X' 'NOT Y' 'Z'?

From: Bram Mertens (M8ram_at_linux.be)
Date: 03/30/04


To: perl beginners-digest mailing list <beginners@perl.org>
Date: Tue, 30 Mar 2004 15:34:52 +0200

Hi

I'm trying to write a rule for SpamAssassin that looks for the following
in a message:
"From: " followed by "anything BUT 'Mertens Bram' or 'Bram Mertens'"
followed by "<my_e-mail-address>"

So these two shouldn't trigger the rule:
From: Bram Mertens <my_e-mail-address>
From: Mertens Bram <my_e-mail-address>

But something like this should trigger it:
From: "optometric" <my_e-mail-address>

This rule catches the above:
/from\:\s\"optometric\"\s<my_e-mail-address>/i

But the rule needs to catch other fake names as well.

I've tried, among others:
/From\:\s(?:(?:Bram\sMertens\s)|(?:Mertens\sBram\s))<my_e-mail-address>/i
/from\:\s(?!(?:Bram\sMertens\s)|(?:Mertens\sBram\s))<my_e-mail-address>/i
/from\:\s(?<!(?:Bram\sMertens\s)|(?:Mertens\sBram\s))my_e-mail-address>/i
/from\:\s(^(?:Bram\sMertens\s)|(?:Mertens\sBram\s))<my_e-mail-address>/i
/from\:\s[^(?:Bram\sMertens\s)|(?:Mertens\sBram\s)]<my_e-mail-address>/i

This partly works:
/from\:\s(?!(?:Bram\sMertens\s)|(?:Mertens\sBram\s)<my_e-mail-address>)/i

However, this only looks for "From: " NOT followed by "Bram Mertens
<my_e-mail-address>" or "Mertens Bram <my_e-mail-address>".

So it will also trigger on 'From: "jack" <jack@home.com>' or even on a
bare 'From: ', which is not what I want.
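
One possible way around this is to keep the address outside the lookahead, so the lookahead only vetoes the two real names and the address must still follow. A minimal Perl sketch, assuming a placeholder address me@example.org, a hypothetical helper is_fake_from, and an arbitrary 60-character cap on the display name (none of these come from the rules above):

```perl
use strict;
use warnings;

# Placeholder standing in for <my_e-mail-address>; quotemeta escapes '@' and '.'.
my $addr = quotemeta('me@example.org');

# The lookahead sits directly after "From:" and includes the leading
# whitespace, so the engine cannot skip past it by backtracking.
# [^<\n]{0,60} then consumes whatever display name is present (the cap
# of 60 is an assumption), and the address must follow.
my $fake_from = qr/^From:(?!\s*(?:Bram\sMertens|Mertens\sBram)\s*<)[^<\n]{0,60}<$addr>/i;

# Hypothetical helper: returns 1 if the header should trigger the rule.
sub is_fake_from {
    my ($header) = @_;
    return $header =~ $fake_from ? 1 : 0;
}

for my $header (
    'From: Bram Mertens <me@example.org>',
    'From: Mertens Bram <me@example.org>',
    'From: "optometric" <me@example.org>',
    'From: "jack" <jack@home.com>',
    'From: ',
) {
    printf "%-40s %s\n", $header, is_fake_from($header) ? 'trigger' : 'ok';
}
```

With this sketch only the "optometric" header triggers; the two real names, the foreign address and the bare 'From: ' are left alone.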

Somebody suggested using a rule like:
/From\:\s".*"\s*<my_e-mail-address>/i

and another rule to catch the two exceptions. But the .* means that the
parser might scan the entire e-mail, making the test slow and heavy on
memory usage.
Something like:
/From\:\s".{0,20}"\s*<my_e-mail-address>/i
prevents this, but I'd like to know if there's a better solution.
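
One alternative to both .* and an arbitrary {0,20} limit might be a negated character class, which stops at the closing quote on its own. A minimal Perl sketch, assuming the placeholder address me@example.org and a hypothetical helper has_quoted_name:

```perl
use strict;
use warnings;

# Placeholder standing in for <my_e-mail-address>.
my $addr = quotemeta('me@example.org');

# [^"\n]* can never run past the closing quote or the end of the line,
# so the quoted name may have any length without the match wandering off.
my $quoted_name = qr/^From:\s"[^"\n]*"\s*<$addr>/i;

# Hypothetical helper: returns 1 if the header has a quoted display name
# followed by the placeholder address.
sub has_quoted_name {
    my ($header) = @_;
    return $header =~ $quoted_name ? 1 : 0;
}

print has_quoted_name('From: "optometric" <me@example.org>'), "\n";
print has_quoted_name('From: "a display name longer than twenty characters" <me@example.org>'), "\n";
print has_quoted_name('From: Bram Mertens <me@example.org>'), "\n";
```

Like the suggested rule, this only matches names written in double quotes, so it would still rely on the sender quoting the fake name.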

Perhaps testing against some characters, or character-combinations that
don't exist in 'Bram Mertens' or 'Mertens Bram'?

Is there a way to test how (in)efficient or demanding a certain rule is?
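
One way to get a rough answer in plain Perl is the core Benchmark module, which can time several patterns against the same input. A minimal sketch, assuming a placeholder address and a made-up sample header with a long quoted name; the numbers only say something about this sample, not about SpamAssassin as a whole:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Placeholder standing in for <my_e-mail-address>.
my $addr = quotemeta('me@example.org');

# The two rules from the post plus a negated-class variant for comparison.
my %rules = (
    dot_star => qr/From\:\s".*"\s*<$addr>/i,
    bounded  => qr/From\:\s".{0,20}"\s*<$addr>/i,
    negated  => qr/From\:\s"[^"\n]*"\s*<$addr>/i,
);

# Made-up sample: a long quoted name that does NOT end in our address,
# so every rule has to do its work before giving up.
my $sample = 'From: "' . ('x' x 200) . '" <someone@example.com>';

# Wrap each rule in a closure so cmpthese can call it repeatedly.
my %tests;
for my $name (keys %rules) {
    my $re = $rules{$name};
    $tests{$name} = sub { return scalar($sample =~ $re) };
}

# Run each closure 100_000 times and print a comparison table.
cmpthese(100_000, \%tests);
```

For a look at what the regex engine actually does with a single pattern, Perl's `use re 'debug';` pragma prints the compiled program and the matching steps.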

(Sorry for the long post.)

TIA

-- 
# Mertens Bram "M8ram"   <M8ram@linux.be>          Linux User #349737 #
# SuSE Linux 8.2 (i586)     kernel 2.4.20-4GB      i686     256MB RAM #
#  3:16pm  up 8 days 18:53,  8 users,  load average: 0.09, 0.19, 0.10 #

