Re: traversing a variable with regex instead of a file

From: Angie Ahl (angie_at_vertebrate.co.uk)
Date: 10/10/03


Date: Fri, 10 Oct 2003 16:51:54 +0100
To: James Edward Gray II <james@grayproductions.net>,     Perl List <beginners@perl.org>

on 2003-10-10 James Edward Gray II said:

>Keep your replies on the list, so you can get help from all the people
>smarter than me. ;)

If there are people smarter than you out there I must be an amoeba ;)

>Okay, why put this inside an if block. If it doesn't find a match it
>will fail and do nothing, which is what you want, right? I don't think
>you need the if.

Good point.

>Why don't we work on your Regular Expression a little and see if we can
>do it all in one move. We want to find all occurrences of the keyword,
>as long as they're not on a line beginning with qz, right? This seems
>to do that for me:
>
>$content =~ s/^([^\n]*)($kw)/substr($1, 0, 2) ne 'qz' ? "$1\n$2\n" :
>"$1$2"/mge;
>

Ok. I had to stop to pick myself up off the floor then. WOW.

This has actually made it possible to cut the whole thing down massively.

Here's the code now:

_________________________
# get line breaks to make <br>'s at the end
$content =~ s/\n/-qbr-/g;

# find markup and add markers so it doesn't get processed by regex,
# no keyword links to be made inside other tags
$content =~ s/(\[(img|page|link|mp3)=.*?\])/\nqz$1\n/g;

# find HTML so it doesn't get processed by regex,
# no keyword links to be made inside valid HTML
$content =~ s/(<.*?>)/\nqz$1\n/g;

for my $href ( @Keywords ) {
    
    # get each keyword and look for it in content.
    for my $kw ( keys %$href ) {
        # quick whole-word check before the heavier substitution
        if ($content =~ /\b$kw\b/) {
            
            # do the very clever reg with help from and thanks to
            # james@grayproductions.net
            $content =~ s/^([^\n]*)($kw)/substr($1, 0, 2) ne 'qz' ?
"$1\nqz[link=\"$href->{$kw}\" title=\"$2\"]\n" : "$1$2"/mge;
        }
    }
}

# clean up those line breaks and markers;
$content =~ s/\n(qz)?//g;

# put in <br>'s
$content =~ s/-qbr-/<br>\n/g;

print $content;
_________________________

As you can see I've adapted your regex a little to put in the full markup around
the keyword.
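
For anyone who wants to try it, here is a minimal sketch that wraps the steps above in a sub and runs them on made-up sample text. The sub name link_keywords, the sample keyword and the sample URL are assumptions for illustration only; the regexes are the ones from the code above.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the steps above so they can be run on sample text.
sub link_keywords {
    my ($content, @Keywords) = @_;

    $content =~ s/\n/-qbr-/g;                                  # remember line breaks
    $content =~ s/(\[(img|page|link|mp3)=.*?\])/\nqz$1\n/g;    # protect markup
    $content =~ s/(<.*?>)/\nqz$1\n/g;                          # protect HTML

    for my $href (@Keywords) {
        for my $kw (keys %$href) {
            # only lines that do not start with the qz marker get a link
            $content =~ s/^([^\n]*)($kw)/substr($1, 0, 2) ne 'qz' ?
                "$1\nqz[link=\"$href->{$kw}\" title=\"$2\"]\n" : "$1$2"/mge;
        }
    }

    $content =~ s/\n(qz)?//g;        # drop the markers again
    $content =~ s/-qbr-/<br>\n/g;    # restore line breaks as <br>
    return $content;
}

# Made-up sample data: one keyword, once in plain text and once inside a tag.
my @Keywords = ( { 'Perl' => '/pages/perl' } );
my $sample   = "I like Perl.\nSee <a href=\"Perl.html\">this</a>.";
print link_keywords($sample, @Keywords), "\n";
```

With this sample the "Perl" in the sentence becomes a [link=...] tag while the "Perl" inside the href is left alone. The keyword is interpolated into the pattern as a regex, so a keyword containing characters such as "+" or "(" would need \Q$kw\E.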

The regex itself made perfect sense; it was the "" ? "" : "" bit that I've
never seen before. That's really useful.

I assume it means

"if statement" ? "do if true" : "do if false"

Please do correct me if I'm wrong. What do you call that? I think I'm going to
be using that quite a bit ;)

Do I even need the "if false" bit in this case?
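
A small sketch of that construct (Perl's conditional operator, CONDITION ? VALUE_IF_TRUE : VALUE_IF_FALSE) and of what happens to the "if false" half inside s///e. The strings and variable names are made up for illustration.

```perl
use strict;
use warnings;

# The conditional operator picks one of two values.
my $line = "qz[img=cat.jpg]";
my $kind = substr($line, 0, 2) eq 'qz' ? 'protected' : 'plain';
print "$kind\n";

# Inside s///e the value of the expression replaces the WHOLE match,
# so the "false" half has to hand the matched text back unchanged.
my $text = "qzfoo\nbar foo";
(my $kept = $text) =~ s/^([^\n]*)(foo)/substr($1, 0, 2) ne 'qz' ? "$1<$2>" : "$1$2"/mge;
(my $lost = $text) =~ s/^([^\n]*)(foo)/substr($1, 0, 2) ne 'qz' ? "$1<$2>" : ""/mge;

print "$kept\n";    # the qz line survives untouched
print "$lost\n";    # the qz line has been deleted
```

So in this case the false half does seem to be needed: with an empty string there, every protected line that contains the keyword would be wiped out.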

>I used the /e modifier for the replacement, which allows me to use Perl
>code in there. It's pretty simple. If the line didn't start with a
>qz, we do a normal replace.
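
A minimal sketch of what /e changes, using a made-up string: without it the replacement is treated as text, with it the replacement is run as Perl code and its result is inserted.

```perl
use strict;
use warnings;

my $prices = "apples 3, pears 5";

# With /e the right-hand side is evaluated as code for every match.
(my $doubled = $prices) =~ s/(\d+)/$1 * 2/ge;

# Without /e the same right-hand side is just an interpolated string.
(my $literal = $prices) =~ s/(\d+)/$1 * 2/g;

print "$doubled\n";    # apples 6, pears 10
print "$literal\n";    # apples 3 * 2, pears 5 * 2
```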

That's going in my BBEdit gold dust code snippets glossary.

>Let me know if that will work for you.

It did, perfectly. Thank you soooooo much.

>You're right about it being inefficient, of course. It was easier to
>read than my Regex though, eh? <laughs>

Are you implying that regex isn't easy to read? ;)

>The first choice may be slow,
>but on modern computers they may both work in the blink of an eye.
>Save worrying about speed for when you need to and try and keep your
>life as a programmer as easy as possible until then.

Sadly then is now. That's why I joined up to this list today ;)

This code will be run on every single page of a website, in one go. So it needs
to be as efficient as physically possible. The site will only be a few hundred
pages, and not all pages will always be processed. It's a system that makes its
own links and maintains them, so every time a page's keywords change this has to
be done to all pages that contain that keyword.
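
One possible way to cut the work per page, sketched under assumptions rather than measured: instead of one substitution per keyword, build a single alternation from all keywords and touch only the text between markup. The sub name link_keywords_single_pass is hypothetical, and unlike the code above this version links every whole-word occurrence outside markup and leaves the <br> handling out.

```perl
use strict;
use warnings;

# Hypothetical single-pass variant: one combined regex for all keywords.
sub link_keywords_single_pass {
    my ($content, @keywords) = @_;

    # Merge every keyword => URL pair into one lookup table.
    my %url_for;
    %url_for = ( %url_for, %$_ ) for @keywords;
    return $content unless %url_for;

    # Longest keywords first so a longer phrase wins over a shorter one;
    # quotemeta keeps regex metacharacters in keywords literal.
    my $alternation = join '|',
        map  { quotemeta($_) }
        sort { length($b) <=> length($a) || $a cmp $b } keys %url_for;
    my $kw_re = qr/\b($alternation)\b/;

    # split with a capturing group keeps the delimiters, so the
    # odd-numbered pieces are markup or HTML and are left untouched.
    my @pieces = split /(\[(?:img|page|link|mp3)=.*?\]|<.*?>)/, $content;
    for my $i (0 .. $#pieces) {
        next if $i % 2;
        $pieces[$i] =~ s/$kw_re/[link="$url_for{$1}" title="$1"]/g;
    }
    return join '', @pieces;
}

# Made-up sample data.
print link_keywords_single_pass(
    "I like Perl.\nSee <a href=\"Perl.html\">this</a>.",
    { 'Perl' => '/pages/perl' },
), "\n";
```

Whether this is actually faster for a few hundred pages would need timing with real content, for example with the Benchmark module.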

I know this is not a task for the beginner, but this is actually version 3 of
the code. My old programming language started to show its dislike for regex.

>> If you have any suggestions I would be most grateful to hear them.
>
>Those are my best shots. Hope they help.

They did, thank you so much.

Angie


