Re: text processing problem




Maurice LING wrote:
> Matt wrote:
> > I'd HIGHLY suggest purchasing the excellent "Mastering Regular
> > Expressions" by Jeff Friedl
> > (http://www.oreilly.com/catalog/regex2/index.html). Although it's
> > mostly geared towards Perl, it will answer all your questions about
> > regular expressions. If you're going to work with regexes, this is a
> > must-have.
> >
> > That being said, here's what the new regular expression should be,
> > with a bit of instruction (in the spirit of teaching someone to fish
> > after giving them a fish ;-) )
> >
> > my_expr = re.compile(r'(\w+)\s*(\(\1\))')
> >
> > Note the "\s*" in place of the single space " ". The "\s" means "any
> > whitespace character" (equivalent to [ \t\n\r\f\v]). The "*"
> > following it means "0 or more occurrences". So this will now match:
> >
> > "there (there)"
> > "there  (there)"
> > "there(there)"
> > "there     (there)"
> > "there\t(there)" (tab)
> > "there\t\t\t\t\t\t\t\t\t\t\t\t(there)"
> > etc.
> >
> > Hope that's helpful. Pick up the book!
> >
> > M@
> >
>
> Thanks again. I've read a number of tutorials on regular expressions,
> but it's something I hardly used in the past, so I've gone far too
> rusty.
>
> Before my post, I tried
> my_expr = re.compile(r'(\w+) \s* (\(\1\))') instead, but it doesn't
> work, so I'm a bit stumped......
>
> Thanks again,
> Maurice

Maurice,
The reason your regex failed is that you have literal spaces around the
"\s*". This translates to "one space, followed by zero or more
whitespace characters, followed by one space". So your regex would only
match when the two text elements are separated by at least two
whitespace characters, the first and last of which are plain spaces.
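
If you want to see the difference for yourself, here is a rough sketch
that runs both patterns against a few test strings. The names
"flexible", "strict", "samples" and "matches" are just ones I made up
for the illustration; only the two patterns come from this thread.

```python
import re

# Pattern from my earlier reply: zero or more whitespace characters
# between the word and its parenthesised repeat.
flexible = re.compile(r'(\w+)\s*(\(\1\))')

# Your attempt: a literal space, then \s*, then another literal space.
strict = re.compile(r'(\w+) \s* (\(\1\))')

# Hypothetical test strings (illustration only).
samples = [
    "there(there)",       # no whitespace
    "there (there)",      # one space
    "there  (there)",     # two spaces
    "there\t(there)",     # one tab
    "there \t (there)",   # space, tab, space
]


def matches(pattern, text):
    """Return True if the pattern is found anywhere in text."""
    return pattern.search(text) is not None


# Map each sample to (flexible result, strict result).
results = {s: (matches(flexible, s), matches(strict, s)) for s in samples}

for s, (flex_ok, strict_ok) in results.items():
    print(f"{s!r:22} flexible={flex_ok} strict={strict_ok}")
```

The "\s*" version matches all five strings, while the spaced-out version
only matches the two-space and the space-tab-space cases.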

This kind of demonstrates why regular expressions can drive you nuts.

I still suggest picking up the book; not because Jeff Friedl drove a
dump truck full of money up to my door, but because it specifically has
a use case like yours. So you get to learn & solve your problem at the
same time!

HTH,
M@
