Re: Matching neighbouring words of a pattern using Regex

From: Gunnar Hjalmarsson (noreply_at_gunnar.cc)
Date: 08/31/04


Date: Tue, 31 Aug 2004 00:09:34 +0200


[ Reply not posted to the defunct group comp.lang.perl ]

CV wrote:
> How can I match 'n' number of neighbouring words of a pattern using
> regular expressions?
>
> For example, suppose I am looking for the pattern "length xyz cm"
> in some text. where xyz is a number - integer or fraction or
> decimal point. How can I also grab about 3-5 words on either side
> of the pattern "length xyz cm"? The surrounding words are not
> always constant & may be variable. Also, the original text to be
> matched is not just a single sentence, but lines from a file
> concatenated together - so the text has many newline characters
> too. I only want the words on the same line as the pattern.
>
> I have tried using regex of the form
> /\b(\w*)\b(\w*)\b(\w*)\b($pattern)\b(\w*)\b(\w*)\b(\w*), but this
> doesn't work for some reason.

It doesn't work for several reasons, such as:

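- No space characters: nothing in the regex matches the whitespace
between the words.
- '\w*\b\w*' can never capture two separate words. \b is a zero-width
assertion, so the second \w* starts exactly where the first one
stopped (check out the description of \b in "perldoc perlre").
- The \w character class matches only alphanumeric characters and '_'.
It does not include e.g. the '$' character, while you mentioned that
a "word" may be a variable.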
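
A minimal sketch of how \b, \w and \S behave here. The sample text and the variable names are made up for the illustration and are not from the original question:

```perl
use strict;
use warnings;

# Sample text invented for this illustration.
my $text = 'total length 12.5 cm approx';

# \b is zero-width: the second group starts where the first one ended,
# so it cannot reach the next word.
my @no_space = $text =~ /\b(\w*)\b(\w*)/;
# @no_space is ('total', '')

# Matching the spaces explicitly lets the second group capture a word.
my @with_space = $text =~ /\b(\w+) +(\w+)\b/;
# @with_space is ('total', 'length')

# \w does not match '$', whereas \S matches any non-whitespace character.
my $w_ok = '$len' =~ /^\w+$/ ? 1 : 0;   # 0
my $s_ok = '$len' =~ /^\S+$/ ? 1 : 0;   # 1

print "no_space: [@no_space]\n";
print "with_space: [@with_space]\n";
print "\\w matches: $w_ok, \\S matches: $s_ok\n";
```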

> Could someone please offer some suggestions?

Try something like this:

     /((?:\S+ +){0,3})\b($pattern)\b((?: +\S+){0,3})/
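
For illustration, here is how that regex might be applied to multi-line text. The definition of $pattern and the sample text are assumptions (the question did not show them), so treat this as a sketch rather than a finished solution:

```perl
use strict;
use warnings;

# Assumed sub-pattern for "length xyz cm", where xyz is an integer,
# a decimal number or a fraction. Adjust to the real data.
my $pattern = qr{length \d+(?:\.\d+|/\d+)? cm};

# Invented sample: several lines concatenated into one string.
my $text = "the bar has a total length 12.5 cm when measured at rest\n"
         . "next line length 3/4 cm\n"
         . "end";

my @hits;
while ($text =~ /((?:\S+ +){0,3})\b($pattern)\b((?: +\S+){0,3})/g) {
    my ($before, $match, $after) = ($1, $2, $3);
    # split ' ' discards leading whitespace and splits on runs of spaces.
    push @hits, [ [ split ' ', $before ], $match, [ split ' ', $after ] ];
}

for my $hit (@hits) {
    my ($before, $match, $after) = @$hit;
    print "before: [@$before] match: [$match] after: [@$after]\n";
}
# Prints:
# before: [has a total] match: [length 12.5 cm] after: [when measured at]
# before: [next line] match: [length 3/4 cm] after: []
```

Since the regex uses a literal space and \S+ rather than \s, neither side can cross a newline, so only words on the same line as the pattern are captured. Change {0,3} to {0,5} to grab up to five words on each side.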

-- 
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl

