Loop over regexp groups



Hello,

I am matching a regexp with an a priori unknown number of groups. I would
like to loop over all groups that were matched. For example:

/(\w+)\s(\w+)/ ;
#or
/(\w+)\s(\w+)\s(\w+)/ ;
# or something else

@groups = ...???

for( @groups ) {
    process_match( $_ ) ;
}

Of course, the above example is simplifying reality and could be replaced
by split(). Here are more details on the problem:

I am processing protein sequence files in the FASTA format. Depending on
the database, the FASTA headers may look like this:

O81231 (Q81999) Dehydrogenase alpha subunit

or like this

O81231 123 Q81999

or

gi|O81231||li|Q81999

or, possibly,

O81231; synonyms: Q81999, P89812, O77781

or, basically, anything else. As you might guess, I'm interested in the
"O81231" or "Q81999" part. The idea is that my utility can take an
optional "regexp" string that matches the type of headers that are found in
a given database; while looping through the database, the regexp is
matched, and entries are made for any of the synonymous identifiers found
in one header.

Currently, I am assuming that I will not find more than four synonyms, and
I do the following:

for( $1, $2, $3, $4 ) {
    last unless $_ ;
    process_match( $_ ) ;
}

...which is, of course, crap.

Thanks in advance,
January

P.S. No, ([A-Z]\d{5}) would not match every identifier; the ID format can
differ as well. Sometimes it is HBA_HUMAN.
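
For reference, here is a minimal sketch of one way the loop might be
written without hard-coding $1..$4. It relies on the fact that a match in
list context (without /g) returns the captured groups as a list. The helper
name extract_ids and the two example patterns are hypothetical, made up
for illustration; only the header lines come from the examples above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original utility): return every
# group that took part in the match, however many the pattern defines.
sub extract_ids {
    my ( $regexp, $header ) = @_ ;

    # In list context, a match without /g returns ($1, $2, ...).
    # A failed match returns the empty list.
    # Caveat: a pattern with no capture groups at all returns (1) on success.
    my @groups = $header =~ $regexp ;

    # Optional groups that did not participate come back as undef; drop
    # them, and drop empty captures too. Unlike "last unless $_", this
    # keeps a captured "0".
    return grep { defined($_) && length($_) } @groups ;
}

# Assumed example pattern for headers like "O81231 (Q81999) Dehydrogenase ..."
my $regexp = qr/^(\w+)\s+\((\w+)\)/ ;

for my $id ( extract_ids( $regexp, 'O81231 (Q81999) Dehydrogenase alpha subunit' ) ) {
    print "$id\n" ;    # stand-in for process_match( $id )
}
```

If the groups have to be inspected after the match rather than collected
from its return value, $#+ holds the number of groups in the last
successful match, so a loop over 1 .. $#+ with substr() on @- and @+ would
be another option.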
