regex help
From: Andrew Gaffney (agaffney_at_skylineaero.com)
Date: 05/28/04
- Next message: Andrew Gaffney: "Re: weird math"
- Previous message: Charles K. Clarkson: "RE: max number of data within an array"
- Next in thread: Roberto Etcheverry: "Re: regex help"
- Reply: Roberto Etcheverry: "Re: regex help"
Date: Fri, 28 May 2004 14:38:29 -0500 To: beginners <beginners@perl.org>
I'm trying to write a regex to parse the following data. Each group of six <td>
cells is one string to parse.
<td class="f3" colspan="2" width="48">05/28/04</td>
<td class="f3" colspan="2" width="60"></td>
<td class="f3" colspan="2" width="186">Purchase With Pin Pin</td>
<td class="f3" colspan="2" align="right" width="78"></td>
<td class="f3" colspan="2" align="right" width="78">$10.00<br>(pending)<a
href='javascript: ShowHelp("PENDING TRANSACTION")'><img
src="usbank_files/help.gif" valign="middle" alt="Pending Transaction Help"
border="0"></a></td>
<td class="f3" align="right">$1,224.45</td>

<td class="f3" colspan="2" width="48">05/27/04</td>
<td class="f3" colspan="2" width="60"></td>
<td class="f3" colspan="2" width="186">Purchase With Pin Shell Service Stlake
St. Loumo</td>
<td class="f3" colspan="2" align="right" width="78"></td>
<td class="f3" colspan="2" align="right" width="78">$1.78</td>
<td class="f3" align="right">$1,234.45</td>

<td class="f3" colspan="2" width="48">05/21/04</td>
<td class="f3" colspan="2" width="60"></td>
<td class="f3" colspan="2" width="186">Atm Withdrawal One O'fallon Squo'fallon
Mo 1</td>
<td class="f3" colspan="2" align="right" width="78"></td>
<td class="f3" colspan="2" align="right" width="78">$20.00</td>
<td class="f3" align="right"><a href='javascript:
ShowHelp("NOTE","RESTRICTEDFUNDSAMOUNT=$2.00","AVAILABLETRANSACTIONAMOUNT=$1,132.79")'>$1,134.79</a></td>
This is the regex I put together:
my $regex = '<td[^>]+?>(\d{2})/(\d{2})/(\d{2})</td>.+?';
$regex .= '<td[^>]+?>(.*?)</td>.+?';
$regex .= '<td[^>]+?>(.+?)</td>.+?';
$regex .= '<td[^>]+?>(?:\$(\d+\.\d{2})).*?</td>.+?';
$regex .= '<td[^>]+?>(?:\$(\d+\.\d{2})).*?</td>.+?';
$regex .= '<td[^>]+?>.*?(?:\$(\d+\.\d{2})).*?</td>';
The first field will always be in the form 'mm/dd/yy'. The second and third
fields need to be captured as they are. Of the fourth and fifth fields, only
one will contain a value; the other will be empty (nothing between <td> and
</td>). The format is '$123.45', possibly followed by HTML before the </td>,
and I only want the number without the $. The sixth field will contain a
dollar amount like the fourth and fifth fields, and it may be surrounded by
HTML. Again, I only need the number without the $. What is wrong with the above
regex? I am using it with the 's' modifier.
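
For reference, here is a minimal sketch of one way the requirements above could
be met. It is not the regex from this message but a different approach: pull
out the six cells first, then parse each one separately. Three things in the
sample data seem to trip up a single pattern. An empty fourth or fifth cell has
no '$' for a mandatory '\$' to match, '\d+' cannot match an amount with a comma
such as '1,224.45', and a lazy '.*?' in front of '\$' in the sixth cell can
stop at the '$2.00' inside the href attribute. The names cell_amount and
parse_row and the hash keys are hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: first dollar amount in a cell's visible text,
# without the $ or thousands commas; undef when the cell has none.
sub cell_amount {
    my ($cell) = @_;
    # Drop tags first, so amounts inside href='...' are ignored.
    (my $text = $cell) =~ s/<[^>]*>//g;
    # \s* tolerates a line wrap right after the $; [\d,]+ allows "1,224".
    # Explicit undef so the hash constructor below always gets one value.
    return undef unless $text =~ /\$\s*([\d,]+\.\d{2})/;
    (my $amount = $1) =~ tr/,//d;
    return $amount;
}

# Hypothetical helper: split one group into its six cells and parse each.
sub parse_row {
    my ($row) = @_;
    # Content of every <td>...</td>; /s lets . cross wrapped lines.
    my @cells = $row =~ m{<td\b[^>]*>(.*?)</td>}gis;
    return unless @cells == 6;
    my ($mm, $dd, $yy) = $cells[0] =~ m{^\s*(\d{2})/(\d{2})/(\d{2})\s*$}
        or return;
    return {
        date   => "$mm/$dd/$yy",
        second => $cells[1],               # captured as is
        third  => $cells[2],               # captured as is
        fourth => cell_amount($cells[3]),  # undef when the cell is empty
        fifth  => cell_amount($cells[4]),
        sixth  => cell_amount($cells[5]),
    };
}

# Usage with the third group from the sample data.
my $row = <<'HTML';
<td class="f3" colspan="2" width="48">05/21/04</td>
<td class="f3" colspan="2" width="60"></td>
<td class="f3" colspan="2" width="186">Atm Withdrawal One O'fallon Squo'fallon
Mo 1</td>
<td class="f3" colspan="2" align="right" width="78"></td>
<td class="f3" colspan="2" align="right" width="78">$20.00</td>
<td class="f3" align="right"><a href='javascript:
ShowHelp("NOTE","RESTRICTEDFUNDSAMOUNT=$2.00","AVAILABLETRANSACTIONAMOUNT=$1,132.79")'>$1,134.79</a></td>
HTML

my $r = parse_row($row) or die "row did not parse\n";
printf "%s fifth=%s sixth=%s\n", $r->{date}, $r->{fifth}, $r->{sixth};
# prints: 05/21/04 fifth=20.00 sixth=1134.79
```

Stripping the tags before looking for the amount is a design choice: it keeps
the amount pattern short and avoids matching anything inside an attribute. For
HTML that is less regular than this sample, a real HTML parser is usually the
safer tool.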
-- Andrew Gaffney Network Administrator Skyline Aeronautics, LLC. 636-357-1548