Re: regex help

From: Andrew Gaffney (agaffney_at_skylineaero.com)
Date: 05/29/04


Date: Fri, 28 May 2004 19:38:43 -0500
To: Roberto Etcheverry <rob@arkham.com.ar>

Roberto Etcheverry wrote:
>
> On Fri, 28 May 2004, Andrew Gaffney wrote:
>
>
>>I'm trying to write a regex to parse the following data. Each group is a string
>>to parse.
>>
>><td class="f3" colspan="2" width="48">05/28/04</td>
>><td class="f3" colspan="2" width="60"></td>
>><td class="f3" colspan="2" width="186">Purchase With Pin Pin</td>
>><td class="f3" colspan="2" align="right" width="78"></td>
>><td class="f3" colspan="2" align="right" width="78">$10.00<br>(pending)<a
>>href='javascript: ShowHelp("PENDING TRANSACTION")'><img
>>src="usbank_files/help.gif" valign="middle" alt="Pending Transaction Help"
>>border="0"></a></td>
>><td class="f3" align="right">$1,224.45</td>
>>
>><td class="f3" colspan="2" width="48">05/27/04</td>
>><td class="f3" colspan="2" width="60"></td>
>><td class="f3" colspan="2" width="186">Purchase With Pin Shell Service Stlake
>>St. Loumo</td>
>><td class="f3" colspan="2" align="right" width="78"></td>
>><td class="f3" colspan="2" align="right" width="78">$1.78</td>
>><td class="f3" align="right">$1,234.45</td>
>>
>><td class="f3" colspan="2" width="48">05/21/04</td>
>><td class="f3" colspan="2" width="60"></td>
>><td class="f3" colspan="2" width="186">Atm Withdrawal One O'fallon Squo'fallon
>>Mo 1</td>
>><td class="f3" colspan="2" align="right" width="78"></td>
>><td class="f3" colspan="2" align="right" width="78">$20.00</td>
>><td class="f3" align="right"><a href='javascript:
>>ShowHelp("NOTE","RESTRICTEDFUNDSAMOUNT=$2.00","AVAILABLETRANSACTIONAMOUNT=$1,132.79")'>$1,134.79</a></td>
>>
>>This is the regex I put together:
>>
>> my $regex = '<td[^>]+?>(\d{2})/(\d{2})/(\d{2})</td>.+?';
>> $regex .= '<td[^>]+?>(.*?)</td>.+?';
>> $regex .= '<td[^>]+?>(.+?)</td>.+?';
>> $regex .= '<td[^>]+?>(?:\$(\d+\.\d{2})).*?</td>.+?';
>> $regex .= '<td[^>]+?>(?:\$(\d+\.\d{2})).*?</td>.+?';
>> $regex .= '<td[^>]+?>.*?(?:\$(\d+\.\d{2})).*?</td>';
>>
>>The first field will always be in the form 'mm/dd/yy'. The second and third
>>fields need to be captured as they are. As for the fourth and fifth fields, only
>>one will contain a value. The other one will be empty (nothing between
>><td></td>). The format is '$123.45' with the possibility of trailing HTML before
>>the </td>. I only want the number without the $. The sixth field will contain a
>>dollar amount like the fourth and fifth fields. It could be surrounded by HTML.
>>Again, I only need the number without the $. What is wrong with the above regex?
>>I am using it with the 's' modifier.
>>
> It seems two things are missing:
>
> 1) A '?' after the 4th and 5th groups (because they may be empty).
> 2) Include ',' in the regex matching the amounts (to match '1,234.45' for
> example).
>
> So the regex would be:
>
> my $regex = '<td[^>]+?>(\d{2})/(\d{2})/(\d{2})</td>.+?';
> $regex .= '<td[^>]+?>(.*?)</td>.+?';
> $regex .= '<td[^>]+?>(.+?)</td>.+?';
> $regex .= '<td[^>]+?>(?:\$([\d,]+\.\d{2}))?.*?</td>.+?';
> $regex .= '<td[^>]+?>(?:\$([\d,]+\.\d{2}))?.*?</td>.+?';
> $regex .= '<td[^>]+?>.*?(?:\$([\d,]+\.\d{2})).*?</td>';

Ah, thank you. Those changes worked.

-- 
Andrew Gaffney
Network Administrator
Skyline Aeronautics, LLC.
636-357-1548
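
A minimal Perl sketch of how the corrected pattern might be applied to the sample, with one caveat. In the third sample row the last cell also contains '$2.00' and '$1,132.79' inside the ShowHelp(...) argument, and the lazy '.*?' in front of the final '\$' in Roberto's last line stops at the first of these, so that row's sixth capture comes out as '2.00' rather than the displayed '1,134.79'. The sketch keeps his first five fields as posted and swaps in an adjusted sixth-field pattern of its own (the amount may only be followed by tags and then '</td>'). Assumptions, none of which come from the thread: only the second and third sample rows are used, the wrapped '$' / '1,134.79' in the quoted data is treated as a mail line-wrap artifact, the names $field2 through $amt6 are hypothetical, and commas are stripped with tr/,//d to give a plain number.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Rows 2 and 3 of the sample from the thread. Assumption: the "$" /
# "1,134.79" split in the quoted mail was a line wrap, so it is rejoined here.
my $html = <<'HTML';
<td class="f3" colspan="2" width="48">05/27/04</td>
<td class="f3" colspan="2" width="60"></td>
<td class="f3" colspan="2" width="186">Purchase With Pin Shell Service Stlake
St. Loumo</td>
<td class="f3" colspan="2" align="right" width="78"></td>
<td class="f3" colspan="2" align="right" width="78">$1.78</td>
<td class="f3" align="right">$1,234.45</td>

<td class="f3" colspan="2" width="48">05/21/04</td>
<td class="f3" colspan="2" width="60"></td>
<td class="f3" colspan="2" width="186">Atm Withdrawal One O'fallon Squo'fallon
Mo 1</td>
<td class="f3" colspan="2" align="right" width="78"></td>
<td class="f3" colspan="2" align="right" width="78">$20.00</td>
<td class="f3" align="right"><a href='javascript: ShowHelp("NOTE","RESTRICTEDFUNDSAMOUNT=$2.00","AVAILABLETRANSACTIONAMOUNT=$1,132.79")'>$1,134.79</a></td>
HTML

# First five fields: Roberto's corrected pattern, unchanged.
my $regex = '<td[^>]+?>(\d{2})/(\d{2})/(\d{2})</td>.+?';
$regex .= '<td[^>]+?>(.*?)</td>.+?';
$regex .= '<td[^>]+?>(.+?)</td>.+?';
$regex .= '<td[^>]+?>(?:\$([\d,]+\.\d{2}))?.*?</td>.+?';
$regex .= '<td[^>]+?>(?:\$([\d,]+\.\d{2}))?.*?</td>.+?';
# Sixth field, adjusted (not from the thread): the amount may be followed
# only by tags such as </a> and then </td>, so the "$2.00" inside the
# ShowHelp(...) argument no longer satisfies the pattern.
$regex .= '<td[^>]+?>.*?\$([\d,]+\.\d{2})(?:<[^>]*>)*</td>';
my $re = qr/$regex/s;    # /s lets . match the newlines between cells

my @rows;
while ($html =~ /$re/g) {
    my ($mm, $dd, $yy, $field2, $field3, $amt4, $amt5, $amt6)
        = ($1, $2, $3, $4, $5, $6, $7, $8);

    # Strip thousands separators so each amount is a plain number.
    for my $amount ($amt4, $amt5, $amt6) {
        $amount =~ tr/,//d if defined $amount;
    }

    # Browsers render the wrapped line break as a space; do the same here.
    $field3 =~ s/\s+/ /g;

    push @rows, [ "$mm/$dd/$yy", $field2, $field3, $amt4, $amt5, $amt6 ];
}

# Print one line per row; the empty amount column is undef, shown as ''.
for my $row (@rows) {
    print join(' | ', map { defined $_ ? $_ : '' } @$row), "\n";
}
```

With that data the loop should collect two rows, the second ending in 1134.79. The adjusted last field still relies on the displayed amount being the one directly before the closing tags, so a differently structured cell would need its own check.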

