Re: Elegant equivalent to this regex?



Thus spoke sherifffruitfly (on 2007-01-04 22:47):

> Here's the regex I came up with:
>
> (?<whole>\"(?<one>\d{1,3}),(?<two>\d{1,3}),(?<three>\d{1,3})\"|\"(?<one>\d{1,3}),(?<two>\d{1,3})\")
>
> This works fine for me, and getting the desired complete "clean" number
> from it is a triviality.
>
> But I get the feeling that this is the regex-equivalent of baby-talk.
> I'd like to know if there's a simpler, more elegant regex matching the
> same class of strings, and capturing essentially the same substrings.
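For the pattern exactly as posted (1 to 3 digits per field, no whitespace, 2 or 3 fields), the two alternatives can be folded into one by making the third field an optional group. A minimal Perl sketch, assuming Perl 5.10+ for named captures; the sample strings and the `@clean` array are made up for illustration:

```perl
use strict;
use warnings;
use 5.010;  # named captures and %+ need Perl 5.10 or later

# One alternative instead of two: the third field sits in an optional,
# non-capturing group, so the two- and three-field cases share one path.
my $rexp = qr/"(?<one>\d{1,3}),(?<two>\d{1,3})(?:,(?<three>\d{1,3}))?"/;

my @clean;
for my $s ('"12,345"', '"1,234,567"', '"1234,567"') {
    next unless $s =~ $rexp;
    # $+{three} is undef when only two fields were present, so drop undefs
    push @clean, join '', grep { defined } map { $+{$_} } qw(one two three);
}

print "$_\n" for @clean;
```

The outer `whole` group isn't needed here, since the entire match is available anyway. The `"1234,567"` sample is rejected because the first field would need four digits.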

You didn't specify how *exact* your matching requirement is.
E.g., if you have data like this:

"323, 432, 5" "123, 456, 789" " 888 , 999" " " "1234, 456, 789" "333, 444, 333, 444"

and we want to extract *only* sequences with 2 or 3 fields
(comma-delimited) *and* exactly 3 digits per number(!),
then only the 2nd and 3rd quoted strings would match. And then
there's the question of what the whitespace convention is going to be ...

In the worst case (the strictest specification), this would
look roughly like:

...
use strict;
use warnings;

my $stuff = q'
"323, 432, 5" "123, 456, 789" " 888 , 999" " " "1234, 456, 789" "333, 444, 333, 444"
';

my $rexp = qr/ \"\s*        # first quote
               \d{3}\s*     # first number
               ,\s*         # first comma
               \d{3}\s*     # second number
               (?:          # optional third field
                  ,\s*      # second comma
                  \d{3}\s*  # third number
               )?
               \"           # second quote
             /x;

# The regex has no capture groups, so in list context /g returns every
# whole match; map then strips all non-digits from each returned copy.
my @hits =
    map s/\D+//g && $_,
    $stuff =~ /$rexp/g;

print join("\n", @hits), "\n";
...
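If the individual numbers are wanted rather than one stripped string (closer to the named groups in the original question), the same rules can capture each field directly. This is a sketch, not something from the thread; `@fields` is a hypothetical name. A `while` loop is used because `/g` with capture groups in list context would flatten all matches together and include an `undef` for each missing third field.

```perl
use strict;
use warnings;

my $stuff = '"323, 432, 5" "123, 456, 789" " 888 , 999" " " "1234, 456, 789" "333, 444, 333, 444"';

# Same rules as above (exactly three digits per field, optional
# whitespace), but each field is captured instead of stripped out later.
my $rexp = qr/ "\s*
               (\d{3})\s* ,\s*        # first field, first comma
               (\d{3})\s*             # second field
               (?: ,\s* (\d{3})\s* )? # optional third field
               "
             /x;

my @fields;
while ($stuff =~ /$rexp/g) {
    # $3 is undef when the optional third field was absent
    push @fields, [ grep { defined } $1, $2, $3 ];
}

print join(',', @$_), "\n" for @fields;
```

On the sample data this keeps the same two strings as the version above, just split into their fields.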


Regards

M.


