Re: Regular Expressions



On Sun, Feb 8, 2009 at 03:49, Gunnar Hjalmarsson <noreply@xxxxxxxxx> wrote:
Chas. Owens wrote:

On Sat, Feb 7, 2009 at 19:11, Gunnar Hjalmarsson <noreply@xxxxxxxxx>
wrote:

TMTOWTDI

use Time::Local;
while (<DATA>) {
s{,(.+?),}{
my ($d, $m, $y) = split /\//, $1;
my $t = timelocal 0, 0, 0, $d, $m-1, $y;
($d, $m, $y) = (localtime $t)[3..5];
sprintf ',%d-%02d-%02d,', $y+1900, $m+1, $d;
}e;
}

snip

And this would be the confusing, fragile mess I spoke of.

Sorry, but I fail too see how using the s/// operator to extract the date
field would be so much more confusing and fragile compared to split() +
join().
snip

You are calling three functions (one of which is split) and assigning
returns three times inside the replacement. Add on top of that the
fact that the regex only works for the second field. Compare all of
that to calling two much simpler functions, a simple substitution, and
one assignment. Try to imagine what happens six months from now when
you need to go back and perform a transformation on the fifth field.
Are you going to extend the regex to try to capture that value? Or
are you just going to rewrite the code to use a split like you should
have in the first place? Also, there may be a need to handle commas
in the fields at some point in the future. This will entail using a
module like Text::CSV. With the split code you can just replace the
split with the proper parsing function from the module. With the
giant substitution code you pretty much have to rewrite the whole
thing.

I am all for using advanced features of Perl when it makes the code
clearer or more concise, but this code is longer than the split
version, involves more functions (including the confusing* localtime
and timelocal functions), and doesn't even do error checking on the
data.

On an unrelated topic, why are you using timelocal? A much better
solution is to use the strftime function from the POSIX module:

#!/usr/bin/perl

use strict;
use warnings;

use POSIX;

while (<DATA>) {
s{,([^,]+),}{
my ($m, $d, $y) = $1 =~ m<^([0-9]+)/([0-9]+)/([0-9]+)$>
or die "$. has an invalid date format";
strftime ",%Y%m%d,", 0, 0, 0, $d, $m - 1, $y - 1900;
}e;
print;
}

__DATA__
1,1/1/2009,optional,foo
2,1/2/2009,,bar
3,1/3/2009,,baz


Note how the split from your code has been changed to a regex. This
is because split is indiscriminate. This was good in my code because
it acted as future proofing against more fields being added to the
record** (which is unlikely to affect the meaning of earlier fields),
but bad here because we know the expected format of the date and the
chances of it not being that format and the code still being correct
at some point in the future is small.

* localtime pretty much only makes sense when you know the C based tm
structure it came from and timelocal, besides being a word play that
is too clever by half, is worse because it violates that structure***.
** also, if we wanted to throw an error because there were too few or
too many fields it would be easily achieved by asking the array how
many elements it held.
*** http://perldoc.perl.org/Time/Local.html#Year-Value-Interpretation

--
Chas. Owens
wonkden.net
The most important skill a programmer can have is the ability to read.
.



Relevant Pages

  • Re: DateFormat - String Manipulating
    ... >>which there is absolutely NO chance of confusing the month and day ... I convinced them that, unless the format is specified, their own ...
    (microsoft.public.scripting.vbscript)
  • Re: Regular Expressions
    ... And this would be the confusing, fragile mess I spoke of. ...
    (perl.beginners)
  • Re: strlen(), K+1: clarification
    ... objects that did not overlap 100%. ... the behavior of memcpy and confusing && and || under pressure, ... three multiple choice answers at once, ...
    (comp.programming)
  • Re: Zip Back up?
    ... Win9x and earlier or Me. ... And the only reason I questioned you is for the OP, which only made things more confusing. ... Even if the drive is formatted for FAT32 for XP, ... then you should not have to format it. ...
    (microsoft.public.windowsxp.general)
  • Re: using function names as variable names? Is it Justified?
    ... [SNIP] ... it can be confusing to have ... >> contexts that give it a different meaning. ... > Then you may as well say that syntax highlighting is not needed. ...
    (comp.lang.lisp)