Re: software engineering, program construction
- From: Ben Morrow <ben@xxxxxxxxxxxx>
- Date: Wed, 28 Oct 2009 21:34:25 +0000
Quoth ccc31807 <cartercc@xxxxxxxxx>:
On Oct 28, 3:27 pm, Ben Morrow <b...@xxxxxxxxxxxx> wrote:
? If you need the keys as well then a
while (my ($level, $level_hash) = each %information_hash) {
loop might be more appropriate.
I am using the keys to do other things, so yes, I need the keys, but
thanks for your suggestion. I find myself doing this a lot, so I'm
open to making it easier.
As a general rule when you find yourself doing the same
$great{long}{dereference}{expression}
over and over, you should find a way to put it in a variable. This is a
simple application of DRY.
This is only because you've never been bitten by using globals when you
shouldn't have; probably because you've only ever written relatively
small programs, and never come back to a program six months later to add
a new feature.
Okay, let's consider an evolving programming style. Suppose you wrote
a very short script that looks like this:
my %hash;
#step_one
open IN, '<', 'in.dat';
while (<IN>)
{
chomp;
my ($val1, $val2, $val3 ...) = split /,/;
$hash($val1} = (name => $val2, $id => $val3 ...);
}
close IN;
#step_two
open OUT, '>', 'out.csv';
foreach my $key (sort keys %hash)
{
print OUT qq("$hash{$key}{name}","$hash{$key}{name}"\n);
}
close OUT;
exit(0);
Now, suppose you rewrote it like this:
my %hash;
step_one();
step_two();
exit(0);
sub step_one
{
open IN, '<', 'in.dat';
while (<IN>)
{
chomp;
my ($val1, $val2, $val3 ...) = split /,/;
$hash($val1} = (name => $val2, $id => $val3 ...);
}
close IN;
}
sub step_two
{
open OUT, '>', 'out.csv';
foreach my $key (sort keys %hash)
{
print OUT qq("$hash{$key}{name}","$hash{$key}{name}"\n);
}
close OUT;
}
exit(0);
Ben, I could make the case that the second version is clearer and
easier to maintain that the first version, even though the second
version breaks the rules and the first version doesn't.
I think you'd have a hard time making that case. Your sub declarations
aren't doing any more for you than the '#step 1' comments in the first
example, since the subs are still scribbling on global state.
The whole point of a subroutine is that you should be able to understand
the sub *without* looking at the rest of the code. Wherever possible,
subs should avoid touching global state, because that means that you
have to look at everything *else* in the program that touches that state
to understand what's going on. As Uri pointed out, it is often useful to
have a small amount of global state, simply because passing it around
all the time becomes tedious, but this should be kept to a minimum.
I might write that program something like this:
#!/usr/bin/perl
# ALWAYS. Even when you don't think you need to.
use warnings;
use strict;
# Give the subs meaningful names, that describe what they do.
# Pass the file to read as a parameter, so you can (say) change the
# code to read 3 files in succession if you need to.
# Have the sub return the data structure, so that when you *do* call
# it several times you can keep the results separate.
# Consider taking the input and output filenames from @ARGV, so you
# can easily rerun the program on a different data set.
my $data = read_data("in.dat");
write_csv($data, "out.csv");
sub read_data {
my ($file) = @_;
# Use a lexical instead of a global filehandle. That way you
# don't need to worry about other subs in the program that might
# be using the IN filehandle.
# Check for errors, even when you think you don't need to. Good
# habits require practice.
open my $IN, "<", $file
or die "can't read '$file': $!";
my %hash;
while (<$IN>) {
# I hope you *know* this file can be parsed like this. If
# this is supposed to be a CSV file you should use a module,
# because parsing CSV is not as trivial as it seems.
chomp;
my ($val1, $name, $id) = split /,/;
$hash{$val1} = {name => $name, id => $id};
}
return \%hash;
}
sub write_csv {
my ($data, $file) = @_;
open my $OUT, ">", $file
or die "can't write to '$file': $!";
# OK, so this *is* supposed to be CSV. Stop wasting time and
# just use Text::CSV_XS already. It'll correctly handle the
# cases like 'a name with a comma in' that you haven't thought
# about.
for my $key (sort keys %$hash) {
my $person = $hash{$key};
print $OUT qq("$person->{name}","$person->{id}"\n);
}
# We have written to this filehandle, so we need to check the
# return value of close.
close $OUT or die "writing to '$file' failed: $!";
}
What's the
REAL difference between the two versions? And why should the
decomposition of code into subroutines NECESSARILY require a
functional style and variable localization?
What I wrote above isn't in 'a functional style'. Doing non-idempotent
actions like IO in a functional style is *hard*, and requires concepts
like 'monad'. See Haskell.
The point here is that there's no *point* decomposing your code into
subs unless you're going to make those subs self-contained. You aren't
gaining anything.
If it works, then no, it isn't 'wrong'. It is bad style, though. If you
had written (say) the chart-creating code as a module with functions
that took parameters, then when you need another set of charts tomorrow
you could reuse it. As it is you have to copy/paste and modify it for
your new set of global data structures.
You are 100 percent correct. I don't know if I will ever run this
script again. If I do, I'll certainly revise it (as I wrote it like
version one above).
Hmm, you're still thinking about 'this script'. You need to be thinking
'What will I be doing tomorrow, next week, next year? Can I take some of
this and make it reusable, to save myself some work tomorrow?'.
That may be practical when your programs are only ever run once, but
quickly becomes less so when you have many programs in long-term use
with almost-but-not-quite the same subroutine in: when you find a bug,
how are you going to find all the places you've copy/pasted it to
correct it?
Again, I agree totally. However, I'm a lot more interested in the
architecture of a script than the other issues that have been
mentioned. With this particular issue, I try my best to follow the DRY
practice, and the second or third time I write the same thing, I often
will place it in a function and call it from there.
What about the second or third time you write a given function? Do you
place it in a module and import it from there? Once you start doing that
regularly, you'll start to see why relying on global state just makes
things harder in the long run.
Ben
.
- Follow-Ups:
- Re: software engineering, program construction
- From: ccc31807
- Re: software engineering, program construction
- References:
- software engineering, program construction
- From: ccc31807
- Re: software engineering, program construction
- From: ccc31807
- Re: software engineering, program construction
- From: Ben Morrow
- Re: software engineering, program construction
- From: ccc31807
- software engineering, program construction
- Prev by Date: Re: software engineering, program construction
- Next by Date: FAQ 8.3 How do I do fancy stuff with the keyboard/screen/mouse?
- Previous by thread: Re: software engineering, program construction
- Next by thread: Re: software engineering, program construction
- Index(es):
Relevant Pages
|