Re: A script to flag commonly misused words
- From: "A. Sinan Unur" <1usa@xxxxxxxxxxxxxxxxxxx>
- Date: Wed, 01 Aug 2007 01:22:40 GMT
David Delony <ickyelf@xxxxxxxxx> wrote in
news:oOOri.232$3x.209@xxxxxxxxxxxxxxxxxxxxxxxxxx:
> This is my first real program,
Unfortunately, you ignored the primary principle of programming: Don't
repeat yourself.
Think for a second: Each time a new word is added to the list of words
to be flagged, you need to alter your program. That is not a good thing.
> #! /usr/bin/perl -w
use warnings;
is preferable to the -w switch in general, as it allows you to scope
warnings lexically.
> use strict;
Good!
> while (<>) { # Our friend, the magic filehandle.
>     &word_check;
> }
You don't need the & to invoke a subroutine. Using it has side effects:
called as &word_check; with no argument list, the sub sees the caller's
current @_, and the & form also bypasses any prototype. If you do not
explicitly want those effects, don't use & to invoke subs.
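To make the @_ side effect concrete, here is a minimal sketch. The subs
count_args and caller_sub are made up for illustration and are not part
of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: reports how many arguments it received.
sub count_args { return scalar @_ }

sub caller_sub {
    my $with_amp    = &count_args;    # no parens: callee sees the caller's @_
    my $without_amp = count_args();   # ordinary call: callee gets an empty @_
    return ( $with_amp, $without_amp );
}

my ( $with_amp, $without_amp ) = caller_sub( 'a', 'b', 'c' );
print "with &: $with_amp, without &: $without_amp\n";
```

The first call reports the three arguments that were passed to
caller_sub, even though nothing was passed to count_args explicitly.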
You are not passing arguments to subs and you are relying on the
contents of $_ not changing within the program flow.
Unfortunately, your lines are too wide and they wrap making it necessary
to edit your post to be able to test it. Don't do that.
> sub flag() {
>     chomp;
>     print "$.:$`\[$&\]$'\n"; #Print a line number and the line with
> pattern match
Using the pre- and post-match variables ($`, $& and $') imposes a speed
penalty on every regex operation in your program.
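One way to get the same bracketed output without those variables is to
use the match offsets in @- and @+ together with substr. This is only a
sketch, and bracket_match is a hypothetical helper name, not something
from the original program.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: bracket the first match of $re in $line
# without touching $`, $& or $'.
sub bracket_match {
    my ( $line, $re ) = @_;
    return $line unless $line =~ $re;
    my ( $start, $end ) = ( $-[0], $+[0] );   # offsets of the whole match
    return substr( $line, 0, $start )
         . '[' . substr( $line, $start, $end - $start ) . ']'
         . substr( $line, $end );
}

my $flagged = bracket_match( 'Thanks in advance for any help', qr/in advance/i );
print "$flagged\n";
```

A capturing group with s/($re)/[$1]/ achieves the same effect when every
match on the line should be bracketed.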
> sub word_check () { # Brace yourself, it's going to be a long one!
But there is no need for it to be this long and this tedious.
So, I first wrote a script to extract the words to flag from your
script:
#!/usr/bin/perl
use strict;
use warnings;
while ( <> ) {
    if ( m!^\s+&flag.+/(.+)/! ) {
        print "$1\n";
    }
}
__END__
C:\Home\asu1\src\New Folder> perl s.pl < t.pl > words
Now, the words file contains the words to flag. I am going to include
these words in the __DATA__ section of the following script for
convenience.
In the process, I discovered that at least one of the expressions you
used is wrong.
> /is a.who|that/i; # "He is a man who" and other constr
First, a single dot matches only one character, so this will not match
"is a man who". Second, the alternation is not grouped, so the word
"that" will match by itself. You need a set of non-capturing grouping
parentheses around such expressions.
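A small sketch comparing the original expression with a grouped version
(the same one that appears in the word list further down). The sample
sentences are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $ungrouped = qr/is a.who|that/i;             # means: "is a.who" OR "that"
my $grouped   = qr/is a \w+ (?:who|that)/i;     # "is a", one word, then who/that

my @samples = ( 'He is a man who reads', 'I know that' );

my %hits;
for my $s (@samples) {
    $hits{$s} = [ ( $s =~ $ungrouped ? 1 : 0 ), ( $s =~ $grouped ? 1 : 0 ) ];
    printf "%-24s ungrouped=%d grouped=%d\n", $s, @{ $hits{$s} };
}
```

The ungrouped pattern misses the sentence it was written for and fires
on a bare "that"; the grouped one does the opposite.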
> /thank* in advance/i;
I don't think you are really looking for
than in advance
thank in advance
thankkkkkk in advance
but rather want to match "thanks in advance". Well, at least I am glad
to have found out that I am not the only one who considers this phrase
rude and unnecessary.
/thank\w+ in advance/i
would have been better.
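To see the difference, here is a quick sketch that runs both
expressions against the strings listed above plus the intended phrase.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $original = qr/thank* in advance/i;     # "than" followed by zero or more "k"
my $revised  = qr/thank\w+ in advance/i;   # "thank" plus at least one word character

my @samples = (
    'than in advance',
    'thank in advance',
    'thankkkkkk in advance',
    'thanks in advance',
);

my %hits;
for my $s (@samples) {
    $hits{$s} = [ ( $s =~ $original ? 1 : 0 ), ( $s =~ $revised ? 1 : 0 ) ];
    printf "%-22s original=%d revised=%d\n", $s, @{ $hits{$s} };
}
```

Note that the original never matches "thanks in advance" at all, while
the revised one still accepts "thankkkkkk in advance" because \w+ takes
any run of word characters.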
The problem, of course, is that such expressions can span two lines. The
script I give below would also fail to match in such cases but I am too
lazy to fix it right now.
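For what it is worth, one possible direction is to let every literal
space in a pattern match any run of whitespace and to read the input a
paragraph at a time. This is only a sketch under those assumptions:
flexible_regex is a hypothetical helper, it would mangle a space inside
a character class, and the string below stands in for a paragraph read
with $/ set to ''.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: compile a pattern from the word list so that each
# run of literal spaces also matches a line break or other whitespace.
sub flexible_regex {
    my ($pattern) = @_;
    ( my $src = $pattern ) =~ s/ +/\\s+/g;
    return qr/$src/i;
}

my $re = flexible_regex('thank\w+ in advance');

# Stand-in for one paragraph of input (paragraph mode: $/ = '').
my $text = "Many thanks in\nadvance for your help.\n";

( my $checked = $text ) =~ s/($re)/[$1]/g;
print $checked;
```

With paragraph-at-a-time input, $. counts paragraphs rather than lines,
so the reported positions would be coarser.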
Please note that the purpose of this critique is not to discourage you
but to help you improve. Hope it helps.
As it stands, the script below will give false positives, and a single
line can be reported more than once. For example,
test(71):[Than]ks in advance
test(71):[Thanks in advance]
Again, I am too lazy to figure out how to do everything right.
However, I hope the following script will illustrate to you a way of
reducing your work:
#! /usr/bin/perl
use strict;
use warnings;
my @regexps;
while ( my $s = <DATA> ) {
    $s =~ s/^\s+//;
    $s =~ s/\s+$//;
    next unless length $s;
    push @regexps, qr/$s/i;
}
while ( my $input = <> ) {
    WORD_CHECK: for my $r ( @regexps ) {
        if ( (my $checked = $input) =~ s/($r)/[$1]/g ) {
            print "$ARGV($.):$checked";
        }
    }
}
__DATA__
aggravat|irritat
all right
allud|allusion
alternate|alternative
among|between
or
anticipat
anybody
anyone
as good or better than
as to whether
as yet
being
but
can
care less
case
certainly
character
claim
clever
compar
compris
consider
contact
cope|coping
currently
data
different than
disinterested
divided into
due to
each and every one
effect
enormity
enthuse
etc
fact
facilit
factor
farther|further
feature
finalize
fix
flammable
folk
fortuitous
got
gratuitous
is a \w+ (?:who|that)
hopefully
however
illusion
imply|impli|infer
importantly
in regard to
in the last analysis
inside
insightful
in terms of
interesting
irregardless
ize
kind of
lay
leave
less
like
line|along these lines
literal|literally
loan
meaningful
memento
most
nature
nauseous|nauseated
nice
nor
one
one of the most
oriented
partially
ing
people
personal
possess
presently
prestigious
refer
regretful
relate
respective
firstly|secondly|thirdly|fourthly
shall|will
so
sort of
to.*ly
state
student body
than
thank\w+ in advance
that|which
the foreseeable future
the* is
they|he|she
this
thrust
tortuous|torturous
try
type
unique
utilize
verbal
very
while
\w+wise
worth.while
would
Sinan
--
A. Sinan Unur <1usa@xxxxxxxxxxxxxxxxxxx>
(remove .invalid and reverse each component for email address)
comp.lang.perl.misc guidelines on the WWW:
http://augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html