Re: A script to flag commonly misused words



David Delony <ickyelf@xxxxxxxxx> wrote in
news:oOOri.232$3x.209@xxxxxxxxxxxxxxxxxxxxxxxxxx:

This is my first real program,

Unfortunately, you ignored the primary principle of programming: Don't
repeat yourself.

Think for a second: Each time a new word is added to the list of words
to be flagged, you need to alter your program. That is not a good thing.

#! /usr/bin/perl -w


use warnings;

is preferable in general as it allows you to scope warnings.


use strict;


Good!

while (<>) { # Our friend, the magic filehandle.

&word_check;
}

You don't need the & to invoke a subroutine. Using it has certain side
effects. If you do not know what they are and do not explicitly want
those, don't use & to invoke subs.
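
To make one of those side effects concrete, here is a minimal sketch (the sub names are made up for illustration and are not from your script). Calling &name; without parentheses hands the caller's current @_ to the callee, whereas name() passes an empty list. The & form also bypasses prototypes.

```perl
use strict;
use warnings;

# Hypothetical helper: reports how many arguments it received.
sub count_args { return scalar @_ }

sub caller_with_ampersand {
    # "&name;" with no parentheses reuses the caller's current @_.
    return &count_args;
}

sub caller_plain {
    # "name()" passes an explicit, empty argument list.
    return count_args();
}

print caller_with_ampersand( 1, 2, 3 ), "\n";    # prints 3
print caller_plain( 1, 2, 3 ), "\n";             # prints 0
```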

You are not passing arguments to subs and you are relying on the
contents of $_ not changing within the program flow.
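
As a rough sketch of the alternative, the sub can take the line as an explicit argument. The body below is a stand-in with a single pattern, not your actual word_check:

```perl
use strict;
use warnings;

# Hypothetical stand-in for word_check: the line arrives as an argument,
# so nothing depends on what $_ happens to hold at the time of the call.
sub word_check {
    my ($line) = @_;
    return $line =~ /irregardless/i ? 1 : 0;
}

for my $line ( "Irregardless of the cost\n", "Regardless of the cost\n" ) {
    print "flagged: $line" if word_check($line);
}
```

In the real loop this would be called as word_check($_) or, better, with a named lexical read from the filehandle.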

Unfortunately, your lines are too wide and they wrap, making it necessary
to edit your post before it can be tested. Don't do that.

sub flag() {

chomp;

print "$.:$`\[$&\]$'\n"; #Print a line number and the line with
pattern match

Using the pre- and post-match variables imposes a speed penalty on every
regex operation in your program.
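
One way to get the same output without those variables is to use the match offsets in @- and @+ together with substr. This is only a sketch; bracket_match is a name I made up. (I believe recent perls, 5.20 and later, have largely removed the penalty, but check perlvar for your version.)

```perl
use strict;
use warnings;

# Hypothetical helper: put brackets around the first match of $re in $line
# without touching $`, $& or $'.
sub bracket_match {
    my ( $line, $re ) = @_;
    return unless $line =~ $re;

    # $-[0] and $+[0] are the start and end offsets of the whole match.
    my ( $start, $end ) = ( $-[0], $+[0] );
    return substr( $line, 0, $start )
        . '[' . substr( $line, $start, $end - $start ) . ']'
        . substr( $line, $end );
}

print bracket_match( 'thanks in advance for reading', qr/in advance/i ), "\n";
# prints: thanks [in advance] for reading
```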


sub word_check () { # Brace yourself, it's going to be a long one!

But there is no need for it to be this long and this tedious.

So, I first wrote a script to extract the words to flag from your
script:

#!/usr/bin/perl

use strict;
use warnings;

while ( <> ) {
if ( m!^\s+&flag.+/(.+)/! ) {
print "$1\n";
}
}
__END__

C:\Home\asu1\src\New Folder> perl s.pl < t.pl > words

Now, the words file contains the words to flag. I am going to include
these words in the __DATA__ section of the following script for
convenience.

In the process, I discovered that at least one of the expressions you
used is wrong.

/is a.who|that/i; # "He is a man who" and other constr

First, a single solitary dot will only match one character, so "a.who"
cannot match "a man who". Second, the word "that" in this expression
will match by itself, because the alternation splits the entire pattern.
You need a set of non-capturing grouping parentheses around such
expressions.
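
A small comparison may help. The first pattern is yours; the second is the grouped form that appears in my word list further down. The variable names are only for illustration.

```perl
use strict;
use warnings;

my $loose = qr/is a.who|that/i;            # reads as: "is a.who" OR "that"
my $tight = qr/is a \w+ (?:who|that)/i;    # alternation confined to the group

for my $text ( 'I know that', 'He is a man who reads' ) {
    printf "%-24s loose=%d tight=%d\n", $text,
        ( $text =~ $loose ? 1 : 0 ),
        ( $text =~ $tight ? 1 : 0 );
}
```

The loose pattern flags the first sentence (a bare "that") and misses the second; the grouped pattern does the opposite.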

/thank* in advance/i;

I don't think you are really looking for

than in advance
thank in advance
thankkkkkk in advance

but rather want to match "thanks in advance". Well, at least I am glad
to have found out that I am not the only one who considers this phrase
rude and unnecessary.

/thank\w+ in advance/

would have been better.
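
A quick check of both patterns against the three strings above plus the intended one (a sketch; the variable names are mine):

```perl
use strict;
use warnings;

my $star = qr/thank* in advance/i;      # k* means "zero or more k"
my $word = qr/thank\w+ in advance/i;    # "thank" plus at least one word character

for my $text ( 'than in advance', 'thank in advance',
               'thankkkkkk in advance', 'thanks in advance' ) {
    printf "%-24s star=%d word=%d\n", $text,
        ( $text =~ $star ? 1 : 0 ),
        ( $text =~ $word ? 1 : 0 );
}
```

Note that the starred version does not match "thanks in advance" at all, while the \w+ version still accepts "thankkkkkk in advance".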

The problem, of course, is that such expressions can span two lines. The
script I give below would also fail to match in such cases but I am too
lazy to fix it right now.
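
For what it is worth, one possible direction (a sketch only, not what the script below does) is to read the text in larger chunks, for example in paragraph mode, and let every literal space in a pattern match any run of whitespace. The helper name flexible_regex is made up, and the substitution is naive: it would also rewrite a space inside a character class.

```perl
use strict;
use warnings;

# Hypothetical helper: replace each run of literal spaces in a phrase
# pattern with \s+ so the phrase can match across a line break.
sub flexible_regex {
    my ($pattern) = @_;
    ( my $flex = $pattern ) =~ s/ +/\\s+/g;
    return qr/$flex/i;
}

my $re   = flexible_regex('thank\w+ in advance');
my $text = "Many thanks in\nadvance for your help.\n";

print "matched across lines\n" if $text =~ $re;
```

Line numbers would have to be tracked differently with this approach, since $. counts records rather than lines once the input is no longer read line by line.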

Please note that the purpose of this critique is not to discourage you
but to help you improve. Hope it helps.

As it stands, the script will give false positives. For example,

test(71):[Than]ks in advance
test(71):[Thanks in advance]

Again, I am too lazy to figure out how to do everything right.
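
One partial remedy, sketched here under the assumption that a single report per stretch of text is what you want, is to join the patterns into one alternation with the longer patterns first. The helper combined_regex is hypothetical, and sorting by pattern length is only a heuristic.

```perl
use strict;
use warnings;

# Hypothetical helper: build one regex from many patterns, longest pattern
# first, so a longer phrase wins over a shorter word at the same position.
sub combined_regex {
    my @patterns = sort { length($b) <=> length($a) } @_;
    my $alternation = join '|', map { "(?:$_)" } @patterns;
    return qr/($alternation)/i;
}

my $re = combined_regex( 'than', 'thank\w+ in advance' );

( my $checked = "Thanks in advance\n" ) =~ s/$re/[$1]/g;
print $checked;    # prints: [Thanks in advance]
```

This does not stop "than" from matching inside "Thanks" when no "in advance" follows; that would need word boundaries, which in turn do not suit the stem-style entries in the list.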

However, I hope the following script will illustrate to you a way of
reducing your work:

#! /usr/bin/perl

use strict;
use warnings;

my @regexps;

while ( my $s = <DATA> ) {
$s =~ s/^\s+//;
$s =~ s/\s+$//;
next unless length $s;

push @regexps, qr/$s/i;
}

while (my $input = <>) {
WORD_CHECK: for my $r ( @regexps ) {
if ( (my $checked = $input) =~ s/($r)/[$1]/g ) {
print "$ARGV($.):$checked";
}
}
}
__DATA__
aggravat|irritat
all right
allud|allusion
alternate|alternative
among|between
or
anticipat
anybody
anyone
as good or better than
as to whether
as yet
being
but
can
care less
case
certainly
character
claim
clever
compar
compris
consider
contact
cope|coping
currently
data
different than
disinterested
divided into
due to
each and every one
effect
enormity
enthuse
etc
fact
facilit
factor
farther|further
feature
finalize
fix
flammable
folk
fortuitous
got
gratuitous
is a \w+ (?:who|that)
hopefully
however
illusion
imply|impli|infer
importantly
in regard to
in the last analysis
inside
insightful
in terms of
interesting
irregardless
ize
kind of
lay
leave
less
like
line|along these lines
literal|literally
loan
meaningful
memento
most
nature
nauseous|nauseated
nice
nor
one
one of the most
oriented
partially
ing
people
personal
possess
presently
prestigious
refer
regretful
relate
respective
firstly|secondly|thirdly|fourthly
shall|will
so
sort of
to.*ly
state
student body
than
thank\w+ in advance
that|which
the foreseeable future
the* is
they|he|she
this
thrust
tortuous|torturous
try
type
unique
utilize
verbal
very
while
\w+wise
worth.while
would


Sinan
--
A. Sinan Unur <1usa@xxxxxxxxxxxxxxxxxxx>
(remove .invalid and reverse each component for email address)

comp.lang.perl.misc guidelines on the WWW:
http://augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html
