Re: Help with pattern matching

From: John W. Krahn (krahnj_at_acm.org)
Date: 04/03/04


To: beginners@perl.org
Date: Fri, 02 Apr 2004 15:10:02 -0800

A Lukaszewski wrote:
>
> Greetings all,

Hello,

> I have a comma-delimited lexical file with four fields: line number, the
> first part of a word, the second part of a word, and the word combined.
> The first and fourth fields are only for reference. The program I am
> developing is very simple. If field two and field three both have
> accents in them, then print the line to an output file.
>
> The heavily-commented program is below. Thus far, all I get is an exact
> replica of the input file. In addition to a plain binding operator of
> '=~ //', I have also tried explicit matching (m//) and regex (qr//).
>
> #!/usr/bin/perl
>
> #############################################################
> #############################################################
> # A PROGRAM TO READ THE SUB-WORD HEADERS OF A #
> # COMMA-DELIMITED FILE #
> # AND DETERMINE WHICH LINES HAVE MULTIPLE ACCENTS #
> #############################################################
> #############################################################
>
> use strict;

If you had had warnings enabled as well as strict you might have found
your problem a lot sooner. :-)

use warnings;

> ###################################
> # OPEN THE INPUT AND OUTPUT FILES #
> ###################################
>
> my ($file, $outfile);
>
> $file = 'y.csv' ;
> # Name the input file
> $outfile = 'y.res';
> # Name the output file

In Perl you usually declare your variables where you first use them, and
these comments add nothing that the code does not already say.

my $file = 'y.csv';
my $outfile = 'y.res';

> open(INFO, "$file" ) or die "Cannot open $file:$!\n";
> # Open the input file or report failure
> open(OUT, ">>$outfile") or die "Cannot open file y.res!\n";
> # Open the output file
>
> ########################################
> # INITIALIZATION OF SCALARS AND ARRAYS #
> ########################################
>
> my $line; # = scalar by which program steps through data
> my $fieldEval1; # = holding scalar for evaluating whether the
> # first half of the word has an accent in it
> my $fieldEval2; # = holding scalar for evaluating whether the
> # second half of the word has an accent in it
> my @field; # = holding array for the split line

You should declare these variables where you use them to limit their
scope.

> #######################################################
> # FOREACH CONTROL TO READ THE INPUT FILE LINE BY LINE #
> # AND MANIPULATE THE DESIRED DATA TO AN OUTPUT FILE #
> #######################################################
>
> foreach $line (<INFO>) {

foreach my $line ( <INFO> ) {

But you should really be using a while loop to read from files. foreach
and for create a list in memory which means that the whole file will
have to be read before processing starts.

while ( my $line = <INFO> ) {
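
To make the difference concrete, here is a minimal sketch of the while
form. It reads from an in-memory filehandle so it runs without any files
on disk; the sample data and the name count_nonblank_lines are made up
for illustration and are not part of your program.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for y.csv.
my $data = "1,a,b,ab\n\n2,c,d,cd\n";

# count_nonblank_lines is an illustrative helper, not from the original post.
sub count_nonblank_lines {
    my ($text) = @_;
    open my $in, '<', \$text or die "Cannot open in-memory file: $!\n";
    my $count = 0;
    # while fetches one line per iteration; foreach ( <$in> ) would
    # first build a list holding every line of the file.
    while ( my $line = <$in> ) {
        chomp $line;
        next unless length $line;    # skip blank lines
        $count++;
    }
    close $in;
    return $count;
}

print count_nonblank_lines($data), "\n";
```

The test uses length so that a line consisting only of "0" is not
mistaken for a blank one.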

> # Assign the contents of the input file to $line one line at time for
> # evaluation.
> chomp ($line); # remove input field separator
> next unless $line; # skip blank lines
> @field = split /,/, $line; # Read each line as four fields split by commas
>
> # Assign the second field to an evaluation scalar
> $fieldEval1 = $field[1];
> # Assign the third field to an evaluation scalar
> $fieldEval2 = $field[2];

You can assign to $fieldEval1 and $fieldEval2 directly from the split:

my ( undef, $fieldEval1, $fieldEval2 ) = split /,/, $line;

But it doesn't look like you are using those variables later?
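
As a small runnable illustration of that list assignment (the sample
line is invented, so treat it as a sketch rather than your real data):

```perl
use strict;
use warnings;

# Hypothetical sample line: line number, first part, second part, whole word.
my $line = '17,lo[,gos,lo[gos';

# undef discards the first field; the fourth field is dropped because
# the list on the left has no slot for it.
my ( undef, $fieldEval1, $fieldEval2 ) = split /,/, $line;

print "$fieldEval1 $fieldEval2\n";
```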

> # Test whether BOTH the second or third fields have accents in them
> # Accents are represented by the following characters: k K c ; ' [ { ] }
> # \ and |.
> if ({$field[1] =~ /[kKc;'\[\{\]\}\\\|]/} && {$field[2] =~
> /[kKc;\'\[\{\]\}\\\|]/}) {

Your problem is in this line (which warnings would have complained
about). The braces {} around each pattern match create an anonymous
hash and return a reference to it. A reference is always true in
boolean context, so the whole expression is always true.
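
A minimal sketch of the effect in isolation, assuming a field value
that contains none of the accent characters (the value 'plain' and the
variable names are made up for the demonstration):

```perl
use strict;
use warnings;

# Hypothetical field value with no accent characters in it.
my $field = 'plain';

# Braces in a term position build an anonymous hash. The result is a
# reference, which is true whether or not the pattern matched.
my $with_braces = { $field =~ /[][{};'\\|kKc]/ } ? 1 : 0;

# Without the braces the match itself is what gets tested.
my $without_braces = ( $field =~ /[][{};'\\|kKc]/ ) ? 1 : 0;

print "with braces: $with_braces, without braces: $without_braces\n";
```

The first test reports true even though nothing matched, which is why
every line of your input ends up in the output file.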

if ( $fieldEval1 =~ /[][{};'\\|kKc]/ && $fieldEval2 =~ /[][{};'\\|kKc]/ )
{

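Or you could probably do it with a single match against the whole line.
The pattern has to require an accent character in each of the two
fields; matching once against both fields together would also accept
lines where only one of them has an accent:

if ( $line =~ /^[^,]*,[^,]*[][{};'\\|kKc][^,]*,[^,]*[][{};'\\|kKc]/ ) {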

> print OUT "$line\n"; # If so, print the line to file
> }
> }
>
> close (OUT); # Close the output file
> close(INFO) ; # Close the input file
> __END__
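
Putting the suggestions above together, the whole program might look
something like this. It is only a sketch: the helper name
filter_accented, the lexical filehandles and the sample data are my
assumptions, and it uses in-memory filehandles in place of y.csv and
y.res so that it can be run as is.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Accent markers listed in the original post: k K c ; ' [ { ] } \ |
my $accent = qr/[][{};'\\|kKc]/;

# Copy every line whose second AND third fields both contain an accent
# marker from $in to $out. filter_accented is a hypothetical helper name.
sub filter_accented {
    my ( $in, $out ) = @_;
    while ( my $line = <$in> ) {
        chomp $line;
        next unless length $line;    # skip blank lines
        my ( undef, $first, $second ) = split /,/, $line;
        next unless defined $first && defined $second;
        print {$out} "$line\n" if $first =~ $accent && $second =~ $accent;
    }
    return;
}

# Hypothetical sample data in place of y.csv.
my $input = join "\n", '1,ab,de,abde', '2,a[b,de,a[bde', '3,a[b,d;e,a[bd;e', '';
my $result = '';
open my $in,  '<', \$input  or die "Cannot open input: $!\n";
open my $out, '>', \$result or die "Cannot open output: $!\n";
filter_accented( $in, $out );
close $out;
close $in;
print $result;
```

With real files you would open 'y.csv' for reading and 'y.res' for
appending instead of the two in-memory handles.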

John

-- 
use Perl;
program
fulfillment

