regex problem

From: leegold (goldtech_at_worldpost.com)
Date: 01/07/04


Date: 7 Jan 2004 13:15:47 -0800

Notice that I have to run the same substitution twice:

"$line =~ s/###\s*?###/###empty###/g;
$line =~ s/###\s*?###/###empty###/g;"

in order to get the desired result. I've been playing with this
for a long time but can't get it with one regex. I wondered
if I could put one regex into a loop(?). I think it has to do with greedy
vs. non-greedy matching. The code is rough, sorry. Please find the output
below too; I hope you'll see what I'm trying to do and that repeating the
regex is not a good solution. Does anyone know the trick?

Thanx,
Lee

#!/usr/bin/perl -wT
use strict;
use CGI qw(:standard);
use CGI::Carp qw(fatalsToBrowser);

print header, start_html('Tag Conversion'), h1('Tag Conversion');
my $infile = param("in_file");
my $count = 0;   # start at 0 so the count is defined even if no ### is found
if ($infile) {
  
  chomp($infile);
  print p("Input file: $infile");
  open (INFILE, '<', $infile) || die "can't open '$infile': $!";
  my $line = <INFILE>;
  while ($line =~ /###/g)
   { $count++ }
  print p("There are $count fields in a record");
  print p("$line");
  $line =~ s/^\d+//;
  print p("$line");
  # my $repl = '###empty###';
  $line =~ s/###\s*?###/###empty###/g;
  $line =~ s/###\s*?###/###empty###/g;
  print p("$line");
}
elsif (param()) { # for sake of our discussion this block is not called

  my $infile = param("in_file");
  chomp($infile);

  print p("Input file: $infile");

  open (INFILE, '<', $infile) || die "can't open '$infile': $!";

  while (<INFILE>) {
    chomp;
    s/^\d+//;
    my @rec_array = split( /###/, $_);
    print p($_);
    print p("@rec_array\n");
    }
}
else { # what user sees 1st
  print start_form();
  print p("What's your input file?: ", textfield("in_file"));
  print p(submit("Create Tagged File"));
  print end_form();
}
print end_html;

-----OUTPUT--------

There are 14 fields in a record

1###o31025883###3C on-line ###New York NY The Association###Hardcopy###
### ###F1A###ACM SIGCCC###Computer Science/Information
Technology###1078-2192###Quarterly### ### ###Goddard

###o31025883###3C on-line ###New York NY The Association###Hardcopy###
### ###F1A###ACM SIGCCC###Computer Science/Information
Technology###1078-2192###Quarterly### ### ###Goddard

###o31025883###3C on-line ###New York NY The
Association###Hardcopy###empty###empty###F1A###ACM SIGCCC###Computer
Science/Information
Technology###1078-2192###Quarterly###empty###empty###Goddard
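
A likely explanation for why one pass is not enough: with /g, each new match starts where the previous one ended, so the closing ### of one match has already been consumed and cannot also serve as the opening ### of the next empty field. Greediness probably does not matter here. The sketch below shows two possible ways around it, under the assumption that an "empty" field means only whitespace between two ### delimiters. The sub names and the shortened sample string are made up for illustration and are not from the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Option 1: a single pass with a lookahead. (?=###) checks that the
# closing delimiter is there without consuming it, so the same ###
# can open the next match.
sub fill_empty_lookahead {
    my ($line) = @_;
    $line =~ s/###\s*(?=###)/###empty/g;
    return $line;
}

# Option 2: keep the original substitution and repeat it until it no
# longer matches. s///g returns a false value when nothing was replaced.
sub fill_empty_loop {
    my ($line) = @_;
    1 while $line =~ s/###\s*###/###empty###/g;
    return $line;
}

# Hypothetical sample, cut down from the record shown in the output above.
my $sample = "###Hardcopy###\n### ###F1A###Quarterly### ### ###Goddard";

print fill_empty_lookahead($sample), "\n";
print fill_empty_loop($sample), "\n";
```

Both subs should give the same string for this sample. The lookahead version does it in one pass; the loop version is closer to the "put one regex into a loop" idea from the question.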