RE: Pattern Match

From: Tom Kinzer (tomkinzer_at_earthlink.net)
Date: 12/09/03


To: <beginners@perl.org>
Date: Tue, 9 Dec 2003 12:37:59 -0800

Rob, can you explain the details of that replace? That's pretty slick. I
see you're adding the hex value to get to the appropriate ASCII value, but
didn't know you could do some of that gyration inside a regex.

Thanks.

-Tom Kinzer

-----Original Message-----
From: Rob Dixon [mailto:rob@dixon.port995.com]
Sent: Tuesday, December 09, 2003 11:58 AM
To: beginners@perl.org
Subject: Re: Pattern Match

Eric Sand wrote:
>
> I am very new to Perl, but I sense a great adventure ahead after just
> programming with Cobol, Pascal, and C over the last umpteen years. I have
> written a perl script where I am trying to detect a non-printing
> character(Ctrl@ - Ctrl_) and then substitute a printing ASCII sequence such
> as "^@" in its place, but it does not seem to work as I would like. Any
> advice would be greatly appreciated.
>
> Thank You....Eric Sand
>
>

Your natural instinct is to write Perl as if it were C. That's slightly better
than treating it as a scripting language, but there are many joys left to be
found!

> $in_ctr=0;
> $out_ctr=0;
>
> while ($line = <STDIN>)
> {
> chomp($line);
> $in_ctr ++;
> if ($line = s/\c@,\cA,\cB,\cC,\cD,\cE,\cF,\cG,\cH,\cI,\cJ,\cK,
> \cL,\cM,\cN,\cO,\cP,\cQ,\cR,\cS,\cT,\cU,\cV,\cW,
> \cX,\cY,\cZ,\c[,\c\,\c],\c^,\c_
> /^@,^A,^B,^C,^D,^E,^F,^G,^H,^I,^J,^K,
> ^L,^N,^N,^O,^P,^Q,^R,^S,^T,^U,^V,^W,
> ^X,^Y,^Z,^[,^\,^],^^,^_/)
> {
> $out_ctr ++;
> printf("Non-printing chars detected in: %s\n",$line);
> }
> }
> printf("Total records read = %d\n",$in_ctr);
> printf("Total records written with non-printing characters = %d\n",$out_ctr);

I would write this as below. The first thing is to *always*

  use strict;
  use warnings;

after which you have to declare all of your variables with 'my'.

The second is to get used to using the default $_ variable which
is set to the value for the current 'while(<>)' or 'for' loop
iteration, and is a default parameter for most built-in functions.
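As a small illustration of that default, here is a sketch in which the
loop variable and both function arguments are left implicit. The sample
data and the @seen array are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for lines read from a file.
my @lines = ("alpha\n", "beta\n");
my @seen;

for (@lines) {        # each element is aliased to $_ in turn
    chomp;            # same as chomp($_)
    push @seen, uc;   # uc with no argument works on $_
}

print "$_\n" for @seen;
```

Because 'for' aliases $_ to each element, the chomp here also strips the
newline from the entries in @lines themselves.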

Finally, in your particular case you're using the s/// (substitute)
operator wrongly. The first part, s/here//, is a regular expression,
not a list of characters. You'll need to read up on these at

  perldoc perlre
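
To make the difference concrete, here is a minimal sketch comparing a
comma-separated pattern with a character class. The sample string and
variable names are invented for the illustration.

```perl
use strict;
use warnings;

# Hypothetical sample containing one control character (TAB, i.e. Ctrl-I).
my $text = "tab\there";

# Commas are literal here: this only matches Ctrl-H, a comma, Ctrl-I,
# a comma, and Ctrl-J appearing in exactly that order.
my $as_list  = ($text =~ /\cH,\cI,\cJ/) ? 1 : 0;

# A character class matches any single one of the listed characters.
my $as_class = ($text =~ /[\cH\cI\cJ]/) ? 1 : 0;

# A range covers all 32 control characters, Ctrl-@ through Ctrl-_.
my $as_range = ($text =~ /[\x00-\x1F]/) ? 1 : 0;

print "list=$as_list class=$as_class range=$as_range\n";
```

Only the two character-class forms find the TAB; the comma-separated
pattern does not match at all.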

The second part, s//here/, is a string expression which can use
'captured' sequences (anything in parentheses) from the first part
and, with the addition of the s///e (evaluate) modifier, can
also be a Perl expression whose result becomes the replacement.
Here I've used it to add 0x40 to the ASCII value of the control
character grabbed by the regex.
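
Pulled out on its own, the substitution might look like the sketch
below. The sub name caret_notation is hypothetical and only wraps the
expression so it can be tried on a sample string; the offset used is
0x40, which maps Ctrl-@ (0x00) to '@' (0x40).

```perl
use strict;
use warnings;

# caret_notation is a hypothetical helper name, not part of the script.
sub caret_notation {
    my ($text) = @_;
    # /e evaluates the replacement as Perl code for each match;
    # /g repeats the substitution for every control character found.
    $text =~ s/([\x00-\x1F])/'^' . chr(ord($1) + 0x40)/eg;
    return $text;
}

print caret_notation("bell\x07 and tab\x09"), "\n";   # prints: bell^G and tab^I
```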

A lot of this won't make sense until you learn some more, but I
hope you'll agree that this code is cuter than your original?

HTH,

Rob

use strict;
use warnings;

my $in_ctr = 0;
my $out_ctr = 0;

while (<>) {

  chomp;

  $in_ctr++;

  if (s/([\x00-\x1F])/'^'.chr(ord($1) + 0x40)/eg) {
    $out_ctr++;
    printf "Non-printing chars detected in: %s\n", $_;
  }
}

printf "Total records read = %d\n", $in_ctr;
printf "Total records written with non-printing characters = %d\n",
$out_ctr;


