Re: Why doesn't this work: matching capturing



On Feb 26, 1:19 pm, kzemb...@xxxxxxxxxx (Kevin Zembower) wrote:
I have a data file that looks like this:
uSF1  MD15000000  009214935522451020                9  0101001
88722397N07209999900
116759             0Block Group 1
S      1158      662+39283007-076574503

uSF1  MD15000000  009215035522451020                9  0101002
88722397N07209999900
109338             0Block Group 2
S       842      547+39280857-076573636

uSF1  MD15000000  009215135522451020                9  0101003
88722397N07209999900
182248        135142Block Group 3
S       920      442+39279557-076574311

This is actually three lines that all start with 'uSF1'. This is the
Summary File from the US 2000 Census. I want to print all the census
tracts and blockgroup numbers for FIPS state code = "24" (Maryland) and
FIPS county code "510" (Baltimore City) for summary level '150'. These
are all fixed-length records. I tried:
[kevinz@www UScensus]$ perl -ne '($tract, $bg) = /^.{8}150.{18}24510.{21}(.{6})(.)/; print "Tract $tract BLKGRP $bg\n";' mdgeo.uf1 | head
Tract  BLKGRP
Tract  BLKGRP
Tract  BLKGRP
<snip>

I thought that this would:
   skip 8 characters and match '150'
   skip 18 more characters and match '24' and '510'
   skip 21 more characters and capture the next 6 in $tract
   capture the next character in $bg
   and print them.

The first two matches work, but nothing is captured. Any ideas what I'm
doing wrong?

On what do you base your assumption that "the first two matches
work"? Nothing in your code or output indicates that, as you are
never checking the return value of the pattern match.

FWIW, your code did work for me when I copied and pasted your sample
text and joined the lines as they should have been. Therefore, I
think it's pretty likely that your data file does not contain what you
think it does. My guess is that each line you think you have that
starts with uSF1 is actually broken up into several lines.
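A minimal sketch of that check follows. It assumes the four wrapped pieces of the first sample record are simply concatenated into one physical line, with the run of 16 spaces taken from the sample. The helper name tract_and_bg is hypothetical. The sketch applies the same pattern to the joined record and then to a line that ends early, which is the kind of break suspected above. The pattern needs 62 characters on a single line, so no physical line shorter than that can match.

```perl
use strict;
use warnings;

# Pattern from the question: skip 8 chars, "150", skip 18, "24510",
# skip 21, then capture a 6-char tract and a 1-char block group.
my $GEO_RE = qr/^.{8}150.{18}24510.{21}(.{6})(.)/;

# Hypothetical helper: in list context returns (tract, block group) on a
# match or an empty list on failure; in boolean context, true/false.
sub tract_and_bg {
    my ($line) = @_;
    return $line =~ $GEO_RE;
}

# First sample record, re-joined from its four wrapped pieces
# (assumption: they are plain pieces of one physical line).
my $joined = join '',
    'uSF1  MD15000000  009214935522451020' . (' ' x 16) . '9  0101001',
    '88722397N07209999900',
    '116759             0Block Group 1',
    'S      1158      662+39283007-076574503';

my ($tract, $bg) = tract_and_bg($joined);
print "Tract $tract BLKGRP $bg\n";    # Tract 010100 BLKGRP 1

# A physical line that ends before column 62 (for example, broken
# inside the tract field) can never satisfy the pattern.
my $cut = substr($joined, 0, 58);
print 'cut line: ', (tract_and_bg($cut) ? 'match' : 'no match'), "\n";
```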

Try some debugging prints of $_ to see what you actually have, like:
print "Line $.: <<$_>>";
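Along those lines, here is a small sketch of a debugging pass. The sub name dump_lines is hypothetical. For each physical line it prints the line number $., the length, and the text between visible delimiters. It also flags any line that does not start with uSF1, because that is where a record broken across lines would show up. The demo reads from an in-memory file holding a deliberately split record.

```perl
use strict;
use warnings;

# Hypothetical debugging helper: show every physical line with its line
# number ($.), its length and visible <<...>> delimiters, and flag lines
# that do not begin with "uSF1" (likely continuation pieces).
sub dump_lines {
    my ($in, $out) = @_;
    while (my $line = <$in>) {
        chomp $line;
        my $flag = $line =~ /^uSF1/ ? '' : '   <-- no uSF1 prefix';
        printf {$out} "Line %d (%d chars): <<%s>>%s\n",
            $., length($line), $line, $flag;
    }
}

# Demo on an in-memory "file" holding one record split over two lines.
my $sample = "uSF1  MD15000000  0092149355224510\n"
           . "20                9  0101001\n";
open my $in, '<', \$sample or die "cannot open in-memory file: $!";
dump_lines($in, \*STDOUT);
close $in;

# Against the real data file from the question:
#   open my $fh, '<', 'mdgeo.uf1' or die "mdgeo.uf1: $!";
#   dump_lines($fh, \*STDOUT);
```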

Try checking the return value of your regexp:
/^.{8}150.{18}24510.{21}(.{6})(.)/ and print "Tract $1 BLKGRP $2\n";
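The same idea works in script form. This sketch uses a hypothetical sub name, matching_tracts. Here `my ($tract, $bg) = ...` serves as the `if` condition, which makes it a list assignment in scalar context. That evaluates to the number of values the match returned, so it is false exactly when the pattern fails. Non-matching lines are therefore skipped instead of printing an empty "Tract  BLKGRP".

```perl
use strict;
use warnings;

# Hypothetical helper: return "Tract ... BLKGRP ..." strings only for the
# lines on which the pattern actually matched.
sub matching_tracts {
    my ($in) = @_;
    my @found;
    while (my $line = <$in>) {
        # A list assignment in scalar (boolean) context yields the number
        # of values the match returned: 0 on failure, 2 on success.
        if (my ($tract, $bg) = $line =~ /^.{8}150.{18}24510.{21}(.{6})(.)/) {
            push @found, "Tract $tract BLKGRP $bg";
        }
    }
    return @found;
}

# Usage against the data file from the question:
#   open my $fh, '<', 'mdgeo.uf1' or die "mdgeo.uf1: $!";
#   print "$_\n" for matching_tracts($fh);
```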

Try enabling warnings to see if your two variables are undefined
(which they would be if the pattern didn't match) or just empty
strings (which they would be if the pattern matched but nothing was
captured; this, of course, isn't possible, since a six-character
match can't be the empty string).
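Here is a small sketch of that distinction. The sub name describe_capture is hypothetical. After a failed match, the list assignment leaves both variables undef. Under `use warnings`, interpolating them produces "Use of uninitialized value" warnings. The sketch traps those warnings through `$SIG{__WARN__}` so they can be inspected instead of going to STDERR.

```perl
use strict;
use warnings;

my $GEO_RE = qr/^.{8}150.{18}24510.{21}(.{6})(.)/;

# Hypothetical helper: report what the match left in the first capture.
sub describe_capture {
    my ($line) = @_;
    my ($tract, $bg) = $line =~ $GEO_RE;
    return 'undef (pattern did not match)' unless defined $tract;
    # Cannot happen with this pattern: (.{6}) always consumes six chars.
    return 'empty string (matched, nothing captured)' if $tract eq '';
    return "captured '$tract'";
}

print describe_capture('88722397N07209999900'), "\n";

# Under "use warnings", interpolating the undef captures warns; collect
# the warnings instead of letting them go to STDERR.
my @warnings;
{
    local $SIG{__WARN__} = sub { push @warnings, $_[0] };
    my ($tract, $bg) = '88722397N07209999900' =~ $GEO_RE;
    my $text = "Tract $tract BLKGRP $bg";
}
print "warning: $_" for @warnings;
```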

Paul Lalli
