Re: Non-uniform split
- From: Christian Winter <thepoet_nospam@xxxxxxxx>
- Date: Thu, 07 Sep 2006 21:32:01 +0200
thisismyidentity@xxxxxxxxx wrote:
Hi all,
I am writing a Perl script that should parse each line of a file (which
unfortunately I can't modify) and split the line. The main problem is
that the lines of the file (nearly 10000 of them) are not uniform, so
there doesn't seem to be a pattern or a delimiter on which I can simply
split each line in a loop over all lines :(.
Here is an example:
========================
A    B    C      D    E
d32  ab   ae99   WB   89
d33  cd   e787   WC   78
d34  ef          WD
d35  gh   ancjd  WT   100
d36  ij          WP
.
.
========================
My main intention is to extract the values in columns A, B, C, ... into an
array, but since in some lines some values under a column may not be
present, I am unable to find a single regex on which I can split all
lines in a loop. I tried the (obvious) \s+ regex for splitting, but
since the columns that are empty contain spaces, I get different results for
a particular column on different lines. I am especially interested in
two columns for which it is guaranteed that each line will be non-empty
(like A, B, D), but because of other empty columns I can't get them at a
fixed index of the array that is returned by split().
I'm just assuming now that column D is always "W" followed by
another capital letter. For my suggestion to work, you need some
unique criterion for column D that lets you anchor your regex there:
my @fields = $line =~ /(\S+)\s+(\S+)\s+(\S*)\s*(W[A-Z])\s*(\S*)$/;
The first two non-whitespace groups should be self-explanatory.
The third group (and the whitespace following it) might be absent,
so it is allowed to match an empty string (asterisk). Column D is always
present, so we have an anchor there, and the following whitespace
and field may again match empty strings up to the end of the line.
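As a rough sketch of how the regex above could be used, still assuming that
column D is always "W" plus one capital letter: the sub name split_columns
and the sample lines fed to it are only illustrative, not taken from the
real file.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the five column values (A..E) for one line,
# or an empty list if the line does not match.
# Assumption: column D is always "W" followed by one capital letter.
sub split_columns {
    my ($line) = @_;
    my @fields = $line =~ /(\S+)\s+(\S+)\s+(\S*)\s*(W[A-Z])\s*(\S*)$/;
    return @fields;
}

# Illustrative input: one full line, one with C and E empty, and the header.
my @sample = (
    'd32  ab   ae99   WB   89',
    'd34  ef          WD',
    'A    B    C      D    E',
);

for my $line (@sample) {
    my @fields = split_columns($line);
    if (@fields) {
        # Empty columns come back as empty strings, so every column
        # keeps its index: A=0, B=1, C=2, D=3, E=4.
        print join('|', @fields), "\n";   # d32|ab|ae99|WB|89, then d34|ef||WD|
    }
    else {
        print "no match: $line\n";        # the header line ends up here
    }
}
```

Lines that do not match (the header, or the "." lines) yield an empty list,
so they can simply be skipped. If values in column C or E can themselves
look like "W" plus a capital letter, the anchor is no longer unique and the
split may come out wrong.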
HTH
-Chris