Re: Perl script to clean up file -- Dont know if it can be done
From: Craig Ciquera (cciquera_at_mathworks.com)
Date: 09/22/04
- In reply to: LHradowy: "Perl script to clean up file -- Dont know if it can be done"
- Next in thread: thundergnat: "Re: Perl script to clean up file -- Dont know if it can be done"
Date: Wed, 22 Sep 2004 10:32:17 -0400
Would this work?
# Read in the data file (assume its name is datafile.txt)
open(my $datafile, '<', 'datafile.txt')
    or die "Cannot open datafile.txt: $!";
while (defined(my $line = <$datafile>))
{
    # Skip everything but the record lines we are interested in
    next unless $line =~ /\d{2} \d \d{2} \d{2}/;
    # Remove the trailing newline and any trailing whitespace
    $line =~ s/\s+$//;
    # Remove any leading 0's and/or whitespace
    $line =~ s/^0*\s*//;
    # Turn each run of two or more whitespace characters into a comma
    $line =~ s/\s{2,}/,/g;
    print "$line\n";
}
close($datafile);
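If it helps to test the idea without the real report, here is a minimal sketch that wraps the same filter in a function and reuses the split/join from your own script. The name clean_record and the sample lines are hypothetical. In particular, the column spacing in the sample is an assumption (two or more spaces between columns), so adjust it to match your file.

```perl
use strict;
use warnings;

# Hypothetical helper: turn one report line into a CSV record, or return
# undef for header, footer and page-break lines.
sub clean_record {
    my ($line) = @_;
    # Keep only lines that contain the "NN N NN NN" column
    return unless $line =~ /\d{2} \d \d{2} \d{2}/;
    $line =~ s/\s+$//;      # drop the newline and any trailing whitespace
    $line =~ s/^0?\s+//;    # drop an optional leading 0 and the indent
    # Columns are assumed to be separated by two or more whitespace characters
    my @cols = split /\s{2,}/, $line, -1;
    return join ',', @cols;
}

# Sample lines; the spacing is assumed, not copied from the real report
my @sample = (
    "0 REPORT TIME: 12:46\n",
    "0    1555200     00 0 12 02     CUSTOMER HAS\n",
    "     2555206     00 0 05 01     CUSTOMER HAS\n",
    "-  REJECTED = 000000145 CUTOVER = 000000213\n",
);

for my $line (@sample) {
    my $record = clean_record($line);
    print "$record\n" if defined $record;
}
```

With the sample above this should print two records, 1555200,00 0 12 02,CUSTOMER HAS and 2555206,00 0 05 01,CUSTOMER HAS, and skip the header and footer lines. To run it on the real file, the loop over @sample could be replaced with while (my $line = <>) { ... } as in your script.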
"LHradowy" <laura.hradowy@NOSPAM.mts.caaaaa> wrote in message
news:WWe4d.2141$IO.13667@news1.mts.net...
> I thought I would throw this out there. I think it cannot be done, but I
> am not a guru.
>
> This is the problem: I get a file from which I must pull the pertinent
> data. It has a header and footer, as well as page breaks, all in ASCII
> format. I need to pull out just the columns.
> I do this all manually (delete the header and footer, as well as all the
> page breaks). At times there is also a 0 at the beginning of a record that
> I do not want there.
>
> This is what the file looks like...
>
>
> REPORT NAME: FACICL
> 0 CLIENT JOB NUMBER: 23405
> 0 CLIENT NAME: LAURA XXXXXXXX
> 0 CLIENT MAILING CODE: D509H
> 0 REPORT DATE: 04/07/08
> 0 REPORT TIME: 12:46
> 1REPORT NO: FACRPT14 SOME INFO HERE
> RUN DATE: 04JUL10
> 0 JOB NAME: FACCTICL CUTOVER ACCARE INTERFACE
> REJECT REPORT PAGE NO: 2
> 0 PROGRAM : FACB5500 CUTOVER: LEAF CUTOVER
> DATE: 04JUL09
> 0 TELN CUTTELN CUTOEN REJECT REASON
> ---- ------- -------------- -------------
> -----------------
> 0 1555200 00 0 12 02 CUSTOMER HAS
> 2555206 00 0 05 01 CUSTOMER HAS
> 4555208 00 0 03 06 TELN NOT BILL
> 1REPORT NO: FACRPT14 SOME DATA HERE
> RUN DATE: 04JUL10
> 0 JOB NAME: FACCTICL CUTOVER ACCARE INTERFACE
> REJECT REPORT PAGE NO: 3
> 0 PROGRAM : FACB5500 CUTOVER: LEAF CUTOVER
> DATE: 04JUL09
> 0 TELN CUTTELN CUTOEN REJECT REASON
>
> ---- ------- -------------- -------------
> -----------------
> 0 1555200 00 0 12 02 CUSTOMER HAS
> 2555206 00 0 05 01 CUSTOMER HAS
> 4555208 00 0 03 06 TELN NOT BILL
> - REJECTED = 000000145 CUTOVER = 000000213
> - *** SUCCESSFUL COMPLETION OF
> FACCTICL ***
>
>
> I manually dissect this file to make it look like this...
> 1555002 00 0 04 27 TELN NOT BILL
> 3555007 00 0 06 00 CUSTOMER HAS
> 5555410 00 0 12 10 CUSTOMER HAS
> 6755012 00 0 12 06 CUSTOMER HAS
>
> I have manually removed the header, footer and page breaks. There also
> always seems to be a 0 at the start of the first record, which I remove as
> well.
> I then run this Perl script:
>
> while (<>) {
>     chomp;       # Remove the trailing newline
>     s/^\s+//;    # Remove leading spaces
>     # Split on two (or more) whitespace characters
>     my @cols = split m/\s{2,}/, $_, -1;
>     # If only two columns were found, insert an empty middle field
>     @cols == 2 and splice @cols, 1, 0, "";
>     print join(',', @cols) . "\n";
> }
>
> And I get this: WHAT I NEED!
> 5555002,00 0 04 27,TELN NOT BILL
> 1555007,00 0 06 00,CUSTOMER HAS
> 2555010,00 0 12 10,CUSTOMER HAS
>
> I want to try to eliminate as much manual intervention as I can.
>
>