Re: Newbie Help (WinXP)
- From: Chris <ithinkiam@xxxxxxxxx>
- Date: Fri, 28 Oct 2005 10:30:32 +0100
Mark Jerde wrote:
There are 228 .html files I want to suck 4 items out of and put in a single comma-delimited file, one line per .html file. I know how to build the regular expressions to get the 4 items.
OK. Show us.
I know how to do this in *.NET and VB6 but I wanted to try to do it in Perl. I downloaded Perl from ActiveState.com and have successfully run a few programs, but I haven't managed to process *.html on the command line.
Why don't you show us what you've tried? Then someone can give pointers as to where you're going wrong.
I would really appreciate it if someone would reply with the WinXP command line and the Perl program to make a single output file of [Father, Mother] from *.html input. Sample input:
1.html
Person=George Father=Alan Mother=Sarah Hobby=Cheating at dice
2.html
Person=Karen Father=David Mother=Mary Hobby=Burning books
3.html
Person=Mark Father=Sven Mother=Helga Hobby=Burping
Output file:
Alan, Sarah
David, Mary
Sven, Helga
On Linux this works for one file at a time:
perl -lne 'if (/(Father|Mother)=(\w+)/) {push @data, $2} END{print join(",", @data)}' 1.html
I'll leave it as an exercise for the OP to modify it to output correctly for *.html. I could do it as a script, but I'm sure it's possible as a one-liner.
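For the script route, here is a minimal sketch rather than a definitive answer. It assumes each file contains one Father=... and one Mother=... entry, either on one line or on separate lines. The names parents.pl, extract_parents and expand_args are made up for illustration. Two WinXP details matter: cmd.exe does not expand *.html before Perl sees it, so the script calls glob itself, and cmd.exe does not treat single quotes as quoting characters, so a one-liner there would need double quotes around the code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return "Father, Mother" for one file's text, or undef if either is missing.
# (Hypothetical helper; the \w+ pattern follows the one-liner above.)
sub extract_parents {
    my ($text) = @_;
    my ($father) = $text =~ /Father=(\w+)/;
    my ($mother) = $text =~ /Mother=(\w+)/;
    return unless defined $father && defined $mother;
    return "$father, $mother";
}

# cmd.exe hands "*.html" to the script unexpanded, so expand wildcards here.
# Plain string sort, so 10.html comes before 2.html.
sub expand_args {
    my @files;
    for my $arg (@_) {
        push @files, ($arg =~ /[*?]/ ? glob($arg) : $arg);
    }
    return sort @files;
}

sub main {
    for my $file (expand_args(@_)) {
        open my $fh, '<', $file
            or do { warn "Cannot open $file: $!\n"; next };
        my $text = do { local $/; <$fh> };    # slurp the whole file
        close $fh;

        my $line = extract_parents($text);
        if (defined $line) {
            print "$line\n";                  # one output line per input file
        }
        else {
            warn "No Father/Mother pair in $file\n";
        }
    }
}

main(@ARGV) unless caller;
```

Saved as parents.pl (a hypothetical name), it could be run from the WinXP command line like this, with the redirect collecting everything into a single output file:

perl parents.pl *.html > parents.csv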