Re: HTML parsing
- From: khowlette@xxxxxxxxxxx (Keith Howlette)
- Date: 18 Apr 2005 06:29:34 -0700
Hi,
I've posted a new version of the code on my website:
http://www.keith-howlette.com/download/hack17v101.pl
I've reviewed the code, and it now separates out the items and disregards
eBay's recommended items. Give it a try and email me if there are other
problems.
Keith Howlette
"Paul Bell" <paul@xxxxxxxxxxxxxxxxxxx> wrote in message news:<3a83lvF66tjseU1@xxxxxxxxxxxxxx>...
> Hello
>
> I am completely new to Perl and need some help with a short script I
> am using to search eBay for new items.
>
> The script parses an HTML page generated from an eBay search, looks for
> all the auction links, and emails the list to me.
>
> The script works wonderfully, but if I put in strict criteria to reduce my
> search results, eBay usually brings back a list of other auctions it thinks
> I will be interested in, i.e. from other categories, countries, etc.
>
> What I really need to do is only email the links that are above the phrase
> "Some of the matching items found in other eBay areas" in the HTML.
>
>
>
> #!/usr/bin/perl
> # This is a replacement for Hack17.
> # The original relied on the WWW::Search::EBay module,
> # which has been redeveloped to support only Linux/UNIX.
> # This code makes no use of it and relies on common modules.
> # It also sends email in a Windows environment: it doesn't use
> # sendmail (which doesn't exist on Windows, unless it's third party).
> # You will probably need to install MIME::Lite and possibly Net::SMTP
> # Author Keith Howlette
> # www.keith-howlette.com
> # Use it as you please; report any bugs to "khowlette<at>bigfoot.com"
> #################################################################
> # Hacking the hack
> # Search for more than one item.
> # The search items could be loaded from a file
> # and used to feed the main body of the program.
> #
> # I may set this up on a website and allow people to submit searches;
> # I need a good free Linux host that allows Perl scripts to run.
>
> ###################################################################
>
>
> use strict;
> use LWP 5.64;
> use URI;
> use HTML::LinkExtor;
> use HTML::HeadParser;
> use Net::SMTP;
> use MIME::Lite;
>
> # Set to your country's eBay domain suffix, e.g. ".com.au"
> my $country=".co.uk";
>
> my $base="http://search.ebay".$country."/ws/search/SaleSearch";
>
> # Title to search for
> my $title="radeon 9700*";
>
> # Category to search; get the number from http://listings.ebay.co.uk
> my $cat ="160";
>
> #your email address
> my $email = 'me@xxxxxx';
>
> #your mail server
> my $mailsrv = 'smtp@xxxxxx';
>
>
> # File to keep items number already seen
> my $localfile="listing.txt";
>
> # declare some vars ($a and $b are not declared here: they are reserved for sort)
> my ($itemnumber,%data,$key);
>
> #Set hash to nothing
> %data=();
>
> my $browser = LWP::UserAgent->new;
> # Uncomment if you need to use a proxy - replace with the real address and port
> #$browser->proxy(['http', 'ftp'], 'http://10.111.10.11:8080/');
> my $url =URI->new($base);
>
> $url->query_form(
> 'sacat'=> $cat,
> 'sasaleclass'=> '2',
> 'satitle'=> $title
> );
>
> # set up the link handler sub
> my $link_extor = HTML::LinkExtor->new(\&handle_links);
>
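> #get search results, and stop if the request failed
> my $response = $browser->get($url);
> die "Search failed: ".$response->status_line."\n" unless $response->is_success;
>
> #get the links
> $link_extor->parse($response->content);
> $link_extor->eof;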
>
> #get items already seen in hash %olditems
> my %olditems=();
> if (-s $localfile)
> {
>
> # read one previously seen item number per line
> open (my $in, '<', $localfile) or die "Cannot read $localfile: $!";
> while (<$in>)
> {
> chomp;
> next if $_ eq "";
> $olditems{$_}=1;
> }
> close ($in);
> }
>
> # delete items from %data hash already seen (hash-slice delete)
> delete @data{keys %olditems};
>
>
> # *** save any remaining new entries to file ***
> open (my $out, '>>', $localfile) or die "Cannot append to $localfile: $!";
> my $mailbody="";
>
> foreach $itemnumber (keys %data)
> {
> # fetch the item's title and build one HTML line for the mail body
> my $line=&get_title($data{$itemnumber});
> print {$out} $itemnumber."\n";
> #print "Line=".$line."\n";
> $mailbody=$mailbody.$line;
> }
> close ($out);
>
> #send mail
> my $msg = MIME::Lite->new
> (
> To => $email,
> From => $email,
> Subject =>"Ebay Search for [".$title."]",
> Type =>'multipart/related'
> );
>
> $msg->attach(Type => 'text/html', Data => qq{ $mailbody }
> );
>
> MIME::Lite->send('smtp', $mailsrv, Timeout=>60);
>
> $msg->send if $mailbody ne "";
>
>
> ######################################
> sub handle_links
> {
>
> my ($tag, %links)=@_;
> my $key;
> if ($tag eq 'a')
> {
> foreach $key (keys %links)
> {
> #search for links with Viewitem
> if ($key eq 'href')
> {
> if ( $links{$key} =~ m/ViewItem/ && $links{$key} =~ m/item=(\d+)/ )
> {
> #the item number captured from the link is the hash key;
> #links without an item number are skipped
> $data{$1}=$links{$key};
> }
> }
> }
> }
> }
>
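> sub get_title
> {
> # fetch the item page and return an HTML link labelled with its title
> my ($page)=@_;
> my $itempage = LWP::UserAgent->new;
> my $item_contents=$itempage->get($page);
>
> my $p = HTML::HeadParser->new;
> $p->parse($item_contents->content);
> # fall back to the URL if the page has no title
> my $item_title=$p->header('Title');
> $item_title=$page unless defined $item_title;
> my $link="<p><a href=\"$page\">".$item_title."</a></p>";
> return $link;
> }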
>
>
>
> If anyone could edit the code I would be extremely grateful.
>
> Paul
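
One way to get what Paul asks for is to cut the page off at the marker phrase before the links are extracted. The following is only a minimal sketch and not necessarily what hack17v101.pl does. The helper name main_results_only and the sample page are made up for illustration, and the sketch assumes the phrase appears in the HTML exactly as Paul quoted it, with no tags or entities inside it.

```perl
use strict;
use warnings;

# Phrase quoted in the post; assumed to appear verbatim in the page source.
my $marker = 'Some of the matching items found in other eBay areas';

# Hypothetical helper: return only the part of the page before the marker.
# If the marker is absent, the page is returned unchanged.
sub main_results_only {
    my ($html) = @_;
    my $pos = index($html, $marker);
    return $pos >= 0 ? substr($html, 0, $pos) : $html;
}

# Made-up sample page, for illustration only.
my $sample = <<'HTML';
<a href="http://example.invalid/ViewItem?item=111">wanted item</a>
<p>Some of the matching items found in other eBay areas</p>
<a href="http://example.invalid/ViewItem?item=222">suggested item</a>
HTML

my $trimmed = main_results_only($sample);
print "kept item 111\n"    if index($trimmed, 'item=111') >= 0;
print "dropped item 222\n" if index($trimmed, 'item=222') < 0;

# In the quoted script, the parse call would then become:
#   $link_extor->parse(main_results_only($response->content));
```

Trimming the text first means handle_links needs no changes. If eBay alters the wording of the phrase, the marker is not found and all links are emailed as before.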