Re: HTML parsing



Hi

I've posted a new version of the code on my website:
http://www.keith-howlette.com/download/hack17v101.pl
I've reviewed the code, and it now separates out the items and disregards
eBay's recommended items. Give it a try and email me if there are any
other problems.
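
One way to disregard the recommended items is to cut the page off at the
marker phrase before extracting links. The following is only a minimal
sketch of that idea, not necessarily what hack17v101.pl does. It assumes the
phrase quoted in the original post appears verbatim in the HTML; the sub
name main_results_only and the sample page are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Phrase quoted in the original post; assumed to appear verbatim in the page.
my $marker = 'Some of the matching items found in other eBay areas';

# Return only the part of $html that precedes the marker phrase.
# If the marker is absent, the whole page is returned unchanged.
sub main_results_only {
    my ($html) = @_;
    my $pos = index($html, $marker);
    return $pos >= 0 ? substr($html, 0, $pos) : $html;
}

# Hypothetical sample page, for illustration only.
my $sample = join '',
    '<a href="http://example.invalid/ViewItem?item=111">wanted</a>',
    "<b>$marker</b>",
    '<a href="http://example.invalid/ViewItem?item=222">suggested</a>';

my $trimmed = main_results_only($sample);
my @items   = $trimmed =~ /item=(\d+)/g;
print "Items kept: @items\n";    # prints "Items kept: 111"
```

In the quoted script, the same call would wrap $response->content before it
is handed to $link_extor->parse, so that handle_links only ever sees links
above the marker.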

Keith Howlette

"Paul Bell" <paul@xxxxxxxxxxxxxxxxxxx> wrote in message news:<3a83lvF66tjseU1@xxxxxxxxxxxxxx>...
> Hello
>
> I am completely new to Perl and need some help with a short script I
> am using to search eBay for new items.
>
> The script parses an HTML page generated from an eBay search, looks for
> all the auction links and emails the list to me.
>
> The script works wonderfully, but if I use strict criteria to narrow my
> search results, eBay usually brings back a list of other auctions it thinks
> I will be interested in, i.e. from other categories, countries, etc.
>
> What I really need to do is only email the links that are above the phrase
> "Some of the matching items found in other eBay areas" in the HTML.
>
>
>
> #!/usr/bin/perl
> # This is a replacement for Hack17.
> # The original relied on the WWW::Search::EBay module,
> # which has been redeveloped to support only Linux/UNIX systems.
> # This code makes no use of it and relies on common modules.
> # It also sends email in a Windows environment and
> # doesn't make use of sendmail (which doesn't exist on Windows, unless it's
> # third party).
> # You will probably need to install MIME::Lite and possibly Net::SMTP.
> # Author: Keith Howlette
> # www.keith-howlette.com
> # Use it as you please; report any bugs to "khowlette<at>bigfoot.com"
> #################################################################
> # Hacking the hack:
> # search for more than one item.
> # The search items could be loaded from a file
> # and used to feed the main body of the program.
> #
> # I may set this up on a website and allow people to submit searches;
> # I need a good free Linux host that allows Perl scripts to run.
>
> ###################################################################
>
>
> use strict;
> use LWP 5.64;
> use URI;
> use HTML::LinkExtor;
> use HTML::HeadParser;
> use Net::SMTP;
> use MIME::Lite;
>
> # Set to your country's domain suffix, e.g. ".com.au" for ebay.com.au
> my $country=".co.uk";
>
> my $base="http://search.ebay".$country."/ws/search/SaleSearch";
>
> # Title to search for
> my $title="radeon 9700*";
>
> # Category to search; get it from http://listings.ebay.co.uk
> my $cat ="160";
>
> #your email address
> my $email = qw /me@xxxxxx/;
>
> #your mail server
> my $mailsrv =qw /smtp@xxxxxx/;
>
>
> # File to keep items number already seen
> my $localfile="listing.txt";
>
> # declare some vars ($a and $b are reserved for sort, so they are not declared here;
> # %olditems and $line are declared where they are used)
> my ($itemnumber,%data,$key);
>
> #Set hash to nothing
> %data=();
>
> my $browser = LWP::UserAgent->new;
> # Uncomment if you need to use a proxy - replace with real address and port
> #$browser->proxy(['http', 'ftp'], 'http://10.111.10.11:8080/');
> my $url =URI->new($base);
>
> $url->query_form(
>     'sacat'       => $cat,
>     'sasaleclass' => '2',
>     'satitle'     => $title
> );
>
> # set up the link handler sub
> my $link_extor = HTML::LinkExtor->new(\&handle_links);
>
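> #get search results and stop if the request failed
> my $response = $browser->get($url);
> die "Search request failed: ".$response->status_line."\n" unless $response->is_success;
>
> #get the links (eof flushes anything the parser is still buffering)
> $link_extor->parse($response->content);
> $link_extor->eof;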
>
> #get items already seen in hash %olditems
> my %olditems=();
> if (-s $localfile)
> {
>     open (INFILE, '<', $localfile) or die "Cannot read $localfile: $!";
>     while (<INFILE>)
>     {
>         chomp;
>         next if $_ eq "";
>         $olditems{$_}=1;
>     }
>     close (INFILE);
> }
>
> # delete items from %data hash already seen
> foreach $key (keys %olditems)
> {
>     delete $data{$key} if exists $data{$key};
> }
>
>
> # *** save any remaining new entries to file ***
> open (OUTFILE, '>>', $localfile) or die "Cannot append to $localfile: $!";
> my $mailbody="";
>
> foreach $itemnumber (keys %data)
> {
>     my $line=&get_title($data{$itemnumber});
>     print OUTFILE $itemnumber."\n";
>     #print "Line=".$line."\n";
>     $mailbody=$mailbody.$line;
> }
> close (OUTFILE);
>
> #send mail
> my $msg = MIME::Lite->new(
>     To      => $email,
>     From    => $email,
>     Subject => "Ebay Search for [".$title."]",
>     Type    => 'multipart/related'
> );
>
> $msg->attach(Type => 'text/html', Data => qq{ $mailbody });
>
> MIME::Lite->send('smtp', $mailsrv, Timeout=>60);
>
> $msg->send if $mailbody ne "";
>
>
> ######################################
> sub handle_links
> {
>     my ($tag, %links)=@_;
>     if ($tag eq 'a' && defined $links{href})
>     {
>         #search for links with ViewItem
>         if ($links{href} =~ m/ViewItem/)
>         {
>             #get the item number from the link; skip links without one
>             #so that a stale $1 is never used as a key
>             if ($links{href} =~ m/item=(\d+)/)
>             {
>                 $data{$1}=$links{href};
>             }
>         }
>     }
> }
>
> sub get_title
> {
>     my ($page)=@_;
>     my $itempage = LWP::UserAgent->new;
>     my $item_contents=$itempage->get($page);
>
>     my $p = HTML::HeadParser->new;
>     $p->parse($item_contents->content);
>     # fall back to the URL if the page has no title
>     my $item_title=$p->header('Title');
>     $item_title=$page unless defined $item_title;
>     my $link="<p><a href=\"$page\">".$item_title."</a></p>";
>     return $link;
> }
>
>
>
> If anyone could edit the code I would be extremely grateful.
>
> Paul