How to pick content from HTML using BeautifulSoup



Hi,

I am new to Python. I need to fetch the names of the side filters from a page and save them to a CSV file (please find the attached screenshot).

Following is a snippet from my code:
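from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3; BeautifulStoneSoup is meant for XML

soup = BeautifulSoup(html)

# Earlier attempts, left commented out:
# for e in soup.findAll('div'):
#     for c in e.findAll('h3'):
#         for d in c.findAll('li'):
#             print '@@@@@@@', d.extract()
#
# select_pod = soup.findAll('div', {"class": "win aboutUs"})
# promeg = select_pod[0].findAll("p")[0]
#
# for dv in soup.findAll('div', {"class": "attribution"}):
#     ds = dv.findAll("<h3>")
#     print ds
#
# hreflist = [each.get('value') for each in soup.findAll('<h3>')]

select_pod = soup.findAll('div')
print select_pod
for j in select_pod:
    # The method is findAll (capital A), called on each tag in turn.
    print j.findAll('a')

# findAll is a method of the soup or of a single tag, not of the result list,
# and it takes the bare tag name ('h3'), not '<h3>'.
promeg = soup.findAll('h3')
#print '--', promeg

# fd1 (a csv writer), x and i are defined elsewhere in the script.
for m in promeg:
    if m:
        print 'Data values', m.string
        fd1.writerow([x[2], m.string, i[0], "Data Found"])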


Structure of HTML:

<div class="attribution">
<div>
<h3>By Brand</h3>
<ul>
<li>
<a href="http://www.xyz.com/cellphones/nokia/nokia/259-33902/buy">Nokia</a>
</li>
<li>
<li>
<li>
<li>
<li>
<li>
<li>
<li class="more">
</ul>
</div>
<div>
<h3>By Seller</h3>
<ul>
<li>
<a id="att_296935_184059" class="attributeUrlReplacementTarget" href="http://www.xyz.com/cellphones/nokia/amazon-marketplace/296935-184059/buy">Amazon Marketplace</a>
<input id="att_296935_184059_replacement" type="hidden" value="http://www.xyz.com/cellphones/nokia/amazon-marketplace/296935-184059/buy">
</li>
<li>
<li>
<li>
<li>
<li>
<li>
<li>
<li class="more">
</ul>
</div>
<div>
<div>
</div>
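
For illustration, a structure like the one above could probably be walked as follows. This is only a rough sketch under some assumptions: it uses Python 3 with the bs4 package (BeautifulSoup 4) instead of the BeautifulSoup 3 calls in the snippet, the sample markup is a trimmed, well-formed copy of the structure shown, and the names `SAMPLE_HTML` and `extract_filters` are made up for the example.

```python
from bs4 import BeautifulSoup

# Trimmed, well-formed copy of the structure above (assumption: the real
# page has more <li> entries inside each <ul>).
SAMPLE_HTML = """
<div class="attribution">
  <div>
    <h3>By Brand</h3>
    <ul>
      <li><a href="http://www.xyz.com/cellphones/nokia/nokia/259-33902/buy">Nokia</a></li>
      <li class="more"></li>
    </ul>
  </div>
  <div>
    <h3>By Seller</h3>
    <ul>
      <li>
        <a id="att_296935_184059" class="attributeUrlReplacementTarget" href="http://www.xyz.com/cellphones/nokia/amazon-marketplace/296935-184059/buy">Amazon Marketplace</a>
        <input id="att_296935_184059_replacement" type="hidden" value="http://www.xyz.com/cellphones/nokia/amazon-marketplace/296935-184059/buy">
      </li>
    </ul>
  </div>
</div>
"""


def extract_filters(html):
    """Return a list of (heading, [link text, ...]) pairs, one per filter group."""
    soup = BeautifulSoup(html, "html.parser")
    groups = []
    container = soup.find("div", class_="attribution")
    if container is None:
        return groups
    for heading in container.find_all("h3"):
        # The options for a heading live in the <ul> that follows it.
        options = heading.find_next_sibling("ul")
        names = []
        if options is not None:
            names = [a.get_text(strip=True) for a in options.find_all("a")]
        groups.append((heading.get_text(strip=True), names))
    return groups


filters = extract_filters(SAMPLE_HTML)
for heading, names in filters:
    print(heading, names)
```

Pairing each h3 with the ul that follows it keeps the brand names and seller names in separate groups, which a flat search for all the links would not do.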


Required output in the CSV file:

By Brand
Nokia
Samsung
..
..

By Seller
Amazon
Buy.com
..
..
..
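
A layout like the one above (a heading row, one name per row, a blank row between groups) could be written with the standard csv module. The sketch below assumes Python 3 and assumes the groups have already been collected as (heading, names) pairs; `write_filters` is a made-up helper name, and the sample data only repeats the names listed above.

```python
import csv
import io


def write_filters(groups, out):
    """Write each heading, then one name per row, with a blank row between groups."""
    writer = csv.writer(out, lineterminator="\n")
    for index, (heading, names) in enumerate(groups):
        if index > 0:
            writer.writerow([])  # blank separator row between groups
        writer.writerow([heading])
        for name in names:
            writer.writerow([name])


# Sample data taken from the required output above.
groups = [
    ("By Brand", ["Nokia", "Samsung"]),
    ("By Seller", ["Amazon", "Buy.com"]),
]

# An in-memory buffer keeps the example self-contained.
buffer = io.StringIO()
write_filters(groups, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead, the same function can be given a file opened with `open("filters.csv", "w", newline="")`; the file name here is only a placeholder.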



Please suggest how to fetch these details.

Sheetal Singh

Attachment: filters.png


