Re: Help with BeautifulSoup





Michiel Overtoom wrote:

Alex wrote...

Okay, here's the general idea of the HTML I have to work with:

<div>
noun
<table class='luna'>
<table class='luna'>
<table class='luna'>
<table class='luna'>
verb
<table class='luna'>
<table class='luna'>
<table class='luna'>
</div>

Okay, I left off some stuff.

I wish you hadn't, or had at least provided a URL where I can get the page
you are trying to parse. Now I don't have a valid test case to tinker
with. And maybe you can also show the code you have already come up
with.


I can easily get the tables, but it's the spans that I am having trouble
with.

I can't see any SPAN tags in the example you provided.

Greetings,

--
"The ability of the OSS process to collect and harness
the collective IQ of thousands of individuals across
the Internet is simply amazing." - Vinod Vallopillil
http://www.catb.org/~esr/halloween/halloween4.html

--
http://mail.python.org/mailman/listinfo/python-list



Oh, sorry: I wrote the span tags, but they didn't show up. They were
around the noun. Here is the code I have to get just the definitions:

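import urllib
from BeautifulSoup import BeautifulSoup

class defWord:
    def __init__(self, word):
        self.word = word

        def get_defs(term):
            # Fetch the dictionary.com results page; quote() keeps terms
            # containing spaces or punctuation from breaking the URL.
            soup = BeautifulSoup(urllib.urlopen(
                'http://dictionary.reference.com/search?q=%s' % urllib.quote(term)))

            # Each definition sits in its own <table class="luna-Ent">;
            # the definition text is in the last <td> of that table.
            for tabs in soup.findAll('table', {'class': 'luna-Ent'}):
                yield tabs.findAll('td')[-1].contents[0].string

        self.mainList = list(get_defs(self.word))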

There's a bit more to it, but it doesn't matter here. As you can see, I am
using dictionary.com as the website. If you look at the HTML, the <span> tags
are where the type of the word is, and that is what I need, in order. Or, if I
can figure out how many <table> tags are in between each pair of <span> tags,
that would work too.
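One way to get the number of tables under each part of speech is to walk the document once, in order, keeping a running tally per label. The sketch below swaps in the standard library's html.parser rather than BeautifulSoup, so it runs without extra packages. Because the span attributes were lost from the post, it assumes that each part-of-speech label is text outside the luna-Ent tables that begins with an en dash, as in "–noun"; the real dictionary.com markup may differ. TableCounter and count_tables are hypothetical names for illustration only.

```python
from html.parser import HTMLParser

EN_DASH = "\u2013"  # labels look like "–noun" in the sample markup (assumption)


class TableCounter(HTMLParser):
    """Count <table class="luna-Ent"> blocks after each part-of-speech label."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True: "&ndash;" arrives as "\u2013"
        self.depth = 0      # > 0 while inside a luna-Ent table
        self.counts = []    # [label, table_count] pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag != "table":
            return
        if self.depth:
            self.depth += 1  # a table nested inside a definition table
        elif dict(attrs).get("class") == "luna-Ent":
            self.depth = 1
            if self.counts:  # tables before the first label are not counted
                self.counts[-1][1] += 1

    def handle_endtag(self, tag):
        if tag == "table" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if not self.depth and text.startswith(EN_DASH):
            self.counts.append([text[1:].strip(), 0])


def count_tables(html):
    parser = TableCounter()
    parser.feed(html)
    parser.close()
    return [tuple(pair) for pair in parser.counts]


sample = """
<span>&ndash;noun</span>
<table class="luna-Ent"><tr><td>1.</td><td>the curd of milk</td></tr></table>
<table class="luna-Ent"><tr><td>2.</td><td>a definite mass</td></tr></table>
<span>&ndash;verb (used without object)</span>
<table class="luna-Ent"><tr><td>9.</td><td>...</td></tr></table>
"""
print(count_tables(sample))  # [('noun', 2), ('verb (used without object)', 1)]
```

Since get_defs yields definitions in the same document order, slicing mainList by these counts should pair each definition with its part of speech, provided every counted table contributes exactly one definition.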


This is the type of code I am talking about, if it helps:

–noun
<table class="luna-Ent">
<tbody>
<tr>
<td class="dn" valign="top">1.</td>
<td valign="top">the curd of milk separated from the whey and prepared in
many ways as a food. </td>
</tr>
</tbody>
</table>
<table class="luna-Ent">
<tbody>
<tr>
<td class="dn" valign="top">2.</td>
<td valign="top">a definite mass of this substance, often in the shape of a
wheel or cylinder. </td>
</tr>
</tbody>
</table>
<table class="luna-Ent">
</table>
<table class="luna-Ent">
</table>
<table class="luna-Ent">
</table>
<table class="luna-Ent">
</table>
<table class="luna-Ent">
</table>
<table class="luna-Ent">
</table>
–verb (used without object)
<table class="luna-Ent">
<tbody>
<tr>
<td class="dn" valign="top">9.</td>
<td valign="top">
</td>
</tr>
</tbody>
</table>
–verb (used with object)
<table class="luna-Ent">
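Given markup shaped like the excerpt above, the definitions can also be grouped directly under their part of speech in a single pass. This is a minimal sketch using the standard library's html.parser (not BeautifulSoup). It assumes labels are en-dash-prefixed text outside the tables and that definition tables are not nested. DefinitionGrouper and group_definitions are hypothetical names. Like the findAll('td')[-1] call in the code above, it keeps only the last cell of each table.

```python
from html.parser import HTMLParser

EN_DASH = "\u2013"  # labels look like "–noun" in the quoted markup (assumption)


class DefinitionGrouper(HTMLParser):
    """Group the last <td> of each luna-Ent table under the preceding label."""

    def __init__(self):
        super().__init__()
        self.in_table = False  # inside <table class="luna-Ent"> (no nesting assumed)
        self.in_cell = False   # inside a <td> of that table
        self.cells = []        # text of each <td> in the current table
        self.groups = []       # (label, [definitions]) in document order

    def handle_starttag(self, tag, attrs):
        if tag == "table" and dict(attrs).get("class") == "luna-Ent":
            self.in_table, self.cells = True, []
        elif tag == "td" and self.in_table:
            self.in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False
        elif tag == "table" and self.in_table:
            self.in_table = self.in_cell = False
            # Keep only the last cell, collapsing line breaks and extra spaces.
            if self.groups and self.cells:
                self.groups[-1][1].append(" ".join(self.cells[-1].split()))

    def handle_data(self, data):
        if self.in_cell:
            self.cells[-1] += data
        elif not self.in_table:
            text = data.strip()
            if text.startswith(EN_DASH):
                self.groups.append((text[1:].strip(), []))


def group_definitions(html):
    parser = DefinitionGrouper()
    parser.feed(html)
    parser.close()
    return parser.groups


SAMPLE = """
\u2013noun
<table class="luna-Ent"><tbody><tr>
<td class="dn" valign="top">1.</td>
<td valign="top">the curd of milk separated from the whey and prepared in
many ways as a food. </td>
</tr></tbody></table>
<table class="luna-Ent"><tbody><tr>
<td class="dn" valign="top">2.</td>
<td valign="top">a definite mass of this substance, often in the shape of a
wheel or cylinder. </td>
</tr></tbody></table>
\u2013verb (used without object)
<table class="luna-Ent"></table>
"""

for label, definitions in group_definitions(SAMPLE):
    print(label, definitions)
```

Empty tables, such as the placeholders in the excerpt, contribute no definition, so a label can end up with an empty list.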


If you need anything else, feel free to ask!
--
View this message in context: http://www.nabble.com/Re%3A-Help-with-BeautifulSoup-tp18418004p18423003.html
Sent from the Python - python-list mailing list archive at Nabble.com.
