Re: Pulling out data between <TD> tags using regular expressions



tdmailbox@xxxxxxxxx writes:
> If I had this tag and wanted to return 123, how would I do it? I have
> tried countless methods but cannot get only the 123 without the
> <TD> tags
>
> <TD class=tblform3 id=L_listing width=23>123</TD>
>
> After 3 hours I am giving up and asking the experts.

If you'd asked your computer, you'd have had the answer much faster:

perldoc -q HTML

And the first returned result is:

"How do I remove HTML from a string?"

Which is exactly what you need. If you get in the habit of searching
your local documentation first, you'll get better answers faster: you
won't have to wait for a reply here, and the people who can give the
best answers are tired of answering the same questions over and over,
which is why they wrote the FAQ in the first place! So if you ask FAQs
here, then as a rule you'll only get answers from the less experienced
people.
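
If you really do want a regular expression for this one case, here is
a minimal sketch. It assumes simple, non-nested cells with no comments
and no '>' inside attribute values; td_text is a hypothetical helper
name, and anything messier is where a real parser such as HTML::Parser
earns its keep.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical helper: returns the text between each <TD ...> and </TD>.
# Assumes non-nested cells, no comments, and no '>' inside attribute
# values; it is NOT a general HTML parser.
sub td_text {
    my ($html) = @_;
    # /i: tag names may be upper or lower case
    # /s: let . match newlines inside a cell
    # /g: in list context, return the capture from every match
    # (.*?) is non-greedy, so each match stops at the first </TD>
    my @cells = $html =~ m{<td\b[^>]*>(.*?)</td\s*>}gis;
    return @cells;
}

my @cells = td_text('<TD class=tblform3 id=L_listing width=23>123</TD>');
print "$_\n" for @cells;    # prints 123
```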

But I'm feeling generous, and I'd been meaning to poke at
HTML::Parser for a while anyhow, so I whipped up this little example:

#!/usr/bin/perl
use warnings;
use strict;
use HTML::Parser ();

sub start_handler
{
    return if shift ne "td";
    my $self = shift;
    $self->handler(text => sub { print shift }, "dtext");
    $self->handler(end => sub { my ($tag, $parser) = @_; $parser->eof if $tag eq "td" },
                   "tagname,self");
}

my $p = HTML::Parser->new(api_version => 3);
$p->handler( start => \&start_handler, "tagname, self" );
$p->parse( <<EODATA );
<TD class=tblform3 id=L_listing width=23>123</TD>
EODATA
print "\n";
__END__

For future reference, if you have a problem, you'll get the best
results here if you can post an example of it that looks something
like that: short (I went to 21 lines, and that's about as big as I try
to let them get), complete, and clearly stating what is happening and
how that differs from what you wanted to happen.

Also, note that the above example stops parsing after the first </TD>;
if you are going to parse text containing multiple TD elements, you'll
want to read the HTML::Parser documentation to find out better ways of
doing that.
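
One way to handle several cells, sketched here under assumptions
rather than taken from the HTML::Parser documentation: instead of
calling eof at the first </TD>, keep a flag that says whether we are
inside a TD and append text to the current cell while it is set.
td_cells, @cells and $in_td are hypothetical names, and nested tables
are not handled.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use HTML::Parser ();

# Hypothetical helper: returns the decoded text of every <TD> element.
# Sketch only: a <TD> nested inside another <TD> is not handled.
sub td_cells {
    my ($html) = @_;
    my @cells;         # one entry per <TD> seen
    my $in_td = 0;     # true while between <TD> and </TD>
    my $p = HTML::Parser->new(
        api_version => 3,
        # tag names arrive lower-cased by default, so "td" matches <TD>
        start_h => [ sub { if ($_[0] eq "td") { $in_td = 1; push @cells, "" } },
                     "tagname" ],
        end_h   => [ sub { $in_td = 0 if $_[0] eq "td" }, "tagname" ],
        # dtext is the text with entities such as &amp; decoded; text may
        # arrive in several pieces, so append rather than assign
        text_h  => [ sub { $cells[-1] .= $_[0] if $in_td }, "dtext" ],
    );
    $p->parse($html);
    $p->eof;    # called outside a handler, this flushes buffered text
    return @cells;
}

print "$_\n" for td_cells("<tr><td>123</td><td>456</td></tr>");   # prints 123, then 456
```

Because HTML::Parser may split one cell's text into several text
events, the handler appends with .= instead of overwriting.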

-=Eric
--
Come to think of it, there are already a million monkeys on a million
typewriters, and Usenet is NOTHING like Shakespeare.
-- Blair Houghton.