need help reading source code: HTML::Parser
ioneabu_at_yahoo.com
Date: 12/31/04
- Next message: Paul Lalli: "Re: need help reading source code: HTML::Parser"
- Previous message: Peter Scott: "Re: another newbie stupid question"
- Next in thread: Paul Lalli: "Re: need help reading source code: HTML::Parser"
- Reply: Paul Lalli: "Re: need help reading source code: HTML::Parser"
- Reply: Matt Garrish: "Re: need help reading source code: HTML::Parser"
Date: 31 Dec 2004 08:41:54 -0800
I was curious why using regexes to parse HTML is considered so
terrible, at least in simple cases. I can see why line breaks can
complicate things, but given the relatively small size of most HTML
files and the power of today's computers, it should not be a big deal
to load the whole file into a string and remove the line breaks first.

In doing a little searching through the newsgroup, I found a lot of
people saying that parsing HTML with regexes is always a bad idea, but
none explaining clearly why.
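To make the concern concrete, here is a minimal sketch of one way a simple regex can go wrong even after the line breaks are gone. The sample markup and the helper name naive_tags are made up for illustration; the pattern is only one of many naive patterns someone might try, not a claim about what any particular script does.

```perl
use strict;
use warnings;

# Hypothetical sample, already on one line (line breaks removed).
my $html = '<p title="a > b">text</p><!-- <b>not a tag</b> --><a href="x">link</a>';

# Naive approach: treat everything between "<" and the next ">" as a tag.
sub naive_tags {
    my ($s) = @_;
    return $s =~ /<([^>]*)>/g;    # list of captured tag bodies
}

# Brackets make leading/trailing whitespace in each capture visible.
print "[$_]\n" for naive_tags($html);
```

With this input the first tag is cut off at the ">" inside the quoted attribute value, and the markup inside the comment is reported as if it were real tags. Removing line breaks does not help with either problem, because both come from the pattern not knowing about quoting and comments.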

My next thought was to read through the code of HTML::Parser and get a
general idea of how it works, or at least how complicated the process
really is.

I used IE 6 to look at the source at cpan.org and Ctrl-F to search
through the document. It seems that all of the work is done in a sub
named parse. For example:

$p->parse();

I have searched up and down the source for HTML::Parser and I cannot
find a sub parse. There is a sub parse_file which calls parse.

I searched for any use, require, or do statements and found:

require HTML::Entities;

which I thought might be useful, but was not what I was looking for.

So where is this parse sub? If it is not in HTML::Parser, where is it
and how is HTML::Parser importing it?
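One possibility worth checking is that parse is not written in Perl at all but compiled from C (XS), in which case a text search of the .pm file would never find a "sub parse". The sketch below shows one way to test that for any sub using the core B module. The helper name sub_kind is made up for illustration, and the HTML::Parser check only runs if that module happens to be installed.

```perl
use strict;
use warnings;
use B ();
use List::Util ();

# Report whether a code reference points at compiled XS code or a Perl sub.
sub sub_kind {
    my ($code) = @_;
    my $cv = B::svref_2object($code);    # B::CV object for the sub
    return $cv->XSUB ? 'XS' : 'Perl';    # XSUB is non-zero only for XS subs
}

sub perl_sub { return 1 }                # ordinary Perl sub for comparison

print "perl_sub: ",        sub_kind(\&perl_sub),        "\n";
print "List::Util::sum: ", sub_kind(\&List::Util::sum), "\n";

# Optional: only attempted when HTML::Parser is available.
if (eval { require HTML::Parser; 1 }) {
    my $code = HTML::Parser->can('parse');
    print "HTML::Parser::parse: ",
        ($code ? sub_kind($code) : 'not found'), "\n";
}
```

If the last line reports XS, the place to look would be the distribution's C/XS sources rather than Parser.pm, and the loading would happen through a bootstrap call such as XSLoader or DynaLoader rather than through use or require of another Perl module.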
Thanks!
wana