using HTML::Parser
Date: 30 Mar 2004 07:19:31 -0800
Hi,
I need to parse an HTML file and extract all the text in it (not the
images or tags). I cannot figure out how to do it. The HTML file is
saved in my local directory, and I need the extracted text printed or
saved there as well. I would really appreciate any help in this regard.
Thanks,
Divya Rao