Re: DocumentHTML ?
- From: "A. Sinan Unur" <1usa@xxxxxxxxxxxxxxxxxxx>
- Date: Tue, 27 Feb 2007 15:43:29 GMT
"~greg" <g_m@xxxxxxxxxxxxxxxxxx> wrote in
news:o-KdnS_QLYiwMH7YnZ2dnUVZ_qOpnZ2d@xxxxxxxxxxx:
"A. Sinan Unur" > wrote ...
"~greg" > wrote ...
...
Any hints, please?
Well, the first one would be to use
http://search.cpan.org/~abeltje/Win32-IE-Mechanize-0.009_17/
I have successfully used that module to do some really complicated
automated downloading of about 10 GB of HTML from various web sites
(sorry can't be more specific).
Note the comment at
http://search.cpan.org/~abeltje/Win32-IE-Mechanize-0.009_17/lib/Win32/IE/Mechanize.pm#%24ie-%3Econtent
use strict;
use warnings; # do not leave it out.
Repeat:
Do not leave
use strict;
use warnings;
out in your source code (whatever peculiar development environment you
might have).
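As an illustration of why the two pragmas matter, here is a minimal sketch (not taken from the thread) in which a misspelled variable name is rejected at compile time. The helper name compiles_under_strict and the sample snippets are made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: compile and run a snippet with both pragmas enabled.
# Returns 1 if the snippet compiled and ran, 0 if it was rejected.
sub compiles_under_strict {
    my ($src) = @_;
    my $ok = eval "use strict; use warnings; $src; 1";
    return $ok ? 1 : 0;
}

# The second snippet misspells $count as $cuont. Without strict, Perl
# would silently create a new global; with strict, compilation fails.
for my $src ( 'my $count = 1; $count++', 'my $count = 1; $cuont++' ) {
    printf "%-28s %s\n", $src,
        compiles_under_strict($src) ? 'ok' : 'rejected';
}
```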
But what I am really trying to do is to add value to my regular
browser (i.e., IE), without having to write COM plug-ins
(or whatever they're called these days).
I don't know what you mean by "the comment" at the link
to cpan's Win32::IE::Mechanize,
Well, if you had followed the link, you would have seen:
$ie->content
Fetch the outerHTML from the $ie->Document->documentElement.
I have found no way to get to the exact contents of the document. This
is basically the interpretation of IE of what the HTML looks like and
beware all tags are upcased :(
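To make the quoted caveat concrete, here is a rough sketch of fetching a page through the module and lowercasing the tag names that IE hands back. The helper lowercase_tags, the placeholder URL, and the fallback sample string are assumptions made for illustration; the regex is only a convenience for this demo and is no substitute for a real HTML parser. The IE part runs only where Win32::IE::Mechanize can be loaded.

```perl
use strict;
use warnings;

# Hypothetical helper: lowercase element names in start and end tags.
# Attribute names, attribute values, and text are left untouched.
sub lowercase_tags {
    my ($html) = @_;
    $html =~ s{<(/?)([A-Za-z][A-Za-z0-9]*)}{'<' . $1 . lc($2)}ge;
    return $html;
}

if ( eval { require Win32::IE::Mechanize; 1 } ) {
    # Windows with IE and the module installed.
    my $ie = Win32::IE::Mechanize->new( visible => 0 );
    $ie->get('http://www.example.com/');    # placeholder URL
    # content() is IE's rendering of the document, not the bytes
    # the server sent, so tags arrive upcased.
    print lowercase_tags( $ie->content ), "\n";
    $ie->close;
}
else {
    # Elsewhere: show the helper on a sample string shaped like IE's output.
    print lowercase_tags('<HTML><BODY><P>hello</P></BODY></HTML>'), "\n";
}
```

If the original markup matters (exact whitespace, attribute quoting, original case), this approach will not recover it, since the quoted documentation says the module has no access to the exact document contents.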
but the DESCRIPTION of its current state is not at all encouraging
(---"Don't expect it to be like the mech in that the class is not
derived from the user-agent class (like LWP). WARNING: This is a work
in progress ... ")
and the CAVEATS (---"...This means that you may need
to set your security settings to a low and possibly unsafe level.
...")
sound downright dire to me.
Note the *may*. I have never needed to tinker with any security settings
and I have used the module for quite complicated tasks where the sites
were so dependent on IE that no other solution would have worked.
You are free not to take advice and try to re-invent the wheel. I am not
likely to waste my time helping you do that.
(Part of what I mean by adding value to IE is ADDING security, not
subtracting it!)
One can choose to use CPAN modules and contribute improvements as one
comes up with them. IMHO, that is both more productive and more useful
to everyone.
But of course I use warnings!
I can only see what you chose to show.
<SNIP>
All this stuff about your configuration and not one comment about
whether the solution I posted worked for you or not (a solution which I
copied straight from Win32::IE::Mechanize). I will now bid you farewell.
Sinan
--
A. Sinan Unur <1usa@xxxxxxxxxxxxxxxxxxx>
(remove .invalid and reverse each component for email address)
comp.lang.perl.misc guidelines on the WWW:
http://augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html