Re: DocumentHTML ?



"~greg" <g_m@xxxxxxxxxxxxxxxxxx> wrote in
news:o-KdnS_QLYiwMH7YnZ2dnUVZ_qOpnZ2d@xxxxxxxxxxx:


"A. Sinan Unur" wrote ...
"~greg" wrote ...
...
Any hints, please?

Well, the first one would be to use

http://search.cpan.org/~abeltje/Win32-IE-Mechanize-0.009_17/

I have successfully used that module to do some really complicated
automated downloading of about 10 GB of HTML from various web sites
(sorry, I can't be more specific).

Note the comment at

http://search.cpan.org/~abeltje/Win32-IE-Mechanize-0.009_17/lib/Win32/IE/Mechanize.pm#%24ie-%3Econtent

use strict;

use warnings; # do not leave it out.


Repeat:

Do not leave

use strict;
use warnings;

out in your source code (whatever peculiar development environment you
might have).

But what I am really trying to do is to add value to my regular
browser (i.e., IE) without having to write COM plug-ins
(or whatever they're called these days).

I don't know what you mean by "the comment" at the link
to CPAN's Win32::IE::Mechanize,

Well, if you had followed the link, you would have seen:

$ie->content

Fetch the outerHTML from the $ie->Document->documentElement.

I have found no way to get to the exact contents of the document. This
is basically the interpretation of IE of what the HTML looks like and
beware all tags are upcased :(

but the DESCRIPTION of its current state is not at all encouraging
("Don't expect it to be like the mech in that the class is not
derived from the user-agent class (like LWP). WARNING: This is a work
in progress ... ")

and the CAVEATS ("...This means that you may need
to set your security settings to a low and possibly unsafe level.
...")

sound downright dire to me.

Note the *may*. I have never needed to tinker with any security settings
and I have used the module for quite complicated tasks where the sites
were so dependent on IE that no other solution would have worked.

You are free not to take advice and try to re-invent the wheel. I am not
likely to waste my time helping you do that.

(Part of what I mean by adding value to IE is ADDING security, not
subtracting it!)

One can choose to use CPAN modules and contribute improvements as one
comes up with them. IMHO, that is both more productive and more useful
to everyone.

But of course I use warnings!

I can only see what you chose to show.

<SNIP>

All this stuff about your configuration and not one comment about
whether the solution I posted worked for you or not (a solution which I
copied straight from Win32::IE::Mechanize). I will now bid you farewell.

Sinan
--
A. Sinan Unur <1usa@xxxxxxxxxxxxxxxxxxx>
(remove .invalid and reverse each component for email address)

comp.lang.perl.misc guidelines on the WWW:
http://augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html
