Unexpected behavior of com.gargoylesoftware.htmlunit



Hi,

If anybody is using the "com.gargoylesoftware.htmlunit" packages, would you
please share your experience with the following issue?

Suppose we have an HTML file (test1.html) like the one below, where the
<form> tag is not properly nested. Strictly speaking the markup is
malformed (the form opens inside a <td> and closes outside it), but
browsers accept it.

<html>
<head><title>Testing com.gargoylesoftware.htmlunit</title></head>
<body>
<table>
<tr><td>
<form name="frmTest" method="post" action="test2.php">
<table>
<tr><td>Testing com.gargoylesoftware.htmlunit's html processing
behaviour</td></tr>
</table>
</td></tr>
<input type="hidden" name="hidXTNUM" value="50">
</form>
</table>
</body>
</html>

Suppose we also have code like the following to download and process the
HTML file:

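//
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlForm;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import java.net.URL;

// (the statements below go inside a method)
String strUrl = "http://some.domain.com/test1.html";
WebClient webClient = new WebClient();
try {
    // Download and parse the page
    URL url = new URL(strUrl);
    HtmlPage page = (HtmlPage) webClient.getPage(url);

    // Look up the form by name; this part works
    HtmlForm frmPage = page.getFormByName("frmTest");

    // This is the statement that fails: "hidXTNUM" is not found in the form
    frmPage.getInputByName("hidXTNUM").setAttributeValue("value", "100");
} catch (Exception ex) {
    // One handler for all steps, so a failed download cannot lead to a
    // NullPointerException further down
    System.out.println(ex.toString());
}
//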

Here is what I get when I run the code:

1. It downloads the HTML page.
2. It finds the form: HtmlForm frmPage = page.getFormByName("frmTest");
3. It fails to set the "hidXTNUM" value in the last statement.

I found that WebClient has processed the <form> tag incorrectly and put
the "hidXTNUM" hidden element outside the form.

Dumping the HTML file (test1.html), I got the following text, where the
"hidXTNUM" hidden input is outside the <form>:

<html>
<head><title>Testing com.gargoylesoftware.htmlunit</title></head>
<body>
<table>
<tr><td>
<form name="frmTest" method="post" action="test2.php">
<table>
<tr><td>Testing com.gargoylesoftware.htmlunit's html processing
behaviour</td></tr>
</table>
</form>
</td></tr>
<input type="hidden" name="hidXTNUM" value="50">
</table>
</body>
</html>
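
To double-check a dump like this without reading it by eye, a small sketch
along the following lines might help. It uses only plain string searching
from the Java standard library and is not part of HtmlUnit; the class name
FormDumpCheck and the method inputInsideForm are made up for illustration,
and it assumes the attributes are written exactly as in the markup above.

```java
class FormDumpCheck {
    // Returns true if the first element carrying name="inputName" lies between
    // the opening tag of the form named formName and that form's closing tag.
    // Naive text search only: it assumes double-quoted attributes and that
    // "name" is the first attribute of the <form> tag, as in test1.html.
    static boolean inputInsideForm(String html, String formName, String inputName) {
        int formStart = html.indexOf("<form name=\"" + formName + "\"");
        if (formStart < 0) {
            return false; // form not found
        }
        int formEnd = html.indexOf("</form>", formStart);
        if (formEnd < 0) {
            return false; // form never closed
        }
        int inputPos = html.indexOf("name=\"" + inputName + "\"");
        return inputPos > formStart && inputPos < formEnd;
    }

    public static void main(String[] args) {
        // Shortened versions of the two documents shown in this post.
        String original =
            "<table><tr><td><form name=\"frmTest\" method=\"post\" action=\"test2.php\">"
            + "<table></table></td></tr>"
            + "<input type=\"hidden\" name=\"hidXTNUM\" value=\"50\"></form></table>";
        String dumped =
            "<table><tr><td><form name=\"frmTest\" method=\"post\" action=\"test2.php\">"
            + "<table></table></form></td></tr>"
            + "<input type=\"hidden\" name=\"hidXTNUM\" value=\"50\"></table>";

        System.out.println("original: " + inputInsideForm(original, "frmTest", "hidXTNUM"));
        System.out.println("dumped: " + inputInsideForm(dumped, "frmTest", "hidXTNUM"));
    }
}
```

With the two shortened documents, this prints "original: true" and
"dumped: false", which matches what I see in the dump: the input is inside
the form in the file I wrote, and outside it after parsing.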

I want "HtmlPage" to tolerate malformed HTML and process the <form> tag
the way browsers do, since browsers handle this sort of malformed HTML
correctly. Can anyone help me solve this issue? Does "HtmlPage" support
malformed HTML?

Thanks in advance
Manik
