Re: clueless student trying to parse XML

From: Emanuel Bulic (emanuelbulic_at_yahoo.com)
Date: 10/17/03


Date: 17 Oct 2003 12:38:28 -0700

To begin: HTML is not always parseable by an XML parser. The rules
for HTML are less strict than those for XML, so valid HTML is not
necessarily well-formed XML. Many web pages also contain invalid HTML
(missing closing tags, etc.) that will not pass an XML well-formedness check.
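
To see the difference for yourself, here is a rough sketch (my own, not from any book) that feeds a string to the standard JAXP parser and reports whether it is well-formed. The class name WellFormedCheck and the method isWellFormed are made up for illustration; the sample strings are just examples.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: the names here are illustrative only.
class WellFormedCheck {

    // Returns true if the markup parses as well-formed XML, false otherwise.
    static boolean isWellFormed(String markup) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            // The parser rejected the input (e.g. an unclosed tag).
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // <br> without a closing tag is common in HTML but is not well-formed XML.
        System.out.println(isWellFormed("<p>one<br>two</p>"));
        // The XHTML-style <br/> is fine.
        System.out.println(isWellFormed("<p>one<br/>two</p>"));
    }
}
```

If the pages you need to read fail this kind of check, an XML parser alone will not be enough and you will have to clean up the HTML first.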

Next, become familiar with XML processing in Java. Buy an XML-for-Java
book and use online resources. Apache is your best friend.

XML technologies (Java):

JAXP - Java API for XML Processing. The standard API for XML processing.
Xerces - open source XML parser by Apache... xml.apache.org
Xalan - open source XML transformer (XSLT processor) by Apache. Same place.
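
As a starting point for "how do I actually get coding", here is a minimal sketch of pulling data out of a document through the JAXP DOM API. It assumes the input is already well-formed XML. The class TextExtractor, the method textOf, and the sample catalog document are all invented for illustration and are not part of any library.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: the names here are illustrative only.
class TextExtractor {

    // Collects the trimmed text content of every element with the given tag name.
    static List<String> textOf(String xml, String tagName) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName(tagName);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent().trim());
            }
            return result;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse input", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample document.
        String xml = "<catalog>"
                + "<book><title>First</title></book>"
                + "<book><title>Second</title></book>"
                + "</catalog>";
        System.out.println(textOf(xml, "title"));
    }
}
```

JAXP picks whichever parser implementation is available at run time (Xerces, if it is on the classpath), so the code above does not name a parser directly. Filtering the extracted values against user criteria would be plain Java on the returned list.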

That should keep you busy for a week...

"sal achhala" <none@none.com> wrote in message news:<bmosr2$jve$1@south.jnrs.ja.net>...
> I need pointing in the right direction regarding writing a parser to parse
> HTML/XML in order to extract the data from it.
>
> I'm writing a prototype for the final application but being fairly new to
> Java I'm totally at a loss where to start.
>
> I'm getting quite frustrated as I haven't got a clue where to start (I've read
> some of the javadoc & have a pile of Java reference books)
>
> I've read up on the DOM/SAX standards and Java's support for XML parsing but
> still have no idea how to actually get coding.
>
> The final application is aimed at extracting data which meets user criteria
> from a given website.
>
> thanks
>
> sal
>
> ps this is a final year University Computer Science Project
>
> more details at http://www.mellowmoose.org/project.html


