quick and easy way to parse XML

From: luca passani (passani_at_eunet.no)
Date: 09/27/04


Date: Mon, 27 Sep 2004 17:21:04 +0100


I am building a servlet that needs to parse XHTML files (with DTD and everything)
in order to extract the links to the images they contain (<img src="getmeifyoucan.gif" />).

I thought I had already solved the problem elegantly when I realized that
the XML parsing package automatically opens a connection to
a website on the internet to retrieve the DTD!
Since this happens on every request to the servlet, this behaviour
is unacceptable for my application.

Apparently, there is no simple way to disable this behavior, since
the XML spec demands that the DTD is retrieved.
I tried treating the XML as a string and removing the DTD reference,
but, unfortunately, the parser then fails as soon as it encounters an
entity reference (&nbsp; for example), because the entity is no longer declared.
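
To make the failure concrete, here is a minimal sketch of that second attempt, assuming the JAXP SAX parser that ships with the JDK. The class name ImgSrcExtractor, the method imageSources and the DOCTYPE-stripping regexp are made up for this post and are not from my real servlet; the regexp also assumes the DOCTYPE has no internal subset.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, for illustration only.
class ImgSrcExtractor {

    /** Strips the DOCTYPE, parses the rest, and collects every img src value. */
    static List<String> imageSources(String xhtml)
            throws SAXException, IOException, ParserConfigurationException {
        // Assumption: the DOCTYPE has no internal subset, so it contains no '>'.
        String withoutDoctype = xhtml.replaceFirst("<!DOCTYPE[^>]*>", "");

        final List<String> sources = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                // The default factory is not namespace-aware, so match on qName.
                if ("img".equals(qName)) {
                    String src = atts.getValue("src");
                    if (src != null) {
                        sources.add(src);
                    }
                }
            }
        };

        SAXParserFactory.newInstance()
                .newSAXParser()
                .parse(new InputSource(new StringReader(withoutDoctype)), handler);
        return sources;
    }
}
```

Called on a page without entity references, this returns the src values and never touches the network. As soon as the body contains &nbsp;, the parse call throws a SAXParseException, because without the DTD the entity is not declared anywhere.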

I am puzzled. If I treat the XML as a string, String methods and
regexps are hardly powerful enough to achieve the task.
On the other hand, XML parsing turns out to introduce
even more problems than the one I am trying to solve (as an aside, wasn't
XML supposed to be simple?)
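
For comparison, the string-only approach I have in mind looks roughly like the sketch below. The class name RegexImgSrc and the pattern are my own guesses at what would be needed, nothing more.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
class RegexImgSrc {

    // Matches <img ... src="..."> or <img ... src='...'>; group 2 is the URL.
    private static final Pattern IMG_SRC = Pattern.compile(
            "<img\\b[^>]*?\\bsrc\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns every src value found by plain pattern matching. */
    static List<String> imageSources(String markup) {
        List<String> sources = new ArrayList<>();
        Matcher m = IMG_SRC.matcher(markup);
        while (m.find()) {
            sources.add(m.group(2));
        }
        return sources;
    }
}
```

It copes with the simple case, but it is blind to document structure: it will happily report an <img> that sits inside a comment or a CDATA section, which is why I do not trust it.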

Is there an easy way to achieve my goal? XML parsing or regexps?

thanks

Luca


