Re: How to parse XML which contains & in the text?



On Wed, 14 Feb 2007 11:31:18 -0000, sohan.soni@xxxxxxxxx <sohan.soni@xxxxxxxxx> wrote:

> When parsing this XML file using Java code (i.e. converting this XML doc
> to a String), I get the following exception:
>
> org.xml.sax.SAXParseException: Next character must be ";" terminating
> reference to entity "Value".
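A minimal reproduction, for anyone who wants to see this happen. The original XML wasn't posted, so the input below (`<item>Name&Value</item>`) is a hypothetical stand-in, and the class and method names are made up for illustration. It uses the JDK's built-in JAXP DOM parser; the exact wording of the exception message varies between JDK versions.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class ReproduceBareAmpersand {

    // Parses the string and returns the SAXParseException message,
    // or null if the document is well-formed.
    static String parseError(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXParseException e) {
            return e.getMessage();
        } catch (Exception e) {
            throw new IllegalStateException("Unexpected parser failure", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical input: the bare '&' is read as the start of an entity reference.
        System.out.println(parseError("<item>Name&Value</item>"));
        // The escaped form parses cleanly, so this prints "null".
        System.out.println(parseError("<item>Name&amp;Value</item>"));
    }
}
```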


Section 2.4 of the XML 1.0 specification:

"The ampersand character (&) and the left angle bracket (<) MUST NOT appear in their literal form, except when used as markup delimiters, or within a comment, a processing instruction, or a CDATA section. If they are needed elsewhere, they MUST be escaped using either numeric character references or the strings "&amp;" and "&lt;" respectively. The right angle bracket (>) may be represented using the string "&gt;", and MUST, for compatibility, be escaped using either "&gt;" or a character reference when it appears in the string "]]>" in content, when that string is not marking the end of a CDATA section."
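In code, the rule above amounts to escaping character data before it is written into the document. The sketch below is a hand-rolled helper (`XmlEscape.escapeText` is a made-up name, not a library API). It escapes every > rather than only the ones in "]]>", which the spec permits and which is simpler, and it only covers element content, not attribute values.

```java
public class XmlEscape {

    // Escapes a string for use as XML element content (Section 2.4):
    // '&' and '<' must be escaped; '>' is escaped as well so "]]>" can never appear.
    static String escapeText(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints: <item>Name &amp; Value &lt; 10</item>
        System.out.println("<item>" + escapeText("Name & Value < 10") + "</item>");
    }
}
```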

> I think some changes/modifications are needed in the DTD so that a
> string in the XML containing & is treated as a literal, instead of as
> the start of an entity reference.

You can't fix this in the DTD: the document is not well-formed XML, and the parser is correct to reject it.

> In addition, the XML content is not under our control.

Unfortunately, the only rational fix *is* to change the XML. Either use &amp; or wrap the element data in a CDATA section. If the XML is controlled by a third party, it would be reasonable to ask them to change it, since a document that is not well-formed is not really XML at all.
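For illustration, both fixes parse cleanly with the JDK's DOM parser and give back the same text. The element name and content are hypothetical and `textOf` is a made-up helper. Note that a CDATA section cannot itself contain the sequence "]]>".

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class FixedAmpersand {

    // Parses the XML string and returns the root element's text content.
    static String textOf(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Fix 1: escape the ampersand as &amp;
        System.out.println(textOf("<item>Name &amp; Value</item>"));
        // Fix 2: wrap the element data in a CDATA section
        System.out.println(textOf("<item><![CDATA[Name & Value]]></item>"));
        // Both lines print: Name & Value
    }
}
```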

Dan.

--
Daniel Dyer
http://www.uncommons.org