Re: non SGML character escape
- From: Srini <Srinihello@xxxxxxxxx>
- Date: Mon, 16 Mar 2009 09:20:45 -0700 (PDT)
On Mar 13, 1:44 pm, Tom Anderson <t...@xxxxxxxxxxxxxxx> wrote:
On Fri, 13 Mar 2009, Srini wrote:
I have some typographical/special characters in our database which come
from user input pasted from documents. I have to take that data and
create an XML file. When I run the XML through the W3C XML validator, it
fails, saying:
"Line 37231, Column 135: non SGML character number 25
You have used an illegal character in your text. HTML uses the
standard UNICODE Consortium character repertoire, and it leaves
undefined (among others) 65 character codes (0 to 31 inclusive and 127
to 159 inclusive) that ...... and so on"
I am using the StringEscapeUtils.escapeXml() method from the Apache
Commons Lang package, and I also tried StringEscapeUtils.escapeHtml().
Both of them fail to escape these characters.
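Escaping can't rescue these characters. XML 1.0's Char production admits only tab, line feed, carriage return and code points from U+0020 upward (minus the surrogate block and U+FFFE/U+FFFF), and its Legal Character constraint applies to numeric references too, so &#25; is just as malformed as the raw character. As a first diagnostic, here is a minimal Java sketch (the class and method names are hypothetical) that lists each offending code point with its position, which might help trace an error such as the one at line 37231 back to a database row. It checks only the XML 1.0 rule: the validator message also lists 127 to 159, which XML 1.0 itself permits, so widen `isXml10Char` if those must go as well.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists code points that may not appear in an XML 1.0
// document at all, whether written raw or as a numeric character reference.
public final class IllegalXmlCharFinder {

    private IllegalXmlCharFinder() {}

    // XML 1.0 "Char" production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    public static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns "index:U+XXXX" for each offending code point (index is a UTF-16 offset).
    // An unpaired surrogate comes back from codePointAt as itself and is reported.
    public static List<String> findIllegal(String s) {
        List<String> problems = new ArrayList<>();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isXml10Char(cp)) {
                problems.add(i + ":U+" + String.format("%04X", cp));
            }
            i += Character.charCount(cp);
        }
        return problems;
    }

    public static void main(String[] args) {
        // Character 25 (EM) pasted into otherwise normal text.
        System.out.println(findIllegal("Smart\u0019quote")); // prints [5:U+0019]
    }
}
```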
I think what the error report is saying is that there's no way to escape
the characters: they're control codes that XML 1.0 simply doesn't allow,
not even as numeric character references like &#25;. It's as if you had
Klingon characters in your database.
Your solution is to remove the characters, and either replace them with
something equivalent that XML does allow, or forget about them. ASCII
character 25 is EM, 'end of medium'; what does that mean in your system?
How on earth are your users entering it?
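Removing or replacing the characters amounts to a filter over the text before it goes into the XML. A minimal sketch, assuming the data is already a Java String when the file is built (the names are illustrative, not from any library): each forbidden code point is replaced with a caller-chosen string, so "" forgets it and something like U+FFFD, the Unicode replacement character, leaves a visible marker. The replacement must itself be legal XML.

```java
// Hypothetical helper: copies text, swapping each character that XML 1.0
// forbids for a caller-chosen replacement ("" simply drops it).
public final class XmlTextSanitizer {

    private XmlTextSanitizer() {}

    // The XML 1.0 "Char" production that conforming parsers enforce.
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    public static String sanitize(String s, String replacement) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (isXml10Char(cp)) {
                out.appendCodePoint(cp);   // keep legal characters unchanged
            } else {
                out.append(replacement);   // substitute (or drop) the illegal one
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = "Page\u0019 1";
        System.out.println(sanitize(raw, ""));       // prints Page 1
        System.out.println(sanitize(raw, "\uFFFD")); // U+FFFD marks where EM was
    }
}
```

Which replacement is right depends on what EM was meant to convey in the source data, which is exactly the question above.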
Can someone point me in the right direction? Is there a utility that
I can use for this?
Even though the XML validator fails, can an XSLT processor bypass
these characters when it parses this XML?
It's likely but not certain that XML parsers will choke on the characters
(a standards-compliant parser will), and since parsing is a prerequisite
for XSLT processing, you can't rely on that being possible.
tom
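The point about parsers is easy to check. This sketch (class name and sample documents are hypothetical) feeds the JDK's built-in JAXP parser two small documents containing character 25, once as the reference &#25; and once as the raw character. Both fail as fatal well-formedness errors; an XSLT processor that reads its source through a parser like this would normally stop at the same point.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical demo: does the JDK's built-in parser accept a document?
public final class ParserRejectsControlChars {

    public static boolean parses(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Rethrow problems instead of letting the default handler print them to stderr.
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) throws SAXException { throw e; }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            System.out.println("Rejected: " + e.getMessage());
            return false;
        } catch (ParserConfigurationException | IOException e) {
            // Not expected when parsing an in-memory string.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parses("<note>fine</note>"));           // true
        System.out.println(parses("<note>bad &#25; ref</note>"));  // false
        System.out.println(parses("<note>bad \u0019 raw</note>")); // false
    }
}
```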
I believe these characters come from users copying and pasting from
applications such as Word. So is the solution just to ignore that
particular element when the parser chokes, and to ask users not to cut
and paste from a word processor? But how can you control users?
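Dropping the whole element discards good text along with one stray control character, and there's little need to police what users paste if the export step cleans each value. A minimal sketch, assuming the XML is produced with the JDK's StAX API (javax.xml.stream); the element and class names are made up. Forbidden characters are filtered out of each value just before it is written, and the writer takes care of the ordinary escaping of & and <.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical export step: clean each value, then let StAX do the escaping.
public final class SanitizingXmlExport {

    // The XML 1.0 "Char" production.
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // codePoints() joins surrogate pairs and passes unpaired surrogates through
    // as-is, so the filter drops those too.
    static String clean(String text) {
        return text.codePoints()
                .filter(SanitizingXmlExport::isXml10Char)
                .collect(StringBuilder::new, StringBuilder::appendCodePoint, StringBuilder::append)
                .toString();
    }

    // Writes <notes><note>...</note>...</notes>; element names are made up.
    public static String export(String... values) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            xml.writeStartDocument();
            xml.writeStartElement("notes");
            for (String value : values) {
                xml.writeStartElement("note");
                xml.writeCharacters(clean(value)); // keep the element, lose only bad chars
                xml.writeEndElement();
            }
            xml.writeEndElement();
            xml.writeEndDocument();
            xml.flush();
            xml.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(export("Page\u0019 1", "Fish & chips"));
    }
}
```

Cleaning at this boundary means the same rule applies no matter how the text reached the database.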