Re: ignoring chinese characters parsing xml file



On 10/23/07, Fabian López <fabian@xxxxxxxxxxxx> wrote:
Hi,
I am parsing an XML file that includes Chinese characters, like
^??啖啖才是?.???锍才是? or ヘアアイロン... The problem is that I get an error like:
UnicodeEncodeError: 'charmap' codec can't encode characters in position ...
I would like to ignore these characters and parse everything else.
Could anyone help me? I suppose I could catch the exception and skip
them, or use some function that detects these Chinese characters so
that I can ignore them.

Sorry, but that's Japanese, not Chinese. I also don't know which
encoding the source XML uses. Most XML files are encoded in UTF-8,
which handles CJK characters without trouble. So how exactly did you
get this error?
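
For reference, here is a minimal sketch of one way to drop the offending
characters. It rests on assumptions: that the XML itself parses fine as
UTF-8 and that the error only appears when the text is later encoded to
a legacy code page (cp1252 is a guess, as is the sample document). The
helper name encodable_text is made up for illustration, and the code is
written for Python 3.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the real file from the question is not shown.
XML_BYTES = (
    '<?xml version="1.0" encoding="utf-8"?>'
    "<items><item>Hair iron ヘアアイロン</item><item>plain text</item></items>"
).encode("utf-8")


def encodable_text(text, encoding="cp1252"):
    """Return text with every character that `encoding` cannot represent removed."""
    # errors="ignore" silently skips unencodable characters instead of raising.
    return text.encode(encoding, errors="ignore").decode(encoding)


# Parsing works because the parser decodes the bytes as UTF-8.
root = ET.fromstring(XML_BYTES)

# The failure from the question shows up only when encoding to a charmap codec.
try:
    root[0].text.encode("cp1252")
    strict_encoding_failed = False
except UnicodeEncodeError:
    strict_encoding_failed = True

cleaned = [encodable_text(item.text) for item in root.iter("item")]
print(cleaned)
```

Passing errors="replace" instead of errors="ignore" would keep a "?"
placeholder for each dropped character, which makes the loss visible.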

--
I like python!
UliPad <<The Python Editor>>: http://code.google.com/p/ulipad/
meide <<wxPython UI module>>: http://code.google.com/p/meide/
My Blog: http://www.donews.net/limodou

