locale specific input



I'm writing a little app that uses RSS feeds from a website, and I've found
that some feeds contain items in different languages. So far, the only other
language I've run into is Japanese (EUC-JP). I've managed to get the feed to
save and display properly by passing the charset to the InputStreamReader
that reads it. Anyway, the problem I'm having is that I'd like to be able to
read the XML and then figure out which language each item is in. It seems
that only a few items are in Japanese, and I wouldn't be surprised if they
are sometimes mixed with items in other languages.
I've found the "Java port of Mozilla charset detector" and it works OK, but
it still can't handle what I'm trying to do, since it only sees raw bytes,
not individual feed items.
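
One thing worth checking, shown here as a sketch that uses only the JDK's built-in DOM parser rather than the RSS library from the post: if the parser is handed the raw byte stream instead of a Reader, it takes the charset from the feed's own `<?xml ... encoding="..."?>` declaration, so the document-level encoding doesn't have to be guessed or hard-coded. Forcing a charset through an InputStreamReader (the approach in the post) overrides that declaration. The class `FeedEncoding`, its helper methods and the sample feed are made up for illustration, and whether a particular RSS library accepts an InputStream instead of a Reader is an assumption to check in its documentation.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class FeedEncoding {

    // Raw bytes: the parser picks the charset from the <?xml ... encoding="..."?> declaration
    // (UTF-8 is the default when there is no declaration or byte-order mark).
    static Document parseBytes(InputStream in) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return db.parse(in);
    }

    // Characters: the caller fixes the charset up front, and the declared encoding is ignored.
    static Document parseWithCharset(InputStream in, Charset cs) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return db.parse(new InputSource(new InputStreamReader(in, cs)));
    }

    // Text of the first <title> element in document order, or null if there is none.
    static String firstTitle(Document doc) {
        NodeList titles = doc.getElementsByTagName("title");
        return titles.getLength() > 0 ? titles.item(0).getTextContent() : null;
    }

    public static void main(String[] args) throws Exception {
        Charset eucJp = Charset.forName("EUC-JP");
        // Hypothetical stand-in for the bytes fetched from the feed URL: a tiny EUC-JP encoded RSS document.
        String xml = "<?xml version=\"1.0\" encoding=\"EUC-JP\"?>"
                + "<rss version=\"2.0\"><channel><item><title>\u65e5\u672c\u8a9e</title></item></channel></rss>";
        byte[] bytes = xml.getBytes(eucJp);

        String viaDeclaration = firstTitle(parseBytes(new ByteArrayInputStream(bytes)));
        String viaReader = firstTitle(parseWithCharset(new ByteArrayInputStream(bytes), eucJp));
        // Prints true: both routes decode the title to the same Java String.
        System.out.println(viaDeclaration.equals(viaReader));
    }
}
```

The two routes only agree when the declaration is correct; for a feed that declares the wrong encoding, forcing the charset through the InputStreamReader remains the workaround.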

I'm using an RSS library to parse the XML and give me simple objects to work
with. I'd hate to parse the XML manually, picking out the bytes of each item
and feeding those byte arrays to the charset detection library; that seems
like a dumb way to go (plus it means a lot more work).
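
Once the RSS library has handed back item objects, their text is already decoded into Java Strings. An XML document has exactly one character encoding, so there is nothing left for a byte-level charset detector to do per item; what differs between items is the language, not the charset. What can still be inspected cheaply is which Unicode scripts each string uses. The sketch below swaps in that heuristic (script detection via `Character.UnicodeScript`, Java 7+; `List.of` needs Java 9+) in place of real language identification: it can tell kana-bearing Japanese apart from Hangul or Latin text, but it cannot separate Chinese from kanji-only Japanese, or English from French. The class `ScriptGuess`, the `Guess` buckets and the sample titles are hypothetical.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

public class ScriptGuess {

    // Coarse buckets based on the writing system, NOT a real language identifier.
    enum Guess { JAPANESE, HAN_ONLY, KOREAN, LATIN, OTHER }

    // Collect the Unicode scripts used in the text, skipping COMMON (digits, punctuation,
    // spaces, the katakana-hiragana prolonged sound mark) and INHERITED (combining marks).
    static Set<Character.UnicodeScript> scripts(String text) {
        Set<Character.UnicodeScript> found = EnumSet.noneOf(Character.UnicodeScript.class);
        text.codePoints().forEach(cp -> {
            Character.UnicodeScript s = Character.UnicodeScript.of(cp);
            if (s != Character.UnicodeScript.COMMON && s != Character.UnicodeScript.INHERITED) {
                found.add(s);
            }
        });
        return found;
    }

    static Guess guess(String text) {
        Set<Character.UnicodeScript> s = scripts(text);
        // Kana is used essentially only for Japanese, so any hiragana or katakana settles it.
        if (s.contains(Character.UnicodeScript.HIRAGANA) || s.contains(Character.UnicodeScript.KATAKANA)) {
            return Guess.JAPANESE;
        }
        // Han ideographs alone could be Chinese or kanji-only Japanese; the script cannot tell.
        if (s.contains(Character.UnicodeScript.HAN)) return Guess.HAN_ONLY;
        if (s.contains(Character.UnicodeScript.HANGUL)) return Guess.KOREAN;
        if (s.contains(Character.UnicodeScript.LATIN)) return Guess.LATIN;
        return Guess.OTHER;
    }

    public static void main(String[] args) {
        // Hypothetical item titles, standing in for the strings the RSS library returns.
        List<String> titles = List.of(
                "\u65e5\u672c\u8a9e\u306e\u30cb\u30e5\u30fc\u30b9", // kanji + kana
                "Weekly release notes",
                "\u4e2d\u6587",                                     // Han only
                "\ud55c\uad6d\uc5b4");                              // Hangul
        for (String t : titles) {
            System.out.println(guess(t));
        }
    }
}
```

Run as-is, it prints JAPANESE, LATIN, HAN_ONLY and KOREAN, one per line. Distinguishing languages that share a script would need an actual language-identification library.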

Has anybody dealt with this in the past? I can't seem to find any solutions
on the net.

Thanks.

- Miguel

--
Posted via a free Usenet account from http://www.teranews.com
