Re: locale specific input
- From: "Oliver Wong" <owong@xxxxxxxxxxxxxx>
- Date: Thu, 29 Jun 2006 19:19:10 GMT
"Miguel De Anda" <miguel@xxxxxxxxxxxxx> wrote in message news:44a0e7b9$0$9844$88260bb3@xxxxxxxxxxxxxxxxxxxx
I'm writing a little app that uses rss feeds from a website and I've found
that some feeds contain items in different languages. So far, I've only had
feeds that are in Japanese (eucjp). I've managed to get the feed to save
and display properly by adding the charset to my inputstream (or
inputstreamreader, I forgot). Anyway, the problem I'm having is that I'd
like to be able to read the xml, and then figure out the language that each
item is in. It seems that only a few are in Japanese, and I wouldn't be
surprised if they are sometimes mixed with items from different languages.
I've found the "Java port of Mozilla charset detector" and it works ok, but
it still won't be able to handle what I'm trying to do.
I'm using an rss library to parse the xml and give me simple objects to work
with. I'd hate to parse the xml manually by looking at bytes and then
feeding byte arrays to the charset detection library, this seems like a
dumb way to go (plus it means a lot more work).
Has anybody dealt with this in the past? I can't seem to find any solutions
on the net.
Thanks.
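RSS uses XML. My understanding is that XML is by default encoded in UTF-8 (or UTF-16, if the document starts with a byte-order mark), and an XML parser should assume that until it reads an encoding declaration stating otherwise, e.g. <?xml version="1.0" encoding="EUC-JP"?>. That declaration covers the whole document, so every item in one feed shares a single encoding, and the parser hands the RSS library already-decoded Unicode strings. In other words, this should all work automatically.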
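Since you mention adding a charset to your InputStream or InputStreamReader, it's worth checking which of the two the parser actually receives. Below is a minimal sketch, not taken from your app or from any RSS library; the sample feed and the class and method names are hypothetical, and it assumes the JDK's built-in DOM parser and that your JDK ships the EUC-JP charset (the standard JDK does). It parses the same EUC-JP feed twice: once from raw bytes, where the parser reads the encoding declaration itself, and once through an InputStreamReader with a wrongly guessed charset, where SAX's InputSource rules say a character stream takes precedence and the declaration is ignored.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class FeedEncodingDemo {
    // "日本語" as Unicode escapes, so this file's own encoding doesn't matter.
    static final String JAPANESE = "\u65e5\u672c\u8a9e";

    // Returns the text of the first <title> element in the document.
    static String firstTitle(InputSource source) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(source);
        return doc.getElementsByTagName("title").item(0).getTextContent();
    }

    // Raw bytes: the parser honours the feed's own encoding declaration.
    static String parseBytes(byte[] feed) throws Exception {
        return firstTitle(new InputSource(new ByteArrayInputStream(feed)));
    }

    // Characters: the Reader's charset wins, right or wrong, and the
    // encoding declaration in the feed is ignored.
    static String parseWithReader(byte[] feed, Charset guess) throws Exception {
        Reader reader = new InputStreamReader(new ByteArrayInputStream(feed), guess);
        return firstTitle(new InputSource(reader));
    }

    // Hypothetical one-item feed, encoded as EUC-JP as it declares.
    static byte[] sampleFeed() {
        String xml = "<?xml version=\"1.0\" encoding=\"EUC-JP\"?>"
                + "<rss version=\"2.0\"><channel><item><title>" + JAPANESE
                + "</title></item></channel></rss>";
        return xml.getBytes(Charset.forName("EUC-JP"));
    }

    public static void main(String[] args) throws Exception {
        byte[] feed = sampleFeed();
        System.out.println(parseBytes(feed).equals(JAPANESE));
        System.out.println(parseWithReader(feed, StandardCharsets.ISO_8859_1).equals(JAPANESE));
    }
}
```

The byte-stream path should recover the original title, while the ISO-8859-1 Reader path yields six Latin-1 characters instead of three kanji. If your RSS library accepts an InputStream, handing it the undecoded stream is the simplest way to let the declaration do its job.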
Possible reasons it might not be working:
(1) The RSS library is buggy.
(2) The author of the RSS feed set their encoding declaration incorrectly.
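To check possibility (2), you can look at what the feed actually declares. A minimal sketch, assuming the JDK's built-in StAX parser (javax.xml.stream); the class and method names are hypothetical. Per its Javadoc, XMLStreamReader.getCharacterEncodingScheme() returns the encoding named in the XML declaration, or null if none was declared.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;

public class DeclaredEncoding {
    // Returns the encoding named in the XML declaration, or null if the
    // document has none (in which case UTF-8/UTF-16 is assumed).
    static String declaredEncoding(InputStream in) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
        try {
            // A fresh reader sits on START_DOCUMENT, where the XML
            // declaration's values are available.
            return reader.getCharacterEncodingScheme();
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"EUC-JP\"?><rss version=\"2.0\"/>";
        byte[] bytes = xml.getBytes(StandardCharsets.US_ASCII);
        System.out.println(declaredEncoding(new ByteArrayInputStream(bytes)));
    }
}
```

If the declared name doesn't match how the bytes were really written (say, a feed declaring UTF-8 while serving EUC-JP), the problem lies with the feed, not with your parser.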
- Oliver