Re: utf-8 encoding
From: Dale King (KingD_at_tmicha.net)
Date: 11/05/03
- In reply to: Sascha Obermüller: "utf-8 encoding"
- Next in thread: Sascha Obermüller: "Re: utf-8 encoding"
- Reply: Sascha Obermüller: "Re: utf-8 encoding"
Date: Wed, 5 Nov 2003 13:16:13 -0500
"Sascha Obermüller" <sing-sing@gmx.net> wrote in message
news:bo9gut$7tc$07$1@news.t-online.com...
> I'm building a crawler that chops the text of websites from different
> countries into segments and terms.
> My problem: I have to convert all the encodings the sites use (e.g.
> ISO-8859-1, etc.) to UTF-8. How can I do that?
Not difficult at all. When reading the text you will be converting the
bytes read into Unicode. That is done using an InputStreamReader (the JDK 1.4
NIO APIs have other ways as well) with its charset set to the page's
encoding. The supported encodings are listed at
http://java.sun.com/j2se/1.4.2/docs/guide/intl/encoding.doc.html
Then, when outputting, you use an OutputStreamWriter with the encoding set
to UTF-8.
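The read-then-write approach above can be sketched roughly like this. It is a minimal illustration, assuming the crawler already knows each page's charset name (for example "ISO-8859-1"); the class and method names TranscodeToUtf8 and transcode are hypothetical, not part of any library.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;

public class TranscodeToUtf8 {

    // Hypothetical helper: decodes bytes from 'in' using sourceCharset and
    // re-encodes them as UTF-8 on 'out'. The caller owns both streams, so
    // they are flushed here but not closed.
    public static void transcode(InputStream in, String sourceCharset, OutputStream out)
            throws IOException {
        Reader reader = new InputStreamReader(in, sourceCharset);  // bytes -> Unicode chars
        Writer writer = new OutputStreamWriter(out, "UTF-8");      // Unicode chars -> UTF-8 bytes
        char[] buf = new char[4096];
        int n;
        while ((n = reader.read(buf)) != -1) {
            writer.write(buf, 0, n);
        }
        writer.flush();  // push any buffered output through to 'out'
    }

    public static void main(String[] args) throws IOException {
        // "M", U+00FC, "n" encoded as ISO-8859-1: one byte per character
        byte[] latin1 = { (byte) 'M', (byte) 0xFC, (byte) 'n' };
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        transcode(new ByteArrayInputStream(latin1), "ISO-8859-1", out);
        System.out.println(out.size());  // 4: U+00FC becomes the two bytes 0xC3 0xBC
    }
}
```

The charset is passed by name to match the encoding list linked above; the same constructors also accept a java.nio.charset.Charset object.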
For more information you might want to see the internationalization trail of
the tutorial:
http://java.sun.com/docs/books/tutorial/i18n/index.html
And this section in particular:
http://java.sun.com/docs/books/tutorial/i18n/text/convertintro.html
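For text that is already held in memory as a byte array, the same conversion can be done without streams, via the String(byte[], String) constructor and String.getBytes(String). A minimal sketch, assuming the source charset name is known; the names BytesToUtf8 and toUtf8 are hypothetical.

```java
import java.io.UnsupportedEncodingException;

public class BytesToUtf8 {

    // Hypothetical helper: decodes 'data' using sourceCharset into a String,
    // then encodes that String as UTF-8.
    public static byte[] toUtf8(byte[] data, String sourceCharset)
            throws UnsupportedEncodingException {
        String text = new String(data, sourceCharset);  // bytes -> Unicode
        return text.getBytes("UTF-8");                  // Unicode -> UTF-8 bytes
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // "G", "r", U+00FC, U+00DF, "e" in ISO-8859-1: 5 bytes in total
        byte[] latin1 = { 0x47, 0x72, (byte) 0xFC, (byte) 0xDF, 0x65 };
        byte[] utf8 = toUtf8(latin1, "ISO-8859-1");
        System.out.println(utf8.length);  // 7: U+00FC and U+00DF each take two bytes in UTF-8
    }
}
```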
-- Dale King