Re: UTF-8 encoding

From: Chris Smith (cdsmith_at_twu.net)
Date: 04/21/04


Date: Tue, 20 Apr 2004 23:16:20 -0600

Nishi,

See below for an answer.

As a side note, would it be possible to convince you to choose a
reasonable line wrap size in the future? It's a real pain to reformat
all the quoting when responding to your message.

Nishi Bhonsle wrote:
> In a servlet application, I need to pass a UTF-8 encoded writer
> to a Java API, which will process the contents of a file through
> the writer. Thereafter, the file can be saved on the user's machine
> through an OS-specific "File Save As" dialog box. The UTF-8 encoding
> takes into account non-ASCII data (users in non-English locales).
>
> I noticed that this works fine in IE as well as Netscape for the English
> locale, but for non-English locales IE does not pop up the File Save As
> box and instead displays the contents of the file in the same browser
> window, whereas Netscape saves the file as a 0-byte file.
>
> Can someone please let me know what could be wrong with the code
> below?

Sure. Here are a few comments.

First, you are writing a file in UTF-8 encoding, then turning around and
reading that file with the system's default encoding. The result depends
on the platform, and will only be sensible if the system default encoding
happens to be UTF-8. If you know that you've written the file in UTF-8,
you should create an InputStreamReader with the UTF-8 encoding specified
explicitly, and use that to read the file back again.
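
To illustrate, here is a minimal sketch of the round trip, assuming a
scratch file named temp.txt as in the quoted code. Both the writer and
the reader name UTF-8 explicitly, so the result no longer depends on the
platform default encoding. The names Utf8RoundTrip, writeUtf8 and
readUtf8 are illustrative, and the sketch assumes Java 7 or later (for
try-with-resources and StandardCharsets); passing the string "UTF-8"
instead, as the quoted code does, also works on older JDKs.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class Utf8RoundTrip {

    // Writes text to the named file, encoding it explicitly as UTF-8.
    static void writeUtf8(String fileName, String text) throws IOException {
        try (Writer out = new OutputStreamWriter(
                new FileOutputStream(fileName, false), StandardCharsets.UTF_8)) {
            out.write(text);
        } // closing the writer flushes it and releases the file handle
    }

    // Reads the whole file back, decoding it explicitly as UTF-8
    // rather than with the platform default encoding.
    static String readUtf8(String fileName) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader in = new BufferedReader(new InputStreamReader(
                new FileInputStream(fileName), StandardCharsets.UTF_8))) {
            char[] buf = new char[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String text = "na\u00efve caf\u00e9 \u65e5\u672c"; // non-ASCII sample text
        writeUtf8("temp.txt", text);
        System.out.println(readUtf8("temp.txt").equals(text)); // prints "true"
    }
}
```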

Second, the code you posted doesn't compile. There's some confusion
there: newLine is declared as a String and sometimes treated as one, but
elsewhere it is used as if it were a StringBuffer. I'm going to assume
that this crept in while retyping the code into your newsreader.
Copy/paste works great for that, and saves you from these kinds of
unintentional mistakes.
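
If the intent was to build up a line piece by piece, the variable has to
be declared as a StringBuffer and used as one throughout, with a single
conversion to String at the end; a String cannot be appended to in
place. A minimal sketch; BuildLine, buildLine and the sample pieces are
hypothetical, not reconstructed from the posted code.

```java
public class BuildLine {

    // Builds one line from its pieces. The buffer is declared as a
    // StringBuffer and used as one throughout; only the final result
    // is converted to a String.
    static String buildLine(String... pieces) {
        StringBuffer line = new StringBuffer();
        for (String piece : pieces) {
            line.append(piece);
        }
        line.append('\n');      // terminate the line
        return line.toString(); // hand back an immutable String
    }

    public static void main(String[] args) {
        // prints "alpha,beta" followed by a newline
        System.out.print(buildLine("alpha", ",", "beta"));
    }
}
```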

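Third, you completely destroy any shot of writing working code when you
use the StringBufferInputStream class. There's a very good reason that
it's deprecated: it does not properly convert characters into bytes (it
keeps only the low eight bits of each character), so any non-ASCII text
gets mangled.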
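
A sketch of the usual replacements for StringBufferInputStream, assuming
the goal is simply to stream the contents of a String: a StringReader
when characters are wanted, or a ByteArrayInputStream over bytes encoded
explicitly as UTF-8 when a byte stream is required. StringStreams and
its methods are illustrative names; StandardCharsets assumes Java 7 or
later.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;

public class StringStreams {

    // Character stream over a String: no encoding step at all.
    static Reader asReader(String text) {
        return new StringReader(text);
    }

    // Byte stream over a String: encode explicitly as UTF-8 first,
    // so every character survives as one or more whole bytes.
    static InputStream asUtf8Stream(String text) {
        return new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
    }

    // Counts the bytes a stream yields, to show the encoded length.
    static int countBytes(InputStream in) throws IOException {
        int count = 0;
        while (in.read() != -1) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        String text = "\u20ac5"; // euro sign followed by '5'
        // prints 4: three bytes for the euro sign, one for '5'
        System.out.println(countBytes(asUtf8Stream(text)));
        // prints 20ac: the reader hands back the character unchanged
        System.out.println(Integer.toHexString(asReader(text).read()));
    }
}
```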

Fourth, why on earth do you have one variable called 'newline' and
another called 'newLine'? You'd have to try really hard to come up with
something as bug-prone as that.

If you can clarify what you mean to accomplish by everything past where
you create the BufferedInputStream, perhaps I can help more. Looks to
me like the only purpose of any code past this:

> // This page has the "download" property set, so temp.txt will be saved on the
> // user's machine by providing the user with a File Save As dialog box.
> try {
> java.io.Writer utf8Writer = new OutputStreamWriter(new FileOutputStream("temp.txt",false), "UTF-8");
>
> <APIname>(utf8Writer); //API call
> utf8Writer.flush();
>
> java.io.InputStream is = new BufferedInputStream(new FileInputStream("temp.txt"));

... is to break things. Just do the above, and as far as I can tell you
are done.
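
A minimal sketch of stopping there, assuming a hypothetical
produceContents(Writer) method standing in for the unnamed API in the
quoted code. It uses try-with-resources (Java 7 or later) so the writer
is closed, which also flushes it, rather than relying on flush() alone;
nothing is read back afterwards.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class WriteOnly {

    // Hypothetical stand-in for the unnamed API in the post: it writes
    // its content through whatever Writer it is given.
    static void produceContents(Writer out) throws IOException {
        out.write("Gr\u00fc\u00dfe\n"); // sample non-ASCII content
    }

    // Wraps the target stream in a UTF-8 writer, lets the API fill it,
    // and closes the writer (flushing it and closing the target stream).
    static void writeUtf8(OutputStream target) throws IOException {
        try (Writer utf8Writer = new OutputStreamWriter(target, StandardCharsets.UTF_8)) {
            produceContents(utf8Writer);
        }
    }

    public static void main(String[] args) throws IOException {
        writeUtf8(new FileOutputStream("temp.txt", false));
    }
}
```

Because writeUtf8 accepts any OutputStream, the same method can be
exercised against an in-memory stream without touching the file system.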

-- 
www.designacourse.com
The Easiest Way to Train Anyone... Anywhere.
Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation

