Re: How to read unicode



JR <jriker1@xxxxxxxxx> wrote:
I have a Java program that parses text files of metadata and performs
various operations on them. I was recently asked to start working with
Japanese Unicode characters, but I'm not sure where to begin or whether
I need to do anything specific for this. The program runs in a DOS
window on a PC with a Western character set. Some questions that come
to mind that I was hoping to get input on:

1. Would it just work as is if I was running in a DOS window on a
Japanese version of Windows XP?

There are two ways to approach text I/O. One is to use the system
default character encoding. The other is to specify a character
encoding. If you've used the system default character encoding, then it
would probably work on a Japanese system with Japanese characters. If
you've specified an encoding, then it probably won't, unless the
encoding you specified happens to be able to represent Japanese
characters.
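
To see which encoding the "system default" route would actually use on
a given machine, something like the following should do. The class and
method names are made up for illustration. One caveat: as far as I know,
Java 18 and later default to UTF-8 regardless of the OS locale unless
told otherwise, so the result depends on the Java version as well as on
the Windows version.

```java
import java.nio.charset.Charset;

// Hypothetical helper class, not part of the original program.
class DefaultEncodingCheck {
    // Name of the charset Java uses when no encoding is specified.
    static String defaultEncodingName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println("Default encoding: " + defaultEncodingName());
    }
}
```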

You should always prefer specifying an encoding when possible. However,
the encoding you use has to match the encoding of the "metadata text
files" you are reading. If you can't control those, then your choice is
made for you. You need to find out from whoever writes these files
what encoding they use.
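
A minimal sketch of reading with an explicit encoding follows. The file
name "metadata.txt" and the Shift_JIS default are only assumptions for
the sake of the example; substitute whatever encoding the people who
produce the files tell you they use.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the original program.
class ReadWithEncoding {
    // Reads all lines of a file, decoding its bytes with the given charset
    // instead of relying on the platform default.
    static List<String> readLines(Path file, Charset charset) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = Files.newBufferedReader(file, charset)) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Assumed file name and encoding; both can be passed as arguments.
        Path file = Paths.get(args.length > 0 ? args[0] : "metadata.txt");
        Charset charset = Charset.forName(args.length > 1 ? args[1] : "Shift_JIS");
        for (String line : readLines(file, charset)) {
            System.out.println(line.length() + " characters read");
        }
    }
}
```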

2. If I am in the US, do I have to convert the characters from their
graphical representation to their Unicode numeric equivalent?

You can't draw characters to the console that aren't in the character
set for that console. So you'll either need to convert your code to a
GUI, or give up on drawing Japanese characters on a non-Japanese
terminal.
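
If all you need on the Western console is to check which characters
were read, rather than to draw them, one possible workaround is to
print the numeric values instead. This is just a sketch of that idea
with made-up names; it does not make the console display Japanese.

```java
// Hypothetical helper class, not part of the original program.
class UnicodeEscaper {
    // Keeps ASCII characters as they are and replaces every other UTF-16
    // code unit with a backslash, the letter u and four hex digits.
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04X", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The two escapes in the literal below are the characters for "Tokyo".
        System.out.println(escapeNonAscii("Tokyo: \u6771\u4EAC"));
    }
}
```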

3. If so, is there some way to parse the source data and convert it
from, say, MS Mincho to Unicode?

I don't know what MS Mincho is. Sorry.

4. Can I save this data, once converted, as a standard text file?

Sure you can save it. Again, you can save it either in a specific
encoding, or with the platform default. If the text contains characters
that can't be encoded with that encoding, they will appear as '?'
characters.
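
Here is a small sketch of both outcomes, using a temporary file and
made-up class and method names. It writes through OutputStreamWriter,
which substitutes '?' for characters the chosen encoding can't
represent. (Some other APIs, such as Files.newBufferedWriter, throw an
exception in that situation instead.)

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, not part of the original program.
class SaveWithEncoding {
    // Writes text to a file, encoding it with the given charset.
    static void save(Path file, String text, Charset charset) throws IOException {
        try (Writer out = new OutputStreamWriter(new FileOutputStream(file.toFile()), charset)) {
            out.write(text);
        }
    }

    // Reads the whole file back, decoding it with the given charset.
    static String load(Path file, Charset charset) throws IOException {
        return new String(Files.readAllBytes(file), charset);
    }

    public static void main(String[] args) throws IOException {
        String text = "\u65E5\u672C\u8A9E"; // three Japanese characters
        Path file = Files.createTempFile("metadata", ".txt");

        // UTF-8 can encode Japanese, so the text survives the round trip.
        save(file, text, StandardCharsets.UTF_8);
        System.out.println(load(file, StandardCharsets.UTF_8).equals(text)); // true

        // ISO-8859-1 cannot, so each character is written as '?'.
        save(file, text, StandardCharsets.ISO_8859_1);
        System.out.println(load(file, StandardCharsets.ISO_8859_1)); // ???
    }
}
```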

--
Chris Smith