Question on converting from UTF-8 to Shift_JIS (or ISO-2022-JP)



Hi,

Sorry, this is cross-posted to Perl.Unicode.

I have some questions about converting Japanese text from UTF-8 to Shift_JIS
(and ultimately to ISO-2022-JP) under Unix, as follows:

UTF-8 ==> Shift_JIS ==> ISO-2022-JP

The first conversion, from UTF-8 to Shift_JIS, is done with Text::Iconv.
The second, from Shift_JIS to ISO-2022-JP, is done with an arithmetic
algorithm (Shift_JIS double-byte codes map to JIS X 0208 codes by a fixed formula).
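The Shift_JIS ==> ISO-2022-JP step can be sketched as follows. This is a minimal Python illustration, not the poster's own algorithm (which the post does not show) and not Text::Iconv. It assumes the input holds only ASCII and JIS X 0208 double-byte characters, treats bytes below 0x80 as ASCII, and rejects half-width katakana (0xA1-0xDF), which ISO-2022-JP cannot carry. The function names are hypothetical.

```python
from __future__ import annotations

ESC_JIS_X_0208 = b"\x1b$B"  # ESC $ B: switch to JIS X 0208-1983
ESC_ASCII = b"\x1b(B"       # ESC ( B: switch back to ASCII


def sjis_pair_to_jis(s1: int, s2: int) -> tuple[int, int]:
    """Map one Shift_JIS double-byte code to its 7-bit JIS X 0208 byte pair."""
    if not (0x40 <= s2 <= 0xFC) or s2 == 0x7F:
        raise ValueError(f"invalid trail byte 0x{s2:02X}")
    # Lead bytes 0xE0-0xEF continue where 0x9F left off; close the 0xA0-0xDF gap.
    if s1 >= 0xE0:
        s1 -= 0x40
    # Each lead byte covers two consecutive JIS rows.
    j1 = 0x21 + 2 * (s1 - 0x81)
    if s2 >= 0x9F:
        # Trail bytes 0x9F-0xFC select the second row of the pair.
        j1 += 1
        j2 = s2 - 0x7E
    elif s2 >= 0x80:
        # 0x7F is skipped in Shift_JIS, so bytes above it shift by one more.
        j2 = s2 - 0x20
    else:
        j2 = s2 - 0x1F
    return j1, j2


def sjis_to_iso2022jp(data: bytes) -> bytes:
    """Convert Shift_JIS bytes to ISO-2022-JP, emitting escapes on mode changes."""
    out = bytearray()
    in_jis = False
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            # Single byte: treated as ASCII here (strictly it is JIS X 0201 Roman).
            if in_jis:
                out += ESC_ASCII
                in_jis = False
            out.append(b)
            i += 1
        elif 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xEF:
            if i + 1 >= len(data):
                raise ValueError("truncated double-byte character")
            j1, j2 = sjis_pair_to_jis(b, data[i + 1])
            if not in_jis:
                out += ESC_JIS_X_0208
                in_jis = True
            out += bytes((j1, j2))
            i += 2
        else:
            # Half-width katakana, user-defined lead bytes, or invalid bytes.
            raise ValueError(f"unsupported Shift_JIS byte 0x{b:02X}")
    if in_jis:
        out += ESC_ASCII  # ISO-2022-JP text should end in ASCII mode
    return bytes(out)
```

For example, `sjis_to_iso2022jp("日本語 abc".encode("shift_jis"))` should give the same bytes as Python's built-in `"日本語 abc".encode("iso2022_jp")`. Note that this step is lossless only for what it receives: a character already turned into ? during UTF-8 ==> Shift_JIS cannot be recovered here.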

However, I found that some Japanese characters are corrupted during the
first conversion (UTF-8 ==> Shift_JIS). For example, the Japanese
character (or symbol) ~ exists in Shift_JIS, but it comes out as ?
after the first conversion.
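One likely explanation, assuming the ~ is the full-width wave-dash/tilde character (the post does not say which code point it was): Unicode has two candidates for Shift_JIS 0x8160. JIS-based conversion tables use U+301C WAVE DASH, while Microsoft's CP932 table uses U+FF5E FULLWIDTH TILDE, so UTF-8 produced by Windows software often contains U+FF5E. A converter whose Shift_JIS table only knows U+301C has no match for U+FF5E and substitutes ?. The sketch below uses Python's codecs in place of iconv to show the effect; `try_encode` is a hypothetical helper.

```python
WAVE_DASH = "\u301c"        # mapped to 0x8160 by JIS-based Shift_JIS tables
FULLWIDTH_TILDE = "\uff5e"  # mapped to 0x8160 by Microsoft's CP932 table


def try_encode(text: str, encoding: str):
    """Return the encoded bytes, or None if the codec has no mapping."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


for name, ch in [("WAVE DASH", WAVE_DASH), ("FULLWIDTH TILDE", FULLWIDTH_TILDE)]:
    print(name, "shift_jis:", try_encode(ch, "shift_jis"), "cp932:", try_encode(ch, "cp932"))

# With a "replace" error handler the unmappable character becomes "?",
# which matches the symptom described in the post.
print(FULLWIDTH_TILDE.encode("shift_jis", errors="replace"))
```

Many iconv builds expose the same split as separate SHIFT_JIS and CP932 (or similarly named) encodings, so if the UTF-8 input comes from Windows software, requesting CP932, or replacing U+FF5E with U+301C before converting, may avoid the ?. Which names a given iconv accepts can be checked with `iconv -l`.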

Does anyone know a perfect (or at least better) way to convert from UTF-8
to Shift_JIS (or ISO-2022-JP)?
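Since every character ISO-2022-JP can carry also exists in Unicode, a converter that has an ISO-2022-JP table can go straight from UTF-8 without the Shift_JIS detour. A minimal sketch in Python; the function name and the one-entry translation table are hypothetical, and the table only covers the wave-dash case, so real data may need other CP932-versus-JIS pairs added.

```python
# Hypothetical normalization: replace the CP932-style code point with the one
# JIS-based tables use. Other CP932-vs-JIS pairs could be added here.
CP932_TO_JIS = str.maketrans({"\uff5e": "\u301c"})


def utf8_to_iso2022jp(data: bytes) -> bytes:
    """Decode UTF-8 and encode straight to ISO-2022-JP (no Shift_JIS step)."""
    text = data.decode("utf-8")
    text = text.translate(CP932_TO_JIS)
    # Raises UnicodeEncodeError for characters ISO-2022-JP cannot represent
    # (half-width katakana, for instance) instead of silently writing "?".
    return text.encode("iso2022_jp")


encoded = utf8_to_iso2022jp("あ\uff5e abc".encode("utf-8"))
print(encoded)
print(encoded.decode("iso2022_jp"))
```

In Perl, the analogous route would be the Encode module (`Encode::decode("UTF-8", $bytes)` followed by `Encode::encode("iso-2022-jp", $string)`), an alternative to Text::Iconv rather than something the post itself uses.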

I know that every character in ISO-2022-JP is also in Unicode, but I
couldn't find a perfect way to convert directly from UTF-8 to ISO-2022-JP.
That's why others suggested that I first convert from UTF-8 to Shift_JIS
and then convert Shift_JIS to ISO-2022-JP arithmetically. Your comments
are highly appreciated.

Thanks,
Wing
