Re: decode a string to "Perl's internal form" without Encode module?



Raymundo <gypark@xxxxxxxxx> wrote in comp.lang.perl.misc:
Hello,

At first, I'm sorry that I'm not good at English. :-)

I have a string that is encoded in UTF-8, EUC-KR (Korean), EUC-JP,
or some other encoding scheme.

I want to decode it so that it becomes a string in "Perl's internal
form" (that is, Unicode characters; is that what is called "utf8"?).

For example,
$octets = "가나"; # 2 Korean characters, a sequence of 6 bytes in UTF-8
$string = "\x{AC00}\x{B098}"; # 2 Unicode characters; I want to get this from $octets

This is easy to do with the Encode module:
use Encode qw(decode);

$string = decode("UTF-8", $octets);

My question is: if I don't have the Encode module on my server and I have

You have the Encode module; it is part of every complete Perl
installation (it has shipped with the core since Perl 5.8).
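
A minimal sketch of the Encode route, to show that the same call handles
more than UTF-8. The variable names are illustrative, and the octets are
written as hex escapes so the example does not depend on how the source
file is saved. "euc-kr" is one of the encoding names that the core Encode
distribution provides.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The same two characters, U+AC00 and U+B098, as raw octets in two encodings.
my $utf8_octets  = "\xEA\xB0\x80\xEB\x82\x98";  # UTF-8: 6 bytes
my $euckr_octets = "\xB0\xA1\xB3\xAA";          # EUC-KR: 4 bytes

# decode() takes an encoding name and octets, and returns a character string.
my $from_utf8  = decode("UTF-8",  $utf8_octets);
my $from_euckr = decode("euc-kr", $euckr_octets);

# Both results should be the same 2-character string.
printf "%d characters, %s\n", length($from_utf8),
    $from_utf8 eq $from_euckr ? "identical" : "different";
```

Running "perl -MEncode -e1" on the server is a quick way to check that the
module really is installed; it exits silently if Encode can be loaded.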

the Text::Iconv module instead, can I do the same thing using it? If I
can, how?

I don't know the Text::Iconv module, so I can't answer that. If Encode
works for you, use that.
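
For readers who do need the Text::Iconv route, here is a hedged sketch,
not a tested recipe. It assumes Text::Iconv is installed and that the
underlying iconv library knows the name "EUC-KR". iconv only converts
bytes to bytes, so the idea is to convert to UTF-8 octets first and then
let perl's built-in utf8::decode turn those octets into characters. The
helper name iconv_to_chars is made up for this example.

```perl
use strict;
use warnings;
use Text::Iconv;

# Hypothetical helper: octets in encoding $from -> Perl character string.
sub iconv_to_chars {
    my ($from, $octets) = @_;

    # Step 1: bytes to bytes, from the source encoding to UTF-8 octets.
    my $converter = Text::Iconv->new($from, "UTF-8");
    my $string    = $converter->convert($octets);
    defined $string or die "iconv could not convert the input\n";

    # Step 2: utf8::decode is always available (no "use" needed). It
    # decodes the UTF-8 octets in place and returns false on invalid input.
    utf8::decode($string) or die "result was not valid UTF-8\n";
    return $string;
}

my $string = iconv_to_chars("EUC-KR", "\xB0\xA1\xB3\xAA");
printf "%d characters\n", length($string);
```

If the input is already UTF-8, step 1 can be skipped and utf8::decode
alone does the job without any module.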

Anno