Re: decode a string to "Perl's internal form" without Encode module?




Quoth anno4000@xxxxxxxxxxxxxxxxxxxxxx:
Raymundo <gypark@xxxxxxxxx> wrote in comp.lang.perl.misc:
Hello,

First of all, I'm sorry that my English is not good. :-)

There is a string which is encoded with UTF-8, EUC-KR(Korean), EUC-JP,
or any other encoding scheme.

I want to decode it so that it becomes a string in "Perl's internal
form" (that is, Unicode form; is that what is called "utf8"?).

For example,
$octets = "가나"; # 2 Korean characters; a sequence of 6 bytes in UTF-8
$string = "\x{AC00}\x{B098}"; # 2 Unicode characters; I want to get this from $octets

It can be done easily using the Encode module:
use Encode qw(decode);

$string = decode("UTF-8", $octets);

My question is, if I don't have the Encode module on my server and I have

You have the Encode module, it is part of every complete Perl
installation.

...from 5.8 onwards. If you are stuck with 5.6, you should be aware that
that version of Perl did not handle Unicode at all internally, and you
really ought to upgrade.
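
For what it's worth, here is a minimal sketch of the decode step on a 5.8 or
later perl, assuming the stock Encode distribution (which also bundles the
Korean encodings) is installed. The octets are written out as hex escapes so
the example does not depend on the encoding of the source file; the variable
names and the choice of "euc-kr" as a second input encoding are only
illustrative and not taken from the original question.

```perl
use strict;
use warnings;
use Encode qw(decode);    # core module from Perl 5.8 onwards

# The same two Hangul syllables as raw octets in two encodings
# (byte values written out by hand for this illustration).
my $utf8_octets  = "\xEA\xB0\x80\xEB\x82\x98";    # UTF-8, 6 bytes
my $euckr_octets = "\xB0\xA1\xB3\xAA";            # EUC-KR, 4 bytes

# decode() turns octets in a named encoding into a character string.
my $from_utf8  = decode("UTF-8",  $utf8_octets);
my $from_euckr = decode("euc-kr", $euckr_octets);

# Report the length in characters and the code points, using ASCII only.
printf "%d characters: %s\n",
    length($from_utf8),
    join " ", map { sprintf "U+%04X", ord } split //, $from_utf8;
```

Both results should compare equal to "\x{AC00}\x{B098}", whichever encoding
the octets arrived in.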

Ben

--
Every twenty-four hours about 34k children die from the effects of poverty.
Meanwhile, the latest estimate is that 2800 people died on 9/11, so it's like
that image, that ghastly, grey-billowing, double-barrelled fall, repeated
twelve times every day. Full of children. [Iain Banks] ben@xxxxxxxxxxxx