Re: Creating UNICODE filenames with PERL 5.8

From: Malcolm Dew-Jones (yf110_at_vtn1.victoria.tc.ca)
Date: 11/19/03


Date: 18 Nov 2003 16:40:22 -0800

Ben Morrow (usenet@morrow.me.uk) wrote:
: allan@yates.ca (Allan Yates) wrote:
: > I have been having distinct trouble creating file names in PERL

: Perl or perl, not PERL.

: > containing UNICODE

: I'm not so sure about UNICODE...

: > For a simple test, I picked a UNICODE character that could be
: > displayed by Windows Explorer. I can select the character(U+0636) from
: > 'charmap' and cut/paste into a filename on Windows Explorer and the
: > character displays the same as it does in 'charmap'. This proves that
: > I have the font available.
: >
: > When I attempt to create the same filename with PERL, I end up with a
: > filename two characters long: Ø¶

: OK, your problem here is that Win2k is being stupid about Unicode: any
: sensible OS that understood UTF8 would be fine :).

Hmm, NT has been handling Unicode for at least ten years (NT 3.1, 1993) by
the simple expedient of using 16-bit characters. It is the hardware that is
stupid, by continuing to use ancient, tiny 8-bit elementary units.
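
To make the 8-bit versus 16-bit point concrete, here is a minimal sketch using
the Encode module that ships with Perl 5.8. It only prints the encodings of
U+0636; it does not create any files. The hex_bytes helper is my own name, and
the last step assumes (it is only a guess at what happened) that the UTF-8
bytes were read one byte per character, as an 8-bit code page would read them.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode decode);   # core module since Perl 5.8

# Hypothetical helper: render a byte string as space-separated hex pairs.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02X', ord } split //, $bytes;
}

my $char = "\x{0636}";                    # the character from the question

my $utf8  = encode('UTF-8',    $char);    # needs two 8-bit units
my $utf16 = encode('UTF-16LE', $char);    # fits in one 16-bit unit

print "UTF-8:    ", hex_bytes($utf8),  "\n";   # D8 B6
print "UTF-16LE: ", hex_bytes($utf16), "\n";   # 36 06

# Assumption: each UTF-8 byte is misread as one 8-bit character.
my $misread = decode('ISO-8859-1', $utf8);
print "Characters after misreading: ", length($misread), "\n";   # 2
```

If that assumption holds, it would account for a filename two characters long:
one character became two bytes, and each byte was then shown as a character.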

Imagine if all that hardware still used 16- or 24-bit memory addresses.
Imagine if all our communication and hardware backbones still actually
transmitted data in single-digit bit sizes.

Character size was always a compromise between functionality and memory.
It increased continually from the first character-manipulating electronic
equipment (way, way back in the 1930s or so, believe it or not) until the
1980s, when it suddenly solidified into a standard elementary unit that
was still a compromise in terms of size, and is now clearly too small.

Character size remains frozen due to one of Murphy's laws regarding the
success of hardware first built using compromises that were appropriate
twenty years ago.


