Re: Creating UNICODE filenames with PERL 5.8

From: Allan Yates (allan_at_yates.ca)
Date: 11/17/03


Date: 17 Nov 2003 13:01:08 -0800

The key was the missing "-C". I didn't clue in from the documentation
that this was important. Once I added that command line parameter, the
file was created with the correct name.
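
The original test program was not included in the quoted thread. A minimal sketch of that kind of script follows. The file name `create_dad.pl` is hypothetical, and the sketch assumes the ActivePerl 5.8.0-era meaning of `-C`, which switched Perl on Win32 to the wide-character file APIs. Perl 5.8.1 and later reassigned `-C` to control Unicode handling of the standard streams and I/O layers, so check perlrun for the release in use. On non-Windows systems Perl typically passes the name's UTF-8 octets straight to the file system, where they are usually read back as the intended name.

```perl
use strict;
use warnings;

# Hypothetical test script, e.g. saved as create_dad.pl and run as
#   perl -C create_dad.pl
# Assumption: on ActivePerl 5.8.0/Win32, -C selects the wide (UTF-16)
# file APIs. Without it, the name's UTF-8 octets apparently reached the
# ANSI API and showed up as two characters.
my $name = "\x{0636}.txt";    # U+0636 ARABIC LETTER DAD, plus an extension

open(my $fh, '>', $name) or die "Cannot create $name: $!";
print {$fh} "created\n";
close($fh) or die "Cannot close $name: $!";

# length counts characters, not octets: 1 (DAD) + 4 (".txt")
printf "Created a file whose name is %d characters long\n", length $name;
```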

My next step was to read the file name back from the directory. However,
I thought I had read somewhere in the documentation that 'readdir' is not
UNICODE aware, and I seem to have confirmed this by reading the directory
containing the file I just created. It comes back with a two-character
file name whose characters 'ord' to 0xd8 and 0xb6, as you indicated.

Do you know of a method of reading directories to get the UNICODE file
names?
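
One possible approach is to decode the names yourself with the core Encode module. This assumes that readdir hands back each name as the UTF-8 octets seen above (0xd8 0xb6). `read_unicode_dir` is a hypothetical helper, not an existing API. On Win32, readdir may instead return UTF-8 octets, ANSI code-page bytes, or '?' substitutes, depending on the Perl build and settings, so this only sketches the decoding step.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# Hypothetical helper: list a directory and decode each entry from
# UTF-8 octets into a Perl character string.
# Assumption (not guaranteed on every platform/version): readdir returns
# the names as UTF-8 bytes.
sub read_unicode_dir {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my @names;
    while (defined(my $entry = readdir $dh)) {
        next if $entry eq '.' or $entry eq '..';
        # FB_CROAK makes malformed UTF-8 an error instead of silently
        # substituting U+FFFD.
        push @names, decode('UTF-8', $entry, FB_CROAK);
    }
    closedir $dh;
    return @names;
}
```

If the assumption holds, calling `read_unicode_dir` on the directory that contains the file created earlier should yield a one-character string, U+0636, instead of the two characters 0xd8 and 0xb6.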

Thanks,

Allan.

"Alan J. Flavell" <flavell@ph.gla.ac.uk> wrote in message news:<Pine.LNX.4.53.0311171438540.22311@ppepc56.ph.gla.ac.uk>...
> On Mon, 17 Nov 2003, Allan Yates wrote:
>
> > I have been having distinct trouble creating file names in PERL
> > containing UNICODE characters. I am running ActiveState PERL 5.8 on
> > Windows 2000.
>
> N.B. I have limited expertise in this specific area, but some of the
> locals around here seem to look to me to answer Unicode questions of
> any kind, so I'll give it a try, as long as you take the answers with
> the necessary grains of salt...
>
> The first important question is: have you set the option for
> wide-character APIs in system calls?
>
> > For a simple test, I picked a UNICODE character that could be
> > displayed by Windows Explorer. I can select the character (U+0636) from
>
> that'd be Arabic letter DAD, right?
>
> Its utf-8 representation will be two octets: 0xd8, 0xb6.
>
> > 'charmap' and cut/paste into a filename on Windows Explorer and the
> > character displays the same as it does in 'charmap'. This proves that
> > I have the font available.
>
> (I think that's the least of your worries at the moment...)
>
> > When I attempt to create the same filename with PERL, I end up with a
> > filename two characters long: Ø¶
>
> Those look like 0xd8 and 0xb6 to me...
>
> At a quick glance, I suspect we are seeing the pair of octets that
> represent the character in utf-8 (Perl's internal representation)
> rather than the form Win32 would use, which AIUI is utf-16LE (which in
> this case would come out as the octets 0x36, 0x06, IINM). However, I'm
> not sure that (other than for diagnostic purposes) you should ever need
> to tangle with it in that form, since Perl ought to know what to do in
> a (wide) system call.
>
> The system call is evidently treating them as two one-byte characters,
> hence my question about wide system calls. Look for the reference to
> wide system calls in the perlrun page, and the other references to
> which it links.
>
> > If somebody could point me in the correct direction, I would very much
> > appreciate it. I have read the UNICODE documents included with PERL as
>
> OK, but there are also some Win32-specific documents/web-pages that
> come with the ActivePerl distribution. In some situations they might
> be just what you need.
>
> > well as searching the newsgroups and the web, and everything appears to
> > indicate this should work.
>
> If the above is not the answer, then maybe Win32API::File has
> something for you - but I've never been there myself, so don't pay too
> much attention to that.
>
> > Perl program:
>
> But did you start it with the -C option, or set the wide system calls
> thingy? I think that may prove to be the key.
>
> Good luck, and please report your findings.
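
To make the byte-level discussion concrete, here is a small script using the core Encode module, which was not part of the original thread. It prints both encodings of U+0636. The UTF-8 form is the two octets 0xd8 0xb6. UTF-16LE is a single 16-bit code unit, 0x0636, stored low byte first as the octets 0x36 0x06. The script also shows that reading the UTF-8 octets back as single-byte characters yields a two-character name. `hex_octets` is a hypothetical helper.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $dad = "\x{0636}";    # ARABIC LETTER DAD: one character

# Show an octet string as space-separated hex pairs.
sub hex_octets {
    my ($octets) = @_;
    return join ' ', map { sprintf '%02x', ord } split //, $octets;
}

my $utf8    = encode('UTF-8',    $dad);  # the two octets discussed above
my $utf16le = encode('UTF-16LE', $dad);  # one 16-bit unit, low byte first

# Reading the UTF-8 octets as single-byte Latin-1 characters gives two
# characters (the Western Windows code page cp1252 maps 0xd8 and 0xb6 to
# the same two characters).
my $misread = decode('ISO-8859-1', $utf8);

printf "characters in \$dad:    %d\n", length $dad;       # 1
printf "UTF-8 octets:          %s\n", hex_octets($utf8);    # d8 b6
printf "UTF-16LE octets:       %s\n", hex_octets($utf16le); # 36 06
printf "characters if misread: %d\n", length $misread;    # 2
```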


