Opening Unicode files?



Does Perl ship with a simple method of opening Unicode files? E.g., I
would like to have something like

open my $fh, '< :BOM0or(utf8)', $filename

where BOM0or does what Perl itself does for Perl source files: it looks at
the first 4 bytes; given that a Perl file starts in ASCII, one can detect
BOMs, can detect UTF-16LE, UTF-16BE, UTF-32LE, UTF-32BE, or see that it
is none of the above (then the argument in parens says what to do;
e.g., Perl itself does BOM0or(latin1)).
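
For illustration, the check described above could be sketched roughly as
follows. This is only a sketch under the stated assumption that the first
character of the file is ASCII; the name guess_encoding_bom0 is
hypothetical and is not something Perl provides. It returns an encoding
name only, so a caller would still have to skip a BOM if one is present.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Perl): guess the encoding from the
# first (up to) 4 bytes, ASSUMING the first character of the file is ASCII.
sub guess_encoding_bom0 {
    my ($head, $fallback) = @_;

    # BOMs first. UTF-32LE must be tested before UTF-16LE because
    # both start with the bytes FF FE.
    return 'UTF-32LE' if $head =~ /\A\xFF\xFE\x00\x00/;
    return 'UTF-32BE' if $head =~ /\A\x00\x00\xFE\xFF/;
    return 'UTF-8'    if $head =~ /\A\xEF\xBB\xBF/;
    return 'UTF-16LE' if $head =~ /\A\xFF\xFE/;
    return 'UTF-16BE' if $head =~ /\A\xFE\xFF/;

    # No BOM: an ASCII first character shows up as zero-byte padding.
    return 'UTF-32BE' if $head =~ /\A\x00\x00\x00[\x01-\x7F]/;
    return 'UTF-32LE' if $head =~ /\A[\x01-\x7F]\x00\x00\x00/;
    return 'UTF-16BE' if $head =~ /\A\x00[\x01-\x7F]/;
    return 'UTF-16LE' if $head =~ /\A[\x01-\x7F]\x00/;

    # None of the above: use whatever the caller asked for.
    return $fallback;
}
```

Here guess_encoding_bom0($first4, 'latin1') would correspond to the
BOM0or(latin1) case.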

Likewise, if one does not know that the file starts in ASCII, one can
still detect a BOM (its byte sequence does not appear often in the
encodings I know), so one could do :BOMor(utf8). I do not recollect
seeing such support for files open()ed by Perl programs; does it exist?
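
As a rough sketch of what a BOM-only variant might look like when written
by hand with core functions (open, read, seek, binmode): the name
open_bom_or is hypothetical, and the code assumes a seekable file, so it
would not work on pipes.

```perl
use strict;
use warnings;

# Hypothetical helper (not a core Perl feature): open $filename for
# reading, take the encoding from a leading BOM if there is one,
# otherwise use $fallback. Assumes the file is seekable.
sub open_bom_or {
    my ($filename, $fallback) = @_;

    open my $fh, '<:raw', $filename or die "Cannot open $filename: $!";
    my $head = '';
    defined read($fh, $head, 4) or die "Cannot read $filename: $!";

    # Longest BOMs first: the UTF-32LE BOM starts with the UTF-16LE BOM.
    my @boms = (
        [ "\xFF\xFE\x00\x00", 'UTF-32LE' ],
        [ "\x00\x00\xFE\xFF", 'UTF-32BE' ],
        [ "\xEF\xBB\xBF",     'UTF-8'    ],
        [ "\xFF\xFE",         'UTF-16LE' ],
        [ "\xFE\xFF",         'UTF-16BE' ],
    );

    my ($enc, $skip) = ($fallback, 0);
    for my $bom (@boms) {
        my ($bytes, $name) = @$bom;
        if (index($head, $bytes) == 0) {
            ($enc, $skip) = ($name, length $bytes);
            last;
        }
    }

    # Rewind to just past the BOM (or to the start if there was none),
    # then decode everything that follows.
    seek $fh, $skip, 0 or die "Cannot seek in $filename: $!";
    binmode $fh, ":encoding($enc)" or die "Cannot set encoding $enc: $!";
    return $fh;
}
```

With this, my $fh = open_bom_or($filename, 'UTF-8') would play the role of
the :BOMor(utf8) idea. One known ambiguity remains: a UTF-16LE file whose
first character after the BOM is U+0000 would be taken for UTF-32LE.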

Thanks,
Ilya