Re: RTF and UTF-8 files in Perl



V S Rawat wrote:
Matt Garrish wrote:


"V S Rawat" <VSRawat@xxxxxxxxxxxx> wrote in message
news:xn0e93zst5r39c000@xxxxxxxxxxx

1. How do I open a RTF file as input in Perl and read
formatted ASCII text from it?


You don't. There's no such thing as formatted ASCII text. You could look into a module such as RTF::Tokenizer if you want to parse apart the RTF file and extract the text from it. If you want to know what formatting has been applied you'll also need to check the formatting commands as you go.


2. How do I open a UTF-8 (Unicode) file as output in Perl and
write Unicode text to it?


Assuming your data is not already utf8:


I am reading some unicode char codes, and converting them to
display the unicode chars.

AIUI, it shouldn't be necessary to do this, since Perl can work with Unicode characters in UTF-8 form.


AIUI, you should just be able to use any UTF-8 capable editor (e.g. Vim) to write Perl code in which literal strings contain the Unicode characters you want, just as you would type ASCII or Latin-1 characters. These should display correctly on any UTF-8 compatible operating system/display with an appropriate font.

  #!/usr/bin/perl
  use strict;
  use warnings;
  print "Unicode glyph at code point 0x0964 is [।] \n";

The above works OK for me, though doubtless my newsreader will mangle the UTF-8 text I cut & pasted into it :-)
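For what it's worth, here is a minimal sketch of the same idea with the literal treated as a character rather than as raw bytes. It assumes the script itself is saved as UTF-8; the variable name $danda is just for illustration. With "use utf8" Perl decodes the literal into a single character, and the output layer on STDOUT encodes it again when printing.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;    # assumption: this source file is saved as UTF-8

# Encode character strings as UTF-8 when they are printed.
binmode(STDOUT, ':encoding(UTF-8)');

my $danda = "।";    # DEVANAGARI DANDA, U+0964 (illustrative variable name)

# With "use utf8" the literal is one character, not three bytes.
printf "length=%d code point=U+%04X glyph=[%s]\n",
    length($danda), ord($danda), $danda;
```

Without "use utf8" the same literal would be three separate bytes as far as length() and ord() are concerned, although printing it straight back out would still look right on a UTF-8 terminal.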

Say my code is ($cod =) 0964 (hexadecimal digits), which should lead
to a Unicode char. But when I do $char = chr($cod), I end up
getting "d" in $char.

How do I get 0964 to give the Unicode char it represents?

perldoc perlunicode
perldoc utf8
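A minimal sketch of one way to do the conversion, assuming $cod holds the code point as a string of hexadecimal digits as in the question: chr() expects a number, so the hex string has to go through hex() first.

```perl
use strict;
use warnings;

my $cod  = '0964';            # code point as hexadecimal digits, as in the question
my $char = chr(hex($cod));    # hex('0964') is 2404; chr(2404) is the character U+0964

# Encode on output so that printing a wide character does not warn.
binmode(STDOUT, ':encoding(UTF-8)');
print "U+$cod is [$char]\n";
```

The same character can also be written directly in a string literal as "\x{0964}".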


[untested]

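use Encode;

# Open without an encoding layer and encode explicitly; combining
# Encode::encode with a :utf8 layer would encode the data twice.
open(my $out, '>:raw', 'somefile.dat')
    or die "Could not open file: $!";
print $out Encode::encode('UTF-8', $mydata);
close($out) or die "Could not close file: $!";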

Otherwise you could skip the encoding step.
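As a rough sketch of the alternative where the I/O layer does the encoding, the file can be opened with :encoding(UTF-8) and the character string printed directly, with no Encode::encode call. The contents of $mydata here are hypothetical sample data (the single character U+0964); reading the file back in raw mode shows the bytes that were actually written.

```perl
use strict;
use warnings;

my $file   = 'somefile.dat';    # file name taken from the post
my $mydata = "\x{0964}";        # hypothetical sample data: one character, U+0964

# The layer encodes character strings to UTF-8 as they are written.
open(my $out, '>:encoding(UTF-8)', $file) or die "Could not open file: $!";
print $out $mydata;
close($out) or die "Could not close file: $!";

# Read the file back as raw bytes to inspect what was written.
open(my $in, '<:raw', $file) or die "Could not open file: $!";
my $bytes = do { local $/; <$in> };
close($in) or die "Could not close file: $!";

my $hex = uc unpack('H*', $bytes);
printf "%d bytes written: %s\n", length($bytes), $hex;
```

For this sample the file should contain the three bytes E0 A5 A4, which is the UTF-8 encoding of U+0964.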

Matt


Thanks Matt.