Re: RTF and UTF-8 files in Perl
- From: Ian Wilson <scobloke2@xxxxxxxxxxxxx>
- Date: Mon, 31 Oct 2005 10:24:20 +0000 (UTC)
V S Rawat wrote:
Matt Garrish wrote:
"V S Rawat" <VSRawat@xxxxxxxxxxxx> wrote in message news:xn0e93zst5r39c000@xxxxxxxxxxx
1. How do I open an RTF file as input in Perl and read formatted ASCII text from it?
You don't. There's no such thing as formatted ASCII text. You could look into a module such as RTF::Tokenizer if you want to parse apart the RTF file and extract the text from it. If you want to know what formatting has been applied you'll also need to check the formatting commands as you go.
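A minimal sketch of the RTF::Tokenizer approach mentioned above, assuming the module is installed from CPAN. The sample RTF string and the variable names are made up for illustration; the loop simply keeps the text tokens and ignores control words and group braces.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use RTF::Tokenizer;    # CPAN module, not part of core Perl

# Hypothetical sample document. RTF::Tokenizer->new( file => 'name.rtf' )
# reads from a file instead of a string.
my $rtf = '{\rtf1\ansi Hello, {\b bold} world}';

my $tokenizer = RTF::Tokenizer->new( string => $rtf );

my $text = '';
while (1) {
    # Each token is a (type, argument, parameter) triple.
    my ( $type, $argument, $parameter ) = $tokenizer->get_token();
    last if $type eq 'eof';

    # Keep plain text tokens; 'control' and 'group' tokens carry the
    # formatting commands and the braces.
    $text .= $argument if $type eq 'text';
}
print "$text\n";
```

A real RTF file also carries text tokens inside its font table, stylesheet and similar header groups, so a practical extractor would probably need to track group depth and skip those; this sketch does not.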
2. How do I open a UTF-8 (Unicode) file as output in Perl and write Unicode text to it?
Assuming your data is not already utf8:
I am reading some unicode char codes, and converting them to display the unicode chars.
AIUI, it shouldn't be necessary to do this, since Perl can work with Unicode characters in UTF-8 form.
AIUI, you should just be able to use any UTF-8 capable editor (e.g. Vim) to write Perl code where literal strings contain the Unicode characters you want, just like typing ASCII or Latin-1 characters. These should display correctly on any UTF-8 compatible operating system/display with the appropriate font.
#!/usr/bin/perl
use strict;
use warnings;
print "Unicode glyph at code point 0x0964 is [।] \n";
The above works OK for me, though doubtless my newsreader will mangle the UTF-8 text I cut & pasted into it :-)
Say my code ($cod) is 0964 (hex digits), which should lead to a Unicode char. But when I do $char = chr($cod), I end up getting "d" in $char.
How do I get 0964 to give the Unicode char it represents?
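One way to do this, sketched under the assumption that the code point arrives as a string of hex digits: chr() expects a number, so the digits need to go through hex() first. The helper name codepoint_to_char is made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn a string of hex digits into the character
# at that code point. hex() parses the digits, chr() builds the character.
sub codepoint_to_char {
    my ($hex_digits) = @_;
    return chr( hex($hex_digits) );
}

# Encode output as UTF-8 so that characters above 0xFF do not trigger
# a "Wide character" warning.
binmode( STDOUT, ':encoding(UTF-8)' );

my $cod  = '0964';                     # hex digits, as in the question
my $char = codepoint_to_char($cod);    # U+0964 DEVANAGARI DANDA
printf "U+%04X is [%s]\n", ord($char), $char;
```

If the code point is known when the program is written, a literal such as chr(0x0964) or "\x{0964}" gives the same character without the conversion step.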
perldoc perlunicode
perldoc utf8
[untested]
use Encode;
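# Open in raw mode: the data is encoded explicitly below, and a :utf8
# layer on top of that would encode it a second time.
open(my $out, '>:raw', 'somefile.dat') or die "Could not open file: $!";
print $out Encode::encode('utf8', $mydata);
close($out) or die "Could not close file: $!";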
Otherwise you could skip the encoding step.
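A sketch of the other route: let the PerlIO layer do the encoding instead of calling Encode::encode yourself. Use one or the other, not both, since encoding bytes that are already encoded gives double-encoded output. The sample string in $mydata is made up, and the read-back at the end is only there to check the round trip.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $file   = 'somefile.dat';         # file name from the example above
my $mydata = "danda: \x{0964}\n";    # hypothetical sample character string

# The :encoding(UTF-8) layer turns characters into UTF-8 bytes on output,
# so no explicit Encode::encode call is needed here.
open( my $out, '>:encoding(UTF-8)', $file )
    or die "Could not open file: $!";
print $out $mydata;
close($out) or die "Could not close file: $!";

# Read the file back through the same layer to check the round trip.
open( my $in, '<:encoding(UTF-8)', $file )
    or die "Could not open file: $!";
my $readback = do { local $/; <$in> };
close($in) or die "Could not close file: $!";

print "round trip OK\n" if $readback eq $mydata;
```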
Matt
Thanks Matt.