Re: Converting "’" to an Apostrophe?
- From: RedGrittyBrick <RedGrittyBrick@xxxxxxxxxxxxx>
- Date: Thu, 28 Feb 2008 14:37:39 +0000
RedGrittyBrick wrote:
> maria wrote:
>> I am using a CGI program to read XML files and extract their various
>> items. Somehow, my program converts the apostrophe "’" to ...
>> "\â\€\™".
> It's more likely your browser is doing this than your CGI program. Probably because your program lied about the character-set/encoding.
>> How do I program my CGI program to convert "’" to
>> an apostrophe, "'"?
> You shouldn't.
>> Is there a little CGI code that will convert
>> all these different strings (including dagger, ellipsis, euro symbol, double quote, etc.) to their ASCII equivalents?
> No, because dagger, ellipsis and euro don't have ASCII equivalents!
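Unicode code point U+2019 is represented in UTF-8 as the byte sequence e2 80 99 (shown here in hexadecimal). That same byte sequence, when interpreted as Windows-1252 (which browsers generally use when a page claims Latin-1), is the three characters ’ (a circumflex, euro sign, trademark sign). In strict Latin-1 the bytes 80 and 99 are control characters, not printable symbols.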
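If you want to see this for yourself, here is a minimal sketch using the core Encode module. The variable names are mine, and I am assuming the misreading is Windows-1252 (Encode calls it 'cp1252'); your browser's guess may differ.

```perl
#!perl
use strict;
use warnings;
use Encode qw(encode decode);

# U+2019 RIGHT SINGLE QUOTATION MARK as a one-character Perl string
my $quote = "\x{2019}";

# Encode the character to its UTF-8 byte sequence and show it in hex
my $bytes = encode('UTF-8', $quote);
my $hex   = join ' ', map { sprintf '%02x', ord($_) } split //, $bytes;

# Now misread those same bytes as Windows-1252 (an assumption, see above)
my $garbled    = decode('cp1252', $bytes);
my $codepoints = join ' ', map { sprintf 'U+%04X', ord($_) } split //, $garbled;

print "UTF-8 bytes:       $hex\n";          # e2 80 99
print "Misread as cp1252: $codepoints\n";   # U+00E2 U+20AC U+2122
```

One character goes in and three come out, which is exactly the damage you are seeing. The fix is to declare the right encoding, not to translate the three characters back.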
You can learn more about Perl's handling of Unicode by typing the command `perldoc perlunicode`.
Here's another example, but using XML instead of plain text. Perl has so many different modules for handling XML and CGI that it is unlikely my example matches your situation.
The following Perl file can be dropped into a CGI directory. The first line may need changing, depending on your OS, web server, etc.
--------------------------------- 8< ----------------------------------
#!perl
#
# Demonstrate handling of Unicode characters in a UTF8 encoded XML file
#
# RGB 2008-02-28
#
use strict;
use warnings;
use XML::Simple;
use CGI qw/:standard/;
use CGI::Carp qw(warningsToBrowser fatalsToBrowser);
#
# First we write some Unicode to an XML file using UTF-8 encoding.
#
my $tempfile = "unicode.xml";
open (my $out, '>:encoding(UTF-8)', $tempfile) or die "can't open $tempfile because $!\n";
print $out <<ENDXML;
<?xml version="1.0" encoding="UTF-8"?>
<foo>
<bar>
<baz>Here is a Unicode RIGHT SINGLE QUOTATION MARK \x{2019}</baz>
</bar>
</foo>
ENDXML
close $out;
#
# Now we read our XML file and use it in a web-page
#
my $foo = XMLin($tempfile);
my $line = $foo->{bar}->{baz};
binmode STDOUT, ':encoding(UTF-8)'; # encode output as UTF-8 (avoids "Wide character in print")
print header(-charset=>'utf-8'), # NOTE - Default is NOT utf-8
start_html(), h1("Unicode example"), pre($line), hr(), end_html();
--------------------------------- 8< ----------------------------------
In case it's not obvious, the only reason the example first writes a file is so that I don't have to include a separate example data file. The example is completely self-contained. I could have used a DATA section but felt that mishandling text file encodings might be part of your problem.