Re: Converting "&#x2019;" to an Apostrophe?



maria wrote:
On Wed, 27 Feb 2008 22:45:02 -0500, "John W. Kennedy"
<jwkenne@xxxxxxxxxxxxx> wrote:

maria wrote:
I am using a CGI program to read XML files and extract their various
items. Somehow, my program converts the apostrophe "&#x2019;" to ...
"\â\€\™". How do I program my CGI program to convert "&#x2019;" to
an apostrophe, "'"? Is there a little CGI code that will convert
all these different strings (including dagger, ellipsis, euro symbol, double quote, etc.) to their ASCII equivalents?
Thank you very much.

maria
You have a serious misunderstanding that is much too complicated to explain here. Learn about Unicode.

The whole modern world is filled with people who feel compelled to
respond to other people's messages when they have absolutely nothing
to say.


Oh dear. Replying to perceived rudeness with more rudeness just puts off potential helpers.

John's reply *did* contain something useful to you.

AIUI John is pointing out that "\â\€\™" is your Unicode apostrophe encoded in UTF-8 but displayed using an incorrect encoding such as Latin-1.

Unicode code point U+2019 is represented in UTF-8 as the byte sequence e2 80 99 (shown here in hexadecimal). That same byte sequence, when interpreted as Latin-1 (strictly, Windows-1252, since Latin-1 itself has no euro or trademark sign), is the three characters â€™ (a circumflex, euro, trademark).
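
To see this for yourself, here is a minimal sketch using the core Encode module. It is not taken from your program (which we haven't seen); the variable names are just for illustration, and it assumes the misreading is done as Windows-1252 ("cp1252" in Encode).

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# U+2019 RIGHT SINGLE QUOTATION MARK as a Perl character string.
my $apostrophe = "\x{2019}";

# Encode it to UTF-8 and show the resulting bytes in hexadecimal.
my $bytes = encode('UTF-8', $apostrophe);
my $hex   = join ' ', map { sprintf '%02x', ord } split //, $bytes;

# Now misread those same bytes as Windows-1252 (assumed wrong encoding).
my $mojibake    = decode('cp1252', $bytes);
my @code_points = map { sprintf 'U+%04X', ord } split //, $mojibake;

# Declare the output encoding so the wide characters print cleanly.
binmode STDOUT, ':encoding(UTF-8)';
print "UTF-8 bytes: $hex\n";
print "Misread as cp1252: $mojibake (@code_points)\n";
```

On a UTF-8 terminal this should print the bytes e2 80 99, followed by the three characters U+00E2, U+20AC and U+2122, which is the garbage you are seeing.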

You can learn more about Perl's handling of Unicode by typing the command `perldoc perlunicode`.
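
If you really do want ASCII output, one possible approach is sketched below, assuming your data arrives as UTF-8 bytes. The `%ascii_for` table and the `to_ascii` function are hypothetical names made up for this example, and the replacements chosen (such as "EUR" for the euro sign) are arbitrary; extend the table to suit your data.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical mapping table: Unicode character => ASCII stand-in.
my %ascii_for = (
    "\x{2018}" => "'",      # left single quotation mark
    "\x{2019}" => "'",      # right single quotation mark
    "\x{201C}" => '"',      # left double quotation mark
    "\x{201D}" => '"',      # right double quotation mark
    "\x{2026}" => '...',    # horizontal ellipsis
    "\x{20AC}" => 'EUR',    # euro sign
    "\x{2020}" => '+',      # dagger
);

# Decode UTF-8 bytes into characters, then replace every non-ASCII
# character with its table entry, or '?' if the table has none.
sub to_ascii {
    my ($octets) = @_;
    my $text = decode('UTF-8', $octets);
    $text =~ s{([^\x00-\x7F])}{exists $ascii_for{$1} ? $ascii_for{$1} : '?'}ge;
    return $text;
}

# The input below is "It", U+2019, "s 5 ", U+20AC, U+2026 as UTF-8 bytes.
print to_ascii("It\xE2\x80\x99s 5 \xE2\x82\xAC\xE2\x80\xA6"), "\n";
```

The alternative, and usually the better one, is to keep the characters as they are and make sure the output is written as UTF-8 and declared as such, rather than flattening everything to ASCII.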


It's a while since I've read the posting guidelines for this newsgroup but I'm pretty sure they suggest you include a short example program that demonstrates your problem. That would make it easier for people to help you identify what you are doing wrong.