Re: character encoding in CGI.pm

From: Shawn Corey (shawn.corey_at_sympatico.ca)
Date: 11/25/04


Date: Wed, 24 Nov 2004 21:10:33 -0500

David Lee Lambert wrote:
> I noticed that, without setting any options, CGI.pm output of a
> simple page starts as follows:
>
> Content-Type: text/html; charset=ISO-8859-1
>
> <?xml version="1.0" encoding="utf-8"?>
>
>
> Now, is the webpage in ISO-8859-1, utf8, or some other encoding? Or
> is XML defined such that this is a perfectly valid situation? If I
> send a string containing Unicode characters (with \x{}), IE 6 detects
> the page as Latin-1 and doesn't show those characters properly; if I
> manually tell it that the encoding is UTF-8, it displays the
> characters properly.
>
> This is using perl 5.6.1; I'm not sure what version of CGI.pm I have.
>
> --
> DLL

The web page is both. The ISO-8859-1 encoding is used for the HTTP
transfer. All bytes, including the web page, will be interpreted as
ISO-8859-1 encoded until handed off to the display engine in the
browser. Then it will be interpreted as UTF-8. This normally does not
mean much since the bytes after the blank line are usually not processed
by the HTTP decoding code; they are simply passed to the next part.
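The symptom described in the question (IE 6 treating the page as Latin-1) can be reproduced in isolation. The following is only a minimal sketch, assuming Perl 5.8 or later so that the core Encode module is available; the variable names are made up for illustration. It encodes one character as UTF-8 and then decodes the same bytes under both charsets:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# One character: U+00E9 (e with acute accent).
my $text = "\x{e9}";

# What goes over the wire when the page body is written as UTF-8.
my $bytes = encode('UTF-8', $text);

# A browser that trusts "charset=ISO-8859-1" sees one character per byte.
my $as_latin1 = decode('ISO-8859-1', $bytes);

# A browser told the page is UTF-8 recovers the original character.
my $as_utf8 = decode('UTF-8', $bytes);

printf "%d bytes, %d chars as Latin-1, %d char as UTF-8\n",
    length($bytes), length($as_latin1), length($as_utf8);
```

The same two bytes yield two wrong characters under Latin-1 and the one intended character under UTF-8, which matches what happens when the encoding is switched by hand in the browser.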

If you are using Perl 5.6, add 'use utf8;' to the code. For any Perl,
you can add:

print header( -charset => 'UTF-8' );

for the Content-Type header.

See perldoc CGI for details.
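Put together, a small script might look like the sketch below. It rests on several assumptions: a CGI.pm recent enough that start_html() accepts -encoding, Perl 5.8 or later for the output layer set with binmode (not available on the 5.6.1 mentioned in the question), and CGI.pm installed from CPAN on newer Perls where it is no longer bundled. The page title and text are placeholders.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use CGI qw(:standard);

# Encode everything written to STDOUT as UTF-8 (assumes Perl 5.8+).
binmode(STDOUT, ':encoding(UTF-8)');

# Make the HTTP header agree with the encoding of the body.
my $hdr = header( -type => 'text/html', -charset => 'UTF-8' );

# -encoding sets the encoding named in the XML declaration.
my $top = start_html( -title => 'Encoding test', -encoding => 'UTF-8' );

print $hdr;
print $top;
print p("Snowman: \x{2603}");
print end_html;
```

With both the header and the XML declaration saying UTF-8, the browser no longer has to choose between two conflicting labels.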

    --- Shawn


