Re: PHP & Unicode



In article <1130157575.614621.249910@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>,
slavi.marinov@xxxxxxxxx wrote:

[...]

> Let's say I have a database and a php script that communicates with the
> database. The database has some kind of character encoding - let's say
> UTF-8, UTF-16, or something different.

[...]

> My question is, how do you tell the PHP interpreter what encoding to
> use when displaying the text that the mysql queries return? In other
> words, will the $row[0] be displayed correctly regardless the database
> encoding, provided the database encoding and the HTML <meta> tags are
> the same

No. First off, you'll need to use a character encoding that makes sense
on the Web. utf-8 makes sense, utf-16 does not. So if your database uses
utf-16, you'll need to convert (transcode) to utf-8 before serving.[*]

In addition, you need to ensure that the user-agent (a browser, for
example) is correctly informed of which character encoding applies.
(Unless you want to rely on chance, this is *always* a requirement, with
any character encoding, not just when you work with utf-8.) You do so
by having your server accompany the document with an appropriate
Content-Type header. For example, if it's a utf-8 encoded HTML file,
your server must say Content-Type: text/html; charset=utf-8. (Whether
the file name extension is ".php" or ".html" is irrelevant.)

An alternative to configuring the server to do so is to have PHP
generate the Content-Type header:

header("Content-Type: text/html; charset=utf-8");

Contrary to popular belief, a META HTTP-EQUIV is *not* a reliable
alternative. If the server sends a charset in the real HTTP header, that
takes precedence over the META element.
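
To make the header() route concrete, here is a minimal sketch. The helper
name html_content_type() is made up for illustration, and the code assumes
PHP 7 or later. The one thing to watch is that header() must run before
any body output is sent.

```php
<?php
// Hypothetical helper (not part of PHP): builds the Content-Type header
// line for an HTML document in the given character encoding.
function html_content_type(string $charset = 'utf-8'): string
{
    return 'Content-Type: text/html; charset=' . $charset;
}

// header() only has an effect before any body output has been sent,
// so check headers_sent() first.
if (!headers_sent()) {
    header(html_content_type());
}

// The bytes sent from here on must really be utf-8 encoded.
echo "<p>Gr\u{00FC}\u{00DF}e</p>\n";
```

If the script has already produced output (even a stray blank line before
the opening <?php tag), the header can no longer be changed.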


Notes:
- I'm not entirely sure what you mean by "displaying". PHP doesn't
display. Nor does a Web server. It is the *browser*'s job to "display"
(whether visually or otherwise).
- All this assumes that what you're trying to do is meant for the Web. An
intranet situation may have different requirements and possibilities.

[*] How exactly to do the conversion in PHP I can't tell you. I'm sure
it can be found in the documentation. It might also be that your
database allows you to request output in a specific character
encoding. If so, that route might be more efficient.
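
For what it's worth, a minimal sketch of the conversion mentioned in [*],
assuming PHP 7 or later with the mbstring and iconv extensions available.
The sample string merely stands in for a value read from a utf-16
database; the variable names are illustrative.

```php
<?php
// Stand-in for a utf-16 value fetched from the database: the word
// "naive" with a diaeresis on the i, encoded as UTF-16LE.
$utf16 = mb_convert_encoding("na\u{00EF}ve", 'UTF-16LE', 'UTF-8');

// mbstring: mb_convert_encoding(string, to_encoding, from_encoding)
$viaMb = mb_convert_encoding($utf16, 'UTF-8', 'UTF-16LE');

// iconv: iconv(from_encoding, to_encoding, string)
$viaIconv = iconv('UTF-16LE', 'UTF-8', $utf16);

// Both routes should yield the same utf-8 bytes.
var_dump($viaMb === $viaIconv);
```

As for the database route: with MySQL, the encoding of the connection can
be requested up front, for example with mysqli_set_charset($link, 'utf8mb4')
when the mysqli extension is used ($link being the open connection). Query
results then arrive as utf-8 and need no conversion in PHP.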

--
Sander Tekelenburg, <http://www.euronet.nl/~tekelenb/>

Mac user: "Macs only have 40 viruses, tops!"
PC user: "SEE! Not even the virus writers support Macs!"