Re: [PHP] languages and PHP



Your biggest problem will be if you accept any kind of user input which could be in any kind of language.
Depending on your server configuration you'll probably have some serious cleaning and filtering to do.
I often have to employ this line for example:
$clean = array();
foreach ($_POST as $key => $value) $clean[$key] = mb_convert_encoding($value, "UTF-8");
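
One caveat with a blanket conversion like that: without a third argument, mb_convert_encoding() converts from mbstring's internal encoding, so input that already is UTF-8 can end up double-encoded if the internal encoding is something else. A rough sketch of a more defensive variant follows. The helper name to_utf8() and the ISO-8859-1 fallback are my own assumptions, not anything your setup dictates, so adjust the fallback to whatever your visitors' browsers are likely to send.

```php
<?php
// Hypothetical helper: return $value as UTF-8.
// Strings that are already valid UTF-8 are left alone; anything else is
// assumed (a guess!) to be in $fallback and is converted from that.
function to_utf8($value, $fallback = 'ISO-8859-1')
{
    if (is_array($value)) {
        // Form fields such as name="tags[]" arrive as nested arrays.
        $out = array();
        foreach ($value as $k => $v) {
            $out[$k] = to_utf8($v, $fallback);
        }
        return $out;
    }
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value;
    }
    return mb_convert_encoding($value, 'UTF-8', $fallback);
}

$clean = to_utf8($_POST);
```

This needs the mbstring extension. The check is only a heuristic: a byte sequence can be valid UTF-8 by accident, so it reduces double-encoding rather than ruling it out.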

Trying to make sure that you'll receive UTF-8 helps as well:
<form action="form.php" method="post" enctype="multipart/form-data" accept-charset="utf-8">

Magic quotes can be a major pain in the rear, especially if you're on a host where you can't turn them off.
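
If you're stuck with them, the usual workaround is to strip the added slashes at the top of every script. A minimal sketch, assuming you want the raw values back in the superglobals; strip_magic_slashes() is a name I made up. Note that get_magic_quotes_gpc() was deprecated in PHP 7.4 and removed in PHP 8.0, hence the function_exists() guard.

```php
<?php
// Hypothetical helper: undo the backslashes magic quotes added,
// recursing into array values (e.g. fields named "item[]").
function strip_magic_slashes($value)
{
    return is_array($value)
        ? array_map('strip_magic_slashes', $value)
        : stripslashes($value);
}

// Only touch the input if magic quotes exist in this PHP and are enabled.
if (function_exists('get_magic_quotes_gpc') && get_magic_quotes_gpc()) {
    $_GET    = strip_magic_slashes($_GET);
    $_POST   = strip_magic_slashes($_POST);
    $_COOKIE = strip_magic_slashes($_COOKIE);
}
```

Once the slashes are gone, escaping for the database is your own job again, so do it explicitly at the point where you build the query.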

Other than that, I concur: keep everything in UTF-8, explicitly check anything you're not sure about for UTF-8 compliance, explicitly set all your database connections to use UTF-8, and store everything in UTF-8.
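
For the database connection part, here is one way it could look with PDO. This is a sketch under assumptions: utf8_dsn() is a hypothetical helper, the host, database name and credentials are placeholders, the charset element of the DSN is only honoured from PHP 5.3.6 on, and utf8mb4 needs MySQL 5.5.3 or later. On older setups, use "utf8" as the charset or send a SET NAMES utf8 query right after connecting.

```php
<?php
// Hypothetical helper: build a PDO DSN that asks MySQL for a UTF-8 connection.
function utf8_dsn($host, $dbname, $charset = 'utf8mb4')
{
    return sprintf('mysql:host=%s;dbname=%s;charset=%s', $host, $dbname, $charset);
}

// Usage (placeholder credentials, needs a reachable MySQL server):
// $pdo = new PDO(utf8_dsn('localhost', 'mydb'), 'user', 'password');
```

The tables and columns still need a UTF-8 character set of their own; the connection setting only controls what goes over the wire.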

Cheers.

UTF-8, UTF-8

On 27. Sep 2007, at 19:15, Angelo Zanetti wrote:

Hi all.

This is more of a general question, but I'm sure some people will have experience with it, and it will also be useful to others who are looking for the same answers as I am.

What are the implications of having a site that has many different languages, including Latin and non-Latin characters?

Firstly, can a MySQL database handle these characters normally? Or would you have to have a table for each language and set a different CHARSET for each one, depending on the type of language?

Secondly, PHP and displaying the information: is there anything that needs to change with regard to PHP and its handling of these characters? Or is it able to handle all characters fine?

HTML: I assume the charset changes in the metatag in the <head> of the document?

Is there anything else that I might be missing or other problems that I should be aware of?

Thanks in advance.