Re: character sets

From: NC (nc_at_iname.com)
Date: 02/01/05


Date: 1 Feb 2005 13:40:47 -0800

WindAndWaves wrote:
>
> The encoding that I use on my webpage is:
> <META HTTP-EQUIV="content-type" CONTENT="text/html; charset=UTF-8">
>
> When people enter new data I use
> $newvalue = htmlentities($_POST["newvalue"], ENT_QUOTES)
>
> I then SQL this into my table and next I display the value
> e.g. <DIV CLASS="content">'.$newvalue.'</DIV>
>
> All of this works fine, BUT, funny characters that may have been
> entered through the form (e.g. Word-Style quotation marks,
> e-accent-grave, etc..) are taking on a whole new life.
> I put in an e with an accent and it changed into a chinese
> character.

You have two options to fix this:

1. Convert your strings from UTF-8 into, say, ISO-8859-1,
before storing them in the database:

$string = iconv('UTF-8', 'ISO-8859-1', $string);

Your PHP installation must be compiled with the
iconv extension for this to work.
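
A minimal sketch of option 1 with a failure check. The helper name utf8_to_latin1 is made up for illustration and is not from the original post. One caveat: Word-style quotation marks have no ISO-8859-1 equivalent, so iconv() typically returns false for them. Appending //TRANSLIT to the target charset asks iconv to approximate such characters, but the result depends on the platform's iconv library.

```php
<?php
// Hypothetical helper (not from the original post): convert UTF-8 input to
// ISO-8859-1 and report failure instead of storing a damaged string.
function utf8_to_latin1(string $utf8): ?string
{
    // iconv() returns false (and raises a notice) when the input cannot be
    // converted; @ silences the notice so the return value can be checked.
    $latin1 = @iconv('UTF-8', 'ISO-8859-1', $utf8);
    return $latin1 === false ? null : $latin1;
}

// "caf" followed by e-accent-grave, written as raw UTF-8 bytes (C3 A8).
$converted = utf8_to_latin1("caf\xC3\xA8");
echo bin2hex($converted), "\n"; // 636166e8 (a single byte E8 in ISO-8859-1)
```

A null result means the string held something ISO-8859-1 cannot represent, and the caller has to decide whether to reject it or substitute it.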

2. Set your MySQL server's character set to UTF-8.

First, check if you currently have UTF-8 support.
Run this query:

SHOW VARIABLES;

Find the `character_sets` variable in the output and
verify that `utf8` is listed among the supported
character sets. (On MySQL 4.1 and later this variable
should no longer be present; run `SHOW CHARACTER SET;`
there instead.) If UTF-8 is not supported, install or
configure it (see the MySQL documentation for details).

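Once you have UTF-8 support, you can set UTF-8 as
the default character set for your database:

ALTER DATABASE db_name
DEFAULT CHARACTER SET utf8;

Note that this default applies only to tables created
afterwards; existing tables keep their current
character set.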

Alternatively, you can set the character set on a
per-connection basis by sending this query first thing
after establishing a connection to the database:

SET NAMES 'utf8';
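
From PHP, the per-connection approach might look like the sketch below. It assumes the mysqli extension is available. The function name open_utf8_connection and the connection parameters are placeholders, not anything from the original post.

```php
<?php
// Hypothetical helper, not from the original post. Host, user, password and
// database name are placeholders to be replaced with real values.
function open_utf8_connection(string $host, string $user, string $pass, string $db): mysqli
{
    $link = new mysqli($host, $user, $pass, $db);
    if ($link->connect_error) {
        throw new RuntimeException('Connect failed: ' . $link->connect_error);
    }

    // First statement on the new connection: tell the server that the
    // client sends, and expects to receive, UTF-8.
    if (!$link->query("SET NAMES 'utf8'")) {
        throw new RuntimeException('SET NAMES failed: ' . $link->error);
    }

    return $link;
}
```

It would be called once per request, before any other query, e.g. open_utf8_connection('localhost', 'user', 'secret', 'db_name'). mysqli also offers set_charset('utf8'), which serves the same purpose.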

Cheers,
NC


