Re: utf8 output from database



In article <L8idne-6U7PDiOfeRVn-rQ@xxxxxxxxxxx>,
Jerry Stuckle <jstucklex@xxxxxxxxxxxxx> wrote:

[...]

> I have seen other problems like this; it's generally the charset isn't
> set up to display that particular character. And I'm not sure UTF-8
> will do it.

The charset parameter doesn't 'do' anything. It simply declares which
character encoding the content is in. If the content is utf-8, the
charset should say so. If the content is, for example, mac-roman,
claiming "charset=utf-8" doesn't *make* it utf-8. You'll need to convert
it to utf-8 first.
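
To illustrate the point above, here is a minimal Python sketch. The sample
bytes are a made-up example (the word "café" as mac-roman would store it),
not data from the original question. It shows that reading mac-roman bytes
as if they were utf-8 gives garbage, while converting them first gives
valid utf-8.

```python
# Hypothetical sample: "café" stored as mac-roman, where é is the byte 0x8E.
raw = b"caf\x8e"

# Merely *claiming* utf-8 does not change the bytes. A consumer that trusts
# the label decodes them as utf-8 and gets a replacement character.
mislabeled = raw.decode("utf-8", errors="replace")

# Converting for real: decode with the true source encoding, then encode.
text = raw.decode("mac_roman")
converted = text.encode("utf-8")

print(repr(mislabeled))  # the é is lost (U+FFFD replacement character)
print(repr(text))        # the é survives
print(converted)         # é is now the two bytes 0xC3 0xA9
```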

So you always need to do the following, in this order:
- know which character encoding your original content is in
- optionally convert it to another character encoding
- ensure that every tool you use handles that encoding properly
- provide the correct charset value, so that user-agents know how to
  handle the data
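
The steps above can be sketched roughly as follows in Python. The function
name to_utf8_response, the text/html content type and the sample bytes are
assumptions made for illustration only; the round-trip check is just a
stand-in for "make sure your tools cope", not a complete test of a real
toolchain.

```python
from typing import Tuple


def to_utf8_response(raw: bytes, source_encoding: str) -> Tuple[str, bytes]:
    """Hypothetical helper: convert raw bytes to utf-8 and label them."""
    # Step 1: you must already know the source encoding; it is passed in.
    text = raw.decode(source_encoding)

    # Step 2: convert to the encoding you intend to serve.
    body = text.encode("utf-8")

    # Step 3: crude sanity check that the data survives a round trip.
    if body.decode("utf-8") != text:
        raise ValueError("data did not survive the utf-8 round trip")

    # Step 4: declare the encoding that the bytes are actually in.
    header = "Content-Type: text/html; charset=utf-8"
    return header, body


# Made-up sample: "naïve" as mac-roman, where ï is the byte 0x95.
header, body = to_utf8_response(b"na\x95ve", "mac_roman")
print(header)
print(body)
```

If step 1 is wrong (the wrong source encoding is passed in), everything
after it silently produces the wrong characters, which is why the order
matters.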

--
Sander Tekelenburg, <http://www.euronet.nl/~tekelenb/>

Mac user: "Macs only have 40 viruses, tops!"
PC user: "SEE! Not even the virus writers support Macs!"