Re: Encoding/characterset/font family confusion
- From: Willem Bogaerts <w.bogaerts@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx>
- Date: Fri, 30 Mar 2007 12:22:42 +0200
A few days ago I discovered that the euro sign is not defined in all
font families.
This is a client issue; there is nothing you can do about it. All you can do
is use HTML entities (&euro;) so the browser knows what you mean (and
maybe switches fonts, depending on how intelligent the browser is).
They cannot produce the right sign, no matter whether I use &euro; or the
hexadecimal equivalent (&#x20AC;).
After a little research I found I could put font-tags around the euro-sign
with another font-family (Arial in this case) to get the Euro sign.
I am completely graphically impaired, and only understand programming code (and
HTML/JavaScript of course), so this is a weak point on my side, hence this
question.
I am targeting Europe only at the moment (no need for Chinese
character support).
That said, will the following setup make sense?
Postgresql db encoding scheme: LATIN1
In the headers of all my HTML: Content-Type: text/html; charset=iso-8859-1
Latin-1 (ISO-8859-1) does not include a euro sign at all. However, clients
often treat Latin-1 as an extended encoding such as Windows-1252 (cp1252),
which does contain the euro sign, so the sign can still appear.
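
To make the difference concrete, here is a minimal sketch in Python (not PHP, chosen only because it exposes the byte values directly). It tries to encode the euro sign in three single-byte encodings. The helper name `try_encode` is made up for this illustration.

```python
# The euro sign is Unicode code point U+20AC.
EURO = "\u20ac"

def try_encode(text, encoding):
    """Return the encoded bytes, or None if the encoding lacks a character."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None

latin1_bytes = try_encode(EURO, "iso-8859-1")   # None: no euro sign in Latin-1
cp1252_bytes = try_encode(EURO, "cp1252")       # b'\x80' in Windows-1252
latin9_bytes = try_encode(EURO, "iso-8859-15")  # b'\xa4' in ISO-8859-15

print(latin1_bytes, cp1252_bytes, latin9_bytes)
```

So a database that is strictly LATIN1 has no slot for the euro sign, while Windows-1252 and ISO-8859-15 each give it a single (different) byte.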
A few related questions:
1) Will people be able to copy/paste info from other sources (like
wordprocessing programs and other websites) into my forms?
In short: yes. It is up to the browser to convert the encoding to the
one used by the OS. I never had any trouble with it.
2) Can I use regular expressions as I am used to (ASCII) in my PHP code?
Will I match e acute, eurosign, etc?
Yes. All latin-1 characters are just one byte. No problem.
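
As a rough illustration of byte-level matching (again in Python rather than PHP, so treat it as an analogy, not as a description of PHP's regex functions): a literal byte pattern finds e acute without trouble, but a shorthand class such as `\w` may only cover ASCII unless the accented range is listed explicitly.

```python
import re

# "café" in Latin-1 is four bytes: one byte per character.
data = "café".encode("iso-8859-1")   # b'caf\xe9'

# A literal byte pattern matches the accented character directly.
literal = re.search(rb"caf\xe9", data)

# In Python's bytes mode, \w covers only ASCII letters, digits and "_".
ascii_words = re.findall(rb"\w+", data)

# Listing the Latin-1 upper range (0xC0-0xFF, which also includes the
# two symbols 0xD7 and 0xF7) makes the accented letter part of the word.
latin1_words = re.findall(rb"[\w\xc0-\xff]+", data)

print(literal is not None, ascii_words, latin1_words)
```

Whether `\w` and similar classes include accented letters in a given PHP setup is something to check on the actual system; spelling out the byte range avoids the question.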
3) Will the round trip described below have problems with the normally expected
European characters?
client copies some text from some source ->
paste in the form ->
receive by PHP ->
insert in Postgresql (or update) ->
retrieve from postgresql ->
display as HTML (with Content-Type: text/html; charset=iso-8859-1)
Is that OK?
Any pitfalls?
Should I maybe use UTF-8?
I switched to UTF-8 a few months ago, and I still have trouble
with it. For some vague reason, you can set all the encoding startup
variables to utf-8, and connections are STILL made with latin-1 unless
you specifically issue the SET NAMES command. Someone wrote an article
"utf-8, love at fifth site". That is so true! It can do a lot, but it is
real hell to configure every system to use it. Furthermore, the string
implementations are not encoding-aware. The problem is that a text
always has an encoding, while a string does not. Texts are treated
as strings, so with every string operation you have to make sure
that the correct encoding is used.
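
A small sketch of the text-versus-string problem, written in Python for illustration only (the variable names are invented for this example). The same text has different lengths depending on whether you count characters or UTF-8 bytes, and it turns into garbage when the bytes are read with the wrong encoding.

```python
text = "€ café"                 # 6 characters
utf8 = text.encode("utf-8")     # the same text as UTF-8 bytes

char_count = len(text)          # counts characters
byte_count = len(utf8)          # counts bytes: € takes 3, é takes 2

# Reading the UTF-8 bytes as if they were Windows-1252 gives mojibake.
misread = utf8.decode("cp1252")

# The damage is reversible here only because no byte was lost.
repaired = misread.encode("cp1252").decode("utf-8")

print(char_count, byte_count, misread, repaired)
```

A byte-oriented string function sees the nine bytes, not the six characters, which is why every operation has to agree on the encoding.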
Any pointers are hugely appreciated because, to me, this is all quite
confusing.
Here are some links:
http://www.phpwact.org/php/i18n/charsets
http://www.gravitonic.com/downloads/talks/intlphpcon2005/php_unicode.pdf
Best regards
--
Willem Bogaerts
Application smith
Kratz B.V.
http://www.kratz.nl/