Re: Encoding/characterset/font family confusion



A few days ago I discovered that the euro sign is not defined in all
font families.

This is a client issue; there is nothing you can do about it. All you
can do is use HTML entities (&euro;) so the browser knows what you mean
(and maybe switches fonts, depending on how intelligent the browser is).
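
As a small illustration of the entity approach (sketched in Python rather than PHP, purely for brevity; the helper name to_ascii_html is made up for this example), every non-ASCII character can be written as a numeric character reference, so the bytes of the page no longer depend on the declared charset. Whether the glyph is then drawn still depends on the fonts available to the client.

```python
import html

def to_ascii_html(text: str) -> bytes:
    """Escape HTML metacharacters, then replace every non-ASCII
    character with a numeric character reference such as &#8364;."""
    escaped = html.escape(text, quote=True)
    return escaped.encode("ascii", errors="xmlcharrefreplace")

if __name__ == "__main__":
    # The euro sign is U+20AC, which is 8364 in decimal.
    print(to_ascii_html("Price: 10 \u20ac <net>"))
```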

Those font families cannot produce the right sign, whether I use &euro;
or the hexadecimal equivalent.
After a little research I found I could put font tags around the euro sign
with another font family (Arial in this case) to get the euro sign.

I am completely graphically impaired, and only understand programming code
(and HTML/JavaScript of course), so this is a weak point on my side, hence
this question.

I target Europe only at the moment (no need for Chinese character
support).
That said, does the following setup make sense?

PostgreSQL database encoding: LATIN1
In the headers of all my HTML: Content-Type: text/html; charset=iso-8859-1

Latin-1 does not include a euro sign at all. However, Latin-1 is
sometimes replaced by an extended encoding (such as cp1252, the Windows
encoding), and there the euro sign does appear.
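
A quick way to check this, sketched in Python (the helper name try_encode is made up for this example): try to encode the euro sign in a few encodings and see which ones have a slot for it. ISO-8859-15 is included as one more single-byte Western European encoding that does contain the euro sign.

```python
EURO = "\u20ac"  # the euro sign

def try_encode(text, encoding):
    """Return the encoded bytes, or None if the encoding has no
    representation for some character in the text."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None

if __name__ == "__main__":
    for name in ("iso-8859-1", "iso-8859-15", "cp1252", "utf-8"):
        print(name, try_encode(EURO, name))
```

ISO-8859-1 yields None here, while cp1252 and ISO-8859-15 each use a single byte and UTF-8 uses three.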

A few related questions:
1) Will people be able to copy/paste text from other sources (like
word processors and other websites) into my forms?

In short: yes. It is up to the browser to convert the encoding to the
one used by the OS. I never had any trouble with it.
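
One pitfall worth checking is text pasted from a word processor, which often contains curly quotes and the euro sign. The sketch below (Python; the sample string and the helper name unrepresentable are assumptions for illustration) lists the characters of a pasted text that a given page charset cannot hold. How a particular browser submits such characters varies; many browsers in practice treat a page labelled iso-8859-1 as windows-1252.

```python
def unrepresentable(text, charset):
    """Return the characters of text that cannot be encoded in charset."""
    missing = []
    for ch in text:
        try:
            ch.encode(charset)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing

if __name__ == "__main__":
    # Curly quotes, e acute, and a euro sign, as a word processor might produce.
    pasted = "\u201cCaf\u00e9\u201d costs 3 \u20ac"
    print(unrepresentable(pasted, "iso-8859-1"))
    print(unrepresentable(pasted, "cp1252"))
```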

2) Can I use regular expressions as I am used to (ASCII) in my PHP code?
Will I match e acute, the euro sign, etc.?

Yes. All Latin-1 characters are just one byte, so byte-oriented regular
expressions keep working. No problem.
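
To make the one-byte point concrete, here is a sketch in Python using a byte-oriented pattern (the pattern and the sample word are made up for this example, and the character class is deliberately rough). The same pattern matches the Latin-1 bytes of a word with an e acute, but not the UTF-8 bytes, where that letter takes two bytes.

```python
import re

# ASCII letters plus the Latin-1 byte range 0xE0-0xFF, which holds
# most accented lowercase letters (and the division sign at 0xF7).
WORD = re.compile(rb"^[A-Za-z\xe0-\xff]+$")

latin1_bytes = "caf\u00e9".encode("latin-1")  # e acute is the single byte 0xE9
utf8_bytes = "caf\u00e9".encode("utf-8")      # e acute is the two bytes 0xC3 0xA9

if __name__ == "__main__":
    print(bool(WORD.match(latin1_bytes)))
    print(bool(WORD.match(utf8_bytes)))
```

The euro sign would need its own entry in the class, and only in an encoding that has one, such as byte 0x80 in cp1252.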

3) Will the round trip described below have problems with the usual
European characters?

client copies some text from some source ->
paste in the form ->
received by PHP ->
inserted into PostgreSQL (or updated) ->
retrieved from PostgreSQL ->
displayed as HTML (with Content-Type: text/html; charset=iso-8859-1)

Is that OK?
Any pitfalls?
Should I maybe use UTF-8?

I switched to UTF-8 a few months ago, and I still have trouble with it.
For some vague reason, you can set all the encoding startup variables to
UTF-8, and connections are STILL made with Latin-1 unless you
specifically use the SET NAMES command. Someone wrote an article called
"utf-8, love at fifth sight". That is so true! UTF-8 can do a lot, but it
is a real hell to configure all systems to use it. Furthermore, the
implementations are all unaware of encodings. The problem is that a text
always has an encoding, while a string does not. Texts are treated as
strings, so with every string operation you have to make sure yourself
that the correct encoding is used.
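
The difference between a text and a string can be sketched in a few lines of Python (the helper name round_trip and the sample data are made up for this example). The same five bytes are a four-character word under one encoding and five characters of garbage under another, which is what happens when a connection's encoding does not match the data it carries.

```python
raw = b"caf\xc3\xa9"  # the UTF-8 bytes of "cafe" with an e acute

as_utf8 = raw.decode("utf-8")      # interpreted correctly
as_latin1 = raw.decode("latin-1")  # same bytes, wrong assumption

def round_trip(text, stored_as, read_as):
    """Encode text one way and decode it another, as happens when the
    writer and the reader disagree about the encoding."""
    return text.encode(stored_as).decode(read_as)

if __name__ == "__main__":
    # Byte count versus character count under each interpretation.
    print(len(raw), len(as_utf8), len(as_latin1))
    print(round_trip("\u00e9", "utf-8", "utf-8") == "\u00e9")
    print(round_trip("\u00e9", "utf-8", "latin-1") == "\u00e9")
```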


Any pointers are hugely appreciated because, to me, this is all quite
confusing.

Here are some links:
http://www.phpwact.org/php/i18n/charsets
http://www.gravitonic.com/downloads/talks/intlphpcon2005/php_unicode.pdf

Best regards
--
Willem Bogaerts

Application smith
Kratz B.V.
http://www.kratz.nl/