Re: Encoding/characterset/font family confusion



Willem Bogaerts wrote:

A few days ago I discovered that the euro sign is not defined in all
font families.

This is a client issue; there is nothing you can do about it. All you can do is
use HTML entities (&euro;) so the browser knows what you mean (and
maybe switches fonts, depending on how intelligent the browser is).
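For reference, the named entity and its two numeric equivalents can be generated and checked like this. This is a small Python sketch for illustration. All three forms mean the same character, U+20AC, whatever the page's charset is. Which glyph actually appears still depends on the fonts available on the client.

```python
import html

EURO = "\u20ac"  # the euro sign, Unicode code point U+20AC

named = "&euro;"                    # named entity
decimal = f"&#{ord(EURO)};"         # decimal numeric reference: &#8364;
hexadecimal = f"&#x{ord(EURO):X};"  # hexadecimal numeric reference: &#x20AC;

for ref in (named, decimal, hexadecimal):
    # every form decodes to the same character
    assert html.unescape(ref) == EURO
    print(ref)
```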

They cannot produce the right sign, no matter whether I use &euro; or the
hexadecimal equivalent.
After a little research, I found that I could put font tags with another
font family (Arial in this case) around the euro sign to get it
displayed.

I am completely graphically impaired and only understand program code
(and HTML/JavaScript, of course), so this is a weak point on my side,
hence this question.

I am targeting Europe only at the moment (no need for Chinese
character support).
That said, will the following setup make sense?

PostgreSQL database encoding: LATIN1
In the headers of all my HTML pages: Content-Type: text/html;
charset=ISO-8859-1

Latin-1 does not include a euro sign at all. However, Latin-1 is
sometimes silently replaced by extended encodings (like CP1252, the Windows
encoding), in which the euro sign does appear.
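The difference is easy to see by encoding the euro sign directly. Here is a minimal Python sketch. `try_encode` is a hypothetical helper written only for this illustration.

```python
EURO = "\u20ac"  # the euro sign, U+20AC

def try_encode(text, encoding):
    """Return text encoded in `encoding`, or None if it cannot be represented."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None

print(try_encode(EURO, "iso-8859-1"))  # None: Latin-1 has no euro sign
print(try_encode(EURO, "cp1252"))      # b'\x80': Windows-1252 has it at 0x80

# Read back with the wrong table, byte 0x80 is a C1 control character, not a euro sign.
print(repr(b"\x80".decode("iso-8859-1")))  # '\x80'
print(b"\x80".decode("cp1252"))            # €
```

The last two lines show the usual symptom. A byte 0x80 produced under Windows-1252 turns into an invisible control character when it is read back as strict Latin-1.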

A few related questions:
1) Will people be able to copy/paste information from other sources (like
word-processing programs and other websites) into my forms?

In short: yes. It is up to the browser to convert the text from the OS
clipboard into the encoding the page (and form) uses. I never had any trouble with it.
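What reaches PHP depends on the encoding the form is submitted in. As far as I know, current browsers follow the WHATWG Encoding Standard. That standard treats the label iso-8859-1 as windows-1252. Browsers also send characters the encoding cannot represent as decimal numeric references (&#NNNN;).

The Python sketch below mimics that behaviour. Treat it as an assumption about how browsers work, not a specification. `encode_form_value` is a hypothetical helper, not a browser API.

```python
from urllib.parse import quote_plus

def encode_form_value(text, charset):
    """Percent-encode a form value roughly as a browser would for `charset`:
    characters the charset cannot represent become &#NNNN; first."""
    return quote_plus(text, encoding=charset, errors="xmlcharrefreplace")

# A page labelled ISO-8859-1 is, in practice, submitted as windows-1252.
print(encode_form_value("caf\u00e9 \u20ac", "cp1252"))  # caf%E9+%80
print(encode_form_value("\u03a9", "cp1252"))            # %26%23937%3B
# The same text from a UTF-8 page, for comparison.
print(encode_form_value("caf\u00e9 \u20ac", "utf-8"))   # caf%C3%A9+%E2%82%AC
```

So a pasted euro sign usually arrives as the single byte 0x80. A Greek capital omega arrives as the literal text &#937;, which PHP then stores as-is unless you decode it.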

2) Can I use regular expressions in my PHP code the way I am used to (ASCII)?
Will they match e acute (é), the euro sign, etc.?

Yes. Every Latin-1 character is a single byte, so byte-based regular
expressions work as usual. No problem.
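Python's byte-oriented `re` patterns behave the same way, much like PHP's preg_* functions without the `u` modifier. Here is a small sketch comparing a Latin-1 string with its UTF-8 form. Note that the euro sign has no Latin-1 byte at all, so this only covers characters that Latin-1 actually contains.

```python
import re

word = "caf\u00e9"  # "café"

latin1 = word.encode("iso-8859-1")  # b'caf\xe9'     : 4 bytes
utf8 = word.encode("utf-8")         # b'caf\xc3\xa9' : 5 bytes

# A byte-oriented pattern: "caf" followed by exactly one byte.
pattern = re.compile(rb"caf.")

print(bool(pattern.fullmatch(latin1)))  # True: é is a single byte in Latin-1
print(bool(pattern.fullmatch(utf8)))    # False: é is two bytes in UTF-8

# Matching é explicitly by its Latin-1 byte value 0xE9.
print(bool(re.search(rb"\xe9", latin1)))  # True
```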

3) Will the round trip described below have problems with the European
characters one would normally expect?

client copies some text from some source ->
pastes it into the form ->
received by PHP ->
inserted into PostgreSQL (or updated) ->
retrieved from PostgreSQL ->
displayed as HTML (with Content-Type: text/html; charset=ISO-8859-1)

Is that OK?
Any pitfalls?
Should I maybe use UTF-8?
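One pitfall is worth checking in that round trip. Text pasted from word processors often contains curly quotes, en dashes and the euro sign. These exist in Windows-1252 but not in ISO-8859-1.

Here is a minimal Python sketch that lists such characters. `not_in_charset` is a hypothetical helper, and the sample text is made up.

```python
def not_in_charset(text, charset="iso-8859-1"):
    """Return the distinct characters of `text` that `charset` cannot encode,
    in order of first appearance."""
    missing = []
    for ch in text:
        try:
            ch.encode(charset)
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing

# Made-up text as it might come from a word processor: “Café” – 5 €
pasted = "\u201cCaf\u00e9\u201d \u2013 5 \u20ac"
print(not_in_charset(pasted))            # ['“', '”', '–', '€']
print(not_in_charset(pasted, "cp1252"))  # []
```

An empty result for Windows-1252 combined with a non-empty one for ISO-8859-1 means the text depends on some layer quietly treating Latin-1 as Windows-1252.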

I switched to UTF-8 a few months ago, and I still have trouble
with it. For some unclear reason, you can set all encoding startup
variables to UTF-8, and connections are STILL made with Latin-1 unless
you explicitly use the SET NAMES command. Someone wrote an article
"utf-8, love at fifth site". That is so true! It can do a lot, but it is
a real hell to configure all systems to use it. Furthermore, the standard
string functions are not encoding-aware. The problem is that a text
always has an encoding, while a string does not. And texts are treated
as strings, so with every string operation you have to make sure
that the correct encoding is used.
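The string/text distinction shows up as soon as a function counts or cuts bytes instead of characters. In PHP terms, that is the difference between strlen()/substr() and mb_strlen()/mb_substr(). Here is a small Python sketch of the effect with UTF-8.

```python
text = "caf\u00e9 \u20ac"  # "café €": 6 characters
data = text.encode("utf-8")

print(len(text))  # 6 characters
print(len(data))  # 9 bytes: é takes 2 bytes and € takes 3

# Byte-oriented truncation (what a non-encoding-aware function does) can cut
# a multi-byte character in half and leave invalid UTF-8 behind.
cut = data[:4]  # b'caf\xc3': only the first byte of é
try:
    cut.decode("utf-8")
except UnicodeDecodeError as err:
    print("invalid UTF-8:", err.reason)

# Truncating on characters instead keeps the text valid.
print(text[:4])  # café
```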


Any pointers are hugely appreciated because, to me, this is all quite
confusing.

Here are some links:
http://www.phpwact.org/php/i18n/charsets
http://www.gravitonic.com/downloads/talks/intlphpcon2005/php_unicode.pdf

Best regards

Thank you, Willem.
Exactly the kind of info I needed to read.

I like the link to www.joelonsoftware.com/articles/Unicode.html, where
the author describes a type of programmer that fits me exactly: the one who
tries to ignore issues with character sets. :-)

[quote]
So I have an announcement to make: if you are a programmer working in 2003
and you don't know the basics of characters, character sets, encodings, and
Unicode, and I catch you, I'm going to punish you by making you peel onions
for 6 months in a submarine. I swear I will.

And one more thing: IT'S NOT THAT HARD.

In this article I'll fill you in on exactly what every working programmer
should know. All that stuff about "plain text = ascii = characters are 8
bits" is not only wrong, it's hopelessly wrong, and if you're still
programming that way, you're not much better than a medical doctor who
doesn't believe in germs. Please do not write another line of code until
you finish reading this article.

[/quote]

I think I will follow his advice (threat). ;-)
Time to grow up/read up.

Thanks.

Regards,
Erwin Moller