Encoding/character set/font family confusion



Hi group,

I could use a bit of guidance on the following matter.

I am starting a new project now and must make some decisions regarding
encoding.
Environment: PHP 4.3, PostgreSQL 7.4.3

I must be able to receive form information, store it in a database, and
later display it on screen on the client (just plain HTML).
Nothing special. I have been doing this for many years, but I never paid
much attention to special characters.

A few days ago I discovered that the euro sign is not defined in all
font families.
They cannot produce the right sign, no matter whether I use &euro; or the
hexadecimal equivalent.
After a little research I found I could put font-tags around the euro-sign
with another font-family (Arial in this case) to get the Euro sign.

I am completely graphically impaired, and only understand programming code
(and HTML/JavaScript of course), so this is a weak point on my side, hence
this question.

I am targeting Europe only at the moment (no need for Chinese
character support).
That said, does the following setup make sense?

PostgreSQL database encoding: LATIN1
In the headers of all my HTML: Content-Type: text/html; charset=iso-8859-1
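
As a quick check on this setup, here is a minimal sketch (in Python, chosen
only for illustration; it is not part of the PHP/PostgreSQL environment
above) that tests which encodings can represent a few sample characters.
PostgreSQL's LATIN1 corresponds to ISO 8859-1. The sample characters and the
list of encodings are assumptions picked for this example.

```python
# Sample characters and candidate encodings (both chosen for illustration).
SAMPLES = {
    "e acute": "\u00e9",
    "euro sign": "\u20ac",
    "curly quote": "\u201c",  # typical of text pasted from a word processor
}
ENCODINGS = ["iso-8859-1", "iso-8859-15", "cp1252", "utf-8"]


def can_encode(char, encoding):
    """Return True if the character has a representation in the encoding."""
    try:
        char.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Map each sample name to the encodings that can represent it.
support = {
    name: [enc for enc in ENCODINGS if can_encode(ch, enc)]
    for name, ch in SAMPLES.items()
}

for name, encodings in support.items():
    print(name, "->", encodings)
```

With Python's codecs, e acute is available in all four encodings, while the
euro sign is missing from iso-8859-1 and the curly quote is missing from both
iso-8859-1 and iso-8859-15.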

A few related questions:
1) Will people be able to copy/paste info from other sources (like
word processing programs and other websites) into my forms?

2) Can I use regular expressions as I am used to (ASCII) in my PHP code?
Will I match e acute, the euro sign, etc.?

3) Will the round trip described below have problems with the European
characters one would normally expect?

client copies some text from some source ->
paste in the form ->
receive by PHP ->
insert in PostgreSQL (or update) ->
retrieve from PostgreSQL ->
display as HTML (with Content-Type: text/html; charset=iso-8859-1)
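
To make the round trip concrete, here is a rough sketch in Python (again only
for illustration, not PHP). It assumes that a character the page charset
cannot represent is replaced by "?" on submission. Real browsers, PHP, and
PostgreSQL may behave differently (for example by sending a numeric character
reference instead), so the replacement policy, the hypothetical round_trip
helper, and the sample text are all assumptions.

```python
def round_trip(text, charset):
    """Encode pasted text as a form submission, 'store' it, decode for display."""
    # Assumption: unrepresentable characters are replaced with b"?".
    submitted = text.encode(charset, errors="replace")
    # Stand-in for the INSERT followed by the SELECT: the bytes are unchanged.
    stored = bytes(submitted)
    return stored.decode(charset)


pasted = "Caf\u00e9 menu: \u20ac5"  # contains e acute and the euro sign

latin1_result = round_trip(pasted, "iso-8859-1")
utf8_result = round_trip(pasted, "utf-8")

print(latin1_result == pasted)  # the euro sign did not survive
print(utf8_result == pasted)    # everything survived
```

Under these assumptions the e acute survives the iso-8859-1 round trip but the
euro sign does not, whereas the utf-8 round trip returns the text unchanged.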

Is that OK?
Any pitfalls?
Should I maybe use UTF-8?
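
Regarding question 2, the sketch below (Python, for illustration only) shows
why the choice of encoding matters for byte-oriented regular expressions: the
same word is four bytes in iso-8859-1 but five bytes in UTF-8, and an
ASCII-only character class matches neither form of e acute. The sample word
and patterns are assumptions, and PHP's own regex functions may need
different handling.

```python
import re

word = "caf\u00e9"                         # "cafe" with e acute
latin1_bytes = word.encode("iso-8859-1")   # e acute is one byte (0xE9)
utf8_bytes = word.encode("utf-8")          # e acute is two bytes (0xC3 0xA9)

# Byte pattern: "." matches exactly one byte, not one character.
four_units = re.compile(rb"^.{4}$")

latin1_match = bool(four_units.match(latin1_bytes))
utf8_match = bool(four_units.match(utf8_bytes))

# An ASCII-only class does not cover the accented letter in either encoding.
ascii_only = re.compile(rb"^[a-z]+$")
ascii_latin1 = bool(ascii_only.match(latin1_bytes))
ascii_utf8 = bool(ascii_only.match(utf8_bytes))

print(len(latin1_bytes), len(utf8_bytes))
print(latin1_match, utf8_match, ascii_latin1, ascii_utf8)
```

So a pattern that counts "characters" works on the iso-8859-1 bytes but not on
the UTF-8 bytes, and `[a-z]` would need to be widened explicitly to accept
accented letters.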

Any pointers are hugely appreciated because, to me, this is all quite
confusing.

Thanks in advance!

Regards,
Erwin Moller
