Re: UTF-8 without external modules on Perl 5.0



On Sun, 21 May 2006, Peter J. Holzer wrote:

Yohan N. Leder wrote:

[ in a context where an old version of Perl has been imposed ... ]

- HTML forms generated by the Perl scripts must be able to handle
all which may be usually typed in English and French language,
including euro sign.

If you only need English and French (and won't be needing Czech next
year because your company opens a branch office in Prague) you are
probably better off using an 8-bit character set which covers those
two languages. ISO-8859-15 and Windows-1252 come immediately to
mind.

Yes, this could indeed resolve the stated problem. (Neither of those
would cover Czech or Polish, though: that would call for something
like ISO-8859-2 or Windows-1250.) It's what the
search engines' query pages (altavista, google etc.) were doing some
years ago, before general browser support for utf-8 was adequate.
Users could select an 8-bit web page encoding appropriate to their
language, and then submit their query - the browser would submit their
input using that same encoding.
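
To make that concrete, here is a minimal sketch of a CGI response that declares its encoding, so that the browser sends the form data back in that same encoding. The sub name form_page and the action URL query.cgi are made up for illustration, and the code sticks to constructs that should be available on an old Perl 5 without any modules.

```perl
use strict;

# Hypothetical helper: build a CGI response whose declared charset is the
# one we expect the form data to come back in.
sub form_page {
    my ($charset) = @_;
    return "Content-Type: text/html; charset=$charset\r\n\r\n"
         . "<html><head><title>Query</title></head><body>\n"
         . "<form method=\"post\" action=\"query.cgi\" accept-charset=\"$charset\">\n"
         . "<input type=\"text\" name=\"q\">\n"
         . "<input type=\"submit\" value=\"Search\">\n"
         . "</form></body></html>\n";
}

# Typical use in a CGI script:
#   print form_page('ISO-8859-15');
```

The accept-charset attribute is only a hint and browsers have not always honoured it, so the charset on the page itself is what matters most here.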

This is probably the wrong place to go into any detail on that, but
I might mention this page:
http://ppewww.ph.gla.ac.uk/~flavell/charset/form-i18n.html

There's absolutely nothing you can do to prevent users from typing or
copy/pasting oddball characters into the form, and browsers react in
various ways when they attempt to submit characters which cannot be
expressed in the chosen encoding. So it's necessary to design server
side scripts to be able to cope in some way when that happens - if
only to recognise the defective input and politely refuse it.
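
As a rough sketch of what "recognise and refuse" could look like for a form that is supposed to come back as ISO-8859-15: the sub name latin9_problem is hypothetical, and the three checks are only my assumptions about what is worth catching, not an exhaustive list. It needs no modules.

```perl
use strict;

# Hypothetical helper: inspect the raw bytes of one form field that was
# supposed to arrive as ISO-8859-15. Returns '' if the value looks
# acceptable, otherwise a short reason suitable for an error page.
sub latin9_problem {
    my ($bytes) = @_;

    # C0 controls other than tab, LF and CR do not belong in a text field.
    return 'control character'
        if $bytes =~ /[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/;

    # 0x80-0x9F are C1 controls in ISO-8859-15; seeing them usually means
    # the browser sent Windows-1252 (or something else) instead.
    return 'byte outside the ISO-8859-15 text range'
        if $bytes =~ /[\x80-\x9F]/;

    # Some browsers replace characters they cannot encode with a numeric
    # character reference such as &#269; so treat that as unencodable input.
    return 'character not available in ISO-8859-15'
        if $bytes =~ /&#[0-9]+;/;

    return '';
}
```

The last check will also refuse a user who really did type something like &#269; into the field. Whether that matters depends on the application.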

Depending on the circumstances, it may be that file upload would be a
preferable implementation, as you hinted?

With sufficiently modern software, on the other hand, I'd strongly
recommend getting things to work with utf-8. Practically any browser
of any consequence today can deal with that (as you may deduce from
the fact that the search engine queries no longer bother the user with
the encoding options, but simply use utf-8 without further comment).
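
If you do go the utf-8 route on a Perl that has no Unicode support of its own, the script can at least check that what arrives is well-formed utf-8 before storing it or echoing it back. The following is a sketch only, with a hypothetical sub name; it treats the input as a plain byte string and uses the widely circulated byte-range pattern, which rejects overlong forms, surrogates and anything above U+10FFFF.

```perl
use strict;

# Hypothetical helper: return 1 if the byte string is well-formed UTF-8,
# 0 otherwise. Works on raw bytes only and needs no modules.
sub is_valid_utf8 {
    my ($bytes) = @_;
    return $bytes =~ /^(?:
          [\x00-\x7F]                          # ASCII
        | [\xC2-\xDF][\x80-\xBF]               # 2-byte sequences
        | \xE0[\xA0-\xBF][\x80-\xBF]           # 3-byte, excluding overlongs
        | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}    # 3-byte, straightforward
        | \xED[\x80-\x9F][\x80-\xBF]           # 3-byte, excluding surrogates
        | \xF0[\x90-\xBF][\x80-\xBF]{2}        # 4-byte, planes 1 to 3
        | [\xF1-\xF3][\x80-\xBF]{3}            # 4-byte, planes 4 to 15
        | \xF4[\x80-\x8F][\x80-\xBF]{2}        # 4-byte, plane 16
    )*$/x ? 1 : 0;
}
```

A field such as "caf\xC3\xA9" passes, whereas the ISO-8859-1 bytes "caf\xE9" are refused, which is also a cheap way to notice a browser that ignored the page's charset.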

best


