Re: UTF-8 without external modules on Perl 5.0
- From: Yohan N. Leder <ynleder@xxxxxxxxxx>
- Date: Mon, 22 May 2006 01:43:29 +0200
In article <1f3p4e.vp7.ln@xxxxxxxxxxx>, hjp-usenet2@xxxxxx says...
> perl 5.005 also doesn't know about wide characters. A character is a
> byte, so there is no way to have a character outside of the range
> 0..255. So you don't need any decoding routines because you couldn't
> decode a euro sign anyway :-).
Hmm, indeed, I had not realized all the aspects of this charset
problem. In fact, my first idea was that I could do the following:
1/ indicate (simply with a comment) that the strings in the code (the
configurable one I mentioned and the others written by me) must use
only characters from an iso-8859-* table.
2/ declare a charset of utf-8 for the generated html pages and convert
everything to utf-8 before printing it to the browser.
3/ treat anything that comes from html forms as utf-8 and then
convert it back to iso-8859-* immediately on receipt, to stay Perl
5.00503 compliant.
And for this I found a pure Perl module called Unicode::UTF8simple
containing to/from conversion subs I could copy/paste into my own script
(crediting the original author in the header, of course)... But, as you
state: how to convert from UTF-8 to an iso-8859-* when the given UTF-8
character (like the euro sign) is not representable in the target charset?
What do you think? Is this approach definitely out, or is there a
workaround?
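To make the question concrete, here is a minimal sketch of the two
conversions in pure Perl, written to avoid anything newer than 5.005.
The sub names utf8_to_latin1 and latin1_to_utf8 are made up for this
example and are not the Unicode::UTF8simple API. It assumes the 8-bit
side is ISO-8859-1 exactly (not -15), and it uses HTML numeric entities
as one possible workaround for characters that do not fit.

```perl
use strict;

# Hypothetical helper: raw UTF-8 bytes in, ISO-8859-1 bytes out.
# Code points above 0xFF cannot be stored in one byte, so they are
# emitted as HTML numeric entities (an assumption that only makes sense
# if the result ends up in an HTML page).
sub utf8_to_latin1 {
    my ($in) = @_;
    my $out = '';
    while (length $in) {
        if ($in =~ s/^([\x00-\x7F])//) {                      # plain ASCII
            $out .= $1;
        }
        elsif ($in =~ s/^([\xC2-\xDF])([\x80-\xBF])//) {      # 2-byte sequence
            my $cp = ((ord($1) & 0x1F) << 6) | (ord($2) & 0x3F);
            $out .= $cp <= 0xFF ? chr($cp) : "&#$cp;";
        }
        elsif ($in =~ s/^([\xE0-\xEF])([\x80-\xBF])([\x80-\xBF])//) {  # 3-byte
            my $cp = ((ord($1) & 0x0F) << 12)
                   | ((ord($2) & 0x3F) << 6)
                   |  (ord($3) & 0x3F);
            $out .= "&#$cp;";
        }
        else {                         # malformed or 4-byte input:
            $in =~ s/^.//s;            # drop one byte and mark it
            $out .= '?';
        }
    }
    return $out;
}

# Hypothetical helper: ISO-8859-1 bytes in, UTF-8 bytes out.
# Every byte 0x80..0xFF becomes a two-byte UTF-8 sequence.
sub latin1_to_utf8 {
    my ($s) = @_;
    $s =~ s/([\x80-\xFF])/chr(0xC0 | (ord($1) >> 6)) . chr(0x80 | (ord($1) & 0x3F))/ge;
    return $s;
}

print utf8_to_latin1("Prix: 10 \xE2\x82\xAC"), "\n";
```

With this approach the euro sign survives only as the entity &#8364;,
which is fine for display in a browser but not for anything that treats
the string as plain text.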
> So if you need to work with unicode strings in perl 5.005, the best way
> is probably to work with raw UTF-8-encoded strings. That means that a...
Reading your list of needed changes, I'm not really ready to head into
this nightmare. Well, maybe I could develop two versions:
- One for Perl 5.00503, with a solution not yet found: it depends
largely on your reply about the way (if any) to use the UTF8simple
converter above, or on your iso-8859-15 solution below.
- A more evolved one for more recent interpreters. So, just a question:
how simple is it in these Perl releases? Do I just have to write
"use utf8;" and that's all? It's not clear in my mind.
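A rough sketch of what the newer-Perl variant tends to look like,
assuming Perl 5.8 or later with the core Encode module. As far as the
documentation goes, "use utf8;" only declares that the source file
itself is saved as UTF-8; bytes coming in and going out still have to be
decoded and encoded explicitly. The variable names are illustrative
only, and the file must really be saved as UTF-8 for the literal below
to work.

```perl
use strict;
use warnings;
use utf8;                        # only says: this source file is UTF-8
use Encode qw(decode encode);

# Input side: raw bytes as they might arrive from an HTML form.
my $form_bytes = "caf\xC3\xA9";
my $text       = decode('UTF-8', $form_bytes);   # now 4 characters

# Output side: a character string from the source, turned back into
# UTF-8 bytes before it is printed to the browser.
my $price     = "10 €";
my $out_bytes = encode('UTF-8', $price);

printf "%d characters in, %d bytes out\n", length($text), length($out_bytes);
```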
> This is a quite silly policy: If you can do something stupid or harmful
> with a module, you can do the same thing with a script.
> But I know that some sites have such a policy, and they probably won't
> change it, so you're probably stuck with it.
The reason is very simple: the developers who work on these servers
are mostly PHP people, and they have convinced the management of this
company to favor PHP *over* Perl. So, not silly, just nasty for the
other developers who have to use Perl some day or other!
> If you only need English and French (and won't be needing Czech next
> year because your company opens a branch office in Prague) you are
> probably better off using an 8-bit character set which covers those two
> languages. ISO-8859-15 and Windows-1252 come immediately to mind.
Yes, we will only target English and French, and even if things could be
accessed by people from countries where these are not native languages,
they will enter their input in these two languages (and will read in
these two languages too, of course). Well, indeed, choosing a
single-byte charset is something that could make me happy... if it
really is the right choice! Also, two questions:
1/ I found some people (in newsgroups and on the web) who said
iso-8859-15 was not a good choice, but I don't know exactly why. What
could be wrong with this charset?
2/ Windows-1252 seems not to be chosen often: why? Because of the
"Windows" in its name?
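One concrete difference between the two candidates can be checked on a
newer Perl (5.8 or later, core Encode module, so not on 5.00503): they
put the euro sign and the French oe ligature at different byte values,
so data labelled with one charset and read as the other comes out wrong
for exactly those characters. A small sketch, with made-up variable
names:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Two characters that plain ISO-8859-1 lacks but French text may need.
my %sample = (
    'euro sign'   => "\x{20AC}",
    'oe ligature' => "\x{0153}",
);

for my $name (sort keys %sample) {
    for my $charset ('iso-8859-15', 'cp1252') {
        my $bytes = encode($charset, $sample{$name});
        printf "%-12s %-12s 0x%02X\n", $name, $charset, ord($bytes);
    }
}
```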
> Where do people edit these strings? Directly on the server? Or do they
> edit the file on their Windows machine and then upload it to the server
> via FTP (or whatever)?
Both :-( These scripts will be edited under both Win32 and Unix flavors,
and will run under both Win32 and Unix flavors.
Thank you for your help, Peter; things are a little less confused after
your post.