Re: F<utf8.pm> is evil (was: XML::LibXML UTF-8 toString() -vs- nodeValue())
- From: Eric Pozharski <whynot@xxxxxxxxxxxxxx>
- Date: Fri, 17 Apr 2009 03:23:39 +0300
On 2009-04-15, Peter J. Holzer <hjp-usenet2@xxxxxx> wrote:
On 2009-04-14 23:45, Eric Pozharski <whynot@xxxxxxxxxxxxxx> wrote:
On 2009-04-12, Peter J. Holzer <hjp-usenet2@xxxxxx> wrote:
I've thought a lot. I should admit, whenever I see C<use utf8;>
instead of C<use encoding 'utf8';> I'm going nuts.
I'm going nuts when I see "use encoding". It does way too many magic
things and most of them only half-baked. Here be dragons! Don't use that
stuff, if you value your sanity.
I'm puzzled. However, I should admit that I've yet to find those dark
corners of C<use encoding 'utf8';>. And I'm not afraid of dragons.
And with C<use encoding 'utf8';> you'll get the same character string,
and lots of other useful stuff.
Correction: Lots of stuff which looks useful at first glance but which
works in subtly different ways than you expect (and some stuff which you
simply don't expect). "use utf8" OTOH does only one thing, and it does
it well.
I fail to see how utfizing both literals and symbols makes F<utf8.pm>
doing one thing. I don't say that it doesn't do it well.
(I just can't get why anyone would need[...]
implicit upgrade of scalars into characters and yet then maintain wide
IO.) But my point isn't that F<encoding.pm> outperforms F<utf8.pm>.
I'm scared. I consider F<utf8.pm> a kind of Pandora's box. Read this, if
you can
проц запросить {
мое ($имяфайла) = @_;
}
I admit, it's impossible to write this with F<utf8.pm> alone
Right. "sub" still is "sub", not "проц", and "my" is still "my", not
"мое". Your example is more like a Russian(?) equivalent to
Lingua::Romana::Perligata.
And frankly, "проц запросить" is only marginally less readable to me
than "sub zaprosit". I need a dictionary for both, and "запросить" at
least has the advantage that I can actually find it in a Russian
dictionary :-). If you want your software to be maintainable by authors
from other countries, stick to English and write "sub request". If you
want to use Russian names you might as well go all the way and use
Cyrillic letters instead of a transliteration into Latin letters which
neither those who speak Russian nor those who don't speak Russian
understand easily.
So you suggest that localizing Perl (or actually any other language) is
a kind of conspiracy by online dictionary providers? I hadn't thought of
it this way; I should consider it.
I bet you've seen this before,
I've seen German versions of BASIC in the 1980s. They weren't a huge
success, to put it mildly. About the only successful localization of a
programming language I can think of is MS-Excel (and I guess it's
debatable if this is even a programming language (without VBA) - is it
Turing-complete?).
That's in case you have an option. There're places you have no option.
Someone could say "Who the heck would need that stupidity?" Idiots. It
still surprises me how many idiots are around. They would scream:
"Look! What a cool stuff! I have to learn nothing!"
Idiots indeed, if they think learning a few dozen keywords is the
hardest part in learning a programming language. In fact I think the
main reason why localized programming languages are so unpopular is that
people figure out that it doesn't make any difference whether you
declare a lexical variable with "my", "мое", or "mein": You have to
learn the keyword anyway and you have to learn what it means and how to
use it.
Then get a dictionary handy while they are cheap.
*SKIP*
That doesn't fix the endianness, and it behaves completely differently.
"perl -Mencoding=ucs2" can't work, as I already explained to sln.
However, since I don't understand why it "can't work",
It can't work because -Mencoding=ucs2 says that the source code is
encoded in ucs2. So your script would have to begin with the byte
sequence
FE FF 00 70 00 72 00 69 00 6e 00 74 00 20 ...
to be interpreted as "print ...". But it doesn't (and it can't because
you cannot pass arguments with embedded null bytes in Unix).
It begins with
70 72 69 6e 74 20
which doesn't seem to be anything useful in UCS2 (U+2074 is SUPERSCRIPT
FOUR, and the rest land in the CJK ideograph range in both big and
little endian, so nothing there reads as "print").
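To make the byte-level argument concrete, here is a minimal sketch. It is written in Python rather than Perl, purely so the bytes can be inspected without any source-encoding pragma being involved. The helper names decode_ucs2 and describe are made up for illustration, and UCS-2 is approximated with Python's UTF-16 codecs, which is equivalent for these particular bytes because no surrogates occur.

```python
import unicodedata

# Bytes a big-endian UCS-2 source (with byte order mark) needs for "print ".
ucs2_print = b"\xfe\xff" + "print ".encode("utf-16-be")

# Bytes that actually arrive when the code is plain ASCII.
ascii_print = "print ".encode("ascii")


def decode_ucs2(data, byteorder):
    """Hypothetical helper: read the bytes as 16-bit units in the given order."""
    codec = "utf-16-be" if byteorder == "big" else "utf-16-le"
    return data.decode(codec)


def describe(text):
    """Hypothetical helper: list code point and Unicode name per character."""
    return ["U+%04X %s" % (ord(ch), unicodedata.name(ch, "<no name>")) for ch in text]


print(ucs2_print.hex(" "))
for order in ("big", "little"):
    print(order, describe(decode_ucs2(ascii_print, order)))
```

The first line printed is the FE FF 00 70 ... sequence quoted above. Reading the six ASCII bytes as three 16-bit units gives U+7072, U+696E, U+7420 in big endian and U+7270, U+6E69, U+2074 in little endian. With a current Unicode database all but U+2074 are named as CJK unified ideographs; either way, none of them spells "print".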
It becomes worse if you use "use encoding 'ucs2';" inside a script:
You would have to start the script in US-ASCII so that "use encoding
'ucs2';" is recognized and then switch to UCS2: So you need to mix two
different incompatible encodings in the same script: Good luck finding
an editor which supports this. And you don't have to, anyway, because if
you want to encode your scripts in UTF-16, you can just do it and perl
will notice it automatically (but Unix won't recognize the hashbang any
more, so you don't want to do this on Unix - you might on Windows,
though).
Ouch. Shame on me.
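The hashbang remark can be checked at the byte level too. The following is a rough sketch in Python, not a description of how any particular kernel is implemented: it only assumes the usual convention that an executable script is recognized by the two bytes "#!" at offset 0. The interpreter path /usr/bin/perl and the function name starts_with_hashbang are illustrative assumptions.

```python
# First line of a script; the interpreter path is only an example.
SHEBANG = "#!/usr/bin/perl\n"

# The same line saved as ASCII and as UTF-16 little endian with a BOM.
ascii_script = SHEBANG.encode("ascii")
utf16_script = b"\xff\xfe" + SHEBANG.encode("utf-16-le")


def starts_with_hashbang(data):
    """Return True if the content begins with the two bytes '#!'."""
    return data[:2] == b"#!"


print(starts_with_hashbang(ascii_script))  # True
print(starts_with_hashbang(utf16_script))  # False
print(utf16_script[:6].hex(" "))           # ff fe 23 00 21 00
```

In the UTF-16 file the "#" and "!" are still present, but they sit behind the byte order mark and are separated by a null byte, so the first two bytes are no longer "#!".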
I don't do Windows.
--
Torvalds' goal for Linux is very simple: World Domination
Stallman's goal for GNU is even simpler: Freedom