Re: join("") somehow changes characters after 'z'



On 2007-10-09 21:13, Ben Morrow <ben@xxxxxxxxxxxx> wrote:
> Perl doesn't know what character encoding you are expecting on STDOUT.
> As a result, it is printing the raw bytes of its own internal
> representation,

No, it isn't. It prints strings which contain only characters
in the range [0 .. 255] as 1 byte per character, and strings which
contain characters outside of this range as 1 utf8 sequence per
character. This is independent of how the strings are represented
internally. Consider this:

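#!/usr/bin/perl
use utf8;    # only affects literals in the source; this file is pure ASCII

my $x = "\x{B0}";     # one character, code point 0xB0 (within 0 .. 255)
utf8::upgrade($x);    # force the internal UTF-8 ("wide") representation

# Report the internal representation on STDERR so it stays out of the pipe.
print STDERR utf8::is_utf8($x) ? "wide\n" : "byte\n";

print $x;             # no encoding layer on STDOUT
__END__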

% ./foo | od -tx1
wide
0000000 b0
0000001

After the upgrade, $x is internally represented as a wide string (as can
be seen from the output "wide" on STDERR), but it still prints only one
byte to STDOUT.
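
The other half of the rule (a string containing a character above 255 is
written as one UTF-8 sequence per character) can be checked the same way.
The following is only a sketch: printed_bytes() is a hypothetical helper,
not something from the script above, and it prints to an in-memory
filehandle instead of piping through od. The last call assumes that an
explicit :encoding(UTF-8) layer is what you would use to tell Perl which
encoding you expect on output.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: print $str to an in-memory filehandle (optionally
# opened with an I/O layer) and return the bytes written as hex.
sub printed_bytes {
    my ($str, $layer) = @_;
    $layer = '' unless defined $layer;
    my $buf = '';
    open(my $fh, ">$layer", \$buf) or die "open: $!";
    {
        no warnings 'utf8';    # silence "Wide character in print"
        print {$fh} $str;
    }
    close($fh) or die "close: $!";
    return join ' ', unpack('(H2)*', $buf);
}

my $byte = "\x{B0}";
my $wide = "\x{B0}";
utf8::upgrade($wide);          # same character, different internal form

print printed_bytes($byte), "\n";                       # b0
print printed_bytes($wide), "\n";                       # b0
print printed_bytes("\x{B0}\x{263A}"), "\n";            # c2 b0 e2 98 ba
print printed_bytes($byte, ':encoding(UTF-8)'), "\n";   # c2 b0
```

The first two lines of output are identical, so the internal
representation makes no difference. In the third case the character
U+263A forces the whole string, including the 0xB0, to be written as
UTF-8. With the encoding layer, 0xB0 is always written as c2 b0.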

hp



--
_ | Peter J. Holzer | I know I'd be respectful of a pirate
|_|_) | Sysadmin WSR | with an emu on his shoulder.
| | | hjp@xxxxxx |
__/ | http://www.hjp.at/ | -- Sam in "Freefall"
