Re: tuples, index method, Python's design



"Rhamphoryncus" <rhamph@xxxxxxxxx> writes:
i = s.index(e) => s[i] = e
Then this algorithm is no longer guaranteed to work with strings.
It never worked correctly on unicode strings anyway (which becomes the
canonical string in python 3.0).

What?! Are you sure? That sounds broken to me.

Nope, it's pretty fundamental to working with text, unicode only being
an extreme example: there's a wide number of ways to break down a
chunk of text, making the odds of "e" being any particular one fairly
low. Python's unicode type only makes this slightly worse, not
promising any particular one is available.

I don't understand this. I thought that Unicode was a character
coding system like ASCII, except with an enormous character set,
combined with a bunch of different algorithms for encoding Unicode
strings as byte sequences. But I've thought of those algorithms
(UTF-8 and so forth) as basically being kludgy data compression
schemes, and Unicode strings are still just sequences of code points.
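
To make the two views concrete, here is a minimal sketch in Python 3
(where str is the Unicode type). The helper name index_invariant_holds
is made up for illustration. It checks the quoted property
"i = s.index(e) implies s[i] == e" on a list and on strings, and then
shows that a str really is a sequence of code points, but that one
visible character can be one or two code points depending on
normalization.

```python
import unicodedata


def index_invariant_holds(s, e):
    """Return True if s[s.index(e)] == e (hypothetical helper)."""
    i = s.index(e)
    return s[i] == e


# Lists: index() looks up an element, so the property holds.
assert index_invariant_holds([10, 20, 30], 20)

# Strings: index() searches for a substring, so the property only
# holds when e is a single code point.
assert index_invariant_holds("hello", "l")
assert not index_invariant_holds("hello", "lo")

# The slice form of the property does hold for substrings.
i = "hello".index("lo")
assert "hello"[i:i + len("lo")] == "lo"

# Same visible text, two different code point sequences.
composed = unicodedata.normalize("NFC", "cafe\u0301")  # ends in U+00E9
decomposed = unicodedata.normalize("NFD", composed)    # 'e' + U+0301
assert len(composed) == 4
assert len(decomposed) == 5

# Searching for "e" succeeds or fails depending on the form in use.
assert "e" not in composed
assert decomposed.index("e") == 3
```

So at the code point level the indexing works as expected; the
surprises come from substring search and from the fact that "a
character" as a reader sees it need not be a single code point.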


