Re: Is there any way to say ignore case with "in"?



On Apr 6, 8:53 am, "Martin v. Löwis" <mar...@xxxxxxxxxxx> wrote:
I know I could use:-

    if string1.lower() in string2.lower():
        pass  # <do something>

but it somehow feels there ought to be an easier (tidier?) way.

Easier?  You mean like some kind of mind meld?

Interestingly enough, it shouldn't be (but apparently is) obvious that

   a.lower() in b.lower()

is a way of expressing "a is a substring of b, with case-insensitive
matching". Can we be sure that these are really the same concepts,
and if so, is

  a.upper() in b.upper()

also equivalent?
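A small Python 3 sketch of the two spellings shows they really can disagree. The helper names contains_lower and contains_upper are hypothetical, and the test string uses U+017F, LATIN SMALL LETTER LONG S, which is discussed further on:

```python
def contains_lower(needle, haystack):
    """Case-insensitive substring test by lower-casing both sides."""
    return needle.lower() in haystack.lower()


def contains_upper(needle, haystack):
    """Case-insensitive substring test by upper-casing both sides."""
    return needle.upper() in haystack.upper()


haystack = "Ma\u017fter"  # "Maſter", spelled with a long s
print(contains_lower("s", haystack))  # False: lower() keeps the long s
print(contains_upper("s", haystack))  # True: upper() maps the long s to "S"
```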

It's probably a common assumption that, for any character c,
c.lower()==c.upper().lower(). Yet,

py> [i for i in range(65536) if unichr(i).upper().lower() !=
...     unichr(i).lower()]
[181, 305, 383, 837, 962, 976, 977, 981, 982, 1008, 1009, 1010, 1013,
7835, 8126]
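The same search in Python 3 might look like the sketch below (the function name is hypothetical). chr() replaces unichr(), and a current interpreter ships a newer Unicode database and applies full case mappings (for example "ß".upper() == "SS"), so the list it prints need not match the 15 entries above:

```python
def upper_lower_mismatches(limit=0x10000):
    """Code points below `limit` where c.upper().lower() != c.lower()."""
    return [i for i in range(limit)
            if chr(i).upper().lower() != chr(i).lower()]


mismatches = upper_lower_mismatches()
print(len(mismatches))
print(mismatches[:15])
```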

Take, for example, U+017F, LATIN SMALL LETTER LONG S. Its .lower() is
the same character, as the character is already in lower case.
Its .upper() is U+0053, LATIN CAPITAL LETTER S. Notice that the "LONG"
is gone: there is no upper-case version of a "long s".
Its .upper().lower() is U+0073, LATIN SMALL LETTER S.
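The three steps for the long s can be checked directly with the standard unicodedata module (Python 3 sketch):

```python
import unicodedata

long_s = "\u017f"
print(unicodedata.name(long_s))                  # LATIN SMALL LETTER LONG S
print(long_s.lower() == long_s)                  # True: already lower case
print(unicodedata.name(long_s.upper()))          # LATIN CAPITAL LETTER S
print(unicodedata.name(long_s.upper().lower()))  # LATIN SMALL LETTER S
```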

So should case-insensitive matching match the small s with the small
long s, as they have the same upper-case letter?
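One answer is Unicode case folding, which Python 3.3 exposes as str.casefold(): it maps the long s to a plain "s" and "ß" to "ss". A minimal sketch with a hypothetical helper name; note that full Unicode caseless matching also calls for normalization, which this sketch skips:

```python
def contains_casefold(needle, haystack):
    """Substring test after Unicode case folding (no normalization)."""
    return needle.casefold() in haystack.casefold()


print("\u017f".casefold() == "s")                      # True
print(contains_casefold("s", "Ma\u017fter"))           # True
print(contains_casefold("STRASSE", "Stra\u00dfe"))     # True
print("STRASSE".lower() in "Stra\u00dfe".lower())      # False: lower() keeps ß
```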

Regards,
Martin

Another surprise (or maybe not so surprising) - this "upper != lower"
is not symmetric. Using the inverse of your list comp, I get

>>> [i for i in range(65536) if unichr(i).lower().upper() !=
... unichr(i).upper()]
[304, 1012, 8486, 8490, 8491]
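The five code points are mostly compatibility characters such as KELVIN SIGN, whose lower case is an ordinary letter that upper-cases back to the ordinary capital. A Python 3 sketch (hypothetical function name) that reruns the search and names them:

```python
import unicodedata


def lower_upper_mismatches(limit=0x10000):
    """Code points below `limit` where c.lower().upper() != c.upper()."""
    return [i for i in range(limit)
            if chr(i).lower().upper() != chr(i).upper()]


# Show what each of the five characters from the list above round-trips to.
for cp in (304, 1012, 8486, 8490, 8491):
    c = chr(cp)
    print(cp, unicodedata.name(c), ascii(c.upper()), ascii(c.lower().upper()))
```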

Instead of 15 exceptions to the rule, conversion to upper case has only
5. So perhaps comparing .upper() results is, while not foolproof, less
likely to run into these exceptions? Or at least, it is simpler to code
explicit tests for them.
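To redo the tally on a given interpreter, a Python 3 sketch like the following counts both kinds of exception side by side (the function name is hypothetical). With a newer Unicode database and full case mappings, the counts may no longer be 15 and 5, so no particular result is assumed here:

```python
def count_round_trip_exceptions(limit=0x10000):
    """Return (exceptions hurting lower()-based matching,
               exceptions hurting upper()-based matching)."""
    lower_exceptions = sum(1 for i in range(limit)
                           if chr(i).upper().lower() != chr(i).lower())
    upper_exceptions = sum(1 for i in range(limit)
                           if chr(i).lower().upper() != chr(i).upper())
    return lower_exceptions, upper_exceptions


print(count_round_trip_exceptions())
```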

-- Paul