python3 raw strings and \u escapes
- From: "rurpy@xxxxxxxxx" <rurpy@xxxxxxxxx>
- Date: Tue, 29 May 2012 23:52:16 -0700 (PDT)
In python2, "\u" escapes are processed in raw unicode
strings. That is, ur'\u3000' is a string of length 1
consisting of the IDEOGRAPHIC SPACE unicode character.
In python3, "\u" escapes are not processed in raw strings.
r'\u3000' is a string of length 6 consisting of a backslash,
'u', '3' and three '0' characters.
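
A minimal sketch of the python3 behaviour described above (the
names raw and cooked are only for illustration):

```python
# Python 3: compare a raw literal with an ordinary one.
raw = r'\u3000'      # raw literal: the escape is left untouched
cooked = '\u3000'    # ordinary literal: the escape is processed

# The raw string holds a backslash, 'u', '3' and three '0' characters.
print(len(raw), list(raw))

# The ordinary string holds the single IDEOGRAPHIC SPACE character.
print(len(cooked), hex(ord(cooked)))
```
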
This breaks a lot of my code because in python 2
re.split (ur'[\u3000]', u'A\u3000A') ==> [u'A', u'A']
but in python 3 (the result of running 2to3),
re.split (r'[\u3000]', 'A\u3000A' ) ==> ['A\u3000A']
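
To make the difference concrete, here is a small sketch of what
the re module receives in each case. The variable names are only
illustrative. The split with the raw pattern is deliberately left
out: as far as I know, the re module in Python 3.3 and later
interprets \uXXXX escapes itself, so that result may depend on
the interpreter version.

```python
import re

text = 'A\u3000A'

# Ordinary literal: the compiler has already replaced the escape,
# so the regex engine sees '[', IDEOGRAPHIC SPACE, ']'.
cooked_pattern = '[\u3000]'

# Raw literal: the regex engine sees the brackets plus the six
# characters backslash, 'u', '3', '0', '0', '0'.
raw_pattern = r'[\u3000]'

parts = re.split(cooked_pattern, text)
print(parts)
print(len(cooked_pattern), len(raw_pattern))
```
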
I can remove the "r" prefix from the regex string but then
if I have other regex backslash symbols in it, I have to
double all the other backslashes -- the very thing that
the r-prefix was invented to avoid.
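
One possible way around the doubling, sketched here under the
assumption that splitting the literal is acceptable, is to rely
on the fact that adjacent string literals are joined at compile
time: keep the regex parts raw and put only the \u escape in an
ordinary literal. The pattern and sample text are made up for
illustration.

```python
import re

# Three adjacent literals are concatenated into one string:
# raw part, ordinary part carrying the \u escape, raw part.
pattern = r'\d+[ ' '\u3000' r']\w+'

# Sample text: digits followed by an ideographic space or an
# ASCII space, then a word.
text = '12\u3000ab 7 cd'

matches = re.findall(pattern, text)
print(matches)
```

The regex backslashes (\d, \w) stay single, and the escape
remains visible in the source instead of an invisible literal
character.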
Or I can leave the "r" prefix and replace something like
r'[ \u3000]' with r'[ 　]', typing the ideographic space
directly into the pattern. But that is confusing because
one can't tell the space character from the ideographic
space character by eye. It is also a problem if a reader
of the code doesn't have a font that can display the
character.
Was there a reason for dropping the lexical processing of
\u escapes in raw strings in python3 (other than to add another
annoyance to a long list of python3 annoyances)?
And do I have no option but to pick one of the two poor
choices I mention above to deal with this problem?