Re: string.replace non-ascii characters



Steven Bethard <steven.bethard@xxxxxxxxx> on Sun, 11 Feb 2007 22:23:59
-0700 didst step forth and proclaim thus:

Samuel Karl Peterson wrote:
Greetings Pythonistas. I have recently discovered a strange anomaly
with string.replace. Seemingly at random, it fails to handle
characters of ordinal value > 127. I ran into this problem while
downloading auction web pages from eBay and trying to replace the
"\xa0" (dec 160, nbsp char in iso-8859-1) in the string I got from
urllib2. Yet today, all is fine, no problems whatsoever. Sadly, I
did not save the exact error message, but I believe it was a
ValueError thrown on string.replace and the message was something to
the effect of "character value not within range(128)".

Was it something like this?

>>> u'\xa0'.replace('\xa0', '')
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 0: ordinal not in range(128)
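
For anyone wondering where that error comes from: when a Python 2 unicode method is handed a plain byte string, the byte string is first decoded with the default 'ascii' codec, and 0xa0 is outside that range. The sketch below is written in Python 3 syntax, where that step has to be spelled out by hand; the helper name implicit_ascii_decode is made up for illustration and is not part of any library.

```python
# Python 3 sketch of the coercion Python 2 performed silently.
raw = b"\xa0"  # a byte string, the equivalent of Python 2's '\xa0'


def implicit_ascii_decode(data):
    """Hypothetical helper: decode bytes the way Python 2 did before
    mixing them with a unicode string (default codec: ASCII)."""
    return data.decode("ascii")


try:
    implicit_ascii_decode(raw)
    error_message = None
except UnicodeDecodeError as exc:
    # Byte 0xa0 (160) is above 127, so the ASCII codec rejects it.
    error_message = str(exc)

print(error_message)
```

So the replace itself is not at fault; the failure happens while the argument is being converted to match the unicode string.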

Yeah that looks like exactly what was happening, thank you. I wonder
why I had a unicode string though. I thought urllib2 always spat out
a plain string. Oh well.

u'\xa0'.encode('latin-1').replace('\xa0', " ")
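
An alternative to encoding back to bytes is to go the other way: decode the downloaded bytes once, then do every replacement on text. A minimal sketch in Python 3 syntax, assuming the page really is iso-8859-1 as described above; the function name replace_nbsp and the sample page bytes are invented for illustration.

```python
def replace_nbsp(raw_bytes, encoding="latin-1"):
    """Decode raw page bytes, then swap non-breaking spaces for plain ones.

    The encoding default is an assumption; use whatever the server
    actually declares for the page.
    """
    text = raw_bytes.decode(encoding)
    # Both arguments are text here, so no implicit conversion is needed.
    return text.replace("\xa0", " ")


# Made-up sample data standing in for a downloaded auction page.
page = b"Current bid:\xa0US\xa0$12.50"
cleaned = replace_nbsp(page)
print(cleaned)
```

Keeping both sides of the replace the same type (both text or both bytes) is what avoids the error, whichever direction is chosen.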

Hooray.
--
Sam Peterson
skpeterson At nospam ucdavis.edu
"if programmers were paid to remove code instead of adding it,
software would be much better" -- unknown