python encoding bug?
- From: garabik-news-2005-05@xxxxxxxxxxxxxxxxxxxxxxxx
- Date: Fri, 30 Dec 2005 22:54:05 +0000 (UTC)
I was playing with python encodings and noticed this:
garabik@lancre:~$ python2.4
Python 2.4 (#2, Dec 3 2004, 17:59:05)
[GCC 3.3.5 (Debian 1:3.3.5-2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> unicode('\x9d', 'iso8859_1')
u'\x9d'
>>>
U+009D is NOT a valid Unicode character (it is not even a valid
iso8859_1 character).
The same happens if I use 'latin-1' instead of 'iso8859_1'.
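The short Python 3 sketch below is not part of the original post. It uses the modern `bytes.decode` instead of the Python 2.4 `unicode()` call above, and it helps explain why no error is raised. The latin-1 codec maps each byte value N directly to code point U+00NN, so all 256 byte values decode successfully. U+009D comes out as a C1 control character (Unicode general category `Cc`).

```python
import unicodedata

# Decode every possible byte value with the latin-1 codec.
# latin-1 is a total mapping: byte N becomes code point U+00NN,
# so decoding never raises, whatever the input bytes are.
all_bytes = bytes(range(256))
decoded = all_bytes.decode("latin-1")

assert len(decoded) == 256
assert all(ord(ch) == b for ch, b in zip(decoded, all_bytes))

# The byte from the session above decodes to U+009D, which lies in
# the C1 control range U+0080..U+009F (general category "Cc").
ch = b"\x9d".decode("latin-1")
print(hex(ord(ch)), unicodedata.category(ch))  # prints: 0x9d Cc
```

Because every byte sequence is valid latin-1, a successful decode on its own says nothing about whether the input was really latin-1.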
This caught me by surprise, since I was writing some heuristics for guessing
string encodings, and 'iso8859_1' gave no errors even when the input
was in a different encoding.
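The following Python 3 sketch, which is not from the original post, shows one way to keep latin-1 from "winning" every guess. `guess_encoding` and its default candidate list are hypothetical. It rests on the assumption that C1 control characters (U+0080 to U+009F) almost never appear in real text, so a decode that produces them is treated as a failure.

```python
# Hypothetical helper: try candidate encodings in order and return the
# name of the first one that decodes cleanly. A decode that succeeds but
# yields C1 control characters (U+0080..U+009F) is also rejected, since
# latin-1 otherwise accepts any byte sequence at all.
def guess_encoding(data, candidates=("utf-8", "latin-1", "cp1252")):
    for enc in candidates:
        try:
            text = data.decode(enc)
        except UnicodeDecodeError:
            continue  # not valid in this encoding, try the next one
        if any("\x80" <= c <= "\x9f" for c in text):
            continue  # decoded, but to C1 controls: probably wrong
        return enc
    return None  # no candidate produced plausible text


print(guess_encoding(b"caf\xc3\xa9"))     # utf-8
print(guess_encoding(b"caf\xe9"))         # latin-1
print(guess_encoding(b"\x93quoted\x94"))  # cp1252 (smart quotes)
print(guess_encoding(b"\x9d"))            # None
```

The order of the candidates matters. In this sketch, cp1252 is tried after latin-1, so it is chosen only for inputs whose latin-1 reading contains C1 controls. Python's cp1252 codec also leaves a few bytes, including 0x9D, undefined, so `b"\x9d"` is rejected by every candidate.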
Is this known behaviour, or have I discovered a terrible, unknown bug in Python's encoding
implementation that should be reported and fixed immediately? :-)
Happy new year,
--
-----------------------------------------------------------
| Radovan Garabík http://kassiopeia.juls.savba.sk/~garabik/ |
| __..--^^^--..__ garabik @ kassiopeia.juls.savba.sk |
-----------------------------------------------------------
Antivirus alert: file .signature infected by signature virus.
Hi! I'm a signature virus! Copy me into your signature file to help me spread!