Removing Unicode from Python?

From: Paradox (JoeyTaj_at_netzero.com)
Date: 10/30/03


Date: 29 Oct 2003 23:12:39 -0800

In general I love Python for text manipulation, but at our company we
need to manipulate large text values stored in either a SQL Server
database or text files. This data is stored in a "text" field type and
is definitely not Unicode, though it is often very strange text since
it comes from either OCR or some kind of electronic file extraction.
Unfortunately, when it is retrieved into a string in Python it is
invariably a unicode type string. The best I can do is try to encode
it to 'latin-1', but that will often throw an error, and if I use the
ignore parameter it will whack my data with a bunch of "?". I just
don't understand why Python thinks this stuff is Unicode and why it is
failing on conversion. There is no way a byte can be outside the range
0 to 255, right? This problem can be so haunting that I start to wish
I had coded the solution in VB, where at least a string is a string is
a string. Is there a way to modify Python so that all strings will
always be single-byte strings, since we have no need for Unicode
support? Any solutions or suggestions for my biggest Python annoyance
would be greatly appreciated.

                Thanks Joey
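
A minimal sketch of the failure described above, offered as one possible
explanation rather than a diagnosis. It assumes the database driver decodes
the bytes with a Windows code page such as cp1252 (not stated in the post),
so that characters like curly quotes become code points above 255, which
'latin-1' cannot represent. The sample text is made up, and the syntax is
Python 3, where str plays the role of the unicode type discussed here.

```python
# Hypothetical sample: e-acute (U+00E9) fits in latin-1, but the curly
# quotes (U+201C, U+201D) have code points above 255 and do not.
text = "caf\u00e9 \u201cquoted\u201d"

# A strict encode fails on the first character latin-1 cannot represent.
try:
    text.encode("latin-1")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# errors="replace" substitutes "?" for each unencodable character.
replaced = text.encode("latin-1", errors="replace")

# errors="ignore" silently drops each unencodable character.
ignored = text.encode("latin-1", errors="ignore")

# Assumption: if the data really came from a cp1252 source, encoding back
# to cp1252 recovers one byte per character without loss.
cp1252_bytes = text.encode("cp1252")

print(strict_failed)   # True
print(replaced)        # b'caf\xe9 ?quoted?'
print(ignored)         # b'caf\xe9 quoted'
print(cp1252_bytes)    # b'caf\xe9 \x93quoted\x94'
```

Every byte is indeed between 0 and 255, but a unicode string holds code
points rather than bytes, so the encode step only succeeds when the chosen
codec has a byte for every character in the string.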


