Psycopg and queries with UTF-8 data

From: Alban Hertroys (alban_at_magproductions.nl)
Date: 10/14/04


Date: Thu, 14 Oct 2004 12:00:45 +0200

Another python/psycopg question, for which the solution is probably
quite simple; I just don't know where to look.

I have a query that inserts data originating from a UTF-8 encoded XML
file. And guess what, it contains UTF-8 encoded (non-ASCII) characters...
My problem is that psycopg only accepts queries of type str, so
how do I get my UTF-8 encoded data into the DB?

I can't use query.encode('ascii'); that fails in the same way as:
>>> x = u'\xc8'
>>> print x.encode('ascii')
Traceback (most recent call last):
   File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\xc8' in position 0: ordinal not in range(128)

I also tried setting PostgreSQL's client encoding by executing
"SET client_encoding TO 'utf-8'", but psycopg still only accepts str-type
strings (which is not really surprising).
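
For comparison with the ascii attempt above, here is a minimal sketch of encoding the same character with the 'utf-8' codec instead. It only demonstrates the standard unicode.encode() call; the query text is a made-up placeholder, and whether psycopg accepts the resulting byte string once client_encoding is set to UTF-8 is an assumption that has not been verified here.

```python
# Sketch only: the 'utf-8' codec can represent characters that 'ascii' cannot.
text = u'\xc8'  # same character as in the traceback above (U+00C8)

try:
    text.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    # This is the failure shown in the traceback above.
    ascii_ok = False

# Encoding to UTF-8 yields a byte string (type str on Python 2, bytes on Python 3).
encoded = text.encode('utf-8')

# Hypothetical query text, for illustration only.
query = u"SELECT '\xc8'"
query_bytes = query.encode('utf-8')

print(ascii_ok)
print(repr(encoded))
print(repr(query_bytes))
```

On Python 2 the encoded values are ordinary str objects, which is the type psycopg is said to require above.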

I assume that the solution will result in an ASCII-encoded query string,
and that I can then use the QuotedString type to escape my strings
(which is not possible in my current situation, because QuotedString also
only accepts str-type strings and my data contains UTF-8 characters).

Regards,
Alban.


