Re: non standard path characters



thanks for that. I guess the problem is that when a path is obtained
from such an object the code that gets the path usually has no way of
knowing what the intended use is. That makes storage as simple bytes
hard. I guess the correct way is to always convert to a standard (say
utf8) and then always know the required encoding when the thing is to be
used.

Inside the program itself, the best thing is to represent path names
as Unicode strings as early as possible; later, information about the
original encoding may be lost.

If you obtain path names from the os module, pass Unicode strings
to listdir in order to get back Unicode strings. If they come from
environment variables or command line arguments, use
locale.getpreferredencoding() to find out what the encoding should
be.
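
A minimal sketch of those two cases, written for Python 3, where a
"Unicode string" is simply str. The helper names list_names_as_text
and decode_external are made up for illustration; only os.listdir,
os.fsdecode and locale.getpreferredencoding are real library calls.

```python
import locale
import os


def list_names_as_text(directory):
    """Return the entries of `directory` as text (Unicode) strings."""
    # Passing a text path makes listdir hand back text names;
    # os.fsdecode turns a bytes path into text first, if needed.
    return os.listdir(os.fsdecode(directory))


def decode_external(raw, encoding=None):
    """Decode bytes taken from the environment or the command line."""
    if encoding is None:
        # Assumption: the bytes were produced under the user's locale.
        encoding = locale.getpreferredencoding()
    return raw.decode(encoding)
```

decode_external raises UnicodeDecodeError when the bytes do not match
the assumed encoding, so callers still have to be ready for that.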

If they come from a zip file, Tijs already explained what the encoding
is.

Always expect encoding errors; if they occur, choose to either skip
the file name, or report an error to the user. Notice that listdir
may return a byte string if decoding fails (this may only happen
on Unix).
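
The skip-or-report choice could be sketched roughly like this,
assuming the names arrive as byte strings (for example from passing
a byte-string path to listdir). The function name decode_names and
the utf-8 default are illustrative assumptions, not part of any
library.

```python
def decode_names(raw_names, encoding="utf-8"):
    """Split byte-string names into decoded text and undecodable leftovers."""
    decoded = []
    skipped = []
    for raw in raw_names:
        try:
            decoded.append(raw.decode(encoding))
        except UnicodeDecodeError:
            # Keep the raw bytes so the caller can skip the name
            # or report it to the user.
            skipped.append(raw)
    return decoded, skipped
```

For instance, decode_names(os.listdir(b".")) would give the names
that decode cleanly plus the list of those that do not, and the
program can then decide whether to ignore or report the latter.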

Regards,
Martin


