Re: i18n: looking for expertise

From: klappnase (klappnase_at_web.de)
Date: 03/10/05


Date: 10 Mar 2005 07:58:52 -0800


"stewart.midwinter@gmail.com" <stewart.midwinter@gmail.com> wrote in message news:<1110384648.779852.81580@o13g2000cwo.googlegroups.com>...
> Michael:
>
> on my box, (winXP SP2), sys.getfilesystemencoding() returns 'mbcs'.

Oh, from reading the docs I had thought XP would use unicode:

* On Windows 9x, the encoding is "mbcs".
* On Mac OS X, the encoding is "utf-8".
* On Unix, the encoding is the user's preference according to the
  result of nl_langinfo(CODESET), or None if the nl_langinfo(CODESET)
  failed.
* On Windows NT+, file names are Unicode natively, so no conversion is
  performed.

Maybe returning 'mbcs' on XP is for compatibility between different
Windows flavors.
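
A quick way to see what a given machine reports is to print the filesystem encoding next to the locale's preferred encoding. This is only a sketch: it uses nothing beyond the standard sys and locale modules, the helper name report_encodings is made up for illustration, and it is written for a current Python rather than 2.2.

```python
import locale
import sys


def report_encodings():
    """Return the encodings Python reports for filenames and for the locale."""
    return {
        # Encoding Python uses to convert between text and OS filenames.
        "filesystem": sys.getfilesystemencoding(),
        # Encoding suggested by the user's locale settings.
        "locale": locale.getpreferredencoding(),
    }


if __name__ == "__main__":
    for name, value in sorted(report_encodings().items()):
        print("%-10s %s" % (name, value))
```

If the two values differ on a box, that is the case where filenames and other strings would need separate handling.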

>
> If you post your revised solution to this unicode problem, I'd be
> delighted to test it on Windows. I'm working on a Tkinter front-end
> for Vivian deSmedt's rsync.py and would like to address the issue of
> accented characters in folder names.
>
> thanks
> Stewart
> stewart dot midwinter at gmail dot com

I wrote it for use with linux only, and it looks like using the system
encoding as I try to guess it in my UnicodeHandler module (see the
first post) is fine there.

If on Windows the filesystem encoding differs from what I get in
UnicodeHandler.sysencoding, I guess I would have to define separate
convenience methods for decoding/encoding filenames, with sysencoding
replaced by sys.getfilesystemencoding(). (I found the need for these
convenience methods when I discovered that some of the strings I used
were sometimes unicode and sometimes not; I have a lot of interactions
between several modules, which makes it hard to track which kind I have
at any given point.)
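
A minimal sketch of what such filename helpers might look like, assuming they are built on sys.getfilesystemencoding(). This is not the actual UnicodeHandler code (which is not shown in this post); the UTF-8 fallback and the "replace" error handling are assumptions, and the sketch targets a current Python where byte strings are of type bytes.

```python
import sys

# Assumption: fall back to UTF-8 if Python cannot report a filesystem encoding.
FS_ENCODING = sys.getfilesystemencoding() or "utf-8"


def fsencode(name, encoding=FS_ENCODING):
    """Return `name` as a byte string, encoding it only if it is text."""
    if isinstance(name, bytes):
        return name  # already bytes, leave untouched
    return name.encode(encoding)


def fsdecode(name, encoding=FS_ENCODING):
    """Return `name` as text, decoding it only if it is a byte string."""
    if isinstance(name, bytes):
        # "replace" keeps undecodable names displayable instead of raising.
        return name.decode(encoding, "replace")
    return name  # already text, leave untouched
```

Because both helpers accept either kind of string, callers do not have to know which kind they are holding, which is the situation described above.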

Tk seems to be pretty smart on handling unicode, so using unicode for
everything that's displayed on tk widgets should be ok (I hope).

So filling a listbox with the contents of a directory "pathname" looks
like this:

# make sure it's a byte string, for python2.2 compatibility
pathname = fsencode(pathname)
flist = map(fsdecode, os.listdir(pathname))
flist.sort()
for item in flist:
    listbox.insert('end', item)
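
The listing step can also be tried without Tk. The sketch below is a stand-in under stated assumptions: list_dir_as_text is a hypothetical name, it decodes with sys.getfilesystemencoding() directly rather than through the UnicodeHandler helpers, and it returns the sorted names instead of inserting them into a listbox.

```python
import os
import sys


def list_dir_as_text(pathname):
    """Return the sorted entries of `pathname` as text strings."""
    encoding = sys.getfilesystemencoding() or "utf-8"  # assumed fallback
    names = []
    for entry in os.listdir(pathname):
        if isinstance(entry, bytes):
            # A byte-string path yields byte-string names: decode for display.
            entry = entry.decode(encoding, "replace")
        names.append(entry)
    names.sort()
    return names
```

The returned list could then be fed to listbox.insert('end', item) one item at a time, as in the loop above.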

For file operations I have written a separate module which defines
convenience methods like these:

##########################################

def remove_ok(self, filename, verbose=1):
    b, u = fsencode(filename), fsdecode(filename)
    if not os.path.exists(b):
        if verbose:
            # popup a dialog box, similar to tkMessageBox
            MsgBox.showerror(parent=self.parent,
                             message=_('File not found:\n"%s"') % u)
        return 0
    elif os.path.isdir(b):
        if verbose:
            MsgBox.showerror(parent=self.parent,
                             message=_('Cannot delete "%s":\nis a directory') % u)
        return 0
    # dirname() is '' for a bare filename, so fall back to the current directory
    if not os.access(os.path.dirname(b) or os.curdir, os.W_OK):
        if verbose:
            MsgBox.showerror(parent=self.parent,
                             message=_('Cannot delete "%s":\npermission denied.') % u)
        return 0
    return 1
    
def remove(self, filename, verbose=1):
    b, u = fsencode(filename), fsdecode(filename)
    if self.remove_ok(filename, verbose=verbose):
        try:
            os.remove(b)
            return 1
        except OSError:
            if verbose:
                MsgBox.showerror(parent=self.parent,
                                 message=_('Cannot delete "%s":\npermission denied.') % u)
    return 0

###################################
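
The checks in the methods above can be exercised without any dialogs by separating "why can't this be removed?" from the message box. The following is a simplified sketch, not the module itself: removal_problem and remove_quietly are hypothetical names, the reason strings are placeholders, and paths are passed straight through without fsencode()/fsdecode().

```python
import os


def removal_problem(path):
    """Return a short reason why `path` cannot be removed, or None if it can."""
    if not os.path.exists(path):
        return "not found"
    if os.path.isdir(path):
        return "is a directory"
    # Deleting a file needs write permission on its parent directory.
    parent = os.path.dirname(path) or os.curdir
    if not os.access(parent, os.W_OK):
        return "permission denied"
    return None


def remove_quietly(path):
    """Remove `path`; return 1 on success and 0 on failure."""
    if removal_problem(path) is not None:
        return 0
    try:
        os.remove(path)
    except OSError:
        # The file may have vanished or become read-only since the check.
        return 0
    return 1
```

A GUI wrapper would then only have to map the returned reason to a translated message and show it when verbose is set.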

It looks like you don't need to do any encoding of filenames, however,
if you use python2.3 (at least as long as you don't have to call
os.access()), but I want my code to run with python2.2, too.
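
The "no manual encoding" behaviour can be checked with a small round trip that only ever uses text paths. This sketch was written against a current Python, so it does not prove anything about 2.3 specifically; roundtrip_text_filename is a hypothetical name, and whether an accented name works still depends on the filesystem encoding of the machine.

```python
import os
import shutil
import tempfile


def roundtrip_text_filename(name):
    """Create, list and remove a file using text (unicode) paths only.

    Returns (listed_names, existed_before_removal).
    """
    directory = tempfile.mkdtemp()  # scratch directory, removed at the end
    try:
        path = os.path.join(directory, name)
        open(path, "w").close()         # text path passed straight through
        listed = os.listdir(directory)  # text path in, text names out
        existed = os.path.exists(path)
        os.remove(path)
        return listed, existed
    finally:
        shutil.rmtree(directory, ignore_errors=True)
```

Calling it with a name that contains accented characters is a quick test of whether a given platform needs the explicit encode/decode helpers at all.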

I hope this answers your question. Unfortunately I cannot post all of
my code here, because it's quite a lot of files, but the basic concept
is still the same as I wrote in the first post.

Best regards

Michael


