Re: LANG, locale, unicode, setup.py and Debian packaging



I would advise against such a strategy. Instead, you should first
understand what the encodings of the file names actually *are*, on
a real system, and draw conclusions from that.
I don't follow you here. The file names *on* a real system are
(for Linux) byte strings of potentially *any* encoding.

No. On a real system, nothing is potential, but everything is actual.

So on *your* system, today: what encoding are the filenames encoded in?
We are not talking about arbitrary files, right, but about font files?
What *actual* file names do these font files have?

On my system, all font files have ASCII-only file names, even if they
are for non-ASCII characters.
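
To answer that question on your own machine, a small script can list the names that are not pure ASCII. This is only a sketch in Python 3 terms; the function name non_ascii_names is an assumption made for illustration, not something from this thread.

```python
import os


def non_ascii_names(directory):
    """Return the raw (bytes) file names in `directory` that are not pure ASCII."""
    # Passing a bytes path makes os.listdir() return bytes names,
    # so no decoding happens behind our back.
    raw_dir = os.fsencode(directory)
    result = []
    for name in os.listdir(raw_dir):
        try:
            name.decode("ascii")
        except UnicodeDecodeError:
            # Not ASCII; what encoding it really is remains unknown.
            result.append(name)
    return sorted(result)
```

Call it on whatever directory holds your fonts. Note that os.listdir() does not descend into subdirectories, so a real font tree would need os.walk() as well.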

os.listdir() may even
fail to grok some of them. So, I will have a few elements in a list that are
not unicode, I can't ask the O/S for any help and therefore I should be able
to pass that byte string to a function as suggested in the article to at
least take one last stab at identifying it.

It won't identify it. It will just give you *some* Unicode string.
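
To illustrate (a sketch in Python 3 syntax, with a made-up file name): the same bytes decode without complaint under two different single-byte encodings, and nothing in the bytes tells you which result, if either, is the one the author of the file meant.

```python
raw = b"M\xd6gul"  # a file name as bytes; its true encoding is unknown

as_latin1 = raw.decode("latin-1")  # byte 0xD6 becomes U+00D6
as_cp1251 = raw.decode("cp1251")   # byte 0xD6 becomes U+0426

# Both calls succeed, and they disagree with each other.
print(ascii(as_latin1), ascii(as_cp1251))
```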

Or is that a waste of time because os.listdir() has already tried something
similar (and prob. better)?

"better" is a difficult notion here. Is it better to produce some
result, possibly incorrect, or is it better to give up?
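
The two options can be put side by side. This is a minimal sketch in Python 3 syntax; the helper names and the choice of UTF-8 as the default are assumptions for illustration only.

```python
def decode_or_give_up(data, encoding="utf-8"):
    """Strict decoding: return None when the bytes do not fit the encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


def decode_somehow(data, encoding="utf-8"):
    """Lenient decoding: undecodable bytes become U+FFFD, so a result always comes back."""
    return data.decode(encoding, "replace")


raw = b"M\xd6gul"  # not valid UTF-8
print(decode_or_give_up(raw))      # gives up
print(ascii(decode_somehow(raw)))  # some result, with a replacement character
```

Which of the two is appropriate depends on what the application does with the name afterwards: a string with replacement characters can be displayed, but it can no longer be used to open the file.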

I forgot to mention the command-line interface... I actually had trouble with
that too. The user can start the app like this:
fontypython /some/folder/
or
fontypython SomeFileName
And that introduces input in some kind of encoding. I hope that
locale.getpreferredencoding() will be the right one to handle that.

If the user has set up his machine correctly: yes.
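
A sketch of what that could look like, assuming the argument arrives as a byte string (as it does in Python 2; in Python 3, sys.argv is already decoded for you). The helper name decode_argument is hypothetical.

```python
import locale


def decode_argument(raw, encoding=None):
    """Decode a command-line argument given as bytes.

    Falls back to the locale's preferred encoding, which is only the
    right choice if the user's locale matches what the shell sends.
    """
    if encoding is None:
        encoding = locale.getpreferredencoding(False)
    # Strict decoding: a mismatch raises UnicodeDecodeError instead of
    # silently producing a wrong name.
    return raw.decode(encoding)
```

For example, decode_argument(b"/some/folder/") returns the text "/some/folder/" under any ASCII-compatible locale encoding.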

I see no problem with that:
>>> u"M\xd6gul".encode("ascii","ignore")
'Mgul'
>>> u"M\xd6gul".encode("ascii","replace")
'M?gul'
Well, that was what I expected to see too. I must have been doing something
stupid.

Most likely, you did not invoke .encode on a Unicode string.
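
The reason this matters: in Python 2, calling .encode on a byte string first decodes it implicitly as ASCII, so a non-ASCII byte raises UnicodeDecodeError before the "ignore" or "replace" handler is ever consulted. The sketch below uses Python 3 syntax, where the two types are kept apart and byte strings have no .encode method at all.

```python
text = "M\xd6gul"            # a Unicode (text) string
raw = text.encode("latin-1")  # the same name as a byte string

ignored = text.encode("ascii", "ignore")    # drops the non-ASCII character
replaced = text.encode("ascii", "replace")  # substitutes "?" for it

# Only text can be encoded; bytes can only be decoded.
bytes_can_encode = hasattr(raw, "encode")

print(ignored, replaced, bytes_can_encode)
```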

Regards,
Martin