Re: A Unicode problem -HELP
- From: "Martin v. Löwis" <martin@xxxxxxxxxxx>
- Date: Wed, 17 May 2006 08:08:26 +0200
manstey wrote:
> a=str(word_info + parse + gloss).encode('utf-8')
> a=a[1:len(a)-1]
>
> Is this clearer?
Indeed. The problem is your use of str() to "render" the output.
As word_info+parse+gloss is a list (or is it a tuple?), str() will
already produce "Python source code", i.e. an ASCII byte string
that can be read back into the interpreter. Every non-ASCII
character is replaced by an escape sequence, so all Unicode is gone
from that string before encode() is even called. If you want
comma-separated output, you should do this:
def comma_separated_utf8(items):
    result = []
    for item in items:
        result.append(item.encode('utf-8'))
    return ", ".join(result)
and then
a = comma_separated_utf8(word_info + parse + gloss)
Then you don't have to drop the parentheses from a anymore, as
it won't have parentheses in the first place.
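
To make the difference concrete, here is a minimal sketch written for Python 3, where ascii() is used as a stand-in for what str() did to a list in Python 2. The sample items are hypothetical placeholders for word_info + parse + gloss, and the helper joins byte strings with b", " because Python 3 keeps bytes and text separate.

```python
def comma_separated_utf8(items):
    # Encode each text item on its own, then join the resulting byte strings.
    return b", ".join(item.encode("utf-8") for item in items)

# Hypothetical sample data standing in for word_info + parse + gloss.
items = ["caf\u00e9", "na\u00efve", "gloss"]

# ascii() escapes every non-ASCII character and keeps brackets and quotes,
# which is roughly what str() on a list produced in Python 2.
rendered = ascii(items)

# Joining the encoded items keeps the real characters (as UTF-8 bytes)
# and adds no brackets or quotes.
joined = comma_separated_utf8(items)

print(rendered)
print(joined)
```

The rendered form contains only escapes such as \xe9, while the joined form holds the UTF-8 bytes of the original characters, so nothing has to be sliced off afterwards.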
As the output file already does the encoding when you write to it,
the following should also work:
a = u", ".join(word_info + parse + gloss)
This would make "a" a comma-separated unicode string, so that
the subsequent output_file.write(a) encodes it as UTF-8.
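
A small sketch of this second approach follows. It assumes the output file was opened with an explicit encoding (io.open is used here; codecs.open behaves similarly for this purpose). The file path and the sample items are hypothetical.

```python
import io
import os
import tempfile

# Hypothetical sample data standing in for word_info + parse + gloss.
items = [u"caf\u00e9", u"gloss"]

# Join as text; no encoding happens at this point.
a = u", ".join(items)

# Hypothetical output location; the file object encodes to UTF-8 on write.
path = os.path.join(tempfile.mkdtemp(), "out.txt")
with io.open(path, "w", encoding="utf-8") as output_file:
    output_file.write(a)

# Read the raw bytes back to see what actually landed on disk.
with io.open(path, "rb") as f:
    raw = f.read()

print(repr(raw))
```

Here the encoding step lives in exactly one place, the file object, which avoids encoding the same data twice by accident.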
If that doesn't work, I would like to know the exact value
of gloss; do
print "GLOSS IS", repr(gloss)
to print it out.
Regards,
Martin