Re: From python to LaTeX in emacs on windows

From: Benjamin Niemann (b.niemann_at_betternet.de)
Date: 08/30/04


Date: Mon, 30 Aug 2004 13:55:18 +0200

Brian Elmegaard wrote:
> Hi group
>
> I hope this is not a faq...
>
> I am trying to understand how to use the new way of specifying a file's
> encoding, but no matter what I do I get strange characters in the
> output.
>
> I have a text file which I have generated in python by parsing some
> html.
>
> In the file there are international characters like é and ó.
> In emacs I can see that the file is encoded as
> mule-utf-8-dos
>
> I read the file into python as a string and suddenly the characters,
> when printed, look strange and consist of two characters.
>
> First problem: How do I avoid this?
>
> The second problem is that I make some string replacements and more in
> the string to write a latex output file. When I open this file in
> emacs, the characters are no longer the same.
>
> Second problem: How do I avoid this?

When you read the file contents in python, you get the "raw" byte
sequence; in this case it is the UTF-8 encoding of unicode text. In UTF-8,
characters like é and ó take two bytes each, so when those bytes are
printed or interpreted as a single-byte encoding, each one shows up as two
strange characters. But you probably want a unicode string. Use
"text = unicode(data, 'utf-8')", where "data" is the file content you read.
After processing you probably want to write it back to a file. Before you
do this, you have to convert the unicode string back to a byte sequence.
Use "data = text.encode('utf-8')".
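A minimal sketch of that decode/process/encode round trip, assuming modern Python 3, where the built-in `str` is the unicode type and `unicode(data, 'utf-8')` is spelled `data.decode('utf-8')`. The function name `convert_file`, the file names and the `&` to `\&` replacement are hypothetical stand-ins for the original poster's processing, not anything from the thread.

```python
import os
import tempfile


def convert_file(src_path, dst_path, replacements):
    """Read a UTF-8 file, apply replacements, write the result as UTF-8.

    `replacements` is a hypothetical list of (old, new) string pairs that
    stands in for whatever processing produces the LaTeX output.
    """
    # Binary mode: `data` is the raw byte sequence, not text.
    with open(src_path, "rb") as f:
        data = f.read()

    # Python 3 spelling of unicode(data, 'utf-8'): bytes -> str.
    text = data.decode("utf-8")

    # Work on the decoded text, where 'é' is a single character.
    for old, new in replacements:
        text = text.replace(old, new)

    # Encode back to bytes (str -> bytes) before writing.
    with open(dst_path, "wb") as f:
        f.write(text.encode("utf-8"))
    return text


if __name__ == "__main__":
    # The symptom from the question: the two UTF-8 bytes of 'é',
    # decoded with the wrong codec, show up as two characters.
    print("é".encode("utf-8").decode("latin-1"))  # prints Ã©

    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "input.txt")
        dst = os.path.join(tmp, "output.tex")
        with open(src, "wb") as f:
            f.write("café & señor".encode("utf-8"))
        # Hypothetical LaTeX-style replacement: escape the ampersand.
        print(convert_file(src, dst, [("&", r"\&")]))  # prints café \& señor
```

Reading with "rb" and writing with "wb" also leaves the CRLF line endings implied by Emacs's `-dos` suffix untouched, since no newline translation happens in binary mode.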

Handling character encodings correctly *is* difficult. It's no shame if
you don't get it right on the first attempt.


