urwid with multi-byte encoded and bidirectional text?

From: Ian Ward (ian.rmthispart_at_excess.org)
Date: 11/05/04


Date: Thu, 04 Nov 2004 23:17:07 -0500
To: python-list@python.org

I hope to add support for multi-byte encoded and bidirectional text to
my curses-based UI library:
http://excess.org/urwid/

I would like to support whatever encoding the user likes. Are there
functions for:
- querying the preferred encoding
- splitting encoded strings into characters based on an encoding
- determining the direction (L to R, R to L) of each character
- determining the number of columns used by each character when written
to the terminal
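A minimal sketch of what the four items above could look like with the standard `locale` and `unicodedata` modules in modern Python 3. The helper names (`preferred_encoding`, `split_chars`, `direction`, `column_width`) are hypothetical. `column_width` is only a heuristic stand-in for a real `wcwidth()`; East Asian "Ambiguous" characters in particular render at different widths on different terminals.

```python
import locale
import unicodedata


def preferred_encoding():
    # Encoding requested by the user's locale settings (e.g. 'UTF-8').
    # Passing False avoids calling setlocale() as a side effect.
    return locale.getpreferredencoding(False)


def split_chars(data, encoding):
    # Decoding turns each multi-byte sequence into one character.
    return list(data.decode(encoding))


def direction(ch):
    # Unicode bidi class: 'L', 'R', 'AL', 'EN', 'WS', 'ON', ...
    bidi = unicodedata.bidirectional(ch)
    if bidi in ("R", "AL"):
        return "RTL"
    if bidi == "L":
        return "LTR"
    return "neutral"  # weak/neutral (digits, spaces, punctuation): resolved from context


def column_width(ch):
    # Heuristic: combining marks use 0 columns, East Asian Wide and
    # Fullwidth characters use 2, everything else 1.
    if unicodedata.combining(ch):
        return 0
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


data = "a\u4e2d\u05d0".encode("utf-8")   # 'a', a CJK ideograph, Hebrew alef
chars = split_chars(data, "utf-8")
print([(c, direction(c), column_width(c)) for c in chars])
```

Here the CJK ideograph is three bytes in UTF-8 but one character and two columns wide, which is exactly the N-bytes/M-columns mismatch a new layout structure has to track.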

I currently use a "line translation" structure to store instructions
for mapping a text string to a two-dimensional "canvas". Its current,
simple format is described here:
http://excess.org/urwid/reference.html#Text-get_line_translation

The line translation structures describe the result of
word-wrapping/clipping and justification applied to the source text. A
*new* line translation format would have to support characters that are
N bytes in the string and M columns wide when displayed, as well as text
that is displayed in a different order than it appears in the string.
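One possible shape for such a format, as a sketch only: the `Segment` record and `wrap_by_columns` function below are hypothetical and are not urwid's actual line translation format. Each segment records how many screen columns it occupies and which byte range of the encoded source it covers, so N-byte, M-column characters are never split. This version only breaks at character boundaries (no word-wrapping) and assumes a stateless encoding such as UTF-8, where encoding one character at a time gives correct byte counts.

```python
import unicodedata
from typing import NamedTuple


class Segment(NamedTuple):
    # Hypothetical segment: `columns` screen cells cover bytes [start, end).
    columns: int
    start: int
    end: int


def char_width(ch):
    # Same heuristic as wcwidth(): 0 for combining marks, 2 for wide, else 1.
    if unicodedata.combining(ch):
        return 0
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def wrap_by_columns(data, encoding, width):
    """Split encoded `data` into lines of at most `width` columns.

    Returns one list of Segments per line. A character wider than
    `width` is placed on a line of its own rather than split.
    """
    text = data.decode(encoding)
    lines = []
    line_start = 0  # byte offset where the current line begins
    offset = 0      # byte offset of the current character
    cols = 0        # columns used so far on the current line
    for ch in text:
        nbytes = len(ch.encode(encoding))
        w = char_width(ch)
        if cols + w > width and cols > 0:
            lines.append([Segment(cols, line_start, offset)])
            line_start, cols = offset, 0
        cols += w
        offset += nbytes
    if offset > line_start or not lines:
        lines.append([Segment(cols, line_start, offset)])
    return lines


for segs in wrap_by_columns("ab\u4e2dcd".encode("utf-8"), "utf-8", 3):
    print(segs)
```

Because segments hold byte offsets, a caller can slice the original encoded string directly; a fuller version would allow several segments per line so that reordered text can be described too.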

Is normalizing bidirectional text orthogonal to wrapping/clipping and
aligning that text? Could I create a "direction translation" structure
that describes how a given string can be reordered Left-to-Right, then
solve the wrapping and alignment with this normalized version?
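A rough sketch of a "direction translation" for a single line, under heavy simplifying assumptions: it is NOT the Unicode Bidirectional Algorithm (UAX #9), which also handles embedding levels, explicit controls and numbers. As far as I understand UAX #9, embedding levels are resolved over the whole paragraph but the visual reordering is applied per line after line breaking, so the hypothetical `visual_order` below takes one already-wrapped line. It returns a list mapping each visual position to a logical character index.

```python
import unicodedata


def strong_dir(ch):
    # Collapse the Unicode bidi class to 'L', 'R', or None (weak/neutral).
    bidi = unicodedata.bidirectional(ch)
    if bidi in ("R", "AL"):
        return "R"
    if bidi == "L":
        return "L"
    return None


def visual_order(line):
    """Map visual position -> logical index for one already-wrapped line.

    Simplified model: the paragraph is assumed LTR, and each run from an
    RTL character to the last RTL character before the next LTR character
    is reversed (neutrals inside the run go with it). Digits are treated
    as neutrals, so numbers inside RTL text come out reversed.
    """
    dirs = [strong_dir(c) for c in line]
    order = list(range(len(line)))
    i = 0
    while i < len(line):
        if dirs[i] != "R":
            i += 1
            continue
        last_r = j = i
        # Extend the run until just before the next strong LTR character.
        while j + 1 < len(line) and dirs[j + 1] != "L":
            j += 1
            if dirs[j] == "R":
                last_r = j
        order[i:last_r + 1] = order[i:last_r + 1][::-1]
        i = last_r + 1
    return order


line = "ab \u05d0\u05d1\u05d2 cd"    # Hebrew alef, bet, gimel
order = visual_order(line)
print(order)                          # [0, 1, 2, 5, 4, 3, 6, 7, 8]
print("".join(line[k] for k in order))
```

The space between "ab" and the Hebrew run stays in place, while the three Hebrew letters are drawn in reverse logical order.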

In what situations are characters modified/removed/inserted as part of
displaying them? (e.g. punctuation being reversed when surrounding R to L
text)
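One case of characters being modified on display is bidi mirroring: characters such as parentheses are drawn as their mirror image when they resolve to right-to-left. A minimal sketch, assuming direction has already been resolved: `unicodedata.mirrored()` reports whether a character mirrors, but the standard library does not provide the mirror-pair table, so `MIRROR_PAIRS` below is a small hand-written subset of Unicode's BidiMirroring.txt and `display_char` is a hypothetical helper. Other modifications, such as Arabic contextual shaping, are not covered here.

```python
import unicodedata

# Small hand-written subset of the Unicode BidiMirroring.txt pairs; the
# stdlib reports *whether* a character mirrors but not its mirror image.
MIRROR_PAIRS = {
    "(": ")", ")": "(",
    "[": "]", "]": "[",
    "{": "}", "}": "{",
    "<": ">", ">": "<",
}


def display_char(ch, is_rtl):
    """Return the character to draw for `ch` once its direction is resolved."""
    if is_rtl and unicodedata.mirrored(ch):
        # Mirrored characters missing from the subset above are drawn unchanged.
        return MIRROR_PAIRS.get(ch, ch)
    return ch


print([display_char(c, is_rtl=True) for c in "(a)"])   # [')', 'a', '(']
```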

TIA

Ian Ward <ian#excess,org>


