Re: How to find number of characters in a unicode string?




Lawrence D'Oliveiro wrote:
> In message <pan.2006.09.18.20.29.20.510034@xxxxxxx>, Marc 'BlackJack'
> Rintsch wrote:
>
>> In <20060918221814.08625ea2.randhol+valid_for_reply_from_news@xxxxxxx>,
>> Preben Randhol wrote:
>>
>>> Is there a way to calculate in characters
>>> and not in bytes to represent the characters.
>>
>> Decode the byte string and use `len()` on the unicode string.
>
> Hmmm, for some reason
>
>     len(u"C\u0327")
>
> returns 2.
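A minimal sketch of the quoted advice and of why it returns 2, written for Python 3 (the thread predates it, so `str` plays the role of the old `unicode` type). Decoding turns bytes into code points, and `len()` counts code points, so "C" followed by U+0327 COMBINING CEDILLA counts as two. The NFC normalization step is an addition not mentioned in the thread: it merges the pair into the precomposed U+00C7, but it only helps when Unicode defines a precomposed character for the sequence.

```python
import unicodedata

# "Ç" spelled as C + U+0327 COMBINING CEDILLA, encoded as UTF-8 bytes
raw = "C\u0327".encode("utf-8")

text = raw.decode("utf-8")   # bytes -> str (a sequence of code points)
print(len(raw))              # 3 bytes: 'C' is 1 byte, U+0327 is 2 bytes
print(len(text))             # 2 code points: 'C' and U+0327

# NFC composes C + U+0327 into the single precomposed code point U+00C7
composed = unicodedata.normalize("NFC", text)
print(len(composed))         # 1
```

Many combining sequences have no precomposed form, so normalization alone cannot make `len()` count user-perceived characters in general.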

If Python ever provides this functionality, I guess it would look like
u"C\u0327".width() == 1. But it's not clear when unicode.org will
provide recommended character widths in monospaced fonts for *all*
characters. I recently stumbled upon the Tamil language, where, for example,
u'\u0b95\u0bcd', u'\u0b95\u0bbe', u'\u0b95\u0bca' and u'\u0b95\u0bcc'
look like they are 1, 2, 3 and 4 columns wide. To add insult to injury,
these four symbols are all considered *single* letters :) If your
email reader is able to show them, here they are in all their glory:
க், கா, கொ, கௌ.
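As a rough illustration of "characters" versus code points, here is a sketch (Python 3) that counts every code point except combining marks (Unicode general categories Mn, Mc and Me), which attach to the preceding base character. The helper name `count_letters` is hypothetical. This is a crude approximation of grapheme clusters as defined in Unicode Standard Annex #29, not a full implementation, and it says nothing about display width, which is the harder problem raised above.

```python
import unicodedata

# General categories of combining marks: nonspacing, spacing, enclosing
MARKS = ("Mn", "Mc", "Me")

def count_letters(text):
    """Hypothetical helper: rough count of user-perceived characters.

    Skips combining marks so each base character plus its marks counts
    once. Not a complete UAX #29 grapheme cluster segmentation.
    """
    return sum(1 for ch in text if unicodedata.category(ch) not in MARKS)

# C + cedilla, then the four Tamil examples from the post
samples = ["C\u0327", "\u0b95\u0bcd", "\u0b95\u0bbe", "\u0b95\u0bca", "\u0b95\u0bcc"]
for s in samples:
    print(repr(s), len(s), count_letters(s))
```

Each sample has `len()` of 2 but counts as one letter here, because U+0327 and U+0BCD are nonspacing marks (Mn) while U+0BBE, U+0BCA and U+0BCC are spacing combining marks (Mc).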
