Re: convert unicode characters to visibly similar ascii characters



Peter Bulychev wrote:
Hello.

I want to convert Unicode characters into ASCII ones.
The method ".encode('ASCII')" can convert only those Unicode characters that fit into the 0..127 range.

But there are still lots of characters beyond this range that can be manually converted to visually similar ASCII characters. For instance, there are several quotation marks in Unicode, which can be converted into the ASCII quotation mark.
Please be more specific. There is no general solution. Unicode can handle Latin, Cyrillic (Russian), Chinese, Japanese and Arabic characters in the same string. There are thousands of possible non-ASCII characters, and many of them are not similar to any ASCII character.

If you only want this to work for a subset, please define that subset.
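
For a narrowly defined subset, something along these lines might be a starting point. This is only a sketch, assuming Python 3 and assuming the subset is typographic quotes, dashes and accented Latin letters. The names PUNCTUATION_MAP and to_ascii_lookalike are made up for illustration, and the mapping is a hand-picked guess at what "similar" means, not a standard table. Accents are handled with unicodedata.normalize("NFKD", ...) from the standard library, and anything without a mapping becomes a placeholder.

```python
import unicodedata

# Hypothetical, hand-picked subset: each key is a non-ASCII character,
# each value is the ASCII character chosen to stand in for it.
PUNCTUATION_MAP = {
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
    "\u00ab": '"',   # left-pointing double angle quotation mark
    "\u00bb": '"',   # right-pointing double angle quotation mark
    "\u2013": "-",   # en dash
    "\u2014": "-",   # em dash
}
TRANSLATION_TABLE = str.maketrans(PUNCTUATION_MAP)


def to_ascii_lookalike(text, placeholder="?"):
    """Return an ASCII-only approximation of text (illustrative only)."""
    # Step 1: apply the explicit, manually defined mapping.
    text = text.translate(TRANSLATION_TABLE)
    # Step 2: NFKD splits accented letters into a base letter plus
    # combining marks; dropping the marks keeps the base letter.
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Step 3: whatever is still outside ASCII has no defined lookalike
    # in this sketch, so it is replaced by the placeholder.
    return "".join(ch if ord(ch) < 128 else placeholder for ch in stripped)


print(to_ascii_lookalike("\u201cna\u00efve caf\u00e9\u201d"))  # prints "naive cafe"
```

Characters outside the chosen subset, such as Cyrillic or Chinese ones, come out as "?" here, which is why the subset has to be defined first.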

Laszlo
