Re: code critique



Daniel Leidisch <news@xxxxxxxxxxxx> writes:
> In article <87r6oyigvo.fsf@xxxxxxxxxxxxxxxxxx> you wrote:
>
>> * A portability caveat: ANSI Common Lisp does not require
>> characters to be encoded in ASCII or any other encoding, and so
>> CODE-CHAR may not do what you expect here.
>
> Do you have any suggestions as to how to implement this
> more portably?

Well, this was mostly a fussy "technically this isn't portable" note;
all current implementations do, I think, use a superset of ASCII for
their character repertoires.
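
To see whether a given implementation really does agree with ASCII for
the printing characters, a quick check along these lines should do. It
is only a sketch; the name ASCII-CHAR-CODES-P is made up for this
example and is not a standard function.

```lisp
;; Hypothetical helper (not part of any standard or library):
;; returns true if CHAR-CODE matches ASCII for the 95 printing
;; characters, i.e. codes 32 through 126.
(defun ascii-char-codes-p ()
  (loop for char across (concatenate 'string
                                     " !\"#$%&'()*+,-./0123456789:;<=>?@"
                                     "ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`"
                                     "abcdefghijklmnopqrstuvwxyz{|}~")
        for code from 32
        always (= (char-code char) code)))
```

If this returns true, CODE-CHAR and CHAR-CODE are safe to use for the
printing ASCII range on that implementation, though the standard still
does not promise it.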

In case you are interested in being fussy: a conforming ANSI CL
implementation is only required to support the 96 characters in the
standard-character repertoire, and one of those characters, Newline,
need not map to a single ASCII character. You can get the standard
character for an ASCII code like this:

(defun ascii-code-char (code)
  (cond ((< 31 code 127)
         (char #.(format nil "~@{~A~}"
                         " !\"#$%&'()*+,-./0123456789:;<=>?@"
                         "ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`"
                         "abcdefghijklmnopqrstuvwxyz{|}~")
               (- code 32)))
        (t ;Here you might handle Newline and the semistandard characters
         (error "Can't find an ASCII character for code ~D" code))))
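
One possible way to fill in the T branch is to look the character up by
name, since NAME-CHAR returns NIL when the implementation has no
character of that name. The following is a sketch under assumptions: the
helper name ASCII-CONTROL-CODE-CHAR is invented here, and mapping code
10 to Newline (rather than Linefeed) is a choice, not something the
standard dictates.

```lisp
;; Hypothetical helper (name invented for this example).
;; Maps a few ASCII control codes to Newline and the semi-standard
;; characters by name.  Returns NIL if the code is not covered or
;; the implementation does not support the named character.
(defun ascii-control-code-char (code)
  (let ((name (case code
                (8 "Backspace")
                (9 "Tab")
                (10 "Newline")   ; assumption: treat LF as Newline
                (12 "Page")
                (13 "Return")
                (127 "Rubout"))))
    (and name (name-char name))))
```

The T branch could then call this helper first and signal the error
only when it returns NIL.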

For the other characters in the ASCII character set, you'll need to
resort to implementation-dependent details. For Unicode, RFC 3986
defines the URL encoding of a Unicode code point as the URL encoding
of the octets in the UTF-8 encoding of the code point, so if you care
to support this, you'll probably have to make your decoder a bit more
complex. (A quick check leaves me less than confident that all
browsers actually encode Unicode code points according to the RFC,
though.)
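
As a rough illustration of the extra complexity, the decoding can be
split into two steps: turn the URL-encoded string into octets, then
decode those octets as UTF-8 into code points. This is a minimal sketch,
not a validating decoder: the function names are invented here, it
assumes CHAR-CODE yields ASCII codes for unescaped characters, and it
does not reject overlong encodings or surrogates. Turning the resulting
code points into characters with CODE-CHAR is again
implementation-dependent.

```lisp
;; Hypothetical helper: resolve %XX escapes in STRING into a list of
;; octets.  Assumes CHAR-CODE gives ASCII codes for unescaped characters.
(defun percent-decode-octets (string)
  (loop with i = 0
        while (< i (length string))
        collect (let ((char (char string i)))
                  (cond ((char= char #\%)
                         ;; Two hex digits follow the percent sign.
                         (prog1 (parse-integer string
                                               :start (+ i 1)
                                               :end (+ i 3)
                                               :radix 16)
                           (incf i 3)))
                        (t
                         (incf i)
                         (char-code char))))))

;; Hypothetical helper: decode a list of UTF-8 octets into a list of
;; code points.  Checks lead and continuation octets only.
(defun utf-8-code-points (octets)
  (loop while octets
        collect (let* ((lead (pop octets))
                       ;; Number of continuation octets that follow.
                       (extra (cond ((< lead #x80) 0)
                                    ((<= #xC0 lead #xDF) 1)
                                    ((<= #xE0 lead #xEF) 2)
                                    ((<= #xF0 lead #xF7) 3)
                                    (t (error "Invalid UTF-8 lead octet ~X"
                                              lead))))
                       ;; Payload bits of the lead octet.
                       (code (if (zerop extra)
                                 lead
                                 (ldb (byte (- 6 extra) 0) lead))))
                  (dotimes (k extra code)
                    (let ((next (pop octets)))
                      (unless (and next (= (ldb (byte 2 6) next) #b10))
                        (error "Invalid UTF-8 continuation octet ~S" next))
                      ;; Append the six payload bits of each continuation.
                      (setf code (logior (ash code 6)
                                         (ldb (byte 6 0) next))))))))
```

For example, (utf-8-code-points (percent-decode-octets "%C3%A9"))
yields a one-element list holding 233, the code point U+00E9.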

--
RmK