Re: code critique



Daniel Leidisch <news@xxxxxxxxxxxx> writes:

Since I'm learning Lisp on my own, I would highly appreciate any tips
concerning my code. What could be done better, what is already ok?

(defun convert-hex-encoded-chars (string)
  "Converts %HEX encoded chars found in string."
  (if (cl-ppcre:scan "%.." string)
      (cl-ppcre:register-groups-bind
          (before match after) ("([^%]*)(%..)(.*)" string)
        (concatenate 'string before
                     (string (code-char
                              (parse-integer (subseq match 1) :radix 16)))
                     (convert-hex-encoded-chars after)))
      string))

This is fine; the following are just a few suggestions:

* It may be more efficient to use a string-output-stream than to
  repeatedly construct intermediate strings. See
  WITH-OUTPUT-TO-STRING.

* A portability caveat: ANSI Common Lisp does not require characters
  to be encoded in ASCII or any other encoding, and so CODE-CHAR may
  not do what you expect here.

* While it's evident when you look at the code for a few seconds (if
  one knows regular expressions), neither the name of the function nor
  the docstring actually says which way this function "converts"; it
  could be "convert /from/", or "convert /to/". Why not name it
  something like DECODE-HEX-ENCODED-CHARACTERS, or
  DECODE-URLENCODED-STRING?

* Note that you don't really need regular expressions here: you can go
  over the characters in the string fairly easily with LOOP or DOTIMES,
  or by constructing a string-input-stream and looping with READ-CHAR,
  decoding and accumulating the first two characters after a percent
  sign, and accumulating other characters verbatim. (This isn't a
  claim that regular expressions are to be avoided: rather, that if
  you're looking for exercises, then since you demonstrate facility
  with regular expressions already, you might like to try writing this
  function using only standard CL operators.)
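
To make the first and last suggestions concrete, here is one possible
sketch that uses WITH-OUTPUT-TO-STRING and a plain LOOP over string
indices. The helper name HEX-ESCAPE-CODE is made up for this example, and
copying malformed escapes through unchanged is my own choice (your version
would signal an error from PARSE-INTEGER instead). The CODE-CHAR caveat
above still applies: this assumes character codes below 128 agree with
ASCII, which the standard does not guarantee.

```lisp
;; Hypothetical helper: not part of the original code.
(defun hex-escape-code (string i)
  "If a %XX escape (two hex digits) starts at index I of STRING,
return the integer it denotes; otherwise return NIL."
  (when (and (char= (char string i) #\%)
             (<= (+ i 3) (length string)))
    (let ((hi (digit-char-p (char string (+ i 1)) 16))
          (lo (digit-char-p (char string (+ i 2)) 16)))
      (when (and hi lo)
        (+ (* 16 hi) lo)))))

(defun decode-urlencoded-string (string)
  "Return a new string in which each %XX escape in STRING is replaced
by the character with code XX.  Anything that is not a well-formed
escape is copied verbatim."
  (with-output-to-string (out)
    (let ((i 0))
      (loop while (< i (length string))
            do (let ((code (hex-escape-code string i)))
                 (cond (code
                        ;; Assumes CODE-CHAR maps CODE as ASCII would.
                        (write-char (code-char code) out)
                        (incf i 3))
                       (t
                        (write-char (char string i) out)
                        (incf i))))))))
```

Each character is written to the stream exactly once, so no intermediate
strings are built, and there is no recursion on the remainder of the input.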

(defun split-query-string (query-string)
  "Splits a query-string and returns a list of variable=value assignments."
  (cl-ppcre:split "&" (substitute #\Space #\+
                                  (convert-hex-encoded-chars query-string))))

If you split after decoding, what happens in case one of the decoded
variables or values contains an ampersand? For example:

(split-query-string "foo=b%26r")
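
One way around this, sketched here with standard CL only, is to split the
raw string at each ampersand first and decode the pieces afterwards. The
name SPLIT-RAW-QUERY-STRING is made up for this example; it deliberately
returns the pieces still encoded, so that an encoded %26 can never be
mistaken for a separator.

```lisp
;; Hypothetical name: not part of the original code.
(defun split-raw-query-string (query-string)
  "Split QUERY-STRING at each ampersand and return the list of pieces,
still in their encoded form."
  (loop with start = 0
        for end = (position #\& query-string :start start)
        ;; When END is NIL, SUBSEQ takes everything up to the end.
        collect (subseq query-string start end)
        while end
        do (setf start (1+ end))))
```

With the input above this yields a single piece, "foo=b%26r", which can
then be decoded on its own.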

(defun get-parameters (&optional
                       (query-string
                        (osicat:environment-variable "QUERY_STRING")))
  "Returns an alist of (variable . value) pairs for a cgi query-string, which
may be given as an optional parameter. Otherwise, the environment-variable
QUERY_STRING is used."
  (loop for variable in (split-query-string query-string)
        collect (cl-ppcre:register-groups-bind
                    (key value) ("(.*)=(.*)" variable)
                  (cons (intern (string-upcase key))
                        value))))

Likewise, if you split variables from values after decoding, what
happens in case one of the decoded variables or values contains an
equal sign? For example:

(get-parameters "foo=b%3Dr")
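
The same idea applies one level down: split each still-encoded pair at its
first equal sign, and only then decode the two halves. The following is a
rough sketch under my own assumptions. SPLIT-RAW-PAIR is a made-up name,
the decoder is passed in as a function (defaulting to IDENTITY) so the
example stands alone, keys are left as strings rather than interned, and a
pair with no equal sign gets an empty value.

```lisp
;; Hypothetical name and interface: not part of the original code.
(defun split-raw-pair (pair &optional (decode #'identity))
  "Split the still-encoded PAIR at its first equal sign and return
(key . value), calling DECODE on each side.  A pair without an equal
sign gets the empty string as its value."
  (let ((eq-pos (position #\= pair)))
    (cons (funcall decode (subseq pair 0 eq-pos))
          (funcall decode (if eq-pos
                              (subseq pair (1+ eq-pos))
                              "")))))
```

Because the split happens before decoding, the %3D in the example above
stays inside the value instead of being treated as a second separator.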

--
RmK