Re: "Read stuff from a file and chop it up to do stuff" code advice wanted.



On 25 Oct, 15:28, Alan Crowe <a...@xxxxxxxxxxxxxxxxxxxxxxx> wrote:
landspeedrecord <landspeedrec...@xxxxxxxxx> writes:
I am writing code that reinvents the "read stuff from a file and chop
it up to do stuff" code wheel (of pain conan). If you know what I
mean... Why am I doing this?

a) to learn lisp and also how to program...
b) because I am too stupid (or it is too hard) to find code that does
the insanely simple crap I want to do as a newbie. I tried with
google... really. Plus I don't know how to use packages yet.

Anyway... I am seeking some advice... any advice about how to write
code better or with better style. Especially advice on how to write
more beautiful and/or simple code.
Am I over documenting?

You are asking people to comment on your code. Suppose,
hypothetically, that it is hard to follow. Then readers will
not be able to work out what it is supposed to do, and will
be unable to suggest better ways of writing it. If you want
replies you need to offer a separate, plain English, written
account of what the code is supposed to do. At first glance
you seem to be providing the necessary level of
documentation for a post to c.l.l. asking for advice. This
is very different from the documentation appropriate to a
source file.

The key idea in writing comments is that modern languages
permit lengthy descriptive variable names, pseudo-english
constructs such as loop, and many ways of writing code, with
the intention that you should write self-documenting
code. So the comments should mainly be about stuff that is
not present in the code. For example, if you have not
written the code in the obvious way, the code itself
explains the way that you actually wrote it, but omits both
the obvious way and why that doesn't actually work. So the
comment should provide what is missing, sketching the
obvious way and explaining the problem with it.

Under using useful functions that I am too stupid/ignorant
to use?

Err, I think you mean "Under using useful functions because
I have only just started on my wonderful adventure"

I assume
that there is a library that already does what my code is doing... but
I couldn't find it so I wrote my own.

The Perl way of building a system of interacting programs is
that each program writes its outputs in formats that the
programmer makes up as he goes along. Later, when an output
is needed as input to a program that is written later, the
programmer attempts to discover the grammar of the language
he has invented and tries to write an ad hoc parser for it
using regular expressions. Perl is a powerful language with
convenient regular expressions, so this approach is almost
successful.

The Lisp way is to just use WRITE and READ. The resulting
files are a bit ugly (configuring the pretty printer can
help), but the approach is reliable and requires no parsing
code.
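
As a rough illustration of the WRITE/READ approach, here is a
minimal sketch. The helper names SAVE-DATA and LOAD-DATA are made
up for this example. String streams stand in for a file so that
the snippet runs anywhere; a stream opened with WITH-OPEN-FILE
could be passed to WRITE and READ in the same way.

```lisp
;; SAVE-DATA and LOAD-DATA are hypothetical names for this sketch.
;; WRITE, READ, WITH-STANDARD-IO-SYNTAX and the string-stream
;; macros are standard Common Lisp.

(defun save-data (data)
  "Print DATA with WRITE, returning text that READ can parse back."
  (with-standard-io-syntax
    (with-output-to-string (out)
      (write data :stream out))))

(defun load-data (text)
  "Rebuild the object that SAVE-DATA printed into TEXT."
  (with-standard-io-syntax
    ;; Disable #. evaluation so that reading cannot run arbitrary code.
    (let ((*read-eval* nil))
      (with-input-from-string (in text)
        (read in)))))

;; Round trip: the list comes back EQUAL to what went in.
(load-data (save-data '(("alpha" 1) ("beta" 2))))
```

Anything that prints readably (lists, strings, numbers, symbols)
should survive the round trip, so no hand-written parser is needed.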

;; NOT-TEXT?
;; INPUT: a character
;; OUTPUT: T or NIL
;;
;; This function returns TRUE if any character
;; passed to it is below #\!... which is where
;; all the ascii (unicode as well?) control characters
;; are. Also below #\! is #\space etc...

(defun not-text? (char)
  (if (char< char #\!)
      T
      NIL))

It is traditional to hold #\? in reserve for use as a
macro-character. One might arrange that ?x gets read as
#s(match-variable name x). So tradition would have you name
the function not-text-p.

You can use predicates directly; not-text-p is a one-liner:

(defun not-text-p (char)
  (char< char #\!))

;; ALPHA-CHAR?
;; INPUT: a character
;; OUTPUT: T or NIL
;;
;; Only works with ASCII??? (I am NOT SURE.)
;; Returns TRUE if any character passed to it
;; is alphabetic. Problematically, Unicode
;; characters (I think it is Unicode at least! - not sure)
;; 91. #\[ 92. #\\ 93. #\] 94. #\^ 95. #\_ 96. #\`
;; are between 65. #\A and 122. #\z. and so they
;; would ALSO make the function return TRUE.
;; I don't know how to make Lisp change between different
;; character code sets so I can't fix/figure out how
;; to solve this issue.

The problem here is with original ASCII; I don't think that
changing character sets will help.



(defun alpha-char? (char)
  (if (AND (char-not-lessp char #\A) (char-not-greaterp char #\z))
      T
      NIL))

No need for if. Notice that many CL ordering predicates
permit more than two arguments, which makes writing range
checks straightforward.

You don't have to write

(and (<= lower-limit x)
     (<= x upper-limit))

you can simply say

(<= lower-limit x upper-limit)

This applies here,

CL-USER> (defun alpha-filter (char)
           (char-not-greaterp #\A char #\Z))

CL-USER> (loop for code from 0 below 256
               when (alpha-filter (code-char code))
                 collect (code-char code))

(#\A #\B #\C #\D #\E #\F #\G #\H #\I #\J #\K #\L #\M #\N #\O #\P #\Q #\R #\S
#\T #\U #\V #\W #\X #\Y #\Z #\a #\b #\c #\d #\e #\f #\g #\h #\i #\j #\k #\l
#\m #\n #\o #\p #\q #\r #\s #\t #\u #\v #\w #\x #\y #\z)

though now that I have given the game away by saying that
predicates end in #\p not #\?, you are soon going to discover
the built-in functions alpha-char-p and alphanumericp.






;; GET-TEXTCHUNK-FROM-STREAM
;; INPUT: a stream and an array to hold characters in temp memory.
;; OUTPUT: a string or NIL if at the absolute end of file and
;; the temp memory array is empty.
;;
;; This function grabs characters from a stream until it has
;; a chunk of text surrounded by white space
;; or linefeeds or carriage returns and returns the
;; resulting string. It works via recursion... IS THIS
;; INEFFICIENT or slower than a loop??? Don't know.
;;
;; Some notes about the steps of the "COND" part of the function:
;; 1 If new-char is nil - then end of file/stream! but if the
;; array contains characters then return the array so the final
;; characters in the stream aren't lost!
;; 2 new-char = NIL & the array has no characters.
;; end of file/stream! Return NIL.
;; 3 Any character below "!" is a control character like
;; #\newline or #\space etc... Text-chunk complete!
;; Throw out the control character and return the chunk.
;; 4 Still reading in legitimate characters. Push new-char
;; onto the array and then keep going
;; via recursion (passing in the updated char-array...)

(defun get-textchunk-from-stream (stream char-array)
  ;;the below NILs are important! for endoffile error avoidance.
  (let ((new-char (read-char stream nil)))
    ;1
    (cond ((AND (eq nil new-char) (> (length char-array) 0)) char-array)
          ;2
          ((eq nil new-char) nil)
          ;3
          ((not-text? new-char) char-array)
          ;4
          (T (progn (vector-push-extend new-char char-array)
                    (get-textchunk-from-stream stream char-array))))))

(and (eq nil new-char) (> (length char-array) 0))

could be

(and (not new-char)(plusp (length char-array)))

The progn is redundant. Cutting and pasting from the
hyperspec

Macro COND

Syntax:

cond {clause}* => result*

clause::= (test-form form*)
                         ^
                         |
    This crucial asterisk indicates 0, 1, 2, or more forms

They are evaluated in an implicit progn. Think of cond as

(cond ((test data) (do-this) (do-that) (compute-value))
      ((further-test data) (do-something-else data) (different-value-computation)))

COND looks a bit strange if you are used to C with its
if-then-else, but it is used a lot because it does both
if-then-elseif-then-else and packages up multiple statements.
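
Putting the suggestions above together, the function might end up
looking something like the sketch below. It assumes the predicate
has been renamed NOT-TEXT-P as proposed earlier, and it is one
possible rewrite rather than the only way to do it.

```lisp
(defun not-text-p (char)
  "True for characters below #\\! (space and the ASCII control characters)."
  (char< char #\!))

(defun get-textchunk-from-stream (stream char-array)
  "Collect characters from STREAM into CHAR-ARRAY up to the next
non-text character. Return NIL at end of file with nothing collected."
  (let ((new-char (read-char stream nil)))
    (cond ((and (not new-char) (plusp (length char-array)))
           char-array)                   ; end of file, keep the last chunk
          ((not new-char) nil)           ; end of file, nothing left
          ((not-text-p new-char) char-array)
          ;; Two forms in one clause, no PROGN needed.
          (t (vector-push-extend new-char char-array)
             (get-textchunk-from-stream stream char-array)))))
```

The last clause holds two forms; COND evaluates them in order and
returns the value of the second.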







;; GET-TEXTCHUNK
;; INPUT: a stream
;; OUTPUT: a string
;;
;; This function is just a helper function that sets up
;; GET-TEXTCHUNK-FROM-STREAM to begin its recursive process
;; properly. I could probably have done without it but
;; I couldn't figure out how to make GET-TEXTCHUNK-FROM-STREAM
;; self contained.

(defun get-textchunk (stream)
  (let ((char-catcher (make-array 0 :element-type 'character
                                    :fill-pointer 0
                                    :adjustable t)))
    (get-textchunk-from-stream stream char-catcher)))

;; GET-ALL-TEXTCHUNKS
;; INPUT: a stream.
;; OUTPUT: a list of strings.
;;
;; This function loops GET-TEXTCHUNK over and over again
;; until the stream ends and GET-TEXTCHUNK returns NIL finally,
;; whereupon GET-ALL-TEXTCHUNKS
;; returns a list of strings.

(defun get-all-textchunks (stream)
  (loop for word = (get-textchunk stream)
        while word collect word))

;; SLURP-STREAM5
;; INPUT: a stream
;; OUTPUT: a very long string?
;;
;; Holy Crap this grabs all the text from a stream so fast!
;; I got this off the web at:
;;http://www.emmett.ca/~sabetts/slurp.html
;; My older code was grabbing one word from the file stream at a time.
;; It is MUCH faster to use this code to read in all the text at once
;; and then run my code on the text string that this code creates.

(defun slurp-stream5 (stream)
  (let ((seq (make-array (file-length stream)
                         :element-type 'character
                         :fill-pointer t)))
    (setf (fill-pointer seq) (read-sequence seq stream))
    seq))

slurp-character-stream might be a better name, reflecting
your commitment to a specific element type.





;; SUPER-TEXT-SLURP
;; INPUT: a file path
;; OUTPUT: the output of slurp-string5, i.e. a single long string.
;;
;; The function uses slurp-stream5 to open a stream and
;; return all the text as a single long string. SUPER-TEXT-SLURP is
;; just some time saving code that saves me from having
;; to open and close strings just to read in text.
;; It also has the advantage of using WITH-OPEN-FILE
;; so the closing of the file is done automatically.

(defun super-text-slurp (file-location)
  (with-open-file (temp-var file-location
                            :direction :input
                            :if-does-not-exist :error)

...


I haven't read all your code, but in general for this class of problem
there are 2 approaches:

1. Read in the stuff from the file into a list and then process the
list.
2. Read and process the file incrementally as it is being read.

In general, method 1 results in code that is cleaner, easier to
read, and declarative in spirit. Method 2 is suitable if the input is
very large and you're looking to retain only a fragment of the original.
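
A minimal sketch of approach 1, assuming the whole file has already
been read into a string: the text is split into whitespace-separated
chunks by plain recursion over positions in the string.
SPLIT-INTO-CHUNKS is a name invented for this example, and
NOT-TEXT-P is the predicate discussed earlier in the thread.

```lisp
(defun not-text-p (char)
  "True for characters below #\\! (space and the ASCII control characters)."
  (char< char #\!))

;; SPLIT-INTO-CHUNKS is a hypothetical helper for this sketch.
(defun split-into-chunks (text &optional (start 0))
  "Return the whitespace-separated chunks of TEXT as a list of strings."
  (let ((chunk-start (position-if-not #'not-text-p text :start start)))
    (when chunk-start
      (let ((chunk-end (or (position-if #'not-text-p text :start chunk-start)
                           (length text))))
        (cons (subseq text chunk-start chunk-end)
              (split-into-chunks text chunk-end))))))

(split-into-chunks "  read stuff from a file ")
;; => ("read" "stuff" "from" "a" "file")
```

The recursion goes one level deep per chunk, so for very large
inputs an iterative version or approach 2 may be the safer choice.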

If you're a newbie I'd advise going for 1. In Qi there is an inbuilt
function read-file-as-charlist which reads the contents of a file as a
list of characters. If you download the source from www.lambdassociates.org
you can find the Lisp source for this.

For now stay away from loop and other procedural constructions if
possible - they delay your evolution into thinking like a functional
programmer.

Mark

