Re: "Read stuff from a file and chop it up to do stuff" code advice wanted.



landspeedrecord <landspeedrecord@xxxxxxxxx> writes:

I am writing code that reinvents the "read stuff from a file and chop
it up to do stuff" code wheel (of pain conan). If you know what I
mean... Why am I doing this?

a) to learn lisp and also how to program...
b) because I am too stupid (or it is too hard) to find code that does
the insanely simple crap I want to do as a newbie. I tried with
google... really. Plus I don't know how to use packages yet.

Anyway... I am seeking some advice... any advice about how to write
code better or with better style. Especially advice on how to write
more beautiful and/or simple code.

Am I over documenting?

You are asking people to comment on your code. Suppose,
hypothetically, that it is hard to follow. Then readers will
not be able to work out what it is supposed to do, and will
be unable to suggest better ways of writing it. If you want
replies you need to offer a separate, plain English, written
account of what the code is supposed to do. At first glance
you seem to be providing the necessary level of
documentation for a post to c.l.l. asking for advice. This
is very different from the documentation appropriate to a
source file.

The key idea in writing comments is that modern languages
permit lengthy descriptive variable names, pseudo-english
constructs such as loop, and many ways of writing code, with
the intention that you should write self-documenting
code. So the comments should mainly be about stuff that is
not present in the code. For example, if you have not
written the code in the obvious way, the code itself
explains the way that you actually wrote it, but omits both
the obvious way and why that doesn't actually work. So the
comment should provide what is missing, sketching the
obvious way and explaining the problem with it.
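
As a rough illustration of that kind of comment, here is a sketch
that borrows the letter-range problem from your own code further
down. The name ascii-letter-p is made up for this example, and it
assumes an implementation whose character codes follow ASCII.

```lisp
;; The obvious test is (char<= #\A char #\z), one range from the first
;; upper-case letter to the last lower-case one.  That does not work:
;; in ASCII the six characters [ \ ] ^ _ ` sit between #\Z and #\a, so
;; the single range accepts punctuation too.  Hence two ranges.
(defun ascii-letter-p (char)
  (or (char<= #\A char #\Z)
      (char<= #\a char #\z)))
```

The code says what is done; the comment supplies the obvious
alternative and the reason it was rejected.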


Under using useful functions that I am too stupid/ignorant
to use?

Err, I think you mean "Under using useful functions because
I have only just started on my wonderful adventure"

I assume
that there is a library that already does what my code is doing... but
I couldn't find it so I wrote my own.

The Perl way of building a system of interacting programs is
that each program writes its outputs in formats that the
programmer makes up as he goes along. Later when an output
is needed as input to a program that is written later, the
programmer attempts to discover the grammar of the language
he has invented and tries to write an ad hoc parser for it
using regular expressions. Perl is a powerful language with
convenient regular expressions so this approach is almost
successful.

The Lisp way is to just use WRITE and READ. So things are a
bit ugly (configuring the pretty printer can help), but
it is reliable and requires no coding.
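
A minimal sketch of that round trip, assuming your data is built
from lists, strings, numbers and keywords. The function names are
invented for this example; WITH-STANDARD-IO-SYNTAX and binding
*READ-EVAL* to NIL are my additions, there to keep printer and
reader settings from interfering.

```lisp
(defun write-data (data stream)
  "Print DATA to STREAM in a form that READ can parse back."
  (with-standard-io-syntax
    (write data :stream stream))
  data)

(defun read-data (stream)
  "Read one object from STREAM, refusing #. read-time evaluation."
  (with-standard-io-syntax
    (let ((*read-eval* nil))
      (read stream))))

;; Hypothetical file wrappers around the two functions above.
(defun save-data-to-file (data filespec)
  (with-open-file (stream filespec
                          :direction :output
                          :if-exists :supersede)
    (write-data data stream)))

(defun load-data-from-file (filespec)
  (with-open-file (stream filespec :direction :input)
    (read-data stream)))
```

Whatever one program saves this way, another can load without
anybody writing a parser.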

;; NOT-TEXT?
;; INPUT: a character
;; OUTPUT: T or NIL
;;
;; This function returns TRUE if any character
;; passed to it is below #\!... which is where
;; all the ascii (unicode as well?) control characters
;; are. Also below #\! is #\space etc...

(defun not-text? (char)
  (if (char< char #\!)
      T
      NIL))


It is traditional to hold #\? in reserve for use as a
macro-character. One might arrange that ?x gets read as
#s(match-variable name x). So tradition would have you name
the function not-text-p.
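
To make that concrete, here is one possible sketch of such a
macro-character. Everything named here (the structure, the reader
function, read-pattern-from-string) is hypothetical; it is just
one way the tradition could be put to use.

```lisp
(defstruct match-variable name)

(defun read-match-variable (stream char)
  "Reader function for #\\?: wrap the next object in a MATCH-VARIABLE."
  (declare (ignore char))
  (make-match-variable :name (read stream t nil t)))

(defun read-pattern-from-string (string)
  "Read one form from STRING with #\\? installed as a macro-character.
The change is made in a private copy of the standard readtable."
  (let ((*readtable* (copy-readtable nil)))
    (set-macro-character #\? #'read-match-variable)
    (values (read-from-string string))))
```

With this in place (read-pattern-from-string "(likes ?x lisp)")
returns a three element list whose second element is a
match-variable named X. Since #\? is installed as a terminating
macro-character, a name like not-text? would no longer read as a
single symbol under that readtable.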

You can return the value of a predicate directly, so
not-text-p is a one-liner:

(defun not-text-p (char)
  (char< char #\!))

;; ALPHA-CHAR?
;; INPUT: a character
;; OUTPUT: T or NIL
;;
;; Only works with ASCII??? (I am NOT SURE.)
;; Returns TRUE if any character passed to it
;; is alphabetic. Problematically, Unicode
;; characters (I think it is Unicode at least! - not sure)
;; 91. #\[ 92. #\\ 93. #\] 94. #\^ 95. #\_ 96. #\`
;; are between 65. #\A and 122. #\z. and so they
;; would ALSO make the function return TRUE.
;; I don't know how to make Lisp change between different
;; character code sets so I can't fix/figure out how
;; to solve this issue.

The problem here lies in ASCII itself, which places those six
characters between #\Z and #\a. I don't think that changing
character sets will help.


(defun alpha-char? (char)
  (if (AND (char-not-lessp char #\A) (char-not-greaterp char #\z))
      T
      NIL))

No need for if. Notice that many CL ordering predicates
permit more than two arguments, which makes writing range
checks straightforward.

You don't have to write

(and (<= lower-limit x)
     (<= x upper-limit))

you can simply say

(<= lower-limit x upper-limit)

This applies here,

CL-USER> (defun alpha-filter (char)
           (char-not-greaterp #\A char #\Z))

CL-USER> (loop for code from 0 below 256
               when (alpha-filter (code-char code))
               collect (code-char code))

(#\A #\B #\C #\D #\E #\F #\G #\H #\I #\J #\K #\L #\M #\N #\O #\P #\Q #\R #\S
#\T #\U #\V #\W #\X #\Y #\Z #\a #\b #\c #\d #\e #\f #\g #\h #\i #\j #\k #\l
#\m #\n #\o #\p #\q #\r #\s #\t #\u #\v #\w #\x #\y #\z)

Though now that I have given the game away by saying that
predicates end in #\p, not #\?, you are soon going to discover
the built-in functions alpha-char-p and alphanumericp.



;; GET-TEXTCHUNK-FROM-STREAM
;; INPUT: a stream and an array to hold characters in temp memory.
;; OUTPUT: a string or NIL if at the absolute end of file and
;; the temp memory array is empty.
;;
;; This function grabs characters from a stream until it has
;; a chunk of text surrounded by white space
;; or linefeeds or carriage returns and returns the
;; resulting string. It works via recursion... IS THIS
;; INEFFICIENT or slower than a loop??? Don't know.
;;
;; Some notes about the steps of the "COND" part of the function:
;; 1 If new-char is nil - then end of file/stream! but if the
;; array contains characters then return the array so the final
;; characters in the stream aren't lost!
;; 2 new-char = NIL & the array has no characters.
;; end of file/stream! Return NIL.
;; 3 Any character below "!" is a control character like
;; #\newline or #\space etc... Text-chunk complete!
;; Throw out the control character and return the chunk.
;; 4 Still reading in legitimate characters. Push new-char
;; onto the array and then keep going
;; via recursion (passing in the updated char-array...)

(defun get-textchunk-from-stream (stream char-array)
  ;;the below NILs are important! for endoffile error avoidance.
  (let ((new-char (read-char stream nil)))
    ;1
    (cond ((AND (eq nil new-char) (> (length char-array) 0)) char-array)
          ;2
          ((eq nil new-char) nil)
          ;3
          ((not-text? new-char) char-array)
          ;4
          (T (progn (vector-push-extend new-char char-array)
                    (get-textchunk-from-stream stream char-array))))))


(and (eq nil new-char) (> (length char-array) 0))

could be

(and (not new-char)(plusp (length char-array)))

The progn is redundant. Cutting and pasting from the
HyperSpec:

Macro COND

Syntax:

cond {clause}* => result*

clause::= (test-form form*)
                         ^
                         |
  This crucial asterisk indicates 0, 1, 2, or more forms

They are evaluated in an implicit progn. Think of cond as

(cond ((test data) (do-this) (do-that) (compute-value))
      ((further-test data) (do-something-else data) (different-value-computation)))

COND looks a bit strange if you are used to C with its if-then-else, but
it is used a lot because it does both
if-then-elseif-then-else and packages up multiple statements.
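
Putting those suggestions together, your function might end up
looking something like the sketch below. It keeps your recursive
structure and your four clauses. Only the predicate name, the
tests, and the missing progn differ.

```lisp
(defun not-text-p (char)
  (char< char #\!))

(defun get-textchunk-from-stream (stream char-array)
  (let ((new-char (read-char stream nil)))
    (cond
      ;; 1 end of stream, but characters are waiting: return them
      ((and (not new-char) (plusp (length char-array))) char-array)
      ;; 2 end of stream and nothing collected
      ((not new-char) nil)
      ;; 3 separator: the chunk is complete
      ((not-text-p new-char) char-array)
      ;; 4 two forms in one clause, no progn needed
      (t (vector-push-extend new-char char-array)
         (get-textchunk-from-stream stream char-array)))))
```

Reading "foo bar" from a string stream, with a fresh adjustable
array each time, this returns "foo", then "bar", then NIL.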




;; GET-TEXTCHUNK
;; INPUT: a stream
;; OUTPUT: a string
;;
;; This function is just a helper function that sets up
;; GET-TEXTCHUNK-FROM-STREAM to begin its recursive process
;; properly. I could probably have done without it but
;; I couldn't figure out how to make GET-TEXTCHUNK-FROM-STREAM
;; self contained.

(defun get-textchunk (stream)
  (let ((char-catcher (make-array 0 :element-type 'character
                                    :fill-pointer 0
                                    :adjustable t)))
    (get-textchunk-from-stream stream char-catcher)))



;; GET-ALL-TEXTCHUNKS
;; INPUT: a stream.
;; OUTPUT: a list of strings.
;;
;; This function loops GET-TEXTCHUNK over and over again
;; until the stream ends and GET-TEXTCHUNK returns NIL finally,
;; whereupon GET-ALL-TEXTCHUNKS
;; returns a list of strings.

(defun get-all-textchunks (stream)
  (loop for word = (get-textchunk stream)
        while word collect word))



;; SLURP-STREAM5
;; INPUT: a stream
;; OUTPUT: a very long string?
;;
;; Holy Crap this grabs all the text from a stream so fast!
;; I got this off the web at:
;; http://www.emmett.ca/~sabetts/slurp.html
;; My older code was grabbing one word from the file stream at a time.
;; It is MUCH faster to use this code to read in all the text at once
;; and then run my code on the text string that this code creates.

(defun slurp-stream5 (stream)
  (let ((seq (make-array (file-length stream)
                         :element-type 'character
                         :fill-pointer t)))
    (setf (fill-pointer seq) (read-sequence seq stream))
    seq))

slurp-character-stream might be a better name, reflecting
your commitment to a specific element type.



;; SUPER-TEXT-SLURP
;; INPUT: a file path
;; OUTPUT: the output of slurp-stream5, i.e. a single long string.
;;
;; The function uses slurp-stream5 to open a stream and
;; return all the text as a single long string. SUPER-TEXT-SLURP is
;; just some time saving code that saves me from having
;; to open and close streams just to read in text.
;; It also has the advantage of using WITH-OPEN-FILE
;; so the closing of the file is done automatically.

(defun super-text-slurp (file-location)
  (with-open-file (temp-var file-location
                            :direction :input
                            :if-does-not-exist :error)
    (slurp-stream5 temp-var)))

Using the word "super" to denote a variant of a function
always ends in tears. You need something less general,
perhaps

slurp-text-file

temp-var is super-vague, which will annoy you when you
re-read your code in 6 months' time. The HyperSpec says

macro WITH-OPEN-FILE

Syntax:

with-open-file (stream filespec options*) declaration* form*

so you can steal the names from there and write

(defun slurp-text-file (filespec)
  (with-open-file (stream filespec
                          :direction :input
                          :if-does-not-exist :error)
    (slurp-character-stream stream)))

Whoops, I am so out of time and I haven't got to the
interesting stuff yet.

Alan Crowe
Edinburgh
Scotland