Re: massive data analysis with lisp



Yes, an excellent additional trick!

To complete your thought, all one needs to do is:

(file-position data-stream indexed-position)
(read data-stream)

and you've got the data nearly instantly! This way you can fetch as
many or as few entries as you like, in any order, by random access.
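
For anyone following along, here is a minimal, self-contained sketch of that seek-then-read idea. The helper names (read-entry-at, write-sample-data), the sample file name, and the sample records are made up for illustration; they are not part of the code posted in this thread.

```lisp
;; Hypothetical helper: build a tiny data file and return the list of
;; file positions at which each record starts.
(defun write-sample-data (path records)
  (with-open-file (out path :direction :output :if-exists :supersede)
    (loop for record in records
          collect (file-position out)
          do (format out "~S~%" record))))

;; Hypothetical helper: jump straight to POSITION and read one record.
(defun read-entry-at (path position)
  (with-open-file (data-stream path)
    (file-position data-stream position)
    (read data-stream)))
```

With the positions kept in a list (or a hash table, as in the index code), any record can be fetched without scanning the ones before it:

```lisp
(let ((positions (write-sample-data "sample-data.lisp"
                                    '((1 (10 3)) (2 (11 5)) (3 (12 4))))))
  (read-entry-at "sample-data.lisp" (third positions)))
```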

Thanks for the extension!

To finish up on the subject, this is the code for a second index
(customer->ratings).

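(defun read-mov-idx ()
  ;; Load the movie index: the k-th value read from mov_index.txt is
  ;; stored under key k (keys start at 1).
  (with-open-file (i "mov_index.txt")
    (let ((htable (make-hash-table)))
      (loop for res = (read i nil)
            sum 1 into k
            while res do
              (setf (gethash k htable) res))
      htable)))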

(defun make-cust-idx (movidx)
  ;; Pass 1: for every movie, seek to its record in data.lisp and
  ;; collect, per customer, the ids of the movies that customer rated.
  (let ((assoctab (make-hash-table)))
    (with-open-file (ifile "data.lisp")
      (loop for k being the hash-keys in movidx using (hash-value v) do
        (file-position ifile v)
        (let ((res (cdr (read ifile))))
          (loop for elem in res do
            (let ((custid (first elem)))
              (unless (gethash custid assoctab)
                (setf (gethash custid assoctab)
                      (make-array 1 :element-type 'fixnum
                                    :fill-pointer 0 :adjustable t)))
              (vector-push-extend k (gethash custid assoctab)))))))
    ;; Pass 2: write each customer's vector to cust.lisp and record the
    ;; file position where it starts in cust_index.txt.
    (with-open-file (ofidx "cust_index.txt" :direction :output
                                            :if-exists :supersede)
      (with-open-file (ofcust "cust.lisp" :direction :output
                                          :if-exists :supersede)
        (loop for k being the hash-keys in assoctab using (hash-value v) do
          (format ofidx "~A ~A~%" k (file-position ofcust))
          (format ofcust "~S~%" v))))))

It requires about 600 MB of memory to build the index, and after it's
done cust.lisp takes up 550 MB and cust_index.txt about 8.5 MB.

So with both indices in place and resident in memory, total memory
requirements are about 8.6 MB for a very fast way to get to your data
with minimal time wasted, no SQL, and fully integrated with Lisp ;-)
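
A rough sketch of how a lookup through both indices might go. The function names (read-cust-idx, customer-movies, customer-ratings) and the path parameters are my own inventions for illustration. It assumes cust_index.txt holds "custid position" pairs as written by make-cust-idx, and that each record in data.lisp is a list whose cdr holds entries starting with the customer id, as make-cust-idx already relies on.

```lisp
;; Hypothetical loader: custid -> file position in cust.lisp.
(defun read-cust-idx (&optional (path "cust_index.txt"))
  (with-open-file (in path)
    (let ((table (make-hash-table)))
      (loop for custid = (read in nil)
            while custid
            do (setf (gethash custid table) (read in)))
      table)))

;; Seek to the customer's entry and read the vector of movie ids.
;; Returns NIL for an unknown customer.
(defun customer-movies (custid cust-idx &optional (path "cust.lisp"))
  (let ((position (gethash custid cust-idx)))
    (when position
      (with-open-file (in path)
        (file-position in position)
        (read in)))))

;; For each movie the customer rated, seek to that movie's record and
;; pick out the customer's own entry. Returns (movie-id . entry) pairs.
(defun customer-ratings (custid cust-idx mov-idx
                         &key (cust-path "cust.lisp") (data-path "data.lisp"))
  (with-open-file (data data-path)
    (loop for movie-id across (or (customer-movies custid cust-idx cust-path)
                                  #())
          do (file-position data (gethash movie-id mov-idx))
          collect (cons movie-id (assoc custid (rest (read data)))))))
```

Usage would be something like (customer-ratings some-custid (read-cust-idx) (read-mov-idx)), where some-custid is a placeholder for a real customer id. Only the two hash tables stay in memory; every record is read from disk on demand.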

I'm sure someone could optimize this further, if needed, to reduce the
memory used at build time.
