Re: massive data analysis with lisp
- From: "remixer" <remixer1@xxxxxxxxx>
- Date: 14 Oct 2006 21:11:36 -0700
fofiko@xxxxxxxxxxxxxxx wrote:
Yes, an excellent additional trick!
To complete your thought, all one needs to do is:
(file-position data-stream indexed-position)
(read data-stream)
and you've got the data nearly instantly! This way you can get hold of
as many or as few of the entries as you like, with fully random access.
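A minimal, self-contained sketch of the trick, assuming each record is printed with ~S on its own line to a plain ASCII file, so that the values FILE-POSITION returns can be handed straight back to FILE-POSITION on an input stream. WRITE-RECORDS and READ-RECORD-AT are hypothetical names used only for illustration:

```lisp
;; Hypothetical helpers illustrating offset-based random access.
;; Assumes plain ASCII output, one ~S-printed record per line.

(defun write-records (path records)
  "Write RECORDS to PATH and return a vector holding the starting
file position of each record, in the order written."
  (with-open-file (out path :direction :output :if-exists :supersede)
    (let ((positions (make-array 0 :adjustable t :fill-pointer 0)))
      (dolist (rec records positions)
        ;; Remember where this record starts before printing it.
        (vector-push-extend (file-position out) positions)
        (format out "~S~%" rec)))))

(defun read-record-at (path position)
  "Seek to POSITION in PATH and READ the single record stored there."
  (with-open-file (in path)
    (file-position in position)
    (read in)))
```

For example, (read-record-at "demo.lisp" (aref (write-records "demo.lisp" '((1 (6 3)) (2 (7 5)))) 1)) reads back (2 (7 5)) without parsing the first record.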
Thanks for the extension!
To finish up on the subject, here is the code for a second index
(customer -> the movies that customer rated).
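(defun read-mov-idx ()
  ;; Map entry number (1, 2, 3, ... in file order) to the file
  ;; position stored on that line of mov_index.txt.
  (with-open-file (i "mov_index.txt")
    (let ((htable (make-hash-table)))
      (loop for k from 1
            for res = (read i nil)
            while res
            do (setf (gethash k htable) res))
      htable)))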
(defun make-cust-idx (movidx)
  ;; customer id -> adjustable vector of the movie-index keys he rated
  (let ((assoctab (make-hash-table)))
    (with-open-file (ifile "data.lisp")
      (loop for k being the hash-keys in movidx using (hash-value v) do
        ;; Jump straight to movie K's record; its CDR holds one
        ;; (customer-id ...) entry per rating.
        (file-position ifile v)
        (let ((res (cdr (read ifile))))
          (loop for elem in res do
            (let ((custid (first elem)))
              (unless (gethash custid assoctab)
                (setf (gethash custid assoctab)
                      (make-array 1 :element-type 'fixnum
                                    :fill-pointer 0 :adjustable t)))
              (vector-push-extend k (gethash custid assoctab)))))))
    ;; Write each customer's vector to cust.lisp and record its offset.
    (with-open-file (ofidx "cust_index.txt" :direction :output
                                             :if-exists :supersede)
      (with-open-file (ofcust "cust.lisp" :direction :output
                                          :if-exists :supersede)
        (loop for k being the hash-keys in assoctab using (hash-value v) do
          (format ofidx "~A ~A~%" k (file-position ofcust))
          (format ofcust "~S~%" v))))))
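The customer index can be read back and used the same way as the movie index. A minimal sketch, assuming cust_index.txt holds one "customer-id file-position" pair per line, as written by the FORMAT call above; READ-CUST-IDX and CUSTOMER-MOVIES are hypothetical names, not part of the posted code:

```lisp
;; Hypothetical companions to MAKE-CUST-IDX.

(defun read-cust-idx (path)
  "Read PATH (pairs of customer-id and file-position) into a hash table."
  (with-open-file (in path)
    (let ((table (make-hash-table)))
      (loop for custid = (read in nil)
            while custid
            ;; The number following each customer id is its offset.
            do (setf (gethash custid table) (read in)))
      table)))

(defun customer-movies (custidx custid data-path)
  "Return the vector stored for CUSTID in DATA-PATH, or NIL if unknown."
  (let ((pos (gethash custid custidx)))
    (when pos
      (with-open-file (in data-path)
        (file-position in pos)
        (read in)))))
```

Only the small index table has to stay in memory; each lookup then costs one seek and one READ in cust.lisp.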
Building the index takes about 600 MB of memory; once it's done,
cust.lisp takes up 550 MB and cust_index.txt about 8.5 MB.
So with both indices in place and resident in memory, the total memory
requirement is about 8.6 MB, for a very fast way to get to your data
with minimal wasted time, no SQL, and full integration with Lisp ;-)
Thanks, this is great. Am I right that the third entry in the movie
index does not necessarily correspond to the entry for movie-id=3, since
the output of DIRECTORY is not sorted? I ended up doing something like
this to make sure that the movie index was accurate:
;; MovieIDs range from 1 to 17770 sequentially
(defparameter *num-movies* 17770)

;; Builds file names like mv_0000026.txt
(defun make-filename-for-movie (mid)
  (format nil "~Amv_~7,'0D.txt" *trainingdir* mid))

;; *TRAININGDIR* and READ-MOVIE are defined elsewhere (not shown here).
(defun load-all-ratings-sequentially (o oindex dates?
                                      &optional (num-movies *num-movies*))
  (dotimes (i num-movies)
    (read-movie (make-filename-for-movie (+ i 1)) o oindex dates?)))
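Another way to avoid depending on DIRECTORY's order is to recover each movie ID from its file name and sort on that. A sketch under the assumption that every training file is named mv_NNNNNNN.txt with a 7-digit zero-padded ID, as MAKE-FILENAME-FOR-MOVIE produces; MOVIE-ID-FROM-PATHNAME and SORTED-MOVIE-FILES are hypothetical names:

```lisp
;; Hypothetical helpers; assume names of the form mv_NNNNNNN.txt.

(defun movie-id-from-pathname (path)
  "Parse the movie ID from a name such as mv_0000026.txt."
  ;; PATHNAME-NAME drops the .txt type; skip the 3-character "mv_" prefix.
  (parse-integer (pathname-name path) :start 3))

(defun sorted-movie-files (dir)
  "List the mv_*.txt files in DIR (a directory pathname ending in /),
sorted by movie ID instead of in whatever order DIRECTORY returns."
  (sort (directory (merge-pathnames "mv_*.txt" dir))
        #'< :key #'movie-id-from-pathname))
```

Generating the names directly, as above, skips the directory scan altogether; sorting the scan is mainly useful if some IDs might be missing.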