Re: Collecting like-labelled sublists of a list
- From: tar@xxxxxxxxxxxxx (Thomas A. Russ)
- Date: 30 Jul 2008 16:10:24 -0700
Cortez <relativeflux@xxxxxxxxxxxxx> writes:
Hi Volkan,
Well a 1MB SDIF file might contain about 3000 partials, so that's 3000
labels. That's about the size of file I typically work with. But the
situation is complicated by the fact that within the file itself these
partials are distributed over a set of time-ordered matrices, which
have to be traversed in order to extract the data. Each matrix
contains information on every partial located at that point (each
matrix also has a header). The information itself (frequency,
amplitude, phase, etc - although I'm only interested in frequency at
the moment) is typically in the form of single or double floats. For
example, here are three matrices (which I model as arrays) containing
data extracted from a single clarinet note.
The label is in the left-....
most column, followed by frequency, amplitude and other data which I
won't elaborate on here (relating to time-offset and noise content.
The time of the matrix itself is contained in the matrix header). So
you can see how the partials (0 to 6 in this example) are distributed
across the matrices -
#2A((0.0d0 1207.8944079486291d0 3.0572778826667964d-5
0.8507826862315859d0
0.9633105769790011d0 8.735621320690784d-4)
(1.0d0 2799.767798804896d0 2.9758011552006322d-5
3.45542871985445d0
0.9999134899562377d0 7.948914515372436d-4)
(2.0d0 556.677039133396d0 1.7342238369471644d-5
3.5441699661680883d0 0.0d0
0.0013469001214222072d0))
#2A((0.0d0 1203.4944446791235d0 2.7734811951076312d-5
3.7002160311387104d0
0.8199124994997562d0 4.790926227113963d-4)
(1.0d0 2660.5866387929855d0 2.5368686049752616d-5
3.5907080654740575d0
Well, it seems that one of the key questions is whether there are any
other constraints on the label that you are using. All of the values
are doubles, but it seems from the small sample given above that the
label values are all integer values.
Is this a correct assumption?
If so, and if the number of labels is reasonably bounded, then you can
take a two-pass buffering approach that should be reasonably efficient.
Is it reasonable to assume that the number of items you will need to
collect for each label is relatively small? If so, then a simple
solution to accumulating the values will work. Otherwise, something
more complicated will be needed.
Is it also the case that the labels will be in ascending order without
any missing values?
So the original function I posted was
a model (admittedly simplistic) for the kind of thing I want to do
instead, which is to extract the partials as I'm actually reading in
the bytes.
Based on some guesses to the questions above, namely that the labels are
non-negative integers, not many total values per label, a bounded number
of labels and consecutive labels, I created the following. Actually,
consecutive is not a strict requirement, but you might end up with some
empty entries that way.
You set up a vector of maximum size, and then just assign the incoming
values into the correct bucket. In some ways this is a short-cut of a
hash-table where we assume that the label value itself constitutes the
hash-key into our collision-free vector. [It is related to the O(n)
radix-sort routine.]
(defconstant maximum-label-value 1024) ;; Or whatever is reasonable.
(defun collate-data (data)
(let ((results (make-array (list maximum-label-value)))
(max-label -1))
(dolist (datum data)
(let ((label (truncate (first datum)))
(values (rest datum)))
(setf (aref results label)
(nconc (aref results label) values))
(setf max-label (max max-label label))))
(values results max-label)))
This will return a vector with the data in place, and an indication of
how many elements are present.
CL-USER> (setq *input* (copy-tree '((0 a b) (1 c d) (2 e f) (3 g h)
(1 i j) (2 k l) (4 m n) (2 o p)
(4 q r) (5 s t))))
((0 A B) (1 C D) (2 E F) (3 G H) (1 I J) (2 K L) (4 M N) (2 O P) (4 Q R) (5 S T))
CL-USER> (time (collate-data *input*))
; cpu time (non-gc) 0 msec user, 0 msec system
; cpu time (gc) 0 msec user, 0 msec system
; cpu time (total) 0 msec user, 0 msec system
; real time 0 msec
; space allocation:
; 1 cons cell, 4,112 other bytes, 0 static bytes
#((A B) (C D I J) (E F K L O P) (G H) (M N Q R) (S T) NIL NIL NIL NIL ...)
5
BTW, more info on SDIF can be found at:
http://recherche.ircam.fr/equipes/analyse-synthese/sdif/
Wow. Looks complicated....
--
Thomas A. Russ, USC/Information Sciences Institute
.
- Follow-Ups:
- Re: Collecting like-labelled sublists of a list
- From: Cortez
- Re: Collecting like-labelled sublists of a list
- References:
- Collecting like-labelled sublists of a list
- From: Cortez
- Re: Collecting like-labelled sublists of a list
- From: Cortez
- Re: Collecting like-labelled sublists of a list
- From: Alberto Riva
- Re: Collecting like-labelled sublists of a list
- From: Cortez
- Re: Collecting like-labelled sublists of a list
- From: Alberto Riva
- Re: Collecting like-labelled sublists of a list
- From: Cortez
- Re: Collecting like-labelled sublists of a list
- From: Volkan YAZICI
- Re: Collecting like-labelled sublists of a list
- From: Cortez
- Collecting like-labelled sublists of a list
- Prev by Date: Re: paging all socket geniuses
- Next by Date: Re: Collecting like-labelled sublists of a list
- Previous by thread: Re: Collecting like-labelled sublists of a list
- Next by thread: Re: Collecting like-labelled sublists of a list
- Index(es):
Relevant Pages
|