Re: Collecting like-labelled sublists of a list



Cortez <relativeflux@xxxxxxxxxxxxx> writes:

Hi Volkan,

Well a 1MB SDIF file might contain about 3000 partials, so that's 3000
labels. That's about the size of file I typically work with. But the
situation is complicated by the fact that within the file itself these
partials are distributed over a set of time-ordered matrices, which
have to be traversed in order to extract the data. Each matrix
contains information on every partial located at that point (each
matrix also has a header). The information itself (frequency,
amplitude, phase, etc - although I'm only interested in frequency at
the moment) is typically in the form of single or double floats. For
example, here are three matrices (which I model as arrays) containing
data extracted from a single clarinet note.

The label is in the left-
most column, followed by frequency, amplitude and other data which I
won't elaborate on here (relating to time-offset and noise content.
The time of the matrix itself is contained in the matrix header). So
you can see how the partials (0 to 6 in this example) are distributed
across the matrices -

#2A((0.0d0 1207.8944079486291d0 3.0572778826667964d-5
0.8507826862315859d0
0.9633105769790011d0 8.735621320690784d-4)
(1.0d0 2799.767798804896d0 2.9758011552006322d-5
3.45542871985445d0
0.9999134899562377d0 7.948914515372436d-4)
(2.0d0 556.677039133396d0 1.7342238369471644d-5
3.5441699661680883d0 0.0d0
0.0013469001214222072d0))
#2A((0.0d0 1203.4944446791235d0 2.7734811951076312d-5
3.7002160311387104d0
0.8199124994997562d0 4.790926227113963d-4)
(1.0d0 2660.5866387929855d0 2.5368686049752616d-5
3.5907080654740575d0
....

Well, it seems that one of the key questions is whether there are any
other constraints on the label that you are using. All of the values
are doubles, but it seems from the small sample given above that the
label values are all integer values.

Is this a correct assumption?

If so, and if the number of labels is reasonably bounded, then you can
take a two-pass buffering approach that should be reasonably efficient.

Is it reasonable to assume that the number of items you will need to
collect for each label is relatively small? If so, then a simple
solution to accumulating the values will work. Otherwise, something
more complicated will be needed.

Is it also the case that the labels will be in ascending order without
any missing values?

So the original function I posted was
a model (admittedly simplistic) for the kind of thing I want to do
instead, which is to extract the partials as I'm actually reading in
the bytes.

Based on some guesses to the questions above, namely that the labels are
non-negative integers, not many total values per label, a bounded number
of labels and consecutive labels, I created the following. Actually,
consecutive is not a strict requirement, but you might end up with some
empty entries that way.

You set up a vector of maximum size, and then just assign the incoming
values into the correct bucket. In some ways this is a short-cut of a
hash-table where we assume that the label value itself constitutes the
hash-key into our collision-free vector. [It is related to the O(n)
radix-sort routine.]

(defconstant maximum-label-value 1024) ;; Or whatever is reasonable.

(defun collate-data (data)
(let ((results (make-array (list maximum-label-value)))
(max-label -1))
(dolist (datum data)
(let ((label (truncate (first datum)))
(values (rest datum)))
(setf (aref results label)
(nconc (aref results label) values))
(setf max-label (max max-label label))))
(values results max-label)))

This will return a vector with the data in place, and an indication of
how many elements are present.

CL-USER> (setq *input* (copy-tree '((0 a b) (1 c d) (2 e f) (3 g h)
(1 i j) (2 k l) (4 m n) (2 o p)
(4 q r) (5 s t))))
((0 A B) (1 C D) (2 E F) (3 G H) (1 I J) (2 K L) (4 M N) (2 O P) (4 Q R) (5 S T))

CL-USER> (time (collate-data *input*))
; cpu time (non-gc) 0 msec user, 0 msec system
; cpu time (gc) 0 msec user, 0 msec system
; cpu time (total) 0 msec user, 0 msec system
; real time 0 msec
; space allocation:
; 1 cons cell, 4,112 other bytes, 0 static bytes

#((A B) (C D I J) (E F K L O P) (G H) (M N Q R) (S T) NIL NIL NIL NIL ...)
5


BTW, more info on SDIF can be found at:
http://recherche.ircam.fr/equipes/analyse-synthese/sdif/

Wow. Looks complicated....

--
Thomas A. Russ, USC/Information Sciences Institute
.



Relevant Pages

  • Re: Collecting like-labelled sublists of a list
    ... you can see how the partials are distributed ... other constraints on the label that you are using. ... The data within each matrix has to be of the same binary type - ... Yes, I just want to extract frequency data, contained in the second ...
    (comp.lang.lisp)
  • Re: Collecting like-labelled sublists of a list
    ... you can see how the partials are distributed ... other constraints on the label that you are using. ... The data within each matrix has to be of the same binary type - ... Yes, I just want to extract frequency data, contained in the second ...
    (comp.lang.lisp)
  • Re: help with macros
    ... input data into the form and produce a label, then the following macro will ... 'Choose the label position ... label format and extract the name from the resulting macro. ... Word MVP web site http://word.mvps.org ...
    (microsoft.public.word.newusers)
  • Re: Labels in Virtuoso
    ... >> insert a label so that when I extract the layout I can access that ... >> label and measure voltage or current at that point. ... >assura or diva for layout extraction, that will depend on how the rule deck ... >You will not be allowed to insert multiple labels on the power grid, ...
    (comp.cad.cadence)
  • Re: help with macros
    ... The revised version below creates a new label document, ... The macro will then unlock and lock the form. ... that label format and extract the name from the resulting macro. ... Word MVP web site http://word.mvps.org ...
    (microsoft.public.word.newusers)