Re: Collecting like-labelled sublists of a list



On Jul 31, 12:10 am, t...@xxxxxxxxxxxxx (Thomas A. Russ) wrote:
Cortez <relativef...@xxxxxxxxxxxxx> writes:

Hi Volkan,

Well a 1MB SDIF file might contain about 3000 partials, so that's 3000
labels. That's about the size of file I typically work with. But the
situation is complicated by the fact that within the file itself these
partials are distributed over a set of time-ordered matrices, which
have to be traversed in order to extract the data. Each matrix
contains information on every partial located at that point (each
matrix also has a header). The information itself (frequency,
amplitude, phase, etc - although I'm only interested in frequency at
the moment) is typically in the form of single or double floats. For
example, here are three matrices (which I model as arrays) containing
data extracted from a single clarinet note.
The label is in the left-
most column, followed by frequency, amplitude and other data which I
won't elaborate on here (relating to time-offset and noise content.
The time of the matrix itself is contained in the matrix header). So
you can see how the partials (0 to 6 in this example) are distributed
across the matrices -

#2A((0.0d0 1207.8944079486291d0 3.0572778826667964d-5
0.8507826862315859d0
0.9633105769790011d0 8.735621320690784d-4)
(1.0d0 2799.767798804896d0 2.9758011552006322d-5
3.45542871985445d0
0.9999134899562377d0 7.948914515372436d-4)
(2.0d0 556.677039133396d0 1.7342238369471644d-5
3.5441699661680883d0 0.0d0
0.0013469001214222072d0))
#2A((0.0d0 1203.4944446791235d0 2.7734811951076312d-5
3.7002160311387104d0
0.8199124994997562d0 4.790926227113963d-4)
(1.0d0 2660.5866387929855d0 2.5368686049752616d-5
3.5907080654740575d0

...

Well, it seems that one of the key questions is whether there are any
other constraints on the label that you are using. All of the values
are doubles, but it seems from the small sample given above that the
label values are all integer values.

Is this a correct assumption?

The data within each matrix has to be of the same binary type -
they're generally either 32- or 64-bit floats.

If so, and if the number of labels is reasonably bounded, then you can
take a two-pass buffering approach that should be reasonably efficient.

Is it reasonable to assume that the number of items you will need to
collect for each label is relatively small? If so, then a simple
solution to accumulating the values will work. Otherwise, something
more complicated will be needed.

Is it also the case that the labels will be in ascending order without
any missing values?

Yes, I just want to extract frequency data, contained in the second
column. And yes again, they're in ascending order. There might be
occasional small gaps, though, when a partial disappears from the
stream of matrices only to reappear a few matrices later.

So the original function I posted was
a model (admittedly simplistic) for the kind of thing I want to do
instead, which is to extract the partials as I'm actually reading in
the bytes.

Based on some guesses to the questions above, namely that the labels are
non-negative integers, not many total values per label, a bounded number
of labels and consecutive labels, I created the following. Actually,
consecutive is not a strict requirement, but you might end up with some
empty entries that way.

You set up a vector of maximum size, and then just assign the incoming
values into the correct bucket. In some ways this is a short-cut of a
hash-table where we assume that the label value itself constitutes the
hash-key into our collision-free vector. [It is related to the O(n)
radix-sort routine.]

(defconstant maximum-label-value 1024) ;; Or whatever is reasonable.

(defun collate-data (data)
(let ((results (make-array (list maximum-label-value)))
(max-label -1))
(dolist (datum data)
(let ((label (truncate (first datum)))
(values (rest datum)))
(setf (aref results label)
(nconc (aref results label) values))
(setf max-label (max max-label label))))
(values results max-label)))

This will return a vector with the data in place, and an indication of
how many elements are present.

CL-USER> (setq *input* (copy-tree '((0 a b) (1 c d) (2 e f) (3 g h)
(1 i j) (2 k l) (4 m n) (2 o p)
(4 q r) (5 s t))))
((0 A B) (1 C D) (2 E F) (3 G H) (1 I J) (2 K L) (4 M N) (2 O P) (4 Q R) (5 S T))

CL-USER> (time (collate-data *input*))
; cpu time (non-gc) 0 msec user, 0 msec system
; cpu time (gc) 0 msec user, 0 msec system
; cpu time (total) 0 msec user, 0 msec system
; real time 0 msec
; space allocation:
; 1 cons cell, 4,112 other bytes, 0 static bytes

#((A B) (C D I J) (E F K L O P) (G H) (M N Q R) (S T) NIL NIL NIL NIL ...)
5

Thank you Thomas, that's a neat suggestion.

BTW, more info on SDIF can be found at:
http://recherche.ircam.fr/equipes/analyse-synthese/sdif/

Wow. Looks complicated....

I model the various internal SDIF structures using just a few classes
and binary types. I'm only really interested in certain types of SDIF
files, namely 1TRC and RBEP types, which contain sinusoidal track
data. As it stands my library can handle them quite well. It's mainly
because I want to extract time and frequency values and convert them
into xy coordinates (for graphical display) that I'm interested in
parsing the files in a new (and more efficient) way.

.



Relevant Pages

  • Re: Collecting like-labelled sublists of a list
    ... you can see how the partials are distributed ... other constraints on the label that you are using. ... The data within each matrix has to be of the same binary type - ... Yes, I just want to extract frequency data, contained in the second ...
    (comp.lang.lisp)
  • Re: Collecting like-labelled sublists of a list
    ... Well a 1MB SDIF file might contain about 3000 partials, ... have to be traversed in order to extract the data. ... won't elaborate on here (relating to time-offset and noise content. ... other constraints on the label that you are using. ...
    (comp.lang.lisp)
  • Re: help with macros
    ... input data into the form and produce a label, then the following macro will ... 'Choose the label position ... label format and extract the name from the resulting macro. ... Word MVP web site http://word.mvps.org ...
    (microsoft.public.word.newusers)
  • Re: Labels in Virtuoso
    ... >> insert a label so that when I extract the layout I can access that ... >> label and measure voltage or current at that point. ... >assura or diva for layout extraction, that will depend on how the rule deck ... >You will not be allowed to insert multiple labels on the power grid, ...
    (comp.cad.cadence)
  • Re: help with macros
    ... The revised version below creates a new label document, ... The macro will then unlock and lock the form. ... that label format and extract the name from the resulting macro. ... Word MVP web site http://word.mvps.org ...
    (microsoft.public.word.newusers)