Re: Dict sharing vs. duplication




On Dec 21, 7:05 pm, ZB <zbREMOVE_THIS@xxxxxxxxxxxxxxxxxxxx> wrote:
Dnia 21.12.2007 tom.rmadilo <tom.rmad...@xxxxxxxxx> napisa³/a:

As a data structure I don't find much of a use for this one. I use an
array to enforce unique keys and I use lists to enforce order. I
combine the two in a number of ways: array of lists, list of array
names, etc.

A good example would be a typical "personal database" (or just any other):

- array of lists won't be suitable for this one

- list of array names isn't comfortable: if an array has to contain personal
data (say: arr(name), arr(address), arr(phone) and so on), so although you
have to use different array names, like f.e. arr001, arr002 or similar way
- but the real index will be list index, pointing to the array name (and
not array name itself). One can even just generate array names at random,
because they have just to be different, and will not have any other meaning.
And so there's an "intermediate level" when accessing the data. Of course,
one can live with that - but it isn't convenient neither elegant.

Much easier will this be solved using dictionaries, which are kind of records
(or C-structures). For example, such way:

set tmpDict [dict create name John address Washington phone 123455678]
set dBase [list]
lappend dBase $tmpDict
set tmpDict [dict create name Mary address Chicago phone 3453453453]
lappend dBase $tmpDict
...and so on...

Such way we've got a replacement for "ordinary", integer-indexed linear
array, which holds records as data, with record fields accessible just by
field names - very common structure. You can then sort $dBase list whichever
way you want, and the list index will be real pointer to particular record..

Just my thoughts since "my" thread "Still no records". BTW: take a look at
an example http://pinds.com/2000/09/18/datastructures-in-tcl- the guy had
to use "clever naming", and it's quite interesting solution - but pay
attention, how natural solution would be just using dictionaries (not
present at that time).

Of course Mr. Pind doesn't use Tcl anymore (od'd on ruby then
crashed), so who cares? Using clever names for arrays which you will
just put into a list is moronic. Lists are ordered structures and the
only requirement for arrays is that each needs a unique name. If you
use some kind of non-opaque name for the array names that relates to
order, you instantly lose the ability to insert, delete or reorder the
elements (like lreverse). This is the typical solution you find from
developers who can't figure out how to use the data structures which
exist, basically: arrays have unique keys and lists have unique
order.

But records are not that difficult to create in Tcl. I think the
problem is that whenever something is not present in the base Tcl
language it is seen as a missing feature. Since you can write a record
API in a page or two of Tcl code, it isn't so much a missing feature
as a lack of effort by those who want it. The argument against writing
it in Tcl is always that everything needs to be fast, like as fast as
possible. Believe me, if you write it in plain ol' Tcl and people use
it, you can port it to C. But just complaining about the lack of a
data structure which can be coded up over a weekend or two doesn't
help anyone.

If you want a head-start, I'll give you my nvlist code, which is
generalized beyond the dict code in some (tiny) ways, but not quite to
the record stage. And it is interesting to note here that my nvlist
API is designed to be special-cased with thin wrappers to provide
meaningfully named operators. Trying to push too many features into
core API may seem like a good idea at first, but a thin core API has
the advantage of composition, specialization and slight renaming so
that API users will easily understand the meaning of an API, and you
will be able to search for uses of the API.

So here are some examples:

http://junom.com/gitweb/gitweb.perl?p=twsdl.git&a=search&h=HEAD&st=grep&s=nvlist

This shows a search of uses of nvlist in a larger API. Note that in
most cases here there is a relatively small list of data items, but
each has more than the simple name-value relationship implied by the
name of the structure. This is a typical complex situation which isn't
easily handled by a dictionary, which has indexes on keys and values
only. This nvlist code is built upon [lsearch] and can use any range
of list members as a key, for instance, a 5 member list {a b c d e},
could be selected based upon a search of {* b c d *}.

Okay, that is the nvlist example, how about lists of array names:

http://junom.com/gitweb/gitweb.perl?p=twt.git;a=commit;h=c22ea29e5906d75bd9ed044a677030b3d7929e3a

There are two sets of files here. One is the original pg-
browse.tcl/.adp for browsing a database using foreign keys as links
between the tables. If you are interested, the pg-browse code uses the
OpenACS database API which creates a new specialized data structure,
which can't be easily used without the other associated API.

The second set of files is pg-tmp.tcl/.tmpl Here the database API
returns a list of array names. I wrote this API several years ago and
never considered the case of appending (append) additional rows with
another query, or adding columns (extend) to each row using a loop.

The original OpenACS API has lots of switches, and if you look at the
code it is quite complex, many long files, and the result is a
completely inaccessible data structure.

The new API uses lists and arrays. You can create and modify these
using regular Tcl code, no special API. So, when I needed to port this
relatively complex sequence of queries, I was actually surprised that
my original code fully supported the 'extend' and 'append' features
with no additional code.

My main point here is that it is very important to not create problems
for yourself with new specialized data structures. I think that the
new dict structure very much follows this idea. It does so because
there is a text/string/list representation which can be created in any
number of ways, and many of the operations return values which can be
turned into lists or arrays or new dicts without any special code.
This is very important, much more important than adding a super
functional but isolated data structure.

And don't use Lars Pind as an example of good coding in Tcl, not even
close.
.



Relevant Pages

  • Re: garbage collection problem in large linked lists
    ... Instead all buckets are allocated at once in an array. ... VG.net, which must be very scalable, but only for lists which would normally ... Linked lists require pointer dereferencing in order to traverse. ... When you run out of space, you allocate a new smaller array. ...
    (microsoft.public.dotnet.framework.performance)
  • Re: How to count value in a ArrayList
    ... were adding was not already stored in the data structure. ... Doing this with a ArrayList is a lot slower than a hash ... But presumably they use a relatively common technique of having a fixed-sized array of bins that grows as the hash table accumulates items. ... Some implementations use bins that are either arrays or linked lists, while others simply use the next available spot in the hash table. ...
    (microsoft.public.dotnet.languages.csharp)
  • Re: Translating python to scheme
    ... Scheme has mutation, and there are reasons to use it. ... confusing when considering a 2d array. ... in other languages: Lisp Lists and Vectors. ...
    (comp.lang.scheme)
  • Re: vasam
    ... // Set the size for long lists in records #2 and #3 only! ... Here is the sequel of my previous post "Variable array sizes as members" ... I was given some advice on how to use malloc and realloc in oder to achieve ...
    (microsoft.public.vc.language)
  • Re: Approx mode creates largedr matricies
    ... real/complex "array" types ... In the "list (of lists)" alternate storage type, ... no ROM address is ever used, ... which started with the HP48S series) ...
    (comp.sys.hp48)