Re: What is best term for data structure etc. for key-to-value lookups?



In Java, the general term for mapping key to value is the interface
"Map", which can be implemented as HashMap or TreeMap etc. The
single term implies both the data structure and the accompanying
methods for changing the contents and performing a lookup.
Unfortunately the word "map" means something entirely different
outside C++ and Java:
- Lisp: Applying function to each element of a sequence.
- Geography: Diagram showing where places are in relation to each other.
From: Logan Shaw <lshaw-usenet@xxxxxxxxxxxxx>
All these meanings are variations the same meaning of the term "map":
having two sets of items and some sort of relation between the items in
the sets that defines which elements in one set correspond to which
elements in the other set.

Except in Java the word means a data store and algorithm that
produces a single value per key request, whereas the Lisp meaning
is all data processed en masse in a particular sequence, no random
access at all, and in geography it's a gestalt view where the
*user* has the burden of actually doing the lookups manually by
visual processing, and where a whole lot of relationships between
different items at the same level are displayed simultaeously.

In Java and C++ (and Perl and awk and ...), the data structure
contains both sets and it also contains the definition of the
relationship between them.

The *range* isn't a set, it's a multi-set, in the sense that
different keys can happen to produce identical values, so the
values occur more than once in effect.

However it occurs to me that "relationship", as in "relational
database", or mathematics, is the correct general concept, and then
specifically a many-to-one relationship is what "map" means. In
RDBS jargon, there are two columns, left and right, and the left
column is a unique key. The major difference is that in a RDBS both
the direct single-result function and the reverse multiple-valued
function are provided, whereas in general a "map" doesn't provide
the reverse mvf at all.

In Lisp, the term "map" is used as a verb, but it is still used
in this sense, because you have an input list and an output list.

You don't actually have an output list, except when the mapping
function is MAPCAR or it's MAP with type-of-output 'LIST. And you
don't have it except immediately after the mapping is performed.
And even after you have it, you can't do a key lookup to get the
value. The key-to-value lookup isn't in the mapping function, it's
in the function that it applied to elements in the list.

Both of the lists can be viewed as sets, and the function you're
applying is the definition of the relationship.

Yes, but like I said, that function being applied isn't the mapping
function, rather the test rig that applies it to the test suite is
the mapping function. So the name MAP mismatches where the map
really is, hence the word "map" is misdirecting.

In geography, one set is physical (usually two-dimensional)
space, and the other set is some sort of quality or quantity or
attribute associated with that point in space. For example, in
an elevation map, the set of points in 2-D space has a
relationship with elevation.

As a dataset with lookup capabilities, such as a raster in RAM, yes
it works that way, but as a map is usually shown on paper, lookup
is done by the human, not by the map.

Other terms are:
- "associative array" (seems to be the preferred term in WikiPedia,
<http://en.wikipedia.org/wiki/Associative_array>
but I don't like it because it seems to imply that it's implemented
to look like an array)
I don't mind this term at all. I just look at this as a definition by
analogy. Everyone knows (well, programmers do at least) what an array is,
and a traditional array has the restriction that indexes must be integers.
An associative array behaves like an array but opens up the restriction
on the keys to allow any set on which you can compare any two elements
for equality. (Or at least something like that. Some implementations
only allow strings as keys. But the interesting part is that it allows
certain things which don't satisfy all the properties of integers.)

Yes, that's the origin of the term. But look how many lines it took
you to explain that rationale. I was hoping for a term that
wouldn't need such an explanation.

- "hash" (unfortuately seems to imply implementation as a hashtable)
Yes, it is a bit of a misnomer because it does imply a certain
implementation. So it's inaccurate,

Yes, my point.

but on the other hand, it is probably a pretty good term if your
goal is to get the point across.

Further point against it: Hash actually means two different things:
- The full process of computing a pseudo-random deterministic
function of the key, then reducing it modulo the number of
primary buckets to yield a first-attempt bucket number,
then resolving conflicts to yield a single bucket containing the
key (or empty so the key is immediately put in there), in a standard
hash table, or alternately sticking with that original bucket
but searching down an alist to find the key in a list of keys
all in the same primary bucket, so the actual bucket is in the
alist rather than in the primary table.
- What I nicknamed a "pre-hash", just computing the pseudo-random
deterministic function, without any of the rest of the work in
locating a specific bucket in a specific table. As the size of
the table changes because it filled up, the pre-hash might not
change at all even while the buckets are grossly scrambled.
Note that even after you hash the key and find the bucket, per the
first option there, still it takes one more memory access before
you have the value. So neither sense of "hash" actually provides
the value already, just the cell (in first sense) or 32-bit number
(in second sense).

So I argue that you really aren't getting the point across.

Now "hashtable" is presumed to do all of that, plus the actual
fetch of the value, per the "gethash(key)" function.
So what really captures the concept is the hashtable and gethash
function together. Given the other problem with forcing an
implementation, I reject the term here.

Language often endorses (and canonicalizes as standard usage)
terms that are not accurate but have a close relationship to the
actual concept desired. For example, the phrase "the crown" is
often used to mean "the king" (or "the queen", or royalty in
general) even though a crown itself is just a physical object.

I agree as to the state of archaic language misnomers, but I'm
opposed to deliberately introducing further examples of misnomers.
For example, common usage is that whales are fish and bats are
birds, but I reject the idea that we should deliberate persist that
usage.

Does anybody have a better term to suggest to keep it
language-neutral (not specific to Java or Perl or PHP) and not
confusing/misleading to novices? I'm thinking of "associative map",
but I'm not totally comfortable with it.
"Set of key/value pairs" is unambiguous and easily understood, but
it's a bit wordy.

Even abbreviated to SKVP (unpronouceable) or Sokevap (pronounceable
but slightly long, but similar enough to Sokoban and Pokeman that
it might have a chance of catching on). But even so, just "set"
without a left-to-right function seems to beg the question.

I have some other new ideas for the jargon:
- Key-Value (KV) collection
- Lookup collection
The basic idea is to emphasize two things, that it's a collection
of data rather than a general function that could potentially
compute a result from gazillions of possible inputs, it can compute
the result only for the data that has already been collected, and
that there's a lookup (key to value) function inherent in the
thingy, it's not *just* a collection without also a lookup method.

But I'm not totally comfortable with those new ideas either.
Maybe I should "punt" by taking full advantage of online Web-based
HTML documents by using a [selection list] instead of a fixed
wording in the document. [dropdown menu ]
[select options]
.



Relevant Pages

  • RE: Where to do a SQL Lookup: map or orchestration
    ... I have a question about where to handle a SQL Adapter lookup. ... customer part number and the CustomerId as criteria. ... Is this kind of thing best handled in a map using a (Database Lookup ...
    (microsoft.public.biztalk.general)
  • Re: Using java.util.map
    ... You perform a lookup and an insertion. ... these operations degenerate each element in the map would have been visited. ... create an extra pairobject for each entry), and the Java way is dirty. ... problems understanding simple for loops of the sort ...
    (comp.lang.java.advocacy)
  • LOOK AT ME WILL JOHNSON N MA SOCK PUPPETS IN SANTA CRUZ CA USING CRUZIO AT Lookup IP Address: 63.249
    ... LOOK AT ME WILL JOHNSON N MA SOCK PUPPETS IN SANTA CRUZ CA USING ... CRUZIO AT MAP OF Lookup IP Address: ...
    (soc.genealogy.medieval)
  • RE: Updategram inserts when DB lookup fails
    ... using a database Error Return functoid in your map is definitely an ... an XML format, then you can use the XML EXPLICIT option(giving you complete ... Orchestration and retry. ... > In my map I do a DB lookup to decide whether I should create an insert or an ...
    (microsoft.public.biztalk.general)
  • Re: Impl of a list of key/value pairs (and more ?)
    ... overkill if you just need index look-up. ... Array is indexed by non-negative integers, and there supposed to be no ... "array as a map" works in a very-very narrow ... Tree map allows index range scan lookup. ...
    (comp.lang.java.programmer)