Re: Suggestions for double-hashing scheme



On 2005-06-21, Tim Rentsch <txr@xxxxxxxxxxxxxxxxxxx> wrote:
> Sorry for not responding to this earlier.

No problem!

> The items that are being moved are the items in the hash table itself,
> which are of fixed size (they are in an array, after all). If you want
> to hash variable-sized data, they need to be stored separately in any
> case. The hash table array would store, eg, a pointer to the memory
> holding the element being hashed.
>
> Make sense?

Well, just to make sure we're talking about the same thing, I am planning
on allocating enough space to store data elements of any size. So, my
struct is going to look something like:

typedef struct {
char *data; /* = malloc(length * width) */
unsigned hi; /* watermark for upsize */
unsigned lo; /* watermark for downsize */
unsigned size; /* number of elements in the hash */
unsigned width; /* size per object */
unsigned length; /* the current hash capacity */
struct {
unsigned status; /* { <probes 31:1>, <valid-bit 1:1> } */
unsigned hash_val0; /* calculated initial hash value */
unsigned hash_val1; /* calculated increment hash value */
} *entries; /* = malloc(length * sizeof(entry)) */
} hash_t;

It doesn't really make sense for me to store the actual data array and also
store void pointers to all of them. I'm already paying a penalty for
storing actual data since I have to store the width (size per item).
Additionally storing pointers to them would make me incur the charge twice.

So, where you just were able to have a void pointer on the stack, I'll need
to allocate space for a single item and copy it out of the hash. This
prevents the need for a recursive insert routine as you suggested.

One of the reasons I'm playing around with double hashing is the ability to
keep all the data items together. I'm curious how well it performs
compared to chaining. One advantage is that data locality is preserved,
since it's in the same array, rather than being scattered all about in
memory like you would with chain-links.

The downside? Well, the most obvious is that data movement means we copy
actual data using memcpy() rather than swapping pointers. It also means
that references to items in the hash will not be valid after any operation
that could result in a resize or reshuffle of items.

Foo myfoo;
hash_t table = table_create(..., sizeof(Foo), ...);

hash_put(table, &somestuff);
hash_put(table, &morestuff);

hash_get(table, &foo); /* if second param is not NULL, copy the data into
* it in case the pointer to the returned data
* gets moved around later */

-Clint
.



Relevant Pages

  • Re: advice on good perl idiom
    ... Therefore you want to store that result in an auxiliary variable ... $bestV = check; ... using an array, I feel like I should be doing something with a hash. ...
    (comp.lang.perl.misc)
  • Re: How to create a 10000000000 size array ?
    ... > I just want a class of array which does not use MFC. ... If your pointers are the same size as the minimum addressable unit, ... You have 4 billion memory locations, but only room for 1 billion ... Do you really need to store these pointers at all, ...
    (microsoft.public.vc.language)
  • Re: array of pointers
    ... > I am using an array of pointers in FORTRAN 90 where those points store ... > array of pointers is neccesary more RAM memory to store these datas. ...
    (comp.lang.fortran)
  • Re: array of pointers
    ... | array of pointers is neccesary more RAM memory to store these datas. ... Visual Fortran pointer descriptor format (on 32-bit ...
    (comp.lang.fortran)
  • Re: segmentation fault writing to array elements in structure
    ... And here you try to access it as a '2-level array', that is, a pointer ... to a series of pointers. ... allocate space for NR pointers (to pointer to element, ... store each of those in data; ...
    (comp.lang.c)