Re: Suggestions for double-hashing scheme
- From: Clint Olsen <clint@xxxxxxxxx>
- Date: Tue, 21 Jun 2005 15:41:00 -0500
On 2005-06-21, Tim Rentsch <txr@xxxxxxxxxxxxxxxxxxx> wrote:
> Sorry for not responding to this earlier.
No problem!
> The items that are being moved are the items in the hash table itself,
> which are of fixed size (they are in an array, after all). If you want
> to hash variable-sized data, they need to be stored separately in any
> case. The hash table array would store, eg, a pointer to the memory
> holding the element being hashed.
>
> Make sense?
Well, just to make sure we're talking about the same thing, I am planning
on allocating enough space to store data elements of any size. So, my
struct is going to look something like:
typedef struct {
char *data; /* = malloc(length * width) */
unsigned hi; /* watermark for upsize */
unsigned lo; /* watermark for downsize */
unsigned size; /* number of elements in the hash */
unsigned width; /* size per object */
unsigned length; /* the current hash capacity */
struct {
unsigned status; /* { <probes 31:1>, <valid-bit 1:1> } */
unsigned hash_val0; /* calculated initial hash value */
unsigned hash_val1; /* calculated increment hash value */
} *entries; /* = malloc(length * sizeof(entry)) */
} hash_t;
It doesn't really make sense for me to store the actual data array and also
store void pointers to all of them. I'm already paying a penalty for
storing actual data since I have to store the width (size per item).
Additionally storing pointers to them would make me incur the charge twice.
So, where you just were able to have a void pointer on the stack, I'll need
to allocate space for a single item and copy it out of the hash. This
prevents the need for a recursive insert routine as you suggested.
One of the reasons I'm playing around with double hashing is the ability to
keep all the data items together. I'm curious how well it performs
compared to chaining. One advantage is that data locality is preserved,
since it's in the same array, rather than being scattered all about in
memory like you would with chain-links.
The downside? Well, the most obvious is that data movement means we copy
actual data using memcpy() rather than swapping pointers. It also means
that references to items in the hash will not be valid after any operation
that could result in a resize or reshuffle of items.
Foo myfoo;
hash_t table = table_create(..., sizeof(Foo), ...);
hash_put(table, &somestuff);
hash_put(table, &morestuff);
hash_get(table, &foo); /* if second param is not NULL, copy the data into
* it in case the pointer to the returned data
* gets moved around later */
-Clint
.
- Follow-Ups:
- Re: Suggestions for double-hashing scheme
- From: Tim Rentsch
- Re: Suggestions for double-hashing scheme
- References:
- Re: Suggestions for double-hashing scheme
- From: Clint Olsen
- Re: Suggestions for double-hashing scheme
- From: Tim Rentsch
- Re: Suggestions for double-hashing scheme
- From: Clint Olsen
- Re: Suggestions for double-hashing scheme
- From: Tim Rentsch
- Re: Suggestions for double-hashing scheme
- From: Clint Olsen
- Re: Suggestions for double-hashing scheme
- From: Tim Rentsch
- Re: Suggestions for double-hashing scheme
- From: Clint Olsen
- Re: Suggestions for double-hashing scheme
- From: Tim Rentsch
- Re: Suggestions for double-hashing scheme
- Prev by Date: Re: Which programming language is better to start
- Next by Date: Re: Which programming language is better to start
- Previous by thread: Re: Suggestions for double-hashing scheme
- Next by thread: Re: Suggestions for double-hashing scheme
- Index(es):
Relevant Pages
|