Re: Hash table in C++? STL?
From: Siemel Naran (SiemelNaran_at_REMOVE.att.net)
Date: 04/16/04
- Next message: Siemel Naran: "Re: Hash table in C++? STL?"
- Previous message: tom_usenet: "Re: Problem with Cast"
- In reply to: pembed2003: "Re: Hash table in C++? STL?"
- Next in thread: pembed2003: "Re: Hash table in C++? STL?"
- Reply: pembed2003: "Re: Hash table in C++? STL?"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Date: Fri, 16 Apr 2004 08:44:07 GMT
"pembed2003" <pembed2003@yahoo.com> wrote in message
> "Siemel Naran" <SiemelNaran@REMOVE.att.net> wrote in message
news:<wfCfc.34540
> > > #define SIZE ((sizeof(int)) * (INT16_MAX) * (2))
Prefer to use const size_t instead of #define.
> > > void find(int* one,int c1,int* two,int c2){
> > > int i;
> > > int* buffer = malloc(SIZE);
> > > memset((void*)buffer,0,SIZE);
> > > for(i = 0; i < c1; i++)
> > > buffer[one[i] + INT16_MAX] = 1;
> > > for(i = 0; i < c2; i++)
> > > if(buffer[two[i] + INT16_MAX])
> > > printf("%d\n",two[i]);
> > > free(buffer);
> > > }
> > > this gives me a O(n) algr. but seems to be wasting a lot of memory so
> > > I am thinking using a hashtable instead. am i in the right track?
> > > Thanks!
> >
> > If the array size 'a' or 'one' is small then the O(N^2) algorithm is
fine.
>
> Is my solution a O(N^2) solution? I think it's a O(N) right?
It is O(N). Sorry for confusing. The O(N^2) algorithm is when you scan all
the chars in 'one', and for each scan all the chars in 'two'.
> Becasue...
>
> Assume array a and b both have N elements, the program loops through
> both array exactly once, so we have:
>
> for(i = 0; i < N; i++) // O(N)
> array_one[i] = 1;
> for(i = 0; i < N; i++) // O(N)
> does array_two[i] in array_one[i]
>
> N + N = 2N
>
> which is a O(N)
Right.
> > Either loop over the chars in 'one' or 'two', then over the chars in the
> > other loop. If you want to use STL, you can use std::count_if and
supply a
> > binary predicate that employs std::find (the binary predicate will have
a
> > constructor that takes the array 'one' as an argument).
>
> Will a binary search beat a hash lookup? I know this has no exact
> answer but generally speaking will a binary search be faster than a
> hash lookup?
Guessing hash lookup is faster in general. It's worth running an experiment
to see for your case. Who knows, maybe the hash function is too complicated
to compute and the sorted array is faster. It's pretty easy to all methods
given you have an implementation of hash_map available. If you can, post
your results if you do the experiment, because it's nice to know. (State
the size of N, result using: original algorithm, hash_map, std::map,
binary_search, N^2 method, double sort).
> > Note you can also sort the elements in 'one' and 'two', then loop over
both
> > containers simultaneously. Running time O(lg(N)) for the sort if using
> > std::sort which normally uses quicksort, and O(N) for the scan. I don't
> > think the standard provides such an algorithm though.
>
> But this solution requires:
>
> 1. Sort one array with best case O(N*log N) using a quicksort
> 2. Loop through another array lookup for dups which is a O(N)
>
> I think it goes like:
>
> sort(array_one); // O(N*log N)
> for(i = 0; i < N; i++) // O(N)
> look for array_two[i] in array_one
>
> which is a O(N*log N) algr. It doesn't look like it will beat my O(N)
> algr. Am I right?
Yes, I think your O(N) algorithm is probably best, but it consumes lots of
space. Nevertheless, it is possible that the binary search method is faster
than hash_table.
Also, you can sort both arrays, and walk through each simultaneously in
linear time.
> After I discovered map, I am actually doing:
>
> void find(int* a,int c1,int* b,int c2){
> std::map<int,int> m;
> for(int i = 0; i < c1; i++)
> m[a[i]] = 1;
> for(int j = 0; j < c2; j++)
> if(m[b[j]])
> std::cout<<b[j]<<std::endl;
> }
>
> What do you think about that? Thanks!
Looks right. You can use std::set too, which in principle should be a tad
faster. How fast does it run? Also note that a node in a map is pretty
expensive because you have to allocate space for one parent and two child
pointers, and if sizeof(int)==sizeof(node*), this is 4 times the memory per
node, plus the cost of maintaining the pointers. I'm willing to bet you a
$1 that sorting 'a' with std::sort and using std::binary_search instead is
faster than map and uses less memory.
You can replace std::map with std::hash_map if you are using STL. Don't
know about the other implementations, probably something similar.
original algorithm:
What you originally wrote
Still not sure why array size is double, why the multiply by 2 at the end?
const size_t SIZE = ((sizeof(int)) * (INT16_MAX) * (2))
std::map:
The above algorithm
hash_map (probably largest:
The above algorithm with std::map replaced by hash_map. Maybe there is a
function to set the bucket size that you can try to tweak for optimal
performance?
binary_search:
Similar to the above algorithm.
void find(int* a,int c1,int* b,int c2){
std::sort(a, a+c1);
for(int j = 0; j < c2; j++)
if(std::binary_search(a, a+c1, b[j]))
std::cout<<b[j]<<std::endl;
}
N^2 method:
void find(int* a,int c1,int* b,int c2){
for(int j = 0; j < c2; j++)
if(std::find(a, a+c1, b[j]))
std::cout<<b[j]<<std::endl;
}
double sort:
void find(int* a,int c1,int* b,int c2){
std::sort(a, a+c1);
std::sort(b, b+c1);
int i =0, j = 0;
while (true) {
if (i==c1 || j==c2) break;
const int ai = a[i];
const int bj = b[j];
if (ai < bj) ++i;
else if (ai > bj) ++j;
else {
std::cout<<bj<<std::endl;
++i;
++j;
}
}
}
It's 1:30am, so I don't claim the above is error free!
- Next message: Siemel Naran: "Re: Hash table in C++? STL?"
- Previous message: tom_usenet: "Re: Problem with Cast"
- In reply to: pembed2003: "Re: Hash table in C++? STL?"
- Next in thread: pembed2003: "Re: Hash table in C++? STL?"
- Reply: pembed2003: "Re: Hash table in C++? STL?"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Relevant Pages
|