Synchronization routine
- From: "Joris Dobbelsteen" <joris.dobbelsteen@xxxxxxxx>
- Date: Sun, 30 Oct 2005 21:30:48 +0100
I'm looking into a writitng a efficient routine and am looking for different
views of the problem:
Basically I have two lists, one on my firewall and one in a database. The
list on the firewall must become the same as the list in the database. The
current sizes are over 1 million entries and I expect to see 2 million rows
in the future.
Unfortuanlly the firewall vendor (Microsoft) made loading that list very
very slow when it gets large enough. Loading this list will take several
days on a very decent machine. Unfortuanlly I have only a slow system for
todays standards (would be top notch 6-7 years ago) but for the purpose.
So I have:
from the firewall:
Set A, which I can randomly access (though slowly)
from the database:
Set B, which is ordered, but can only be traversed sequentially once(!).
I expect the sets to scale up to 2 million elements.
The results should be:
C = A / B = all elements from A that are not in B
D = B / A = all elements from B that are not in A
I do not update set A until after retreiving the results. I have absolutely
no clue how they remove or add an enty to the list. So I don't want to jump
to make any assumptions in this regard.
Of course this must be done with only very limited computing power and I
mind the memory use.
My current solutions to the issue were:
Under the assumption that set A was ordered I used a 'merge join' which was
very fast. Unfortunally the assumption was flawed.
Using a sort routine has not yet worked out. I used quicksort, but I don't
believe its the correct solution. The entries in A are largely ordered so
quicksort wasn't too efficient and finally caused a stack overflow.
Next is using a 'hash join'. In my previous (probably inefficient)
implementation I saw memory usage reach over 100 MB for only 40000 entries.
Maybe anyone has a good suggestion which direction I should look for?
I thought it is a very nice problem, but hard to get it right.
- Joris
.
- Follow-Ups:
- Re: Synchronization routine
- From: Ertugrul Soeylemez
- Re: Synchronization routine
- From: Willem
- Re: Synchronization routine
- Prev by Date: Re: Is xml overhyped?
- Next by Date: bug in windows getsavefilename api?
- Previous by thread: Javaholics
- Next by thread: Re: Synchronization routine
- Index(es):
Relevant Pages
|