Re: Extremely large Hashtable



"Eric Sosman" <eric.sosman@xxxxxxx> wrote in message
news:d5g9sf$9sn$1@xxxxxxxxxxxxxxxxxxxxxxxxxxx
> anthony@xxxxxxxxxxxx wrote:
>> Has anyone created an extremely large Hashtable? I need to create a
>> simple look up table key/value of some information.
>>
>> I'm assuming that if it is in memory it will be faster than looking
>> this value up in a database.
>>
>> The problem is I have about 4,000,000 rows of data. Assuming I have
>> enough memory to hold it, will a hashtable break being that large? Will
>> it be fast? Each row will only be about 10k in size.
>
> Four megarows times ten kilobytes per row equals forty
> gigabytes. Are you sure you have that much RAM available?
> If you don't, you'll just trade database I/O for swap I/O.

I believe the OP was talking about keeping the hashtable in memory, not the
entire database.

> Where does this 40GB come from? If you can read it at
> 100 MB/s (from a well-tuned RAID stripe, say), it'll take
> about seven minutes to pull in all the data and load the
> table. Divide this seven minutes by the expected savings
> per query to get the number of queries you must process in
> order to recoup the startup time.

Right. Bear in mind that a binary search for a record in such a database, at a
load of 75%, takes about 39 disk accesses, each followed by a comparison of two
10 kB blocks. Subtract from this about 4/3 disk accesses and one hash code
computation to get the expected savings.
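
To make the break-even arithmetic concrete, here is a rough sketch in Java. The
class name LookupSavings, its method names, and all timings (10 ms per disk
access, 0.01 ms per compare or hash) are assumptions made up for illustration;
only the 39 probes, the 4/3 accesses, and the 40 GB at 100 MB/s come from this
thread. Plug in your own measurements.

```java
// Back-of-the-envelope model of the savings per query; not a benchmark.
final class LookupSavings {

    /** Cost in ms of a search needing `probes` disk accesses, each followed by one key compare. */
    static double searchCostMs(double probes, double diskAccessMs, double compareMs) {
        return probes * (diskAccessMs + compareMs);
    }

    /** Cost in ms of a hashed lookup: the expected disk accesses plus one hash computation. */
    static double hashedCostMs(double expectedAccesses, double diskAccessMs, double hashMs) {
        return expectedAccesses * diskAccessMs + hashMs;
    }

    /** Number of queries needed before the one-off load time is repaid. */
    static long breakEvenQueries(double loadSeconds, double savingsMsPerQuery) {
        return (long) Math.ceil(loadSeconds * 1000.0 / savingsMsPerQuery);
    }

    public static void main(String[] args) {
        double diskAccessMs = 10.0;  // assumed average disk access time
        double compareMs = 0.01;     // assumed cost of comparing two 10 kB blocks
        double hashMs = 0.01;        // assumed cost of one hash code computation

        // 40 GB read at 100 MB/s, as in the quoted estimate.
        double loadSeconds = 40e9 / 100e6;

        double savingsMs = searchCostMs(39, diskAccessMs, compareMs)
                - hashedCostMs(4.0 / 3.0, diskAccessMs, hashMs);

        System.out.printf("load time: %.0f s%n", loadSeconds);
        System.out.printf("savings per query: %.1f ms%n", savingsMs);
        System.out.printf("break-even after %d queries%n",
                breakEvenQueries(loadSeconds, savingsMs));
    }
}
```

With timings like these the disk accesses dominate, so the compare and hash
costs barely move the result; the number of probes and the access time are the
inputs worth measuring.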

> Note that the 10 KB row size is irrelevant to the table's
> performance (unless it means that the keys' equals() and
> hashCode() methods are slow). The table contains only the
> references to the objects (that's another 7 GB, at a guess),
> not the objects' data fields.

Four million times 8 bytes (4-byte hashcode, 4-byte index) at a load of 75% is
still only about 43 MB. What is your guess based on?
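
The footprint estimate can be checked with a few lines of Java. This is a
sketch under the assumptions stated above: a flat table whose slots each hold a
4-byte hashcode and a 4-byte index, sized so that four million entries fill it
to 75%. The class and method names are made up for illustration. A real
java.util.Hashtable keeps a separate entry object per mapping, so its footprint
would be larger than this.

```java
// Rough size of a flat hash index; slot layout and names are assumptions.
final class TableFootprint {

    /** Slots needed so that `entries` fill the table to the given load factor. */
    static long slotsFor(long entries, double loadFactor) {
        return (long) Math.ceil(entries / loadFactor);
    }

    /** Total bytes for the table, given a fixed number of bytes per slot. */
    static long tableBytes(long entries, double loadFactor, int bytesPerSlot) {
        return slotsFor(entries, loadFactor) * bytesPerSlot;
    }

    public static void main(String[] args) {
        long entries = 4_000_000L;
        int bytesPerSlot = 4 + 4;  // 4-byte hashcode + 4-byte index

        long bytes = tableBytes(entries, 0.75, bytesPerSlot);
        System.out.printf("slots: %d%n", slotsFor(entries, 0.75));
        System.out.printf("table size: %.1f MB%n", bytes / 1e6);
    }
}
```

The occupied slots alone come to 32 MB; dividing by the 75% load gives roughly
43 MB for the whole table, which is far below the gigabyte range either way.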

