Re: Extreamly large Hashtable





Boudewijn Dijkstra wrote:
> "Eric Sosman" <eric.sosman@xxxxxxx> schreef in bericht
> news:d5g9sf$9sn$1@xxxxxxxxxxxxxxxxxxxxxxxxxxx
>
>>anthony@xxxxxxxxxxxx wrote:
>>
>>>Has anyone created an extreamly large Hashtable? I need to create a
>>>simple look up table key/value of some information.
>>>
>>>I'm assuming that if it is in memory it will be faster then looking
>>>this value up in a database.
>>>
>>>The problem is I have about 4,000,000 rows of data. Assuming I have
>>>enough memory to hold it, will a hashtable break being that large? Will
>>>it be fast? Each row will only be about 10k in size.
>>
>> Four megarows times ten kilobytes per row equals forty
>>gigabytes. Are you sure you have that much RAM available?
>>If you don't, you'll just trade database I/O for swap I/O.
>
>
> I believe the OP was talking about keeping the hashtable in memory, not the
> entire database.

That's possible; the O.P. has not divulged much about
what he's trying to do. I thought he intended to keep the
whole thing in memory because he specifically mentioned the
row size; if he were intending just to store the keys and
a disk pointer he'd have talked about the key size. Also,
he seemed to have a speed fetish, and speed is incompatible
with I/O ...

>> Note that the 10 KB row size is irrelevant to the table's
>>performance (unless it means that the keys' equals() and
>>hashCode() methods are slow). The table contains only the
>>references to the objects (that's another 7 GB, at a guess),
>>not the objects' data fields.
>
> Four million times 8 bytes (4-byte hashcode, 4-byte index) at a load of 75% is
> still less than 40 MB. Where is your guess based on?

I may have guessed wrong; let me try again. The Hashtable
itself contains four million references to Map.Entry objects;
in a 64-bit JVM (on the load-all-the-data assumption) they'll
take eight bytes apiece for 32 MB.

Each Map.Entry object holds references to a key and a value
(sixteen bytes) plus some amount of overhead to express its own
"objectivity" -- I don't know how much that is, but let's assume
it's something like sixteen bytes. That's thirty-two bytes for
each of the four million Map.Entry objects, or 128 MB.

Grand total for metadata: about 160 MB, "just a trifle" less
than my original guess. Even if the Map.Entry overhead is more
like thirty-two bytes and even if the Hashtable itself stores
more than just the Map.Entry reference, the total won't come
close to my 7 GB blunder ... (Memo to self: Stop trying to do
back-of-the-envelope calculations on envelope-less E-mail ;-)

--
Eric.Sosman@xxxxxxx


.



Relevant Pages

  • Re: whats better way to store a million keys in mem?
    ... If you *must* keep track of users by keeping user data in memory, ... If we assume an overhead of 8 bytes per object and 4 byte references, ... HashMap will use /at least/ 28 bytes per entry and that does not ... overhead, but you use a lot less memory and I guess it will also be ...
    (comp.lang.java.programmer)
  • Re: [patch 1/6] mmu_notifier: Core code
    ... external references to pages managed by the Linux kernel. ... access memory managed by the Linux kernel. ... The MMU notifier will notify the device driver that subscribes to such ...
    (Linux-Kernel)
  • Re: [PHP] Odd PHP memory issue
    ... all references to the result variable are unset when I ... to see if the memory is getting chewed up in a straight line or if it ... Scope seems like it should be simple, ... posting this question was to find out more about how PHP internals work, ...
    (php.general)
  • Re: Writing huge Sets() to disk
    ... that this code takes a lot of extra memory. ... > I believe it's the references problem, ... It's a bit unfortunate that all those instance variables are global to ... it merely aggregates it for use in storing new Python objects. ...
    (comp.lang.python)
  • Re: How to track memory usage?
    ... the IE memory leak issue is real, ... And JScript in IE allows the garbage collector to be explicitly ... but then removes all references to that object ... "Break Circular References" - examines all DIVs and removes ...
    (comp.lang.javascript)