Re: Large Dictionaries
- From: Claudio Grondi <claudio.grondi@xxxxxxxxxx>
- Date: Thu, 18 May 2006 11:55:11 +0200
Chris Foote wrote:
> Claudio Grondi wrote:
>> Chris Foote wrote:
>>> Klaas wrote:
>>>>> 22.2s 20m25s[3]
>>>> 20m to insert 1m keys? You are doing something wrong.
>>> I've put together some simplified test code, but the bsddb
>>> module gives 11m for 1M keys:
>> I have run your code for the bsddb on my P4 2.8 GHz and have got:
>> Number generator test for 1000000 number ranges
>> with a maximum of 3 wildcard digits.
>> Wed May 17 16:34:06 2006 dictionary population started
>> Wed May 17 16:34:14 2006 dictionary population stopped, duration 8.4s
>> Wed May 17 16:34:14 2006 StorageBerkeleyDB population started
>> Wed May 17 16:35:59 2006 StorageBerkeleyDB population stopped, duration 104.3s
>> Surprising here that the dictionary population takes the same time, but BerkeleyDB inserts the records 6 times faster on my computer than on yours. I am running Python 2.4.2 on Windows XP SP2, and you?
> Fedora core 5 with ext3 filesystem. The difference will be due to
> the way that Windows buffers writes for the filesystem you're using
> (it sounds like you're using a FAT-based file system).
OK, according to the Windows Task Manager, the Python process reads and writes around 7 GByte(!) of data to the file system during the BerkeleyDB test, and the hard drive is continuously busy, even though the file I found in the Temp directory is always below 20 MByte. The hard drive access is probably the main reason for the lost time. Here is a question for BerkeleyDB experts:
Can BerkeleyDB be tuned via the Python bsddb3 interface to use only RAM? Or, since BerkeleyDB is meant to scale to larger amounts of data, does it make little sense to force it into RAM?
Chris, is a RAM disk perhaps the right way to go here, to save the time lost accessing the file stored on the hard drive?
The RAM requirements, according to the Windows XP Task Manager, are below 100 MByte. I am using the NTFS file system (yes, I know that FAT is faster than NTFS in some configurations) and XP Professional SP2 without any tuning of file system caching. The CPU is 100% busy.
What CPU and RAM (SIMM, DDR, DDR2) do you have? I have 2 GByte of fast DDR PC400/3200 dual line RAM. It seems that you are still not getting results within the range others see when running your code, so I suppose it has something to do with the hardware you are using.
One of the reasons I have my eye on BerkeleyDB is that it claims to scale to huge amounts (terabytes) of data, doesn't need as much RAM as a Python dictionary, and doesn't require saving/loading a pickled version of the data (i.e. here the dictionary) from/to RAM in order to work with it.
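To illustrate the point about working with the data without loading a pickled dictionary first, here is a minimal sketch of a disk-backed mapping. It uses `dbm.dumb` from the modern Python 3 standard library purely as a stand-in, not the bsddb module tested in this thread, and the function names and the key/value scheme are hypothetical. Note that `dbm.dumb` keeps its key index in memory, so it shows only the dictionary-like interface, not BerkeleyDB's memory behaviour or speed.

```python
import dbm.dumb
import os
import tempfile


def populate_disk_mapping(path, n):
    """Write n key/value pairs to a disk-backed mapping (hypothetical data)."""
    db = dbm.dumb.open(path, "c")  # "c": create the files if they do not exist
    try:
        for i in range(n):
            # Keys and values are stored as bytes.
            db[str(i).encode()] = str(i * 2).encode()
    finally:
        db.close()


def lookup(path, key):
    """Reopen the mapping and fetch one value; no pickle load of the whole data set."""
    db = dbm.dumb.open(path, "r")
    try:
        return db[key.encode()]
    finally:
        db.close()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "ranges")
    populate_disk_mapping(path, 1000)
    print(lookup(path, "21"))
```

The mapping is reopened for the lookup to show that the values live on disk between runs.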
>>> Number generator test for 1000000 number ranges
>>> with a maximum of 3 wildcard digits.
>>> Wed May 17 22:18:17 2006 dictionary population started
>>> Wed May 17 22:18:26 2006 dictionary population stopped, duration 8.6s
>>> Wed May 17 22:18:27 2006 StorageBerkeleyDB population started
>>> Wed May 17 22:29:32 2006 StorageBerkeleyDB population stopped, duration 665.6s
>>> Wed May 17 22:29:33 2006 StorageSQLite population started
>>> Wed May 17 22:30:38 2006 StorageSQLite population stopped, duration 65.5s
>> As I don't have SQLite installed, it would be interesting to see whether the factor of 10 in the speed difference between BerkeleyDB and SQLite can be confirmed by someone else.
>> Why is SQLite faster here? I suppose that SQLite first adds all the records and builds the index afterwards, once all the records are there (with db.commit()).
> SQLite is way faster because BerkeleyDB always uses a disk file,
> and SQLite is in RAM only.
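As a rough sketch of the SQLite case discussed above: the code below populates a table in a single transaction and times it. It assumes the test used an in-memory database (`":memory:"`), which is a guess based on the "RAM only" remark; the table name, column names and `populate_sqlite` are hypothetical and are not taken from the test code in this thread. It uses the `sqlite3` module from the modern Python 3 standard library rather than whatever binding was used with Python 2.4.

```python
import sqlite3
import time


def populate_sqlite(path, items):
    """Insert (key, value) pairs in one transaction.

    Returns (elapsed_seconds, row_count). Pass ":memory:" for a RAM-only
    database or a file name for a disk-backed one.
    """
    conn = sqlite3.connect(path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS ranges (key TEXT PRIMARY KEY, value INTEGER)"
        )
        start = time.perf_counter()
        conn.executemany("INSERT OR REPLACE INTO ranges VALUES (?, ?)", items)
        conn.commit()  # one commit for the whole batch, not one per row
        elapsed = time.perf_counter() - start
        # Count before closing: an in-memory database vanishes on close.
        count = conn.execute("SELECT COUNT(*) FROM ranges").fetchone()[0]
    finally:
        conn.close()
    return elapsed, count


if __name__ == "__main__":
    n = 100_000
    elapsed, count = populate_sqlite(":memory:", ((str(i), i) for i in range(n)))
    print(f"StorageSQLite population: {count} rows, duration {elapsed:.1f}s")
```

Passing a file name instead of `":memory:"` gives a like-for-like comparison with a store that always writes to disk.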
I guess that in your case BerkeleyDB is, for the reasons named, probably the right way to go, unless your data will stay small and the Python dictionary holding it will always fit into RAM.
Now I am curious to know which path you have decided to take, and why.
Claudio
> Cheers,
> Chris