Re: concurrency with DBI questions
- From: xhoster@xxxxxxxxx
- Date: 28 Feb 2006 20:39:29 GMT
Jim Gibson <jgibson@xxxxxxxxxxxxxxxxx> wrote:
In article <1141148894.217670.119880@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>, DJ
Stunks <DJStunks@xxxxxxxxx> wrote:
Andy wrote:
Hi,
I'm trying to look into the best way of speeding up multiple insert
or update statements into a database table. The values to insert or
update are in a file which could contain upto 100,000 records.
I can only speak from a MySQL perspective, but the fastest way to load
records (and by a significant margin) is via LOAD DATA INFILE. Since
your records are already in a file, my suggestion would be to use Perl
to massage your input file into a delimited record format suitable for
insertion via LOAD DATA INFILE and then doing a single $dbh->do{}
command to load them.
I do not have much experience with this problem myself, but I do
remember the general advice given for large database inserts to a table
that has one or more indexes (indices?) defined is to delete the index,
do the inserts, and then recreate the index.
This is almost always a bad idea, unless the table is a new one being
populated for the first time. It isn't that bad an idea if the table
stays about the same size because the mass upload is accompanied by
a mass deletes of almost all of the old data, but even then there are
better ways to accomplish this. For example, it would generally be better
to copy the data to keep into a new table, load the bulk data to that new
table, then build the indexes and rename the new table to replace the old
one.
I had one person who was oh-so-clever and "optimized" a loading program
by dropping all the indices before a bulk load and remaking them after.
This reduced the overall time to load 300,000 data points into the table
by 20 or 40%, like from 5 minutes to 3 or 4 minutes or something, back when
the table only had 500,000 rows to start with. Fast forward 3 years and
the table has 100,000,000 rows. Just rebuilding the indices afterwards
took over 6 hours--far longer than it took to do the load with the indices
still there. And of course during that time the database would be
completely unusable.
This is faster than
revising the index after each insert. However, it is possible that a
bulk LOAD operation will do the same thing automatically.
Bulk load generally takes a middle road. It updates the indices in batches
in a way which is faster than doing it one row at a time but doesn't throw
the vast amounts of work it already did to index the pre-existing data.
Xho
--
-------------------- http://NewsReader.Com/ --------------------
Usenet Newsgroup Service $9.95/Month 30GB
.
- References:
- concurrency with DBI questions
- From: Andy
- Re: concurrency with DBI questions
- From: DJ Stunks
- Re: concurrency with DBI questions
- From: Jim Gibson
- concurrency with DBI questions
- Prev by Date: Re: suppress regex parsing in interpolated string
- Next by Date: Re: concurrency with DBI questions
- Previous by thread: Re: concurrency with DBI questions
- Next by thread: Re: concurrency with DBI questions
- Index(es):
Relevant Pages
|
|