Re: concurrency with DBI questions
- From: xhoster@xxxxxxxxx
- Date: 28 Feb 2006 19:53:55 GMT
"Andy" <andrewfaseuk@xxxxxxxxxxx> wrote:
Hi,
I'm trying to look into the best way of speeding up multiple insert or
update statements into a database table. The values to insert or update
are in a file which could contain upto 100,000 records.
Generally, the fastest way to bulk load data is to the use the database's
bulk loading tool, like sqlldr for Oracle or MySQL's LOAD DATA INFILE.
I tend to want to do more transformation/error-checking/etc. than is
conventient with those tools, but still they are pretty good at what they
specialize in.
My plan was to have multiple children to do the database calls and feed
them via a process just reading the data from a file. I believe using
DBI with threads is a bad idea
I wouldn't try using DBI with threads in the first place since fork is so
easy, but other than that I see no reason not to do so, as long as you do
it right (i.e. each thread opening its own independent DB connection.)
So the only other method I can think of is to fork multiple child
processes's. Each child will need to create its own database connection
and then the main process will need to feed in the data to do the
database calls
Why have a main process at all? Have all forked processes be peers.
So questions:
1. Is fork my only option ? or am I overlooking a much better solution
No, and maybe. There are other options, and using the specialized bulk
loader that came with your database is likely to be much better.
2. How do I produce a "queue" for the forked processes to work from ?,
the only think I can think of is using some sort of PIPE,
warn "transaction logic left to the reader";
my $parallel=4;
my $pm = new Parallel::ForkManager($parallel);
foreach my $id (0..$parallel-1) {
$pm->start and next; # do the fork
my $dbh=Get_connect();
my $sth = $dbh->prepare('whatever');
open my $fh, "input.txt" or die $!;
while (<$fh>) { chomp;
next unless ($. % $parallel == $id);
do_whatever($sth,$_);
};
$sth->finish;
$dbh->disconnect;
$pm->finish; # do the exit in the child process
}
$pm->wait_all_children;
Altenratively, split the file into chunks beforehand (using, say, the
unix "split" util) and give each chunk to a child)
any other ideas
3. What happens if the child process can't get a database connection
for some reason (database refuses connection) ? how can it tell the
master not to send requests as they cannot be processed, if there was a
single queue to work from this would be fine (as the child would'nt
pick any requests up) but if using a queue per child as I would be
using "Safe Pipes" then I think i'm stuck
No true master, no problem. If the child dies, it logs this fact. You
look into the log, and just restart just that one child.
4. Do you think I even need to to try and make this job concurrent or
will a single thread be quick enough ?
How could anyone possibly know this but you?
anyone have any stats on how
many insert / update statements can be run through DBI per second ?
It depends on the size of the data to be updated, the contention on the
database, the size of the database server, the speed of the storage system
of the serve, the structure (indexes and constraints on the tables), the
commit frequency, etc, etc, etc.
Due to contention on the DB end, splitting the data into multiple streams
may not speed up much at all, and in fact may slow things down. You will
have to try it and see.
Xho
--
-------------------- http://NewsReader.Com/ --------------------
Usenet Newsgroup Service $9.95/Month 30GB
.
- References:
- concurrency with DBI questions
- From: Andy
- concurrency with DBI questions
- Prev by Date: Re: Combine hash declaration/assignment into single statement?
- Next by Date: Re: A Problem With GD
- Previous by thread: Re: concurrency with DBI questions
- Next by thread: Re: concurrency with DBI questions
- Index(es):
Relevant Pages
|