Re: concurrency with DBI questions



"Andy" <andrewfaseuk@xxxxxxxxxxx> wrote:
Hi,

I'm trying to look into the best way of speeding up multiple insert or
update statements into a database table. The values to insert or update
are in a file which could contain upto 100,000 records.

Generally, the fastest way to bulk load data is to the use the database's
bulk loading tool, like sqlldr for Oracle or MySQL's LOAD DATA INFILE.
I tend to want to do more transformation/error-checking/etc. than is
conventient with those tools, but still they are pretty good at what they
specialize in.

My plan was to have multiple children to do the database calls and feed
them via a process just reading the data from a file. I believe using
DBI with threads is a bad idea

I wouldn't try using DBI with threads in the first place since fork is so
easy, but other than that I see no reason not to do so, as long as you do
it right (i.e. each thread opening its own independent DB connection.)


So the only other method I can think of is to fork multiple child
processes's. Each child will need to create its own database connection
and then the main process will need to feed in the data to do the
database calls

Why have a main process at all? Have all forked processes be peers.


So questions:

1. Is fork my only option ? or am I overlooking a much better solution

No, and maybe. There are other options, and using the specialized bulk
loader that came with your database is likely to be much better.


2. How do I produce a "queue" for the forked processes to work from ?,
the only think I can think of is using some sort of PIPE,

warn "transaction logic left to the reader";
my $parallel=4;
my $pm = new Parallel::ForkManager($parallel);

foreach my $id (0..$parallel-1) {
$pm->start and next; # do the fork
my $dbh=Get_connect();
my $sth = $dbh->prepare('whatever');
open my $fh, "input.txt" or die $!;
while (<$fh>) { chomp;
next unless ($. % $parallel == $id);
do_whatever($sth,$_);
};
$sth->finish;
$dbh->disconnect;
$pm->finish; # do the exit in the child process
}
$pm->wait_all_children;


Altenratively, split the file into chunks beforehand (using, say, the
unix "split" util) and give each chunk to a child)

any other ideas

3. What happens if the child process can't get a database connection
for some reason (database refuses connection) ? how can it tell the
master not to send requests as they cannot be processed, if there was a
single queue to work from this would be fine (as the child would'nt
pick any requests up) but if using a queue per child as I would be
using "Safe Pipes" then I think i'm stuck

No true master, no problem. If the child dies, it logs this fact. You
look into the log, and just restart just that one child.


4. Do you think I even need to to try and make this job concurrent or
will a single thread be quick enough ?

How could anyone possibly know this but you?

anyone have any stats on how
many insert / update statements can be run through DBI per second ?

It depends on the size of the data to be updated, the contention on the
database, the size of the database server, the speed of the storage system
of the serve, the structure (indexes and constraints on the tables), the
commit frequency, etc, etc, etc.

Due to contention on the DB end, splitting the data into multiple streams
may not speed up much at all, and in fact may slow things down. You will
have to try it and see.

Xho

--
-------------------- http://NewsReader.Com/ --------------------
Usenet Newsgroup Service $9.95/Month 30GB
.



Relevant Pages

  • Re: Why the police now have to ask teenage muggers: Do you eat chips?
    ... can you prefix the subject with 'SHIT' so that sensible people can ... within an Every Child <atters programme. ... life tied to this database. ... If people don't resist information gathering and information ...
    (uk.legal)
  • Politicians and celebrities to be protected from child database
    ... Politicians and celebrities to be protected from child database ...
    (uk.legal)
  • Re: Tables wont link correctly
    ... another file (database, text file, spreadsheet file, etc) that are "linked" ... The subtables link field remain blank. ... automatically if you're entering data directly into the child table. ... populate with the primary key value from the parent record. ...
    (microsoft.public.access.tablesdbdesign)
  • Re: Performance problem on import
    ... I am attempting to do a mass import of data to this database using a C++ program and the ... mutiple insert commands on both child tables ... until I get to about 50 or so entries in the master table. ... I see delays of 2 or 3 seconds on every insert command. ...
    (microsoft.public.access.externaldata)
  • Re: Using a Resource as a Class Property
    ... In OOP the constructor of the grandest child:) will be called when a class is instantiated as an object. ... By having to call the parent constructor from within the child object, It gives the child better control. ... In the case of a Database object, the main purpose of the object is to perform the communications. ... abstract methods open, close, query ...
    (comp.lang.php)