Re: large updates and deletes



On 1/18/06, Robert Hicks <sigzero@xxxxxxxxx> wrote:
> mark d wrote:
> > On 1/18/06, amonotod@xxxxxxxxxxx <amonotod@xxxxxxxxxxx> wrote:
> >> ---- mark d <murkyd@xxxxxxxxx> wrote:
> >>> This optimization might have made some sense decades over a serial line, but
> >>> not today. It is saving 320 single-packet round trips, which is an
> >>> infintesimal amount of traffic.
> >> Well, traffic wise, you're right. But, it also saves parsing all those rows out of the
> >> DB and through the DBI; this way the only process that has to work with the data
> >> is the oracle engine. Use DBI where appropriate... Right tool for the job, and all that...
> >
> > I have no idea what "parsing all those rows out of the DB and through the DBI"
> > means, though it sounds like you think row data is copied from the DB to
> > the DBI in the course of a sql DELETE statement (that would be a fundamental
> > misunderstanding of SQL and DBI.)
> >
> > In any case, you're mistaken that the PL/SQL saves anything but a few packets.
> > To be explicit, the proposed PL/SQL was:
> >
> >> declare
> >> rows_processed number;
> >> batch_size number := 50000;
> >> begin
> >> rows_processed := batch_size;
> >> while rows_processed = batch_size loop
> >> delete from table where rownum < batch_size + 1;
> >> rows_processed := SQL%ROWCOUNT;
> >> commit;
> >> end loop;
> >> end;
> >
> > The equivalent DBI code is:
> >
> > $sth = $dbh->prepare("delete from TABLE where rownum < 50001");
> > while ( ($deleted = $sth->execute) > 0) {
> > printf "deleted $deleted\n";
> >
> >
> > The only difference between the two is a few network packets.
>
> Is that the only difference? I read it, and I could be wrong, that your
> SQL will stop when rownum hits 50001, while his will keep going in
> batches of 50000 until there isn't any more to delete given the delete
> statement.

No. Look more closely. The execute is inside a while loop.

The execute() method returns the number of "affected" rows. For a DELETE,
this means the number of deleted rows. The conditional test inside the loop
checks to see when more than zero rows have been deleted, so it repeats
the delete until zero rows are left.

Here's a real-life sample:

code:

$batch = 50001;

$sth = $dbh->prepare("delete from SNAFU where rownum < $batch");

printf "Rows in table = %d\n",
$dbh->selectall_arrayref("select count(*) from SNAFU")->[0]->[0];

$t = time();
while (($deleted = $sth->execute) > 0) {

printf "batch %d: deleted %d rows in %f secs\n",
++$i, $deleted, time()-$t;

$t = time();
}


Output:


Rows in table = 700000
batch 1: deleted 50000 rows in 0.850589 secs
batch 2: deleted 50000 rows in 0.840419 secs
batch 3: deleted 50000 rows in 0.931219 secs
batch 4: deleted 50000 rows in 0.835611 secs
batch 5: deleted 50000 rows in 0.841162 secs
batch 6: deleted 50000 rows in 0.834864 secs
batch 7: deleted 50000 rows in 3.877935 secs
batch 8: deleted 50000 rows in 0.871310 secs
batch 9: deleted 50000 rows in 0.826238 secs
batch 10: deleted 50000 rows in 0.825013 secs
batch 11: deleted 50000 rows in 0.829521 secs
batch 12: deleted 50000 rows in 0.838740 secs
batch 13: deleted 50000 rows in 0.830472 secs
batch 14: deleted 50000 rows in 0.881580 secs
Done



> Yours is a simple delete. I need to break up my deletes into manageable
> sizes, do the delete, do the next batch, etc.

Note that unless you have a meaningful basis for, and definition of,
"manageable",
this is all a complete waste of time anyway. (I originally only
chimed in to note
the pointless "optimization", but I guess I'll comment on the original
question too :)

"large" is highly relative. 16 million is "small" to me. It might be
large for a
MS Access rolodex, but it's certainly not large by any objective measure.

> On the delete side I have about 16 million records that will be deleted.
> I have two of those to do. I have one update that changes a few values
> on about 3.1 million records (not the whole table).

Unless you have good reason to believe it would be a problem, just issue
a single SQL statement and don't waste your time fooling around with
these hacks.

Mark
.



Relevant Pages

  • RE: Question with DBI versus PERL
    ... Question with DBI versus PERL ... statement at once (if I parse my SQL file) my pl/sql will work? ... All spool statements will turn to Perl print() calls. ... I want to know how can i run this sql script with DBI. ...
    (perl.dbi.users)
  • RE: how to invoke .sql file from dbi
    ... recommended changing all the SQL*Plus scripts into DBI calls. ... The sql statements are SELECT statements and we are expecting plain (no ... to just point the dbi to a sql file and dbi taking care of it. ...
    (perl.dbi.users)
  • Re: use of DBI; I am getting multiple error messages mixed in with the correct output.
    ... T> expected a mature library like DBI to behave like this. ... I learned plenty about the perl side of things from her posts. ... closely represent the SQL idea of a null. ... trivially easy to perform bulk loads using the DB's particular ...
    (comp.lang.perl.misc)
  • RE: New DBI in-memory tables, heterogeneous operations, SQL-Statement
    ... SQL ERROR: Bad predicate: 'id'! ... New DBI in-memory tables, heterogeneous operations, ... > functions, in-memory tables, and heterogeneous operations across all DBI ... > As with AoA in-memory tables, once a heterogeneous query in-memory table ...
    (perl.dbi.users)
  • RE: Newline inserted?
    ... Are you processing individual SQL statements (does your file contain SQL ... DBI is not your best bet to do this. ... When I attempt to load this file into Oracle 10g using the DBI I get the ...
    (perl.dbi.users)