RE: Optimization for faster select...
From: Daniel Rozengurtel (Daniel.Rozengurtel_at_bofasecurities.com)
Date: 12/13/03
- Previous message: Jonathan Leffler: "Re: DBD::Informix make test problem on Solaris"
- Maybe in reply to: Daniel Rozengurtel: "Optimization for faster select..."
- Next in thread: Kristian Nielsen: "Re: Optimization for faster select..."
- Reply: Kristian Nielsen: "Re: Optimization for faster select..."
- Reply: Jared Still: "RE: Optimization for faster select..."
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
To: "'David N Murray'" <dmurray@jsbsystems.com>, "'Michael.Fox@auspost.com.au'" <Michael.Fox@auspost.com.au>, "'kn@sifira.dk'" <kn@sifira.dk>, "'andy.crichton@abibuildingdata.com'" <andy.crichton@abibuildingdata.com>, "'GoldS@ncr.disa.mil'" <GoldS@ncr.disa.mil> Date: Sat, 13 Dec 2003 15:34:48 -0500
David,Michael,Andy, Sam,Tom and Kristian - thanx very,very much for your
incisive and knowledgeable answers/suggestions. I tried and learned A LOT
from these.
Here are some of my discoveries and things I've done.
1. I am using a CBO type of Database. TRADE_STATUS does not have an index at
all. Column INST_MNEM has index GOVT_MONDAY_INDX_2. I would expect this
column to hold about 150 distinct values (from which I'm interested only in
one:'GRDCOMP'), whereas TRADE_STATUS to have only 4 to 8 distinct values. So
I think there is a high selectivity ratio on INST_MNEM column with about 90K
records, and I should be ok.
2. I played around with RowCacheSize along with analyzing table and
indexes:
- Before analyzed index/table:
With no RowCacheSize set up the extract took 34 secs, using about
5mb RAM
With RowCacheSize =>0 (automatically select what's best), the
extract took 33 secs, with 5mb RAM
With RowCacheSize =>100, the extract took 8 secs, using about 6mb
RAM (THE BEST RESULTS EVER)
- After analyzed index/table:
No change
No change
No change
3. Although its a cute idea to create a function based index on SUBSTR
etc... I did not like it a lot cause based on clients requests for
formatting data, I would need to use a lot of those functions
(DECODE,INSTR,SUBSTR,NVL etc...). Creating all these indexes would have a
huge impact on my initial load, when the table is created every day(its
created by first truncation/inserts/updates)
4. I tried to see whether the SUBSTR could be replaced with Perl formatting
combining with both JOIN/MAP functions on 10K records.
- using SUBSTR with:
'map' extract took 30 sec - good results
'join' extract took 33 sec
- without using SUBSTR, but utilize Perl Formatting later with:
'map' extract took 38 sec - not as good as I hoped
'join' extract took 40 sec - not good at all
>From this test I could see that there is no BIG difference in join/map,
though 'join' is cuter for not having extra '|' at the end. But it really
DID NOT make any difference when I tried to avoid of use of SUBSTR from sql.
5. I took out the print statements (print OUTFILE join.....) from while
($sth->fetch) loop to see if code spends most of its time writing data, and
let just loop over the datasets gotten from DB. No impact at all. Prints
don't save me much time. In fact it don't matter at all. So that rules out
the possibility of writes delaying the extract.
6. SPOOL command was really good. To get those 10K records (not formatted
text) took me about 7-8 seconds. Which shows the writes to Unix file is not
an issue.
7. The database is on the same physical Unix machine that my extractions
run. So I am not sure if there is network delay of of some sort.
8. Also compared using bind_columns on a regular array instead of hash with:
RowCacheSize=>100, there was no difference at all
RowCacheSize not set at all, the differences were almost
insignificant. 'Array' way was couple of seconds faster as expected.
So, ladies and gets, everything comes down to using RowCacheSize attribute.
I don't have any experience with it. I know if you have a huge number set up
to it, your other processes might end up with no Memory for them, or
something like that. How dangerous is it to use? How do you know what
optimal RowCacheSize you need for specific code? If this is a bottleneck in
my case, and RowCacheSize helps me solve that issue, what does it mean the
problem is? Low cache size Database, Unix???? If I use RowCacheSize=>0, does
it mean Oracle/Unix has to find a safest and not necessarily best way of
dealing with data being extracted?? Why then SQL SPOOL command works faster
than that?
The more I know, the more I know that I dont know a lot. :-) Would you help
me consider these questions?
Again, Thanx A LOT to all of you who helped me understand better what I need
to do.
Best regards,
Daniel
-----Original Message-----
From: David N Murray [mailto:dmurray@jsbsystems.com]
Sent: Friday, December 12, 2003 12:23 AM
To: Rozengurtel, Daniel
Cc: 'dbi-users@perl.org'
Subject: Re: Optimization for faster select...
Sounds like a great test! Why don't you let us know which way is faster:
doing the substr's on the DB or in perl (my bet is perl). For that
matter, you could also test to see if join('|', @row) is faster. You
could certainly move the "\n" up to the first print line (via . or ,) and
cut out a whole perl statement. But then, there alway more than one way
to do it ;-)
I don't normally use the bind* calls. I typically just use fetchrow_array
or fetchrow_arrayref for large result sets. But, I just read page 127 of
our favorite book and your way is probably faster than mine.
For testing purposes, you could always insert a cut-off (say 10K rows) and
time it on a smaller scale. That way each test won't take 35min. You
could also take the print out of the loop and just see how long it takes
to pull the data from the DB. I imagine that's where your spending most
of your time. Maybe tuning the (Oracle?) client interface is the place to
look. As Michael points out, EXPLAIN PLAN might be beneficial on your
query.
Finally, would performance be improved if you used bind_columns on a
regular array instead of a hash? (see join(), above.) It would save the
overhead of computing the index (hash).
Just my $0.02.
Dave
On Dec 11, Rozengurtel, Daniel scribed:
> Hello All,
>
> I am trying to optimize my code to work faster in selecting about 30
columns
> from a denormolized table. The result set of 165,000 records is put to a
> file on Unix in about 35-40 minutes. I have tried to follow the guide
lines
> from Tim's recent presentation on DBI (DBI_AdvancedTalk_200307.ppt) to
> achieve fast results. The code is working absolutely fine utilizing an
index
> on that table (INST_MNEM), however I was wondering if anyone can suggest a
> faster and better approach to do the same thing. I know selecting 30
columns
> can and will affect performance but still....
> Does anyone know if a specific function in select stmt affects the
> performance and how? (i.e. SUBSTR, INSTR) Is it better to parse it in
Perl?
>
> Thanx much for your help,
>
> Regards,
>
> Daniel
>
> $pfd_sql="SELECT
> CUSIP, ISIN, SUBSTR(FXTICKER, 1, INSTR(FXTICKER, ' ', 1) -1) AS TICKER,
> SECURITY_DESC, ' ' AS IS_DOLLAR_PFD,
> ...
> ...
> ...
> FROM GOVT_TABLE WHERE INST_MNEM='GRDCOMP' AND TRADE_STATUS='ACTV' ";
>
> my %row;
> # first get all the Preferreds into a file
> my $sth = $dbh->prepare_cached($pfd_sql) || die $dbh->errstr;
> $sth->execute; $sth->bind_columns( \( @row{@ClnFldsArray} ));
>
> while($sth->fetch) {
> print OUTFILE map "$row{$_}|", @ClnFldsArray; # print each row with
> | as delimiter
> print OUTFILE "\n";
> }#while
>
>
> ____________________________________________________________________
>
> IMPORTANT NOTICES:
> This message is intended only for the addressee. Please notify
the
> sender by e-mail if you are not the intended recipient. If you are not the
> intended recipient, you may not copy, disclose, or distribute this message
> or its contents to any other person and any such actions may be unlawful.
>
> Banc of America Securities LLC("BAS") does not accept time
> sensitive, action-oriented messages or transaction orders, including
orders
> to purchase or sell securities, via e-mail.
>
> BAS reserves the right to monitor and review the content of all
> messages sent to or from this e-mail address. Messages sent to or from
this
> e-mail address may be stored on the BAS e-mail system.
> __________________________________________________________________
>
- Previous message: Jonathan Leffler: "Re: DBD::Informix make test problem on Solaris"
- Maybe in reply to: Daniel Rozengurtel: "Optimization for faster select..."
- Next in thread: Kristian Nielsen: "Re: Optimization for faster select..."
- Reply: Kristian Nielsen: "Re: Optimization for faster select..."
- Reply: Jared Still: "RE: Optimization for faster select..."
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Relevant Pages
|