RE: Optimization for faster select...

From: Daniel Rozengurtel (Daniel.Rozengurtel_at_bofasecurities.com)
Date: 12/13/03

  • Next message: Andy Tsouladze: "DBI would not compile"
    To: "'David N Murray'" <dmurray@jsbsystems.com>,   "'Michael.Fox@auspost.com.au'" <Michael.Fox@auspost.com.au>,   "'kn@sifira.dk'" <kn@sifira.dk>,   "'andy.crichton@abibuildingdata.com'"	 <andy.crichton@abibuildingdata.com>,   "'GoldS@ncr.disa.mil'"	 <GoldS@ncr.disa.mil>
    Date: Sat, 13 Dec 2003 15:34:48 -0500
    
    

    David,Michael,Andy, Sam,Tom and Kristian - thanx very,very much for your
    incisive and knowledgeable answers/suggestions. I tried and learned A LOT
    from these.
    Here are some of my discoveries and things I've done.

    1. I am using a CBO type of Database. TRADE_STATUS does not have an index at
    all. Column INST_MNEM has index GOVT_MONDAY_INDX_2. I would expect this
    column to hold about 150 distinct values (from which I'm interested only in
    one:'GRDCOMP'), whereas TRADE_STATUS to have only 4 to 8 distinct values. So
    I think there is a high selectivity ratio on INST_MNEM column with about 90K
    records, and I should be ok.

    2. I played around with RowCacheSize along with analyzing table and
    indexes:
      - Before analyzed index/table:
            With no RowCacheSize set up the extract took 34 secs, using about
    5mb RAM
            With RowCacheSize =>0 (automatically select what's best), the
    extract took 33 secs, with 5mb RAM
            With RowCacheSize =>100, the extract took 8 secs, using about 6mb
    RAM (THE BEST RESULTS EVER)
      - After analyzed index/table:
              No change
              No change
              No change

    3. Although its a cute idea to create a function based index on SUBSTR
    etc... I did not like it a lot cause based on clients requests for
    formatting data, I would need to use a lot of those functions
    (DECODE,INSTR,SUBSTR,NVL etc...). Creating all these indexes would have a
    huge impact on my initial load, when the table is created every day(its
    created by first truncation/inserts/updates)

    4. I tried to see whether the SUBSTR could be replaced with Perl formatting
    combining with both JOIN/MAP functions on 10K records.
        - using SUBSTR with:
            'map' extract took 30 sec - good results
            'join' extract took 33 sec
        - without using SUBSTR, but utilize Perl Formatting later with:
            'map' extract took 38 sec - not as good as I hoped
            'join' extract took 40 sec - not good at all

    >From this test I could see that there is no BIG difference in join/map,
    though 'join' is cuter for not having extra '|' at the end. But it really
    DID NOT make any difference when I tried to avoid of use of SUBSTR from sql.

    5. I took out the print statements (print OUTFILE join.....) from while
    ($sth->fetch) loop to see if code spends most of its time writing data, and
    let just loop over the datasets gotten from DB. No impact at all. Prints
    don't save me much time. In fact it don't matter at all. So that rules out
    the possibility of writes delaying the extract.

    6. SPOOL command was really good. To get those 10K records (not formatted
    text) took me about 7-8 seconds. Which shows the writes to Unix file is not
    an issue.

    7. The database is on the same physical Unix machine that my extractions
    run. So I am not sure if there is network delay of of some sort.

    8. Also compared using bind_columns on a regular array instead of hash with:
            RowCacheSize=>100, there was no difference at all
            RowCacheSize not set at all, the differences were almost
    insignificant. 'Array' way was couple of seconds faster as expected.

    So, ladies and gets, everything comes down to using RowCacheSize attribute.
    I don't have any experience with it. I know if you have a huge number set up
    to it, your other processes might end up with no Memory for them, or
    something like that. How dangerous is it to use? How do you know what
    optimal RowCacheSize you need for specific code? If this is a bottleneck in
    my case, and RowCacheSize helps me solve that issue, what does it mean the
    problem is? Low cache size Database, Unix???? If I use RowCacheSize=>0, does
    it mean Oracle/Unix has to find a safest and not necessarily best way of
    dealing with data being extracted?? Why then SQL SPOOL command works faster
    than that?

    The more I know, the more I know that I dont know a lot. :-) Would you help
    me consider these questions?
    Again, Thanx A LOT to all of you who helped me understand better what I need
    to do.

    Best regards,

    Daniel

    -----Original Message-----
    From: David N Murray [mailto:dmurray@jsbsystems.com]
    Sent: Friday, December 12, 2003 12:23 AM
    To: Rozengurtel, Daniel
    Cc: 'dbi-users@perl.org'
    Subject: Re: Optimization for faster select...

    Sounds like a great test! Why don't you let us know which way is faster:
    doing the substr's on the DB or in perl (my bet is perl). For that
    matter, you could also test to see if join('|', @row) is faster. You
    could certainly move the "\n" up to the first print line (via . or ,) and
    cut out a whole perl statement. But then, there alway more than one way
    to do it ;-)

    I don't normally use the bind* calls. I typically just use fetchrow_array
    or fetchrow_arrayref for large result sets. But, I just read page 127 of
    our favorite book and your way is probably faster than mine.

    For testing purposes, you could always insert a cut-off (say 10K rows) and
    time it on a smaller scale. That way each test won't take 35min. You
    could also take the print out of the loop and just see how long it takes
    to pull the data from the DB. I imagine that's where your spending most
    of your time. Maybe tuning the (Oracle?) client interface is the place to
    look. As Michael points out, EXPLAIN PLAN might be beneficial on your
    query.

    Finally, would performance be improved if you used bind_columns on a
    regular array instead of a hash? (see join(), above.) It would save the
    overhead of computing the index (hash).

    Just my $0.02.
    Dave

    On Dec 11, Rozengurtel, Daniel scribed:

    > Hello All,
    >
    > I am trying to optimize my code to work faster in selecting about 30
    columns
    > from a denormolized table. The result set of 165,000 records is put to a
    > file on Unix in about 35-40 minutes. I have tried to follow the guide
    lines
    > from Tim's recent presentation on DBI (DBI_AdvancedTalk_200307.ppt) to
    > achieve fast results. The code is working absolutely fine utilizing an
    index
    > on that table (INST_MNEM), however I was wondering if anyone can suggest a
    > faster and better approach to do the same thing. I know selecting 30
    columns
    > can and will affect performance but still....
    > Does anyone know if a specific function in select stmt affects the
    > performance and how? (i.e. SUBSTR, INSTR) Is it better to parse it in
    Perl?
    >
    > Thanx much for your help,
    >
    > Regards,
    >
    > Daniel
    >
    > $pfd_sql="SELECT
    > CUSIP, ISIN, SUBSTR(FXTICKER, 1, INSTR(FXTICKER, ' ', 1) -1) AS TICKER,
    > SECURITY_DESC, ' ' AS IS_DOLLAR_PFD,
    > ...
    > ...
    > ...
    > FROM GOVT_TABLE WHERE INST_MNEM='GRDCOMP' AND TRADE_STATUS='ACTV' ";
    >
    > my %row;
    > # first get all the Preferreds into a file
    > my $sth = $dbh->prepare_cached($pfd_sql) || die $dbh->errstr;
    > $sth->execute; $sth->bind_columns( \( @row{@ClnFldsArray} ));
    >
    > while($sth->fetch) {
    > print OUTFILE map "$row{$_}|", @ClnFldsArray; # print each row with
    > | as delimiter
    > print OUTFILE "\n";
    > }#while
    >
    >
    > ____________________________________________________________________
    >
    > IMPORTANT NOTICES:
    > This message is intended only for the addressee. Please notify
    the
    > sender by e-mail if you are not the intended recipient. If you are not the
    > intended recipient, you may not copy, disclose, or distribute this message
    > or its contents to any other person and any such actions may be unlawful.
    >
    > Banc of America Securities LLC("BAS") does not accept time
    > sensitive, action-oriented messages or transaction orders, including
    orders
    > to purchase or sell securities, via e-mail.
    >
    > BAS reserves the right to monitor and review the content of all
    > messages sent to or from this e-mail address. Messages sent to or from
    this
    > e-mail address may be stored on the BAS e-mail system.
    > __________________________________________________________________
    >


  • Next message: Andy Tsouladze: "DBI would not compile"

    Relevant Pages