Speed Freak

From: Dave Bee (ruu_at_cwcom.net)
Date: 12/29/03

    Date: 29 Dec 2003 11:13:30 -0800
    
    

    This is a conceptual question rather than a specific coding one, but
    hopefully someone has played around with something similar. In a
    nutshell, I have around 10 million entries, each with many data
    points. My current script has two stages: the first organises
    certain fields of the data into very large hashes, and the second
    forks off lots of children that do the subsequent processing to
    produce LDIF files, using the information in the hashes. Thanks to
    copy-on-write, and the fact that the children don't need to update
    the hashes, this doesn't use a great deal of memory.

    My current problem is with stage one. For now it is, by necessity, a
    single process, since it needs to refer to information already in
    the hashes as it builds them, so the one CPU it runs on is the
    bottleneck. I would like to cut down the time the first stage takes
    (~50 minutes), and I am at liberty to use any interesting techniques
    to do so. My hardware is somewhat above spec (a 24-CPU 6800 with
    48 GB RAM, etc.) and can be dedicated 100% to the script while it
    runs, so unusual techniques, however wasteful of memory or CPU, are
    more than welcome.
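    One way to parallelise a build like this is sharding, under the
    assumption (not stated in the post) that every lookup a record needs
    is on its own key. Route all records with the same key to the same
    worker, let each worker build its own partial hash, then merge the
    partials, whose key sets are disjoint. The read-modify-write
    `table.get(key, 0) + amount` stands in for "referring to information
    within the hashes"; `shard_of`, `build_shard` and `parallel_build`
    are hypothetical names.

```python
import multiprocessing as mp
import zlib


def shard_of(key, n):
    """Stable shard number for a key (built-in hash() of str is salted per run)."""
    return zlib.crc32(key.encode()) % n


def build_shard(records):
    """Build one partial table. Every record for a given key lands in the
    same shard, so reading an existing entry never needs another worker."""
    table = {}
    for key, amount in records:
        table[key] = table.get(key, 0) + amount  # depends on what is already there
    return table


def parallel_build(records, n=4):
    # Partition records by key so each shard is self-contained.
    shards = [[] for _ in range(n)]
    for key, amount in records:
        shards[shard_of(key, n)].append((key, amount))

    ctx = mp.get_context("fork")  # POSIX only
    with ctx.Pool(n) as pool:
        partials = pool.map(build_shard, shards)

    merged = {}
    for part in partials:
        merged.update(part)  # disjoint keys, so nothing is overwritten
    return merged


if __name__ == "__main__":
    print(parallel_build([("a", 1), ("b", 2), ("a", 3)], n=2))
```

    Shipping 10 million records through pipes has a cost of its own; in
    practice each worker would more likely read its own slice of the
    input, and the partial hashes could stay in the workers for stage two
    instead of being merged back into one process.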

    I've considered threading (I have no real experience with it, but I
    could probably figure something out) and a parent hash-controller
    with multiple forked children, among other things. I'm just curious
    whether anyone has done something similar and already knows the most
    efficient way of doing this.
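    The "parent hash-controller with forked children" idea can be
    sketched the same way. Children do the expensive per-record parsing
    and send only small updates to the parent, which is the sole owner
    of the hash. This helps only if parsing, rather than the hash updates
    themselves, dominates the 50 minutes, which the post doesn't
    establish. The `key, amount` line format and the names
    `parse_worker` and `controller` are hypothetical.

```python
import multiprocessing as mp


def parse_worker(lines, out_q):
    """Child: do the per-record parsing, send only (key, amount) updates."""
    for line in lines:
        key, amount = line.split(",")
        out_q.put((key.strip(), int(amount)))
    out_q.put(None)  # sentinel: this worker has finished


def controller(chunks):
    """Parent: the only process that owns and updates the table."""
    ctx = mp.get_context("fork")  # POSIX only
    q = ctx.Queue()
    procs = [ctx.Process(target=parse_worker, args=(chunk, q)) for chunk in chunks]
    for p in procs:
        p.start()

    table = {}
    finished = 0
    while finished < len(procs):
        item = q.get()
        if item is None:
            finished += 1
            continue
        key, amount = item
        table[key] = table.get(key, 0) + amount  # serialised in one process

    # Join only after draining the queue, otherwise children can block on a full pipe.
    for p in procs:
        p.join()
    return table


if __name__ == "__main__":
    print(controller([["a, 1", "b, 2"], ["a, 3"]]))
```

    Sending one queue message per record is slow; batching updates into
    lists of a few thousand per `put` would likely cut the IPC overhead
    substantially.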

    Dave

