Re: Algorithm ideas

From: Calum (calum.bulk_at_ntlworld.com)
Date: 02/19/04


Date: Thu, 19 Feb 2004 14:36:25 +0000

NotAsDumbAsILook wrote:

> Hello all!
>
> I'm updating some engineering software at work (moving it off of the
> mainframe, which entails translating the old COBOL to a Java
> application), and trying to come up with a good idea for a new
> algorithm. I'm an engineer who does programming on the side. I'm
> well-versed, but not nearly as knowledgable as someone who pounds out
> code 40+ hours per week. I've come up with a few ideas (other than
> just copying the old algorithm, which is accurate, but slow and
> clunky). But, I'd really like to use an optimal algorithm. I want it
> to be as fast as possible. So, I'm wondering if anyone can point me to
> any books or articles that discuss algorithms that would fit my needs.
> I can't say exactly what the program is or does because of company
> proprietary information issues, but I'll make an example as best I
> can.
>
> Imagine you have a bunch of data that is in a tree structure. You have
> a top-level folder that contains other folders, and documents. Those
> sub-folders can, in turn, contain other sub-folders and documents. The
> structure goes down several layers deep, until a folder only contains
> documents. Each document has a number written on it. You have to add
> up all of the documents to get the value of the folder that contains
> those documents. In a folder that contains documents and sub-folders
> it's value is the sum of the documents plus the sum of the values of
> the sub-folders. This addition rolls all the way up to the top-level
> folder until it is given a value.
>
> One requirement is that I need to be able to look at any folder in the
> structure and see what it's value is (the sum of everything below it).
> That means I can't just parse through the tree and add up all of the
> documents. This has to be a recursive thing that gives me the
> subtotals at each folder.
>
> Speed is most important (next to accuracy). A user who is looking at
> the top-most folder in this structure will be dealing with 15,000+
> documents plus the sub-folders, so something on the order of 20,000
> items that the algorithm would have to crank through when updates to
> the documents are made. The user's computers really get a good
> workout.
>
> So, if anyone has seen anything resembling this in a book or
> periodical, or if you just have an idea, let me know.
>
> Thanks,
> Kevin

You definitely need to look at each file in order to sum them - that's
unavoidable. But you can cache the size of the subtree in each folder,
as most people have pointed out. From a practical perspective, that
either means keeping a seperate database of folders and sizes, or
storing the size of each directory in the directory itself, in a file
named ".size" or something.

The next problem is spotting when a document changes size. If only your
software writes these documents, then the act of changing a document's
size could propagate the change in size up the directory tree to the
root. Then your cached sizes are always in sync.

You could also run a seperate process that runs in the background and
periodically updates the sizes data. You could also play with the
creation date of directories: ensuring that the creation date was the
same as the freshest file, propagated up the tree. Then you would be
able to spot when your cached sizes need updating, and only scan those
directories newer than the cache.

It all sounds a bit complicated, only get creative if you need to, have
  been asked to, and you are prepared to make the system-wide changes
that would be required for this. If it was me I'd just leave it alone.

Calum

-- 
Persistent data in C++: http://visula.org/persist


Relevant Pages

  • Re: verifying copied folders
    ... My thoughts were to run some sort of algorithm on each file just giving ... >> The overall project includes selecting drive, folder and possibly sub ... >> deleting the original. ... >enumerate. ...
    (microsoft.public.vb.general.discussion)
  • Re: Algorithm ideas
    ... > I'm updating some engineering software at work (moving it off of the ... > just copying the old algorithm, which is accurate, but slow and ... > Imagine you have a bunch of data that is in a tree structure. ... > a top-level folder that contains other folders, ...
    (comp.programming)
  • Re: read only folder on Windows
    ... As I already gave you the algorithm, I fail to see what would you ... Microsoft MVP, MCSD ... is more clear to understand access control for folder? ... "Alexander Nickolov" wrote: ...
    (microsoft.public.vc.language)
  • Re: Algorithm ideas
    ... In a folder that contains documents and sub-folders ... > it's value is the sum of the documents plus the sum of the values of ... upward from the file to the root of the tree structure. ... This algorithm will touch each plain file exactly once, ...
    (comp.programming)
  • Algorithm ideas
    ... I'm updating some engineering software at work (moving it off of the ... just copying the old algorithm, which is accurate, but slow and ... a top-level folder that contains other folders, ... sub-folders can, in turn, contain other sub-folders and documents. ...
    (comp.programming)