Re: looking for help with a counting algorithm

From: Andy Baxter (news3_at_earthsong.null.free-online.co.uk)
Date: 12/29/03

  • Next message: Regent: "Calling a scalar from another script using "require""
    Date: Mon, 29 Dec 2003 02:14:34 +0000
    
    

    At earth time Mon, 29 Dec 2003 11:35:17 +0100, the following transmission
    was received from the entity known as Gregory Toomey:

    > Andy Baxter wrote:
    >> I'm working on a web-based database project written in perl, which is a
    >> virtual library for people in the town where I live to share details of
    >> books and videos they have - the idea is if you find a book you like you
    >> click on 'borrow', and you are given details of how to contact the person
    >> who owns it.
    >>
    >> I'm not sure exactly where to ask for this, as I'm looking for help with
    >> finding a good algorithm, rather than details of coding, but I've chosen
    >> this group as it's written in perl, and more of a coding problem than a
    >> cgi problem.
    >>
    ...
    >> The thing that makes it hard is the multiple category membership plus
    >> multiple category grouping, but I want to keep this if I possibly can,
    >> because one of the problems with traditional non-computerised indexes is
    >> that you can't do this even though books can be relevant to several
    >> different topics.
    >>
    >> My thoughts at the moment are either:
    >>
    >> - give each category a hash where the keys are all the items included in
    >> the category. Read these in from the database, then each time a
    >> subcategory is counted, the code goes back up the tree to the root, adding
    >> the items that belong to it to all the higher level categories. When the
    >> recursive calls to subcategories return, (i.e. end of this code fragment),
    >> then count the keys in the hash. This would work I think, but it would
    >> involve retrieving all the category memberships from the database, rather
    >> than just the counts, and it might cause speed and memory problems if a
    >> lot of items were added to the system.
    >>
    >> - or when you put an item in a category, category memberships are
    >> automatically created in the database for all the higher level categories.
    >> This would be quicker to count, but I'm thinking that if someone has
    >> chosen to put an item in a given subcategory, and then later on I decide
    >> to move that subcategory to a different group, the category memberships
    >> should shift to follow this, whereas with this system the item would stay
    >> in the old top-level categories.
    >>
    >> Finally - my questions are:
    >> - does anyone know of an algorithm for working out the counts just from
    >> the counts in each category - i.e. without having to get details of
    >> exactly which items belong to each category? (I have a feeling this may be
    >> impossible)
    >
    > You have to write a recursive algorithm, which you can precompute.
    >
    > SQL has a "start with" and "connect by" syntax for this sort of thing,
    > but mysql does to implement it. Oracle implement this recursion for
    > you. eg See http://www.adp-gmbh.ch/ora/sql/connect_by.html
    > It would be a bit of a nightmare to do it yourself.

    Thanks for this - I didn't know you could do this sort of thing in sql.
     
    >> - or generally, and maybe more useful in the long run, a good place to
    >> start looking on the web for answers to problems like this when they crop
    >> up.
    >
    > Its called "graph theory", and has been well studied for decades.

    OK, I'll have a look about.

    I've tried doing the first method I thought of, and it's not working out
    as slow as I thought - I added 10000 items to the dataset and it slowed
    the page loading by just over a second, which isn't as bad as I thought.
    This was on a 1.3 GHz machine doing nothing else though.

    Precomputing the algorithm might help - thanks.

    I think I needn't have posted about this - I just felt like I'd hit a
    brick wall thinking about how to do it, then writing the posting helped me
    put my thoughts together, but having written all that I wanted to post it
    anyway and see if anyone had any suggestions.

    For what it's worth, the code I've written now looks like this: (in module
    LibMod.pm) Any comments welcome, as I'm still quite new to perl, and
    probably not making best use of the language.

    andy.

    sub ReadCategories($$$) {
    # Read in list of categories from database into Id index and tree structure.
            my ($mod,$pRoot, # a pointer to an empty hash which will be filled with the
                             # category tree structure.
                $pIdIndex, # a pointer to an empty hash which will be filled with the
                             # category Id index.
                $pFlags # ref to a hash with flags saying what info to include in the
                             # tree.
                             # CountItems=>defined - count items belonging to that category.
                             # Contains=>defined - mark whether this category includes an
                             # item.
                             # ItemId=>num - Id of item to check for.
                )=@_;
            my ($query, # query handle.
                $pCats, # ref to results from category query.
                $pCatRow, # ref to single row of results.
                $pCat, # ref to individual category entry in the data structure.
                $pGroups, # ref to results from category grouping query.
                $pGroupRow, # ref to row from cat group query.
                $pGroupCat, # ref to category that this one is grouped under.
                $pLink # ref to a grouping link between two categories.
                );
            DPrint ("sub: ReadCategories<br>");
            $query=$db->prepare("select Id,Name from Category");
            $query->execute;
            $pCats=$query->fetchall_arrayref;
            $query->finish;

            # fill the category Id index with entries, each of which is a pointer to a hash
            # containing a category name and Id.
            foreach $pCatRow ( @$pCats ) {
                    $pCat={Name=>$$pCatRow[1], # Category name.
                           Id=>$$pCatRow[0], # Category Id.
                           UpList=>[], # List of links to cats this is grouped under.
                           DownList=>[], # List of links to cats grouped under this one.
                           };
                    ${$pIdIndex}{$$pCatRow[0]}=$pCat;
                    if ( exists ${$pFlags}{CountItems} ) {
                            $$pCat{Items}={};
                            };
                    };

            # Now read in the category groupings to generate a tree structure.

            # first generate a root category to build the tree from. There is no
            # category in the database with Id 0, but all the top level categories
            # have a CategoryGrouping entry with Id 0.
            ${$pRoot}{Name}="root";
            ${$pRoot}{Id}=0;
            ${$pRoot}{UpList}=undef;
            ${$pRoot}{DownList}=[];
            ${$pIdIndex}{0}=$pRoot;

            # Read all the category groupings into a hash in memory:
            $query=$db->prepare("select GroupId,MemberId,SortOrder,Stop ".
                                "from CategoryGrouping");
            $query->execute;
            $pGroups=$query->fetchall_arrayref;
            $query->finish;

            # Now go through all the groupings, and add links to the categories to make
            # a tree structure.
            foreach $pGroupRow (@$pGroups) {
                    $pCat=${$pIdIndex}{$$pGroupRow[1]};
                    $pGroupCat=${$pIdIndex}{$$pGroupRow[0]};
                    $pLink={Down=>$pCat, # ref to sub-category.
                            Up=>$pGroupCat, # ref to group category.
                            Sort=>$$pGroupRow[2], # sort order of this link.
                            };
                    if ($$pGroupRow[3] ) {
                            # if Stop is defined, don't display these subcategories.
                            ${$pLink}{Stop}=${$pGroupRow}[3];
                            };
                    # add link to up and down lists for the two categories.
                    push @{${$pCat}{UpList}}, $pLink;
                    push @{${$pGroupCat}{DownList}}, $pLink;
                    };

            if ( exists ${$pFlags}{CountItems} ) {
                    # count how many items belong to each category.
                    # This means adding three keys to the category hash:
                    # CountDirect - How many belong to this category directly.
                    # CountUnique - How many belong to this category and none of its subcategories.
                    # CountTotal - How many belong to this category and any of its subcategories.
                    DPrint "Counting<br>";
                    LibMod->CountCatItems($pRoot);
                    }

    # LibMod->ReadCategoryLevel($pRoot,$pIdIndex,$pGroups);
            };

    sub CountCatItems($$) {
    # Count how many items there are under this category, either directly or indirectly.
            my ($mod,$pCat, # pointer to the category hash.
                $pStack # pointer to stack which
                )= @_;
            my ($query, # query handle.
                $catId, # category Id no.
                $pLink # ref to link between cats.
               );
            $catId=${$pCat}{Id};
            DPrint "sub: CountCatItems. Category= ${$pCat}{Name} - Id: $catId.<br>";
            #DPrint "iteration: $i <br>";
            $query=$db->prepare("select ItemId from CategoryMembership where " .
                                "CategoryId=$catId ");
            $query->execute;
            my $pItems = $query->fetchall_arrayref();
            $query->finish;
            my $row;
            foreach $row (@$pItems) {
                    my $itemId=$$row[0];
                    FollowCatLinksUp($pCat,\&MarkCatContainsItem,$itemId);
                    };
            #${$pCat}{CountDirect}=$dCount;
            #DPrint "$dCount<br>";
            foreach $pLink ( @{${$pCat}{DownList}} ) {
              LibMod->CountCatItems(${$pLink}{Down});
              };
            DPrint "CountCatItems: dumping items from $$pCat{Name} - ";
            DumpStruct($$pCat{Items});
            DPrint "<br>";
            $$pCat{Count}=scalar (keys (%{$$pCat{Items}}));
            };

    sub MarkCatContainsItem {
    # mark this category as contaning a given item.
            my ($pCat,$itemId
               )=@_;
            ${$$pCat{Items}}{$itemId}=1;
            };

    sub FollowCatLinksUp {
    # follow the upward pointing links from this category to all higher level cats,
    # and do something with them.
            my ($pCat, # ref to category.
                $pSub, # sub to call on this category.
                $parm
                )=@_;
            DPrint "sub: FollowCatLinksUp. Category=$$pCat{Name}<br>";
            # go through each of the up-links:
            if ( not defined ($$pCat{UpList}) ) {
                    DPrint "Reached category root. <br>";
                    return;
                    };
            &$pSub($pCat,$parm); # call the sub.
            my $pLink;
            foreach $pLink ( @{$$pCat{UpList}} ) {
                    DPrint "Checking up-link: <br>";
                    FollowCatLinksUp($$pLink{Up}, $pSub, $parm);
                    };
            };

    ReadCategories is called with two empty hash pointers by any of the
    scripts that need to work with the category tree, and then this hash, with
    the {Count} hash keys filled in for each category, is then passed on to
    the function which renders the category listing page.

    -- 
    http://www.niftybits.ukfsn.org/
    remove 'n-u-l-l' to email me. html mail or attachments will go in the spam
    bin unless notified with [html] or [attachment] in the subject line. 
    

  • Next message: Regent: "Calling a scalar from another script using "require""

    Relevant Pages

    • looking for help with a counting algorithm
      ... subcategory under one or more higher level categories. ... stored in a mysql database, and the links between books and categories, ... I want any set of category memberships or groupings of categories that can ... where each category is represented by a hash ...
      (comp.lang.perl.misc)
    • Re: What is the name of the Language we are using & recommend book
      ... Can I have 2 sub forms in a form that are not sub forms of the other sub ... As for my process I am trying to create my Access Database in shells like ... QSL or Microsoft SQL Server Data Engine or what. ... language of queries, and the query design grid is just a tool to construct ...
      (microsoft.public.access.formscoding)
    • Re: Custom Login Screen
      ... Private Sub cmdLogin_Click ... On Error GoTo ErrorHandler ... You will need to enter your full path to the database file and MDW file in the appropriate places. ... Now make an MDE file from this MDB. ...
      (microsoft.public.access.security)
    • Re: Jeff C
      ... properly secure an Access database. ... Private Sub Form_Open ... Resume ExitPoint ... Dim db As DAO.Database ...
      (microsoft.public.access.formscoding)
    • Re: How to use a logon screen to log into a secured Database?
      ... > that appears with a secured database. ... >>On Error GoTo ErrorHandler ... >> Exit Sub ... Now make an MDE file ...
      (microsoft.public.access.security)