Re: looking for help with a counting algorithm
From: Andy Baxter (news3_at_earthsong.null.free-online.co.uk)
Date: 12/29/03
- Previous message: Gregory Toomey: "Re: looking for help with a counting algorithm"
- In reply to: Gregory Toomey: "Re: looking for help with a counting algorithm"
- Next in thread: Andy Baxter: "Re: looking for help with a counting algorithm"
- Reply: Andy Baxter: "Re: looking for help with a counting algorithm"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Date: Mon, 29 Dec 2003 02:14:34 +0000
At earth time Mon, 29 Dec 2003 11:35:17 +0100, the following transmission
was received from the entity known as Gregory Toomey:
> Andy Baxter wrote:
>> I'm working on a web-based database project written in perl, which is a
>> virtual library for people in the town where I live to share details of
>> books and videos they have - the idea is if you find a book you like you
>> click on 'borrow', and you are given details of how to contact the person
>> who owns it.
>>
>> I'm not sure exactly where to ask for this, as I'm looking for help with
>> finding a good algorithm, rather than details of coding, but I've chosen
>> this group as it's written in perl, and more of a coding problem than a
>> cgi problem.
>>
...
>> The thing that makes it hard is the multiple category membership plus
>> multiple category grouping, but I want to keep this if I possibly can,
>> because one of the problems with traditional non-computerised indexes is
>> that you can't do this even though books can be relevant to several
>> different topics.
>>
>> My thoughts at the moment are either:
>>
>> - give each category a hash where the keys are all the items included in
>> the category. Read these in from the database, then each time a
>> subcategory is counted, the code goes back up the tree to the root, adding
>> the items that belong to it to all the higher level categories. When the
>> recursive calls to subcategories return, (i.e. end of this code fragment),
>> then count the keys in the hash. This would work I think, but it would
>> involve retrieving all the category memberships from the database, rather
>> than just the counts, and it might cause speed and memory problems if a
>> lot of items were added to the system.
>>
>> - or when you put an item in a category, category memberships are
>> automatically created in the database for all the higher level categories.
>> This would be quicker to count, but I'm thinking that if someone has
>> chosen to put an item in a given subcategory, and then later on I decide
>> to move that subcategory to a different group, the category memberships
>> should shift to follow this, whereas with this system the item would stay
>> in the old top-level categories.
>>
>> Finally - my questions are:
>> - does anyone know of an algorithm for working out the counts just from
>> the counts in each category - i.e. without having to get details of
>> exactly which items belong to each category? (I have a feeling this may be
>> impossible)
>
> You have to write a recursive algorithm, which you can precompute.
>
> SQL has a "start with" and "connect by" syntax for this sort of thing,
> but mysql does to implement it. Oracle implement this recursion for
> you. eg See http://www.adp-gmbh.ch/ora/sql/connect_by.html
> It would be a bit of a nightmare to do it yourself.
Thanks for this - I didn't know you could do this sort of thing in sql.
>> - or generally, and maybe more useful in the long run, a good place to
>> start looking on the web for answers to problems like this when they crop
>> up.
>
> Its called "graph theory", and has been well studied for decades.
OK, I'll have a look about.
I've tried doing the first method I thought of, and it's not working out
as slow as I thought - I added 10000 items to the dataset and it slowed
the page loading by just over a second, which isn't as bad as I thought.
This was on a 1.3 GHz machine doing nothing else though.
Precomputing the algorithm might help - thanks.
I think I needn't have posted about this - I just felt like I'd hit a
brick wall thinking about how to do it, then writing the posting helped me
put my thoughts together, but having written all that I wanted to post it
anyway and see if anyone had any suggestions.
For what it's worth, the code I've written now looks like this: (in module
LibMod.pm) Any comments welcome, as I'm still quite new to perl, and
probably not making best use of the language.
andy.
sub ReadCategories($$$) {
# Read in list of categories from database into Id index and tree structure.
my ($mod,$pRoot, # a pointer to an empty hash which will be filled with the
# category tree structure.
$pIdIndex, # a pointer to an empty hash which will be filled with the
# category Id index.
$pFlags # ref to a hash with flags saying what info to include in the
# tree.
# CountItems=>defined - count items belonging to that category.
# Contains=>defined - mark whether this category includes an
# item.
# ItemId=>num - Id of item to check for.
)=@_;
my ($query, # query handle.
$pCats, # ref to results from category query.
$pCatRow, # ref to single row of results.
$pCat, # ref to individual category entry in the data structure.
$pGroups, # ref to results from category grouping query.
$pGroupRow, # ref to row from cat group query.
$pGroupCat, # ref to category that this one is grouped under.
$pLink # ref to a grouping link between two categories.
);
DPrint ("sub: ReadCategories<br>");
$query=$db->prepare("select Id,Name from Category");
$query->execute;
$pCats=$query->fetchall_arrayref;
$query->finish;
# fill the category Id index with entries, each of which is a pointer to a hash
# containing a category name and Id.
foreach $pCatRow ( @$pCats ) {
$pCat={Name=>$$pCatRow[1], # Category name.
Id=>$$pCatRow[0], # Category Id.
UpList=>[], # List of links to cats this is grouped under.
DownList=>[], # List of links to cats grouped under this one.
};
${$pIdIndex}{$$pCatRow[0]}=$pCat;
if ( exists ${$pFlags}{CountItems} ) {
$$pCat{Items}={};
};
};
# Now read in the category groupings to generate a tree structure.
# first generate a root category to build the tree from. There is no
# category in the database with Id 0, but all the top level categories
# have a CategoryGrouping entry with Id 0.
${$pRoot}{Name}="root";
${$pRoot}{Id}=0;
${$pRoot}{UpList}=undef;
${$pRoot}{DownList}=[];
${$pIdIndex}{0}=$pRoot;
# Read all the category groupings into a hash in memory:
$query=$db->prepare("select GroupId,MemberId,SortOrder,Stop ".
"from CategoryGrouping");
$query->execute;
$pGroups=$query->fetchall_arrayref;
$query->finish;
# Now go through all the groupings, and add links to the categories to make
# a tree structure.
foreach $pGroupRow (@$pGroups) {
$pCat=${$pIdIndex}{$$pGroupRow[1]};
$pGroupCat=${$pIdIndex}{$$pGroupRow[0]};
$pLink={Down=>$pCat, # ref to sub-category.
Up=>$pGroupCat, # ref to group category.
Sort=>$$pGroupRow[2], # sort order of this link.
};
if ($$pGroupRow[3] ) {
# if Stop is defined, don't display these subcategories.
${$pLink}{Stop}=${$pGroupRow}[3];
};
# add link to up and down lists for the two categories.
push @{${$pCat}{UpList}}, $pLink;
push @{${$pGroupCat}{DownList}}, $pLink;
};
if ( exists ${$pFlags}{CountItems} ) {
# count how many items belong to each category.
# This means adding three keys to the category hash:
# CountDirect - How many belong to this category directly.
# CountUnique - How many belong to this category and none of its subcategories.
# CountTotal - How many belong to this category and any of its subcategories.
DPrint "Counting<br>";
LibMod->CountCatItems($pRoot);
}
# LibMod->ReadCategoryLevel($pRoot,$pIdIndex,$pGroups);
};
sub CountCatItems($$) {
# Count how many items there are under this category, either directly or indirectly.
my ($mod,$pCat, # pointer to the category hash.
$pStack # pointer to stack which
)= @_;
my ($query, # query handle.
$catId, # category Id no.
$pLink # ref to link between cats.
);
$catId=${$pCat}{Id};
DPrint "sub: CountCatItems. Category= ${$pCat}{Name} - Id: $catId.<br>";
#DPrint "iteration: $i <br>";
$query=$db->prepare("select ItemId from CategoryMembership where " .
"CategoryId=$catId ");
$query->execute;
my $pItems = $query->fetchall_arrayref();
$query->finish;
my $row;
foreach $row (@$pItems) {
my $itemId=$$row[0];
FollowCatLinksUp($pCat,\&MarkCatContainsItem,$itemId);
};
#${$pCat}{CountDirect}=$dCount;
#DPrint "$dCount<br>";
foreach $pLink ( @{${$pCat}{DownList}} ) {
LibMod->CountCatItems(${$pLink}{Down});
};
DPrint "CountCatItems: dumping items from $$pCat{Name} - ";
DumpStruct($$pCat{Items});
DPrint "<br>";
$$pCat{Count}=scalar (keys (%{$$pCat{Items}}));
};
sub MarkCatContainsItem {
# mark this category as contaning a given item.
my ($pCat,$itemId
)=@_;
${$$pCat{Items}}{$itemId}=1;
};
sub FollowCatLinksUp {
# follow the upward pointing links from this category to all higher level cats,
# and do something with them.
my ($pCat, # ref to category.
$pSub, # sub to call on this category.
$parm
)=@_;
DPrint "sub: FollowCatLinksUp. Category=$$pCat{Name}<br>";
# go through each of the up-links:
if ( not defined ($$pCat{UpList}) ) {
DPrint "Reached category root. <br>";
return;
};
&$pSub($pCat,$parm); # call the sub.
my $pLink;
foreach $pLink ( @{$$pCat{UpList}} ) {
DPrint "Checking up-link: <br>";
FollowCatLinksUp($$pLink{Up}, $pSub, $parm);
};
};
ReadCategories is called with two empty hash pointers by any of the
scripts that need to work with the category tree, and then this hash, with
the {Count} hash keys filled in for each category, is then passed on to
the function which renders the category listing page.
-- http://www.niftybits.ukfsn.org/ remove 'n-u-l-l' to email me. html mail or attachments will go in the spam bin unless notified with [html] or [attachment] in the subject line.
- Previous message: Gregory Toomey: "Re: looking for help with a counting algorithm"
- In reply to: Gregory Toomey: "Re: looking for help with a counting algorithm"
- Next in thread: Andy Baxter: "Re: looking for help with a counting algorithm"
- Reply: Andy Baxter: "Re: looking for help with a counting algorithm"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Relevant Pages
|