Re: looking for help with a counting algorithm

From: Gregory Toomey (nospam_at_bigpond.com)
Date: 12/29/03


Date: Mon, 29 Dec 2003 11:35:17 +0100

Andy Baxter wrote:
> I'm working on a web-based database project written in perl, which is a
> virtual library for people in the town where I live to share details of
> books and videos they have - the idea is if you find a book you like you
> click on 'borrow', and you are given details of how to contact the person
> who owns it.
>
> I'm not sure exactly where to ask for this, as I'm looking for help with
> finding a good algorithm, rather than details of coding, but I've chosen
> this group as it's written in perl, and more of a coding problem than a
> cgi problem.
>
> I have decided on a categorisation system for the items, where each item
> can belong to many categories, and each category can also be grouped as a
> subcategory under one or more higher level categories. Everything is
> stored in a mysql database, and the links between books and categories,
> and between categories and higher level categories are done through two
> linking tables, which have entries like:
>
> CategoryMembership
> ------------------
> CatID ItemId
> 1 1
> 2 1
> 1 2
> 1 3
> 3 3
> 1 4
> 4 4
> 7 5
>
> CategoryGrouping
> ----------------
> CatID SubCatId SortOrder Stop
> 1 2 1
> 1 3 2
> 1 4 3
> 3 4 1 1 (yes.)
> 4 5 1
> 4 6 2
> 4 7 3
>
>
> I.e. item 1 is in cats 1 & 2, item 2 is in cat 1, item 3 is in cat 3, and
> item 4 is in cats 1 & 4. Categories 2 and 3 are in Cat 1, and Category 4
> is in cat 3 and cat 1 etc.
>
> I want any set of category memberships or groupings of categories that can
> be represented like this without leading to loops to be allowed. (I.e. the
> only thing that isn't allowed is things like category 1 is grouped under
> cat 2 which is grouped under cat 3 which is grouped under cat 1, and more
> complex variations on this.)
>

That's called a directed acyclic graph, or DAG.

> I have written some code which reads these tables and generates a
> tree-like data structure, where each category is represented by a hash
> like this:
>
> {Name=>"category name",
> Id=>"Id on database",
> DownList=>ref of array with list of links to subcategories,
> UpList=>ref of array with list of links to higher categories
> }
>
> The links between categories are hashes like this:
> {Up=>ref to category hash,
> Down=>ref to (sub) category hash,
> Stop=>stop listing subcategories at this point?,
> Sort=>sort order when displaying this subcategory}
>
> This structure is then used by the scripts which render the pages
> displaying the categories. This bit is working fine.
>
> The bit I'm finding difficult is how to count how many items there are
> under a given category, so that a count can be shown in the category index
> page.
>
> The code I have at the moment is here: (This code is in a module called
> LibMod.pm)
>
> sub CountCatItems($$) {
> # Count how many items there are under this category, either directly or indirectly.
> my ($mod,$pCat, # pointer to the category hash.
> $pStack # pointer to stack which
> )= @_;
> my ($query, # query handle.
> $catId, # category Id no.
> $pLink # ref to link between cats.
> );
>
> $catId=${$pCat}{Id};
> DPrint "counting ${$pCat}{Name} - $catId = ";
> $query=$db->prepare("select count(ItemId) from CategoryMembership where " .
> "CategoryId=$catId group By CategoryId");
> $query->execute;
> my ($dCount) = $query->fetchrow_array();
> $query->finish;
> ${$pCat}{CountDirect}=$dCount;
> DPrint "$dCount<br>"; # debug print.
> foreach $pLink ( @{${$pCat}{DownList}} ) {
> LibMod->CountCatItems(${$pLink}{Down}); # call self recursively
> };
> };
>
> This works, but only counts the number of items directly included in a
> category, not the ones which belong to a subcategory.
>
> The thing that makes it hard is the multiple category membership plus
> multiple category grouping, but I want to keep this if I possibly can,
> because one of the problems with traditional non-computerised indexes is
> that you can't do this even though books can be relevant to several
> different topics.
>
> My thoughts at the moment are either:
>
> - give each category a hash where the keys are all the items included in
> the category. Read these in from the database, then each time a
> subcategory is counted, the code goes back up the tree to the root, adding
> the items that belong to it to all the higher level categories. When the
> recursive calls to subcategories return, (i.e. end of this code fragment),
> then count the keys in the hash. This would work I think, but it would
> involve retrieving all the category memberships from the database, rather
> than just the counts, and it might cause speed and memory problems if a
> lot of items were added to the system.
>
> - or when you put an item in a category, category memberships are
> automatically created in the database for all the higher level categories.
> This would be quicker to count, but I'm thinking that if someone has
> chosen to put an item in a given subcategory, and then later on I decide
> to move that subcategory to a different group, the category memberships
> should shift to follow this, whereas with this system the item would stay
> in the old top-level categories.
>
> Finally - my questions are:
> - does anyone know of an algorithm for working out the counts just from
> the counts in each category - i.e. without having to get details of
> exactly which items belong to each category? (I have a feeling this may be
> impossible)

You have to write a recursive algorithm, which you can precompute.

SQL has a "start with" and "connect by" syntax for this sort of thing,
but mysql does to implement it. Oracle implement this recursion for
you. eg See http://www.adp-gmbh.ch/ora/sql/connect_by.html
It would be a bit of a nightmare to do it yourself.

> - or generally, and maybe more useful in the long run, a good place to
> start looking on the web for answers to problems like this when they crop
> up.

Its called "graph theory", and has been well studied for decades.

> - also if you have any thoughts on either of the solutions I've
> described, or can think of a better way of doing it, I'd appreciate it.
>
> andy.
>



Relevant Pages

  • looking for help with a counting algorithm
    ... subcategory under one or more higher level categories. ... stored in a mysql database, and the links between books and categories, ... I want any set of category memberships or groupings of categories that can ... where each category is represented by a hash ...
    (comp.lang.perl.misc)
  • Re: looking for help with a counting algorithm
    ... >> subcategory is counted, the code goes back up the tree to the root, adding ... >> involve retrieving all the category memberships from the database, ... sub ReadCategories{ ... ReadCategories is called with two empty hash pointers by any of the ...
    (comp.lang.perl.misc)
  • Re: A new paradigm
    ... files _plus_ a higher level Orders file which coordinates the others. ... sort of higher-level business object, and not rely too heavily on the ... the database and should be developed independently. ... advocating the replacement of Databasic with vb.net for example! ...
    (comp.databases.pick)
  • Help with ASP.NET Memberships
    ... I'm taking my first stab at using ASP.NET memberships and could use some help. ... I see it generated a complete, new database to store the membership information in. ... That's okay I guess but doesn't it have the option of storing those tables in my main database? ... the user information that ASP.NET created. ...
    (microsoft.public.dotnet.framework.aspnet)
  • Hierarchical combos
    ... I am trying to build a database that will sort a large number of publications ... To enter details of a publication, ... selects a category from cboCategory and then a subcategory on cboSubcategory ... The subform is based on tblPublications, ...
    (microsoft.public.access.gettingstarted)