Interesting problem: automatic item categorization



I've observed this interesting problem which has practical
applications.


Suppose you're a book pulisher, and you have N "knowledge items" in
total, and you're going to publish several books each containing a
selected subset of these N items.

The problem is, given a subset of items, how do you automatically make
a "table of contents" for this particular set of items.

What you have is M available "categories", each category gives its
memberships to some particular items and other categories. If category
A contains category B and category B contains category/item C, we say C
has a membership in B and in A.

You want the computer to automatically generate a table-of-contents for
a given subset of items using some of the M available categories as
"non-leaf nodes". The requirements are:

(1) Every item in the given subset appears in the generated TOC tree
once and only once;
(2) When a reader is exploring the immediate children nodes {C} of a
parent node, no possible item X could have a membership in more than
one element of {C}.
(3) A reader's averaged item lookup time in this TOC should be minimal.
An item's TOC lookup time is defined as "how many categories and items
a reader has to scan through until he finds this item." At first the
reader scans the immediate children of the TOC root node from the first
and then downwards, until he finds a children that is his desired item
or is a category of his desired item, and then (if it's a category) he
expands this category and repeats the same "directory scan" among the
immediate children of this category, until his desired node is found.
For example, in the example TOC below, if the reader wants to find item
E, his visiting sequence is: A, B, C, E. (F is not visited because the
readers knows B is not a category of E; D is not visited because C is
already a category of E.)

A
+---B
+---F
+---C
+---E
+---D


I haven't thought about a good algorithm for this automatic
categorization task, but wants to propose this problem first :)

Regards,
Yao Ziyuan

.



Relevant Pages

  • Re: Interesting problem: automatic item categorization
    ... Suppose you're a book pulisher, and you have N "knowledge items" in ... no possible item X could have a membership in more than ... A reader's averaged item lookup time in this TOC should be minimal. ... a reader has to scan through until he finds this item." ...
    (comp.theory)
  • Re: Interesting problem: automatic item categorization
    ... Suppose you're a book pulisher, and you have N "knowledge items" in ... no possible item X could have a membership in more than ... A reader's averaged item lookup time in this TOC should be minimal. ... a reader has to scan through until he finds this item." ...
    (comp.theory)
  • Re: Interesting problem: automatic item categorization
    ... "points" to be organized in this article by paragraphs and clauses. ... no possible item X could have a membership in more than ... A reader's averaged item lookup time in this TOC should be minimal. ... a reader has to scan through until he finds this item." ...
    (comp.theory)
  • Re: Java for 5.2
    ... and I need to install Java 1.3.1 on AIX5. ... I have tried deleting and recreating the .toc. ... the reader of this message is not the intended recipient, ... communication in error, please immediately notify CDS @ 248-478-7000. ...
    (AIX-L)