Re: Interesting problem: automatic item categorization
- From: yaoziyuan@xxxxxxxxx
- Date: 13 Aug 2006 17:20:09 -0700
The only difference is that in 20Q you can ask overlaying questions.
yaoziyuan@xxxxxxxxx wrote:
This problem sounds identical to what the 20Q game tries to maintain.
yaoziyuan@xxxxxxxxx wrote:
I've observed this interesting problem which has practical
applications.
Suppose you're a book pulisher, and you have N "knowledge items" in
total, and you're going to publish several books each containing a
selected subset of these N items.
The problem is, given a subset of items, how do you automatically make
a "table of contents" for this particular set of items.
What you have is M available "categories", each category gives its
memberships to some particular items and other categories. If category
A contains category B and category B contains category/item C, we say C
has a membership in B and in A.
You want the computer to automatically generate a table-of-contents for
a given subset of items using some of the M available categories as
"non-leaf nodes". The requirements are:
(1) Every item in the given subset appears in the generated TOC tree
once and only once;
(2) When a reader is exploring the immediate children nodes {C} of a
parent node, no possible item X could have a membership in more than
one element of {C}.
(3) A reader's averaged item lookup time in this TOC should be minimal.
An item's TOC lookup time is defined as "how many categories and items
a reader has to scan through until he finds this item." At first the
reader scans the immediate children of the TOC root node from the first
and then downwards, until he finds a children that is his desired item
or is a category of his desired item, and then (if it's a category) he
expands this category and repeats the same "directory scan" among the
immediate children of this category, until his desired node is found.
For example, in the example TOC below, if the reader wants to find item
E, his visiting sequence is: A, B, C, E. (F is not visited because the
readers knows B is not a category of E; D is not visited because C is
already a category of E.)
A
+---B
+---F
+---C
+---E
+---D
I haven't thought about a good algorithm for this automatic
categorization task, but wants to propose this problem first :)
Regards,
Yao Ziyuan
.
- References:
- Interesting problem: automatic item categorization
- From: yaoziyuan
- Re: Interesting problem: automatic item categorization
- From: yaoziyuan
- Interesting problem: automatic item categorization
- Prev by Date: Re: Interesting problem: automatic item categorization
- Next by Date: Re: Subsets from a set
- Previous by thread: Re: Interesting problem: automatic item categorization
- Index(es):
Relevant Pages
|