Re: counting subsets of S so that sum(S_n) = N



azi.stdout@xxxxxxxxx wrote:

Well, it's not really the NP completeness that bothers me, but rather
my lame "subset" generation method.. Many "subsets" repeat over time
so I have to keep track of already summed sets and check them each
time I've encountered a "valid" subset.

I'm wondering.. Is there really no elegant way to get rid of repeated
subsets, and make my algorithm generate only unique subsets?

I'm going to assume that your problem here is avoiding generating
repeated subsets /even/ in the case where the input contains no
repetitions.

Maybe that's not your problem, but I can't read your Python code, so I
don't really know what your current approach does in any detail.

If you want to generate all subsets of a given set -- or more
accurately all subsequences of a given sequence -- then one way to do
it is by counting. The underlying idea is that you create an array of
boolean values of the same size as the input sequence, and then run
though every possible combination of true/false in each position. For
each combination, you select the corresponding items from the orginal
sequence.

For instance if the input is

A B C D E F

and the array of boolean at one step of the algorithm is

t f t t f f

then the subsequence that we generate at this step is

A C D

You can implement the "array of booleans" as an integer, using
bit-twiddling to extract each boolean, and using +1 to generate the
next array at each step (as Logan has described). But I think it's
nicer to code it directly. If you want to do that, then you can start
with the array all set to false:

f f f f f

and at each step you generate a new set of values by "adding one". To
do that start at the leftmost position and look at the value there. If
it is false, then you set it to true and stop. Otherwise you set it to
false, and move to consider the next position to the right. If the
value at /that/ position is false, then set it to true and stop.
Otherwise, set it to false and move right again. Keep going until you
either find a position which contains false, or run of the end of the
array (in which case the entire algorithm is complete). So the
algorithm (for N = 3) generates the following values for the boolean
array, in order:

f f f
t f f
f t f
t t f
f f t
t f t
f t t
t t t

The nice thing about doing it that way (either by using integers or
manipulating boolean in a "counting-like" way) is that it has a simple
implementation without needing recursion.

But, for this particular application of subset generation, a different
approach would probably be better. The opportunity that we are missing
is that we consider /every/ subset, even though we could avoid
generating some of them because we /know/ they cannot be candidate
solutions.

Say we are looking for subsets of { 1 3 5 7 9 11 } that sum to 11, and
we have just considered and rejected subset { 5 7 } then we know that
no larger subset which also contains 5 and 7 can possibly be a
solution. (This assumes that all the numbers are non-negative -- it
gets a little more complicated if negative numbers are allowed.)

So what we would like is a way of generating subsets which allows us
(at least in part) to recognise partial solutions which cannot possibly
be part of a complete solution and so reject whole swathes of candidate
solutions without even generating them. An idea generally known as
"early pruning".

We can generate all possible subsequences of an imput sequence
recursively as follows.

Start with an index, N, set to zero (assuming the language uses
0-based indexing) and an empty working sequence, W. We will generate
all possible subsequences in the working array.

At each step of the algorithm:
If N is off the end of the input then we have generated a candidate
subsequence in W, process it and return.
Otherwise:
Append input[N] to W.
Recursively invoke the algorithm with N = N+1.
Remove the last item from W (the one we added earlier).
Recursively invoke the algorithm with N = N+1 again.

That will step through all the possible subsequence of the input. The
advantage of this particular way of implementing the search is that
each recursive call /increases/ the size of the subsequence we
consider. So, at each step we can add up the numbers in W, and if that
total is already too big then we can skip both the recursive calls
since they cannot possibly generate a solution to the problem. Doing
things that way, it seems as if sorting the input into reverse order
before we start should help (allow the pruning to happen as early as
possible), but I don't know how much real difference that would make
(if any).

-- chris
.



Relevant Pages

  • Re: Mergesort Vs Quicksort
    ... I might not have correctly remembered my algorithm of months ago ... for sorting records in an array using one auxilary of the ... on how things turn out from lower levels of recursion, ... whether the number of records in the array segment to be sorted is ...
    (comp.programming)
  • Re: is there problem that "has to" use recursion
    ... because either way you have defined an algorithm in terms of itself. ... your stack is either the system stack or an array stack. ... Any of the algorithms that optimally use recursion, ...
    (comp.programming)
  • Re: regex question - trying to find ".mp3" in a SELECT box
    ... But `Array' is a Function object reference. ... Establish a new execution context using F's FormalParameterList, ... Since is not a primitive value, the exception should be thrown. ... | 5.2 Algorithm Conventions ...
    (comp.lang.javascript)
  • Re: Reference to derived type element by index?
    ... as a set of distinctively-named scalars. ... In another scope, the same common block ... would be a single array. ... algorithm and a specific application of the algorithm, ...
    (comp.lang.fortran)
  • [SUMMARY] Maximum Sub-Array (#131)
    ... # sum the integer values of array contents ... algorithm, ... all possible lengths, to check each subarray. ... $ time ruby -r max_sub_array ...
    (comp.lang.ruby)