Re: Subset Sum problem (w/ limited scope)



*** Hendrickson wrote:
Dan wrote:
This question isn't actually a fortran question. However, there
might be a fortran person here with some insights?

The "Subset Sum" problem (stated various ways depending on what you
are reading):

Given: a set of integers, some may be positive, some negative.
Find: a subset of those integers that sums up to zero.

In this basic problem, the subset size can itself be quite large.
There is no "fantastic" way to solve this problem for a large set of
integers. The total number of possible subsets is 2**n, where n is
the number of integers in the original set.

What I want to do:
Limit the maximum number of integers that can be included in the
subset that must add up to zero ( say 6 integers maximum ). Limiting
the maximum subset size to 6 does greatly reduce the memory and
computations required to solve this problem. However, as the
original set size grows to n=100 and above, even limiting the max.
subset size to 6 starts requiring huge amounts of memory.

Does anyone have any information or insights or know of any web
references that they can point me to that may be of help?

Thanks,
Dan
I don't really have any insights, maybe a few suggestions based
on incomplete knowledge will help ;).

Why do you say it takes a lot of memory? Do you intend to compute
all of the permuted sums into a big array and then search that for
zeros? I think it would be better to search for index sets that
sum to zero and then do whatever you need to do then.

Are there going to be duplicate values in your original array?
Do they matter? Are there zero values? do they count? Suppose
your initial values are 1, -1, -1, 0, and 0. Do you have just
one subset? Or a whole bunch of similar ones? If you don't care
about zeros or multiple instances of the same values, try eliminating
duplicates from the array before searching. If you do care about
zeros or duplicate values, I'd still try eliminating them, but count
them as you go (and remember their location). I think you can then
deduce the other index sets once you have found one set; just
replace index values from the list of known duplicate values.

After thinking abut my reply, you can't do as much simplification for
duplicate values in the list as I thought. With a set like
4,-1,-1,-1,-1 you can't leave out the -1s, nor is it obvious now
that you can deduce anything interesting knowing that there are
duplicates of -1.

*** Hendrickson

I'd sort the original list into descending (or ascending) order.
That will make it easier to know when to stop looking for a
possible solution if you do your searching from left to right
(or vice versa). As you sort the list, keep a carry-along parallel
array with the original index value so you can report the set
in terms of the original location if that's important.

Are the values guaranteed to be bounded by some reasonably small
number? If you know for sure that all of the values will be
between say -12345 and +54321 create a helper array with dimensions
(-12345:54321). Set all of the values to zero. Then go through
and set the hundred or so elements to one that have the same
subscript as one of your original values. That way, when you
find 5 values that sum to say 137, you can trivially see if there
is a sixth value of -137 merely by looking at helper_array(-137)
and seeing if it's zero or not. This avoids searching through
the list of values. If duplicates are important, you
could set the helper_array to the could of instances, rather than
just one or zero. You could also create a parallel array to the
helper array that contained the index in the sorted array of
the first instance of the first duplicate. That should let you
recover the indexes of the other duplicate elements without searching.

Hope that helps

*** Hendrickson
.