Subset sum problem


In computer science, the subset sum problem is an important decision problem in complexity theory and cryptography. There are several equivalent formulations of the problem. One of them is: given a set of integers, is there a non-empty subset whose sum is zero? For example, given the set, the answer is yes because the subset sums to zero. The problem is NP-complete, meaning roughly that while it is easy to confirm whether a proposed solution is valid, it may inherently be prohibitively difficult to determine in the first place whether any solution exists.
The problem can be equivalently formulated as: given the integers or natural numbers does any subset of them sum to precisely ? Subset sum can also be thought of as a special case of the knapsack problem. One interesting special case of subset sum is the partition problem, in which is half of the sum of all elements in the set.

Complexity

The complexity of the subset sum problem can be viewed as depending on two parameters, N, the number of decision variables, and P, the precision of the problem.
The complexity of the best known algorithms is exponential in the smaller of the two parameters N and P. Thus, the problem is most difficult if N and P are of the same order. It only becomes easy if either N or P becomes very small.
If N is small, then an exhaustive search for the solution is practical. If P is a small fixed number, then there are dynamic programming algorithms that can solve it exactly.
Efficient algorithms for both small N and small P cases are given below.

Exponential time algorithm

There are several ways to solve subset sum in time exponential in. The most naïve algorithm would be to cycle through all subsets of numbers and, for every one of them, check if the subset sums to the right number. The running time is of order, since there are subsets and, to check each subset, we need to sum at most elements.
A better exponential time algorithm is known which runs in time. The algorithm splits arbitrarily the elements into two sets of each. For each of these two sets, it stores a list of the sums of all possible subsets of its elements. Each of these two lists is then sorted. Using a standard comparison sorting algorithm for this step would take time. However, given a sorted list of sums for elements, the list can be expanded to two sorted lists with the introduction of a th element, and these two sorted lists can be merged in time. Thus, each list can be generated in sorted form in time. Given the two sorted lists, the algorithm can check if an element of the first array and an element of the second array sum up to in time. To do that, the algorithm passes through the first array in decreasing order and the second array in increasing order. Whenever the sum of the current element in the first array and the current element in the second array is more than, the algorithm moves to the next element in the first array. If it is less than, the algorithm moves to the next element in the second array. If two elements that sum to are found, it stops. Horowitz and Sahni first published this algorithm in a technical report in 1974.

Pseudo-polynomial time dynamic programming solution

The problem can be solved in pseudo-polynomial time using dynamic programming. Suppose the sequence is
sorted in the increasing order and we wish to determine if there is a nonempty subset which sums to zero. Define the boolean-valued function to be the value of
Thus, the solution to the problem "Given a set of integers, is there a non-empty subset whose sum is zero?" is the value of.
Let be the sum of the negative values and the sum of the positive values. Clearly,, if or. So these values do not need to be stored or computed.
Create an array to hold the values for and.
The array can now be filled in using a simple recursion. Initially, for, set
where is a boolean function that returns true if is equal to, false otherwise.
Then, for, set
For each assignment, the values of on the right side are already known, either because they were stored in the table for the previous value of or because if or. Therefore, the total number of arithmetic operations is. For example, if all the values are for some, then the time required is.
This algorithm is easily modified to return the subset with sum 0 if there is one.
The dynamic programming solution has runtime of where is the sum we want to find in set of numbers. This solution does not count as polynomial time in complexity theory because is not polynomial in the size of the problem, which is the number of bits used to represent it. This algorithm is polynomial in the values of and, which are exponential in their numbers of bits.
For the case that each is positive and bounded by a fixed constant, Pisinger found a linear time algorithm having time complexity . In 2015, Koiliaris and Xu found a deterministic algorithm for the subset sum problem where is the sum we need to find. In 2017, Bringmann found a randomized time algorithm.

Polynomial time approximate algorithm

An approximate version of the subset sum would be: given a set of numbers and a number, output:
If all numbers are non-negative, the approximate subset sum is solvable in time polynomial in and.
The solution for subset sum also provides the solution for the original subset sum problem in the case where the numbers are small. If any sum of the numbers can be specified with at most bits, then solving the problem approximately with is equivalent to solving it exactly. Then, the polynomial time algorithm for approximate subset sum becomes an exact algorithm with running time polynomial in and .
The algorithm for the approximate subset sum problem is as follows:
initialize a list S to contain one element 0.
for each i from 1 to N do
let T be a list consisting of xi + y, for all y in S
let U be the union of T and S
sort U
make S empty
let y be the smallest element of U
add y to S
for each element z of U in increasing order do
// Trim the list by eliminating numbers close to one another
// and throw out elements greater than s.
if y + cs/N < zs then
y = z
add z to S
if S contains a number between s and s then
return yes
else
return no
The algorithm is polynomial time because the lists, and always remain of size polynomial in and and, as long as they are of polynomial size, all operations on them can be done in polynomial time. The size of lists is kept polynomial by the trimming step, in which we only include a number into if it is greater than the previous one by and not greater than.
This step ensures that each element in is smaller than the next one by at least and do not contain elements greater than. Any list with that property consists of no more than elements.
The algorithm is correct because each step introduces an additive error of at most and steps together introduce the error of at most.