Breaking the knapsack cryptosystem

More Info
expand_more

Abstract

We live in an online world: we date online, we do business online and we communicate online. To make sure this happens securely, almost all transferred data is encrypted by cryptosystems. This paper focuses on one of these: the knapsack cryptosystem of Merkle and Hellman, which relies on the hardness of solving the knapsack problem (1978). In general it works as follows.
Bob, who wants to communicate with Alice, encrypts his message with a public key and sends this to her. If a third party now intercepts it, he must solve an instance of the knapsack problem, which is NP-hard in general. However, this becomes computationally infeasible if the number of items is large, and therefore Bob’s message is safe. Alice has access to a private key, which she uses to transform the hard knapsack problem into an easier one: one where the vector of weights is superincreasing. That is, each component of the vector is larger than the sum of all previous components. In this case she can solve the problem efficiently and read what Bob sent her.
This method seems to be reliable at first sight. However, a few years after it was published, cryptographer Shamir proposed an algorithm that breaks the system (1984). The algorithm finds a pair of numbers by solving two systems of inequalities. Then a third party can use this pair to transform the hard knapsack problem into one he can solve, which may be different from the one Alice finds. However, he will find the same solution, and therefore he can also read Bob’s message.
This algorithm can be implemented as a computer program and in this paper we used Python. The first system of inequalities is written as a optimization problem and since there is fixed number of unknowns, we can solve it in polynomial time using Lenstra’s integer programming algorithm (1983). In this paper we used the Gurobi optimizer to solve it, as Lenstra’s algorithm is hard to use in practice. The second system of inequalities is solved by comparing lower and upper bound, which can also be carried out in polynomial time. Nevertheless, the total algorithm finishes in polynomial time only with a certain high probability since we made some probabilistic assumptions. This implies that there is a small probability of failure. However, if it finds a solution, then the algorithm is fast, and therefore it is still valuable to use in real life.

Files