CO Problems

We show the basic denotations of graphs. Let \(\mathcal{G} = (\mathcal{V}, \mathcal{E}, W)\) denote a weighted graph, where \(\mathcal{V}\) is the node set, \(\mathcal{E}\) is the edge set, \(|\mathcal{V}| = V\), \(|\mathcal{E}| = E\), and \(W : \mathcal{E} \rightarrow \mathbb{R}^+\) is the edge weight function, i.e., \(W_{u,v}\) is the weight of edge \((u,v) \in \mathcal{E}\). \(W_{u,v} > 0\) if \((u,v)\) is an edge and 0 otherwise. Let \(\delta^+(i)\) and \(\delta^-(i)\) denote the out-arcs and in-arcs of node \(i\).

Integer linear programming (ILP) is a standard formulation of combinatorial optimization problems. It has the canonical form:

\begin{equation} \begin{aligned} \min \quad & \mathbf{c}^T \mathbf{x} \\ \text{s.t.} \quad & A \mathbf{x} \le \mathbf{b}, \\ & \mathbf{x} \ge 0, \\ & \mathbf{x} \in \mathbb{Z}^n, \end{aligned} \tag{1} \end{equation}

where \(\mathbf{x}\) is a vector of \(n\) decision variables, \(\mathbf{c}\) is a vector of \(n\) coefficients for \(\mathbf{x}\) in the objective function, \(A \in \mathbb{R}^{m \times n}\) and \(\mathbf{b} \in \mathbb{R}^m\) together denote \(m\) linear constraints, and \(\mathbf{x} \in \mathbb{Z}^n\) implies that we are interested in integer solutions. Let \(\mathbf{x}^*\) denote the optimal solution and \(f^*\) denote the corresponding objective value. Only a few problems such as portfolio optimization are quadratic programming, which will be described later.

With respect to QUBO or Ising model, we consider a 1D Ising model with a ring structure and an external magnetic field \(h_i\). There are \(N\) nodes with \((N + 1) \equiv 1 \mod N\); a node \(i\) has a spin \(s_i \in \{+1, -1\}\) (where +1 for up and −1 for down). Two adjacent sites \(i\) and \(i + 1\) have an energy \(w_{i,i+1}\) or \(-w_{i,i+1}\) depending on whether they have the same or different directions, respectively.

The whole system will evolve into the ground state with the minimum Hamiltonian:

\begin{equation} f(\mathbf{s}) = - \sum_{i=1}^N h_i s_i + \alpha \sum_{i=1}^N -w_{i,i+1} s_i s_{i+1} \tag{2} \end{equation}

where \(\alpha\) is a weight, \(f_A\) is defined on each node’s effect on its own, and \(f_B\) is defined on each two adjacent nodes’ interactions. In fact, we generally use binary variables (0 or 1) to formulate the objective function, and \(s_i \in \{+1, -1\}\) can be replaced by

\begin{equation} x_i = \frac{s_i + 1}{2}, \tag{3} \end{equation}

where \(x_i \in \{0, 1\}\).

Graph Maxcut

An example of graph maxcut

Figure 1: An example of graph maxcut.

The graph maxcut problem is defined as follows. Given a graph \(\mathcal{G} = (\mathcal{V}, \mathcal{E}, W)\), split \(\mathcal{V}\) into two subsets \(\mathcal{V}^+\) (with edge set \(\mathcal{E}^+\)) and \(\mathcal{V}^-\) (with edge set \(\mathcal{E}^-\)), and the cut set is \(\delta = \{(i, j) \mid i \in \mathcal{V}^+, j \in \mathcal{V}^-\}\). The goal is to maximize the cut value:

\[\max \sum_{(i,j)\in \delta} W_{ij}\]

Where \(W_{ij}\) represents the importance (or cost) of edge \((i, j)\), and is usually predefined based on the graph structure.

1. ILP Formulation

Take Figure 1 for Example

\[\max \quad \mathbf{y}_{12} + \mathbf{y}_{14} + \mathbf{y}_{23} + \mathbf{y}_{35} + \mathbf{y}_{45}\]

s.t.

\[ \begin{align}\begin{aligned}\begin{split}\mathbf{y}_{12} \le \mathbf{x}_1 + \mathbf{x}_2 \\ \mathbf{y}_{12} \le 2 - \mathbf{x}_1 - \mathbf{x}_2 \\ \mathbf{y}_{12} \ge \mathbf{x}_1 - \mathbf{x}_2 \\ \mathbf{y}_{12} \ge -\mathbf{x}_1 + \mathbf{x}_2 \\\end{split}\\\begin{split}\mathbf{y}_{14} \le \mathbf{x}_1 + \mathbf{x}_4 \\ \mathbf{y}_{14} \le 2 - \mathbf{x}_1 - \mathbf{x}_4 \\ \mathbf{y}_{14} \ge \mathbf{x}_1 - \mathbf{x}_4 \\ \mathbf{y}_{14} \ge -\mathbf{x}_1 + \mathbf{x}_4 \\\end{split}\\\begin{split}\mathbf{y}_{23} \le \mathbf{x}_2 + \mathbf{x}_3 \\ \mathbf{y}_{23} \le 2 - \mathbf{x}_2 - \mathbf{x}_3 \\ \mathbf{y}_{23} \ge \mathbf{x}_2 - \mathbf{x}_3 \\ \mathbf{y}_{23} \ge -\mathbf{x}_2 + \mathbf{x}_3 \\\end{split}\\\begin{split}\mathbf{y}_{35} \le \mathbf{x}_3 + \mathbf{x}_5 \\ \mathbf{y}_{35} \le 2 - \mathbf{x}_3 - \mathbf{x}_5 \\ \mathbf{y}_{35} \ge \mathbf{x}_3 - \mathbf{x}_5 \\ \mathbf{y}_{35} \ge -\mathbf{x}_3 + \mathbf{x}_5 \\\end{split}\\\begin{split}\mathbf{y}_{45} \le \mathbf{x}_4 + \mathbf{x}_5 \\ \mathbf{y}_{45} \le 2 - \mathbf{x}_4 - \mathbf{x}_5 \\ \mathbf{y}_{45} \ge \mathbf{x}_4 - \mathbf{x}_5 \\ \mathbf{y}_{45} \ge -\mathbf{x}_4 + \mathbf{x}_5\end{split}\end{aligned}\end{align} \]

Each \(\mathbf{x}, \mathbf{y}\in \{0, 1\}\).

This ILP formulation ensures that \(\mathbf{y}_{ij} = 1\) if and only if nodes \(i\) and \(j\) belong to different subsets (i.e., edge \((i,j)\) is cut).

We provide general ILP formulation of graph maxcut

$$\begin{array}{ll} \text{max} & \sum_{(i,j)} W_{ij} y_{ij} \\ \text{s.t.} & y_{ij} \le x_i + x_j,\quad \forall i, j \in V,\ i < j \\ & y_{ij} \le 2 - x_i - x_j,\quad \forall i, j \in V,\ i < j \\ & y_{ij} \ge x_i - x_j,\quad \forall i, j \in V,\ i < j \\ & y_{ij} \ge -x_i + x_j,\quad \forall i, j \in V,\ i < j \end{array}$$

where \(x_i\) is a binary variable denoting if node \(i\) belongs to the selected subset; and \(y_{ij}\) is 1 if nodes \(i\) and \(j\) are in different subsets and 0 otherwise. The weight \(W_{ij}\) represents the importance (or cost) of edge \((i, j)\), and is usually predefined based on the graph structure.

2. QUBO Formulation

Take Figure 1 for example

For an illustrative example in the left graph of Figure 1, the edge set is:

\(E = \{(1,2), (1,4), (2,3), (2,4), (3,5)\}\)

and the weights are:

\(w_{12} = w_{14} = w_{23} = w_{24} = w_{35} = w_{45} = 1\).

The edge set of black nodes is \(\mathcal{E}^+ = \{(1, 4)\}\), and the edge set of white nodes is \(\mathcal{E}^- = \emptyset\).

The edges connecting the two subsets are:

\(\delta = \{(1,2), (2,3), (2,4), (3,5), (4,5)\}\).

The solution is \(x \in \{0,1\}^5\), and the Hamiltonian in Equation (3) becomes:

\[\begin{split}\begin{aligned} \min f(x) =\ & -\frac{1}{2} (1 - (2x_1 - 1)(2x_2 - 1)) \\ & -\frac{1}{2} (1 - (2x_1 - 1)(2x_4 - 1)) \\ & -\frac{1}{2} (1 - (2x_2 - 1)(2x_3 - 1)) \\ & -\frac{1}{2} (1 - (2x_3 - 1)(2x_5 - 1)) \\ & -\frac{1}{2} (1 - (2x_4 - 1)(2x_5 - 1)) \end{aligned}\end{split}\]

We provide general QUBO formulation of maxcut

\[\min f(x) = -\frac{1}{2} \sum_{(i,j)\in \mathcal{E}} W_{ij} \left(1 - (2x_i - 1)(2x_j - 1) \right)\]

Here, \(\sum_{(i,j)\in \mathcal{E}} W_{ij}\) is a constant. Let \(x_i = 1\) if node \(i\) belongs to \(V^+\), and 0 otherwise. The term \(1 - (2x_i - 1)(2x_j - 1)\) equals 1 if nodes \(i\) and \(j\) are in different subsets, and 0 otherwise.

3. RL Methods: Two Decision Patterns

../_images/fig2.png

Figure 2: Two patterns for graph maxcut.

Sparse-rewards Pattern (I) Initial state is empty. RL agent selects node 1 with the highest Q-value. Reward is 2. New state becomes :math:[1].

Dense-rewards Pattern (II) Current state is :math:[2, 3], objective = 2. Agent adds node 1. New state is :math:[1, 3, 4]. Objective improves to 4.

Knapsack

We take knapsack as a second probelm.

Knapsack problem Given a set of items \(\mathcal{I}\), each item \(i\) with an integer weight \(W_i\) and a value \(\mu_i\), determine which items to include in the collection so that the total weight is less than or equal to a given limit \(U\) and the total value is maximized.

We assume there are 3 items. Their values are \([1, 1, 2]\), weights are \([1, 1, 1]\), and the total weight limit is \(2\).

1. ILP Formulation

ILP formulation for this example

\[\begin{split}\max \quad x_1 + x_2 + 2x_3 \\ \text{s.t.} \quad x_1 + x_2 + x_3 \le 2 \\ x_1, x_2, x_3 \in \{0, 1\}\end{split}\]

We provide general ILP formulation of knapsack

\[\begin{split}\max \sum_{i \in \mathcal{I}} \mu_i x_i \\ \text{s.t.} \sum_{i \in \mathcal{I}} W_i x_i \le U, \\ x_i \in \{0, 1\}\end{split}\]

where \(x_i\) is a binary variable (1 if item \(i\) is in the knapsack), \(\mu_i\) is its value, \(W_i\) is its weight, and \(U\) is the weight limit.

2. QUBO Formulation

For the same example:

We consider an item \(i\), and let \(x_i\) be a binary variable with 1 denoting it is in the knapsack and 0 otherwise. Let \(y_n\) for \(1 \leq n \leq U\) be a binary variable with 1 denoting the final weight of the knapsack is \(n\).

The QUBO formulation of Knapsack is:

\[\min_x \left( \left( y_1 + 2y_2 \right)^2 + \left( y_1 + 2y_2 - x_1 - x_2 - x_3 \right)^2 - \alpha (x_1 + x_2 + 2x_3) \right)\]

Here, \(y_1, y_2\) are binary variables indicating whether total weight equals 1 or 2 respectively. \(\alpha \in (0,1)\) is a penalty parameter.

We provide general QUBO formulation of knapsack

\[\min_x f = \left( \sum_{n=1}^{U} y_n \right)^2 + \left( \sum_{n=1}^{U} n y_n - \sum_{i \in \mathcal{I}} W_i x_i \right)^2 - \alpha \sum_{i \in \mathcal{I}} \mu_i x_i\]

where \(y_n \in \{0,1\}\) denotes whether the final weight equals \(n\), and \(x_i\) is whether item \(i\) is in the knapsack.

3. Two Patterns of RL Methds

For the same 3-item knapsack problem

Pattern I The initial state is empty. Then we select item 2 and add it to the state, i.e., the new state is :math:[2]. The reward is 1.

Pattern II The current state is :math:[0, 1, 0], and the objective value is 1. The new state is :math:[0, 1, 1], and the objective value is 3.