Ex. S1a: Two-Stage Stochastic Optimization for a Server Farm

Based on this Medium post.

The general, two-stage stochastic optimization problem aims to minimize the loss (or maximize the profit) resulting from the first stage decision plus the expected loss resulting from the second stage decision,

\[ \min_{x \in \mathcal{X}} f(x) + \mathbb{E} \left[ Q(x,\xi)\right], \] where \(Q\) is the solution of the second-stage, recourse problem, \[ \min_{y \in \mathcal{Y}} g(y,x,\xi). \]

NOTE

We use the gurobi solver here, since it is at least 2 orders of magnitude faster than the open-source glpk solver.

Problem Formulation

We want either to design a new server (or storage) farm, or just allocate from an existing one, to meet uncertain future compute (or data) needs. We suppose that we have a limited budget, and limited power (or space) capacities to meet the demand.

We have a general estimation of the next-month demand for each resource and the distributions they follow. For the sake of simplicity, we will ignore future months after next month, though it is relatively simple to formulate a multi-stage stochastic program to handle this case.

In this example, our first-stage decision (the here-and-now decision) is the number of each resource to buy, or allocate. The second-stage (recourse) decision is the amount of additional cloud-compute (or storage) to buy (or allocate) in case of insufficient availability. Our goal is to minimize the operational cost of the server (or storage, or both) farm.

Problem notation

Let \(x\) denote the number of resources to buy/allocate. We suppose thate there are three resources here. Each resource has an installation/allocation cost, \(c,\) and a space/power requirement \(s.\) The total space available is \(S\) and the total budget is \(B.\)

Our goal is to minimize the total operational cost of the server farm, which can be written as

\[\begin{align*} \textrm{minimize} & \,\, c^{\top}x + \mathrm{E}\left[Q(x;\xi)\right], \\ \textrm{subject to} & \,\, s^{\top}x \le S,\\ & \,\, c^{\top}x \le B,\\ & \,\, x \ge 0, \end{align*}\] where \(Q(x;\xi)\) is the optimal value of the second-stage problem \[\begin{align*} \min_{y\ge0} & \,\,r^{\top}y\\ \textrm{s.t.} & \,\,x + y \ge d(\xi).\\ \end{align*}\] where \(r\) is the resource cost, and the demands \(d(\xi)\) are supposed Gaussian, with known means and variances.

SAA method

To solve the problem above, we will first implement a technique called sample average approximation (SAA). This method involves generating a large number \(K\) of potential realizations (scenarios) of the random variable through Monte Carlo sampling and treating each realization as an individual sub-problem. This means that instead of just having one set of second-stage variables, we will have \(K\) variables and the expectation of our second-stage value is the average value over the realizations. With this, our formulation changes to,

\[\begin{align*} \textrm{minimize} & \,\, c^{\top}x +\frac{1}{K}\sum_{k=1}^K Q_k(x;\xi), \\ \textrm{subject to} & \,\, s^{\top}x \le S,\\ & \,\, c^{\top}x \le B,\\ & \,\, x \ge 0, \end{align*}\] where \(Q_k(x;\xi)\) is the optimal value of the second-stage problem \[\begin{align*} \min_{y\ge0} & \,\,r^{\top}y_k\\ \textrm{s.t.} & \,\,x + y_k \ge d(\xi_k).\\ \end{align*}\]

import numpy as np
import pyomo.environ as pyo

np.random.seed(1956)

# ── First Stage Constants ──────────────────────────────────────────────────────
c = np.array([6, 14, 27]) # Installation (allocation) costs
s = np.array([1,  5,  3]) # Space (power) needed
B = 1000 # Budget available
S = 100  # Space available

# ── Second Stage Constants ─────────────────────────────────────────────────────
K = 50  # Number of scenarios for SAA
d_xi = np.array([
    np.random.normal(70, 10, K),
    np.random.normal( 8,  2, K),
    np.random.normal(15,  4, K)
])
r = np.array([7, 18, 30])    # Costs of each resource
p = np.array([1.0 / K] * K)  # Uniform scenario probabilities

# ── Index Sets ─────────────────────────────────────────────────────────────────
RESOURCES  = [0, 1, 2]   # CPU, GPU, TPU
SCENARIOS  = list(range(K))

# ── Model ──────────────────────────────────────────────────────────────────────
m = pyo.ConcreteModel(name="2SP")

# ── Variables ──────────────────────────────────────────────────────────────────
m.x = pyo.Var(RESOURCES, domain=pyo.NonNegativeIntegers)          # First stage
m.y = pyo.Var(RESOURCES, SCENARIOS, domain=pyo.NonNegativeIntegers)  # Recourse

# ── Objective ──────────────────────────────────────────────────────────────────
m.obj = pyo.Objective(
    expr=(
        sum(c[i] * m.x[i] for i in RESOURCES)
      + sum(p[k] * sum(r[i] * m.y[i, k] for i in RESOURCES)
            for k in SCENARIOS
          )
    ),
    sense=pyo.minimize
)

# ── Constraints ────────────────────────────────────────────────────────────────
## Installation budget
m.budget = pyo.Constraint(
    expr=sum(c[i] * m.x[i] for i in RESOURCES) <= B
)

## Space (power) available
m.space = pyo.Constraint(
    expr=sum(s[i] * m.x[i] for i in RESOURCES) <= S
)

## Meet demand in every scenario  (x_i + y_{i,k} >= d_{i,k})
m.demand = pyo.Constraint(
    RESOURCES, SCENARIOS,
    rule=lambda m, i, k: m.x[i] + m.y[i, k] >= d_xi[i, k]
)

# ── Solve ──────────────────────────────────────────────────────────────────────
solver = pyo.SolverFactory("gurobi")        # swap for "glpk",  "cbc" or "gurobi" freely
results = solver.solve(m, tee=False)        # tee=True for verbose output

# ── Results ────────────────────────────────────────────────────────────────────
if results.solver.termination_condition == pyo.TerminationCondition.optimal:
    x_opt = np.array([pyo.value(m.x[r]) for r in RESOURCES])
    obj_val = pyo.value(m.obj)
    print(
        f"Our server farm will cost €{obj_val:,.2f}. "
        f"\nWe should buy/allocate {int(x_opt[0])} CPUs, "
        f"{int(x_opt[1])} GPUs, and "
        f"{int(x_opt[2])} TPUs."
    )
else:
    print("Solver did not find an optimal solution:",
          results.solver.termination_condition)

Our server farm will cost €1,008.22. 
We should buy/allocate 56 CPUs, 4 GPUs, and 8 TPUs.

Discrete probabilities - Scenario tree model

Now we suppose that we have three scenarios for the demand, each with an associated probability.

Scenario (\(s\))	Demand (\(d_s\))	Probability (\(p_s\))
Low	[60, 6, 12]	0.10
Medium	[70, 8, 15]	0.80
High	[75, 10, 18]	0.10

A simple modification of the SAA model is needed. We just change the sum over the \(K\) scenarios, to a sum over the \(S\) discrete events, and modify the resource demands \(d(\xi)\) to an array of \(S\) vectors.

import numpy as np
import pyomo.environ as pyo

# ── First Stage Constants ──────────────────────────────────────────────────────
c = np.array([6, 14, 27]) # Installation (allocation) costs
p = np.array([1,  5,  3]) # Space (power) needed
B = 1000 # Budget available
P = 100  # Space (power) available

# ── Second Stage Constants ─────────────────────────────────────────────────────
S = 3 # Number of scenarios for tree
d_xi = np.array([[60, 6, 12],  # low
                 [70, 8, 15],  # medium
                 [75, 10, 18]]).T # high
r = np.array([7, 18, 30])    # Costs of each resource
p = np.array([1.0 / S] * S)    # gives 60, 6, 12 and cost = 952€ (Uniform scenario probabilities)
p = np.array([0.3, 0.5, 0.2 ]) # gives 60, 6, 12 and cost = 937€
p = np.array([0.1, 0.8, 0.1 ]) # gives 70, 8, 15 and cost = 953€

# ── Index Sets ─────────────────────────────────────────────────────────────────
RESOURCES  = [0, 1, 2]   # CPU, GPU, TPU
SCENARIOS  = list(range(S))

# ── Model ──────────────────────────────────────────────────────────────────────
m = pyo.ConcreteModel(name="2SP-TREE")

# ── Variables ──────────────────────────────────────────────────────────────────
m.x = pyo.Var(RESOURCES, domain=pyo.NonNegativeIntegers)          # First stage
m.y = pyo.Var(RESOURCES, SCENARIOS, domain=pyo.NonNegativeIntegers)  # Recourse

# ── Objective ──────────────────────────────────────────────────────────────────
m.obj = pyo.Objective(
    expr=(
        sum(c[i] * m.x[i] for i in RESOURCES)
      + sum(p[s] * sum(r[i] * m.y[i, s] for i in RESOURCES)
            for s in SCENARIOS
          )
    ),
    sense=pyo.minimize
)

# ── Constraints ────────────────────────────────────────────────────────────────
## Installation budget
m.budget = pyo.Constraint(
    expr=sum(c[i] * m.x[i] for i in RESOURCES) <= B
)

## Space (power) available
m.space = pyo.Constraint(
    expr=sum(p[i] * m.x[i] for i in RESOURCES) <= P
)

## Meet demand in every scenario  (x_i + y_{i,s} >= d_{i,s})
m.demand = pyo.Constraint(
    RESOURCES, SCENARIOS,
    rule=lambda m, i, s: m.x[i] + m.y[i, s] >= d_xi[i, s]
)

# ── Solve ──────────────────────────────────────────────────────────────────────
solver = pyo.SolverFactory("gurobi")        # swap for "glpk",  "cbc" or "gurobi" freely
results = solver.solve(m, tee=False)        # tee=True for verbose output

# ── Results ────────────────────────────────────────────────────────────────────
if results.solver.termination_condition == pyo.TerminationCondition.optimal:
    x_opt = np.array([pyo.value(m.x[r]) for r in RESOURCES])
    obj_val = pyo.value(m.obj)
    print(
        f"Our server farm will cost €{obj_val:,.2f}. "
        f"\nWe should buy/allocate {int(x_opt[0])} CPUs, "
        f"{int(x_opt[1])} GPUs, and "
        f"{int(x_opt[2])} TPUs."
    )
else:
    print("Solver did not find an optimal solution:",
          results.solver.termination_condition)

Our server farm will cost €953.10. 
We should buy/allocate 70 CPUs, 8 GPUs, and 15 TPUs.