CDT Implementation

Before turning to the simple resource-scheduling examples and the use cases, we describe the overall implementation, shown in Figure 3.1.

flowchart LR
    A[/Input/] --> B["Model Function"]
    B --> C["pyomo Model"]
    C --> D[/Schedule/]

Figure 3.1: The implementation in 4 stages.
  1. Input description of jobs (json file), that is loaded into …
  2. Model function, that generates …
  3. pyomo model, which produces …
  4. Output, an optimal schedule (csv or json file).

Input File

This json input file contains a complete description of the CDT that we want to model for optimization. Thanks to its generic structure, it can readily be adapted to a wide variety of projects, and even to global job scheduling at the national or European level.

The input file contains all the information regarding the following characteristics of the CDT:

  • \(\mathcal{P} = \{P_1, P_2, \ldots \}\) the list of projects;
  • \(\mathcal{J}_i = \{J_1^i, J_2^i, \ldots \}\) the list of jobs per project1;
  • \(\mathcal{O}_j = \{O_1^j, O_2^j, \ldots \}\) the list of operations per job2;
  • \(\mathcal{M}_k = \{M_1^k, M_2^k, \ldots \}\) the list of modes per job (operation)3;
  • Resources:
    • \(\mathcal{R}_l^{\rho} = \{R_1^l, R_2^l, \ldots \}\) list of renewable resources per mode;
    • \(\mathcal{R}_l^{\nu} = \{R_1^l, R_2^l, \ldots \}\) list of non-renewable resources per project;
  • Job requirements per resource: \(p_{ij}\)
  • Costs per resource: \(c_{ij,l}\)
  • List of precedences4: \(\mathcal{E} = \{(i,j) \mid i \prec j, \, i,j \in \mathcal{J} \}.\)

1 These are of type data transfer, data storage, data processing, etc.

2 This is an optional sub-grouping; the operations can simply be considered as additional jobs.

3 These are the facilities that can be exploited for each category of job: networks, data centers, HPC centers, etc.

4 Successors and/or predecessors, as defined by the workflow DAG.


Data Model Overview

The input consists of four entities:

| Entity | Description |
|---|---|
| Resources | Available resources with capacities (renewable per period, non-renewable) |
| Jobs | Units of work to be scheduled |
| Modes | Alternative execution options for each job |
| Precedences | Directed edges defining the execution order (DAG) |

1. Resources

| Field | Type | Required | Description |
|---|---|---|---|
| resource_id | string | yes | Unique identifier for the resource |
| name | string | no | Human-readable name |
| capacity | integer ≥ 0 | yes | Maximum units available per time period5 |

5 For renewable resources. If resource is non-renewable, this is the total available.

2. Jobs

| Field | Type | Required | Description |
|---|---|---|---|
| job_id | string | yes | Unique identifier for the job |
| name | string | no | Human-readable name |
| release_time | integer ≥ 0 | no | Earliest start time (default: 0) |
| deadline | integer ≥ 0 | no | Latest finish time (default: none) |

3. Modes

Each job must have at least one mode. A mode specifies how a job can be executed.

| Field | Type | Required | Description |
|---|---|---|---|
| mode_id | string | yes | Unique identifier for the mode |
| job_id | string | yes | Reference to parent job |
| duration | integer ≥ 0 | yes | Processing time in periods |
| cost | float ≥ 0 | yes | Cost of selecting this mode |
| resource_requirements | list | yes | Resources consumed during execution |

Resource Requirement (nested within Mode)

| Field | Type | Required | Description |
|---|---|---|---|
| resource_id | string | yes | Reference to a defined resource |
| demand | integer ≥ 0 | yes | Units required per period while the job executes |

4. Precedences

Defines the DAG structure. Each entry represents a directed edge.

| Field | Type | Required | Description |
|---|---|---|---|
| predecessor | string | yes | job_id of the job that must finish first |
| successor | string | yes | job_id of the job that must start after |
| lag | integer ≥ 0 | no | Minimum time gap between finish and start (default: 0) |

This gives the following json format.

{
  "problem_name": "string (optional)",
  "horizon": "integer (optional, planning horizon)",
  
  "resources": [
    {
      "resource_id": "string",
      "name": "string (optional)",
      "capacity": "integer"
    }
  ],
  
  "jobs": [
    {
      "job_id": "string",
      "name": "string (optional)",
      "release_time": "integer (optional, default 0)",
      "deadline": "integer (optional, default null)"
    }
  ],
  
  "modes": [
    {
      "mode_id": "string",
      "job_id": "string",
      "duration": "integer",
      "cost": "number",
      "resource_requirements": [
        {
          "resource_id": "string",
          "demand": "integer"
        }
      ]
    }
  ],
  
  "precedences": [
    {
      "predecessor": "string (job_id)",
      "successor": "string (job_id)",
      "lag": "integer (optional, default 0)"
    }
  ]
}
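As a sanity check on the format, a minimal hypothetical instance can be built and serialized directly in Python. All identifiers and values below (`J1`, `M1`, `R1`, the durations and costs) are purely illustrative, not taken from a real CDT.

```python
import json

# A minimal hypothetical instance: one renewable resource, two jobs,
# one mode each, and a single precedence edge J1 -> J2.
problem = {
    "problem_name": "toy-cdt",
    "horizon": 10,
    "resources": [
        {"resource_id": "R1", "name": "data center", "capacity": 2},
    ],
    "jobs": [
        {"job_id": "J1", "release_time": 0},
        {"job_id": "J2", "deadline": 10},
    ],
    "modes": [
        {"mode_id": "M1", "job_id": "J1", "duration": 3, "cost": 1.0,
         "resource_requirements": [{"resource_id": "R1", "demand": 1}]},
        {"mode_id": "M2", "job_id": "J2", "duration": 2, "cost": 2.5,
         "resource_requirements": [{"resource_id": "R1", "demand": 2}]},
    ],
    "precedences": [
        {"predecessor": "J1", "successor": "J2", "lag": 0},
    ],
}

# Serialize to the json input file format described above
text = json.dumps(problem, indent=2)
```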

Validation

We provide a JSON schema that the user can use for formal validation of their input files as follows.

import json
import jsonschema
from jsonschema import validate

# Load schema and data
with open("workflow-schema.json") as f:
    schema = json.load(f)

with open("my-problem.json") as f:
    data = json.load(f)

# Validate
try:
    validate(instance=data, schema=schema)
    print("Input is valid.")
except jsonschema.ValidationError as e:
    print(f"Validation error: {e.message}")
    print(f"Path: {list(e.absolute_path)}")

Model Function

All the above are necessary and sufficient for the mathematical formulation presented in the section Mathematical Formulation - Deterministic Model. We define a pyomo function that takes a dictionary of tasks as input6, and returns a pyomo model.

6 See below for examples.
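For concreteness, a hypothetical two-job, two-machine TASKS dictionary matching the structure assumed by the function below might look like this (job and machine names are illustrative):

```python
# Hypothetical TASKS dictionary: keys are (job, machine) pairs;
# "dur" is the processing time in periods, and "prec" names the
# (job, machine) task that must finish first (None if the task
# has no predecessor).
TASKS = {
    ("J1", "M1"): {"dur": 3, "prec": None},
    ("J1", "M2"): {"dur": 2, "prec": ("J1", "M1")},
    ("J2", "M2"): {"dur": 4, "prec": None},
    ("J2", "M1"): {"dur": 1, "prec": ("J2", "M2")},
}
```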

# Model function generator
import pyomo.environ as pyo

def MRCPSP_model(TASKS):
    model = pyo.ConcreteModel()
    # TASKS is a two dimensional set of (j,m) constructed from the dictionary keys
    model.TASKS = pyo.Set(initialize=TASKS.keys(), dimen=2)
    # Set of JOBS is constructed from a python set
    model.JOBS = pyo.Set(initialize=list(set([j for (j, m) in model.TASKS])))
    # Set of MACHINES is constructed from a python set
    model.MACHINES = pyo.Set(initialize=list(set([m for (j, m) in model.TASKS])))
    # ORDER of tasks is constructed as a cross-product of tasks and filtering
    model.TASKORDER = pyo.Set(
        initialize = model.TASKS * model.TASKS,
        dimen      = 4,
        filter     = lambda model, j, m, k, n: (k, n) == TASKS[(j, m)]["prec"],
    )
    # Set of DISJUNCTIONS is cross-product of jobs, jobs, and machines
    model.DISJUNCTIONS = pyo.Set(
        initialize = model.JOBS * model.JOBS * model.MACHINES,
        dimen      = 3,
        filter     = lambda model, j, k, m: j < k
                     and (j, m) in model.TASKS
                     and (k, m) in model.TASKS,
    )
    # Load DURATION data into a model parameter for later access
    @model.Param(model.TASKS)
    def dur(model, j, m):
        return TASKS[(j, m)]["dur"]
    # Upper bound on MAKESPAN
    ub = sum([model.dur[j, m] for (j, m) in model.TASKS])
    # Create DECISION variables
    model.makespan = pyo.Var(bounds=(0, ub))
    model.start = pyo.Var(model.TASKS, bounds=(0, ub))
    # OBJECTIVE function 
    @model.Objective(sense=pyo.minimize)
    def minimize_makespan(model):
        return model.makespan
    # CONSTRAINTS:
    @model.Constraint(model.TASKS)
    def finish_tasks(model, j, m):
        return model.start[j, m] + model.dur[j, m] <= model.makespan
    @model.Constraint(model.TASKORDER)
    def preceding(model, j, m, k, n):
        return model.start[k, n] + model.dur[k, n] <= model.start[j, m]
    @model.Disjunction(model.DISJUNCTIONS)
    def no_overlap(model, j, k, m):
        return [
            model.start[j, m] + model.dur[j, m] <= model.start[k, m],
            model.start[k, m] + model.dur[k, m] <= model.start[j, m],
        ]
    # Apply Big-M to transform logic to MILP
    pyo.TransformationFactory("gdp.bigm").apply_to(model)
    return model
# Generate the model
model = MRCPSP_model(TASKS)

Optimization

The optimization can be performed by calling one of the built-in solvers of pyomo, or by using one of the more efficient commercial7 solvers, such as CPLEX or Gurobi.

7 Free academic versions are available.

# model optimization
def MRCPSP_solve(model):
    SOLVER.solve(model)
    results = [
        {
            "Job": j,
            "Machine": m,
            "Start": model.start[j, m](),
            "Duration": model.dur[j, m],
            "Finish": model.start[j, m]() + model.dur[j, m],
        }
        for j, m in model.TASKS
    ]
    return results
# select a suitable solver
solver = 'appsi_highs'  # alternatives: 'gurobi', 'glpk', 'cbc', 'cplex'
SOLVER = pyo.SolverFactory(solver)
assert SOLVER.available(), f"Solver {solver} is not available."
# Solve the optimization problem
results = MRCPSP_solve(MRCPSP_model(TASKS))

Output

Finally, we extract the schedule from the results, store it in a suitable dataframe, and output a CSV (or JSON) file with the start and finish times of each job, as well as the optimal mode selected for each job. This can also be rendered as a Gantt chart. Other outputs can include resource usage, energy consumption, machine utilization statistics, and costs.
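A minimal sketch of this export step, assuming the results list format returned by MRCPSP_solve and using only the standard library (a pandas dataframe built via pd.DataFrame(results) would serve equally well):

```python
import csv
import io
import json

# Hypothetical solver output, in the format returned by MRCPSP_solve
results = [
    {"Job": "J2", "Machine": "M1", "Start": 3.0, "Duration": 2, "Finish": 5.0},
    {"Job": "J1", "Machine": "M1", "Start": 0.0, "Duration": 3, "Finish": 3.0},
]

# Sort the schedule by start time for readability
rows = sorted(results, key=lambda r: r["Start"])

# CSV rendering (written here to a string; a file handle works the same way)
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["Job", "Machine", "Start", "Duration", "Finish"])
writer.writeheader()
writer.writerows(rows)
csv_text = buf.getvalue()

# JSON rendering of the same schedule
json_text = json.dumps(rows, indent=2)
```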

Time-Windows - EST, EFT, LST, LFT

To reduce the number of variables in the MILP optimization, we define subsets of the time variable.

Note: Timespan, Ordering

The timespan \([0,T]\) is discretized into \(T\) time periods of length one. Period \(t\) refers to the time slot/interval \([t-1,t],\) where \(t=1, \ldots, T.\) The processing time (duration) of job \(j\) is denoted \(d_j,\) the set of immediate predecessor activities of activity \(j\) is denoted \(\mathcal{P}_j\) and the set of immediate successor activities of activity \(j\) is denoted \(\mathcal{S}_j.\)

The forward pass to compute the earliest start (EST) and earliest finish (EFT) times is defined as follows:

EST[0] = EFT[0] = 0  # initialization (job 0 is the dummy source)
for j in range(1, n):
  EST[j] = max(EFT[i] for i in P[j])
  EFT[j] = EST[j] + d[j]

The backward pass to compute the latest start (LST) and latest finish (LFT) times is defined as follows:

LFT[n-1] = LST[n-1] = T  # initialization (job n-1 is the dummy sink)
for j in range(n-2, -1, -1):
  LFT[j] = min(LST[i] for i in S[j])
  LST[j] = LFT[j] - d[j]
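Putting the two passes together, they can be run on a small hypothetical DAG (jobs indexed 0 to n-1, with job 0 a dummy source and job n-1 a dummy sink; the durations and edges below are illustrative):

```python
# Hypothetical 4-job DAG: 0 -> 1 -> 3 and 0 -> 2 -> 3
n = 4
d = [0, 3, 2, 0]                 # durations (dummy source/sink have d = 0)
P = {1: [0], 2: [0], 3: [1, 2]}  # immediate predecessors
S = {0: [1, 2], 1: [3], 2: [3]}  # immediate successors
T = 10                           # planning horizon

# Forward pass: earliest start / earliest finish times
EST, EFT = [0] * n, [0] * n
for j in range(1, n):
    EST[j] = max(EFT[i] for i in P[j])
    EFT[j] = EST[j] + d[j]

# Backward pass: latest start / latest finish times
LST, LFT = [T] * n, [T] * n
for j in range(n - 2, -1, -1):
    LFT[j] = min(LST[i] for i in S[j])
    LST[j] = LFT[j] - d[j]
```

With these values, each start variable of job j can be restricted to the window [EST[j], LST[j]], which shrinks the MILP.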