flowchart LR
A[/Input/] --> B["Model
Function"]
B --> C["pyomo
Model"]
C --> D[/Schedule /]
CDT Implementation
Before addressing the simple examples of resource scheduling and the use cases, we describe the overall implementation, as shown in Figure 3.1.
- Input description of jobs (
jsonfile), that is loaded into … - Model function, that generates …
pyomomodel, which produces …- Output, an optimal schedule (
csvorjsonfile).
Input File
This json input file contains a complete description of the CDT that we want to model for optimization. Thanks to its genericity, it can be readily adapted to a wide diversity of projects, and even to global job scheduling at national or European levels.
The input file contains all the information regarding the following characteristics of the CDT:
- \(\mathcal{P} = \{P_1, P_2, \ldots \}\) the list of projects;
- \(\mathcal{J}_i = \{J_1^i, J_2^i, \ldots \}\) the list of jobs per project1;
- \(\mathcal{O}_j = \{O_1^j, O_2^j, \ldots \}\) the list of operations per job2;
- \(\mathcal{M}_k = \{M_1^k, M_2^k, \ldots \}\) the list of modes per job (operation)3;
- Resources:
- \(\mathcal{R}_l^{\rho} = \{R_1^l, R_2^l, \ldots \}\) list of renewable resources per mode;
- \(\mathcal{R}_l^{\nu} = \{R_1^l, R_2^l, \ldots \}\) list of non-renewable resources per project;
- Job requirements per resource: \(p_{ij}\)
- Costs per resource: \(c_{ij,l}\)
- List of precedences4: \(\mathcal{E} = \{(i,j) \mid i \prec j, \, i,j \in \mathcal{J} \}.\)
1 These are of type data transfer, data storage, data processing, etc.
2 This is an optional sub-group–the operations can just be considered as additional jobs.
3 These are the facilities that can be exploited for each category of job: networks, data centers, HPC centers, etc.
4 Successors and/or predecessors, as defined by the workflow DAG.
Data Model Overview
The input consists of four entities:
| Entity | Description |
|---|---|
| Resources | Available resources with capacities (renewable per period, non-renewable) |
| Jobs | Units of work to be scheduled |
| Modes | Alternative execution options for each job |
| Precedences | Directed edges defining the execution order (DAG) |
1. Resources
| Field | Type | Required | Description |
|---|---|---|---|
resource_id |
string | yes | Unique identifier for the resource |
name |
string | no | Human-readable name |
capacity |
integer ≥ 0 | yes | Maximum units available per time period5 |
5 For renewable resources. If resource is non-renewable, this is the total available.
2. Jobs
| Field | Type | Required | Description |
|---|---|---|---|
job_id |
string | yes | Unique identifier for the job |
name |
string | no | Human-readable name |
release_time |
integer ≥ 0 | no | Earliest start time (default: 0) |
deadline |
integer ≥ 0 | no | Latest finish time (default: none) |
3. Modes
Each job must have at least one mode. A mode specifies how a job can be executed.
| Field | Type | Required | Description |
|---|---|---|---|
mode_id |
string | yes | Unique identifier for the mode |
job_id |
string | yes | Reference to parent job |
duration |
integer ≥ 0 | yes | Processing time in periods |
cost |
float ≥ 0 | yes | Cost of selecting this mode |
resource_requirements |
list | yes | Resources consumed during execution |
Resource Requirement (nested within Mode)
| Field | Type | Required | Description |
|---|---|---|---|
resource_id |
string | yes | Reference to a defined resource |
demand |
integer ≥ 0 | yes | Units required per period while job executes |
4. Precedences
Defines the DAG structure. Each entry represents a directed edge.
| Field | Type | Required | Description |
|---|---|---|---|
predecessor |
string | yes | job_id of the job that must finish first |
successor |
string | yes | job_id of the job that must start after |
lag |
integer ≥ 0 | no | Minimum time gap between finish and start (default: 0) |
This gives the following json format.
Show Input File Code
{
"problem_name": "string (optional)",
"horizon": "integer (optional, planning horizon)",
"resources": [
{
"resource_id": "string",
"name": "string (optional)",
"capacity": "integer"
}
],
"jobs": [
{
"job_id": "string",
"name": "string (optional)",
"release_time": "integer (optional, default 0)",
"deadline": "integer (optional, default null)"
}
],
"modes": [
{
"mode_id": "string",
"job_id": "string",
"duration": "integer",
"cost": "number",
"resource_requirements": [
{
"resource_id": "string",
"demand": "integer"
}
]
}
],
"precedences": [
{
"predecessor": "string (job_id)",
"successor": "string (job_id)",
"lag": "integer (optional, default 0)"
}
]
}Validation
We provide a JSON schema that the user can use for formal validation of their input files as follows.
import json
import jsonschema
from jsonschema import validate
import DraftValidator
# Load schema and data
with open("workflow-schema.json") as f:
schema = json.load(f)
with open("my-problem.json") as f:
data = json.load(f)
# Validate
try:
validate(instance=data, schema=schema)
print("Input is valid.")
except jsonschema.ValidationError as e:
print(f"Validation error: {e.message}")
print(f"Path: {list(e.absolute_path)}")Model Function
All the above are necessary and sufficient for the mathematical formulation presented in the section Mathematical Formulation - Deterministic Model. We define a pyomo function that takes a dictionary of tasks as input6, and returns a pyomo model.
6 See below for examples.
# Model function generator
def MRCPSP_model(TASKS):
model = pyo.ConcreteModel()
# TASKS is a two dimensional set of (j,m) constructed from the dictionary keys
model.TASKS = pyo.Set(initialize=TASKS.keys(), dimen=2)
# Set of JOBS is constructed from a python set
model.JOBS = pyo.Set(initialize=list(set([j for (j, m) in model.TASKS])))
# Set of MACHINES is constructed from a python set
model.MACHINES = pyo.Set(initialize=list(set([m for (j, m) in model.TASKS])))
# ORDER of tasks is constructed as a cross-product of tasks and filtering
model.TASKORDER = pyo.Set(
initialize = model.TASKS * model.TASKS,
dimen = 4,
filter = lambda model, j, m, k, n: (k, n) == TASKS[(j, m)]["prec"],
)
# Set of DISJUNCTIONS is cross-product of jobs, jobs, and machines
model.DISJUNCTIONS = pyo.Set(
initialize = model.JOBS * model.JOBS * model.MACHINES,
dimen = 3,
filter = lambda model, j, k, m: j < k
and (j, m) in model.TASKS
and (k, m) in model.TASKS,
)
# Load DURATION data into a model parameter for later access
@model.Param(model.TASKS)
def dur(model, j, m):
return TASKS[(j, m)]["dur"]
# Upper bound on MAKESPAN
ub = sum([model.dur[j, m] for (j, m) in model.TASKS])
# Create DECISION variables
model.makespan = pyo.Var(bounds=(0, ub))
model.start = pyo.Var(model.TASKS, bounds=(0, ub))
# OBJECTIVE function
@model.Objective(sense=pyo.minimize)
def minimize_makespan(model):
return model.makespan
# CONSTRAINTS:
@model.Constraint(model.TASKS)
def finish_tasks(model, j, m):
return model.start[j, m] + model.dur[j, m] <= model.makespan
@model.Constraint(model.TASKORDER)
def preceding(model, j, m, k, n):
return model.start[k, n] + model.dur[k, n] <= model.start[j, m]
@model.Disjunction(model.DISJUNCTIONS)
def no_overlap(model, j, k, m):
return [
model.start[j, m] + model.dur[j, m] <= model.start[k, m],
model.start[k, m] + model.dur[k, m] <= model.start[j, m],
]
# Apply Big-M to transform logic to MILP
pyo.TransformationFactory("gdp.bigm").apply_to(model)
return model
# Generate the model
MRCPSP_model(TASKS)Optimization
The optimization can be performed by calling one of the built-in solvers of pyomo, or by using one of the more efficient commercial7 solvers, from CPLEX or gurobi.
7 Free academic versions are available.
# model optimization
def MRCPSP_solve(model):
SOLVER.solve(model)
results = [
{
"Job": j,
"Machine": m,
"Start": model.start[j, m](),
"Duration": model.dur[j, m],
"Finish": model.start[(j, m)]() + model.dur[j, m],
}
for j, m in model.TASKS
]
return results
# select a suitable solver
solver = 'appsi_highs' #'gurobi' 'glpk','cbc','cplex'
SOLVER = pyo.SolverFactory(solver)
assert SOLVER.available(), f"Solver {solver} is not available."
# Solve the optimization problem
results = MRCPSP_solve(MRCPSP_model(TASKS))Output
Finally, we extract the schedule from the results, store them in a suitable dataframe, and output a CSV (or JSON) file with start and finish times of each job, as well as the optimal mode selected for each job. This can also be output in the form of a GANTT chart. Other outputs can include resource usage, energy consumption, machine utilization statistics and costs.
Time-Windows - EF, LF, etc.
To reduce the number of variables in the MILP optimization, we define subsets of the time variable.
The timespan \([0,T]\) is discretized into \(T\) time periods of length one. Period \(t\) refers to the time slot/interval \([t-1,t],\) where \(t=1, \ldots, T.\) The processing time (duration) of job \(j\) is denoted \(d_j,\) the set of immediate predecessor activities of activity \(j\) is denoted \(\mathcal{P}_j\) and the set of immediate successor activities of activity \(j\) is denoted \(\mathcal{S}_j.\)
The forward pass to compute earliest start (EST) and finishing (EFT) times is defined as follows:
EST[0], EFT[0] = 0 # initialization
for j in range(2, n):
EST[j] = max(EFT[i] for i in P[j])
EFT[j] = EST[j] + d[j]The backward pass to compute latest start (LST) and finishing (LFT) times is defined as follows:
LFT[0], LST[0] = T # initialization
for j in range(n-1, 1, -1):
LFT[j] = min(LST[i] for i in S[j])
LST[j] = LFT[j] - d[j]