CDT Implementation

Before turning to the simple resource-scheduling examples and the use cases, we describe the overall implementation, shown in Figure 3.1.

flowchart LR
    A[/Input/] --> B["Model Function"]
    B --> C["pyomo Model"]
    C --> D[/Schedule/]

Figure 3.1: The implementation in 4 stages.
  1. Input description of jobs (json file), that is loaded into …
  2. Model function, that generates …
  3. pyomo model, which produces …
  4. Output, an optimal schedule (csv or json file).

Input File

This json input file contains a complete description of the CDT that we want to model for optimization. Thanks to its generic structure, it can readily be adapted to a wide variety of projects, and even to global job scheduling at the national or European level.

The input file contains all the information regarding the following characteristics of the CDT:

  • \(\mathcal{P} = \{P_1, P_2, \ldots \}\) the list of projects;
  • \(\mathcal{J}_i = \{J_1^i, J_2^i, \ldots \}\) the list of jobs per project1;
  • \(\mathcal{O}_j = \{O_1^j, O_2^j, \ldots \}\) the list of operations per job2;
  • \(\mathcal{M}_k = \{M_1^k, M_2^k, \ldots \}\) the list of modes per job (operation)3;
  • Resources:
    • \(\mathcal{R}_l^{\rho} = \{R_1^l, R_2^l, \ldots \}\) list of renewable resources per mode;
    • \(\mathcal{R}_l^{\nu} = \{R_1^l, R_2^l, \ldots \}\) list of non-renewable resources per project;
  • Job requirements per resource: \(p_{ij}\)
  • Costs per resource: \(c_{ij,l}\)
  • List of precedences4: \(\mathcal{E} = \{(i,j) \mid i \prec j, \, i,j \in \mathcal{J} \}.\)

1 These are of type data transfer, data storage, data processing, etc.

2 This is an optional sub-grouping; the operations can simply be considered as additional jobs.

3 These are the facilities that can be exploited for each category of job: networks, data centers, HPC centers, etc.

4 Successors and/or predecessors, as defined by the workflow DAG.


Data Model Overview

The input consists of four entities:

| Entity | Description |
|---|---|
| Resources | Available resources with capacities (renewable per period, non-renewable) |
| Jobs | Units of work to be scheduled |
| Modes | Alternative execution options for each job |
| Precedences | Directed edges defining the execution order (DAG) |

1. Resources

| Field | Type | Required | Description |
|---|---|---|---|
| resource_id | string | yes | Unique identifier for the resource |
| name | string | no | Human-readable name |
| capacity | integer ≥ 0 | yes | Maximum units available per time period5 |

5 For renewable resources. If resource is non-renewable, this is the total available.

2. Jobs

| Field | Type | Required | Description |
|---|---|---|---|
| job_id | string | yes | Unique identifier for the job |
| name | string | no | Human-readable name |
| release_time | integer ≥ 0 | no | Earliest start time (default: 0) |
| deadline | integer ≥ 0 | no | Latest finish time (default: none) |

3. Modes

Each job must have at least one mode. A mode specifies how a job can be executed.

| Field | Type | Required | Description |
|---|---|---|---|
| mode_id | string | yes | Unique identifier for the mode |
| job_id | string | yes | Reference to parent job |
| duration | integer ≥ 0 | yes | Processing time in periods |
| cost | float ≥ 0 | yes | Cost of selecting this mode |
| resource_requirements | list | yes | Resources consumed during execution |

Resource Requirement (nested within Mode)

| Field | Type | Required | Description |
|---|---|---|---|
| resource_id | string | yes | Reference to a defined resource |
| demand | integer ≥ 0 | yes | Units required per period while the job executes |

4. Precedences

Defines the DAG structure. Each entry represents a directed edge.

| Field | Type | Required | Description |
|---|---|---|---|
| predecessor | string | yes | job_id of the job that must finish first |
| successor | string | yes | job_id of the job that must start after |
| lag | integer ≥ 0 | no | Minimum time gap between finish and start (default: 0) |

This gives the following json format.

{
  "problem_name": "string (optional)",
  "horizon": "integer (optional, planning horizon)",
  
  "resources": [
    {
      "resource_id": "string",
      "name": "string (optional)",
      "capacity": "integer"
    }
  ],
  
  "jobs": [
    {
      "job_id": "string",
      "name": "string (optional)",
      "release_time": "integer (optional, default 0)",
      "deadline": "integer (optional, default null)"
    }
  ],
  
  "modes": [
    {
      "mode_id": "string",
      "job_id": "string",
      "duration": "integer",
      "cost": "number",
      "resource_requirements": [
        {
          "resource_id": "string",
          "demand": "integer"
        }
      ]
    }
  ],
  
  "precedences": [
    {
      "predecessor": "string (job_id)",
      "successor": "string (job_id)",
      "lag": "integer (optional, default 0)"
    }
  ]
}
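As a sanity check on the format, a minimal hypothetical instance can be built and serialized directly in Python. All identifiers and values below (`J1`, `M1`, `R1`, the durations and costs) are purely illustrative, not taken from a real CDT.

```python
import json

# A minimal hypothetical instance: one renewable resource, two jobs,
# one mode each, and a single precedence edge J1 -> J2.
problem = {
    "problem_name": "toy-cdt",
    "horizon": 10,
    "resources": [
        {"resource_id": "R1", "name": "data center", "capacity": 2},
    ],
    "jobs": [
        {"job_id": "J1", "release_time": 0},
        {"job_id": "J2", "deadline": 10},
    ],
    "modes": [
        {"mode_id": "M1", "job_id": "J1", "duration": 3, "cost": 1.0,
         "resource_requirements": [{"resource_id": "R1", "demand": 1}]},
        {"mode_id": "M2", "job_id": "J2", "duration": 2, "cost": 2.5,
         "resource_requirements": [{"resource_id": "R1", "demand": 2}]},
    ],
    "precedences": [
        {"predecessor": "J1", "successor": "J2", "lag": 0},
    ],
}

# Serialize to the json input file format described above
text = json.dumps(problem, indent=2)
```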

Validation

We provide a JSON schema that the user can use for formal validation of their input files as follows.

import json
import jsonschema
from jsonschema import validate

# Load schema and data
with open("workflow-schema.json") as f:
    schema = json.load(f)

with open("my-problem.json") as f:
    data = json.load(f)

# Validate
try:
    validate(instance=data, schema=schema)
    print("Input is valid.")
except jsonschema.ValidationError as e:
    print(f"Validation error: {e.message}")
    print(f"Path: {list(e.absolute_path)}")

Model Function

All the above are necessary and sufficient for the mathematical formulation presented in the section Mathematical Formulation - Deterministic Model. We define a pyomo function that takes a dictionary of tasks as input6, and returns a pyomo model.

6 See below for examples.
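For concreteness, a hypothetical two-job, two-machine TASKS dictionary matching the structure assumed by the function below might look like this (job and machine names are illustrative):

```python
# Hypothetical TASKS dictionary: keys are (job, machine) pairs;
# "dur" is the processing time in periods, and "prec" names the
# (job, machine) task that must finish first (None if the task
# has no predecessor).
TASKS = {
    ("J1", "M1"): {"dur": 3, "prec": None},
    ("J1", "M2"): {"dur": 2, "prec": ("J1", "M1")},
    ("J2", "M2"): {"dur": 4, "prec": None},
    ("J2", "M1"): {"dur": 1, "prec": ("J2", "M2")},
}
```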

# Model function generator
import pyomo.environ as pyo

def MRCPSP_model(TASKS):
    model = pyo.ConcreteModel()
    # TASKS is a two dimensional set of (j,m) constructed from the dictionary keys
    model.TASKS = pyo.Set(initialize=TASKS.keys(), dimen=2)
    # Set of JOBS is constructed from a python set
    model.JOBS = pyo.Set(initialize=list(set([j for (j, m) in model.TASKS])))
    # Set of MACHINES is constructed from a python set
    model.MACHINES = pyo.Set(initialize=list(set([m for (j, m) in model.TASKS])))
    # ORDER of tasks is constructed as a cross-product of tasks and filtering
    model.TASKORDER = pyo.Set(
        initialize = model.TASKS * model.TASKS,
        dimen      = 4,
        filter     = lambda model, j, m, k, n: (k, n) == TASKS[(j, m)]["prec"],
    )
    # Set of DISJUNCTIONS is cross-product of jobs, jobs, and machines
    model.DISJUNCTIONS = pyo.Set(
        initialize = model.JOBS * model.JOBS * model.MACHINES,
        dimen      = 3,
        filter     = lambda model, j, k, m: j < k
                     and (j, m) in model.TASKS
                     and (k, m) in model.TASKS,
    )
    # Load DURATION data into a model parameter for later access
    @model.Param(model.TASKS)
    def dur(model, j, m):
        return TASKS[(j, m)]["dur"]
    # Upper bound on MAKESPAN
    ub = sum([model.dur[j, m] for (j, m) in model.TASKS])
    # Create DECISION variables
    model.makespan = pyo.Var(bounds=(0, ub))
    model.start = pyo.Var(model.TASKS, bounds=(0, ub))
    # OBJECTIVE function 
    @model.Objective(sense=pyo.minimize)
    def minimize_makespan(model):
        return model.makespan
    # CONSTRAINTS:
    @model.Constraint(model.TASKS)
    def finish_tasks(model, j, m):
        return model.start[j, m] + model.dur[j, m] <= model.makespan
    @model.Constraint(model.TASKORDER)
    def preceding(model, j, m, k, n):
        return model.start[k, n] + model.dur[k, n] <= model.start[j, m]
    @model.Disjunction(model.DISJUNCTIONS)
    def no_overlap(model, j, k, m):
        return [
            model.start[j, m] + model.dur[j, m] <= model.start[k, m],
            model.start[k, m] + model.dur[k, m] <= model.start[j, m],
        ]
    # Apply Big-M to transform logic to MILP
    pyo.TransformationFactory("gdp.bigm").apply_to(model)
    return model
# Generate the model
model = MRCPSP_model(TASKS)

Optimization

The optimization can be performed by calling one of the built-in solvers of pyomo, or by using one of the more efficient commercial7 solvers, such as CPLEX or Gurobi.

7 Free academic versions are available.

# model optimization
def MRCPSP_solve(model):
    SOLVER.solve(model)
    results = [
        {
            "Job": j,
            "Machine": m,
            "Start": model.start[j, m](),
            "Duration": model.dur[j, m],
            "Finish": model.start[j, m]() + model.dur[j, m],
        }
        for j, m in model.TASKS
    ]
    return results
# select a suitable solver
solver = 'appsi_highs'  # alternatives: 'gurobi', 'glpk', 'cbc', 'cplex'
SOLVER = pyo.SolverFactory(solver)
assert SOLVER.available(), f"Solver {solver} is not available."
# Solve the optimization problem
results = MRCPSP_solve(MRCPSP_model(TASKS))

Output

Finally, we extract the schedule from the results, store it in a suitable dataframe, and output a CSV (or JSON) file with the start and finish times of each job, as well as the optimal mode selected for each job. This can also be rendered as a Gantt chart. Other outputs can include resource usage, energy consumption, machine utilization statistics, and costs.
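A minimal sketch of this export step, assuming the results list format returned by MRCPSP_solve and using only the standard library (a pandas dataframe built via pd.DataFrame(results) would serve equally well):

```python
import csv
import io
import json

# Hypothetical solver output, in the format returned by MRCPSP_solve
results = [
    {"Job": "J2", "Machine": "M1", "Start": 3.0, "Duration": 2, "Finish": 5.0},
    {"Job": "J1", "Machine": "M1", "Start": 0.0, "Duration": 3, "Finish": 3.0},
]

# Sort the schedule by start time for readability
rows = sorted(results, key=lambda r: r["Start"])

# CSV rendering (written here to a string; a file handle works the same way)
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["Job", "Machine", "Start", "Duration", "Finish"])
writer.writeheader()
writer.writerows(rows)
csv_text = buf.getvalue()

# JSON rendering of the same schedule
json_text = json.dumps(rows, indent=2)
```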

Time-Windows - EST, EFT, LST, LFT

To reduce the number of variables in the MILP optimization, we define subsets of the time variable.

Note: Timespan, Ordering

The timespan \([0,T]\) is discretized into \(T\) time periods of length one. Period \(t\) refers to the time slot/interval \([t-1,t],\) where \(t=1, \ldots, T.\) The processing time (duration) of job \(j\) is denoted \(d_j,\) the set of immediate predecessor activities of activity \(j\) is denoted \(\mathcal{P}_j\) and the set of immediate successor activities of activity \(j\) is denoted \(\mathcal{S}_j.\)

The forward pass to compute the earliest start (EST) and earliest finish (EFT) times is defined as follows:

EST[0] = EFT[0] = 0  # initialization (job 0 is the dummy source)
for j in range(1, n):
  EST[j] = max(EFT[i] for i in P[j])
  EFT[j] = EST[j] + d[j]

The backward pass to compute the latest start (LST) and latest finish (LFT) times is defined as follows:

LFT[n-1] = LST[n-1] = T  # initialization (job n-1 is the dummy sink)
for j in range(n-2, -1, -1):
  LFT[j] = min(LST[i] for i in S[j])
  LST[j] = LFT[j] - d[j]
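Putting the two passes together, they can be run on a small hypothetical DAG (jobs indexed 0 to n-1, with job 0 a dummy source and job n-1 a dummy sink; the durations and edges below are illustrative):

```python
# Hypothetical 4-job DAG: 0 -> 1 -> 3 and 0 -> 2 -> 3
n = 4
d = [0, 3, 2, 0]                 # durations (dummy source/sink have d = 0)
P = {1: [0], 2: [0], 3: [1, 2]}  # immediate predecessors
S = {0: [1, 2], 1: [3], 2: [3]}  # immediate successors
T = 10                           # planning horizon

# Forward pass: earliest start / earliest finish times
EST, EFT = [0] * n, [0] * n
for j in range(1, n):
    EST[j] = max(EFT[i] for i in P[j])
    EFT[j] = EST[j] + d[j]

# Backward pass: latest start / latest finish times
LST, LFT = [T] * n, [T] * n
for j in range(n - 2, -1, -1):
    LFT[j] = min(LST[i] for i in S[j])
    LST[j] = LFT[j] - d[j]
```

With these values, each start variable of job j can be restricted to the window [EST[j], LST[j]], which shrinks the MILP.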