
I'm working on an AI model to predict dependency links between tasks in industrial planning schedules, based on historical project data. I have two tables:

Task Table (15 sheets, one sheet = one planning)

| ID activity | Name of activity         | Equipment Type | Start Date       | End Date         |
|-------------|--------------------------|----------------|------------------|------------------|
| ZZ0001/001  | TRAVAUX A COORDONNER     | COLONNE        | 04/01/2011 08:00 | 04/01/2011 08:00 |
| ZZ0001/002  | POSE ECHAFAUDAGE EXTERNE | COLONNE        | 04/06/2012 08:00 | 10/08/2012 17:00 |
| ZZ0001/003  | DECALORIFUGEAGE PARTIEL  | COLONNE        | 10/09/2012 08:00 | 10/09/2012 17:00 |

Dependencies (15 sheets, one sheet = one planning)

| ID task    | ID successor | Link Type |
|------------|--------------|-----------|
| ZZ0001/002 | ZZ0001/003   | FS        |
| ZZ0001/002 | ZZ0001/006   | FS        |
| ZZ0001/003 | ZZ0001/006   | SS        |

Each sheet has 300 to 17k tasks. IDs are unique, and the dataset is imbalanced (some equipment types appear about 100 times more often than others).
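Roughly, the loading step looks like this (the file names are placeholders, and I assume both workbooks share the same sheet names):

```python
import pandas as pd

# sheet_name=None loads every sheet into a dict {sheet_name: DataFrame},
# i.e. one entry per planning.
task_sheets = pd.read_excel("tasks.xlsx", sheet_name=None)         # placeholder file name
dep_sheets = pd.read_excel("dependencies.xlsx", sheet_name=None)   # placeholder file name

for planning, tasks in task_sheets.items():
    deps = dep_sheets[planning]   # assumes the two workbooks use matching sheet names
    print(planning, len(tasks), "tasks,", len(deps), "links")
```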

Goal: given a new list of tasks (typically filtered by Equipment Type), I want the model to suggest likely dependencies between them (and eventually the Link Type), learned from historical patterns in the existing data.
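In other words, at inference time the model only needs to score ordered pairs of tasks within the same Equipment Type, something along these lines (column names as in the tables above):

```python
from itertools import permutations
import pandas as pd

def candidate_pairs(tasks: pd.DataFrame):
    """Yield ordered (predecessor ID, successor ID) candidates within each Equipment Type."""
    for _, group in tasks.groupby("Equipment Type"):
        ids = group["ID activity"].tolist()
        yield from permutations(ids, 2)   # all ordered pairs u -> v, u != v
```

The model then scores each candidate pair and I keep only the most likely links.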

What I’ve tried:

  • Decision Trees
  • Basic Neural Networks (MLP + BERT/GNN)

Schematic Code

Data preparation:

  • Load Excel
  • Encode names with BERT
  • Encode type with OneHotEncoder
  • Combine: [BERT | OneHot] → torch.tensor(feature vector) (see the encoding sketch after this list)
  • Build graph G: each node = task with feature, no edges at inference time
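A minimal sketch of the encoding step, assuming bert-base-multilingual-cased for the (French) activity names and scikit-learn's OneHotEncoder for the equipment type; the column names are those from the tables above:

```python
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.preprocessing import OneHotEncoder

tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
bert = AutoModel.from_pretrained("bert-base-multilingual-cased").eval()

@torch.no_grad()
def encode_names(names):
    # Mean-pool BERT token embeddings over real (non-padding) tokens.
    # For the large sheets (17k tasks) this should be done in mini-batches.
    batch = tok(list(names), padding=True, truncation=True, return_tensors="pt")
    out = bert(**batch).last_hidden_state           # [N, seq_len, 768]
    mask = batch["attention_mask"].unsqueeze(-1)
    return (out * mask).sum(1) / mask.sum(1)        # [N, 768]

def build_features(tasks):
    name_emb = encode_names(tasks["Name of activity"])                    # [N, 768]
    # sklearn >= 1.2; use sparse=False on older versions.
    # Fit once on the training plannings in practice; fit here only to keep the sketch short.
    ohe = OneHotEncoder(sparse_output=False, handle_unknown="ignore")
    type_emb = torch.tensor(
        ohe.fit_transform(tasks[["Equipment Type"]]), dtype=torch.float)  # [N, n_types]
    return torch.cat([name_emb, type_emb], dim=1)                         # node feature matrix x
```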

Training the SEAL model: for each planning in the training set:

  • Extract real edges (u → v)
  • Generate negative pairs (same type, no link)
  • Build subgraphs 2-hop around each pair
  • Apply DRNL labeling
  • Store PyG Data(x, edge_index, drnl, label)
  • Train GNN: class SEALGNN(nn.Module): GINConv on [feat | drnl] → GlobalPool → MLP → Sigmoid (see the model sketch after this list)
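For completeness, here is a compact sketch of the model, assuming PyTorch Geometric and that each Data object carries the DRNL labels as a node-level tensor called drnl (hidden sizes and max_z are placeholders):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch_geometric.nn import GINConv, global_add_pool

class SEALGNN(nn.Module):
    def __init__(self, feat_dim, max_z=100, hidden=64):
        super().__init__()
        self.max_z = max_z
        in_dim = feat_dim + max_z + 1   # [BERT|OneHot features | one-hot DRNL label]

        def mlp(i, o):
            return nn.Sequential(nn.Linear(i, hidden), nn.ReLU(), nn.Linear(hidden, o))

        self.conv1 = GINConv(mlp(in_dim, hidden))
        self.conv2 = GINConv(mlp(hidden, hidden))
        self.head = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, data):
        # data.x: node features, data.drnl: DRNL labels,
        # data.batch: which enclosing subgraph each node belongs to (set by the PyG DataLoader)
        z = F.one_hot(data.drnl.clamp(max=self.max_z), self.max_z + 1).float()
        h = torch.cat([data.x, z], dim=1)
        h = F.relu(self.conv1(h, data.edge_index))
        h = F.relu(self.conv2(h, data.edge_index))
        h = global_add_pool(h, data.batch)              # one vector per enclosing subgraph
        return torch.sigmoid(self.head(h)).squeeze(-1)  # probability that the link exists
```

Training uses binary cross-entropy against the stored label (1 = real edge, 0 = sampled negative pair).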

Problems encountered:

  • Random or irrelevant links
  • Models predicting dependencies between all tasks
  • Lack of logical flow learned from historical data

I'm pretty sure I am not pre-processing the data correctly, as I'm not sure how to treat the task names so that the model can recognize the "pattern".

My Question: Would it make sense to frame this as a graph problem and use Graph Neural Networks (GNNs)? Or is there a better ML or statistical approach for modeling and predicting dependencies between tasks in this kind of scenario?

I'm open to advice on model architecture or data pre-processing strategies that might improve performance. Note that I work on Google Colab Pro and have access to an A100 GPU as well as TPUs.
