diff --git a/submissions/Dia-Vats/level5/answers.md b/submissions/Dia-Vats/level5/answers.md index 93e4487ca..e9223eec5 100644 --- a/submissions/Dia-Vats/level5/answers.md +++ b/submissions/Dia-Vats/level5/answers.md @@ -1,58 +1,55 @@ -# Level 5 — Graph Thinking -### Dia Vats +# Level 5 - Graph Thinking +**Dia Vats** --- -# Q1. Model It +## Q1. Model It -See `schema.md` for the graph schema and `schema.png` for the rendered diagram. +See `schema.md` for the full diagram source and `schema.png` for the rendered image. -## Schema Summary +### Schema Summary -### Node Labels -- `Project` -- `WorkOrder` -- `Station` -- `Product` -- `Week` -- `Worker` -- `Certification` -- `CapacitySnapshot` +**Node Labels (8):** `Project`, `WorkOrder`, `Station`, `Product`, `Week`, `Worker`, `Certification`, `CapacitySnapshot` -### Relationship Types -- `HAS_WORKORDER` -- `AT_STATION` -- `PRODUCES` -- `SCHEDULED_IN` -- `FEEDS_INTO` -- `FOLLOWS` -- `ASSIGNED_TO` -- `CAN_COVER` -- `CERTIFIED_IN` -- `REQUIRES` -- `HAS_CAPACITY` +**Relationship Types (11):** -### Relationships Carrying Data -- `(WorkOrder)-[:SCHEDULED_IN]->(Week)` - Properties: `planned_hours`, `actual_hours`, `completed_units` +| Relationship | Direction | Properties | +|---|---|---| +| `HAS_WORKORDER` | Project → WorkOrder | — | +| `AT_STATION` | WorkOrder → Station | — | +| `PRODUCES` | WorkOrder → Product | — | +| `SCHEDULED_IN` | WorkOrder → Week | **`planned_hours`, `actual_hours`, `completed_units`** | +| `HAS_CAPACITY` | Week → CapacitySnapshot | — | +| `ASSIGNED_TO` | Worker → Station | — | +| `CAN_COVER` | Worker → Station | — | +| `CERTIFIED_IN` | Worker → Certification | — | +| `REQUIRES` | Station → Certification | — | +| `FEEDS_INTO` | Station → Station | — | +| `FOLLOWS` | WorkOrder → WorkOrder | — | + +**Relationships carrying data:** -- `(Project)-[:PRODUCES]->(Product)` - Properties: `quantity`, `unit_factor` +- `(WorkOrder)-[:SCHEDULED_IN {planned_hours, actual_hours, completed_units}]->(Week)` +- 
`(Project)-[:PRODUCES {quantity, unit_factor}]->(Product)` *(project-level aggregate)* -## Design Choice +### Design Reasoning -I used `WorkOrder` as the operational layer of the graph instead of treating every CSV row as just a relationship. A project can pass through multiple stations and weeks, so separating work execution from the project itself made the graph easier to reason about operationally. +The key design choice here is using `WorkOrder` as the operational unit instead of something like `ProductionEntry`. Each row in `factory_production.csv` is a discrete execution event — a specific project working a specific product through a specific station in a specific week. That's a work order, not just a data entry. Naming it that way makes the graph match how a factory actually operates. -I also added `FEEDS_INTO` relationships between stations to represent production flow. Without that, bottlenecks look isolated. With it, downstream impact becomes visible. +The two relationships I added that others probably won't have: `FEEDS_INTO` between stations (011 → 012 → 013 is the IQB flow sequence), and `FOLLOWS` between WorkOrders (w1 execution feeds into w2 execution for the same project-station pair). Without these, every bottleneck looks isolated. With them, you can trace downstream impact — if station 016 overruns in w2, you can see which downstream work orders are at risk. --- -# Q2. Why Not Just SQL? +## Q2. Why Not Just SQL? + +**Query:** *"Which workers are certified to cover Station 016 (Gjutning) when Per Gustafsson is on vacation, and which projects would be affected?"* + +> Note: The dataset lists `Per Hansen` (W07) as the primary operator at Station 016. I'm treating "Per Gustafsson" as referring to this worker. 
-## SQL Version +### SQL Version ```sql --- Workers who can cover Station 016 +-- Workers certified to cover 016 SELECT w.name, w.role, @@ -61,141 +58,116 @@ FROM workers w WHERE w.worker_id != 'W07' AND '016' = ANY(string_to_array(w.can_cover_stations, ',')); --- Projects affected by Station 016 +-- Projects that run through station 016 SELECT DISTINCT p.project_id, p.project_name -FROM production_entries pe -JOIN projects p -ON pe.project_id = p.project_id -WHERE pe.station_code = '016'; +FROM work_orders wo +JOIN projects p ON wo.project_id = p.project_id +WHERE wo.station_code = '016'; ``` -## Cypher Version +To get both in one result you need a CROSS JOIN - which already signals you're fighting the data model. -```cypher -MATCH (s:Station {station_code:'016'}) +### Cypher Version +```cypher +MATCH (s:Station {station_code: '016'}) MATCH (w:Worker)-[:CAN_COVER]->(s) WHERE w.worker_id <> 'W07' - MATCH (wo:WorkOrder)-[:AT_STATION]->(s) MATCH (p:Project)-[:HAS_WORKORDER]->(wo) - OPTIONAL MATCH (w)-[:CERTIFIED_IN]->(c:Certification) - RETURN - w.name AS backup_worker, - collect(DISTINCT c.name) AS certifications, - collect(DISTINCT p.project_name) AS affected_projects + w.name AS backup_worker, + collect(DISTINCT c.name) AS certifications, + collect(DISTINCT p.project_name) AS affected_projects ``` -## What the Graph Makes Clear +### What the Graph Makes Clear -The SQL version answers the question, but the graph version makes the operational dependency visible immediately. +The SQL version gives the right answer but doesn't show the risk. You get a list of names and a separate list of projects - connecting them is left to whoever is reading the output. -Station 016 connects directly to both worker coverage and active project flow. In this dataset, Victor Elm is effectively the only backup for that station, so multiple projects depend on one fallback path. That kind of risk becomes obvious when traversing the graph. 
+In the graph, the path `(Worker)-[:CAN_COVER]->(Station)<-[:AT_STATION]-(WorkOrder)<-[:HAS_WORKORDER]-(Project)` makes the dependency structural. When I ran this against the actual data, Victor Elm (W11, Foreman) is the only worker who can cover station 016 besides Per Hansen. That means four active projects - P03, P05, P07, P08 — all route their coverage risk through one person. SQL can tell you that too, but it won't show you that it's a single path until you draw it out. The graph just... shows it. --- -# Q3. Spot the Bottleneck +## Q3. Spot the Bottleneck -## Capacity Deficit Weeks - -From `factory_capacity.csv`: +### Capacity Deficits (from factory_capacity.csv) | Week | Capacity | Planned | Deficit | -|---|---|---|---| -| w1 | 480 | 612 | -132 | -| w2 | 520 | 645 | -125 | -| w4 | 500 | 550 | -50 | -| w6 | 440 | 520 | -80 | -| w7 | 520 | 600 | -80 | - -## Main Bottleneck Areas +|------|----------|---------|---------| +| w1 | 480 hrs | 612 hrs | **-132** | +| w2 | 520 hrs | 645 hrs | **-125** | +| w4 | 500 hrs | 550 hrs | -50 | +| w6 | 440 hrs | 520 hrs | -80 | +| w7 | 520 hrs | 600 hrs | -80 | -The largest overruns are concentrated around: +w1 and w2 are the worst - five of eight weeks are in deficit, which means this factory is running over capacity more often than not. 
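The deficit figures in the table can be re-derived from `factory_capacity.csv`. The following is a minimal Python sketch, assuming a deficit is simply `total_capacity - total_planned` (which matches the file's `deficit` column); `weekly_deficits` is an illustrative helper name and not part of the submission.

```python
import csv
import io

# Rows copied from factory_capacity.csv (only the columns needed here).
CAPACITY_CSV = """week,total_capacity,total_planned
w1,480,612
w2,520,645
w3,480,398
w4,500,550
w5,510,480
w6,440,520
w7,520,600
w8,500,470
"""


def weekly_deficits(csv_text):
    """Return {week: capacity - planned}; a negative value means over capacity."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        row["week"]: int(row["total_capacity"]) - int(row["total_planned"])
        for row in reader
    }


deficits = weekly_deficits(CAPACITY_CSV)
# Weeks that are over capacity, worst first.
deficit_weeks = sorted((w for w, d in deficits.items() if d < 0), key=deficits.get)
print(deficit_weeks)
```

Sorting by the deficit value puts the two worst weeks first, which is the ordering the Capacity Tracker panel would likely want for highlighting.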
-- Station `016` — Gjutning
-- Station `018` — SB B/F-hall
-- Station `014` — Svets o montage IQB
+### Stations Causing the Overload (>10% actual vs planned)
-Example overruns from the dataset:
+| Station | Project | Week | Planned | Actual | Variance |
+|---------|---------|------|---------|--------|---------|
+| 016 Gjutning | P03 Lagerhall Jönköping | w2 | 28h | 35h | **+25.0%** |
+| 016 Gjutning | P05 Sjukhus Linköping | w2 | 35h | 40h | **+14.3%** |
+| 016 Gjutning | P08 Bro E6 Halmstad | w3 | 22h | 25h | **+13.6%** |
+| 018 SB B/F-hall | P04 Parkering Helsingborg | w1 | 19h | 22h | **+15.8%** |
+| 018 SB B/F-hall | P06 Skola Uppsala | w2 | 16h | 18h | **+12.5%** |
+| 014 Svets o montage | P03 Lagerhall Jönköping | w1 | 42h | 48h | **+14.3%** |
-| Project | Station | Planned | Actual | Variance |
-|---|---|---|---|---|
-| P03 | 016 | 28 | 35 | +25% |
-| P05 | 016 | 35 | 40 | +14.3% |
-| P08 | 016 | 22 | 25 | +13.6% |
-| P04 | 018 | 19 | 22 | +15.8% |
+Station 016 keeps showing up. It's consistently over by 13–25% across different projects, and the worker coverage there is basically one person (Per Hansen) with Victor Elm as the only real fallback.
-Station 016 stands out because the overload is repeated across multiple projects while also depending heavily on a very small worker pool.
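The variance column and the 10% cut-off can be checked by hand. The sketch below is one way to do it in Python, assuming the rule is `actual_hours > planned_hours * 1.1` and variance is `(actual - planned) / planned * 100` rounded to one decimal. The rows are copied from the overrun table, plus one under-plan row from `factory_production.csv` for contrast; `overruns_by_station` is a hypothetical helper name.

```python
from collections import defaultdict

# (station_code, project_id, week, planned_hours, actual_hours)
ROWS = [
    ("016", "P03", "w2", 28.0, 35.0),
    ("016", "P05", "w2", 35.0, 40.0),
    ("016", "P08", "w3", 22.0, 25.0),
    ("018", "P04", "w1", 19.0, 22.0),
    ("018", "P06", "w2", 16.0, 18.0),
    ("014", "P03", "w1", 42.0, 48.0),
    ("017", "P03", "w3", 24.0, 22.0),  # under plan, should be filtered out
]

OVERRUN_FACTOR = 1.1  # actual must exceed planned by more than 10%


def variance_pct(planned, actual):
    """Percentage by which actual hours differ from planned hours."""
    return round((actual - planned) / planned * 100, 1)


def overruns_by_station(rows):
    """Group rows that exceed the overrun threshold by station code."""
    grouped = defaultdict(list)
    for station, project, week, planned, actual in rows:
        if actual > planned * OVERRUN_FACTOR:
            grouped[station].append((project, week, variance_pct(planned, actual)))
    return dict(grouped)


overruns = overruns_by_station(ROWS)
for station, hits in sorted(overruns.items()):
    print(station, hits)
```

Grouping by station is what makes the repeat offender visible: station 016 collects three entries while the others collect one or two.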
- -## Cypher Query +### Cypher Query — Overruns >10% Grouped by Station ```cypher MATCH (p:Project)-[:HAS_WORKORDER]->(wo:WorkOrder) MATCH (wo)-[:AT_STATION]->(s:Station) MATCH (wo)-[r:SCHEDULED_IN]->(w:Week) - WHERE r.actual_hours > r.planned_hours * 1.1 - RETURN s.station_code, s.station_name, - collect(DISTINCT p.project_name) AS affected_projects, - round( - avg( - (r.actual_hours - r.planned_hours) - / r.planned_hours * 100 - ),1 - ) AS avg_variance_pct, - sum(r.actual_hours - r.planned_hours) AS excess_hours - + collect(DISTINCT p.project_name) AS affected_projects, + round(avg((r.actual_hours - r.planned_hours) / r.planned_hours * 100), 1) AS avg_variance_pct, + sum(r.actual_hours - r.planned_hours) AS excess_hours ORDER BY avg_variance_pct DESC ``` -## Bottleneck Modelling +### Modelling the Bottleneck as a Graph Pattern -I would model bottlenecks as a property on the scheduling relationship rather than creating a separate node. +I'd keep the bottleneck signal on the `SCHEDULED_IN` relationship as a property (`is_bottleneck: true`) rather than creating a separate node. The reason is practical: bottlenecks in this data are tied to a specific execution event — a project at a station in a week. That's already what `SCHEDULED_IN` represents. Adding a `(:Bottleneck)` node would mean joining three things that are already joined. -Example: +The `FEEDS_INTO` relationships I have between stations add something useful here though: once you flag a bottleneck on `SCHEDULED_IN`, you can traverse `FEEDS_INTO` to see which downstream stations and work orders are at risk. That's the part a flat flag can't do on its own. 
```cypher -SET r.is_bottleneck = true +MATCH (wo:WorkOrder)-[r:SCHEDULED_IN]->(w:Week) +WHERE r.actual_hours > r.planned_hours * 1.1 +SET r.is_bottleneck = true, + r.variance_pct = round((r.actual_hours - r.planned_hours) / r.planned_hours * 100, 1) ``` -In this case the bottleneck is tied to a specific station-week execution event, so keeping it on the relationship feels more practical and easier to query during dashboard aggregation. - --- -# Q4. Vector + Graph Hybrid +## Q4. Vector + Graph Hybrid -## What I Would Embed +### What I Would Embed -I would embed project-level operational descriptions containing: -- project type -- product mix -- quantity scale -- station sequence -- variance history +I'd embed a constructed string per project that includes: project type, product mix with quantities, station sequence, and etapp/phase. Something like: -Example: - -```text -Hospital extension project using IQB + IQP products with high load on stations 011, 012, 014 and 016 ``` +"Warehouse project Lagerhall Jönköping — IQB 900m across stations 011 012 013 014 016 017 — ET1 BOP1/BOP2" +``` + +This captures what the project is, how complex the product mix is, and which part of the factory it touches. Embedding just product type misses scope; embedding station sequence captures operational complexity. -This captures both semantic similarity and operational complexity. +Worker skills are worth a separate embedding index too, especially for the coverage query problem — but for matching incoming project requests, the project description is the right unit. 
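The constructed string could be assembled from the production rows roughly as follows. This is a sketch under assumptions: the project type ("Warehouse") is not a CSV column, so it is passed in by hand, and `build_embedding_text` and the separator choice are illustrative rather than the author's actual implementation.

```python
def build_embedding_text(project_type, project_name, rows):
    """Build one descriptive string per project from its production rows.

    rows: dicts with product_type, quantity, unit, station_code, etapp, bop.
    """
    products = {}
    stations, etapps, bops = set(), set(), set()
    for row in rows:
        # One entry per product type; quantity is a project-level figure.
        label = "{} {} {}".format(row["product_type"], row["quantity"], row["unit"])
        products[row["product_type"]] = label
        stations.add(row["station_code"])
        etapps.add(row["etapp"])
        bops.add(row["bop"])

    product_part = ", ".join(sorted(products.values()))
    station_part = " ".join(sorted(stations))
    phase_part = "/".join(sorted(etapps)) + " " + "/".join(sorted(bops))
    return "{} project {}; {} across stations {}; {}".format(
        project_type, project_name, product_part, station_part, phase_part
    )


# A subset of the P03 rows from factory_production.csv.
P03_ROWS = [
    {"product_type": "IQB", "quantity": 900, "unit": "meter",
     "station_code": "011", "etapp": "ET1", "bop": "BOP1"},
    {"product_type": "IQB", "quantity": 900, "unit": "meter",
     "station_code": "016", "etapp": "ET1", "bop": "BOP2"},
    {"product_type": "SB", "quantity": 60, "unit": "styck",
     "station_code": "018", "etapp": "ET1", "bop": "BOP1"},
]

text = build_embedding_text("Warehouse", "Lagerhall Jönköping", P03_ROWS)
print(text)
```

Keeping the station codes in sorted order means two projects with the same footprint produce the same substring, which should help the embedding treat them as similar.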
-## Hybrid Query +### Hybrid Query ```cypher -CALL db.index.vector.queryNodes( - 'project_embeddings', - 10, - $query_embedding -) +CALL db.index.vector.queryNodes('project_embeddings', 10, $query_embedding) YIELD node AS similar_project, score MATCH (similar_project)-[:HAS_WORKORDER]->(wo:WorkOrder) @@ -205,158 +177,136 @@ MATCH (wo)-[r:SCHEDULED_IN]->(:Week) WITH similar_project, score, collect(DISTINCT s.station_code) AS stations_used, - avg( - abs( - (r.actual_hours - r.planned_hours) - / r.planned_hours - ) * 100 - ) AS variance_pct + avg(abs((r.actual_hours - r.planned_hours) / r.planned_hours) * 100) AS avg_variance_pct -WHERE variance_pct < 5 +WHERE avg_variance_pct < 5.0 RETURN similar_project.project_name, + similar_project.project_id, stations_used, - round(variance_pct,2) AS variance_pct, - round(score,3) AS similarity_score - + round(avg_variance_pct, 2) AS variance_pct, + round(score, 3) AS similarity_score ORDER BY similarity_score DESC LIMIT 5 ``` -## Why Hybrid Search Helps +### Why This Matters More Than Filtering -Filtering only by product type would return projects that may look similar on paper but behave very differently operationally. +If you filter by `product_type = 'IQB'`, you get every IQB project regardless of whether it was 200 meters or 1200 meters, whether it ran through 3 stations or 7, whether it overran by 25% or came in clean. The results aren't comparable. -The vector layer finds projects with similar scope and execution context. The graph layer filters for projects that actually moved through similar stations and stayed operationally stable. +The hybrid query finds projects that are actually similar in scope and execution footprint (vector), and then filters to only the ones that ran well (graph). For the hospital request in the question — "450 meters of IQB beams, tight timeline" — you'd get P04 or P06 back as references, not P05 (the 1200m hospital project that overran at 016). That's the useful answer. 
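The two-stage logic of the hybrid query (rank by similarity, then filter by execution quality) can be illustrated without a database. The sketch below uses plain Python with made-up two-dimensional vectors and made-up hours for three hypothetical projects A, B and C; none of these values come from the dataset, and the function names are illustrative.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def avg_abs_variance_pct(entries):
    """Mean absolute planned-vs-actual variance in percent.

    entries: list of (planned_hours, actual_hours) pairs.
    """
    return sum(abs(a - p) / p * 100 for p, a in entries) / len(entries)


def hybrid_search(query_vec, projects, top_k=10, max_variance_pct=5.0):
    """Vector step: take the top_k most similar projects.
    Graph-style step: keep only those whose average variance is under the limit."""
    scored = sorted(
        ((cosine(query_vec, p["embedding"]), p) for p in projects),
        key=lambda pair: pair[0],
        reverse=True,
    )[:top_k]
    return [
        (p["id"], round(score, 3))
        for score, p in scored
        if avg_abs_variance_pct(p["hours"]) < max_variance_pct
    ]


# Toy data only: hypothetical projects, embeddings, and hours.
PROJECTS = [
    {"id": "A", "embedding": [1.0, 0.0], "hours": [(100.0, 102.0)]},  # 2% variance
    {"id": "B", "embedding": [0.9, 0.1], "hours": [(100.0, 125.0)]},  # 25% variance
    {"id": "C", "embedding": [0.0, 1.0], "hours": [(100.0, 101.0)]},  # 1% variance
]

result = hybrid_search([1.0, 0.0], PROJECTS)
print(result)
```

Project B is nearly identical to the query in vector space but is dropped by the variance filter, which is the behaviour the hybrid query is meant to provide over a similarity-only search.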
-Vector finds similarity. Graph filters for execution quality. +The Boardy parallel makes sense here: instead of matching project descriptions to project history, you're matching what a person needs to what someone else offers. Vector finds the semantic overlap, graph confirms they're actually in compatible communities. --- -# Q5. My L6 Blueprint +## Q5. My L6 Blueprint -## Node Mapping +### Node Labels → CSV Columns -| Node | CSV Source | Columns | +| Node | CSV Source | Columns Mapped | |---|---|---| -| `Project` | production | project_id, project_number, project_name | -| `WorkOrder` | production | planned_hours, actual_hours, completed_units | -| `Station` | production | station_code, station_name | -| `Product` | production | product_type, unit, unit_factor | -| `Week` | both | week | -| `Worker` | workers | worker_id, name, role, type | -| `Certification` | workers | certifications | -| `CapacitySnapshot` | capacity | total_capacity, total_planned, deficit | - ---- +| `Project` | factory_production.csv | `project_id`, `project_number`, `project_name`, `etapp`, `bop` | +| `WorkOrder` | factory_production.csv | `planned_hours`, `actual_hours`, `completed_units` (one node per row) | +| `Station` | factory_production.csv | `station_code`, `station_name` | +| `Product` | factory_production.csv | `product_type`, `unit`, `unit_factor` | +| `Week` | both CSVs | `week` column (w1–w8) | +| `Worker` | factory_workers.csv | `worker_id`, `name`, `role`, `hours_per_week`, `type` | +| `Certification` | factory_workers.csv | `certifications` split by comma → one node per cert | +| `CapacitySnapshot` | factory_capacity.csv | `own_staff_count`, `hired_staff_count`, `own_hours`, `hired_hours`, `overtime_hours`, `total_capacity`, `total_planned`, `deficit` | -## Relationship Mapping +### Relationship Types → What Creates Them | Relationship | Created From | |---|---| -| `HAS_WORKORDER` | project_id | -| `AT_STATION` | station_code | -| `PRODUCES` | product_type | -| 
`SCHEDULED_IN` | week | -| `ASSIGNED_TO` | primary_station | -| `CAN_COVER` | can_cover_stations | -| `CERTIFIED_IN` | certifications | -| `HAS_CAPACITY` | week join | -| `FEEDS_INTO` | derived production flow | -| `FOLLOWS` | sequential work order progression | - ---- - -## Streamlit Dashboard Panels - -### 1. Project Overview - -Shows: -- total planned vs actual hours -- variance % -- products involved -- completed units - -Cypher: +| `(Project)-[:HAS_WORKORDER]->(WorkOrder)` | `project_id` column | +| `(WorkOrder)-[:AT_STATION]->(Station)` | `station_code` column | +| `(WorkOrder)-[:PRODUCES]->(Product)` | `product_type` column | +| `(WorkOrder)-[:SCHEDULED_IN {planned_hours, actual_hours, completed_units}]->(Week)` | `week` column + hour columns | +| `(Week)-[:HAS_CAPACITY]->(CapacitySnapshot)` | join on `week` across both CSVs | +| `(Worker)-[:ASSIGNED_TO]->(Station)` | `primary_station` column | +| `(Worker)-[:CAN_COVER]->(Station)` | `can_cover_stations` split by comma | +| `(Worker)-[:CERTIFIED_IN]->(Certification)` | `certifications` split by comma | +| `(Station)-[:REQUIRES]->(Certification)` | derived from worker certs per station | +| `(Station)-[:FEEDS_INTO]->(Station)` | derived from production flow order (011→012→013→014→015→016→017→018→019→021) | +| `(WorkOrder)-[:FOLLOWS]->(WorkOrder)` | same project-station pair, consecutive weeks | + +### Dashboard Panels + +#### Panel 1 — Project Overview +All 8 projects: planned vs actual hours, variance %, products involved, completed units. 
```cypher MATCH (p:Project)-[:HAS_WORKORDER]->(wo:WorkOrder) - +MATCH (wo)-[r:SCHEDULED_IN]->(:Week) RETURN + p.project_id, p.project_name, - sum(wo.planned_hours) AS planned, - sum(wo.actual_hours) AS actual, + sum(r.planned_hours) AS total_planned, + sum(r.actual_hours) AS total_actual, round( - ( - sum(wo.actual_hours) - sum(wo.planned_hours) - ) / sum(wo.planned_hours) * 100, + (sum(r.actual_hours) - sum(r.planned_hours)) / sum(r.planned_hours) * 100, 1 ) AS variance_pct +ORDER BY p.project_id ``` ---- +Display: horizontal bar chart planned vs actual, table below with variance % in red if > 10%. -### 2. Station Load Dashboard - -Shows: -- station load across weeks -- overload hotspots -- planned vs actual variance - -Cypher: +#### Panel 2 — Station Load by Week +Heat map showing which stations are overloaded in which weeks. Overload cells highlighted. ```cypher MATCH (wo:WorkOrder)-[:AT_STATION]->(s:Station) MATCH (wo)-[r:SCHEDULED_IN]->(w:Week) - RETURN s.station_code, + s.station_name, w.week_id, sum(r.planned_hours) AS planned, - sum(r.actual_hours) AS actual + sum(r.actual_hours) AS actual, + round( + (sum(r.actual_hours) - sum(r.planned_hours)) / sum(r.planned_hours) * 100, + 1 + ) AS variance_pct +ORDER BY s.station_code, w.week_id ``` ---- - -### 3. Capacity Tracker - -Shows: -- weekly capacity -- planned demand -- deficit weeks highlighted +Display: Plotly heatmap (stations × weeks), cells red where variance_pct > 10%. -Cypher: +#### Panel 3 — Capacity Tracker +Weekly capacity vs demand, deficit weeks flagged red. ```cypher MATCH (w:Week)-[:HAS_CAPACITY]->(c:CapacitySnapshot) - RETURN w.week_id, c.total_capacity, c.total_planned, - c.deficit + c.deficit, + c.overtime_hours +ORDER BY w.week_id ``` ---- - -### 4. Worker Coverage Matrix +Display: grouped bar chart capacity vs planned, deficit bars below zero shown red. 
-Shows: -- worker-to-station coverage -- stations with weak backup coverage -- single-point-of-failure stations - -Cypher: +#### Panel 4 — Worker Coverage Matrix +Which workers cover which stations, SPOF stations flagged. Shows downstream risk via `FEEDS_INTO`. ```cypher MATCH (s:Station) - OPTIONAL MATCH (w:Worker)-[:CAN_COVER]->(s) - +WITH s, + collect(DISTINCT w.name) AS coverers, + count(DISTINCT w) AS coverage_count RETURN s.station_code, - collect(w.name) AS covering_workers, - count(w) AS coverage_count + s.station_name, + coverers, + coverage_count, + CASE WHEN coverage_count <= 1 THEN true ELSE false END AS is_spof ORDER BY coverage_count ASC -``` \ No newline at end of file +``` + +Display: table with red badge on SPOF stations. Secondary query on click shows which projects depend on that station. \ No newline at end of file diff --git a/submissions/Dia-Vats/level5/schema.md b/submissions/Dia-Vats/level5/schema.md index 23b6b3836..dd6508c18 100644 --- a/submissions/Dia-Vats/level5/schema.md +++ b/submissions/Dia-Vats/level5/schema.md @@ -1,4 +1,7 @@ # Level 5 — Graph Schema +**Dia Vats** + +--- ## Schema Diagram @@ -9,24 +12,70 @@ ## Mermaid Source ```mermaid -flowchart LR - - Project -->|HAS_WORKORDER| WorkOrder - WorkOrder -->|AT_STATION| Station - WorkOrder -->|PRODUCES| Product - WorkOrder -->|SCHEDULED_IN| Week - - Week -->|HAS_CAPACITY| CapacitySnapshot - - Worker -->|ASSIGNED_TO| Station - Worker -->|CAN_COVER| Station - Worker -->|CERTIFIED_IN| Certification - - Station -->|REQUIRES| Certification - - Station -->|FEEDS_INTO| Station - - WorkOrder -->|FOLLOWS| WorkOrder +flowchart TD + Project["**Project** + --- + project_id + project_number + project_name + etapp + bop"] + + WorkOrder["**WorkOrder** + --- + planned_hours + actual_hours + completed_units"] + + Station["**Station** + --- + station_code + station_name"] + + Product["**Product** + --- + product_type + unit + unit_factor"] + + Week["**Week** + --- + week_id"] + + Worker["**Worker** + 
--- + worker_id + name + role + hours_per_week + type"] + + Certification["**Certification** + --- + name"] + + CapacitySnapshot["**CapacitySnapshot** + --- + own_staff_count + hired_staff_count + own_hours + hired_hours + overtime_hours + total_capacity + total_planned + deficit"] + + Project -->|"HAS_WORKORDER"| WorkOrder + WorkOrder -->|"AT_STATION"| Station + WorkOrder -->|"PRODUCES"| Product + WorkOrder -->|"SCHEDULED_IN\n{planned_hours, actual_hours, completed_units}"| Week + Week -->|"HAS_CAPACITY"| CapacitySnapshot + Worker -->|"ASSIGNED_TO"| Station + Worker -->|"CAN_COVER"| Station + Worker -->|"CERTIFIED_IN"| Certification + Station -->|"REQUIRES"| Certification + Station -->|"FEEDS_INTO"| Station + WorkOrder -->|"FOLLOWS"| WorkOrder ``` --- @@ -35,34 +84,46 @@ flowchart LR | Relationship | Properties | |---|---| -| `SCHEDULED_IN` | planned_hours, actual_hours, completed_units | -| `PRODUCES` | quantity, unit_factor | +| `(WorkOrder)-[:SCHEDULED_IN]->(Week)` | `planned_hours`, `actual_hours`, `completed_units` | +| `(Project)-[:PRODUCES]->(Product)` | `quantity`, `unit_factor` *(project-level aggregate)* | --- -## Node Labels +## Node Summary (8 labels) + +| # | Label | Key Properties | CSV Source | +|---|-------|---------------|-----------| +| 1 | `Project` | project_id, project_number, project_name, etapp, bop | factory_production.csv | +| 2 | `WorkOrder` | planned_hours, actual_hours, completed_units | factory_production.csv (one node per row) | +| 3 | `Station` | station_code, station_name | factory_production.csv | +| 4 | `Product` | product_type, unit, unit_factor | factory_production.csv | +| 5 | `Week` | week_id | both CSVs | +| 6 | `Worker` | worker_id, name, role, hours_per_week, type | factory_workers.csv | +| 7 | `Certification` | name | factory_workers.csv (split by comma) | +| 8 | `CapacitySnapshot` | own_staff_count, hired_staff_count, own_hours, hired_hours, overtime_hours, total_capacity, total_planned, deficit | factory_capacity.csv | + 
+--- -- `Project` -- `WorkOrder` -- `Station` -- `Product` -- `Week` -- `Worker` -- `Certification` -- `CapacitySnapshot` +## Relationship Summary (11 types) + +| # | Relationship | Direction | Properties | +|---|-------------|-----------|-----------| +| 1 | `HAS_WORKORDER` | Project → WorkOrder | — | +| 2 | `AT_STATION` | WorkOrder → Station | — | +| 3 | `PRODUCES` | WorkOrder → Product | — | +| 4 | `SCHEDULED_IN` | WorkOrder → Week | `planned_hours`, `actual_hours`, `completed_units` | +| 5 | `HAS_CAPACITY` | Week → CapacitySnapshot | — | +| 6 | `ASSIGNED_TO` | Worker → Station | — | +| 7 | `CAN_COVER` | Worker → Station | — | +| 8 | `CERTIFIED_IN` | Worker → Certification | — | +| 9 | `REQUIRES` | Station → Certification | — | +| 10 | `FEEDS_INTO` | Station → Station | — | +| 11 | `FOLLOWS` | WorkOrder → WorkOrder | — | --- -## Relationship Types - -- `HAS_WORKORDER` -- `AT_STATION` -- `PRODUCES` -- `SCHEDULED_IN` -- `FEEDS_INTO` -- `FOLLOWS` -- `ASSIGNED_TO` -- `CAN_COVER` -- `CERTIFIED_IN` -- `REQUIRES` -- `HAS_CAPACITY` \ No newline at end of file +## Design Notes + +`FEEDS_INTO` between stations captures the physical production flow (e.g. 011 FS IQB → 012 Förmontering → 013 Montering). This isn't in any CSV directly — it's derived from the station sequence implicit in the data. It lets you query downstream impact from a bottleneck station. + +`FOLLOWS` between WorkOrders links the same project-station pair across consecutive weeks, making temporal progression traversable without aggregating by week in every query. 
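How the two derived relationships might be generated is sketched below in Python. It assumes the station flow order listed in the Level 5 answers (011 through 021) and treats "consecutive weeks" as week numbers that differ by exactly one. The node IDs are simplified for illustration, and `feeds_into_edges`, `downstream` and `follows_edges` are hypothetical helper names, not functions from the submission.

```python
# Station flow order as stated in the Level 5 answers.
STATION_FLOW = ["011", "012", "013", "014", "015", "016", "017", "018", "019", "021"]


def feeds_into_edges(flow):
    """Pair each station with the next one in the flow."""
    return list(zip(flow, flow[1:]))


def downstream(station, edges):
    """Follow FEEDS_INTO edges from a station to the end of the chain."""
    next_station = dict(edges)
    reached = []
    while station in next_station:
        station = next_station[station]
        reached.append(station)
    return reached


def follows_edges(work_orders):
    """Link work orders for the same project-station pair in consecutive weeks.

    work_orders: (project_id, station_code, week) tuples, week like "w1".
    """
    weeks_by_pair = {}
    for project, station, week in work_orders:
        weeks_by_pair.setdefault((project, station), []).append(int(week.lstrip("w")))

    edges = []
    for (project, station), weeks in weeks_by_pair.items():
        weeks.sort()
        for earlier, later in zip(weeks, weeks[1:]):
            if later == earlier + 1:
                edges.append((
                    "{}_{}_w{}".format(project, station, earlier),
                    "{}_{}_w{}".format(project, station, later),
                ))
    return edges


# A few P01 rows from factory_production.csv.
SAMPLE = [
    ("P01", "011", "w1"), ("P01", "011", "w2"),
    ("P01", "012", "w1"), ("P01", "012", "w2"),
    ("P01", "013", "w1"),
]

at_risk = downstream("016", feeds_into_edges(STATION_FLOW))
follows = follows_edges(SAMPLE)
print(at_risk)
print(follows)
```

With the chain in place, a bottleneck at station 016 immediately yields the list of stations that sit after it, which is the downstream-impact question the design notes describe.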
\ No newline at end of file diff --git a/submissions/Dia-Vats/level5/schema.png b/submissions/Dia-Vats/level5/schema.png index fe7dc4910..83832bd03 100644 Binary files a/submissions/Dia-Vats/level5/schema.png and b/submissions/Dia-Vats/level5/schema.png differ diff --git a/submissions/Dia-Vats/level6/.env.example b/submissions/Dia-Vats/level6/.env.example new file mode 100644 index 000000000..a4b5a6c92 --- /dev/null +++ b/submissions/Dia-Vats/level6/.env.example @@ -0,0 +1,3 @@ +NEO4J_URI=bolt://localhost:7687 +NEO4J_USER=neo4j +NEO4J_PASSWORD=your_password_here \ No newline at end of file diff --git a/submissions/Dia-Vats/level6/.streamlit/secrets.toml.example b/submissions/Dia-Vats/level6/.streamlit/secrets.toml.example new file mode 100644 index 000000000..35356b531 --- /dev/null +++ b/submissions/Dia-Vats/level6/.streamlit/secrets.toml.example @@ -0,0 +1,4 @@ +[NEO4J] +NEO4J_URI = "bolt://localhost:7687" +NEO4J_USER = "neo4j" +NEO4J_PASSWORD = "your_password_here" \ No newline at end of file diff --git a/submissions/Dia-Vats/level6/DASHBOARD_URL.txt b/submissions/Dia-Vats/level6/DASHBOARD_URL.txt new file mode 100644 index 000000000..76db7f4be --- /dev/null +++ b/submissions/Dia-Vats/level6/DASHBOARD_URL.txt @@ -0,0 +1 @@ +https://diavats-l6.streamlit.app \ No newline at end of file diff --git a/submissions/Dia-Vats/level6/README.md b/submissions/Dia-Vats/level6/README.md new file mode 100644 index 000000000..032da445c --- /dev/null +++ b/submissions/Dia-Vats/level6/README.md @@ -0,0 +1,108 @@ +# Steel Factory Production Dashboard +**Dia Vats | Level 6 - LifeAtlas LPI Developer Kit** +**Live: https://diavats-l6.streamlit.app** + +--- + +## What I Built + +The raw data was 3 CSVs describing a Swedish steel factory - 8 projects, 9 stations, 14 workers, 8 weeks. I turned it into a Neo4j knowledge graph and built a 7-page Streamlit dashboard on top of it. + +The point wasn't to build a chart on top of a spreadsheet. 
The graph models actual operational dependencies - which work orders flow through which stations, which workers can cover which stations if someone's out, and where downstream risk accumulates when a station overruns. The dashboard surfaces that reasoning directly. + +I also completed 2 bonus pages (Bonus B — Factory Floor, Bonus C — Forecast). + +**Graph stats: 148 nodes, 446 relationships, 9 labels, 12 relationship types.** + +--- + +## Graph Schema + +**Nodes:** Project, WorkOrder, Station, Product, Week, Worker, Certification, CapacitySnapshot, Etapp + +**Relationships:** HAS_WORKORDER, AT_STATION, PRODUCES, SCHEDULED_IN, HAS_CAPACITY, ASSIGNED_TO, CAN_COVER, CERTIFIED_IN, REQUIRES, FEEDS_INTO, FOLLOWS, IN_ETAPP + +A few things worth noting: + +- WorkOrder is the core unit - one node per row in the production CSV. It sits between Project and Station and carries planned hours, actual hours, variance %, and bottleneck flag. +- FEEDS_INTO chains stations in physical flow order (011 → 012 → ... → 021). This lets you trace downstream impact from any overrunning station. +- FOLLOWS links the same project-station pair across consecutive weeks so you can traverse time without aggregating in every query. +- Bottleneck rule: `actual_hours > planned_hours × 1.1` +- WorkOrder ID format: `P01_011_w1_IQB` + +--- + +## Dashboard Pages + +**Page 1 - Project Overview** +KPI cards, grouped bar chart (planned vs actual per project), variance table with red highlights where variance exceeds 10%. + +**Page 2 - Station Load** +Plotly heatmap, stations vs weeks, coloured green/yellow/red by variance. Station 016 shows up clearly as the recurring problem. + +**Page 3 - Capacity Tracker** +Weekly capacity vs planned demand. Deficit weeks in red. Calls out that 5 of 8 weeks run over capacity, worst at -132 hours in week 1. + +**Page 4 - Worker Coverage** +Shows who covers which station and flags single points of failure. 
Station 016 is marked CRITICAL — Victor Elm is the only backup for Per Hansen, and 4 projects run through that station. + +**Page 5 - Factory Floor (Bonus B)** +Scatter-based floor plan with stations on a physical grid, coloured by load severity. Hover shows active projects and overload %. + +**Page 6 - Forecast (Bonus C)** +Linear extrapolation from weeks 1–8 per station, projecting week 9. Shows which stations are trending toward overload. + +**Page 7 - Self-Test** +6 automated Neo4j checks, scored out of 20. Runs on every page load. + +--- + +## Project Structure + +``` +submissions/Dia-Vats/level6/ +├── app.py +├── db.py +├── seed_graph.py +├── requirements.txt +├── .env.example +├── DASHBOARD_URL.txt +├── README.md +├── data/ +│ ├── factory_production.csv +│ ├── factory_workers.csv +│ └── factory_capacity.csv +├── pages_impl/ +│ ├── page1_overview.py +│ ├── page2_station.py +│ ├── page3_capacity.py +│ ├── page4_workers.py +│ ├── page5_floor.py +│ ├── page6_forecast.py +│ └── page7_selftest.py +└── .streamlit/ + └── secrets.toml.example +``` + +--- + +## Running It + +```bash +pip install -r requirements.txt +python seed_graph.py +streamlit run app.py +``` + +For Streamlit Cloud, add credentials under Settings > Secrets: +```toml +NEO4J_URI = "neo4j+s://your-instance.databases.neo4j.io" +NEO4J_USER = "your-username" +NEO4J_PASSWORD = "your-password" +``` + +`seed_graph.py` uses MERGE everywhere — safe to re-run without duplicating data. + +--- + +*Made by Dia Vats* \ No newline at end of file diff --git a/submissions/Dia-Vats/level6/app.py b/submissions/Dia-Vats/level6/app.py new file mode 100644 index 000000000..be703c038 --- /dev/null +++ b/submissions/Dia-Vats/level6/app.py @@ -0,0 +1,105 @@ +""" +app.py — Swedish Steel Factory Dashboard +Author: Dia Vats +7-page Streamlit + Neo4j dashboard. 
+""" +import streamlit as st + +st.set_page_config( + page_title="Steel Factory Dashboard", + page_icon="[SF]", + layout="wide", + initial_sidebar_state="expanded", +) + +# ── Global styles ───────────────────────────────────────────────────────────── +st.markdown(""" + +""", unsafe_allow_html=True) + +PAGES = [ + "Project Overview", + "Station Load", + "Capacity Tracker", + "Worker Coverage", + "Factory Floor", + "Forecast", + "Self-Test", +] + +# ── Sidebar navigation ──────────────────────────────────────────────────────── +with st.sidebar: + st.image("https://img.icons8.com/fluency/96/factory.png", width=56) + st.markdown("## Steel Factory") + st.markdown("*Neo4j Production Dashboard*") + st.markdown("---") + page = st.radio("Navigate", PAGES, label_visibility="collapsed") + st.markdown("---") + st.caption("Made by **Dia Vats**") + +# ── Footer helper ───────────────────────────────────────────────────────────── +def footer(): + st.caption("Made by Dia Vats") + +# ── Route to page ───────────────────────────────────────────────────────────── +if page == PAGES[0]: + from pages_impl.page1_overview import render; render(); footer() +elif page == PAGES[1]: + from pages_impl.page2_station import render; render(); footer() +elif page == PAGES[2]: + from pages_impl.page3_capacity import render; render(); footer() +elif page == PAGES[3]: + from pages_impl.page4_workers import render; render(); footer() +elif page == PAGES[4]: + from pages_impl.page5_floor import render; render(); footer() +elif page == PAGES[5]: + from pages_impl.page6_forecast import render; render(); footer() +elif page == PAGES[6]: + from pages_impl.page7_selftest import render; render(); footer() diff --git a/submissions/Dia-Vats/level6/data/factory_capacity.csv b/submissions/Dia-Vats/level6/data/factory_capacity.csv new file mode 100644 index 000000000..795ff52f0 --- /dev/null +++ b/submissions/Dia-Vats/level6/data/factory_capacity.csv @@ -0,0 +1,9 @@ 
+week,own_staff_count,hired_staff_count,own_hours,hired_hours,overtime_hours,total_capacity,total_planned,deficit +w1,10,2,400,80,0,480,612,-132 +w2,10,2,400,80,40,520,645,-125 +w3,10,2,400,80,0,480,398,82 +w4,10,2,400,80,20,500,550,-50 +w5,10,2,400,80,30,510,480,30 +w6,9,2,360,80,0,440,520,-80 +w7,10,2,400,80,40,520,600,-80 +w8,10,2,400,80,20,500,470,30 \ No newline at end of file diff --git a/submissions/Dia-Vats/level6/data/factory_production.csv b/submissions/Dia-Vats/level6/data/factory_production.csv new file mode 100644 index 000000000..ca6ce43e1 --- /dev/null +++ b/submissions/Dia-Vats/level6/data/factory_production.csv @@ -0,0 +1,69 @@ +project_id,project_number,project_name,product_type,unit,quantity,unit_factor,station_code,station_name,etapp,bop,week,planned_hours,actual_hours,completed_units +P01,4501,Stålverket Borås,IQB,meter,600,1.77,011,FS IQB,ET1,BOP1,w1,48.0,45.2,28 +P01,4501,Stålverket Borås,IQB,meter,600,1.77,012,Förmontering IQB,ET1,BOP1,w1,32.0,35.5,25 +P01,4501,Stålverket Borås,IQB,meter,600,1.77,013,Montering IQB,ET1,BOP1,w1,28.0,26.0,22 +P01,4501,Stålverket Borås,IQB,meter,600,1.77,014,Svets o montage IQB,ET1,BOP1,w1,35.0,38.2,20 +P01,4501,Stålverket Borås,SB,styck,40,4.0,018,SB B/F-hall,ET1,BOP1,w1,16.0,14.5,4 +P01,4501,Stålverket Borås,SP,styck,180,2.0,019,SP B/F-hall,ET1,BOP1,w1,12.0,13.0,7 +P01,4501,Stålverket Borås,IQB,meter,600,1.77,011,FS IQB,ET1,BOP1,w2,48.0,50.0,32 +P01,4501,Stålverket Borås,IQB,meter,600,1.77,012,Förmontering IQB,ET1,BOP1,w2,32.0,30.0,28 +P01,4501,Stålverket Borås,IQP,styck,90,2.80,015,Montering IQP,ET1,BOP2,w2,25.0,28.0,9 +P01,4501,Stålverket Borås,SR,styck,8,45.0,021,SR B/F-hall,ET1,BOP2,w2,40.0,42.0,1 +P02,4502,Kontorshus Mölndal,IQB,meter,350,1.50,011,FS IQB,ET1,BOP1,w1,30.0,28.0,20 +P02,4502,Kontorshus Mölndal,IQB,meter,350,1.50,012,Förmontering IQB,ET1,BOP1,w1,22.0,24.5,18 +P02,4502,Kontorshus Mölndal,IQB,meter,350,1.50,013,Montering IQB,ET1,BOP1,w1,18.0,17.0,16 +P02,4502,Kontorshus 
Mölndal,IQP,styck,70,2.70,015,Montering IQP,ET1,BOP1,w1,19.0,21.0,7 +P02,4502,Kontorshus Mölndal,SD,styck,30,3.00,018,SB B/F-hall,ET1,BOP1,w1,9.0,8.5,3 +P02,4502,Kontorshus Mölndal,IQB,meter,350,1.50,011,FS IQB,ET1,BOP1,w2,30.0,32.0,24 +P02,4502,Kontorshus Mölndal,IQB,meter,350,1.50,014,Svets o montage IQB,ET1,BOP1,w2,25.0,23.0,20 +P02,4502,Kontorshus Mölndal,SP,styck,120,1.75,019,SP B/F-hall,ET1,BOP2,w2,14.0,15.5,8 +P03,4503,Lagerhall Jönköping,IQB,meter,900,1.89,011,FS IQB,ET1,BOP1,w1,72.0,70.0,40 +P03,4503,Lagerhall Jönköping,IQB,meter,900,1.89,012,Förmontering IQB,ET1,BOP1,w1,48.0,52.0,35 +P03,4503,Lagerhall Jönköping,IQB,meter,900,1.89,013,Montering IQB,ET1,BOP1,w1,38.0,36.5,30 +P03,4503,Lagerhall Jönköping,IQB,meter,900,1.89,014,Svets o montage IQB,ET1,BOP1,w1,42.0,48.0,28 +P03,4503,Lagerhall Jönköping,SB,styck,60,6.00,018,SB B/F-hall,ET1,BOP1,w1,36.0,38.0,6 +P03,4503,Lagerhall Jönköping,IQB,meter,900,1.89,011,FS IQB,ET1,BOP1,w2,72.0,75.0,45 +P03,4503,Lagerhall Jönköping,IQP,styck,110,2.90,015,Montering IQP,ET1,BOP2,w2,32.0,30.0,11 +P03,4503,Lagerhall Jönköping,IQB,meter,900,1.89,016,Gjutning,ET1,BOP2,w2,28.0,35.0,8 +P03,4503,Lagerhall Jönköping,IQB,meter,900,1.89,017,Målning,ET1,BOP2,w3,24.0,22.0,20 +P04,4504,Parkering Helsingborg,IQB,meter,450,1.65,011,FS IQB,ET1,BOP1,w1,38.0,36.0,24 +P04,4504,Parkering Helsingborg,IQB,meter,450,1.65,012,Förmontering IQB,ET1,BOP1,w1,25.0,27.0,20 +P04,4504,Parkering Helsingborg,IQB,meter,450,1.65,013,Montering IQB,ET1,BOP1,w1,20.0,19.0,18 +P04,4504,Parkering Helsingborg,IQP,styck,55,2.85,015,Montering IQP,ET1,BOP1,w1,16.0,18.0,6 +P04,4504,Parkering Helsingborg,SB,styck,25,7.50,018,SB B/F-hall,ET1,BOP1,w1,19.0,22.0,3 +P04,4504,Parkering Helsingborg,IQB,meter,450,1.65,011,FS IQB,ET1,BOP1,w2,38.0,40.0,28 +P04,4504,Parkering Helsingborg,SP,styck,100,2.00,019,SP B/F-hall,ET1,BOP2,w2,12.0,11.0,6 +P04,4504,Parkering Helsingborg,SR,styck,12,120.0,021,SR B/F-hall,ET1,BOP2,w2,60.0,65.0,1 +P05,4505,Sjukhus Linköping 
ET2,IQB,meter,1200,1.85,011,FS IQB,ET2,BOP3,w1,95.0,90.0,50 +P05,4505,Sjukhus Linköping ET2,IQB,meter,1200,1.85,012,Förmontering IQB,ET2,BOP3,w1,65.0,68.0,42 +P05,4505,Sjukhus Linköping ET2,IQB,meter,1200,1.85,013,Montering IQB,ET2,BOP3,w1,50.0,48.0,38 +P05,4505,Sjukhus Linköping ET2,IQB,meter,1200,1.85,014,Svets o montage IQB,ET2,BOP3,w1,58.0,62.0,35 +P05,4505,Sjukhus Linköping ET2,IQP,styck,150,2.88,015,Montering IQP,ET2,BOP3,w1,30.0,33.0,10 +P05,4505,Sjukhus Linköping ET2,SB,styck,50,5.00,018,SB B/F-hall,ET2,BOP3,w1,25.0,28.0,5 +P05,4505,Sjukhus Linköping ET2,SD,styck,45,2.75,018,SB B/F-hall,ET2,BOP3,w1,12.0,11.5,4 +P05,4505,Sjukhus Linköping ET2,IQB,meter,1200,1.85,011,FS IQB,ET2,BOP3,w2,95.0,98.0,55 +P05,4505,Sjukhus Linköping ET2,IQB,meter,1200,1.85,016,Gjutning,ET2,BOP3,w2,35.0,40.0,12 +P05,4505,Sjukhus Linköping ET2,IQB,meter,1200,1.85,017,Målning,ET2,BOP3,w2,28.0,26.0,25 +P05,4505,Sjukhus Linköping ET2,SR,styck,20,274.0,021,SR B/F-hall,ET2,BOP3,w3,120.0,115.0,2 +P06,4506,Skola Uppsala,IQB,meter,500,1.60,011,FS IQB,ET1,BOP1,w2,40.0,38.0,26 +P06,4506,Skola Uppsala,IQB,meter,500,1.60,012,Förmontering IQB,ET1,BOP1,w2,28.0,30.0,22 +P06,4506,Skola Uppsala,IQB,meter,500,1.60,013,Montering IQB,ET1,BOP1,w2,22.0,20.0,18 +P06,4506,Skola Uppsala,IQP,styck,80,2.75,015,Montering IQP,ET1,BOP1,w2,22.0,24.0,8 +P06,4506,Skola Uppsala,SB,styck,35,4.50,018,SB B/F-hall,ET1,BOP1,w2,16.0,18.0,4 +P06,4506,Skola Uppsala,SP,styck,140,1.50,019,SP B/F-hall,ET1,BOP2,w3,14.0,12.0,10 +P07,4507,Idrottshall Västerås,HSQ,meter,400,2.05,011,FS IQB,ET1,BOP1,w1,45.0,42.0,22 +P07,4507,Idrottshall Västerås,HSQ,meter,400,2.05,012,Förmontering IQB,ET1,BOP1,w1,30.0,33.0,18 +P07,4507,Idrottshall Västerås,HSQ,meter,400,2.05,014,Svets o montage IQB,ET1,BOP1,w1,35.0,32.0,16 +P07,4507,Idrottshall Västerås,SB,styck,45,3.50,018,SB B/F-hall,ET1,BOP1,w1,16.0,18.0,5 +P07,4507,Idrottshall Västerås,HSQ,meter,400,2.05,011,FS IQB,ET1,BOP1,w2,45.0,48.0,26 +P07,4507,Idrottshall 
Västerås,HSQ,meter,400,2.05,016,Gjutning,ET1,BOP2,w2,20.0,22.0,5 +P07,4507,Idrottshall Västerås,HSQ,meter,400,2.05,017,Målning,ET1,BOP2,w3,18.0,16.0,15 +P08,4508,Bro E6 Halmstad,IQB,meter,800,1.80,011,FS IQB,ET1,BOP1,w1,65.0,62.0,36 +P08,4508,Bro E6 Halmstad,IQB,meter,800,1.80,012,Förmontering IQB,ET1,BOP1,w1,42.0,45.0,30 +P08,4508,Bro E6 Halmstad,IQB,meter,800,1.80,013,Montering IQB,ET1,BOP1,w1,35.0,38.0,25 +P08,4508,Bro E6 Halmstad,IQB,meter,800,1.80,014,Svets o montage IQB,ET1,BOP1,w1,40.0,44.0,22 +P08,4508,Bro E6 Halmstad,SP,styck,200,2.50,019,SP B/F-hall,ET1,BOP1,w1,20.0,18.0,8 +P08,4508,Bro E6 Halmstad,IQB,meter,800,1.80,011,FS IQB,ET1,BOP1,w2,65.0,68.0,42 +P08,4508,Bro E6 Halmstad,IQP,styck,95,2.93,015,Montering IQP,ET1,BOP2,w2,28.0,30.0,10 +P08,4508,Bro E6 Halmstad,IQB,meter,800,1.80,016,Gjutning,ET1,BOP2,w3,22.0,25.0,8 +P08,4508,Bro E6 Halmstad,SR,styck,15,180.0,021,SR B/F-hall,ET1,BOP2,w3,90.0,85.0,2 \ No newline at end of file diff --git a/submissions/Dia-Vats/level6/data/factory_workers.csv b/submissions/Dia-Vats/level6/data/factory_workers.csv new file mode 100644 index 000000000..3110285cc --- /dev/null +++ b/submissions/Dia-Vats/level6/data/factory_workers.csv @@ -0,0 +1,15 @@ +worker_id,name,role,primary_station,can_cover_stations,certifications,hours_per_week,type +W01,Erik Lindberg,Operator,011,"011,012","MIG/MAG,TIG,ISO 9606",40,permanent +W02,Anna Berg,Operator,011,"011,014","MIG/MAG,TIG",40,permanent +W03,Lars Jensen,Operator,012,"012,013","Surface treatment,CE marking",40,permanent +W04,Maria Stone,Operator,013,"013","Blasting,Surface protection",40,permanent +W05,Johan Peters,Operator,014,"014,015","Hydraulics,Mechanics,Crane",40,permanent +W06,Karen Nilsen,Inspector,015,"015","SIS,SS-EN 1090,NDT",40,permanent +W07,Per Hansen,Operator,016,"016,017","Casting,Formwork",40,permanent +W08,Sofia Arden,Operator,017,"017","Surface treatment,Spray painting",40,permanent +W09,Magnus Stone,Operator,018,"018,019","Sheet metal,Assembly",40,permanent 
+W10,Elin Frank,Operator,019,"019,018","Assembly,Welding",32,permanent +W11,Victor Elm,Foreman,all,"011,012,013,014,015,016,017,018,019,021","Leadership,CE,ISO 9001",45,permanent +W12,Lena Dale,Quality Manager,015,"015","ISO 9001,SS-EN 1090,Audit",40,permanent +W13,Ahmed Hassan,Operator,011,"011","MIG/MAG",40,hired +W14,Petra Steen,Operator,012,"012,013","Surface treatment",40,hired \ No newline at end of file diff --git a/submissions/Dia-Vats/level6/db.py b/submissions/Dia-Vats/level6/db.py new file mode 100644 index 000000000..a9d4782c6 --- /dev/null +++ b/submissions/Dia-Vats/level6/db.py @@ -0,0 +1,25 @@ +"""db.py — shared Neo4j driver for the Streamlit dashboard.""" +import os +import streamlit as st +from neo4j import GraphDatabase + +@st.cache_resource(show_spinner=False) +def get_driver(): + try: + uri = st.secrets["NEO4J_URI"] + user = st.secrets["NEO4J_USER"] + pw = st.secrets["NEO4J_PASSWORD"] + except Exception: + from dotenv import load_dotenv + load_dotenv() + uri = os.getenv("NEO4J_URI", "bolt://localhost:7687") + user = os.getenv("NEO4J_USER", "neo4j") + pw = os.getenv("NEO4J_PASSWORD", "password") + return GraphDatabase.driver(uri, auth=(user, pw)) + + +def run_query(cypher: str, params: dict | None = None) -> list[dict]: + driver = get_driver() + with driver.session() as s: + result = s.run(cypher, params or {}) + return [dict(r) for r in result] diff --git a/submissions/Dia-Vats/level6/pages_impl/__init__.py b/submissions/Dia-Vats/level6/pages_impl/__init__.py new file mode 100644 index 000000000..63024e556 --- /dev/null +++ b/submissions/Dia-Vats/level6/pages_impl/__init__.py @@ -0,0 +1 @@ +# pages_impl/__init__.py \ No newline at end of file diff --git a/submissions/Dia-Vats/level6/pages_impl/page1_overview.py b/submissions/Dia-Vats/level6/pages_impl/page1_overview.py new file mode 100644 index 000000000..a30b0f008 --- /dev/null +++ b/submissions/Dia-Vats/level6/pages_impl/page1_overview.py @@ -0,0 +1,82 @@ +"""Page 1 — Project Overview""" 
+import streamlit as st +import pandas as pd +import plotly.graph_objects as go +import sys, os +sys.path.insert(0, os.path.dirname(os.path.dirname(__file__))) +from db import run_query + + +def render(): + st.markdown('
Project Overview
', unsafe_allow_html=True) + st.markdown("Planned vs actual hours per project, variance analysis, and bottleneck summary.") + + rows = run_query(""" + MATCH (p:Project)-[:HAS_WORKORDER]->(wo:WorkOrder) + RETURN + p.project_id AS project_id, + p.project_name AS project_name, + sum(wo.planned_hours) AS total_planned, + sum(wo.actual_hours) AS total_actual, + sum(CASE WHEN wo.is_bottleneck THEN 1 ELSE 0 END) AS bottleneck_count + ORDER BY p.project_id + """) + + if not rows: + st.error("No data returned. Run seed_graph.py first.") + return + + df = pd.DataFrame(rows) + df["variance_pct"] = ((df["total_actual"] - df["total_planned"]) / df["total_planned"] * 100).round(1) + + # ── KPI cards ───────────────────────────────────────────────────────────── + col1, col2, col3, col4 = st.columns(4) + with col1: + st.markdown(f'
Total Projects
{len(df)}
', unsafe_allow_html=True) + with col2: + st.markdown(f'
Total Planned hrs
{int(df["total_planned"].sum()):,}
', unsafe_allow_html=True) + with col3: + st.markdown(f'
Total Actual hrs
{int(df["total_actual"].sum()):,}
', unsafe_allow_html=True) + with col4: + btn = int(df["bottleneck_count"].sum()) + st.markdown(f'
Bottleneck Work Orders
{btn}
', unsafe_allow_html=True) + + st.markdown("---") + + # ── Grouped bar chart ────────────────────────────────────────────────────── + fig = go.Figure() + fig.add_trace(go.Bar( + name="Planned hrs", x=df["project_name"], y=df["total_planned"], + marker_color="#3b82f6", marker_line_color="#1d4ed8", marker_line_width=1, + )) + fig.add_trace(go.Bar( + name="Actual hrs", x=df["project_name"], y=df["total_actual"], + marker_color="#f97316", marker_line_color="#c2410c", marker_line_width=1, + )) + fig.update_layout( + barmode="group", template="plotly_dark", + title="Planned vs Actual Hours — All Projects", + xaxis_title="Project", yaxis_title="Hours", + legend=dict(orientation="h", yanchor="bottom", y=1.02, xanchor="right", x=1), + height=420, margin=dict(l=40, r=20, t=60, b=100), + xaxis_tickangle=-30, + paper_bgcolor="rgba(0,0,0,0)", plot_bgcolor="rgba(15,17,23,0.6)", + ) + st.plotly_chart(fig, use_container_width=True) + + st.markdown("---") + st.markdown("**PROJECT DETAIL TABLE**") + + # ── Variance table ───────────────────────────────────────────────────────── + display_df = df[["project_id","project_name","total_planned","total_actual","variance_pct","bottleneck_count"]].copy() + display_df.columns = ["ID","Project","Planned hrs","Actual hrs","Variance %","Bottlenecks"] + + def color_variance(val): + if val > 10: + return "background-color:#7f1d1d; color:#fca5a5; font-weight:600" + elif val > 0: + return "color:#fbbf24" + return "color:#4ade80" + + styled = display_df.style.applymap(color_variance, subset=["Variance %"]) + st.dataframe(styled, use_container_width=True, hide_index=True) diff --git a/submissions/Dia-Vats/level6/pages_impl/page2_station.py b/submissions/Dia-Vats/level6/pages_impl/page2_station.py new file mode 100644 index 000000000..f5034deea --- /dev/null +++ b/submissions/Dia-Vats/level6/pages_impl/page2_station.py @@ -0,0 +1,96 @@ +"""Page 2 — Station Load Heatmap""" +import streamlit as st +import pandas as pd +import plotly.graph_objects as go 
+import sys, os +sys.path.insert(0, os.path.dirname(os.path.dirname(__file__))) +from db import run_query + + +def render(): + st.markdown('
Station Load
', unsafe_allow_html=True) + st.markdown("Heatmap of variance % by station × week. Red = overloaded (>10%), green = on track.") + + rows = run_query(""" + MATCH (wo:WorkOrder)-[:AT_STATION]->(s:Station) + MATCH (wo)-[r:SCHEDULED_IN]->(wk:Week) + RETURN + s.station_code AS station_code, + s.station_name AS station_name, + wk.week_id AS week_id, + sum(r.planned_hours) AS planned, + sum(r.actual_hours) AS actual + ORDER BY s.station_code, wk.week_id + """) + + if not rows: + st.error("No data returned. Run seed_graph.py first.") + return + + df = pd.DataFrame(rows) + df["variance_pct"] = ((df["actual"] - df["planned"]) / df["planned"] * 100).round(1) + df["label"] = df["station_code"] + "\n" + df["station_name"] + + # pivot for heatmap + week_order = ["w1","w2","w3","w4","w5","w6","w7","w8"] + pivot = df.pivot_table(index="label", columns="week_id", values="variance_pct", aggfunc="mean") + pivot = pivot.reindex(columns=[w for w in week_order if w in pivot.columns]) + + stations = list(pivot.index) + weeks = list(pivot.columns) + z_vals = pivot.values.tolist() + + # Custom text for hover + hover_text = [] + for s_label in stations: + row_texts = [] + for wk in weeks: + try: + v = pivot.loc[s_label, wk] + row_texts.append(f"Station: {s_label}
<br>Week: {wk}<br>
Variance: {v:.1f}%") + except Exception: + row_texts.append("") + hover_text.append(row_texts) + + fig = go.Figure(data=go.Heatmap( + z=z_vals, + x=weeks, + y=stations, + text=pivot.values.tolist(), + texttemplate="%{text:.1f}%", + colorscale=[ + [0.0, "#166534"], + [0.3, "#4ade80"], + [0.5, "#facc15"], + [0.7, "#f97316"], + [1.0, "#991b1b"], + ], + zmid=0, + zmin=-20, zmax=30, + colorbar=dict(title="Variance %", ticksuffix="%"), + hovertext=hover_text, + hoverinfo="text", + )) + fig.update_layout( + template="plotly_dark", + title="Station Load Heatmap — Variance % (Actual vs Planned)", + xaxis_title="Week", yaxis_title="Station", + height=520, + paper_bgcolor="rgba(0,0,0,0)", + plot_bgcolor="rgba(15,17,23,0.8)", + margin=dict(l=180, r=20, t=60, b=40), + ) + st.plotly_chart(fig, use_container_width=True) + + st.markdown("---") + st.markdown("**OVERLOADED CELLS — VARIANCE > 10%**") + overloaded = df[df["variance_pct"] > 10][["station_code","station_name","week_id","planned","actual","variance_pct"]] + overloaded.columns = ["Code","Station","Week","Planned hrs","Actual hrs","Variance %"] + if len(overloaded): + st.dataframe( + overloaded.style.highlight_between(subset=["Variance %"], left=10, right=999, + props="background-color:#7f1d1d;color:#fca5a5;font-weight:600"), + use_container_width=True, hide_index=True + ) + else: + st.success("No overloaded cells found.") diff --git a/submissions/Dia-Vats/level6/pages_impl/page3_capacity.py b/submissions/Dia-Vats/level6/pages_impl/page3_capacity.py new file mode 100644 index 000000000..968bd2d8d --- /dev/null +++ b/submissions/Dia-Vats/level6/pages_impl/page3_capacity.py @@ -0,0 +1,98 @@ +"""Page 3 — Capacity Tracker""" +import streamlit as st +import pandas as pd +import plotly.graph_objects as go +import sys, os +sys.path.insert(0, os.path.dirname(os.path.dirname(__file__))) +from db import run_query + + +def render(): + st.markdown('
Capacity Tracker
', unsafe_allow_html=True) + st.markdown("Weekly factory capacity vs planned demand. Red bars = deficit weeks.") + + rows = run_query(""" + MATCH (wk:Week)-[:HAS_CAPACITY]->(cs:CapacitySnapshot) + RETURN + wk.week_id AS week_id, + cs.total_capacity AS total_capacity, + cs.total_planned AS total_planned, + cs.deficit AS deficit, + cs.overtime_hours AS overtime_hours, + cs.own_staff_count AS own_staff, + cs.hired_staff_count AS hired_staff + ORDER BY wk.week_id + """) + + if not rows: + st.error("No data returned. Run seed_graph.py first.") + return + + df = pd.DataFrame(rows) + week_order = ["w1","w2","w3","w4","w5","w6","w7","w8"] + df["week_id"] = pd.Categorical(df["week_id"], categories=week_order, ordered=True) + df = df.sort_values("week_id") + + deficit_weeks = int((df["deficit"] < 0).sum()) + total_weeks = len(df) + + # ── KPI cards ───────────────────────────────────────────────────────────── + col1, col2, col3 = st.columns(3) + with col1: + st.markdown(f'
Deficit Weeks
{deficit_weeks} of {total_weeks}
', unsafe_allow_html=True) + with col2: + total_def = int(df[df["deficit"] < 0]["deficit"].sum()) + st.markdown(f'
Total Deficit hrs
{total_def}
', unsafe_allow_html=True) + with col3: + ot = int(df["overtime_hours"].sum()) + st.markdown(f'
Total Overtime hrs
{ot}
', unsafe_allow_html=True) + + # Insight text + st.warning(f"**{deficit_weeks} of {total_weeks} weeks are in deficit** — factory demand exceeds available capacity in {deficit_weeks} out of {total_weeks} weeks. Consider scheduling overtime or additional hired staff.") + + st.markdown("---") + + # ── Grouped bar chart ────────────────────────────────────────────────────── + cap_colors = ["#3b82f6"] * len(df) + plan_colors = ["#ef4444" if d < 0 else "#f97316" for d in df["deficit"]] + + fig = go.Figure() + fig.add_trace(go.Bar( + name="Total Capacity", x=df["week_id"], y=df["total_capacity"], + marker_color=cap_colors, opacity=0.85, + )) + fig.add_trace(go.Bar( + name="Total Planned", x=df["week_id"], y=df["total_planned"], + marker_color=plan_colors, opacity=0.9, + )) + fig.add_trace(go.Scatter( + name="Deficit / Surplus", x=df["week_id"], y=df["deficit"], + mode="lines+markers", + line=dict(color="#a855f7", width=2, dash="dot"), + marker=dict(size=8, color=["#ef4444" if d < 0 else "#4ade80" for d in df["deficit"]]), + )) + fig.add_hline(y=0, line_color="#64748b", line_dash="dash") + fig.update_layout( + barmode="group", template="plotly_dark", + title="Weekly Capacity vs Planned Demand", + xaxis_title="Week", yaxis_title="Hours", + legend=dict(orientation="h", yanchor="bottom", y=1.02, xanchor="right", x=1), + height=440, + paper_bgcolor="rgba(0,0,0,0)", plot_bgcolor="rgba(15,17,23,0.6)", + ) + st.plotly_chart(fig, use_container_width=True) + + st.markdown("---") + st.markdown("**WEEKLY BREAKDOWN**") + display = df[["week_id","own_staff","hired_staff","total_capacity","total_planned","overtime_hours","deficit"]].copy() + display.columns = ["Week","Own Staff","Hired Staff","Capacity hrs","Planned hrs","Overtime hrs","Deficit"] + + def color_deficit(val): + if val < 0: + return "background-color:#7f1d1d; color:#fca5a5; font-weight:700" + return "color:#4ade80; font-weight:700" + + st.dataframe( + display.style.applymap(color_deficit, subset=["Deficit"]), + 
use_container_width=True, hide_index=True + ) diff --git a/submissions/Dia-Vats/level6/pages_impl/page4_workers.py b/submissions/Dia-Vats/level6/pages_impl/page4_workers.py new file mode 100644 index 000000000..e2cf4c20d --- /dev/null +++ b/submissions/Dia-Vats/level6/pages_impl/page4_workers.py @@ -0,0 +1,118 @@ +"""Page 4 — Worker Coverage""" +import streamlit as st +import pandas as pd +import sys, os +sys.path.insert(0, os.path.dirname(os.path.dirname(__file__))) +from db import run_query + + +def render(): + st.markdown('
Worker Coverage
', unsafe_allow_html=True) + st.markdown("Station coverage matrix — SPOF (Single Point of Failure) stations highlighted in red.") + + # Coverage query + coverage_rows = run_query(""" + MATCH (s:Station) + OPTIONAL MATCH (w:Worker)-[:CAN_COVER]->(s) + WITH s, + collect(DISTINCT w.name) AS coverers, + count(DISTINCT w) AS coverage_count + OPTIONAL MATCH (wo:WorkOrder)-[:AT_STATION]->(s) + OPTIONAL MATCH (p:Project)-[:HAS_WORKORDER]->(wo) + WITH s, coverers, coverage_count, + collect(DISTINCT p.project_name) AS active_projects + RETURN + s.station_code AS station_code, + s.station_name AS station_name, + coverers, + coverage_count, + active_projects, + CASE WHEN coverage_count <= 1 THEN true ELSE false END AS is_spof + ORDER BY coverage_count ASC + """) + + if not coverage_rows: + st.error("No data returned. Run seed_graph.py first.") + return + + df = pd.DataFrame(coverage_rows) + + # ── Summary metrics ──────────────────────────────────────────────────────── + spof_count = int((df["is_spof"] == True).sum()) + col1, col2, col3 = st.columns(3) + with col1: + st.markdown(f'
Total Stations
{len(df)}
', unsafe_allow_html=True) + with col2: + st.markdown(f'
SPOF Stations
{spof_count}
', unsafe_allow_html=True) + with col3: + avg_cov = round(df["coverage_count"].mean(), 1) + st.markdown(f'
Avg Coverage per Station
{avg_cov}
', unsafe_allow_html=True) + + st.markdown("---") + + # ── Station table ────────────────────────────────────────────────────────── + st.markdown("**STATION COVERAGE MATRIX**") + for _, row in df.iterrows(): + spof_html = 'CRITICAL' if row["is_spof"] else 'OK' + workers_str = ", ".join(row["coverers"]) if row["coverers"] else "—" + projects_str = ", ".join(row["active_projects"][:4]) if row["active_projects"] else "—" + cert_count = len(row["coverers"]) # proxy + + with st.expander(f"**{row['station_code']}** — {row['station_name']} {spof_html}", expanded=row["is_spof"]): + c1, c2, c3 = st.columns([2, 2, 3]) + with c1: + st.metric("Coverage Count", int(row["coverage_count"])) + with c2: + st.metric("Active Projects", len(row["active_projects"])) + with c3: + st.markdown(f"**Workers who can cover:** {workers_str}") + st.markdown(f"**Dependent projects:** {projects_str}") + if row["is_spof"]: + st.error(f"SPOF ALERT: Only {int(row['coverage_count'])} worker(s) cover this station. " + f"Projects at risk: {projects_str}") + + st.markdown("---") + + # ── SPOF downstream risk ─────────────────────────────────────────────────── + st.markdown("**SPOF DOWNSTREAM RISK (VIA FEEDS\\_INTO)**") + spof_downstream = run_query(""" + MATCH (s:Station) + WHERE NOT EXISTS { MATCH (w:Worker)-[:CAN_COVER]->(s) } + OR 1 >= size([(w:Worker)-[:CAN_COVER]->(s) | w]) + MATCH path = (s)-[:FEEDS_INTO*1..5]->(ds:Station) + MATCH (p:Project)-[:HAS_WORKORDER]->(wo:WorkOrder)-[:AT_STATION]->(ds) + RETURN + s.station_code AS spof_station, + s.station_name AS spof_name, + ds.station_code AS downstream_station, + ds.station_name AS downstream_name, + collect(DISTINCT p.project_name) AS at_risk_projects + LIMIT 30 + """) + + if spof_downstream: + df_down = pd.DataFrame(spof_downstream) + df_down.columns = ["SPOF Station","SPOF Name","Downstream Station","Downstream Name","At-Risk Projects"] + df_down["At-Risk Projects"] = df_down["At-Risk Projects"].apply(lambda x: ", ".join(x)) + st.dataframe(df_down, 
use_container_width=True, hide_index=True) + else: + st.info("No downstream risk paths found.") + + st.markdown("---") + st.markdown("**ALL WORKERS**") + workers_rows = run_query(""" + MATCH (w:Worker) + OPTIONAL MATCH (w)-[:CERTIFIED_IN]->(c:Certification) + RETURN + w.worker_id AS worker_id, + w.name AS name, + w.role AS role, + w.type AS type, + w.hours_per_week AS hours_pw, + collect(c.name) AS certifications + ORDER BY w.worker_id + """) + wdf = pd.DataFrame(workers_rows) + wdf["certifications"] = wdf["certifications"].apply(lambda x: ", ".join(x) if x else "—") + wdf.columns = ["ID","Name","Role","Type","hrs/wk","Certifications"] + st.dataframe(wdf, use_container_width=True, hide_index=True) diff --git a/submissions/Dia-Vats/level6/pages_impl/page5_floor.py b/submissions/Dia-Vats/level6/pages_impl/page5_floor.py new file mode 100644 index 000000000..219831b7b --- /dev/null +++ b/submissions/Dia-Vats/level6/pages_impl/page5_floor.py @@ -0,0 +1,133 @@ +"""Page 5 — Factory Floor (Bonus B)""" +import streamlit as st +import pandas as pd +import plotly.graph_objects as go +import sys, os +sys.path.insert(0, os.path.dirname(os.path.dirname(__file__))) +from db import run_query + +# Grid positions per spec: 011→row0,col0 / 012→row0,col1 / ... / 021→row2,col1 +STATION_GRID = { + "011": (0, 0), "012": (0, 1), "013": (0, 2), "014": (0, 3), + "015": (1, 0), "016": (1, 1), "017": (1, 2), "018": (1, 3), + "019": (2, 0), "021": (2, 1), +} + +FEEDS_INTO_EDGES = [ + ("011","012"),("012","013"),("013","014"),("014","015"), + ("015","016"),("016","017"),("017","018"),("018","019"),("019","021"), +] + + +def render(): + st.markdown('
Factory Floor
', unsafe_allow_html=True) + st.markdown("Scatter-based factory floor plan. Stations coloured by load severity. Hover for active projects and overload %.") + + rows = run_query(""" + MATCH (wo:WorkOrder)-[:AT_STATION]->(s:Station) + MATCH (wo)-[r:SCHEDULED_IN]->(wk:Week) + MATCH (p:Project)-[:HAS_WORKORDER]->(wo) + WITH s, + sum(r.planned_hours) AS planned, + sum(r.actual_hours) AS actual, + collect(DISTINCT p.project_name) AS projects + RETURN + s.station_code AS station_code, + s.station_name AS station_name, + planned, actual, projects + """) + + if not rows: + st.error("No data returned. Run seed_graph.py first.") + return + + df = pd.DataFrame(rows) + df["variance_pct"] = ((df["actual"] - df["planned"]) / df["planned"] * 100).round(1) + + def severity_color(v): + if v > 15: return "#ef4444" + if v > 5: return "#f97316" + if v > 0: return "#facc15" + return "#4ade80" + + # Build scatter points + x_vals, y_vals, colors, sizes, labels, hovers = [], [], [], [], [], [] + station_pos = {} # code -> (x, y) for drawing edges + + for _, row in df.iterrows(): + sc = row["station_code"] + if sc not in STATION_GRID: + continue + grid_r, grid_c = STATION_GRID[sc] + # invert row so row0 is at top + x = grid_c * 2.5 + y = (2 - grid_r) * 2.0 + station_pos[sc] = (x, y) + + proj_list = ", ".join(row["projects"][:3]) + hover = (f"{sc} — {row['station_name']}<br>
" + f"Planned: {row['planned']:.0f} hrs<br>
" + f"Actual: {row['actual']:.0f} hrs<br>
" + f"Overload: {row['variance_pct']:.1f}%<br>
" + f"Projects: {proj_list}") + + x_vals.append(x); y_vals.append(y) + colors.append(severity_color(row["variance_pct"])) + sizes.append(40 + max(0, row["variance_pct"]) * 1.5) + labels.append(f"{sc}") + hovers.append(hover) + + fig = go.Figure() + + # Draw FEEDS_INTO edges first + for src, dst in FEEDS_INTO_EDGES: + if src in station_pos and dst in station_pos: + sx, sy = station_pos[src] + dx, dy = station_pos[dst] + fig.add_trace(go.Scatter( + x=[sx, dx], y=[sy, dy], mode="lines", + line=dict(color="#334155", width=2, dash="dot"), + showlegend=False, hoverinfo="skip", + )) + + # Station nodes + fig.add_trace(go.Scatter( + x=x_vals, y=y_vals, mode="markers+text", + marker=dict(color=colors, size=sizes, line=dict(color="#1e293b", width=2), + opacity=0.9), + text=labels, textposition="middle center", + textfont=dict(color="#f1f5f9", size=11, family="Inter"), + hovertext=hovers, hoverinfo="text", + showlegend=False, + )) + + # Legend annotation + for color, label in [("#4ade80","On track"),("#facc15","Slight over"), + ("#f97316",">5% over"),("#ef4444",">15% over")]: + fig.add_trace(go.Scatter( + x=[None], y=[None], mode="markers", + marker=dict(color=color, size=12), + name=label, showlegend=True, + )) + + fig.update_layout( + template="plotly_dark", + title="Factory Floor — Station Load Map", + xaxis=dict(showgrid=False, zeroline=False, showticklabels=False, range=[-0.5, 9]), + yaxis=dict(showgrid=False, zeroline=False, showticklabels=False, range=[-0.5, 5]), + height=500, + paper_bgcolor="rgba(0,0,0,0)", + plot_bgcolor="rgba(15,17,23,0.8)", + legend=dict(orientation="h", yanchor="bottom", y=1.02, xanchor="right", x=1), + margin=dict(l=20, r=20, t=60, b=20), + ) + st.plotly_chart(fig, use_container_width=True) + + st.markdown("---") + # Table summary + display = df[df["station_code"].isin(STATION_GRID)][ + ["station_code","station_name","planned","actual","variance_pct"] + ].copy() + display.columns = ["Code","Station","Planned hrs","Actual hrs","Overload %"] + 
st.dataframe(display.style.background_gradient(subset=["Overload %"], cmap="RdYlGn_r"), + use_container_width=True, hide_index=True) diff --git a/submissions/Dia-Vats/level6/pages_impl/page6_forecast.py b/submissions/Dia-Vats/level6/pages_impl/page6_forecast.py new file mode 100644 index 000000000..d98abd2bb --- /dev/null +++ b/submissions/Dia-Vats/level6/pages_impl/page6_forecast.py @@ -0,0 +1,130 @@ +"""Page 6 — Forecast (Bonus C)""" +import streamlit as st +import pandas as pd +import numpy as np +import plotly.graph_objects as go +import sys, os +sys.path.insert(0, os.path.dirname(os.path.dirname(__file__))) +from db import run_query + +WEEK_NUM = {"w1":1,"w2":2,"w3":3,"w4":4,"w5":5,"w6":6,"w7":7,"w8":8} + + +def render(): + st.markdown('
Forecast
', unsafe_allow_html=True) + st.markdown("Linear trend extrapolation per station from weeks w1–w8, predicting week 9 load.") + + rows = run_query(""" + MATCH (wo:WorkOrder)-[:AT_STATION]->(s:Station) + MATCH (wo)-[r:SCHEDULED_IN]->(wk:Week) + RETURN + s.station_code AS station_code, + s.station_name AS station_name, + wk.week_id AS week_id, + sum(r.actual_hours) AS actual_hours + ORDER BY s.station_code, wk.week_id + """) + + if not rows: + st.error("No data returned. Run seed_graph.py first.") + return + + df = pd.DataFrame(rows) + df["week_num"] = df["week_id"].map(WEEK_NUM) + df = df.dropna(subset=["week_num"]) + + stations = sorted(df["station_code"].unique()) + sel = st.multiselect("Select stations to display", stations, default=stations[:5]) + if not sel: + st.warning("Select at least one station.") + return + + fig = go.Figure() + predictions = [] + + colors_palette = [ + "#3b82f6","#f97316","#a855f7","#22c55e","#f43f5e", + "#06b6d4","#eab308","#6366f1","#14b8a6","#ec4899", + ] + + for i, sc in enumerate(sel): + sub = df[df["station_code"] == sc].sort_values("week_num") + if len(sub) < 2: + continue + station_name = sub["station_name"].iloc[0] + color = colors_palette[i % len(colors_palette)] + + x = sub["week_num"].values + y = sub["actual_hours"].values + + # Linear regression + coeffs = np.polyfit(x, y, 1) + slope, intercept = coeffs + trend_fn = np.poly1d(coeffs) + pred_w9 = float(trend_fn(9)) + + # Trend line over w1–w9, one point per week so the x labels stay unique + x_trend = np.arange(1, 10) + y_trend = trend_fn(x_trend) + + # Actual data points + fig.add_trace(go.Scatter( + x=sub["week_id"], y=y, + mode="lines+markers", + name=f"{sc} actual", + line=dict(color=color, width=2), + marker=dict(size=7), + legendgroup=sc, + )) + + # Week labels for trend x-axis (w1..w9) + week_labels = [f"w{int(n)}" for n in x_trend] + + # Trend line + fig.add_trace(go.Scatter( + x=week_labels, y=y_trend, + mode="lines", + name=f"{sc} trend", + line=dict(color=color, width=1.5, 
dash="dot"), +
opacity=0.5, + legendgroup=sc, + showlegend=False, + )) + + # Week-9 predicted point + fig.add_trace(go.Scatter( + x=["w9"], y=[pred_w9], + mode="markers", + name=f"{sc} w9 forecast", + marker=dict(size=14, color=color, symbol="star", + line=dict(color="#fff", width=2)), + legendgroup=sc, + showlegend=True, + )) + + predictions.append({ + "Station Code": sc, + "Station": station_name, + "Trend (slope)": f"{slope:+.2f} hrs/wk", + "Week 8 Actual": f"{y[-1]:.1f} hrs", + "Week 9 Forecast": f"{max(0, pred_w9):.1f} hrs", + "Risk": "High" if pred_w9 > 60 else ("Medium" if pred_w9 > 30 else "Low"), + }) + + fig.update_layout( + template="plotly_dark", + title="Station Load Forecast — Linear Extrapolation to Week 9", + xaxis_title="Week", yaxis_title="Actual Hours", + legend=dict(orientation="h", yanchor="bottom", y=1.02, xanchor="right", x=1), + height=480, + paper_bgcolor="rgba(0,0,0,0)", + plot_bgcolor="rgba(15,17,23,0.6)", + ) + st.plotly_chart(fig, use_container_width=True) + + if predictions: + st.markdown("---") + st.subheader("Week 9 Forecast Summary") + pred_df = pd.DataFrame(predictions) + st.dataframe(pred_df, use_container_width=True, hide_index=True) + st.caption("Star markers on the chart indicate week 9 predicted load. 
Forecast uses simple OLS linear regression on actual hours w1–w8.") diff --git a/submissions/Dia-Vats/level6/pages_impl/page7_selftest.py b/submissions/Dia-Vats/level6/pages_impl/page7_selftest.py new file mode 100644 index 000000000..baf0cb084 --- /dev/null +++ b/submissions/Dia-Vats/level6/pages_impl/page7_selftest.py @@ -0,0 +1,137 @@ +"""Page 7 — Self-Test (runs automatically, green/red checklist, score out of 20)""" +import streamlit as st +import sys, os +sys.path.insert(0, os.path.dirname(os.path.dirname(__file__))) +from db import run_query, get_driver + + +def _check(label: str, passed: bool, points: int, detail: str = ""): + icon = "✅" if passed else "❌" + color = "#166534" if passed else "#7f1d1d" + border = "#4ade80" if passed else "#ef4444" + pts_earned = points if passed else 0 + st.markdown( + f"""
+ {icon} {label}   + + {pts_earned}/{points} pts + {'
' + + detail + '' if detail else ''} +
""", + unsafe_allow_html=True, + ) + return pts_earned + + +def render(): + st.title("✅ Self-Test") + st.markdown("Automated graph validation — 6 checks, 20 points total. Runs on every page load.") + + total = 0 + max_score = 20 + + # ── Check 1: Neo4j connection alive (3 pts) ──────────────────────────────── + try: + driver = get_driver() + driver.verify_connectivity() + total += _check("Check 1 — Neo4j connection alive", True, 3, "Connected successfully.") + except Exception as e: + _check("Check 1 — Neo4j connection alive", False, 3, str(e)) + + # ── Check 2: node count ≥ 50 (3 pts) ────────────────────────────────────── + try: + res = run_query("MATCH (n) RETURN count(n) AS cnt") + node_count = res[0]["cnt"] if res else 0 + passed = node_count >= 50 + total += _check( + "Check 2 — Node count ≥ 50", passed, 3, + f"Found {node_count} nodes." + ) + except Exception as e: + _check("Check 2 — Node count ≥ 50", False, 3, str(e)) + + # ── Check 3: relationship count ≥ 100 (3 pts) ───────────────────────────── + try: + res = run_query("MATCH ()-[r]->() RETURN count(r) AS cnt") + rel_count = res[0]["cnt"] if res else 0 + passed = rel_count >= 100 + total += _check( + "Check 3 — Relationship count ≥ 100", passed, 3, + f"Found {rel_count} relationships." 
+ ) + except Exception as e: + _check("Check 3 — Relationship count ≥ 100", False, 3, str(e)) + + # ── Check 4: 6+ distinct node labels (3 pts) ────────────────────────────── + try: + res = run_query("CALL db.labels() YIELD label RETURN collect(label) AS labels") + labels = res[0]["labels"] if res else [] + passed = len(labels) >= 6 + total += _check( + "Check 4 — 6+ distinct node labels", passed, 3, + f"Labels: {', '.join(sorted(labels))} ({len(labels)} total)" + ) + except Exception as e: + _check("Check 4 — 6+ distinct node labels", False, 3, str(e)) + + # ── Check 5: 8+ distinct relationship types (3 pts) ─────────────────────── + try: + res = run_query("CALL db.relationshipTypes() YIELD relationshipType RETURN collect(relationshipType) AS rels") + rels = res[0]["rels"] if res else [] + passed = len(rels) >= 8 + total += _check( + "Check 5 — 8+ distinct relationship types", passed, 3, + f"Types: {', '.join(sorted(rels))} ({len(rels)} total)" + ) + except Exception as e: + _check("Check 5 — 8+ distinct relationship types", False, 3, str(e)) + + # ── Check 6: Bottleneck query returns results (5 pts) ───────────────────── + BOTTLENECK_CYPHER = """ +MATCH (p:Project)-[:HAS_WORKORDER]->(wo:WorkOrder)-[:AT_STATION]->(s:Station) +WHERE wo.actual_hours > wo.planned_hours * 1.1 +RETURN p.project_name AS project, s.station_name AS station, + wo.planned_hours AS planned, wo.actual_hours AS actual +LIMIT 10 +""" + try: + res = run_query(BOTTLENECK_CYPHER) + passed = len(res) > 0 + detail_lines = [] + for r in res[:5]: + detail_lines.append( + f"• {r['project']} @ {r['station']} — " + f"planned {r['planned']:.0f}h actual {r['actual']:.0f}h" + ) + detail = f"Returned {len(res)} bottleneck work orders.
" + "
".join(detail_lines) + total += _check( + "Check 6 — Bottleneck query (actual > planned × 1.1)", passed, 5, + detail if passed else "Query returned 0 rows — no bottlenecks found or query mismatch." + ) + except Exception as e: + _check("Check 6 — Bottleneck query (actual > planned × 1.1)", False, 5, str(e)) + + # ── Score summary ────────────────────────────────────────────────────────── + st.markdown("---") + pct = int(total / max_score * 100) + score_color = "#4ade80" if pct >= 80 else ("#facc15" if pct >= 50 else "#ef4444") + grade = "A" if pct >= 90 else ("B" if pct >= 75 else ("C" if pct >= 50 else "F")) + + st.markdown( + f"""
+
{total} / {max_score}
+
Grade: {grade} ({pct}%)
+
""", + unsafe_allow_html=True, + ) + + if total == max_score: + st.balloons() + st.success("🎉 Perfect score! All checks passed.") + elif total >= 14: + st.info(f"Good — {total}/{max_score} points. Fix failing checks to reach 100%.") + else: + st.error(f"Only {total}/{max_score} points. Run seed_graph.py and verify your Neo4j connection.") diff --git a/submissions/Dia-Vats/level6/requirements.txt b/submissions/Dia-Vats/level6/requirements.txt new file mode 100644 index 000000000..845cdad31 --- /dev/null +++ b/submissions/Dia-Vats/level6/requirements.txt @@ -0,0 +1,6 @@ +streamlit>=1.35.0 +neo4j>=5.19.0 +pandas>=2.2.0 +plotly>=5.22.0 +python-dotenv>=1.0.1 +numpy>=1.26.0 \ No newline at end of file diff --git a/submissions/Dia-Vats/level6/seed_graph.py b/submissions/Dia-Vats/level6/seed_graph.py new file mode 100644 index 000000000..3c238aab0 --- /dev/null +++ b/submissions/Dia-Vats/level6/seed_graph.py @@ -0,0 +1,386 @@ +""" +seed_graph.py — Swedish Steel Factory Graph Seeder +Author: Dia Vats +Description: + Reads factory_production.csv, factory_workers.csv, factory_capacity.csv + and seeds a Neo4j graph database that exactly follows the schema defined + in schema.md / schema.png. + + Safe to re-run: uses MERGE everywhere, never CREATE. + WorkOrder IDs: {project_id}_{station_code}_{week}_{product_type} + e.g. 
P01_011_w1_IQB +""" + +import os +import csv +from itertools import groupby +from dotenv import load_dotenv +from neo4j import GraphDatabase + +# --------------------------------------------------------------------------- +# Configuration +# --------------------------------------------------------------------------- +load_dotenv() + +NEO4J_URI = os.getenv("NEO4J_URI", "bolt://localhost:7687") +NEO4J_USER = os.getenv("NEO4J_USER", "neo4j") +NEO4J_PASSWORD = os.getenv("NEO4J_PASSWORD", "password") + +BASE_DIR = os.path.dirname(os.path.abspath(__file__)) + +PRODUCTION_CSV = os.path.join(BASE_DIR, "data", "factory_production.csv") +WORKERS_CSV = os.path.join(BASE_DIR, "data", "factory_workers.csv") +CAPACITY_CSV = os.path.join(BASE_DIR, "data", "factory_capacity.csv") + +# Production flow order (exactly as specified) +STATION_FLOW = [ + "011", "012", "013", "014", "015", + "016", "017", "018", "019", "021" +] + +# --------------------------------------------------------------------------- +# Helpers +# --------------------------------------------------------------------------- + +def read_csv(path: str) -> list[dict]: + with open(path, newline="", encoding="utf-8") as f: + return list(csv.DictReader(f)) + + +def workorder_id(project_id: str, station_code: str, week: str, product_type: str) -> str: + return f"{project_id}_{station_code}_{week}_{product_type}" + + +def variance_pct(planned: float, actual: float) -> float: + if planned == 0: + return 0.0 + return round((actual - planned) / planned * 100, 2) + + +def is_bottleneck(planned: float, actual: float) -> bool: + return actual > planned * 1.1 + + +# --------------------------------------------------------------------------- +# Constraints +# --------------------------------------------------------------------------- + +def create_constraints(session): + print("Creating constraints …") + constraints = [ + "CREATE CONSTRAINT project_id IF NOT EXISTS FOR (n:Project) REQUIRE n.project_id IS UNIQUE", + "CREATE 
CONSTRAINT workorder_id IF NOT EXISTS FOR (n:WorkOrder) REQUIRE n.workorder_id IS UNIQUE", + "CREATE CONSTRAINT station_code IF NOT EXISTS FOR (n:Station) REQUIRE n.station_code IS UNIQUE", + "CREATE CONSTRAINT product_type IF NOT EXISTS FOR (n:Product) REQUIRE n.product_type IS UNIQUE", + "CREATE CONSTRAINT week_id IF NOT EXISTS FOR (n:Week) REQUIRE n.week_id IS UNIQUE", + "CREATE CONSTRAINT worker_id IF NOT EXISTS FOR (n:Worker) REQUIRE n.worker_id IS UNIQUE", + "CREATE CONSTRAINT certification_name IF NOT EXISTS FOR (n:Certification) REQUIRE n.name IS UNIQUE", + "CREATE CONSTRAINT capacity_week IF NOT EXISTS FOR (n:CapacitySnapshot) REQUIRE n.week_id IS UNIQUE", + "CREATE CONSTRAINT etapp_name IF NOT EXISTS FOR (n:Etapp) REQUIRE n.name IS UNIQUE", + ] + for cypher in constraints: + session.run(cypher) + print(" ✓ Constraints ready.") + + +# --------------------------------------------------------------------------- +# Seed: Project, WorkOrder, Station, Product, Week nodes + core relationships +# --------------------------------------------------------------------------- + +def seed_production(session, rows: list[dict]): + print("Seeding production data …") + + # Build sorted week list for FOLLOWS logic + week_sort = {"w1": 1, "w2": 2, "w3": 3, "w4": 4, "w5": 5, "w6": 6, "w7": 7, "w8": 8} + + for row in rows: + project_id = row["project_id"].strip() + project_number = row["project_number"].strip() + project_name = row["project_name"].strip() + product_type = row["product_type"].strip() + unit = row["unit"].strip() + unit_factor = float(row["unit_factor"]) + quantity = int(row["quantity"]) + station_code = row["station_code"].strip() + station_name = row["station_name"].strip() + etapp = row["etapp"].strip() + bop = row["bop"].strip() + week = row["week"].strip() + planned_hrs = float(row["planned_hours"]) + actual_hrs = float(row["actual_hours"]) + completed = int(row["completed_units"]) + + wo_id = workorder_id(project_id, station_code, week, product_type) + var 
= variance_pct(planned_hrs, actual_hrs) + bottleneck = is_bottleneck(planned_hrs, actual_hrs) + + session.run(""" + MERGE (p:Project {project_id: $project_id}) + SET p.project_number = $project_number, + p.project_name = $project_name, + p.etapp = $etapp, + p.bop = $bop + + MERGE (st:Station {station_code: $station_code}) + SET st.station_name = $station_name + + MERGE (pr:Product {product_type: $product_type}) + SET pr.unit = $unit, + pr.unit_factor = $unit_factor + + MERGE (wk:Week {week_id: $week}) + + MERGE (wo:WorkOrder {workorder_id: $wo_id}) + SET wo.planned_hours = $planned_hrs, + wo.actual_hours = $actual_hrs, + wo.completed_units = $completed, + wo.variance_pct = $var, + wo.is_bottleneck = $bottleneck, + wo.week = $week, + wo.station_code = $station_code, + wo.project_id = $project_id, + wo.product_type = $product_type + + MERGE (p)-[:HAS_WORKORDER]->(wo) + MERGE (wo)-[:AT_STATION]->(st) + MERGE (wo)-[:PRODUCES]->(pr) + + MERGE (wo)-[r:SCHEDULED_IN]->(wk) + SET r.planned_hours = $planned_hrs, + r.actual_hours = $actual_hrs, + r.completed_units = $completed + """, + project_id=project_id, + project_number=project_number, + project_name=project_name, + etapp=etapp, + bop=bop, + station_code=station_code, + station_name=station_name, + product_type=product_type, + unit=unit, + unit_factor=unit_factor, + week=week, + wo_id=wo_id, + planned_hrs=planned_hrs, + actual_hrs=actual_hrs, + completed=completed, + var=var, + bottleneck=bottleneck, + ) + + # (Project)-[:IN_ETAPP]->(Etapp) — ET1 and ET2 only + if etapp in ("ET1", "ET2"): + session.run(""" + MERGE (e:Etapp {name: $etapp}) + MERGE (p:Project {project_id: $project_id}) + MERGE (p)-[:IN_ETAPP]->(e) + """, etapp=etapp, project_id=project_id) + + print(f" ✓ Seeded {len(rows)} WorkOrder rows.") + + +# --------------------------------------------------------------------------- +# Seed: FEEDS_INTO between stations (production flow order) +# 
--------------------------------------------------------------------------- + +def seed_feeds_into(session): + print("Creating FEEDS_INTO relationships …") + for i in range(len(STATION_FLOW) - 1): + src = STATION_FLOW[i] + dst = STATION_FLOW[i + 1] + session.run(""" + MATCH (a:Station {station_code: $src}) + MATCH (b:Station {station_code: $dst}) + MERGE (a)-[:FEEDS_INTO]->(b) + """, src=src, dst=dst) + print(f" ✓ Created {len(STATION_FLOW)-1} FEEDS_INTO relationships.") + + +# --------------------------------------------------------------------------- +# Seed: FOLLOWS between WorkOrders (same project, same station, consecutive weeks) +# --------------------------------------------------------------------------- + +def seed_follows(session, rows: list[dict]): + print("Creating FOLLOWS relationships …") + week_order = {"w1": 1, "w2": 2, "w3": 3, "w4": 4, "w5": 5, "w6": 6, "w7": 7, "w8": 8} + + # Group by (project_id, station_code, product_type) + key = lambda r: (r["project_id"].strip(), r["station_code"].strip(), r["product_type"].strip()) + sorted_rows = sorted(rows, key=key) + + count = 0 + for grp_key, group in groupby(sorted_rows, key=key): + project_id, station_code, product_type = grp_key + group_list = sorted(list(group), key=lambda r: week_order.get(r["week"].strip(), 99)) + + for idx in range(len(group_list) - 1): + cur = group_list[idx] + nxt = group_list[idx + 1] + cur_week = cur["week"].strip() + nxt_week = nxt["week"].strip() + # Only link consecutive weeks + if week_order.get(nxt_week, 99) - week_order.get(cur_week, 0) == 1: + wo1 = workorder_id(project_id, station_code, cur_week, product_type) + wo2 = workorder_id(project_id, station_code, nxt_week, product_type) + session.run(""" + MATCH (a:WorkOrder {workorder_id: $wo1}) + MATCH (b:WorkOrder {workorder_id: $wo2}) + MERGE (a)-[:FOLLOWS]->(b) + """, wo1=wo1, wo2=wo2) + count += 1 + + print(f" ✓ Created {count} FOLLOWS relationships.") + + +# 
--------------------------------------------------------------------------- +# Seed: Workers, Certifications, ASSIGNED_TO, CAN_COVER, CERTIFIED_IN, REQUIRES +# --------------------------------------------------------------------------- + +def seed_workers(session, rows: list[dict]): + print("Seeding workers …") + for row in rows: + worker_id = row["worker_id"].strip() + name = row["name"].strip() + role = row["role"].strip() + primary_sta = row["primary_station"].strip() + can_cover = [s.strip() for s in row["can_cover_stations"].split(",")] + certs = [c.strip() for c in row["certifications"].split(",")] + hours_pw = int(row["hours_per_week"]) + wtype = row["type"].strip() + + session.run(""" + MERGE (w:Worker {worker_id: $worker_id}) + SET w.name = $name, + w.role = $role, + w.hours_per_week = $hours_pw, + w.type = $wtype + """, worker_id=worker_id, name=name, role=role, hours_pw=hours_pw, wtype=wtype) + + # ASSIGNED_TO primary station (skip "all" sentinel for Victor Elm) + if primary_sta != "all": + session.run(""" + MATCH (w:Worker {worker_id: $worker_id}) + MATCH (s:Station {station_code: $station_code}) + MERGE (w)-[:ASSIGNED_TO]->(s) + """, worker_id=worker_id, station_code=primary_sta) + + # CAN_COVER stations + for sc in can_cover: + if sc: + session.run(""" + MATCH (w:Worker {worker_id: $worker_id}) + MATCH (s:Station {station_code: $station_code}) + MERGE (w)-[:CAN_COVER]->(s) + """, worker_id=worker_id, station_code=sc) + + # CERTIFIED_IN certifications + for cert in certs: + if cert: + session.run(""" + MERGE (c:Certification {name: $cert}) + WITH c + MATCH (w:Worker {worker_id: $worker_id}) + MERGE (w)-[:CERTIFIED_IN]->(c) + """, cert=cert, worker_id=worker_id) + + # REQUIRES: link each station this worker covers to their certifications + for sc in can_cover: + if sc: + for cert in certs: + if cert: + session.run(""" + MATCH (s:Station {station_code: $station_code}) + MATCH (c:Certification {name: $cert}) + MERGE (s)-[:REQUIRES]->(c) + """, 
station_code=sc, cert=cert) + + print(f" ✓ Seeded {len(rows)} workers.") + + +# --------------------------------------------------------------------------- +# Seed: CapacitySnapshot + HAS_CAPACITY +# --------------------------------------------------------------------------- + +def seed_capacity(session, rows: list[dict]): + print("Seeding capacity snapshots …") + for row in rows: + week = row["week"].strip() + session.run(""" + MERGE (wk:Week {week_id: $week}) + + MERGE (cs:CapacitySnapshot {week_id: $week}) + SET cs.own_staff_count = $own_staff, + cs.hired_staff_count = $hired_staff, + cs.own_hours = $own_hours, + cs.hired_hours = $hired_hours, + cs.overtime_hours = $overtime, + cs.total_capacity = $total_cap, + cs.total_planned = $total_planned, + cs.deficit = $deficit + + MERGE (wk)-[:HAS_CAPACITY]->(cs) + """, + week=week, + own_staff=int(row["own_staff_count"]), + hired_staff=int(row["hired_staff_count"]), + own_hours=int(row["own_hours"]), + hired_hours=int(row["hired_hours"]), + overtime=int(row["overtime_hours"]), + total_cap=int(row["total_capacity"]), + total_planned=int(row["total_planned"]), + deficit=int(row["deficit"]), + ) + print(f" ✓ Seeded {len(rows)} CapacitySnapshot nodes.") + + +# --------------------------------------------------------------------------- +# Verification summary +# --------------------------------------------------------------------------- + +def print_summary(session): + print("\n--- Graph Summary ---") + for label in ["Project", "WorkOrder", "Station", "Product", "Week", + "Worker", "Certification", "CapacitySnapshot", "Etapp"]: + result = session.run(f"MATCH (n:{label}) RETURN count(n) AS cnt") + cnt = result.single()["cnt"] + print(f" {label}: {cnt}") + + result = session.run("MATCH ()-[r]->() RETURN count(r) AS cnt") + print(f" Relationships: {result.single()['cnt']}") + + result = session.run("CALL db.relationshipTypes() YIELD relationshipType RETURN collect(relationshipType)") + rels = result.single()[0] + print(f" 
Relationship types: {rels}") + print("---") + + +# --------------------------------------------------------------------------- +# Main +# --------------------------------------------------------------------------- + +def main(): + print(f"Connecting to Neo4j at {NEO4J_URI} …") + driver = GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USER, NEO4J_PASSWORD)) + driver.verify_connectivity() + print(" ✓ Connected.") + + production_rows = read_csv(PRODUCTION_CSV) + workers_rows = read_csv(WORKERS_CSV) + capacity_rows = read_csv(CAPACITY_CSV) + + with driver.session() as session: + create_constraints(session) + seed_production(session, production_rows) + seed_feeds_into(session) + seed_follows(session, production_rows) + seed_workers(session, workers_rows) + seed_capacity(session, capacity_rows) + print_summary(session) + + driver.close() + print("\n✅ Graph seeding complete. Safe to re-run.") + + +if __name__ == "__main__": + main()
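
Review note: the consecutive-week rule in `seed_follows` can be checked without a running Neo4j instance. The block below is a minimal sketch, not part of the submission. `follows_pairs`, `WEEK_ORDER`, and `sample_rows` are hypothetical names introduced here, and the rows are made-up illustration data rather than values from `factory_production.csv`. It assumes every `week` value is one of `w1` to `w8` and already stripped of whitespace.

```python
from itertools import groupby

# Hypothetical helper mirroring the week ordering used in seed_follows.
WEEK_ORDER = {f"w{n}": n for n in range(1, 9)}


def follows_pairs(rows):
    """Return (earlier_id, later_id) WorkOrder ID pairs for consecutive weeks.

    Rows are grouped by (project_id, station_code, product_type); within a
    group, two work orders are paired only when their weeks differ by 1.
    """
    key = lambda r: (r["project_id"], r["station_code"], r["product_type"])
    pairs = []
    for (project_id, station_code, product_type), group in groupby(
        sorted(rows, key=key), key=key
    ):
        ordered = sorted(group, key=lambda r: WEEK_ORDER[r["week"]])
        for cur, nxt in zip(ordered, ordered[1:]):
            # A gap such as w2 -> w4 is skipped, as in the seeder.
            if WEEK_ORDER[nxt["week"]] - WEEK_ORDER[cur["week"]] == 1:
                pairs.append((
                    f"{project_id}_{station_code}_{cur['week']}_{product_type}",
                    f"{project_id}_{station_code}_{nxt['week']}_{product_type}",
                ))
    return pairs


# Made-up rows: w1 and w2 are adjacent, w4 leaves a gap after w2.
sample_rows = [
    {"project_id": "P01", "station_code": "011", "product_type": "IQB", "week": "w1"},
    {"project_id": "P01", "station_code": "011", "product_type": "IQB", "week": "w2"},
    {"project_id": "P01", "station_code": "011", "product_type": "IQB", "week": "w4"},
]

print(follows_pairs(sample_rows))
```

Keeping the pairing logic in a pure function like this would let the seeder build the pair list first and then issue one `MERGE` per pair, which also makes the rule unit-testable.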