A privacy-preserving federated learning system for crop yield prediction. Ten geographically distributed farm nodes — each growing a different crop in a different climate zone — collaboratively train a shared neural network without ever sharing raw sensor data. Implements FedAvg and the Gaussian differential privacy mechanism from scratch in NumPy.
Farm data is commercially sensitive. Yield records, soil health, and input costs are competitive intelligence that farmers will not share with a central server. Federated learning solves this: each farm trains locally on its own data, and only model weight updates (never raw observations) are sent to a coordination server. The server aggregates these into a global model that benefits from the collective knowledge of the entire network.
This matters because a single farm's data is rarely enough to build a reliable model — especially for rare stress events (drought years, disease outbreaks) that happen infrequently at any one location but are well-represented across a network.
```
  Farm 0 (Wheat / Temperate)      Farm 1 (Rice / Tropical)
  Farm 2 (Corn / Arid)            Farm 3 (Soybean / Mediterranean)
  ...                             Farm 9 (Rice / Tropical)
          │                             │
          │  weights only (never data)  │
          ▼                             ▼
  ┌──────────────────────────────────────────┐
  │              FedAvg Server               │
  │    w_global ← Σ (n_k / n_total) * w_k    │
  └──────────────────────────────────────────┘
                        │
                        │  broadcast global weights
                        ▼
          next round of local training
```
Each round:
- Server broadcasts the current global weights to all farm clients
- Each farm trains locally for 20 epochs on its private sensor data
- Each farm optionally clips its weight vector and adds Gaussian DP noise before sending it
- Server computes the sample-weighted average of all received weights
- Repeat for 60 rounds
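The round structure above can be sketched in a few lines of NumPy. This is a toy stand-in, not the repository's code: the hypothetical helpers `local_train`, `fedavg` and `run_federation` fit a linear model with plain gradient descent (instead of the MLP with Adam), while the server step is the same sample-weighted average shown in the diagram.

```python
import numpy as np

def local_train(w, X, y, epochs=20, lr=0.05):
    """Hypothetical local update: full-batch gradient descent on squared error."""
    w = w.copy()
    for _ in range(epochs):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def fedavg(client_weights, client_sizes):
    """Sample-weighted average: w_global = sum_k (n_k / n_total) * w_k."""
    sizes = np.asarray(client_sizes, dtype=float)
    return np.average(np.stack(client_weights), axis=0, weights=sizes)

def run_federation(clients, n_features, rounds=60, epochs=20):
    w_global = np.zeros(n_features)
    for _ in range(rounds):
        # Broadcast w_global; each client trains on its private (X, y).
        updates = [local_train(w_global, X, y, epochs) for X, y in clients]
        # Only weights and sample counts reach the server.
        w_global = fedavg(updates, [len(y) for _, y in clients])
    return w_global

# Toy data: three "farms" with different sample counts share one true model.
rng = np.random.default_rng(0)
w_true = np.array([1.5, -2.0, 0.5])
clients = []
for n in (50, 120, 300):
    X = rng.normal(size=(n, 3))
    clients.append((X, X @ w_true + 0.1 * rng.normal(size=n)))

w = run_federation(clients, n_features=3)
print(np.round(w, 2))
```

Weighting by sample count means a farm with 300 seasons pulls the global model six times harder than one with 50, which is the behaviour the `n_k / n_total` term encodes.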
```
fedcrop/
├── main.py              # Orchestrates the full experiment
└── fedcrop/
    ├── farm_data.py     # Sensor data simulator for 4 crop types × 4 climate zones
    ├── mlp.py           # Two-hidden-layer MLP (NumPy only) with Adam optimiser
    ├── privacy.py       # Gaussian mechanism: clip weights + calibrated noise
    ├── client.py        # Farm node: receive weights → train locally → send back
    ├── server.py        # FedAvg aggregation: weighted average of client updates
    └── __init__.py
```
farm_data.py — Generates 300 growing seasons per farm using realistic biophysical models. Each season produces 120 daily observations of temperature, humidity, rainfall, soil moisture, soil pH, and NDVI. These are summarised into 24 statistics (mean, std, min, max × 6 sensors) as the feature vector. Yield is computed from crop-specific stress functions (temperature stress, water score, NDVI score, pH score), making the task genuinely non-trivial.
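The summarisation step can be sketched as follows. The function name `season_features`, the sensor order and the feature order (all six means, then all stds, mins and maxes) are assumptions for illustration; `farm_data.py` may order the 24 statistics differently.

```python
import numpy as np

# Assumed sensor column order for the (120, 6) daily array.
SENSORS = ["temperature", "humidity", "rainfall", "soil_moisture", "soil_ph", "ndvi"]

def season_features(daily):
    """Summarise a (120, 6) array of daily readings into 24 features.

    Hypothetical layout: mean, std, min, max, each over all 6 sensors.
    """
    daily = np.asarray(daily, dtype=float)
    if daily.shape != (120, len(SENSORS)):
        raise ValueError(f"expected shape (120, 6), got {daily.shape}")
    return np.concatenate([
        daily.mean(axis=0),
        daily.std(axis=0),
        daily.min(axis=0),
        daily.max(axis=0),
    ])

# One simulated season of random readings -> one 24-dim feature vector.
rng = np.random.default_rng(1)
season = rng.normal(size=(120, 6))
x = season_features(season)
print(x.shape)  # (24,)
```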
mlp.py — A two-hidden-layer MLP (24→64→32→1) with ReLU activations, He initialisation, and Adam optimiser, implemented entirely in NumPy. Weights are exposed as a flat vector for FedAvg aggregation.
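Exposing the weights as one flat vector is what lets FedAvg and the DP mechanism treat the model as a single array. Below is a minimal sketch of that round trip for the 24→64→32→1 architecture, with He initialisation and ReLU hidden layers as described; the helper names and the "W then b, layer by layer" packing order are hypothetical, not taken from `mlp.py`.

```python
import numpy as np

LAYER_SIZES = [24, 64, 32, 1]

def init_params(sizes, rng):
    """He-initialised weights and zero biases for each dense layer."""
    params = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        W = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))
        params.append((W, np.zeros(fan_out)))
    return params

def flatten(params):
    """Concatenate every W and b into one 1-D vector for aggregation or DP."""
    return np.concatenate([a.ravel() for W, b in params for a in (W, b)])

def unflatten(vec, sizes):
    """Inverse of flatten: slice the vector back into (W, b) per layer."""
    params, i = [], 0
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        W = vec[i:i + fan_in * fan_out].reshape(fan_in, fan_out)
        i += fan_in * fan_out
        b = vec[i:i + fan_out]
        i += fan_out
        params.append((W, b))
    return params

def forward(params, X):
    """ReLU on hidden layers, linear output for yield regression."""
    h = X
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)
    W, b = params[-1]
    return (h @ W + b).ravel()

rng = np.random.default_rng(0)
params = init_params(LAYER_SIZES, rng)
vec = flatten(params)
print(vec.size)  # 3713
```

The count follows from the layer sizes: (24·64 + 64) + (64·32 + 32) + (32·1 + 1) = 3713 parameters per farm update.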
privacy.py — Implements the Gaussian mechanism from Abadi et al. (2016): clips the weight vector to an L2 norm bound C, then adds noise calibrated to (ε, δ)-differential privacy. Also computes the composed privacy budget across all rounds using advanced composition.
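A minimal sketch of the clip-then-noise step, assuming the classic Gaussian-mechanism calibration σ = C·√(2·ln(1.25/δ))/ε and treating the clip bound C as the L2 sensitivity; both are assumptions about `privacy.py`, and the helper names are hypothetical. Note that the textbook proof of this calibration assumes ε < 1, so at ε = 2.0 it is best read as a heuristic.

```python
import numpy as np

def clip_l2(w, clip_norm):
    """Scale w down so that ||w||_2 <= clip_norm (unchanged if already inside)."""
    norm = np.linalg.norm(w)
    return w * min(1.0, clip_norm / norm) if norm > 0 else w

def gaussian_sigma(clip_norm, epsilon, delta):
    """Classic Gaussian-mechanism noise scale, assuming L2 sensitivity clip_norm."""
    return clip_norm * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon

def privatize(w, clip_norm, epsilon, delta, rng):
    """Clip the outgoing weight vector, then add isotropic Gaussian noise."""
    sigma = gaussian_sigma(clip_norm, epsilon, delta)
    return clip_l2(w, clip_norm) + rng.normal(0.0, sigma, size=w.shape)

# Privatise a 3713-dim weight vector with the README's per-round settings.
rng = np.random.default_rng(0)
w = rng.normal(size=3713)
w_noisy = privatize(w, clip_norm=1.0, epsilon=2.0, delta=1e-5, rng=rng)
print(round(float(gaussian_sigma(1.0, 2.0, 1e-5)), 4))  # 2.4224
```

Because σ scales with C/ε, halving ε doubles the per-coordinate noise, which is the lever behind the privacy-utility tradeoff in the results.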
client.py — Encapsulates the farm node logic. Maintains a local 80/20 train/val split that never leaves the node. Applies DP noise to outgoing weights if enabled.
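The node-side contract can be sketched as a small class: the 80/20 split is drawn once and stored in private attributes, and only the updated weights plus the training-sample count leave `local_round`. `FarmClientSketch` and its methods are hypothetical names; the training and DP steps are passed in as callables so the sketch does not assume anything about `client.py`'s internals.

```python
import numpy as np

class FarmClientSketch:
    """Hypothetical farm node: the 80/20 split is made once and kept local."""

    def __init__(self, X, y, seed=0, train_frac=0.8):
        rng = np.random.default_rng(seed)
        idx = rng.permutation(len(y))
        cut = int(train_frac * len(y))
        self._X_train, self._y_train = X[idx[:cut]], y[idx[:cut]]
        self._X_val, self._y_val = X[idx[cut:]], y[idx[cut:]]

    @property
    def n_train(self):
        # Only this count (used for FedAvg weighting) is shared with the server.
        return len(self._y_train)

    def local_round(self, w_global, train_fn, dp_fn=None):
        """Receive weights -> train on private data -> optionally privatise -> send."""
        w = train_fn(w_global, self._X_train, self._y_train)
        if dp_fn is not None:
            w = dp_fn(w)
        return w, self.n_train

    def val_rmse(self, predict_fn, w):
        """RMSE on the local validation split, which never leaves the node."""
        err = predict_fn(w, self._X_val) - self._y_val
        return float(np.sqrt(np.mean(err ** 2)))

# 300 seasons -> 240 training samples and 60 validation samples per farm.
X = np.arange(600, dtype=float).reshape(300, 2)
y = X @ np.array([1.0, 0.0])
client = FarmClientSketch(X, y)
print(client.n_train)  # 240
```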
server.py — Pure aggregation: computes the sample-count-weighted mean of incoming weight vectors. Records per-round statistics for analysis.
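A minimal sketch of an aggregator that also records per-round statistics. `FedAvgServerSketch` is a hypothetical name, and the logged fields (update norm, client spread) are illustrative choices, not necessarily what `server.py` records.

```python
import numpy as np

class FedAvgServerSketch:
    """Hypothetical aggregator: sample-weighted mean plus per-round stats."""

    def __init__(self, w_init):
        self.w_global = np.asarray(w_init, dtype=float)
        self.history = []

    def aggregate(self, client_weights, client_sizes):
        W = np.stack(client_weights)
        n = np.asarray(client_sizes, dtype=float)
        w_new = (n / n.sum()) @ W  # sum_k (n_k / n_total) * w_k
        self.history.append({
            "round": len(self.history) + 1,
            "n_clients": len(client_sizes),
            "n_samples": int(n.sum()),
            # How far the global model moved this round.
            "update_norm": float(np.linalg.norm(w_new - self.w_global)),
            # Mean distance of client models from the new global model.
            "client_spread": float(np.mean(np.linalg.norm(W - w_new, axis=1))),
        })
        self.w_global = w_new
        return w_new

server = FedAvgServerSketch(np.zeros(2))
w = server.aggregate([np.array([1.0, 0.0]), np.array([0.0, 1.0])], [100, 300])
print(w)  # [0.25 0.75]
```

A shrinking `update_norm` is one simple way to see the plateau that the convergence curve shows, and a large `client_spread` is a hint of how heterogeneous the farm distributions are.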
```
──────────────────────────────────────────────────────────
  Summary — RMSE Comparison (lower is better)
──────────────────────────────────────────────────────────
  Method                          Final RMSE     vs Local
  ────────────────────────────────────────────────────
  Local-Only (no federation)      0.8297 t/ha    baseline
  Federated (FedAvg)              0.5326 t/ha    +35.8%
  Federated + DP (ε=2.0)          0.7955 t/ha    +4.1%
```
Federated learning reduces mean prediction error by 35.8% relative to isolated local training (0.8297 → 0.5326 t/ha). The federation helps most for farms with high-variance crops, such as corn in arid climates (Farm 6 has a local-only RMSE of 0.9248 t/ha), which benefit from patterns learned on other crop types and climate zones.
Adding differential privacy (ε = 2.0) raises RMSE from 0.53 to 0.80 t/ha, which is still 4.1% better than local-only training and illustrates the classic privacy-utility tradeoff. Because fresh Gaussian noise is injected into every update, the DP run oscillates from round to round instead of descending smoothly.
```
Round  1: RMSE 3.355  ████████████████████████████████████████
Round  5: RMSE 2.725  ████████████████████████████████████████
Round 10: RMSE 1.944  █████████████████████████████████████░░░
Round 15: RMSE 1.206  ██████████████████████████████████████░░
Round 20: RMSE 0.768  ████████████████████████░░░░░░░░░░░░░░░░
Round 25: RMSE 0.591  ██████████████████░░░░░░░░░░░░░░░░░░░░░░
Round 30: RMSE 0.547  █████████████████░░░░░░░░░░░░░░░░░░░░░░░
Round 40: RMSE 0.535  █████████████████░░░░░░░░░░░░░░░░░░░░░░░
Round 55: RMSE 0.533  █████████████████░░░░░░░░░░░░░░░░░░░░░░░
Round 60: RMSE 0.533  █████████████████░░░░░░░░░░░░░░░░░░░░░░░
```
The non-private FedAvg run converges in about 30 rounds and then plateaus at an RMSE of 0.533, indicating that training has settled into a stable solution across all 10 heterogeneous farm distributions.
| Farm | Crop | Climate | Local-only RMSE |
|---|---|---|---|
| Farm 0 | wheat | temperate | 0.7355 t/ha |
| Farm 1 | rice | tropical | 0.7789 t/ha |
| Farm 2 | corn | arid | 0.9058 t/ha |
| Farm 3 | soybean | mediterranean | 0.7421 t/ha |
| Farm 4 | wheat | temperate | 0.8423 t/ha |
| Farm 5 | rice | tropical | 0.9358 t/ha |
| Farm 6 | corn | arid | 0.9248 t/ha |
| Farm 7 | soybean | mediterranean | 0.7090 t/ha |
| Farm 8 | wheat | temperate | 0.7534 t/ha |
| Farm 9 | rice | tropical | 0.9694 t/ha |
| Mean | | | 0.8297 t/ha |
```
Mechanism              : Gaussian (Abadi et al., 2016)
Per-round ε            : 2.0
Per-round δ            : 1e-5
Rounds                 : 60
Composed ε (60 rounds) : 52.565  ← advanced composition theorem
Composed δ             : 0.0006
```
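A per-round budget of ε = 2.0 is a reasonable starting point for a demonstration. Tighter per-round guarantees (ε < 1.0) inject proportionally more noise; compensating for it would likely require more farms, since averaging independent noise across more clients shrinks its effect on the global model. Adding rounds instead would also increase the composed budget.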
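To put the reported composed budget in context, the sketch below computes the two textbook composition bounds for k = 60 rounds of (2.0, 1e-5)-DP. The helper names are hypothetical, and the advanced bound is the standard Dwork-Rothblum-Vadhan form ε·√(2k·ln(1/δ′)) + k·ε·(e^ε − 1), here with δ′ = 1e-5 as an assumed slack. With ε = 2 per round the linear term dominates, so that bound (about 841) is looser than basic composition (k·ε = 120). The reported 52.565 appears to equal ε·√(k·ln(1/δ)), which omits both the factor of 2 under the root and the linear term, so it is best treated as an optimistic estimate rather than a guarantee.

```python
import math

def basic_composition(eps, delta, k):
    """k-fold basic composition: (k * eps, k * delta)."""
    return k * eps, k * delta

def advanced_composition(eps, delta, k, delta_prime):
    """Dwork-Rothblum-Vadhan advanced composition bound for k mechanisms."""
    eps_total = (eps * math.sqrt(2 * k * math.log(1 / delta_prime))
                 + k * eps * (math.exp(eps) - 1))
    return eps_total, k * delta + delta_prime

eps, delta, k = 2.0, 1e-5, 60
basic = basic_composition(eps, delta, k)
advanced = advanced_composition(eps, delta, k, delta_prime=1e-5)
# Expression that reproduces the figure printed by the experiment.
simplified = eps * math.sqrt(k * math.log(1 / delta))

print(basic[0], round(advanced[0], 1), round(simplified, 3))
```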
| Component | Implementation |
|---|---|
| Neural network | NumPy from scratch — no PyTorch/TensorFlow |
| Optimiser | Adam (β₁=0.9, β₂=0.999) |
| Aggregation | FedAvg (McMahan et al., 2017) |
| Privacy | Gaussian mechanism (Abadi et al., 2016) |
| Data simulation | Biophysical crop-climate model |
| Runtime | Pure Python 3.11, single dependency (NumPy) |
```bash
# Docker
docker build -t fedcrop .
docker run --rm fedcrop

# Or run locally
pip install -r requirements.txt
python main.py
```

| Parameter | Default | Effect |
|---|---|---|
| `N_FARMS` | 10 | Number of federated farm nodes |
| `N_SEASONS` | 300 | Growing seasons (samples) simulated per farm, before the 80/20 train/val split |
| `FL_ROUNDS` | 60 | Federation rounds |
| `LOCAL_EPOCHS` | 20 | Local training epochs per round |
| `USE_DP` | True | Enable differential privacy |
| `DP_EPSILON` | 2.0 | Per-round privacy budget |
| `DP_DELTA` | 1e-5 | DP failure probability |
- McMahan et al., "Communication-Efficient Learning of Deep Networks from Decentralized Data", AISTATS 2017 — FedAvg algorithm
- Abadi et al., "Deep Learning with Differential Privacy", CCS 2016 — Gaussian mechanism