fedcrop

A privacy-preserving federated learning system for crop yield prediction. Ten geographically distributed farm nodes, covering four crop and climate-zone pairings, collaboratively train a shared neural network without ever sharing raw sensor data. FedAvg and the Gaussian differential privacy mechanism are implemented from scratch in NumPy.

Why federated learning for agriculture?

Farm data is commercially sensitive. Yield records, soil health, and input costs are competitive intelligence that farmers will not share with a central server. Federated learning solves this: each farm trains locally on its own data, and only model weight updates (never raw observations) are sent to a coordination server. The server aggregates these into a global model that benefits from the collective knowledge of the entire network.

This matters because a single farm's data is rarely enough to build a reliable model — especially for rare stress events (drought years, disease outbreaks) that happen infrequently at any one location but are well-represented across a network.

Architecture

  Farm 0 (Wheat / Temperate)   Farm 1 (Rice / Tropical)
  Farm 2 (Corn / Arid)         Farm 3 (Soybean / Mediterranean)
  ...                          Farm 9 (Rice / Tropical)
         │                           │
         │  weights only (never data)│
         ▼                           ▼
  ┌──────────────────────────────────────────┐
  │            FedAvg Server                 │
  │  w_global ← Σ (n_k / n_total) * w_k     │
  └──────────────────────────────────────────┘
         │
         │  broadcast global weights
         ▼
    next round of local training

Each round:

1. The server broadcasts the current global weights to all farm clients.
2. Each farm trains locally for 20 epochs on its private sensor data.
3. Each farm optionally applies Gaussian DP noise to its weights before sending them.
4. The server computes the sample-weighted average of all received weights.

This is repeated for 60 rounds.
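The round structure can be sketched in a few lines of NumPy. This is a minimal illustration rather than the repository's code: `local_train` is a hypothetical stand-in (gradient descent on a linear model) for the MLP training step, and the function names, client sizes, and toy data are all assumptions made for the example.

```python
import numpy as np

def local_train(w, X, y, epochs=20, lr=0.05):
    """Hypothetical local step: full-batch gradient descent on a linear model."""
    w = w.copy()
    for _ in range(epochs):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def fedavg(weights, counts):
    """Sample-weighted average: w_global = sum_k (n_k / n_total) * w_k."""
    counts = np.asarray(counts, dtype=float)
    stacked = np.stack(weights)
    return (counts[:, None] * stacked).sum(axis=0) / counts.sum()

def run_round(w_global, clients):
    """One round: broadcast, train on each client, aggregate on the server."""
    updates = [local_train(w_global, X, y) for X, y in clients]
    counts = [len(y) for _, y in clients]
    return fedavg(updates, counts)

# Toy clients with different sample counts (assumed values, not farm data).
rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])
clients = []
for n in (240, 120, 60):
    X = rng.normal(size=(n, 3))
    y = X @ true_w + 0.1 * rng.normal(size=n)
    clients.append((X, y))

w = np.zeros(3)
for _ in range(60):
    w = run_round(w, clients)
```

Only the weight vectors cross the client boundary in `run_round`; the `(X, y)` pairs are read only inside `local_train`.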

What each module does

fedcrop/
├── main.py              # Orchestrates the full experiment
└── fedcrop/
    ├── farm_data.py     # Sensor data simulator for 4 crop types × 4 climate zones
    ├── mlp.py           # Two-hidden-layer MLP (NumPy only) with Adam optimizer
    ├── privacy.py       # Gaussian mechanism: clip weights + calibrated noise
    ├── client.py        # Farm node: receive weights → train locally → send back
    ├── server.py        # FedAvg aggregation: weighted average of client updates
    └── __init__.py

farm_data.py: Generates 300 growing seasons per farm from a biophysical crop-climate simulator. Each season produces 120 daily observations of temperature, humidity, rainfall, soil moisture, soil pH, and NDVI. These are summarised into a 24-value feature vector (mean, std, min, max for each of the 6 sensors). Yield is computed from crop-specific stress functions (temperature stress, water score, NDVI score, pH score), so the relationship between features and yield differs from crop to crop.
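The feature summary can be illustrated as follows. This is a sketch under assumptions: the function name, the sensor ordering, and the statistic-major layout of the output vector are not taken from `farm_data.py`, and the random readings are placeholders rather than simulated sensor data.

```python
import numpy as np

SENSORS = ["temperature", "humidity", "rainfall", "soil_moisture", "soil_ph", "ndvi"]

def summarise_season(daily):
    """Collapse a (120, 6) array of daily readings into a 24-value feature vector.

    Assumed layout: [means(6), stds(6), mins(6), maxs(6)].
    """
    daily = np.asarray(daily, dtype=float)
    stats = [daily.mean(axis=0), daily.std(axis=0), daily.min(axis=0), daily.max(axis=0)]
    return np.concatenate(stats)

rng = np.random.default_rng(0)
season = rng.normal(loc=20.0, scale=5.0, size=(120, len(SENSORS)))  # placeholder readings
features = summarise_season(season)
print(features.shape)  # (24,)
```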

mlp.py: A two-hidden-layer MLP (24→64→32→1) with ReLU activations, He initialisation, and the Adam optimiser, implemented entirely in NumPy. Weights are exposed as a single flat vector so that the server can average them directly.
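A rough sketch of the forward pass and the flat-vector view of the weights is shown here. The helper names (`init_params`, `forward`, `flatten`, `unflatten`) and the packing order are assumptions for illustration; the Adam update is omitted and the actual `mlp.py` may be organised differently.

```python
import numpy as np

LAYER_SIZES = [24, 64, 32, 1]

def init_params(rng):
    """He initialisation: weights ~ N(0, 2 / fan_in), biases zero."""
    params = []
    for fan_in, fan_out in zip(LAYER_SIZES[:-1], LAYER_SIZES[1:]):
        W = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))
        params.append((W, np.zeros(fan_out)))
    return params

def forward(params, X):
    """ReLU on the two hidden layers, linear output for regression."""
    h = X
    for W, b in params[:-1]:
        h = np.maximum(0.0, h @ W + b)
    W, b = params[-1]
    return (h @ W + b).ravel()

def flatten(params):
    """Pack all weights and biases into one 1-D vector (assumed order: W, b per layer)."""
    return np.concatenate([np.concatenate([W.ravel(), b]) for W, b in params])

def unflatten(flat):
    """Inverse of flatten for the fixed LAYER_SIZES."""
    params, i = [], 0
    for fan_in, fan_out in zip(LAYER_SIZES[:-1], LAYER_SIZES[1:]):
        W = flat[i:i + fan_in * fan_out].reshape(fan_in, fan_out)
        i += fan_in * fan_out
        b = flat[i:i + fan_out]
        i += fan_out
        params.append((W, b))
    return params

rng = np.random.default_rng(0)
params = init_params(rng)
flat = flatten(params)
print(flat.size)  # 3713 = (24*64 + 64) + (64*32 + 32) + (32*1 + 1)
```

With this layout, averaging models is plain vector arithmetic on `flat`, and `unflatten` restores the layer shapes on the receiving side.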

privacy.py: Implements a Gaussian mechanism in the style of Abadi et al. (2016), applied here to the outgoing weight vector rather than to per-example gradients: the vector is clipped to an L2 norm bound C, then Gaussian noise calibrated to (ε, δ)-differential privacy is added. Also computes the composed privacy budget across all rounds using advanced composition.
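The clip-then-noise step might look like the sketch below. It assumes the classical Gaussian-mechanism calibration σ = C·√(2·ln(1.25/δ))/ε and treats the clip bound C as the L2 sensitivity; the README does not state which calibration `privacy.py` uses, and the function names are hypothetical. Note also that the classical calibration is only proven for ε < 1, so ε = 2.0 sits outside that range.

```python
import numpy as np

def gaussian_sigma(clip_norm, epsilon, delta):
    """Classical calibration (assumed): sigma = C * sqrt(2 ln(1.25/delta)) / epsilon."""
    return clip_norm * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon

def privatise(weights, clip_norm, epsilon, delta, rng):
    """Scale the vector so its L2 norm is at most clip_norm, then add Gaussian noise."""
    norm = np.linalg.norm(weights)
    clipped = weights * min(1.0, clip_norm / max(norm, 1e-12))
    sigma = gaussian_sigma(clip_norm, epsilon, delta)
    return clipped + rng.normal(0.0, sigma, size=weights.shape)

rng = np.random.default_rng(0)
w = rng.normal(size=3713)  # stand-in for a flat weight vector
noisy = privatise(w, clip_norm=1.0, epsilon=2.0, delta=1e-5, rng=rng)
```

Because the noise scale depends only on C, ε, and δ, a smaller ε means proportionally larger noise on every coordinate of the vector.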

client.py: Encapsulates the farm node logic. Each node keeps a local 80/20 train/validation split that never leaves the node, and applies DP noise to its outgoing weights if enabled.
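As an illustration of the local split, the sketch below shuffles one farm's seasons and holds out 20% for validation. The function name, the seeding, and the use of a random permutation are assumptions; `client.py` may split differently.

```python
import numpy as np

def train_val_split(X, y, val_fraction=0.2, seed=0):
    """Shuffle indices once, then hold out the first val_fraction for validation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_val = int(round(val_fraction * len(y)))
    val_idx, train_idx = idx[:n_val], idx[n_val:]
    return (X[train_idx], y[train_idx]), (X[val_idx], y[val_idx])

X = np.zeros((300, 24))   # placeholder features: 300 seasons x 24 statistics
y = np.arange(300.0)      # placeholder yields, unique so overlap is easy to check
(train_X, train_y), (val_X, val_y) = train_val_split(X, y)
print(len(train_y), len(val_y))  # 240 60
```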

server.py: Pure aggregation. Computes the sample-count-weighted mean of the incoming weight vectors and records per-round statistics for analysis.

Results

  ──────────────────────────────────────────────────────────
  Summary — RMSE Comparison (lower is better)
  ──────────────────────────────────────────────────────────
  Method                         Final RMSE    vs Local
  ────────────────────────────────────────────────────
  Local-Only (no federation)       0.8297 t/ha   baseline
  Federated (FedAvg)               0.5326 t/ha   +35.8%
  Federated + DP  (ε=2.0)          0.7955 t/ha    +4.1%

Federated learning reduces prediction error by 35.8% relative to isolated local training (0.8297 to 0.5326 t/ha). Farms with high local error, such as corn in arid climates (Farm 6, RMSE 0.9248 t/ha locally) and rice in tropical climates (Farm 9, 0.9694 t/ha), have the most to gain from knowledge shared by farms with other crop types and climate patterns.

Adding differential privacy (ε=2.0 per round) injects noise that raises RMSE from 0.53 to 0.80 t/ha. That is still a 4.1% improvement over local-only training, and it illustrates the classic privacy-utility tradeoff. The DP rounds oscillate rather than descend smoothly, as expected when fresh Gaussian noise is added in every round.
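The percentages in the summary table follow from the relative reduction in RMSE against the local-only baseline. A short check, using only the figures reported above (the helper name is illustrative):

```python
def improvement(baseline, rmse):
    """Relative RMSE reduction versus the baseline, in percent."""
    return 100.0 * (baseline - rmse) / baseline

local, fed, fed_dp = 0.8297, 0.5326, 0.7955
print(f"{improvement(local, fed):.1f}%")     # 35.8%
print(f"{improvement(local, fed_dp):.1f}%")  # 4.1%
```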

Convergence (Federated, no DP)

  Round  1:  RMSE 3.355  ████████████████████████████████████████
  Round  5:  RMSE 2.725  ████████████████████████████████████████
  Round 10:  RMSE 1.944  ████████████████████████████████████████
  Round 15:  RMSE 1.206  ██████████████████████████████████████░░
  Round 20:  RMSE 0.768  ████████████████████████░░░░░░░░░░░░░░░░
  Round 25:  RMSE 0.591  ██████████████████░░░░░░░░░░░░░░░░░░░░░░
  Round 30:  RMSE 0.547  █████████████████░░░░░░░░░░░░░░░░░░░░░░░
  Round 40:  RMSE 0.535  █████████████████░░░░░░░░░░░░░░░░░░░░░░░
  Round 55:  RMSE 0.533  █████████████████░░░░░░░░░░░░░░░░░░░░░░░
  Round 60:  RMSE 0.533  █████████████████░░░░░░░░░░░░░░░░░░░░░░░

The model converges in about 30 rounds and then plateaus near 0.53 t/ha, which suggests the global model has settled on a stable solution across all 10 heterogeneous farm distributions.

Per-farm local baseline (no federation)

  Farm  0 │ wheat    │ temperate      │  0.7355 t/ha
  Farm  1 │ rice     │ tropical       │  0.7789 t/ha
  Farm  2 │ corn     │ arid           │  0.9058 t/ha
  Farm  3 │ soybean  │ mediterranean  │  0.7421 t/ha
  Farm  4 │ wheat    │ temperate      │  0.8423 t/ha
  Farm  5 │ rice     │ tropical       │  0.9358 t/ha
  Farm  6 │ corn     │ arid           │  0.9248 t/ha
  Farm  7 │ soybean  │ mediterranean  │  0.7090 t/ha
  Farm  8 │ wheat    │ temperate      │  0.7534 t/ha
  Farm  9 │ rice     │ tropical       │  0.9694 t/ha
  ─────────────────────────────────────────────────
  Mean                                  0.8297 t/ha

Privacy Budget

  Mechanism            : Gaussian (Abadi et al., 2016)
  Per-round ε          : 2.0
  Per-round δ          : 1e-5
  Rounds               : 60
  Composed ε (60 rounds): 52.565   ← advanced composition theorem
  Composed δ           : 0.0006

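The per-round budget of ε=2.0 is a reasonable starting point for demonstration, but the composed ε of about 52.6 over 60 rounds is a weak formal guarantee. Tighter guarantees (per-round ε<1.0) inject more noise, so they would need more farms to average that noise out, or more rounds at the cost of a larger composed budget.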
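The composed figures can be reproduced with the expression below. This is an assumption inferred from the reported numbers, not a statement of what `privacy.py` does: it keeps only the √k term of advanced composition, ε·√(k·ln(1/δ)), and sums δ over rounds. The full advanced composition theorem also carries a k·ε·(e^ε − 1) term, which is far from negligible at ε = 2.0, and basic composition would give k·ε = 120.

```python
import math

def composed_budget(eps, delta, rounds):
    """Leading sqrt(k) term of advanced composition (assumed form), delta summed over rounds."""
    eps_total = eps * math.sqrt(rounds * math.log(1.0 / delta))
    delta_total = rounds * delta
    return eps_total, delta_total

eps_total, delta_total = composed_budget(2.0, 1e-5, 60)
print(f"{eps_total:.3f}, {delta_total:.4f}")  # 52.565, 0.0006
```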

Tech Stack

| Component | Implementation |
|---|---|
| Neural network | NumPy from scratch (no PyTorch/TensorFlow) |
| Optimiser | Adam (β₁=0.9, β₂=0.999) |
| Aggregation | FedAvg (McMahan et al., 2017) |
| Privacy | Gaussian mechanism (Abadi et al., 2016) |
| Data simulation | Biophysical crop-climate model |
| Runtime | Pure Python 3.11, single dependency (NumPy) |

How to Run

Docker (recommended)

docker build -t fedcrop .
docker run --rm fedcrop

Local

pip install -r requirements.txt
python main.py

Configuration (in `main.py`)

| Parameter | Default | Effect |
|---|---|---|
| `N_FARMS` | 10 | Number of federated farm nodes |
| `N_SEASONS` | 300 | Simulated growing seasons (samples) per farm, before the local 80/20 train/val split |
| `FL_ROUNDS` | 60 | Federation rounds |
| `LOCAL_EPOCHS` | 20 | Local training epochs per round |
| `USE_DP` | True | Enable differential privacy |
| `DP_EPSILON` | 2.0 | Per-round privacy budget ε |
| `DP_DELTA` | 1e-5 | Per-round DP failure probability δ |

References

McMahan et al., "Communication-Efficient Learning of Deep Networks from Decentralized Data", AISTATS 2017 — FedAvg algorithm
Abadi et al., "Deep Learning with Differential Privacy", CCS 2016 — Gaussian mechanism
