KaparthyReddy/fedcrop

fedcrop

A privacy-preserving federated learning system for crop yield prediction. Ten geographically distributed farm nodes, covering four crop types and four climate zones, collaboratively train a shared neural network without ever sharing raw sensor data. Implements FedAvg and the Gaussian differential privacy mechanism from scratch in NumPy.


Why federated learning for agriculture?

Farm data is commercially sensitive. Yield records, soil health, and input costs are competitive intelligence that farmers will not share with a central server. Federated learning solves this: each farm trains locally on its own data, and only model weight updates (never raw observations) are sent to a coordination server. The server aggregates these into a global model that benefits from the collective knowledge of the entire network.

This matters because a single farm's data is rarely enough to build a reliable model — especially for rare stress events (drought years, disease outbreaks) that happen infrequently at any one location but are well-represented across a network.


Architecture

  Farm 0 (Wheat / Temperate)   Farm 1 (Rice / Tropical)
  Farm 2 (Corn / Arid)         Farm 3 (Soybean / Mediterranean)
  ...                          Farm 9 (Rice / Tropical)
         │                           │
         │  weights only (never data)│
         ▼                           ▼
  ┌──────────────────────────────────────────┐
  │            FedAvg Server                 │
  │  w_global ← Σ (n_k / n_total) * w_k     │
  └──────────────────────────────────────────┘
         │
         │  broadcast global weights
         ▼
    next round of local training

Each round:

  1. Server broadcasts the current global weights to all farm clients
  2. Each farm trains locally for 20 epochs on its private sensor data
  3. Each farm optionally applies Gaussian DP noise to its weights before sending
  4. Server computes the sample-weighted average of all received weights
  5. Repeat for 60 rounds
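
The round structure above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the repository's code: `fedavg` and `local_train` are hypothetical names, a linear model trained by gradient descent stands in for the MLP, and the optional DP step is left out.

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Sample-weighted average: w_global = sum_k (n_k / n_total) * w_k."""
    sizes = np.asarray(client_sizes, dtype=float)
    stacked = np.stack(client_weights)            # shape (n_clients, n_params)
    return (sizes / sizes.sum()) @ stacked

def local_train(w, X, y, epochs=20, lr=0.1):
    """Hypothetical stand-in for a farm's local training (linear least squares)."""
    w = w.copy()
    for _ in range(epochs):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)   # gradient of the mean squared error
        w -= lr * grad
    return w

rng = np.random.default_rng(0)
true_w = np.array([1.5, -2.0, 0.5])

# Three toy "farms" holding different amounts of private data.
farms = []
for n in (50, 100, 150):
    X = rng.normal(size=(n, 3))
    farms.append((X, X @ true_w + 0.01 * rng.normal(size=n)))

w_global = np.zeros(3)
for _ in range(5):                                             # federation rounds
    updates = [local_train(w_global, X, y) for X, y in farms]  # steps 1-2
    w_global = fedavg(updates, [len(y) for _, y in farms])     # step 4
```

Only the weight vectors in `updates` cross the client boundary; each `(X, y)` pair stays inside its farm's call to `local_train`.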

What each module does

fedcrop/
├── main.py              # Orchestrates the full experiment
└── fedcrop/
    ├── farm_data.py     # Sensor data simulator for 4 crop types × 4 climate zones
    ├── mlp.py           # Two-hidden-layer MLP (NumPy only) with Adam optimizer
    ├── privacy.py       # Gaussian mechanism: clip weights + calibrated noise
    ├── client.py        # Farm node: receive weights → train locally → send back
    ├── server.py        # FedAvg aggregation: weighted average of client updates
    └── __init__.py

farm_data.py — Generates 300 growing seasons per farm using realistic biophysical models. Each season produces 120 daily observations of temperature, humidity, rainfall, soil moisture, soil pH, and NDVI. These are summarised into 24 statistics (mean, std, min, max × 6 sensors) as the feature vector. Yield is computed from crop-specific stress functions (temperature stress, water score, NDVI score, pH score), making the task genuinely non-trivial.
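
As a rough sketch of the summarisation step, the snippet below collapses one season of daily readings into the 24-element feature vector. The function name, the sensor ordering, and the order of the statistics are assumptions made for illustration, and random numbers stand in for the biophysical simulator.

```python
import numpy as np

# Assumed column order; the repository may order the sensors differently.
SENSORS = ["temperature", "humidity", "rainfall", "soil_moisture", "soil_ph", "ndvi"]

def summarise_season(daily):
    """Collapse a (120, 6) array of daily readings into 24 summary statistics."""
    daily = np.asarray(daily, dtype=float)
    stats = [daily.mean(axis=0), daily.std(axis=0), daily.min(axis=0), daily.max(axis=0)]
    return np.concatenate(stats)   # all means, then stds, then mins, then maxes

rng = np.random.default_rng(0)
season = rng.normal(size=(120, len(SENSORS)))   # placeholder values only
features = summarise_season(season)
print(features.shape)
```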

mlp.py — A two-hidden-layer MLP (24→64→32→1) with ReLU activations, He initialisation, and Adam optimiser, implemented entirely in NumPy. Weights are exposed as a flat vector for FedAvg aggregation.
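
To make the flat-vector idea concrete, here is a small sketch of a 24→64→32→1 network whose parameters can be packed into, and restored from, a single 1-D array. The helper names (`init_params`, `flatten`, `unflatten`, `forward`) are hypothetical and the packing order is an assumption; training and Adam are omitted.

```python
import numpy as np

LAYER_SIZES = [24, 64, 32, 1]

def init_params(rng):
    """He initialisation: weights ~ N(0, 2 / fan_in), biases zero."""
    params = []
    for fan_in, fan_out in zip(LAYER_SIZES[:-1], LAYER_SIZES[1:]):
        W = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))
        params.append((W, np.zeros(fan_out)))
    return params

def flatten(params):
    """Pack every weight matrix and bias into one 1-D vector."""
    return np.concatenate([a.ravel() for W, b in params for a in (W, b)])

def unflatten(flat):
    """Inverse of flatten: rebuild the per-layer (W, b) pairs."""
    params, i = [], 0
    for fan_in, fan_out in zip(LAYER_SIZES[:-1], LAYER_SIZES[1:]):
        W = flat[i:i + fan_in * fan_out].reshape(fan_in, fan_out)
        i += fan_in * fan_out
        b = flat[i:i + fan_out]
        i += fan_out
        params.append((W, b))
    return params

def forward(params, X):
    h = X
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)   # ReLU on the hidden layers
    W, b = params[-1]
    return h @ W + b                     # linear output for regression

params = init_params(np.random.default_rng(0))
flat = flatten(params)
print(flat.size)   # (24*64 + 64) + (64*32 + 32) + (32*1 + 1) = 3713
```

With this layout, every client sends a vector of the same length, so the server can average updates without knowing anything about the layer structure.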

privacy.py — Implements the Gaussian mechanism from Abadi et al. (2016): clips the weight vector to an L2 norm bound C, then adds noise calibrated to (ε, δ)-differential privacy. Also computes the composed privacy budget across all rounds using advanced composition.
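
A minimal sketch of clip-then-noise follows. It assumes the classical calibration σ = C·√(2 ln(1.25/δ))/ε with the clip bound C used as the L2 sensitivity; whether `privacy.py` uses exactly this constant is not stated here, and the function names are hypothetical. Note that this calibration is formally derived for ε < 1, so at ε = 2.0 it is best read as a heuristic noise scale.

```python
import numpy as np

def clip_l2(weights, clip_norm):
    """Scale the vector down so its L2 norm is at most clip_norm."""
    norm = np.linalg.norm(weights)
    return weights * min(1.0, clip_norm / max(norm, 1e-12))

def gaussian_sigma(clip_norm, epsilon, delta):
    """Assumed noise scale: C * sqrt(2 ln(1.25 / delta)) / epsilon."""
    return clip_norm * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon

def privatise(weights, clip_norm, epsilon, delta, rng):
    """Clip, then add isotropic Gaussian noise to every coordinate."""
    sigma = gaussian_sigma(clip_norm, epsilon, delta)
    return clip_l2(weights, clip_norm) + rng.normal(0.0, sigma, size=weights.shape)

rng = np.random.default_rng(0)
w = rng.normal(size=1000)
w_private = privatise(w, clip_norm=1.0, epsilon=2.0, delta=1e-5, rng=rng)
```

Because the noise is added per coordinate, its total magnitude grows with the number of parameters, which is one reason DP is costly for larger models.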

client.py — Encapsulates the farm node logic. Maintains a local 80/20 train/val split that never leaves the node. Applies DP noise to outgoing weights if enabled.
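
The local split could look roughly like the sketch below. Shuffling before the split and the function name `train_val_split` are assumptions; the source only states that the split is 80/20 and stays on the node.

```python
import numpy as np

def train_val_split(X, y, val_fraction=0.2, seed=0):
    """Shuffle indices once, then hold out val_fraction of the samples."""
    idx = np.random.default_rng(seed).permutation(len(y))
    n_val = int(round(val_fraction * len(y)))
    val_idx, train_idx = idx[:n_val], idx[n_val:]
    return (X[train_idx], y[train_idx]), (X[val_idx], y[val_idx])

# 300 seasons x 24 features, matching the sizes described for farm_data.py.
X = np.arange(300 * 24, dtype=float).reshape(300, 24)
y = np.arange(300, dtype=float)
(train_X, train_y), (val_X, val_y) = train_val_split(X, y)
print(len(train_y), len(val_y))
```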

server.py — Pure aggregation: computes the sample-count-weighted mean of incoming weight vectors. Records per-round statistics for analysis.


Results

  ──────────────────────────────────────────────────────────
  Summary — RMSE Comparison (lower is better)
  ──────────────────────────────────────────────────────────
  Method                         Final RMSE    vs Local
  ────────────────────────────────────────────────────
  Local-Only (no federation)       0.8297 t/ha   baseline
  Federated (FedAvg)               0.5326 t/ha   +35.8%
  Federated + DP  (ε=2.0)          0.7955 t/ha    +4.1%

Federated learning reduces prediction error by 35.8% relative to isolated local training. The federation helps most for farms with high-variance crops, such as corn in arid climates (local RMSE 0.9058 and 0.9248 t/ha for Farms 2 and 6), which benefit from shared knowledge about other crop types and climate patterns.

Adding differential privacy (ε=2.0) introduces noise that raises RMSE from 0.53 to 0.80 — still a 4% improvement over local-only — demonstrating the classic privacy-utility tradeoff. The noisy DP rounds show characteristic oscillation rather than smooth descent, as expected from the Gaussian mechanism.
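
The percentages in the summary table follow directly from the reported RMSE values. A short check, using a hypothetical helper `improvement_pct`:

```python
def improvement_pct(baseline_rmse, method_rmse):
    """Relative RMSE reduction versus the baseline, in percent."""
    return 100.0 * (baseline_rmse - method_rmse) / baseline_rmse

local, fedavg_rmse, fedavg_dp_rmse = 0.8297, 0.5326, 0.7955

print(f"FedAvg:      {improvement_pct(local, fedavg_rmse):+.1f}%")     # +35.8%
print(f"FedAvg + DP: {improvement_pct(local, fedavg_dp_rmse):+.1f}%")  # +4.1%
```

A positive value here means a lower RMSE than the local-only baseline.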

Convergence (Federated, no DP)

  Round  1:  RMSE 3.355  ████████████████████████████████████████
  Round  5:  RMSE 2.725  ████████████████████████████████████████
  Round 10:  RMSE 1.944  ████████████████████████████████████████
  Round 15:  RMSE 1.206  ██████████████████████████████████████░░
  Round 20:  RMSE 0.768  ████████████████████████░░░░░░░░░░░░░░░░
  Round 25:  RMSE 0.591  ██████████████████░░░░░░░░░░░░░░░░░░░░░░
  Round 30:  RMSE 0.547  █████████████████░░░░░░░░░░░░░░░░░░░░░░░
  Round 40:  RMSE 0.535  █████████████████░░░░░░░░░░░░░░░░░░░░░░░
  Round 55:  RMSE 0.533  █████████████████░░░░░░░░░░░░░░░░░░░░░░░
  Round 60:  RMSE 0.533  █████████████████░░░░░░░░░░░░░░░░░░░░░░░

The model converges in about 30 rounds and then plateaus (RMSE 0.547 at round 30, 0.533 at round 60), indicating that aggregation remains stable across all 10 heterogeneous farm distributions.

Per-farm local baseline (no federation)

  Farm  0 │ wheat    │ temperate      │  0.7355 t/ha
  Farm  1 │ rice     │ tropical       │  0.7789 t/ha
  Farm  2 │ corn     │ arid           │  0.9058 t/ha
  Farm  3 │ soybean  │ mediterranean  │  0.7421 t/ha
  Farm  4 │ wheat    │ temperate      │  0.8423 t/ha
  Farm  5 │ rice     │ tropical       │  0.9358 t/ha
  Farm  6 │ corn     │ arid           │  0.9248 t/ha
  Farm  7 │ soybean  │ mediterranean  │  0.7090 t/ha
  Farm  8 │ wheat    │ temperate      │  0.7534 t/ha
  Farm  9 │ rice     │ tropical       │  0.9694 t/ha
  ─────────────────────────────────────────────────
  Mean                                  0.8297 t/ha
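
The table can be regrouped by crop to see which crops are hardest to predict locally. The snippet below only re-aggregates the numbers listed above; the variable names are illustrative.

```python
from collections import defaultdict
from statistics import mean

# (crop, local RMSE in t/ha) for Farms 0-9, copied from the table above.
local_rmse = [
    ("wheat", 0.7355), ("rice", 0.7789), ("corn", 0.9058), ("soybean", 0.7421),
    ("wheat", 0.8423), ("rice", 0.9358), ("corn", 0.9248), ("soybean", 0.7090),
    ("wheat", 0.7534), ("rice", 0.9694),
]

by_crop = defaultdict(list)
for crop, rmse in local_rmse:
    by_crop[crop].append(rmse)

crop_means = {crop: mean(values) for crop, values in by_crop.items()}
overall = mean(rmse for _, rmse in local_rmse)

for crop, value in sorted(crop_means.items(), key=lambda kv: -kv[1]):
    print(f"{crop:8s} {value:.4f} t/ha")
print(f"{'overall':8s} {overall:.4f} t/ha")
```

Corn has the highest per-crop mean (about 0.915 t/ha), followed by rice (about 0.895 t/ha), while the single worst farm is Farm 9 (rice, 0.9694 t/ha).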

Privacy Budget

  Mechanism            : Gaussian (Abadi et al., 2016)
  Per-round ε          : 2.0
  Per-round δ          : 1e-5
  Rounds               : 60
  Composed ε (60 rounds): 52.565   ← advanced composition theorem
  Composed δ           : 0.0006

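The per-round budget of ε=2.0 is a reasonable starting point for demonstration; note that the composed budget over 60 rounds is much larger (ε≈52.6). Tighter per-round guarantees (ε<1.0) inject more noise, which would need more farms or more rounds to average out, and every additional round also adds to the composed budget.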
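
The composed figures above can be reproduced with the expression ε·√(k·ln(1/δ)) for ε and k·δ for δ. This formula was inferred from the reported numbers and is an assumption about what `privacy.py` computes. The textbook advanced composition theorem has a somewhat different form, ε√(2k ln(1/δ′)) + kε(e^ε − 1), so the value here is best read as the output of a simplified bound.

```python
import math

def composed_budget(eps_round, delta_round, rounds):
    """Simplified composition (assumed): eps * sqrt(k * ln(1/delta)), k * delta."""
    eps_total = eps_round * math.sqrt(rounds * math.log(1.0 / delta_round))
    delta_total = rounds * delta_round
    return eps_total, delta_total

eps_total, delta_total = composed_budget(eps_round=2.0, delta_round=1e-5, rounds=60)
print(f"Composed eps:   {eps_total:.3f}")    # 52.565
print(f"Composed delta: {delta_total:.4f}")  # 0.0006
print(f"Basic composition eps for comparison: {60 * 2.0:.1f}")  # 120.0
```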


Tech Stack

  Component        │ Implementation
  ─────────────────────────────────────────────────────────────
  Neural network   │ NumPy from scratch (no PyTorch/TensorFlow)
  Optimiser        │ Adam (β₁=0.9, β₂=0.999)
  Aggregation      │ FedAvg (McMahan et al., 2017)
  Privacy          │ Gaussian mechanism (Abadi et al., 2016)
  Data simulation  │ Biophysical crop-climate model
  Runtime          │ Pure Python 3.11, single dependency (NumPy)

How to Run

Docker (recommended)

docker build -t fedcrop .
docker run --rm fedcrop

Local

pip install -r requirements.txt
python main.py

Configuration (in main.py)

  Parameter     │ Default │ Effect
  ──────────────────────────────────────────────────────
  N_FARMS       │ 10      │ Number of federated farm nodes
  N_SEASONS     │ 300     │ Training samples per farm
  FL_ROUNDS     │ 60      │ Federation rounds
  LOCAL_EPOCHS  │ 20      │ Local training epochs per round
  USE_DP        │ True    │ Enable differential privacy
  DP_EPSILON    │ 2.0     │ Per-round privacy budget
  DP_DELTA      │ 1e-5    │ DP failure probability
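
One way to keep these settings together is a small frozen dataclass, sketched below. The `Config` class is hypothetical; `main.py` may simply define the parameters as module-level constants.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Config:
    """Hypothetical container mirroring the defaults listed above."""
    N_FARMS: int = 10
    N_SEASONS: int = 300
    FL_ROUNDS: int = 60
    LOCAL_EPOCHS: int = 20
    USE_DP: bool = True
    DP_EPSILON: float = 2.0
    DP_DELTA: float = 1e-5

cfg = Config()
total_samples = cfg.N_FARMS * cfg.N_SEASONS           # seasons across the whole network
epochs_per_farm = cfg.FL_ROUNDS * cfg.LOCAL_EPOCHS    # local epochs each farm runs in total
print(total_samples, epochs_per_farm)

# A quick variant for a non-private run:
no_dp = Config(USE_DP=False)
```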

References

  • McMahan et al., "Communication-Efficient Learning of Deep Networks from Decentralized Data", AISTATS 2017 — FedAvg algorithm
  • Abadi et al., "Deep Learning with Differential Privacy", CCS 2016 — Gaussian mechanism
