Skip to content

buddinikhilesh/k8s-reliability-operator

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 

Repository files navigation

k8s-reliability-operator

Golang Kubernetes operator for automated reliability enforcement. Monitors distributed workloads, detects stalled jobs, and triggers fault-recovery actions automatically.

What this solves

  • Detects stalled long-running distributed jobs before they breach SLOs
  • Auto-restarts failed workloads up to a configurable limit
  • Escalates to on-call when auto-recovery is exhausted
  • Runs controlled Chaos Engineering experiments safely
  • Tracks DORA metrics across all managed workloads

Architecture

About

Golang Kubernetes operator for automated reliability enforcement — stall detection and fault recovery for distributed workloads

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages