
roulbac/krayne


Krayne

CLI and SDK for creating, managing, and scaling Ray clusters on Kubernetes.

Krayne wraps the KubeRay operator behind a clean, opinionated interface so ML practitioners can get distributed compute without touching Kubernetes manifests.

A fast, keyboard-driven terminal user interface (TUI) is also available.


Navigate clusters, create with prefilled forms, scale, delete, and toggle tunnels — all with keyboard shortcuts. See the Interactive TUI guide for details.

Quickstart

pip install krayne

1. Connect Krayne to your Kubernetes cluster

krayne init picks a kubeconfig and context and saves them to ~/.krayne/config.yaml. Run it once after installing — every other command reads from that file.

krayne init

By default this prompts you to choose between ~/.kube/config, the local sandbox kubeconfig, and a custom path. To skip the prompts (e.g. in CI):

krayne init --kubeconfig ~/.kube/config --context my-context
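In a CI job you may want the non-interactive form to run only when no Krayne config has been written yet. The sketch below wraps the command above with Python's subprocess module. The helper name ensure_krayne_init and its skip-if-present behaviour are illustrative assumptions, not part of Krayne. Because subprocess does not go through a shell, the helper expands ~ itself.

```python
import subprocess
from pathlib import Path
from typing import Callable

# Config location documented in this README.
KRAYNE_CONFIG = Path.home() / ".krayne" / "config.yaml"


def ensure_krayne_init(kubeconfig: str, context: str,
                       config_path: Path = KRAYNE_CONFIG,
                       run: Callable = subprocess.run) -> bool:
    """Run `krayne init` non-interactively unless a Krayne config already exists.

    Hypothetical CI helper. Returns True if init was invoked, False if skipped.
    `run` is injectable so the logic can be tested without krayne installed.
    """
    if config_path.exists():
        return False  # assumption: an existing config is good enough to reuse
    # subprocess does not use a shell, so expand "~" ourselves.
    kube = str(Path(kubeconfig).expanduser())
    run(["krayne", "init", "--kubeconfig", kube, "--context", context], check=True)
    return True
```

Since an existing config is reused as-is, call krayne init directly when you need to switch contexts.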

Don't have a Kubernetes cluster handy? Run krayne sandbox setup first to spin up a local k3s cluster with KubeRay pre-installed (Docker required).

2. Create a cluster

Pick whichever entrypoint suits you; they all produce the same Ray cluster. Pass -n/--namespace (or the namespace= field in ClusterConfig) to target a specific Kubernetes namespace; if you omit it, the cluster goes into the default namespace.

CLI:

krayne create my-cluster -n ml-team --gpus-per-worker 1 --workers 2

TUI — press c in the explorer to open the create form:

krayne tui

Python SDK:

from krayne.api import create_cluster
from krayne.config import ClusterConfig, WorkerGroupConfig

config = ClusterConfig(
    name="my-cluster",
    namespace="ml-team",
    worker_groups=[WorkerGroupConfig(replicas=2, gpus=1)],
)
create_cluster(config, wait=True)
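Because the CLI and SDK examples describe the same cluster, it can help to see how the SDK fields line up with the CLI flags. The hypothetical helper below renders a krayne create command from SDK-style values, assuming WorkerGroupConfig(replicas=...) corresponds to --workers and gpus=... to --gpus-per-worker, as the two examples above suggest. It only builds the argument list and never calls Krayne.

```python
import shlex
from typing import List, Optional


def krayne_create_argv(name: str, namespace: str = "default",
                       workers: Optional[int] = None,
                       gpus_per_worker: Optional[int] = None) -> List[str]:
    """Build a `krayne create` command mirroring the SDK example (hypothetical helper).

    Assumed mapping: replicas -> --workers, gpus -> --gpus-per-worker.
    Unset values are omitted so Krayne's own defaults apply.
    """
    argv = ["krayne", "create", name, "-n", namespace]
    if gpus_per_worker is not None:
        argv += ["--gpus-per-worker", str(gpus_per_worker)]
    if workers is not None:
        argv += ["--workers", str(workers)]
    return argv


cmd = krayne_create_argv("my-cluster", "ml-team", workers=2, gpus_per_worker=1)
print(shlex.join(cmd))
# krayne create my-cluster -n ml-team --gpus-per-worker 1 --workers 2
```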

3. Run a Ray job against it

Recommended: krayne submit. It opens a tunnel if one isn't already up, then wraps ray job submit so your script's driver runs inside the cluster — no Python version match required, no ray.init glue:

krayne submit demo.py --cluster my-cluster -n ml-team

Add --no-wait to skip log tailing, or pass -- arg1 arg2 … to forward arguments to the script. See docs/reference/cli.md for the full reference.
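When you drive krayne submit from Python (for example from a scheduler or a notebook), building the argument list explicitly keeps the -- separator in the right place. The helper krayne_submit_argv below is a hypothetical sketch that uses only the flags documented above.

```python
from typing import List, Sequence


def krayne_submit_argv(script: str, cluster: str, namespace: str = "default",
                       wait: bool = True,
                       script_args: Sequence[str] = ()) -> List[str]:
    """Build a `krayne submit` command from the flags documented in this README."""
    argv = ["krayne", "submit", script, "--cluster", cluster, "-n", namespace]
    if not wait:
        argv.append("--no-wait")  # skip log tailing
    if script_args:
        # Everything after "--" is forwarded to the script untouched.
        argv += ["--", *script_args]
    return argv
```

Pass the result to subprocess.run(..., check=True) to execute it. Keeping script arguments after -- matters most when they look like options (e.g. --epochs 3), which Krayne's own parser would otherwise likely try to interpret.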

Advanced: Ray Client (ray.init("ray://…")) — strict version match required

Warning: Ray Client requires exactly the same Python version (major.minor.patch) and Ray version on your laptop as in the cluster image. A single patch difference (e.g. 3.12.6 vs 3.12.9) is rejected at handshake. This is a known Ray pain point, not specific to krayne. Only use this path if you've pinned your local interpreter to match rayproject/ray:<ver>-pyXY.
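Since the image tag rayproject/ray:<ver>-pyXY encodes both the Ray version and the Python major.minor, a cheap pre-flight check can catch the most common mismatches before ray.init fails at handshake. The sketch below is a hypothetical helper, not part of Krayne or Ray. It assumes you know which image your cluster runs, and it cannot detect a patch-level Python mismatch because the tag does not contain one.

```python
import re
import sys
from typing import List, Tuple

# rayproject/ray:<ver>-pyXY, optionally followed by a suffix such as "-gpu".
_RAY_TAG = re.compile(r":(?P<ray>\d+(?:\.\d+)*)-py(?P<major>\d)(?P<minor>\d+)")


def parse_ray_image(image: str) -> Tuple[str, Tuple[int, int]]:
    """Return (ray_version, (py_major, py_minor)) parsed from an image tag."""
    m = _RAY_TAG.search(image)
    if m is None:
        raise ValueError(f"unrecognised Ray image tag: {image!r}")
    return m["ray"], (int(m["major"]), int(m["minor"]))


def client_mismatches(image: str, local_ray: str,
                      local_py: Tuple[int, int] = tuple(sys.version_info[:2])) -> List[str]:
    """List version mismatches that Ray Client would reject (hypothetical helper).

    Only Python major.minor is checked: the tag carries no patch version,
    so a patch-level mismatch still has to be ruled out by hand.
    """
    ray_version, image_py = parse_ray_image(image)
    problems = []
    if local_ray != ray_version:
        problems.append(f"Ray {local_ray} locally vs {ray_version} in image")
    if tuple(local_py) != image_py:
        problems.append(
            f"Python {local_py[0]}.{local_py[1]} locally vs "
            f"{image_py[0]}.{image_py[1]} in image"
        )
    return problems
```

On your laptop you would call it as client_mismatches(image, ray.__version__) and fall back to krayne submit whenever the returned list is not empty.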

open_tunnel opens port-forward tunnels to the cluster's services so ray.init can reach the head node from your laptop, and closes them on exit:

import ray
from krayne.api import open_tunnel

with open_tunnel("my-cluster", "ml-team") as session:
    ray.init(session.client_url)   # ray://localhost:...

    @ray.remote
    def hello(i: int) -> str:
        return f"Hello from worker {i}"

    print(ray.get([hello.remote(i) for i in range(4)]))
    ray.shutdown()
# tunnels closed when the block exits

When you're done, krayne delete my-cluster -n ml-team (or delete_cluster("my-cluster", "ml-team") from the SDK) tears the cluster down.
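For throwaway runs (CI smoke tests, one-off experiments), the create, submit, and delete commands compose into a create/try/finally pattern so the cluster is torn down even when the job fails. This is a sketch that calls the CLI commands shown above through subprocess. It assumes krayne create returns once the cluster can accept jobs and that krayne delete does not stop for a confirmation prompt; check both against the CLI reference before relying on it.

```python
import subprocess
from typing import Callable, Sequence


def run_on_ephemeral_cluster(name: str, namespace: str, script: str,
                             create_flags: Sequence[str] = (),
                             run: Callable = subprocess.run) -> None:
    """Create a cluster, submit one script to it, and always delete it afterwards.

    Hypothetical helper built from the CLI commands in this README.
    """
    # Assumption: `krayne create` returns once the cluster can accept jobs.
    run(["krayne", "create", name, "-n", namespace, *create_flags], check=True)
    try:
        run(["krayne", "submit", script, "--cluster", name, "-n", namespace],
            check=True)
    finally:
        # Runs even if the submit step raised.
        # Assumption: `krayne delete` does not stop for a confirmation prompt.
        run(["krayne", "delete", name, "-n", namespace], check=True)
```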

Interactive TUI

Krayne includes a k9s-style interactive terminal UI:

krayne tui

Or run it directly without installing: uvx krayne tui

Features

  • Zero-config defaults — every command works with no flags. Sensible defaults get you a working cluster instantly.
  • CLI and SDK — the CLI is a thin shell over the Python SDK. Anything you do from the terminal, you can do from code.
  • Interactive TUI — k9s-style terminal UI for keyboard-driven cluster management.
  • Functional API — stateless free functions, not class hierarchies. Easy to test, easy to compose.
  • Pydantic config — validated configuration with YAML override support. No silent failures.
  • Rich output — beautiful terminal tables via Rich, with --output json for scripting.

CLI Overview

krayne init               Select kubeconfig + context (run once after install)
krayne create <name>      Create a new Ray cluster
krayne submit <script>    Submit a script as a Ray job (opens a tunnel if needed)
krayne get                List clusters in a namespace
krayne describe <name>    Show detailed cluster info
krayne scale <name>       Scale a worker group
krayne delete <name>      Delete a cluster
krayne tui                Launch interactive TUI

All commands support -n/--namespace, --output json, and --debug flags.
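The --output json flag makes the CLI easy to consume from scripts. The hypothetical helper below appends the flag, runs the command, and decodes stdout with json.loads. The shape of the JSON is not described in this README, so the sketch returns the decoded object as-is instead of assuming particular keys.

```python
import json
import subprocess
from typing import Any, Callable, Sequence


def krayne_json(args: Sequence[str], run: Callable = subprocess.run) -> Any:
    """Run `krayne <args> --output json` and return the decoded JSON.

    Hypothetical helper; no particular JSON schema is assumed.
    """
    proc = run(["krayne", *args, "--output", "json"],
               capture_output=True, text=True, check=True)
    return json.loads(proc.stdout)
```

For example, krayne_json(["get", "-n", "ml-team"]) would return whatever krayne get -n ml-team --output json prints, decoded into Python lists and dicts.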

Documentation

Full documentation is available at the Krayne docs site.

Requirements

  • Python 3.10+
  • A Kubernetes cluster with the KubeRay operator installed
  • A valid kubeconfig (or running inside the cluster)

Development

# Clone and install
git clone https://github.com/roulbac/krayne.git
cd krayne
uv sync

# Run tests
uv run pytest

# Run integration tests (sandbox is provisioned automatically by test fixtures)
uv run pytest -m integration

Acknowledgements

Krayne is inspired by Spotify-Ray (sp-ray), Spotify's internal platform for running Ray on Kubernetes. The sp-ray team demonstrated that a CLI and SDK with sensible defaults, progressive disclosure of complexity, and managed KubeRay infrastructure can let ML practitioners focus on business logic instead of Kubernetes manifests. Krayne follows this philosophy as an open-source tool for the broader community.

License

Apache 2.0

About

A Python SDK and CLI to create and manage Ray resources programmatically
