feat: add init and upload in kernel-builder cli#378

Open
drbh wants to merge 22 commits into main from kernel-builder-init-command

Conversation


@drbh drbh commented Mar 18, 2026

This PR is a WIP to add the init command to the kernel-builder rust cli. This branch mainly ports the init command from the python cli, with some improvements to the interface and updates to the template.

This PR adds the init and upload commands to the rust kernel-builder cli, similar to the implementation in the current python cli. It also adds a build command that is a facade for nix run so devs can interact with a single tool for the full creation lifecycle of a kernel.

Mainly, it aims to have better defaults to make the commands easier to use:

  1. the init command only requires the kernel's name (folder); the owner is pulled via a whoami request if the user is already logged into hf
  2. the init command creates a new dir if a name is specified; if called from within an existing dir, the cli will use the dir name as the kernel name
  3. the upload command does not require a repo id if the id can be read from the build
  4. the upload command does not require a path to the build folder and will default to looking for the build dir if no explicit path is specified
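
The name-resolution default above can be sketched as a small helper. `resolve_kernel_name` is a hypothetical name for illustration, not necessarily what the PR calls it:

```rust
use std::path::Path;

// Hypothetical helper illustrating the `init` defaults: an explicit name
// wins; otherwise the current directory's own name is used as the kernel name.
fn resolve_kernel_name(explicit: Option<&str>, cwd: &Path) -> Option<String> {
    match explicit {
        Some(name) => Some(name.to_owned()),
        None => cwd.file_name().map(|n| n.to_string_lossy().into_owned()),
    }
}

fn main() {
    let cwd = Path::new("/home/drbh/some-kernel");
    // `kernel-builder init` inside the dir: name comes from the dir itself.
    assert_eq!(resolve_kernel_name(None, cwd).as_deref(), Some("some-kernel"));
    // `kernel-builder init other-kernel`: the explicit name wins.
    assert_eq!(
        resolve_kernel_name(Some("other-kernel"), cwd).as_deref(),
        Some("other-kernel")
    );
}
```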

example usage

init from inside an existing dir

mkdir some-kernel
cd some-kernel
kernel-builder init
# Downloading template from kernels-community/template...
# Initialized `drbh/some-kernel` at /home/drbh/Projects/kernels/kernel-builder/some-kernel

init from outside of a dir

kernel-builder init some-kernel
# Downloading template from kernels-community/template...
# Initialized `drbh/some-kernel` at /home/drbh/Projects/kernels/kernel-builder/some-kernel

build

kernel-builder build
# ...
# some-kernel-torch-ext> no Makefile or custom installCheckPhase, doing nothing
# some-kernel-torch-ext> Checking of ABI compatibility
# some-kernel-torch-ext> 🐍 Checking for compatibility with manylinux_2_28 and Python ABI version 3.9
# some-kernel-torch-ext> ✅ No compatibility issues found
# some-kernel-torch-ext> Checking loading kernel with get_kernel
# some-kernel-torch-ext> Check whether the kernel can be loaded with get-kernel: some_kernel
# some-kernel-torch-ext> Running phase: removeBytecodeHook
# some-kernel-torch-ext> Removing Python bytecode

upload

kernel-builder upload
# Found 7 build variant(s) in /home/drbh/Projects/kernels/kernel-builder/some-kernel/build
# Using branch `v1` (new)
# Uploading 36 operations...
# Kernel upload successful. Find the kernel at: https://hf.co/drbh/some-kernel

@drbh drbh changed the title from "feat: support init in builder cli" to "feat: add init and upload in kernel-builder cli" on Mar 19, 2026
@drbh drbh force-pushed the kernel-builder-init-command branch from 534af0b to 368adad on March 19, 2026 15:25
@drbh drbh marked this pull request as ready for review March 19, 2026 16:42

sayakpaul commented Mar 20, 2026

Some high-level thoughts first:

the init command only requires the name of the kernels (folder) and the owner is pulled via a whoami request if the user is already logged into hf

Should we somehow inform the users about it and take confirmation before proceeding?

# Kernel upload successful. Find the kernel at: https://hf.co/drbh/some-kernel

Should this be Find the kernel at: https://hf.co/drbh/some-kernel/tree/{version}? This way, users can directly use that link to inspect files, etc.

@sayakpaul sayakpaul left a comment

Very minor comments. No Rust expertise yet.

template/CARD.md Outdated
@@ -0,0 +1,38 @@
---

/// Nix flake target to run.
#[arg(long, default_value = "build-and-copy")]
pub target: BuildTarget,
@danieldk danieldk Mar 20, 2026


I think top-level

kernel-builder build
kernel-builder build-and-copy
kernel-builder build-and-upload

are nicer, easier to discover. I think we can use #[command(flatten)] to share arguments between different build commands, something like:

// ...
Build {
    #[command(flatten)]
    pub common_build_args: CommonBuildArgs,
    // ...
}

drbh (Collaborator, Author):

makes sense, updated in latest and added those three commands at the top level


/// Additional arguments passed through to `nix run`.
#[arg(last = true)]
pub nix_args: Vec<String>,
Member:

Not sure if we want this?

drbh (Collaborator, Author):

yea I guess if the user wants more advanced flags they should call nix run directly. removed in latest

let bare_name = path.file_name().and_then(OsStr::to_str).ok_or_else(|| {
    eyre::eyre!("Cannot determine directory name from `{path_str}`")
})?;
let owner = hf::whoami_username()?;
Member:

nice!

for (from, to) in replacements {
    text = text.replace(from, to);
}
fs::write(&destination, text).wrap_err_with(|| {
@danieldk danieldk Mar 20, 2026

If possible (not sure if it supports everything needed), I think it would be nice to use the fileset API that is also used for generating pyproject files. The nice thing is that it is atomic: if something fails while still preparing the output, nothing is written.

I think we would have to extend it with the notion of files that should not be overwritten (if a project already exists and you only want to add new files). But I think making it impossible to have an operation half-completed is a nice feature.
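
A minimal sketch of the atomicity property described above, assuming a stage-then-rename approach (the real fileset API may work differently):

```rust
use std::fs;
use std::io;
use std::path::Path;

// Sketch: stage every file under a temporary sibling directory, then rename
// it into place in one step. If any write fails, the destination never
// appears, so an init can never be left half-completed.
fn write_fileset_atomically(dest: &Path, files: &[(&str, &str)]) -> io::Result<()> {
    let staging = dest.with_extension("staging");
    fs::create_dir_all(&staging)?;
    for (name, contents) in files {
        fs::write(staging.join(name), contents)?; // a failure aborts before `dest` exists
    }
    fs::rename(&staging, dest) // atomic on the same filesystem
}

fn main() -> io::Result<()> {
    let dest = std::env::temp_dir().join("fileset-demo");
    let _ = fs::remove_dir_all(&dest);
    write_fileset_atomically(&dest, &[("build.toml", "[general]\n"), ("README.md", "# demo\n")])?;
    assert!(dest.join("build.toml").exists());
    Ok(())
}
```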

drbh (Collaborator, Author):

oh yea that's a nice improvement, updated to use the file set API in the latest changes. this is a better UX and pattern so we avoid getting into a strange state

.wrap_err_with(|| format!("Cannot parse TOML in `{}`", build_toml_path.display()))?;

// Update [general].backends
if let Some(general) = document.get_mut("general").and_then(|v| v.as_table_mut()) {
Member:

I forget if we discussed this before, but it would be worthwhile putting jinja templates in the template, so that we can render everything instead of editing TOML (but maybe parse + write to format).

This has the benefit that if we ever change the build.toml format, we don't have to change it both in the template and here in the code.

E.g., suppose that backends moves somewhere else, we have to change it in the template and here.
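
For illustration, a templated build.toml could look roughly like this (jinja-style syntax; the variable names are assumptions, not the template the PR ships, though the keys mirror the `[general]`/`general.hub.repo-id` layout mentioned in this thread):

```jinja
[general]
name = "{{ kernel_name }}"
backends = [{% for backend in backends %}"{{ backend }}"{% if not loop.last %}, {% endif %}{% endfor %}]

[general.hub]
repo-id = "{{ owner }}/{{ kernel_name }}"
```

Rendering the whole file would mean a future change to the build.toml layout only touches the template, not the TOML-editing code.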

drbh (Collaborator, Author):

ah yes that's a great improvement. updated the kernel template to use jinja in the latest changes

Comment on lines +498 to +512
let mut impl_block = String::new();
for (i, (condition, device, ref_prefix)) in conditions.iter().enumerate() {
    let directive = if i == 0 { "#if" } else { "#elif" };
    impl_block.push_str(directive);
    impl_block.push(' ');
    impl_block.push_str(condition);
    impl_block.push_str("\n ops.impl(\"");
    impl_block.push_str(func_name);
    impl_block.push_str("\", ");
    impl_block.push_str(device);
    impl_block.push_str(", ");
    impl_block.push_str(ref_prefix);
    impl_block.push_str(func_name);
    impl_block.push_str(");\n");
}
Member:

Could be a jinja template with if blocks.
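
Sketched as a jinja template with if blocks, the loop above might become something like this (the `backend.condition`/`device`/`ref_prefix` names mirror the loop variables in the snippet; the exact template is up to the PR):

```jinja
{% for backend in backends %}
{{ "#if" if loop.first else "#elif" }} {{ backend.condition }}
  ops.impl("{{ func_name }}", {{ backend.device }}, {{ backend.ref_prefix }}{{ func_name }});
{% endfor %}
#endif
```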

drbh (Collaborator, Author):

updated to prefer jinja in latest

Comment on lines +526 to +554
    let marker = "ops.def(\"";
    let start = content.find(marker)? + marker.len();
    let rest = &content[start..];
    let end = rest.find('(')?;
    Some(&rest[..end])
}

/// Replace `#if defined(CPU_KERNEL)...#endif` block with new content
fn replace_ifdef_block(content: &str, replacement: &str) -> String {
    const START_MARKER: &str = "#if defined(CPU_KERNEL)";
    const END_MARKER: &str = "#endif";

    let Some(start) = content.find(START_MARKER) else {
        return content.to_owned();
    };

    let search_region = &content[start..];
    let Some(end_offset) = search_region.find(END_MARKER) else {
        return content.to_owned();
    };

    let end = start + end_offset + END_MARKER.len();

    let mut result = String::with_capacity(content.len());
    result.push_str(&content[..start]);
    result.push_str(replacement);
    result.push_str(&content[end..]);
    result
}
@danieldk danieldk Mar 20, 2026

Would be easier to maintain with a jinja template and some if blocks I think.

drbh (Collaborator, Author):

updated to prefer jinja in latest

/// Upload kernel build artifacts to the Hugging Face Hub.
Upload(UploadArgs),

#[command(hide = true)]
@danieldk danieldk Mar 20, 2026

If I understand correctly, this removes the subcommand from help. Sounds bad? Leftover from testing?

drbh (Collaborator, Author):

removed in latest

@danieldk danieldk left a comment

Reviewed the upload subcommand now as well. Really excited to get batched uploads!

Comment on lines +22 to +23
#[arg(value_name = "KERNEL_DIR", default_value = ".")]
pub kernel_dir: PathBuf,
Member:

Only a small nit, I'd use

Suggested change:
- #[arg(value_name = "KERNEL_DIR", default_value = ".")]
- pub kernel_dir: PathBuf,
+ #[arg(value_name = "KERNEL_DIR")]
+ pub kernel_dir: Option<PathBuf>,

It expresses the semantics a bit better; there is now a helper function to get the path from that. (No strong opinions though.)

drbh (Collaborator, Author):

agreed, this is more idiomatic; updated in latest

let mut operations: Vec<CommitOperation> = Vec::new();

collect_benchmark_ops(&kernel_dir, &existing_files, is_new_branch, &mut operations)?;
collect_readme_ops(&kernel_dir, &mut operations);
Member:

README should go to the main branch. I think probably the most elegant way is to make operations a HashMap<String, Vec<CommitOperation>> with the branch name as the key. That way we can have a nested loop to handle everything.
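
A sketch of the suggested shape, with `CommitOperation` stubbed out (the real type lives in the PR; paths and branch names here are illustrative):

```rust
use std::collections::HashMap;

// Stub standing in for the PR's real commit-operation type.
#[derive(Debug)]
struct CommitOperation {
    path: String,
}

// Group operations by target branch: build artifacts go to the version
// branch, the README goes to `main`, then a nested loop commits each group.
fn group_by_branch(version_branch: &str) -> HashMap<String, Vec<CommitOperation>> {
    let mut ops: HashMap<String, Vec<CommitOperation>> = HashMap::new();
    ops.entry(version_branch.to_owned())
        .or_default()
        .push(CommitOperation { path: "torch-ext/metadata.json".into() });
    ops.entry("main".to_owned())
        .or_default()
        .push(CommitOperation { path: "README.md".into() });
    ops
}

fn main() {
    let ops = group_by_branch("v1");
    for (branch, branch_ops) in &ops {
        // one commit per branch
        println!("commit to `{branch}`: {} operation(s)", branch_ops.len());
    }
    assert_eq!(ops.len(), 2);
}
```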

Member:

For later: we may want to do the card generation and upload in one go as part of upload, since we cannot do the card generation in the Nix sandbox if we need to fetch something from the main repo to do it (depends a bit on whether we still need to enumerate all the build variants or whether that information is handled by the Hub itself through the new kernel type; if the remaining information in the card is relatively static, we can handle it during build as well).

Not for this PR though.

Member:

@drbh I think the most sensible thing for now would be to keep the card contents limited to what we can fetch from the build (after the build is successful). We will render capabilities, frameworks etc. on the Hub side upon each commit, anyway. So, the card could contain an example snippet, available function interfaces, etc. This is all implemented in Python CLI.

Totally okay if we want to do the segregation in a separate PR.

Member:

And then there is this other thing of always uploading the card to main revision.

}

// Collect benchmark file operations: add matching files, delete stale ones.
fn collect_benchmark_ops(
Member:

Maybe we should call them commit_ops? I was initially a bit confused, because we already use the term ops for ops in the kernel itself. Adding the commit_ prefix may make it a bit clearer that we are referring to different ops here.

Suggested change:
- fn collect_benchmark_ops(
+ fn collect_benchmark_commit_ops(

Comment on lines +322 to +342
fn read_repo_id_from_build_toml(kernel_dir: &Path) -> Result<String> {
    let build_toml_path = kernel_dir.join("build.toml");
    let content = fs::read_to_string(&build_toml_path)
        .wrap_err_with(|| format!("Cannot read `{}`", build_toml_path.display()))?;
    let parsed: toml::Value = toml::from_str(&content)
        .wrap_err_with(|| format!("Cannot parse `{}`", build_toml_path.display()))?;

    parsed
        .get("general")
        .and_then(|g| g.get("hub"))
        .and_then(|h| h.get("repo-id"))
        .and_then(|v| v.as_str())
        .filter(|s| !s.is_empty())
        .map(|s| s.to_owned())
        .ok_or_else(|| {
            eyre::eyre!(
                "No `general.hub.repo-id` in `{}`. Use --repo-id to specify it.",
                build_toml_path.display()
            )
        })
}
Member:

Use util::parse_build. We could also implement a `fn repo_id(&self) -> Option<&str>` on Build if it is helpful, and then we probably do not need this function at all.
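
The suggested accessor might look like this; the struct fields are assumptions modeled on the `general.hub.repo-id` key, not the PR's actual `Build` definition:

```rust
// Assumed shapes mirroring `general.hub.repo-id` in build.toml.
struct Hub {
    repo_id: Option<String>,
}

struct General {
    hub: Option<Hub>,
}

struct Build {
    general: General,
}

impl Build {
    // Return the configured repo id, treating an empty string as absent.
    fn repo_id(&self) -> Option<&str> {
        self.general
            .hub
            .as_ref()
            .and_then(|h| h.repo_id.as_deref())
            .filter(|s| !s.is_empty())
    }
}

fn main() {
    let build = Build {
        general: General { hub: Some(Hub { repo_id: Some("drbh/some-kernel".into()) }) },
    };
    assert_eq!(build.repo_id(), Some("drbh/some-kernel"));

    let empty = Build { general: General { hub: Some(Hub { repo_id: Some(String::new()) }) } };
    assert_eq!(empty.repo_id(), None);
}
```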

drbh (Collaborator, Author):

yea that's much cleaner, updated in latest! thanks

Comment on lines +388 to +410
for variant in variants {
    let metadata_path = variant.join("metadata.json");
    let data = fs::read_to_string(&metadata_path)
        .wrap_err_with(|| format!("Cannot read `{}`", metadata_path.display()))?;
    let parsed: serde_json::Value = serde_json::from_str(&data)
        .wrap_err_with(|| format!("Cannot parse `{}`", metadata_path.display()))?;
    let version = parsed.get("version").and_then(|v| v.as_u64());
    versions.insert(version);
}

if versions.len() > 1 {
    let version_strs: Vec<String> = versions
        .iter()
        .map(|v| match v {
            Some(n) => n.to_string(),
            None => "none".to_owned(),
        })
        .collect();
    bail!(
        "Found multiple versions in build variants: {}",
        version_strs.join(", ")
    );
}
Member:

Use pyproject::Metadata to deserialize metadata.json. If helpful, we could add a function similar to parse_build.

drbh (Collaborator, Author):

nice point, added a parse_metadata function in latest commits.

Comment on lines +426 to +439
fn walk_recursive(dir: &Path, files: &mut Vec<PathBuf>) -> Result<()> {
    for entry in
        fs::read_dir(dir).wrap_err_with(|| format!("Cannot read directory `{}`", dir.display()))?
    {
        let entry = entry?;
        let path = entry.path();
        if path.is_dir() {
            walk_recursive(&path, files)?;
        } else {
            files.push(path);
        }
    }
    Ok(())
}
Member:

is_dir follows symbolic links, so if a build directory contains a link like ln -s . foo, this will get stuck in an infinite loop (until it runs out of stack or heap). So we either need to skip symbolic links or we need loop detection. Maybe it's worth considering BurntSushi's walkdir, since it covers a lot of edge cases and doesn't follow symlinks by default. It also returns an iterator, which is more efficient than pushing everything into a Vec. Not that we care too much about performance here, but it is a nice property.

https://docs.rs/walkdir/latest/walkdir/
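
For comparison, a std-only version of the fix would use `symlink_metadata`, which never follows links, so a self-referencing symlink cannot cause infinite recursion (walkdir gives the same behaviour by default):

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

// Recursive walk that never follows symlinks: `symlink_metadata` reports a
// link as a link rather than as the directory or file it points to.
fn walk_no_follow(dir: &Path, files: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        let meta = fs::symlink_metadata(&path)?;
        if meta.is_dir() {
            walk_no_follow(&path, files)?;
        } else if meta.is_file() {
            files.push(path); // symlinks match neither branch and are skipped
        }
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let root = std::env::temp_dir().join("walk-no-follow-demo");
    let _ = fs::remove_dir_all(&root);
    fs::create_dir_all(root.join("sub"))?;
    fs::write(root.join("a.txt"), "x")?;
    fs::write(root.join("sub").join("b.txt"), "y")?;
    let mut files = Vec::new();
    walk_no_follow(&root, &mut files)?;
    assert_eq!(files.len(), 2);
    Ok(())
}
```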

drbh (Collaborator, Author):

very good catch, updated to use walkdir in latest. thanks for pointing it out

@drbh drbh force-pushed the kernel-builder-init-command branch from 57523a2 to f88776a on March 23, 2026 21:19
@drbh drbh marked this pull request as draft March 23, 2026 22:53
@drbh drbh force-pushed the kernel-builder-init-command branch from a50eff0 to f6b9889 on March 24, 2026 19:25
@drbh drbh marked this pull request as ready for review March 24, 2026 21:05
@danieldk danieldk left a comment

Thanks! Added some more smaller comments.

let content = fs::read_to_string(&init_path).ok()?;

let re = Regex::new(r#"__all__\s*=\s*\[\s*([^\]]*)\s*\]"#).ok()?;
let list_content = re.captures(&content)?.get(1)?.as_str();
Member:

Regex parsing is really fragile. Not for this PR, but we should use a Python AST parser. Maybe tree-sitter-python or whatever ruff/ty uses (if it's available from crates.io). Maybe it's worthwhile creating a ticket?

drbh (Collaborator, Author):

good point, I opted to use https://crates.io/crates/rustpython-parser which is what ruff forked, only because ruff's parser has git dependencies and our use case is relatively simple

Comment on lines +17 to +23
/// Maximum number of Nix build jobs.
#[arg(long, default_value = "4")]
pub max_jobs: u32,

/// Number of CPU cores per build job.
#[arg(long, default_value = "4")]
pub cores: u32,
Member:

Should use Option, we want to use the default from nix.conf when the user has not specified any arguments. E.g. I have these turned on my builder for the number of cores/RAM available.
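
The Option-based version might forward flags to nix like this (`--max-jobs` and `--cores` are standard nix options; the struct here is a sketch, not the PR's actual `NixArgs`):

```rust
// Sketch: only emit resource flags the user actually set, so nix.conf
// defaults (e.g. a builder tuned to its own core/RAM count) still apply.
struct NixArgs {
    max_jobs: Option<u32>,
    cores: Option<u32>,
}

fn nix_flags(args: &NixArgs) -> Vec<String> {
    let mut flags = Vec::new();
    if let Some(jobs) = args.max_jobs {
        flags.extend(["--max-jobs".to_owned(), jobs.to_string()]);
    }
    if let Some(cores) = args.cores {
        flags.extend(["--cores".to_owned(), cores.to_string()]);
    }
    flags
}

fn main() {
    // No flags set: defer entirely to nix.conf.
    assert!(nix_flags(&NixArgs { max_jobs: None, cores: None }).is_empty());
    // Explicit values are forwarded.
    let flags = nix_flags(&NixArgs { max_jobs: Some(4), cores: Some(4) });
    assert_eq!(flags, ["--max-jobs", "4", "--cores", "4"]);
}
```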

drbh (Collaborator, Author):

good point, agree that Option is a better pattern (as noted earlier), thanks for catching. updated to prefer the NixArgs struct where these values are optional

use develop::{devshell, testshell};

mod build;
use build::{run_build, CommonBuildArgs};
Member:

Suggested change:
- use build::{run_build, CommonBuildArgs};
+ use build::run_build;

drbh (Collaborator, Author):

removed in latest changes

/// Build the kernel locally (alias for build-and-copy).
Build {
    #[command(flatten)]
    args: CommonBuildArgs,
Member:

Use

struct NixArgs {

drbh (Collaborator, Author):

nice! updated to use this struct

repo_type,
} => {
let kernel_dir = build_args.kernel_dir.clone();
run_build(build_args, "build-and-copy")?;
Member:

Suggested change:
- run_build(build_args, "build-and-copy")?;

Remove, the idea behind build-and-upload is to upload directly from the Nix store without using duplicate storage for both the store version and the build version.

drbh (Collaborator, Author):

good point, updated to only call build then upload from the symlinked result dir

Commands::Init(args) => run_init(args),
Commands::Upload(args) => run_upload(args),
Commands::Build { args } | Commands::BuildAndCopy { args } => {
    run_build(args, "build-and-copy")
Member:

I think I would only do a nix build for this subcommand. This unnecessarily creates another copy, which can be very large (e.g. FA) and the kernels tools work with the result directory/symlink as well (at least kernels upload).

drbh (Collaborator, Author):

makes sense, updated in latest

.filter_map(|e| e.ok())
.filter(|e| e.file_type().is_file())
.map(|e| e.into_path())
}
Member:

We should probably copy the upload test from the Python tests?

drbh (Collaborator, Author):

agreed, copied in latest changes

@drbh drbh force-pushed the kernel-builder-init-command branch from 3ed8a0f to 63dbeb1 on March 25, 2026 18:52
@sayakpaul sayakpaul left a comment

I left some comments regarding upload and the card components. Additionally:

  • Can the users optionally specify revision and repo_id in upload? I guess we always upload to a revision during our uploads (performed after the build) to respect our versioning system.
  • I see we have init_e2e.rs tests. But I think we should also have tests for upload and the card generation bits, preferably mimicking how we do it in the current Python CLI?
  • Do we want to update the docs in a separate PR?
  • I think we should require users to have hf-xet installed to benefit from fast uploads and downloads, if we're not doing it here already?
