
feat: implement strict type validation using strict from huggingface_hub#393

Merged
sayakpaul merged 15 commits into main from strict-validation
Mar 25, 2026


@sayakpaul sayakpaul commented Mar 24, 2026

As discussed internally.

Note that I have chosen NOT to use pydantic for this, as per https://huggingface.co/docs/huggingface_hub/package_reference/dataclasses#why-not-use-pydantic--or-attrs--or-marshmallowdataclass-. Plus, we avoid adding a new dependency and can make do with what we already have (which is always nice). I can switch to pydantic if we jointly decide to.

Some inline comments.

Notes

  • Bump huggingface_hub minimum version to use @strict.
  • @strict can't validate packaging.version.Version (a third-party type), so I had to remove @strict from some classes, for example the Torch class in variants.py. If we decide to go the @strict way, we could follow up with the Hub team about adding support.
  • Had to add some dependencies to overlay.nix due to the version bump in Hugging Face Hub. I think that they should be quite safe.
  • Didn't do the Nix-related updates in terraform yet, in case we want to use pydantic.

from kernels.compat import has_torch


@runtime_checkable
Member Author

This is needed because @strict validates field types using isinstance() checks at construction time. When Arch.backend is typed as Backend (a Protocol), @strict tries isinstance(value, Backend), and isinstance() against a typing.Protocol class raises TypeError unless the protocol is decorated with @runtime_checkable.
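The isinstance() behavior described above can be reproduced with the standard library alone. This is a minimal sketch: PlainBackend, CheckableBackend, the name() method, and the CUDA class are hypothetical stand-ins, not the actual Backend protocol from this PR.

```python
from typing import Protocol, runtime_checkable


class PlainBackend(Protocol):
    """Hypothetical protocol WITHOUT @runtime_checkable."""

    def name(self) -> str: ...


@runtime_checkable
class CheckableBackend(Protocol):
    """Same hypothetical protocol, but usable with isinstance()."""

    def name(self) -> str: ...


class CUDA:
    """Hypothetical class that structurally satisfies both protocols."""

    def name(self) -> str:
        return "cuda"


def isinstance_raises(value, protocol) -> bool:
    """Return True if isinstance(value, protocol) raises TypeError."""
    try:
        isinstance(value, protocol)
    except TypeError:
        return True
    return False


# Without the decorator, the check itself fails with TypeError.
print(isinstance_raises(CUDA(), PlainBackend))
# With the decorator, isinstance() checks that the methods are present.
print(isinstance(CUDA(), CheckableBackend))
```

Note that a runtime-checkable protocol only checks that the named members exist; it does not check their signatures or return types.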

verified: bool | None = None # None = no verify fn, True = passed, False = failed
ref_mean_ms: float | None = None # Reference implementation mean time

def validate_iterations(self):
Member Author

These are "magic" methods. @strict auto-discovers any method named validate_* and calls it during __init__ (and on __setattr__). They replaced the manual checks that were previously inside from_dict().
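To illustrate the mechanism, here is a stdlib-only sketch of a decorator that discovers validate_* methods and runs them after __init__. It is not the huggingface_hub implementation: run_validators and TimingResults are hypothetical names, the positivity check inside validate_iterations is an assumption, and the __setattr__ hook mentioned above is left out for brevity.

```python
from dataclasses import dataclass


def run_validators(cls):
    """Hypothetical stand-in: call every validate_* method after __init__."""
    original_init = cls.__init__

    def __init__(self, *args, **kwargs):
        original_init(self, *args, **kwargs)
        # Discover validators by naming convention and run each one.
        for attr in dir(cls):
            if attr.startswith("validate_"):
                getattr(self, attr)()

    cls.__init__ = __init__
    return cls


@run_validators
@dataclass
class TimingResults:
    iterations: int

    def validate_iterations(self):
        # Assumed rule for illustration; the PR's actual check may differ.
        if self.iterations <= 0:
            raise ValueError(f"iterations ({self.iterations}) must be positive")


TimingResults(iterations=10)      # passes validation
try:
    TimingResults(iterations=0)   # rejected at construction time
except ValueError as exc:
    print(exc)
```

With this pattern the validation lives next to the fields it guards, so every construction path gets the same checks, not only from_dict().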

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Comment on lines +88 to +90
python-self.httpx
python-self.shellingham
python-self.typer-slim
Member Author

Can add their hash as well.

Member

Not needed if we use them from nixpkgs as-is.

@danieldk danieldk left a comment

Nice!!! Added a bunch of comments.

version=data.get("version"),
license=data.get("license"),
backends=data.get("backends"),
hub=HubConfig.from_dict(hub_data) if hub_data else None,
Member

Following up on the Slack discussion: this is one thing that goes wrong with hand-rolled deserialization. What if the user puts in something that makes hub_data truthy but is not a dict? Then HubConfig.from_dict will fail with a hard-to-understand error, whereas it should just say what the field/section is expected to be.

You can avoid this by programming very defensively (e.g. here, checking that it is a dict first), but you get those checks for free from a library that does deserialization for you.

I think for now it's ok, since we are going to replace this deserialization with Rust as discussed.
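As a rough sketch of the defensive check described in this comment: the HubConfig below is a simplified stand-in with a hypothetical repo_id field, and parse_hub_section is a hypothetical helper, not code from this PR.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class HubConfig:
    """Simplified stand-in; repo_id is a hypothetical field."""

    repo_id: Optional[str] = None

    @classmethod
    def from_dict(cls, data: dict) -> "HubConfig":
        return cls(repo_id=data.get("repo_id"))


def parse_hub_section(data: dict) -> Optional[HubConfig]:
    """Parse the optional 'hub' section, rejecting non-dict values early."""
    hub_data: Any = data.get("hub")
    if hub_data is None:
        return None
    if not isinstance(hub_data, dict):
        # Name the section and the expected type instead of failing
        # later inside from_dict with an unrelated AttributeError.
        raise TypeError(
            f"'hub' must be a table/mapping, got {type(hub_data).__name__}"
        )
    return HubConfig.from_dict(hub_data)


try:
    parse_hub_section({"hub": "not-a-dict"})
except TypeError as exc:
    print(exc)
```

Unlike the `if hub_data else None` form, this sketch treats an empty mapping as a present but empty section, which is a design choice for the illustration only.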

Member Author

Made a note about it as well in bc042be.

Comment on lines +42 to +44
raise ValueError(
f"min_capability ({self.min_capability}) must be <= max_capability ({self.max_capability})"
)
Member

Nice!


sayakpaul and others added 2 commits March 24, 2026 20:20
@danieldk danieldk left a comment

Great work! Nice to have all the extra validation!

@sayakpaul sayakpaul merged commit 131f49f into main Mar 25, 2026
38 checks passed
@sayakpaul sayakpaul deleted the strict-validation branch March 25, 2026 17:08

3 participants