Update python version, DLIO, and uv.lock #298
MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅
This is intended to resolve #293.
This one needs real testing before we merge it, so please, nobody merge it until we're sure it hasn't broken anything.
@FileSystemGuy I performed the following tests to validate the PR:
DLIO data generation test: …
DLIO benchmark run test: …
I also ran the existing unit/integration tests and saw 16 failed tests and 13 errors. However, those appear to be pre-existing bugs in the main-branch test suite. Reasoning for the failures:
Category 1: …
Category 2: …
Category 3: …
Category 4: …
Verifying that the Submodule Issue (#293) is actually addressed: The …
To summarize, this PR actually addresses issue #293 partially, in three ways: …
A follow-up cleanup to remove the submodule entirely (i.e. delete …) …
Overall, the PR looks safe to approve from a functional standpoint.
Devasena, thank you for digging into this!! Both for testing the pyproject.toml and uv.lock files, and for the suggestion that the submodule linkage is not required. Since using dlio_benchmark as a "submodule" causes trouble for the end user (they have to add the --recurse-submodules option when cloning the storage repo), if we can depend on the pyproject.toml file to pull our DLIO into the user's environment, then it sounds like we should go in that direction? IIUC, that means two things: …
Does that sound correct, and is it a good idea?
Thank you, Curtis! Yes, that's correct and a good idea.
pip path: … Can we support both in the README, since we are not sure all users will have … The key thing is that DLIO gets pulled in transitively either way: the user never needs to think about submodules, separate clones, or branch names.
One last thing to check is that no scripts in the repo reference …
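A minimal Python sketch of that last check: walk a checkout and report every line that mentions a given string. The string the comment was going to name is cut off in the thread, so the search term is a parameter here; the set of file suffixes treated as "scripts" and the skipped `.git`/`.venv` directories are assumptions for illustration.

```python
from pathlib import Path

# Assumption: which file types count as "scripts" worth scanning.
SCRIPT_SUFFIXES = {".sh", ".bash", ".py", ".yaml", ".yml", ".toml", ".cfg", ".md"}
# Directories holding VCS data or environments rather than repo scripts.
SKIP_DIRS = {".git", ".venv"}


def find_references(repo_root, needle, suffixes=SCRIPT_SUFFIXES):
    """Return (relative_path, line_number, stripped_line) for every matching line."""
    root = Path(repo_root)
    hits = []
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        # Skip anything under .git/ or .venv/, and anything that is not a file.
        if SKIP_DIRS.intersection(rel.parts) or not path.is_file():
            continue
        if path.suffix not in suffixes:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # binary or unreadable file; nothing to report
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((rel.as_posix(), lineno, line.strip()))
    return hits
```

Running `find_references(".", "<path>")` from the repo root and getting back an empty list would suggest nothing in the scanned file types still points at that path.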
To dig into one sub-topic: I'm attracted to the idea that uv will ensure that the correct versions of all the Python libraries we need are installed. My impression is that pip alone will not do that (i.e., pip does not reference uv.lock?). I'm building PRs for the other topics you pointed out.
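@FileSystemGuy You're right. pip does not read uv.lock; the lock file is purely a uv artifact, so the two paths aren't equivalent in terms of reproducibility. Shall we then just make uv sync the official install path in the README and submission guidelines, to ensure benchmark reproducibility? We can mention pip install -e ".[dlio]" as a "works but not guaranteed identical" option if needed. So, the README would just be updated with the following:
pip install uv
uv sync --extra dlio
@russfellows what are your thoughts on using just uv, as opposed to a pip fallback or pip with constraints as a middle ground?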
I did a bit more research.
The recommendation is to drop pip entirely, because it doesn't lock down the runtime properly. That is, make uv mandatory.
Here is the response I got from Gemini, which I think is correct:
------
Moving to uv is a smart play. In the current Python ecosystem, it’s effectively the "fast-forward" button for dependency management. If your goal is reproducibility—especially for benchmarks where a minor version drift in a library like torch or numpy can skew results—locking to uv.lock is the only way to fly.
Here is some advice on how to structure your setup and guide your users.
## 1. Commit to the "uv-First" Workflow
Don't offer pip as a co-equal alternative if reproducibility is your top priority. When you provide two ways to do something, users will pick the one they think they know (pip), and then they'll open issues when their environment doesn't match your benchmark.
The "Middle Ground" is a trap. Using pip with constraints is just a manual, slower version of what uv does automatically. If someone is technical enough to run a complex Python benchmark, they are technical enough to install uv.
### Recommended README Workflow
I would structure the installation section like this:
````markdown
## Setup
We use `uv` for deterministic, reproducible environments.

1. **Install uv**:
   ```bash
   curl -LsSf https://astral.sh/uv/install.sh | sh
   ```
2. **Sync the environment**: This creates a virtual environment (`.venv`) and installs all dependencies (including editable project code) exactly as specified in the lockfile.
   ```bash
   uv sync --extra dlio
   ```
3. **Run the code**: Always prefix commands with `uv run` to ensure you are using the locked environment.
   ```bash
   uv run python your_script.py
   ```
````
---
## 2. Handling the "Editable" Install
You mentioned needing `pip install -e .`. The beauty of `uv sync` is that **it handles this by default.** If your `pyproject.toml` declares a `[build-system]` (so uv treats the project as an installable package), `uv sync` will automatically install your local package in **editable mode** into the `.venv` it creates. You don't need a separate step for it.
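Whether the project really ended up installed in editable mode can be checked from inside the environment. Installers that follow PEP 610 write a `direct_url.json` file into the installed distribution's metadata, and an editable install of a local directory records `"dir_info": {"editable": true}`. A small sketch; the distribution name passed in (presumably the `name` declared in `pyproject.toml`) is the caller's assumption.

```python
import json
from importlib import metadata


def is_editable(direct_url_json):
    """True if a PEP 610 direct_url.json payload describes an editable install."""
    if not direct_url_json:
        return False  # no direct_url.json: installed from an index, not a local dir
    info = json.loads(direct_url_json)
    return bool(info.get("dir_info", {}).get("editable", False))


def installed_as_editable(dist_name):
    """Look up an installed distribution and report whether it is editable."""
    try:
        dist = metadata.distribution(dist_name)
    except metadata.PackageNotFoundError:
        return False
    # read_text returns None when the metadata file does not exist.
    return is_editable(dist.read_text("direct_url.json"))
```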
---
## 3. Why `uv sync` beats `pip install`
If you need to convince your team, here is the breakdown:
| Feature | `pip install -e .` | `uv sync` |
| :--- | :--- | :--- |
| **Speed** | Slow (re-resolves every time) | Fast (reuses `uv.lock` and a global cache) |
| **Reproducibility** | Low (depends on when you run it) | **High** (reads `uv.lock` exactly) |
| **Cleanliness** | Leaves old packages behind | **Purges** unlisted packages |
| **Venv Management** | Manual (`python -m venv...`) | **Automatic** |
---
## 4. Addressing your specific inputs
* **On `uv.lock`:** You are 100% correct. If a user runs `pip install`, they are effectively ignoring your lockfile and asking `pip` to resolve dependencies from scratch. This is how "it works on my machine" bugs are born.
* **On the "Fallback":** Instead of a fallback, list `pip` under a **"Legacy or Alternative Installation"** header at the bottom of the README. Explicitly state: *"Using pip will not honor the lockfile and may result in different library versions than those used in our official benchmarks."*
* **The Command:** Your proposed command `uv sync --extra dlio` is perfect. Just ensure your `pyproject.toml` defines `dlio` under `[project.optional-dependencies]`.
## 5. Pro-Tip: The `.python-version` file
To make it even more foolproof, include a `.python-version` file in your repo (e.g., containing just `3.11`). When a user runs `uv sync`, it will see that file and automatically download/use the correct Python version, so they don't even have to worry about what version of Python they have installed globally.
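uv handles this pin on its own. To show what the pin means, here is a rough Python sketch that reads the pin and checks it against the running interpreter. It assumes the file holds a plain dotted version such as `3.11` or `3.12.4`; uv also accepts other request formats, which this simplified parser does not handle. The helper names are hypothetical.

```python
import sys
from pathlib import Path


def pinned_python(path=".python-version"):
    """Return the first non-empty line of a .python-version file, or None."""
    p = Path(path)
    if not p.is_file():
        return None
    for line in p.read_text().splitlines():
        if line.strip():
            return line.strip()
    return None


def matches_pin(pin, version_info=sys.version_info):
    """True if the interpreter matches a dotted pin: '3.12' matches any 3.12.x."""
    parts = tuple(int(x) for x in pin.split("."))
    return tuple(version_info[: len(parts)]) == parts
```

A two-component pin like `3.12` matches any 3.12.x interpreter, while a three-component pin like `3.12.4` matches only that exact patch release.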
——
Regards,
—Russ
On Apr 2, 2026, at 4:47 PM, Devasena I wrote:
@FileSystemGuy You're right. pip does not read uv.lock. The lock file is purely a uv artifact. So the two paths aren't equivalent in terms of reproducibility.
Shall we just then make uv sync the official install path in the README and submission guidelines, ensuring benchmark reproducibility. We can mention pip install -e ".[dlio]" as a "works but not guaranteed identical" if needed.
So, the README will then just be updated with below:
pip install uv
uv sync --extra dlio
@russfellows what are your thoughts on using just uv as opposed to pip fallback or pip with constraints as a middle ground?
Based on Russ' research, I made one change in the PR: I moved the dlio dependency to required rather than optional. Without this change, the user has to add "--extra dlio" to the "uv sync" command, which they will likely forget to do, so it's better to download it even if they don't use it than to not get it when they need it. I will make the suggested changes to the README to note that "uv" is the required package manager, including instructions on how to install uv, that pip is explicitly not supported for use with the benchmark, and that "uv run" should be prepended to all uses of mlpstorage to ensure that the correct versions of all libraries are used. I'll also see if there's a way to force "uv run" into mlpstorage itself to guarantee that it is used on every benchmark execution. Is this too much micromanaging, or is it better for ease of use and repeatability?
I took my own advice and decided to fix this once and for all. I created PR #308 to prepend "uv run" to every invocation of mlpstorage. Whether or not users installed with pip, uv will force the creation and use of the correct virtual environment. If they haven't installed uv yet, the command will fail and they will have to install uv. How's that for a big hammer? :-)
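PR #308's actual implementation isn't shown in this thread. One plausible shape for the same idea, sketched in Python: if the process isn't already running under `uv run`, re-exec the same command as `uv run <command> ...`, and exit with a clear message when uv isn't on `PATH`. The marker environment variable is hypothetical and exists only so the re-executed process doesn't loop. The sketch also assumes mlpstorage is installed as a console script in the project environment and is invoked from inside the project directory, so `uv run` can find `pyproject.toml`.

```python
import os
import shutil
import sys

# Hypothetical marker set on the child so it does not re-exec itself again.
REEXEC_MARKER = "MLPSTORAGE_UNDER_UV_RUN"


def build_uv_command(argv):
    """`uv run <command-name> <args...>`; the bare name lets uv resolve it in .venv."""
    return ["uv", "run", os.path.basename(argv[0]), *argv[1:]]


def ensure_uv_run(argv=None, environ=None, exec_fn=os.execvpe, which=shutil.which):
    """Re-exec under `uv run` unless already there. Returns False if no re-exec was needed."""
    argv = list(sys.argv if argv is None else argv)
    environ = dict(os.environ if environ is None else environ)
    if environ.get(REEXEC_MARKER) == "1":
        return False  # already running inside `uv run`
    if which("uv") is None:
        sys.exit("mlpstorage requires uv; install uv, then rerun this command.")
    environ[REEXEC_MARKER] = "1"
    # os.execvpe replaces the current process, so the return below is only
    # reached when exec_fn is a stand-in (e.g. in tests).
    exec_fn("uv", build_uv_command(argv), environ)
    return True
```

The idea would be to call `ensure_uv_run()` first thing in mlpstorage's entry point. A design consequence worth noting: a user who already typed `uv run mlpstorage` would get one extra, harmless re-exec, because the marker is only set by the wrapper itself.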
* Constrain Python to the 3.12 family only
* Change the DLIO dependencies in pyproject.toml to refer to DLIO_local_changes
* Update the uv.lock file to reflect those changes
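The exact `requires-python` string this PR uses isn't shown; `">=3.12,<3.13"` is one common way to express "the 3.12 family only". Here is a simplified sketch of what such a specifier means for the running interpreter. It handles only the six plain comparison operators on numeric versions, with PEP 440-style zero padding, and does not support `~=`, wildcards, or pre-releases. Real code should use the `packaging` library's `SpecifierSet` instead.

```python
import operator
import sys

_OPS = {
    ">=": operator.ge, "<=": operator.le, "==": operator.eq,
    "!=": operator.ne, ">": operator.gt, "<": operator.lt,
}


def satisfies(spec, version_info=sys.version_info):
    """Evaluate a simplified specifier such as '>=3.12,<3.13' for an interpreter."""
    current = tuple(version_info[:3])
    for clause in spec.split(","):
        clause = clause.strip()
        # Check two-character operators before their one-character prefixes.
        for sym in (">=", "<=", "==", "!=", ">", "<"):
            if clause.startswith(sym):
                bound = tuple(int(p) for p in clause[len(sym):].strip().split("."))
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
        # Zero-pad like PEP 440, so '<3.13' compares as (3, 13, 0).
        padded = bound + (0,) * (3 - len(bound))
        if not _OPS[sym](current, padded):
            return False
    return True
```

With that constraint, any 3.12.x interpreter passes, while 3.11 and 3.13 are both rejected.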