Add automated type checking#538

Draft
nateprewitt wants to merge 6 commits into fsspec:main from nateprewitt:type_checking

Conversation

@nateprewitt
Collaborator

This is a draft PR to serve as a talking point on how we'd enable automated type checking for current and future PRs. That will help enforce the existing typing and make sure we're keeping types up to date.

This PR touches a number of places in the code base, so I've split it into 3 distinct commits.

  • Enabling automated type checking with pyright; for now this is non-blocking for merge.
  • Fixing existing bugs/inaccurate typing in the code base.
  • Migrating remaining typing to PEP 585 and PEP 604.

There are some cases where we're adding inline ignores, so I'm going to comment on those specifically with context. We can discuss the tradeoffs there. We'll also need to get the other outstanding typing PRs merged before this is ready for final review.

Collaborator Author

@nateprewitt nateprewitt left a comment

Here's an initial overview of the changes.

adlfs/gen1.py Outdated
  def __setstate__(self, state):
      logger.debug("De-serialize with state: %s", state)
-     self.__dict__.update(state)
+     self.__dict__.update(state)  # type: ignore[reportAttributeAccessIssue]
Collaborator Author

This is an interaction between pyright and inheriting from an untyped third-party library. While this should be a generic Mapping/dict on an instance of the class, pyright is inferring it to be MappingProxyType[str, Any], which isn't guaranteed to have update.

I believe we either need to get this addressed at the pyright level or with typing in fsspec. Neither of those is likely to happen soon, so I've added the inline ignore.

I also know we may remove this gen1.py file at some point but this same issue occurs elsewhere in the changes.
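A minimal sketch of the workaround shape (the class and attribute names are hypothetical, not the adlfs code): a runtime no-op cast tells the checker that `__dict__` is a mutable dict rather than `MappingProxyType[str, Any]`.

```python
import pickle
from typing import Any, cast


class Restorable:
    """Hypothetical stand-in for a class inheriting from an untyped base."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __setstate__(self, state: "dict[str, Any]") -> None:
        # cast() is a no-op at runtime; it only tells the type checker
        # that __dict__ here is a mutable dict, not MappingProxyType.
        cast("dict[str, Any]", self.__dict__).update(state)


obj = pickle.loads(pickle.dumps(Restorable("gen1")))
print(obj.name)  # -> gen1
```

An explicit cast like this is an alternative to the inline ignore; both change nothing at runtime.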

Comment on lines +129 to 132
  service_client_kwargs: dict[str, Any] = {
      "account_url": account_url,
      "user_agent": _USER_AGENT,
  }
Collaborator Author

This needs to be explicitly typed because values are later mutated to types beyond the inferred dict[str, str]. This is the most sensible future-proof option since it's kwargs. If we really want to be strict, we can enumerate all of the current types, but that may need to be updated in the future.
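A minimal illustration of the inference problem (the account values here are fake, not adlfs's):

```python
from typing import Any

# Without the explicit dict[str, Any] annotation, the checker would infer
# dict[str, str] from the literal and reject the int-valued mutation below.
service_client_kwargs: "dict[str, Any]" = {
    "account_url": "https://example.blob.core.windows.net",
    "user_agent": "example-agent",
}

# Mutating with a non-str value is fine under dict[str, Any]:
service_client_kwargs["max_single_put_size"] = 4 * 1024 * 1024
```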


  if max_concurrency is None:
-     batch_size = _get_batch_size()
+     batch_size: int = _get_batch_size()  # type: ignore[assignment]
Collaborator Author

We know this is a number in all cases (we do a numerical comparison on the next line), but _get_batch_size() can technically return None at the fsspec layer. This declares the type we know it to be and that the code is built around.

We could add special handling for None if we want; I just wasn't able to find a way to actually trigger it for adlfs.

Comment on lines +392 to +393
  else:
      max_concurrency = 1
Collaborator Author

This is resolving a configuration edge case. self.max_concurrency is the fallback if max_concurrency is None. That will be an int in all cases except if the operating system returns a batch_size < 1. At that point we never initialize the value correctly, and the SDK errors out because it didn't receive an int.

This change moves us to single request throughput (also the SDK default) if we can't determine a batch_size from the OS. That is what would have been intended by None (use the default) before.
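The fallback described above could be condensed roughly like this (the function name and shape are illustrative, not the adlfs code):

```python
from typing import Optional


def resolve_max_concurrency(max_concurrency: Optional[int], batch_size: int) -> int:
    # Hypothetical condensation of the fallback logic described above.
    if max_concurrency is None:
        if batch_size > 1:
            max_concurrency = batch_size
        else:
            # No usable batch size from the OS: fall back to a single
            # request (the SDK default), matching what None intended.
            max_concurrency = 1
    return max_concurrency


print(resolve_max_concurrency(None, 0))  # -> 1
```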

Collaborator

I'm a little hesitant to change this fallback value as part of supporting type checking, especially if type checking can pass without this change. Mainly I would want to understand the intention of a return value of zero or negative from _get_batch_size() and compare with other fsspec implementations before committing to how it is interpreted.

Collaborator Author

I have a PR open with the Azure SDK that should help fix this issue. The APIs are inconsistently typed: some allow None, others don't. A different subset actually works with None (this is one of them).

Once we get that aligned, we should be fine to remove this piece.

Collaborator Author

We got 3 PRs merged upstream. Once the next Python SDK release goes out, we should be able to remove this.

"""
if isinstance(path, list):
return [cls._strip_protocol(p) for p in path]
return [cls._strip_protocol(p) for p in path] # type: ignore[return-value]
Collaborator Author

We already explicitly require path to be a str. The upstream fsspec version of this also only uses a str but has this code path in the event we somehow get a list. This block keeps parity with upstream but isn't possible through the current public API for adlfs.

We can introduce a change that records a return type of str | list[str], but it significantly complicates the typing story without providing more accuracy.
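For reference, recording both return types would look roughly like this with overloads (the helper name and protocol prefix are illustrative, not the adlfs signature):

```python
from typing import overload


@overload
def strip_protocol(path: str) -> str: ...
@overload
def strip_protocol(path: "list[str]") -> "list[str]": ...
def strip_protocol(path):
    # Recurse for lists, mirroring the upstream fsspec code path.
    if isinstance(path, list):
        return [strip_protocol(p) for p in path]
    return path.removeprefix("abfs://")


print(strip_protocol("abfs://container/file"))  # -> container/file
```

Callers with a str still get a str back from the checker, which is why the overloads are more precise than a plain str | list[str] return type.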

- def expand_path(self, path, recursive=False, maxdepth=None, skip_noexist=True):
-     return sync(
+ def expand_path(self, path, recursive=False, maxdepth=None, skip_noexist=True) -> list[str]:
+     return sync(  # type: ignore[return-value]
Collaborator Author

sync is defined in fsspec without a return type. It's effectively acting as a proxy from async to sync, so we could likely do something with generics for this. The only issue is that the best ways to do that require 3.12+.

For the time being, we know this is the same return type, so it's ignored.

Comment on lines +2038 to +2041
  if block_size == "default" or block_size is None:
      self.blocksize: int = self.DEFAULT_BLOCK_SIZE
  else:
      self.blocksize = block_size
Collaborator Author

This is a limitation of pyright. The change doesn't have any notable performance impact; it just allows the type checker to reason about the assignment. pyright isn't currently able to extrapolate the in check to the equivalent chain of comparisons.
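A minimal illustration of the pattern (the constant value and the Literal annotation are assumptions for the sketch, not the adlfs code):

```python
from typing import Literal, Union

DEFAULT_BLOCK_SIZE = 5 * 2**20  # illustrative value


def pick_blocksize(block_size: Union[int, Literal["default"], None]) -> int:
    # Written as explicit comparisons instead of
    # `block_size in ("default", None)` so pyright can narrow
    # block_size down to int on the fall-through path.
    if block_size == "default" or block_size is None:
        return DEFAULT_BLOCK_SIZE
    return block_size
```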

  try:
      if hasattr(self.fs, "account_host"):
-         self.fs.account_url: str = f"https://{self.fs.account_host}"
+         self.fs.account_url = f"https://{self.fs.account_host}"
Collaborator Author

These aren't valid type declarations because they're annotating self.fs, not a member of the class directly. The correct typing already lives on the fs implementation.

  await bc.stage_block(
      block_id=block_id,
-     data=data[start:end],
+     data=data[start:end],  # type: ignore[arg-type]
Collaborator Author

I talked with the Azure Python SDK folks and am going to open a PR about this. The current definition doesn't support memoryview or bytearray, which are both valid inputs. Once that's fixed there, this ignore is no longer needed.

      data=data,
      length=length,
-     blob_type=BlobType.AppendBlob,
+     blob_type=BlobType.APPENDBLOB,
Collaborator Author

AppendBlob is an alias pointing to this Enum but isn't explicitly declared as a member. That leads the type checker to infer it as Unknown. We'll use the Enum member directly instead of the alias; it's semantically identical.
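A minimal illustration of the alias behavior (this is a toy enum, not the Azure SDK's BlobType):

```python
from enum import Enum


class BlobType(str, Enum):
    APPENDBLOB = "AppendBlob"
    BLOCKBLOB = "BlockBlob"


# A module-level alias like this is the kind of thing a library exposes for
# backwards compatibility; because it isn't a declared enum member, a type
# checker may infer its type as Unknown even though at runtime it is
# exactly the same object as the member.
AppendBlob = BlobType.APPENDBLOB

print(AppendBlob is BlobType.APPENDBLOB)  # -> True
```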

Comment on lines +1 to +6
version: str
__version__: str
version_tuple: tuple[int | str, ...]
__version_tuple__: tuple[int | str, ...]
commit_id: str | None
__commit_id__: str | None
Collaborator Author

This covers the _version.py generated by setuptools_scm at installation time. That file doesn't exist in the repository, so CI complains it can't resolve typing from the utils method. You can find a similar setup in Black to handle this.

Collaborator

@kyleknap kyleknap left a comment

Looks good. I just had a few comments.


  type-check:
    runs-on: "ubuntu-22.04"
    continue-on-error: true
Collaborator

Is there a particular reason this was not added to the pre-commit hook and requirements/dev.txt file? Was it to allow continuing when there is an error? That seems to be where most of the code checkers are, and it would also mean anyone who uses the pre-commit hooks will be able to run the type checker as well.

Collaborator Author

No, we can definitely add them there. This was originally written to get testing bootstrapped on a machine other than my local one, but we should make sure it's easily reproducible everywhere.

The continue-on-error was to avoid blocking PRs just because there was a typing issue. I'm indifferent on whether we want to keep that. What I wanted to avoid was people adding ignores or making the types potentially worse just to get around the type checker.

