Allow parsing source code directly #54

Open
bzoracler wants to merge 5 commits into mypyc:main from bzoracler:parse-source

@bzoracler bzoracler commented Apr 17, 2026

Resolves #21

Tests are part of python/mypy#21260

@ilevkivskyi
Collaborator

Thanks for the PR! Jukka or I will try to take a look next week. In the meantime, could you please resolve the merge conflict?

Comment thread src/serialize_ast.rs Outdated
Comment on lines +280 to +287
match pyo3::exceptions::PyUnicodeDecodeError::new_utf8(
    obj.py(),
    e.as_bytes(),
    utf8_err,
) {
    Ok(err) => Self::Error::from_value(err.into_any()),
    Err(err) => err,
}
Author

@bzoracler bzoracler Apr 28, 2026

Implementation taken from https://pyo3.rs/main/doc/src/pyo3/exceptions.rs#802-811 (PR https://github.com/PyO3/pyo3/pull/5668/changes), because new_err_from_utf8 is only in pyo3 0.29 which is not released yet.

The test is here: python/mypy@ac275e4
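
As a rough illustration of what the conversion above relies on, the sketch below uses only the Rust standard library to show the information a `std::str::Utf8Error` exposes for building a Python-style `UnicodeDecodeError`. It does not use pyo3. `DecodeErrorInfo`, `describe_utf8_error`, and `decode_source` are hypothetical names, and the span computation is an assumption rather than what pyo3 or this PR actually does.

```rust
use std::str::Utf8Error;

/// Hypothetical container for the fields a Python `UnicodeDecodeError` carries.
#[derive(Debug, PartialEq)]
struct DecodeErrorInfo {
    encoding: &'static str,
    start: usize,
    end: usize,
    reason: &'static str,
}

/// Derive the offending byte span from a `Utf8Error` (assumed mapping).
fn describe_utf8_error(err: &Utf8Error) -> DecodeErrorInfo {
    // Index of the first byte that is not part of valid UTF-8.
    let start = err.valid_up_to();
    // `error_len()` is `None` when the input ends in the middle of a
    // multi-byte sequence; this sketch falls back to a one-byte span.
    let len = err.error_len().unwrap_or(1);
    DecodeErrorInfo {
        encoding: "utf-8",
        start,
        end: start + len,
        reason: "invalid utf-8",
    }
}

/// Validate bytes as UTF-8, returning a borrowed `&str` on success
/// or a description of the failure.
fn decode_source(bytes: &[u8]) -> Result<&str, DecodeErrorInfo> {
    std::str::from_utf8(bytes).map_err(|e| describe_utf8_error(&e))
}
```

With this sketch, `decode_source(b"ab\xffcd")` fails with a span starting at byte offset 2, which is the kind of position information the Python exception reports to the caller.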

Comment thread src/serialize_ast.rs
)> {
    serialize_module(
        source,
        PySourceType::Python,
Author

I've hard-coded Python source type here, because mypy parsing functions never previously exposed an option to let the user treat source code directly as a .pyi stub source.
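
The decision described above could look roughly like the following std-only sketch. `SourceKind` and `SourceInput` are hypothetical stand-ins, not the types used in this PR, and the extension check for file inputs is an assumption about how a file-based path might choose its source type.

```rust
use std::path::Path;

/// Hypothetical stand-in for the parser's source-type enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SourceKind {
    Python,
    Stub,
}

/// Hypothetical description of where the source text comes from.
enum SourceInput<'a> {
    /// Source read from a file on disk.
    File(&'a Path),
    /// Source passed directly as text, with no file name attached.
    Text(&'a str),
}

/// Pick a source kind for the given input.
fn source_kind(input: &SourceInput<'_>) -> SourceKind {
    match input {
        // Assumption: a file path can be classified by its extension.
        SourceInput::File(path) => match path.extension().and_then(|e| e.to_str()) {
            Some("pyi") => SourceKind::Stub,
            _ => SourceKind::Python,
        },
        // Direct source has no extension to inspect, so it is always
        // treated as regular Python, mirroring the hard-coded choice.
        SourceInput::Text(_) => SourceKind::Python,
    }
}
```

Keeping the direct-source case fixed avoids adding a new user-facing option; a stub variant could be exposed later if callers turn out to need it.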

@bzoracler
Author

Hmm...there might be some unnecessary allocations that are preventable. I'm going to do some benchmarking with another implementation.

@bzoracler bzoracler marked this pull request as draft April 28, 2026 08:54
@bzoracler bzoracler marked this pull request as ready for review April 29, 2026 06:46
@bzoracler
Author

bzoracler commented Apr 29, 2026

If we're willing to use &str in downstream functions instead of String (which IMO is fine), 30bd655 removes unnecessary allocations when working with Python builtins.bytes passed as source code. Microbenchmarks on a test machine (Intel Core i7-10750H CPU @ 2.60GHz, x86_64, 12 cores) show ~45% reduction in allocation during type conversion from PyBytes to &str (tested with 100 MB source code), and the type conversion itself runs about 6x faster (although execution speed is dominated by parsing, so this speedup doesn't matter as much).

Performance of builtins.str passed as source code does not seem to be improvable unless stable-abi is increased to ["pyo3/abi3-py310"], after which we can use this extraction function. I tried the same microbenchmarks after increasing the stable-abi, got similar allocation reduction and a huge speedup in type conversion (thousands of times faster, but again the execution time is dominated by parsing so doesn't really matter here).
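
The `&str` versus `String` point above can be illustrated with the standard library alone. This is a minimal sketch, not the code in 30bd655, and it does not involve pyo3; `as_source_borrowed` and `as_source_owned` are hypothetical helper names. It shows that validating a byte slice in place yields a view into the same memory, while producing a `String` requires copying the bytes first.

```rust
/// Borrowing: validates the bytes in place and returns a view into them.
/// No new buffer is allocated.
fn as_source_borrowed(bytes: &[u8]) -> Option<&str> {
    std::str::from_utf8(bytes).ok()
}

/// Owning: copies the bytes into a fresh `Vec<u8>` before validating,
/// so the result lives in a separate heap allocation.
fn as_source_owned(bytes: &[u8]) -> Option<String> {
    String::from_utf8(bytes.to_vec()).ok()
}
```

For a large input the owned variant duplicates the whole buffer, which is the kind of cost the borrowed signature avoids in downstream functions.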

Development

Successfully merging this pull request may close these issues.

Support parsing strings in addition to files

2 participants