Fix hardcoded zeros for encode(0, ...) and padding#27
Fix hardcoded zeros for encode(0, ...) and padding#27sd-yip wants to merge 2 commits intosuminb:developfrom
encode(0, ...) and padding#27Conversation
|
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
|
@suminb Would you mind taking a look? |
There was a problem hiding this comment.
Pull request overview
Fixes incorrect roundtrips for encode(0, charset=...) and leading-zero byte padding by removing hardcoded "0" assumptions and consistently using charset[0] as the zero digit / marker.
Changes:
- Update
encode(0, charset=...)to returncharset[0]instead of the hardcoded"0". - Update
encodebytes/decodebytesleading-zero padding logic to usecharset[0]instead of"0". - Add a new test validating encode/decode behavior with a non-default (QWERTY) charset.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
base62.py |
Removes hardcoded "0" usage by switching to charset[0] for zero and byte-padding markers. |
tests/test_basic.py |
Adds coverage for encode(0) / decode() with a custom charset to prevent regressions. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # Encode the leading zeros as "0" followed by a character indicating the count. | ||
| # This pattern may occur several times if there are many leading zeros. |
There was a problem hiding this comment.
The comment describing leading-zero padding is now inaccurate: the marker is no longer the literal string "0", but charset[0]. Please update the comment so it matches the new behavior (and avoids confusing readers when using non-default charsets).
| # Encode the leading zeros as "0" followed by a character indicating the count. | |
| # This pattern may occur several times if there are many leading zeros. | |
| # Encode the leading zeros using charset[0] as a marker, followed by a character | |
| # indicating the count. This pattern may occur several times if there are many | |
| # leading zeros. |
| def test_basic_qwerty(): | ||
| kwargs = {"charset": CHARSET_QWERTY} | ||
|
|
||
| assert base62.encode(0, **kwargs) == "1" | ||
| assert base62.decode("1", **kwargs) == 0 | ||
| assert base62.decode("1111", **kwargs) == 0 | ||
| assert base62.decode("111112", **kwargs) == 1 |
There was a problem hiding this comment.
This adds coverage for encode(0) / decode() with a non-default charset, but the PR also changes the non-default-charset behavior of encodebytes() / decodebytes() leading-zero padding. Please add assertions that encodebytes/decodebytes roundtrip (including leading null bytes) with CHARSET_QWERTY (and/or CHARSET_INVERTED) so the padding changes are exercised by tests.
Currently,
decode(encode(0, charset=MY_CHARSET), charset=MY_CHARSET)may not yield0due to hardcoding. This proposed change eliminates hardcoding instances and resolves the issue.