Add a stored normalized name column with an index #1160

Open
git-hyagi wants to merge 1 commit into pulp:main from git-hyagi:pulp-python-high-db-load-issue
@git-hyagi
Contributor

Add a name_normalized field to PythonPackageContent that stores the pre-computed LOWER(REGEXP_REPLACE(name, ...)) value, populated via a BEFORE_SAVE hook.
Add db_index=True.
Change all name__normalize= lookups to use name_normalized__exact=. This eliminates the regex computation at query time.
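
The approach described above can be sketched outside Django with the standard-library sqlite3 module. This is only an illustration, not the code in this PR: the table name, the index name, and the save_package and find_by_name helpers are hypothetical, and the normalization assumes the elided REGEXP_REPLACE arguments follow PEP 503 (runs of "-", "_" and "." collapse to a single "-"), which the description does not spell out.

```python
import re
import sqlite3


def normalize_name(name: str) -> str:
    # Assumption: PEP 503 normalization stands in for the elided
    # LOWER(REGEXP_REPLACE(name, ...)) expression.
    return re.sub(r"[-_.]+", "-", name).lower()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE package ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, name_normalized TEXT NOT NULL)"
)
# Hypothetical index name; plays the role of db_index=True on the stored column.
conn.execute("CREATE INDEX package_name_normalized_idx ON package (name_normalized)")


def save_package(name: str) -> None:
    # Stand-in for the BEFORE_SAVE hook: compute the normalized value once, at write time.
    conn.execute(
        "INSERT INTO package (name, name_normalized) VALUES (?, ?)",
        (name, normalize_name(name)),
    )


def find_by_name(requested: str) -> list:
    # Equality on the stored column; no regex is evaluated per row at query time.
    rows = conn.execute(
        "SELECT name FROM package WHERE name_normalized = ?",
        (normalize_name(requested),),
    ).fetchall()
    return [row[0] for row in rows]


for raw in ("Django", "zope.interface", "Flask_SQLAlchemy"):
    save_package(raw)

# Ask SQLite how it would run the lookup; the last column holds the plan text.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT name FROM package WHERE name_normalized = ?",
    ("django",),
).fetchall()
plan_text = " ".join(row[-1] for row in plan)
```

In this sketch, find_by_name("flask-sqlalchemy") matches the row stored as Flask_SQLAlchemy, and plan_text names the index rather than reporting a full table scan. PostgreSQL's planner differs in detail, but the idea is the same: an equality predicate on a plain indexed column can be served by the index, while a predicate wrapped in a regex function call cannot use a plain index on the raw column.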

closes: #1159
Assisted By: claude-opus-4.6

📜 Checklist

  • Commits are cleanly separated with meaningful messages (simple features and bug fixes should be squashed to one commit)
  • A changelog entry or entries has been added for any significant changes
  • Follows the Pulp policy on AI Usage
  • (For new features) - User documentation and test coverage has been added

See: Pull Request Walkthrough

  license = models.TextField()  # Deprecated in favour of License-Expression
  metadata_version = models.TextField()
- name = models.TextField()
+ name = models.TextField(db_index=True)
Contributor

Since all the other filters in pypi/views will use name_normalized, I think this index on name is almost unused. The only place it will be used is in normal content filtering on the Pulp endpoint. Should we switch that name filter to use the new field under the hood?
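
For illustration, one way to read that suggestion: the user-facing name filter keeps its name, but the value is normalized and matched against the stored column. The helper name below is hypothetical, and the normalization again assumes PEP 503 rules; the actual filter code in pulp_python is not shown in this thread.

```python
import re


def normalize_name(name: str) -> str:
    # Assumption: PEP 503 normalization (not confirmed by the thread).
    return re.sub(r"[-_.]+", "-", name).lower()


def name_filter_lookup(value: str) -> dict:
    """Hypothetical helper: turn ?name=<value> into ORM lookup kwargs
    that hit the stored, indexed column instead of the raw name."""
    return {"name_normalized__exact": normalize_name(value)}


# The returned dict would be passed as queryset.filter(**lookup) in Django.
lookup = name_filter_lookup("Flask_SQLAlchemy")
```

A side effect of this reading is that the filter becomes insensitive to case and to the choice of separator, so a request for Flask_SQLAlchemy and one for flask-sqlalchemy resolve to the same lookup.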

Contributor

Can we remove this index now that you switched the filter over?

Contributor Author

yeap

@git-hyagi force-pushed the pulp-python-high-db-load-issue branch from 2edeffd to 02d0537 on March 24, 2026 19:28
@git-hyagi requested a review from gerrod3 on March 24, 2026 19:42
Add a name_normalized field to PythonPackageContent that stores
the pre-computed LOWER(REGEXP_REPLACE(name, ...)) value, populated
via a BEFORE_SAVE hook.
Add db_index=True.
Change all name__normalize= lookups to use name_normalized__exact=.
This eliminates the regex computation at query time.

closes: pulp#1159
Assisted By: claude-opus-4.6
@git-hyagi force-pushed the pulp-python-high-db-load-issue branch from 02d0537 to 53b6f61 on March 24, 2026 20:32
Contributor

@gerrod3 left a comment

Thanks!

@@ -211,6 +214,11 @@ class PythonPackageContent(Content):
name.register_lookup(NormalizeName)
Contributor

Do we still need name.register_lookup(NormalizeName) with this change?

Development

Successfully merging this pull request may close these issues.

NormalizeName transform uses unindexable REGEXP_REPLACE

3 participants