-
Notifications
You must be signed in to change notification settings - Fork 83
Description
Is your feature request related to a problem? Please describe.
class NormalizeName(models.Transform):
function = "REGEXP_REPLACE"
template = "LOWER(%(function)s(%(expressions)s, '(\.|_|-)', '-', 'ig'))"
lookup_name = "normalize"Every query using name__normalize= generates SQL like:
WHERE LOWER(REGEXP_REPLACE(name, '(\.|_|-)', '-', 'ig')) = 'torch'This cannot use any index and forces a full table scan with regex computation on every row.
Describe the solution you'd like
Add a name_normalized field to PythonPackageContent that stores the pre-computed LOWER(REGEXP_REPLACE(name, ...)) value and change all name__normalize= lookups to use name_normalized__exact=. This eliminates the regex computation at query time.
Additional context
Another index that could improve performance is on name field, where it would help the order_by("name").distinct() query in SimpleView.list().
name = models.TextField() # NO db_index=TrueThe name field is used in nearly every PyPI API query but has no database index. Every package lookup by name requires a sequential scan.