Skip to content

NormalizeName transform uses unindexable REGEXP_REPLACE #1159

@git-hyagi

Description

@git-hyagi

Is your feature request related to a problem? Please describe.

class NormalizeName(models.Transform):
    function = "REGEXP_REPLACE"
    template = "LOWER(%(function)s(%(expressions)s, '(\.|_|-)', '-', 'ig'))"
    lookup_name = "normalize"

Every query using name__normalize= generates SQL like:

WHERE LOWER(REGEXP_REPLACE(name, '(\.|_|-)', '-', 'ig')) = 'torch'

This cannot use any index and forces a full table scan with regex computation on every row.

Describe the solution you'd like
Add a name_normalized field to PythonPackageContent that stores the pre-computed LOWER(REGEXP_REPLACE(name, ...)) value and change all name__normalize= lookups to use name_normalized__exact=. This eliminates the regex computation at query time.

Additional context
Another index that could improve performance is on name field, where it would help the order_by("name").distinct() query in SimpleView.list().

name = models.TextField()  # NO db_index=True

The name field is used in nearly every PyPI API query but has no database index. Every package lookup by name requires a sequential scan.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions