This repository was archived by the owner on May 19, 2025. It is now read-only.

Tiered datasets, a limit, or none? #4

@danielamitay

The dataset from my initial push was actually filtered down (by number of reviews and by availability) from an original collection of more than 20,000 URL schemes.
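For illustration, the filtering described above might look roughly like the following Python sketch. The `SchemeRecord` fields, the `filter_schemes` helper, and the `min_reviews` and `limit` parameters are all hypothetical; the issue does not describe the actual data format or the thresholds used. The optional `limit` shows how a capped tier could be cut from the same data.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SchemeRecord:
    # Hypothetical record shape; the real dataset format is not described here.
    scheme: str
    review_count: int
    available: bool


def filter_schemes(
    records: List[SchemeRecord], min_reviews: int, limit: Optional[int] = None
) -> List[str]:
    """Keep available apps with at least `min_reviews` reviews, most-reviewed first.

    If `limit` is given, only the top `limit` schemes are kept (one possible "tier").
    """
    kept = [r for r in records if r.available and r.review_count >= min_reviews]
    # Sort so that a limit keeps the most-reviewed (likely most common) apps.
    kept.sort(key=lambda r: r.review_count, reverse=True)
    if limit is not None:
        kept = kept[:limit]
    return [r.scheme for r in kept]
```

With `limit=None` this matches the "no limit" option (only the quality filters apply); passing a number gives a fixed-size tier.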

Previously, this dataset had to be retrieved from the server, and each additional scheme check takes about 1 ms, so I did this filtering to save bandwidth and time. Now that the current dataset adds only about 180 kB to a compiled app, perhaps a limit on the number of URL schemes is unnecessary?

As an extreme example, if we were to collect 100,000 URL schemes, not only would the compressed file size jump to over 1 MB, but the detection process itself would take about 7 times longer to complete.
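A rough linear model makes this trade-off concrete. The Python sketch below assumes that both detection time and added size grow linearly with the number of schemes, uses the ~1 ms per check and ~180 kB figures from this issue, and takes the current scheme count to be about 14,300. That count is an assumption inferred from the sevenfold slowdown estimate; the issue does not state it. The `estimate` helper is hypothetical.

```python
from typing import Tuple

# Figures from this issue: ~1 ms per scheme check, ~180 kB for the current dataset.
MS_PER_CHECK = 1.0
CURRENT_SIZE_KB = 180.0
# Assumption: current count implied by "100,000 schemes would take ~7x longer".
CURRENT_COUNT = 100_000 / 7  # about 14,286 schemes


def estimate(count: int) -> Tuple[float, float]:
    """Return (detection time in seconds, added size in kB) for `count` schemes.

    Both values are scaled linearly from the figures above.
    """
    seconds = count * MS_PER_CHECK / 1000
    size_kb = count * CURRENT_SIZE_KB / CURRENT_COUNT
    return seconds, size_kb
```

Under these assumptions, `estimate(100_000)` gives about 100 s of scheme checks and roughly 1,260 kB of data, consistent with the "over 1 MB" figure above.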

Thoughts?

Tagging regarding the dataset: @HBehrens @steipete