Skip metrics can lead to degenerate performance #219

@okyangyishen

Description

For faster iteration I tested an eval config that skips almost every metric and keeps only `pearson_delta` and `discrimination_score_l1`, via `evaluator.compute(profile="full", metric_configs={}, skip_metrics=skip_metrics)`. Unexpectedly, those two metrics came out much worse in the reduced run, even though the model predictions were identical. This looks like a bug in the cell_eval pipeline: skipping many metrics appears to change internal intermediate state (likely a hidden dependency or ordering effect between metric computations), which makes `pearson_delta` and `discrimination_score_l1` unreliable in that reduced setup.
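For reference, a minimal sketch of the reduced-versus-full comparison described above. The evaluator construction and the full list of metric names are assumptions here (they are passed in as parameters rather than invented); only the `compute(...)` calls mirror the invocation quoted in the report.

```python
# Sketch of the reduced-eval repro. The evaluator object and the full metric
# name list are assumed inputs; only the compute(...) calls mirror the
# invocation quoted above.

KEEP = {"pearson_delta", "discrimination_score_l1"}


def run_reduced_vs_full(evaluator, all_metric_names):
    """Run the same evaluator twice on identical predictions.

    `evaluator` is an already-constructed cell_eval evaluator and
    `all_metric_names` is the full list of metric names (both assumed).
    """
    # Skip everything except the two metrics kept for fast iteration.
    skip_metrics = [m for m in all_metric_names if m not in KEEP]

    # Reduced run, as described in the report.
    reduced = evaluator.compute(
        profile="full",
        metric_configs={},
        skip_metrics=skip_metrics,
    )

    # Full run with nothing skipped, on the same predictions.
    full = evaluator.compute(profile="full", metric_configs={})

    # Expectation: pearson_delta and discrimination_score_l1 agree between the
    # two runs; observed: they are much worse in the reduced run.
    return reduced, full
```

If the two runs disagree only when `skip_metrics` is non-empty, that would point at shared intermediate state between metric computations rather than at the predictions themselves.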
