The current Z-score outlier removal method is resulting in an increased number of false positives.

The current method for outlier removal (e.g. removing the top x% of an ordered list) seems to be producing a large number of false positives. I recommend replacing it with a numpy-optimized version of the original mean/sd-based approach and/or making the number of reads excluded a tunable parameter. Thanks to @jxmavs for raising this issue and for the suggestions!
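
To make the proposal concrete, here is a rough sketch of what a vectorized mean/sd z-score with a tunable exclusion count could look like. This is not the existing rendseq code: the function name `trimmed_z_scores`, the parameters `half_window` and `n_exclude`, the symmetric window, and the edge padding are all assumptions made for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def trimmed_z_scores(counts, half_window=50, n_exclude=1):
    """Hypothetical sketch: z-score each position against its local window.

    The n_exclude largest values in each window are dropped before the
    mean and standard deviation are computed, so n_exclude is the tunable
    "number of reads excluded" knob.
    """
    counts = np.asarray(counts, dtype=float)
    width = 2 * half_window + 1
    keep = width - n_exclude
    if not 0 < keep <= width:
        raise ValueError("n_exclude must be between 0 and the window width - 1")

    # Assumption: pad with edge values so every position gets a full window.
    padded = np.pad(counts, half_window, mode="edge")
    windows = sliding_window_view(padded, width)  # shape: (len(counts), width)

    # np.partition moves the `keep` smallest values of each row to the front,
    # which drops the n_exclude largest values without a full sort.
    kept = np.partition(windows, keep - 1, axis=1)[:, :keep]

    mean = kept.mean(axis=1)
    sd = kept.std(axis=1)

    # Positions whose window has zero spread get a z-score of 0.
    z = np.zeros_like(counts)
    np.divide(counts - mean, sd, out=z, where=sd > 0)
    return z
```

With `n_exclude=0` this reduces to a plain windowed mean/sd z-score, so the effect of the exclusion step on false positives could be compared directly by sweeping `n_exclude`.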

Examples of false positives (at a z-score of 14), with the false z-score in the center of each frame. Thanks to James for collecting the examples:

<img width="1246" alt="image" src="https://github.com/miraep8/rendseq/assets/10511777/7c71334c-8fef-4b55-904f-235b1e7d0a4a">
<img width="1245" alt="image" src="https://github.com/miraep8/rendseq/assets/10511777/d00b4a56-acb0-4ffb-8647-03fb49c59b19">





Current Z-score outlier removal method is resulting in increased number of false positives. #57
