Faster computation of quantiles in describe#2909
Conversation
Computing all quantiles when we only need the median is signficantly slower. Also avoid trying to compute quantiles for string columns since the failure only happens after sorting the vector, which is almost all of the work.
bkamins
left a comment
There was a problem hiding this comment.
Looks good. We could add a correctness test.
Also note that this assumes that some custom subtype of AbstractString does not define arithmetic operations on it, but I assume it is OK to make this assumption in practice.
We already cover this AFAICT.
Yes, in theory that's possible, though I'm not sure what sense it would make... An alternative would be to call |
|
Can we be even smarter about this and use |
"Partially sorted" refers to |
|
Ah I see. Nevermind, this looks good. |
|
I've pushed a commit implementing the more general approach, let me know what you think. It's funny to note that on Julia >= 1.7, thanks to https://github.com/JuliaLang/Statistics.jl/pull/72, a test on |
Computing all quantiles when we only need the median is signficantly slower (and it's even worse than it should due to https://github.com/JuliaLang/Statistics.jl/issues/84).
Also avoid trying to compute quantiles for string columns since the failure only happens after sorting the vector, which is almost all of the work.