Update rwi.stats.running.R#34
Conversation
I have added a check whether the rwi dataset has been detrended using difference=T, thus having a grand mean of 0. If so, it needs to be transposed by an arbitrary positive number, otherwise normalize1() messes up the dataset and can divide a series by a slightly negative value, returning false rbar and eps values.
|
Hi Stefan, Thanks for tracking this down and taking the time to submit a fix. As I understand it, the issue is that normalize1() divides each series by the grand mean of rwi. If rwi has been detrended using difference=TRUE the grand mean will be near zero. Dividing by a value near zero - or a slightly negative value - produces nonsensical rbar and eps values, and the user has no idea anything went wrong. Is that a fair description of the problem? Your fix works for the specific case you're describing, but I'm hesitant to merge it for two reasons. First, adding 100 is arbitrary - it works, but it's hard to defend that number over any other, and it would confuse anyone reading the code later (something we find all the time in Ed’s legacy code — hard to unravel later). Second, rounding and checking for an exact grand mean of 0 is too narrow. A grand mean of -0.6 rounds to -1 and slips through, even though normalize1() would still have a problem with it. I think a stop() with an informative error is cleaner. It catches any non-positive grand mean and puts the fix in the user's hands, where it belongs: if(mean(unlist(rwi), na.rm=TRUE) <= 0) { I'll implement that and close this PR. But if you have a better approach or think I've misunderstood the problem, please open another PR. Thanks again for the careful work. |
|
Hi Andy, More general, and maybe I should open another request, when detrend(..., difference=T) then all the fail-safe mechanisms (divisor goes through zero --> divide by series mean) should be automatically switched off. Otherwise, one has to again arbitrarily transpose the data when there is objectively no need to (since we are subtracting and not dividing, it does not matter if the predictor crosses zero). Would you agree? And maybe concretely, why do we need the normalize1() command for eps and rbar calculation? I don't really understand the rationale for this step. Correlations work, no matter what their mean is, so dividing the series by its mean is unnecessary and causes all sorts of problems downstream if people do not use ratios because they detrended isotopes or had log-transformed (or power transformed) ring width series and their detrended mean is around 0, not 1. |
|
Hi @sklesse! Fair point on normalize1(). When n=NULL and prewhitening is off, it just divides by the column mean, which doesn't change rbar or eps since correlation is scale-invariant. So the stop() I added is too blunt - the better fix is to detect the situation and skip normalize1() when the data look like first differences. I'd welcome a PR for that. The main design question is where to draw the threshold for 'near zero' but that seems tractable. On ratio vs. log+subtract - the variance inflation problem when the detrending curve fits poorly is a real issue and worth a separate thread. I just pushed 1.7.9 to CRAN to address an unrelated bug. Planning another release in the next month or so - adding to the testing suite (long overdue) and better rwl class checks, which will touch a lot of files. A PR for the normalize1() auto-detect would fit in well with that work if you have time. Cheers, |
Hi Andy,
I have added a check whether the rwi dataset has been detrended using difference=T, thus having a grand mean of 0. If so, it needs to be transposed by an arbitrary positive number, otherwise normalize1() messes up the dataset and can divide a series by a slightly negative value, returning false rbar and eps values.
Right now, if people don't pay attention their stats are f***ed up. In 99% of the cases I'd say the normalize1() is not needed, but the easy workaround is to transpose the data if needed. Would be nice if this was an automatic check and operation in the function and dplR library. :-)
Thx for updating in the next round!
Cheers,
Stefan