Add support for KeyedArrays as a Tables sink#21
Conversation
mcabbott
left a comment
There was a problem hiding this comment.
Looks good, thanks for the PR!
I scribbled a few comments, but haven't actually tried this yet. One general one is: does it behave OK with bad input? Mentioning a non-existent column name probably gives a sensible error already.
populate! will also silently over-write data. I wonder whether it can (or should) give some warning. For instance could it keep a BitSet with LinearIndex(I) as it goes, perhaps that would not cost much speed?
Yeah, most of the errors fallback on whatever the table chooses, which is why I left those tests out. For example, with a
That's correct. My only concern was how best to report the warning without spamming the user. Here the options I can think of:
We could also probably do a combination of 2 and 4 if you like? |
|
Okay, looks like doing options (2) and (4) with a mask has minimal impact on performance. The code is just a bit uglier now. |
|
Re errors that sounds fine, just as long as it doesn't return unhelpful "can't iterate Nothing" things too easily. Re overwriting, maybe I'm over-thinking this, and should regard it as conceptually |
nalimilan
left a comment
There was a problem hiding this comment.
Sounds cool! How did you choose the names wrapdims though? I wouldn't have guess what it does. :-) Couldn't we find something more explicit?
FWIW I agree that it would be better to throw an error instead of printing warnings by default: either you're OK with duplicates and you should pass force=true, or you're not and a warning isn't really a safe way of signalling the problem.
It was
It's the key-vector type, such as |
|
Actually I hadn't realized |
|
The basic constructor allows |
|
OK. Is there any reason why these methods couldn't be |
|
At the moment these aren't type-stable, it's possible that the first two could be made so: Right now they are handy things for humans, but I think getting the length / indices wrong deep inside some function should be an error, not silently corrected. The last is a convenient hack, so as not to need Requires... although in fact NamedDims uses Requires so it will always around, maybe there's no penalty, not sure. |
|
@nalimilan I think discussing whether |
|
The lowercase version is a clever idea for a name. But I agree that's orthogonal, sorry. About this PR, I think I agree with @nalimilan that a choice between overwrite & error seems like enough. Like
|
|
Okay, I've (1) updated the docstring with @nalimilan's suggestions and (2) simplified the duplicates logic to either error or not error based on the |
|
Is there anything else that needs to be done on this or are we just waiting on #24 so test pass again? |
|
Sorry I dropped the ball here. I meant something slightly different about types &
|
|
No worries. I took a look at your PR and it seems fine. Are you okay if I merge that now? I'm happy that it doesn't really change the API for my use case and it does seem more consistent with the rest of the package after reviewing a few other |
|
Yes, sounds good. Agree that re-organising |
PR to PR#21
We have two main use cases for this:
We've been doing this internally for AxisArrays for a while, but never got around to open sourcing it.