fix(flow-entry): Fix entries overwriting for flow table#1290
Conversation
Fredi-raspall
left a comment
There was a problem hiding this comment.
I have several comments on this PR.
-
can we make the commit explanations more concise? This includes fixing some paths in the commit message that seem to be out-dated.
-
I am not sure this is the best fix to the problem.
- I wonder if we should be keeping an Arc in the table and let the PQ have a Weak one. With a prior fix, we explicitly remove elements that are expired from the table.
- Also, since the problem is when inserting elements (in the table), I wonder if, on insertion, we could just make a lookup for the new entry and, if found, just update it.
- If you remove the entry from the PQ (which holds the strong references), even if you add it again, isn't that going to drop the object and any weak reference to it become invalidated? This would be bad IMO.
-
Can we have a test that validates your fix?
-
IMO, the whole crate (which was a "first cut" for NAT) should be revisited, given the confusion it is creating, to adapt it to our real needs, but that's another story. I'll open an issue for that.
I'm open to alternatives, although I don't have too many cycles to spend on this at the moment (so I won't do any large refactor to address this bug, now).
I don't know, I suppose we could. It would not solve the current issue as far as I can tell, though - we'd still fail to insert the new value into the priority queue.
Do you mean a lookup in the flow table, or in the priority queue? Assuming you mean the priority queue, we'd need to do the lookup then compare the value to see if it's identical (I'm not really sure why the hash and
That's a good point, I need to double-check that - if we keep the current fix. Your request for a test is very reasonable by the way. |
mvachhar
left a comment
There was a problem hiding this comment.
@sergeymatov should be working on removing the priority queue entirely in favor of tokio timers, but this fix looks good for now.
I think we should accept this. The correct fix is to drop the priority queue entirely in favor of tokio's timers. |
I was thinking in the flow table |
ok. I'd just add a test, since we don't know when the timers support will be ready? |
1488030 to
15c6060
Compare
So I checked: I remove any existing entry with a similar key, and that one is dropped, as expected (given that it's also no longer present from the flow table itself, it's been overwritten). But I don't remove or drop the new value, which is different, before inserting it in the priority queue, so we keep the strong reference to that one. I've added a test to validate the behaviour. |
15c6060 to
5e5cce5
Compare
Fredi-raspall
left a comment
There was a problem hiding this comment.
LGTM, thanks @qmonnet !
We currently have an issue when trying to overwrite an entry with an existing key in the flow table. We're about to fix it (see the commit log from the fix for the full explanation), but first introduce a test to highlight the issue. Signed-off-by: Quentin Monnet <qmo@qmon.net>
We've had a long-standing issue where push()-ing a new entry into the
flow table's priority queue does not actually insert the new value when
an entry with a similar key already exists in the queue, resulting in an
invalid weak reference in the flow table.
With more details: We've got an insert() method, defined in
flow-entry/src/flow_table/table.rs, that simply says: "Add a flow to the
table." Looking at this comment, the expectation is that, when trying to
insert an entry with a key that is already present in the table, the new
value will replace the existing one, and the old value will be returned
by the method. What happens instead when calling the method in that
case:
- The old value is returned indeed
- The new value is not inserted correctly. Instead, we only keep a
"weak" reference in the table.
Let's unroll what happens for the first insertion, step by step:
1. We want to create a new entry, we call self.insert(flow_key,
flow_info).
2. insert() takes an Arc pointer (strong reference) to flow_info, then
calls self.insert_common(flow_key, &val) defined in
flow-entry/src/flow_table/table.rs.
3. insert_common() downgrades the pointer and stores a weak pointer in
self.table (the "table" itself).
4. insert_common() also pushes a clone of the Arc into the priority
queue, to be able to reap expired entries later.
5. insert_common() exits, returning None (no prior value).
6. insert() exits, the Arc gets out of scope and is dropped, but it's OK
because we still hold a clone in the priority queue so the weak
reference in self.table remains valid.
7. Then at lookup time, we find the weak reference in self.table and
successfully upgrade it to obtain an Arc to our initial flow_info
object.
Now, here's what happens when trying to insert another value (same or
different), with an identical key:
1. We call self.insert(flow_key, flow_info).
2. insert() takes an Arc pointer to flow_info, then calls
self.insert_common(flow_key, &val).
3. insert_common() downgrades the pointer and stores a weak pointer in
self.table, overwriting the existing reference, as one might expect.
4. insert_common() tries also pushes a clone of the Arc into the
priority queue, but let's look at the details:
- It does so by calling the push() method, defined in
flow-entry/src/flow_table/thread_local_pq.rs.
- ... which calls in turn the priority queue's push(), for which the
docs say: "If an element equal to `item` is already in the queue,
its priority is updated and the old priority is returned in
`Some`".
- The presence of an identical element in the queue this is
determined using the hash() method for Entry, defined in
flow-entry/src/flow_table/thread_local_pq.rs, and which only hashes
the key!
- So our items are considered as identical: we update the priority
and do not replace the value in the priority queue, the new value
never makes it to the priority queue.
5. insert_common() exits, correctly returning the old value
6. insert() exits, the Arc gets out of scope and is dropped: given that
we haven't stored a strong reference for the current value in the
priority queue, the weak value in the table becomes invalid.
7. Then at lookup time, we find the invalid weak reference, reap
it from the table, and return no value.
Here we fix the issue by always removing any existing entry with an
identical key hash before trying to insert a new entry into the priority
queue. This way, we're sure we do not keep an older entry with a
lingering weak reference to a deleted value.
Fixes: 0bade5a ("feat(pkt-meta): Add initial flow table implementation")
Signed-off-by: Quentin Monnet <qmo@qmon.net>
5e5cce5 to
9c660fb
Compare
We've had a long-standing issue where push()-ing an entry into the flow table's priority queue does not actually insert the new value when an entry with an identical key already exists in the queue, resulting in an invalid weak reference in the flow table; here's a fix.
See commit description or linked issue for details.
Fixes: #1085