Skip to content

fix(flow-entry): Fix entries overwriting for flow table#1290

Merged
qmonnet merged 2 commits intomainfrom
pr/qmonnet/prioqueue-push
Feb 19, 2026
Merged

fix(flow-entry): Fix entries overwriting for flow table#1290
qmonnet merged 2 commits intomainfrom
pr/qmonnet/prioqueue-push

Conversation

@qmonnet
Copy link
Member

@qmonnet qmonnet commented Feb 17, 2026

We've had a long-standing issue where push()-ing an entry into the flow table's priority queue does not actually insert the new value when an entry with an identical key already exists in the queue, resulting in an invalid weak reference in the flow table; here's a fix.

See commit description or linked issue for details.

Fixes: #1085

@qmonnet qmonnet self-assigned this Feb 17, 2026
@qmonnet qmonnet requested a review from a team as a code owner February 17, 2026 16:42
@qmonnet qmonnet added the area/nat Related to Network Address Translation (NAT) label Feb 17, 2026
Copy link
Contributor

@Fredi-raspall Fredi-raspall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have several comments on this PR.

  1. can we make the commit explanations more concise? This includes fixing some paths in the commit message that seem to be out-dated.

  2. I am not sure this is the best fix to the problem.

    • I wonder if we should be keeping an Arc in the table and let the PQ have a Weak one. With a prior fix, we explicitly remove elements that are expired from the table.
    • Also, since the problem is when inserting elements (in the table), I wonder if, on insertion, we could just make a lookup for the new entry and, if found, just update it.
    • If you remove the entry from the PQ (which holds the strong references), even if you add it again, isn't that going to drop the object and any weak reference to it become invalidated? This would be bad IMO.
  3. Can we have a test that validates your fix?

  4. IMO, the whole crate (which was a "first cut" for NAT) should be revisited, given the confusion it is creating, to adapt it to our real needs, but that's another story. I'll open an issue for that.

@qmonnet
Copy link
Member Author

qmonnet commented Feb 18, 2026

  1. I am not sure this is the best fix to the problem.

I'm open to alternatives, although I don't have too many cycles to spend on this at the moment (so I won't do any large refactor to address this bug, now).

* I wonder if we should be keeping an Arc in the table and let the PQ have a Weak one. With a prior fix, we explicitly remove elements that are expired from the table.

I don't know, I suppose we could. It would not solve the current issue as far as I can tell, though - we'd still fail to insert the new value into the priority queue.

* Also, since the problem is when inserting elements (in the table), I wonder if, on insertion, we could just make a lookup for the new entry and, if found, just update it.

Do you mean a lookup in the flow table, or in the priority queue? Assuming you mean the priority queue, we'd need to do the lookup then compare the value to see if it's identical (I'm not really sure why the hash and PartialEq implementations only take the key into account), and if it is, we could update it. If it's not we'd still need to remove the old entry and insert the new one (or find an alternative fix). Is this really different from the fix in here?

* If you remove the entry from the PQ (which holds the strong references), even if you add it again, isn't that going to drop the object and any weak reference to it become invalidated? This would be bad IMO.

That's a good point, I need to double-check that - if we keep the current fix. Your request for a test is very reasonable by the way.

Copy link
Contributor

@mvachhar mvachhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sergeymatov should be working on removing the priority queue entirely in favor of tokio timers, but this fix looks good for now.

@mvachhar
Copy link
Contributor

I have several comments on this PR.

  1. can we make the commit explanations more concise? This includes fixing some paths in the commit message that seem to be out-dated.

  2. I am not sure this is the best fix to the problem.

    • I wonder if we should be keeping an Arc in the table and let the PQ have a Weak one. With a prior fix, we explicitly remove elements that are expired from the table.
    • Also, since the problem is when inserting elements (in the table), I wonder if, on insertion, we could just make a lookup for the new entry and, if found, just update it.
    • If you remove the entry from the PQ (which holds the strong references), even if you add it again, isn't that going to drop the object and any weak reference to it become invalidated? This would be bad IMO.
  3. Can we have a test that validates your fix?

  4. IMO, the whole crate (which was a "first cut" for NAT) should be revisited, given the confusion it is creating, to adapt it to our real needs, but that's another story. I'll open an issue for that.

I think we should accept this. The correct fix is to drop the priority queue entirely in favor of tokio's timers.

@Fredi-raspall
Copy link
Contributor

Do you mean a lookup in the flow table, or in the priority queue? Assuming you mean the priority queue, we'd need to do the lookup then compare the value to see if it's identical (I'm not really sure why the hash and PartialEq implementations only take the key into account), and if it is, we could update it. If it's not we'd still need to remove the old entry and insert the new one (or find an alternative fix). Is this really different from the fix in here?

I was thinking in the flow table

@Fredi-raspall
Copy link
Contributor

I think we should accept this. The correct fix is to drop the priority queue entirely in favor of tokio's timers.

ok. I'd just add a test, since we don't know when the timers support will be ready?

@qmonnet qmonnet added the dont-merge Do not merge this Pull Request label Feb 18, 2026
@qmonnet qmonnet force-pushed the pr/qmonnet/prioqueue-push branch from 1488030 to 15c6060 Compare February 18, 2026 22:46
@qmonnet
Copy link
Member Author

qmonnet commented Feb 18, 2026

  • If you remove the entry from the PQ (which holds the strong references), even if you add it again, isn't that going to drop the object and any weak reference to it become invalidated?

So I checked: I remove any existing entry with a similar key, and that one is dropped, as expected (given that it's also no longer present from the flow table itself, it's been overwritten). But I don't remove or drop the new value, which is different, before inserting it in the priority queue, so we keep the strong reference to that one.

I've added a test to validate the behaviour.

@qmonnet qmonnet force-pushed the pr/qmonnet/prioqueue-push branch from 15c6060 to 5e5cce5 Compare February 18, 2026 22:54
@qmonnet qmonnet removed the dont-merge Do not merge this Pull Request label Feb 18, 2026
@qmonnet qmonnet enabled auto-merge February 18, 2026 22:54
Copy link
Contributor

@Fredi-raspall Fredi-raspall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @qmonnet !

@qmonnet qmonnet added this pull request to the merge queue Feb 19, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to no response for status checks Feb 19, 2026
@qmonnet qmonnet added this pull request to the merge queue Feb 19, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to no response for status checks Feb 19, 2026
We currently have an issue when trying to overwrite an entry with an
existing key in the flow table. We're about to fix it (see the commit
log from the fix for the full explanation), but first introduce a test
to highlight the issue.

Signed-off-by: Quentin Monnet <qmo@qmon.net>
We've had a long-standing issue where push()-ing a new entry into the
flow table's priority queue does not actually insert the new value when
an entry with a similar key already exists in the queue, resulting in an
invalid weak reference in the flow table.

With more details: We've got an insert() method, defined in
flow-entry/src/flow_table/table.rs, that simply says: "Add a flow to the
table." Looking at this comment, the expectation is that, when trying to
insert an entry with a key that is already present in the table, the new
value will replace the existing one, and the old value will be returned
by the method. What happens instead when calling the method in that
case:

- The old value is returned indeed
- The new value is not inserted correctly. Instead, we only keep a
  "weak" reference in the table.

Let's unroll what happens for the first insertion, step by step:

1. We want to create a new entry, we call self.insert(flow_key,
   flow_info).
2. insert() takes an Arc pointer (strong reference) to flow_info, then
   calls self.insert_common(flow_key, &val) defined in
   flow-entry/src/flow_table/table.rs.
3. insert_common() downgrades the pointer and stores a weak pointer in
   self.table (the "table" itself).
4. insert_common() also pushes a clone of the Arc into the priority
   queue, to be able to reap expired entries later.
5. insert_common() exits, returning None (no prior value).
6. insert() exits, the Arc gets out of scope and is dropped, but it's OK
   because we still hold a clone in the priority queue so the weak
   reference in self.table remains valid.
7. Then at lookup time, we find the weak reference in self.table and
   successfully upgrade it to obtain an Arc to our initial flow_info
   object.

Now, here's what happens when trying to insert another value (same or
different), with an identical key:

1. We call self.insert(flow_key, flow_info).
2. insert() takes an Arc pointer to flow_info, then calls
   self.insert_common(flow_key, &val).
3. insert_common() downgrades the pointer and stores a weak pointer in
   self.table, overwriting the existing reference, as one might expect.
4. insert_common() tries also pushes a clone of the Arc into the
   priority queue, but let's look at the details:
   - It does so by calling the push() method, defined in
     flow-entry/src/flow_table/thread_local_pq.rs.
   - ... which calls in turn the priority queue's push(), for which the
     docs say: "If an element equal to `item` is already in the queue,
     its priority is updated and the old priority is returned in
     `Some`".
   - The presence of an identical element in the queue this is
     determined using the hash() method for Entry, defined in
     flow-entry/src/flow_table/thread_local_pq.rs, and which only hashes
     the key!
   - So our items are considered as identical: we update the priority
     and do not replace the value in the priority queue, the new value
     never makes it to the priority queue.
5. insert_common() exits, correctly returning the old value
6. insert() exits, the Arc gets out of scope and is dropped: given that
   we haven't stored a strong reference for the current value in the
   priority queue, the weak value in the table becomes invalid.
7. Then at lookup time, we find the invalid weak reference, reap
   it from the table, and return no value.

Here we fix the issue by always removing any existing entry with an
identical key hash before trying to insert a new entry into the priority
queue. This way, we're sure we do not keep an older entry with a
lingering weak reference to a deleted value.

Fixes: 0bade5a ("feat(pkt-meta): Add initial flow table implementation")
Signed-off-by: Quentin Monnet <qmo@qmon.net>
@qmonnet qmonnet force-pushed the pr/qmonnet/prioqueue-push branch from 5e5cce5 to 9c660fb Compare February 19, 2026 17:37
@qmonnet qmonnet enabled auto-merge February 19, 2026 17:38
@qmonnet qmonnet added this pull request to the merge queue Feb 19, 2026
Merged via the queue into main with commit 25f3292 Feb 19, 2026
21 checks passed
@qmonnet qmonnet deleted the pr/qmonnet/prioqueue-push branch February 19, 2026 18:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/nat Related to Network Address Translation (NAT)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

"Overwriting" an entry in flow table drops the value

3 participants

Comments