Conversation
Two component tests demonstrate that the database allows duplicate Branch nodes with the same name. The create_branch task uses check-then-create without a lock or DB uniqueness constraint, enabling concurrent workers to both create branches with identical names. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The create_branch task used a check-then-create (TOCTOU) pattern where concurrent workers could both pass the Branch.get_by_name() existence check before either completed Branch.save(), resulting in duplicate Branch nodes with the same name in Neo4j. Wrap the entire check-and-create flow in create_branch (tasks.py) with a distributed lock keyed by branch name (namespace="branch"), using the existing InfrahubLock infrastructure. This serializes concurrent branch creation attempts for the same name, ensuring the second worker sees the already-created branch and raises ValidationError. Update the test to assert the correct (fixed) behavior: the second creation attempt is rejected and only one Branch node exists. Closes #8368 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Automated Code Review: fix distributed lock for branch creation (closes #8368)Verdict: APPROVED WITH SUGGESTIONS A. CorrectnessThe fix correctly addresses the root cause. The TOCTOU race condition in Key observations:
Edge cases considered:
No issues found. B. Code Quality
Suggestion: The test helper function C. Documentation Alignment
Missing items (from PR checklist):
D. Test Quality
Minor suggestions:
Recommended Next Steps for Human Reviewer
Automated review by Claude Code |
|
Three problems:
I have modified pipeline and prompts in order to fix these issues. Let's retry on this one |
Root cause analysisRoot cause: The Affected files:
Explanation: When multiple API requests to create a branch arrive simultaneously (common with multiple workers via
The Replication testTest file added: Why it reproduces the bug: The test directly creates two Branch objects with the same name in Neo4j (bypassing the existence check, which is exactly what happens when two workers both pass the check before either saves). It then verifies that both saves succeed and two Branch nodes with the same name exist in the database. The test PASSES on buggy code (duplicates are allowed) and will FAIL once a fix is applied. A second test ( Notes for the fix agent
|
Why
The
create_branchtask uses a check-then-create (TOCTOU) pattern: it callsBranch.get_by_name()to check if a branch exists, then callsBranch.save()to create it. When multiple concurrent workers execute this flow simultaneously for the same branch name, both pass the existence check before either completes the save, resulting in duplicate Branch nodes in Neo4j.Closes #8368
What changed
backend/infrahub/core/branch/tasks.py: Wrapped the entire check-and-create flow increate_branchwith a distributed lock (lock.registry.get(name=model.name, namespace="branch")). This serializes concurrent branch creation attempts for the same name, so the second worker acquires the lock only after the first has already created the branch, then fails with "already exists".backend/tests/component/core/branch/test_branch_creation_race_condition.py: Rewrote the test to assert the correct (fixed) behavior — the second creation attempt raisesValidationErrorand only one Branch node exists in the database. Added a concurrency test usingasyncio.gatherto verify lock serialization.No schema changes. API contract unchanged. The GraphQL mutation's early existence check in
branch.pyremains as a fast-fail optimization; the actual protection is in thecreate_branchtask.Fix strategy
The root cause is a TOCTOU race in
create_branch— the existence check and creation are not atomic. The existinglocal_schema_lockonly protects schema operations on a single worker, not cross-worker coordination. The fix uses the existingInfrahubLockdistributed lock infrastructure (backed by Redis or NATS in production, local asyncio lock in tests) to serialize the entire check-create-register flow per branch name.How to review
backend/infrahub/core/branch/tasks.py— the one-line addition ofasync with lock.registry.get(name=model.name, namespace="branch"):and re-indentation of the existing blockbackend/tests/component/core/branch/test_branch_creation_race_condition.py— the rewritten testsHow to test
Both tests should pass: sequential duplicate creation is rejected, and concurrent creation is serialized with only one succeeding.
Impact & rollout
Checklist