fix(backend): add db backoff retry for resolvers #8712
Conversation
No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID:
📒 Files selected for processing (12)
✅ Files skipped from review due to trivial changes (3)
🚧 Files skipped from review as they are similar to previous changes (7)

Walkthrough: Added three database settings (retry_base_delay, retry_max_delay, and retry_jitter_max).

🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Force-pushed from 9ee1e8d to 9ad1b42
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/infrahub/database/__init__.py (1)
520-537: ⚠️ Potential issue | 🟡 Minor

Don't sleep or count a retry after the last attempt.

On the terminal failure path this block still computes the backoff, logs `Retrying ...`, increments `TRANSACTION_RETRIES`, and awaits `asyncio.sleep(retry_time)` before raising. That adds avoidable latency and overcounts retries whenever the final attempt fails.

💡 Proposed fix
```diff
 async def wrapper(*args: Any, **kwargs: Any) -> R:
     error = Exception()
     for attempt in range(1, config.SETTINGS.database.retry_limit + 1):
         try:
             return await func(*args, **kwargs)
         except (TransientError, ClientError) as exc:
             if isinstance(exc, ClientError):
                 if exc.code != "Neo.ClientError.Statement.EntityNotFound":
                     raise exc
+            if attempt == config.SETTINGS.database.retry_limit:
+                error = exc
+                break
             base_delay = config.SETTINGS.database.retry_base_delay
             max_delay = config.SETTINGS.database.retry_max_delay
             jitter = random.uniform(0, config.SETTINGS.database.retry_jitter_max)
             retry_time = min(base_delay * (2 ** (attempt - 1)) + jitter, max_delay)
             log.exception("Retry handler caught database error")
@@
             )
             log.debug("Database transaction failed", message=exc.message)
             TRANSACTION_RETRIES.labels(name).inc()
             await asyncio.sleep(retry_time)
-            if attempt == config.SETTINGS.database.retry_limit:
-                error = exc
-                break
     raise error
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/infrahub/database/__init__.py` around lines 520 - 537, The retry branch inside the transaction retry loop wrongly performs backoff work even on the final attempt; update the block that handles (TransientError, ClientError) so that it checks if attempt == config.SETTINGS.database.retry_limit first and, on the last attempt, sets the final error (error = exc) and breaks/raises without computing retry_time, calling log.info("Retrying..."), incrementing TRANSACTION_RETRIES.labels(name).inc(), or awaiting asyncio.sleep(retry_time); only compute jitter/base_delay/max_delay, log the retry, increment the metric and await asyncio.sleep when there will be another attempt (i.e., attempt < retry_limit). Ensure you reference the existing symbols: attempt, config.SETTINGS.database.retry_limit, TRANSACTION_RETRIES.labels(name).inc(), asyncio.sleep, log.info and error = exc.
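To make the intended control flow concrete, here is a minimal, self-contained sketch of backoff-with-jitter retry that never sleeps after the final attempt. The standalone `retry_with_backoff` helper and its default values are illustrative, not the actual Infrahub decorator:

```python
import asyncio
import random
from collections.abc import Awaitable, Callable
from typing import TypeVar

R = TypeVar("R")


async def retry_with_backoff(
    func: Callable[[], Awaitable[R]],
    retry_limit: int = 3,
    base_delay: float = 0.1,
    max_delay: float = 2.0,
    jitter_max: float = 0.1,
) -> R:
    """Retry an async callable, backing off only when another attempt will follow."""
    for attempt in range(1, retry_limit + 1):
        try:
            return await func()
        except Exception:  # the real decorator narrows this to (TransientError, ClientError)
            if attempt == retry_limit:
                raise  # terminal failure: no sleep, no retry-metric increment
            # Exponential backoff capped at max_delay, plus jitter to spread out
            # concurrent retries: ~0.1s, 0.2s, 0.4s, ... (+ up to jitter_max each).
            jitter = random.uniform(0, jitter_max)
            await asyncio.sleep(min(base_delay * (2 ** (attempt - 1)) + jitter, max_delay))
    raise AssertionError("unreachable for retry_limit >= 1")
```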
🧹 Nitpick comments (2)
backend/infrahub/graphql/resolvers/resolver.py (1)
28-29: Per-attempt span visibility is lost with the trace decorator above the retry logic.

With `@trace.get_tracer(__name__).start_as_current_span()` above `@retry_db_transaction`, all retry attempts execute within a single span. Attempt-level information (which attempt failed/succeeded) is logged but not separated into distinct spans. To enable per-attempt span visibility, either swap the decorator order so `@retry_db_transaction` wraps the trace spans, or add span instrumentation inside the `retry_db_transaction` decorator; a sketch of the swapped ordering follows below.

Also applies to: 54-56, 187-189, 288-290, 296-298, 318-320
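A minimal sketch of the suggested ordering, assuming the `retry_db_transaction(name=...)` decorator factory discussed in this review and its import path (the resolver body is hypothetical):

```python
from opentelemetry import trace

from infrahub.database import retry_db_transaction  # assumed import path


@retry_db_transaction(name="account_resolver")  # outermost: re-invokes the traced function per attempt
@trace.get_tracer(__name__).start_as_current_span("account_resolver")  # fresh span on every attempt
async def account_resolver(root, info, **kwargs):
    ...
```

Because decorators apply bottom-up, each retry attempt calls the span-wrapped function again and therefore records its own span.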
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/infrahub/graphql/resolvers/resolver.py` around lines 28 - 29, The trace span decorator is placed above the retry decorator so all attempts run inside one span; swap the decorator order so `@retry_db_transaction`(name="account_resolver") is above trace.get_tracer(__name__).start_as_current_span("account_resolver") (or alternatively add instrumentation to open a new span inside retry_db_transaction) for the account_resolver and the other listed resolvers (the decorators at lines around 54-56, 187-189, 288-290, 296-298, 318-320) so each retry attempt creates its own span and attempt-level visibility is preserved.backend/tests/unit/graphql/resolvers/test_retry_db_transaction.py (1)
33-39: Replace the test helper with a property mock to avoid depending on Neo4j's private exception implementation.

The `_make_client_error` helper (lines 33-39) sets private attributes (`_neo4j_code`, `_classification`, `_category`, `_title`) that the Neo4j driver may change internally without notice. While production code correctly uses the public `exc.code` property, the test's reliance on private fields is brittle.

Instead, construct the exception and mock its `code` property directly:

```python
def _make_client_error(
    message: str = "not found", code: str = "Neo.ClientError.Statement.EntityNotFound"
) -> ClientError:
    exc = ClientError(message)
    exc.code = code  # or use unittest.mock.PropertyMock if needed
    return exc
```
Or use `unittest.mock.patch.object` on the `code` property when setting up test fixtures, so the mock aligns with the public API the retry logic actually reads (see the sketch below).
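For instance, a sketch of the `patch.object` approach, assuming `ClientError.code` is exposed at class level as the public driver API suggests:

```python
from unittest.mock import PropertyMock, patch

from neo4j.exceptions import ClientError


def test_retries_on_entity_not_found() -> None:
    exc = ClientError("not found")
    with patch.object(
        type(exc),
        "code",
        new_callable=PropertyMock,
        return_value="Neo.ClientError.Statement.EntityNotFound",
    ):
        # The retry logic reads the same public property the mock patches.
        assert exc.code == "Neo.ClientError.Statement.EntityNotFound"
```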
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/tests/unit/graphql/resolvers/test_retry_db_transaction.py` around lines 33 - 39, The helper _make_client_error currently sets Neo4j private attributes (_neo4j_code, _classification, _category, _title) on a ClientError which is brittle; change it to set or mock the public .code property instead (either assign exc.code = code if possible or use unittest.mock.PropertyMock/patch.object to patch ClientError.code) so tests use the same public API that retry_db_transaction reads; update any test fixtures that call _make_client_error accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/tests/component/graphql/test_graphql_read_retry.py`:
- Around line 69-77: The neo4j_small_thread_pool fixture currently sets the Bolt
thread pool max size to "10" but should be deterministic and match the module
docstring (size=1); update the extra_env value passed to start_neo4j_container
in neo4j_small_thread_pool to set "NEO4J_server_bolt_thread__pool__max__size" to
"1" (and ensure the fixture docstring reflects size=1) so the test reliably
exhausts the pool and exercises retry_db_transaction.
- Around line 84-102: The fixture db_small_thread_pool can skip readiness
because backend/infrahub/database/__init__.py::validated_database may already be
set, and it leaves config.SETTINGS.database.address/port mutated on exceptions;
fix by forcing validation and ensuring cleanup: save and clear the module-level
validated_database before calling get_db (or call get_db with a force/validate
flag if available), wrap the setup/yield/teardown in a try/finally so you always
restore config.SETTINGS.database.address and .port and restore the original
validated_database, and ensure any created driver/db (from get_db and
InfrahubDatabase) are closed in the finally if setup failed before yield;
reference db_small_thread_pool, get_db, validated_database, config.SETTINGS,
InfrahubDatabase, and add_indexes to locate changed code.
In `@backend/tests/unit/graphql/resolvers/test_retry_db_transaction.py`:
- Around line 19-24: The fixture _set_retry_limit only pins retry_limit; update
it to also capture and restore the timing-related settings
(config.SETTINGS.database.retry_base, config.SETTINGS.database.retry_max, and
config.SETTINGS.database.retry_jitter) alongside retry_limit so tests asserting
timing (lines referencing those defaults) remain deterministic—store each
original value before changing, set the desired deterministic values in the
fixture, yield, then restore all originals after the test.
---
Outside diff comments:
In `@backend/infrahub/database/__init__.py`:
- Around line 520-537: The retry branch inside the transaction retry loop
wrongly performs backoff work even on the final attempt; update the block that
handles (TransientError, ClientError) so that it checks if attempt ==
config.SETTINGS.database.retry_limit first and, on the last attempt, sets the
final error (error = exc) and breaks/raises without computing retry_time,
calling log.info("Retrying..."), incrementing
TRANSACTION_RETRIES.labels(name).inc(), or awaiting asyncio.sleep(retry_time);
only compute jitter/base_delay/max_delay, log the retry, increment the metric
and await asyncio.sleep when there will be another attempt (i.e., attempt <
retry_limit). Ensure you reference the existing symbols: attempt,
config.SETTINGS.database.retry_limit, TRANSACTION_RETRIES.labels(name).inc(),
asyncio.sleep, log.info and error = exc.
---
Nitpick comments:
In `@backend/infrahub/graphql/resolvers/resolver.py`:
- Around line 28-29: The trace span decorator is placed above the retry
decorator so all attempts run inside one span; swap the decorator order so
`@retry_db_transaction`(name="account_resolver") is above
trace.get_tracer(__name__).start_as_current_span("account_resolver") (or
alternatively add instrumentation to open a new span inside
retry_db_transaction) for the account_resolver and the other listed resolvers
(the decorators at lines around 54-56, 187-189, 288-290, 296-298, 318-320) so
each retry attempt creates its own span and attempt-level visibility is
preserved.
In `@backend/tests/unit/graphql/resolvers/test_retry_db_transaction.py`:
- Around line 33-39: The helper _make_client_error currently sets Neo4j private
attributes (_neo4j_code, _classification, _category, _title) on a ClientError
which is brittle; change it to set or mock the public .code property instead
(either assign exc.code = code if possible or use
unittest.mock.PropertyMock/patch.object to patch ClientError.code) so tests use
the same public API that retry_db_transaction reads; update any test fixtures
that call _make_client_error accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: b91d40ce-b414-452d-865e-0edf9252f300
📒 Files selected for processing (11)
- backend/infrahub/config.py
- backend/infrahub/database/__init__.py
- backend/infrahub/graphql/resolvers/account_metadata.py
- backend/infrahub/graphql/resolvers/ipam.py
- backend/infrahub/graphql/resolvers/resolver.py
- backend/tests/component/graphql/test_graphql_read_retry.py
- backend/tests/helpers/utils.py
- backend/tests/unit/graphql/resolvers/__init__.py
- backend/tests/unit/graphql/resolvers/test_retry_db_transaction.py
- changelog/8696.fixed.md
- docs/docs/reference/configuration.mdx
```python
def neo4j_small_thread_pool(request: pytest.FixtureRequest) -> dict[int, int]:
    """Start a dedicated Neo4j container with a minimal Bolt thread pool (size=10)."""
    if not INFRAHUB_USE_TEST_CONTAINERS:
        pytest.skip("Test containers are disabled")

    container = start_neo4j_container(
        neo4j_image=NEO4J_IMAGE,
        extra_env={"NEO4J_server_bolt_thread__pool__max__size": "10"},
    )
```
Make the thread-pool bottleneck deterministic here.
The module docstring says this container should run with server.bolt.thread_pool_max_size=1, but this fixture actually sets it to 10. With the success case only issuing 10 concurrent queries on Line 312, the test can pass without ever exhausting the pool or exercising retry_db_transaction.
💡 Proposed fix

```diff
 @pytest.fixture(scope="module")
 def neo4j_small_thread_pool(request: pytest.FixtureRequest) -> dict[int, int]:
-    """Start a dedicated Neo4j container with a minimal Bolt thread pool (size=10)."""
+    """Start a dedicated Neo4j container with a minimal Bolt thread pool (size=1)."""
@@
     container = start_neo4j_container(
         neo4j_image=NEO4J_IMAGE,
-        extra_env={"NEO4J_server_bolt_thread__pool__max__size": "10"},
+        extra_env={"NEO4J_server_bolt_thread__pool__max__size": "1"},
     )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/tests/component/graphql/test_graphql_read_retry.py` around lines 69 -
77, The neo4j_small_thread_pool fixture currently sets the Bolt thread pool max
size to "10" but should be deterministic and match the module docstring
(size=1); update the extra_env value passed to start_neo4j_container in
neo4j_small_thread_pool to set "NEO4J_server_bolt_thread__pool__max__size" to
"1" (and ensure the fixture docstring reflects size=1) so the test reliably
exhausts the pool and exercises retry_db_transaction.
```python
async def db_small_thread_pool(
    neo4j_small_thread_pool: dict[int, int],
) -> AsyncGenerator[InfrahubDatabase, None]:
    """Create an InfrahubDatabase connected to the small-thread-pool Neo4j."""
    original_address = config.SETTINGS.database.address
    original_port = config.SETTINGS.database.port

    config.SETTINGS.database.address = "localhost"
    config.SETTINGS.database.port = neo4j_small_thread_pool[PORT_BOLT_NEO4J]

    driver = await get_db(retry=5)
    db = InfrahubDatabase(driver=driver)
    await add_indexes(db=db)

    yield db

    await db.close()
    config.SETTINGS.database.address = original_address
    config.SETTINGS.database.port = original_port
```
Harden this fixture against stale validation state and setup failures.

`get_db(retry=5)` only waits through `validate_database()`, but that path is skipped once `backend/infrahub/database/__init__.py::validated_database` already contains the default database name. In a full suite run this fixture can therefore skip readiness checks for the dedicated container, and if setup fails before `yield`, the overridden `config.SETTINGS.database.address` / `port` never get restored.
💡 Proposed fix

```diff
-from infrahub.database import InfrahubDatabase, get_db
+from infrahub.database import InfrahubDatabase, get_db, validate_database

 @pytest.fixture(scope="module")
 async def db_small_thread_pool(
     neo4j_small_thread_pool: dict[int, int],
 ) -> AsyncGenerator[InfrahubDatabase, None]:
     """Create an InfrahubDatabase connected to the small-thread-pool Neo4j."""
     original_address = config.SETTINGS.database.address
     original_port = config.SETTINGS.database.port
+    db: InfrahubDatabase | None = None

-    config.SETTINGS.database.address = "localhost"
-    config.SETTINGS.database.port = neo4j_small_thread_pool[PORT_BOLT_NEO4J]
-
-    driver = await get_db(retry=5)
-    db = InfrahubDatabase(driver=driver)
-    await add_indexes(db=db)
-
-    yield db
-
-    await db.close()
-    config.SETTINGS.database.address = original_address
-    config.SETTINGS.database.port = original_port
+    try:
+        config.SETTINGS.database.address = "localhost"
+        config.SETTINGS.database.port = neo4j_small_thread_pool[PORT_BOLT_NEO4J]
+
+        driver = await get_db(retry=5)
+        await validate_database(
+            driver=driver,
+            database_name=config.SETTINGS.database.database_name,
+            retry=5,
+            create_db=True,
+        )
+        db = InfrahubDatabase(driver=driver)
+        await add_indexes(db=db)
+
+        yield db
+    finally:
+        if db is not None:
+            await db.close()
+        config.SETTINGS.database.address = original_address
+        config.SETTINGS.database.port = original_port
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/tests/component/graphql/test_graphql_read_retry.py` around lines 84 -
102, The fixture db_small_thread_pool can skip readiness because
backend/infrahub/database/__init__.py::validated_database may already be set,
and it leaves config.SETTINGS.database.address/port mutated on exceptions; fix
by forcing validation and ensuring cleanup: save and clear the module-level
validated_database before calling get_db (or call get_db with a force/validate
flag if available), wrap the setup/yield/teardown in a try/finally so you always
restore config.SETTINGS.database.address and .port and restore the original
validated_database, and ensure any created driver/db (from get_db and
InfrahubDatabase) are closed in the finally if setup failed before yield;
reference db_small_thread_pool, get_db, validated_database, config.SETTINGS,
InfrahubDatabase, and add_indexes to locate changed code.
```python
def _set_retry_limit() -> Generator[None, None, None]:
    original = config.SETTINGS.database.retry_limit
    config.SETTINGS.database.retry_limit = 3
    yield
    config.SETTINGS.database.retry_limit = original
```
Pin retry timing settings in the fixture to keep these tests deterministic.

Lines 19-24 only pin `retry_limit`, but lines 105-112 and 128-131 assert timing behavior tied to the base/max/jitter defaults. Please set and restore all retry timing settings in the fixture so unrelated config drift can't break these unit tests.
Suggested patch

```diff
 @pytest.fixture(autouse=True)
 def _set_retry_limit() -> Generator[None, None, None]:
-    original = config.SETTINGS.database.retry_limit
+    original_retry_limit = config.SETTINGS.database.retry_limit
+    original_base_delay = config.SETTINGS.database.retry_base_delay
+    original_max_delay = config.SETTINGS.database.retry_max_delay
+    original_jitter_max = config.SETTINGS.database.retry_jitter_max
+
     config.SETTINGS.database.retry_limit = 3
+    config.SETTINGS.database.retry_base_delay = 0.1
+    config.SETTINGS.database.retry_max_delay = 2.0
+    config.SETTINGS.database.retry_jitter_max = 0.0
     yield
-    config.SETTINGS.database.retry_limit = original
+    config.SETTINGS.database.retry_limit = original_retry_limit
+    config.SETTINGS.database.retry_base_delay = original_base_delay
+    config.SETTINGS.database.retry_max_delay = original_max_delay
+    config.SETTINGS.database.retry_jitter_max = original_jitter_max
```

Also applies to: 105-112, 128-131
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/tests/unit/graphql/resolvers/test_retry_db_transaction.py` around
lines 19 - 24, The fixture _set_retry_limit only pins retry_limit; update it to
also capture and restore the timing-related settings
(config.SETTINGS.database.retry_base, config.SETTINGS.database.retry_max, and
config.SETTINGS.database.retry_jitter) alongside retry_limit so tests asserting
timing (lines referencing those defaults) remain deterministic—store each
original value before changing, set the desired deterministic values in the
fixture, yield, then restore all originals after the test.
```python
    default=2.0, ge=0, description="Maximum delay in seconds for exponential backoff on transaction retries."
)
retry_jitter_max: float = Field(
    default=0.1, ge=0, description="Maximum jitter in seconds added to retry delay to avoid thundering herd."
```
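The fields above appear to come from the new database settings model. For context, a self-contained sketch of such Pydantic fields; the `DatabaseRetrySettings` class name and the `retry_limit` defaults here are illustrative, not the actual Infrahub model:

```python
from pydantic import BaseModel, Field


class DatabaseRetrySettings(BaseModel):
    """Illustrative stand-in for the database settings section."""

    retry_limit: int = Field(default=3, ge=1, description="Maximum number of attempts for a retried transaction.")
    retry_base_delay: float = Field(
        default=0.1, ge=0, description="Initial delay in seconds for exponential backoff on transaction retries."
    )
    retry_max_delay: float = Field(
        default=2.0, ge=0, description="Maximum delay in seconds for exponential backoff on transaction retries."
    )
    retry_jitter_max: float = Field(
        default=0.1, ge=0, description="Maximum jitter in seconds added to retry delay to avoid thundering herd."
    )
```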
| """Test that without retries, concurrent queries on a small thread pool fail | ||
| with TransientError, proving the retry mechanism is needed.""" | ||
| original_retry_limit = config.SETTINGS.database.retry_limit | ||
| config.SETTINGS.database.retry_limit = 1 # Only 1 attempt, no retries |
You should be able to use a fixture with the yield pattern for this and avoid the try/finally below; see the sketch that follows.
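A sketch of that yield pattern (the fixture name is hypothetical; pytest runs the code after `yield` as teardown even when the test body raises):

```python
from collections.abc import Generator

import pytest

from infrahub import config  # assumed import, matching the snippets above


@pytest.fixture
def _single_attempt_retry_limit() -> Generator[None, None, None]:
    """Temporarily force retry_limit=1 so the code under test gets no retries."""
    original = config.SETTINGS.database.retry_limit
    config.SETTINGS.database.retry_limit = 1  # only one attempt, no retries
    yield
    config.SETTINGS.database.retry_limit = original
```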
```python
    db_small_thread_pool: InfrahubDatabase,
    default_branch_stp: Branch,
) -> SchemaBranch:
    """Register data_schema + node_group_schema + car_person_schema on the small-thread-pool DB.
```
We have a pattern that we've used to rescope fixtures in other places that I think would let you remove all or almost all of this function. I'm pretty sure we've got fixtures for all of these (maybe not CoreNode) in other places with the default or "class" scope.
This is one example:
```python
@pytest.fixture
async def default_branch(
    reset_registry: None, local_storage_dir: Path, empty_database: None, db: InfrahubDatabase
) -> Branch:
    return await do_default_branch(db=db)


@pytest.fixture(scope="class")
async def default_branch_scope_class(
    reset_registry_scope_class: None,
    local_storage_dir_scope_class: Path,
    empty_database_scope_class: None,
    db: InfrahubDatabase,
) -> Branch:
    return await do_default_branch(db=db)


async def do_default_branch(db: InfrahubDatabase) -> Branch:
    branch = await create_default_branch(db=db)
    await create_global_branch(db=db)
    registry.schema = SchemaManager()
    return branch
```
Thanks, I think there's less duplication now.
Actionable comments posted: 1
♻️ Duplicate comments (2)
backend/tests/component/graphql/test_graphql_read_retry.py (2)
83-102: ⚠️ Potential issue | 🟡 Minor

Config restoration is not guaranteed on setup failure.

If an exception occurs during setup (lines 91-96) before the `yield`, the original `config.SETTINGS.database.address` and `port` values won't be restored, potentially affecting subsequent tests. Wrap the setup/yield/teardown in a `try/finally` block.

🛡️ Proposed fix
```diff
 @pytest.fixture(scope="module")
 async def db_small_thread_pool(
     neo4j_small_thread_pool: dict[int, int],
 ) -> AsyncGenerator[InfrahubDatabase, None]:
     """Create an InfrahubDatabase connected to the small-thread-pool Neo4j."""
     original_address = config.SETTINGS.database.address
     original_port = config.SETTINGS.database.port
+    db: InfrahubDatabase | None = None

-    config.SETTINGS.database.address = "localhost"
-    config.SETTINGS.database.port = neo4j_small_thread_pool[PORT_BOLT_NEO4J]
-
-    driver = await get_db(retry=5)
-    db = InfrahubDatabase(driver=driver)
-    await add_indexes(db=db)
-
-    yield db
-
-    await db.close()
-    config.SETTINGS.database.address = original_address
-    config.SETTINGS.database.port = original_port
+    try:
+        config.SETTINGS.database.address = "localhost"
+        config.SETTINGS.database.port = neo4j_small_thread_pool[PORT_BOLT_NEO4J]
+
+        driver = await get_db(retry=5)
+        db = InfrahubDatabase(driver=driver)
+        await add_indexes(db=db)
+
+        yield db
+    finally:
+        if db is not None:
+            await db.close()
+        config.SETTINGS.database.address = original_address
+        config.SETTINGS.database.port = original_port
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/tests/component/graphql/test_graphql_read_retry.py` around lines 83 - 102, The fixture db_small_thread_pool currently sets config.SETTINGS.database.address/port and performs async setup (get_db, InfrahubDatabase, add_indexes) but if an exception occurs before yield the original config isn't restored; modify db_small_thread_pool to wrap the setup, yield and teardown in a try/finally so that in the finally block you always call await db.close() if db was created and restore config.SETTINGS.database.address and .port regardless of errors (referencing db_small_thread_pool, get_db, add_indexes, db.close and config.SETTINGS.database.address/port to locate the code).
1-9: ⚠️ Potential issue | 🟡 Minor

Module docstring is inconsistent with the actual configuration.

The module docstring states `server.bolt.thread_pool_max_size=1` (line 4), but the fixture at line 76 actually configures `"NEO4J_server_bolt_thread__pool__max__size": "10"`. Update the module docstring to reflect the actual thread pool size used.

```diff
 """Component tests for GraphQL read resolver retry on transient Neo4j errors.

 These tests simulate Neo4j server-side Bolt thread pool exhaustion by spinning up
-a dedicated Neo4j container with server.bolt.thread_pool_max_size=1 and firing
+a dedicated Neo4j container with server.bolt.thread_pool_max_size=10 and firing
 concurrent queries that hit many-cardinality relationship resolvers.
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/tests/component/graphql/test_graphql_read_retry.py` around lines 1 - 9, Update the module docstring at the top of backend/tests/component/graphql/test_graphql_read_retry.py to match the actual Neo4j configuration used in the tests: change the described Bolt thread pool max size from 1 to 10 to reflect the fixture that sets "NEO4J_server_bolt_thread__pool__max__size": "10"; ensure the docstring wording mentions server.bolt.thread_pool_max_size=10 so it accurately describes the test setup.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/tests/component/graphql/test_graphql_read_retry.py`:
- Around line 155-165: The return type hint for _run_graphql_query is incorrect:
the function returns the ExecutionResult from graphql(), not a dict; update the
function signature to use the correct type (e.g., graphql.ExecutionResult or the
appropriate ExecutionResult import) and add/update the import if needed so
_run_graphql_query(db: InfrahubDatabase, branch: Branch) -> ExecutionResult
accurately reflects the value returned by the graphql(...) call.
---
Duplicate comments:
In `@backend/tests/component/graphql/test_graphql_read_retry.py`:
- Around line 83-102: The fixture db_small_thread_pool currently sets
config.SETTINGS.database.address/port and performs async setup (get_db,
InfrahubDatabase, add_indexes) but if an exception occurs before yield the
original config isn't restored; modify db_small_thread_pool to wrap the setup,
yield and teardown in a try/finally so that in the finally block you always call
await db.close() if db was created and restore config.SETTINGS.database.address
and .port regardless of errors (referencing db_small_thread_pool, get_db,
add_indexes, db.close and config.SETTINGS.database.address/port to locate the
code).
- Around line 1-9: Update the module docstring at the top of
backend/tests/component/graphql/test_graphql_read_retry.py to match the actual
Neo4j configuration used in the tests: change the described Bolt thread pool max
size from 1 to 10 to reflect the fixture that sets
"NEO4J_server_bolt_thread__pool__max__size": "10"; ensure the docstring wording
mentions server.bolt.thread_pool_max_size=10 so it accurately describes the test
setup.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 97d9397e-7b65-4583-87d9-3c80efbf9f12
📒 Files selected for processing (2)
- backend/tests/component/graphql/test_graphql_read_retry.py
- backend/tests/conftest.py
Force-pushed from d5d9ed3 to 40321e2
@cubic review this PR
@BeArchiTek I have started the AI code review. It will take a few minutes to complete.
2 issues found across 12 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="backend/tests/unit/graphql/resolvers/test_retry_db_transaction.py">
<violation number="1" location="backend/tests/unit/graphql/resolvers/test_retry_db_transaction.py:115">
P2: Custom agent: **Detect try/finally-based ad-hoc cleanup in tests (encourage yield fixtures)**
This test uses `try/finally` for setup/teardown of `retry_limit`, duplicating the yield-fixture pattern already used by `_set_retry_limit` in the same file. Extract a parameterized or dedicated yield fixture (e.g., `_set_high_retry_limit`) to centralize the cleanup and keep the test body focused on assertions.</violation>
</file>
<file name="backend/tests/component/graphql/test_graphql_read_retry.py">
<violation number="1" location="backend/tests/component/graphql/test_graphql_read_retry.py:175">
P1: Custom agent: **Flag ad-hoc reproducer tests committed to CI (unmarked/flaky reproducer code)**
This test is unmarked but lives in a file flagged as flaky/reproducer. The file explicitly contains 'flaky' and 'Reproduction test for issue #8696' tokens (line 207), and the PR author states these tests should be disabled in CI. Per the rule, tests in files containing these keywords must be gated with `@pytest.mark.skip` or `@pytest.mark.xfail`. Add a skip marker to prevent CI instability.</violation>
</file>
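A sketch of the gating the rule asks for (test names and reason strings are illustrative):

```python
import pytest


@pytest.mark.skip(reason="Flaky reproducer for issue #8696; disabled in CI until stabilized")
async def test_read_resolver_retries_on_thread_pool_exhaustion() -> None:
    ...


@pytest.mark.xfail(reason="Timing-sensitive reproducer; may not exhaust the pool on fast runners", strict=False)
async def test_read_resolver_fails_without_retries() -> None:
    ...
```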
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
ajtmccarty left a comment
just some test comments that don't need to block merging
Not certain if this should be a component test.
I'd think of spinning up a new container as more of an integration or integration_docker test... BUT it is just another database container, so I'm not sure. Asking for the team's input in Slack, but this does not need to block merging this PR.
In its current state, based on the dev/guidelines/backend/testing.md file, I think this should go into component tests, as it only relates to the database, but maybe in a separate folder that specifically tests the database container and its settings. I don't find an existing database folder in component tests; we could create one if this is the first time we try to solve this kind of db-specific issue.
In my opinion, if we want this test scenario to sit alongside similar integration tests, it may be justified to group this kind of test. In that case, we should perhaps send the requests via the "accessible" API infrahub_sdk.client.InfrahubClient.execute_graphql rather than directly from the inner graphql library?
I think either way is fine as long as the test exercises the resolvers that have the new decorator
I understand the debate, but this felt obvious to me: I just want to test an internal behavior (the GraphQL resolvers) against a custom, real database... a component test is the right place in my opinion.
Force-pushed from 207d096 to bd23d0d
This PR improves the retry_db_transaction decorator by adding a configurable backoff retry. It also enables the decorator on the GraphQL query resolvers. It adds component tests to check that the retry behavior resolves hard failures when the Neo4j thread pool is exhausted: we must start a new container with a low thread pool size. Signed-off-by: Fatih Acar <fatih@opsmill.com>
Signed-off-by: Fatih Acar <fatih@opsmill.com>
Force-pushed from bd23d0d to 17ed35f
This PR improves the retry_db_transaction decorator by adding a configurable backoff retry.
It also enables the decorator on the GraphQL query resolvers.
It adds component tests to check that the retry behavior resolves hard failures when the Neo4j thread pool is exhausted: we must start a new container with a low thread pool size.
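For illustration, enabling the decorator on a resolver could look like the following sketch; the import path and the resolver name/body are assumptions based on the files touched in this PR, not the actual code:

```python
from infrahub.database import retry_db_transaction  # assumed import path


@retry_db_transaction(name="default_resolver")
async def default_resolver(root, info, **kwargs):
    # Transient Neo4j errors (e.g. Bolt thread pool exhaustion) raised while
    # resolving are retried with the configurable exponential backoff.
    ...
```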
I had to duplicate a few fixtures to get the new neo4j container working... I'm not very satisfied with the tests and am looking for feedback on them. There's also a high chance they are flaky... so we probably want to disable them in CI for now.
Fixes #8696
Summary by CodeRabbit
- Bug Fixes
- New Features
- Documentation
- Tests
- Chores