unify TLS logic and caching in one place#8713
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
WalkthroughTLS context creation was extracted from HTTPSettings into two new components: TlsContextBuilder and TlsContextRegistry. HTTPSettings.get_tls_context was removed. HttpxAdapter and WorkflowWorkerExecution were changed to accept a TlsContextRegistry and obtain SSL contexts from it. Service wiring was updated so InfrahubServices is constructed with an HTTP dependency and adapters are instantiated with a TLS registry; CLI, server, and worker initialization paths were updated to pass these dependencies. New registry builder and dependency provider functions were added and tests adjusted accordingly. 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
|
||
| workflow = config.OVERRIDE.workflow or ( | ||
| WorkflowWorkerExecution() | ||
| WorkflowWorkerExecution(tls_registry=build_tls_registry()) |
There was a problem hiding this comment.
WorkflowWorkerExecution had a confusing kludge where it instantiated HttpxAdapter as a class variable so that it could use HttpxAdapeter.verify_tls() in a context manager for running a deployment.
passing in the new TlxContextRegistry lets us remove that weird workaround
There was a problem hiding this comment.
That sounds good, having said that, we should probably remove this cli command git_agent.py... but should be done in another PR as it's not related to this change.
|
|
||
| _settings: config.HTTPSettings | None = None | ||
| def __init__(self, tls_registry: TlsContextRegistry) -> None: | ||
| self._tls_registry = tls_registry |
There was a problem hiding this comment.
new dependency that needs to be injected. allows removing the @cached_propertys below and moves them to the TlsContextRegistry which is now responsible for caching TLS context instances
| def build_tls_registry() -> TlsContextRegistry: | ||
| if "tls_registry" not in _singletons: | ||
| _singletons["tls_registry"] = TlsContextRegistry() | ||
| return _singletons["tls_registry"] |
There was a problem hiding this comment.
new singleton so everywhere can use the same TlsContextRegistry in a given worker
|
|
||
| return self | ||
|
|
||
| def get_tls_context(self, force_verify: bool = False) -> ssl.SSLContext: |
There was a problem hiding this comment.
moved to TlsContextBuilder.build
| with patch.dict(os.environ, {"INFRAHUB_STORAGE_MAX_FILE_SIZE": "75"}): | ||
| assert StorageSettings().max_file_size == 75 | ||
| assert isinstance(SETTINGS.storage.max_file_size, int) | ||
|
|
There was a problem hiding this comment.
moved to their own file b/c they are no longer part of config
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/infrahub/config.py (1)
537-544:⚠️ Potential issue | 🟠 MajorForce-verify the configured CA bundle during startup validation.
TlsContextBuilder.build()short-circuits whentls_insecure=True, so this validator never parsestls_ca_bundle. A malformed bundle or path then survives boot and only fails later on the first forced-verification request inbackend/infrahub/services/adapters/http/httpx.py:55-60; the same short-circuit also meansbackend/infrahub/tls/registry.py:23-28cannot prewarm that path today. If early SSL failures are part of the contract, validate the forced-verification variant here as well.🐛 Possible fix
- TlsContextBuilder.build(insecure=self.tls_insecure, ca_bundle=self.tls_ca_bundle) + TlsContextBuilder.build( + insecure=self.tls_insecure, + ca_bundle=self.tls_ca_bundle, + force_verify=bool(self.tls_ca_bundle), + )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/infrahub/config.py` around lines 537 - 544, The validator set_tls_context currently calls TlsContextBuilder.build with insecure=self.tls_insecure which short-circuits when tls_insecure=True and skips parsing tls_ca_bundle; change the validation to explicitly attempt a forced-verification build (call TlsContextBuilder.build with insecure=False) when a tls_ca_bundle is configured (or always attempt a forced-verify build in addition to the original call) so malformed bundle paths are caught at startup, and keep the existing except ssl.SSLError -> raise ValueError behavior referencing tls_ca_bundle; use the existing set_tls_context and TlsContextBuilder.build symbols to locate where to add the extra forced-verify build attempt.
🧹 Nitpick comments (2)
backend/infrahub/tls/registry.py (1)
14-21: Canonicalize file-path CA bundles before using them as cache keys.
key = (insecure, ca_bundle, force_verify)is keyed on the caller's raw string, so the same CA file can be cached multiple times via relative, absolute, or symlinked spellings. That weakens the per-worker reuse this registry is supposed to provide and makes prewarming depend on how the path was written. Normalize existing file paths before constructing the key.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/infrahub/tls/registry.py` around lines 14 - 21, The cache key currently uses the raw ca_bundle string, so normalize the CA file path before building the key in get; if ca_bundle is not None, resolve it to a canonical absolute path (e.g., via pathlib.Path(ca_bundle).resolve() or os.path.realpath) and convert to a stable string, otherwise keep None, then use that normalized value in the key tuple (insecure, normalized_ca_bundle, force_verify) and proceed to read/create the SSL context via TlsContextBuilder.build and store it in self._cache to avoid duplicate entries caused by relative paths or symlinks.backend/tests/unit/tls/test_context_builder.py (1)
9-79: Add direct tests for the new registry contract.These cases exercise
TlsContextBuilder, but they never instantiateTlsContextRegistry. A regression inbackend/infrahub/tls/registry.py—for example losingforce_verifyfrom the cache key or not validating the forced-verification path—would still pass. Please add a small unit suite aroundTlsContextRegistry.get()andvalidate()so the new caching layer is covered directly.As per coding guidelines, "Write tests for new functionality in Python backend code".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/tests/unit/tls/test_context_builder.py` around lines 9 - 79, Add direct tests for the TlsContextRegistry by creating a TlsContextRegistry instance and exercising its get() and validate() methods (in addition to existing TlsContextBuilder tests) to cover the caching layer and forced-verification behavior; specifically, call TlsContextRegistry.get() with different parameter combinations (insecure True/False, force_verify True/False, ca_bundle as path and as PEM string) and assert that (1) entries keyed differently by force_verify produce distinct cached contexts, (2) validate() enforces the forced-verification path (i.e., returns a context with verify_mode == ssl.CERT_REQUIRED and check_hostname True when force_verify=True and a valid ca_bundle is provided), and (3) long PEM strings that raise OSError when treated as paths are handled as PEM content by validate(); reference TlsContextRegistry.get, TlsContextRegistry.validate, and TlsContextBuilder.build when adding these assertions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@backend/infrahub/config.py`:
- Around line 537-544: The validator set_tls_context currently calls
TlsContextBuilder.build with insecure=self.tls_insecure which short-circuits
when tls_insecure=True and skips parsing tls_ca_bundle; change the validation to
explicitly attempt a forced-verification build (call TlsContextBuilder.build
with insecure=False) when a tls_ca_bundle is configured (or always attempt a
forced-verify build in addition to the original call) so malformed bundle paths
are caught at startup, and keep the existing except ssl.SSLError -> raise
ValueError behavior referencing tls_ca_bundle; use the existing set_tls_context
and TlsContextBuilder.build symbols to locate where to add the extra
forced-verify build attempt.
---
Nitpick comments:
In `@backend/infrahub/tls/registry.py`:
- Around line 14-21: The cache key currently uses the raw ca_bundle string, so
normalize the CA file path before building the key in get; if ca_bundle is not
None, resolve it to a canonical absolute path (e.g., via
pathlib.Path(ca_bundle).resolve() or os.path.realpath) and convert to a stable
string, otherwise keep None, then use that normalized value in the key tuple
(insecure, normalized_ca_bundle, force_verify) and proceed to read/create the
SSL context via TlsContextBuilder.build and store it in self._cache to avoid
duplicate entries caused by relative paths or symlinks.
In `@backend/tests/unit/tls/test_context_builder.py`:
- Around line 9-79: Add direct tests for the TlsContextRegistry by creating a
TlsContextRegistry instance and exercising its get() and validate() methods (in
addition to existing TlsContextBuilder tests) to cover the caching layer and
forced-verification behavior; specifically, call TlsContextRegistry.get() with
different parameter combinations (insecure True/False, force_verify True/False,
ca_bundle as path and as PEM string) and assert that (1) entries keyed
differently by force_verify produce distinct cached contexts, (2) validate()
enforces the forced-verification path (i.e., returns a context with verify_mode
== ssl.CERT_REQUIRED and check_hostname True when force_verify=True and a valid
ca_bundle is provided), and (3) long PEM strings that raise OSError when treated
as paths are handled as PEM content by validate(); reference
TlsContextRegistry.get, TlsContextRegistry.validate, and TlsContextBuilder.build
when adding these assertions.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: e130ebbc-05f8-4c17-a339-a9698a18a4a8
📒 Files selected for processing (19)
backend/infrahub/cli/__init__.pybackend/infrahub/cli/git_agent.pybackend/infrahub/cli/tasks.pybackend/infrahub/config.pybackend/infrahub/server.pybackend/infrahub/services/__init__.pybackend/infrahub/services/adapters/http/httpx.pybackend/infrahub/services/adapters/workflow/worker.pybackend/infrahub/tls/__init__.pybackend/infrahub/tls/context_builder.pybackend/infrahub/tls/registry.pybackend/infrahub/workers/dependencies.pybackend/infrahub/workers/infrahub_async.pybackend/tests/integration/services/adapters/http/test_httpx.pybackend/tests/integration/services/adapters/workflow/test_workflow_execution.pybackend/tests/unit/config/test_config.pybackend/tests/unit/tls/__init__.pybackend/tests/unit/tls/test_context_builder.pypyproject.toml
💤 Files with no reviewable changes (1)
- backend/tests/unit/config/test_config.py
ogenstad
left a comment
There was a problem hiding this comment.
LGTM, great improvement!
One additional thing to highlight could potentially be this part:
def build_client() -> InfrahubClient:
client_config = Config(address=config.SETTINGS.main.internal_address, retry_on_failure=True)
client_config.set_ssl_context(context=get_http().verify_tls())
client = InfrahubClient(config=client_config)
# Populate client schema cache using our internal schema cache
if registry.schema:
for branch in registry.schema.get_branches():
client.schema.set_cache(schema=registry.schema.get_sdk_schema_branch(name=branch), branch=branch)
return clientReaching into get_http().verify_tls() here still feels somewhat hacky. On the other hand even if we were to change this we'd still be using the settings from the HTTP adapter. Have a look at this and see if you want to change anything, otherwise we can leave it as is for now. I think the next step after this is merged would be to introduce a common TLS setting across Infrahub which will be the default configuration option that could be overridden by a specific adapter if needed.
|
|
||
| workflow = config.OVERRIDE.workflow or ( | ||
| WorkflowWorkerExecution() | ||
| WorkflowWorkerExecution(tls_registry=build_tls_registry()) |
There was a problem hiding this comment.
That sounds good, having said that, we should probably remove this cli command git_agent.py... but should be done in another PR as it's not related to this change.
fatih-acar
left a comment
There was a problem hiding this comment.
Thanks for this long awaited refactor, LGTM!
There was a problem hiding this comment.
1 issue found across 19 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="backend/infrahub/tls/registry.py">
<violation number="1" location="backend/infrahub/tls/registry.py:23">
P2: `validate()` omits `force_verify`, so when `insecure=True` the builder short-circuits to an unverified context and never touches the CA bundle. A broken CA bundle won't be caught at startup — the error will surface only at first request time with `force_verify=True`, defeating the purpose of this method.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review, or fix all with cubic.
| ) | ||
| return self._cache[key] | ||
|
|
||
| def validate(self, insecure: bool = False, ca_bundle: str | None = None) -> None: |
There was a problem hiding this comment.
P2: validate() omits force_verify, so when insecure=True the builder short-circuits to an unverified context and never touches the CA bundle. A broken CA bundle won't be caught at startup — the error will surface only at first request time with force_verify=True, defeating the purpose of this method.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/infrahub/tls/registry.py, line 23:
<comment>`validate()` omits `force_verify`, so when `insecure=True` the builder short-circuits to an unverified context and never touches the CA bundle. A broken CA bundle won't be caught at startup — the error will surface only at first request time with `force_verify=True`, defeating the purpose of this method.</comment>
<file context>
@@ -0,0 +1,28 @@
+ )
+ return self._cache[key]
+
+ def validate(self, insecure: bool = False, ca_bundle: str | None = None) -> None:
+ """Build and cache the SSL context for the given config, raising ValueError on failure."""
+ try:
</file context>
Why
TLS context creation logic was duplicated in this enterprise PR to add a new
TCPTransportclass for log forwarding. Consolidating the TLS context creation code in one place that can be reused is better than duplicating it. This also adds a TLS context caching componentTlsContextRegistryso that any TLS context can be computed one time per worker and then cachedWhat changed
infrahub/tls/context_builder.py— newTlsContextBuilderclass with a singlebuild(insecure, ca_bundle, force_verify)static method containing the canonical CA bundle resolution logic (replaces bothHTTPSettings.get_tls_contextandbuild_tls_context)infrahub/tls/registry.py— newTlsContextRegistryclass that wrapsTlsContextBuilder, caches built contexts by(insecure, ca_bundle, force_verify)key, and exposes avalidate()method for startup-time pre-warmingworkers/dependencies.py—build_tls_registry()/get_tls_registry()added following the existingbuild_*/get_*singleton pattern;build_http_serviceandbuild_workflownow injectTlsContextRegistryinto their respective componentsHttpxAdapter— now acceptsTlsContextRegistryas a constructor argument;@cached_property tls_context / tls_context_verifiedremoved in favour of direct registry calls inverify_tls()WorkflowWorkerExecution— class-level_http_adapter = HttpxAdapter()workaround removed (the comment explicitly anticipated this); now acceptsTlsContextRegistrydirectlyHTTPSettings.get_tls_context— removed; the model validator callsTlsContextBuilder.build()directly for startup validationbuild_tls_contextin enterprisetransport/base.py— removed;tcp.pynow callsTlsContextBuilder.build()directlyserver.py,workers/infrahub_async.py,cli/__init__.py,cli/git_agent.py— updated to pass a constructedHttpxAdapterintoInfrahubServices.new()No public API changes, no schema changes, no config/env changes.
Suggested review order
infrahub/tls/context_builder.pyandinfrahub/tls/registry.py— the new primitivesworkers/dependencies.py— DI wiringservices/adapters/http/httpx.pyandservices/adapters/workflow/worker.py— the two main consumersconfig.py— removal ofget_tls_contextenterprise/transport/base.pyandtcp.py— removal ofbuild_tls_contextHow to review
Focus on
tls/registry.py— specifically that the cache key(insecure, ca_bundle, force_verify)is correct and thatvalidate()surfaces SSL errors at startup rather than at first use.How to test
# Unit tests for TlsContextBuilder uv run invoke backend.test-unit -- -k tlsImpact & rollout
HTTPSettings.get_tls_context()was internal; no public API removed.Checklist
uv run towncrier create ...)Summary by CodeRabbit
Refactor
Tests