RFC: fix config sync application and retry semantics#309
Conversation
|
Upstream red/green evidence for the sync application fix: On untouched go test ./internal/config -run 'TestMergeConfiguration_|TestApplyPatch_'That repro fails on the pre-fix code with the expected sync-application breakages: Representative constructed fixtures from those tests: // new role
local := Roles.Definitions = nil
server := RegistrationResponse{
Roles: {Definitions: {"admin": {Name: "admin", Enabled: true}}},
}
// expected after MergeConfiguration:
// Roles.Definitions == {"admin": {Name: "admin", Enabled: true}}// updated role
local := Roles.Definitions = {
"editor": {Name: "editor", Description: "Can edit", Enabled: true},
"untouched": {Name: "untouched", Description: "Keep me", Enabled: true},
}
server := RegistrationResponse{
Roles: {Definitions: {"editor": {Name: "editor", Description: "Can edit and publish", Enabled: true}}},
}
// expected after MergeConfiguration:
// editor.Description == "Can edit and publish"
// untouched still present// new workflow
local := Workflows.Definitions = nil
server := RegistrationResponse{
Workflows: {Definitions: {"approval": makeTestWorkflow("approval", "Handles approvals")}},
}
// expected after MergeConfiguration:
// Workflows.Definitions["approval"].Description == "Handles approvals"// updated provider
local := Providers.Definitions = {
"mock-primary": {Name: "mock-primary", Description: "Old provider description", Provider: "mock", Enabled: true},
"mock-extra": {Name: "mock-extra", Description: "Keep me", Provider: "mock", Enabled: true},
}
server := RegistrationResponse{
Providers: {Definitions: {"mock-primary": {Name: "mock-primary", Description: "Updated provider description", Provider: "mock", Enabled: true}}},
}
// expected after MergeConfiguration:
// mock-primary.Description == "Updated provider description"
// mock-extra still presentThe same focused cases pass on this branch: go test ./internal/config -run 'TestMergeConfiguration_ServerSends(NewRoles|UpdatedRole|NewWorkflows|UpdatedWorkflow|NewProviders|UpdatedProvider|PartialConfig_OnlyRoles)|TestApplyPatch_AppliesRoles'So the red/green story here is: upstream computes and validates the incoming synced definitions, but the live |
05e7a36 to
6621fa2
Compare
| } | ||
|
|
||
| func (c *Config) updateRoles(roleConfig *RoleConfig) error { | ||
| _, err := c.ApplyRoles([]*models.RoleDefinitions{{ |
There was a problem hiding this comment.
the defs returned by ApplyRoles get thrown away here?
| } | ||
|
|
||
| // Create a patch to see the differences between existing and new | ||
| incomingPatch, err := jsonpatch.CreateMergePatch(existingData, newData) |
There was a problem hiding this comment.
this patch is sparse AIUI, contains only fields that changed:
{
"roles": {
"definitions": {
"editor": {
"description": "Can edit and publish"
}
}
}
}| // Convert patches back to structs - these are the NEW changes from the remote | ||
| // server that we need to apply to our existing configuration | ||
| var incomingDiff ConfigPatchRequest | ||
| err = json.Unmarshal(incomingPatch, &incomingDiff) |
There was a problem hiding this comment.
I think Unmarshal will do the wrong thing here for the example above:
ConfigPatchRequest{
RoleConfig: &RoleConfig{
Definitions: map[string]models.Role{
"editor": {
Name: "",
Description: "Can edit and publish",
Enabled: false,
},
},
},
}We can't discriminate whether Name: "" was an intended change, or just the zero default value backfilled by Unmarshal.
| // sparse diff payload. Unmarshaling the sparse merge patch here would | ||
| // collapse omitted fields to zero values in typed structs. | ||
| var mergedConfig ConfigPatchRequest | ||
| err = json.Unmarshal(newData, &mergedConfig) |
There was a problem hiding this comment.
This is the ugly bit: mergedConfig is always a full patch, so that no fields are zeroed. There is no point to apply the patch as before, hence use applyMergedConfig below.
| return err | ||
| } | ||
|
|
||
| return c.storeRoleDefinitions(merged.Definitions) |
There was a problem hiding this comment.
Here the changes are applied.
(The tricky bit here is that between the RLock above, and the Lock section in storeRoleDefinitions, some other update could have happened. That's addressed in the second commit of this branch.)
Unmarshal the full merged config in MergeConfiguration and apply that end state directly instead of routing a sparse diff back through the sync apply path. Keep applyPatch as the helper for real partial section diffs, and factor shared normalization/store helpers so both flows continue to validate and normalize definitions before persisting them.
Snapshot the current config generation before building the merged sync view, normalize the merged role/workflow/provider definition maps off-lock, and only commit them if the generation is unchanged. Keep the retry logic scoped to MergeConfiguration, compare and commit definitions only, and detach the snapshot through JSON so stale retries do not alias nested state. Reloaded definitions now bump the generation counter, while broader nested-mutation cleanup remains tracked in thand-io#306.
6621fa2 to
7026fac
Compare
This fixes a pre-existing config sync bug where server-synced role, workflow, and provider definitions could be validated without becoming the live in-memory config, and then hardens that merged sync path against stale-snapshot lost updates.
How to Read
Apply*results get thrown away?, and (2) sparse diffs would cause zeroed values that the validation would throw away anyway.If that makes sense:
What changed
MergeConfiguration, instead of routing that merged state back through the partial-patch helperapplyPatchas the helper for real partial section diffs, with shared normalize/store helpers underneathResidual gap
Definition maps are now treated as immutable snapshots for sync and reload purposes, but broader nested-mutation and lock-discipline cleanup is still tracked separately in #306. The skipped evidence tests in this branch document that remaining gap without widening this PR.
Validation
go test ./internal/configgo test ./...