feat(proto): portable Sigma IR protobuf interchange + conformance vectors #204
Draft
mostafa wants to merge 1 commit into
… vectors

Define a language-neutral protobuf schema for a post-pipeline, modifier-resolved, selector-resolved Sigma rule: the wire message a remote backend responds to, and the single source of truth an engine's HIR is generated from or conformance-locked against. Built by reconciling the RSigma parser AST / HIR with pySigma's resolved types, conditions, modifiers, correlations, and filters.

- `sigma_ir.proto`: values (placeholder-aware `SigmaString`), all matcher variants, detections (including the `ArrayMatch`/`Conditional` extensions), selector-resolved conditions, rule/correlation/filter, the `IrRuleMetadata` superset, and the `Pack` envelope.
- `sigma_backend.proto`: the `SigmaBackend` gRPC service (`Capabilities` + `Convert` with explicit `Unsupported`), the separable remote-backend transport.
- `conformance/vectors`: 19 hand-authored golden (rule YAML -> canonical IR) vectors covering every matcher kind, value linking, `base64offset`/`windash` expansions (computed with pySigma's exact algorithm), keywords, selector resolution, `and`/`not` conditions, an `event_count` correlation, a filter, and the metadata superset. All vectors are validated strictly against the schema.

Intended to move to a neutral sigma-ir-schema repo.
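To illustrate what a "placeholder-aware" `SigmaString` has to capture, here is a minimal Python sketch. It assumes the Sigma spec's escaping rules (a backslash escapes `*`, `?` and `\`; any other backslash is literal) and `%name%` placeholders that are only recognized when the `expand` modifier is in play. The `Wildcard`/`Placeholder` types and `parse_sigma_string` are hypothetical illustrations, not the message names used in `sigma_ir.proto`.

```python
import re
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Wildcard:
    multi: bool  # True for "*" (any run of chars), False for "?" (one char)


@dataclass(frozen=True)
class Placeholder:
    name: str  # the NAME in %NAME%, left for a pipeline or engine to fill in


Part = Union[str, Wildcard, Placeholder]
_PLACEHOLDER = re.compile(r"%(\w+)%")


def parse_sigma_string(raw: str, placeholders: bool = False) -> List[Part]:
    """Split a raw Sigma value into literal text, wildcards and placeholders."""
    parts: List[Part] = []
    literal: List[str] = []

    def flush() -> None:
        # Emit any pending literal characters as one string part.
        if literal:
            parts.append("".join(literal))
            literal.clear()

    i = 0
    while i < len(raw):
        c = raw[i]
        if c == "\\" and i + 1 < len(raw) and raw[i + 1] in "*?\\":
            literal.append(raw[i + 1])  # escaped special char is plain text
            i += 2
            continue
        if c in "*?":
            flush()
            parts.append(Wildcard(multi=(c == "*")))
            i += 1
            continue
        if placeholders and c == "%":
            m = _PLACEHOLDER.match(raw, i)
            if m:
                flush()
                parts.append(Placeholder(m.group(1)))
                i = m.end()
                continue
        literal.append(c)  # anything else, including a lone backslash
        i += 1
    flush()
    return parts
```

For example, `parse_sigma_string("%user%\\AppData", placeholders=True)` yields a `Placeholder("user")` followed by the literal `\AppData`, while the same input without `placeholders=True` stays a single literal string. Keeping placeholders as structured parts, rather than flattening them to text, is what lets a downstream consumer still tell them apart from literal `%` characters.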
Why
Sigma backends are reimplemented per engine today: a Splunk/KQL/Elastic/etc. backend written for one engine has to be re-ported into RSigma and any future engine. This defines a shared, strongly typed message so a backend can be written once and reused across engines, including as a remote gRPC service. The message sits at the post-pipeline, modifier-resolved, selector-resolved layer that pySigma (after modifier application and condition postprocess) and RSigma (its HIR) independently arrive at, which is what makes a single schema faithful rather than forced. External/dynamic sources are converging across engines too (see SigmaHQ/pySigma#470), so they belong in the shared schema rather than as an engine-specific extension.
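Being "selector-resolved" means condition selectors such as `1 of selection_*` or `all of them` have already been expanded into explicit boolean nodes over named detections before anything reaches the wire. The following is a minimal sketch of that expansion. It assumes Sigma's documented semantics (`them` covers every detection whose name does not start with `_`, `1 of` becomes OR, `all of` becomes AND) and uses fnmatch-style globbing for patterns. The tuple-based tree and `resolve_selector` are hypothetical and do not reflect the schema's actual condition messages.

```python
import fnmatch
from typing import Sequence, Tuple, Union

# A condition node is either a detection name or (operator, children).
Node = Union[str, Tuple[str, Tuple["Node", ...]]]


def resolve_selector(quantifier: str, pattern: str, detections: Sequence[str]) -> Node:
    """Expand `<quantifier> of <pattern>` into an explicit AND/OR node.

    quantifier: "1" (any one must match -> OR) or "all" (every one -> AND).
    pattern: "them" or a glob such as "selection_*".
    detections: detection names in declaration order.
    """
    if quantifier == "1":
        op = "or"
    elif quantifier == "all":
        op = "and"
    else:
        raise ValueError(f"unsupported quantifier: {quantifier!r}")

    if pattern == "them":
        # `them` covers every detection whose name does not start with "_".
        names = [n for n in detections if not n.startswith("_")]
    else:
        # Case-sensitive glob match, keeping declaration order.
        names = [n for n in detections if fnmatch.fnmatchcase(n, pattern)]

    if not names:
        raise ValueError(f"selector '{quantifier} of {pattern}' matches no detection")
    if len(names) == 1:
        return names[0]  # a single match needs no boolean wrapper
    return (op, tuple(names))
```

Once a rule is in this shape, a consumer of the IR never needs to re-implement glob matching over detection names, which is one of the places where independently written engines could otherwise disagree.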
Summary
- `proto/sigma_ir.proto`, a language-neutral protobuf schema for a post-pipeline, modifier-resolved, selector-resolved Sigma rule: the wire message a remote backend responds to, and a single canonical form an engine's HIR can be generated from or conformance-locked against.
- `proto/sigma_backend.proto`, a `SigmaBackend` gRPC service (`Capabilities` + `Convert`, with explicit `Unsupported` results) as the separable remote-backend transport over the schema.
- Hand-authored golden (rule YAML -> canonical IR) conformance vectors under `proto/conformance/vectors/`, plus READMEs documenting the schema, conventions, and vector format.
- Intended follow-up: move `proto/` into a neutral standalone schema repo so a second implementation can vendor it.

Test plan
- `protoc` (libprotoc 34.1) compiles `sigma_ir.proto` and `sigma_backend.proto` cleanly.
- All conformance vectors validate strictly against the schema (via `json_format.ParseDict`, which rejects unknown fields).
- The `base64offset`/`windash` expansions in the vectors were computed with pySigma's exact algorithm, not hand-written.
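For reviewers checking the `base64offset` vectors, here is a minimal sketch of how that expansion works. Prefix the value with 0, 1 or 2 filler bytes so it lands at each position of a 3-byte base64 group, encode it, then trim the characters that depend on the unknown bytes before and after the value. The offset tables `(0, 2, 3)` and `(None, -3, -2)` are assumed to match pySigma's `base64offset` modifier and should be checked against its source. The function name is hypothetical.

```python
import base64
from typing import List

# Offset tables assumed to match pySigma's base64offset modifier.
START_OFFSETS = (0, 2, 3)
END_OFFSETS = (None, -3, -2)


def base64offset(value: str) -> List[str]:
    """Return the three base64 variants of `value`, one per byte alignment."""
    raw = value.encode("utf-8")
    variants = []
    for shift in range(3):
        # Prefix 0, 1 or 2 filler bytes so the value starts at each position
        # within a 3-byte base64 group.
        encoded = base64.b64encode(b" " * shift + raw).decode("ascii")
        # Drop leading chars that involve the filler bytes (standing in for
        # unknown preceding data) and trailing chars (padding and a partial
        # last group) that would change with whatever bytes follow the value.
        end = END_OFFSETS[(len(raw) + shift) % 3]
        variants.append(encoded[START_OFFSETS[shift]:end])
    return variants
```

For the familiar `/bin/bash` example this yields `L2Jpbi9iYXNo`, `9iaW4vYmFza` and `vYmluL2Jhc2`. The sketch indexes by encoded byte length. For plain ASCII values, such as this one, that equals the character count.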