| 1 | +# Architecture Overview |
| 2 | + |
| 3 | +Linux EDR is split into small, single-purpose modules that read kernel trace events in real time, buffer them, and roll them up into hierarchical reports. |
| 4 | + |
| 5 | +## Core Components |
| 6 | + |
| 7 | +- **Trace Reader (`trace.py`)**: Uses non-blocking I/O (`selectors`) to read from the kernel's `trace_pipe`, so the reader never stalls waiting for data. Handles read errors and reconnects automatically. |
| 8 | +- **Aggregator (`aggregator.py`)**: A thread-safe buffer (`deque`) that collects events from the trace reader. Implements backpressure using a maximum length and optional event age limits. |
| 9 | +- **Report Manager (`report_manager.py`)**: Orchestrates the creation, storage, and aggregation of hierarchical reports (Cells, Blocks, Daily, Weekly, Monthly). Manages the lifecycle of reports based on time and event counts. |
| 10 | +- **Models (`models.py`)**: Defines the structure of events and reports using Pydantic, ensuring data consistency and validation. |
| 11 | +- **Reporter (`reporter.py`)**: Handles the output of reports, including saving to JSON files and sending data to OpenAI for analysis. |
| 12 | +- **Summary (`summary.py`)**: Contains logic for building the initial summary reports (Cells) from aggregated events. |
| 13 | +- **Application (`app.py`)**: The main application class that initializes components, manages the scheduler (using `APScheduler`), and orchestrates the event processing pipeline. |
| 14 | +- **Configuration (`config.py`)**: Loads and provides access to configuration settings from `config.ini` files. |
| 15 | +- **CLI (`cli.py`)**: Provides the command-line interface using Typer. |
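The non-blocking read pattern described for the Trace Reader can be sketched as follows. This is a minimal illustration, not the code in `trace.py`: the class name `PipeLineReader`, its `poll` method, and the 4096-byte read size are assumptions made for the example, and error handling and reconnection are left out.

```python
import os
import selectors


class PipeLineReader:
    """Hypothetical helper: reads complete lines from a file descriptor
    without ever blocking on an empty pipe."""

    def __init__(self, fd):
        self._fd = fd
        os.set_blocking(fd, False)  # reads raise instead of waiting
        self._sel = selectors.DefaultSelector()
        self._sel.register(fd, selectors.EVENT_READ)
        self._buffer = b""  # holds a trailing partial line between polls

    def poll(self, timeout=0.1):
        """Return the complete lines that became available within `timeout`."""
        lines = []
        # select() returns an empty list once nothing is readable in time.
        while self._sel.select(timeout):
            try:
                chunk = os.read(self._fd, 4096)
            except BlockingIOError:
                break
            if not chunk:  # writer closed the pipe
                break
            self._buffer += chunk
            # Everything before the last newline is complete; keep the rest.
            *complete, self._buffer = self._buffer.split(b"\n")
            lines.extend(line.decode("utf-8", "replace") for line in complete)
        return lines

    def close(self):
        self._sel.close()
```

In a real deployment the descriptor would come from opening the kernel's `trace_pipe` read-only; for experimenting, an `os.pipe()` pair is enough to exercise the same code path.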
| 16 | + |
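To make the Aggregator's description concrete, here is a rough sketch of a bounded, thread-safe buffer built on `deque`. The class name `BoundedEventBuffer`, its parameters, and the choice to filter by age at snapshot time are assumptions for illustration; the real `aggregator.py` may differ.

```python
import threading
import time
from collections import deque


class BoundedEventBuffer:
    """Hypothetical buffer: keeps at most `max_events` entries and can
    drop entries older than `max_age_seconds` when a snapshot is taken."""

    def __init__(self, max_events=1000, max_age_seconds=None):
        # A deque with maxlen silently discards the oldest entry when full.
        self._events = deque(maxlen=max_events)
        self._max_age = max_age_seconds
        self._lock = threading.Lock()

    def add(self, event, now=None):
        """Store an event with a timestamp (`now` is injectable for tests)."""
        ts = time.monotonic() if now is None else now
        with self._lock:
            self._events.append((ts, event))

    def snapshot(self, now=None, clear=True):
        """Return buffered events, oldest first, skipping expired ones."""
        current = time.monotonic() if now is None else now
        with self._lock:
            items = list(self._events)
            if clear:
                self._events.clear()
        if self._max_age is None:
            return [event for _, event in items]
        return [event for ts, event in items if current - ts <= self._max_age]
```

Copying the deque under the lock and filtering afterwards keeps the critical section short, so a producer thread calling `add` is only blocked for the duration of the copy.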
| 17 | +## Data Flow |
| 18 | + |
| 19 | +1. The `TraceReader` continuously reads `execve` events from the kernel trace pipe. |
| 20 | +2. Events are passed to the `Aggregator`, which buffers them in a thread-safe manner. |
| 21 | +3. A background scheduler triggers the `_summarize` method in `app.py` at the configured interval (`report_interval`). |
| 22 | +4. `_summarize` retrieves a snapshot of events from the `Aggregator`. |
| 23 | +5. `build_summary` creates a Level 1 `Cell` report from the event snapshot. |
| 24 | +6. The `Cell` is passed to the `ReportManager`. |
| 25 | +7. The `ReportManager` saves the `Cell` and checks whether enough Cells exist to create a Level 2 `Block`. The same check repeats at each level up the hierarchy (Daily, Weekly, Monthly). |
| 26 | +8. The `Reporter` can optionally save the initial `Cell` report to a JSON file (`output_file`) and send it to OpenAI for analysis. |
| 27 | +9. Higher-level reports (Blocks, etc.) can also be configured for AI analysis via the `ReportManager` interacting with the `Reporter`. |
| 28 | + |
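The roll-up in steps 5 to 7 can be sketched as below. This is an illustration under stated assumptions, not the project's implementation: `Report`, `cell_from_events`, `RollupSketch`, and the fixed `fan_in` threshold are hypothetical, plain dataclasses stand in for the Pydantic models, and the real `ReportManager` also takes time into account when deciding to roll up.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Report:
    """Hypothetical stand-in for a report at any level (1 = Cell)."""
    level: int
    counts: Counter


def cell_from_events(events):
    """Build a Level 1 report that counts how often each event occurred."""
    return Report(level=1, counts=Counter(events))


class RollupSketch:
    """Merges every `fan_in` reports of one level into a report one level up."""

    def __init__(self, fan_in=3, max_level=5):
        self.fan_in = fan_in
        self.max_level = max_level
        self.pending = {lvl: [] for lvl in range(1, max_level + 1)}

    def add(self, report):
        """Store a report and return any higher-level reports it triggered."""
        self.pending[report.level].append(report)
        created = []
        level = report.level
        while level < self.max_level and len(self.pending[level]) >= self.fan_in:
            children, self.pending[level] = self.pending[level], []
            merged = Counter()
            for child in children:
                merged.update(child.counts)
            parent = Report(level=level + 1, counts=merged)
            self.pending[level + 1].append(parent)
            created.append(parent)
            level += 1  # the new parent may complete the next level too
        return created
```

With `fan_in=2`, the second Cell produces a Level 2 report, and the fourth Cell produces both a Level 2 and a Level 3 report in a single call, which mirrors the "continues up the hierarchy" behaviour described above.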
| 29 | +## Project Structure |
| 30 | + |
| 31 | +```text |
| 32 | +linux-edr/ |
| 33 | +├── linux_edr/ # Main source code package |
| 34 | +│ ├── __init__.py |
| 35 | +│ ├── cli.py # Typer-based CLI interface |
| 36 | +│ ├── app.py # Core application logic |
| 37 | +│ ├── config.py # Configuration management |
| 38 | +│ ├── trace.py # Non-blocking trace reader |
| 39 | +│ ├── aggregator.py # Thread-safe event buffering |
| 40 | +│ ├── summary.py # Initial report generation (Cells) |
| 41 | +│ ├── reporter.py # OpenAI integration and output handling |
| 42 | +│ ├── report_manager.py # Hierarchical report management |
| 43 | +│ └── models.py # Pydantic data models |
| 44 | +├── tests/ # Comprehensive test suite |
| 45 | +├── docs/ # Documentation source files |
| 46 | +├── .github/ # GitHub Actions workflows |
| 47 | +│ └── workflows/ |
| 48 | +│ └── docs.yml # Documentation deployment workflow |
| 49 | +├── linux-edr.service # Systemd service definition |
| 50 | +├── pyproject.toml # Project metadata and dependencies |
| 51 | +├── mkdocs.yml # MkDocs configuration |
| 52 | +├── PRIVACY.md # Privacy policy |
| 53 | +└── README.md # Repository README |
| 54 | +``` |