docs: update docs

scc-tw · scc-tw · commit 44df60090f41 · 2025-05-01T22:22:25.000+08:00
diff --git a/docs/api/aggregator.md b/docs/api/aggregator.md
@@ -0,0 +1,3 @@
+# Aggregator API
+
+::: linux_edr.aggregator.Aggregator 
diff --git a/docs/api/models.md b/docs/api/models.md
@@ -0,0 +1,21 @@
+# Models API
+
+This page documents the Pydantic models used for representing events and reports.
+
+::: linux_edr.models.CommandLine
+
+::: linux_edr.models.ProcessEvents
+
+::: linux_edr.models.SummaryReport
+
+::: linux_edr.models.Cell
+
+::: linux_edr.models.Block
+
+::: linux_edr.models.DailyReport
+
+::: linux_edr.models.WeeklyReport
+
+::: linux_edr.models.MonthlyReport
+
+::: linux_edr.models.DailySummary 
diff --git a/docs/api/report_manager.md b/docs/api/report_manager.md
@@ -0,0 +1,3 @@
+# ReportManager API
+
+::: linux_edr.report_manager.ReportManager 
diff --git a/docs/api/reporter.md b/docs/api/reporter.md
@@ -0,0 +1,3 @@
+# Reporter API
+
+::: linux_edr.reporter.Reporter 
diff --git a/docs/api/trace.md b/docs/api/trace.md
@@ -0,0 +1,3 @@
+# TraceReader API
+
+::: linux_edr.trace.TraceReader 
diff --git a/docs/architecture/ai-analysis.md b/docs/architecture/ai-analysis.md
@@ -0,0 +1,29 @@
+# AI-Enhanced Security Analysis
+
+Linux EDR integrates with OpenAI's language models (defaulting to `gpt-4o-mini`, configurable via `model` setting) to provide automated analysis of system activity reports.
+
+## Analysis Process
+
+1.  At each reporting interval (default 15 minutes), the generated `Cell` report can be sent to the configured OpenAI model.
+2.  Higher-level reports (`Block`, `DailyReport`, etc.) generated by the `ReportManager` can also trigger analysis.
+3.  The `Reporter` component formats the report data into a specific prompt tailored for security analysis.
+4.  The prompt instructs the AI to act as a security analyst specializing in Linux systems and to look for potential threats based on command execution patterns.
+
+## Focus Areas for AI Analysis
+
+The AI analysis primarily looks for:
+
+-   **Unusual Command Patterns**: Execution of rare commands, unexpected command sequences, or commands run at odd times.
+-   **Privilege Escalation Indicators**: Commands associated with gaining higher privileges (e.g., `sudo`, `su`, exploits).
+-   **Data Exfiltration Attempts**: Use of tools like `scp`, `rsync`, `curl`, `wget` in suspicious contexts or targeting sensitive directories.
+-   **Anomalous Network Activity**: Commands initiating unexpected network connections (though network traffic itself is not monitored, the commands *causing* it are).
+-   **Suspicious File/Directory Operations**: Access to sensitive files (`/etc/shadow`), creation of hidden files/directories, or unusual use of file manipulation tools.
+-   **Living-off-the-Land Techniques**: Misuse of standard system utilities for malicious purposes.
+
+## Output
+
+-   The analysis text generated by the AI is logged by the application.
+-   If an `output_file` is configured for the base `Cell` reports, the corresponding AI analysis is appended to a separate file named `<output_file>.analysis`.
+-   For higher-level reports managed by `ReportManager`, the analysis text is stored directly within the `analysis` field of the respective report's JSON file (e.g., in `reports/blocks/block_....json`).
+
+This automated analysis provides actionable security insights directly from the collected data, reducing the need for manual log review and helping to quickly identify potential threats. 
diff --git a/docs/architecture/overview.md b/docs/architecture/overview.md
@@ -0,0 +1,54 @@
+# Architecture Overview
+
+Linux EDR is designed with a modular and robust architecture to handle real-time event processing and reporting efficiently.
+
+## Core Components
+
+-   **Trace Reader (`trace.py`)**: Uses non-blocking I/O (`selectors`) to read from the kernel's `trace_pipe` without impacting system performance. Includes robust error handling and automatic reconnection logic.
+-   **Aggregator (`aggregator.py`)**: A thread-safe buffer (`deque`) that collects events from the trace reader. Implements backpressure using a maximum length and optional event age limits.
+-   **Report Manager (`report_manager.py`)**: Orchestrates the creation, storage, and aggregation of hierarchical reports (Cells, Blocks, Daily, Weekly, Monthly). Manages the lifecycle of reports based on time and event counts.
+-   **Models (`models.py`)**: Defines the structure of events and reports using Pydantic, ensuring data consistency and validation.
+-   **Reporter (`reporter.py`)**: Handles the output of reports, including saving to JSON files and sending data to OpenAI for analysis.
+-   **Summary (`summary.py`)**: Contains logic for building the initial summary reports (Cells) from aggregated events.
+-   **Application (`app.py`)**: The main application class that initializes components, manages the scheduler (using `APScheduler`), and orchestrates the event processing pipeline.
+-   **Configuration (`config.py`)**: Loads and provides access to configuration settings from `config.ini` files.
+-   **CLI (`cli.py`)**: Provides the command-line interface using Typer.
+
+## Data Flow
+
+1.  The `TraceReader` continuously reads `execve` events from the kernel trace pipe.
+2.  Events are passed to the `Aggregator`, which buffers them in a thread-safe manner.
+3.  A background scheduler triggers the `_summarize` method in `app.py` at the configured interval (`report_interval`).
+4.  `_summarize` retrieves a snapshot of events from the `Aggregator`.
+5.  `build_summary` creates a Level 1 `Cell` report from the event snapshot.
+6.  The `Cell` is passed to the `ReportManager`.
+7.  The `ReportManager` saves the `Cell` and checks if enough Cells exist to create a Level 2 `Block`. This process continues up the hierarchy (Daily, Weekly, Monthly).
+8.  The `Reporter` can optionally save the initial `Cell` report to a JSON file (`output_file`) and send it to OpenAI for analysis.
+9.  Higher-level reports (Blocks, etc.) can also be configured for AI analysis via the `ReportManager` interacting with the `Reporter`.
+
+## Project Structure
+
+```text
+linux-edr/
+├── linux_edr/            # Main source code package
+│   ├── __init__.py
+│   ├── cli.py            # Typer-based CLI interface
+│   ├── app.py            # Core application logic
+│   ├── config.py         # Configuration management
+│   ├── trace.py          # Non-blocking trace reader
+│   ├── aggregator.py     # Thread-safe event buffering
+│   ├── summary.py        # Initial report generation (Cells)
+│   ├── reporter.py       # OpenAI integration and output handling
+│   ├── report_manager.py # Hierarchical report management
+│   └── models.py         # Pydantic data models
+├── tests/                # Comprehensive test suite
+├── docs/                 # Documentation source files
+├── .github/              # GitHub Actions workflows
+│   └── workflows/
+│       └── docs.yml      # Documentation deployment workflow
+├── linux-edr.service     # Systemd service definition
+├── pyproject.toml        # Project metadata and dependencies
+├── mkdocs.yml            # MkDocs configuration
+├── PRIVACY.md            # Privacy policy
+└── README.md             # Repository README
+``` 
diff --git a/docs/architecture/reporting.md b/docs/architecture/reporting.md
@@ -0,0 +1,31 @@
+# Hierarchical Reporting Architecture
+
+Linux EDR implements a sophisticated multi-tiered reporting system that provides security visibility across different time scales. This allows for analysis ranging from immediate, granular events to long-term strategic trends.
+
+## Reporting Levels
+
+The system aggregates data progressively through the following levels:
+
+| Level | Coverage            | Name           | Source Components         | Description                                          |
+|:-----:|:--------------------|:---------------|:--------------------------|:-----------------------------------------------------|
+| 1     | 15 minutes          | **Cell**       | 1 Event Snapshot          | Base unit capturing immediate system activity        |
+| 2     | 16 Cells = 4 hours  | **Block**      | 16 Cells                  | Short-term patterns across multiple Cells            |
+| 3     | 6 Blocks = 24 hours | **DailyReport**| 6 Blocks                  | Consolidated view of a full day's activity           |
+| 4     | 7 DailyReports      | **WeeklyReport**| 7 DailyReports            | Week-long trends with daily breakdowns              |
+| 5     | ~4 WeeklyReports    | **MonthlyReport**| Approx. 4 WeeklyReports   | Strategic view of monthly security posture         |
+
+*(Default intervals and aggregation counts are configurable in `config.ini`)*
+
+## Benefits
+
+This hierarchical architecture enables:
+
+-   **Immediate Threat Detection**: The `Cell` level provides a near real-time view (default 15 mins) of command executions, allowing for rapid identification of obviously malicious or unusual commands.
+-   **Contextual Pattern Recognition**: The `Block` level (default 4 hours) aggregates data to reveal short-term patterns, such as repeated failed login attempts followed by a suspicious command, or unusual process behavior within a limited timeframe.
+-   **Daily Security Posture Assessment**: The `DailyReport` consolidates a full day's activity, highlighting the most active processes and commands, and serving as a basis for identifying significant deviations from normal daily operations.
+-   **Trend Identification**: The `WeeklyReport` analyzes trends over seven days, making it possible to spot recurring suspicious activities, track the evolution of potential incidents, and calculate weekly risk scores.
+-   **Strategic Security Planning**: The `MonthlyReport` offers a high-level, long-term view of the system's security posture, summarizing key activities, risks, and incidents, suitable for strategic reviews and planning security improvements.
+
+## Storage
+
+All generated reports are automatically stored as individual JSON files within the directory specified by `reports_dir` in the configuration. They are organized into subdirectories corresponding to their level (e.g., `reports/cells/`, `reports/blocks/`, etc.). 
diff --git a/docs/configuration.md b/docs/configuration.md
@@ -0,0 +1,66 @@
+# Configuration
+
+Linux EDR behavior is controlled via a configuration file, typically named `config.ini`. The tool searches for this file in the following locations (in order):
+
+1.  `./config.ini` (current directory)
+2.  `~/.config/linux_edr/config.ini` (user's config directory)
+3.  `/etc/linux_edr/config.ini` (system-wide config)
+4.  The default `config.ini` included with the package.
+
+You can also specify a path directly using the `--config` command-line option.
+
+## Configuration Options
+
+Here are the available sections and options:
+
+```ini
+[DEFAULT]
+# Path to the kernel trace_pipe used for monitoring execve events.
+# Default: /sys/kernel/tracing/trace_pipe
+trace_path = /sys/kernel/tracing/trace_pipe
+
+# Interval (in minutes) at which summary reports (Cells) are generated.
+# Default: 15
+report_interval = 15
+
+# The OpenAI model to use for security analysis (e.g., gpt-4o-mini, gpt-4).
+# Default: gpt-4o-mini
+model = gpt-4o-mini
+
+# Enable verbose debug logging (true/false).
+# Default: false
+debug = false
+
+# Path to save periodic JSON reports (Cells). Leave empty to disable file output.
+# The Report Manager will still store hierarchical reports in `reports_dir`.
+# Default: (empty string)
+output_file = 
+
+[OPENAI]
+# Your OpenAI API key. If left empty, the tool will attempt to read the
+# OPENAI_API_KEY environment variable.
+# Default: (empty string)
+api_key = 
+
+[REPORTS]
+# The base directory where hierarchical reports (Cells, Blocks, Daily, etc.)
+# will be stored in subdirectories.
+# Default: reports
+reports_dir = reports
+
+[ADVANCED]
+# The maximum number of raw events to buffer in memory before being processed
+# into a Cell report. Acts as a backpressure mechanism.
+# Default: 10000
+max_events_buffer = 10000
+
+# Limits the number of command examples per process included in the prompt
+# sent to the LLM for Cell-level analysis, preventing overly long prompts.
+# Default: 50
+max_summary_lines = 50
+
+# Whether to include the raw event data within the saved JSON Cell reports.
+# Set to false to reduce storage space if raw data is not needed.
+# Default: true
+include_raw_events = true
+``` 
diff --git a/docs/development.md b/docs/development.md
@@ -0,0 +1,66 @@
+# Development Guide
+
+Contributions and local development are welcome!
+
+## Setup
+
+1.  **Clone the repository:**
+    ```bash
+    git clone https://github.com/ParttimeWorks/linux_edr.git
+    cd linux-edr
+    ```
+
+2.  **Install in editable mode with development dependencies:**
+    We use `uv` for all dependency management.
+    ```bash
+    # Installs the package itself and dependencies listed under [project.optional-dependencies]
+    # in pyproject.toml (like pytest, mypy, black, mkdocs, etc.)
+    uv pip install -e .[dev]
+    ```
+
+## Running Tests
+
+The project uses `pytest` for testing.
+
+```bash
+# Run all tests
+uv run pytest
+
+# Run with verbose output
+uv run pytest -v
+
+# Run specific test files or functions
+uv run pytest tests/test_app.py::test_parse_execve
+```
+
+## Type Checking
+
+We use `mypy` for static type checking.
+
+```bash
+uv run mypy linux_edr
+```
+
+## Code Style
+
+Code style is enforced using `black`.
+
+```bash
+# Check formatting
+uv run black --check .
+
+# Apply formatting
+uv run black .
+```
+
+## Building Documentation
+
+Documentation is built using `MkDocs`.
+
+```bash
+# Serve documentation locally for preview (auto-reloads on changes)
+mkdocs serve
+
+# Build the static documentation site (output in the `site/` directory)
+mkdocs build
+``` 
diff --git a/docs/index.md b/docs/index.md
@@ -1,15 +1,21 @@
-# Linux EDR Documentation
+# Welcome to Linux EDR
 
-A lightweight Endpoint Detection and Response (EDR) tool for Linux systems.
+A lightweight yet comprehensive Endpoint Detection and Response (EDR) solution for Linux systems that monitors command execution, analyzes system behavior, and provides actionable security insights with minimal performance impact.
 
 ## Overview
 
-Linux EDR monitors system activity by reading ftrace events and detecting suspicious patterns. It uses:
+Linux EDR captures process execution data through Linux's kernel tracing capabilities and builds a multi-tiered reporting structure that allows for both real-time threat detection and long-term security trend analysis. By focusing on command execution patterns, it provides valuable security insights without the overhead of traditional EDR solutions.
 
-- Non-blocking I/O for efficient trace reading
-- Thread-safe event aggregation
-- Scheduled reporting and summarization
-- Optional AI-powered analysis via OpenAI
+## Key Features
+
+- **Efficient Monitoring**: Non-blocking trace reader for `/sys/kernel/tracing/trace_pipe` with automatic recovery
+- **Scalable Architecture**: Thread-safe event buffer with configurable capacity and age limits
+- **Smart Data Organization**: Process-focused event collection and intelligent command grouping
+- **Hierarchical Reporting**: Tiered reports from 15-minute snapshots to monthly trend analysis
+- **AI-Enhanced Security**: OpenAI integration with gpt-4o-mini for automated threat detection
+- **Flexible Output**: Configurable reporting to JSON files or console
+- **Production-Ready**: Comprehensive error handling with graceful recovery from failures
+- **Privacy-Focused**: Collects only necessary command execution data (see [Privacy Policy](privacy.md))
 
 ## Installation
 
diff --git a/docs/installation.md b/docs/installation.md
@@ -0,0 +1,18 @@
+# Installation
+
+Install Linux EDR using `uv` directly from the latest GitHub release or a specific version tag:
+
+```bash
+# Install from the latest GitHub release
+uv pip install git+https://github.com/ParttimeWorks/linux_edr.git@latest
+
+# Or install a specific version (e.g., v1.0.0)
+uv pip install git+https://github.com/ParttimeWorks/linux_edr.git@v1.0.0
+```
+
+## Requirements
+
+- Python 3.11 or later
+- [uv](https://github.com/astral-sh/uv) for dependency management
+- Linux kernel with ftrace support
+- Appropriate permissions to read from `/sys/kernel/tracing/trace_pipe` (typically requires root privileges) 
diff --git a/docs/privacy.md b/docs/privacy.md
@@ -0,0 +1,3 @@
+# Privacy Policy
+
+Please refer to the main [Privacy Policy](../PRIVACY.md) document in the repository root. 
diff --git a/docs/usage.md b/docs/usage.md

Original file line number	Diff line number	Diff line change
`@@ -0,0 +1,3 @@`
	`1`	`+# Aggregator API`
	`2`	`+`
	`3`	`+::: linux_edr.aggregator.Aggregator`
Original file line number	Diff line number	Diff line change
`@@ -0,0 +1,3 @@`
	`1`	`+# ReportManager API`
	`2`	`+`
	`3`	`+::: linux_edr.report_manager.ReportManager`
Original file line number	Diff line number	Diff line change
`@@ -0,0 +1,3 @@`
	`1`	`+# Reporter API`
	`2`	`+`
	`3`	`+::: linux_edr.reporter.Reporter`
Original file line number	Diff line number	Diff line change
`@@ -0,0 +1,3 @@`
	`1`	`+# TraceReader API`
	`2`	`+`
	`3`	`+::: linux_edr.trace.TraceReader`