perf: add database indexes and optimize dashboard query for production-scale workloads #268

Open
KaparthyReddy wants to merge 3 commits into utksh1:main from KaparthyReddy:perf/database-indexes-and-query-optimization
@KaparthyReddy

Description

Profiles and optimizes the four hot query paths identified in the issue scope: dashboard aggregation, findings list, reports list, and task queries.

Query optimization (routes.py):

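  • Replaced the full-table SELECT * FROM findings load and the Python-side severity counting loop with a single DB-level aggregation, SELECT severity, COUNT(*) ... GROUP BY severity. The database still reads every finding (or every entry of idx_findings_severity), but the rows returned and the Python-side work drop from O(n) to one row per distinct severity, which is where the dashboard latency gain on large finding datasets comes from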
  • Replaced full findings fetch for recent_findings with SELECT ... ORDER BY discovered_at DESC LIMIT 5 — only 5 rows transferred instead of the full collection
  • Added SELECT COUNT(*) AS total FROM findings for total count instead of len(all_findings) after full fetch
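For reviewers who want to see the shape of the three queries together, here is a minimal sketch against an in-memory SQLite database. It is not the code in routes.py: the function name `dashboard_summary`, the five severity labels, the response keys, and the reduced `findings` schema are all assumptions made for illustration.

```python
import sqlite3

# Assumed severity labels; the real set lives in the project code.
SEVERITIES = ("critical", "high", "medium", "low", "info")


def dashboard_summary(conn: sqlite3.Connection) -> dict:
    """Hypothetical helper: build dashboard numbers with DB-level aggregation."""
    # One row per distinct severity comes back, not one row per finding.
    counts = {severity: 0 for severity in SEVERITIES}
    for severity, n in conn.execute(
        "SELECT severity, COUNT(*) FROM findings GROUP BY severity"
    ):
        counts[severity] = n

    # Total count computed by the database instead of len(all_findings).
    total = conn.execute("SELECT COUNT(*) AS total FROM findings").fetchone()[0]

    # Only the five most recent findings are transferred.
    recent = conn.execute(
        "SELECT id, severity, discovered_at FROM findings "
        "ORDER BY discovered_at DESC LIMIT 5"
    ).fetchall()

    return {"severity_counts": counts, "total": total, "recent_findings": recent}


def make_demo_connection() -> sqlite3.Connection:
    """Seed a tiny in-memory table (assumed, reduced schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE findings ("
        "id INTEGER PRIMARY KEY, severity TEXT, discovered_at TEXT)"
    )
    severities = ["critical", "high", "high", "low", "low", "low", "info"]
    rows = [
        (sev, f"2024-01-{day:02d}T00:00:00")
        for day, sev in enumerate(severities, start=1)
    ]
    conn.executemany(
        "INSERT INTO findings (severity, discovered_at) VALUES (?, ?)", rows
    )
    return conn


if __name__ == "__main__":
    print(dashboard_summary(make_demo_connection()))
```

Severities with no findings stay at 0 because the dictionary is pre-filled before the GROUP BY rows are merged in, which matches the zero-state behaviour the tests below check for.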

Index additions (database.py + migrations/001_add_performance_indexes.sql):

  • idx_tasks_status_created(status, created_at DESC) — composite index for dashboard running tasks query; eliminates full scan + filter
  • idx_findings_severity — supports GROUP BY severity aggregation
  • idx_findings_task_id — supports foreign key lookups from tasks
  • idx_findings_discovered_at DESC — supports ORDER BY on findings list
  • idx_findings_plugin_id, idx_findings_target — common filter columns
  • idx_findings_task_severity(task_id, severity) — composite for per-task severity breakdown
  • idx_reports_task_id, idx_reports_generated_at DESC, idx_reports_status — reports list and filter queries
  • idx_audit_timestamp DESC, idx_audit_event_type, idx_audit_task_id — audit log queries
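The index list above could be applied idempotently along these lines. This is a sketch only: the index names come from this PR, but the table and column definitions in `make_demo_schema` are assumed (the real schema is in database.py), SQLite is assumed as the engine, and `apply_indexes` is a hypothetical helper, not the migration script itself.

```python
import sqlite3

# Index names are taken from the PR description; column lists are inferred
# from the names and are assumptions about the real schema.
INDEX_DDL = [
    "CREATE INDEX IF NOT EXISTS idx_tasks_status_created ON tasks(status, created_at DESC)",
    "CREATE INDEX IF NOT EXISTS idx_findings_severity ON findings(severity)",
    "CREATE INDEX IF NOT EXISTS idx_findings_task_id ON findings(task_id)",
    "CREATE INDEX IF NOT EXISTS idx_findings_discovered_at ON findings(discovered_at DESC)",
    "CREATE INDEX IF NOT EXISTS idx_findings_plugin_id ON findings(plugin_id)",
    "CREATE INDEX IF NOT EXISTS idx_findings_target ON findings(target)",
    "CREATE INDEX IF NOT EXISTS idx_findings_task_severity ON findings(task_id, severity)",
    "CREATE INDEX IF NOT EXISTS idx_reports_task_id ON reports(task_id)",
    "CREATE INDEX IF NOT EXISTS idx_reports_generated_at ON reports(generated_at DESC)",
    "CREATE INDEX IF NOT EXISTS idx_reports_status ON reports(status)",
    "CREATE INDEX IF NOT EXISTS idx_audit_timestamp ON audit_log(timestamp DESC)",
    "CREATE INDEX IF NOT EXISTS idx_audit_event_type ON audit_log(event_type)",
    "CREATE INDEX IF NOT EXISTS idx_audit_task_id ON audit_log(task_id)",
]


def apply_indexes(conn: sqlite3.Connection) -> None:
    """Hypothetical helper: create every index; safe to run more than once."""
    for statement in INDEX_DDL:
        conn.execute(statement)
    conn.commit()


def make_demo_schema() -> sqlite3.Connection:
    """Create minimal stand-in tables (assumed columns) in memory."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE tasks (id INTEGER PRIMARY KEY, status TEXT, created_at TEXT);
        CREATE TABLE findings (id INTEGER PRIMARY KEY, task_id INTEGER,
            plugin_id TEXT, target TEXT, severity TEXT, discovered_at TEXT);
        CREATE TABLE reports (id INTEGER PRIMARY KEY, task_id INTEGER,
            status TEXT, generated_at TEXT);
        CREATE TABLE audit_log (id INTEGER PRIMARY KEY, task_id INTEGER,
            event_type TEXT, timestamp TEXT);
        """
    )
    return conn


if __name__ == "__main__":
    connection = make_demo_schema()
    apply_indexes(connection)
    apply_indexes(connection)  # second run is a no-op thanks to IF NOT EXISTS
```

Using IF NOT EXISTS keeps the same statements usable both for fresh databases and for existing ones that have already been partly migrated.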

Related Issues

Closes #257

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update

How Has This Been Tested?

Integration tests — testing/backend/integration/test_database_indexes.py (14 tests, all passing):

  • 10 index existence tests: verify every new index is present after schema migration
  • test_dashboard_severity_counts_correct: seeds 50 findings (10 per severity), asserts exact counts from dashboard endpoint
  • test_dashboard_recent_findings_limit: seeds 200 findings, asserts recent_findings length ≤ 5
  • test_dashboard_empty_findings: asserts correct zero-state response with no findings
  • test_dashboard_severity_counts_with_single_severity: seeds 15 critical findings, asserts all other severities return 0
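An index existence check of the kind described above might look roughly like this. It is a sketch assuming SQLite and a reduced `findings` table; `index_names` is a hypothetical helper, and the actual tests in test_database_indexes.py may be structured differently.

```python
import sqlite3


def index_names(conn: sqlite3.Connection, table: str) -> set:
    """Return the names of all indexes SQLite reports for a table."""
    # PRAGMA index_list yields one row per index; the name is column 1.
    # The table name is interpolated, so only pass trusted identifiers.
    return {row[1] for row in conn.execute(f"PRAGMA index_list('{table}')")}


def test_findings_severity_index_exists():
    # Stand-in for the migrated schema (assumed columns).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE findings (id INTEGER PRIMARY KEY, severity TEXT)")
    conn.execute("CREATE INDEX idx_findings_severity ON findings(severity)")

    assert "idx_findings_severity" in index_names(conn, "findings")
```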

Benchmark script — scripts/benchmark_db.py:

  • Seeds 10,000 findings and 1,000 tasks
  • Prints EXPLAIN QUERY PLAN output for hot paths
  • Reports avg/min/max execution time over 10 runs for each query
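The measurement loop described above can be sketched as follows. This is not scripts/benchmark_db.py: the helper names `explain`, `time_query`, and `seed_findings`, the reduced schema, and the integer `discovered_at` values are assumptions for illustration, and SQLite is assumed because EXPLAIN QUERY PLAN is SQLite syntax.

```python
import sqlite3
import statistics
import time


def explain(conn, sql, params=()):
    """Return the detail column of EXPLAIN QUERY PLAN for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable plan text is the last column of each row.
    return [row[-1] for row in rows]


def time_query(conn, sql, params=(), runs=10):
    """Run a query several times and report avg/min/max wall time in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        conn.execute(sql, params).fetchall()
        samples.append(time.perf_counter() - start)
    return {
        "avg": statistics.mean(samples),
        "min": min(samples),
        "max": max(samples),
    }


def seed_findings(conn, n=10_000):
    """Create and fill an assumed, reduced findings table."""
    conn.execute(
        "CREATE TABLE findings ("
        "id INTEGER PRIMARY KEY, severity TEXT, discovered_at INTEGER)"
    )
    severities = ("critical", "high", "medium", "low", "info")
    conn.executemany(
        "INSERT INTO findings (severity, discovered_at) VALUES (?, ?)",
        ((severities[i % 5], i) for i in range(n)),
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    seed_findings(conn)
    conn.execute("CREATE INDEX idx_findings_severity ON findings(severity)")
    query = "SELECT severity, COUNT(*) FROM findings GROUP BY severity"
    for line in explain(conn, query):
        print(line)
    print(time_query(conn, query))
```

Timings from an in-memory database on a developer machine are only indicative; comparing the plan output before and after the indexes are created is usually the more reliable signal.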

Run tests:

python -m pytest testing/backend/integration/test_database_indexes.py -v

Run benchmark:

python scripts/benchmark_db.py

Checklist

  • My code follows the code style of this project.
  • I have performed a self-review of my own code.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have made corresponding changes to the documentation.
  • My changes generate no new warnings.

- Add composite index idx_tasks_status_created(status, created_at DESC)
  for dashboard running tasks query — eliminates full scan + filter
- Add findings indexes: severity, task_id, discovered_at DESC, plugin_id,
  target, and composite (task_id, severity) for grouped severity counts
- Add reports indexes: task_id, generated_at DESC, status
- Add audit_log indexes: timestamp DESC, event_type, task_id
- Refactor dashboard query: replace SELECT * FROM findings full table
  load + Python-side severity counting with a single DB-level GROUP BY
  query, so one row per severity is returned instead of O(n) finding rows
- Fetch only 5 most recent findings for dashboard instead of entire table
- Add migration script 001_add_performance_indexes.sql for existing DBs
- Add integration tests: verify all 10 new indexes exist post-migration,
  verify dashboard severity counts correct on seeded dataset,
  verify recent_findings limited to 5 regardless of total count
- Add benchmark script scripts/benchmark_db.py: seeds 10k findings +
  1k tasks, prints EXPLAIN QUERY PLAN and timed results for hot paths

Development

Successfully merging this pull request may close these issues.

[PERF] Add database indexes and query plans for high-volume task/finding views