Skip to content

Commit 4cbbe1d

Browse files
etrclaude
andcommitted
TASK-027: validation fixes — pre-compiled regex tier, classify_route_tier helper
Address validation findings from /groundwork:validate (2 passes): - src/httpserver/detail/webserver_impl.hpp: regex_routes_ now holds pre-compiled std::regex objects in a regex_route struct (url_complete, compiled_re, entry). Architecture §4.7 specifies the regex tier as a vector of (compiled std::regex, route_entry), not (std::string, route_entry) — addressing architecture-alignment-checker. - src/webserver.cpp: introduced classify_route_tier() helper as single source-of-truth for v2 tier placement (radix / regex / exact), replacing the duplicated tier-classification branches in register_impl_ and on_methods_ (code-simplifier finding). The on_methods_ update path is now fresh-gated to avoid redundant regex recompilation on subsequent on_post / on_get / etc. calls. - Tests added: regex_route_hits_regex_tier and prefix_subpath_first_lookup_hits_radix_not_cache in lookup_pipeline_test; radix_tree_matches_multiple_parameterized_segments and cache_duplicate_insert_replaces_in_place_and_keeps_size in route_table_test; on_get_and_on_post_compose_on_true_regex_path in webserver_on_methods_test; concurrency-test extensions in route_table_concurrency. - Minor comment cleanup in radix_tree.hpp. - Marked TASK-027 Done in task spec and _index.md. - Recorded unworked review issues (2 runs: 10 findings then 62 findings; follow-ups for v2 dispatch cutover, heap allocations on lookup hot path, end-to-end captured-params test, etc.). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent ee4a782 commit 4cbbe1d

11 files changed

Lines changed: 672 additions & 52 deletions

File tree

specs/tasks/M5-routing-lifecycle/TASK-027.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -47,4 +47,4 @@ Replace v1's three maps with the architecture-mandated 3-tier structure: `unorde
4747
**Related Requirements:** PRD-HDL-REQ-002, PRD-HDL-REQ-004
4848
**Related Decisions:** DR-007, §4.7, §5.1
4949

50-
**Status:** In Progress
50+
**Status:** Done

specs/tasks/_index.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -109,7 +109,7 @@ Nominally: **13 sequential tasks**, each S–XL. Most other tasks parallelize of
109109
| TASK-024 | `register_path` and `register_prefix` (replace `bool family`) | M4 | Done | TASK-023 |
110110
| TASK-025 | Lambda handler entry points `on_*` | M4 | Done | TASK-005, TASK-009, TASK-014 |
111111
| TASK-026 | Generic `webserver::route(method, path, handler)` | M4 | Done | TASK-005, TASK-025 |
112-
| TASK-027 | 3-tier route table with LRU cache | M5 | Not Started | TASK-005, TASK-014, TASK-021, TASK-024, TASK-025, TASK-026 |
112+
| TASK-027 | 3-tier route table with LRU cache | M5 | Done | TASK-005, TASK-014, TASK-021, TASK-024, TASK-025, TASK-026 |
113113
| TASK-028 | Routing-semantics regression gate | M5 | Not Started | TASK-027 |
114114
| TASK-029 | Naming consistency — `stop_and_wait`, `block_ip`/`unblock_ip` | M5 | Not Started | TASK-014 |
115115
| TASK-030 | `_handler` suffix renames + `explicit` constructor | M5 | Not Started | TASK-014 |
Lines changed: 52 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,52 @@
1+
# Unworked Review Issues
2+
3+
**Run:** 2026-05-10 22:45:00
4+
**Task:** TASK-027
5+
**Total:** 10 (0 critical, 6 major, 4 minor)
6+
7+
## Major
8+
9+
1. [ ] **security-reviewer** | `src/webserver.cpp:1541` | broken-access-control
10+
`lookup_v2()` is defined but never called from `finalize_answer()` or any other dispatch site. All live request dispatch still uses the v1 `registered_resources_str` / `registered_resources_regex` path exclusively. Resources registered through the v2 table have no effect on actual request routing until the v1 cutover happens. TASK-027 explicitly defers the dispatch cutover to a follow-up task (Cycle K) to keep this diff reviewable; the v2 table is a shadow/test table in this PR. Any access-control logic placed only in the v2 table (e.g. a catch-all deny prefix route) would silently have no effect on live traffic in the current state. CWE-284.
11+
*Recommendation:* Wire `lookup_v2()` into `finalize_answer()` as the primary dispatch mechanism in the follow-up cutover task (TASK-036 or a dedicated TASK-028 gate). Until then, document clearly in API-facing comments that routes registered via `register_path` / `on_*` / `route()` are stored in both v1 and v2 tables, but only the v1 table drives actual dispatch.
12+
13+
2. [ ] **test-quality-reviewer** | `test/unit/lookup_pipeline_test.cpp:1` | missing-test
14+
The acceptance criterion "path-piece extraction populates `http_request`" has no end-to-end test. `lookup_v2()` returns `captured_params` in the `lookup_result`, but no test verifies that those captures are subsequently written into the `http_request` as args (equivalent to the v1 `mr->dhr->set_arg` calls at webserver.cpp:1907). Because the live dispatch still uses v1, this gap cannot be filled until `lookup_v2` is wired into the dispatch path. Once Cycle K (dispatch cutover) lands, a live-dispatch integration test must verify that `GET /users/42/posts` results in `http_request::get_arg("id") == "42"`.
15+
*Recommendation:* After `lookup_v2` is wired into dispatch, add an integration test that starts the webserver, issues a real request to `/users/42/posts`, and asserts `http_request::get_arg("id") == "42"`. Pair this with the TASK-028 routing-semantics regression gate.
16+
17+
3. [ ] **security-reviewer** | `src/webserver.cpp:1890` | denial-of-service
18+
The v1 route cache (`route_cache_list` / `route_cache_map`) and the v2 `route_cache_v2` use different locking patterns. In `unregister_impl_()` the v1 cache is cleared inline (direct lock + list.clear() + map.clear()) while `registered_resources_mutex` (unique_lock) is already held, whereas in `register_impl_()` and `on_methods_()` the invalidation is delegated to `invalidate_route_cache()` after releasing the table lock. If a v1-path thread holds `route_cache_mutex` and tries to acquire `registered_resources_mutex` while another thread holds `registered_resources_mutex` and waits on `route_cache_mutex`, a deadlock can occur. CWE-833. Will be resolved when the v1 cache is removed in the dispatch-cutover follow-up (TASK-036). The lock-order discipline should be audited at that point. In the interim, consolidating the v1 cache clearing inside `invalidate_route_cache()` would remove the asymmetry and reduce audit surface.
19+
*Recommendation:* Track as a known lock-order risk during the v1/v2 coexistence period. Resolve by removing the v1 cache entirely when the dispatch cutover (Cycle K) ships.
20+
21+
4. [ ] **security-reviewer** | `src/httpserver/detail/radix_tree.hpp:79` | denial-of-service
22+
`radix_node<T>` uses `std::unordered_map<std::string, ...>` for `children_`. On most libc++ / libstdc++ implementations `std::hash<std::string>` is not hash-randomized by default. An attacker who can register many route segments that hash-collide can degrade lookup from O(1) to O(n) per hash probe. Under the shared lock on `route_table_mutex_` this enables per-request CPU spikes proportional to the collision depth. CWE-400.
23+
*Recommendation:* Use `std::map` (ordered, collision-free) for `children_`, or supply a per-process randomized hash seeded via `std::random_device` at startup. Alternatively cap the number of children per node or the radix tree depth.
24+
25+
5. [ ] **performance-reviewer** | `src/httpserver/detail/radix_tree.hpp:232` | memory-allocation
26+
`tokenize()` is called on every `find()` invocation and internally calls `http_utils::tokenize_url(std::string{path})`, which (a) copies the `string_view` into a `std::string`, (b) allocates a `std::vector<std::string>` for the segments, and (c) allocates each segment as an individual `std::string`. These heap allocations sit on the per-request critical path for every parameterised-route lookup and every cold cache miss.
27+
*Recommendation:* Tokenize inline within `find()` using `std::string_view` iteration over the path. Use transparent lookup (heterogeneous hashing, C++20) on `children_` to avoid segment string copies during descent. This eliminates the vector allocation and the per-segment heap copies.
28+
29+
## Minor
30+
31+
6. [ ] **performance-reviewer** | `src/webserver.cpp:1545` | memory-allocation
32+
`cache_key key{method, path}` at line 1545 copies `path` (a `const std::string&`) into a new `std::string` inside `cache_key::path` on every call to `lookup_v2`, including every warm-cache hit. For paths longer than SSO (~15 bytes) this is a heap allocation on every request.
33+
*Recommendation:* Add a `find_by_view()` overload to `route_cache` that takes `(http_method, std::string_view)` and performs the map lookup without constructing a `cache_key` by value. The key copy is only required on cache insert.
34+
35+
7. [ ] **performance-reviewer** | `src/httpserver/detail/radix_tree.hpp:64` | memory-allocation
36+
`radix_match<T>::captures` is a `std::vector<std::pair<std::string,std::string>>`. In `find()`, `caps` is built incrementally then potentially assigned to `best_prefix_caps` (a copy at line 172) and again moved into `out.captures`. For routes with N wildcard segments this produces 2 copies of the captures vector before it lands in `lookup_result`. In `lookup_v2` the captures then flow into `cache_value::captured_params` (another copy at line 1601-1602), for 3 total copies between radix match and cache insert.
37+
*Recommendation:* Eliminate the intermediate `best_prefix_caps` copy by tracking a slice count. In `lookup_v2`, move `result.captured_params` directly into the `cache_value` struct instead of assigning by value.
38+
39+
8. [ ] **security-reviewer** | `src/webserver.cpp:476` | insecure-design
40+
In `on_methods_()`, the `param_and_prefix_routes_.find(key, existing)` call returns a `const T* entry` (non-owning pointer into the tree). Between find and insert the caller holds `route_table_mutex_` (unique_lock, so no concurrent mutation), but `insert()` unconditionally replaces the terminus. If a prefix route and a parameterised exact route share the same path string, the merge logic may pick the wrong terminus and silently drop a registered handler. CWE-362 (semantic, not a true data race).
41+
*Recommendation:* Separate the find/insert paths by also checking `existing.entry` for the prefix_terminus case. Or expose a dedicated `update-terminus` method on `radix_tree<T>` that performs an atomic read-modify-write on the correct terminus field.
42+
43+
9. [ ] **architecture-alignment-checker** | `src/httpserver/detail/webserver_impl.hpp:207` | pattern-violation
44+
The v2 LRU cache mutex is named `route_cache_mutex` (no trailing underscore) whereas the project's member-naming convention uses a trailing underscore for private member data (e.g. `route_table_mutex_`, `registered_resources_mutex` — though the latter also lacks one). The architecture spec names it `route_cache_mutex_` (with underscore).
45+
*Recommendation:* Rename `route_cache_mutex` to `route_cache_mutex_` throughout `webserver_impl.hpp` and `webserver.cpp` to align with the architectural spec's documented identifier and the broader member-naming convention.
46+
47+
10. [ ] **security-reviewer** (iter2) | `src/webserver.cpp:1688` | broken-access-control (CWE-22, CWE-284)
48+
**OUT-OF-SCOPE for TASK-027.** `normalize_path()` and `should_skip_auth()` were introduced before TASK-027 (commits 86d4631 / d8b055e); TASK-027 did not modify either function. The finding is valid but predates this task.
49+
Two sub-issues are flagged:
50+
(a) `normalize_path` does not collapse consecutive slashes (`//`), so a wildcard skip_path like `/public/*` fails to match `/public//secret` after normalization, which could admit a path that should be auth-protected or block one that should be skipped.
51+
(b) The contract that the `path` argument to `should_skip_auth` must already be MHD-unescaped (by `unescaper_func`, registered via `MHD_OPTION_UNESCAPE_CALLBACK`) is not documented at the call site. If a future caller passes a raw percent-encoded path, percent-encoded dot sequences (`%2e`, `%2f`) survive `normalize_path` and the auth-skip decision may be wrong.
52+
*Recommendation:* In a follow-up task targeting the auth/skip-auth subsystem: (1) collapse consecutive `/` separators in `normalize_path` before the segment-split loop; (2) add a comment at the `should_skip_auth` call site (webserver.cpp ~line 2002) and at the function declaration documenting the MHD-unescape precondition.

0 commit comments

Comments
 (0)