RDK-59001: [POC] Create Marker Instrumentation Discovery Tool #288
yogeswaransky wants to merge 7 commits into develop from
Conversation
Signed-off-by: Yogeswaran K <yogeswaransky@gmail.com>
Pull request overview
Adds a POC Python-based “marker discovery” CLI under tools/marker_discovery/ to inventory T2 telemetry marker instrumentation across repos (C/C++ direct calls + wrapper resolution, script calls, and .patch additions) and emit a consolidated Markdown report.
Changes:
- Introduces GitHub org scanning + shallow clone orchestration, plus C/script/patch marker scanners.
- Adds Markdown report generation (sorting, duplicate detection, dynamic marker section).
- Adds pytest unit tests and supporting spec/docs artifacts (plus a sample generated report.md).
Reviewed changes
Copilot reviewed 20 out of 22 changed files in this pull request and generated 8 comments.
Summary per file:
| File | Description |
|---|---|
| tools/marker_discovery/__init__.py | Defines the MarkerRecord data model used across scanners/reporting. |
| tools/marker_discovery/__main__.py | Enables python -m tools.marker_discovery execution. |
| tools/marker_discovery/code_parser.py | tree-sitter-based C/C++ parser for direct marker calls + wrapper detection/resolution. |
| tools/marker_discovery/github_client.py | GitHub API client for repo listing/search, plus shallow cloning with branch fallback. |
| tools/marker_discovery/marker_scanner.py | CLI entry point and orchestration across orgs/repos; runs all scanners and generates report. |
| tools/marker_discovery/patch_parser.py | Scans added lines in .patch files for marker API usage. |
| tools/marker_discovery/report_generator.py | Produces the Markdown inventory report including duplicate/dynamic sections. |
| tools/marker_discovery/script_parser.py | Scans non-C/patch files for t2ValNotify/t2CountNotify, flags dynamic markers, resolves $1 for simple wrappers. |
| tools/marker_discovery/requirements.txt | Declares Python dependencies for the tool. |
| tools/marker_discovery/tests/__init__.py | Test package marker. |
| tools/marker_discovery/tests/test_code_parser.py | Unit tests for direct-call extraction and wrapper call-site resolution. |
| tools/marker_discovery/tests/test_script_parser.py | Unit tests for script marker extraction, dynamic detection, and positional arg resolution. |
| tools/marker_discovery/tests/test_patch_parser.py | Unit tests for patch scanning (added lines only) across C + script APIs. |
| tools/marker_discovery/tests/test_report_generator.py | Unit tests for report structure, sorting, duplicates, and dynamic marker handling. |
| tools/marker_discovery/tests/integration_test.py | Adds a manual “integration test” style script (currently placed under tests). |
| tools/marker_discovery/tests/debug_wrappers.py | Adds a local debugging helper script for wrapper detection. |
| report.md | Sample/generated marker inventory report output. |
| openspec/config.yaml | Adds OpenSpec config scaffold. |
| openspec/changes/marker-discovery/.openspec.yaml | OpenSpec metadata for the change. |
| openspec/changes/marker-discovery/proposal.md | Proposal/spec for the marker discovery tool. |
| openspec/changes/marker-discovery/design.md | Design doc describing architecture and behavior. |
| openspec/changes/marker-discovery/tasks.md | Task breakdown/checklist for implementing the tool. |
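The duplicate detection mentioned for report_generator.py can be sketched as a frequency count over marker names. This is a hedged illustration only: `find_duplicate_markers` and the dict keys are hypothetical, not the tool's actual API.

```python
from collections import Counter

def find_duplicate_markers(markers):
    """Return marker names recorded more than once, sorted alphabetically.

    Hypothetical helper; the real report generator's structure may differ.
    """
    counts = Counter(m["marker_name"] for m in markers)
    return sorted(name for name, n in counts.items() if n > 1)

# Sample records shaped after the report's "Marker Name" / "Component" columns.
markers = [
    {"marker_name": "WIFI_INFO_reboot", "component": "OneWifi"},
    {"marker_name": "WIFI_INFO_reboot", "component": "rdkb-halif"},
    {"marker_name": "SYS_ERROR_crash", "component": "telemetry"},
]
print(find_duplicate_markers(markers))  # → ['WIFI_INFO_reboot']
```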
```python
repos_to_clone = [r["name"] for r in all_repos if r["name"] in repo_names_to_clone]

cloned = github_client.clone_matching_repos(org, repos_to_clone, branch, temp_dir)
return cloned, len(all_repos)
```
```python
def clone_matching_repos(org, repo_names, branch, temp_dir):
    """Clone a list of repos with branch fallback.

    Args:
        org: GitHub organization name
        repo_names: iterable of repo names to clone
        branch: target branch name
        temp_dir: directory to clone into

    Returns list of dicts: {'name': str, 'path': str, 'branch': str}
    """
    cloned = []
    for repo_name in sorted(repo_names):
        clone_path, actual_branch = clone_repo(org, repo_name, branch, temp_dir)
        if clone_path:
            cloned.append({
                "name": repo_name,
                "path": clone_path,
                "branch": actual_branch,
            })
    logger.info("Cloned %d/%d repos for org %s", len(cloned), len(list(repo_names)), org)
    return cloned
```
```python
def _find_shell_function_for_line(file_content, line_num):
    """Find what shell function a given line belongs to.

    Returns (function_name, positional_arg_index) if the marker at line_num
    uses a positional parameter like $1, $2, or None if not in a function.
    """
    lines = file_content.split('\n')
    if line_num < 1 or line_num > len(lines):
        return None

    # Walk backwards to find function definition
    func_pattern = re.compile(r'^\s*(?:function\s+)?(\w+)\s*\(\s*\)')
    for i in range(line_num - 1, -1, -1):
        line = lines[i]
        m = func_pattern.match(line)
        if m:
            return m.group(1)
        # If we hit another closing brace at column 0, we've left the function
        if line.strip() == '}' and i < line_num - 1:
            return None

    return None
```
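To illustrate the backward-walk heuristic, here is a self-contained copy of the function from the snippet above (renamed without the leading underscore), run on a small sample script; the sample script and marker names are invented for the demo.

```python
import re

def find_shell_function_for_line(file_content, line_num):
    """Self-contained copy of the heuristic for illustration purposes."""
    lines = file_content.split('\n')
    if line_num < 1 or line_num > len(lines):
        return None
    func_pattern = re.compile(r'^\s*(?:function\s+)?(\w+)\s*\(\s*\)')
    for i in range(line_num - 1, -1, -1):
        line = lines[i]
        m = func_pattern.match(line)
        if m:
            return m.group(1)
        # A bare closing brace above the marker means we left the function.
        if line.strip() == '}' and i < line_num - 1:
            return None
    return None

script = """#!/bin/sh
log_marker() {
    t2ValNotify "$1" "$2"
}
t2CountNotify "TOP_LEVEL"
"""
print(find_shell_function_for_line(script, 3))  # → log_marker
print(find_shell_function_for_line(script, 5))  # → None
```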
```python
    file_path: str
    line: int
    api: str
    source_type: str  # "source" | "script" | "patch"
```
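The four visible fields suggest a dataclass along these lines. This is a sketch only: `marker_name` and `component` are assumed from the report's "Marker Name" / "Component" columns, not confirmed field names from the PR.

```python
from dataclasses import dataclass

@dataclass
class MarkerRecord:
    """Sketch of the record shared by scanners and reporting (field names assumed)."""
    marker_name: str
    component: str
    file_path: str
    line: int
    api: str
    source_type: str  # "source" | "script" | "patch"

rec = MarkerRecord(
    "2GRxPackets_split", "OneWifi",
    "scripts/process_monitor_atom.sh", 592,
    "t2ValNotify", "script",
)
print(rec.marker_name)  # → 2GRxPackets_split
```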
```python
def run_fast_path(org, branch, temp_dir):
    """Fast path: use search API to find repos with markers, then clone only those."""
    logger.info("Fast path: searching for marker repos in %s via Search API...", org)

    matching_repos = github_client.search_all_markers_in_org(org)
    if not matching_repos:
        logger.warning("No repos with markers found in %s via search", org)
        return [], 0

    logger.info("Found %d repos with potential markers, cloning...", len(matching_repos))

    # Get clone URLs for matching repos
    all_repos = github_client.list_org_repos(org)
    repo_names_to_clone = {r for r in matching_repos}
    repos_to_clone = [r["name"] for r in all_repos if r["name"] in repo_names_to_clone]

    cloned = github_client.clone_matching_repos(org, repos_to_clone, branch, temp_dir)
    return cloned, len(all_repos)


def run_full_path(org, branch, temp_dir):
    """Full path: list all repos, clone all, scan locally."""
    logger.info("Full path: listing all repos in %s...", org)

    all_repos = github_client.list_org_repos(org)
    repo_names = [r["name"] for r in all_repos]

    logger.info("Cloning %d repos on branch %s...", len(repo_names), branch)
    cloned = github_client.clone_matching_repos(org, repo_names, branch, temp_dir)
    return cloned, len(all_repos)


def scan_cloned_repos(cloned_repos):
    """Run all scanners on cloned repos. Returns list of MarkerRecord."""
    all_markers = []

    for repo_info in cloned_repos:
        repo_path = repo_info["path"]
        repo_name = repo_info["name"]

        logger.info("Scanning %s...", repo_name)

        # C/C++ source scanning (tree-sitter)
        try:
            markers = code_parser.scan_repo(repo_path, repo_name)
            all_markers.extend(markers)
        except Exception as e:
            logger.warning("Code parser failed for %s: %s", repo_name, e)

        # Script scanning
        try:
            markers = script_parser.scan_repo_scripts(repo_path, repo_name)
            all_markers.extend(markers)
        except Exception as e:
            logger.warning("Script parser failed for %s: %s", repo_name, e)

        # Patch scanning
        try:
            markers = patch_parser.scan_repo_patches(repo_path, repo_name)
            all_markers.extend(markers)
        except Exception as e:
            logger.warning("Patch parser failed for %s: %s", repo_name, e)

    return all_markers


def main(argv=None):
    args = parse_args(argv)
    setup_logging(args.verbose)

    temp_dir = github_client.create_temp_dir()
    logger.info("Using temp directory: %s", temp_dir)

    try:
        all_markers = []
        total_repos_scanned = 0

        for org in args.orgs:
            # Choose strategy based on branch
            if args.branch == "develop":
                cloned, total_in_org = run_fast_path(org, args.branch, temp_dir)
            else:
                cloned, total_in_org = run_full_path(org, args.branch, temp_dir)

            total_repos_scanned += total_in_org

            # Scan cloned repos
            markers = scan_cloned_repos(cloned)
            all_markers.extend(markers)

        # Generate report
        report = report_generator.generate_report(
            markers=all_markers,
            branch=args.branch,
            orgs=args.orgs,
            components_scanned=total_repos_scanned,
        )

        # Output
        if args.output:
            with open(args.output, "w") as f:
                f.write(report)
            logger.info("Report written to %s", args.output)
        else:
            print(report)

        logger.info("Done. Found %d markers total.", len(all_markers))
        return 0

    except Exception as e:
        logger.error("Fatal error: %s", e, exc_info=True)
        return 1

    finally:
```
```markdown
# Telemetry Marker Inventory

**Branch**: develop
**Organizations**: rdkcentral
**Generated**: 2026-03-16 06:31:31 UTC

## Summary
- **Total Markers**: 913
- **Static Markers**: 909
- **Dynamic Markers**: 4 (contain shell variables)
- **Components Scanned**: 594
- **Duplicate Markers**: 56 ⚠️

## Marker Inventory
| Marker Name | Component | File Path | Line | API |
|-------------|-----------|-----------|------|-----|
| 2GRxPackets_split | OneWifi | scripts/process_monitor_atom.sh | 592 | t2ValNotify |
```
| """Quick integration test against a real cloned repo.""" | ||
| import sys | ||
| sys.path.insert(0, "/mnt/L2_CONTAINER_SHARED_VOLUME/59001") | ||
|
|
||
| from tools.marker_discovery.code_parser import scan_repo | ||
| from tools.marker_discovery.script_parser import scan_repo_scripts | ||
| from tools.marker_discovery.patch_parser import scan_repo_patches | ||
| from tools.marker_discovery.report_generator import generate_report | ||
|
|
||
| repo_path = "/tmp/test_telemetry" | ||
| repo_name = "telemetry" | ||
|
|
||
| print("=== Code Parser ===") | ||
| code_markers = scan_repo(repo_path, repo_name) | ||
| for m in code_markers[:10]: | ||
| print(f" {m.marker_name:40s} | {m.file_path:50s} | {m.line:5d} | {m.api}") | ||
| print(f" ... Total code markers: {len(code_markers)}") | ||
|
|
||
| print("\n=== Script Parser ===") | ||
| script_markers = scan_repo_scripts(repo_path, repo_name) | ||
| for m in script_markers[:10]: | ||
| print(f" {m.marker_name:40s} | {m.file_path:50s} | {m.line:5d} | {m.api}") | ||
| print(f" ... Total script markers: {len(script_markers)}") | ||
|
|
||
| print("\n=== Patch Parser ===") | ||
| patch_markers = scan_repo_patches(repo_path, repo_name) | ||
| for m in patch_markers[:10]: | ||
| print(f" {m.marker_name:40s} | {m.file_path:50s} | {m.line:5d} | {m.api}") | ||
| print(f" ... Total patch markers: {len(patch_markers)}") | ||
|
|
||
| all_markers = code_markers + script_markers + patch_markers | ||
| print(f"\n=== TOTAL: {len(all_markers)} markers ===") | ||
|
|
||
| print("\n=== Report Preview (first 30 lines) ===") | ||
| report = generate_report(all_markers, "develop", ["rdkcentral"], 1) | ||
| for line in report.split("\n")[:30]: | ||
| print(line) |
```python
def _create_parser():
    """Create a tree-sitter parser with C grammar."""
    parser = Parser(C_LANGUAGE)
```
| """Remove a directory tree, handling stubborn files (e.g. git-lfs).""" | ||
| def _on_error(_func, _path, _exc_info): | ||
| try: | ||
| os.chmod(_path, 0o777) |
**Check failure** — Code scanning / CodeQL: Overly permissive file permissions (High)

Copilot Autofix (AI, about 19 hours ago):
In general, to fix overly permissive chmod calls, replace world-readable/writable modes like 0o777 with the minimum permissions necessary for the operation (here: successful removal of files/directories by the current user). For deletion, the process only needs owner permissions; there is no need to grant access to group or others.
The best targeted fix here is to change os.chmod(_path, 0o777) inside _force_rmtree’s _on_error handler to a more restrictive mask. Since this handler may be called for both files and directories and we only care about enabling deletion by the current process, we can safely use 0o700 (rwx for owner, no permissions for group/others) for all paths. That gives the owner full control to change or delete the object while not granting any rights to other users. No other functionality changes: _on_error still relaxes permissions enough for _func (the removal function) to succeed, and we still swallow any OSError during the cleanup attempt.
Concretely:
- Edit `tools/marker_discovery/github_client.py`.
- In the `_force_rmtree` function's `_on_error` callback (around line 449), change the `os.chmod` mode argument from `0o777` to `0o700`.
- No new imports, methods, or definitions are required.
```diff
@@ -446,7 +446,7 @@
     """Remove a directory tree, handling stubborn files (e.g. git-lfs)."""
     def _on_error(_func, _path, _exc_info):
         try:
-            os.chmod(_path, 0o777)
+            os.chmod(_path, 0o700)
             _func(_path)
         except OSError:
             pass
```
Pull request overview
Adds a Python-based “Marker Discovery Tool” under tools/marker_discovery/ to scan GitHub org repos for T2 telemetry marker instrumentation (C/C++ direct calls + wrapper resolution, script calls, and .patch additions) and emit a consolidated markdown inventory report.
Changes:
- Introduces GitHub org enumeration + shallow cloning workflow and a CLI orchestrator to scan repos and generate a markdown report.
- Adds parsers for C/C++ (tree-sitter), scripts (regex + limited positional arg resolution), and patch files (added-lines-only).
- Adds pytest unit tests and supporting documentation/spec artifacts.
Reviewed changes
Copilot reviewed 22 out of 24 changed files in this pull request and generated 7 comments.
Summary per file:
| File | Description |
|---|---|
| tools/marker_discovery/code_parser.py | tree-sitter-based extraction of t2_event_* markers + wrapper detection/resolution. |
| tools/marker_discovery/github_client.py | GitHub API client + cloning helpers (branch/commit fallback strategies). |
| tools/marker_discovery/marker_scanner.py | CLI entry point/orchestration across orgs or an input manifest. |
| tools/marker_discovery/script_parser.py | Scans non-C/non-binary files for t2ValNotify/t2CountNotify, flags dynamic markers, attempts $1 resolution. |
| tools/marker_discovery/patch_parser.py | Scans .patch files for added-line marker calls (C + script APIs). |
| tools/marker_discovery/report_generator.py | Markdown report generation, sorting, duplicates, dynamic markers, unresolved components. |
| tools/marker_discovery/__init__.py | Defines MarkerRecord dataclass used across scanners/reporting. |
| tools/marker_discovery/__main__.py | Enables python3 -m tools.marker_discovery execution. |
| tools/marker_discovery/requirements.txt | Python dependencies for the scanner (tree-sitter + requests). |
| tools/marker_discovery/README.md | Tool documentation, usage, and design details. |
| tools/marker_discovery/tests/test_code_parser.py | Unit tests for direct call extraction, wrapper detection, wrapper call resolution. |
| tools/marker_discovery/tests/test_script_parser.py | Unit tests for script parsing, dynamic marker classification, positional arg resolution. |
| tools/marker_discovery/tests/test_patch_parser.py | Unit tests for .patch scanning behavior. |
| tools/marker_discovery/tests/test_report_generator.py | Unit tests for markdown report formatting, sorting, duplicates, dynamic/unresolved sections. |
| tools/marker_discovery/tests/integration_test.py | Adds a “quick integration test” script (currently executed at import time). |
| tools/marker_discovery/tests/debug_wrappers.py | Adds a wrapper-debug helper script. |
| tools/marker_discovery/tests/__init__.py | Marks test package. |
| report.md | Checked-in generated report output (sample inventory). |
| report6.md | Checked-in generated report output (sample inventory). |
| openspec/config.yaml | Adds OpenSpec configuration scaffold. |
| openspec/changes/marker-discovery/tasks.md | Spec-driven task breakdown for the feature. |
| openspec/changes/marker-discovery/proposal.md | Proposal/spec for the marker discovery tool. |
| openspec/changes/marker-discovery/design.md | Design document for architecture and module responsibilities. |
| openspec/changes/marker-discovery/.openspec.yaml | OpenSpec metadata for the change. |
```markdown
# Telemetry Marker Inventory

**Branch**: per-component (from versions-e.txt)
**Organizations**: rdkcentral, rdk-e, rdk-common, rdk-gdcs
**Generated**: 2026-03-29 18:30:14 UTC

## Summary
- **Total Markers**: 491
- **Static Markers**: 487
- **Dynamic Markers**: 4 (contain shell variables)
- **Components Scanned**: 196
- **Duplicate Markers**: 43 ⚠️
```
These appear to be generated output reports (large, time-stamped inventories). Committing generated scan artifacts will quickly become stale and inflates the repository/PR diff. Suggest removing these from source control (or moving to a docs/examples location with clear naming) and generating them as build artifacts instead.
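One low-friction way to act on this is to ignore the generated files and document the regeneration command instead (a sketch; the glob and command form are assumptions based on the file names in this PR):

```gitignore
# Generated marker inventory reports -- regenerate on demand with:
#   python3 -m tools.marker_discovery --output report.md
/report*.md
```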
```markdown
# Telemetry Marker Inventory

**Branch**: develop
**Organizations**: rdkcentral
**Generated**: 2026-03-16 06:31:31 UTC

## Summary
- **Total Markers**: 913
- **Static Markers**: 909
- **Dynamic Markers**: 4 (contain shell variables)
- **Components Scanned**: 594
- **Duplicate Markers**: 56 ⚠️
```
This looks like a generated, time-stamped output report rather than source. Keeping it committed will go stale and adds a very large diff/repo footprint. Recommend removing it from version control and generating it on demand (or storing a small, curated example under docs/ if needed).
```python
sys.path.insert(0, "/mnt/L2_CONTAINER_SHARED_VOLUME/59001")

from tools.marker_discovery.code_parser import scan_repo
from tools.marker_discovery.script_parser import scan_repo_scripts
from tools.marker_discovery.patch_parser import scan_repo_patches
from tools.marker_discovery.report_generator import generate_report

repo_path = "/tmp/test_telemetry"
repo_name = "telemetry"

print("=== Code Parser ===")
code_markers = scan_repo(repo_path, repo_name)
for m in code_markers[:10]:
    print(f"  {m.marker_name:40s} | {m.file_path:50s} | {m.line:5d} | {m.api}")
print(f"  ... Total code markers: {len(code_markers)}")

print("\n=== Script Parser ===")
script_markers = scan_repo_scripts(repo_path, repo_name)
for m in script_markers[:10]:
    print(f"  {m.marker_name:40s} | {m.file_path:50s} | {m.line:5d} | {m.api}")
print(f"  ... Total script markers: {len(script_markers)}")

print("\n=== Patch Parser ===")
patch_markers = scan_repo_patches(repo_path, repo_name)
for m in patch_markers[:10]:
    print(f"  {m.marker_name:40s} | {m.file_path:50s} | {m.line:5d} | {m.api}")
print(f"  ... Total patch markers: {len(patch_markers)}")

all_markers = code_markers + script_markers + patch_markers
print(f"\n=== TOTAL: {len(all_markers)} markers ===")

print("\n=== Report Preview (first 30 lines) ===")
report = generate_report(all_markers, "develop", ["rdkcentral"], 1)
for line in report.split("\n")[:30]:
    print(line)
```
This module will be collected by pytest (matches *_test.py) and executes network/FS-dependent code at import time (hardcoded sys.path, /tmp/test_telemetry, print loops). This will break pytest runs in CI. Convert this into a proper test with fixtures and pytest.mark.integration + opt-in skipping, or move it out of tests/ and guard execution with if __name__ == "__main__":.
Suggested change (wrap the module-level code in a guarded `main()`):

```python
def main() -> None:
    # Adjust sys.path so local tools package can be imported when run as a script.
    sys.path.insert(0, "/mnt/L2_CONTAINER_SHARED_VOLUME/59001")

    from tools.marker_discovery.code_parser import scan_repo
    from tools.marker_discovery.script_parser import scan_repo_scripts
    from tools.marker_discovery.patch_parser import scan_repo_patches
    from tools.marker_discovery.report_generator import generate_report

    repo_path = "/tmp/test_telemetry"
    repo_name = "telemetry"

    print("=== Code Parser ===")
    code_markers = scan_repo(repo_path, repo_name)
    for m in code_markers[:10]:
        print(f"  {m.marker_name:40s} | {m.file_path:50s} | {m.line:5d} | {m.api}")
    print(f"  ... Total code markers: {len(code_markers)}")

    print("\n=== Script Parser ===")
    script_markers = scan_repo_scripts(repo_path, repo_name)
    for m in script_markers[:10]:
        print(f"  {m.marker_name:40s} | {m.file_path:50s} | {m.line:5d} | {m.api}")
    print(f"  ... Total script markers: {len(script_markers)}")

    print("\n=== Patch Parser ===")
    patch_markers = scan_repo_patches(repo_path, repo_name)
    for m in patch_markers[:10]:
        print(f"  {m.marker_name:40s} | {m.file_path:50s} | {m.line:5d} | {m.api}")
    print(f"  ... Total patch markers: {len(patch_markers)}")

    all_markers = code_markers + script_markers + patch_markers
    print(f"\n=== TOTAL: {len(all_markers)} markers ===")

    print("\n=== Report Preview (first 30 lines) ===")
    report = generate_report(all_markers, "develop", ["rdkcentral"], 1)
    for line in report.split("\n")[:30]:
        print(line)


if __name__ == "__main__":
    main()
```
```python
sys.path.insert(0, "/mnt/L2_CONTAINER_SHARED_VOLUME/59001")

from tools.marker_discovery.code_parser import detect_wrappers

ws = detect_wrappers("/tmp/test_telemetry")
for w in ws:
    print(w)
```
This helper script hardcodes a container-specific sys.path and a local path (/tmp/test_telemetry). Even if it isn't collected by pytest today, keeping this under tests/ is brittle and encourages non-reproducible debugging paths. Consider moving it to a separate scripts/ or devtools/ directory and/or making the paths configurable via CLI args/env vars.
Suggested change (make the paths configurable via CLI args/env vars):

```python
import os
import argparse


def main() -> None:
    parser = argparse.ArgumentParser(
        description="Debug helper for listing marker discovery wrappers."
    )
    parser.add_argument(
        "--code-path",
        dest="code_path",
        default=os.environ.get("TELEMETRY_CODE_PATH"),
        help=(
            "Optional path to prepend to sys.path so that "
            "tools.marker_discovery can be imported "
            "(default: value of TELEMETRY_CODE_PATH, if set)."
        ),
    )
    parser.add_argument(
        "--workspace",
        dest="workspace",
        default=os.environ.get("TELEMETRY_WORKSPACE", "/tmp/test_telemetry"),
        help=(
            "Path to the telemetry workspace to scan for wrappers "
            "(default: TELEMETRY_WORKSPACE env var or /tmp/test_telemetry)."
        ),
    )
    args = parser.parse_args()

    if args.code_path:
        sys.path.insert(0, args.code_path)

    from tools.marker_discovery.code_parser import detect_wrappers

    ws = detect_wrappers(args.workspace)
    for w in ws:
        print(w)


if __name__ == "__main__":
    main()
```
```python
    file_path: str
    line: int
    api: str
    source_type: str  # "source" | "script" | "patch"
```
MarkerRecord.source_type is documented as only "source" | "script" | "patch", but the implementation/reporting also uses "script_dynamic". Update this comment/type annotation to reflect the actual allowed values to avoid confusing callers and test writers.
```diff
-    source_type: str  # "source" | "script" | "patch"
+    source_type: str  # "source" | "script" | "patch" | "script_dynamic"
```
```python
    Returns (function_name, positional_arg_index) if the marker at line_num
    uses a positional parameter like $1, $2, or None if not in a function.
```
The docstring for _find_shell_function_for_line says it returns (function_name, positional_arg_index) but the function actually returns only the function name (or None). Please correct the docstring to match behavior (or adjust the function to return the documented tuple).
```diff
-    Returns (function_name, positional_arg_index) if the marker at line_num
-    uses a positional parameter like $1, $2, or None if not in a function.
+    Returns the name of the enclosing shell function as a string, or None if
+    the line is not inside a function definition.
```
```python
def clone_matching_repos(org, repo_names, branch, temp_dir):
    """Clone a list of repos with branch fallback.

    Args:
        org: GitHub organization name
        repo_names: iterable of repo names to clone
        branch: target branch name
        temp_dir: directory to clone into

    Returns list of dicts: {'name': str, 'path': str, 'branch': str}
    """
    cloned = []
    for repo_name in sorted(repo_names):
        clone_path, actual_branch = clone_repo(org, repo_name, branch, temp_dir)
        if clone_path:
            cloned.append({
                "name": repo_name,
                "path": clone_path,
                "branch": actual_branch,
            })
    logger.info("Cloned %d/%d repos for org %s", len(cloned), len(list(repo_names)), org)
    return cloned
```
clone_matching_repos logs the total repo count using len(list(repo_names)), which can (a) consume an iterator/generator and report an incorrect total, and (b) allocate an extra list unnecessarily. Compute the total once up front (e.g., repo_names = list(repo_names) at entry) and reuse it for both sorting and logging.
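The hazard is easy to demonstrate: a generator is exhausted by its first traversal, so a later `len(list(...))` reports zero. A generic sketch of the materialize-once fix (function and variable names here are illustrative, not the module's actual code):

```python
def clone_names(repo_names):
    """Mimics the loop-then-log pattern: iterate once, then measure length."""
    repo_names = list(repo_names)   # materialize once, up front
    cloned = [name for name in sorted(repo_names)]
    total = len(repo_names)         # correct even if the input was a generator
    return cloned, total

cloned, total = clone_names(n for n in ["telemetry", "OneWifi"])
print(total)  # → 2

# Without materializing: the generator is consumed by the first pass.
gen = (n for n in ["telemetry", "OneWifi"])
list(gen)
print(len(list(gen)))  # → 0
```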
```markdown
# Telemetry Marker Inventory

**Branch**: per-component (from versions-e.txt)
**Organizations**: rdkcentral, rdk-e, rdk-common, rdk-gdcs
**Generated**: 2026-03-29 18:30:14 UTC

## Summary
- **Total Markers**: 491
- **Static Markers**: 487
- **Dynamic Markers**: 4 (contain shell variables)
- **Components Scanned**: 196
- **Duplicate Markers**: 43 ⚠️

## Marker Inventory
| Marker Name | Component | File Path | Line | API |
|-------------|-----------|-----------|------|-----|
| 5GclientMac_split | telemetry | source/testApp/testCommonLibApi.c | 73 | t2_event_s |
```
This looks like a generated output report committed into the repo. It also does not match the current report_generator output structure (e.g., this file uses a single "## Marker Inventory" section, while the generator emits "## Unique Marker Inventory" and "## Detailed Marker Inventory").
Suggestion: avoid committing generated reports to source control; if you want a sample, keep a minimal example under docs/ or README and ensure it is produced by the current generator to prevent it from going stale.
```python
_REQUIRED = ["tree_sitter", "tree_sitter_c", "requests"]


def _ensure_deps():
    missing = []
    for mod in _REQUIRED:
        try:
            importlib.import_module(mod)
        except ImportError:
            missing.append(mod)
    if missing:
        req = str(__import__("pathlib").Path(__file__).parent / "requirements.txt")
        print(f"Installing missing dependencies: {', '.join(missing)}", file=sys.stderr)
        subprocess.check_call([sys.executable, "-m", "pip", "install", "-r", req])


_ensure_deps()
```
main.py automatically runs pip install at import time when dependencies are missing. This has significant operational/security implications (unexpected network access, mutating the runtime environment, and surprising behavior in CI or production images).
Suggestion: fail fast with a clear error message instructing users to install requirements, or gate auto-install behind an explicit CLI flag / environment variable.
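A fail-fast variant might look like the sketch below. `missing_deps` and `ensure_deps` are hypothetical names, not the module's actual API; `importlib.util.find_spec` avoids importing the modules just to probe for them.

```python
import importlib.util
import sys

REQUIRED = ["tree_sitter", "tree_sitter_c", "requests"]

def missing_deps(required):
    """Return the subset of module names that cannot be found."""
    return [m for m in required if importlib.util.find_spec(m) is None]

def ensure_deps(required=REQUIRED):
    """Exit with a clear message instead of mutating the environment."""
    missing = missing_deps(required)
    if missing:
        sys.exit(
            "Missing dependencies: " + ", ".join(missing)
            + ". Install them with:\n"
            "  pip install -r tools/marker_discovery/requirements.txt"
        )

print(missing_deps(["json"]))  # → []
```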
```python
_GITHUB_LINE_PATTERN = re.compile(
    r"^(?:b')?"                      # optional b' prefix
    r"(?:"
    r"https?://github\.com/"         # HTTPS URL
    r"|"
    r"ssh://(?:git@)?github\.com/"   # SSH URL (with optional git@ user)
    r")"
    r"([^/]+)/"                      # org
    r"([^@'.\s]+)"                   # repo name
    r"(?:\.git)?"                    # optional .git suffix
    r"'?"                            # optional closing quote
    r"@(\S*)"                        # @ref (branch/tag/sha, may be empty)
    r"\s*:\s*"                       # separator
    r"(\S+)"                         # commit SHA or checksum
```
The versions.txt parser regex disallows '.' in repo names (r"([^@'.\s]+)"). GitHub repo names can include dots, so those entries would be silently skipped.
Suggestion: relax the repo-name character class to allow '.' (and rely on the '@' delimiter / optional .git suffix to bound the repo name).
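A sketch of the relaxed pattern: same overall shape, with the repo-name class widened to allow dots and made lazy so the optional `.git` suffix is still stripped. This is illustrative, not necessarily the PR's final regex.

```python
import re

GITHUB_LINE = re.compile(
    r"^(?:b')?"
    r"(?:https?://github\.com/|ssh://(?:git@)?github\.com/)"
    r"([^/]+)/"        # org
    r"([^@'\s]+?)"     # repo name (dots allowed; lazy so ".git" below can match)
    r"(?:\.git)?"      # optional .git suffix
    r"'?"
    r"@(\S*)"          # @ref (branch/tag/sha, may be empty)
    r"\s*:\s*"
    r"(\S+)"           # commit SHA or checksum
)

m = GITHUB_LINE.match("https://github.com/rdkcentral/meta.layer.git@develop : deadbeef")
print(m.groups())  # → ('rdkcentral', 'meta.layer', 'develop', 'deadbeef')
```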
shibu-kv left a comment:

LGTM. Will defer the merges to mainline for now to support other critical releases.
No description provided.