RDK-59001: [POC] Create Marker Instrumentation Discovery Tool #288
yogeswaransky wants to merge 7 commits into develop from
Conversation
Signed-off-by: Yogeswaran K <yogeswaransky@gmail.com>
Pull request overview
Adds a POC Python-based “marker discovery” CLI under tools/marker_discovery/ to inventory T2 telemetry marker instrumentation across repos (C/C++ direct calls + wrapper resolution, script calls, and .patch additions) and emit a consolidated Markdown report.
Changes:
- Introduces GitHub org scanning + shallow clone orchestration, plus C/script/patch marker scanners.
- Adds Markdown report generation (sorting, duplicate detection, dynamic marker section).
- Adds pytest unit tests and supporting spec/docs artifacts (plus a sample generated report.md).
Reviewed changes
Copilot reviewed 20 out of 22 changed files in this pull request and generated 8 comments.
Summary per file:
| File | Description |
|---|---|
| tools/marker_discovery/__init__.py | Defines the MarkerRecord data model used across scanners/reporting. |
| tools/marker_discovery/__main__.py | Enables python -m tools.marker_discovery execution. |
| tools/marker_discovery/code_parser.py | tree-sitter-based C/C++ parser for direct marker calls + wrapper detection/resolution. |
| tools/marker_discovery/github_client.py | GitHub API client for repo listing/search, plus shallow cloning with branch fallback. |
| tools/marker_discovery/marker_scanner.py | CLI entry point and orchestration across orgs/repos; runs all scanners and generates report. |
| tools/marker_discovery/patch_parser.py | Scans added lines in .patch files for marker API usage. |
| tools/marker_discovery/report_generator.py | Produces the Markdown inventory report including duplicate/dynamic sections. |
| tools/marker_discovery/script_parser.py | Scans non-C/patch files for t2ValNotify/t2CountNotify, flags dynamic markers, resolves $1 for simple wrappers. |
| tools/marker_discovery/requirements.txt | Declares Python dependencies for the tool. |
| tools/marker_discovery/tests/__init__.py | Test package marker. |
| tools/marker_discovery/tests/test_code_parser.py | Unit tests for direct-call extraction and wrapper call-site resolution. |
| tools/marker_discovery/tests/test_script_parser.py | Unit tests for script marker extraction, dynamic detection, and positional arg resolution. |
| tools/marker_discovery/tests/test_patch_parser.py | Unit tests for patch scanning (added lines only) across C + script APIs. |
| tools/marker_discovery/tests/test_report_generator.py | Unit tests for report structure, sorting, duplicates, and dynamic marker handling. |
| tools/marker_discovery/tests/integration_test.py | Adds a manual “integration test” style script (currently placed under tests). |
| tools/marker_discovery/tests/debug_wrappers.py | Adds a local debugging helper script for wrapper detection. |
| report.md | Sample/generated marker inventory report output. |
| openspec/config.yaml | Adds OpenSpec config scaffold. |
| openspec/changes/marker-discovery/.openspec.yaml | OpenSpec metadata for the change. |
| openspec/changes/marker-discovery/proposal.md | Proposal/spec for the marker discovery tool. |
| openspec/changes/marker-discovery/design.md | Design doc describing architecture and behavior. |
| openspec/changes/marker-discovery/tasks.md | Task breakdown/checklist for implementing the tool. |
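The duplicate detection mentioned for report_generator.py can be sketched as a frequency count over marker names. This is a hedged illustration only: `find_duplicate_markers` and the dict keys are hypothetical, not the tool's actual API.

```python
from collections import Counter

def find_duplicate_markers(markers):
    """Return marker names recorded more than once, sorted alphabetically.

    Hypothetical helper; the real report generator's structure may differ.
    """
    counts = Counter(m["marker_name"] for m in markers)
    return sorted(name for name, n in counts.items() if n > 1)

# Sample records shaped after the report's "Marker Name" / "Component" columns.
markers = [
    {"marker_name": "WIFI_INFO_reboot", "component": "OneWifi"},
    {"marker_name": "WIFI_INFO_reboot", "component": "rdkb-halif"},
    {"marker_name": "SYS_ERROR_crash", "component": "telemetry"},
]
print(find_duplicate_markers(markers))  # → ['WIFI_INFO_reboot']
```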
```python
repos_to_clone = [r["name"] for r in all_repos if r["name"] in repo_names_to_clone]

cloned = github_client.clone_matching_repos(org, repos_to_clone, branch, temp_dir)
return cloned, len(all_repos)
```
```python
def clone_matching_repos(org, repo_names, branch, temp_dir):
    """Clone a list of repos with branch fallback.

    Args:
        org: GitHub organization name
        repo_names: iterable of repo names to clone
        branch: target branch name
        temp_dir: directory to clone into

    Returns list of dicts: {'name': str, 'path': str, 'branch': str}
    """
    cloned = []
    for repo_name in sorted(repo_names):
        clone_path, actual_branch = clone_repo(org, repo_name, branch, temp_dir)
        if clone_path:
            cloned.append({
                "name": repo_name,
                "path": clone_path,
                "branch": actual_branch,
            })
    logger.info("Cloned %d/%d repos for org %s", len(cloned), len(list(repo_names)), org)
    return cloned
```
```python
def _find_shell_function_for_line(file_content, line_num):
    """Find what shell function a given line belongs to.

    Returns (function_name, positional_arg_index) if the marker at line_num
    uses a positional parameter like $1, $2, or None if not in a function.
    """
    lines = file_content.split('\n')
    if line_num < 1 or line_num > len(lines):
        return None

    # Walk backwards to find function definition
    func_pattern = re.compile(r'^\s*(?:function\s+)?(\w+)\s*\(\s*\)')
    for i in range(line_num - 1, -1, -1):
        line = lines[i]
        m = func_pattern.match(line)
        if m:
            return m.group(1)
        # If we hit another closing brace at column 0, we've left the function
        if line.strip() == '}' and i < line_num - 1:
            return None

    return None
```
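To illustrate the backward-walk heuristic, here is a self-contained copy of the function from the snippet above (renamed without the leading underscore), run on a small sample script; the sample script and marker names are invented for the demo.

```python
import re

def find_shell_function_for_line(file_content, line_num):
    """Self-contained copy of the heuristic for illustration purposes."""
    lines = file_content.split('\n')
    if line_num < 1 or line_num > len(lines):
        return None
    func_pattern = re.compile(r'^\s*(?:function\s+)?(\w+)\s*\(\s*\)')
    for i in range(line_num - 1, -1, -1):
        line = lines[i]
        m = func_pattern.match(line)
        if m:
            return m.group(1)
        # A bare closing brace above the marker means we left the function.
        if line.strip() == '}' and i < line_num - 1:
            return None
    return None

script = """#!/bin/sh
log_marker() {
    t2ValNotify "$1" "$2"
}
t2CountNotify "TOP_LEVEL"
"""
print(find_shell_function_for_line(script, 3))  # → log_marker
print(find_shell_function_for_line(script, 5))  # → None
```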
```python
    file_path: str
    line: int
    api: str
    source_type: str  # "source" | "script" | "patch"
```
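The four visible fields suggest a dataclass along these lines. This is a sketch only: `marker_name` and `component` are assumed from the report's "Marker Name" / "Component" columns, not confirmed field names from the PR.

```python
from dataclasses import dataclass

@dataclass
class MarkerRecord:
    """Sketch of the record shared by scanners and reporting (field names assumed)."""
    marker_name: str
    component: str
    file_path: str
    line: int
    api: str
    source_type: str  # "source" | "script" | "patch"

rec = MarkerRecord(
    "2GRxPackets_split", "OneWifi",
    "scripts/process_monitor_atom.sh", 592,
    "t2ValNotify", "script",
)
print(rec.marker_name)  # → 2GRxPackets_split
```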
```python
def run_fast_path(org, branch, temp_dir):
    """Fast path: use search API to find repos with markers, then clone only those."""
    logger.info("Fast path: searching for marker repos in %s via Search API...", org)

    matching_repos = github_client.search_all_markers_in_org(org)
    if not matching_repos:
        logger.warning("No repos with markers found in %s via search", org)
        return [], 0

    logger.info("Found %d repos with potential markers, cloning...", len(matching_repos))

    # Get clone URLs for matching repos
    all_repos = github_client.list_org_repos(org)
    repo_names_to_clone = {r for r in matching_repos}
    repos_to_clone = [r["name"] for r in all_repos if r["name"] in repo_names_to_clone]

    cloned = github_client.clone_matching_repos(org, repos_to_clone, branch, temp_dir)
    return cloned, len(all_repos)


def run_full_path(org, branch, temp_dir):
    """Full path: list all repos, clone all, scan locally."""
    logger.info("Full path: listing all repos in %s...", org)

    all_repos = github_client.list_org_repos(org)
    repo_names = [r["name"] for r in all_repos]

    logger.info("Cloning %d repos on branch %s...", len(repo_names), branch)
    cloned = github_client.clone_matching_repos(org, repo_names, branch, temp_dir)
    return cloned, len(all_repos)


def scan_cloned_repos(cloned_repos):
    """Run all scanners on cloned repos. Returns list of MarkerRecord."""
    all_markers = []

    for repo_info in cloned_repos:
        repo_path = repo_info["path"]
        repo_name = repo_info["name"]

        logger.info("Scanning %s...", repo_name)

        # C/C++ source scanning (tree-sitter)
        try:
            markers = code_parser.scan_repo(repo_path, repo_name)
            all_markers.extend(markers)
        except Exception as e:
            logger.warning("Code parser failed for %s: %s", repo_name, e)

        # Script scanning
        try:
            markers = script_parser.scan_repo_scripts(repo_path, repo_name)
            all_markers.extend(markers)
        except Exception as e:
            logger.warning("Script parser failed for %s: %s", repo_name, e)

        # Patch scanning
        try:
            markers = patch_parser.scan_repo_patches(repo_path, repo_name)
            all_markers.extend(markers)
        except Exception as e:
            logger.warning("Patch parser failed for %s: %s", repo_name, e)

    return all_markers


def main(argv=None):
    args = parse_args(argv)
    setup_logging(args.verbose)

    temp_dir = github_client.create_temp_dir()
    logger.info("Using temp directory: %s", temp_dir)

    try:
        all_markers = []
        total_repos_scanned = 0

        for org in args.orgs:
            # Choose strategy based on branch
            if args.branch == "develop":
                cloned, total_in_org = run_fast_path(org, args.branch, temp_dir)
            else:
                cloned, total_in_org = run_full_path(org, args.branch, temp_dir)

            total_repos_scanned += total_in_org

            # Scan cloned repos
            markers = scan_cloned_repos(cloned)
            all_markers.extend(markers)

        # Generate report
        report = report_generator.generate_report(
            markers=all_markers,
            branch=args.branch,
            orgs=args.orgs,
            components_scanned=total_repos_scanned,
        )

        # Output
        if args.output:
            with open(args.output, "w") as f:
                f.write(report)
            logger.info("Report written to %s", args.output)
        else:
            print(report)

        logger.info("Done. Found %d markers total.", len(all_markers))
        return 0

    except Exception as e:
        logger.error("Fatal error: %s", e, exc_info=True)
        return 1

    finally:
```
```markdown
# Telemetry Marker Inventory

**Branch**: develop
**Organizations**: rdkcentral
**Generated**: 2026-03-16 06:31:31 UTC

## Summary
- **Total Markers**: 913
- **Static Markers**: 909
- **Dynamic Markers**: 4 (contain shell variables)
- **Components Scanned**: 594
- **Duplicate Markers**: 56 ⚠️

## Marker Inventory
| Marker Name | Component | File Path | Line | API |
|-------------|-----------|-----------|------|-----|
| 2GRxPackets_split | OneWifi | scripts/process_monitor_atom.sh | 592 | t2ValNotify |
```
| """Quick integration test against a real cloned repo.""" | ||
| import sys | ||
| sys.path.insert(0, "/mnt/L2_CONTAINER_SHARED_VOLUME/59001") | ||
|
|
||
| from tools.marker_discovery.code_parser import scan_repo | ||
| from tools.marker_discovery.script_parser import scan_repo_scripts | ||
| from tools.marker_discovery.patch_parser import scan_repo_patches | ||
| from tools.marker_discovery.report_generator import generate_report | ||
|
|
||
| repo_path = "/tmp/test_telemetry" | ||
| repo_name = "telemetry" | ||
|
|
||
| print("=== Code Parser ===") | ||
| code_markers = scan_repo(repo_path, repo_name) | ||
| for m in code_markers[:10]: | ||
| print(f" {m.marker_name:40s} | {m.file_path:50s} | {m.line:5d} | {m.api}") | ||
| print(f" ... Total code markers: {len(code_markers)}") | ||
|
|
||
| print("\n=== Script Parser ===") | ||
| script_markers = scan_repo_scripts(repo_path, repo_name) | ||
| for m in script_markers[:10]: | ||
| print(f" {m.marker_name:40s} | {m.file_path:50s} | {m.line:5d} | {m.api}") | ||
| print(f" ... Total script markers: {len(script_markers)}") | ||
|
|
||
| print("\n=== Patch Parser ===") | ||
| patch_markers = scan_repo_patches(repo_path, repo_name) | ||
| for m in patch_markers[:10]: | ||
| print(f" {m.marker_name:40s} | {m.file_path:50s} | {m.line:5d} | {m.api}") | ||
| print(f" ... Total patch markers: {len(patch_markers)}") | ||
|
|
||
| all_markers = code_markers + script_markers + patch_markers | ||
| print(f"\n=== TOTAL: {len(all_markers)} markers ===") | ||
|
|
||
| print("\n=== Report Preview (first 30 lines) ===") | ||
| report = generate_report(all_markers, "develop", ["rdkcentral"], 1) | ||
| for line in report.split("\n")[:30]: | ||
| print(line) |
```python
def _create_parser():
    """Create a tree-sitter parser with C grammar."""
    parser = Parser(C_LANGUAGE)
```
| """Remove a directory tree, handling stubborn files (e.g. git-lfs).""" | ||
| def _on_error(_func, _path, _exc_info): | ||
| try: | ||
| os.chmod(_path, 0o777) |
**Check failure** — Code scanning / CodeQL: Overly permissive file permissions (High)

Copilot Autofix (AI, about 19 hours ago):
In general, to fix overly permissive chmod calls, replace world-readable/writable modes like 0o777 with the minimum permissions necessary for the operation (here: successful removal of files/directories by the current user). For deletion, the process only needs owner permissions; there is no need to grant access to group or others.
The best targeted fix here is to change os.chmod(_path, 0o777) inside _force_rmtree’s _on_error handler to a more restrictive mask. Since this handler may be called for both files and directories and we only care about enabling deletion by the current process, we can safely use 0o700 (rwx for owner, no permissions for group/others) for all paths. That gives the owner full control to change or delete the object while not granting any rights to other users. No other functionality changes: _on_error still relaxes permissions enough for _func (the removal function) to succeed, and we still swallow any OSError during the cleanup attempt.
Concretely:
- Edit `tools/marker_discovery/github_client.py`.
- In the `_force_rmtree` function's `_on_error` callback (around line 449), change the `os.chmod` mode argument from `0o777` to `0o700`.
- No new imports, methods, or definitions are required.
```diff
@@ -446,7 +446,7 @@
     """Remove a directory tree, handling stubborn files (e.g. git-lfs)."""
     def _on_error(_func, _path, _exc_info):
         try:
-            os.chmod(_path, 0o777)
+            os.chmod(_path, 0o700)
             _func(_path)
         except OSError:
             pass
```
Pull request overview
Adds a Python-based “Marker Discovery Tool” under tools/marker_discovery/ to scan GitHub org repos for T2 telemetry marker instrumentation (C/C++ direct calls + wrapper resolution, script calls, and .patch additions) and emit a consolidated markdown inventory report.
Changes:
- Introduces GitHub org enumeration + shallow cloning workflow and a CLI orchestrator to scan repos and generate a markdown report.
- Adds parsers for C/C++ (tree-sitter), scripts (regex + limited positional arg resolution), and patch files (added-lines-only).
- Adds pytest unit tests and supporting documentation/spec artifacts.
Reviewed changes
Copilot reviewed 22 out of 24 changed files in this pull request and generated 7 comments.
Summary per file:
| File | Description |
|---|---|
| tools/marker_discovery/code_parser.py | tree-sitter-based extraction of t2_event_* markers + wrapper detection/resolution. |
| tools/marker_discovery/github_client.py | GitHub API client + cloning helpers (branch/commit fallback strategies). |
| tools/marker_discovery/marker_scanner.py | CLI entry point/orchestration across orgs or an input manifest. |
| tools/marker_discovery/script_parser.py | Scans non-C/non-binary files for t2ValNotify/t2CountNotify, flags dynamic markers, attempts $1 resolution. |
| tools/marker_discovery/patch_parser.py | Scans .patch files for added-line marker calls (C + script APIs). |
| tools/marker_discovery/report_generator.py | Markdown report generation, sorting, duplicates, dynamic markers, unresolved components. |
| tools/marker_discovery/__init__.py | Defines MarkerRecord dataclass used across scanners/reporting. |
| tools/marker_discovery/__main__.py | Enables python3 -m tools.marker_discovery execution. |
| tools/marker_discovery/requirements.txt | Python dependencies for the scanner (tree-sitter + requests). |
| tools/marker_discovery/README.md | Tool documentation, usage, and design details. |
| tools/marker_discovery/tests/test_code_parser.py | Unit tests for direct call extraction, wrapper detection, wrapper call resolution. |
| tools/marker_discovery/tests/test_script_parser.py | Unit tests for script parsing, dynamic marker classification, positional arg resolution. |
| tools/marker_discovery/tests/test_patch_parser.py | Unit tests for .patch scanning behavior. |
| tools/marker_discovery/tests/test_report_generator.py | Unit tests for markdown report formatting, sorting, duplicates, dynamic/unresolved sections. |
| tools/marker_discovery/tests/integration_test.py | Adds a “quick integration test” script (currently executed at import time). |
| tools/marker_discovery/tests/debug_wrappers.py | Adds a wrapper-debug helper script. |
| tools/marker_discovery/tests/__init__.py | Marks test package. |
| report.md | Checked-in generated report output (sample inventory). |
| report6.md | Checked-in generated report output (sample inventory). |
| openspec/config.yaml | Adds OpenSpec configuration scaffold. |
| openspec/changes/marker-discovery/tasks.md | Spec-driven task breakdown for the feature. |
| openspec/changes/marker-discovery/proposal.md | Proposal/spec for the marker discovery tool. |
| openspec/changes/marker-discovery/design.md | Design document for architecture and module responsibilities. |
| openspec/changes/marker-discovery/.openspec.yaml | OpenSpec metadata for the change. |
```markdown
# Telemetry Marker Inventory

**Branch**: per-component (from versions-e.txt)
**Organizations**: rdkcentral, rdk-e, rdk-common, rdk-gdcs
**Generated**: 2026-03-29 18:30:14 UTC

## Summary
- **Total Markers**: 491
- **Static Markers**: 487
- **Dynamic Markers**: 4 (contain shell variables)
- **Components Scanned**: 196
- **Duplicate Markers**: 43 ⚠️
```
These appear to be generated output reports (large, time-stamped inventories). Committing generated scan artifacts will quickly become stale and inflates the repository/PR diff. Suggest removing these from source control (or moving to a docs/examples location with clear naming) and generating them as build artifacts instead.
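One low-friction way to act on this is to ignore the generated files and document the regeneration command instead (a sketch; the glob and command form are assumptions based on the file names in this PR):

```gitignore
# Generated marker inventory reports -- regenerate on demand with:
#   python3 -m tools.marker_discovery --output report.md
/report*.md
```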
```markdown
# Telemetry Marker Inventory

**Branch**: develop
**Organizations**: rdkcentral
**Generated**: 2026-03-16 06:31:31 UTC

## Summary
- **Total Markers**: 913
- **Static Markers**: 909
- **Dynamic Markers**: 4 (contain shell variables)
- **Components Scanned**: 594
- **Duplicate Markers**: 56 ⚠️
```
This looks like a generated, time-stamped output report rather than source. Keeping it committed will go stale and adds a very large diff/repo footprint. Recommend removing it from version control and generating it on demand (or storing a small, curated example under docs/ if needed).
```python
sys.path.insert(0, "/mnt/L2_CONTAINER_SHARED_VOLUME/59001")

from tools.marker_discovery.code_parser import scan_repo
from tools.marker_discovery.script_parser import scan_repo_scripts
from tools.marker_discovery.patch_parser import scan_repo_patches
from tools.marker_discovery.report_generator import generate_report

repo_path = "/tmp/test_telemetry"
repo_name = "telemetry"

print("=== Code Parser ===")
code_markers = scan_repo(repo_path, repo_name)
for m in code_markers[:10]:
    print(f"  {m.marker_name:40s} | {m.file_path:50s} | {m.line:5d} | {m.api}")
print(f"  ... Total code markers: {len(code_markers)}")

print("\n=== Script Parser ===")
script_markers = scan_repo_scripts(repo_path, repo_name)
for m in script_markers[:10]:
    print(f"  {m.marker_name:40s} | {m.file_path:50s} | {m.line:5d} | {m.api}")
print(f"  ... Total script markers: {len(script_markers)}")

print("\n=== Patch Parser ===")
patch_markers = scan_repo_patches(repo_path, repo_name)
for m in patch_markers[:10]:
    print(f"  {m.marker_name:40s} | {m.file_path:50s} | {m.line:5d} | {m.api}")
print(f"  ... Total patch markers: {len(patch_markers)}")

all_markers = code_markers + script_markers + patch_markers
print(f"\n=== TOTAL: {len(all_markers)} markers ===")

print("\n=== Report Preview (first 30 lines) ===")
report = generate_report(all_markers, "develop", ["rdkcentral"], 1)
for line in report.split("\n")[:30]:
    print(line)
```
This module will be collected by pytest (matches *_test.py) and executes network/FS-dependent code at import time (hardcoded sys.path, /tmp/test_telemetry, print loops). This will break pytest runs in CI. Convert this into a proper test with fixtures and pytest.mark.integration + opt-in skipping, or move it out of tests/ and guard execution with if __name__ == "__main__":.
Suggested change (wrap the module-level code in a guarded `main()`):

```python
def main() -> None:
    # Adjust sys.path so local tools package can be imported when run as a script.
    sys.path.insert(0, "/mnt/L2_CONTAINER_SHARED_VOLUME/59001")

    from tools.marker_discovery.code_parser import scan_repo
    from tools.marker_discovery.script_parser import scan_repo_scripts
    from tools.marker_discovery.patch_parser import scan_repo_patches
    from tools.marker_discovery.report_generator import generate_report

    repo_path = "/tmp/test_telemetry"
    repo_name = "telemetry"

    print("=== Code Parser ===")
    code_markers = scan_repo(repo_path, repo_name)
    for m in code_markers[:10]:
        print(f"  {m.marker_name:40s} | {m.file_path:50s} | {m.line:5d} | {m.api}")
    print(f"  ... Total code markers: {len(code_markers)}")

    print("\n=== Script Parser ===")
    script_markers = scan_repo_scripts(repo_path, repo_name)
    for m in script_markers[:10]:
        print(f"  {m.marker_name:40s} | {m.file_path:50s} | {m.line:5d} | {m.api}")
    print(f"  ... Total script markers: {len(script_markers)}")

    print("\n=== Patch Parser ===")
    patch_markers = scan_repo_patches(repo_path, repo_name)
    for m in patch_markers[:10]:
        print(f"  {m.marker_name:40s} | {m.file_path:50s} | {m.line:5d} | {m.api}")
    print(f"  ... Total patch markers: {len(patch_markers)}")

    all_markers = code_markers + script_markers + patch_markers
    print(f"\n=== TOTAL: {len(all_markers)} markers ===")

    print("\n=== Report Preview (first 30 lines) ===")
    report = generate_report(all_markers, "develop", ["rdkcentral"], 1)
    for line in report.split("\n")[:30]:
        print(line)


if __name__ == "__main__":
    main()
```
```python
sys.path.insert(0, "/mnt/L2_CONTAINER_SHARED_VOLUME/59001")

from tools.marker_discovery.code_parser import detect_wrappers

ws = detect_wrappers("/tmp/test_telemetry")
for w in ws:
    print(w)
```
This helper script hardcodes a container-specific sys.path and a local path (/tmp/test_telemetry). Even if it isn't collected by pytest today, keeping this under tests/ is brittle and encourages non-reproducible debugging paths. Consider moving it to a separate scripts/ or devtools/ directory and/or making the paths configurable via CLI args/env vars.
Suggested change (make the paths configurable via CLI args/env vars):

```python
import os
import argparse


def main() -> None:
    parser = argparse.ArgumentParser(
        description="Debug helper for listing marker discovery wrappers."
    )
    parser.add_argument(
        "--code-path",
        dest="code_path",
        default=os.environ.get("TELEMETRY_CODE_PATH"),
        help=(
            "Optional path to prepend to sys.path so that "
            "tools.marker_discovery can be imported "
            "(default: value of TELEMETRY_CODE_PATH, if set)."
        ),
    )
    parser.add_argument(
        "--workspace",
        dest="workspace",
        default=os.environ.get("TELEMETRY_WORKSPACE", "/tmp/test_telemetry"),
        help=(
            "Path to the telemetry workspace to scan for wrappers "
            "(default: TELEMETRY_WORKSPACE env var or /tmp/test_telemetry)."
        ),
    )
    args = parser.parse_args()

    if args.code_path:
        sys.path.insert(0, args.code_path)

    from tools.marker_discovery.code_parser import detect_wrappers

    ws = detect_wrappers(args.workspace)
    for w in ws:
        print(w)


if __name__ == "__main__":
    main()
```
```python
    file_path: str
    line: int
    api: str
    source_type: str  # "source" | "script" | "patch"
```
MarkerRecord.source_type is documented as only "source" | "script" | "patch", but the implementation/reporting also uses "script_dynamic". Update this comment/type annotation to reflect the actual allowed values to avoid confusing callers and test writers.
```diff
-    source_type: str  # "source" | "script" | "patch"
+    source_type: str  # "source" | "script" | "patch" | "script_dynamic"
```
```python
    Returns (function_name, positional_arg_index) if the marker at line_num
    uses a positional parameter like $1, $2, or None if not in a function.
```
The docstring for _find_shell_function_for_line says it returns (function_name, positional_arg_index) but the function actually returns only the function name (or None). Please correct the docstring to match behavior (or adjust the function to return the documented tuple).
```diff
-    Returns (function_name, positional_arg_index) if the marker at line_num
-    uses a positional parameter like $1, $2, or None if not in a function.
+    Returns the name of the enclosing shell function as a string, or None if
+    the line is not inside a function definition.
```
```python
def clone_matching_repos(org, repo_names, branch, temp_dir):
    """Clone a list of repos with branch fallback.

    Args:
        org: GitHub organization name
        repo_names: iterable of repo names to clone
        branch: target branch name
        temp_dir: directory to clone into

    Returns list of dicts: {'name': str, 'path': str, 'branch': str}
    """
    cloned = []
    for repo_name in sorted(repo_names):
        clone_path, actual_branch = clone_repo(org, repo_name, branch, temp_dir)
        if clone_path:
            cloned.append({
                "name": repo_name,
                "path": clone_path,
                "branch": actual_branch,
            })
    logger.info("Cloned %d/%d repos for org %s", len(cloned), len(list(repo_names)), org)
    return cloned
```
clone_matching_repos logs the total repo count using len(list(repo_names)), which can (a) consume an iterator/generator and report an incorrect total, and (b) allocate an extra list unnecessarily. Compute the total once up front (e.g., repo_names = list(repo_names) at entry) and reuse it for both sorting and logging.
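The hazard is easy to demonstrate: a generator is exhausted by its first traversal, so a later `len(list(...))` reports zero. A generic sketch of the materialize-once fix (function and variable names here are illustrative, not the module's actual code):

```python
def clone_names(repo_names):
    """Mimics the loop-then-log pattern: iterate once, then measure length."""
    repo_names = list(repo_names)   # materialize once, up front
    cloned = [name for name in sorted(repo_names)]
    total = len(repo_names)         # correct even if the input was a generator
    return cloned, total

cloned, total = clone_names(n for n in ["telemetry", "OneWifi"])
print(total)  # → 2

# Without materializing: the generator is consumed by the first pass.
gen = (n for n in ["telemetry", "OneWifi"])
list(gen)
print(len(list(gen)))  # → 0
```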
```markdown
# Telemetry Marker Inventory

**Branch**: per-component (from versions-e.txt)
**Organizations**: rdkcentral, rdk-e, rdk-common, rdk-gdcs
**Generated**: 2026-03-29 18:30:14 UTC

## Summary
- **Total Markers**: 491
- **Static Markers**: 487
- **Dynamic Markers**: 4 (contain shell variables)
- **Components Scanned**: 196
- **Duplicate Markers**: 43 ⚠️

## Marker Inventory
| Marker Name | Component | File Path | Line | API |
|-------------|-----------|-----------|------|-----|
| 5GclientMac_split | telemetry | source/testApp/testCommonLibApi.c | 73 | t2_event_s |
```
This looks like a generated output report committed into the repo. It also does not match the current report_generator output structure (e.g., this file uses a single "## Marker Inventory" section, while the generator emits "## Unique Marker Inventory" and "## Detailed Marker Inventory").
Suggestion: avoid committing generated reports to source control; if you want a sample, keep a minimal example under docs/ or README and ensure it is produced by the current generator to prevent it from going stale.
```python
_REQUIRED = ["tree_sitter", "tree_sitter_c", "requests"]


def _ensure_deps():
    missing = []
    for mod in _REQUIRED:
        try:
            importlib.import_module(mod)
        except ImportError:
            missing.append(mod)
    if missing:
        req = str(__import__("pathlib").Path(__file__).parent / "requirements.txt")
        print(f"Installing missing dependencies: {', '.join(missing)}", file=sys.stderr)
        subprocess.check_call([sys.executable, "-m", "pip", "install", "-r", req])


_ensure_deps()
```
main.py automatically runs pip install at import time when dependencies are missing. This has significant operational/security implications (unexpected network access, mutating the runtime environment, and surprising behavior in CI or production images).
Suggestion: fail fast with a clear error message instructing users to install requirements, or gate auto-install behind an explicit CLI flag / environment variable.
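A fail-fast variant might look like the sketch below. `missing_deps` and `ensure_deps` are hypothetical names, not the module's actual API; `importlib.util.find_spec` avoids importing the modules just to probe for them.

```python
import importlib.util
import sys

REQUIRED = ["tree_sitter", "tree_sitter_c", "requests"]

def missing_deps(required):
    """Return the subset of module names that cannot be found."""
    return [m for m in required if importlib.util.find_spec(m) is None]

def ensure_deps(required=REQUIRED):
    """Exit with a clear message instead of mutating the environment."""
    missing = missing_deps(required)
    if missing:
        sys.exit(
            "Missing dependencies: " + ", ".join(missing)
            + ". Install them with:\n"
            "  pip install -r tools/marker_discovery/requirements.txt"
        )

print(missing_deps(["json"]))  # → []
```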
```python
_GITHUB_LINE_PATTERN = re.compile(
    r"^(?:b')?"                      # optional b' prefix
    r"(?:"
    r"https?://github\.com/"         # HTTPS URL
    r"|"
    r"ssh://(?:git@)?github\.com/"   # SSH URL (with optional git@ user)
    r")"
    r"([^/]+)/"                      # org
    r"([^@'.\s]+)"                   # repo name
    r"(?:\.git)?"                    # optional .git suffix
    r"'?"                            # optional closing quote
    r"@(\S*)"                        # @ref (branch/tag/sha, may be empty)
    r"\s*:\s*"                       # separator
    r"(\S+)"                         # commit SHA or checksum
```
The versions.txt parser regex disallows '.' in repo names (r"([^@'.\s]+)"). GitHub repo names can include dots, so those entries would be silently skipped.
Suggestion: relax the repo-name character class to allow '.' (and rely on the '@' delimiter / optional .git suffix to bound the repo name).
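A sketch of the relaxed pattern: same overall shape, with the repo-name class widened to allow dots and made lazy so the optional `.git` suffix is still stripped. This is illustrative, not necessarily the PR's final regex.

```python
import re

GITHUB_LINE = re.compile(
    r"^(?:b')?"
    r"(?:https?://github\.com/|ssh://(?:git@)?github\.com/)"
    r"([^/]+)/"        # org
    r"([^@'\s]+?)"     # repo name (dots allowed; lazy so ".git" below can match)
    r"(?:\.git)?"      # optional .git suffix
    r"'?"
    r"@(\S*)"          # @ref (branch/tag/sha, may be empty)
    r"\s*:\s*"
    r"(\S+)"           # commit SHA or checksum
)

m = GITHUB_LINE.match("https://github.com/rdkcentral/meta.layer.git@develop : deadbeef")
print(m.groups())  # → ('rdkcentral', 'meta.layer', 'develop', 'deadbeef')
```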
shibu-kv left a comment:

LGTM. Will defer the merges to mainline for now to support other critical releases.
No description provided.