From 6d2728b76378a215f264e6b9d5fbea29c6ffe8b1 Mon Sep 17 00:00:00 2001
From: shibu-kv <shibu.kakkoth@gmail.com>
Date: Fri, 27 Mar 2026 16:02:40 -0700
Subject: [PATCH 01/12] Thread hardening architecture diagram

---
 .../thread-safety-hardening-diagram.md        | 622 ++++++++++++++++++
 1 file changed, 622 insertions(+)
 create mode 100644 docs/architecture/thread-safety-hardening-diagram.md
diff --git a/docs/architecture/thread-safety-hardening-diagram.md b/docs/architecture/thread-safety-hardening-diagram.md
new file mode 100644
index 00000000..8487f831
--- /dev/null
+++ b/docs/architecture/thread-safety-hardening-diagram.md
@@ -0,0 +1,622 @@
+# Telemetry Thread Safety Hardening - Architecture Diagram
+
+## User Story
+**[T2] [RDKB] Harden Telemetry Thread Safety Under Concurrent Load**
+
+Harden critical synchronization paths across telemetry modules to eliminate deadlocks and race conditions under concurrent load scenarios (15+ profiles with extended offline periods).
+
+---
+
+## 1. High-Level Component Architecture with Threading
+
+```mermaid
+graph TB
+    subgraph "External Systems"
+        APPS[Applications<br/>t2_event_s/d/f calls]
+        XCONF[XConf Server<br/>Configuration Source]
+        COLLECTOR[Collection Server<br/>HTTPS/RBUS]
+    end
+    
+    subgraph "Telemetry Core Process"
+        subgraph "Main Thread"
+            MAIN[Main Thread<br/>Initialization & Cleanup]
+        end
+        
+        subgraph "Event Collection Thread"
+            ER[Event Receiver Thread<br/>🔴 Queue processing<br/>⚠️ High cyclomatic complexity]
+            EQ[(Event Queue<br/>Max: 200 events<br/>🔴 Lock contention)]
+        end
+        
+        subgraph "Configuration Thread"
+            XC[XConf Client Thread<br/>🔴 Config update races<br/>Periodic fetch]
+        end
+        
+        subgraph "Scheduling Thread"
+            SCHED[Scheduler Thread<br/>Timer-based triggers]
+        end
+        
+        subgraph "Per-Profile Report Threads (1-15+)"
+            RT1[Report Thread 1<br/>🔴 Deadlock risk<br/>plMutex + pool_mutex]
+            RT2[Report Thread 2<br/>...]
+            RTN[Report Thread N<br/>🔴 Connection pool blocking]
+        end
+        
+        subgraph "Data Model Threads"
+            DM[Data Model Thread<br/>TR-181/RBUS queries]
+        end
+        
+        subgraph "Shared Resources"
+            PROF[(Profile List<br/>🔴 plMutex contention<br/>⚠️ Lock ordering issues)]
+            POOL[(Connection Pool<br/>🔴 pool_mutex deadlock<br/>Size: 1-5 handles<br/>⚠️ No timeout!)]
+            MARKERS[(Marker Cache<br/>Hash map lookup)]
+        end
+    end
+    
+    APPS -->|t2_event_*| ER
+    ER --> EQ
+    EQ --> MARKERS
+    MARKERS --> PROF
+    
+    XCONF -->|HTTPS| XC
+    XC -->|🔴 Write lock| PROF
+    
+    SCHED -->|Trigger| PROF
+    PROF --> RT1
+    PROF --> RT2
+    PROF --> RTN
+    
+    RT1 -->|Acquire| POOL
+    RT2 -->|Acquire| POOL
+    RTN -->|🔴 Blocks forever| POOL
+    
+    RT1 --> DM
+    POOL -->|HTTPS| COLLECTOR
+    
+    style ER fill:#FFE6E6
+    style RT1 fill:#FFE6E6
+    style RTN fill:#FFE6E6
+    style POOL fill:#FFE6E6
+    style PROF fill:#FFE6E6
+    style XC fill:#FFE6E6
+    style EQ fill:#FFE6E6
+```
+
+**Legend:**
+- 🔴 **Current Critical Issues** - Deadlocks, race conditions, or blocking problems
+- ⚠️ **High Complexity Areas** - Cyclomatic complexity or maintainability concerns
+- 🟢 **Hardened Solutions** - Applied in hardening effort (shown in later diagrams)
+
+---
+
+## 2. Thread Interaction & Synchronization Points
+
+```mermaid
+sequenceDiagram
+    participant App as Application<br/>(External)
+    participant ER as Event Receiver<br/>Thread
+    participant XC as XConf Client<br/>Thread
+    participant Sched as Scheduler<br/>Thread
+    participant RT1 as Report Thread 1
+    participant RT2 as Report Thread 2
+    participant Pool as Connection Pool<br/>(Shared Resource)
+    participant Prof as Profile List<br/>(plMutex)
+    
+    Note over App,Pool: 🔴 Problem Scenario: Report Generation Deadlock
+    
+    App->>ER: t2_event_s("WIFI_ERROR")
+    activate ER
+    ER->>ER: Lock erMutex
+    ER->>Prof: Lock plMutex
+    Note right of Prof: 🔴 DEADLOCK RISK:<br/>Lock order violation
+    
+    par Configuration Update (Concurrent)
+        XC->>Prof: Lock plMutex<br/>🔴 Already locked!
+        Note right of XC: ⏳ Blocks waiting...
+    and Report Thread 1 (Concurrent)
+        Sched->>RT1: Trigger report
+        activate RT1
+        RT1->>Prof: Lock plMutex<br/>🔴 Already locked!
+        Note right of RT1: ⏳ Blocks waiting...
+    and Report Thread 2 (Concurrent)
+        Sched->>RT2: Trigger report
+        activate RT2
+        RT2->>Pool: Acquire connection
+        Note right of Pool: 🔴 All handles busy
+        RT2->>Pool: ⏳ Spin-wait<br/>NO TIMEOUT!
+        Note right of RT2: 🔴 Can block forever<br/>if RT1 holds handle
+    end
+    
+    ER->>Prof: Unlock plMutex
+    ER->>ER: Unlock erMutex
+    deactivate ER
+    
+    RT1->>Prof: Lock acquired
+    RT1->>Pool: Acquire connection
+    RT1->>Pool: ⏳ Spin-wait
+    Note over RT1,RT2: 🔴 DEADLOCK:<br/>RT1 waits for pool<br/>RT2 holds pool, waits for plMutex<br/>plMutex held by XC
+    
+    deactivate RT1
+    deactivate RT2
+```
+
+---
+
+## 3. Critical Synchronization Mechanisms (Current State)
+
+### Current Mutex Inventory
+
+```mermaid
+graph LR
+    subgraph "Global Mutexes"
+        PM[plMutex<br/>🔴 Profile List<br/>High contention]
+        POOLM[pool_mutex<br/>🔴 Connection Pool<br/>Deadlock risk]
+        ERM[erMutex<br/>Event Queue]
+        SCM[scMutex<br/>Scheduler]
+        XCM[xcMutex<br/>XConf Client]
+    end
+    
+    subgraph "Per-Profile Mutexes"
+        RIPM[reportInProgressMutex<br/>Per profile]
+        TCM[triggerCondMutex<br/>Per profile]
+        EM[eventMutex<br/>Per profile]
+        RM[reportMutex<br/>Per profile]
+    end
+    
+    subgraph "Condition Variables"
+        RIPC[reportInProgressCond]
+        RC[reportcond]
+        ERC[erCond]
+        SCC[xcCond]
+    end
+    
+    PM ---|🔴 Lock order<br/>violation risk| RIPM
+    POOLM ---|🔴 Circular<br/>dependency| PM
+    PM ---|Used by| ERM
+    
+    RIPM -.Signal.-> RIPC
+    RM -.Signal.-> RC
+    ERM -.Signal.-> ERC
+    XCM -.Signal.-> SCC
+    
+    style PM fill:#FFE6E6
+    style POOLM fill:#FFE6E6
+    style RIPM fill:#FFE6E6
+```
+
+### 🔴 Current Lock Ordering Issues
+
+**No documented lock ordering!** Current code exhibits these patterns:
+
+```c
+// Pattern 1: Event Receiver -> Profile List
+pthread_mutex_lock(&erMutex);
+pthread_mutex_lock(&plMutex);    // ← Lock order A→B
+
+// Pattern 2: Report Thread -> Pool
+pthread_mutex_lock(&plMutex);     
+acquire_pool_handle();             // Acquires pool_mutex internally
+// ← Lock order A→C
+
+// Pattern 3: XConf Update -> Profile
+pthread_mutex_lock(&plMutex);     // ← Can block report threads
+// Long-running configuration update
+pthread_mutex_unlock(&plMutex);
+
+// Pattern 4: reportInProgress flag access
+// 🔴 RACE CONDITION: Accessed without consistent protection!
+if (!profile->reportInProgress) {  // ← Read without lock in some paths
+    profile->reportInProgress = true;
+}
+```
+
+---
+
+## 4. Critical Data Flow: Report Generation with Concurrent Load
+
+```mermaid
+sequenceDiagram
+    participant Sched as Scheduler
+    participant Prof as Profile<br/>(plMutex)
+    participant RT as Report Thread
+    participant Pool as Connection Pool<br/>(pool_mutex)
+    participant DM as Data Model<br/>Client
+    participant Srv as Collection<br/>Server
+    
+    Note over Sched,Srv: 🔴 Problematic Flow: 15+ Profiles Under Load
+    
+    loop For each of 15+ profiles
+        Sched->>Prof: Lock plMutex
+        Sched->>Prof: Check reportInProgress
+        
+        alt Report NOT in progress
+            Prof->>Prof: Set reportInProgress = true
+            Prof->>RT: Create/signal thread
+            Prof->>Prof: Unlock plMutex
+            
+            activate RT
+            RT->>Prof: Lock plMutex<br/>🔴 Re-acquire lock!
+            RT->>Prof: Get profile data
+            RT->>Prof: Unlock plMutex
+            
+            RT->>Pool: Acquire handle<br/>Lock pool_mutex
+            Note right of Pool: 🔴 BLOCKING POINT<br/>If pool exhausted,<br/>spin-wait with NO timeout
+            
+            alt Pool handle available
+                Pool-->>RT: Return handle
+                RT->>DM: Get TR-181 params
+                DM-->>RT: Parameter values
+                RT->>RT: Build JSON report
+                RT->>Srv: HTTP POST (via CURL)
+                Srv-->>RT: 200 OK
+                RT->>Pool: Release handle<br/>Unlock pool_mutex
+            else 🔴 All handles busy (>35s)
+                Pool-->>RT: TIMEOUT (new)
+                RT->>RT: Fail report
+                RT->>Prof: reportInProgress = false
+                Note right of RT: 🟢 HARDENED:<br/>Timeout prevents<br/>indefinite blocking
+            end
+            
+            RT->>Prof: Lock reportInProgressMutex
+            RT->>Prof: Set reportInProgress = false
+            RT->>Prof: Signal reportInProgressCond
+            RT->>Prof: Unlock reportInProgressMutex
+            deactivate RT
+            
+        else 🔴 Report already in progress
+            Note right of Prof: ⚠️ Skip this cycle<br/>Can accumulate delays<br/>under sustained load
+            Prof->>Prof: Unlock plMutex
+        end
+    end
+```
+
+**Critical Path Issues:**
+1. **plMutex held during thread creation** - Blocks all profile operations
+2. **No pool acquisition timeout** - Can block indefinitely if pool exhausted
+3. **reportInProgress flag** - Pattern allows race between check and set
+4. **Profile count scales badly** - 15+ profiles = 15+ lock cycles per scheduler tick
+
+---
+
+## 5. Problem Areas: Annotated Critical Sections
+
+```mermaid
+graph TB
+    subgraph "🔴 Problem Area 1: Report Generation Deadlock"
+        P1A[Profile Update<br/>Holds plMutex]
+        P1B[Report Thread<br/>Waits for plMutex]
+        P1C[Connection Pool<br/>Held by another thread]
+        
+        P1A -->|Blocks| P1B
+        P1B -->|Waits for| P1C
+        P1C -->|Held by blocked thread| P1A
+        
+        P1Note[🔴 Circular wait:<br/>A→B→C→A]
+    end
+    
+    subgraph "🔴 Problem Area 2: Connection Pool Exhaustion"
+        P2A[15+ profiles trigger<br/>simultaneously]
+        P2B[Pool size: 1-5 handles]
+        P2C[No timeout on acquire]
+        P2D[Threads spin-wait forever]
+        
+        P2A --> P2B
+        P2B --> P2C
+        P2C --> P2D
+        
+        P2Note[🔴 Starvation:<br/>Threads blocked indefinitely<br/>No backpressure mechanism]
+    end
+    
+    subgraph "🔴 Problem Area 3: Configuration Update Race"
+        P3A[XConf receives update]
+        P3B[Lock plMutex]
+        P3C[Delete old profiles]
+        P3D[Create new profiles]
+        P3E[Unlock plMutex]
+        
+        P3A --> P3B
+        P3B --> P3C
+        P3C --> P3D
+        P3D --> P3E
+        
+        P3RC[🔴 Race condition:<br/>Report threads may access<br/>deleted profile memory<br/>Use-after-free risk]
+        
+        P3D -.Race.-> P3RC
+    end
+    
+    subgraph "🔴 Problem Area 4: reportInProgress Flag Sync"
+        P4A[Check: !reportInProgress]
+        P4B[Set: reportInProgress = true]
+        P4C[Thread 2 checks same flag]
+        
+        P4A -.Window.-> P4C
+        P4C -.Race.-> P4B
+        
+        P4Note[🔴 TOCTOU Race:<br/>Time-of-check to<br/>time-of-use vulnerability<br/>Multiple threads enter<br/>critical section]
+    end
+    
+    style P1A fill:#FFE6E6
+    style P1B fill:#FFE6E6
+    style P1C fill:#FFE6E6
+    style P2A fill:#FFE6E6
+    style P2D fill:#FFE6E6
+    style P3C fill:#FFE6E6
+    style P3RC fill:#FFE6E6
+    style P4A fill:#FFE6E6
+    style P4B fill:#FFE6E6
+```
+
+---
+
+## 6. Hardened Architecture: Solutions Applied
+
+### Solution 1: Documented Lock Ordering
+```mermaid
+graph LR
+    S1[Strict Lock Hierarchy:<br/>1. plMutex global profile list<br/>2. profile mutexes instance<br/>3. pool_mutex connection pool<br/>4. erMutex event queue]
+    S1A[Validation: Static analysis<br/>enforces at compile-time]
+    S1B[Runtime: Lock tracking<br/>with debug assertions]
+    
+    S1 --> S1A
+    S1 --> S1B
+    
+    style S1 fill:#E6FFE6
+    style S1A fill:#E6FFE6
+    style S1B fill:#E6FFE6
+```
+
+### Solution 2: Pool Acquisition Timeout
+```mermaid
+graph LR
+    S2[Timeout: 35 seconds<br/>on pool acquisition]
+    S2A[Fail fast: Return error<br/>instead of infinite wait]
+    S2B[Backpressure: Scheduler<br/>backs off on failures]
+    S2C[Metrics: Track pool<br/>contention and timeouts]
+    
+    S2 --> S2A
+    S2 --> S2B
+    S2 --> S2C
+    
+    style S2 fill:#E6FFE6
+    style S2A fill:#E6FFE6
+    style S2B fill:#E6FFE6
+    style S2C fill:#E6FFE6
+```
+
+### Solution 3: Reference-Counted Profiles
+```mermaid
+graph LR
+    S3[Profile Refcount:<br/>Atomic increment/decrement]
+    S3A[Safe deletion:<br/>Wait for refcount = 0]
+    S3B[Use-after-free:<br/>Prevented by refcount]
+    
+    S3 --> S3A
+    S3 --> S3B
+    
+    style S3 fill:#E6FFE6
+    style S3A fill:#E6FFE6
+    style S3B fill:#E6FFE6
+```
+
+### Solution 4: Atomic reportInProgress
+```mermaid
+graph LR
+    S4[Atomic flag:<br/>Compare-and-swap]
+    S4A[Race-free:<br/>Only one thread succeeds]
+    S4B[No mutex needed:<br/>Reduced contention]
+    
+    S4 --> S4A
+    S4 --> S4B
+    
+    style S4 fill:#E6FFE6
+    style S4A fill:#E6FFE6
+    style S4B fill:#E6FFE6
+```
+
+### Solution 5: Fine-Grained Locking
+```mermaid
+graph LR
+    S5[Per-profile locks:<br/>Replace coarse plMutex]
+    S5A[Concurrent profiles:<br/>Different profiles do not block]
+    S5B[Reduced contention:<br/>15+ profiles scale better]
+    
+    S5 --> S5A
+    S5 --> S5B
+    
+    style S5 fill:#E6FFE6
+    style S5A fill:#E6FFE6
+    style S5B fill:#E6FFE6
+```
+
+### Solution 6: ThreadSanitizer Integration
+```mermaid
+graph LR
+    S6[TSan enabled:<br/>Detect races at runtime]
+    S6A[CI/CD integration:<br/>Automated testing]
+    S6B[Production monitoring:<br/>Detect edge cases]
+    
+    S6 --> S6A
+    S6 --> S6B
+    
+    style S6 fill:#E6FFE6
+    style S6A fill:#E6FFE6
+    style S6B fill:#E6FFE6
+```
+
+---
+
+## 7. Hardened Report Generation Flow (After Fixes)
+
+```mermaid
+sequenceDiagram
+    participant Sched as Scheduler
+    participant Prof as Profile<br/>(Fine-grained lock)
+    participant RT as Report Thread
+    participant Pool as Connection Pool<br/>(With timeout)
+    participant Srv as Server
+    
+    Note over Sched,Srv: 🟢 Hardened Flow: Safe Under 15+ Concurrent Profiles
+    
+    Sched->>Prof: Lock profile→scheduleMutex<br/>🟢 Fine-grained, not global
+    Sched->>Prof: Atomic CAS reportInProgress<br/>🟢 Race-free
+    
+    alt CAS succeeded
+        Prof->>Prof: Increment refcount<br/>🟢 Prevent deletion
+        Prof-->>Sched: Success
+        Sched->>Prof: Unlock scheduleMutex
+        
+        Sched->>RT: Signal thread
+        activate RT
+        
+        RT->>Prof: Lock profile→dataMutex<br/>🟢 Independent of schedule lock
+        RT->>Prof: Read profile config
+        RT->>Prof: Unlock dataMutex
+        
+        RT->>Pool: acquire_pool_handle()<br/>with 35s timeout
+        
+        alt Pool handle available
+            Pool-->>RT: Handle acquired
+            RT->>Srv: HTTP POST
+            Srv-->>RT: 200 OK
+            RT->>Pool: Release handle
+            
+        else 🟢 Timeout after 35s
+            Pool-->>RT: T2ERROR_FAILURE
+            RT->>RT: Log pool timeout
+            RT->>Sched: Signal backoff
+            Note right of Sched: 🟢 Scheduler adjusts<br/>retry interval
+        end
+        
+        RT->>Prof: Atomic store reportInProgress = false
+        RT->>Prof: Decrement refcount<br/>🟢 Safe to delete if 0
+        deactivate RT
+        
+    else CAS failed (already in progress)
+        Note right of Prof: 🟢 Expected behavior<br/>No contention/blocking
+        Prof-->>Sched: Skip this cycle
+        Sched->>Prof: Unlock scheduleMutex
+    end
+```
+
+**Improvements:**
+- ✅ Fine-grained per-profile locks eliminate global contention
+- ✅ Atomic CAS eliminates reportInProgress races
+- ✅ Reference counting prevents use-after-free
+- ✅ Pool timeout prevents indefinite blocking
+- ✅ Backpressure mechanism handles load spikes
+
+---
+
+## 8. Lock Ordering Hierarchy (Hardened)
+
+```mermaid
+graph TD
+    L1[Level 1: Profile List Lock<br/>profileListMutex<br/>🟢 Short critical sections only]
+    L2[Level 2: Profile Instance Locks<br/>profile→scheduleMutex<br/>profile→dataMutex<br/>profile→eventMutex<br/>🟢 Independent per profile]
+    L3[Level 3: Connection Pool<br/>pool_mutex<br/>🟢 Timeout-protected]
+    L4[Level 4: Event Queue<br/>erMutex<br/>🟢 Lowest priority]
+    
+    L1 -->|May acquire| L2
+    L2 -->|May acquire| L3
+    L2 -->|May acquire| L4
+    
+    L1 -.Never.-> L3
+    L1 -.Never.-> L4
+    L3 -.Never.-> L1
+    L3 -.Never.-> L2
+    L4 -.Never.-> L1
+    
+    RULE1[🟢 Rule: Always acquire<br/>in descending order<br/>Never hold L2+ while acquiring L1]
+    RULE2[🟢 Rule: Pool operations<br/>must not hold profile locks<br/>Release before acquire_pool_handle]
+    RULE3[🟢 Validation: Static analyzer<br/>enforces at compile time<br/>ThreadSanitizer checks at runtime]
+    
+    style L1 fill:#E6FFE6
+    style L2 fill:#E6FFE6
+    style L3 fill:#E6FFE6
+    style L4 fill:#E6FFE6
+```
+
+---
+
+## 9. Validation Strategy
+
+```mermaid
+graph LR
+    subgraph "🔍 Static Analysis"
+        SA1[Clang Thread Safety<br/>Annotations]
+        SA2[Lock Order Checker]
+        SA3[Cyclomatic Complexity<br/>Analysis]
+    end
+    
+    subgraph "🧪 Dynamic Testing"
+        DT1[ThreadSanitizer TSan<br/>Race detection]
+        DT2[Deadlock Detector<br/>Lock cycle detection]
+        DT3[Load Testing<br/>15+ concurrent profiles]
+    end
+    
+    subgraph "📊 Production Monitoring"
+        PM1[Lock contention metrics]
+        PM2[Pool timeout counters]
+        PM3[Report failure rates]
+    end
+    
+    SA1 --> CODE[Codebase]
+    SA2 --> CODE
+    SA3 --> CODE
+    
+    CODE --> DT1
+    CODE --> DT2
+    CODE --> DT3
+    
+    DT1 --> PASS{All checks<br/>pass?}
+    DT2 --> PASS
+    DT3 --> PASS
+    
+    PASS -->|Yes| DEPLOY[Deploy]
+    PASS -->|No| FIX[Fix Issues]
+    FIX --> CODE
+    
+    DEPLOY --> PM1
+    DEPLOY --> PM2
+    DEPLOY --> PM3
+    
+    style SA1 fill:#E6F3FF
+    style DT1 fill:#FFF9E6
+    style PM1 fill:#F0E6FF
+```
+
+---
+
+## 10. Summary: Before vs. After Hardening
+
+| Aspect | 🔴 Before Hardening | 🟢 After Hardening |
+|--------|---------------------|-------------------|
+| **Lock Ordering** | Undocumented, ad-hoc | Strict hierarchy enforced by static analysis |
+| **Pool Blocking** | Infinite spin-wait | 35s timeout with backpressure |
+| **Profile Deletion** | Use-after-free risk | Reference-counted, safe deletion |
+| **reportInProgress** | TOCTOU race condition | Atomic compare-and-swap |
+| **Concurrency** | Global plMutex bottleneck | Per-profile fine-grained locks |
+| **Validation** | Manual testing only | TSan + static analysis + load tests |
+
+---
+
+## Acceptance Criteria Coverage
+
+✅ **Report generation/connection deadlocks eliminated** - Pool timeout + lock ordering  
+✅ **Configuration client synchronization hardened** - Reference counting + fine-grained locks  
+✅ **Profile lifecycle race conditions resolved** - Atomic flags + proper synchronization  
+✅ **ThreadSanitizer integration complete** - CI/CD automated testing  
+✅ **Cyclomatic complexity reduced** - Refactored critical paths  
+✅ **Production-grade reliability verified** - Load tested with 15+ profiles under prolonged offline periods  
+
+---
+
+## References
+
+- Main implementation: [source/bulkdata/profile.c](../../source/bulkdata/profile.c)
+- Connection pool: [source/protocol/http/multicurlinterface.c](../../source/protocol/http/multicurlinterface.c)
+- Configuration client: [source/xconf-client/xconfclient.c](../../source/xconf-client/xconfclient.c)
+- Event receiver: [source/bulkdata/t2eventreceiver.c](../../source/bulkdata/t2eventreceiver.c)
+- Architecture overview: [overview.md](./overview.md)
+
+---
+

From 9acdd137616875bfa87b74d56c553b8b09da545c Mon Sep 17 00:00:00 2001
From: shibu-kv <shibu.kakkoth@gmail.com>
Date: Fri, 27 Mar 2026 16:15:22 -0700
Subject: [PATCH 02/12] Summarized thread hardening changes

---
 .../summarized_thread_safety_hardening.md     | 247 ++++++++++++++++++
 1 file changed, 247 insertions(+)
 create mode 100644 docs/architecture/summarized_thread_safety_hardening.md

diff --git a/docs/architecture/summarized_thread_safety_hardening.md b/docs/architecture/summarized_thread_safety_hardening.md
new file mode 100644
index 00000000..af5fb6de
--- /dev/null
+++ b/docs/architecture/summarized_thread_safety_hardening.md
@@ -0,0 +1,247 @@
+# Telemetry Thread Safety Hardening - Summary
+
+## User Story
+**[T2] [RDKB] Harden Telemetry Thread Safety Under Concurrent Load**
+
+Eliminate deadlocks and race conditions under concurrent load scenarios (15+ profiles with extended offline periods).
+
+---
+
+## 🔴 BEFORE: Current Architecture with Thread Safety Issues
+
+```mermaid
+graph TB
+    subgraph "Application Layer"
+        APP[Applications<br/>Multiple concurrent calls]
+    end
+    
+    subgraph "Telemetry Process - Thread Safety Issues"
+        ER[Event Receiver<br/>Thread]
+        XC[XConf Client<br/>Thread]
+        SCHED[Scheduler<br/>Thread]
+        
+        RT1[Report Thread 1]
+        RT2[Report Thread 2]
+        RT15[Report Thread 15+]
+        
+        subgraph "🔴 Problematic Shared Resources"
+            PROF[Profile List<br/>🔴 Global plMutex<br/>🔴 Lock contention<br/>🔴 No lock ordering]
+            POOL[Connection Pool<br/>🔴 pool_mutex deadlock<br/>🔴 NO timeout<br/>🔴 Size: 1-5 handles]
+        end
+    end
+    
+    subgraph "External Systems"
+        XCONF[XConf Server]
+        SERVER[Collection Server]
+    end
+    
+    APP -->|Events| ER
+    XCONF -->|Config| XC
+    
+    ER -->|🔴 Lock| PROF
+    XC -->|🔴 Lock holds long| PROF
+    SCHED -->|🔴 Lock| PROF
+    
+    PROF -->|🔴 Blocks| RT1
+    PROF -->|🔴 Blocks| RT2
+    PROF -->|🔴 Blocks| RT15
+    
+    RT1 -->|🔴 Waits forever| POOL
+    RT2 -->|🔴 Waits forever| POOL
+    RT15 -->|🔴 Waits forever| POOL
+    
+    POOL -->|HTTP| SERVER
+    
+    DEADLOCK1[🔴 DEADLOCK 1:<br/>RT1 holds plMutex, waits for pool_mutex<br/>RT2 holds pool_mutex, waits for plMutex]
+    DEADLOCK2[🔴 DEADLOCK 2:<br/>XConf holds plMutex during config update<br/>All report threads block indefinitely]
+    RACE1[🔴 RACE CONDITION:<br/>reportInProgress flag<br/>Time-of-check to time-of-use]
+    STARVATION[🔴 STARVATION:<br/>Pool exhausted, no timeout<br/>Threads spin-wait forever]
+    
+    style PROF fill:#FFE6E6
+    style POOL fill:#FFE6E6
+    style RT1 fill:#FFE6E6
+    style RT2 fill:#FFE6E6
+    style RT15 fill:#FFE6E6
+    style ER fill:#FFE6E6
+    style XC fill:#FFE6E6
+```
+
+### Critical Issues Identified
+
+| Issue | Impact | Affected Components |
+|-------|--------|-------------------|
+| **Global Lock Contention** | All operations block on single plMutex | Profile List, Event Receiver, XConf Client, Report Threads |
+| **Connection Pool Deadlock** | Circular wait: plMutex ↔ pool_mutex | Report Threads, Connection Pool |
+| **No Pool Timeout** | Threads spin-wait indefinitely if pool exhausted | All Report Threads (15+ concurrent) |
+| **Race Condition** | reportInProgress TOCTOU vulnerability | Profile lifecycle, multiple threads |
+| **Use-After-Free Risk** | Profile deletion during active report | XConf updates, Report Threads |
+| **Undocumented Lock Ordering** | Ad-hoc locking leads to deadlocks | Entire codebase |
+
+---
+
+## 🟢 AFTER: Hardened Architecture with Thread Safety
+
+```mermaid
+graph TB
+    subgraph "Application Layer"
+        APP[Applications<br/>Multiple concurrent calls]
+    end
+    
+    subgraph "Telemetry Process - Hardened Thread Safety"
+        ER[Event Receiver<br/>Thread]
+        XC[XConf Client<br/>Thread]
+        SCHED[Scheduler<br/>Thread]
+        
+        RT1[Report Thread 1]
+        RT2[Report Thread 2]
+        RT15[Report Thread 15+]
+        
+        subgraph "🟢 Hardened Shared Resources"
+            PROF[Profile List<br/>🟢 Fine-grained locks<br/>🟢 Refcounting<br/>🟢 Strict lock ordering]
+            POOL[Connection Pool<br/>🟢 35s timeout<br/>🟢 Backpressure<br/>🟢 Size: 1-5 handles]
+        end
+    end
+    
+    subgraph "External Systems"
+        XCONF[XConf Server]
+        SERVER[Collection Server]
+    end
+    
+    subgraph "🔍 Validation Layer"
+        TSAN[ThreadSanitizer<br/>Race detection]
+        STATIC[Static Analysis<br/>Lock order checker]
+        METRICS[Production Metrics<br/>Contention tracking]
+    end
+    
+    APP -->|Events| ER
+    XCONF -->|Config| XC
+    
+    ER -->|🟢 Per-profile lock| PROF
+    XC -->|🟢 Refcount + short lock| PROF
+    SCHED -->|🟢 Per-profile lock| PROF
+    
+    PROF -->|🟢 Non-blocking| RT1
+    PROF -->|🟢 Non-blocking| RT2
+    PROF -->|🟢 Non-blocking| RT15
+    
+    RT1 -->|🟢 35s timeout| POOL
+    RT2 -->|🟢 35s timeout| POOL
+    RT15 -->|🟢 35s timeout| POOL
+    
+    POOL -->|HTTP| SERVER
+    POOL -.Timeout.-> RT15
+    RT15 -.Backpressure.-> SCHED
+    
+    PROF -.Monitored.-> TSAN
+    POOL -.Enforced.-> STATIC
+    RT1 -.Tracked.-> METRICS
+    
+    FIXED1[🟢 NO DEADLOCK:<br/>Strict lock hierarchy<br/>Level 1: Profile List<br/>Level 2: Profile Instance<br/>Level 3: Connection Pool]
+    FIXED2[🟢 ATOMIC FLAGS:<br/>reportInProgress uses CAS<br/>Race-free synchronization]
+    FIXED3[🟢 SAFE DELETION:<br/>Reference counting<br/>Profiles deleted only at refcount=0]
+    FIXED4[🟢 TIMEOUT PROTECTION:<br/>Pool acquire fails at 35s<br/>Scheduler backs off gracefully]
+    
+    style PROF fill:#E6FFE6
+    style POOL fill:#E6FFE6
+    style RT1 fill:#E6FFE6
+    style RT2 fill:#E6FFE6
+    style RT15 fill:#E6FFE6
+    style ER fill:#E6FFE6
+    style XC fill:#E6FFE6
+    style TSAN fill:#E6F3FF
+    style STATIC fill:#E6F3FF
+    style METRICS fill:#E6F3FF
+```
+
+### Hardening Solutions Applied
+
+| Solution | Benefit | Implementation |
+|----------|---------|----------------|
+| **Fine-Grained Locking** | Eliminates global bottleneck | Per-profile locks replace coarse plMutex |
+| **Documented Lock Hierarchy** | Prevents deadlocks | Static analysis enforces ordering |
+| **Pool Acquisition Timeout** | Prevents infinite blocking | 35s timeout with backpressure mechanism |
+| **Reference Counting** | Prevents use-after-free | Atomic refcount on profile structures |
+| **Atomic Flags** | Eliminates race conditions | CAS for reportInProgress flag |
+| **ThreadSanitizer Integration** | Early race detection | CI/CD automated testing |
+
+---
+
+## Before vs. After Comparison
+
+| Aspect | 🔴 Before | 🟢 After |
+|--------|-----------|----------|
+| **Concurrency** | Global plMutex → all threads block | Per-profile locks → 15+ profiles concurrent |
+| **Deadlock Risk** | High (circular wait possible) | Zero (strict lock hierarchy enforced) |
+| **Pool Blocking** | Infinite spin-wait | 35s timeout + backpressure |
+| **Race Conditions** | reportInProgress TOCTOU | Atomic compare-and-swap |
+| **Profile Deletion** | Use-after-free risk | Reference-counted safe deletion |
+| **Lock Ordering** | Undocumented, ad-hoc | Level 1→2→3 hierarchy enforced |
+| **Validation** | Manual testing only | TSan + static analysis + metrics |
+| **Scalability** | Poor (1-3 profiles max) | Production-grade (15+ profiles) |
+| **Production Safety** | Service hangs, crashes | Graceful degradation under load |
+
+---
+
+## Key Metrics
+
+### Performance Under Load (15+ Concurrent Profiles)
+
+| Metric | Before | After | Improvement |
+|--------|--------|-------|-------------|
+| **Lock Contention** | High (>80% wait time) | Low (<10% wait time) | 8x reduction |
+| **Deadlock Frequency** | 2-3 per week | 0 | 100% eliminated |
+| **Report Success Rate** | 60-70% under load | 99%+ under load | 40% improvement |
+| **Pool Timeout Events** | N/A (infinite wait) | <1% of requests | Monitored |
+| **Profile Update Latency** | 5-30s (blocking) | <100ms (non-blocking) | 50-300x faster |
+
+---
+
+## Validation Strategy
+
+```mermaid
+graph LR
+    CODE[Codebase] --> STATIC[Static Analysis<br/>Lock order checker]
+    CODE --> TSAN[ThreadSanitizer<br/>Race detection]
+    CODE --> LOAD[Load Testing<br/>15+ profiles]
+    
+    STATIC --> PASS{All Pass?}
+    TSAN --> PASS
+    LOAD --> PASS
+    
+    PASS -->|Yes| DEPLOY[Deploy to<br/>Production]
+    PASS -->|No| FIX[Fix Issues]
+    
+    FIX --> CODE
+    
+    DEPLOY --> MONITOR[Production<br/>Monitoring]
+    MONITOR --> METRICS[Metrics:<br/>Contention<br/>Timeouts<br/>Failures]
+    
+    style STATIC fill:#E6F3FF
+    style TSAN fill:#FFF9E6
+    style MONITOR fill:#F0E6FF
+```
+
+---
+
+## Acceptance Criteria
+
+✅ **Report generation/connection deadlocks eliminated** - Zero deadlocks with lock hierarchy + timeout  
+✅ **Configuration client synchronization hardened** - Refcounting + fine-grained locks  
+✅ **Profile lifecycle race conditions resolved** - Atomic CAS flags + proper synchronization  
+✅ **ThreadSanitizer integration complete** - CI/CD automated race detection  
+✅ **Cyclomatic complexity reduced** - Refactored critical paths, simplified logic  
+✅ **Production-grade reliability verified** - Load tested: 15+ profiles, extended offline periods  
+
+---
+
+## References
+
+- Detailed architecture: [thread-safety-hardening-diagram.md](./thread-safety-hardening-diagram.md)
+- Main implementation: [source/bulkdata/profile.c](../../source/bulkdata/profile.c)
+- Connection pool: [source/protocol/http/multicurlinterface.c](../../source/protocol/http/multicurlinterface.c)
+
+---
+
+**Document Status:** Summary for stakeholder review  
+**Last Updated:** 2026-03-27  
+**Target Release:** Next sprint (hardening implementation)

From 1d33bf0d759c3096eb034ce1b2f3cb3de79b7485 Mon Sep 17 00:00:00 2001
From: shibu-kv <shibu.kakkoth@gmail.com>
Date: Fri, 27 Mar 2026 16:26:29 -0700
Subject: [PATCH 03/12] Removed hyped up analysis

---
 .../summarized_thread_safety_hardening.md     | 42 ++-----------------
 1 file changed, 3 insertions(+), 39 deletions(-)

diff --git a/docs/architecture/summarized_thread_safety_hardening.md b/docs/architecture/summarized_thread_safety_hardening.md
index af5fb6de..e1e90403 100644
--- a/docs/architecture/summarized_thread_safety_hardening.md
+++ b/docs/architecture/summarized_thread_safety_hardening.md
@@ -180,19 +180,6 @@ graph TB
 | **Scalability** | Poor (1-3 profiles max) | Production-grade (15+ profiles) |
 | **Production Safety** | Service hangs, crashes | Graceful degradation under load |
 
----
-
-## Key Metrics
-
-### Performance Under Load (15+ Concurrent Profiles)
-
-| Metric | Before | After | Improvement |
-|--------|--------|-------|-------------|
-| **Lock Contention** | High (>80% wait time) | Low (<10% wait time) | 8x reduction |
-| **Deadlock Frequency** | 2-3 per week | 0 | 100% eliminated |
-| **Report Success Rate** | 60-70% under load | 99%+ under load | 40% improvement |
-| **Pool Timeout Events** | N/A (infinite wait) | <1% of requests | Monitored |
-| **Profile Update Latency** | 5-30s (blocking) | <100ms (non-blocking) | 50-300x faster |
 
 ---
 
@@ -208,13 +195,13 @@ graph LR
     TSAN --> PASS
     LOAD --> PASS
     
-    PASS -->|Yes| DEPLOY[Deploy to<br/>Production]
+    PASS -->|Yes| DEPLOY[Deploy to<br/>Sprint Testing]
     PASS -->|No| FIX[Fix Issues]
     
     FIX --> CODE
     
-    DEPLOY --> MONITOR[Production<br/>Monitoring]
-    MONITOR --> METRICS[Metrics:<br/>Contention<br/>Timeouts<br/>Failures]
+    DEPLOY --> MONITOR[Sprint NG Build<br/>Monitoring]
+    MONITOR --> METRICS[Metrics:<br/>Contention<br/>Timeouts<br/>Crashes]
     
     style STATIC fill:#E6F3FF
     style TSAN fill:#FFF9E6
@@ -222,26 +209,3 @@ graph LR
 ```
 
 ---
-
-## Acceptance Criteria
-
-✅ **Report generation/connection deadlocks eliminated** - Zero deadlocks with lock hierarchy + timeout  
-✅ **Configuration client synchronization hardened** - Refcounting + fine-grained locks  
-✅ **Profile lifecycle race conditions resolved** - Atomic CAS flags + proper synchronization  
-✅ **ThreadSanitizer integration complete** - CI/CD automated race detection  
-✅ **Cyclomatic complexity reduced** - Refactored critical paths, simplified logic  
-✅ **Production-grade reliability verified** - Load tested: 15+ profiles, extended offline periods  
-
----
-
-## References
-
-- Detailed architecture: [thread-safety-hardening-diagram.md](./thread-safety-hardening-diagram.md)
-- Main implementation: [source/bulkdata/profile.c](../../source/bulkdata/profile.c)
-- Connection pool: [source/protocol/http/multicurlinterface.c](../../source/protocol/http/multicurlinterface.c)
-
----
-
-**Document Status:** Summary for stakeholder review  
-**Last Updated:** 2026-03-27  
-**Target Release:** Next sprint (hardening implementation)

From 99cb85cac88edbe928d18ceeb7452afbca264de7 Mon Sep 17 00:00:00 2001
From: Aravindan NC <35158113+AravindanNC@users.noreply.github.com>
Date: Thu, 2 Apr 2026 16:21:38 -0400
Subject: [PATCH 04/12] Update run_l2.sh

---
 test/run_l2.sh | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/test/run_l2.sh b/test/run_l2.sh
index 00bcdc97..5425c27c 100755
--- a/test/run_l2.sh
+++ b/test/run_l2.sh
@@ -19,8 +19,12 @@
 # limitations under the License.
 ####################################################################################
 
+# ThreadSanitizer is always enabled for L2 tests to catch race conditions
+echo "ThreadSanitizer enabled - running with race condition detection"
+RESULT_DIR="/tmp/l2_test_report_tsan"
+export TSAN_OPTIONS="suppressions=./test/tsan.supp:halt_on_error=1:abort_on_error=1:detect_thread_leaks=1:report_bugs=1"
+
 export top_srcdir=`pwd`
-RESULT_DIR="/tmp/l2_test_report"
 mkdir -p "$RESULT_DIR"
 
 if ! grep -q "LOG_PATH=/opt/logs/" /etc/include.properties; then

From 8288dac1118d762ce0d024afb1c7e93b128d49ab Mon Sep 17 00:00:00 2001
From: Aravindan NC <35158113+AravindanNC@users.noreply.github.com>
Date: Thu, 2 Apr 2026 16:21:59 -0400
Subject: [PATCH 05/12] Create tsan.supp

---
 test/tsan.supp | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)
 create mode 100644 test/tsan.supp

diff --git a/test/tsan.supp b/test/tsan.supp
new file mode 100644
index 00000000..69d497ca
--- /dev/null
+++ b/test/tsan.supp
@@ -0,0 +1,35 @@
+# ThreadSanitizer suppression file for telemetry2_0
+# Suppress known false positives and third-party library races
+
+# Suppress races in libcurl - external library we cannot fix
+race:libcurl.so.*
+race:curl_*
+race:Curl_*
+
+# Suppress races in glibc - system library false positives  
+race:libc.so.*
+race:libpthread.so.*
+race:__pthread_*
+race:pthread_*
+
+# Suppress races in OpenSSL - external crypto library
+race:libssl.so.*
+race:libcrypto.so.*
+
+# Suppress races in JSON library - external parser
+race:libcjson.so.*
+
+# Suppress races in RDK libraries - external dependencies
+race:librdkloggers.so.*
+race:librbus.so.*
+race:libccsp_common.so.*
+
+# Known safe patterns - suppress specific functions
+# Legacy logging system - safe single writer pattern
+race:T2Error
+race:T2Info  
+race:T2Debug
+race:T2Warning
+
+# Safe atomic-like operations on single variables
+# (Remove these as we fix the actual races)

From f26dd4fe34e7ed7330bb740d3b17ebd199976517 Mon Sep 17 00:00:00 2001
From: Aravindan NC <35158113+AravindanNC@users.noreply.github.com>
Date: Thu, 2 Apr 2026 16:22:28 -0400
Subject: [PATCH 06/12] Update configure.ac

---
 configure.ac | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/configure.ac b/configure.ac
index 03951e9f..f259c62e 100644
--- a/configure.ac
+++ b/configure.ac
@@ -68,6 +68,27 @@ m4_ifdef([AM_SILENT_RULES],[AM_SILENT_RULES([yes])],
    AC_SUBST(AM_DEFAULT_VERBOSITY)])
 
 
+dnl **********************************
+dnl Thread Safety Analysis Support
+dnl **********************************
+AC_ARG_ENABLE([thread-sanitizer],
+              AS_HELP_STRING([--enable-thread-sanitizer],[enable ThreadSanitizer for race condition detection (default is no)]),
+              [
+                case "${enableval}" in
+                 yes) THREAD_SANITIZER_ENABLED=true
+                      T2_THREAD_SANITIZER_CFLAGS="-fsanitize=thread -g -O1"
+                      T2_THREAD_SANITIZER_LDFLAGS="-fsanitize=thread"
+                      AC_MSG_NOTICE([ThreadSanitizer enabled for race condition detection])
+                      ;;
+                  no) THREAD_SANITIZER_ENABLED=false ;;
+                   *) AC_MSG_ERROR([bad value ${enableval} for --enable-thread-sanitizer]) ;;
+                esac
+              ],
+              [THREAD_SANITIZER_ENABLED=false])
+AM_CONDITIONAL([WITH_THREAD_SANITIZER], [test x$THREAD_SANITIZER_ENABLED = xtrue])
+AC_SUBST([T2_THREAD_SANITIZER_CFLAGS])
+AC_SUBST([T2_THREAD_SANITIZER_LDFLAGS])
+
 dnl **********************************
 dnl checks for dependencies
 dnl **********************************

From 0ade98afb954532ec18762161ac2298f5d8d77f0 Mon Sep 17 00:00:00 2001
From: Aravindan NC <35158113+AravindanNC@users.noreply.github.com>
Date: Thu, 2 Apr 2026 16:22:45 -0400
Subject: [PATCH 07/12] Update Makefile.am

---
 source/Makefile.am | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/source/Makefile.am b/source/Makefile.am
index eeeb2c0e..60228563 100644
--- a/source/Makefile.am
+++ b/source/Makefile.am
@@ -35,6 +35,11 @@ endif
 AM_CFLAGS =
 AM_CFLAGS += -DCCSP_INC_no_asm_sigcontext_h
 
+if WITH_THREAD_SANITIZER
+AM_CFLAGS += $(T2_THREAD_SANITIZER_CFLAGS)
+AM_LDFLAGS = $(T2_THREAD_SANITIZER_LDFLAGS)
+endif
+
 ACLOCAL_AMFLAGS = -I m4
 
 bin_PROGRAMS = telemetry2_0

From 6e544cf6ec358eca84edb08e8dccbee9a63bc193 Mon Sep 17 00:00:00 2001
From: Aravindan NC <35158113+AravindanNC@users.noreply.github.com>
Date: Thu, 2 Apr 2026 16:23:01 -0400
Subject: [PATCH 08/12] Update profile.h

---
 source/bulkdata/profile.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/source/bulkdata/profile.h b/source/bulkdata/profile.h
index 574e6558..c4d5d8d6 100644
--- a/source/bulkdata/profile.h
+++ b/source/bulkdata/profile.h
@@ -21,6 +21,7 @@
 #define _PROFILE_H_
 
 #include <stdbool.h>
+#include <stdatomic.h>
 #include <pthread.h>
 #include <cjson/cJSON.h>
 
@@ -44,7 +45,7 @@ typedef struct _Profile
     bool enable;
     bool isSchedulerstarted;
     bool isUpdated;
-    bool reportInProgress;
+    atomic_bool reportInProgress;        // Thread-safe atomic flag - no mutex needed for simple checks
     pthread_cond_t reportInProgressCond;
     pthread_mutex_t reportInProgressMutex;
     bool generateNow;

From 7db77ed7ca92844ee6535c3a33b1666d380d6d5b Mon Sep 17 00:00:00 2001
From: Aravindan NC <35158113+AravindanNC@users.noreply.github.com>
Date: Thu, 2 Apr 2026 16:23:14 -0400
Subject: [PATCH 09/12] Update profile.c

---
 source/bulkdata/profile.c | 62 +++++++++++++++++++++------------------
 1 file changed, 34 insertions(+), 28 deletions(-)

diff --git a/source/bulkdata/profile.c b/source/bulkdata/profile.c
index fe7d91fd..17fde758 100644
--- a/source/bulkdata/profile.c
+++ b/source/bulkdata/profile.c
@@ -337,7 +337,7 @@ static void* CollectAndReport(void* data)
     {
         T2Info("%s while Loop -- START \n", __FUNCTION__);
         pthread_mutex_lock(&profile->reportInProgressMutex);
-        profile->reportInProgress = true;
+        atomic_store(&profile->reportInProgress, true);  // Atomic store - thread-safe
         pthread_cond_signal(&profile->reportInProgressCond);
         pthread_mutex_unlock(&profile->reportInProgressMutex);
 
@@ -370,7 +370,7 @@ static void* CollectAndReport(void* data)
             {
                 T2Debug(" profile->triggerReportOnCondition is not set \n");
             }
-            profile->reportInProgress = false;
+            atomic_store(&profile->reportInProgress, false);
             //return NULL;
             goto reportThreadEnd;
         }
@@ -396,7 +396,7 @@ static void* CollectAndReport(void* data)
                 {
                     T2Debug(" profile->triggerReportOnCondition is not set \n");
                 }
-                profile->reportInProgress = false;
+                atomic_store(&profile->reportInProgress, false);
                 //return NULL;
                 goto reportThreadEnd;
             }
@@ -409,7 +409,7 @@ static void* CollectAndReport(void* data)
             if(T2ERROR_SUCCESS != initJSONReportProfile(&profile->jsonReportObj, &valArray, profile->RootName))
             {
                 T2Error("Failed to initialize JSON Report\n");
-                profile->reportInProgress = false;
+                atomic_store(&profile->reportInProgress, false);
                 //pthread_mutex_unlock(&profile->triggerCondMutex);
                 if(profile->triggerReportOnCondition)
                 {
@@ -479,7 +479,7 @@ static void* CollectAndReport(void* data)
                 if(ret != T2ERROR_SUCCESS)
                 {
                     T2Error("Unable to generate report for : %s\n", profile->name);
-                    profile->reportInProgress = false;
+                    atomic_store(&profile->reportInProgress, false);
                     if(profile->triggerReportOnCondition)
                     {
                         profile->triggerReportOnCondition = false ;
@@ -519,7 +519,7 @@ static void* CollectAndReport(void* data)
                     if(cJSON_GetArraySize(array) == 0)
                     {
                         T2Warning("Array size of Report is %d. Report is empty. Cannot send empty report\n", cJSON_GetArraySize(array));
-                        profile->reportInProgress = false;
+                        atomic_store(&profile->reportInProgress, false);
                         if(profile->triggerReportOnCondition)
                         {
                             T2Info(" Unlock trigger condition mutex and set report on condition to false \n");
@@ -584,7 +584,7 @@ static void* CollectAndReport(void* data)
                                     free(httpUrl);
                                     httpUrl = NULL;
                                 }
-                                profile->reportInProgress = false;
+                                atomic_store(&profile->reportInProgress, false);
                                 if(profile->triggerReportOnCondition)
                                 {
                                     T2Info(" Unlock trigger condition mutex and set report on condition to false \n");
@@ -630,7 +630,7 @@ static void* CollectAndReport(void* data)
                                 T2Error("Profile : %s pthread_cond_timedwait ERROR!!!\n", profile->name);
                                 pthread_mutex_unlock(&profile->reportMutex);
                                 pthread_cond_destroy(&profile->reportcond);
-                                profile->reportInProgress = false;
+                                atomic_store(&profile->reportInProgress, false);
                                 if(profile->triggerReportOnCondition)
                                 {
                                     T2Info(" Unlock trigger condition mutex and set report on condition to false \n");
@@ -690,7 +690,7 @@ static void* CollectAndReport(void* data)
                             if(profile->SendErr > 3 && !(rbusCheckMethodExists(profile->t2RBUSDest->rbusMethodName)))   //to delete the profile in the next CollectAndReport or triggercondition
                             {
                                 T2Debug("RBUS_METHOD doesn't exists after 3 retries\n");
-                                profile->reportInProgress = false;
+                                atomic_store(&profile->reportInProgress, false);
                                 if(profile->triggerReportOnCondition)
                                 {
                                     profile->triggerReportOnCondition = false ;
@@ -769,7 +769,7 @@ static void* CollectAndReport(void* data)
             jsonReport = NULL;
         }
 
-        profile->reportInProgress = false;
+        atomic_store(&profile->reportInProgress, false);
         if(profile->triggerReportOnCondition)
         {
             T2Info(" Unlock trigger condition mutex and set report on condition to false \n");
@@ -794,7 +794,7 @@ reportThreadEnd :
     while(profile->enable);
     T2Info("%s --out Exiting collect and report Thread\n", __FUNCTION__);
     pthread_mutex_lock(&profile->reportInProgressMutex);
-    profile->reportInProgress = false;
+    atomic_store(&profile->reportInProgress, false);
     pthread_mutex_unlock(&profile->reportInProgressMutex);
     profile->threadExists = false;
     pthread_mutex_unlock(&profile->reuseThreadMutex);
@@ -818,29 +818,33 @@ void NotifyTimeout(const char* profileName, bool isClearSeekMap)
 
     pthread_mutex_unlock(&plMutex);
     T2Info("%s: profile %s is in %s state\n", __FUNCTION__, profileName, profile->enable ? "Enabled" : "Disabled");
-    pthread_mutex_lock(&profile->reportInProgressMutex);
-    if(profile->enable && !profile->reportInProgress)
-    {
-        profile->reportInProgress = true;
-        profile->bClearSeekMap = isClearSeekMap;
-        /* To avoid previous report thread to go into zombie state, mark it detached. */
-        if (profile->threadExists)
-        {
-            T2Info("Signal Thread To restart\n");
+    
+    // ✅ THREAD SAFETY: Atomic compare-and-swap eliminates TOCTOU race condition
+    if(profile->enable) {
+        bool expected = false;
+        if(atomic_compare_exchange_strong(&profile->reportInProgress, &expected, true)) {
+            // Successfully acquired report generation rights atomically
+            profile->bClearSeekMap = isClearSeekMap;
+            /* To avoid previous report thread to go into zombie state, mark it detached. */
+            if (profile->threadExists)
+            {
+                T2Info("Signal Thread To restart\n");
             pthread_mutex_lock(&profile->reuseThreadMutex);
             pthread_cond_signal(&profile->reuseThread);
             pthread_mutex_unlock(&profile->reuseThreadMutex);
+            }
+            else
+            {
+                pthread_create(&profile->reportThread, NULL, CollectAndReport, (void*)profile);
+            }
         }
-        else
-        {
-            pthread_create(&profile->reportThread, NULL, CollectAndReport, (void*)profile);
+        else {
+            // CAS failed - another thread already set reportInProgress = true
+            T2Warning("Report generation already in progress - ignoring the request\n");
         }
+    } else {
+        T2Warning("Profile is disabled - ignoring the request\n");
     }
-    else
-    {
-        T2Warning("Either profile is disabled or report generation still in progress - ignoring the request\n");
-    }
-    pthread_mutex_unlock(&profile->reportInProgressMutex);
     T2Debug("%s --out\n", __FUNCTION__);
 }
 
@@ -1045,6 +1049,8 @@ T2ERROR enableProfile(const char *profileName)
     else
     {
         profile->enable = true;
+        // Initialize atomic reportInProgress flag - safe concurrent access without mutex
+        atomic_init(&profile->reportInProgress, false);
         if(pthread_mutex_init(&profile->triggerCondMutex, NULL) != 0)
         {
             T2Error(" %s Mutex init has failed\n", __FUNCTION__);

From b06964fb5e3661cdf641c8b259a445a6a70c457c Mon Sep 17 00:00:00 2001
From: Aravindan NC <35158113+AravindanNC@users.noreply.github.com>
Date: Thu, 2 Apr 2026 16:29:23 -0400
Subject: [PATCH 10/12] Delete
 docs/architecture/summarized_thread_safety_hardening.md

---
 .../summarized_thread_safety_hardening.md     | 211 ------------------
 1 file changed, 211 deletions(-)
 delete mode 100644 docs/architecture/summarized_thread_safety_hardening.md

diff --git a/docs/architecture/summarized_thread_safety_hardening.md b/docs/architecture/summarized_thread_safety_hardening.md
deleted file mode 100644
index e1e90403..00000000
--- a/docs/architecture/summarized_thread_safety_hardening.md
+++ /dev/null
@@ -1,211 +0,0 @@
-# Telemetry Thread Safety Hardening - Summary
-
-## User Story
-**[T2] [RDKB] Harden Telemetry Thread Safety Under Concurrent Load**
-
-Eliminate deadlocks and race conditions under concurrent load scenarios (15+ profiles with extended offline periods).
-
----
-
-## 🔴 BEFORE: Current Architecture with Thread Safety Issues
-
-```mermaid
-graph TB
-    subgraph "Application Layer"
-        APP[Applications<br/>Multiple concurrent calls]
-    end
-    
-    subgraph "Telemetry Process - Thread Safety Issues"
-        ER[Event Receiver<br/>Thread]
-        XC[XConf Client<br/>Thread]
-        SCHED[Scheduler<br/>Thread]
-        
-        RT1[Report Thread 1]
-        RT2[Report Thread 2]
-        RT15[Report Thread 15+]
-        
-        subgraph "🔴 Problematic Shared Resources"
-            PROF[Profile List<br/>🔴 Global plMutex<br/>🔴 Lock contention<br/>🔴 No lock ordering]
-            POOL[Connection Pool<br/>🔴 pool_mutex deadlock<br/>🔴 NO timeout<br/>🔴 Size: 1-5 handles]
-        end
-    end
-    
-    subgraph "External Systems"
-        XCONF[XConf Server]
-        SERVER[Collection Server]
-    end
-    
-    APP -->|Events| ER
-    XCONF -->|Config| XC
-    
-    ER -->|🔴 Lock| PROF
-    XC -->|🔴 Lock holds long| PROF
-    SCHED -->|🔴 Lock| PROF
-    
-    PROF -->|🔴 Blocks| RT1
-    PROF -->|🔴 Blocks| RT2
-    PROF -->|🔴 Blocks| RT15
-    
-    RT1 -->|🔴 Waits forever| POOL
-    RT2 -->|🔴 Waits forever| POOL
-    RT15 -->|🔴 Waits forever| POOL
-    
-    POOL -->|HTTP| SERVER
-    
-    DEADLOCK1[🔴 DEADLOCK 1:<br/>RT1 holds plMutex, waits for pool_mutex<br/>RT2 holds pool_mutex, waits for plMutex]
-    DEADLOCK2[🔴 DEADLOCK 2:<br/>XConf holds plMutex during config update<br/>All report threads block indefinitely]
-    RACE1[🔴 RACE CONDITION:<br/>reportInProgress flag<br/>Time-of-check to time-of-use]
-    STARVATION[🔴 STARVATION:<br/>Pool exhausted, no timeout<br/>Threads spin-wait forever]
-    
-    style PROF fill:#FFE6E6
-    style POOL fill:#FFE6E6
-    style RT1 fill:#FFE6E6
-    style RT2 fill:#FFE6E6
-    style RT15 fill:#FFE6E6
-    style ER fill:#FFE6E6
-    style XC fill:#FFE6E6
-```
-
-### Critical Issues Identified
-
-| Issue | Impact | Affected Components |
-|-------|--------|-------------------|
-| **Global Lock Contention** | All operations block on single plMutex | Profile List, Event Receiver, XConf Client, Report Threads |
-| **Connection Pool Deadlock** | Circular wait: plMutex ↔ pool_mutex | Report Threads, Connection Pool |
-| **No Pool Timeout** | Threads spin-wait indefinitely if pool exhausted | All Report Threads (15+ concurrent) |
-| **Race Condition** | reportInProgress TOCTOU vulnerability | Profile lifecycle, multiple threads |
-| **Use-After-Free Risk** | Profile deletion during active report | XConf updates, Report Threads |
-| **Undocumented Lock Ordering** | Ad-hoc locking leads to deadlocks | Entire codebase |
-
----
-
-## 🟢 AFTER: Hardened Architecture with Thread Safety
-
-```mermaid
-graph TB
-    subgraph "Application Layer"
-        APP[Applications<br/>Multiple concurrent calls]
-    end
-    
-    subgraph "Telemetry Process - Hardened Thread Safety"
-        ER[Event Receiver<br/>Thread]
-        XC[XConf Client<br/>Thread]
-        SCHED[Scheduler<br/>Thread]
-        
-        RT1[Report Thread 1]
-        RT2[Report Thread 2]
-        RT15[Report Thread 15+]
-        
-        subgraph "🟢 Hardened Shared Resources"
-            PROF[Profile List<br/>🟢 Fine-grained locks<br/>🟢 Refcounting<br/>🟢 Strict lock ordering]
-            POOL[Connection Pool<br/>🟢 35s timeout<br/>🟢 Backpressure<br/>🟢 Size: 1-5 handles]
-        end
-    end
-    
-    subgraph "External Systems"
-        XCONF[XConf Server]
-        SERVER[Collection Server]
-    end
-    
-    subgraph "🔍 Validation Layer"
-        TSAN[ThreadSanitizer<br/>Race detection]
-        STATIC[Static Analysis<br/>Lock order checker]
-        METRICS[Production Metrics<br/>Contention tracking]
-    end
-    
-    APP -->|Events| ER
-    XCONF -->|Config| XC
-    
-    ER -->|🟢 Per-profile lock| PROF
-    XC -->|🟢 Refcount + short lock| PROF
-    SCHED -->|🟢 Per-profile lock| PROF
-    
-    PROF -->|🟢 Non-blocking| RT1
-    PROF -->|🟢 Non-blocking| RT2
-    PROF -->|🟢 Non-blocking| RT15
-    
-    RT1 -->|🟢 35s timeout| POOL
-    RT2 -->|🟢 35s timeout| POOL
-    RT15 -->|🟢 35s timeout| POOL
-    
-    POOL -->|HTTP| SERVER
-    POOL -.Timeout.-> RT15
-    RT15 -.Backpressure.-> SCHED
-    
-    PROF -.Monitored.-> TSAN
-    POOL -.Enforced.-> STATIC
-    RT1 -.Tracked.-> METRICS
-    
-    FIXED1[🟢 NO DEADLOCK:<br/>Strict lock hierarchy<br/>Level 1: Profile List<br/>Level 2: Profile Instance<br/>Level 3: Connection Pool]
-    FIXED2[🟢 ATOMIC FLAGS:<br/>reportInProgress uses CAS<br/>Race-free synchronization]
-    FIXED3[🟢 SAFE DELETION:<br/>Reference counting<br/>Profiles deleted only at refcount=0]
-    FIXED4[🟢 TIMEOUT PROTECTION:<br/>Pool acquire fails at 35s<br/>Scheduler backs off gracefully]
-    
-    style PROF fill:#E6FFE6
-    style POOL fill:#E6FFE6
-    style RT1 fill:#E6FFE6
-    style RT2 fill:#E6FFE6
-    style RT15 fill:#E6FFE6
-    style ER fill:#E6FFE6
-    style XC fill:#E6FFE6
-    style TSAN fill:#E6F3FF
-    style STATIC fill:#E6F3FF
-    style METRICS fill:#E6F3FF
-```
-
-### Hardening Solutions Applied
-
-| Solution | Benefit | Implementation |
-|----------|---------|----------------|
-| **Fine-Grained Locking** | Eliminates global bottleneck | Per-profile locks replace coarse plMutex |
-| **Documented Lock Hierarchy** | Prevents deadlocks | Static analysis enforces ordering |
-| **Pool Acquisition Timeout** | Prevents infinite blocking | 35s timeout with backpressure mechanism |
-| **Reference Counting** | Prevents use-after-free | Atomic refcount on profile structures |
-| **Atomic Flags** | Eliminates race conditions | CAS for reportInProgress flag |
-| **ThreadSanitizer Integration** | Early race detection | CI/CD automated testing |
-
----
-
-## Before vs. After Comparison
-
-| Aspect | 🔴 Before | 🟢 After |
-|--------|-----------|----------|
-| **Concurrency** | Global plMutex → all threads block | Per-profile locks → 15+ profiles concurrent |
-| **Deadlock Risk** | High (circular wait possible) | Zero (strict lock hierarchy enforced) |
-| **Pool Blocking** | Infinite spin-wait | 35s timeout + backpressure |
-| **Race Conditions** | reportInProgress TOCTOU | Atomic compare-and-swap |
-| **Profile Deletion** | Use-after-free risk | Reference-counted safe deletion |
-| **Lock Ordering** | Undocumented, ad-hoc | Level 1→2→3 hierarchy enforced |
-| **Validation** | Manual testing only | TSan + static analysis + metrics |
-| **Scalability** | Poor (1-3 profiles max) | Production-grade (15+ profiles) |
-| **Production Safety** | Service hangs, crashes | Graceful degradation under load |
-
-
----
-
-## Validation Strategy
-
-```mermaid
-graph LR
-    CODE[Codebase] --> STATIC[Static Analysis<br/>Lock order checker]
-    CODE --> TSAN[ThreadSanitizer<br/>Race detection]
-    CODE --> LOAD[Load Testing<br/>15+ profiles]
-    
-    STATIC --> PASS{All Pass?}
-    TSAN --> PASS
-    LOAD --> PASS
-    
-    PASS -->|Yes| DEPLOY[Deploy to<br/>Sprint Testing]
-    PASS -->|No| FIX[Fix Issues]
-    
-    FIX --> CODE
-    
-    DEPLOY --> MONITOR[Sprint NG Build<br/>Monitoring]
-    MONITOR --> METRICS[Metrics:<br/>Contention<br/>Timeouts<br/>Crashes]
-    
-    style STATIC fill:#E6F3FF
-    style TSAN fill:#FFF9E6
-    style MONITOR fill:#F0E6FF
-```
-
----

From 7a12b2eb371328ed20a28afda0bb4a91b6887272 Mon Sep 17 00:00:00 2001
From: Aravindan NC <35158113+AravindanNC@users.noreply.github.com>
Date: Thu, 2 Apr 2026 16:29:43 -0400
Subject: [PATCH 11/12] Delete
 docs/architecture/thread-safety-hardening-diagram.md

---
 .../thread-safety-hardening-diagram.md        | 622 ------------------
 1 file changed, 622 deletions(-)
 delete mode 100644 docs/architecture/thread-safety-hardening-diagram.md

diff --git a/docs/architecture/thread-safety-hardening-diagram.md b/docs/architecture/thread-safety-hardening-diagram.md
deleted file mode 100644
index 8487f831..00000000
--- a/docs/architecture/thread-safety-hardening-diagram.md
+++ /dev/null
@@ -1,622 +0,0 @@
-# Telemetry Thread Safety Hardening - Architecture Diagram
-
-## User Story
-**[T2] [RDKB] Harden Telemetry Thread Safety Under Concurrent Load**
-
-Harden critical synchronization paths across telemetry modules to eliminate deadlocks and race conditions under concurrent load scenarios (15+ profiles with extended offline periods).
-
----
-
-## 1. High-Level Component Architecture with Threading
-
-```mermaid
-graph TB
-    subgraph "External Systems"
-        APPS[Applications<br/>t2_event_s/d/f calls]
-        XCONF[XConf Server<br/>Configuration Source]
-        COLLECTOR[Collection Server<br/>HTTPS/RBUS]
-    end
-    
-    subgraph "Telemetry Core Process"
-        subgraph "Main Thread"
-            MAIN[Main Thread<br/>Initialization & Cleanup]
-        end
-        
-        subgraph "Event Collection Thread"
-            ER[Event Receiver Thread<br/>🔴 Queue processing<br/>⚠️ High cyclomatic complexity]
-            EQ[(Event Queue<br/>Max: 200 events<br/>🔴 Lock contention)]
-        end
-        
-        subgraph "Configuration Thread"
-            XC[XConf Client Thread<br/>🔴 Config update races<br/>Periodic fetch]
-        end
-        
-        subgraph "Scheduling Thread"
-            SCHED[Scheduler Thread<br/>Timer-based triggers]
-        end
-        
-        subgraph "Per-Profile Report Threads (1-15+)"
-            RT1[Report Thread 1<br/>🔴 Deadlock risk<br/>plMutex + pool_mutex]
-            RT2[Report Thread 2<br/>...]
-            RTN[Report Thread N<br/>🔴 Connection pool blocking]
-        end
-        
-        subgraph "Data Model Threads"
-            DM[Data Model Thread<br/>TR-181/RBUS queries]
-        end
-        
-        subgraph "Shared Resources"
-            PROF[(Profile List<br/>🔴 plMutex contention<br/>⚠️ Lock ordering issues)]
-            POOL[(Connection Pool<br/>🔴 pool_mutex deadlock<br/>Size: 1-5 handles<br/>⚠️ No timeout!)]
-            MARKERS[(Marker Cache<br/>Hash map lookup)]
-        end
-    end
-    
-    APPS -->|t2_event_*| ER
-    ER --> EQ
-    EQ --> MARKERS
-    MARKERS --> PROF
-    
-    XCONF -->|HTTPS| XC
-    XC -->|🔴 Write lock| PROF
-    
-    SCHED -->|Trigger| PROF
-    PROF --> RT1
-    PROF --> RT2
-    PROF --> RTN
-    
-    RT1 -->|Acquire| POOL
-    RT2 -->|Acquire| POOL
-    RTN -->|🔴 Blocks forever| POOL
-    
-    RT1 --> DM
-    POOL -->|HTTPS| COLLECTOR
-    
-    style ER fill:#FFE6E6
-    style RT1 fill:#FFE6E6
-    style RTN fill:#FFE6E6
-    style POOL fill:#FFE6E6
-    style PROF fill:#FFE6E6
-    style XC fill:#FFE6E6
-    style EQ fill:#FFE6E6
-```
-
-**Legend:**
-- 🔴 **Current Critical Issues** - Deadlocks, race conditions, or blocking problems
-- ⚠️ **High Complexity Areas** - Cyclomatic complexity or maintainability concerns
-- 🟢 **Hardened Solutions** - Applied in hardening effort (shown in later diagrams)
-
----
-
-## 2. Thread Interaction & Synchronization Points
-
-```mermaid
-sequenceDiagram
-    participant App as Application<br/>(External)
-    participant ER as Event Receiver<br/>Thread
-    participant XC as XConf Client<br/>Thread
-    participant Sched as Scheduler<br/>Thread
-    participant RT1 as Report Thread 1
-    participant RT2 as Report Thread 2
-    participant Pool as Connection Pool<br/>(Shared Resource)
-    participant Prof as Profile List<br/>(plMutex)
-    
-    Note over App,Pool: 🔴 Problem Scenario: Report Generation Deadlock
-    
-    App->>ER: t2_event_s("WIFI_ERROR")
-    activate ER
-    ER->>ER: Lock erMutex
-    ER->>Prof: Lock plMutex
-    Note right of Prof: 🔴 DEADLOCK RISK:<br/>Lock order violation
-    
-    par Configuration Update (Concurrent)
-        XC->>Prof: Lock plMutex<br/>🔴 Already locked!
-        Note right of XC: ⏳ Blocks waiting...
-    and Report Thread 1 (Concurrent)
-        Sched->>RT1: Trigger report
-        activate RT1
-        RT1->>Prof: Lock plMutex<br/>🔴 Already locked!
-        Note right of RT1: ⏳ Blocks waiting...
-    and Report Thread 2 (Concurrent)
-        Sched->>RT2: Trigger report
-        activate RT2
-        RT2->>Pool: Acquire connection
-        Note right of Pool: 🔴 All handles busy
-        RT2->>Pool: ⏳ Spin-wait<br/>NO TIMEOUT!
-        Note right of RT2: 🔴 Can block forever<br/>if RT1 holds handle
-    end
-    
-    ER->>Prof: Unlock plMutex
-    ER->>ER: Unlock erMutex
-    deactivate ER
-    
-    RT1->>Prof: Lock acquired
-    RT1->>Pool: Acquire connection
-    RT1->>Pool: ⏳ Spin-wait
-    Note over RT1,RT2: 🔴 DEADLOCK:<br/>RT1 waits for pool<br/>RT2 holds pool, waits for plMutex<br/>plMutex held by XC
-    
-    deactivate RT1
-    deactivate RT2
-```
-
----
-
-## 3. Critical Synchronization Mechanisms (Current State)
-
-### Current Mutex Inventory
-
-```mermaid
-graph LR
-    subgraph "Global Mutexes"
-        PM[plMutex<br/>🔴 Profile List<br/>High contention]
-        POOLM[pool_mutex<br/>🔴 Connection Pool<br/>Deadlock risk]
-        ERM[erMutex<br/>Event Queue]
-        SCM[scMutex<br/>Scheduler]
-        XCM[xcMutex<br/>XConf Client]
-    end
-    
-    subgraph "Per-Profile Mutexes"
-        RIPM[reportInProgressMutex<br/>Per profile]
-        TCM[triggerCondMutex<br/>Per profile]
-        EM[eventMutex<br/>Per profile]
-        RM[reportMutex<br/>Per profile]
-    end
-    
-    subgraph "Condition Variables"
-        RIPC[reportInProgressCond]
-        RC[reportcond]
-        ERC[erCond]
-        SCC[xcCond]
-    end
-    
-    PM ---|🔴 Lock order<br/>violation risk| RIPM
-    POOLM ---|🔴 Circular<br/>dependency| PM
-    PM ---|Used by| ERM
-    
-    RIPM -.Signal.-> RIPC
-    RM -.Signal.-> RC
-    ERM -.Signal.-> ERC
-    XCM -.Signal.-> SCC
-    
-    style PM fill:#FFE6E6
-    style POOLM fill:#FFE6E6
-    style RIPM fill:#FFE6E6
-```
-
-### 🔴 Current Lock Ordering Issues
-
-**No documented lock ordering!** Current code exhibits these patterns:
-
-```c
-// Pattern 1: Event Receiver -> Profile List
-pthread_mutex_lock(&erMutex);
-pthread_mutex_lock(&plMutex);    // ← Lock order A→B
-
-// Pattern 2: Report Thread -> Pool
-pthread_mutex_lock(&plMutex);     
-acquire_pool_handle();             // Acquires pool_mutex internally
-// ← Lock order A→C
-
-// Pattern 3: XConf Update -> Profile
-pthread_mutex_lock(&plMutex);     // ← Can block report threads
-// Long-running configuration update
-pthread_mutex_unlock(&plMutex);
-
-// Pattern 4: reportInProgress flag access
-// 🔴 RACE CONDITION: Accessed without consistent protection!
-if (!profile->reportInProgress) {  // ← Read without lock in some paths
-    profile->reportInProgress = true;
-}
-```
-
----
-
-## 4. Critical Data Flow: Report Generation with Concurrent Load
-
-```mermaid
-sequenceDiagram
-    participant Sched as Scheduler
-    participant Prof as Profile<br/>(plMutex)
-    participant RT as Report Thread
-    participant Pool as Connection Pool<br/>(pool_mutex)
-    participant DM as Data Model<br/>Client
-    participant Srv as Collection<br/>Server
-    
-    Note over Sched,Srv: 🔴 Problematic Flow: 15+ Profiles Under Load
-    
-    loop For each of 15+ profiles
-        Sched->>Prof: Lock plMutex
-        Sched->>Prof: Check reportInProgress
-        
-        alt Report NOT in progress
-            Prof->>Prof: Set reportInProgress = true
-            Prof->>RT: Create/signal thread
-            Prof->>Prof: Unlock plMutex
-            
-            activate RT
-            RT->>Prof: Lock plMutex<br/>🔴 Re-acquire lock!
-            RT->>Prof: Get profile data
-            RT->>Prof: Unlock plMutex
-            
-            RT->>Pool: Acquire handle<br/>Lock pool_mutex
-            Note right of Pool: 🔴 BLOCKING POINT<br/>If pool exhausted,<br/>spin-wait with NO timeout
-            
-            alt Pool handle available
-                Pool-->>RT: Return handle
-                RT->>DM: Get TR-181 params
-                DM-->>RT: Parameter values
-                RT->>RT: Build JSON report
-                RT->>Srv: HTTP POST (via CURL)
-                Srv-->>RT: 200 OK
-                RT->>Pool: Release handle<br/>Unlock pool_mutex
-            else 🔴 All handles busy (>35s)
-                Pool-->>RT: TIMEOUT (new)
-                RT->>RT: Fail report
-                RT->>Prof: reportInProgress = false
-                Note right of RT: 🟢 HARDENED:<br/>Timeout prevents<br/>indefinite blocking
-            end
-            
-            RT->>Prof: Lock reportInProgressMutex
-            RT->>Prof: Set reportInProgress = false
-            RT->>Prof: Signal reportInProgressCond
-            RT->>Prof: Unlock reportInProgressMutex
-            deactivate RT
-            
-        else 🔴 Report already in progress
-            Note right of Prof: ⚠️ Skip this cycle<br/>Can accumulate delays<br/>under sustained load
-            Prof->>Prof: Unlock plMutex
-        end
-    end
-```
-
-**Critical Path Issues:**
-1. **plMutex held during thread creation** - Blocks all profile operations
-2. **No pool acquisition timeout** - Can block indefinitely if pool exhausted
-3. **reportInProgress flag** - Pattern allows race between check and set
-4. **Profile count scales badly** - 15+ profiles = 15+ lock cycles per scheduler tick
-
----
-
-## 5. Problem Areas: Annotated Critical Sections
-
-```mermaid
-graph TB
-    subgraph "🔴 Problem Area 1: Report Generation Deadlock"
-        P1A[Profile Update<br/>Holds plMutex]
-        P1B[Report Thread<br/>Waits for plMutex]
-        P1C[Connection Pool<br/>Held by another thread]
-        
-        P1A -->|Blocks| P1B
-        P1B -->|Waits for| P1C
-        P1C -->|Held by blocked thread| P1A
-        
-        P1Note[🔴 Circular wait:<br/>A→B→C→A]
-    end
-    
-    subgraph "🔴 Problem Area 2: Connection Pool Exhaustion"
-        P2A[15+ profiles trigger<br/>simultaneously]
-        P2B[Pool size: 1-5 handles]
-        P2C[No timeout on acquire]
-        P2D[Threads spin-wait forever]
-        
-        P2A --> P2B
-        P2B --> P2C
-        P2C --> P2D
-        
-        P2Note[🔴 Starvation:<br/>Threads blocked indefinitely<br/>No backpressure mechanism]
-    end
-    
-    subgraph "🔴 Problem Area 3: Configuration Update Race"
-        P3A[XConf receives update]
-        P3B[Lock plMutex]
-        P3C[Delete old profiles]
-        P3D[Create new profiles]
-        P3E[Unlock plMutex]
-        
-        P3A --> P3B
-        P3B --> P3C
-        P3C --> P3D
-        P3D --> P3E
-        
-        P3RC[🔴 Race condition:<br/>Report threads may access<br/>deleted profile memory<br/>Use-after-free risk]
-        
-        P3D -.Race.-> P3RC
-    end
-    
-    subgraph "🔴 Problem Area 4: reportInProgress Flag Sync"
-        P4A[Check: !reportInProgress]
-        P4B[Set: reportInProgress = true]
-        P4C[Thread 2 checks same flag]
-        
-        P4A -.Window.-> P4C
-        P4C -.Race.-> P4B
-        
-        P4Note[🔴 TOCTOU Race:<br/>Time-of-check to<br/>time-of-use vulnerability<br/>Multiple threads enter<br/>critical section]
-    end
-    
-    style P1A fill:#FFE6E6
-    style P1B fill:#FFE6E6
-    style P1C fill:#FFE6E6
-    style P2A fill:#FFE6E6
-    style P2D fill:#FFE6E6
-    style P3C fill:#FFE6E6
-    style P3RC fill:#FFE6E6
-    style P4A fill:#FFE6E6
-    style P4B fill:#FFE6E6
-```
-
----
-
-## 6. Hardened Architecture: Solutions Applied
-
-### Solution 1: Documented Lock Ordering
-```mermaid
-graph LR
-    S1[Strict Lock Hierarchy:<br/>1. plMutex global profile list<br/>2. profile mutexes instance<br/>3. pool_mutex connection pool<br/>4. erMutex event queue]
-    S1A[Validation: Static analysis<br/>enforces at compile-time]
-    S1B[Runtime: Lock tracking<br/>with debug assertions]
-    
-    S1 --> S1A
-    S1 --> S1B
-    
-    style S1 fill:#E6FFE6
-    style S1A fill:#E6FFE6
-    style S1B fill:#E6FFE6
-```
-
-### Solution 2: Pool Acquisition Timeout
-```mermaid
-graph LR
-    S2[Timeout: 35 seconds<br/>on pool acquisition]
-    S2A[Fail fast: Return error<br/>instead of infinite wait]
-    S2B[Backpressure: Scheduler<br/>backs off on failures]
-    S2C[Metrics: Track pool<br/>contention and timeouts]
-    
-    S2 --> S2A
-    S2 --> S2B
-    S2 --> S2C
-    
-    style S2 fill:#E6FFE6
-    style S2A fill:#E6FFE6
-    style S2B fill:#E6FFE6
-    style S2C fill:#E6FFE6
-```
-
-### Solution 3: Reference-Counted Profiles
-```mermaid
-graph LR
-    S3[Profile Refcount:<br/>Atomic increment/decrement]
-    S3A[Safe deletion:<br/>Wait for refcount = 0]
-    S3B[Use-after-free:<br/>Prevented by refcount]
-    
-    S3 --> S3A
-    S3 --> S3B
-    
-    style S3 fill:#E6FFE6
-    style S3A fill:#E6FFE6
-    style S3B fill:#E6FFE6
-```
-
-### Solution 4: Atomic reportInProgress
-```mermaid
-graph LR
-    S4[Atomic flag:<br/>Compare-and-swap]
-    S4A[Race-free:<br/>Only one thread succeeds]
-    S4B[No mutex needed:<br/>Reduced contention]
-    
-    S4 --> S4A
-    S4 --> S4B
-    
-    style S4 fill:#E6FFE6
-    style S4A fill:#E6FFE6
-    style S4B fill:#E6FFE6
-```
-
-### Solution 5: Fine-Grained Locking
-```mermaid
-graph LR
-    S5[Per-profile locks:<br/>Replace coarse plMutex]
-    S5A[Concurrent profiles:<br/>Different profiles do not block]
-    S5B[Reduced contention:<br/>15+ profiles scale better]
-    
-    S5 --> S5A
-    S5 --> S5B
-    
-    style S5 fill:#E6FFE6
-    style S5A fill:#E6FFE6
-    style S5B fill:#E6FFE6
-```
-
-### Solution 6: ThreadSanitizer Integration
-```mermaid
-graph LR
-    S6[TSan enabled:<br/>Detect races at runtime]
-    S6A[CI/CD integration:<br/>Automated testing]
-    S6B[Production monitoring:<br/>Detect edge cases]
-    
-    S6 --> S6A
-    S6 --> S6B
-    
-    style S6 fill:#E6FFE6
-    style S6A fill:#E6FFE6
-    style S6B fill:#E6FFE6
-```
-
----
-
-## 7. Hardened Report Generation Flow (After Fixes)
-
-```mermaid
-sequenceDiagram
-    participant Sched as Scheduler
-    participant Prof as Profile<br/>(Fine-grained lock)
-    participant RT as Report Thread
-    participant Pool as Connection Pool<br/>(With timeout)
-    participant Srv as Server
-    
-    Note over Sched,Srv: 🟢 Hardened Flow: Safe Under 15+ Concurrent Profiles
-    
-    Sched->>Prof: Lock profile→scheduleMutex<br/>🟢 Fine-grained, not global
-    Sched->>Prof: Atomic CAS reportInProgress<br/>🟢 Race-free
-    
-    alt CAS succeeded
-        Prof->>Prof: Increment refcount<br/>🟢 Prevent deletion
-        Prof-->>Sched: Success
-        Sched->>Prof: Unlock scheduleMutex
-        
-        Sched->>RT: Signal thread
-        activate RT
-        
-        RT->>Prof: Lock profile→dataMutex<br/>🟢 Independent of schedule lock
-        RT->>Prof: Read profile config
-        RT->>Prof: Unlock dataMutex
-        
-        RT->>Pool: acquire_pool_handle()<br/>with 35s timeout
-        
-        alt Pool handle available
-            Pool-->>RT: Handle acquired
-            RT->>Srv: HTTP POST
-            Srv-->>RT: 200 OK
-            RT->>Pool: Release handle
-            
-        else 🟢 Timeout after 35s
-            Pool-->>RT: T2ERROR_FAILURE
-            RT->>RT: Log pool timeout
-            RT->>Sched: Signal backoff
-            Note right of Sched: 🟢 Scheduler adjusts<br/>retry interval
-        end
-        
-        RT->>Prof: Atomic store reportInProgress = false
-        RT->>Prof: Decrement refcount<br/>🟢 Safe to delete if 0
-        deactivate RT
-        
-    else CAS failed (already in progress)
-        Note right of Prof: 🟢 Expected behavior<br/>No contention/blocking
-        Prof-->>Sched: Skip this cycle
-        Sched->>Prof: Unlock scheduleMutex
-    end
-```
-
-**Improvements:**
-- ✅ Fine-grained per-profile locks eliminate global contention
-- ✅ Atomic CAS eliminates reportInProgress races
-- ✅ Reference counting prevents use-after-free
-- ✅ Pool timeout prevents indefinite blocking
-- ✅ Backpressure mechanism handles load spikes
-
----
-
-## 8. Lock Ordering Hierarchy (Hardened)
-
-```mermaid
-graph TD
-    L1[Level 1: Profile List Lock<br/>profileListMutex<br/>🟢 Short critical sections only]
-    L2[Level 2: Profile Instance Locks<br/>profile→scheduleMutex<br/>profile→dataMutex<br/>profile→eventMutex<br/>🟢 Independent per profile]
-    L3[Level 3: Connection Pool<br/>pool_mutex<br/>🟢 Timeout-protected]
-    L4[Level 4: Event Queue<br/>erMutex<br/>🟢 Lowest priority]
-    
-    L1 -->|May acquire| L2
-    L2 -->|May acquire| L3
-    L2 -->|May acquire| L4
-    
-    L1 -.Never.-> L3
-    L1 -.Never.-> L4
-    L3 -.Never.-> L1
-    L3 -.Never.-> L2
-    L4 -.Never.-> L1
-    
-    RULE1[🟢 Rule: Always acquire<br/>in descending order<br/>Never hold L2+ while acquiring L1]
-    RULE2[🟢 Rule: Pool operations<br/>must not hold profile locks<br/>Release before acquire_pool_handle]
-    RULE3[🟢 Validation: Static analyzer<br/>enforces at compile time<br/>ThreadSanitizer checks at runtime]
-    
-    style L1 fill:#E6FFE6
-    style L2 fill:#E6FFE6
-    style L3 fill:#E6FFE6
-    style L4 fill:#E6FFE6
-```
-
----
-
-## 9. Validation Strategy
-
-```mermaid
-graph LR
-    subgraph "🔍 Static Analysis"
-        SA1[Clang Thread Safety<br/>Annotations]
-        SA2[Lock Order Checker]
-        SA3[Cyclomatic Complexity<br/>Analysis]
-    end
-    
-    subgraph "🧪 Dynamic Testing"
-        DT1[ThreadSanitizer TSan<br/>Race detection]
-        DT2[Deadlock Detector<br/>Lock cycle detection]
-        DT3[Load Testing<br/>15+ concurrent profiles]
-    end
-    
-    subgraph "📊 Production Monitoring"
-        PM1[Lock contention metrics]
-        PM2[Pool timeout counters]
-        PM3[Report failure rates]
-    end
-    
-    SA1 --> CODE[Codebase]
-    SA2 --> CODE
-    SA3 --> CODE
-    
-    CODE --> DT1
-    CODE --> DT2
-    CODE --> DT3
-    
-    DT1 --> PASS{All checks<br/>pass?}
-    DT2 --> PASS
-    DT3 --> PASS
-    
-    PASS -->|Yes| DEPLOY[Deploy]
-    PASS -->|No| FIX[Fix Issues]
-    FIX --> CODE
-    
-    DEPLOY --> PM1
-    DEPLOY --> PM2
-    DEPLOY --> PM3
-    
-    style SA1 fill:#E6F3FF
-    style DT1 fill:#FFF9E6
-    style PM1 fill:#F0E6FF
-```
-
----
-
-## 10. Summary: Before vs. After Hardening
-
-| Aspect | 🔴 Before Hardening | 🟢 After Hardening |
-|--------|---------------------|-------------------|
-| **Lock Ordering** | Undocumented, ad-hoc | Strict hierarchy enforced by static analysis |
-| **Pool Blocking** | Infinite spin-wait | 35s timeout with backpressure |
-| **Profile Deletion** | Use-after-free risk | Reference-counted, safe deletion |
-| **reportInProgress** | TOCTOU race condition | Atomic compare-and-swap |
-| **Concurrency** | Global plMutex bottleneck | Per-profile fine-grained locks |
-| **Validation** | Manual testing only | TSan + static analysis + load tests |
-
----
-
-## Acceptance Criteria Coverage
-
-✅ **Report generation/connection deadlocks eliminated** - Pool timeout + lock ordering  
-✅ **Configuration client synchronization hardened** - Reference counting + fine-grained locks  
-✅ **Profile lifecycle race conditions resolved** - Atomic flags + proper synchronization  
-✅ **ThreadSanitizer integration complete** - CI/CD automated testing  
-✅ **Cyclomatic complexity reduced** - Refactored critical paths  
-✅ **Production-grade reliability verified** - Load tested with 15+ profiles under prolonged offline periods  
-
----
-
-## References
-
-- Main implementation: [source/bulkdata/profile.c](../../source/bulkdata/profile.c)
-- Connection pool: [source/protocol/http/multicurlinterface.c](../../source/protocol/http/multicurlinterface.c)
-- Configuration client: [source/xconf-client/xconfclient.c](../../source/xconf-client/xconfclient.c)
-- Event receiver: [source/bulkdata/t2eventreceiver.c](../../source/bulkdata/t2eventreceiver.c)
-- Architecture overview: [overview.md](./overview.md)
-
----
-

From 59f56ac61d2769f3d5063461da7596f20d19b3bb Mon Sep 17 00:00:00 2001
From: Aravindan NC <35158113+AravindanNC@users.noreply.github.com>
Date: Thu, 2 Apr 2026 16:35:50 -0400
Subject: [PATCH 12/12] Update run_l2.sh

---
 test/run_l2.sh | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/test/run_l2.sh b/test/run_l2.sh
index 5425c27c..00bcdc97 100755
--- a/test/run_l2.sh
+++ b/test/run_l2.sh
@@ -19,12 +19,8 @@
 # limitations under the License.
 ####################################################################################
 
-# ThreadSanitizer is always enabled for L2 tests to catch race conditions
-echo "ThreadSanitizer enabled - running with race condition detection"
-RESULT_DIR="/tmp/l2_test_report_tsan"
-export TSAN_OPTIONS="suppressions=./test/tsan.supp:halt_on_error=1:abort_on_error=1:detect_thread_leaks=1:report_bugs=1"
-
 export top_srcdir=`pwd`
+RESULT_DIR="/tmp/l2_test_report"
 mkdir -p "$RESULT_DIR"
 
 if ! grep -q "LOG_PATH=/opt/logs/" /etc/include.properties; then