Skip to content

test PR#309

Open
AravindanNC wants to merge 13 commits intodevelopfrom
topic/telimprov
Open

test PR#309
AravindanNC wants to merge 13 commits intodevelopfrom
topic/telimprov

Conversation

@AravindanNC
Copy link
Copy Markdown

No description provided.

Copilot AI review requested due to automatic review settings April 2, 2026 20:23
@AravindanNC AravindanNC requested a review from a team as a code owner April 2, 2026 20:23
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces ThreadSanitizer (TSan) support for telemetry builds/tests and begins hardening one concurrency hotspot in bulkdata profile reporting by converting reportInProgress into an atomic flag, alongside adding thread-safety architecture documentation.

Changes:

  • Add --enable-thread-sanitizer configure option and wire TSan flags into the build.
  • Update bulkdata Profile to use an atomic reportInProgress flag and use CAS in NotifyTimeout to avoid TOCTOU.
  • Add TSan suppressions, update L2 test runner to export TSAN_OPTIONS, and add thread-safety hardening docs.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 11 comments.

Show a summary per file
File Description
configure.ac Adds --enable-thread-sanitizer configure toggle and exports sanitizer flags.
source/Makefile.am Attempts to apply sanitizer compile/link flags when enabled.
source/bulkdata/profile.h Changes reportInProgress to atomic_bool and includes <stdatomic.h>.
source/bulkdata/profile.c Uses atomic stores and CAS for reportInProgress, adjusts timeout notification logic.
test/tsan.supp Adds a TSan suppression file for known/third-party races.
test/run_l2.sh Exports TSAN_OPTIONS and changes result directory for L2 tests.
docs/architecture/thread-safety-hardening-diagram.md Adds a detailed thread-safety hardening architecture diagram.
docs/architecture/summarized_thread_safety_hardening.md Adds a summarized before/after thread-safety hardening doc.


if WITH_THREAD_SANITIZER
AM_CFLAGS += $(T2_THREAD_SANITIZER_CFLAGS)
AM_LDFLAGS = $(T2_THREAD_SANITIZER_LDFLAGS)
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WITH_THREAD_SANITIZER currently only adds $(T2_THREAD_SANITIZER_CFLAGS) in this directory, so the subdir libraries (bulkdata, scheduler, etc.) are still compiled without TSan instrumentation. That makes the --enable-thread-sanitizer build incomplete and can lead to missed races or inconsistent sanitizer behavior. Consider propagating the sanitizer flags via global CFLAGS/LDFLAGS (in configure.ac) or by appending them in each relevant subdir Makefile.am so all objects in the final binary are built with TSan when enabled.

Suggested change
AM_LDFLAGS = $(T2_THREAD_SANITIZER_LDFLAGS)
CFLAGS += $(T2_THREAD_SANITIZER_CFLAGS)
AM_LDFLAGS += $(T2_THREAD_SANITIZER_LDFLAGS)
LDFLAGS += $(T2_THREAD_SANITIZER_LDFLAGS)

Copilot uses AI. Check for mistakes.

if WITH_THREAD_SANITIZER
AM_CFLAGS += $(T2_THREAD_SANITIZER_CFLAGS)
AM_LDFLAGS = $(T2_THREAD_SANITIZER_LDFLAGS)
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AM_LDFLAGS = $(T2_THREAD_SANITIZER_LDFLAGS) overwrites any existing AM_LDFLAGS/user-provided LDFLAGS. Use an additive form (e.g., +=) or attach the sanitizer flags to the specific program’s link flags to avoid accidentally dropping other required linker flags.

Suggested change
AM_LDFLAGS = $(T2_THREAD_SANITIZER_LDFLAGS)
AM_LDFLAGS += $(T2_THREAD_SANITIZER_LDFLAGS)

Copilot uses AI. Check for mistakes.
Comment on lines +74 to +85
AC_ARG_ENABLE([thread-sanitizer],
AS_HELP_STRING([--enable-thread-sanitizer],[enable ThreadSanitizer for race condition detection (default is no)]),
[
case "${enableval}" in
yes) THREAD_SANITIZER_ENABLED=true
T2_THREAD_SANITIZER_CFLAGS="-fsanitize=thread -g -O1"
T2_THREAD_SANITIZER_LDFLAGS="-fsanitize=thread"
AC_MSG_NOTICE([ThreadSanitizer enabled for race condition detection])
;;
no) THREAD_SANITIZER_ENABLED=false ;;
*) AC_MSG_ERROR([bad value ${enableval} for --enable-thread-sanitizer]) ;;
esac
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When --enable-thread-sanitizer is set, the build unconditionally adds -fsanitize=thread. It’s safer to probe whether the compiler/linker support these flags (especially in cross-compilation toolchains) and fail with a clear message if not supported, rather than producing confusing build errors later.

Copilot uses AI. Check for mistakes.
test/run_l2.sh Outdated
Comment on lines 25 to 27
export TSAN_OPTIONS="suppressions=./test/tsan.supp:halt_on_error=1:abort_on_error=1:detect_thread_leaks=1:report_bugs=1"

export top_srcdir=`pwd`
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TSAN_OPTIONS uses a relative suppression path (./test/tsan.supp) and is exported before top_srcdir is set. If this script is run from a different working directory, TSan won’t find the suppression file. Use an absolute path based on the script location (or set top_srcdir first and reference $top_srcdir/test/tsan.supp).

Suggested change
export TSAN_OPTIONS="suppressions=./test/tsan.supp:halt_on_error=1:abort_on_error=1:detect_thread_leaks=1:report_bugs=1"
export top_srcdir=`pwd`
script_dir=$(CDPATH= cd -- "$(dirname -- "$0")" && pwd)
top_srcdir=$(CDPATH= cd -- "$script_dir/.." && pwd)
export top_srcdir
export TSAN_OPTIONS="suppressions=$top_srcdir/test/tsan.supp:halt_on_error=1:abort_on_error=1:detect_thread_leaks=1:report_bugs=1"

Copilot uses AI. Check for mistakes.
Comment on lines +9 to +13
# Suppress races in glibc - system library false positives
race:libc.so.*
race:libpthread.so.*
race:__pthread_*
race:pthread_*
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The suppressions for libc.so.* / libpthread.so.* and pthread_* are extremely broad and are likely to match many real race reports (most threaded stacks involve libc/pthread). This can mask genuine races in telemetry code. Prefer suppressing specific known false-positive stacks/symbols and document the originating TSan report for each suppression entry.

Suggested change
# Suppress races in glibc - system library false positives
race:libc.so.*
race:libpthread.so.*
race:__pthread_*
race:pthread_*
# Do not add broad glibc/pthread suppressions here.
# Patterns such as libc.so.*, libpthread.so.*, __pthread_* or pthread_*
# can match real telemetry race reports because most threaded stacks include
# libc/pthread frames. Add only narrowly scoped suppressions for specific
# known false positives and document the originating TSan report for each.

Copilot uses AI. Check for mistakes.
Comment on lines 44 to 50
{
bool enable;
bool isSchedulerstarted;
bool isUpdated;
bool reportInProgress;
atomic_bool reportInProgress; // Thread-safe atomic flag - no mutex needed for simple checks
pthread_cond_t reportInProgressCond;
pthread_mutex_t reportInProgressMutex;
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reportInProgress is now an atomic_bool, but Profile instances are commonly zeroed with memset during creation (e.g., in the parser) and some call sites allocate Profile without any explicit initialization. Using atomic operations on an object that hasn’t been initialized with atomic_init/static initializer is undefined behavior and can also be problematic for C++ unit tests that treat Profile as a POD. Consider introducing a single Profile_init() routine (called immediately after allocation) that initializes the atomic and related sync primitives, and avoid blanket memset on structs containing atomics.

Copilot uses AI. Check for mistakes.
Comment on lines +823 to +839
if(profile->enable) {
bool expected = false;
if(atomic_compare_exchange_strong(&profile->reportInProgress, &expected, true)) {
// Successfully acquired report generation rights atomically
profile->bClearSeekMap = isClearSeekMap;
/* To avoid previous report thread to go into zombie state, mark it detached. */
if (profile->threadExists)
{
T2Info("Signal Thread To restart\n");
pthread_mutex_lock(&profile->reuseThreadMutex);
pthread_cond_signal(&profile->reuseThread);
pthread_mutex_unlock(&profile->reuseThreadMutex);
}
else
{
pthread_create(&profile->reportThread, NULL, CollectAndReport, (void*)profile);
}
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NotifyTimeout now relies on an atomic CAS for reportInProgress, but it also reads/writes other shared fields (enable, bClearSeekMap, threadExists) and uses reuseThreadMutex without any broader synchronization. This can still produce data races under TSan and can lead to inconsistent behavior (e.g., seeing stale threadExists). Consider protecting these accesses with an appropriate mutex (e.g., keep using reportInProgressMutex here, or use plMutex/a per-profile mutex) or convert the other concurrently-accessed flags to atomics as well.

Copilot uses AI. Check for mistakes.
{
T2Info("Signal Thread To restart\n");

// ✅ THREAD SAFETY: Atomic compare-and-swap eliminates TOCTOU race condition
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The inline comment includes a non-ASCII checkmark character. Some embedded toolchains / source-processing steps assume ASCII and can fail or produce garbled output. Prefer plain ASCII in C source comments (keep the explanation, drop the emoji).

Suggested change
// THREAD SAFETY: Atomic compare-and-swap eliminates TOCTOU race condition
// THREAD SAFETY: Atomic compare-and-swap eliminates TOCTOU race condition

Copilot uses AI. Check for mistakes.
Comment on lines +836 to 844
else
{
pthread_create(&profile->reportThread, NULL, CollectAndReport, (void*)profile);
}
}
else
{
pthread_create(&profile->reportThread, NULL, CollectAndReport, (void*)profile);
else {
// CAS failed - another thread already set reportInProgress = true
T2Warning("Report generation already in progress - ignoring the request\n");
}
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pthread_create(...) return value is not checked. If thread creation fails, reportInProgress remains true (because the CAS already succeeded) and future timeout notifications will be ignored indefinitely. Capture the return code, log it, and reset reportInProgress back to false on failure.

Copilot uses AI. Check for mistakes.
Copilot AI review requested due to automatic review settings April 2, 2026 20:35
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Comment on lines +821 to +826

// ✅ THREAD SAFETY: Atomic compare-and-swap eliminates TOCTOU race condition
if(profile->enable) {
bool expected = false;
if(atomic_compare_exchange_strong(&profile->reportInProgress, &expected, true)) {
// Successfully acquired report generation rights atomically
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This block introduces non-project-standard formatting and non-ASCII characters in comments (the "✅" marker) alongside K&R-style braces, while surrounding code uses Allman braces. Please align this new code with the existing style and avoid emoji/non-ASCII in source comments to keep logs/tooling consistent across embedded toolchains.

Copilot uses AI. Check for mistakes.
Comment on lines +74 to +89
AC_ARG_ENABLE([thread-sanitizer],
AS_HELP_STRING([--enable-thread-sanitizer],[enable ThreadSanitizer for race condition detection (default is no)]),
[
case "${enableval}" in
yes) THREAD_SANITIZER_ENABLED=true
T2_THREAD_SANITIZER_CFLAGS="-fsanitize=thread -g -O1"
T2_THREAD_SANITIZER_LDFLAGS="-fsanitize=thread"
AC_MSG_NOTICE([ThreadSanitizer enabled for race condition detection])
;;
no) THREAD_SANITIZER_ENABLED=false ;;
*) AC_MSG_ERROR([bad value ${enableval} for --enable-thread-sanitizer]) ;;
esac
],
[THREAD_SANITIZER_ENABLED=false])
AM_CONDITIONAL([WITH_THREAD_SANITIZER], [test x$THREAD_SANITIZER_ENABLED = xtrue])
AC_SUBST([T2_THREAD_SANITIZER_CFLAGS])
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--enable-thread-sanitizer unconditionally injects -fsanitize=thread flags without verifying compiler/linker support. This can break configure/build on toolchains without TSAN (common in cross-compilation). Please add a compile/link check (and a clear error or automatic disable) and consider applying the flags to both C and C++ builds (CFLAGS/CXXFLAGS/LDFLAGS) since the repo’s unit tests are C++.

Suggested change
AC_ARG_ENABLE([thread-sanitizer],
AS_HELP_STRING([--enable-thread-sanitizer],[enable ThreadSanitizer for race condition detection (default is no)]),
[
case "${enableval}" in
yes) THREAD_SANITIZER_ENABLED=true
T2_THREAD_SANITIZER_CFLAGS="-fsanitize=thread -g -O1"
T2_THREAD_SANITIZER_LDFLAGS="-fsanitize=thread"
AC_MSG_NOTICE([ThreadSanitizer enabled for race condition detection])
;;
no) THREAD_SANITIZER_ENABLED=false ;;
*) AC_MSG_ERROR([bad value ${enableval} for --enable-thread-sanitizer]) ;;
esac
],
[THREAD_SANITIZER_ENABLED=false])
AM_CONDITIONAL([WITH_THREAD_SANITIZER], [test x$THREAD_SANITIZER_ENABLED = xtrue])
AC_SUBST([T2_THREAD_SANITIZER_CFLAGS])
T2_THREAD_SANITIZER_CFLAGS=""
T2_THREAD_SANITIZER_CXXFLAGS=""
T2_THREAD_SANITIZER_LDFLAGS=""
AC_ARG_ENABLE([thread-sanitizer],
AS_HELP_STRING([--enable-thread-sanitizer],[enable ThreadSanitizer for race condition detection (default is no)]),
[
case "${enableval}" in
yes)
THREAD_SANITIZER_ENABLED=true
T2_THREAD_SANITIZER_CFLAGS="-fsanitize=thread -g -O1"
T2_THREAD_SANITIZER_CXXFLAGS="-fsanitize=thread -g -O1"
T2_THREAD_SANITIZER_LDFLAGS="-fsanitize=thread"
t2_tsan_saved_CFLAGS="$CFLAGS"
t2_tsan_saved_CXXFLAGS="$CXXFLAGS"
t2_tsan_saved_LDFLAGS="$LDFLAGS"
CFLAGS="$CFLAGS $T2_THREAD_SANITIZER_CFLAGS"
LDFLAGS="$LDFLAGS $T2_THREAD_SANITIZER_LDFLAGS"
AC_LANG_PUSH([C])
AC_MSG_CHECKING([whether the C compiler and linker support ThreadSanitizer])
AC_LINK_IFELSE(
[AC_LANG_PROGRAM([], [return 0;])],
[t2_tsan_c_supported=yes],
[t2_tsan_c_supported=no])
AC_MSG_RESULT([$t2_tsan_c_supported])
AC_LANG_POP([C])
CXXFLAGS="$t2_tsan_saved_CXXFLAGS $T2_THREAD_SANITIZER_CXXFLAGS"
LDFLAGS="$t2_tsan_saved_LDFLAGS $T2_THREAD_SANITIZER_LDFLAGS"
AC_LANG_PUSH([C++])
AC_MSG_CHECKING([whether the C++ compiler and linker support ThreadSanitizer])
AC_LINK_IFELSE(
[AC_LANG_PROGRAM([], [return 0;])],
[t2_tsan_cxx_supported=yes],
[t2_tsan_cxx_supported=no])
AC_MSG_RESULT([$t2_tsan_cxx_supported])
AC_LANG_POP([C++])
CFLAGS="$t2_tsan_saved_CFLAGS"
CXXFLAGS="$t2_tsan_saved_CXXFLAGS"
LDFLAGS="$t2_tsan_saved_LDFLAGS"
AS_IF([test "x$t2_tsan_c_supported" != "xyes" -o "x$t2_tsan_cxx_supported" != "xyes"],
[AC_MSG_ERROR([--enable-thread-sanitizer was requested, but the current compiler/linker toolchain does not support -fsanitize=thread for both C and C++ builds])])
AC_MSG_NOTICE([ThreadSanitizer enabled for race condition detection])
;;
no)
THREAD_SANITIZER_ENABLED=false
;;
*)
AC_MSG_ERROR([bad value ${enableval} for --enable-thread-sanitizer])
;;
esac
],
[THREAD_SANITIZER_ENABLED=false])
AM_CONDITIONAL([WITH_THREAD_SANITIZER], [test x$THREAD_SANITIZER_ENABLED = xtrue])
AC_SUBST([T2_THREAD_SANITIZER_CFLAGS])
AC_SUBST([T2_THREAD_SANITIZER_CXXFLAGS])

Copilot uses AI. Check for mistakes.
Comment on lines +1 to +7
# ThreadSanitizer suppression file for telemetry2_0
# Suppress known false positives and third-party library races

# Suppress races in libcurl - external library we cannot fix
race:libcurl.so.*
race:curl_*
race:Curl_*
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test/tsan.supp is added but is not referenced anywhere in the repository (no TSAN_OPTIONS=suppressions=... in scripts/CI, and no build/test integration). Without wiring, it won’t have any effect. Please document how to use it and/or update the test runner/CI to pass the suppressions file when running TSAN-instrumented binaries.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants