Skip to content

fix: Refactor the file lock#189

Merged
liulx20 merged 22 commits intoalibaba:mainfrom
liulx20:signal
Apr 15, 2026
Merged

fix: Refactor the file lock#189
liulx20 merged 22 commits intoalibaba:mainfrom
liulx20:signal

Conversation

@liulx20
Copy link
Copy Markdown
Collaborator

@liulx20 liulx20 commented Apr 8, 2026

Fixes

Copy link
Copy Markdown
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

@qodo-code-review
Copy link
Copy Markdown

Review Summary by Qodo

Fix signal handler to only handle graceful termination signals

🐞 Bug fix

Grey Divider

Walkthroughs

Description
• Restrict signal handler to graceful termination signals only
• Remove unsafe crash signal handling (SIGSEGV, SIGABRT, SIGFPE)
• Replace exit() with std::exit() for proper cleanup
• Add clarifying comments about signal handling limitations
Diagram
flowchart LR
  A["Signal Handler Setup"] --> B["Remove Crash Signals"]
  B --> C["Keep SIGINT/SIGTERM/SIGQUIT"]
  C --> D["Use std::exit()"]
  D --> E["Improved Safety"]
Loading

Grey Divider

File Changes

1. tools/python_bind/src/neug_binding.cc 🐞 Bug fix +12/-16

Restrict signal handler to graceful termination signals

• Added #include <cstdlib> for std::exit()
• Restricted signal_handler() to only handle SIGINT, SIGTERM, SIGQUIT
• Removed handling of crash signals (SIGSEGV, SIGABRT, SIGFPE) and SIGKILL
• Changed exit(signal) to std::exit(128 + signal) for proper exit codes
• Simplified control flow by removing else branch
• Updated comments to document signal handling safety considerations
• Removed duplicate SIGKILL registration and SIGHUP handler

tools/python_bind/src/neug_binding.cc


Grey Divider

Qodo Logo

@qodo-code-review
Copy link
Copy Markdown

qodo-code-review bot commented Apr 8, 2026

Code Review by Qodo

🐞 Bugs (2)   📘 Rule violations (0)   📎 Requirement gaps (0)   🎨 UX Issues (0)
🐞\ ☼ Reliability (2)

Grey Divider


Action required

1. Unsafe signal-handler cleanup 🐞
Description
neug::signal_handler() runs logging, filesystem/STL operations, and optional cpptrace backtrace
generation inside an async signal context, which can deadlock or crash when SIGTERM/SIGINT/SIGQUIT
arrives. This also risks data races because CleanupAllLocks() mutates a global std::set that is
concurrently modified by FileLock::lock()/unlock().
Code

tools/python_bind/src/neug_binding.cc[R42-54]

+  // NOTE:
+  // - SIGKILL cannot be handled.
+  // - Avoid handling crash signals (SIGSEGV/SIGABRT/SIGFPE) here because
+  //   complex cleanup/logging in async signal context is unsafe.
+  if (signal == SIGINT || signal == SIGTERM || signal == SIGQUIT) {
    LOG(ERROR) << "Received signal " << signal << ", Remove all filelocks";
    // remove all files in work_dir
    neug::FileLock::CleanupAllLocks();
#ifdef NEUG_BACKTRACE
    cpptrace::generate_trace(1 /*skip this function's frame*/).print();
#endif
-    exit(signal);
-  } else {
-    LOG(ERROR) << "Received unexpected signal " << signal << ", exiting...";
-    exit(1);
+    std::exit(128 + signal);
  }
Evidence
The Python binding’s signal handler directly calls glog and performs lock cleanup/backtrace work.
CleanupAllLocks() iterates and clears a process-global std::set and calls std::filesystem::exists +
std::remove, while the same std::set is mutated during normal execution in lock()/unlock(); invoking
these from a signal handler can deadlock, crash, or exhibit undefined behavior due to reentrancy and
races.

tools/python_bind/src/neug_binding.cc[41-58]
include/neug/main/file_lock.h[37-44]
src/main/file_lock.cc[30-108]
src/main/file_lock.cc[110-123]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`neug::signal_handler()` currently performs non-async-signal-safe operations (glog logging, `std::filesystem`, `std::set` mutation, and optional cpptrace printing). This can deadlock/crash because a signal can interrupt code paths that hold internal locks (e.g., logging), and it can race with normal `FileLock::lock()/unlock()` operations.

### Issue Context
Signal handlers should do only async-signal-safe work (typically set a `volatile sig_atomic_t` / atomic flag and possibly call `_Exit`), and defer complex cleanup to a safe context (main thread, shutdown thread, or normal teardown).

### Fix Focus Areas
- tools/python_bind/src/neug_binding.cc[41-58]
- include/neug/main/file_lock.h[37-44]

### Implementation direction
- Replace handler body with minimal operations:
 - Set a global `volatile sig_atomic_t` (or similar) flag indicating shutdown requested.
 - Optionally write a short message with `::write(2, ...)` (not `LOG`).
 - If you must terminate immediately, prefer `std::_Exit(128 + signal)` / `::_exit(...)` rather than `std::exit()`.
- Perform `FileLock::CleanupAllLocks()` (and any backtrace printing) from a non-signal context (e.g., during orderly shutdown / module finalization / a dedicated shutdown thread that polls the flag).

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools



Remediation recommended

2. No stale-lock auto-recovery 🐞
Description
After removing crash-signal handling, crashes (e.g., SIGABRT from LOG(FATAL)) can leave neugdb.lock
behind, and FileLock::lock() treats any existing lock file as active without checking whether the
owning PID is still alive. This can block reopening the database until the user manually removes the
lock file.
Code

tools/python_bind/src/neug_binding.cc[R60-68]

void setup_signal_handler(bool is_interactive) {
-  // Register handlers for SIGKILL, SIGINT, SIGTERM, SIGSEGV, SIGABRT
-  // LOG(FATAL) cause SIGABRT
+  // Register handlers for graceful termination signals only.
+  // SIGKILL cannot be handled.
  if (is_interactive) {
    std::signal(SIGINT, signal_handler);
  }
  std::signal(SIGTERM, signal_handler);
-  std::signal(SIGKILL, signal_handler);
-  std::signal(SIGSEGV, signal_handler);
-  std::signal(SIGABRT, signal_handler);
-  std::signal(SIGFPE, signal_handler);
  std::signal(SIGQUIT, signal_handler);
-  std::signal(SIGKILL, signal_handler);
-  std::signal(SIGHUP, signal_handler);
}
Evidence
The binding now registers only SIGINT/SIGTERM/SIGQUIT, so SIGABRT/SIGSEGV won’t trigger lock
cleanup. Meanwhile, FileLock writes the PID into the lock file but does not use it to detect stale
locks: it fails/throws when the file exists rather than checking whether that PID is still running
and cleaning up automatically.

tools/python_bind/src/neug_binding.cc[60-68]
src/main/file_lock.cc[34-71]
src/main/file_lock.cc[92-105]
tools/utils/bulk_loader.cc[189-194]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`FileLock::lock()` refuses to proceed if the lock file exists, even if the process that created it has died. Since crashes can leave the lock behind, this can unnecessarily block reopening the DB.

### Issue Context
The lock file content includes a PID (`WRITE:<pid>` / `READ:<pid>`). That PID can be used to detect whether the lock owner is still alive and auto-remove stale locks.

### Fix Focus Areas
- src/main/file_lock.cc[30-71]
- src/main/file_lock.cc[92-105]

### Implementation direction
- When the lock file exists:
 - Read and parse the PID from the file content.
 - Check if the PID is still alive (e.g., on POSIX: `kill(pid, 0)` and inspect errno).
 - If not alive, remove the lock file and retry acquiring the lock.
- Keep the existing “manual removal” error path only when the PID is alive (or PID parsing fails).

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Grey Divider

ⓘ The new review experience is currently in Beta. Learn more

Grey Divider

Qodo Logo

Comment thread tools/python_bind/src/neug_binding.cc Outdated
@liulx20 liulx20 requested a review from zhanglei1949 April 13, 2026 07:11
@liulx20 liulx20 requested a review from luoxiaojian April 14, 2026 07:07
@liulx20 liulx20 changed the title fix: fix the signal processing fix: Refactor the file lock Apr 14, 2026
Comment thread src/main/file_lock.cc Outdated
}
} // namespace neug

bool get_is_interactive() {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lnfjpt Is it ok to remove this code?

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Refactors the database directory locking mechanism to use OS-level file locks and removes prior signal-based cleanup, while updating error handling and related tests.

Changes:

  • Reworked FileLock to use fcntl locks and added in-process lock tracking.
  • Updated lock-related exception messages and adjusted Python tests accordingly.
  • Hardened temp directory cleanup in NeugDB destructor with exception handling.

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
tools/python_bind/tests/test_db_transaction.py Updates assertions to match new lock error message wording.
tools/python_bind/src/neug_binding.cc Removes signal handler cleanup logic tied to previous lock-file behavior.
src/main/neug_db.cc Wraps temp-dir cleanup in try/catch; changes lock acquisition to return a specific locked exception message.
src/main/file_lock.cc Implements fcntl-based locking and in-process tracking; changes lock/unlock semantics and messaging.
include/neug/main/file_lock.h Updates FileLock API/fields; removes global lock cleanup API.
bin/rt_server.cc Removes signal handler cleanup logic tied to previous lock-file behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/main/file_lock.cc
Comment thread src/main/file_lock.cc
Comment thread src/main/file_lock.cc Outdated
Comment thread src/main/file_lock.cc
Comment thread src/main/neug_db.cc
Comment thread tools/python_bind/tests/test_db_transaction.py Outdated
Comment thread tools/python_bind/tests/test_db_transaction.py Outdated
Comment thread tools/python_bind/tests/test_db_transaction.py Outdated
Comment thread src/main/file_lock.cc Outdated
allLockFiles.erase(lock_file_path_);
std::string error_msg;
if (!lock(F_UNLCK, true, error_msg)) {
LOG(WARNING) << "Failed to unlock file lock: " << error_msg;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be LOG(ERROR) ?

Copy link
Copy Markdown
Member

@zhanglei1949 zhanglei1949 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@liulx20 liulx20 requested a review from lnfjpt April 15, 2026 11:32
@liulx20 liulx20 merged commit fb4ed90 into alibaba:main Apr 15, 2026
16 checks passed
@liulx20 liulx20 deleted the signal branch April 15, 2026 11:34
@liulx20 liulx20 linked an issue Apr 16, 2026 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fix the signal processing

4 participants