@github-actions
Contributor

Cherry-picked from #59204

#59204)

During the execution of `init_file_cache_factory`, the following call
path is triggered:

```txt
init_file_cache_factory -> FileCacheFactory::create_file_cache -> cache->initialize() -> initialize_unlocked -> _storage->init(this) -> FSFileCacheStorage::init()
```

At this point, a thread named `_cache_background_load_thread` is created, and the remaining calls run within that thread:
`-> upgrade_cache_dir_if_necessary -> read_file_cache_version -> FileSystem::open_file -> open_file_impl -> LocalFileReader::LocalFileReader -> BeConfDataDirReader::get_data_dir_by_file_path`

After `FSFileCacheStorage::init` completes (spawning the
`_cache_background_load_thread`), `ExecEnv::_init` continues to execute
`doris::io::BeConfDataDirReader::init_be_conf_data_dir`. This function
performs push operations on `be_config_data_dir_list`.

Simultaneously, `BeConfDataDirReader::get_data_dir_by_file_path`
(running in the background thread) iterates over this same
`be_config_data_dir_list`. This leads to a race condition: if
`doris::io::BeConfDataDirReader::init_be_conf_data_dir` is inserting
data while the vector is being read, two issues arise:

1. Modifying `be_config_data_dir_list` from one thread while another
thread iterates over it with a range-based for loop, without any
synchronization, is a data race and therefore **Undefined Behavior (UB)**.
2. If an insertion causes `be_config_data_dir_list` to reallocate (grow),
its iterators and element references are invalidated, so concurrent reads
access freed memory, triggering a **heap-use-after-free** error.
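
The reallocation hazard in item 2 can be illustrated without invoking UB. The following is a minimal single-threaded sketch, not Doris code: `DataDirEntry`, the paths, and `push_back_moves_storage` are hypothetical names chosen for illustration. It only compares the storage address before and after a `push_back` that exceeds the capacity, and never dereferences the old address.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for an element of be_config_data_dir_list.
struct DataDirEntry {
    std::string path;
};

// Returns true if a push_back beyond the current capacity moved the vector's
// storage. When that happens, every reference or iterator obtained earlier
// (for example by a range-based for loop in another thread) is dangling.
bool push_back_moves_storage() {
    std::vector<DataDirEntry> dirs;
    dirs.push_back({"/hypothetical/cache/0"});

    // Fill the vector up to its capacity so the next push_back must reallocate.
    while (dirs.size() < dirs.capacity()) {
        dirs.push_back({"/hypothetical/cache/filler"});
    }
    assert(dirs.size() == dirs.capacity());

    // Record the address as an integer; it is compared, never dereferenced.
    const auto before = reinterpret_cast<std::uintptr_t>(dirs.data());
    dirs.push_back({"/hypothetical/cache/extra"});  // exceeds capacity
    const auto after = reinterpret_cast<std::uintptr_t>(dirs.data());

    return before != after;
}
```

A reader that still held a `const DataDirEntry&` into the old buffer at the moment of the last `push_back` would be reading freed memory, which is the failure mode described in item 2.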

Since `init_be_conf_data_dir` depends on the `cache_paths` derived from
`init_file_cache_factory`, it cannot simply be moved ahead of that call;
the initialization order and the synchronization between the two must be
managed carefully to prevent these errors.
@github-actions github-actions bot requested a review from yiguolei as a code owner December 30, 2025 03:35
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@dataroaring dataroaring reopened this Dec 30, 2025
@hello-stephen
Contributor

run buildall

@doris-robot

BE UT Coverage Report

Increment line coverage 36.36% (4/11) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
| --- | --- |
| Function Coverage | 53.36% (18669/34990) |
| Line Coverage | 39.12% (172942/442036) |
| Region Coverage | 33.81% (133699/395477) |
| Branch Coverage | 34.78% (57786/166163) |

@yiguolei yiguolei merged commit 7dcce38 into branch-4.0 Dec 30, 2025
23 of 27 checks passed
@github-actions github-actions bot deleted the auto-pick-59204-branch-4.0 branch December 30, 2025 09:54