linux-kdevops/20250416-ext4-jbd2-bh-migrate-corruption

ext4 jbd2 buffer-head migrate corruption

Filesystems which use buffer-heads but cannot guarantee that there are no other references to the folio, for example by holding the folio lock, must use buffer_migrate_folio_norefs() as the migrate_folio() callback in their address space operations. Only 3 users set this callback:

  1. the block device cache
  2. ext4 for its ext4_journalled_aops, i.e., jbd2
  3. nilfs2

jbd2's use of this callback, however, is very race prone. Consider folio migration while reviewing jbd2_journal_write_metadata_buffer() and the fact that jbd2:

  • does not hold the folio lock
  • does not have the page writeback bit set
  • does not lock the buffer

And so it can race with folio_set_bh() during folio migration. Commit ebdf4de5642fb6 ("mm: migrate: fix reference check race between __find_get_block() and migration") added a spin lock to prevent races with page migration, which ext4 users had been reporting through the SUSE bugzilla as bnc#1137609.
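The race described above can be illustrated with a small, deterministic toy model. This is a sketch in Python, not kernel code: the `Folio`, `BufferHead`, `migrate_norefs` and `simulate` names are hypothetical, and the model only assumes that a reference-checking migration refuses to move a buffer whose reference count is non-zero. It shows what goes wrong when a writer samples the folio behind a buffer without anything that migration can see; it does not prove that jbd2 is exposed in exactly this way.

```python
from dataclasses import dataclass


@dataclass
class Folio:
    data: bytearray


@dataclass
class BufferHead:
    folio: Folio
    b_count: int = 0  # toy stand-in for the buffer reference count


def migrate_norefs(bh: BufferHead) -> bool:
    """Toy migration that refuses to run while the buffer is referenced."""
    if bh.b_count > 0:
        return False  # models a "busy, try again later" result
    new_folio = Folio(bytearray(bh.folio.data))  # copy the old contents
    bh.folio = new_folio  # models repointing the buffer at the new folio
    return True


def simulate(hold_ref: bool, payload: bytes = b"meta"):
    """Interleave one writer with one migration attempt.

    Returns (migrated, write_visible).
    """
    bh = BufferHead(Folio(bytearray(len(payload))))
    if hold_ref:
        bh.b_count += 1
    stale = bh.folio              # writer samples the folio pointer
    migrated = migrate_norefs(bh)  # migration runs in the window
    stale.data[: len(payload)] = payload  # writer stores via its sampled pointer
    if hold_ref:
        bh.b_count -= 1
    write_visible = bytes(bh.folio.data[: len(payload)]) == payload
    return migrated, write_visible


if __name__ == "__main__":
    print("no reference held:", simulate(hold_ref=False))
    print("reference held:   ", simulate(hold_ref=True))
```

In the first case migration succeeds and the write lands in the old folio, so it is lost from the buffer's point of view. In the second case migration backs off and the write stays visible.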

Although we don't have exact traces of the original filesystem corruption, we can reproduce filesystem corruption on ext4 with Linus' tree today at v6.15-rc2, which includes commit ebdf4de5642fb6, by running generic/750 for about 20 hours on the ext4 2k block size filesystem profile.

Reproducing with kdevops

This is easily reproducible with kdevops using:

make defconfig-ext4_2k SOAK_DURATION=432000
make -j128
make bringup
make linux
make fstests
make fstests-baseline TESTS="generic/750"

Traces

See the traces/ directory.

General pattern

We have now collected a slew of traces for the possible ext4 corruptions, and we used ChatGPT to provide a summary of them:

do_writepages() # write back -->
   ext4_map_blocks() # performs logical to physical block mapping -->
     ext4_ext_insert_extent() # updates extent tree -->
       jbd2_journal_dirty_metadata()  # marks metadata as dirty for
                                      # journaling. This can lead
                                      # to any of the following hints
                                      # as to what happened from
                                      # ext4 / jbd2

         - Directory and extent metadata corruption splats

         - Failure to handle out-of-space conditions gracefully, with
           cascading metadata errors and eventual filesystem shutdown
           to prevent further damage.

         - Failure to journal new extent metadata during extent tree
           growth, triggered under memory pressure or heavy writeback.
           Commonly results in ENOSPC, journal abort, and read-only
           fallback.

         - Journal metadata failure during extent tree growth causes
           read-only fallback. Seen repeatedly on small-block (2k)
           filesystems under stress (e.g. fsstress). Triggers errors in
           bitmap and inode updates, and persists in journal replay logs.
           "Error count since last fsck" shows long-term corruption
           footprint.

Call trace (ENOSPC journal failure):
  do_writepages()
    → ext4_do_writepages()
      → ext4_map_blocks()
        → ext4_ext_map_blocks()
          → ext4_ext_insert_extent()
            → __ext4_handle_dirty_metadata()
              → jbd2_journal_dirty_metadata() → ERROR -28 (ENOSPC)
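
When going through many collected logs, a small script can help tally which of the symptoms above show up in each file. The following is a minimal sketch: the `PATTERNS` table and the `scan` helper are hypothetical, and the message strings are assumptions based on common ext4 and jbd2 kernel messages, so they may need adjusting to match the actual files in traces/.

```python
import re
import sys
from collections import Counter

# Assumed message fragments; adjust to match the collected traces.
PATTERNS = {
    "ext4_error": re.compile(r"EXT4-fs error"),
    "journal_abort": re.compile(r"Aborting journal"),
    "readonly_remount": re.compile(r"[Rr]emounting filesystem read-only"),
    "dirty_metadata": re.compile(r"jbd2_journal_dirty_metadata"),
}


def scan(lines):
    """Count how many lines match each symptom pattern."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def main(paths):
    for path in paths:
        # Kernel logs may contain odd bytes, so replace undecodable ones.
        with open(path, errors="replace") as f:
            print(path, dict(scan(f)))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Passing one or more log files as arguments prints a per-file count for each pattern.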

And so jbd2 still needs more work to avoid races with folio migration.

About

ext4 corruption details on v6.15-rc1
