Go to file
Filipe Manana 4c8f353272 btrfs: fix filesystem corruption after a device replace
We use a device's allocation state tree to track ranges in a device used
for allocated chunks, and we set ranges in this tree when allocating a new
chunk. However after a device replace operation, we were not setting the
allocated ranges in the new device's allocation state tree, so that tree
is empty after a device replace.

This means that a fitrim operation after a device replace will trim the
device ranges that have allocated chunks and extents, as we trim every
range for which there is not a range marked in the device's allocation
state tree. It is also important during chunk allocation, since the
device's allocation state is used to determine if a range is already
allocated when allocating a new chunk.

This is trivial to reproduce and the following script triggers the bug:

  $ cat reproducer.sh
  #!/bin/bash

  DEV1="/dev/sdg"
  DEV2="/dev/sdh"
  DEV3="/dev/sdi"

  wipefs -a $DEV1 $DEV2 $DEV3 &> /dev/null

  # Create a raid1 test fs on 2 devices.
  mkfs.btrfs -f -m raid1 -d raid1 $DEV1 $DEV2 > /dev/null
  mount $DEV1 /mnt/btrfs

  xfs_io -f -c "pwrite -S 0xab 0 10M" /mnt/btrfs/foo

  echo "Starting to replace $DEV1 with $DEV3"
  btrfs replace start -B $DEV1 $DEV3 /mnt/btrfs
  echo

  echo "Running fstrim"
  fstrim /mnt/btrfs
  echo

  echo "Unmounting filesystem"
  umount /mnt/btrfs

  echo "Mounting filesystem in degraded mode using $DEV3 only"
  wipefs -a $DEV1 $DEV2 &> /dev/null
  mount -o degraded $DEV3 /mnt/btrfs
  if [ $? -ne 0 ]; then
          dmesg | tail
          echo
          echo "Failed to mount in degraded mode"
          exit 1
  fi

  echo
  echo "File foo data (expected all bytes = 0xab):"
  od -A d -t x1 /mnt/btrfs/foo

  umount /mnt/btrfs

When running the reproducer:

  $ ./replace-test.sh
  wrote 10485760/10485760 bytes at offset 0
  10 MiB, 2560 ops; 0.0901 sec (110.877 MiB/sec and 28384.5216 ops/sec)
  Starting to replace /dev/sdg with /dev/sdi

  Running fstrim

  Unmounting filesystem
  Mounting filesystem in degraded mode using /dev/sdi only
  mount: /mnt/btrfs: wrong fs type, bad option, bad superblock on /dev/sdi, missing codepage or helper program, or other error.
  [19581.748641] BTRFS info (device sdg): dev_replace from /dev/sdg (devid 1) to /dev/sdi started
  [19581.803842] BTRFS info (device sdg): dev_replace from /dev/sdg (devid 1) to /dev/sdi finished
  [19582.208293] BTRFS info (device sdi): allowing degraded mounts
  [19582.208298] BTRFS info (device sdi): disk space caching is enabled
  [19582.208301] BTRFS info (device sdi): has skinny extents
  [19582.212853] BTRFS warning (device sdi): devid 2 uuid 1f731f47-e1bb-4f00-bfbb-9e5a0cb4ba9f is missing
  [19582.213904] btree_readpage_end_io_hook: 25839 callbacks suppressed
  [19582.213907] BTRFS error (device sdi): bad tree block start, want 30490624 have 0
  [19582.214780] BTRFS warning (device sdi): failed to read root (objectid=7): -5
  [19582.231576] BTRFS error (device sdi): open_ctree failed

  Failed to mount in degraded mode

So fix by setting all allocated ranges in the replace target device when
the replace operation is finishing, when we are holding the chunk mutex
and we can not race with new chunk allocations.

A test case for fstests follows soon.

Fixes: 1c11b63eff ("btrfs: replace pending/pinned chunks lists with io tree")
CC: stable@vger.kernel.org # 5.2+
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-09-30 19:40:51 +02:00
Documentation Char/misc driver fixes for 5.8-rc7 2020-07-26 09:33:25 -07:00
LICENSES LICENSES: Rename other to deprecated 2019-05-03 06:34:32 -06:00
arch Merge branch 'parisc-5.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux into master 2020-07-26 12:14:46 -07:00
block block-5.8-2020-07-10 2020-07-10 09:55:46 -07:00
certs .gitignore: add SPDX License Identifier 2020-03-25 11:50:48 +01:00
crypto keys: asymmetric: fix error return code in software_key_query() 2020-07-15 15:49:04 -07:00
drivers Char/misc driver fixes for 5.8-rc7 2020-07-26 09:33:25 -07:00
fs btrfs: fix filesystem corruption after a device replace 2020-09-30 19:40:51 +02:00
include btrfs: add metadata_uuid to FS_INFO ioctl 2020-07-27 12:55:43 +02:00
init kbuild: fix CONFIG_CC_CAN_LINK(_STATIC) for cross-compilation with Clang 2020-07-02 00:57:45 +09:00
ipc mmap locking API: use coccinelle to convert mmap_sem rwsem call sites 2020-06-09 09:39:14 -07:00
kernel Fix a interaction/regression between uprobes based shared library tracing & GDB. 2020-07-25 13:55:38 -07:00
lib RISC-V Fixes for 5.8-rc5 (ideally) 2020-07-11 19:22:46 -07:00
mm khugepaged: fix null-pointer dereference due to race 2020-07-24 12:42:41 -07:00
net Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net into master 2020-07-25 11:50:59 -07:00
samples samples/vfs: avoid warning in statx override 2020-07-03 16:15:25 -07:00
scripts Kbuild fixes for v5.8 (3rd) 2020-07-26 13:46:57 -07:00
security integrity/ima: switch to using __kernel_read 2020-07-08 08:27:57 +02:00
sound sound fixes for 5.8-rc7 2020-07-21 08:06:45 -07:00
tools Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net into master 2020-07-25 11:50:59 -07:00
usr bpfilter: match bit size of bpfilter_umh to that of the kernel 2020-05-17 18:52:01 +09:00
virt kvm: use more precise cast and do not drop __user 2020-07-02 05:39:31 -04:00
.clang-format block: add bio_for_each_bvec_all() 2020-05-25 11:25:24 +02:00
.cocciconfig
.get_maintainer.ignore Opt out of scripts/get_maintainer.pl 2019-05-16 10:53:40 -07:00
.gitattributes .gitattributes: use 'dts' diff driver for dts files 2019-12-04 19:44:11 -08:00
.gitignore .gitignore: Do not track `defconfig` from `make savedefconfig` 2020-07-05 16:15:46 +09:00
.mailmap mailmap: add entry for Mike Rapoport 2020-07-24 12:42:41 -07:00
COPYING COPYING: state that all contributions really are covered by this file 2020-02-10 13:32:20 -08:00
CREDITS mailmap: change email for Ricardo Ribalda 2020-05-25 18:59:59 -06:00
Kbuild kbuild: rename hostprogs-y/always to hostprogs/always-y 2020-02-04 01:53:07 +09:00
Kconfig kbuild: ensure full rebuild when the compiler is updated 2020-05-12 13:28:33 +09:00
MAINTAINERS Merge branch 'akpm' into master (patches from Andrew) 2020-07-24 14:24:35 -07:00
Makefile Linux 5.8-rc7 2020-07-26 14:14:06 -07:00
README Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00

README

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.