linux/drivers
Yuanhan Liu e9e4c377e2 md/raid5: per hash value and exclusive wait_for_stripe
I noticed heavy spin lock contention at get_active_stripe() with fsmark
multiple thread write workloads.

Here is how this hot contention comes from. We have limited stripes, and
it's a multiple thread write workload. Hence, those stripes will be taken
soon, which puts later processes to sleep for waiting free stripes. When
enough stripes(>= 1/4 total stripes) are released, all process are woken,
trying to get the lock. But there is one only being able to get this lock
for each hash lock, making other processes spinning out there for acquiring
the lock.

Thus, it's effectiveless to wakeup all processes and let them battle for
a lock that permits one to access only each time. Instead, we could make
it be a exclusive wake up: wake up one process only. That avoids the heavy
spin lock contention naturally.

To do the exclusive wake up, we've to split wait_for_stripe into multiple
wait queues, to make it per hash value, just like the hash lock.

Here are some test results I have got with this patch applied(all test run
3 times):

`fsmark.files_per_sec'
=====================

next-20150317                 this patch
-------------------------     -------------------------
metric_value     ±stddev      metric_value     ±stddev     change      testbox/benchmark/testcase-params
-------------------------     -------------------------   --------     ------------------------------
      25.600     ±0.0              92.700     ±2.5          262.1%     ivb44/fsmark/1x-64t-4BRD_12G-RAID5-btrfs-4M-30G-fsyncBeforeClose
      25.600     ±0.0              77.800     ±0.6          203.9%     ivb44/fsmark/1x-64t-9BRD_6G-RAID5-btrfs-4M-30G-fsyncBeforeClose
      32.000     ±0.0              93.800     ±1.7          193.1%     ivb44/fsmark/1x-64t-4BRD_12G-RAID5-ext4-4M-30G-fsyncBeforeClose
      32.000     ±0.0              81.233     ±1.7          153.9%     ivb44/fsmark/1x-64t-9BRD_6G-RAID5-ext4-4M-30G-fsyncBeforeClose
      48.800     ±14.5             99.667     ±2.0          104.2%     ivb44/fsmark/1x-64t-4BRD_12G-RAID5-xfs-4M-30G-fsyncBeforeClose
       6.400     ±0.0              12.800     ±0.0          100.0%     ivb44/fsmark/1x-64t-3HDD-RAID5-btrfs-4M-40G-fsyncBeforeClose
      63.133     ±8.2              82.800     ±0.7           31.2%     ivb44/fsmark/1x-64t-9BRD_6G-RAID5-xfs-4M-30G-fsyncBeforeClose
     245.067     ±0.7             306.567     ±7.9           25.1%     ivb44/fsmark/1x-64t-4BRD_12G-RAID5-f2fs-4M-30G-fsyncBeforeClose
      17.533     ±0.3              21.000     ±0.8           19.8%     ivb44/fsmark/1x-1t-3HDD-RAID5-xfs-4M-40G-fsyncBeforeClose
     188.167     ±1.9             215.033     ±3.1           14.3%     ivb44/fsmark/1x-1t-4BRD_12G-RAID5-btrfs-4M-30G-NoSync
     254.500     ±1.8             290.733     ±2.4           14.2%     ivb44/fsmark/1x-1t-9BRD_6G-RAID5-btrfs-4M-30G-NoSync

`time.system_time'
=====================

next-20150317                 this patch
-------------------------    -------------------------
metric_value     ±stddev     metric_value     ±stddev     change       testbox/benchmark/testcase-params
-------------------------    -------------------------    --------     ------------------------------
    7235.603     ±1.2             185.163     ±1.9          -97.4%     ivb44/fsmark/1x-64t-4BRD_12G-RAID5-btrfs-4M-30G-fsyncBeforeClose
    7666.883     ±2.9             202.750     ±1.0          -97.4%     ivb44/fsmark/1x-64t-9BRD_6G-RAID5-btrfs-4M-30G-fsyncBeforeClose
   14567.893     ±0.7             421.230     ±0.4          -97.1%     ivb44/fsmark/1x-64t-3HDD-RAID5-btrfs-4M-40G-fsyncBeforeClose
    3697.667     ±14.0            148.190     ±1.7          -96.0%     ivb44/fsmark/1x-64t-4BRD_12G-RAID5-xfs-4M-30G-fsyncBeforeClose
    5572.867     ±3.8             310.717     ±1.4          -94.4%     ivb44/fsmark/1x-64t-9BRD_6G-RAID5-ext4-4M-30G-fsyncBeforeClose
    5565.050     ±0.5             313.277     ±1.5          -94.4%     ivb44/fsmark/1x-64t-4BRD_12G-RAID5-ext4-4M-30G-fsyncBeforeClose
    2420.707     ±17.1            171.043     ±2.7          -92.9%     ivb44/fsmark/1x-64t-9BRD_6G-RAID5-xfs-4M-30G-fsyncBeforeClose
    3743.300     ±4.6             379.827     ±3.5          -89.9%     ivb44/fsmark/1x-64t-3HDD-RAID5-ext4-4M-40G-fsyncBeforeClose
    3308.687     ±6.3             363.050     ±2.0          -89.0%     ivb44/fsmark/1x-64t-3HDD-RAID5-xfs-4M-40G-fsyncBeforeClose

Where,

     1x: where 'x' means iterations or loop, corresponding to the 'L' option of fsmark

     1t, 64t: where 't' means thread

     4M: means the single file size, corresponding to the '-s' option of fsmark
     40G, 30G, 120G: means the total test size

     4BRD_12G: BRD is the ramdisk, where '4' means 4 ramdisk, and where '12G' means
               the size of one ramdisk. So, it would be 48G in total. And we made a
               raid on those ramdisk

As you can see, though there are no much performance gain for hard disk
workload, the system time is dropped heavily, up to 97%. And as expected,
the performance increased a lot, up to 260%, for fast device(ram disk).

v2: use bits instead of array to note down wait queue need to wake up.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2015-06-17 10:00:27 +10:00
..
accessibility
acpi Merge branches 'acpi-init' and 'acpica' 2015-05-15 00:31:23 +02:00
amba
android
ata Merge branch 'for-4.1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata 2015-06-08 08:47:08 -07:00
atm
auxdisplay
base drivers/base: cacheinfo: handle absence of caches 2015-06-01 10:34:24 +09:00
bcma Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus 2015-04-17 15:50:54 -04:00
block Merge branch 'for-linus' of git://git.kernel.dk/linux-block 2015-06-12 11:35:19 -07:00
bluetooth Bluetooth: ath3k: add support of 04ca:300f AR3012 device 2015-05-13 23:04:20 +02:00
bus mvebu fixes for 4.1 (part 3) 2015-06-01 17:03:44 +02:00
cdrom
char ipmi: Fix multi-part message handling 2015-05-05 19:37:22 -05:00
clk clk: si5351: Do not pass struct clk in platform_data 2015-05-08 11:22:30 -07:00
clocksource Initial ACPI support for arm64: 2015-04-24 08:23:45 -07:00
connector
cpufreq cpufreq: intel_pstate: Fix an annoying !CONFIG_SMP warning 2015-04-15 23:02:24 +02:00
cpuidle cpuidle: Run tick_broadcast_exit() with disabled interrupts 2015-04-29 15:19:21 +02:00
crypto Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2015-04-26 13:51:05 -07:00
dca
devfreq
dio
dma dmaengine: Fix choppy sound because of unimplemented resume 2015-06-12 15:22:26 +05:30
dma-buf dma-buf: cleanup dma_buf_export() to make it easily extensible 2015-04-21 14:47:16 +05:30
edac
eisa
extcon extcon: usb-gpio: register extcon device before IRQ registration 2015-04-27 11:06:05 +09:00
firewire
firmware Merge branch 'stable/for-linus-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/ibft 2015-06-06 09:03:54 -07:00
fmc
gpio gpio: gpio-kempld: Fix get_direction return value 2015-05-12 13:49:13 +02:00
gpu Merge tag 'drm-intel-fixes-2015-06-11' of git://anongit.freedesktop.org/drm-intel into drm-fixes 2015-06-12 10:11:50 +10:00
hid Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid 2015-05-21 17:23:11 -07:00
hsi HSI: cmt_speech: fix error return code 2015-04-05 14:45:27 +02:00
hv Drivers: hv: hv_balloon: correctly handle num_pages>INT_MAX case 2015-04-03 16:20:12 +02:00
hwmon hwmon: (nct6683) Add missing sysfs attribute initialization 2015-05-29 17:47:42 -07:00
hwspinlock
hwtracing/coresight Char/Misc driver patches for 4.1-rc1 2015-04-21 09:42:58 -07:00
i2c i2c: s3c2410: fix oops in suspend callback for non-dt platforms 2015-05-12 18:13:46 +02:00
ide Merge branch 'for-4.1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata 2015-05-11 10:54:20 -07:00
idle Power management and ACPI updates for v4.1-rc1 2015-04-14 20:21:54 -07:00
iio iio: adc: twl6030-gpadc: Fix modalias 2015-05-23 12:30:52 +01:00
infiniband Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending 2015-05-31 11:31:42 -07:00
input Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input 2015-06-09 15:05:27 -07:00
iommu Merge git://git.infradead.org/intel-iommu 2015-06-12 11:28:57 -07:00
ipack
irqchip Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus 2015-06-14 15:38:57 -10:00
isdn Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2015-04-15 09:00:47 -07:00
leds This is the bulk of GPIO changes for the v4.1 development 2015-04-18 08:22:10 -04:00
lguest lguest: fix out-by-one error in address checking. 2015-05-27 09:57:21 -07:00
macintosh Merge branch 'next-remove-ldst' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc into next 2015-04-07 13:25:14 +10:00
mailbox
mcb mcb: request_mem_region() returns NULL on error 2015-04-03 16:15:30 +02:00
md md/raid5: per hash value and exclusive wait_for_stripe 2015-06-17 10:00:27 +10:00
media media fixes for v4.1-rc3 2015-05-05 08:42:06 -07:00
memory ARM: SoC driver updates for v4.1 2015-04-22 09:18:17 -07:00
memstick memstick: mspro_block: add missing curly braces 2015-04-17 09:04:09 -04:00
message
mfd mfd: da9052: Fix broken regulator probe 2015-05-27 13:34:15 +01:00
misc Char/Misc driver patches for 4.1-rc1 2015-04-21 09:42:58 -07:00
mmc mmc: atmel-mci: fix bad variable type for clkdiv 2015-05-18 09:04:42 +02:00
mtd Two MTD fixes for 4.1: 2015-05-18 10:01:54 -07:00
net net: igb: fix the start time for periodic output signals 2015-06-11 16:04:02 -07:00
nfc NFC: logging neatening 2015-04-07 12:05:12 +02:00
ntb ntb: initialize max_mw for Atom before using it 2015-06-11 09:27:24 -04:00
nubus
of Driver core fixes for 4.1-rc7 2015-06-06 22:37:45 -07:00
oprofile Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-04-26 17:22:07 -07:00
parisc parisc: %pf is only for function pointers 2015-04-24 13:45:54 +02:00
parport
pci PCI: Preserve resource size during alignment reordering 2015-06-01 17:56:32 -05:00
pcmcia ARM: SoC cleanups for v4.1 2015-04-22 09:04:39 -07:00
phy phy: phy-rcar-gen2: Fix USBHS_UGSTS_LOCK value 2015-05-12 20:57:19 +05:30
pinctrl pinctrl: Fix gpio/pin mapping for Meson8b 2015-05-19 11:40:52 +02:00
platform thinkpad_acpi: Revert unintentional device attribute renaming 2015-05-20 02:18:12 -07:00
pnp Power management and ACPI updates for v4.1-rc1 2015-04-14 20:21:54 -07:00
power power: bq27x00_battery: Add missing MODULE_ALIAS 2015-05-01 23:01:48 +02:00
powercap powercap / RAPL: Add support for Intel Skylake processors 2015-04-15 23:06:16 +02:00
pps
ps3
ptp
pwm pwm: img: Impose upper and lower timebase steps value 2015-05-19 16:07:40 +02:00
rapidio
ras
regulator mfd: da9052: Fix broken regulator probe 2015-05-27 13:34:15 +01:00
remoteproc
reset
rpmsg
rtc drivers/rtc/rtc-armada38x.c: remove unused local `flags' 2015-05-14 17:55:51 -07:00
s390 s390/zcrypt: Fix invalid domain handling during ap module unload 2015-05-13 09:57:29 +02:00
sbus drivers/sbus/char/envctrl.c: ignore orderly_poweroff return value 2015-04-15 16:35:23 -07:00
scsi Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending 2015-05-31 11:31:42 -07:00
sfi
sh drivers: sh: Remove test for now unsupported sh7372 2015-04-27 13:08:14 +09:00
sn
soc soc: mediatek: Add compile dependency to pmic-wrapper 2015-05-27 16:27:05 +02:00
spi Merge remote-tracking branches 'spi/fix/fsl-cpm', 'spi/fix/fsl-dspi' and 'spi/fix/fsl-espi' into spi-linus 2015-05-11 17:29:49 +01:00
spmi spmi: pmic_arb: remove ARM build time dependency 2015-04-03 16:15:30 +02:00
ssb SSB: Fix handling of ssb_pmu_get_alp_clock() 2015-06-09 16:38:06 +02:00
staging staging: rtl8712: fix stack dump 2015-05-31 11:54:25 +09:00
target target: Use a PASSTHROUGH flag instead of transport_types 2015-05-30 19:58:11 -07:00
tc
thermal Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal into for-rc 2015-05-19 08:12:27 +08:00
thunderbolt
tty TTY/Serial driver fixes for 4.1-rc7 2015-06-06 22:14:23 -07:00
uio Revert "uio: constify of_device_id array" 2015-04-03 16:04:21 +02:00
usb USB-serial fixes for v4.1-rc7 2015-06-05 23:19:45 +09:00
uwb
vfio vfio: Fix runaway interruptible timeout 2015-05-01 16:31:41 -06:00
vhost target: Fix se_tpg_tfo->tf_subsys regression + remove tf_subsystem 2015-05-30 18:04:20 -07:00
video backlight: pwm: Handle EPROBE_DEFER while requesting the PWM 2015-05-26 08:44:59 +01:00
virt
virtio virtio_pci: Clear stale cpumask when setting irq affinity 2015-06-04 14:47:49 +02:00
vlynq
vme
w1
watchdog Merge git://www.linux-watchdog.org/linux-watchdog 2015-04-22 11:22:55 -07:00
xen xen/events: don't bind non-percpu VIRQs with percpu chip 2015-05-19 19:55:36 +01:00
zorro
Kconfig
Makefile coresight: moving to new "hwtracing" directory 2015-04-03 16:17:04 +02:00