linux/drivers
Nick Piggin b3e19d924b fs: scale mntget/mntput
The problem that this patch aims to fix is vfsmount refcounting scalability.
We need to take a reference on the vfsmount for every successful path lookup,
and those lookups often go to the same mount point.

The fundamental difficulty is that a "simple" reference count can never be made
scalable, because any time a reference is dropped, we must check whether that
was the last reference. Doing that requires communication with all other CPUs
that may have taken a reference.

We can make refcounts more scalable in a couple of ways, involving keeping
distributed counters, and checking for the global-zero condition less
frequently.

- check the global sum once every interval (this will delay zero detection
  for some interval, so it's probably a showstopper for vfsmounts).

- keep a local count and only take the global sum when local reaches 0 (this
  is difficult for vfsmounts, because we can't hold preempt off for the life of
  a reference, so a counter would need to be per-thread or tied strongly to a
  particular CPU which requires more locking).

- keep a local difference of increments and decrements, which allows us to sum
  the total difference and hence find the refcount when summing all CPUs. Then,
  keep a single integer "long" refcount for slow and long lasting references,
  and only take the global sum of local counters when the long refcount is 0.

This last scheme is what I implemented here. Attached mounts and process root
and working directory references are "long" references, and everything else is
a short reference.

This allows scalable vfsmount references during path walking over mounted
subtrees and unattached (lazy umounted) mounts with processes still running
in them.

This results in one fewer atomic op in the fastpath: mntget is now just a
per-CPU inc, rather than an atomic inc; and mntput just requires a spinlock
and non-atomic decrement in the common case. However, the code is otherwise
bigger and heavier, so single-threaded performance is basically a wash.
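
As a rough illustration of the third scheme described above, here is a minimal
user-space sketch. It is not the code from this patch. The names (mount_ref,
ref_get, ref_put, ref_get_long, ref_put_long, ref_sum, ref_demo) and the fixed
NR_CPUS value are hypothetical, the "per-CPU" counters are a plain array
indexed by a caller-supplied CPU number, and all locking and preemption
handling is left out, so it only models the counting logic in a single thread.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4 /* hypothetical fixed CPU count for this sketch */

struct mount_ref {
	int longrefs;       /* slow, long-lasting references (attached, root, cwd) */
	int local[NR_CPUS]; /* per-CPU difference of increments and decrements */
};

/* Sum the local differences; only meaningful when nothing else is updating. */
static int ref_sum(const struct mount_ref *m)
{
	int sum = 0;
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		sum += m->local[cpu];
	return sum;
}

/* Short reference: touches only the counter of the calling CPU. */
static void ref_get(struct mount_ref *m, int cpu)
{
	m->local[cpu]++;
}

/* Drop a short reference; returns true if it was the last reference. */
static bool ref_put(struct mount_ref *m, int cpu)
{
	m->local[cpu]--; /* may go negative on this CPU; only the sum matters */
	if (m->longrefs > 0)
		return false; /* common case: no global sum needed */
	return ref_sum(m) == 0;
}

/* Long reference: counted in a single shared integer. */
static void ref_get_long(struct mount_ref *m)
{
	m->longrefs++;
}

/* Drop a long reference; returns true if it was the last reference. */
static bool ref_put_long(struct mount_ref *m)
{
	m->longrefs--;
	return m->longrefs == 0 && ref_sum(m) == 0;
}

/* Walk through one scenario: attached mount, path walks, lazy umount. */
static void ref_demo(void)
{
	struct mount_ref m = {0};

	ref_get_long(&m);         /* mount is attached */
	ref_get(&m, 0);           /* lookup takes a reference on CPU 0 */
	assert(!ref_put(&m, 2));  /* dropped on CPU 2; long ref still held */
	ref_get(&m, 1);           /* another short reference on CPU 1 */
	assert(!ref_put_long(&m));/* detached, but one short ref remains */
	assert(ref_put(&m, 3));   /* final put: long is 0 and the sum is 0 */
}
```

Note that a reference taken on one CPU can be dropped on another, which is why
each local counter holds a signed difference rather than a count, and why the
zero check has to sum every CPU. The sum is only taken once the long refcount
has reached 0, so the common put path stays cheap.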

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:33 +11:00
..
accessibility
acpi Merge branches 'bugzilla-25412' and 'bugzilla-25302' into release 2010-12-26 17:05:07 -05:00
amba
ata pata_cs5536: avoid implicit MSR API inclusion on x86-64 2010-12-26 19:42:15 -05:00
atm drivers/atm/atmtcp.c: add missing atm_dev_put 2010-12-31 12:52:05 -08:00
auxdisplay
base PM: Allow devices to be removed during late suspend and early resume 2010-11-11 01:50:53 +01:00
block Fix build error in drivers/block/cciss.c 2010-12-20 21:21:49 -08:00
bluetooth Bluetooth: add NULL pointer check in HCI 2010-12-08 13:22:22 -02:00
cdrom cdrom: gdrom: ctrl_in/outX to __raw_read/writeX conversion. 2010-10-27 14:33:39 +09:00
char RAMOOPS: Don't overflow over non-allocated regions 2010-12-28 11:12:32 -08:00
clocksource clocksource: sh_cmt: Remove nested spinlock fix 2010-12-17 19:38:33 +09:00
connector connector: add module alias 2010-12-10 12:27:49 -08:00
cpufreq
cpuidle
crypto Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2010-11-13 09:55:56 -08:00
dca
dio
dma Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx 2011-01-03 11:48:54 -08:00
edac amd64_edac: Fix interleaving check 2010-12-08 19:52:54 +01:00
eisa
firewire firewire: ohci: fix regression with Agere FW643 rev 06, disable MSI 2010-12-12 15:47:02 +01:00
firmware dmi: log board, system, and BIOS information 2010-10-27 18:03:05 -07:00
gpio cs5535-gpio: handle GPIO regs where higher (clear) bits are set 2010-12-23 15:31:48 -08:00
gpu drm/i915/dvo: Report LVDS attached to ch701x as connected 2010-12-30 13:50:43 +00:00
hid Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid 2010-12-02 17:40:04 -08:00
hwmon hwmon: (s3c-hwmon) Fix compilation 2011-01-02 15:31:11 -08:00
i2c i2c_intel_mid: Fix slash in sysfs name 2010-12-14 18:46:01 -08:00
ide ide: clean up timed out request handling 2010-10-26 10:17:30 -07:00
idle intel_idle: recognize ARAT on WSM-EX 2010-12-02 01:19:32 -05:00
ieee802154
infiniband fs: dcache rationalise dget variants 2011-01-07 17:50:24 +11:00
input Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input 2010-12-16 08:33:44 -08:00
isdn ISDN, Gigaset: Fix memory leak in do_disconnect_req() 2010-12-31 11:17:10 -08:00
leds led_class: fix typo in blink API 2010-12-22 19:43:34 -08:00
lguest
macintosh leds: fix up dependencies 2010-12-02 14:51:15 -08:00
mca
md Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block 2010-12-20 09:19:46 -08:00
media [media] em28xx: radio_fops should also use unlocked_ioctl 2011-01-03 09:52:25 -02:00
memstick
message SCSI host lock push-down 2010-11-16 13:33:23 -08:00
mfd mfd: Support additional parent IDs for wm831x 2010-12-22 12:05:22 +01:00
misc drivers/misc/isl29020.c: remove incorrect kfree in isl29020_remove() 2010-11-25 06:50:47 +09:00
mmc mmc: Fix re-probing with PM_POST_RESTORE notification 2010-12-21 11:46:49 -08:00
mtd fs: scale mntget/mntput 2011-01-07 17:50:33 +11:00
net atl1: fix oops when changing tx/rx ring params 2011-01-03 11:04:49 -08:00
nubus
of of/i2c: Fix request module by alias 2010-12-24 01:28:54 -07:00
oprofile Merge branches 'perf-fixes-for-linus' and 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2010-10-30 11:43:26 -07:00
parisc parisc: KittyHawk LCD fix 2010-12-04 11:18:25 -05:00
parport
pci PCI hotplug: Fix unexpected driver unregister in pciehp_acpi.c 2010-12-23 12:51:49 -08:00
pcmcia ARM: 6456/1: Fix for building DEBUG with sa11xx_base.c as a module. 2010-12-04 12:47:48 +00:00
platform drm/i915, intel_ips: When i915 loads after IPS, make IPS relink to i915. 2010-12-23 09:51:36 +00:00
pnp ACPI/PNP: avoid section mismatch warning 2010-12-11 02:01:47 -05:00
power power: Revert "power_supply: Mark twl4030_charger as broken" 2010-10-29 00:30:44 +02:00
pps
ps3
rapidio rapidio: use resource_size() 2010-11-12 07:55:30 -08:00
regulator regulator: tps6586x: correct register table 2010-12-09 09:23:43 +00:00
rtc rtc: rs5c372: fix buffer size 2010-12-22 19:43:34 -08:00
s390 [SCSI] zfcp: Issue FCP command without holding SCSI host_lock 2010-12-09 09:41:23 -06:00
sbus Merge branch 'next-devicetree' of git://git.secretlab.ca/git/linux-2.6 2010-10-25 08:19:14 -07:00
scsi Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6 2010-12-24 12:58:43 -08:00
serial Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb 2010-12-14 14:35:04 -08:00
sfi
sh sh: intc: Initialize radix tree gfp mask explicitly. 2010-12-24 19:38:37 +09:00
sn
spi spi/m68knommu: Coldfire QSPI platform support 2010-12-29 23:28:25 -07:00
ssb ssb: b43-pci-bridge: Add new vendor for BCM4318 2010-11-22 15:19:31 -05:00
staging fs: provide rcu-walk aware permission i_ops 2011-01-07 17:50:29 +11:00
tc
telephony
thermal
tty n_gsm: gsm_data_alloc buffer allocation could fail and it is not being checked 2010-12-16 13:03:13 -08:00
uio uio: Change mail address of Hans J. Koch 2010-11-10 16:57:11 -08:00
usb fs: dcache remove dcache_lock 2011-01-07 17:50:23 +11:00
uwb UWB: Return UWB_RSV_ALLOC_NOT_FOUND rather than crashing on NULL dereference if kzalloc fails 2010-11-11 07:14:07 -08:00
vhost vhost: correctly set bits of dirty pages 2010-11-29 10:26:55 +02:00
video Merge branch 'fbdev-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/fbdev-2.6 2010-12-27 10:33:30 -08:00
virtio virtio: fix format of sysfs driver/vendor files 2010-11-24 15:21:12 +10:30
vlynq
w1 w1: don't allow arbitrary users to remove w1 devices 2010-10-27 18:03:17 -07:00
watchdog watchdog: Fix null pointer dereference while accessing rdc321x platform_data 2010-12-22 12:05:21 +01:00
xen Merge branch '2.6.37-rc4-pvhvm-fixes' of git://xenbits.xen.org/people/sstabellini/linux-pvhvm 2010-12-03 11:30:57 -08:00
zorro BKL: remove extraneous #include <smp_lock.h> 2010-11-17 08:59:32 -08:00
Kconfig
Makefile TTY: create drivers/tty and move the tty core files there 2010-11-05 08:10:33 -07:00