linux/drivers
Nick Piggin 557ed1fa26 remove ZERO_PAGE
The commit b5810039a5 contains the note

  A last caveat: the ZERO_PAGE is now refcounted and managed with rmap
  (and thus mapcounted and count towards shared rss).  These writes to
  the struct page could cause excessive cacheline bouncing on big
  systems.  There are a number of ways this could be addressed if it is
  an issue.

And indeed this cacheline bouncing has shown up on large SGI systems.
There was a situation where an Altix system was essentially livelocked
tearing down ZERO_PAGE pagetables when an HPC app aborted during startup.
This situation can be avoided in userspace, but it does highlight the
potential scalability problem with refcounting ZERO_PAGE, and corner
cases where it can really hurt (we don't want the system to livelock!).

There are several broad ways to fix this problem:
1. add back some special casing to avoid refcounting ZERO_PAGE
2. per-node or per-cpu ZERO_PAGES
3. remove the ZERO_PAGE completely

I will argue for 3. The others should also fix the problem, but they
result in more complex code than does 3, with little or no real benefit
that I can see.

Why? Inserting a ZERO_PAGE for anonymous read faults appears to be a
false optimisation: if an application is performance critical, it would
not be doing many read faults of new memory, or at least it could be
expected to write to that memory soon afterwards. If cache or memory use
is critical, it should not be working with a significant number of
ZERO_PAGEs anyway (a more compact representation of zeroes should be
used).

As a sanity check -- measuring on my desktop system, there are never many
mappings to the ZERO_PAGE (e.g. 2 or 3), thus memory usage here should not
increase much without it.

When running a make -j4 kernel compile on my dual core system, there are
about 1,000 mappings to the ZERO_PAGE created per second, but about 1,000
ZERO_PAGE COW faults per second (less than 1 ZERO_PAGE mapping per second
is torn down without being COWed). So removing ZERO_PAGE will save 1,000
page faults per second when running kbuild, while keeping it only saves
less than 1 page clearing operation per second. 1 page clear is cheaper
than a thousand faults, presumably, so there isn't an obvious loss.

Neither the logical argument nor these basic tests give a guarantee of no
regressions. However, this is a reasonable opportunity to try to remove
the ZERO_PAGE from the pagefault path. If it is found to cause regressions,
we can reintroduce it and just avoid refcounting it.

The /dev/zero ZERO_PAGE usage and TLB tricks also get nuked.  I don't see
much use for them except in benchmarks.  All other users of ZERO_PAGE are
converted just to use ZERO_PAGE(0) for simplicity. We can look at
replacing them all and maybe ripping out ZERO_PAGE completely when we are
more satisfied with this solution.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus "snif" Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:42:53 -07:00
acorn/char Remove the arm26 port 2007-07-31 15:39:39 -07:00
acpi more trivial signedness fixes in drivers 2007-10-14 12:41:52 -07:00
amba Driver core: change add_uevent_var to use a struct 2007-10-12 14:51:01 -07:00
ata docbook: fix libata content 2007-10-15 17:56:36 -07:00
atm more trivial signedness fixes in drivers 2007-10-14 12:41:52 -07:00
auxdisplay cfag12864b fix 2007-08-22 19:52:46 -07:00
base sparsemem: record when a section has a valid mem_map 2007-10-16 09:42:51 -07:00
block more trivial signedness fixes in drivers 2007-10-14 12:41:52 -07:00
bluetooth [Bluetooth] Add missing stat.byte_rx counter modification 2007-09-09 08:39:27 +02:00
cdrom [POWERPC] iSeries: Move detection of virtual cdroms 2007-10-11 20:40:47 +10:00
char remove ZERO_PAGE 2007-10-16 09:42:53 -07:00
clocksource
connector [NET]: make netlink user -> kernel interface synchronious 2007-10-10 21:15:29 -07:00
cpufreq Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6 2007-10-12 15:49:37 -07:00
crypto [CRYPTO] sha: Add header file for SHA definitions 2007-10-10 16:55:50 -07:00
dio
dma [IOAT]: ioatdma needs to to play nice in a multi-dma-client world 2007-08-26 18:35:40 -07:00
edac Drivers: clean up direct setting of the name of a kset 2007-10-12 14:51:02 -07:00
eisa signedness: module_param_array nump argument 2007-10-14 12:41:52 -07:00
fc4 Merge master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6 2007-10-15 08:19:33 -07:00
firewire fw-cdev __user annotations 2007-10-14 12:41:51 -07:00
firmware Driver core: rename ktype_edd and ktype_efivar 2007-10-12 14:51:12 -07:00
hid HID: fix HIDIOCGRDESC memory access in hidraw 2007-10-15 08:12:00 -07:00
hwmon Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input 2007-10-15 13:41:39 -07:00
i2c Merge branch 'devel' of master.kernel.org:/home/rmk/linux-2.6-arm 2007-10-15 16:08:50 -07:00
ide alim15x3: remove redundant m5229_revision check 2007-10-13 17:47:53 +02:00
ieee1394 Driver core: change add_uevent_var to use a struct 2007-10-12 14:51:01 -07:00
infiniband IB/ipoib: Verify address handle validity on send 2007-10-15 14:20:45 -04:00
input Merge branch 'devel' of master.kernel.org:/home/rmk/linux-2.6-arm 2007-10-15 16:08:50 -07:00
isdn [ISDN]: Fix compile with CONFIG_ISDN_X25 disabled. 2007-10-15 12:52:20 -07:00
kvm sched: guest CPU accounting: maintain guest state in KVM 2007-10-15 17:00:19 +02:00
leds [ARM] 4576/1: CM-X270 machine support 2007-10-15 18:53:57 +01:00
lguest fix modules oopsing in lguest guests 2007-09-25 08:51:04 -07:00
macintosh Merge master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6 2007-10-12 21:27:47 -04:00
mca
md dm: emc_endio returns void 2007-10-13 09:41:03 -07:00
media signedness: module_param_array nump argument 2007-10-14 12:41:52 -07:00
message docbook: fix kernel-api content 2007-10-15 17:56:36 -07:00
mfd
misc Map volume and brightness events on thinkpads 2007-10-15 13:54:40 -07:00
mmc Merge branch 'devel' of master.kernel.org:/home/rmk/linux-2.6-arm 2007-10-15 16:08:50 -07:00
mtd Reinstate lost flush_ioremap_region() fix to pxa2xx-flash driver 2007-10-15 12:55:20 -07:00
net Merge branch 'devel' of master.kernel.org:/home/rmk/linux-2.6-arm 2007-10-15 16:08:50 -07:00
nubus
of
oprofile
parisc [NET]: Make the device list and device lookups per namespace. 2007-10-10 16:49:10 -07:00
parport parport_pc locking fix 2007-07-31 15:39:37 -07:00
pci Get rid of unused variable warning in drivers/pci/hotplug/pci_hotplug_core.c 2007-10-15 09:07:58 -07:00
pcmcia pcmcia: use DMA_MASK_NONE for the default for all pcmcia devices 2007-10-16 09:42:50 -07:00
pnp drivers/firmware: const-ify DMI API and internals 2007-10-09 20:22:20 -04:00
power Driver core: change add_uevent_var to use a struct 2007-10-12 14:51:01 -07:00
ps3
rapidio
rtc rtc: rtc-sh: Support 4-digit year on SH7705/SH7710/SH7712. 2007-09-21 11:57:47 +09:00
s390 Merge master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6 2007-10-15 08:19:33 -07:00
sbus Videopix Frame Grabber: Fix unreleased lock in vfc_debug() 2007-07-31 15:39:43 -07:00
scsi scsi/gdth: fix crash in gdth_timeout if no gdth controllers found 2007-10-15 12:46:16 -07:00
serial Add support for Wacom WACF007 and WACF008 to serial pnp driver 2007-10-16 09:42:50 -07:00
sh sh: Add maple bus support for the SEGA Dreamcast. 2007-09-21 15:55:55 +09:00
sn
spi Driver core: change add_uevent_var to use a struct 2007-10-12 14:51:01 -07:00
ssb missing include in ssb 2007-10-14 08:53:33 -07:00
tc
telephony
uio
usb docbook: fix usb content 2007-10-15 17:56:36 -07:00
video Merge branch 'devel' of master.kernel.org:/home/rmk/linux-2.6-arm 2007-10-15 16:08:50 -07:00
w1 Driver core: change add_uevent_var to use a struct 2007-10-12 14:51:01 -07:00
xen xenbus_xs.c: fix a use-after-free 2007-07-26 11:35:17 -07:00
zorro zorro: Make sysfs config attribute read-only 2007-08-22 19:52:45 -07:00
Kconfig [SSB]: add Sonics Silicon Backplane bus support 2007-10-10 16:51:36 -07:00
Makefile [SSB]: add Sonics Silicon Backplane bus support 2007-10-10 16:51:36 -07:00