linux/drivers/char
Nick Piggin 557ed1fa26 remove ZERO_PAGE
The commit b5810039a5 contains the note

  A last caveat: the ZERO_PAGE is now refcounted and managed with rmap
  (and thus mapcounted and count towards shared rss).  These writes to
  the struct page could cause excessive cacheline bouncing on big
  systems.  There are a number of ways this could be addressed if it is
  an issue.

And indeed this cacheline bouncing has shown up on large SGI systems.
There was a situation where an Altix system was essentially livelocked
tearing down ZERO_PAGE pagetables when an HPC app aborted during startup.
This situation can be avoided in userspace, but it does highlight the
potential scalability problem with refcounting ZERO_PAGE, and corner
cases where it can really hurt (we don't want the system to livelock!).

There are several broad ways to fix this problem:
1. add back some special casing to avoid refcounting ZERO_PAGE
2. per-node or per-cpu ZERO_PAGEs
3. remove the ZERO_PAGE completely

I will argue for 3. The others should also fix the problem, but they
result in more complex code than does 3, with little or no real benefit
that I can see.

Why? Inserting a ZERO_PAGE for anonymous read faults appears to be a
false optimisation: if an application is performance critical, it would
not be doing many read faults of new memory, or at least it could be
expected to write to that memory soon afterwards. If cache or memory use
is critical, it should not be working with a significant number of
ZERO_PAGEs anyway (a more compact representation of zeroes should be
used).

As a sanity check -- measuring on my desktop system, there are never many
mappings to the ZERO_PAGE (e.g. 2 or 3), thus memory usage here should not
increase much without it.

When running a make -j4 kernel compile on my dual core system, there are
about 1,000 mappings to the ZERO_PAGE created per second, but about 1,000
ZERO_PAGE COW faults per second (less than 1 ZERO_PAGE mapping per second
is torn down without being COWed). So removing ZERO_PAGE will save 1,000
page faults per second when running kbuild, while keeping it only saves
less than 1 page clearing operation per second. 1 page clear is cheaper
than a thousand faults, presumably, so there isn't an obvious loss.
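
The arithmetic behind that comparison can be written out as a rough sketch.
The function and parameter names below are hypothetical and are not part of
the patch; the inputs are the approximate kbuild rates quoted above, with
"less than 1" torn-down mapping per second rounded up to 1 as an upper bound.

```python
# Back-of-the-envelope model of the kbuild measurement described above.
# Names are illustrative only; the rates are the approximate ones from the text.

def zero_page_tradeoff(mappings_per_sec, uncowed_teardowns_per_sec):
    """Return (faults_saved, extra_page_clears) per second without ZERO_PAGE."""
    # Every ZERO_PAGE mapping that is later written takes a second (COW) fault.
    # If the read fault hands out a real zeroed page instead, that fault goes away.
    faults_saved = mappings_per_sec - uncowed_teardowns_per_sec
    # Only mappings torn down without ever being written avoided a page clear,
    # so these are the only clears that removing ZERO_PAGE adds.
    extra_page_clears = uncowed_teardowns_per_sec
    return faults_saved, extra_page_clears


saved, clears = zero_page_tradeoff(1000, 1)
print(f"faults saved/s: {saved}, extra page clears/s: {clears}")
```

Under these assumed inputs the model gives roughly a thousand faults saved
for at most one additional page clear per second, which is the trade the
paragraph above describes.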

Neither the logical argument nor these basic tests give a guarantee of no
regressions. However, this is a reasonable opportunity to try to remove
the ZERO_PAGE from the pagefault path. If it is found to cause regressions,
we can reintroduce it and just avoid refcounting it.

The /dev/zero ZERO_PAGE usage and TLB tricks also get nuked.  I don't see
much use to them except on benchmarks.  All other users of ZERO_PAGE are
converted just to use ZERO_PAGE(0) for simplicity. We can look at
replacing them all and maybe ripping out ZERO_PAGE completely when we are
more satisfied with this solution.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus "snif" Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:42:53 -07:00
agp fix use after free in amd create gatt pages 2007-10-15 10:32:15 +10:00
drm via invalid device ids removal 2007-10-15 11:09:35 +10:00
hw_random x86_64: Geode HW Random Number Generator depends on X86_32 2007-07-21 18:37:13 -07:00
ip2 Convert from class_device to device in drivers/char 2007-10-12 14:51:04 -07:00
ipmi signedness: module_param_array nump argument 2007-10-14 12:41:52 -07:00
mwave [PATCH] mwave: interesting flags savings 2007-02-20 17:10:14 -08:00
pcmcia Convert from class_device to device in drivers/char 2007-10-12 14:51:04 -07:00
rio long vs. unsigned long - low-hanging fruits in drivers 2007-10-14 12:41:51 -07:00
tpm tpmdd maintainers 2007-08-22 19:52:44 -07:00
watchdog mpc5200_wdt: __user annotations 2007-10-14 12:41:51 -07:00
.gitignore
ChangeLog
Kconfig Char: cyclades, select FW_LOADER 2007-07-26 11:35:19 -07:00
Makefile Correct Makefile rule for generating custom keymap 2007-10-08 16:06:51 -07:00
amiserial.c some kmalloc/memset ->kzalloc (tree wide) 2007-07-19 10:04:50 -07:00
apm-emulation.c Freezer: make kernel threads nonfreezable by default 2007-07-17 10:23:02 -07:00
applicom.c
applicom.h
briq_panel.c [POWERPC] Remove dead code for preventing pread() and pwrite() calls 2007-07-10 22:03:26 +10:00
cd1865.h
consolemap.c Kernel utf-8 handling 2007-07-16 09:05:46 -07:00
cp437.uni
cs5535_gpio.c Char: cs5535_gpio, add MODULE_DEVICE_TABLE 2007-05-08 11:15:04 -07:00
cyclades.c drivers/*: mark variables with uninitialized_var() 2007-07-17 16:23:19 -04:00
defkeymap.c_shipped
defkeymap.map
digi1.h
digiFep1.h
digiPCI.h
ds1286.c [CHAR] ds1286: Fix handling of seconds in RTC_ALM_SET ioctl. 2007-03-08 01:10:30 +00:00
ds1302.c [PATCH] DS1302: local_irq_disable() is redundant after local_irq_save() 2007-02-12 09:48:30 -08:00
ds1620.c header cleaning: don't include smp_lock.h when not used 2007-05-08 11:15:07 -07:00
dsp56k.c long vs. unsigned long - low-hanging fruits in drivers 2007-10-14 12:41:51 -07:00
dtlk.c dtlk: fix error checks in module_init() 2007-05-08 11:15:09 -07:00
efirtc.c
epca.c drivers/char: use __set_current_state() 2007-05-08 11:15:13 -07:00
epca.h
epcaconfig.h
esp.c some kmalloc/memset ->kzalloc (tree wide) 2007-07-19 10:04:50 -07:00
generic_nvram.c [PATCH] mark struct file_operations const 3 2007-02-12 09:48:45 -08:00
generic_serial.c genericserial: remove bogus optimisation check and dead code paths 2007-07-16 09:05:51 -07:00
genrtc.c Char: genrtc, use wait_event_interruptible 2007-07-16 09:05:44 -07:00
hangcheck-timer.c Detach sched.h from mm.h 2007-05-21 09:18:19 -07:00
hpet.c Silent drivers/char/hpet.c build warnings on i386 2007-09-26 09:22:04 -07:00
hvc_beat.c [POWERPC] Init markings for hvc_beat 2007-08-17 11:01:50 +10:00
hvc_console.c Freezer: make kernel threads nonfreezable by default 2007-07-17 10:23:02 -07:00
hvc_console.h
hvc_iseries.c [POWERPC] init and exit markings for hvc_iseries 2007-07-22 21:30:59 +10:00
hvc_lguest.c lguest files should explicitly include asm/paravirt.h 2007-08-11 15:47:42 -07:00
hvc_rtas.c [POWERPC] Quiet section mismatch in hvc_rtas.c 2007-07-22 21:30:59 +10:00
hvc_vio.c [POWERPC] Rename device_is_compatible to of_device_is_compatible 2007-05-07 20:31:14 +10:00
hvc_xen.c xen: use the hvc console infrastructure for Xen console 2007-07-18 08:47:44 -07:00
hvcs.c [POWERPC] hvcs: Make some things static and const 2007-07-22 21:30:59 +10:00
hvsi.c [POWERPC] Rename get_property to of_get_property: partial drivers 2007-04-27 15:51:56 +10:00
i8k.c drivers/firmware: const-ify DMI API and internals 2007-10-09 20:22:20 -04:00
ip27-rtc.c header cleaning: don't include smp_lock.h when not used 2007-05-08 11:15:07 -07:00
isicom.c Char: isicom, proper variables types 2007-07-17 10:23:10 -07:00
istallion.c Convert from class_device to device in drivers/char 2007-10-12 14:51:04 -07:00
keyboard.c m68k/mac: Make mac_hid_mouse_emulate_buttons() declaration visible 2007-08-22 19:52:45 -07:00
lcd.c [MIPS] Delete duplicate inclusion of <linux/delay.h>. 2007-08-27 02:16:59 +01:00
lcd.h [MIPS] Add MTD device support for Cobalt 2007-02-20 17:11:55 +00:00
lp.c Convert from class_device to device in drivers/char 2007-10-12 14:51:04 -07:00
mbcs.c mbcs: Remove lots of global symbols 2007-07-19 10:04:43 -07:00
mbcs.h mbcs: Remove lots of global symbols 2007-07-19 10:04:43 -07:00
mem.c remove ZERO_PAGE 2007-10-16 09:42:53 -07:00
misc.c Make /proc/misc use seq_list_xxx helpers 2007-07-16 09:05:42 -07:00
mmtimer.c Remove fs.h from mm.h 2007-07-29 17:09:29 -07:00
moxa.c Char: moxa, eliminate busy waiting 2007-07-17 10:23:10 -07:00
mspec.c fix "mspec: handle shrinking virtual memory areas" 2007-09-25 08:51:04 -07:00
mxser.c serial: remove termios checks from various old char serial drivers 2007-07-16 09:05:52 -07:00
mxser.h [PATCH] mxser: remove ambiguous redefinition of INIT_WORK 2007-02-11 10:51:25 -08:00
mxser_new.c serial: remove termios checks from various old char serial drivers 2007-07-16 09:05:52 -07:00
mxser_new.h [PATCH] Char: mxser_new, upgrade to 1.9.15 2007-02-11 10:51:29 -08:00
n_hdlc.c Char: n_hdlc, allow RESTARTSYS retval of tty write 2007-07-16 09:05:43 -07:00
n_r3964.c Char: n_r3964, use wait_event_interruptible 2007-07-16 09:05:44 -07:00
n_tty.c Audit: add TTY input auditing 2007-07-16 09:05:47 -07:00
nsc_gpio.c
nvram.c COBALT: remove all references to Cobalt NVRAM 2007-07-16 09:05:47 -07:00
nwbutton.c [PATCH] Char: timers cleanup 2007-02-12 09:48:30 -08:00
nwbutton.h
nwflash.c [PATCH] remove many unneeded #includes of sched.h 2007-02-14 08:09:54 -08:00
pc8736x_gpio.c
ppdev.c header cleaning: don't include smp_lock.h when not used 2007-05-08 11:15:07 -07:00
ps3flash.c ps3: FLASH ROM Storage Driver 2007-07-21 17:49:16 -07:00
pty.c PTY: add kernel parameter to overwrite legacy pty count 2007-10-12 14:51:09 -07:00
random.c [TCP]: secure_tcp_sequence_number() should not use a too fast clock 2007-10-01 21:01:24 -07:00
raw.c cdev: remove unneeded setting of cdev names 2007-10-12 14:51:02 -07:00
riscom8.c Char: riscom8, eliminate busy loop 2007-07-17 10:23:10 -07:00
riscom8.h long vs. unsigned long - low-hanging fruits in drivers 2007-10-14 12:41:51 -07:00
riscom8_reg.h
rocket.c some kmalloc/memset ->kzalloc (tree wide) 2007-07-19 10:04:50 -07:00
rocket.h
rocket_int.h Kill unused sesssion and group values in rocket driver 2007-05-11 08:29:36 -07:00
rtc.c x86_64: Untangle asm/hpet.h from asm/timex.h 2007-07-21 18:37:08 -07:00
scc.h
scx200_gpio.c
selection.c Kernel utf-8 handling 2007-07-16 09:05:46 -07:00
ser_a2232.c [PATCH] remove many unneeded #includes of sched.h 2007-02-14 08:09:54 -08:00
ser_a2232.h
ser_a2232fw.ax
ser_a2232fw.h
serial167.c m68k: remove empty ->setup is several consoles 2007-07-20 08:24:49 -07:00
snsc.c Convert from class_device to device in drivers/char 2007-10-12 14:51:04 -07:00
snsc.h
snsc_event.c [IA64] drivers/char/snsc_event.c:206: warning: unused variable `p' 2007-05-10 13:23:05 -07:00
sonypi.c ACPI: Schedule /proc/acpi/event for removal 2007-08-23 15:20:26 -04:00
specialix.c Char: specialix, remove busy waiting 2007-07-17 10:23:10 -07:00
specialix_io8.h
stallion.c Convert from class_device to device in drivers/char 2007-10-12 14:51:04 -07:00
sx.c sx: switch subven and subid values 2007-07-10 17:51:13 -07:00
sx.h long vs. unsigned long - low-hanging fruits in drivers 2007-10-14 12:41:51 -07:00
sxboards.h
sxwindow.h
synclink.c some kmalloc/memset ->kzalloc (tree wide) 2007-07-19 10:04:50 -07:00
synclink_gt.c synclink_gt endianness annotations 2007-10-14 12:41:51 -07:00
synclinkmp.c some kmalloc/memset ->kzalloc (tree wide) 2007-07-19 10:04:50 -07:00
sysrq.c header cleaning: don't include smp_lock.h when not used 2007-05-08 11:15:07 -07:00
tb0219.c
tipar.c Convert from class_device to device in drivers/char 2007-10-12 14:51:04 -07:00
tlclk.c [PATCH] remove many unneeded #includes of sched.h 2007-02-14 08:09:54 -08:00
toshiba.c [PATCH] remove many unneeded #includes of sched.h 2007-02-14 08:09:54 -08:00
tty_audit.c Audit: add TTY input auditing 2007-07-16 09:05:47 -07:00
tty_io.c tty: dont needlessly cast kmalloc() return value 2007-08-23 21:39:41 -07:00
tty_ioctl.c sparc64 (and others): fix tty_ioctl.c build 2007-09-15 08:18:30 -07:00
vc_screen.c use mutex instead of semaphore in virtual console driver 2007-05-08 11:15:33 -07:00
viocons.c [POWERPC] iSeries: fix viocons init 2006-12-20 16:37:48 +11:00
viotape.c Convert from class_device to device in drivers/char 2007-10-12 14:51:04 -07:00
vme_scc.c m68k: remove empty ->setup is several consoles 2007-07-20 08:24:49 -07:00
vr41xx_giu.c [MIPS] Separate platform_device registration for VR41xx GPIO 2007-07-12 17:41:15 +01:00
vt.c Fix the graphic corruption issue on IA64 machines 2007-07-17 10:23:13 -07:00
vt_ioctl.c VT_WAITACTIVE: Avoid returning EINTR when not necessary 2007-10-07 16:02:55 -07:00