linux/Documentation
David Rientjes b6bb981149 mm, vmpressure: pass-through notification support
By default, vmpressure events are not pass-through, i.e.  they propagate
up through the memcg hierarchy until an event notifier is found for any
threshold level.

This presents a difficulty when a thread waiting on a read(2) for a
vmpressure event cannot distinguish between local memory pressure and
memory pressure in a descendant memcg, especially when that thread may
not control the memcg hierarchy.

Consider a user-controlled child memcg with a smaller limit than a
top-level memcg controlled by the "Activity Manager" specified in
Documentation/cgroup-v1/memory.txt.  It may register for memory pressure
notification for descendant memcgs to make a policy decision: oom kill a
low priority job, increase the limit, decrease other limits, etc.  If it
registers for memory pressure notification on the top-level memcg, it
currently cannot distinguish between memory pressure in its own memcg or
a descendant memcg, which is user-controlled.

Conversely, if a user registers for memory pressure notification on
their own descendant memcg, the Activity Manager does not receive any
pressure notification for that child memcg hierarchy.  Vmpressure events
are not received for ancestor memcgs if the memcg experiencing pressure
have notifiers registered, perhaps outside the knowledge of the thread
waiting on read(2) at the top level.

Both of these are consequences of vmpressure notification not being
pass-through.

This implements a pass-through behavior for vmpressure events.  When
writing to control.event_control, vmpressure event handlers may
optionally specify a mode.  There are two new modes:

 - "hierarchy": always propagate memory pressure events up the hierarchy
   regardless if descendant memcgs have their own notifiers registered,
   and

 - "local": only receive notifications when the memcg for which the
   event is registered experiences memory pressure.

Of course, processes may register for one notification of "low,local",
for example, and another for "low".

If no mode is specified, the current behavior is maintained for
backwards compatibility.

See the change to Documentation/cgroup-v1/memory.txt for full
specification.

[dan.carpenter@oracle.com: free the same pointer we allocated]
  Link: http://lkml.kernel.org/r/20170613191820.GA20003@elgon.mountain
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1705311421320.8946@chino.kir.corp.google.com
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Anton Vorontsov <anton@enomsg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-10 16:32:31 -07:00
..
ABI DeviceTree for 4.13: 2017-07-07 10:37:54 -07:00
EDID
PCI
RCU
accounting
acpi This is the bulk of GPIO changes for the v4.13 series: 2017-07-07 12:40:27 -07:00
admin-guide Merge branch 'akpm' (patches from Andrew) 2017-07-06 22:27:08 -07:00
aoe
arm
arm64
auxdisplay
backlight
blackfin
block block: remove bio_clone() and all references. 2017-06-18 12:40:59 -06:00
blockdev
bus-devices
cdrom
cgroup-v1 mm, vmpressure: pass-through notification support 2017-07-10 16:32:31 -07:00
cma
connector
console
core-api There has been a fair amount of activity in the docs tree this time 2017-07-03 21:13:25 -07:00
cpu-freq
cpuidle
cris
crypto Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2017-07-05 12:22:23 -07:00
dev-tools linux-kselftest-4.13-rc1-update 2017-07-07 14:04:47 -07:00
device-mapper dm zoned: drive-managed zoned block device target 2017-06-19 11:05:20 -04:00
devicetree main drm pull for v4.13 2017-07-09 18:48:37 -07:00
dmaengine
doc-guide
driver-api This is the big bulk of pin control changes for the v4.13 series: 2017-07-06 11:38:59 -07:00
driver-model pci-v4.13-changes 2017-07-08 15:51:57 -07:00
early-userspace
extcon
fault-injection
fb
features
filesystems Writeback error handling fixes (pile #2) 2017-07-07 19:38:17 -07:00
firmware_class
fmc
fpga
frv
gpio
gpu main drm pull for v4.13 2017-07-09 18:48:37 -07:00
hid
hwmon
i2c
ia64
ide
iio
infiniband
input
ioctl scsi: cxlflash: Introduce host ioctl support 2017-06-26 15:01:11 -04:00
isdn
kbuild Kbuild thin archives updates for v4.13 2017-07-07 15:11:12 -07:00
kdump
kernel-hacking There has been a fair amount of activity in the docs tree this time 2017-07-03 21:13:25 -07:00
laptops
leds
lightnvm
livepatch
locking
m68k
md
media media updates for v4.13-rc1 2017-07-06 11:15:19 -07:00
memory-devices
metag
mic
mips
misc-devices
mmc
mn10300
mtd
namespaces
netlabel
networking doc: SKB_GSO_[IPIP|SIT] have been replaced 2017-07-08 11:25:56 +01:00
nfc
nios2
nvdimm
nvmem
parisc
pcmcia
perf
phy
platform
power power supply and reset changes for the v4.13 series 2017-07-04 14:25:14 -07:00
powerpc powerpc updates for 4.13 2017-07-07 13:55:45 -07:00
pps
process Kbuild thin archives updates for v4.13 2017-07-07 15:11:12 -07:00
pti
ptp
rapidio
s390
scheduler
scsi
security
serial
sh
sound sound updates for 4.13-rc1 2017-07-06 10:56:51 -07:00
sparc
sphinx Docs: clean up some DocBook loose ends 2017-06-23 14:17:38 -06:00
sphinx-static
spi
sysctl
target
thermal
timers
trace Char/Misc patches for 4.13-rc1 2017-07-03 20:55:59 -07:00
translations doc/kokr/howto: Only send regression fixes after -rc1 2017-06-22 10:25:22 -06:00
usb usb: gadget: add f_uac1 variant based on a new u_audio api 2017-06-19 09:22:47 +03:00
userspace-api
virtual kvm: x86: mmu: allow A/D bits to be disabled in an mmu 2017-07-03 11:19:54 +02:00
vm ksm: introduce ksm_max_page_sharing per page deduplication limit 2017-07-06 16:24:31 -07:00
w1
watchdog
wimax
x86
xtensa of: update ePAPR references to point to Devicetree Specification 2017-06-22 11:22:06 -05:00
.gitignore
00-INDEX linux-kselftest-4.13-rc1-update 2017-07-07 14:04:47 -07:00
Changes
CodingStyle
DMA-API-HOWTO.txt dma-mapping: remove DMA_ERROR_CODE 2017-06-28 06:54:37 -07:00
DMA-API.txt
DMA-ISA-LPC.txt
DMA-attributes.txt
IPMI.txt
IRQ-affinity.txt
IRQ-domain.txt
IRQ.txt
Intel-IOMMU.txt
Makefile
SAK.txt
SM501.txt
SubmittingPatches
bcache.txt
bt8xxgpio.txt
btmrvl.txt
bus-virt-phys-mapping.txt
cachetlb.txt
cgroup-v2.txt mm/oom_kill: count global and memory cgroup oom kills 2017-07-06 16:24:35 -07:00
circular-buffers.txt
clk.txt
conf.py Docs: Fix breakage with Sphinx 1.5 and upper 2017-06-23 13:45:37 -06:00
cpu-load.txt
cputopology.txt
crc32.txt
dcdbas.txt
debugging-modules.txt
debugging-via-ohci1394.txt
dell_rbu.txt
digsig.txt
docutils.conf
dontdiff GCC plugin updates: 2017-07-05 11:46:59 -07:00
efi-stub.txt
eisa.txt
flexible-arrays.txt
futex-requeue-pi.txt
gcc-plugins.txt
highuid.txt
hw_random.txt
hwspinlock.txt
index.rst Make the main documentation title less Geocities 2017-06-23 14:02:27 -06:00
intel_txt.txt
io-mapping.txt
io_ordering.txt
iostats.txt
irqflags-tracing.txt
isa.txt
isapnp.txt
kernel-doc-nano-HOWTO.txt
kernel-per-CPU-kthreads.txt
kobject.txt
kprobes.txt
kref.txt
ldm.txt
lockup-watchdogs.txt
logo.gif
logo.txt
lsm.txt
lzo.txt
mailbox.txt
memory-barriers.txt There has been a fair amount of activity in the docs tree this time 2017-07-03 21:13:25 -07:00
memory-hotplug.txt
men-chameleon-bus.txt
nommu-mmap.txt
ntb.txt
numastat.txt
padata.txt
parport-lowlevel.txt
percpu-rw-semaphore.txt
phy.txt
pi-futex.txt
pnp.txt
preempt-locking.txt
printk-formats.txt vsprintf: Add %p extension "%pOF" for device tree 2017-06-27 12:36:40 -05:00
pwm.txt
rbtree.txt
remoteproc.txt
rfkill.txt
robust-futex-ABI.txt
robust-futexes.txt
rpmsg.txt
rtc.txt
sgi-ioc4.txt
siphash.txt
smsc_ece1099.txt
static-keys.txt
svga.txt
switchtec.txt
sync_file.txt
tee.txt
this_cpu_ops.txt
unaligned-memory-access.txt
vfio-mediated-device.txt
vfio.txt
video-output.txt
xillybus.txt
xz.txt
zorro.txt