linux/include/uapi/linux
Andrea Arcangeli dd0db88d80 userfaultfd: non-cooperative: rollback userfaultfd_exit
Patch series "userfaultfd non-cooperative further update for 4.11 merge
window".

Unfortunately I noticed one relevant bug in userfaultfd_exit while doing
more testing.  The code had been tested before, including by the kbuild
bot and the selftest, but this bug had never reproduced.

I dropped userfaultfd_exit as a result, both because of the
implementation difficulty of receiving signals in __mmput and because I
think -ENOSPC as the result of the background UFFDIO_COPY should be
enough already.

Before I decided to remove userfaultfd_exit, I noticed that it wasn't
exercised by the selftest.  When I tried to exercise it, after moving it
to a more correct place in __mmput where the vma list is stable, it left
userfaultfd_event_wait_completion stuck in D state.  So I added the
second patch to be sure that even if we call
userfaultfd_event_wait_completion too late during task exit(), we won't
risk leaving tasks in D state.  The same check exists in
handle_userfault() for the same reason, except it makes a difference
there, while here it is just a robustness check and it's run under
WARN_ON_ONCE.

While looking at userfaultfd_event_wait_completion() I also looked back
at its callers, and I think it's not ok to stop executing dup_fctx on
the fcs list, because we rely on userfaultfd_event_wait_completion to
execute userfaultfd_ctx_put(fctx->orig), which is paired against
userfaultfd_ctx_get(fctx->orig) in dup_userfault just before
list_add(fcs).  This change only takes care of fctx->orig, but this area
also needs further review looking for similar problems in fctx->new.
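
As a rough illustration of the get/put pairing described above, here is
a toy userspace model.  It is not the kernel code: struct uffd_ctx,
struct fork_ctx, queue_entries() and complete_entries() are hypothetical
stand-ins, and the "failure" is simulated with a plain index.

```c
#include <stddef.h>

struct uffd_ctx { int refcount; };          /* stand-in for struct userfaultfd_ctx */
struct fork_ctx { struct uffd_ctx *orig; }; /* stand-in for one entry on the fcs list */

static void ctx_get(struct uffd_ctx *ctx) { ctx->refcount++; }
static void ctx_put(struct uffd_ctx *ctx) { ctx->refcount--; }

/* Queue phase: take one reference per entry before it is added to the list. */
static void queue_entries(struct fork_ctx *fcs, size_t n, struct uffd_ctx *orig)
{
	for (size_t i = 0; i < n; i++) {
		ctx_get(orig);
		fcs[i].orig = orig;
	}
}

/*
 * Completion phase: each processed entry drops the reference taken above.
 * If stop_on_error is nonzero the walk aborts after entry fail_at, so the
 * references held by the remaining entries are never dropped.
 */
static void complete_entries(struct fork_ctx *fcs, size_t n,
			     size_t fail_at, int stop_on_error)
{
	for (size_t i = 0; i < n; i++) {
		ctx_put(fcs[i].orig); /* models the put done on completion */
		if (stop_on_error && i == fail_at)
			break;
	}
}
```

In this model, aborting the walk early leaves the reference count above
its starting value, while processing every entry restores it.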

The only urgent patch is the first, because it fixes a use-after-free
during an SMP race condition that affects all processes if
CONFIG_USERFAULTFD=y.  It is very hard to reproduce though, and probably
impossible without SLUB poisoning enabled.

This patch (of 3):

I once reproduced this oops with the userfaultfd selftest.  It is not
easily reproducible and it requires SLUB poisoning to reproduce.

    general protection fault: 0000 [#1] SMP
    Modules linked in:
    CPU: 2 PID: 18421 Comm: userfaultfd Tainted: G               ------------ T 3.10.0+ #15
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.1-0-g8891697-prebuilt.qemu-project.org 04/01/2014
    task: ffff8801f83b9440 ti: ffff8801f833c000 task.ti: ffff8801f833c000
    RIP: 0010:[<ffffffff81451299>]  [<ffffffff81451299>] userfaultfd_exit+0x29/0xa0
    RSP: 0018:ffff8801f833fe80  EFLAGS: 00010202
    RAX: ffff8801f833ffd8 RBX: 6b6b6b6b6b6b6b6b RCX: ffff8801f83b9440
    RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8800baf18600
    RBP: ffff8801f833fee8 R08: 0000000000000000 R09: 0000000000000001
    R10: 0000000000000000 R11: ffffffff8127ceb3 R12: 0000000000000000
    R13: ffff8800baf186b0 R14: ffff8801f83b99f8 R15: 00007faed746c700
    FS:  0000000000000000(0000) GS:ffff88023fc80000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 00007faf0966f028 CR3: 0000000001bc6000 CR4: 00000000000006e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Call Trace:
      do_exit+0x297/0xd10
      SyS_exit+0x17/0x20
      tracesys+0xdd/0xe2
    Code: 00 00 66 66 66 66 90 55 48 89 e5 41 54 53 48 83 ec 58 48 8b 1f 48 85 db 75 11 eb 73 66 0f 1f 44 00 00 48 8b 5b 10 48 85 db 74 64 <4c> 8b a3 b8 00 00 00 4d 85 e4 74 eb 41 f6 84 24 2c 01 00 00 80
    RIP  [<ffffffff81451299>] userfaultfd_exit+0x29/0xa0
     RSP <ffff8801f833fe80>
    ---[ end trace 9fecd6dcb442846a ]---

In the debugger I located the "mm" pointer in the stack, and walking
mm->mmap->vm_next through to the end shows that the vma->vm_next list is
fully consistent and null terminated as expected.  So this has to be an
SMP race condition where userfaultfd_exit was running while the vma list
was being modified by another CPU.

When userfaultfd_exit() ran, one of the ->vm_next pointers pointed to
SLAB_POISON (RBX is the vma pointer and is 0x6b6b..).
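
For readers triaging similar oopses, a small sketch of how the 0x6b
pattern can be recognized in a register value.  The helper name
looks_slab_poisoned() is hypothetical; 0x6b is the POISON_FREE byte from
include/linux/poison.h that is written over freed slab objects when
poisoning is enabled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Byte written over freed slab objects (POISON_FREE in linux/poison.h). */
#define POISON_FREE_BYTE 0x6bu

/* Return true if every byte of the 64-bit value equals the poison byte. */
static bool looks_slab_poisoned(uint64_t value)
{
	for (int i = 0; i < 8; i++) {
		if (((value >> (8 * i)) & 0xffu) != POISON_FREE_BYTE)
			return false;
	}
	return true;
}
```

Applied to the register dump above, the RBX value matches the pattern
while an ordinary kernel pointer such as the one in RCX does not.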

The reason is that it's not running in __mmput but while there are still
other threads running, and it's not holding the mmap_sem (it can't, as
it has to wait for the event to be received by the manager).  So this is
a use-after-free that was happening for all processes.

One more implementation problem aside from the race condition:
userfaultfd_exit really has to check a flag in mm->flags before walking
the vma list, or it's going to slow down the exit() path for regular
tasks.

One more implementation problem: at that point signals can't be
delivered so it would also create a task in D state if the manager
doesn't read the event.

The major design issue: it overall looks superfluous as the manager can
check for -ENOSPC in the background transfer:

	if (mmget_not_zero(ctx->mm)) {
[..]
	} else {
		return -ENOSPC;
	}
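
On the manager side, that check might look like the sketch below.  This
is only an assumption about how a monitor could be structured: the
helper names classify_copy() and background_copy() and the enum are
hypothetical, while UFFDIO_COPY and struct uffdio_copy come from
<linux/userfaultfd.h>.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

enum copy_outcome {
	COPY_OK,          /* page(s) installed in the monitored process */
	COPY_TARGET_GONE, /* ENOSPC: the monitored mm has already gone away */
	COPY_ERROR        /* any other failure */
};

/* Map an ioctl() return value and errno to an outcome (hypothetical helper). */
static enum copy_outcome classify_copy(int ret, int err)
{
	if (ret == 0)
		return COPY_OK;
	return err == ENOSPC ? COPY_TARGET_GONE : COPY_ERROR;
}

/* Issue one background UFFDIO_COPY and classify the result. */
static enum copy_outcome background_copy(int uffd, uint64_t dst,
					 uint64_t src, uint64_t len)
{
	struct uffdio_copy copy;
	int ret, err;

	memset(&copy, 0, sizeof(copy));
	copy.dst = dst;
	copy.src = src;
	copy.len = len;

	ret = ioctl(uffd, UFFDIO_COPY, &copy);
	err = errno; /* read errno only after the call has returned */
	return classify_copy(ret, err);
}
```

A manager that sees COPY_TARGET_GONE can stop the background transfer
for that process without needing a separate exit event.  Note that
later kernels report ESRCH instead of ENOSPC for this case, so a
portable monitor may want to treat both the same way.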

It's safer to roll it back and re-introduce it later if at all.

[rppt@linux.vnet.ibm.com: documentation fixup after removal of UFFD_EVENT_EXIT]
  Link: http://lkml.kernel.org/r/1488345437-4364-1-git-send-email-rppt@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/20170224181957.19736-2-aarcange@redhat.com
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-09 17:01:09 -08:00
..
android binder: Add support for file-descriptor arrays 2017-02-10 16:00:01 +01:00
byteorder
caif
can can: dev: add CAN interface API for fixed bitrates 2017-01-24 13:52:00 +01:00
cifs
dvb
genwqe
hdlc
hsi
iio iio: Add channel for Gravity 2017-01-05 13:02:25 +00:00
isdn
mmc
netfilter uapi: fix linux/netfilter/xt_hashlimit.h userspace compilation error 2017-02-25 13:32:04 +01:00
netfilter_arp
netfilter_bridge
netfilter_ipv4
netfilter_ipv6
nfsd nfsd: opt in to labeled nfs per export 2017-01-31 12:31:54 -05:00
raid
sched sched/headers: Move various ABI definitions to <uapi/linux/sched/types.h> 2017-03-02 08:42:42 +01:00
spi
sunrpc
tc_act net/act_pedit: Introduce 'add' operation 2017-02-10 13:18:33 -05:00
tc_ematch
usb usb: gadget: f_fs: Document eventfd effect on descriptor format. 2017-01-02 10:55:28 +02:00
wimax
Kbuild virtio, vhost: optimizations, fixes 2017-03-02 13:53:13 -08:00
a.out.h
acct.h
adb.h
adfs_fs.h
affs_hardblocks.h
agpgart.h
aio_abi.h
am437x-vpfe.h
apm_bios.h
arcfb.h
atalk.h
atm.h
atm_eni.h
atm_he.h
atm_idt77105.h
atm_nicstar.h
atm_tcp.h
atm_zatm.h
atmapi.h
atmarp.h
atmbr2684.h
atmclip.h
atmdev.h
atmioc.h
atmlec.h
atmmpc.h
atmppp.h
atmsap.h
atmsvc.h
audit.h Merge branch 'stable-4.11' of git://git.infradead.org/users/pcmoore/audit 2017-02-21 13:25:50 -08:00
auto_dev-ioctl.h autofs: remove duplicated AUTOFS_DEV_IOCTL_SIZE definition 2017-02-27 18:43:45 -08:00
auto_fs.h autofs: add command enum/macros for root-dir ioctls 2017-02-27 18:43:45 -08:00
auto_fs4.h autofs: add command enum/macros for root-dir ioctls 2017-02-27 18:43:45 -08:00
auxvec.h
ax25.h
b1lli.h
batman_adv.h batman-adv: update copyright years for 2017 2017-01-26 08:34:19 +01:00
baycom.h
bcache.h
bcm933xx_hcs.h
bfs_fs.h
binfmts.h
blkpg.h
blktrace_api.h
blkzoned.h
bpf.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-02-16 19:34:01 -05:00
bpf_common.h
bpf_perf_event.h
bpqether.h
bsg.h
bt-bmc.h
btrfs.h
btrfs_tree.h
can.h
capability.h
capi.h
cciss_defs.h
cciss_ioctl.h
cdrom.h
cec-funcs.h [media] cec: fix report_current_latency 2016-12-21 06:59:13 -02:00
cec.h
cgroupstats.h
chio.h
cm4000_cs.h
cn_proc.h
coda.h
coda_psdev.h
coff.h
connector.h
const.h
coresight-stm.h
cramfs_fs.h
cryptouser.h
cuda.h
cyclades.h
cycx_cfm.h
dcbnl.h
dccp.h
devlink.h devlink: fix the name of eswitch commands 2017-02-10 14:43:00 -05:00
dlm.h
dlm_device.h
dlm_netlink.h
dlm_plock.h
dlmconstants.h
dm-ioctl.h
dm-log-userspace.h
dma-buf.h
dn.h
dqblk_xfs.h
edd.h
efs_fs_sb.h
elf-em.h
elf-fdpic.h
elf.h
elfcore.h
errno.h
errqueue.h
ethtool.h net: ethtool: add support for 2500BaseT and 5000BaseT link modes 2017-01-30 10:14:28 -05:00
eventpoll.h
fadvise.h
falloc.h
fanotify.h
fb.h
fcntl.h statx: Add a system call to make enhanced file info available 2017-03-02 20:51:15 -05:00
fd.h
fdreg.h
fib_rules.h
fiemap.h
filter.h
firewire-cdev.h
firewire-constants.h
flat.h
fou.h
fs.h fs: Better permission checking for submounts 2017-02-02 04:36:12 +13:00
fsl_hypervisor.h
fuse.h
futex.h
gameport.h
gen_stats.h
genetlink.h
gfs2_ondisk.h
gigaset_dev.h
gpio.h
gsmmux.h
gtp.h
hash_info.h
hdlc.h
hdlcdrv.h
hdreg.h
hid.h
hiddev.h
hidraw.h
hpet.h
hsr_netlink.h
hw_breakpoint.h
hyperv.h
hysdn_if.h
i2c-dev.h
i2c.h
i2o-dev.h
i8k.h
icmp.h
icmpv6.h
if.h uapi: fix linux/if.h userspace compilation errors 2017-02-22 16:09:04 -05:00
if_addr.h
if_addrlabel.h
if_alg.h
if_arcnet.h
if_arp.h
if_bonding.h
if_bridge.h bridge: uapi: add per vlan tunnel info 2017-02-03 15:21:21 -05:00
if_cablemodem.h
if_eql.h
if_ether.h RDMA: Adding ethertype ETH_P_IBOE 2017-01-10 14:05:11 -05:00
if_fc.h
if_fddi.h
if_frad.h
if_hippi.h
if_infiniband.h
if_link.h bridge: uapi: add per vlan tunnel info 2017-02-03 15:21:21 -05:00
if_ltalk.h
if_macsec.h
if_packet.h
if_phonet.h
if_plip.h
if_ppp.h
if_pppol2tp.h
if_pppox.h
if_slip.h
if_team.h
if_tun.h
if_tunnel.h
if_vlan.h
if_x25.h
ife.h net: Introduce ife encapsulation module 2017-02-03 15:16:45 -05:00
igmp.h bridge: sparse fixes in br_ip6_multicast_alloc_query() 2017-01-17 15:22:05 -05:00
ila.h
in.h
in6.h
in_route.h
inet_diag.h
inotify.h
input-event-codes.h
input.h
ioctl.h
ip.h
ip6_tunnel.h uapi: fix linux/ip6_tunnel.h userspace compilation errors 2017-02-23 10:46:07 -05:00
ip_vs.h
ipc.h
ipmi.h
ipmi_msgdefs.h
ipsec.h
ipv6.h net/ipv6: allow sysctl to change link-local address generation mode 2017-01-27 10:25:34 -05:00
ipv6_route.h uapi: fix linux/ipv6_route.h userspace compilation errors 2017-02-19 18:15:12 -05:00
ipx.h
irda.h
irqnr.h
isdn.h
isdn_divertif.h
isdn_ppp.h
isdnif.h
iso_fs.h
ivtv.h
ivtvfb.h
ixjuser.h
jffs2.h
joystick.h
kcm.h
kcmp.h
kcov.h
kd.h
kdev_t.h
kernel-page-flags.h
kernel.h
kernelcapi.h
kexec.h
keyboard.h
keyctl.h
kfd_ioctl.h
kvm.h KVM: race-free exit from KVM_RUN without POSIX signals 2017-02-17 12:27:37 +01:00
kvm_para.h KVM: x86: add KVM_HC_CLOCK_PAIRING hypercall 2017-02-07 18:16:45 +01:00
l2tp.h uapi: fix linux/if_pppol2tp.h userspace compilation errors 2017-02-14 22:18:05 -05:00
libc-compat.h
lightnvm.h lightnvm: add ioctls for vector I/Os 2017-01-31 08:32:13 -07:00
limits.h
lirc.h
llc.h uapi: fix linux/llc.h userspace compilation error 2017-02-23 10:46:08 -05:00
loop.h
lp.h
lwtunnel.h
magic.h
major.h
map_to_7segment.h
matroxfb.h
mdio.h
media-bus-format.h
media.h
mei.h
membarrier.h
memfd.h
mempolicy.h
meye.h
mic_common.h
mic_ioctl.h
mii.h
minix_fs.h
mman.h
mmtimer.h
module.h
mpls.h mpls: Packet stats 2017-01-17 14:38:43 -05:00
mpls_iptunnel.h
mqueue.h uapi: mqueue.h: add missing linux/types.h include 2017-02-24 17:46:56 -08:00
mroute.h uapi: fix linux/mroute.h userspace compilation errors 2017-02-19 18:15:12 -05:00
mroute6.h uapi: fix linux/mroute6.h userspace compilation errors 2017-02-19 18:15:12 -05:00
msdos_fs.h
msg.h
mtio.h
n_r3964.h
nbd.h
ncp.h
ncp_fs.h
ncp_mount.h
ncp_no.h
ndctl.h
neighbour.h vxlan: support fdb and learning in COLLECT_METADATA mode 2017-02-03 15:21:21 -05:00
net.h
net_dropmon.h
net_namespace.h
net_tstamp.h
netconf.h net: mpls: Add support for netconf 2017-02-20 11:13:37 -05:00
netdevice.h
netfilter.h uapi: stop including linux/sysctl.h in uapi/linux/netfilter.h 2017-02-23 21:51:39 +01:00
netfilter_arp.h
netfilter_bridge.h
netfilter_decnet.h
netfilter_ipv4.h
netfilter_ipv6.h
netlink.h smc: netlink interface for SMC sockets 2017-01-09 16:07:41 -05:00
netlink_diag.h
netrom.h
nfc.h
nfs.h
nfs2.h
nfs3.h
nfs4.h
nfs4_mount.h
nfs_fs.h
nfs_idmap.h
nfs_mount.h
nfsacl.h
nilfs2_api.h
nilfs2_ondisk.h
nl80211.h cfg80211: fix NAN bands definition 2017-02-09 15:17:30 +01:00
nsfs.h nsfs: Add an ioctl() to return owner UID of a userns 2017-02-03 14:35:43 +13:00
nubus.h
nvme_ioctl.h
nvram.h
omap3isp.h
omapfb.h
oom.h
openvswitch.h openvswitch: Add force commit. 2017-02-09 22:59:34 -05:00
packet_diag.h
param.h
parport.h
patchkey.h
pci.h
pci_regs.h Merge branch 'pci/dpc' into next 2017-02-15 11:56:07 -06:00
perf_event.h
personality.h
pfkeyv2.h
pg.h
phantom.h
phonet.h
pkt_cls.h net/sched: Reflect HW offload status 2017-02-17 12:08:05 -05:00
pkt_sched.h
pktcdvd.h
pmu.h
poll.h
posix_acl.h
posix_acl_xattr.h
posix_types.h
ppdev.h
ppp-comp.h
ppp-ioctl.h
ppp_defs.h
pps.h
pr.h
prctl.h
psample.h net: Introduce psample, a new genetlink channel for packet sampling 2017-01-24 13:44:28 -05:00
psci.h
ptp_clock.h
ptrace.h
qnx4_fs.h
qnxtypes.h
qrtr.h
quota.h
radeonfb.h
random.h
raw.h
rds.h uapi: fix linux/rds.h userspace compilation errors 2017-02-23 10:55:08 -05:00
reboot.h
reiserfs_fs.h
reiserfs_xattr.h
resource.h
rfkill.h
rio_cm_cdev.h
rio_mport_cdev.h
romfs_fs.h
rose.h
route.h
rpmsg.h rpmsg: Driver for user space endpoint interface 2017-01-18 10:43:15 -08:00
rtc.h
rtnetlink.h net: mpls: Add support for netconf 2017-02-20 11:13:37 -05:00
scc.h
sched.h
scif_ioctl.h
screen_info.h
sctp.h sctp: add support for generating stream ssn reset event notification 2017-02-19 18:17:59 -05:00
sdla.h
seccomp.h
securebits.h
sed-opal.h uapi: sed-opal fix IOW for activate lsp to use correct struct 2017-02-14 19:47:16 -07:00
seg6.h uapi: fix linux/seg6.h and linux/seg6_iptunnel.h userspace compilation errors 2017-02-23 10:55:08 -05:00
seg6_genl.h
seg6_hmac.h ipv6: sr: add missing Kbuild export for header files 2017-01-16 14:47:21 -05:00
seg6_iptunnel.h uapi: fix linux/seg6.h and linux/seg6_iptunnel.h userspace compilation errors 2017-02-23 10:55:08 -05:00
selinux_netlink.h
sem.h
serial.h
serial_core.h serial: 8250: Add new port type for TI DA8xx/66AK2x 2017-01-12 11:51:25 +01:00
serial_reg.h serial: exar: Move register defines from uapi header to consumer site 2017-02-10 15:13:26 +01:00
serio.h Input: psmouse - add a custom serio protocol to send extra information 2017-02-09 11:43:15 -08:00
shm.h
signal.h
signalfd.h
smc.h smc: establish pnet table management 2017-01-09 16:07:38 -05:00
smc_diag.h smc: netlink interface for SMC sockets 2017-01-09 16:07:41 -05:00
smiapp.h
snmp.h net: add LINUX_MIB_PFMEMALLOCDROP counter 2017-02-02 23:34:19 -05:00
sock_diag.h
socket.h
sockios.h
sonet.h
sonypi.h
sound.h
soundcard.h
stat.h statx: Add a system call to make enhanced file info available 2017-03-02 20:51:15 -05:00
stddef.h
stm.h
string.h
suspend_ioctls.h
swab.h
sync_file.h
synclink.h
sysctl.h
sysinfo.h
target_core_user.h uapi: fix linux/target_core_user.h userspace compilation errors 2017-02-18 21:44:59 -08:00
taskstats.h
tcp.h tcp: record pkts sent and retransmistted 2017-01-29 19:17:23 -05:00
tcp_metrics.h
telephony.h
termios.h
thermal.h
time.h
timerfd.h timerfd: export defines to userspace 2017-01-10 18:31:55 -08:00
times.h
timex.h
tiocl.h
tipc.h tipc: make replicast a user selectable option 2017-01-20 12:10:17 -05:00
tipc_config.h
tipc_netlink.h
toshiba.h
tty.h
tty_flags.h
types.h
udf_fs_i.h
udp.h
uhid.h
uinput.h
uio.h
uleds.h
ultrasound.h
un.h unix: add ioctl to open a unix socket file with O_PATH 2017-02-02 21:58:02 -05:00
unistd.h
unix_diag.h
usbdevice_fs.h
usbip.h
userfaultfd.h userfaultfd: non-cooperative: rollback userfaultfd_exit 2017-03-09 17:01:09 -08:00
userio.h
utime.h
utsname.h
uuid.h
uvcvideo.h
v4l2-common.h
v4l2-controls.h
v4l2-dv-timings.h
v4l2-mediabus.h
v4l2-subdev.h
veth.h
vfio.h
vhost.h
videodev2.h [media] videodev2.h: go back to limited range Y'CbCr for SRGB and, ADOBERGB 2017-02-13 14:33:56 -02:00
virtio_9p.h
virtio_balloon.h
virtio_blk.h
virtio_config.h
virtio_console.h
virtio_crypto.h
virtio_gpu.h
virtio_ids.h
virtio_input.h
virtio_mmio.h virtio_mmio: expose header to userspace 2017-02-27 16:31:23 +02:00
virtio_net.h
virtio_pci.h virtio_pci: don't duplicate the msix_enable flag in struct pci_dev 2017-02-27 20:54:03 +02:00
virtio_ring.h
virtio_rng.h
virtio_scsi.h
virtio_types.h
virtio_vsock.h
vm_sockets.h
vt.h
vtpm_proxy.h
wait.h
wanrouter.h
watchdog.h
wil6210_uapi.h
wimax.h
wireless.h
x25.h
xattr.h
xfrm.h
xilinx-v4l2-controls.h
zorro.h
zorro_ids.h