linux/drivers
Ira Weiny 932f4a630a mm/gup: replace get_user_pages_longterm() with FOLL_LONGTERM
Patch series "Add FOLL_LONGTERM to GUP fast and use it".

HFI1, qib, and mthca use get_user_pages_fast() due to its performance
advantages.  These pages can be held for a significant time.  But
get_user_pages_fast() does not protect against mapping FS DAX pages.

Introduce FOLL_LONGTERM and use this flag in get_user_pages_fast() which
retains the performance while also adding the FS DAX checks.  XDP has also
shown interest in using this functionality.[1]

In addition we change get_user_pages() to use the new FOLL_LONGTERM flag
and remove the specialized get_user_pages_longterm call.

[1] https://lkml.org/lkml/2019/3/19/939

This patch (of 7):

This patch starts a series which aims to support FOLL_LONGTERM in
get_user_pages_fast().  Some callers would like to do a longterm (user
controlled) pin of pages with the fast variant of GUP for performance
purposes.

Rather than have a separate get_user_pages_longterm() call, introduce
FOLL_LONGTERM and change the longterm callers to use it.
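
The conversion described above can be pictured with a small user-space
model.  This is only a sketch: every model_ name and the flag bit values
are hypothetical and do not match the kernel's definitions.  It shows a
dedicated longterm entry point collapsing into the general call plus a
flag.

```c
#include <stddef.h>

/* Illustrative flag bits only; these are NOT the kernel's values. */
#define MODEL_FOLL_WRITE    (1u << 0)
#define MODEL_FOLL_LONGTERM (1u << 1)

/* Records what the single entry point was asked to do. */
struct model_request {
	unsigned long start;
	unsigned long nr_pages;
	int writable;
	int longterm;
};

/* Single entry point: the kind of pin is carried in gup_flags. */
static struct model_request model_get_user_pages(unsigned long start,
						 unsigned long nr_pages,
						 unsigned int gup_flags)
{
	struct model_request req = {
		.start = start,
		.nr_pages = nr_pages,
		.writable = (gup_flags & MODEL_FOLL_WRITE) != 0,
		.longterm = (gup_flags & MODEL_FOLL_LONGTERM) != 0,
	};

	return req;
}

/*
 * Old-style dedicated call, kept here only to show what callers are
 * converted away from: it is equivalent to OR-ing in the flag.
 */
static struct model_request model_get_user_pages_longterm(unsigned long start,
							  unsigned long nr_pages,
							  unsigned int gup_flags)
{
	return model_get_user_pages(start, nr_pages,
				    gup_flags | MODEL_FOLL_LONGTERM);
}
```

In this model a converted caller passes MODEL_FOLL_LONGTERM itself, so
the wrapper can be dropped without changing what is requested.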

This patch does not change any functionality.  For now, "longterm" (user
controlled) pins are unsafe for filesystems, and FS DAX in particular has
been blocked.  However, callers of get_user_pages_fast() were not
"protected" by that block.

FOLL_LONGTERM can _only_ be supported with get_user_pages[_fast]() as it
requires vmas to determine if DAX is in use.
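
Why the vmas are needed can be sketched in user space as follows.  The
types and helpers are hypothetical stand-ins, and failing the whole
request with -EOPNOTSUPP when an FS DAX mapping is found is an assumption
of this model, not something this message states.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag bit only; NOT the kernel's value. */
#define MODEL_FOLL_LONGTERM (1u << 0)

/* Hypothetical stand-in for a vma: only the property of interest. */
struct model_vma {
	bool is_fsdax;
};

/* Return true if any of the nr vmas is backed by FS DAX. */
static bool model_any_fsdax(const struct model_vma *const *vmas, long nr)
{
	for (long i = 0; i < nr; i++)
		if (vmas[i] && vmas[i]->is_fsdax)
			return true;
	return false;
}

/*
 * nr_pinned is the result of an ordinary pin (not modelled here) and
 * vmas[i] is the vma backing page i.  Without the longterm flag the
 * result passes through; with it, any FS DAX vma fails the request.
 */
static long model_longterm_filter(long nr_pinned,
				  const struct model_vma *const *vmas,
				  unsigned int gup_flags)
{
	if (nr_pinned <= 0 || !(gup_flags & MODEL_FOLL_LONGTERM))
		return nr_pinned;

	if (model_any_fsdax(vmas, nr_pinned))
		return -EOPNOTSUPP; /* a real caller would also drop the pins */

	return nr_pinned;
}
```

The point of the sketch is that the decision is made per vma, so an
interface that never sees the vmas has nothing to check against.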

NOTE: In merging with the CMA changes we opt to change the
get_user_pages() call in check_and_migrate_cma_pages() to a call of
__get_user_pages_locked() on the newly migrated pages.  This makes the
code read better in that we are calling __get_user_pages_locked() on the
pages before and after a potential migration.

As a side effect, some of the interfaces are cleaned up, but this is not
the primary purpose of the series.

In review[1] it was asked:

<quote>
> This I don't get - if you do lock down long term mappings performance
> of the actual get_user_pages call shouldn't matter to start with.
>
> What do I miss?

A couple of points.

First "longterm" is a relative thing and at this point is probably a
misnomer.  This is really flagging a pin which is going to be given to
hardware and can't move.  I've thought of a couple of alternative names
but I think we have to settle on if we are going to use FL_LAYOUT or
something else to solve the "longterm" problem.  Then I think we can
change the flag to a better name.

Second, it depends on how often you are registering memory.  I have spoken
with some RDMA users who consider MR to be in the performance path for
overall application performance.  I don't have the numbers as the tests
for HFI1 were done a long time ago.  But there was a significant
advantage.  Some of which is probably due to the fact that you don't have
to hold mmap_sem.

Finally, architecturally I think it would be good for everyone to use
*_fast.  There are patches submitted to the RDMA list which would allow
the use of *_fast (they rework the use of mmap_sem) and as soon as they
are accepted I'll submit a patch to convert the RDMA core as well.  Also
to this point others are looking to use *_fast.

As an aside, Jason pointed out in my previous submission that *_fast and
*_unlocked look very much the same.  I agree and I think further cleanup
will be coming.  But I'm focused on getting the final solution for DAX at
the moment.

</quote>

[1] https://lore.kernel.org/lkml/20190220180255.GA12020@iweiny-DESK2.sc.intel.com/T/#md6abad2569f3bf6c1f03686c8097ab6563e94965

[ira.weiny@intel.com: v3]
  Link: http://lkml.kernel.org/r/20190328084422.29911-2-ira.weiny@intel.com
Link: http://lkml.kernel.org/r/20190328084422.29911-2-ira.weiny@intel.com
Link: http://lkml.kernel.org/r/20190317183438.2057-2-ira.weiny@intel.com
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: James Hogan <jhogan@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Mike Marshall <hubcap@omnibond.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14 09:47:45 -07:00
..
accessibility
acpi IOMMU Updates for Linux v5.2 2019-05-13 09:23:18 -04:00
amba
android Char/Misc patches for 5.2-rc1 - part 2 2019-05-07 13:39:22 -07:00
ata
atm
auxdisplay
base Driver core/kobject patches for 5.2-rc1 2019-05-07 13:01:40 -07:00
bcma
block Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2019-05-07 22:03:58 -07:00
bluetooth Bluetooth: hci_qca: Rename STATE_<flags> to QCA_<flags> 2019-05-05 19:34:00 +02:00
bus
cdrom
char Some minor cleanups for the IPMI driver. 2019-05-08 10:34:17 -07:00
clk Merge branch 'clk-parent-rewrite-1' into clk-next 2019-05-07 11:46:13 -07:00
clocksource Kbuild updates for v5.2 2019-05-08 12:25:12 -07:00
connector
counter
cpufreq Printk changes for 5.2 2019-05-07 09:18:12 -07:00
cpuidle
crypto Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2019-05-06 20:15:06 -07:00
dax mm/huge_memory: fix vmf_insert_pfn_{pmd, pud}() crash, handle unaligned addresses 2019-05-14 09:47:44 -07:00
dca
devfreq
dio
dma dmaengine updates for v5.2-rc1 2019-05-09 08:51:45 -07:00
dma-buf drm pull request for 5.2 2019-05-08 21:35:19 -07:00
edac Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2019-05-06 19:54:57 -07:00
eisa
extcon Char/Misc patches for 5.2-rc1 - part 2 2019-05-07 13:39:22 -07:00
firewire stream_open related patches for Linux 5.2 2019-05-07 12:15:13 -07:00
firmware Char/Misc patches for 5.2-rc1 - part 2 2019-05-07 13:39:22 -07:00
fmc
fpga
fsi
gnss Char/Misc patches for 5.2-rc1 - part 2 2019-05-07 13:39:22 -07:00
gpio This is the bulk of the GPIO changes for the v5.2 kernel cycle: 2019-05-11 10:54:43 -04:00
gpu drm pull request for 5.2 2019-05-08 21:35:19 -07:00
hid stream_open related patches for Linux 5.2 2019-05-07 12:15:13 -07:00
hsi
hv
hwmon stream_open related patches for Linux 5.2 2019-05-07 12:15:13 -07:00
hwspinlock
hwtracing Char/Misc patches for 5.2-rc1 - part 2 2019-05-07 13:39:22 -07:00
i2c Merge branch 'i2c/for-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux 2019-05-09 14:41:55 -07:00
i3c * Fix a shift wrap bug in the core 2019-05-07 08:50:40 -07:00
ide ide: officially deprecated the legacy IDE driver 2019-05-08 16:47:23 -07:00
idle
iio Staging / IIO driver patches for 5.2-rc1 2019-05-07 13:31:29 -07:00
infiniband mm/gup: replace get_user_pages_longterm() with FOLL_LONGTERM 2019-05-14 09:47:45 -07:00
input *: convert stream-like files from nonseekable_open -> stream_open 2019-05-06 17:46:41 +03:00
interconnect
iommu IOMMU Updates for Linux v5.2 2019-05-13 09:23:18 -04:00
ipack
irqchip Driver core/kobject patches for 5.2-rc1 2019-05-07 13:01:40 -07:00
isdn Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2019-05-07 22:03:58 -07:00
leds LED updates for 5.2-rc1. 2019-05-07 18:02:51 -07:00
lightnvm
macintosh
mailbox - New driver: Armada 37xx mailbox controller 2019-05-10 12:55:16 -04:00
mcb
md for-5.2/block-20190507 2019-05-07 18:14:36 -07:00
media mm/gup: replace get_user_pages_longterm() with FOLL_LONGTERM 2019-05-14 09:47:45 -07:00
memory This pull request contains the following changes for MTD: 2019-05-12 17:57:52 -04:00
memstick MMC core: 2019-05-07 12:56:19 -07:00
message
mfd dmaengine updates for v5.2-rc1 2019-05-09 08:51:45 -07:00
misc powerpc updates for 5.2 2019-05-10 05:29:27 -07:00
mmc MMC core: 2019-05-07 12:56:19 -07:00
mtd This pull request contains the following changes for UBI/UBIFS 2019-05-12 18:16:31 -04:00
mux
net Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-05-09 17:00:51 -07:00
nfc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2019-05-07 22:03:58 -07:00
ntb
nubus
nvdimm for-5.2/block-20190507 2019-05-07 18:14:36 -07:00
nvme SCSI misc on 20190507 2019-05-08 10:12:46 -07:00
nvmem
of Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2019-05-07 22:03:58 -07:00
opp
oprofile
parisc parisc: Skip registering LED when running in QEMU 2019-05-03 23:47:39 +02:00
parport DMA mapping updates for 5.2 2019-05-09 08:40:55 -07:00
pci stream_open related patches for Linux 5.2 2019-05-07 12:15:13 -07:00
pcmcia
perf
phy USB/PHY patches for 5.2-rc1 2019-05-08 10:03:52 -07:00
pinctrl This is the bulk of the GPIO changes for the v5.2 kernel cycle: 2019-05-11 10:54:43 -04:00
platform chrome platform changes for v5.2 2019-05-12 07:00:21 -04:00
pnp
power Power Supply Fixes for 5.1 cycle 2019-05-01 14:57:23 -07:00
powercap
pps
ps3
ptp ptp_qoriq: fix NULL access if ptp dt node missing 2019-05-09 09:19:26 -07:00
pwm pwm: Changes for v5.2-rc1 2019-05-10 12:57:15 -04:00
rapidio
ras
regulator Merge branch 'regulator-5.2' into regulator-next 2019-05-06 22:52:14 +09:00
remoteproc
reset
rpmsg
rtc chrome platform changes for v5.2 2019-05-12 07:00:21 -04:00
s390 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2019-05-07 22:03:58 -07:00
sbus docs: sparc: convert to ReST 2019-05-08 17:13:35 -07:00
scsi SCSI misc on 20190507 2019-05-08 10:12:46 -07:00
sfi
sh
siox
slimbus
sn
soc
soundwire
spi dmaengine updates for v5.2-rc1 2019-05-09 08:51:45 -07:00
spmi
ssb Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2019-05-07 22:03:58 -07:00
staging drm pull request for 5.2 2019-05-08 21:35:19 -07:00
target SCSI misc on 20190507 2019-05-08 10:12:46 -07:00
tc
tee
thermal
thunderbolt Char/Misc patches for 5.2-rc1 - part 2 2019-05-07 13:39:22 -07:00
tty dmaengine updates for v5.2-rc1 2019-05-09 08:51:45 -07:00
uio
usb drm pull request for 5.2 2019-05-08 21:35:19 -07:00
uwb
vfio mm/gup: replace get_user_pages_longterm() with FOLL_LONGTERM 2019-05-14 09:47:45 -07:00
vhost
video fbdev changes for v5.2: 2019-05-10 12:59:51 -04:00
virt
virtio
visorbus
vlynq
vme
w1 Char/Misc patches for 5.2-rc1 - part 2 2019-05-07 13:39:22 -07:00
watchdog linux-watchdog 5.2-rc1 tag 2019-05-13 09:20:42 -04:00
xen Merge branch 'stable/for-linus-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb 2019-05-07 18:45:27 -07:00
zorro
Kconfig
Makefile