linux/drivers
Dave Jiang a2d581675d mm,fs,dax: change ->pmd_fault to ->huge_fault
Patch series "1G transparent hugepage support for device dax", v2.

The following series implements support for 1G transparent hugepages on
x86 for device dax.  The bulk of the code was written by Matthew Wilcox a
while back to support transparent 1G hugepages for fs DAX.  I have
forward ported the relevant bits to 4.10-rc.  The current submission has
only the necessary code to support device DAX.

Comments from Dan Williams: So the motivation and intended user of this
functionality mirrors the motivation and users of 1GB page support in
hugetlbfs.  Given the expected capacities of persistent memory devices, an
in-memory database may want to reduce TLB pressure beyond what it can
already achieve with 2MB mappings of a device-dax file.  We have
customer feedback to that effect, as Willy mentioned in his previous
version of these patches [1].

[1]: https://lkml.org/lkml/2016/1/31/52

Comments from Nilesh @ Oracle:

Some applications use a process model.  If you assume 10,000 processes
attempting to mmap all of the 6TB of memory available on a server, we
are looking at the following:

processes         : 10,000
memory            :    6TB
pte @ 4k page size: 6TB / 4K * 8 bytes per entry * 10,000 processes = 12GB * 10,000 = 120,000GB
pmd @ 2M page size: 120,000GB / 512 = ~240GB
pud @ 1G page size: 240GB / 512 = ~480MB

As you can see, with 2M pages this system will use up an exorbitant
amount of DRAM to hold the page tables, but 1G pages finally bring it
down to a reasonable level.  Memory sizes will keep increasing, so
this number will keep increasing.

An argument can be made to convert the applications from process model
to thread model, but in the real world that may not always be practical.
Hopefully this helps explain the use case where this is valuable.

This patch (of 3):

In preparation for adding the ability to handle PUD pages, convert
vm_operations_struct.pmd_fault to vm_operations_struct.huge_fault.  The
vm_fault structure is extended to include a union of the different page
table pointers that may be needed, and three flag bits are reserved to
indicate which type of pointer is in the union.

[ross.zwisler@linux.intel.com: remove unused function ext4_dax_huge_fault()]
  Link: http://lkml.kernel.org/r/1485813172-7284-1-git-send-email-ross.zwisler@linux.intel.com
[dave.jiang@intel.com: clear PMD or PUD size flags when in fall through path]
  Link: http://lkml.kernel.org/r/148589842696.5820.16078080610311444794.stgit@djiang5-desk3.ch.intel.com
Link: http://lkml.kernel.org/r/148545058784.17912.6353162518188733642.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jan Kara <jack@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Nilesh Choudhury <nilesh.choudhury@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-24 17:46:54 -08:00
..
accessibility
acpi pci-v4.11-changes 2017-02-23 11:53:22 -08:00
amba
android mm, fs: reduce fault, page_mkwrite, and pfn_mkwrite to take only vmf 2017-02-24 17:46:54 -08:00
ata ARM: SoC driver updates 2017-02-23 15:57:04 -08:00
atm atm: idt77252, use setup_timer and mod_timer 2017-02-15 13:24:53 -05:00
auxdisplay
base mm: validate device_hotplug is held for memory hotplug 2017-02-24 17:46:53 -08:00
bcma
block zram: remove waitqueue for IO done 2017-02-24 17:46:54 -08:00
bluetooth btmrvl: fix spelling mistake: "actived" -> "activated" 2017-02-19 00:26:37 +01:00
bus ARM: SoC driver updates 2017-02-23 15:57:04 -08:00
cdrom Merge branch 'for-4.11/next' into for-4.11/linus-merge 2017-02-17 14:08:19 -07:00
char mm, fs: reduce fault, page_mkwrite, and pfn_mkwrite to take only vmf 2017-02-24 17:46:54 -08:00
clk ARM: DT updates for v4.11 2017-02-23 15:46:25 -08:00
clocksource Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-02-20 10:06:32 -08:00
connector
cpufreq ARM: SoC non-urgent fixes for merge window 2017-02-23 15:28:04 -08:00
cpuidle powerpc updates for 4.11 part 1. 2017-02-22 10:30:38 -08:00
crypto crypto: cavium - switch to pci_alloc_irq_vectors 2017-02-23 20:11:02 +08:00
dax mm,fs,dax: change ->pmd_fault to ->huge_fault 2017-02-24 17:46:54 -08:00
dca
devfreq Merge branch 'pm-devfreq' 2017-02-20 14:23:40 +01:00
dio
dma TTY/Serial driver patches for 4.11-rc1 2017-02-22 12:17:25 -08:00
dma-buf
edac Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-02-20 12:47:44 -08:00
eisa
extcon
firewire
firmware ARM: SoC driver updates 2017-02-23 15:57:04 -08:00
fmc
fpga
fsi
gpio This is the bulk of GPIO changes for the v4.11 cycle 2017-02-23 08:46:04 -08:00
gpu mm, fs: reduce fault, page_mkwrite, and pfn_mkwrite to take only vmf 2017-02-24 17:46:54 -08:00
hid Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid 2017-02-21 17:28:25 -08:00
hsi mm, fs: reduce fault, page_mkwrite, and pfn_mkwrite to take only vmf 2017-02-24 17:46:54 -08:00
hv vmbus: replace modulus operation with subtraction 2017-02-14 10:20:35 -08:00
hwmon hwmon: (sht15) Add device tree support 2017-02-16 06:49:05 -08:00
hwspinlock
hwtracing mm, fs: reduce fault, page_mkwrite, and pfn_mkwrite to take only vmf 2017-02-24 17:46:54 -08:00
i2c Generic device properties framework updates for v4.11-rc1 2017-02-20 18:06:09 -08:00
ide Merge branch 'for-4.11/next' into for-4.11/linus-merge 2017-02-17 14:08:19 -07:00
idle
iio
infiniband mm, fs: reduce fault, page_mkwrite, and pfn_mkwrite to take only vmf 2017-02-24 17:46:54 -08:00
input This is the bulk of GPIO changes for the v4.11 cycle 2017-02-23 08:46:04 -08:00
iommu iommu/vt-d: Fix crash on boot when DMAR is disabled 2017-02-22 12:25:31 +01:00
ipack
irqchip IOMMU Updates for Linux v4.11 2017-02-20 16:42:43 -08:00
isdn Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2017-02-22 10:15:09 -08:00
leds This is the bulk of GPIO changes for the v4.11 cycle 2017-02-23 08:46:04 -08:00
lguest
lightnvm lightnvm: set default lun range when no luns are specified 2017-02-15 08:27:21 -07:00
macintosh driver core patches for 4.11-rc1 2017-02-22 11:44:32 -08:00
mailbox
mcb
md - Fix dm-raid transient device failure processing and other smaller 2017-02-21 12:11:41 -08:00
media mm, fs: reduce fault, page_mkwrite, and pfn_mkwrite to take only vmf 2017-02-24 17:46:54 -08:00
memory ARM: SoC driver updates 2017-02-23 15:57:04 -08:00
memstick Merge branch 'for-4.11/next' into for-4.11/linus-merge 2017-02-17 14:08:19 -07:00
message SCSI misc on 20170220 2017-02-21 11:51:42 -08:00
mfd staging/iio driver patches for 4.11-rc1 2017-02-22 12:14:01 -08:00
misc mm, fs: reduce fault, page_mkwrite, and pfn_mkwrite to take only vmf 2017-02-24 17:46:54 -08:00
mmc MMC core: 2017-02-21 12:04:54 -08:00
mtd for-4.11/linus-merge-signed 2017-02-21 10:57:33 -08:00
net pci-v4.11-changes 2017-02-23 11:53:22 -08:00
nfc
ntb ntb: ntb_hw_intel: link_poll isn't clearing the pending status properly 2017-02-16 23:11:26 -05:00
nubus
nvdimm
nvme Merge branch 'for-4.11/next' into for-4.11/linus-merge 2017-02-17 14:08:19 -07:00
nvmem
of DeviceTree updates for 4.11: 2017-02-22 19:23:14 -08:00
oprofile
parisc
parport
pci pci-v4.11-changes 2017-02-23 11:53:22 -08:00
pcmcia
perf
phy pci-v4.11-changes 2017-02-23 11:53:22 -08:00
pinctrl Pin control bulk changes for the v4.11 kernel cycle: 2017-02-21 16:34:22 -08:00
platform - Core Frameworks 2017-02-23 08:18:01 -08:00
pnp
power
powercap
pps
ps3
ptp 4.11 is going to be a relatively large release for KVM, with a little over 2017-02-22 18:22:53 -08:00
pwm
rapidio
ras
regulator regulator: Updates for v4.11 2017-02-20 17:23:57 -08:00
remoteproc remoteproc: qcom: mdt_loader: Use signed type for offset 2017-02-22 02:07:13 -08:00
reset ARM: SoC driver updates 2017-02-23 15:57:04 -08:00
rpmsg
rtc Pin control bulk changes for the v4.11 kernel cycle: 2017-02-21 16:34:22 -08:00
s390 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux 2017-02-22 10:20:04 -08:00
sbus
scsi mm, fs: reduce fault, page_mkwrite, and pfn_mkwrite to take only vmf 2017-02-24 17:46:54 -08:00
sfi
sh
sn
soc ARM: SoC driver updates 2017-02-23 15:57:04 -08:00
spi ACPI updates for v4.11-rc1 2017-02-20 17:55:15 -08:00
spmi
ssb
staging mm, fs: reduce fault, page_mkwrite, and pfn_mkwrite to take only vmf 2017-02-24 17:46:54 -08:00
target mm, fs: reduce fault, page_mkwrite, and pfn_mkwrite to take only vmf 2017-02-24 17:46:54 -08:00
tc
thermal Merge branch 'pm-opp' 2017-02-20 14:22:50 +01:00
thunderbolt
tty lib/show_mem.c: teach show_mem to work with the given nodemask 2017-02-22 16:41:30 -08:00
uio mm, fs: reduce fault, page_mkwrite, and pfn_mkwrite to take only vmf 2017-02-24 17:46:54 -08:00
usb mm, fs: reduce fault, page_mkwrite, and pfn_mkwrite to take only vmf 2017-02-24 17:46:54 -08:00
uwb
vfio VFIO updates for v4.11 2017-02-23 11:26:09 -08:00
vhost
video mm, fs: reduce fault, page_mkwrite, and pfn_mkwrite to take only vmf 2017-02-24 17:46:54 -08:00
virt
virtio
vlynq
vme
w1
watchdog
xen mm, fs: reduce fault, page_mkwrite, and pfn_mkwrite to take only vmf 2017-02-24 17:46:54 -08:00
zorro
Kconfig
Makefile pci-v4.11-changes 2017-02-23 11:53:22 -08:00