Go to file
Linus Torvalds 30bac164ac Revert "Change mincore() to count "mapped" pages rather than "cached" pages"
This reverts commit 574823bfab.

It turns out that my hope that we could just remove the code that
exposes the cache residency status from mincore() was too optimistic.

There are various random users that want it, and one example would be
the Netflix database cluster maintenance. To quote Josh Snyder:

 "For Netflix, losing accurate information from the mincore syscall
  would lengthen database cluster maintenance operations from days to
  months. We rely on cross-process mincore to migrate the contents of a
  page cache from machine to machine, and across reboots.

  To do this, I wrote and maintain happycache [1], a page cache
  dumper/loader tool. It is quite similar in architecture to pgfincore,
  except that it is agnostic to workload. The gist of happycache's
  operation is "produce a dump of residence status for each page, do
  some operation, then reload exactly the same pages which were present
  before." happycache is entirely dependent on accurate reporting of the
  in-core status of file-backed pages, as accessed by another process.

  We primarily use happycache with Cassandra, which (like Postgres +
  pgfincore) relies heavily on OS page cache to reduce disk accesses.
  Because our workloads never experience a cold page cache, we are able
  to provision hardware for a peak utilization level that is far lower
  than the hypothetical "every query is a cache miss" peak.

  A database warmed by happycache can be ready for service in seconds
  (bounded only by the performance of the drives and the I/O subsystem),
  with no period of in-service degradation. By contrast, putting a
  database in service without a page cache entails a potentially
  unbounded period of degradation (at Netflix, the time to populate a
  single node's cache via natural cache misses varies by workload from
  hours to weeks). If a single node upgrade were to take weeks, then
  upgrading an entire cluster would take months. Since we want to apply
  security upgrades (and other things) on a somewhat tighter schedule,
  we would have to develop more complex solutions to provide the same
  functionality already provided by mincore.

  At the bottom line, happycache is designed to benignly exploit the
  same information leak documented in the paper [2]. I think it makes
  perfect sense to remove cross-process mincore functionality from
  unprivileged users, but not to remove it entirely"

We do have an alternate approach that limits the cache residency
reporting only to processes that have write permissions to the file, so
we can fix the original information leak issue that way.  It involves
_adding_ code rather than removing it, which is sad, but hey, at least
we haven't found any users that would find the restrictions
unacceptable.

So revert the optimistic first approach to make room for that alternate
fix instead.

Reported-by: Josh Snyder <joshs@netflix.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Kevin Easton <kevin@guarana.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Cyril Hrubis <chrubis@suse.cz>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Tejun Heo <tj@kernel.org>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Daniel Gruss <daniel@gruss.cc>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-24 09:04:37 +13:00
Documentation XArray updates for 5.0-rc3 2019-01-22 17:08:30 +13:00
LICENSES This is a fairly typical cycle for documentation. There's some welcome 2018-10-24 18:01:11 +01:00
arch s390 update with bug fixes for 5.0-rc4 2019-01-24 08:58:01 +13:00
block block: Cleanup license notice 2019-01-17 21:21:40 -07:00
certs kbuild: remove redundant target cleaning on failure 2019-01-06 09:46:51 +09:00
crypto crypto: sm3 - fix undefined shift by >= width of value 2019-01-10 21:37:32 +08:00
drivers I missed the merge window, which wasn't really important at the time 2019-01-24 09:00:19 +13:00
firmware kbuild: change filechk to surround the given command with { } 2019-01-06 09:46:51 +09:00
fs Fixes for pstore/ram 2019-01-21 13:12:03 +13:00
include Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid 2019-01-23 07:16:05 +13:00
init kbuild: Disable LD_DEAD_CODE_DATA_ELIMINATION with ftrace & GCC <= 4.7 2019-01-14 10:37:09 +09:00
ipc ipc: IPCMNI limit check for semmni 2018-10-31 08:54:14 -07:00
kernel Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-01-21 12:52:31 +13:00
lib XArray updates for 5.0-rc3 2019-01-22 17:08:30 +13:00
mm Revert "Change mincore() to count "mapped" pages rather than "cached" pages" 2019-01-24 09:04:37 +13:00
net Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-01-21 12:52:31 +13:00
samples samples/bpf: workaround clang asm goto compilation errors 2019-01-15 20:57:30 +01:00
scripts Bug fixes for gcc-plugins 2019-01-21 13:07:03 +13:00
security Merge branch 'fixes-v5.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2019-01-17 16:54:58 +12:00
sound remove dma_zalloc_coherent 2019-01-12 10:52:40 -08:00
tools linux-kselftest-5.0-rc4 2019-01-23 14:02:14 +13:00
usr user/Makefile: Fix typo and capitalization in comment section 2018-12-11 00:18:03 +09:00
virt KVM: validate userspace input in kvm_clear_dirty_log_protect() 2019-01-11 18:38:07 +01:00
.clang-format clang-format: Update .clang-format with the latest for_each macro list 2019-01-19 19:26:06 +01:00
.cocciconfig
.get_maintainer.ignore
.gitattributes .gitattributes: set git diff driver for C source code files 2016-10-07 18:46:30 -07:00
.gitignore kbuild: Add support for DT binding schema checks 2018-12-13 09:41:32 -06:00
.mailmap A few early MIPS fixes for 4.21: 2019-01-05 12:48:25 -08:00
COPYING COPYING: use the new text with points to the license files 2018-03-23 12:41:45 -06:00
CREDITS Add CREDITS entry for Shaohua Li 2019-01-04 14:27:09 -07:00
Kbuild kbuild: use assignment instead of define ... endef for filechk_* rules 2019-01-06 10:22:35 +09:00
Kconfig kconfig: move the "Executable file formats" menu to fs/Kconfig.binfmt 2018-08-02 08:06:55 +09:00
MAINTAINERS MAINTAINERS: update email addresses of liquidio driver maintainers 2019-01-18 14:07:06 -08:00
Makefile Linux 5.0-rc3 2019-01-21 13:14:44 +13:00
README Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00

README

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.