linux/include/net
Mel Gorman c76562b670 netvm: prevent a stream-specific deadlock
This patch series is based on top of "Swap-over-NBD without deadlocking
v15" as it depends on the same reservation of PF_MEMALLOC reserves logic.

When a user or administrator requires swap for their application, they
create a swap partition and file, format it with mkswap and activate it
with swapon.  In diskless systems this is not an option so if swap if
required then swapping over the network is considered.  The two likely
scenarios are when blade servers are used as part of a cluster where the
form factor or maintenance costs do not allow the use of disks and thin
clients.

The Linux Terminal Server Project recommends the use of the Network Block
Device (NBD) for swap but this is not always an option.  There is no
guarantee that the network attached storage (NAS) device is running Linux
or supports NBD.  However, it is likely that it supports NFS so there are
users that want support for swapping over NFS despite any performance
concern.  Some distributions currently carry patches that support swapping
over NFS but it would be preferable to support it in the mainline kernel.

Patch 1 avoids a stream-specific deadlock that potentially affects TCP.

Patch 2 is a small modification to SELinux to avoid using PFMEMALLOC
	reserves.

Patch 3 adds three helpers for filesystems to handle swap cache pages.
	For example, page_file_mapping() returns page->mapping for
	file-backed pages and the address_space of the underlying
	swap file for swap cache pages.

Patch 4 adds two address_space_operations to allow a filesystem
	to pin all metadata relevant to a swapfile in memory. Upon
	successful activation, the swapfile is marked SWP_FILE and
	the address space operation ->direct_IO is used for writing
	and ->readpage for reading in swap pages.

Patch 5 notes that patch 3 is bolting
	filesystem-specific-swapfile-support onto the side and that
	the default handlers have different information to what
	is available to the filesystem. This patch refactors the
	code so that there are generic handlers for each of the new
	address_space operations.

Patch 6 adds an API to allow a vector of kernel addresses to be
	translated to struct pages and pinned for IO.

Patch 7 adds support for using highmem pages for swap by kmapping
	the pages before calling the direct_IO handler.

Patch 8 updates NFS to use the helpers from patch 3 where necessary.

Patch 9 avoids setting PF_private on PG_swapcache pages within NFS.

Patch 10 implements the new swapfile-related address_space operations
	for NFS and teaches the direct IO handler how to manage
	kernel addresses.

Patch 11 prevents page allocator recursions in NFS by using GFP_NOIO
	where appropriate.

Patch 12 fixes a NULL pointer dereference that occurs when using
	swap-over-NFS.

With the patches applied, it is possible to mount a swapfile that is on an
NFS filesystem.  Swap performance is not great with a swap stress test
taking roughly twice as long to complete than if the swap device was
backed by NBD.

This patch: netvm: prevent a stream-specific deadlock

It could happen that all !SOCK_MEMALLOC sockets have buffered so much data
that we're over the global rmem limit.  This will prevent SOCK_MEMALLOC
buffers from receiving data, which will prevent userspace from running,
which is needed to reduce the buffered data.

Fix this by exempting the SOCK_MEMALLOC sockets from the rmem limit.  Once
this change it applied, it is important that sockets that set
SOCK_MEMALLOC do not clear the flag until the socket is being torn down.
If this happens, a warning is generated and the tokens reclaimed to avoid
accounting errors until the bug is fixed.

[davem@davemloft.net: Warning about clearing SOCK_MEMALLOC]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-31 18:42:47 -07:00
..
9p
bluetooth Bluetooth: Use tx window from config response for ack timing 2012-07-15 12:18:29 -03:00
caif caif-hsi: Remove use of module parameters 2012-06-25 16:44:12 -07:00
irda
iucv
netfilter Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-07-10 23:56:33 -07:00
netns tcp: use hash_32() in tcp_metrics 2012-07-20 10:59:41 -07:00
nfc NFC: Allow HCI driver to pre-open pipes to some gates 2012-07-09 16:42:12 -04:00
phonet net: remove my future former mail address 2012-06-17 16:29:38 -07:00
sctp sctp: Implement quick failover draft from tsvwg 2012-07-22 12:13:46 -07:00
tc_act
act_api.h
addrconf.h ipv6: add ipv6_addr_hash() helper 2012-07-18 11:28:46 -07:00
af_ieee802154.h
af_rxrpc.h
af_unix.h af_unix: speedup /proc/net/unix 2012-06-08 14:27:23 -07:00
ah.h
arp.h ipv4: Fix neigh lookup keying over loopback/point-to-point devices. 2012-07-20 16:06:10 -07:00
atmclip.h
ax25.h net ax25: Fix the build when sysctl support is disabled. 2012-04-23 22:14:47 -04:00
ax88796.h
cfg80211-wext.h
cfg80211.h cfg80211: support TX error rate CQM 2012-07-17 11:57:23 +02:00
checksum.h
cipso_ipv4.h cipso: handle CIPSO options correctly when NetLabel is disabled 2012-06-01 14:18:29 -04:00
cls_cgroup.h
codel.h fq_codel: should use qdisc backlog as threshold 2012-05-16 15:30:26 -04:00
compat.h net: cleanup unsigned to unsigned int 2012-04-15 12:44:40 -04:00
datalink.h
dcbevent.h
dcbnl.h net/dcb: Add an optional max rate attribute 2012-04-05 05:08:04 -04:00
dn_dev.h
dn_fib.h net: cleanup unsigned to unsigned int 2012-04-15 12:44:40 -04:00
dn_neigh.h
dn_nsp.h
dn_route.h decnet: Use neighbours privately in dn_route struct. 2012-07-05 01:12:14 -07:00
dn.h net: cleanup unsigned to unsigned int 2012-04-15 12:44:40 -04:00
dsa.h
dsfield.h
dst_ops.h net: Fix warnings in dst_ops.h 2012-07-19 10:43:03 -07:00
dst.h ipv4: Kill routes during PMTU/redirect updates. 2012-07-20 13:31:22 -07:00
esp.h
ethoc.h
fib_rules.h ipv4: Elide fib_validate_source() completely when possible. 2012-06-29 01:36:36 -07:00
flow_keys.h
flow.h ipv4: Kill FLOWI_FLAG_RT_NOCACHE and associated code. 2012-07-20 13:36:54 -07:00
garp.h
gen_stats.h
genetlink.h net: Use NLMSG_DEFAULT_SIZE in combination with nlmsg_new() 2012-06-28 17:56:43 -07:00
gre.h
icmp.h net: cleanup unsigned to unsigned int 2012-04-15 12:44:40 -04:00
ieee80211_radiotap.h
ieee802154_netdev.h mac802154: declare reduced mlme operations 2012-05-16 15:16:56 -04:00
ieee802154.h
if_inet6.h net: delete all instances of special processing for token ring 2012-05-15 20:14:35 -04:00
inet6_connection_sock.h ipv6: Add helper inet6_csk_update_pmtu(). 2012-07-16 03:44:56 -07:00
inet6_hashtables.h ipv6: Early TCP socket demux 2012-07-26 15:50:39 -07:00
inet_common.h net-tcp: Fast Open client - sendmsg(MSG_FASTOPEN) 2012-07-19 11:02:03 -07:00
inet_connection_sock.h ipv4: Kill FLOWI_FLAG_RT_NOCACHE and associated code. 2012-07-20 13:36:54 -07:00
inet_ecn.h
inet_frag.h ip_frag: struct inet_frags match() method returns a bool 2012-05-18 01:40:27 -04:00
inet_hashtables.h ipv4: Early TCP socket demux. 2012-06-19 21:22:05 -07:00
inet_sock.h ipv4: Prepare for change of rt->rt_iif encoding. 2012-07-23 16:36:26 -07:00
inet_timewait_sock.h
inetpeer.h ipv4: Maintain redirect and PMTU info in struct rtable again. 2012-07-10 22:40:14 -07:00
ip6_checksum.h
ip6_fib.h ipv6: Store route neighbour in rt6_info struct. 2012-07-05 02:41:58 -07:00
ip6_route.h ipv6: fix inet6_csk_xmit() 2012-07-18 08:59:58 -07:00
ip6_tunnel.h ipv6_tunnel: Allow receiving packets on the fallback tunnel if they pass sanity checks 2012-06-29 00:52:32 -07:00
ip_fib.h ipv4: Cache input routes in fib_info nexthops. 2012-07-20 13:36:40 -07:00
ip_vs.h ipvs: fix oops on NAT reply in br_nf context 2012-07-17 12:00:46 +02:00
ip.h ipv4: tcp: remove per net tcp_sock 2012-07-19 10:35:30 -07:00
ipcomp.h
ipconfig.h
ipip.h tunnel: implement 64 bits statistics 2012-04-14 14:47:05 -04:00
ipv6.h ipv6: add ipv6_addr_hash() helper 2012-07-18 11:28:46 -07:00
ipx.h
iw_handler.h
lapb.h lapb: Neaten debugging 2012-05-17 18:45:20 -04:00
lib80211.h
llc_c_ac.h
llc_c_ev.h net: cleanup unsigned to unsigned int 2012-04-15 12:44:40 -04:00
llc_c_st.h
llc_conn.h
llc_if.h
llc_pdu.h net: delete all instances of special processing for token ring 2012-05-15 20:14:35 -04:00
llc_s_ac.h
llc_s_ev.h
llc_s_st.h
llc_sap.h
llc.h
mac80211.h mac80211: add time synchronisation with BSS for assoc 2012-07-12 12:10:46 +02:00
mac802154.h mac802154: add wpan device-class support 2012-06-26 21:06:11 -07:00
mip6.h
mld.h
ndisc.h ipv6: Export ndisc option parsing from ndisc.c 2012-07-11 23:39:11 -07:00
neighbour.h net: Do delayed neigh confirmation. 2012-07-05 01:03:06 -07:00
net_namespace.h net: make sock diag per-namespace 2012-07-16 22:31:34 -07:00
net_ratelimit.h
netdma.h
netevent.h net: Pass neighbours and dest address into NETEVENT_REDIRECT events. 2012-07-05 02:21:55 -07:00
netlabel.h
netlink.h netlink: Delete all NLA_PUT*() macros. 2012-04-02 04:33:45 -04:00
netprio_cgroup.h net: netprio_cgroup: rework update socket logic 2012-07-22 12:44:01 -07:00
netrom.h
nexthop.h
nl802154.h
p8022.h
ping.h
pkt_cls.h
pkt_sched.h net: cleanup unsigned to unsigned int 2012-04-15 12:44:40 -04:00
protocol.h ipv6: Early TCP socket demux 2012-07-26 15:50:39 -07:00
psnap.h
raw.h
rawv6.h ipv6: bool/const conversions phase2 2012-05-19 01:08:16 -04:00
red.h net_sched: red: Make minor corrections to comments 2012-04-16 23:53:11 -04:00
regulatory.h cfg80211: add cellular base station regulatory hint support 2012-07-17 12:16:39 +02:00
request_sock.h
rose.h
route.h ipv4: Fix input route performance regression. 2012-07-26 15:50:39 -07:00
rtnetlink.h rtnl: allow to specify different num for rx and tx queue count 2012-07-20 11:06:59 -07:00
sch_generic.h net: rename bond_queue_mapping to slave_dev_queue_mapping 2012-07-20 11:07:00 -07:00
scm.h get rid of ->scm_work_list 2012-07-22 23:58:00 +04:00
secure_seq.h
slhc_vj.h
snmp.h
sock.h netvm: prevent a stream-specific deadlock 2012-07-31 18:42:47 -07:00
stp.h
tcp_memcontrol.h cgroup: pass struct mem_cgroup instead of struct cgroup to socket memcg 2012-04-10 10:04:07 -07:00
tcp_states.h
tcp.h tcp: improve latencies of timer triggered events 2012-07-20 10:59:41 -07:00
timewait_sock.h [PATCH] tcp: Cache inetpeer in timewait socket, and only when necessary. 2012-06-09 14:56:12 -07:00
transp_v6.h
udp.h net/ipv6/udp: UDP encapsulation: introduce encap_rcv hook into IPv6 2012-04-28 22:21:51 -04:00
udplite.h
wext.h
wimax.h net: cleanup unsigned to unsigned int 2012-04-15 12:44:40 -04:00
wpan-phy.h mac802154: monitor device support 2012-05-16 15:17:08 -04:00
x25.h net: cleanup unsigned to unsigned int 2012-04-15 12:44:40 -04:00
x25device.h
xfrm.h net/ipv4: VTI support rx-path hook in xfrm4_mode_tunnel. 2012-07-18 09:36:12 -07:00