linux/net
Vladimir Davydov 3aa9799e13 af_unix: charge buffers to kmemcg
Unix sockets can consume a significant amount of system memory, hence
they should be accounted to kmemcg.

Since unix socket buffers are always allocated from process context, all
we need to do to charge them to kmemcg is set __GFP_ACCOUNT in
sock->sk_allocation mask.

Eric asked:

> 1) What happens when a buffer, allocated from socket <A> lands in a
> different socket <B>, maybe owned by another user/process.
>
> Who owns it now, in term of kmemcg accounting ?

We never move memcg charges.  E.g.  if two processes from different
cgroups are sharing a memory region, each page will be charged to the
process which touched it first.  Or if two processes are working with
the same directory tree, inodes and dentries will be charged to the
first user.  The same is fair for unix socket buffers - they will be
charged to the sender.

> 2) Has performance impact been evaluated ?

I ran netperf STREAM_STREAM with default options in a kmemcg on a 4 core
x2 HT box.  The results are below:

 # clients            bandwidth (10^6bits/sec)
                    base              patched
         1      67643 +-  725      64874 +-  353    - 4.0 %
         4     193585 +- 2516     186715 +- 1460    - 3.5 %
         8     194820 +-  377     187443 +- 1229    - 3.7 %

So the accounting doesn't come for free - it takes ~4% of performance.
I believe we could optimize it by using per cpu batching not only on
charge, but also on uncharge in memcg core, but that's beyond the scope
of this patch set - I'll take a look at this later.

Anyway, if performance impact is found to be unacceptable, it is always
possible to disable kmem accounting at boot time (cgroup.memory=nokmem)
or not use memory cgroups at runtime at all (thanks to jump labels
there'll be no overhead even if they are compiled in).

Link: http://lkml.kernel.org/r/fcfe6cae27a59fbc5e40145664b3cf085a560c68.1464079538.git.vdavydov@virtuozzo.com
Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-26 16:19:19 -07:00
..
6lowpan 6lowpan: move mac802154 header 2016-04-13 10:41:10 +02:00
9p remove lots of IS_ERR_VALUE abuses 2016-05-27 15:26:11 -07:00
802
8021q vlan: use a valid default mtu value for vlan over macsec 2016-07-16 20:15:02 -07:00
appletalk appletalk: fix erroneous return value 2016-02-18 14:59:34 -05:00
atm net/atm: sk_err_soft must be positive 2016-05-23 13:51:10 -07:00
ax25 AX.25: Close socket connection on session completion 2016-06-18 20:55:34 -07:00
batman-adv batman-adv: Fix speedy join in gateway client mode 2016-07-06 16:03:40 +02:00
bluetooth Bluetooth: fix power_on vs close race 2016-05-13 16:50:23 +02:00
bridge ipv4: Fix ip_skb_dst_mtu to use the sk passed by ip_finish_output 2016-06-30 09:02:48 -04:00
caif net: caif: fix misleading indentation 2016-03-14 13:09:50 -04:00
can sock: enable timestamping using control messages 2016-04-04 15:50:30 -04:00
ceph libceph: apply new_state before new_up_client on incrementals 2016-07-22 15:17:40 +02:00
core dccp: limit sk_filter trim to payload 2016-07-13 11:53:41 -07:00
dcb net/dcb: make dcbnl.c explicitly non-modular 2015-10-09 07:52:27 -07:00
dccp dccp: limit sk_filter trim to payload 2016-07-13 11:53:41 -07:00
decnet net: fix decnet rtnexthop parsing 2016-07-05 14:08:47 -07:00
dns_resolver KEYS: Add a facility to restrict new links into a keyring 2016-04-11 22:37:37 +01:00
dsa dsa: Rename switch chip data to cd 2016-05-11 19:36:28 -04:00
ethernet eth: Pull header from first fragment via eth_get_headlen 2016-02-24 13:58:05 -05:00
hsr net/hsr: Use setup_timer and mod_timer. 2016-05-16 14:00:43 -04:00
ieee802154 ieee802154: fix logic error in ieee802154_llsec_parse_dev_addr 2016-05-29 22:36:25 -07:00
ipv4 Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-07-25 20:43:12 -07:00
ipv6 udp: prevent bugcheck if filter truncates packet too much 2016-07-11 12:43:15 -07:00
ipx
irda net: ircomm, cleanup TIOCGSERIAL 2016-06-25 08:56:30 -07:00
iucv af_iucv: Validate socket address length in iucv_sock_bind() 2016-01-19 14:21:08 -05:00
kcm kcm: fix /proc memory leak 2016-06-22 16:32:23 -04:00
key af_key: fix two typos 2015-10-23 03:05:19 -07:00
l2tp l2tp: fix configuration passed to setup_udp_tunnel_sock() 2016-06-08 11:11:53 -07:00
l3mdev net: l3mdev: Allow send on enslaved interface 2016-05-09 22:33:52 -04:00
lapb net/lapb: tuse %*ph to dump buffers 2016-05-29 22:33:25 -07:00
llc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-05-09 15:59:24 -04:00
mac80211 mac80211: Fix mesh estab_plinks counting in STA removal case 2016-06-28 12:39:50 +02:00
mac802154 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2016-03-19 10:05:34 -07:00
mpls gso: Remove arbitrary checks for unsupported GSO 2016-05-20 18:03:15 -04:00
netfilter Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-07-25 12:41:29 -07:00
netlabel netlabel: fix a problem with netlbl_secattr_catmap_setrng() 2016-04-05 16:10:47 -04:00
netlink netlink: Fix dump skb leak/double free 2016-05-16 22:05:15 -04:00
netrom
nfc nfc: nci: Add nci_nfcc_loopback to the nci core 2016-05-04 01:48:16 +02:00
openvswitch openvswitch: fix conntrack netlink event delivery 2016-06-29 08:13:59 -04:00
packet packet: propagate sock_cmsg_send() error 2016-07-22 01:41:48 -04:00
phonet sock: struct proto hash function may error 2016-02-11 03:54:14 -05:00
qrtr Merge tag 'qcom-soc-for-4.7-2' into net-next 2016-05-17 14:11:19 -04:00
rds RDS: fix rds_tcp_init() error path 2016-07-04 16:09:49 -07:00
rfkill rfkill: Use switch to demux userspace operations 2016-04-05 10:48:53 +02:00
rose rose: limit sk_filter trim to payload 2016-07-13 11:53:40 -07:00
rxrpc rxrpc: fix ptr_ret.cocci warnings 2016-06-07 15:30:21 -07:00
sched net/sched/sch_htb: clamp xstats tokens to fit into 32-bit int 2016-07-18 22:44:31 -07:00
sctp sctp: load transport header after sk_filter 2016-07-18 22:46:52 -07:00
sunrpc rpc: share one xps between all backchannels 2016-06-15 10:32:25 -04:00
switchdev switchdev: pass pointer to fib_info instead of copy 2016-05-17 13:58:49 -04:00
tipc tipc: reset all unicast links when broadcast send link fails 2016-07-11 22:42:12 -07:00
unix af_unix: charge buffers to kmemcg 2016-07-26 16:19:19 -07:00
vmw_vsock vsock: make listener child lock ordering explicit 2016-06-27 10:44:46 -04:00
wimax
wireless cfg80211: handle failed skb allocation 2016-07-06 13:52:18 +02:00
x25 net: fix a kernel infoleak in x25 module 2016-05-09 22:45:33 -04:00
xfrm Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-05-09 15:59:24 -04:00
Kconfig bpf: add generic constant blinding for use in jits 2016-05-16 13:49:32 -04:00
Makefile net: Add Qualcomm IPC router 2016-05-08 23:46:14 -04:00
compat.c packet: compat support for sock_fprog 2016-06-09 23:41:03 -07:00
socket.c fs: poll/select/recvmmsg: use timespec64 for timeout events 2016-05-19 19:12:14 -07:00
sysctl_net.c net: sysctl: fix a kmemleak warning 2015-10-23 06:22:08 -07:00