linux/net
Eric Dumazet 4a8e320c92 net: sched: use pinned timers
While using a MQ + NETEM setup, I had confirmation that the default
timer migration ( /proc/sys/kernel/timer_migration ) is killing us.

Installing this on a receiver side of a TCP_STREAM test, (NIC has 8 TX
queues) :

EST="est 1sec 4sec"
for ETH in eth1
do
 tc qd del dev $ETH root 2>/dev/null
 tc qd add dev $ETH root handle 1: mq
 tc qd add dev $ETH parent 1:1 $EST netem limit 70000 delay 6ms
 tc qd add dev $ETH parent 1:2 $EST netem limit 70000 delay 8ms
 tc qd add dev $ETH parent 1:3 $EST netem limit 70000 delay 10ms
 tc qd add dev $ETH parent 1:4 $EST netem limit 70000 delay 12ms
 tc qd add dev $ETH parent 1:5 $EST netem limit 70000 delay 14ms
 tc qd add dev $ETH parent 1:6 $EST netem limit 70000 delay 16ms
 tc qd add dev $ETH parent 1:7 $EST netem limit 80000 delay 18ms
 tc qd add dev $ETH parent 1:8 $EST netem limit 90000 delay 20ms
done

We can see that timers get migrated into a single cpu, presumably idle
at the time timers are set up.
Then all qdisc dequeues run from this cpu and huge lock contention
happens. This single cpu is stuck in softirq mode and cannot dequeue
fast enough.

    39.24%  [kernel]          [k] _raw_spin_lock
     2.65%  [kernel]          [k] netem_enqueue
     1.80%  [kernel]          [k] netem_dequeue
     1.63%  [kernel]          [k] copy_user_enhanced_fast_string
     1.45%  [kernel]          [k] _raw_spin_lock_bh

By pinning qdisc timers on the cpu running the qdisc, we respect proper
XPS setting and remove this lock contention.

     5.84%  [kernel]          [k] netem_enqueue
     4.83%  [kernel]          [k] _raw_spin_lock
     2.92%  [kernel]          [k] copy_user_enhanced_fast_string

Current Qdiscs that benefit from this change are :

	netem, cbq, fq, hfsc, tbf, htb.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-26 00:26:48 -04:00
..
6lowpan 6lowpan: Allow 6LoWPAN to be modular 2014-08-07 11:44:18 -07:00
9p 9P: remove unnecessary break after return 2014-07-15 16:27:00 -07:00
802 net: set name_assign_type in alloc_netdev() 2014-07-15 16:12:48 -07:00
8021q net: Always untag vlan-tagged traffic on input. 2014-08-11 12:16:51 -07:00
appletalk Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-07-16 14:09:34 -07:00
atm atm: Convert pr_warning to pr_warn 2014-09-10 12:40:10 -07:00
ax25 net: Fix use after free by removing length arg from sk_data_ready callbacks. 2014-04-11 16:15:36 -04:00
batman-adv batman-adv: Fix parameter order of hlist_add_behind 2014-08-16 19:19:08 -07:00
bluetooth Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless 2014-09-08 11:14:56 -04:00
bridge Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-09-23 12:09:27 -04:00
caif caif: remove unnecessary break after goto 2014-07-15 16:27:01 -07:00
can can: add hash based access to single EFF frame filters 2014-05-19 09:38:24 +02:00
ceph libceph: do not hard code max auth ticket len 2014-09-10 20:08:36 +04:00
core net: Remove gso_send_check as an offload callback 2014-09-26 00:22:47 -04:00
dcb dcbnl : Fix misleading dcb_app->priority explanation 2014-07-30 17:21:05 -07:00
dccp inet: move ipv6only in sock_common 2014-07-01 23:46:21 -07:00
decnet af_decnet: Use time_after_eq 2014-08-22 12:23:11 -07:00
dns_resolver Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2014-08-06 08:06:39 -07:00
dsa net: dsa: add {get, set}_wol callbacks to slave devices 2014-09-22 14:41:23 -04:00
ethernet net: Add function for parsing the header length out of linear ethernet frames 2014-09-05 17:47:02 -07:00
hsr net/hsr: Remove left-over never-true conditional code. 2014-07-11 15:04:40 -07:00
ieee802154 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless 2014-09-08 11:14:56 -04:00
ipv4 net: Remove gso_send_check as an offload callback 2014-09-26 00:22:47 -04:00
ipv6 net: Remove gso_send_check as an offload callback 2014-09-26 00:22:47 -04:00
ipx net: Split sk_no_check into sk_no_check_{rx,tx} 2014-05-23 16:28:53 -04:00
irda irda: Fix rd_frame control field initialization in irlap_send_rd_frame() 2014-08-13 20:05:52 -07:00
iucv iucv: Convert pr_warning to pr_warn 2014-09-10 12:40:10 -07:00
key af_key: remove unnecessary break after return 2014-07-15 16:27:00 -07:00
l2tp l2tp: Refactor l2tp core driver to make use of the common UDP tunnel functions 2014-09-19 15:57:15 -04:00
lapb net/lapb: re-send packets on timeout 2013-09-23 16:52:45 -04:00
llc llc: remove noisy WARN from llc_mac_hdr_init 2014-01-28 18:01:32 -08:00
mac80211 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-09-23 12:09:27 -04:00
mac802154 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless 2014-09-08 11:14:56 -04:00
mpls net: Remove gso_send_check as an offload callback 2014-09-26 00:22:47 -04:00
netfilter Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next 2014-09-10 12:46:32 -07:00
netlabel Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2014-08-06 09:38:14 -07:00
netlink netlink: Annotate RCU locking for seq_file walker 2014-08-14 15:13:40 -07:00
netrom net: set name_assign_type in alloc_netdev() 2014-07-15 16:12:48 -07:00
nfc Merge tag 'master-2014-07-31' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next 2014-08-05 13:18:20 -07:00
openvswitch Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-09-23 12:09:27 -04:00
packet net: Pass a "more" indication down into netdev_start_xmit() code paths. 2014-09-01 17:39:55 -07:00
phonet net: set name_assign_type in alloc_netdev() 2014-07-15 16:12:48 -07:00
rds Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2014-06-12 14:27:40 -07:00
rfkill net: rfkill: gpio: Fix clock status 2014-09-22 16:02:15 -04:00
rose rose: use %*ph specifier 2014-09-07 16:07:25 -07:00
rxrpc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-09-23 12:09:27 -04:00
sched net: sched: use pinned timers 2014-09-26 00:26:48 -04:00
sctp net/ipv4: bind ip_nonlocal_bind to current netns 2014-09-09 11:27:09 -07:00
sunrpc NFS client updates for Linux 3.17 2014-08-13 18:13:19 -06:00
tipc tipc: fix sparse warnings 2014-09-10 14:00:58 -07:00
unix Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2014-06-12 14:27:40 -07:00
vmw_vsock vsock: Make transport the proto owner 2014-05-05 13:13:50 -04:00
wimax wimax: remove dead code 2013-11-21 13:09:42 -05:00
wireless Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-09-23 12:09:27 -04:00
x25 net: Fix use after free by removing length arg from sk_data_ready callbacks. 2014-04-11 16:15:36 -04:00
xfrm Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-09-23 12:09:27 -04:00
Kconfig 6lowpan: introduce new net/6lowpan directory 2014-07-12 01:53:30 +02:00
Makefile 6lowpan: introduce new net/6lowpan directory 2014-07-12 01:53:30 +02:00
compat.c net: sendmsg: fix NULL pointer dereference 2014-07-29 12:20:22 -07:00
nonet.c
socket.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-09-23 12:09:27 -04:00
sysctl_net.c net: Update the sysctl permissions handler to test effective uid/gid 2013-10-07 15:57:56 -04:00