linux/net/sched
Daniel Borkmann c78e1746d3 net: sched: fix call_rcu() race on classifier module unloads
Vijay reported that a loop as simple as ...

  while true; do
    tc qdisc add dev foo root handle 1: prio
    tc filter add dev foo parent 1: u32 match u32 0 0 flowid 1
    tc qdisc del dev foo root
    rmmod cls_u32
  done

... will panic the kernel. He also bisected the change that
apparently introduced it to 78fd1d0ab0 ("netlink: Re-add
locking to netlink_lookup() and seq walker").

The removal of synchronize_net() from the netlink socket that
triggers the qdisc removal seems to have uncovered a race
between RCU callbacks and module reference counting in the tc
API. Since the RCU conversion was done after e341694e3e
("netlink: Convert netlink_lookup() to use RCU protected hash
table"), which originally added the synchronize_net(), the bug
was less likely to be hit (though not impossible):

When qdiscs that i) support attaching classifiers and
ii) have at least one classifier attached get deleted, they
invoke tcf_destroy_chain() and thus call into the ->destroy()
handler of a classifier module.

After the RCU conversion, all classifiers that have an internal
prio list unlink their entries and defer freeing them via
call_rcu().

Meanwhile, tcf_destroy() already releases its reference to the
tp->ops->owner module before the queued RCU callback handler
has been invoked.

A subsequent rmmod of the classifier module is therefore not
prevented, since all module references have already been dropped.

By the time the kernel invokes the RCU callback handler from
the module, that function address is invalid.
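To make the ordering concrete, the sequence above can be replayed in a small user-space model. This is a hypothetical sketch, not kernel code: model_call_rcu(), model_tcf_destroy(), model_rmmod() and the other helpers are invented stand-ins that only mimic the order of events described here (->destroy() queues a callback, tcf_destroy() drops the owner reference, rmmod succeeds, and the callback runs last).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these are kernel APIs. */
struct module_model {
	int refcnt;   /* models the module reference count */
	bool loaded;  /* becomes false once "rmmod" succeeds */
};

struct rcu_cb {
	struct module_model *owner; /* module whose text holds the callback */
};

#define MAX_CBS 8
static struct rcu_cb pending[MAX_CBS];
static size_t npending;

/* Models call_rcu(): queue a callback that lives in module m. */
static void model_call_rcu(struct module_model *m)
{
	assert(npending < MAX_CBS);
	pending[npending++].owner = m;
}

/* Models a classifier's ->destroy(): unlink, then defer the free. */
static void model_destroy(struct module_model *m)
{
	model_call_rcu(m);
}

/* Models tcf_destroy(): call ->destroy(), then drop the module ref
 * while the deferred callback is still queued. */
static void model_tcf_destroy(struct module_model *m)
{
	model_destroy(m);
	m->refcnt--;
}

/* Models rmmod: succeeds whenever no references remain. */
static bool model_rmmod(struct module_model *m)
{
	if (m->refcnt > 0)
		return false;
	m->loaded = false;
	return true;
}

/* Models RCU invoking the queued callbacks later; returns how many
 * would have jumped into unloaded module text. */
static int model_run_callbacks(void)
{
	int bad = 0;

	for (size_t i = 0; i < npending; i++)
		if (!pending[i].owner->loaded)
			bad++;
	npending = 0;
	return bad;
}

/* Replays "tc qdisc del" followed by "rmmod cls_u32". */
static int replay_race(void)
{
	/* refcnt 1 models the reference held by the attached filter */
	struct module_model cls_u32 = { .refcnt = 1, .loaded = true };

	model_tcf_destroy(&cls_u32);
	bool unloaded = model_rmmod(&cls_u32);
	assert(unloaded); /* nothing stops the unload */
	return model_run_callbacks();
}
```

Calling replay_race() returns 1: the single deferred callback runs after its module has been marked unloaded, which corresponds to the invalid function address described above.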

One way to fix it would be to add an rcu_barrier() to
unregister_tcf_proto_ops() to wait for all pending call_rcu()s
to complete.

synchronize_rcu() is not appropriate, since under heavy RCU
callback load, callbacks queued via call_rcu() can still be
pending after a grace period has elapsed. If there are no
pending call_rcu()s, the barrier is allowed to return immediately.
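The difference between the two primitives can be sketched with a similar hypothetical user-space model; again, none of these helpers are kernel APIs. It assumes the heavy-load case described above, where model_synchronize_rcu() returns after a grace period while the queued callback is still pending, whereas model_rcu_barrier() returns only after every previously queued callback has run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these are kernel APIs. */
struct module_model {
	int refcnt;
	bool loaded;
};

#define MAX_CBS 8
static struct module_model *pending[MAX_CBS]; /* owner of each callback */
static size_t npending;
static unsigned long grace_periods;

static void model_call_rcu(struct module_model *owner)
{
	assert(npending < MAX_CBS);
	pending[npending++] = owner;
}

/* Invoke every queued callback; count those whose module is gone. */
static int model_invoke_callbacks(void)
{
	int bad = 0;

	for (size_t i = 0; i < npending; i++)
		if (!pending[i]->loaded)
			bad++;
	npending = 0;
	return bad;
}

/* Models synchronize_rcu() under heavy callback load: a grace
 * period elapses, but queued callbacks have not been invoked yet. */
static void model_synchronize_rcu(void)
{
	grace_periods++;
}

/* Models rcu_barrier(): returns once every callback queued before
 * it has run; returns immediately when nothing is queued. */
static int model_rcu_barrier(void)
{
	return model_invoke_callbacks();
}

/* Models unregistering the classifier and unloading its module.
 * Returns the number of callbacks that run after the text is gone. */
static int unload_after(bool use_barrier)
{
	struct module_model cls = { .refcnt = 1, .loaded = true };
	int bad = 0;

	model_call_rcu(&cls); /* ->destroy() deferred the free */
	cls.refcnt--;         /* tcf_destroy() dropped the ref early */

	if (use_barrier)
		bad += model_rcu_barrier();
	else
		model_synchronize_rcu();

	assert(cls.refcnt == 0);          /* rmmod is not prevented */
	cls.loaded = false;               /* module text is gone */
	bad += model_invoke_callbacks();  /* RCU runs leftovers later */
	return bad;
}
```

unload_after(true) returns 0 and unload_after(false) returns 1. With nothing queued, model_rcu_barrier() returns at once, mirroring the fast path described above.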

Since we came here via unregister_tcf_proto_ops(), there are
no remaining users of the given classifier. No further nested
call_rcu()s pointing into the module's address space are issued
anywhere.

Only cls_bpf_delete_prog() may schedule a work item to
eventually unlock pages, but that work is no longer within
the range/context of cls_bpf.

Fixes: 25d8c0d55f ("net: rcu-ify tcf_proto")
Fixes: 9888faefe1 ("net: sched: cls_basic use RCU")
Reported-by: Vijay Subramanian <subramanian.vijay@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: John Fastabend <john.r.fastabend@intel.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Tested-by: Vijay Subramanian <subramanian.vijay@gmail.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-21 18:48:18 -04:00
Kconfig Merge branch 'kconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild 2015-02-19 10:36:45 -08:00
Makefile net: sched: Introduce connmark action 2015-01-19 16:02:06 -05:00
act_api.c net: sched: enable per cpu qstats 2014-09-30 01:02:26 -04:00
act_bpf.c bpf: fix bpf helpers to use skb->mac_header relative offsets 2015-04-16 14:08:49 -04:00
act_connmark.c net: sched: act_connmark: don't zap skb->nfct 2015-04-29 14:56:40 -04:00
act_csum.c net: sched: fix skb->protocol use in case of accelerated vlan path 2015-01-13 17:51:08 -05:00
act_gact.c sched: fix act file names in header comment 2014-11-06 15:04:41 -05:00
act_ipt.c sched: fix act file names in header comment 2014-11-06 15:04:41 -05:00
act_mirred.c act_mirred: Fix bogus header when redirecting from VLAN 2015-04-17 13:29:28 -04:00
act_nat.c
act_pedit.c sched: fix act file names in header comment 2014-11-06 15:04:41 -05:00
act_police.c sched: fix act file names in header comment 2014-11-06 15:04:41 -05:00
act_simple.c sched: fix act file names in header comment 2014-11-06 15:04:41 -05:00
act_skbedit.c
act_vlan.c sched: introduce vlan action 2014-11-21 14:20:18 -05:00
cls_api.c net: sched: fix call_rcu() race on classifier module unloads 2015-05-21 18:48:18 -04:00
cls_basic.c net_sched: destroy proto tp when all filters are gone 2015-03-09 15:35:55 -04:00
cls_bpf.c bpf: fix bpf helpers to use skb->mac_header relative offsets 2015-04-16 14:08:49 -04:00
cls_cgroup.c net_sched: destroy proto tp when all filters are gone 2015-03-09 15:35:55 -04:00
cls_flow.c net_sched: destroy proto tp when all filters are gone 2015-03-09 15:35:55 -04:00
cls_fw.c net_sched: destroy proto tp when all filters are gone 2015-03-09 15:35:55 -04:00
cls_route.c net_sched: destroy proto tp when all filters are gone 2015-03-09 15:35:55 -04:00
cls_rsvp.c
cls_rsvp.h net_sched: destroy proto tp when all filters are gone 2015-03-09 15:35:55 -04:00
cls_rsvp6.c
cls_tcindex.c net_sched: destroy proto tp when all filters are gone 2015-03-09 15:35:55 -04:00
cls_u32.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-03-20 18:51:09 -04:00
em_canid.c net: sched: remove tcf_proto from ematch calls 2014-10-06 18:02:32 -04:00
em_cmp.c
em_ipset.c net: sched: fix skb->protocol use in case of accelerated vlan path 2015-01-13 17:51:08 -05:00
em_meta.c net: rename vlan_tx_* helpers since "tx" is misleading there 2015-01-13 17:51:08 -05:00
em_nbyte.c net: sched: remove tcf_proto from ematch calls 2014-10-06 18:02:32 -04:00
em_text.c net: Remove state argument from skb_find_text() 2015-02-22 15:59:54 -05:00
em_u32.c
ematch.c ematch: Fix auto-loading of ematch modules. 2015-02-20 15:30:56 -05:00
sch_api.c net_sched: destroy proto tp when all filters are gone 2015-03-09 15:35:55 -04:00
sch_atm.c net: sched: enable per cpu qstats 2014-09-30 01:02:26 -04:00
sch_blackhole.c
sch_cbq.c net: sched: enable per cpu qstats 2014-09-30 01:02:26 -04:00
sch_choke.c net: sched: implement qstat helper routines 2014-09-30 01:02:26 -04:00
sch_codel.c codel: fix maxpacket/mtu confusion 2015-05-03 22:17:40 -04:00
sch_drr.c net: sched: enable per cpu qstats 2014-09-30 01:02:26 -04:00
sch_dsmark.c net: sched: fix skb->protocol use in case of accelerated vlan path 2015-01-13 17:51:08 -05:00
sch_fifo.c net: sched: implement qstat helper routines 2014-09-30 01:02:26 -04:00
sch_fq.c pkt_sched: fq: correct spelling of locally 2015-04-01 22:52:29 -04:00
sch_fq_codel.c codel: fix maxpacket/mtu confusion 2015-05-03 22:17:40 -04:00
sch_generic.c net_sched: restore qdisc quota fairness limits after bulk dequeue 2014-10-09 19:12:26 -04:00
sch_gred.c net_sched: gred: use correct backlog value in WRED mode 2015-05-11 13:26:26 -04:00
sch_hfsc.c net: sched: enable per cpu qstats 2014-09-30 01:02:26 -04:00
sch_hhf.c net: sched: implement qstat helper routines 2014-09-30 01:02:26 -04:00
sch_htb.c net: sched: enable per cpu qstats 2014-09-30 01:02:26 -04:00
sch_ingress.c net: use jump label patching for ingress qdisc in __netif_receive_skb_core 2015-04-13 13:34:40 -04:00
sch_mq.c net: sched: enable per cpu qstats 2014-09-30 01:02:26 -04:00
sch_mqprio.c net: sched: enable per cpu qstats 2014-09-30 01:02:26 -04:00
sch_multiq.c net: sched: enable per cpu qstats 2014-09-30 01:02:26 -04:00
sch_netem.c netem: Fixes byte backlog accounting for the first of two chained netem instances 2015-04-07 18:34:24 -04:00
sch_pie.c sch_pie: schedule the timer after all init succeed 2014-10-29 14:28:01 -04:00
sch_plug.c
sch_prio.c net: sched: enable per cpu qstats 2014-09-30 01:02:26 -04:00
sch_qfq.c net: sched: enable per cpu qstats 2014-09-30 01:02:26 -04:00
sch_red.c net: sched: implement qstat helper routines 2014-09-30 01:02:26 -04:00
sch_sfb.c net: sched: implement qstat helper routines 2014-09-30 01:02:26 -04:00
sch_sfq.c net: sched: enable per cpu qstats 2014-09-30 01:02:26 -04:00
sch_tbf.c net: sched: avoid costly atomic operation in fq_dequeue() 2014-10-06 00:55:10 -04:00
sch_teql.c net: sched: fix skb->protocol use in case of accelerated vlan path 2015-01-13 17:51:08 -05:00