linux/net
Neil Horman 7b363e4400 netpoll: fix race on poll_list resulting in garbage entry
A few months back a race was discused between the netpoll napi service
path, and the fast path through net_rx_action:
http://kerneltrap.org/mailarchive/linux-netdev/2007/10/16/345470

A patch was submitted for that bug, but I think we missed a case.

Consider the following scenario:

INITIAL STATE
CPU0 has one napi_struct A on its poll_list
CPU1 is calling netpoll_send_skb and needs to call poll_napi on the same
napi_struct A that CPU0 has on its list



CPU0						CPU1
net_rx_action					poll_napi
!list_empty (returns true)			locks poll_lock for A
						 poll_one_napi
						  napi->poll
						   netif_rx_complete
						    __napi_complete
						    (removes A from poll_list)
list_entry(list->next)


In the above scenario, net_rx_action assumes that the per-cpu poll_list is
exclusive to that cpu.  netpoll of course violates that, and because the netpoll
path can dequeue from the poll list, its possible for CPU0 to detect a non-empty
list at the top of the while loop in net_rx_action, but have it become empty by
the time it calls list_entry.  Since the poll_list isn't surrounded by any other
structure, the returned data from that list_entry call in this situation is
garbage, and any number of crashes can result based on what exactly that garbage
is.

Given that its not fasible for performance reasons to place exclusive locks
arround each cpus poll list to provide that mutal exclusion, I think the best
solution is modify the netpoll path in such a way that we continue to guarantee
that the poll_list for a cpu is in fact exclusive to that cpu.  To do this I've
implemented the patch below.  It adds an additional bit to the state field in
the napi_struct.  When executing napi->poll from the netpoll_path, this bit will
be set. When a driver calls netif_rx_complete, if that bit is set, it will not
remove the napi_struct from the poll_list.  That work will be saved for the next
iteration of net_rx_action.

I've tested this and it seems to work well.  About the biggest drawback I can
see to it is the fact that it might result in an extra loop through
net_rx_action in the event that the device is actually contended for (i.e. the
netpoll path actually preforms all the needed work no the device, and the call
to net_rx_action winds up doing nothing, except removing the napi_struct from
the poll_list.  However I think this is probably a small price to pay, given
that the alternative is a crash.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-09 23:22:26 -08:00
..
9p 9p: restrict RDMA usage 2008-11-12 23:33:57 -08:00
802 net/802/fc.c: Fix compilation warnings 2008-10-15 00:13:53 -07:00
8021q vlan: Fix typos in proc output string 2008-11-10 13:37:40 -08:00
appletalk net: Rationalise email address: Network Specific Parts 2008-10-13 19:01:08 -07:00
atm ATM: CVE-2008-5079: duplicate listen() on socket corrupts the vcc table 2008-12-04 14:58:13 -08:00
ax25 ax25: Quick fix for making sure unaccepted sockets get destroyed. 2008-10-06 12:53:50 -07:00
bluetooth Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 2008-10-17 08:58:52 -07:00
bridge bridge: netfilter: fix update_pmtu crash with GRE 2008-11-24 16:06:50 -08:00
can can: omit received RTR frames for single ID filter lists 2008-12-04 15:01:08 -08:00
core netpoll: fix race on poll_list resulting in garbage entry 2008-12-09 23:22:26 -08:00
dccp dccp: Port redirection support for DCCP 2008-10-19 23:36:47 -07:00
decnet Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 2008-10-17 08:58:52 -07:00
dsa dsa: fix master interface allmulti/promisc handling 2008-11-10 21:53:12 -08:00
econet netns: Use net_eq() to compare net-namespaces for optimization. 2008-07-19 22:34:43 -07:00
ethernet dsa: add support for Trailer tagging format 2008-10-08 17:24:16 -07:00
ieee80211 net/ieee80211: adjust error handling 2008-08-22 16:29:49 -04:00
ipv4 tcp: tcp_vegas cong avoid fix 2008-12-09 00:13:04 -08:00
ipv6 ipv6: silence log messages for locally generated multicast 2008-12-09 15:48:32 -08:00
ipx netns: Use net_eq() to compare net-namespaces for optimization. 2008-07-19 22:34:43 -07:00
irda Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 2008-07-20 17:43:29 -07:00
iucv iucv: Fix mismerge again. 2008-09-30 03:03:35 -07:00
key af_key: mark policy as dead before destroying 2008-11-06 23:08:37 -08:00
lapb [LAPB] net/lapb/lapb_iface.c: use LIST_HEAD instead of LIST_HEAD_INIT 2008-01-28 14:56:52 -08:00
llc netns: Use net_eq() to compare net-namespaces for optimization. 2008-07-19 22:34:43 -07:00
mac80211 mac80211: use unaligned safe memcmp() in-place of compare_ether_addr() 2008-12-05 09:18:35 -05:00
netfilter tproxy: fixe a possible read from an invalid location in the socket match 2008-12-07 23:53:46 -08:00
netlabel netlabel: Fix a potential NULL pointer dereference 2008-12-03 00:37:04 -08:00
netlink net: Remove CONFIG_KMOD from net/ (towards removing CONFIG_KMOD entirely) 2008-10-16 15:24:51 -07:00
netrom netrom: Fix sock_orphan() use in nr_release 2008-10-06 12:54:57 -07:00
packet net: convert BUG_TRAP to generic WARN_ON 2008-07-25 21:43:18 -07:00
phonet Phonet: do not dump addresses from other namespaces 2008-12-03 15:42:09 -08:00
rfkill Fix logic error in rfkill_check_duplicity 2008-11-06 16:37:09 -05:00
rose rose: zero length frame filtering in af_rose.c 2008-11-25 00:56:20 -08:00
rxrpc net/rxrpc: Use an IS_ERR test rather than a NULL test 2008-08-13 02:40:48 -07:00
sched pkt_sched: fix missing check for packet overrun in qdisc_dump_stab() 2008-11-20 04:07:14 -08:00
sctp sctp: Fix to handle SHUTDOWN in SHUTDOWN_RECEIVED state 2008-10-23 01:01:18 -07:00
sunrpc SUNRPC: Fix a performance regression in the RPC authentication code 2008-11-20 13:17:40 -08:00
tipc tipc: Don't use structure names which easily globally conflict. 2008-09-02 23:38:32 -07:00
unix net: Fix soft lockups/OOM issues w/ unix garbage collector 2008-11-26 15:32:27 -08:00
wanrouter wanmain.c doesn't need syncppp.h 2008-07-23 23:00:36 +02:00
wireless net/wireless/reg.c: fix bad WARN_ON in if statement 2008-11-25 16:13:09 -05:00
x25 netns: Use net_eq() to compare net-namespaces for optimization. 2008-07-19 22:34:43 -07:00
xfrm xfrm: Fix kernel panic when flush and dump SPD entries 2008-12-03 00:27:18 -08:00
Kconfig net: Distributed Switch Architecture protocol support 2008-10-08 17:15:19 -07:00
Makefile net: Distributed Switch Architecture protocol support 2008-10-08 17:15:19 -07:00
TUNABLE
compat.c reintroduce accept4 2008-11-19 18:49:57 -08:00
nonet.c
socket.c reintroduce accept4 2008-11-19 18:49:57 -08:00
sysctl_net.c missing bits of net-namespace / sysctl 2008-07-27 09:45:34 -07:00