linux/include/net
Konstantin Khlebnikov b56141ab34 net: frag, fix race conditions in LRU list maintenance
This patch fixes a race between inet_frag_lru_move() and inet_frag_lru_add()
which was introduced in commit 3ef0eb0db4
("net: frag, move LRU list maintenance outside of rwlock").

One CPU has already added a new fragment queue to the hash but not yet to
the LRU list. Another CPU finds it in the hash and tries to move it to the
end of the LRU list. This leads to a NULL pointer dereference inside
list_move_tail().

Another possible race condition is between inet_frag_lru_move() and
inet_frag_lru_del(): the move can happen after the deletion.

This patch initializes the LRU list head before adding the fragment queue
to the hash, and inet_frag_lru_move() doesn't touch the entry if its list
head is empty.
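
The idea can be illustrated with a small user-space sketch. This is not the
kernel code: the list helpers below are simplified stand-ins for the ones in
<linux/list.h>, struct frag_queue, lru_move() and lru_demo() are hypothetical
names, and the lru_lock that the real functions take is omitted because the
sketch is single-threaded. It only shows why an initialized, self-pointing
list head plus an emptiness check makes both a "move before add" and a
"move after delete" harmless.

```c
#include <assert.h>

/* Simplified user-space stand-in for the kernel's struct list_head. */
struct list_head { struct list_head *next, *prev; };

static void init_list_head(struct list_head *h) { h->next = h; h->prev = h; }
static int list_empty(const struct list_head *h) { return h->next == h; }

/* Unlink the entry and leave it self-pointing, so it reads as "empty". */
static void list_del_init(struct list_head *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
    init_list_head(e);
}

static void list_add_tail(struct list_head *e, struct list_head *head)
{
    e->prev = head->prev;
    e->next = head;
    head->prev->next = e;
    head->prev = e;
}

static void list_move_tail(struct list_head *e, struct list_head *head)
{
    list_del_init(e);
    list_add_tail(e, head);
}

/* Hypothetical, stripped-down fragment queue: only the LRU linkage. */
struct frag_queue { struct list_head lru_list; };

/* Guarded move: skip entries that are not on the LRU list.
 * Assumption: in real code this check runs under the LRU lock. */
static void lru_move(struct frag_queue *q, struct list_head *lru)
{
    if (!list_empty(&q->lru_list))
        list_move_tail(&q->lru_list, lru);
}

/* Walks through the two racy orderings sequentially; returns 0 on success. */
static int lru_demo(void)
{
    struct list_head lru;
    struct frag_queue a, b;

    init_list_head(&lru);
    init_list_head(&a.lru_list);   /* initialized before it becomes visible */
    init_list_head(&b.lru_list);

    lru_move(&a, &lru);            /* move before add: no-op, no crash */
    assert(list_empty(&lru));

    list_add_tail(&a.lru_list, &lru);
    list_add_tail(&b.lru_list, &lru);
    lru_move(&a, &lru);            /* normal case: a goes to the tail */
    assert(lru.next == &b.lru_list && lru.prev == &a.lru_list);

    list_del_init(&a.lru_list);
    lru_move(&a, &lru);            /* move after delete: no-op */
    assert(lru.next == &b.lru_list && lru.prev == &b.lru_list);
    return 0;
}
```

Calling lru_demo() from a main() exercises all three orderings. Without the
init_list_head() call on a.lru_list, the first lru_move() would follow
uninitialized pointers, which corresponds to the failure in the oops below.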

I saw this kernel oops twice in a couple of days.

[119482.128853] BUG: unable to handle kernel NULL pointer dereference at           (null)
[119482.132693] IP: [<ffffffff812ede89>] __list_del_entry+0x29/0xd0
[119482.136456] PGD 2148f6067 PUD 215ab9067 PMD 0
[119482.140221] Oops: 0000 [#1] SMP
[119482.144008] Modules linked in: vfat msdos fat 8021q fuse nfsd auth_rpcgss nfs_acl nfs lockd sunrpc ppp_async ppp_generic bridge slhc stp llc w83627ehf hwmon_vid snd_hda_codec_hdmi snd_hda_codec_realtek kvm_amd k10temp kvm snd_hda_intel snd_hda_codec edac_core radeon snd_hwdep ath9k snd_pcm ath9k_common snd_page_alloc ath9k_hw snd_timer snd soundcore drm_kms_helper ath ttm r8169 mii
[119482.152692] CPU 3
[119482.152721] Pid: 20, comm: ksoftirqd/3 Not tainted 3.9.0-zurg-00001-g9f95269 #132 To Be Filled By O.E.M. To Be Filled By O.E.M./RS880D
[119482.161478] RIP: 0010:[<ffffffff812ede89>]  [<ffffffff812ede89>] __list_del_entry+0x29/0xd0
[119482.166004] RSP: 0018:ffff880216d5db58  EFLAGS: 00010207
[119482.170568] RAX: 0000000000000000 RBX: ffff88020882b9c0 RCX: dead000000200200
[119482.175189] RDX: 0000000000000000 RSI: 0000000000000880 RDI: ffff88020882ba00
[119482.179860] RBP: ffff880216d5db58 R08: ffffffff8155c7f0 R09: 0000000000000014
[119482.184570] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88020882ba00
[119482.189337] R13: ffffffff81c8d780 R14: ffff880204357f00 R15: 00000000000005a0
[119482.194140] FS:  00007f58124dc700(0000) GS:ffff88021fcc0000(0000) knlGS:0000000000000000
[119482.198928] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[119482.203711] CR2: 0000000000000000 CR3: 00000002155f0000 CR4: 00000000000007e0
[119482.208533] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[119482.213371] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[119482.218221] Process ksoftirqd/3 (pid: 20, threadinfo ffff880216d5c000, task ffff880216d3a9a0)
[119482.223113] Stack:
[119482.228004]  ffff880216d5dbd8 ffffffff8155dcda 0000000000000000 ffff000200000001
[119482.233038]  ffff8802153c1f00 ffff880000289440 ffff880200000014 ffff88007bc72000
[119482.238083]  00000000000079d5 ffff88007bc72f44 ffffffff00000002 ffff880204357f00
[119482.243090] Call Trace:
[119482.248009]  [<ffffffff8155dcda>] ip_defrag+0x8fa/0xd10
[119482.252921]  [<ffffffff815a8013>] ipv4_conntrack_defrag+0x83/0xe0
[119482.257803]  [<ffffffff8154485b>] nf_iterate+0x8b/0xa0
[119482.262658]  [<ffffffff8155c7f0>] ? inet_del_offload+0x40/0x40
[119482.267527]  [<ffffffff815448e4>] nf_hook_slow+0x74/0x130
[119482.272412]  [<ffffffff8155c7f0>] ? inet_del_offload+0x40/0x40
[119482.277302]  [<ffffffff8155d068>] ip_rcv+0x268/0x320
[119482.282147]  [<ffffffff81519992>] __netif_receive_skb_core+0x612/0x7e0
[119482.286998]  [<ffffffff81519b78>] __netif_receive_skb+0x18/0x60
[119482.291826]  [<ffffffff8151a650>] process_backlog+0xa0/0x160
[119482.296648]  [<ffffffff81519f29>] net_rx_action+0x139/0x220
[119482.301403]  [<ffffffff81053707>] __do_softirq+0xe7/0x220
[119482.306103]  [<ffffffff81053868>] run_ksoftirqd+0x28/0x40
[119482.310809]  [<ffffffff81074f5f>] smpboot_thread_fn+0xff/0x1a0
[119482.315515]  [<ffffffff81074e60>] ? lg_local_lock_cpu+0x40/0x40
[119482.320219]  [<ffffffff8106d870>] kthread+0xc0/0xd0
[119482.324858]  [<ffffffff8106d7b0>] ? insert_kthread_work+0x40/0x40
[119482.329460]  [<ffffffff816c32dc>] ret_from_fork+0x7c/0xb0
[119482.334057]  [<ffffffff8106d7b0>] ? insert_kthread_work+0x40/0x40
[119482.338661] Code: 00 00 55 48 8b 17 48 b9 00 01 10 00 00 00 ad de 48 8b 47 08 48 89 e5 48 39 ca 74 29 48 b9 00 02 20 00 00 00 ad de 48 39 c8 74 7a <4c> 8b 00 4c 39 c7 75 53 4c 8b 42 08 4c 39 c7 75 2b 48 89 42 08
[119482.343787] RIP  [<ffffffff812ede89>] __list_del_entry+0x29/0xd0
[119482.348675]  RSP <ffff880216d5db58>
[119482.353493] CR2: 0000000000000000

The oops happened on this path:
ip_defrag() -> ip_frag_queue() -> inet_frag_lru_move() -> list_move_tail() -> __list_del_entry()

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: Eric Dumazet <edumazet@google.com>
Cc: David S. Miller <davem@davemloft.net>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-06 11:06:51 -04:00
..
9p 9p: turn fid->dlist into hlist 2013-02-27 22:51:08 -05:00
bluetooth Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-05-01 17:51:54 -07:00
caif caif: Remove my bouncing email address. 2013-04-23 13:25:51 -04:00
irda irda: small read past the end of array in debug code 2013-04-19 17:32:31 -04:00
iucv af_iucv: fix recvmsg by replacing skb_pull() function 2013-04-08 17:16:57 -04:00
netfilter netfilter: move skb_gso_segment into nfnetlink_queue module 2013-04-29 20:09:05 +02:00
netns netfilter: nf_log: prepare net namespace support for loggers 2013-04-05 20:12:54 +02:00
nfc NFC: RFKILL support 2013-04-12 16:54:45 +02:00
phonet
sctp Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2013-05-01 14:08:52 -07:00
tc_act
act_api.h act_police: move struct tcf_police to act_police.c 2013-02-12 18:59:45 -05:00
addrconf.h ipv6: statically link register_inet6addr_notifier() 2013-04-14 15:24:17 -04:00
af_ieee802154.h
af_rxrpc.h
af_unix.h af_unix: fix a fatal race with bit fields 2013-05-01 15:13:49 -04:00
ah.h
arp.h
atmclip.h
ax25.h hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
ax88796.h
cfg80211-wext.h
cfg80211.h cfg80211: introduce critical protocol indication from user-space 2013-04-22 15:48:00 +02:00
checksum.h
cipso_ipv4.h
cls_cgroup.h cls_cgroup: remove task_struct parameter from sock_update_classid() 2013-04-09 13:19:35 -04:00
codel.h
compat.h
datalink.h
dcbevent.h
dcbnl.h
dn_dev.h
dn_fib.h decnet: Parse netlink attributes on our own 2013-03-22 10:31:16 -04:00
dn_neigh.h
dn_nsp.h
dn_route.h decnet: use correct RCU API to deref sk_dst_cache field 2013-01-28 00:15:27 -05:00
dn.h
dsa.h
dsfield.h ipv6: Optimize ipv6_change_dsfield(). 2013-01-09 23:59:53 -08:00
dst_ops.h
dst.h Fix dst_neigh_lookup/dst_neigh_lookup_skb return value handling bug 2013-03-15 09:06:58 -04:00
esp.h
ethoc.h
fib_rules.h
firewire.h firewire net, ipv4 arp: Extend hardware address and remove driver-level packet inspection. 2013-03-26 12:32:13 -04:00
flow_keys.h flow_keys: include thoff into flow_keys for later usage 2013-03-20 12:14:36 -04:00
flow.h
garp.h
gen_stats.h
genetlink.h genl: Allow concurrent genl callbacks. 2013-04-25 01:43:15 -04:00
gre.h GRE: Refactor GRE tunneling code. 2013-03-26 12:27:18 -04:00
gro_cells.h gro: Fix kcalloc argument order 2013-01-27 22:46:33 -05:00
icmp.h ipv4: fix error handling in icmp_protocol. 2013-02-22 15:10:18 -05:00
ieee80211_radiotap.h mac80211: support (partial) VHT radiotap information 2012-11-27 11:56:18 +01:00
ieee802154_netdev.h ieee802154/nl-mac.c: make some MLME operations optional 2013-04-08 12:00:16 -04:00
ieee802154.h
if_inet6.h net: ipv6: only invalidate previously tokenized addresses 2013-04-09 13:12:23 -04:00
inet6_connection_sock.h
inet6_hashtables.h ipv6: use a stronger hash for tcp 2013-02-21 18:15:58 -05:00
inet_common.h
inet_connection_sock.h tcp: Tail loss probe (TLP) 2013-03-12 08:30:34 -04:00
inet_ecn.h
inet_frag.h net: frag, fix race conditions in LRU list maintenance 2013-05-06 11:06:51 -04:00
inet_hashtables.h hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
inet_sock.h ipv6: use a stronger hash for tcp 2013-02-21 18:15:58 -05:00
inet_timewait_sock.h hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
inetpeer.h
ip6_checksum.h ipv6: move csum_ipv6_magic() and udp6_csum_init() into static library 2013-01-08 17:56:10 -08:00
ip6_fib.h ipv6: fix race condition regarding dst->expires and dst->from. 2013-02-20 15:11:45 -05:00
ip6_route.h ipv6: Remove unused neigh argument for icmp6_dst_alloc() and its callers. 2013-01-18 14:41:13 -05:00
ip6_tunnel.h GRE: Refactor GRE tunneling code. 2013-03-26 12:27:18 -04:00
ip_fib.h ipv4: fix definition of FIB_TABLE_HASHSZ 2013-03-13 10:47:09 -04:00
ip_tunnels.h GRE: Refactor GRE tunneling code. 2013-03-26 12:27:18 -04:00
ip_vs.h ipvs: fix sparse warnings for some parameters 2013-04-23 11:43:05 +09:00
ip.h ipv4: Add a socket release callback for datagram sockets 2013-01-21 14:17:05 -05:00
ipcomp.h
ipconfig.h
ipv6.h ipv6: implement RFC3168 5.3 (ecn protection) for ipv6 fragmentation handling 2013-03-24 17:16:30 -04:00
ipx.h
iw_handler.h
lapb.h
lib80211.h hostap: Don't use create_proc_read_entry() 2013-04-29 15:41:56 -04:00
llc_c_ac.h
llc_c_ev.h
llc_c_st.h
llc_conn.h
llc_if.h
llc_pdu.h
llc_s_ac.h
llc_s_ev.h
llc_s_st.h
llc_sap.h
llc.h
mac80211.h mac80211: improve the rate control API 2013-04-22 16:16:41 +02:00
mac802154.h
mip6.h
mld.h
mrp.h net/802: Implement Multiple Registration Protocol (MRP) 2013-02-10 20:37:22 -05:00
ndisc.h ndisc: Move ndisc_opt_addr_space() to include/net/ndisc.h. 2013-01-21 13:33:14 -05:00
neighbour.h net neighbour, decnet: Ensure to align device private data on preferred alignment. 2013-02-11 00:21:44 -05:00
net_namespace.h netfilter: make /proc/net/netfilter pernet 2013-04-05 19:35:02 +02:00
net_ratelimit.h
netdma.h
netevent.h ipv6 netevent: Remove old_neigh from netevent_redirect. 2013-01-14 15:04:59 -05:00
netlabel.h
netlink.h
netprio_cgroup.h netprio_cgroup: remove task_struct parameter from sock_update_netprio() 2013-04-09 13:19:37 -04:00
netrom.h hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
nexthop.h
nl802154.h
p8022.h
ping.h
pkt_cls.h pkt_sched: namespace aware act_mirred 2013-01-14 15:09:36 -05:00
pkt_sched.h sch_api: introduce qdisc_watchdog_schedule_ns() 2013-02-12 18:59:45 -05:00
protocol.h net: Remove code duplication between offload structures 2012-11-15 17:39:51 -05:00
psnap.h
raw.h
rawv6.h
red.h
regulatory.h regulatory: use RCU to protect last_request 2013-01-03 13:01:30 +01:00
request_sock.h net: remove a stale comment for dl_next 2013-04-22 15:55:48 -04:00
rose.h
route.h
rtnetlink.h rtnetlink: Remove passing of attributes into rtnl_doit functions 2013-03-22 10:31:16 -04:00
sch_generic.h hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
scm.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2013-04-22 20:32:51 -04:00
secure_seq.h net: defer net_secret[] initialization 2013-04-29 15:14:02 -04:00
slhc_vj.h
snmp.h
sock.h net: sock: make sock_tx_timestamp void 2013-04-14 15:41:49 -04:00
stp.h
tcp_memcontrol.h
tcp_states.h
tcp.h tcp: GSO should be TSQ friendly 2013-04-12 18:17:06 -04:00
timewait_sock.h
transp_v6.h ipv6: rename datagram_send_ctl and datagram_recv_ctl 2013-01-31 13:53:08 -05:00
udp.h
udplite.h
wext.h
wimax.h
wpan-phy.h
x25.h
x25device.h
xfrm.h xfrm: allow to avoid copying DSCP during encapsulation 2013-03-06 07:02:45 +01:00