linux/include/net
Eric Dumazet 271b72c7fa udp: RCU handling for Unicast packets.
Goals are :

1) Optimizing handling of incoming Unicast UDP frames, so that no memory
 writes should happen in the fast path.

 Note: Multicasts and broadcasts still will need to take a lock,
 because doing a full lockless lookup in this case is difficult.

2) No expensive operations in the socket bind/unhash phases :
  - No expensive synchronize_rcu() calls.

  - No added rcu_head in socket structure, increasing memory needs,
  but more important, forcing us to use call_rcu() calls,
  that have the bad property of making sockets structure cold.
  (rcu grace period between socket freeing and its potential reuse
   make this socket being cold in CPU cache).
  David did a previous patch using call_rcu() and noticed a 20%
  impact on TCP connection rates.
  Quoting Cristopher Lameter :
   "Right. That results in cacheline cooldown. You'd want to recycle
    the object as they are cache hot on a per cpu basis. That is screwed
    up by the delayed regular rcu processing. We have seen multiple
    regressions due to cacheline cooldown.
    The only choice in cacheline hot sensitive areas is to deal with the
    complexity that comes with SLAB_DESTROY_BY_RCU or give up on RCU."

  - Because udp sockets are allocated from dedicated kmem_cache,
  use of SLAB_DESTROY_BY_RCU can help here.

Theory of operation :
---------------------

As the lookup is lockfree (using rcu_read_lock()/rcu_read_unlock()),
special attention must be taken by readers and writers.

Use of SLAB_DESTROY_BY_RCU is tricky too, because a socket can be freed,
reused, inserted in a different chain or in worst case in the same chain
while readers could do lookups in the same time.

In order to avoid loops, a reader must check each socket found in a chain
really belongs to the chain the reader was traversing. If it finds a
mismatch, lookup must start again at the begining. This *restart* loop
is the reason we had to use rdlock for the multicast case, because
we dont want to send same message several times to the same socket.

We use RCU only for fast path.
Thus, /proc/net/udp still takes spinlocks.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-29 02:11:14 -07:00
..
9p 9p: fix sparse warnings 2008-10-22 18:54:47 -05:00
bluetooth include: replace __FUNCTION__ with __func__ 2008-10-16 11:21:30 -07:00
irda include: replace __FUNCTION__ with __func__ 2008-10-16 11:21:30 -07:00
iucv
netfilter net: replace uses of NIP6_FMT with %p6 2008-10-28 23:02:31 -07:00
netns net: implement emergency route cache rebulds when gc_elasticity is exceeded 2008-10-27 17:06:14 -07:00
phonet Phonet: include generic link-layer header size in MAX_PHONET_HEADER 2008-10-26 23:06:31 -07:00
sctp net: replace uses of NIP6_FMT with %p6 2008-10-28 23:02:31 -07:00
tc_act pkt_action: add new action skbedit 2008-09-12 16:30:20 -07:00
tipc tipc: Remove unneeded parameter to tipc_createport_raw() 2008-07-14 22:42:19 -07:00
act_api.h
addrconf.h netns: Add network namespace argument to rt6_fill_node() and ipv6_dev_get_saddr() 2008-08-14 15:33:21 -07:00
af_rxrpc.h
af_unix.h [PATCH] f_count may wrap around 2008-07-26 20:53:40 -04:00
ah.h
arp.h
atmclip.h
ax25.h
ax88796.h
cfg80211.h cfg80211: show interface type 2008-09-24 16:18:00 -04:00
checksum.h
cipso_ipv4.h cipso: Add support for native local labeling and fixup mapping names 2008-10-10 10:16:34 -04:00
compat.h net: Use standard structures for generic socket address structures. 2008-07-19 22:35:47 -07:00
datalink.h
dn_dev.h
dn_fib.h
dn_neigh.h
dn_nsp.h
dn_route.h
dn.h
dsa.h dsa: add support for Trailer tagging format 2008-10-08 17:24:16 -07:00
dsfield.h
dst.h net: reduce structures when XFRM=n 2008-10-28 13:24:06 -07:00
esp.h
fib_rules.h net: add fib_rules_ops to flush_cache method 2008-07-05 19:01:28 -07:00
flow.h ipv4: Loosen source address check on IPv4 output 2008-10-01 07:28:28 -07:00
garp.h vlan: Add GVRP support 2008-07-05 21:26:57 -07:00
gen_stats.h
genetlink.h netlink: Improve returned error codes 2008-06-03 16:36:54 -07:00
icmp.h mib: put icmpmsg statistics on struct net 2008-07-18 04:04:22 -07:00
ieee80211_crypt.h
ieee80211_radiotap.h include: use get/put_unaligned_* helpers 2008-07-25 10:53:26 -07:00
ieee80211.h include: replace __FUNCTION__ with __func__ 2008-10-16 11:21:30 -07:00
if_inet6.h ipv6: make struct ipv6_devconf static 2008-07-22 14:21:58 -07:00
inet6_connection_sock.h
inet6_hashtables.h inet: Don't lookup the socket if there's a socket attached to the skb 2008-10-07 12:41:01 -07:00
inet_common.h
inet_connection_sock.h net: more #ifdef CONFIG_COMPAT 2008-08-28 02:53:51 -07:00
inet_ecn.h
inet_frag.h
inet_hashtables.h inet: Don't lookup the socket if there's a socket attached to the skb 2008-10-07 12:41:01 -07:00
inet_sock.h tcp: Port redirection support for TCP 2008-10-01 07:46:49 -07:00
inet_timewait_sock.h ipv4: Implement IP_TRANSPARENT socket option 2008-10-01 07:30:02 -07:00
inetpeer.h net: remove CVS keywords 2008-06-11 21:00:38 -07:00
ip6_checksum.h
ip6_fib.h
ip6_route.h netns: Add network namespace argument to rt6_fill_node() and ipv6_dev_get_saddr() 2008-08-14 15:33:21 -07:00
ip6_tunnel.h net: remove CVS keywords 2008-06-11 21:00:38 -07:00
ip_fib.h
ip_vs.h net: replace uses of NIP6_FMT with %p6 2008-10-28 23:02:31 -07:00
ip.h sysctl: simplify ->strategy 2008-10-16 11:21:47 -07:00
ipcomp.h ipsec: ipcomp - Merge IPComp implementations 2008-07-25 02:54:40 -07:00
ipconfig.h net: remove CVS keywords 2008-06-11 21:00:38 -07:00
ipip.h inet: Make tunnel RX/TX byte counters more consistent 2008-10-09 12:03:17 -07:00
ipv6.h ipv6: making ip and icmp statistics per/namespace 2008-10-08 11:16:45 -07:00
ipx.h
iw_handler.h wext: Emit event stream entries correctly when compat. 2008-06-16 18:50:49 -07:00
lapb.h
llc_c_ac.h
llc_c_ev.h
llc_c_st.h
llc_conn.h
llc_if.h
llc_pdu.h
llc_s_ac.h
llc_s_ev.h
llc_s_st.h
llc_sap.h
llc.h
mac80211.h mac80211: fixme for kernel-doc 2008-10-14 21:12:37 -04:00
mip6.h
ndisc.h sysctl: simplify ->strategy 2008-10-16 11:21:47 -07:00
neighbour.h core: add stat to track unresolved discards in neighbor cache 2008-07-16 20:50:49 -07:00
net_namespace.h netfilter: netns nf_conntrack: add netns boilerplate 2008-10-08 11:35:02 +02:00
netdma.h
netevent.h
netlabel.h netlabel: Add configuration support for local labeling 2008-10-10 10:16:34 -04:00
netlink.h netlink: constify struct nlattr * arg to parsing functions 2008-10-28 11:59:11 -07:00
netrom.h
nexthop.h
p8022.h
pkt_cls.h
pkt_sched.h pkt_sched: Remove the tx queue state check in qdisc_run() 2008-09-23 01:05:56 -07:00
protocol.h
psnap.h
raw.h
rawv6.h
red.h
request_sock.h tcp: Fix kernel panic when calling tcp_v(4/6)_md5_do_lookup 2008-08-06 23:50:04 -07:00
rose.h rose: improving AX25 routing frames via ROSE network 2008-06-17 17:08:32 -07:00
route.h ipv4: Conditionally enable transparent flow flag when connecting 2008-10-01 07:35:39 -07:00
rtnetlink.h
sch_generic.h pkt_sched: Fix handling of gso skbs on requeuing 2008-10-06 09:54:39 -07:00
scm.h
slhc_vj.h
snmp.h net: remove CVS keywords 2008-06-11 21:00:38 -07:00
sock.h udp: RCU handling for Unicast packets. 2008-10-29 02:11:14 -07:00
stp.h net: Add STP demux layer 2008-07-05 21:25:39 -07:00
syncppp.h Remove bogus variables from syncppp.[ch] 2008-07-23 23:00:31 +02:00
tcp_states.h
tcp.h net: wrap sk->sk_backlog_rcv() 2008-10-07 14:18:42 -07:00
timewait_sock.h
transp_v6.h net: change proto destroy method to return void 2008-06-14 17:04:49 -07:00
udp.h udp: introduce struct udp_table and multiple spinlocks 2008-10-29 01:41:45 -07:00
udplite.h udp: introduce struct udp_table and multiple spinlocks 2008-10-29 01:41:45 -07:00
wext.h wext: Dispatch and handle compat ioctls entirely in net/wireless/wext.c 2008-06-16 18:32:46 -07:00
wireless.h cfg80211: show interface type 2008-09-24 16:18:00 -04:00
x25.h
x25device.h
xfrm.h net: reduce structures when XFRM=n 2008-10-28 13:24:06 -07:00