linux

Commit Graph

Author	SHA1	Message	Date
David S. Miller	67f11f4ded	net: Remove linux/prefetch.h include from linux/skbuff.h No longer needed. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-22 20:54:11 -04:00
David S. Miller	0fcbe742ea	net: Remove prefetches from SKB list handlers. Noticed by Linus. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-22 20:35:29 -04:00
Heiko Carstens	34ea646c9f	net: add missing prefetch.h include Fixes build errors on s390 and probably other archs as well: In file included from net/ipv4/ip_forward.c:32:0: include/net/udp.h: In function 'udp_csum_outgoing': include/net/udp.h:141:2: error: implicit declaration of function 'prefetch' Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-05-22 11:26:02 -07:00
Eric Dumazet	0a14842f5a	net: filter: Just In Time compiler for x86-64 In order to speedup packet filtering, here is an implementation of a JIT compiler for x86_64 It is disabled by default, and must be enabled by the admin. echo 1 >/proc/sys/net/core/bpf_jit_enable It uses module_alloc() and module_free() to get memory in the 2GB text kernel range since we call helpers functions from the generated code. EAX : BPF A accumulator EBX : BPF X accumulator RDI : pointer to skb (first argument given to JIT function) RBP : frame pointer (even if CONFIG_FRAME_POINTER=n) r9d : skb->len - skb->data_len (headlen) r8 : skb->data To get a trace of generated code, use : echo 2 >/proc/sys/net/core/bpf_jit_enable Example of generated code : # tcpdump -p -n -s 0 -i eth1 host 192.168.20.0/24 flen=18 proglen=147 pass=3 image=ffffffffa00b5000 JIT code: ffffffffa00b5000: 55 48 89 e5 48 83 ec 60 48 89 5d f8 44 8b 4f 60 JIT code: ffffffffa00b5010: 44 2b 4f 64 4c 8b 87 b8 00 00 00 be 0c 00 00 00 JIT code: ffffffffa00b5020: e8 24 7b f7 e0 3d 00 08 00 00 75 28 be 1a 00 00 JIT code: ffffffffa00b5030: 00 e8 fe 7a f7 e0 24 00 3d 00 14 a8 c0 74 49 be JIT code: ffffffffa00b5040: 1e 00 00 00 e8 eb 7a f7 e0 24 00 3d 00 14 a8 c0 JIT code: ffffffffa00b5050: 74 36 eb 3b 3d 06 08 00 00 74 07 3d 35 80 00 00 JIT code: ffffffffa00b5060: 75 2d be 1c 00 00 00 e8 c8 7a f7 e0 24 00 3d 00 JIT code: ffffffffa00b5070: 14 a8 c0 74 13 be 26 00 00 00 e8 b5 7a f7 e0 24 JIT code: ffffffffa00b5080: 00 3d 00 14 a8 c0 75 07 b8 ff ff 00 00 eb 02 31 JIT code: ffffffffa00b5090: c0 c9 c3 BPF program is 144 bytes long, so native program is almost same size ;) (000) ldh [12] (001) jeq #0x800 jt 2 jf 8 (002) ld [26] (003) and #0xffffff00 (004) jeq #0xc0a81400 jt 16 jf 5 (005) ld [30] (006) and #0xffffff00 (007) jeq #0xc0a81400 jt 16 jf 17 (008) jeq #0x806 jt 10 jf 9 (009) jeq #0x8035 jt 10 jf 17 (010) ld [28] (011) and #0xffffff00 (012) jeq #0xc0a81400 jt 16 jf 13 (013) ld [38] (014) and #0xffffff00 (015) jeq #0xc0a81400 jt 16 jf 17 (016) ret #65535 (017) ret #0 Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Ben Hutchings <bhutchings@solarflare.com> Cc: Hagen Paul Pfeifer <hagen@jauu.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-04-27 23:05:08 -07:00
Linus Torvalds	42933bac11	Merge branch 'for-linus2' of git://git.profusion.mobi/users/lucas/linux-2.6 * 'for-linus2' of git://git.profusion.mobi/users/lucas/linux-2.6: Fix common misspellings	2011-04-07 11:14:49 -07:00
Lucas De Marchi	25985edced	Fix common misspellings Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>	2011-03-31 11:26:23 -03:00
David S. Miller	eec009548e	net: Fix warnings caused by MAX_SKB_FRAGS change. After commit `a715dea3c8` ("net: Always allocate at least 16 skb frags regardless of page size"), the value of MAX_SKB_FRAGS can now take on either an "unsigned long" or an "int" value. This causes warnings like: net/packet/af_packet.c: In function ‘tpacket_fill_skb’: net/packet/af_packet.c:948: warning: format ‘%lu’ expects type ‘long unsigned int’, but argument 2 has type ‘int’ Fix by forcing the constant to be unsigned long, otherwise we have a situation where the type of a system wide constant is variable. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-29 23:34:08 -07:00
Anton Blanchard	a715dea3c8	net: Always allocate at least 16 skb frags regardless of page size When analysing performance of the cxgb3 on a ppc64 box I noticed that we weren't doing much GRO merging. It turns out we are limited by the number of SKB frags: #define MAX_SKB_FRAGS (65536/PAGE_SIZE + 2) With a 4kB page size we have 18 frags, but with a 64kB page size we only have 3 frags. I ran a single stream TCP bandwidth test to compare the performance of Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-28 22:26:32 -07:00
Jiri Pirko	8a4eb5734e	net: introduce rx_handler results and logic around that This patch allows rx_handlers to better signal to their caller what to do next. That makes skb->deliver_no_wcard no longer needed. kernel-doc for rx_handler_result is taken from Nicolas' patch. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Reviewed-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-16 12:53:54 -07:00
Michał Mirosław	04ed3e741d	net: change netdev->features to u32 Quoting Ben Hutchings: we presumably won't be defining features that can only be enabled on 64-bit architectures. Occurrences found by `grep -r` on net/, drivers/net, include/ [ Move features and vlan_features next to each other in struct netdev, as per Eric Dumazet's suggestion -DaveM ] Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-24 15:32:47 -08:00
David S. Miller	686a295553	net: Add safe reverse SKB queue walkers. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-20 22:47:32 -08:00
KOVACS Krisztian	2fc72c7b84	netfilter: fix compilation when conntrack is disabled but tproxy is enabled The IPv6 tproxy patches split IPv6 defragmentation off of conntrack, but failed to update the #ifdef stanzas guarding the defragmentation related fields and code in skbuff and conntrack related code in nf_defrag_ipv6.c. This patch adds the required #ifdefs so that IPv6 tproxy can truly be used without connection tracking. Original report: http://marc.info/?l=linux-netdev&m=129010118516341&w=2 Reported-by: Randy Dunlap <randy.dunlap@oracle.com> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: KOVACS Krisztian <hidden@balabit.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2011-01-12 20:25:08 +01:00
Michał Mirosław	04fb451eff	net: Introduce skb_checksum_start_offset() Introduce skb_checksum_start_offset() to replace repetitive calculation. Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-16 14:43:14 -08:00
Vladislav Zolotarov	a3d22a68d7	bnx2x: Take the distribution range definition out of skb_tx_hash() Move the calculation of the Tx hash for a given hash range into a separate function and define the skb_tx_hash(), which calculates a Tx hash for a [0; dev->real_num_tx_queues - 1] hash values range, using this function (__skb_tx_hash()). Signed-off-by: Vladislav Zolotarov <vladz@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-16 13:15:53 -08:00
Tom Herbert	3853b5841c	xps: Improvements in TX queue selection In dev_pick_tx, don't do work in calculating queue index or setting the index in the sock unless the device has more than one queue. This allows the sock to be set only with a queue index of a multi-queue device which is desirable if devices are stacked like in a tunnel. We also allow the mapping of a socket to queue to be changed. To maintain in order packet transmission a flag (ooo_okay) has been added to the sk_buff structure. If a transport layer sets this flag on a packet, the transmit queue can be changed for the socket. Presumably, the transport would set this if there was no possibility of creating OOO packets (for instance, there are no packets in flight for the socket). This patch includes the modification in TCP output for setting this flag. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-24 11:44:19 -08:00
Eric Dumazet	27b75c95f1	net: avoid RCU for NOCACHE dst There is no point using RCU for dst we allocate for a very short time (used once). Change dst_release() to take DST_NOCACHE into account, but also change skb_dst_set_noref() to force a refcount increment for such dst. This is a _huge_ gain, because we dont waste memory to store xx thousand of dsts. Instead of queueing them to RCU, we can free them instantly. CPU caches can stay hot, re-using same memory blocks to hold temporary dsts. Note : remove unneeded smp_mb__before_atomic_dec(); in dst_release(), since atomic_dec_return() implies a full memory barrier. Stress test, 160.000.000 udp frames sent, IP route cache disabled (DDOS). Before: real 0m38.091s user 0m13.189s sys 7m53.018s After: real 0m29.946s user 0m12.157s sys 7m40.605s For reference, if IP route cache was enabled : real 0m32.030s user 0m10.521s sys 8m15.243s Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-20 03:02:23 -07:00
Eric Dumazet	564824b0c5	net: allocate skbs on local node commit `b30973f877` (node-aware skb allocation) spread a wrong habit of allocating net drivers skbs on a given memory node : The one closest to the NIC hardware. This is wrong because as soon as we try to scale network stack, we need to use many cpus to handle traffic and hit slub/slab management on cross-node allocations/frees when these cpus have to alloc/free skbs bound to a central node. skb allocated in RX path are ephemeral, they have a very short lifetime : Extra cost to maintain NUMA affinity is too expensive. What appeared as a nice idea four years ago is in fact a bad one. In 2010, NIC hardwares are multiqueue, or we use RPS to spread the load, and two 10Gb NIC might deliver more than 28 million packets per second, needing all the available cpus. Cost of cross-node handling in network and vm stacks outperforms the small benefit hardware had when doing its DMA transfer in its 'local' memory node at RX time. Even trying to differentiate the two allocations done for one skb (the sk_buff on local node, the data part on NIC hardware node) is not enough to bring good performance. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-16 11:13:19 -07:00
Eric Dumazet	cb4dfe562c	net: skb_frag_t can be smaller on small arches On 32bit arches, if PAGE_SIZE is smaller than 65536, we can use 16bit offset and size fields. This patch saves 72 bytes per skb on i386, or 128 bytes after rounding. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-09-26 18:31:13 -07:00
Eric Dumazet	a02cec2155	net: return operator cleanup Change "return (EXPR);" to "return EXPR;" return is not a function, parentheses are not required. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-09-23 14:33:39 -07:00
Eric Dumazet	bc8acf2c8c	drivers/net: avoid some skb->ip_summed initializations fresh skbs have ip_summed set to CHECKSUM_NONE (0) We can avoid setting again skb->ip_summed to CHECKSUM_NONE in drivers. Introduce skb_checksum_none_assert() helper so that we keep this assertion documented in driver sources. Change most occurrences of : skb->ip_summed = CHECKSUM_NONE; by : skb_checksum_none_assert(skb); Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-09-02 19:06:22 -07:00
David S. Miller	21dc330157	net: Rename skb_has_frags to skb_has_frag_list SKBs can be "fragmented" in two ways, via a page array (called skb_shinfo(skb)->frags[]) and via a list of SKBs (called skb_shinfo(skb)->frag_list). Since skb_has_frags() tests the latter, its name is confusing since it sounds more like it's testing the former. Signed-off-by: David S. Miller <davem@davemloft.net>	2010-08-23 00:13:46 -07:00
Oliver Hartkopp	2244d07bfa	net: simplify flags for tx timestamping This patch removes the abstraction introduced by the union skb_shared_tx in the shared skb data. The access of the different union elements at several places led to some confusion about accessing the shared tx_flags e.g. in skb_orphan_try(). http://marc.info/?l=linux-netdev&m=128084897415886&w=2 Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-08-19 00:08:30 -07:00
Krishna Kumar	bfb564e739	core: Factor out flow calculation from get_rps_cpu Factor out flow calculation code from get_rps_cpu, since other functions can use the same code. Revisions: v2 (Ben): Separate flow calculation out and use in select queue. v3 (Arnd): Don't re-implement MIN. v4 (Changli): skb->data points to ethernet header in macvtap, and make a fast path. Tested macvtap with this patch. v5 (Changli): - Cache skb->rxhash in skb_get_rxhash - macvtap may not have pow(2) queues, so change code for queue selection. (Arnd): - Use first available queue if all fails. Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-08-16 21:06:24 -07:00
Changli Gao	f9599ce111	sk_buff: introduce pskb_network_may_pull() Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-08-04 21:53:14 -07:00
Oliver Hartkopp	cff0d6e6ed	can-raw: Fix skb_orphan_try handling Commit `fc6055a5ba` (net: Introduce skb_orphan_try()) allows an early orphan of the skb and takes care on tx timestamping, which needs the sk-reference in the skb on driver level. So does the can-raw socket, which has not been taken into account here. The patch below adds a 'prevent_sk_orphan' bit in the skb tx shared info, which fixes the problem discovered by Matthias Fuchs here: http://marc.info/?t=128030411900003&r=1&w=2 Even if it's not a primary tx timestamp topic it fits well into some skb shared tx context. Or should we find a different place for the information to protect the sk reference until it reaches the driver level? Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-08-03 00:31:48 -07:00
Eric Dumazet	fed66381d6	net: pskb_expand_head() optimization Move frags[] at the end of struct skb_shared_info, and make pskb_expand_head() copy only the used part of it instead of whole array. This should avoid kmemcheck warnings and speedup pskb_expand_head() as well, avoiding a lot of cache misses. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-07-24 21:05:57 -07:00
Richard Cochran	c1f19b51d1	net: support time stamping in phy devices. This patch adds a new networking option to allow hardware time stamps from PHY devices. When enabled, likely candidates among incoming and outgoing network packets are offered to the PHY driver for possible time stamping. When accepted by the PHY driver, incoming packets are deferred for later delivery by the driver. The patch also adds phylib driver methods for the SIOCSHWTSTAMP ioctl and callbacks for transmit and receive time stamping. Drivers may optionally implement these functions. Signed-off-by: Richard Cochran <richard.cochran@omicron.at> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-07-18 19:15:26 -07:00
Richard Cochran	4507a71507	net: add driver hook for tx time stamping. This patch adds a hook for transmit time stamps. The transmit hook allows a software fallback for transmit time stamps, for MACs lacking time stamping hardware. Using the hook will still require adding an inline function call to each MAC driver. Signed-off-by: Richard Cochran <richard.cochran@omicron.at> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-07-18 19:15:25 -07:00
Eric Dumazet	5933dd2f02	net: NET_SKB_PAD should depend on L1_CACHE_BYTES In old kernels, NET_SKB_PAD was defined to 16. Then commit `d6301d3dd1` (net: Increase default NET_SKB_PAD to 32), and commit `18e8c134f4` (net: Increase NET_SKB_PAD to 64 bytes) increased it to 64. While first patch was governed by network stack needs, second was more driven by performance issues on current hardware. Real intent was to align data on a cache line boundary. So use max(32, L1_CACHE_BYTES) instead of 64, to be more generic. Remove microblaze and powerpc own NET_SKB_PAD definitions. Thanks to Alexander Duyck and David Miller for their comments. Suggested-by: David Miller <davem@davemloft.net> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-15 18:16:43 -07:00
David S. Miller	62522d36d7	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6	2010-06-11 13:32:31 -07:00
John Fastabend	597a264b1a	net: deliver skbs on inactive slaves to exact matches Currently, the accelerated receive path for VLAN's will drop packets if the real device is an inactive slave and is not one of the special pkts tested for in skb_bond_should_drop(). This behavior is different then the non-accelerated path and for pkts over a bonded vlan. For example, vlanx -> bond0 -> ethx will be dropped in the vlan path and not delivered to any packet handlers at all. However, bond0 -> vlanx -> ethx and bond0 -> ethx will be delivered to handlers that match the exact dev, because the VLAN path checks the real_dev which is not a slave and netif_recv_skb() doesn't drop frames but only delivers them to exact matches. This patch adds a sk_buff flag which is used for tagging skbs that would previously been dropped and allows the skb to continue to skb_netif_recv(). Here we add logic to check for the deliver_no_wcard flag and if it is set only deliver to handlers that match exactly. This makes both paths above consistent and gives pkt handlers a way to identify skbs that come from inactive slaves. Without this patch in some configurations skbs will be delivered to handlers with exact matches and in others be dropped out right in the vlan path. I have tested the following 4 configurations in failover modes and load balancing modes. # bond0 -> ethx # vlanx -> bond0 -> ethx # bond0 -> vlanx -> ethx # bond0 -> ethx \| vlanx -> -- Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-10 22:23:34 -07:00
Alexander Duyck	b78462ebc6	skbuff: add check for non-linear to warn_if_lro and needs_linearize We can avoid an unnecessary cache miss by checking if the skb is non-linear before accessing gso_size/gso_type in skb_warn_if_lro, the same can also be done to avoid a cache miss on nr_frags if data_len is 0. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-05 02:23:16 -07:00
Changli Gao	5b0daa3474	skb: make skb_recycle_check() return a bool value Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-05-29 00:12:13 -07:00
Eric Dumazet	7fee226ad2	net: add a noref bit on skb dst Use low order bit of skb->_skb_dst to tell dst is not refcounted. Change _skb_dst to _skb_refdst to make sure all uses are caught. skb_dst() returns the dst, regardless of noref bit set or not, but with a lockdep check to make sure a noref dst is not given if current user is not rcu protected. New skb_dst_set_noref() helper to set a non-refcounted dst on a skb. (with lockdep check) skb_dst_drop() drops a reference only if skb dst was refcounted. skb_dst_force() helper is used to force a refcount on dst, when skb is queued and not anymore RCU protected. Use skb_dst_force() in __sk_add_backlog(), __dev_xmit_skb() if !IFF_XMIT_DST_RELEASE or skb enqueued on qdisc queue, in sock_queue_rcv_skb(), in __nf_queue(). Use skb_dst_force() in dev_requeue_skb(). Note: dst_use_noref() still dirties dst, we might transform it later to do one dirtying per jiffies. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-05-17 17:18:50 -07:00
Eric Dumazet	18e8c134f4	net: Increase NET_SKB_PAD to 64 bytes eth_type_trans() & get_rps_cpus() currently need two 64bytes cache lines in packet to compute rxhash. Increasing NET_SKB_PAD from 32 to 64 reduces the need to one cache line only, and makes RPS faster. NET_IP_ALIGN(2) + ethernet_header(14) + IP_header(20/40) + ports(8) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-05-06 21:58:51 -07:00
Eric Dumazet	ec7d2f2cf3	net: __alloc_skb() speedup With following patch I can reach maximum rate of my pktgen+udpsink simulator : - 'old' machine : dual quad core E5450 @3.00GHz - 64 UDP rx flows (only differ by destination port) - RPS enabled, NIC interrupts serviced on cpu0 - rps dispatched on 7 other cores. (~130.000 IPI per second) - SLAB allocator (faster than SLUB in this workload) - tg3 NIC - 1.080.000 pps without a single drop at NIC level. Idea is to add two prefetchw() calls in __alloc_skb(), one to prefetch first sk_buff cache line, the second to prefetch the shinfo part. Also using one memset() to initialize all skb_shared_info fields instead of one by one to reduce number of instructions, using long word moves. All skb_shared_info fields before 'dataref' are cleared in __alloc_skb(). Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-05-05 01:07:37 -07:00
David S. Miller	47d29646a2	net: Inline skb_pull() in eth_type_trans(). In commit `6be8ac2f` ("[NET]: uninline skb_pull, de-bloats a lot") we uninlined skb_pull. But in some critical paths it makes sense to inline this thing and it helps performance significantly. Create an skb_pull_inline() so that we can do this in a way that serves also as annotation. Based upon a patch by Eric Dumazet. Signed-off-by: David S. Miller <davem@davemloft.net>	2010-05-02 02:21:44 -07:00
Rami Rosen	ccb7c7732e	net: Remove two unnecessary exports (skbuff). There is no need to export skb_under_panic() and skb_over_panic() in skbuff.c, since these methods are used only in skbuff.c ; this patch removes these two exports. It also marks these functions as 'static' and removes the extern declarations of them from include/linux/skbuff.h Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-20 22:39:53 -07:00
Alexander Duyck	cd58950a53	skbuff: remove unused dev_consume_skb macro definition dev_consume_skb and kfree_skb_clean have no users and in the case of kfree_skb_clean could cause potential build issues since I cannot find where it is defined. Based on the patch in which it was introduced it appears to have been a bit of leftover code from an earlier version of the patch in which kfree_skb_clean was dropped in favor of consume_skb. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-13 02:58:24 -07:00
David S. Miller	4a35ecf8bf	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/bonding/bond_main.c drivers/net/via-velocity.c drivers/net/wireless/iwlwifi/iwl-agn.c	2010-04-06 23:53:30 -07:00
Alexander Duyck	03e6d819c2	skbuff: remove unused dma_head & dma_maps fields The dma map fields in the skb_shared_info structure no longer have any users and can be dropped since they make the skb_shared_info unnecessarily larger. Running slabtop shows that we were using 4K slabs for the skb->head on x86_64 w/ an allocation size of 1522. It turns out that the dma_head and dma_maps array made skb_shared large enough that we had crossed over the 2k boundary with standard frames and as such we were using 4k blocks of memory for all skbs. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-03-24 11:13:35 -07:00
Tom Herbert	0a9627f264	rps: Receive Packet Steering This patch implements software receive side packet steering (RPS). RPS distributes the load of received packet processing across multiple CPUs. Problem statement: Protocol processing done in the NAPI context for received packets is serialized per device queue and becomes a bottleneck under high packet load. This substantially limits pps that can be achieved on a single queue NIC and provides no scaling with multiple cores. This solution queues packets early on in the receive path on the backlog queues of other CPUs. This allows protocol processing (e.g. IP and TCP) to be performed on packets in parallel. For each device (or each receive queue in a multi-queue device) a mask of CPUs is set to indicate the CPUs that can process packets. A CPU is selected on a per packet basis by hashing contents of the packet header (e.g. the TCP or UDP 4-tuple) and using the result to index into the CPU mask. The IPI mechanism is used to raise networking receive softirqs between CPUs. This effectively emulates in software what a multi-queue NIC can provide, but is generic requiring no device support. Many devices now provide a hash over the 4-tuple on a per packet basis (e.g. the Toeplitz hash). This patch allows drivers to set the HW reported hash in an skb field, and that value in turn is used to index into the RPS maps. Using the HW generated hash can avoid cache misses on the packet when steering it to a remote CPU. The CPU mask is set on a per device and per queue basis in the sysfs variable /sys/class/net/<device>/queues/rx-<n>/rps_cpus. This is a set of canonical bit maps for receive queues in the device (numbered by <n>). If a device does not support multi-queue, a single variable is used for the device (rx-0). Generally, we have found this technique increases pps capabilities of a single queue device with good CPU utilization. Optimal settings for the CPU mask seem to depend on architectures and cache hierarchy. Below are some results running 500 instances of netperf TCP_RR test with 1 byte req. and resp. Results show cumulative transaction rate and system CPU utilization. e1000e on 8 core Intel Without RPS: 108K tps at 33% CPU With RPS: 311K tps at 64% CPU forcedeth on 16 core AMD Without RPS: 156K tps at 15% CPU With RPS: 404K tps at 49% CPU bnx2x on 16 core AMD Without RPS 567K tps at 61% CPU (4 HW RX queues) Without RPS 738K tps at 96% CPU (8 HW RX queues) With RPS: 854K tps at 76% CPU (4 HW RX queues) Caveats: - The benefits of this patch are dependent on architecture and cache hierarchy. Tuning the masks to get best performance is probably necessary. - This patch adds overhead in the path for processing a single packet. In a lightly loaded server this overhead may eliminate the advantages of increased parallelism, and possibly cause some relative performance degradation. We have found that masks that are cache aware (share same caches with the interrupting CPU) mitigate much of this. - The RPS masks can be changed dynamically, however whenever the mask is changed this introduces the possibility of generating out of order packets. It's probably best not to change the masks too frequently. Signed-off-by: Tom Herbert <therbert@google.com> include/linux/netdevice.h \| 32 ++++- include/linux/skbuff.h \| 3 + net/core/dev.c \| 335 +++++++++++++++++++++++++++++++++++++-------- net/core/net-sysfs.c \| 225 ++++++++++++++++++++++++++++++- net/core/skbuff.c \| 2 + 5 files changed, 538 insertions(+), 59 deletions(-) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-03-16 21:23:18 -07:00
Eric Dumazet	4ab408dea0	net: fix protocol sk_buff field Commit `e992cd9b72` (kmemcheck: make bitfield annotations truly no-ops when disabled) allows us to revert a workaround we did in the past to not add holes in sk_buff structure. This patch partially reverts commit `14d18a81b5` (net: fix kmemcheck annotations) so that sparse doesnt complain: include/linux/skbuff.h:357:41: error: invalid bitfield specifier for type restricted __be16. Reported-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-03-02 03:05:05 -08:00
Felix Fietkau	da3f5cf1f8	skbuff: align sk_buff::cb to 64 bit and close some potential holes The alignment requirement for 64-bit load/store instructions on ARM is implementation defined. Some CPUs (such as Marvell Feroceon) do not generate an exception, if such an instruction is executed with an address that is not 64 bit aligned. In such a case, the Feroceon corrupts adjacent memory, which showed up in my tests as a crash in the rx path of ath9k that only occurred with CONFIG_XFRM set. This crash happened, because the first field of the mac80211 rx status info in the cb is an u64, and changing it corrupted the skb->sp field. This patch also closes some potential pre-existing holes in the sk_buff struct surrounding the cb[] area. Signed-off-by: Felix Fietkau <nbd@openwrt.org> Cc: stable@kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>	2010-02-27 03:16:59 -08:00
Ben Hutchings	1a5778aa00	net: Fix first line of kernel-doc for a few functions The function name must be followed by a space, hyphen, space, and a short description. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-02-14 22:35:47 -08:00
Alexander Duyck	c81c2d9544	skbuff: remove skb_dma_map/unmap The two functions skb_dma_map/unmap are unsafe to use as they cause problems when packets are cloned and sent to multiple devices while a HW IOMMU is enabled. Due to this it is best to remove the code so it is not used by any other network driver maintainers. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-12-02 19:57:15 -08:00
Eric Dumazet	8964be4a9a	net: rename skb->iif to skb->skb_iif To help grep games, rename iif to skb_iif Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-11-20 15:35:04 -08:00
David S. Miller	230f9bb701	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/usb/cdc_ether.c All CDC ethernet devices of type USB_CLASS_COMM need to use '&mbm_info'. Signed-off-by: David S. Miller <davem@davemloft.net>	2009-11-06 00:55:55 -08:00
Eric Dumazet	d94d9fee9f	net: cleanup include/linux This cleanup patch puts struct/union/enum opening braces, in first line to ease grep games. struct something, followed by { on the next line, becomes : struct something { Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-11-04 09:50:58 -08:00
Eric Dumazet	9d410c7960	net: fix sk_forward_alloc corruption On UDP sockets, we must call skb_free_datagram() with socket locked, or risk sk_forward_alloc corruption. This requirement is not respected in SUNRPC. Add a convenient helper, skb_free_datagram_locked() and use it in SUNRPC Reported-by: Francis Moreau <francis.moro@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-10-30 12:25:12 -07:00
Eric Dumazet	14d18a81b5	net: fix kmemcheck annotations struct sk_buff kmemcheck annotations enlarged this structure by 8/16 bytes Fix this by moving 'protocol' inside flags1 bitfield, and queue_mapping inside flags2 bitfield. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-10-29 22:48:44 -07:00
Eric Dumazet	61321bbd62	net: Add netdev_alloc_skb_ip_align() helper Instead of hardcoding NET_IP_ALIGN stuff in various network drivers, we can add a helper around netdev_alloc_skb() Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-10-13 03:44:03 -07:00
Neil Horman	3b885787ea	net: Generalize socket rx gap / receive queue overflow cmsg Create a new socket level option to report number of queue overflows Recently I augmented the AF_PACKET protocol to report the number of frames lost on the socket receive queue between any two enqueued frames. This value was exported via a SOL_PACKET level cmsg. AFter I completed that work it was requested that this feature be generalized so that any datagram oriented socket could make use of this option. As such I've created this patch, It creates a new SOL_SOCKET level option called SO_RXQ_OVFL, which when enabled exports a SOL_SOCKET level cmsg that reports the nubmer of times the sk_receive_queue overflowed between any two given frames. It also augments the AF_PACKET protocol to take advantage of this new feature (as it previously did not touch sk->sk_drops, which this patch uses to record the overflow count). Tested successfully by me. Notes: 1) Unlike my previous patch, this patch simply records the sk_drops value, which is not a number of drops between packets, but rather a total number of drops. Deltas must be computed in user space. 2) While this patch currently works with datagram oriented protocols, it will also be accepted by non-datagram oriented protocols. I'm not sure if thats agreeable to everyone, but my argument in favor of doing so is that, for those protocols which aren't applicable to this option, sk_drops will always be zero, and reporting no drops on a receive queue that isn't used for those non-participating protocols seems reasonable to me. This also saves us having to code in a per-protocol opt in mechanism. 3) This applies cleanly to net-next assuming that commit `977750076d` (my af packet cmsg patch) is reverted Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-10-12 13:26:31 -07:00
Johannes Berg	72bce62775	net: remove unused skb->do_not_encrypt mac80211 required this due to the master netdev, but now it can put all information into skb->cb and this can go. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2009-07-24 15:05:31 -04:00
Tobias Klauser	8660c1240e	skbuff.h: Fix comment for NET_IP_ALIGN Use the correct function call for skb_reserve in the comment for NET_IP_ALIGN. Signed-off-by: Tobias Klauser <klto@zhaw.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-07-14 12:03:42 -07:00
Linus Torvalds	d2aa455037	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (55 commits) netxen: fix tx ring accounting netxen: fix detection of cut-thru firmware mode forcedeth: fix dma api mismatches atm: sk_wmem_alloc initial value is one net: correct off-by-one write allocations reports via-velocity : fix no link detection on boot Net / e100: Fix suspend of devices that cannot be power managed TI DaVinci EMAC : Fix rmmod error net: group address list and its count ipv4: Fix fib_trie rebalancing, part 2 pkt_sched: Update drops stats in act_police sky2: version 1.23 sky2: add GRO support sky2: skb recycling sky2: reduce default transmit ring sky2: receive counter update sky2: fix shutdown synchronization sky2: PCI irq issues sky2: more receive shutdown sky2: turn off pause during shutdown ... Manually fix trivial conflict in net/core/skbuff.c due to kmemcheck	2009-06-18 14:07:15 -07:00
Randy Dunlap	a42fc8f694	skbuff.h: fix skb_dst kernel-doc Fix kernel-doc warnings (missing + extra entries) in skbuff.h. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-06-17 04:31:11 -07:00
Linus Torvalds	b3fec0fe35	Merge branch 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/vegard/kmemcheck * 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/vegard/kmemcheck: (39 commits) signal: fix __send_signal() false positive kmemcheck warning fs: fix do_mount_root() false positive kmemcheck warning fs: introduce __getname_gfp() trace: annotate bitfields in struct ring_buffer_event net: annotate struct sock bitfield c2port: annotate bitfield for kmemcheck net: annotate inet_timewait_sock bitfields ieee1394/csr1212: fix false positive kmemcheck report ieee1394: annotate bitfield net: annotate bitfields in struct inet_sock net: use kmemcheck bitfields API for skbuff kmemcheck: introduce bitfield API kmemcheck: add opcode self-testing at boot x86: unify pte_hidden x86: make _PAGE_HIDDEN conditional kmemcheck: make kconfig accessible for other architectures kmemcheck: enable in the x86 Kconfig kmemcheck: add hooks for the page allocator kmemcheck: add hooks for page- and sg-dma-mappings kmemcheck: don't track page tables ...	2009-06-16 13:09:51 -07:00
Vegard Nossum	fe55f6d5c0	net: use kmemcheck bitfields API for skbuff Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>	2009-06-15 15:49:25 +02:00
Johannes Berg	8f77f3849c	mac80211: do not pass PS frames out of mac80211 again In order to handle powersave frames properly we had needed to pass these out to the device queues again, and introduce the skb->requeue bit. This, however, also has unnecessary overhead by needing to 'clean up' already tried frames, and this clean-up code is also buggy when software encryption is used. Instead of sending the frames via the master netdev queue again, simply put them into the pending queue. This also fixes a problem where frames for that particular station could be reordered when some were still on the software queues and older ones are re-injected into the software queue after them. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2009-06-10 13:28:37 -04:00
David S. Miller	ee03987170	skbuff: Add frag list abstraction interfaces. With the hope that these can be used to eliminate direct references to the frag list implementation. Signed-off-by: David S. Miller <davem@davemloft.net>	2009-06-09 00:17:13 -07:00
Eric Dumazet	042a53a9e4	net: skb_shared_info optimization skb_dma_unmap() is quite expensive for small packets, because we use two different cache lines from skb_shared_info. One to access nr_frags, one to access dma_maps[0] Instead of dma_maps being an array of MAX_SKB_FRAGS + 1 elements, let dma_head alone in a new dma_head field, close to nr_frags, to reduce cache lines misses. Tested on my dev machine (bnx2 & tg3 adapters), nice speedup ! Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-06-08 00:21:48 -07:00
Eric Dumazet	eae3f29cc7	net: num_dma_maps is not used Get rid of num_dma_maps in struct skb_shared_info, as it seems unused. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-06-08 00:20:23 -07:00
Eric Dumazet	e5b9215ef9	net: skb cleanup Can remove anonymous union now it has one field. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-06-03 02:51:05 -07:00
Eric Dumazet	adf30907d6	net: skb->dst accessors Define three accessors to get/set dst attached to a skb struct dst_entry *skb_dst(const struct sk_buff *skb) void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst) void skb_dst_drop(struct sk_buff *skb) This one should replace occurrences of : dst_release(skb->dst) skb->dst = NULL; Delete skb->dst field Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-06-03 02:51:04 -07:00
Eric Dumazet	511c3f92ad	net: skb->rtable accessor Define skb_rtable(const struct sk_buff *skb) accessor to get rtable from skb Delete skb->rtable field Setting rtable is not allowed, just set dst instead as rtable is an alias. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-06-03 02:51:02 -07:00
Eric Dumazet	dfbf97f3ac	net: add _skb_dst opaque field struct sk_buff uses one union to define dst and rtable fields. We want to replace direct access to these pointers by accessors. First patch adds a new "unsigned long _skb_dst;" opaque field in this union. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-06-03 02:51:01 -07:00
Johann Baudy	69e3c75f4d	net: TX_RING and packet mmap New packet socket feature that makes packet socket more efficient for transmission. - It reduces number of system call through a PACKET_TX_RING mechanism, based on PACKET_RX_RING (Circular buffer allocated in kernel space which is mmapped from user space). - It minimizes CPU copy using fragmented SKB (almost zero copy). Signed-off-by: Johann Baudy <johann.baudy@gnu-log.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-05-18 22:11:22 -07:00
Michael S. Tsirkin	6f26c9a755	tun: fix tun_chr_aio_write so that aio works aio_write gets const struct iovec * but tun_chr_aio_write casts this to struct iovec * and modifies the iovec. As a result, attempts to use io_submit to send packets to a tun device fail with weird errors such as EINVAL. Since tun is the only user of skb_copy_datagram_from_iovec, we can fix this simply by changing the later so that it does not touch the iovec passed to it. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-04-21 05:42:46 -07:00
Michael S. Tsirkin	0a1ec07a67	net: skb_copy_datagram_const_iovec() There's an skb_copy_datagram_iovec() to copy out of a paged skb, but it modifies the iovec, and does not support starting at an offset in the destination. We want both in tun.c, so let's add the function. It's a carbon copy of skb_copy_datagram_iovec() with enough changes to be annoying. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-04-21 05:42:44 -07:00
David S. Miller	13223cb02c	Merge branch 'master' of /home/davem/src/GIT/linux-2.6/	2009-03-29 01:40:34 -07:00
Randy Dunlap	4b21cd4eed	skbuff.h: fix missing kernel-doc Add missing struct field to fix kernel-doc warning: Warning(include/linux/skbuff.h:182): No description found for parameter 'flags' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-03-28 23:38:40 -07:00
Linus Torvalds	d54b3538b0	Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (119 commits) [SCSI] scsi_dh_rdac: Retry for NOT_READY check condition [SCSI] mpt2sas: make global symbols unique [SCSI] sd: Make revalidate less chatty [SCSI] sd: Try READ CAPACITY 16 first for SBC-2 devices [SCSI] sd: Refactor sd_read_capacity() [SCSI] mpt2sas v00.100.11.15 [SCSI] mpt2sas: add MPT2SAS_MINOR(221) to miscdevice.h [SCSI] ch: Add scsi type modalias [SCSI] 3w-9xxx: add power management support [SCSI] bsg: add linux/types.h include to bsg.h [SCSI] cxgb3i: fix function descriptions [SCSI] libiscsi: fix possbile null ptr session command cleanup [SCSI] iscsi class: remove host no argument from session creation callout [SCSI] libiscsi: pass session failure a session struct [SCSI] iscsi lib: remove qdepth param from iscsi host allocation [SCSI] iscsi lib: have lib create work queue for transmitting IO [SCSI] iscsi class: fix lock dep warning on logout [SCSI] libiscsi: don't cap queue depth in iscsi modules [SCSI] iscsi_tcp: replace scsi_debug/tcp_debug logging with iscsi conn logging [SCSI] libiscsi_tcp: replace tcp_debug/scsi_debug logging with session/conn logging ...	2009-03-28 13:30:43 -07:00
Stephen Hemminger	9247744e5e	skb: expose and constify hash primitives Some minor changes to queue hashing: 1. Use const on accessor functions 2. Export skb_tx_hash for use in drivers (see ixgbe) Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-03-21 13:39:26 -07:00
Chris Leech	01d5b2fca1	[SCSI] net: define feature flags for FCoE offloads Define feature flags for FCoE offloads. Signed-off-by: Chris Leech <christopher.leech@intel.com> Signed-off-by: Yi Zou <yi.zou@intel.com> Acked-by: David Miller <davem@davemloft.net> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2009-03-13 15:11:07 -05:00
Neil Horman	ead2ceb0ec	Network Drop Monitor: Adding kfree_skb_clean for non-drops and modifying end-of-line points for skbs Signed-off-by: Neil Horman <nhorman@tuxdriver.com> include/linux/skbuff.h | 4 +++- net/core/datagram.c | 2 +- net/core/skbuff.c | 22 ++++++++++++++++++++++ net/ipv4/arp.c | 2 +- net/ipv4/udp.c | 2 +- net/packet/af_packet.c | 2 +- 6 files changed, 29 insertions(+), 5 deletions(-) Signed-off-by: David S. Miller <davem@davemloft.net>	2009-03-13 12:09:28 -07:00
Randy Dunlap	d3a21be86c	skbuff.h: fix timestamps kernel-doc Fix skbuff.h kernel-doc for timestamps: must include "struct" keyword, otherwise there are kernel-doc errors: Error(linux-next-20090227//include/linux/skbuff.h:161): cannot understand prototype: 'struct skb_shared_hwtstamps ' Error(linux-next-20090227//include/linux/skbuff.h:177): cannot understand prototype: 'union skb_shared_tx ' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-03-02 03:15:58 -08:00
David S. Miller	e70049b9e7	Merge branch 'master' of /home/davem/src/GIT/linux-2.6/	2009-02-24 03:50:29 -08:00
David S. Miller	92a0acce18	net: Kill skb_truesize_check(), it only catches false-positives. A long time ago we had bugs, primarily in TCP, where we would modify skb->truesize (for TSO queue collapsing) in ways which would corrupt the socket memory accounting. skb_truesize_check() was added in order to try and catch this error more systematically. However this debugging check has morphed into a Frankenstein of sorts and these days it does nothing other than catch false-positives. Signed-off-by: David S. Miller <davem@davemloft.net>	2009-02-17 21:24:05 -08:00
Patrick Ohly	ac45f602ee	net: infrastructure for hardware time stamping The additional per-packet information (16 bytes for time stamps, 1 byte for flags) is stored for all packets in the skb_shared_info struct. This implementation detail is hidden from users of that information via skb_* accessor functions. A separate struct resp. union is used for the additional information so that it can be stored/copied easily outside of skb_shared_info. Compared to previous implementations (reusing the tstamp field depending on the context, optional additional structures) this is the simplest solution. It does not extend sk_buff itself. TX time stamping is implemented in software if the device driver doesn't support hardware time stamping. The new semantic for hardware/software time stamping around ndo_start_xmit() is based on two assumptions about existing network device drivers which don't support hardware time stamping and know nothing about it: - they leave the new skb_shared_tx unmodified - the keep the connection to the originating socket in skb->sk alive, i.e., don't call skb_orphan() Given that skb_shared_tx is new, the first assumption is safe. The second is only true for some drivers. As a result, software TX time stamping currently works with the bnx2 driver, but not with the unmodified igb driver (the two drivers this patch series was tested with). Signed-off-by: Patrick Ohly <patrick.ohly@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-02-15 22:43:34 -08:00
David S. Miller	d54e6d8727	net: Kill skbuff macros from the stone ages. This kills of HAVE_ALLOC_SKB and HAVE_ALIGNABLE_SKB. Nothing in-tree uses them and nothing in-tree has used them since 2.0.x times. Signed-off-by: David S. Miller <davem@davemloft.net>	2009-02-09 23:45:29 -08:00
David S. Miller	d6301d3dd1	net: Increase default NET_SKB_PAD to 32. Several devices need to insert some "pre headers" in front of the main packet data when they transmit a packet. Currently we allocate only 16 bytes of pad room and this ends up not being enough for some types of hardware (NIU, usb-net, s390 qeth, etc.) So increase this to 32. Note that drivers still need to check in their transmit routine whether enough headroom exists, and if not use skb_realloc_headroom(). Tunneling, IPSEC, and other encapsulation methods can cause the padding area to be used up. Signed-off-by: David S. Miller <davem@davemloft.net>	2009-02-08 19:24:13 -08:00
Herbert Xu	86911732d3	gro: Avoid copying headers of unmerged packets Unfortunately simplicity isn't always the best. The fraginfo interface turned out to be suboptimal. The problem was quite obvious. For every packet, we have to copy the headers from the frags structure into skb->head, even though for 99% of the packets this part is immediately thrown away after the merge. LRO didn't have this problem because it directly read the headers from the frags structure. This patch attempts to address this by creating an interface that allows GRO to access the headers in the first frag without having to copy it. Because all drivers that use frags place the headers in the first frag this optimisation should be enough. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-01-29 16:33:03 -08:00
David S. Miller	d5a9e24afb	net: Allow RX queue selection to seed TX queue hashing. The idea is that drivers which implement multiqueue RX pre-seed the SKB by recording the RX queue selected by the hardware. If such a seed is found on TX, we'll use that to select the outgoing TX queue. This helps get more consistent load balancing on router and firewall loads. Signed-off-by: David S. Miller <davem@davemloft.net>	2009-01-27 16:22:11 -08:00
Herbert Xu	71d93b39e5	net: Add skb_gro_receive This patch adds the helper skb_gro_receive to merge packets for GRO. The current method is to allocate a new header skb and then chain the original packets to its frag_list. This is done to make it easier to integrate into the existing GSO framework. In future as GSO is moved into the drivers, we can undo this and simply chain the original packets together. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-12-15 23:42:33 -08:00
Ilpo Järvinen	832d11c5cd	tcp: Try to restore large SKBs while SACK processing During SACK processing, most of the benefits of TSO are eaten by the SACK blocks that one-by-one fragment SKBs to MSS sized chunks. Then we're in problems when cleanup work for them has to be done when a large cumulative ACK comes. Try to return back to pre-split state already while more and more SACK info gets discovered by combining newly discovered SACK areas with the previous skb if that's SACKed as well. This approach has a number of benefits: 1) The processing overhead is spread more equally over the RTT 2) Write queue has less skbs to process (affect everything which has to walk in the queue past the sacked areas) 3) Write queue is consistent whole the time, so no other parts of TCP has to be aware of this (this was not the case with some other approach that was, well, quite intrusive all around). 4) Clean_rtx_queue can release most of the pages using single put_page instead of previous PAGE_SIZE/mss+1 calls In case a hole is fully filled by the new SACK block, we attempt to combine the next skb too which allows construction of skbs that are even larger than what tso split them to and it handles hole per on every nth patterns that often occur during slow start overshoot pretty nicely. Though this to be really useful also a retransmission would have to get lost since cumulative ACKs advance one hole at a time in the most typical case. TODO: handle upwards only merging. That should be rather easy when segment is fully sacked but I'm leaving that as future work item (it won't make very large difference anyway since this current approach already covers quite a lot of normal cases). I was earlier thinking of some sophisticated way of tracking timestamps of the first and the last segment but later on realized that it won't be that necessary at all to store the timestamp of the last segment. 
The cases that can occur are basically either: 1) ambiguous => no sensible measurement can be taken anyway 2) non-ambiguous is due to reordering => having the timestamp of the last segment there is just skewing things more off than does some good since the ack got triggered by one of the holes (besides some substle issues that would make determining right hole/skb even harder problem). Anyway, it has nothing to do with this change then. I choose to route some abnormal looking cases with goto noop, some could be handled differently (eg., by stopping the walking at that skb but again). In general, they either shouldn't happen at all or are rare enough to make no difference in practice. In theory this change (as whole) could cause some macroscale regression (global) because of cache misses that are taken over the round-trip time but it gets very likely better because of much less (local) cache misses per other write queue walkers and the big recovery clearing cumulative ack. Worth to note that these benefits would be very easy to get also without TSO/GSO being on as long as the data is in pages so that we can merge them. Currently I won't let that happen because DSACK splitting at fragment that would mess up pcounts due to sk_can_gso in tcp_set_skb_tso_segs. Once DSACKs fragments gets avoided, we have some conditions that can be made less strict. TODO: I will probably have to convert the excessive pointer passing to struct sacktag_state... :-) My testing revealed that considerable amount of skbs couldn't be shifted because they were cloned (most likely still awaiting tx reclaim)... 
[The rest is considering future work instead since I got repeatably EFAULT to tcpdump's recvfrom when I added pskb_expand_head to deal with clones, so I separated that into another, later patch] ...To counter that, I gave up on the fifth advantage: 5) When growing previous SACK block, less allocs for new skbs are done, basically a new alloc is needed only when new hole is detected and when the previous skb runs out of frags space ...which now only happens of if reclaim is fast enough to dispose the clone before the SACK block comes in (the window is RTT long), otherwise we'll have to alloc some. With clones being handled I got these numbers (will be somewhat worse without that), taken with fine-grained mibs: TCPSackShifted 398 TCPSackMerged 877 TCPSackShiftFallback 320 TCPSACKCOLLAPSEFALLBACKGSO 0 TCPSACKCOLLAPSEFALLBACKSKBBITS 0 TCPSACKCOLLAPSEFALLBACKSKBDATA 0 TCPSACKCOLLAPSEFALLBACKBELOW 0 TCPSACKCOLLAPSEFALLBACKFIRST 1 TCPSACKCOLLAPSEFALLBACKPREVBITS 318 TCPSACKCOLLAPSEFALLBACKMSS 1 TCPSACKCOLLAPSEFALLBACKNOHEAD 0 TCPSACKCOLLAPSEFALLBACKSHIFT 0 TCPSACKCOLLAPSENOOPSEQ 0 TCPSACKCOLLAPSENOOPSMALLPCOUNT 0 TCPSACKCOLLAPSENOOPSMALLLEN 0 TCPSACKCOLLAPSEHOLE 12 Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-11-24 21:20:15 -08:00
Sujith	8b30b1fe36	mac80211: Re-enable aggregation Wireless HW without any dedicated queues for aggregation do not need the ampdu_queues mechanism present right now in mac80211. Since mac80211 is still incomplete wrt TX MQ changes, do not allow aggregation sessions for drivers that set ampdu_queues. This is only an interim hack until Intel fixes the requeue issue. Signed-off-by: Sujith <Sujith.Manoharan@atheros.com> Signed-off-by: Luis Rodriguez <Luis.Rodriguez@Atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-10-31 19:02:14 -04:00
Alexey Dobriyan	def8b4faff	net: reduce structures when XFRM=n ifdef out * struct sk_buff::sp (pointer) * struct dst_entry::xfrm (pointer) * struct sock::sk_policy (2 pointers) Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-10-28 13:24:06 -07:00
Peter Zijlstra	654bed16cf	net: packet split receive api Add some packet-split receive hooks. For one this allows to do NUMA node affine page allocs. Later on these hooks will be extended to do emergency reserve allocations for fragments. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-10-07 14:22:33 -07:00
Lennert Buytenhek	04a4bb55bc	net: add skb_recycle_check() to enable netdriver skb recycling This patch adds skb_recycle_check(), which can be used by a network driver after transmitting an skb to check whether this skb can be recycled as a receive buffer. skb_recycle_check() checks that the skb is not shared or cloned, and that it is linear and its head portion large enough (as determined by the driver) to be recycled as a receive buffer. If these conditions are met, it does any necessary reference count dropping and cleans up the skbuff as if it just came from __alloc_skb(). Signed-off-by: Lennert Buytenhek <buytenh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-10-01 02:33:12 -07:00
David S. Miller	1164f52a24	net: Add skb_queue_walk_from() and skb_queue_walk_from_safe(). These will be used by TCP write queue handling and elsewhere. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-09-23 00:49:44 -07:00
David S. Miller	249c8b42c7	net: Add skb_queue_next(). A lot of code wants to iterate over an SKB queue at the top level using it's own control structure and iterator scheme. Provide skb_queue_next(), which is only valid to invoke if skb_queue_is_last() returns false. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-09-23 00:44:42 -07:00
David S. Miller	fc7ebb212d	net: Add skb_queue_is_last(). Several bits of code want to know "is this the last SKB in a queue", and all of them implement this by hand. Provide an common interface to make this check. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-09-23 00:34:07 -07:00
David S. Miller	1d4a31dde9	net: Fix bus in SKB queue splicing interfaces. Handle the case of head being non-empty, by adding list->qlen to head->qlen instead of using direct assignment. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-09-22 21:57:21 -07:00
David S. Miller	67fed45930	net: Add new interfaces for SKB list light-weight init and splicing. This will be used by subsequent changesets. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-09-21 22:36:24 -07:00
David S. Miller	a40c24a133	net: Add SKB DMA mapping helper functions. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-09-11 04:51:14 -07:00
David S. Miller	271bff7afb	net: Add DMA mapping tokens to skb_shared_info. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-09-11 04:48:58 -07:00
Rusty Russell	db543c1f97	net: skb_copy_datagram_from_iovec() There's an skb_copy_datagram_iovec() to copy out of a paged skb, but nothing the other way around (because we don't do that). We want to allocate big skbs in tun.c, so let's add the function. It's a carbon copy of skb_copy_datagram_iovec() with enough changes to be annoying. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-15 19:52:30 -07:00
Gerrit Renker	987c402ac3	skbuff: Code readability NiT Inserting a space between the `-' improved the C readability (some languages allow hyphens within functions and variable names, which is confusing). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-08-11 18:17:17 -07:00
Randy Dunlap	4a7b61d235	skbuff: add missing kernel-doc for do_not_encrypt Add missing kernel-doc notation to sk_buff: Warning(linux-2.6.27-rc1-git2//include/linux/skbuff.h:345): No description found for parameter 'do_not_encrypt' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-31 20:52:08 -07:00
Johannes Berg	d0f0980414	mac80211: partially fix skb->cb use This patch fixes mac80211 to not use the skb->cb over the queue step from virtual interfaces to the master. The patch also, for now, disables aggregation because that would still require requeuing, will fix that in a separate patch. There are two other places (software requeue and powersaving stations) where requeue can happen, but that is not currently used by any drivers/not possible to use respectively. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-07-29 16:55:08 -04:00
Patrick McHardy	6aa895b047	vlan: Don't store VLAN tag in cb Use a real skb member to store the skb to avoid clashes with qdiscs, which are allowed to use the cb area themselves. As currently only real devices that consume the skb set the NETIF_F_HW_VLAN_TX flag, no explicit invalidation is neccessary. The new member fills a hole on 64 bit, the skb layout changes from: __u32 mark; /* 172 4 */ sk_buff_data_t transport_header; /* 176 4 */ sk_buff_data_t network_header; /* 180 4 */ sk_buff_data_t mac_header; /* 184 4 */ sk_buff_data_t tail; /* 188 4 */ /* --- cacheline 3 boundary (192 bytes) --- */ sk_buff_data_t end; /* 192 4 */ /* XXX 4 bytes hole, try to pack */ to __u32 mark; /* 172 4 */ __u16 vlan_tci; /* 176 2 */ /* XXX 2 bytes hole, try to pack */ sk_buff_data_t transport_header; /* 180 4 */ sk_buff_data_t network_header; /* 184 4 */ Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-14 22:49:06 -07:00
David S. Miller	b19fa1fa91	net: Delete NETDEVICES_MULTIQUEUE kconfig option. Multiple TX queue support is a core networking feature. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-08 23:14:24 -07:00
Ben Hutchings	4497b0763c	net: Discard and warn about LRO'd skbs received for forwarding Add skb_warn_if_lro() to test whether an skb was received with LRO and warn if so. Change br_forward(), ip_forward() and ip6_forward() to call it) and discard the skb if it returns true. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-19 16:22:28 -07:00
Randy Dunlap	553a56726b	skbuff: fix missing kernel-doc notation Add kernel-doc notation for ndisc_nodetype: Warning(linux-2.6.25-git2//include/linux/skbuff.h:340): No description found for parameter 'ndisc_nodetype' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-21 15:51:36 -07:00
Gerrit Renker	f5572855ec	[SKB]: __skb_queue_tail = __skb_insert before This expresses __skb_queue_tail() in terms of __skb_insert(), using __skb_insert_before() as auxiliary function. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-14 00:05:28 -07:00
Gerrit Renker	7de6c03336	[SKB]: __skb_append = __skb_queue_after This expresses __skb_append in terms of __skb_queue_after, exploiting that __skb_append(old, new, list) = __skb_queue_after(list, old, new). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-14 00:05:09 -07:00
Gerrit Renker	bf29927588	[SKB]: __skb_queue_after(prev) = __skb_insert(prev, prev->next) By reordering, __skb_queue_after() is expressed in terms of __skb_insert(). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-14 00:04:51 -07:00
Gerrit Renker	f525c06d12	[SKB]: __skb_dequeue = skb_peek + __skb_unlink By rearranging the order of declarations, __skb_dequeue() is expressed in terms of * skb_peek() and * __skb_unlink(), thus in effect mirroring the analogue implementation of __skb_dequeue_tail(). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-14 00:04:12 -07:00
YOSHIFUJI Hideaki	de357cc013	[IPV6] NDISC: Don't rely on node-type hint from L2 unless required. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2008-04-03 10:06:01 +09:00
Templin, Fred L	fadf6bf060	[IPV6] SIT: Add PRL management for ISATAP. This patch updates the Linux Intra-Site Automatic Tunnel Addressing Protocol (ISATAP) implementation. It places the ISATAP potential router list (PRL) in the kernel and adds three new private ioctls for PRL management. [Add several changes of structure name, constant names etc. - yoshfuji] Signed-off-by: Fred L. Templin <fred.l.templin@boeing.com> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2008-04-03 10:05:58 +09:00
Ilpo Järvinen	419ae74ecc	[NET]: uninline skb_trim, de-bloats Allyesconfig (v2.6.24-mm1): -10976 209 funcs, 123 +, 11099 -, diff: -10976 --- skb_trim Without number of debug related CONFIGs (v2.6.25-rc2-mm1): -7360 192 funcs, 131 +, 7491 -, diff: -7360 --- skb_trim skb_trim | +42 Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-27 17:54:01 -07:00
Ilpo Järvinen	c2aa270ad7	[NET]: uninline skb_push, de-bloats a lot Allyesconfig (v2.6.24-mm1): -21593 356 funcs, 2418 +, 24011 -, diff: -21593 --- skb_push Without many debug related CONFIGs (v2.6.25-rc2-mm1): -13890 341 funcs, 189 +, 14079 -, diff: -13890 --- skb_push skb_push | +46 Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-27 17:52:40 -07:00
Ilpo Järvinen	f58518e678	[NET]: uninline dev_alloc_skb, de-bloats a lot Allyesconfig (v2.6.24-mm1): -23668 392 funcs, 104 +, 23772 -, diff: -23668 --- dev_alloc_skb Without many debug CONFIGs (v2.6.25-rc2-mm1): -12178 382 funcs, 157 +, 12335 -, diff: -12178 --- dev_alloc_skb dev_alloc_skb | +37 Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-27 17:51:31 -07:00
Ilpo Järvinen	6be8ac2fdc	[NET]: uninline skb_pull, de-bloats a lot Allyesconfig (v2.6.24-mm1): -28162 354 funcs, 3005 +, 31167 -, diff: -28162 --- skb_pull Without number of debug related CONFIGs (v2.6.25-rc2-mm1): -9697 338 funcs, 221 +, 9918 -, diff: -9697 --- skb_pull skb_pull | +44 Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-27 17:47:24 -07:00
Ilpo Järvinen	0dde3e1648	[NET]: uninline skb_put, de-bloats a lot Allyesconfig (v2.6.24-mm1): ~500 files changed ... 869 funcs, 198 +, 111003 -, diff: -110805 --- skb_put skb_put | +104 Without number of debug related CONFIGs (v2.6.25-rc2-mm1): -60744 855 funcs, 861 +, 61605 -, diff: -60744 --- skb_put skb_put | +57 Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-27 17:43:41 -07:00
Eric Dumazet	ee6b967301	[IPV4]: Add 'rtable' field in struct sk_buff to alias 'dst' and avoid casts (Anonymous) unions can help us to avoid ugly casts. A common cast is the (struct rtable *)skb->dst one. Defining a union like: union { struct dst_entry *dst; struct rtable *rtable; }; permits using skb->rtable in its place. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-05 18:30:47 -08:00
Randy Dunlap	3172936341	net: fix kernel-doc warnings in header files Add missing structure kernel-doc descriptions to sock.h & skbuff.h to fix kernel-doc warnings. (I think that Stephen H. sent a similar patch, but I can't find it. I just want to kill the warnings, with either patch.) Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-02-18 20:52:13 -08:00
Rusty Russell	f35d9d8aae	virtio: Implement skb_partial_csum_set, for setting partial csums on untrusted packets. Use it in virtio_net (replacing buggy version there), it's also going to be used by TAP for partial csum support. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: David S. Miller <davem@davemloft.net>	2008-02-04 23:49:56 +11:00
Patrick McHardy	2fd8e526f4	[NETFILTER]: bridge netfilter: remove nf_bridge_info read-only netoutdev member Before the removal of the deferred output hooks, netoutdev was used in case of VLANs on top of a bridge to store the VLAN device, so the deferred hooks would see the correct output device. This isn't necessary anymore since we're calling the output hooks for the correct device directly in the IP stack. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-31 19:27:29 -08:00
Herbert Xu	a59322be07	[UDP]: Only increment counter on first peek/recv The previous move of the UDP inDatagrams counter caused each peek of the same packet to be counted separately. This may be undesirable. This patch fixes this by adding a bit to sk_buff to record whether this packet has already been seen through skb_recv_datagram. We then only increment the counter when the packet is seen for the first time. The only dodgy part is the fact that skb_recv_datagram doesn't have a good way of returning this new bit of information. So I've added a new function __skb_recv_datagram that does return this and made skb_recv_datagram a wrapper around it. The plan is to eventually replace all uses of skb_recv_datagram with this new function, at which time it can be renamed to its proper name. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:56:34 -08:00
Herbert Xu	27ab256864	[UDP]: Avoid repeated counting of checksum errors due to peeking Currently it is possible for two processes to peek on the same socket and end up incrementing the error counter twice for the same packet. This patch fixes it by making skb_kill_datagram return whether it succeeded in unlinking the packet and only incrementing the counter if it did. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:56:32 -08:00
Jens Axboe	9c55e01c0c	[TCP]: Splice receive support. Support for network splice receive. Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-01-28 14:53:31 -08:00
Herbert Xu	2d4baff8da	[SKBUFF]: Free old skb properly in skb_morph The skb_morph function only freed the data part of the dst skb, but leaked the auxiliary data such as the netfilter fields. This patch fixes this by moving the relevant parts from __kfree_skb to skb_release_all and calling it in skb_morph. It also makes kfree_skbmem static since it's no longer called anywhere else and it now no longer does skb_release_data. Thanks to Yasuyuki KOZAKAI for finding this problem and posting a patch for it. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2007-11-26 23:11:19 +08:00
Chuck Lever	78608ba032	[NET]: Fix skb_truesize_check() assertion The intent of the assertion in skb_truesize_check() is to check for skb->truesize being decremented too much by other code, resulting in a wraparound below zero. The type of the right side of the comparison causes the compiler to promote the left side to an unsigned type, despite the presence of an explicit type cast. This defeats the check for negativity. Ensure both sides of the comparison are a signed type to prevent the implicit type conversion. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-11-10 21:53:30 -08:00
Chuck Lever	c2636b4d9e	[NET]: Treat the sign of the result of skb_headroom() consistently In some places, the result of skb_headroom() is compared to an unsigned integer, and in others, the result is compared to a signed integer. Make the comparisons consistent and correct. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-23 21:27:55 -07:00
Pavel Emelyanov	e3fa259bcb	[NET]: Cut off the queue_mapping field from sk_buff Just hide it behind the #ifdef, because nobody wants it now. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-22 02:59:57 -07:00
Pavel Emelyanov	4e3ab47a54	[NET]: Make and use skb_get_queue_mapping Make the helper for getting the field, symmetrical to the "set" one. Return 0 if CONFIG_NETDEVICES_MULTIQUEUE=n Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-22 02:59:56 -07:00
Herbert Xu	deea84b0ae	[NET]: Fix SKB_WITH_OVERHEAD calculation The calculation in SKB_WITH_OVERHEAD is incorrect in that it can cause an overflow across a page boundary which is what it's meant to prevent. In particular, the header length (X) should not be lumped together with skb_shared_info. The latter needs to be aligned properly while the header has no choice but to sit in front of wherever the payload is. Therefore the correct calculation is to take away the aligned size of skb_shared_info, and then subtract the header length. The resulting quantity L satisfies the following inequality: SKB_DATA_ALIGN(L + X) + sizeof(struct skb_shared_info) <= PAGE_SIZE This is the quantity used by alloc_skb to do the actual allocation. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-22 02:59:53 -07:00
Linus Torvalds	a52cefc80f	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 * 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (42 commits) [IPV6]: Consolidate the ip6_pol_route_(input|output) pair [TCP]: Make snd_cwnd_cnt 32-bit [TCP]: Update the /proc/net/tcp documentation [NETNS]: Don't panic on creating the namespace's loopback [NEIGH]: Ensure that pneigh_lookup is protected with RTNL [INET]: kmalloc+memset -> kzalloc in frag_alloc_queue [ISDN]: Fix compile with CONFIG_ISDN_X25 disabled. [IPV6]: Replace sk_buff ** with sk_buff * in input handlers [SELINUX]: Update for netfilter ->hook() arg changes. [INET]: Consolidate the xxx_put [INET]: Small cleanup for xxx_put after evictor consolidation [INET]: Consolidate the xxx_evictor [INET]: Consolidate the xxx_frag_destroy [INET]: Consolidate xxx_the secret_rebuild [INET]: Consolidate the xxx_frag_kill [INET]: Collect common frag sysctl variables together [INET]: Collect frag queues management objects together [INET]: Move common fields from frag_queues in one place. [TG3]: Fix performance regression on 5705. [ISDN]: Remove local copy of device name to make sure renames work. ...	2007-10-15 14:06:58 -07:00
Herbert Xu	e0053ec07e	[SKBUFF]: Add skb_morph This patch creates a new function skb_morph that's just like skb_clone except that it lets user provide the spare skb that will be overwritten by the one that's to be cloned. This will be used by IP fragment reassembly so that we get back the same skb that went in last (rather than the head skb that we get now which requires us to carry around double pointers all over the place). Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-15 12:26:24 -07:00
Brice Goglin	eabd7e35c0	Add skb_is_gso_v6 Add skb_is_gso_v6(). Signed-off-by: Brice Goglin <brice@myri.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>	2007-10-15 14:24:07 -04:00
Herbert Xu	d9cc20484e	[NET] skbuff: Add skb_cow_head This patch adds an optimised version of skb_cow that avoids the copy if the header can be modified even if the rest of the payload is cloned. This can be used in encapsulating paths where we only need to modify the header. As it is, this can be used in PPPOE and bridging. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-09-16 16:21:16 -07:00
David S. Miller	a309bb072b	[NET]: Page offsets and lengths need to be __u32. Based upon a report from Stephen Rothwell. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-31 02:28:28 -07:00
Al Viro	4381ca3c23	fix return type of skb_checksum_complete() It returns __sum16, not unsigned int Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-15 16:40:51 -07:00
Herbert Xu	c6c6e3e05c	[NET]: Update comments for skb checksums Rusty (whose comments we should all study and emulate :) pointed out that our comments for skb checksums are no longer up-to-date. So here is a patch to 1) add the case of partial checksums on input; 2) update partial checksum case to mention csum_start/csum_offset; 3) mention the new IPv6 feature bit. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-10 22:41:55 -07:00
Jozsef Kadlecsik	ba9dda3ab5	[NETFILTER]: x_tables: add TRACE target The TRACE target can be used to follow IP and IPv6 packets through the ruleset. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-10 22:17:14 -07:00
Peter P Waskiewicz Jr	f25f4e4480	[CORE] Stack changes to add multiqueue hardware support API Add the multiqueue hardware device support API to the core network stack. Allow drivers to allocate multiple queues and manage them at the netdev level if they choose to do so. Added a new field to sk_buff, namely queue_mapping, for drivers to know which tx_ring to select based on OS classification of the flow. Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-10 22:16:21 -07:00
Patrick McHardy	334a8132d9	[SKBUFF]: Keep track of writable header len of headerless clones Currently NAT (and others) that want to modify cloned skbs copy them, even if in the vast majority of cases it's not necessary because the skb is a clone made by TCP and the portion NAT wants to modify is actually writable because TCP releases the header reference before cloning. The problem is that there is no clean way for NAT to find out how long the writable header area is, so this patch introduces skb->hdr_len to hold this length. When a headerless skb is cloned skb->hdr_len is set to the current headroom, for regular clones it is copied from the original. A new function skb_clone_writable(skb, len) returns whether the skb is writable up to len bytes from skb->data. To avoid enlarging the skb the mac_len field is reduced to 16 bit and the new hdr_len field is put in the remaining 16 bit. I've done a few rough benchmarks of NAT (not with this exact patch, but a very similar one). As expected it saves huge amounts of system time in case of sendfile, bringing it down to basically the same amount as without NAT, with sendmsg it only helps on loopback, probably because of the large MTU. Transmit a 1GB file using sendfile/sendmsg over eth0/lo with and without NAT: - sendfile eth0, no NAT: sys 0m0.388s - sendfile eth0, NAT: sys 0m1.835s - sendfile eth0, NAT + patch: sys 0m0.370s (~ -80%) - sendfile lo, no NAT: sys 0m0.258s - sendfile lo, NAT: sys 0m2.609s - sendfile lo, NAT + patch: sys 0m0.260s (~ -90%) - sendmsg eth0, no NAT: sys 0m2.508s - sendmsg eth0, NAT: sys 0m2.539s - sendmsg eth0, NAT + patch: sys 0m2.445s (no change) - sendmsg lo, no NAT: sys 0m2.151s - sendmsg lo, NAT: sys 0m3.557s - sendmsg lo, NAT + patch: sys 0m2.159s (~ -40%) I expect other users can see a similar performance improvement, packet mangling iptables targets, ipip and ip_gre come to mind .. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-07-10 22:15:37 -07:00
Ilpo Järvinen	b9ce204f0a	[TCP]: Congestion control API RTT sampling fix Commit `164891aadf` broke RTT sampling of congestion control modules. Inaccurate timestamps could be fed to them without providing any way for them to identify such cases. Previously RTT sampler was called only if FLAG_RETRANS_DATA_ACKED was not set filtering inaccurate timestamps nicely. In addition, the new behavior could give an invalid timestamp (zero) to RTT sampler if only skbs with TCPCB_RETRANS were ACKed. This solves both problems. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-06-15 15:08:43 -07:00
Randy Dunlap	be52178b9f	[NET] skbuff: fix kernel-doc Fix skbuff.h kernel-doc: linux-2.6.21-git4//include/linux/skbuff.h:316): No description found for parameter 'transport_header' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-03 03:16:20 -07:00
James Chapman	46f8914e53	[SKB]: Introduce skb_queue_walk_safe() This patch provides a method for walking skb lists while inserting or removing skbs from the list. Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-30 00:07:31 -07:00
Stephen Hemminger	164891aadf	[TCP]: Congestion control API update. Do some simple changes to make congestion control API faster/cleaner. * use ktime_t rather than timeval * merge rtt sampling into existing ack callback this means one indirect call versus two per ack. * use flags bits to store options/settings Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:29:45 -07:00
Stephen Hemminger	0c6fcc8a8c	[NET] skbuff: skb_store_bits const is backwards Getting warnings because skb_store_bits has skb as constant, but the function overwrites it. Looks like const was on the wrong side. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:29:17 -07:00
Herbert Xu	604763722c	[NET]: Treat CHECKSUM_PARTIAL as CHECKSUM_UNNECESSARY When a transmitted packet is looped back directly, CHECKSUM_PARTIAL maps to the semantics of CHECKSUM_UNNECESSARY. Therefore we should treat it as such in the stack. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:28:43 -07:00
Herbert Xu	663ead3bb8	[NET]: Use csum_start offset instead of skb_transport_header The skb transport pointer is currently used to specify the start of the checksum region for transmit checksum offload. Unfortunately, the same pointer is also used during receive side processing. This creates a problem when we want to retransmit a received packet with partial checksums since the skb transport pointer would be overwritten. This patch solves this problem by creating a new 16-bit csum_start offset value to replace the skb transport header for the purpose of checksums. This offset is calculated from skb->head so that it does not have to change when skb->data changes. No extra space is required since csum_offset itself fits within a 16-bit word so we can use the other 16 bits for csum_start. For backwards compatibility, just before we push a packet with partial checksums off into the device driver, we set the skb transport header to what it would have been under the old scheme. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:28:40 -07:00
David Howells	716ea3a7aa	[NET]: Move generic skbuff stuff from XFRM code to generic code Move generic skbuff stuff from XFRM code to generic code so that AF_RXRPC can use it too. The kdoc comments I've attached to the functions needs to be checked by whoever wrote them as I had to make some guesses about the workings of these functions. Signed-off-By: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:28:33 -07:00
Arnaldo Carvalho de Melo	27d7ff46a3	[SK_BUFF]: Introduce skb_copy_to_linear_data{_offset} To clearly state the intent of copying to linear sk_buffs, _offset being an overly long variant but interesting for the sake of saving some bytes. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>	2007-04-25 22:28:29 -07:00
Arnaldo Carvalho de Melo	d626f62b11	[SK_BUFF]: Introduce skb_copy_from_linear_data{_offset} To clearly state the intent of copying from linear sk_buffs, _offset being an overly long variant but interesting for the sake of saving some bytes. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2007-04-25 22:28:23 -07:00
Herbert Xu	35fc92a9de	[NET]: Allow forwarding of ip_summed except CHECKSUM_COMPLETE Right now Xen has a horrible hack that lets it forward packets with partial checksums. One of the reasons that CHECKSUM_PARTIAL and CHECKSUM_COMPLETE were added is so that we can get rid of this hack (where it creates two extra bits in the skbuff to essentially mirror ip_summed without being destroyed by the forwarding code). I had forgotten that I've already gone through all the device drivers last time around to make sure that they're looking at ip_summed == CHECKSUM_PARTIAL rather than ip_summed != 0 on transmit. In any case, I've now done that again so it should definitely be safe. Unfortunately nobody has yet added any code to update CHECKSUM_COMPLETE values on forward so I'm setting that to CHECKSUM_NONE. This should be safe to remove for bridging but I'd like to check that code path first. So here is the patch that lets us get rid of the hack by preserving ip_summed (mostly) on forwarded packets. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:28:16 -07:00
Yasuyuki Kozakai	de6e05c49f	[NETFILTER]: nf_conntrack: kill destroy() in struct nf_conntrack for diet The per-conntrack destructor is unnecessary, so this replaces it with a system-wide destructor. Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:45 -07:00
Yasuyuki Kozakai	5f79e0f916	[NETFILTER]: nf_conntrack: don't use nfct in skb if conntrack is disabled Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:27:44 -07:00
Arnaldo Carvalho de Melo	897933bcdf	[SK_BUFF]: Remove skb_add_mtu() leftovers Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2007-04-25 22:26:35 -07:00
Arnaldo Carvalho de Melo	4305b54135	[SK_BUFF]: Convert skb->end to sk_buff_data_t Now to convert the last one, skb->data, that will allow many simplifications and removal of some of the offset helpers. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:29 -07:00
Arnaldo Carvalho de Melo	27a884dc3c	[SK_BUFF]: Convert skb->tail to sk_buff_data_t So that it is also an offset from skb->head, reduces its size from 8 to 4 bytes on 64bit architectures, allowing us to combine the 4 bytes hole left by the layer headers conversion, reducing struct sk_buff size to 256 bytes, i.e. 4 64byte cachelines, and since the sk_buff slab cache is SLAB_HWCACHE_ALIGN... :-) Many calculations that previously required that skb->{transport,network, mac}_header be first converted to a pointer now can be done directly, being meaningful as offsets or pointers. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:28 -07:00
Arnaldo Carvalho de Melo	2e07fa9cd3	[SK_BUFF]: Use offsets for skb->{mac,network,transport}_header on 64bit architectures With this we save 8 bytes per network packet, leaving a 4 bytes hole to be used in further shrinking work, likely with the offsetization of other pointers, such as ->{data,tail,end}, at the cost of adds, that were minimized by the usual practice of setting skb->{mac,nh,h}.raw to a local variable that is then accessed multiple times in each function, it also is not more expensive than before with regards to most of the handling of such headers, like setting one of these headers to another (transport to network, etc), or subtracting, adding to/from it, comparing them, etc. Now we have this layout for sk_buff on a x86_64 machine: [acme@mica net-2.6.22]$ pahole vmlinux sk_buff struct sk_buff { struct sk_buff * next; /* 0 8 */ struct sk_buff * prev; /* 8 8 */ struct rb_node rb; /* 16 24 */ struct sock * sk; /* 40 8 */ ktime_t tstamp; /* 48 8 */ struct net_device * dev; /* 56 8 */ /* --- cacheline 1 boundary (64 bytes) --- */ struct net_device * input_dev; /* 64 8 */ sk_buff_data_t transport_header; /* 72 4 */ sk_buff_data_t network_header; /* 76 4 */ sk_buff_data_t mac_header; /* 80 4 */ /* XXX 4 bytes hole, try to pack */ struct dst_entry * dst; /* 88 8 */ struct sec_path * sp; /* 96 8 */ char cb[48]; /* 104 48 */ /* cacheline 2 boundary (128 bytes) was 24 bytes ago */ unsigned int len; /* 152 4 */ unsigned int data_len; /* 156 4 */ unsigned int mac_len; /* 160 4 */ union { __wsum csum; /* 4 */ __u32 csum_offset; /* 4 */ }; /* 164 4 */ __u32 priority; /* 168 4 */ __u8 local_df:1; /* 172 1 */ __u8 cloned:1; /* 172 1 */ __u8 ip_summed:2; /* 172 1 */ __u8 nohdr:1; /* 172 1 */ __u8 nfctinfo:3; /* 172 1 */ __u8 pkt_type:3; /* 173 1 */ __u8 fclone:2; /* 173 1 */ __u8 ipvs_property:1; /* 173 1 */ /* XXX 2 bits hole, try to pack */ __be16 protocol; /* 174 2 */ void (*destructor)(struct sk_buff *); /* 176 8 */ struct nf_conntrack * nfct; /* 184 8 */ /* --- cacheline 3 boundary (192 bytes) --- */ struct sk_buff * nfct_reasm; /* 192 8 */ struct nf_bridge_info * nf_bridge; /* 200 8 */ __u16 tc_index; /* 208 2 */ __u16 tc_verd; /* 210 2 */ dma_cookie_t dma_cookie; /* 212 4 */ __u32 secmark; /* 216 4 */ __u32 mark; /* 220 4 */ unsigned int truesize; /* 224 4 */ atomic_t users; /* 228 4 */ unsigned char * head; /* 232 8 */ unsigned char * data; /* 240 8 */ unsigned char * tail; /* 248 8 */ /* --- cacheline 4 boundary (256 bytes) --- */ unsigned char * end; /* 256 8 */ }; /* size: 264, cachelines: 5 */ /* sum members: 260, holes: 1, sum holes: 4 */ /* bit holes: 1, sum bit holes: 2 bits */ /* last cacheline: 8 bytes */ On 32 bits nothing changes, and pointers continue to be used with the compiler turning all this abstraction layer into dust. But there are some sk_buff validation tricks that are now possible, humm... :-) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:21 -07:00
Arnaldo Carvalho de Melo	b0e380b1d8	[SK_BUFF]: unions of just one member don't get anything done, kill them Renaming skb->h to skb->transport_header, skb->nh to skb->network_header and skb->mac to skb->mac_header, to match the names of the associated helpers (skb[_[re]set]_{transport,network,mac}_header). Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:20 -07:00
Arnaldo Carvalho de Melo	cfe1fc7759	[SK_BUFF]: Introduce skb_network_header_len For the common sequence "skb->h.raw - skb->nh.raw", similar to skb->mac_len, that is precalculated tho, don't think we need to bloat skb with one more member, so just use this new helper, reducing the number of non-skbuff.h references to the layer headers even more. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:26:19 -07:00
Yasuyuki Kozakai	e7ac05f340	[NETFILTER]: nf_conntrack: add nf_copy() to safely copy members in skb This unifies the code that copies netfilter-related data. Before copying, nf_copy() puts the original members in the destination skb. Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:55 -07:00
Yasuyuki Kozakai	edda553c32	[NETFILTER]: nf_conntrack: add __nf_copy() to copy members in skb This unifies the code that copies netfilter-related data. Note that __nf_copy() assumes the destination skb doesn't have any netfilter-related members. Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:54 -07:00
Arnaldo Carvalho de Melo	9c70220b73	[SK_BUFF]: Introduce skb_transport_header(skb) For the places where we need a pointer to the transport header, it is still legal to touch skb->h.raw directly if just adding to, subtracting from or setting it to another layer header. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:31 -07:00
Arnaldo Carvalho de Melo	39b89160df	[SK_BUFF]: Introduce ipipv6_hdr(), remove skb->h.ipv6h Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:28 -07:00
Arnaldo Carvalho de Melo	b0061ce49c	[SK_BUFF]: Introduce ipip_hdr(), remove skb->h.ipiph Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:27 -07:00
Arnaldo Carvalho de Melo	aa8223c7bb	[SK_BUFF]: Introduce tcp_hdr(), remove skb->h.th Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:26 -07:00
Arnaldo Carvalho de Melo	88c7664f13	[SK_BUFF]: Introduce icmp_hdr(), remove skb->h.icmph Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:23 -07:00
Arnaldo Carvalho de Melo	4bedb45203	[SK_BUFF]: Introduce udp_hdr(), remove skb->h.uh Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:22 -07:00
Arnaldo Carvalho de Melo	d9edf9e2be	[SK_BUFF]: Introduce igmp_hdr() & friends, remove skb->h.igmph Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:21 -07:00
Arnaldo Carvalho de Melo	967b05f64e	[SK_BUFF]: Introduce skb_set_transport_header For the cases where the transport header is being set to an offset from skb->data. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:17 -07:00
Arnaldo Carvalho de Melo	ea2ae17d64	[SK_BUFF]: Introduce skb_transport_offset() For the quite common 'skb->h.raw - skb->data' sequence. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:16 -07:00
Arnaldo Carvalho de Melo	badff6d01a	[SK_BUFF]: Introduce skb_reset_transport_header(skb) For the common, open coded 'skb->h.raw = skb->data' operation, so that we can later turn skb->h.raw into an offset, reducing the size of struct sk_buff in 64bit land while possibly keeping it as a pointer on 32bit. This one touches just the most simple cases: skb->h.raw = skb->data; skb->h.raw = {skb_push|[__]skb_pull}() The next ones will handle the slightly more "complex" cases. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:15 -07:00
Arnaldo Carvalho de Melo	0660e03f6b	[SK_BUFF]: Introduce ipv6_hdr(), remove skb->nh.ipv6h Now the skb->nh union has just one member, .raw, i.e. it is just like the skb->mac union, strange, no? I'm just leaving it like that till the transport layer is done with, when we'll rename skb->mac.raw to skb->mac_header (or ->mac_header_offset?), ditto for ->{h,nh}. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:14 -07:00
Arnaldo Carvalho de Melo	d0a92be05e	[SK_BUFF]: Introduce arp_hdr(), remove skb->nh.arph Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:12 -07:00
Arnaldo Carvalho de Melo	eddc9ec53b	[SK_BUFF]: Introduce ip_hdr(), remove skb->nh.iph Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:10 -07:00
Arnaldo Carvalho de Melo	c14d2450cb	[SK_BUFF]: Introduce skb_set_network_header For the cases where the network header is being set to a offset from skb->data. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:25:01 -07:00
Arnaldo Carvalho de Melo	d56f90a7c9	[SK_BUFF]: Introduce skb_network_header() For the places where we need a pointer to the network header, it is still legal to touch skb->nh.raw directly if just adding to, subtracting from or setting it to another layer header. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:59 -07:00
Arnaldo Carvalho de Melo	bbe735e424	[SK_BUFF]: Introduce skb_network_offset() For the quite common 'skb->nh.raw - skb->data' sequence. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:58 -07:00
Arnaldo Carvalho de Melo	c1d2bbe1cd	[SK_BUFF]: Introduce skb_reset_network_header(skb) For the common, open coded 'skb->nh.raw = skb->data' operation, so that we can later turn skb->nh.raw into a offset, reducing the size of struct sk_buff in 64bit land while possibly keeping it as a pointer on 32bit. This one touches just the most simple case, next will handle the slightly more "complex" cases. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:46 -07:00
Arnaldo Carvalho de Melo	98e399f82a	[SK_BUFF]: Introduce skb_mac_header() For the places where we need a pointer to the mac header, it is still legal to touch skb->mac.raw directly if just adding to, subtracting from or setting it to another layer header. This one also converts some more cases to skb_reset_mac_header() that my regex missed as it had no spaces before nor after '=', ugh. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:41 -07:00
Arnaldo Carvalho de Melo	48d49d0ccd	[SK_BUFF]: Introduce skb_set_mac_header() For the cases where we want to set skb->mac.raw to an offset from skb->data. Simple cases first, the memmove ones and specially pktgen will be left for later. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:37 -07:00
Arnaldo Carvalho de Melo	459a98ed88	[SK_BUFF]: Introduce skb_reset_mac_header(skb) For the common, open coded 'skb->mac.raw = skb->data' operation, so that we can later turn skb->mac.raw into a offset, reducing the size of struct sk_buff in 64bit land while possibly keeping it as a pointer on 32bit. This one touches just the most simple case, next will handle the slightly more "complex" cases. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:24:32 -07:00
Herbert Xu	759e5d0064	[UDP]: Clean up UDP-Lite receive checksum This patch eliminates some duplicate code for the verification of receive checksums between UDP-Lite and UDP. It does this by introducing __skb_checksum_complete_head which is identical to __skb_checksum_complete apart from the fact that it takes a length parameter rather than computing the first skb->len bytes. As a result UDP-Lite will be able to use hardware checksum offload for packets which do not use partial coverage checksums. It also means that UDP-Lite loopback no longer does unnecessary checksum verification. If any NICs start support UDP-Lite this would also start working automatically. This patch removes the assumption that msg_flags has MSG_TRUNC clear upon entry in recvmsg. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:51 -07:00
David S. Miller	fc910a2783	[NETLINK]: Limit NLMSG_GOODSIZE to 8K. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:45 -07:00
Eric Dumazet	b7aa0bf70c	[NET]: convert network timestamps to ktime_t We currently use a special structure (struct skb_timeval) and plain 'struct timeval' to store packet timestamps in sk_buffs and struct sock. This has some drawbacks : - Fixed resolution of micro second. - Waste of space on 64bit platforms where sizeof(struct timeval)=16 I suggest using ktime_t that is a nice abstraction of high resolution time services, currently capable of nanosecond resolution. As sizeof(ktime_t) is 8 bytes, using ktime_t in 'struct sock' permits a 8 byte shrink of this structure on 64bit architectures. Some other structures also benefit from this size reduction (struct ipq in ipv4/ip_fragment.c, struct frag_queue in ipv6/reassembly.c, ...) Once this ktime infrastructure adopted, we can more easily provide nanosecond resolution on top of it. (ioctl SIOCGSTAMPNS and/or SO_TIMESTAMPNS/SCM_TIMESTAMPNS) Note : this patch includes a bug correction in compat_sock_get_timestamp() where a "err = 0;" was missing (so this syscall returned -ENOENT instead of 0) Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> CC: Stephen Hemminger <shemminger@linux-foundation.org> CC: John find <linux.kernel@free.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-25 22:23:34 -07:00
Pavel Emelianov	c2ecba7171	[NET]: Set a separate lockdep class for neighbour table's proxy_queue Otherwise the following calltrace will lead to a wrong lockdep warning: neigh_proxy_process() `- lock(neigh_table->proxy_queue.lock); arp_redo /* via tbl->proxy_redo */ arp_process neigh_event_ns neigh_update skb_queue_purge `- lock(neighbor->arp_queue.lock); This is not a deadlock actually, as neighbor table's proxy_queue and the neighbor's arp_queue are different queues. Lockdep thinks there is a deadlock as both queues are initialized with skb_queue_head_init() and thus have a common class. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-17 13:13:31 -07:00
Herbert Xu	b4dfa0b1fb	[NET]: Get rid of alloc_skb_from_cache Since this was added originally for Xen, and Xen has recently (~2.6.18) stopped using this function, we can safely get rid of it. Good timing too since this function has started to bit rot. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-17 13:13:16 -07:00
Patrick McHardy	c01003c205	[IFB]: Fix crash on input device removal The input_device pointer is not refcounted, which means the device may disappear while packets are queued, causing a crash when ifb passes packets with a stale skb->dev pointer to netif_rx(). Fix by storing the interface index instead and do a lookup where neccessary. Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-29 11:46:52 -07:00
Christoph Lameter	e18b890bb0	[PATCH] slab: remove kmem_cache_t Replace all uses of kmem_cache_t with struct kmem_cache. The patch was generated using the following script: #!/bin/sh # # Replace one string by another in all the kernel sources. # set -e for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do quilt add $file sed -e "1,\$s/$1/$2/g" $file >/tmp/$$ mv /tmp/$$ $file quilt refresh done The script was run like this sh replace kmem_cache_t "struct kmem_cache" Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:25 -08:00
Christoph Hellwig	b30973f877	[PATCH] node-aware skb allocation Node-aware allocation of skbs for the receive path. Details: - __alloc_skb gets a new node argument and cals the node-aware slab functions with it. - netdev_alloc_skb passed the node number it gets from dev_to_node to it, everyone else passes -1 (any node) Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Christoph Lameter <clameter@engr.sgi.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:22 -08:00
Al Viro	a80958f484	[PATCH] fix fallout from header dependency trimming OK, that seems to be enough to deal with the mess. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-04 12:45:29 -08:00
Al Viro	d7fe0f241d	[PATCH] severing skbuff.h -> mm.h Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2006-12-04 02:00:34 -05:00
Al Viro	bd01f843c3	[PATCH] severing skbuff.h -> poll.h Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2006-12-04 02:00:31 -05:00
Al Viro	a1f8e7f7fb	[PATCH] severing skbuff.h -> highmem.h Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2006-12-04 02:00:29 -05:00
Al Viro	ff1dcadb1b	[NET]: Split skb->csum ... into anonymous union of __wsum and __u32 (csum and csum_offset resp.) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:27:18 -08:00
Al Viro	1f61ab5ca5	[NET]: Preliminaty annotation of skb->csum. It's still not completely right; we need to split it into anon unions of __wsum and unsigned - for cases when we use it for partial checksum and for offset of checksum in skb Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:23:44 -08:00
Al Viro	b51655b958	[NET]: Annotate __skb_checksum_complete() and friends. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:23:38 -08:00
Al Viro	81d7766276	[NET]: Annotate skb_copy_and_csum_bits() and callers. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:23:36 -08:00
Al Viro	2bbbc86890	[NET]: Annotate skb_checksum() and callers. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:23:35 -08:00
Al Viro	5084205faf	[NET]: Annotate callers of csum_partial_copy_...() and csum_and_copy...() in net/* Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:23:33 -08:00
Thomas Graf	82e91ffef6	[NET]: Turn nfmark into generic mark nfmark is being used in various subsystems and has become the defacto mark field for all kinds of packets. Therefore it makes sense to rename it to `mark' and remove the dependency on CONFIG_NETFILTER. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:21:38 -08:00
Al Viro	ae08e1f092	[IPV6]: ip6_output annotations Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:21:26 -08:00
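
The [SK_BUFF] commits above replace open-coded 'skb->nh.raw = skb->data' and 'skb->h.raw - skb->data' sequences with accessor helpers, so that the header fields can later become offsets instead of pointers. The following is a minimal userspace sketch of that idea, not kernel code: struct mini_skb, its field names, and the mini_skb_* functions are hypothetical stand-ins, and storing the headers as 16-bit offsets from head is an assumption about where the series is heading, not something these commits state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sk_buff; NOT the kernel layout. */
struct mini_skb {
	unsigned char *head;             /* start of the allocated buffer */
	unsigned char *data;             /* start of the current payload */
	unsigned short network_header;   /* assumed: offset from head */
	unsigned short transport_header; /* assumed: offset from head */
};

/* Pointer to the network header, recomputed from the stored offset. */
static unsigned char *mini_skb_network_header(const struct mini_skb *skb)
{
	return skb->head + skb->network_header;
}

/* Models the open-coded 'skb->nh.raw = skb->data'. */
static void mini_skb_reset_network_header(struct mini_skb *skb)
{
	skb->network_header = (unsigned short)(skb->data - skb->head);
}

/* Models 'skb->nh.raw = skb->data + offset'. */
static void mini_skb_set_network_header(struct mini_skb *skb, int offset)
{
	skb->network_header =
		(unsigned short)((skb->data - skb->head) + offset);
}

/* Models the open-coded 'skb->nh.raw - skb->data'. */
static int mini_skb_network_offset(const struct mini_skb *skb)
{
	return (int)(mini_skb_network_header(skb) - skb->data);
}

/* The transport-header helpers follow the same pattern. */
static unsigned char *mini_skb_transport_header(const struct mini_skb *skb)
{
	return skb->head + skb->transport_header;
}

static void mini_skb_reset_transport_header(struct mini_skb *skb)
{
	skb->transport_header = (unsigned short)(skb->data - skb->head);
}

static void mini_skb_set_transport_header(struct mini_skb *skb, int offset)
{
	skb->transport_header =
		(unsigned short)((skb->data - skb->head) + offset);
}

static int mini_skb_transport_offset(const struct mini_skb *skb)
{
	return (int)(mini_skb_transport_header(skb) - skb->data);
}
```

Because callers in this sketch only go through the helpers, the representation of the header fields (pointer or offset) can change without touching the call sites, which is the motivation the commit messages give for the conversion.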

420 Commits