linux

Commit Graph

Author	SHA1	Message	Date
Ben Hutchings	e52ac3398c	net: Use device model to get driver name in skb_gso_segment() ethtool operations generally require the caller to hold RTNL and are not safe to call in atomic context. The device model provides this information for most devices; we'll only lose it for some old ISA drivers. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-01-17 10:31:12 -05:00
David S. Miller	b3613118eb	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2011-12-02 13:49:21 -05:00
Eric Dumazet	b536db9332	net: net_device flags is an unsigned int commit `b00055aacd` ([NET] core: add RFC2863 operstate) changed net_device flags from unsigned short to unsigned int. Some core functions still assume its an unsigned short. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-01 11:41:48 -05:00
RongQing.Li	8f89148986	net/core: fix rollback handler in register_netdevice_notifier Within nested statements, the break statement terminates only the do, for, switch, or while statement that immediately encloses it, So replace the break with goto. Signed-off-by: RongQing.Li <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-11-30 23:43:07 -05:00
Igor Maravic	6977a79d36	net: Fix skb_update_prio RCU usage. Change function rcu_dereference to rcu_dereference_bh to avoid warning [ INFO: suspicious RCU usage. ] ------------------------------- net/core/dev.c:2459 suspicious rcu_dereference_check() usage! because we are locking with rcu_read_lock_bh(); in function dev_queue_xmit(struct sk_buff *skb) Signed-off-by: Igor Maravic <igorm@etf.rs> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-11-29 18:25:17 -05:00
Tom Herbert	114cf58021	bql: Byte queue limits Networking stack support for byte queue limits, uses dynamic queue limits library. Byte queue limits are maintained per transmit queue, and a dql structure has been added to netdev_queue structure for this purpose. Configuration of bql is in the tx-<n> sysfs directory for the queue under the byte_queue_limits directory. Configuration includes: limit_min, bql minimum limit limit_max, bql maximum limit hold_time, bql slack hold time Also under the directory are: limit, current byte limit inflight, current number of bytes on the queue Signed-off-by: Tom Herbert <therbert@google.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-11-29 12:46:19 -05:00
Tom Herbert	7346649826	net: Add queue state xoff flag for stack Create separate queue state flags so that either the stack or drivers can turn on XOFF. Added a set of functions used in the stack to determine if a queue is really stopped (either by stack or driver) Signed-off-by: Tom Herbert <therbert@google.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-11-29 12:46:19 -05:00
Eric Dumazet	b90e5794c5	net: dont call jump_label_dec from irq context Igor Maravic reported an error caused by jump_label_dec() being called from IRQ context : BUG: sleeping function called from invalid context at kernel/mutex.c:271 in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper 1 lock held by swapper/0: #0: (&n->timer){+.-...}, at: [<ffffffff8107ce90>] call_timer_fn+0x0/0x340 Pid: 0, comm: swapper Not tainted 3.2.0-rc2-net-next-mpls+ #1 Call Trace: <IRQ> [<ffffffff8104f417>] __might_sleep+0x137/0x1f0 [<ffffffff816b9a2f>] mutex_lock_nested+0x2f/0x370 [<ffffffff810a89fd>] ? trace_hardirqs_off+0xd/0x10 [<ffffffff8109a37f>] ? local_clock+0x6f/0x80 [<ffffffff810a90a5>] ? lock_release_holdtime.part.22+0x15/0x1a0 [<ffffffff81557929>] ? sock_def_write_space+0x59/0x160 [<ffffffff815e936e>] ? arp_error_report+0x3e/0x90 [<ffffffff810969cd>] atomic_dec_and_mutex_lock+0x5d/0x80 [<ffffffff8112fc1d>] jump_label_dec+0x1d/0x50 [<ffffffff81566525>] net_disable_timestamp+0x15/0x20 [<ffffffff81557a75>] sock_disable_timestamp+0x45/0x50 [<ffffffff81557b00>] __sk_free+0x80/0x200 [<ffffffff815578d0>] ? sk_send_sigurg+0x70/0x70 [<ffffffff815e936e>] ? arp_error_report+0x3e/0x90 [<ffffffff81557cba>] sock_wfree+0x3a/0x70 [<ffffffff8155c2b0>] skb_release_head_state+0x70/0x120 [<ffffffff8155c0b6>] __kfree_skb+0x16/0x30 [<ffffffff8155c119>] kfree_skb+0x49/0x170 [<ffffffff815e936e>] arp_error_report+0x3e/0x90 [<ffffffff81575bd9>] neigh_invalidate+0x89/0xc0 [<ffffffff81578dbe>] neigh_timer_handler+0x9e/0x2a0 [<ffffffff81578d20>] ? neigh_update+0x640/0x640 [<ffffffff81073558>] __do_softirq+0xc8/0x3a0 Since jump_label_{inc\|dec} must be called from process context only, we must defer jump_label_dec() if net_disable_timestamp() is called from interrupt context. Reported-by: Igor Maravic <igorm@etf.rs> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-11-29 00:26:25 -05:00
Eric Dumazet	4504b8613b	net: use skb_flow_dissect() in __skb_get_rxhash() No functional changes. This uses the code we factorized in skb_flow_dissect() Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-11-28 19:09:07 -05:00
Anton Blanchard	5cac98dd06	net: Fix corruption in /proc//net/dev_mcast I just hit this during my testing. Isn't there another bug lurking? BUG kmalloc-8: Redzone overwritten INFO: 0xc0000000de9dec48-0xc0000000de9dec4b. First byte 0x0 instead of 0xcc INFO: Allocated in .__seq_open_private+0x30/0xa0 age=0 cpu=5 pid=3896 .__kmalloc+0x1e0/0x2d0 .__seq_open_private+0x30/0xa0 .seq_open_net+0x60/0xe0 .dev_mc_seq_open+0x4c/0x70 .proc_reg_open+0xd8/0x260 .__dentry_open.clone.11+0x2b8/0x400 .do_last+0xf4/0x950 .path_openat+0xf8/0x480 .do_filp_open+0x48/0xc0 .do_sys_open+0x140/0x250 syscall_exit+0x0/0x40 dev_mc_seq_ops uses dev_seq_start/next/stop but only allocates sizeof(struct seq_net_private) of private data, whereas it expects sizeof(struct dev_iter_state): struct dev_iter_state { struct seq_net_private p; unsigned int pos; / bucket << BUCKET_SPACE + offset */ }; Create dev_seq_open_ops and use it so we don't have to expose struct dev_iter_state. [ Problem added by commit `f04565ddf5` (dev: use name hash for dev_seq_ops) -Eric ] Signed-off-by: Anton Blanchard <anton@samba.org> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-11-28 18:07:29 -05:00
Neil Horman	5bc1421e34	net: add network priority cgroup infrastructure (v4) This patch adds in the infrastructure code to create the network priority cgroup. The cgroup, in addition to the standard processes file creates two control files: 1) prioidx - This is a read-only file that exports the index of this cgroup. This is a value that is both arbitrary and unique to a cgroup in this subsystem, and is used to index the per-device priority map 2) priomap - This is a writeable file. On read it reports a table of 2-tuples <name:priority> where name is the name of a network interface and priority is indicates the priority assigned to frames egresessing on the named interface and originating from a pid in this cgroup This cgroup allows for skb priority to be set prior to a root qdisc getting selected. This is benenficial for DCB enabled systems, in that it allows for any application to use dcb configured priorities so without application modification Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: John Fastabend <john.r.fastabend@intel.com> CC: Robert Love <robert.w.love@intel.com> CC: "David S. Miller" <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-11-22 15:22:23 -05:00
Eric Dumazet	adc9300e78	net: use jump_label to shortcut RPS if not setup Most machines dont use RPS/RFS, and pay a fair amount of instructions in netif_receive_skb() / netif_rx() / get_rps_cpu() just to discover RPS/RFS is not setup. Add a jump_label named rps_needed. If no device rps_map or global rps_sock_flow_table is setup, netif_receive_skb() / netif_rx() do a single instruction instead of many ones, including conditional jumps. jmp +0 (if CONFIG_JUMP_LABEL=y) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-11-17 17:06:08 -05:00
Michał Mirosław	34324dc2bf	net: remove NETIF_F_NO_CSUM feature bit Only distinct use is checking if NETIF_F_NOCACHE_COPY should be enabled by default. The check heuristics is altered a bit here, so it hits other people than before. The default shouldn't be trusted for performance-critical cases anyway. For all other uses NETIF_F_NO_CSUM is equivalent to NETIF_F_HW_CSUM. Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-11-16 17:43:12 -05:00
Michał Mirosław	c8f44affb7	net: introduce and use netdev_features_t for device features sets v2: add couple missing conversions in drivers split unexporting netdev_fix_features() implemented %pNF convert sock::sk_route_(no?)caps Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-11-16 17:43:10 -05:00
Michał Mirosław	bc5787c612	net: remove legacy ethtool ops As all drivers are converted, we may now remove discrete offload setting callback handling. Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Acked-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-11-16 17:43:08 -05:00
Eric Dumazet	588f033075	net: use jump_label for netstamp_needed netstamp_needed seems a good candidate to jump_label conversion. This avoids 3 conditional branches per incoming packet in fast path. No measurable difference, given that these conditional branches are predicted on modern cpus. Only a small icache reduction, thanks to the unlikely() stuff. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-11-16 17:30:06 -05:00
Eric Dumazet	6a32e4f9dd	vlan: allow nested vlan_do_receive() commit `2425717b27` (net: allow vlan traffic to be received under bond) broke ARP processing on vlan on top of bonding. +-------+ eth0 --\| bond0 \|---bond0.103 eth1 --\| \| +-------+ 52870.115435: skb_gro_reset_offset <-napi_gro_receive 52870.115435: dev_gro_receive <-napi_gro_receive 52870.115435: napi_skb_finish <-napi_gro_receive 52870.115435: netif_receive_skb <-napi_skb_finish 52870.115435: get_rps_cpu <-netif_receive_skb 52870.115435: __netif_receive_skb <-netif_receive_skb 52870.115436: vlan_do_receive <-__netif_receive_skb 52870.115436: bond_handle_frame <-__netif_receive_skb 52870.115436: vlan_do_receive <-__netif_receive_skb 52870.115436: arp_rcv <-__netif_receive_skb 52870.115436: kfree_skb <-arp_rcv Packet is dropped in arp_rcv() because its pkt_type was set to PACKET_OTHERHOST in the first vlan_do_receive() call, since no eth0.103 exists. We really need to change pkt_type only if no more rx_handler is about to be called for the packet. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Reviewed-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-10-30 04:43:30 -04:00
Linus Torvalds	8a9ea3237e	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1745 commits) dp83640: free packet queues on remove dp83640: use proper function to free transmit time stamping packets ipv6: Do not use routes from locally generated RAs \|PATCH net-next] tg3: add tx_dropped counter be2net: don't create multiple RX/TX rings in multi channel mode be2net: don't create multiple TXQs in BE2 be2net: refactor VF setup/teardown code into be_vf_setup/clear() be2net: add vlan/rx-mode/flow-control config to be_setup() net_sched: cls_flow: use skb_header_pointer() ipv4: avoid useless call of the function check_peer_pmtu TCP: remove TCP_DEBUG net: Fix driver name for mdio-gpio.c ipv4: tcp: fix TOS value in ACK messages sent from TIME_WAIT rtnetlink: Add missing manual netlink notification in dev_change_net_namespaces ipv4: fix ipsec forward performance regression jme: fix irq storm after suspend/resume route: fix ICMP redirect validation net: hold sock reference while processing tx timestamps tcp: md5: add more const attributes Add ethtool -g support to virtio_net ... Fix up conflicts in: - drivers/net/Kconfig: The split-up generated a trivial conflict with removal of a stale reference to Documentation/networking/net-modules.txt. Remove it from the new location instead. - fs/sysfs/dir.c: Fairly nasty conflicts with the sysfs rb-tree usage, conflicting with Eric Biederman's changes for tagged directories.	2011-10-25 13:25:22 +02:00
Linus Torvalds	2d03423b23	Merge branch 'driver-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core * 'driver-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (38 commits) mm: memory hotplug: Check if pages are correctly reserved on a per-section basis Revert "memory hotplug: Correct page reservation checking" Update email address for stable patch submission dynamic_debug: fix undefined reference to `__netdev_printk' dynamic_debug: use a single printk() to emit messages dynamic_debug: remove num_enabled accounting dynamic_debug: consolidate repetitive struct _ddebug descriptor definitions uio: Support physical addresses >32 bits on 32-bit systems sysfs: add unsigned long cast to prevent compile warning drivers: base: print rejected matches with DEBUG_DRIVER memory hotplug: Correct page reservation checking memory hotplug: Refuse to add unaligned memory regions remove the messy code file Documentation/zh_CN/SubmitChecklist ARM: mxc: convert device creation to use platform_device_register_full new helper to create platform devices with dma mask docs/driver-model: Update device class docs docs/driver-model: Document device.groups kobj_uevent: Ignore if some listeners cannot handle message dynamic_debug: make netif_dbg() call __netdev_printk() dynamic_debug: make netdev_dbg() call __netdev_printk() ...	2011-10-25 12:13:59 +02:00
David S. Miller	1805b2f048	Merge branch 'master' of ra.kernel.org:/pub/scm/linux/kernel/git/davem/net	2011-10-24 18:18:09 -04:00
Eric W. Biederman	d2237d3574	rtnetlink: Add missing manual netlink notification in dev_change_net_namespaces Renato Westphal noticed that since commit `a2835763e1` "rtnetlink: handle rtnl_link netlink notifications manually" was merged we no longer send a netlink message when a networking device is moved from one network namespace to another. Fix this by adding the missing manual notification in dev_change_net_namespaces. Since all network devices that are processed by dev_change_net_namspaces are in the initialized state the complicated tests that guard the manual rtmsg_ifinfo calls in rollback_registered and register_netdevice are unnecessary and we can just perform a plain notification. Cc: stable@kernel.org Tested-by: Renato Westphal <renatowestphal@gmail.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-10-24 03:03:37 -04:00
Mihai Maruseac	f04565ddf5	dev: use name hash for dev_seq_ops Instead of using the dev->next chain and trying to resync at each call to dev_seq_start, use the name hash, keeping the bucket and the offset in seq->private field. Tests revealed the following results for ifconfig > /dev/null * 1000 interfaces: * 0.114s without patch * 0.089s with patch * 3000 interfaces: * 0.489s without patch * 0.110s with patch * 5000 interfaces: * 1.363s without patch * 0.250s with patch * 128000 interfaces (other setup): * ~100s without patch * ~30s with patch Signed-off-by: Mihai Maruseac <mmaruseac@ixiacom.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-10-21 02:54:38 -04:00
Richard Cochran	4dc360c5e7	net: validate HWTSTAMP ioctl parameters This patch adds a sanity check on the values provided by user space for the hardware time stamping configuration. If the values lie outside of the absolute limits, then the ioctl request will be denied. Signed-off-by: Richard Cochran <richard.cochran@omicron.at> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-10-19 17:00:35 -04:00
Eric W. Biederman	850a545bd8	net: Move rcu_barrier from rollback_registered_many to netdev_run_todo. This patch moves the rcu_barrier from rollback_registered_many (inside the rtnl_lock) into netdev_run_todo (just outside the rtnl_lock). This allows us to gain the full benefit of sychronize_net calling synchronize_rcu_expedited when the rtnl_lock is held. The rcu_barrier in rollback_registered_many was originally a synchronize_net but was promoted to be a rcu_barrier() when it was found that people were unnecessarily hitting the 250ms wait in netdev_wait_allrefs(). Changing the rcu_barrier back to a synchronize_net is therefore safe. Since we only care about waiting for the rcu callbacks before we get to netdev_wait_allrefs() it is also safe to move the wait into netdev_run_todo. This was tested by creating and destroying 1000 tap devices and observing /proc/lock_stat. /proc/lock_stat reports this change reduces the hold times of the rtnl_lock by a factor of 10. There was no observable difference in the amount of time it takes to destroy a network device. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-10-19 16:59:42 -04:00
Eric Dumazet	9e903e0852	net: add skb frag size accessors To ease skb->truesize sanitization, its better to be able to localize all references to skb frags size. Define accessors : skb_frag_size() to fetch frag size, and skb_frag_size_{set\|add\|sub}() to manipulate it. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-10-19 03:10:46 -04:00
John Fastabend	2425717b27	net: allow vlan traffic to be received under bond The following configuration used to work as I expected. At least we could use the fcoe interfaces to do MPIO and the bond0 iface to do load balancing or failover. ---eth2.228-fcoe \| eth2 -----\| \| \|---- bond0 \| eth3 -----\| \| ---eth3.228-fcoe This worked because of a change we added to allow inactive slaves to rx 'exact' matches. This functionality was kept intact with the rx_handler mechanism. However now the vlan interface attached to the active slave never receives traffic because the bonding rx_handler updates the skb->dev and goto's another_round. Previously, the vlan_do_receive() logic was called before the bonding rx_handler. Now by the time vlan_do_receive calls vlan_find_dev() the skb->dev is set to bond0 and it is clear no vlan is attached to this iface. The vlan lookup fails. This patch moves the VLAN check above the rx_handler. A VLAN tagged frame is now routed to the eth2.228-fcoe iface in the above schematic. Untagged frames continue to the bond0 as normal. This case also remains intact, eth2 --> bond0 --> vlan.228 Here the skb is VLAN tagged but the vlan lookup fails on eth2 causing the bonding rx_handler to be called. On the second pass the vlan lookup is on the bond0 iface and completes as expected. Putting a VLAN.228 on both the bond0 and eth2 device will result in eth2.228 receiving the skb. I don't think this is completely unexpected and was the result prior to the rx_handler result. Note, the same setup is also used for other storage traffic that MPIO is used with eg. iSCSI and similar setups can be contrived without storage protocols. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Jesse Gross <jesse@nicira.com> Reviewed-by: Jiri Pirko <jpirko@redhat.com> Tested-by: Hans Schillstrom <hams.schillstrom@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-10-18 23:46:46 -04:00
Ben Hutchings	09994d1b09	RPS: Ensure that an expired hardware filter can be re-added later Amir Vadai wrote: > When a stream is paused, and its rule is expired while it is paused, > no new rule will be configured to the HW when traffic resume. [...] > - When stream was resumed, traffic was steered again by RSS, and > because current-cpu was equal to desired-cpu, ndo_rx_flow_steer > wasn't called and no rule was configured to the HW. Fix this by setting the flow's current CPU only in the table for the newly selected RX queue. Reported-and-tested-by: Amir Vadai <amirv@dev.mellanox.co.il> Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-10-03 12:14:45 -04:00
Changli Gao	5dd17e08f3	net: rps: fix the support for PPPOE The upper protocol numbers of PPPOE are different, and should be treated specially. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-09-28 13:34:25 -04:00
David S. Miller	8decf86879	Merge branch 'master' of github.com:davem330/net Conflicts: MAINTAINERS drivers/net/Kconfig drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c drivers/net/ethernet/broadcom/tg3.c drivers/net/wireless/iwlwifi/iwl-pci.c drivers/net/wireless/iwlwifi/iwl-trans-tx-pcie.c drivers/net/wireless/rt2x00/rt2800usb.c drivers/net/wireless/wl12xx/main.c	2011-09-22 03:23:13 -04:00
Jiri Pirko	4bc71cb983	net: consolidate and fix ethtool_ops->get_settings calling This patch does several things: - introduces __ethtool_get_settings which is called from ethtool code and from drivers as well. Put ASSERT_RTNL there. - dev_ethtool_get_settings() is replaced by __ethtool_get_settings() - changes calling in drivers so rtnl locking is respected. In iboe_get_rate was previously ->get_settings() called unlocked. This fixes it. Also prb_calc_retire_blk_tmo() in af_packet.c had the same problem. Also fixed by calling __dev_get_by_index() instead of dev_get_by_index() and holding rtnl_lock for both calls. - introduces rtnl_lock in bnx2fc_vport_create() and fcoe_vport_create() so bnx2fc_if_create() and fcoe_if_create() are called locked as they are from other places. - use __ethtool_get_settings() in bonding code Signed-off-by: Jiri Pirko <jpirko@redhat.com> v2->v3: -removed dev_ethtool_get_settings() -added ASSERT_RTNL into __ethtool_get_settings() -prb_calc_retire_blk_tmo - use __dev_get_by_index() and lock around it and __ethtool_get_settings() call v1->v2: add missing export_symbol Reviewed-by: Ben Hutchings <bhutchings@solarflare.com> [except FCoE bits] Acked-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-09-15 17:32:26 -04:00
Michael S. Tsirkin	48c830120f	net: copy userspace buffers on device forwarding dev_forward_skb loops an skb back into host networking stack which might hang on the memory indefinitely. In particular, this can happen in macvtap in bridged mode. Copy the userspace fragments to avoid blocking the sender in that case. As this patch makes skb_copy_ubufs extern now, I also added some documentation and made it clear the SKBTX_DEV_ZEROCOPY flag automatically instead of doing it in all callers. This can be made into a separate patch if people feel it's worth it. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-09-15 14:49:44 -04:00
Ian Campbell	ea2ab69379	net: convert core to skb paged frag APIs Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: "Michał Mirosław" <mirq-linux@rere.qmqm.pl> Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>	2011-08-24 17:52:11 -07:00
Eric Dumazet	ec5efe7946	rps: support IPIP encapsulation Skip IPIP header to get proper layer-4 information. Like GRE tunnels, this only works if rxhash is not already provided by the device itself (ethtool -K ethX rxhash off), to allow kernel compute a software rxhash. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-08-24 16:13:55 -07:00
Jason Baron	ffa10cb47a	dynamic_debug: make netdev_dbg() call __netdev_printk() Previously, if dynamic debug was enabled netdev_dbg() was using dynamic_dev_dbg() to print out the underlying msg. Fix this by making sure netdev_dbg() uses __netdev_printk(). Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Jason Baron <jbaron@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-08-22 18:23:06 -07:00
Jiri Pirko	0dfe178239	net: vlan: goto another_round instead of calling __netif_receive_skb Now, when vlan tag on untagged in non-accelerated path is stripped from skb, headers are reset right away. Benefit from that and avoid calling __netif_receive_skb recursivelly and just use another_round. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-08-22 12:43:22 -07:00
Changli Gao	ae1511bf76	net: rps: support PPPOE session messages Inspect the payload of PPPOE session messages for the 4 tuples to generate skb->rxhash. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-08-18 23:23:47 -07:00
Changli Gao	1ff1986fc9	net: rps: support 802.1Q For the 802.1Q packets, if the NIC doesn't support hw-accel-vlan-rx, RPS won't inspect the internal 4 tuples to generate skb->rxhash, so this kind of traffic can't get any benefit from RPS. This patch adds the support for 802.1Q to RPS. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-08-18 22:07:54 -07:00
Jiri Pirko	b81693d914	net: remove ndo_set_multicast_list callback Remove no longer used operation. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-08-17 20:22:03 -07:00
Jiri Pirko	01789349ee	net: introduce IFF_UNICAST_FLT private flag Use IFF_UNICAST_FTL to find out if driver handles unicast address filtering. In case it does not, promisc mode is entered. Patch also fixes following drivers: stmmac, niu: support uc filtering and yet it propagated ndo_set_multicast_list bna, benet, pxa168_eth, ks8851, ks8851_mll, ksz884x : has set ndo_set_rx_mode but do not support uc filtering Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-08-17 20:21:27 -07:00
Tom Herbert	c6865cb3cc	rps: Inspect GRE encapsulated packets to get flow hash Crack open GRE packets in __skb_get_rxhash to compute 4-tuple hash on in encapsulated packet. Note that this is used only when the __skb_get_rxhash is taken, in particular only when the device does not compute provide the rxhash (ie. feature is disabled). This was tested by creating a single GRE tunnel between two 16 core AMD machines. 200 netperf TCP_RR streams were ran with 1 byte request and response size. Without patch: 157497 tps, 50/90/99% latencies 1250/1292/1364 usecs With patch: 325896 tps, 50/90/99% latencies 603/848/1169 Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-08-17 20:06:03 -07:00
Tom Herbert	e971b7225b	rps: Infrastructure in __skb_get_rxhash for deep inspection Basics for looking for ports in encapsulated packets in tunnels. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-08-17 20:06:03 -07:00
Tom Herbert	bdeab99191	rps: Add flag to skb to indicate rxhash is based on L4 tuple The l4_rxhash flag was added to the skb structure to indicate that the rxhash value was computed over the 4 tuple for the packet which includes the port information in the encapsulated transport packet. This is used by the stack to preserve the rxhash value in __skb_rx_tunnel. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-08-17 20:06:03 -07:00
Tom Herbert	792df22cd0	rps: Some minor cleanup in get_rps_cpus Use some variables for clarity and extensibility. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-08-17 20:06:03 -07:00
Eric Dumazet	33d480ce6d	net: cleanup some rcu_dereference_raw RCU api had been completed and rcu_access_pointer() or rcu_dereference_protected() are better than generic rcu_dereference_raw() Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-08-12 02:55:28 -07:00
Stephen Hemminger	a9b3cd7f32	rcu: convert uses of rcu_assign_pointer(x, NULL) to RCU_INIT_POINTER When assigning a NULL value to an RCU protected pointer, no barrier is needed. The rcu_assign_pointer, used to handle that but will soon change to not handle the special case. Convert all rcu_assign_pointer of NULL value. //smpl @@ expression P; @@ - rcu_assign_pointer(P, NULL) + RCU_INIT_POINTER(P, NULL) // </smpl> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-08-02 04:29:23 -07:00
Joe Perches	2d348d1f56	net: Convert struct net_device uc_promisc to bool No need to use int, its uses are boolean. May save a few bytes one day. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-07-25 16:17:35 -07:00
Michał Mirosław	fec30c3381	net: unexport netdev_fix_features() It is not used anywhere except net/core/dev.c now. Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-07-14 14:44:32 -07:00
Michał Mirosław	1180e7d659	net: cleanup vlan_features setting in register_netdev vlan_features contains features inherited from underlying device. NETIF_SOFT_FEATURES are not inherited but belong to the vlan device itself (ensured in vlan_dev_fix_features()). Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-07-14 14:41:11 -07:00
Shan Wei	f9d7a1187d	net: Add GSO to vlan_features initialization Just add GSO to vlan_features initialization, and update comments. When we set offload features, vlan_dev_fix_features() will do more check. In vlan_dev_fix_features(), final features is decided by features of real device and vlan_features of real device. Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-07-05 22:34:52 -07:00
Thomas Graf	4e985adaa5	rtnl: provide link dump consistency info This patch adds a change sequence counter to each net namespace which is bumped whenever a netdevice is added or removed from the list. If such a change occurred while a link dump took place, the dump will have the NLM_F_DUMP_INTR flag set in the first message which has been interrupted and in all subsequent messages of the same dump. Note that links may still be modified or renamed while a dump is taking place but we can guarantee for userspace to receive a complete list of links and not miss any. Testing: I have added 500 VLAN netdevices to make sure the dump is split over multiple messages. Then while continuously dumping links in one process I also continuously deleted and re-added a dummy netdevice in another process. Multiple dumps per seconds have had the NLM_F_DUMP_INTR flag set. I guess we can wait for Johannes patch to hit net-next via the wireless tree. I just wanted to give this some testing right away. Signed-off-by: Thomas Graf <tgraf@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-07-01 15:39:53 -07:00
Paul Gortmaker	56f8a75c17	ip: introduce ip_is_fragment helper inline function There are enough instances of this: iph->frag_off & htons(IP_MF \| IP_OFFSET) that a helper function is probably warranted. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-06-21 20:33:34 -07:00
David S. Miller	9f6ec8d697	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/wireless/iwlwifi/iwl-agn-rxon.c drivers/net/wireless/rtlwifi/pci.c net/netfilter/ipvs/ip_vs_core.c	2011-06-20 22:29:08 -07:00
Jiri Pirko	0b5c9db1b1	vlan: Fix the ingress VLAN_FLAG_REORDER_HDR check Testing of VLAN_FLAG_REORDER_HDR does not belong in vlan_untag but rather in vlan_do_receive. Otherwise the vlan header will not be properly put on the packet in the case of vlan header accelleration. As we remove the check from vlan_check_reorder_header rename it vlan_reorder_header to keep the naming clean. Fix up the skb->pkt_type early so we don't look at the packet after adding the vlan tag, which guarantees we don't goof and look at the wrong field. Use a simple if statement instead of a complicated switch statement to decided that we need to increment rx_stats for a multicast packet. Hopefully at somepoint we will just declare the case where VLAN_FLAG_REORDER_HDR is cleared as unsupported and remove the code. Until then this keeps it working correctly. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Jiri Pirko <jpirko@redhat.com> Acked-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-06-11 16:15:50 -07:00
Alexander Duyck	bff55273f9	v2 ethtool: remove support for ETHTOOL_GRXNTUPLE This change is meant to remove all support for displaying an ntuple as strings via ETHTOOL_GRXNTUPLE. The reason for this change is due to the fact that multiple issues have been found including: - Multiple buffer overruns for strings being displayed. - Incorrect filters displayed, cleared filters with ring of -2 are displayed - Setting get_rx_ntuple displays no rules if defined. - Endianess wrong on displayed values. - Hard limit of 1024 filters makes display functionality extremely limited The only driver that had supported this interface was ixgbe. Since it no longer uses the interface and due to the issues mentioned above I am submitting this patch to remove it. v2: Updated based on comments from Ben Hutchings - Left ETH_SS_NTUPLE_FILTERS in code but commented on it being deprecated - Removed ethtool_rx_ntuple_list and ethtool_rx_ntuple_flow_spec_container - Left ETHTOOL_GRXNTUPLE but commented it as deprecated Also cleaned up set_rx_ntuple since there is no flow spec container to maintain we can drop all the code for the alloc and free of it and just return ops->set_rx_ntuple(). Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Acked-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-06-08 16:45:31 -07:00
Heiko Carstens	264524d5e5	net: cpu offline cause napi stall Frank Blaschka reported : <quote> During heavy network load we turn off/on cpus. Sometimes this causes a stall on the network device. Digging into the dump I found out following: napi is scheduled but does not run. From the I/O buffers and the napi state I see napi/rx_softirq processing has stopped because the budget was reached. napi stays in the softnet_data poll_list and the rx_softirq was raised again. I assume at this time the cpu offline comes in, the rx softirq is raised/moved to another cpu but napi stays in the poll_list of the softnet_data of the now offline cpu. Reviewing dev_cpu_callback (net/core/dev.c) I did not find the poll_list is transfered to the new cpu. </quote> This patch is a straightforward implementation of Frank suggestion : Transfert poll_list and trigger NET_RX_SOFTIRQ on new cpu. Reported-by: Frank Blaschka <blaschka@linux.vnet.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Tested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-06-07 01:01:22 -07:00
David S. Miller	3019de124b	net: Rework netdev_drivername() to avoid warning. This interface uses a temporary buffer, but for no real reason. And now can generate warnings like: net/sched/sch_generic.c: In function dev_watchdog net/sched/sch_generic.c:254:10: warning: unused variable drivername Just return driver->name directly or "". Reported-by: Connor Hansen <cmdkhh@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-06-06 16:41:33 -07:00
Koki Sanagi	ec764bf083	net: tracepoint of net_dev_xmit sees freed skb and causes panic Because there is a possibility that skb is kfree_skb()ed and zero cleared after ndo_start_xmit, we should not see the contents of skb like skb->len and skb->dev->name after ndo_start_xmit. But trace_net_dev_xmit does that and causes panic by NULL pointer dereference. This patch fixes trace_net_dev_xmit not to see the contents of skb directly. If you want to reproduce this panic, 1. Get tracepoint of net_dev_xmit on 2. Create 2 guests on KVM 2. Make 2 guests use virtio_net 4. Execute netperf from one to another for a long time as a network burden 5. host will panic(It takes about 30 minutes) Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-06-02 14:06:31 -07:00
Neil Horman	f11970e383	net: make dev_disable_lro use physical device if passed a vlan dev (v2) If the device passed into dev_disable_lro is a vlan, then repoint the dev poniter so that we actually modify the underlying physical device. Signed-of-by: Neil Horman <nhorman@tuxdriver.com> CC: davem@davemloft.net CC: bhutchings@solarflare.com Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-25 17:55:25 -04:00
Eric Dumazet	be3fc413da	net: use synchronize_rcu_expedited() synchronize_rcu() is very slow in various situations (HZ=100, CONFIG_NO_HZ=y, CONFIG_PREEMPT=n) Extract from my (mostly idle) 8 core machine : synchronize_rcu() in 99985 us synchronize_rcu() in 79982 us synchronize_rcu() in 87612 us synchronize_rcu() in 79827 us synchronize_rcu() in 109860 us synchronize_rcu() in 98039 us synchronize_rcu() in 89841 us synchronize_rcu() in 79842 us synchronize_rcu() in 80151 us synchronize_rcu() in 119833 us synchronize_rcu() in 99858 us synchronize_rcu() in 73999 us synchronize_rcu() in 79855 us synchronize_rcu() in 79853 us When we hold RTNL mutex, we would like to spend some cpu cycles but not block too long other processes waiting for this mutex. We also want to setup/dismantle network features as fast as possible at boot/shutdown time. This patch makes synchronize_net() call the expedited version if RTNL is locked. synchronize_rcu_expedited() typical delay is about 20 us on my machine. synchronize_rcu_expedited() in 18 us synchronize_rcu_expedited() in 18 us synchronize_rcu_expedited() in 18 us synchronize_rcu_expedited() in 18 us synchronize_rcu_expedited() in 20 us synchronize_rcu_expedited() in 16 us synchronize_rcu_expedited() in 20 us synchronize_rcu_expedited() in 18 us synchronize_rcu_expedited() in 18 us Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com> CC: Ben Greear <greearb@candelatech.com> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-24 13:26:12 -04:00
Eric Dumazet	6df427fe8c	net: remove synchronize_net() from netdev_set_master() In the old days, we used to access dev->master in __netif_receive_skb() in a rcu_read_lock section. So one synchronize_net() call was needed in netdev_set_master() to make sure another cpu could not use old master while/after we release it. We now use netdev_rx_handler infrastructure and added one synchronize_net() call in bond_release()/bond_release_all() Remove the obsolete synchronize_net() from netdev_set_master() and add one in bridge del_nbp() after its netdev_rx_handler_unregister() call. This makes enslave -d a bit faster. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Jiri Pirko <jpirko@redhat.com> CC: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-22 21:01:20 -04:00
Eric Dumazet	449f454426	macvlan: remove one synchronize_rcu() call When one macvlan device is dismantled, we can avoid one synchronize_rcu() call done after deletion from hash list, since caller will perform a synchronize_net() call after its ndo_stop() call. Add a new netdev->dismantle field to signal this dismantle intent. Reduces RTNL hold time. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Patrick McHardy <kaber@trash.net> CC: Ben Greear <greearb@candelatech.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-20 00:33:18 -04:00
David S. Miller	9cbc94eabb	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/vmxnet3/vmxnet3_ethtool.c net/core/dev.c	2011-05-17 17:33:11 -04:00
Michael S. Tsirkin	604ae14ffb	net: Change netdev_fix_features messages loglevel Cool, how about we make 'Features changed' debug as well? This way userspace can't fill up the log just by tweaking tun features with an ioctl. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-17 15:44:10 -04:00
Eric Dumazet	372b231201	net: use hlist_del_rcu() in dev_change_name() Using plain hlist_del() in dev_change_name() is wrong since a concurrent reader can crash trying to dereference LIST_POISON1. Bug introduced in commit `72c9528bab` (net: Introduce dev_get_by_name_rcu()) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-17 13:56:59 -04:00
Michał Mirosław	6f404e441d	net: Change netdev_fix_features messages loglevel Those reduced to DEBUG can possibly be triggered by unprivileged processes and are nothing exceptional. Illegal checksum combinations can only be caused by driver bug, so promote those messages to WARN. Since GSO without SG will now only cause DEBUG message from netdev_fix_features(), remove the workaround from register_netdevice(). Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-16 15:14:21 -04:00
Peter Pan(潘卫平)	0696c3a8ac	net:set valid name before calling ndo_init() In commit `1c5cae815d` (net: call dev_alloc_name from register_netdevice), a bug of bonding was involved, see example 1 and 2. In register_netdevice(), the name of net_device is not valid until dev_get_valid_name() is called. But dev->netdev_ops->ndo_init(that is bond_init) is called before dev_get_valid_name(), and it uses the invalid name of net_device. I think register_netdevice() should make sure that the name of net_device is valid before calling ndo_init(). example 1: modprobe bonding ls /proc/net/bonding/bond%d ps -eLf root 3398 2 3398 0 1 21:34 ? 00:00:00 [bond%d] example 2: modprobe bonding max_bonds=3 [ 170.100292] bonding: Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011) [ 170.101090] bonding: Warning: either miimon or arp_interval and arp_ip_target module parameters must be specified, otherwise bonding will not detect link failures! see bonding.txt for details. [ 170.102469] ------------[ cut here ]------------ [ 170.103150] WARNING: at /home/pwp/net-next-2.6/fs/proc/generic.c:586 proc_register+0x126/0x157() [ 170.104075] Hardware name: VirtualBox [ 170.105065] proc_dir_entry 'bonding/bond%d' already registered [ 170.105613] Modules linked in: bonding(+) sunrpc ipv6 uinput microcode ppdev parport_pc parport joydev e1000 pcspkr i2c_piix4 i2c_core [last unloaded: bonding] [ 170.108397] Pid: 3457, comm: modprobe Not tainted 2.6.39-rc2+ #14 [ 170.108935] Call Trace: [ 170.109382] [<c0438f3b>] warn_slowpath_common+0x6a/0x7f [ 170.109911] [<c051a42a>] ? proc_register+0x126/0x157 [ 170.110329] [<c0438fc3>] warn_slowpath_fmt+0x2b/0x2f [ 170.110846] [<c051a42a>] proc_register+0x126/0x157 [ 170.111870] [<c051a4dd>] proc_create_data+0x82/0x98 [ 170.112335] [<f94e6af6>] bond_create_proc_entry+0x3f/0x73 [bonding] [ 170.112905] [<f94dd806>] bond_init+0x77/0xa5 [bonding] [ 170.113319] [<c0721ac6>] register_netdevice+0x8c/0x1d3 [ 170.113848] [<f94e0e30>] bond_create+0x6c/0x90 [bonding] [ 170.114322] [<f94f4763>] bonding_init+0x763/0x7b1 [bonding] [ 170.114879] [<c0401240>] do_one_initcall+0x76/0x122 [ 170.115317] [<f94f4000>] ? 0xf94f3fff [ 170.115799] [<c0463f1e>] sys_init_module+0x1286/0x140d [ 170.116879] [<c07c6d9f>] sysenter_do_call+0x12/0x28 [ 170.117404] ---[ end trace 64e4fac3ae5fff1a ]--- [ 170.117924] bond%d: Warning: failed to register to debugfs [ 170.128728] ------------[ cut here ]------------ [ 170.129360] WARNING: at /home/pwp/net-next-2.6/fs/proc/generic.c:586 proc_register+0x126/0x157() [ 170.130323] Hardware name: VirtualBox [ 170.130797] proc_dir_entry 'bonding/bond%d' already registered [ 170.131315] Modules linked in: bonding(+) sunrpc ipv6 uinput microcode ppdev parport_pc parport joydev e1000 pcspkr i2c_piix4 i2c_core [last unloaded: bonding] [ 170.133731] Pid: 3457, comm: modprobe Tainted: G W 2.6.39-rc2+ #14 [ 170.134308] Call Trace: [ 170.134743] [<c0438f3b>] warn_slowpath_common+0x6a/0x7f [ 170.135305] [<c051a42a>] ? proc_register+0x126/0x157 [ 170.135820] [<c0438fc3>] warn_slowpath_fmt+0x2b/0x2f [ 170.137168] [<c051a42a>] proc_register+0x126/0x157 [ 170.137700] [<c051a4dd>] proc_create_data+0x82/0x98 [ 170.138174] [<f94e6af6>] bond_create_proc_entry+0x3f/0x73 [bonding] [ 170.138745] [<f94dd806>] bond_init+0x77/0xa5 [bonding] [ 170.139278] [<c0721ac6>] register_netdevice+0x8c/0x1d3 [ 170.139828] [<f94e0e30>] bond_create+0x6c/0x90 [bonding] [ 170.140361] [<f94f4763>] bonding_init+0x763/0x7b1 [bonding] [ 170.140927] [<c0401240>] do_one_initcall+0x76/0x122 [ 170.141494] [<f94f4000>] ? 0xf94f3fff [ 170.141975] [<c0463f1e>] sys_init_module+0x1286/0x140d [ 170.142463] [<c07c6d9f>] sysenter_do_call+0x12/0x28 [ 170.142974] ---[ end trace 64e4fac3ae5fff1b ]--- [ 170.144949] bond%d: Warning: failed to register to debugfs Signed-off-by: Weiping Pan <panweiping3@gmail.com> Reviewed-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-13 16:49:49 -04:00
Michał Mirosław	afe12cc86b	net: introduce netdev_change_features() It will be needed by bonding and other drivers changing vlan_features after ndo_init callback. As a bonus, this includes kernel-doc for netdev_update_features(). Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-12 18:40:56 -04:00
David S. Miller	3c709f8fb4	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-3.6 Conflicts: drivers/net/benet/be_main.c	2011-05-11 14:26:58 -04:00
Eric Dumazet	e14a599335	net: dev_close() should check IFF_UP Commit `443457242b` (factorize sync-rcu call in unregister_netdevice_many) mistakenly removed one test from dev_close() Following actions trigger a BUG : modprobe bonding modprobe dummy ifconfig bond0 up ifenslave bond0 dummy0 rmmod dummy dev_close() must not close a non IFF_UP device. With help from Frank Blaschka and Einar EL Lueck Reported-by: Frank Blaschka <blaschka@linux.vnet.ibm.com> Reported-by: Einar EL Lueck <ELELUECK@de.ibm.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-10 15:03:33 -07:00
David S. Miller	7143b7d412	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/tg3.c	2011-05-05 14:59:02 -07:00
Jiri Pirko	1c5cae815d	net: call dev_alloc_name from register_netdevice Force dev_alloc_name() to be called from register_netdevice() by dev_get_valid_name(). That allows to remove multiple explicit dev_alloc_name() calls. The possibility to call dev_alloc_name in advance remains. This also fixes veth creation regresion caused by `84c49d8c3e` Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-05 10:57:45 -07:00
Lifeng Sun	41c31f318a	networking: inappropriate ioctl operation should return ENOTTY ioctl() calls against a socket with an inappropriate ioctl operation are incorrectly returning EINVAL rather than ENOTTY: [ENOTTY] Inappropriate I/O control operation. BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=33992 Signed-off-by: Lifeng Sun <lifongsun@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-05-02 15:41:29 -07:00
David Decotigny	8ae6daca85	ethtool: Call ethtool's get/set_settings callbacks with cleaned data This makes sure that when a driver calls the ethtool's get/set_settings() callback of another driver, the data passed to it is clean. This guarantees that speed_hi will be zeroed correctly if the called callback doesn't explicitely set it: we are sure we don't get a corrupted speed from the underlying driver. We also take care of setting the cmd field appropriately (ETHTOOL_GSET/SSET). This applies to dev_ethtool_get_settings(), which now makes sure it sets up that ethtool command parameter correctly before passing it to drivers. This also means that whoever calls dev_ethtool_get_settings() does not have to clean the ethtool command parameter. This function also becomes an exported symbol instead of an inline. All drivers visible to make allyesconfig under x86_64 have been updated. Signed-off-by: David Decotigny <decot@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-04-29 14:01:30 -07:00
Michał Mirosław	1742f183fc	net: fix netdev_increment_features() Simplify and fix netdev_increment_features() to conform to what is stated in netdevice.h comments about NETIF_F_ONE_FOR_ALL. Include FCoE segmentation and VLAN-challedged flags in computation. Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-04-28 13:33:07 -07:00
Jiri Pirko	3aba891dde	bonding: move processing of recv handlers into handle_frame() Since now when bonding uses rx_handler, all traffic going into bond device goes thru bond_handle_frame. So there's no need to go back into bonding code later via ptype handlers. This patch converts original ptype handlers into "bonding receive probes". These functions are called from bond_handle_frame and they are registered per-mode. Note that vlan packets are also handled because they are always untagged thanks to vlan_untag() Note that this also allows arpmon for eth-bond-bridge-vlan topology. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-04-25 12:00:30 -07:00
Michał Mirosław	22d5969fb4	net: make WARN_ON in dev_disable_lro() useful Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-04-25 11:56:27 -07:00
Eric Dumazet	b71d1d426d	inet: constify ip headers and in6_addr Add const qualifiers to structs iphdr, ipv6hdr and in6_addr pointers where possible, to make code intention more obvious. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-04-22 11:04:14 -07:00
David S. Miller	e1943424e4	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/bnx2x/bnx2x_ethtool.c	2011-04-19 00:21:33 -07:00
Ben Hutchings	31d8b9e099	net: Disable NETIF_F_TSO_ECN when TSO is disabled NETIF_F_TSO_ECN has no effect when TSO is disabled; this just means that feature state will be accurately reported to user-space. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-04-12 19:29:45 -07:00
Ben Hutchings	ea2d36883c	net: Disable all TSO features when SG is disabled The feature flags NETIF_F_TSO and NETIF_F_TSO6 independently enable TSO for IPv4 and IPv6 respectively. However, the test in netdev_fix_features() and its predecessor functions was never updated to check for NETIF_F_TSO6, possibly because it was originally proposed that TSO for IPv6 would be dependent on both feature flags. Now that these feature flags can be changed independently from user-space and we depend on netdev_fix_features() to fix invalid feature combinations, it's important to disable them both if scatter-gather is disabled. Also disable NETIF_F_TSO_ECN so user-space sees all TSO features as disabled. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-04-12 19:29:45 -07:00
Michał Mirosław	872674858f	net: add RTNL_ASSERT in __netdev_update_features() Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-04-12 14:36:07 -07:00
Jiri Pirko	bcc6d47903	net: vlan: make non-hw-accel rx path similar to hw-accel Now there are 2 paths for rx vlan frames. When rx-vlan-hw-accel is enabled, skb is untagged by NIC, vlan_tci is set and the skb gets into vlan code in __netif_receive_skb - vlan_hwaccel_do_receive. For non-rx-vlan-hw-accel however, tagged skb goes thru whole __netif_receive_skb, it's untagged in ptype_base hander and reinjected This incosistency is fixed by this patch. Vlan untagging happens early in __netif_receive_skb so the rest of code (ptype_all handlers, rx_handlers) see the skb like it was untagged by hw. Signed-off-by: Jiri Pirko <jpirko@redhat.com> v1->v2: remove "inline" from vlan_core.c functions Signed-off-by: David S. Miller <davem@davemloft.net>	2011-04-12 14:15:19 -07:00
David S. Miller	1c01a80cfe	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/smsc911x.c	2011-04-11 13:44:25 -07:00
Linus Torvalds	42933bac11	Merge branch 'for-linus2' of git://git.profusion.mobi/users/lucas/linux-2.6 * 'for-linus2' of git://git.profusion.mobi/users/lucas/linux-2.6: Fix common misspellings	2011-04-07 11:14:49 -07:00
Tom Herbert	c6e1a0d12c	net: Allow no-cache copy from user on transmit This patch uses __copy_from_user_nocache on transmit to bypass data cache for a performance improvement. skb_add_data_nocache and skb_copy_to_page_nocache can be called by sendmsg functions to use this feature, initial support is in tcp_sendmsg. This functionality is configurable per device using ethtool. Presumably, this feature would only be useful when the driver does not touch the data. The feature is turned on by default if a device indicates that it does some form of checksum offload; it is off by default for devices that do no checksum offload or indicate no checksum is necessary. For the former case copy-checksum is probably done anyway, in the latter case the device is likely loopback in which case the no cache copy is probably not beneficial. This patch was tested using 200 instances of netperf TCP_RR with 1400 byte request and one byte reply. Platform is 16 core AMD x86. No-cache copy disabled: 672703 tps, 97.13% utilization 50/90/99% latency:244.31 484.205 1028.41 No-cache copy enabled: 702113 tps, 96.16% utilization, 50/90/99% latency 238.56 467.56 956.955 Using 14000 byte request and response sizes demonstrate the effects more dramatically: No-cache copy disabled: 79571 tps, 34.34 %utlization 50/90/95% latency 1584.46 2319.59 5001.76 No-cache copy enabled: 83856 tps, 34.81% utilization 50/90/95% latency 2508.42 2622.62 2735.88 Note especially the effect on latency tail (95th percentile). This seems to provide a nice performance improvement and is consistent in the tests I ran. Presumably, this would provide the greatest benfits in the presence of an application workload stressing the cache and a lot of transmit data happening. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-04-04 22:30:30 -07:00
Michał Mirosław	6cb6a27c45	net: Call netdev_features_change() from netdev_update_features() Issue FEAT_CHANGE notification when features are changed by netdev_update_features(). This will allow changes made by extra constraints on e.g. MTU change to be properly propagated like changes via ethtool. Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-04-02 22:48:47 -07:00
Lucas De Marchi	25985edced	Fix common misspellings Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>	2011-03-31 11:26:23 -03:00
Daniel Lezcano	79b569f0ec	netdev: fix mtu check when TSO is enabled In case the device where is coming from the packet has TSO enabled, we should not check the mtu size value as this one could be bigger than the expected value. This is the case for the macvlan driver when the lower device has TSO enabled. The macvlan inherit this feature and forward the packets without fragmenting them. Then the packets go through dev_forward_skb and are dropped. This patch fix this by checking TSO is not enabled when we want to check the mtu size. Signed-off-by: Daniel Lezcano <daniel.lezcano@free.fr> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-30 02:42:17 -07:00
stephen hemminger	edf947f100	bridge: notify applications if address of bridge device changes The mac address of the bridge device may be changed when a new interface is added to the bridge. If this happens, then the bridge needs to call the network notifiers to tickle any other systems that care. Since bridge can be a module, this also means exporting the notifier function. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-27 23:35:02 -07:00
Amerigo Wang	3b261ade42	net: remove useless comments in net/core/dev.c The code itself can explain what it is doing, no need these comments. Signed-off-by: WANG Cong <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-27 23:34:59 -07:00
Michał Mirosław	27660515a2	net: implement dev_disable_lro() hw_features compatibility Implement compatibility with new hw_features for dev_disable_lro(). This is a transition path - dev_disable_lro() should be later integrated into netdev_fix_features() after all drivers are converted. Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-22 01:00:26 -07:00
Jiri Pirko	8a4eb5734e	net: introduce rx_handler results and logic around that This patch allows rx_handlers to better signalize what to do next to it's caller. That makes skb->deliver_no_wcard no longer needed. kernel-doc for rx_handler_result is taken from Nicolas' patch. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Reviewed-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-16 12:53:54 -07:00
David S. Miller	33175d84ee	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/bnx2x/bnx2x_cmn.c	2011-03-10 14:26:00 -08:00
Vasiliy Kulikov	8909c9ad8f	net: don't allow CAP_NET_ADMIN to load non-netdev kernel modules Since `a8f80e8ff9` any process with CAP_NET_ADMIN may load any module from /lib/modules/. This doesn't mean that CAP_NET_ADMIN is a superset of CAP_SYS_MODULE as modules are limited to /lib/modules/**. However, CAP_NET_ADMIN capability shouldn't allow anybody load any module not related to networking. This patch restricts an ability of autoloading modules to netdev modules with explicit aliases. This fixes CVE-2011-1019. Arnd Bergmann suggested to leave untouched the old pre-v2.6.32 behavior of loading netdev modules by name (without any prefix) for processes with CAP_SYS_MODULE to maintain the compatibility with network scripts that use autoloading netdev modules by aliases like "eth0", "wlan0". Currently there are only three users of the feature in the upstream kernel: ipip, ip_gre and sit. root@albatros:~# capsh --drop=$(seq -s, 0 11),$(seq -s, 13 34) -- root@albatros:~# grep Cap /proc/$$/status CapInh: 0000000000000000 CapPrm: fffffff800001000 CapEff: fffffff800001000 CapBnd: fffffff800001000 root@albatros:~# modprobe xfs FATAL: Error inserting xfs (/lib/modules/2.6.38-rc6-00001-g2bf4ca3/kernel/fs/xfs/xfs.ko): Operation not permitted root@albatros:~# lsmod \| grep xfs root@albatros:~# ifconfig xfs xfs: error fetching interface information: Device not found root@albatros:~# lsmod \| grep xfs root@albatros:~# lsmod \| grep sit root@albatros:~# ifconfig sit sit: error fetching interface information: Device not found root@albatros:~# lsmod \| grep sit root@albatros:~# ifconfig sit0 sit0 Link encap:IPv6-in-IPv4 NOARP MTU:1480 Metric:1 root@albatros:~# lsmod \| grep sit sit 10457 0 tunnel4 2957 1 sit For CAP_SYS_MODULE module loading is still relaxed: root@albatros:~# grep Cap /proc/$$/status CapInh: 0000000000000000 CapPrm: ffffffffffffffff CapEff: ffffffffffffffff CapBnd: ffffffffffffffff root@albatros:~# ifconfig xfs xfs: error fetching interface information: Device not found root@albatros:~# lsmod \| grep xfs xfs 745319 0 Reference: https://lkml.org/lkml/2011/2/24/203 Signed-off-by: Vasiliy Kulikov <segoon@openwall.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Kees Cook <kees.cook@canonical.com> Signed-off-by: James Morris <jmorris@namei.org>	2011-03-10 10:25:19 +11:00
Jiri Pirko	e3f48d37cf	net: allow handlers to be processed for orig_dev This was there before, I forgot about this. Allows deliveries to ptype_base handlers registered for orig_dev. I presume this is still desired. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Reviewed-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-07 15:37:16 -08:00
David S. Miller	63d8ea7f93	net: Forgot to commit net/core/dev.c part of Jiri's ->rx_handler patch. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-02-28 10:48:59 -08:00
Michał Mirosław	14d1232f49	net: avoid initial "Features changed" message Avoid "Features changed" message and ndo_set_features call on device registration caused by automatic enabling of GSO and GRO. Driver should have enabled hardware offloads it set in features, so the ndo_set_features() is not needed at registration time. Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-02-23 14:23:31 -08:00
Michał Mirosław	8e9b59b219	Fix "(unregistered net_device): Features changed" message Fix netdev_update_features() messages on register time by moving the call further in register_netdevice(). When netdev->reg_state != NETREG_REGISTERED, netdev_name() returns "(unregistered netdevice)" even if the dev's name is already filled. Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-02-23 14:23:31 -08:00
David S. Miller	2a3bcfdde6	Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/bwh/sfc-next-2.6	2011-02-22 10:21:36 -08:00
David S. Miller	da935c66ba	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: Documentation/feature-removal-schedule.txt drivers/net/e1000e/netdev.c net/xfrm/xfrm_policy.c	2011-02-19 19:17:35 -08:00

1 2 3 4 5 ...

834 Commits