linux

Commit Graph

Author	SHA1	Message	Date
Mahesh Bandewar	ea8ffc0818	bonding: fix wq initialization for links created via netlink Earlier patch `4493b81bea` ("bonding: initialize work-queues during creation of bond") moved the work-queue initialization from bond_open() to bond_create(). However this caused the link those are created using netlink 'create bond option' (ip link add bondX type bond); create the new trunk without initializing work-queues. Prior to the above mentioned change, ndo_open was in both paths and things worked correctly. The consequence is visible in the report shared by Joe Stringer - I've noticed that this patch breaks bonding within namespaces if you're not careful to perform device cleanup correctly. Here's my repro script, you can run on any net-next with this patch and you'll start seeing some weird behaviour: ip netns add foo ip li add veth0 type veth peer name veth0+ netns foo ip li add veth1 type veth peer name veth1+ netns foo ip netns exec foo ip li add bond0 type bond ip netns exec foo ip li set dev veth0+ master bond0 ip netns exec foo ip li set dev veth1+ master bond0 ip netns exec foo ip addr add dev bond0 192.168.0.1/24 ip netns exec foo ip li set dev bond0 up ip li del dev veth0 ip li del dev veth1 The second to last command segfaults, last command hangs. rtnl is now permanently locked. It's not a problem if you take bond0 down before deleting veths, or delete bond0 before deleting veths. If you delete either end of the veth pair as per above, either inside or outside the namespace, it hits this problem. Here's some kernel logs: [ 1221.801610] bond0: Enslaving veth0+ as an active interface with an up link [ 1224.449581] bond0: Enslaving veth1+ as an active interface with an up link [ 1281.193863] bond0: Releasing backup interface veth0+ [ 1281.193866] bond0: the permanent HWaddr of veth0+ - 16:bf:fb:e0:b8:43 - is still in use by bond0 - set the HWaddr of veth0+ to a different address to avoid conflicts [ 1281.193867] ------------[ cut here ]------------ [ 1281.193873] WARNING: CPU: 0 PID: 2024 at kernel/workqueue.c:1511 __queue_delayed_work+0x13f/0x150 [ 1281.193873] Modules linked in: bonding veth openvswitch nf_nat_ipv6 nf_nat_ipv4 nf_nat autofs4 nfsd auth_rpcgss nfs_acl binfmt_misc nfs lockd grace sunrpc fscache ppdev vmw_balloon coretemp psmouse serio_raw vmwgfx ttm drm_kms_helper vmw_vmci netconsole parport_pc configfs drm i2c_piix4 fb_sys_fops syscopyarea sysfillrect sysimgblt shpchp mac_hid nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack libcrc32c lp parport hid_generic usbhid hid mptspi mptscsih e1000 mptbase ahci libahci [ 1281.193905] CPU: 0 PID: 2024 Comm: ip Tainted: G W 4.10.0-bisect-bond-v0.14 #37 [ 1281.193906] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/30/2014 [ 1281.193906] Call Trace: [ 1281.193912] dump_stack+0x63/0x89 [ 1281.193915] __warn+0xd1/0xf0 [ 1281.193917] warn_slowpath_null+0x1d/0x20 [ 1281.193918] __queue_delayed_work+0x13f/0x150 [ 1281.193920] queue_delayed_work_on+0x27/0x40 [ 1281.193929] bond_change_active_slave+0x25b/0x670 [bonding] [ 1281.193932] ? synchronize_rcu_expedited+0x27/0x30 [ 1281.193935] __bond_release_one+0x489/0x510 [bonding] [ 1281.193939] ? addrconf_notify+0x1b7/0xab0 [ 1281.193942] bond_netdev_event+0x2c5/0x2e0 [bonding] [ 1281.193944] ? netconsole_netdev_event+0x124/0x190 [netconsole] [ 1281.193947] notifier_call_chain+0x49/0x70 [ 1281.193948] raw_notifier_call_chain+0x16/0x20 [ 1281.193950] call_netdevice_notifiers_info+0x35/0x60 [ 1281.193951] rollback_registered_many+0x23b/0x3e0 [ 1281.193953] unregister_netdevice_many+0x24/0xd0 [ 1281.193955] rtnl_delete_link+0x3c/0x50 [ 1281.193956] rtnl_dellink+0x8d/0x1b0 [ 1281.193960] rtnetlink_rcv_msg+0x95/0x220 [ 1281.193962] ? __kmalloc_node_track_caller+0x35/0x280 [ 1281.193964] ? __netlink_lookup+0xf1/0x110 [ 1281.193966] ? rtnl_newlink+0x830/0x830 [ 1281.193967] netlink_rcv_skb+0xa7/0xc0 [ 1281.193969] rtnetlink_rcv+0x28/0x30 [ 1281.193970] netlink_unicast+0x15b/0x210 [ 1281.193971] netlink_sendmsg+0x319/0x390 [ 1281.193974] sock_sendmsg+0x38/0x50 [ 1281.193975] ___sys_sendmsg+0x25c/0x270 [ 1281.193978] ? mem_cgroup_commit_charge+0x76/0xf0 [ 1281.193981] ? page_add_new_anon_rmap+0x89/0xc0 [ 1281.193984] ? lru_cache_add_active_or_unevictable+0x35/0xb0 [ 1281.193985] ? __handle_mm_fault+0x4e9/0x1170 [ 1281.193987] __sys_sendmsg+0x45/0x80 [ 1281.193989] SyS_sendmsg+0x12/0x20 [ 1281.193991] do_syscall_64+0x6e/0x180 [ 1281.193993] entry_SYSCALL64_slow_path+0x25/0x25 [ 1281.193995] RIP: 0033:0x7f6ec122f5a0 [ 1281.193995] RSP: 002b:00007ffe69e89c48 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 1281.193997] RAX: ffffffffffffffda RBX: 00007ffe69e8dd60 RCX: 00007f6ec122f5a0 [ 1281.193997] RDX: 0000000000000000 RSI: 00007ffe69e89c90 RDI: 0000000000000003 [ 1281.193998] RBP: 00007ffe69e89c90 R08: 0000000000000000 R09: 0000000000000003 [ 1281.193999] R10: 00007ffe69e89a10 R11: 0000000000000246 R12: 0000000058f14b9f [ 1281.193999] R13: 0000000000000000 R14: 00000000006473a0 R15: 00007ffe69e8e450 [ 1281.194001] ---[ end trace 713a77486cbfbfa3 ]--- Fixes: `4493b81bea` ("bonding: initialize work-queues during creation of bond") Reported-by: Joe Stringer <joe@ovn.org> Tested-by: Joe Stringer <joe@ovn.org> Signed-off-by: Mahesh Bandewar <maheshb@google.com> Acked-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-21 15:28:37 -04:00
Alexander Kochetkov	b18b745397	net: arc_emac: switch to phy_start()/phy_stop() Currently driver use phy_start_aneg() in arc_emac_open() to bring up PHY. But phy_start() function is more appropriate for this purposes. Besides that it call phy_start_aneg() as part of PHY startup sequence it also can correctly bring up PHY from error and suspended states. So the patch replace phy_start_aneg() to phy_start(). Also the patch add call to phy_stop() to arc_emac_stop() to allow the PHY device to be fully suspended when the interface is unused. Signed-off-by: Alexander Kochetkov <al.kochet@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-21 15:23:52 -04:00
David S. Miller	6b633e82b0	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2017-04-20 This adds the basic infrastructure for IPsec hardware offloading, it creates a configuration API and adjusts the packet path. 1) Add the needed netdev features to configure IPsec offloads. 2) Add the IPsec hardware offloading API. 3) Prepare the ESP packet path for hardware offloading. 4) Add gso handlers for esp4 and esp6, this implements the software fallback for GSO packets. 5) Add xfrm replay handler functions for offloading. 6) Change ESP to use a synchronous crypto algorithm on offloading, we don't have the option for asynchronous returns when we handle IPsec at layer2. 7) Add a xfrm validate function to validate_xmit_skb. This implements the software fallback for non GSO packets. 8) Set the inner_network and inner_transport members of the SKB, as well as encapsulation, to reflect the actual positions of these headers, and removes them only once encryption is done on the payload. From Ilan Tayari. 9) Prepare the ESP GRO codepath for hardware offloading. 10) Fix incorrect null pointer check in esp6. From Colin Ian King. 11) Fix for the GSO software fallback path to detect the fallback correctly. From Ilan Tayari. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-21 15:11:28 -04:00
Steffen Klassert	77999328b5	MAINTAINERS: Add new IPsec offloading files. This adds two new files to IPsec maintenance scope: net/ipv4/esp4_offload.c net/ipv6/ip6_offload.c Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-21 15:11:09 -04:00
David S. Miller	072cec7797	Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== 40GbE Intel Wired LAN Driver Updates 2017-04-19 This series contains updates to i40e and i40evf only, most notable being the addition of trace points for BPF programs. Tobias Klauser updates i40evf to use net_device stats struct instead of a local private copy. Preethi updates the VF driver to not enable receive checksum offload by default for tunneled packets. Alex fixes an issue he introduced when he converted the code over to using the length field to determine if a descriptor was done or not. Mitch adds the ability to dump additional information on the VFs, which is not available through 'ip link show' using debugfs. Scott adds trace points to the drivers so that BPF programs can be attached for feature testing and verification. Jingjing adds admin queue functions for Pipeline Personalization Profile commands. Jake does most of the heavy lifting in this series, starting with the a reduction in the scope of the RTNL lock being held while resetting VFs to allow multiple PFs to reset in a timely manner. Factored out the direct queue modification so that we are able to re-use the code. Reduced the wait time for admin queue commands to complete, since we were waiting a minimum of a millisecond, when in practice the admin queue command is processed often much faster. Cleaned up code (flag) we never use. Make the code to resetting all the VFs optimized for parallel computing instead of the current way is a serialized fashion, to help reduce the time it takes. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-21 14:13:11 -04:00
stephen hemminger	76bb5db5c7	netvsc: fix use after free on module removal The NAPI data structure is embedded in the netvsc_device structure and is freed when device is closed. There is still a reference (in NAPI list) to this which causes a crash in netif_napi_del when device is removed. Fix by managing NAPI instances correctly. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-21 13:59:57 -04:00
David S. Miller	dfb05553a5	Merge branch 'tc-filter-cleanup-destroy-delete' Cong Wang says: ==================== net_sched: clean up tc filter destroy and delete logic The first patch fixes a potenial race condition, the second one is pure cleanup. ==================== Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-21 13:58:16 -04:00
WANG Cong	4392053879	net_sched: remove useless NULL to tp->root There is no need to NULL tp->root in ->destroy(), since tp is going to be freed very soon, and existing readers are still safe to read them. For cls_route, we always init its tp->root, so it can't be NULL, we can drop more useless code. Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-21 13:58:15 -04:00
WANG Cong	763dbf6328	net_sched: move the empty tp check from ->destroy() to ->delete() We could have a race condition where in ->classify() path we dereference tp->root and meanwhile a parallel ->destroy() makes it a NULL. Daniel cured this bug in commit `d936377414` ("net, sched: respect rcu grace period on cls destruction"). This happens when ->destroy() is called for deleting a filter to check if we are the last one in tp, this tp is still linked and visible at that time. The root cause of this problem is the semantic of ->destroy(), it does two things (for non-force case): 1) check if tp is empty 2) if tp is empty we could really destroy it and its caller, if cares, needs to check its return value to see if it is really destroyed. Therefore we can't unlink tp unless we know it is empty. As suggested by Daniel, we could actually move the test logic to ->delete() so that we can safely unlink tp after ->delete() tells us the last one is just deleted and before ->destroy(). Fixes: `1e052be69d` ("net_sched: destroy proto tp when all filters are gone") Cc: Roi Dayan <roid@mellanox.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-21 13:58:15 -04:00
Daniel Borkmann	b1d9fc41aa	bpf: add napi_id read access to __sk_buff Add napi_id access to __sk_buff for socket filter program types, tc program types and other bpf_convert_ctx_access() users. Having access to skb->napi_id is useful for per RX queue listener siloing, f.e. in combination with SO_ATTACH_REUSEPORT_EBPF and when busy polling is used, meaning SO_REUSEPORT enabled listeners can then select the corresponding socket at SYN time already [1]. The skb is marked via skb_mark_napi_id() early in the receive path (e.g., napi_gro_receive()). Currently, sockets can only use SO_INCOMING_NAPI_ID from `6d4339028b` ("net: Introduce SO_INCOMING_NAPI_ID") as a socket option to look up the NAPI ID associated with the queue for steering, which requires a prior sk_mark_napi_id() after the socket was looked up. Semantics for the __sk_buff napi_id access are similar, meaning if skb->napi_id is < MIN_NAPI_ID (e.g. outgoing packets using sender_cpu), then an invalid napi_id of 0 is returned to the program, otherwise a valid non-zero napi_id. [1] http://netdevconf.org/2.1/slides/apr6/dumazet-BUSY-POLLING-Netdev-2.1.pdf Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-21 13:53:27 -04:00
K. Y. Srinivasan	73e64fa4f4	netvsc: Deal with rescinded channels correctly We will not be able to send packets over a channel that has been rescinded. Make necessary adjustments so we can properly cleanup even when the channel is rescinded. This issue can be trigerred in the NIC hot-remove path. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-21 13:47:00 -04:00
David S. Miller	87e978edac	Merge branch 'ibmvnic-updates-and-bug-fixes' Nathan Fontenot says: ==================== ibmvnic: Updates and bug fixes This set of patches is a series of updates to remove some unneeded and unused code in the driver as well as bug fixes for the ibmvnic driver. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-21 13:33:55 -04:00
Nathan Fontenot	d76e0fec7e	ibmvnic: Remove unused bouce buffer The bounce buffer is not used in the ibmvnic driver, just get rid of it. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-21 13:33:55 -04:00
Nathan Fontenot	7f7adc5060	ibmvnic: Allocate zero-filled memory for sub crqs Update the allocation of memory for the sub crq structs and their associated pages to allocate zero-filled memory. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-21 13:33:54 -04:00
Brian King	dd9c20fa07	ibmvnic: Disable irq prior to close Add some code to call disable_irq on all the vnic interface's irqs. This fixes a crash observed when closing an active interface, as seen in the oops below when we try to access a buffer in the interrupt handler which we've already freed. Unable to handle kernel paging request for data at address 0x00000001 Faulting instruction address: 0xd000000003886824 Oops: Kernel access of bad area, sig: 11 [#1] SMP NR_CPUS=2048 NUMA pSeries Modules linked in: ibmvnic(OEN) rpadlpar_io(X) rpaphp(X) tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag rpcsec_ Supported: No, Unsupported modules are loaded CPU: 8 PID: 0 Comm: swapper/8 Tainted: G OE NX 4.4.49-92.11-default #1 task: c00000007f990110 ti: c0000000fffa0000 task.ti: c00000007f9b8000 NIP: d000000003886824 LR: d000000003886824 CTR: c0000000007eff60 REGS: c0000000fffa3a70 TRAP: 0300 Tainted: G OE NX (4.4.49-92.11-default) MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 22008042 XER: 20000008 CFAR: c000000000008468 DAR: 0000000000000001 DSISR: 40000000 SOFTE: 0 GPR00: d000000003886824 c0000000fffa3cf0 d000000003894118 0000000000000000 GPR04: 0000000000000000 0000000000000000 c000000001249da0 0000000000000000 GPR08: 000000000000000e 0000000000000000 c0000000ccb00000 d000000003889180 GPR12: c0000000007eff60 c000000007af4c00 0000000000000001 c0000000010def30 GPR16: c00000007f9b8000 c000000000b98c30 c00000007f9b8080 c000000000bab858 GPR20: 0000000000000005 0000000000000000 c0000000ff5d7e80 c0000000f809f648 GPR24: c0000000ff5d7ec8 0000000000000000 0000000000000000 c0000000ccb001a0 GPR28: 000000000000000a c0000000f809f600 c0000000fd4cd900 c0000000f9cd5b00 NIP [d000000003886824] ibmvnic_interrupt_tx+0x114/0x380 [ibmvnic] LR [d000000003886824] ibmvnic_interrupt_tx+0x114/0x380 [ibmvnic] Call Trace: [c0000000fffa3cf0] [d000000003886824] ibmvnic_interrupt_tx+0x114/0x380 [ibmvnic] (unreliable) [c0000000fffa3dd0] [c000000000132940] __handle_irq_event_percpu+0x90/0x2e0 [c0000000fffa3e90] [c000000000132bcc] handle_irq_event_percpu+0x3c/0x90 [c0000000fffa3ed0] [c000000000132c88] handle_irq_event+0x68/0xc0 [c0000000fffa3f00] [c000000000137edc] handle_fasteoi_irq+0xec/0x250 [c0000000fffa3f30] [c000000000131b04] generic_handle_irq+0x54/0x80 [c0000000fffa3f60] [c000000000011190] __do_irq+0x80/0x1d0 [c0000000fffa3f90] [c0000000000248d8] call_do_irq+0x14/0x24 [c00000007f9bb9e0] [c000000000011380] do_IRQ+0xa0/0x120 [c00000007f9bba40] [c000000000002594] hardware_interrupt_common+0x114/0x180 Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-21 13:33:53 -04:00
Nathan Fontenot	3748905599	ibmvnic: Correct crq and resource releasing We should not be releasing the crq's when calling close for the adapter, these need to remain open to facilitate operations such as updating the mac address. The crq's should be released in the adpaters remove routine. Additionally, we need to call release_reources from remove. This corrects the scenario of trying to remove an adapter that has only been probed. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-21 13:33:53 -04:00
Nathan Fontenot	661a262276	ibmvnic: Remove inflight list The inflight list used to track memory that is allocated for crq that are inflight is not needed. The one piece of the inflight list that does need to be cleaned at module exit is the error buffer list which is already attached to the adapter struct. This patch removes the inflight list and moves checking the error buffer list to ibmvnic_remove. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-21 13:33:53 -04:00
Brian King	ed7ecbf700	ibmvnic: Do not disable IRQ after scheduling tasklet Since the primary CRQ is only used for service functions and not in the performance path, simplify the code a bit and avoid disabling the IRQ. Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-21 13:33:53 -04:00
Brian King	58c8c0c096	ibmvnic: Fixup atomic API usage Replace a couple of modifications of an atomic followed by a read of the atomic, which is no longer atomic, to use atomic_XX_return variants to avoid race conditions. Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-21 13:33:52 -04:00
Brian King	59af56c25b	ibmvnic: Unmap longer term buffer before free Make sure we unregister long term buffers from the adapter prior to DMA unmapping it and freeing the buffer. Failure to do so could result in a DMA to a now invalid address. Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-21 13:33:52 -04:00
Murilo Fossa Vicentini	993a82b0ff	ibmvnic: Fix ibmvnic_change_mac_addr struct format The ibmvnic_change_mac_addr struct alignment was not matching the defined format in PAPR+, it had the reserved and return code fields swapped. As a consequence, the CHANGE_MAC_ADDR_RSP commands were being improperly handled and executed even when the operation wasn't successfully completed by the system firmware. Also changing the endianness of the debug message to make it easier to parse the CRQ content. Signed-off-by: Murilo Fossa Vicentini <muvic@linux.vnet.ibm.com> Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-21 13:33:52 -04:00
Thomas Falcon	ffa738555b	ibmvnic: Report errors when failing to release sub-crqs Add reporting of errors when releasing sub-crqs fails. Signed-off-by: Thomas Falcon <tlfalcon@us.ibm.com> Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-21 13:33:52 -04:00
Arnd Bergmann	ca1cb28da0	liquidio: remove unnecessary variable assignment gcc points out an useless assignment that was added during code refactoring: drivers/net/ethernet/cavium/liquidio/lio_ethtool.c: In function 'octnet_intrmod_callback': drivers/net/ethernet/cavium/liquidio/lio_ethtool.c:1315:59: error: parameter 'oct_dev' set but not used [-Werror=unused-but-set-parameter] This is harmless but can clearly be remove to avoid the warning. Fixes: `50c0add534` ("liquidio: refactor interrupt moderation code") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-21 13:25:34 -04:00
Matthew Whitehead	7acf8a1e8a	Replace 2 jiffies with sysctl netdev_budget_usecs to enable softirq tuning Constants used for tuning are generally a bad idea, especially as hardware changes over time. Replace the constant 2 jiffies with sysctl variable netdev_budget_usecs to enable sysadmins to tune the softirq processing. Also document the variable. For example, a very fast machine might tune this to 1000 microseconds, while my regression testing 486DX-25 needs it to be 4000 microseconds on a nearly idle network to prevent time_squeeze from being incremented. Version 2: changed jiffies to microseconds for predictable units. Signed-off-by: Matthew Whitehead <tedheadster@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-21 13:22:34 -04:00
David S. Miller	20da848f8e	Merge branch 'iptunnel-policy-based-routing' Craig Gallek says: ==================== ip_tunnel: Allow policy-based routing through tunnels iproute2 changes to follow. Example usage: ip link add gre-test type gre local 10.0.0.1 remote 10.0.0.2 fwmark 0x4 ip -detail link show gre-test ... ip link set gre-test type gre fwmark 0 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-21 13:21:32 -04:00
Craig Gallek	9830ad4c6a	ip_tunnel: Allow policy-based routing through tunnels This feature allows the administrator to set an fwmark for packets traversing a tunnel. This allows the use of independent routing tables for tunneled packets without the use of iptables. There is no concept of per-packet routing decisions through IPv4 tunnels, so this implementation does not need to work with per-packet route lookups as the v6 implementation may (with IP6_TNL_F_USE_ORIG_FWMARK). Further, since the v4 tunnel ioctls share datastructures (which can not be trivially modified) with the kernel's internal tunnel configuration structures, the mark attribute must be stored in the tunnel structure itself and passed as a parameter when creating or changing tunnel attributes. Signed-off-by: Craig Gallek <kraig@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-21 13:21:31 -04:00
Craig Gallek	0a473b82cb	ip6_tunnel: Allow policy-based routing through tunnels This feature allows the administrator to set an fwmark for packets traversing a tunnel. This allows the use of independent routing tables for tunneled packets without the use of iptables. Signed-off-by: Craig Gallek <kraig@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-21 13:21:30 -04:00
Florian Fainelli	8e6c1812e6	net: dsa: Remove redundant NULL dst check tag_lan9303.c does check for a NULL dst but that's already checked by dsa_switch_rcv() one layer above. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Juergen Borleis <jbe@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-21 10:41:24 -04:00
Dan Carpenter	6905e5a5c8	net/mlx5e: IPoIB, Fix error handling in mlx5_rdma_netdev_alloc() The labels were out of order, so it either could result in an Oops or a leak. Fixes: `48935bbb7a` ("net/mlx5e: IPoIB, Add netdevice profile skeleton") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-20 16:26:10 -04:00
Dan Carpenter	f6ca26f268	qede: allocate enough data for ->arfs_fltr_bmap We've got the number of longs, yes, but we should multiply by sizeof(long) to get the number of bytes needed. Fixes: `e4917d46a6` ("qede: Add aRFS support") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-20 16:23:23 -04:00
Chema Gonzalez	d6ecf32805	tcp_cubic: fix typo in module param description Signed-off-by: Chema Gonzalez <chemag@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-20 16:16:44 -04:00
Jamal Hadi Salim	b603aa4d32	Add Jiri Pirko as TC subsystem co-maintainer Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-20 16:15:04 -04:00
Jamal Hadi Salim	7ab273be23	Add Cong Wang as TC subsystem co-maintainer Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-20 16:15:04 -04:00
David S. Miller	a5f62ca68f	Merge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== 10GbE Intel Wired LAN Driver Updates 2017-04-18 This series contains updates to mainly ixgbe with only one ixgbevf change. Usha adds a check to ensure the creation of number of VF's is valid based on the traffic classes configured, all to avoid transmit hangs. Joe Perches reduces the use of pr_cont since the output can be interleaved by other processes. Tony cleans up the code overwriting the KX4 config, which is configured by the NVM. Adds a check for MMNGC.MNG_VETO, to resolve an issue where we were getting a link loss for the BMC when loading the driver. Don fixes up SGMII x553 config details which were missed in earlier implementations. Added support for x552 XFI backplane interface support. Cleaned up an unused define, which was causing confusion on supported devices. Emil fixes a link issue on KR parts by making sure the default setting is set. Refactors the code so that the code for allocating memory for the list of MAC addresses that the VFs can use into its own function. Made some code cleans to help readability and ensure notification of SRIOV being enabled is done upon completion. Fixed an issue where if we failed to allocate vfinfo in __ixgbe_enable_sriov() the driver would crash with a NULL pointer dereference. Philippe Reynes updates ixgbevf to use the new API for {get\|set}_link_ksettings. Alex increases the headroom allocation when using build_skb() on a system with 4K pages. Fixed an issue in ixgbe_dump() where we were no longer clearing the status bit. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-20 16:10:31 -04:00
subashab@codeaurora.org	0bd84065b1	net: ipv6: Fix UDP early demux lookup with udp_l3mdev_accept=0 David Ahern reported that `5425077d73` ("net: ipv6: Add early demux handler for UDP unicast") breaks udp_l3mdev_accept=0 since early demux for IPv6 UDP was doing a generic socket lookup which does not require an exact match. Fix this by making UDPv6 early demux match connected sockets only. v1->v2: Take reference to socket after match as suggested by Eric v2->v3: Add comment before break Fixes: `5425077d73` ("net: ipv6: Add early demux handler for UDP unicast") Reported-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Cc: Eric Dumazet <edumazet@google.com> Acked-by: David Ahern <dsa@cumulusnetworks.com> Tested-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-20 15:50:27 -04:00
David S. Miller	8ad0921b8a	Merge branch 'tcp_poll-flakes' Eric Dumazet says: ==================== tcp: address two poll() flakes Some packetdrill tests are failing when host kernel is using ASAN or other debugging infrastructure. I was able to fix the flakes by making sure we were not sending wakeup events too soon. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-20 15:42:11 -04:00
Eric Dumazet	0f9fa831ae	tcp: remove poll() flakes with FastOpen When using TCP FastOpen for an active session, we send one wakeup event from tcp_finish_connect(), right before the data eventually contained in the received SYNACK is queued to sk->sk_receive_queue. This means that depending on machine load or luck, poll() users might receive POLLOUT events instead of POLLIN\|POLLOUT To fix this, we need to move the call to sk->sk_state_change() after the (optional) call to tcp_rcv_fastopen_synack() Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-20 15:42:11 -04:00
Eric Dumazet	3d4762639d	tcp: remove poll() flakes when receiving RST When a RST packet is processed, we send two wakeup events to interested polling users. First one by a sk->sk_error_report(sk) from tcp_reset(), followed by a sk->sk_state_change(sk) from tcp_done(). Depending on machine load and luck, poll() can either return POLLERR, or POLLIN\|POLLOUT\|POLLERR\|POLLHUP (this happens on 99 % of the cases) This is probably fine, but we can avoid the confusion by reordering things so that we have more TCP fields updated before the first wakeup. This might even allow us to remove some barriers we added in the past. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-20 15:42:10 -04:00
David S. Miller	d02e93d647	Merge branch 'mlxsw-flow-based-forwarding-OVS' Jiri Pirko says: ==================== mlxsw: Allow flow based forwarding in OVS This patchset does some fixes so the HW is setup correctly to do flow-based (ACL based) forwarding for OVS-enslaved port. The first patch is just trivial fix spotted on the way. Patches 2-4 take care of proper FID setup which HW needs in order to for ACL based forwarding. The 7th patch (with dependency of patch 5 and 6) takes care of proper setup of ports that are enslaved in OVS. The last patch implements new FID miss trap that is used to push packets belonging to unknown flows to kernel and userspace. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-20 15:32:32 -04:00
Jiri Pirko	9d41accc1e	mlxsw: spectrum: Add FID miss trap When there is no FID set for a specific packet, the HW will drop it. However, by default these packets are useful to be delivered to CPU as it can inspect them and program HW accordingly. So add this trap. This would only ever happen when port is enslaved to an OVS master. Otherwise, packets would be dropped during VLAN / STP filtering, before FID classification. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-20 15:32:31 -04:00
Jiri Pirko	2b94e58df5	mlxsw: spectrum: Allow ports to work under OVS master >From now on, a port can become a slave of OVS master. All vlans are enabled, STP state is set to "forwarding". It is up to the OVS userspace daemon to setup the flows either in kernel or in HW using TC flower offload. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-20 15:32:31 -04:00
Jiri Pirko	5be6614127	net: add netif_is_ovs_port helper To find out if a netdev is an OVS port. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-20 15:32:31 -04:00
Jiri Pirko	93cd081305	mlxsw: spectrum: Teach mlxsw_sp_port_vlan_set to accept any vlan range So far, mlxsw_sp_port_vlan_set range is limited by MLXSW_REG_SPVM_REC_MAX_COUNT. In spectrum_switchdev code this is wrapped up by a helper function which actually does multiple calls to FW for bigger ranges. Move the code into mlxsw_sp_port_vlan_set and use it always. That allows caller not to care about the range. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-20 15:32:30 -04:00
Jiri Pirko	cedbb8b259	mlxsw: spectrum_flower: Set dummy FID before forward action HW requires the FID to be valid in order for the forward action to work. So regardless of the current FID validity, just set the dummy FID which would do the trick. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-20 15:32:30 -04:00
Jiri Pirko	202d6f423c	mlxsw: spectrum: Add dummy FID initialization For forwarding using ACL action, HW needs a valid FID to be setup. It does not actually use it, so it can be any valid FID. So create a dummy FID only for this purpose. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-20 15:32:30 -04:00
Jiri Pirko	ac44dd43d8	mlxsw: spectrum: Implement action to set FID Implement part of multipurpose Virtual Router and Forwarding Domain Action that takes care of setting up FID. We need to use it to be able to forward packets using ACL action when no FID is associated on RX. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-20 15:32:30 -04:00
Jiri Pirko	b51df799f6	mlxsw: spectrum: Fix indent in mlxsw_sp_netdevice_port_upper_event Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-20 15:32:29 -04:00
Niklas Cassel	fada43ccc8	bindings: net: stmmac: add missing note about LPI interrupt The hardware has a LPI interrupt. There is already code in the stmmac driver to parse and handle the interrupt. However, this information was missing from the DT binding. At the same time, improve the description of the existing interrupts. Signed-off-by: Niklas Cassel <niklas.cassel@axis.com> Acked-By: Joao Pinto <jpinto@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-20 15:29:46 -04:00
Tobias Klauser	1f504ec989	bpf: remove reference to sock_filter_ext from kerneldoc comment struct sock_filter_ext didn't make it into the tree and is now called struct bpf_insn. Reword the kerneldoc comment for bpf_convert_filter() accordingly. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-20 15:26:49 -04:00
David S. Miller	028f43bc64	My last pull request has been a while, we now have: * connection quality monitoring with multiple thresholds * support for FILS shared key authentication offload * pre-CAC regulatory compliance - only ETSI allows this * sanity check for some rate confusion that hit ChromeOS (but nobody else uses it, evidently) * some documentation updates * lots of cleanups -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEExu3sM/nZ1eRSfR9Ha3t4Rpy0AB0FAlj12HMACgkQa3t4Rpy0 AB0ztBAAi0tH9xR/7iYgChyZV4S8PpYKo2QoQZofG8vzAztboqI4clAxbWEOsJHh qddjm+foiHVJtZj2LqxjDcaxk69VIh/ERSlR7ve7GCzz9WAAWBMHZop2eArHvgI1 pqP4mQEZ7QISVo88H3LeRdj8NmTwfZYH8u8e2CN3yEpSh1PPrU+slaXRLrjB4uql XWwwJYQatgDw6Dj4vTIk++DqGo7OhK6CrC1gZLnyOtitTiPzRtfj8rdRHeRKdlj4 wOkUaenjs5r9KsofNYZpzckHp2NEpgIruqCsNdRGHf14EWBC5Q1N35OUOecyQ67T 3VeSnHxU4qjomkXgwqmDKFFOdqtqIruor3YDdO1iwO2TNF+JlNfq5AqUNec/XjUv VDmj1NRZE0ftJtCkDFm1Q/ABfVDH9i2O6ZBs6a3zb65lA83q1y4xlF48LqDzG3qi fNnfRO2rOOiyosF3HEkF5u1mfD6MRUtZAc2ZiHckGUpAngs5QOWKqtVgcgWjmbFW qDTKsFYi2YpGXZAnUjqS4ZtmcgRGEXqg1STJBt4cA8cnmI9Ka5GplACVhqzGeneH EYMESEct9BOpR6BjABmbZL09NtCkiTPYjiL4V//USr4f6NFhOeHHMYuxYFYIEgC6 ldRjf4EUzZw0QJ8X6L+zxYI5m40fEJ7bGhlIdMo7fWXpRpCaF1Y= =f4VT -----END PGP SIGNATURE----- Merge tag 'mac80211-next-for-davem-2017-04-18' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== My last pull request has been a while, we now have: * connection quality monitoring with multiple thresholds * support for FILS shared key authentication offload * pre-CAC regulatory compliance - only ETSI allows this * sanity check for some rate confusion that hit ChromeOS (but nobody else uses it, evidently) * some documentation updates * lots of cleanups ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-20 13:54:40 -04:00

1 2 3 4 5 ...

665029 Commits All Branches Search

665029 Commits

All Branches