linux

Commit Graph

Author	SHA1	Message	Date
Xie He	9dc829a135	drivers/net/wan/lapbether: Fixed the value of hard_header_len When this driver transmits data, first this driver will remove a pseudo header of 1 byte, then the lapb module will prepend the LAPB header of 2 or 3 bytes, then this driver will prepend a length field of 2 bytes, then the underlying Ethernet device will prepend its own header. So, the header length required should be: -1 + 3 + 2 + "the header length needed by the underlying device". This patch fixes kernel panic when this driver is used with AF_PACKET SOCK_DGRAM sockets. Signed-off-by: Xie He <xie.he.0141@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-06 12:16:21 -07:00
David S. Miller	0f57a1e522	Merge branch 'net-rmnet-fix-interface-leak-for-rmnet-module' Taehee Yoo says: ==================== net: rmnet: fix interface leak for rmnet module There are two problems in rmnet module that they occur the leak of a lower interface. The symptom is the same, which is the leak of a lower interface. But there are two different real problems. This patchset is to fix these real problems. 1. Do not allow to have different two modes. As a lower interface of rmnet, there are two modes that they are VND and BRIDGE. One interface can have only one mode. But in the current rmnet, there is no code to prevent to have two modes in one lower interface. So, interface leak occurs. 2. Do not allow to add multiple bridge interfaces. rmnet can have only two bridge interface. If an additional bridge interface is tried to be attached, rmnet should deny it. But there is no code to do that. So, interface leak occurs. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-04 18:04:56 -07:00
Taehee Yoo	2fb2799a2a	net: rmnet: do not allow to add multiple bridge interfaces rmnet can have only two bridge interface. One of them is a link interface and another one is added by the master operation. rmnet interface shouldn't allow adding additional bridge interfaces by mater operation. But, there is no code to deny additional interfaces. So, interface leak occurs. Test commands: ip link add dummy0 type dummy ip link add dummy1 type dummy ip link add dummy2 type dummy ip link add rmnet0 link dummy0 type rmnet mux_id 1 ip link set dummy1 master rmnet0 ip link set dummy2 master rmnet0 ip link del rmnet0 In the above test command, the dummy0 was attached to rmnet as VND mode. Then, dummy1 was attached to rmnet0 as BRIDGE mode. At this point, dummy0 mode is switched from VND to BRIDGE automatically. Then, dummy2 is attached to rmnet as BRIDGE mode. At this point, rmnet0 should deny this operation. But, rmnet0 doesn't deny this. So that below splat occurs when the rmnet0 interface is deleted. Splat looks like: [ 186.684787][ C2] WARNING: CPU: 2 PID: 1009 at net/core/dev.c:8992 rollback_registered_many+0x986/0xcf0 [ 186.684788][ C2] Modules linked in: rmnet dummy openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_x [ 186.684805][ C2] CPU: 2 PID: 1009 Comm: ip Not tainted 5.8.0-rc1+ #621 [ 186.684807][ C2] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 186.684808][ C2] RIP: 0010:rollback_registered_many+0x986/0xcf0 [ 186.684811][ C2] Code: 41 8b 4e cc 45 31 c0 31 d2 4c 89 ee 48 89 df e8 e0 47 ff ff 85 c0 0f 84 cd fc ff ff 5 [ 186.684812][ C2] RSP: 0018:ffff8880cd9472e0 EFLAGS: 00010287 [ 186.684815][ C2] RAX: ffff8880cc56da58 RBX: ffff8880ab21c000 RCX: ffffffff9329d323 [ 186.684816][ C2] RDX: 1ffffffff2be6410 RSI: 0000000000000008 RDI: ffffffff95f32080 [ 186.684818][ C2] RBP: dffffc0000000000 R08: fffffbfff2be6411 R09: fffffbfff2be6411 [ 186.684819][ C2] R10: ffffffff95f32087 R11: 0000000000000001 R12: ffff8880cd947480 [ 186.684820][ C2] R13: ffff8880ab21c0b8 R14: ffff8880cd947400 R15: ffff8880cdf10640 [ 186.684822][ C2] FS: 00007f00843890c0(0000) GS:ffff8880d4e00000(0000) knlGS:0000000000000000 [ 186.684823][ C2] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 186.684825][ C2] CR2: 000055b8ab1077b8 CR3: 00000000ab612006 CR4: 00000000000606e0 [ 186.684826][ C2] Call Trace: [ 186.684827][ C2] ? lockdep_hardirqs_on_prepare+0x379/0x540 [ 186.684829][ C2] ? netif_set_real_num_tx_queues+0x780/0x780 [ 186.684830][ C2] ? rmnet_unregister_real_device+0x56/0x90 [rmnet] [ 186.684831][ C2] ? __kasan_slab_free+0x126/0x150 [ 186.684832][ C2] ? kfree+0xdc/0x320 [ 186.684834][ C2] ? rmnet_unregister_real_device+0x56/0x90 [rmnet] [ 186.684835][ C2] unregister_netdevice_many.part.135+0x13/0x1b0 [ 186.684836][ C2] rtnl_delete_link+0xbc/0x100 [ ... ] [ 238.440071][ T1009] unregister_netdevice: waiting for rmnet0 to become free. Usage count = 1 Fixes: `037f9cdf72` ("net: rmnet: use upper/lower device infrastructure") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-04 18:04:55 -07:00
Taehee Yoo	2a762e9e8c	net: rmnet: fix lower interface leak There are two types of the lower interface of rmnet that are VND and BRIDGE. Each lower interface can have only one type either VND or BRIDGE. But, there is a case, which uses both lower interface types. Due to this unexpected behavior, lower interface leak occurs. Test commands: ip link add dummy0 type dummy ip link add dummy1 type dummy ip link add rmnet0 link dummy0 type rmnet mux_id 1 ip link set dummy1 master rmnet0 ip link add rmnet1 link dummy1 type rmnet mux_id 2 ip link del rmnet0 The dummy1 was attached as BRIDGE interface of rmnet0. Then, it also was attached as VND interface of rmnet1. This is unexpected behavior and there is no code for handling this case. So that below splat occurs when the rmnet0 interface is deleted. Splat looks like: [ 53.254112][ C1] WARNING: CPU: 1 PID: 1192 at net/core/dev.c:8992 rollback_registered_many+0x986/0xcf0 [ 53.254117][ C1] Modules linked in: rmnet dummy openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nfx [ 53.254182][ C1] CPU: 1 PID: 1192 Comm: ip Not tainted 5.8.0-rc1+ #620 [ 53.254188][ C1] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 53.254192][ C1] RIP: 0010:rollback_registered_many+0x986/0xcf0 [ 53.254200][ C1] Code: 41 8b 4e cc 45 31 c0 31 d2 4c 89 ee 48 89 df e8 e0 47 ff ff 85 c0 0f 84 cd fc ff ff 0f 0b e5 [ 53.254205][ C1] RSP: 0018:ffff888050a5f2e0 EFLAGS: 00010287 [ 53.254214][ C1] RAX: ffff88805756d658 RBX: ffff88804d99c000 RCX: ffffffff8329d323 [ 53.254219][ C1] RDX: 1ffffffff0be6410 RSI: 0000000000000008 RDI: ffffffff85f32080 [ 53.254223][ C1] RBP: dffffc0000000000 R08: fffffbfff0be6411 R09: fffffbfff0be6411 [ 53.254228][ C1] R10: ffffffff85f32087 R11: 0000000000000001 R12: ffff888050a5f480 [ 53.254233][ C1] R13: ffff88804d99c0b8 R14: ffff888050a5f400 R15: ffff8880548ebe40 [ 53.254238][ C1] FS: 00007f6b86b370c0(0000) GS:ffff88806c200000(0000) knlGS:0000000000000000 [ 53.254243][ C1] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 53.254248][ C1] CR2: 0000562c62438758 CR3: 000000003f600005 CR4: 00000000000606e0 [ 53.254253][ C1] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 53.254257][ C1] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 53.254261][ C1] Call Trace: [ 53.254266][ C1] ? lockdep_hardirqs_on_prepare+0x379/0x540 [ 53.254270][ C1] ? netif_set_real_num_tx_queues+0x780/0x780 [ 53.254275][ C1] ? rmnet_unregister_real_device+0x56/0x90 [rmnet] [ 53.254279][ C1] ? __kasan_slab_free+0x126/0x150 [ 53.254283][ C1] ? kfree+0xdc/0x320 [ 53.254288][ C1] ? rmnet_unregister_real_device+0x56/0x90 [rmnet] [ 53.254293][ C1] unregister_netdevice_many.part.135+0x13/0x1b0 [ 53.254297][ C1] rtnl_delete_link+0xbc/0x100 [ 53.254301][ C1] ? rtnl_af_register+0xc0/0xc0 [ 53.254305][ C1] rtnl_dellink+0x2dc/0x840 [ 53.254309][ C1] ? find_held_lock+0x39/0x1d0 [ 53.254314][ C1] ? valid_fdb_dump_strict+0x620/0x620 [ 53.254318][ C1] ? rtnetlink_rcv_msg+0x457/0x890 [ 53.254322][ C1] ? lock_contended+0xd20/0xd20 [ 53.254326][ C1] rtnetlink_rcv_msg+0x4a8/0x890 [ ... ] [ 73.813696][ T1192] unregister_netdevice: waiting for rmnet0 to become free. Usage count = 1 Fixes: `037f9cdf72` ("net: rmnet: use upper/lower device infrastructure") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-04 18:04:55 -07:00
Taehee Yoo	ccfc9df135	hsr: fix interface leak in error path of hsr_dev_finalize() To release hsr(upper) interface, it should release its own lower interfaces first. Then, hsr(upper) interface can be released safely. In the current code of error path of hsr_dev_finalize(), it releases hsr interface before releasing a lower interface. So, a warning occurs, which warns about the leak of lower interfaces. In order to fix this problem, changing the ordering of the error path of hsr_dev_finalize() is needed. Test commands: ip link add dummy0 type dummy ip link add dummy1 type dummy ip link add dummy2 type dummy ip link add hsr0 type hsr slave1 dummy0 slave2 dummy1 ip link add hsr1 type hsr slave1 dummy2 slave2 dummy0 Splat looks like: [ 214.923127][ C2] WARNING: CPU: 2 PID: 1093 at net/core/dev.c:8992 rollback_registered_many+0x986/0xcf0 [ 214.923129][ C2] Modules linked in: hsr dummy openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipx [ 214.923154][ C2] CPU: 2 PID: 1093 Comm: ip Not tainted 5.8.0-rc2+ #623 [ 214.923156][ C2] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 214.923157][ C2] RIP: 0010:rollback_registered_many+0x986/0xcf0 [ 214.923160][ C2] Code: 41 8b 4e cc 45 31 c0 31 d2 4c 89 ee 48 89 df e8 e0 47 ff ff 85 c0 0f 84 cd fc ff ff 5 [ 214.923162][ C2] RSP: 0018:ffff8880c5156f28 EFLAGS: 00010287 [ 214.923165][ C2] RAX: ffff8880d1dad458 RBX: ffff8880bd1b9000 RCX: ffffffffb929d243 [ 214.923167][ C2] RDX: 1ffffffff77e63f0 RSI: 0000000000000008 RDI: ffffffffbbf31f80 [ 214.923168][ C2] RBP: dffffc0000000000 R08: fffffbfff77e63f1 R09: fffffbfff77e63f1 [ 214.923170][ C2] R10: ffffffffbbf31f87 R11: 0000000000000001 R12: ffff8880c51570a0 [ 214.923172][ C2] R13: ffff8880bd1b90b8 R14: ffff8880c5157048 R15: ffff8880d1dacc40 [ 214.923174][ C2] FS: 00007fdd257a20c0(0000) GS:ffff8880da200000(0000) knlGS:0000000000000000 [ 214.923175][ C2] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 214.923177][ C2] CR2: 00007ffd78beb038 CR3: 00000000be544005 CR4: 00000000000606e0 [ 214.923179][ C2] Call Trace: [ 214.923180][ C2] ? netif_set_real_num_tx_queues+0x780/0x780 [ 214.923182][ C2] ? dev_validate_mtu+0x140/0x140 [ 214.923183][ C2] ? synchronize_rcu.part.79+0x85/0xd0 [ 214.923185][ C2] ? synchronize_rcu_expedited+0xbb0/0xbb0 [ 214.923187][ C2] rollback_registered+0xc8/0x170 [ 214.923188][ C2] ? rollback_registered_many+0xcf0/0xcf0 [ 214.923190][ C2] unregister_netdevice_queue+0x18b/0x240 [ 214.923191][ C2] hsr_dev_finalize+0x56e/0x6e0 [hsr] [ 214.923192][ C2] hsr_newlink+0x36b/0x450 [hsr] [ 214.923194][ C2] ? hsr_dellink+0x70/0x70 [hsr] [ 214.923195][ C2] ? rtnl_create_link+0x2e4/0xb00 [ 214.923197][ C2] ? __netlink_ns_capable+0xc3/0xf0 [ 214.923198][ C2] __rtnl_newlink+0xbdb/0x1270 [ ... ] Fixes: `e0a4b99773` ("hsr: use upper/lower device infrastructure") Reported-by: syzbot+7f1c020f68dab95aab59@syzkaller.appspotmail.com Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-04 18:02:48 -07:00
Luo bin	6dbb89014d	hinic: fix sending mailbox timeout in aeq event work When sending mailbox in the work of aeq event, another aeq event will be triggered. because the last aeq work is not exited and only one work can be excuted simultaneously in the same workqueue, mailbox sending function will return failure of timeout. We create and use another workqueue to fix this. Signed-off-by: Luo bin <luobin9@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-04 17:53:16 -07:00
David S. Miller	c00e858d55	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Use kvfree() to release vmalloc()'ed areas in ipset, from Eric Dumazet. 2) UAF in nfnetlink_queue from the nf_conntrack_update() path. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-04 17:47:35 -07:00
David S. Miller	4d57254554	Merge branch 'Documentation-networking-eliminate-doubled-words' Randy Dunlap says: ==================== Documentation: networking: eliminate doubled words Drop all duplicated words in Documentation/networking/ files. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-04 17:46:21 -07:00
Randy Dunlap	e54ac95afb	Documentation: networking: rxrpc: drop doubled word Drop the doubled word "have". Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-doc@vger.kernel.org Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: netdev@vger.kernel.org Cc: David Howells <dhowells@redhat.com> Cc: linux-afs@lists.infradead.org Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-04 17:46:21 -07:00
Randy Dunlap	474112d57c	Documentation: networking: ipvs-sysctl: drop doubled word Drop the doubled word "that". Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-doc@vger.kernel.org Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-04 17:46:21 -07:00
Randy Dunlap	a7db3c7669	Documentation: networking: ip-sysctl: drop doubled word Drop the doubled word "that". Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-doc@vger.kernel.org Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-04 17:46:21 -07:00
Randy Dunlap	4f6a009c8b	Documentation: networking: dsa: drop doubled word Drop the doubled word "in". Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-doc@vger.kernel.org Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: netdev@vger.kernel.org Cc: Andrew Lunn <andrew@lunn.ch> Cc: Vivien Didelot <vivien.didelot@gmail.com> Cc: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-04 17:46:21 -07:00
Randy Dunlap	6d0fe3aea4	Documentation: networking: can_ucan_protocol: drop doubled words Drop the doubled words "the" and "of". Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-doc@vger.kernel.org Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: netdev@vger.kernel.org Cc: Wolfgang Grandegger <wg@grandegger.com> Cc: Marc Kleine-Budde <mkl@pengutronix.de> Cc: linux-can@vger.kernel.org Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-04 17:46:21 -07:00
Randy Dunlap	e99094856d	Documentation: networking: ax25: drop doubled word Drop the doubled word "and". Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-doc@vger.kernel.org Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: netdev@vger.kernel.org Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-hams@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-04 17:46:21 -07:00
Randy Dunlap	caebecb032	Documentation: networking: arcnet: drop doubled word Drop the doubled word "to". Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-doc@vger.kernel.org Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-04 17:46:21 -07:00
Toke Høiland-Jørgensen	d7bf2ebebc	sched: consistently handle layer3 header accesses in the presence of VLANs There are a couple of places in net/sched/ that check skb->protocol and act on the value there. However, in the presence of VLAN tags, the value stored in skb->protocol can be inconsistent based on whether VLAN acceleration is enabled. The commit quoted in the Fixes tag below fixed the users of skb->protocol to use a helper that will always see the VLAN ethertype. However, most of the callers don't actually handle the VLAN ethertype, but expect to find the IP header type in the protocol field. This means that things like changing the ECN field, or parsing diffserv values, stops working if there's a VLAN tag, or if there are multiple nested VLAN tags (QinQ). To fix this, change the helper to take an argument that indicates whether the caller wants to skip the VLAN tags or not. When skipping VLAN tags, we make sure to skip all of them, so behaviour is consistent even in QinQ mode. To make the helper usable from the ECN code, move it to if_vlan.h instead of pkt_sched.h. v3: - Remove empty lines - Move vlan variable definitions inside loop in skb_protocol() - Also use skb_protocol() helper in IP{,6}_ECN_decapsulate() and bpf_skb_ecn_set_ce() v2: - Use eth_type_vlan() helper in skb_protocol() - Also fix code that reads skb->protocol directly - Change a couple of 'if/else if' statements to switch constructs to avoid calling the helper twice Reported-by: Ilya Ponetayev <i.ponetaev@ndmsystems.com> Fixes: `d8b9605d26` ("net: sched: fix skb->protocol use in case of accelerated vlan path") Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-03 14:34:53 -07:00
Pablo Neira Ayuso	d005fbb855	netfilter: conntrack: refetch conntrack after nf_conntrack_update() __nf_conntrack_update() might refresh the conntrack object that is attached to the skbuff. Otherwise, this triggers UAF. [ 633.200434] ================================================================== [ 633.200472] BUG: KASAN: use-after-free in nf_conntrack_update+0x34e/0x770 [nf_conntrack] [ 633.200478] Read of size 1 at addr ffff888370804c00 by task nfqnl_test/6769 [ 633.200487] CPU: 1 PID: 6769 Comm: nfqnl_test Not tainted 5.8.0-rc2+ #388 [ 633.200490] Hardware name: LENOVO 23259H1/23259H1, BIOS G2ET32WW (1.12 ) 05/30/2012 [ 633.200491] Call Trace: [ 633.200499] dump_stack+0x7c/0xb0 [ 633.200526] ? nf_conntrack_update+0x34e/0x770 [nf_conntrack] [ 633.200532] print_address_description.constprop.6+0x1a/0x200 [ 633.200539] ? _raw_write_lock_irqsave+0xc0/0xc0 [ 633.200568] ? nf_conntrack_update+0x34e/0x770 [nf_conntrack] [ 633.200594] ? nf_conntrack_update+0x34e/0x770 [nf_conntrack] [ 633.200598] kasan_report.cold.9+0x1f/0x42 [ 633.200604] ? call_rcu+0x2c0/0x390 [ 633.200633] ? nf_conntrack_update+0x34e/0x770 [nf_conntrack] [ 633.200659] nf_conntrack_update+0x34e/0x770 [nf_conntrack] [ 633.200687] ? nf_conntrack_find_get+0x30/0x30 [nf_conntrack] Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1436 Fixes: `ee04805ff5` ("netfilter: conntrack: make conntrack userspace helpers work again") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-07-03 14:47:03 +02:00
Nicolas Ferre	ad4e2b6483	MAINTAINERS: net: macb: add Claudiu as co-maintainer I would like that Claudiu becomes co-maintainer of the Cadence macb driver. He's already participating to lots of reviews and enhancements to this driver and knows the different versions of this controller. Signed-off-by: Nicolas Ferre <nicolas.ferre@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-02 14:33:50 -07:00
Codrin Ciubotariu	af199a1a9c	net: dsa: microchip: set the correct number of ports The number of ports is incorrectly set to the maximum available for a DSA switch. Even if the extra ports are not used, this causes some functions to be called later, like port_disable() and port_stp_state_set(). If the driver doesn't check the port index, it will end up modifying unknown registers. Fixes: `b987e98e50` ("dsa: add DSA switch driver for Microchip KSZ9477") Signed-off-by: Codrin Ciubotariu <codrin.ciubotariu@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-02 14:26:54 -07:00
Eric Dumazet	1ca0fafd73	tcp: md5: allow changing MD5 keys in all socket states This essentially reverts commit `7212303268` ("tcp: md5: reject TCP_MD5SIG or TCP_MD5SIG_EXT on established sockets") Mathieu reported that many vendors BGP implementations can actually switch TCP MD5 on established flows. Quoting Mathieu : Here is a list of a few network vendors along with their behavior with respect to TCP MD5: - Cisco: Allows for password to be changed, but within the hold-down timer (~180 seconds). - Juniper: When password is initially set on active connection it will reset, but after that any subsequent password changes no network resets. - Nokia: No notes on if they flap the tcp connection or not. - Ericsson/RedBack: Allows for 2 password (old/new) to co-exist until both sides are ok with new passwords. - Meta-Switch: Expects the password to be set before a connection is attempted, but no further info on whether they reset the TCP connection on a change. - Avaya: Disable the neighbor, then set password, then re-enable. - Zebos: Would normally allow the change when socket connected. We can revert my prior change because commit `9424e2e7ad` ("tcp: md5: fix potential overestimation of TCP option space") removed the leak of 4 kernel bytes to the wire that was the main reason for my patch. While doing my investigations, I found a bug when a MD5 key is changed, leading to these commits that stable teams want to consider before backporting this revert : Commit `6a2febec33` ("tcp: md5: add missing memory barriers in tcp_md5_do_add()/tcp_md5_hash_key()") Commit `e6ced831ef` ("tcp: md5: refine tcp_md5_do_add()/tcp_md5_hash_key() barriers") Fixes: `7212303268` "tcp: md5: reject TCP_MD5SIG or TCP_MD5SIG_EXT on established sockets" Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-02 14:07:49 -07:00
Helmut Grohne	e4b9a72d76	net: dsa: microchip: enable ksz9893 via i2c in the ksz9477 driver The KSZ9893 3-Port Gigabit Ethernet Switch can be controlled via SPI, I²C or MDIO (very limited and not supported by this driver). While there is already a compatible entry for the SPI bus, it was missing for I²C. Signed-off-by: Helmut Grohne <helmut.grohne@intenta.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-01 17:48:47 -07:00
Eric Dumazet	ba3bb0e76c	tcp: fix SO_RCVLOWAT possible hangs under high mem pressure Whenever tcp_try_rmem_schedule() returns an error, we are under trouble and should make sure to wakeup readers so that they can drain socket queues and eventually make room. Fixes: `03f45c883c` ("tcp: avoid extra wakeups for SO_RCVLOWAT users") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-01 17:46:04 -07:00
Willem de Bruijn	0da7536fb4	ip: Fix SO_MARK in RST, ACK and ICMP packets When no full socket is available, skbs are sent over a per-netns control socket. Its sk_mark is temporarily adjusted to match that of the real (request or timewait) socket or to reflect an incoming skb, so that the outgoing skb inherits this in __ip_make_skb. Introduction of the socket cookie mark field broke this. Now the skb is set through the cookie and cork: <caller> # init sockc.mark from sk_mark or cmsg ip_append_data ip_setup_cork # convert sockc.mark to cork mark ip_push_pending_frames ip_finish_skb __ip_make_skb # set skb->mark to cork mark But I missed these special control sockets. Update all callers of __ip(6)_make_skb that were originally missed. For IPv6, the same two icmp(v6) paths are affected. The third case is not, as commit `92e55f412c` ("tcp: don't annotate mark on control socket from tcp_v6_send_response()") replaced the ctl_sk->sk_mark with passing the mark field directly as a function argument. That commit predates the commit that introduced the bug. Fixes: `c6af0c227a` ("ip: support SO_MARK cmsg") Signed-off-by: Willem de Bruijn <willemb@google.com> Reported-by: Martin KaFai Lau <kafai@fb.com> Reviewed-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-01 17:38:30 -07:00
Eric Dumazet	e114e1e8ac	tcp: md5: do not send silly options in SYNCOOKIES Whenever cookie_init_timestamp() has been used to encode ECN,SACK,WSCALE options, we can not remove the TS option in the SYNACK. Otherwise, tcp_synack_options() will still advertize options like WSCALE that we can not deduce later when receiving the packet from the client to complete 3WHS. Note that modern linux TCP stacks wont use MD5+TS+SACK in a SYN packet, but we can not know for sure that all TCP stacks have the same logic. Before the fix a tcpdump would exhibit this wrong exchange : 10:12:15.464591 IP C > S: Flags [S], seq 4202415601, win 65535, options [nop,nop,md5 valid,mss 1400,sackOK,TS val 456965269 ecr 0,nop,wscale 8], length 0 10:12:15.464602 IP S > C: Flags [S.], seq 253516766, ack 4202415602, win 65535, options [nop,nop,md5 valid,mss 1400,nop,nop,sackOK,nop,wscale 8], length 0 10:12:15.464611 IP C > S: Flags [.], ack 1, win 256, options [nop,nop,md5 valid], length 0 10:12:15.464678 IP C > S: Flags [P.], seq 1:13, ack 1, win 256, options [nop,nop,md5 valid], length 12 10:12:15.464685 IP S > C: Flags [.], ack 13, win 65535, options [nop,nop,md5 valid], length 0 After this patch the exchange looks saner : 11:59:59.882990 IP C > S: Flags [S], seq 517075944, win 65535, options [nop,nop,md5 valid,mss 1400,sackOK,TS val 1751508483 ecr 0,nop,wscale 8], length 0 11:59:59.883002 IP S > C: Flags [S.], seq 1902939253, ack 517075945, win 65535, options [nop,nop,md5 valid,mss 1400,sackOK,TS val 1751508479 ecr 1751508483,nop,wscale 8], length 0 11:59:59.883012 IP C > S: Flags [.], ack 1, win 256, options [nop,nop,md5 valid,nop,nop,TS val 1751508483 ecr 1751508479], length 0 11:59:59.883114 IP C > S: Flags [P.], seq 1:13, ack 1, win 256, options [nop,nop,md5 valid,nop,nop,TS val 1751508483 ecr 1751508479], length 12 11:59:59.883122 IP S > C: Flags [.], ack 13, win 256, options [nop,nop,md5 valid,nop,nop,TS val 1751508483 ecr 1751508483], length 0 11:59:59.883152 IP S > C: Flags [P.], seq 1:13, ack 13, win 256, options [nop,nop,md5 valid,nop,nop,TS val 1751508484 ecr 1751508483], length 12 11:59:59.883170 IP C > S: Flags [.], ack 13, win 256, options [nop,nop,md5 valid,nop,nop,TS val 1751508484 ecr 1751508484], length 0 Of course, no SACK block will ever be added later, but nothing should break. Technically, we could remove the 4 nops included in MD5+TS options, but again some stacks could break seeing not conventional alignment. Fixes: `4957faade1` ("TCPCT part 1g: Responder Cookie => Initiator") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Florian Westphal <fw@strlen.de> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-01 17:36:23 -07:00
Rao Shoaib	9ef845f894	rds: If one path needs re-connection, check all and re-connect In testing with mprds enabled, Oracle Cluster nodes after reboot were not able to communicate with others nodes and so failed to rejoin the cluster. Peers with lower IP address initiated connection but the node could not respond as it choose a different path and could not initiate a connection as it had a higher IP address. With this patch, when a node sends out a packet and the selected path is down, all other paths are also checked and any down paths are re-connected. Reviewed-by: Ka-cheong Poon <ka-cheong.poon@oracle.com> Reviewed-by: David Edmondson <david.edmondson@oracle.com> Signed-off-by: Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com> Signed-off-by: Rao Shoaib <rao.shoaib@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-01 17:35:17 -07:00
Eric Dumazet	e6ced831ef	tcp: md5: refine tcp_md5_do_add()/tcp_md5_hash_key() barriers My prior fix went a bit too far, according to Herbert and Mathieu. Since we accept that concurrent TCP MD5 lookups might see inconsistent keys, we can use READ_ONCE()/WRITE_ONCE() instead of smp_rmb()/smp_wmb() Clearing all key->key[] is needed to avoid possible KMSAN reports, if key->keylen is increased. Since tcp_md5_do_add() is not fast path, using __GFP_ZERO to clear all struct tcp_md5sig_key is simpler. data_race() was added in linux-5.8 and will prevent KCSAN reports, this can safely be removed in stable backports, if data_race() is not yet backported. v2: use data_race() both in tcp_md5_hash_key() and tcp_md5_do_add() Fixes: `6a2febec33` ("tcp: md5: add missing memory barriers in tcp_md5_do_add()/tcp_md5_hash_key()") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Marco Elver <elver@google.com> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-01 17:29:45 -07:00
Sean Tranchetti	1e82a62fec	genetlink: remove genl_bind A potential deadlock can occur during registering or unregistering a new generic netlink family between the main nl_table_lock and the cb_lock where each thread wants the lock held by the other, as demonstrated below. 1) Thread 1 is performing a netlink_bind() operation on a socket. As part of this call, it will call netlink_lock_table(), incrementing the nl_table_users count to 1. 2) Thread 2 is registering (or unregistering) a genl_family via the genl_(un)register_family() API. The cb_lock semaphore will be taken for writing. 3) Thread 1 will call genl_bind() as part of the bind operation to handle subscribing to GENL multicast groups at the request of the user. It will attempt to take the cb_lock semaphore for reading, but it will fail and be scheduled away, waiting for Thread 2 to finish the write. 4) Thread 2 will call netlink_table_grab() during the (un)registration call. However, as Thread 1 has incremented nl_table_users, it will not be able to proceed, and both threads will be stuck waiting for the other. genl_bind() is a noop, unless a genl_family implements the mcast_bind() function to handle setting up family-specific multicast operations. Since no one in-tree uses this functionality as Cong pointed out, simply removing the genl_bind() function will remove the possibility for deadlock, as there is no attempt by Thread 1 above to take the cb_lock semaphore. Fixes: `c380d9a7af` ("genetlink: pass multicast bind/unbind to families") Suggested-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Johannes Berg <johannes.berg@intel.com> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Sean Tranchetti <stranche@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-01 15:49:11 -07:00
Luo bin	d3c54f7f18	hinic: fix passing non negative value to ERR_PTR get_dev_cap and set_resources_state functions may return a positive value because of hardware failure, and the positive return value can not be passed to ERR_PTR directly. Fixes: `7dd29ee128` ("hinic: add sriov feature support") Signed-off-by: Luo bin <luobin9@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-07-01 12:14:04 -07:00
Dan Carpenter	8ff41cc217	net: qrtr: Fix an out of bounds read qrtr_endpoint_post() This code assumes that the user passed in enough data for a qrtr_hdr_v1 or qrtr_hdr_v2 struct, but it's not necessarily true. If the buffer is too small then it will read beyond the end. Reported-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Reported-by: syzbot+b8fe393f999a291a9ea6@syzkaller.appspotmail.com Fixes: `194ccc8829` ("net: qrtr: Support decoding incoming v2 packets") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-30 18:36:13 -07:00
Eric Dumazet	6a2febec33	tcp: md5: add missing memory barriers in tcp_md5_do_add()/tcp_md5_hash_key() MD5 keys are read with RCU protection, and tcp_md5_do_add() might update in-place a prior key. Normally, typical RCU updates would allocate a new piece of memory. In this case only key->key and key->keylen might be updated, and we do not care if an incoming packet could see the old key, the new one, or some intermediate value, since changing the key on a live flow is known to be problematic anyway. We only want to make sure that in the case key->keylen is changed, cpus in tcp_md5_hash_key() wont try to use uninitialized data, or crash because key->keylen was read twice to feed sg_init_one() and ahash_request_set_crypt() Fixes: `9ea88a1530` ("tcp: md5: check md5 signature without socket lock") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-30 18:14:38 -07:00
Carl Huang	28541f3d32	net: qrtr: free flow in __qrtr_node_release The flow is allocated in qrtr_tx_wait, but not freed when qrtr node is released. (slot) becomes NULL after radix_tree_iter_delete is called in __qrtr_node_release. The fix is to save (slot) to a vairable and then free it. This memory leak is catched when kmemleak is enabled in kernel, the report looks like below: unreferenced object 0xffffa0de69e08420 (size 32): comm "kworker/u16:3", pid 176, jiffies 4294918275 (age 82858.876s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 28 84 e0 69 de a0 ff ff ........(..i.... 28 84 e0 69 de a0 ff ff 03 00 00 00 00 00 00 00 (..i............ backtrace: [<00000000e252af0a>] qrtr_node_enqueue+0x38e/0x400 [qrtr] [<000000009cea437f>] qrtr_sendmsg+0x1e0/0x2a0 [qrtr] [<000000008bddbba4>] sock_sendmsg+0x5b/0x60 [<0000000003beb43a>] qmi_send_message.isra.3+0xbe/0x110 [qmi_helpers] [<000000009c9ae7de>] qmi_send_request+0x1c/0x20 [qmi_helpers] Signed-off-by: Carl Huang <cjhuang@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-30 16:25:04 -07:00
Li Heng	8a259e6b73	net: cxgb4: fix return error value in t4_prep_fw t4_prep_fw goto bye tag with positive return value when something bad happened and which can not free resource in adap_init0. so fix it to return negative value. Fixes: `16e47624e7` ("cxgb4: Add new scheme to update T4/T5 firmware") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Li Heng <liheng40@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-30 15:53:25 -07:00
David S. Miller	e708e2bd55	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2020-06-30 The following pull-request contains BPF updates for your net tree. We've added 28 non-merge commits during the last 9 day(s) which contain a total of 35 files changed, 486 insertions(+), 232 deletions(-). The main changes are: 1) Fix an incorrect verifier branch elimination for PTR_TO_BTF_ID pointer types, from Yonghong Song. 2) Fix UAPI for sockmap and flow_dissector progs that were ignoring various arguments passed to BPF_PROG_{ATTACH,DETACH}, from Lorenz Bauer & Jakub Sitnicki. 3) Fix broken AF_XDP DMA hacks that are poking into dma-direct and swiotlb internals and integrate it properly into DMA core, from Christoph Hellwig. 4) Fix RCU splat from recent changes to avoid skipping ingress policy when kTLS is enabled, from John Fastabend. 5) Fix BPF ringbuf map to enforce size to be the power of 2 in order for its position masking to work, from Andrii Nakryiko. 6) Fix regression from CAP_BPF work to re-allow CAP_SYS_ADMIN for loading of network programs, from Maciej Żenczykowski. 7) Fix libbpf section name prefix for devmap progs, from Jesper Dangaard Brouer. 8) Fix formatting in UAPI documentation for BPF helpers, from Quentin Monnet. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-30 14:20:45 -07:00
Yonghong Song	d923021c2c	bpf: Add tests for PTR_TO_BTF_ID vs. null comparison Add two tests for PTR_TO_BTF_ID vs. null ptr comparison, one for PTR_TO_BTF_ID in the ctx structure and the other for PTR_TO_BTF_ID after one level pointer chasing. In both cases, the test ensures condition is not removed. For example, for this test struct bpf_fentry_test_t { struct bpf_fentry_test_t a; }; int BPF_PROG(test7, struct bpf_fentry_test_t arg) { if (arg == 0) test7_result = 1; return 0; } Before the previous verifier change, we have xlated codes: int test7(long long unsigned int * ctx): ; int BPF_PROG(test7, struct bpf_fentry_test_t arg) 0: (79) r1 = (u64 )(r1 +0) ; int BPF_PROG(test7, struct bpf_fentry_test_t arg) 1: (b4) w0 = 0 2: (95) exit After the previous verifier change, we have: int test7(long long unsigned int * ctx): ; int BPF_PROG(test7, struct bpf_fentry_test_t arg) 0: (79) r1 = (u64 )(r1 +0) ; if (arg == 0) 1: (55) if r1 != 0x0 goto pc+4 ; test7_result = 1; 2: (18) r1 = map[id:6][0]+48 4: (b7) r2 = 1 5: (7b) (u64 )(r1 +0) = r2 ; int BPF_PROG(test7, struct bpf_fentry_test_t arg) 6: (b4) w0 = 0 7: (95) exit Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200630171241.2523875-1-yhs@fb.com	2020-06-30 22:21:29 +02:00
Yonghong Song	01c66c48d4	bpf: Fix an incorrect branch elimination by verifier Wenbo reported an issue in [1] where a checking of null pointer is evaluated as always false. In this particular case, the program type is tp_btf and the pointer to compare is a PTR_TO_BTF_ID. The current verifier considers PTR_TO_BTF_ID always reprents a non-null pointer, hence all PTR_TO_BTF_ID compares to 0 will be evaluated as always not-equal, which resulted in the branch elimination. For example, struct bpf_fentry_test_t { struct bpf_fentry_test_t a; }; int BPF_PROG(test7, struct bpf_fentry_test_t arg) { if (arg == 0) test7_result = 1; return 0; } int BPF_PROG(test8, struct bpf_fentry_test_t *arg) { if (arg->a == 0) test8_result = 1; return 0; } In above bpf programs, both branch arg == 0 and arg->a == 0 are removed. This may not be what developer expected. The bug is introduced by Commit `cac616db39` ("bpf: Verifier track null pointer branch_taken with JNE and JEQ"), where PTR_TO_BTF_ID is considered to be non-null when evaluting pointer vs. scalar comparison. This may be added considering we have PTR_TO_BTF_ID_OR_NULL in the verifier as well. PTR_TO_BTF_ID_OR_NULL is added to explicitly requires a non-NULL testing in selective cases. The current generic pointer tracing framework in verifier always assigns PTR_TO_BTF_ID so users does not need to check NULL pointer at every pointer level like a->b->c->d. We may not want to assign every PTR_TO_BTF_ID as PTR_TO_BTF_ID_OR_NULL as this will require a null test before pointer dereference which may cause inconvenience for developers. But we could avoid branch elimination to preserve original code intention. This patch simply removed PTR_TO_BTD_ID from reg_type_not_null() in verifier, which prevented the above branches from being eliminated. [1]: https://lore.kernel.org/bpf/79dbb7c0-449d-83eb-5f4f-7af0cc269168@fb.com/T/ Fixes: `cac616db39` ("bpf: Verifier track null pointer branch_taken with JNE and JEQ") Reported-by: Wenbo Zhang <ethercflow@gmail.com> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200630171240.2523722-1-yhs@fb.com	2020-06-30 22:21:05 +02:00
David S. Miller	0433c93dff	Merge branch 'net-ipa-three-bug-fixes' Alex Elder says: ==================== net: ipa: three bug fixes This series contains three bug fixes for the Qualcomm IPA driver. In practice these bugs are unlikke.y to be harmful, but they do represent incorrect code. Version 2 adds "Fixes" tags to two of the patches and fixes a typo in one (found by checkpatch.pl). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-30 13:10:57 -07:00
Alex Elder	6cb63ea6a3	net: ipa: introduce ipa_cmd_tag_process() Create a new function ipa_cmd_tag_process() that simply allocates a transaction, adds a tag process command to it to clear the hardware pipeline, and commits the transaction. Call it in from ipa_endpoint_suspend(), after suspending the modem endpoints but before suspending the AP command TX and AP LAN RX endpoints (which are used by the tag sequence). Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-30 13:10:57 -07:00
Alex Elder	41af5436e8	net: ipa: no checksum offload for SDM845 LAN RX The AP LAN RX endpoint should not have download checksum offload enabled. The receive handler does properly accommodate the trailer that's added by the hardware, but we ignore it. Fixes: `1ed7d0c0fd` ("soc: qcom: ipa: configuration data") Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-30 13:10:57 -07:00
Alex Elder	5468cbcddf	net: ipa: always check for stopped channel In gsi_channel_stop(), there's a check to see if the channel might have entered STOPPED state since a previous call, which might have timed out before stopping completed. That check actually belongs in gsi_channel_stop_command(), which is called repeatedly by gsi_channel_stop() for RX channels. Fixes: `650d160382` ("soc: qcom: ipa: the generic software interface") Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-30 13:10:57 -07:00
Russell King	f2ca673d2c	net: mvneta: fix use of state->speed When support for short preambles was added, it incorrectly keyed its decision off state->speed instead of state->interface. state->speed is not guaranteed to be correct for in-band modes, which can lead to short preambles being unexpectedly disabled. Fix this by keying off the interface mode, which is the only way that mvneta can operate at 2.5Gbps. Fixes: `da58a931f2` ("net: mvneta: Add support for 2500Mbps SGMII") Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-30 13:01:12 -07:00
David S. Miller	b9fcf0a0d8	Merge branch 'support-AF_PACKET-for-layer-3-devices' Jason A. Donenfeld says: ==================== support AF_PACKET for layer 3 devices Hans reported that packets injected by a correct-looking and trivial libpcap-based program were not being accepted by wireguard. In investigating that, I noticed that a few devices weren't properly handling AF_PACKET-injected packets, and so this series introduces a bit of shared infrastructure to support that. The basic problem begins with socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL)) sockets. When sendto is called, AF_PACKET examines the headers of the packet with this logic: static void packet_parse_headers(struct sk_buff skb, struct socket sock) { if ((!skb->protocol \|\| skb->protocol == htons(ETH_P_ALL)) && sock->type == SOCK_RAW) { skb_reset_mac_header(skb); skb->protocol = dev_parse_header_protocol(skb); } skb_probe_transport_header(skb); } The middle condition there triggers, and we jump to dev_parse_header_protocol. Note that this is the only caller of dev_parse_header_protocol in the kernel, and I assume it was designed for this purpose: static inline __be16 dev_parse_header_protocol(const struct sk_buff skb) { const struct net_device dev = skb->dev; if (!dev->header_ops \|\| !dev->header_ops->parse_protocol) return 0; return dev->header_ops->parse_protocol(skb); } Since AF_PACKET already knows which netdev the packet is going to, the dev_parse_header_protocol function can see if that netdev has a way it prefers to figure out the protocol from the header. This, again, is the only use of parse_protocol in the kernel. At the moment, it's only used with ethernet devices, via eth_header_parse_protocol. This makes sense, as mostly people are used to AF_PACKET-injecting ethernet frames rather than layer 3 frames. But with nothing in place for layer 3 netdevs, this function winds up returning 0, and skb->protocol then is set to 0, and then by the time it hits the netdev's ndo_start_xmit, the driver doesn't know what to do with it. This is a problem because drivers very much rely on skb->protocol being correct, and routinely reject packets where it's incorrect. That's why having this parsing happen for injected packets is quite important. In wireguard, ipip, and ipip6, for example, packets from AF_PACKET are just dropped entirely. For tun devices, it's sort of uglier, with the tun "packet information" header being passed to userspace containing a bogus protocol value. Some userspace programs are ill-equipped to deal with that. (But of course, that doesn't happen with tap devices, which benefit from the similar shared infrastructure for layer 2 netdevs, further motiviating this patchset for layer 3 netdevs.) This patchset addresses the issue by first adding a layer 3 header parse function, much akin to the existing one for layer 2 packets, and then adds a shared header_ops structure that, also much akin to the existing one for layer 2 packets. Then it wires it up to a few immediate places that stuck out as requiring it, and does a bit of cleanup. This patchset seems like it's fixing real bugs, so it might be appropriate for stable. But they're also very old bugs, so if you'd rather not backport to stable, that'd make sense to me too. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-30 12:29:39 -07:00
Jason A. Donenfeld	8f9a1fa430	net: xfrmi: implement header_ops->parse_protocol for AF_PACKET The xfrm interface uses skb->protocol to determine packet type, and bails out if it's not set. For AF_PACKET injection, we need to support its call chain of: packet_sendmsg -> packet_snd -> packet_parse_headers -> dev_parse_header_protocol -> parse_protocol Without a valid parse_protocol, this returns zero, and xfrmi rejects the skb. So, this wires up the ip_tunnel handler for layer 3 packets for that case. Reported-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-30 12:29:39 -07:00
Jason A. Donenfeld	75ea1f4773	net: sit: implement header_ops->parse_protocol for AF_PACKET Sit uses skb->protocol to determine packet type, and bails out if it's not set. For AF_PACKET injection, we need to support its call chain of: packet_sendmsg -> packet_snd -> packet_parse_headers -> dev_parse_header_protocol -> parse_protocol Without a valid parse_protocol, this returns zero, and sit rejects the skb. So, this wires up the ip_tunnel handler for layer 3 packets for that case. Reported-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-30 12:29:39 -07:00
Jason A. Donenfeld	ab59d2b698	net: vti: implement header_ops->parse_protocol for AF_PACKET Vti uses skb->protocol to determine packet type, and bails out if it's not set. For AF_PACKET injection, we need to support its call chain of: packet_sendmsg -> packet_snd -> packet_parse_headers -> dev_parse_header_protocol -> parse_protocol Without a valid parse_protocol, this returns zero, and vti rejects the skb. So, this wires up the ip_tunnel handler for layer 3 packets for that case. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-30 12:29:39 -07:00
Jason A. Donenfeld	b9815eb1d1	tun: implement header_ops->parse_protocol for AF_PACKET The tun driver passes up skb->protocol to userspace in the form of PI headers. For AF_PACKET injection, we need to support its call chain of: packet_sendmsg -> packet_snd -> packet_parse_headers -> dev_parse_header_protocol -> parse_protocol Without a valid parse_protocol, this returns zero, and the tun driver then gives userspace bogus values that it can't deal with. Note that this isn't the case with tap, because tap already benefits from the shared infrastructure for ethernet headers. But with tun, there's nothing. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-30 12:29:39 -07:00
Jason A. Donenfeld	1a574074ae	wireguard: queueing: make use of ip_tunnel_parse_protocol Now that wg_examine_packet_protocol has been added for general consumption as ip_tunnel_parse_protocol, it's possible to remove wg_examine_packet_protocol and simply use the new ip_tunnel_parse_protocol function directly. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-30 12:29:39 -07:00
Jason A. Donenfeld	01a4967c71	wireguard: implement header_ops->parse_protocol for AF_PACKET WireGuard uses skb->protocol to determine packet type, and bails out if it's not set or set to something it's not expecting. For AF_PACKET injection, we need to support its call chain of: packet_sendmsg -> packet_snd -> packet_parse_headers -> dev_parse_header_protocol -> parse_protocol Without a valid parse_protocol, this returns zero, and wireguard then rejects the skb. So, this wires up the ip_tunnel handler for layer 3 packets for that case. Reported-by: Hans Wippel <ndev@hwipl.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-30 12:29:39 -07:00
Jason A. Donenfeld	e53ac93220	net: ipip: implement header_ops->parse_protocol for AF_PACKET Ipip uses skb->protocol to determine packet type, and bails out if it's not set. For AF_PACKET injection, we need to support its call chain of: packet_sendmsg -> packet_snd -> packet_parse_headers -> dev_parse_header_protocol -> parse_protocol Without a valid parse_protocol, this returns zero, and ipip rejects the skb. So, this wires up the ip_tunnel handler for layer 3 packets for that case. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-30 12:29:39 -07:00
Jason A. Donenfeld	2606aff916	net: ip_tunnel: add header_ops for layer 3 devices Some devices that take straight up layer 3 packets benefit from having a shared header_ops so that AF_PACKET sockets can inject packets that are recognized. This shared infrastructure will be used by other drivers that currently can't inject packets using AF_PACKET. It also exposes the parser function, as it is useful in standalone form too. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-30 12:29:39 -07:00
Jakub Sitnicki	2576f87066	bpf, netns: Fix use-after-free in pernet pre_exit callback Iterating over BPF links attached to network namespace in pre_exit hook is not safe, even if there is just one. Once link gets auto-detached, that is its back-pointer to net object is set to NULL, the link can be released and freed without waiting on netns_bpf_mutex, effectively causing the list element we are operating on to be freed. This leads to use-after-free when trying to access the next element on the list, as reported by KASAN. Bug can be triggered by destroying a network namespace, while also releasing a link attached to this network namespace. \| ================================================================== \| BUG: KASAN: use-after-free in netns_bpf_pernet_pre_exit+0xd9/0x130 \| Read of size 8 at addr ffff888119e0d778 by task kworker/u8:2/177 \| \| CPU: 3 PID: 177 Comm: kworker/u8:2 Not tainted 5.8.0-rc1-00197-ga0c04c9d1008-dirty #776 \| Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014 \| Workqueue: netns cleanup_net \| Call Trace: \| dump_stack+0x9e/0xe0 \| print_address_description.constprop.0+0x3a/0x60 \| ? netns_bpf_pernet_pre_exit+0xd9/0x130 \| kasan_report.cold+0x1f/0x40 \| ? netns_bpf_pernet_pre_exit+0xd9/0x130 \| netns_bpf_pernet_pre_exit+0xd9/0x130 \| cleanup_net+0x30b/0x5b0 \| ? unregister_pernet_device+0x50/0x50 \| ? rcu_read_lock_bh_held+0xb0/0xb0 \| ? _raw_spin_unlock_irq+0x24/0x50 \| process_one_work+0x4d1/0xa10 \| ? lock_release+0x3e0/0x3e0 \| ? pwq_dec_nr_in_flight+0x110/0x110 \| ? rwlock_bug.part.0+0x60/0x60 \| worker_thread+0x7a/0x5c0 \| ? process_one_work+0xa10/0xa10 \| kthread+0x1e3/0x240 \| ? kthread_create_on_node+0xd0/0xd0 \| ret_from_fork+0x1f/0x30 \| \| Allocated by task 280: \| save_stack+0x1b/0x40 \| __kasan_kmalloc.constprop.0+0xc2/0xd0 \| netns_bpf_link_create+0xfe/0x650 \| __do_sys_bpf+0x153a/0x2a50 \| do_syscall_64+0x59/0x300 \| entry_SYSCALL_64_after_hwframe+0x44/0xa9 \| \| Freed by task 198: \| save_stack+0x1b/0x40 \| __kasan_slab_free+0x12f/0x180 \| kfree+0xed/0x350 \| process_one_work+0x4d1/0xa10 \| worker_thread+0x7a/0x5c0 \| kthread+0x1e3/0x240 \| ret_from_fork+0x1f/0x30 \| \| The buggy address belongs to the object at ffff888119e0d700 \| which belongs to the cache kmalloc-192 of size 192 \| The buggy address is located 120 bytes inside of \| 192-byte region [ffff888119e0d700, ffff888119e0d7c0) \| The buggy address belongs to the page: \| page:ffffea0004678340 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 \| flags: 0x2fffe0000000200(slab) \| raw: 02fffe0000000200 ffffea00045ba8c0 0000000600000006 ffff88811a80ea80 \| raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000 \| page dumped because: kasan: bad access detected \| \| Memory state around the buggy address: \| ffff888119e0d600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb \| ffff888119e0d680: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc \| >ffff888119e0d700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb \| ^ \| ffff888119e0d780: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc \| ffff888119e0d800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb \| ================================================================== Remove the "fast-path" for releasing a link that got auto-detached by a dying network namespace to fix it. This way as long as link is on the list and netns_bpf mutex is held, we have a guarantee that link memory can be accessed. An alternative way to fix this issue would be to safely iterate over the list of links and ensure there is no access to link object after detaching it. But, at the moment, optimizing synchronization overhead on link release without a workload in mind seems like an overkill. Fixes: `ab53cad90e` ("bpf, netns: Keep a list of attached bpf_link's") Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200630164541.1329993-1-jakub@cloudflare.com	2020-06-30 10:53:42 -07:00

1 2 3 4 5 ...

933446 Commits All Branches Search

933446 Commits

All Branches