linux

Commit Graph

Author	SHA1	Message	Date
Florian Westphal	e0c7d47221	netfilter: conntrack: check netns when comparing conntrack objects Once we place all conntracks in the same hash table we must also compare the netns pointer to skip conntracks that belong to a different namespace. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-05-05 16:39:46 +02:00
Florian Westphal	868043485e	netfilter: conntrack: use nf_ct_key_equal() in more places This prepares for upcoming change that places all conntracks into a single, global table. For this to work we will need to also compare net pointer during lookup. To avoid open-coding such check use the nf_ct_key_equal helper and then later extend it to also consider net_eq. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-05-05 16:39:44 +02:00
Florian Westphal	88b68bc523	netfilter: conntrack: don't attempt to iterate over empty table Once we place all conntracks into same table iteration becomes more costly because the table contains conntracks that we are not interested in (belonging to other netns). So don't bother scanning if the current namespace has no entries. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-05-05 16:39:44 +02:00
Florian Westphal	5e3c61f981	netfilter: conntrack: fix lookup race during hash resize When resizing the conntrack hash table at runtime via echo 42 > /sys/module/nf_conntrack/parameters/hashsize, we are racing with the conntrack lookup path -- reads can happen in parallel and nothing prevents readers from observing a the newly allocated hash but the old size (or vice versa). So access to hash[bucket] can trigger OOB read access in case the table got expanded and we saw the new size but the old hash pointer (or it got shrunk and we got new hash ptr but the size of the old and larger table): kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] SMP KASAN CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted 4.6.0-rc2+ #107 [..] Call Trace: [<ffffffff822c3d6a>] ? nf_conntrack_tuple_taken+0x12a/0xe90 [<ffffffff822c3ac1>] ? nf_ct_invert_tuplepr+0x221/0x3a0 [<ffffffff8230e703>] get_unique_tuple+0xfb3/0x2760 Use generation counter to obtain the address/length of the same table. Also add a synchronize_net before freeing the old hash. AFAICS, without it we might access ct_hash[bucket] after ct_hash has been freed, provided that lockless reader got delayed by another event: CPU1 CPU2 seq_begin seq_retry <delay> resize occurs free oldhash for_each(oldhash[size]) Note that resize is only supported in init_netns, it took over 2 minutes of constant resizing+flooding to produce the warning, so this isn't a big problem in practice. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-05-05 16:39:43 +02:00
Florian Westphal	2cf1234807	netfilter: conntrack: keep BH enabled during lookup No need to disable BH here anymore: stats are switched to _ATOMIC variant (== this_cpu_inc()), which nowadays generates same code as the non _ATOMIC NF_STAT, at least on x86. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-05-05 16:39:43 +02:00
Florian Westphal	1ad8f48df6	netfilter: nftables: add connlabel set support Conntrack labels are currently sized depending on the iptables ruleset, i.e. if we're asked to test or set bits 1, 2, and 65 then we would allocate enough room to store at least bit 65. However, with nft, the input is just a register with arbitrary runtime content. We therefore ask for the upper ceiling we currently have, which is enough room to store 128 bits. Alternatively, we could alter nf_connlabel_replace to increase net->ct.label_words at run time, but since 128 bits is not that big we'd only save sizeof(long) so it doesn't seem worth it for now. This follows a similar approach that xtables 'connlabel' match uses, so when user inputs ct label set bar then we will set the bit used by the 'bar' label and leave the rest alone. This is done by passing the sreg content to nf_connlabels_replace as both value and mask argument. Labels (bits) already set thus cannot be re-set to zero, but this is not supported by xtables connlabel match either. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-05-05 16:27:59 +02:00
Liping Zhang	cec5913c15	netfilter: IDLETIMER: fix race condition when destroy the target Workqueue maybe still in running while we destroy the IDLETIMER target, thus cause a use after free error, add cancel_work_sync() to avoid such situation. Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-04-29 14:28:48 +02:00
Florian Westphal	70d72b7e06	netfilter: conntrack: init all_locks to avoid debug warning Else we get 'BUG: spinlock bad magic on CPU#' on resize when spin lock debugging is enabled. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-04-29 11:27:10 +02:00
Nicolas Dichtel	cbdeafd7e1	netfilter/ipvs: use nla_put_u64_64bit() Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-25 15:09:11 -04:00
Pablo Neira Ayuso	6cd54fc60c	Merge tag 'ipvs-for-v4.7' of https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next Simon Horman says: ==================== IPVS Updates for v4.7 please consider these enhancements to the IPVS. They allow SIP connections originating from real-servers to be load balanced by the SIP psersitence engine as is already implemented in the other direction. And for better one packet scheduling (OPS) performance. ==================== Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-04-25 15:35:53 +02:00
Pablo Neira Ayuso	3bb398d925	netfilter: nf_ct_helper: disable automatic helper assignment Four years ago we introduced a new sysctl knob to disable automatic helper assignment in 72110dfaa907 ("netfilter: nf_ct_helper: disable automatic helper assignment"). This knob kept this behaviour enabled by default to remain conservative. This measure was introduced to provide a secure way to configure iptables and connection tracking helpers through explicit rules. Give the time we have waited for this, let's turn off this by default now, worse case users still have a chance to recover the former behaviour by explicitly enabling this back through sysctl. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-04-25 15:34:30 +02:00
Pablo Neira Ayuso	e701001e7c	netfilter: nft_rbtree: allow adjacent intervals with dynamic updates This patch fixes dynamic element updates for adjacent intervals in the rb-tree representation. Since elements are sorted in the rb-tree, in case of adjacent nodes with the same key, the assumption is that an interval end node must be placed before an interval opening. In tree lookup operations, the idea is to search for the closer element that is smaller than the one we're searching for. Given that we'll have two possible matchings, we have to take the opening interval in case of adjacent nodes. Range merges are not trivial with the current representation, specifically we have to check if node extensions are equal and make sure we keep the existing internal states around. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-04-25 15:32:41 +02:00
Pablo Neira Ayuso	ef1d20e0f8	netfilter: nft_rbtree: introduce nft_rbtree_interval_end() helper Add this new nft_rbtree_interval_end() helper function to check in the end interval is set. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-04-25 14:52:12 +02:00
Pablo Neira Ayuso	3971ca1435	netfilter: nf_tables: parse element flags from nft_del_setelem() Parse flags and pass them to the set via ->deactivate() to check if we remove the right element from the intervals. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-04-25 14:52:12 +02:00
Pablo Neira Ayuso	0e9091d686	netfilter: nf_tables: introduce nft_setelem_parse_flags() helper This function parses the set element flags, thus, we can reuse the same handling when deleting elements. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-04-25 14:52:12 +02:00
Florian Westphal	141658fb02	netfilter: conntrack: use get_random_once for conntrack hash seed As earlier commit removed accessed to the hash from other files we can also make it static. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-04-25 14:52:12 +02:00
Florian Westphal	7001c6d109	netfilter: conntrack: use get_random_once for nat and expectations Use a private seed and init it using get_random_once. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-04-25 14:52:12 +02:00
Florian Westphal	a3efd81205	netfilter: conntrack: move generation seqcnt out of netns_ct We only allow rehash in init namespace, so we only use init_ns.generation. And even if we would allow it, it makes no sense as the conntrack locks are global; any ongoing rehash prevents insert/ delete. So make this private to nf_conntrack_core instead. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-04-25 14:52:11 +02:00
David S. Miller	11afbff861	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for your net-next tree, mostly from Florian Westphal to sort out the lack of sufficient validation in x_tables and connlabel preparation patches to add nf_tables support. They are: 1) Ensure we don't go over the ruleset blob boundaries in mark_source_chains(). 2) Validate that target jumps land on an existing xt_entry. This extra sanitization comes with a performance penalty when loading the ruleset. 3) Introduce xt_check_entry_offsets() and use it from {arp,ip,ip6}tables. 4) Get rid of the smallish check_entry() functions in {arp,ip,ip6}tables. 5) Make sure the minimal possible target size in x_tables. 6) Similar to #3, add xt_compat_check_entry_offsets() for compat code. 7) Check that standard target size is valid. 8) More sanitization to ensure that the target_offset field is correct. 9) Add xt_check_entry_match() to validate that matches are well-formed. 10-12) Three patch to reduce the number of parameters in translate_compat_table() for {arp,ip,ip6}tables by using a container structure. 13) No need to return value from xt_compat_match_from_user(), so make it void. 14) Consolidate translate_table() so it can be used by compat code too. 15) Remove obsolete check for compat code, so we keep consistent with what was already removed in the native layout code (back in 2007). 16) Get rid of target jump validation from mark_source_chains(), obsoleted by #2. 17) Introduce xt_copy_counters_from_user() to consolidate counter copying, and use it from {arp,ip,ip6}tables. 18,22) Get rid of unnecessary explicit inlining in ctnetlink for dump functions. 19) Move nf_connlabel_match() to xt_connlabel. 20) Skip event notification if connlabel did not change. 21) Update of nf_connlabels_get() to make the upcoming nft connlabel support easier. 23) Remove spinlock to read protocol state field in conntrack. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-24 00:12:08 -04:00
Nicolas Dichtel	b46f6ded90	libnl: nla_put_be64(): align on a 64-bit area nla_data() is now aligned on a 64-bit area. A temporary version (nla_put_be64_32bit()) is added for nla_put_net64(). This function is removed in the next patch. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-23 20:13:24 -04:00
David S. Miller	1602f49b58	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts were two cases of simple overlapping changes, nothing serious. In the UDP case, we need to add a hlist_add_tail_rcu() to linux/rculist.h, because we've moved UDP socket handling away from using nulls lists. Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-23 18:51:33 -04:00
Marco Angaroni	8fb04d9fc7	ipvs: don't alter conntrack in OPS mode When using OPS mode in conjunction with SIP persistent-engine, packets originating from the same ip-address/port could be balanced to different real servers, and (to properly handle SIP responses) OPS connections are created in the in-out direction too, where ip_vs_update_conntrack() is called to modify the reply tuple. As a result, there can be collision of conntrack tuples, causing random packet drops, as explained below: conntrack1: orig=CIP->VIP, reply=RIP1->CIP conntrack2: orig=RIP2->CIP, reply=CIP->VIP Tuple CIP->VIP is both in orig of conntrack1 and reply of conntrack2. The collision triggers packet drop inside nf_conntrack processing. In addition, the current implementation deletes the conntrack object at every expire of an OPS connection (once every forwarded packet), to have it recreated from scratch at next packet traversing IPVS. Since in OPS mode, by definition, we don't expect any associated response, the choices implemented in this patch are: a) don't call nf_conntrack_alter_reply() for OPS connections inside ip_vs_update_conntrack(). b) don't delete the conntrack object at OPS connection expire. The result is that created conntrack objects for each tuple CIP->VIP, RIP-N->CIP, etc. are left in UNREPLIED state and not modified by IPVS OPS connection management. This eliminates packet drops and leaves a single conntrack object for each tuple packets are sent from. Signed-off-by: Marco Angaroni <marcoangaroni@gmail.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2016-04-20 12:34:17 +10:00
Marco Angaroni	013b042465	ipvs: optimize release of connections in OPS mode One-packet-scheduling is the most expensive mode in IPVS from performance point of view: for each packet to be processed a new connection data structure is created and, after packet is sent, deleted by starting a new timer set to expire immediately. SIP persistent-engine needs OPS mode to have Call-ID based load balancing, so OPS mode performance has negative impact in SIP protocol load balancing. This patch aims to improve performance of OPS mode by means of the following changes in the release mechanism of OPS connections: a) call expire callback ip_vs_conn_expire() directly instead of starting a timer programmed to fire immediately. b) avoid call_rcu() overhead inside expire callback, since OPS connection are not inserted in the hash-table and last just the time to process the packet, hence there is no concurrent access to such data structures. Signed-off-by: Marco Angaroni <marcoangaroni@gmail.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2016-04-20 12:34:17 +10:00
Marco Angaroni	39b9722315	ipvs: handle connections started by real-servers When using LVS-NAT and SIP persistence-egine over UDP, the following limitations are present with current implementation: 1) To actually have load-balancing based on Call-ID header, you need to use one-packet-scheduling mode. But with one-packet-scheduling the connection is deleted just after packet is forwarded, so SIP responses coming from real-servers do not match any connection and SNAT is not applied. 2) If you do not use "-o" option, IPVS behaves as normal UDP load balancer, so different SIP calls (each one identified by a different Call-ID) coming from the same ip-address/port go to the same real-server. So basically you don’t have load-balancing based on Call-ID as intended. 3) Call-ID is not learned when a new SIP call is started by a real-server (inside-to-outside direction), but only in the outside-to-inside direction. This would be a general problem for all SIP servers acting as Back2BackUserAgent. This patch aims to solve problems 1) and 3) while keeping OPS mode mandatory for SIP-UDP, so that 2) is not a problem anymore. The basic mechanism implemented is to make packets, that do not match any existent connection but come from real-servers, create new connections instead of let them pass without any effect. When such packets pass through ip_vs_out(), if their source ip address and source port match a configured real-server, a new connection is automatically created in the same way as it would have happened if the packet had come from outside-to-inside direction. A new connection template is created too if the virtual-service is persistent and there is no matching connection template found. The new connection automatically created, if the service had "-o" option, is an OPS connection that lasts only the time to forward the packet, just like it happens on the ingress side. The main part of this mechanism is implemented inside a persistent-engine specific callback (at the moment only SIP persistent engine exists) and is triggered only for UDP packets, since connection oriented protocols, by using different set of ports (typically ephemeral ports) to open new outgoing connections, should not need this feature. The following requisites are needed for automatic connection creation; if any is missing the packet simply goes the same way as before. a) virtual-service is not fwmark based (this is because fwmark services do not store address and port of the virtual-service, required to build the connection data). b) virtual-service and real-servers must not have been configured with omitted port (this is again to have all data to create the connection). Signed-off-by: Marco Angaroni <marcoangaroni@gmail.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2016-04-20 12:34:17 +10:00
Florian Westphal	a163f2cb39	netfilter: conntrack: don't acquire lock during seq_printf read access doesn't need any lock here. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-04-19 20:26:25 +02:00
Pablo Neira Ayuso	4a96300cec	netfilter: ctnetlink: restore inlining for netlink message size calculation Calm down gcc warnings: net/netfilter/nf_conntrack_netlink.c:529:15: warning: 'ctnetlink_proto_size' defined but not used [-Wunused-function] static size_t ctnetlink_proto_size(const struct nf_conn ct) ^ net/netfilter/nf_conntrack_netlink.c:546:15: warning: 'ctnetlink_acct_size' defined but not used [-Wunused-function] static size_t ctnetlink_acct_size(const struct nf_conn ct) ^ net/netfilter/nf_conntrack_netlink.c:556:12: warning: 'ctnetlink_secctx_size' defined but not used [-Wunused-function] static int ctnetlink_secctx_size(const struct nf_conn ct) ^ net/netfilter/nf_conntrack_netlink.c:572:15: warning: 'ctnetlink_timestamp_size' defined but not used [-Wunused-function] static size_t ctnetlink_timestamp_size(const struct nf_conn ct) ^ So gcc compiles them out when CONFIG_NF_CONNTRACK_EVENTS and CONFIG_NETFILTER_NETLINK_GLUE_CT are not set. Fixes: `4054ff4545` ("netfilter: ctnetlink: remove unnecessary inlining") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Arnd Bergmann <arnd@arndb.de>	2016-04-18 22:14:40 +02:00
Florian Westphal	adff6c6560	netfilter: connlabels: change nf_connlabels_get bit arg to 'highest used' nf_connlabel_set() takes the bit number that we would like to set. nf_connlabels_get() however took the number of bits that we want to support. So e.g. nf_connlabels_get(32) support bits 0 to 31, but not 32. This changes nf_connlabels_get() to take the highest bit that we want to set. Callers then don't have to cope with a potential integer wrap when using nf_connlabels_get(bit + 1) anymore. Current callers are fine, this change is only to make folloup nft ct label set support simpler. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-04-18 20:39:48 +02:00
Florian Westphal	5a8145f7b2	netfilter: labels: don't emit ct event if labels were not changed make the replace function only send a ctnetlink event if the contents of the new set is different. Otherwise 'ct label set ct label \| bar' will cause netlink event storm since we "replace" labels for each packet. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-04-18 20:39:44 +02:00
Florian Westphal	b4ef159927	netfilter: connlabels: move helpers to xt_connlabel Currently labels can only be set either by iptables connlabel match or via ctnetlink. Before adding nftables set support, clean up the clabel core and move helpers that nft will not need after all to the xtables module. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-04-18 20:39:41 +02:00
Alexander Duyck	aed069df09	ip_tunnel_core: iptunnel_handle_offloads returns int and doesn't free skb This patch updates the IP tunnel core function iptunnel_handle_offloads so that we return an int and do not free the skb inside the function. This actually allows us to clean up several paths in several tunnels so that we can free the skb at one point in the path without having to have a secondary path if we are supporting tunnel offloads. In addition it should resolve some double-free issues I have found in the tunnels paths as I believe it is possible for us to end up triggering such an event in the case of fou or gue. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-16 19:09:13 -04:00
Pablo Neira Ayuso	4054ff4545	netfilter: ctnetlink: remove unnecessary inlining Many of these functions are called from control plane path. Move ctnetlink_nlmsg_size() under CONFIG_NF_CONNTRACK_EVENTS to avoid a compilation warning when CONFIG_NF_CONNTRACK_EVENTS=n. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-04-14 00:39:34 +02:00
Florian Westphal	d7591f0c41	netfilter: x_tables: introduce and use xt_copy_counters_from_user The three variants use same copy&pasted code, condense this into a helper and use that. Make sure info.name is 0-terminated. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-04-14 00:30:41 +02:00
Florian Westphal	09d9686047	netfilter: x_tables: do compat validation via translate_table This looks like refactoring, but its also a bug fix. Problem is that the compat path (32bit iptables, 64bit kernel) lacks a few sanity tests that are done in the normal path. For example, we do not check for underflows and the base chain policies. While its possible to also add such checks to the compat path, its more copy&pastry, for instance we cannot reuse check_underflow() helper as e->target_offset differs in the compat case. Other problem is that it makes auditing for validation errors harder; two places need to be checked and kept in sync. At a high level 32 bit compat works like this: 1- initial pass over blob: validate match/entry offsets, bounds checking lookup all matches and targets do bookkeeping wrt. size delta of 32/64bit structures assign match/target.u.kernel pointer (points at kernel implementation, needed to access ->compatsize etc.) 2- allocate memory according to the total bookkeeping size to contain the translated ruleset 3- second pass over original blob: for each entry, copy the 32bit representation to the newly allocated memory. This also does any special match translations (e.g. adjust 32bit to 64bit longs, etc). 4- check if ruleset is free of loops (chase all jumps) 5-first pass over translated blob: call the checkentry function of all matches and targets. The alternative implemented by this patch is to drop steps 3&4 from the compat process, the translation is changed into an intermediate step rather than a full 1:1 translate_table replacement. In the 2nd pass (step #3), change the 64bit ruleset back to a kernel representation, i.e. put() the kernel pointer and restore ->u.user.name . This gets us a 64bit ruleset that is in the format generated by a 64bit iptables userspace -- we can then use translate_table() to get the 'native' sanity checks. This has two drawbacks: 1. we re-validate all the match and target entry structure sizes even though compat translation is supposed to never generate bogus offsets. 2. we put and then re-lookup each match and target. THe upside is that we get all sanity tests and ruleset validations provided by the normal path and can remove some duplicated compat code. iptables-restore time of autogenerated ruleset with 300k chains of form -A CHAIN0001 -m limit --limit 1/s -j CHAIN0002 -A CHAIN0002 -m limit --limit 1/s -j CHAIN0003 shows no noticeable differences in restore times: old: 0m30.796s new: 0m31.521s 64bit: 0m25.674s Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-04-14 00:30:40 +02:00
Florian Westphal	0188346f21	netfilter: x_tables: xt_compat_match_from_user doesn't need a retval Always returned 0. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-04-14 00:30:40 +02:00
Florian Westphal	13631bfc60	netfilter: x_tables: validate all offsets and sizes in a rule Validate that all matches (if any) add up to the beginning of the target and that each match covers at least the base structure size. The compat path should be able to safely re-use the function as the structures only differ in alignment; added a BUILD_BUG_ON just in case we have an arch that adds padding as well. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-04-14 00:30:38 +02:00
Florian Westphal	ce683e5f9d	netfilter: x_tables: check for bogus target offset We're currently asserting that targetoff + targetsize <= nextoff. Extend it to also check that targetoff is >= sizeof(xt_entry). Since this is generic code, add an argument pointing to the start of the match/target, we can then derive the base structure size from the delta. We also need the e->elems pointer in a followup change to validate matches. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-04-14 00:30:37 +02:00
Florian Westphal	7ed2abddd2	netfilter: x_tables: check standard target size too We have targets and standard targets -- the latter carries a verdict. The ip/ip6tables validation functions will access t->verdict for the standard targets to fetch the jump offset or verdict for chainloop detection, but this happens before the targets get checked/validated. Thus we also need to check for verdict presence here, else t->verdict can point right after a blob. Spotted with UBSAN while testing malformed blobs. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-04-14 00:30:37 +02:00
Florian Westphal	fc1221b3a1	netfilter: x_tables: add compat version of xt_check_entry_offsets 32bit rulesets have different layout and alignment requirements, so once more integrity checks get added to xt_check_entry_offsets it will reject well-formed 32bit rulesets. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-04-14 00:30:36 +02:00
Florian Westphal	a08e4e190b	netfilter: x_tables: assert minimum target size The target size includes the size of the xt_entry_target struct. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-04-14 00:30:36 +02:00
Florian Westphal	7d35812c32	netfilter: x_tables: add and use xt_check_entry_offsets Currently arp/ip and ip6tables each implement a short helper to check that the target offset is large enough to hold one xt_entry_target struct and that t->u.target_size fits within the current rule. Unfortunately these checks are not sufficient. To avoid adding new tests to all of ip/ip6/arptables move the current checks into a helper, then extend this helper in followup patches. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-04-14 00:30:35 +02:00
David S. Miller	da0caadf0a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains the first batch of Netfilter updates for your net-next tree. 1) Define pr_fmt() in nf_conntrack, from Weongyo Jeong. 2) Define and register netfilter's afinfo for the bridge family, this comes in preparation for native nfqueue's bridge for nft, from Stephane Bryant. 3) Add new attributes to store layer 2 and VLAN headers to nfqueue, also from Stephane Bryant. 4) Parse new NFQA_VLAN and NFQA_L2HDR nfqueue netlink attributes coming from userspace, from Stephane Bryant. 5) Use net->ipv6.devconf_all->hop_limit instead of hardcoded hop_limit in IPv6 SYNPROXY, from Liping Zhang. 6) Remove unnecessary check for dst == NULL in nf_reject_ipv6, from Haishuang Yan. 7) Deinline ctnetlink event report functions, from Florian Westphal. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-12 22:34:56 -04:00
Florian Westphal	ecdfb48cdd	netfilter: conntrack: move expectation event helper to ecache.c Not performance critical, it is only invoked when an expectation is added/destroyed. While at it, kill unused nf_ct_expect_event() wrapper. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-04-12 23:01:57 +02:00
Florian Westphal	3c435e2e41	netfilter: conntrack: de-inline nf_conntrack_eventmask_report Way too large; move it to nf_conntrack_ecache.c. Reduces total object size by 1216 byte on my machine. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-04-12 23:01:52 +02:00
David S. Miller	1089ac6977	For the 4.6 cycle, we have a number of changes: * Bob's mesh mode rhashtable conversion, this includes the rhashtable API change for allocation flags * BSSID scan, connect() command reassoc support (Jouni) * fast (optimised data only) and support for RSS in mac80211 (myself) * various smaller changes -----BEGIN PGP SIGNATURE----- iQIcBAABCgAGBQJXBQ4GAAoJEGt7eEactAAdWiMP/ibaP3I79NDc0s7wCDA+KRkm hx0Qx4a0wwm7lDFlnGBjY6yKr+XFDliCvdGX7XGpLSsTioNg7eXPpwx5FQoj6RiV 8+5RKE9fTguN9ofUzqAwHd9sVOaxvdlXbKfb/N93Gzjpw/meYk58wXdF7Almkroa ukgJeMzIlIh+6D96zFEA+Ofzp5chwh+x2Dn0wXutEe9P9fOERA859veAvx65b+Ql IRGTqyuY5B/wcbkr4o+DWQwgrdt7Vop9nYVPNWtMHm2JTzfuCSaQ2cD9TnVAK/bg /vtqC46KKNLyBRGexAPqdftY9PWcfipgE+n7k+Et4iGSmNm7Z3dEyewgXmqli7XJ X8Uiaq+N6Fpe06DVSU7aSRt8NLV64A44jXSfKRI9U2POUqKMn/PMdm8bhPW8qCdM ra6myWpQGHWK9e0TQQdShq0NQKGxCZAiSRiiIrbbvXl1CwXxkPCG39wAC3Sh1tEN ou4lGraeywGnTjaq+mwLEtHLoug8Y2x+Fz+Ze4Cu2enXxna9lp4lr+rFlc+2+0Er o9oPxkTk8krZGIj9M6PNc5W+InMwchaFX3076n67hnFHzFRlOQzkfffbPYlhKJDQ f8c9JiNZIoX/fD1TAKsrdO1+EKm/xo7w7pLgbMwQal8Jr88SkITDg0i3oXc56vNQ ZK2gUzwvrD/jh0AUyDfN =sj7y -----END PGP SIGNATURE----- Merge tag 'mac80211-next-for-davem-2016-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== For the 4.7 cycle, we have a number of changes: * Bob's mesh mode rhashtable conversion, this includes the rhashtable API change for allocation flags * BSSID scan, connect() command reassoc support (Jouni) * fast (optimised data only) and support for RSS in mac80211 (myself) * various smaller changes ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-08 16:42:31 -04:00
Jozsef Kadlecsik	644c7e48cb	netfilter: nf_conntrack_tcp: Fix stack out of bounds when parsing TCP options Baozeng Ding reported a KASAN stack out of bounds issue - it uncovered that the TCP option parsing routines in netfilter TCP connection tracking could read one byte out of the buffer of the TCP options. Therefore in the patch we check that the available data length is large enough to parse both TCP option code and size. Reported-by: Baozeng Ding <sploving1@gmail.com> Tested-by: Baozeng Ding <sploving1@gmail.com> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-04-07 18:42:37 +02:00
Bob Copeland	8f6fd83c6c	rhashtable: accept GFP flags in rhashtable_walk_init In certain cases, the 802.11 mesh pathtable code wants to iterate over all of the entries in the forwarding table from the receive path, which is inside an RCU read-side critical section. Enable walks inside atomic sections by allowing GFP_ATOMIC allocations for the walker state. Change all existing callsites to pass in GFP_KERNEL. Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Bob Copeland <me@bobcopeland.com> [also adjust gfs2/glock.c and rhashtable tests] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2016-04-05 10:56:32 +02:00
Eric Dumazet	3b24d854cb	tcp/dccp: do not touch listener sk_refcnt under synflood When a SYNFLOOD targets a non SO_REUSEPORT listener, multiple cpus contend on sk->sk_refcnt and sk->sk_wmem_alloc changes. By letting listeners use SOCK_RCU_FREE infrastructure, we can relax TCP_LISTEN lookup rules and avoid touching sk_refcnt Note that we still use SLAB_DESTROY_BY_RCU rules for other sockets, only listeners are impacted by this change. Peak performance under SYNFLOOD is increased by ~33% : On my test machine, I could process 3.2 Mpps instead of 2.4 Mpps Most consuming functions are now skb_set_owner_w() and sock_wfree() contending on sk->sk_wmem_alloc when cooking SYNACK and freeing them. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-04 22:11:20 -04:00
Stephane Bryant	8d45ff22f1	netfilter: bridge: nf queue verdict to use NFQA_VLAN and NFQA_L2HDR This makes nf queues use NFQA_VLAN and NFQA_L2HDR in verdict to modify the original skb Signed-off-by: Stephane Bryant <stephane.ml.bryant@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-03-29 13:29:30 +02:00
Stephane Bryant	15824ab29f	netfilter: bridge: pass L2 header and VLAN as netlink attributes in queues to userspace - This creates 2 netlink attribute NFQA_VLAN and NFQA_L2HDR. - These are filled up for the PF_BRIDGE family on the way to userspace. - NFQA_VLAN is a nested attribute, with the NFQA_VLAN_PROTO and the NFQA_VLAN_TCI carrying the corresponding vlan_proto and vlan_tci fields from the skb using big endian ordering (and using the CFI bit as the VLAN_TAG_PRESENT flag in vlan_tci as in the skb) Signed-off-by: Stephane Bryant <stephane.ml.bryant@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-03-29 13:26:38 +02:00
Pablo Neira Ayuso	931401137f	netfilter: nfnetlink_queue: honor NFQA_CFG_F_FAIL_OPEN when netlink unicast fails When netlink unicast fails to deliver the message to userspace, we should also check if the NFQA_CFG_F_FAIL_OPEN flag is set so we reinject the packet back to the stack. I think the user expects no packet drops when this flag is set due to queueing to userspace errors, no matter if related to the internal queue or when sending the netlink message to userspace. The userspace application will still get the ENOBUFS error via recvmsg() so the user still knows that, with the current configuration that is in place, the userspace application is not consuming the messages at the pace that the kernel needs. Reported-by: "Yigal Reiss (yreiss)" <yreiss@cisco.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Tested-by: "Yigal Reiss (yreiss)" <yreiss@cisco.com>	2016-03-28 17:59:20 +02:00
Vishwanath Pai	596cf3fe58	netfilter: ipset: fix race condition in ipset save, swap and delete This fix adds a new reference counter (ref_netlink) for the struct ip_set. The other reference counter (ref) can be swapped out by ip_set_swap and we need a separate counter to keep track of references for netlink events like dump. Using the same ref counter for dump causes a race condition which can be demonstrated by the following script: ipset create hash_ip1 hash:ip family inet hashsize 1024 maxelem 500000 \ counters ipset create hash_ip2 hash:ip family inet hashsize 300000 maxelem 500000 \ counters ipset create hash_ip3 hash:ip family inet hashsize 1024 maxelem 500000 \ counters ipset save & ipset swap hash_ip3 hash_ip2 ipset destroy hash_ip3 /* will crash the machine / Swap will exchange the values of ref so destroy will see ref = 0 instead of ref = 1. With this fix in place swap will not succeed because ipset save still has ref_netlink on the set (ip_set_swap doesn't swap ref_netlink). Both delete and swap will error out if ref_netlink != 0 on the set. Note: The changes to _head functions is because previously we would increment ref whenever we called these functions, we don't do that anymore. Reviewed-by: Joshua Hunt <johunt@akamai.com> Signed-off-by: Vishwanath Pai <vpai@akamai.com> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-03-28 17:57:45 +02:00
Weongyo Jeong	ccd63c20fe	netfilter: nf_conntrack: Uses pr_fmt() for logging. Uses pr_fmt() macro for debugging messages of nf_conntrack module. Signed-off-by: Weongyo Jeong <weongyo.linux@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-03-28 12:56:07 +02:00
Nicholas Mc Guire	e39365be03	netfilter: nf_conntrack: consolidate lock/unlock into unlock_wait The spin_lock()/spin_unlock() is synchronizing on the nf_conntrack_locks_all_lock which is equivalent to spin_unlock_wait() but the later should be more efficient. Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-03-15 01:10:42 +01:00
Florian Westphal	d157bd7615	netfilter: x_tables: check for size overflow Ben Hawkes says: integer overflow in xt_alloc_table_info, which on 32-bit systems can lead to small structure allocation and a copy_from_user based heap corruption. Reported-by: Ben Hawkes <hawkes@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-03-12 11:55:01 +01:00
Florian Westphal	f0716cd6eb	netfilter: nft_compat: check match/targetinfo attr size We copy according to ->target\|matchsize, so check that the netlink attribute (which can include padding and might be larger) contains enough data. Reported-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-03-11 11:37:56 +01:00
Pablo Neira Ayuso	d387eaf51f	Merge tag 'ipvs-fixes-for-v4.5' of https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs Simon Horman says: ==================== please consider these IPVS fixes for v4.5 or if it is too late please consider them for v4.6. * Arnd Bergman has corrected an error whereby the SIP persistence engine may incorrectly access protocol fields * Julian Anastasov has corrected a problem reported by Jiri Bohac with the connection rescheduling mechanism added in 3.10 when new SYNs in connection to dead real server can be redirected to another real server. * Marco Angaroni resolved a problem in the SIP persistence engine whereby the Call-ID could not be found if it was at the beginning of a SIP message. ==================== Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-03-11 11:37:35 +01:00
Pablo Neira Ayuso	793882bfc3	Merge branch 'master' of git://blackhole.kfki.hu/nf Jozsef Kadlecsik says: ==================== Please apply the next patch against the nf tree: - Deniz Eren reported that parallel flush/dump of list:set type of sets can lead to kernel crash. The bug was due to non-RCU compatible flushing, listing in the set type, fixed by me. - Julia Lawall pointed out that IPSET_ATTR_ETHER netlink attribute length was not checked explicitly. The patch adds the missing checkings. ==================== Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-03-10 17:34:00 +01:00
Jozsef Kadlecsik	d8aacd8718	netfilter: ipset: Check IPSET_ATTR_ETHER netlink attribute length Julia Lawall pointed out that IPSET_ATTR_ETHER netlink attribute length was not checked explicitly, just for the maximum possible size. Malicious netlink clients could send shorter attribute and thus resulting a kernel read after the buffer. The patch adds the explicit length checkings. Reported-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2016-03-08 20:36:17 +01:00
David S. Miller	4c38cd61ae	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter updates for your net-next tree, they are: 1) Remove useless debug message when deleting IPVS service, from Yannick Brosseau. 2) Get rid of compilation warning when CONFIG_PROC_FS is unset in several spots of the IPVS code, from Arnd Bergmann. 3) Add prandom_u32 support to nft_meta, from Florian Westphal. 4) Remove unused variable in xt_osf, from Sudip Mukherjee. 5) Don't calculate IP checksum twice from netfilter ipv4 defrag hook since fixing af_packet defragmentation issues, from Joe Stringer. 6) On-demand hook registration for iptables from netns. Instead of registering the hooks for every available netns whenever we need one of the support tables, we register this on the specific netns that needs it, patchset from Florian Westphal. 7) Add missing port range selection to nf_tables masquerading support. BTW, just for the record, there is a typo in the description of `5f6c253ebe` ("netfilter: bridge: register hooks only when bridge interface is added") that refers to the cluster match as deprecated, but it is actually the CLUSTERIP target (which registers hooks inconditionally) the one that is scheduled for removal. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-08 14:25:20 -05:00
Marco Angaroni	7617a24f83	ipvs: correct initial offset of Call-ID header search in SIP persistence engine The IPVS SIP persistence engine is not able to parse the SIP header "Call-ID" when such header is inserted in the first positions of the SIP message. When IPVS is configured with "--pe sip" option, like for example: ipvsadm -A -u 1.2.3.4:5060 -s rr --pe sip -p 120 -o some particular messages (see below for details) do not create entries in the connection template table, which can be listed with: ipvsadm -Lcn --persistent-conn Problematic SIP messages are SIP responses having "Call-ID" header positioned just after message first line: SIP/2.0 200 OK [Call-ID header here] [rest of the headers] When "Call-ID" header is positioned down (after a few other headers) it is correctly recognized. This is due to the data offset used in get_callid function call inside ip_vs_pe_sip.c file: since dptr already points to the start of the SIP message, the value of dataoff should be initially 0. Otherwise the header is searched starting from some bytes after the first character of the SIP message. Fixes: `758ff03387` ("IPVS: sip persistence engine") Signed-off-by: Marco Angaroni <marcoangaroni@gmail.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2016-03-07 11:53:35 +09:00
Julian Anastasov	f911b675a0	ipvs: allow rescheduling after RST "RFC 5961, 4.2. Mitigation" describes a mechanism to request client to confirm with RST the restart of TCP connection before resending its SYN. As result, IPVS can see SYNs for existing connection in CLOSE state. Add check to allow rescheduling in this state. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2016-03-07 11:53:32 +09:00
Julian Anastasov	f719e3754e	ipvs: drop first packet to redirect conntrack Jiri Bohac is reporting for a problem where the attempt to reschedule existing connection to another real server needs proper redirect for the conntrack used by the IPVS connection. For example, when IPVS connection is created to NAT-ed real server we alter the reply direction of conntrack. If we later decide to select different real server we can not alter again the conntrack. And if we expire the old connection, the new connection is left without conntrack. So, the only way to redirect both the IPVS connection and the Netfilter's conntrack is to drop the SYN packet that hits existing connection, to wait for the next jiffie to expire the old connection and its conntrack and to rely on client's retransmission to create new connection as usually. Jiri Bohac provided a fix that drops all SYNs on rescheduling, I extended his patch to do such drops only for connections that use conntrack. Here is the original report from Jiri Bohac: Since commit `dc7b3eb900` ("ipvs: Fix reuse connection if real server is dead"), new connections to dead servers are redistributed immediately to new servers. The old connection is expired using ip_vs_conn_expire_now() which sets the connection timer to expire immediately. However, before the timer callback, ip_vs_conn_expire(), is run to clean the connection's conntrack entry, the new redistributed connection may already be established and its conntrack removed instead. Fix this by dropping the first packet of the new connection instead, like we do when the destination server is not available. The timer will have deleted the old conntrack entry long before the first packet of the new connection is retransmitted. Fixes: `dc7b3eb900` ("ipvs: Fix reuse connection if real server is dead") Signed-off-by: Jiri Bohac <jbohac@suse.cz> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2016-03-07 11:53:30 +09:00
Arnd Bergmann	3f20efba41	ipvs: handle ip_vs_fill_iph_skb_off failure ip_vs_fill_iph_skb_off() may not find an IP header, and gcc has determined that ip_vs_sip_fill_param() then incorrectly accesses the protocol fields: net/netfilter/ipvs/ip_vs_pe_sip.c: In function 'ip_vs_sip_fill_param': net/netfilter/ipvs/ip_vs_pe_sip.c:76:5: error: 'iph.protocol' may be used uninitialized in this function [-Werror=maybe-uninitialized] if (iph.protocol != IPPROTO_UDP) ^ net/netfilter/ipvs/ip_vs_pe_sip.c:81:10: error: 'iph.len' may be used uninitialized in this function [-Werror=maybe-uninitialized] dataoff = iph.len + sizeof(struct udphdr); ^ This adds a check for the ip_vs_fill_iph_skb_off() return code before looking at the ip header data returned from it. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: `b0e010c527` ("ipvs: replace ip_vs_fill_ip4hdr with ip_vs_fill_iph_skb_off") Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2016-03-07 11:53:28 +09:00
Pablo Neira Ayuso	8a6bf5da1a	netfilter: nft_masq: support port range Complete masquerading support by allowing port range selection. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-03-02 20:05:27 +01:00
Florian Westphal	b9e69e1273	netfilter: xtables: don't hook tables by default delay hook registration until the table is being requested inside a namespace. Historically, a particular table (iptables mangle, ip6tables filter, etc) was registered on module load. When netns support was added to iptables only the ip/ip6tables ruleset was made namespace aware, not the actual hook points. This means f.e. that when ipt_filter table/module is loaded on a system, then each namespace on that system has an (empty) iptables filter ruleset. In other words, if a namespace sends a packet, such skb is 'caught' by netfilter machinery and fed to hooking points for that table (i.e. INPUT, FORWARD, etc). Thanks to Eric Biederman, hooks are no longer global, but per namespace. This means that we can avoid allocation of empty ruleset in a namespace and defer hook registration until we need the functionality. We register a tables hook entry points ONLY in the initial namespace. When an iptables get/setockopt is issued inside a given namespace, we check if the table is found in the per-namespace list. If not, we attempt to find it in the initial namespace, and, if found, create an empty default table in the requesting namespace and register the needed hooks. Hook points are destroyed only once namespace is deleted, there is no 'usage count' (it makes no sense since there is no 'remove table' operation in xtables api). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-03-02 20:05:24 +01:00
Pablo Neira Ayuso	a7ed31cf65	Merge tag 'ipvs-for-v4.6' of https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next into HEAD Simon Horman says: ==================== please consider these cleanups for IPVS for v4.6. * Arnd Bergmann has resolved a bunch of unused variable warnings and; * Yannick Brosseau has removed a noisy debug message ==================== Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-03-02 20:05:01 +01:00
WANG Cong	64d4e3431e	net: remove skb_sender_cpu_clear() After commit `52bd2d62ce` ("net: better skb->sender_cpu and skb->napi_id cohabitation") skb_sender_cpu_clear() becomes empty and can be removed. Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-01 17:36:47 -05:00
Sudip Mukherjee	8ee225e785	netfilter: xt_osf: remove unused variable While building with W=1 we got the warning: net/netfilter/xt_osf.c:265:9: warning: variable 'loop_cont' set but not used The local variable loop_cont was only initialized and then assigned a value but was never used or checked after that. While removing the variable, the case of OSFOPT_TS was not removed so that it will serve as a reminder to us that we can do something in that particular case. Signed-off-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-02-29 13:59:43 +01:00
Florian Westphal	b07edbe1cf	netfilter: meta: add PRANDOM support Can be used to randomly match packets e.g. for statistic traffic sampling. See commit `3ad0040573` ("bpf: split state from prandom_u32() and consolidate {c, e}BPF prngs") for more info why this doesn't use prandom_u32 directly. Unlike bpf nft_meta can be built as a module, so add an EXPORT_SYMBOL for prandom_seed_full_state too. Cc: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-02-29 13:55:59 +01:00
Phil Turnbull	017b1b6d28	netfilter: nfnetlink_acct: validate NFACCT_FILTER parameters nfacct_filter_alloc doesn't validate the NFACCT_FILTER_MASK and NFACCT_FILTER_VALUE parameters which can trigger a NULL pointer dereference. CAP_NET_ADMIN is required to trigger the bug. Signed-off-by: Phil Turnbull <phil.turnbull@oracle.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-02-29 13:27:21 +01:00
Jozsef Kadlecsik	45040978c8	netfilter: ipset: Fix set:list type crash when flush/dump set in parallel Flushing/listing entries was not RCU safe, so parallel flush/dump could lead to kernel crash. Bug reported by Deniz Eren. Fixes netfilter bugzilla id #1050. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2016-02-24 20:32:21 +01:00
David S. Miller	b633353115	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/phy/bcm7xxx.c drivers/net/phy/marvell.c drivers/net/vxlan.c All three conflicts were cases of simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-23 00:09:14 -05:00
Florian Westphal	c5b0db3263	nfnetlink: Revert "nfnetlink: add support for memory mapped netlink" reverts commit `3ab1f683bf` ("nfnetlink: add support for memory mapped netlink")' Like previous commits in the series, remove wrappers that are not needed after mmapped netlink removal. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-18 11:42:22 -05:00
Florian Westphal	905f0a739a	nfnetlink: remove nfnetlink_alloc_skb Following mmapped netlink removal this code can be simplified by removing the alloc wrapper. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-18 11:42:19 -05:00
Arnd Bergmann	f6ca9f46f6	netfilter: ipvs: avoid unused variable warnings The proc_create() and remove_proc_entry() functions do not reference their arguments when CONFIG_PROC_FS is disabled, so we get a couple of warnings about unused variables in IPVS: ipvs/ip_vs_app.c:608:14: warning: unused variable 'net' [-Wunused-variable] ipvs/ip_vs_ctl.c:3950:14: warning: unused variable 'net' [-Wunused-variable] ipvs/ip_vs_ctl.c:3994:14: warning: unused variable 'net' [-Wunused-variable] This removes the local variables and instead looks them up separately for each use, which obviously avoids the warning. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: 4c50a8ce2b63 ("netfilter: ipvs: avoid unused variable warning") Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2016-02-18 09:17:58 +09:00
Yannick Brosseau	2d9e9b0d05	netfilter: ipvs: Remove noisy debug print from ip_vs_del_service This have been there for a long time, but does not seem to add value Signed-off-by: Yannick Brosseau <scientist@fb.com> Signed-off-by: Simon Horman <horms@verge.net.au>	2016-02-18 09:17:58 +09:00
Edward Cree	6fa79666e2	net: ip_tunnel: remove 'csum_help' argument to iptunnel_handle_offloads All users now pass false, so we can remove it, and remove the code that was conditional upon it. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-12 05:52:16 -05:00
Craig Gallek	a583636a83	inet: refactor inet[6]_lookup functions to take skb This is a preliminary step to allow fast socket lookup of SO_REUSEPORT groups. Doing so with a BPF filter will require access to the skb in question. This change plumbs the skb (and offset to payload data) through the call stack to the listening socket lookup implementations where it will be used in a following patch. Signed-off-by: Craig Gallek <kraig@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-11 03:54:14 -05:00
Anton Protopopov	5cc6ce9ff2	netfilter: nft_counter: fix erroneous return values The nft_counter_init() and nft_counter_clone() functions should return negative error value -ENOMEM instead of positive ENOMEM. Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-02-08 13:05:02 +01:00
Arnd Bergmann	08a7f5d3f5	netfilter: tee: select NF_DUP_IPV6 unconditionally The NETFILTER_XT_TARGET_TEE option selects NF_DUP_IPV6 whenever IP6_NF_IPTABLES is enabled, and it ensures that it cannot be builtin itself if NF_CONNTRACK is a loadable module, as that is a dependency for NF_DUP_IPV6. However, NF_DUP_IPV6 can be enabled even if IP6_NF_IPTABLES is turned off, and it only really depends on IPV6. With the current check in tee_tg6, we call nf_dup_ipv6() whenever NF_DUP_IPV6 is enabled. This can however be a loadable module which is unreachable from a built-in xt_TEE: net/built-in.o: In function `tee_tg6': :(.text+0x67728): undefined reference to `nf_dup_ipv6' The bug was originally introduced in the split of the xt_TEE module into separate modules for ipv4 and ipv6, and two patches tried to fix it unsuccessfully afterwards. This is a revert of the the first incorrect attempt to fix it, going back to depending on IPV6 as the dependency, and we adapt the 'select' condition accordingly. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: `bbde9fc182` ("netfilter: factor out packet duplication for IPv4/IPv6") Fixes: `116984a316` ("netfilter: xt_TEE: use IS_ENABLED(CONFIG_NF_DUP_IPV6)") Fixes: `74ec4d55c4` ("netfilter: fix xt_TEE and xt_TPROXY dependencies") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-02-08 12:58:28 +01:00
Phil Turnbull	c58d6c9368	netfilter: nfnetlink: correctly validate length of batch messages If nlh->nlmsg_len is zero then an infinite loop is triggered because 'skb_pull(skb, msglen);' pulls zero bytes. The calculation in nlmsg_len() underflows if 'nlh->nlmsg_len < NLMSG_HDRLEN' which bypasses the length validation and will later trigger an out-of-bound read. If the length validation does fail then the malformed batch message is copied back to userspace. However, we cannot do this because the nlh->nlmsg_len can be invalid. This leads to an out-of-bounds read in netlink_ack: [ 41.455421] ================================================================== [ 41.456431] BUG: KASAN: slab-out-of-bounds in memcpy+0x1d/0x40 at addr ffff880119e79340 [ 41.456431] Read of size 4294967280 by task a.out/987 [ 41.456431] ============================================================================= [ 41.456431] BUG kmalloc-512 (Not tainted): kasan: bad access detected [ 41.456431] ----------------------------------------------------------------------------- ... [ 41.456431] Bytes b4 ffff880119e79310: 00 00 00 00 d5 03 00 00 b0 fb fe ff 00 00 00 00 ................ [ 41.456431] Object ffff880119e79320: 20 00 00 00 10 00 05 00 00 00 00 00 00 00 00 00 ............... [ 41.456431] Object ffff880119e79330: 14 00 0a 00 01 03 fc 40 45 56 11 22 33 10 00 05 .......@EV."3... [ 41.456431] Object ffff880119e79340: f0 ff ff ff 88 99 aa bb 00 14 00 0a 00 06 fe fb ................ ^^ start of batch nlmsg with nlmsg_len=4294967280 ... [ 41.456431] Memory state around the buggy address: [ 41.456431] ffff880119e79400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 41.456431] ffff880119e79480: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 41.456431] >ffff880119e79500: 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc fc [ 41.456431] ^ [ 41.456431] ffff880119e79580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 41.456431] ffff880119e79600: fc fc fc fc fc fc fc fc fc fc fb fb fb fb fb fb [ 41.456431] ================================================================== Fix this with better validation of nlh->nlmsg_len and by setting NFNL_BATCH_FAILURE if any batch message fails length validation. CAP_NET_ADMIN is required to trigger the bugs. Fixes: `9ea2aa8b7d` ("netfilter: nfnetlink: validate nfnetlink header from batch") Signed-off-by: Phil Turnbull <phil.turnbull@oracle.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-02-08 12:56:54 +01:00
Florian Westphal	53c520c2ab	netfilter: cttimeout: fix deadlock due to erroneous unlock/lock conversion The spin_unlock call should have been left as-is, revert. Fixes: `b16c29191d` ("netfilter: nf_conntrack: use safer way to lock all buckets") Reported-by: kernel test robot <fengguang.wu@intel.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-02-01 00:15:28 +01:00
Pablo Neira Ayuso	7c7bdf3599	netfilter: nfnetlink: use original skbuff when acking batches Since `bd678e09dc` ("netfilter: nfnetlink: fix splat due to incorrect socket memory accounting in skbuff clones"), we don't manually attach the sk to the skbuff clone anymore, so we have to use the original skbuff from netlink_ack() which needs to access the sk pointer. Fixes: `bd678e09dc` ("netfilter: nfnetlink: fix splat due to incorrect socket memory accounting in skbuff clones") Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-02-01 00:15:27 +01:00
Florian Westphal	d93c6258ee	netfilter: conntrack: resched in nf_ct_iterate_cleanup Ulrich reports soft lockup with following (shortened) callchain: NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! __netif_receive_skb_core+0x6e4/0x774 process_backlog+0x94/0x160 net_rx_action+0x88/0x178 call_do_softirq+0x24/0x3c do_softirq+0x54/0x6c __local_bh_enable_ip+0x7c/0xbc nf_ct_iterate_cleanup+0x11c/0x22c [nf_conntrack] masq_inet_event+0x20/0x30 [nf_nat_masquerade_ipv6] atomic_notifier_call_chain+0x1c/0x2c ipv6_del_addr+0x1bc/0x220 [ipv6] Problem is that nf_ct_iterate_cleanup can run for a very long time since it can be interrupted by softirq processing. Moreover, atomic_notifier_call_chain runs with rcu readlock held. So lets call cond_resched() in nf_ct_iterate_cleanup and defer the call to a work queue for the atomic_notifier_call_chain case. We also need another cond_resched in get_next_corpse, since we have to deal with iter() always returning false, in that case get_next_corpse will walk entire conntrack table. Reported-by: Ulrich Weber <uw@ocedo.com> Tested-by: Ulrich Weber <uw@ocedo.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-02-01 00:15:26 +01:00
Sasha Levin	b16c29191d	netfilter: nf_conntrack: use safer way to lock all buckets When we need to lock all buckets in the connection hashtable we'd attempt to lock 1024 spinlocks, which is way more preemption levels than supported by the kernel. Furthermore, this behavior was hidden by checking if lockdep is enabled, and if it was - use only 8 buckets(!). Fix this by using a global lock and synchronize all buckets on it when we need to lock them all. This is pretty heavyweight, but is only done when we need to resize the hashtable, and that doesn't happen often enough (or at all). Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-01-20 14:15:31 +01:00
Pablo Neira Ayuso	35b815392a	netfilter: nf_tables_netdev: fix error path in module initialization Unregister the chain type and return error, otherwise this leaks the subscription to the netdevice notifier call chain. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-01-18 13:53:37 +01:00
Eric Dumazet	d6b3347bf1	netfilter: xt_TCPMSS: handle CHECKSUM_COMPLETE in tcpmss_tg6() In case MSS option is added in TCP options, skb length increases by 4. IPv6 needs to update skb->csum if skb has CHECKSUM_COMPLETE, otherwise kernel complains loudly in netdev_rx_csum_fault() with a stack dump. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-01-18 12:18:17 +01:00
Pablo Neira Ayuso	efaea94aaf	netfilter: nft_ct: keep counters away from CONFIG_NF_CONNTRACK_LABELS This is accidental, they don't depend on the label infrastructure. Fixes: `48f66c905a` ("netfilter: nft_ct: add byte/packet counter support") Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Florian Westphal <fw@strlen.de>	2016-01-14 19:41:16 +01:00
Florian Westphal	f90d2d37fb	netfilter: ipset: allow a 0 netmask with hash_netiface type Jozsef says: The correct behaviour is that if we have ipset create test1 hash:net,iface ipset add test1 0.0.0.0/0,eth0 iptables -A INPUT -m set --match-set test1 src,src then the rule should match for any traffic coming in through eth0. This removes the -EINVAL runtime test to make matching work in case packet arrived via the specified interface. Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1297092 Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-01-13 14:03:43 +01:00
Florian Westphal	4b8c4eddfc	netfilter: nft_byteorder: avoid unneeded le/be conversion steps David points out that we to three le/be conversions instead of just one. Doesn't matter on x86_64 w. gcc, but other architectures might be less lucky. Since it also simplifies code just follow his advice. Fixes: c0f3275f5cb ("nftables: byteorder: provide le/be 64 bit conversion helper") Suggested-by: David Laight <David.Laight@aculab.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-01-13 14:02:59 +01:00
David S. Miller	9b59377b75	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for net-next, they are: 1) Release nf_tables objects on netns destructions via nft_release_afinfo(). 2) Destroy basechain and rules on netdevice removal in the new netdev family. 3) Get rid of defensive check against removal of inactive objects in nf_tables. 4) Pass down netns pointer to our existing nfnetlink callbacks, as well as commit() and abort() nfnetlink callbacks. 5) Allow to invert limit expression in nf_tables, so we can throttle overlimit traffic. 6) Add packet duplication for the netdev family. 7) Add forward expression for the netdev family. 8) Define pr_fmt() in conntrack helpers. 9) Don't leave nfqueue configuration on inconsistent state in case of errors, from Ken-ichirou MATSUZAWA, follow up patches are also from him. 10) Skip queue option handling after unbind. 11) Return error on unknown both in nfqueue and nflog command. 12) Autoload ctnetlink when NFQA_CFG_F_CONNTRACK is set. 13) Add new NFTA_SET_USERDATA attribute to store user data in sets, from Carlos Falgueras. 14) Add support for 64 bit byteordering changes nf_tables, from Florian Westphal. 15) Add conntrack byte/packet counter matching support to nf_tables, also from Florian. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-08 20:53:16 -05:00
Florian Westphal	48f66c905a	netfilter: nft_ct: add byte/packet counter support If the accounting extension isn't present, we'll return a counter value of 0. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-01-08 14:44:09 +01:00
Florian Westphal	ce1e7989d9	netfilter: nft_byteorder: provide 64bit le/be conversion Needed to convert the (64bit) conntrack counters to BE ordering. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-01-08 13:32:11 +01:00
Carlos Falgueras García	e6d8ecac9e	netfilter: nf_tables: Add new attributes into nft_set to store user data. User data is stored at after 'nft_set_ops' private data into 'data[]' flexible array. The field 'udata' points to user data and 'udlen' stores its length. Add new flag NFTA_SET_USERDATA. Signed-off-by: Carlos Falgueras García <carlosfg@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-01-08 13:25:08 +01:00
Ken-ichirou MATSUZAWA	eb075954e9	netfilter: nfnetlink_log: just returns error for unknown command This patch stops processing options for unknown command. Signed-off-by: Ken-ichirou MATSUZAWA <chamas@h4.dion.ne.jp> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-01-08 13:25:07 +01:00
Ken-ichirou MATSUZAWA	71b2e5f5ca	netfilter: nfnetlink_queue: autoload nf_conntrack_netlink module NFQA_CFG_F_CONNTRACK config flag This patch enables to load nf_conntrack_netlink module if NFQA_CFG_F_CONNTRACK config flag is specified. Signed-off-by: Ken-ichirou MATSUZAWA <chamas@h4.dion.ne.jp> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-01-08 13:25:06 +01:00
Ken-ichirou MATSUZAWA	21c3c971d1	netfilter: nfnetlink_queue: just returns error for unknown command This patch stops processing options for unknown command. Signed-off-by: Ken-ichirou MATSUZAWA <chamas@h4.dion.ne.jp> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-01-08 13:25:05 +01:00
Ken-ichirou MATSUZAWA	17bc6b4884	netfilter: nfnetlink_queue: don't handle options after unbind This patch stops processing after destroying a queue instance. Signed-off-by: Ken-ichirou MATSUZAWA <chamas@h4.dion.ne.jp> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-01-08 13:25:04 +01:00
Ken-ichirou MATSUZAWA	60d2c7f9ab	netfilter: nfnetlink_queue: validate dependencies to avoid breaking atomicity Check that dependencies are fulfilled before updating the queue instance, otherwise we can leave things in intermediate state on errors in nfqnl_recv_config(). Signed-off-by: Ken-ichirou MATSUZAWA <chamas@h4.dion.ne.jp> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-01-08 13:25:03 +01:00
Pablo Neira Ayuso	ad6d950393	netfilter: nf_ct_helper: define pr_fmt() Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-01-04 17:48:51 +01:00
Pablo Neira Ayuso	39e6dea28a	netfilter: nf_tables: add forward expression to the netdev family You can use this to forward packets from ingress to the egress path of the specified interface. This provides a fast path to bounce packets from one interface to another specific destination interface. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-01-04 17:48:38 +01:00
Pablo Neira Ayuso	502061f81d	netfilter: nf_tables: add packet duplication to the netdev family You can use this to duplicate packets and inject them at the egress path of the specified interface. This duplication allows you to inspect traffic from the dummy or any other interface dedicated to this purpose. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-01-03 21:04:23 +01:00
Pablo Neira Ayuso	c7862a5f0d	netfilter: nft_limit: allow to invert matching criteria This patch allows you to invert the ratelimit matching criteria, so you can match packets over the ratelimit. This is required to support what hashlimit does. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-01-03 20:58:52 +01:00
David S. Miller	c07f30ad68	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2015-12-31 18:20:10 -05:00
Pablo Neira Ayuso	5913beaf0d	netfilter: nfnetlink: pass down netns pointer to commit() and abort() callbacks Adapt callsites to avoid recurrent lookup of the netns pointer. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-12-28 18:43:15 +01:00
Pablo Neira Ayuso	7b8002a151	netfilter: nfnetlink: pass down netns pointer to call() and call_rcu() Adapt callsites to avoid recurrent lookup of the netns pointer. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-12-28 18:41:41 +01:00
Pablo Neira Ayuso	f4c756b4ea	netfilter: nf_tables: remove check against removal of inactive objects The following sequence inside a batch, although not very useful, is valid: add table foo ... delete table foo This may be generated by some robot while applying some incremental upgrade, so remove the defensive checks against this. This patch keeps the check on the get/dump path by now, we have to replace the inactive flag by introducing object generations. Reported-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-12-28 18:37:20 +01:00
Pablo Neira Ayuso	5ebe0b0eec	netfilter: nf_tables: destroy basechain and rules on netdevice removal If the netdevice is destroyed, the resources that are attached should be released too as they belong to the device that is now gone. Suggested-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-12-28 18:34:35 +01:00
Pablo Neira Ayuso	df05ef874b	netfilter: nf_tables: release objects on netns destruction We have to release the existing objects on netns removal otherwise we leak them. Chains are unregistered in first place to make sure no packets are walking on our rules and sets anymore. The object release happens by when we unregister the family via nft_release_afinfo() which is called from nft_unregister_afinfo() from the corresponding __net_exit path in every family. Reported-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-12-28 18:34:35 +01:00
David S. Miller	59ce9670ce	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains the first batch of Netfilter updates for the upcoming 4.5 kernel. This batch contains userspace netfilter header compilation fixes, support for packet mangling in nf_tables, the new tracing infrastructure for nf_tables and cgroup2 support for iptables. More specifically, they are: 1) Two patches to include dependencies in our netfilter userspace headers to resolve compilation problems, from Mikko Rapeli. 2) Four comestic cleanup patches for the ebtables codebase, from Ian Morris. 3) Remove duplicate include in the netfilter reject infrastructure, from Stephen Hemminger. 4) Two patches to simplify the netfilter defragmentation code for IPv6, patch from Florian Westphal. 5) Fix root ownership of /proc/net netfilter for unpriviledged net namespaces, from Philip Whineray. 6) Get rid of unused fields in struct nft_pktinfo, from Florian Westphal. 7) Add mangling support to our nf_tables payload expression, from Patrick McHardy. 8) Introduce a new netlink-based tracing infrastructure for nf_tables, from Florian Westphal. 9) Change setter functions in nfnetlink_log to be void, from Rami Rosen. 10) Add netns support to the cttimeout infrastructure. 11) Add cgroup2 support to iptables, from Tejun Heo. 12) Introduce nfnl_dereference_protected() in nfnetlink, from Florian. 13) Add support for mangling pkttype in the nf_tables meta expression, also from Florian. BTW, I need that you pull net into net-next, I have another batch that requires changes that I don't yet see in net. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-12-18 15:37:42 -05:00
Florian Westphal	d5f79b6e4d	netfilter: nft_ct: include direction when dumping NFT_CT_L3PROTOCOL key one nft userspace test case fails with 'ct l3proto original ipv4' mismatches 'ct l3proto ipv4' ... because NFTA_CT_DIRECTION attr is missing. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-12-18 14:45:45 +01:00
Pablo Neira Ayuso	aa47e42c60	netfilter: nf_tables: use skb->protocol instead of assuming ethernet header Otherwise we may end up with incorrect network and transport header for other protocols. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-12-18 14:45:45 +01:00
Florian Westphal	b4aae759c2	netfilter: meta: add support for setting skb->pkttype This allows to redirect bridged packets to local machine: ether type ip ether daddr set aa:53:08:12:34:56 meta pkttype set unicast Without 'set unicast', ip stack discards PACKET_OTHERHOST skbs. It is also useful to add support for a '-m cluster like' nft rule (where switch floods packets to several nodes, and each cluster node node processes a subset of packets for load distribution). Mangling is restricted to HOST/OTHER/BROAD/MULTICAST, i.e. you cannot set skb->pkt_type to PACKET_KERNEL or change PACKET_LOOPBACK to PACKET_HOST. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-12-18 14:12:56 +01:00
David S. Miller	b3e0d3d7ba	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/geneve.c Here we had an overlapping change, where in 'net' the extraneous stats bump was being removed whilst in 'net-next' the final argument to udp_tunnel6_xmit_skb() was being changed. Signed-off-by: David S. Miller <davem@davemloft.net>	2015-12-17 22:08:28 -05:00
Tom Herbert	53692b1de4	sctp: Rename NETIF_F_SCTP_CSUM to NETIF_F_SCTP_CRC The SCTP checksum is really a CRC and is very different from the standards 1's complement checksum that serves as the checksum for IP protocols. This offload interface is also very different. Rename NETIF_F_SCTP_CSUM to NETIF_F_SCTP_CRC to highlight these differences. The term CSUM should be reserved in the stack to refer to the standard 1's complement IP checksum. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-12-15 16:49:58 -05:00
Florian Westphal	9c55d3b545	nfnetlink: add nfnl_dereference_protected helper to avoid overly long line in followup patch. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-12-15 15:14:04 +01:00
Tejun Heo	c38c4597e4	netfilter: implement xt_cgroup cgroup2 path match This patch implements xt_cgroup path match which matches cgroup2 membership of the associated socket. The match is recursive and invertible. For rationales on introducing another cgroup based match, please refer to a preceding commit "sock, cgroup: add sock->sk_cgroup". v3: Folded into xt_cgroup as a new revision interface as suggested by Pablo. v2: Included linux/limits.h from xt_cgroup2.h for PATH_MAX. Added explicit alignment to the priv field. Both suggested by Jan. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Daniel Wagner <daniel.wagner@bmw-carit.de> CC: Neil Horman <nhorman@tuxdriver.com> Cc: Jan Engelhardt <jengelh@inai.de> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-12-14 20:34:55 +01:00
Tejun Heo	4ec8ff0edc	netfilter: prepare xt_cgroup for multi revisions xt_cgroup will grow cgroup2 path based match. Postfix existing symbols with _v0 and prepare for multi revision registration. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Daniel Wagner <daniel.wagner@bmw-carit.de> CC: Neil Horman <nhorman@tuxdriver.com> Cc: Jan Engelhardt <jengelh@inai.de> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-12-14 20:34:52 +01:00
Pablo Neira Ayuso	a4ec80082c	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next Resolve conflict between commit `264640fc2c` ("ipv6: distinguish frag queues by device for multicast and link-local packets") from the net tree and commit `029f7f3b87` ("netfilter: ipv6: nf_defrag: avoid/free clone operations") from the nf-next tree. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Conflicts: net/ipv6/netfilter/nf_conntrack_reasm.c	2015-12-14 20:31:16 +01:00
Pablo Neira	19576c9478	netfilter: cttimeout: add netns support Add a per-netns list of timeout objects and adjust code to use it. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-12-14 12:48:58 +01:00
Xin Long	a907e36d54	netfilter: nf_tables: use reverse traversal commit_list in nf_tables_abort When we use 'nft -f' to submit rules, it will build multiple rules into one netlink skb to send to kernel, kernel will process them one by one. meanwhile, it add the trans into commit_list to record every commit. if one of them's return value is -EAGAIN, status \|= NFNL_BATCH_REPLAY will be marked. after all the process is done. it will roll back all the commits. now kernel use list_add_tail to add trans to commit, and use list_for_each_entry_safe to roll back. which means the order of adding and rollback is the same. that will cause some cases cannot work well, even trigger call trace, like: 1. add a set into table foo [return -EAGAIN]: commit_list = 'add set trans' 2. del foo: commit_list = 'add set trans' -> 'del set trans' -> 'del tab trans' then nf_tables_abort will be called to roll back: firstly process 'add set trans': case NFT_MSG_NEWSET: trans->ctx.table->use--; list_del_rcu(&nft_trans_set(trans)->list); it will del the set from the table foo, but it has removed when del table foo [step 2], then the kernel will panic. the right order of rollback should be: 'del tab trans' -> 'del set trans' -> 'add set trans'. which is opposite with commit_list order. so fix it by rolling back commits with reverse order in nf_tables_abort. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-12-13 22:47:32 +01:00
Pablo Neira Ayuso	bd678e09dc	netfilter: nfnetlink: fix splat due to incorrect socket memory accounting in skbuff clones If we attach the sk to the skb from nfnetlink_rcv_batch(), then netlink_skb_destructor() will underflow the socket receive memory counter and we get warning splat when releasing the socket. $ cat /proc/net/netlink sk Eth Pid Groups Rmem Wmem Dump Locks Drops Inode ffff8800ca903000 12 0 00000000 -54144 0 0 2 0 17942 ^^^^^^ Rmem above shows an underflow. And here below the warning splat: [ 1363.815976] WARNING: CPU: 2 PID: 1356 at net/netlink/af_netlink.c:958 netlink_sock_destruct+0x80/0xb9() [...] [ 1363.816152] CPU: 2 PID: 1356 Comm: kworker/u16:1 Tainted: G W 4.4.0-rc1+ #153 [ 1363.816155] Hardware name: LENOVO 23259H1/23259H1, BIOS G2ET32WW (1.12 ) 05/30/2012 [ 1363.816160] Workqueue: netns cleanup_net [ 1363.816163] 0000000000000000 ffff880119203dd0 ffffffff81240204 0000000000000000 [ 1363.816169] ffff880119203e08 ffffffff8104db4b ffffffff813d49a1 ffff8800ca771000 [ 1363.816174] ffffffff81a42b00 0000000000000000 ffff8800c0afe1e0 ffff880119203e18 [ 1363.816179] Call Trace: [ 1363.816181] <IRQ> [<ffffffff81240204>] dump_stack+0x4e/0x79 [ 1363.816193] [<ffffffff8104db4b>] warn_slowpath_common+0x9a/0xb3 [ 1363.816197] [<ffffffff813d49a1>] ? netlink_sock_destruct+0x80/0xb9 skb->sk was only needed to lookup for the netns, however we don't need this anymore since `633c9a840d` ("netfilter: nfnetlink: avoid recurrent netns lookups in call_batch") so this patch removes this manual socket assignment to resolve this problem. Reported-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com> Reported-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Tested-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>	2015-12-10 18:16:29 +01:00
Pablo Neira Ayuso	633c9a840d	netfilter: nfnetlink: avoid recurrent netns lookups in call_batch Pass the net pointer to the call_batch callback functions so we can skip recurrent lookups. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Tested-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>	2015-12-10 13:49:24 +01:00
Florian Westphal	9fb0b519c7	netfilter: nf_tables: fix nf_log_trace based tracing nf_log_trace() outputs bogus 'TRACE:' strings because I forgot to update the comments array. Fixes: `33d5a7b14b` ("netfilter: nf_tables: extend tracing infrastructure") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-12-09 16:53:46 +01:00
Rosen, Rami	23509fcd4e	netfilter: nfnetlink_log: Change setter functions to be void Change return type of nfulnl_set_timeout() and nfulnl_set_qthresh() to be void. This patch changes the return type of the static methods nfulnl_set_timeout() and nfulnl_set_qthresh() to be void, as there is no justification and no need for these methods to return int. Signed-off-by: Rami Rosen <rami.rosen@intel.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-12-09 14:52:56 +01:00
Nikolay Borisov	639e077b43	netfilter: nfnetlink_queue: Unregister pernet subsys in case of init failure Commit `3bfe049807` ("netfilter: nfnetlink_{log,queue}: Register pernet in first place") reorganised the initialisation order of the pernet_subsys to avoid "use-before-initialised" condition. However, in doing so the cleanup logic in nfnetlink_queue got botched in that the pernet_subsys wasn't cleaned in case nfnetlink_subsys_register failed. This patch adds the necessary cleanup routine call. Fixes: `3bfe049807` ("netfilter: nfnetlink_{log,queue}: Register pernet in first place") Signed-off-by: Nikolay Borisov <kernel@kyup.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-12-09 14:46:47 +01:00
Florian Westphal	e639f7ab07	netfilter: nf_tables: wrap tracing with a static key Only needed when meta nftrace rule(s) were added. The assumption is that no such rules are active, so the call to nft_trace_init is "never" needed. When nftrace rules are active, we always call the nft_trace_* functions, but will only send netlink messages when all of the following are true: - traceinfo structure was initialised - skb->nf_trace == 1 - at least one subscriber to trace group. Adding an extra conditional (static_branch ... && skb->nf_trace) nft_trace_init( ..) Is possible but results in a larger nft_do_chain footprint. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-12-09 13:23:13 +01:00
Florian Westphal	33d5a7b14b	netfilter: nf_tables: extend tracing infrastructure nft monitor mode can then decode and display this trace data. Parts of LL/Network/Transport headers are provided as separate attributes. Otherwise, printing IP address data becomes virtually impossible for userspace since in the case of the netdev family we really don't want userspace to have to know all the possible link layer types and/or sizes just to display/print an ip address. We also don't want userspace to have to follow ipv6 header chains to get the s/dport info, the kernel already did this work for us. To avoid bloating nft_do_chain all data required for tracing is encapsulated in nft_traceinfo. The structure is initialized unconditionally(!) for each nft_do_chain invocation. This unconditionall call will be moved under a static key in a followup patch. With lots of help from Patrick McHardy and Pablo Neira. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-12-09 13:18:37 +01:00
Tejun Heo	2a56a1fec2	net: wrap sock->sk_cgrp_prioidx and ->sk_classid inside a struct Introduce sock->sk_cgrp_data which is a struct sock_cgroup_data. ->sk_cgroup_prioidx and ->sk_classid are moved into it. The struct and its accessors are defined in cgroup-defs.h. This is to prepare for overloading the fields with a cgroup pointer. This patch mostly performs equivalent conversions but the followings are noteworthy. * Equality test before updating classid is removed from sock_update_classid(). This shouldn't make any noticeable difference and a similar test will be implemented on the helper side later. * sock_update_netprioidx() now takes struct sock_cgroup_data and can be moved to netprio_cgroup.h without causing include dependency loop. Moved. * The dummy version of sock_update_netprioidx() converted to a static inline function while at it. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-12-08 22:02:33 -05:00
Patrick McHardy	7ec3f7b47b	netfilter: nft_payload: add packet mangling support Add support for mangling packet payload. Checksum for the specified base header is updated automatically if requested, however no updates for any kind of pseudo headers are supported, meaning no stateless NAT is supported. For checksum updates different checksumming methods can be specified. The currently supported methods are NONE for no checksum updates, and INET for internet type checksums. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-11-25 13:54:51 +01:00
Philip Whineray	f13f2aeed1	netfilter: Set /proc/net entries owner to root in namespace Various files are owned by root with 0440 permission. Reading them is impossible in an unprivileged user namespace, interfering with firewall tools. For instance, iptables-save relies on /proc/net/ip_tables_names contents to dump only loaded tables. This patch assigned ownership of the following files to root in the current namespace: - /proc/net/_tables_names - /proc/net/_tables_matches - /proc/net/_tables_targets - /proc/net/nf_conntrack - /proc/net/nf_conntrack_expect - /proc/net/netfilter/nfnetlink_log A mapping for root must be available, so this order should be followed: unshare(CLONE_NEWUSER); / Setup the mapping */ unshare(CLONE_NEWNET); Signed-off-by: Philip Whineray <phil@firehol.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-11-25 13:54:09 +01:00
Arnd Bergmann	8e662164ab	netfilter: nfnetlink_queue: avoid harmless unnitialized variable warnings Several ARM default configurations give us warnings on recent compilers about potentially uninitialized variables in the nfnetlink code in two functions: net/netfilter/nfnetlink_queue.c: In function 'nfqnl_build_packet_message': net/netfilter/nfnetlink_queue.c:519:19: warning: 'nfnl_ct' may be used uninitialized in this function [-Wmaybe-uninitialized] if (ct && nfnl_ct->build(skb, ct, ctinfo, NFQA_CT, NFQA_CT_INFO) < 0) Moving the rcu_dereference(nfnl_ct_hook) call outside of the conditional code avoids the warning without forcing us to preinitialize the variable. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: `a4b4766c3c` ("netfilter: nfnetlink_queue: rename related to nfqueue attaching conntrack info") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-11-23 11:22:26 +01:00
Eric Dumazet	340c78e590	ipvs: use skb_to_full_sk() helper SYNACK packets might be attached to request sockets. Use skb_to_full_sk() helper to avoid illegal accesses to inet_sk(skb->sk) Fixes: `ca6fb06518` ("tcp: attach SYNACK messages to request sockets instead of listener") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Sander Eikelenboom <linux@eikelenboom.it> Acked-by: Julian Anastasov <ja@ssi.bg> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-15 18:39:48 -05:00
David S. Miller	382a483e53	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf Pablo Neira Ayuso: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for your net tree. This large batch that includes fixes for ipset, netfilter ingress, nf_tables dynamic set instantiation and a longstanding Kconfig dependency problem. More specifically, they are: 1) Add missing check for empty hook list at the ingress hook, from Florian Westphal. 2) Input and output interface are swapped at the ingress hook, reported by Patrick McHardy. 3) Resolve ipset extension alignment issues on ARM, patch from Jozsef Kadlecsik. 4) Fix bit check on bitmap in ipset hash type, also from Jozsef. 5) Release buckets when all entries have expired in ipset hash type, again from Jozsef. 6) Oneliner to initialize conntrack tuple object in the PPTP helper, otherwise the conntrack lookup may fail due to random bits in the structure holes, patch from Anthony Lineham. 7) Silence a bogus gcc warning in nfnetlink_log, from Arnd Bergmann. 8) Fix Kconfig dependency problems with TPROXY, socket and dup, also from Arnd. 9) Add __netdev_alloc_pcpu_stats() to allow creating percpu counters from atomic context, this is required by the follow up fix for nf_tables. 10) Fix crash from the dynamic set expression, we have to add new clone operation that should be defined when a simple memcpy is not enough. This resolves a crash when using per-cpu counters with new Patrick McHardy's flow table nft support. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-12 14:17:16 -05:00
Pablo Neira Ayuso	086f332167	netfilter: nf_tables: add clone interface to expression operations With the conversion of the counter expressions to make it percpu, we need to clone the percpu memory area, otherwise we crash when using counters from flow tables. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-11-10 23:47:32 +01:00
Arnd Bergmann	74ec4d55c4	netfilter: fix xt_TEE and xt_TPROXY dependencies Kconfig is too smart for its own good: a Kconfig line that states select NF_DEFRAG_IPV6 if IP6_NF_IPTABLES means that if IP6_NF_IPTABLES is set to 'm', then NF_DEFRAG_IPV6 will also be set to 'm', regardless of the state of the symbol from which it is selected. When the xt_TEE driver is built-in and nothing else forces NF_DEFRAG_IPV6 to be built-in, this causes a link-time error: net/built-in.o: In function `tee_tg6': net/netfilter/xt_TEE.c:46: undefined reference to `nf_dup_ipv6' This works around that behavior by changing the dependency to 'if IP6_NF_IPTABLES != n', which is interpreted as boolean expression rather than a tristate and causes the NF_DEFRAG_IPV6 symbol to be built-in as well. The bug only occurs once in thousands of 'randconfig' builds and does not really impact real users. From inspecting the other surrounding Kconfig symbols, I am guessing that NETFILTER_XT_TARGET_TPROXY and NETFILTER_XT_MATCH_SOCKET have the same issue. If not, this change should still be harmless. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-11-10 23:46:57 +01:00
Arnd Bergmann	c872a2d9e3	netfilter: nfnetlink_log: work around uninitialized variable warning After a recent (correct) change, gcc started warning about the use of the 'flags' variable in nfulnl_recv_config() net/netfilter/nfnetlink_log.c: In function 'nfulnl_recv_config': net/netfilter/nfnetlink_log.c:320:14: warning: 'flags' may be used uninitialized in this function [-Wmaybe-uninitialized] net/netfilter/nfnetlink_log.c:828:6: note: 'flags' was declared here The warning first shows up in ARM s3c2410_defconfig with gcc-4.3 or higher (including 5.2.1, which is the latest version I checked) I tried working around it by rearranging the code but had no success with that. As a last resort, this initializes the variable to zero, which shuts up the warning, but means that we don't get a warning if the code is ever changed in a way that actually causes the variable to be used without first being written. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: `8cbc870829` ("netfilter: nfnetlink_log: validate dependencies to avoid breaking atomicity") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-11-10 23:46:36 +01:00
Eric Dumazet	3aed822591	netfilter: nft_meta: use skb_to_full_sk() helper SYNACK packets might be attached to request sockets. Fixes: `ca6fb06518` ("tcp: attach SYNACK messages to request sockets instead of listener") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-08 20:56:39 -05:00
Eric Dumazet	fdd723e2a8	netfilter: xt_owner: use skb_to_full_sk() helper SYNACK packets might be attached to a request socket, xt_owner wants to gte the listener in this case. Fixes: `ca6fb06518` ("tcp: attach SYNACK messages to request sockets instead of listener") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-08 20:56:39 -05:00
Jozsef Kadlecsik	0aae24eb40	netfilter: ipset: Fix hash type expire: release empty hash bucket block When all entries are expired/all slots are empty, release the bucket. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2015-11-07 11:28:49 +01:00
Jozsef Kadlecsik	e9dfdc052d	netfilter: ipset: Fix hash:* type expiration Incorrect index was used when the data blob was shrinked at expiration, which could lead to falsely expired entries and memory leak when the comment extension was used too. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2015-11-07 11:23:34 +01:00
Jozsef Kadlecsik	95ad1f4a93	netfilter: ipset: Fix extension alignment The data extensions in ipset lacked the proper memory alignment and thus could lead to kernel crash on several architectures. Therefore the structures have been reorganized and alignment attributes added where needed. The patch was tested on armv7h by Gerhard Wiesinger and on x86_64, sparc64 by Jozsef Kadlecsik. Reported-by: Gerhard Wiesinger <lists@wiesinger.com> Tested-by: Gerhard Wiesinger <lists@wiesinger.com> Tested-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>	2015-11-07 11:21:47 +01:00
David S. Miller	d9c7dbc11a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf Conflicts: net/netfilter/xt_TEE.c Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for your net tree, they are: 1) Fix crash when TEE target is used with no --oif, from Eric Dumazet. 2) Oneliner to fix a crash on the redirect traffic to localhost infrastructure when interface has not yet an address, from Munehisa Kamata. 3) Oneliner not to request module all the time from nfnetlink due to wrong type value, from Florian Westphal. I'll make sure these patches 1 and 2 hit -stable. ==================== The conflict in net/netfilter/xt_TEE.c was minor, a change to the 'oif' selection overlapping a function signature change for the nf_dup_ipv{4,6}() routines. Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-04 20:47:50 -05:00
Florian Westphal	dbc3617f4c	netfilter: nfnetlink: don't probe module if it exists nfnetlink_bind request_module()s all the time as nfnetlink_get_subsys() shifts the argument by 8 to obtain the subsys id. So using type instead of type << 8 always returns NULL. Fixes: `03292745b0` ("netlink: add nlk->netlink_bind hook for module auto-loading") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-10-28 03:40:50 +01:00
Munehisa Kamata	94f9cd8143	netfilter: nf_nat_redirect: add missing NULL pointer check Commit `8b13eddfdf` ("netfilter: refactor NAT redirect IPv4 to use it from nf_tables") has introduced a trivial logic change which can result in the following crash. BUG: unable to handle kernel NULL pointer dereference at 0000000000000030 IP: [<ffffffffa033002d>] nf_nat_redirect_ipv4+0x2d/0xa0 [nf_nat_redirect] PGD 3ba662067 PUD 3ba661067 PMD 0 Oops: 0000 [#1] SMP Modules linked in: ipv6(E) xt_REDIRECT(E) nf_nat_redirect(E) xt_tcpudp(E) iptable_nat(E) nf_conntrack_ipv4(E) nf_defrag_ipv4(E) nf_nat_ipv4(E) nf_nat(E) nf_conntrack(E) ip_tables(E) x_tables(E) binfmt_misc(E) xfs(E) libcrc32c(E) evbug(E) evdev(E) psmouse(E) i2c_piix4(E) i2c_core(E) acpi_cpufreq(E) button(E) ext4(E) crc16(E) jbd2(E) mbcache(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E) CPU: 0 PID: 2536 Comm: ip Tainted: G E 4.1.7-15.23.amzn1.x86_64 #1 Hardware name: Xen HVM domU, BIOS 4.2.amazon 05/06/2015 task: ffff8800eb438000 ti: ffff8803ba664000 task.ti: ffff8803ba664000 [...] Call Trace: <IRQ> [<ffffffffa0334065>] redirect_tg4+0x15/0x20 [xt_REDIRECT] [<ffffffffa02e2e99>] ipt_do_table+0x2b9/0x5e1 [ip_tables] [<ffffffffa0328045>] iptable_nat_do_chain+0x25/0x30 [iptable_nat] [<ffffffffa031777d>] nf_nat_ipv4_fn+0x13d/0x1f0 [nf_nat_ipv4] [<ffffffffa0328020>] ? iptable_nat_ipv4_fn+0x20/0x20 [iptable_nat] [<ffffffffa031785e>] nf_nat_ipv4_in+0x2e/0x90 [nf_nat_ipv4] [<ffffffffa03280a5>] iptable_nat_ipv4_in+0x15/0x20 [iptable_nat] [<ffffffff81449137>] nf_iterate+0x57/0x80 [<ffffffff814491f7>] nf_hook_slow+0x97/0x100 [<ffffffff814504d4>] ip_rcv+0x314/0x400 unsigned int nf_nat_redirect_ipv4(struct sk_buff *skb, ... { ... rcu_read_lock(); indev = __in_dev_get_rcu(skb->dev); if (indev != NULL) { ifa = indev->ifa_list; newdst = ifa->ifa_local; <--- } rcu_read_unlock(); ... } Before the commit, 'ifa' had been always checked before access. After the commit, however, it could be accessed even if it's NULL. Interestingly, this was once fixed in 2003. http://marc.info/?l=netfilter-devel&m=106668497403047&w=2 In addition to the original one, we have seen the crash when packets that need to be redirected somehow arrive on an interface which hasn't been yet fully configured. This change just reverts the logic to the old behavior to avoid the crash. Fixes: `8b13eddfdf` ("netfilter: refactor NAT redirect IPv4 to use it from nf_tables") Signed-off-by: Munehisa Kamata <kamatam@amazon.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-10-27 06:54:56 +01:00
David S. Miller	ba3e2084f2	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: net/ipv6/xfrm6_output.c net/openvswitch/flow_netlink.c net/openvswitch/vport-gre.c net/openvswitch/vport-vxlan.c net/openvswitch/vport.c net/openvswitch/vport.h The openvswitch conflicts were overlapping changes. One was the egress tunnel info fix in 'net' and the other was the vport ->send() op simplification in 'net-next'. The xfrm6_output.c conflicts was also a simplification overlapping a bug fix. Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-24 06:54:12 -07:00
Eric Dumazet	45efccdbec	netfilter: xt_TEE: fix NULL dereference iptables -I INPUT ... -j TEE --gateway 10.1.2.3 <crash> because --oif was not specified tee_tg_check() sets ->priv pointer to NULL in this case. Fixes: `bbde9fc182` ("netfilter: factor out packet duplication for IPv4/IPv6") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-10-22 12:55:25 +02:00
Pablo Neira Ayuso	f0a0a978b6	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next This merge resolves conflicts with `75aec9df3a` ("bridge: Remove br_nf_push_frag_xmit_sk") as part of Eric Biederman's effort to improve netns support in the network stack that reached upstream via David's net-next tree. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Conflicts: net/bridge/br_netfilter_hooks.c	2015-10-17 14:28:03 +02:00
Nikolay Borisov	00db674bed	netfilter: ipset: Fix sleeping memory allocation in atomic context Commit `00590fdd5b` introduced RCU locking in list type and in doing so introduced a memory allocation in list_set_add, which is done in an atomic context, due to the fact that ipset rcu list modifications are serialised with a spin lock. The reason why we can't use a mutex is that in addition to modifying the list with ipset commands, it's also being modified when a particular ipset rule timeout expires aka garbage collection. This gc is triggered from set_cleanup_entries, which in turn is invoked from a timer thus requiring the lock to be bh-safe. Concretely the following call chain can lead to "sleeping function called in atomic context" splat: call_ad -> list_set_uadt -> list_set_uadd -> kzalloc(, GFP_KERNEL). And since GFP_KERNEL allows initiating direct reclaim thus potentially sleeping in the allocation path. To fix the issue change the allocation type to GFP_ATOMIC, to correctly reflect that it is occuring in an atomic context. Fixes: `00590fdd5b` ("netfilter: ipset: Introduce RCU locking in list type") Signed-off-by: Nikolay Borisov <kernel@kyup.com> Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-10-17 13:01:24 +02:00
Florian Westphal	81b4325eba	netfilter: nf_queue: remove rcu_read_lock calls All verdict handlers make use of the nfnetlink .call_rcu callback so rcu readlock is already held. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-10-16 18:22:41 +02:00
Florian Westphal	ed78d09d59	netfilter: make nf_queue_entry_get_refs return void We don't care if module is being unloaded anymore since hook unregister handling will destroy queue entries using that hook. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-10-16 18:22:23 +02:00
Florian Westphal	2ffbceb2b0	netfilter: remove hook owner refcounting since commit `8405a8fff3` ("netfilter: nf_qeueue: Drop queue entries on nf_unregister_hook") all pending queued entries are discarded. So we can simply remove all of the owner handling -- when module is removed it also needs to unregister all its hooks. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-10-16 18:21:39 +02:00
Pablo Neira	8cbc870829	netfilter: nfnetlink_log: validate dependencies to avoid breaking atomicity Check that dependencies are fulfilled before updating the logger instance, otherwise we can leave things in intermediate state on errors in nfulnl_recv_config(). [ Ken-ichirou reports that this is also fixing missing instance refcnt drop on error introduced in his patch `914eebf2f4` ("netfilter: nfnetlink_log: autoload nf_conntrack_netlink module NFQA_CFG_F_CONNTRACK config flag"). ] Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Tested-by: Ken-ichirou MATSUZAWA <chamaken@gmail.com>	2015-10-15 06:45:03 +02:00
Pablo Neira Ayuso	336a3b3ee9	netfilter: nfnetlink_log: consolidate check for instance in nfulnl_recv_config() This patch consolidates the check for valid logger instance once we have passed the command handling: The config message that we receive may contain the following info: 1) Command only: We always get a valid instance pointer if we just created it. In case that the instance is being destroyed or the command is unknown, we jump to exit path of nfulnl_recv_config(). This patch doesn't modify this handling. 2) Config only: In this case, the instance must always exist since the user is asking for configuration updates. If the instance doesn't exist this returns -ENODEV. 3) No command and no configs are specified: This case is rare. The user is sending us a config message with neither commands nor config options. In this case, we have to check if the instance exists and bail out otherwise. Before this patch, it was possible to send a config message with no command and no config updates for an unexisting instance without triggering an error. So this is the only case that changes. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Tested-by: Ken-ichirou MATSUZAWA <chamaken@gmail.com>	2015-10-15 06:44:31 +02:00
Florian Westphal	514ed62ed3	netfilter: sync with packet rx also after removing queue entries We need to sync packet rx again after flushing the queue entries. Otherwise, the following race could happen: cpu1: nf_unregister_hook(H) called, H unliked from lists, calls synchronize_net() to wait for packet rx completion. Problem is that while no new nf_queue_entry structs that use H can be allocated, another CPU might receive a verdict from userspace just before cpu1 calls nf_queue_nf_hook_drop to remove this entry: cpu2: receive verdict from userspace, lock queue cpu2: unlink nf_queue_entry struct E, which references H, from queue list cpu1: calls nf_queue_nf_hook_drop, blocks on queue spinlock cpu2: unlock queue cpu1: nf_queue_nf_hook_drop drops affected queue entries cpu2: call nf_reinject for E cpu1: kfree(H) cpu2: potential use-after-free for H Cc: Eric W. Biederman <ebiederm@xmission.com> Fixes: `085db2c045` ("netfilter: Per network namespace netfilter hooks.") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-10-13 13:59:56 +02:00
Florian Westphal	7ceebfe46e	netfilter: nfqueue: don't use prev pointer Usage of -prev seems buggy. While packet was out our hook cannot be removed but we have no way to know if the previous one is still valid. So better not use ->prev at all. Since NF_REPEAT just asks to invoke same hook function again, just do so, and continue with nf_interate if we get an ACCEPT verdict. A side effect of this change is that if nf_reinject(NF_REPEAT) causes another REPEAT we will now drop the skb instead of a kernel loop. However, NF_REPEAT loops would be a bug so this should not happen anyway. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-10-13 12:03:24 +02:00
Eric W. Biederman	19bcf9f203	ipv4: Pass struct net into ip_defrag and ip_check_defrag The function ip_defrag is called on both the input and the output paths of the networking stack. In particular conntrack when it is tracking outbound packets from the local machine calls ip_defrag. So add a struct net parameter and stop making ip_defrag guess which network namespace it needs to defragment packets in. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-12 19:44:16 -07:00
Ken-ichirou MATSUZAWA	914eebf2f4	netfilter: nfnetlink_log: autoload nf_conntrack_netlink module NFQA_CFG_F_CONNTRACK config flag This patch enables to load nf_conntrack_netlink module if NFULNL_CFG_F_CONNTRACK config flag is specified. Signed-off-by: Ken-ichirou MATSUZAWA <chamas@h4.dion.ne.jp> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-10-12 21:44:12 +02:00
Pablo Neira Ayuso	d53195c259	Merge tag 'ipvs4-for-v4.4' of https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next Simon Horman says: ==================== Fourth Round of IPVS Updates for v4.4 please consider these build warning cleanups from David Ahern and myself. They resolve some minor side effects of Eric Biederman' heroic work to cleanup IPVS which you recently pulled: its queued up for v4.4 so no need to worry about earlier kernel versions. ==================== Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-10-12 17:38:54 +02:00
Pablo Neira Ayuso	4302f5eeb9	nfnetlink_cttimeout: add rcu_barrier() on module removal Make sure kfree_rcu() released objects before leaving the module removal exit path. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-10-12 17:04:41 +02:00
Pablo Neira Ayuso	ae2d708ed8	netfilter: conntrack: fix crash on timeout object removal The object and module refcounts are updated for each conntrack template, however, if we delete the iptables rules and we flush the timeout database, we may end up with invalid references to timeout object that are just gone. Resolve this problem by setting the timeout reference to NULL when the custom timeout entry is removed from our base. This patch requires some RCU trickery to ensure safe pointer handling. This handling is similar to what we already do with conntrack helpers, the idea is to avoid bumping the timeout object reference counter from the packet path to avoid the cost of atomic ops. Reported-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-10-12 17:04:34 +02:00
Pablo Neira Ayuso	403d89ad9c	netfilter: xt_CT: don't put back reference to timeout policy object On success, this shouldn't put back the timeout policy object, otherwise we may have module refcount overflow and we allow deletion of timeout that are still in use. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-10-12 16:54:45 +02:00
Yaowei Bai	875e082949	net/nfnetlink: lockdep_nfnl_is_held can be boolean This patch makes lockdep_nfnl_is_held return bool to improve readability due to this particular function only using either one or zero as its return value. No functional change. Signed-off-by: Yaowei Bai <bywxiaobai@163.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-09 07:49:00 -07:00
Eric W. Biederman	33224b16ff	ipv4, ipv6: Pass net into ip_local_out and ip6_local_out Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 04:27:02 -07:00
Eric W. Biederman	792883303c	ipv6: Merge ip6_local_out and ip6_local_out_sk Stop hidding the sk parameter with an inline helper function and make all of the callers pass it, so that it is clear what the function is doing. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 04:26:58 -07:00
Eric W. Biederman	e2cb77db08	ipv4: Merge ip_local_out and ip_local_out_sk It is confusing and silly hiding a parameter so modify all of the callers to pass in the appropriate socket or skb->sk if no socket is known. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 04:26:57 -07:00
Eric W. Biederman	13206b6bff	net: Pass net into dst_output and remove dst_output_okfn Replace dst_output_okfn with dst_output Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-08 04:26:54 -07:00
Simon Horman	92240e8dc0	ipvs: Remove possibly unused variables from ip_vs_conn_net_{init,cleanup} If CONFIG_PROC_FS is undefined then the arguments of proc_create() and remove_proc_entry() are unused. As a result the net variables of ip_vs_conn_net_{init,cleanup} are unused. net/netfilter/ipvs//ip_vs_conn.c: In function ‘ip_vs_conn_net_init’: net/netfilter/ipvs//ip_vs_conn.c:1350:14: warning: unused variable ‘net’ [-Wunused-variable] net/netfilter/ipvs//ip_vs_conn.c: In function ‘ip_vs_conn_net_cleanup’: net/netfilter/ipvs//ip_vs_conn.c:1361:14: warning: unused variable ‘net’ [-Wunused-variable] ... Resolve this by dereferencing net as needed rather than storing it in a variable. Fixes: `3d99376689` ("ipvs: Pass ipvs not net into ip_vs_control_net_(init\|cleanup)") Signed-off-by: Simon Horman <horms@verge.net.au> Acked-by: Julian Anastasov <ja@ssi.bg>	2015-10-07 10:12:00 +09:00
David Ahern	ed1c9f0e78	ipvs: Remove possibly unused variable from ip_vs_out Eric's net namespace changes in `1b75097dd7` leaves net unreferenced if CONFIG_IP_VS_IPV6 is not enabled: ../net/netfilter/ipvs/ip_vs_core.c: In function ‘ip_vs_out’: ../net/netfilter/ipvs/ip_vs_core.c:1177:14: warning: unused variable ‘net’ [-Wunused-variable] After the net refactoring there is only 1 user; push the reference to the 1 user. While the line length slightly exceeds 80 it seems to be the best change. Fixes: 1b75097dd7a26("ipvs: Pass ipvs into ip_vs_out") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Acked-by: Julian Anastasov <ja@ssi.bg> [horms: updated subject] Signed-off-by: Simon Horman <horms@verge.net.au>	2015-10-07 10:11:59 +09:00
Ken-ichirou MATSUZAWA	a29a9a585b	netfilter: nfnetlink_log: allow to attach conntrack This patch enables to include the conntrack information together with the packet that is sent to user-space via NFLOG, then a user-space program can acquire NATed information by this NFULA_CT attribute. Including the conntrack information is optional, you can set it via NFULNL_CFG_F_CONNTRACK flag with the NFULA_CFG_FLAGS attribute like NFQUEUE. Signed-off-by: Ken-ichirou MATSUZAWA <chamas@h4.dion.ne.jp> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-10-05 17:32:14 +02:00
Ken-ichirou MATSUZAWA	224a05975e	netfilter: ctnetlink: add const qualifier to nfnl_hook.get_ct get_ct as is and will not update its skb argument, and users of nfnl_ct_hook is currently only nfqueue, we can add const qualifier. Signed-off-by: Ken-ichirou MATSUZAWA <chamas@h4.dion.ne.jp>	2015-10-05 17:32:13 +02:00
Ken-ichirou MATSUZAWA	83f3e94d34	netfilter: Kconfig rename QUEUE_CT to GLUE_CT Conntrack information attaching infrastructure is now generic and update it's name to use `glue' in previous patch. This patch updates Kconfig symbol name and adding NF_CT_NETLINK dependency. Signed-off-by: Ken-ichirou MATSUZAWA <chamas@h4.dion.ne.jp> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-10-05 17:32:12 +02:00
Ken-ichirou MATSUZAWA	a4b4766c3c	netfilter: nfnetlink_queue: rename related to nfqueue attaching conntrack info The idea of this series of patch is to attach conntrack information to nflog like nfqueue has already done. nfqueue conntrack info attaching basis is generic, rename those names to generic one, glue. Signed-off-by: Ken-ichirou MATSUZAWA <chamas@h4.dion.ne.jp> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-10-05 17:32:11 +02:00
Pablo Neira Ayuso	b28b1e826f	netfilter: nfnetlink_queue: use y2038 safe timestamp The __build_packet_message function fills a nfulnl_msg_packet_timestamp structure that uses 64-bit seconds and is therefore y2038 safe, but it uses an intermediate 'struct timespec' which is not. This trivially changes the code to use 'struct timespec64' instead, to correct the result on 32-bit architectures. This is a copy and paste of Arnd's original patch for nfnetlink_log. Suggested-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-10-05 17:27:25 +02:00
Pablo Neira Ayuso	2b5b1a01a7	Merge tag 'ipvs3-for-v4.4' of https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next Simon Horman says: ==================== Third Round of IPVS Updates for v4.4 please consider this build fix from Eric Biederman which resolves a build problem introduced in is excellent work to cleanup IPVS which you recently pulled: its queued up for v4.4 so no need to worry about earlier kernel versions. I have another minor cleanup, to fix a build warning, pending. However, I wanted to send this one to you now as its hit nf-next, net-next and in turn next, and a slow trickle of bug reports are appearing. ==================== Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-10-05 17:27:06 +02:00
Arnd Bergmann	f6389ecbc5	nfnetlink: use y2038 safe timestamp The __build_packet_message function fills a nfulnl_msg_packet_timestamp structure that uses 64-bit seconds and is therefore y2038 safe, but it uses an intermediate 'struct timespec' which is not. This trivially changes the code to use 'struct timespec64' instead, to correct the result on 32-bit architectures. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Patrick McHardy <kaber@trash.net> Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Cc: netfilter-devel@vger.kernel.org Cc: coreteam@netfilter.org Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 03:16:47 -07:00
Pablo Neira Ayuso	32f40c5fa7	netfilter: rename nfnetlink_queue_core.c to nfnetlink_queue.c Now that we have integrated the ct glue code into nfnetlink_queue without introducing dependencies with the conntrack code. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-10-04 21:45:44 +02:00
Pablo Neira Ayuso	b7bd1809e0	netfilter: nfnetlink_queue: get rid of nfnetlink_queue_ct.c The original intention was to avoid dependencies between nfnetlink_queue and conntrack without ifdef pollution. However, we can achieve this by moving the conntrack dependent code into ctnetlink and keep some glue code to access the nfq_ct indirection from nfqueue. After this patch, the nfq_ct indirection is always compiled in the netfilter core to avoid polluting nfqueue with ifdefs. Thus, if nf_conntrack is not compiled this results in only 8-bytes of memory waste in x86_64. This patch also adds ctnetlink_nfqueue_seqadj() to avoid that the nf_conn structure layout if exposed to nf_queue, which creates another dependency with nf_conntrack at compilation time. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-10-04 21:45:44 +02:00
Eric W. Biederman	b59f2e31b8	ipvs: Don't protect ip_vs_addr_is_unicast with CONFIG_SYSCTL I arranged the code so that the compiler can remove the unecessary bits in ip_vs_leave when CONFIG_SYSCTL is unset, and removed an explicit CONFIG_SYSCTL. Unfortunately when rebasing my work on top of that of Alex Gartrell I missed the fact that the newly added function ip_vs_addr_is_unicast was surrounded by CONFIG_SYSCTL. So remove the now unnecessary CONFIG_SYSCTL guards around ip_vs_addr_is_unicast. It is causing build failures today when CONFIG_SYSCTL is not selected and any self respecting compiler will notice that sysctl_cache_bypass is always false without CONFIG_SYSCTL and not include the logic from the function ip_vs_addr_is_unicast in the compiled code. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-10-01 15:05:15 +09:00
David S. Miller	4bf1b54f9d	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following pull request contains Netfilter/IPVS updates for net-next containing 90 patches from Eric Biederman. The main goal of this batch is to avoid recurrent lookups for the netns pointer, that happens over and over again in our Netfilter/IPVS code. The idea consists of passing netns pointer from the hook state to the relevant functions and objects where this may be needed. You can find more information on the IPVS updates from Simon Horman's commit merge message: `c3456026ad` ("Merge tag 'ipvs2-for-v4.4' of https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next"). Exceptionally, this time, I'm not posting the patches again on netdev, Eric already Cc'ed this mailing list in the original submission. If you need me to make, just let me know. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-29 21:46:21 -07:00
Eric W. Biederman	5f5d74d723	ipv6: Pass struct net into ip6_route_me_harder Don't make ip6_route_me_harder guess which network namespace it is routing in, pass the network namespace in. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-09-29 20:21:32 +02:00
Eric W. Biederman	e45f50660e	ipv4: Pass struct net into ip_route_me_harder Don't make ip_route_me_harder guess which network namespace it is routing in, pass the network namespace in. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-09-29 20:21:32 +02:00
Eric W. Biederman	d815d90bbb	netfilter: Push struct net down into nf_afinfo.reroute The network namespace is needed when routing a packet. Stop making nf_afinfo.reroute guess which network namespace is the proper namespace to route the packet in. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-09-29 20:21:31 +02:00
Eric W. Biederman	372892ec11	ipv4: Push struct net down into nf_send_reset This is needed so struct net can be pushed down into ip_route_me_harder. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-09-29 20:21:31 +02:00
David S. Miller	4963ed48f2	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: net/ipv4/arp.c The net/ipv4/arp.c conflict was one commit adding a new local variable while another commit was deleting one. Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-26 16:08:27 -07:00
Eric W. Biederman	57781c1cee	ipvs: Pass ipvs into ip_vs_gather_frags This will be needed later when the network namespace guessing is removed from ip_defrag. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:43 +09:00
Eric W. Biederman	9cfdd75b7c	ipvs: Remove skb_sknet This function adds no real value and it obscures what the code is doing. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:43 +09:00
Eric W. Biederman	7d1f88eca0	ipvs: Pass ipvs not net to ip_vs_protocol_net_(init\|cleanup) Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:43 +09:00
Eric W. Biederman	69f390934b	ipvs: Remove net argument from ip_vs_tcp_conn_listen The argument is unnecessary and in practice confusing, and has caused the callers to do all manner of silly things. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:43 +09:00
Eric W. Biederman	a43d1a6b97	ipvs: Pass ipvs through ip_vs_route_me_harder into sysctl_snat_reroute This removes the need to use the hack skb_net. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:43 +09:00
Eric W. Biederman	7b5f689a2c	ipvs: Pass ipvs into ip_vs_out_icmp and ip_vs_out_icmp_v6 This removes the need to compute ipvs with the hack "net_ipvs(skb_net(skb))" Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:43 +09:00
Eric W. Biederman	6f2bcea991	ipvs: Pass ipvs into ip_vs_in_icmp and ip_vs_in_icmp_v6 With ipvs passed into ip_vs_in_icmp and ip_vs_in_icmp_v6 they no longer need to call the hack that is skb_net. Additionally ipvs_in_icmp no longer needs to call dev_net(skb->dev) and can use the ipvs->net instead. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:43 +09:00
Eric W. Biederman	6e385bb3ef	ipvs: Pass ipvs into ip_vs_in Derive ipvs from state->net in the callers of ip_vs_in and pass it into ip_vs_out. Removing the need to use the hack skb_net. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:43 +09:00
Eric W. Biederman	1b75097dd7	ipvs: Pass ipvs into ip_vs_out Derive ipvs from state->net in the callers of ip_vs_out and pass it into ip_vs_out. Removing the need to use the hack skb_net. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:42 +09:00
Eric W. Biederman	2300f0451e	ipvs: Pass ipvs not net into sysctl_nat_icmp_send Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:42 +09:00
Eric W. Biederman	51efbcbbb2	ipvs: Simplify ipvs and net access in ip_vs_leave Stop using the hack skb_net(skb) to compute the network namespace. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:42 +09:00
Eric W. Biederman	5703294874	ipvs: Wrap sysctl_cache_bypass and remove ifdefs in ip_vs_leave With sysctl_cache_bypass now a compile time constant the compiler can figue out that it can elimiate all of the code that depends on sysctl_cache_bypass being true. Also remove the duplicate computation of net previously necessitated by #ifdef CONFIG_SYSCTL Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:42 +09:00
Eric W. Biederman	7c08d78e6f	ipvs: Better derivation of ipvs in ip_vs_in_stats and ip_vs_out_stats Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:42 +09:00
Eric W. Biederman	20868a40d0	ipvs: Pass ipvs into ensure_mtu_is adequate This allows two different ways for computing/guessing net to be removed from ensure_mtu_is_adequate. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:42 +09:00
Eric W. Biederman	f5745f8ae6	ipvs: Pass ipvs into __ip_vs_get_out_rt_v6 Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:42 +09:00
Eric W. Biederman	ecfe87b884	ipvs: Pass ipvs into __ip_vs_get_out_rt Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:41 +09:00
Eric W. Biederman	361c3f5293	ipvs: Better derivation of ipvs in ip_vs_tunnel_xmit Don't use "net_ipvs(skb_net(skb))" as skb_net is a bad hack. Instead use cp->ipvs and ipvs->net for the net. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:41 +09:00
Eric W. Biederman	d8f44c335a	ipvs: Pass ipvs into .conn_schedule and ip_vs_try_to_schedule This moves the hack "net_ipvs(skb_net(skb))" up one level where it will be easier to remove. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:41 +09:00
Eric W. Biederman	2f3edc6a5b	ipvs: Pass ipvs not net into ip_vs_conn_net_init and ip_vs_conn_net_cleanup Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:41 +09:00
Eric W. Biederman	d889717aaf	ipvs: Pass ipvs not net into ip_vs_conn_net_flush Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:41 +09:00
Eric W. Biederman	754b81a357	ipvs: Pass ipvs not net to ip_vs_conn_hashkey Use the address of struct netns_ipvs in the hash not the address of struct net. Both addresses are equally valid candidates and by using the address of struct netns_ipvs there becomes no need deal with struct net in this part of the code. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:41 +09:00
Eric W. Biederman	0cf705c8c2	ipvs: Pass ipvs into conn_out_get Move the hack of relying on "net_ipvs(skb_net(skb))" to derive the ipvs up a layer. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:41 +09:00
Eric W. Biederman	ab16197642	ipvs: Pass ipvs into .conn_in_get and ip_vs_conn_in_get_proto Stop relying on "net_ipvs(skb_net(skb))" to derive the ipvs as skb_net is a hack. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:41 +09:00
Eric W. Biederman	f5099dd4d9	ipvs: Pass ipvs into ip_vs_conn_fill_param_proto Move the ugly hack net_ipvs(skb_net(skb)) up a layer in the call stack so it is easier to remove. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:40 +09:00
Eric W. Biederman	1281a9c2d1	ipvs: Pass ipvs not net into init_netns and exit_netns Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:40 +09:00
Eric W. Biederman	c70bd6800a	ipvs: Pass ipvs not net into [un]register_ip_vs_proto_netns Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:40 +09:00
Eric W. Biederman	b5dd212cc1	ipvs: Pass ipvs not net into ip_vs_app_net_init and ip_vs_app_net_cleanup Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:40 +09:00
Eric W. Biederman	09858708e6	ipvs: Pass ipvs not net into ip_vs_app_inc_release Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:40 +09:00
Eric W. Biederman	9f8128a56e	ipvs: Pass ipvs not net to register_ip_vs_app and unregister_ip_vs_app Also move the tests for net_ipvs being NULL into __ip_vs_ftp_init and __ip_vs_ftp_exit. The only places where they possibly make sense. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:40 +09:00
Eric W. Biederman	3250dc9c52	ipvs: Pass ipvs not net to register_ip_vs_app_inc Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:40 +09:00
Eric W. Biederman	a080ce38a0	ipvs: Pass ipvs not net into ip_vs_app_inc_new Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:40 +09:00
Eric W. Biederman	19648918fb	ipvs: Pass ipvs not net into register_app and unregister_app Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:39 +09:00
Eric W. Biederman	a4dd0360c6	ipvs: Pass ipvs not net to ip_vs_estimator_net_init and ip_vs_estimator_cleanup Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:39 +09:00
Eric W. Biederman	70a131a2c8	ipvs: Pass ipvs not net to estimation_timer Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:39 +09:00
Eric W. Biederman	3d99376689	ipvs: Pass ipvs not net into ip_vs_control_net_(init\|cleanup) Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:39 +09:00
Eric W. Biederman	8b8237a581	ipvs: Pass ipvs not net to ip_vs_control_net_(init\|cleanup)_sysctl Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:39 +09:00
Eric W. Biederman	423b55954d	ipvs: Pass ipvs not net to ip_vs_random_drop_entry Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:39 +09:00
Eric W. Biederman	0f34d54bf4	ipvs: Pass ipvs not net to ip_vs_start_estimator aned ip_vs_stop_estimator Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:39 +09:00
Eric W. Biederman	cacd1e60f1	ipvs: Pass ipvs not net to ip_vs_genl_set_config Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:39 +09:00
Eric W. Biederman	ebea1f7c0b	ipvs: Pass ipvs not net to ip_vs_sync_net_cleanup Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:38 +09:00
Eric W. Biederman	802cb43703	ipvs: Pass ipvs not net to ip_vs_sync_net_init Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:38 +09:00
Eric W. Biederman	1fc12004d2	ipvs: Pass ipvs not net to ip_vs_proc_sync_conn Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:38 +09:00
Eric W. Biederman	4f30665bac	ipvs: Pass ipvs not net to ip_vs_proc_conn Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:38 +09:00
Eric W. Biederman	b61a8c1a40	ipvs: Pass ipvs not net to ip_vs_sync_conn Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:38 +09:00
Eric W. Biederman	72e9481e28	ipvs: Pass ipvs not net to ip_vs_sync_conn_v0 Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:38 +09:00
Eric W. Biederman	7d537f3ab7	ipvs: Pass ipvs not net to ip_vs_process_message Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:38 +09:00
Eric W. Biederman	37b68e6ded	ipvs: Store ipvs not net in struct ip_vs_sync_thread_data In practice struct netns_ipvs is as meaningful as struct net and more useful as it holds the ipvs specific data. So store a pointer to struct netns_ipvs. Update the accesses of tinfo->net to access tinfo->ipvs->net instead. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:38 +09:00
Eric W. Biederman	fd124e2f8b	ipvs: Pass ipvs not net to make_receive_sock Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:37 +09:00
Eric W. Biederman	68c76b6aa0	ipvs: Pass ipvs not net to make_send_sock Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:37 +09:00
Eric W. Biederman	b3cf3cbfb5	ipvs: Pass ipvs not net to stop_sync_thread Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:37 +09:00
Eric W. Biederman	6ac121d710	ipvs: Pass ipvs not net to start_sync_thread Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:37 +09:00
Eric W. Biederman	df04ffb766	ipvs: Pass ipvs not net to ip_vs_genl_del_daemon Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:37 +09:00
Eric W. Biederman	d8443c5f2b	ipvs: Pass ipvs not net to ip_vs_genl_new_daemon Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:37 +09:00
Eric W. Biederman	34c2f5146c	ipvs: Pass ipvs not net to ip_vs_genl_find_service Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:37 +09:00
Eric W. Biederman	613fb830b7	ipvs: Pass ipvs not net to ip_vs_genl_parse_service Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:37 +09:00
Eric W. Biederman	af5403419d	ipvs: Pass ipvs not net to __ip_vs_get_timeouts Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:36 +09:00
Eric W. Biederman	08fff4c357	ipvs: Pass ipvs not net to __ip_vs_get_dest_entries Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:36 +09:00
Eric W. Biederman	b2876b7773	ipvs: Pass ipvs not net to __ip_vs_get_service_entries Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:36 +09:00
Eric W. Biederman	f1faa1e749	ipvs: Pass ipvs not net to ip_vs_set_timeout Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:35 +09:00
Eric W. Biederman	18d6ade63c	ipvs: Pass ipvs not net to ip_vs_proto_data_get Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:35 +09:00
Eric W. Biederman	a47b430080	ipvs: Cache ipvs in ip_vs_in_icmp and ip_vs_in_icmp_v6 Storte the value of net_ipvs in a variable named ipvs so that when there are more users struct netns_ipvs in ip_vs_in_cmp and ip_vs_in_icmp_v6 they won't need to compute the value again. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:35 +09:00
Eric W. Biederman	c60856c687	ipvs: Pass ipvs not net to ip_vs_zero_all Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:35 +09:00
Eric W. Biederman	56d2169b77	ipvs: Pass ipvs not net to ip_vs_service_net_cleanup Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:35 +09:00
Eric W. Biederman	ef7c599d91	ipvs: Pass ipvs not net to ip_vs_flush Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:35 +09:00
Eric W. Biederman	5060bd8307	ipvs: Pass ipvs not net to ip_vs_add_service Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-09-24 09:34:35 +09:00

... 3 4 5 6 7 ...

3589 Commits