2006-01-05 21:19:05 +01:00
|
|
|
/* Connection tracking via netlink socket. Allows for user space
|
|
|
|
* protocol helpers and general trouble making from userspace.
|
|
|
|
*
|
|
|
|
* (C) 2001 by Jay Schulist <jschlst@samba.org>
|
2006-03-21 02:56:32 +01:00
|
|
|
* (C) 2002-2006 by Harald Welte <laforge@gnumonks.org>
|
2006-01-05 21:19:05 +01:00
|
|
|
* (C) 2003 by Patrick McHardy <kaber@trash.net>
|
2012-06-26 20:27:09 +02:00
|
|
|
* (C) 2005-2012 by Pablo Neira Ayuso <pablo@netfilter.org>
|
2006-01-05 21:19:05 +01:00
|
|
|
*
|
2007-02-12 20:15:49 +01:00
|
|
|
* Initial connection tracking via netlink development funded and
|
2006-01-05 21:19:05 +01:00
|
|
|
* generally made possible by Network Robots, Inc. (www.networkrobots.com)
|
|
|
|
*
|
|
|
|
* Further development of this code funded by Astaro AG (http://www.astaro.com)
|
|
|
|
*
|
|
|
|
* This software may be used and distributed according to the terms
|
|
|
|
* of the GNU General Public License, incorporated herein by reference.
|
|
|
|
*/
|
|
|
|
|
|
|
|
#include <linux/init.h>
|
|
|
|
#include <linux/module.h>
|
|
|
|
#include <linux/kernel.h>
|
2008-05-17 08:26:25 +02:00
|
|
|
#include <linux/rculist.h>
|
2009-03-25 21:05:46 +01:00
|
|
|
#include <linux/rculist_nulls.h>
|
2006-01-05 21:19:05 +01:00
|
|
|
#include <linux/types.h>
|
|
|
|
#include <linux/timer.h>
|
2010-10-13 22:24:54 +02:00
|
|
|
#include <linux/security.h>
|
2006-01-05 21:19:05 +01:00
|
|
|
#include <linux/skbuff.h>
|
|
|
|
#include <linux/errno.h>
|
|
|
|
#include <linux/netlink.h>
|
|
|
|
#include <linux/spinlock.h>
|
2006-06-27 12:00:35 +02:00
|
|
|
#include <linux/interrupt.h>
|
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities to include those
headers directly instead of assuming availability. As this conversion
needs to touch a large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the following.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and tries to put the new include such that its order conforms
to its surroundings. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
widely available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build tests were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-24 09:04:11 +01:00
|
|
|
#include <linux/slab.h>
|
2006-01-05 21:19:05 +01:00
|
|
|
|
|
|
|
#include <linux/netfilter.h>
|
2007-03-26 08:06:12 +02:00
|
|
|
#include <net/netlink.h>
|
2010-01-13 16:04:18 +01:00
|
|
|
#include <net/sock.h>
|
2006-01-05 21:19:05 +01:00
|
|
|
#include <net/netfilter/nf_conntrack.h>
|
|
|
|
#include <net/netfilter/nf_conntrack_core.h>
|
2006-11-29 02:34:58 +01:00
|
|
|
#include <net/netfilter/nf_conntrack_expect.h>
|
2006-01-05 21:19:05 +01:00
|
|
|
#include <net/netfilter/nf_conntrack_helper.h>
|
2013-08-27 08:50:12 +02:00
|
|
|
#include <net/netfilter/nf_conntrack_seqadj.h>
|
2006-01-05 21:19:05 +01:00
|
|
|
#include <net/netfilter/nf_conntrack_l3proto.h>
|
2006-11-29 02:35:06 +01:00
|
|
|
#include <net/netfilter/nf_conntrack_l4proto.h>
|
2006-12-03 07:07:13 +01:00
|
|
|
#include <net/netfilter/nf_conntrack_tuple.h>
|
netfilter: accounting rework: ct_extend + 64bit counters (v4)
Initially netfilter had 64bit counters for conntrack-based accounting, but
it was changed in 2.6.14 to save memory. Unfortunately in-kernel 64bit counters are
still required, for example for the "connbytes" extension. However, 64bit counters
waste a lot of memory and it was not possible to enable/disable them at runtime.
This patch:
- reimplements accounting with respect to the extension infrastructure,
- makes one global version of seq_print_acct() instead of two seq_print_counters(),
- makes it possible to enable it at boot time (for CONFIG_SYSCTL/CONFIG_SYSFS=n),
- makes it possible to enable/disable it at runtime by sysctl or sysfs,
- extends counters from 32bit to 64bit,
- renames ip_conntrack_counter -> nf_conn_counter,
- enables accounting code unconditionally (no longer depends on CONFIG_NF_CT_ACCT),
- sets the initial accounting enable state based on CONFIG_NF_CT_ACCT,
- removes buggy IPCT_COUNTER_FILLING event handling.
If accounting is enabled, newly created connections get an additional acct extend.
Old connections are not changed as it is not possible to add a ct_extend area
to confirmed conntrack. Accounting is performed for all connections with
acct extend regardless of a current state of "net.netfilter.nf_conntrack_acct".
Signed-off-by: Krzysztof Piotr Oledzki <ole@ans.pl>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-21 19:01:34 +02:00
|
|
|
#include <net/netfilter/nf_conntrack_acct.h>
|
2010-02-15 18:14:57 +01:00
|
|
|
#include <net/netfilter/nf_conntrack_zones.h>
|
2011-01-19 16:00:07 +01:00
|
|
|
#include <net/netfilter/nf_conntrack_timestamp.h>
|
2013-01-11 07:30:44 +01:00
|
|
|
#include <net/netfilter/nf_conntrack_labels.h>
|
2006-12-03 07:07:13 +01:00
|
|
|
#ifdef CONFIG_NF_NAT_NEEDED
|
|
|
|
#include <net/netfilter/nf_nat_core.h>
|
2012-08-26 19:14:06 +02:00
|
|
|
#include <net/netfilter/nf_nat_l4proto.h>
|
2012-06-07 13:31:25 +02:00
|
|
|
#include <net/netfilter/nf_nat_helper.h>
|
2006-12-03 07:07:13 +01:00
|
|
|
#endif
|
2006-01-05 21:19:05 +01:00
|
|
|
|
|
|
|
#include <linux/netfilter/nfnetlink.h>
|
|
|
|
#include <linux/netfilter/nfnetlink_conntrack.h>
|
|
|
|
|
|
|
|
MODULE_LICENSE("GPL");
|
|
|
|
|
2006-03-21 02:56:32 +01:00
|
|
|
static char __initdata version[] = "0.93";
|
2006-01-05 21:19:05 +01:00
|
|
|
|
2016-04-12 23:32:34 +02:00
|
|
|
static int ctnetlink_dump_tuples_proto(struct sk_buff *skb,
|
|
|
|
const struct nf_conntrack_tuple *tuple,
|
|
|
|
struct nf_conntrack_l4proto *l4proto)
|
2006-01-05 21:19:05 +01:00
|
|
|
{
|
|
|
|
int ret = 0;
|
2007-09-28 23:37:03 +02:00
|
|
|
struct nlattr *nest_parms;
|
2006-01-05 21:19:05 +01:00
|
|
|
|
2007-09-28 23:37:03 +02:00
|
|
|
nest_parms = nla_nest_start(skb, CTA_TUPLE_PROTO | NLA_F_NESTED);
|
|
|
|
if (!nest_parms)
|
|
|
|
goto nla_put_failure;
|
2012-04-02 00:57:48 +02:00
|
|
|
if (nla_put_u8(skb, CTA_PROTO_NUM, tuple->dst.protonum))
|
|
|
|
goto nla_put_failure;
|
2006-01-05 21:19:05 +01:00
|
|
|
|
2007-09-28 23:37:41 +02:00
|
|
|
if (likely(l4proto->tuple_to_nlattr))
|
|
|
|
ret = l4proto->tuple_to_nlattr(skb, tuple);
|
2007-02-12 20:15:49 +01:00
|
|
|
|
2007-09-28 23:37:03 +02:00
|
|
|
nla_nest_end(skb, nest_parms);
|
2006-01-05 21:19:05 +01:00
|
|
|
|
|
|
|
return ret;
|
|
|
|
|
2007-09-28 23:37:03 +02:00
|
|
|
nla_put_failure:
|
2006-01-05 21:19:05 +01:00
|
|
|
return -1;
|
|
|
|
}
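The dump helpers in this file all follow the same nesting pattern: open a container attribute, append children, then close the container so that its length covers everything written in between. The following is a minimal userspace sketch of that idea, not the kernel implementation. The names `attr_buf`, `attr_put`, `attr_nest_start`, `attr_nest_end` and `attr_len_at` are hypothetical stand-ins for the skb and the `nla_*` helpers, and the layout assumed here is the usual netlink one (16-bit length, 16-bit type, payload padded to 4 bytes).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ATTR_ALIGN(n) (((n) + 3u) & ~3u)  /* attributes are padded to 4 bytes */
#define ATTR_HDRLEN   4u                  /* u16 length followed by u16 type */

/* Hypothetical flat buffer standing in for the skb tail. */
struct attr_buf {
	uint8_t data[256];
	size_t  len;                      /* bytes used so far */
};

/* Append one attribute; returns its offset, or -1 if it does not fit. */
static int attr_put(struct attr_buf *b, uint16_t type,
		    const void *payload, uint16_t plen)
{
	uint16_t alen = (uint16_t)(ATTR_HDRLEN + plen); /* unpadded length */
	size_t total = ATTR_ALIGN(ATTR_HDRLEN + plen);  /* space consumed */
	size_t off = b->len;

	if (off + total > sizeof(b->data))
		return -1;
	memset(b->data + off, 0, total);
	memcpy(b->data + off, &alen, sizeof(alen));
	memcpy(b->data + off + 2, &type, sizeof(type));
	if (plen)
		memcpy(b->data + off + ATTR_HDRLEN, payload, plen);
	b->len += total;
	return (int)off;
}

/* Open a container: an attribute header with no payload yet. */
static int attr_nest_start(struct attr_buf *b, uint16_t type)
{
	return attr_put(b, type, NULL, 0);
}

/* Close a container: patch its length to span all children written since. */
static void attr_nest_end(struct attr_buf *b, int start)
{
	uint16_t alen;

	assert(start >= 0 && (size_t)start + ATTR_HDRLEN <= b->len);
	alen = (uint16_t)(b->len - (size_t)start);
	memcpy(b->data + start, &alen, sizeof(alen));
}

/* Read back the length field of the attribute at the given offset. */
static uint16_t attr_len_at(const struct attr_buf *b, int off)
{
	uint16_t v;

	memcpy(&v, b->data + off, sizeof(v));
	return v;
}
```

In this sketch a caller would start a nest, put a one-byte protocol number inside it, and end the nest. The container's length field then reports 12 bytes (4 for its own header plus 8 for the padded child), while the child reports its unpadded length of 5.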
|
|
|
|
|
2016-04-12 23:32:34 +02:00
|
|
|
static int ctnetlink_dump_tuples_ip(struct sk_buff *skb,
|
|
|
|
const struct nf_conntrack_tuple *tuple,
|
|
|
|
struct nf_conntrack_l3proto *l3proto)
|
2006-01-05 21:19:05 +01:00
|
|
|
{
|
|
|
|
int ret = 0;
|
2007-09-28 23:37:03 +02:00
|
|
|
struct nlattr *nest_parms;
|
|
|
|
|
|
|
|
nest_parms = nla_nest_start(skb, CTA_TUPLE_IP | NLA_F_NESTED);
|
|
|
|
if (!nest_parms)
|
|
|
|
goto nla_put_failure;
|
2006-03-22 22:54:15 +01:00
|
|
|
|
2007-09-28 23:37:41 +02:00
|
|
|
if (likely(l3proto->tuple_to_nlattr))
|
|
|
|
ret = l3proto->tuple_to_nlattr(skb, tuple);
|
2006-03-22 22:54:15 +01:00
|
|
|
|
2007-09-28 23:37:03 +02:00
|
|
|
nla_nest_end(skb, nest_parms);
|
2006-01-05 21:19:05 +01:00
|
|
|
|
2006-03-22 22:54:15 +01:00
|
|
|
return ret;
|
|
|
|
|
2007-09-28 23:37:03 +02:00
|
|
|
nla_put_failure:
|
2006-03-22 22:54:15 +01:00
|
|
|
return -1;
|
|
|
|
}
|
|
|
|
|
2016-04-12 23:32:34 +02:00
|
|
|
static int ctnetlink_dump_tuples(struct sk_buff *skb,
|
|
|
|
const struct nf_conntrack_tuple *tuple)
|
2006-03-22 22:54:15 +01:00
|
|
|
{
|
|
|
|
int ret;
|
|
|
|
struct nf_conntrack_l3proto *l3proto;
|
2006-11-29 02:35:06 +01:00
|
|
|
struct nf_conntrack_l4proto *l4proto;
|
2006-03-22 22:54:15 +01:00
|
|
|
|
2012-03-05 03:24:29 +01:00
|
|
|
rcu_read_lock();
|
2008-11-17 16:00:40 +01:00
|
|
|
l3proto = __nf_ct_l3proto_find(tuple->src.l3num);
|
2006-03-22 22:54:15 +01:00
|
|
|
ret = ctnetlink_dump_tuples_ip(skb, tuple, l3proto);
|
2006-01-05 21:19:05 +01:00
|
|
|
|
2012-03-05 03:24:29 +01:00
|
|
|
if (ret >= 0) {
|
|
|
|
l4proto = __nf_ct_l4proto_find(tuple->src.l3num,
|
|
|
|
tuple->dst.protonum);
|
|
|
|
ret = ctnetlink_dump_tuples_proto(skb, tuple, l4proto);
|
|
|
|
}
|
|
|
|
rcu_read_unlock();
|
2006-01-05 21:19:05 +01:00
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2016-04-12 23:32:34 +02:00
|
|
|
static int ctnetlink_dump_zone_id(struct sk_buff *skb, int attrtype,
|
|
|
|
const struct nf_conntrack_zone *zone, int dir)
|
netfilter: nf_conntrack: add direction support for zones
This work adds a direction parameter to netfilter zones, so identity
separation can be performed only in original/reply or both directions
(default). This basically opens up the possibility of doing NAT with
conflicting IP address/port tuples from multiple, isolated tenants
on a host (e.g. from a netns) without requiring each tenant to NAT
twice resp. to use its own dedicated IP address to SNAT to, meaning
overlapping tuples can be made unique with the zone identifier in
original direction, where the NAT engine will then allocate a unique
tuple in the commonly shared default zone for the reply direction.
In some restricted, local DNAT cases, also port redirection could be
used for making the reply traffic unique w/o requiring SNAT.
The consensus we've reached and discussed at NFWS and since the initial
implementation [1] was to directly integrate the direction meta data
into the existing zones infrastructure, as opposed to the ct->mark
approach we proposed initially.
As we pass the nf_conntrack_zone object directly around, we don't have
to touch all call-sites, but only those that contain equality checks
of zones. Thus, based on the current direction (original or reply),
we either return the actual id, or the default NF_CT_DEFAULT_ZONE_ID.
CT expectations are direction-agnostic entities when expectations are
being compared among themselves, so we can only use the identifier
in this case.
Note that zone identifiers cannot be included in the hash mix
anymore as they don't contain a "stable" value that would be equal
for both directions at all times, e.g. if only zone->id were
unconditionally xor'ed into the table slot hash, then replies wouldn't
find the corresponding conntracking entry anymore.
If no particular direction is specified when configuring zones, the
behaviour is exactly as we expect currently (both directions).
Support has been added for the CT netlink interface as well as the
x_tables raw CT target, which both already offer existing interfaces
to user space for the configuration of zones.
Below is a minimal, simplified collision example (script in [2]) with
netperf sessions:
+--- tenant-1 ---+   mark := 1
|    netperf     |--+
+----------------+  |                CT zone := mark [ORIGINAL]
 [ip,sport] := X   +--------------+  +--- gateway ---+
                   | mark routing |--|     SNAT      |-- ... +
                   +--------------+  +---------------+       |
+--- tenant-2 ---+  |                                     ~~~|~~~
|    netperf     |--+                +-----------+           |
+----------------+   mark := 2       | netserver |------ ... +
 [ip,sport] := X                     +-----------+
                                      [ip,port] := Y
On the gateway netns, example:
iptables -t raw -A PREROUTING -j CT --zone mark --zone-dir ORIGINAL
iptables -t nat -A POSTROUTING -o <dev> -j SNAT --to-source <ip> --random-fully
iptables -t mangle -A PREROUTING -m conntrack --ctdir ORIGINAL -j CONNMARK --save-mark
iptables -t mangle -A POSTROUTING -m conntrack --ctdir REPLY -j CONNMARK --restore-mark
conntrack dump from gateway netns:
netperf -H 10.1.1.2 -t TCP_STREAM -l60 -p12865,5555 from each tenant netns
tcp 6 431995 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=5555 dport=12865 zone-orig=1
src=10.1.1.2 dst=10.1.1.1 sport=12865 dport=1024
[ASSURED] mark=1 secctx=system_u:object_r:unlabeled_t:s0 use=1
tcp 6 431994 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=5555 dport=12865 zone-orig=2
src=10.1.1.2 dst=10.1.1.1 sport=12865 dport=5555
[ASSURED] mark=2 secctx=system_u:object_r:unlabeled_t:s0 use=1
tcp 6 299 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=39438 dport=33768 zone-orig=1
src=10.1.1.2 dst=10.1.1.1 sport=33768 dport=39438
[ASSURED] mark=1 secctx=system_u:object_r:unlabeled_t:s0 use=1
tcp 6 300 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=32889 dport=40206 zone-orig=2
src=10.1.1.2 dst=10.1.1.1 sport=40206 dport=32889
[ASSURED] mark=2 secctx=system_u:object_r:unlabeled_t:s0 use=2
Taking this further, test script in [2] creates 200 tenants and runs
original-tuple colliding netperf sessions each. A conntrack -L dump in
the gateway netns also confirms 200 overlapping entries, all in ESTABLISHED
state as expected.
I also ran various other tests with some permutations of the script,
to mention some: SNAT in random/random-fully/persistent mode, no zones (no
overlaps), static zones (original, reply, both directions), etc.
[1] http://thread.gmane.org/gmane.comp.security.firewalls.netfilter.devel/57412/
[2] https://paste.fedoraproject.org/242835/65657871/
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
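The per-direction rule described in this commit message ("we either return the actual id, or the default NF_CT_DEFAULT_ZONE_ID") can be sketched in plain C as follows. This is an illustration under stated assumptions, not the kernel's code: the type `struct zone`, the constants `ZONE_DIR_*`, `DIR_*` and `ZONE_DEFAULT_ID`, and the helpers `zone_id_for_dir` and `zone_equal` are hypothetical names chosen for the example, and the direction is assumed to be stored as a bitmask with one bit per direction.

```c
#include <assert.h>
#include <stdint.h>

#define ZONE_DEFAULT_ID 0u            /* assumed id of the shared default zone */

enum ct_dir { DIR_ORIGINAL = 0, DIR_REPLY = 1 };

/* One bit per direction a zone applies to (assumption for this sketch). */
#define ZONE_DIR_ORIG (1u << DIR_ORIGINAL)
#define ZONE_DIR_REPL (1u << DIR_REPLY)
#define ZONE_DIR_BOTH (ZONE_DIR_ORIG | ZONE_DIR_REPL)

struct zone {
	uint16_t id;
	uint8_t  dir;                 /* bitmask of ZONE_DIR_* */
};

/* The zone's id if it applies to this direction, else the default id. */
static uint16_t zone_id_for_dir(const struct zone *z, enum ct_dir dir)
{
	assert(dir == DIR_ORIGINAL || dir == DIR_REPLY);
	return (z->dir & (1u << dir)) ? z->id : ZONE_DEFAULT_ID;
}

/* Equality check as seen from one direction. */
static int zone_equal(const struct zone *a, const struct zone *b,
		      enum ct_dir dir)
{
	return zone_id_for_dir(a, dir) == zone_id_for_dir(b, dir);
}
```

With two tenants in zones 1 and 2 that are both restricted to the original direction, the zones differ when compared in the original direction but compare equal in the reply direction, since both fall back to the default id there. That asymmetry is also why the raw zone id is not a stable input for the hash.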
2015-08-14 16:03:39 +02:00
|
|
|
{
|
|
|
|
if (zone->id == NF_CT_DEFAULT_ZONE_ID || zone->dir != dir)
|
|
|
|
return 0;
|
|
|
|
if (nla_put_be16(skb, attrtype, htons(zone->id)))
|
|
|
|
goto nla_put_failure;
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
nla_put_failure:
|
|
|
|
return -1;
|
|
|
|
}
|
|
|
|
|
2016-04-12 23:32:34 +02:00
|
|
|
static int ctnetlink_dump_status(struct sk_buff *skb, const struct nf_conn *ct)
|
2006-01-05 21:19:05 +01:00
|
|
|
{
|
2012-04-02 00:57:48 +02:00
|
|
|
if (nla_put_be32(skb, CTA_STATUS, htonl(ct->status)))
|
|
|
|
goto nla_put_failure;
|
2006-01-05 21:19:05 +01:00
|
|
|
return 0;
|
|
|
|
|
2007-09-28 23:37:03 +02:00
|
|
|
nla_put_failure:
|
2006-01-05 21:19:05 +01:00
|
|
|
return -1;
|
|
|
|
}
|
|
|
|
|
2016-04-12 23:32:34 +02:00
|
|
|
static int ctnetlink_dump_timeout(struct sk_buff *skb, const struct nf_conn *ct)
|
2006-01-05 21:19:05 +01:00
|
|
|
{
|
2016-08-03 02:45:07 +02:00
|
|
|
long timeout = nf_ct_expires(ct) / HZ;
|
2007-02-12 20:15:49 +01:00
|
|
|
|
2012-04-02 00:57:48 +02:00
|
|
|
if (nla_put_be32(skb, CTA_TIMEOUT, htonl(timeout)))
|
|
|
|
goto nla_put_failure;
|
2006-01-05 21:19:05 +01:00
|
|
|
return 0;
|
|
|
|
|
2007-09-28 23:37:03 +02:00
|
|
|
nla_put_failure:
|
2006-01-05 21:19:05 +01:00
|
|
|
return -1;
|
|
|
|
}
|
|
|
|
|
2016-04-12 23:32:34 +02:00
|
|
|
static int ctnetlink_dump_protoinfo(struct sk_buff *skb, struct nf_conn *ct)
|
2006-01-05 21:19:05 +01:00
|
|
|
{
|
2008-04-14 11:15:52 +02:00
|
|
|
struct nf_conntrack_l4proto *l4proto;
|
2007-09-28 23:37:03 +02:00
|
|
|
struct nlattr *nest_proto;
|
2006-01-05 21:19:05 +01:00
|
|
|
int ret;
|
|
|
|
|
2008-11-17 16:00:40 +01:00
|
|
|
l4proto = __nf_ct_l4proto_find(nf_ct_l3num(ct), nf_ct_protonum(ct));
|
|
|
|
if (!l4proto->to_nlattr)
|
2006-01-05 21:19:05 +01:00
|
|
|
return 0;
|
2007-02-12 20:15:49 +01:00
|
|
|
|
2007-09-28 23:37:03 +02:00
|
|
|
nest_proto = nla_nest_start(skb, CTA_PROTOINFO | NLA_F_NESTED);
|
|
|
|
if (!nest_proto)
|
|
|
|
goto nla_put_failure;
|
2006-01-05 21:19:05 +01:00
|
|
|
|
2007-09-28 23:37:41 +02:00
|
|
|
ret = l4proto->to_nlattr(skb, nest_proto, ct);
|
2006-01-05 21:19:05 +01:00
|
|
|
|
2007-09-28 23:37:03 +02:00
|
|
|
nla_nest_end(skb, nest_proto);
|
2006-01-05 21:19:05 +01:00
|
|
|
|
|
|
|
return ret;
|
|
|
|
|
2007-09-28 23:37:03 +02:00
|
|
|
nla_put_failure:
|
2006-01-05 21:19:05 +01:00
|
|
|
return -1;
|
|
|
|
}
|
|
|
|
|
2016-04-12 23:32:34 +02:00
|
|
|
static int ctnetlink_dump_helpinfo(struct sk_buff *skb,
|
|
|
|
const struct nf_conn *ct)
|
2006-01-05 21:19:05 +01:00
|
|
|
{
|
2007-09-28 23:37:03 +02:00
|
|
|
struct nlattr *nest_helper;
|
2006-03-21 02:56:32 +01:00
|
|
|
const struct nf_conn_help *help = nfct_help(ct);
|
2007-06-05 21:55:27 +02:00
|
|
|
struct nf_conntrack_helper *helper;
|
2006-01-05 21:19:05 +01:00
|
|
|
|
2007-06-05 21:55:27 +02:00
|
|
|
if (!help)
|
2006-01-05 21:19:05 +01:00
|
|
|
return 0;
|
2007-02-12 20:15:49 +01:00
|
|
|
|
2007-06-05 21:55:27 +02:00
|
|
|
helper = rcu_dereference(help->helper);
|
|
|
|
if (!helper)
|
|
|
|
goto out;
|
|
|
|
|
2007-09-28 23:37:03 +02:00
|
|
|
nest_helper = nla_nest_start(skb, CTA_HELP | NLA_F_NESTED);
|
|
|
|
if (!nest_helper)
|
|
|
|
goto nla_put_failure;
|
2012-04-02 00:57:48 +02:00
|
|
|
if (nla_put_string(skb, CTA_HELP_NAME, helper->name))
|
|
|
|
goto nla_put_failure;
|
2006-01-05 21:19:05 +01:00
|
|
|
|
2007-09-28 23:37:41 +02:00
|
|
|
if (helper->to_nlattr)
|
|
|
|
helper->to_nlattr(skb, ct);
|
2006-01-05 21:19:05 +01:00
|
|
|
|
2007-09-28 23:37:03 +02:00
|
|
|
nla_nest_end(skb, nest_helper);
|
2007-06-05 21:55:27 +02:00
|
|
|
out:
|
2006-01-05 21:19:05 +01:00
|
|
|
return 0;
|
|
|
|
|
2007-09-28 23:37:03 +02:00
|
|
|
nla_put_failure:
|
2006-01-05 21:19:05 +01:00
|
|
|
return -1;
|
|
|
|
}
|
|
|
|
|
[NETFILTER]: Kill some supper dupper bloatry
/me awards the bloatiest-of-all-net/-.c-code award to
nf_conntrack_netlink.c, congratulations to all the authors :-/!
Hall of (unquestionable) fame (measured per inline, top 10 under
net/):
-4496 ctnetlink_parse_tuple netfilter/nf_conntrack_netlink.c
-2165 ctnetlink_dump_tuples netfilter/nf_conntrack_netlink.c
-2115 __ip_vs_get_out_rt ipv4/ipvs/ip_vs_xmit.c
-1924 xfrm_audit_helper_pktinfo xfrm/xfrm_state.c
-1799 ctnetlink_parse_tuple_proto netfilter/nf_conntrack_netlink.c
-1268 ctnetlink_parse_tuple_ip netfilter/nf_conntrack_netlink.c
-1093 ctnetlink_exp_dump_expect netfilter/nf_conntrack_netlink.c
-1060 void ccid3_update_send_interval dccp/ccids/ccid3.c
-983 ctnetlink_dump_tuples_proto netfilter/nf_conntrack_netlink.c
-827 ctnetlink_exp_dump_tuple netfilter/nf_conntrack_netlink.c
(i386 / gcc (GCC) 4.1.2 20070626 (Red Hat 4.1.2-13) /
allyesconfig except CONFIG_FORCED_INLINING)
...and I left < 200 byte gains as future work item.
After iterative inline removal, I finally have this:
net/netfilter/nf_conntrack_netlink.c:
ctnetlink_exp_fill_info | -1104
ctnetlink_new_expect | -1572
ctnetlink_fill_info | -1303
ctnetlink_new_conntrack | -2230
ctnetlink_get_expect | -341
ctnetlink_del_expect | -352
ctnetlink_expect_event | -1110
ctnetlink_conntrack_event | -1548
ctnetlink_del_conntrack | -729
ctnetlink_get_conntrack | -728
10 functions changed, 11017 bytes removed, diff: -11017
net/netfilter/nf_conntrack_netlink.c:
ctnetlink_parse_tuple | +419
dump_nat_seq_adj | +183
ctnetlink_dump_counters | +166
ctnetlink_dump_tuples | +261
ctnetlink_exp_dump_expect | +633
ctnetlink_change_status | +460
6 functions changed, 2122 bytes added, diff: +2122
net/netfilter/nf_conntrack_netlink.o:
16 functions changed, 2122 bytes added, 11017 bytes removed, diff: -8895
Without a number of CONFIG.*DEBUGs, I got this:
net/netfilter/nf_conntrack_netlink.o:
16 functions changed, 2122 bytes added, 11029 bytes removed, diff: -8907
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-06 08:11:31 +01:00
|
|
|
static int
|
2013-09-26 17:31:52 +02:00
|
|
|
dump_counters(struct sk_buff *skb, struct nf_conn_acct *acct,
|
|
|
|
enum ip_conntrack_dir dir, int type)
|
2006-01-05 21:19:05 +01:00
|
|
|
{
|
2013-09-26 17:31:52 +02:00
|
|
|
enum ctattr_type attr = dir ? CTA_COUNTERS_REPLY : CTA_COUNTERS_ORIG;
|
|
|
|
struct nf_conn_counter *counter = acct->counter;
|
2007-09-28 23:37:03 +02:00
|
|
|
struct nlattr *nest_count;
|
2013-09-26 17:31:52 +02:00
|
|
|
u64 pkts, bytes;
|
2006-01-05 21:19:05 +01:00
|
|
|
|
2013-09-26 17:31:52 +02:00
|
|
|
if (type == IPCTNL_MSG_CT_GET_CTRZERO) {
|
|
|
|
pkts = atomic64_xchg(&counter[dir].packets, 0);
|
|
|
|
bytes = atomic64_xchg(&counter[dir].bytes, 0);
|
|
|
|
} else {
|
|
|
|
pkts = atomic64_read(&counter[dir].packets);
|
|
|
|
bytes = atomic64_read(&counter[dir].bytes);
|
|
|
|
}
|
|
|
|
|
|
|
|
nest_count = nla_nest_start(skb, attr | NLA_F_NESTED);
|
2007-09-28 23:37:03 +02:00
|
|
|
if (!nest_count)
|
|
|
|
goto nla_put_failure;
|
|
|
|
|
2016-04-22 17:31:18 +02:00
|
|
|
if (nla_put_be64(skb, CTA_COUNTERS_PACKETS, cpu_to_be64(pkts),
|
|
|
|
CTA_COUNTERS_PAD) ||
|
|
|
|
nla_put_be64(skb, CTA_COUNTERS_BYTES, cpu_to_be64(bytes),
|
|
|
|
CTA_COUNTERS_PAD))
|
2012-04-02 00:57:48 +02:00
|
|
|
goto nla_put_failure;
|
2006-01-05 21:19:05 +01:00
|
|
|
|
2007-09-28 23:37:03 +02:00
|
|
|
nla_nest_end(skb, nest_count);
|
2006-01-05 21:19:05 +01:00
|
|
|
|
|
|
|
return 0;
|
|
|
|
|
2007-09-28 23:37:03 +02:00
|
|
|
nla_put_failure:
|
2006-01-05 21:19:05 +01:00
|
|
|
return -1;
|
|
|
|
}
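dump_counters() above distinguishes a plain read from a read-and-zero request (IPCTNL_MSG_CT_GET_CTRZERO) by using an atomic exchange for the latter, so no increment is lost between reading and clearing. A minimal userspace sketch of the same idea with C11 atomics follows. The names `dir_counter` and `counter_snapshot` are hypothetical, and `<stdatomic.h>` is substituted here for the kernel's `atomic64_*` API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-direction counter pair. */
struct dir_counter {
	_Atomic uint64_t packets;
	_Atomic uint64_t bytes;
};

/*
 * Copy the current values out. With reset set, each counter is swapped
 * with zero in a single atomic step, so a concurrent increment lands
 * either in the returned value or in the fresh counter, never nowhere.
 */
static void counter_snapshot(struct dir_counter *c, bool reset,
			     uint64_t *pkts, uint64_t *bytes)
{
	assert(c && pkts && bytes);
	if (reset) {
		*pkts = atomic_exchange(&c->packets, 0);
		*bytes = atomic_exchange(&c->bytes, 0);
	} else {
		*pkts = atomic_load(&c->packets);
		*bytes = atomic_load(&c->bytes);
	}
}
```

Note that the two counters are still read one after the other, so the packet and byte values are each consistent on their own but are not captured as a single atomic pair.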
|
|
|
|
|
2011-12-24 14:11:39 +01:00
|
|
|
static int
|
2013-09-26 17:31:52 +02:00
|
|
|
ctnetlink_dump_acct(struct sk_buff *skb, const struct nf_conn *ct, int type)
|
2011-12-24 14:11:39 +01:00
|
|
|
{
|
2013-09-26 17:31:52 +02:00
|
|
|
struct nf_conn_acct *acct = nf_conn_acct_find(ct);
|
2011-12-24 14:11:39 +01:00
|
|
|
|
|
|
|
if (!acct)
|
|
|
|
return 0;
|
|
|
|
|
2013-09-26 17:31:52 +02:00
|
|
|
if (dump_counters(skb, acct, IP_CT_DIR_ORIGINAL, type) < 0)
|
|
|
|
return -1;
|
|
|
|
if (dump_counters(skb, acct, IP_CT_DIR_REPLY, type) < 0)
|
|
|
|
return -1;
|
|
|
|
|
|
|
|
return 0;
|
2011-12-24 14:11:39 +01:00
|
|
|
}
|
|
|
|
|
2011-01-19 16:00:07 +01:00
|
|
|
static int
|
|
|
|
ctnetlink_dump_timestamp(struct sk_buff *skb, const struct nf_conn *ct)
|
|
|
|
{
|
|
|
|
struct nlattr *nest_count;
|
|
|
|
const struct nf_conn_tstamp *tstamp;
|
|
|
|
|
|
|
|
tstamp = nf_conn_tstamp_find(ct);
|
|
|
|
if (!tstamp)
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
nest_count = nla_nest_start(skb, CTA_TIMESTAMP | NLA_F_NESTED);
|
|
|
|
if (!nest_count)
|
|
|
|
goto nla_put_failure;
|
|
|
|
|
2016-04-22 17:31:18 +02:00
|
|
|
if (nla_put_be64(skb, CTA_TIMESTAMP_START, cpu_to_be64(tstamp->start),
|
|
|
|
CTA_TIMESTAMP_PAD) ||
|
2012-04-02 00:57:48 +02:00
|
|
|
(tstamp->stop != 0 && nla_put_be64(skb, CTA_TIMESTAMP_STOP,
|
2016-04-22 17:31:18 +02:00
|
|
|
cpu_to_be64(tstamp->stop),
|
|
|
|
CTA_TIMESTAMP_PAD)))
|
2012-04-02 00:57:48 +02:00
|
|
|
goto nla_put_failure;
|
2011-01-19 16:00:07 +01:00
|
|
|
nla_nest_end(skb, nest_count);
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
nla_put_failure:
|
|
|
|
return -1;
|
|
|
|
}
|
|
|
|
|
2006-01-05 21:19:05 +01:00
|
|
|
#ifdef CONFIG_NF_CONNTRACK_MARK
|
2016-04-12 23:32:34 +02:00
|
|
|
static int ctnetlink_dump_mark(struct sk_buff *skb, const struct nf_conn *ct)
|
2006-01-05 21:19:05 +01:00
|
|
|
{
|
2012-04-02 00:57:48 +02:00
|
|
|
if (nla_put_be32(skb, CTA_MARK, htonl(ct->mark)))
|
|
|
|
goto nla_put_failure;
|
2006-01-05 21:19:05 +01:00
|
|
|
return 0;
|
|
|
|
|
2007-09-28 23:37:03 +02:00
|
|
|
nla_put_failure:
|
2006-01-05 21:19:05 +01:00
|
|
|
return -1;
|
|
|
|
}
|
|
|
|
#else
|
|
|
|
#define ctnetlink_dump_mark(a, b) (0)
|
|
|
|
#endif
|
|
|
|
|
2007-12-18 07:28:41 +01:00
|
|
|
#ifdef CONFIG_NF_CONNTRACK_SECMARK
|
2016-04-12 23:32:34 +02:00
|
|
|
static int ctnetlink_dump_secctx(struct sk_buff *skb, const struct nf_conn *ct)
|
2007-12-18 07:28:41 +01:00
|
|
|
{
|
2010-10-13 22:24:54 +02:00
|
|
|
struct nlattr *nest_secctx;
|
|
|
|
int len, ret;
|
|
|
|
char *secctx;
|
|
|
|
|
|
|
|
ret = security_secid_to_secctx(ct->secmark, &secctx, &len);
|
|
|
|
if (ret)
|
2011-01-06 20:25:00 +01:00
|
|
|
return 0;
|
2010-10-13 22:24:54 +02:00
|
|
|
|
|
|
|
ret = -1;
|
|
|
|
nest_secctx = nla_nest_start(skb, CTA_SECCTX | NLA_F_NESTED);
|
|
|
|
if (!nest_secctx)
|
|
|
|
goto nla_put_failure;
|
2007-12-18 07:28:41 +01:00
|
|
|
|
2012-04-02 00:57:48 +02:00
|
|
|
if (nla_put_string(skb, CTA_SECCTX_NAME, secctx))
|
|
|
|
goto nla_put_failure;
|
2010-10-13 22:24:54 +02:00
|
|
|
nla_nest_end(skb, nest_secctx);
|
|
|
|
|
|
|
|
ret = 0;
|
2007-12-18 07:28:41 +01:00
|
|
|
nla_put_failure:
|
2010-10-13 22:24:54 +02:00
|
|
|
security_release_secctx(secctx, len);
|
|
|
|
return ret;
|
2007-12-18 07:28:41 +01:00
|
|
|
}
|
|
|
|
#else
|
2010-10-13 22:24:54 +02:00
|
|
|
#define ctnetlink_dump_secctx(a, b) (0)
|
2007-12-18 07:28:41 +01:00
|
|
|
#endif
|
|
|
|
|
2013-01-11 07:30:45 +01:00
|
|
|
#ifdef CONFIG_NF_CONNTRACK_LABELS
|
2016-04-15 12:24:57 +02:00
|
|
|
static inline int ctnetlink_label_size(const struct nf_conn *ct)
|
2013-01-11 07:30:45 +01:00
|
|
|
{
|
|
|
|
struct nf_conn_labels *labels = nf_ct_labels_find(ct);
|
|
|
|
|
|
|
|
if (!labels)
|
|
|
|
return 0;
|
2016-07-21 12:51:16 +02:00
|
|
|
return nla_total_size(sizeof(labels->bits));
|
2013-01-11 07:30:45 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
|
|
|
ctnetlink_dump_labels(struct sk_buff *skb, const struct nf_conn *ct)
|
|
|
|
{
|
|
|
|
struct nf_conn_labels *labels = nf_ct_labels_find(ct);
|
2016-07-21 12:51:16 +02:00
|
|
|
unsigned int i;
|
2013-01-11 07:30:45 +01:00
|
|
|
|
|
|
|
if (!labels)
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
i = 0;
|
|
|
|
do {
|
|
|
|
if (labels->bits[i] != 0)
|
2016-07-21 12:51:16 +02:00
|
|
|
return nla_put(skb, CTA_LABELS, sizeof(labels->bits),
|
|
|
|
labels->bits);
|
2013-01-11 07:30:45 +01:00
|
|
|
i++;
|
2016-07-21 12:51:16 +02:00
|
|
|
} while (i < ARRAY_SIZE(labels->bits));
|
2013-01-11 07:30:45 +01:00
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
#else
|
|
|
|
#define ctnetlink_dump_labels(a, b) (0)
|
|
|
|
#define ctnetlink_label_size(a) (0)
|
|
|
|
#endif
|
|
|
|
|
2007-12-18 07:28:19 +01:00
|
|
|
#define master_tuple(ct) (&(ct)->master->tuplehash[IP_CT_DIR_ORIGINAL].tuple)
|
|
|
|
|
2016-04-12 23:32:34 +02:00
|
|
|
static int ctnetlink_dump_master(struct sk_buff *skb, const struct nf_conn *ct)
|
2007-12-18 07:28:19 +01:00
|
|
|
{
|
|
|
|
struct nlattr *nest_parms;
|
|
|
|
|
|
|
|
if (!(ct->status & IPS_EXPECTED))
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
nest_parms = nla_nest_start(skb, CTA_TUPLE_MASTER | NLA_F_NESTED);
|
|
|
|
if (!nest_parms)
|
|
|
|
goto nla_put_failure;
|
|
|
|
if (ctnetlink_dump_tuples(skb, master_tuple(ct)) < 0)
|
|
|
|
goto nla_put_failure;
|
|
|
|
nla_nest_end(skb, nest_parms);
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
nla_put_failure:
|
|
|
|
return -1;
|
|
|
|
}
|
|
|
|
|
2008-01-06 08:11:31 +01:00
|
|
|
static int
|
2013-08-27 08:50:12 +02:00
|
|
|
dump_ct_seq_adj(struct sk_buff *skb, const struct nf_ct_seqadj *seq, int type)
|
2007-12-18 07:28:00 +01:00
|
|
|
{
|
|
|
|
struct nlattr *nest_parms;
|
|
|
|
|
|
|
|
nest_parms = nla_nest_start(skb, type | NLA_F_NESTED);
|
|
|
|
if (!nest_parms)
|
|
|
|
goto nla_put_failure;
|
|
|
|
|
2013-08-27 08:50:12 +02:00
|
|
|
if (nla_put_be32(skb, CTA_SEQADJ_CORRECTION_POS,
|
|
|
|
htonl(seq->correction_pos)) ||
|
|
|
|
nla_put_be32(skb, CTA_SEQADJ_OFFSET_BEFORE,
|
|
|
|
htonl(seq->offset_before)) ||
|
|
|
|
nla_put_be32(skb, CTA_SEQADJ_OFFSET_AFTER,
|
|
|
|
htonl(seq->offset_after)))
|
2012-04-02 00:57:48 +02:00
|
|
|
goto nla_put_failure;
|
2007-12-18 07:28:00 +01:00
|
|
|
|
|
|
|
nla_nest_end(skb, nest_parms);
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
nla_put_failure:
|
|
|
|
return -1;
|
|
|
|
}
|
|
|
|
|
2016-04-12 23:32:34 +02:00
|
|
|
static int ctnetlink_dump_ct_seq_adj(struct sk_buff *skb,
|
|
|
|
const struct nf_conn *ct)
|
2007-12-18 07:28:00 +01:00
|
|
|
{
|
2013-08-27 08:50:12 +02:00
|
|
|
struct nf_conn_seqadj *seqadj = nfct_seqadj(ct);
|
|
|
|
struct nf_ct_seqadj *seq;
|
2007-12-18 07:28:00 +01:00
|
|
|
|
2013-08-27 08:50:12 +02:00
|
|
|
if (!(ct->status & IPS_SEQ_ADJUST) || !seqadj)
|
2007-12-18 07:28:00 +01:00
|
|
|
return 0;
|
|
|
|
|
2013-08-27 08:50:12 +02:00
|
|
|
seq = &seqadj->seq[IP_CT_DIR_ORIGINAL];
|
|
|
|
if (dump_ct_seq_adj(skb, seq, CTA_SEQ_ADJ_ORIG) == -1)
|
2007-12-18 07:28:00 +01:00
|
|
|
return -1;
|
|
|
|
|
2013-08-27 08:50:12 +02:00
|
|
|
seq = &seqadj->seq[IP_CT_DIR_REPLY];
|
|
|
|
if (dump_ct_seq_adj(skb, seq, CTA_SEQ_ADJ_REPLY) == -1)
|
2007-12-18 07:28:00 +01:00
|
|
|
return -1;
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2016-04-12 23:32:34 +02:00
|
|
|
static int ctnetlink_dump_id(struct sk_buff *skb, const struct nf_conn *ct)
|
2006-01-05 21:19:05 +01:00
|
|
|
{
|
2012-04-02 00:57:48 +02:00
|
|
|
if (nla_put_be32(skb, CTA_ID, htonl((unsigned long)ct)))
|
|
|
|
goto nla_put_failure;
|
2006-01-05 21:19:05 +01:00
|
|
|
return 0;
|
|
|
|
|
2007-09-28 23:37:03 +02:00
|
|
|
nla_put_failure:
|
2006-01-05 21:19:05 +01:00
|
|
|
return -1;
|
|
|
|
}
|
|
|
|
|
2016-04-12 23:32:34 +02:00
|
|
|
static int ctnetlink_dump_use(struct sk_buff *skb, const struct nf_conn *ct)
|
2006-01-05 21:19:05 +01:00
|
|
|
{
|
2012-04-02 00:57:48 +02:00
|
|
|
if (nla_put_be32(skb, CTA_USE, htonl(atomic_read(&ct->ct_general.use))))
|
|
|
|
goto nla_put_failure;
|
2006-01-05 21:19:05 +01:00
|
|
|
return 0;
|
|
|
|
|
2007-09-28 23:37:03 +02:00
|
|
|
nla_put_failure:
|
2006-01-05 21:19:05 +01:00
|
|
|
return -1;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
2012-09-07 22:12:54 +02:00
|
|
|
ctnetlink_fill_info(struct sk_buff *skb, u32 portid, u32 seq, u32 type,
|
2011-12-24 14:11:39 +01:00
|
|
|
struct nf_conn *ct)
|
2006-01-05 21:19:05 +01:00
|
|
|
{
|
2015-08-08 21:40:01 +02:00
|
|
|
const struct nf_conntrack_zone *zone;
|
2006-01-05 21:19:05 +01:00
|
|
|
struct nlmsghdr *nlh;
|
|
|
|
struct nfgenmsg *nfmsg;
|
2007-09-28 23:37:03 +02:00
|
|
|
struct nlattr *nest_parms;
|
2012-09-07 22:12:54 +02:00
|
|
|
unsigned int flags = portid ? NLM_F_MULTI : 0, event;
|
2006-01-05 21:19:05 +01:00
|
|
|
|
2011-12-24 14:11:39 +01:00
|
|
|
event = (NFNL_SUBSYS_CTNETLINK << 8 | IPCTNL_MSG_CT_NEW);
|
2012-09-07 22:12:54 +02:00
|
|
|
nlh = nlmsg_put(skb, portid, seq, event, sizeof(*nfmsg), flags);
|
2009-06-02 20:07:39 +02:00
|
|
|
if (nlh == NULL)
|
|
|
|
goto nlmsg_failure;
|
2006-01-05 21:19:05 +01:00
|
|
|
|
2009-06-02 20:07:39 +02:00
|
|
|
nfmsg = nlmsg_data(nlh);
|
2008-04-14 11:15:52 +02:00
|
|
|
nfmsg->nfgen_family = nf_ct_l3num(ct);
|
2006-01-05 21:19:05 +01:00
|
|
|
nfmsg->version = NFNETLINK_V0;
|
|
|
|
nfmsg->res_id = 0;
|
|
|
|
|
netfilter: nf_conntrack: add direction support for zones
This work adds a direction parameter to netfilter zones, so identity
separation can be performed only in original/reply or both directions
(default). This basically opens up the possibility of doing NAT with
conflicting IP address/port tuples from multiple, isolated tenants
on a host (e.g. from a netns) without requiring each tenant to NAT
twice resp. to use its own dedicated IP address to SNAT to, meaning
overlapping tuples can be made unique with the zone identifier in
original direction, where the NAT engine will then allocate a unique
tuple in the commonly shared default zone for the reply direction.
In some restricted, local DNAT cases, also port redirection could be
used for making the reply traffic unique w/o requiring SNAT.
The consensus we've reached and discussed at NFWS and since the initial
implementation [1] was to directly integrate the direction meta data
into the existing zones infrastructure, as opposed to the ct->mark
approach we proposed initially.
As we pass the nf_conntrack_zone object directly around, we don't have
to touch all call-sites, but only those, that contain equality checks
of zones. Thus, based on the current direction (original or reply),
we either return the actual id, or the default NF_CT_DEFAULT_ZONE_ID.
CT expectations are direction-agnostic entities when expectations are
being compared among themselves, so we can only use the identifier
in this case.
Note that zone identifiers can not be included into the hash mix
anymore as they don't contain a "stable" value that would be equal
for both directions at all times, f.e. if only zone->id would
unconditionally be xor'ed into the table slot hash, then replies won't
find the corresponding conntracking entry anymore.
If no particular direction is specified when configuring zones, the
behaviour is exactly as we expect currently (both directions).
Support has been added for the CT netlink interface as well as the
x_tables raw CT target, which both already offer existing interfaces
to user space for the configuration of zones.
Below a minimal, simplified collision example (script in [2]) with
netperf sessions:
  +--- tenant-1 ---+   mark := 1
  |    netperf     |--+
  +----------------+  |                CT zone := mark [ORIGINAL]
   [ip,sport] := X   +--------------+  +--- gateway ---+
                     | mark routing |--|     SNAT      |-- ... +
                     +--------------+  +---------------+       |
  +--- tenant-2 ---+  |                                     ~~~|~~~
  |    netperf     |--+                +-----------+           |
  +----------------+   mark := 2       | netserver |------ ... +
   [ip,sport] := X                     +-----------+
                                        [ip,port] := Y
On the gateway netns, example:
  iptables -t raw -A PREROUTING -j CT --zone mark --zone-dir ORIGINAL
  iptables -t nat -A POSTROUTING -o <dev> -j SNAT --to-source <ip> --random-fully
  iptables -t mangle -A PREROUTING -m conntrack --ctdir ORIGINAL -j CONNMARK --save-mark
  iptables -t mangle -A POSTROUTING -m conntrack --ctdir REPLY -j CONNMARK --restore-mark
conntrack dump from gateway netns:
  netperf -H 10.1.1.2 -t TCP_STREAM -l60 -p12865,5555 from each tenant netns
  tcp 6 431995 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=5555 dport=12865 zone-orig=1
      src=10.1.1.2 dst=10.1.1.1 sport=12865 dport=1024
      [ASSURED] mark=1 secctx=system_u:object_r:unlabeled_t:s0 use=1
  tcp 6 431994 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=5555 dport=12865 zone-orig=2
      src=10.1.1.2 dst=10.1.1.1 sport=12865 dport=5555
      [ASSURED] mark=2 secctx=system_u:object_r:unlabeled_t:s0 use=1
  tcp 6 299 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=39438 dport=33768 zone-orig=1
      src=10.1.1.2 dst=10.1.1.1 sport=33768 dport=39438
      [ASSURED] mark=1 secctx=system_u:object_r:unlabeled_t:s0 use=1
  tcp 6 300 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=32889 dport=40206 zone-orig=2
      src=10.1.1.2 dst=10.1.1.1 sport=40206 dport=32889
      [ASSURED] mark=2 secctx=system_u:object_r:unlabeled_t:s0 use=2
Taking this further, test script in [2] creates 200 tenants and runs
original-tuple colliding netperf sessions each. A conntrack -L dump in
the gateway netns also confirms 200 overlapping entries, all in ESTABLISHED
state as expected.
I also did run various other tests with some permutations of the script,
to mention some: SNAT in random/random-fully/persistent mode, no zones (no
overlaps), static zones (original, reply, both directions), etc.
[1] http://thread.gmane.org/gmane.comp.security.firewalls.netfilter.devel/57412/
[2] https://paste.fedoraproject.org/242835/65657871/
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
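The direction rule described in this commit message ("based on the current direction ... we either return the actual id, or the default NF_CT_DEFAULT_ZONE_ID") can be sketched in plain userspace C. This is only an illustration, not the kernel code: `struct ct_zone`, `zone_id_for_dir()`, the `CT_DIR_*` and `ZONE_DIR_*` names, and the default id of 0 are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the two conntrack directions. */
enum ct_dir { CT_DIR_ORIGINAL = 0, CT_DIR_REPLY = 1 };

/* One bit per direction; a zone may cover one direction or both. */
#define ZONE_DIR_ORIG   (1u << CT_DIR_ORIGINAL)
#define ZONE_DIR_REPL   (1u << CT_DIR_REPLY)
#define ZONE_DIR_BOTH   (ZONE_DIR_ORIG | ZONE_DIR_REPL)

/* Assumed id of the shared default zone. */
#define DEFAULT_ZONE_ID 0

/* Hypothetical zone object: an identifier plus a direction bitmask. */
struct ct_zone {
	uint16_t id;
	uint8_t dir;
};

/*
 * Return the zone id that applies to packets flowing in 'dir'.
 * If the zone does not cover that direction, fall back to the
 * default zone so that both tenants share one reply-side namespace.
 */
static uint16_t zone_id_for_dir(const struct ct_zone *zone, enum ct_dir dir)
{
	assert(zone != NULL);
	return (zone->dir & (1u << dir)) ? zone->id : DEFAULT_ZONE_ID;
}
```

With a zone configured as `{ .id = 7, .dir = ZONE_DIR_ORIG }`, the original direction resolves to 7 and the reply direction to the default zone, which is why such an id is not a stable input for the hash as the message notes.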
2015-08-14 16:03:39 +02:00
|
|
|
zone = nf_ct_zone(ct);
|
|
|
|
|
2007-09-28 23:37:03 +02:00
|
|
|
nest_parms = nla_nest_start(skb, CTA_TUPLE_ORIG | NLA_F_NESTED);
|
|
|
|
if (!nest_parms)
|
|
|
|
goto nla_put_failure;
|
2009-06-02 20:03:35 +02:00
|
|
|
if (ctnetlink_dump_tuples(skb, nf_ct_tuple(ct, IP_CT_DIR_ORIGINAL)) < 0)
|
2007-09-28 23:37:03 +02:00
|
|
|
goto nla_put_failure;
|
2015-08-14 16:03:39 +02:00
|
|
|
if (ctnetlink_dump_zone_id(skb, CTA_TUPLE_ZONE, zone,
|
|
|
|
NF_CT_ZONE_DIR_ORIG) < 0)
|
|
|
|
goto nla_put_failure;
|
2007-09-28 23:37:03 +02:00
|
|
|
nla_nest_end(skb, nest_parms);
|
2007-02-12 20:15:49 +01:00
|
|
|
|
2007-09-28 23:37:03 +02:00
|
|
|
nest_parms = nla_nest_start(skb, CTA_TUPLE_REPLY | NLA_F_NESTED);
|
|
|
|
if (!nest_parms)
|
|
|
|
goto nla_put_failure;
|
2009-06-02 20:03:35 +02:00
|
|
|
if (ctnetlink_dump_tuples(skb, nf_ct_tuple(ct, IP_CT_DIR_REPLY)) < 0)
|
2007-09-28 23:37:03 +02:00
|
|
|
goto nla_put_failure;
|
2015-08-14 16:03:39 +02:00
|
|
|
if (ctnetlink_dump_zone_id(skb, CTA_TUPLE_ZONE, zone,
|
|
|
|
NF_CT_ZONE_DIR_REPL) < 0)
|
|
|
|
goto nla_put_failure;
|
2007-09-28 23:37:03 +02:00
|
|
|
nla_nest_end(skb, nest_parms);
|
2006-01-05 21:19:05 +01:00
|
|
|
|
2015-08-14 16:03:39 +02:00
|
|
|
if (ctnetlink_dump_zone_id(skb, CTA_ZONE, zone,
|
|
|
|
NF_CT_DEFAULT_ZONE_DIR) < 0)
|
2012-04-02 00:57:48 +02:00
|
|
|
goto nla_put_failure;
|
2010-02-15 18:14:57 +01:00
|
|
|
|
2006-01-05 21:19:05 +01:00
|
|
|
if (ctnetlink_dump_status(skb, ct) < 0 ||
|
|
|
|
ctnetlink_dump_timeout(skb, ct) < 0 ||
|
2013-09-26 17:31:52 +02:00
|
|
|
ctnetlink_dump_acct(skb, ct, type) < 0 ||
|
2011-01-19 16:00:07 +01:00
|
|
|
ctnetlink_dump_timestamp(skb, ct) < 0 ||
|
2006-01-05 21:19:05 +01:00
|
|
|
ctnetlink_dump_protoinfo(skb, ct) < 0 ||
|
|
|
|
ctnetlink_dump_helpinfo(skb, ct) < 0 ||
|
|
|
|
ctnetlink_dump_mark(skb, ct) < 0 ||
|
2010-10-13 22:24:54 +02:00
|
|
|
ctnetlink_dump_secctx(skb, ct) < 0 ||
|
2013-01-11 07:30:45 +01:00
|
|
|
ctnetlink_dump_labels(skb, ct) < 0 ||
|
2006-01-05 21:19:05 +01:00
|
|
|
ctnetlink_dump_id(skb, ct) < 0 ||
|
2007-12-18 07:28:00 +01:00
|
|
|
ctnetlink_dump_use(skb, ct) < 0 ||
|
2007-12-18 07:28:19 +01:00
|
|
|
ctnetlink_dump_master(skb, ct) < 0 ||
|
2013-08-27 08:50:12 +02:00
|
|
|
ctnetlink_dump_ct_seq_adj(skb, ct) < 0)
|
2007-09-28 23:37:03 +02:00
|
|
|
goto nla_put_failure;
|
2006-01-05 21:19:05 +01:00
|
|
|
|
2009-06-02 20:07:39 +02:00
|
|
|
nlmsg_end(skb, nlh);
|
2006-01-05 21:19:05 +01:00
|
|
|
return skb->len;
|
|
|
|
|
|
|
|
nlmsg_failure:
|
2007-09-28 23:37:03 +02:00
|
|
|
nla_put_failure:
|
2009-06-02 20:07:39 +02:00
|
|
|
nlmsg_cancel(skb, nlh);
|
2006-01-05 21:19:05 +01:00
|
|
|
return -1;
|
|
|
|
}
|
|
|
|
|
2016-04-15 12:24:57 +02:00
|
|
|
static inline size_t ctnetlink_proto_size(const struct nf_conn *ct)
|
2009-03-25 21:50:59 +01:00
|
|
|
{
|
|
|
|
struct nf_conntrack_l3proto *l3proto;
|
|
|
|
struct nf_conntrack_l4proto *l4proto;
|
2009-06-02 20:08:27 +02:00
|
|
|
size_t len = 0;
|
|
|
|
|
|
|
|
rcu_read_lock();
|
|
|
|
l3proto = __nf_ct_l3proto_find(nf_ct_l3num(ct));
|
|
|
|
len += l3proto->nla_size;
|
|
|
|
|
|
|
|
l4proto = __nf_ct_l4proto_find(nf_ct_l3num(ct), nf_ct_protonum(ct));
|
|
|
|
len += l4proto->nla_size;
|
|
|
|
rcu_read_unlock();
|
|
|
|
|
|
|
|
return len;
|
|
|
|
}
|
|
|
|
|
2016-04-15 12:24:57 +02:00
|
|
|
static inline size_t ctnetlink_acct_size(const struct nf_conn *ct)
|
2010-04-01 12:39:19 +02:00
|
|
|
{
|
|
|
|
if (!nf_ct_ext_exist(ct, NF_CT_EXT_ACCT))
|
|
|
|
return 0;
|
|
|
|
return 2 * nla_total_size(0) /* CTA_COUNTERS_ORIG|REPL */
|
2016-04-22 17:31:18 +02:00
|
|
|
+ 2 * nla_total_size_64bit(sizeof(uint64_t)) /* CTA_COUNTERS_PACKETS */
|
|
|
|
+ 2 * nla_total_size_64bit(sizeof(uint64_t)) /* CTA_COUNTERS_BYTES */
|
2010-04-01 12:39:19 +02:00
|
|
|
;
|
|
|
|
}
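The size helpers in this part of the file add up per-attribute sizes before the message buffer is allocated. A rough userspace sketch of that arithmetic follows, assuming a 4-byte attribute header and 4-byte alignment. The `sk_`/`SK_` names are hypothetical. The sketch also assumes that a 64-bit attribute costs the same as an ordinary one, which in the kernel holds only where no extra padding attribute is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed netlink attribute layout: 4-byte header, 4-byte alignment. */
#define SK_NLA_ALIGNTO   ((size_t)4)
#define SK_NLA_ALIGN(n)  (((n) + SK_NLA_ALIGNTO - 1) & ~(SK_NLA_ALIGNTO - 1))
#define SK_NLA_HDRLEN    ((size_t)4)   /* 2-byte length + 2-byte type */

/* Space taken by one attribute: header plus payload, padded to 4 bytes. */
static size_t sk_nla_total_size(size_t payload)
{
	/* Guard against wrap-around before adding header and padding. */
	assert(payload <= SIZE_MAX - SK_NLA_HDRLEN - (SK_NLA_ALIGNTO - 1));
	return SK_NLA_ALIGN(SK_NLA_HDRLEN + payload);
}

/*
 * Mirrors the shape of the accounting estimate: two empty nests
 * (one per direction), each holding a 64-bit packet counter and
 * a 64-bit byte counter. Padding attributes are ignored here.
 */
static size_t sk_acct_size(void)
{
	return 2 * sk_nla_total_size(0)
	     + 2 * sk_nla_total_size(sizeof(uint64_t))
	     + 2 * sk_nla_total_size(sizeof(uint64_t));
}
```

Under these assumptions an empty nest costs 4 bytes and each 64-bit counter 12 bytes, so the estimate is 2 * 4 + 4 * 12 = 56 bytes.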
|
|
|
|
|
2016-04-15 12:24:57 +02:00
|
|
|
static inline int ctnetlink_secctx_size(const struct nf_conn *ct)
|
2010-10-13 22:24:54 +02:00
|
|
|
{
|
2011-01-06 20:25:00 +01:00
|
|
|
#ifdef CONFIG_NF_CONNTRACK_SECMARK
|
|
|
|
int len, ret;
|
2010-10-13 22:24:54 +02:00
|
|
|
|
2011-01-06 20:25:00 +01:00
|
|
|
ret = security_secid_to_secctx(ct->secmark, NULL, &len);
|
|
|
|
if (ret)
|
|
|
|
return 0;
|
2010-10-13 22:24:54 +02:00
|
|
|
|
2011-01-06 20:25:00 +01:00
|
|
|
return nla_total_size(0) /* CTA_SECCTX */
|
|
|
|
+ nla_total_size(sizeof(char) * len); /* CTA_SECCTX_NAME */
|
|
|
|
#else
|
|
|
|
return 0;
|
2010-10-13 22:24:54 +02:00
|
|
|
#endif
|
2011-01-06 20:25:00 +01:00
|
|
|
}
|
2010-10-13 22:24:54 +02:00
|
|
|
|
2016-04-15 12:24:57 +02:00
|
|
|
static inline size_t ctnetlink_timestamp_size(const struct nf_conn *ct)
|
2011-01-19 16:00:07 +01:00
|
|
|
{
|
|
|
|
#ifdef CONFIG_NF_CONNTRACK_TIMESTAMP
|
|
|
|
if (!nf_ct_ext_exist(ct, NF_CT_EXT_TSTAMP))
|
|
|
|
return 0;
|
2016-04-22 17:31:18 +02:00
|
|
|
return nla_total_size(0) + 2 * nla_total_size_64bit(sizeof(uint64_t));
|
2011-01-19 16:00:07 +01:00
|
|
|
#else
|
|
|
|
return 0;
|
|
|
|
#endif
|
|
|
|
}
|
|
|
|
|
2016-04-12 23:32:34 +02:00
|
|
|
#ifdef CONFIG_NF_CONNTRACK_EVENTS
|
|
|
|
static size_t ctnetlink_nlmsg_size(const struct nf_conn *ct)
|
2009-06-02 20:08:27 +02:00
|
|
|
{
|
|
|
|
return NLMSG_ALIGN(sizeof(struct nfgenmsg))
|
|
|
|
+ 3 * nla_total_size(0) /* CTA_TUPLE_ORIG|REPL|MASTER */
|
|
|
|
+ 3 * nla_total_size(0) /* CTA_TUPLE_IP */
|
|
|
|
+ 3 * nla_total_size(0) /* CTA_TUPLE_PROTO */
|
|
|
|
+ 3 * nla_total_size(sizeof(u_int8_t)) /* CTA_PROTO_NUM */
|
|
|
|
+ nla_total_size(sizeof(u_int32_t)) /* CTA_ID */
|
|
|
|
+ nla_total_size(sizeof(u_int32_t)) /* CTA_STATUS */
|
2013-09-26 17:31:51 +02:00
|
|
|
+ ctnetlink_acct_size(ct)
|
2011-01-19 16:00:07 +01:00
|
|
|
+ ctnetlink_timestamp_size(ct)
|
2009-06-02 20:08:27 +02:00
|
|
|
+ nla_total_size(sizeof(u_int32_t)) /* CTA_TIMEOUT */
|
|
|
|
+ nla_total_size(0) /* CTA_PROTOINFO */
|
|
|
|
+ nla_total_size(0) /* CTA_HELP */
|
|
|
|
+ nla_total_size(NF_CT_HELPER_NAME_LEN) /* CTA_HELP_NAME */
|
2011-01-06 20:25:00 +01:00
|
|
|
+ ctnetlink_secctx_size(ct)
|
2009-03-26 13:37:14 +01:00
|
|
|
#ifdef CONFIG_NF_NAT_NEEDED
|
2009-06-02 20:08:27 +02:00
|
|
|
+ 2 * nla_total_size(0) /* CTA_NAT_SEQ_ADJ_ORIG|REPL */
|
|
|
|
+ 6 * nla_total_size(sizeof(u_int32_t)) /* CTA_NAT_SEQ_OFFSET */
|
2009-03-26 13:37:14 +01:00
|
|
|
#endif
|
|
|
|
#ifdef CONFIG_NF_CONNTRACK_MARK
|
2009-06-02 20:08:27 +02:00
|
|
|
+ nla_total_size(sizeof(u_int32_t)) /* CTA_MARK */
|
2014-06-16 13:52:34 +02:00
|
|
|
#endif
|
|
|
|
#ifdef CONFIG_NF_CONNTRACK_ZONES
|
2015-08-14 16:03:39 +02:00
|
|
|
+ nla_total_size(sizeof(u_int16_t)) /* CTA_ZONE|CTA_TUPLE_ZONE */
|
2009-03-26 13:37:14 +01:00
|
|
|
#endif
|
2009-06-02 20:08:27 +02:00
|
|
|
+ ctnetlink_proto_size(ct)
|
2013-01-11 07:30:45 +01:00
|
|
|
+ ctnetlink_label_size(ct)
|
2009-06-02 20:08:27 +02:00
|
|
|
;
|
2009-03-25 21:50:59 +01:00
|
|
|
}
|
|
|
|
|
2009-06-03 10:32:06 +02:00
|
|
|
static int
|
|
|
|
ctnetlink_conntrack_event(unsigned int events, struct nf_ct_event *item)
|
2006-01-05 21:19:05 +01:00
|
|
|
{
|
2015-08-08 21:40:01 +02:00
|
|
|
const struct nf_conntrack_zone *zone;
|
2010-01-13 16:04:18 +01:00
|
|
|
struct net *net;
|
2006-01-05 21:19:05 +01:00
|
|
|
struct nlmsghdr *nlh;
|
|
|
|
struct nfgenmsg *nfmsg;
|
2007-09-28 23:37:03 +02:00
|
|
|
struct nlattr *nest_parms;
|
2008-11-18 11:56:20 +01:00
|
|
|
struct nf_conn *ct = item->ct;
|
2006-01-05 21:19:05 +01:00
|
|
|
struct sk_buff *skb;
|
|
|
|
unsigned int type;
|
|
|
|
unsigned int flags = 0, group;
|
netfilter: conntrack: optional reliable conntrack event delivery
This patch improves ctnetlink event reliability if one broadcast
listener has set the NETLINK_BROADCAST_ERROR socket option.
The logic is the following: if an event delivery fails, we keep
the undelivered events in the missed event cache. Once the next
packet arrives, we add the new events (if any) to the missed
events in the cache and we try a new delivery, and so on. Thus,
if ctnetlink fails to deliver an event, we try to deliver it again
once we see a new packet. Therefore, we may lose state
transitions but the userspace process gets in sync at some point.
In the worst case, if no events were delivered to userspace, we make
sure that destroy events are successfully delivered. Basically,
if ctnetlink fails to deliver the destroy event, we remove the
conntrack entry from the hashes and we insert it in the dying
list, which contains inactive entries. Then, the conntrack timer
is added with an extra grace timeout of random32() % 15 seconds
to trigger the event again (this grace timeout is tunable via
/proc). The use of a limited random timeout value allows
distributing the "destroy" resends, thus, avoiding accumulating
lots of "destroy" events at the same time. Event delivery may
re-order but we can identify them by means of the tuple plus
the conntrack ID.
The maximum number of conntrack entries (active or inactive) is
still handled by nf_conntrack_max. Thus, we may start dropping
packets at some point if we accumulate a lot of inactive conntrack
entries that did not successfully report the destroy event to
userspace.
During my stress tests consisting of setting a very small buffer
of 2048 bytes for conntrackd and the NETLINK_BROADCAST_ERROR socket
flag, and generating lots of very small connections, I noticed
very few destroy entries on the fly waiting to be resent.
A simple way to test this patch consists of creating a lot of
entries, set a very small Netlink buffer in conntrackd (+ a patch
which is not in the git tree to set the BROADCAST_ERROR flag)
and invoke `conntrack -F'.
For expectations, no changes are introduced in this patch.
Currently, event delivery is only done for new expectations (no
events from expectation expiration, removal and confirmation).
In that case, they need a per-expectation event cache to implement
the same idea that is exposed in this patch.
This patch can be useful to provide reliable flow-accounting. We
still have to add a new conntrack extension to store the creation
and destroy time.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
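The retry logic described in the commit message above (keep undelivered events in a missed-event cache, merge them with new events on the next packet, and try again) can be sketched as follows. This is a minimal userspace illustration under stated assumptions, not the kernel implementation: `struct event_cache`, `deliver_cached_events()` and the two listener callbacks are hypothetical names introduced for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-connection cache of events that failed delivery. */
struct event_cache {
	unsigned long missed;
};

typedef int (*deliver_fn)(unsigned long events);

/* Stand-in listeners: one always accepts, one always reports a full buffer. */
static unsigned long last_delivered;

static int listener_ok(unsigned long events)
{
	last_delivered = events;
	return 0;
}

static int listener_full(unsigned long events)
{
	(void)events;
	return -ENOBUFS;
}

/*
 * Merge new events with previously missed ones and attempt one delivery.
 * On failure the merged set stays in the cache for the next attempt;
 * on success the cache is cleared.
 */
static int deliver_cached_events(struct event_cache *cache,
				 unsigned long new_events,
				 deliver_fn deliver)
{
	unsigned long pending;
	int ret;

	assert(cache != NULL && deliver != NULL);

	pending = cache->missed | new_events;
	if (pending == 0)
		return 0;

	ret = deliver(pending);
	if (ret < 0) {
		cache->missed = pending;
		return ret;
	}

	cache->missed = 0;
	return 0;
}
```

Because the cache only records which event bits are pending, intermediate state transitions can be lost, which matches the trade-off the commit message describes: the listener eventually resynchronizes rather than seeing every transition.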
2009-06-13 12:30:52 +02:00
|
|
|
int err;
|
2006-01-05 21:19:05 +01:00
|
|
|
|
|
|
|
/* ignore our fake conntrack entry */
|
2010-06-08 16:09:52 +02:00
|
|
|
if (nf_ct_is_untracked(ct))
|
2009-06-03 10:32:06 +02:00
|
|
|
return 0;
|
2006-01-05 21:19:05 +01:00
|
|
|
|
2009-06-13 12:26:29 +02:00
|
|
|
if (events & (1 << IPCT_DESTROY)) {
|
2006-01-05 21:19:05 +01:00
|
|
|
type = IPCTNL_MSG_CT_DELETE;
|
|
|
|
group = NFNLGRP_CONNTRACK_DESTROY;
|
2009-06-13 12:26:29 +02:00
|
|
|
} else if (events & ((1 << IPCT_NEW) | (1 << IPCT_RELATED))) {
|
2006-01-05 21:19:05 +01:00
|
|
|
type = IPCTNL_MSG_CT_NEW;
|
|
|
|
flags = NLM_F_CREATE|NLM_F_EXCL;
|
|
|
|
group = NFNLGRP_CONNTRACK_NEW;
|
netfilter: conntrack: simplify event caching system
This patch simplifies the conntrack event caching system by removing
several events:
* IPCT_[*]_VOLATILE, IPCT_HELPINFO and IPCT_NATINFO have been deleted
since they have no clients.
* IPCT_COUNTER_FILLING, which is a leftover of the 32-bit counter
days.
* IPCT_REFRESH, which is not of any use since we always include the
timeout in the messages.
After this patch, the existing events are:
* IPCT_NEW, IPCT_RELATED and IPCT_DESTROY, that are used to identify
addition and deletion of entries.
* IPCT_STATUS, that notes that the status bits have changed,
e.g. IPS_SEEN_REPLY and IPS_ASSURED.
* IPCT_PROTOINFO, that reports that internal protocol information has
changed, e.g. the TCP, DCCP and SCTP protocol state.
* IPCT_HELPER, that reports that a helper has been assigned to or
unassigned from this entry.
* IPCT_MARK and IPCT_SECMARK, that report that the mark has changed; this
covers the case when a mark is set to zero.
* IPCT_NATSEQADJ, to report that there are updates in the NAT sequence
adjustment.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
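The events listed above are bit positions in a mask, and the surrounding function picks a message class from that mask in a fixed priority order (destroy first, then new/related, then any other update). A standalone sketch of that dispatch, assuming hypothetical `EV_*` names and numeric values that are not taken from the kernel headers:

```c
#include <assert.h>

/* Hypothetical event bit positions; the real IPCT_* values may differ. */
enum ev {
	EV_NEW,
	EV_RELATED,
	EV_DESTROY,
	EV_STATUS,
	EV_PROTOINFO,
	EV_HELPER,
	EV_MARK,
	EV_NATSEQADJ,
	EV_SECMARK,
	EV_MAX
};

/* All event bits must fit into an unsigned int mask. */
static_assert(EV_MAX <= 32, "event bits must fit in 32 bits");

enum ev_class {
	EV_CLASS_NONE,    /* nothing to report */
	EV_CLASS_NEW,     /* entry created (new or related) */
	EV_CLASS_UPDATE,  /* any other change to an existing entry */
	EV_CLASS_DESTROY  /* entry removed */
};

/* Destroy wins over everything else; an empty mask reports nothing. */
static enum ev_class classify_events(unsigned int events)
{
	if (events & (1u << EV_DESTROY))
		return EV_CLASS_DESTROY;
	if (events & ((1u << EV_NEW) | (1u << EV_RELATED)))
		return EV_CLASS_NEW;
	if (events)
		return EV_CLASS_UPDATE;
	return EV_CLASS_NONE;
}
```

Checking destroy first means a mask that carries both a pending "new" bit and a "destroy" bit is reported as a destroy, so a listener is never told about the creation of an entry that is already gone.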
2009-06-02 20:08:46 +02:00
|
|
|
} else if (events) {
|
2006-01-05 21:19:05 +01:00
|
|
|
type = IPCTNL_MSG_CT_NEW;
|
|
|
|
group = NFNLGRP_CONNTRACK_UPDATE;
|
|
|
|
} else
|
2009-06-03 10:32:06 +02:00
|
|
|
return 0;
|
2006-03-21 03:03:59 +01:00
|
|
|
|
2010-01-13 16:04:18 +01:00
|
|
|
net = nf_ct_net(ct);
|
|
|
|
if (!item->report && !nfnetlink_has_listeners(net, group))
|
2009-06-03 10:32:06 +02:00
|
|
|
return 0;
|
2006-03-21 03:03:59 +01:00
|
|
|
|
2009-06-02 20:08:27 +02:00
|
|
|
skb = nlmsg_new(ctnetlink_nlmsg_size(ct), GFP_ATOMIC);
|
|
|
|
if (skb == NULL)
|
2009-04-17 17:47:31 +02:00
|
|
|
goto errout;
|
2006-01-05 21:19:05 +01:00
|
|
|
|
|
|
|
type |= NFNL_SUBSYS_CTNETLINK << 8;
|
2012-09-07 22:12:54 +02:00
|
|
|
nlh = nlmsg_put(skb, item->portid, 0, type, sizeof(*nfmsg), flags);
|
2009-06-02 20:07:39 +02:00
|
|
|
if (nlh == NULL)
|
|
|
|
goto nlmsg_failure;
|
2006-01-05 21:19:05 +01:00
|
|
|
|
2009-06-02 20:07:39 +02:00
|
|
|
nfmsg = nlmsg_data(nlh);
|
2008-04-14 11:15:52 +02:00
|
|
|
nfmsg->nfgen_family = nf_ct_l3num(ct);
|
2006-01-05 21:19:05 +01:00
|
|
|
nfmsg->version = NFNETLINK_V0;
|
|
|
|
nfmsg->res_id = 0;
|
|
|
|
|
2008-11-17 16:00:40 +01:00
|
|
|
rcu_read_lock();
|
2015-08-14 16:03:39 +02:00
|
|
|
zone = nf_ct_zone(ct);
|
|
|
|
|
2007-09-28 23:37:03 +02:00
|
|
|
nest_parms = nla_nest_start(skb, CTA_TUPLE_ORIG | NLA_F_NESTED);
|
|
|
|
if (!nest_parms)
|
|
|
|
goto nla_put_failure;
|
2009-06-02 20:03:35 +02:00
|
|
|
if (ctnetlink_dump_tuples(skb, nf_ct_tuple(ct, IP_CT_DIR_ORIGINAL)) < 0)
|
2007-09-28 23:37:03 +02:00
|
|
|
goto nla_put_failure;
|
2015-08-14 16:03:39 +02:00
|
|
|
if (ctnetlink_dump_zone_id(skb, CTA_TUPLE_ZONE, zone,
|
|
|
|
NF_CT_ZONE_DIR_ORIG) < 0)
|
|
|
|
goto nla_put_failure;
|
2007-09-28 23:37:03 +02:00
|
|
|
nla_nest_end(skb, nest_parms);
|
2007-02-12 20:15:49 +01:00
|
|
|
|
2007-09-28 23:37:03 +02:00
|
|
|
nest_parms = nla_nest_start(skb, CTA_TUPLE_REPLY | NLA_F_NESTED);
|
|
|
|
if (!nest_parms)
|
|
|
|
goto nla_put_failure;
|
2009-06-02 20:03:35 +02:00
|
|
|
if (ctnetlink_dump_tuples(skb, nf_ct_tuple(ct, IP_CT_DIR_REPLY)) < 0)
|
2007-09-28 23:37:03 +02:00
|
|
|
goto nla_put_failure;
|
2015-08-14 16:03:39 +02:00
|
|
|
if (ctnetlink_dump_zone_id(skb, CTA_TUPLE_ZONE, zone,
|
|
|
|
NF_CT_ZONE_DIR_REPL) < 0)
|
|
|
|
goto nla_put_failure;
|
2007-09-28 23:37:03 +02:00
|
|
|
nla_nest_end(skb, nest_parms);
|
2006-01-05 21:19:05 +01:00
|
|
|
|
2015-08-14 16:03:39 +02:00
|
|
|
if (ctnetlink_dump_zone_id(skb, CTA_ZONE, zone,
|
|
|
|
NF_CT_DEFAULT_ZONE_DIR) < 0)
|
2012-04-02 00:57:48 +02:00
|
|
|
goto nla_put_failure;
|
2010-02-15 18:14:57 +01:00
|
|
|
|
2008-05-14 08:27:11 +02:00
|
|
|
if (ctnetlink_dump_id(skb, ct) < 0)
|
|
|
|
goto nla_put_failure;
|
|
|
|
|
2008-06-10 00:59:58 +02:00
|
|
|
if (ctnetlink_dump_status(skb, ct) < 0)
|
|
|
|
goto nla_put_failure;
|
|
|
|
|
2009-06-13 12:26:29 +02:00
|
|
|
if (events & (1 << IPCT_DESTROY)) {
|
2013-09-26 17:31:52 +02:00
|
|
|
if (ctnetlink_dump_acct(skb, ct, type) < 0 ||
|
2011-01-19 16:00:07 +01:00
|
|
|
ctnetlink_dump_timestamp(skb, ct) < 0)
|
2007-09-28 23:37:03 +02:00
|
|
|
goto nla_put_failure;
|
2006-11-29 02:35:32 +01:00
|
|
|
} else {
|
|
|
|
if (ctnetlink_dump_timeout(skb, ct) < 0)
|
2007-09-28 23:37:03 +02:00
|
|
|
goto nla_put_failure;
|
2006-11-29 02:35:32 +01:00
|
|
|
|
2009-06-13 12:26:29 +02:00
|
|
|
if (events & (1 << IPCT_PROTOINFO)
|
2006-11-29 02:35:32 +01:00
|
|
|
&& ctnetlink_dump_protoinfo(skb, ct) < 0)
|
2007-09-28 23:37:03 +02:00
|
|
|
goto nla_put_failure;
|
2006-11-29 02:35:32 +01:00
|
|
|
|
2009-06-13 12:26:29 +02:00
|
|
|
if ((events & (1 << IPCT_HELPER) || nfct_help(ct))
|
2006-11-29 02:35:32 +01:00
|
|
|
&& ctnetlink_dump_helpinfo(skb, ct) < 0)
|
2007-09-28 23:37:03 +02:00
|
|
|
goto nla_put_failure;
|
2006-11-29 02:35:32 +01:00
|
|
|
|
2010-10-20 00:17:32 +02:00
|
|
|
#ifdef CONFIG_NF_CONNTRACK_SECMARK
|
2009-06-13 12:26:29 +02:00
|
|
|
if ((events & (1 << IPCT_SECMARK) || ct->secmark)
|
2010-10-13 22:24:54 +02:00
|
|
|
&& ctnetlink_dump_secctx(skb, ct) < 0)
|
2007-12-18 07:28:41 +01:00
|
|
|
goto nla_put_failure;
|
2010-10-20 00:17:32 +02:00
|
|
|
#endif
|
2013-01-11 07:30:45 +01:00
|
|
|
if (events & (1 << IPCT_LABEL) &&
|
|
|
|
ctnetlink_dump_labels(skb, ct) < 0)
|
|
|
|
goto nla_put_failure;
|
2006-11-29 02:35:32 +01:00
|
|
|
|
2009-06-13 12:26:29 +02:00
|
|
|
if (events & (1 << IPCT_RELATED) &&
|
2007-12-18 07:28:19 +01:00
|
|
|
ctnetlink_dump_master(skb, ct) < 0)
|
|
|
|
goto nla_put_failure;
|
|
|
|
|
2013-08-27 08:50:12 +02:00
|
|
|
if (events & (1 << IPCT_SEQADJ) &&
|
|
|
|
ctnetlink_dump_ct_seq_adj(skb, ct) < 0)
|
2007-12-18 07:28:00 +01:00
|
|
|
goto nla_put_failure;
|
2006-11-29 02:35:32 +01:00
|
|
|
}
|
2006-08-22 09:31:49 +02:00
|
|
|
|
2008-01-31 13:44:27 +01:00
|
|
|
#ifdef CONFIG_NF_CONNTRACK_MARK
|
2009-06-13 12:26:29 +02:00
|
|
|
if ((events & (1 << IPCT_MARK) || ct->mark)
|
2008-01-31 13:44:27 +01:00
|
|
|
&& ctnetlink_dump_mark(skb, ct) < 0)
|
|
|
|
goto nla_put_failure;
|
|
|
|
#endif
|
2008-11-17 16:00:40 +01:00
|
|
|
rcu_read_unlock();
|
2008-01-31 13:44:27 +01:00
|
|
|
|
2009-06-02 20:07:39 +02:00
|
|
|
nlmsg_end(skb, nlh);
|
2012-09-07 22:12:54 +02:00
|
|
|
err = nfnetlink_send(skb, net, item->portid, group, item->report,
|
2010-01-13 16:02:14 +01:00
|
|
|
GFP_ATOMIC);
|
2009-06-13 12:30:52 +02:00
|
|
|
if (err == -ENOBUFS || err == -EAGAIN)
|
|
|
|
return -ENOBUFS;
|
|
|
|
|
2009-06-03 10:32:06 +02:00
|
|
|
return 0;
|
2006-01-05 21:19:05 +01:00
|
|
|
|
2007-09-28 23:37:03 +02:00
|
|
|
nla_put_failure:
|
2008-11-17 16:00:40 +01:00
|
|
|
rcu_read_unlock();
|
2009-06-02 20:07:39 +02:00
|
|
|
nlmsg_cancel(skb, nlh);
|
2008-11-17 16:00:40 +01:00
|
|
|
nlmsg_failure:
|
2006-01-05 21:19:05 +01:00
|
|
|
kfree_skb(skb);
|
2009-04-17 17:47:31 +02:00
|
|
|
errout:
|
2010-03-16 14:30:21 +01:00
|
|
|
if (nfnetlink_set_err(net, 0, group, -ENOBUFS) > 0)
|
|
|
|
return -ENOBUFS;
|
|
|
|
|
2009-06-03 10:32:06 +02:00
|
|
|
return 0;
|
2006-01-05 21:19:05 +01:00
|
|
|
}
|
|
|
|
#endif /* CONFIG_NF_CONNTRACK_EVENTS */
|
|
|
|
|
|
|
|
static int ctnetlink_done(struct netlink_callback *cb)
|
|
|
|
{
|
2006-05-30 03:24:58 +02:00
|
|
|
if (cb->args[1])
|
|
|
|
nf_ct_put((struct nf_conn *)cb->args[1]);
|
2014-06-20 22:38:58 +02:00
|
|
|
kfree(cb->data);
|
2006-01-05 21:19:05 +01:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2014-12-24 09:57:10 +01:00
|
|
|
struct ctnetlink_filter {
|
2012-02-24 15:41:50 +01:00
|
|
|
struct {
|
|
|
|
u_int32_t val;
|
|
|
|
u_int32_t mask;
|
|
|
|
} mark;
|
|
|
|
};
|
|
|
|
|
2014-12-24 09:57:10 +01:00
|
|
|
static struct ctnetlink_filter *
|
|
|
|
ctnetlink_alloc_filter(const struct nlattr * const cda[])
|
|
|
|
{
|
|
|
|
#ifdef CONFIG_NF_CONNTRACK_MARK
|
|
|
|
struct ctnetlink_filter *filter;
|
|
|
|
|
|
|
|
filter = kzalloc(sizeof(*filter), GFP_KERNEL);
|
|
|
|
if (filter == NULL)
|
|
|
|
return ERR_PTR(-ENOMEM);
|
|
|
|
|
|
|
|
filter->mark.val = ntohl(nla_get_be32(cda[CTA_MARK]));
|
|
|
|
filter->mark.mask = ntohl(nla_get_be32(cda[CTA_MARK_MASK]));
|
|
|
|
|
|
|
|
return filter;
|
|
|
|
#else
|
|
|
|
return ERR_PTR(-EOPNOTSUPP);
|
|
|
|
#endif
|
|
|
|
}
|
|
|
|
|
|
|
|
static int ctnetlink_filter_match(struct nf_conn *ct, void *data)
|
|
|
|
{
|
|
|
|
struct ctnetlink_filter *filter = data;
|
|
|
|
|
|
|
|
if (filter == NULL)
|
|
|
|
return 1;
|
|
|
|
|
|
|
|
#ifdef CONFIG_NF_CONNTRACK_MARK
|
|
|
|
if ((ct->mark & filter->mark.mask) == filter->mark.val)
|
|
|
|
return 1;
|
|
|
|
#endif
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2006-01-05 21:19:05 +01:00
|
|
|
static int
|
|
|
|
ctnetlink_dump_table(struct sk_buff *skb, struct netlink_callback *cb)
|
|
|
|
{
|
2010-01-13 16:04:18 +01:00
|
|
|
struct net *net = sock_net(skb->sk);
|
2006-05-30 03:24:58 +02:00
|
|
|
struct nf_conn *ct, *last;
|
2006-01-05 21:19:05 +01:00
|
|
|
struct nf_conntrack_tuple_hash *h;
|
2009-03-25 21:05:46 +01:00
|
|
|
struct hlist_nulls_node *n;
|
2009-06-02 20:07:39 +02:00
|
|
|
struct nfgenmsg *nfmsg = nlmsg_data(cb->nlh);
|
2006-01-05 21:19:23 +01:00
|
|
|
u_int8_t l3proto = nfmsg->nfgen_family;
|
2016-08-25 15:33:32 +02:00
|
|
|
struct nf_conn *nf_ct_evict[8];
|
|
|
|
int res, i;
|
2014-03-03 14:46:13 +01:00
|
|
|
spinlock_t *lockp;
|
|
|
|
|
2006-08-18 03:12:38 +02:00
|
|
|
last = (struct nf_conn *)cb->args[1];
|
2016-08-25 15:33:32 +02:00
|
|
|
i = 0;
|
2014-03-03 14:46:13 +01:00
|
|
|
|
|
|
|
local_bh_disable();
|
2016-05-02 18:39:55 +02:00
|
|
|
for (; cb->args[0] < nf_conntrack_htable_size; cb->args[0]++) {
|
2006-05-30 03:24:58 +02:00
|
|
|
restart:
|
2016-08-25 15:33:32 +02:00
|
|
|
while (i) {
|
|
|
|
i--;
|
|
|
|
if (nf_ct_should_gc(nf_ct_evict[i]))
|
|
|
|
nf_ct_kill(nf_ct_evict[i]);
|
|
|
|
nf_ct_put(nf_ct_evict[i]);
|
|
|
|
}
|
|
|
|
|
2014-03-03 14:46:13 +01:00
|
|
|
lockp = &nf_conntrack_locks[cb->args[0] % CONNTRACK_LOCKS];
|
2016-01-19 01:23:51 +01:00
|
|
|
nf_conntrack_lock(lockp);
|
2016-05-02 18:39:55 +02:00
|
|
|
if (cb->args[0] >= nf_conntrack_htable_size) {
|
2014-03-03 14:46:13 +01:00
|
|
|
spin_unlock(lockp);
|
|
|
|
goto out;
|
|
|
|
}
|
2016-05-02 18:39:55 +02:00
|
|
|
hlist_nulls_for_each_entry(h, n, &nf_conntrack_hash[cb->args[0]],
|
|
|
|
hnnode) {
|
2006-12-03 07:07:13 +01:00
|
|
|
if (NF_CT_DIRECTION(h) != IP_CT_DIR_ORIGINAL)
|
2006-01-05 21:19:05 +01:00
|
|
|
continue;
|
|
|
|
ct = nf_ct_tuplehash_to_ctrack(h);
|
2016-08-25 15:33:32 +02:00
|
|
|
if (nf_ct_is_expired(ct)) {
|
|
|
|
if (i < ARRAY_SIZE(nf_ct_evict) &&
|
|
|
|
atomic_inc_not_zero(&ct->ct_general.use))
|
|
|
|
nf_ct_evict[i++] = ct;
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
2016-04-28 19:13:45 +02:00
|
|
|
if (!net_eq(net, nf_ct_net(ct)))
|
|
|
|
continue;
|
|
|
|
|
2006-01-05 21:19:23 +01:00
|
|
|
/* Dump entries of a given L3 protocol number.
|
|
|
|
* If it is not specified, ie. l3proto == 0,
|
|
|
|
* then dump everything. */
|
2008-04-14 11:15:52 +02:00
|
|
|
if (l3proto && nf_ct_l3num(ct) != l3proto)
|
2011-01-11 23:54:42 +01:00
|
|
|
continue;
|
2006-08-18 03:12:38 +02:00
|
|
|
if (cb->args[1]) {
|
|
|
|
if (ct != last)
|
2011-01-11 23:54:42 +01:00
|
|
|
continue;
|
2006-08-18 03:12:38 +02:00
|
|
|
cb->args[1] = 0;
|
2006-05-30 03:24:58 +02:00
|
|
|
}
|
2014-12-24 09:57:10 +01:00
|
|
|
if (!ctnetlink_filter_match(ct, cb->data))
|
2012-02-24 15:41:50 +01:00
|
|
|
continue;
|
2014-12-24 09:57:10 +01:00
|
|
|
|
2012-03-05 03:24:29 +01:00
|
|
|
rcu_read_lock();
|
|
|
|
res =
|
2012-09-07 22:12:54 +02:00
|
|
|
ctnetlink_fill_info(skb, NETLINK_CB(cb->skb).portid,
|
2012-03-05 03:24:29 +01:00
|
|
|
cb->nlh->nlmsg_seq,
|
|
|
|
NFNL_MSG_TYPE(cb->nlh->nlmsg_type),
|
|
|
|
ct);
|
|
|
|
rcu_read_unlock();
|
|
|
|
if (res < 0) {
|
2011-01-24 19:01:07 +01:00
|
|
|
nf_conntrack_get(&ct->ct_general);
|
2006-05-30 03:24:58 +02:00
|
|
|
cb->args[1] = (unsigned long)ct;
|
2014-03-03 14:46:13 +01:00
|
|
|
spin_unlock(lockp);
|
2006-01-05 21:19:05 +01:00
|
|
|
goto out;
|
2006-05-30 03:24:58 +02:00
|
|
|
}
|
|
|
|
}
|
2014-03-03 14:46:13 +01:00
|
|
|
spin_unlock(lockp);
|
2006-08-18 03:12:38 +02:00
|
|
|
if (cb->args[1]) {
|
2006-05-30 03:24:58 +02:00
|
|
|
cb->args[1] = 0;
|
|
|
|
goto restart;
|
2006-01-05 21:19:05 +01:00
|
|
|
}
|
|
|
|
}
|
2006-05-30 03:24:58 +02:00
|
|
|
out:
|
2014-03-03 14:46:13 +01:00
|
|
|
local_bh_enable();
|
2006-08-18 03:12:38 +02:00
|
|
|
if (last)
|
|
|
|
nf_ct_put(last);
|
2006-01-05 21:19:05 +01:00
|
|
|
|
2016-08-25 15:33:32 +02:00
|
|
|
while (i) {
|
|
|
|
i--;
|
|
|
|
if (nf_ct_should_gc(nf_ct_evict[i]))
|
|
|
|
nf_ct_kill(nf_ct_evict[i]);
|
|
|
|
nf_ct_put(nf_ct_evict[i]);
|
|
|
|
}
|
|
|
|
|
2006-01-05 21:19:05 +01:00
|
|
|
return skb->len;
|
|
|
|
}
|
|
|
|
|
2016-04-12 23:32:34 +02:00
|
|
|
static int ctnetlink_parse_tuple_ip(struct nlattr *attr,
|
|
|
|
struct nf_conntrack_tuple *tuple)
|
2006-01-05 21:19:05 +01:00
|
|
|
{
|
2007-09-28 23:37:03 +02:00
|
|
|
struct nlattr *tb[CTA_IP_MAX+1];
|
2006-01-05 21:19:05 +01:00
|
|
|
struct nf_conntrack_l3proto *l3proto;
|
|
|
|
int ret = 0;
|
|
|
|
|
2013-06-12 17:54:51 +02:00
|
|
|
ret = nla_parse_nested(tb, CTA_IP_MAX, attr, NULL);
|
|
|
|
if (ret < 0)
|
|
|
|
return ret;
|
2006-01-05 21:19:05 +01:00
|
|
|
|
2009-03-18 17:28:37 +01:00
|
|
|
rcu_read_lock();
|
|
|
|
l3proto = __nf_ct_l3proto_find(tuple->src.l3num);
|
2006-01-05 21:19:05 +01:00
|
|
|
|
2007-09-28 23:39:55 +02:00
|
|
|
if (likely(l3proto->nlattr_to_tuple)) {
|
|
|
|
ret = nla_validate_nested(attr, CTA_IP_MAX,
|
|
|
|
l3proto->nla_policy);
|
|
|
|
if (ret == 0)
|
|
|
|
ret = l3proto->nlattr_to_tuple(tb, tuple);
|
|
|
|
}
|
2006-01-05 21:19:05 +01:00
|
|
|
|
2009-03-18 17:28:37 +01:00
|
|
|
rcu_read_unlock();
|
2006-01-05 21:19:05 +01:00
|
|
|
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2007-09-28 23:39:55 +02:00
|
|
|
static const struct nla_policy proto_nla_policy[CTA_PROTO_MAX+1] = {
|
|
|
|
[CTA_PROTO_NUM] = { .type = NLA_U8 },
|
2006-01-05 21:19:05 +01:00
|
|
|
};
|
|
|
|
|
2016-04-12 23:32:34 +02:00
|
|
|
static int ctnetlink_parse_tuple_proto(struct nlattr *attr,
|
|
|
|
struct nf_conntrack_tuple *tuple)
|
2006-01-05 21:19:05 +01:00
|
|
|
{
|
2007-09-28 23:37:03 +02:00
|
|
|
struct nlattr *tb[CTA_PROTO_MAX+1];
|
2006-11-29 02:35:06 +01:00
|
|
|
struct nf_conntrack_l4proto *l4proto;
|
2006-01-05 21:19:05 +01:00
|
|
|
int ret = 0;
|
|
|
|
|
2007-09-28 23:39:55 +02:00
|
|
|
ret = nla_parse_nested(tb, CTA_PROTO_MAX, attr, proto_nla_policy);
|
|
|
|
if (ret < 0)
|
|
|
|
return ret;
|
2006-01-05 21:19:05 +01:00
|
|
|
|
2007-09-28 23:37:03 +02:00
|
|
|
if (!tb[CTA_PROTO_NUM])
|
2006-01-05 21:19:05 +01:00
|
|
|
return -EINVAL;
|
2007-12-18 07:29:45 +01:00
|
|
|
tuple->dst.protonum = nla_get_u8(tb[CTA_PROTO_NUM]);
|
2006-01-05 21:19:05 +01:00
|
|
|
|
2009-03-18 17:28:37 +01:00
|
|
|
rcu_read_lock();
|
|
|
|
l4proto = __nf_ct_l4proto_find(tuple->src.l3num, tuple->dst.protonum);
|
2006-01-05 21:19:05 +01:00
|
|
|
|
2007-09-28 23:39:55 +02:00
|
|
|
if (likely(l4proto->nlattr_to_tuple)) {
|
|
|
|
ret = nla_validate_nested(attr, CTA_PROTO_MAX,
|
|
|
|
l4proto->nla_policy);
|
|
|
|
if (ret == 0)
|
|
|
|
ret = l4proto->nlattr_to_tuple(tb, tuple);
|
|
|
|
}
|
2006-01-05 21:19:05 +01:00
|
|
|
|
2009-03-18 17:28:37 +01:00
|
|
|
rcu_read_unlock();
|
2007-02-12 20:15:49 +01:00
|
|
|
|
2006-01-05 21:19:05 +01:00
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
netfilter: nf_conntrack: add direction support for zones
This work adds a direction parameter to netfilter zones, so identity
separation can be performed only in original/reply or both directions
(default). This basically opens up the possibility of doing NAT with
conflicting IP address/port tuples from multiple, isolated tenants
on a host (e.g. from a netns) without requiring each tenant to NAT
twice or to use its own dedicated IP address to SNAT to, meaning
overlapping tuples can be made unique with the zone identifier in
original direction, where the NAT engine will then allocate a unique
tuple in the commonly shared default zone for the reply direction.
In some restricted, local DNAT cases, also port redirection could be
used for making the reply traffic unique w/o requiring SNAT.
The consensus we've reached and discussed at NFWS and since the initial
implementation [1] was to directly integrate the direction meta data
into the existing zones infrastructure, as opposed to the ct->mark
approach we proposed initially.
As we pass the nf_conntrack_zone object directly around, we don't have
to touch all call-sites, but only those, that contain equality checks
of zones. Thus, based on the current direction (original or reply),
we either return the actual id, or the default NF_CT_DEFAULT_ZONE_ID.
CT expectations are direction-agnostic entities when expectations are
being compared among themselves, so we can only use the identifier
in this case.
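The direction-dependent identifier described above can be sketched as follows. This is a self-contained userspace approximation under stated assumptions: the `sketch_`-prefixed types, constants and functions are hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the conntrack direction and zone types. */
enum sketch_dir {
	SKETCH_DIR_ORIGINAL = 0,
	SKETCH_DIR_REPLY = 1,
};

#define SKETCH_ZONE_DIR_ORIG	(1u << SKETCH_DIR_ORIGINAL)
#define SKETCH_ZONE_DIR_REPL	(1u << SKETCH_DIR_REPLY)
#define SKETCH_DEFAULT_ZONE_DIR	(SKETCH_ZONE_DIR_ORIG | SKETCH_ZONE_DIR_REPL)
#define SKETCH_DEFAULT_ZONE_ID	0

struct sketch_zone {
	uint16_t id;
	uint8_t dir;	/* bitmask of directions in which the id applies */
};

/* Return the zone id if it applies to `dir`, else the default id. */
uint16_t sketch_zone_id(const struct sketch_zone *zone, enum sketch_dir dir)
{
	assert(zone != NULL);
	return (zone->dir & (1u << dir)) ? zone->id : SKETCH_DEFAULT_ZONE_ID;
}

/* Equality check as seen from one direction. */
int sketch_zone_equal(const struct sketch_zone *a,
		      const struct sketch_zone *b, enum sketch_dir dir)
{
	return sketch_zone_id(a, dir) == sketch_zone_id(b, dir);
}
```

With two zones that apply only in the original direction, ids 1 and 2 compare as different for original-direction checks and as equal (both default) for reply-direction checks. This is what lets the reply tuples live in the shared default zone.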
Note that zone identifiers cannot be included in the hash mix
anymore, as they don't contain a "stable" value that would be equal
for both directions at all times. For example, if zone->id were
unconditionally xor'ed into the table slot hash, then replies would not
find the corresponding conntrack entry anymore.
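A toy model makes the hashing problem concrete. The sketch below is an illustration only: the mixing function is a made-up stand-in for the kernel's hash, and the names and the table size of 1024 are assumptions chosen for the example.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_TABLE_SIZE 1024u	/* assumed power-of-two table size */

/* Toy tuple hash; a stand-in, not the hash the kernel uses. */
uint32_t sketch_tuple_hash(uint32_t src, uint32_t dst,
			   uint16_t sport, uint16_t dport)
{
	uint32_t words[3] = { src, dst, ((uint32_t)sport << 16) | dport };
	uint32_t h = 2166136261u;
	int i;

	for (i = 0; i < 3; i++) {
		h ^= words[i];
		h *= 16777619u;
	}
	return h;
}

/* Slot computation that unconditionally mixes in a zone id. */
uint32_t sketch_slot_with_zone(uint32_t tuple_hash, uint16_t zone_id)
{
	return (tuple_hash ^ zone_id) % SKETCH_TABLE_SIZE;
}

/* Slot computation that depends on the tuple alone. */
uint32_t sketch_slot_without_zone(uint32_t tuple_hash)
{
	return tuple_hash % SKETCH_TABLE_SIZE;
}
```

Suppose a reply tuple is inserted with zone id 1 mixed in but looked up from the reply direction, where the zone resolves to the default id 0. The two slot computations then disagree and the lookup misses. Hashing the tuple alone gives the same slot in both cases, leaving the zone to be compared during the bucket walk.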
If no particular direction is specified when configuring zones, the
behaviour is exactly as we expect currently (both directions).
Support has been added for the CT netlink interface as well as the
x_tables raw CT target, which both already offer existing interfaces
to user space for the configuration of zones.
Below is a minimal, simplified collision example (script in [2]) with
netperf sessions:
  +--- tenant-1 ---+   mark := 1
  |    netperf     |--+
  +----------------+  |                CT zone := mark [ORIGINAL]
   [ip,sport] := X   +--------------+  +--- gateway ---+
                     | mark routing |--|     SNAT      |-- ... +
                     +--------------+  +---------------+       |
  +--- tenant-2 ---+  |                                     ~~~|~~~
  |    netperf     |--+                +-----------+           |
  +----------------+   mark := 2       | netserver |------ ... +
   [ip,sport] := X                     +-----------+
                                        [ip,port] := Y
On the gateway netns, example:
iptables -t raw -A PREROUTING -j CT --zone mark --zone-dir ORIGINAL
iptables -t nat -A POSTROUTING -o <dev> -j SNAT --to-source <ip> --random-fully
iptables -t mangle -A PREROUTING -m conntrack --ctdir ORIGINAL -j CONNMARK --save-mark
iptables -t mangle -A POSTROUTING -m conntrack --ctdir REPLY -j CONNMARK --restore-mark
conntrack dump from gateway netns:
netperf -H 10.1.1.2 -t TCP_STREAM -l60 -p12865,5555 from each tenant netns
tcp 6 431995 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=5555 dport=12865 zone-orig=1
src=10.1.1.2 dst=10.1.1.1 sport=12865 dport=1024
[ASSURED] mark=1 secctx=system_u:object_r:unlabeled_t:s0 use=1
tcp 6 431994 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=5555 dport=12865 zone-orig=2
src=10.1.1.2 dst=10.1.1.1 sport=12865 dport=5555
[ASSURED] mark=2 secctx=system_u:object_r:unlabeled_t:s0 use=1
tcp 6 299 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=39438 dport=33768 zone-orig=1
src=10.1.1.2 dst=10.1.1.1 sport=33768 dport=39438
[ASSURED] mark=1 secctx=system_u:object_r:unlabeled_t:s0 use=1
tcp 6 300 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=32889 dport=40206 zone-orig=2
src=10.1.1.2 dst=10.1.1.1 sport=40206 dport=32889
[ASSURED] mark=2 secctx=system_u:object_r:unlabeled_t:s0 use=2
Taking this further, the test script in [2] creates 200 tenants, each
running netperf sessions with colliding original tuples. A conntrack -L
dump in the gateway netns also confirms 200 overlapping entries, all in
ESTABLISHED state as expected.
I also ran various other tests with some permutations of the script,
to mention some: SNAT in random/random-fully/persistent mode, no zones (no
overlaps), static zones (original, reply, both directions), etc.
[1] http://thread.gmane.org/gmane.comp.security.firewalls.netfilter.devel/57412/
[2] https://paste.fedoraproject.org/242835/65657871/
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-08-14 16:03:39 +02:00
|
|
|
static int
|
|
|
|
ctnetlink_parse_zone(const struct nlattr *attr,
|
|
|
|
struct nf_conntrack_zone *zone)
|
|
|
|
{
|
2015-08-14 16:03:40 +02:00
|
|
|
nf_ct_zone_init(zone, NF_CT_DEFAULT_ZONE_ID,
|
|
|
|
NF_CT_DEFAULT_ZONE_DIR, 0);
|
2015-08-14 16:03:39 +02:00
|
|
|
#ifdef CONFIG_NF_CONNTRACK_ZONES
|
|
|
|
if (attr)
|
|
|
|
zone->id = ntohs(nla_get_be16(attr));
|
|
|
|
#else
|
|
|
|
if (attr)
|
|
|
|
return -EOPNOTSUPP;
|
|
|
|
#endif
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
|
|
|
ctnetlink_parse_tuple_zone(struct nlattr *attr, enum ctattr_type type,
|
|
|
|
struct nf_conntrack_zone *zone)
|
|
|
|
{
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
if (zone->id != NF_CT_DEFAULT_ZONE_ID)
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
ret = ctnetlink_parse_zone(attr, zone);
|
|
|
|
if (ret < 0)
|
|
|
|
return ret;
|
|
|
|
|
|
|
|
if (type == CTA_TUPLE_REPLY)
|
|
|
|
zone->dir = NF_CT_ZONE_DIR_REPL;
|
|
|
|
else
|
|
|
|
zone->dir = NF_CT_ZONE_DIR_ORIG;
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2010-02-10 15:38:33 +01:00
|
|
|
static const struct nla_policy tuple_nla_policy[CTA_TUPLE_MAX+1] = {
|
|
|
|
[CTA_TUPLE_IP] = { .type = NLA_NESTED },
|
|
|
|
[CTA_TUPLE_PROTO] = { .type = NLA_NESTED },
|
2015-08-14 16:03:39 +02:00
|
|
|
[CTA_TUPLE_ZONE] = { .type = NLA_U16 },
|
2010-02-10 15:38:33 +01:00
|
|
|
};
|
|
|
|
|
[NETFILTER]: Kill some supper dupper bloatry
/me awards the bloatiest-of-all-net/-.c-code award to
nf_conntrack_netlink.c, congratulations to all the authors :-/!
Hall of (unquestionable) fame (measured per inline, top 10 under
net/):
-4496 ctnetlink_parse_tuple netfilter/nf_conntrack_netlink.c
-2165 ctnetlink_dump_tuples netfilter/nf_conntrack_netlink.c
-2115 __ip_vs_get_out_rt ipv4/ipvs/ip_vs_xmit.c
-1924 xfrm_audit_helper_pktinfo xfrm/xfrm_state.c
-1799 ctnetlink_parse_tuple_proto netfilter/nf_conntrack_netlink.c
-1268 ctnetlink_parse_tuple_ip netfilter/nf_conntrack_netlink.c
-1093 ctnetlink_exp_dump_expect netfilter/nf_conntrack_netlink.c
-1060 void ccid3_update_send_interval dccp/ccids/ccid3.c
-983 ctnetlink_dump_tuples_proto netfilter/nf_conntrack_netlink.c
-827 ctnetlink_exp_dump_tuple netfilter/nf_conntrack_netlink.c
(i386 / gcc (GCC) 4.1.2 20070626 (Red Hat 4.1.2-13) /
allyesconfig except CONFIG_FORCED_INLINING)
...and I left < 200 byte gains as future work item.
After iterative inline removal, I finally have this:
net/netfilter/nf_conntrack_netlink.c:
ctnetlink_exp_fill_info | -1104
ctnetlink_new_expect | -1572
ctnetlink_fill_info | -1303
ctnetlink_new_conntrack | -2230
ctnetlink_get_expect | -341
ctnetlink_del_expect | -352
ctnetlink_expect_event | -1110
ctnetlink_conntrack_event | -1548
ctnetlink_del_conntrack | -729
ctnetlink_get_conntrack | -728
10 functions changed, 11017 bytes removed, diff: -11017
net/netfilter/nf_conntrack_netlink.c:
ctnetlink_parse_tuple | +419
dump_nat_seq_adj | +183
ctnetlink_dump_counters | +166
ctnetlink_dump_tuples | +261
ctnetlink_exp_dump_expect | +633
ctnetlink_change_status | +460
6 functions changed, 2122 bytes added, diff: +2122
net/netfilter/nf_conntrack_netlink.o:
16 functions changed, 2122 bytes added, 11017 bytes removed, diff: -8895
Without a number of CONFIG.*DEBUGs, I got this:
net/netfilter/nf_conntrack_netlink.o:
16 functions changed, 2122 bytes added, 11029 bytes removed, diff: -8907
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-06 08:11:31 +01:00
|
|
|
static int
|
2009-08-25 16:07:58 +02:00
|
|
|
ctnetlink_parse_tuple(const struct nlattr * const cda[],
|
|
|
|
struct nf_conntrack_tuple *tuple,
|
2015-08-14 16:03:39 +02:00
|
|
|
enum ctattr_type type, u_int8_t l3num,
|
|
|
|
struct nf_conntrack_zone *zone)
|
2006-01-05 21:19:05 +01:00
|
|
|
{
|
2007-09-28 23:37:03 +02:00
|
|
|
struct nlattr *tb[CTA_TUPLE_MAX+1];
|
2006-01-05 21:19:05 +01:00
|
|
|
int err;
|
|
|
|
|
|
|
|
memset(tuple, 0, sizeof(*tuple));
|
|
|
|
|
2013-06-12 17:54:51 +02:00
|
|
|
err = nla_parse_nested(tb, CTA_TUPLE_MAX, cda[type], tuple_nla_policy);
|
|
|
|
if (err < 0)
|
|
|
|
return err;
|
2006-01-05 21:19:05 +01:00
|
|
|
|
2007-09-28 23:37:03 +02:00
|
|
|
if (!tb[CTA_TUPLE_IP])
|
2006-01-05 21:19:05 +01:00
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
tuple->src.l3num = l3num;
|
|
|
|
|
2007-09-28 23:37:03 +02:00
|
|
|
err = ctnetlink_parse_tuple_ip(tb[CTA_TUPLE_IP], tuple);
|
2006-01-05 21:19:05 +01:00
|
|
|
if (err < 0)
|
|
|
|
return err;
|
|
|
|
|
2007-09-28 23:37:03 +02:00
|
|
|
if (!tb[CTA_TUPLE_PROTO])
|
2006-01-05 21:19:05 +01:00
|
|
|
return -EINVAL;
|
|
|
|
|
2007-09-28 23:37:03 +02:00
|
|
|
err = ctnetlink_parse_tuple_proto(tb[CTA_TUPLE_PROTO], tuple);
|
2006-01-05 21:19:05 +01:00
|
|
|
if (err < 0)
|
|
|
|
return err;
|
|
|
|
|
netfilter: nf_conntrack: add direction support for zones
This work adds a direction parameter to netfilter zones, so identity
separation can be performed only in original/reply or both directions
(default). This basically opens up the possibility of doing NAT with
conflicting IP address/port tuples from multiple, isolated tenants
on a host (e.g. from a netns) without requiring each tenant to NAT
twice resp. to use its own dedicated IP address to SNAT to, meaning
overlapping tuples can be made unique with the zone identifier in
original direction, where the NAT engine will then allocate a unique
tuple in the commonly shared default zone for the reply direction.
In some restricted, local DNAT cases, also port redirection could be
used for making the reply traffic unique w/o requiring SNAT.
The consensus we've reached and discussed at NFWS and since the initial
implementation [1] was to directly integrate the direction meta data
into the existing zones infrastructure, as opposed to the ct->mark
approach we proposed initially.
As we pass the nf_conntrack_zone object directly around, we don't have
to touch all call-sites, but only those, that contain equality checks
of zones. Thus, based on the current direction (original or reply),
we either return the actual id, or the default NF_CT_DEFAULT_ZONE_ID.
CT expectations are direction-agnostic entities when expectations are
being compared among themselves, so we can only use the identifier
in this case.
Note that zone identifiers can not be included into the hash mix
anymore as they don't contain a "stable" value that would be equal
for both directions at all times, f.e. if only zone->id would
unconditionally be xor'ed into the table slot hash, then replies won't
find the corresponding conntracking entry anymore.
If no particular direction is specified when configuring zones, the
behaviour is exactly as we expect currently (both directions).
Support has been added for the CT netlink interface as well as the
x_tables raw CT target, which both already offer existing interfaces
to user space for the configuration of zones.
Below a minimal, simplified collision example (script in [2]) with
netperf sessions:
  +--- tenant-1 ---+   mark := 1
  |    netperf     |--+
  +----------------+  |                CT zone := mark [ORIGINAL]
   [ip,sport] := X   +--------------+  +--- gateway ---+
                     | mark routing |--|     SNAT      |-- ... +
                     +--------------+  +---------------+       |
  +--- tenant-2 ---+  |                                     ~~~|~~~
  |    netperf     |--+                +-----------+           |
  +----------------+   mark := 2       | netserver |------ ... +
   [ip,sport] := X                     +-----------+
                                        [ip,port] := Y
On the gateway netns, example:
iptables -t raw -A PREROUTING -j CT --zone mark --zone-dir ORIGINAL
iptables -t nat -A POSTROUTING -o <dev> -j SNAT --to-source <ip> --random-fully
iptables -t mangle -A PREROUTING -m conntrack --ctdir ORIGINAL -j CONNMARK --save-mark
iptables -t mangle -A POSTROUTING -m conntrack --ctdir REPLY -j CONNMARK --restore-mark
conntrack dump from gateway netns:
netperf -H 10.1.1.2 -t TCP_STREAM -l60 -p12865,5555 from each tenant netns
tcp 6 431995 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=5555 dport=12865 zone-orig=1
src=10.1.1.2 dst=10.1.1.1 sport=12865 dport=1024
[ASSURED] mark=1 secctx=system_u:object_r:unlabeled_t:s0 use=1
tcp 6 431994 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=5555 dport=12865 zone-orig=2
src=10.1.1.2 dst=10.1.1.1 sport=12865 dport=5555
[ASSURED] mark=2 secctx=system_u:object_r:unlabeled_t:s0 use=1
tcp 6 299 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=39438 dport=33768 zone-orig=1
src=10.1.1.2 dst=10.1.1.1 sport=33768 dport=39438
[ASSURED] mark=1 secctx=system_u:object_r:unlabeled_t:s0 use=1
tcp 6 300 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=32889 dport=40206 zone-orig=2
src=10.1.1.2 dst=10.1.1.1 sport=40206 dport=32889
[ASSURED] mark=2 secctx=system_u:object_r:unlabeled_t:s0 use=2
Taking this further, test script in [2] creates 200 tenants and runs
original-tuple colliding netperf sessions each. A conntrack -L dump in
the gateway netns also confirms 200 overlapping entries, all in ESTABLISHED
state as expected.
I also did run various other tests with some permutations of the script,
to mention some: SNAT in random/random-fully/persistent mode, no zones (no
overlaps), static zones (original, reply, both directions), etc.
[1] http://thread.gmane.org/gmane.comp.security.firewalls.netfilter.devel/57412/
[2] https://paste.fedoraproject.org/242835/65657871/
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-08-14 16:03:39 +02:00
|
|
|
if (tb[CTA_TUPLE_ZONE]) {
|
|
|
|
if (!zone)
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
err = ctnetlink_parse_tuple_zone(tb[CTA_TUPLE_ZONE],
|
|
|
|
type, zone);
|
|
|
|
if (err < 0)
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
2006-01-05 21:19:05 +01:00
|
|
|
/* orig and expect tuples get DIR_ORIGINAL */
|
|
|
|
if (type == CTA_TUPLE_REPLY)
|
|
|
|
tuple->dst.dir = IP_CT_DIR_REPLY;
|
|
|
|
else
|
|
|
|
tuple->dst.dir = IP_CT_DIR_ORIGINAL;
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
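The zone handling in ctnetlink_parse_tuple() relies on the per-direction rule stated in the commit message: for a given direction, either the zone's real id applies or the default zone id does. The following is a minimal userspace sketch of that rule, not the kernel's implementation. The type and macro names (ct_zone, ZONE_DIR_ORIG, DEFAULT_ZONE_ID, zone_id_for_dir) are hypothetical stand-ins, and the bitmask encoding of the direction is an assumption made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace model; these names are not the kernel's. */
enum ct_dir { CT_DIR_ORIGINAL = 0, CT_DIR_REPLY = 1 };

#define ZONE_DIR_ORIG   (1u << CT_DIR_ORIGINAL)
#define ZONE_DIR_REPL   (1u << CT_DIR_REPLY)
#define ZONE_DIR_BOTH   (ZONE_DIR_ORIG | ZONE_DIR_REPL)
#define DEFAULT_ZONE_ID 0

struct ct_zone {
	uint16_t id;	/* zone identifier configured by the user */
	uint8_t dir;	/* bitmask of directions the id applies to */
};

/*
 * Return the id to use when comparing zones for one direction:
 * the real id if the zone covers that direction, the default otherwise.
 */
static uint16_t zone_id_for_dir(const struct ct_zone *zone, enum ct_dir dir)
{
	assert(dir == CT_DIR_ORIGINAL || dir == CT_DIR_REPLY);
	return (zone->dir & (1u << dir)) ? zone->id : DEFAULT_ZONE_ID;
}
```

With a zone that is restricted to the original direction, reply-direction lookups fall back to the shared default zone, which is why a zone id of that kind cannot simply be mixed into a hash that must be equal for both directions.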
|
|
|
|
|
2010-02-10 15:38:33 +01:00
|
|
|
static const struct nla_policy help_nla_policy[CTA_HELP_MAX+1] = {
|
2012-11-22 02:32:46 +01:00
|
|
|
[CTA_HELP_NAME] = { .type = NLA_NUL_STRING,
|
|
|
|
.len = NF_CT_HELPER_NAME_LEN - 1 },
|
2010-02-10 15:38:33 +01:00
|
|
|
};
|
|
|
|
|
2016-04-12 23:32:34 +02:00
|
|
|
static int ctnetlink_parse_help(const struct nlattr *attr, char **helper_name,
|
|
|
|
struct nlattr **helpinfo)
|
2006-01-05 21:19:05 +01:00
|
|
|
{
|
2013-06-12 17:54:51 +02:00
|
|
|
int err;
|
2007-09-28 23:37:03 +02:00
|
|
|
struct nlattr *tb[CTA_HELP_MAX+1];
|
2006-01-05 21:19:05 +01:00
|
|
|
|
2013-06-12 17:54:51 +02:00
|
|
|
err = nla_parse_nested(tb, CTA_HELP_MAX, attr, help_nla_policy);
|
|
|
|
if (err < 0)
|
|
|
|
return err;
|
2006-01-05 21:19:05 +01:00
|
|
|
|
2007-09-28 23:37:03 +02:00
|
|
|
if (!tb[CTA_HELP_NAME])
|
2006-01-05 21:19:05 +01:00
|
|
|
return -EINVAL;
|
|
|
|
|
2007-09-28 23:37:03 +02:00
|
|
|
*helper_name = nla_data(tb[CTA_HELP_NAME]);
|
2006-01-05 21:19:05 +01:00
|
|
|
|
2012-06-07 14:19:42 +02:00
|
|
|
if (tb[CTA_HELP_INFO])
|
|
|
|
*helpinfo = tb[CTA_HELP_INFO];
|
|
|
|
|
2006-01-05 21:19:05 +01:00
|
|
|
return 0;
|
|
|
|
}
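help_nla_policy declares CTA_HELP_NAME as NLA_NUL_STRING with a length of NF_CT_HELPER_NAME_LEN - 1, which is what lets ctnetlink_parse_help() hand out nla_data() as a C string. The sketch below models that kind of check in userspace: a terminating NUL has to appear within the first max_len + 1 bytes of the payload. It is an illustration under assumptions, not the netlink core's code; nul_string_ok() is a hypothetical helper and HELPER_NAME_LEN = 16 is an assumed value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HELPER_NAME_LEN 16	/* assumed stand-in for NF_CT_HELPER_NAME_LEN */

/*
 * Return 1 if payload holds a NUL-terminated string of at most
 * max_len characters (terminator not counted), 0 otherwise.
 * A max_len of 0 means "no limit other than the payload size".
 */
static int nul_string_ok(const char *payload, size_t payload_len,
			 size_t max_len)
{
	size_t window = payload_len;

	assert(payload != NULL);
	if (max_len && window > max_len + 1)
		window = max_len + 1;
	if (window == 0)
		return 0;
	/* The terminator must lie inside the permitted window. */
	return memchr(payload, '\0', window) != NULL;
}
```

A name such as "ftp" sent with its terminator passes, while a 16-character name is rejected against a limit of HELPER_NAME_LEN - 1 because its terminator falls outside the window.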
|
|
|
|
|
2007-09-28 23:39:55 +02:00
|
|
|
static const struct nla_policy ct_nla_policy[CTA_MAX+1] = {
|
2010-02-10 15:38:33 +01:00
|
|
|
[CTA_TUPLE_ORIG] = { .type = NLA_NESTED },
|
|
|
|
[CTA_TUPLE_REPLY] = { .type = NLA_NESTED },
|
2007-09-28 23:39:55 +02:00
|
|
|
[CTA_STATUS] = { .type = NLA_U32 },
|
2010-02-10 15:38:33 +01:00
|
|
|
[CTA_PROTOINFO] = { .type = NLA_NESTED },
|
|
|
|
[CTA_HELP] = { .type = NLA_NESTED },
|
|
|
|
[CTA_NAT_SRC] = { .type = NLA_NESTED },
|
2007-09-28 23:39:55 +02:00
|
|
|
[CTA_TIMEOUT] = { .type = NLA_U32 },
|
|
|
|
[CTA_MARK] = { .type = NLA_U32 },
|
|
|
|
[CTA_ID] = { .type = NLA_U32 },
|
2010-02-10 15:38:33 +01:00
|
|
|
[CTA_NAT_DST] = { .type = NLA_NESTED },
|
|
|
|
[CTA_TUPLE_MASTER] = { .type = NLA_NESTED },
|
2012-11-22 02:32:46 +01:00
|
|
|
[CTA_NAT_SEQ_ADJ_ORIG] = { .type = NLA_NESTED },
|
|
|
|
[CTA_NAT_SEQ_ADJ_REPLY] = { .type = NLA_NESTED },
|
2010-02-15 18:14:57 +01:00
|
|
|
[CTA_ZONE] = { .type = NLA_U16 },
|
2012-02-24 15:41:50 +01:00
|
|
|
[CTA_MARK_MASK] = { .type = NLA_U32 },
|
2013-01-11 07:30:46 +01:00
|
|
|
[CTA_LABELS] = { .type = NLA_BINARY,
|
2014-02-18 15:25:32 +01:00
|
|
|
.len = NF_CT_LABELS_MAX_SIZE },
|
2013-01-11 07:30:46 +01:00
|
|
|
[CTA_LABELS_MASK] = { .type = NLA_BINARY,
|
2014-02-18 15:25:32 +01:00
|
|
|
.len = NF_CT_LABELS_MAX_SIZE },
|
2006-01-05 21:19:05 +01:00
|
|
|
};
|
|
|
|
|
2014-12-24 09:57:10 +01:00
|
|
|
static int ctnetlink_flush_conntrack(struct net *net,
|
|
|
|
const struct nlattr * const cda[],
|
|
|
|
u32 portid, int report)
|
|
|
|
{
|
|
|
|
struct ctnetlink_filter *filter = NULL;
|
|
|
|
|
|
|
|
if (cda[CTA_MARK] && cda[CTA_MARK_MASK]) {
|
|
|
|
filter = ctnetlink_alloc_filter(cda);
|
|
|
|
if (IS_ERR(filter))
|
|
|
|
return PTR_ERR(filter);
|
|
|
|
}
|
|
|
|
|
|
|
|
nf_ct_iterate_cleanup(net, ctnetlink_filter_match, filter,
|
|
|
|
portid, report);
|
|
|
|
kfree(filter);
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
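ctnetlink_flush_conntrack() only builds a filter when both CTA_MARK and CTA_MARK_MASK are present, and otherwise passes NULL so that every entry is flushed. A userspace sketch of that behaviour follows. It assumes the match callback is the usual masked comparison (mark & mask) == value, which this excerpt does not show; mark_filter, mark_filter_match() and flush_matching() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a mark/mask filter. */
struct mark_filter {
	uint32_t val;
	uint32_t mask;
};

/* A NULL filter matches everything, mirroring the "flush all" case. */
static int mark_filter_match(uint32_t ct_mark, const struct mark_filter *f)
{
	if (f == NULL)
		return 1;
	return (ct_mark & f->mask) == f->val;
}

/*
 * Remove every matching mark from the array in place and return
 * the number of entries that remain.
 */
static size_t flush_matching(uint32_t *marks, size_t n,
			     const struct mark_filter *f)
{
	size_t kept = 0;

	assert(marks != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (!mark_filter_match(marks[i], f))
			marks[kept++] = marks[i];
	}
	return kept;
}
```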
|
|
|
|
|
2015-12-15 18:41:56 +01:00
|
|
|
static int ctnetlink_del_conntrack(struct net *net, struct sock *ctnl,
|
|
|
|
struct sk_buff *skb,
|
|
|
|
const struct nlmsghdr *nlh,
|
|
|
|
const struct nlattr * const cda[])
|
2006-01-05 21:19:05 +01:00
|
|
|
{
|
|
|
|
struct nf_conntrack_tuple_hash *h;
|
|
|
|
struct nf_conntrack_tuple tuple;
|
|
|
|
struct nf_conn *ct;
|
2009-06-02 20:07:39 +02:00
|
|
|
struct nfgenmsg *nfmsg = nlmsg_data(nlh);
|
2006-01-05 21:19:05 +01:00
|
|
|
u_int8_t u3 = nfmsg->nfgen_family;
|
2015-08-08 21:40:01 +02:00
|
|
|
struct nf_conntrack_zone zone;
|
2010-02-15 18:14:57 +01:00
|
|
|
int err;
|
|
|
|
|
|
|
|
err = ctnetlink_parse_zone(cda[CTA_ZONE], &zone);
|
|
|
|
if (err < 0)
|
|
|
|
return err;
|
2006-01-05 21:19:05 +01:00
|
|
|
|
2007-09-28 23:37:03 +02:00
|
|
|
if (cda[CTA_TUPLE_ORIG])
|
2015-08-14 16:03:39 +02:00
|
|
|
err = ctnetlink_parse_tuple(cda, &tuple, CTA_TUPLE_ORIG,
|
|
|
|
u3, &zone);
|
2007-09-28 23:37:03 +02:00
|
|
|
else if (cda[CTA_TUPLE_REPLY])
|
2015-08-14 16:03:39 +02:00
|
|
|
err = ctnetlink_parse_tuple(cda, &tuple, CTA_TUPLE_REPLY,
|
|
|
|
u3, &zone);
|
2006-01-05 21:19:05 +01:00
|
|
|
else {
|
2014-12-24 09:57:10 +01:00
|
|
|
return ctnetlink_flush_conntrack(net, cda,
|
|
|
|
NETLINK_CB(skb).portid,
|
|
|
|
nlmsg_report(nlh));
|
2006-01-05 21:19:05 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
if (err < 0)
|
|
|
|
return err;
|
|
|
|
|
2015-08-08 21:40:01 +02:00
|
|
|
h = nf_conntrack_find_get(net, &zone, &tuple);
|
2006-10-12 23:09:16 +02:00
|
|
|
if (!h)
|
2006-01-05 21:19:05 +01:00
|
|
|
return -ENOENT;
|
|
|
|
|
|
|
|
ct = nf_ct_tuplehash_to_ctrack(h);
|
2007-02-12 20:15:49 +01:00
|
|
|
|
2007-09-28 23:37:03 +02:00
|
|
|
if (cda[CTA_ID]) {
|
2007-12-18 07:29:45 +01:00
|
|
|
u_int32_t id = ntohl(nla_get_be32(cda[CTA_ID]));
|
2007-09-28 23:41:27 +02:00
|
|
|
if (id != (u32)(unsigned long)ct) {
|
2006-01-05 21:19:05 +01:00
|
|
|
nf_ct_put(ct);
|
|
|
|
return -ENOENT;
|
|
|
|
}
|
2007-02-12 20:15:49 +01:00
|
|
|
}
|
2006-01-05 21:19:05 +01:00
|
|
|
|
2016-08-25 15:33:31 +02:00
|
|
|
nf_ct_delete(ct, NETLINK_CB(skb).portid, nlmsg_report(nlh));
|
2006-01-05 21:19:05 +01:00
|
|
|
nf_ct_put(ct);
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2015-12-15 18:41:56 +01:00
|
|
|
static int ctnetlink_get_conntrack(struct net *net, struct sock *ctnl,
|
|
|
|
struct sk_buff *skb,
|
|
|
|
const struct nlmsghdr *nlh,
|
|
|
|
const struct nlattr * const cda[])
|
2006-01-05 21:19:05 +01:00
|
|
|
{
|
|
|
|
struct nf_conntrack_tuple_hash *h;
|
|
|
|
struct nf_conntrack_tuple tuple;
|
|
|
|
struct nf_conn *ct;
|
|
|
|
struct sk_buff *skb2 = NULL;
|
2009-06-02 20:07:39 +02:00
|
|
|
struct nfgenmsg *nfmsg = nlmsg_data(nlh);
|
2006-01-05 21:19:05 +01:00
|
|
|
u_int8_t u3 = nfmsg->nfgen_family;
|
2015-08-08 21:40:01 +02:00
|
|
|
struct nf_conntrack_zone zone;
|
2010-02-15 18:14:57 +01:00
|
|
|
int err;
|
2006-01-05 21:19:05 +01:00
|
|
|
|
2012-02-24 15:30:15 +01:00
|
|
|
if (nlh->nlmsg_flags & NLM_F_DUMP) {
|
|
|
|
struct netlink_dump_control c = {
|
|
|
|
.dump = ctnetlink_dump_table,
|
|
|
|
.done = ctnetlink_done,
|
|
|
|
};
|
2014-12-24 09:57:10 +01:00
|
|
|
|
2012-02-24 15:41:50 +01:00
|
|
|
if (cda[CTA_MARK] && cda[CTA_MARK_MASK]) {
|
2014-12-24 09:57:10 +01:00
|
|
|
struct ctnetlink_filter *filter;
|
2012-02-24 15:41:50 +01:00
|
|
|
|
2014-12-24 09:57:10 +01:00
|
|
|
filter = ctnetlink_alloc_filter(cda);
|
|
|
|
if (IS_ERR(filter))
|
|
|
|
return PTR_ERR(filter);
|
2012-02-24 15:41:50 +01:00
|
|
|
|
|
|
|
c.data = filter;
|
|
|
|
}
|
2012-02-24 15:30:15 +01:00
|
|
|
return netlink_dump_start(ctnl, skb, nlh, &c);
|
|
|
|
}
|
2006-01-05 21:19:05 +01:00
|
|
|
|
2010-02-15 18:14:57 +01:00
|
|
|
err = ctnetlink_parse_zone(cda[CTA_ZONE], &zone);
|
|
|
|
if (err < 0)
|
|
|
|
return err;
|
|
|
|
|
2007-09-28 23:37:03 +02:00
|
|
|
if (cda[CTA_TUPLE_ORIG])
|
2015-08-14 16:03:39 +02:00
|
|
|
err = ctnetlink_parse_tuple(cda, &tuple, CTA_TUPLE_ORIG,
|
|
|
|
u3, &zone);
|
2007-09-28 23:37:03 +02:00
|
|
|
else if (cda[CTA_TUPLE_REPLY])
|
2015-08-14 16:03:39 +02:00
|
|
|
err = ctnetlink_parse_tuple(cda, &tuple, CTA_TUPLE_REPLY,
|
|
|
|
u3, &zone);
|
2006-01-05 21:19:05 +01:00
|
|
|
else
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
if (err < 0)
|
|
|
|
return err;
|
|
|
|
|
2015-08-08 21:40:01 +02:00
|
|
|
h = nf_conntrack_find_get(net, &zone, &tuple);
|
2006-10-12 23:09:16 +02:00
|
|
|
if (!h)
|
2006-01-05 21:19:05 +01:00
|
|
|
return -ENOENT;
|
2006-10-12 23:09:16 +02:00
|
|
|
|
2006-01-05 21:19:05 +01:00
|
|
|
ct = nf_ct_tuplehash_to_ctrack(h);
|
|
|
|
|
|
|
|
err = -ENOMEM;
|
2009-06-02 20:07:39 +02:00
|
|
|
skb2 = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
|
|
|
|
if (skb2 == NULL) {
|
2006-01-05 21:19:05 +01:00
|
|
|
nf_ct_put(ct);
|
|
|
|
return -ENOMEM;
|
|
|
|
}
|
|
|
|
|
2008-11-17 16:00:40 +01:00
|
|
|
rcu_read_lock();
|
2012-09-07 22:12:54 +02:00
|
|
|
err = ctnetlink_fill_info(skb2, NETLINK_CB(skb).portid, nlh->nlmsg_seq,
|
2011-12-24 14:11:39 +01:00
|
|
|
NFNL_MSG_TYPE(nlh->nlmsg_type), ct);
|
2008-11-17 16:00:40 +01:00
|
|
|
rcu_read_unlock();
|
2006-01-05 21:19:05 +01:00
|
|
|
nf_ct_put(ct);
|
|
|
|
if (err <= 0)
|
|
|
|
goto free;
|
|
|
|
|
2012-09-07 22:12:54 +02:00
|
|
|
err = netlink_unicast(ctnl, skb2, NETLINK_CB(skb).portid, MSG_DONTWAIT);
|
2006-01-05 21:19:05 +01:00
|
|
|
if (err < 0)
|
|
|
|
goto out;
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
free:
|
|
|
|
kfree_skb(skb2);
|
|
|
|
out:
|
2011-01-13 14:19:55 +01:00
|
|
|
/* this avoids a loop in nfnetlink. */
|
|
|
|
return err == -EAGAIN ? -ENOBUFS : err;
|
2006-01-05 21:19:05 +01:00
|
|
|
}
|
|
|
|
|
2012-11-27 14:49:42 +01:00
|
|
|
static int ctnetlink_done_list(struct netlink_callback *cb)
|
|
|
|
{
|
|
|
|
if (cb->args[1])
|
|
|
|
nf_ct_put((struct nf_conn *)cb->args[1]);
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
2014-03-03 14:45:20 +01:00
|
|
|
ctnetlink_dump_list(struct sk_buff *skb, struct netlink_callback *cb, bool dying)
|
2012-11-27 14:49:42 +01:00
|
|
|
{
|
2014-06-08 11:41:23 +02:00
|
|
|
struct nf_conn *ct, *last;
|
2012-11-27 14:49:42 +01:00
|
|
|
struct nf_conntrack_tuple_hash *h;
|
|
|
|
struct hlist_nulls_node *n;
|
|
|
|
struct nfgenmsg *nfmsg = nlmsg_data(cb->nlh);
|
|
|
|
u_int8_t l3proto = nfmsg->nfgen_family;
|
|
|
|
int res;
|
2014-03-03 14:45:20 +01:00
|
|
|
int cpu;
|
|
|
|
struct hlist_nulls_head *list;
|
|
|
|
struct net *net = sock_net(skb->sk);
|
2012-11-27 14:49:42 +01:00
|
|
|
|
|
|
|
if (cb->args[2])
|
|
|
|
return 0;
|
|
|
|
|
2014-06-08 11:41:23 +02:00
|
|
|
last = (struct nf_conn *)cb->args[1];
|
|
|
|
|
2014-03-03 14:45:20 +01:00
|
|
|
for (cpu = cb->args[0]; cpu < nr_cpu_ids; cpu++) {
|
|
|
|
struct ct_pcpu *pcpu;
|
|
|
|
|
|
|
|
if (!cpu_possible(cpu))
|
2012-11-27 14:49:42 +01:00
|
|
|
continue;
|
2014-03-03 14:45:20 +01:00
|
|
|
|
|
|
|
pcpu = per_cpu_ptr(net->ct.pcpu_lists, cpu);
|
|
|
|
spin_lock_bh(&pcpu->lock);
|
|
|
|
list = dying ? &pcpu->dying : &pcpu->unconfirmed;
|
|
|
|
restart:
|
|
|
|
hlist_nulls_for_each_entry(h, n, list, hnnode) {
|
|
|
|
ct = nf_ct_tuplehash_to_ctrack(h);
|
|
|
|
if (l3proto && nf_ct_l3num(ct) != l3proto)
|
2012-11-27 14:49:42 +01:00
|
|
|
continue;
|
2014-03-03 14:45:20 +01:00
|
|
|
if (cb->args[1]) {
|
|
|
|
if (ct != last)
|
|
|
|
continue;
|
|
|
|
cb->args[1] = 0;
|
|
|
|
}
|
|
|
|
rcu_read_lock();
|
|
|
|
res = ctnetlink_fill_info(skb, NETLINK_CB(cb->skb).portid,
|
|
|
|
cb->nlh->nlmsg_seq,
|
|
|
|
NFNL_MSG_TYPE(cb->nlh->nlmsg_type),
|
|
|
|
ct);
|
|
|
|
rcu_read_unlock();
|
|
|
|
if (res < 0) {
|
2014-06-08 11:41:23 +02:00
|
|
|
if (!atomic_inc_not_zero(&ct->ct_general.use))
|
|
|
|
continue;
|
2014-06-05 14:28:44 +02:00
|
|
|
cb->args[0] = cpu;
|
2014-03-03 14:45:20 +01:00
|
|
|
cb->args[1] = (unsigned long)ct;
|
|
|
|
spin_unlock_bh(&pcpu->lock);
|
|
|
|
goto out;
|
|
|
|
}
|
2012-11-27 14:49:42 +01:00
|
|
|
}
|
2014-03-03 14:45:20 +01:00
|
|
|
if (cb->args[1]) {
|
|
|
|
cb->args[1] = 0;
|
|
|
|
goto restart;
|
2014-06-05 14:28:44 +02:00
|
|
|
}
|
2014-03-03 14:45:20 +01:00
|
|
|
spin_unlock_bh(&pcpu->lock);
|
2012-11-27 14:49:42 +01:00
|
|
|
}
|
2014-06-05 14:28:44 +02:00
|
|
|
cb->args[2] = 1;
|
2012-11-27 14:49:42 +01:00
|
|
|
out:
|
|
|
|
if (last)
|
|
|
|
nf_ct_put(last);
|
|
|
|
|
|
|
|
return skb->len;
|
|
|
|
}
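The dump callback above is resumable: when the output buffer fills, it records where it stopped in `cb->args[]` and picks up from there on the next call. A minimal userspace sketch of that cursor pattern follows. All names (`sk_cursor`, `sk_dump`, `SK_NLISTS`, `SK_NITEMS`) are hypothetical, and the sketch stores an array index where the kernel code pins the entry with a reference and rescans the list for it, so this is a simplification rather than the kernel's method.

```c
#include <stddef.h>

#define SK_NLISTS 3	/* stands in for the per-CPU lists */
#define SK_NITEMS 4	/* entries per list (fixed here for simplicity) */

/* Hypothetical cursor mirroring the three cb->args[] slots. */
struct sk_cursor {
	size_t list;	/* like cb->args[0]: list to resume from */
	size_t item;	/* like cb->args[1]: entry that did not fit */
	int done;	/* like cb->args[2]: whole dump finished */
};

/*
 * Copy at most cap entries into out and return how many were copied.
 * When the buffer fills up, the cursor is left pointing at the first
 * entry that was not copied, so the next call continues from there.
 */
static size_t sk_dump(int table[SK_NLISTS][SK_NITEMS],
		      struct sk_cursor *c, int *out, size_t cap)
{
	size_t n = 0;

	if (c->done)
		return 0;

	for (; c->list < SK_NLISTS; c->list++, c->item = 0) {
		for (; c->item < SK_NITEMS; c->item++) {
			if (n == cap)
				return n;	/* buffer full, resume later */
			out[n++] = table[c->list][c->item];
		}
	}
	c->done = 1;
	return n;
}
```

A caller would invoke `sk_dump()` repeatedly with the same cursor until it returns 0, much as netlink keeps calling the dump callback until it produces no more data.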
|
|
|
|
|
|
|
|
static int
|
|
|
|
ctnetlink_dump_dying(struct sk_buff *skb, struct netlink_callback *cb)
|
|
|
|
{
|
2014-03-03 14:45:20 +01:00
|
|
|
return ctnetlink_dump_list(skb, cb, true);
|
2012-11-27 14:49:42 +01:00
|
|
|
}
|
|
|
|
|
2015-12-15 18:41:56 +01:00
|
|
|
static int ctnetlink_get_ct_dying(struct net *net, struct sock *ctnl,
|
|
|
|
struct sk_buff *skb,
|
|
|
|
const struct nlmsghdr *nlh,
|
|
|
|
const struct nlattr * const cda[])
|
2012-11-27 14:49:42 +01:00
|
|
|
{
|
|
|
|
if (nlh->nlmsg_flags & NLM_F_DUMP) {
|
|
|
|
struct netlink_dump_control c = {
|
|
|
|
.dump = ctnetlink_dump_dying,
|
|
|
|
.done = ctnetlink_done_list,
|
|
|
|
};
|
|
|
|
return netlink_dump_start(ctnl, skb, nlh, &c);
|
|
|
|
}
|
|
|
|
|
|
|
|
return -EOPNOTSUPP;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
|
|
|
ctnetlink_dump_unconfirmed(struct sk_buff *skb, struct netlink_callback *cb)
|
|
|
|
{
|
2014-03-03 14:45:20 +01:00
|
|
|
return ctnetlink_dump_list(skb, cb, false);
|
2012-11-27 14:49:42 +01:00
|
|
|
}
|
|
|
|
|
2015-12-15 18:41:56 +01:00
|
|
|
static int ctnetlink_get_ct_unconfirmed(struct net *net, struct sock *ctnl,
|
|
|
|
struct sk_buff *skb,
|
|
|
|
const struct nlmsghdr *nlh,
|
|
|
|
const struct nlattr * const cda[])
|
2012-11-27 14:49:42 +01:00
|
|
|
{
|
|
|
|
if (nlh->nlmsg_flags & NLM_F_DUMP) {
|
|
|
|
struct netlink_dump_control c = {
|
|
|
|
.dump = ctnetlink_dump_unconfirmed,
|
|
|
|
.done = ctnetlink_done_list,
|
|
|
|
};
|
|
|
|
return netlink_dump_start(ctnl, skb, nlh, &c);
|
|
|
|
}
|
|
|
|
|
|
|
|
return -EOPNOTSUPP;
|
|
|
|
}
|
|
|
|
|
2008-10-20 12:34:27 +02:00
|
|
|
#ifdef CONFIG_NF_NAT_NEEDED
|
2008-10-14 20:58:31 +02:00
|
|
|
static int
|
|
|
|
ctnetlink_parse_nat_setup(struct nf_conn *ct,
|
|
|
|
enum nf_nat_manip_type manip,
|
2009-08-25 16:07:58 +02:00
|
|
|
const struct nlattr *attr)
|
2008-10-14 20:58:31 +02:00
|
|
|
{
|
|
|
|
typeof(nfnetlink_parse_nat_setup_hook) parse_nat_setup;
|
2012-08-26 19:14:06 +02:00
|
|
|
int err;
|
2008-10-14 20:58:31 +02:00
|
|
|
|
|
|
|
parse_nat_setup = rcu_dereference(nfnetlink_parse_nat_setup_hook);
|
|
|
|
if (!parse_nat_setup) {
|
2008-10-17 00:24:51 +02:00
|
|
|
#ifdef CONFIG_MODULES
|
2008-10-14 20:58:31 +02:00
|
|
|
rcu_read_unlock();
|
2013-02-05 01:50:26 +01:00
|
|
|
nfnl_unlock(NFNL_SUBSYS_CTNETLINK);
|
2012-08-26 19:14:06 +02:00
|
|
|
if (request_module("nf-nat") < 0) {
|
2013-02-05 01:50:26 +01:00
|
|
|
nfnl_lock(NFNL_SUBSYS_CTNETLINK);
|
2008-10-14 20:58:31 +02:00
|
|
|
rcu_read_lock();
|
|
|
|
return -EOPNOTSUPP;
|
|
|
|
}
|
2013-02-05 01:50:26 +01:00
|
|
|
nfnl_lock(NFNL_SUBSYS_CTNETLINK);
|
2008-10-14 20:58:31 +02:00
|
|
|
rcu_read_lock();
|
|
|
|
if (nfnetlink_parse_nat_setup_hook)
|
|
|
|
return -EAGAIN;
|
|
|
|
#endif
|
|
|
|
return -EOPNOTSUPP;
|
|
|
|
}
|
|
|
|
|
2012-08-26 19:14:06 +02:00
|
|
|
err = parse_nat_setup(ct, manip, attr);
|
|
|
|
if (err == -EAGAIN) {
|
|
|
|
#ifdef CONFIG_MODULES
|
|
|
|
rcu_read_unlock();
|
2013-02-05 01:50:26 +01:00
|
|
|
nfnl_unlock(NFNL_SUBSYS_CTNETLINK);
|
2012-08-26 19:14:06 +02:00
|
|
|
if (request_module("nf-nat-%u", nf_ct_l3num(ct)) < 0) {
|
2013-02-05 01:50:26 +01:00
|
|
|
nfnl_lock(NFNL_SUBSYS_CTNETLINK);
|
2012-08-26 19:14:06 +02:00
|
|
|
rcu_read_lock();
|
|
|
|
return -EOPNOTSUPP;
|
|
|
|
}
|
2013-02-05 01:50:26 +01:00
|
|
|
nfnl_lock(NFNL_SUBSYS_CTNETLINK);
|
2012-08-26 19:14:06 +02:00
|
|
|
rcu_read_lock();
|
|
|
|
#else
|
|
|
|
err = -EOPNOTSUPP;
|
|
|
|
#endif
|
|
|
|
}
|
|
|
|
return err;
|
2008-10-14 20:58:31 +02:00
|
|
|
}
|
2008-10-20 12:34:27 +02:00
|
|
|
#endif
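The function above follows a "load on demand, then ask the caller to retry" pattern: if the hook is missing it tries to load the providing module, and if the hook has appeared afterwards it returns -EAGAIN so the request is replayed. The sketch below shows only that control flow in userspace. The names `sk_hook`, `sk_real_hook`, `sk_load_module` and `sk_call_hook` are hypothetical, and the RCU and nfnl locking that the kernel code drops and retakes around `request_module()` is left out.

```c
#include <errno.h>
#include <stddef.h>

typedef int (*sk_hook_fn)(int arg);

/* Plays the role of nfnetlink_parse_nat_setup_hook: NULL until "loaded". */
static sk_hook_fn sk_hook;

/* Hypothetical hook implementation supplied by the "module". */
static int sk_real_hook(int arg)
{
	return arg * 2;
}

/* Hypothetical stand-in for request_module(): registers the hook. */
static int sk_load_module(void)
{
	sk_hook = sk_real_hook;
	return 0;
}

/*
 * Call the hook if present. If it is missing, try to load it and
 * return -EAGAIN when it showed up, so the caller replays the request;
 * return -EOPNOTSUPP when loading did not help.
 */
static int sk_call_hook(int arg, int *result)
{
	sk_hook_fn fn = sk_hook;

	if (!fn) {
		if (sk_load_module() < 0)
			return -EOPNOTSUPP;
		if (sk_hook)
			return -EAGAIN;
		return -EOPNOTSUPP;
	}

	*result = fn(arg);
	return 0;
}
```

Returning -EAGAIN instead of calling the freshly loaded hook directly keeps the retry in one place: the caller restarts the whole request with its locks and state set up again.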
|
2008-10-14 20:58:31 +02:00
|
|
|
|
[NETFILTER]: Kill some supper dupper bloatry
/me awards the bloatiest-of-all-net/-.c-code award to
nf_conntrack_netlink.c, congratulations to all the authors :-/!
Hall of (unquestionable) fame (measured per inline, top 10 under
net/):
-4496 ctnetlink_parse_tuple netfilter/nf_conntrack_netlink.c
-2165 ctnetlink_dump_tuples netfilter/nf_conntrack_netlink.c
-2115 __ip_vs_get_out_rt ipv4/ipvs/ip_vs_xmit.c
-1924 xfrm_audit_helper_pktinfo xfrm/xfrm_state.c
-1799 ctnetlink_parse_tuple_proto netfilter/nf_conntrack_netlink.c
-1268 ctnetlink_parse_tuple_ip netfilter/nf_conntrack_netlink.c
-1093 ctnetlink_exp_dump_expect netfilter/nf_conntrack_netlink.c
-1060 void ccid3_update_send_interval dccp/ccids/ccid3.c
-983 ctnetlink_dump_tuples_proto netfilter/nf_conntrack_netlink.c
-827 ctnetlink_exp_dump_tuple netfilter/nf_conntrack_netlink.c
(i386 / gcc (GCC) 4.1.2 20070626 (Red Hat 4.1.2-13) /
allyesconfig except CONFIG_FORCED_INLINING)
...and I left < 200 byte gains as future work item.
After iterative inline removal, I finally have this:
net/netfilter/nf_conntrack_netlink.c:
ctnetlink_exp_fill_info | -1104
ctnetlink_new_expect | -1572
ctnetlink_fill_info | -1303
ctnetlink_new_conntrack | -2230
ctnetlink_get_expect | -341
ctnetlink_del_expect | -352
ctnetlink_expect_event | -1110
ctnetlink_conntrack_event | -1548
ctnetlink_del_conntrack | -729
ctnetlink_get_conntrack | -728
10 functions changed, 11017 bytes removed, diff: -11017
net/netfilter/nf_conntrack_netlink.c:
ctnetlink_parse_tuple | +419
dump_nat_seq_adj | +183
ctnetlink_dump_counters | +166
ctnetlink_dump_tuples | +261
ctnetlink_exp_dump_expect | +633
ctnetlink_change_status | +460
6 functions changed, 2122 bytes added, diff: +2122
net/netfilter/nf_conntrack_netlink.o:
16 functions changed, 2122 bytes added, 11017 bytes removed, diff: -8895
Without a number of CONFIG.*DEBUGs, I got this:
net/netfilter/nf_conntrack_netlink.o:
16 functions changed, 2122 bytes added, 11029 bytes removed, diff: -8907
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-06 08:11:31 +01:00
|
|
|
static int
|
2009-08-25 16:07:58 +02:00
|
|
|
ctnetlink_change_status(struct nf_conn *ct, const struct nlattr * const cda[])
|
2006-01-05 21:19:05 +01:00
|
|
|
{
|
|
|
|
unsigned long d;
|
2007-12-18 07:29:45 +01:00
|
|
|
unsigned int status = ntohl(nla_get_be32(cda[CTA_STATUS]));
|
2006-01-05 21:19:05 +01:00
|
|
|
d = ct->status ^ status;
|
|
|
|
|
|
|
|
if (d & (IPS_EXPECTED|IPS_CONFIRMED|IPS_DYING))
|
|
|
|
/* unchangeable */
|
2008-06-10 00:56:20 +02:00
|
|
|
return -EBUSY;
|
2007-02-12 20:15:49 +01:00
|
|
|
|
2006-01-05 21:19:05 +01:00
|
|
|
if (d & IPS_SEEN_REPLY && !(status & IPS_SEEN_REPLY))
|
|
|
|
/* SEEN_REPLY bit can only be set */
|
2008-06-10 00:56:20 +02:00
|
|
|
return -EBUSY;
|
2007-02-12 20:15:49 +01:00
|
|
|
|
2006-01-05 21:19:05 +01:00
|
|
|
if (d & IPS_ASSURED && !(status & IPS_ASSURED))
|
|
|
|
/* ASSURED bit can only be set */
|
2008-06-10 00:56:20 +02:00
|
|
|
return -EBUSY;
|
2006-01-05 21:19:05 +01:00
|
|
|
|
|
|
|
/* Be careful here, modifying NAT bits can screw up things,
|
|
|
|
* so don't let users modify them directly if they don't pass
|
2006-12-03 07:07:13 +01:00
|
|
|
* nf_nat_range. */
|
2006-01-05 21:19:05 +01:00
|
|
|
ct->status |= status & ~(IPS_NAT_DONE_MASK | IPS_NAT_MASK);
|
|
|
|
return 0;
|
|
|
|
}
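The status check above XORs the current and requested bit masks to find which bits would change, then rejects changes to immutable bits and any attempt to clear a set-only bit. A minimal userspace sketch of the same logic follows. The `SK_*` constants and `sketch_change_status` are hypothetical stand-ins rather than the kernel's `IPS_*` values, and the NAT-bit masking done in the final assignment of the kernel function is omitted.

```c
#include <errno.h>

/* Hypothetical stand-ins for the kernel's IPS_* status bits. */
enum {
	SK_EXPECTED   = 1u << 0,
	SK_SEEN_REPLY = 1u << 1,
	SK_ASSURED    = 1u << 2,
	SK_CONFIRMED  = 1u << 3,
	SK_DYING      = 1u << 9,
};

/*
 * Merge the requested bits into *cur.
 * Returns 0 on success or -EBUSY if the request would change an
 * immutable bit or clear a bit that may only ever be set.
 */
static int sketch_change_status(unsigned long *cur, unsigned int requested)
{
	unsigned long d = *cur ^ requested;	/* bits that differ */

	if (d & (SK_EXPECTED | SK_CONFIRMED | SK_DYING))
		return -EBUSY;			/* unchangeable */

	if ((d & SK_SEEN_REPLY) && !(requested & SK_SEEN_REPLY))
		return -EBUSY;			/* can only be set */

	if ((d & SK_ASSURED) && !(requested & SK_ASSURED))
		return -EBUSY;			/* can only be set */

	*cur |= requested;
	return 0;
}
```

Because the check works on the XOR, a request has to repeat the current value of every immutable bit. Leaving out a bit that is already set counts as an attempt to clear it.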
|
|
|
|
|
2008-10-14 20:58:31 +02:00
|
|
|
static int
|
netfilter: ctnetlink: force null nat binding on insert
Quoting Andrey Vagin:
When a conntrack is created by kernel, it is initialized (sets
IPS_{DST,SRC}_NAT_DONE_BIT bits in nf_nat_setup_info) and only then it
is added in hashes (__nf_conntrack_hash_insert), so one conntrack
can't be initialized from a few threads concurrently.
ctnetlink can add an uninitialized conntrack (w/o
IPS_{DST,SRC}_NAT_DONE_BIT) in hashes, then a few threads can look up
this conntrack and start initializing it concurrently. It's dangerous,
because BUG can be triggered from nf_nat_setup_info.
Fix this race by always setting up nat, even if no CTA_NAT_ attribute
was requested before inserting the ct into the hash table. In absence
of CTA_NAT_ attribute, a null binding is created.
This alters current behaviour: Before this patch, the first packet
matching the newly injected conntrack would be run through the nat
table since nf_nat_initialized() returns false. IOW, this forces
ctnetlink users to specify the desired nat transformation on ct
creation time.
Thanks to Florian Westphal, this patch is based on his original
patch to address this problem, including this patch description.
Reported-By: Andrey Vagin <avagin@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Florian Westphal <fw@strlen.de>
2014-02-16 12:15:43 +01:00
|
|
|
ctnetlink_setup_nat(struct nf_conn *ct, const struct nlattr * const cda[])
|
2008-10-14 20:58:31 +02:00
|
|
|
{
|
|
|
|
#ifdef CONFIG_NF_NAT_NEEDED
|
|
|
|
int ret;
|
|
|
|
|
2014-04-28 21:07:31 +02:00
|
|
|
if (!cda[CTA_NAT_DST] && !cda[CTA_NAT_SRC])
|
|
|
|
return 0;
|
|
|
|
|
2014-02-16 12:15:43 +01:00
|
|
|
ret = ctnetlink_parse_nat_setup(ct, NF_NAT_MANIP_DST,
|
|
|
|
cda[CTA_NAT_DST]);
|
|
|
|
if (ret < 0)
|
|
|
|
return ret;
|
|
|
|
|
|
|
|
ret = ctnetlink_parse_nat_setup(ct, NF_NAT_MANIP_SRC,
|
|
|
|
cda[CTA_NAT_SRC]);
|
|
|
|
return ret;
|
2008-10-14 20:58:31 +02:00
|
|
|
#else
|
2014-02-16 12:15:43 +01:00
|
|
|
if (!cda[CTA_NAT_DST] && !cda[CTA_NAT_SRC])
|
|
|
|
return 0;
|
2008-10-14 20:58:31 +02:00
|
|
|
return -EOPNOTSUPP;
|
|
|
|
#endif
|
|
|
|
}
|
2006-01-05 21:19:05 +01:00
|
|
|
|
2016-04-12 23:32:34 +02:00
|
|
|
static int ctnetlink_change_helper(struct nf_conn *ct,
|
|
|
|
const struct nlattr * const cda[])
|
2006-01-05 21:19:05 +01:00
|
|
|
{
|
|
|
|
struct nf_conntrack_helper *helper;
|
2006-03-21 02:56:32 +01:00
|
|
|
struct nf_conn_help *help = nfct_help(ct);
|
2009-04-22 11:26:37 +02:00
|
|
|
char *helpname = NULL;
|
2012-06-07 14:19:42 +02:00
|
|
|
struct nlattr *helpinfo = NULL;
|
2006-01-05 21:19:05 +01:00
|
|
|
int err;
|
|
|
|
|
2012-06-07 14:19:42 +02:00
|
|
|
err = ctnetlink_parse_help(cda[CTA_HELP], &helpname, &helpinfo);
|
2006-01-05 21:19:05 +01:00
|
|
|
if (err < 0)
|
|
|
|
return err;
|
|
|
|
|
2017-01-26 23:49:44 +01:00
|
|
|
/* don't change helper of sibling connections */
|
|
|
|
if (ct->master) {
|
|
|
|
/* If we try to change the helper to the same thing twice,
|
|
|
|
* treat the second attempt as a no-op instead of returning
|
|
|
|
* an error.
|
|
|
|
*/
|
2017-04-01 14:55:44 +02:00
|
|
|
err = -EBUSY;
|
|
|
|
if (help) {
|
|
|
|
rcu_read_lock();
|
|
|
|
helper = rcu_dereference(help->helper);
|
|
|
|
if (helper && !strcmp(helper->name, helpname))
|
|
|
|
err = 0;
|
|
|
|
rcu_read_unlock();
|
|
|
|
}
|
|
|
|
|
|
|
|
return err;
|
2017-01-26 23:49:44 +01:00
|
|
|
}
|
|
|
|
|
2007-05-10 23:15:58 +02:00
|
|
|
if (!strcmp(helpname, "")) {
|
|
|
|
if (help && help->helper) {
|
2006-01-05 21:19:05 +01:00
|
|
|
/* we had a helper before ... */
|
|
|
|
nf_ct_remove_expectations(ct);
|
2011-08-01 18:19:00 +02:00
|
|
|
RCU_INIT_POINTER(help->helper, NULL);
|
2006-01-05 21:19:05 +01:00
|
|
|
}
|
2007-05-10 23:15:58 +02:00
|
|
|
|
|
|
|
return 0;
|
2006-01-05 21:19:05 +01:00
|
|
|
}
|
2007-02-12 20:15:49 +01:00
|
|
|
|
2010-02-03 13:41:29 +01:00
|
|
|
helper = __nf_conntrack_helper_find(helpname, nf_ct_l3num(ct),
|
|
|
|
nf_ct_protonum(ct));
|
2008-11-18 11:54:05 +01:00
|
|
|
if (helper == NULL) {
|
|
|
|
#ifdef CONFIG_MODULES
|
2014-03-03 14:46:01 +01:00
|
|
|
spin_unlock_bh(&nf_conntrack_expect_lock);
|
2008-11-18 11:54:05 +01:00
|
|
|
|
|
|
|
if (request_module("nfct-helper-%s", helpname) < 0) {
|
2014-03-03 14:46:01 +01:00
|
|
|
spin_lock_bh(&nf_conntrack_expect_lock);
|
2008-11-18 11:54:05 +01:00
|
|
|
return -EOPNOTSUPP;
|
|
|
|
}
|
|
|
|
|
2014-03-03 14:46:01 +01:00
|
|
|
spin_lock_bh(&nf_conntrack_expect_lock);
|
2010-02-03 13:41:29 +01:00
|
|
|
helper = __nf_conntrack_helper_find(helpname, nf_ct_l3num(ct),
|
|
|
|
nf_ct_protonum(ct));
|
2008-11-18 11:54:05 +01:00
|
|
|
if (helper)
|
|
|
|
return -EAGAIN;
|
|
|
|
#endif
|
2008-06-10 00:56:20 +02:00
|
|
|
return -EOPNOTSUPP;
|
2008-11-18 11:54:05 +01:00
|
|
|
}
|
2007-05-10 23:15:58 +02:00
|
|
|
|
2007-07-08 07:23:42 +02:00
|
|
|
if (help) {
|
2012-06-07 14:19:42 +02:00
|
|
|
if (help->helper == helper) {
|
|
|
|
/* update private helper data if allowed. */
|
2012-09-21 16:52:08 +02:00
|
|
|
if (helper->from_nlattr)
|
2012-06-07 14:19:42 +02:00
|
|
|
helper->from_nlattr(helpinfo, ct);
|
2007-07-08 07:23:42 +02:00
|
|
|
return 0;
|
2012-06-18 17:29:53 +02:00
|
|
|
} else
|
2007-07-08 07:23:42 +02:00
|
|
|
return -EBUSY;
|
|
|
|
}
|
2007-05-10 23:15:58 +02:00
|
|
|
|
2012-06-18 17:29:53 +02:00
|
|
|
/* we cannot set a helper for an existing conntrack */
|
|
|
|
return -EOPNOTSUPP;
|
2006-01-05 21:19:05 +01:00
|
|
|
}
|
|
|
|
|
2016-04-12 23:32:34 +02:00
|
|
|
static int ctnetlink_change_timeout(struct nf_conn *ct,
|
|
|
|
const struct nlattr * const cda[])
|
2006-01-05 21:19:05 +01:00
|
|
|
{
|
2007-12-18 07:29:45 +01:00
|
|
|
u_int32_t timeout = ntohl(nla_get_be32(cda[CTA_TIMEOUT]));
|
2007-02-12 20:15:49 +01:00
|
|
|
|
2016-08-25 15:33:31 +02:00
|
|
|
ct->timeout = nfct_time_stamp + timeout * HZ;
|
2006-01-05 21:19:05 +01:00
|
|
|
|
2016-08-25 15:33:31 +02:00
|
|
|
if (test_bit(IPS_DYING_BIT, &ct->status))
|
|
|
|
return -ETIME;
|
2006-01-05 21:19:05 +01:00
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
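The timeout handler above converts a timeout in seconds from the netlink attribute into an absolute deadline in timer ticks, using 32-bit arithmetic. The sketch below illustrates that conversion and a wraparound-safe expiry test. `SK_HZ`, `sk_deadline` and `sk_expired` are hypothetical names, the tick rate of 250 is an arbitrary assumption, and the signed-difference comparison is a common idiom for wrapping counters rather than something shown in this section.

```c
#include <stdint.h>

#define SK_HZ 250u	/* assumed tick rate; the kernel's HZ is configurable */

/* Absolute deadline in ticks; wraps modulo 2^32 like a 32-bit timestamp. */
static uint32_t sk_deadline(uint32_t now_ticks, uint32_t timeout_secs)
{
	return now_ticks + timeout_secs * SK_HZ;
}

/*
 * Wraparound-safe expiry test: interpret the difference as signed, so
 * a deadline slightly "ahead" of now is still in the future even when
 * the tick counter has wrapped in between.
 */
static int sk_expired(uint32_t now_ticks, uint32_t deadline)
{
	return (int32_t)(deadline - now_ticks) <= 0;
}
```

Comparing `deadline` and `now_ticks` directly with `<` would misfire near the wrap point, which is why the difference is compared instead.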
|
|
|
|
|
2010-02-10 15:38:33 +01:00
|
|
|
static const struct nla_policy protoinfo_policy[CTA_PROTOINFO_MAX+1] = {
|
|
|
|
[CTA_PROTOINFO_TCP] = { .type = NLA_NESTED },
|
|
|
|
[CTA_PROTOINFO_DCCP] = { .type = NLA_NESTED },
|
|
|
|
[CTA_PROTOINFO_SCTP] = { .type = NLA_NESTED },
|
|
|
|
};
|
|
|
|
|
2016-04-12 23:32:34 +02:00
|
|
|
static int ctnetlink_change_protoinfo(struct nf_conn *ct,
|
|
|
|
const struct nlattr * const cda[])
|
2006-01-05 21:19:05 +01:00
|
|
|
{
|
2009-08-25 16:07:58 +02:00
|
|
|
const struct nlattr *attr = cda[CTA_PROTOINFO];
|
|
|
|
struct nlattr *tb[CTA_PROTOINFO_MAX+1];
|
2006-11-29 02:35:06 +01:00
|
|
|
struct nf_conntrack_l4proto *l4proto;
|
2006-01-05 21:19:05 +01:00
|
|
|
int err = 0;
|
|
|
|
|
2013-06-12 17:54:51 +02:00
|
|
|
err = nla_parse_nested(tb, CTA_PROTOINFO_MAX, attr, protoinfo_policy);
|
|
|
|
if (err < 0)
|
|
|
|
return err;
|
2006-01-05 21:19:05 +01:00
|
|
|
|
2009-03-18 17:28:37 +01:00
|
|
|
rcu_read_lock();
|
|
|
|
l4proto = __nf_ct_l4proto_find(nf_ct_l3num(ct), nf_ct_protonum(ct));
|
2007-09-28 23:37:41 +02:00
|
|
|
if (l4proto->from_nlattr)
|
|
|
|
err = l4proto->from_nlattr(tb, ct);
|
2009-03-18 17:28:37 +01:00
|
|
|
rcu_read_unlock();
|
2006-01-05 21:19:05 +01:00
|
|
|
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
2013-08-27 08:50:12 +02:00
|
|
|
static const struct nla_policy seqadj_policy[CTA_SEQADJ_MAX+1] = {
|
|
|
|
[CTA_SEQADJ_CORRECTION_POS] = { .type = NLA_U32 },
|
|
|
|
[CTA_SEQADJ_OFFSET_BEFORE] = { .type = NLA_U32 },
|
|
|
|
[CTA_SEQADJ_OFFSET_AFTER] = { .type = NLA_U32 },
|
2010-02-10 15:38:33 +01:00
|
|
|
};
|
|
|
|
|
2016-04-12 23:32:34 +02:00
|
|
|
static int change_seq_adj(struct nf_ct_seqadj *seq,
|
|
|
|
const struct nlattr * const attr)
|
2007-12-18 07:28:00 +01:00
|
|
|
{
|
2013-06-12 17:54:51 +02:00
|
|
|
int err;
|
2013-08-27 08:50:12 +02:00
|
|
|
struct nlattr *cda[CTA_SEQADJ_MAX+1];
|
2007-12-18 07:28:00 +01:00
|
|
|
|
2013-08-27 08:50:12 +02:00
|
|
|
err = nla_parse_nested(cda, CTA_SEQADJ_MAX, attr, seqadj_policy);
|
2013-06-12 17:54:51 +02:00
|
|
|
if (err < 0)
|
|
|
|
return err;
|
2007-12-18 07:28:00 +01:00
|
|
|
|
2013-08-27 08:50:12 +02:00
|
|
|
if (!cda[CTA_SEQADJ_CORRECTION_POS])
|
2007-12-18 07:28:00 +01:00
|
|
|
return -EINVAL;
|
|
|
|
|
2013-08-27 08:50:12 +02:00
|
|
|
seq->correction_pos =
|
|
|
|
ntohl(nla_get_be32(cda[CTA_SEQADJ_CORRECTION_POS]));
|
2007-12-18 07:28:00 +01:00
|
|
|
|
2013-08-27 08:50:12 +02:00
|
|
|
if (!cda[CTA_SEQADJ_OFFSET_BEFORE])
|
2007-12-18 07:28:00 +01:00
|
|
|
return -EINVAL;
|
|
|
|
|
2013-08-27 08:50:12 +02:00
|
|
|
seq->offset_before =
|
|
|
|
ntohl(nla_get_be32(cda[CTA_SEQADJ_OFFSET_BEFORE]));
|
2007-12-18 07:28:00 +01:00
|
|
|
|
2013-08-27 08:50:12 +02:00
|
|
|
if (!cda[CTA_SEQADJ_OFFSET_AFTER])
|
2007-12-18 07:28:00 +01:00
|
|
|
return -EINVAL;
|
|
|
|
|
2013-08-27 08:50:12 +02:00
|
|
|
seq->offset_after =
|
|
|
|
ntohl(nla_get_be32(cda[CTA_SEQADJ_OFFSET_AFTER]));
|
2007-12-18 07:28:00 +01:00
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
2013-08-27 08:50:12 +02:00
|
|
|
ctnetlink_change_seq_adj(struct nf_conn *ct,
|
|
|
|
const struct nlattr * const cda[])
|
2007-12-18 07:28:00 +01:00
|
|
|
{
|
2013-08-27 08:50:12 +02:00
|
|
|
struct nf_conn_seqadj *seqadj = nfct_seqadj(ct);
|
2007-12-18 07:28:00 +01:00
|
|
|
int ret = 0;
|
|
|
|
|
2013-08-27 08:50:12 +02:00
|
|
|
if (!seqadj)
|
2007-12-18 07:28:00 +01:00
|
|
|
return 0;
|
|
|
|
|
2013-08-27 08:50:12 +02:00
|
|
|
if (cda[CTA_SEQ_ADJ_ORIG]) {
|
|
|
|
ret = change_seq_adj(&seqadj->seq[IP_CT_DIR_ORIGINAL],
|
|
|
|
cda[CTA_SEQ_ADJ_ORIG]);
|
2007-12-18 07:28:00 +01:00
|
|
|
if (ret < 0)
|
|
|
|
return ret;
|
|
|
|
|
|
|
|
ct->status |= IPS_SEQ_ADJUST;
|
|
|
|
}
|
|
|
|
|
2013-08-27 08:50:12 +02:00
|
|
|
if (cda[CTA_SEQ_ADJ_REPLY]) {
|
|
|
|
ret = change_seq_adj(&seqadj->seq[IP_CT_DIR_REPLY],
|
|
|
|
cda[CTA_SEQ_ADJ_REPLY]);
|
2007-12-18 07:28:00 +01:00
|
|
|
if (ret < 0)
|
|
|
|
return ret;
|
|
|
|
|
|
|
|
ct->status |= IPS_SEQ_ADJUST;
|
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2013-01-11 07:30:46 +01:00
|
|
|
static int
|
|
|
|
ctnetlink_attach_labels(struct nf_conn *ct, const struct nlattr * const cda[])
|
|
|
|
{
|
|
|
|
#ifdef CONFIG_NF_CONNTRACK_LABELS
|
|
|
|
size_t len = nla_len(cda[CTA_LABELS]);
|
|
|
|
const void *mask = cda[CTA_LABELS_MASK];
|
|
|
|
|
|
|
|
if (len & (sizeof(u32)-1)) /* must be multiple of u32 */
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
if (mask) {
|
|
|
|
if (nla_len(cda[CTA_LABELS_MASK]) == 0 ||
|
|
|
|
nla_len(cda[CTA_LABELS_MASK]) != len)
|
|
|
|
return -EINVAL;
|
|
|
|
mask = nla_data(cda[CTA_LABELS_MASK]);
|
|
|
|
}
|
|
|
|
|
|
|
|
len /= sizeof(u32);
|
|
|
|
|
|
|
|
return nf_connlabels_replace(ct, nla_data(cda[CTA_LABELS]), mask, len);
|
|
|
|
#else
|
|
|
|
return -EOPNOTSUPP;
|
|
|
|
#endif
|
|
|
|
}
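The label handler above validates attribute lengths before use: the label payload must be a whole number of 32-bit words, and a mask, if present, must be non-empty and exactly as long as the labels. A small userspace sketch of those checks follows. `sketch_label_words` is a hypothetical helper that returns the word count on success, where the kernel code goes on to pass the data to `nf_connlabels_replace()`.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Validate label and mask byte lengths.
 * Returns the number of 32-bit words in the label payload, or -EINVAL
 * if the payload is not word-aligned or the mask length is invalid.
 */
static long sketch_label_words(size_t len, int have_mask, size_t mask_len)
{
	/* sizeof(uint32_t) is a power of two, so this tests len % 4 != 0 */
	if (len & (sizeof(uint32_t) - 1))
		return -EINVAL;

	if (have_mask && (mask_len == 0 || mask_len != len))
		return -EINVAL;

	return (long)(len / sizeof(uint32_t));
}
```

The `len & (size - 1)` form is equivalent to `len % size` only because the size is a power of two.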
|
|
|
|
|
2006-01-05 21:19:05 +01:00
|
|
|
static int
|
2009-08-25 16:07:58 +02:00
|
|
|
ctnetlink_change_conntrack(struct nf_conn *ct,
|
|
|
|
const struct nlattr * const cda[])
|
2006-01-05 21:19:05 +01:00
|
|
|
{
|
|
|
|
int err;
|
|
|
|
|
2009-03-16 15:27:22 +01:00
|
|
|
/* only allow NAT changes and master assignment for new conntracks */
|
|
|
|
if (cda[CTA_NAT_SRC] || cda[CTA_NAT_DST] || cda[CTA_TUPLE_MASTER])
|
|
|
|
return -EOPNOTSUPP;
|
|
|
|
|
2007-09-28 23:37:03 +02:00
|
|
|
if (cda[CTA_HELP]) {
|
2006-01-05 21:19:05 +01:00
|
|
|
err = ctnetlink_change_helper(ct, cda);
|
|
|
|
if (err < 0)
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
2007-09-28 23:37:03 +02:00
|
|
|
if (cda[CTA_TIMEOUT]) {
|
2006-01-05 21:19:05 +01:00
|
|
|
err = ctnetlink_change_timeout(ct, cda);
|
|
|
|
if (err < 0)
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
2007-09-28 23:37:03 +02:00
|
|
|
if (cda[CTA_STATUS]) {
|
2006-01-05 21:19:05 +01:00
|
|
|
err = ctnetlink_change_status(ct, cda);
|
|
|
|
if (err < 0)
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
2007-09-28 23:37:03 +02:00
|
|
|
if (cda[CTA_PROTOINFO]) {
|
2006-01-05 21:19:05 +01:00
|
|
|
err = ctnetlink_change_protoinfo(ct, cda);
|
|
|
|
if (err < 0)
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
2006-04-01 12:23:21 +02:00
|
|
|
#if defined(CONFIG_NF_CONNTRACK_MARK)
|
2007-09-28 23:37:03 +02:00
|
|
|
if (cda[CTA_MARK])
|
2007-12-18 07:29:45 +01:00
|
|
|
ct->mark = ntohl(nla_get_be32(cda[CTA_MARK]));
|
2006-01-05 21:19:05 +01:00
|
|
|
#endif
|
|
|
|
|
2013-08-27 08:50:12 +02:00
|
|
|
if (cda[CTA_SEQ_ADJ_ORIG] || cda[CTA_SEQ_ADJ_REPLY]) {
|
|
|
|
err = ctnetlink_change_seq_adj(ct, cda);
|
2007-12-18 07:28:00 +01:00
|
|
|
if (err < 0)
|
|
|
|
return err;
|
|
|
|
}
|
2013-08-27 08:50:12 +02:00
|
|
|
|
2013-01-11 07:30:46 +01:00
|
|
|
if (cda[CTA_LABELS]) {
|
|
|
|
err = ctnetlink_attach_labels(ct, cda);
|
|
|
|
if (err < 0)
|
|
|
|
return err;
|
|
|
|
}
|
2007-12-18 07:28:00 +01:00
|
|
|
|
2006-01-05 21:19:05 +01:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2009-03-16 15:28:09 +01:00
|
|
|
static struct nf_conn *
|
2015-08-08 21:40:01 +02:00
|
|
|
ctnetlink_create_conntrack(struct net *net,
|
|
|
|
const struct nf_conntrack_zone *zone,
|
2010-01-13 16:04:18 +01:00
|
|
|
const struct nlattr * const cda[],
|
2006-01-05 21:19:05 +01:00
|
|
|
struct nf_conntrack_tuple *otuple,
|
2007-09-28 23:43:53 +02:00
|
|
|
struct nf_conntrack_tuple *rtuple,
|
2009-03-16 15:25:46 +01:00
|
|
|
u8 u3)
|
2006-01-05 21:19:05 +01:00
|
|
|
{
|
|
|
|
struct nf_conn *ct;
|
|
|
|
int err = -EINVAL;
|
2007-07-08 07:23:42 +02:00
|
|
|
struct nf_conntrack_helper *helper;
|
2011-04-21 10:55:07 +02:00
|
|
|
struct nf_conn_tstamp *tstamp;
|
2006-01-05 21:19:05 +01:00
|
|
|
|
2010-02-15 18:14:57 +01:00
|
|
|
ct = nf_conntrack_alloc(net, zone, otuple, rtuple, GFP_ATOMIC);
|
netfilter 07/09: simplify nf_conntrack_alloc() error handling
nf_conntrack_alloc cannot return NULL, so there is no need to check for
NULL before using the value. I have also removed the initialization of ct
to NULL in nf_conntrack_alloc, since the value is never used, and since
perhaps it might lead one to think that return ct at the end might return
NULL.
The semantic patch that finds this problem is as follows:
(http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@match exists@
expression x, E;
position p1,p2;
statement S1, S2;
@@
x@p1 = nf_conntrack_alloc(...)
... when != x = E
(
if (x@p2 == NULL || ...) S1 else S2
|
if (x@p2 == NULL && ...) S1 else S2
)
@other_match exists@
expression match.x, E1, E2;
position p1!=match.p1,match.p2;
@@
x@p1 = E1
... when != x = E2
x@p2
@ script:python depends on !other_match@
p1 << match.p1;
p2 << match.p2;
@@
print "%s: call to nf_conntrack_alloc %s bad test %s" % (p1[0].file,p1[0].line,p2[0].line)
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-12 01:06:08 +01:00
|
|
|
if (IS_ERR(ct))
|
2009-03-16 15:28:09 +01:00
|
|
|
return ERR_PTR(-ENOMEM);
|
2006-01-05 21:19:05 +01:00
|
|
|
|
2007-09-28 23:37:03 +02:00
|
|
|
if (!cda[CTA_TIMEOUT])
|
2009-03-18 17:36:40 +01:00
|
|
|
goto err1;
|
2006-01-05 21:19:05 +01:00
|
|
|
|
2016-08-25 15:33:31 +02:00
|
|
|
ct->timeout = nfct_time_stamp + ntohl(nla_get_be32(cda[CTA_TIMEOUT])) * HZ;
|
2006-01-05 21:19:05 +01:00
|
|
|
|
2008-08-19 06:30:55 +02:00
|
|
|
rcu_read_lock();
|
2008-11-18 11:54:05 +01:00
|
|
|
if (cda[CTA_HELP]) {
|
2009-04-22 11:26:37 +02:00
|
|
|
char *helpname = NULL;
|
2012-06-07 14:19:42 +02:00
|
|
|
struct nlattr *helpinfo = NULL;
|
|
|
|
|
|
|
|
err = ctnetlink_parse_help(cda[CTA_HELP], &helpname, &helpinfo);
|
2009-03-18 17:36:40 +01:00
|
|
|
if (err < 0)
|
|
|
|
goto err2;
|
2008-11-18 11:54:05 +01:00
|
|
|
|
2010-02-03 13:41:29 +01:00
|
|
|
helper = __nf_conntrack_helper_find(helpname, nf_ct_l3num(ct),
|
|
|
|
nf_ct_protonum(ct));
|
2008-11-18 11:54:05 +01:00
|
|
|
if (helper == NULL) {
|
|
|
|
rcu_read_unlock();
|
|
|
|
#ifdef CONFIG_MODULES
|
|
|
|
if (request_module("nfct-helper-%s", helpname) < 0) {
|
|
|
|
err = -EOPNOTSUPP;
|
2009-03-18 17:36:40 +01:00
|
|
|
goto err1;
|
2008-11-18 11:54:05 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
rcu_read_lock();
|
2010-02-03 13:41:29 +01:00
|
|
|
helper = __nf_conntrack_helper_find(helpname,
|
|
|
|
nf_ct_l3num(ct),
|
|
|
|
nf_ct_protonum(ct));
|
2008-11-18 11:54:05 +01:00
|
|
|
if (helper) {
|
|
|
|
err = -EAGAIN;
|
2009-03-18 17:36:40 +01:00
|
|
|
goto err2;
|
2008-11-18 11:54:05 +01:00
|
|
|
}
|
|
|
|
rcu_read_unlock();
|
|
|
|
#endif
|
|
|
|
err = -EOPNOTSUPP;
|
2009-03-18 17:36:40 +01:00
|
|
|
goto err1;
|
2008-11-18 11:54:05 +01:00
|
|
|
} else {
|
|
|
|
struct nf_conn_help *help;
|
|
|
|
|
2012-06-07 12:11:50 +02:00
|
|
|
help = nf_ct_helper_ext_add(ct, helper, GFP_ATOMIC);
|
2008-11-18 11:54:05 +01:00
|
|
|
if (help == NULL) {
|
|
|
|
err = -ENOMEM;
|
2009-03-18 17:36:40 +01:00
|
|
|
goto err2;
|
2008-11-18 11:54:05 +01:00
|
|
|
}
|
2012-06-07 14:19:42 +02:00
|
|
|
/* set private helper data if allowed. */
|
2012-09-21 16:52:08 +02:00
|
|
|
if (helper->from_nlattr)
|
2012-06-07 14:19:42 +02:00
|
|
|
helper->from_nlattr(helpinfo, ct);
|
2008-11-18 11:54:05 +01:00
|
|
|
|
|
|
|
/* not in hash table yet so not strictly necessary */
|
2011-08-01 18:19:00 +02:00
|
|
|
RCU_INIT_POINTER(help->helper, helper);
|
2008-11-18 11:54:05 +01:00
|
|
|
}
|
|
|
|
} else {
|
|
|
|
/* try an implicit helper assignment */
|
2010-02-03 14:13:03 +01:00
|
|
|
err = __nf_ct_try_assign_helper(ct, NULL, GFP_ATOMIC);
|
2009-03-18 17:36:40 +01:00
|
|
|
if (err < 0)
|
|
|
|
goto err2;
|
2008-08-19 06:30:55 +02:00
|
|
|
}
|
|
|
|
|
2014-02-16 12:15:43 +01:00
|
|
|
err = ctnetlink_setup_nat(ct, cda);
|
|
|
|
if (err < 0)
|
|
|
|
goto err2;
|
2008-10-14 20:58:31 +02:00
|
|
|
|
2010-02-19 14:24:39 +01:00
|
|
|
nf_ct_acct_ext_add(ct, GFP_ATOMIC);
|
2011-01-19 16:00:07 +01:00
|
|
|
nf_ct_tstamp_ext_add(ct, GFP_ATOMIC);
|
2010-02-19 14:24:39 +01:00
|
|
|
nf_ct_ecache_ext_add(ct, 0, 0, GFP_ATOMIC);
|
2013-01-11 07:30:44 +01:00
|
|
|
nf_ct_labels_ext_add(ct);
|
|
|
|
|
2010-02-19 14:24:39 +01:00
|
|
|
/* we must add conntrack extensions before confirmation. */
|
|
|
|
ct->status |= IPS_CONFIRMED;
|
|
|
|
|
|
|
|
if (cda[CTA_STATUS]) {
|
|
|
|
err = ctnetlink_change_status(ct, cda);
|
2009-03-18 17:36:40 +01:00
|
|
|
if (err < 0)
|
|
|
|
goto err2;
|
2006-11-29 02:35:31 +01:00
|
|
|
}
|
2006-01-05 21:19:05 +01:00
|
|
|
|
2013-08-27 08:50:12 +02:00
|
|
|
if (cda[CTA_SEQ_ADJ_ORIG] || cda[CTA_SEQ_ADJ_REPLY]) {
|
|
|
|
err = ctnetlink_change_seq_adj(ct, cda);
|
2009-03-18 17:36:40 +01:00
|
|
|
if (err < 0)
|
|
|
|
goto err2;
|
2009-02-09 23:33:57 +01:00
|
|
|
}
|
|
|
|
|
2010-11-12 17:33:17 +01:00
|
|
|
memset(&ct->proto, 0, sizeof(ct->proto));
|
2007-09-28 23:37:03 +02:00
|
|
|
if (cda[CTA_PROTOINFO]) {
|
2006-01-05 21:19:05 +01:00
|
|
|
err = ctnetlink_change_protoinfo(ct, cda);
|
2009-03-18 17:36:40 +01:00
|
|
|
if (err < 0)
|
|
|
|
goto err2;
|
2006-01-05 21:19:05 +01:00
|
|
|
}
|
|
|
|
|
2006-04-01 12:23:21 +02:00
|
|
|
#if defined(CONFIG_NF_CONNTRACK_MARK)
|
2007-09-28 23:37:03 +02:00
|
|
|
if (cda[CTA_MARK])
|
2007-12-18 07:29:45 +01:00
|
|
|
ct->mark = ntohl(nla_get_be32(cda[CTA_MARK]));
|
2006-01-05 21:19:05 +01:00
|
|
|
#endif
|
|
|
|
|
2007-09-28 23:43:53 +02:00
|
|
|
/* setup master conntrack: this is a confirmed expectation */
|
2009-03-16 15:25:46 +01:00
|
|
|
if (cda[CTA_TUPLE_MASTER]) {
|
|
|
|
struct nf_conntrack_tuple master;
|
|
|
|
struct nf_conntrack_tuple_hash *master_h;
|
|
|
|
struct nf_conn *master_ct;
|
|
|
|
|
netfilter: nf_conntrack: add direction support for zones
This work adds a direction parameter to netfilter zones, so identity
separation can be performed only in original/reply or both directions
(default). This basically opens up the possibility of doing NAT with
conflicting IP address/port tuples from multiple, isolated tenants
on a host (e.g. from a netns) without requiring each tenant to NAT
twice resp. to use its own dedicated IP address to SNAT to, meaning
overlapping tuples can be made unique with the zone identifier in
original direction, where the NAT engine will then allocate a unique
tuple in the commonly shared default zone for the reply direction.
In some restricted, local DNAT cases, also port redirection could be
used for making the reply traffic unique w/o requiring SNAT.
The consensus we've reached and discussed at NFWS and since the initial
implementation [1] was to directly integrate the direction meta data
into the existing zones infrastructure, as opposed to the ct->mark
approach we proposed initially.
As we pass the nf_conntrack_zone object directly around, we don't have
to touch all call-sites, but only those, that contain equality checks
of zones. Thus, based on the current direction (original or reply),
we either return the actual id, or the default NF_CT_DEFAULT_ZONE_ID.
CT expectations are direction-agnostic entities when expectations are
being compared among themselves, so we can only use the identifier
in this case.
Note that zone identifiers can not be included into the hash mix
anymore as they don't contain a "stable" value that would be equal
for both directions at all times, f.e. if only zone->id would
unconditionally be xor'ed into the table slot hash, then replies won't
find the corresponding conntracking entry anymore.
If no particular direction is specified when configuring zones, the
behaviour is exactly as we expect currently (both directions).
Support has been added for the CT netlink interface as well as the
x_tables raw CT target, which both already offer existing interfaces
to user space for the configuration of zones.
Below a minimal, simplified collision example (script in [2]) with
netperf sessions:
  +--- tenant-1 ---+   mark := 1
  |    netperf     |--+
  +----------------+  |                CT zone := mark [ORIGINAL]
   [ip,sport] := X   +--------------+  +--- gateway ---+
                     | mark routing |--|     SNAT      |-- ... +
                     +--------------+  +---------------+       |
  +--- tenant-2 ---+  |                                     ~~~|~~~
  |    netperf     |--+                +-----------+           |
  +----------------+   mark := 2       | netserver |------ ... +
   [ip,sport] := X                     +-----------+
                                        [ip,port] := Y
On the gateway netns, example:
iptables -t raw -A PREROUTING -j CT --zone mark --zone-dir ORIGINAL
iptables -t nat -A POSTROUTING -o <dev> -j SNAT --to-source <ip> --random-fully
iptables -t mangle -A PREROUTING -m conntrack --ctdir ORIGINAL -j CONNMARK --save-mark
iptables -t mangle -A POSTROUTING -m conntrack --ctdir REPLY -j CONNMARK --restore-mark
conntrack dump from gateway netns:
netperf -H 10.1.1.2 -t TCP_STREAM -l60 -p12865,5555 from each tenant netns
tcp 6 431995 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=5555 dport=12865 zone-orig=1
src=10.1.1.2 dst=10.1.1.1 sport=12865 dport=1024
[ASSURED] mark=1 secctx=system_u:object_r:unlabeled_t:s0 use=1
tcp 6 431994 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=5555 dport=12865 zone-orig=2
src=10.1.1.2 dst=10.1.1.1 sport=12865 dport=5555
[ASSURED] mark=2 secctx=system_u:object_r:unlabeled_t:s0 use=1
tcp 6 299 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=39438 dport=33768 zone-orig=1
src=10.1.1.2 dst=10.1.1.1 sport=33768 dport=39438
[ASSURED] mark=1 secctx=system_u:object_r:unlabeled_t:s0 use=1
tcp 6 300 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=32889 dport=40206 zone-orig=2
src=10.1.1.2 dst=10.1.1.1 sport=40206 dport=32889
[ASSURED] mark=2 secctx=system_u:object_r:unlabeled_t:s0 use=2
Taking this further, test script in [2] creates 200 tenants and runs
original-tuple colliding netperf sessions each. A conntrack -L dump in
the gateway netns also confirms 200 overlapping entries, all in ESTABLISHED
state as expected.
I also did run various other tests with some permutations of the script,
to mention some: SNAT in random/random-fully/persistent mode, no zones (no
overlaps), static zones (original, reply, both directions), etc.
[1] http://thread.gmane.org/gmane.comp.security.firewalls.netfilter.devel/57412/
[2] https://paste.fedoraproject.org/242835/65657871/
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-08-14 16:03:39 +02:00
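The direction rule described in the commit message above ("based on the current direction we either return the actual id, or the default NF_CT_DEFAULT_ZONE_ID") can be sketched in plain userspace C. This is only an illustrative model, not the kernel implementation: `struct zone_model`, `zone_id_for_dir()`, `zones_match()` and the `ZONE_DIR_*` / `DEFAULT_ZONE_ID` constants are hypothetical names introduced here, and the default zone id is assumed to be 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a direction-aware zone; not the kernel's types. */
enum ct_dir { DIR_ORIGINAL = 0, DIR_REPLY = 1 };

#define ZONE_DIR_ORIG   (1u << DIR_ORIGINAL)
#define ZONE_DIR_REPL   (1u << DIR_REPLY)
#define ZONE_DIR_BOTH   (ZONE_DIR_ORIG | ZONE_DIR_REPL)
#define DEFAULT_ZONE_ID 0u	/* assumption: the shared default zone is id 0 */

struct zone_model {
	uint16_t id;	/* zone identifier, e.g. taken from the packet mark */
	uint8_t  dir;	/* bitmask of directions in which the id applies */
};

/* Return the zone id only for directions the zone is configured for;
 * every other direction falls back to the shared default zone. */
static uint16_t zone_id_for_dir(const struct zone_model *zone, enum ct_dir dir)
{
	assert(zone != NULL);
	return (zone->dir & (1u << dir)) ? zone->id : DEFAULT_ZONE_ID;
}

/* Equality check as used when comparing tuples of one direction. */
static int zones_match(const struct zone_model *a,
		       const struct zone_model *b, enum ct_dir dir)
{
	return zone_id_for_dir(a, dir) == zone_id_for_dir(b, dir);
}
```

Under this model, two tenants with zones 1 and 2 restricted to the original direction stay separate for original tuples but share the default zone for replies. That is why the reply tuple has to be made unique by NAT, and why a raw zone id cannot simply be mixed into the hash for both directions.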
|
|
|
err = ctnetlink_parse_tuple(cda, &master, CTA_TUPLE_MASTER,
|
|
|
|
u3, NULL);
|
2009-03-16 15:25:46 +01:00
|
|
|
if (err < 0)
|
2009-03-18 17:36:40 +01:00
|
|
|
goto err2;
|
2009-03-16 15:25:46 +01:00
|
|
|
|
2010-02-15 18:14:57 +01:00
|
|
|
master_h = nf_conntrack_find_get(net, zone, &master);
|
2009-03-16 15:25:46 +01:00
|
|
|
if (master_h == NULL) {
|
|
|
|
err = -ENOENT;
|
2009-03-18 17:36:40 +01:00
|
|
|
goto err2;
|
2009-03-16 15:25:46 +01:00
|
|
|
}
|
|
|
|
master_ct = nf_ct_tuplehash_to_ctrack(master_h);
|
2007-12-12 19:34:29 +01:00
|
|
|
__set_bit(IPS_EXPECTED_BIT, &ct->status);
|
2007-09-28 23:43:53 +02:00
|
|
|
ct->master = master_ct;
|
2007-12-12 19:34:29 +01:00
|
|
|
}
|
2011-04-21 10:55:07 +02:00
|
|
|
tstamp = nf_conn_tstamp_find(ct);
|
|
|
|
if (tstamp)
|
2014-08-23 03:32:09 +02:00
|
|
|
tstamp->start = ktime_get_real_ns();
|
2007-09-28 23:43:53 +02:00
|
|
|
|
2012-02-24 11:45:49 +01:00
|
|
|
err = nf_conntrack_hash_check_insert(ct);
|
|
|
|
if (err < 0)
|
|
|
|
goto err2;
|
|
|
|
|
2008-01-31 13:36:54 +01:00
|
|
|
rcu_read_unlock();
|
2006-11-27 19:25:32 +01:00
|
|
|
|
2009-03-16 15:28:09 +01:00
|
|
|
return ct;
|
2006-01-05 21:19:05 +01:00
|
|
|
|
2009-03-18 17:36:40 +01:00
|
|
|
err2:
|
|
|
|
rcu_read_unlock();
|
|
|
|
err1:
|
2006-01-05 21:19:05 +01:00
|
|
|
nf_conntrack_free(ct);
|
2009-03-16 15:28:09 +01:00
|
|
|
return ERR_PTR(err);
|
2006-01-05 21:19:05 +01:00
|
|
|
}
|
|
|
|
|
2015-12-15 18:41:56 +01:00
|
|
|
static int ctnetlink_new_conntrack(struct net *net, struct sock *ctnl,
|
|
|
|
struct sk_buff *skb,
|
|
|
|
const struct nlmsghdr *nlh,
|
|
|
|
const struct nlattr * const cda[])
|
2006-01-05 21:19:05 +01:00
|
|
|
{
|
|
|
|
struct nf_conntrack_tuple otuple, rtuple;
|
|
|
|
struct nf_conntrack_tuple_hash *h = NULL;
|
2009-06-02 20:07:39 +02:00
|
|
|
struct nfgenmsg *nfmsg = nlmsg_data(nlh);
|
2012-02-24 11:45:49 +01:00
|
|
|
struct nf_conn *ct;
|
2006-01-05 21:19:05 +01:00
|
|
|
u_int8_t u3 = nfmsg->nfgen_family;
|
2015-08-08 21:40:01 +02:00
|
|
|
struct nf_conntrack_zone zone;
|
2010-02-15 18:14:57 +01:00
|
|
|
int err;
|
|
|
|
|
|
|
|
err = ctnetlink_parse_zone(cda[CTA_ZONE], &zone);
|
|
|
|
if (err < 0)
|
|
|
|
return err;
|
2006-01-05 21:19:05 +01:00
|
|
|
|
2007-09-28 23:37:03 +02:00
|
|
|
if (cda[CTA_TUPLE_ORIG]) {
|
2015-08-14 16:03:39 +02:00
|
|
|
err = ctnetlink_parse_tuple(cda, &otuple, CTA_TUPLE_ORIG,
|
|
|
|
u3, &zone);
|
2006-01-05 21:19:05 +01:00
|
|
|
if (err < 0)
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
2007-09-28 23:37:03 +02:00
|
|
|
if (cda[CTA_TUPLE_REPLY]) {
|
2015-08-14 16:03:39 +02:00
|
|
|
err = ctnetlink_parse_tuple(cda, &rtuple, CTA_TUPLE_REPLY,
|
|
|
|
u3, &zone);
|
2006-01-05 21:19:05 +01:00
|
|
|
if (err < 0)
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
2007-09-28 23:37:03 +02:00
|
|
|
if (cda[CTA_TUPLE_ORIG])
|
2015-08-08 21:40:01 +02:00
|
|
|
h = nf_conntrack_find_get(net, &zone, &otuple);
|
2007-09-28 23:37:03 +02:00
|
|
|
else if (cda[CTA_TUPLE_REPLY])
|
2015-08-08 21:40:01 +02:00
|
|
|
h = nf_conntrack_find_get(net, &zone, &rtuple);
|
2006-01-05 21:19:05 +01:00
|
|
|
|
|
|
|
if (h == NULL) {
|
|
|
|
err = -ENOENT;
|
2009-03-16 15:28:09 +01:00
|
|
|
if (nlh->nlmsg_flags & NLM_F_CREATE) {
|
2009-05-05 17:48:26 +02:00
|
|
|
enum ip_conntrack_events events;
|
2007-09-28 23:43:53 +02:00
|
|
|
|
2013-02-12 00:22:38 +01:00
|
|
|
if (!cda[CTA_TUPLE_ORIG] || !cda[CTA_TUPLE_REPLY])
|
|
|
|
return -EINVAL;
|
2016-08-08 16:10:26 +02:00
|
|
|
if (otuple.dst.protonum != rtuple.dst.protonum)
|
|
|
|
return -EINVAL;
|
2013-02-12 00:22:38 +01:00
|
|
|
|
2015-08-08 21:40:01 +02:00
|
|
|
ct = ctnetlink_create_conntrack(net, &zone, cda, &otuple,
|
2009-03-16 15:28:09 +01:00
|
|
|
&rtuple, u3);
|
2012-02-24 11:45:49 +01:00
|
|
|
if (IS_ERR(ct))
|
|
|
|
return PTR_ERR(ct);
|
|
|
|
|
2009-03-16 15:28:09 +01:00
|
|
|
err = 0;
|
2009-05-05 17:48:26 +02:00
|
|
|
if (test_bit(IPS_EXPECTED_BIT, &ct->status))
|
2017-04-01 14:31:32 +02:00
|
|
|
events = 1 << IPCT_RELATED;
|
2009-05-05 17:48:26 +02:00
|
|
|
else
|
2017-04-01 14:31:32 +02:00
|
|
|
events = 1 << IPCT_NEW;
|
2009-05-05 17:48:26 +02:00
|
|
|
|
2013-01-11 07:30:46 +01:00
|
|
|
if (cda[CTA_LABELS] &&
|
|
|
|
ctnetlink_attach_labels(ct, cda) == 0)
|
|
|
|
events |= (1 << IPCT_LABEL);
|
|
|
|
|
2010-02-03 13:48:53 +01:00
|
|
|
nf_conntrack_eventmask_report((1 << IPCT_REPLY) |
|
|
|
|
(1 << IPCT_ASSURED) |
|
2009-06-13 12:26:29 +02:00
|
|
|
(1 << IPCT_HELPER) |
|
|
|
|
(1 << IPCT_PROTOINFO) |
|
2013-08-27 08:50:12 +02:00
|
|
|
(1 << IPCT_SEQADJ) |
|
2009-06-13 12:26:29 +02:00
|
|
|
(1 << IPCT_MARK) | events,
|
2012-09-07 22:12:54 +02:00
|
|
|
ct, NETLINK_CB(skb).portid,
|
2009-06-13 12:26:29 +02:00
|
|
|
nlmsg_report(nlh));
|
2009-03-16 15:28:09 +01:00
|
|
|
nf_ct_put(ct);
|
2012-02-24 11:45:49 +01:00
|
|
|
}
|
2007-09-28 23:43:53 +02:00
|
|
|
|
2006-01-05 21:19:05 +01:00
|
|
|
return err;
|
|
|
|
}
|
|
|
|
/* implicit 'else' */
|
|
|
|
|
|
|
|
err = -EEXIST;
|
2012-02-24 11:45:49 +01:00
|
|
|
ct = nf_ct_tuplehash_to_ctrack(h);
|
2007-08-08 03:11:26 +02:00
|
|
|
if (!(nlh->nlmsg_flags & NLM_F_EXCL)) {
|
2014-03-03 14:46:01 +01:00
|
|
|
spin_lock_bh(&nf_conntrack_expect_lock);
|
2008-11-18 11:56:20 +01:00
|
|
|
err = ctnetlink_change_conntrack(ct, cda);
|
2014-03-03 14:46:01 +01:00
|
|
|
spin_unlock_bh(&nf_conntrack_expect_lock);
|
2008-11-18 11:56:20 +01:00
|
|
|
if (err == 0) {
|
2010-02-03 13:48:53 +01:00
|
|
|
nf_conntrack_eventmask_report((1 << IPCT_REPLY) |
|
|
|
|
(1 << IPCT_ASSURED) |
|
2009-06-13 12:26:29 +02:00
|
|
|
(1 << IPCT_HELPER) |
|
2013-06-21 16:51:30 +02:00
|
|
|
(1 << IPCT_LABEL) |
|
2009-06-13 12:26:29 +02:00
|
|
|
(1 << IPCT_PROTOINFO) |
|
2013-08-27 08:50:12 +02:00
|
|
|
(1 << IPCT_SEQADJ) |
|
2009-06-13 12:26:29 +02:00
|
|
|
(1 << IPCT_MARK),
|
2012-09-07 22:12:54 +02:00
|
|
|
ct, NETLINK_CB(skb).portid,
|
2009-06-13 12:26:29 +02:00
|
|
|
nlmsg_report(nlh));
|
2012-02-24 11:45:49 +01:00
|
|
|
}
|
2007-08-08 03:11:26 +02:00
|
|
|
}
|
2006-01-05 21:19:05 +01:00
|
|
|
|
2012-02-24 11:45:49 +01:00
|
|
|
nf_ct_put(ct);
|
2006-01-05 21:19:05 +01:00
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
2012-06-26 20:27:09 +02:00
|
|
|
static int
|
2012-09-07 22:12:54 +02:00
|
|
|
ctnetlink_ct_stat_cpu_fill_info(struct sk_buff *skb, u32 portid, u32 seq,
|
2012-06-26 20:27:09 +02:00
|
|
|
__u16 cpu, const struct ip_conntrack_stat *st)
|
|
|
|
{
|
|
|
|
struct nlmsghdr *nlh;
|
|
|
|
struct nfgenmsg *nfmsg;
|
2012-09-07 22:12:54 +02:00
|
|
|
unsigned int flags = portid ? NLM_F_MULTI : 0, event;
|
2012-06-26 20:27:09 +02:00
|
|
|
|
|
|
|
event = (NFNL_SUBSYS_CTNETLINK << 8 | IPCTNL_MSG_CT_GET_STATS_CPU);
|
2012-09-07 22:12:54 +02:00
|
|
|
nlh = nlmsg_put(skb, portid, seq, event, sizeof(*nfmsg), flags);
|
2012-06-26 20:27:09 +02:00
|
|
|
if (nlh == NULL)
|
|
|
|
goto nlmsg_failure;
|
|
|
|
|
|
|
|
nfmsg = nlmsg_data(nlh);
|
|
|
|
nfmsg->nfgen_family = AF_UNSPEC;
|
|
|
|
nfmsg->version = NFNETLINK_V0;
|
|
|
|
nfmsg->res_id = htons(cpu);
|
|
|
|
|
2016-09-11 22:55:53 +02:00
|
|
|
if (nla_put_be32(skb, CTA_STATS_FOUND, htonl(st->found)) ||
|
2012-06-26 20:27:09 +02:00
|
|
|
nla_put_be32(skb, CTA_STATS_INVALID, htonl(st->invalid)) ||
|
|
|
|
nla_put_be32(skb, CTA_STATS_IGNORE, htonl(st->ignore)) ||
|
|
|
|
nla_put_be32(skb, CTA_STATS_INSERT, htonl(st->insert)) ||
|
|
|
|
nla_put_be32(skb, CTA_STATS_INSERT_FAILED,
|
|
|
|
htonl(st->insert_failed)) ||
|
|
|
|
nla_put_be32(skb, CTA_STATS_DROP, htonl(st->drop)) ||
|
|
|
|
nla_put_be32(skb, CTA_STATS_EARLY_DROP, htonl(st->early_drop)) ||
|
|
|
|
nla_put_be32(skb, CTA_STATS_ERROR, htonl(st->error)) ||
|
|
|
|
nla_put_be32(skb, CTA_STATS_SEARCH_RESTART,
|
|
|
|
htonl(st->search_restart)))
|
|
|
|
goto nla_put_failure;
|
|
|
|
|
|
|
|
nlmsg_end(skb, nlh);
|
|
|
|
return skb->len;
|
|
|
|
|
|
|
|
nla_put_failure:
|
|
|
|
nlmsg_failure:
|
|
|
|
nlmsg_cancel(skb, nlh);
|
|
|
|
return -1;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
|
|
|
ctnetlink_ct_stat_cpu_dump(struct sk_buff *skb, struct netlink_callback *cb)
|
|
|
|
{
|
|
|
|
int cpu;
|
|
|
|
struct net *net = sock_net(skb->sk);
|
|
|
|
|
|
|
|
if (cb->args[0] == nr_cpu_ids)
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
for (cpu = cb->args[0]; cpu < nr_cpu_ids; cpu++) {
|
|
|
|
const struct ip_conntrack_stat *st;
|
|
|
|
|
|
|
|
if (!cpu_possible(cpu))
|
|
|
|
continue;
|
|
|
|
|
|
|
|
st = per_cpu_ptr(net->ct.stat, cpu);
|
|
|
|
if (ctnetlink_ct_stat_cpu_fill_info(skb,
|
2012-09-07 22:12:54 +02:00
|
|
|
NETLINK_CB(cb->skb).portid,
|
2012-06-26 20:27:09 +02:00
|
|
|
cb->nlh->nlmsg_seq,
|
|
|
|
cpu, st) < 0)
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
cb->args[0] = cpu;
|
|
|
|
|
|
|
|
return skb->len;
|
|
|
|
}
|
|
|
|
|
2015-12-15 18:41:56 +01:00
|
|
|
static int ctnetlink_stat_ct_cpu(struct net *net, struct sock *ctnl,
|
|
|
|
struct sk_buff *skb,
|
|
|
|
const struct nlmsghdr *nlh,
|
|
|
|
const struct nlattr * const cda[])
|
2012-06-26 20:27:09 +02:00
|
|
|
{
|
|
|
|
if (nlh->nlmsg_flags & NLM_F_DUMP) {
|
|
|
|
struct netlink_dump_control c = {
|
|
|
|
.dump = ctnetlink_ct_stat_cpu_dump,
|
|
|
|
};
|
|
|
|
return netlink_dump_start(ctnl, skb, nlh, &c);
|
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
2012-09-07 22:12:54 +02:00
|
|
|
ctnetlink_stat_ct_fill_info(struct sk_buff *skb, u32 portid, u32 seq, u32 type,
|
2012-06-26 20:27:09 +02:00
|
|
|
struct net *net)
|
|
|
|
{
|
|
|
|
struct nlmsghdr *nlh;
|
|
|
|
struct nfgenmsg *nfmsg;
|
2012-09-07 22:12:54 +02:00
|
|
|
unsigned int flags = portid ? NLM_F_MULTI : 0, event;
|
2012-06-26 20:27:09 +02:00
|
|
|
unsigned int nr_conntracks = atomic_read(&net->ct.count);
|
|
|
|
|
|
|
|
event = (NFNL_SUBSYS_CTNETLINK << 8 | IPCTNL_MSG_CT_GET_STATS);
|
2012-09-07 22:12:54 +02:00
|
|
|
nlh = nlmsg_put(skb, portid, seq, event, sizeof(*nfmsg), flags);
|
2012-06-26 20:27:09 +02:00
|
|
|
if (nlh == NULL)
|
|
|
|
goto nlmsg_failure;
|
|
|
|
|
|
|
|
nfmsg = nlmsg_data(nlh);
|
|
|
|
nfmsg->nfgen_family = AF_UNSPEC;
|
|
|
|
nfmsg->version = NFNETLINK_V0;
|
|
|
|
nfmsg->res_id = 0;
|
|
|
|
|
|
|
|
if (nla_put_be32(skb, CTA_STATS_GLOBAL_ENTRIES, htonl(nr_conntracks)))
|
|
|
|
goto nla_put_failure;
|
|
|
|
|
|
|
|
nlmsg_end(skb, nlh);
|
|
|
|
return skb->len;
|
|
|
|
|
|
|
|
nla_put_failure:
|
|
|
|
nlmsg_failure:
|
|
|
|
nlmsg_cancel(skb, nlh);
|
|
|
|
return -1;
|
|
|
|
}
|
|
|
|
|
2015-12-15 18:41:56 +01:00
|
|
|
static int ctnetlink_stat_ct(struct net *net, struct sock *ctnl,
|
|
|
|
struct sk_buff *skb, const struct nlmsghdr *nlh,
|
|
|
|
const struct nlattr * const cda[])
|
2012-06-26 20:27:09 +02:00
|
|
|
{
|
|
|
|
struct sk_buff *skb2;
|
|
|
|
int err;
|
|
|
|
|
|
|
|
skb2 = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
|
|
|
|
if (skb2 == NULL)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
2012-09-07 22:12:54 +02:00
|
|
|
err = ctnetlink_stat_ct_fill_info(skb2, NETLINK_CB(skb).portid,
|
2012-06-26 20:27:09 +02:00
|
|
|
nlh->nlmsg_seq,
|
|
|
|
NFNL_MSG_TYPE(nlh->nlmsg_type),
|
|
|
|
sock_net(skb->sk));
|
|
|
|
if (err <= 0)
|
|
|
|
goto free;
|
|
|
|
|
2012-09-07 22:12:54 +02:00
|
|
|
err = netlink_unicast(ctnl, skb2, NETLINK_CB(skb).portid, MSG_DONTWAIT);
|
2012-06-26 20:27:09 +02:00
|
|
|
if (err < 0)
|
|
|
|
goto out;
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
free:
|
|
|
|
kfree_skb(skb2);
|
|
|
|
out:
|
|
|
|
/* this avoids a loop in nfnetlink. */
|
|
|
|
return err == -EAGAIN ? -ENOBUFS : err;
|
|
|
|
}
|
|
|
|
|
2013-08-07 18:13:20 +02:00
|
|
|
static const struct nla_policy exp_nla_policy[CTA_EXPECT_MAX+1] = {
|
|
|
|
[CTA_EXPECT_MASTER] = { .type = NLA_NESTED },
|
|
|
|
[CTA_EXPECT_TUPLE] = { .type = NLA_NESTED },
|
|
|
|
[CTA_EXPECT_MASK] = { .type = NLA_NESTED },
|
|
|
|
[CTA_EXPECT_TIMEOUT] = { .type = NLA_U32 },
|
|
|
|
[CTA_EXPECT_ID] = { .type = NLA_U32 },
|
|
|
|
[CTA_EXPECT_HELP_NAME] = { .type = NLA_NUL_STRING,
|
|
|
|
.len = NF_CT_HELPER_NAME_LEN - 1 },
|
|
|
|
[CTA_EXPECT_ZONE] = { .type = NLA_U16 },
|
|
|
|
[CTA_EXPECT_FLAGS] = { .type = NLA_U32 },
|
|
|
|
[CTA_EXPECT_CLASS] = { .type = NLA_U32 },
|
|
|
|
[CTA_EXPECT_NAT] = { .type = NLA_NESTED },
|
|
|
|
[CTA_EXPECT_FN] = { .type = NLA_NUL_STRING },
|
|
|
|
};
|
|
|
|
|
|
|
|
static struct nf_conntrack_expect *
|
|
|
|
ctnetlink_alloc_expect(const struct nlattr *const cda[], struct nf_conn *ct,
|
|
|
|
struct nf_conntrack_helper *helper,
|
|
|
|
struct nf_conntrack_tuple *tuple,
|
|
|
|
struct nf_conntrack_tuple *mask);
|
|
|
|
|
2015-10-05 04:48:47 +02:00
|
|
|
#ifdef CONFIG_NETFILTER_NETLINK_GLUE_CT
|
2012-06-07 12:13:39 +02:00
|
|
|
static size_t
|
2015-10-05 04:47:13 +02:00
|
|
|
ctnetlink_glue_build_size(const struct nf_conn *ct)
|
2012-06-07 12:13:39 +02:00
|
|
|
{
|
|
|
|
return 3 * nla_total_size(0) /* CTA_TUPLE_ORIG|REPL|MASTER */
|
|
|
|
+ 3 * nla_total_size(0) /* CTA_TUPLE_IP */
|
|
|
|
+ 3 * nla_total_size(0) /* CTA_TUPLE_PROTO */
|
|
|
|
+ 3 * nla_total_size(sizeof(u_int8_t)) /* CTA_PROTO_NUM */
|
|
|
|
+ nla_total_size(sizeof(u_int32_t)) /* CTA_ID */
|
|
|
|
+ nla_total_size(sizeof(u_int32_t)) /* CTA_STATUS */
|
|
|
|
+ nla_total_size(sizeof(u_int32_t)) /* CTA_TIMEOUT */
|
|
|
|
+ nla_total_size(0) /* CTA_PROTOINFO */
|
|
|
|
+ nla_total_size(0) /* CTA_HELP */
|
|
|
|
+ nla_total_size(NF_CT_HELPER_NAME_LEN) /* CTA_HELP_NAME */
|
|
|
|
+ ctnetlink_secctx_size(ct)
|
|
|
|
#ifdef CONFIG_NF_NAT_NEEDED
|
|
|
|
+ 2 * nla_total_size(0) /* CTA_NAT_SEQ_ADJ_ORIG|REPL */
|
|
|
|
+ 6 * nla_total_size(sizeof(u_int32_t)) /* CTA_NAT_SEQ_OFFSET */
|
|
|
|
#endif
|
|
|
|
#ifdef CONFIG_NF_CONNTRACK_MARK
|
|
|
|
+ nla_total_size(sizeof(u_int32_t)) /* CTA_MARK */
|
2014-06-16 13:52:34 +02:00
|
|
|
#endif
|
|
|
|
#ifdef CONFIG_NF_CONNTRACK_ZONES
|
2015-08-14 16:03:39 +02:00
|
|
|
+ nla_total_size(sizeof(u_int16_t)) /* CTA_ZONE|CTA_TUPLE_ZONE */
|
2012-06-07 12:13:39 +02:00
|
|
|
#endif
|
|
|
|
+ ctnetlink_proto_size(ct)
|
|
|
|
;
|
|
|
|
}
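The size estimate above is a sum of `nla_total_size()` terms. As a rough illustration of what each term contributes, the following userspace sketch models the netlink attribute size rule (a 4-byte attribute header plus payload, rounded up to a 4-byte boundary). The `model_*` / `MODEL_*` names are hypothetical stand-ins written for this example, not the kernel macros themselves.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the netlink attribute layout constants. */
#define MODEL_NLA_ALIGNTO ((size_t)4)
#define MODEL_NLA_HDRLEN  ((size_t)4)	/* 16-bit length + 16-bit type */
#define MODEL_NLA_ALIGN(len) \
	(((len) + MODEL_NLA_ALIGNTO - 1) & ~(MODEL_NLA_ALIGNTO - 1))

/* Bytes one attribute occupies in a message: header + payload + padding. */
static size_t model_nla_total_size(size_t payload)
{
	/* Guard against wrap-around in the addition below. */
	assert(payload <= (size_t)-1 - MODEL_NLA_HDRLEN - MODEL_NLA_ALIGNTO);
	return MODEL_NLA_ALIGN(MODEL_NLA_HDRLEN + payload);
}
```

With this rule an empty nested container such as CTA_TUPLE_ORIG costs only its header, while a one-byte payload such as CTA_PROTO_NUM is padded to the same size as a four-byte one.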
|
|
|
|
|
2015-10-05 04:49:56 +02:00
|
|
|
static struct nf_conn *ctnetlink_glue_get_ct(const struct sk_buff *skb,
|
2015-10-05 04:47:13 +02:00
|
|
|
enum ip_conntrack_info *ctinfo)
|
2015-09-30 23:53:44 +02:00
|
|
|
{
|
|
|
|
struct nf_conn *ct;
|
|
|
|
|
|
|
|
ct = nf_ct_get(skb, ctinfo);
|
|
|
|
if (ct && nf_ct_is_untracked(ct))
|
|
|
|
ct = NULL;
|
|
|
|
|
|
|
|
return ct;
|
|
|
|
}
|
|
|
|
|
2015-10-05 04:47:13 +02:00
|
|
|
static int __ctnetlink_glue_build(struct sk_buff *skb, struct nf_conn *ct)
|
2012-06-07 12:13:39 +02:00
|
|
|
{
|
2015-08-08 21:40:01 +02:00
|
|
|
const struct nf_conntrack_zone *zone;
|
2012-06-07 12:13:39 +02:00
|
|
|
struct nlattr *nest_parms;
|
|
|
|
|
|
|
|
rcu_read_lock();
|
netfilter: nf_conntrack: add direction support for zones
This work adds a direction parameter to netfilter zones, so identity
separation can be performed only in original/reply or both directions
(default). This basically opens up the possibility of doing NAT with
conflicting IP address/port tuples from multiple, isolated tenants
on a host (e.g. from a netns) without requiring each tenant to NAT
twice resp. to use its own dedicated IP address to SNAT to, meaning
overlapping tuples can be made unique with the zone identifier in
original direction, where the NAT engine will then allocate a unique
tuple in the commonly shared default zone for the reply direction.
In some restricted, local DNAT cases, also port redirection could be
used for making the reply traffic unique w/o requiring SNAT.
The consensus reached in discussions at NFWS and since the initial
implementation [1] was to integrate the direction metadata directly
into the existing zones infrastructure, as opposed to the ct->mark
approach we proposed initially.
As we pass the nf_conntrack_zone object directly around, we don't have
to touch all call-sites, but only those, that contain equality checks
of zones. Thus, based on the current direction (original or reply),
we either return the actual id, or the default NF_CT_DEFAULT_ZONE_ID.
CT expectations are direction-agnostic entities when expectations are
being compared among themselves, so we can only use the identifier
in this case.
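The direction-dependent lookup described above can be modelled in user space
roughly as follows. This is only a sketch under stated assumptions: the type
and function names (zone_sketch, zone_id_for_dir, zone_equal) and the constants
are hypothetical stand-ins chosen for this illustration, not the kernel's
definitions. It shows how two tenants with different zone ids stay distinct in
the original direction while sharing the default zone in the reply direction.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration only; not the kernel headers. */
enum ct_dir { CT_DIR_ORIGINAL = 0, CT_DIR_REPLY = 1 };

#define ZONE_DIR_ORIG   (1u << CT_DIR_ORIGINAL)
#define ZONE_DIR_REPL   (1u << CT_DIR_REPLY)
#define ZONE_DIR_BOTH   (ZONE_DIR_ORIG | ZONE_DIR_REPL)
#define DEFAULT_ZONE_ID 0

struct zone_sketch {
	uint16_t id;   /* zone identifier, e.g. taken from the packet mark */
	uint8_t  dir;  /* bitmask of directions in which the id applies */
};

/* Return the zone id if the zone covers this direction, else the default. */
static uint16_t zone_id_for_dir(const struct zone_sketch *zone,
				enum ct_dir dir)
{
	return (zone->dir & (1u << dir)) ? zone->id : DEFAULT_ZONE_ID;
}

/* Equality check as a call-site would perform it for one direction. */
static int zone_equal(const struct zone_sketch *a,
		      const struct zone_sketch *b, enum ct_dir dir)
{
	return zone_id_for_dir(a, dir) == zone_id_for_dir(b, dir);
}

/* Usage: two tenants whose zones are set for the original direction only. */
void zone_sketch_selfcheck(void)
{
	struct zone_sketch t1 = { .id = 1, .dir = ZONE_DIR_ORIG };
	struct zone_sketch t2 = { .id = 2, .dir = ZONE_DIR_ORIG };
	struct zone_sketch both = { .id = 1, .dir = ZONE_DIR_BOTH };

	/* Original direction: ids differ, so colliding tuples stay apart. */
	assert(!zone_equal(&t1, &t2, CT_DIR_ORIGINAL));
	/* Reply direction: both fall back to the shared default zone. */
	assert(zone_equal(&t1, &t2, CT_DIR_REPLY));
	/* A zone covering both directions keeps its id in the reply too. */
	assert(zone_id_for_dir(&both, CT_DIR_REPLY) == 1);
}
```

With a zone that is valid in both directions (the default when no direction is
configured), the sketch degenerates to a plain id comparison, which matches the
unchanged behaviour mentioned further below.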
Note that zone identifiers cannot be included in the hash mix
anymore, as they don't contain a "stable" value that would be equal
for both directions at all times. For example, if zone->id were
unconditionally xor'ed into the table slot hash, replies would no
longer find the corresponding conntrack entry.
If no particular direction is specified when configuring zones, the
behaviour is exactly as we expect currently (both directions).
Support has been added for the CT netlink interface as well as the
x_tables raw CT target, which both already offer existing interfaces
to user space for the configuration of zones.
Below a minimal, simplified collision example (script in [2]) with
netperf sessions:
  +--- tenant-1 ---+   mark := 1
  |    netperf     |--+
  +----------------+  |                CT zone := mark [ORIGINAL]
   [ip,sport] := X   +--------------+  +--- gateway ---+
                     | mark routing |--|     SNAT      |-- ... +
                     +--------------+  +---------------+       |
  +--- tenant-2 ---+  |                                     ~~~|~~~
  |    netperf     |--+                +-----------+           |
  +----------------+   mark := 2       | netserver |------ ... +
   [ip,sport] := X                     +-----------+
                                        [ip,port] := Y
On the gateway netns, example:
  iptables -t raw -A PREROUTING -j CT --zone mark --zone-dir ORIGINAL
  iptables -t nat -A POSTROUTING -o <dev> -j SNAT --to-source <ip> --random-fully
  iptables -t mangle -A PREROUTING -m conntrack --ctdir ORIGINAL -j CONNMARK --save-mark
  iptables -t mangle -A POSTROUTING -m conntrack --ctdir REPLY -j CONNMARK --restore-mark
conntrack dump from gateway netns:
  netperf -H 10.1.1.2 -t TCP_STREAM -l60 -p12865,5555 from each tenant netns
  tcp 6 431995 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=5555 dport=12865 zone-orig=1
      src=10.1.1.2 dst=10.1.1.1 sport=12865 dport=1024
      [ASSURED] mark=1 secctx=system_u:object_r:unlabeled_t:s0 use=1
  tcp 6 431994 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=5555 dport=12865 zone-orig=2
      src=10.1.1.2 dst=10.1.1.1 sport=12865 dport=5555
      [ASSURED] mark=2 secctx=system_u:object_r:unlabeled_t:s0 use=1
  tcp 6 299 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=39438 dport=33768 zone-orig=1
      src=10.1.1.2 dst=10.1.1.1 sport=33768 dport=39438
      [ASSURED] mark=1 secctx=system_u:object_r:unlabeled_t:s0 use=1
  tcp 6 300 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=32889 dport=40206 zone-orig=2
      src=10.1.1.2 dst=10.1.1.1 sport=40206 dport=32889
      [ASSURED] mark=2 secctx=system_u:object_r:unlabeled_t:s0 use=2
Taking this further, the test script in [2] creates 200 tenants, each
running netperf sessions whose original tuples collide. A conntrack -L
dump in the gateway netns confirms 200 overlapping entries, all in
ESTABLISHED state as expected.
I also ran various other tests with permutations of the script, for
example: SNAT in random/random-fully/persistent mode, no zones (no
overlaps), static zones (original, reply, both directions), etc.
[1] http://thread.gmane.org/gmane.comp.security.firewalls.netfilter.devel/57412/
[2] https://paste.fedoraproject.org/242835/65657871/
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-08-14 16:03:39 +02:00
|
|
|
zone = nf_ct_zone(ct);
|
|
|
|
|
2012-06-07 12:13:39 +02:00
|
|
|
nest_parms = nla_nest_start(skb, CTA_TUPLE_ORIG | NLA_F_NESTED);
|
|
|
|
if (!nest_parms)
|
|
|
|
goto nla_put_failure;
|
|
|
|
if (ctnetlink_dump_tuples(skb, nf_ct_tuple(ct, IP_CT_DIR_ORIGINAL)) < 0)
|
|
|
|
goto nla_put_failure;
|
2015-08-14 16:03:39 +02:00
|
|
|
if (ctnetlink_dump_zone_id(skb, CTA_TUPLE_ZONE, zone,
|
|
|
|
NF_CT_ZONE_DIR_ORIG) < 0)
|
|
|
|
goto nla_put_failure;
|
2012-06-07 12:13:39 +02:00
|
|
|
nla_nest_end(skb, nest_parms);
|
|
|
|
|
|
|
|
nest_parms = nla_nest_start(skb, CTA_TUPLE_REPLY | NLA_F_NESTED);
|
|
|
|
if (!nest_parms)
|
|
|
|
goto nla_put_failure;
|
|
|
|
if (ctnetlink_dump_tuples(skb, nf_ct_tuple(ct, IP_CT_DIR_REPLY)) < 0)
|
|
|
|
goto nla_put_failure;
|
2015-08-14 16:03:39 +02:00
|
|
|
if (ctnetlink_dump_zone_id(skb, CTA_TUPLE_ZONE, zone,
|
|
|
|
NF_CT_ZONE_DIR_REPL) < 0)
|
|
|
|
goto nla_put_failure;
|
2012-06-07 12:13:39 +02:00
|
|
|
nla_nest_end(skb, nest_parms);
|
|
|
|
|
2015-08-14 16:03:39 +02:00
|
|
|
if (ctnetlink_dump_zone_id(skb, CTA_ZONE, zone,
|
|
|
|
NF_CT_DEFAULT_ZONE_DIR) < 0)
|
2015-08-08 21:40:01 +02:00
|
|
|
goto nla_put_failure;
|
2012-06-07 12:13:39 +02:00
|
|
|
|
|
|
|
if (ctnetlink_dump_id(skb, ct) < 0)
|
|
|
|
goto nla_put_failure;
|
|
|
|
|
|
|
|
if (ctnetlink_dump_status(skb, ct) < 0)
|
|
|
|
goto nla_put_failure;
|
|
|
|
|
|
|
|
if (ctnetlink_dump_timeout(skb, ct) < 0)
|
|
|
|
goto nla_put_failure;
|
|
|
|
|
|
|
|
if (ctnetlink_dump_protoinfo(skb, ct) < 0)
|
|
|
|
goto nla_put_failure;
|
|
|
|
|
|
|
|
if (ctnetlink_dump_helpinfo(skb, ct) < 0)
|
|
|
|
goto nla_put_failure;
|
|
|
|
|
|
|
|
#ifdef CONFIG_NF_CONNTRACK_SECMARK
|
|
|
|
if (ct->secmark && ctnetlink_dump_secctx(skb, ct) < 0)
|
|
|
|
goto nla_put_failure;
|
|
|
|
#endif
|
|
|
|
if (ct->master && ctnetlink_dump_master(skb, ct) < 0)
|
|
|
|
goto nla_put_failure;
|
|
|
|
|
|
|
|
if ((ct->status & IPS_SEQ_ADJUST) &&
|
2013-08-27 08:50:12 +02:00
|
|
|
ctnetlink_dump_ct_seq_adj(skb, ct) < 0)
|
2012-06-07 12:13:39 +02:00
|
|
|
goto nla_put_failure;
|
|
|
|
|
|
|
|
#ifdef CONFIG_NF_CONNTRACK_MARK
|
|
|
|
if (ct->mark && ctnetlink_dump_mark(skb, ct) < 0)
|
|
|
|
goto nla_put_failure;
|
|
|
|
#endif
|
2013-01-11 07:30:45 +01:00
|
|
|
if (ctnetlink_dump_labels(skb, ct) < 0)
|
|
|
|
goto nla_put_failure;
|
2012-06-07 12:13:39 +02:00
|
|
|
rcu_read_unlock();
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
nla_put_failure:
|
|
|
|
rcu_read_unlock();
|
|
|
|
return -ENOSPC;
|
|
|
|
}
|
|
|
|
|
2015-09-30 23:53:44 +02:00
|
|
|
static int
|
2015-10-05 04:47:13 +02:00
|
|
|
ctnetlink_glue_build(struct sk_buff *skb, struct nf_conn *ct,
|
|
|
|
enum ip_conntrack_info ctinfo,
|
|
|
|
u_int16_t ct_attr, u_int16_t ct_info_attr)
|
2015-09-30 23:53:44 +02:00
|
|
|
{
|
|
|
|
struct nlattr *nest_parms;
|
|
|
|
|
|
|
|
nest_parms = nla_nest_start(skb, ct_attr | NLA_F_NESTED);
|
|
|
|
if (!nest_parms)
|
|
|
|
goto nla_put_failure;
|
|
|
|
|
2015-10-05 04:47:13 +02:00
|
|
|
if (__ctnetlink_glue_build(skb, ct) < 0)
|
2015-09-30 23:53:44 +02:00
|
|
|
goto nla_put_failure;
|
|
|
|
|
|
|
|
nla_nest_end(skb, nest_parms);
|
|
|
|
|
|
|
|
if (nla_put_be32(skb, ct_info_attr, htonl(ctinfo)))
|
|
|
|
goto nla_put_failure;
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
nla_put_failure:
|
|
|
|
return -ENOSPC;
|
|
|
|
}
|
|
|
|
|
2017-01-26 23:49:43 +01:00
|
|
|
static int
|
|
|
|
ctnetlink_update_status(struct nf_conn *ct, const struct nlattr * const cda[])
|
|
|
|
{
|
|
|
|
unsigned int status = ntohl(nla_get_be32(cda[CTA_STATUS]));
|
|
|
|
unsigned long d = ct->status ^ status;
|
|
|
|
|
|
|
|
if (d & IPS_SEEN_REPLY && !(status & IPS_SEEN_REPLY))
|
|
|
|
/* SEEN_REPLY bit can only be set */
|
|
|
|
return -EBUSY;
|
|
|
|
|
|
|
|
if (d & IPS_ASSURED && !(status & IPS_ASSURED))
|
|
|
|
/* ASSURED bit can only be set */
|
|
|
|
return -EBUSY;
|
|
|
|
|
|
|
|
/* This check is less strict than ctnetlink_change_status()
|
|
|
|
* because callers often flip IPS_EXPECTED bits when sending
|
|
|
|
* an NFQA_CT attribute to the kernel. So ignore the
|
|
|
|
* unchangeable bits but do not error out.
|
|
|
|
*/
|
|
|
|
ct->status = (status & ~IPS_UNCHANGEABLE_MASK) |
|
|
|
|
(ct->status & IPS_UNCHANGEABLE_MASK);
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2012-06-07 12:13:39 +02:00
|
|
|
static int
|
2015-10-05 04:47:13 +02:00
|
|
|
ctnetlink_glue_parse_ct(const struct nlattr *cda[], struct nf_conn *ct)
|
2012-06-07 12:13:39 +02:00
|
|
|
{
|
|
|
|
int err;
|
|
|
|
|
|
|
|
if (cda[CTA_TIMEOUT]) {
|
|
|
|
err = ctnetlink_change_timeout(ct, cda);
|
|
|
|
if (err < 0)
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
if (cda[CTA_STATUS]) {
|
2017-01-26 23:49:43 +01:00
|
|
|
err = ctnetlink_update_status(ct, cda);
|
2012-06-07 12:13:39 +02:00
|
|
|
if (err < 0)
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
if (cda[CTA_HELP]) {
|
|
|
|
err = ctnetlink_change_helper(ct, cda);
|
|
|
|
if (err < 0)
|
|
|
|
return err;
|
|
|
|
}
|
2013-01-11 07:30:46 +01:00
|
|
|
if (cda[CTA_LABELS]) {
|
|
|
|
err = ctnetlink_attach_labels(ct, cda);
|
|
|
|
if (err < 0)
|
|
|
|
return err;
|
|
|
|
}
|
2012-06-07 12:13:39 +02:00
|
|
|
#if defined(CONFIG_NF_CONNTRACK_MARK)
|
2013-12-19 18:25:15 +01:00
|
|
|
if (cda[CTA_MARK]) {
|
|
|
|
u32 mask = 0, mark, newmark;
|
|
|
|
if (cda[CTA_MARK_MASK])
|
|
|
|
mask = ~ntohl(nla_get_be32(cda[CTA_MARK_MASK]));
|
|
|
|
|
|
|
|
mark = ntohl(nla_get_be32(cda[CTA_MARK]));
|
|
|
|
newmark = (ct->mark & mask) ^ mark;
|
|
|
|
if (newmark != ct->mark)
|
|
|
|
ct->mark = newmark;
|
|
|
|
}
|
2012-06-07 12:13:39 +02:00
|
|
|
#endif
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
2015-10-05 04:47:13 +02:00
|
|
|
ctnetlink_glue_parse(const struct nlattr *attr, struct nf_conn *ct)
|
2012-06-07 12:13:39 +02:00
|
|
|
{
|
|
|
|
struct nlattr *cda[CTA_MAX+1];
|
2012-08-14 12:47:37 +02:00
|
|
|
int ret;
|
2012-06-07 12:13:39 +02:00
|
|
|
|
2013-06-12 17:54:51 +02:00
|
|
|
ret = nla_parse_nested(cda, CTA_MAX, attr, ct_nla_policy);
|
|
|
|
if (ret < 0)
|
|
|
|
return ret;
|
2012-06-07 12:13:39 +02:00
|
|
|
|
2014-03-03 14:46:01 +01:00
|
|
|
spin_lock_bh(&nf_conntrack_expect_lock);
|
2015-10-05 04:47:13 +02:00
|
|
|
ret = ctnetlink_glue_parse_ct((const struct nlattr **)cda, ct);
|
2014-03-03 14:46:01 +01:00
|
|
|
spin_unlock_bh(&nf_conntrack_expect_lock);
|
2012-08-14 12:47:37 +02:00
|
|
|
|
|
|
|
return ret;
|
2012-06-07 12:13:39 +02:00
|
|
|
}
|
|
|
|
|
2015-10-05 04:47:13 +02:00
|
|
|
static int ctnetlink_glue_exp_parse(const struct nlattr * const *cda,
|
|
|
|
const struct nf_conn *ct,
|
|
|
|
struct nf_conntrack_tuple *tuple,
|
|
|
|
struct nf_conntrack_tuple *mask)
|
2013-08-07 18:13:20 +02:00
|
|
|
{
|
|
|
|
int err;
|
|
|
|
|
|
|
|
err = ctnetlink_parse_tuple(cda, tuple, CTA_EXPECT_TUPLE,
|
2015-08-14 16:03:39 +02:00
|
|
|
nf_ct_l3num(ct), NULL);
|
2013-08-07 18:13:20 +02:00
|
|
|
if (err < 0)
|
|
|
|
return err;
|
|
|
|
|
|
|
|
return ctnetlink_parse_tuple(cda, mask, CTA_EXPECT_MASK,
|
2015-08-14 16:03:39 +02:00
|
|
|
nf_ct_l3num(ct), NULL);
|
2013-08-07 18:13:20 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
2015-10-05 04:47:13 +02:00
|
|
|
ctnetlink_glue_attach_expect(const struct nlattr *attr, struct nf_conn *ct,
|
|
|
|
u32 portid, u32 report)
|
2013-08-07 18:13:20 +02:00
|
|
|
{
|
|
|
|
struct nlattr *cda[CTA_EXPECT_MAX+1];
|
|
|
|
struct nf_conntrack_tuple tuple, mask;
|
2013-08-27 11:47:26 +02:00
|
|
|
struct nf_conntrack_helper *helper = NULL;
|
2013-08-07 18:13:20 +02:00
|
|
|
struct nf_conntrack_expect *exp;
|
|
|
|
int err;
|
|
|
|
|
|
|
|
err = nla_parse_nested(cda, CTA_EXPECT_MAX, attr, exp_nla_policy);
|
|
|
|
if (err < 0)
|
|
|
|
return err;
|
|
|
|
|
2015-10-05 04:47:13 +02:00
|
|
|
err = ctnetlink_glue_exp_parse((const struct nlattr * const *)cda,
|
|
|
|
ct, &tuple, &mask);
|
2013-08-07 18:13:20 +02:00
|
|
|
if (err < 0)
|
|
|
|
return err;
|
|
|
|
|
|
|
|
if (cda[CTA_EXPECT_HELP_NAME]) {
|
|
|
|
const char *helpname = nla_data(cda[CTA_EXPECT_HELP_NAME]);
|
|
|
|
|
|
|
|
helper = __nf_conntrack_helper_find(helpname, nf_ct_l3num(ct),
|
|
|
|
nf_ct_protonum(ct));
|
|
|
|
if (helper == NULL)
|
|
|
|
return -EOPNOTSUPP;
|
|
|
|
}
|
|
|
|
|
|
|
|
exp = ctnetlink_alloc_expect((const struct nlattr * const *)cda, ct,
|
|
|
|
helper, &tuple, &mask);
|
|
|
|
if (IS_ERR(exp))
|
|
|
|
return PTR_ERR(exp);
|
|
|
|
|
|
|
|
err = nf_ct_expect_related_report(exp, portid, report);
|
2016-08-08 16:03:40 +02:00
|
|
|
nf_ct_expect_put(exp);
|
|
|
|
return err;
|
2013-08-07 18:13:20 +02:00
|
|
|
}
|
|
|
|
|
2015-10-05 04:47:13 +02:00
|
|
|
static void ctnetlink_glue_seqadj(struct sk_buff *skb, struct nf_conn *ct,
|
|
|
|
enum ip_conntrack_info ctinfo, int diff)
|
2015-09-30 23:53:44 +02:00
|
|
|
{
|
|
|
|
if (!(ct->status & IPS_NAT_MASK))
|
|
|
|
return;
|
|
|
|
|
|
|
|
nf_ct_tcp_seqadj_set(skb, ct, ctinfo, diff);
|
|
|
|
}
|
|
|
|
|
2015-10-05 04:47:13 +02:00
|
|
|
static struct nfnl_ct_hook ctnetlink_glue_hook = {
|
|
|
|
.get_ct = ctnetlink_glue_get_ct,
|
|
|
|
.build_size = ctnetlink_glue_build_size,
|
|
|
|
.build = ctnetlink_glue_build,
|
|
|
|
.parse = ctnetlink_glue_parse,
|
|
|
|
.attach_expect = ctnetlink_glue_attach_expect,
|
|
|
|
.seq_adjust = ctnetlink_glue_seqadj,
|
2012-06-07 12:13:39 +02:00
|
|
|
};
|
2015-10-05 04:48:47 +02:00
|
|
|
#endif /* CONFIG_NETFILTER_NETLINK_GLUE_CT */
|
2012-06-07 12:13:39 +02:00
|
|
|
|
2007-02-12 20:15:49 +01:00
|
|
|
/***********************************************************************
|
|
|
|
* EXPECT
|
|
|
|
***********************************************************************/
|
2006-01-05 21:19:05 +01:00
|
|
|
|
2016-04-12 23:32:34 +02:00
|
|
|
static int ctnetlink_exp_dump_tuple(struct sk_buff *skb,
|
|
|
|
const struct nf_conntrack_tuple *tuple,
|
|
|
|
enum ctattr_expect type)
|
2006-01-05 21:19:05 +01:00
|
|
|
{
|
2007-09-28 23:37:03 +02:00
|
|
|
struct nlattr *nest_parms;
|
2007-02-12 20:15:49 +01:00
|
|
|
|
2007-09-28 23:37:03 +02:00
|
|
|
nest_parms = nla_nest_start(skb, type | NLA_F_NESTED);
|
|
|
|
if (!nest_parms)
|
|
|
|
goto nla_put_failure;
|
2006-01-05 21:19:05 +01:00
|
|
|
if (ctnetlink_dump_tuples(skb, tuple) < 0)
|
2007-09-28 23:37:03 +02:00
|
|
|
goto nla_put_failure;
|
|
|
|
nla_nest_end(skb, nest_parms);
|
2006-01-05 21:19:05 +01:00
|
|
|
|
|
|
|
return 0;
|
|
|
|
|
2007-09-28 23:37:03 +02:00
|
|
|
nla_put_failure:
|
2006-01-05 21:19:05 +01:00
|
|
|
return -1;
|
2007-02-12 20:15:49 +01:00
|
|
|
}
|
2006-01-05 21:19:05 +01:00
|
|
|
|
2016-04-12 23:32:34 +02:00
|
|
|
static int ctnetlink_exp_dump_mask(struct sk_buff *skb,
|
|
|
|
const struct nf_conntrack_tuple *tuple,
|
|
|
|
const struct nf_conntrack_tuple_mask *mask)
|
2006-03-22 22:54:15 +01:00
|
|
|
{
|
|
|
|
int ret;
|
|
|
|
struct nf_conntrack_l3proto *l3proto;
|
2006-11-29 02:35:06 +01:00
|
|
|
struct nf_conntrack_l4proto *l4proto;
|
2007-07-08 07:31:32 +02:00
|
|
|
struct nf_conntrack_tuple m;
|
2007-09-28 23:37:03 +02:00
|
|
|
struct nlattr *nest_parms;
|
2007-07-08 07:31:32 +02:00
|
|
|
|
|
|
|
memset(&m, 0xFF, sizeof(m));
|
|
|
|
memcpy(&m.src.u3, &mask->src.u3, sizeof(m.src.u3));
|
2010-01-26 17:04:02 +01:00
|
|
|
m.src.u.all = mask->src.u.all;
|
|
|
|
m.dst.protonum = tuple->dst.protonum;
|
2007-07-08 07:31:32 +02:00
|
|
|
|
2007-09-28 23:37:03 +02:00
|
|
|
nest_parms = nla_nest_start(skb, CTA_EXPECT_MASK | NLA_F_NESTED);
|
|
|
|
if (!nest_parms)
|
|
|
|
goto nla_put_failure;
|
2006-03-22 22:54:15 +01:00
|
|
|
|
2012-03-05 03:24:29 +01:00
|
|
|
rcu_read_lock();
|
2008-11-17 16:00:40 +01:00
|
|
|
l3proto = __nf_ct_l3proto_find(tuple->src.l3num);
|
2007-07-08 07:31:32 +02:00
|
|
|
ret = ctnetlink_dump_tuples_ip(skb, &m, l3proto);
|
2012-03-05 03:24:29 +01:00
|
|
|
if (ret >= 0) {
|
|
|
|
l4proto = __nf_ct_l4proto_find(tuple->src.l3num,
|
|
|
|
tuple->dst.protonum);
|
2007-07-08 07:31:32 +02:00
|
|
|
ret = ctnetlink_dump_tuples_proto(skb, &m, l4proto);
|
2012-03-05 03:24:29 +01:00
|
|
|
}
|
|
|
|
rcu_read_unlock();
|
|
|
|
|
2006-03-22 22:54:15 +01:00
|
|
|
if (unlikely(ret < 0))
|
2007-09-28 23:37:03 +02:00
|
|
|
goto nla_put_failure;
|
2006-03-22 22:54:15 +01:00
|
|
|
|
2007-09-28 23:37:03 +02:00
|
|
|
nla_nest_end(skb, nest_parms);
|
2006-03-22 22:54:15 +01:00
|
|
|
|
|
|
|
return 0;
|
|
|
|
|
2007-09-28 23:37:03 +02:00
|
|
|
nla_put_failure:
|
2006-03-22 22:54:15 +01:00
|
|
|
return -1;
|
|
|
|
}
|
|
|
|
|
2012-08-26 19:14:06 +02:00
|
|
|
static const union nf_inet_addr any_addr;
|
|
|
|
|
[NETFILTER]: Kill some supper dupper bloatry
/me awards the bloatiest-of-all-net/-.c-code award to
nf_conntrack_netlink.c, congratulations to all the authors :-/!
Hall of (unquestionable) fame (measured per inline, top 10 under
net/):
-4496 ctnetlink_parse_tuple netfilter/nf_conntrack_netlink.c
-2165 ctnetlink_dump_tuples netfilter/nf_conntrack_netlink.c
-2115 __ip_vs_get_out_rt ipv4/ipvs/ip_vs_xmit.c
-1924 xfrm_audit_helper_pktinfo xfrm/xfrm_state.c
-1799 ctnetlink_parse_tuple_proto netfilter/nf_conntrack_netlink.c
-1268 ctnetlink_parse_tuple_ip netfilter/nf_conntrack_netlink.c
-1093 ctnetlink_exp_dump_expect netfilter/nf_conntrack_netlink.c
-1060 void ccid3_update_send_interval dccp/ccids/ccid3.c
-983 ctnetlink_dump_tuples_proto netfilter/nf_conntrack_netlink.c
-827 ctnetlink_exp_dump_tuple netfilter/nf_conntrack_netlink.c
(i386 / gcc (GCC) 4.1.2 20070626 (Red Hat 4.1.2-13) /
allyesconfig except CONFIG_FORCED_INLINING)
...and I left < 200 byte gains as future work item.
After iterative inline removal, I finally have this:
net/netfilter/nf_conntrack_netlink.c:
ctnetlink_exp_fill_info | -1104
ctnetlink_new_expect | -1572
ctnetlink_fill_info | -1303
ctnetlink_new_conntrack | -2230
ctnetlink_get_expect | -341
ctnetlink_del_expect | -352
ctnetlink_expect_event | -1110
ctnetlink_conntrack_event | -1548
ctnetlink_del_conntrack | -729
ctnetlink_get_conntrack | -728
10 functions changed, 11017 bytes removed, diff: -11017
net/netfilter/nf_conntrack_netlink.c:
ctnetlink_parse_tuple | +419
dump_nat_seq_adj | +183
ctnetlink_dump_counters | +166
ctnetlink_dump_tuples | +261
ctnetlink_exp_dump_expect | +633
ctnetlink_change_status | +460
6 functions changed, 2122 bytes added, diff: +2122
net/netfilter/nf_conntrack_netlink.o:
16 functions changed, 2122 bytes added, 11017 bytes removed, diff: -8895
Without a number of CONFIG.*DEBUGs, I got this:
net/netfilter/nf_conntrack_netlink.o:
16 functions changed, 2122 bytes added, 11029 bytes removed, diff: -8907
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-06 08:11:31 +01:00
|
|
|
static int
|
2006-01-05 21:19:05 +01:00
|
|
|
ctnetlink_exp_dump_expect(struct sk_buff *skb,
|
2007-02-12 20:15:49 +01:00
|
|
|
const struct nf_conntrack_expect *exp)
|
2006-01-05 21:19:05 +01:00
|
|
|
{
|
|
|
|
struct nf_conn *master = exp->master;
|
2011-12-30 16:40:17 +01:00
|
|
|
long timeout = ((long)exp->timeout.expires - (long)jiffies) / HZ;
|
2010-09-28 21:06:34 +02:00
|
|
|
struct nf_conn_help *help;
|
2012-02-05 03:41:52 +01:00
|
|
|
#ifdef CONFIG_NF_NAT_NEEDED
|
|
|
|
struct nlattr *nest_parms;
|
|
|
|
struct nf_conntrack_tuple nat_tuple = {};
|
|
|
|
#endif
|
2012-02-05 03:44:51 +01:00
|
|
|
struct nf_ct_helper_expectfn *expfn;
|
|
|
|
|
2007-12-18 07:37:03 +01:00
|
|
|
if (timeout < 0)
|
|
|
|
timeout = 0;
|
2006-01-05 21:19:05 +01:00
|
|
|
|
|
|
|
if (ctnetlink_exp_dump_tuple(skb, &exp->tuple, CTA_EXPECT_TUPLE) < 0)
|
2007-09-28 23:37:03 +02:00
|
|
|
goto nla_put_failure;
|
2006-03-22 22:54:15 +01:00
|
|
|
if (ctnetlink_exp_dump_mask(skb, &exp->tuple, &exp->mask) < 0)
|
2007-09-28 23:37:03 +02:00
|
|
|
goto nla_put_failure;
|
2006-01-05 21:19:05 +01:00
|
|
|
if (ctnetlink_exp_dump_tuple(skb,
|
|
|
|
&master->tuplehash[IP_CT_DIR_ORIGINAL].tuple,
|
|
|
|
CTA_EXPECT_MASTER) < 0)
|
2007-09-28 23:37:03 +02:00
|
|
|
goto nla_put_failure;
|
2007-02-12 20:15:49 +01:00
|
|
|
|
2012-02-05 03:41:52 +01:00
|
|
|
#ifdef CONFIG_NF_NAT_NEEDED
|
2012-08-26 19:14:06 +02:00
|
|
|
if (!nf_inet_addr_cmp(&exp->saved_addr, &any_addr) ||
|
|
|
|
exp->saved_proto.all) {
|
2012-02-05 03:41:52 +01:00
|
|
|
nest_parms = nla_nest_start(skb, CTA_EXPECT_NAT | NLA_F_NESTED);
|
|
|
|
if (!nest_parms)
|
|
|
|
goto nla_put_failure;
|
|
|
|
|
2012-04-02 00:57:48 +02:00
|
|
|
if (nla_put_be32(skb, CTA_EXPECT_NAT_DIR, htonl(exp->dir)))
|
|
|
|
goto nla_put_failure;
|
2012-02-05 03:41:52 +01:00
|
|
|
|
|
|
|
nat_tuple.src.l3num = nf_ct_l3num(master);
|
2012-08-26 19:14:06 +02:00
|
|
|
nat_tuple.src.u3 = exp->saved_addr;
|
2012-02-05 03:41:52 +01:00
|
|
|
nat_tuple.dst.protonum = nf_ct_protonum(master);
|
|
|
|
nat_tuple.src.u = exp->saved_proto;
|
|
|
|
|
|
|
|
if (ctnetlink_exp_dump_tuple(skb, &nat_tuple,
|
|
|
|
CTA_EXPECT_NAT_TUPLE) < 0)
|
|
|
|
goto nla_put_failure;
|
|
|
|
nla_nest_end(skb, nest_parms);
|
|
|
|
}
|
|
|
|
#endif
|
2012-04-02 00:57:48 +02:00
|
|
|
if (nla_put_be32(skb, CTA_EXPECT_TIMEOUT, htonl(timeout)) ||
|
|
|
|
nla_put_be32(skb, CTA_EXPECT_ID, htonl((unsigned long)exp)) ||
|
|
|
|
nla_put_be32(skb, CTA_EXPECT_FLAGS, htonl(exp->flags)) ||
|
|
|
|
nla_put_be32(skb, CTA_EXPECT_CLASS, htonl(exp->class)))
|
|
|
|
goto nla_put_failure;
|
2010-09-28 21:06:34 +02:00
|
|
|
help = nfct_help(master);
|
|
|
|
if (help) {
|
|
|
|
struct nf_conntrack_helper *helper;
|
|
|
|
|
|
|
|
helper = rcu_dereference(help->helper);
|
2012-04-02 00:57:48 +02:00
|
|
|
if (helper &&
|
|
|
|
nla_put_string(skb, CTA_EXPECT_HELP_NAME, helper->name))
|
|
|
|
goto nla_put_failure;
|
2010-09-28 21:06:34 +02:00
|
|
|
}
|
2012-02-05 03:44:51 +01:00
|
|
|
expfn = nf_ct_helper_expectfn_find_by_symbol(exp->expectfn);
|
2012-04-02 00:57:48 +02:00
|
|
|
if (expfn != NULL &&
|
|
|
|
nla_put_string(skb, CTA_EXPECT_FN, expfn->name))
|
|
|
|
goto nla_put_failure;
|
2006-01-05 21:19:05 +01:00
|
|
|
|
|
|
|
return 0;
|
2007-02-12 20:15:49 +01:00
|
|
|
|
2007-09-28 23:37:03 +02:00
|
|
|
nla_put_failure:
|
2006-01-05 21:19:05 +01:00
|
|
|
return -1;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
2012-09-07 22:12:54 +02:00
|
|
|
ctnetlink_exp_fill_info(struct sk_buff *skb, u32 portid, u32 seq,
|
2009-06-02 20:03:34 +02:00
|
|
|
int event, const struct nf_conntrack_expect *exp)
|
2006-01-05 21:19:05 +01:00
|
|
|
{
|
|
|
|
struct nlmsghdr *nlh;
|
|
|
|
struct nfgenmsg *nfmsg;
|
2012-09-07 22:12:54 +02:00
|
|
|
unsigned int flags = portid ? NLM_F_MULTI : 0;
|
2006-01-05 21:19:05 +01:00
|
|
|
|
|
|
|
event |= NFNL_SUBSYS_CTNETLINK_EXP << 8;
|
2012-09-07 22:12:54 +02:00
|
|
|
nlh = nlmsg_put(skb, portid, seq, event, sizeof(*nfmsg), flags);
|
2009-06-02 20:07:39 +02:00
|
|
|
if (nlh == NULL)
|
|
|
|
goto nlmsg_failure;
|
2006-01-05 21:19:05 +01:00
|
|
|
|
2009-06-02 20:07:39 +02:00
|
|
|
nfmsg = nlmsg_data(nlh);
|
2006-01-05 21:19:05 +01:00
|
|
|
nfmsg->nfgen_family = exp->tuple.src.l3num;
|
|
|
|
nfmsg->version = NFNETLINK_V0;
|
|
|
|
nfmsg->res_id = 0;
|
|
|
|
|
|
|
|
if (ctnetlink_exp_dump_expect(skb, exp) < 0)
|
2007-09-28 23:37:03 +02:00
|
|
|
goto nla_put_failure;
|
2006-01-05 21:19:05 +01:00
|
|
|
|
2009-06-02 20:07:39 +02:00
|
|
|
nlmsg_end(skb, nlh);
|
2006-01-05 21:19:05 +01:00
|
|
|
return skb->len;
|
|
|
|
|
|
|
|
nlmsg_failure:
|
2007-09-28 23:37:03 +02:00
|
|
|
nla_put_failure:
|
2009-06-02 20:07:39 +02:00
|
|
|
nlmsg_cancel(skb, nlh);
|
2006-01-05 21:19:05 +01:00
|
|
|
return -1;
|
|
|
|
}
|
|
|
|
|
|
|
|
#ifdef CONFIG_NF_CONNTRACK_EVENTS
|
2009-06-03 10:32:06 +02:00
|
|
|
static int
|
|
|
|
ctnetlink_expect_event(unsigned int events, struct nf_exp_event *item)
|
2006-01-05 21:19:05 +01:00
|
|
|
{
|
2010-01-13 16:04:18 +01:00
|
|
|
struct nf_conntrack_expect *exp = item->exp;
|
|
|
|
struct net *net = nf_ct_exp_net(exp);
|
2006-01-05 21:19:05 +01:00
|
|
|
struct nlmsghdr *nlh;
|
|
|
|
struct nfgenmsg *nfmsg;
|
|
|
|
struct sk_buff *skb;
|
2010-10-19 10:19:06 +02:00
|
|
|
unsigned int type, group;
|
2006-01-05 21:19:05 +01:00
|
|
|
int flags = 0;
|
|
|
|
|
2010-10-19 10:19:06 +02:00
|
|
|
if (events & (1 << IPEXP_DESTROY)) {
|
|
|
|
type = IPCTNL_MSG_EXP_DELETE;
|
|
|
|
group = NFNLGRP_CONNTRACK_EXP_DESTROY;
|
|
|
|
} else if (events & (1 << IPEXP_NEW)) {
|
2006-01-05 21:19:05 +01:00
|
|
|
type = IPCTNL_MSG_EXP_NEW;
|
|
|
|
flags = NLM_F_CREATE|NLM_F_EXCL;
|
2010-10-19 10:19:06 +02:00
|
|
|
group = NFNLGRP_CONNTRACK_EXP_NEW;
|
2006-01-05 21:19:05 +01:00
|
|
|
} else
|
2009-06-03 10:32:06 +02:00
|
|
|
return 0;
|
2006-01-05 21:19:05 +01:00
|
|
|
|
2010-10-19 10:19:06 +02:00
|
|
|
if (!item->report && !nfnetlink_has_listeners(net, group))
|
2009-06-03 10:32:06 +02:00
|
|
|
return 0;
|
2006-08-22 09:32:05 +02:00
|
|
|
|
2009-06-02 20:07:39 +02:00
|
|
|
skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_ATOMIC);
|
|
|
|
if (skb == NULL)
|
2009-04-17 17:47:31 +02:00
|
|
|
goto errout;
|
2006-01-05 21:19:05 +01:00
|
|
|
|
2006-02-04 11:11:09 +01:00
|
|
|
type |= NFNL_SUBSYS_CTNETLINK_EXP << 8;
|
2012-09-07 22:12:54 +02:00
|
|
|
nlh = nlmsg_put(skb, item->portid, 0, type, sizeof(*nfmsg), flags);
|
2009-06-02 20:07:39 +02:00
|
|
|
if (nlh == NULL)
|
|
|
|
goto nlmsg_failure;
|
2006-01-05 21:19:05 +01:00
|
|
|
|
2009-06-02 20:07:39 +02:00
|
|
|
nfmsg = nlmsg_data(nlh);
|
2006-01-05 21:19:05 +01:00
|
|
|
nfmsg->nfgen_family = exp->tuple.src.l3num;
|
|
|
|
nfmsg->version = NFNETLINK_V0;
|
|
|
|
nfmsg->res_id = 0;
|
|
|
|
|
2008-11-17 16:00:40 +01:00
|
|
|
rcu_read_lock();
|
2006-01-05 21:19:05 +01:00
|
|
|
if (ctnetlink_exp_dump_expect(skb, exp) < 0)
|
2007-09-28 23:37:03 +02:00
|
|
|
goto nla_put_failure;
|
2008-11-17 16:00:40 +01:00
|
|
|
rcu_read_unlock();
|
2006-01-05 21:19:05 +01:00
|
|
|
|
2009-06-02 20:07:39 +02:00
|
|
|
nlmsg_end(skb, nlh);
|
2012-09-07 22:12:54 +02:00
|
|
|
nfnetlink_send(skb, net, item->portid, group, item->report, GFP_ATOMIC);
|
2009-06-03 10:32:06 +02:00
|
|
|
return 0;
|
2006-01-05 21:19:05 +01:00
|
|
|
|
2007-09-28 23:37:03 +02:00
|
|
|
nla_put_failure:
|
2008-11-17 16:00:40 +01:00
|
|
|
rcu_read_unlock();
|
2009-06-02 20:07:39 +02:00
|
|
|
nlmsg_cancel(skb, nlh);
|
2008-11-17 16:00:40 +01:00
|
|
|
nlmsg_failure:
|
2006-01-05 21:19:05 +01:00
|
|
|
kfree_skb(skb);
|
2009-04-17 17:47:31 +02:00
|
|
|
errout:
|
2010-01-13 16:04:18 +01:00
|
|
|
nfnetlink_set_err(net, 0, 0, -ENOBUFS);
|
2009-06-03 10:32:06 +02:00
|
|
|
return 0;
|
2006-01-05 21:19:05 +01:00
|
|
|
}
|
|
|
|
#endif
|
2007-07-08 07:32:34 +02:00
|
|
|
static int ctnetlink_exp_done(struct netlink_callback *cb)
|
|
|
|
{
|
2007-07-08 07:35:21 +02:00
|
|
|
if (cb->args[1])
|
|
|
|
nf_ct_expect_put((struct nf_conntrack_expect *)cb->args[1]);
|
2007-07-08 07:32:34 +02:00
|
|
|
return 0;
|
|
|
|
}
|
2006-01-05 21:19:05 +01:00
|
|
|
|
|
|
|
static int
|
|
|
|
ctnetlink_exp_dump_table(struct sk_buff *skb, struct netlink_callback *cb)
|
|
|
|
{
|
2010-01-13 16:04:18 +01:00
|
|
|
struct net *net = sock_net(skb->sk);
|
2007-07-08 07:32:34 +02:00
|
|
|
struct nf_conntrack_expect *exp, *last;
|
2009-06-02 20:07:39 +02:00
|
|
|
struct nfgenmsg *nfmsg = nlmsg_data(cb->nlh);
|
2006-01-05 21:19:23 +01:00
|
|
|
u_int8_t l3proto = nfmsg->nfgen_family;
|
2006-01-05 21:19:05 +01:00
|
|
|
|
2008-01-31 13:38:19 +01:00
|
|
|
rcu_read_lock();
|
2007-07-08 07:35:21 +02:00
|
|
|
last = (struct nf_conntrack_expect *)cb->args[1];
|
|
|
|
for (; cb->args[0] < nf_ct_expect_hsize; cb->args[0]++) {
|
2007-07-08 07:32:34 +02:00
|
|
|
restart:
|
2016-05-06 00:51:49 +02:00
|
|
|
hlist_for_each_entry(exp, &nf_ct_expect_hash[cb->args[0]],
|
2007-07-08 07:35:21 +02:00
|
|
|
hnode) {
|
|
|
|
if (l3proto && exp->tuple.src.l3num != l3proto)
|
2007-07-08 07:32:34 +02:00
|
|
|
continue;
|
2016-05-06 00:51:47 +02:00
|
|
|
|
|
|
|
if (!net_eq(nf_ct_net(exp->master), net))
|
|
|
|
continue;
|
|
|
|
|
2007-07-08 07:35:21 +02:00
|
|
|
if (cb->args[1]) {
|
|
|
|
if (exp != last)
|
|
|
|
continue;
|
|
|
|
cb->args[1] = 0;
|
|
|
|
}
|
2009-06-02 20:03:34 +02:00
|
|
|
if (ctnetlink_exp_fill_info(skb,
|
2012-09-07 22:12:54 +02:00
|
|
|
NETLINK_CB(cb->skb).portid,
|
2007-07-08 07:35:21 +02:00
|
|
|
cb->nlh->nlmsg_seq,
|
|
|
|
IPCTNL_MSG_EXP_NEW,
|
2009-06-02 20:03:34 +02:00
|
|
|
exp) < 0) {
|
2008-01-31 13:38:19 +01:00
|
|
|
if (!atomic_inc_not_zero(&exp->use))
|
|
|
|
continue;
|
2007-07-08 07:35:21 +02:00
|
|
|
cb->args[1] = (unsigned long)exp;
|
|
|
|
goto out;
|
|
|
|
}
|
2007-07-08 07:32:34 +02:00
|
|
|
}
|
2007-07-08 07:35:21 +02:00
|
|
|
if (cb->args[1]) {
|
|
|
|
cb->args[1] = 0;
|
|
|
|
goto restart;
|
2007-07-08 07:32:34 +02:00
|
|
|
}
|
|
|
|
}
|
2007-02-12 20:15:49 +01:00
|
|
|
out:
|
2008-01-31 13:38:19 +01:00
|
|
|
rcu_read_unlock();
|
2007-07-08 07:32:34 +02:00
|
|
|
if (last)
|
|
|
|
nf_ct_expect_put(last);
|
2006-01-05 21:19:05 +01:00
|
|
|
|
|
|
|
return skb->len;
|
|
|
|
}
|
|
|
|
|
2013-03-18 00:21:36 +01:00
|
|
|
static int
|
|
|
|
ctnetlink_exp_ct_dump_table(struct sk_buff *skb, struct netlink_callback *cb)
|
|
|
|
{
|
|
|
|
struct nf_conntrack_expect *exp, *last;
|
|
|
|
struct nfgenmsg *nfmsg = nlmsg_data(cb->nlh);
|
|
|
|
struct nf_conn *ct = cb->data;
|
|
|
|
struct nf_conn_help *help = nfct_help(ct);
|
|
|
|
u_int8_t l3proto = nfmsg->nfgen_family;
|
|
|
|
|
|
|
|
if (cb->args[0])
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
rcu_read_lock();
|
|
|
|
last = (struct nf_conntrack_expect *)cb->args[1];
|
|
|
|
restart:
|
|
|
|
hlist_for_each_entry(exp, &help->expectations, lnode) {
|
|
|
|
if (l3proto && exp->tuple.src.l3num != l3proto)
|
|
|
|
continue;
|
|
|
|
if (cb->args[1]) {
|
|
|
|
if (exp != last)
|
|
|
|
continue;
|
|
|
|
cb->args[1] = 0;
|
|
|
|
}
|
|
|
|
if (ctnetlink_exp_fill_info(skb, NETLINK_CB(cb->skb).portid,
|
|
|
|
cb->nlh->nlmsg_seq,
|
|
|
|
IPCTNL_MSG_EXP_NEW,
|
|
|
|
exp) < 0) {
|
|
|
|
if (!atomic_inc_not_zero(&exp->use))
|
|
|
|
continue;
|
|
|
|
cb->args[1] = (unsigned long)exp;
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
if (cb->args[1]) {
|
|
|
|
cb->args[1] = 0;
|
|
|
|
goto restart;
|
|
|
|
}
|
|
|
|
cb->args[0] = 1;
|
|
|
|
out:
|
|
|
|
rcu_read_unlock();
|
|
|
|
if (last)
|
|
|
|
nf_ct_expect_put(last);
|
|
|
|
|
|
|
|
return skb->len;
|
|
|
|
}
|
|
|
|
|
2015-12-15 18:41:56 +01:00
|
|
|
static int ctnetlink_dump_exp_ct(struct net *net, struct sock *ctnl,
|
|
|
|
struct sk_buff *skb,
|
2013-03-18 00:21:36 +01:00
|
|
|
const struct nlmsghdr *nlh,
|
|
|
|
const struct nlattr * const cda[])
|
|
|
|
{
|
|
|
|
int err;
|
|
|
|
struct nfgenmsg *nfmsg = nlmsg_data(nlh);
|
|
|
|
u_int8_t u3 = nfmsg->nfgen_family;
|
|
|
|
struct nf_conntrack_tuple tuple;
|
|
|
|
struct nf_conntrack_tuple_hash *h;
|
|
|
|
struct nf_conn *ct;
|
2015-08-08 21:40:01 +02:00
|
|
|
struct nf_conntrack_zone zone;
|
2013-03-18 00:21:36 +01:00
|
|
|
struct netlink_dump_control c = {
|
|
|
|
.dump = ctnetlink_exp_ct_dump_table,
|
|
|
|
.done = ctnetlink_exp_done,
|
|
|
|
};
|
|
|
|
|
2015-08-14 16:03:39 +02:00
|
|
|
err = ctnetlink_parse_tuple(cda, &tuple, CTA_EXPECT_MASTER,
|
|
|
|
u3, NULL);
|
2013-03-18 00:21:36 +01:00
|
|
|
if (err < 0)
|
|
|
|
return err;
|
|
|
|
|
2015-08-08 21:40:01 +02:00
|
|
|
err = ctnetlink_parse_zone(cda[CTA_EXPECT_ZONE], &zone);
|
|
|
|
if (err < 0)
|
|
|
|
return err;
|
2013-03-18 00:21:36 +01:00
|
|
|
|
2015-08-08 21:40:01 +02:00
|
|
|
h = nf_conntrack_find_get(net, &zone, &tuple);
|
2013-03-18 00:21:36 +01:00
|
|
|
if (!h)
|
|
|
|
return -ENOENT;
|
|
|
|
|
|
|
|
ct = nf_ct_tuplehash_to_ctrack(h);
|
2017-04-02 12:01:33 +02:00
|
|
|
/* No expectation linked to this connection tracking. */
|
|
|
|
if (!nfct_help(ct)) {
|
|
|
|
nf_ct_put(ct);
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2013-03-18 00:21:36 +01:00
|
|
|
c.data = ct;
|
|
|
|
|
|
|
|
err = netlink_dump_start(ctnl, skb, nlh, &c);
|
|
|
|
nf_ct_put(ct);
|
|
|
|
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
2015-12-15 18:41:56 +01:00
|
|
|
static int ctnetlink_get_expect(struct net *net, struct sock *ctnl,
|
|
|
|
struct sk_buff *skb, const struct nlmsghdr *nlh,
|
|
|
|
const struct nlattr * const cda[])
|
2006-01-05 21:19:05 +01:00
|
|
|
{
|
|
|
|
struct nf_conntrack_tuple tuple;
|
|
|
|
struct nf_conntrack_expect *exp;
|
|
|
|
struct sk_buff *skb2;
|
2009-06-02 20:07:39 +02:00
|
|
|
struct nfgenmsg *nfmsg = nlmsg_data(nlh);
|
2006-01-05 21:19:05 +01:00
|
|
|
u_int8_t u3 = nfmsg->nfgen_family;
|
2015-08-08 21:40:01 +02:00
|
|
|
struct nf_conntrack_zone zone;
|
2010-02-15 18:14:57 +01:00
|
|
|
int err;
|
2006-01-05 21:19:05 +01:00
|
|
|
|
2011-01-18 21:40:38 +01:00
|
|
|
if (nlh->nlmsg_flags & NLM_F_DUMP) {
|
2013-03-18 00:21:36 +01:00
|
|
|
if (cda[CTA_EXPECT_MASTER])
|
2015-12-15 18:41:56 +01:00
|
|
|
return ctnetlink_dump_exp_ct(net, ctnl, skb, nlh, cda);
|
2013-03-18 00:21:36 +01:00
|
|
|
else {
|
|
|
|
struct netlink_dump_control c = {
|
|
|
|
.dump = ctnetlink_exp_dump_table,
|
|
|
|
.done = ctnetlink_exp_done,
|
|
|
|
};
|
|
|
|
return netlink_dump_start(ctnl, skb, nlh, &c);
|
|
|
|
}
|
2006-01-05 21:19:05 +01:00
|
|
|
}
|
|
|
|
|
2010-02-15 18:14:57 +01:00
|
|
|
err = ctnetlink_parse_zone(cda[CTA_EXPECT_ZONE], &zone);
|
|
|
|
if (err < 0)
|
|
|
|
return err;
|
|
|
|
|
2011-12-14 12:45:22 +01:00
|
|
|
if (cda[CTA_EXPECT_TUPLE])
|
2015-08-14 16:03:39 +02:00
|
|
|
err = ctnetlink_parse_tuple(cda, &tuple, CTA_EXPECT_TUPLE,
|
|
|
|
u3, NULL);
|
2011-12-14 12:45:22 +01:00
|
|
|
else if (cda[CTA_EXPECT_MASTER])
|
netfilter: nf_conntrack: add direction support for zones
This work adds a direction parameter to netfilter zones, so identity
separation can be performed only in original/reply or both directions
(default). This basically opens up the possibility of doing NAT with
conflicting IP address/port tuples from multiple, isolated tenants
on a host (e.g. from a netns) without requiring each tenant to NAT
twice resp. to use its own dedicated IP address to SNAT to, meaning
overlapping tuples can be made unique with the zone identifier in
original direction, where the NAT engine will then allocate a unique
tuple in the commonly shared default zone for the reply direction.
In some restricted, local DNAT cases, also port redirection could be
used for making the reply traffic unique w/o requiring SNAT.
The consensus we've reached and discussed at NFWS and since the initial
implementation [1] was to directly integrate the direction meta data
into the existing zones infrastructure, as opposed to the ct->mark
approach we proposed initially.
As we pass the nf_conntrack_zone object directly around, we don't have
to touch all call-sites, but only those, that contain equality checks
of zones. Thus, based on the current direction (original or reply),
we either return the actual id, or the default NF_CT_DEFAULT_ZONE_ID.
CT expectations are direction-agnostic entities when expectations are
being compared among themselves, so we can only use the identifier
in this case.
Note that zone identifiers can not be included into the hash mix
anymore as they don't contain a "stable" value that would be equal
for both directions at all times, f.e. if only zone->id would
unconditionally be xor'ed into the table slot hash, then replies won't
find the corresponding conntracking entry anymore.
If no particular direction is specified when configuring zones, the
behaviour is exactly as we expect currently (both directions).
Support has been added for the CT netlink interface as well as the
x_tables raw CT target, which both already offer existing interfaces
to user space for the configuration of zones.
Below a minimal, simplified collision example (script in [2]) with
netperf sessions:
+--- tenant-1 ---+ mark := 1
| netperf |--+
+----------------+ | CT zone := mark [ORIGINAL]
[ip,sport] := X +--------------+ +--- gateway ---+
| mark routing |--| SNAT |-- ... +
+--------------+ +---------------+ |
+--- tenant-2 ---+ | ~~~|~~~
| netperf |--+ +-----------+ |
+----------------+ mark := 2 | netserver |------ ... +
[ip,sport] := X +-----------+
[ip,port] := Y
On the gateway netns, example:
iptables -t raw -A PREROUTING -j CT --zone mark --zone-dir ORIGINAL
iptables -t nat -A POSTROUTING -o <dev> -j SNAT --to-source <ip> --random-fully
iptables -t mangle -A PREROUTING -m conntrack --ctdir ORIGINAL -j CONNMARK --save-mark
iptables -t mangle -A POSTROUTING -m conntrack --ctdir REPLY -j CONNMARK --restore-mark
conntrack dump from gateway netns:
netperf -H 10.1.1.2 -t TCP_STREAM -l60 -p12865,5555 from each tenant netns
tcp 6 431995 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=5555 dport=12865 zone-orig=1
src=10.1.1.2 dst=10.1.1.1 sport=12865 dport=1024
[ASSURED] mark=1 secctx=system_u:object_r:unlabeled_t:s0 use=1
tcp 6 431994 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=5555 dport=12865 zone-orig=2
src=10.1.1.2 dst=10.1.1.1 sport=12865 dport=5555
[ASSURED] mark=2 secctx=system_u:object_r:unlabeled_t:s0 use=1
tcp 6 299 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=39438 dport=33768 zone-orig=1
src=10.1.1.2 dst=10.1.1.1 sport=33768 dport=39438
[ASSURED] mark=1 secctx=system_u:object_r:unlabeled_t:s0 use=1
tcp 6 300 ESTABLISHED src=40.1.1.1 dst=10.1.1.2 sport=32889 dport=40206 zone-orig=2
src=10.1.1.2 dst=10.1.1.1 sport=40206 dport=32889
[ASSURED] mark=2 secctx=system_u:object_r:unlabeled_t:s0 use=2
Taking this further, test script in [2] creates 200 tenants and runs
original-tuple colliding netperf sessions each. A conntrack -L dump in
the gateway netns also confirms 200 overlapping entries, all in ESTABLISHED
state as expected.
I also did run various other tests with some permutations of the script,
to mention some: SNAT in random/random-fully/persistent mode, no zones (no
overlaps), static zones (original, reply, both directions), etc.
[1] http://thread.gmane.org/gmane.comp.security.firewalls.netfilter.devel/57412/
[2] https://paste.fedoraproject.org/242835/65657871/
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-08-14 16:03:39 +02:00
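The direction rule described in the commit message above (return the actual id for a direction the zone covers, otherwise fall back to the default zone) can be sketched in plain userspace C. This is only an illustration under stated assumptions: `struct ct_zone`, `zone_id_for_dir`, `zone_equal`, and the `ZONE_DIR_*` / `DEFAULT_ZONE_ID` constants are hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins; names only loosely follow the commit text. */
enum ct_dir { CT_DIR_ORIGINAL = 0, CT_DIR_REPLY = 1 };

#define ZONE_DIR_ORIG   (1u << CT_DIR_ORIGINAL)
#define ZONE_DIR_REPL   (1u << CT_DIR_REPLY)
#define ZONE_DIR_BOTH   (ZONE_DIR_ORIG | ZONE_DIR_REPL)
#define DEFAULT_ZONE_ID 0

struct ct_zone {
	uint16_t id;   /* zone identifier, e.g. taken from the packet mark */
	uint8_t  dir;  /* bitmask of directions the zone applies to */
};

/* Return the zone id if the zone covers 'dir', else the default zone id. */
static uint16_t zone_id_for_dir(const struct ct_zone *zone, enum ct_dir dir)
{
	assert(zone != NULL);
	return (zone->dir & (1u << dir)) ? zone->id : DEFAULT_ZONE_ID;
}

/* Equality check as seen from one direction only. */
static int zone_equal(const struct ct_zone *a, const struct ct_zone *b,
		      enum ct_dir dir)
{
	return zone_id_for_dir(a, dir) == zone_id_for_dir(b, dir);
}
```

With two tenants configured for the original direction only, their zones differ for original-direction lookups but compare equal (both default) for replies, which is why the reply tuple has to be made unique by NAT.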
|
|
|
err = ctnetlink_parse_tuple(cda, &tuple, CTA_EXPECT_MASTER,
|
|
|
|
u3, NULL);
|
2006-01-05 21:19:05 +01:00
|
|
|
else
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
if (err < 0)
|
|
|
|
return err;
|
|
|
|
|
2015-08-08 21:40:01 +02:00
|
|
|
exp = nf_ct_expect_find_get(net, &zone, &tuple);
|
2006-01-05 21:19:05 +01:00
|
|
|
if (!exp)
|
|
|
|
return -ENOENT;
|
|
|
|
|
2007-09-28 23:37:03 +02:00
|
|
|
if (cda[CTA_EXPECT_ID]) {
|
2007-12-18 07:29:45 +01:00
|
|
|
__be32 id = nla_get_be32(cda[CTA_EXPECT_ID]);
|
2007-09-28 23:41:50 +02:00
|
|
|
if (ntohl(id) != (u32)(unsigned long)exp) {
|
2007-07-08 07:30:49 +02:00
|
|
|
nf_ct_expect_put(exp);
|
2006-01-05 21:19:05 +01:00
|
|
|
return -ENOENT;
|
|
|
|
}
|
2007-02-12 20:15:49 +01:00
|
|
|
}
|
2006-01-05 21:19:05 +01:00
|
|
|
|
|
|
|
err = -ENOMEM;
|
2009-06-02 20:07:39 +02:00
|
|
|
skb2 = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
|
2011-12-24 19:03:46 +01:00
|
|
|
if (skb2 == NULL) {
|
|
|
|
nf_ct_expect_put(exp);
|
2006-01-05 21:19:05 +01:00
|
|
|
goto out;
|
2011-12-24 19:03:46 +01:00
|
|
|
}
|
2006-11-27 18:25:58 +01:00
|
|
|
|
2008-11-17 16:00:40 +01:00
|
|
|
rcu_read_lock();
|
2012-09-07 22:12:54 +02:00
|
|
|
err = ctnetlink_exp_fill_info(skb2, NETLINK_CB(skb).portid,
|
2009-06-02 20:03:34 +02:00
|
|
|
nlh->nlmsg_seq, IPCTNL_MSG_EXP_NEW, exp);
|
2008-11-17 16:00:40 +01:00
|
|
|
rcu_read_unlock();
|
2011-12-24 19:03:46 +01:00
|
|
|
nf_ct_expect_put(exp);
|
2006-01-05 21:19:05 +01:00
|
|
|
if (err <= 0)
|
|
|
|
goto free;
|
|
|
|
|
2012-09-07 22:12:54 +02:00
|
|
|
err = netlink_unicast(ctnl, skb2, NETLINK_CB(skb).portid, MSG_DONTWAIT);
|
2011-12-24 19:03:46 +01:00
|
|
|
if (err < 0)
|
|
|
|
goto out;
|
2006-01-05 21:19:05 +01:00
|
|
|
|
2011-12-24 19:03:46 +01:00
|
|
|
return 0;
|
2006-01-05 21:19:05 +01:00
|
|
|
|
|
|
|
free:
|
|
|
|
kfree_skb(skb2);
|
|
|
|
out:
|
2011-12-24 19:03:46 +01:00
|
|
|
/* this avoids a loop in nfnetlink. */
|
|
|
|
return err == -EAGAIN ? -ENOBUFS : err;
|
2006-01-05 21:19:05 +01:00
|
|
|
}
|
|
|
|
|
2015-12-15 18:41:56 +01:00
|
|
|
static int ctnetlink_del_expect(struct net *net, struct sock *ctnl,
|
|
|
|
struct sk_buff *skb, const struct nlmsghdr *nlh,
|
|
|
|
const struct nlattr * const cda[])
|
2006-01-05 21:19:05 +01:00
|
|
|
{
|
2007-07-08 07:35:21 +02:00
|
|
|
struct nf_conntrack_expect *exp;
|
2006-01-05 21:19:05 +01:00
|
|
|
struct nf_conntrack_tuple tuple;
|
2009-06-02 20:07:39 +02:00
|
|
|
struct nfgenmsg *nfmsg = nlmsg_data(nlh);
|
hlist: drop the node parameter from iterators
I'm not sure why, but the hlist for each entry iterators were conceived
list_for_each_entry(pos, head, member)
The hlist ones were greedy and wanted an extra parameter:
hlist_for_each_entry(tpos, pos, head, member)
Why did they need an extra pos parameter? I'm not quite sure. Not only
they don't really need it, it also prevents the iterator from looking
exactly like the list iterator, which is unfortunate.
Besides the semantic patch, there was some manual work required:
- Fix up the actual hlist iterators in linux/list.h
- Fix up the declaration of other iterators based on the hlist ones.
- A very small amount of places were using the 'node' parameter, this
was modified to use 'obj->member' instead.
- Coccinelle didn't handle the hlist_for_each_entry_safe iterator
properly, so those had to be fixed up manually.
The semantic patch which is mostly the work of Peter Senna Tschudin is here:
@@
iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;
type T;
expression a,c,d,e;
identifier b;
statement S;
@@
-T b;
<+... when != b
(
hlist_for_each_entry(a,
- b,
c, d) S
|
hlist_for_each_entry_continue(a,
- b,
c) S
|
hlist_for_each_entry_from(a,
- b,
c) S
|
hlist_for_each_entry_rcu(a,
- b,
c, d) S
|
hlist_for_each_entry_rcu_bh(a,
- b,
c, d) S
|
hlist_for_each_entry_continue_rcu_bh(a,
- b,
c) S
|
for_each_busy_worker(a, c,
- b,
d) S
|
ax25_uid_for_each(a,
- b,
c) S
|
ax25_for_each(a,
- b,
c) S
|
inet_bind_bucket_for_each(a,
- b,
c) S
|
sctp_for_each_hentry(a,
- b,
c) S
|
sk_for_each(a,
- b,
c) S
|
sk_for_each_rcu(a,
- b,
c) S
|
sk_for_each_from
-(a, b)
+(a)
S
+ sk_for_each_from(a) S
|
sk_for_each_safe(a,
- b,
c, d) S
|
sk_for_each_bound(a,
- b,
c) S
|
hlist_for_each_entry_safe(a,
- b,
c, d, e) S
|
hlist_for_each_entry_continue_rcu(a,
- b,
c) S
|
nr_neigh_for_each(a,
- b,
c) S
|
nr_neigh_for_each_safe(a,
- b,
c, d) S
|
nr_node_for_each(a,
- b,
c) S
|
nr_node_for_each_safe(a,
- b,
c, d) S
|
- for_each_gfn_sp(a, c, d, b) S
+ for_each_gfn_sp(a, c, d) S
|
- for_each_gfn_indirect_valid_sp(a, c, d, b) S
+ for_each_gfn_indirect_valid_sp(a, c, d) S
|
for_each_host(a,
- b,
c) S
|
for_each_host_safe(a,
- b,
c, d) S
|
for_each_mesh_entry(a,
- b,
c, d) S
)
...+>
[akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
[akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
[akpm@linux-foundation.org: checkpatch fixes]
[akpm@linux-foundation.org: fix warnings]
[akpm@linux-foudnation.org: redo intrusive kvm changes]
Tested-by: Peter Senna Tschudin <peter.senna@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-28 02:06:00 +01:00
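To illustrate the iterator shape the commit message above describes (entry cursor, a temporary for the next node, head, member, and no separate node cursor), here is a minimal userspace sketch. All names (`struct hnode`, `struct hhead`, `for_each_entry_safe`, `struct expect`, `expect_flush`) are hypothetical; the macro only imitates the post-change calling convention and relies on the GCC/Clang `__typeof__` extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical minimal hash-list node and head. */
struct hnode { struct hnode *next, **pprev; };
struct hhead { struct hnode *first; };

#define entry_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))
#define entry_or_null(ptr, type, member) \
	((ptr) ? entry_of(ptr, type, member) : NULL)

/* Safe iteration: 'tmp' caches the next node so 'pos' may be freed. */
#define for_each_entry_safe(pos, tmp, head, member)                        \
	for (pos = entry_or_null((head)->first, __typeof__(*pos), member); \
	     pos && ((tmp = pos->member.next), 1);                         \
	     pos = entry_or_null(tmp, __typeof__(*pos), member))

static void hnode_add_head(struct hnode *n, struct hhead *h)
{
	n->next = h->first;
	if (h->first)
		h->first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

static void hnode_del(struct hnode *n)
{
	*n->pprev = n->next;
	if (n->next)
		n->next->pprev = n->pprev;
}

struct expect { int helper_id; struct hnode hnode; };

static int expect_add(struct hhead *h, int helper_id)
{
	struct expect *exp = malloc(sizeof(*exp));

	if (!exp)
		return -1;
	exp->helper_id = helper_id;
	hnode_add_head(&exp->hnode, h);
	return 0;
}

/* Remove entries for one helper, or every entry when helper_id < 0. */
static unsigned int expect_flush(struct hhead *h, int helper_id)
{
	struct expect *exp;
	struct hnode *next;
	unsigned int removed = 0;

	assert(h != NULL);
	for_each_entry_safe(exp, next, h, hnode) {
		if (helper_id >= 0 && exp->helper_id != helper_id)
			continue;
		hnode_del(&exp->hnode);
		free(exp);
		removed++;
	}
	return removed;
}
```

The loop body only ever names the entry pointer and the `next` temporary, which matches how `exp` and `next` are used by the expectation-deletion loops in this file.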
|
|
|
struct hlist_node *next;
|
2006-01-05 21:19:05 +01:00
|
|
|
u_int8_t u3 = nfmsg->nfgen_family;
|
2015-08-08 21:40:01 +02:00
|
|
|
struct nf_conntrack_zone zone;
|
2007-07-08 07:35:21 +02:00
|
|
|
unsigned int i;
|
2006-01-05 21:19:05 +01:00
|
|
|
int err;
|
|
|
|
|
2007-09-28 23:37:03 +02:00
|
|
|
if (cda[CTA_EXPECT_TUPLE]) {
|
2006-01-05 21:19:05 +01:00
|
|
|
/* delete a single expect by tuple */
|
2010-02-15 18:14:57 +01:00
|
|
|
err = ctnetlink_parse_zone(cda[CTA_EXPECT_ZONE], &zone);
|
|
|
|
if (err < 0)
|
|
|
|
return err;
|
|
|
|
|
netfilter: nf_conntrack: add direction support for zones
2015-08-14 16:03:39 +02:00
|
|
|
err = ctnetlink_parse_tuple(cda, &tuple, CTA_EXPECT_TUPLE,
|
|
|
|
u3, NULL);
|
2006-01-05 21:19:05 +01:00
|
|
|
if (err < 0)
|
|
|
|
return err;
|
|
|
|
|
|
|
|
/* bump usage count to 2 */
|
2015-08-08 21:40:01 +02:00
|
|
|
exp = nf_ct_expect_find_get(net, &zone, &tuple);
|
2006-01-05 21:19:05 +01:00
|
|
|
if (!exp)
|
|
|
|
return -ENOENT;
|
|
|
|
|
2007-09-28 23:37:03 +02:00
|
|
|
if (cda[CTA_EXPECT_ID]) {
|
2007-12-18 07:29:45 +01:00
|
|
|
__be32 id = nla_get_be32(cda[CTA_EXPECT_ID]);
|
2007-09-28 23:41:50 +02:00
|
|
|
if (ntohl(id) != (u32)(unsigned long)exp) {
|
2007-07-08 07:30:49 +02:00
|
|
|
nf_ct_expect_put(exp);
|
2006-01-05 21:19:05 +01:00
|
|
|
return -ENOENT;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/* after list removal, usage count == 1 */
|
2014-03-03 14:46:01 +01:00
|
|
|
spin_lock_bh(&nf_conntrack_expect_lock);
|
2010-10-19 10:19:06 +02:00
|
|
|
if (del_timer(&exp->timeout)) {
|
2012-09-07 22:12:54 +02:00
|
|
|
nf_ct_unlink_expect_report(exp, NETLINK_CB(skb).portid,
|
2010-10-19 10:19:06 +02:00
|
|
|
nlmsg_report(nlh));
|
|
|
|
nf_ct_expect_put(exp);
|
|
|
|
}
|
2014-03-03 14:46:01 +01:00
|
|
|
spin_unlock_bh(&nf_conntrack_expect_lock);
|
2007-02-12 20:15:49 +01:00
|
|
|
/* have to put what we 'get' above.
|
2006-01-05 21:19:05 +01:00
|
|
|
* after this line usage count == 0 */
|
2007-07-08 07:30:49 +02:00
|
|
|
nf_ct_expect_put(exp);
|
2007-09-28 23:37:03 +02:00
|
|
|
} else if (cda[CTA_EXPECT_HELP_NAME]) {
|
|
|
|
char *name = nla_data(cda[CTA_EXPECT_HELP_NAME]);
|
2007-07-08 07:35:21 +02:00
|
|
|
struct nf_conn_help *m_help;
|
2006-01-05 21:19:05 +01:00
|
|
|
|
|
|
|
/* delete all expectations for this helper */
|
2014-03-03 14:46:01 +01:00
|
|
|
spin_lock_bh(&nf_conntrack_expect_lock);
|
2007-07-08 07:35:21 +02:00
|
|
|
for (i = 0; i < nf_ct_expect_hsize; i++) {
|
hlist: drop the node parameter from iterators
2013-02-28 02:06:00 +01:00
|
|
|
hlist_for_each_entry_safe(exp, next,
|
2016-05-06 00:51:49 +02:00
|
|
|
&nf_ct_expect_hash[i],
|
2007-07-08 07:35:21 +02:00
|
|
|
hnode) {
|
2016-05-06 00:51:47 +02:00
|
|
|
|
|
|
|
if (!net_eq(nf_ct_exp_net(exp), net))
|
|
|
|
continue;
|
|
|
|
|
2007-07-08 07:35:21 +02:00
|
|
|
m_help = nfct_help(exp->master);
|
2010-02-03 13:41:29 +01:00
|
|
|
if (!strcmp(m_help->helper->name, name) &&
|
|
|
|
del_timer(&exp->timeout)) {
|
2010-10-19 10:19:06 +02:00
|
|
|
nf_ct_unlink_expect_report(exp,
|
2012-09-07 22:12:54 +02:00
|
|
|
NETLINK_CB(skb).portid,
|
2010-10-19 10:19:06 +02:00
|
|
|
nlmsg_report(nlh));
|
2007-07-08 07:35:21 +02:00
|
|
|
nf_ct_expect_put(exp);
|
|
|
|
}
|
2006-01-05 21:19:05 +01:00
|
|
|
}
|
|
|
|
}
|
2014-03-03 14:46:01 +01:00
|
|
|
spin_unlock_bh(&nf_conntrack_expect_lock);
|
2006-01-05 21:19:05 +01:00
|
|
|
} else {
|
|
|
|
/* This basically means we have to flush everything*/
|
2014-03-03 14:46:01 +01:00
|
|
|
spin_lock_bh(&nf_conntrack_expect_lock);
|
2007-07-08 07:35:21 +02:00
|
|
|
for (i = 0; i < nf_ct_expect_hsize; i++) {
|
hlist: drop the node parameter from iterators
2013-02-28 02:06:00 +01:00
|
|
|
hlist_for_each_entry_safe(exp, next,
|
2016-05-06 00:51:49 +02:00
|
|
|
&nf_ct_expect_hash[i],
|
2007-07-08 07:35:21 +02:00
|
|
|
hnode) {
|
2016-05-06 00:51:47 +02:00
|
|
|
|
|
|
|
if (!net_eq(nf_ct_exp_net(exp), net))
|
|
|
|
continue;
|
|
|
|
|
2007-07-08 07:35:21 +02:00
|
|
|
if (del_timer(&exp->timeout)) {
|
2010-10-19 10:19:06 +02:00
|
|
|
nf_ct_unlink_expect_report(exp,
|
2012-09-07 22:12:54 +02:00
|
|
|
NETLINK_CB(skb).portid,
|
2010-10-19 10:19:06 +02:00
|
|
|
nlmsg_report(nlh));
|
2007-07-08 07:35:21 +02:00
|
|
|
nf_ct_expect_put(exp);
|
|
|
|
}
|
2006-01-05 21:19:05 +01:00
|
|
|
}
|
|
|
|
}
|
2014-03-03 14:46:01 +01:00
|
|
|
spin_unlock_bh(&nf_conntrack_expect_lock);
|
2006-01-05 21:19:05 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
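The usage-count comments in the single-expectation branch of ctnetlink_del_expect (2 after the lookup, 1 after list removal, 0 after the final put) can be traced with a small single-threaded sketch. Everything here is a hypothetical stand-in: `struct obj`, `obj_find_get`, `obj_unlink`, `obj_put`, and the `freed` test hook are not kernel APIs, and the real code uses atomic reference counts under a spinlock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical object holding one reference on behalf of the table. */
struct obj {
	int   use;     /* usage count */
	bool  linked;  /* still reachable through the table */
	bool *freed;   /* test hook: set when the object is released */
};

static struct obj *obj_new(bool *freed)
{
	struct obj *o = malloc(sizeof(*o));

	if (!o)
		return NULL;
	o->use = 1;        /* reference owned by the table */
	o->linked = true;
	o->freed = freed;
	return o;
}

/* Lookup: hand out an extra reference to the caller. */
static struct obj *obj_find_get(struct obj *o)
{
	if (!o->linked)
		return NULL;
	o->use++;
	return o;
}

/* Drop one reference; release the object when the count reaches zero. */
static void obj_put(struct obj *o)
{
	assert(o->use > 0);
	if (--o->use == 0) {
		*o->freed = true;
		free(o);
	}
}

/* Unlink from the table; returns true only for the caller that unlinked it. */
static bool obj_unlink(struct obj *o)
{
	if (!o->linked)
		return false;
	o->linked = false;
	return true;
}

/* Same order of operations as the delete-by-tuple branch. */
static void obj_delete(struct obj *o)
{
	struct obj *found = obj_find_get(o);   /* usage count == 2 */

	if (!found)
		return;
	if (obj_unlink(found))
		obj_put(found);                /* table reference gone: 1 */
	obj_put(found);                        /* lookup reference gone: 0 */
}
```

The point of the two puts is that the lookup reference keeps the object alive while it is being unlinked, so the final release happens only after the caller is done with it.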
|
|
|
|
static int
|
2009-08-25 16:07:58 +02:00
|
|
|
ctnetlink_change_expect(struct nf_conntrack_expect *x,
|
|
|
|
const struct nlattr * const cda[])
|
2006-01-05 21:19:05 +01:00
|
|
|
{
|
2012-05-02 16:39:24 +02:00
|
|
|
if (cda[CTA_EXPECT_TIMEOUT]) {
|
|
|
|
if (!del_timer(&x->timeout))
|
|
|
|
return -ETIME;
|
|
|
|
|
|
|
|
x->timeout.expires = jiffies +
|
|
|
|
ntohl(nla_get_be32(cda[CTA_EXPECT_TIMEOUT])) * HZ;
|
|
|
|
add_timer(&x->timeout);
|
|
|
|
}
|
|
|
|
return 0;
|
2006-01-05 21:19:05 +01:00
|
|
|
}
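The expiry computation in ctnetlink_change_expect reads a big-endian 32-bit timeout in seconds and converts it to an absolute tick value. A userspace sketch of that arithmetic follows, assuming a hypothetical `TICKS_PER_SEC` in place of the kernel's HZ and a caller-supplied `now_ticks` in place of jiffies; `expiry_ticks` is an illustrative name only.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>

#define TICKS_PER_SEC 250u  /* hypothetical stand-in for HZ */

/*
 * Convert a network-byte-order timeout (seconds) into an absolute
 * expiry expressed in ticks, relative to 'now_ticks'.
 */
static uint64_t expiry_ticks(uint64_t now_ticks, uint32_t timeout_be32)
{
	uint32_t secs = ntohl(timeout_be32);

	/* Widen before multiplying so large timeouts do not wrap at 32 bits. */
	return now_ticks + (uint64_t)secs * TICKS_PER_SEC;
}
```

The sketch widens to 64 bits before multiplying; it does not claim to reproduce the kernel's own integer widths or wraparound behaviour.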
|
|
|
|
|
2012-02-05 03:41:52 +01:00
|
|
|
static const struct nla_policy exp_nat_nla_policy[CTA_EXPECT_NAT_MAX+1] = {
|
|
|
|
[CTA_EXPECT_NAT_DIR] = { .type = NLA_U32 },
|
|
|
|
[CTA_EXPECT_NAT_TUPLE] = { .type = NLA_NESTED },
|
|
|
|
};
|
|
|
|
|
|
|
|
static int
|
|
|
|
ctnetlink_parse_expect_nat(const struct nlattr *attr,
|
|
|
|
struct nf_conntrack_expect *exp,
|
|
|
|
u_int8_t u3)
|
|
|
|
{
|
|
|
|
#ifdef CONFIG_NF_NAT_NEEDED
|
|
|
|
struct nlattr *tb[CTA_EXPECT_NAT_MAX+1];
|
|
|
|
struct nf_conntrack_tuple nat_tuple = {};
|
|
|
|
int err;
|
|
|
|
|
2013-06-12 17:54:51 +02:00
|
|
|
err = nla_parse_nested(tb, CTA_EXPECT_NAT_MAX, attr, exp_nat_nla_policy);
|
|
|
|
if (err < 0)
|
|
|
|
return err;
|
2012-02-05 03:41:52 +01:00
|
|
|
|
|
|
|
if (!tb[CTA_EXPECT_NAT_DIR] || !tb[CTA_EXPECT_NAT_TUPLE])
|
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
err = ctnetlink_parse_tuple((const struct nlattr * const *)tb,
|
netfilter: nf_conntrack: add direction support for zones
2015-08-14 16:03:39 +02:00
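The direction rule described in the commit message can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the kernel code: the `struct zone` layout, the `ZONE_DIR_*` flag names, and the helpers `zone_id()` and `zone_match()` are hypothetical names chosen for illustration. Only `NF_CT_DEFAULT_ZONE_ID` and the rule "return the actual id for a configured direction, otherwise the default" are taken from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Default zone named in the commit message; the value 0 is assumed here. */
#define NF_CT_DEFAULT_ZONE_ID 0

/* Hypothetical direction indices and flag bits used by this sketch. */
enum ct_dir { DIR_ORIGINAL = 0, DIR_REPLY = 1 };
#define ZONE_DIR_ORIG (1u << DIR_ORIGINAL)
#define ZONE_DIR_REPL (1u << DIR_REPLY)
#define ZONE_DIR_BOTH (ZONE_DIR_ORIG | ZONE_DIR_REPL)

/* Hypothetical, simplified stand-in for a conntrack zone. */
struct zone {
	uint16_t id;	/* configured zone identifier */
	uint8_t dir;	/* directions in which the identifier applies */
};

/* Effective id for one direction: the real id if covered, else the default. */
static uint16_t zone_id(const struct zone *z, enum ct_dir dir)
{
	return (z->dir & (1u << dir)) ? z->id : NF_CT_DEFAULT_ZONE_ID;
}

/* Two zones are equal in a direction when their effective ids agree. */
static bool zone_match(const struct zone *a, const struct zone *b,
		       enum ct_dir dir)
{
	return zone_id(a, dir) == zone_id(b, dir);
}
```

With two tenants modelled as `{ .id = 1, .dir = ZONE_DIR_ORIG }` and `{ .id = 2, .dir = ZONE_DIR_ORIG }`, `zone_match()` is false for `DIR_ORIGINAL`, so the colliding original tuples stay separate, and true for `DIR_REPLY`, where both fall back to the shared default zone. That mirrors the gateway scenario in the commit message, and it also shows why a raw `id` cannot be mixed into a hash that both directions must agree on.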
				    &nat_tuple, CTA_EXPECT_NAT_TUPLE,
				    u3, NULL);
	if (err < 0)
		return err;

	exp->saved_addr = nat_tuple.src.u3;
	exp->saved_proto = nat_tuple.src.u;
	exp->dir = ntohl(nla_get_be32(tb[CTA_EXPECT_NAT_DIR]));

	return 0;
#else
	return -EOPNOTSUPP;
#endif
}

static struct nf_conntrack_expect *
ctnetlink_alloc_expect(const struct nlattr * const cda[], struct nf_conn *ct,
		       struct nf_conntrack_helper *helper,
		       struct nf_conntrack_tuple *tuple,
		       struct nf_conntrack_tuple *mask)
{
	u_int32_t class = 0;
	struct nf_conntrack_expect *exp;
	struct nf_conn_help *help;
	int err;

	if (cda[CTA_EXPECT_CLASS] && helper) {
		class = ntohl(nla_get_be32(cda[CTA_EXPECT_CLASS]));
		if (class > helper->expect_class_max)
			return ERR_PTR(-EINVAL);
	}
	exp = nf_ct_expect_alloc(ct);
	if (!exp)
		return ERR_PTR(-ENOMEM);

	help = nfct_help(ct);
	if (!help) {
		if (!cda[CTA_EXPECT_TIMEOUT]) {
			err = -EINVAL;
			goto err_out;
		}
		exp->timeout.expires =
		  jiffies + ntohl(nla_get_be32(cda[CTA_EXPECT_TIMEOUT])) * HZ;

		exp->flags = NF_CT_EXPECT_USERSPACE;
		if (cda[CTA_EXPECT_FLAGS]) {
			exp->flags |=
				ntohl(nla_get_be32(cda[CTA_EXPECT_FLAGS]));
		}
	} else {
		if (cda[CTA_EXPECT_FLAGS]) {
			exp->flags = ntohl(nla_get_be32(cda[CTA_EXPECT_FLAGS]));
			exp->flags &= ~NF_CT_EXPECT_USERSPACE;
		} else
			exp->flags = 0;
	}
	if (cda[CTA_EXPECT_FN]) {
		const char *name = nla_data(cda[CTA_EXPECT_FN]);
		struct nf_ct_helper_expectfn *expfn;

		expfn = nf_ct_helper_expectfn_find_by_name(name);
		if (expfn == NULL) {
			err = -EINVAL;
			goto err_out;
		}
		exp->expectfn = expfn->expectfn;
	} else
		exp->expectfn = NULL;

	exp->class = class;
	exp->master = ct;
	exp->helper = helper;
	exp->tuple = *tuple;
	exp->mask.src.u3 = mask->src.u3;
	exp->mask.src.u.all = mask->src.u.all;

	if (cda[CTA_EXPECT_NAT]) {
		err = ctnetlink_parse_expect_nat(cda[CTA_EXPECT_NAT],
						 exp, nf_ct_l3num(ct));
		if (err < 0)
			goto err_out;
	}
	return exp;
err_out:
	nf_ct_expect_put(exp);
	return ERR_PTR(err);
}

static int
ctnetlink_create_expect(struct net *net,
			const struct nf_conntrack_zone *zone,
			const struct nlattr * const cda[],
			u_int8_t u3, u32 portid, int report)
{
	struct nf_conntrack_tuple tuple, mask, master_tuple;
	struct nf_conntrack_tuple_hash *h = NULL;
	struct nf_conntrack_helper *helper = NULL;
	struct nf_conntrack_expect *exp;
	struct nf_conn *ct;
	int err;

	/* caller guarantees that those three CTA_EXPECT_* exist */
	err = ctnetlink_parse_tuple(cda, &tuple, CTA_EXPECT_TUPLE,
				    u3, NULL);
	if (err < 0)
		return err;
	err = ctnetlink_parse_tuple(cda, &mask, CTA_EXPECT_MASK,
				    u3, NULL);
	if (err < 0)
		return err;
	err = ctnetlink_parse_tuple(cda, &master_tuple, CTA_EXPECT_MASTER,
				    u3, NULL);
	if (err < 0)
		return err;

	/* Look for master conntrack of this expectation */
	h = nf_conntrack_find_get(net, zone, &master_tuple);
	if (!h)
		return -ENOENT;
	ct = nf_ct_tuplehash_to_ctrack(h);

	rcu_read_lock();
	if (cda[CTA_EXPECT_HELP_NAME]) {
		const char *helpname = nla_data(cda[CTA_EXPECT_HELP_NAME]);

		helper = __nf_conntrack_helper_find(helpname, u3,
						    nf_ct_protonum(ct));
		if (helper == NULL) {
			rcu_read_unlock();
#ifdef CONFIG_MODULES
			if (request_module("nfct-helper-%s", helpname) < 0) {
				err = -EOPNOTSUPP;
				goto err_ct;
			}
			rcu_read_lock();
			helper = __nf_conntrack_helper_find(helpname, u3,
							    nf_ct_protonum(ct));
			if (helper) {
				err = -EAGAIN;
				goto err_rcu;
			}
			rcu_read_unlock();
#endif
			err = -EOPNOTSUPP;
			goto err_ct;
		}
	}

	exp = ctnetlink_alloc_expect(cda, ct, helper, &tuple, &mask);
	if (IS_ERR(exp)) {
		err = PTR_ERR(exp);
		goto err_rcu;
	}

	err = nf_ct_expect_related_report(exp, portid, report);
	nf_ct_expect_put(exp);
err_rcu:
	rcu_read_unlock();
err_ct:
	nf_ct_put(ct);
	return err;
}

static int ctnetlink_new_expect(struct net *net, struct sock *ctnl,
				struct sk_buff *skb, const struct nlmsghdr *nlh,
				const struct nlattr * const cda[])
{
	struct nf_conntrack_tuple tuple;
	struct nf_conntrack_expect *exp;
	struct nfgenmsg *nfmsg = nlmsg_data(nlh);
	u_int8_t u3 = nfmsg->nfgen_family;
	struct nf_conntrack_zone zone;
	int err;

	if (!cda[CTA_EXPECT_TUPLE]
	    || !cda[CTA_EXPECT_MASK]
	    || !cda[CTA_EXPECT_MASTER])
		return -EINVAL;

	err = ctnetlink_parse_zone(cda[CTA_EXPECT_ZONE], &zone);
	if (err < 0)
		return err;

	err = ctnetlink_parse_tuple(cda, &tuple, CTA_EXPECT_TUPLE,
				    u3, NULL);
	if (err < 0)
		return err;

	spin_lock_bh(&nf_conntrack_expect_lock);
	exp = __nf_ct_expect_find(net, &zone, &tuple);
	if (!exp) {
		spin_unlock_bh(&nf_conntrack_expect_lock);
		err = -ENOENT;
		if (nlh->nlmsg_flags & NLM_F_CREATE) {
			err = ctnetlink_create_expect(net, &zone, cda, u3,
						      NETLINK_CB(skb).portid,
						      nlmsg_report(nlh));
		}
		return err;
	}

	err = -EEXIST;
	if (!(nlh->nlmsg_flags & NLM_F_EXCL))
		err = ctnetlink_change_expect(exp, cda);
	spin_unlock_bh(&nf_conntrack_expect_lock);

	return err;
}

static int
ctnetlink_exp_stat_fill_info(struct sk_buff *skb, u32 portid, u32 seq, int cpu,
			     const struct ip_conntrack_stat *st)
{
	struct nlmsghdr *nlh;
	struct nfgenmsg *nfmsg;
	unsigned int flags = portid ? NLM_F_MULTI : 0, event;

	event = (NFNL_SUBSYS_CTNETLINK << 8 | IPCTNL_MSG_EXP_GET_STATS_CPU);
	nlh = nlmsg_put(skb, portid, seq, event, sizeof(*nfmsg), flags);
	if (nlh == NULL)
		goto nlmsg_failure;

	nfmsg = nlmsg_data(nlh);
	nfmsg->nfgen_family = AF_UNSPEC;
	nfmsg->version = NFNETLINK_V0;
	nfmsg->res_id = htons(cpu);

	if (nla_put_be32(skb, CTA_STATS_EXP_NEW, htonl(st->expect_new)) ||
	    nla_put_be32(skb, CTA_STATS_EXP_CREATE, htonl(st->expect_create)) ||
	    nla_put_be32(skb, CTA_STATS_EXP_DELETE, htonl(st->expect_delete)))
		goto nla_put_failure;

	nlmsg_end(skb, nlh);
	return skb->len;

nla_put_failure:
nlmsg_failure:
	nlmsg_cancel(skb, nlh);
	return -1;
}

static int
ctnetlink_exp_stat_cpu_dump(struct sk_buff *skb, struct netlink_callback *cb)
{
	int cpu;
	struct net *net = sock_net(skb->sk);

	if (cb->args[0] == nr_cpu_ids)
		return 0;

	for (cpu = cb->args[0]; cpu < nr_cpu_ids; cpu++) {
		const struct ip_conntrack_stat *st;

		if (!cpu_possible(cpu))
			continue;

		st = per_cpu_ptr(net->ct.stat, cpu);
		if (ctnetlink_exp_stat_fill_info(skb, NETLINK_CB(cb->skb).portid,
						 cb->nlh->nlmsg_seq,
						 cpu, st) < 0)
			break;
	}
	cb->args[0] = cpu;

	return skb->len;
}

static int ctnetlink_stat_exp_cpu(struct net *net, struct sock *ctnl,
				  struct sk_buff *skb,
				  const struct nlmsghdr *nlh,
				  const struct nlattr * const cda[])
{
	if (nlh->nlmsg_flags & NLM_F_DUMP) {
		struct netlink_dump_control c = {
			.dump = ctnetlink_exp_stat_cpu_dump,
		};
		return netlink_dump_start(ctnl, skb, nlh, &c);
	}

	return 0;
}

#ifdef CONFIG_NF_CONNTRACK_EVENTS
static struct nf_ct_event_notifier ctnl_notifier = {
	.fcn = ctnetlink_conntrack_event,
};

static struct nf_exp_event_notifier ctnl_notifier_exp = {
	.fcn = ctnetlink_expect_event,
};
#endif

static const struct nfnl_callback ctnl_cb[IPCTNL_MSG_MAX] = {
	[IPCTNL_MSG_CT_NEW]		= { .call = ctnetlink_new_conntrack,
					    .attr_count = CTA_MAX,
					    .policy = ct_nla_policy },
	[IPCTNL_MSG_CT_GET]		= { .call = ctnetlink_get_conntrack,
					    .attr_count = CTA_MAX,
					    .policy = ct_nla_policy },
	[IPCTNL_MSG_CT_DELETE]		= { .call = ctnetlink_del_conntrack,
					    .attr_count = CTA_MAX,
					    .policy = ct_nla_policy },
	[IPCTNL_MSG_CT_GET_CTRZERO]	= { .call = ctnetlink_get_conntrack,
					    .attr_count = CTA_MAX,
					    .policy = ct_nla_policy },
	[IPCTNL_MSG_CT_GET_STATS_CPU]	= { .call = ctnetlink_stat_ct_cpu },
	[IPCTNL_MSG_CT_GET_STATS]	= { .call = ctnetlink_stat_ct },
	[IPCTNL_MSG_CT_GET_DYING]	= { .call = ctnetlink_get_ct_dying },
	[IPCTNL_MSG_CT_GET_UNCONFIRMED]	= { .call = ctnetlink_get_ct_unconfirmed },
};

static const struct nfnl_callback ctnl_exp_cb[IPCTNL_MSG_EXP_MAX] = {
	[IPCTNL_MSG_EXP_GET]		= { .call = ctnetlink_get_expect,
					    .attr_count = CTA_EXPECT_MAX,
					    .policy = exp_nla_policy },
	[IPCTNL_MSG_EXP_NEW]		= { .call = ctnetlink_new_expect,
					    .attr_count = CTA_EXPECT_MAX,
					    .policy = exp_nla_policy },
	[IPCTNL_MSG_EXP_DELETE]		= { .call = ctnetlink_del_expect,
					    .attr_count = CTA_EXPECT_MAX,
					    .policy = exp_nla_policy },
	[IPCTNL_MSG_EXP_GET_STATS_CPU]	= { .call = ctnetlink_stat_exp_cpu },
};

2007-09-28 23:15:45 +02:00
|
|
|
static const struct nfnetlink_subsystem ctnl_subsys = {
|
2006-01-05 21:19:05 +01:00
|
|
|
.name = "conntrack",
|
|
|
|
.subsys_id = NFNL_SUBSYS_CTNETLINK,
|
|
|
|
.cb_count = IPCTNL_MSG_MAX,
|
|
|
|
.cb = ctnl_cb,
|
|
|
|
};
|
|
|
|
|
2007-09-28 23:15:45 +02:00
|
|
|
static const struct nfnetlink_subsystem ctnl_exp_subsys = {
|
2006-01-05 21:19:05 +01:00
|
|
|
.name = "conntrack_expect",
|
|
|
|
.subsys_id = NFNL_SUBSYS_CTNETLINK_EXP,
|
|
|
|
.cb_count = IPCTNL_MSG_EXP_MAX,
|
|
|
|
.cb = ctnl_exp_cb,
|
|
|
|
};
|
|
|
|
|
2006-12-03 07:06:05 +01:00
|
|
|
MODULE_ALIAS("ip_conntrack_netlink");
|
2006-01-05 21:19:05 +01:00
|
|
|
MODULE_ALIAS_NFNL_SUBSYS(NFNL_SUBSYS_CTNETLINK);
|
2006-02-04 11:11:41 +01:00
|
|
|
MODULE_ALIAS_NFNL_SUBSYS(NFNL_SUBSYS_CTNETLINK_EXP);
|
2006-01-05 21:19:05 +01:00
|
|
|
|
2011-11-22 00:16:51 +01:00
|
|
|
static int __net_init ctnetlink_net_init(struct net *net)
|
|
|
|
{
|
|
|
|
#ifdef CONFIG_NF_CONNTRACK_EVENTS
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
ret = nf_conntrack_register_notifier(net, &ctnl_notifier);
|
|
|
|
if (ret < 0) {
|
|
|
|
pr_err("ctnetlink_init: cannot register notifier.\n");
|
|
|
|
goto err_out;
|
|
|
|
}
|
|
|
|
|
|
|
|
ret = nf_ct_expect_register_notifier(net, &ctnl_notifier_exp);
|
|
|
|
if (ret < 0) {
|
|
|
|
pr_err("ctnetlink_init: cannot register expect notifier.\n");
|
|
|
|
goto err_unreg_notifier;
|
|
|
|
}
|
|
|
|
#endif
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
#ifdef CONFIG_NF_CONNTRACK_EVENTS
|
|
|
|
err_unreg_notifier:
|
|
|
|
nf_conntrack_unregister_notifier(net, &ctnl_notifier);
|
|
|
|
err_out:
|
|
|
|
return ret;
|
|
|
|
#endif
|
|
|
|
}
|
|
|
|
|
|
|
|
static void ctnetlink_net_exit(struct net *net)
|
|
|
|
{
|
|
|
|
#ifdef CONFIG_NF_CONNTRACK_EVENTS
|
|
|
|
nf_ct_expect_unregister_notifier(net, &ctnl_notifier_exp);
|
|
|
|
nf_conntrack_unregister_notifier(net, &ctnl_notifier);
|
|
|
|
#endif
|
|
|
|
}
|
|
|
|
|
|
|
|
static void __net_exit ctnetlink_net_exit_batch(struct list_head *net_exit_list)
|
|
|
|
{
|
|
|
|
struct net *net;
|
|
|
|
|
|
|
|
list_for_each_entry(net, net_exit_list, exit_list)
|
|
|
|
ctnetlink_net_exit(net);
|
|
|
|
}
|
|
|
|
|
|
|
|
static struct pernet_operations ctnetlink_net_ops = {
|
|
|
|
.init = ctnetlink_net_init,
|
|
|
|
.exit_batch = ctnetlink_net_exit_batch,
|
|
|
|
};
|
|
|
|
|
2006-01-05 21:19:05 +01:00
|
|
|
static int __init ctnetlink_init(void)
|
|
|
|
{
|
|
|
|
int ret;
|
|
|
|
|
2010-05-13 15:02:08 +02:00
|
|
|
pr_info("ctnetlink v%s: registering with nfnetlink.\n", version);
|
2006-01-05 21:19:05 +01:00
|
|
|
ret = nfnetlink_subsys_register(&ctnl_subsys);
|
|
|
|
if (ret < 0) {
|
2010-05-13 15:02:08 +02:00
|
|
|
pr_err("ctnetlink_init: cannot register with nfnetlink.\n");
|
2006-01-05 21:19:05 +01:00
|
|
|
goto err_out;
|
|
|
|
}
|
|
|
|
|
|
|
|
ret = nfnetlink_subsys_register(&ctnl_exp_subsys);
|
|
|
|
if (ret < 0) {
|
2010-05-13 15:02:08 +02:00
|
|
|
pr_err("ctnetlink_init: cannot register exp with nfnetlink.\n");
|
2006-01-05 21:19:05 +01:00
|
|
|
goto err_unreg_subsys;
|
|
|
|
}
|
|
|
|
|
2012-08-29 08:49:16 +02:00
|
|
|
ret = register_pernet_subsys(&ctnetlink_net_ops);
|
|
|
|
if (ret < 0) {
|
2011-11-22 00:16:51 +01:00
|
|
|
pr_err("ctnetlink_init: cannot register pernet operations\n");
|
2006-01-05 21:19:05 +01:00
|
|
|
goto err_unreg_exp_subsys;
|
|
|
|
}
|
2015-10-05 04:48:47 +02:00
|
|
|
#ifdef CONFIG_NETFILTER_NETLINK_GLUE_CT
|
2012-06-07 12:13:39 +02:00
|
|
|
/* setup interaction between nf_queue and nf_conntrack_netlink. */
|
2015-10-05 04:47:13 +02:00
|
|
|
RCU_INIT_POINTER(nfnl_ct_hook, &ctnetlink_glue_hook);
|
2012-06-07 12:13:39 +02:00
|
|
|
#endif
|
2006-01-05 21:19:05 +01:00
|
|
|
return 0;
|
|
|
|
|
|
|
|
err_unreg_exp_subsys:
|
|
|
|
nfnetlink_subsys_unregister(&ctnl_exp_subsys);
|
|
|
|
err_unreg_subsys:
|
|
|
|
nfnetlink_subsys_unregister(&ctnl_subsys);
|
|
|
|
err_out:
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void __exit ctnetlink_exit(void)
|
|
|
|
{
|
2010-05-13 15:02:08 +02:00
|
|
|
pr_info("ctnetlink: unregistering from nfnetlink.\n");
|
2006-01-05 21:19:05 +01:00
|
|
|
|
2011-11-22 00:16:51 +01:00
|
|
|
unregister_pernet_subsys(&ctnetlink_net_ops);
|
2006-01-05 21:19:05 +01:00
|
|
|
nfnetlink_subsys_unregister(&ctnl_exp_subsys);
|
|
|
|
nfnetlink_subsys_unregister(&ctnl_subsys);
|
2015-10-05 04:48:47 +02:00
|
|
|
#ifdef CONFIG_NETFILTER_NETLINK_GLUE_CT
|
2015-10-05 04:47:13 +02:00
|
|
|
RCU_INIT_POINTER(nfnl_ct_hook, NULL);
|
2012-06-07 12:13:39 +02:00
|
|
|
#endif
|
netfilter: invoke synchronize_rcu() after setting the _hook_ to NULL
Otherwise, another CPU may dereference the stale pointer. For example:
CPU0                    CPU1
-                       rcu_read_lock();
-                       pfunc = _hook_;
_hook_ = NULL;          -
mod unload              -
-                       pfunc(); // invalid, panic
-                       rcu_read_unlock();
So we must call synchronize_rcu() to wait for the RCU readers to finish.
Also note that in nf_nat_snmp_basic_fini, synchronize_rcu() will be invoked
by the later nf_conntrack_helper_unregister, but I'm inclined to add an
explicit synchronize_rcu() after setting nf_nat_snmp_hook to NULL. Depending
on such obscure assumptions is not a good idea.
Last, in nfnetlink_cttimeout, we use kfree_rcu() to free the timeout object,
so invoking rcu_barrier() in cttimeout_exit is not necessary at all;
remove it too.
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-03-25 01:53:12 +01:00
|
|
|
synchronize_rcu();
|
2006-01-05 21:19:05 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
module_init(ctnetlink_init);
|
|
|
|
module_exit(ctnetlink_exit);
|