Use "hlist_nulls" infrastructure we added in 2.6.29 for RCUification of UDP & TCP.
This permits an easy conversion from call_rcu() based hash lists to a
SLAB_DESTROY_BY_RCU one.
Avoiding call_rcu() delay at nf_conn freeing time has numerous gains.
First, it doesnt fill RCU queues (up to 10000 elements per cpu).
This reduces OOM possibility, if queued elements are not taken into account
This reduces latency problems when RCU queue size hits hilimit and triggers
emergency mode.
- It allows fast reuse of just freed elements, permitting better use of
CPU cache.
- We delete rcu_head from "struct nf_conn", shrinking size of this structure
by 8 or 16 bytes.
This patch only takes care of "struct nf_conn".
call_rcu() is still used for less critical conntrack parts, that may
be converted later if necessary.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
This is necessary in order to have an upper bound for Netlink
message calculation, which is not a problem at all, as there
are no helpers with a longer name.
Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
There is added a single callback for the l3 proto helper. The two
callbacks for the l4 protos are necessary because of the general
structure of a ctnetlink event, which is in short:
CTA_TUPLE_ORIG
<l3/l4-proto-attributes>
CTA_TUPLE_REPLY
<l3/l4-proto-attributes>
CTA_ID
...
CTA_PROTOINFO
<l4-proto-attributes>
CTA_TUPLE_MASTER
<l3/l4-proto-attributes>
Therefore the formular is
size := sizeof(generic-nlas) + 3 * sizeof(tuple_nlas) + sizeof(protoinfo_nlas)
Some of the NLAs are optional, e. g. CTA_TUPLE_MASTER, which is only
set if it's an expected connection. But the number of optional NLAs is
small enough to prevent netlink_trim() from reallocating if calculated
properly.
Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
We use same not trivial helper function in four places. We can factorize it.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Using hlist_add_head() in nf_conntrack_set_hashsize() is quite dangerous.
Without any barrier, one CPU could see a loop while doing its lookup.
Its true new table cannot be seen by another cpu, but previous table is still
readable.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
net/netfilter/xt_LED.c:40: error: field netfilter_led_trigger has incomplete type
net/netfilter/xt_LED.c: In function led_timeout_callback:
net/netfilter/xt_LED.c:78: warning: unused variable ledinternal
net/netfilter/xt_LED.c: In function led_tg_check:
net/netfilter/xt_LED.c:102: error: implicit declaration of function led_trigger_register
net/netfilter/xt_LED.c: In function led_tg_destroy:
net/netfilter/xt_LED.c:135: error: implicit declaration of function led_trigger_unregister
Fix by adding a dependency on LED_TRIGGERS.
Reported-by: Sachin Sant <sachinp@in.ibm.com>
Tested-by: Subrata Modak <tosubrata@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
This patch combines Greg Bank's dprintk() work with the existing dynamic
printk patchset, we are now calling it 'dynamic debug'.
The new feature of this patchset is a richer /debugfs control file interface,
(an example output from my system is at the bottom), which allows fined grained
control over the the debug output. The output can be controlled by function,
file, module, format string, and line number.
for example, enabled all debug messages in module 'nf_conntrack':
echo -n 'module nf_conntrack +p' > /mnt/debugfs/dynamic_debug/control
to disable them:
echo -n 'module nf_conntrack -p' > /mnt/debugfs/dynamic_debug/control
A further explanation can be found in the documentation patch.
Signed-off-by: Greg Banks <gnb@sgi.com>
Signed-off-by: Jason Baron <jbaron@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
We use RCU to defer freeing of conntrack structures. In DOS situation, RCU might
accumulate about 10.000 elements per CPU in its internal queues. To get accurate
conntrack counts (at the expense of slightly more RAM used), we might consider
conntrack counter not taking into account "about to be freed elements, waiting
in RCU queues". We thus decrement it in nf_conntrack_free(), not in the RCU
callback.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Tested-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Signed-off-by: Patrick McHardy <kaber@trash.net>
This patch fixes an unaligned memory access in tcp_sack while reading
sequence numbers from TCP selective acknowledgement options. Prior to
applying this patch, upstream linux-2.6.27.20 was occasionally
generating messages like this on my sparc64 system:
[54678.532071] Kernel unaligned access at TPC[6b17d4] tcp_packet+0xcd4/0xd00
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Patrick McHardy <kaber@trash.net>
This patch adds nfnetlink_set_err() to propagate the error to netlink
broadcast listener in case of memory allocation errors in the
message building.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
This patchs adds support of modification of the used logger via sysctl.
It can be used to change the logger to module that can not use the bind
operation (ipt_LOG and ipt_ULOG). For this purpose, it creates a
directory /proc/sys/net/netfilter/nf_log which contains a file
per-protocol. The content of the file is the name current logger (NONE if
not set) and a logger can be setup by simply echoing its name to the file.
By echoing "NONE" to a /proc/sys/net/netfilter/nf_log/PROTO file, the
logger corresponding to this PROTO is set to NULL.
Signed-off-by: Eric Leblond <eric@inl.fr>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Convert the remaining refcount users.
As pointed out by Patrick McHardy, the protocols can be accessed safely using RCU.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
This patch adds the iptables cluster match. This match can be used
to deploy gateway and back-end load-sharing clusters. The cluster
can be composed of 32 nodes maximum (although I have only tested
this with two nodes, so I cannot tell what is the real scalability
limit of this solution in terms of cluster nodes).
Assuming that all the nodes see all packets (see below for an
example on how to do that if your switch does not allow this), the
cluster match decides if this node has to handle a packet given:
(jhash(source IP) % total_nodes) & node_mask
For related connections, the master conntrack is used. The following
is an example of its use to deploy a gateway cluster composed of two
nodes (where this is the node 1):
iptables -I PREROUTING -t mangle -i eth1 -m cluster \
--cluster-total-nodes 2 --cluster-local-node 1 \
--cluster-proc-name eth1 -j MARK --set-mark 0xffff
iptables -A PREROUTING -t mangle -i eth1 \
-m mark ! --mark 0xffff -j DROP
iptables -A PREROUTING -t mangle -i eth2 -m cluster \
--cluster-total-nodes 2 --cluster-local-node 1 \
--cluster-proc-name eth2 -j MARK --set-mark 0xffff
iptables -A PREROUTING -t mangle -i eth2 \
-m mark ! --mark 0xffff -j DROP
And the following commands to make all nodes see the same packets:
ip maddr add 01:00:5e:00:01:01 dev eth1
ip maddr add 01:00:5e:00:01:02 dev eth2
arptables -I OUTPUT -o eth1 --h-length 6 \
-j mangle --mangle-mac-s 01:00:5e:00:01:01
arptables -I INPUT -i eth1 --h-length 6 \
--destination-mac 01:00:5e:00:01:01 \
-j mangle --mangle-mac-d 00:zz:yy:xx:5a:27
arptables -I OUTPUT -o eth2 --h-length 6 \
-j mangle --mangle-mac-s 01:00:5e:00:01:02
arptables -I INPUT -i eth2 --h-length 6 \
--destination-mac 01:00:5e:00:01:02 \
-j mangle --mangle-mac-d 00:zz:yy:xx:5a:27
In the case of TCP connections, pickup facility has to be disabled
to avoid marking TCP ACK packets coming in the reply direction as
valid.
echo 0 > /proc/sys/net/netfilter/nf_conntrack_tcp_loose
BTW, some final notes:
* This match mangles the skbuff pkt_type in case that it detects
PACKET_MULTICAST for a non-multicast address. This may be done in
a PKTTYPE target for this sole purpose.
* This match supersedes the CLUSTERIP target.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Module specific data moved into per-net site and being allocated/freed
during net namespace creation/deletion.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Daniel Lezcano <daniel.lezcano@free.fr>
Signed-off-by: Patrick McHardy <kaber@trash.net>
We currently use the negative value in the conntrack code to encode
the packet verdict in the error. As NF_DROP is equal to 0, inverting
NF_DROP makes no sense and, as a result, no packets are ever dropped.
Signed-off-by: Christoph Paasch <christoph.paasch@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
This patch fixes a possible crash due to the missing initialization
of the expectation class when nf_ct_expect_related() is called.
Reported-by: BORBELY Zoltan <bozo@andrews.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Commit 784544739a (netfilter: iptables:
lock free counters) broke a number of modules whose rule data referenced
itself. A reallocation would not reestablish the correct references, so
it is best to use a separate struct that does not fall under RCU.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
This patch moves the event reporting outside the lock section. With
this patch, the creation and update of entries is homogeneous from
the event reporting perspective. Moreover, as the event reporting is
done outside the lock section, the netlink broadcast delivery can
benefit of the yield() call under congestion.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
This patch moves the preliminary checkings that must be fulfilled
to update a conntrack, which are the following:
* NAT manglings cannot be updated
* Changing the master conntrack is not allowed.
This patch is a cleanup.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
This patch moves the assignation of the master conntrack to
ctnetlink_create_conntrack(), which is where it really belongs.
This patch is a cleanup.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
This patch modifies the proc output to add display of registered
loggers. The content of /proc/net/netfilter/nf_log is modified. Instead
of displaying a protocol per line with format:
proto:logger
it now displays:
proto:logger (comma_separated_list_of_loggers)
NONE is used as keyword if no logger is used.
Signed-off-by: Eric Leblond <eric@inl.fr>
Signed-off-by: Patrick McHardy <kaber@trash.net>
This patch modifies nf_log to use a linked list of loggers for each
protocol. This list of loggers is read and write protected with a
mutex.
This patch separates registration and binding. To be used as
logging module, a module has to register calling nf_log_register()
and to bind to a protocol it has to call nf_log_bind_pf().
This patch also converts the logging modules to the new API. For nfnetlink_log,
it simply switchs call to register functions to call to bind function and
adds a call to nf_log_register() during init. For other modules, it just
remove a const flag from the logger structure and replace it with a
__read_mostly.
Signed-off-by: Eric Leblond <eric@inl.fr>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Commit 784544739a
(netfilter: iptables: lock free counters) broke xt_hashlimit netfilter module :
This module was storing a pointer inside its xt_hashlimit_info, and this pointer
is not relocated when we temporarly switch tables (iptables -L).
This hack is not not needed at all (probably a leftover from
ancient time), as each cpu should and can access to its own copy.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Fix regression introduded by commit 079aa88 (netfilter: xt_recent: IPv6 support):
From http://bugzilla.kernel.org/show_bug.cgi?id=12753:
Problem Description:
An uninitialized buffer causes IPv4 addresses added manually (via the +IP
command to the proc interface) to never match any packets. Similarly, the -IP
command fails to remove IPv4 addresses.
Details:
In the function recent_entry_lookup, the xt_recent module does comparisons of
the entire nf_inet_addr union value, both for IPv4 and IPv6 addresses. For
addresses initialized from actual packets the remaining 12 bytes not occupied
by the IPv4 are zeroed so this works correctly. However when setting the
nf_inet_addr addr variable in the recent_mt_proc_write function, only the IPv4
bytes are initialized and the remaining 12 bytes contain garbage.
Hence addresses added in this way never match any packets, unless these
uninitialized 12 bytes happened to be zero by coincidence. Similarly, addresses
cannot consistently be removed using the proc interface due to mismatch of the
garbage bytes (although it will sometimes work to remove an address that was
added manually).
Reading the /proc/net/xt_recent/ entries hides this problem because this only
uses the first 4 bytes when displaying IPv4 addresses.
Steps to reproduce:
$ iptables -I INPUT -m recent --rcheck -j LOG
$ echo +169.254.156.239 > /proc/net/xt_recent/DEFAULT
$ cat /proc/net/xt_recent/DEFAULT
src=169.254.156.239 ttl: 0 last_seen: 119910 oldest_pkt: 1 119910
[At this point no packets from 169.254.156.239 are being logged.]
$ iptables -I INPUT -s 169.254.156.239 -m recent --set
$ cat /proc/net/xt_recent/DEFAULT
src=169.254.156.239 ttl: 0 last_seen: 119910 oldest_pkt: 1 119910
src=169.254.156.239 ttl: 255 last_seen: 126184 oldest_pkt: 4 125434, 125684, 125934, 126184
[At this point, adding the address via an iptables rule, packets are being
logged correctly.]
$ echo -169.254.156.239 > /proc/net/xt_recent/DEFAULT
$ cat /proc/net/xt_recent/DEFAULT
src=169.254.156.239 ttl: 0 last_seen: 119910 oldest_pkt: 1 119910
src=169.254.156.239 ttl: 255 last_seen: 126992 oldest_pkt: 10 125434, 125684, 125934, 126184, 126434, 126684, 126934, 126991, 126991, 126992
$ echo -169.254.156.239 > /proc/net/xt_recent/DEFAULT
$ cat /proc/net/xt_recent/DEFAULT
src=169.254.156.239 ttl: 0 last_seen: 119910 oldest_pkt: 1 119910
src=169.254.156.239 ttl: 255 last_seen: 126992 oldest_pkt: 10 125434, 125684, 125934, 126184, 126434, 126684, 126934, 126991, 126991, 126992
[Removing the address via /proc interface failed evidently.]
Possible solutions:
- initialize the addr variable in recent_mt_proc_write
- compare only 4 bytes for IPv4 addresses in recent_entry_lookup
Signed-off-by: Patrick McHardy <kaber@trash.net>
Since tcp_packet() may return -NF_DROP in two situations, the
packet-drop stats must be increased.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Kernel module providing implementation of LED netfilter target. Each
instance of the target appears as a led-trigger device, which can be
associated with one or more LEDs in /sys/class/leds/
Signed-off-by: Adam Nielsen <a.nielsen@shikadi.net>
Acked-by: Richard Purdie <rpurdie@linux.intel.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
get_random_bytes() is sometimes called with a hard coded size assumption
of an integer. This could not be true for next centuries. This patch
replace it with a compile time statement.
Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Table size is defined as unsigned, wheres the table maximum size is
defined as a signed integer. The calculation of max is 8 or 4,
multiplied the table size. Therefore the max value is aligned to
unsigned.
Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
Signed-off-by: Patrick McHardy <kaber@trash.net>
The reader/writer lock in ip_tables is acquired in the critical path of
processing packets and is one of the reasons just loading iptables can cause
a 20% performance loss. The rwlock serves two functions:
1) it prevents changes to table state (xt_replace) while table is in use.
This is now handled by doing rcu on the xt_table. When table is
replaced, the new table(s) are put in and the old one table(s) are freed
after RCU period.
2) it provides synchronization when accesing the counter values.
This is now handled by swapping in new table_info entries for each cpu
then summing the old values, and putting the result back onto one
cpu. On a busy system it may cause sampling to occur at different
times on each cpu, but no packet/byte counts are lost in the process.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Sucessfully tested on my dual quad core machine too, but iptables only (no ipv6 here)
BTW, my new "tbench 8" result is 2450 MB/s, (it was 2150 MB/s not so long ago)
Acked-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
xt_physdev netfilter module can use an ifname_compare() helper
so that two loops are unfolded.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
1) physdev_mt() incorrectly assumes nulldevname[] is aligned on an int
2) It also uses word comparisons, while it could use long word ones.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Suggested by: James King <t.james.king@gmail.com>
Similarly to commit c9fd496809, merge
TTL and HL. Since HL does not depend on any IPv6-specific function,
no new module dependencies would arise.
With slight adjustments to the Kconfig help text.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
When extensions were moved to the NFPROTO_UNSPEC wildcard in
ab4f21e6fb, they disappeared from the
procfs files.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
0 is used by Hop-by-hop header and so this may cause confusion.
255 is stated as 'Reserved' by IANA.
Signed-off-by: Christoph Paasch <christoph.paasch@student.uclouvain.be>
Signed-off-by: Patrick McHardy <kaber@trash.net>
NFLOG timeout was computed in timer by doing:
flushtimeout*HZ/100
Default value of flushtimeout was HZ (for 1 second delay). This was
wrong for non 100HZ computer. This patch modify the default delay by
using 100 instead of HZ.
Signed-off-by: Eric Leblond <eric@inl.fr>
Signed-off-by: Patrick McHardy <kaber@trash.net>
In NFLOG the per-rule qthreshold should overrides per-instance only
it is set. With current code, the per-rule qthreshold is 1 if not set
and it overrides the per-instance qthreshold.
This patch modifies the default xt_NFLOG threshold from 1 to
0. Thus a value of 0 means there is no per-rule setting and the instance
parameter has to apply.
Signed-off-by: Eric Leblond <eric@inl.fr>
Signed-off-by: Patrick McHardy <kaber@trash.net>
When user tries to map all chunks given in argument, kernel
works on a copy of the chunkmap, but at the end it doesn't
check the copy, but the orginal one.
Signed-off-by: Qu Haoran <haoran.qu@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes echoing if the socket that has sent the request to
create/update/delete an entry is not subscribed to any multicast
group. With the current code, ctnetlink would not send the echo
message via unicast as nfnetlink_send() would be skip.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes an inconsistency in the current ctnetlink code
since NAT sequence adjustment bit can only be updated but not set
in the conntrack entry creation.
This patch is used by conntrackd to successfully recover newly
created entries that represent connections with helpers and NAT
payload mangling.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
As it currently stands, skb destructors are forbidden on the
receive path because the protocol end-points will overwrite
any existing destructor with their own.
This is the reason why we have to call skb_orphan in the loopback
driver before we reinject the packet back into the stack, thus
creating a period during which loopback traffic isn't charged
to any socket.
With virtualisation, we have a similar problem in that traffic
is reinjected into the stack without being associated with any
socket entity, thus providing no natural congestion push-back
for those poor folks still stuck with UDP.
Now had we been consistent in telling them that UDP simply has
no congestion feedback, I could just fob them off. Unfortunately,
we appear to have gone to some length in catering for this on
the standard UDP path, with skb/socket accounting so that has
created a very unhealthy dependency.
Alas habits are difficult to break out of, so we may just have
to allow skb destructors on the receive path.
It turns out that making skb destructors useable on the receive path
isn't as easy as it seems. For instance, simply adding skb_orphan
to skb_set_owner_r isn't enough. This is because we assume all
over the IP stack that skb->sk is an IP socket if present.
The new transparent proxy code goes one step further and assumes
that skb->sk is the receiving socket if present.
Now all of this can be dealt with by adding simple checks such
as only treating skb->sk as an IP socket if skb->sk->sk_family
matches. However, it turns out that for bridging at least we
don't need to do all of this work.
This is of interest because most virtualisation setups use bridging
so we don't actually go through the IP stack on the host (with
the exception of our old nemesis the bridge netfilter, but that's
easily taken care of).
So this patch simply adds skb_orphan to the point just before we
enter the IP stack, but after we've gone through the bridge on the
receive path. It also adds an skb_orphan to the one place in
netfilter that touches skb->sk/skb->destructor, that is, tproxy.
One word of caution, because of the internal code structure, anyone
wishing to deploy this must use skb_set_owner_w as opposed to
skb_set_owner_r since many functions that create a new skb from
an existing one will invoke skb_set_owner_w on the new skb.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Base versions handle constant folding now.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Caused by call to request_module() while holding nf_conntrack_lock.
Reported-and-tested-by: Kövesdi György <kgy@teledigit.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
netfilter: xt_time: print timezone for user information
Let users have a way to figure out if their distro set the kernel
timezone at all.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
nf_conntrack_alloc cannot return NULL, so there is no need to check for
NULL before using the value. I have also removed the initialization of ct
to NULL in nf_conntrack_alloc, since the value is never used, and since
perhaps it might lead one to think that return ct at the end might return
NULL.
The semantic patch that finds this problem is as follows:
(http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@match exists@
expression x, E;
position p1,p2;
statement S1, S2;
@@
x@p1 = nf_conntrack_alloc(...)
... when != x = E
(
if (x@p2 == NULL || ...) S1 else S2
|
if (x@p2 == NULL && ...) S1 else S2
)
@other_match exists@
expression match.x, E1, E2;
position p1!=match.p1,match.p2;
@@
x@p1 = E1
... when != x = E2
x@p2
@ script:python depends on !other_match@
p1 << match.p1;
p2 << match.p2;
@@
print "%s: call to nf_conntrack_alloc %s bad test %s" % (p1[0].file,p1[0].line,p2[0].line)
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 55b69e91 (netfilter: implement NFPROTO_UNSPEC as a wildcard
for extensions) broke revision probing for matches and targets that
are registered with NFPROTO_UNSPEC.
Fix by continuing the search on the NFPROTO_UNSPEC list if nothing
is found on the af-specific lists.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
In future all cpumask ops will only be valid (in general) for bit
numbers < nr_cpu_ids. So use that instead of NR_CPUS in iterators
and other comparisons.
This is always safe: no cpu number can be >= nr_cpu_ids, and
nr_cpu_ids is initialized to NR_CPUS at boot.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1429 commits)
net: Allow dependancies of FDDI & Tokenring to be modular.
igb: Fix build warning when DCA is disabled.
net: Fix warning fallout from recent NAPI interface changes.
gro: Fix potential use after free
sfc: If AN is enabled, always read speed/duplex from the AN advertising bits
sfc: When disabling the NIC, close the device rather than unregistering it
sfc: SFT9001: Add cable diagnostics
sfc: Add support for multiple PHY self-tests
sfc: Merge top-level functions for self-tests
sfc: Clean up PHY mode management in loopback self-test
sfc: Fix unreliable link detection in some loopback modes
sfc: Generate unique names for per-NIC workqueues
802.3ad: use standard ethhdr instead of ad_header
802.3ad: generalize out mac address initializer
802.3ad: initialize ports LACPDU from const initializer
802.3ad: remove typedef around ad_system
802.3ad: turn ports is_individual into a bool
802.3ad: turn ports is_enabled into a bool
802.3ad: make ntt bool
ixgbe: Fix set_ringparam in ixgbe to use the same memory pools.
...
Fixed trivial IPv4/6 address printing conflicts in fs/cifs/connect.c due
to the conversion to %pI (in this networking merge) and the addition of
doing IPv6 addresses (from the earlier merge of CIFS).
The patch "don't call nf_log_packet in NFLOG module" make xt_NFLOG
dependant of nfnetlink_log. This patch forces the dependencies to fix
compilation in case only xt_NFLOG compilation was asked and modifies the
help message accordingly to the change.
Signed-off-by: Eric Leblond <eric@inl.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
TIME_WAIT sockets need to be handled specially, and the socket match
casted inet_timewait_sock instances to inet_sock, which are not
compatible.
Handle this special case by checking sk->sk_state.
Signed-off-by: Balazs Scheidler <bazsi@balabit.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
fs/nfsd/nfs4recover.c
Manually fixed above to use new creds API functions, e.g.
nfs4_save_creds().
Signed-off-by: James Morris <jmorris@namei.org>
The previous fix for the conntrack creation race (netfilter: ctnetlink:
fix conntrack creation race) missed a GFP_KERNEL allocation that is
now performed while holding a spinlock. Switch to GFP_ATOMIC.
Reported-and-tested-by: Zoltan Borbely <bozo@andrews.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
this warning:
net/netfilter/nf_conntrack_ftp.c: In function 'help':
net/netfilter/nf_conntrack_ftp.c:360: warning: 'matchoff' may be used uninitialized in this function
net/netfilter/nf_conntrack_ftp.c:360: warning: 'matchlen' may be used uninitialized in this function
triggers because GCC does not recognize the (correct) error flow
between find_pattern(), 'found', 'matchoff' and 'matchlen'.
Annotate it.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Conntrack creation through ctnetlink has two races:
- the timer may expire and free the conntrack concurrently, causing an
invalid memory access when attempting to put it in the hash tables
- an identical conntrack entry may be created in the packet processing
path in the time between the lookup and hash insertion
Hold the conntrack lock between the lookup and insertion to avoid this.
Reported-by: Zoltan Borbely <bozo@andrews.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The message triggers when sending non-FTP data on port 21 or with
certain clients that use multiple syscalls to send the command.
Change to pr_debug() since users have been complaining.
Signed-off-by: Patrick McHardy <kaber@trash.net>
net/netfilter/nf_conntrack_proto_sctp.c: In function 'sctp_packet':
net/netfilter/nf_conntrack_proto_sctp.c:376: warning: array subscript is above array bounds
gcc doesn't realize that do_basic_checks() guarantees that there is
at least one valid chunk and thus new_state is never SCTP_CONNTRACK_MAX
after the loop. Initialize to SCTP_CONNTRACK_NONE to avoid the warning.
Based on patch by Wu Fengguang <wfg@linux.intel.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Not needed, since creation and removal are done by name.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
net/netfilter/nf_conntrack_core.c:46:1: warning: symbol 'nfnetlink_parse_nat_setup_hook' was not declared. Should it be static?
Including the proper header also revealed an incorrect prototype.
Signed-off-by: Patrick McHardy <kaber@trash.net>
net/netfilter/nfnetlink_log.c:537:1: warning: symbol 'nfulnl_log_packet' was not declared. Should it be static?
Including the proper header also revealed an incorrect prototype.
Signed-off-by: Patrick McHardy <kaber@trash.net>
As for now, the creation and update of conntracks via ctnetlink do not
propagate an event to userspace. This can result in inconsistent situations
if several userspace processes modify the connection tracking table by means
of ctnetlink at the same time. Specifically, using the conntrack command
line tool and conntrackd at the same time can trigger unconsistencies.
This patch also modifies the event cache infrastructure to pass the
process PID and the ECHO flag to nfnetlink_send() to report back
to userspace if the process that triggered the change needs so.
Based on a suggestion from Patrick McHardy.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
This patch adds module loading for helpers via ctnetlink.
* Creation path: We support explicit and implicit helper assignation. For
the explicit case, we try to load the module. If the module is correctly
loaded and the helper is present, we return EAGAIN to re-start the
creation. Otherwise, we return EOPNOTSUPP.
* Update path: release the spin lock, load the module and check. If it is
present, then return EAGAIN to re-start the update.
This patch provides a refactorized function to lookup-and-set the
connection tracking helper. The function removes the exported symbol
__nf_ct_helper_find as it has not clients anymore.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
This patch adds the macro MODULE_ALIAS_NFCT_HELPER that defines a
way to provide generic and persistent aliases for the connection
tracking helpers.
This next patch requires this patch.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
This patch replaces the unnecessary module refcounting with
the read-side locks. With this patch, all the dump and fill_info
function are called under the RCU read lock.
Based on a patch from Fabian Hugelshofer.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
This patch changes the return value if the conntrack has no helper assigned.
Instead of EINVAL, which is reserved for malformed messages, it returns
EOPNOTSUPP.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Use nf_conntrack_get instead of the direct call to atomic_inc.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Conflicts:
security/keys/internal.h
security/keys/process_keys.c
security/keys/request_key.c
Fixed conflicts above by using the non 'tsk' versions.
Signed-off-by: James Morris <jmorris@namei.org>
Attach creds to file structs and discard f_uid/f_gid.
file_operations::open() methods (such as hppfs_open()) should use file->f_cred
rather than current_cred(). At the moment file->f_cred will be current_cred()
at this point.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
Signed-off-by: James Morris <jmorris@namei.org>
payload_len is a be16 value, not cpu_endian, also the size of a ponter
to a struct ipv6hdr was being added, not the size of the struct itself.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch modifies xt_NFLOG to suppress the call to nf_log_packet()
function. The call of this wrapper in xt_NFLOG was causing NFLOG to
use the first initialized module. Thus, if ipt_ULOG is loaded before
nfnetlink_log all NFLOG rules are treated as plain LOG rules.
Signed-off-by: Eric Leblond <eric@inl.fr>
Signed-off-by: Patrick McHardy <kaber@trash.net>
I want to compile out proc_* and sysctl_* handlers totally and
stub them to NULL depending on config options, however usage of &
will prevent this, since taking adress of NULL pointer will break
compilation.
So, drop & in front of every ->proc_handler and every ->strategy
handler, it was never needed in fact.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the 'supports_ipv6' scheduler flag since all schedulers now
support IPv6.
Signed-off-by: Julius Volz <julius.volz@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add IPv6 support to LBLC and LBLCR schedulers. These were the last
schedulers without IPv6 support, but we might want to keep the
supports_ipv6 flag in the case of future schedulers without IPv6
support.
Signed-off-by: Julius Volz <julius.volz@gmail.com>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add IPv6 support to SH and DH schedulers. I hope this simple IPv6 address
hashing is good enough. The 128 bit are just XORed into 32 before hashing
them like an IPv4 address.
Signed-off-by: Julius Volz <julius.volz@gmail.com>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Using NIPQUAD() with NIPQUAD_FMT, %d.%d.%d.%d or %u.%u.%u.%u
can be replaced with %pI4
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
register_pernet_gen_device() can't be used is nf_conntrack_pptp module is
also used (compiled in or loaded).
Right now, proto_gre_net_exit() is called before nf_conntrack_pptp_net_exit().
The former shutdowns and frees GRE piece of netns, however the latter
absolutely needs it to flush keymap. Oops is inevitable.
Switch to shiny new register_pernet_gen_subsys() to get correct ordering in
netns ops list.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
netfilter: replace old NF_ARP calls with NFPROTO_ARP
netfilter: fix compilation error with NAT=n
netfilter: xt_recent: use proc_create_data()
netfilter: snmp nat leaks memory in case of failure
netfilter: xt_iprange: fix range inversion match
netfilter: netns: use NFPROTO_NUMPROTO instead of NUMPROTO for tables array
netfilter: ctnetlink: remove obsolete NAT dependency from Kconfig
pkt_sched: sch_generic: Fix oops in sch_teql
dccp: Port redirection support for DCCP
tcp: Fix IPv6 fallout from 'Port redirection support for TCP'
netdev: change name dropping error codes
ipvs: Update CONFIG_IP_VS_IPV6 description and help text
(Supplements: ee999d8b95)
NFPROTO_ARP actually has a different value from NF_ARP, so ensure all
callers use the new value so that packets _do_ get delivered to the
registered hooks.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes the compilation of ctnetlink when the NAT support
is not enabled.
/home/benh/kernels/linux-powerpc/net/netfilter/nf_conntrack_netlink.c:819: warning: enum nf_nat_manip_type\u2019 declared inside parameter list
/home/benh/kernels/linux-powerpc/net/netfilter/nf_conntrack_netlink.c:819: warning: its scope is only this definition or declaration, which is probably not what you want
Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reported by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Inverted IPv4 v1 and IPv6 v0 matches don't match anything since 2.6.25-rc1!
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that ctnetlink doesn't have any NAT module depenencies anymore,
we can also remove them from Kconfig.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds a URL to further info to the CONFIG_IP_VS_IPV6 Kconfig help
text. Also, I think it should be ok to remove the "DANGEROUS" label in the
description line at this point to get people to try it out and find all
the bugs ;) It's still marked as experimental, of course.
Signed-off-by: Julius Volz <juliusv@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
net: Remove CONFIG_KMOD from net/ (towards removing CONFIG_KMOD entirely)
ipv4: Add a missing rcu_assign_pointer() in routing cache.
[netdrvr] ibmtr: PCMCIA IBMTR is ok on 64bit
xen-netfront: Avoid unaligned accesses to IP header
lmc: copy_*_user under spinlock
[netdrvr] myri10ge, ixgbe: remove broken select INTEL_IOATDMA
Some code here depends on CONFIG_KMOD to not try to load
protocol modules or similar, replace by CONFIG_MODULES
where more than just request_module depends on CONFIG_KMOD
and and also use try_then_request_module in ebtables.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6: (46 commits)
UIO: Fix mapping of logical and virtual memory
UIO: add automata sercos3 pci card support
UIO: Change driver name of uio_pdrv
UIO: Add alignment warnings for uio-mem
Driver core: add bus_sort_breadthfirst() function
NET: convert the phy_device file to use bus_find_device_by_name
kobject: Cleanup kobject_rename and !CONFIG_SYSFS
kobject: Fix kobject_rename and !CONFIG_SYSFS
sysfs: Make dir and name args to sysfs_notify() const
platform: add new device registration helper
sysfs: use ilookup5() instead of ilookup5_nowait()
PNP: create device attributes via default device attributes
Driver core: make bus_find_device_by_name() more robust
usb: turn dev_warn+WARN_ON combos into dev_WARN
debug: use dev_WARN() rather than WARN_ON() in device_pm_add()
debug: Introduce a dev_WARN() function
sysfs: fix deadlock
device model: Do a quickcheck for driver binding before doing an expensive check
Driver core: Fix cleanup in device_create_vargs().
Driver core: Clarify device cleanup.
...
Signed-off-by: Danny ter Haar <dth@cistron.nl>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Base infrastructure to enable per-module debug messages.
I've introduced CONFIG_DYNAMIC_PRINTK_DEBUG, which when enabled centralizes
control of debugging statements on a per-module basis in one /proc file,
currently, <debugfs>/dynamic_printk/modules. When, CONFIG_DYNAMIC_PRINTK_DEBUG,
is not set, debugging statements can still be enabled as before, often by
defining 'DEBUG' for the proper compilation unit. Thus, this patch set has no
affect when CONFIG_DYNAMIC_PRINTK_DEBUG is not set.
The infrastructure currently ties into all pr_debug() and dev_dbg() calls. That
is, if CONFIG_DYNAMIC_PRINTK_DEBUG is set, all pr_debug() and dev_dbg() calls
can be dynamically enabled/disabled on a per-module basis.
Future plans include extending this functionality to subsystems, that define
their own debug levels and flags.
Usage:
Dynamic debugging is controlled by the debugfs file,
<debugfs>/dynamic_printk/modules. This file contains a list of the modules that
can be enabled. The format of the file is as follows:
<module_name> <enabled=0/1>
.
.
.
<module_name> : Name of the module in which the debug call resides
<enabled=0/1> : whether the messages are enabled or not
For example:
snd_hda_intel enabled=0
fixup enabled=1
driver enabled=0
Enable a module:
$echo "set enabled=1 <module_name>" > dynamic_printk/modules
Disable a module:
$echo "set enabled=0 <module_name>" > dynamic_printk/modules
Enable all modules:
$echo "set enabled=1 all" > dynamic_printk/modules
Disable all modules:
$echo "set enabled=0 all" > dynamic_printk/modules
Finally, passing "dynamic_printk" at the command line enables
debugging for all modules. This mode can be turned off via the above
disable command.
[gkh: minor cleanups and tweaks to make the build work quietly]
Signed-off-by: Jason Baron <jbaron@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch removes the module dependency between ctnetlink and
nf_nat by means of an indirect call that is initialized when
nf_nat is loaded. Now, nf_conntrack_netlink only requires
nf_conntrack and nfnetlink.
This patch puts nfnetlink_parse_nat_setup_hook into the
nf_conntrack_core to avoid dependencies between ctnetlink,
nf_conntrack_ipv4 and nf_conntrack_ipv6.
This patch also introduces the function ctnetlink_change_nat
that is only invoked from the creation path. Actually, the
nat handling cannot be invoked from the update path since
this is not allowed. By introducing this function, we remove
the useless nat handling in the update path and we avoid
deadlock-prone code.
This patch also adds the required EAGAIN logic for nfnetlink.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The file(s) below do not use LINUX_VERSION_CODE nor KERNEL_VERSION.
net/netfilter/nf_tproxy_core.c
This patch removes the said #include <version.h>.
Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus noted a build failure case:
net/netfilter/ipvs/ip_vs_xmit.c: In function 'ip_vs_tunnel_xmit':
net/netfilter/ipvs/ip_vs_xmit.c:616: error: implicit declaration of function 'ip_select_ident'
The proper include file (net/ip.h) is being included in ip_vs_xmit.c to get
that declaration. So the only possible case where this can happen is if
CONFIG_INET is not enabled.
This seems to be purely a missing dependency in the ipvs/Kconfig file IP_VS
entry.
Also, while we're here, remove the out of date "EXPERIMENTAL" string in the
IP_VS config help header line. IP_VS no longer depends upon CONFIG_EXPERIMENTAL
Signed-off-by: David S. Miller <davem@davemloft.net>
Lots of extensions are completely family-independent, so squash some code.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Using ->family in struct xt_*_param, multiple struct xt_{match,target}
can be squashed together.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
By passing in the family through which extensions were invoked, a bit
of data space can be reclaimed. The "family" member will be added to
the parameter structures and the check functions be adjusted.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
This patch does this for target extensions' destroy functions.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
This patch does this for target extensions' checkentry functions.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
This patch does this for target extensions' target functions.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
This patch does this for match extensions' destroy functions.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
This patch does this for match extensions' checkentry functions.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
The function signatures for Xtables extensions have grown over time.
It involves a lot of typing/replication, and also a bit of stack space
even if they are not used. Realize an NFWS2008 idea and pack them into
structs. The skb remains outside of the struct so gcc can continue to
apply its optimizations.
This patch does this for match extensions' match functions.
A few ambiguities have also been addressed. The "offset" parameter for
example has been renamed to "fragoff" (there are so many different
offsets already) and "protoff" to "thoff" (there is more than just one
protocol here, so clarify).
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
It used to be that {ip,ip6,etc}_tables called extension->checkentry
themselves, but this can be moved into the xtables core.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
The TPROXY target implements redirection of non-local TCP/UDP traffic to local
sockets. Additionally, it's possible to manipulate the packet mark if and only
if a socket has been found. (We need this because we cannot use multiple
targets in the same iptables rule.)
Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Add iptables 'socket' match, which matches packets for which a TCP/UDP
socket lookup succeeds.
Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
The iptables tproxy core is a module that contains the common routines used by
various tproxy related modules (TPROXY target and socket match)
Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
From kernel perspective, allow entrance in nf_hook_slow().
Stuff which uses nf_register_hook/nf_register_hooks, but otherwise not netns-ready:
DECnet netfilter
ipt_CLUSTERIP
nf_nat_standalone.c together with XFRM (?)
IPVS
several individual match modules (like hashlimit)
ctnetlink
NOTRACK
all sorts of queueing and reporting to userspace
L3 and L4 protocol sysctls, bridge sysctls
probably something else
Anyway critical mass has been achieved, there is no reason to hide netfilter any longer.
From userspace perspective, allow to manipulate all sorts of
iptables/ip6tables/arptables rules.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Add init_net checks to not remove kmem_caches twice and so on.
Refactor functions to split code which should be executed only for
init_net into one place.
ip_ct_attach and ip_ct_destroy assignments remain separate, because
they're separate stages in setup and teardown.
NOTE: NOTRACK code is in for-every-net part. It will be made per-netns
after we decidce how to do it correctly.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Note, sysctl table is always duplicated, this is simpler and less
special-cased.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Heh, last minute proof-reading of this patch made me think,
that this is actually unneeded, simply because "ct" pointers will be
different for different conntracks in different netns, just like they
are different in one netns.
Not so sure anymore.
[Patrick: pointers will be different, flushing can only be done while
inactive though and thus it needs to be per netns]
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
This is cleaner, we already know conntrack to which event is relevant.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Again, it's deducible from skb, but we're going to use it for
nf_conntrack_checksum and statistics, so just pass it from upper layer.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
It's deducible from skb->dev or skb->dst->dev, but we know netns at
the moment of call, so pass it down and use for finding and creating
conntracks.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
What is confirmed connection in one netns can very well be unconfirmed
in another one.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Make per-netns a) expectation hash and b) expectations count.
Expectations always belongs to netns to which it's master conntrack belong.
This is natural and doesn't bloat expectation.
Proc files and leaf users are stubbed to init_net, this is temporary.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
* make per-netns conntrack hash
Other solution is to add ->ct_net pointer to tuplehashes and still has one
hash, I tried that it's ugly and requires more code deep down in protocol
modules et al.
* propagate netns pointer to where needed, e. g. to conntrack iterators.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Sysctls and proc files are stubbed to init_net's one. This is temporary.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Conntrack (struct nf_conn) gets pointer to netns: ->ct_net -- netns in which
it was created. It comes from netdevice.
->ct_net is write-once field.
Every conntrack in system has ->ct_net initialized, no exceptions.
->ct_net doesn't pin netns: conntracks are recycled after timeouts and
pinning background traffic will prevent netns from even starting shutdown
sequence.
Right now every conntrack is created in init_net.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
One comment: #ifdefs around #include is necessary to overcome amazing compile
breakages in NOTRACK-in-netns patch (see below).
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
When a match or target is looked up using xt_find_{match,target},
Xtables will also search the NFPROTO_UNSPEC module list. This allows
for protocol-independent extensions (like xt_time) to be reused from
other components (e.g. arptables, ebtables).
Extensions that take different codepaths depending on match->family
or target->family of course cannot use NFPROTO_UNSPEC within the
registration structure (e.g. xt_pkttype).
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
The netfilter subsystem only supports a handful of protocols (much
less than PF_*) and even non-PF protocols like ARP and
pseudo-protocols like PF_BRIDGE. By creating NFPROTO_*, we can earn a
few memory savings on arrays that previously were always PF_MAX-sized
and keep the pseudo-protocols to ourselves.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
This updates xt_recent to support the IPv6 address family.
The new /proc/net/xt_recent directory must be used for this.
The old proc interface can also be configured out.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Like with other modules (such as ipt_state), ipt_recent.h is changed
to forward definitions to (IOW include) xt_recent.h, and xt_recent.c
is changed to use the new constant names.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
and (try to) consistently use u_int8_t for the L3 family.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Since IPVS now has partial IPv6 support, this patch moves IPVS from
net/ipv4/ipvs to net/netfilter/ipvs. It's a result of:
$ git mv net/ipv4/ipvs net/netfilter
and adapting the relevant Kconfigs/Makefiles to the new path.
Signed-off-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
The function localtime_3 in xt_time.c gives a wrong monthday in a leap
year after 28th 2. calculating monthday should use the array
days_since_leapyear[] not days_since_year[] in a leap year.
Signed-off-by: Kaihui Luo <kaih.luo@gmail.com>
Acked-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexey Dobriyan points out:
1. simple_strtoul() silently accepts all characters for given base even
if result won't fit into unsigned long. This is amazing stupidity in
itself, but
2. nf_conntrack_irc helper use simple_strtoul() for DCC request parsing.
Data first copied into 64KB buffer, so theoretically nothing prevents
reading past the end of it, since data comes from network given 1).
This is not actually a problem currently since we're guaranteed to have
a 0 byte in skb_shared_info or in the buffer the data is copied to, but
to make this more robust, make sure the string is actually terminated.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
It does "kfree(list_head)" which looks wrong because entity that was
allocated is definitely not list_head.
However, this all works because list_head is first item in
struct nf_ct_gre_keymap.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
gre_keymap_list should be protected in all places.
(unless I'm misreading something)
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Helper's ->help hook can run concurrently with itself, so iterating over
SIP helpers with static pointer won't work reliably.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes a GFP_KERNEL allocation while holding a spin lock with
bottom halves disabled in ctnetlink_change_helper().
This problem was introduced in 2.6.23 with the netfilter extension
infrastructure.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix allocation with GFP_KERNEL in ctnetlink_create_conntrack() under
read-side lock sections.
This problem was introduced in 2.6.25.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
If we create a conntrack that has NAT handlings and a helper, the helper
is assigned twice. This happens because nf_nat_setup_info() - via
nf_conntrack_alter_reply() - sets the helper before ctnetlink, which
indeed does not check if the conntrack already has a helper as it thinks that
it is a brand new conntrack.
The fix moves the helper assignation before the set of the status flags.
This avoids a bogus assertion in __nf_ct_ext_add (if netfilter assertions are
enabled) which checks that the conntrack must not be confirmed.
This problem was introduced in 2.6.23 with the netfilter extension
infrastructure.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Starting with 9043476f72 ("[PATCH]
sanitize proc_sysctl") we have two netfilter releated problems:
- WARNING: at kernel/sysctl.c:1966 unregister_sysctl_table+0xcc/0x103(),
caused by wrong order of ini/fini calls
- net.netfilter is duplicated and has truncated set of records
Thanks to very useful guidelines from Al Viro, this patch fixes both
of them.
Signed-off-by: Krzysztof Piotr Oledzki <ole@ans.pl>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Deleting a timer with del_timer doesn't guarantee, that the
timer function is not running at the moment of deletion. Thus
in the xt_hashlimit case we can get into a ticklish situation
when the htable_gc rearms the timer back and we'll actually
delete an entry with a pending timer.
Fix it with using del_timer_sync().
AFAIK del_timer_sync checks for the timer to be pending by
itself, so I remove the check.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to time out dead connections quicker, keep track of outstanding data
and cap the timeout.
Suggested by Herbert Xu.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
As Linus points out, "ct->ext" and "new" are always equal, avoid unnecessary
dereferences and use "new" directly.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
As suggested by Patrick McHardy, introduce a __krealloc() that doesn't
free the original buffer to fix a double-free and use-after-free bug
introduced by me in netfilter that uses RCU.
Reported-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Tested-by: Dieter Ries <clip2@gmx.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduced by a258860e (netfilter: ctnetlink: add full support for SCTP to ctnetlink):
net/netfilter/nf_conntrack_proto_sctp.c:483:2: warning: cast from restricted type
net/netfilter/nf_conntrack_proto_sctp.c:483:2: warning: incorrect type in argument 1 (different base types)
net/netfilter/nf_conntrack_proto_sctp.c:483:2: expected unsigned int [unsigned] [usertype] x
net/netfilter/nf_conntrack_proto_sctp.c:483:2: got restricted unsigned int const <noident>
net/netfilter/nf_conntrack_proto_sctp.c:483:2: warning: cast from restricted type
net/netfilter/nf_conntrack_proto_sctp.c:483:2: warning: cast from restricted type
net/netfilter/nf_conntrack_proto_sctp.c:483:2: warning: cast from restricted type
net/netfilter/nf_conntrack_proto_sctp.c:483:2: warning: cast from restricted type
net/netfilter/nf_conntrack_proto_sctp.c:487:2: warning: cast from restricted type
net/netfilter/nf_conntrack_proto_sctp.c:487:2: warning: incorrect type in argument 1 (different base types)
net/netfilter/nf_conntrack_proto_sctp.c:487:2: expected unsigned int [unsigned] [usertype] x
net/netfilter/nf_conntrack_proto_sctp.c:487:2: got restricted unsigned int const <noident>
net/netfilter/nf_conntrack_proto_sctp.c:487:2: warning: cast from restricted type
net/netfilter/nf_conntrack_proto_sctp.c:487:2: warning: cast from restricted type
net/netfilter/nf_conntrack_proto_sctp.c:487:2: warning: cast from restricted type
net/netfilter/nf_conntrack_proto_sctp.c:487:2: warning: cast from restricted type
net/netfilter/nf_conntrack_proto_sctp.c:532:42: warning: incorrect type in assignment (different base types)
net/netfilter/nf_conntrack_proto_sctp.c:532:42: expected restricted unsigned int <noident>
net/netfilter/nf_conntrack_proto_sctp.c:532:42: got unsigned int
net/netfilter/nf_conntrack_proto_sctp.c:534:39: warning: incorrect type in assignment (different base types)
net/netfilter/nf_conntrack_proto_sctp.c:534:39: expected restricted unsigned int <noident>
net/netfilter/nf_conntrack_proto_sctp.c:534:39: got unsigned int
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds some fields to NFLOG to be able to send the complete
hardware header with all necessary informations.
It sends to userspace:
* the type of hardware link
* the lenght of hardware header
* the hardware header
Signed-off-by: Eric Leblond <eric@inl.fr>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix netfilter xt_time's time_mt()'s use of do_div() on an s64 by using
div_s64() instead.
This was introduced by patch ee4411a1b1
("[NETFILTER]: x_tables: add xt_time match").
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Initially netfilter has had 64bit counters for conntrack-based accounting, but
it was changed in 2.6.14 to save memory. Unfortunately in-kernel 64bit counters are
still required, for example for "connbytes" extension. However, 64bit counters
waste a lot of memory and it was not possible to enable/disable it runtime.
This patch:
- reimplements accounting with respect to the extension infrastructure,
- makes one global version of seq_print_acct() instead of two seq_print_counters(),
- makes it possible to enable it at boot time (for CONFIG_SYSCTL/CONFIG_SYSFS=n),
- makes it possible to enable/disable it at runtime by sysctl or sysfs,
- extends counters from 32bit to 64bit,
- renames ip_conntrack_counter -> nf_conn_counter,
- enables accounting code unconditionally (no longer depends on CONFIG_NF_CT_ACCT),
- set initial accounting enable state based on CONFIG_NF_CT_ACCT
- removes buggy IPCT_COUNTER_FILLING event handling.
If accounting is enabled newly created connections get additional acct extend.
Old connections are not changed as it is not possible to add a ct_extend area
to confirmed conntrack. Accounting is performed for all connections with
acct extend regardless of a current state of "net.netfilter.nf_conntrack_acct".
Signed-off-by: Krzysztof Piotr Oledzki <ole@ans.pl>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Without CONFIG_NET_NS, namespace is always &init_net.
Compiler will be able to omit namespace comparisons with this patch.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a conntrack entry is destroyed in process context and destruction
is interrupted by packet processing and the packet is an attempt to
reopen a closed connection, TCP conntrack tries to kill the old entry
itself and returns NF_REPEAT to pass the packet through the hook
again. This may lead to an endless loop: TCP conntrack repeatedly
finds the old entry, but can not kill it itself since destruction
is already in progress, but destruction in process context can not
complete since TCP conntrack is keeping the CPU busy.
Drop the packet in TCP conntrack if we can't kill the connection
ourselves to avoid this.
Reported by: hemao77@gmail.com [ Kernel bugzilla #11058 ]
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The flag XT_STRING_FLAG_IGNORECASE indicates case insensitive string
matching. netfilter can find cmd.exe, Cmd.exe, cMd.exe and etc easily.
A new revision 1 was added, in the meantime invert of xt_string_info
was moved into flags as a flag. If revision is 1, The flag
XT_STRING_FLAG_INVERT indicates invert matching.
Signed-off-by: Joonwoo Park <joonwpark81@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
ctnetlink does not need to allocate the conntrack entries with GFP_ATOMIC
as its code is executed in user context.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Get rid of refrences to no longer existant Fast NAT.
IP_ROUTE_NAT support was removed in August of 2004, but references to Fast
NAT were left in a couple of config options.
Signed-off-by: Russ Dill <Russ.Dill@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lost connections was reported by Thomas Bätzler (running 2.6.25 kernel) on
the netfilter mailing list (see the thread "Weird nat/conntrack Problem
with PASV FTP upload"). He provided tcpdump recordings which helped to
find a long lingering bug in conntrack.
In TCP connection tracking, checking the lower bound of valid ACK could
lead to mark valid packets as INVALID because:
- We have got a "higher or equal" inequality, but the test checked
the "higher" condition only; fixed.
- If the packet contains a SACK option, it could occur that the ACK
value was before the left edge of our (S)ACK "window": if a previous
packet from the other party intersected the right edge of the window
of the receiver, we could move forward the window parameters beyond
accepting a valid ack. Therefore in this patch we check the rightmost
SACK edge instead of the ACK value in the lower bound of valid (S)ACK
test.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The H.245 helper is not registered/unregistered, but assigned to
connections manually from the Q.931 helper. This means on unload
existing expectations and connections using the helper are not
cleaned up, leading to the following oops on module unload:
CPU 0 Unable to handle kernel paging request at virtual address c00a6828, epc == 802224dc, ra == 801d4e7c
Oops[#1]:
Cpu 0
$ 0 : 00000000 00000000 00000004 c00a67f0
$ 4 : 802a5ad0 81657e00 00000000 00000000
$ 8 : 00000008 801461c8 00000000 80570050
$12 : 819b0280 819b04b0 00000006 00000000
$16 : 802a5a60 80000000 80b46000 80321010
$20 : 00000000 00000004 802a5ad0 00000001
$24 : 00000000 802257a8
$28 : 802a4000 802a59e8 00000004 801d4e7c
Hi : 0000000b
Lo : 00506320
epc : 802224dc ip_conntrack_help+0x38/0x74 Tainted: P
ra : 801d4e7c nf_iterate+0xbc/0x130
Status: 1000f403 KERNEL EXL IE
Cause : 00800008
BadVA : c00a6828
PrId : 00019374
Modules linked in: ip_nat_pptp ip_conntrack_pptp ath_pktlog wlan_acl wlan_wep wlan_tkip wlan_ccmp wlan_xauth ath_pci ath_dev ath_dfs ath_rate_atheros wlan ath_hal ip_nat_tftp ip_conntrack_tftp ip_nat_ftp ip_conntrack_ftp pppoe ppp_async ppp_deflate ppp_mppe pppox ppp_generic slhc
Process swapper (pid: 0, threadinfo=802a4000, task=802a6000)
Stack : 801e7d98 00000004 802a5a60 80000000 801d4e7c 801d4e7c 802a5ad0 00000004
00000000 00000000 801e7d98 00000000 00000004 802a5ad0 00000000 00000010
801e7d98 80b46000 802a5a60 80320000 80000000 801d4f8c 802a5b00 00000002
80063834 00000000 80b46000 802a5a60 801e7d98 80000000 802ba854 00000000
81a02180 80b7e260 81a021b0 819b0000 819b0000 80570056 00000000 00000001
...
Call Trace:
[<801e7d98>] ip_finish_output+0x0/0x23c
[<801d4e7c>] nf_iterate+0xbc/0x130
[<801d4e7c>] nf_iterate+0xbc/0x130
[<801e7d98>] ip_finish_output+0x0/0x23c
[<801e7d98>] ip_finish_output+0x0/0x23c
[<801d4f8c>] nf_hook_slow+0x9c/0x1a4
One way to fix this would be to split helper cleanup from the unregistration
function and invoke it for the H.245 helper, but since ctnetlink needs to be
able to find the helper for synchonization purposes, a better fix is to
register it normally and make sure its not assigned to connections during
helper lookup. The missing l3num initialization is enough for this, this
patch changes it to use AF_UNSPEC to make it more explicit though.
Reported-by: liannan <liannan@twsz.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Properly free h323_buffer when helper registration fails.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>