So that it is also an offset from skb->head, reduces its size from 8 to 4 bytes
on 64bit architectures, allowing us to combine the 4 bytes hole left by the
layer headers conversion, reducing struct sk_buff size to 256 bytes, i.e. 4
64byte cachelines, and since the sk_buff slab cache is SLAB_HWCACHE_ALIGN...
:-)
Many calculations that previously required that skb->{transport,network,
mac}_header be first converted to a pointer now can be done directly, being
meaningful as offsets or pointers.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use nfattr_parse to parse attributes, this patch also modifies the default
behaviour since unknown attributes will be ignored instead of returning
EINVAL. This ensure backward compatibility: new libraries with new
attributes and old kernels can work.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch let userspace programs set the IP_CT_TCP_BE_LIBERAL flag to
force the pickup of established connections.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The PUSH flag is accepted with every other valid combination.
Let's get it out of the tcp_valid_flags table and reduce the
number of combinations we have to handle. This does not
significantly reduce the table size however (8 bytes).
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This combination has been encountered on an IBM AS/400 in response
to packets sent to a closed session. There is no particular reason
to mark it invalid.
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now it uses jhash, but using jhash2 would be around 3-4 times faster
(on P4).
Signed-off-by: Sami Farin <safari-netfilter@safari.iki.fi>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
subsys_table is initialized to NULL, therefore just returns NULL in case
that it is not set.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove nfnetlink_check_attributes duplicates message size and callback
id checks. nfnetlink_find_client and nfnetlink_rcv_msg already do
such checks.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The retrying after an allocation failure is not necessary anymore
since we're holding the mutex the entire time, for the same
reason the double allocation race can't happen anymore.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that we don't use nf_conntrack_lock anymore but a single mutex for
all protocol handling, no need to release and grab it again for sysctl
registration.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove ugly special-casing of nf_conntrack_l4proto_generic, all it
wants is its sysctl tables registered, so do that explicitly in an
init function and move the remaining protocol initialization and
cleanup code to nf_conntrack_proto.c as well.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The protocol lookups done by nf_conntrack are already protected by RCU,
there is no need to keep taking nf_conntrack_lock for registration
and unregistration. Switch to a mutex.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the obsolete IPv4 only connection tracking/NAT as scheduled in
feature-removal-schedule.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove xt_proto_prefix array which duplicates xt_prefix and change all
users of xt_proto_prefix to xt_prefix.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now the skb->nh union has just one member, .raw, i.e. it is just like the
skb->mac union, strange, no? I'm just leaving it like that till the transport
layer is done with, when we'll rename skb->mac.raw to skb->mac_header (or
->mac_header_offset?), ditto for ->{h,nh}.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For the places where we need a pointer to the network header, it is still legal
to touch skb->nh.raw directly if just adding to, subtracting from or setting it
to another layer header.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For the quite common 'skb->nh.raw - skb->data' sequence.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For the places where we need a pointer to the mac header, it is still legal to
touch skb->mac.raw directly if just adding to, subtracting from or setting it
to another layer header.
This one also converts some more cases to skb_reset_mac_header() that my
regex missed as it had no spaces before nor after '=', ugh.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We currently use a special structure (struct skb_timeval) and plain
'struct timeval' to store packet timestamps in sk_buffs and struct
sock.
This has some drawbacks :
- Fixed resolution of micro second.
- Waste of space on 64bit platforms where sizeof(struct timeval)=16
I suggest using ktime_t that is a nice abstraction of high resolution
time services, currently capable of nanosecond resolution.
As sizeof(ktime_t) is 8 bytes, using ktime_t in 'struct sock' permits
a 8 byte shrink of this structure on 64bit architectures. Some other
structures also benefit from this size reduction (struct ipq in
ipv4/ip_fragment.c, struct frag_queue in ipv6/reassembly.c, ...)
Once this ktime infrastructure adopted, we can more easily provide
nanosecond resolution on top of it. (ioctl SIOCGSTAMPNS and/or
SO_TIMESTAMPNS/SCM_TIMESTAMPNS)
Note : this patch includes a bug correction in
compat_sock_get_timestamp() where a "err = 0;" was missing (so this
syscall returned -ENOENT instead of 0)
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
CC: Stephen Hemminger <shemminger@linux-foundation.org>
CC: John find <linux.kernel@free.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Here is the current version of the 64 bit divide common code.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
NF_CT_NETLINK=y, NF_NAT=m results in:
LD .tmp_vmlinux1
net/built-in.o: dans la fonction « nfnetlink_parse_nat_proto »:
nf_conntrack_netlink.c:(.text+0x28db9): référence indéfinie vers « nf_nat_proto_find_get »
nf_conntrack_netlink.c:(.text+0x28dd6): référence indéfinie vers « nf_nat_proto_put »
net/built-in.o: dans la fonction « ctnetlink_new_conntrack »:
nf_conntrack_netlink.c:(.text+0x29959): référence indéfinie vers « nf_nat_setup_info »
nf_conntrack_netlink.c:(.text+0x29b35): référence indéfinie vers « nf_nat_setup_info »
nf_conntrack_netlink.c:(.text+0x29cf7): référence indéfinie vers « nf_nat_setup_info »
nf_conntrack_netlink.c:(.text+0x29de2): référence indéfinie vers « nf_nat_setup_info »
make: *** [.tmp_vmlinux1] Erreur 1
Reported by Kevin Baradon <kevin.baradon@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
physoutdev is only set on purely bridged packet, when nfnetlink_log is used
in the OUTPUT/FORWARD/POSTROUTING hooks on packets forwarded from or to a
bridge it crashes when trying to dereference skb->nf_bridge->physoutdev.
Reported by Holger Eitzenberger <heitzenberger@astaro.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Userspace expects a zero-terminated string, so include the trailing
zero in the netlink message.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix reference counting (memory leak) problem in __nfulnl_send() and callers
related to packet queueing.
Signed-off-by: Michal Miroslaw <mirq-linux@rere.qmqm.pl>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Count module references correctly: after instance_destroy() there
might be timer pending and holding a reference for this netlink instance.
Based on patch by Michal Miroslaw <mirq-linux@rere.qmqm.pl>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eliminate possible NULL pointer dereference in nfulnl_recv_config().
Signed-off-by: Michal Miroslaw <mirq-linux@rere.qmqm.pl>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paranoia: instance_put() might have freed the inst pointer when we
spin_unlock_bh().
Signed-off-by: Michal Miroslaw <mirq-linux@rere.qmqm.pl>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stop reference leaking in nfulnl_log_packet(). If we start a timer we
are already taking another reference.
Signed-off-by: Michal Miroslaw <mirq-linux@rere.qmqm.pl>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some stacks apparently send packets with SYN|URG set. Linux accepts
these packets, so TCP conntrack should to.
Pointed out by Martijn Posthuma <posthuma@sangine.com>.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The nf_conntrack_netlink config option is named CONFIG_NF_CT_NETLINK,
but multiple files use CONFIG_IP_NF_CONNTRACK_NETLINK or
CONFIG_NF_CONNTRACK_NETLINK for ifdefs.
Fix this and reformat all CONFIG_NF_CT_NETLINK ifdefs to only use a line.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix {nf,ip}_ct_iterate_cleanup unconfirmed list handling:
- unconfirmed entries can not be killed manually, they are removed on
confirmation or final destruction of the conntrack entry, which means
we might iterate forever without making forward progress.
This can happen in combination with the conntrack event cache, which
holds a reference to the conntrack entry, which is only released when
the packet makes it all the way through the stack or a different
packet is handled.
- taking references to an unconfirmed entry and using it outside the
locked section doesn't work, the list entries are not refcounted and
another CPU might already be waiting to destroy the entry
What the code really wants to do is make sure the references of the hash
table to the selected conntrack entries are released, so they will be
destroyed once all references from skbs and the event cache are dropped.
Since unconfirmed entries haven't even entered the hash yet, simply mark
them as dying and skip confirmation based on that.
Reported and tested by Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
ctnetlink uses netlink_unicast from an atomic_notifier_chain
(which is called within a RCU read side critical section)
without holding further locks. netlink_unicast calls netlink_trim
with the result of gfp_any() for the gfp flags, which are passed
down to pskb_expand_header. gfp_any() only checks for softirq
context and returns GFP_KERNEL, resulting in this warning:
BUG: sleeping function called from invalid context at mm/slab.c:3032
in_atomic():1, irqs_disabled():0
no locks held by rmmod/7010.
Call Trace:
[<ffffffff8109467f>] debug_show_held_locks+0x9/0xb
[<ffffffff8100b0b4>] __might_sleep+0xd9/0xdb
[<ffffffff810b5082>] __kmalloc+0x68/0x110
[<ffffffff811ba8f2>] pskb_expand_head+0x4d/0x13b
[<ffffffff81053147>] netlink_broadcast+0xa5/0x2e0
[<ffffffff881cd1d7>] :nfnetlink:nfnetlink_send+0x83/0x8a
[<ffffffff8834f6a6>] :nf_conntrack_netlink:ctnetlink_conntrack_event+0x94c/0x96a
[<ffffffff810624d6>] notifier_call_chain+0x29/0x3e
[<ffffffff8106251d>] atomic_notifier_call_chain+0x32/0x60
[<ffffffff881d266d>] :nf_conntrack:destroy_conntrack+0xa5/0x1d3
[<ffffffff881d194e>] :nf_conntrack:nf_ct_cleanup+0x8c/0x12c
[<ffffffff881d4614>] :nf_conntrack:kill_l3proto+0x0/0x13
[<ffffffff881d482a>] :nf_conntrack:nf_conntrack_l3proto_unregister+0x90/0x94
[<ffffffff883551b3>] :nf_conntrack_ipv4:nf_conntrack_l3proto_ipv4_fini+0x2b/0x5d
[<ffffffff8109d44f>] sys_delete_module+0x1b5/0x1e6
[<ffffffff8105f245>] trace_hardirqs_on_thunk+0x35/0x37
[<ffffffff8105911e>] system_call+0x7e/0x83
Since netlink_unicast is supposed to be callable from within RCU
read side critical sections, make gfp_any() check for in_atomic()
instead of in_softirq().
Additionally nfnetlink_send needs to use gfp_any() as well for the
call to netlink_broadcast).
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The semantic effect of insert_at_head is that it would allow new registered
sysctl entries to override existing sysctl entries of the same name. Which is
pain for caching and the proc interface never implemented.
I have done an audit and discovered that none of the current users of
register_sysctl care as (excpet for directories) they do not register
duplicate sysctl entries.
So this patch simply removes the support for overriding existing entries in
the sys_sysctl interface since no one uses it or cares and it makes future
enhancments harder.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Corey Minyard <minyard@acm.org>
Cc: Neil Brown <neilb@suse.de>
Cc: "John W. Linville" <linville@tuxdriver.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Jan Kara <jack@ucw.cz>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Cc: David Chinner <dgc@sgi.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
After Al Viro (finally) succeeded in removing the sched.h #include in module.h
recently, it makes sense again to remove other superfluous sched.h includes.
There are quite a lot of files which include it but don't actually need
anything defined in there. Presumably these includes were once needed for
macros that used to live in sched.h, but moved to other header files in the
course of cleaning it up.
To ease the pain, this time I did not fiddle with any header files and only
removed #includes from .c-files, which tend to cause less trouble.
Compile tested against 2.6.20-rc2 and 2.6.20-rc2-mm2 (with offsets) on alpha,
arm, i386, ia64, mips, powerpc, and x86_64 with allnoconfig, defconfig,
allmodconfig, and allyesconfig as well as a few randconfigs on x86_64 and all
configs in arch/arm/configs on arm. I also checked that no new warnings were
introduced by the patch (actually, some warnings are removed that were emitted
by unnecessarily included header files).
Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
sysctls are registered by the protocol module itself since 2.6.19, no need
to have them visible to others.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of depending on internally needed options and letting users
figure out what is needed, select them when needed:
- IP_NF_IPTABLES, IP_NF_ARPTABLES and IP6_NF_IPTABLES select
NETFILTER_XTABLES
- NETFILTER_XT_TARGET_CONNMARK, NETFILTER_XT_MATCH_CONNMARK and
IP_NF_TARGET_CLUSTERIP select NF_CONNTRACK_MARK
- NETFILTER_XT_MATCH_CONNBYTES selects NF_CT_ACCT
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use NF_IP6_ instead of NF_IP_. The values are identical, this is merely
cleanup.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
No caller checks the return value, and since its usually called within the
module unload path there's nothing a module could do about errors anyway,
so BUG on invalid conditions and return void.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
NF_CT_STAT_INC assumes rcu_read_lock in nf_hook_slow disables
preemption as well, making it legal to use __get_cpu_var without
disabling preemption manually. The assumption is not correct anymore
with preemptable RCU, additionally we need to protect against softirqs
when not holding nf_conntrack_lock.
Add NF_CT_STAT_INC_ATOMIC macro, which disables local softirqs,
and use where necessary.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>