Commit Graph

1767 Commits

Author SHA1 Message Date
Linus Torvalds 3b59bf0816 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking merge from David Miller:
 "1) Move ixgbe driver over to purely page based buffering on receive.
     From Alexander Duyck.

  2) Add receive packet steering support to e1000e, from Bruce Allan.

  3) Convert TCP MD5 support over to RCU, from Eric Dumazet.

  4) Reduce cpu usage in handling out-of-order TCP packets on modern
     systems, also from Eric Dumazet.

  5) Support the IP{,V6}_UNICAST_IF socket options, making the wine
     folks happy, from Erich Hoover.

  6) Support VLAN trunking from guests in hyperv driver, from Haiyang
     Zhang.

  7) Support byte-queue-limtis in r8169, from Igor Maravic.

  8) Outline code intended for IP_RECVTOS in IP_PKTOPTIONS existed but
     was never properly implemented, Jiri Benc fixed that.

  9) 64-bit statistics support in r8169 and 8139too, from Junchang Wang.

  10) Support kernel side dump filtering by ctmark in netfilter
      ctnetlink, from Pablo Neira Ayuso.

  11) Support byte-queue-limits in gianfar driver, from Paul Gortmaker.

  12) Add new peek socket options to assist with socket migration, from
      Pavel Emelyanov.

  13) Add sch_plug packet scheduler whose queue is controlled by
      userland daemons using explicit freeze and release commands.  From
      Shriram Rajagopalan.

  14) Fix FCOE checksum offload handling on transmit, from Yi Zou."

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1846 commits)
  Fix pppol2tp getsockname()
  Remove printk from rds_sendmsg
  ipv6: fix incorrent ipv6 ipsec packet fragment
  cpsw: Hook up default ndo_change_mtu.
  net: qmi_wwan: fix build error due to cdc-wdm dependecy
  netdev: driver: ethernet: Add TI CPSW driver
  netdev: driver: ethernet: add cpsw address lookup engine support
  phy: add am79c874 PHY support
  mlx4_core: fix race on comm channel
  bonding: send igmp report for its master
  fs_enet: Add MPC5125 FEC support and PHY interface selection
  net: bpf_jit: fix BPF_S_LDX_B_MSH compilation
  net: update the usage of CHECKSUM_UNNECESSARY
  fcoe: use CHECKSUM_UNNECESSARY instead of CHECKSUM_PARTIAL on tx
  net: do not do gso for CHECKSUM_UNNECESSARY in netif_needs_gso
  ixgbe: Fix issues with SR-IOV loopback when flow control is disabled
  net/hyperv: Fix the code handling tx busy
  ixgbe: fix namespace issues when FCoE/DCB is not enabled
  rtlwifi: Remove unused ETH_ADDR_LEN defines
  igbvf: Use ETH_ALEN
  ...

Fix up fairly trivial conflicts in drivers/isdn/gigaset/interface.c and
drivers/net/usb/{Kconfig,qmi_wwan.c} as per David.
2012-03-20 21:04:47 -07:00
Linus Torvalds 9c2b957db1 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf events changes for v3.4 from Ingo Molnar:

 - New "hardware based branch profiling" feature both on the kernel and
   the tooling side, on CPUs that support it.  (modern x86 Intel CPUs
   with the 'LBR' hardware feature currently.)

   This new feature is basically a sophisticated 'magnifying glass' for
   branch execution - something that is pretty difficult to extract from
   regular, function histogram centric profiles.

   The simplest mode is activated via 'perf record -b', and the result
   looks like this in perf report:

	$ perf record -b any_call,u -e cycles:u branchy

	$ perf report -b --sort=symbol
	    52.34%  [.] main                   [.] f1
	    24.04%  [.] f1                     [.] f3
	    23.60%  [.] f1                     [.] f2
	     0.01%  [k] _IO_new_file_xsputn    [k] _IO_file_overflow
	     0.01%  [k] _IO_vfprintf_internal  [k] _IO_new_file_xsputn
	     0.01%  [k] _IO_vfprintf_internal  [k] strchrnul
	     0.01%  [k] __printf               [k] _IO_vfprintf_internal
	     0.01%  [k] main                   [k] __printf

   This output shows from/to branch columns and shows the highest
   percentage (from,to) jump combinations - i.e.  the most likely taken
   branches in the system.  "branches" can also include function calls
   and any other synchronous and asynchronous transitions of the
   instruction pointer that are not 'next instruction' - such as system
   calls, traps, interrupts, etc.

   This feature comes with (hopefully intuitive) flat ascii and TUI
   support in perf report.

 - Various 'perf annotate' visual improvements for us assembly junkies.
   It will now recognize function calls in the TUI and by hitting enter
   you can follow the call (recursively) and back, amongst other
   improvements.

 - Multiple threads/processes recording support in perf record, perf
   stat, perf top - which is activated via a comma-list of PIDs:

	perf top -p 21483,21485
	perf stat -p 21483,21485 -ddd
	perf record -p 21483,21485

 - Support for per UID views, via the --uid paramter to perf top, perf
   report, etc.  For example 'perf top --uid mingo' will only show the
   tasks that I am running, excluding other users, root, etc.

 - Jump label restructurings and improvements - this includes the
   factoring out of the (hopefully much clearer) include/linux/static_key.h
   generic facility:

	struct static_key key = STATIC_KEY_INIT_FALSE;

	...

	if (static_key_false(&key))
	        do unlikely code
	else
	        do likely code

	...
	static_key_slow_inc();
	...
	static_key_slow_inc();
	...

   The static_key_false() branch will be generated into the code with as
   little impact to the likely code path as possible.  the
   static_key_slow_*() APIs flip the branch via live kernel code patching.

   This facility can now be used more widely within the kernel to
   micro-optimize hot branches whose likelihood matches the static-key
   usage and fast/slow cost patterns.

 - SW function tracer improvements: perf support and filtering support.

 - Various hardenings of the perf.data ABI, to make older perf.data's
   smoother on newer tool versions, to make new features integrate more
   smoothly, to support cross-endian recording/analyzing workflows
   better, etc.

 - Restructuring of the kprobes code, the splitting out of 'optprobes',
   and a corner case bugfix.

 - Allow the tracing of kernel console output (printk).

 - Improvements/fixes to user-space RDPMC support, allowing user-space
   self-profiling code to extract PMU counts without performing any
   system calls, while playing nice with the kernel side.

 - 'perf bench' improvements

 - ... and lots of internal restructurings, cleanups and fixes that made
   these features possible.  And, as usual this list is incomplete as
   there were also lots of other improvements

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (120 commits)
  perf report: Fix annotate double quit issue in branch view mode
  perf report: Remove duplicate annotate choice in branch view mode
  perf/x86: Prettify pmu config literals
  perf report: Enable TUI in branch view mode
  perf report: Auto-detect branch stack sampling mode
  perf record: Add HEADER_BRANCH_STACK tag
  perf record: Provide default branch stack sampling mode option
  perf tools: Make perf able to read files from older ABIs
  perf tools: Fix ABI compatibility bug in print_event_desc()
  perf tools: Enable reading of perf.data files from different ABI rev
  perf: Add ABI reference sizes
  perf report: Add support for taken branch sampling
  perf record: Add support for sampling taken branch
  perf tools: Add code to support PERF_SAMPLE_BRANCH_STACK
  x86/kprobes: Split out optprobe related code to kprobes-opt.c
  x86/kprobes: Fix a bug which can modify kernel code permanently
  x86/kprobes: Fix instruction recovery on optimized path
  perf: Add callback to flush branch_stack on context switch
  perf: Disable PERF_SAMPLE_BRANCH_* when not supported
  perf/x86: Add LBR software filter support for Intel CPUs
  ...
2012-03-20 10:29:15 -07:00
David S. Miller 4da0bd7365 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-03-18 23:29:41 -04:00
Pablo Neira Ayuso a16a1647fa netfilter: ctnetlink: fix race between delete and timeout expiration
Kerin Millar reported hardlockups while running `conntrackd -c'
in a busy firewall. That system (with several processors) was
acting as backup in a primary-backup setup.

After several tries, I found a race condition between the deletion
operation of ctnetlink and timeout expiration. This patch fixes
this problem.

Tested-by: Kerin Millar <kerframil@gmail.com>
Reported-by: Kerin Millar <kerframil@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-17 01:47:08 -07:00
Ingo Molnar 35239e23c6 Merge branch 'perf/urgent' into perf/core
Merge reason: We are going to queue up a dependent patch.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-12 20:44:11 +01:00
David S. Miller b2d3298e09 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-03-09 14:34:20 -08:00
Pablo Neira Ayuso 24de58f465 netfilter: xt_CT: allow to attach timeout policy + glue code
This patch allows you to attach the timeout policy via the
CT target, it adds a new revision of the target to ensure
backward compatibility. Moreover, it also contains the glue
code to stick the timeout object defined via nfnetlink_cttimeout
to the given flow.

Example usage (it requires installing the nfct tool and
libnetfilter_cttimeout):

1) create the timeout policy:

 nfct timeout add tcp-policy0 inet tcp \
	established 1000 close 10 time_wait 10 last_ack 10

2) attach the timeout policy to the packet:

 iptables -I PREROUTING -t raw -p tcp -j CT --timeout tcp-policy0

You have to install the following user-space software:

a) libnetfilter_cttimeout:
   git://git.netfilter.org/libnetfilter_cttimeout

b) nfct:
   git://git.netfilter.org/nfct

You also have to get iptables with -j CT --timeout support.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-03-07 17:41:28 +01:00
Pablo Neira Ayuso dd70507241 netfilter: nf_ct_ext: add timeout extension
This patch adds the timeout extension, which allows you to attach
specific timeout policies to flows.

This extension is only used by the template conntrack.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-03-07 17:41:25 +01:00
Pablo Neira Ayuso 5097846230 netfilter: add cttimeout infrastructure for fine timeout tuning
This patch adds the infrastructure to add fine timeout tuning
over nfnetlink. Now you can use the NFNL_SUBSYS_CTNETLINK_TIMEOUT
subsystem to create/delete/dump timeout objects that contain some
specific timeout policy for one flow.

The follow up patches will allow you attach timeout policy object
to conntrack via the CT target and the conntrack extension
infrastructure.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-03-07 17:41:22 +01:00
Pablo Neira Ayuso 2c8503f55f netfilter: nf_conntrack: pass timeout array to l4->new and l4->packet
This patch defines a new interface for l4 protocol trackers:

unsigned int *(*get_timeouts)(struct net *net);

that is used to return the array of unsigned int that contains
the timeouts that will be applied for this flow. This is passed
to the l4proto->new(...) and l4proto->packet(...) functions to
specify the timeout policy.

This interface allows per-net global timeout configuration
(although only DCCP supports this by now) and it will allow
custom custom timeout configuration by means of follow-up
patches.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-03-07 17:41:19 +01:00
Pablo Neira Ayuso b888341c7f netfilter: nf_ct_gre: add unsigned int array to define timeouts
This patch adds an array to define the default GRE timeouts.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-03-07 17:41:16 +01:00
Pablo Neira Ayuso 33ee44643f netfilter: nf_ct_tcp: move retransmission and unacknowledged timeout to array
This patch moves the retransmission and unacknowledged timeouts
to the tcp_timeouts array. This change is required by follow-up
patches.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-03-07 17:41:15 +01:00
Pablo Neira Ayuso 5a41db94c6 netfilter: nf_ct_udp[lite]: convert UDP[lite] timeouts to array
Use one array to store the UDP timeouts instead of two variables.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-03-07 17:41:13 +01:00
Hans Schillstrom 3b988ece9b netfilter: ctnetlink: fix lockep splats
net/netfilter/nf_conntrack_proto.c:70 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

rcu_scheduler_active = 1, debug_locks = 0
3 locks held by conntrack/3235:
nfnl_lock+0x17/0x20
netlink_dump+0x32/0x240
ctnetlink_dump_table+0x3e/0x170 [nf_conntrack_netlink]

stack backtrace:
Pid: 3235, comm: conntrack Tainted: G W  3.2.0+ #511
Call Trace:
[<ffffffff8108ce45>] lockdep_rcu_suspicious+0xe5/0x100
[<ffffffffa00ec6e1>] __nf_ct_l4proto_find+0x81/0xb0 [nf_conntrack]
[<ffffffffa0115675>] ctnetlink_fill_info+0x215/0x5f0 [nf_conntrack_netlink]
[<ffffffffa0115dc1>] ctnetlink_dump_table+0xd1/0x170 [nf_conntrack_netlink]
[<ffffffff815fbdbf>] netlink_dump+0x7f/0x240
[<ffffffff81090f9d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffff815fd34f>] netlink_dump_start+0xdf/0x190
[<ffffffffa0111490>] ? ctnetlink_change_nat_seq_adj+0x160/0x160 [nf_conntrack_netlink]
[<ffffffffa0115cf0>] ? ctnetlink_get_conntrack+0x2a0/0x2a0 [nf_conntrack_netlink]
[<ffffffffa0115ad9>] ctnetlink_get_conntrack+0x89/0x2a0 [nf_conntrack_netlink]
[<ffffffff81603a47>] nfnetlink_rcv_msg+0x467/0x5f0
[<ffffffff81603a7c>] ? nfnetlink_rcv_msg+0x49c/0x5f0
[<ffffffff81603922>] ? nfnetlink_rcv_msg+0x342/0x5f0
[<ffffffff81071b21>] ? get_parent_ip+0x11/0x50
[<ffffffff816035e0>] ? nfnetlink_subsys_register+0x60/0x60
[<ffffffff815fed49>] netlink_rcv_skb+0xa9/0xd0
[<ffffffff81603475>] nfnetlink_rcv+0x15/0x20
[<ffffffff815fe70e>] netlink_unicast+0x1ae/0x1f0
[<ffffffff815fea16>] netlink_sendmsg+0x2c6/0x320
[<ffffffff815b2a87>] sock_sendmsg+0x117/0x130
[<ffffffff81125093>] ? might_fault+0x53/0xb0
[<ffffffff811250dc>] ? might_fault+0x9c/0xb0
[<ffffffff81125093>] ? might_fault+0x53/0xb0
[<ffffffff815b5991>] ? move_addr_to_kernel+0x71/0x80
[<ffffffff815b644e>] sys_sendto+0xfe/0x130
[<ffffffff815b5c94>] ? sys_bind+0xb4/0xd0
[<ffffffff817a8a0e>] ? retint_swapgs+0xe/0x13
[<ffffffff817afcd2>] system_call_fastpath+0x16/0x1b

Reported-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
2012-03-07 17:41:10 +01:00
Richard Weinberger 417e02bf42 netfilter: xt_LOG: fix bogus extra layer-4 logging information
In 16059b5 netfilter: merge ipt_LOG and ip6_LOG into xt_LOG, we have
merged ipt_LOG and ip6t_LOG.

However:

IN=wlan0 OUT= MAC=xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx
SRC=213.150.61.61 DST=192.168.1.133 LEN=40 TOS=0x00 PREC=0x00 TTL=117
ID=10539 DF PROTO=TCP SPT=80 DPT=49013 WINDOW=0 RES=0x00 ACK RST
URGP=0 PROTO=UDPLITE SPT=80 DPT=49013 LEN=45843 PROTO=ICMP TYPE=0
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Several missing break in the code led to including bogus layer-4
information. This patch fixes this problem.

Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-03-07 17:40:59 +01:00
Tony Zelenoff 58020f7761 netfilter: nf_ct_ecache: refactor nf_ct_deliver_cached_events
* identation lowered
* some CPU cycles saved at delayed item variable initialization

Signed-off-by: Tony Zelenoff <antonz@parallels.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-03-07 17:40:53 +01:00
Tony Zelenoff 93326ae312 netfilter: nf_ct_ecache: trailing whitespace removed
Signed-off-by: Tony Zelenoff <antonz@parallels.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-03-07 17:40:51 +01:00
Richard Weinberger 6939c33a75 netfilter: merge ipt_LOG and ip6_LOG into xt_LOG
ipt_LOG and ip6_LOG have a lot of common code, merge them
to reduce duplicate code.

Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-03-07 17:40:49 +01:00
Pablo Neira Ayuso 544d5c7d9f netfilter: ctnetlink: allow to set expectfn for expectations
This patch allows you to set expectfn which is specifically used
by the NAT side of most of the existing conntrack helpers.

I have added a symbol map that uses a string as key to look up for
the function that is attached to the expectation object. This is
the best solution I came out with to solve this issue.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-03-07 17:40:46 +01:00
Pablo Neira Ayuso 076a0ca026 netfilter: ctnetlink: add NAT support for expectations
This patch adds the missing bits to create expectations that
are created in NAT setups.
2012-03-07 17:40:44 +01:00
Pablo Neira Ayuso b8c5e52c13 netfilter: ctnetlink: allow to set expectation class
This patch allows you to set the expectation class.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-03-07 17:40:42 +01:00
Pablo Neira Ayuso 660fdb2a0f netfilter: ctnetlink: allow to set helper for new expectations
This patch allow you to set the helper for newly created
expectations based of the CTA_EXPECT_HELP_NAME attribute.
Before this, the helper set was NULL.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-03-07 17:40:40 +01:00
Jozsef Kadlecsik 2a7cef2a4b netfilter: ipset: Exceptions support added to hash:*net* types
The "nomatch" keyword and option is added to the hash:*net* types,
by which one can add exception entries to sets. Example:

        ipset create test hash:net
        ipset add test 192.168.0/24
        ipset add test 192.168.0/30 nomatch

In this case the IP addresses from 192.168.0/24 except 192.168.0/30
match the elements of the set.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-03-07 17:40:35 +01:00
Jan Engelhardt c15f1c8325 netfilter: ipset: use NFPROTO_ constants
ipset is actually using NFPROTO values rather than AF (xt_set passes
that along).

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-03-07 17:40:29 +01:00
Pablo Neira Ayuso 7413851197 netfilter: nf_conntrack: fix early_drop with reliable event delivery
If reliable event delivery is enabled and ctnetlink fails to deliver
the destroy event in early_drop, the conntrack subsystem cannot
drop any the candidate flow that was planned to be evicted.

Reported-by: Kerin Millar <kerframil@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-06 14:43:50 -05:00
Pablo Neira Ayuso 8be619d1e4 netfilter: ctnetlink: remove incorrect spin_[un]lock_bh on NAT module autoload
Since 7d367e0, ctnetlink_new_conntrack is called without holding
the nf_conntrack_lock spinlock. Thus, ctnetlink_parse_nat_setup
does not require to release that spinlock anymore in the NAT module
autoload case.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-06 14:43:49 -05:00
Ingo Molnar 737f24bda7 Merge branch 'perf/urgent' into perf/core
Conflicts:
	tools/perf/builtin-record.c
	tools/perf/builtin-top.c
	tools/perf/perf.h
	tools/perf/util/top.h

Merge reason: resolve these cherry-picking conflicts.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-05 09:20:08 +01:00
David S. Miller ff4783ce78 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/sfc/rx.c

Overlapping changes in drivers/net/ethernet/sfc/rx.c, one to change
the rx_buf->is_page boolean into a set of u16 flags, and another to
adjust how ->ip_summed is initialized.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-26 21:55:51 -05:00
Pablo Neira Ayuso 0f298a285f netfilter: ctnetlink: support kernel-space dump filtering by ctmark
This patch adds CTA_MARK_MASK which, together with CTA_MARK, allows
you to selectively send conntrack entries to user-space by
returning those that match mark & mask.

With this, we can save cycles in the building and the parsing of
the entries that may be later on filtered out in user-space by using
the ctmark & mask.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-26 14:12:33 -05:00
Pablo Neira Ayuso 80d326fab5 netlink: add netlink_dump_control structure for netlink_dump_start()
Davem considers that the argument list of this interface is getting
out of control. This patch tries to address this issue following
his proposal:

struct netlink_dump_control c = { .dump = dump, .done = done, ... };

netlink_dump_start(..., &c);

Suggested by David S. Miller.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-26 14:10:06 -05:00
Jozsef Kadlecsik 7d367e0668 netfilter: ctnetlink: fix soft lockup when netlink adds new entries (v2)
Marcell Zambo and Janos Farago noticed and reported that when
new conntrack entries are added via netlink and the conntrack table
gets full, soft lockup happens. This is because the nf_conntrack_lock
is held while nf_conntrack_alloc is called, which is in turn wants
to lock nf_conntrack_lock while evicting entries from the full table.

The patch fixes the soft lockup with limiting the holding of the
nf_conntrack_lock to the minimum, where it's absolutely required.
It required to extend (and thus change) nf_conntrack_hash_insert
so that it makes sure conntrack and ctnetlink do not add the same entry
twice to the conntrack table.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-02-24 12:24:15 +01:00
Pablo Neira Ayuso 279072882d Revert "netfilter: ctnetlink: fix soft lockup when netlink adds new entries"
This reverts commit af14cca162.

This patch contains a race condition between packets and ctnetlink
in the conntrack addition. A new patch to fix this issue follows up.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-02-24 12:19:57 +01:00
Ingo Molnar c5905afb0e static keys: Introduce 'struct static_key', static_key_true()/false() and static_key_slow_[inc|dec]()
So here's a boot tested patch on top of Jason's series that does
all the cleanups I talked about and turns jump labels into a
more intuitive to use facility. It should also address the
various misconceptions and confusions that surround jump labels.

Typical usage scenarios:

        #include <linux/static_key.h>

        struct static_key key = STATIC_KEY_INIT_TRUE;

        if (static_key_false(&key))
                do unlikely code
        else
                do likely code

Or:

        if (static_key_true(&key))
                do likely code
        else
                do unlikely code

The static key is modified via:

        static_key_slow_inc(&key);
        ...
        static_key_slow_dec(&key);

The 'slow' prefix makes it abundantly clear that this is an
expensive operation.

I've updated all in-kernel code to use this everywhere. Note
that I (intentionally) have not pushed through the rename
blindly through to the lowest levels: the actual jump-label
patching arch facility should be named like that, so we want to
decouple jump labels from the static-key facility a bit.

On non-jump-label enabled architectures static keys default to
likely()/unlikely() branches.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Jason Baron <jbaron@redhat.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: a.p.zijlstra@chello.nl
Cc: mathieu.desnoyers@efficios.com
Cc: davem@davemloft.net
Cc: ddaney.cavm@gmail.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20120222085809.GA26397@elte.hu
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-02-24 10:05:59 +01:00
David S. Miller 4a2258dddd Merge branch 'nf' of git://1984.lsi.us.es/net 2012-02-23 00:20:14 -05:00
RongQing.Li 5d38b1f8cf netfilter: ip6_route_output() never returns NULL.
ip6_route_output() never returns NULL, so it is wrong to
check if the return value is NULL.

Signed-off-by: RongQing.Li <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-22 15:30:15 -05:00
Jozsef Kadlecsik af14cca162 netfilter: ctnetlink: fix soft lockup when netlink adds new entries
Marcell Zambo and Janos Farago noticed and reported that when
new conntrack entries are added via netlink and the conntrack table
gets full, soft lockup happens. This is because the nf_conntrack_lock
is held while nf_conntrack_alloc is called, which is in turn wants
to lock nf_conntrack_lock while evicting entries from the full table.

The patch fixes the soft lockup with limiting the holding of the
nf_conntrack_lock to the minimum, where it's absolutely required.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-02-21 16:25:31 +01:00
Florian Westphal a8db7b2d19 netfilter: nf_queue: fix queueing of bridged gro skbs
When trying to nf_queue GRO/GSO skbs, nf_queue uses skb_gso_segment
to split the skb.

However, if nf_queue is called via bridge netfilter, the mac header
won't be preserved -- packets will thus contain a bogus mac header.

Fix this by setting skb->data to the mac header when skb->nf_bridge
is set and restoring skb->data afterwards for all segments.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-02-09 20:47:53 +01:00
Simon Horman e0aac52e17 ipvs: fix matching of fwmark templates during scheduling
Commit f11017ec2d (2.6.37)
moved the fwmark variable in subcontext that is invalidated before
reaching the ip_vs_ct_in_get call. As vaddr is provided as pointer
in the param structure make sure the fwmark variable is in
same context. As the fwmark templates can not be matched,
more and more template connections are created and the
controlled connections can not go to single real server.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Cc: stable@vger.kernel.org
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-02-04 20:27:58 +01:00
Linus Torvalds ccb19d263f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (47 commits)
  tg3: Fix single-vector MSI-X code
  openvswitch: Fix multipart datapath dumps.
  ipv6: fix per device IP snmp counters
  inetpeer: initialize ->redirect_genid in inet_getpeer()
  net: fix NULL-deref in WARN() in skb_gso_segment()
  net: WARN if skb_checksum_help() is called on skb requiring segmentation
  caif: Remove bad WARN_ON in caif_dev
  caif: Fix typo in Vendor/Product-ID for CAIF modems
  bnx2x: Disable AN KR work-around for BCM57810
  bnx2x: Remove AutoGrEEEn for BCM84833
  bnx2x: Remove 100Mb force speed for BCM84833
  bnx2x: Fix PFC setting on BCM57840
  bnx2x: Fix Super-Isolate mode for BCM84833
  net: fix some sparse errors
  net: kill duplicate included header
  net: sh-eth: Fix build error by the value which is not defined
  net: Use device model to get driver name in skb_gso_segment()
  bridge: BH already disabled in br_fdb_cleanup()
  net: move sock_update_memcg outside of CONFIG_INET
  mwl8k: Fixing Sparse ENDIAN CHECK warning
  ...
2012-01-17 22:26:41 -08:00
Jozsef Kadlecsik be94db9dda netfilter: ipset: dumping error triggered removing references twice
If there was a dumping error in the middle, the set-specific variable was
not zeroed out and thus the 'done' function of the dumping wrongly tried
to release the already released reference of the set. The already released
reference was caught by __ip_set_put and triggered a kernel BUG message.
Reported by Jean-Philippe Menil.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-01-17 10:52:55 +01:00
Jozsef Kadlecsik 088067f4f1 netfilter: ipset: autoload set type modules safely
Jan Engelhardt noticed when userspace requests a set type unknown
to the kernel, it can lead to a loop due to the unsafe type module
loading. The issue is fixed in this patch.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-01-17 10:52:46 +01:00
Pablo Neira Ayuso 9bf04646b0 netfilter: revert user-space expectation helper support
This patch partially reverts:
3d058d7 netfilter: rework user-space expectation helper support
that was applied during the 3.2 development cycle.

After this patch, the tree remains just like before patch bc01bef,
that initially added the preliminary infrastructure.

I decided to partially revert this patch because the approach
that I proposed to resolve this problem is broken in NAT setups.
Moreover, a new infrastructure will be submitted for the 3.3.x
development cycle that resolve the existing issues while
providing a neat solution.

Since nobody has been seriously using this infrastructure in
user-space, the removal of this feature should affect any know
FOSS project (to my knowledge).

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-01-16 14:01:23 +01:00
Stephen Rothwell 412662d204 netfilter: xt_hashlimit: fix unused variable warning if IPv6 disabled
Fixes this warning when CONFIG_IP6_NF_IPTABLES is not enabled:

net/netfilter/xt_hashlimit.c: In function ‘hashlimit_init_dst’:
net/netfilter/xt_hashlimit.c:448:9: warning: unused variable ‘frag_off’ [-Wunused-variable]

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-01-16 13:40:54 +01:00
Linus Torvalds c49c41a413 Merge branch 'for-linus' of git://selinuxproject.org/~jmorris/linux-security
* 'for-linus' of git://selinuxproject.org/~jmorris/linux-security:
  capabilities: remove __cap_full_set definition
  security: remove the security_netlink_recv hook as it is equivalent to capable()
  ptrace: do not audit capability check when outputing /proc/pid/stat
  capabilities: remove task_ns_* functions
  capabitlies: ns_capable can use the cap helpers rather than lsm call
  capabilities: style only - move capable below ns_capable
  capabilites: introduce new has_ns_capabilities_noaudit
  capabilities: call has_ns_capability from has_capability
  capabilities: remove all _real_ interfaces
  capabilities: introduce security_capable_noaudit
  capabilities: reverse arguments to security_capable
  capabilities: remove the task from capable LSM hook entirely
  selinux: sparse fix: fix several warnings in the security server cod
  selinux: sparse fix: fix warnings in netlink code
  selinux: sparse fix: eliminate warnings for selinuxfs
  selinux: sparse fix: declare selinux_disable() in security.h
  selinux: sparse fix: move selinux_complete_init
  selinux: sparse fix: make selinux_secmark_refcount static
  SELinux: Fix RCU deref check warning in sel_netport_insert()

Manually fix up a semantic mis-merge wrt security_netlink_recv():

 - the interface was removed in commit fd77846152 ("security: remove
   the security_netlink_recv hook as it is equivalent to capable()")

 - a new user of it appeared in commit a38f7907b9 ("crypto: Add
   userspace configuration API")

causing no automatic merge conflict, but Eric Paris pointed out the
issue.
2012-01-14 18:36:33 -08:00
Eric Dumazet cf778b00e9 net: reintroduce missing rcu_assign_pointer() calls
commit a9b3cd7f32 (rcu: convert uses of rcu_assign_pointer(x, NULL) to
RCU_INIT_POINTER) did a lot of incorrect changes, since it did a
complete conversion of rcu_assign_pointer(x, y) to RCU_INIT_POINTER(x,
y).

We miss needed barriers, even on x86, when y is not NULL.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Stephen Hemminger <shemminger@vyatta.com>
CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-12 12:26:56 -08:00
Linus Torvalds 98793265b4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (53 commits)
  Kconfig: acpi: Fix typo in comment.
  misc latin1 to utf8 conversions
  devres: Fix a typo in devm_kfree comment
  btrfs: free-space-cache.c: remove extra semicolon.
  fat: Spelling s/obsolate/obsolete/g
  SCSI, pmcraid: Fix spelling error in a pmcraid_err() call
  tools/power turbostat: update fields in manpage
  mac80211: drop spelling fix
  types.h: fix comment spelling for 'architectures'
  typo fixes: aera -> area, exntension -> extension
  devices.txt: Fix typo of 'VMware'.
  sis900: Fix enum typo 'sis900_rx_bufer_status'
  decompress_bunzip2: remove invalid vi modeline
  treewide: Fix comment and string typo 'bufer'
  hyper-v: Update MAINTAINERS
  treewide: Fix typos in various parts of the kernel, and fix some comments.
  clockevents: drop unknown Kconfig symbol GENERIC_CLOCKEVENTS_MIGR
  gpio: Kconfig: drop unknown symbol 'CS5535_GPIO'
  leds: Kconfig: Fix typo 'D2NET_V2'
  sound: Kconfig: drop unknown symbol ARCH_CLPS7500
  ...

Fix up trivial conflicts in arch/powerpc/platforms/40x/Kconfig (some new
kconfig additions, close to removed commented-out old ones)
2012-01-08 13:21:22 -08:00
Eric Paris fd77846152 security: remove the security_netlink_recv hook as it is equivalent to capable()
Once upon a time netlink was not sync and we had to get the effective
capabilities from the skb that was being received.  Today we instead get
the capabilities from the current task.  This has rendered the entire
purpose of the hook moot as it is now functionally equivalent to the
capable() call.

Signed-off-by: Eric Paris <eparis@redhat.com>
2012-01-05 18:53:01 -05:00
David S. Miller 455ffa607f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-01-02 18:56:49 -05:00
Pablo Neira Ayuso 3ab0b245aa netfilter: nfnetlink_acct: fix nfnl_acct_get operation
The get operation was not sending the message that was built to
user-space. This patch also includes the appropriate handling for
the return value of netlink_unicast().

Moreover, fix error codes on error (for example, for non-existing
entry was uncorrect).

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-01-01 16:36:08 +01:00
Xi Wang c121638277 netfilter: ctnetlink: fix timeout calculation
The sanity check (timeout < 0) never works; the dividend is unsigned
and so is the division, which should have been a signed division.

	long timeout = (ct->timeout.expires - jiffies) / HZ;
	if (timeout < 0)
		timeout = 0;

This patch converts the time values to signed for the division.

Signed-off-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-12-31 16:59:04 +01:00