Commit Graph

148 Commits

Author SHA1 Message Date
Jesper Dangaard Brouer 54d83efa44 ipvs: fix build errors related to config option combinations
Fix two build error introduced by commit 63dca2c0:
 "ipvs: Fix faulty IPv6 extension header handling in IPVS"

First build error was fairly trivial and can occur, when
CONFIG_IP_VS_IPV6 is disabled.

The second build error was tricky, and can occur when deselecting
both all Netfilter and IPVS, but selecting CONFIG_IPV6.  This is
caused by "kernel/sysctl_binary.c" including "net/ip_vs.h", which
includes "linux/netfilter_ipv6/ip6_tables.h" causing include
of "include/linux/netfilter/x_tables.h" which then cannot find
the typedef nf_hookfn.

Fix this by only including "linux/netfilter_ipv6/ip6_tables.h" in
case of CONFIG_IP_VS_IPV6 as its already used to guard the usage
of ipv6_find_hdr().

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Reported-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-10-23 09:23:40 +09:00
Jesper Dangaard Brouer d4383f04d1 ipvs: API change to avoid rescan of IPv6 exthdr
Reduce the number of times we scan/skip the IPv6 exthdrs.

This patch contains a lot of API changes.  This is done, to avoid
repeating the scan of finding the IPv6 headers, via ipv6_find_hdr(),
which is called by ip_vs_fill_iph_skb().

Finding the IPv6 headers is done as early as possible, and passed on
as a pointer "struct ip_vs_iphdr *" to the affected functions.

This patch reduce/removes 19 calls to ip_vs_fill_iph_skb().

Notice, I have choosen, not to change the API of function
pointer "(*schedule)" (in struct ip_vs_scheduler) as it can be
used by external schedulers, via {un,}register_ip_vs_scheduler.
Only 4 out of 10 schedulers use info from ip_vs_iphdr*, and when
they do, they are only interested in iph->{s,d}addr.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-09-28 11:34:33 +09:00
Jesper Dangaard Brouer 2f74713d14 ipvs: Complete IPv6 fragment handling for IPVS
IPVS now supports fragmented packets, with support from nf_conntrack_reasm.c

Based on patch from: Hans Schillstrom.

IPVS do like conntrack i.e. use the skb->nfct_reasm
(i.e. when all fragments is collected, nf_ct_frag6_output()
starts a "re-play" of all fragments into the interrupted
PREROUTING chain at prio -399 (NF_IP6_PRI_CONNTRACK_DEFRAG+1)
with nfct_reasm pointing to the assembled packet.)

Notice, module nf_defrag_ipv6 must be loaded for this to work.
Report unhandled fragments, and recommend user to load nf_defrag_ipv6.

To handle fw-mark for fragments.  Add a new IPVS hook into prerouting
chain at prio -99 (NF_IP6_PRI_NAT_DST+1) to catch fragments, and copy
fw-mark info from the first packet with an upper layer header.

IPv6 fragment handling should be the last thing on the IPVS IPv6
missing support list.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-09-28 11:34:24 +09:00
Jesper Dangaard Brouer 63dca2c0b0 ipvs: Fix faulty IPv6 extension header handling in IPVS
IPv6 packets can contain extension headers, thus its wrong to assume
that the transport/upper-layer header, starts right after (struct
ipv6hdr) the IPv6 header.  IPVS uses this false assumption, and will
write SNAT & DNAT modifications at a fixed pos which will corrupt the
message.

To fix this, proper header position must be found before modifying
packets.  Introducing ip_vs_fill_iph_skb(), which uses ipv6_find_hdr()
to skip the exthdrs. It finds (1) the transport header offset, (2) the
protocol, and (3) detects if the packet is a fragment.

Note, that fragments in IPv6 is represented via an exthdr.  Thus, this
is detected while skipping through the exthdrs.

This patch depends on commit 84018f55a:
 "netfilter: ip6_tables: add flags parameter to ipv6_find_hdr()"
This also adds a dependency to ip6_tables.

Originally based on patch from: Hans Schillstrom

kABI notes:
Changing struct ip_vs_iphdr is a potential minor kABI breaker,
because external modules can be compiled with another version of
this struct.  This should not matter, as they would most-likely
be using a compiled-in version of ip_vs_fill_iphdr().  When
recompiled, they will notice ip_vs_fill_iphdr() no longer exists,
and they have to used ip_vs_fill_iph_skb() instead.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-09-28 11:34:15 +09:00
Jesper Dangaard Brouer a638e51437 ipvs: Use config macro IS_ENABLED()
Cleanup patch.

Use the IS_ENABLED macro, instead of having to check
both the build and the module CONFIG_ option.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-09-28 11:34:05 +09:00
Jesper Dangaard Brouer 120b9c14f4 ipvs: Trivial changes, use compressed IPv6 address in output
Have not converted the proc file output to compressed IPv6 addresses.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-09-28 11:33:52 +09:00
Julian Anastasov 3654e61137 ipvs: add pmtu_disc option to disable IP DF for TUN packets
Disabling PMTU discovery can increase the output packet
rate but some users have enough resources and prefer to fragment
than to drop traffic. By default, we copy the DF bit but if
pmtu_disc is disabled we do not send FRAG_NEEDED messages anymore.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-08-10 10:35:07 +09:00
Julian Anastasov be97fdb5fb ipvs: generalize app registration in netns
Get rid of the ftp_app pointer and allow applications
to be registered without adding fields in the netns_ipvs structure.

v2: fix coding style as suggested by Pablo Neira Ayuso <pablo@netfilter.org>

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-08-10 10:34:51 +09:00
Lin Ming 9e33ce453f ipvs: fix oops on NAT reply in br_nf context
IPVS should not reset skb->nf_bridge in FORWARD hook
by calling nf_reset for NAT replies. It triggers oops in
br_nf_forward_finish.

[  579.781508] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
[  579.781669] IP: [<ffffffff817b1ca5>] br_nf_forward_finish+0x58/0x112
[  579.781792] PGD 218f9067 PUD 0
[  579.781865] Oops: 0000 [#1] SMP
[  579.781945] CPU 0
[  579.781983] Modules linked in:
[  579.782047]
[  579.782080]
[  579.782114] Pid: 4644, comm: qemu Tainted: G        W    3.5.0-rc5-00006-g95e69f9 #282 Hewlett-Packard  /30E8
[  579.782300] RIP: 0010:[<ffffffff817b1ca5>]  [<ffffffff817b1ca5>] br_nf_forward_finish+0x58/0x112
[  579.782455] RSP: 0018:ffff88007b003a98  EFLAGS: 00010287
[  579.782541] RAX: 0000000000000008 RBX: ffff8800762ead00 RCX: 000000000001670a
[  579.782653] RDX: 0000000000000000 RSI: 000000000000000a RDI: ffff8800762ead00
[  579.782845] RBP: ffff88007b003ac8 R08: 0000000000016630 R09: ffff88007b003a90
[  579.782957] R10: ffff88007b0038e8 R11: ffff88002da37540 R12: ffff88002da01a02
[  579.783066] R13: ffff88002da01a80 R14: ffff88002d83c000 R15: ffff88002d82a000
[  579.783177] FS:  0000000000000000(0000) GS:ffff88007b000000(0063) knlGS:00000000f62d1b70
[  579.783306] CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
[  579.783395] CR2: 0000000000000004 CR3: 00000000218fe000 CR4: 00000000000027f0
[  579.783505] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  579.783684] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  579.783795] Process qemu (pid: 4644, threadinfo ffff880021b20000, task ffff880021aba760)
[  579.783919] Stack:
[  579.783959]  ffff88007693cedc ffff8800762ead00 ffff88002da01a02 ffff8800762ead00
[  579.784110]  ffff88002da01a02 ffff88002da01a80 ffff88007b003b18 ffffffff817b26c7
[  579.784260]  ffff880080000000 ffffffff81ef59f0 ffff8800762ead00 ffffffff81ef58b0
[  579.784477] Call Trace:
[  579.784523]  <IRQ>
[  579.784562]
[  579.784603]  [<ffffffff817b26c7>] br_nf_forward_ip+0x275/0x2c8
[  579.784707]  [<ffffffff81704b58>] nf_iterate+0x47/0x7d
[  579.784797]  [<ffffffff817ac32e>] ? br_dev_queue_push_xmit+0xae/0xae
[  579.784906]  [<ffffffff81704bfb>] nf_hook_slow+0x6d/0x102
[  579.784995]  [<ffffffff817ac32e>] ? br_dev_queue_push_xmit+0xae/0xae
[  579.785175]  [<ffffffff8187fa95>] ? _raw_write_unlock_bh+0x19/0x1b
[  579.785179]  [<ffffffff817ac417>] __br_forward+0x97/0xa2
[  579.785179]  [<ffffffff817ad366>] br_handle_frame_finish+0x1a6/0x257
[  579.785179]  [<ffffffff817b2386>] br_nf_pre_routing_finish+0x26d/0x2cb
[  579.785179]  [<ffffffff817b2cf0>] br_nf_pre_routing+0x55d/0x5c1
[  579.785179]  [<ffffffff81704b58>] nf_iterate+0x47/0x7d
[  579.785179]  [<ffffffff817ad1c0>] ? br_handle_local_finish+0x44/0x44
[  579.785179]  [<ffffffff81704bfb>] nf_hook_slow+0x6d/0x102
[  579.785179]  [<ffffffff817ad1c0>] ? br_handle_local_finish+0x44/0x44
[  579.785179]  [<ffffffff81551525>] ? sky2_poll+0xb35/0xb54
[  579.785179]  [<ffffffff817ad62a>] br_handle_frame+0x213/0x229
[  579.785179]  [<ffffffff817ad417>] ? br_handle_frame_finish+0x257/0x257
[  579.785179]  [<ffffffff816e3b47>] __netif_receive_skb+0x2b4/0x3f1
[  579.785179]  [<ffffffff816e69fc>] process_backlog+0x99/0x1e2
[  579.785179]  [<ffffffff816e6800>] net_rx_action+0xdf/0x242
[  579.785179]  [<ffffffff8107e8a8>] __do_softirq+0xc1/0x1e0
[  579.785179]  [<ffffffff8135a5ba>] ? trace_hardirqs_off_thunk+0x3a/0x6c
[  579.785179]  [<ffffffff8188812c>] call_softirq+0x1c/0x30

The steps to reproduce as follow,

1. On Host1, setup brige br0(192.168.1.106)
2. Boot a kvm guest(192.168.1.105) on Host1 and start httpd
3. Start IPVS service on Host1
   ipvsadm -A -t 192.168.1.106:80 -s rr
   ipvsadm -a -t 192.168.1.106:80 -r 192.168.1.105:80 -m
4. Run apache benchmark on Host2(192.168.1.101)
   ab -n 1000 http://192.168.1.106/

ip_vs_reply4
  ip_vs_out
    handle_response
      ip_vs_notrack
        nf_reset()
        {
          skb->nf_bridge = NULL;
        }

Actually, IPVS wants in this case just to replace nfct
with untracked version. So replace the nf_reset(skb) call
in ip_vs_notrack() with a nf_conntrack_put(skb->nfct) call.

Signed-off-by: Lin Ming <mlin@ss.pku.edu.cn>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-07-17 12:00:46 +02:00
Pablo Neira Ayuso f73181c828 ipvs: add support for sync threads
Allow master and backup servers to use many threads
for sync traffic. Add sysctl var "sync_ports" to define the
number of threads. Every thread will use single UDP port,
thread 0 will use the default port 8848 while last thread
will use port 8848+sync_ports-1.

	The sync traffic for connections is scheduled to many
master threads based on the cp address but one connection is
always assigned to same thread to avoid reordering of the
sync messages.

	Remove ip_vs_sync_switch_mode because this check
for sync mode change is still risky. Instead, check for mode
change under sync_buff_lock.

	Make sure the backup socks do not block on reading.

Special thanks to Aleksey Chudov for helping in all tests.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Tested-by: Aleksey Chudov <aleksey.chudov@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-05-08 19:40:33 +02:00
Julian Anastasov 749c42b620 ipvs: reduce sync rate with time thresholds
Add two new sysctl vars to control the sync rate with the
main idea to reduce the rate for connection templates because
currently it depends on the packet rate for controlled connections.
This mechanism should be useful also for normal connections
with high traffic.

sync_refresh_period: in seconds, difference in reported connection
	timer that triggers new sync message. It can be used to
	avoid sync messages for the specified period (or half of
	the connection timeout if it is lower) if connection state
	is not changed from last sync.

sync_retries: integer, 0..3, defines sync retries with period of
	sync_refresh_period/8. Useful to protect against loss of
	sync messages.

	Allow sysctl_sync_threshold to be used with
sysctl_sync_period=0, so that only single sync message is sent
if sync_refresh_period is also 0.

	Add new field "sync_endtime" in connection structure to
hold the reported time when connection expires. The 2 lowest
bits will represent the retry count.

	As the sysctl_sync_period now can be 0 use ACCESS_ONCE to
avoid division by zero.

	Special thanks to Aleksey Chudov for being patient with me,
for his extensive reports and helping in all tests.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Tested-by: Aleksey Chudov <aleksey.chudov@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-05-08 19:40:10 +02:00
Pablo Neira Ayuso 1c003b1580 ipvs: wakeup master thread
High rate of sync messages in master can lead to
overflowing the socket buffer and dropping the messages.
Fixed sleep of 1 second without wakeup events is not suitable
for loaded masters,

	Use delayed_work to schedule sending for queued messages
and limit the delay to IPVS_SYNC_SEND_DELAY (20ms). This will
reduce the rate of wakeups but to avoid sending long bursts we
wakeup the master thread after IPVS_SYNC_WAKEUP_RATE (8) messages.

	Add hard limit for the queued messages before sending
by using "sync_qlen_max" sysctl var. It defaults to 1/32 of
the memory pages but actually represents number of messages.
It will protect us from allocating large parts of memory
when the sending rate is lower than the queuing rate.

	As suggested by Pablo, add new sysctl var
"sync_sock_size" to configure the SNDBUF (master) or
RCVBUF (slave) socket limit. Default value is 0 (preserve
system defaults).

	Change the master thread to detect and block on
SNDBUF overflow, so that we do not drop messages when
the socket limit is low but the sync_qlen_max limit is
not reached. On ENOBUFS or other errors just drop the
messages.

	Change master thread to enter TASK_INTERRUPTIBLE
state early, so that we do not miss wakeups due to messages or
kthread_should_stop event.

Thanks to Pablo Neira Ayuso for his valuable feedback!

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-05-08 19:39:53 +02:00
David S. Miller 0d6c4a2e46 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/intel/e1000e/param.c
	drivers/net/wireless/iwlwifi/iwl-agn-rx.c
	drivers/net/wireless/iwlwifi/iwl-trans-pcie-rx.c
	drivers/net/wireless/iwlwifi/iwl-trans.h

Resolved the iwlwifi conflict with mainline using 3-way diff posted
by John Linville and Stephen Rothwell.  In 'net' we added a bug
fix to make iwlwifi report a more accurate skb->truesize but this
conflicted with RX path changes that happened meanwhile in net-next.

In e1000e a conflict arose in the validation code for settings of
adapter->itr.  'net-next' had more sophisticated logic so that
logic was used.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-07 23:35:40 -04:00
Hans Schillstrom 8537de8a7a ipvs: kernel oops - do_ip_vs_get_ctl
Change order of init so netns init is ready
when register ioctl and netlink.

Ver2
	Whitespace fixes and __init added.

Reported-by: "Ryan O'Hara" <rohara@redhat.com>
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-04-30 10:40:35 +02:00
Hans Schillstrom 582b8e3ead ipvs: take care of return value from protocol init_netns
ip_vs_create_timeout_table() can return NULL
All functions protocol init_netns is affected of this patch.

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-04-30 10:40:35 +02:00
Eric W. Biederman a5347fe36b net: Delete all remaining instances of ctl_path
We don't use struct ctl_path anymore so delete the exported constants.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-20 21:22:30 -04:00
Eric Dumazet 95c9617472 net: cleanup unsigned to unsigned int
Use of "unsigned int" is preferred to bare "unsigned" in net tree.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-15 12:44:40 -04:00
Paul Gortmaker 187f1882b5 BUG: headers with BUG/BUG_ON etc. need linux/bug.h
If a header file is making use of BUG, BUG_ON, BUILD_BUG_ON, or any
other BUG variant in a static inline (i.e. not in a #define) then
that header really should be including <linux/bug.h> and not just
expecting it to be implicitly present.

We can make this change risk-free, since if the files using these
headers didn't have exposure to linux/bug.h already, they would have
been causing compile failures/warnings.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-03-04 17:54:34 -05:00
David S. Miller 455ffa607f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-01-02 18:56:49 -05:00
Julian Anastasov 52793dbe3d ipvs: try also real server with port 0 in backup server
We should not forget to try for real server with port 0
in the backup server when processing the sync message. We should
do it in all cases because the backup server can use different
forwarding method.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-12-31 16:06:29 +01:00
Alexey Dobriyan 4e3fd7a06d net: remove ipv6_addr_copy()
C assignment can handle struct in6_addr copying.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-22 16:43:32 -05:00
Linus Torvalds 32aaeffbd4 Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux
* 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
  Revert "tracing: Include module.h in define_trace.h"
  irq: don't put module.h into irq.h for tracking irqgen modules.
  bluetooth: macroize two small inlines to avoid module.h
  ip_vs.h: fix implicit use of module_get/module_put from module.h
  nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
  include: replace linux/module.h with "struct module" wherever possible
  include: convert various register fcns to macros to avoid include chaining
  crypto.h: remove unused crypto_tfm_alg_modname() inline
  uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
  pm_runtime.h: explicitly requires notifier.h
  linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
  miscdevice.h: fix up implicit use of lists and types
  stop_machine.h: fix implicit use of smp.h for smp_processor_id
  of: fix implicit use of errno.h in include/linux/of.h
  of_platform.h: delete needless include <linux/module.h>
  acpi: remove module.h include from platform/aclinux.h
  miscdevice.h: delete unnecessary inclusion of module.h
  device_cgroup.h: delete needless include <linux/module.h>
  net: sch_generic remove redundant use of <linux/module.h>
  net: inet_timewait_sock doesnt need <linux/module.h>
  ...

Fix up trivial conflicts (other header files, and  removal of the ab3550 mfd driver) in
 - drivers/media/dvb/frontends/dibx000_common.c
 - drivers/media/video/{mt9m111.c,ov6650.c}
 - drivers/mfd/ab3550-core.c
 - include/linux/dmaengine.h
2011-11-06 19:44:47 -08:00
Krzysztof Wilczynski e23ebf0fa9 ipvs: Fix compilation error in ip_vs.h for ip_vs_confirm_conntrack function.
This is to address the following error during the compilation:

  In file included from kernel/sysctl_binary.c:6:
  include/net/ip_vs.h:1406: error: expected identifier or ‘(’ before ‘{’ token
  make[1]: *** [kernel/sysctl_binary.o] Error 1
  make[1]: *** Waiting for unfinished jobs....

That manifests itself when CONFIG_IP_VS_NFCT is undefined in .config file.

Signed-off-by: Krzysztof Wilczynski <krzysztof.wilczynski@linux.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2011-11-01 09:20:01 +01:00
Simon Horman 4a516f1108 ipvs: Remove unused return value of protocol state transitions
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-11-01 09:19:33 +01:00
Simon Horman 3c2de2ae02 ipvs: Remove unused parameter from ip_vs_confirm_conntrack()
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-11-01 09:19:29 +01:00
Paul Gortmaker 69e7dae409 ip_vs.h: fix implicit use of module_get/module_put from module.h
This file was using the module get/put functions in two simple inline
functions.  But module_get/put were only within scope because of
the implicit presence of module.h being everywhere.

Rather than add module.h to another file in include/  -- which is
exactly the thing we are trying to avoid, simply convert these
one-line functions into a define, as per what was done for the
device_schedule_callback() in commit 523ded71de.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:32:34 -04:00
Hans Schillstrom ae1d48b23d IPVS netns shutdown/startup dead-lock
ip_vs_mutext is used by both netns shutdown code and startup
and both implicit uses sk_lock-AF_INET mutex.

cleanup CPU-1         startup CPU-2
ip_vs_dst_event()     ip_vs_genl_set_cmd()
 sk_lock-AF_INET     __ip_vs_mutex
                     sk_lock-AF_INET
__ip_vs_mutex
* DEAD LOCK *

A new mutex placed in ip_vs netns struct called sync_mutex is added.

Comments from Julian and Simon added.
This patch has been running for more than 3 month now and it seems to work.

Ver. 3
    IP_VS_SO_GET_DAEMON in do_ip_vs_get_ctl protected by sync_mutex
    instead of __ip_vs_mutex as sugested by Julian.

Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-10-12 18:32:15 +02:00
Arun Sharma 60063497a9 atomic: use <linux/atomic.h>
This allows us to move duplicated code in <asm/atomic.h>
(atomic_inc_not_zero() for now) to <linux/atomic.h>

Signed-off-by: Arun Sharma <asharma@fb.com>
Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-26 16:49:47 -07:00
Hans Schillstrom 6c8f794993 IPVS: remove unused init and cleanup functions.
After restructuring, there is some unused or empty functions
left to be removed.

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2011-06-14 09:07:32 +09:00
Hans Schillstrom 503cf15a5e IPVS: rename of netns init and cleanup functions.
Make it more clear what the functions does,
on request by Julian.

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2011-06-13 17:10:09 +09:00
Hans Schillstrom ed78bec4d6 IPVS remove unused var from migration to netns
Remove variable ctl_key from struct netns_ipvs,
it's a leftover from early netns work.

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2011-06-13 17:04:06 +09:00
Hans Schillstrom c74c0bfe0b IPVS: bug in ip_vs_ftp, same list heaad used in all netns.
When ip_vs was adapted to netns the ftp application was not adapted
in a correct way.
However this is a fix to avoid kernel errors. In the long term another solution
might be chosen.  I.e the ports that the ftp appl, uses should be per netns.

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-05-27 13:37:46 +02:00
Julian Anastasov c92f5ca2e5 ipvs: Remove all remaining references to rt->rt_{src,dst}
Remove all remaining references to rt->rt_{src,dst}
by using dest->dst_saddr to cache saddr (used for TUN mode).
For ICMP in FORWARD hook just restrict the rt_mode for NAT
to disable LOCALNODE. All other modes do not allow
IP_VS_RT_MODE_RDR, so we should be safe with the ICMP
forwarding. Using cp->daddr as replacement for rt_dst
is safe for all modes except BYPASS, even when cp->dest is
NULL because it is cp->daddr that is used to assign cp->dest
for sync-ed connections.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-12 18:24:46 -04:00
David S. Miller 3c709f8fb4 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-3.6
Conflicts:
	drivers/net/benet/be_main.c
2011-05-11 14:26:58 -04:00
Hans Schillstrom 7a4f0761fc IPVS: init and cleanup restructuring
DESCRIPTION
This patch tries to restore the initial init and cleanup
sequences that was before namspace patch.
Netns also requires action when net devices unregister
which has never been implemented. I.e this patch also
covers when a device moves into a network namespace,
and has to be released.

IMPLEMENTATION
The number of calls to register_pernet_device have been
reduced to one for the ip_vs.ko
Schedulers still have their own calls.

This patch adds a function __ip_vs_service_cleanup()
and an enable flag for the netfilter hooks.

The nf hooks will be enabled when the first service is loaded
and never disabled again, except when a namespace exit starts.

Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
[horms@verge.net.au: minor edit to changelog]
Signed-off-by: Simon Horman <horms@verge.net.au>
2011-05-10 09:52:47 +02:00
David S. Miller 2a9e950701 net: Remove __KERNEL__ cpp checks from include/net
These header files are never installed to user consumption, so any
__KERNEL__ cpp checks are superfluous.

Projects should also not copy these files into their userland utility
sources and try to use them there.  If they insist on doing so, the
onus is on them to sanitize the headers as needed.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-24 10:54:56 -07:00
Linus Torvalds c44eaf41a5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (34 commits)
  net: Add support for SMSC LAN9530, LAN9730 and LAN89530
  mlx4_en: Restoring RX buffer pointer in case of failure
  mlx4: Sensing link type at device initialization
  ipv4: Fix "Set rt->rt_iif more sanely on output routes."
  MAINTAINERS: add entry for Xen network backend
  be2net: Fix suspend/resume operation
  be2net: Rename some struct members for clarity
  pppoe: drop PPPOX_ZOMBIEs in pppoe_flush_dev
  dsa/mv88e6131: add support for mv88e6085 switch
  ipv6: Enable RFS sk_rxhash tracking for ipv6 sockets (v2)
  be2net: Fix a potential crash during shutdown.
  bna: Fix for handling firmware heartbeat failure
  can: mcp251x: Allow pass IRQ flags through platform data.
  smsc911x: fix mac_lock acquision before calling smsc911x_mac_read
  iwlwifi: accept EEPROM version 0x423 for iwl6000
  rt2x00: fix cancelling uninitialized work
  rtlwifi: Fix some warnings/bugs
  p54usb: IDs for two new devices
  wl12xx: fix potential buffer overflow in testmode nvs push
  zd1211rw: reset rx idle timer from tasklet
  ...
2011-04-11 07:27:24 -07:00
Hans Schillstrom a09d19779f IPVS: fix NULL ptr dereference in ip_vs_ctl.c ip_vs_genl_dump_daemons()
ipvsadm -ln --daemon will trigger a Null pointer exception because
ip_vs_genl_dump_daemons() uses skb_net() instead of skb_sknet().

To prevent others from NULL ptr a check is made in ip_vs.h skb_net().

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-04-04 15:25:18 +02:00
Lucas De Marchi 25985edced Fix common misspellings
Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
2011-03-31 11:26:23 -03:00
Simon Horman 736561a01f IPVS: Use global mutex in ip_vs_app.c
As part of the work to make IPVS network namespace aware
__ip_vs_app_mutex was replaced by a per-namespace lock,
ipvs->app_mutex. ipvs->app_key is also supplied for debugging purposes.

Unfortunately this implementation results in ipvs->app_key residing
in non-static storage which at the very least causes a lockdep warning.

This patch takes the rather heavy-handed approach of reinstating
__ip_vs_app_mutex which will cover access to the ipvs->list_head
of all network namespaces.

[   12.610000] IPVS: Creating netns size=2456 id=0
[   12.630000] IPVS: Registered protocols (TCP, UDP, SCTP, AH, ESP)
[   12.640000] BUG: key ffff880003bbf1a0 not in .data!
[   12.640000] ------------[ cut here ]------------
[   12.640000] WARNING: at kernel/lockdep.c:2701 lockdep_init_map+0x37b/0x570()
[   12.640000] Hardware name: Bochs
[   12.640000] Pid: 1, comm: swapper Tainted: G        W 2.6.38-kexec-06330-g69b7efe-dirty #122
[   12.650000] Call Trace:
[   12.650000]  [<ffffffff8102e685>] warn_slowpath_common+0x75/0xb0
[   12.650000]  [<ffffffff8102e6d5>] warn_slowpath_null+0x15/0x20
[   12.650000]  [<ffffffff8105967b>] lockdep_init_map+0x37b/0x570
[   12.650000]  [<ffffffff8105829d>] ? trace_hardirqs_on+0xd/0x10
[   12.650000]  [<ffffffff81055ad8>] debug_mutex_init+0x38/0x50
[   12.650000]  [<ffffffff8104bc4c>] __mutex_init+0x5c/0x70
[   12.650000]  [<ffffffff81685ee7>] __ip_vs_app_init+0x64/0x86
[   12.660000]  [<ffffffff81685a3b>] ? ip_vs_init+0x0/0xff
[   12.660000]  [<ffffffff811b1c33>] T.620+0x43/0x170
[   12.660000]  [<ffffffff811b1e9a>] ? register_pernet_subsys+0x1a/0x40
[   12.660000]  [<ffffffff81685a3b>] ? ip_vs_init+0x0/0xff
[   12.660000]  [<ffffffff81685a3b>] ? ip_vs_init+0x0/0xff
[   12.660000]  [<ffffffff811b1db7>] register_pernet_operations+0x57/0xb0
[   12.660000]  [<ffffffff81685a3b>] ? ip_vs_init+0x0/0xff
[   12.670000]  [<ffffffff811b1ea9>] register_pernet_subsys+0x29/0x40
[   12.670000]  [<ffffffff81685f19>] ip_vs_app_init+0x10/0x12
[   12.670000]  [<ffffffff81685a87>] ip_vs_init+0x4c/0xff
[   12.670000]  [<ffffffff8166562c>] do_one_initcall+0x7a/0x12e
[   12.670000]  [<ffffffff8166583e>] kernel_init+0x13e/0x1c2
[   12.670000]  [<ffffffff8128c134>] kernel_thread_helper+0x4/0x10
[   12.670000]  [<ffffffff8128ad40>] ? restore_args+0x0/0x30
[   12.680000]  [<ffffffff81665700>] ? kernel_init+0x0/0x1c2
[   12.680000]  [<ffffffff8128c130>] ? kernel_thread_helper+0x0/0x1global0

Signed-off-by: Simon Horman <horms@verge.net.au>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Julian Anastasov <ja@ssi.bg>
Cc: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-21 20:39:24 -07:00
Simon Horman f2247fbdc4 IPVS: Conditionally include sysctl members of struct netns_ipvs
There is now no need to include sysctl members of struct netns_ipvs
unless CONFIG_SYSCTL is defined.

Signed-off-by: Simon Horman <horms@verge.net.au>
2011-03-15 09:37:02 +09:00
Simon Horman a4e2f5a700 IPVS: Conditional ip_vs_conntrack_enabled()
ip_vs_conntrack_enabled() becomes a noop when CONFIG_SYSCTL is undefined.

In preparation for not including sysctl_conntrack in
struct netns_ipvs when CONFIG_SYCTL is not defined.

Signed-off-by: Simon Horman <horms@verge.net.au>
2011-03-15 09:37:00 +09:00
Simon Horman 3a1bbf1885 IPVS: ip_vs_todrop() becomes a noop when CONFIG_SYSCTL is undefined
Signed-off-by: Simon Horman <horms@verge.net.au>
2011-03-15 09:36:59 +09:00
Simon Horman 7532e8d40c IPVS: Add sysctl_sync_ver()
In preparation for not including sysctl_sync_ver in
struct netns_ipvs when CONFIG_SYCTL is not defined.

Signed-off-by: Simon Horman <horms@verge.net.au>
2011-03-15 09:36:57 +09:00
Simon Horman 59e0350ead IPVS: Add {sysctl_sync_threshold,period}()
In preparation for not including sysctl_sync_threshold in
struct netns_ipvs when CONFIG_SYCTL is not defined.

Signed-off-by: Simon Horman <horms@verge.net.au>
2011-03-15 09:36:57 +09:00
Julian Anastasov 6ef757f965 ipvs: rename estimator functions
Rename ip_vs_new_estimator to ip_vs_start_estimator
and ip_vs_kill_estimator to ip_vs_stop_estimator to better
match their logic.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2011-03-15 09:36:54 +09:00
Julian Anastasov ea9f22cce9 ipvs: optimize rates reading
Move the estimator reading from estimation_timer to user
context. ip_vs_read_estimator() will be used to decode the rate
values. As the decoded rates are not set by estimation timer
there is no need to reset them in ip_vs_zero_stats.

 	There is no need ip_vs_new_estimator() to encode stats
to rates, if the destination is in trash both the stats and the
rates are inactive.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2011-03-15 09:36:53 +09:00
Julian Anastasov 87d68a15e2 ipvs: remove unused seqcount stats
Remove ustats_seq, IPVS_STAT_INC and IPVS_STAT_ADD
because they are not used. They were replaced with u64_stats.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2011-03-15 09:36:53 +09:00
Julian Anastasov 55a3d4e15c ipvs: properly zero stats and rates
Currently, the new percpu counters are not zeroed and
the zero commands do not work as expected, we still show the old
sum of percpu values. OTOH, we can not reset the percpu counters
from user context without causing the incrementing to use old
and bogus values.

 	So, as Eric Dumazet suggested fix that by moving all overhead
to stats reading in user context. Do not introduce overhead in
timer context (estimator) and incrementing (packet handling in
softirqs).

 	The new ustats0 field holds the zero point for all
counter values, the rates always use 0 as base value as before.
When showing the values to user space just give the difference
between counters and the base values. The only drawback is that
percpu stats are not zeroed, they are accessible only from /proc
and are new interface, so it should not be a compatibility problem
as long as the sum stats are correct after zeroing.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2011-03-15 09:36:52 +09:00
Julian Anastasov 2a0751af09 ipvs: reorganize tot_stats
The global tot_stats contains cpustats field just like the
stats for dest and svc, so better use it to simplify the usage
in estimation_timer. As tot_stats is registered as estimator
we can remove the special ip_vs_read_cpu_stats call for
tot_stats. Fix ip_vs_read_cpu_stats to be called under
stats lock because it is still used as synchronization between
estimation timer and user context (the stats readers).

 	Also, make sure ip_vs_stats_percpu_show reads properly
the u64 stats from user context.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2011-03-15 09:36:52 +09:00