Commit Graph

33157 Commits

Author SHA1 Message Date
Ilya Dryomov
37ab77ac29 libceph: drop osd ref when canceling con work
queue_con() bumps osd ref count.  We should do the reverse when
canceling con work.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2014-07-08 15:08:46 +04:00
Ilya Dryomov
2d05f082cb libceph: nuke ceph_osdc_unregister_linger_request()
Remove now unused ceph_osdc_unregister_linger_request().

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2014-07-08 15:08:45 +04:00
Ilya Dryomov
c9f9b93ddf libceph: introduce ceph_osdc_cancel_request()
Introduce ceph_osdc_cancel_request() intended for canceling requests
from the higher layers (rbd and cephfs).  Because higher layers are in
charge and are supposed to know what and when they are canceling, the
request is not completed, only unref'ed and removed from the libceph
data structures.

__cancel_request() is no longer called before __unregister_request(),
because __unregister_request() unconditionally revokes r_request and
there is no point in trying to do it twice.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2014-07-08 15:08:44 +04:00
Ilya Dryomov
4f23409e0c libceph: fix linger request check in __unregister_request()
We should check if request is on the linger request list of any of the
OSDs, not whether request is registered or not.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2014-07-08 15:08:44 +04:00
Ilya Dryomov
af59306455 libceph: unregister only registered linger requests
Linger requests that have not yet been registered should not be
unregistered by __unregister_linger_request().  This messes up ref
count and leads to use-after-free.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2014-07-08 15:08:44 +04:00
Ilya Dryomov
7c6e6fc53e libceph: assert both regular and lingering lists in __remove_osd()
It is important that both regular and lingering requests lists are
empty when the OSD is removed.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2014-07-08 15:08:43 +04:00
Ilya Dryomov
6562d661d2 libceph: harden ceph_osdc_request_release() a bit
Add some WARN_ONs to alert us when we try to destroy requests that are
still registered.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2014-07-08 15:08:43 +04:00
Ilya Dryomov
9e94af202a libceph: move and add dout()s to ceph_osdc_request_{get,put}()
Add dout()s to ceph_osdc_request_{get,put}().  Also move them to .c and
turn kref release callback into a static function.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2014-07-08 15:08:43 +04:00
Ilya Dryomov
0215e44bb3 libceph: move and add dout()s to ceph_msg_{get,put}()
Add dout()s to ceph_msg_{get,put}().  Also move them to .c and turn
kref release callback into a static function.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2014-07-08 15:08:43 +04:00
Ilya Dryomov
bbf37ec3a6 libceph: add maybe_move_osd_to_lru() and switch to it
Abstract out __move_osd_to_lru() logic from __unregister_request() and
__unregister_linger_request().

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2014-07-08 15:08:43 +04:00
Ilya Dryomov
1d0326b13b libceph: rename ceph_osd_request::r_linger_osd to r_linger_osd_item
So that:

req->r_osd_item --> osd->o_requests list
req->r_linger_osd_item --> osd->o_linger_requests list

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2014-07-08 15:08:42 +04:00
Linus Torvalds
eb477e03fe Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target fixes from Nicholas Bellinger:
 "Mostly minor fixes this time around.  The highlights include:

   - iscsi-target CHAP authentication fixes to enforce explicit key
     values (Tejas Vaykole + rahul.rane)
   - fix a long-standing OOPs in target-core when a alua configfs
     attribute is accessed after port symlink has been removed.
     (Sebastian Herbszt)
   - fix a v3.10.y iscsi-target regression causing the login reject
     status class/detail to be ignored (Christoph Vu-Brugier)
   - fix a v3.10.y iscsi-target regression to avoid rejecting an
     existing ITT during Data-Out when data-direction is wrong (Santosh
     Kulkarni + Arshad Hussain)
   - fix a iscsi-target related shutdown deadlock on UP kernels (Mikulas
     Patocka)
   - fix a v3.16-rc1 build issue with vhost-scsi + !CONFIG_NET (MST)"

* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
  iscsi-target: fix iscsit_del_np deadlock on unload
  iovec: move memcpy_from/toiovecend to lib/iovec.c
  iscsi-target: Avoid rejecting incorrect ITT for Data-Out
  tcm_loop: Fix memory leak in tcm_loop_submission_work error path
  iscsi-target: Explicily clear login response PDU in exception path
  target: Fix left-over se_lun->lun_sep pointer OOPs
  iscsi-target; Enforce 1024 byte maximum for CHAP_C key value
  iscsi-target: Convert chap_server_compute_md5 to use kstrtoul
2014-06-28 09:43:58 -07:00
Michael S. Tsirkin
ac5ccdba3a iovec: move memcpy_from/toiovecend to lib/iovec.c
ERROR: "memcpy_fromiovecend" [drivers/vhost/vhost_scsi.ko] undefined!

commit 9f977ef7b6
    vhost-scsi: Include prot_bytes into expected data transfer length
in target-pending makes drivers/vhost/scsi.c call memcpy_fromiovecend().
This function is not available when CONFIG_NET is not enabled.

socket.h already includes uio.h, so no callers need updating.

Reported-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-06-27 11:47:58 -07:00
Linus Torvalds
f40ede392d Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Fix crash in ipvs tot_stats estimator, from Julian Anastasov.

 2) Fix OOPS in nf_nat on netns removal, from Florian Westphal.

 3) Really really really fix locking issues in slip and slcan tty write
    wakeups, from Tyler Hall.

 4) Fix checksum offloading in fec driver, from Fugang Duan.

 5) Off by one in BPF instruction limit test, from Kees Cook.

 6) Need to clear all TSO capability flags when doing software TSO in
    tg3 driver, from Prashant Sreedharan.

 7) Fix memory leak in vlan_reorder_header() error path, from Li
    RongQing.

 8) Fix various bugs in xen-netfront and xen-netback multiqueue support,
    from David Vrabel and Wei Liu.

 9) Fix deadlock in cxgb4 driver, from Li RongQing.

10) Prevent double free of no-cache DST entries, from Eric Dumazet.

11) Bad csum_start handling in skb_segment() leads to crashes when
    forwarding, from Tom Herbert.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (76 commits)
  net: fix setting csum_start in skb_segment()
  ipv4: fix dst race in sk_dst_get()
  net: filter: Use kcalloc/kmalloc_array to allocate arrays
  trivial: net: filter: Change kerneldoc parameter order
  trivial: net: filter: Fix typo in comment
  net: allwinner: emac: Add missing free_irq
  cxgb4: use dev_port to identify ports
  xen-netback: bookkeep number of active queues in our own module
  tg3: Change nvram command timeout value to 50ms
  cxgb4: Not need to hold the adap_rcu_lock lock when read adap_rcu_list
  be2net: fix qnq mode detection on VFs
  of: mdio: fixup of_phy_register_fixed_link parsing of new bindings
  at86rf230: fix irq setup
  net: phy: at803x: fix coccinelle warnings
  net/mlx4_core: Fix the error flow when probing with invalid VF configuration
  tulip: Poll link status more frequently for Comet chips
  net: huawei_cdc_ncm: increase command buffer size
  drivers: net: cpsw: fix dual EMAC stall when connected to same switch
  xen-netfront: recreate queues correctly when reconnecting
  xen-netfront: fix oops when disconnected from backend
  ...
2014-06-25 21:08:24 -07:00
Tom Herbert
de843723f9 net: fix setting csum_start in skb_segment()
Dave Jones reported that a crash is occurring in

csum_partial
tcp_gso_segment
inet_gso_segment
? update_dl_migration
skb_mac_gso_segment
__skb_gso_segment
dev_hard_start_xmit
sch_direct_xmit
__dev_queue_xmit
? dev_hard_start_xmit
dev_queue_xmit
ip_finish_output
? ip_output
ip_output
ip_forward_finish
ip_forward
ip_rcv_finish
ip_rcv
__netif_receive_skb_core
? __netif_receive_skb_core
? trace_hardirqs_on
__netif_receive_skb
netif_receive_skb_internal
napi_gro_complete
? napi_gro_complete
dev_gro_receive
? dev_gro_receive
napi_gro_receive

It looks like a likely culprit is that SKB_GSO_CB()->csum_start is
not set correctly when doing non-scatter gather. We are using
offset as opposed to doffset.

Reported-by: Dave Jones <davej@redhat.com>
Tested-by: Dave Jones <davej@redhat.com>
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 7e2b10c1e5 ("net: Support for multiple checksums with gso")
Acked-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-25 20:45:54 -07:00
Eric Dumazet
f886497212 ipv4: fix dst race in sk_dst_get()
When IP route cache had been removed in linux-3.6, we broke assumption
that dst entries were all freed after rcu grace period. DST_NOCACHE
dst were supposed to be freed from dst_release(). But it appears
we want to keep such dst around, either in UDP sockets or tunnels.

In sk_dst_get() we need to make sure dst refcount is not 0
before incrementing it, or else we might end up freeing a dst
twice.

DST_NOCACHE set on a dst does not mean this dst can not be attached
to a socket or a tunnel.

Then, before actual freeing, we need to observe a rcu grace period
to make sure all other cpus can catch the fact the dst is no longer
usable.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dormando <dormando@rydia.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-25 17:41:44 -07:00
Tobias Klauser
99e72a0fed net: filter: Use kcalloc/kmalloc_array to allocate arrays
Use kcalloc/kmalloc_array to make it clear we're allocating arrays. No
integer overflow can actually happen here, since len/flen is guaranteed
to be less than BPF_MAXINSNS (4096). However, this changed makes sure
we're not going to get one if BPF_MAXINSNS were ever increased.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-25 16:40:02 -07:00
Tobias Klauser
677a9fd3e6 trivial: net: filter: Change kerneldoc parameter order
Change the order of the parameters to sk_unattached_filter_create() in
the kerneldoc to reflect the order they appear in the actual function.

This fix is only cosmetic, in the generated doc they still appear in the
correct order without the fix.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-25 16:38:54 -07:00
Tobias Klauser
285276e72c trivial: net: filter: Fix typo in comment
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-25 16:38:54 -07:00
Andy Adamson
66b0686049 NFSv4: test SECINFO RPC_AUTH_GSS pseudoflavors for support
Fix nfs4_negotiate_security to create an rpc_clnt used to test each SECINFO
returned pseudoflavor. Check credential creation  (and gss_context creation)
which is important for RPC_AUTH_GSS pseudoflavors which can fail for multiple
reasons including mis-configuration.

Don't call nfs4_negotiate in nfs4_submount as it was just called by
nfs4_proc_lookup_mountpoint (nfs4_proc_lookup_common)

Signed-off-by: Andy Adamson <andros@netapp.com>
[Trond: fix corrupt return value from nfs_find_best_sec()]
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-06-24 18:46:58 -04:00
Li RongQing
916c1689a0 8021q: fix a potential memory leak
skb_cow called in vlan_reorder_header does not free the skb when it failed,
and vlan_reorder_header returns NULL to reset original skb when it is called
in vlan_untag, lead to a memory leak.

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-21 15:12:13 -07:00
David S. Miller
1b0608fd9b Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless
John W. Linville says:

====================
pull request: wireless 2014-06-18

Please pull this batch of fixes intended for the 3.16 stream!

For the Bluetooth bits, Gustavo says:

"This is our first batch of fixes for 3.16. Be aware that two patches here
are not exactly bugfixes:

* 71f28af57066 Bluetooth: Add clarifying comment for conn->auth_type
This commit just add some important security comments to the code, we found
it important enough to include it here for 3.16 since it is security related.

* 9f7ec8871132 Bluetooth: Refactor discovery stopping into its own function
This commit is just a refactor in a preparation for a fix in the next
commit (f8680f128b).

All the other patches are fixes for deadlocks and for the Bluetooth protocols,
most of them related to authentication and encryption."

On top of that...

Chin-Ran Lo fixes a problems with overlapping DMA areas in mwifiex.

Michael Braun corrects a couple of issues in order to enable a new
device in rt2800usb.

Rafał Miłecki reverts a b43 patch that caused a regression, fixes a
Kconfig typo, and corrects a frequency reporting error with the G-PHY.

Stanislaw Grsuzka fixes an rfkill regression for rt2500pci, and avoids
a rt2x00 scheduling while atomic BUG.

Please let me know if there are problems!
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-19 21:32:27 -07:00
Daniel Borkmann
24599e61b7 net: sctp: check proc_dointvec result in proc_sctp_do_auth
When writing to the sysctl field net.sctp.auth_enable, it can well
be that the user buffer we handed over to proc_dointvec() via
proc_sctp_do_auth() handler contains something other than integers.

In that case, we would set an uninitialized 4-byte value from the
stack to net->sctp.auth_enable that can be leaked back when reading
the sysctl variable, and it can unintentionally turn auth_enable
on/off based on the stack content since auth_enable is interpreted
as a boolean.

Fix it up by making sure proc_dointvec() returned sucessfully.

Fixes: b14878ccb7 ("net: sctp: cache auth_enable per endpoint")
Reported-by: Florian Westphal <fwestpha@redhat.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-19 21:30:19 -07:00
Neal Cardwell
2cd0d743b0 tcp: fix tcp_match_skb_to_sack() for unaligned SACK at end of an skb
If there is an MSS change (or misbehaving receiver) that causes a SACK
to arrive that covers the end of an skb but is less than one MSS, then
tcp_match_skb_to_sack() was rounding up pkt_len to the full length of
the skb ("Round if necessary..."), then chopping all bytes off the skb
and creating a zero-byte skb in the write queue.

This was visible now because the recently simplified TLP logic in
bef1909ee3 ("tcp: fixing TLP's FIN recovery") could find that 0-byte
skb at the end of the write queue, and now that we do not check that
skb's length we could send it as a TLP probe.

Consider the following example scenario:

 mss: 1000
 skb: seq: 0 end_seq: 4000  len: 4000
 SACK: start_seq: 3999 end_seq: 4000

The tcp_match_skb_to_sack() code will compute:

 in_sack = false
 pkt_len = start_seq - TCP_SKB_CB(skb)->seq = 3999 - 0 = 3999
 new_len = (pkt_len / mss) * mss = (3999/1000)*1000 = 3000
 new_len += mss = 4000

Previously we would find the new_len > skb->len check failing, so we
would fall through and set pkt_len = new_len = 4000 and chop off
pkt_len of 4000 from the 4000-byte skb, leaving a 0-byte segment
afterward in the write queue.

With this new commit, we notice that the new new_len >= skb->len check
succeeds, so that we return without trying to fragment.

Fixes: adb92db857 ("tcp: Make SACK code to split only at mss boundaries")
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Ilpo Jarvinen <ilpo.jarvinen@helsinki.fi>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-19 20:50:49 -07:00
David S. Miller
8e4946ccdc Revert "net: return actual error on register_queue_kobjects"
This reverts commit d36a4f4b47.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-19 18:12:15 -07:00
Kees Cook
6f9a093b66 net: filter: fix upper BPF instruction limit
The original checks (via sk_chk_filter) for instruction count uses ">",
not ">=", so changing this in sk_convert_filter has the potential to break
existing seccomp filters that used exactly BPF_MAXINSNS many instructions.

Fixes: bd4cf0ed33 ("net: filter: rework/optimize internal BPF interpreter's instruction set")
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org # v3.15+
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-18 17:04:15 -07:00
Daniel Borkmann
ff5e92c1af net: sctp: propagate sysctl errors from proc_do* properly
sysctl handler proc_sctp_do_hmac_alg(), proc_sctp_do_rto_min() and
proc_sctp_do_rto_max() do not properly reflect some error cases
when writing values via sysctl from internal proc functions such
as proc_dointvec() and proc_dostring().

In all these cases we pass the test for write != 0 and partially
do additional work just to notice that additional sanity checks
fail and we return with hard-coded -EINVAL while proc_do*
functions might also return different errors. So fix this up by
simply testing a successful return of proc_do* right after
calling it.

This also allows to propagate its return value onwards to the user.
While touching this, also fix up some minor style issues.

Fixes: 4f3fdf3bc5 ("sctp: add check rto_min and rto_max in sysctl")
Fixes: 3c68198e75 ("sctp: Make hmac algorithm selection for cookie generation dynamic")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-18 17:03:07 -07:00
Jie Liu
d36a4f4b47 net: return actual error on register_queue_kobjects
Return the actual error code if call kset_create_and_add() failed

Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-18 16:58:40 -07:00
David S. Miller
3a3ec1b2ba Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
netfilter fixes for net

The following patchset contains netfilter updates for your net tree,
they are:

1) Fix refcount leak when dumping the dying/unconfirmed conntrack lists,
   from Florian Westphal.

2) Fix crash in NAT when removing a netnamespace, also from Florian.

3) Fix a crash in IPVS when trying to remove an estimator out of the
   sysctl scope, from Julian Anastasov.

4) Add zone attribute to the routing to calculate the message size in
   ctnetlink events, from Ken-ichirou MATSUZAWA.

5) Another fix for the dying/unconfirmed list which was preventing to
   dump more than one memory page of entries (~17 entries in x86_64).

6) Fix missing RCU-safe list insertion in the rule replacement code
   in nf_tables.

7) Since the new transaction infrastructure is in place, we have to
   upgrade the chain use counter from u16 to u32 to avoid overflow
   after more than 2^16 rules are added.

8) Fix refcount leak when replacing rule in nf_tables. This problem
   was also introduced in new transaction.

9) Call the ->destroy() callback when releasing nft-xt rules to fix
   module refcount leaks.

10) Set the family in the netlink messages that contain set elements
    in nf_tables to make it consistent with other object types.

11) Don't dump NAT port information if it is unset in nft_nat.

12) Update the MAINTAINERS file, I have merged the ebtables entry
    into netfilter. While at it, also removed the netfilter users
    mailing list, the development list should be enough.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-18 16:08:40 -07:00
John W. Linville
2ee3f63d39 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem 2014-06-18 14:39:25 -04:00
John W. Linville
7f4dbaa3ae Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth 2014-06-17 14:08:47 -04:00
Dave Jones
17846376f2 tcp: remove unnecessary tcp_sk assignment.
This variable is overwritten by the child socket assignment before
it ever gets used.

Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-16 21:35:00 -07:00
Florian Westphal
945b2b2d25 netfilter: nf_nat: fix oops on netns removal
Quoting Samu Kallio:

 Basically what's happening is, during netns cleanup,
 nf_nat_net_exit gets called before ipv4_net_exit. As I understand
 it, nf_nat_net_exit is supposed to kill any conntrack entries which
 have NAT context (through nf_ct_iterate_cleanup), but for some
 reason this doesn't happen (perhaps something else is still holding
 refs to those entries?).

 When ipv4_net_exit is called, conntrack entries (including those
 with NAT context) are cleaned up, but the
 nat_bysource hashtable is long gone - freed in nf_nat_net_exit. The
 bug happens when attempting to free a conntrack entry whose NAT hash
 'prev' field points to a slot in the freed hash table (head for that
 bin).

We ignore conntracks with null nat bindings.  But this is wrong,
as these are in bysource hash table as well.

Restore nat-cleaning for the netns-is-being-removed case.

bug:
https://bugzilla.kernel.org/show_bug.cgi?id=65191

Fixes: c2d421e171 ('netfilter: nf_nat: fix race when unloading protocol modules')
Reported-by: Samu Kallio <samu.kallio@aberdeencloud.com>
Debugged-by: Samu Kallio <samu.kallio@aberdeencloud.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Tested-by: Samu Kallio <samu.kallio@aberdeencloud.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-06-16 13:58:54 +02:00
Ken-ichirou MATSUZAWA
4a001068d7 netfilter: ctnetlink: add zone size to length
Signed-off-by: Ken-ichirou MATSUZAWA <chamas@h4.dion.ne.jp>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-06-16 13:53:03 +02:00
Pablo Neira Ayuso
98ca74f4d5 Merge branch 'ipvs'
Simon Horman says:

====================
Fix for panic due use of tot_stats estimator outside of CONFIG_SYSCTL

It has been present since v3.6.39.
====================

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-06-16 13:22:33 +02:00
Pablo Neira Ayuso
915136065b netfilter: nft_nat: don't dump port information if unset
Don't include port information attributes if they are unset.

Reported-by: Ana Rey <anarey@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-06-16 13:08:14 +02:00
Pablo Neira Ayuso
6403d96254 netfilter: nf_tables: indicate family when dumping set elements
Set the nfnetlink header that indicates the family of this element.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-06-16 13:08:09 +02:00
Pablo Neira Ayuso
3d9b142131 netfilter: nft_compat: call {target, match}->destroy() to cleanup entry
Otherwise, the reference to external objects (eg. modules) are not
released when the rules are removed.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-06-16 13:08:04 +02:00
Pablo Neira Ayuso
ac904ac835 netfilter: nf_tables: fix wrong type in transaction when replacing rules
In b380e5c ("netfilter: nf_tables: add message type to transactions"),
I used the wrong message type in the rule replacement case. The rule
that is replaced needs to be handled as a deleted rule.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-06-16 13:07:58 +02:00
Pablo Neira Ayuso
ac34b86197 netfilter: nf_tables: decrement chain use counter when replacing rules
Thus, the chain use counter remains with the same value after the
rule replacement.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-06-16 13:07:50 +02:00
Pablo Neira Ayuso
a0a7379e16 netfilter: nf_tables: use u32 for chain use counter
Since 4fefee5 ("netfilter: nf_tables: allow to delete several objects
from a batch"), every new rule bumps the chain use counter. However,
this is limited to 16 bits, which means that it will overrun after
2^16 rules.

Use a u32 chain counter and check for overflows (just like we do for
table objects).

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-06-16 13:07:44 +02:00
Pablo Neira Ayuso
5bc5c30765 netfilter: nf_tables: use RCU-safe list insertion when replacing rules
The patch 5e94846 ("netfilter: nf_tables: add insert operation") did
not include RCU-safe list insertion when replacing rules.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-06-16 13:07:29 +02:00
Florian Westphal
cd5f336f17 netfilter: ctnetlink: fix refcnt leak in dying/unconfirmed list dumper
'last' keeps track of the ct that had its refcnt bumped during previous
dump cycle.  Thus it must not be overwritten until end-of-function.

Another (unrelated, theoretical) issue: Don't attempt to bump refcnt of a conntrack
whose reference count is already 0.  Such conntrack is being destroyed
right now, its memory is freed once we release the percpu dying spinlock.

Fixes: b7779d06 ('netfilter: conntrack: spinlock per cpu to protect special lists.')
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-06-16 12:51:36 +02:00
Pablo Neira Ayuso
266155b2de netfilter: ctnetlink: fix dumping of dying/unconfirmed conntracks
The dumping prematurely stops, it seems the callback argument that
indicates that all entries have been dumped is set after iterating
on the first cpu list. The dumping also may stop before the entire
per-cpu list content is also dumped.

With this patch, conntrack -L dying now shows the dying list content
again.

Fixes: b7779d06 ("netfilter: conntrack: spinlock per cpu to protect special lists.")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-06-16 12:51:35 +02:00
Linus Torvalds
a9be22425e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Fix checksumming regressions, from Tom Herbert.

 2) Undo unintentional permissions changes for SCTP rto_alpha and
    rto_beta sysfs knobs, from Denial Borkmann.

 3) VXLAN, like other IP tunnels, should advertize it's encapsulation
    size using dev->needed_headroom instead of dev->hard_header_len.
    From Cong Wang.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  net: sctp: fix permissions for rto_alpha and rto_beta knobs
  vxlan: Checksum fixes
  net: add skb_pop_rcv_encapsulation
  udp: call __skb_checksum_complete when doing full checksum
  net: Fix save software checksum complete
  net: Fix GSO constants to match NETIF flags
  udp: ipv4: do not waste time in __udp4_lib_mcast_demux_lookup
  vxlan: use dev->needed_headroom instead of dev->hard_header_len
  MAINTAINERS: update cxgb4 maintainer
2014-06-15 16:37:03 -10:00
Daniel Borkmann
b58537a1f5 net: sctp: fix permissions for rto_alpha and rto_beta knobs
Commit 3fd091e73b ("[SCTP]: Remove multiple levels of msecs
to jiffies conversions.") has silently changed permissions for
rto_alpha and rto_beta knobs from 0644 to 0444. The purpose of
this was to discourage users from tweaking rto_alpha and
rto_beta knobs in production environments since they are key
to correctly compute rtt/srtt.

RFC4960 under section 6.3.1. RTO Calculation says regarding
rto_alpha and rto_beta under rule C3 and C4:

  [...]
  C3)  When a new RTT measurement R' is made, set

       RTTVAR <- (1 - RTO.Beta) * RTTVAR + RTO.Beta * |SRTT - R'|

       and

       SRTT <- (1 - RTO.Alpha) * SRTT + RTO.Alpha * R'

       Note: The value of SRTT used in the update to RTTVAR
       is its value before updating SRTT itself using the
       second assignment. After the computation, update
       RTO <- SRTT + 4 * RTTVAR.

  C4)  When data is in flight and when allowed by rule C5
       below, a new RTT measurement MUST be made each round
       trip. Furthermore, new RTT measurements SHOULD be
       made no more than once per round trip for a given
       destination transport address. There are two reasons
       for this recommendation: First, it appears that
       measuring more frequently often does not in practice
       yield any significant benefit [ALLMAN99]; second,
       if measurements are made more often, then the values
       of RTO.Alpha and RTO.Beta in rule C3 above should be
       adjusted so that SRTT and RTTVAR still adjust to
       changes at roughly the same rate (in terms of how many
       round trips it takes them to reflect new values) as
       they would if making only one measurement per
       round-trip and using RTO.Alpha and RTO.Beta as given
       in rule C3. However, the exact nature of these
       adjustments remains a research issue.
  [...]

While it is discouraged to adjust rto_alpha and rto_beta
and not further specified how to adjust them, the RFC also
doesn't explicitly forbid it, but rather gives a RECOMMENDED
default value (rto_alpha=3, rto_beta=2). We have a couple
of users relying on the old permissions before they got
changed. That said, if someone really has the urge to adjust
them, we could allow it with a warning in the log.

Fixes: 3fd091e73b ("[SCTP]: Remove multiple levels of msecs to jiffies conversions.")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-15 01:17:32 -07:00
Tom Herbert
46fb51eb96 net: Fix save software checksum complete
Geert reported issues regarding checksum complete and UDP.
The logic introduced in commit 7e3cead517
("net: Save software checksum complete") is not correct.

This patch:
1) Restores code in __skb_checksum_complete_header except for setting
   CHECKSUM_UNNECESSARY. This function may be calculating checksum on
   something less than skb->len.
2) Adds saving checksum to __skb_checksum_complete. The full packet
   checksum 0..skb->len is calculated without adding in pseudo header.
   This value is saved in skb->csum and then the pseudo header is added
   to that to derive the checksum for validation.
3) In both __skb_checksum_complete_header and __skb_checksum_complete,
   set skb->csum_valid to whether checksum of zero was computed. This
   allows skb_csum_unnecessary to return true without changing to
   CHECKSUM_UNNECESSARY which was done previously.
4) Copy new csum related bits in __copy_skb_header.

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-15 01:00:49 -07:00
Eric Dumazet
63c6f81cdd udp: ipv4: do not waste time in __udp4_lib_mcast_demux_lookup
Its too easy to add thousand of UDP sockets on a particular bucket,
and slow down an innocent multicast receiver.

Early demux is supposed to be an optimization, we should avoid spending
too much time in it.

It is interesting to note __udp4_lib_demux_lookup() only tries to
match first socket in the chain.

10 is the threshold we already have in __udp4_lib_lookup() to switch
to secondary hash.

Fixes: 421b3885bf ("udp: ipv4: Add udp early demux")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: David Held <drheld@google.com>
Cc: Shawn Bohrer <sbohrer@rgmadvisors.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-13 15:39:24 -07:00
Marcin Kraglak
92d1372e1a Bluetooth: Allow change security level on ATT_CID in slave role
Kernel supports SMP Security Request so don't block increasing security
when we are slave.

Signed-off-by: Marcin Kraglak <marcin.kraglak@tieto.com>
Acked-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Cc: stable@vger.kernel.org
2014-06-13 14:36:39 +02:00
Johan Hedberg
c73f94b8c0 Bluetooth: Fix locking of hdev when calling into SMP code
The SMP code expects hdev to be unlocked since e.g. crypto functions
will try to (re)lock it. Therefore, we need to release the lock before
calling into smp.c from mgmt.c. Without this we risk a deadlock whenever
the smp_user_confirm_reply() function is called.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Tested-by: Lukasz Rymanowski <lukasz.rymanowski@tieto.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Cc: stable@vger.kernel.org
2014-06-13 13:32:29 +02:00