Commit Graph

23369 Commits

Author SHA1 Message Date
James Morris 898bfc1d46 Linux 3.4-rc5
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.18 (GNU/Linux)
 
 iQEcBAABAgAGBQJPnb50AAoJEHm+PkMAQRiGAE0H/A4zFZIUGmF3miKPDYmejmrZ
 oVDYxVAu6JHjHWhu8E3VsinvyVscowjV8dr15eSaQzmDmRkSHAnUQ+dB7Di7jLC2
 MNopxsWjwyZ8zvvr3rFR76kjbWKk/1GYytnf7GPZLbJQzd51om2V/TY/6qkwiDSX
 U8Tt7ihSgHAezefqEmWp2X/1pxDCEt+VFyn9vWpkhgdfM1iuzF39MbxSZAgqDQ/9
 JJrBHFXhArqJguhENwL7OdDzkYqkdzlGtS0xgeY7qio2CzSXxZXK4svT6FFGA8Za
 xlAaIvzslDniv3vR2ZKd6wzUwFHuynX222hNim3QMaYdXm012M+Nn1ufKYGFxI0=
 =4d4w
 -----END PGP SIGNATURE-----

Merge tag 'v3.4-rc5' into next

Linux 3.4-rc5

Merge to pull in prerequisite change for Smack:
86812bb0de

Requested by Casey.
2012-05-04 12:46:40 +10:00
Ansis Atteka 4cb6e116bb openvswitch: Release rtnl_lock if ovs_vport_cmd_build_info() failed.
This patch fixes a possible lock-up bug where rtnl_lock might not
get released.

Signed-off-by: Ansis Atteka <aatteka@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
2012-05-03 18:40:38 -07:00
Linus Torvalds c42f1d4b52 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Transfer padding was wrong for full-speed USB in ASIX driver, fix
    from Ingo van Lil.

 2) Propagate the negative packet offset fix into the PowerPC BPF JIT.
    From Jan Seiffert.

 3) dl2k driver's private ioctls were letting unprivileged tasks make
    MII writes and other ugly bits like that.  Fix from Jeff Mahoney.

 4) Fix TX VLAN and RX packet drops in ucc_geth, from Joakim Tjernlund.

 5) OOPS and network namespace fixes in IPVS from Hans Schillstrom and
    Julian Anastasov.

 6) Fix races and sleeping in locked context bugs in drop_monitor, from
    Neil Horman.

 7) Fix link status indication in smsc95xx driver, from Paolo Pisati.

 8) Fix bridge netfilter OOPS, from Peter Huang.

 9) L2TP sendmsg can return on error conditions with the socket lock
    held, oops.  Fix from Sasha Levin.

10) udp_diag should return meaningful values for socket memory usage,
    from Shan Wei.

11) Eric Dumazet is so awesome he gets his own section:

       Socket memory cgroup code (I never should have applied those
       patches, grumble...) made erroneous changes to
       sk_sockets_allocated_read_positive().  It was changed to
       use percpu_counter_sum_positive (which requires BH disabling)
       instead of percpu_counter_read_positive (which does not).
       Revert back to avoid crashes and lockdep warnings.

       Adjust the default tcp_adv_win_scale and tcp_rmem[2] values
       to fix throughput regressions.  This is necessary as a result
       of our more precise skb->truesize tracking.

       Fix SKB leak in netem packet scheduler.

12) New device IDs for various bluetooth devices, from Manoj Iyer,
    AceLan Kao, and Steven Harms.

13) Fix command completion race in ipw2200, from Stanislav Yakovlev.

14) Fix rtlwifi oops on unload, from Larry Finger.

15) Fix hard_mtu when adjusting hard_header_len in smsc95xx driver.
    From Stephane Fillod.

16) ehea driver registers it's IRQ before all the necessary state is
    setup, resulting in crashes.  Fix from Thadeu Lima de Souza
    Cascardo.

17) Fix PHY connection failures in davinci_emac driver, from Anatolij
    Gustschin.

18) Missing break; in switch statement in bluetooth's
    hci_cmd_complete_evt().  Fix from Szymon Janc.

19) Fix queue programming in iwlwifi, from Johannes Berg.

20) Interrupt throttling defaults not being actually programmed into the
    hardware, fix from Jeff Kirsher and Ying Cai.

21) TLAN driver SKB encoding in descriptor busted on 64-bit, fix from
    Benjamin Poirier.

22) Fix blind status block RX producer pointer deref in TG3 driver, from
    Matt Carlson.

23) Promisc and multicast are busted on ehea, fixes from Thadeu Lima de
    Souza Cascardo.

24) Fix crashes in 6lowpan, from Alexander Smirnov.

25) tcp_complete_cwr() needs to be careful to not rewind the CWND to
    ssthresh if ssthresh has the "infinite" value.  Fix from Yuchung
    Cheng.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (81 commits)
  sungem: Fix WakeOnLan
  tcp: change tcp_adv_win_scale and tcp_rmem[2]
  net: l2tp: unlock socket lock before returning from l2tp_ip_sendmsg
  drop_monitor: prevent init path from scheduling on the wrong cpu
  usbnet: fix failure handling in usbnet_probe
  usbnet: fix leak of transfer buffer of dev->interrupt
  ucc_geth: Add 16 bytes to max TX frame for VLANs
  net: ucc_geth, increase no. of HW RX descriptors
  netem: fix possible skb leak
  sky2: fix receive length error in mixed non-VLAN/VLAN traffic
  sky2: propogate rx hash when packet is copied
  net: fix two typos in skbuff.h
  cxgb3: Don't call cxgb_vlan_mode until q locks are initialized
  ixgbe: fix calling skb_put on nonlinear skb assertion bug
  ixgbe: Fix a memory leak in IEEE DCB
  igbvf: fix the bug when initializing the igbvf
  smsc75xx: enable mac to detect speed/duplex from phy
  smsc75xx: declare smsc75xx's MII as GMII capable
  smsc75xx: fix phy interrupt acknowledge
  smsc75xx: fix phy init reset loop
  ...
2012-05-03 17:10:39 -07:00
Alexander Duyck 3a7c1ee4ab skb: Add skb_head_is_locked helper function
This patch adds support for a skb_head_is_locked helper function.  It is
meant to be used any time we are considering transferring the head from
skb->head to a paged frag.  If the head is locked it means we cannot remove
the head from the skb so it must be copied or we must take the skb as a
whole.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-03 13:18:37 -04:00
Eric Dumazet 715dc1f342 net: Fix truesize accounting in skb_gro_receive()
GRO is very optimistic in skb truesize estimates, only taking into
account the used part of fragments.

Be conservative, and use more precise computation, so that bloated GRO
skbs can be collapsed eventually.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-03 13:18:37 -04:00
Steve Dickson 7bdf7415a6 auth_gss: the list of pseudoflavors not being parsed correctly
gss_mech_list_pseudoflavors() parses a list of registered mechanisms.
On that list contains a list of pseudo flavors which was not being
parsed correctly, causing only the first pseudo flavor to be found.

Signed-off-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-03 12:35:33 -04:00
Eric W. Biederman 76b6db0102 userns: Replace user_ns_map_uid and user_ns_map_gid with from_kuid and from_kgid
These function are no longer needed replace them with their more useful equivalents.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-05-03 03:28:39 -07:00
Eric W. Biederman ae2975bc34 userns: Convert group_info values from gid_t to kgid_t.
As a first step to converting struct cred to be all kuid_t and kgid_t
values convert the group values stored in group_info to always be
kgid_t values.   Unless user namespaces are used this change should
have no effect.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-05-03 03:27:21 -07:00
Alexander Duyck 34a802a5b9 tcp: move stats merge to the end of tcp_try_coalesce
This change cleans up the last bits of tcp_try_coalesce so that we only
need one goto which jumps to the end of the function.  The idea is to make
the code more readable by putting things in a linear order so that we start
execution at the top of the function, and end it at the bottom.

I also made a slight tweak to the code for handling frags when we are a
clone.  Instead of making it an if (clone) loop else nr_frags = 0 I changed
the logic so that if (!clone) we just set the number of frags to 0 which
disables the for loop anyway.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-03 04:21:33 -04:00
Alexander Duyck 57b55a7ec6 tcp: Move code related to head frag in tcp_try_coalesce
This change reorders the code related to the use of an skb->head_frag so it
is placed before we check the rest of the frags.  This allows the code to
read more linearly instead of like some sort of loop.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-03 04:21:33 -04:00
Alexander Duyck c73c3d9c49 tcp: Fix truesize accounting in tcp_try_coalesce
This patch addresses several issues in the way we were tracking the
truesize in tcp_try_coalesce.

First it was using ksize which prevents us from having a 0 sized head frag
and getting a usable result.  To resolve that this patch uses the end
pointer which is set based off either ksize, or the frag_size supplied in
build_skb.  This allows us to compute the original truesize of the entire
buffer and remove that value leaving us with just what was added as pages.

The second issue was the use of skb->len if there is a mergeable head frag.
We should only need to remove the size of an data aligned sk_buff from our
current skb->truesize to compute the delta for a buffer with a reused head.
By using skb->len the value of truesize was being artificially reduced
which means that head frags could use more memory than buffers using
standard allocations.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-03 04:21:33 -04:00
David S. Miller 8c1ae10d79 net: Add missing linux/prefetch.h include to net/core/sock.c
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-03 02:25:55 -04:00
Alexander Duyck 2996d31f9f net: Stop decapitating clones that have a head_frag
This change is meant ot prevent stealing the skb->head to use as a page in
the event that the skb->head was cloned.  This allows the other clones to
track each other via shinfo->dataref.

Without this we break down to two methods for tracking the reference count,
one being dataref, the other being the page count.  As a result it becomes
difficult to track how many references there are to skb->head.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-03 01:34:37 -04:00
Eric Dumazet b081f85c29 net: implement tcp coalescing in tcp_queue_rcv()
Extend tcp coalescing implementing it from tcp_queue_rcv(), the main
receiver function when application is not blocked in recvmsg().

Function tcp_queue_rcv() is moved a bit to allow its call from
tcp_data_queue()

This gives good results especially if GRO could not kick, and if skb
head is a fragment.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-02 21:11:11 -04:00
Eric Dumazet 923dd347b8 net: take care of cloned skbs in tcp_try_coalesce()
Before stealing fragments or skb head, we must make sure skbs are not
cloned.

Alexander was worried about destination skb being cloned : In bridge
setups, a driver could be fooled if skb->data_len would not match skb
nr_frags.

If source skb is cloned, we must take references on pages instead.

Bug happened using tcpdump (if not using mmap())

Introduce kfree_skb_partial() helper to cleanup code.

Reported-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-02 21:11:11 -04:00
Eric Dumazet b49960a05e tcp: change tcp_adv_win_scale and tcp_rmem[2]
tcp_adv_win_scale default value is 2, meaning we expect a good citizen
skb to have skb->len / skb->truesize ratio of 75% (3/4)

In 2.6 kernels we (mis)accounted for typical MSS=1460 frame :
1536 + 64 + 256 = 1856 'estimated truesize', and 1856 * 3/4 = 1392.
So these skbs were considered as not bloated.

With recent truesize fixes, a typical MSS=1460 frame truesize is now the
more precise :
2048 + 256 = 2304. But 2304 * 3/4 = 1728.
So these skb are not good citizen anymore, because 1460 < 1728

(GRO can escape this problem because it build skbs with a too low
truesize.)

This also means tcp advertises a too optimistic window for a given
allocated rcvspace : When receiving frames, sk_rmem_alloc can hit
sk_rcvbuf limit and we call tcp_prune_queue()/tcp_collapse() too often,
especially when application is slow to drain its receive queue or in
case of losses (netperf is fast, scp is slow). This is a major latency
source.

We should adjust the len/truesize ratio to 50% instead of 75%

This patch :

1) changes tcp_adv_win_scale default to 1 instead of 2

2) increase tcp_rmem[2] limit from 4MB to 6MB to take into account
better truesize tracking and to allow autotuning tcp receive window to
reach same value than before. Note that same amount of kernel memory is
consumed compared to 2.6 kernels.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-02 21:08:58 -04:00
Sasha Levin 84768edbb2 net: l2tp: unlock socket lock before returning from l2tp_ip_sendmsg
l2tp_ip_sendmsg could return without releasing socket lock, making it all the
way to userspace, and generating the following warning:

[  130.891594] ================================================
[  130.894569] [ BUG: lock held when returning to user space! ]
[  130.897257] 3.4.0-rc5-next-20120501-sasha #104 Tainted: G        W
[  130.900336] ------------------------------------------------
[  130.902996] trinity/8384 is leaving the kernel with locks still held!
[  130.906106] 1 lock held by trinity/8384:
[  130.907924]  #0:  (sk_lock-AF_INET){+.+.+.}, at: [<ffffffff82b9503f>] l2tp_ip_sendmsg+0x2f/0x550

Introduced by commit 2f16270 ("l2tp: Fix locking in l2tp_ip.c").

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-02 21:04:33 -04:00
Neil Horman 4fdcfa1284 drop_monitor: prevent init path from scheduling on the wrong cpu
I just noticed after some recent updates, that the init path for the drop
monitor protocol has a minor error.  drop monitor maintains a per cpu structure,
that gets initalized from a single cpu.  Normally this is fine, as the protocol
isn't in use yet, but I recently made a change that causes a failed skb
allocation to reschedule itself .  Given the current code, the implication is
that this workqueue reschedule will take place on the wrong cpu.  If drop
monitor is used early during the boot process, its possible that two cpus will
access a single per-cpu structure in parallel, possibly leading to data
corruption.

This patch fixes the situation, by storing the cpu number that a given instance
of this per-cpu data should be accessed from.  In the case of a need for a
reschedule, the cpu stored in the struct is assigned the rescheule, rather than
the currently executing cpu

Tested successfully by myself.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: David Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-02 21:02:48 -04:00
Yuchung Cheng 750ea2bafa tcp: early retransmit: delayed fast retransmit
Implementing the advanced early retransmit (sysctl_tcp_early_retrans==2).
Delays the fast retransmit by an interval of RTT/4. We borrow the
RTO timer to implement the delay. If we receive another ACK or send
a new packet, the timer is cancelled and restored to original RTO
value offset by time elapsed.  When the delayed-ER timer fires,
we enter fast recovery and perform fast retransmit.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-02 20:56:10 -04:00
Yuchung Cheng eed530b6c6 tcp: early retransmit
This patch implements RFC 5827 early retransmit (ER) for TCP.
It reduces DUPACK threshold (dupthresh) if outstanding packets are
less than 4 to recover losses by fast recovery instead of timeout.

While the algorithm is simple, small but frequent network reordering
makes this feature dangerous: the connection repeatedly enter
false recovery and degrade performance. Therefore we implement
a mitigation suggested in the appendix of the RFC that delays
entering fast recovery by a small interval, i.e., RTT/4. Currently
ER is conservative and is disabled for the rest of the connection
after the first reordering event. A large scale web server
experiment on the performance impact of ER is summarized in
section 6 of the paper "Proportional Rate Reduction for TCP”,
IMC 2011. http://conferences.sigcomm.org/imc/2011/docs/p155.pdf

Note that Linux has a similar feature called THIN_DUPACK. The
differences are THIN_DUPACK do not mitigate reorderings and is only
used after slow start. Currently ER is disabled if THIN_DUPACK is
enabled. I would be happy to merge THIN_DUPACK feature with ER if
people think it's a good idea.

ER is enabled by sysctl_tcp_early_retrans:
  0: Disables ER

  1: Reduce dupthresh to packets_out - 1 when outstanding packets < 4.

  2: (Default) reduce dupthresh like mode 1. In addition, delay
     entering fast recovery by RTT/4.

Note: mode 2 is implemented in the third part of this patch series.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-02 20:56:10 -04:00
Yuchung Cheng 1fbc340514 tcp: early retransmit: tcp_enter_recovery()
This a prepartion patch that refactors the code to enter recovery
into a new function tcp_enter_recovery(). It's needed to implement
the delayed fast retransmit in ER.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-02 20:56:09 -04:00
John W. Linville 076e7779c0 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem 2012-05-01 14:14:05 -04:00
Eric Dumazet 116a0fc31c netem: fix possible skb leak
skb_checksum_help(skb) can return an error, we must free skb in this
case. qdisc_drop(skb, sch) can also be feeded with a NULL skb (if
skb_unshare() failed), so lets use this generic helper.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-01 13:40:48 -04:00
Eric Dumazet e4ae004b84 netem: add ECN capability
Add ECN (Explicit Congestion Notification) marking capability to netem

tc qdisc add dev eth0 root netem drop 0.5 ecn

Instead of dropping packets, try to ECN mark them.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Hagen Paul Pfeifer <hagen@jauu.net>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Hagen Paul Pfeifer <hagen@jauu.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-01 09:39:48 -04:00
Eric Dumazet e4cbb02a10 net: add a prefetch in socket backlog processing
TCP or UDP stacks have big enough latencies that prefetching next
pointer is worth it.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-01 09:39:48 -04:00
James Chapman 5dac94e109 l2tp: let iproute2 create L2TPv3 IP tunnels using IPv6
The netlink API lets users create unmanaged L2TPv3 tunnels using
iproute2. Until now, a request to create an unmanaged L2TPv3 IP
encapsulation tunnel over IPv6 would be rejected with
EPROTONOSUPPORT. Now that l2tp_ip6 implements sockets for L2TP IP
encapsulation over IPv6, we can add support for that tunnel type.

Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-01 09:30:55 -04:00
Chris Elston a32e0eec70 l2tp: introduce L2TPv3 IP encapsulation support for IPv6
L2TPv3 defines an IP encapsulation packet format where data is carried
directly over IP (no UDP). The kernel already has support for L2TP IP
encapsulation over IPv4 (l2tp_ip). This patch introduces support for
L2TP IP encapsulation over IPv6.

The implementation is derived from ipv6/raw and ipv4/l2tp_ip.

Signed-off-by: Chris Elston <celston@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-01 09:30:55 -04:00
Chris Elston a495f8364e ipv6: Export ipv6 functions for use by other protocols
For implementing other protocols on top of IPv6, such as L2TPv3's IP
encapsulation over ipv6, we'd like to call some IPv6 functions which
are not currently exported. This patch exports them.

Signed-off-by: Chris Elston <celston@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-01 09:30:55 -04:00
Chris Elston f9bac8df90 l2tp: netlink api for l2tpv3 ipv6 unmanaged tunnels
This patch adds support for unmanaged L2TPv3 tunnels over IPv6 using
the netlink API. We already support unmanaged L2TPv3 tunnels over
IPv4. A patch to iproute2 to make use of this feature will be
submitted separately.

Signed-off-by: Chris Elston <celston@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-01 09:30:55 -04:00
Chris Elston 2121c3f571 l2tp: show IPv6 addresses in l2tp debugfs file
If an L2TP tunnel uses IPv6, make sure the l2tp debugfs file shows the
IPv6 address correctly.

Signed-off-by: Chris Elston <celston@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-01 09:30:55 -04:00
James Chapman b79585f537 l2tp: pppol2tp_connect() handles ipv6 sockaddr variants
Userspace uses connect() to associate a pppol2tp socket with a tunnel
socket. This needs to allow the caller to supply the new IPv6
sockaddr_pppol2tp structures if IPv6 is used.

Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-01 09:30:55 -04:00
James Chapman c8657fd50a l2tp: remove unused stats from l2tp_ip socket
The l2tp_ip socket currently maintains packet/byte stats in its
private socket structure. But these counters aren't exposed to
userspace and so serve no purpose. The counters were also
smp-unsafe. So this patch just gets rid of the stats.

While here, change a couple of internal __u32 variables to u32.

Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-01 09:30:54 -04:00
James Chapman de3c7a1827 l2tp: Use ip4_datagram_connect() in l2tp_ip_connect()
Cleanup the l2tp_ip code to make use of an existing ipv4 support function.

Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-01 09:30:54 -04:00
James Chapman 5de7aee541 l2tp: fix locking of 64-bit counters for smp
L2TP uses 64-bit counters but since these are not updated atomically,
we need to make them safe for smp. This patch addresses that.

Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-01 09:30:54 -04:00
Henrik Rydberg 8215d557e5 HID: Create a common generic driver
Move the hid drivers of the bus drivers to a common generic hid
driver, and make it a proper module. This ought to simplify device
handling moving forward.

Cc: Gustavo Padovan <gustavo@padovan.org>
Signed-off-by: Henrik Rydberg <rydberg@euromail.se>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-05-01 12:54:55 +02:00
Henrik Rydberg 070748ed0b HID: Create a generic device group
Devices that do not have a special driver are handled by the generic
driver. This patch does the same thing using device groups; Instead of
forcing a particular driver, the appropriate driver is picked up by
udev. As a consequence, one can now move a device from generic to
specific handling by a simple rebind. By adding a new device id to the
generic driver, the same thing can be done in reverse.

Signed-off-by: Henrik Rydberg <rydberg@euromail.se>
Acked-by: Benjamin Tissoires <benjamin.tissoires@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-05-01 12:54:55 +02:00
David S. Miller b6d151bb82 Merge branch 'tipc_net-next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux 2012-04-30 21:42:30 -04:00
Eric Dumazet 1d0c0b328a net: makes skb_splice_bits() aware of skb->head_frag
__skb_splice_bits() can check if skb to be spliced has its skb->head
mapped to a page fragment, instead of a kmalloc() area.

If so we can avoid a copy of the skb head and get a reference on
underlying page.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Cc: Matt Carlson <mcarlson@broadcom.com>
Cc: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-30 21:35:50 -04:00
Eric Dumazet 329033f645 tcp: makes tcp_try_coalesce aware of skb->head_frag
TCP coalesce can check if skb to be merged has its skb->head mapped to a
page fragment, instead of a kmalloc() area.

We had to disable coalescing in this case, for performance reasons.

We 'upgrade' skb->head as a fragment in itself.

This reduces number of cache misses when user makes its copies, since a
less sk_buff are fetched.

This makes receive and ofo queues shorter and thus reduce cache line
misses in TCP stack.

This is a followup of patch "net: allow skb->head to be a page fragment"

Tested with tg3 nic, with GRO on or off. We can see "TCPRcvCoalesce"
counter being incremented.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Cc: Matt Carlson <mcarlson@broadcom.com>
Cc: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-30 21:35:49 -04:00
Eric Dumazet d7e8883cfc net: make GRO aware of skb->head_frag
GRO can check if skb to be merged has its skb->head mapped to a page
fragment, instead of a kmalloc() area.

We 'upgrade' skb->head as a fragment in itself

This avoids the frag_list fallback, and permits to build true GRO skb
(one sk_buff and up to 16 fragments), using less memory.

This reduces number of cache misses when user makes its copy, since a
single sk_buff is fetched.

This is a followup of patch "net: allow skb->head to be a page fragment"

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Cc: Matt Carlson <mcarlson@broadcom.com>
Cc: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-30 21:35:49 -04:00
Eric Dumazet d3836f21b0 net: allow skb->head to be a page fragment
skb->head is currently allocated from kmalloc(). This is convenient but
has the drawback the data cannot be converted to a page fragment if
needed.

We have three spots were it hurts :

1) GRO aggregation

 When a linear skb must be appended to another skb, GRO uses the
frag_list fallback, very inefficient since we keep all struct sk_buff
around. So drivers enabling GRO but delivering linear skbs to network
stack aren't enabling full GRO power.

2) splice(socket -> pipe).

 We must copy the linear part to a page fragment.
 This kind of defeats splice() purpose (zero copy claim)

3) TCP coalescing.

 Recently introduced, this permits to group several contiguous segments
into a single skb. This shortens queue lengths and save kernel memory,
and greatly reduce probabilities of TCP collapses. This coalescing
doesnt work on linear skbs (or we would need to copy data, this would be
too slow)

Given all these issues, the following patch introduces the possibility
of having skb->head be a fragment in itself. We use a new skb flag,
skb->head_frag to carry this information.

build_skb() is changed to accept a frag_size argument. Drivers willing
to provide a page fragment instead of kmalloc() data will set a non zero
value, set to the fragment size.

Then, on situations we need to convert the skb head to a frag in itself,
we can check if skb->head_frag is set and avoid the copies or various
fallbacks we have.

This means drivers currently using frags could be updated to avoid the
current skb->head allocation and reduce their memory footprint (aka skb
truesize). (thats 512 or 1024 bytes saved per skb). This also makes
bpf/netfilter faster since the 'first frag' will be part of skb linear
part, no need to copy data.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Cc: Matt Carlson <mcarlson@broadcom.com>
Cc: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-30 21:35:11 -04:00
Paul Gortmaker 617d3c7a50 tipc: compress out gratuitous extra carriage returns
Some of the comment blocks are floating in limbo between two
functions, or between blocks of code.  Delete the extra line
feeds between any comment and its associated following block
of code, to be consistent with the majority of the rest of
the kernel.  Also delete trailing newlines at EOF and fix
a couple trivial typos in existing comments.

This is a 100% cosmetic change with no runtime impact.  We get
rid of over 500 lines of non-code, and being blank line deletes,
they won't even show up as noise in git blame.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-04-30 15:53:56 -04:00
Felix Fietkau 66f2c99af3 mac80211: fix AP mode EAP tx for VLAN stations
EAP frames for stations in an AP VLAN are sent on the main AP interface
to avoid race conditions wrt. moving stations.
For that to work properly, sta_info_get_bss must be used instead of
sta_info_get when sending EAP packets.
Previously this was only done for cooked monitor injected packets, so
this patch adds a check for tx->skb->protocol to the same place.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Cc: stable@vger.kernel.org
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-30 14:40:05 -04:00
Yuchung Cheng 1cebce36d6 tcp: fix infinite cwnd in tcp_complete_cwr()
When the cwnd reduction is done, ssthresh may be infinite
if TCP enters CWR via ECN or F-RTO. If cwnd is not undone, i.e.,
undo_marker is set, tcp_complete_cwr() falsely set cwnd to the
infinite ssthresh value. The correct operation is to keep cwnd
intact because it has been updated in ECN or F-RTO.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-30 13:44:39 -04:00
Herbert Xu bb63f1f8a0 bridge: Fix fatal typo in setup of multicast_querier_expired
Unfortunately it seems that I didn't properly test the case of
an expired external querier in the recent multicast bridge series.

The setup of the timer in that case is completely broken and leads
to a NULL-pointer dereference.  This patch fixes it.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-30 13:30:56 -04:00
David S. Miller 5414fc12e3 Merge branch 'master' of git://1984.lsi.us.es/net 2012-04-30 13:23:22 -04:00
David S. Miller d499bd2ee9 l2tp: Add missing net/net/ip6_checksum.h include.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-30 13:21:28 -04:00
Trond Myklebust cbbb34498f SUNRPC: RPC client must use the current utsname hostname string
Now that the rpc client is namespace aware, it needs to use the
utsname of the process that created it instead of using the
init_utsname. Both rpc_new_client and rpc_clone_client need to
be fixed.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
2012-04-30 11:58:51 -04:00
Pablo Neira Ayuso 6cf5185248 netfilter: xt_CT: fix wrong checking in the timeout assignment path
The current checking always succeeded. We have to check the first
character of the string to check that it's empty, thus, skipping
the timeout path.

This fixes the use of the CT target without the timeout option.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-04-30 10:40:36 +02:00
Hans Schillstrom 8537de8a7a ipvs: kernel oops - do_ip_vs_get_ctl
Change order of init so netns init is ready
when register ioctl and netlink.

Ver2
	Whitespace fixes and __init added.

Reported-by: "Ryan O'Hara" <rohara@redhat.com>
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-04-30 10:40:35 +02:00
Hans Schillstrom 582b8e3ead ipvs: take care of return value from protocol init_netns
ip_vs_create_timeout_table() can return NULL
All functions protocol init_netns is affected of this patch.

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-04-30 10:40:35 +02:00
Hans Schillstrom 4b984cd50b ipvs: null check of net->ipvs in lblc(r) shedulers
Avoid crash when registering shedulers after
the IPVS core initialization for netns fails. Do this by
checking for present core (net->ipvs).

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-04-30 10:40:14 +02:00
Benjamin LaHaise d2cf336167 net/l2tp: add support for L2TP over IPv6 UDP
Now that encap_rcv() works on IPv6 UDP sockets, wire L2TP up to IPv6.
Support has been tested with and without hardware offloading.  This
version fixes the L2TP over localhost issue with incorrect checksums
being reported.

Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-28 22:21:51 -04:00
Benjamin LaHaise d7f3f62167 net/ipv6/udp: UDP encapsulation: introduce encap_rcv hook into IPv6
Now that the sematics of udpv6_queue_rcv_skb() match IPv4's
udp_queue_rcv_skb(), introduce the UDP encap_rcv() hook for IPv6.

Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-28 22:21:51 -04:00
Benjamin LaHaise cb80ef463d net/ipv6/udp: UDP encapsulation: move socket locking into udpv6_queue_rcv_skb()
In order to make sure that when the encap_rcv() hook is introduced it is
not called with the socket lock held, move socket locking from callers into
udpv6_queue_rcv_skb(), matching what happens in IPv4.

Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-28 22:21:51 -04:00
Benjamin LaHaise f7ad74fef3 net/ipv6/udp: UDP encapsulation: break backlog_rcv into __udpv6_queue_rcv_skb
This is the first step in reworking the IPv6 UDP code to be structured more
like the IPv4 UDP code.  This patch creates __udpv6_queue_rcv_skb() with
the equivalent sematics to __udp_queue_rcv_skb(), and wires it up to the
backlog_rcv method.

Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-28 22:21:50 -04:00
Jeffrin Jose cb75a36c8a net: Fixed a coding style issue related to spaces.
Fixed a coding style issue relating to spaces
in net/core/sock.c

Signed-off-by: Jeffrin Jose <ahiliation@yahoo.co.in>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-28 21:45:00 -04:00
Neil Horman 3885ca785a drop_monitor: Make updating data->skb smp safe
Eric Dumazet pointed out to me that the drop_monitor protocol has some holes in
its smp protections.  Specifically, its possible to replace data->skb while its
being written.  This patch corrects that by making data->skb an rcu protected
variable.  That will prevent it from being overwritten while a tracepoint is
modifying it.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: David Miller <davem@davemloft.net>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-28 02:18:48 -04:00
Neil Horman cde2e9a651 drop_monitor: fix sleeping in invalid context warning
Eric Dumazet pointed out this warning in the drop_monitor protocol to me:

[   38.352571] BUG: sleeping function called from invalid context at kernel/mutex.c:85
[   38.352576] in_atomic(): 1, irqs_disabled(): 0, pid: 4415, name: dropwatch
[   38.352580] Pid: 4415, comm: dropwatch Not tainted 3.4.0-rc2+ #71
[   38.352582] Call Trace:
[   38.352592]  [<ffffffff8153aaf0>] ? trace_napi_poll_hit+0xd0/0xd0
[   38.352599]  [<ffffffff81063f2a>] __might_sleep+0xca/0xf0
[   38.352606]  [<ffffffff81655b16>] mutex_lock+0x26/0x50
[   38.352610]  [<ffffffff8153aaf0>] ? trace_napi_poll_hit+0xd0/0xd0
[   38.352616]  [<ffffffff810b72d9>] tracepoint_probe_register+0x29/0x90
[   38.352621]  [<ffffffff8153a585>] set_all_monitor_traces+0x105/0x170
[   38.352625]  [<ffffffff8153a8ca>] net_dm_cmd_trace+0x2a/0x40
[   38.352630]  [<ffffffff8154a81a>] genl_rcv_msg+0x21a/0x2b0
[   38.352636]  [<ffffffff810f8029>] ? zone_statistics+0x99/0xc0
[   38.352640]  [<ffffffff8154a600>] ? genl_rcv+0x30/0x30
[   38.352645]  [<ffffffff8154a059>] netlink_rcv_skb+0xa9/0xd0
[   38.352649]  [<ffffffff8154a5f0>] genl_rcv+0x20/0x30
[   38.352653]  [<ffffffff81549a7e>] netlink_unicast+0x1ae/0x1f0
[   38.352658]  [<ffffffff81549d76>] netlink_sendmsg+0x2b6/0x310
[   38.352663]  [<ffffffff8150824f>] sock_sendmsg+0x10f/0x130
[   38.352668]  [<ffffffff8150abe0>] ? move_addr_to_kernel+0x60/0xb0
[   38.352673]  [<ffffffff81515f04>] ? verify_iovec+0x64/0xe0
[   38.352677]  [<ffffffff81509c46>] __sys_sendmsg+0x386/0x390
[   38.352682]  [<ffffffff810ffaf9>] ? handle_mm_fault+0x139/0x210
[   38.352687]  [<ffffffff8165b5bc>] ? do_page_fault+0x1ec/0x4f0
[   38.352693]  [<ffffffff8106ba4d>] ? set_next_entity+0x9d/0xb0
[   38.352699]  [<ffffffff81310b49>] ? tty_ldisc_deref+0x9/0x10
[   38.352703]  [<ffffffff8106d363>] ? pick_next_task_fair+0x63/0x140
[   38.352708]  [<ffffffff8150b8d4>] sys_sendmsg+0x44/0x80
[   38.352713]  [<ffffffff8165f8e2>] system_call_fastpath+0x16/0x1b

It stems from holding a spinlock (trace_state_lock) while attempting to register
or unregister tracepoint hooks, making in_atomic() true in this context, leading
to the warning when the tracepoint calls might_sleep() while its taking a mutex.
Since we only use the trace_state_lock to prevent trace protocol state races, as
well as hardware stat list updates on an rcu write side, we can just convert the
spinlock to a mutex to avoid this problem.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: David Miller <davem@davemloft.net>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-28 02:18:48 -04:00
John W. Linville 4dcc0637fc Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth 2012-04-27 15:16:43 -04:00
Stanislav Kinsbursky ea8cfa0679 SUNRPC: traverse clients tree on PipeFS event
v2: recursion was replaced by loop

If client is a clone, then it's parent can not be in the list.
But parent's Pipefs dentries have to be created and destroyed.

Note: event skip helper for clients introduced

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27 14:10:00 -04:00
Stanislav Kinsbursky 37629b572c SUNRPC: set per-net PipeFS superblock before notification
There can be a case, when on MOUNT event RPC client (after it's dentries were
created) is not longer hold by anyone except notification callback.
I.e. on release this client will be destoroyed. And it's dentries have to be
destroyed as well. Which in turn requires per-net PipeFS superblock to be set.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27 14:10:00 -04:00
Stanislav Kinsbursky 7aab449e5a SUNRPC: skip clients with program without PipeFS entries
1) This is sane.
2) Otherwise there will be soft lockup:

do {
	rpc_get_client_for_event (clnt->cl_dentry == NULL ==> choose)
	__rpc_pipefs_event (clnt->cl_program->pipe_dir_name == NULL ==> return)
} while (1)

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27 14:09:59 -04:00
Stanislav Kinsbursky a4dff1bc49 SUNRPC: skip dead but not buried clients on PipeFS events
These clients can't be safely dereferenced if their counter in 0.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27 14:09:59 -04:00
Neal Cardwell 651913ce9d tcp: clean up use of jiffies in tcp_rcv_rtt_measure()
Clean up a reference to jiffies in tcp_rcv_rtt_measure() that should
instead reference tcp_time_stamp. Since the result of the subtraction
is passed into a function taking u32, this should not change any
behavior (and indeed the generated assembly does not change on
x86_64). However, it seems worth cleaning this up for consistency and
clarity (and perhaps to avoid bugs if this is copied and pasted
somewhere else).

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-27 12:34:39 -04:00
Allan Stephens aad585473f tipc: Reject payload messages with invalid message type
Adds check to ensure TIPC sockets reject incoming payload messages
that have an unrecognized message type.

Remove the old open question about whether TIPC_ERR_NO_PORT is
the proper return value.  It is appropriate here since there are
valid instances where another node can make use of the reply,
and at this point in time the host is already broadcasting TIPC
data, so there are no real security concerns.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-04-27 10:08:00 -04:00
Eric Dumazet 8298193012 net: cleanups in sock_setsockopt()
Use min_t()/max_t() macros, reformat two comments, use !!test_bit() to
match !!sock_flag()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-27 02:14:21 -04:00
hartleys feb50ac19e crush: include header for global symbols
Include the header to pickup the definitions of the global symbols.

Quiets the following sparse warnings:

warning: symbol 'crush_find_rule' was not declared. Should it be static?
warning: symbol 'crush_do_rule' was not declared. Should it be static?

Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Cc: Sage Weil <sage@newdream.net>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-27 00:03:34 -04:00
Eric Dumazet 6746960140 ipv6: RTAX_FEATURE_ALLFRAG causes inefficient TCP segment sizing
Quoting Tore Anderson from :
https://bugzilla.kernel.org/show_bug.cgi?id=42572

When RTAX_FEATURE_ALLFRAG is set on a route, the effective TCP segment
size does not take into account the size of the IPv6 Fragmentation
header that needs to be included in outbound packets, causing every
transmitted TCP segment to be fragmented across two IPv6 packets, the
latter of which will only contain 8 bytes of actual payload.

RTAX_FEATURE_ALLFRAG is typically set on a route in response to
receving a ICMPv6 Packet Too Big message indicating a Path MTU of less
than 1280 bytes. 1280 bytes is the minimum IPv6 MTU, however ICMPv6
PTBs with MTU < 1280 are still valid, in particular when an IPv6
packet is sent to an IPv4 destination through a stateless translator.
Any ICMPv4 Need To Fragment packets originated from the IPv4 part of
the path will be translated to ICMPv6 PTB which may then indicate an
MTU of less than 1280.

The Linux kernel refuses to reduce the effective MTU to anything below
1280 bytes, instead it sets it to exactly 1280 bytes, and
RTAX_FEATURE_ALLFRAG is also set. However, the TCP segment size appears
to be set to 1240 bytes (1280 Path MTU - 40 bytes of IPv6 header),
instead of 1232 (additionally taking into account the 8 bytes required
by the IPv6 Fragmentation extension header).

This in turn results in rather inefficient transmission, as every
transmitted TCP segment now is split in two fragments containing
1232+8 bytes of payload.

After this patch, all the outgoing packets that includes a
Fragmentation header all are "atomic" or "non-fragmented" fragments,
i.e., they both have Offset=0 and More Fragments=0.

With help from David S. Miller

Reported-by: Tore Anderson <tore@fud.no>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Tom Herbert <therbert@google.com>
Tested-by: Tore Anderson <tore@fud.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-27 00:03:34 -04:00
Allan Stephens 8f17789693 tipc: Enhance error checking of published names
Consolidates validation of scope and name sequence range values into
a single routine where it applies both to local name publications
and to name publications issued by other nodes in the network. This
change means that the scope value for non-local publications is now
validated and the name sequence range for local publications is now
validated only once. Additionally, a publication attempt that fails
validation now creates an entry in the system log file only if debugging
capabilities have been enabled; this prevents the system log from being
cluttered up with messages caused by a defective application or network
node.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-04-26 18:15:48 -04:00
Allan Stephens f7fb9d20ad tipc: Create helper routine to delete unused name sequence structure
Replaces two identical chunks of code that delete an unused name
sequence structure from TIPC's name table with calls to a new routine
that performs this operation.

This change is cosmetic and doesn't impact the operation of TIPC.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-04-26 18:15:47 -04:00
Allan Stephens bbe6a295d0 tipc: remove redundant memset and stale comment from subscr.c
Eliminate code to zero-out the main topology service structure,
which is already zeroed-out.

Get rid of a comment documenting a field of the main topology
service structure that no longer exists.

Both are cosmetic changes with no impact on runtime behaviour.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-04-26 18:15:46 -04:00
Allan Stephens 2d98abb9fe tipc: Optimize initialization of network topology service
Initialization now occurs in the calling thread of control,
rather than being deferred to the TIPC tasklet.  With the
current codebase, the deferral is no longer necessary.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-04-26 18:15:45 -04:00
Allan Stephens eb3865a99d tipc: Enhance re-initialization of network topology service
Streamlines the job of re-initializing TIPC's network topology service
when a node's network address is first assigned. Rather than destroying
the topology server port and breaking its connections to existing
subscribers, TIPC now simply lets the service continue running (since
the change to the port identifier of each port used by the topology
service no longer impacts the flow of messages between the service and
its subscribers).

This enhancement means that applications that utilize the topology
service prior to the assignment of TIPC's network address no longer need
to re-establish their subscriptions when the address is finally assigned.

However, it is worth noting that any subsequent events for existing
subscriptions report the new port identifier of the publishing port,
rather than the original port identifier. (For example, a name that was
previously reported as being published by <0.0.0:ref> may be subsequently
withdrawn by <Z.C.N:ref>.)

This doesn't impact any of the existing known userspace in tipc-utils,
since (a) TIPC continues to treat references to the original port ID
correctly and (b) normal use cases assign an address before active use.

However if there does happen to be some rare/custom application out
there that was relying on this, they can simply bypass the enhancement
by issuing a subscription to {0,0} and break its connection to the
topology service, if an associated withdrawal event occurs.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-04-26 18:15:40 -04:00
Allan Stephens eb323b075a tipc: Optimize termination of configuration service
Termination no longer tests to see if the configuration service
port was successfully created or not. In the unlikely event that the
port was not created, attempting to delete the non-existent port is
detected gracefully and causes no harm.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-04-26 17:19:43 -04:00
Allan Stephens 861d3a0e5b tipc: Optimize initialization of configuration service
Initialization now occurs in the calling thread of control,
rather than being deferred to the TIPC tasklet.  With the
current codebase, the deferral is no longer necessary.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-04-26 17:19:42 -04:00
Allan Stephens a2cfd45b52 tipc: Optimize re-initialization of configuration service
Streamlines the job of re-initializing TIPC's configuration service
when a node's network address is first assigned. Rather than destroying
the configuration server port and then recreating it, TIPC now simply
withdraws the existing {0,<0.0.0>} name publication and creates a new
{0,<Z.C.N>} name publication that identifies the node's network address
to interested subscribers.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-04-26 17:19:07 -04:00
David S. Miller a85c9bb895 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next 2012-04-26 15:54:45 -04:00
John W. Linville d9b8ae6bd8 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem
Conflicts:
	drivers/net/wireless/iwlwifi/iwl-testmode.c
2012-04-26 15:03:48 -04:00
Pavel Emelyanov de248a75c3 tcp repair: Fix unaligned access when repairing options (v2)
Don't pick __u8/__u16 values directly from raw pointers, but instead use
an array of structures of code:value pairs. This is OK, since the buffer
we take options from is not an skb memory, but a user-to-kernel one.

For those options which don't require any value now, require this to be
zero (for potential future extension of this API).

v2: Changed tcp_repair_opt to use two __u32-s as spotted by David Laight.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-26 06:13:51 -04:00
alex.bluesman.smirnov@gmail.com 2d319508a3 6lowpan: duplicate definition of IEEE802154_ALEN
The same macros is defined in 'include/net/af_ieee802154.h' and is called
IEEE802154_ADDR_LEN. No need another one, so remove it.

Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-26 06:01:09 -04:00
alex.bluesman.smirnov@gmail.com c2e94d73ea 6lowpan: move frame allocation code to a separate function
Separate frame allocation routine from data processing function.
This makes code more human readable and easier for understanding.

Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-26 06:01:09 -04:00
alex.bluesman.smirnov@gmail.com 768f7c7c12 6lowpan: add missing spin_lock_init()
Add missing spin_lock_init() for frames list lock.

Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-26 05:32:55 -04:00
alex.bluesman.smirnov@gmail.com 8deff4af87 6lowpan: clean up fragments list if module unloaded
Clean all the pending fragments and relative timers if 6lowpan link
is going to be deleted.

Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-26 05:32:55 -04:00
alex.bluesman.smirnov@gmail.com 0848e40430 6lowpan: fix segmentation fault caused by mlme request
Add nescesary mlme callbacks to satisfy "iz list" request from user space.
Due to 6lowpan device doesn't have its own phy, mlme implemented as a pipe
to a real phy to which 6lowpan is attached.

Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-26 05:32:55 -04:00
Julian Anastasov 39f618b4fd ipvs: reset ipvs pointer in netns
Make sure net->ipvs is reset on netns cleanup or failed
initialization. It is needed for IPVS applications to know that
IPVS core is not loaded in netns.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-04-26 15:26:35 +09:00
Julian Anastasov 8d08d71ce5 ipvs: add check in ftp for initialized core
Avoid crash when registering ip_vs_ftp after
the IPVS core initialization for netns fails. Do this by
checking for present core (net->ipvs).

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-04-26 15:26:35 +09:00
Linus Torvalds 2300fd67b4 NFS client bugfixes for Linux 3.4
Highlights include:
 - Fix NFSv4 infinite loops on open(O_TRUNC)
 - Fix an Oops and an infinite loop in the NFSv4 flock code
 - Don't register the PipeFS filesystem until it has been set up
 - Fix an Oops in nfs_try_to_update_request
 - Don't reuse NFSv4 open owners: fixes a bad sequence id storm.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJPlbzwAAoJEGcL54qWCgDy24oQALZE67vBft7M2j0BiWhVbV15
 YLbCf6x/h+0BJAkKWdrBaw7N6GX6OYBOX2SsmrBkzYf5mgHeju5+dH0CmRAR5xib
 5d+Lwxif1l+rABfdzzJf8gY1L1THyJCnfmarKKyYEJ5OC1pJyulKLanXSPzPfzlm
 APV5Jf6NM2WRgkCqzP6zf61NG0HbDSR7C//HQ3k21Sdt9XDLf5qLHBSuPIQ+BlZY
 EvpbERTtJgp7rPJsLQv1F2dgasDUQNg8G+tmZatGcqEiNxVyQ2YqwshaldOVqftv
 3Kocs6OW5C1ESj1dFJZmeMZ/+GSHjRJx8fpqHJjmCsh4kPGgFviQDdYwu4FDhhPI
 FZslC5nVi8JMTPNJAFmfvbwPQId/TSRPCWYO5PtW1LSfRT/+25b6M5duro1eGIbJ
 /FDoOCYQmepNOfobU9Q3roDWyNSLYFaUaMJUrccRcAuS3S2NEXisTAT49kmqa1Vm
 ZArOJBnXTgmGi30nKhqqLJ43P61ekhX0AQ6PycZAXkjeRlkQs7AAQbMJZMB2X0r9
 KtRCDPiH2NuR0FwxNMkMP4BXdsaY7Sz/xiSZXLOUf1SeWBiBtYoDdrQ3z67SGOeG
 qxI3qXXl0KC2+l2jnezcWhBf4CDpxftGIBi+rKWJt8stoYzbemB/M1lkoTCwrVzq
 8Gwyy0QTVzE9VkY77oVW
 =hQAK
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.4-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
 - Fix NFSv4 infinite loops on open(O_TRUNC)
 - Fix an Oops and an infinite loop in the NFSv4 flock code
 - Don't register the PipeFS filesystem until it has been set up
 - Fix an Oops in nfs_try_to_update_request
 - Don't reuse NFSv4 open owners: fixes a bad sequence id storm.

* tag 'nfs-for-3.4-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFSv4: Keep dropped state owners on the LRU list for a while
  NFSv4: Ensure that we don't drop a state owner more than once
  NFSv4: Ensure we do not reuse open owner names
  nfs: Enclose hostname in brackets when needed in nfs_do_root_mount
  NFS: put open context on error in nfs_flush_multi
  NFS: put open context on error in nfs_pagein_multi
  NFSv4: Fix open(O_TRUNC) and ftruncate() error handling
  NFSv4: Ensure that we check lock exclusive/shared type against open modes
  NFSv4: Ensure that the LOCK code sets exception->inode
  NFS: check for req==NULL in nfs_try_to_update_request cleanup
  SUNRPC: register PipeFS file system after pernet sybsystem
2012-04-25 21:38:44 -07:00
Shan Wei 8dcf01fc00 net: sock_diag_handler structs can be const
read only, so change it to const.

Signed-off-by: Shan Wei <davidshan@tencent.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-25 20:46:59 -04:00
Shan Wei 62ad6fcd74 udp_diag: implement idiag_get_info for udp/udplite to get queue information
When we use netlink to monitor queue information for udp socket,
idiag_rqueue and idiag_wqueue of inet_diag_msg are returned with 0.

Keep consistent with netstat, just return back allocated rmem/wmem size.

Signed-off-by: Shan Wei <davidshan@tencent.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-25 20:43:01 -04:00
Eric Dumazet 808db80a7e ipv6: call consume_skb() in frag/reassembly
Some kfree_skb() calls should be replaced by consume_skb() to avoid
drop_monitor/dropwatch false positives.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-25 20:39:46 -04:00
John Fastabend 081579840b net: dcb: add CEE notify calls
This adds code to trigger CEE events when an APP change or setall
command is made from user space. This simplifies user space code
significantly by creating a single interface to listen on that
works with both firmware and userland agents.

And if we end up with multiple agents this keeps every thing in
sync userland agents, firmware agents, and kernel notifier consumers.

For an example agent that listens for these events see:

https://github.com/jrfastab/cgdcbxd

cgdcbxd is a daemon used to monitor DCB netlink events and manage
the net_prio control group sub-system.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Shmulik Ravid <shmulikr@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-25 19:47:17 -04:00
Simo Sorce fc2952a2a9 SUNRPC: split upcall function to extract reusable parts
This is needed to share code between the current server upcall mechanism
and the new gssproxy upcall mechanism introduced in a following patch.

Signed-off-by: Simo Sorce <simo@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-04-25 14:29:32 -04:00
John W. Linville 395836282f Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem 2012-04-25 13:41:25 -04:00
Julian Anastasov 8f9b9a2fad ipvs: fix crash in ip_vs_control_net_cleanup on unload
commit 14e405461e (2.6.39)
("Add __ip_vs_control_{init,cleanup}_sysctl()")
introduced regression due to wrong __net_init for
__ip_vs_control_cleanup_sysctl. This leads to crash when
the ip_vs module is unloaded.

	Fix it by changing __net_init to __net_exit for
the function that is already renamed to ip_vs_control_net_cleanup_sysctl.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-04-25 11:16:30 +02:00
Sasha Levin 7118c07a84 ipvs: Verify that IP_VS protocol has been registered
The registration of a protocol might fail, there were no checks
and all registrations were assumed to be correct. This lead to
NULL ptr dereferences when apps tried registering.

For example:

[ 1293.226051] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
[ 1293.227038] IP: [<ffffffff822aacb0>] tcp_register_app+0x60/0xb0
[ 1293.227038] PGD 391de067 PUD 6c20b067 PMD 0
[ 1293.227038] Oops: 0000 [#1] PREEMPT SMP
[ 1293.227038] CPU 1
[ 1293.227038] Pid: 19609, comm: trinity Tainted: G        W    3.4.0-rc1-next-20120405-sasha-dirty #57
[ 1293.227038] RIP: 0010:[<ffffffff822aacb0>]  [<ffffffff822aacb0>] tcp_register_app+0x60/0xb0
[ 1293.227038] RSP: 0018:ffff880038c1dd18  EFLAGS: 00010286
[ 1293.227038] RAX: ffffffffffffffc0 RBX: 0000000000001500 RCX: 0000000000010000
[ 1293.227038] RDX: 0000000000000000 RSI: ffff88003a2d5888 RDI: 0000000000000282
[ 1293.227038] RBP: ffff880038c1dd48 R08: 0000000000000000 R09: 0000000000000000
[ 1293.227038] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88003a2d5668
[ 1293.227038] R13: ffff88003a2d5988 R14: ffff8800696a8ff8 R15: 0000000000000000
[ 1293.227038] FS:  00007f01930d9700(0000) GS:ffff88007ce00000(0000) knlGS:0000000000000000
[ 1293.227038] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1293.227038] CR2: 0000000000000018 CR3: 0000000065dfc000 CR4: 00000000000406e0
[ 1293.227038] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1293.227038] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1293.227038] Process trinity (pid: 19609, threadinfo ffff880038c1c000, task ffff88002dc73000)
[ 1293.227038] Stack:
[ 1293.227038]  ffff880038c1dd48 00000000fffffff4 ffff8800696aada0 ffff8800694f5580
[ 1293.227038]  ffffffff8369f1e0 0000000000001500 ffff880038c1dd98 ffffffff822a716b
[ 1293.227038]  0000000000000000 ffff8800696a8ff8 0000000000000015 ffff8800694f5580
[ 1293.227038] Call Trace:
[ 1293.227038]  [<ffffffff822a716b>] ip_vs_app_inc_new+0xdb/0x180
[ 1293.227038]  [<ffffffff822a7258>] register_ip_vs_app_inc+0x48/0x70
[ 1293.227038]  [<ffffffff822b2fea>] __ip_vs_ftp_init+0xba/0x140
[ 1293.227038]  [<ffffffff821c9060>] ops_init+0x80/0x90
[ 1293.227038]  [<ffffffff821c90cb>] setup_net+0x5b/0xe0
[ 1293.227038]  [<ffffffff821c9416>] copy_net_ns+0x76/0x100
[ 1293.227038]  [<ffffffff810dc92b>] create_new_namespaces+0xfb/0x190
[ 1293.227038]  [<ffffffff810dca21>] unshare_nsproxy_namespaces+0x61/0x80
[ 1293.227038]  [<ffffffff810afd1f>] sys_unshare+0xff/0x290
[ 1293.227038]  [<ffffffff8187622e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 1293.227038]  [<ffffffff82665539>] system_call_fastpath+0x16/0x1b
[ 1293.227038] Code: 89 c7 e8 34 91 3b 00 89 de 66 c1 ee 04 31 de 83 e6 0f 48 83 c6 22 48 c1 e6 04 4a 8b 14 26 49 8d 34 34 48 8d 42 c0 48 39 d6 74 13 <66> 39 58 58 74 22 48 8b 48 40 48 8d 41 c0 48 39 ce 75 ed 49 8d
[ 1293.227038] RIP  [<ffffffff822aacb0>] tcp_register_app+0x60/0xb0
[ 1293.227038]  RSP <ffff880038c1dd18>
[ 1293.227038] CR2: 0000000000000018
[ 1293.379284] ---[ end trace 364ab40c7011a009 ]---
[ 1293.381182] Kernel panic - not syncing: Fatal exception in interrupt

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-04-25 11:16:12 +02:00
Andrei Emeltchenko 94c514fe24 mac80211: Adds clean sdata helper
Adds hepler to clean sdata ieee80211_clean_sdata similar way as
ieee80211_setup_sdata is implemented. The function will be used by other
interfaces later.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-24 14:56:10 -04:00
Felix Fietkau 7e3ed02c6e mac80211: fix num_mcast_sta counting issues
Moving a STA to an AP VLAN prevents num_mcast_sta from being decremented
once the STA leaves, because sta->sdata changes. Fix this by checking
for AP VLANs as well.

Also exclude 4-addr VLAN stations from num_mcast_sta - remote 4-addr
stations ignore 3-address multicast frames anyway. In a typical bridge
configuration they receive the same packets as 4-address unicast.

This patch also fixes clearing the sdata->u.vlan.sta pointer when the
STA is removed from a 4-addr VLAN.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-24 14:54:28 -04:00
Felix Fietkau 030ef8f8a5 mac80211: rename AP variable num_sta_authorized to num_mcast_sta
It is only used to test for BSS multicast receivers.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-24 14:54:28 -04:00
Wey-Yi Guy be6bcabc79 mac80211: check for non-managed interface
Average beacon signal only keep tracked by managed interface,
give warning and return 0 for the others.

Signed-off-by: Wey-Yi Guy <wey-yi.w.guy@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-24 14:54:27 -04:00
Eliad Peller afa762f687 mac80211: call ieee80211_mgd_stop() on interface stop
ieee80211_mgd_teardown() is called on netdev removal, which
occurs after the vif was already removed from the low-level
driver, resulting in the following warning:

[ 4809.014734] ------------[ cut here ]------------
[ 4809.019861] WARNING: at net/mac80211/driver-ops.h:12 ieee80211_bss_info_change_notify+0x200/0x2c8 [mac80211]()
[ 4809.030388] wlan0:  Failed check-sdata-in-driver check, flags: 0x4
[ 4809.036862] Modules linked in: wlcore_sdio(-) wl12xx wlcore mac80211 cfg80211 [last unloaded: cfg80211]
[ 4809.046849] [<c001bd4c>] (unwind_backtrace+0x0/0x12c)
[ 4809.055937] [<c047cf1c>] (dump_stack+0x20/0x24)
[ 4809.065385] [<c003e334>] (warn_slowpath_common+0x5c/0x74)
[ 4809.075589] [<c003e408>] (warn_slowpath_fmt+0x40/0x48)
[ 4809.088291] [<bf033630>] (ieee80211_bss_info_change_notify+0x200/0x2c8 [mac80211])
[ 4809.102844] [<bf067f84>] (ieee80211_destroy_auth_data+0x80/0xa4 [mac80211])
[ 4809.116276] [<bf068004>] (ieee80211_mgd_teardown+0x5c/0x74 [mac80211])
[ 4809.129331] [<bf043f18>] (ieee80211_teardown_sdata+0xb0/0xd8 [mac80211])
[ 4809.141595] [<c03b5e58>] (rollback_registered_many+0x228/0x2f0)
[ 4809.153056] [<c03b5f48>] (unregister_netdevice_many+0x28/0x50)
[ 4809.165696] [<bf041ea8>] (ieee80211_remove_interfaces+0xb4/0xdc [mac80211])
[ 4809.179151] [<bf032174>] (ieee80211_unregister_hw+0x50/0xf0 [mac80211])
[ 4809.191043] [<bf0bebb4>] (wlcore_remove+0x5c/0x7c [wlcore])
[ 4809.201491] [<c02c6918>] (platform_drv_remove+0x24/0x28)
[ 4809.212029] [<c02c4d50>] (__device_release_driver+0x8c/0xcc)
[ 4809.222738] [<c02c4e84>] (device_release_driver+0x30/0x3c)
[ 4809.233099] [<c02c4258>] (bus_remove_device+0x10c/0x128)
[ 4809.242620] [<c02c26f8>] (device_del+0x11c/0x17c)
[ 4809.252150] [<c02c6de0>] (platform_device_del+0x28/0x68)
[ 4809.263051] [<bf0df49c>] (wl1271_remove+0x3c/0x50 [wlcore_sdio])
[ 4809.273590] [<c03806b0>] (sdio_bus_remove+0x48/0xf8)
[ 4809.283754] [<c02c4d50>] (__device_release_driver+0x8c/0xcc)
[ 4809.293729] [<c02c4e2c>] (driver_detach+0x9c/0xc4)
[ 4809.303163] [<c02c3d7c>] (bus_remove_driver+0xc4/0xf4)
[ 4809.312973] [<c02c5a98>] (driver_unregister+0x70/0x7c)
[ 4809.323220] [<c03809c4>] (sdio_unregister_driver+0x24/0x2c)
[ 4809.334213] [<bf0df458>] (wl1271_exit+0x14/0x1c [wlcore_sdio])
[ 4809.344930] [<c009b1a4>] (sys_delete_module+0x228/0x2a8)
[ 4809.354734] ---[ end trace 515290ccf5feb522 ]---

Rename ieee80211_mgd_teardown() to ieee80211_mgd_stop(),
and call it on ieee80211_do_stop().

Signed-off-by: Eliad Peller <eliad@wizery.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-24 14:42:42 -04:00
Szymon Janc 16cde9931b Bluetooth: Fix missing break in hci_cmd_complete_evt
Command complete event for HCI_OP_USER_PASSKEY_NEG_REPLY would result
in calling handler function also for HCI_OP_LE_SET_SCAN_PARAM. This
could result in undefined behaviour.

Signed-off-by: Szymon Janc <szymon.janc@tieto.com>
Signed-off-by: Gustavo Padovan <gustavo@padovan.org>
2012-04-24 11:38:41 -03:00
Paul Gortmaker 872f24dbc6 tipc: remove inline instances from C source files.
Untie gcc's hands and let it do what it wants within the
individual source files.  There are two files, node.c and
port.c -- only the latter effectively changes (gcc-4.5.2).
Objdump shows gcc deciding to not inline port_peernode().

Suggested-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-24 00:41:03 -04:00
Eric Dumazet bfb253c9b2 af_netlink: drop_monitor/dropwatch friendly
Need to consume_skb() instead of kfree_skb() in netlink_dump() and
netlink_unicast_kernel() to avoid false dropwatch positives.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-24 00:35:14 -04:00
Eric Dumazet 658cb354ed af_netlink: cleanups
netlink_destroy_callback() move to avoid forward reference

CodingStyle cleanups

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-24 00:35:14 -04:00
Eric Dumazet 38ba0a65fa net: skb_can_coalesce returns a boolean
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-24 00:18:02 -04:00
Peter Huang (Peng) a881e963c7 set fake_rtable's dst to NULL to avoid kernel Oops
bridge: set fake_rtable's dst to NULL to avoid kernel Oops

when bridge is deleted before tap/vif device's delete, kernel may
encounter an oops because of NULL reference to fake_rtable's dst.
Set fake_rtable's dst to NULL before sending packets out can solve
this problem.

v4 reformat, change br_drop_fake_rtable(skb) to {}

v3 enrich commit header

v2 introducing new flag DST_FAKE_RTABLE to dst_entry struct.

[ Use "do { } while (0)" for nop br_drop_fake_rtable()
  implementation -DaveM ]

Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Peter Huang <peter.huangpeng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-24 00:16:24 -04:00
Eric Dumazet 783c175f90 tcp: tcp_try_coalesce returns a boolean
This clarifies code intention, as suggested by David.

Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-23 23:36:58 -04:00
Eric Dumazet d7ccf7c0a0 net: make spd_fill_page() linear argument a bool
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-23 23:35:04 -04:00
David S. Miller f24001941c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Fix merge between commit 3adadc08cc ("net ax25: Reorder ax25_exit to
remove races") and commit 0ca7a4c87d ("net ax25: Simplify and
cleanup the ax25 sysctl handling")

The former moved around the sysctl register/unregister calls, the
later simply removed them.

With help from Stephen Rothwell.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-23 23:15:17 -04:00
David S. Miller a108d5f35a net: Use bool and remove inline in skb_splice_bits() code.
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-23 23:06:11 -04:00
Eric Dumazet 41c73a0d44 net: speedup skb_splice_bits()
Commit 35f3d14db (pipe: add support for shrinking and growing pipes)
added a slowdown for splice(socket -> pipe), as we might grow the spd
used in skb_splice_bits() for each skb we process in splice() syscall.

Its not needed since skb lengths are capped. The default on-stack arrays
are more than enough.

Use MAX_SKB_FRAGS instead of PIPE_DEF_BUFFERS to describe the reasonable
limit per skb.

Add coalescing support to help splicing of GRO skbs built from linear
skbs (linked into frag_list)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-23 23:01:35 -04:00
Eric Dumazet 1402d36601 tcp: introduce tcp_try_coalesce
commit c8628155ec (tcp: reduce out_of_order memory use) took care of
coalescing tcp segments provided by legacy devices (linear skbs)

We extend this idea to fragged skbs, as their truesize can be heavy.

ixgbe for example uses 256+1024+PAGE_SIZE/2 = 3328 bytes per segment.

Use this coalescing strategy for receive queue too.

This contributes to reduce number of tcp collapses, at minimal cost, and
reduces memory overhead and packets drops.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-23 22:42:49 -04:00
Eric Dumazet da882c1f2e tcp: sk_add_backlog() is too agressive for TCP
While investigating TCP performance problems on 10Gb+ links, we found a
tcp sender was dropping lot of incoming ACKS because of sk_rcvbuf limit
in sk_add_backlog(), especially if receiver doesnt use GRO/LRO and sends
one ACK every two MSS segments.

A sender usually tweaks sk_sndbuf, but sk_rcvbuf stays at its default
value (87380), allowing a too small backlog.

A TCP ACK, even being small, can consume nearly same truesize space than
outgoing packets. Using sk_rcvbuf + sk_sndbuf as a limit makes sense and
is fast to compute.

Performance results on netperf, single flow, receiver with disabled
GRO/LRO : 7500 Mbits instead of 6050 Mbits, no more TCPBacklogDrop
increments at sender.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Cc: Rick Jones <rick.jones2@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-23 22:28:28 -04:00
Eric Dumazet f545a38f74 net: add a limit parameter to sk_add_backlog()
sk_add_backlog() & sk_rcvqueues_full() hard coded sk_rcvbuf as the
memory limit. We need to make this limit a parameter for TCP use.

No functional change expected in this patch, all callers still using the
old sk_rcvbuf limit.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Cc: Rick Jones <rick.jones2@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-23 22:28:28 -04:00
Bala Shanmugam 218d2e26dc cfg80211: Validate legacy rateset.
Legacy rates are not validated while configuring
tx rateset using iw. So below cmd is accepted by nl80211.
sudo iw wlan2 set bitrates legacy-2.4 1 2 3

Validate legacy rates and return
error if any rate in the rateset is not valid.

Signed-off-by: Bala Shanmugam <bkamatch@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-23 15:37:41 -04:00
Wey-Yi Guy 0d8a0a1728 mac80211: declare ieee80211_ave_rssi as EXPORT
ieee80211_ave_rssi need to be declare as export for driver to use it.

Signed-off-by: Wey-Yi Guy <wey-yi.w.guy@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-23 15:37:41 -04:00
Javier Cardona 6ac95b5765 mac80211: fixup for mesh TSF adjustment latency in Toffset setpoint
The original patch defined the correction margin but did not apply it.

Signed-off-by: Shinichi Hotori <hotorinn@gmail.com>
Signed-off-by: Yu Niiro <yu.niiro@gmail.com>
Signed-off-by: Javier Cardona <javier@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-23 15:37:41 -04:00
Thomas Pedersen aee286c2cf mac80211: fix STA channel width field
According to IEEE 802.11 8.4.2.59, set the "STA channel width" bit to 0
if transmitting STA is using a 20mhz channel.

Signed-off-by: Thomas Pedersen <thomas@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-23 15:34:07 -04:00
Thomas Pedersen e76781e48f mac80211: don't set mesh peer ht caps if ht disabled
Blindly setting ht caps on a mesh peer's station entry would result in
MCS rates being used by the rate control algorithm even if no ht had
been configured. Fix this by checking the channel type before assigning
ht capabilites.

Signed-off-by: Thomas Pedersen <thomas@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-23 15:34:07 -04:00
Thomas Pedersen f743ff4907 mac80211: refactor mesh peer rate handling
To avoid passing supp_rates and basic_rates around all the time, just
derive these when needed in mesh_matches_local() and mesh_peer_init().

Signed-off-by: Thomas Pedersen <thomas@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-23 15:34:07 -04:00
Thomas Pedersen 54ab1ffb6c mac80211: refactor mesh peer initialization
This patch unifies the previous two paths toward mesh peer creation a
bit. It also fixes a bug where a peer's changing rates or HT mode
wouldn't register on leaving and then returning to the mesh with a sta
entry still present.

Also clean up locking and clear possibly stale ht cap.

Signed-off-by: Thomas Pedersen <thomas@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-23 15:34:07 -04:00
Ben Greear 8a690674e0 mac80211: Support on-channel scan option.
This based on an idea posted by Stanislaw Gruszka,
though I accept full blame for the implementation!

This has been tested with ath9k.

The idea is to let users scan on the current operating
channel without interrupting normal traffic more than
absolutely necessary (changing power level might reset
some hardware, for instance).

Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-23 15:28:33 -04:00
David S. Miller ac807fa8e6 tcp: Fix build warning after tcp_{v4,v6}_init_sock consolidation.
net/ipv4/tcp_ipv4.c: In function 'tcp_v4_init_sock':
net/ipv4/tcp_ipv4.c:1891:19: warning: unused variable 'tp' [-Wunused-variable]
net/ipv6/tcp_ipv6.c: In function 'tcp_v6_init_sock':
net/ipv6/tcp_ipv6.c:1836:19: warning: unused variable 'tp' [-Wunused-variable]

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-23 03:21:58 -04:00
Neal Cardwell d135c522f1 tcp: fix TCP_MAXSEG for established IPv6 passive sockets
Commit f5fff5d forgot to fix TCP_MAXSEG behavior IPv6 sockets, so IPv6
TCP server sockets that used TCP_MAXSEG would find that the advmss of
child sockets would be incorrect. This commit mirrors the advmss logic
from tcp_v4_syn_recv_sock in tcp_v6_syn_recv_sock. Eventually this
logic should probably be shared between IPv4 and IPv6, but this at
least fixes this issue.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-22 17:09:35 -04:00
Eric Dumazet c06fff6e17 af_packet: packet_getsockopt() cleanup
Factorize code, since most fetched values are int type.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-21 16:36:42 -04:00
Neal Cardwell 900f65d361 tcp: move duplicate code from tcp_v4_init_sock()/tcp_v6_init_sock()
This commit moves the (substantial) common code shared between
tcp_v4_init_sock() and tcp_v6_init_sock() to a new address-family
independent function, tcp_init_sock().

Centralizing this functionality should help avoid drift issues,
e.g. where the IPv4 side is updated without a corresponding update to
IPv6. There was already some drift: IPv4 initialized snd_cwnd to
TCP_INIT_CWND, while the IPv6 side was still initializing snd_cwnd to
2 (in this case it should not matter, since snd_cwnd is also
initialized in tcp_init_metrics(), but the general risks and
maintenance overhead remain).

When diffing the old and new code, note that new tcp_init_sock()
function uses the order of steps from the tcp_v4_init_sock()
implementation (the order is slightly different in
tcp_v6_init_sock()).

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-21 16:36:42 -04:00
Eric Dumazet e66e9a3147 net: allow better page reuse in splice(sock -> pipe)
splice() from socket to pipe needs linear_to_page() helper to transfert
skb header to part of page.

We can reset the offset in the current sk->sk_sndmsg_page if we are the
last user of the page.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-21 16:36:42 -04:00
Eric Dumazet bbe362be53 drop_monitor: allow more events per second
It seems there is a logic error in trace_drop_common(), since we store
only 64 drops, even if they are from same location.

This fix is a one liner, but we probably need more work to avoid useless
atomic dec/inc

Now I can watch 1 Mpps drops through dropwatch...

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-21 16:28:38 -04:00
Eric Dumazet a74e910618 net: change big iov allocations
iov of more than 8 entries are allocated in sendmsg()/recvmsg() through
sock_kmalloc()

As these allocations are temporary only and small enough, it makes sense
to use plain kmalloc() and avoid sk_omem_alloc atomic overhead.

Slightly changed fast path to be even faster.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Mike Waychison <mikew@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-21 16:24:20 -04:00
Pavel Emelyanov b139ba4e90 tcp: Repair connection-time negotiated parameters
There are options, which are set up on a socket while performing
TCP handshake. Need to resurrect them on a socket while repairing.
A new sockoption accepts a buffer and parses it. The buffer should
be CODE:VALUE sequence of bytes, where CODE is standard option
code and VALUE is the respective value.

Only 4 options should be handled on repaired socket.

To read 3 out of 4 of these options the TCP_INFO sockoption can be
used. An ability to get the last one (the mss_clamp) was added by
the previous patch.

Now the restore. Three of these options -- timestamp_ok, mss_clamp
and snd_wscale -- are just restored on a coket.

The sack_ok flags has 2 issues. First, whether or not to do sacks
at all. This flag is just read and set back. No other sack  info is
saved or restored, since according to the standart and the code
dropping all sack-ed segments is OK, the sender will resubmit them
again, so after the repair we will probably experience a pause in
connection. Next, the fack bit. It's just set back on a socket if
the respective sysctl is set. No collected stats about packets flow
is preserved. As far as I see (plz, correct me if I'm wrong) the
fack-based congestion algorithm survives dropping all of the stats
and repairs itself eventually, probably losing the performance for
that period.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-21 15:52:25 -04:00
Pavel Emelyanov 5e6a3ce657 tcp: Report mss_clamp with TCP_MAXSEG option in repair mode
The mss_clamp is the only connection-time negotiated option which
cannot be obtained from the user space. Make the TCP_MAXSEG sockopt
report one in the repair mode.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-21 15:52:25 -04:00
Pavel Emelyanov c0e88ff0f2 tcp: Repair socket queues
Reading queues under repair mode is done with recvmsg call.
The queue-under-repair set by TCP_REPAIR_QUEUE option is used
to determine which queue should be read. Thus both send and
receive queue can be read with this.

Caller must pass the MSG_PEEK flag.

Writing to queues is done with sendmsg call and yet again --
the repair-queue option can be used to push data into the
receive queue.

When putting an skb into receive queue a zero tcp header is
appented to its head to address the tcp_hdr(skb)->syn and
the ->fin checks by the (after repair) tcp_recvmsg. These
flags flags are both set to zero and that's why.

The fin cannot be met in the queue while reading the source
socket, since the repair only works for closed/established
sockets and queueing fin packet always changes its state.

The syn in the queue denotes that the respective skb's seq
is "off-by-one" as compared to the actual payload lenght. Thus,
at the rcv queue refill we can just drop this flag and set the
skb's sequences to precice values.

When the repair mode is turned off, the write queue seqs are
updated so that the whole queue is considered to be 'already sent,
waiting for ACKs' (write_seq = snd_nxt <= snd_una). From the
protocol POV the send queue looks like it was sent, but the data
between the write_seq and snd_nxt is lost in the network.

This helps to avoid another sockoption for setting the snd_nxt
sequence. Leaving the whole queue in a 'not yet sent' state (as
it will be after sendmsg-s) will not allow to receive any acks
from the peer since the ack_seq will be after the snd_nxt. Thus
even the ack for the window probe will be dropped and the
connection will be 'locked' with the zero peer window.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-21 15:52:25 -04:00
Pavel Emelyanov ee9952831c tcp: Initial repair mode
This includes (according the the previous description):

* TCP_REPAIR sockoption

This one just puts the socket in/out of the repair mode.
Allowed for CAP_NET_ADMIN and for closed/establised sockets only.
When repair mode is turned off and the socket happens to be in
the established state the window probe is sent to the peer to
'unlock' the connection.

* TCP_REPAIR_QUEUE sockoption

This one sets the queue which we're about to repair. The
'no-queue' is set by default.

* TCP_QUEUE_SEQ socoption

Sets the write_seq/rcv_nxt of a selected repaired queue.
Allowed for TCP_CLOSE-d sockets only. When the socket changes
its state the other seq-s are changed by the kernel according
to the protocol rules (most of the existing code is actually
reused).

* Ability to forcibly bind a socket to a port

The sk->sk_reuse is set to SK_FORCE_REUSE.

* Immediate connect modification

The connect syscall initializes the connection, then directly jumps
to the code which finalizes it.

* Silent close modification

The close just aborts the connection (similar to SO_LINGER with 0
time) but without sending any FIN/RST-s to peer.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-21 15:52:25 -04:00
Pavel Emelyanov 370816aef0 tcp: Move code around
This is just the preparation patch, which makes the needed for
TCP repair code ready for use.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-21 15:52:25 -04:00
Pavel Emelyanov 4a17fd5229 sock: Introduce named constants for sk_reuse
Name them in a "backward compatible" manner, i.e. reuse or not
are still 1 and 0 respectively. The reuse value of 2 means that
the socket with it will forcibly reuse everyone else's port.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-21 15:52:25 -04:00
Eric W. Biederman 5f568e5afe net: Remove register_net_sysctl_table
All of the users have been converted to use registera_net_sysctl so we
no longer need register_net_sysctl.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-20 21:22:30 -04:00
Eric W. Biederman a5347fe36b net: Delete all remaining instances of ctl_path
We don't use struct ctl_path anymore so delete the exported constants.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-20 21:22:30 -04:00
Eric W. Biederman ec8f23ce0f net: Convert all sysctl registrations to register_net_sysctl
This results in code with less boiler plate that is a bit easier
to read.

Additionally stops us from using compatibility code in the sysctl
core, hastening the day when the compatibility code can be removed.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-20 21:22:30 -04:00
Eric W. Biederman f99e8f715a net: Convert nf_conntrack_proto to use register_net_sysctl
There isn't much advantage here except that strings paths are a bit
easier to read, and converting everything to them allows me to kill off
ctl_path.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-20 21:22:30 -04:00
Eric W. Biederman 8607ddb867 net ipv4: Convert devinet to use register_net_sysctl
Using an ascii path to register_net_sysctl as opposed to the slightly
awkward ctl_path allows for much simpler code.

We no longer need to malloc dev_name to keep it alive the length of our
sysctl register instead we can use a small temporary buffer on the
stack.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-20 21:22:30 -04:00
Eric W. Biederman 6105e29320 net ipv6: Convert addrconf to use register_net_sysctl
Using an ascii path to register_net_sysctl as opposed to the slightly
awkward ctl_path allows for much simpler code.

We no longer need to malloc dev_name to keep it alive the length of our
sysctl register instead we can use a small temporary buffer on the
stack.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-20 21:22:29 -04:00
Eric W. Biederman 9bdcc88fa0 net decnet: Convert to use register_net_sysctl
Using an ascii path to register_net_sysctl as opposed to the slightly
awkward ctl_path allows for much simpler code.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-20 21:22:29 -04:00
Eric W. Biederman 8f40a1f982 net neighbour: Convert to use register_net_sysctl
Using an ascii path to register_net_sysctl as opposed to the slightly
awkward ctl_path allows for much simpler code.

We no longer need to malloc dev_name to keep it alive the length of our
sysctl register instead we can use a small temporary buffer on the
stack.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-20 21:22:29 -04:00
Eric W. Biederman 6dceb03687 net ipv6: Don't use sysctl tables with .child entries.
The sysctl core no longer natively understands sysctl tables
with .child entries.

Split the ipv6_table to remove the .child entries.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-20 21:22:29 -04:00
Eric W. Biederman 64fb301040 net llc: Don't use sysctl tables with .child entries.
The sysctl core no longer natively understands sysctl tables with .child
entries.

Kill the intermediate tables and use register_net_sysctl directly to
remove the need for compatibility code.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-20 21:22:29 -04:00
Eric W. Biederman 0ca7a4c87d net ax25: Simplify and cleanup the ax25 sysctl handling.
Don't register/unregister every ax25 table in a batch.  Instead register
and unregister per device ax25 sysctls as ax25 devices come and go.

This moves ax25 to be a completely modern sysctl user.  Registering the
sysctls in just the initial network namespace, removing the use of
.child entries that are no longer natively supported by the sysctl core
and taking advantage of the fact that there are no longer any ordering
constraints between registering and unregistering different sysctl
tables.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-20 21:22:28 -04:00
Eric W. Biederman 4e5ca78541 net ipv4: Remove the unneeded registration of an empty net/ipv4/neigh
sysctl no longer requires explicit creation of directories.  The neigh
directory is always populated with at least a default entry so this
won't cause any user visible changes.

Delete the ipv4_path and the ipv4_skeleton these are no longer needed.

Directly register the ipv4_route_table.

And since I am an idiot remove the header definitions that I should
have removed in the previous patch.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-20 21:21:18 -04:00
Eric W. Biederman a5287acc6c net ipv6: Remove unneded registration of an empty net/ipv6/neigh
sysctl no longer requires explicit creation of directories.  The neigh
directory is always populated with at least a default entry so this
should cause no user visible changes.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-20 21:21:18 -04:00
Eric W. Biederman 45bad91498 net core: Remove unneded creation of an empty net/core sysctl directory
On the next line we register the net_core_table in net/core which
creates the directory and ensures it exists.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-20 21:21:18 -04:00
Eric W. Biederman 5dd3df105b net: Move all of the network sysctls without a namespace into init_net.
This makes it clearer which sysctls are relative to your current network
namespace.

This makes it a little less error prone by not exposing sysctls for the
initial network namespace in other namespaces.

This is the same way we handle all of our other network interfaces to
userspace and I can't honestly remember why we didn't do this for
sysctls right from the start.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-20 21:21:17 -04:00
Eric W. Biederman 4344475797 net: Kill register_sysctl_rotable
register_sysctl_rotable never caught on as an interesting way to
register sysctls.  My take on the situation is that what we want are
sysctls that we can only see in the initial network namespace.  What we
have implemented with register_sysctl_rotable are sysctls that we can
see in all of the network namespaces and can only change in the initial
network namespace.

That is a very silly way to go.  Just register the network sysctls
in the initial network namespace and we don't have any weird special
cases to deal with.

The sysctls affected are:
/proc/sys/net/ipv4/ipfrag_secret_interval
/proc/sys/net/ipv4/ipfrag_max_dist
/proc/sys/net/ipv6/ip6frag_secret_interval
/proc/sys/net/ipv6/mld_max_msf

I really don't expect anyone will miss them if they can't read them in a
child user namespace.

CC: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-20 21:21:17 -04:00
Eric W. Biederman 2ca794e5e8 net sysctl: Initialize the network sysctls sooner to avoid problems.
If the netfilter code is modified to use register_net_sysctl_table the
kernel fails to boot because the per net sysctl infrasturce is not setup
soon enough.  So to avoid races call net_sysctl_init from sock_init().

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-20 21:21:16 -04:00
Eric W. Biederman bc8a36942a net sysctl: Register an empty /proc/sys/net
Implementation limitations of the sysctl core won't let /proc/sys/net
reside in a network namespace.  /proc/sys/net at least must be registered
as a normal sysctl.  So register /proc/sys/net early as an empty directory
to guarantee we don't violate this constraint and hit bugs in the sysctl
implementation.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-20 21:21:16 -04:00
Eric W. Biederman ab41a2ca50 net: Implement register_net_sysctl.
Right now all of the networking sysctl registrations are running in a
compatibiity mode.  The natvie sysctl registration api takes a cstring
for a path and a simple ctl_table.  Implement register_net_sysctl so
that we can register network sysctls without needing to use
compatiblity code in the sysctl core.

Switching from a ctl_path to a cstring results in less boiler plate
and denser code that is a little easier to read.

I would simply have changed the arguments to register_net_sysctl_table
instead of keeping two functions in parallel but gcc will allow a
ctl_path pointer to be passed to a char * pointer with only issuing a
warning resulting in completely incorrect code can be built.  Since I
have to change the function name I am taking advantage of the situation
to let both register_net_sysctl and register_net_sysctl_table live for a
short time in parallel which makes clean conversion patches a bit easier
to read and write.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-20 21:21:15 -04:00
David S. Miller 167de77fd4 Merge branch 'tipc_net-next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux 2012-04-20 20:40:31 -04:00
Allan Stephens 9d52ce4bd3 tipc: Ensure network address change doesn't impact configuration service
Enhances command validation done by TIPC's configuration service so
that it works properly even if the node's network address is changed in
mid-operation. The default node address of <0.0.0> is now recognized as an
alias for "this node" even after a new network address has been assigned.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-04-19 15:46:50 -04:00
Allan Stephens 630d920dca tipc: Ensure network address change doesn't impact rejected message
Revises handling of a rejected message to ensure that a locally
originated message is returned properly even if the node's network
address is changed in mid-operation. The routine now treats the
default node address of <0.0.0> as an alias for "this node" when
determining where to send a returned message.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-04-19 15:46:49 -04:00
Allan Stephens 8a55fe74b1 tipc: handle <0.0.0> as an alias for this node on outgoing msgs
Revises handling of send routines for payload messages to ensure that
they are processed properly even if the node's network address is
changed in mid-operation. The routines now treat the default node
address of <0.0.0> as an alias for "this node" when determining where
to send an outgoing message.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-04-19 15:46:48 -04:00
Allan Stephens b8f683d126 tipc: properly handle off-node send requests with invalid addr
There are two send routines that might conceivably be asked by an
application to send a message off-node when the node is still using
the default network address.  These now have an added check that
detects this and rejects the message gracefully.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-04-19 15:46:47 -04:00
Allan Stephens 974a5a864b tipc: take lock while updating node network address
The routine that changes the node's network address now takes TIPC's
network lock in write mode while the main address variable and associated
data structures are being changed; this is needed to ensure that the
link subsystem won't attempt to send a message off-node until the sending
port's message header template has been updated with the node's new
network address.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-04-19 15:46:46 -04:00
Allan Stephens f0712e86b7 tipc: Ensure network address change doesn't impact local connections
Revises routines that deal with connections between two ports on
the same node to ensure the connection is not impacted if the node's
network address is changed in mid-operation. The routines now treat
the default node address of <0.0.0> as an alias for "this node" in
the following situations:

1) Incoming messages destined to a connected port now handle the alias
properly when validating that the message was sent by the expected
peer port, ensuring that the message will be accepted regardless of
whether it specifies the node's old network address or it's current one.

2) The code which completes connection establishment now handles the
alias properly when determining if the peer port is on the same node
as the connected port.

An added benefit of addressing issue 1) is that some peer port
validation code has been relocated to TIPC's socket subsystem, which
means that validation is no longer done twice when a message is
sent to a non-socket port (such as TIPC's configuration service or
network topology service).

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-04-19 15:46:45 -04:00
Allan Stephens d0e17fedc2 tipc: delete duplicate peerport/peernode helper functions
Prior to commit 23dd4cce38

    "tipc: Combine port structure with tipc_port structure"

there was a need for the two sets of helper functions.  But
now they are just duplicates.  Remove the globally visible
ones, and mark the remaining ones as inline.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-04-19 15:46:43 -04:00
Allan Stephens f21536d1e7 tipc: Ensure network address change doesn't impact new port
Re-orders port creation logic so that the initialization of a new
port's message header template occurs while the port list lock is
held. This ensures that a change to the node's network address that
occurs at the same time as the port is being created does not result
in the template identifying the sender using the former network
address. The new approach guarantees that the new port's template is
using the current network address or that it will be updated when
the address changes.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-04-19 15:46:42 -04:00
Allan Stephens 5eb0a291fb tipc: Optimize re-initialization of port message header templates
Removes an unnecessary check in the logic that updates the message
header template for existing ports when a node's network address is
first assigned. There is no longer any need to check to see if the
node's network address has actually changed since the calling routine
has already verified that this is so.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-04-19 15:46:41 -04:00
Allan Stephens d4f5c12cdf tipc: Ensure network address change doesn't impact name table updates
Revises routines that add and remove an entry from a node's name table
so that the publication scope lists are updated properly even if the
node's network address is changed in mid-operation. The routines now
recognize the default node address of <0.0.0> as an alias for "this node"
even after a new network address has been assigned.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-04-19 15:46:40 -04:00
Allan Stephens 336ebf5bf5 tipc: Add routines for safe checking of node's network address
Introduces routines that test whether a given network address is
equal to a node's own network address or if it lies within the node's
own network cluster, and which work properly regardless of whether
the node is using the default network address <0.0.0> or a non-zero
network address that is assigned later on. In essence, these routines
ensure that address <0.0.0> is treated as an alias for "this node",
regardless of which network address the node is actually using.

Old users of the pre-existing more strict match in_own_cluster()
have been accordingly redirected to what is now called
in_own_cluster_exact() --- which does not extend matching to <0,0,0>.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-04-19 15:46:39 -04:00
Allan Stephens fd6eced8a4 tipc: Don't record failed publication attempt as a success
No longer increments counter of number of publications by a node
if an attempt to add a new publication fails. This prevents TIPC from
incorrectly blocking future publications because the configured maximum
number of publications has been reached.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-04-19 15:46:37 -04:00
Allan Stephens 1110b8d33a tipc: Update node-scope publications when network address is assigned
Ensures that node-scope name publications that exist prior to the
configuration of a node's network address are properly re-initialized
with that address when it is assigned. TIPC's node-scope publications
are now tracked using a publications list like the lists used for
cluster-scope and zone-scope publications so they can be easily updated
when required.

The inclusion of node scope name publications in a conventional publication
list means that they must now also be withdrawn, just like cluster and zone
scope publications are currently withdrawn.  So some conditional tests on
scope ==/!= TIPC_NODE_SCOPE are inserted/removed accordingly.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-04-19 15:46:36 -04:00
Allan Stephens a909804f7c tipc: Separate cluster-scope and zone-scope names into distinct lists
Utilizes distinct lists to track zone-scope and cluster-scope names
published by a node. For now, TIPC continues to process the entries
in both lists in the same way; however, an upcoming patch will utilize
the existence of the lists to prevent the sending of cluster-scope names
to nodes that are not part of the local cluster.

To achieve this, an array of publication lists is introduced, so
that they can be iterated over and accessed via publ->scope as
an index where convenient.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-04-19 15:46:05 -04:00
Eric W. Biederman 3adadc08cc net ax25: Reorder ax25_exit to remove races.
While reviewing the sysctl code in ax25 I spotted races in ax25_exit
where it is possible to receive notifications and packets after already
freeing up some of the data structures needed to process those
notifications and updates.

Call unregister_netdevice_notifier early so that the rest of the cleanup
code does not need to deal with network devices.  This takes advantage
of my recent enhancement to unregister_netdevice_notifier to send
unregister notifications of all network devices that are current
registered.

Move the unregistration for packet types, socket types and protocol
types before we cleanup any of the ax25 data structures to remove the
possibilities of other races.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-19 15:37:48 -04:00
Eric Dumazet cbf8f7bb20 ipv4: dont drop packet in defrag but consume it
When defragmentation is finalized, we clone a packet and kfree_skb() it.

Call consume_skb() to not confuse dropwatch, since its not a drop.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-19 14:25:51 -04:00
Eric Dumazet daa8654828 net: gro: GRO_MERGED_FREE consumes packets
As part of GRO processing, merged skbs should be consumed, not freed, to
not confuse dropwatch/drop_monitor.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-19 14:23:56 -04:00
Eric Dumazet 85bb2a60fa net: dont drop packet but consume it
When we need to clone skb, we dont drop a packet.
Call consume_skb() to not confuse dropwatch.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-19 14:23:55 -04:00
Eric Dumazet 7604adc2ff ipv6: dccp: dont drop packet but consume it
When we need to clone skb, we dont drop a packet.
Call consume_skb() to not confuse dropwatch.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-19 14:23:55 -04:00
Eric Dumazet abc4e4fa29 packet: dont drop packet but consume it
When we need to clone skb, we dont drop a packet.
Call consume_skb() to not confuse dropwatch.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-19 14:23:55 -04:00
Eric Dumazet ab185d7b25 ipv6: tcp: dont drop packet but consume it
When we need to clone skb, we dont drop a packet.
Call consume_skb() to not confuse dropwatch.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-19 14:23:55 -04:00
Eric Dumazet 8460c00f6e netlink: dont drop packet but consume it
When we need to clone skb, we dont drop a packet.
Call consume_skb() to not confuse dropwatch.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-19 14:23:55 -04:00
Eric Dumazet 9ff264492f ip6_tunnel: dont drop packet but consume it
When we need to reallocate skb, we dont drop a packet.
Call consume_skb() to not confuse dropwatch.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-19 14:23:55 -04:00
Shan Wei 7426a5645f net: fix compile error of leaking kmemleak.h header
net/core/sysctl_net_core.c: In function ‘sysctl_core_init’:
net/core/sysctl_net_core.c:259: error: implicit declaration of function ‘kmemleak_not_leak’

with same error in net/ipv4/route.c

Signed-off-by: Shan Wei <davidshan@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-19 00:11:39 -04:00
Greg Kroah-Hartman 665ab0f3c8 Merge 3.4-rc3 into tty-next
This allows us to pick up some changes needed for other serial patches.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-18 15:57:31 -07:00
Eric Dumazet 22b4a4f22d tcp: fix retransmit of partially acked frames
Alexander Beregalov reported skb_over_panic errors and provided stack
trace.

I occurs commit a21d45726a (tcp: avoid order-1 allocations on wifi and
tx path) added a regression, when a retransmit is done after a partial
ACK.

tcp_retransmit_skb() tries to aggregate several frames if the first one
has enough available room to hold the following ones payload. This is
controlled by /proc/sys/net/ipv4/tcp_retrans_collapse tunable (default :
enabled)

Problem is we must make sure _pskb_trim_head() doesnt fool
skb_availroom() when pulling some bytes from skb (this pull is done when
receiver ACK part of the frame).

Reported-by: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Marc MERLIN <marc@merlins.org>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-18 16:52:45 -04:00
David S. Miller 56fa9b1630 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless
From John:

Another batch of fixes intended for 3.4...

First up, we have a minor signedness fix for libertas from Amitkumar
Karwar.  Next, Arend gives us a brcm80211 fix for correctly enabling
Tx FIFOs on channels 12 and 13.  Bing Zhao gives us some register
address corrections for mwifiex.  Felix give us a trio of fixes --
one for ath9k to wake the hardware properly from full sleep, one for
mac80211 to properly handle packets in cooked monitor mode, and one
for ensuring that the proper HT mode selection is honored.

Hauke gives us a bcma fix for handling the lack of an sprom.  Jonathon
Bither gives us an ath5k fix for a missing THIS_MODULE build issue,
and another ath5k fix for an io mapping leak.  Lukasz Kucharczyk
fixes a bitwise check in cfg80211, and Sujith gives us an ath9k fix
for assigning sequence numbers for fragmented frames.  Finally, we
have a MAINTAINERS change from Wey-Yi Guy -- congrats to Johannes
Berg for taking the lead on iwlwifi. :-)

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-18 15:26:52 -04:00
John W. Linville 59ef43e681 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem
Conflicts:
	drivers/net/wireless/iwlwifi/iwl-testmode.c
	include/net/nfc/nfc.h
	net/nfc/netlink.c
	net/wireless/nl80211.c
2012-04-18 14:27:48 -04:00
John W. Linville dbd717e37b Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem 2012-04-18 13:37:33 -04:00
David S. Miller 91fbe33034 Included changes:
* remove duplicated line in comment
 * add htons() invocation for tt_crc as suggested by Al Viro
 * OriGinator Message seqno initial value is now random
 * some cleanups and fixes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iQEcBAABAgAGBQJPjnVmAAoJEFMQTLzJFOZFqpkH/33gzND7Ukfdax6CPYqb1AVm
 A63gtnZlNCwPf7dCJkq4yF4RVn/ir1pp+BwX5C9BIN9V/ZSaTsIKsMXAaZzUK3DH
 PCZEJCn+iys+ZX5KrpLum0wMSQyxt08GsGZLueiu+Rm0zRZLSCy58THNqLt2b6ZK
 mDH6tdbGxKXxrKeWzVz3PzQv8dPuFqApPiQ+M6ugf4YvjdYYEiGWFn8gad+XObeA
 oxbFGMt6MKdc+9EsKqd0Br1lqHiQ+RC2xXQiFEBizPe34LiYJ69irkEBki/6KV9Z
 ujeB0RxlMHXL75vUWoqyGcv/F2lzZd/tXQA6qz7ioCBHqzb1Mk/KGmVJ3KZ5CK8=
 =9lku
 -----END PGP SIGNATURE-----

Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge

Included changes:
* remove duplicated line in comment
* add htons() invocation for tt_crc as suggested by Al Viro
* OriGinator Message seqno initial value is now random
* some cleanups and fixes
2012-04-18 13:21:59 -04:00
Stanislav Kinsbursky adae0fe0ea SUNRPC: register PipeFS file system after pernet sybsystem
PipeFS superblock creation routine relays on SUNRPC pernet data presense, which
is created on register_pernet_subsys() call in SUNRPC module init function.
Registering of PipeFS filesystem prior to registering of per-net subsystem
leads to races (mount of PipeFS can dereference uninitialized data).

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-18 11:05:48 -04:00
Allan Stephens e11aa05971 tipc: Factor out name publication code to a separate function
This is done so that it can be reused with differing publication
lists, instead of being hard coded to the cluster publicaton list.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-04-18 09:36:02 -04:00
Allan Stephens 3f8375fee3 tipc: introduce publication lists struct
There is currently a single list that is containing both cluster-scope and
zone-scope publications, and the list count is a separate free floating
variable.  Create a struct to bind the count to the list, and to pave
the way for factoring out the publications into zone/cluster/node scope.

The current "publ_root" most matches what will be the cluster scope
list, so it is named accordingly in this commit.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-04-18 09:36:02 -04:00
Antonio Quartulli 1e5cc266db batman-adv: skip the window protection test when the originator has no neighbours
When we receive an OGM from from a node for the first time, the last_real_seqno
field of the orig_node structure has not been initialised yet. The value of this
field is used to compute the current ogm-seqno window and therefore the
protection mechanism will probably drop the packet due to an out-of-window error.
To avoid this situation this patch adds a check to skip the window protection
mechanism if no neighbour nodes have already been added. When the first
neighbour node is added, the last_real_seqno field is initialised too.

Reported-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-04-18 09:54:02 +02:00
Antonio Quartulli c97c72b493 batman-adv: print OGM seq numbers as unsigned int
OGM sequence numbers are declared as uint32_t and so they have to printed
using %u instead of %d in order to avoid wrong representations.

Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-04-18 09:54:02 +02:00
Antonio Quartulli 0d125074eb batman-adv: use ETH_HLEN instead of sizeof(struct ethhdr)
Instead of using sizeof(struct ethhdr) it is strongly recommended to use the
kernel macro ETH_HLEN. This patch substitute each occurrence of the former
expressione with the latter one.

Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-04-18 09:54:01 +02:00
Marek Lindner 1eeb479fda batman-adv: mark existing ogm variables as batman iv
The coming protocol changes also will have a part called "OGM". That
makes it necessary to introduce a distinction in the code base.

Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-04-18 09:54:01 +02:00
Marek Lindner 76e3d7fc1a batman-adv: rename BATMAN_OGM_LEN to BATMAN_OGM_HLEN
Using BATMAN_OGM_LEN leaves one with the impression that this is
the full packet size which is not the case. Therefore the variable
is renamed.

Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-04-18 09:54:00 +02:00
Marek Lindner cd8b78e7e9 batman-adv: refactoring API: find generalized name for bat_ogm_init_primary callback
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-04-18 09:54:00 +02:00
Marek Lindner 77af7575c4 batman-adv: handle routing code initialization properly
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-04-18 09:53:59 +02:00
Marek Lindner 00a50076a3 batman-adv: add iface_disable() callback to routing API
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-04-18 09:53:59 +02:00
Marek Lindner d7d32ec0f1 batman-adv: randomize initial seqno to avoid collision
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-04-18 09:53:58 +02:00
Marek Lindner c2aca02235 batman-adv: refactoring API: find generalized name for bat_ogm_init callback
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-04-18 09:53:58 +02:00
Marek Lindner 8140625e30 batman-adv: move ogm initialization into the proper function
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-04-18 09:53:57 +02:00
Antonio Quartulli e88af9464f batman-adv: remove duplicated line in comment
Remove an accidentally added duplicated line in a function comment

Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-04-18 09:53:57 +02:00
Antonio Quartulli 6d2003fc26 batman-adv: convert the tt_crc to network order
Before sending out a TT_Request packet we must convert the tt_crc field value
to network order (since it is 16bits long).

Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-04-18 09:43:36 +02:00
majianpeng 798ec84d45 net/core:Remove memleak reports by kmemleak_not_leak.
Signed-off-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-18 00:20:28 -04:00
majianpeng 7f59388108 net/ipv4:Remove two memleak reports by kmemleak_not_leak.
Signed-off-by: majianpeng <majianpeng@gmail.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-18 00:20:28 -04:00
Julian Anastasov b922934d01 netns: do not leak net_generic data on failed init
ops_init should free the net_generic data on
init failure and __register_pernet_operations should not
call ops_free when NET_NS is not enabled.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-18 00:05:33 -04:00
Eric Dumazet 4d846f0239 tcp: fix tcp_grow_window() for large incoming frames
tcp_grow_window() has to grow rcv_ssthresh up to window_clamp, allowing
sender to increase its window.

tcp_grow_window() still assumes a tcp frame is under MSS, but its no
longer true with LRO/GRO.

This patch fixes one of the performance issue we noticed with GRO on.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-17 22:32:00 -04:00
Felix Fietkau 6741e7f048 mac80211: fix logic error in ibss channel type check
The broken check leads to rate control attempting to use HT40 while
the driver is configured for HT20. This leads to interesting hardware
issues.

HT40 can only be used if the channel type is either HT40- or HT40+
and if the channel type of the cell matches the local type.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Cc: stable@vger.kernel.org
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-17 14:17:04 -04:00
Felix Fietkau 973ef21a67 mac80211: fix truncated packets in cooked monitor rx
Cooked monitor rx was recently changed to use ieee80211_add_rx_radiotap_header
instead of generating only limited radiotap information.
ieee80211_add_rx_radiotap_header assumes that FCS info is still present if
the hardware supports receiving it, however when cooked monitor rx packets
are processed, FCS info has already been stripped.
Fix this by adding an extra flag indicating FCS presence.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-17 14:17:04 -04:00
David Ward 244b65dbfe net_sched: gred: Fix oops in gred_dump() in WRED mode
A parameter set exists for WRED mode, called wred_set, to hold the same
values for qavg and qidlestart across all VQs. The WRED mode values had
been previously held in the VQ for the default DP. After these values
were moved to wred_set, the VQ for the default DP was no longer created
automatically (so that it could be omitted on purpose, to have packets
in the default DP enqueued directly to the device without using RED).

However, gred_dump() was overlooked during that change; in WRED mode it
still reads qavg/qidlestart from the VQ for the default DP, which might
not even exist. As a result, this command sequence will cause an oops:

tc qdisc add dev $DEV handle $HANDLE parent $PARENT gred setup \
    DPs 3 default 2 grio
tc qdisc change dev $DEV handle $HANDLE gred DP 0 prio 8 $RED_OPTIONS
tc qdisc change dev $DEV handle $HANDLE gred DP 1 prio 8 $RED_OPTIONS

This fixes gred_dump() in WRED mode to use the values held in wred_set.

Signed-off-by: David Ward <david.ward@ll.mit.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-16 23:51:07 -04:00
Daniel Baluta a75afd4770 can: fix sparse warning for cgw_list
Make cgw_list static to remove the following sparse warning:
net/can/gw.c:69:1: warning: symbol 'cgw_list' was not declared.
Should it be static?

Signed-off-by: Daniel Baluta <dbaluta@ixiacom.com>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2012-04-16 21:08:18 +02:00
Wey-Yi Guy 1dae27f84b mac80211: add function retrieve average rssi
Add utility function to provide the average rssi per vif

Signed-off-by: Wey-Yi Guy <wey-yi.w.guy@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-16 14:38:49 -04:00
Rajkumar Manoharan f9616e0f88 cfg80211: increse bss expire time
The background scan completion takes more time when the station is
having heavy uplink traffic. The scan state machine decides to fall
back to home channel on every off-channel visit when there are pending
frames in tx queue. bgscan completion took ~30sec on dual band US
regulatory card.

scan period = (20 active channels * probe timeout) +
              (12 passive channels * passive probe timeout) +
              (32 * timeout on home channel) +
              (32 * flush timeout)

Signed-off-by: Rajkumar Manoharan <rmanohar@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-16 14:38:43 -04:00
Javier Cardona ec14bcd20f mac80211: Take into account TSF adjustment latency in Toffset setpoint
When testing mesh synchronization we observed a global TSF slowdown that
was dependent on the number of synchronized mesh stations.  This seems
to be caused by the TSF adjustment (read/write) latency.

Adding a small margin to the Toffset setpoint solved the problem.

Signed-off-by: Shinichi Hotori <hotorinn@gmail.com>
Signed-off-by: Yu Niiro <yu.niiro@gmail.com>
Signed-off-by: Javier Cardona <javier@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-16 14:19:29 -04:00
Javier Cardona a802a6eba1 mac80211: Choose a new toffset setpoint if a big tsf jump is detected.
Signed-off-by: Javier Cardona <javier@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-16 14:19:29 -04:00
Luciano Coelho bb3e10fb58 mac80211: check IEEE80211_HW_QUEUE_CONTROL in ieee80211_check_queues()
Commit 3a25a8c8 (mac80211: add improved HW queue control) introduced a
bug when running in AP mode without the IEEE80211_HW_QUEUE_CONTROL
flag set.  The ieee80211_check_queues() function always returns
-EINVAL, preventing AP mode from starting.  To fix this, check whether
this flag is set before checking if cab_queue is set properly.

Signed-off-by: Luciano Coelho <coelho@ti.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-16 14:16:58 -04:00
Johannes Berg 8e8b41f9d8 cfg80211: enforce lack of interface combinations
My grand plan to allow drivers to gradually move over
to advertising virtual interface combinations and only
enforce with drivers that do want it enforced doesn't
seem to be working out, only Christian ever added the
advertising (to carl9170), nobody else did.

Begin enforcing combinations in cfg80211 so that users
can rely on the information reported about a device.

Cc: "Luis R. Rodriguez" <mcgrof@qca.qualcomm.com>
Cc: Jouni Malinen <jouni@qca.qualcomm.com>
Cc: Vasanthakumar Thiagarajan <vthiagar@qca.qualcomm.com>
Cc: Senthil Balasubramanian <senthilb@qca.qualcomm.com>
Cc: Kalle Valo <kvalo@qca.qualcomm.com>
Cc: Jiri Slaby <jirislaby@gmail.com>
Cc: Nick Kossifidis <mickflemm@gmail.com>
Cc: Bob Copeland <me@bobcopeland.com>
Cc: Bing Zhao <bzhao@marvell.com>
Cc: Lennert Buytenhek <buytenh@wantstofly.org>
Cc: Ivo van Doorn <IvDoorn@gmail.com>
Cc: Gertjan van Wingerde <gwingerde@gmail.com>
Cc: Helmut Schaa <helmut.schaa@googlemail.com>
Cc: Luciano Coelho <coelho@ti.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-16 14:16:58 -04:00
Vishal Agarwal 6ec5bcadc2 Bluetooth: Temporary keys should be retained during connection
If a key is non persistent then it should not be used in future
connections but it should be kept for current connection. And it
should be removed when connecion is removed.

Signed-off-by: Vishal Agarwal <vishal.agarwal@stericsson.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2012-04-16 12:57:45 +03:00
Vishal Agarwal 745c0ce35f Bluetooth: hci_persistent_key should return bool
This patch changes the return type of function hci_persistent_key
from int to bool because it makes more sense to return information
whether a key is persistent or not as a bool.

Signed-off-by: Vishal Agarwal <vishal.agarwal@stericsson.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2012-04-16 12:57:40 +03:00
David S. Miller 56845d78ce Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/atheros/atlx/atl1.c
	drivers/net/ethernet/atheros/atlx/atl1.h

Resolved a conflict between a DMA error bug fix and NAPI
support changes in the atl1 driver.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-15 13:19:04 -04:00
John Fastabend 3ff661c38c net: rtnetlink notify events for FDB NTF_SELF adds and deletes
It is useful to be able to monitor for FDB events in user space.
This patch adds support to generate netlink events when a change
is made to a device supporting the FDB ops.

This brings embedded switches inline with the SW net/bridge which
triggers events on FDB updates as well.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-15 13:06:04 -04:00
John Fastabend d83b060360 net: add fdb generic dump routine
This adds a generic dump routine drivers can call. It
should be sufficient to handle any bridging model that
uses the unicast address list. This should be most SR-IOV
enabled NICs.

v2: return error on nlmsg_put and use -EMSGSIZE instead
    of -ENOMEM this is inline other usages

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-15 13:06:04 -04:00
John Fastabend 12a9463445 net: addr_list: add exclusive dev_uc_add and dev_mc_add
This adds a dev_uc_add_excl() and dev_mc_add_excl() calls
similar to the original dev_{uc|mc}_add() except it sets
the global bit and returns -EEXIST for duplicat entires.

This is useful for drivers that support SR-IOV, macvlan
devices and any other devices that need to manage the
unicast and multicast lists.

v2: fix typo UNICAST should be MULTICAST in dev_mc_add_excl()

CC: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-15 13:06:04 -04:00
John Fastabend 77162022ab net: add generic PF_BRIDGE:RTM_ FDB hooks
This adds two new flags NTF_MASTER and NTF_SELF that can
now be used to specify where PF_BRIDGE netlink commands should
be sent. NTF_MASTER sends the commands to the 'dev->master'
device for parsing. Typically this will be the linux net/bridge,
or open-vswitch devices. Also without any flags set the command
will be handled by the master device as well so that current user
space tools continue to work as expected.

The NTF_SELF flag will push the PF_BRIDGE commands to the
device. In the basic example below the commands are then parsed
and programmed in the embedded bridge.

Note if both NTF_SELF and NTF_MASTER bits are set then the
command will be sent to both 'dev->master' and 'dev' this allows
user space to easily keep the embedded bridge and software bridge
in sync.

There is a slight complication in the case with both flags set
when an error occurs. To resolve this the rtnl handler clears
the NTF_ flag in the netlink ack to indicate which sets completed
successfully. The add/del handlers will abort as soon as any
error occurs.

To support this new net device ops were added to call into
the device and the existing bridging code was refactored
to use these. There should be no required changes in user space
to support the current bridge behavior.

A basic setup with a SR-IOV enabled NIC looks like this,

          veth0  veth2
            |      |
          ------------
          |  bridge0 |   <---- software bridging
          ------------
               /
               /
  ethx.y      ethx
    VF         PF
     \         \          <---- propagate FDB entries to HW
     \         \
  --------------------
  |  Embedded Bridge |    <---- hardware offloaded switching
  --------------------

In this case the embedded bridge must be managed to allow 'veth0'
to communicate with 'ethx.y' correctly. At present drivers managing
the embedded bridge either send frames onto the network which
then get dropped by the switch OR the embedded bridge will flood
these frames. With this patch we have a mechanism to manage the
embedded bridge correctly from user space. This example is specific
to SR-IOV but replacing the VF with another PF or dropping this
into the DSA framework generates similar management issues.

Examples session using the 'br'[1] tool to add, dump and then
delete a mac address with a new "embedded" option and enabled
ixgbe driver:

# br fdb add 22:35:19:ac:60:59 dev eth3
# br fdb
port    mac addr                flags
veth0   22:35:19:ac:60:58       static
veth0   9a:5f:81:f7:f6:ec       local
eth3    00:1b:21:55:23:59       local
eth3    22:35:19:ac:60:59       static
veth0   22:35:19:ac:60:57       static
#br fdb add 22:35:19:ac:60:59 embedded dev eth3
#br fdb
port    mac addr                flags
veth0   22:35:19:ac:60:58       static
veth0   9a:5f:81:f7:f6:ec       local
eth3    00:1b:21:55:23:59       local
eth3    22:35:19:ac:60:59       static
veth0   22:35:19:ac:60:57       static
eth3    22:35:19:ac:60:59       local embedded
#br fdb del 22:35:19:ac:60:59 embedded dev eth3

I added a couple lines to 'br' to set the flags correctly is all. It
is my opinion that the merit of this patch is now embedded and SW
bridges can both be modeled correctly in user space using very nearly
the same message passing.

[1] 'br' tool was published as an RFC here and will be renamed 'bridge'
    http://patchwork.ozlabs.org/patch/117664/

Thanks to Jamal Hadi Salim, Stephen Hemminger and Ben Hutchings for
valuable feedback, suggestions, and review.

v2: fixed api descriptions and error case with both NTF_SELF and
    NTF_MASTER set plus updated patch description.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-15 13:06:04 -04:00
Herbert Xu c5c2326059 bridge: Add multicast_querier toggle and disable queries by default
Sending general queries was implemented as an optimisation to speed
up convergence on start-up.  In order to prevent interference with
multicast routers a zero source address has to be used.

Unfortunately these packets appear to cause some multicast-aware
switches to misbehave, e.g., by disrupting multicast packets to us.

Since the multicast snooping feature still functions without sending
our own queries, this patch will change the default to not send
queries.

For those that need queries in order to speed up convergence on start-up,
a toggle is provided to restore the previous behaviour.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-15 12:51:35 -04:00
Herbert Xu c83b8fab06 bridge: Restart queries when last querier expires
As it stands when we discover that a real querier (one that queries
with a non-zero source address) we stop querying.  However, even
after said querier has fallen off the edge of the earth, we will
never restart querying (unless the bridge itself is restarted).

This patch fixes this by kicking our own querier into gear when
the timer for other queriers expire.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-15 12:51:34 -04:00
Herbert Xu 748572162a bridge: Add br_multicast_start_querier
This patch adds the helper br_multicast_start_querier so that
the code which starts the queriers in br_multicast_toggle can
be reused elsewhere.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-15 12:51:34 -04:00
Eric Dumazet 95c9617472 net: cleanup unsigned to unsigned int
Use of "unsigned int" is preferred to bare "unsigned" in net tree.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-15 12:44:40 -04:00
Daniel Baluta 5e73ea1a31 ipv4: fix checkpatch errors
Fix checkpatch errors of the following type:
	* ERROR: "foo * bar" should be "foo *bar"
	* ERROR: "(foo*)" should be "(foo *)"

Signed-off-by: Daniel Baluta <dbaluta@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-15 12:37:19 -04:00
David S. Miller cf22f9a2b8 ipv6: Remove unused argument to addrconf_dad_start().
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-14 21:37:40 -04:00
Vijay Subramanian a8cb05b238 tcp: Remove redundant code entering quickack mode
tcp_enter_quickack_mode() already calls tcp_incr_quickack() and sets
icsk->icsk_ack.ato  to TCP_ATO_MIN. This patch removes the duplication.

Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
Reviewed-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-14 15:29:02 -04:00
Alex Copot aacd9289af tcp: bind() use stronger condition for bind_conflict
We must try harder to get unique (addr, port) pairs when
doing port autoselection for sockets with SO_REUSEADDR
option set.

We achieve this by adding a relaxation parameter to
inet_csk_bind_conflict. When 'relax' parameter is off
we return a conflict whenever the current searched
pair (addr, port) is not unique.

This tries to address the problems reported in patch:
	8d238b25b1
	Revert "tcp: bind() fix when many ports are bound"

Tests where ran for creating and binding(0) many sockets
on 100 IPs. The results are, on average:

	* 60000 sockets, 600 ports / IP:
		* 0.210 s, 620 (IP, port) duplicates without patch
		* 0.219 s, no duplicates with patch
	* 100000 sockets, 1000 ports / IP:
		* 0.371 s, 1720 duplicates without patch
		* 0.373 s, no duplicates with patch
	* 200000 sockets, 2000 ports / IP:
		* 0.766 s, 6900 duplicates without patch
		* 0.768 s, no duplicates with patch
	* 500000 sockets, 5000 ports / IP:
		* 2.227 s, 41500 duplicates without patch
		* 2.284 s, no duplicates with patch

Signed-off-by: Alex Copot <alex.mihai.c@gmail.com>
Signed-off-by: Daniel Baluta <dbaluta@ixiacom.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-14 15:28:55 -04:00
Eric Dumazet c72e118334 inet: makes syn_ack_timeout mandatory
There are two struct request_sock_ops providers, tcp and dccp.

inet_csk_reqsk_queue_prune() can avoid testing syn_ack_timeout being
NULL if we make it non NULL like syn_ack_timeout

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Cc: dccp@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-14 15:24:26 -04:00
Eric Dumazet fd4f2cead6 tcp: RFC6298 supersedes RFC2988bis
Updates some comments to track RFC6298

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: H.K. Jerry Chu <hkchu@google.com>
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-14 15:24:26 -04:00
stephen hemminger 87b6d218f3 tunnel: implement 64 bits statistics
Convert the per-cpu statistics kept for GRE, IPIP, and SIT tunnels
to use 64 bit statistics.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-14 14:47:05 -04:00
Will Drewry 0c5fe1b422 net/compat.c,linux/filter.h: share compat_sock_fprog
Any other users of bpf_*_filter that take a struct sock_fprog from
userspace will need to be able to also accept a compat_sock_fprog
if the arch supports compat calls.  This change allows the existing
compat_sock_fprog be shared.

Signed-off-by: Will Drewry <wad@chromium.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Eric Paris <eparis@redhat.com>

v18: tasered by the apostrophe police
v14: rebase/nochanges
v13: rebase on to 88ebdda615
v12: rebase on to linux-next
v11: introduction
Signed-off-by: James Morris <james.l.morris@oracle.com>
2012-04-14 11:13:19 +10:00
Will Drewry 46b325c7eb sk_run_filter: add BPF_S_ANC_SECCOMP_LD_W
Introduces a new BPF ancillary instruction that all LD calls will be
mapped through when skb_run_filter() is being used for seccomp BPF.  The
rewriting will be done using a secondary chk_filter function that is run
after skb_chk_filter.

The code change is guarded by CONFIG_SECCOMP_FILTER which is added,
along with the seccomp_bpf_load() function later in this series.

This is based on http://lkml.org/lkml/2012/3/2/141

Suggested-by: Indan Zupancic <indan@nul.nu>
Signed-off-by: Will Drewry <wad@chromium.org>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Eric Paris <eparis@redhat.com>

v18: rebase
...
v15: include seccomp.h explicitly for when seccomp_bpf_load exists.
v14: First cut using a single additional instruction
... v13: made bpf functions generic.
Signed-off-by: James Morris <james.l.morris@oracle.com>
2012-04-14 11:13:19 +10:00
Mohammed Shafi Shajakhan 0446b49c33 mac80211: remove ieee80211_rx_bss_get
its not used where, while we directly obtain ieee80211_bss's
pointer in ibss.c by calling cfg80211_get_bss

Signed-off-by: Mohammed Shafi Shajakhan <mohammed@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-13 14:32:51 -04:00
Michal Kazior 4ee73f338a mac80211: remove hw.conf.channel usage where possible
Removes hw.conf.channel usage from the following functions:
 * ieee80211_mandatory_rates
 * ieee80211_sta_get_rates
 * ieee80211_frame_duration
 * ieee80211_rts_duration
 * ieee80211_ctstoself_duration

This is in preparation for multi-channel operation.

Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-13 14:32:50 -04:00
Lorenzo Bianconi d01b31604c mac80211: fix an issue in ieee80211_tx_info count field management
I noticed a possible issue in the status count field management of the
ieee80211_tx_info data structure. In particular, when the AGGR
processing is employed,
status.rates[].count is set just for the first frame and not for
others belonging to the same burst, leading to wrong statistic data in
the mac80211 debug file system.

Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi83@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-13 14:32:49 -04:00
Pontus Fuchs d91df0e3a1 cfg80211: Add channel information to NL80211_CMD_GET_INTERFACE
If the current channel is known, add frequency and channel type to
NL80211_CMD_GET_INTERFACE.

Signed-off-by: Pontus Fuchs <pontus.fuchs@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-13 14:32:49 -04:00
Stanislaw Gruszka 5f5460706e mac80211: protect ->scanning by mutex in ieee80211_work_work()
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-13 14:31:50 -04:00
Stanislaw Gruszka 133d40f9a2 mac80211: do not scan and monitor connection in parallel
Before we send probes in connection monitoring we check if scan is not
pending. But we do that check without locking. Fix that and also do not
start scan if connection monitoring is in progress.

Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-13 14:31:49 -04:00
Lukasz Kucharczyk e55a4046da cfg80211: fix interface combinations check.
Signed-off-by: Lukasz Kucharczyk <lukasz.kucharczyk@tieto.com>
Cc: stable@vger.kernel.org
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-13 14:05:35 -04:00
Rémi Denis-Courmont 8e052e5289 Phonet: missing headers (sparse)
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-13 14:03:16 -04:00
Rémi Denis-Courmont 3c490a87f6 Phonet: phonet_net_id can be static (sparse)
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-13 14:03:16 -04:00
Hiroaki SHIMODA dcd2ba92e8 neighbour: Make neigh_table_init_no_netlink() static.
neigh_table_init_no_netlink() is only used in net/core/neighbour.c file.

Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-13 14:00:44 -04:00
Eric Dumazet 447167bf56 udp: intoduce udp_encap_needed static_key
Most machines dont use UDP encapsulation (L2TP)

Adds a static_key so that udp_queue_rcv_skb() doesnt have to perform a
test if L2TP never setup the encap_rcv on a socket.

Idea of this patch came after Simon Horman proposal to add a hook on TCP
as well.

If static_key is not yet enabled, the fast path does a single JMP .

When static_key is enabled, JMP destination is patched to reach the real
encap_type/encap_rcv logic, possibly adding cache misses.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Simon Horman <horms@verge.net.au>
Cc: dev@openvswitch.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-13 13:39:37 -04:00
stephen hemminger efacb309b5 rtnetlink & bonding: change args got get_tx_queues
Change get_tx_queues, drop unsused arg/return value real_tx_queues,
and use return by value (with error) rather than call by reference.

Probably bonding should just change to LLTX and the whole get_tx_queues
API could disappear!

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-13 13:31:00 -04:00
David Ward 6f66cdc3e5 net/garp: fix GID rbtree ordering
The comparison operators were backwards in both garp_attr_lookup and
garp_attr_create, so the entire GID rbtree was in reverse order.
(There was no practical side effect to this though, except that PDUs
were sent with attributes listed in reverse order, which is still
valid by the protocol. This change is only for clarity.)

Signed-off-by: David Ward <david.ward@ll.mit.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-13 13:10:33 -04:00
David Woodhouse 9d02daf754 pppoatm: Fix excessive queue bloat
We discovered that PPPoATM has an excessively deep transmit queue. A
queue the size of the default socket send buffer (wmem_default) is
maintained between the PPP generic core and the ATM device.

Fix it to queue a maximum of *two* packets. The one the ATM device is
currently working on, and one more for the ATM driver to process
immediately in its TX done interrupt handler. The PPP core is designed
to feed packets to the channel with minimal latency, so that really
ought to be enough to keep the ATM device busy.

While we're at it, fix the fact that we were triggering the wakeup
tasklet on *every* pppoatm_pop() call. The comment saying "this is
inefficient, but doing it right is too hard" turns out to be overly
pessimistic... I think :)

On machines like the Traverse Geos, with a slow Geode CPU and two
high-speed ADSL2+ interfaces, there were reports of extremely high CPU
usage which could partly be attributed to the extra wakeups.

(The wakeup handling could actually be made a whole lot easier if we
 stop checking sk->sk_sndbuf altogether. Given that we now only queue
 *two* packets ever, one wonders what the point is. As it is, you could
 already deadlock the thing by setting the sk_sndbuf to a value lower
 than the MTU of the device, and it'd just block for ever.)

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-13 13:03:45 -04:00
Gao feng 1716a96101 ipv6: fix problem with expired dst cache
If the ipv6 dst cache which copy from the dst generated by ICMPV6 RA packet.
this dst cache will not check expire because it has no RTF_EXPIRES flag.
So this dst cache will always be used until the dst gc run.

Change the struct dst_entry,add a union contains new pointer from and expires.
When rt6_info.rt6i_flags has no RTF_EXPIRES flag,the dst.expires has no use.
we can use this field to point to where the dst cache copy from.
The dst.from is only used in IPV6.

rt6_check_expired check if rt6_info.dst.from is expired.

ip6_rt_copy only set dst.from when the ort has flag RTF_ADDRCONF
and RTF_DEFAULT.then hold the ort.

ip6_dst_destroy release the ort.

Add some functions to operate the RTF_EXPIRES flag and expires(from) together.
and change the code to use these new adding functions.

Changes from v5:
modify ip6_route_add and ndisc_router_discovery to use new adding functions.

Only set dst.from when the ort has flag RTF_ADDRCONF
and RTF_DEFAULT.then hold the ort.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-13 12:58:29 -04:00
Dmitry Tarnyagin 447648128e caif: set traffic class for caif packets
Set traffic class for CAIF packets, based on socket
priority, CAIF protocol type, or type of message.

Traffic class mapping for different packet types:
 - control:       TC_PRIO_CONTROL;
 - flow control:  TC_PRIO_CONTROL;
 - at:            TC_PRIO_CONTROL;
 - rfm:           TC_PRIO_INTERACTIVE_BULK;
 - other sockets: equals to socket's TC;
 - network data:  no change.

Signed-off-by: Dmitry Tarnyagin <dmitry.tarnyagin@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-13 11:37:36 -04:00
David S. Miller e65ac4d545 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull in the 'net' tree to get CAIF bug fixes upon which
the following set of CAIF feature patches depend.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-13 11:36:48 -04:00
Tomasz Gregorek 5c699fb7d8 caif: Fix memory leakage in the chnl_net.c.
Added kfree_skb() calls in the chnk_net.c file on
the error paths.

Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-13 11:01:44 -04:00
James Chapman c9be48dc8b l2tp: don't overwrite source address in l2tp_ip_bind()
Applications using L2TP/IP sockets want to be able to bind() an L2TP/IP
socket to set the local tunnel id while leaving the auto-assigned source
address alone. So if no source address is supplied, don't overwrite
the address already stored in the socket.

Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-13 11:01:44 -04:00
James Chapman d1f224ae18 l2tp: fix refcount leak in l2tp_ip sockets
The l2tp_ip socket close handler does not update the module refcount
correctly which prevents module unload after the first bind() call on
an L2TPv3 IP encapulation socket.

Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-13 11:01:44 -04:00
Julia Lawall 89eb06f11c net/key/af_key.c: add missing kfree_skb
At the point of this error-handling code, alloc_skb has succeded, so free
the resulting skb by jumping to the err label.

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-13 11:01:44 -04:00
Eric W. Biederman 03478756b1 phonet: Sort out initiailziation and cleanup code.
Recently an oops was reported in phonet if there was a failure during
network namespace creation.

[  163.733755] ------------[ cut here ]------------
[  163.734501] kernel BUG at include/net/netns/generic.h:45!
[  163.734501] invalid opcode: 0000 [#1] PREEMPT SMP
[  163.734501] CPU 2
[  163.734501] Pid: 19145, comm: trinity Tainted: G        W 3.4.0-rc1-next-20120405-sasha-dirty #57
[  163.734501] RIP: 0010:[<ffffffff824d6062>]  [<ffffffff824d6062>] phonet_pernet+0x182/0x1a0
[  163.734501] RSP: 0018:ffff8800674d5ca8  EFLAGS: 00010246
[  163.734501] RAX: 000000003fffffff RBX: 0000000000000000 RCX: ffff8800678c88d8
[  163.734501] RDX: 00000000003f4000 RSI: ffff8800678c8910 RDI: 0000000000000282
[  163.734501] RBP: ffff8800674d5cc8 R08: 0000000000000000 R09: 0000000000000000
[  163.734501] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880068bec920
[  163.734501] R13: ffffffff836b90c0 R14: 0000000000000000 R15: 0000000000000000
[  163.734501] FS:  00007f055e8de700(0000) GS:ffff88007d000000(0000) knlGS:0000000000000000
[  163.734501] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  163.734501] CR2: 00007f055e6bb518 CR3: 0000000070c16000 CR4: 00000000000406e0
[  163.734501] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  163.734501] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  163.734501] Process trinity (pid: 19145, threadinfo ffff8800674d4000, task ffff8800678c8000)
[  163.734501] Stack:
[  163.734501]  ffffffff824d5f00 ffffffff810e2ec1 ffff880067ae0000 00000000ffffffd4
[  163.734501]  ffff8800674d5cf8 ffffffff824d667a ffff880067ae0000 00000000ffffffd4
[  163.734501]  ffffffff836b90c0 0000000000000000 ffff8800674d5d18 ffffffff824d707d
[  163.734501] Call Trace:
[  163.734501]  [<ffffffff824d5f00>] ? phonet_pernet+0x20/0x1a0
[  163.734501]  [<ffffffff810e2ec1>] ? get_parent_ip+0x11/0x50
[  163.734501]  [<ffffffff824d667a>] phonet_device_destroy+0x1a/0x100
[  163.734501]  [<ffffffff824d707d>] phonet_device_notify+0x3d/0x50
[  163.734501]  [<ffffffff810dd96e>] notifier_call_chain+0xee/0x130
[  163.734501]  [<ffffffff810dd9d1>] raw_notifier_call_chain+0x11/0x20
[  163.734501]  [<ffffffff821cce12>] call_netdevice_notifiers+0x52/0x60
[  163.734501]  [<ffffffff821cd235>] rollback_registered_many+0x185/0x270
[  163.734501]  [<ffffffff821cd334>] unregister_netdevice_many+0x14/0x60
[  163.734501]  [<ffffffff823123e3>] ipip_exit_net+0x1b3/0x1d0
[  163.734501]  [<ffffffff82312230>] ? ipip_rcv+0x420/0x420
[  163.734501]  [<ffffffff821c8515>] ops_exit_list+0x35/0x70
[  163.734501]  [<ffffffff821c911b>] setup_net+0xab/0xe0
[  163.734501]  [<ffffffff821c9416>] copy_net_ns+0x76/0x100
[  163.734501]  [<ffffffff810dc92b>] create_new_namespaces+0xfb/0x190
[  163.734501]  [<ffffffff810dca21>] unshare_nsproxy_namespaces+0x61/0x80
[  163.734501]  [<ffffffff810afd1f>] sys_unshare+0xff/0x290
[  163.734501]  [<ffffffff8187622e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[  163.734501]  [<ffffffff82665539>] system_call_fastpath+0x16/0x1b
[  163.734501] Code: e0 c3 fe 66 0f 1f 44 00 00 48 c7 c2 40 60 4d 82 be 01 00 00 00 48 c7 c7 80 d1 23 83 e8 48 2a c4 fe e8 73 06 c8 fe 48 85 db 75 0e <0f> 0b 0f 1f 40 00 eb fe 66 0f 1f 44 00 00 48 83 c4 10 48 89 d8
[  163.734501] RIP  [<ffffffff824d6062>] phonet_pernet+0x182/0x1a0
[  163.734501]  RSP <ffff8800674d5ca8>
[  163.861289] ---[ end trace fb5615826c548066 ]---

After investigation it turns out there were two issues.
1) Phonet was not implementing network devices but was using register_pernet_device
   instead of register_pernet_subsys.

   This was allowing there to be cases when phonenet was not initialized and
   the phonet net_generic was not set for a network namespace when network
   device events were being reported on the netdevice_notifier for a network
   namespace leading to the oops above.

2) phonet_exit_net was implementing a confusing and special case of handling all
   network devices from going away that it was hard to see was correct, and would
   only occur when the phonet module was removed.

   Now that unregister_netdevice_notifier has been modified to synthesize unregistration
   events for the network devices that are extant when called this confusing special
   case in phonet_exit_net is no longer needed.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-13 11:01:43 -04:00
Eric W. Biederman 7d3d43dab4 net: In unregister_netdevice_notifier unregister the netdevices.
We already synthesize events in register_netdevice_notifier and synthesizing
events in unregister_netdevice_notifier allows to us remove the need for
special case cleanup code.

This change should be safe as it adds no new cases for existing callers
of unregiser_netdevice_notifier to handle.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-13 11:01:43 -04:00
David S. Miller 816a7854d5 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next 2012-04-12 20:12:31 -04:00
David S. Miller 011e3c6325 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-04-12 19:41:23 -04:00
Eldad Zack c1412fce7e net/ipv6/exthdrs.c: Strict PadN option checking
Added strict checking of PadN, as PadN can be used to increase header
size and thus push the protocol header into the 2nd fragment.

PadN is used to align the options within the Hop-by-Hop or
Destination Options header to 64-bit boundaries. The maximum valid
size is thus 7 bytes.
RFC 4942 recommends to actively check the "payload" itself and
ensure that it contains only zeroes.

See also RFC 4942 section 2.1.9.5.

Signed-off-by: Eldad Zack <eldad@fogrefinery.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-12 17:36:44 -04:00
Linus Torvalds 174808af90 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Fix bluetooth userland regression reported by Keith Packard, from
    Gustavo Padovan.

 2) Revert ath9k PS idle change, from Sujith Manoharan.

 3) Correct default TCP memory limits (again), from Eric Dumazet.

 4) Fix tcp_rcv_rtt_update() accidental use of unscaled RTT, from Neal
    Cardwell.

 5) We made a facility for layers like wireless to say how much tailroom
    they need in the SKB for link layer stuff such as wireless
    encryption etc., but TCP works hard to fill every SKB out to the end
    defeating this specification.

    This leads to every TCP packet getting reallocated by the wireless
    code in order to have the right amount of tailroom available.

    Fix TCP to only fill SKBs out to the real amount of data area it
    asked for during the allocation, this way it won't eat into the
    slack added for the device's tailroom needs.

    Reported by Marc Merlin and fixed by Eric Dumazet.

 6) Leaks, endian bugs, and new device IDs in bluetooth from Santosh
    Nayak, João Paulo Rechi Vita, Cho, Yu-Chen, Andrei Emeltchenko,
    AceLan Kao, and Andrei Emeltchenko.

 7) OOPS on tty_close fix in bluetooth's hci_ldisc from Johan Hovold.

 8) netfilter erroneously scales TCP window twice, fix from Changli Gao.

 9) Memleak fix in wext-core from Julia Lawall.

10) Consistently handle invalid TCP packets in ipv4 vs.  ipv6 conntrack,
    from Jozsef Kadlecsik.

11) Validate IP header length properly in netfilter conntrack's
    ipv4_get_l4proto().

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (39 commits)
  NFC: Fix the LLCP Tx fragmentation loop
  rtlwifi: Add missing DMA buffer unmapping for PCI drivers
  rtlwifi: Preallocate USB read buffers and eliminate kalloc in read routine
  tcp: avoid order-1 allocations on wifi and tx path
  net: allow pskb_expand_head() to get maximum tailroom
  bridge: Do not send queries on multicast group leaves
  MAINTAINERS: Mark NATSEMI driver as orphan'd.
  tcp: fix tcp_rcv_rtt_update() use of an unscaled RTT sample
  tcp: restore correct limit
  Revert "ath9k: fix going to full-sleep on PS idle"
  rt2x00: Fix rfkill_polling register function.
  bcma: fix build error on MIPS; implicit pcibios_enable_device
  netfilter: nf_conntrack: fix incorrect logic in nf_conntrack_init_net
  netfilter: nf_ct_ipv4: packets with wrong ihl are invalid
  netfilter: nf_ct_ipv4: handle invalid IPv4 and IPv6 packets consistently
  net/wireless/wext-core.c: add missing kfree
  rtlwifi: Fix oops on rate-control failure
  mac80211: Convert WARN_ON to WARN_ON_ONCE
  rtlwifi: rtl8192de: Fix firmware initialization
  nl80211: ensure interface is up in various APIs
  ...
2012-04-12 14:04:33 -07:00
Jeffrin Jose 46ba5b23c3 net: Fixed coding style issues relating to braces.
Fixed coding style issues in net/core/utils.c
in relation with braces placement.

Signed-off-by: Jeffrin Jose <ahiliation@yahoo.co.in>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-12 16:35:48 -04:00
Eldad Zack 6fdbd1648b net/ipv6/ipv6_sockglue.c: Removed redundant extern
extern int sysctl_mld_max_msf is already defined in linux/ipv6.h.

Signed-off-by: Eldad Zack <eldad@fogrefinery.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-12 16:14:47 -04:00
Shuah Khan e1e420c71b net/core: simple_strtoul cleanup
Changed net/core/net-sysfs.c: netdev_store() to use kstrtoul()
instead of obsolete simple_strtoul().

Signed-off-by: Shuah Khan <shuahkhan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-12 16:09:08 -04:00
Alexey I. Froloff e35f30c131 Treat ND option 31 as userland (DNSSL support)
As specified in RFC6106, DNSSL option contains one or more domain names
of DNS suffixes.  8-bit identifier of the DNSSL option type as assigned
by the IANA is 31.  This option should also be treated as userland.

Signed-off-by: Alexey I. Froloff <raorn@raorn.name>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-12 15:56:57 -04:00
Samuel Ortiz 91b0ade112 NFC: Fix LLCP link timeout typo
We were sending the LTO TLV as a version TLV instead of the actual link
timeout one.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-12 15:10:45 -04:00
Samuel Ortiz 56d5876a22 NFC: Add MIUX to the local LLCP general bytes
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-12 15:10:44 -04:00
Samuel Ortiz ffc29315e5 NFC: Call llcp_add_header properly when sending LLCP DM or DISC
dsap and ssap were swapped when sending DN or DISC.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-12 15:10:42 -04:00
Samuel Ortiz 324b0af6f5 NFC: Fix LLCP TLV building routine
The if logic could lead to zero length TLVs.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-12 15:10:41 -04:00
Samuel Ortiz 279cf174ae NFC: No need to apply twice the modulo op to LLCP's recv_n
recv_n is set properly when receiving an HDLC frame.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-12 15:10:41 -04:00
Samuel Ortiz 4be646ecc9 NFC: Dump LLCP frames
At KERN_DEBUG level.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-12 15:10:40 -04:00
Eric Lapuyade c8d56ae786 NFC: Add Core support to generate tag lost event
Some HW/drivers get notifications when a tag moves out of the radio field.
This notification is now forwarded to user space through netlink.

Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-12 15:10:39 -04:00
Eric Lapuyade 144612cacc NFC: Changed target activated state logic
Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-12 15:10:38 -04:00
Eric Lapuyade 01ae0eea9b NFC: Fix next target_idx type and rename for clarity
Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-12 15:10:37 -04:00
Samuel Ortiz c4fbb6515a NFC: The core part should generate the target index
The target index can be used by userspace to uniquely identify a target
and thus should be kept unique, per NFC adapter. Moreover, some protocols
do not provide a logical index when discovering new targets, so we have to
generate one for them.
For NCI or pn533 to fetch their logical index, we added a logical_idx field
to the target structure.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-12 15:10:37 -04:00
Eric Lapuyade eb738fe535 NFC: SHDLC implementation
Most NFC HCI chipsets actually use a simplified HDLC link layer to
carry HCI payloads.
This implementation registers itself as an HCI device on behalf of the
NFC driver.

Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-12 15:10:35 -04:00
Eric Lapuyade 8b8d2e08bf NFC: HCI support
This is an implementation of ETSI TS 102 622 specification.
Many NFC chipsets use HCI as the host <-> target protocol on top of a
serial link like i2c.

Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-12 15:10:34 -04:00
Eric Lapuyade e1da0efa2e NFC: Export target lost function
NFC drivers will call this routine when they detect that a tag leaves the
RF field. This will eventually lead to the corresponding netlink event
to be sent.

Signed-off-by: Eric Lapuyade <eric.lapuyade@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-12 15:10:34 -04:00
Samuel Ortiz 8112a5c91d NFC: Add a target lost netlink event
Some chips are capable of detecting when a tag is out of the field, so
they could send a netlink event about it to userspace.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-12 15:10:33 -04:00
Chun-Yeow Yeoh 35bcd59113 mac80211: fix the assignment of PREQ's MAC address for Proactive RANN
Record the RANN sender's address only for RANNs that meet the acceptance
criteria (per sections 13.10.12.4.2).

Signed-off-by: Chun-Yeow Yeoh <yeohchunyeow@gmail.com>
Reviewed-by: Javier Cardona <javier@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-12 15:10:31 -04:00
John W. Linville 7eab0f64a9 Merge branch 'master' into for-davem
Conflicts:
	drivers/net/wireless/iwlwifi/iwl-testmode.c
	net/wireless/nl80211.c
2012-04-12 14:41:59 -04:00
John W. Linville 8065248069 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless 2012-04-12 13:49:28 -04:00
John W. Linville 5d94994422 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem 2012-04-12 09:55:22 -04:00
Stanislav Kinsbursky e5f06f720e nfsd: make expkey cache allocated per network namespace context
This patch also changes svcauth_unix_purge() function: added network namespace
as a parameter and thus loop over all networks was replaced by only one call
for ip map cache purge.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-04-12 09:11:46 -04:00
Thomas Pedersen 5314526b17 cfg80211: add channel switch notify event
The firmware may decide to switch channels while already beaconing, e.g.
in response to a cfg80211 connect request on a different vif. Add this
event to notify userspace when an AP or GO interface has successfully
migrated to a new channel, so it can update its configuration
accordingly.

Signed-off-by: Thomas Pedersen <c_tpeder@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-11 16:23:59 -04:00
Johannes Berg 6d52563f2b cfg80211/mac80211: enable proper device_set_wakeup_enable handling
In WoWLAN, we only get the triggers when we actually get
to suspend. As a consequence, drivers currently don't
know that the device should enable wakeup. However, the
device_set_wakeup_enable() API is intended to be called
when the wakeup is enabled, not later when needed.

Add a new set_wakeup() call to cfg80211 and mac80211 to
allow drivers to properly call device_set_wakeup_enable.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-11 16:23:57 -04:00
Johannes Berg 4d6c36fa22 mac80211: clean up an ieee80211_do_open error path
Eliad's comment prompted me to look closer at
the error paths in ieee80211_do_open() and I
found one that should use the error labels.

Also add a comment about the clear_bit since
in many error cases the bit hasn't been set.

Cc: Eliad Peller <eliad@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-11 16:23:51 -04:00
Johannes Berg 3a25a8c8b7 mac80211: add improved HW queue control
mac80211 currently only supports one hardware queue
per AC. This is already problematic for off-channel
uses since if we go off channel while the BE queue
is full and then try to send an off-channel frame
the frame will never go out. This will become worse
when we support multi-channel since then a queue on
one channel might be full, but we have to stop the
software queue for all channels. That is obviously
not desirable.

To address this problem allow drivers to register
more hardware queues, and allow them to map them to
virtual interfaces. When they stop a hardware queue
the corresponding AC software queues on the correct
interfaces will be stopped as well. Additionally,
there's an off-channel queue to solve that problem
and a per-interface after-DTIM beacon queue. This
allows drivers to manage software queues closer to
how the hardware works.

Currently, there's a limit of 16 hardware queues.
This may or may not be sufficient, we can adjust it
as needed.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-11 16:23:50 -04:00
Johannes Berg 4b6f1dd6a6 mac80211: add explicit monitor interface if needed
The queue mapping redesign that I'm planning to do
will break pure injection unless we handle monitor
interfaces explicitly. One possible option would
be to have the driver tell mac80211 about monitor
mode queues etc., but that would duplicate the API
since we already need to have queue assignments
handled per virtual interface.

So in order to solve this, have a virtual monitor
interface that is added whenever all active vifs
are monitors. We could also use the state of one
of the monitor interfaces, but managing that would
be complicated, so allocate separate state.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-11 16:23:49 -04:00
Johannes Berg 3edaf3e61f mac80211: manage AP netdev carrier state
The AP netdev is really only active when beaconing, so
manage the carrier state accordingly. Also do that for
VLAN interfaces enslaved to a given AP interface.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-11 16:23:48 -04:00
Ashok Nagarajan fe40cb6274 mac80211: Check basic rates when peering
Section 13.2.3 of IEEE 80211s standard requires BSSBasicRateSet of mesh nodes
to be identical to establish peer link.

Signed-off-by: Ashok Nagarajan <ashok@cozybit.com>
Signed-off-by: Thomas Pedersen <thomas@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-11 16:23:47 -04:00
Ashok Nagarajan 9ebb61a23d mac80211: Modify sta_get_rates to give basic rates
Signed-off-by: Ashok Nagarajan <ashok@cozybit.com>
Signed-off-by: Thomas Pedersen <thomas@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-11 16:23:47 -04:00
Ashok Nagarajan 657c3e0c41 mac80211: Indicate basic rates when adding rate IEs
Basic rates are added with supported rates IE and extended supported
rates IE.

Signed-off-by: Ashok Nagarajan <ashok@cozybit.com>
Signed-off-by: Thomas Pedersen <thomas@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-11 16:23:46 -04:00
Ashok Nagarajan d934f7d0d6 mac80211: Use mandatory rates as basic rates when starting mesh
Signed-off-by: Ashok Nagarajan <ashok@cozybit.com>
Signed-off-by: Thomas Pedersen <thomas@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-11 16:23:45 -04:00
Samuel Ortiz b4838d12e1 NFC: Fix the LLCP Tx fragmentation loop
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-11 15:09:33 -04:00
Eric Dumazet a21d45726a tcp: avoid order-1 allocations on wifi and tx path
Marc Merlin reported many order-1 allocations failures in TX path on its
wireless setup, that dont make any sense with MTU=1500 network, and non
SG capable hardware.

After investigation, it turns out TCP uses sk_stream_alloc_skb() and
used as a convention skb_tailroom(skb) to know how many bytes of data
payload could be put in this skb (for non SG capable devices)

Note : these skb used kmalloc-4096 (MTU=1500 + MAX_HEADER +
sizeof(struct skb_shared_info) being above 2048)

Later, mac80211 layer need to add some bytes at the tail of skb
(IEEE80211_ENCRYPT_TAILROOM = 18 bytes) and since no more tailroom is
available has to call pskb_expand_head() and request order-1
allocations.

This patch changes sk_stream_alloc_skb() so that only
sk->sk_prot->max_header bytes of headroom are reserved, and use a new
skb field, avail_size to hold the data payload limit.

This way, order-0 allocations done by TCP stack can leave more than 2 KB
of tailroom and no more allocation is performed in mac80211 layer (or
any layer needing some tailroom)

avail_size is unioned with mark/dropcount, since mark will be set later
in IP stack for output packets. Therefore, skb size is unchanged.

Reported-by: Marc MERLIN <marc@merlins.org>
Tested-by: Marc MERLIN <marc@merlins.org>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-11 10:11:12 -04:00
Eric Dumazet 87151b8689 net: allow pskb_expand_head() to get maximum tailroom
Marc Merlin reported many order-1 allocations failures in TX path on its
wireless setup, that dont make any sense with MTU=1500 network, and non
SG capable hardware.

Turns out part of the problem comes from pskb_expand_head() not using
ksize() to get exact head size given by kmalloc(). Doing the same thing
than __alloc_skb() allows more tailroom in skb and can prevent future
reallocations.

As a bonus, struct skb_shared_info becomes cache line aligned.

Reported-by: Marc MERLIN <marc@merlins.org>
Tested-by: Marc MERLIN <marc@merlins.org>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-11 10:10:43 -04:00
Herbert Xu 996304bbea bridge: Do not send queries on multicast group leaves
As it stands the bridge IGMP snooping system will respond to
group leave messages with queries for remaining membership.
This is both unnecessary and undesirable.  First of all any
multicast routers present should be doing this rather than us.
What's more the queries that we send may end up upsetting other
multicast snooping swithces in the system that are buggy.

In fact, we can simply remove the code that send these queries
because the existing membership expiry mechanism doesn't rely
on them anyway.

So this patch simply removes all code associated with group
queries in response to group leave messages.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-11 09:43:13 -04:00
Simon Wunderlich 7a5cc24277 batman-adv: add bridge loop avoidance compile option
The define CONFIG_BATMAN_ADV_BLA switches the bridge loop avoidance
on - skip it, and the bridge loop avoidance is not compiled in.

This is useful if binary size should be saved or the feature is
not needed.

Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-04-11 14:29:00 +02:00
Simon Wunderlich 38ef3d1d91 batman-adv: form groups in the bridge loop avoidance
backbone gateways may be part of the same LAN, but participate
in different meshes. With this patch, backbone gateways form groups by
applying the groupid of another backbone gateway if it is higher. After
forming the group, they only accept messages from backbone gateways of
the same group.

Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-04-11 14:28:59 +02:00
Simon Wunderlich b1a8c04b8a batman-adv: drop STP over batman
Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-04-11 14:28:59 +02:00
Simon Wunderlich fe2da6ff27 batman-adv: add broadcast duplicate check
When multiple backbone gateways relay the same broadcast from the
backbone into the mesh, other nodes in the mesh may receive this
broadcast multiple times. To avoid this, the crc checksums of
received broadcasts are recorded and new broadcast packets with
the same content may be dropped if received by another gateway.

Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-04-11 14:28:59 +02:00
Simon Wunderlich 20ff9d593f batman-adv: don't let backbone gateways exchange tt entries
As the backbone gateways are connected to the same backbone, they
should announce the same clients on the backbone non-exclusively.

Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-04-11 14:28:59 +02:00
Simon Wunderlich db08e6e557 batman-adv: allow multiple entries in tt_global_entries
as backbone gateways will all independently announce the same clients,
also the tt global table must be able to hold multiple originators per
client entry.

Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-04-11 14:28:59 +02:00
Simon Wunderlich 9bf8e4d425 batman-adv: export claim tables through debugfs
Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-04-11 14:28:59 +02:00
Simon Wunderlich c867305509 batman-adv: make bridge loop avoidance switchable
Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-04-11 14:28:59 +02:00
Simon Wunderlich 23721387c4 batman-adv: add basic bridge loop avoidance code
This second version of the bridge loop avoidance for batman-adv
avoids loops between the mesh and a backbone (usually a LAN).

By connecting multiple batman-adv mesh nodes to the same ethernet
segment a loop can be created when the soft-interface is bridged
into that ethernet segment. A simple visualization of the loop
involving the most common case - a LAN as ethernet segment:

node1  <-- LAN  -->  node2
  |                   |
wifi   <-- mesh -->  wifi

Packets from the LAN (e.g. ARP broadcasts) will circle forever from
node1 or node2 over the mesh back into the LAN.

With this patch, batman recognizes backbone gateways, nodes which are
part of the mesh and backbone/LAN at the same time. Each backbone
gateway "claims" clients from within the mesh to handle them
exclusively. By restricting that only responsible backbone gateways
may handle their claimed clients traffic, loops are effectively
avoided.

Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-04-11 14:28:58 +02:00
Simon Wunderlich a7f6ee9493 batman-adv: remove old bridge loop avoidance code
The functionality is to be replaced by an improved implementation,
so first clean up.

Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-04-11 14:28:58 +02:00
Marek Lindner 8681a1c4dd batman-adv: encourage batman to take shorter routes by changing the default hop penalty
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-04-11 14:28:58 +02:00
Sven Eckelmann de7aae6570 batman-adv: Remove declaration of only locally used functions
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Acked-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-04-11 14:28:58 +02:00
Sven Eckelmann 0079d2cef1 batman-adv: Replace bitarray operations with bitmap
bitarray.c consists mostly of functionality that is already available as part
of the standard kernel API. batman-adv could use architecture optimized code
and reduce the binary size by switching to the standard functions.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-04-11 14:28:58 +02:00
Antonio Quartulli c1faead333 batman-adv: use ETH_ALEN instead of hardcoded numeric constants
In packet.h the numeric constant 6 is used instead of the more portable ETH_ALEN
define. This patch substitute any hardcoded value with such define.

Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Acked-by: Sven Eckelmann <sven@narfation.org>
2012-04-11 14:28:58 +02:00
Antonio Quartulli 10e3cd6a25 batman-adv: clean up Kconfig
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-04-11 14:28:58 +02:00
Linus Torvalds 94fb175c04 dmaengine-fixes for 3.4-rc3
1/ regression fix for Xen as it now trips over a broken assumption
    about the dma address size on 32-bit builds
 
 2/ new quirk for netdma to ignore dma channels that cannot meet
    netdma alignment requirements
 
 3/ fixes for two long standing issues in ioatdma (ring size overflow)
    and iop-adma (potential stack corruption)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJPhIfhAAoJEB7SkWpmfYgCguIQAL4qF+RC9/JggSHIjfOrYiPd
 yboV80GqqQHHBwy8hfZVUrIEPMebvD/xUIk6iUQNXR+6EA8Ln0jukvQMpWNnI+Cc
 TXgA5Ok70an4PD1MqnCsWyCJjsyPyhprbRHurxBcesf+y96POJxhING0rcKvft50
 mvYnbtrkYe9M9x3b8TBGc0JaTVeL29Ck3FtkTz4uUktbkhRNfCcfEd28NRQpf8MB
 vkjbjRGBQmGsnKxYCaEhlF1GPJyTlYjg4BBWtseJgb2R9s7tvJrkotFea/NmSfjq
 XCuVKjpiFp3YyJuxJERWdwqRWvyAZFfcYyZX440nG0b7GBgSn+T7A9XhUs8vMboi
 tLwoDfBbJDlKMaFpHex7Z6RtZZmVl3gWDNZTqpG44n4pabd4RPip04f0k7Wfs+cp
 tzU9hGAOvgsZ8w4/JgxH8YJOZbIGzbDGOA1IhWcbxIbmFTblMiFnV3TC7qfhoRbR
 8qtScIE7bUck2MYVlMMn9utd9tvKFa6HNgo41+f78/4+U7zQ/VrsbA/DWQct40R5
 5k+EEvyYFUzIXn79E0GVN5h4NHH5gfAs3MZ7jIgwgHedBp4Ki68XYKNu+pIV3YwG
 CFTPn1mVOXnCdt+fsjG5tL9Jecx1Mij6w3nWU93ZU6cHmC77YmU+DLxPIGuyR1a2
 EmpObwfq5peXzkgQpEsB
 =F3IR
 -----END PGP SIGNATURE-----

Merge tag 'dmaengine-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/dmaengine

Pull dmaengine fixes from Dan Williams:

1/ regression fix for Xen as it now trips over a broken assumption
   about the dma address size on 32-bit builds

2/ new quirk for netdma to ignore dma channels that cannot meet
   netdma alignment requirements

3/ fixes for two long standing issues in ioatdma (ring size overflow)
   and iop-adma (potential stack corruption)

* tag 'dmaengine-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/dmaengine:
  netdma: adding alignment check for NETDMA ops
  ioatdma: DMA copy alignment needed to address IOAT DMA silicon errata
  ioat: ring size variables need to be 32bit to avoid overflow
  iop-adma: Corrected array overflow in RAID6 Xscale(R) test.
  ioat: fix size of 'completion' for Xen
2012-04-10 15:30:16 -07:00
Javier Cardona d299a1f21e {nl,cfg}80211: Support for mesh synchronization
Report Toffset to userspace.
Let userspace select the mesh synchronization method.

Signed-off-by: Marco Porsch <marco.porsch@s2005.tu-chemnitz.de>
Signed-off-by: Pavel Zubarev <pavel.zubarev@gmail.com>
Signed-off-by: Javier Cardona <javier@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-10 15:20:33 -04:00
Javier Cardona dbf498fbaf mac80211: Implement mesh synchronization framework
This patch adds MBSS extensible synchronization framework (Sec.
13.13.2 of IEEE Std. 802.11-2012).

The framework is implemented via an ops table which defines the
following functions:

    rx_bcn_presp() - this is called every time a mesh beacon is
received.
    adjust_tbtt() - this is called immediately before a beacon is about
to be transmitted.

The default neighbor offset synchronization defined in the standard is
implemented.  We also provide template functions for vendor specific
methods.

When neighbor offset synchronization is active (which is the default)
mesh neighbors in the same MBSS will track timing offsets to each other
and compensate clock drift.

In our tests we observed that this mesh synchronization implementation
successfully corrected drifts between stations of ~2PPM while
introducing a jitter of ~20us.

It is also possible to test this framework on mac80211_hwsim simulated
phys to see how it behaves under different topologies, over poor links,
etc.

Signed-off-by: Marco Porsch <marco.porsch@s2005.tu-chemnitz.de>
Signed-off-by: Pavel Zubarev <pavel.zubarev@gmail.com>
Signed-off-by: Javier Cardona <javier@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-10 15:20:31 -04:00
Javier Cardona 9bdd3a6bf8 mac80211: Allow tsf increments via debugfs
Reading and writing back the tsf value via tsf is too slow if one wants
to make small increments to this timer.  With this change you can use
the syntax "+=<some value>" or "-=<some value>" to add or substract a
value from the tsf counter.

Signed-off-by: Javier Cardona <javier@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-10 15:20:30 -04:00
Stanislaw Gruszka 88c868c43b mac80211: sanity check for null SSID
While associated we should never have empty SSID, but life can be full
of surprises, and is allways better to print a warning than crash.

Before memcpy() in ieee80211_probereq_get() check ssid_len instead of
ssid pointer, sice pointer it always passed by "ssidie + 2" expression
to send probe functions, so practically never can be NULL.

Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-10 15:20:28 -04:00
Johannes Berg 32c5057b22 mac80211: use IEEE80211_NUM_ACS
When comparing hw->queues to determine if the
device is QoS capable, use IEEE80211_NUM_ACS
instead of just 4.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-10 14:56:10 -04:00
Johannes Berg 4644ae8903 mac80211: lazily stop queues in add_pending
When adding pending SKBs there's no need to
stop all queues, we only need to stop those
that we're adding frames to. Implement that
by lazily stopping a queue as we add an SKB.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-10 14:54:11 -04:00
Johannes Berg ada1512526 mac80211: debounce queue stop/wake
When the queue status changes we need to do a fair
bit of work, so ignore no-op changes early.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-10 14:54:11 -04:00
Johannes Berg ded81f6ba9 mac80211: decouple # of netdev queues from HW queues
When we get more hardware queues, we'll still want
to only have netdev queues per AC, so set it up in
that way. If the hardware doesn't support QoS (by
not supporting at least 4 queues) the netdevs get
a single queue only (this is no change in behavior
as there are no drivers with 2 or 3 queues today.)

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-10 14:54:10 -04:00
Johannes Berg 54bcbc695e mac80211: refuse TX queue configuration on non-QoS HW
Drivers that don't support QoS also don't support
setting up their ACs, catch that early. While at
it, remove the input check since cfg80211 does it
now.

Also fix up the restart code to not try to set up
the queues in this case.

Finally also change the tx_conf array to have
IEEE80211_NUM_ACS entries instead of # of queues
since that's what it really needs.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-10 14:54:10 -04:00
Johannes Berg a3304b0a17 cfg80211/nl80211: clarify TX queue API
With the plan to change mac80211's queue API to
not map ACs to queues 1:1, it seems necessary to
clarify some APIs that act on ACs rather than on
queues to spell that out explicitly. Do this.

Also verify that the AC number given is valid.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-10 14:54:09 -04:00
Johannes Berg 8f727ef3c4 mac80211: notify driver of rate control updates
Devices that have internal rate control need to be
notified when the bandwidth or SMPS state changes
just like external rate control algorithms get a
notification now.

Add this notification and clarify the change bits
while at it, the HT_CHANGED bit really meant only
bandwidth changed.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-10 14:54:08 -04:00
Johannes Berg 7213cf2cb0 mac80211: remove queue stop on rate control update
We currently stop the queue when changing the rate
control between 20/40 MHz in the BSS. This seems to
have been necessary when we actually changed the
channel, but now that we just update the station it
doesn't seem right any more. Remove it.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-10 14:54:08 -04:00
Johannes Berg 64f68e5d15 mac80211: remove channel type argument from rate_update
The channel type argument to the rate_update()
callback isn't really the correct way to give
the rate control algorithm about the desired
RX bandwidth of the peer.

Remove this argument, and instead update the
STA capabilities with 20/40 appropriately. The
SMPS update done by this callback works in the
same way, so this makes the callback cleaner.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-10 14:54:08 -04:00
Johannes Berg 24398e39c8 mac80211: set HT channel before association
Changing the channel type during operation is
confusing to some drivers and will be hard to
handle in multi-channel scenarios. Instead of
changing the channel, set it to the right HT
channel before authenticating/associating and
don't change it -- just update the 20/40 MHz
restrictions in rate control as needed when
changed by the AP.

This also fixes a problem that Paul missed in
his fix for the "regulatory makes us deaf"
issue -- when we couldn't use 40 MHz we still
associated saying we were using 40 MHz, which
could in similarly broken APs make us never
even connect successfully.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-10 14:54:07 -04:00
Johannes Berg 1d98fb122d mac80211: use AC constants
Use the AC constants instead of hard-coding
the numbers with comments.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-10 14:54:07 -04:00
Johannes Berg 78307daadf mac80211: inline ieee80211_add_pending_skbs
This is a trivial wrapper function, inline it.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-10 14:54:06 -04:00
Johannes Berg 4670cf7a84 mac80211: make ieee80211_downgrade_queue static
There's no reason for it to not be static.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-10 14:54:06 -04:00
Johannes Berg 4875d30df5 mac80211: clean up uAPSD TX code
Clean up the code formatting and also replace
the constant 0 by IEEE80211_AC_VO.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-10 14:54:06 -04:00
Johannes Berg 98aed9fd01 mac80211: fix mesh TX coding style
Fix bad indentation & pointless if nesting.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-10 14:54:05 -04:00
Johannes Berg 81ddbb5c11 mac80211: don't always advertise remain-on-channel
Not all devices are really capable of implementing
remain-on-channel, even if it is implemented in SW,
as they can't necessarily deal with channel changes
while associated.

Remove the WIPHY_FLAG_HAS_REMAIN_ON_CHANNEL and add
it only if either the driver has remain_on_channel
implemented in the driver/device.

Also add it to all drivers that advertise P2P right
now since those definitely have to have it working.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-by: Luciano Coelho <coelho@ti.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-10 14:54:04 -04:00
Neal Cardwell 18a223e0b9 tcp: fix tcp_rcv_rtt_update() use of an unscaled RTT sample
Fix a code path in tcp_rcv_rtt_update() that was comparing scaled and
unscaled RTT samples.

The intent in the code was to only use the 'm' measurement if it was a
new minimum.  However, since 'm' had not yet been shifted left 3 bits
but 'new_sample' had, this comparison would nearly always succeed,
leading us to erroneously set our receive-side RTT estimate to the 'm'
sample when that sample could be nearly 8x too high to use.

The overall effect is to often cause the receive-side RTT estimate to
be significantly too large (up to 40% too large for brief periods in
my tests).

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-10 14:47:09 -04:00
Eric Dumazet 5fb84b1428 tcp: restore correct limit
Commit c43b874d5d (tcp: properly initialize tcp memory limits) tried
to fix a regression added in commits 4acb4190 & 3dc43e3,
but still get it wrong.

Result is machines with low amount of memory have too small tcp_rmem[2]
value and slow tcp receives : Per socket limit being 1/1024 of memory
instead of 1/128 in old kernels, so rcv window is capped to small
values.

Fix this to match comment and previous behavior.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Glauber Costa <glommer@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-10 14:39:26 -04:00
David S. Miller ecd159fc5f Merge branch 'master' of git://1984.lsi.us.es/net 2012-04-10 14:38:31 -04:00
David S. Miller 06eb4eafbd Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-04-10 14:30:45 -04:00
Glauber Costa 1d62e43657 cgroup: pass struct mem_cgroup instead of struct cgroup to socket memcg
The only reason cgroup was used, was to be consistent with the populate()
interface. Now that we're getting rid of it, not only we no longer need
it, but we also *can't* call it this way.

Since we will no longer rely on populate(), this will be called from
create(). During create, the association between struct mem_cgroup
and struct cgroup does not yet exist, since cgroup internals hasn't
yet initialized its bookkeeping. This means we would not be able
to draw the memcg pointer from the cgroup pointer in these
functions, which is highly undesirable.

Signed-off-by: Glauber Costa <glommer@parallels.com>
Acked-by: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
CC: Li Zefan <lizefan@huawei.com>
CC: Johannes Weiner <hannes@cmpxchg.org>
CC: Michal Hocko <mhocko@suse.cz>
2012-04-10 10:04:07 -07:00
Gao feng 6ba900676b netfilter: nf_conntrack: fix incorrect logic in nf_conntrack_init_net
in function nf_conntrack_init_net,when nf_conntrack_timeout_init falied,
we should call nf_conntrack_ecache_fini to do rollback.
but the current code calls nf_conntrack_timeout_fini.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-04-10 13:00:38 +02:00
Jozsef Kadlecsik 07153c6ec0 netfilter: nf_ct_ipv4: packets with wrong ihl are invalid
It was reported that the Linux kernel sometimes logs:

klogd: [2629147.402413] kernel BUG at net / netfilter /
nf_conntrack_proto_tcp.c: 447!
klogd: [1072212.887368] kernel BUG at net / netfilter /
nf_conntrack_proto_tcp.c: 392

ipv4_get_l4proto() in nf_conntrack_l3proto_ipv4.c and tcp_error() in
nf_conntrack_proto_tcp.c should catch malformed packets, so the errors
at the indicated lines - TCP options parsing - should not happen.
However, tcp_error() relies on the "dataoff" offset to the TCP header,
calculated by ipv4_get_l4proto().  But ipv4_get_l4proto() does not check
bogus ihl values in IPv4 packets, which then can slip through tcp_error()
and get caught at the TCP options parsing routines.

The patch fixes ipv4_get_l4proto() by invalidating packets with bogus
ihl value.

The patch closes netfilter bugzilla id 771.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-04-10 12:50:49 +02:00
Jozsef Kadlecsik 8430eac2f6 netfilter: nf_ct_ipv4: handle invalid IPv4 and IPv6 packets consistently
IPv6 conntrack marked invalid packets as INVALID and let the user
drop those by an explicit rule, while IPv4 conntrack dropped such
packets itself.

IPv4 conntrack is changed so that it marks INVALID packets and let
the user to drop them.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-04-10 00:38:34 +02:00
Luis R. Rodriguez 80007efeff cfg80211: warn if db.txt is empty with CONFIG_CFG80211_INTERNAL_REGDB
It has happened twice now where elaborate troubleshooting has
undergone on systems where CONFIG_CFG80211_INTERNAL_REGDB [0]
has been set but yet net/wireless/db.txt was not updated.

Despite the documentation on this it seems system integrators could
use some more help with this, so throw out a kernel warning at boot time
when their database is empty.

This does mean that the error-prone system integrator won't likely
realize the issue until they boot the machine but -- it does not seem
to make sense to enable a build bug breaking random build testing.

[0] http://wireless.kernel.org/en/developers/Regulatory/CRDA#CONFIG_CFG80211_INTERNAL_REGDB

Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Youngsin Lee <youngsin@qualcomm.com>
Cc: Raja Mani <rmani@qca.qualcomm.com>
Cc: Senthil Kumar Balasubramanian <senthilb@qca.qualcomm.com>
Cc: Vipin Mehta <vipimeht@qca.qualcomm.com>
Cc: yahuan@qca.qualcomm.com
Cc: jjan@qca.qualcomm.com
Cc: vthiagar@qca.qualcomm.com
Cc: henrykim@qualcomm.com
Cc: jouni@qca.qualcomm.com
Cc: athiruve@qca.qualcomm.com
Cc: cjkim@qualcomm.com
Cc: philipk@qca.qualcomm.com
Cc: sunnykim@qualcomm.com
Cc: sskwak@qualcomm.com
Cc: kkim@qualcomm.com
Cc: mattbyun@qualcomm.com
Cc: ryanlee@qualcomm.com
Cc: simbap@qualcomm.com
Cc: krislee@qualcomm.com
Cc: conner@qualcomm.com
Cc: hojinkim@qualcomm.com
Cc: honglee@qualcomm.com
Cc: johnwkim@qualcomm.com
Cc: jinyong@qca.qualcomm.com
Cc: stable@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@frijolero.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-09 16:37:11 -04:00
Chun-Yeow Yeoh d2a079fd48 mac80211: fix the RANN propagation issues
This patch is intended to solve the follwing issues in RANN propagation:
[1] The interval in propagated RANN should be based on the interval of received RANN.
[2] The aggregated path metric for propagated RANN is as received plus own link metric
    towards the transmitting mesh STA (not root mesh STA).
[3] The comparison of path metric for RANN with same sequence number should be done
    before deciding whether to propagate or not.

Signed-off-by: Chun-Yeow Yeoh <yeohchunyeow@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-09 16:37:10 -04:00
Chun-Yeow Yeoh 292c41acdd mac80211: fix the sparse warnings on endian handling in RANN propagation
The HWMP sequence number of received RANN element is compared to decide whether to be
propagated. The sequence number is required to covert from 32bit little endian data into
CPUs endianness for comparison. The same applies to the RANN metric.

Signed-off-by: Chun-Yeow Yeoh <yeohchunyeow@gmail.com>
Signed-off-by: Javier Cardona <javier@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-09 16:12:30 -04:00
Ronald Wahl 70b12f2612 mac80211: when receiving DTIM disable power-save mode only if it was enabled
When receiving DTIM we currently disable power save mode in the
hardware unconditionally, i.e. also when the hardware was not sleeping.
This causes trouble with at least one wireless chipset (Ralink RT3572).
When the hardware is not sleeping and we send a wakeup command (e.g.
this happens after a scan) then a significant decrease of the link
quality or a disconnect may occur.
Disabling power save mode only when it was enabled prevents this issue.

Signed-off-by: Ronald Wahl <ronald.wahl@raritan.com>
Reviewed-by: Gertjan van Wingerde <gwingerde@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-09 16:12:30 -04:00
Felix Fietkau 12d3952fc4 mac80211: optimize aggregation session timeout handling
Calling mod_timer from the rx/tx hotpath is somewhat expensive, and the
timeout doesn't need to be so precise.

Switch to a different strategy: Schedule the timer initially, store jiffies
of all last rx/tx activity which would previously modify the timer, and
let the timer re-arm itself after checking the last rx/tx timestamp.
Make the session timers deferrable to avoid causing extra wakeups on systems
running on battery.
This visibly reduces CPU load under high network load on small embedded
systems.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-09 16:09:36 -04:00
Felix Fietkau fcb2c9e102 mac80211: reduce code duplication in debugfs code
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-09 16:09:35 -04:00
Felix Fietkau c6fb08aaa8 cfg80211: use compare_ether_addr on MAC addresses instead of memcmp
Because of the constant size and guaranteed 16 bit alignment, the inline
compare_ether_addr function is much cheaper than calling memcmp.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-09 16:09:35 -04:00
Marco Porsch 52a3f20c09 mac80211: end service period only after sending last buffered frame
Signed-off-by: Marco Porsch <marco.porsch@etit.tu-chemnitz.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-09 16:06:00 -04:00
Ben Greear d17087e78d mac80211: Add iface name when calling WARN-ON.
This lets the user know which interface has failed
the check_sdata_in_driver check.

Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-09 16:05:57 -04:00
Johannes Berg 074d46d1d2 wireless: rename ht_info to ht_operation
Since some of the HT code pre-dates 802.11n-2009
some names are wrong. The one that bothers me most
is that "HT operation" is called "HT information"
in our code and that causes confusion.

Rename "HT information" to "HT operation" and also
the control_chan field to primary_chan to match
the name used in the spec.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-09 16:05:55 -04:00
Rajkumar Manoharan f69b9c79c9 mac80211: flush to get the tx status of nullfunc frame immediately
Sometimes the probe frame (nullfunc) is stuck at the hw queue. so that
the mac80211 terminates the connection as it wont see the tx status.
Instead of waiting for long period for ack status, lets call flush
to get nullfunc status immediately. It also helps to send the nullfunc
till max tries reached.

Signed-off-by: Rajkumar Manoharan <rmanohar@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-09 16:05:54 -04:00
Rajkumar Manoharan 6f0756a38f mac80211: do not send pspoll when powersave is disabled
There might be latency at AP side to update TIM IE which could cause the
station to send pspoll frame even after the wakeup. If the powersave is
disabled, the nullfunc notification alone is sufficient to receive
frames from the AP. And if the pspoll frame was already sent, no need to
resend the frame till it was acked by AP.

Cc: Jouni Malinen <jouni@qca.qualcomm.com>
Cc: Kalle Valo <kvalo@qca.qualcomm.com>
Signed-off-by: Rajkumar Manoharan <rmanohar@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-09 16:05:54 -04:00
Julia Lawall 7ab2485b69 net/wireless/wext-core.c: add missing kfree
Free extra as done in the error-handling code just above.

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-09 15:54:50 -04:00
Johannes Berg 2b5f8b0b44 nl80211: ensure interface is up in various APIs
The nl80211 handling code should ensure as much as
it can that the interface is in a valid state, it
can certainly ensure the interface is running.

Not doing so can cause calls through mac80211 into
the driver that result in warnings and unspecified
behaviour in the driver.

Cc: stable@vger.kernel.org
Reported-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-09 15:54:46 -04:00
Johannes Berg 3f9768a5d2 mac80211: fix association beacon wait timeout
The TU_TO_EXP_TIME() macro already includes the
"jiffies +" piece of the calculation, so don't
add jiffies again.

Reported-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Tested-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-09 15:54:46 -04:00
John W. Linville 41833af713 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth 2012-04-09 15:47:49 -04:00
Jiri Slaby f997a01e32 TTY: rfcomm/tty, use count from tty_port
This means converting an atomic counter to a counter protected by
lock. This is the first step needed to convert the rest of the code to
the tty_port helpers.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-09 12:04:31 -07:00
Jiri Slaby b2c4be398b TTY: rfcomm/tty, remove work for tty_wakeup
tty_wakeup is safe to be called from all contexts. No need to schedule
a work for that. Let us call it directly like in other drivers.

This allows us to kill another member of rfcomm_dev structure.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-09 12:04:31 -07:00
Jiri Slaby 6705401928 TTY: rfcomm/tty, use tty_port refcounting
Switch the refcounting from manual atomic plays with refcounter to the
one offered by tty_port.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-09 12:04:31 -07:00
Jiri Slaby f60db8c424 TTY: rfcomm/tty, add tty_port
And use tty from there.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-09 12:04:30 -07:00
Eric Paris 6ce74ec75c SELinux: include flow.h where used rather than get it indirectly
We use flow_cache_genid in the selinux xfrm files.  This is declared in
net/flow.h  However we do not include that file directly anywhere.  We have
always just gotten it through a long chain of indirect .h file includes.

on x86_64:

  CC      security/selinux/ss/services.o
In file included from
/next/linux-next-20120216/security/selinux/ss/services.c:69:0:
/next/linux-next-20120216/security/selinux/include/xfrm.h: In function 'selinux_xfrm_notify_policyload':
/next/linux-next-20120216/security/selinux/include/xfrm.h:51:14: error: 'flow_cache_genid' undeclared (first use in this function)
/next/linux-next-20120216/security/selinux/include/xfrm.h:51:14: note: each undeclared identifier is reported only once for each function it appears in
make[3]: *** [security/selinux/ss/services.o] Error 1

Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Eric Paris <eparis@redhat.com>
2012-04-09 12:22:41 -04:00
Pablo Neira Ayuso 95ad2f873d netfilter: ip6_tables: ip6t_ext_hdr is now static inline
We may hit this in xt_LOG:

net/built-in.o:xt_LOG.c:function dump_ipv6_packet:
	error: undefined reference to 'ip6t_ext_hdr'

happens with these config options:

CONFIG_NETFILTER_XT_TARGET_LOG=y
CONFIG_IP6_NF_IPTABLES=m

ip6t_ext_hdr is fairly small and it is called in the packet path.
Make it static inline.

Reported-by: Simon Kirby <sim@netnation.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-04-09 16:29:34 +02:00
Changli Gao 6ee0b693bd netfilter: nf_ct_tcp: don't scale the size of the window up twice
For a picked up connection, the window win is scaled twice: one is by the
initialization code, and the other is by the sender updating code.

I use the temporary variable swin instead of modifying the variable win.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-04-09 16:29:30 +02:00
Jiri Kosina e75d660672 Merge branch 'master' into for-next
Merge with latest Linus' tree, as I have incoming patches
that fix code that is newer than current HEAD of for-next.

Conflicts:
	drivers/net/ethernet/realtek/r8169.c
2012-04-08 21:48:52 +02:00
Linus Torvalds 23f347ef63 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking updates from David Miller:

 1) Fix inaccuracies in network driver interface documentation, from Ben
    Hutchings.

 2) Fix handling of negative offsets in BPF JITs, from Jan Seiffert.

 3) Compile warning, locking, and refcounting fixes in netfilter's
    xt_CT, from Pablo Neira Ayuso.

 4) phonet sendmsg needs to validate user length just like any other
    datagram protocol, fix from Sasha Levin.

 5) Ipv6 multicast code uses wrong loop index, from RongQing Li.

 6) Link handling and firmware fixes in bnx2x driver from Yaniv Rosner
    and Yuval Mintz.

 7) mlx4 erroneously allocates 4 pages at a time, regardless of page
    size, fix from Thadeu Lima de Souza Cascardo.

 8) SCTP socket option wasn't extended in a backwards compatible way,
    fix from Thomas Graf.

 9) Add missing address change event emissions to bonding, from Shlomo
    Pongratz.

10) /proc/net/dev regressed because it uses a private offset to track
    where we are in the hash table, but this doesn't track the offset
    pullback that the seq_file code does resulting in some entries being
    missed in large dumps.

    Fix from Eric Dumazet.

11) do_tcp_sendpage() unloads the send queue way too fast, because it
    invokes tcp_push() when it shouldn't.  Let the natural sequence
    generated by the splice paths, and the assosciated MSG_MORE
    settings, guide the tcp_push() calls.

    Otherwise what goes out of TCP is spaghetti and doesn't batch
    effectively into GSO/TSO clusters.

    From Eric Dumazet.

12) Once we put a SKB into either the netlink receiver's queue or a
    socket error queue, it can be consumed and freed up, therefore we
    cannot touch it after queueing it like that.

    Fixes from Eric Dumazet.

13) PPP has this annoying behavior in that for every transmit call it
    immediately stops the TX queue, then calls down into the next layer
    to transmit the PPP frame.

    But if that next layer can take it immediately, it just un-stops the
    TX queue right before returning from the transmit method.

    Besides being useless work, it makes several facilities unusable, in
    particular things like the equalizers.  Well behaved devices should
    only stop the TX queue when they really are full, and in PPP's case
    when it gets backlogged to the downstream device.

    David Woodhouse therefore fixed PPP to not stop the TX queue until
    it's downstream can't take data any more.

14) IFF_UNICAST_FLT got accidently lost in some recent stmmac driver
    changes, re-add.  From Marc Kleine-Budde.

15) Fix link flaps in ixgbe, from Eric W. Multanen.

16) Descriptor writeback fixes in e1000e from Matthew Vick.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (47 commits)
  net: fix a race in sock_queue_err_skb()
  netlink: fix races after skb queueing
  doc, net: Update ndo_start_xmit return type and values
  doc, net: Remove instruction to set net_device::trans_start
  doc, net: Update netdev operation names
  doc, net: Update documentation of synchronisation for TX multiqueue
  doc, net: Remove obsolete reference to dev->poll
  ethtool: Remove exception to the requirement of holding RTNL lock
  MAINTAINERS: update for Marvell Ethernet drivers
  bonding: properly unset current_arp_slave on slave link up
  phonet: Check input from user before allocating
  tcp: tcp_sendpages() should call tcp_push() once
  ipv6: fix array index in ip6_mc_add_src()
  mlx4: allocate just enough pages instead of always 4 pages
  stmmac: re-add IFF_UNICAST_FLT for dwmac1000
  bnx2x: Clear MDC/MDIO warning message
  bnx2x: Fix BCM57711+BCM84823 link issue
  bnx2x: Clear BCM84833 LED after fan failure
  bnx2x: Fix BCM84833 PHY FW version presentation
  bnx2x: Fix link issue for BCM8727 boards.
  ...
2012-04-06 10:37:38 -07:00
Eric Dumazet 110c43304d net: fix a race in sock_queue_err_skb()
As soon as an skb is queued into socket error queue, another thread
can consume it, so we are not allowed to reference skb anymore, or risk
use after free.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-06 05:07:21 -04:00
Eric Dumazet 4a7e7c2ad5 netlink: fix races after skb queueing
As soon as an skb is queued into socket receive_queue, another thread
can consume it, so we are not allowed to reference skb anymore, or risk
use after free.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-06 04:21:06 -04:00
Alexandra Vintila 61282f3792 atm: fix comments
Signed-off-by: Alexandra Vintila <andavintila@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-04-05 17:21:11 -07:00
Sasha Levin bcf1b70ac6 phonet: Check input from user before allocating
A phonet packet is limited to USHRT_MAX bytes, this is never checked during
tx which means that the user can specify any size he wishes, and the kernel
will attempt to allocate that size.

In the good case, it'll lead to the following warning, but it may also cause
the kernel to kick in the OOM and kill a random task on the server.

[ 8921.744094] WARNING: at mm/page_alloc.c:2255 __alloc_pages_slowpath+0x65/0x730()
[ 8921.749770] Pid: 5081, comm: trinity Tainted: G        W    3.4.0-rc1-next-20120402-sasha #46
[ 8921.756672] Call Trace:
[ 8921.758185]  [<ffffffff810b2ba7>] warn_slowpath_common+0x87/0xb0
[ 8921.762868]  [<ffffffff810b2be5>] warn_slowpath_null+0x15/0x20
[ 8921.765399]  [<ffffffff8117eae5>] __alloc_pages_slowpath+0x65/0x730
[ 8921.769226]  [<ffffffff81179c8a>] ? zone_watermark_ok+0x1a/0x20
[ 8921.771686]  [<ffffffff8117d045>] ? get_page_from_freelist+0x625/0x660
[ 8921.773919]  [<ffffffff8117f3a8>] __alloc_pages_nodemask+0x1f8/0x240
[ 8921.776248]  [<ffffffff811c03e0>] kmalloc_large_node+0x70/0xc0
[ 8921.778294]  [<ffffffff811c4bd4>] __kmalloc_node_track_caller+0x34/0x1c0
[ 8921.780847]  [<ffffffff821b0e3c>] ? sock_alloc_send_pskb+0xbc/0x260
[ 8921.783179]  [<ffffffff821b3c65>] __alloc_skb+0x75/0x170
[ 8921.784971]  [<ffffffff821b0e3c>] sock_alloc_send_pskb+0xbc/0x260
[ 8921.787111]  [<ffffffff821b002e>] ? release_sock+0x7e/0x90
[ 8921.788973]  [<ffffffff821b0ff0>] sock_alloc_send_skb+0x10/0x20
[ 8921.791052]  [<ffffffff824cfc20>] pep_sendmsg+0x60/0x380
[ 8921.792931]  [<ffffffff824cb4a6>] ? pn_socket_bind+0x156/0x180
[ 8921.794917]  [<ffffffff824cb50f>] ? pn_socket_autobind+0x3f/0x90
[ 8921.797053]  [<ffffffff824cb63f>] pn_socket_sendmsg+0x4f/0x70
[ 8921.798992]  [<ffffffff821ab8e7>] sock_aio_write+0x187/0x1b0
[ 8921.801395]  [<ffffffff810e325e>] ? sub_preempt_count+0xae/0xf0
[ 8921.803501]  [<ffffffff8111842c>] ? __lock_acquire+0x42c/0x4b0
[ 8921.805505]  [<ffffffff821ab760>] ? __sock_recv_ts_and_drops+0x140/0x140
[ 8921.807860]  [<ffffffff811e07cc>] do_sync_readv_writev+0xbc/0x110
[ 8921.809986]  [<ffffffff811958e7>] ? might_fault+0x97/0xa0
[ 8921.811998]  [<ffffffff817bd99e>] ? security_file_permission+0x1e/0x90
[ 8921.814595]  [<ffffffff811e17e2>] do_readv_writev+0xe2/0x1e0
[ 8921.816702]  [<ffffffff810b8dac>] ? do_setitimer+0x1ac/0x200
[ 8921.818819]  [<ffffffff810e2ec1>] ? get_parent_ip+0x11/0x50
[ 8921.820863]  [<ffffffff810e325e>] ? sub_preempt_count+0xae/0xf0
[ 8921.823318]  [<ffffffff811e1926>] vfs_writev+0x46/0x60
[ 8921.825219]  [<ffffffff811e1a3f>] sys_writev+0x4f/0xb0
[ 8921.827127]  [<ffffffff82658039>] system_call_fastpath+0x16/0x1b
[ 8921.829384] ---[ end trace dffe390f30db9eb7 ]---

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Acked-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-05 19:05:56 -04:00
Eric Dumazet 35f9c09fe9 tcp: tcp_sendpages() should call tcp_push() once
commit 2f53384424 (tcp: allow splice() to build full TSO packets) added
a regression for splice() calls using SPLICE_F_MORE.

We need to call tcp_flush() at the end of the last page processed in
tcp_sendpages(), or else transmits can be deferred and future sends
stall.

Add a new internal flag, MSG_SENDPAGE_NOTLAST, acting like MSG_MORE, but
with different semantic.

For all sendpage() providers, its a transparent change. Only
sock_sendpage() and tcp_sendpages() can differentiate the two different
flags provided by pipe_to_sendpage()

Reported-by: Tom Herbert <therbert@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: H.K. Jerry Chu <hkchu@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Mahesh Bandewar <maheshb@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail>com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-05 19:04:27 -04:00
Linus Torvalds 5d32c88f0b Merge branch 'akpm' (Andrew's patch-bomb)
Merge batch of fixes from Andrew Morton:
 "The simple_open() cleanup was held back while I wanted for laggards to
  merge things.

  I still need to send a few checkpoint/restore patches.  I've been
  wobbly about merging them because I'm wobbly about the overall
  prospects for success of the project.  But after speaking with Pavel
  at the LSF conference, it sounds like they're further toward
  completion than I feared - apparently davem is at the "has stopped
  complaining" stage regarding the net changes.  So I need to go back
  and re-review those patchs and their (lengthy) discussion."

* emailed from Andrew Morton <akpm@linux-foundation.org>: (16 patches)
  memcg swap: use mem_cgroup_uncharge_swap fix
  backlight: add driver for DA9052/53 PMIC v1
  C6X: use set_current_blocked() and block_sigmask()
  MAINTAINERS: add entry for sparse checker
  MAINTAINERS: fix REMOTEPROC F: typo
  alpha: use set_current_blocked() and block_sigmask()
  simple_open: automatically convert to simple_open()
  scripts/coccinelle/api/simple_open.cocci: semantic patch for simple_open()
  libfs: add simple_open()
  hugetlbfs: remove unregister_filesystem() when initializing module
  drivers/rtc/rtc-88pm860x.c: fix rtc irq enable callback
  fs/xattr.c:setxattr(): improve handling of allocation failures
  fs/xattr.c:listxattr(): fall back to vmalloc() if kmalloc() failed
  fs/xattr.c: suppress page allocation failure warnings from sys_listxattr()
  sysrq: use SEND_SIG_FORCED instead of force_sig()
  proc: fix mount -t proc -o AAA
2012-04-05 15:30:34 -07:00
Dave Jiang a2bd1140a2 netdma: adding alignment check for NETDMA ops
This is the fallout from adding memcpy alignment workaround for certain
IOATDMA hardware. NetDMA will only use DMA engine that can handle byte align
ops.

Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2012-04-05 15:27:12 -07:00
Stephen Boyd 234e340582 simple_open: automatically convert to simple_open()
Many users of debugfs copy the implementation of default_open() when
they want to support a custom read/write function op.  This leads to a
proliferation of the default_open() implementation across the entire
tree.

Now that the common implementation has been consolidated into libfs we
can replace all the users of this function with simple_open().

This replacement was done with the following semantic patch:

<smpl>
@ open @
identifier open_f != simple_open;
identifier i, f;
@@
-int open_f(struct inode *i, struct file *f)
-{
(
-if (i->i_private)
-f->private_data = i->i_private;
|
-f->private_data = i->i_private;
)
-return 0;
-}

@ has_open depends on open @
identifier fops;
identifier open.open_f;
@@
struct file_operations fops = {
...
-.open = open_f,
+.open = simple_open,
...
};
</smpl>

[akpm@linux-foundation.org: checkpatch fixes]
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Julia Lawall <Julia.Lawall@lip6.fr>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-04-05 15:25:50 -07:00
RongQing.Li ce713ee5a1 net: replace continue with break to reduce unnecessary loop in xxx_xmarksources
The conditional which decides to skip inactive filters does not
change with the change of loop index, so it is unnecessary to
check them many times.

Signed-off-by: RongQing.Li <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-05 05:42:18 -04:00
Eric Dumazet 51c56b004e net: remove k{un}map_skb_frag()
Since commit 3e4d3af501 (mm: stack based kmap_atomic()) we dont have
to disable BH anymore while mapping skb frags.

We can remove kmap_skb_frag() / kunmap_skb_frag() helpers and use
kmap_atomic() / kunmap_atomic()

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-05 05:36:43 -04:00
Amir Vadai 08f10affe4 net/dcb: Add an optional max rate attribute
Although not specified in 8021Qaz spec, it could be useful to enable drivers
whose HW supports setting a rate limit for an ETS TC. This patch adds this
optional attribute to DCB netlink. To use it, drivers should implement and
register the callbacks ieee_setmaxrate and ieee_getmaxrate. The units are 64
bits long and specified in Kbps to enable usage over both slow and very fast
networks.

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-05 05:08:04 -04:00
Amir Vadai d4a968658c net/route: export symbol ip_tos2prio
Need to export this to enable drivers use rt_tos2priority()

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-05 05:08:04 -04:00
RongQing.Li 78d50217ba ipv6: fix array index in ip6_mc_add_src()
Convert array index from the loop bound to the loop index.

And remove the void type conversion to ip6_mc_del1_src() return
code, seem it is unnecessary, since ip6_mc_del1_src() does not
use __must_check similar attribute, no compiler will report the
warning when it is removed.

v2: enrich the commit header

Signed-off-by: RongQing.Li <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-05 00:00:42 -04:00
Thomas Graf acdd598536 sctp: Allow struct sctp_event_subscribe to grow without breaking binaries
getsockopt(..., SCTP_EVENTS, ...) performs a length check and returns
an error if the user provides less bytes than the size of struct
sctp_event_subscribe.

Struct sctp_event_subscribe needs to be extended by an u8 for every
new event or notification type that is added.

This obviously makes getsockopt fail for binaries that are compiled
against an older versions of <net/sctp/user.h> which do not contain
all event types.

This patch changes getsockopt behaviour to no longer return an error
if not enough bytes are being provided by the user. Instead, it
returns as much of sctp_event_subscribe as fits into the provided buffer.

This leads to the new behavior that users see what they have been aware
of at compile time.

The setsockopt(..., SCTP_EVENTS, ...) API is already behaving like this.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-04 18:05:02 -04:00
Richard Cochran 02eacbd0c4 ethtool: Add a common function for drivers with transmit time stamping.
Currently, most drivers do not support transmit SO_TIMESTAMPING. For those
that do support it, there is one appropriate response to the get_ts_info
query. This patch adds a common function providing this response.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Reviewed-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-04 05:28:46 -04:00
Richard Cochran c8f3a8c310 ethtool: Introduce a method for getting time stamping capabilities.
This commit adds a new ethtool ioctl that exposes the SO_TIMESTAMPING
capabilities of a network interface. In addition, user space programs
can use this ioctl to discover the PTP Hardware Clock (PHC) device
associated with the interface.

Since software receive time stamps are handled by the stack, the generic
ethtool code can answer the query correctly in case the MAC or PHY
drivers lack special time stamping features.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Reviewed-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-04 05:28:45 -04:00
Shmulik Ladkani 2173bff5dc ipv6: Fix 'inet6_rtm_getroute' to release 'rt->dst' in case of 'alloc_skb' failure
In 72331bc [ipv6: Fix RTM_GETROUTE's interpretation of RTA_IIF to be
consistent with ipv4] the code of 'inet6_rtm_getroute()' was re-ordered
such that the reference to 'rt->dst' is incremented prior skb
allocation.

Hence, if 'alloc_skb()' fails, must drop a reference from 'rt->dst'.
Add the missing 'dst_release()' call.

Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-04 05:25:51 -04:00
Pablo Neira Ayuso d96fc659ae netfilter: nf_conntrack: fix count leak in error path of __nf_conntrack_alloc
We have to decrement the conntrack counter if we fail to access the
zone extension.

Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-03 19:20:30 -04:00
Pablo Neira Ayuso ee14186f8d netfilter: xt_CT: fix missing put timeout object in error path
The error path misses putting the timeout object. This patch adds
new function xt_ct_tg_timeout_put() to put the timeout object.

Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-03 19:18:21 -04:00
Pablo Neira Ayuso ca53e44053 netfilter: xt_CT: allocation has to be GFP_ATOMIC under rcu_read_lock section
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-03 19:16:39 -04:00
David S. Miller 9b461783d3 Merge branch 'master' of git://1984.lsi.us.es/net 2012-04-03 19:15:48 -04:00
Jiri Pirko ffe06c17af filter: add XOR operation
Add XOR instruction fo BPF machine. Needed for computing packet hashes.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-03 18:36:20 -04:00
Jiri Pirko 302d663740 filter: Allow to create sk-unattached filters
Today, BPF filters are bind to sockets. Since BPF machine becomes handy
for other purposes, this patch allows to create unattached filter.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-03 18:36:20 -04:00
Jan Seiffert f03fb3f455 bpf jit: Make the filter.c::__load_pointer helper non-static for the jits
The function is renamed to make it a little more clear what it does.
It is not added to any .h because it is not for general consumption, only for
bpf internal use (and so by the jits).

Signed-of-by: Jan Seiffert <kaffeemonster@googlemail.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-03 18:01:03 -04:00
Eric Dumazet 2f53384424 tcp: allow splice() to build full TSO packets
vmsplice()/splice(pipe, socket) call do_tcp_sendpages() one page at a
time, adding at most 4096 bytes to an skb. (assuming PAGE_SIZE=4096)

The call to tcp_push() at the end of do_tcp_sendpages() forces an
immediate xmit when pipe is not already filled, and tso_fragment() try
to split these skb to MSS multiples.

4096 bytes are usually split in a skb with 2 MSS, and a remaining
sub-mss skb (assuming MTU=1500)

This makes slow start suboptimal because many small frames are sent to
qdisc/driver layers instead of big ones (constrained by cwnd and packets
in flight of course)

In fact, applications using sendmsg() (adding an additional memory copy)
instead of vmsplice()/splice()/sendfile() are a bit faster because of
this anomaly, especially if serving small files in environments with
large initial [c]wnd.

Call tcp_push() only if MSG_MORE is not set in the flags parameter.

This bit is automatically provided by splice() internals but for the
last page, or on all pages if user specified SPLICE_F_MORE splice()
flag.

In some workloads, this can reduce number of sent logical packets by an
order of magnitude, making zero-copy TCP actually faster than
one-copy :)

Reported-by: Tom Herbert <therbert@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: H.K. Jerry Chu <hkchu@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Mahesh Bandewar <maheshb@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail>com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-03 17:35:43 -04:00
Eric Dumazet 2def16ae6b net: fix /proc/net/dev regression
Commit f04565ddf5 (dev: use name hash for dev_seq_ops) added a second
regression, as some devices are missing from /proc/net/dev if many
devices are defined.

When seq_file buffer is filled, the last ->next/show() method is
canceled (pos value is reverted to value prior ->next() call)

Problem is after above commit, we dont restart the lookup at right
position in ->start() method.

Fix this by removing the internal 'pos' pointer added in commit, since
we need to use the 'loff_t *pos' provided by seq_file layer.

This also reverts commit 5cac98dd0 (net: Fix corruption
in /proc/*/net/dev_mcast), since its not needed anymore.

Reported-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Mihai Maruseac <mmaruseac@ixiacom.com>
Tested-by:  Ben Greear <greearb@candelatech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-03 17:23:23 -04:00
Eric Dumazet eb6a24816b af_unix: reduce high order page allocations
unix_dgram_sendmsg() currently builds linear skbs, and this can stress
page allocator with high order page allocations. When memory gets
fragmented, this can eventually fail.

We can try to use order-2 allocations for skb head (SKB_MAX_ALLOC) plus
up to 16 page fragments to lower pressure on buddy allocator.

This patch has no effect on messages of less than 16064 bytes.
(on 64bit arches with PAGE_SIZE=4096)

For bigger messages (from 16065 to 81600 bytes), this patch brings
reliability at the expense of performance penalty because of extra pages
allocations.

netperf -t DG_STREAM -T 0,2 -- -m 16064 -s 200000
->4086040 Messages / 10s

netperf -t DG_STREAM -T 0,2 -- -m 16068 -s 200000
->3901747 Messages / 10s

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-03 16:43:18 -04:00
Pablo Neira Ayuso 44b52bccf8 netfilter: xt_CT: remove a compile warning
If CONFIG_NF_CONNTRACK_TIMEOUT=n we have following warning :

  CC [M]  net/netfilter/xt_CT.o
net/netfilter/xt_CT.c: In function ‘xt_ct_tg_check_v1’:
net/netfilter/xt_CT.c:284: warning: label ‘err4’ defined but not used

Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-04-03 10:08:48 +02:00
Linus Torvalds ed359a3b7b Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Provide device string properly for USB i2400m wimax devices, also
    don't OOPS when providing firmware string.  From Phil Sutter.

 2) Add support for sh_eth SH7734 chips, from Nobuhiro Iwamatsu.

 3) Add another device ID to USB zaurus driver, from Guan Xin.

 4) Loop index start in pool vector iterator is wrong causing MAC to not
    get configured in bnx2x driver, fix from Dmitry Kravkov.

 5) EQL driver assumes HZ=100, fix from Eric Dumazet.

 6) Now that skb_add_rx_frag() can specify the truesize increment
    separately, do so in f_phonet and cdc_phonet, also from Eric
    Dumazet.

 7) virtio_net accidently uses net_ratelimit() not only on the kernel
    warning but also the statistic bump, fix from Rick Jones.

 8) ip_route_input_mc() uses fixed init_net namespace, oops, use
    dev_net(dev) instead.  Fix from Benjamin LaHaise.

 9) dev_forward_skb() needs to clear the incoming interface index of the
    SKB so that it looks like a new incoming packet, also from Benjamin
    LaHaise.

10) iwlwifi mistakenly initializes a channel entry as 2GHZ instead of
    5GHZ, fix from Stanislav Yakovlev.

11) Missing kmalloc() return value checks in orinoco, from Santosh
    Nayak.

12) ath9k doesn't check for HT capabilities in the right way, it is
    checking ht_supported instead of the ATH9K_HW_CAP_HT flag.  Fix from
    Sujith Manoharan.

13) Fix x86 BPF JIT emission of 16-bit immediate field of AND
    instructions, from Feiran Zhuang.

14) Avoid infinite loop in GARP code when registering sysfs entries.
    From David Ward.

15) rose protocol uses memcpy instead of memcmp in a device address
    comparison, oops.  Fix from Daniel Borkmann.

16) Fix build of lpc_eth due to dev_hw_addr_rancom() interface being
    renamed to eth_hw_addr_random().  From Roland Stigge.

17) Make ipv6 RTM_GETROUTE interpret RTA_IIF attribute the same way
    that ipv4 does.  Fix from Shmulik Ladkani.

18) via-rhine has an inverted bit test, causing suspend/resume
    regressions.  Fix from Andreas Mohr.

19) RIONET assumes 4K page size, fix from Akinobu Mita.

20) Initialization of imask register in sky2 is buggy, because bits are
    "or'd" into an uninitialized local variable.  Fix from Lino
    Sanfilippo.

21) Fix FCOE checksum offload handling, from Yi Zou.

22) Fix VLAN processing regression in e1000, from Jiri Pirko.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (52 commits)
  sky2: dont overwrite settings for PHY Quick link
  tg3: Fix 5717 serdes powerdown problem
  net: usb: cdc_eem: fix mtu
  net: sh_eth: fix endian check for architecture independent
  usb/rtl8150 : Remove duplicated definitions
  rionet: fix page allocation order of rionet_active
  via-rhine: fix wait-bit inversion.
  ipv6: Fix RTM_GETROUTE's interpretation of RTA_IIF to be consistent with ipv4
  net: lpc_eth: Fix rename of dev_hw_addr_random
  net/netfilter/nfnetlink_acct.c: use linux/atomic.h
  rose_dev: fix memcpy-bug in rose_set_mac_address
  Fix non TBI PHY access; a bad merge undid bug fix in a previous commit.
  net/garp: avoid infinite loop if attribute already exists
  x86 bpf_jit: fix a bug in emitting the 16-bit immediate operand of AND
  bonding: emit event when bonding changes MAC
  mac80211: fix oper channel timestamp updation
  ath9k: Use HW HT capabilites properly
  MAINTAINERS: adding maintainer for ipw2x00
  net: orinoco: add error handling for failed kmalloc().
  net/wireless: ipw2x00: fix a typo in wiphy struct initilization
  ...
2012-04-02 17:53:39 -07:00
Jesse Gross bf32fecdc1 openvswitch: Add length check when retrieving TCP flags.
When collecting TCP flags we check that the IP header indicates that
a TCP header is present but not that the packet is actually long
enough to contain the header.  This adds a check to prevent reading
off the end of the packet.

In practice, this is only likely to result in reading of bad data and
not a crash due to the presence of struct skb_shared_info at the end
of the packet.

Signed-off-by: Jesse Gross <jesse@nicira.com>
2012-04-02 14:28:57 -07:00
Ben Greear edbc0bb3fb net: Report dev->promiscuity in netlink reports.
The standard ways of probing a device's promiscuity
(ifi_flags, for instance) does not report the actual
state of the device.  This patch adds dev->promiscuity
to the netlink netdevice report so that users can know
for certain if the device is acting PROMISC or not.

Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-02 04:33:46 -04:00
Eldad Zack 8e5e8f30d0 net/ipv6/addrconf.c: Checkpatch cleanups
net/ipv6/addrconf.c:340: WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable
net/ipv6/addrconf.c:342: ERROR: "foo * bar" should be "foo *bar"
net/ipv6/addrconf.c:444: ERROR: "foo * bar" should be "foo *bar"
net/ipv6/addrconf.c:1337: WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable
net/ipv6/addrconf.c:1526: ERROR: "(foo*)" should be "(foo *)"
net/ipv6/addrconf.c:1671: ERROR: open brace '{' following function declarations go on the next line
net/ipv6/addrconf.c:1914: ERROR: "foo * bar" should be "foo *bar"
net/ipv6/addrconf.c:2368: ERROR: "foo * bar" should be "foo *bar"
net/ipv6/addrconf.c:2370: ERROR: "foo * bar" should be "foo *bar"
net/ipv6/addrconf.c:2416: ERROR: "foo * bar" should be "foo *bar"
net/ipv6/addrconf.c:2437: ERROR: "foo    * bar" should be "foo    *bar"
net/ipv6/addrconf.c:2573: ERROR: "foo * bar" should be "foo *bar"
net/ipv6/addrconf.c:3797: ERROR: "foo* bar" should be "foo *bar"

Signed-off-by: Eldad Zack <eldad@fogrefinery.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-02 04:33:46 -04:00
Eldad Zack a2d91a09da net/ipv6/icmp.c: Checkpatch cleanups
icmp.c:501: ERROR: "(foo*)" should be "(foo *)"
icmp.c:582: ERROR: "(foo*)" should be "(foo *)"
icmp.c:954: WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable

Signed-off-by: Eldad Zack <eldad@fogrefinery.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-02 04:33:46 -04:00
Eldad Zack 911c8541ef net/ipv6/fib6_rules.c: Checkpatch cleanup
fib6_rules.c:26: ERROR: open brace '{' following struct go on the same line

Signed-off-by: Eldad Zack <eldad@fogrefinery.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-02 04:33:46 -04:00
Eldad Zack 81bd60b515 net/ipv6/exthdrs_core.c: Checkpatch cleanups
exthdrs_core.c:113: WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable
exthdrs_core.c:114: WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable

Signed-off-by: Eldad Zack <eldad@fogrefinery.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-02 04:33:46 -04:00
Eldad Zack ac3c8172ff net/ipv6/exthdrs.c: Checkpatch cleanups
exthdrs.c:726: WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable
exthdrs.c:741: ERROR: "(foo*)" should be "(foo *)"
exthdrs.c:741: ERROR: "(foo*)" should be "(foo *)"
exthdrs.c:744: ERROR: "(foo**)" should be "(foo **)"
exthdrs.c:746: ERROR: "(foo**)" should be "(foo **)"
exthdrs.c:748: ERROR: "(foo**)" should be "(foo **)"
exthdrs.c:750: ERROR: "(foo**)" should be "(foo **)"
exthdrs.c:755: WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable
exthdrs.c:896: WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable

Signed-off-by: Eldad Zack <eldad@fogrefinery.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-02 04:33:45 -04:00
Eldad Zack b5a4257cef net/ipv6/datagram.c: Checkpatch cleanups
datagram.c:101: ERROR: "(foo*)" should be "(foo *)"
datagram.c:521: ERROR: space required before the open parenthesis '('
datagram.c:830: WARNING: braces {} are not necessary for single statement blocks
datagram.c:849: WARNING: braces {} are not necessary for single statement blocks

Signed-off-by: Eldad Zack <eldad@fogrefinery.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-02 04:33:45 -04:00
Eldad Zack d94d34a066 net/ipv6/addrconf_core.c: Checkpatch cleanup
addrconf_core.c:13: ERROR: space required before the open parenthesis '('

Signed-off-by: Eldad Zack <eldad@fogrefinery.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-02 04:33:45 -04:00
Eldad Zack 3e866703cb net/ipv6/sit.c: Checkpatch cleanup
sit.c:118: ERROR: "foo * bar" should be "foo *bar"
sit.c:694: ERROR: "(foo*)" should be "(foo *)"
sit.c:724: ERROR: "(foo*)" should be "(foo *)"

Signed-off-by: Eldad Zack <eldad@fogrefinery.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-02 04:33:45 -04:00
David S. Miller 61849dda1d vlan: Stop using NLA_PUT*().
These macros contain a hidden goto, and are thus extremely error
prone and make code hard to audit.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-02 04:33:44 -04:00
David S. Miller 2eb812e650 bridge: Stop using NLA_PUT*().
These macros contain a hidden goto, and are thus extremely error
prone and make code hard to audit.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-02 04:33:44 -04:00
David S. Miller 2809cec5b5 caif: Stop using NLA_PUT*().
These macros contain a hidden goto, and are thus extremely error
prone and make code hard to audit.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-02 04:33:44 -04:00
David S. Miller 14ad6647f3 gen_stats: Stop using NLA_PUT*().
These macros contain a hidden goto, and are thus extremely error
prone and make code hard to audit.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-02 04:33:44 -04:00
David S. Miller 0e3cea7b3c fib_rules: Stop using NLA_PUT*().
These macros contain a hidden goto, and are thus extremely error
prone and make code hard to audit.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-02 04:33:44 -04:00
David S. Miller be51da0f3e ieee802154: Stop using NLA_PUT*().
These macros contain a hidden goto, and are thus extremely error
prone and make code hard to audit.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-02 04:33:43 -04:00
David S. Miller d317e4f68f netfilter: ipv4: Stop using NLA_PUT*().
These macros contain a hidden goto, and are thus extremely error
prone and make code hard to audit.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-02 04:33:43 -04:00
David S. Miller f3756b79e8 ipv4: Stop using NLA_PUT*().
These macros contain a hidden goto, and are thus extremely error
prone and make code hard to audit.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-02 04:33:43 -04:00
David S. Miller e549a6b3a5 netfilter: ipv6: Stop using NLA_PUT*().
These macros contain a hidden goto, and are thus extremely error
prone and make code hard to audit.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-02 04:33:43 -04:00
David S. Miller c78679e8f3 ipv6: Stop using NLA_PUT*().
These macros contain a hidden goto, and are thus extremely error
prone and make code hard to audit.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-02 04:33:43 -04:00
David S. Miller b21dddb9df decnet: Stop using NLA_PUT*().
These macros contain a hidden goto, and are thus extremely error
prone and make code hard to audit.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-02 04:33:42 -04:00
David S. Miller a6574349d0 rtnetlink: Stop using NLA_PUT*().
These macros contain a hidden goto, and are thus extremely error
prone and make code hard to audit.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-02 04:33:42 -04:00
David S. Miller 9a6308d74e neighbour: Stop using NLA_PUT*().
These macros contain a hidden goto, and are thus extremely error
prone and make code hard to audit.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-02 04:33:42 -04:00
David S. Miller 1eb4c97777 dcbnl: Stop using NLA_PUT*().
These macros contain a hidden goto, and are thus extremely error
prone and make code hard to audit.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-02 04:33:41 -04:00
David S. Miller 60aed2abb3 l2tp: Stop using NLA_PUT*().
These macros contain a hidden goto, and are thus extremely error
prone and make code hard to audit.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-02 04:33:41 -04:00
David S. Miller 7cf7899d9e ipset: Stop using NLA_PUT*().
These macros contain a hidden goto, and are thus extremely error
prone and make code hard to audit.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-02 04:33:41 -04:00
David S. Miller 969e8e255c ipvs: Stop using NLA_PUT*().
These macros contain a hidden goto, and are thus extremely error
prone and make code hard to audit.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Simon Horman <horms@verge.net.au>
2012-04-02 04:33:41 -04:00
David S. Miller bae65be896 nf_conntrack_core: Stop using NLA_PUT*().
These macros contain a hidden goto, and are thus extremely error
prone and make code hard to audit.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-01 18:58:28 -04:00
David S. Miller cc1eb43134 nf_conntrack_netlink: Stop using NLA_PUT*().
These macros contain a hidden goto, and are thus extremely error
prone and make code hard to audit.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-01 18:57:48 -04:00
David S. Miller 516ee48f0b nf_conntrack_proto_dccp: Stop using NLA_PUT*().
These macros contain a hidden goto, and are thus extremely error
prone and make code hard to audit.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-01 18:53:24 -04:00
David S. Miller f577694143 nf_conntrack_proto_generic: Stop using NLA_PUT*().
These macros contain a hidden goto, and are thus extremely error
prone and make code hard to audit.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-01 18:52:31 -04:00
David S. Miller 242ddfc014 nf_conntrack_proto_gre: Stop using NLA_PUT*().
These macros contain a hidden goto, and are thus extremely error
prone and make code hard to audit.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-01 18:52:03 -04:00
David S. Miller 5e8d1eb5fb nf_conntrack_proto_sctp: Stop using NLA_PUT*().
These macros contain a hidden goto, and are thus extremely error
prone and make code hard to audit.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-01 18:51:39 -04:00
David S. Miller 4925a459e9 nf_conntrack_proto_tcp: Stop using NLA_PUT*().
These macros contain a hidden goto, and are thus extremely error
prone and make code hard to audit.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-01 18:50:08 -04:00
David S. Miller 3c60a17b1b nf_conntrack_proto_udp{,lite}: Stop using NLA_PUT*().
These macros contain a hidden goto, and are thus extremely error
prone and make code hard to audit.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-01 18:48:06 -04:00
David S. Miller 7c80118953 nfnetlink_acct: Stop using NLA_PUT*().
These macros contain a hidden goto, and are thus extremely error
prone and make code hard to audit.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-01 18:46:29 -04:00
David S. Miller 48f03bdad8 nfnetlink_cttimeout: Stop using NLA_PUT*().
These macros contain a hidden goto, and are thus extremely error
prone and make code hard to audit.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-01 18:46:00 -04:00
David S. Miller 1db20a5295 nfnetlink_log: Stop using NLA_PUT*().
These macros contain a hidden goto, and are thus extremely error
prone and make code hard to audit.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-01 18:43:44 -04:00
David S. Miller a447189e07 nfnetlink_queue: Stop using NLA_PUT*().
These macros contain a hidden goto, and are thus extremely error
prone and make code hard to audit.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-01 18:43:44 -04:00
David S. Miller 444653f696 genetlink: Stop using NLA_PUT*().
These macros contain a hidden goto, and are thus extremely error
prone and make code hard to audit.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-01 18:43:25 -04:00
David S. Miller 1e6428d82b nfc: Stop using NLA_PUT*().
These macros contain a hidden goto, and are thus extremely error
prone and make code hard to audit.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-01 18:11:37 -04:00
David S. Miller 028d6a6767 openvswitch: Stop using NLA_PUT*().
These macros contain a hidden goto, and are thus extremely error
prone and make code hard to audit.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-01 18:11:37 -04:00
David S. Miller 7f116b5b6c phonet: Stop using NLA_PUT*().
These macros contain a hidden goto, and are thus extremely error
prone and make code hard to audit.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-01 18:11:37 -04:00
David S. Miller 1b34ec43c9 pkt_sched: Stop using NLA_PUT*().
These macros contain a hidden goto, and are thus extremely error
prone and make code hard to audit.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-01 18:11:37 -04:00
David S. Miller 9360ffd185 wireless: Stop using NLA_PUT*().
These macros contain a hidden goto, and are thus extremely error
prone and make code hard to audit.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-01 18:11:37 -04:00
David S. Miller d0fde795d2 xfrm_user: Stop using NLA_PUT*().
These macros contain a hidden goto, and are thus extremely error
prone and make code hard to audit.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-01 18:11:36 -04:00
Shmulik Ladkani 72331bc0cd ipv6: Fix RTM_GETROUTE's interpretation of RTA_IIF to be consistent with ipv4
In IPv4, if an RTA_IIF attribute is specified within an RTM_GETROUTE
message, then a route is searched as if a packet was received on the
specified 'iif' interface.

However in IPv6, RTA_IIF is not interpreted in the same way:
'inet6_rtm_getroute()' always calls 'ip6_route_output()', regardless the
RTA_IIF attribute.

As a result, in IPv6 there's no way to use RTM_GETROUTE in order to look
for a route as if a packet was received on a specific interface.

Fix 'inet6_rtm_getroute()' so that RTA_IIF is interpreted as "lookup a
route as if a packet was received on the specified interface", similar
to IPv4's 'inet_rtm_getroute()' interpretation.

Reported-by: Ami Koren <amikoren@yahoo.com>
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-01 17:29:40 -04:00
Andrew Morton 6523cf9a46 net/netfilter/nfnetlink_acct.c: use linux/atomic.h
There's no known problem here, but this is one of only two non-arch files
in the kernel which use asm/atomic.h instead of linux/atomic.h.

Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-01 16:47:12 -04:00
danborkmann@iogearbox.net 81213b5e8a rose_dev: fix memcpy-bug in rose_set_mac_address
If both addresses equal, nothing needs to be done. If the device is down,
then we simply copy the new address to dev->dev_addr. If the device is up,
then we add another loopback device with the new address, and if that does
not fail, we remove the loopback device with the old address. And only
then, we update the dev->dev_addr.

Signed-off-by: Daniel Borkmann <daniel.borkmann@tik.ee.ethz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-01 16:47:12 -04:00
David Ward 67378563df net/garp: avoid infinite loop if attribute already exists
An infinite loop occurred if garp_attr_create was called with the values
of an existing attribute. This might happen if a previous leave request
for the attribute has not yet been followed by a PDU transmission (or,
if the application previously issued a join request for the attribute
and is now issuing another one, without having issued a leave request).

If garp_attr_create finds an existing attribute having the same values,
return the address to it. Its state will then get updated (i.e., if it
was in a leaving state, it will move into a non-leaving state and not
get deleted during the next PDU transmission).

To accomplish this fix, collapse garp_attr_insert into garp_attr_create
(which is its only caller).

Thanks to Jorge Boncompte [DTI2] <jorge@dti2.net> for contributing to
this fix.

Signed-off-by: David Ward <david.ward@ll.mit.edu>
Acked-by: Jorge Boncompte [DTI2] <jorge@dti2.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-01 16:47:11 -04:00
David S. Miller 54f5ffbf30 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless 2012-04-01 16:47:08 -04:00
Tejun Heo 6bc103498f cgroup: convert memcg controller to the new cftype interface
Convert memcg to use the new cftype based interface.  kmem support
abuses ->populate() for mem_cgroup_sockets_init() so it can't be
removed at the moment.

tcp_memcontrol is updated so that tcp_files[] is registered via a
__initcall.  This change also allows removing the forward declaration
of tcp_files[].  Removed.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Glauber Costa <glommer@parallels.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Greg Thelen <gthelen@google.com>
2012-04-01 12:09:55 -07:00