Commit Graph

15093 Commits

Author SHA1 Message Date
YOSHIFUJI Hideaki / 吉藤英明 02cdce53f3 ipv6 fib: Use "Sweezle" to optimize addr_bit_test().
addr_bit_test() is used in various places in IPv6 routing table
subsystem.  It checks if the given fn_bit is set,
where fn_bit counts bits from MSB in words in network-order.

 fn_bit        :   0 .... 31 32 .... 64 65 .... 95 96 ....127

fn_bit >> 5 gives offset of word, and (~fn_bit & 0x1f) gives
count from LSB in the network-endian word in question.

 fn_bit >> 5   :       0          1          2          3
 ~fn_bit & 0x1f:  31 ....  0 31 ....  0 31 ....  0 31 ....  0

Thus, the mask was generated as htonl(1 << (~fn_bit & 0x1f)).
This can be optimized by "sweezle" (See include/asm-generic/bitops/le.h).

In little-endian,
  htonl(1 << bit) = 1 << (bit ^ BITOP_BE32_SWIZZLE)
where
  BITOP_BE32_SWIZZLE is (0x1f & ~7)
So,
  htonl(1 << (~fn_bit & 0x1f)) = 1 << ((~fn_bit & 0x1f) ^ (0x1f & ~7))
                               = 1 << ((~fn_bit ^ ~7) & 0x1f)
                               = 1 << ((~fn_bit ^ BITOP_BE32_SWIZZLE) & 0x1f)

In big-endian, BITOP_BE32_SWIZZLE is equal to 0.
  1 << ((~fn_bit ^ BITOP_BE32_SWIZZLE) & 0x1f)
                               = 1 << ((~fn_bit) & 0x1f)
                               = htonl(1 << (~fn_bit & 0x1f))

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-30 23:28:47 -07:00
YOSHIFUJI Hideaki / 吉藤英明 de7737e056 sctp: Use ipv6_addr_diff() in sctp_v6_addr_match_len().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-30 23:28:47 -07:00
Tom Goff 7e5ab15781 net_sched: minor netns related cleanup
These changes were suggested by Alexey Dobriyan <adobriyan@gmail.com>:

  - psched_show() does not use any private data so just pass NULL to
    psched_open()

  - remove unnecessary return statement

Signed-off-by: Tom Goff <thomas.goff@boeing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-30 19:44:56 -07:00
Sjur Braendeland 3908c69023 net-caif: add CAIF Kconfig and Makefiles
Kconfig and Makefiles with options for:
CAIF:        Including caif
CAIF_DEBUG:  CAIF Debug
CAIF_NETDEV: CAIF Network Device for GPRS Contexts

Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-30 19:08:49 -07:00
Sjur Braendeland cc36a070b5 net-caif: add CAIF netdevice
Adding GPRS Net Device for PDP Contexts.
The device can be managed by RTNL as defined in if_caif.h.

Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-30 19:08:48 -07:00
Sjur Braendeland e6f95ec8db net-caif: add CAIF socket implementation
Implementation of CAIF sockets for protocol and address family
PF_CAIF and AF_CAIF.
CAIF socket is connection oriented implementing SOCK_SEQPACKET
and SOCK_STREAM interface with supporting blocking and non-blocking mode.

Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-30 19:08:48 -07:00
Sjur Braendeland c72dfae2f7 net-caif: add CAIF device registration functionality
Registration and deregistration of CAIF Link Layer.

Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-30 19:08:47 -07:00
Sjur Braendeland 15c9ac0c80 net-caif: add CAIF generic caif support functions
Support functions for the caif protocol stack:
cfcnfg.c        - CAIF Configuration Module used for
                  adding and removing drivers and connection
cfpkt_skbuff.c  - CAIF Packet layer (SKB helper functions)

Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-30 19:08:47 -07:00
Sjur Braendeland b482cd2053 net-caif: add CAIF core protocol stack
CAIF generic protocol implementation. This layer is
somewhat generic in order to be able to use and test it outside
the Linux Kernel.

cfctrl.c     - CAIF control protocol layer
cfdbgl.c     - CAIF debug protocol layer
cfdgml.c     - CAIF datagram protocol layer
cffrml.c     - CAIF framing protocol layer
cfmuxl.c     - CAIF mux protocol layer
cfrfml.c     - CAIF remote file manager protocol layer
cfserl.c     - CAIF serial (fragmentation) protocol layer
cfsrvl.c     - CAIF generic service layer functions
cfutill.c    - CAIF utility protocol layer
cfveil.c     - CAIF AT protocol layer
cfvidl.c     - CAIF video protocol layer

Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-30 19:08:46 -07:00
Steven J. Magnani baff42ab14 net: Fix oops from tcp_collapse() when using splice()
tcp_read_sock() can have a eat skbs without immediately advancing copied_seq.
This can cause a panic in tcp_collapse() if it is called as a result
of the recv_actor dropping the socket lock.

A userspace program that splices data from a socket to either another
socket or to a file can trigger this bug.

Signed-off-by: Steven J. Magnani <steve@digidescorp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-30 13:56:01 -07:00
Johannes Berg 7236fe29fd mac80211: move netdev queue enabling to correct spot
"mac80211: fix skb buffering issue" still left a race
between enabling the hardware queues and the virtual
interface queues. In hindsight it's totally obvious
that enabling the netdev queues for a hardware queue
when the hardware queue is enabled is wrong, because
it could well possible that we can fill the hw queue
with packets we already have pending. Thus, we must
only enable the netdev queues once all the pending
packets have been processed and sent off to the device.

In testing, I haven't been able to trigger this race
condition, but it's clearly there, possibly only when
aggregation is being enabled.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Cc: stable@kernel.org
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-03-30 15:37:28 -04:00
Porsch, Marco 533866b12c mac80211: fix PREQ processing and one small bug
1st) a PREQ should only be processed, if it has the same SN and better
metric (instead of better or equal).
2nd) next_hop[ETH_ALEN] now actually used to buffer
mpath->next_hop->sta.addr for use out of lock.

Signed-off-by: Marco Porsch <marco.porsch@siemens.com>
Acked-by: Javier Cardona <javier@cozybit.com>
Cc: stable@kernel.org
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-03-30 15:37:23 -04:00
John W. Linville c7a00dc73b mac80211: correct typos in "unavailable upon resume" warning
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-03-30 15:37:22 -04:00
John W. Linville 368d06f5b0 wireless: convert reg_regdb_search_lock to mutex
Stanse discovered that kmalloc is being called with GFP_KERNEL while
holding this spinlock.  The spinlock can be a mutex instead, which also
enables the removal of the unlock/lock around the lock/unlock of
cfg80211_mutex and the call to set_regdom.

Reported-by: Jiri Slaby <jirislaby@gmail.com>
Cc: stable@kernel.org
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-03-30 15:37:20 -04:00
Tejun Heo 5a0e3ad6af include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files.  percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed.  Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability.  As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

  http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
  only the necessary includes are there.  ie. if only gfp is used,
  gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
  blocks and try to put the new include such that its order conforms
  to its surrounding.  It's put in the include block which contains
  core kernel includes, in the same order that the rest are ordered -
  alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
  doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
  because the file doesn't have fitting include block), it prints out
  an error message indicating which .h file needs to be added to the
  file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
   over 4000 files, deleting around 700 includes and adding ~480 gfp.h
   and ~3000 slab.h inclusions.  The script emitted errors for ~400
   files.

2. Each error was manually checked.  Some didn't need the inclusion,
   some needed manual addition while adding it to implementation .h or
   embedding .c file was more appropriate for others.  This step added
   inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
   from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
   e.g. lib/decompress_*.c used malloc/free() wrappers around slab
   APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
   editing them as sprinkling gfp.h and slab.h inclusions around .h
   files could easily lead to inclusion dependency hell.  Most gfp.h
   inclusion directives were ignored as stuff from gfp.h was usually
   wildly available and often used in preprocessor macros.  Each
   slab.h inclusion directive was examined and added manually as
   necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
   were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
   distributed build env didn't work with gcov compiles) and a few
   more options had to be turned off depending on archs to make things
   build (like ipr on powerpc/64 which failed due to missing writeq).

   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
   a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-30 22:02:32 +09:00
Linus Torvalds 6631424fd2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (33 commits)
  r8169: offical fix for CVE-2009-4537 (overlength frame DMAs)
  ipv6: Don't drop cache route entry unless timer actually expired.
  tulip: Add missing parens.
  r8169: fix broken register writes
  pcnet_cs: add new id
  bonding: fix broken multicast with round-robin mode
  drivers/net: Fix continuation lines
  e1000: do not modify tx_queue_len on link speed change
  net: ipmr/ip6mr: prevent out-of-bounds vif_table access
  ixgbe: Do not run all Diagnostic offline tests when VFs are active
  igb: use correct bits to identify if managability is enabled
  benet: Fix compile warnnings in drivers/net/benet/be_ethtool.c
  net: Add MSG_WAITFORONE flag to recvmmsg
  e1000e: do not modify tx_queue_len on link speed change
  igbvf: do not modify tx_queue_len on link speed change
  ipv4: Restart rt_intern_hash after emergency rebuild (v2)
  ipv4: Cleanup struct net dereference in rt_intern_hash
  net: fix netlink address dumping in IPv4/IPv6
  tulip: Fix null dereference in uli526x_rx_packet()
  gianfar: fix undo of reserve()
  ...
2010-03-29 14:41:18 -07:00
David S. Miller 7905e357eb Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2010-03-29 13:50:10 -07:00
Stephen Rothwell 30bde1f507 rps: fix net-sysfs build for !CONFIG_RPS
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-29 01:00:44 -07:00
Eric Dumazet 10f744d205 net: __netif_receive_skb should be static
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-28 23:07:20 -07:00
YOSHIFUJI Hideaki / 吉藤英明 54c1a859ef ipv6: Don't drop cache route entry unless timer actually expired.
This is ipv6 variant of the commit 5e016cbf6.. ("ipv4: Don't drop
redirected route cache entry unless PTMU actually expired")
by Guenter Roeck <guenter.roeck@ericsson.com>.

Remove cache route entry in ipv6_negative_advice() only if
the timer is expired.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-28 19:34:26 -07:00
Jan Engelhardt adcfe1964e net: increase preallocated size of nlmsg to accomodate for IFLA_STATS64
When more data is stuffed into an nlmsg than initially projected, an
extra allocation needs to be done. Reserve enough for IFLA_STATS64 so
that this does not to needlessy happen.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-27 17:15:29 -07:00
Jan Engelhardt 14a4b42bd6 net: fix unaligned access in IFLA_STATS64
Tony Luck observes that the original IFLA_STATS64 submission causes
unaligned accesses. This is because nla_data() returns a pointer to a
memory region that is only aligned to 32 bits. Do some memcpying to
workaround this.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-27 16:35:50 -07:00
Nicolas Dichtel 7438189baa net: ipmr/ip6mr: prevent out-of-bounds vif_table access
When cache is unresolved, c->mf[6]c_parent is set to 65535 and
minvif, maxvif are not initialized, hence we must avoid to
parse IIF and OIF.
A second problem can happen when the user dumps a cache entry
where a VIF, that was referenced at creation time, has been
removed.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-27 08:33:21 -07:00
Brandon L Black 71c5c1595c net: Add MSG_WAITFORONE flag to recvmmsg
Add new flag MSG_WAITFORONE for the recvmmsg() syscall.
When this flag is specified for a blocking socket, recvmmsg()
will only block until at least 1 packet is available.  The
default behavior is to block until all vlen packets are
available.  This flag has no effect on non-blocking sockets
or when used in combination with MSG_DONTWAIT.

Signed-off-by: Brandon L Black <blblack@gmail.com>
Acked-by: Ulrich Drepper <drepper@redhat.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-27 08:29:01 -07:00
Pavel Emelyanov 6a2bad70d5 ipv4: Restart rt_intern_hash after emergency rebuild (v2)
The the rebuild changes the genid which in turn is used at
the hash calculation. Thus if we don't restart and go on with
inserting the rt will happen in wrong chain.

(Fixed Neil's comment about the index passed into the rt_intern_hash)

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Reviewed-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-26 20:57:35 -07:00
Pavel Emelyanov b35ecb5d40 ipv4: Cleanup struct net dereference in rt_intern_hash
There's no need in getting it 3 times and gcc isn't smart enough
to understand this himself.

This is just a cleanup before the fix (next patch).

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-26 20:57:35 -07:00
Patrick McHardy 4b97efdf39 net: fix netlink address dumping in IPv4/IPv6
When a dump is interrupted at the last device in a hash chain and
then continued, "idx" won't get incremented past s_idx, so s_ip_idx
is not reset when moving on to the next device. This means of all
following devices only the last n - s_ip_idx addresses are dumped.

Tested-by: Pawel Staszewski <pstaszewski@itcare.pl>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-03-26 20:27:49 -07:00
Tom Goff 66aa4a55fe netlink: use the appropriate namespace pid
This was included in OpenVZ kernels but wasn't integrated upstream.
>From git://git.openvz.org/pub/linux-2.6.24-openvz:

	commit 5c69402f18adf7276352e051ece2cf31feefab02
	Author: Alexey Dobriyan <adobriyan@openvz.org>
	Date:   Mon Dec 24 14:37:45 2007 +0300

	    netlink: fixup ->tgid to work in multiple PID namespaces

Signed-off-by: Tom Goff <thomas.goff@boeing.com>
Acked-by: Alexey Dobriyan <adobriyan@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-26 20:13:58 -07:00
David S. Miller b79d1d54cf ipv6: Fix result generation in ipv6_get_ifaddr().
Finishing naturally from hlist_for_each_entry(x, ...) does not result
in 'x' being NULL.

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-25 21:39:21 -07:00
David S. Miller b54c9b98bb ipv6: Preserve pervious behavior in ipv6_link_dev_addr().
Use list_add_tail() to get the behavior we had before
the list_head conversion for ipv6 address lists.

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-25 21:25:30 -07:00
Linus Torvalds 126a031e43 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (25 commits)
  TIPC: Removed inactive maintainer
  isdn: Cleanup Sections in PCMCIA driver elsa
  isdn: Cleanup Sections in PCMCIA driver avma1
  isdn: Cleanup Sections in PCMCIA driver teles
  isdn: Cleanup Sections in PCMCIA driver sedlbauer
  via-velocity: Fix FLOW_CNTL_TX_RX handling in set_mii_flow_control()
  netfilter: xt_hashlimit: IPV6 bugfix
  netfilter: ip6table_raw: fix table priority
  netfilter: xt_hashlimit: dl_seq_stop() fix
  af_key: return error if pfkey_xfrm_policy2msg_prep() fails
  skbuff: remove unused dma_head & dma_maps fields
  vlan: updates vlan real_num_tx_queues
  vlan: adds vlan_dev_select_queue
  igb: only use vlan_gro_receive if vlans are registered
  igb: do not modify tx_queue_len on link speed change
  igb: count Rx FIFO errors correctly
  bnx2: Use proper handler during netpoll.
  bnx2: Fix netpoll crash.
  ksz884x: fix return value of netdev_set_eeprom
  cgroups: net_cls as module
  ...
2010-03-25 14:06:29 -07:00
Eric Dumazet df3345457a rps: add CONFIG_RPS
RPS currently depends on SMP and SYSFS

Adding a CONFIG_RPS makes sense in case this requirement changes in the
future. This patch saves about 1500 bytes of kernel text in case SMP is
on but SYSFS is off.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-25 12:07:00 -07:00
David S. Miller 80bb3a00fa Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6 2010-03-25 11:48:58 -07:00
Eric Dumazet 8f59922914 netfilter: xt_hashlimit: IPV6 bugfix
A missing break statement in hashlimit_ipv6_mask(), and masks
between /64 and /95 are not working at all...

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-03-25 17:25:11 +01:00
Jozsef Kadlecsik 9c13886665 netfilter: ip6table_raw: fix table priority
The order of the IPv6 raw table is currently reversed, that makes impossible
to use the NOTRACK target in IPv6: for example if someone enters

ip6tables -t raw -A PREROUTING -p tcp --dport 80 -j NOTRACK

and if we receive fragmented packets then the first fragment will be
untracked and thus skip nf_ct_frag6_gather (and conntrack), while all
subsequent fragments enter nf_ct_frag6_gather and reassembly will never
successfully be finished.

Singed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-03-25 11:17:26 +01:00
Eric Dumazet 55e0d7cf27 netfilter: xt_hashlimit: dl_seq_stop() fix
If dl_seq_start() memory allocation fails, we crash later in
dl_seq_stop(), trying to kfree(ERR_PTR(-ENOMEM))

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-03-25 11:00:22 +01:00
Linus Torvalds 6c75969e22 Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
  NFS: don't try to decode GETATTR if DELEGRETURN returned error
  sunrpc: handle allocation errors from __rpc_lookup_create()
  SUNRPC: Fix the return value of rpc_run_bc_task()
  SUNRPC: Fix a use after free bug with the NFSv4.1 backchannel
  SUNRPC: Fix a potential memory leak in auth_gss
  NFS: Prevent another deadlock in nfs_release_page()
2010-03-24 16:50:46 -07:00
Frans Pop a570f095ea tipc: remove trailing space in messages
Signed-off-by: Frans Pop <elendil@planet.nl>
Cc: Per Liden <per.liden@ericsson.com>
Cc: Jon Maloy <jon.maloy@ericsson.com>
Cc: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-24 14:01:54 -07:00
Frans Pop b138338056 net: remove trailing space in messages
Signed-off-by: Frans Pop <elendil@planet.nl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-24 14:01:54 -07:00
Dan Carpenter 18062ca947 rds: cleanup: remove unneeded variable
We never use "sk" so this patch removes it.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-24 13:34:09 -07:00
Dan Carpenter a424077a0a wimax: remove unneeded variable
We never actually use "dev" so I removed it.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-24 13:34:09 -07:00
Dan Carpenter a3dcce97b2 llc: cleanup: remove dead code from llc_init()
We don't need "dev" any more after:
	a5a04819c5
	[LLC]: station source mac address

Signed-off-by: Dan Carpenter <error27@gmail.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-24 13:34:08 -07:00
Dan Carpenter 9a127aad4d af_key: return error if pfkey_xfrm_policy2msg_prep() fails
The original code saved the error value but just returned 0 in the end.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Acked-by: Jamal Hadi Salim <hadi@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-24 13:28:27 -07:00
Dan Carpenter 14b44974d5 mac80211: remove unneed variable from ieee80211_tx_pending()
We don't need "sdata" any more after:
	d84f323477
	mac80211: remove dev_hold/put calls

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-03-24 16:04:33 -04:00
Juuso Oikarinen a97c13c345 mac80211: Add support for connection quality monitoring
Add support for the set_cqm_config op. This op function configures the
requested connection quality monitor rssi threshold and rssi hysteresis
values to the hardware  if the hardware supports
IEEE80211_HW_SUPPORTS_CQM.

For unsupported hardware, currently -EOPNOTSUPP is returned, so the mac80211
is currently not doing connection quality monitoring on the host. This could be
added later, if needed.

Signed-off-by: Juuso Oikarinen <juuso.oikarinen@nokia.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-03-24 16:04:33 -04:00
Juuso Oikarinen d6dc1a3863 cfg80211: Add connection quality monitoring support to nl80211
Add support for basic configuration of a connection quality monitoring to the
nl80211 interface, and basic support for notifying about triggered monitoring
events.

Via this interface a user-space connection manager may configure and receive
pre-warning events of deteriorating WLAN connection quality, and start
preparing for roaming in advance, before the connection is already lost.

An example usage of such a trigger is starting scanning for nearby AP's in
an attempt to find one with better connection quality, and associate to it
before the connection characteristics of the existing connection become too bad
or the association is even lost, leading in a prolonged delay in connectivity.

The interface currently supports only RSSI, but it could be later extended
to include other parameters, such as signal-to-noise ratio, if need for that
arises.

Signed-off-by: Juuso Oikarinen <juuso.oikarinen@nokia.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-03-24 16:02:37 -04:00
Vasu Dev f6b9f4b263 vlan: updates vlan real_num_tx_queues
Updates real_num_tx_queues in case underlying real device
has changed real_num_tx_queues.

-v2
 As per Eric Dumazet<eric.dumazet@gmail.com> comment:-
   -- adds BUG_ON to catch case of real_num_tx_queues exceeding num_tx_queues.
   -- created this self contained patch to just update real_num_tx_queues.

Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-24 11:11:38 -07:00
Vasu Dev 669d3e0bab vlan: adds vlan_dev_select_queue
This is required to correctly select vlan tx queue for a driver
supporting multi tx queue with ndo_select_queue implemented since
currently selected vlan tx queue is unaligned to selected queue by
real net_devce ndo_select_queue.

Unaligned vlan tx queue selection causes thrash with higher vlan
tx lock contention for least fcoe traffic and wrong socket tx
queue_mapping for ixgbe having ndo_select_queue implemented.

-v2

As per Eric Dumazet<eric.dumazet@gmail.com> comments, mirrored
vlan net_device_ops to have them with and without vlan_dev_select_queue
and then select according to real dev ndo_select_queue present or not
for a vlan net_device. This is to completely skip vlan_dev_select_queue
calling for real net_device not supporting ndo_select_queue.

Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-24 11:11:38 -07:00
Tom Herbert e51d739ab7 net: Fix locking in flush_backlog
Need to take spinlocks when dequeuing from input_pkt_queue in flush_backlog.
Also, flush_backlog can now be called directly from netdev_run_todo.

Signed-off-by: Tom Herbert <therbert@google.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-23 23:17:18 -07:00
Juuso Oikarinen 1e4dcd0124 mac80211: Add support for connection monitor in hardware
This patch is based on a RFC patch by Kalle Valo.

The wl1271 has a feature which handles the connection monitor logic
in hardware, basically sending periodically nullfunc frames and reporting
to the host if AP is lost, after attempting to recover by sending
probe-requests to the AP.

Add support to mac80211 by adding a new flag IEEE80211_HW_CONNECTION_MONITOR
which prevents conn_mon_timer from triggering during idle periods, and
prevents sending probe-requests to the AP if beacon-loss is indicated by the
hardware.

Cc: Kalle Valo <kalle.valo@nokia.com>
Signed-off-by: Juuso Oikarinen <juuso.oikarinen@nokia.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-03-23 16:51:42 -04:00
Joe Perches 76326f1d4c net/wireless/wext-core.c: Use IW_EVENT_IDX macro
There's a wireless.h macro for this, might as well use it.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-03-23 16:50:27 -04:00
Joe Perches 44608f8012 net/wireless/wext_core.c: Use IW_IOCTL_IDX macro
There's a wireless.h macro for this, might as well use it.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-03-23 16:50:27 -04:00
Ben Blum 8e039d84b3 cgroups: net_cls as module
Allows the net_cls cgroup subsystem to be compiled as a module

This patch modifies net/sched/cls_cgroup.c to allow the net_cls subsystem
to be optionally compiled as a module instead of builtin.  The
cgroup_subsys struct is moved around a bit to allow the subsys_id to be
either declared as a compile-time constant by the cgroup_subsys.h include
in cgroup.h, or, if it's a module, initialized within the struct by
cgroup_load_subsys.

Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-23 13:06:14 -07:00
Tom Goff 7316ae88c4 net_sched: make traffic control network namespace aware
Mostly minor changes to add a net argument to various functions and
remove initial network namespace checks.

Make /proc/net/psched per network namespace.

Signed-off-by: Tom Goff <thomas.goff@boeing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-22 20:26:25 -07:00
Amerigo Wang 5fc05f8764 netpoll: warn when there are spaces in parameters
v2: update according to Frans' comments.

Currently, if we leave spaces before dst port,
netconsole will silently accept it as 0. Warn about this.

Also, when spaces appear in other places, make them
visible in error messages.

Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: David Miller <davem@davemloft.net>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-22 20:05:45 -07:00
David S. Miller 33e2bf6aa1 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6
Conflicts:
	Documentation/feature-removal-schedule.txt
	drivers/net/wireless/ath/ath5k/phy.c
2010-03-22 18:15:15 -07:00
Tom Herbert e880eb6c5c rps: Fix build with CONFIG_SYSFS enabled
Fix build with CONFIG_SYSFS not enabled.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-22 18:06:47 -07:00
Patrick McHardy ef1691504c netfilter: xt_recent: fix regression in rules using a zero hit_count
Commit 8ccb92ad (netfilter: xt_recent: fix false match) fixed supposedly
false matches in rules using a zero hit_count. As it turns out there is
nothing false about these matches and people are actually using entries
with a hit_count of zero to make rules dependant on addresses inserted
manually through /proc.

Since this slipped past the eyes of three reviewers, instead of
reverting the commit in question, this patch explicitly checks
for a hit_count of zero to make the intentions more clear.

Reported-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
Tested-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
Cc: stable@kernel.org
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-03-22 18:25:20 +01:00
Linus Torvalds 258152acc0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (38 commits)
  ip_gre: include route header_len in max_headroom calculation
  if_tunnel.h: add missing ams/byteorder.h include
  ipv4: Don't drop redirected route cache entry unless PTMU actually expired
  net: suppress lockdep-RCU false positive in FIB trie.
  Bluetooth: Fix kernel crash on L2CAP stress tests
  Bluetooth: Convert debug files to actually use debugfs instead of sysfs
  Bluetooth: Fix potential bad memory access with sysfs files
  netfilter: ctnetlink: fix reliable event delivery if message building fails
  netlink: fix NETLINK_RECV_NO_ENOBUFS in netlink_set_err()
  NET_DMA: free skbs periodically
  netlink: fix unaligned access in nla_get_be64()
  tcp: Fix tcp_mark_head_lost() with packets == 0
  net: ipmr/ip6mr: fix potential out-of-bounds vif_table access
  KS8695: update ksp->next_rx_desc_read at the end of rx loop
  igb: Add support for 82576 ET2 Quad Port Server Adapter
  ixgbevf: Message formatting cleanups
  ixgbevf: Shorten up delay timer for watchdog task
  ixgbevf: Fix VF Stats accounting after reset
  ixgbe: Set IXGBE_RSC_CB(skb)->DMA field to zero after unmapping the address
  ixgbe: fix for real_num_tx_queues update issue
  ...
2010-03-22 10:01:58 -07:00
Tetsuo Handa c3824d21eb rxrpc: Check allocation failure.
alloc_skb() can return NULL.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-22 09:57:19 -07:00
Dan Carpenter f1f0abe192 sunrpc: handle allocation errors from __rpc_lookup_create()
__rpc_lookup_create() can return ERR_PTR(-ENOMEM).

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
2010-03-22 05:34:13 -04:00
Trond Myklebust ff0901f803 SUNRPC: Fix the return value of rpc_run_bc_task()
Currently rpc_run_bc_task() will return NULL if the task allocation failed.
However the only caller is bc_send, which assumes that the return value
will be an ERR_PTR.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-03-22 05:34:12 -04:00
Trond Myklebust c9acb42ef1 SUNRPC: Fix a use after free bug with the NFSv4.1 backchannel
The ->release_request() callback was designed to allow the transport layer
to do housekeeping after the RPC call is done. It cannot be used to free
the request itself, and doing so leads to a use-after-free bug in
xprt_release().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-03-22 05:32:44 -04:00
Timo Teräs 243aad830e ip_gre: include route header_len in max_headroom calculation
Taking route's header_len into account, and updating gre device
needed_headroom will give better hints on upper bound of required
headroom. This is useful if the gre traffic is xfrm'ed.

Signed-off-by: Timo Teras <timo.teras@iki.fi>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-21 21:23:28 -07:00
Dan Carpenter 7668448ea9 bridge: cleanup: remove unused assignment
We never actually use iph again so this assignment can be removed.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-21 21:21:58 -07:00
Guenter Roeck 5e016cbf6c ipv4: Don't drop redirected route cache entry unless PTMU actually expired
TCP sessions over IPv4 can get stuck if routers between endpoints
do not fragment packets but implement PMTU instead, and we are using
those routers because of an ICMP redirect.

Setup is as follows

       MTU1    MTU2   MTU1
    A--------B------C------D

with MTU1 > MTU2. A and D are endpoints, B and C are routers. B and C
implement PMTU and drop packets larger than MTU2 (for example because
DF is set on all packets). TCP sessions are initiated between A and D.
There is packet loss between A and D, causing frequent TCP
retransmits.

After the number of retransmits on a TCP session reaches tcp_retries1,
tcp calls dst_negative_advice() prior to each retransmit. This results
in route cache entries for the peer to be deleted in
ipv4_negative_advice() if the Path MTU is set.

If the outstanding data on an affected TCP session is larger than
MTU2, packets sent from the endpoints will be dropped by B or C, and
ICMP NEEDFRAG will be returned. A and D receive NEEDFRAG messages and
update PMTU.

Before the next retransmit, tcp will again call dst_negative_advice(),
causing the route cache entry (with correct PMTU) to be deleted. The
retransmitted packet will be larger than MTU2, causing it to be
dropped again.

This sequence repeats until the TCP session aborts or is terminated.

Problem is fixed by removing redirected route cache entries in
ipv4_negative_advice() only if the PMTU is expired.

Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-21 20:55:13 -07:00
Robert Olsson e99b99b471 pktgen node allocation
Here is patch to manipulate packet node allocation and implicitly
how packets are DMA'd etc.

The flag NODE_ALLOC enables the function and numa_node_id();
when enabled it can also be explicitly controlled via a new
node parameter

Tested this with 10 Intel 82599 ports w. TYAN S7025 E5520 CPU's.
Was able to TX/DMA ~80 Gbit/s to Ethernet wires.

Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-21 20:33:36 -07:00
Eric Dumazet 99fe3c391d net: dev_getfirstbyhwtype() optimization
Use RCU to avoid RTNL use in dev_getfirstbyhwtype()

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-21 20:33:36 -07:00
Eric Dumazet ec733b15a3 net: snmp mib cleanup
There is no point to align or pad mibs to cache lines, they are per cpu
allocated with a 8 bytes alignment anyway.
This wastes space for no gain. This patch removes __SNMP_MIB_ALIGN__

Since SNMP mibs contain "unsigned long" fields only, we can relax the
allocation alignment from "unsigned long long" to "unsigned long"

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-21 18:34:16 -07:00
Eric Dumazet 62c97ac04a atm: Use kasprintf
Use kasprintf in atm_proc_dev_register()

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-21 18:34:15 -07:00
Eric Dumazet 283f2fe87e net: speedup netdev_set_master()
We currently force a synchronize_net() in netdev_set_master()

This seems necessary only when a slave had a master and we dismantle it.

In the other case ("ifenslave bond0 ethO"), we dont need this long
delay.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-21 18:34:15 -07:00
Eric Dumazet 907cdda520 tcp: Add SNMP counter for DEFER_ACCEPT
Its currently hard to diagnose when ACK frames are dropped because an
application set TCP_DEFER_ACCEPT on its listening socket.

See http://bugzilla.kernel.org/show_bug.cgi?id=15507

This patch adds a SNMP value, named TCPDeferAcceptDrop

netstat -s | grep TCPDeferAcceptDrop
    TCPDeferAcceptDrop: 0

This counter is incremented every time we drop a pure ACK frame received
by a socket in SYN_RECV state because its SYNACK retrans count is lower
than defer_accept value.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-21 18:31:35 -07:00
Jiri Pirko 32a806c194 bonding: flush unicast and multicast lists when changing type
After the type change, addresses in unicast and multicast lists wouldn't make
sense, not to mention possible different lenghts. So flush both lists here.

Note "dev_addr_discard" will be very soon replaced by "dev_mc_flush" (once
mc_list conversion will be done).

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-21 18:31:34 -07:00
Patrick McHardy 755d0e77ac net: rtnetlink: ignore NETDEV_PRE_TYPE_CHANGE in rtnetlink_event()
Ignore the new NETDEV_PRE_TYPE_CHANGE event in rtnetlink_event() since
there have been no changes userspace needs to be notified of.

Also add a comment to the netdev notifier event definitions to remind
people to update the exclusion list when adding new event types.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-21 18:31:34 -07:00
David S. Miller e3a61d47cc Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/holtmann/bluetooth-2.6 2010-03-21 18:03:11 -07:00
Paul E. McKenney 634a4b2038 net: suppress lockdep-RCU false positive in FIB trie.
Allow fib_find_node() to be called either under rcu_read_lock()
protection or with RTNL held.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-21 18:01:05 -07:00
Trond Myklebust cdead7cf12 SUNRPC: Fix a potential memory leak in auth_gss
The function alloc_enc_pages() currently fails to release the pointer
rqstp->rq_enc_pages in the error path.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: stable@kernel.org
2010-03-21 12:22:49 -04:00
Andrei Emeltchenko c2c77ec83b Bluetooth: Fix kernel crash on L2CAP stress tests
Added very simple check that req buffer has enough space to
fit configuration parameters. Shall be enough to reject packets
with configuration size more than req buffer.

Crash trace below

[ 6069.659393] Unable to handle kernel paging request at virtual address 02000205
[ 6069.673034] Internal error: Oops: 805 [#1] PREEMPT
...
[ 6069.727172] PC is at l2cap_add_conf_opt+0x70/0xf0 [l2cap]
[ 6069.732604] LR is at l2cap_recv_frame+0x1350/0x2e78 [l2cap]
...
[ 6070.030303] Backtrace:
[ 6070.032806] [<bf1c2880>] (l2cap_add_conf_opt+0x0/0xf0 [l2cap]) from
[<bf1c6624>] (l2cap_recv_frame+0x1350/0x2e78 [l2cap])
[ 6070.043823]  r8:dc5d3100 r7:df2a91d6 r6:00000001 r5:df2a8000 r4:00000200
[ 6070.050659] [<bf1c52d4>] (l2cap_recv_frame+0x0/0x2e78 [l2cap]) from
[<bf1c8408>] (l2cap_recv_acldata+0x2bc/0x350 [l2cap])
[ 6070.061798] [<bf1c814c>] (l2cap_recv_acldata+0x0/0x350 [l2cap]) from
[<bf0037a4>] (hci_rx_task+0x244/0x478 [bluetooth])
[ 6070.072631]  r6:dc647700 r5:00000001 r4:df2ab740
[ 6070.077362] [<bf003560>] (hci_rx_task+0x0/0x478 [bluetooth]) from
[<c006b9fc>] (tasklet_action+0x78/0xd8)
[ 6070.087005] [<c006b984>] (tasklet_action+0x0/0xd8) from [<c006c160>]

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@nokia.com>
Acked-by: Gustavo F. Padovan <gustavo@padovan.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-03-21 05:49:36 +01:00
Marcel Holtmann aef7d97cc6 Bluetooth: Convert debug files to actually use debugfs instead of sysfs
Some of the debug files ended up wrongly in sysfs, because at that point
of time, debugfs didn't exist. Convert these files to use debugfs and
also seq_file. This patch converts all of these files at once and then
removes the exported symbol for the Bluetooth sysfs class.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-03-21 05:49:35 +01:00
Marcel Holtmann 101545f6fe Bluetooth: Fix potential bad memory access with sysfs files
When creating a high number of Bluetooth sockets (L2CAP, SCO
and RFCOMM) it is possible to scribble repeatedly on arbitrary
pages of memory. Ensure that the content of these sysfs files is
always less than one page. Even if this means truncating. The
files in question are scheduled to be moved over to debugfs in
the future anyway.

Based on initial patches from Neil Brown and Linus Torvalds

Reported-by: Neil Brown <neilb@suse.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-03-21 05:49:32 +01:00
David S. Miller 3e81c6da39 ipv6: Fix bug in ipv6_chk_same_addr().
hlist_for_each_entry(p...) will not necessarily initialize 'p'
to anything if the hlist is empty.  GCC notices this and emits
a warning.

Just return true explicitly when we hit a match, and return
false is we fall out of the loop without one.

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-20 16:18:00 -07:00
YOSHIFUJI Hideaki b2db756449 ipv6: Reduce timer events for addrconf_verify().
This patch reduces timer events while keeping accuracy by rounding
our timer and/or batching several address validations in addrconf_verify().

addrconf_verify() is called at earliest timeout among interface addresses'
timeouts, but at maximum ADDR_CHECK_FREQUENCY (120 secs).

In most cases, all of timeouts of interface addresses are long enough
(e.g. several hours or days vs 2 minutes), this timer is usually called
every ADDR_CHECK_FREQUENCY, and it is okay to be lazy.
(Note this timer could be eliminated if all code paths which modifies
variables related to timeouts call us manually, but it is another story.)

However, in other least but important cases, we try keeping accuracy.

When the real interface address timeout is coming, and the timeout
is just before the rounded timeout, we accept some error.

When a timeout has been reached, we also try batching other several
events in very near future.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-20 16:11:12 -07:00
stephen hemminger 88949cf484 IPv6: addrconf cleanup addrconf_verify
The variable regen_advance is only used in the privacy case.
Move it to simplify code and eliminate ifdef's

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-20 16:09:11 -07:00
Stephen Hemminger e21e8467d3 addrconf: checkpatch fixes
Fix some of the checkpatch complaints.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-20 16:09:01 -07:00
Stephen Hemminger bcdd553fd3 IPv6: addrconf cleanups
Some minor stuff, reformat comments and add whitespace for clarity

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-20 16:08:18 -07:00
stephen hemminger 502a2ffd73 ipv6: convert idev_list to list macros
Convert to list macro's for the list of addresses per interface
in IPv6.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-20 15:45:09 -07:00
stephen hemminger 3a88a81d89 ipv6: user better hash for addrconf
The existing hash function has a couple of issues:
  * it is hardwired to 16 for IN6_ADDR_HSIZE
  * limited to 256 and callers using int
  * use jhash2 rather than some old BSD algorithm

No need for random seed since this is local only (based on assigned
addresses) table.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-20 15:44:35 -07:00
stephen hemminger 5c578aedcb IPv6: convert addrconf hash list to RCU
Convert from reader/writer lock to RCU and spinlock for addrconf
hash list.

Adds an additional helper macro for hlist_for_each_entry_continue_rcu
to handle the continue case.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-20 15:44:35 -07:00
stephen hemminger c2e21293c0 ipv6: convert addrconf list to hlist
Using hash list macros, simplifies code and helps later RCU.

This patch includes some initialization that is not strictly necessary,
since an empty hlist node/list is all zero; and list is in BSS
and node is allocated with kzalloc.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-20 15:44:34 -07:00
stephen hemminger 372e6c8f1f ipv6: convert temporary address list to list macros
Use list macros instead of open coded linked list.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-20 15:44:34 -07:00
David S. Miller e77c8e83dd Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2010-03-20 15:24:29 -07:00
Pablo Neira Ayuso 37b7ef7203 netfilter: ctnetlink: fix reliable event delivery if message building fails
This patch fixes a bug that allows to lose events when reliable
event delivery mode is used, ie. if NETLINK_BROADCAST_SEND_ERROR
and NETLINK_RECV_NO_ENOBUFS socket options are set.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-20 14:29:03 -07:00
Pablo Neira Ayuso 1a50307ba1 netlink: fix NETLINK_RECV_NO_ENOBUFS in netlink_set_err()
Currently, ENOBUFS errors are reported to the socket via
netlink_set_err() even if NETLINK_RECV_NO_ENOBUFS is set. However,
that should not happen. This fixes this problem and it changes the
prototype of netlink_set_err() to return the number of sockets that
have set the NETLINK_RECV_NO_ENOBUFS socket option. This return
value is used in the next patch in these bugfix series.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-20 14:29:03 -07:00
Steven J. Magnani 73852e8151 NET_DMA: free skbs periodically
Under NET_DMA, data transfer can grind to a halt when userland issues a
large read on a socket with a high RCVLOWAT (i.e., 512 KB for both).
This appears to be because the NET_DMA design queues up lots of memcpy
operations, but doesn't issue or wait for them (and thus free the
associated skbs) until it is time for tcp_recvmesg() to return.
The socket hangs when its TCP window goes to zero before enough data is
available to satisfy the read.

Periodically issue asynchronous memcpy operations, and free skbs for ones
that have completed, to prevent sockets from going into zero-window mode.

Signed-off-by: Steven J. Magnani <steve@digidescorp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-20 14:29:02 -07:00
Lennart Schulte 6830c25b7d tcp: Fix tcp_mark_head_lost() with packets == 0
A packet is marked as lost in case packets == 0, although nothing should be done.
This results in a too early retransmitted packet during recovery in some cases.
This small patch fixes this issue by returning immediately.

Signed-off-by: Lennart Schulte <lennart.schulte@nets.rwth-aachen.de>
Signed-off-by: Arnd Hannemann <hannemann@nets.rwth-aachen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-19 22:47:22 -07:00
Patrick McHardy a50436f2cd net: ipmr/ip6mr: fix potential out-of-bounds vif_table access
mfc_parent of cache entries is used to index into the vif_table and is
initialised from mfcctl->mfcc_parent. This can take values of to 2^16-1,
while the vif_table has only MAXVIFS (32) entries. The same problem
affects ip6mr.

Refuse invalid values to fix a potential out-of-bounds access. Unlike
the other validity checks, this is checked in ipmr_mfc_add() instead of
the setsockopt handler since its unused in the delete path and might be
uninitialized.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-19 22:47:22 -07:00
stephen hemminger 97e3ecd112 TCP: check min TTL on received ICMP packets
This adds RFC5082 checks for TTL on received ICMP packets.
It adds some security against spoofed ICMP packets
disrupting GTSM protected sessions.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-19 21:00:42 -07:00
Herbert Xu 10414444cb ipv6: Remove redundant dst NULL check in ip6_dst_check
As the only path leading to ip6_dst_check makes an indirect call
through dst->ops, dst cannot be NULL in ip6_dst_check.

This patch removes this check in case it misleads people who
come across this code.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-19 21:00:42 -07:00
Timo Teräs d11a4dc18b ipv4: check rt_genid in dst_check
Xfrm_dst keeps a reference to ipv4 rtable entries on each
cached bundle. The only way to renew xfrm_dst when the underlying
route has changed, is to implement dst_check for this. This is
what ipv6 side does too.

The problems started after 87c1e12b5e
("ipsec: Fix bogus bundle flowi") which fixed a bug causing xfrm_dst
to not get reused, until that all lookups always generated new
xfrm_dst with new route reference and path mtu worked. But after the
fix, the old routes started to get reused even after they were expired
causing pmtu to break (well it would occationally work if the rtable
gc had run recently and marked the route obsolete causing dst_check to
get called).

Signed-off-by: Timo Teras <timo.teras@iki.fi>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-19 21:00:41 -07:00
florian@mickler.org 819bfecc4f rename new rfkill sysfs knobs
This patch renames the (never officially released) sysfs-knobs
"blocked_hw" and "blocked_sw" to "hard" and "soft", as the hardware vs
software conotation is misleading.

It also gets rid of not needed locks around u32-read-access.

Signed-off-by: Florian Mickler <florian@mickler.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-03-19 15:48:25 -04:00
Eric Dumazet 0641e4fbf2 net: Potential null skb->dev dereference
When doing "ifenslave -d bond0 eth0", there is chance to get NULL
dereference in netif_receive_skb(), because dev->master suddenly becomes
NULL after we tested it.

We should use ACCESS_ONCE() to avoid this (or rcu_dereference())

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-18 21:16:45 -07:00
Alexandra Kossovsky b634f87522 tcp: Fix OOB POLLIN avoidance.
From: Alexandra.Kossovsky@oktetlabs.ru

Fixes kernel bugzilla #15541

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-18 20:29:24 -07:00
Jiri Pirko 1c01fe14a8 net: forbid underlaying devices to change its type
It's not desired for underlaying devices to change type. At the time,
there is for example possible to have bond with changed type from
Ethernet to Infiniband as a port of a bridge. This patch fixes this.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-18 20:00:02 -07:00
Jiri Pirko 3ca5b4042e bonding: check return value of nofitier when changing type
This patch adds the possibility to refuse the bonding type change for
other subsystems (such as for example bridge, vlan, etc.)

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-18 20:00:02 -07:00
Jiri Pirko 93d9b7d7a8 net: rename notifier defines for netdev type change
Since generally there could be more netdevices changing type other
than bonding, making this event type name "bonding-unrelated"

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-18 20:00:01 -07:00
Tom Herbert 1e94d72fea rps: Fixed build with CONFIG_SMP not enabled.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-18 17:45:44 -07:00
Linus Torvalds 7c34691abe Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
  NFS: ensure bdi_unregister is called on mount failure.
  NFS: Avoid a deadlock in nfs_release_page
  NFSv4: Don't ignore the NFS_INO_REVAL_FORCED flag in nfs_revalidate_inode()
  nfs4: Make the v4 callback service hidden
  nfs: fix unlikely memory leak
  rpc client can not deal with ENOSOCK, so translate it into ENOCONN
2010-03-18 16:50:09 -07:00
Jiri Pirko ff6e2163f2 net: convert multiple drivers to use netdev_for_each_mc_addr, part7
In mlx4, using char * to store mc address in private structure instead.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-16 21:23:25 -07:00
Neil Horman ca50910185 tipc: Allow retransmission of cloned buffers
Forward port commit
fc477e160af086f6e30c3d4fdf5f5c000d29beb5
from git://tipc.cslab.ericsson.net/pub/git/people/allan/tipc.git

Origional commit message:

Allow retransmission of cloned buffers

This patch fixes an issue with TIPC's message retransmission logic
that prevented retransmission of clone sk_buffs.  Originally intended
as a means of avoiding wasted work in retransmitting messages that
were still on the driver's outbound queue, it also prevented TIPC
from retransmitting messages through other means -- such as the
secondary bearer of the broadcast link, or another interface in a
set of bonded interfaces.  This fix removes existing checks for
cloned sk_buffs that prevented such retransmission.

Origionally-Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-16 21:23:24 -07:00
Neil Horman 1a624832a0 tipc: Increase frequency of load distribution over broadcast link
Forward port commit 29eb572941501c40ac6e62dbc5043bf9ee76ee56
from git://tipc.cslab.ericsson.net/pub/git/people/allan/tipc.git

Origional commit message:
Increase frequency of load distribution over broadcast link

This patch enhances the behavior of TIPC's broadcast link so that it
alternates between redundant bearers (if available) after every
message sent, rather than after every 10 messages.  This change helps
to speed up delivery of retransmitted messages by ensuring that
they are not sent repeatedly over a bearer that is no longer working,
but not yet recognized as failed.

Tested by myself in the latest net-2.6 tree using the tipc sanity test suite

Origionally-signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>

bcast.c |   35 ++++++++++++++---------------------
1 file changed, 14 insertions(+), 21 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-16 21:23:23 -07:00
Jan Engelhardt 10708f37ae net: core: add IFLA_STATS64 support
`ip -s link` shows interface counters truncated to 32 bit. This is
because interface statistics are transported only in 32-bit quantity
to userspace. This commit adds a new IFLA_STATS64 attribute that
exports them in full 64 bit.

References: http://lkml.indiana.edu/hypermail/linux/kernel/0307.3/0215.html
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-16 21:23:22 -07:00
Jan Engelhardt 6ce1a6df6e net: tcp: make veno selectable as default congestion module
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-16 21:23:21 -07:00
Jan Engelhardt dd2acaa7bc net: tcp: make hybla selectable as default congestion module
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-16 21:23:20 -07:00
Eric Dumazet 2fb3573dfb net: remove rcu locking from fib_rules_event()
We hold RTNL at this point and dont use RCU variants of list traversals,
we dont need rcu_read_lock()/rcu_read_unlock()

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-16 21:23:19 -07:00
stephen hemminger 14bb478983 bridge: per-cpu packet statistics (v3)
The shared packet statistics are a potential source of slow down
on bridged traffic. Convert to per-cpu array, but only keep those
statistics which change per-packet.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-16 21:23:19 -07:00
Tom Herbert 0a9627f264 rps: Receive Packet Steering
This patch implements software receive side packet steering (RPS).  RPS
distributes the load of received packet processing across multiple CPUs.

Problem statement: Protocol processing done in the NAPI context for received
packets is serialized per device queue and becomes a bottleneck under high
packet load.  This substantially limits pps that can be achieved on a single
queue NIC and provides no scaling with multiple cores.

This solution queues packets early on in the receive path on the backlog queues
of other CPUs.   This allows protocol processing (e.g. IP and TCP) to be
performed on packets in parallel.   For each device (or each receive queue in
a multi-queue device) a mask of CPUs is set to indicate the CPUs that can
process packets. A CPU is selected on a per packet basis by hashing contents
of the packet header (e.g. the TCP or UDP 4-tuple) and using the result to index
into the CPU mask.  The IPI mechanism is used to raise networking receive
softirqs between CPUs.  This effectively emulates in software what a multi-queue
NIC can provide, but is generic requiring no device support.

Many devices now provide a hash over the 4-tuple on a per packet basis
(e.g. the Toeplitz hash).  This patch allow drivers to set the HW reported hash
in an skb field, and that value in turn is used to index into the RPS maps.
Using the HW generated hash can avoid cache misses on the packet when
steering it to a remote CPU.

The CPU mask is set on a per device and per queue basis in the sysfs variable
/sys/class/net/<device>/queues/rx-<n>/rps_cpus.  This is a set of canonical
bit maps for receive queues in the device (numbered by <n>).  If a device
does not support multi-queue, a single variable is used for the device (rx-0).

Generally, we have found this technique increases pps capabilities of a single
queue device with good CPU utilization.  Optimal settings for the CPU mask
seem to depend on architectures and cache hierarcy.  Below are some results
running 500 instances of netperf TCP_RR test with 1 byte req. and resp.
Results show cumulative transaction rate and system CPU utilization.

e1000e on 8 core Intel
   Without RPS: 108K tps at 33% CPU
   With RPS:    311K tps at 64% CPU

forcedeth on 16 core AMD
   Without RPS: 156K tps at 15% CPU
   With RPS:    404K tps at 49% CPU

bnx2x on 16 core AMD
   Without RPS  567K tps at 61% CPU (4 HW RX queues)
   Without RPS  738K tps at 96% CPU (8 HW RX queues)
   With RPS:    854K tps at 76% CPU (4 HW RX queues)

Caveats:
- The benefits of this patch are dependent on architecture and cache hierarchy.
Tuning the masks to get best performance is probably necessary.
- This patch adds overhead in the path for processing a single packet.  In
a lightly loaded server this overhead may eliminate the advantages of
increased parallelism, and possibly cause some relative performance degradation.
We have found that masks that are cache aware (share same caches with
the interrupting CPU) mitigate much of this.
- The RPS masks can be changed dynamically, however whenever the mask is changed
this introduces the possibility of generating out of order packets.  It's
probably best not change the masks too frequently.

Signed-off-by: Tom Herbert <therbert@google.com>

 include/linux/netdevice.h |   32 ++++-
 include/linux/skbuff.h    |    3 +
 net/core/dev.c            |  335 +++++++++++++++++++++++++++++++++++++--------
 net/core/net-sysfs.c      |  225 ++++++++++++++++++++++++++++++-
 net/core/skbuff.c         |    2 +
 5 files changed, 538 insertions(+), 59 deletions(-)
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-16 21:23:18 -07:00
Tina Yang 768bbedf9c RDS: Enable per-cpu workqueue threads
Create per-cpu workqueue threads instead of a single
krdsd thread. This is a step towards better scalability.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-16 21:17:02 -07:00
Andy Grover 561c7df63e RDS: Do not call set_page_dirty() with irqs off
set_page_dirty() unconditionally re-enables interrupts, so
if we call it with irqs off, they will be on after the call,
and that's bad. This patch moves the call after we've re-enabled
interrupts in send_drop_to(), so it's safe.

Also, add BUG_ONs to let us know if we ever do call set_page_dirty
with interrupts off.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-16 21:17:01 -07:00
Sherman Pun 450d06c020 RDS: Properly unmap when getting a remote access error
If the RDMA op has aborted with a remote access error,
in addition to what we already do (tell userspace it has
completed with an error) also unmap it and put() the rm.

Otherwise, hangs may occur on arches that track maps and
will not exit without proper cleanup.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-16 21:17:00 -07:00
Andy Grover b98ba52f96 RDS: only put sockets that have seen congestion on the poll_waitq
rds_poll_waitq's listeners will be awoken if we receive a congestion
notification. Bad performance may result because *all* polled sockets
contend for this single lock. However, it should not be necessary to
wake pollers when a congestion update arrives if they have never
experienced congestion, and not putting these on the waitq will
hopefully greatly reduce contention.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-16 21:16:59 -07:00
Tina Yang 550a8002e4 RDS: Fix locking in rds_send_drop_to()
It seems rds_send_drop_to() called
__rds_rdma_send_complete(rs, rm, RDS_RDMA_CANCELED)
with only rds_sock lock, but not rds_message lock. It raced with
other threads that is attempting to modify the rds_message as well,
such as from within rds_rdma_send_complete().

Signed-off-by: Tina Yang <tina.yang@oracle.com>
Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-16 21:16:58 -07:00
Andy Grover 97069788d6 RDS: Turn down alarming reconnect messages
RDS's error messages when a connection goes down are a little
extreme. A connection may go down, and it will be re-established,
and everything is fine. This patch links these messages through
rdsdebug(), instead of to printk directly.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-16 21:16:57 -07:00
Andy Grover 571c02fa81 RDS: Workaround for in-use MRs on close causing crash
if a machine is shut down without closing sockets properly, and
freeing all MRs, then a BUG_ON will bring it down. This patch
changes these to WARN_ONs -- leaking MRs is not fatal (although
not ideal, and there is more work to do here for a proper fix.)

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-16 21:16:56 -07:00
Tina Yang 048c15e641 RDS: Fix send locking issue
Fix a deadlock between rds_rdma_send_complete() and
rds_send_remove_from_sock() when rds socket lock and
rds message lock are acquired out-of-order.

Signed-off-by: Tina Yang <Tina.Yang@oracle.com>
Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-16 21:16:55 -07:00
Andy Grover 2e7b3b9945 RDS: Fix congestion issues for loopback
We have two kinds of loopback: software (via loop transport)
and hardware (via IB). sw is used for 127.0.0.1, and doesn't
support rdma ops. hw is used for sends to local device IPs,
and supports rdma. Both are used in different cases.

For both of these, when there is a congestion map update, we
want to call rds_cong_map_updated() but not actually send
anything -- since loopback local and foreign congestion maps
point to the same spot, they're already in sync.

The old code never called sw loop's xmit_cong_map(),so
rds_cong_map_updated() wasn't being called for it. sw loop
ports would not work right with the congestion monitor.

Fixing that meant that hw loopback now would send congestion maps
to itself. This is also undesirable (racy), so we check for this
case in the ib-specific xmit code.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-16 21:16:55 -07:00
Andy Grover 8e82376e5f RDS/TCP: Wait to wake thread when write space available
Instead of waking the send thread whenever any send space is available,
wait until it is at least half empty. This is modeled on how
sock_def_write_space() does it, and may help to minimize context
switches.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-16 21:16:55 -07:00
Andy Grover b075cfdb66 RDS: update copy_to_user state in tcp transport
Other transports use rds_page_copy_user, which updates our
s_copy_to_user counter. TCP doesn't, so it needs to explicity
call rds_stats_add().

Reported-by: Richard Frank <richard.frank@oracle.com>
Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-16 21:16:54 -07:00
Andy Grover 1123fd734d RDS: sendmsg() should check sndtimeo, not rcvtimeo
Most likely cut n paste error - sendmsg() was checking sock_rcvtimeo.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-16 21:16:54 -07:00
Andy Grover 735f61e626 RDS: Do not BUG() on error returned from ib_post_send
BUGging on a runtime error code should be avoided. This
patch also eliminates all other BUG()s that have no real
reason to exist.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-16 21:16:53 -07:00
David S. Miller 87faf3ccf1 bridge: Make first arg to deliver_clone const.
Otherwise we get a warning from the call in br_forward().

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-16 14:37:47 -07:00
YOSHIFUJI Hideaki / 吉藤英明 32dec5dd02 bridge br_multicast: Don't refer to BR_INPUT_SKB_CB(skb)->mrouters_only without IGMP snooping.
Without CONFIG_BRIDGE_IGMP_SNOOPING,
BR_INPUT_SKB_CB(skb)->mrouters_only is not appropriately
initialized, so we can see garbage.

A clear option to fix this is to set it even without that
config, but we cannot optimize out the branch.

Let's introduce a macro that returns value of mrouters_only
and let it return 0 without CONFIG_BRIDGE_IGMP_SNOOPING.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-16 14:34:23 -07:00
Vitaliy Gusev 858a18a6a2 route: Fix caught BUG_ON during rt_secret_rebuild_oneshot()
route: Fix caught BUG_ON during rt_secret_rebuild_oneshot()

Call rt_secret_rebuild can cause BUG_ON(timer_pending(&net->ipv4.rt_secret_timer)) in
add_timer as there is not any synchronization for call rt_secret_rebuild_oneshot()
for the same net namespace.

Also this issue affects to rt_secret_reschedule().

Thus use mod_timer enstead.

Signed-off-by: Vitaliy Gusev <vgusev@openvz.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-16 14:15:47 -07:00
YOSHIFUJI Hideaki / 吉藤英明 8440853bb7 bridge br_multicast: Fix skb leakage in error path.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-16 14:15:46 -07:00
YOSHIFUJI Hideaki / 吉藤英明 0ba8c9ec25 bridge br_multicast: Fix handling of Max Response Code in IGMPv3 message.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-16 14:15:46 -07:00
Jiri Slaby 21edbb223e NET: netpoll, fix potential NULL ptr dereference
Stanse found that one error path in netpoll_setup dereferences npinfo
even though it is NULL. Avoid that by adding new label and go to that
instead.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Daniel Borkmann <danborkmann@googlemail.com>
Cc: David S. Miller <davem@davemloft.net>
Acked-by: chavey@google.com
Acked-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-16 14:15:45 -07:00
Neil Horman a2f46ee1ba tipc: fix lockdep warning on address assignment
So in the forward porting of various tipc packages, I was constantly
getting this lockdep warning everytime I used tipc-config to set a network
address for the protocol:

[ INFO: possible circular locking dependency detected ]
2.6.33 #1
tipc-config/1326 is trying to acquire lock:
(ref_table_lock){+.-...}, at: [<ffffffffa0315148>] tipc_ref_discard+0x53/0xd4 [tipc]

but task is already holding lock:
(&(&entry->lock)->rlock#2){+.-...}, at: [<ffffffffa03150d5>] tipc_ref_lock+0x43/0x63 [tipc]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&(&entry->lock)->rlock#2){+.-...}:
[<ffffffff8107b508>] __lock_acquire+0xb67/0xd0f
[<ffffffff8107b78c>] lock_acquire+0xdc/0x102
[<ffffffff8145471e>] _raw_spin_lock_bh+0x3b/0x6e
[<ffffffffa03152b1>] tipc_ref_acquire+0xe8/0x11b [tipc]
[<ffffffffa031433f>] tipc_createport_raw+0x78/0x1b9 [tipc]
[<ffffffffa031450b>] tipc_createport+0x8b/0x125 [tipc]
[<ffffffffa030f221>] tipc_subscr_start+0xce/0x126 [tipc]
[<ffffffffa0308fb2>] process_signal_queue+0x47/0x7d [tipc]
[<ffffffff81053e0c>] tasklet_action+0x8c/0xf4
[<ffffffff81054bd8>] __do_softirq+0xf8/0x1cd
[<ffffffff8100aadc>] call_softirq+0x1c/0x30
[<ffffffff810549f4>] _local_bh_enable_ip+0xb8/0xd7
[<ffffffff81054a21>] local_bh_enable_ip+0xe/0x10
[<ffffffff81454d31>] _raw_spin_unlock_bh+0x34/0x39
[<ffffffffa0308eb8>] spin_unlock_bh.clone.0+0x15/0x17 [tipc]
[<ffffffffa0308f47>] tipc_k_signal+0x8d/0xb1 [tipc]
[<ffffffffa0308dd9>] tipc_core_start+0x8a/0xad [tipc]
[<ffffffffa01b1087>] 0xffffffffa01b1087
[<ffffffff8100207d>] do_one_initcall+0x72/0x18a
[<ffffffff810872fb>] sys_init_module+0xd8/0x23a
[<ffffffff81009b42>] system_call_fastpath+0x16/0x1b

-> #0 (ref_table_lock){+.-...}:
[<ffffffff8107b3b2>] __lock_acquire+0xa11/0xd0f
[<ffffffff8107b78c>] lock_acquire+0xdc/0x102
[<ffffffff81454836>] _raw_write_lock_bh+0x3b/0x6e
[<ffffffffa0315148>] tipc_ref_discard+0x53/0xd4 [tipc]
[<ffffffffa03141ee>] tipc_deleteport+0x40/0x119 [tipc]
[<ffffffffa0316e35>] release+0xeb/0x137 [tipc]
[<ffffffff8139dbf4>] sock_release+0x1f/0x6f
[<ffffffff8139dc6b>] sock_close+0x27/0x2b
[<ffffffff811116f6>] __fput+0x12a/0x1df
[<ffffffff811117c5>] fput+0x1a/0x1c
[<ffffffff8110e49b>] filp_close+0x68/0x72
[<ffffffff8110e552>] sys_close+0xad/0xe7
[<ffffffff81009b42>] system_call_fastpath+0x16/0x1b

Finally decided I should fix this.  Its a straightforward inversion,
tipc_ref_acquire takes two locks in this order:
ref_table_lock
entry->lock

while tipc_deleteport takes them in this order:
entry->lock (via tipc_port_lock())
ref_table_lock (via tipc_ref_discard())

when the same entry is referenced, we get the above warning.  The fix is equally
straightforward.  Theres no real relation between the entry->lock and the
ref_table_lock (they just are needed at the same time), so move the entry->lock
aquisition in tipc_ref_acquire down, after we unlock ref_table_lock (this is
safe since the ref_table_lock guards changes to the reference table, and we've
already claimed a slot there.  I've tested the below fix and confirmed that it
clears up the lockdep issue

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-16 14:15:45 -07:00
Bruno Randolf 09a08cff3d mac80211: (really) fix rates setup on IBSS merge
when an IBSS merge happened, the supported rates for the newly added station
were left empty, causing the rate control module to be initialized with only
the basic rates.

the section of the ibss code which deals with updating supported rates for
an already existing station failed to inform the rate control module about the
new rates. as both minstrel and pid don't have an update function i just use
the init function.

also remove unnecessary (unsigned long long) casts and edit debug message.

Signed-off-by: Bruno Randolf <br1@einfach.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-03-16 15:05:47 -04:00
John W. Linville 819386dfc6 Revert "mac80211: fix rates setup on IBSS merge"
I accidentally merged an incomplete version of the patch...

This reverts commit b4d59a9317.

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-03-16 15:05:23 -04:00
Michael Braun 7f7708f005 bridge: Fix br_forward crash in promiscuous mode
From: Michael Braun <michael-dev@fami-braun.de>

bridge: Fix br_forward crash in promiscuous mode

It's a linux-next kernel from 2010-03-12 on an x86 system and it
OOPs in the bridge module in br_pass_frame_up (called by
br_handle_frame_finish) because brdev cannot be dereferenced (its set to
a non-null value).

Adding some BUG_ON statements revealed that
 BR_INPUT_SKB_CB(skb)->brdev == br-dev
(as set in br_handle_frame_finish first)
only holds until br_forward is called.
The next call to br_pass_frame_up then fails.

Digging deeper it seems that br_forward either frees the skb or passes
it to NF_HOOK which will in turn take care of freeing the skb. The
same is holds for br_pass_frame_ip. So it seems as if two independent
skb allocations are required. As far as I can see, commit
b33084be19 ("bridge: Avoid unnecessary
clone on forward path") removed skb duplication and so likely causes
this crash. This crash does not happen on 2.6.33.

I've therefore modified br_forward the same way br_flood has been
modified so that the skb is not freed if skb0 is going to be used
and I can confirm that the attached patch resolves the issue for me.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-16 00:26:22 -07:00
Herbert Xu 0821ec55bb bridge: Move NULL mdb check into br_mdb_ip_get
Since all callers of br_mdb_ip_get need to check whether the
hash table is NULL, this patch moves the check into the function.

This fixes the two callers (query/leave handler) that didn't
check it.

Reported-by: Michael Braun <michael-dev@fami-braun.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-15 20:38:25 -07:00
David S. Miller 4961e02f19 Merge branch 'master' of /home/davem/src/GIT/linux-2.6/ 2010-03-15 16:23:54 -07:00
Gerrit Renker d14a0ebda7 net-2.6 [Bug-Fix][dccp]: fix oops caused after failed initialisation
dccp: fix panic caused by failed initialisation

This fixes a kernel panic reported thanks to Andre Noll:

if DCCP is compiled into the kernel and any out of the initialisation
steps in net/dccp/proto.c:dccp_init() fail, a subsequent attempt to create
a SOCK_DCCP socket will panic, since inet{,6}_create() are not prevented
from creating DCCP sockets.

This patch fixes the problem by propagating a failure in dccp_init() to
dccp_v{4,6}_init_net(), and from there to dccp_v{4,6}_init(), so that the
DCCP protocol is not made available if its initialisation fails.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-15 16:00:50 -07:00
Akinobu Mita a1ca14ac54 phonet: use for_each_set_bit()
Replace open-coded loop with for_each_set_bit().

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-15 16:00:47 -07:00
Felix Fietkau eaf55530c9 mac80211: optimize tx status processing
When a cooked monitor interface is active, ieee80211_tx_status()
generates a radiotap header for every single frame, even if it wasn't
injected and thus won't be sent to a monitor interface.
This patch reduces cpu utilization by moving the cooked monitor check a
bit earlier, before it generates the rtap header.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-03-15 15:31:59 -04:00
Linus Torvalds ceb804cd0f Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
  9p: Skip check for mandatory locks when unlocking
  9p: Fixes a simple bug enabling writes beyond 2GB.
  9p: Change the name of new protocol from 9p2010.L to 9p2000.L
  fs/9p: re-init the wstat in readdir loop
  net/9p: Add sysfs mount_tag file for virtio 9P device
  net/9p: Use the tag name in the config space for identifying mount point
2010-03-14 11:11:08 -07:00
Linus Torvalds d89b218b80 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (108 commits)
  bridge: ensure to unlock in error path in br_multicast_query().
  drivers/net/tulip/eeprom.c: fix bogus "(null)" in tulip init messages
  sky2: Avoid rtnl_unlock without rtnl_lock
  ipv6: Send netlink notification when DAD fails
  drivers/net/tg3.c: change the field used with the TG3_FLAG_10_100_ONLY constant
  ipconfig: Handle devices which take some time to come up.
  mac80211: Fix memory leak in ieee80211_if_write()
  mac80211: Fix (dynamic) power save entry
  ipw2200: use kmalloc for large local variables
  ath5k: read eeprom IQ calibration values correctly for G mode
  ath5k: fix I/Q calibration (for real)
  ath5k: fix TSF reset
  ath5k: use fixed antenna for tx descriptors
  libipw: split ieee->networks into small pieces
  mac80211: Fix sta_mtx unlocking on insert STA failure path
  rt2x00: remove KSEG1ADDR define from rt2x00soc.h
  net: add ColdFire support to the smc91x driver
  asix: fix setting mac address for AX88772
  ipv6 ip6_tunnel: eliminate unused recursion field from ip6_tnl{}.
  net: Fix dev_mc_add()
  ...
2010-03-13 14:50:18 -08:00
YOSHIFUJI Hideaki bec68ff163 bridge: ensure to unlock in error path in br_multicast_query().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-13 12:27:21 -08:00
Herbert Xu e2577a0658 ipv6: Send netlink notification when DAD fails
If we are managing IPv6 addresses using DHCP, it would be nice
for user-space to be notified if an address configured through
DHCP fails DAD.  Otherwise user-space would have to poll to see
whether DAD succeeds.

This patch uses the existing notification mechanism and simply
hooks it into the DAD failure code path.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-13 12:23:29 -08:00
David S. Miller a003460b21 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2010-03-13 12:17:09 -08:00
Sripathi Kodi 45bc21edb5 9p: Change the name of new protocol from 9p2010.L to 9p2000.L
This patch changes the name of the new 9P protocol from 9p2010.L to
9p2000.u. This is because we learnt that the name 9p2010 is already
being used by others.

Signed-off-by: Sripathi Kodi <sripathik@in.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-03-13 08:57:29 -06:00
Aneesh Kumar K.V 86c8437383 net/9p: Add sysfs mount_tag file for virtio 9P device
This adds a new file for virtio 9P device. The file
contain details of the mount device name that should
be used to mount the 9P file system.

Ex: /sys/devices/virtio-pci/virtio1/mount_tag  file now
contian the tag name to be used to mount the 9P file system.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-03-13 08:57:29 -06:00
Aneesh Kumar K.V 97ee9b0257 net/9p: Use the tag name in the config space for identifying mount point
This patch use the tag name in the config space to identify the
mount device. The the virtio device name depend on the enumeration
order of the device and may not remain the same across multiple boots
So we use the tag name which is set via qemu option to uniquely identify
the mount device

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-03-13 08:57:28 -06:00
Linus Torvalds c32da02342 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (56 commits)
  doc: fix typo in comment explaining rb_tree usage
  Remove fs/ntfs/ChangeLog
  doc: fix console doc typo
  doc: cpuset: Update the cpuset flag file
  Fix of spelling in arch/sparc/kernel/leon_kernel.c no longer needed
  Remove drivers/parport/ChangeLog
  Remove drivers/char/ChangeLog
  doc: typo - Table 1-2 should refer to "status", not "statm"
  tree-wide: fix typos "ass?o[sc]iac?te" -> "associate" in comments
  No need to patch AMD-provided drivers/gpu/drm/radeon/atombios.h
  devres/irq: Fix devm_irq_match comment
  Remove reference to kthread_create_on_cpu
  tree-wide: Assorted spelling fixes
  tree-wide: fix 'lenght' typo in comments and code
  drm/kms: fix spelling in error message
  doc: capitalization and other minor fixes in pnp doc
  devres: typo fix s/dev/devm/
  Remove redundant trailing semicolons from macros
  fix typo "definetly" -> "definitely" in comment
  tree-wide: s/widht/width/g typo in comments
  ...

Fix trivial conflict in Documentation/laptops/00-INDEX
2010-03-12 16:04:50 -08:00
David S. Miller 964ad81cbd ipconfig: Handle devices which take some time to come up.
Some network devices, particularly USB ones, take several seconds to
fully init and appear in the device list.

If the user turned ipconfig on, they are using it for NFS root or some
other early booting purpose.  So it makes no sense to just flat out
fail immediately if the device isn't found.

It also doesn't make sense to just jack up the initial wait to
something crazy like 10 seconds.

Instead, poll immediately, and then periodically once a second,
waiting for a usable device to appear.  Fail after 12 seconds.

Signed-off-by: David S. Miller <davem@davemloft.net>
Tested-by: Christian Pellegrin <chripell@fsfe.org>
2010-03-12 00:00:17 -08:00
Felix Fietkau eae44756d6 minstrel: make the rate control ops reusable from another rc implementation
This patch makes it possible to reuse the minstrel rate control ops
from another rate control module. This is useful in preparing for the
new 802.11n implementation of minstrel, which will reuse the old code
for legacy stations.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-03-10 17:44:23 -05:00
Felix Fietkau 44ac91ea84 minstrel: simplify and fix debugfs code
This patch cleans up the debugfs read function for the statistics by
using simple_read_from_buffer instead of its own semi-broken hack.
Also removes a useless member of the minstrel debugfs info struct.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-03-10 17:44:17 -05:00
Andres Salomon c2ef355bf3 mac80211: give warning if building w/out rate ctrl algorithm
I discovered that if EMBEDDED=y, one can accidentally build a mac80211 stack
and drivers w/ no rate control algorithm.  For drivers like RTL8187 that don't
supply their own RC algorithms, this will cause ieee80211_register_hw to
fail (making the driver unusable).

This will tell kconfig to provide a warning if no rate control algorithms
have been selected.  That'll at least warn the user; users that know that
their drivers supply a rate control algorithm can safely ignore the
warning, and those who don't know (or who expect to be using multiple
drivers) can select a default RC algorithm.

Signed-off-by: Andres Salomon <dilinger@collabora.co.uk>
Cc: stable@kernel.org
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-03-10 17:09:40 -05:00
florian@mickler.org 6c26361e4b enhance sysfs rfkill interface
This commit introduces two new sysfs knobs.

/sys/class/rfkill/rfkill[0-9]+/blocked_hw: (ro)
	hardblock kill state
/sys/class/rfkill/rfkill[0-9]+/blocked_sw: (rw)
	softblock kill state

Signed-off-by: Florian Mickler <florian@mickler.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-03-10 17:09:34 -05:00
Eric Dumazet 51f5f8ca44 mac80211: Fix memory leak in ieee80211_if_write()
Fix memory leak and use kmalloc() instead of kzalloc() as we are going
to overwrite the allocated buffer.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-03-10 16:29:11 -05:00
Juuso Oikarinen 2a13052fe4 mac80211: Fix (dynamic) power save entry
Currently hardware with !IEEE80211_HW_PS_NULLFUNC_STACK and
IEEE80211_HW_REPORTS_TX_ACK_STATUS will never enter PSM due to the
conditions in the power save entry functions.

Fix those conditions.

Signed-off-by: Juuso Oikarinen <juuso.oikarinen@nokia.com>
Cc: stable@kernel.org
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-03-10 16:29:10 -05:00
Jouni Malinen 38a679a52b mac80211: Fix sta_mtx unlocking on insert STA failure path
Commit 34e895075e introduced sta_mtx
locking into sta_info_insert() (now sta_info_insert_rcu), but forgot
to unlock this mutex on one of the error paths. Fix this by adding
the missing mutex_unlock() call for the case where STA insert fails
due to an entry existing already. This may happen at least in AP mode
when a STA roams between two BSSes (vifs).

Signed-off-by: Jouni Malinen <j@w1.fi>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-03-10 16:16:54 -05:00
Eric Dumazet 3041f51707 net: Fix dev_mc_add()
Commit 6e17d45a (net: add addr len check to dev_mc_add)
added a bug in dev_mc_add(), since it can now exit with a lock
imbalance.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-10 07:32:28 -08:00
Eric Dumazet 0a141509ed net: Annotates neigh_invalidate()
Annotates neigh_invalidate() with __releases() and __acquires() for
sparse sake.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-10 07:32:28 -08:00
Eric Dumazet bb134d5d95 tcp: Fix tcp_v4_rcv()
Commit d218d111 (tcp: Generalized TTL Security Mechanism) added a bug
for TIMEWAIT sockets. We should not test min_ttl for TW sockets.

Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-10 07:32:27 -08:00
Johannes Berg fa9029f8c3 mac80211: use different MAC addresses for virtual interfaces
Drivers can now advertise to cfg80211 that they have
multiple MAC addresses reserved for a device, but we
don't currently make use of that in mac80211.

Change that and assign different addresses to new
virtual interfaces (if addresses are available) in
order to make it easier for users to use multiple
virtual interfaces; they no longer need to always
assign a new MAC address manually.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-03-09 15:03:07 -05:00
Helmut Schaa df13cce53a mac80211: Improve software scan timing
The current software scan implemenation in mac80211 returns to the operating
channel after each scanned channel. However, in some situations (e.g. no
traffic) it would be nicer to scan a few channels in a row to speed up
the scan itself.

Hence, after scanning a channel, check if we have queued up any tx frames and
return to the operating channel in that case.

Unfortunately we don't know if the AP has buffered any frames for us. Hence,
scan only as many channels in a row as the pm_qos latency and the negotiated
listen interval allows us to.

Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-03-09 15:03:07 -05:00
Bruno Randolf b4d59a9317 mac80211: fix rates setup on IBSS merge
when an IBSS merge happened, the supported rates for the newly added station
were left empty, causing the rate control module to be initialized with only
the basic rates.

also the section of the ibss code which deals with updating supported rates for
an already existing station fails to inform the rate control module about the
new rates. as i don't know how to fix this (minstrel does not have an update
function), i have just added a comment for now.

Signed-off-by: Bruno Randolf <br1@einfach.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-03-09 15:03:06 -05:00
Johannes Berg 62bb2ac5cb mac80211: deprecate RX status noise
The noise value as is won't be used, isn't
filled by most drivers and doesn't really
make a whole lot of sense on a per packet
basis -- proper cfg80211 survey support in
mac80211 will need to be different.

Mark the struct member as deprecated so it
will be removed from drivers.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-03-09 15:02:53 -05:00
Neil Horman de58657146 tipc: filter out messages not intended for this host
Port commit 20deb48d16fdd07ce2fdc8d03ea317362217e085
from git://tipc.cslab.ericsson.net/pub/git/people/allan/tipc.git

Part of the large effort I'm trying to help with getting all the downstreamed
code from windriver forward ported to the upstream tree

Origional commit message
Restore check to filter out inadverdently received messages
This patch reimplements a check that allows TIPC to discard messages
that are not intended for it.  This check was present in TIPC 1.5/1.6,
but was removed by accident during the development of TIPC 1.7; it has
now been updated to account for new features present in TIPC 1.7 and
reinserted into TIPC.  The main benefit of this check is to filter
out messages arriving from orphaned link endpoints, which can arise
when a node exits the network and then re-enters it with a different
TIPC network address (i.e. <Z.C.N> value).

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Origionally-authored-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-08 12:43:56 -08:00
Neil Horman d88dca79d3 tipc: fix endianness on tipc subscriber messages
Remove htohl implementation from tipc

I was working on forward porting the downstream commits for TIPC and ran accross this one:
http://tipc.cslab.ericsson.net/cgi-bin/gitweb.cgi?p=people/allan/tipc.git;a=commitdiff;h=894279b9437b63cbb02405ad5b8e033b51e4e31e

I was going to just take it, when I looked closer and noted what it was doing.
This is basically a routine to byte swap fields of data in sent/received packets
for tipc, dependent upon the receivers guessed endianness of the peer when a
connection is established.  Asside from just seeming silly to me, it appears to
violate the latest RFC draft for tipc:
http://tipc.sourceforge.net/doc/draft-spec-tipc-02.txt
Which, according to section 4.2 and 4.3.3, requires that all fields of all
commands be sent in network byte order.  So instead of just taking this patch,
instead I'm removing the htohl function and replacing the calls with calls to
ntohl in the rx path and htonl in the send path.

As part of this fix, I'm also changing the subscr_cancel function, which
searches the list of subscribers, using a memcmp of the entire subscriber list,
for the entry to tear down.  unfortunately it memcmps the entire tipc_subscr
structure which has several bits that are private to the local side, so nothing
will ever match.  section 5.2 of the draft spec indicates the <type,upper,lower>
tuple should uniquely identify a subscriber, so convert subscr_cancel to just
match on those fields (properly endian swapped).

I've tested this using the tipc test suite, and its passed without issue.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-08 12:20:58 -08:00
Eric Dumazet f5c445ed41 ethtool: Use noinline_for_stack
Use self documenting noinline_for_stack instead of duplicated comments.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-08 12:17:04 -08:00
Joe Perches 81160e66cc net/sunrpc: Convert (void)snprintf to snprintf
(Applies on top of "Remove uses of NIPQUAD, use %pI4")

Casts to void of snprintf are most uncommon in kernel source.
9 use casts, 1301 do not.

Remove the remaining uses in net/sunrpc/

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-08 12:15:59 -08:00
Joe Perches fc0b579168 net/sunrpc: Remove uses of NIPQUAD, use %pI4
Originally submitted Jan 1, 2010
http://patchwork.kernel.org/patch/71221/

Convert NIPQUAD to the %pI4 format extension where possible
Convert %02x%02x%02x%02x/NIPQUAD to %08x/ntohl

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-08 12:15:28 -08:00
Eric Dumazet 28b2774a0d tcp: Fix tcp_make_synack()
Commit 4957faad (TCPCT part 1g: Responder Cookie => Initiator), part
of TCP_COOKIE_TRANSACTION implementation, forgot to correctly size
synack skb in case user data must be included.

Many thanks to Mika Pentillä for spotting this error.

Reported-by: Penttillä Mika <mika.penttila@ixonos.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-08 11:32:01 -08:00
Bian Naimeng 5fe46e9d73 rpc client can not deal with ENOSOCK, so translate it into ENOCONN
If NFSv4 client send a request before connect, or the old connection was broken
because a ETIMEOUT error catched by call_status, ->send_request will return
ENOSOCK, but rpc layer can not deal with it, so make sure ->send_request can
translate ENOSOCK into ENOCONN.

Signed-off-by: Bian Naimeng <biannm@cn.fujitsu.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-03-08 14:05:57 -05:00
Eric Dumazet 9837638727 net: fix route cache rebuilds
We added an automatic route cache rebuilding in commit 1080d709fb
but had to correct few bugs. One of the assumption of original patch,
was that entries where kept sorted in a given way.

This assumption is known to be wrong (commit 1ddbcb005c gave an
explanation of this and corrected a leak) and expensive to respect.

Paweł Staszewski reported to me one of his machine got its routing cache
disabled after few messages like :

[ 2677.850065] Route hash chain too long!
[ 2677.850080] Adjust your secret_interval!
[82839.662993] Route hash chain too long!
[82839.662996] Adjust your secret_interval!
[155843.731650] Route hash chain too long!
[155843.731664] Adjust your secret_interval!
[155843.811881] Route hash chain too long!
[155843.811891] Adjust your secret_interval!
[155843.858209] vlan0811: 5 rebuilds is over limit, route caching
disabled
[155843.858212] Route hash chain too long!
[155843.858213] Adjust your secret_interval!

This is because rt_intern_hash() might be fooled when computing a chain
length, because multiple entries with same keys can differ because of
TOS (or mark/oif) bits.

In the rare case the fast algorithm see a too long chain, and before
taking expensive path, we call a helper function in order to not count
duplicates of same routes, that only differ with tos/mark/oif bits. This
helper works with data already in cpu cache and is not be very
expensive, despite its O(N^2) implementation.

Paweł Staszewski sucessfully tested this patch on his loaded router.

Reported-and-tested-by: Paweł Staszewski <pstaszewski@itcare.pl>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-08 10:45:31 -08:00
Eric Dumazet 6cce09f87a tcp: Add SNMP counters for backlog and min_ttl drops
Commit 6b03a53a (tcp: use limited socket backlog) added the possibility
of dropping frames when backlog queue is full.

Commit d218d111 (tcp: Generalized TTL Security Mechanism) added the
possibility of dropping frames when TTL is under a given limit.

This patch adds new SNMP MIB entries, named TCPBacklogDrop and
TCPMinTTLDrop, published in /proc/net/netstat in TcpExt: line

netstat -s | egrep "TCPBacklogDrop|TCPMinTTLDrop"
    TCPBacklogDrop: 0
    TCPMinTTLDrop: 0

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-08 10:45:27 -08:00
Jiri Kosina 318ae2edc3 Merge branch 'for-next' into for-linus
Conflicts:
	Documentation/filesystems/proc.txt
	arch/arm/mach-u300/include/mach/debug-macro.S
	drivers/net/qlge/qlge_ethtool.c
	drivers/net/qlge/qlge_main.c
	drivers/net/typhoon.c
2010-03-08 16:55:37 +01:00
Emese Revfy 52cf25d0ab Driver core: Constify struct sysfs_ops in struct kobj_type
Constify struct sysfs_ops.

This is part of the ops structure constification
effort started by Arjan van de Ven et al.

Benefits of this constification:

 * prevents modification of data that is shared
   (referenced) by many other structure instances
   at runtime

 * detects/prevents accidental (but not intentional)
   modification attempts on archs that enforce
   read-only kernel data at runtime

 * potentially better optimized code as the compiler
   can assume that the const data cannot be changed

 * the compiler/linker move const data into .rodata
   and therefore exclude them from false sharing

Signed-off-by: Emese Revfy <re.emese@gmail.com>
Acked-by: David Teigland <teigland@redhat.com>
Acked-by: Matt Domsch <Matt_Domsch@dell.com>
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Acked-by: Hans J. Koch <hjk@linutronix.de>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-03-07 17:04:49 -08:00
Andi Kleen 28812fe11a driver-core: Add attribute argument to class_attribute show/store
Passing the attribute to the low level IO functions allows all kinds
of cleanups, by sharing low level IO code without requiring
an own function for every piece of data.

Also drivers can extend the attributes with own data fields
and use that in the low level function.

This makes the class attributes the same as sysdev_class attributes
and plain attributes.

This will allow further cleanups in drivers.

Full tree sweep converting all users.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-03-07 17:04:48 -08:00
Herbert Xu 10cc2b50eb bridge: Fix RCU race in br_multicast_stop
Thanks to Paul McKenny for pointing out that it is incorrect to use
synchronize_rcu_bh to ensure that pending callbacks have completed.
Instead we should use rcu_barrier_bh.

Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-07 15:31:12 -08:00
Herbert Xu 49f5fcfd4a bridge: Use RCU list primitive in __br_mdb_ip_get
As Paul McKenney correctly pointed out, __br_mdb_ip_get needs
to use the RCU list walking primitive in order to work correctly
on platforms where data-dependency ordering is not guaranteed.

Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-07 15:31:12 -08:00
YOSHIFUJI Hideaki / 吉藤英明 0c9a2ac1f8 ipv6: Optmize translation between IPV6_PREFER_SRC_xxx and RT6_LOOKUP_F_xxx.
IPV6_PREFER_SRC_xxx definitions:
| #define IPV6_PREFER_SRC_TMP             0x0001
| #define IPV6_PREFER_SRC_PUBLIC          0x0002
| #define IPV6_PREFER_SRC_COA             0x0004

RT6_LOOKUP_F_xxx definitions:
| #define RT6_LOOKUP_F_SRCPREF_TMP        0x00000008
| #define RT6_LOOKUP_F_SRCPREF_PUBLIC     0x00000010
| #define RT6_LOOKUP_F_SRCPREF_COA        0x00000020

So, we can translate between these two groups by shift operation
instead of multiple 'if's.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-07 15:25:53 -08:00
Dan Carpenter 72150e9b7f sock.c: potential null dereference
We test that "prot->rsk_prot" is non-null right before we dereference it
on this line.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-07 15:25:50 -08:00
Dan Carpenter 02a780c014 bridge: cleanup: remove unneed check
We dereference "port" on the lines immediately before and immediately
after the test so port should hopefully never be null here.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-07 15:25:49 -08:00
Linus Torvalds 05c5cb31ec Merge branch 'for-2.6.34' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.34' of git://linux-nfs.org/~bfields/linux: (22 commits)
  nfsd4: fix minor memory leak
  svcrpc: treat uid's as unsigned
  nfsd: ensure sockets are closed on error
  Revert "sunrpc: move the close processing after do recvfrom method"
  Revert "sunrpc: fix peername failed on closed listener"
  sunrpc: remove unnecessary svc_xprt_put
  NFSD: NFSv4 callback client should use RPC_TASK_SOFTCONN
  xfs_export_operations.commit_metadata
  commit_metadata export operation replacing nfsd_sync_dir
  lockd: don't clear sm_monitored on nsm_reboot_lookup
  lockd: release reference to nsm_handle in nlm_host_rebooted
  nfsd: Use vfs_fsync_range() in nfsd_commit
  NFSD: Create PF_INET6 listener in write_ports
  SUNRPC: NFS kernel APIs shouldn't return ENOENT for "transport not found"
  SUNRPC: Bury "#ifdef IPV6" in svc_create_xprt()
  NFSD: Support AF_INET6 in svc_addsock() function
  SUNRPC: Use rpc_pton() in ip_map_parse()
  nfsd: 4.1 has an rfc number
  nfsd41: Create the recovery entry for the NFSv4.1 client
  nfsd: use vfs_fsync for non-directories
  ...
2010-03-06 11:31:38 -08:00
H Hartley Sweeten 72c3368856 nodemask.h: remove macro any_online_node
The macro any_online_node() is prone to producing sparse warnings due to
the local symbol 'node'.  Since all the in-tree users are really
requesting the first online node (the mask argument is either
NODE_MASK_ALL or node_online_map) just use the first_online_node macro and
remove the any_online_node macro since there are no users.

Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Acked-by: David Rientjes <rientjes@google.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Milton Miller <miltonm@bga.com>
Cc: Nathan Fontenot <nfont@austin.ibm.com>
Cc: Geoff Levand <geoffrey.levand@am.sony.com>
Cc: Grant Likely <grant.likely@secretlab.ca>
Cc: J. Bruce Fields <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Benny Halevy <bhalevy@panasas.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-06 11:26:31 -08:00
Jeff Garzik d17792ebdf ethtool: Add direct access to ops->get_sset_count
On 03/04/2010 09:26 AM, Ben Hutchings wrote:
> On Thu, 2010-03-04 at 00:51 -0800, Jeff Kirsher wrote:
>> From: Jeff Garzik<jgarzik@redhat.com>
>>
>> This patch is an alternative approach for accessing string
>> counts, vs. the drvinfo indirect approach.  This way the drvinfo
>> space doesn't run out, and we don't break ABI later.
> [...]
>> --- a/net/core/ethtool.c
>> +++ b/net/core/ethtool.c
>> @@ -214,6 +214,10 @@ static noinline int ethtool_get_drvinfo(struct net_device *dev, void __user *use
>>   	info.cmd = ETHTOOL_GDRVINFO;
>>   	ops->get_drvinfo(dev,&info);
>>
>> +	/*
>> +	 * this method of obtaining string set info is deprecated;
>> +	 * consider using ETHTOOL_GSSET_INFO instead
>> +	 */
>
> This comment belongs on the interface (ethtool.h) not the
> implementation.

Debatable -- the current comment is located at the callsite of
ops->get_sset_count(), which is where an implementor might think to add
a new call.  Not all the numeric fields in ethtool_drvinfo are obtained
from ->get_sset_count().

Hence the "some" in the attached patch to include/linux/ethtool.h,
addressing your comment.

> [...]
>> +static noinline int ethtool_get_sset_info(struct net_device *dev,
>> +                                          void __user *useraddr)
>> +{
> [...]
>> +	/* calculate size of return buffer */
>> +	for (i = 0; i<  64; i++)
>> +		if (sset_mask&  (1ULL<<  i))
>> +			n_bits++;
> [...]
>
> We have a function for this:
>
> 	n_bits = hweight64(sset_mask);

Agreed.

I've attached a follow-up patch, which should enable my/Jeff's kernel
patch to be applied, followed by this one.

Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-05 14:00:17 -08:00
Jeff Garzik 723b2f57ad ethtool: Add direct access to ops->get_sset_count
This patch is an alternative approach for accessing string
counts, vs. the drvinfo indirect approach.  This way the drvinfo
space doesn't run out, and we don't break ABI later.

Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-05 14:00:17 -08:00
Zhu Yi a3a858ff18 net: backlog functions rename
sk_add_backlog -> __sk_add_backlog
sk_add_backlog_limited -> sk_add_backlog

Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-05 13:34:03 -08:00
Zhu Yi 2499849ee8 x25: use limited socket backlog
Make x25 adapt to the limited socket backlog change.

Cc: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-05 13:34:02 -08:00
Zhu Yi 53eecb1be5 tipc: use limited socket backlog
Make tipc adapt to the limited socket backlog change.

Cc: Jon Maloy <jon.maloy@ericsson.com>
Cc: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-05 13:34:02 -08:00
Zhu Yi 50b1a782f8 sctp: use limited socket backlog
Make sctp adapt to the limited socket backlog change.

Cc: Vlad Yasevich <vladislav.yasevich@hp.com>
Cc: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-05 13:34:01 -08:00
Zhu Yi 79545b6819 llc: use limited socket backlog
Make llc adapt to the limited socket backlog change.

Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-05 13:34:01 -08:00
Zhu Yi 55349790d7 udp: use limited socket backlog
Make udp adapt to the limited socket backlog change.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: "Pekka Savola (ipv6)" <pekkas@netcore.fi>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-05 13:34:00 -08:00
Zhu Yi 6b03a53a5a tcp: use limited socket backlog
Make tcp adapt to the limited socket backlog change.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: "Pekka Savola (ipv6)" <pekkas@netcore.fi>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-05 13:34:00 -08:00
Zhu Yi 8eae939f14 net: add limit for socket backlog
We got system OOM while running some UDP netperf testing on the loopback
device. The case is multiple senders sent stream UDP packets to a single
receiver via loopback on local host. Of course, the receiver is not able
to handle all the packets in time. But we surprisingly found that these
packets were not discarded due to the receiver's sk->sk_rcvbuf limit.
Instead, they are kept queuing to sk->sk_backlog and finally ate up all
the memory. We believe this is a secure hole that a none privileged user
can crash the system.

The root cause for this problem is, when the receiver is doing
__release_sock() (i.e. after userspace recv, kernel udp_recvmsg ->
skb_free_datagram_locked -> release_sock), it moves skbs from backlog to
sk_receive_queue with the softirq enabled. In the above case, multiple
busy senders will almost make it an endless loop. The skbs in the
backlog end up eat all the system memory.

The issue is not only for UDP. Any protocols using socket backlog is
potentially affected. The patch adds limit for socket backlog so that
the backlog size cannot be expanded endlessly.

Reported-by: Alex Shi <alex.shi@intel.com>
Cc: David Miller <davem@davemloft.net>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru
Cc: "Pekka Savola (ipv6)" <pekkas@netcore.fi>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Vlad Yasevich <vladislav.yasevich@hp.com>
Cc: Sridhar Samudrala <sri@us.ibm.com>
Cc: Jon Maloy <jon.maloy@ericsson.com>
Cc: Allan Stephens <allan.stephens@windriver.com>
Cc: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-05 13:33:59 -08:00
Linus Torvalds cc7889ff5e Merge branch 'nfs-for-2.6.34' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'nfs-for-2.6.34' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (44 commits)
  NFS: Remove requirement for inode->i_mutex from nfs_invalidate_mapping
  NFS: Clean up nfs_sync_mapping
  NFS: Simplify nfs_wb_page()
  NFS: Replace __nfs_write_mapping with sync_inode()
  NFS: Simplify nfs_wb_page_cancel()
  NFS: Ensure inode is always marked I_DIRTY_DATASYNC, if it has unstable pages
  NFS: Run COMMIT as an asynchronous RPC call when wbc->for_background is set
  NFS: Reduce the number of unnecessary COMMIT calls
  NFS: Add a count of the number of unstable writes carried by an inode
  NFS: Cleanup - move nfs_write_inode() into fs/nfs/write.c
  nfs41 fix NFS4ERR_CLID_INUSE for exchange id
  NFS: Fix an allocation-under-spinlock bug
  SUNRPC: Handle EINVAL error returns from the TCP connect operation
  NFSv4.1: Various fixes to the sequence flag error handling
  nfs4: renewd renew operations should take/put a client reference
  nfs41: renewd sequence operations should take/put client reference
  nfs: prevent backlogging of renewd requests
  nfs: kill renewd before clearing client minor version
  NFS: Make close(2) asynchronous when closing NFS O_DIRECT files
  NFS: Improve NFS iostat byte count accuracy for writes
  ...
2010-03-05 13:25:45 -08:00
Sripathi Kodi c5a7697da9 9P2010.L handshake: .L protocol negotiation
This patch adds 9P2010.L protocol negotiation with the server

Signed-off-by: Sripathi Kodi <sripathik@in.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-03-05 15:04:42 -06:00
Sripathi Kodi 342fee1d5c 9P2010.L handshake: Remove "dotu" variable
Removes 'dotu' variable and make everything dependent
on 'proto_version' field.

Signed-off-by: Sripathi Kodi <sripathik@in.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-03-05 15:04:42 -06:00