Commit Graph

14973 Commits

Author SHA1 Message Date
John Hughes ddd0451fc8 x.25 attempts to negotiate invalid throughput
The current X.25 code has some bugs in throughput negotiation:

   1. It does negotiation in all cases, usually there is no need
   2. It incorrectly attempts to negotiate the throughput class in one
      direction only.  There are separate throughput classes for input
      and output and if either is negotiated both mist be negotiates.

This is bug https://bugzilla.kernel.org/show_bug.cgi?id=15681

This bug was first reported by Daniel Ferenci to the linux-x25 mailing
list on 6/8/2004, but is still present.

The current (2.6.34) x.25 code doesn't seem to know that the X.25
throughput facility includes two values, one for the required
throughput outbound, one for inbound.

This causes it to attempt to negotiate throughput 0x0A, which is
throughput 9600 inbound and the illegal value "0" for inbound
throughput.

Because of this some X.25 devices (e.g. Cisco 1600) refuse to connect
to Linux X.25.

The following patch fixes this behaviour.  Unless the user specifies a
required throughput it does not attempt to negotiate.  If the user
does not specify a throughput it accepts the suggestion of the remote
X.25 system.  If the user requests a throughput then it validates both
the input and output throughputs and correctly negotiates them with
the remote end.

Signed-off-by: John Hughes <john@calva.com>
Tested-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-07 21:33:02 -07:00
John Hughes f5eb917b86 x25: Patch to fix bug 15678 - x25 accesses fields beyond end of packet.
Here is a patch to stop X.25 examining fields beyond the end of the packet.

For example, when a simple CALL ACCEPTED was received:

	10 10 0f

x25_parse_facilities was attempting to decode the FACILITIES field, but this
packet contains no facilities field.

Signed-off-by: John Hughes <john@calva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-07 21:29:25 -07:00
Herbert Xu fd218cf955 bridge: Fix IGMP3 report parsing
The IGMP3 report parsing is looking at the wrong address for
group records.  This patch fixes it.

Reported-by: Banyeer <banyeer@yahoo.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-07 21:20:47 -07:00
David S. Miller 005c93b5d8 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2010-04-07 16:41:03 -07:00
Johannes Berg 0379185b6c mac80211: annotate station rcu dereferences
The new RCU lockdep support warns about these
in some contexts -- make it aware of the locks
used to protect all this. Different locks are
used in different contexts which unfortunately
means we can't get perfect checking.

Also remove rcu_dereference() from two places
that don't actually dereference the pointers.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-04-06 15:53:30 -04:00
Javier Cardona 1cb561f837 mac80211: Handle mesh action frames in ieee80211_rx_h_action
This fixes the problem introduced in commit
8404080568 which broke mesh peer link establishment.

changes:
v2 	Added missing break (Johannes)
v3 	Broke original patch into two (Johannes)

Signed-off-by: Javier Cardona <javier@cozybit.com>
Cc: stable@kernel.org
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-04-06 15:53:28 -04:00
Linus Torvalds cb4361c1dc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (37 commits)
  smc91c92_cs: fix the problem of "Unable to find hardware address"
  r8169: clean up my printk uglyness
  net: Hook up cxgb4 to Kconfig and Makefile
  cxgb4: Add main driver file and driver Makefile
  cxgb4: Add remaining driver headers and L2T management
  cxgb4: Add packet queues and packet DMA code
  cxgb4: Add HW and FW support code
  cxgb4: Add register, message, and FW definitions
  netlabel: Fix several rcu_dereference() calls used without RCU read locks
  bonding: fix potential deadlock in bond_uninit()
  net: check the length of the socket address passed to connect(2)
  stmmac: add documentation for the driver.
  stmmac: fix kconfig for crc32 build error
  be2net: fix bug in vlan rx path for big endian architecture
  be2net: fix flashing on big endian architectures
  be2net: fix a bug in flashing the redboot section
  bonding: bond_xmit_roundrobin() fix
  drivers/net: Add missing unlock
  net: gianfar - align BD ring size console messages
  net: gianfar - initialize per-queue statistics
  ...
2010-04-06 08:34:06 -07:00
Linus Torvalds 749d229761 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
  9p: saving negative to unsigned char
  9p: return on mutex_lock_interruptible()
  9p: Creating files with names too long should fail with ENAMETOOLONG.
  9p: Make sure we are able to clunk the cached fid on umount
  9p: drop nlink remove
  fs/9p: Clunk the fid resulting from partial walk of the name
  9p: documentation update
  9p: Fix setting of protocol flags in v9fs_session_info structure.
2010-04-05 13:42:54 -07:00
Dan Carpenter 3dc9fef67f 9p: saving negative to unsigned char
Saving -EINVAL as unsigned char truncates the high bits and changes it
into 234 instead of -22.  This breaks the test for "if (ret == -EINVAL)"
in parse_opts().

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-04-05 14:37:28 -05:00
Tom Tucker bade732a28 svcrdma: RDMA support not yet compatible with RPC6
RPC6 requires that it be possible to create endpoints that listen
exclusively for IPv4 or IPv6 connection requests. This is not currently
supported by the RDMA API.

This fixes a server RDMA regression introduced by 37498292a "NFSD:
Create PF_INET6 listener in write_ports".

Signed-off-by: Tom Tucker<tom@opengridcomputing.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-04-05 12:10:22 -04:00
Aneesh Kumar K.V 6d96d3ab7a 9p: Make sure we are able to clunk the cached fid on umount
dcache prune happen on umount. So we cannot mark the client
satus disconnect. That will prevent a 9p call to the server

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-04-05 10:37:36 -05:00
Paul Moore b914f3a2a3 netlabel: Fix several rcu_dereference() calls used without RCU read locks
The recent changes to add RCU lock verification to rcu_dereference() calls
caught out a problem with netlbl_unlhsh_hash(), see below.

 ===================================================
 [ INFO: suspicious rcu_dereference_check() usage. ]
 ---------------------------------------------------
 net/netlabel/netlabel_unlabeled.c:246 invoked rcu_dereference_check()
 without protection!

This patch fixes this problem as well as others like it in the NetLabel
code.  Also included in this patch is the identification of future work
to eliminate the RCU read lock in netlbl_domhsh_add(), but in the interest
of getting this patch out quickly that work will happen in another patch
to be finished later.

Thanks to Eric Dumazet and Paul McKenney for their help in understanding
the recent RCU changes.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Reported-by: David Howells <dhowells@redhat.com>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-01 18:32:08 -07:00
Changli Gao 6503d96168 net: check the length of the socket address passed to connect(2)
check the length of the socket address passed to connect(2).

Check the length of the socket address passed to connect(2). If the
length is invalid, -EINVAL will be returned.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
----
net/bluetooth/l2cap.c | 3 ++-
net/bluetooth/rfcomm/sock.c | 3 ++-
net/bluetooth/sco.c | 3 ++-
net/can/bcm.c | 3 +++
net/ieee802154/af_ieee802154.c | 3 +++
net/ipv4/af_inet.c | 5 +++++
net/netlink/af_netlink.c | 3 +++
7 files changed, 20 insertions(+), 3 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-01 17:26:01 -07:00
David S. Miller d5dc056cce Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2010-03-31 19:32:50 -07:00
Steven J. Magnani baff42ab14 net: Fix oops from tcp_collapse() when using splice()
tcp_read_sock() can have a eat skbs without immediately advancing copied_seq.
This can cause a panic in tcp_collapse() if it is called as a result
of the recv_actor dropping the socket lock.

A userspace program that splices data from a socket to either another
socket or to a file can trigger this bug.

Signed-off-by: Steven J. Magnani <steve@digidescorp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-30 13:56:01 -07:00
Johannes Berg 7236fe29fd mac80211: move netdev queue enabling to correct spot
"mac80211: fix skb buffering issue" still left a race
between enabling the hardware queues and the virtual
interface queues. In hindsight it's totally obvious
that enabling the netdev queues for a hardware queue
when the hardware queue is enabled is wrong, because
it could well possible that we can fill the hw queue
with packets we already have pending. Thus, we must
only enable the netdev queues once all the pending
packets have been processed and sent off to the device.

In testing, I haven't been able to trigger this race
condition, but it's clearly there, possibly only when
aggregation is being enabled.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Cc: stable@kernel.org
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-03-30 15:37:28 -04:00
Porsch, Marco 533866b12c mac80211: fix PREQ processing and one small bug
1st) a PREQ should only be processed, if it has the same SN and better
metric (instead of better or equal).
2nd) next_hop[ETH_ALEN] now actually used to buffer
mpath->next_hop->sta.addr for use out of lock.

Signed-off-by: Marco Porsch <marco.porsch@siemens.com>
Acked-by: Javier Cardona <javier@cozybit.com>
Cc: stable@kernel.org
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-03-30 15:37:23 -04:00
John W. Linville c7a00dc73b mac80211: correct typos in "unavailable upon resume" warning
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-03-30 15:37:22 -04:00
John W. Linville 368d06f5b0 wireless: convert reg_regdb_search_lock to mutex
Stanse discovered that kmalloc is being called with GFP_KERNEL while
holding this spinlock.  The spinlock can be a mutex instead, which also
enables the removal of the unlock/lock around the lock/unlock of
cfg80211_mutex and the call to set_regdom.

Reported-by: Jiri Slaby <jirislaby@gmail.com>
Cc: stable@kernel.org
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-03-30 15:37:20 -04:00
Tejun Heo 5a0e3ad6af include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files.  percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed.  Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability.  As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

  http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
  only the necessary includes are there.  ie. if only gfp is used,
  gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
  blocks and try to put the new include such that its order conforms
  to its surrounding.  It's put in the include block which contains
  core kernel includes, in the same order that the rest are ordered -
  alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
  doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
  because the file doesn't have fitting include block), it prints out
  an error message indicating which .h file needs to be added to the
  file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
   over 4000 files, deleting around 700 includes and adding ~480 gfp.h
   and ~3000 slab.h inclusions.  The script emitted errors for ~400
   files.

2. Each error was manually checked.  Some didn't need the inclusion,
   some needed manual addition while adding it to implementation .h or
   embedding .c file was more appropriate for others.  This step added
   inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
   from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
   e.g. lib/decompress_*.c used malloc/free() wrappers around slab
   APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
   editing them as sprinkling gfp.h and slab.h inclusions around .h
   files could easily lead to inclusion dependency hell.  Most gfp.h
   inclusion directives were ignored as stuff from gfp.h was usually
   wildly available and often used in preprocessor macros.  Each
   slab.h inclusion directive was examined and added manually as
   necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
   were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
   distributed build env didn't work with gcov compiles) and a few
   more options had to be turned off depending on archs to make things
   build (like ipr on powerpc/64 which failed due to missing writeq).

   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
   a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-30 22:02:32 +09:00
J. Bruce Fields 788e69e548 svcrpc: don't hold sv_lock over svc_xprt_put()
svc_xprt_put() can call tcp_close(), which can sleep, so we shouldn't be
holding this lock.

In fact, only the xpt_list removal and the sv_tmpcnt decrement should
need the sv_lock here.

Reported-by: Mi Jinlong <mijinlong@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-03-29 21:02:31 -04:00
Linus Torvalds 6631424fd2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (33 commits)
  r8169: offical fix for CVE-2009-4537 (overlength frame DMAs)
  ipv6: Don't drop cache route entry unless timer actually expired.
  tulip: Add missing parens.
  r8169: fix broken register writes
  pcnet_cs: add new id
  bonding: fix broken multicast with round-robin mode
  drivers/net: Fix continuation lines
  e1000: do not modify tx_queue_len on link speed change
  net: ipmr/ip6mr: prevent out-of-bounds vif_table access
  ixgbe: Do not run all Diagnostic offline tests when VFs are active
  igb: use correct bits to identify if managability is enabled
  benet: Fix compile warnnings in drivers/net/benet/be_ethtool.c
  net: Add MSG_WAITFORONE flag to recvmmsg
  e1000e: do not modify tx_queue_len on link speed change
  igbvf: do not modify tx_queue_len on link speed change
  ipv4: Restart rt_intern_hash after emergency rebuild (v2)
  ipv4: Cleanup struct net dereference in rt_intern_hash
  net: fix netlink address dumping in IPv4/IPv6
  tulip: Fix null dereference in uli526x_rx_packet()
  gianfar: fix undo of reserve()
  ...
2010-03-29 14:41:18 -07:00
Gilles Espinasse f77f13e22d Fix comment and Kconfig typos for 'require' and 'fragment'
Signed-off-by: Gilles Espinasse <g.esp@free.fr>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-03-29 15:41:47 +02:00
YOSHIFUJI Hideaki / 吉藤英明 54c1a859ef ipv6: Don't drop cache route entry unless timer actually expired.
This is ipv6 variant of the commit 5e016cbf6.. ("ipv4: Don't drop
redirected route cache entry unless PTMU actually expired")
by Guenter Roeck <guenter.roeck@ericsson.com>.

Remove cache route entry in ipv6_negative_advice() only if
the timer is expired.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-28 19:34:26 -07:00
Nicolas Dichtel 7438189baa net: ipmr/ip6mr: prevent out-of-bounds vif_table access
When cache is unresolved, c->mf[6]c_parent is set to 65535 and
minvif, maxvif are not initialized, hence we must avoid to
parse IIF and OIF.
A second problem can happen when the user dumps a cache entry
where a VIF, that was referenced at creation time, has been
removed.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-27 08:33:21 -07:00
Brandon L Black 71c5c1595c net: Add MSG_WAITFORONE flag to recvmmsg
Add new flag MSG_WAITFORONE for the recvmmsg() syscall.
When this flag is specified for a blocking socket, recvmmsg()
will only block until at least 1 packet is available.  The
default behavior is to block until all vlen packets are
available.  This flag has no effect on non-blocking sockets
or when used in combination with MSG_DONTWAIT.

Signed-off-by: Brandon L Black <blblack@gmail.com>
Acked-by: Ulrich Drepper <drepper@redhat.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-27 08:29:01 -07:00
Pavel Emelyanov 6a2bad70d5 ipv4: Restart rt_intern_hash after emergency rebuild (v2)
The the rebuild changes the genid which in turn is used at
the hash calculation. Thus if we don't restart and go on with
inserting the rt will happen in wrong chain.

(Fixed Neil's comment about the index passed into the rt_intern_hash)

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Reviewed-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-26 20:57:35 -07:00
Pavel Emelyanov b35ecb5d40 ipv4: Cleanup struct net dereference in rt_intern_hash
There's no need in getting it 3 times and gcc isn't smart enough
to understand this himself.

This is just a cleanup before the fix (next patch).

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-26 20:57:35 -07:00
Patrick McHardy 4b97efdf39 net: fix netlink address dumping in IPv4/IPv6
When a dump is interrupted at the last device in a hash chain and
then continued, "idx" won't get incremented past s_idx, so s_ip_idx
is not reset when moving on to the next device. This means of all
following devices only the last n - s_ip_idx addresses are dumped.

Tested-by: Pawel Staszewski <pstaszewski@itcare.pl>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-03-26 20:27:49 -07:00
Linus Torvalds 126a031e43 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (25 commits)
  TIPC: Removed inactive maintainer
  isdn: Cleanup Sections in PCMCIA driver elsa
  isdn: Cleanup Sections in PCMCIA driver avma1
  isdn: Cleanup Sections in PCMCIA driver teles
  isdn: Cleanup Sections in PCMCIA driver sedlbauer
  via-velocity: Fix FLOW_CNTL_TX_RX handling in set_mii_flow_control()
  netfilter: xt_hashlimit: IPV6 bugfix
  netfilter: ip6table_raw: fix table priority
  netfilter: xt_hashlimit: dl_seq_stop() fix
  af_key: return error if pfkey_xfrm_policy2msg_prep() fails
  skbuff: remove unused dma_head & dma_maps fields
  vlan: updates vlan real_num_tx_queues
  vlan: adds vlan_dev_select_queue
  igb: only use vlan_gro_receive if vlans are registered
  igb: do not modify tx_queue_len on link speed change
  igb: count Rx FIFO errors correctly
  bnx2: Use proper handler during netpoll.
  bnx2: Fix netpoll crash.
  ksz884x: fix return value of netdev_set_eeprom
  cgroups: net_cls as module
  ...
2010-03-25 14:06:29 -07:00
David S. Miller 80bb3a00fa Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6 2010-03-25 11:48:58 -07:00
Eric Dumazet 8f59922914 netfilter: xt_hashlimit: IPV6 bugfix
A missing break statement in hashlimit_ipv6_mask(), and masks
between /64 and /95 are not working at all...

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-03-25 17:25:11 +01:00
Jozsef Kadlecsik 9c13886665 netfilter: ip6table_raw: fix table priority
The order of the IPv6 raw table is currently reversed, that makes impossible
to use the NOTRACK target in IPv6: for example if someone enters

ip6tables -t raw -A PREROUTING -p tcp --dport 80 -j NOTRACK

and if we receive fragmented packets then the first fragment will be
untracked and thus skip nf_ct_frag6_gather (and conntrack), while all
subsequent fragments enter nf_ct_frag6_gather and reassembly will never
successfully be finished.

Singed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-03-25 11:17:26 +01:00
Eric Dumazet 55e0d7cf27 netfilter: xt_hashlimit: dl_seq_stop() fix
If dl_seq_start() memory allocation fails, we crash later in
dl_seq_stop(), trying to kfree(ERR_PTR(-ENOMEM))

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-03-25 11:00:22 +01:00
Linus Torvalds 6c75969e22 Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
  NFS: don't try to decode GETATTR if DELEGRETURN returned error
  sunrpc: handle allocation errors from __rpc_lookup_create()
  SUNRPC: Fix the return value of rpc_run_bc_task()
  SUNRPC: Fix a use after free bug with the NFSv4.1 backchannel
  SUNRPC: Fix a potential memory leak in auth_gss
  NFS: Prevent another deadlock in nfs_release_page()
2010-03-24 16:50:46 -07:00
Dan Carpenter 9a127aad4d af_key: return error if pfkey_xfrm_policy2msg_prep() fails
The original code saved the error value but just returned 0 in the end.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Acked-by: Jamal Hadi Salim <hadi@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-24 13:28:27 -07:00
Vasu Dev f6b9f4b263 vlan: updates vlan real_num_tx_queues
Updates real_num_tx_queues in case underlying real device
has changed real_num_tx_queues.

-v2
 As per Eric Dumazet<eric.dumazet@gmail.com> comment:-
   -- adds BUG_ON to catch case of real_num_tx_queues exceeding num_tx_queues.
   -- created this self contained patch to just update real_num_tx_queues.

Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-24 11:11:38 -07:00
Vasu Dev 669d3e0bab vlan: adds vlan_dev_select_queue
This is required to correctly select vlan tx queue for a driver
supporting multi tx queue with ndo_select_queue implemented since
currently selected vlan tx queue is unaligned to selected queue by
real net_devce ndo_select_queue.

Unaligned vlan tx queue selection causes thrash with higher vlan
tx lock contention for least fcoe traffic and wrong socket tx
queue_mapping for ixgbe having ndo_select_queue implemented.

-v2

As per Eric Dumazet<eric.dumazet@gmail.com> comments, mirrored
vlan net_device_ops to have them with and without vlan_dev_select_queue
and then select according to real dev ndo_select_queue present or not
for a vlan net_device. This is to completely skip vlan_dev_select_queue
calling for real net_device not supporting ndo_select_queue.

Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-24 11:11:38 -07:00
Li Zefan a5990ea125 sunrpc/cache: fix module refcnt leak in a failure path
Don't forget to release the module refcnt if seq_open() returns failure.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: J. Bruce Fields <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-03-24 10:40:46 -04:00
Ben Blum 8e039d84b3 cgroups: net_cls as module
Allows the net_cls cgroup subsystem to be compiled as a module

This patch modifies net/sched/cls_cgroup.c to allow the net_cls subsystem
to be optionally compiled as a module instead of builtin.  The
cgroup_subsys struct is moved around a bit to allow the subsys_id to be
either declared as a compile-time constant by the cgroup_subsys.h include
in cgroup.h, or, if it's a module, initialized within the struct by
cgroup_load_subsys.

Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-23 13:06:14 -07:00
Amerigo Wang 5fc05f8764 netpoll: warn when there are spaces in parameters
v2: update according to Frans' comments.

Currently, if we leave spaces before dst port,
netconsole will silently accept it as 0. Warn about this.

Also, when spaces appear in other places, make them
visible in error messages.

Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: David Miller <davem@davemloft.net>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-22 20:05:45 -07:00
Patrick McHardy ef1691504c netfilter: xt_recent: fix regression in rules using a zero hit_count
Commit 8ccb92ad (netfilter: xt_recent: fix false match) fixed supposedly
false matches in rules using a zero hit_count. As it turns out there is
nothing false about these matches and people are actually using entries
with a hit_count of zero to make rules dependant on addresses inserted
manually through /proc.

Since this slipped past the eyes of three reviewers, instead of
reverting the commit in question, this patch explicitly checks
for a hit_count of zero to make the intentions more clear.

Reported-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
Tested-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
Cc: stable@kernel.org
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-03-22 18:25:20 +01:00
Linus Torvalds 258152acc0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (38 commits)
  ip_gre: include route header_len in max_headroom calculation
  if_tunnel.h: add missing ams/byteorder.h include
  ipv4: Don't drop redirected route cache entry unless PTMU actually expired
  net: suppress lockdep-RCU false positive in FIB trie.
  Bluetooth: Fix kernel crash on L2CAP stress tests
  Bluetooth: Convert debug files to actually use debugfs instead of sysfs
  Bluetooth: Fix potential bad memory access with sysfs files
  netfilter: ctnetlink: fix reliable event delivery if message building fails
  netlink: fix NETLINK_RECV_NO_ENOBUFS in netlink_set_err()
  NET_DMA: free skbs periodically
  netlink: fix unaligned access in nla_get_be64()
  tcp: Fix tcp_mark_head_lost() with packets == 0
  net: ipmr/ip6mr: fix potential out-of-bounds vif_table access
  KS8695: update ksp->next_rx_desc_read at the end of rx loop
  igb: Add support for 82576 ET2 Quad Port Server Adapter
  ixgbevf: Message formatting cleanups
  ixgbevf: Shorten up delay timer for watchdog task
  ixgbevf: Fix VF Stats accounting after reset
  ixgbe: Set IXGBE_RSC_CB(skb)->DMA field to zero after unmapping the address
  ixgbe: fix for real_num_tx_queues update issue
  ...
2010-03-22 10:01:58 -07:00
Tetsuo Handa c3824d21eb rxrpc: Check allocation failure.
alloc_skb() can return NULL.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-22 09:57:19 -07:00
Dan Carpenter f1f0abe192 sunrpc: handle allocation errors from __rpc_lookup_create()
__rpc_lookup_create() can return ERR_PTR(-ENOMEM).

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
2010-03-22 05:34:13 -04:00
Trond Myklebust ff0901f803 SUNRPC: Fix the return value of rpc_run_bc_task()
Currently rpc_run_bc_task() will return NULL if the task allocation failed.
However the only caller is bc_send, which assumes that the return value
will be an ERR_PTR.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-03-22 05:34:12 -04:00
Trond Myklebust c9acb42ef1 SUNRPC: Fix a use after free bug with the NFSv4.1 backchannel
The ->release_request() callback was designed to allow the transport layer
to do housekeeping after the RPC call is done. It cannot be used to free
the request itself, and doing so leads to a use-after-free bug in
xprt_release().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-03-22 05:32:44 -04:00
Timo Teräs 243aad830e ip_gre: include route header_len in max_headroom calculation
Taking route's header_len into account, and updating gre device
needed_headroom will give better hints on upper bound of required
headroom. This is useful if the gre traffic is xfrm'ed.

Signed-off-by: Timo Teras <timo.teras@iki.fi>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-21 21:23:28 -07:00
Guenter Roeck 5e016cbf6c ipv4: Don't drop redirected route cache entry unless PTMU actually expired
TCP sessions over IPv4 can get stuck if routers between endpoints
do not fragment packets but implement PMTU instead, and we are using
those routers because of an ICMP redirect.

Setup is as follows

       MTU1    MTU2   MTU1
    A--------B------C------D

with MTU1 > MTU2. A and D are endpoints, B and C are routers. B and C
implement PMTU and drop packets larger than MTU2 (for example because
DF is set on all packets). TCP sessions are initiated between A and D.
There is packet loss between A and D, causing frequent TCP
retransmits.

After the number of retransmits on a TCP session reaches tcp_retries1,
tcp calls dst_negative_advice() prior to each retransmit. This results
in route cache entries for the peer to be deleted in
ipv4_negative_advice() if the Path MTU is set.

If the outstanding data on an affected TCP session is larger than
MTU2, packets sent from the endpoints will be dropped by B or C, and
ICMP NEEDFRAG will be returned. A and D receive NEEDFRAG messages and
update PMTU.

Before the next retransmit, tcp will again call dst_negative_advice(),
causing the route cache entry (with correct PMTU) to be deleted. The
retransmitted packet will be larger than MTU2, causing it to be
dropped again.

This sequence repeats until the TCP session aborts or is terminated.

Problem is fixed by removing redirected route cache entries in
ipv4_negative_advice() only if the PMTU is expired.

Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-21 20:55:13 -07:00
David S. Miller e3a61d47cc Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/holtmann/bluetooth-2.6 2010-03-21 18:03:11 -07:00
Paul E. McKenney 634a4b2038 net: suppress lockdep-RCU false positive in FIB trie.
Allow fib_find_node() to be called either under rcu_read_lock()
protection or with RTNL held.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-21 18:01:05 -07:00
Trond Myklebust cdead7cf12 SUNRPC: Fix a potential memory leak in auth_gss
The function alloc_enc_pages() currently fails to release the pointer
rqstp->rq_enc_pages in the error path.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: stable@kernel.org
2010-03-21 12:22:49 -04:00
Andrei Emeltchenko c2c77ec83b Bluetooth: Fix kernel crash on L2CAP stress tests
Added very simple check that req buffer has enough space to
fit configuration parameters. Shall be enough to reject packets
with configuration size more than req buffer.

Crash trace below

[ 6069.659393] Unable to handle kernel paging request at virtual address 02000205
[ 6069.673034] Internal error: Oops: 805 [#1] PREEMPT
...
[ 6069.727172] PC is at l2cap_add_conf_opt+0x70/0xf0 [l2cap]
[ 6069.732604] LR is at l2cap_recv_frame+0x1350/0x2e78 [l2cap]
...
[ 6070.030303] Backtrace:
[ 6070.032806] [<bf1c2880>] (l2cap_add_conf_opt+0x0/0xf0 [l2cap]) from
[<bf1c6624>] (l2cap_recv_frame+0x1350/0x2e78 [l2cap])
[ 6070.043823]  r8:dc5d3100 r7:df2a91d6 r6:00000001 r5:df2a8000 r4:00000200
[ 6070.050659] [<bf1c52d4>] (l2cap_recv_frame+0x0/0x2e78 [l2cap]) from
[<bf1c8408>] (l2cap_recv_acldata+0x2bc/0x350 [l2cap])
[ 6070.061798] [<bf1c814c>] (l2cap_recv_acldata+0x0/0x350 [l2cap]) from
[<bf0037a4>] (hci_rx_task+0x244/0x478 [bluetooth])
[ 6070.072631]  r6:dc647700 r5:00000001 r4:df2ab740
[ 6070.077362] [<bf003560>] (hci_rx_task+0x0/0x478 [bluetooth]) from
[<c006b9fc>] (tasklet_action+0x78/0xd8)
[ 6070.087005] [<c006b984>] (tasklet_action+0x0/0xd8) from [<c006c160>]

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@nokia.com>
Acked-by: Gustavo F. Padovan <gustavo@padovan.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-03-21 05:49:36 +01:00
Marcel Holtmann aef7d97cc6 Bluetooth: Convert debug files to actually use debugfs instead of sysfs
Some of the debug files ended up wrongly in sysfs, because at that point
of time, debugfs didn't exist. Convert these files to use debugfs and
also seq_file. This patch converts all of these files at once and then
removes the exported symbol for the Bluetooth sysfs class.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-03-21 05:49:35 +01:00
Marcel Holtmann 101545f6fe Bluetooth: Fix potential bad memory access with sysfs files
When creating a high number of Bluetooth sockets (L2CAP, SCO
and RFCOMM) it is possible to scribble repeatedly on arbitrary
pages of memory. Ensure that the content of these sysfs files is
always less than one page. Even if this means truncating. The
files in question are scheduled to be moved over to debugfs in
the future anyway.

Based on initial patches from Neil Brown and Linus Torvalds

Reported-by: Neil Brown <neilb@suse.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2010-03-21 05:49:32 +01:00
Pablo Neira Ayuso 37b7ef7203 netfilter: ctnetlink: fix reliable event delivery if message building fails
This patch fixes a bug that allows to lose events when reliable
event delivery mode is used, ie. if NETLINK_BROADCAST_SEND_ERROR
and NETLINK_RECV_NO_ENOBUFS socket options are set.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-20 14:29:03 -07:00
Pablo Neira Ayuso 1a50307ba1 netlink: fix NETLINK_RECV_NO_ENOBUFS in netlink_set_err()
Currently, ENOBUFS errors are reported to the socket via
netlink_set_err() even if NETLINK_RECV_NO_ENOBUFS is set. However,
that should not happen. This fixes this problem and it changes the
prototype of netlink_set_err() to return the number of sockets that
have set the NETLINK_RECV_NO_ENOBUFS socket option. This return
value is used in the next patch in these bugfix series.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-20 14:29:03 -07:00
Steven J. Magnani 73852e8151 NET_DMA: free skbs periodically
Under NET_DMA, data transfer can grind to a halt when userland issues a
large read on a socket with a high RCVLOWAT (i.e., 512 KB for both).
This appears to be because the NET_DMA design queues up lots of memcpy
operations, but doesn't issue or wait for them (and thus free the
associated skbs) until it is time for tcp_recvmesg() to return.
The socket hangs when its TCP window goes to zero before enough data is
available to satisfy the read.

Periodically issue asynchronous memcpy operations, and free skbs for ones
that have completed, to prevent sockets from going into zero-window mode.

Signed-off-by: Steven J. Magnani <steve@digidescorp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-20 14:29:02 -07:00
Lennart Schulte 6830c25b7d tcp: Fix tcp_mark_head_lost() with packets == 0
A packet is marked as lost in case packets == 0, although nothing should be done.
This results in a too early retransmitted packet during recovery in some cases.
This small patch fixes this issue by returning immediately.

Signed-off-by: Lennart Schulte <lennart.schulte@nets.rwth-aachen.de>
Signed-off-by: Arnd Hannemann <hannemann@nets.rwth-aachen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-19 22:47:22 -07:00
Patrick McHardy a50436f2cd net: ipmr/ip6mr: fix potential out-of-bounds vif_table access
mfc_parent of cache entries is used to index into the vif_table and is
initialised from mfcctl->mfcc_parent. This can take values of to 2^16-1,
while the vif_table has only MAXVIFS (32) entries. The same problem
affects ip6mr.

Refuse invalid values to fix a potential out-of-bounds access. Unlike
the other validity checks, this is checked in ipmr_mfc_add() instead of
the setsockopt handler since its unused in the delete path and might be
uninitialized.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-19 22:47:22 -07:00
stephen hemminger 97e3ecd112 TCP: check min TTL on received ICMP packets
This adds RFC5082 checks for TTL on received ICMP packets.
It adds some security against spoofed ICMP packets
disrupting GTSM protected sessions.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-19 21:00:42 -07:00
Herbert Xu 10414444cb ipv6: Remove redundant dst NULL check in ip6_dst_check
As the only path leading to ip6_dst_check makes an indirect call
through dst->ops, dst cannot be NULL in ip6_dst_check.

This patch removes this check in case it misleads people who
come across this code.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-19 21:00:42 -07:00
Timo Teräs d11a4dc18b ipv4: check rt_genid in dst_check
Xfrm_dst keeps a reference to ipv4 rtable entries on each
cached bundle. The only way to renew xfrm_dst when the underlying
route has changed, is to implement dst_check for this. This is
what ipv6 side does too.

The problems started after 87c1e12b5e
("ipsec: Fix bogus bundle flowi") which fixed a bug causing xfrm_dst
to not get reused, until that all lookups always generated new
xfrm_dst with new route reference and path mtu worked. But after the
fix, the old routes started to get reused even after they were expired
causing pmtu to break (well it would occationally work if the rtable
gc had run recently and marked the route obsolete causing dst_check to
get called).

Signed-off-by: Timo Teras <timo.teras@iki.fi>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-19 21:00:41 -07:00
Eric Dumazet 0641e4fbf2 net: Potential null skb->dev dereference
When doing "ifenslave -d bond0 eth0", there is chance to get NULL
dereference in netif_receive_skb(), because dev->master suddenly becomes
NULL after we tested it.

We should use ACCESS_ONCE() to avoid this (or rcu_dereference())

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-18 21:16:45 -07:00
Alexandra Kossovsky b634f87522 tcp: Fix OOB POLLIN avoidance.
From: Alexandra.Kossovsky@oktetlabs.ru

Fixes kernel bugzilla #15541

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-18 20:29:24 -07:00
Linus Torvalds 7c34691abe Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
  NFS: ensure bdi_unregister is called on mount failure.
  NFS: Avoid a deadlock in nfs_release_page
  NFSv4: Don't ignore the NFS_INO_REVAL_FORCED flag in nfs_revalidate_inode()
  nfs4: Make the v4 callback service hidden
  nfs: fix unlikely memory leak
  rpc client can not deal with ENOSOCK, so translate it into ENOCONN
2010-03-18 16:50:09 -07:00
David S. Miller 87faf3ccf1 bridge: Make first arg to deliver_clone const.
Otherwise we get a warning from the call in br_forward().

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-16 14:37:47 -07:00
YOSHIFUJI Hideaki / 吉藤英明 32dec5dd02 bridge br_multicast: Don't refer to BR_INPUT_SKB_CB(skb)->mrouters_only without IGMP snooping.
Without CONFIG_BRIDGE_IGMP_SNOOPING,
BR_INPUT_SKB_CB(skb)->mrouters_only is not appropriately
initialized, so we can see garbage.

A clear option to fix this is to set it even without that
config, but we cannot optimize out the branch.

Let's introduce a macro that returns value of mrouters_only
and let it return 0 without CONFIG_BRIDGE_IGMP_SNOOPING.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-16 14:34:23 -07:00
Vitaliy Gusev 858a18a6a2 route: Fix caught BUG_ON during rt_secret_rebuild_oneshot()
route: Fix caught BUG_ON during rt_secret_rebuild_oneshot()

Call rt_secret_rebuild can cause BUG_ON(timer_pending(&net->ipv4.rt_secret_timer)) in
add_timer as there is not any synchronization for call rt_secret_rebuild_oneshot()
for the same net namespace.

Also this issue affects to rt_secret_reschedule().

Thus use mod_timer enstead.

Signed-off-by: Vitaliy Gusev <vgusev@openvz.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-16 14:15:47 -07:00
YOSHIFUJI Hideaki / 吉藤英明 8440853bb7 bridge br_multicast: Fix skb leakage in error path.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-16 14:15:46 -07:00
YOSHIFUJI Hideaki / 吉藤英明 0ba8c9ec25 bridge br_multicast: Fix handling of Max Response Code in IGMPv3 message.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-16 14:15:46 -07:00
Jiri Slaby 21edbb223e NET: netpoll, fix potential NULL ptr dereference
Stanse found that one error path in netpoll_setup dereferences npinfo
even though it is NULL. Avoid that by adding new label and go to that
instead.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Daniel Borkmann <danborkmann@googlemail.com>
Cc: David S. Miller <davem@davemloft.net>
Acked-by: chavey@google.com
Acked-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-16 14:15:45 -07:00
Neil Horman a2f46ee1ba tipc: fix lockdep warning on address assignment
So in the forward porting of various tipc packages, I was constantly
getting this lockdep warning everytime I used tipc-config to set a network
address for the protocol:

[ INFO: possible circular locking dependency detected ]
2.6.33 #1
tipc-config/1326 is trying to acquire lock:
(ref_table_lock){+.-...}, at: [<ffffffffa0315148>] tipc_ref_discard+0x53/0xd4 [tipc]

but task is already holding lock:
(&(&entry->lock)->rlock#2){+.-...}, at: [<ffffffffa03150d5>] tipc_ref_lock+0x43/0x63 [tipc]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&(&entry->lock)->rlock#2){+.-...}:
[<ffffffff8107b508>] __lock_acquire+0xb67/0xd0f
[<ffffffff8107b78c>] lock_acquire+0xdc/0x102
[<ffffffff8145471e>] _raw_spin_lock_bh+0x3b/0x6e
[<ffffffffa03152b1>] tipc_ref_acquire+0xe8/0x11b [tipc]
[<ffffffffa031433f>] tipc_createport_raw+0x78/0x1b9 [tipc]
[<ffffffffa031450b>] tipc_createport+0x8b/0x125 [tipc]
[<ffffffffa030f221>] tipc_subscr_start+0xce/0x126 [tipc]
[<ffffffffa0308fb2>] process_signal_queue+0x47/0x7d [tipc]
[<ffffffff81053e0c>] tasklet_action+0x8c/0xf4
[<ffffffff81054bd8>] __do_softirq+0xf8/0x1cd
[<ffffffff8100aadc>] call_softirq+0x1c/0x30
[<ffffffff810549f4>] _local_bh_enable_ip+0xb8/0xd7
[<ffffffff81054a21>] local_bh_enable_ip+0xe/0x10
[<ffffffff81454d31>] _raw_spin_unlock_bh+0x34/0x39
[<ffffffffa0308eb8>] spin_unlock_bh.clone.0+0x15/0x17 [tipc]
[<ffffffffa0308f47>] tipc_k_signal+0x8d/0xb1 [tipc]
[<ffffffffa0308dd9>] tipc_core_start+0x8a/0xad [tipc]
[<ffffffffa01b1087>] 0xffffffffa01b1087
[<ffffffff8100207d>] do_one_initcall+0x72/0x18a
[<ffffffff810872fb>] sys_init_module+0xd8/0x23a
[<ffffffff81009b42>] system_call_fastpath+0x16/0x1b

-> #0 (ref_table_lock){+.-...}:
[<ffffffff8107b3b2>] __lock_acquire+0xa11/0xd0f
[<ffffffff8107b78c>] lock_acquire+0xdc/0x102
[<ffffffff81454836>] _raw_write_lock_bh+0x3b/0x6e
[<ffffffffa0315148>] tipc_ref_discard+0x53/0xd4 [tipc]
[<ffffffffa03141ee>] tipc_deleteport+0x40/0x119 [tipc]
[<ffffffffa0316e35>] release+0xeb/0x137 [tipc]
[<ffffffff8139dbf4>] sock_release+0x1f/0x6f
[<ffffffff8139dc6b>] sock_close+0x27/0x2b
[<ffffffff811116f6>] __fput+0x12a/0x1df
[<ffffffff811117c5>] fput+0x1a/0x1c
[<ffffffff8110e49b>] filp_close+0x68/0x72
[<ffffffff8110e552>] sys_close+0xad/0xe7
[<ffffffff81009b42>] system_call_fastpath+0x16/0x1b

Finally decided I should fix this.  Its a straightforward inversion,
tipc_ref_acquire takes two locks in this order:
ref_table_lock
entry->lock

while tipc_deleteport takes them in this order:
entry->lock (via tipc_port_lock())
ref_table_lock (via tipc_ref_discard())

when the same entry is referenced, we get the above warning.  The fix is equally
straightforward.  Theres no real relation between the entry->lock and the
ref_table_lock (they just are needed at the same time), so move the entry->lock
aquisition in tipc_ref_acquire down, after we unlock ref_table_lock (this is
safe since the ref_table_lock guards changes to the reference table, and we've
already claimed a slot there.  I've tested the below fix and confirmed that it
clears up the lockdep issue

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-16 14:15:45 -07:00
Thomas Weber 8839316121 Fix typos in comments
[Ss]ytem => [Ss]ystem
udpate => update
paramters => parameters
orginal => original

Signed-off-by: Thomas Weber <swirl@gmx.li>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-03-16 11:47:56 +01:00
Michael Braun 7f7708f005 bridge: Fix br_forward crash in promiscuous mode
From: Michael Braun <michael-dev@fami-braun.de>

bridge: Fix br_forward crash in promiscuous mode

It's a linux-next kernel from 2010-03-12 on an x86 system and it
OOPs in the bridge module in br_pass_frame_up (called by
br_handle_frame_finish) because brdev cannot be dereferenced (its set to
a non-null value).

Adding some BUG_ON statements revealed that
 BR_INPUT_SKB_CB(skb)->brdev == br-dev
(as set in br_handle_frame_finish first)
only holds until br_forward is called.
The next call to br_pass_frame_up then fails.

Digging deeper it seems that br_forward either frees the skb or passes
it to NF_HOOK which will in turn take care of freeing the skb. The
same is holds for br_pass_frame_ip. So it seems as if two independent
skb allocations are required. As far as I can see, commit
b33084be19 ("bridge: Avoid unnecessary
clone on forward path") removed skb duplication and so likely causes
this crash. This crash does not happen on 2.6.33.

I've therefore modified br_forward the same way br_flood has been
modified so that the skb is not freed if skb0 is going to be used
and I can confirm that the attached patch resolves the issue for me.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-16 00:26:22 -07:00
Herbert Xu 0821ec55bb bridge: Move NULL mdb check into br_mdb_ip_get
Since all callers of br_mdb_ip_get need to check whether the
hash table is NULL, this patch moves the check into the function.

This fixes the two callers (query/leave handler) that didn't
check it.

Reported-by: Michael Braun <michael-dev@fami-braun.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-15 20:38:25 -07:00
David S. Miller 4961e02f19 Merge branch 'master' of /home/davem/src/GIT/linux-2.6/ 2010-03-15 16:23:54 -07:00
Gerrit Renker d14a0ebda7 net-2.6 [Bug-Fix][dccp]: fix oops caused after failed initialisation
dccp: fix panic caused by failed initialisation

This fixes a kernel panic reported thanks to Andre Noll:

if DCCP is compiled into the kernel and any out of the initialisation
steps in net/dccp/proto.c:dccp_init() fail, a subsequent attempt to create
a SOCK_DCCP socket will panic, since inet{,6}_create() are not prevented
from creating DCCP sockets.

This patch fixes the problem by propagating a failure in dccp_init() to
dccp_v{4,6}_init_net(), and from there to dccp_v{4,6}_init(), so that the
DCCP protocol is not made available if its initialisation fails.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-15 16:00:50 -07:00
Akinobu Mita a1ca14ac54 phonet: use for_each_set_bit()
Replace open-coded loop with for_each_set_bit().

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-15 16:00:47 -07:00
NeilBrown d202cce896 sunrpc: never return expired entries in sunrpc_cache_lookup
If sunrpc_cache_lookup finds an expired entry, remove it from
the cache and return a freshly created non-VALID entry instead.
This ensures that we only ever get a usable entry, or an
entry that will become usable once an update arrives.
i.e. we will never need to repeat the lookup.

This allows us to remove the 'is_expired' test from cache_check
(i.e. from cache_is_valid).  cache_check should never get an expired
entry as 'lookup' will never return one.  If it does happen - due to
inconvenient timing - then just accept it as still valid, it won't be
very much past it's use-by date.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-03-14 21:48:52 -04:00
NeilBrown 2f50d8b63d sunrpc/cache: factor out cache_is_expired
This removes a tiny bit of code duplication, but more important
prepares for following patch which will perform the expiry check in
cache_lookup and the rest of the validity check in cache_check.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-03-14 18:08:37 -04:00
NeilBrown 3af4974eb2 sunrpc: don't keep expired entries in the auth caches.
currently expired entries remain in the auth caches as long
as there is a reference.
This was needed long ago when the auth_domain cache used the same
cache infrastructure.  But since that (being a very different sort
of cache) was separated, this test is no longer needed.

So remove the test on refcnt and tidy up the surrounding code.

This allows the cache_dequeue call (which needed to be there to
drop a potentially awkward reference) can be moved outside of the
spinlock which is a better place for it.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-03-14 18:03:54 -04:00
Linus Torvalds ceb804cd0f Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
  9p: Skip check for mandatory locks when unlocking
  9p: Fixes a simple bug enabling writes beyond 2GB.
  9p: Change the name of new protocol from 9p2010.L to 9p2000.L
  fs/9p: re-init the wstat in readdir loop
  net/9p: Add sysfs mount_tag file for virtio 9P device
  net/9p: Use the tag name in the config space for identifying mount point
2010-03-14 11:11:08 -07:00
Linus Torvalds d89b218b80 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (108 commits)
  bridge: ensure to unlock in error path in br_multicast_query().
  drivers/net/tulip/eeprom.c: fix bogus "(null)" in tulip init messages
  sky2: Avoid rtnl_unlock without rtnl_lock
  ipv6: Send netlink notification when DAD fails
  drivers/net/tg3.c: change the field used with the TG3_FLAG_10_100_ONLY constant
  ipconfig: Handle devices which take some time to come up.
  mac80211: Fix memory leak in ieee80211_if_write()
  mac80211: Fix (dynamic) power save entry
  ipw2200: use kmalloc for large local variables
  ath5k: read eeprom IQ calibration values correctly for G mode
  ath5k: fix I/Q calibration (for real)
  ath5k: fix TSF reset
  ath5k: use fixed antenna for tx descriptors
  libipw: split ieee->networks into small pieces
  mac80211: Fix sta_mtx unlocking on insert STA failure path
  rt2x00: remove KSEG1ADDR define from rt2x00soc.h
  net: add ColdFire support to the smc91x driver
  asix: fix setting mac address for AX88772
  ipv6 ip6_tunnel: eliminate unused recursion field from ip6_tnl{}.
  net: Fix dev_mc_add()
  ...
2010-03-13 14:50:18 -08:00
YOSHIFUJI Hideaki bec68ff163 bridge: ensure to unlock in error path in br_multicast_query().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-13 12:27:21 -08:00
Herbert Xu e2577a0658 ipv6: Send netlink notification when DAD fails
If we are managing IPv6 addresses using DHCP, it would be nice
for user-space to be notified if an address configured through
DHCP fails DAD.  Otherwise user-space would have to poll to see
whether DAD succeeds.

This patch uses the existing notification mechanism and simply
hooks it into the DAD failure code path.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-13 12:23:29 -08:00
David S. Miller a003460b21 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2010-03-13 12:17:09 -08:00
Sripathi Kodi 45bc21edb5 9p: Change the name of new protocol from 9p2010.L to 9p2000.L
This patch changes the name of the new 9P protocol from 9p2010.L to
9p2000.u. This is because we learnt that the name 9p2010 is already
being used by others.

Signed-off-by: Sripathi Kodi <sripathik@in.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-03-13 08:57:29 -06:00
Aneesh Kumar K.V 86c8437383 net/9p: Add sysfs mount_tag file for virtio 9P device
This adds a new file for virtio 9P device. The file
contain details of the mount device name that should
be used to mount the 9P file system.

Ex: /sys/devices/virtio-pci/virtio1/mount_tag  file now
contian the tag name to be used to mount the 9P file system.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-03-13 08:57:29 -06:00
Aneesh Kumar K.V 97ee9b0257 net/9p: Use the tag name in the config space for identifying mount point
This patch use the tag name in the config space to identify the
mount device. The the virtio device name depend on the enumeration
order of the device and may not remain the same across multiple boots
So we use the tag name which is set via qemu option to uniquely identify
the mount device

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-03-13 08:57:28 -06:00
Linus Torvalds c32da02342 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (56 commits)
  doc: fix typo in comment explaining rb_tree usage
  Remove fs/ntfs/ChangeLog
  doc: fix console doc typo
  doc: cpuset: Update the cpuset flag file
  Fix of spelling in arch/sparc/kernel/leon_kernel.c no longer needed
  Remove drivers/parport/ChangeLog
  Remove drivers/char/ChangeLog
  doc: typo - Table 1-2 should refer to "status", not "statm"
  tree-wide: fix typos "ass?o[sc]iac?te" -> "associate" in comments
  No need to patch AMD-provided drivers/gpu/drm/radeon/atombios.h
  devres/irq: Fix devm_irq_match comment
  Remove reference to kthread_create_on_cpu
  tree-wide: Assorted spelling fixes
  tree-wide: fix 'lenght' typo in comments and code
  drm/kms: fix spelling in error message
  doc: capitalization and other minor fixes in pnp doc
  devres: typo fix s/dev/devm/
  Remove redundant trailing semicolons from macros
  fix typo "definetly" -> "definitely" in comment
  tree-wide: s/widht/width/g typo in comments
  ...

Fix trivial conflict in Documentation/laptops/00-INDEX
2010-03-12 16:04:50 -08:00
David S. Miller 964ad81cbd ipconfig: Handle devices which take some time to come up.
Some network devices, particularly USB ones, take several seconds to
fully init and appear in the device list.

If the user turned ipconfig on, they are using it for NFS root or some
other early booting purpose.  So it makes no sense to just flat out
fail immediately if the device isn't found.

It also doesn't make sense to just jack up the initial wait to
something crazy like 10 seconds.

Instead, poll immediately, and then periodically once a second,
waiting for a usable device to appear.  Fail after 12 seconds.

Signed-off-by: David S. Miller <davem@davemloft.net>
Tested-by: Christian Pellegrin <chripell@fsfe.org>
2010-03-12 00:00:17 -08:00
Eric Dumazet 51f5f8ca44 mac80211: Fix memory leak in ieee80211_if_write()
Fix memory leak and use kmalloc() instead of kzalloc() as we are going
to overwrite the allocated buffer.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-03-10 16:29:11 -05:00
Juuso Oikarinen 2a13052fe4 mac80211: Fix (dynamic) power save entry
Currently hardware with !IEEE80211_HW_PS_NULLFUNC_STACK and
IEEE80211_HW_REPORTS_TX_ACK_STATUS will never enter PSM due to the
conditions in the power save entry functions.

Fix those conditions.

Signed-off-by: Juuso Oikarinen <juuso.oikarinen@nokia.com>
Cc: stable@kernel.org
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-03-10 16:29:10 -05:00
Jouni Malinen 38a679a52b mac80211: Fix sta_mtx unlocking on insert STA failure path
Commit 34e895075e introduced sta_mtx
locking into sta_info_insert() (now sta_info_insert_rcu), but forgot
to unlock this mutex on one of the error paths. Fix this by adding
the missing mutex_unlock() call for the case where STA insert fails
due to an entry existing already. This may happen at least in AP mode
when a STA roams between two BSSes (vifs).

Signed-off-by: Jouni Malinen <j@w1.fi>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-03-10 16:16:54 -05:00
Eric Dumazet 3041f51707 net: Fix dev_mc_add()
Commit 6e17d45a (net: add addr len check to dev_mc_add)
added a bug in dev_mc_add(), since it can now exit with a lock
imbalance.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-10 07:32:28 -08:00
Eric Dumazet 0a141509ed net: Annotates neigh_invalidate()
Annotates neigh_invalidate() with __releases() and __acquires() for
sparse sake.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-10 07:32:28 -08:00
Eric Dumazet bb134d5d95 tcp: Fix tcp_v4_rcv()
Commit d218d111 (tcp: Generalized TTL Security Mechanism) added a bug
for TIMEWAIT sockets. We should not test min_ttl for TW sockets.

Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-10 07:32:27 -08:00
Neil Horman de58657146 tipc: filter out messages not intended for this host
Port commit 20deb48d16fdd07ce2fdc8d03ea317362217e085
from git://tipc.cslab.ericsson.net/pub/git/people/allan/tipc.git

Part of the large effort I'm trying to help with getting all the downstreamed
code from windriver forward ported to the upstream tree

Origional commit message
Restore check to filter out inadverdently received messages
This patch reimplements a check that allows TIPC to discard messages
that are not intended for it.  This check was present in TIPC 1.5/1.6,
but was removed by accident during the development of TIPC 1.7; it has
now been updated to account for new features present in TIPC 1.7 and
reinserted into TIPC.  The main benefit of this check is to filter
out messages arriving from orphaned link endpoints, which can arise
when a node exits the network and then re-enters it with a different
TIPC network address (i.e. <Z.C.N> value).

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Origionally-authored-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-08 12:43:56 -08:00
Neil Horman d88dca79d3 tipc: fix endianness on tipc subscriber messages
Remove htohl implementation from tipc

I was working on forward porting the downstream commits for TIPC and ran accross this one:
http://tipc.cslab.ericsson.net/cgi-bin/gitweb.cgi?p=people/allan/tipc.git;a=commitdiff;h=894279b9437b63cbb02405ad5b8e033b51e4e31e

I was going to just take it, when I looked closer and noted what it was doing.
This is basically a routine to byte swap fields of data in sent/received packets
for tipc, dependent upon the receivers guessed endianness of the peer when a
connection is established.  Asside from just seeming silly to me, it appears to
violate the latest RFC draft for tipc:
http://tipc.sourceforge.net/doc/draft-spec-tipc-02.txt
Which, according to section 4.2 and 4.3.3, requires that all fields of all
commands be sent in network byte order.  So instead of just taking this patch,
instead I'm removing the htohl function and replacing the calls with calls to
ntohl in the rx path and htonl in the send path.

As part of this fix, I'm also changing the subscr_cancel function, which
searches the list of subscribers, using a memcmp of the entire subscriber list,
for the entry to tear down.  unfortunately it memcmps the entire tipc_subscr
structure which has several bits that are private to the local side, so nothing
will ever match.  section 5.2 of the draft spec indicates the <type,upper,lower>
tuple should uniquely identify a subscriber, so convert subscr_cancel to just
match on those fields (properly endian swapped).

I've tested this using the tipc test suite, and its passed without issue.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-08 12:20:58 -08:00
Eric Dumazet f5c445ed41 ethtool: Use noinline_for_stack
Use self documenting noinline_for_stack instead of duplicated comments.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-08 12:17:04 -08:00
Joe Perches 81160e66cc net/sunrpc: Convert (void)snprintf to snprintf
(Applies on top of "Remove uses of NIPQUAD, use %pI4")

Casts to void of snprintf are most uncommon in kernel source.
9 use casts, 1301 do not.

Remove the remaining uses in net/sunrpc/

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-08 12:15:59 -08:00
Joe Perches fc0b579168 net/sunrpc: Remove uses of NIPQUAD, use %pI4
Originally submitted Jan 1, 2010
http://patchwork.kernel.org/patch/71221/

Convert NIPQUAD to the %pI4 format extension where possible
Convert %02x%02x%02x%02x/NIPQUAD to %08x/ntohl

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-08 12:15:28 -08:00
Eric Dumazet 28b2774a0d tcp: Fix tcp_make_synack()
Commit 4957faad (TCPCT part 1g: Responder Cookie => Initiator), part
of TCP_COOKIE_TRANSACTION implementation, forgot to correctly size
synack skb in case user data must be included.

Many thanks to Mika Pentillä for spotting this error.

Reported-by: Penttillä Mika <mika.penttila@ixonos.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-08 11:32:01 -08:00
Bian Naimeng 5fe46e9d73 rpc client can not deal with ENOSOCK, so translate it into ENOCONN
If NFSv4 client send a request before connect, or the old connection was broken
because a ETIMEOUT error catched by call_status, ->send_request will return
ENOSOCK, but rpc layer can not deal with it, so make sure ->send_request can
translate ENOSOCK into ENOCONN.

Signed-off-by: Bian Naimeng <biannm@cn.fujitsu.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-03-08 14:05:57 -05:00
Eric Dumazet 9837638727 net: fix route cache rebuilds
We added an automatic route cache rebuilding in commit 1080d709fb
but had to correct few bugs. One of the assumption of original patch,
was that entries where kept sorted in a given way.

This assumption is known to be wrong (commit 1ddbcb005c gave an
explanation of this and corrected a leak) and expensive to respect.

Paweł Staszewski reported to me one of his machine got its routing cache
disabled after few messages like :

[ 2677.850065] Route hash chain too long!
[ 2677.850080] Adjust your secret_interval!
[82839.662993] Route hash chain too long!
[82839.662996] Adjust your secret_interval!
[155843.731650] Route hash chain too long!
[155843.731664] Adjust your secret_interval!
[155843.811881] Route hash chain too long!
[155843.811891] Adjust your secret_interval!
[155843.858209] vlan0811: 5 rebuilds is over limit, route caching
disabled
[155843.858212] Route hash chain too long!
[155843.858213] Adjust your secret_interval!

This is because rt_intern_hash() might be fooled when computing a chain
length, because multiple entries with same keys can differ because of
TOS (or mark/oif) bits.

In the rare case the fast algorithm see a too long chain, and before
taking expensive path, we call a helper function in order to not count
duplicates of same routes, that only differ with tos/mark/oif bits. This
helper works with data already in cpu cache and is not be very
expensive, despite its O(N^2) implementation.

Paweł Staszewski sucessfully tested this patch on his loaded router.

Reported-and-tested-by: Paweł Staszewski <pstaszewski@itcare.pl>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-08 10:45:31 -08:00
Eric Dumazet 6cce09f87a tcp: Add SNMP counters for backlog and min_ttl drops
Commit 6b03a53a (tcp: use limited socket backlog) added the possibility
of dropping frames when backlog queue is full.

Commit d218d111 (tcp: Generalized TTL Security Mechanism) added the
possibility of dropping frames when TTL is under a given limit.

This patch adds new SNMP MIB entries, named TCPBacklogDrop and
TCPMinTTLDrop, published in /proc/net/netstat in TcpExt: line

netstat -s | egrep "TCPBacklogDrop|TCPMinTTLDrop"
    TCPBacklogDrop: 0
    TCPMinTTLDrop: 0

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-08 10:45:27 -08:00
Jiri Kosina 318ae2edc3 Merge branch 'for-next' into for-linus
Conflicts:
	Documentation/filesystems/proc.txt
	arch/arm/mach-u300/include/mach/debug-macro.S
	drivers/net/qlge/qlge_ethtool.c
	drivers/net/qlge/qlge_main.c
	drivers/net/typhoon.c
2010-03-08 16:55:37 +01:00
Emese Revfy 52cf25d0ab Driver core: Constify struct sysfs_ops in struct kobj_type
Constify struct sysfs_ops.

This is part of the ops structure constification
effort started by Arjan van de Ven et al.

Benefits of this constification:

 * prevents modification of data that is shared
   (referenced) by many other structure instances
   at runtime

 * detects/prevents accidental (but not intentional)
   modification attempts on archs that enforce
   read-only kernel data at runtime

 * potentially better optimized code as the compiler
   can assume that the const data cannot be changed

 * the compiler/linker move const data into .rodata
   and therefore exclude them from false sharing

Signed-off-by: Emese Revfy <re.emese@gmail.com>
Acked-by: David Teigland <teigland@redhat.com>
Acked-by: Matt Domsch <Matt_Domsch@dell.com>
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Acked-by: Hans J. Koch <hjk@linutronix.de>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-03-07 17:04:49 -08:00
Andi Kleen 28812fe11a driver-core: Add attribute argument to class_attribute show/store
Passing the attribute to the low level IO functions allows all kinds
of cleanups, by sharing low level IO code without requiring
an own function for every piece of data.

Also drivers can extend the attributes with own data fields
and use that in the low level function.

This makes the class attributes the same as sysdev_class attributes
and plain attributes.

This will allow further cleanups in drivers.

Full tree sweep converting all users.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-03-07 17:04:48 -08:00
Herbert Xu 10cc2b50eb bridge: Fix RCU race in br_multicast_stop
Thanks to Paul McKenny for pointing out that it is incorrect to use
synchronize_rcu_bh to ensure that pending callbacks have completed.
Instead we should use rcu_barrier_bh.

Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-07 15:31:12 -08:00
Herbert Xu 49f5fcfd4a bridge: Use RCU list primitive in __br_mdb_ip_get
As Paul McKenney correctly pointed out, __br_mdb_ip_get needs
to use the RCU list walking primitive in order to work correctly
on platforms where data-dependency ordering is not guaranteed.

Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-07 15:31:12 -08:00
YOSHIFUJI Hideaki / 吉藤英明 0c9a2ac1f8 ipv6: Optmize translation between IPV6_PREFER_SRC_xxx and RT6_LOOKUP_F_xxx.
IPV6_PREFER_SRC_xxx definitions:
| #define IPV6_PREFER_SRC_TMP             0x0001
| #define IPV6_PREFER_SRC_PUBLIC          0x0002
| #define IPV6_PREFER_SRC_COA             0x0004

RT6_LOOKUP_F_xxx definitions:
| #define RT6_LOOKUP_F_SRCPREF_TMP        0x00000008
| #define RT6_LOOKUP_F_SRCPREF_PUBLIC     0x00000010
| #define RT6_LOOKUP_F_SRCPREF_COA        0x00000020

So, we can translate between these two groups by shift operation
instead of multiple 'if's.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-07 15:25:53 -08:00
Dan Carpenter 72150e9b7f sock.c: potential null dereference
We test that "prot->rsk_prot" is non-null right before we dereference it
on this line.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-07 15:25:50 -08:00
Dan Carpenter 02a780c014 bridge: cleanup: remove unneed check
We dereference "port" on the lines immediately before and immediately
after the test so port should hopefully never be null here.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-07 15:25:49 -08:00
Linus Torvalds 05c5cb31ec Merge branch 'for-2.6.34' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.34' of git://linux-nfs.org/~bfields/linux: (22 commits)
  nfsd4: fix minor memory leak
  svcrpc: treat uid's as unsigned
  nfsd: ensure sockets are closed on error
  Revert "sunrpc: move the close processing after do recvfrom method"
  Revert "sunrpc: fix peername failed on closed listener"
  sunrpc: remove unnecessary svc_xprt_put
  NFSD: NFSv4 callback client should use RPC_TASK_SOFTCONN
  xfs_export_operations.commit_metadata
  commit_metadata export operation replacing nfsd_sync_dir
  lockd: don't clear sm_monitored on nsm_reboot_lookup
  lockd: release reference to nsm_handle in nlm_host_rebooted
  nfsd: Use vfs_fsync_range() in nfsd_commit
  NFSD: Create PF_INET6 listener in write_ports
  SUNRPC: NFS kernel APIs shouldn't return ENOENT for "transport not found"
  SUNRPC: Bury "#ifdef IPV6" in svc_create_xprt()
  NFSD: Support AF_INET6 in svc_addsock() function
  SUNRPC: Use rpc_pton() in ip_map_parse()
  nfsd: 4.1 has an rfc number
  nfsd41: Create the recovery entry for the NFSv4.1 client
  nfsd: use vfs_fsync for non-directories
  ...
2010-03-06 11:31:38 -08:00
H Hartley Sweeten 72c3368856 nodemask.h: remove macro any_online_node
The macro any_online_node() is prone to producing sparse warnings due to
the local symbol 'node'.  Since all the in-tree users are really
requesting the first online node (the mask argument is either
NODE_MASK_ALL or node_online_map) just use the first_online_node macro and
remove the any_online_node macro since there are no users.

Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Acked-by: David Rientjes <rientjes@google.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Milton Miller <miltonm@bga.com>
Cc: Nathan Fontenot <nfont@austin.ibm.com>
Cc: Geoff Levand <geoffrey.levand@am.sony.com>
Cc: Grant Likely <grant.likely@secretlab.ca>
Cc: J. Bruce Fields <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Benny Halevy <bhalevy@panasas.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-06 11:26:31 -08:00
Jeff Garzik d17792ebdf ethtool: Add direct access to ops->get_sset_count
On 03/04/2010 09:26 AM, Ben Hutchings wrote:
> On Thu, 2010-03-04 at 00:51 -0800, Jeff Kirsher wrote:
>> From: Jeff Garzik<jgarzik@redhat.com>
>>
>> This patch is an alternative approach for accessing string
>> counts, vs. the drvinfo indirect approach.  This way the drvinfo
>> space doesn't run out, and we don't break ABI later.
> [...]
>> --- a/net/core/ethtool.c
>> +++ b/net/core/ethtool.c
>> @@ -214,6 +214,10 @@ static noinline int ethtool_get_drvinfo(struct net_device *dev, void __user *use
>>   	info.cmd = ETHTOOL_GDRVINFO;
>>   	ops->get_drvinfo(dev,&info);
>>
>> +	/*
>> +	 * this method of obtaining string set info is deprecated;
>> +	 * consider using ETHTOOL_GSSET_INFO instead
>> +	 */
>
> This comment belongs on the interface (ethtool.h) not the
> implementation.

Debatable -- the current comment is located at the callsite of
ops->get_sset_count(), which is where an implementor might think to add
a new call.  Not all the numeric fields in ethtool_drvinfo are obtained
from ->get_sset_count().

Hence the "some" in the attached patch to include/linux/ethtool.h,
addressing your comment.

> [...]
>> +static noinline int ethtool_get_sset_info(struct net_device *dev,
>> +                                          void __user *useraddr)
>> +{
> [...]
>> +	/* calculate size of return buffer */
>> +	for (i = 0; i<  64; i++)
>> +		if (sset_mask&  (1ULL<<  i))
>> +			n_bits++;
> [...]
>
> We have a function for this:
>
> 	n_bits = hweight64(sset_mask);

Agreed.

I've attached a follow-up patch, which should enable my/Jeff's kernel
patch to be applied, followed by this one.

Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-05 14:00:17 -08:00
Jeff Garzik 723b2f57ad ethtool: Add direct access to ops->get_sset_count
This patch is an alternative approach for accessing string
counts, vs. the drvinfo indirect approach.  This way the drvinfo
space doesn't run out, and we don't break ABI later.

Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-05 14:00:17 -08:00
Zhu Yi a3a858ff18 net: backlog functions rename
sk_add_backlog -> __sk_add_backlog
sk_add_backlog_limited -> sk_add_backlog

Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-05 13:34:03 -08:00
Zhu Yi 2499849ee8 x25: use limited socket backlog
Make x25 adapt to the limited socket backlog change.

Cc: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-05 13:34:02 -08:00
Zhu Yi 53eecb1be5 tipc: use limited socket backlog
Make tipc adapt to the limited socket backlog change.

Cc: Jon Maloy <jon.maloy@ericsson.com>
Cc: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-05 13:34:02 -08:00
Zhu Yi 50b1a782f8 sctp: use limited socket backlog
Make sctp adapt to the limited socket backlog change.

Cc: Vlad Yasevich <vladislav.yasevich@hp.com>
Cc: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-05 13:34:01 -08:00
Zhu Yi 79545b6819 llc: use limited socket backlog
Make llc adapt to the limited socket backlog change.

Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-05 13:34:01 -08:00
Zhu Yi 55349790d7 udp: use limited socket backlog
Make udp adapt to the limited socket backlog change.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: "Pekka Savola (ipv6)" <pekkas@netcore.fi>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-05 13:34:00 -08:00
Zhu Yi 6b03a53a5a tcp: use limited socket backlog
Make tcp adapt to the limited socket backlog change.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: "Pekka Savola (ipv6)" <pekkas@netcore.fi>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-05 13:34:00 -08:00
Zhu Yi 8eae939f14 net: add limit for socket backlog
We got system OOM while running some UDP netperf testing on the loopback
device. The case is multiple senders sent stream UDP packets to a single
receiver via loopback on local host. Of course, the receiver is not able
to handle all the packets in time. But we surprisingly found that these
packets were not discarded due to the receiver's sk->sk_rcvbuf limit.
Instead, they are kept queuing to sk->sk_backlog and finally ate up all
the memory. We believe this is a secure hole that a none privileged user
can crash the system.

The root cause for this problem is, when the receiver is doing
__release_sock() (i.e. after userspace recv, kernel udp_recvmsg ->
skb_free_datagram_locked -> release_sock), it moves skbs from backlog to
sk_receive_queue with the softirq enabled. In the above case, multiple
busy senders will almost make it an endless loop. The skbs in the
backlog end up eat all the system memory.

The issue is not only for UDP. Any protocols using socket backlog is
potentially affected. The patch adds limit for socket backlog so that
the backlog size cannot be expanded endlessly.

Reported-by: Alex Shi <alex.shi@intel.com>
Cc: David Miller <davem@davemloft.net>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru
Cc: "Pekka Savola (ipv6)" <pekkas@netcore.fi>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Vlad Yasevich <vladislav.yasevich@hp.com>
Cc: Sridhar Samudrala <sri@us.ibm.com>
Cc: Jon Maloy <jon.maloy@ericsson.com>
Cc: Allan Stephens <allan.stephens@windriver.com>
Cc: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-05 13:33:59 -08:00
Linus Torvalds cc7889ff5e Merge branch 'nfs-for-2.6.34' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'nfs-for-2.6.34' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (44 commits)
  NFS: Remove requirement for inode->i_mutex from nfs_invalidate_mapping
  NFS: Clean up nfs_sync_mapping
  NFS: Simplify nfs_wb_page()
  NFS: Replace __nfs_write_mapping with sync_inode()
  NFS: Simplify nfs_wb_page_cancel()
  NFS: Ensure inode is always marked I_DIRTY_DATASYNC, if it has unstable pages
  NFS: Run COMMIT as an asynchronous RPC call when wbc->for_background is set
  NFS: Reduce the number of unnecessary COMMIT calls
  NFS: Add a count of the number of unstable writes carried by an inode
  NFS: Cleanup - move nfs_write_inode() into fs/nfs/write.c
  nfs41 fix NFS4ERR_CLID_INUSE for exchange id
  NFS: Fix an allocation-under-spinlock bug
  SUNRPC: Handle EINVAL error returns from the TCP connect operation
  NFSv4.1: Various fixes to the sequence flag error handling
  nfs4: renewd renew operations should take/put a client reference
  nfs41: renewd sequence operations should take/put client reference
  nfs: prevent backlogging of renewd requests
  nfs: kill renewd before clearing client minor version
  NFS: Make close(2) asynchronous when closing NFS O_DIRECT files
  NFS: Improve NFS iostat byte count accuracy for writes
  ...
2010-03-05 13:25:45 -08:00
Sripathi Kodi c5a7697da9 9P2010.L handshake: .L protocol negotiation
This patch adds 9P2010.L protocol negotiation with the server

Signed-off-by: Sripathi Kodi <sripathik@in.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-03-05 15:04:42 -06:00
Sripathi Kodi 342fee1d5c 9P2010.L handshake: Remove "dotu" variable
Removes 'dotu' variable and make everything dependent
on 'proto_version' field.

Signed-off-by: Sripathi Kodi <sripathik@in.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-03-05 15:04:42 -06:00
Sripathi Kodi 0fb80abd91 9P2010.L handshake: Add mount option
Add new mount V9FS mount option to specify protocol version

This patch adds a new mount option to specify protocol version.
With this option it is possible to use "-o version=" switch to
specify 9P protocol version to use. Valid options for version
are:
9p2000
9p2000.u
9p2010.L

Signed-off-by: Sripathi Kodi <sripathik@in.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-03-05 15:04:42 -06:00
Aneesh Kumar K.V c1a7c22620 net/9p: Handle mount errors correctly.
With this patch we have

# mount -t 9p -o trans=virtio virtio2 /mnt/
# mount -t 9p -o trans=virtio virtio2 /mnt/
mount: virtio2 already mounted or /mnt/ busy
mount: according to mtab, virtio2 is already mounted on /mnt
# mount -t 9p -o trans=virtio virtio3 /mnt/ -o debug=0xfff
mount: special device virtio3 does not exist

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-03-05 15:04:41 -06:00
Aneesh Kumar K.V 37c1209d41 net/9p: Remove MAX_9P_CHAN limit
Use a list to track the channel instead of statically
allocated array

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-03-05 15:04:41 -06:00
Aneesh Kumar K.V f75580c4af net/9p: Add multi channel support.
This is needed for supporting multiple mount points.

We can find out the device names to be used with mount by checking

/sys/devices/virtio-pci/virtio*/device file

if the device file have value 9 then the specific virtio device can
be used for mounting.

ex:
 #cat /sys/devices/virtio-pci/virtio1/device
 9

now we can mount using
# mount -t 9p -o trans=virtio virtio1  /mnt/

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-03-05 15:04:41 -06:00
Trond Myklebust 3fa04ecd72 Merge branch 'writeback-for-2.6.34' into nfs-for-2.6.34 2010-03-05 15:46:18 -05:00
J. Bruce Fields 4ea41e2de5 Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs into for-2.6.34-incoming
Resolve merge conflict in fs/xfs/linux-2.6/xfs_export.c.
2010-03-04 12:04:51 -05:00
Linus Torvalds 0f2cc4ecd8 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (52 commits)
  init: Open /dev/console from rootfs
  mqueue: fix typo "failues" -> "failures"
  mqueue: only set error codes if they are really necessary
  mqueue: simplify do_open() error handling
  mqueue: apply mathematics distributivity on mq_bytes calculation
  mqueue: remove unneeded info->messages initialization
  mqueue: fix mq_open() file descriptor leak on user-space processes
  fix race in d_splice_alias()
  set S_DEAD on unlink() and non-directory rename() victims
  vfs: add NOFOLLOW flag to umount(2)
  get rid of ->mnt_parent in tomoyo/realpath
  hppfs can use existing proc_mnt, no need for do_kern_mount() in there
  Mirror MS_KERNMOUNT in ->mnt_flags
  get rid of useless vfsmount_lock use in put_mnt_ns()
  Take vfsmount_lock to fs/internal.h
  get rid of insanity with namespace roots in tomoyo
  take check for new events in namespace (guts of mounts_poll()) to namespace.c
  Don't mess with generic_permission() under ->d_lock in hpfs
  sanitize const/signedness for udf
  nilfs: sanitize const/signedness in dealing with ->d_name.name
  ...

Fix up fairly trivial (famous last words...) conflicts in
drivers/infiniband/core/uverbs_main.c and security/tomoyo/realpath.c
2010-03-04 08:15:33 -08:00
Neil Horman d0021b252e tipc: Fix oops on send prior to entering networked mode (v3)
Fix TIPC to disallow sending to remote addresses prior to entering NET_MODE

user programs can oops the kernel by sending datagrams via AF_TIPC prior to
entering networked mode.  The following backtrace has been observed:

ID: 13459  TASK: ffff810014640040  CPU: 0   COMMAND: "tipc-client"
[exception RIP: tipc_node_select_next_hop+90]
RIP: ffffffff8869d3c3  RSP: ffff81002d9a5ab8  RFLAGS: 00010202
RAX: 0000000000000001  RBX: 0000000000000001  RCX: 0000000000000001
RDX: 0000000000000000  RSI: 0000000000000001  RDI: 0000000001001001
RBP: 0000000001001001   R8: 0074736575716552   R9: 0000000000000000
R10: ffff81003fbd0680  R11: 00000000000000c8  R12: 0000000000000008
R13: 0000000000000001  R14: 0000000000000001  R15: ffff810015c6ca00
ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
RIP: 0000003cbd8d49a3  RSP: 00007fffc84e0be8  RFLAGS: 00010206
RAX: 000000000000002c  RBX: ffffffff8005d116  RCX: 0000000000000000
RDX: 0000000000000008  RSI: 00007fffc84e0c00  RDI: 0000000000000003
RBP: 0000000000000000   R8: 00007fffc84e0c10   R9: 0000000000000010
R10: 0000000000000000  R11: 0000000000000246  R12: 0000000000000000
R13: 00007fffc84e0d10  R14: 0000000000000000  R15: 00007fffc84e0c30
ORIG_RAX: 000000000000002c  CS: 0033  SS: 002b

What happens is that, when the tipc module in inserted it enters a standalone
node mode in which communication to its own address is allowed <0.0.0> but not
to other addresses, since the appropriate data structures have not been
allocated yet (specifically the tipc_net pointer).  There is nothing stopping a
client from trying to send such a message however, and if that happens, we
attempt to dereference tipc_net.zones while the pointer is still NULL, and
explode.  The fix is pretty straightforward.  Since these oopses all arise from
the dereference of global pointers prior to their assignment to allocated
values, and since these allocations are small (about 2k total), lets convert
these pointers to static arrays of the appropriate size.  All the accesses to
these bits consider 0/NULL to be a non match when searching, so all the lookups
still work properly, and there is no longer a chance of a bad dererence
anywhere.  As a bonus, this lets us eliminate the setup/teardown routines for
those pointers, and elimnates the need to preform any locking around them to
prevent access while their being allocated/freed.

I've updated the tipc_net structure to behave this way to fix the exact reported
problem, and also fixed up the tipc_bearers and media_list arrays to fix an
obvious simmilar problem that arises from issuing tipc-config commands to
manipulate bearers/links prior to entering networked mode

I've tested this for a few hours by running the sanity tests and stress test
with the tipcutils suite, and nothing has fallen over.  There have been a few
lockdep warnings, but those were there before, and can be addressed later, as
they didn't actually result in any deadlock.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Allan Stephens <allan.stephens@windriver.com>
CC: David S. Miller <davem@davemloft.net>
CC: tipc-discussion@lists.sourceforge.net

 bearer.c |   37 ++++++-------------------------------
 bearer.h |    2 +-
 net.c    |   25 ++++---------------------
 3 files changed, 11 insertions(+), 53 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-04 00:53:52 -08:00
Timo Teräs 6d55cb91a0 gre: fix hard header destination address checking
ipgre_header() can be called with zero daddr when the gre device is
configured as multipoint tunnel and still has the NOARP flag set (which is
typically cleared by the userspace arp daemon).  If the NOARP packets are
not dropped, ipgre_tunnel_xmit() will take rt->rt_gateway (= NBMA IP) and
use that for route look up (and may lead to bogus xfrm acquires).

The multicast address check is removed as sending to multicast group should
be ok.  In fact, if gre device has a multicast address as destination
ipgre_header is always called with multicast address.

Signed-off-by: Timo Teras <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-04 00:53:52 -08:00
stephen hemminger 8f37ada5b5 IPv6: fix race between cleanup and add/delete address
This solves a potential race problem during the cleanup process.
The issue is that addrconf_ifdown() needs to traverse address list,
but then drop lock to call the notifier. The version in -next
could get confused if add/delete happened during this window.
Original code (2.6.32 and earlier) was okay because all addresses
were always deleted.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-04 00:39:34 -08:00
stephen hemminger 84e8b803f1 IPv6: addrconf notify when address is unavailable
My recent change in net-next to retain permanent addresses caused regression.
Device refcount would not go to zero when device was unregistered because
left over anycast reference would hold ipv6 dev reference which would hold
device references...

The correct procedure is to call notify chain when address is no longer
available for use.  When interface comes back DAD timer will notify
back that address is available.

Also, link local addresses should be purged when interface is brought
down. The address might be changed.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-04 00:39:33 -08:00
stephen hemminger 5b2a19539c IPv6: addrconf timer race
The Router Solicitation timer races with device state changes
because it doesn't lock the device. Use local variable to avoid
one repeated dereference.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-04 00:39:33 -08:00
stephen hemminger 122e4519cd IPv6: addrconf dad timer unnecessary bh_disable
Timer code runs in bottom half, so there is no need for
using _bh form of locking.  Also check if device is not ready
to avoid race with address that is no longer active.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-04 00:39:32 -08:00
David S. Miller e5c1a0aa00 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2010-03-03 22:42:54 -08:00
Sujith 4fa0043731 mac80211: Fix HT rate control configuration
Handling HT configuration changes involved setting the channel
with the new HT parameters and then issuing a rate_update()
notification to the driver.

This behavior changed after the off-channel changes. Now, the channel
is not updated with the new HT params in enable_ht() - instead, it
is now done when the scan work terminates. This results in the driver
depending on stale information, defaulting to non-HT mode always.

Fix this by passing the new channel type to the driver.

Cc: stable@kernel.org
Signed-off-by: Sujith <Sujith.Manoharan@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-03-03 15:39:21 -05:00
Al Viro fc7bed8c80 Don't bother with d_genocide in rpc_pipe
kill_litter_super() from ->kill_sb() will take care of the junk
2010-03-03 14:07:54 -05:00
Randy Dunlap 1cd4efddc4 bridge: depends on INET
br_multicast calls ip_send_check(), so it should depend on INET.

built-in:
br_multicast.c:(.text+0x88cf4): undefined reference to `ip_send_check'

or modular:
ERROR: "ip_send_check" [net/bridge/bridge.ko] undefined!

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-03 01:23:22 -08:00
Marcel Holtmann d4612cb86e Bluetooth: Use single_open() for inquiry cache within debugfs
The inquiry cache information in debugfs should be using seq_file support
and not allocating memory on the stack for the string. Since the usage of
these information is really seldom, using single_open() for it is good
enough.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-03 01:04:38 -08:00
Jiri Pirko 1162563f82 af_packet: move strict addr_len check right before dev_[mc/unicast]_[add/del]
My previous patch 914c8ad2d1 incorrectly changed
the length check in packet_mc_add to be more strict. The problem is that
userspace is not filling this field (and it stays zeroed) in case of setting
PACKET_MR_PROMISC or PACKET_MR_ALLMULTI. So move the strict check to the point
in path where the addr_len must be set correctly.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Reported-by: Pavel Roskin <proski@gnu.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-03 01:04:38 -08:00
Herbert Xu 87c1e12b5e ipsec: Fix bogus bundle flowi
When I merged the bundle creation code, I introduced a bogus
flowi value in the bundle.  Instead of getting from the caller,
it was instead set to the flow in the route object, which is
totally different.

The end result is that the bundles we created never match, and
we instead end up with an ever growing bundle list.

Thanks to Jamal for find this problem.

Reported-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-03-03 01:04:37 -08:00