Commit Graph

39010 Commits

Author SHA1 Message Date
J. Bruce Fields edc7a89403 nfsd: provide callbacks on svc_xprt deletion
NFSv4.1 needs warning when a client tcp connection goes down, if that
connection is being used as a backchannel, so that it can warn the
client that it has lost the backchannel connection.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-10-01 19:29:44 -04:00
J. Bruce Fields c7662518c7 nfsd4: keep per-session list of connections
The spec requires us in various places to keep track of the connections
associated with each session.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-10-01 19:29:44 -04:00
J. Bruce Fields 1e7af1b806 nfsd4: remove spkm3
Unfortunately, spkm3 never got very far; while interoperability with one
other implementation was demonstrated at some point, problems were found
with the spec that were deemed not worth fixing.

The kernel code is useless on its own without nfs-utils patches which
were never merged into nfs-utils, and were only ever available from
citi.umich.edu.  They appear not to have been updated since 2005.

Therefore it seems safe to assume that this code has no users, and never
will.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-10-01 18:09:55 -04:00
Pavel Emelyanov 721db93a55 net: Export __sock_create
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-10-01 17:18:59 -04:00
Pavel Emelyanov 37aa213373 sunrpc: Tag rpc_xprt with net
The net is known from the xprt_create and this tagging will also
give un the context in the conntection workers where real sockets
are created.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-10-01 17:18:58 -04:00
Pavel Emelyanov 9a23e332ec sunrpc: Add net to xprt_create
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-10-01 17:18:57 -04:00
Pavel Emelyanov c653ce3f0a sunrpc: Add net to rpc_create_args
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-10-01 17:18:56 -04:00
Pavel Emelyanov 62832c039e sunrpc: Pull net argument downto svc_create_socket
After this the socket creation in it knows the context.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-10-01 17:18:55 -04:00
Pavel Emelyanov fc5d00b04a sunrpc: Add net argument to svc_create_xprt
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-10-01 17:18:54 -04:00
Pavel Emelyanov e204e621b4 sunrpc: Factor out rpc_xprt freeing
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-10-01 17:18:53 -04:00
Pavel Emelyanov bd1722d431 sunrpc: Factor out rpc_xprt allocation
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-10-01 17:18:52 -04:00
Stephen Rothwell c135e84afb sunrpc: fix up rpcauth_remove_module section mismatch
On Wed, 29 Sep 2010 14:02:38 +1000 Stephen Rothwell <sfr@canb.auug.org.au> wrote:
>
> After merging the final tree, today's linux-next build (powerpc
> ppc44x_defconfig) produced tis warning:
>
> WARNING: net/sunrpc/sunrpc.o(.init.text+0x110): Section mismatch in reference from the function init_sunrpc() to the function .exit.text:rpcauth_remove_module()
> The function __init init_sunrpc() references
> a function __exit rpcauth_remove_module().
> This is often seen when error handling in the init function
> uses functionality in the exit path.
> The fix is often to remove the __exit annotation of
> rpcauth_remove_module() so it may be used outside an exit section.
>
> Probably caused by commit 2f72c9b737
> ("sunrpc: The per-net skeleton").

This actually causes a build failure on a sparc32 defconfig build:

`rpcauth_remove_module' referenced in section `.init.text' of net/built-in.o: defined in discarded section `.exit.text' of net/built-in.o

I applied the following patch for today:

Fixes:

`rpcauth_remove_module' referenced in section `.init.text' of net/built-in.o: defined in discarded section `.exit.text' of net/built-in.o

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-09-29 12:27:37 -04:00
Pavel Emelyanov 4f42d0d53c sunrpc: Make the /proc/net/rpc appear in net namespaces
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-09-27 10:16:12 -04:00
Pavel Emelyanov 4fb8518bda sunrpc: Tag svc_xprt with net
The transport representation should be per-net of course.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-09-27 10:16:12 -04:00
Pavel Emelyanov 593ce16b94 sunrpc: Add routines that allow registering per-net caches
Existing calls do the same, but for the init_net.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-09-27 10:16:11 -04:00
Pavel Emelyanov 352114f395 sunrpc: Add net to pure API calls
There are two calls that operate on ip_map_cache and are
directly called from the nfsd code. Other places will be
handled in a different way.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-09-27 10:16:11 -04:00
Pavel Emelyanov e3bfca01c1 sunrpc: Make xprt auth cache release work with the xprt
This is done in order to facilitate getting the ip_map_cache from
which to put the ip_map.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-09-27 10:16:11 -04:00
NeilBrown 1117449276 sunrpc/cache: change deferred-request hash table to use hlist.
Being a hash table, hlist is the best option.

There is currently some ugliness were we treat "->next == NULL" as
a special case to avoid having to initialise the whole array.
This change nicely gets rid of that case.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-09-21 21:51:11 -04:00
NeilBrown 1ebede86b8 sunrpc: close connection when a request is irretrievably lost.
If we drop a request in the sunrpc layer, either due kmalloc failure,
or due to a cache miss when we could not queue the request for later
replay, then close the connection to encourage the client to retry sooner.

Note that if the drop happens in the NFS layer, NFSERR_JUKEBOX
(aka NFS4ERR_DELAY) is returned to guide the client concerning
replay.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-09-21 16:57:49 -04:00
J. Bruce Fields c88739b373 Merge remote branch 'trond/bugfixes' into for-2.6.37
Without some client-side fixes, server testing is currently difficult.
2010-09-19 23:48:32 -04:00
Trond Myklebust 006abe887c SUNRPC: Fix a race in rpc_info_open
There is a race between rpc_info_open and rpc_release_client()
in that nothing stops a process from opening the file after
the clnt->cl_kref goes to zero.

Fix this by using atomic_inc_unless_zero()...

Reported-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
2010-09-12 19:55:25 -04:00
Linus Torvalds 002e473d1c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (28 commits)
  ipheth: remove incorrect devtype to WWAN
  MAINTAINERS: Add CAIF
  sctp: fix test for end of loop
  KS8851: Correct RX packet allocation
  udp: add rehash on connect()
  net: blackhole route should always be recalculated
  ipv4: Suppress lockdep-RCU false positive in FIB trie (3)
  niu: Fix kernel buffer overflow for ETHTOOL_GRXCLSRLALL
  ipvs: fix active FTP
  gro: Re-fix different skb headrooms
  via-velocity: Turn scatter-gather support back off.
  ipv4: Fix reverse path filtering with multipath routing.
  UNIX: Do not loop forever at unix_autobind().
  PATCH: b44 Handle RX FIFO overflow better (simplified)
  irda: off by one
  3c59x: Fix deadlock in vortex_error()
  netfilter: discard overlapping IPv6 fragment
  ipv6: discard overlapping fragment
  net: fix tx queue selection for bridged devices implementing select_queue
  bonding: Fix jiffies overflow problems (again)
  ...

Fix up trivial conflicts due to the same cgroup API thinko fix going
through both Andrew and the networking tree.  However, there were small
differences between the two, with Andrew's version generally being the
nicer one, and the one I merged first. So pick that one.

Conflicts in: include/linux/cgroup.h and kernel/cgroup.c
2010-09-11 08:06:38 -07:00
Linus Torvalds ff3cb3fec3 Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  block: Range check cpu in blk_cpu_to_group
  scatterlist: prevent invalid free when alloc fails
  writeback: Fix lost wake-up shutting down writeback thread
  writeback: do not lose wakeup events when forking bdi threads
  cciss: fix reporting of max queue depth since init
  block: switch s390 tape_block and mg_disk to elevator_change()
  block: add function call to switch the IO scheduler from a driver
  fs/bio-integrity.c: return -ENOMEM on kmalloc failure
  bio-integrity.c: remove dependency on __GFP_NOFAIL
  BLOCK: fix bio.bi_rw handling
  block: put dev->kobj in blk_register_queue fail path
  cciss: handle allocation failure
  cfq-iosched: Documentation help for new tunables
  cfq-iosched: blktrace print per slice sector stats
  cfq-iosched: Implement tunable group_idle
  cfq-iosched: Do group share accounting in IOPS when slice_idle=0
  cfq-iosched: Do not idle if slice_idle=0
  cciss: disable doorbell reset on reset_devices
  blkio: Fix return code for mkdir calls
2010-09-10 07:26:27 -07:00
David S. Miller 053d8f6622 Merge branch 'vhost-net' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost 2010-09-09 21:59:51 -07:00
Linus Torvalds df423dc7f2 Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
  libata-sff: Reenable Port Multiplier after libata-sff remodeling.
  libata: skip EH autopsy and recovery during suspend
  ahci: AHCI and RAID mode SATA patch for Intel Patsburg DeviceIDs
  ata_piix: IDE Mode SATA patch for Intel Patsburg DeviceIDs
  libata,pata_via: revert ata_wait_idle() removal from ata_sff/via_tf_load()
  ahci: fix hang on failed softreset
  pata_artop: Fix device ID parity check
2010-09-09 20:28:19 -07:00
Gwendal Grignou ea3c64506e libata-sff: Reenable Port Multiplier after libata-sff remodeling.
Keep track of the link on the which the current request is in progress.
It allows support of links behind port multiplier.

Not all libata-sff is PMP compliant. Code for native BMDMA controller
does not take in accound PMP.

Tested on Marvell 7042 and Sil7526.

Signed-off-by: Gwendal Grignou <gwendal@google.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2010-09-09 22:31:55 -04:00
Tejun Heo e2f3d75fc0 libata: skip EH autopsy and recovery during suspend
For some mysterious reason, certain hardware reacts badly to usual EH
actions while the system is going for suspend.  As the devices won't
be needed until the system is resumed, ask EH to skip usual autopsy
and recovery and proceed directly to suspend.

Signed-off-by: Tejun Heo <tj@kernel.org>
Tested-by: Stephan Diestelhorst <stephan.diestelhorst@amd.com>
Cc: stable@kernel.org
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2010-09-09 22:27:59 -04:00
Christoph Lameter aa45484031 mm: page allocator: calculate a better estimate of NR_FREE_PAGES when memory is low and kswapd is awake
Ordinarily watermark checks are based on the vmstat NR_FREE_PAGES as it is
cheaper than scanning a number of lists.  To avoid synchronization
overhead, counter deltas are maintained on a per-cpu basis and drained
both periodically and when the delta is above a threshold.  On large CPU
systems, the difference between the estimated and real value of
NR_FREE_PAGES can be very high.  If NR_FREE_PAGES is much higher than
number of real free page in buddy, the VM can allocate pages below min
watermark, at worst reducing the real number of pages to zero.  Even if
the OOM killer kills some victim for freeing memory, it may not free
memory if the exit path requires a new page resulting in livelock.

This patch introduces a zone_page_state_snapshot() function (courtesy of
Christoph) that takes a slightly more accurate view of an arbitrary vmstat
counter.  It is used to read NR_FREE_PAGES while kswapd is awake to avoid
the watermark being accidentally broken.  The estimate is not perfect and
may result in cache line bounces but is expected to be lighter than the
IPI calls necessary to continually drain the per-cpu counters while kswapd
is awake.

Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-09 18:57:25 -07:00
Hugh Dickins 3399446632 swap: discard while swapping only if SWAP_FLAG_DISCARD
Tests with recent firmware on Intel X25-M 80GB and OCZ Vertex 60GB SSDs
show a shift since I last tested in December: in part because of firmware
updates, in part because of the necessary move from barriers to awaiting
completion at the block layer.  While discard at swapon still shows as
slightly beneficial on both, discarding 1MB swap cluster when allocating
is now disadvanteous: adds 25% overhead on Intel, adds 230% on OCZ (YMMV).

Surrender: discard as presently implemented is more hindrance than help
for swap; but might prove useful on other devices, or with improvements.
So continue to do the discard at swapon, but make discard while swapping
conditional on a SWAP_FLAG_DISCARD to sys_swapon() (which has been using
only the lower 16 bits of int flags).

We can add a --discard or -d to swapon(8), and a "discard" to swap in
/etc/fstab: matching the mount option for btrfs, ext4, fat, gfs2, nilfs2.

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Nigel Cunningham <nigel@tuxonice.net>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <jaxboe@fusionio.com>
Cc: James Bottomley <James.Bottomley@hansenpartnership.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-09 18:57:25 -07:00
Hugh Dickins 910321ea81 swap: revert special hibernation allocation
Please revert 2.6.36-rc commit d2997b1042
"hibernation: freeze swap at hibernation".  It complicated matters by
adding a second swap allocation path, just for hibernation; without in any
way fixing the issue that it was intended to address - page reclaim after
fixing the hibernation image might free swap from a page already imaged as
swapcache, letting its swap be reallocated to store a different page of
the image: resulting in data corruption if the imaged page were freed as
clean then swapped back in.  Pages freed to si->swap_map were still in
danger of being reallocated by the alternative allocation path.

I guess it inadvertently fixed slow SSD swap allocation for hibernation,
as reported by Nigel Cunningham: by missing out the discards that occur on
the usual swap allocation path; but that was unintentional, and needs a
separate fix.

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Ondrej Zary <linux@rainbow-software.org>
Cc: Andrea Gelmini <andrea.gelmini@gmail.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Nigel Cunningham <nigel@tuxonice.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-09 18:57:25 -07:00
David Brownell c956126c13 gpio: doc updates
There's been some recent confusion about error checking GPIO numbers.
briefly, it should be handled mostly during setup, when gpio_request() is
called, and NEVER by expectig gpio_is_valid to report more than
never-usable GPIO numbers.

[akpm@linux-foundation.org: terminate unterminated comment]
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Cc: Eric Miao" <eric.y.miao@gmail.com>
Cc: "Ryan Mallon" <ryan@bluewatersys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-09 18:57:24 -07:00
Gregory Bean 5affb60772 gpio: sx150x: correct and refine reset-on-probe behavior
Replace the arbitrary software-reset call from the device-probe
method, because:

- It is defective.  To work correctly, it should be two byte writes,
  not a single word write.  As it stands, it does nothing.

- Some devices with sx150x expanders installed have their NRESET pins
  ganged on the same line, so resetting one causes the others to reset -
  not a nice thing to do arbitrarily!

- The probe, usually taking place at boot, implies a recent hard-reset,
  so a software reset at this point is just a waste of energy anyway.

Therefore, make it optional, defaulting to off, as this will match the
common case of probing at powerup and also matches the current broken
no-op behavior.

Signed-off-by: Gregory Bean <gbean@codeaurora.org>
Reviewed-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-09 18:57:24 -07:00
Andrea Arcangeli 4969c1192d mm: fix swapin race condition
The pte_same check is reliable only if the swap entry remains pinned (by
the page lock on swapcache).  We've also to ensure the swapcache isn't
removed before we take the lock as try_to_free_swap won't care about the
page pin.

One of the possible impacts of this patch is that a KSM-shared page can
point to the anon_vma of another process, which could exit before the page
is freed.

This can leave a page with a pointer to a recycled anon_vma object, or
worse, a pointer to something that is no longer an anon_vma.

[riel@redhat.com: changelog help]
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-09 18:57:24 -07:00
Michael S. Tsirkin 31583bb0cf cgroups: fix API thinko
Add cgroup_attach_task_all()

The existing cgroup_attach_task_current_cg() API is called by a thread to
attach another thread to all of its cgroups; this is unsuitable for cases
where a privileged task wants to attach itself to the cgroups of a less
privileged one, since the call must be made from the context of the target
task.

This patch adds a more generic cgroup_attach_task_all() API that allows
both the source task and to-be-moved task to be specified.
cgroup_attach_task_current_cg() becomes a specialization of the more
generic new function.

[menage@google.com: rewrote changelog]
[akpm@linux-foundation.org: address reviewer comments]
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Ben Blum <bblum@google.com>
Cc: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-09 18:57:23 -07:00
Huang Ying e0bf1024b3 kfifo: add parenthesis for macro parameter reference
Some macro parameter references inside typeof() operator are not enclosed
with parenthesis.  It should be safer to add them.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Acked-by: Stefani Seibold <stefani@seibold.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-09 18:57:22 -07:00
David Vrabel f3c65b2870 mmc: avoid getting CID on SDIO-only cards
The introduction of support for SD combo cards breaks the initialization
of all CSR SDIO chips.  The GO_IDLE (CMD0) in mmc_sd_get_cid() causes CSR
chips to be reset (this is non-standard behavior).

When initializing an SDIO card check for a combo card by using the memory
present bit in the R4 response to IO_SEND_OP_COND (CMD5).  This avoids the
call to mmc_sd_get_cid() on an SDIO-only card.

Signed-off-by: David Vrabel <david.vrabel@csr.com>
Acked-by: Michal Mirolaw <mirq-linux@rere.qmqm.pl>
Cc: <linux-mmc@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-09 18:57:22 -07:00
Jonathan Corbet a73f8844e1 lglock: make lg_lock_global() actually lock globally
lg_lock_global() currently only acquires spinlocks for online CPUs, but
it's meant to lock all possible CPUs.  Lglock-protected resources may be
associated with removed CPUs - and, indeed, that could happen with the
per-superblock open files lists.

At Nick's suggestion, change for_each_online_cpu() to
for_each_possible_cpu() to protect accesses to those resources.

Cc: Al Viro <viro@ZenIV.linux.org.uk>
Acked-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-09 09:09:43 -07:00
Stefan Bader 39aa3cb3e8 mm: Move vma_stack_continue into mm.h
So it can be used by all that need to check for that.

Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-09 09:05:06 -07:00
David S. Miller e199e6136c Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6 2010-09-08 23:49:04 -07:00
Eric Dumazet 719f835853 udp: add rehash on connect()
commit 30fff923 introduced in linux-2.6.33 (udp: bind() optimisation)
added a secondary hash on UDP, hashed on (local addr, local port).

Problem is that following sequence :

fd = socket(...)
connect(fd, &remote, ...)

not only selects remote end point (address and port), but also sets
local address, while UDP stack stored in secondary hash table the socket
while its local address was INADDR_ANY (or ipv6 equivalent)

Sequence is :
 - autobind() : choose a random local port, insert socket in hash tables
              [while local address is INADDR_ANY]
 - connect() : set remote address and port, change local address to IP
              given by a route lookup.

When an incoming UDP frame comes, if more than 10 sockets are found in
primary hash table, we switch to secondary table, and fail to find
socket because its local address changed.

One solution to this problem is to rehash datagram socket if needed.

We add a new rehash(struct socket *) method in "struct proto", and
implement this method for UDP v4 & v6, using a common helper.

This rehashing only takes care of secondary hash table, since primary
hash (based on local port only) is not changed.

Reported-by: Krzysztof Piotr Oledzki <ole@ans.pl>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Krzysztof Piotr Oledzki <ole@ans.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-08 21:45:01 -07:00
Linus Torvalds 2c20130f20 Merge branch 'semaphore-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'semaphore-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  semaphore: Add DEFINE_SEMAPHORE
2010-09-08 11:15:51 -07:00
Linus Torvalds 1faa6ec8cc Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, mcheck: Avoid duplicate sysfs links/files for thresholding banks
  io-mapping: Fix the address space annotations
  x86: Fix the address space annotations of iomap_atomic_prot_pfn()
  x86, mm: Fix CONFIG_VMSPLIT_1G and 2G_OPT trampoline
  x86, hwmon: Fix unsafe smp_processor_id() in thermal_throttle_add_dev
2010-09-08 11:14:10 -07:00
Linus Torvalds 79637a41e4 Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  gcc-4.6: kernel/*: Fix unused but set warnings
  mutex: Fix annotations to include it in kernel-locking docbook
  pid: make setpgid() system call use RCU read-side critical section
  MAINTAINERS: Add RCU's public git tree
2010-09-08 11:13:42 -07:00
Julian Anastasov 6523ce1525 ipvs: fix active FTP
- Do not create expectation when forwarding the PORT
  command to avoid blocking the connection. The problem is that
  nf_conntrack_ftp.c:help() tries to create the same expectation later in
  POST_ROUTING and drops the packet with "dropping packet" message after
  failure in nf_ct_expect_related.

- Change ip_vs_update_conntrack to alter the conntrack
  for related connections from real server. If we do not alter the reply in
  this direction the next packet from client sent to vport 20 comes as NEW
  connection. We alter it but may be some collision happens for both
  conntracks and the second conntrack gets destroyed immediately. The
  connection stucks too.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-08 10:39:57 -07:00
Thomas Gleixner febc88c594 semaphore: Add DEFINE_SEMAPHORE
The full cleanup of init_MUTEX[_LOCKED] and DECLARE_MUTEX has not been
done. Some of the users are real semaphores and we should name them as
such instead of confusing everyone with "MUTEX".

Provide the infrastructure to get finally rid of init_MUTEX[_LOCKED]
and DECLARE_MUTEX.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Christoph Hellwig <hch@infradead.org>
LKML-Reference: <20100907125054.795929962@linutronix.de>
2010-09-08 15:04:10 +02:00
NeilBrown f16b6e8d83 sunrpc/cache: allow threads to block while waiting for cache update.
The current practice of waiting for cache updates by queueing the
whole request to be retried has (at least) two problems.

1/ With NFSv4, requests can be quite complex and re-trying a whole
  request when a later part fails should only be a last-resort, not a
  normal practice.

2/ Large requests, and in particular any 'write' request, will not be
  queued by the current code and doing so would be undesirable.

In many cases only a very sort wait is needed before the cache gets
valid data.

So, providing the underlying transport permits it by setting
 ->thread_wait,
arrange to wait briefly for an upcall to be completed (as reflected in
the clearing of CACHE_PENDING).
If the short wait was not long enough and CACHE_PENDING is still set,
fall back on the old approach.

The 'thread_wait' value is set to 5 seconds when there are spare
threads, and 1 second when there are no spare threads.

These values are probably much higher than needed, but will ensure
some forward progress.

Note that as we only request an update for a non-valid item, and as
non-valid items are updated in place it is extremely unlikely that
cache_check will return -ETIMEDOUT.  Normally cache_defer_req will
sleep for a short while and then find that the item is_valid.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-09-07 19:22:07 -04:00
NeilBrown c5b29f885a sunrpc: use seconds since boot in expiry cache
This protects us from confusion when the wallclock time changes.

We convert to and from wallclock when  setting or reading expiry
times.

Also use seconds since boot for last_clost time.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-09-07 19:21:20 -04:00
NeilBrown 17cebf658e sunrpc: extract some common sunrpc_cache code from nfsd
Rather can duplicating this idiom twice, put it in an inline function.
This reduces the usage of 'expiry_time' out side the sunrpc/cache.c
code and thus the impact of a change that is about to be made to that
field.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-09-07 19:21:19 -04:00
Linus Torvalds d56557af19 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
  PCI: bus speed strings should be const
  PCI hotplug: Fix build with CONFIG_ACPI unset
  PCI: PCIe: Remove the port driver module exit routine
  PCI: PCIe: Move PCIe PME code to the pcie directory
  PCI: PCIe: Disable PCIe port services during port initialization
  PCI: PCIe: Ask BIOS for control of all native services at once
  ACPI/PCI: Negotiate _OSC control bits before requesting them
  ACPI/PCI: Do not preserve _OSC control bits returned by a query
  ACPI/PCI: Make acpi_pci_query_osc() return control bits
  ACPI/PCI: Reorder checks in acpi_pci_osc_control_set()
  PCI: PCIe: Introduce commad line switch for disabling port services
  PCI: PCIe AER: Introduce pci_aer_available()
  x86/PCI: only define pci_domain_nr if PCI and PCI_DOMAINS are set
  PCI: provide stub pci_domain_nr function for !CONFIG_PCI configs
2010-09-07 16:00:17 -07:00
Linus Torvalds a44a553f82 Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
  powerpc/pseries: Correct rtas_data_buf locking in dlpar code
  powerpc/85xx: Add P1021 PCI IDs and quirks
  arch/powerpc/sysdev/qe_lib/qe.c: Add of_node_put to avoid memory leak
  arch/powerpc/platforms/83xx/mpc837x_mds.c: Add missing iounmap
  fsl_rio: fix compile errors
  powerpc/85xx: Fix compile issue with p1022_ds due to lmb rename to memblock
  powerpc/85xx: Fix compilation of mpc85xx_mds.c
  powerpc: Don't use kernel stack with translation off
  powerpc/perf_event: Reduce latency of calling perf_event_do_pending
  powerpc/kexec: Adds correct calling convention for kexec purgatory
2010-09-07 14:34:37 -07:00