167740 Commits

Author SHA1 Message Date
Ian Campbell
bc2c030322 xen: try harder to balloon up under memory pressure.
Currently if the balloon driver is unable to increase the guest's
reservation it assumes the failure was due to reaching its full
allocation, gives up on the ballooning operation and records the limit
it reached as the "hard limit". The driver will not try again until
the target is set again (even to the same value).

However it is possible that ballooning has in fact failed due to
memory pressure in the host and therefore it is desirable to keep
attempting to reach the target in case memory becomes available. The
most likely scenario is that some guests are ballooning down while
others are ballooning up and therefore there is temporary memory
pressure while things stabilise. You would not expect a well behaved
toolstack to ask a domain to balloon to more than its allocation nor
would you expect it to deliberately over-commit memory by setting
balloon targets which exceed the total host memory.

This patch drops the concept of a hard limit and causes the balloon
driver to retry increasing the reservation on a timer in the same
manner as when decreasing the reservation.

Also if we partially succeed in increasing the reservation
(i.e. receive less pages than we asked for) then we may as well keep
those pages rather than returning them to Xen.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Stable Kernel <stable@kernel.org>
2009-12-04 08:15:19 -08:00
Gianluca Guida
3d65c9488c Xen balloon: fix totalram_pages counting.
Change totalram_pages when a single page is added/removed to the
ballooned list. This avoid totalram_pages to be set erroneously to
max_pfn at boot.

Signed-off-by: Gianluca Guida <gianluca.guida@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stable Kernel <stable@kernel.org>
2009-12-04 08:13:44 -08:00
Ian Campbell
b4606f2165 xen: explicitly create/destroy stop_machine workqueues outside suspend/resume region.
I have observed cases where the implicit stop_machine_destroy() done by
stop_machine() hangs while destroying the workqueues, specifically in
kthread_stop(). This seems to be because timer ticks are not restarted
until after stop_machine() returns.

Fortunately stop_machine provides a facility to pre-create/post-destroy
the workqueues so use this to ensure that workqueues are only destroyed
after everything is really up and running again.

I only actually observed this failure with 2.6.30. It seems that newer
kernels are somehow more robust against doing kthread_stop() without timer
interrupts (I tried some backports of some likely looking candidates but
did not track down the commit which added this robustness). However this
change seems like a reasonable belt&braces thing to do.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stable Kernel <stable@kernel.org>
2009-12-03 11:14:56 -08:00
Ian Campbell
65f63384b3 xen: improve error handling in do_suspend.
The existing error handling has a few issues:
- If freeze_processes() fails it exits with shutting_down = SHUTDOWN_SUSPEND.
- If dpm_suspend_noirq() fails it exits without resuming xenbus.
- If stop_machine() fails it exits without resuming xenbus or calling
  dpm_resume_end().
- xs_suspend()/xs_resume() and dpm_suspend_noirq()/dpm_resume_noirq() were not
  nested in the obvious way.

Fix by ensuring each failure case goto's the correct label. Treat a failure of
stop_machine() as a cancelled suspend in order to follow the correct resume
path.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stable Kernel <stable@kernel.org>
2009-12-03 11:14:56 -08:00
Ian Campbell
fed5ea87e0 xen: don't leak IRQs over suspend/resume.
On resume irq_info[*].evtchn is reset to 0 since event channel mappings
are not preserved over suspend/resume. The other contents of irq_info
is preserved to allow rebind_evtchn_irq() to function.

However when a device resumes it will try to unbind from the
previous IRQ (e.g.  blkfront goes blkfront_resume() -> blkif_free() ->
unbind_from_irqhandler() -> unbind_from_irq()). This will fail due to the
check for VALID_EVTCHN in unbind_from_irq() and the IRQ is leaked. The
device will then continue to resume and allocate a new IRQ, eventually
leading to find_unbound_irq() panic()ing.

Fix this by changing unbind_from_irq() to handle teardown of interrupts
which have type!=IRQT_UNBOUND but are not currently bound to a specific
event channel.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stable Kernel <stable@kernel.org>
2009-12-03 11:14:55 -08:00
Ian Campbell
f6eafe3665 xen: call clock resume notifier on all CPUs
tick_resume() is never called on secondary processors. Presumably this
is because they are offlined for suspend on native and so this is
normally taken care of in the CPU onlining path. Under Xen we keep all
CPUs online over a suspend.

This patch papers over the issue for me but I will investigate a more
generic, less hacky, way of doing to the same.

tick_suspend is also only called on the boot CPU which I presume should
be fixed too.

Signed-off-by: Ian Campbell <Ian.Campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stable Kernel <stable@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
2009-12-03 11:14:55 -08:00
Jeremy Fitzhardinge
6aaf5d633b xen: use iret for return from 64b kernel to 32b usermode
If Xen wants to return to a 32b usermode with sysret it must use the
right form.  When using VCGF_in_syscall to trigger this, it looks at
the code segment and does a 32b sysret if it is FLAT_USER_CS32.
However, this is different from __USER32_CS, so it fails to return
properly if we use the normal Linux segment.

So avoid the whole mess by dropping VCGF_in_syscall and simply use
plain iret to return to usermode.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Jan Beulich <jbeulich@novell.com>
Cc: Stable Kernel <stable@kernel.org>
2009-12-03 11:14:54 -08:00
Jeremy Fitzhardinge
922cc38ab7 xen: don't call dpm_resume_noirq() with interrupts disabled.
dpm_resume_noirq() takes a mutex, so it can't be called from a no-interrupt
context.  Don't call it from within the stop-machine function, but just
afterwards, since we're resuming anyway, regardless of what happened.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stable Kernel <stable@kernel.org>
2009-12-03 11:14:53 -08:00
Jeremy Fitzhardinge
499d19b82b xen: register runstate info for boot CPU early
printk timestamping uses sched_clock, which in turn relies on runstate
info under Xen.  So make sure we set it up before any printks can
be called.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stable Kernel <stable@kernel.org>
2009-12-03 11:14:53 -08:00
Ian Campbell
028896721a xen: register runstate on secondary CPUs
The commit "xen: re-register runstate area earlier on resume" caused us
to never try and setup the runstate area for secondary CPUs. Ensure that
we do this...

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stable Kernel <stable@kernel.org>
2009-12-03 11:14:52 -08:00
Ian Campbell
f350c7922f xen: register timer interrupt with IRQF_TIMER
Otherwise the timer is disabled by dpm_suspend_noirq() which in turn prevents
correct operation of stop_machine on multi-processor systems and breaks
suspend.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stable Kernel <stable@kernel.org>
2009-12-03 11:14:52 -08:00
Ian Campbell
fa24ba62ea xen: correctly restore pfn_to_mfn_list_list after resume
pvops kernels >= 2.6.30 can currently only be saved and restored once. The
second attempt to save results in:

    ERROR Internal error: Frame# in pfn-to-mfn frame list is not in pseudophys
    ERROR Internal error: entry 0: p2m_frame_list[0] is 0xf2c2c2c2, max 0x120000
    ERROR Internal error: Failed to map/save the p2m frame list

I finally narrowed it down to:

    commit cdaead6b4e657f960d6d6f9f380e7dfeedc6a09b
        Author: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
        Date:   Fri Feb 27 15:34:59 2009 -0800

            xen: split construction of p2m mfn tables from registration

            Build the p2m_mfn_list_list early with the rest of the p2m table, but
            register it later when the real shared_info structure is in place.

            Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

The unforeseen side-effect of this change was to cause the mfn list list to not
be rebuilt on resume. Prior to this change it would have been rebuilt via
xen_post_suspend() -> xen_setup_shared_info() -> xen_setup_mfn_list_list().

Fix by explicitly calling xen_build_mfn_list_list() from xen_post_suspend().

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stable Kernel <stable@kernel.org>
2009-12-03 11:14:51 -08:00
Jeremy Fitzhardinge
3905bb2aa7 xen: restore runstate_info even if !have_vcpu_info_placement
Even if have_vcpu_info_placement is not set, we still need to set up
the runstate area on each resumed vcpu.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stable Kernel <stable@kernel.org>
2009-12-03 11:14:51 -08:00
Ian Campbell
be012920ec xen: re-register runstate area earlier on resume.
This is necessary to ensure the runstate area is available to
xen_sched_clock before any calls to printk which will require it in
order to provide a timestamp.

I chose to pull the xen_setup_runstate_info out of xen_time_init into
the caller in order to maintain parity with calling
xen_setup_runstate_info separately from calling xen_time_resume.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stable Kernel <stable@kernel.org>
2009-12-03 11:14:50 -08:00
Paolo Bonzini
ae78880129 xen: wait up to 5 minutes for device connetion
Increases the device timeout from 10s to 5 minutes, giving the user a
visual indication during that time in case there are problems.  The patch
is a backport of changesets 144 and 150 in the Xenbits tree.

Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2009-12-03 11:14:49 -08:00
Paolo Bonzini
f8dc33088f xen: improvement to wait_for_devices()
When printing a warning about a timed-out device, print the
current state of both ends of the device connection (i.e., backend as
well as frontend).  This backports half of changeset 146 from the
Xenbits tree.

Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2009-12-03 11:14:49 -08:00
Paolo Bonzini
c6e1971139 xen: fix is_disconnected_device/exists_disconnected_device
The logic of is_disconnected_device/exists_disconnected_device is wrong
in that they are used to test whether a device is trying to connect (i.e.
connecting).  For this reason the patch fixes them to not consider a
Closing or Closed device to be connecting.  At the same time the patch
also renames the functions according to what they really do; you could
say a closed device is "disconnected" (the old name), but not "connecting"
(the new name).

This patch is a backport of changeset 909 from the Xenbits tree.

Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2009-12-03 11:14:48 -08:00
Jeremy Fitzhardinge
db05fed0ad xen/xenbus: make DEVICE_ATTR()s static
They don't need to be global, and may cause linker clashes.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stable Kernel <stable@kernel.org>
2009-12-03 11:14:26 -08:00
Jeremy Fitzhardinge
82d6469916 xen: mask extended topology info in cpuid
A Xen guest never needs to know about extended topology, and knowing
would just confuse it.

This patch just zeros ebx in leaf 0xb which indicates no topology info,
preventing a crash under Xen on cpus which support this leaf.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stable Kernel <stable@kernel.org>
2009-11-03 11:09:12 -08:00
Jeremy Fitzhardinge
7825cf10e3 xen/hvc: make sure console output is always emitted, with explicit polling
We never want to rely on the hvc workqueue to emit output, because the
most interesting output is when the kernel is broken.  This will
improve oops/crash/console message for better debugging.

Instead, we force-poll until all output is emitted.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stable Kernel <stable@kernel.org>
2009-11-03 11:05:51 -08:00
Jeremy Fitzhardinge
973df35ed9 xen: set up mmu_ops before trying to set any ptes
xen_setup_stackprotector() ends up trying to set page protections,
so we need to have vm_mmu_ops set up before trying to do so.
Failing to do so causes an early boot crash.

[ Impact: Fix early crash under Xen. ]

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2009-10-27 16:54:19 -07:00
Linus Torvalds
964fe080d9 Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
  move virtrng_remove to .devexit.text
  move virtballoon_remove to .devexit.text
  virtio_blk: Revert serial number support
  virtio: let header files include virtio_ids.h
  virtio_blk: revert QUEUE_FLAG_VIRT addition
2009-10-23 07:35:16 +09:00
Linus Torvalds
4848490c50 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (21 commits)
  niu: VLAN_ETH_HLEN should be used to make sure that the whole MAC header was copied to the head buffer in the Vlan packets case
  KS8851: Fix ks8851_set_rx_mode() for IFF_MULTICAST
  KS8851: Fix MAC address write order
  KS8851: Add soft reset at probe time
  net: fix section mismatch in fec.c
  net: Fix struct inet_timewait_sock bitfield annotation
  tcp: Try to catch MSG_PEEK bug
  net: Fix IP_MULTICAST_IF
  bluetooth: static lock key fix
  bluetooth: scheduling while atomic bug fix
  tcp: fix TCP_DEFER_ACCEPT retrans calculation
  tcp: reduce SYN-ACK retrans for TCP_DEFER_ACCEPT
  tcp: accept socket after TCP_DEFER_ACCEPT period
  Revert "tcp: fix tcp_defer_accept to consider the timeout"
  AF_UNIX: Fix deadlock on connecting to shutdown socket
  ethoc: clear only pending irqs
  ethoc: inline regs access
  vmxnet3: use dev_dbg, fix build for CONFIG_BLOCK=n
  virtio_net: use dev_kfree_skb_any() in free_old_xmit_skbs()
  be2net: fix support for PCI hot plug
  ...
2009-10-23 07:34:23 +09:00
Uwe Kleine-König
ff07eb897a move virtrng_remove to .devexit.text
The function virtrng_remove is used only wrapped by __devexit_p so define
it using __devexit.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-10-22 16:39:34 +10:30
Uwe Kleine-König
1e65175c2c move virtballoon_remove to .devexit.text
The function virtballoon_remove is used only wrapped by __devexit_p so
define it using __devexit.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-10-22 16:39:31 +10:30
Rusty Russell
3225beaba0 virtio_blk: Revert serial number support
This reverts "Add serial number support for virtio_blk, V4a".

Turns out that virtio_pci, lguest and s/390 all have an 8 bit limit
on virtio config space, so noone could ever use this.

This is coming back later in a cleaner form.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: john cooper <john.cooper@redhat.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
2009-10-22 16:39:30 +10:30
Christian Borntraeger
e95646c3ec virtio: let header files include virtio_ids.h
Rusty,

commit 3ca4f5ca73057a617f9444a91022d7127041970a
    virtio: add virtio IDs file
moved all device IDs into a single file. While the change itself is
a very good one, it can break userspace applications. For example
if a userspace tool wanted to get the ID of virtio_net it used to
include virtio_net.h. This does no longer work, since virtio_net.h
does not include virtio_ids.h.
This patch moves all "#include <linux/virtio_ids.h>" from the C
files into the header files, making the header files compatible with
the old ones.

In addition, this patch exports virtio_ids.h to userspace.

CC: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-10-22 16:39:28 +10:30
Christoph Hellwig
f8b12e513b virtio_blk: revert QUEUE_FLAG_VIRT addition
It seems like the addition of QUEUE_FLAG_VIRT caueses major performance
regressions for Fedora users:

	https://bugzilla.redhat.com/show_bug.cgi?id=509383
	https://bugzilla.redhat.com/show_bug.cgi?id=505695

while I can't reproduce those extreme regressions myself I think the flag
is wrong.

Rationale:

  QUEUE_FLAG_VIRT expands to QUEUE_FLAG_NONROT which casus the queue
  unplugged immediately.  This is not a good behaviour for at least
  qemu and kvm where we do have significant overhead for every
  I/O operations.  Even with all the latested speeups (native AIO,
  MSI support, zero copy) we can only get native speed for up to 128kb
  I/O requests we already are down to 66% of native performance for 4kb
  requests even on my laptop running the Intel X25-M SSD for which the
  QUEUE_FLAG_NONROT was designed.
  If we ever get virtio-blk overhead low enough that this flag makes
  sense it should only be set based on a feature flag set by the host.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-10-22 16:39:26 +10:30
Joyce Yu
845de8afa6 niu: VLAN_ETH_HLEN should be used to make sure that the whole MAC header was copied to the head buffer in the Vlan packets case
Signed-off-by: Joyce Yu <joyce.yu@sun.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-21 17:21:10 -07:00
Linus Torvalds
d995053d04 Merge branch 'for-linus' of git://git.infradead.org/users/eparis/notify
* 'for-linus' of git://git.infradead.org/users/eparis/notify:
  dnotify: ignore FS_EVENT_ON_CHILD
  inotify: fix coalesce duplicate events into a single event in special case
  inotify: deprecate the inotify kernel interface
  fsnotify: do not set group for a mark before it is on the i_list
2009-10-22 08:28:28 +09:00
Linus Torvalds
be8db0b843 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: hp_sdc_rtc - fix test in hp_sdc_rtc_read_rt()
  Input: atkbd - consolidate force release quirks for volume keys
  Input: logips2pp - model 73 is actually TrackMan FX
  Input: i8042 - add Sony Vaio VGN-FZ240E to the nomux list
  Input: fix locking issue in /proc/bus/input/ handlers
  Input: atkbd - postpone restoring LED/repeat rate at resume
  Input: atkbd - restore resetting LED state at startup
  Input: i8042 - make pnp_data_busted variable boolean instead of int
  Input: synaptics - add another Protege M300 to rate blacklist
2009-10-22 08:27:12 +09:00
Linus Torvalds
422b42fa79 Merge branch 'kvm-updates/2.6.32' of git://git.kernel.org/pub/scm/virt/kvm/kvm
* 'kvm-updates/2.6.32' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: Prevent kvm_init from corrupting debugfs structures
  KVM: MMU: fix pointer cast
  KVM: use proper hrtimer function to retrieve expiration time
2009-10-22 08:26:15 +09:00
Linus Torvalds
1b7607030d Merge git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm
* git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm:
  dm snapshot: allow chunk size to be less than page size
  dm snapshot: use unsigned integer chunk size
  dm snapshot: lock snapshot while supplying status
  dm exception store: fix failed set_chunk_size error path
  dm snapshot: require non zero chunk size by end of ctr
  dm: dec_pending needs locking to save error value
  dm: add missing del_gendisk to alloc_dev error path
  dm log: userspace fix incorrect luid cast in userspace_ctr
  dm snapshot: free exception store on init failure
  dm snapshot: sort by chunk size to fix race
2009-10-22 08:25:36 +09:00
Rafael J. Wysocki
04bf7539c0 PM: Make warning in suspend_test_finish() less likely to happen
Increase TEST_SUSPEND_SECONDS to 10 so the warning in
suspend_test_finish() doesn't annoy the users of slower systems so much.

Also, make the warning print the suspend-resume cycle time, so that we
know why the warning actually triggered.

Patch prepared during the hacking session at the Kernel Summit in Tokyo.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-22 08:23:45 +09:00
Uwe Kleine-König
af2bd9d534 mmc: at91_mci: Don't include asm/mach/mmc.h
This fixes a compile bug introduced in

	6ef297f (ARM: 5720/1: Move MMCI header to amba include dir)

That commit moved arch/arm/include/asm/mach/mmc.h to
include/linux/amba/mmci.h.  Just removing the include was enough.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Acked-by: Linus Walleij <linus.walleij@stericsson.com>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Acked-by: Bill Gatliff <bgat@billgatliff.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Pierre Ossman <drzeus@drzeus.cx>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-22 08:22:25 +09:00
Linus Torvalds
ab4ed677f3 Merge branch 'sh/for-2.6.32' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6
* 'sh/for-2.6.32' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6:
  sh: Kill off stray HAVE_FTRACE_SYSCALLS reference.
  sh: Remove BKL from landisk gio.
  sh: disabled cache handling fix.
  sh: Fix up single page flushing to use PAGE_SIZE.
2009-10-22 08:17:15 +09:00
Linus Torvalds
4fe71dba2f Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: aesni-intel - Fix irq_fpu_usable usage
  crypto: padlock-sha - Fix stack alignment
2009-10-22 08:16:01 +09:00
Yinghai Lu
4223a4a155 nfs: Fix nfs_parse_mount_options() kfree() leak
Fix a (small) memory leak in one of the error paths of the NFS mount
options parsing code.

Regression introduced in 2.6.30 by commit a67d18f (NFS: load the
rpc/rdma transport module automatically).

Reported-by: Yinghai Lu <yinghai@kernel.org>
Reported-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-22 08:15:23 +09:00
Earl Chew
ad3960243e fs: pipe.c null pointer dereference
This patch fixes a null pointer exception in pipe_rdwr_open() which
generates the stack trace:

> Unable to handle kernel NULL pointer dereference at 0000000000000028 RIP:
>  [<ffffffff802899a5>] pipe_rdwr_open+0x35/0x70
>  [<ffffffff8028125c>] __dentry_open+0x13c/0x230
>  [<ffffffff8028143d>] do_filp_open+0x2d/0x40
>  [<ffffffff802814aa>] do_sys_open+0x5a/0x100
>  [<ffffffff8021faf3>] sysenter_do_call+0x1b/0x67

The failure mode is triggered by an attempt to open an anonymous
pipe via /proc/pid/fd/* as exemplified by this script:

=============================================================
while : ; do
   { echo y ; sleep 1 ; } | { while read ; do echo z$REPLY; done ; } &
   PID=$!
   OUT=$(ps -efl | grep 'sleep 1' | grep -v grep |
        { read PID REST ; echo $PID; } )
   OUT="${OUT%% *}"
   DELAY=$((RANDOM * 1000 / 32768))
   usleep $((DELAY * 1000 + RANDOM % 1000 ))
   echo n > /proc/$OUT/fd/1                 # Trigger defect
done
=============================================================

Note that the failure window is quite small and I could only
reliably reproduce the defect by inserting a small delay
in pipe_rdwr_open(). For example:

 static int
 pipe_rdwr_open(struct inode *inode, struct file *filp)
 {
       msleep(100);
       mutex_lock(&inode->i_mutex);

Although the defect was observed in pipe_rdwr_open(), I think it
makes sense to replicate the change through all the pipe_*_open()
functions.

The core of the change is to verify that inode->i_pipe has not
been released before attempting to manipulate it. If inode->i_pipe
is no longer present, return ENOENT to indicate so.

The comment about potentially using atomic_t for i_pipe->readers
and i_pipe->writers has also been removed because it is no longer
relevant in this context. The inode->i_mutex lock must be used so
that inode->i_pipe can be dealt with correctly.

Signed-off-by: Earl Chew <earl_chew@agilent.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-22 08:11:44 +09:00
Ben Dooks
b6a71bfa00 KS8851: Fix ks8851_set_rx_mode() for IFF_MULTICAST
In ks8851_set_rx_mode() the case handling IFF_MULTICAST was also setting
the RXCR1_AE bit by accident. This meant that all unicast frames where
being accepted by the device. Remove RXCR1_AE from this case.

Note, RXCR1_AE was also masking a problem with setting the MAC address
properly, so needs to be applied after fixing the MAC write order.

Fixes a bug reported by Doong, Ping of Micrel. This version of the
patch avoids setting RXCR1_ME for all cases.

Signed-off-by: Ben Dooks <ben@simtec.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-20 19:11:07 -07:00
Ben Dooks
160d0fadaf KS8851: Fix MAC address write order
The MAC address register was being written in the wrong order, so add
a new address macro to convert mac-address byte to register address and
a ks8851_wrreg8() function to write each byte without having to worry
about any difficult byte swapping.

Fixes a bug reported by Doong, Ping of Micrel.

Signed-off-by: Ben Dooks <ben@simtec.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-20 19:11:06 -07:00
Ben Dooks
57dada6819 KS8851: Add soft reset at probe time
Issue a full soft reset at probe time.

This was reported by Doong Ping of Micrel, but no explanation of why this
is necessary or what bug it is fixing. Add it as it does not seem to hurt
the current driver and ensures that the device is in a known state when we
start setting it up.

Signed-off-by: Ben Dooks <ben@simtec.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-20 19:11:05 -07:00
Steven King
78abcb13dd net: fix section mismatch in fec.c
fec_enet_init is called by both fec_probe and fec_resume, so it
shouldn't be marked as __init.

Signed-off-by: Steven King <sfking@fdwdc.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-20 18:51:37 -07:00
Andreas Gruenbacher
945526846a dnotify: ignore FS_EVENT_ON_CHILD
Mask off FS_EVENT_ON_CHILD in dnotify_handle_event().  Otherwise, when there
is more than one watch on a directory and dnotify_should_send_event()
succeeds, events with FS_EVENT_ON_CHILD set will trigger all watches and cause
spurious events.

This case was overlooked in commit e42e2773.

	#define _GNU_SOURCE

	#include <stdio.h>
	#include <stdlib.h>
	#include <unistd.h>
	#include <signal.h>
	#include <sys/types.h>
	#include <sys/stat.h>
	#include <fcntl.h>
	#include <string.h>

	static void create_event(int s, siginfo_t* si, void* p)
	{
		printf("create\n");
	}

	static void delete_event(int s, siginfo_t* si, void* p)
	{
		printf("delete\n");
	}

	int main (void) {
		struct sigaction action;
		char *tmpdir, *file;
		int fd1, fd2;

		sigemptyset (&action.sa_mask);
		action.sa_flags = SA_SIGINFO;

		action.sa_sigaction = create_event;
		sigaction (SIGRTMIN + 0, &action, NULL);

		action.sa_sigaction = delete_event;
		sigaction (SIGRTMIN + 1, &action, NULL);

	#	define TMPDIR "/tmp/test.XXXXXX"
		tmpdir = malloc(strlen(TMPDIR) + 1);
		strcpy(tmpdir, TMPDIR);
		mkdtemp(tmpdir);

	#	define TMPFILE "/file"
		file = malloc(strlen(tmpdir) + strlen(TMPFILE) + 1);
		sprintf(file, "%s/%s", tmpdir, TMPFILE);

		fd1 = open (tmpdir, O_RDONLY);
		fcntl(fd1, F_SETSIG, SIGRTMIN);
		fcntl(fd1, F_NOTIFY, DN_MULTISHOT | DN_CREATE);

		fd2 = open (tmpdir, O_RDONLY);
		fcntl(fd2, F_SETSIG, SIGRTMIN + 1);
		fcntl(fd2, F_NOTIFY, DN_MULTISHOT | DN_DELETE);

		if (fork()) {
			/* This triggers a create event */
			creat(file, 0600);
			/* This triggers a create and delete event (!) */
			unlink(file);
		} else {
			sleep(1);
			rmdir(tmpdir);
		}

		return 0;
	}

Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Eric Paris <eparis@redhat.com>
2009-10-20 18:02:33 -04:00
Eric Dumazet
abf90cca97 net: Fix struct inet_timewait_sock bitfield annotation
commit 9e337b0f (net: annotate inet_timewait_sock bitfields)
added 4/8 bytes in struct inet_timewait_sock.

Fix this by declaring tw_ipv6_offset in the 'flags' bitfield
The 14 bits hole is named tw_pad to make it cleary apparent.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-20 01:13:26 -07:00
Herbert Xu
b6b39e8f3f tcp: Try to catch MSG_PEEK bug
This patch tries to print out more information when we hit the
MSG_PEEK bug in tcp_recvmsg.  It's been around since at least
2005 and it's about time that we finally fix it.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-20 00:51:57 -07:00
Huang Ying
13b79b9715 crypto: aesni-intel - Fix irq_fpu_usable usage
When renaming kernel_fpu_using to irq_fpu_usable, the semantics of the
function is changed too, from mesuring whether kernel is using FPU,
that is, the FPU is NOT available, to measuring whether FPU is usable,
that is, the FPU is available.

But the usage of irq_fpu_usable in aesni-intel_glue.c is not changed
accordingly. This patch fixes this.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2009-10-20 16:20:47 +09:00
Eric Dumazet
55b8050353 net: Fix IP_MULTICAST_IF
ipv4/ipv6 setsockopt(IP_MULTICAST_IF) have dubious __dev_get_by_index() calls.

This function should be called only with RTNL or dev_base_lock held, or reader
could see a corrupt hash chain and eventually enter an endless loop.

Fix is to call dev_get_by_index()/dev_put().

If this happens to be performance critical, we could define a new dev_exist_by_index()
function to avoid touching dev refcount.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-19 21:34:20 -07:00
Dave Young
45054dc1bf bluetooth: static lock key fix
When shutdown ppp connection, lockdep waring about non-static key
will happen, it is caused by the lock is not initialized properly
at that time.

Fix with tuning the lock/skb_queue_head init order

[   94.339261] INFO: trying to register non-static key.
[   94.342509] the code is fine but needs lockdep annotation.
[   94.342509] turning off the locking correctness validator.
[   94.342509] Pid: 0, comm: swapper Not tainted 2.6.31-mm1 #2
[   94.342509] Call Trace:
[   94.342509]  [<c0248fbe>] register_lock_class+0x58/0x241
[   94.342509]  [<c024b5df>] ? __lock_acquire+0xb57/0xb73
[   94.342509]  [<c024ab34>] __lock_acquire+0xac/0xb73
[   94.342509]  [<c024b7fa>] ? lock_release_non_nested+0x17b/0x1de
[   94.342509]  [<c024b662>] lock_acquire+0x67/0x84
[   94.342509]  [<c04cd1eb>] ? skb_dequeue+0x15/0x41
[   94.342509]  [<c054a857>] _spin_lock_irqsave+0x2f/0x3f
[   94.342509]  [<c04cd1eb>] ? skb_dequeue+0x15/0x41
[   94.342509]  [<c04cd1eb>] skb_dequeue+0x15/0x41
[   94.342509]  [<c054a648>] ? _read_unlock+0x1d/0x20
[   94.342509]  [<c04cd641>] skb_queue_purge+0x14/0x1b
[   94.342509]  [<fab94fdc>] l2cap_recv_frame+0xea1/0x115a [l2cap]
[   94.342509]  [<c024b5df>] ? __lock_acquire+0xb57/0xb73
[   94.342509]  [<c0249c04>] ? mark_lock+0x1e/0x1c7
[   94.342509]  [<f8364963>] ? hci_rx_task+0xd2/0x1bc [bluetooth]
[   94.342509]  [<fab95346>] l2cap_recv_acldata+0xb1/0x1c6 [l2cap]
[   94.342509]  [<f8364997>] hci_rx_task+0x106/0x1bc [bluetooth]
[   94.342509]  [<fab95295>] ? l2cap_recv_acldata+0x0/0x1c6 [l2cap]
[   94.342509]  [<c02302c4>] tasklet_action+0x69/0xc1
[   94.342509]  [<c022fbef>] __do_softirq+0x94/0x11e
[   94.342509]  [<c022fcaf>] do_softirq+0x36/0x5a
[   94.342509]  [<c022fe14>] irq_exit+0x35/0x68
[   94.342509]  [<c0204ced>] do_IRQ+0x72/0x89
[   94.342509]  [<c02038ee>] common_interrupt+0x2e/0x34
[   94.342509]  [<c024007b>] ? pm_qos_add_requirement+0x63/0x9d
[   94.342509]  [<c038e8a5>] ? acpi_idle_enter_bm+0x209/0x238
[   94.342509]  [<c049d238>] cpuidle_idle_call+0x5c/0x94
[   94.342509]  [<c02023f8>] cpu_idle+0x4e/0x6f
[   94.342509]  [<c0534153>] rest_init+0x53/0x55
[   94.342509]  [<c0781894>] start_kernel+0x2f0/0x2f5
[   94.342509]  [<c0781091>] i386_start_kernel+0x91/0x96

Reported-by: Oliver Hartkopp <oliver@hartkopp.net>
Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
Tested-by: Oliver Hartkopp <oliver@hartkopp.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-19 19:36:49 -07:00
Dave Young
f74c77cb11 bluetooth: scheduling while atomic bug fix
Due to driver core changes dev_set_drvdata will call kzalloc which should be
in might_sleep context, but hci_conn_add will be called in atomic context

Like dev_set_name move dev_set_drvdata to work queue function.

oops as following:

Oct  2 17:41:59 darkstar kernel: [  438.001341] BUG: sleeping function called from invalid context at mm/slqb.c:1546
Oct  2 17:41:59 darkstar kernel: [  438.001345] in_atomic(): 1, irqs_disabled(): 0, pid: 2133, name: sdptool
Oct  2 17:41:59 darkstar kernel: [  438.001348] 2 locks held by sdptool/2133:
Oct  2 17:41:59 darkstar kernel: [  438.001350]  #0:  (sk_lock-AF_BLUETOOTH-BTPROTO_L2CAP){+.+.+.}, at: [<faa1d2f5>] lock_sock+0xa/0xc [l2cap]
Oct  2 17:41:59 darkstar kernel: [  438.001360]  #1:  (&hdev->lock){+.-.+.}, at: [<faa20e16>] l2cap_sock_connect+0x103/0x26b [l2cap]
Oct  2 17:41:59 darkstar kernel: [  438.001371] Pid: 2133, comm: sdptool Not tainted 2.6.31-mm1 #2
Oct  2 17:41:59 darkstar kernel: [  438.001373] Call Trace:
Oct  2 17:41:59 darkstar kernel: [  438.001381]  [<c022433f>] __might_sleep+0xde/0xe5
Oct  2 17:41:59 darkstar kernel: [  438.001386]  [<c0298843>] __kmalloc+0x4a/0x15a
Oct  2 17:41:59 darkstar kernel: [  438.001392]  [<c03f0065>] ? kzalloc+0xb/0xd
Oct  2 17:41:59 darkstar kernel: [  438.001396]  [<c03f0065>] kzalloc+0xb/0xd
Oct  2 17:41:59 darkstar kernel: [  438.001400]  [<c03f04ff>] device_private_init+0x15/0x3d
Oct  2 17:41:59 darkstar kernel: [  438.001405]  [<c03f24c5>] dev_set_drvdata+0x18/0x26
Oct  2 17:41:59 darkstar kernel: [  438.001414]  [<fa51fff7>] hci_conn_init_sysfs+0x40/0xd9 [bluetooth]
Oct  2 17:41:59 darkstar kernel: [  438.001422]  [<fa51cdc0>] ? hci_conn_add+0x128/0x186 [bluetooth]
Oct  2 17:41:59 darkstar kernel: [  438.001429]  [<fa51ce0f>] hci_conn_add+0x177/0x186 [bluetooth]
Oct  2 17:41:59 darkstar kernel: [  438.001437]  [<fa51cf8a>] hci_connect+0x3c/0xfb [bluetooth]
Oct  2 17:41:59 darkstar kernel: [  438.001442]  [<faa20e87>] l2cap_sock_connect+0x174/0x26b [l2cap]
Oct  2 17:41:59 darkstar kernel: [  438.001448]  [<c04c8df5>] sys_connect+0x60/0x7a
Oct  2 17:41:59 darkstar kernel: [  438.001453]  [<c024b703>] ? lock_release_non_nested+0x84/0x1de
Oct  2 17:41:59 darkstar kernel: [  438.001458]  [<c028804b>] ? might_fault+0x47/0x81
Oct  2 17:41:59 darkstar kernel: [  438.001462]  [<c028804b>] ? might_fault+0x47/0x81
Oct  2 17:41:59 darkstar kernel: [  438.001468]  [<c033361f>] ? __copy_from_user_ll+0x11/0xce
Oct  2 17:41:59 darkstar kernel: [  438.001472]  [<c04c9419>] sys_socketcall+0x82/0x17b
Oct  2 17:41:59 darkstar kernel: [  438.001477]  [<c020329d>] syscall_call+0x7/0xb

Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-10-19 19:36:45 -07:00