Register_xenstore_notifier should guarantee that the caller gets
notified even if xenstore is already up.
Therefore we revert "do not notify callers from
register_xenstore_notifier" and set xenstored_read at the right time for
PV on HVM guests too.
In fact in case of PV on HVM guests xenstored is ready only after the
platform pci driver has completed the initialization, so do not set
xenstored_ready before the call to xenbus_probe().
This patch fixes a shutdown_event watcher registration bug that causes
"xm shutdown" not to work properly.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>
* 'for-2.6.36' of git://git.kernel.dk/linux-2.6-block: (149 commits)
block: make sure that REQ_* types are seen even with CONFIG_BLOCK=n
xen-blkfront: fix missing out label
blkdev: fix blkdev_issue_zeroout return value
block: update request stacking methods to support discards
block: fix missing export of blk_types.h
writeback: fix bad _bh spinlock nesting
drbd: revert "delay probes", feature is being re-implemented differently
drbd: Initialize all members of sync_conf to their defaults [Bugz 315]
drbd: Disable delay probes for the upcomming release
writeback: cleanup bdi_register
writeback: add new tracepoints
writeback: remove unnecessary init_timer call
writeback: optimize periodic bdi thread wakeups
writeback: prevent unnecessary bdi threads wakeups
writeback: move bdi threads exiting logic to the forker thread
writeback: restructure bdi forker loop a little
writeback: move last_active to bdi
writeback: do not remove bdi from bdi_list
writeback: simplify bdi code a little
writeback: do not lose wake-ups in bdi threads
...
Fixed up pretty trivial conflicts in drivers/block/virtio_blk.c and
drivers/scsi/scsi_error.c as per Jens.
According to the comments, this was how it's been done years ago, but
apparently took an xbt pointer from elsewhere back then. The code was
removed because of consistency issues: cancellation wont't roll back
the saved xbdev->state.
Still, unsolicited writes to the state field remain an issue,
especially if device shutdown takes thread synchronization, and subtle
races cause accidental recreation of the device node.
Fixed by reintroducing the transaction. An internal one is sufficient,
so the xbdev->state value remains consistent.
Also fixes the original hack to prevent infinite recursion. Instead of
bailing out on the first attempt to switch to Closing, checks call
depth now.
Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
* upstream/pvhvm:
Introduce CONFIG_XEN_PVHVM compile option
blkfront: do not create a PV cdrom device if xen_hvm_guest
support multiple .discard.* sections to avoid section type conflicts
xen/pvhvm: fix build problem when !CONFIG_XEN
xenfs: enable for HVM domains too
x86: Call HVMOP_pagetable_dying on exit_mmap.
x86: Unplug emulated disks and nics.
x86: Use xen_vcpuop_clockevent, xen_clocksource and xen wallclock.
xen: Fix find_unbound_irq in presence of ioapic irqs.
xen: Add suspend/resume support for PV on HVM guests.
xen: Xen PCI platform device driver.
x86/xen: event channels delivery on HVM.
x86: early PV on HVM features initialization.
xen: Add support for HVM hypercalls.
Conflicts:
arch/x86/xen/enlighten.c
arch/x86/xen/time.c
* upstream/core:
xen/panic: use xen_reboot and fix smp_send_stop
Xen: register panic notifier to take crashes of xen guests on panic
xen: support large numbers of CPUs with vcpu info placement
xen: drop xen_sched_clock in favour of using plain wallclock time
pvops: do not notify callers from register_xenstore_notifier
xen: make sure pages are really part of domain before freeing
xen: release unused free memory
Currently register_xenstore_notifier notifies the caller during the
registration itself if xenstore is believed to be ready. This behaviour
causes problems to PV on HVM guests, in which case callers should be
notified by xenbus_probe only after the platform pci driver is loaded.
We already make sure xenbus_probe is called at the right time, calling
it either from device_initcall (PV case) or from the platform pci
driver initialization (HVM case) so we don't need this additional
notification.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Add a xen_emul_unplug command line option to the kernel to unplug
xen emulated disks and nics.
Set the default value of xen_emul_unplug depending on whether or
not the Xen PV frontends and the Xen platform PCI driver have
been compiled for this kernel (modules or built-in are both OK).
The user can specify xen_emul_unplug=ignore to enable PV drivers on HVM
even if the host platform doesn't support unplug.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Add the xen pci platform device driver that is responsible
for initializing the grant table and xenbus in PV on HVM mode.
Few changes to xenbus and grant table are necessary to allow the delayed
initialization in HVM mode.
Grant table needs few additional modifications to work in HVM mode.
The Xen PCI platform device raises an irq every time an event has been
delivered to us. However these interrupts are only delivered to vcpu 0.
The Xen PCI platform interrupt handler calls xen_hvm_evtchn_do_upcall
that is a little wrapper around __xen_evtchn_do_upcall, the traditional
Xen upcall handler, the very same used with traditional PV guests.
When running on HVM the event channel upcall is never called while in
progress because it is a normal Linux irq handler (and we cannot switch
the irq chip wholesale to the Xen PV ones as we are running QEMU and
might have passed in PCI devices), therefore we cannot be sure that
evtchn_upcall_pending is 0 when returning.
For this reason if evtchn_upcall_pending is set by Xen we need to loop
again on the event channels set pending otherwise we might loose some
event channel deliveries.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Initialize basic pv on hvm features adding a new Xen HVM specific
hypervisor_x86 structure.
Don't try to initialize xen-kbdfront and xen-fbfront when running on HVM
because the backends are not available.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Since the device we are resuming could be the device containing the
swap device we should ensure that the allocation cannot cause
IO.
On resume, this path is triggered when the running system tries to
continue using its devices. If it cannot then the resume will fail;
to try to avoid this we let it dip into the emergency pools.
The majority of these changes were made when linux-2.6.18-xen.hg
changeset e8b49cfbdac0 was ported upstream in
a144ff09bc but somehow this hunk was
dropped.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Stable Kernel <stable@kernel.org> # .32.x
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the followings.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Increases the device timeout from 10s to 5 minutes, giving the user a
visual indication during that time in case there are problems. The patch
is a backport of changesets 144 and 150 in the Xenbits tree.
Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
When printing a warning about a timed-out device, print the
current state of both ends of the device connection (i.e., backend as
well as frontend). This backports half of changeset 146 from the
Xenbits tree.
Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
The logic of is_disconnected_device/exists_disconnected_device is wrong
in that they are used to test whether a device is trying to connect (i.e.
connecting). For this reason the patch fixes them to not consider a
Closing or Closed device to be connecting. At the same time the patch
also renames the functions according to what they really do; you could
say a closed device is "disconnected" (the old name), but not "connecting"
(the new name).
This patch is a backport of changeset 909 from the Xenbits tree.
Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
They don't need to be global, and may cause linker clashes.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stable Kernel <stable@kernel.org>
Move xen_domain and related tests out of asm-x86 to xen/xen.h so they
can be included whenever they are necessary.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
================================================
[ BUG: lock held when returning to user space! ]
------------------------------------------------
xenstore-list/3522 is leaving the kernel with locks still held!
1 lock held by xenstore-list/3522:
#0: (&xs_state.transaction_mutex){......}, at: [<c026dc6f>] xenbus_dev_request_and_reply+0x8f/0xa0
The canonical fix for this type of issue appears to be to maintain a
count manually rather than using an rwsem so do that here.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Remove suspend_cancel hook from xenbus_driver, in preparation for using
the device model for suspending.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
The xenfs filesystem exports various interfaces to usermode. Initially
this exports a file to allow usermode to interact with xenbus/xenstore.
Traditionally this appeared in /proc/xen. Rather than extending procfs,
this patch adds a backward-compat mountpoint on /proc/xen, and provides
a xenfs filesystem which can be mounted there.
Signed-off-by: Alex Zeffertt <alex.zeffertt@eu.citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
no argument named @xbt in xenbus_switch_state(), remove it.
Signed-off-by: Qinghuang Feng <qhfeng.kernel@gmail.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are four operating modes Xen code may find itself running in:
- native
- hvm domain
- pv dom0
- pv domU
Clean up predicates for testing for these states to make them more consistent.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Xen-devel <xen-devel@lists.xensource.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Avoid allocations causing swap activity on the resume path by
preventing the allocations from doing IO and allowing them
to access the emergency pools.
These paths are used when a frontend device is trying to connect
to its backend driver over Xenbus. These reconnections are triggered
on demand by IO, so by definition there is already IO underway,
and further IO would naturally deadlock. On resume, this path
is triggered when the running system tries to continue using its
devices. If it cannot then the resume will fail; to try to avoid this
we let it dip into the emergency pools.
[ linux-2.6.18-xen changesets e8b49cfbdac, fdb998e79aba ]
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
When restoring, rebind the existing xenbus irq to the new xenbus event
channel. (It turns out in practice that this is always the same, and
is never updated on restore. That's a bug, but Xeno-linux has been
like this for a long time, so it can't really be fixed.)
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Add module aliases to support autoprobing modules
for xen frontend devices.
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
When the xen block frontend driver is built as a module the module load
is only synchronous up to the point where the frontend and the backend
become connected rather than when the disk is added.
This means that there can be a race on boot between loading the module and
loading the dm-* modules and doing the scan for LVM physical volumes (all
in the initrd). In the failure case the disk is not present until after the
scan for physical volumes is complete.
Taken from:
http://xenbits.xensource.com/linux-2.6.18-xen.hg?rev/11483a00c017
Signed-off-by: Christian Limpach <Christian.Limpach@xensource.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Don't use alloc_vm_area()/free_vm_area() directly, instead define
xen_alloc_vm_area()/xen_free_vm_area() and use them.
alloc_vm_area()/free_vm_area() are used to allocate/free area which
are for grant table mapping. Xen/x86 grant table is based on virtual
address so that alloc_vm_area()/free_vm_area() are suitable.
On the other hand Xen/ia64 (and Xen/powerpc) grant table is based on
pseudo physical address (guest physical address) so that allocation
should be done differently.
The original version of xenified Linux/IA64 have its own
allocate_vm_area()/free_vm_area() definitions which don't allocate vm area
contradictory to those names.
Now vanilla Linux already has its definitions so that it's impossible
to have IA64 definitions of allocate_vm_area()/free_vm_area().
Instead introduce xen_allocate_vm_area()/xen_free_vm_area() and use them.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Found these while looking at printk uses.
Add missing newlines to dev_<level> uses
Add missing KERN_<level> prefixes to multiline dev_<level>s
Fixed a wierd->weird spelling typo
Added a newline to a printk
Signed-off-by: Joe Perches <joe@perches.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Mark M. Hoffman <mhoffman@lightlink.com>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Tilman Schmidt <tilman@imap.cc>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: Stephen Hemminger <shemminger@linux-foundation.org>
Cc: Greg KH <greg@kroah.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: David Brownell <david-b@pacbell.net>
Cc: James Smart <James.Smart@Emulex.Com>
Cc: Andrew Vasquez <andrew.vasquez@qlogic.com>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Cc: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Jaroslav Kysela <perex@suse.cz>
Cc: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch fixes an obvious use-after-free spotted by the Coverity checker.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This communicates with the machine control software via a registry
residing in a controlling virtual machine. This allows dynamic
creation, destruction and modification of virtual device
configurations (network devices, block devices and CPUS, to name some
examples).
[ Greg, would you mind giving this a review? Thanks -J ]
Signed-off-by: Ian Pratt <ian.pratt@xensource.com>
Signed-off-by: Christian Limpach <Christian.Limpach@cl.cam.ac.uk>
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Cc: Greg KH <greg@kroah.com>