After discussions with Anthony Liguori, it seems that the virtio
balloon can be made even simpler. Here's my attempt.
The device configuration tells the driver how much memory it should
take from the guest (ie. balloon size). The guest feeds the page
numbers it has taken via one virtqueue.
A second virtqueue feeds the page numbers the driver wants back: if
the device has the VIRTIO_BALLOON_F_MUST_TELL_HOST bit, then this
queue is compulsory, otherwise it's advisory (and the guest can simply
fault the pages back in).
This driver can be enhanced later to deflate the balloon via a
shrinker, oom callback or we could even go for a complete set of
in-guest regulators.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
As Avi pointed out, as we continue to massage the virtio PCI ABI, we can make
things a little more friendly to users by utilizing the PCI revision field to
indicate which version of the ABI we're using. This is a hard ABI version
and incrementing it will cause the guest driver to break.
This is the necessary changes to virtio_pci to support this.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is a PCI device that implements a transport for virtio. It allows virtio
devices to be used by QEMU based VMMs like KVM or Xen.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Am Freitag, 1. Februar 2008 schrieb Christian Borntraeger:
> Right. I will fix that with an additional patch.
This patch goes on top of the minor number patch. Please let me know if
you want a merged patch:
Currently virtio_blk creates the disk name combinging "vd" with 'a'++.
This will give strange names after vdz. I have implemented names up to
vdzzz - inspired by the sd.c code. That should be sufficient for now.
There is one driver in the kernel (driver/s390/block/dasd_genhd.c) that
implements names from dasda-dasdzzzz allowing even more disks. Maybe
a janitor can come up with a common implementation usable for all kind
of block device drivers.
I have tested this patch with 100 disks - seems to work.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rusty,
currently virtio_blk uses one major number per device. While this works
quite well on most systems it is wasteful and will exhaust major numbers
on larger installations.
This patch allocates a major number on init and will use 16 minor numbers
for each disk. That will allow ~64k virtio_blk disks.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rusty,
I currently try to make my guest boot from an virtio root device
without having an external kernel. Some of the tools that I tried
expect HDIO_GETGEO to work. The most interesting value is likely
the geo.start value to get the offset of a partition. This value
is filled by block/ioctl.c if fops->getgeo is set. This patch also
fills in some standard values for heads, sectors and cylinders.
Makes sense?
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This fixes a potential dangling xmit problem.
We also suppress refill interrupts until we need them.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Fix bug found by Christian Borntraeger: if the other side fills all
the registered network buffers before we enable NAPI, we will never
get an interrupt. The simplest fix is to process the input queue once
on open.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Hello Rusty,
virtnet_probe already calls alloc_etherdev, which calls ether_setup.
There is no need to do that again.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is needed for the virtio PCI device to be compiled as a module.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This patch moves virtio under the virtualization menu and changes virtio
devices to not claim to only be for lguest.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Anthony Liguori found double interrupt suppression in the virtio_net
driver, triggered by two skb_recv_done's in a row. This is because
virtio_ring's interrupt suppression is a best-effort optimization: it
contains no synchronization so the host can miss it and still send
interrupts.
But it's certainly nicer for virtio users if calling disable_cb
actually disables callbacks, so we check for the race in the interrupt
routine.
Note: SMP guests might require syncronization here, but since
disable_cb is actually called from interrupt context, there has to be
some form of synchronization before the next same interrupt handler is
called (Linux guarantees that the same device's irq handler will never
run simultanously on multiple CPUs).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
A reset function solves three problems:
1) It allows us to renegotiate features, eg. if we want to upgrade a
guest driver without rebooting the guest.
2) It gives us a clean way of shutting down virtqueues: after a reset,
we know that the buffers won't be used by the host, and
3) It helps the guest recover from messed-up drivers.
So we remove the ->shutdown hook, and the only way we now remove
feature bits is via reset.
We leave it to the driver to do the reset before it deletes queues:
the balloon driver, for example, needs to chat to the host in its
remove function.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Since we want to reset the device to remove them, this is simpler
(device is reset for us on driver remove).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
1) Turn GSO on virtio net into an all-or-nothing (keep checksumming
separate). Having multiple bits is a pain: if you can't support something
you should handle it in software, which is still a performance win.
2) Make VIRTIO_NET_HDR_GSO_ECN a flag in the header, so it can apply to
IPv6 or v4.
3) Rename VIRTIO_NET_F_NO_CSUM to VIRTIO_NET_F_CSUM (ie. means we do
checksumming).
4) Add csum and gso params to virtio_net to allow more testing.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's far easier to deal with packets if we don't have to parse the
packet to figure out the header length to know how much to pull into
the skb data. Add the field to the virtio_net_hdr struct (and fix the
spaces that somehow crept in there).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This field has been unused since an older version of virtio. Remove
it now before we freeze the ABI.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au.
The other side (host) can set the NO_NOTIFY flag as an optimization,
to say "no need to kick me when you add things". Make it clear that
this is advisory only; especially that we should always notify when
the ring is full.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Using unsigned int resulted in silent truncation of the upper 32-bit
on x86_64 resulting in an OOPS since the ring was being initialized
wrong.
Please reconsider my previous patch to just use PAGE_ALIGN(). Open
coding this sort of stuff, no matter how simple it seems, is just
asking for this sort of trouble.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Various drivers want to know when their configuration information
changes: the balloon driver is the immediate user, but the network
driver may one day have a "carrier" status as well.
This introduces that callback (lguest doesn't use it yet).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It seems that virtio_net wants to disable callbacks (interrupts) before
calling netif_rx_schedule(), so we can't use the return value to do so.
Rename "restart" to "cb_enable" and introduce "cb_disable" hook: callback
now returns void, rather than a boolean.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Previously we used a type/len pair within the config space, but this
seems overkill. We now simply define a structure which represents the
layout in the config space: the config space can now only be extended
at the end.
The main driver-visible changes:
1) We indicate what fields are present with an explicit feature bit.
2) Virtqueues are explicitly numbered, and not in the config space.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Use it in virtio_net (replacing buggy version there), it's also going
to be used by TAP for partial csum support.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: David S. Miller <davem@davemloft.net>
fcntl(F_GETLK,..) can return pid of process for not current pid namespace
(if process is belonged to the several namespaces). It is true also for
pids in /proc/locks. So correct behavior is saving pointer to the struct
pid of the process lock owner.
Signed-off-by: Vitaliy Gusev <vgusev@openvz.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
interruptible_sleep_on_locked() is just an open-coded
wait_event_interruptible_timeout(), with the one difference that
interruptible_sleep_on_locked() doesn't bother to check the condition on
which it is waiting, depending instead on the BKL to avoid the case
where it blocks after the wakeup has already been called.
locks_block_on_timeout() is only used in one place, so it's actually
simpler to inline it into its caller.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
For such a short function (with such a long comment),
posix_locks_deadlock() seems to cause a lot of confusion. Attempt to
make it a bit clearer:
- Remove the initial posix_same_owner() check, which can never
pass (since this is only called in the case that block_fl and
caller_fl conflict)
- Use an explicit loop (and a helper function) instead of a goto.
- Rewrite the comment, attempting a clearer explanation, and
removing some uninteresting historical detail.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Building the aic7xxx driver includes the copy
of an .h file from a _shipped file.
In a highly parallel build Ingo saw that the
build sometimes failed (included distcc usage).
It was tracked down to a missing dependency from the .c
source file to the generated .h file.
We started to build the .c file before the
copy (cat) operation of the .h file completed
and we then only got half of the definitions
from the copied .h file.
Add an explicit dependency from the .c files to the
generated .h files so make knows all dependencies and
finsih the build of the .h files before it starts
building the .o files.
Ingo tested this fix and reported:
good news: hundreds of successful kernel builds and no failures
overnight.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: James Bottomley <James.Bottomley@HansenPartnership.com>
I was struggling to get my email-client no to mangle my patch files,
and I didn't find enough information in the SubmittingPatches file.
By looking for more information on the web, I eventually found the
email-clients.txt file, and it answered all my needs
This patch adds a reference to email-clients.txt in SubmittingPatches,
and Mozilla related information which is no longer accurate
(as opposed to the details found in email-clients.txt).
This should be helpful for people sending their first patches,
or not sending patches on a frequent basis.
Signed-off-by: Michael Opdenacker <michael@free-electrons.com>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Acked-by: James Smart <james.smart@emulex.com>
Acked-by: Darrick J. Wong <djwong@us.ibm.com>
Acked-by: David Somayajulu <david.somayajulu@qlogic.com>
Acked-by: Mark Salyzyn <mark_salyzyn@adaptec.com>
Signed-off-by: Adrian Bunk <bunk@kernel.org>