Commit Graph

83026 Commits

Author SHA1 Message Date
Rusty Russell
6b35e40767 virtio: balloon driver
After discussions with Anthony Liguori, it seems that the virtio
balloon can be made even simpler.  Here's my attempt.

The device configuration tells the driver how much memory it should
take from the guest (ie. balloon size).  The guest feeds the page
numbers it has taken via one virtqueue.

A second virtqueue feeds the page numbers the driver wants back: if
the device has the VIRTIO_BALLOON_F_MUST_TELL_HOST bit, then this
queue is compulsory, otherwise it's advisory (and the guest can simply
fault the pages back in).

This driver can be enhanced later to deflate the balloon via a
shrinker, oom callback or we could even go for a complete set of
in-guest regulators.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-02-04 23:50:13 +11:00
Anthony Liguori
55a7c06604 virtio: Use PCI revision field to indicate virtio PCI ABI version
As Avi pointed out, as we continue to massage the virtio PCI ABI, we can make
things a little more friendly to users by utilizing the PCI revision field to
indicate which version of the ABI we're using.  This is a hard ABI version
and incrementing it will cause the guest driver to break.

This is the necessary changes to virtio_pci to support this.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-02-04 23:50:12 +11:00
Anthony Liguori
3343660d8c virtio: PCI device
This is a PCI device that implements a transport for virtio.  It allows virtio
devices to be used by QEMU based VMMs like KVM or Xen.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-02-04 23:50:11 +11:00
Christian Borntraeger
d50ed907dc virtio_blk: implement naming for vda-vdz,vdaa-vdzz,vdaaa-vdzzz
Am Freitag, 1. Februar 2008 schrieb Christian Borntraeger:
> Right. I will fix that with an additional patch.

This patch goes on top of the minor number patch. Please let me know if
you want a merged patch:

Currently virtio_blk creates the disk name combinging "vd"  with 'a'++.
This will give strange names after vdz. I have implemented names up to
vdzzz - inspired by the sd.c code. That should be sufficient for now.

There is one driver in the kernel (driver/s390/block/dasd_genhd.c) that
implements names from dasda-dasdzzzz allowing even more disks. Maybe
a janitor can come up with a common implementation usable for all kind
of block device drivers.

I have tested this patch with 100 disks - seems to work.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-02-04 23:50:11 +11:00
Christian Borntraeger
4f3bf19c6e virtio_blk: Dont waste major numbers
Rusty,

currently virtio_blk uses one major number per device. While this works
quite well on most systems it is wasteful and will exhaust major numbers
on larger installations.

This patch allocates a major number on init and will use 16 minor numbers
for each disk. That will allow ~64k virtio_blk disks.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-02-04 23:50:10 +11:00
Christian Borntraeger
135da0b037 virtio_blk: provide getgeo
Rusty,

I currently try to make my guest boot from an virtio root device
without having an external kernel. Some of the tools that I tried
expect HDIO_GETGEO to work. The most interesting value is likely
the geo.start value to get the offset of a partition. This value
is filled by block/ioctl.c if fops->getgeo is set. This patch also
fills in some standard values for heads, sectors and cylinders.

Makes sense?

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-02-04 23:50:09 +11:00
Dor Laor
6c0cd7c000 virtio_net: parametrize the napi_weight for virtio receive queue.
It is done in order to improve performance.

Signed-off-by: Dor Laor <dor.laor@qumranet.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-02-04 23:50:09 +11:00
Rusty Russell
2cb9c6bafc virtio: free transmit skbs when notified, not on next xmit.
This fixes a potential dangling xmit problem.

We also suppress refill interrupts until we need them.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-02-04 23:50:08 +11:00
Rusty Russell
a48bd8f670 virtio: flush buffers on open
Fix bug found by Christian Borntraeger: if the other side fills all
the registered network buffers before we enable NAPI, we will never
get an interrupt.  The simplest fix is to process the input queue once
on open.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-02-04 23:50:07 +11:00
Christian Borntraeger
e70f2f1bb8 virtnet: remove double ether_setup
Hello Rusty,

virtnet_probe already calls alloc_etherdev, which calls ether_setup.
There is no need to do that again.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-02-04 23:50:07 +11:00
Rusty Russell
c6fd47011b virtio: Allow virtio to be modular and used by modules
This is needed for the virtio PCI device to be compiled as a module.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-02-04 23:50:06 +11:00
Rusty Russell
15f9c8903c virtio: Use the sg_phys convenience function.
Simple cleanup.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-02-04 23:50:05 +11:00
Anthony Liguori
0ad07ec1fd virtio: Put the virtio under the virtualization menu
This patch moves virtio under the virtualization menu and changes virtio
devices to not claim to only be for lguest.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-02-04 23:50:05 +11:00
Rusty Russell
81a8deab1c virtio: handle interrupts after callbacks turned off
Anthony Liguori found double interrupt suppression in the virtio_net
driver, triggered by two skb_recv_done's in a row.  This is because
virtio_ring's interrupt suppression is a best-effort optimization: it
contains no synchronization so the host can miss it and still send
interrupts.

But it's certainly nicer for virtio users if calling disable_cb
actually disables callbacks, so we check for the race in the interrupt
routine.

Note: SMP guests might require syncronization here, but since
disable_cb is actually called from interrupt context, there has to be
some form of synchronization before the next same interrupt handler is
called (Linux guarantees that the same device's irq handler will never
run simultanously on multiple CPUs).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-02-04 23:50:04 +11:00
Rusty Russell
6e5aa7efb2 virtio: reset function
A reset function solves three problems:

1) It allows us to renegotiate features, eg. if we want to upgrade a
   guest driver without rebooting the guest.

2) It gives us a clean way of shutting down virtqueues: after a reset,
   we know that the buffers won't be used by the host, and

3) It helps the guest recover from messed-up drivers.

So we remove the ->shutdown hook, and the only way we now remove
feature bits is via reset.

We leave it to the driver to do the reset before it deletes queues:
the balloon driver, for example, needs to chat to the host in its
remove function.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-02-04 23:50:03 +11:00
Rusty Russell
b3369c1fb4 virtio: populate network rings in the probe routine, not open
Since we want to reset the device to remove them, this is simpler
(device is reset for us on driver remove).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-02-04 23:50:03 +11:00
Rusty Russell
34a48579e4 virtio: Tweak virtio_net defines
1) Turn GSO on virtio net into an all-or-nothing (keep checksumming
   separate).  Having multiple bits is a pain: if you can't support something
   you should handle it in software, which is still a performance win.

2) Make VIRTIO_NET_HDR_GSO_ECN a flag in the header, so it can apply to
   IPv6 or v4.

3) Rename VIRTIO_NET_F_NO_CSUM to VIRTIO_NET_F_CSUM (ie. means we do
   checksumming).

4) Add csum and gso params to virtio_net to allow more testing.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-02-04 23:50:02 +11:00
Rusty Russell
50c8ea8080 virtio: Net header needs hdr_len
It's far easier to deal with packets if we don't have to parse the
packet to figure out the header length to know how much to pull into
the skb data.  Add the field to the virtio_net_hdr struct (and fix the
spaces that somehow crept in there).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-02-04 23:50:02 +11:00
Rusty Russell
24a5ae5d03 virtio: remove unused id field from struct virtio_blk_outhdr
This field has been unused since an older version of virtio.  Remove
it now before we freeze the ABI.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au.
2008-02-04 23:50:01 +11:00
Rusty Russell
426e3e0af5 virtio: clarify NO_NOTIFY flag usage
The other side (host) can set the NO_NOTIFY flag as an optimization,
to say "no need to kick me when you add things".  Make it clear that
this is advisory only; especially that we should always notify when
the ring is full.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-02-04 23:50:00 +11:00
Anthony Liguori
3309daaad7 virtio: Fix vring_init/vring_size to take unsigned long
Using unsigned int resulted in silent truncation of the upper 32-bit
on x86_64 resulting in an OOPS since the ring was being initialized
wrong.

Please reconsider my previous patch to just use PAGE_ALIGN().  Open
coding this sort of stuff, no matter how simple it seems, is just
asking for this sort of trouble.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-02-04 23:49:59 +11:00
Rusty Russell
f957d1f05a virtio: configuration change callback
Various drivers want to know when their configuration information
changes: the balloon driver is the immediate user, but the network
driver may one day have a "carrier" status as well.

This introduces that callback (lguest doesn't use it yet).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-02-04 23:49:59 +11:00
Rusty Russell
18445c4d50 virtio: explicit enable_cb/disable_cb rather than callback return.
It seems that virtio_net wants to disable callbacks (interrupts) before
calling netif_rx_schedule(), so we can't use the return value to do so.

Rename "restart" to "cb_enable" and introduce "cb_disable" hook: callback
now returns void, rather than a boolean.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-02-04 23:49:58 +11:00
Rusty Russell
a586d4f601 virtio: simplify config mechanism.
Previously we used a type/len pair within the config space, but this
seems overkill.  We now simply define a structure which represents the
layout in the config space: the config space can now only be extended
at the end.

The main driver-visible changes:
1) We indicate what fields are present with an explicit feature bit.
2) Virtqueues are explicitly numbered, and not in the config space.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-02-04 23:49:57 +11:00
Rusty Russell
f35d9d8aae virtio: Implement skb_partial_csum_set, for setting partial csums on untrusted packets.
Use it in virtio_net (replacing buggy version there), it's also going
to be used by TAP for partial csum support.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: David S. Miller <davem@davemloft.net>
2008-02-04 23:49:56 +11:00
Vitaliy Gusev
ab1f161165 pid-namespaces-vs-locks-interaction
fcntl(F_GETLK,..) can return pid of process for not current pid namespace
(if process is belonged to the several namespaces).  It is true also for
pids in /proc/locks.  So correct behavior is saving pointer to the struct
pid of the process lock owner.

Signed-off-by: Vitaliy Gusev <vgusev@openvz.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-03 17:51:36 -05:00
Matthew Wilcox
4321e01e7d file locks: Use wait_event_interruptible_timeout()
interruptible_sleep_on_locked() is just an open-coded
wait_event_interruptible_timeout(), with the one difference that
interruptible_sleep_on_locked() doesn't bother to check the condition on
which it is waiting, depending instead on the BKL to avoid the case
where it blocks after the wakeup has already been called.

locks_block_on_timeout() is only used in one place, so it's actually
simpler to inline it into its caller.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-03 17:51:36 -05:00
J. Bruce Fields
b533184fc3 locks: clarify posix_locks_deadlock
For such a short function (with such a long comment),
posix_locks_deadlock() seems to cause a lot of confusion.  Attempt to
make it a bit clearer:

	- Remove the initial posix_same_owner() check, which can never
	  pass (since this is only called in the case that block_fl and
	  caller_fl conflict)
	- Use an explicit loop (and a helper function) instead of a goto.
	- Rewrite the comment, attempting a clearer explanation, and
	  removing some uninteresting historical detail.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-03 17:51:36 -05:00
Sam Ravnborg
8891fec65a scsi: fix dependency bug in aic7 Makefile
Building the aic7xxx driver includes the copy
of an .h file from a _shipped file.

In a highly parallel build Ingo saw that the
build sometimes failed (included distcc usage).
It was tracked down to a missing dependency from the .c
source file to the generated .h file.
We started to build the .c file before the
copy (cat) operation of the .h file completed
and we then only got half of the definitions
from the copied .h file.

Add an explicit dependency from the .c files to the
generated .h files so make knows all dependencies and
finsih the build of the .h files before it starts
building the .o files.

Ingo tested this fix and reported:
good news: hundreds of successful kernel builds and no failures
overnight.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2008-02-03 21:55:49 +01:00
Adrian Bunk
1560a79a2c Jesper Juhl is the new trivial patches maintainer
Jesper has agreed to take over maintainership for the trivial patches.

Thanks, Jesper!

Signed-off-by: Adrian Bunk <bunk@kernel.org>
2008-02-03 18:17:37 +02:00
Michael Opdenacker
097091c0a8 Documentation: mention email-clients.txt in SubmittingPatches
I was struggling to get my email-client no to mangle my patch files,
and I didn't find enough information in the SubmittingPatches file.
By looking for more information on the web, I eventually found the
email-clients.txt file, and it answered all my needs

This patch adds a reference to email-clients.txt in SubmittingPatches,
and Mozilla related information which is no longer accurate
(as opposed to the details found in email-clients.txt).

This should be helpful for people sending their first patches,
or not sending patches on a frequent basis.

Signed-off-by: Michael Opdenacker <michael@free-electrons.com>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
2008-02-03 18:06:58 +02:00
Ohad Ben-Cohen
09c6dd3c9d fs/binfmt_elf.c: spello fix
s/litle/little

Signed-off-by: Ohad Ben-Cohen <ohad@bencohen.org>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
2008-02-03 18:05:15 +02:00
Fengguang Wu
28bc44d7d1 do_invalidatepage() comment typo fix
Fix a typo in the comment for do_invalidatepage().

Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
2008-02-03 18:04:10 +02:00
Oliver Pinter
3eb43f689c Documentation/filesystems/porting fixes
typo fix and whitespace cleanup

Signed-off-by: Oliver Pinter <oliver.pntr@gmail.com>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
2008-02-03 17:59:17 +02:00
Oliver Pinter
53379e57a7 typo fixes in net/core/net_namespace.c
Signed-off-by: Oliver Pinter <oliver.pntr@gmail.com>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
2008-02-03 17:56:48 +02:00
Oliver Pinter
1dbede8714 typo fix in net/rfkill/rfkill.c
Signed-off-by: Oliver Pinter <oliver.pntr@gmail.com>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
2008-02-03 17:55:45 +02:00
Oliver Pinter
ac461a0330 typo fixes in net/sctp/sm_statefuns.c
Signed-off-by: Oliver Pinter <oliver.pntr@gmail.com>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
2008-02-03 17:52:41 +02:00
Joe Perches
643d1f7fe3 lib/: Spelling fixes
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
2008-02-03 17:48:52 +02:00
Joe Perches
0b0a3e7b18 kernel/: Spelling fixes
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
2008-02-03 17:48:04 +02:00
Joe Perches
1e4478c325 include/scsi/: Spelling fixes
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
2008-02-03 17:47:00 +02:00
Joe Perches
fd3f8984f6 include/linux/: Spelling fixes
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
2008-02-03 17:45:46 +02:00
Joe Perches
ab690d9fed include/asm-m68knommu/: Spelling fixes
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
2008-02-03 17:38:04 +02:00
Joe Perches
62018b5b58 include/asm-frv/: Spelling fixes
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
2008-02-03 17:34:55 +02:00
Joe Perches
c78bad11fb fs/: Spelling fixes
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
2008-02-03 17:33:42 +02:00
Joe Perches
ee0fc097ef drivers/watchdog/: Spelling fixes
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
2008-02-03 17:32:52 +02:00
Joe Perches
44363f14d9 drivers/video/: Spelling fixes
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
2008-02-03 17:31:49 +02:00
Joe Perches
b8c268d104 drivers/ssb/: Spelling fixes
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
2008-02-03 17:30:25 +02:00
Joe Perches
8f4aafec6a drivers/serial/: Spelling fixes
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
2008-02-03 17:29:25 +02:00
Joe Perches
b1c118121a drivers/scsi/: Spelling fixes
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Acked-by: James Smart <james.smart@emulex.com>
Acked-by: Darrick J. Wong <djwong@us.ibm.com>
Acked-by: David Somayajulu <david.somayajulu@qlogic.com>
Acked-by: Mark Salyzyn <mark_salyzyn@adaptec.com>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
2008-02-03 17:28:22 +02:00
Joe Perches
f26fc4e08a drivers/pcmcia/: Spelling fixes
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
2008-02-03 17:26:02 +02:00