scsi, file-posix: add support for persistent reservation management

It is a common requirement for virtual machines to send persistent
reservations, but this currently requires either running QEMU with
CAP_SYS_RAWIO, or using out-of-tree patches that let an unprivileged
QEMU bypass Linux's filter on SG_IO commands.

As an alternative mechanism, the next patches will introduce a
privileged helper to run persistent reservation commands without
expanding QEMU's attack surface unnecessarily.

The helper is invoked through a "pr-manager" QOM object, to which
file-posix.c passes SG_IO requests for PERSISTENT RESERVE OUT and
PERSISTENT RESERVE IN commands. For example:

  $ qemu-system-x86_64 \
      -device virtio-scsi \
      -object pr-manager-helper,id=helper0,path=/var/run/qemu-pr-helper.sock \
      -drive if=none,id=hd,driver=raw,file.filename=/dev/sdb,file.pr-manager=helper0 \
      -device scsi-block,drive=hd

or:

  $ qemu-system-x86_64 \
      -device virtio-scsi \
      -object pr-manager-helper,id=helper0,path=/var/run/qemu-pr-helper.sock \
      -blockdev node-name=hd,driver=raw,file.driver=host_device,file.filename=/dev/sdb,file.pr-manager=helper0 \
      -device scsi-block,drive=hd

Multiple pr-manager implementations are conceivable and possible, though
only one is implemented right now. For example, a pr-manager could:

- talk directly to the multipath daemon from a privileged QEMU
  (i.e. QEMU links to libmpathpersist); this makes reservation work
  properly with multipath, but still requires CAP_SYS_RAWIO

- use the Linux IOC_PR_* ioctls (they require CAP_SYS_ADMIN though)

- more interestingly, implement reservations directly in QEMU
  through file system locks or a shared database (e.g. sqlite)

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-08-21 18:58:56 +02:00

======================================
Persistent reservation managers
======================================

SCSI persistent reservations allow restricting access to block devices
to specific initiators in a shared storage setup. When implementing
clustering of virtual machines, it is a common requirement for virtual
machines to send persistent reservation SCSI commands. However,
the operating system prevents unprivileged programs from sending these
commands, because incorrect usage can disrupt regular operation of the
storage fabric.

For this reason, QEMU's SCSI passthrough devices, ``scsi-block``
and ``scsi-generic`` (both are only available on Linux), can delegate
implementation of persistent reservations to a separate object,
the "persistent reservation manager". Only PERSISTENT RESERVE OUT and
PERSISTENT RESERVE IN commands are passed to the persistent reservation
manager object; other commands are processed by QEMU as usual.

-----------------------------------------
Defining a persistent reservation manager
-----------------------------------------

A persistent reservation manager is an instance of a subclass of the
"pr-manager" QOM class.

Right now only one subclass is defined, ``pr-manager-helper``, which
forwards the commands to an external privileged helper program
over Unix sockets. The helper program only allows sending persistent
reservation commands to devices for which QEMU has a file descriptor,
so that QEMU will not be able to effect persistent reservations
unless it has access to both the socket and the device.

``pr-manager-helper`` has a single string property, ``path``, which
accepts the path to the helper program's Unix socket. For example,
the following command line defines a ``pr-manager-helper`` object and
attaches it to a SCSI passthrough device::

    $ qemu-system-x86_64 \
        -device virtio-scsi \
        -object pr-manager-helper,id=helper0,path=/var/run/qemu-pr-helper.sock \
        -drive if=none,id=hd,driver=raw,file.filename=/dev/sdb,file.pr-manager=helper0 \
        -device scsi-block,drive=hd

Alternatively, using ``-blockdev``::

    $ qemu-system-x86_64 \
        -device virtio-scsi \
        -object pr-manager-helper,id=helper0,path=/var/run/qemu-pr-helper.sock \
        -blockdev node-name=hd,driver=raw,file.driver=host_device,file.filename=/dev/sdb,file.pr-manager=helper0 \
        -device scsi-block,drive=hd

----------------------------------
Invoking :program:`qemu-pr-helper`
----------------------------------

QEMU provides an implementation of the persistent reservation helper,
called :program:`qemu-pr-helper`. The helper should be started as a
system service and supports the following options:

-d, --daemon              run in the background
-q, --quiet               decrease verbosity
-v, --verbose             increase verbosity
-f, --pidfile=path        PID file when running as a daemon
-k, --socket=path         path to the socket
-T, --trace=trace-opts    tracing options

By default, the socket and PID file are placed in the runtime state
directory, for example :file:`/var/run/qemu-pr-helper.sock` and
:file:`/var/run/qemu-pr-helper.pid`. The PID file is not created
unless :option:`-d` is passed too.

:program:`qemu-pr-helper` can also use the systemd socket activation
protocol. In this case, the systemd socket unit should specify a
Unix stream socket, like this::

    [Socket]
    ListenStream=/var/run/qemu-pr-helper.sock

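With socket activation, systemd starts the service unit that has the same
name as the socket unit. A minimal sketch of such a service unit follows,
assuming the units are named ``qemu-pr-helper.socket`` and
``qemu-pr-helper.service`` and that the helper is installed as
:file:`/usr/bin/qemu-pr-helper`; both the unit names and the path are
assumptions to adjust for the local installation::

    [Unit]
    Description=Persistent reservation helper for QEMU

    [Service]
    # Assumed installation path; systemd passes the listening socket,
    # so no --socket option is given here.
    ExecStart=/usr/bin/qemu-pr-helper
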
After connecting to the socket, :program:`qemu-pr-helper` can optionally drop
root privileges, except for those capabilities that are needed for
its operation. To do this, add the following options:

-u, --user=user           user to drop privileges to
-g, --group=group         group to drop privileges to

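As an illustration, the options described in this section can be combined
on one command line. This is only a sketch: the ``qemu`` user and group
names are hypothetical placeholders, the socket and PID file paths are the
example defaults mentioned earlier, and the command is meant to be run as
root (hence the ``#`` prompt)::

    # qemu-pr-helper --daemon \
          --socket=/var/run/qemu-pr-helper.sock \
          --pidfile=/var/run/qemu-pr-helper.pid \
          --user=qemu --group=qemu
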
---------------------------------------------
Multipath devices and persistent reservations
---------------------------------------------

Proper support of persistent reservation for multipath devices requires
communication with the multipath daemon, so that the reservation is
registered and applied when a path is newly discovered or comes back
online. :program:`qemu-pr-helper` can do this if the ``libmpathpersist``
library was available on the system at build time.

As of August 2017, a reservation key must be specified in ``multipath.conf``
for ``multipathd`` to check for persistent reservation for newly
discovered paths or reinstated paths. The attribute can be added
to the ``defaults`` section or the ``multipaths`` section; for example::

    multipaths {
        multipath {
            wwid   XXXXXXXXXXXXXXXX
            alias      yellow
            reservation_key  0x123abc
        }
    }

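For comparison, the same attribute placed in the ``defaults`` section might
look like the following sketch, which reuses the placeholder key from the
example above::

    defaults {
        # placeholder key, reused from the multipaths example
        reservation_key  0x123abc
    }
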
Linking :program:`qemu-pr-helper` to ``libmpathpersist`` does not impede
its usage on regular SCSI devices.