# Background
I was investigating spurious non-deterministic EINTR returns from
various 9p file system operations in a Linux guest served from the
qemu 9p server.
## EINTR, ERESTARTSYS and the linux kernel
When a signal arrives that the Linux kernel needs to deliver to user-space
while a given thread is blocked (in the 9p case waiting for a reply to its
request in 9p_client_rpc -> wait_event_interruptible), it asks whatever
driver is currently running to abort its current operation (in the 9p case
causing the submission of a TFLUSH message) and return to user space.
In these situations, the error message reported is generally ERESTARTSYS.
If the userspace processes specified SA_RESTART, this means that the
system call will get restarted upon completion of the signal handler
delivery (assuming the signal handler doesn't modify the process state
in complicated ways not relevant here). If SA_RESTART is not specified,
ERESTARTSYS gets translated to EINTR and user space is expected to handle
the restart itself.
## The 9p TFLUSH command
The 9p TFLUSH commands requests that the server abort an ongoing operation.
The man page [1] specifies:
```
If it recognizes oldtag as the tag of a pending transaction, it should
abort any pending response and discard that tag.
[...]
When the client sends a Tflush, it must wait to receive the corresponding
Rflush before reusing oldtag for subsequent messages. If a response to the
flushed request is received before the Rflush, the client must honor the
response as if it had not been flushed, since the completed request may
signify a state change in the server
```
In particular, this means that the server must not send a reply with the
orignal tag in response to the cancellation request, because the client is
obligated to interpret such a reply as a coincidental reply to the original
request.
# The bug
When qemu receives a TFlush request, it sets the `cancelled` flag on the
relevant pdu. This flag is periodically checked, e.g. in
`v9fs_co_name_to_path`, and if set, the operation is aborted and the error
is set to EINTR. However, the server then violates the spec, by returning
to the client an Rerror response, rather than discarding the message
entirely. As a result, the client is required to assume that said Rerror
response is a result of the original request, not a result of the
cancellation and thus passes the EINTR error back to user space.
This is not the worst thing it could do, however as discussed above, the
correct error code would have been ERESTARTSYS, such that user space
programs with SA_RESTART set get correctly restarted upon completion of
the signal handler.
Instead, such programs get spurious EINTR results that they were not
expecting to handle.
It should be noted that there are plenty of user space programs that do not
set SA_RESTART and do not correctly handle EINTR either. However, that is
then a userspace bug. It should also be noted that this bug has been
mitigated by a recent commit to the Linux kernel [2], which essentially
prevents the kernel from sending Tflush requests unless the process is about
to die (in which case the process likely doesn't care about the response).
Nevertheless, for older kernels and to comply with the spec, I believe this
change is beneficial.
# Implementation
The fix is fairly simple, just skipping notification of a reply if
the pdu was previously cancelled. We do however, also notify the transport
layer that we're doing this, so it can clean up any resources it may be
holding. I also added a new trace event to distinguish
operations that caused an error reply from those that were cancelled.
One complication is that we only omit sending the message on EINTR errors in
order to avoid confusing the rest of the code (which may assume that a
client knows about a fid if it sucessfully passed it off to pud_complete
without checking for cancellation status). This does mean that if the server
acts upon the cancellation flag, it always needs to set err to EINTR. I
believe this is true of the current code.
[1] https://9fans.github.io/plan9port/man/man9/flush.html
[2] https://github.com/torvalds/linux/commit/9523feac272ccad2ad8186ba4fcc891
Signed-off-by: Keno Fischer <keno@juliacomputing.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
[groug, send a zero-sized reply instead of detaching the buffer]
Signed-off-by: Greg Kurz <groug@kaod.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
No good reasons to do this outside of v9fs_device_realize_common().
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
This backend raise some concerns:
- doesn't support symlinks
- fails +100 tests in the PJD POSIX file system test suite [1]
- requires the QEMU process to run with the CAP_DAC_READ_SEARCH
capability, which isn't recommended for security reasons
This backend should not be used and wil be removed. The 'local'
backend is the recommended alternative.
[1] https://www.tuxera.com/community/posix-test-suite/
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
This patch changes some error messages in the backend init code and
convert backends to propagate QEMU Error objects instead of calling
error_report().
One notable improvement is that the local backend now provides a more
detailed error report when it fails to open the shared directory.
Signed-off-by: Greg Kurz <groug@kaod.org>
This patch changes some error messages in the backend opts parsing
code and convert backends to propagate QEMU Error objects instead
of calling error_report().
Signed-off-by: Greg Kurz <groug@kaod.org>
If we receive an unsupported request id, we first decide to
return -ENOTSUPP to the client, but since the request id
causes is_read_only_op() to return false, we change the
error to be -EROFS if the fsdev is read-only. This doesn't
make sense since we don't know what the client asked for.
This patch ensures that -EROFS can only be returned if the
request id is supported.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
The return value of v9fs_mark_fids_unreclaim() is then propagated to
pdu_complete(). It should be a negative errno, not -1.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
9p back-end first queries the size of an extended attribute,
allocates space for it via g_malloc() and then retrieves its
value into allocated buffer. Race between querying attribute
size and retrieving its could lead to memory bytes disclosure.
Use g_malloc0() to avoid it.
Reported-by: Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Signed-off-by: Greg Kurz <groug@kaod.org>
v9fs_do_readdir_with_stat() should check for a maximum buffer size
before an attempt to marshal gathered data. Otherwise, buffers assumed
as misconfigured and the transport would be broken.
The patch brings v9fs_do_readdir_with_stat() in conformity with
v9fs_do_readdir() behavior.
Signed-off-by: Jan Dakinevich <jan.dakinevich@gmail.com>
[groug, regression caused my commit 8d37de41ca # 2.10]
Signed-off-by: Greg Kurz <groug@kaod.org>
The third parameter of v9fs_co_name_to_path() must not contain `/'
character.
The issue is most likely related to 9p2000.u protocol only.
Signed-off-by: Jan Dakinevich <jan.dakinevich@gmail.com>
[groug, regression caused by commit f57f587857 # 2.10]
Signed-off-by: Greg Kurz <groug@kaod.org>
If the client is using 9p2000.u, the following occurs:
$ cd ${virtfs_shared_dir}
$ mkdir -p a/b/c
$ ls a/b
ls: cannot access 'a/b/a': No such file or directory
ls: cannot access 'a/b/b': No such file or directory
a b c
instead of the expected:
$ ls a/b
c
This is a regression introduced by commit f57f5878578a;
local_name_to_path() now resolves ".." and "." in paths,
and v9fs_do_readdir_with_stat()->stat_to_v9stat() then
copies the basename of the resulting path to the response.
With the example above, this means that "." and ".." are
turned into "b" and "a" respectively...
stat_to_v9stat() currently assumes it is passed a full
canonicalized path and uses it to do two different things:
1) to pass it to v9fs_co_readlink() in case the file is a symbolic
link
2) to set the name field of the V9fsStat structure to the basename
part of the given path
It only has two users: v9fs_stat() and v9fs_do_readdir_with_stat().
v9fs_stat() really needs 1) and 2) to be performed since it starts
with the full canonicalized path stored in the fid. It is different
for v9fs_do_readdir_with_stat() though because the name we want to
put into the V9fsStat structure is the d_name field of the dirent
actually (ie, we want to keep the "." and ".." special names). So,
we only need 1) in this case.
This patch hence adds a basename argument to stat_to_v9stat(), to
be used to set the name field of the V9fsStat structure, and moves
the basename logic to v9fs_stat().
Signed-off-by: Jan Dakinevich <jan.dakinevich@gmail.com>
(groug, renamed old name argument to path and updated changelog)
Signed-off-by: Greg Kurz <groug@kaod.org>
Since fchmodat(2) on Linux doesn't support AT_SYMLINK_NOFOLLOW, we have to
implement it using workarounds. There are two different ways, depending on
whether the system supports O_PATH or not.
In the case O_PATH is supported, we rely on the behavhior of openat(2)
when passing O_NOFOLLOW | O_PATH and the file is a symbolic link. Even
if openat_file() already adds O_NOFOLLOW to the flags, this patch makes
it explicit that we need both creation flags to obtain the expected
behavior.
This is only cleanup, no functional change.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
(note this is how other functions also handle the errors).
hw/9pfs/9p.c:948:18: warning: Loss of sign in implicit conversion
offset = err;
^~~
Reported-by: Clang Static Analyzer
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Greg Kurz <groug@kaod.org>
Nothing in fsdev/ or hw/9pfs/ depends on pci; it should rather depend
on CONFIG_VIRTFS and CONFIG_VIRTIO/CONFIG_XEN only.
Acked-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
This function has to ensure it doesn't follow a symlink that could be used
to escape the virtfs directory. This could be easily achieved if fchmodat()
on linux honored the AT_SYMLINK_NOFOLLOW flag as described in POSIX, but
it doesn't. There was a tentative to implement a new fchmodat2() syscall
with the correct semantics:
https://patchwork.kernel.org/patch/9596301/
but it didn't gain much momentum. Also it was suggested to look at an O_PATH
based solution in the first place.
The current implementation covers most use-cases, but it notably fails if:
- the target path has access rights equal to 0000 (openat() returns EPERM),
=> once you've done chmod(0000) on a file, you can never chmod() again
- the target path is UNIX domain socket (openat() returns ENXIO)
=> bind() of UNIX domain sockets fails if the file is on 9pfs
The solution is to use O_PATH: openat() now succeeds in both cases, and we
can ensure the path isn't a symlink with fstat(). The associated entry in
"/proc/self/fd" can hence be safely passed to the regular chmod() syscall.
The previous behavior is kept for older systems that don't have O_PATH.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Tested-by: Zhi Yong Wu <zhiyong.wu@ucloud.cn>
Acked-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
With the move of some docs/ to docs/devel/ on ac06724a71,
no references were updated.
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Convert all uses of error_report("warning:"... to use warn_report()
instead. This helps standardise on a single method of printing warnings
to the user.
All of the warnings were changed using these two commands:
find ./* -type f -exec sed -i \
's|error_report(".*warning[,:] |warn_report("|Ig' {} +
Indentation fixed up manually afterwards.
The test-qdev-global-props test case was manually updated to ensure that
this patch passes make check (as the test cases are case sensitive).
Signed-off-by: Alistair Francis <alistair.francis@xilinx.com>
Suggested-by: Thomas Huth <thuth@redhat.com>
Cc: Jeff Cody <jcody@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Max Reitz <mreitz@redhat.com>
Cc: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Lieven <pl@kamp.de>
Cc: Josh Durgin <jdurgin@redhat.com>
Cc: "Richard W.M. Jones" <rjones@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Cc: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Greg Kurz <groug@kaod.org>
Cc: Rob Herring <robh@kernel.org>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Peter Chubb <peter.chubb@nicta.com.au>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Marcel Apfelbaum <marcel@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Alexander Graf <agraf@suse.de>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Cornelia Huck <cohuck@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Greg Kurz <groug@kaod.org>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed by: Peter Chubb <peter.chubb@data61.csiro.au>
Acked-by: Max Reitz <mreitz@redhat.com>
Acked-by: Marcel Apfelbaum <marcel@redhat.com>
Message-Id: <e1cfa2cd47087c248dd24caca9c33d9af0c499b0.1499866456.git.alistair.francis@xilinx.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Contrary to what is written in the comment, a buggy guest can misconfigure
the transport buffers and pdu_marshal() may return an error. If this ever
happens, it is up to the transport layer to handle the situation (9P is
transport agnostic).
This fixes Coverity issue CID1348518.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Implement xen_9pfs_disconnect by unbinding the event channels. On
xen_9pfs_free, call disconnect if any event channels haven't been
disconnected.
If the frontend misconfigured the buffers set the backend to "Closing"
and disconnect it. Misconfigurations include requesting a read of more
bytes than available on the ring buffer, or claiming to be writing more
data than available on the ring buffer.
Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
The 9P protocol is transport agnostic: if the guest misconfigured the
buffers, the best we can do is to set the broken flag on the device.
Signed-off-by: Greg Kurz <groug@kaod.org>
The 9p spec at http://man.cat-v.org/plan_9/5/intro reads:
"Each 9P message begins with a four-byte size field specify-
ing the length in bytes of the complete message including
the four bytes of the size field itself. The next byte is
the message type, one of the constants in the enumeration in
the include file <fcall.h>. The next two bytes are an iden-
tifying tag, described below."
ie, each message starts with a 7-byte long header.
The core 9P code already assumes this pretty much everywhere. This patch
does the following:
- makes the assumption explicit in the common 9p.h header, since it isn't
related to the transport
- open codes the header size in handle_9p_output() and hardens the sanity
check on the space needed for the reply message
Signed-off-by: Greg Kurz <groug@kaod.org>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
If the guest sends a malformed request, we end up with a dangling pointer
in V9fsVirtioState. This doesn't seem to cause any bug, but let's remove
this side effect anyway.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
I found these pattern via grepping the source tree. I don't have a
coccinelle script for it!
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
In mapped security modes, files are created with very restrictive
permissions (600 for files and 700 for directories). This makes
file sharing between virtual machines and users on the host rather
complicated. Imagine eg. a group of users that need to access data
produced by processes on a virtual machine. Giving those users access
to the data will be difficult since the group access mode is always 0.
This patch makes the default mode for both files and directories
configurable. Existing setups that don't know about the new parameters
keep using the current secure behavior.
Signed-off-by: Tobias Schramm <tobleminer@gmail.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
Commit a0e640a8 introduced a path processing error.
Pass fstatat the dirpath based path component instead
of the entire path.
Signed-off-by: Bruce Rogers <brogers@suse.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
When using the mapped-file security, credentials are stored in a metadata
directory located in the parent directory. This is okay for all paths with
the notable exception of the root path, since we don't want and probably
can't create a metadata directory above the virtfs directory on the host.
This patch introduces a dedicated metadata file, sitting in the virtfs root
for this purpose. It relies on the fact that the "." name necessarily refers
to the virtfs root.
As for the metadata directory, we don't want the client to see this file.
The current code only cares for readdir() but there are many other places
to fix actually. The filtering logic is hence put in a separate function.
Before:
# ls -ld
drwxr-xr-x. 3 greg greg 4096 May 5 12:49 .
# chown root.root .
chown: changing ownership of '.': Is a directory
# ls -ld
drwxr-xr-x. 3 greg greg 4096 May 5 12:49 .
After:
# ls -ld
drwxr-xr-x. 3 greg greg 4096 May 5 12:49 .
# chown root.root .
# ls -ld
drwxr-xr-x. 3 root root 4096 May 5 12:50 .
and from the host:
ls -al .virtfs_metadata_root
-rwx------. 1 greg greg 26 May 5 12:50 .virtfs_metadata_root
$ cat .virtfs_metadata_root
virtfs.uid=0
virtfs.gid=0
Reported-by: Leo Gaspard <leo@gaspard.io>
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Tested-by: Leo Gaspard <leo@gaspard.io>
[groug: work around a patchew false positive in
local_set_mapped_file_attrat()]
The logic to open a path currently sits between local_open_nofollow() and
the relative_openat_nofollow() helper, which has no other user.
For the sake of clarity, this patch moves all the code of the helper into
its unique caller. While here we also:
- drop the code to skip leading "/" because the backend isn't supposed to
pass anything but relative paths without consecutive slashes. The assert()
is kept because we really don't want a buggy backend to pass an absolute
path to openat().
- use strchrnul() to get a simpler code. This is ok since virtfs is for
linux+glibc hosts only.
- don't dup() the initial directory and add an assert() to ensure we don't
return the global mountfd to the caller. BTW, this would mean that the
caller passed an empty path, which isn't supposed to happen either.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
[groug: fixed typos in changelog]
When using the mapped-file security mode, the creds of a path /foo/bar
are stored in the /foo/.virtfs_metadata/bar file. This is okay for all
paths unless they end with '.' or '..', because we cannot create the
corresponding file in the metadata directory.
This patch ensures that '.' and '..' are resolved in all paths.
The core code only passes path elements (no '/') to the backend, with
the notable exception of the '/' path, which refers to the virtfs root.
This patch preserves the current behavior of converting it to '.' so
that it can be passed to "*at()" syscalls ('/' would mean the host root).
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
These v9fs_co_name_to_path() call sites have always been around. I guess
no care was taken to check the return value because the name_to_path
operation could never fail at the time. This is no longer true: the
handle and synth backends can already fail this operation, and so will the
local backend soon.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
The utimensat() and futimens() syscalls have been around for ages (ie,
glibc 2.6 and linux 2.6.22), and the decision was already taken to
switch to utimensat() anyway when fixing CVE-2016-9602 in 2.9.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
When trying to remove a file from a directory, both created in non-mapped
mode, the file remains and EBADF is returned to the guest.
This is a regression introduced by commit "df4938a6651b 9pfs: local:
unlinkat: don't follow symlinks" when fixing CVE-2016-9602. It changed the
way we unlink the metadata file from
ret = remove("$dir/.virtfs_metadata/$name");
if (ret < 0 && errno != ENOENT) {
/* Error out */
}
/* Ignore absence of metadata */
to
fd = openat("$dir/.virtfs_metadata")
unlinkat(fd, "$name")
if (ret < 0 && errno != ENOENT) {
/* Error out */
}
/* Ignore absence of metadata */
If $dir was created in non-mapped mode, openat() fails with ENOENT and
we pass -1 to unlinkat(), which fails in turn with EBADF.
We just need to check the return of openat() and ignore ENOENT, in order
to restore the behaviour we had with remove().
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
[groug: rewrote the comments as suggested by Eric]
Only pdu_complete() needs to notify the client that a request has completed.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
These bits aren't related to the transport so let's move them to the core
code.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
This allows us to remove lots of includes of migration/migration.h
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Use the common utility function, which contains checks on return values
and first calls F_GETFD as recommended by POSIX.1-2001, instead of
manually calling fcntl.
CID: 1374831
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
CC: anthony.perard@citrix.com
CC: groug@kaod.org
CC: aneesh.kumar@linux.vnet.ibm.com
CC: Eric Blake <eblake@redhat.com>
When using the mapped-file security mode, we shouldn't let the client mess
with the metadata. The current code already tries to hide the metadata dir
from the client by skipping it in local_readdir(). But the client can still
access or modify it through several other operations. This can be used to
escalate privileges in the guest.
Affected backend operations are:
- local_mknod()
- local_mkdir()
- local_open2()
- local_symlink()
- local_link()
- local_unlinkat()
- local_renameat()
- local_rename()
- local_name_to_path()
Other operations are safe because they are only passed a fid path, which
is computed internally in local_name_to_path().
This patch converts all the functions listed above to fail and return
EINVAL when being passed the name of the metadata dir. This may look
like a poor choice for errno, but there's no such thing as an illegal
path name on Linux and I could not think of anything better.
This fixes CVE-2017-7493.
Reported-by: Leo Gaspard <leo@gaspard.io>
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJY/5EdAAoJEIlPj0hw4a6QHj8P/1D3iZq8vKyLvnkioYC00ao8
dhuCFV1Dk0dl/QXvDXxNAHFqmHp2PzHeHF2V19bTbUh9QnY3WH9aSsMQ7YVPDzLb
6ZezbxXd1l8+IOrWh+ch+x6jiUpRAeHGjhSpFxQEmH5THBmPtLHBFhAeeGpVq6CT
MPFlN0mr1snyMMbK2aKCVTHgh5Wip83/t/kijIq4RmPwXdJbwxx55PL8jxC/NZHq
/NbAZzAT149fmItfBNnsRTfNoDs7fSmgM9VelYsnJNHs6qkYYfewxys4vudePG8n
ZcjCXMv9vdCSTfXfYY9nxhfGCgkkRSvBWrzARRIUPxTXGAmEU52LJrccpG9AEFRZ
Kgz1JYr0y+u/g+RsCJvAglvmciawXgZH8GIR4sFl2iv4u6cx8PxNRgjTdt7YX4Je
V7+scmOLB0U5wkI7PUcPV1v2fwKJHFfMdyik257otfMCJiTCnWGJfCZGR4JAdy3C
NHuG4eIj2XLjZ7IQIo3mg+EfdCWdfVMKtXj0MAFsP6Kcr6sY/ef5sx/wB61k3NU1
Y4ctI/LAOONsjlC2yzJ4aZR4LgyFcVcwx8FKOK7niCcCgVLYGWpyiHU3kFKYlmKM
FKIlGPPOK8WuRV9NUGDI9A8XENyFXGyiR2xGCotfgHj+FSSqRzhKH4ecfgNimJVY
wjTzZmBa68pLHdXaJ+SV
=OfeB
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/sstabellini/tags/xen-20170421-v2-tag' into staging
Xen 2017/04/21 + fix
# gpg: Signature made Tue 25 Apr 2017 19:10:37 BST
# gpg: using RSA key 0x894F8F4870E1AE90
# gpg: Good signature from "Stefano Stabellini <stefano.stabellini@eu.citrix.com>"
# gpg: aka "Stefano Stabellini <sstabellini@kernel.org>"
# Primary key fingerprint: D04E 33AB A51F 67BA 07D3 0AEA 894F 8F48 70E1 AE90
* remotes/sstabellini/tags/xen-20170421-v2-tag: (21 commits)
move xen-mapcache.c to hw/i386/xen/
move xen-hvm.c to hw/i386/xen/
move xen-common.c to hw/xen/
add xen-9p-backend to MAINTAINERS under Xen
xen/9pfs: build and register Xen 9pfs backend
xen/9pfs: send responses back to the frontend
xen/9pfs: implement in/out_iov_from_pdu and vmarshal/vunmarshal
xen/9pfs: receive requests from the frontend
xen/9pfs: connect to the frontend
xen/9pfs: introduce Xen 9pfs backend
9p: introduce a type for the 9p header
xen: import ring.h from xen
configure: use pkg-config for obtaining xen version
xen: additionally restrict xenforeignmemory operations
xen: use libxendevice model to restrict operations
xen: use 5 digit xen versions
xen: use libxendevicemodel when available
configure: detect presence of libxendevicemodel
xen: create wrappers for all other uses of xc_hvm_XXX() functions
xen: rename xen_modified_memory() to xen_hvm_modified_memory()
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Once a request is completed, xen_9pfs_push_and_notify gets called. In
xen_9pfs_push_and_notify, update the indexes (data has already been
copied to the sg by the common code) and send a notification to the
frontend.
Schedule the bottom-half to check if we already have any other requests
pending.
Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
CC: anthony.perard@citrix.com
CC: jgross@suse.com
CC: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
CC: Greg Kurz <groug@kaod.org>
Implement xen_9pfs_init_in/out_iov_from_pdu and
xen_9pfs_pdu_vmarshal/vunmarshall by creating new sg pointing to the
data on the ring.
This is safe as we only handle one request per ring at any given time.
Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
CC: anthony.perard@citrix.com
CC: jgross@suse.com
CC: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
CC: Greg Kurz <groug@kaod.org>
Upon receiving an event channel notification from the frontend, schedule
the bottom half. From the bottom half, read one request from the ring,
create a pdu and call pdu_submit to handle it.
For now, only handle one request per ring at a time.
Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
CC: anthony.perard@citrix.com
CC: jgross@suse.com
CC: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
CC: Greg Kurz <groug@kaod.org>
Write the limits of the backend to xenstore. Connect to the frontend.
Upon connection, allocate the rings according to the protocol
specification.
Initialize a QEMUBH to schedule work upon receiving an event channel
notification from the frontend.
Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
CC: anthony.perard@citrix.com
CC: jgross@suse.com
CC: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
CC: Greg Kurz <groug@kaod.org>
Introduce the Xen 9pfs backend: add struct XenDevOps to register as a
Xen backend and add struct V9fsTransport to register as v9fs transport.
All functions are empty stubs for now.
Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
CC: anthony.perard@citrix.com
CC: jgross@suse.com
CC: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
CC: Greg Kurz <groug@kaod.org>
Use the new type in virtio-9p-device.
Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
CC: anthony.perard@citrix.com
CC: jgross@suse.com
CC: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
CC: Greg Kurz <groug@kaod.org>
The local backend was recently converted to using "at*()" syscalls in order
to ensure all accesses happen below the shared directory. This requires that
we only pass relative paths, otherwise the dirfd argument to the "at*()"
syscalls is ignored and the path is treated as an absolute path in the host.
This is actually the case for paths in all fids, with the notable exception
of the root fid, whose path is "/". This causes the following backend ops to
act on the "/" directory of the host instead of the virtfs shared directory
when the export root is involved:
- lstat
- chmod
- chown
- utimensat
ie, chmod /9p_mount_point in the guest will be converted to chmod / in the
host for example. This could cause security issues with a privileged QEMU.
All "*at()" syscalls are being passed an open file descriptor. In the case
of the export root, this file descriptor points to the path in the host that
was passed to -fsdev.
The fix is thus as simple as changing the path of the export root fid to be
"." instead of "/".
This is CVE-2017-7471.
Cc: qemu-stable@nongnu.org
Reported-by: Léo Gaspard <leo@gaspard.io>
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The migration blocker survives a device reset: if the guest mounts a 9p
share and then gets rebooted with system_reset, it will be unmigratable
until it remounts and umounts the 9p share again.
This happens because the migration blocker is supposed to be cleared when
we put the last reference on the root fid, but virtfs_reset() wrongly calls
free_fid() instead of put_fid().
This patch fixes virtfs_reset() so that it honor the way fids are supposed
to be manipulated: first get a reference and later put it back when you're
done.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Li Qiang <liqiang6-s@360.cn>
If a client tries to flush the same outstanding request several times, only
the first flush completes. Subsequent ones keep waiting for the request
completion in v9fs_flush() and, therefore, leak a PDU. This will cause QEMU
to hang when draining active PDUs the next time the device is reset.
Let have each flush request wake up the next one if any. The last waiter
frees the cancelled PDU.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
The v9fs_create() and v9fs_lcreate() functions are used to create a file
on the backend and to associate it to a fid. The fid shouldn't be already
in-use, otherwise both functions may silently leak a file descriptor or
allocated memory. The current code doesn't check that.
This patch ensures that the fid isn't already associated to anything
before using it.
Signed-off-by: Li Qiang <liqiang6-s@360.cn>
(reworded the changelog, Greg Kurz)
Signed-off-by: Greg Kurz <groug@kaod.org>
Replies from the virtfs proxy are made up of a fixed-size header (8 bytes)
and a payload of variable size (maximum 64kb). When receiving a reply,
the proxy backend first reads the whole header and then unmarshals it.
If the header is okay, it then does the same operation with the payload.
Since the proxy backend uses a pre-allocated buffer which has enough room
for a header and the maximum payload size, marshalling should never fail
with fixed size arguments. Any error here is likely to result from a more
serious corruption in QEMU and we'd better dump core right away.
This patch adds error checks where they are missing and converts the
associated error paths into assertions.
This should also address Coverity's complaints CID 1348519 and CID 1348520,
about not always checking the return value of proxy_unmarshal().
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
According to the 9P spec [*], when a client wants to cancel a pending I/O
request identified by a given tag (uint16), it must send a Tflush message
and wait for the server to respond with a Rflush message before reusing this
tag for another I/O. The server may still send a completion message for the
I/O if it wasn't actually cancelled but the Rflush message must arrive after
that.
QEMU hence waits for the flushed PDU to complete before sending the Rflush
message back to the client.
If a client sends 'Tflush tag oldtag' and tag == oldtag, QEMU will then
allocate a PDU identified by tag, find it in the PDU list and wait for
this same PDU to complete... i.e. wait for a completion that will never
happen. This causes a tag and ring slot leak in the guest, and a PDU
leak in QEMU, all of them limited by the maximal number of PDUs (128).
But, worse, this causes QEMU to hang on device reset since v9fs_reset()
wants to drain all pending I/O.
This insane behavior is likely to denote a bug in the client, and it would
deserve an Rerror message to be sent back. Unfortunately, the protocol
allows it and requires all flush requests to suceed (only a Tflush response
is expected).
The only option is to detect when we have to handle a self-referencing
flush request and report success to the client right away.
[*] http://man.cat-v.org/plan_9/5/flush
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Greg Kurz <groug@kaod.org>
We should pass O_NOFOLLOW otherwise openat() will follow symlinks and make
QEMU vulnerable.
While here, we also fix local_unlinkat_common() to use openat_dir() for
the same reasons (it was a leftover in the original patchset actually).
This fixes CVE-2016-9602.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
When O_PATH is used with O_DIRECTORY, it only acts as an optimization: the
openat() syscall simply finds the name in the VFS, and doesn't trigger the
underlying filesystem.
On systems that don't define O_PATH, because they have glibc version 2.13
or older for example, we can safely omit it. We don't want to deactivate
O_PATH globally though, in case it is used without O_DIRECTORY. The is done
with a dedicated macro.
Systems without O_PATH may thus fail to resolve names that involve
unreadable directories, compared to newer systems succeeding, but such
corner case failure is our only option on those older systems to avoid
the security hole of chasing symlinks inappropriately.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
(added last paragraph to changelog as suggested by Eric Blake)
Signed-off-by: Greg Kurz <groug@kaod.org>
The name argument can never be an empty string, and dirfd always point to
the containing directory of the file name. AT_EMPTY_PATH is hence useless
here. Also it breaks build with glibc version 2.13 and older.
It is actually an oversight of a previous tentative patch to implement this
function. We can safely drop it.
Reported-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: Greg Kurz <groug@kaod.org>
Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Eric Blake <eblake@redhat.com>
If we cannot open the given path, we can return right away instead of
passing -1 to fstatfs() and close(). This will make Coverity happy.
(Coverity issue CID1371729)
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Daniel P. berrange <berrange@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Coverity issue CID1371731
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
This was spotted by Coverity as a fd leak. This is certainly true, but also
local_remove() would always return without doing anything, unless the fd is
zero, which is very unlikely.
(Coverity issue CID1371732)
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Now that the all callbacks have been converted to use "at" syscalls, we
can drop this code.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
The local_open2() callback is vulnerable to symlink attacks because it
calls:
(1) open() which follows symbolic links for all path elements but the
rightmost one
(2) local_set_xattr()->setxattr() which follows symbolic links for all
path elements
(3) local_set_mapped_file_attr() which calls in turn local_fopen() and
mkdir(), both functions following symbolic links for all path
elements but the rightmost one
(4) local_post_create_passthrough() which calls in turn lchown() and
chmod(), both functions also following symbolic links
This patch converts local_open2() to rely on opendir_nofollow() and
mkdirat() to fix (1), as well as local_set_xattrat(),
local_set_mapped_file_attrat() and local_set_cred_passthrough() to
fix (2), (3) and (4) respectively. Since local_open2() already opens
a descriptor to the target file, local_set_cred_passthrough() is
modified to reuse it instead of opening a new one.
The mapped and mapped-file security modes are supposed to be identical,
except for the place where credentials and file modes are stored. While
here, we also make that explicit by sharing the call to openat().
This partly fixes CVE-2016-9602.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
The local_mkdir() callback is vulnerable to symlink attacks because it
calls:
(1) mkdir() which follows symbolic links for all path elements but the
rightmost one
(2) local_set_xattr()->setxattr() which follows symbolic links for all
path elements
(3) local_set_mapped_file_attr() which calls in turn local_fopen() and
mkdir(), both functions following symbolic links for all path
elements but the rightmost one
(4) local_post_create_passthrough() which calls in turn lchown() and
chmod(), both functions also following symbolic links
This patch converts local_mkdir() to rely on opendir_nofollow() and
mkdirat() to fix (1), as well as local_set_xattrat(),
local_set_mapped_file_attrat() and local_set_cred_passthrough() to
fix (2), (3) and (4) respectively.
The mapped and mapped-file security modes are supposed to be identical,
except for the place where credentials and file modes are stored. While
here, we also make that explicit by sharing the call to mkdirat().
This partly fixes CVE-2016-9602.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
The local_mknod() callback is vulnerable to symlink attacks because it
calls:
(1) mknod() which follows symbolic links for all path elements but the
rightmost one
(2) local_set_xattr()->setxattr() which follows symbolic links for all
path elements
(3) local_set_mapped_file_attr() which calls in turn local_fopen() and
mkdir(), both functions following symbolic links for all path
elements but the rightmost one
(4) local_post_create_passthrough() which calls in turn lchown() and
chmod(), both functions also following symbolic links
This patch converts local_mknod() to rely on opendir_nofollow() and
mknodat() to fix (1), as well as local_set_xattrat() and
local_set_mapped_file_attrat() to fix (2) and (3) respectively.
A new local_set_cred_passthrough() helper based on fchownat() and
fchmodat_nofollow() is introduced as a replacement to
local_post_create_passthrough() to fix (4).
The mapped and mapped-file security modes are supposed to be identical,
except for the place where credentials and file modes are stored. While
here, we also make that explicit by sharing the call to mknodat().
This partly fixes CVE-2016-9602.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
The local_symlink() callback is vulnerable to symlink attacks because it
calls:
(1) symlink() which follows symbolic links for all path elements but the
rightmost one
(2) open(O_NOFOLLOW) which follows symbolic links for all path elements but
the rightmost one
(3) local_set_xattr()->setxattr() which follows symbolic links for all
path elements
(4) local_set_mapped_file_attr() which calls in turn local_fopen() and
mkdir(), both functions following symbolic links for all path
elements but the rightmost one
This patch converts local_symlink() to rely on opendir_nofollow() and
symlinkat() to fix (1), openat(O_NOFOLLOW) to fix (2), as well as
local_set_xattrat() and local_set_mapped_file_attrat() to fix (3) and
(4) respectively.
This partly fixes CVE-2016-9602.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
The local_chown() callback is vulnerable to symlink attacks because it
calls:
(1) lchown() which follows symbolic links for all path elements but the
rightmost one
(2) local_set_xattr()->setxattr() which follows symbolic links for all
path elements
(3) local_set_mapped_file_attr() which calls in turn local_fopen() and
mkdir(), both functions following symbolic links for all path
elements but the rightmost one
This patch converts local_chown() to rely on open_nofollow() and
fchownat() to fix (1), as well as local_set_xattrat() and
local_set_mapped_file_attrat() to fix (2) and (3) respectively.
This partly fixes CVE-2016-9602.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
The local_chmod() callback is vulnerable to symlink attacks because it
calls:
(1) chmod() which follows symbolic links for all path elements
(2) local_set_xattr()->setxattr() which follows symbolic links for all
path elements
(3) local_set_mapped_file_attr() which calls in turn local_fopen() and
mkdir(), both functions following symbolic links for all path
elements but the rightmost one
We would need fchmodat() to implement AT_SYMLINK_NOFOLLOW to fix (1). This
isn't the case on linux unfortunately: the kernel doesn't even have a flags
argument to the syscall :-\ It is impossible to fix it in userspace in
a race-free manner. This patch hence converts local_chmod() to rely on
open_nofollow() and fchmod(). This fixes the vulnerability but introduces
a limitation: the target file must readable and/or writable for the call
to openat() to succeed.
It introduces a local_set_xattrat() replacement to local_set_xattr()
based on fsetxattrat() to fix (2), and a local_set_mapped_file_attrat()
replacement to local_set_mapped_file_attr() based on local_fopenat()
and mkdirat() to fix (3). No effort is made to factor out code because
both local_set_xattr() and local_set_mapped_file_attr() will be dropped
when all users have been converted to use the "at" versions.
This partly fixes CVE-2016-9602.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
The local_link() callback is vulnerable to symlink attacks because it calls:
(1) link() which follows symbolic links for all path elements but the
rightmost one
(2) local_create_mapped_attr_dir()->mkdir() which follows symbolic links
for all path elements but the rightmost one
This patch converts local_link() to rely on opendir_nofollow() and linkat()
to fix (1), mkdirat() to fix (2).
This partly fixes CVE-2016-9602.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
When using the mapped-file security model, we also have to create a link
for the metadata file if it exists. In case of failure, we should rollback.
That's what this patch does.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
The local_rename() callback is vulnerable to symlink attacks because it
uses rename() which follows symbolic links in all path elements but the
rightmost one.
This patch simply transforms local_rename() into a wrapper around
local_renameat() which is symlink-attack safe.
This partly fixes CVE-2016-9602.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
The local_renameat() callback is currently a wrapper around local_rename()
which is vulnerable to symlink attacks.
This patch rewrites local_renameat() to have its own implementation, based
on local_opendir_nofollow() and renameat().
This partly fixes CVE-2016-9602.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
The local_lstat() callback is vulnerable to symlink attacks because it
calls:
(1) lstat() which follows symbolic links in all path elements but the
rightmost one
(2) getxattr() which follows symbolic links in all path elements
(3) local_mapped_file_attr()->local_fopen()->openat(O_NOFOLLOW) which
follows symbolic links in all path elements but the rightmost
one
This patch converts local_lstat() to rely on opendir_nofollow() and
fstatat(AT_SYMLINK_NOFOLLOW) to fix (1), fgetxattrat_nofollow() to
fix (2).
A new local_fopenat() helper is introduced as a replacement to
local_fopen() to fix (3). No effort is made to factor out code
because local_fopen() will be dropped when all users have been
converted to call local_fopenat().
This partly fixes CVE-2016-9602.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
The local_readlink() callback is vulnerable to symlink attacks because it
calls:
(1) open(O_NOFOLLOW) which follows symbolic links for all path elements but
the rightmost one
(2) readlink() which follows symbolic links for all path elements but the
rightmost one
This patch converts local_readlink() to rely on open_nofollow() to fix (1)
and opendir_nofollow(), readlinkat() to fix (2).
This partly fixes CVE-2016-9602.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
The local_truncate() callback is vulnerable to symlink attacks because
it calls truncate() which follows symbolic links in all path elements.
This patch converts local_truncate() to rely on open_nofollow() and
ftruncate() instead.
This partly fixes CVE-2016-9602.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
The local_statfs() callback is vulnerable to symlink attacks because it
calls statfs() which follows symbolic links in all path elements.
This patch converts local_statfs() to rely on open_nofollow() and fstatfs()
instead.
This partly fixes CVE-2016-9602.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
The local_utimensat() callback is vulnerable to symlink attacks because it
calls qemu_utimens()->utimensat(AT_SYMLINK_NOFOLLOW) which follows symbolic
links in all path elements but the rightmost one or qemu_utimens()->utimes()
which follows symbolic links for all path elements.
This patch converts local_utimensat() to rely on opendir_nofollow() and
utimensat(AT_SYMLINK_NOFOLLOW) directly instead of using qemu_utimens().
It is hence assumed that the OS supports utimensat(), i.e. has glibc 2.6
or higher and linux 2.6.22 or higher, which seems reasonable nowadays.
This partly fixes CVE-2016-9602.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
The local_remove() callback is vulnerable to symlink attacks because it
calls:
(1) lstat() which follows symbolic links in all path elements but the
rightmost one
(2) remove() which follows symbolic links in all path elements but the
rightmost one
This patch converts local_remove() to rely on opendir_nofollow(),
fstatat(AT_SYMLINK_NOFOLLOW) to fix (1) and unlinkat() to fix (2).
This partly fixes CVE-2016-9602.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
The local_unlinkat() callback is vulnerable to symlink attacks because it
calls remove() which follows symbolic links in all path elements but the
rightmost one.
This patch converts local_unlinkat() to rely on opendir_nofollow() and
unlinkat() instead.
Most of the code is moved to a separate local_unlinkat_common() helper
which will be reused in a subsequent patch to fix the same issue in
local_remove().
This partly fixes CVE-2016-9602.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
The local_lremovexattr() callback is vulnerable to symlink attacks because
it calls lremovexattr() which follows symbolic links in all path elements
but the rightmost one.
This patch introduces a helper to emulate the non-existing fremovexattrat()
function: it is implemented with /proc/self/fd which provides a trusted
path that can be safely passed to lremovexattr().
local_lremovexattr() is converted to use this helper and opendir_nofollow().
This partly fixes CVE-2016-9602.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
The local_lsetxattr() callback is vulnerable to symlink attacks because
it calls lsetxattr() which follows symbolic links in all path elements but
the rightmost one.
This patch introduces a helper to emulate the non-existing fsetxattrat()
function: it is implemented with /proc/self/fd which provides a trusted
path that can be safely passed to lsetxattr().
local_lsetxattr() is converted to use this helper and opendir_nofollow().
This partly fixes CVE-2016-9602.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
The local_llistxattr() callback is vulnerable to symlink attacks because
it calls llistxattr() which follows symbolic links in all path elements but
the rightmost one.
This patch introduces a helper to emulate the non-existing flistxattrat()
function: it is implemented with /proc/self/fd which provides a trusted
path that can be safely passed to llistxattr().
local_llistxattr() is converted to use this helper and opendir_nofollow().
This partly fixes CVE-2016-9602.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
The local_lgetxattr() callback is vulnerable to symlink attacks because
it calls lgetxattr() which follows symbolic links in all path elements but
the rightmost one.
This patch introduces a helper to emulate the non-existing fgetxattrat()
function: it is implemented with /proc/self/fd which provides a trusted
path that can be safely passed to lgetxattr().
local_lgetxattr() is converted to use this helper and opendir_nofollow().
This partly fixes CVE-2016-9602.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
The local_open() and local_opendir() callbacks are vulnerable to symlink
attacks because they call:
(1) open(O_NOFOLLOW) which follows symbolic links in all path elements but
the rightmost one
(2) opendir() which follows symbolic links in all path elements
This patch converts both callbacks to use new helpers based on
openat_nofollow() to only open files and directories if they are
below the virtfs shared folder
This partly fixes CVE-2016-9602.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
This patch opens the shared folder and caches the file descriptor, so that
it can be used to do symlink-safe path walk.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
When using the passthrough security mode, symbolic links created by the
guest are actual symbolic links on the host file system.
Since the resolution of symbolic links during path walk is supposed to
occur on the client side. The server should hence never receive any path
pointing to an actual symbolic link. This isn't guaranteed by the protocol
though, and malicious code in the guest can trick the server to issue
various syscalls on paths whose one or more elements are symbolic links.
In the case of the "local" backend using the "passthrough" or "none"
security modes, the guest can directly create symbolic links to arbitrary
locations on the host (as per spec). The "mapped-xattr" and "mapped-file"
security modes are also affected to a lesser extent as they require some
help from an external entity to create actual symbolic links on the host,
i.e. another guest using "passthrough" mode for example.
The current code hence relies on O_NOFOLLOW and "l*()" variants of system
calls. Unfortunately, this only applies to the rightmost path component.
A guest could maliciously replace any component in a trusted path with a
symbolic link. This could allow any guest to escape a virtfs shared folder.
This patch introduces a variant of the openat() syscall that successively
opens each path element with O_NOFOLLOW. When passing a file descriptor
pointing to a trusted directory, one is guaranteed to be returned a
file descriptor pointing to a path which is beneath the trusted directory.
This will be used by subsequent patches to implement symlink-safe path walk
for any access to the backend.
Symbolic links aren't the only threats actually: a malicious guest could
change a path element to point to other types of file with undesirable
effects:
- a named pipe or any other thing that would cause openat() to block
- a terminal device which would become QEMU's controlling terminal
These issues can be addressed with O_NONBLOCK and O_NOCTTY.
Two helpers are introduced: one to open intermediate path elements and one
to open the rightmost path element.
Suggested-by: Jann Horn <jannh@google.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
(renamed openat_nofollow() to relative_openat_nofollow(),
assert path is relative and doesn't contain '//',
fixed side-effect in assert, Greg Kurz)
Signed-off-by: Greg Kurz <groug@kaod.org>
If these functions fail, they should not change *fs. Let's use local
variables to fix this.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
These functions are always called indirectly. It really doesn't make sense
for them to sit in a header file.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
This patchset adds the throttle support for the 9p-local driver.
For now this functionality can be enabled only through qemu cli options.
QMP interface and support to other drivers need further extensions.
To make it simple for other 9p drivers, the throttle code has been put in
separate files.
Signed-off-by: Pradeep Jagadeesh <pradeep.jagadeesh@huawei.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
(pass extra NULL CoMutex * argument to qemu_co_queue_wait(),
added options to qemu-options.hx, Greg Kurz)
Signed-off-by: Greg Kurz <groug@kaod.org>
In this case, we are marshaling an error status instead of the errno value.
Reorganize the out and out_nofid labels to look like all the other cases.
Coverity reports this because the "err = -ENOENT" and "err = -EINVAL"
assignments above are dead, overwritten by the call to pdu_marshal.
(Coverity issues CID1348512 and CID1348513)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(also open-coded the success path since locking is a nop for us, Greg Kurz)
Signed-off-by: Greg Kurz <groug@kaod.org>
All that CoQueue needs in order to become thread-safe is help
from an external mutex. Add this to the API.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20170213181244.16297-6-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
cause 9p clients to hang. Other patches are minor enhancements.
-----BEGIN PGP SIGNATURE-----
iEYEABECAAYFAliIegsACgkQAvw66wEB28LjzwCeIKbBFC/hbc43UqaNX82OGd2v
soYAn0YYXJUAykyjNEMLdhhNp+rABzNk
=1PaE
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/gkurz/tags/for-upstream' into staging
This pull request fixes a 2.9 regression and a long standing bug that can
cause 9p clients to hang. Other patches are minor enhancements.
# gpg: Signature made Wed 25 Jan 2017 10:12:27 GMT
# gpg: using DSA key 0x02FC3AEB0101DBC2
# gpg: Good signature from "Greg Kurz <groug@kaod.org>"
# gpg: aka "Greg Kurz <groug@free.fr>"
# gpg: aka "Greg Kurz <gkurz@fr.ibm.com>"
# gpg: aka "Greg Kurz <gkurz@linux.vnet.ibm.com>"
# gpg: aka "Gregory Kurz (Groug) <groug@free.fr>"
# gpg: aka "Gregory Kurz (Cimai Technology) <gkurz@cimai.com>"
# gpg: aka "Gregory Kurz (Meiosys Technology) <gkurz@meiosys.com>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 2BD4 3B44 535E C0A7 9894 DBA2 02FC 3AEB 0101 DBC2
* remotes/gkurz/tags/for-upstream:
9pfs: fix offset error in v9fs_xattr_read()
9pfs: local: trivial cosmetic fix in pwritev op
9pfs: fix off-by-one error in PDU free list
tests: virtio-9p: improve error reporting
9pfs: add missing coroutine_fn annotations
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The current code tries to copy `read_count' bytes starting at offset
`offset' from a `read_count`-sized iovec. This causes v9fs_pack() to
fail with ENOBUFS.
Since the PDU iovec is already partially filled with `offset' bytes,
let's skip them when creating `qiov_full' and have v9fs_pack() to
copy the whole of it. Moreover, this is consistent with the other
places where v9fs_init_qiov_from_pdu() is called.
This fixes commit "bcb8998fac16 9pfs: call v9fs_init_qiov_from_pdu
before v9fs_pack".
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
The server can handle MAX_REQ - 1 PDUs at a time and the virtio-9p
device has a MAX_REQ sized virtqueue. If the client manages to fill
up the virtqueue, pdu_alloc() will fail and the request won't be
processed without any notice to the client (it actually causes the
linux 9p client to hang).
This has been there since the beginning (commit 9f10751365 "virtio-9p:
Add a virtio 9p device to qemu"), but it needs an agressive workload to
run in the guest to show up.
We actually allocate MAX_REQ PDUs and I see no reason not to link them
all into the free list, so let's fix the init loop.
Reported-by: Tuomas Tynkkynen <tuomas@tuxera.com>
Suggested-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Greg Kurz <groug@kaod.org>
If a migration is already in progress and somebody attempts
to add a migration blocker, this should rightly fail.
Add an errp parameter and a retcode return value to migrate_add_blocker.
Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Ashijeet Acharya <ashijeetacharya@gmail.com>
Message-Id: <1484566314-3987-5-git-send-email-ashijeetacharya@gmail.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Acked-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Merged with recent 'Allow invtsc migration' change
The u16 and u32 types don't exist in QEMU common headers. It never broke
build because these two macros aren't use by the current code, but this
is about to change with the future addition of functional tests for 9P.
Also, these should have enclosing parenthesis to be usable in any
syntactical situation.
As suggested by Eric Blake, let's use UINT16_MAX and UINT32_MAX to address
both issues.
Signed-off-by: Greg Kurz <groug@kaod.org>
If the user passes -device virtio-9p without the corresponding -fsdev, QEMU
dereferences a NULL pointer and crashes.
This is a 2.8 regression introduced by commit 702dbcc274.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Li Qiang <liq3ea@gmail.com>
Not all 9pfs transports share memory between request and response. For
those who don't, it is necessary to know how much memory is required in
the response.
Split the existing init_iov_from_pdu function in two:
init_out_iov_from_pdu (for writes) and init_in_iov_from_pdu (for reads).
init_in_iov_from_pdu takes an additional size parameter to specify the
memory required for the response message.
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Greg Kurz <groug@kaod.org>
v9fs_xattr_read should not access VirtQueueElement elems directly.
Move v9fs_init_qiov_from_pdu up in the file and call
v9fs_init_qiov_from_pdu before v9fs_pack. Use v9fs_pack on the new
iovec.
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Greg Kurz <groug@kaod.org>
Don't call virtio functions from 9pfs generic code, use generic function
callbacks instead.
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Greg Kurz <groug@kaod.org>
pdus are initialized and used in 9pfs common code. Move the array from
V9fsVirtioState to V9fsState.
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Greg Kurz <groug@kaod.org>
In the init operation of proxy backend dirver, it allocates a
V9fsProxy struct and some other resources. We should free these
resources when the 9pfs device is unrealized. This is what this
patch does.
Signed-off-by: Li Qiang <liq3ea@gmail.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Greg Kurz <groug@kaod.org>
In the init operation of handle backend dirver, it allocates a
handle_data struct and opens a mount file. We should free these
resources when the 9pfs device is unrealized. This is what this
patch does.
Signed-off-by: Li Qiang <liq3ea@gmail.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Greg Kurz <groug@kaod.org>
Currently, the backend of VirtFS doesn't have a cleanup
function. This will lead resource leak issues if the backed
driver allocates resources. This patch addresses this issue.
Signed-off-by: Li Qiang <liq3ea@gmail.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Greg Kurz <groug@kaod.org>
Unrealize should undo things that were set during realize in
reverse order. So should do in the error path in realize.
Signed-off-by: Li Qiang <liq3ea@gmail.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Greg Kurz <groug@kaod.org>
The virtfs_reset() function is called either when the virtio-9p device
gets reset, or when the client starts a new 9P session. In both cases,
if it finds fids from a previous session, the following is printed in
the monitor:
9pfs:virtfs_reset: One or more uncluncked fids found during reset
For example, if a linux guest with a mounted 9P share is reset from the
monitor with system_reset, the message will be printed. This is excessive
since these fids are now clunked and the state is clean.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
A buggy or malicious guest could pass the id of an already opened fid and
cause QEMU to abort. Let's return EINVAL to the guest instead.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
The xattrcreate operation only makes sense on a freshly cloned fid
actually, since any open state would be leaked because of the fid_type
change. This is indeed what the linux kernel client does:
fid = clone_fid(fid);
[...]
retval = p9_client_xattrcreate(fid, name, value_len, flags);
This patch also reverts commit ff55e94d23 since we are sure that a fid
with type P9_FID_NONE doesn't have a previously allocated xattr.
Signed-off-by: Greg Kurz <groug@kaod.org>
We shouldn't allow guests to create extended attribute with arbitrary sizes.
On linux hosts, the limit is XATTR_SIZE_MAX. Let's use it.
Signed-off-by: Greg Kurz <groug@kaod.org>
The v9fs_xattr_read() and v9fs_xattr_write() are passed a guest
originated offset: they must ensure this offset does not go beyond
the size of the extended attribute that was set in v9fs_xattrcreate().
Unfortunately, the current code implement these checks with unsafe
calculations on 32 and 64 bit values, which may allow a malicious
guest to cause OOB access anyway.
Fix this by comparing the offset and the xattr size, which are
both uint64_t, before trying to compute the effective number of bytes
to read or write.
Suggested-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Li Qiang <liqiang6-s@360.cn>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-By: Guido Günther <agx@sigxcpu.org>
Signed-off-by: Greg Kurz <groug@kaod.org>
The 'len' in V9fsXattr comes from the 'size' argument in setxattr()
function in guest. The setxattr() function's declaration is this:
int setxattr(const char *path, const char *name,
const void *value, size_t size, int flags);
and 'size' is treated as u64 in linux kernel client code:
int p9_client_xattrcreate(struct p9_fid *fid, const char *name,
u64 attr_size, int flags)
So the 'len' should have an type of 'uint64_t'.
The 'copied_len' in V9fsXattr is used to account for copied bytes, it
should also have an type of 'uint64_t'.
Suggested-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Li Qiang <liqiang6-s@360.cn>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Greg Kurz <groug@kaod.org>
Currently, 9pfs sets the 'copied_len' field in V9fsXattr
to -1 to tag xattr walk fid. As the 'copied_len' is also
used to account for copied bytes, this may make confusion. This patch
add a bool 'xattrwalk_fid' to tag the xattr walk fid.
Suggested-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Li Qiang <liqiang6-s@360.cn>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Greg Kurz <groug@kaod.org>
If an error occurs when marshalling the transfer length to the guest, the
v9fs_write() function doesn't free an IO vector, thus leading to a memory
leak. This patch fixes the issue.
Signed-off-by: Li Qiang <liqiang6-s@360.cn>
Reviewed-by: Greg Kurz <groug@kaod.org>
[groug, rephrased the changelog]
Signed-off-by: Greg Kurz <groug@kaod.org>
The v9fs_link() function keeps a reference on the source fid object. This
causes a memory leak since the reference never goes down to 0. This patch
fixes the issue.
Signed-off-by: Li Qiang <liqiang6-s@360.cn>
Reviewed-by: Greg Kurz <groug@kaod.org>
[groug, rephrased the changelog]
Signed-off-by: Greg Kurz <groug@kaod.org>
The 'fs.xattr.value' field in V9fsFidState object doesn't consider the
situation that this field has been allocated previously. Every time, it
will be allocated directly. This leads to a host memory leak issue if
the client sends another Txattrcreate message with the same fid number
before the fid from the previous time got clunked.
Signed-off-by: Li Qiang <liqiang6-s@360.cn>
Reviewed-by: Greg Kurz <groug@kaod.org>
[groug, updated the changelog to indicate how the leak can occur]
Signed-off-by: Greg Kurz <groug@kaod.org>
9pfs uses g_malloc() to allocate the xattr memory space, if the guest
reads this memory before writing to it, this will leak host heap memory
to the guest. This patch avoid this.
Signed-off-by: Li Qiang <liqiang6-s@360.cn>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Greg Kurz <groug@kaod.org>
Virtio devices should implement the VirtIODevice->reset() function to
perform necessary cleanup actions and to bring the device to a quiescent
state.
In the case of the virtio-9p device, this means:
- emptying the list of active PDUs (i.e. draining all in-flight I/O)
- freeing all fids (i.e. close open file descriptors and free memory)
That's what this patch does.
The reset handler first waits for all active PDUs to complete. Since
completion happens in the QEMU global aio context, we just have to
loop around aio_poll() until the active list is empty.
The freeing part involves some actions to be performed on the backend,
like closing file descriptors or flushing extended attributes to the
underlying filesystem. The virtfs_reset() function already does the
job: it calls free_fid() for all open fids not involved in an ongoing
I/O operation. We are sure this is the case since we have drained
the PDU active list.
The current code implements all backend accesses with coroutines, but we
want to stay synchronous on the reset path. We can either change the
current code to be able to run when not in coroutine context, or create
a coroutine context and wait for virtfs_reset() to complete. This patch
goes for the latter because it results in simpler code.
Note that we also need to create a dummy PDU because it is also an API
to pass the FsContext pointer to all backend callbacks.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
If a PDU has a flush request pending, the current code calls pdu_free()
twice:
1) pdu_complete()->pdu_free() with pdu->cancelled set, which does nothing
2) v9fs_flush()->pdu_free() with pdu->cancelled cleared, which moves the
PDU back to the free list.
This works but it complexifies the logic of pdu_free().
With this patch, pdu_complete() only calls pdu_free() if no flush request
is pending, i.e. qemu_co_queue_next() returns false.
Since pdu_free() is now supposed to be called with pdu->cancelled cleared,
the check in pdu_free() is dropped and replaced by an assertion.
Signed-off-by: Greg Kurz <groug@kaod.org>
All these functions either call the v9fs_co_* functions which have the
coroutine_fn annotation, or pdu_complete() which calls qemu_co_queue_next().
Let's mark them to make it obvious they execute in coroutine context.
Signed-off-by: Greg Kurz <groug@kaod.org>
All these functions use the v9fs_co_run_in_worker() macro, and thus always
call qemu_coroutine_self() and qemu_coroutine_yield().
Let's mark them to make it obvious they execute in coroutine context.
Signed-off-by: Greg Kurz <groug@kaod.org>
In 9pfs read dispatch function, it doesn't free two QEMUIOVector
object thus causing potential memory leak. This patch avoid this.
Signed-off-by: Li Qiang <liqiang6-s@360.cn>
Signed-off-by: Greg Kurz <groug@kaod.org>
If a guest sends an empty string paramater to any 9P operation, the current
code unmarshals it into a V9fsString equal to { .size = 0, .data = NULL }.
This is unfortunate because it can cause NULL pointer dereference to happen
at various locations in the 9pfs code. And we don't want to check str->data
everywhere we pass it to strcmp() or any other function which expects a
dereferenceable pointer.
This patch enforces the allocation of genuine C empty strings instead, so
callers don't have to bother.
Out of all v9fs_iov_vunmarshal() users, only v9fs_xattrwalk() checks if
the returned string is empty. It now uses v9fs_string_size() since
name.data cannot be NULL anymore.
Signed-off-by: Li Qiang <liqiang6-s@360.cn>
[groug, rewritten title and changelog,
fix empty string check in v9fs_xattrwalk()]
Signed-off-by: Greg Kurz <groug@kaod.org>
Now all the usages of the old version of VMSTATE_VIRTIO_DEVICE are gone,
so we can get rid of the conditionals, and the old macro.
Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Use the new VMSTATE_VIRTIO_DEVICE macro.
Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
A broken guest may send a request without providing buffers for the reply
or for the request itself, and virtqueue_pop() will return an element with
either in_num == 0 or out_num == 0.
All 9P requests are expected to start with the following 7-byte header:
uint32_t size_le;
uint8_t id;
uint16_t tag_le;
If iov_to_buf() fails to return these 7 bytes, then something is wrong in
the guest.
In both cases, it is wrong to crash QEMU, since the root cause lies in the
guest.
This patch hence does the following:
- keep the check of in_num since pdu_complete() assumes it has enough
space to store the reply and we will send something broken to the guest
- let iov_to_buf() handle out_num == 0, since it will return 0 just like
if the guest had provided an zero-sized buffer.
- call virtio_error() to inform the guest that the device is now broken,
instead of aborting
- detach the request from the virtqueue and free it
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
If the call to fid_to_qid() returns an error, we will call v9fs_path_free()
on uninitialized paths.
It is a regression introduced by the following commit:
56f101ecce 9pfs: handle walk of ".." in the root directory
Let's fix this by initializing dpath and path before calling fid_to_qid().
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
[groug: updated the changelog to indicate this is regression and to provide
the offending commit SHA1]
Signed-off-by: Greg Kurz <groug@kaod.org>
This helper is similar to v9fs_string_sprintf(), but it includes the
terminating NUL character in the size field.
This is to avoid doing v9fs_string_sprintf((V9fsString *) &path) and
then bumping the size.
Affected users are changed to use this new helper.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
The v9fs_string_null() function just calls v9fs_string_free(). Also it
only has 4 users, whereas v9fs_string_free() has 87.
This patch converts users to call directly v9fs_string_free() and drops
the useless function.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
This double free did not cause harm because v9fs_string_free() sets
str->data to NULL and g_free(NULL) is valid.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
The v9fs_request() function doesn't use its fmt argument: it passes literal
format strings to proxy_marshal() for all commands.
This patch simply drops the unused fmt argument and updates all callers
accordingly.
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
The 9P spec at http://man.cat-v.org/plan_9/5/intro says:
All directories must support walks to the directory .. (dot-dot) meaning
parent directory, although by convention directories contain no explicit
entry for .. or . (dot). The parent of the root directory of a server's
tree is itself.
This means that a client cannot walk further than the root directory
exported by the server. In other words, if the client wants to walk
"/.." or "/foo/../..", the server should answer like the request was
to walk "/".
This patch just does that:
- we cache the QID of the root directory at attach time
- during the walk we compare the QID of each path component with the root
QID to detect if we're in a "/.." situation
- if so, we skip the current component and go to the next one
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
According to the 9P spec http://man.cat-v.org/plan_9/5/open about the
create request:
The names . and .. are special; it is illegal to create files with these
names.
This patch causes the create and lcreate requests to fail with EINVAL if
the file name is either "." or "..".
Even if it isn't explicitly written in the spec, this patch extends the
checking to all requests that may cause a directory entry to be created:
- mknod
- rename
- renameat
- mkdir
- link
- symlink
The unlinkat request also gets patched for consistency (even if
rmdir("foo/..") is expected to fail according to POSIX.1-2001).
The various error values come from the linux manual pages.
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Empty path components don't make sense for most commands and may cause
undefined behavior, depending on the backend.
Also, the walk request described in the 9P spec [1] clearly shows that
the client is supposed to send individual path components: the official
linux client never sends portions of path containing the / character for
example.
Moreover, the 9P spec [2] also states that a system can decide to restrict
the set of supported characters used in path components, with an explicit
mention "to remove slashes from name components".
This patch introduces a new name_is_illegal() helper that checks the
names sent by the client are not empty and don't contain unwanted chars.
Since 9pfs is only supported on linux hosts, only the / character is
checked at the moment. When support for other hosts (AKA. win32) is added,
other chars may need to be blacklisted as well.
If a client sends an illegal path component, the request will fail and
ENOENT is returned to the client.
[1] http://man.cat-v.org/plan_9/5/walk
[2] http://man.cat-v.org/plan_9/5/intro
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Forcibly convert it to a vmstate wrapper; proper conversion
comes later.
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
In practice the entry argument is always known at creation time, and
it is confusing that sometimes qemu_coroutine_enter is used with a
non-NULL argument to re-enter a coroutine (this happens in
block/sheepdog.c and tests/test-coroutine.c). So pass the opaque value
at creation time, for consistency with e.g. aio_bh_new.
Mostly done with the following semantic patch:
@ entry1 @
expression entry, arg, co;
@@
- co = qemu_coroutine_create(entry);
+ co = qemu_coroutine_create(entry, arg);
...
- qemu_coroutine_enter(co, arg);
+ qemu_coroutine_enter(co);
@ entry2 @
expression entry, arg;
identifier co;
@@
- Coroutine *co = qemu_coroutine_create(entry);
+ Coroutine *co = qemu_coroutine_create(entry, arg);
...
- qemu_coroutine_enter(co, arg);
+ qemu_coroutine_enter(co);
@ entry3 @
expression entry, arg;
@@
- qemu_coroutine_enter(qemu_coroutine_create(entry), arg);
+ qemu_coroutine_enter(qemu_coroutine_create(entry, arg));
@ reentry @
expression co;
@@
- qemu_coroutine_enter(co, NULL);
+ qemu_coroutine_enter(co);
except for the aforementioned few places where the semantic patch
stumbled (as expected) and for test_co_queue, which would otherwise
produce an uninitialized variable warning.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Header guard symbols should match their file name to make guard
collisions less likely. Offenders found with
scripts/clean-header-guards.pl -vn.
Cleaned up with scripts/clean-header-guards.pl, followed by some
renaming of new guard symbols picked by the script to better ones.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Move all trace-events for files in the hw/9pfs/ directory to
their own file.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 1466066426-16657-26-git-send-email-berrange@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Remove glib.h includes, as it is provided by osdep.h.
This commit was created with scripts/clean-includes.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Tested-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
This patch changes the 9p code to use readdir() again instead of
readdir_r(), which is deprecated in glibc 2.24.
All the locking was put in place by a previous patch.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
If several threads concurrently call readdir() with the same directory
stream pointer, it is possible that they all get a pointer to the same
dirent structure, whose content is overwritten each time readdir() is
called.
We must thus serialize accesses to the dirent structure.
This may be achieved with a mutex like below:
lock_mutex();
readdir();
// work with the dirent
unlock_mutex();
This patch adds all the locking, to prepare the switch to readdir().
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
If we are to switch back to readdir(), we need a more complex type than
DIR * to be able to serialize concurrent accesses to the directory stream.
This patch introduces a placeholder type and fixes all users.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Most of the 9p code is now virtio agnostic. This patch does a final cleanup:
- drop references to Virtio from the header comments
- fix includes
Also drop a couple of leading empty lines while here.
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
The "9p-attr.h" header isn't needed by 9p synth and virtio 9p.
While here, also drop last references to virtio from 9p synth since it is
now transport agnostic code.
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Commit "ebac1202c95a virtio-9p: use QEMU thread pool" dropped function
v9fs_init_worker_threads.
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Move declarations out of qemu-common.h for functions declared in
utils/ files: e.g. include/qemu/path.h for utils/path.c.
Move inline functions out of qemu-common.h and into new files (e.g.
include/qemu/bcd.h)
Signed-off-by: Veronia Bahaa <veroniabahaa@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Commit 57cb38b included qapi/error.h into qemu/osdep.h to get the
Error typedef. Since then, we've moved to include qemu/osdep.h
everywhere. Its file comment explains: "To avoid getting into
possible circular include dependencies, this file should not include
any other QEMU headers, with the exceptions of config-host.h,
compiler.h, os-posix.h and os-win32.h, all of which are doing a
similar job to this file and are under similar constraints."
qapi/error.h doesn't do a similar job, and it doesn't adhere to
similar constraints: it includes qapi-types.h. That's in excess of
100KiB of crap most .c files don't actually need.
Add the typedef to qemu/typedefs.h, and include that instead of
qapi/error.h. Include qapi/error.h in .c files that need it and don't
get it now. Include qapi-types.h in qom/object.h for uint16List.
Update scripts/clean-includes accordingly. Update it further to match
reality: replace config.h by config-target.h, add sysemu/os-posix.h,
sysemu/os-win32.h. Update the list of includes in the qemu/osdep.h
comment quoted above similarly.
This reduces the number of objects depending on qapi/error.h from "all
of them" to less than a third. Unfortunately, the number depending on
qapi-types.h shrinks only a little. More work is needed for that one.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
[Fix compilation without the spice devel packages. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Clean up includes so that osdep.h is included first and headers
which it implies are not included manually.
This commit was created with scripts/clean-includes.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
The return code of virtqueue_pop/vring_pop is unused except to check for
errors or 0. We can thus easily move allocation inside the functions
and just return a pointer to the VirtQueueElement.
The advantage is that we will be able to allocate only the space that
is needed for the actual size of the s/g list instead of the full
VIRTQUEUE_MAX_SIZE items. Currently VirtQueueElement takes about 48K
of memory, and this kind of allocation puts a lot of stress on malloc.
By cutting the size by two or three orders of magnitude, malloc can
use much more efficient algorithms.
The patch is pretty large, but changes to each device are testable
more or less independently. Splitting it would mostly add churn.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Clean up includes so that osdep.h is included first and headers
which it implies are not included manually.
This commit was created with scripts/clean-includes.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1453832250-766-18-git-send-email-peter.maydell@linaro.org
Clean up includes so that osdep.h is included first and headers
which it implies are not included manually.
This commit was created with scripts/clean-includes.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1453832250-766-15-git-send-email-peter.maydell@linaro.org
V9fsState now only contains generic fields. Introduce V9fsVirtioState
for virtio transport. Change virtio-pci and virtio-ccw to use
V9fsVirtioState.
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Now that file only contains generic code.
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
The new function resides in virtio specific file.
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Move out generic definitions from virtio-9p.h to 9p.h. Fix header
inclusions.
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Factor out v9fs_iov_v{,un}marshal. Implement pdu_{,un}marshal with those
functions.
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
This matches naming convention of pdu_marshal and pdu_unmarshal.
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
V9fsState can be referenced by pdu->s. Initialise that in device
realization function.
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
And rename v9fs_marshal to v9fs_iov_marshal, v9fs_unmarshal to
v9fs_iov_unmarshal.
The rationale behind this change is that, this marshalling interface is
used both by virtio and proxy helper. Renaming files and functions to
reflect the true nature of this interface.
Xen transport is going to have its own marshalling interface.
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Some structures in virtio-9p.h have been unused since 2011 when relevant
functions switched to use coroutines.
The declaration of pdu_packunpack and function do_pdu_unpack are
useless.
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
The deleted file only contained V9fsConf which wasn't virtio specific.
Merge that to the general header of 9pfs.
Fixed header inclusions as I went along.
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
These three files are not virtio specific. Rename them to generic
names.
Fix comments and header inclusion in various files.
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
These two files are not virtio specific. Rename them to use generic
names.
Fix includes in various C files. Change define guards and comments
in header files.
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Those two files are not virtio specific. Rename them to use generic
names.
Fix includes in various C files. Change define guards and comments
in header files.
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
This file is not virtio specific. Rename it to use generic name.
Fix comment and remove unneeded inclusion of virtio.h.
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
This file is not virtio specific. Rename it to use generic name.
Fix comment and remove unneeded inclusion of virtio.h.
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
This file is not virtio specific. Rename it to use generic name.
Fix comment and remove unneeded inclusion of virtio.h.
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Those two files are not virtio specific. Rename them to use generic
names.
Fix includes in various C files. Change define guards and comments in
header files.
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
The aio_context_new() function does not allocate a thread pool. This is
deferred to the first call to the aio_get_thread_pool() accessor. It is
hence forbidden to access the thread_pool field directly, as it may be
NULL. The accessor *must* be used always.
Fixes: ebac1202c9
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Tested-by: Michael Tokarev <mjt@tls.msk.ru>
Cc: qemu-stable@nongnu.org
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Since commit 4652f1640e "virtio-9p: add savevm
handlers", if the user hot-unplugs a quiescent 9p device and live
migrates, the source QEMU crashes before migration completetion...
This happens because virtio-9p devices have a realize handler which
calls virtio_init() and register_savevm(). Both calls store pointers
to the device internals, that get dereferenced during migration even
if the device got unplugged.
This patch simply adds an unrealize handler to perform minimal
cleanup and avoid the crash. Hot unplug of non-quiescent 9p devices
is still not supported in QEMU, and not supported by linux guests
either.
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-id: 20151208155457.27775.69441.stgit@bahia.huguette.org
[PMM: rewrapped long lines in commit message]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The QEMU thread pool already has a mechanism to invoke callbacks in the main
thread. It does not need an EventNotifier and it is more efficient too.
Use it instead of GAsyncQueue + GThreadPool + glue.
As a side effect, it silences Coverity's complaint about an unchecked
return value for event_notifier_init.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
(removed no more needed #include <glib.h> from virtio-9p-coth.h)
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
We don't support migration of mounted 9p shares. This is handled by a
migration blocker.
One would expect, however, to be able to migrate if the share is unmounted.
Unfortunately virtio-9p-device does not register savevm handlers at all !
Migration succeeds and leaves the guest with a dangling device...
This patch simply registers migration handlers for virtio-9p-device. Whether
migration is possible or not still depends on the migration blocker.
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The coroutine files are currently referenced by the block-obj-y
variable. The coroutine functionality though is already used by
more than just the block code. eg migration code uses coroutine
yield. In the future the I/O channel code will also use the
coroutine yield functionality. Since the coroutine code is nicely
self-contained it can be easily built as part of the libqemuutil.a
library, making it widely available.
The headers are also moved into include/qemu, instead of the
include/block directory, since they are now part of the util
codebase, and the impl was never in the block/ directory
either.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
virtio pci allows any device to have a modern interface,
this in turn requires ANY_LAYOUT support.
Fix up ANY_LAYOUT for virtio-9p.
Reported-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>