commit d78092e493 upstream.
After unlock_request() pages from the ap->pages[] array may be put (e.g. by
aborting the connection) and the pages can be freed.
Prevent use after free by grabbing a reference to the page before calling
unlock_request().
The original patch was created by Pradeep P V K.
Reported-by: Pradeep P V K <ppvk@codeaurora.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 933a3752ba ]
the callers rely upon having any iov_iter_truncate() done inside
->direct_IO() countered by iov_iter_reexpand().
Reported-by: Qian Cai <cai@redhat.com>
Tested-by: Qian Cai <cai@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5ddd9ced9a ]
A GETATTR request can race with FUSE_NOTIFY_INVAL_INODE, resulting in the
attribute cache being updated with stale information after the
invalidation.
Fix this by bumping the attribute version in fuse_reverse_inval_inode().
Reported-by: Krzysztof Rusek <rusek@9livesdata.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 32f98877c5 ]
page_count() is unstable. Unless there has been an RCU grace period
between when the page was removed from the page cache and now, a
speculative reference may exist from the page cache.
Reported-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 31070f6cce upstream.
The ioctl encoding for this parameter is a long but the documentation says
it should be an int and the kernel drivers expect it to be an int. If the
fuse driver treats this as a long it might end up scribbling over the stack
of a userspace process that only allocated enough space for an int.
This was previously discussed in [1] and a patch for fuse was proposed in
[2]. From what I can tell the patch in [2] was nacked in favor of adding
new, "fixed" ioctls and using those from userspace. However there is still
no "fixed" version of these ioctls and the fact is that it's sometimes
infeasible to change all userspace to use the new one.
Handling the ioctls specially in the fuse driver seems like the most
pragmatic way for fuse servers to support them without causing crashes in
userspace applications that call them.
[1]: https://lore.kernel.org/linux-fsdevel/20131126200559.GH20559@hall.aurel32.net/T/
[2]: https://sourceforge.net/p/fuse/mailman/message/31771759/
Signed-off-by: Chirantan Ekbote <chirantan@chromium.org>
Fixes: 59efec7b90 ("fuse: implement ioctl support")
Cc: <stable@vger.kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0189a2d367 upstream.
s_op->remount_fs() is only called from legacy_reconfigure(), which is not
used after being converted to the new API.
Convert to using ->reconfigure(). This restores the previous behavior of
syncing the filesystem and rejecting MS_MANDLOCK on remount.
Fixes: c30da2e981 ("fuse: convert to use the new mount API")
Cc: <stable@vger.kernel.org> # v5.4
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e8b20a474c upstream.
The command
mount -o remount -o unknownoption /mnt/fuse
succeeds on kernel versions prior to v5.4 and fails on kernel version at or
after. This is because fuse_parse_param() rejects any unrecognised options
in case of FS_CONTEXT_FOR_RECONFIGURE, just as for FS_CONTEXT_FOR_MOUNT.
This causes a regression in case the fuse filesystem is in fstab, since
remount sends all options found there to the kernel; even ones that are
meant for the initial mount and are consumed by the userspace fuse server.
Fix this by ignoring mount options, just as fuse_remount_fs() did prior to
the conversion to the new API.
Reported-by: Stefan Priebe <s.priebe@profihost.ag>
Fixes: c30da2e981 ("fuse: convert to use the new mount API")
Cc: <stable@vger.kernel.org> # v5.4
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 7779b047a5 ]
fuse_writepages() ignores some errors taken from fuse_writepages_fill() I
believe it is a bug: if .writepages is called with WB_SYNC_ALL it should
either guarantee that all data was successfully saved or return error.
Fixes: 26d614df1d ("fuse: Implement writepages callback")
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9b46418c40 ]
After the copy operation completes the cache is not up-to-date. Truncate
all pages in the interval that has successfully been copied.
Truncating completely copied dirty pages is okay, since the data has been
overwritten anyway. Truncating partially copied dirty pages is not okay;
add a comment for now.
Fixes: 88bc7d5097 ("fuse: add support for copy_file_range()")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2c4656dfd9 ]
a) Dirty cache needs to be written back not just in the writeback_cache
case, since the dirty pages may come from memory maps.
b) The fuse_writeback_range() helper takes an inclusive interval, so the
end position needs to be pos+len-1 instead of pos+len.
Fixes: 88bc7d5097 ("fuse: add support for copy_file_range()")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bb737bbe48 ]
In virtiofs (unlike in regular fuse) processing of async replies is
serialized. This can result in a deadlock in rare corner cases when
there's a circular dependency between the completion of two or more async
replies.
Such a deadlock can be reproduced with xfstests:generic/503 if TEST_DIR ==
SCRATCH_MNT (which is a misconfiguration):
- Process A is waiting for page lock in worker thread context and blocked
(virtio_fs_requests_done_work()).
- Process B is holding page lock and waiting for pending writes to
finish (fuse_wait_on_page_writeback()).
- Write requests are waiting in virtqueue and can't complete because
worker thread is blocked on page lock (process A).
Fix this by creating a unique work_struct for each async reply that can
block (O_DIRECT read).
Fixes: a62a8ef9d9 ("virtio-fs: add virtiofs filesystem")
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 3e8cb8b2ea upstream.
Normal, synchronous requests will have their args allocated on the stack.
After the FR_FINISHED bit is set by receiving the reply from the userspace
fuse server, the originating task may return and reuse the stack frame,
resulting in an Oops if the args structure is dereferenced.
Fix by setting a flag in the request itself upon initializing, indicating
whether it has an asynchronous ->end() callback.
Reported-by: Kyle Sanderson <kyle.leet@gmail.com>
Reported-by: Michael Stapelberg <michael+lkml@stapelberg.ch>
Fixes: 2b319d1f6f ("fuse: don't dereference req->args on finished request")
Cc: <stable@vger.kernel.org> # v5.4
Tested-by: Michael Stapelberg <michael+lkml@stapelberg.ch>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2f1398291b ]
Handle the special case of fuse_readpages() wanting to read the last page
of a hugest file possible and overflowing the end offset in the process.
This is basically to unbreak xfstests:generic/525 and prevent filesystems
from doing bad things with an overflowing offset.
Reported-by: Xiao Yang <ice_yangxiao@163.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit f658adeea4 upstream.
fuse_direct_io() can end up advancing the iterator by more than the amount
of data read or written. This case is handled by the generic code if going
through ->direct_IO(), but not in the FOPEN_DIRECT_IO case.
Fix by reverting the extra bytes from the iterator in case of error or a
short count.
To test: install lxcfs, then the following testcase
int fd = open("/var/lib/lxcfs/proc/uptime", O_RDONLY);
sendfile(1, fd, NULL, 16777216);
sendfile(1, fd, NULL, 16777216);
will spew WARN_ON() in iov_iter_pipe().
Reported-by: Peter Geis <pgwipeout@gmail.com>
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Fixes: 3c3db095b6 ("fuse: use iov_iter based generic splice helpers")
Cc: <stable@vger.kernel.org> # v5.1
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7df1e988c7 upstream.
Buffered read in fuse normally goes via:
-> generic_file_buffered_read()
-> fuse_readpages()
-> fuse_send_readpages()
->fuse_simple_request() [called since v5.4]
In the case of a read request, fuse_simple_request() will return a
non-negative bytecount on success or a negative error value. A positive
bytecount was taken to be an error and the PG_error flag set on the page.
This resulted in generic_file_buffered_read() falling back to ->readpage(),
which would repeat the read request and succeed. Because of the repeated
read succeeding the bug was not detected with regression tests or other use
cases.
The FTP module in GVFS however fails the second read due to the
non-seekable nature of FTP downloads.
Fix by checking and ignoring positive return value from
fuse_simple_request().
Reported-by: Ondrej Holy <oholy@redhat.com>
Link: https://gitlab.gnome.org/GNOME/gvfs/issues/441
Fixes: 134831e36b ("fuse: convert readpages to simple api")
Cc: <stable@vger.kernel.org> # v5.4
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit eb59bd17d2 upstream.
If a filesystem returns negative inode sizes, future reads on the file were
causing the cpu to spin on truncate_pagecache.
Create a helper to validate the attributes. This now does two things:
- check the file mode
- check if the file size fits in i_size without overflowing
Reported-by: Arijit Banerjee <arijit@rubrik.com>
Fixes: d8a5ba4545 ("[PATCH] FUSE - core")
Cc: <stable@vger.kernel.org> # v2.6.14
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c634da718d upstream.
When adding a new hard link, make sure that i_nlink doesn't overflow.
Fixes: ac45d61357 ("fuse: fix nlink after unlink")
Cc: <stable@vger.kernel.org> # v3.4
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f1ebdeffc6 upstream.
exit_aio() is sometimes stuck in wait_for_completion() after aio is issued
with direct IO and the task receives a signal.
The reason is failure to call ->ki_complete() due to a leaked reference to
fuse_io_priv. This happens in fuse_async_req_send() if
fuse_simple_background() returns an error (e.g. -EINTR).
In this case the error value is propagated via io->err, so return success
to not confuse callers.
This issue is tracked as a virtio-fs issue:
https://gitlab.com/virtio-fs/qemu/issues/14
Reported-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Fixes: 45ac96ed7c ("fuse: convert direct_io to simple api")
Cc: <stable@vger.kernel.org> # v5.4
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently fuse_writepages_fill() calls get_fuse_inode() few times with
the same argument.
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Make sure cached writes are not reordered around open(..., O_TRUNC), with
the obvious wrong results.
Fixes: 4d99ff8f12 ("fuse: Turn writeback cache on")
Cc: <stable@vger.kernel.org> # v3.15+
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
If writeback cache is enabled, then writes might get reordered with
chmod/chown/utimes. The problem with this is that performing the write in
the fuse daemon might itself change some of these attributes. In such case
the following sequence of operations will result in file ending up with the
wrong mode, for example:
int fd = open ("suid", O_WRONLY|O_CREAT|O_EXCL);
write (fd, "1", 1);
fchown (fd, 0, 0);
fchmod (fd, 04755);
close (fd);
This patch fixes this by flushing pending writes before performing
chown/chmod/utimes.
Reported-by: Giuseppe Scrivano <gscrivan@redhat.com>
Tested-by: Giuseppe Scrivano <gscrivan@redhat.com>
Fixes: 4d99ff8f12 ("fuse: Turn writeback cache on")
Cc: <stable@vger.kernel.org> # v3.15+
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes gcc '-Wunused-but-set-variable' warning:
fs/fuse/virtio_fs.c: In function virtio_fs_wake_pending_and_unlock:
fs/fuse/virtio_fs.c:983:20: warning: variable fc set but not used [-Wunused-but-set-variable]
It is not used since commit 7ee1e2e631 ("virtiofs: No need to check
fpq->connected state")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
If regular request queue gets full, currently we sleep for a bit and
retrying submission in submitter's context. This assumes submitter is not
holding any spin lock. But this assumption is not true for background
requests. For background requests, we are called with fc->bg_lock held.
This can lead to deadlock where one thread is trying submission with
fc->bg_lock held while request completion thread has called
fuse_request_end() which tries to acquire fc->bg_lock and gets blocked. As
request completion thread gets blocked, it does not make further progress
and that means queue does not get empty and submitter can't submit more
requests.
To solve this issue, retry submission with the help of a worker, instead of
retrying in submitter's context. We already do this for hiprio/forget
requests.
Reported-by: Chirantan Ekbote <chirantan@chromium.org>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
If virtqueue is full, we put forget requests on a list and these forgets
are dispatched later using a worker. As of now we don't count these forgets
in fsvq->in_flight variable. This means when queue is being drained, we
have to have special logic to first drain these pending requests and then
wait for fsvq->in_flight to go to zero.
By counting pending forgets in fsvq->in_flight, we can get rid of special
logic and just wait for in_flight to go to zero. Worker thread will kick
and drain all the forgets anyway, leading in_flight to zero.
I also need similar logic for normal request queue in next patch where I am
about to defer request submission in the worker context if queue is full.
This simplifies the code a bit.
Also add two helper functions to inc/dec in_flight. Decrement in_flight
helper will later used to call completion when in_flight reaches zero.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
FR_SENT flag should be set when request has been sent successfully sent
over virtqueue. This is used by interrupt logic to figure out if interrupt
request should be sent or not.
Also add it to fqp->processing list after sending it successfully.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
In virtiofs we keep per queue connected state in virtio_fs_vq->connected
and use that to end request if queue is not connected. And virtiofs does
not even touch fpq->connected state.
We probably need to merge these two at some point of time. For now,
simplify the code a bit and do not worry about checking state of
fpq->connected.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Submission context can hold some locks which end request code tries to hold
again and deadlock can occur. For example, fc->bg_lock. If a background
request is being submitted, it might hold fc->bg_lock and if we could not
submit request (because device went away) and tried to end request, then
deadlock happens. During testing, I also got a warning from deadlock
detection code.
So put requests on a list and end requests from a worker thread.
I got following warning from deadlock detector.
[ 603.137138] WARNING: possible recursive locking detected
[ 603.137142] --------------------------------------------
[ 603.137144] blogbench/2036 is trying to acquire lock:
[ 603.137149] 00000000f0f51107 (&(&fc->bg_lock)->rlock){+.+.}, at: fuse_request_end+0xdf/0x1c0 [fuse]
[ 603.140701]
[ 603.140701] but task is already holding lock:
[ 603.140703] 00000000f0f51107 (&(&fc->bg_lock)->rlock){+.+.}, at: fuse_simple_background+0x92/0x1d0 [fuse]
[ 603.140713]
[ 603.140713] other info that might help us debug this:
[ 603.140714] Possible unsafe locking scenario:
[ 603.140714]
[ 603.140715] CPU0
[ 603.140716] ----
[ 603.140716] lock(&(&fc->bg_lock)->rlock);
[ 603.140718] lock(&(&fc->bg_lock)->rlock);
[ 603.140719]
[ 603.140719] *** DEADLOCK ***
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
If the FUSE_READDIRPLUS_AUTO feature is enabled, then lookups on a
directory before/during readdir are used as an indication that READDIRPLUS
should be used instead of READDIR. However if the lookup turns out to be
negative, then selecting READDIRPLUS makes no sense.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Move the check for async request after check for the request being already
finished and done with.
Reported-by: syzbot+ae0bb7aae3de6b4594e2@syzkaller.appspotmail.com
Fixes: d49937749f ("fuse: stop copying args to fuse_req")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Virtio-fs does not accept any mount options, so it's confusing and wrong to
show any in /proc/mounts.
Reported-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
We have been calling it virtio_fs and even file name is virtio_fs.c. Module
name is virtio_fs.ko but when registering file system user is supposed to
specify filesystem type as "virtiofs".
Masayoshi Mizuma reported that he specified filesytem type as "virtio_fs"
and got this warning on console.
------------[ cut here ]------------
request_module fs-virtio_fs succeeded, but still no fs?
WARNING: CPU: 1 PID: 1234 at fs/filesystems.c:274 get_fs_type+0x12c/0x138
Modules linked in: ... virtio_fs fuse virtio_net net_failover ...
CPU: 1 PID: 1234 Comm: mount Not tainted 5.4.0-rc1 #1
So looks like kernel could find the module virtio_fs.ko but could not find
filesystem type after that.
It probably is better to rename module name to virtiofs.ko so that above
warning goes away in case user ends up specifying wrong fs name.
Reported-by: Masayoshi Mizuma <msys.mizuma@gmail.com>
Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCXYx2zAAKCRDh3BK/laaZ
PFpHAQD2G+F8a9e41jFTJg5YpNKMD8/Pl4T6v9chIO9qPXF2IAEAji0P1JterRfv
ixiBhv54hSwYbk527nxNWE9tP5gAHAQ=
=WCHy
-----END PGP SIGNATURE-----
Merge tag 'virtio-fs-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse virtio-fs support from Miklos Szeredi:
"Virtio-fs allows exporting directory trees on the host and mounting
them in guest(s).
This isn't actually a new filesystem, but a glue layer between the
fuse filesystem and a virtio based back-end.
It's similar in functionality to the existing virtio-9p solution, but
significantly faster in benchmarks and has better POSIX compliance.
Further permformance improvements can be achieved by sharing the page
cache between host and guest, allowing for faster I/O and reduced
memory use.
Kata Containers have been including the out-of-tree virtio-fs (with
the shared page cache patches as well) since version 1.7 as an
experimental feature. They have been active in development and plan to
switch from virtio-9p to virtio-fs as their default solution. There
has been interest from other sources as well.
The userspace infrastructure is slated to be merged into qemu once the
kernel part hits mainline.
This was developed by Vivek Goyal, Dave Gilbert and Stefan Hajnoczi"
* tag 'virtio-fs-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
virtio-fs: add virtiofs filesystem
virtio-fs: add Documentation/filesystems/virtiofs.rst
fuse: reserve values for mapping protocol
Fix sparse warning:
fs/fuse/dev.c:468:6: warning: symbol 'fuse_args_to_req' was not declared. Should it be static?
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Fixes: 68583165f9 ("fuse: add pages to fuse_args")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
This function has been made static, which now causes a compile-time
warning:
WARNING: "fuse_put_request" [vmlinux] is a static EXPORT_SYMBOL_GPL
Remove the unneeded export.
Fixes: 66abc3599c ("fuse: unexport request ops")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
account per-file, dentry, and inode data
blockdev/superblock and temporary per-request data was left alone, as
this usually isn't accounted
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Khazhismel Kumykov <khazhy@google.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Implements the optimization noted in commit f75fdf22b0 ("fuse: don't
use ->d_time"), as the additional memory can be significant. (In
particular, on SLAB configurations this 8-byte alloc becomes 32 bytes).
Per-dentry, this can consume significant memory.
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Khazhismel Kumykov <khazhy@google.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
unlock_page() was missing in case of an already in-flight write against the
same page.
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Fixes: ff17be0864 ("fuse: writepage: skip already in flight")
Cc: <stable@vger.kernel.org> # v3.13
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Add a basic file system module for virtio-fs. This does not yet contain
shared data support between host and guest or metadata coherency speedups.
However it is already significantly faster than virtio-9p.
Design Overview
===============
With the goal of designing something with better performance and local file
system semantics, a bunch of ideas were proposed.
- Use fuse protocol (instead of 9p) for communication between guest and
host. Guest kernel will be fuse client and a fuse server will run on
host to serve the requests.
- For data access inside guest, mmap portion of file in QEMU address space
and guest accesses this memory using dax. That way guest page cache is
bypassed and there is only one copy of data (on host). This will also
enable mmap(MAP_SHARED) between guests.
- For metadata coherency, there is a shared memory region which contains
version number associated with metadata and any guest changing metadata
updates version number and other guests refresh metadata on next access.
This is yet to be implemented.
How virtio-fs differs from existing approaches
==============================================
The unique idea behind virtio-fs is to take advantage of the co-location of
the virtual machine and hypervisor to avoid communication (vmexits).
DAX allows file contents to be accessed without communication with the
hypervisor. The shared memory region for metadata avoids communication in
the common case where metadata is unchanged.
By replacing expensive communication with cheaper shared memory accesses,
we expect to achieve better performance than approaches based on network
file system protocols. In addition, this also makes it easier to achieve
local file system semantics (coherency).
These techniques are not applicable to network file system protocols since
the communications channel is bypassed by taking advantage of shared memory
on a local machine. This is why we decided to build virtio-fs rather than
focus on 9P or NFS.
Caching Modes
=============
Like virtio-9p, different caching modes are supported which determine the
coherency level as well. The “cache=FOO” and “writeback” options control
the level of coherence between the guest and host filesystems.
- cache=none
metadata, data and pathname lookup are not cached in guest. They are
always fetched from host and any changes are immediately pushed to host.
- cache=always
metadata, data and pathname lookup are cached in guest and never expire.
- cache=auto
metadata and pathname lookup cache expires after a configured amount of
time (default is 1 second). Data is cached while the file is open
(close to open consistency).
- writeback/no_writeback
These options control the writeback strategy. If writeback is disabled,
then normal writes will immediately be synchronized with the host fs.
If writeback is enabled, then writes may be cached in the guest until
the file is closed or an fsync(2) performed. This option has no effect
on mmap-ed writes or writes going through the DAX mechanism.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
virtio-fs does not support aborting requests which are being
processed. That is requests which have been sent to fuse daemon on host.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Don't hold onto dentry in lru list if need to re-lookup it anyway at next
access. Only do this if explicitly enabled, otherwise it could result in
performance regression.
More advanced version of this patch would periodically flush out dentries
from the lru which have gone stale.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
As of now fuse_dev_alloc() both allocates a fuse device and installs it in
fuse_conn list. fuse_dev_alloc() can fail if fuse_device allocation fails.
virtio-fs needs to initialize multiple fuse devices (one per virtio queue).
It initializes one fuse device as part of call to fuse_fill_super_common()
and rest of the devices are allocated and installed after that.
But, we can't afford to fail after calling fuse_fill_super_common() as we
don't have a way to undo all the actions done by fuse_fill_super_common().
So to avoid failures after the call to fuse_fill_super_common(),
pre-allocate all fuse devices early and install them into fuse connection
later.
This patch provides two separate helpers for fuse device allocation and
fuse device installation in fuse_conn.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
The /dev/fuse device uses fiq->waitq and fasync to signal that requests are
available. These mechanisms do not apply to virtio-fs. This patch
introduces callbacks so alternative behavior can be used.
Note that queue_interrupt() changes along these lines:
spin_lock(&fiq->waitq.lock);
wake_up_locked(&fiq->waitq);
+ kill_fasync(&fiq->fasync, SIGIO, POLL_IN);
spin_unlock(&fiq->waitq.lock);
- kill_fasync(&fiq->fasync, SIGIO, POLL_IN);
Since queue_request() and queue_forget() also call kill_fasync() inside
the spinlock this should be safe.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
fuse_fill_super() includes code to process the fd= option and link the
struct fuse_dev to the fd's struct file. In virtio-fs there is no file
descriptor because /dev/fuse is not used.
This patch extracts fuse_fill_super_common() so that both classic fuse and
virtio-fs can share the code to initialize a mount.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
File systems like virtio-fs need to do not have to play directly with
forget list data structures. There is a helper function use that instead.
Rename dequeue_forget() to fuse_dequeue_forget() and export it so that
stacked filesystems can use it.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
virtio-fs will need unique IDs for FORGET requests from outside
fs/fuse/dev.c. Make the symbol visible.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>