Commit Graph

74641 Commits

Author SHA1 Message Date
zhenwei pi 600d7b47e8 pvpanic: introduce crashloaded for pvpanic
Add bit 1 for pvpanic. This bit means that guest hits a panic, but
guest wants to handle error by itself. Typical case: Linux guest runs
kdump in panic. It will help us to separate the abnormal reboot from
normal operation.

Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
Message-Id: <20200114023102.612548-2-pizhenwei@bytedance.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-24 20:59:07 +01:00
Greg Kurz bc9888f759 cpu: Use cpu_class_set_parent_reset()
Convert all targets to use cpu_class_set_parent_reset() with the following
coccinelle script:

@@
type CPUParentClass;
CPUParentClass *pcc;
CPUClass *cc;
identifier parent_fn;
identifier child_fn;
@@
+cpu_class_set_parent_reset(cc, child_fn, &pcc->parent_fn);
-pcc->parent_fn = cc->reset;
...
-cc->reset = child_fn;

Signed-off-by: Greg Kurz <groug@kaod.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Message-Id: <157650847817.354886.7047137349018460524.stgit@bahia.lan>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-24 20:59:06 +01:00
Greg Kurz ef0a6249a8 cpu: Introduce cpu_class_set_parent_reset()
Similarly to what we already do with qdev, use a helper to overload the
reset QOM methods of the parent in children classes, for clarity.

Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <157650847239.354886.2782881118916307978.stgit@bahia.lan>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-24 20:59:06 +01:00
Peter Maydell ba2ed84fe6 RISC-V Patches for the 5.0 Soft Freeze, Part 1
This patch set contains a handful of collected fixes that I'd like to target
 for the 5.0 soft freeze (I know that's a long way away, I just don't know what
 else to call these):
 
 * A fix for a memory leak initializing the sifive_u board.
 * Fixes to privilege mode emulation related to interrupts and fstatus.
 
 Notably absent is the H extension implementation.  That's pretty much reviewed,
 but not quite ready to go yet and I didn't want to hold back these important
 fixes.  This boots 32-bit and 64-bit Linux (buildroot this time, just for fun)
 and passes "make check".
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAl4ngWATHHBhbG1lckBk
 YWJiZWx0LmNvbQAKCRAuExnzX7sYiT4eD/450dJ8tGRJKA7V/XEYFM4yYZ87cgsE
 HSF3E4lOTdjqp+wNwag2P1uIFJO1snqAa+6qwQFLsPRtGwn43hQzTbay86L7sPK8
 YXL143OaQz0jUtcmEyTJ2EczOti4bVhX+gy9T4NckvsteSJHnHbMmdqfaafwVlmy
 Qrx2IMAwY8k7+Dfil2INZTp4jlJ6ibe0XwziiMM+5RwTL2a30vYAREkRU3wrF0UN
 87BVp5jA5ytiOAtJoniD6Yh9zKgDhkQwv6orJUbIseAWhJUvVVoMg4IIdF11Hert
 FoqFZy6dUbIAeCTwEtHfvgJJrA/2Mn2Go1uLG22ToStp8Vq9VGzq29MN7Zx4fM8J
 22j3SOwOzUVGl0bSsSktfQ+QqJEMXRfWGzq4/al0JCjEx68DMKkVW3V5P4H8FP5Z
 7ZlBvf9NxaZeXLFOavZgW0fQBdVKMjb0Uk2QXWbe45TdMOPQ7BT8WGRogb0MhOyW
 t8iyKPCaYfF0NygOSCUaRdZ7Ng2NkAvppyUiAFccmttlPbaYSui306yMIJ9Ls2z1
 3pny3gwdMTIy0/86yEWdi6gffVgUR/o9bhwRDjMQpJU21cAHn06zJjPM4hnPk83Z
 j0/5JTQFWJzMO7xML9btMUV/bdPp5nJmI2ljQgkgaT92Qi3e9x4je09Duky5R2Bl
 XbGrB9GzcDjAtw==
 =WooQ
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/palmer/tags/riscv-for-master-5.0-sf1' into staging

RISC-V Patches for the 5.0 Soft Freeze, Part 1

This patch set contains a handful of collected fixes that I'd like to target
for the 5.0 soft freeze (I know that's a long way away, I just don't know what
else to call these):

* A fix for a memory leak initializing the sifive_u board.
* Fixes to privilege mode emulation related to interrupts and fstatus.

Notably absent is the H extension implementation.  That's pretty much reviewed,
but not quite ready to go yet and I didn't want to hold back these important
fixes.  This boots 32-bit and 64-bit Linux (buildroot this time, just for fun)
and passes "make check".

# gpg: Signature made Tue 21 Jan 2020 22:55:28 GMT
# gpg:                using RSA key 2B3C3747446843B24A943A7A2E1319F35FBB1889
# gpg:                issuer "palmer@dabbelt.com"
# gpg: Good signature from "Palmer Dabbelt <palmer@dabbelt.com>" [unknown]
# gpg:                 aka "Palmer Dabbelt <palmer@sifive.com>" [unknown]
# gpg:                 aka "Palmer Dabbelt <palmerdabbelt@google.com>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 00CE 76D1 8349 60DF CE88  6DF8 EF4C A150 2CCB AB41
#      Subkey fingerprint: 2B3C 3747 4468 43B2 4A94  3A7A 2E13 19F3 5FBB 1889

* remotes/palmer/tags/riscv-for-master-5.0-sf1:
  target/riscv: update mstatus.SD when FS is set dirty
  target/riscv: fsd/fsw doesn't dirty FP state
  target/riscv: Fix tb->flags FS status
  riscv: Set xPIE to 1 after xRET
  riscv/sifive_u: fix a memory leak in soc_realize()

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-01-24 12:34:04 +00:00
Peter Maydell a43efa34c7 virtiofsd first pull v2
Import our virtiofsd.
 This pulls in the daemon to drive a file system connected to the
 existing qemu virtiofsd device.
 It's derived from upstream libfuse with lots of changes (and a lot
 trimmed out).
 The daemon lives in the newly created qemu/tools/virtiofsd
 
 Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
 
 v2
   drop the docs while we discuss where they should live
   and we need to redo the manpage in anything but texi
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEERfXHG0oMt/uXep+pBRYzHrxb/ecFAl4pzZ4ACgkQBRYzHrxb
 /eelUg//evho+RwlOK4TjOjLJyGfMqZDQO5TFR2S2NmiCP7makND4BWll2A5Zu26
 oqzZw5pcsKZmpYJ81tqe1bnlCa6SCUx6cNGt+5n2cA0MYSjpPDeB2OegjS57NUoE
 eGXXIE7GOrGShHx1fW7BuA3Pi0hmFSHGRHCs006WsVktb1rP1w+7/NBohgLFkYob
 fqytP/K9ACEySETPGgDUEh6ZmmalrY1WeD+a11RZstOSA+2YhR3WbyN0z8fc6lCE
 puFHNEs2L0zVIUicSyJ4ux9+rbxdZIelLD91mGZhxrWy0H0AIox4bYURUJlbajI7
 Yl/FInQRMhStsKn3UN25MSYgGS8ZAM3IcG605vrC4HoQh9r8mVC/H19buWFCycvL
 1naK6LTqFkL0igAKTeg+DUk3tNP3i+j8JaMnopvKIfEHwV1lpVEVHI7zUynBA85d
 2xfOllkJreFtniYg5nfdfhVixKHLAId0x9ZvYw3wefLDF3ugXLHbrtj0hPcJiAny
 TINAzZCbxZsCEdZsrPq4Ldf7Pmb8vI8pxJVsoD28gRcHNRQvPWef07mtW370IAdJ
 SJXWLlsFh/rPJx51lVIMQf6d4qLePyHfB81VQ25qlrS5CW87XMmTyr5rngGFlJ2e
 vJnMb+DgwG1gf+HV4W5Y3l0wehou5GxgbLT478s+r3YzfdV13d4=
 =UUVN
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/dgilbert-gitlab/tags/pull-virtiofs-20200123b' into staging

virtiofsd first pull v2

Import our virtiofsd.
This pulls in the daemon to drive a file system connected to the
existing qemu virtiofsd device.
It's derived from upstream libfuse with lots of changes (and a lot
trimmed out).
The daemon lives in the newly created qemu/tools/virtiofsd

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

v2
  drop the docs while we discuss where they should live
  and we need to redo the manpage in anything but texi

# gpg: Signature made Thu 23 Jan 2020 16:45:18 GMT
# gpg:                using RSA key 45F5C71B4A0CB7FB977A9FA90516331EBC5BFDE7
# gpg: Good signature from "Dr. David Alan Gilbert (RH2) <dgilbert@redhat.com>" [full]
# Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A  9FA9 0516 331E BC5B FDE7

* remotes/dgilbert-gitlab/tags/pull-virtiofs-20200123b: (108 commits)
  virtiofsd: add some options to the help message
  virtiofsd: stop all queue threads on exit in virtio_loop()
  virtiofsd/passthrough_ll: Pass errno to fuse_reply_err()
  virtiofsd: Convert lo_destroy to take the lo->mutex lock itself
  virtiofsd: add --thread-pool-size=NUM option
  virtiofsd: fix lo_destroy() resource leaks
  virtiofsd: prevent FUSE_INIT/FUSE_DESTROY races
  virtiofsd: process requests in a thread pool
  virtiofsd: use fuse_buf_writev to replace fuse_buf_write for better performance
  virtiofsd: add definition of fuse_buf_writev()
  virtiofsd: passthrough_ll: Use cache_readdir for directory open
  virtiofsd: Fix data corruption with O_APPEND write in writeback mode
  virtiofsd: Reset O_DIRECT flag during file open
  virtiofsd: convert more fprintf and perror to use fuse log infra
  virtiofsd: do not always set FUSE_FLOCK_LOCKS
  virtiofsd: introduce inode refcount to prevent use-after-free
  virtiofsd: passthrough_ll: fix refcounting on remove/rename
  libvhost-user: Fix some memtable remap cases
  virtiofsd: rename inode->refcount to inode->nlookup
  virtiofsd: prevent races with lo_dirp_put()
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-01-24 09:59:11 +00:00
Peter Maydell c0248b36d8 vnc: fix zlib compression artifacts.
ui: add "none" to -display help.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJeKavFAAoJEEy22O7T6HE4jR8P/AnX/EqfmksfqATZSOvpjVQj
 qspUIYvgg3itI3LeNP1oMuPoI7157fEmzAPo2LNEBHyJfqTg9HjDUMevATs/k3fZ
 iiD1BCSIPiAJySqZ6ULn/mpir8MuiLdcRrVsN0oj8f9hDEJ803g7tjRSZKGxU+HL
 H7ICmULKH4YeDE+q9NluLKxom76RuZsWUKlvLoiiaO6oG40xT8hAbrp59EQ7Rhze
 lVJayhm8uUVt4yIbhO6Y5sY5NxwER2DJK7v7WczuTkidWXjW/RZfes9Hfy+UQ7Fv
 p5KdUHxXfOWbe6R0P4SG9jh29PdnJh/3acDIepr2h+bG8whgHD3sGIoTxEweB/wz
 0wNIGp/5JF7JJwE5luMnbfxEtnXBZ4c3X2cr9PBFzm6Yid9qhH8aldCLKdh3dG+7
 QXYRXprtjDkp2JlduPW5sbyHjzX6M2PQ48kKr2sj/IFKB8I1dz0v1hX9NkZQx8UF
 fXqx48GFAXv+Hobb8OpE9ru+i8/PNHFFEShcz+clHE67OIm6sXdZHBHSpH7+dDpi
 iXamaVkBb9PgbvEzyFM3bS5gJq2k4EPUdFFmbPgYfOfwMnMxZvYgXpwjzfFNpoK/
 rCMar/JrmdiBwY9l4r2CIhifweMin1i7kqxu3g7DFypRE4/b+nQmgRu/szzHm5Q5
 Bu4S90f2GoqjcucgNPBQ
 =+VAq
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/kraxel/tags/ui-20200123-pull-request' into staging

vnc: fix zlib compression artifacts.
ui: add "none" to -display help.

# gpg: Signature made Thu 23 Jan 2020 14:20:53 GMT
# gpg:                using RSA key 4CB6D8EED3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>" [full]
# gpg:                 aka "Gerd Hoffmann <gerd@kraxel.org>" [full]
# gpg:                 aka "Gerd Hoffmann (private) <kraxel@gmail.com>" [full]
# Primary key fingerprint: A032 8CFF B93A 17A7 9901  FE7D 4CB6 D8EE D3E8 7138

* remotes/kraxel/tags/ui-20200123-pull-request:
  ui/console: Display the 'none' backend in '-display help'
  vnc: prioritize ZRLE compression over ZLIB
  Revert "vnc: allow fall back to RAW encoding"

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-01-23 18:44:39 +00:00
Masayoshi Mizuma 1d59b1b210 virtiofsd: add some options to the help message
Add following options to the help message:
- cache
- flock|no_flock
- norace
- posix_lock|no_posix_lock
- readdirplus|no_readdirplus
- timeout
- writeback|no_writeback
- xattr|no_xattr

Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>

dgilbert: Split cache, norace, posix_lock, readdirplus off
  into our own earlier patches that added the options

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-01-23 16:41:37 +00:00
Eryu Guan 9883df8cca virtiofsd: stop all queue threads on exit in virtio_loop()
On guest graceful shutdown, virtiofsd receives VHOST_USER_GET_VRING_BASE
request from VMM and shuts down virtqueues by calling fv_set_started(),
which joins fv_queue_thread() threads. So when virtio_loop() returns,
there should be no thread is still accessing data in fuse session and/or
virtio dev.

But on abnormal exit, e.g. guest got killed for whatever reason,
vhost-user socket is closed and virtio_loop() breaks out the main loop
and returns to main(). But it's possible fv_queue_worker()s are still
working and accessing fuse session and virtio dev, which results in
crash or use-after-free.

Fix it by stopping fv_queue_thread()s before virtio_loop() returns,
to make sure there's no-one could access fuse session and virtio dev.

Reported-by: Qingming Su <qingming.su@linux.alibaba.com>
Signed-off-by: Eryu Guan <eguan@linux.alibaba.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-01-23 16:41:37 +00:00
Xiao Yang a931b6861e virtiofsd/passthrough_ll: Pass errno to fuse_reply_err()
lo_copy_file_range() passes -errno to fuse_reply_err() and then fuse_reply_err()
changes it to errno again, so that subsequent fuse_send_reply_iov_nofree() catches
the wrong errno.(i.e. reports "fuse: bad error value: ...").

Make fuse_send_reply_iov_nofree() accept the correct -errno by passing errno
directly in lo_copy_file_range().

Signed-off-by: Xiao Yang <yangx.jy@cn.fujitsu.com>
Reviewed-by: Eryu Guan <eguan@linux.alibaba.com>

dgilbert: Sent upstream and now Merged as aa1185e153f774f1df65
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-01-23 16:41:37 +00:00
Dr. David Alan Gilbert fe4c15798a virtiofsd: Convert lo_destroy to take the lo->mutex lock itself
lo_destroy was relying on some implicit knowledge of the locking;
we can avoid this if we create an unref_inode that doesn't take
the lock and then grab it for the whole of the lo_destroy.

Suggested-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-01-23 16:41:37 +00:00
Stefan Hajnoczi 951b3120db virtiofsd: add --thread-pool-size=NUM option
Add an option to control the size of the thread pool.  Requests are now
processed in parallel by default.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-01-23 16:41:37 +00:00
Stefan Hajnoczi 28f7a3b026 virtiofsd: fix lo_destroy() resource leaks
Now that lo_destroy() is serialized we can call unref_inode() so that
all inode resources are freed.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-01-23 16:41:37 +00:00
Stefan Hajnoczi cdc497c692 virtiofsd: prevent FUSE_INIT/FUSE_DESTROY races
When running with multiple threads it can be tricky to handle
FUSE_INIT/FUSE_DESTROY in parallel with other request types or in
parallel with themselves.  Serialize FUSE_INIT and FUSE_DESTROY so that
malicious clients cannot trigger race conditions.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-01-23 16:41:37 +00:00
Stefan Hajnoczi a3d756c5ae virtiofsd: process requests in a thread pool
Introduce a thread pool so that fv_queue_thread() just pops
VuVirtqElements and hands them to the thread pool.  For the time being
only one worker thread is allowed since passthrough_ll.c is not
thread-safe yet.  Future patches will lift this restriction so that
multiple FUSE requests can be processed in parallel.

The main new concept is struct FVRequest, which contains both
VuVirtqElement and struct fuse_chan.  We now have fv_VuDev for a device,
fv_QueueInfo for a virtqueue, and FVRequest for a request.  Some of
fv_QueueInfo's fields are moved into FVRequest because they are
per-request.  The name FVRequest conforms to QEMU coding style and I
expect the struct fv_* types will be renamed in a future refactoring.

This patch series is not optimal.  fbuf reuse is dropped so each request
does malloc(se->bufsize), but there is no clean and cheap way to keep
this with a thread pool.  The vq_lock mutex is held for longer than
necessary, especially during the eventfd_write() syscall.  Performance
can be improved in the future.

prctl(2) had to be added to the seccomp whitelist because glib invokes
it.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-01-23 16:41:37 +00:00
piaojun c465bba2c9 virtiofsd: use fuse_buf_writev to replace fuse_buf_write for better performance
fuse_buf_writev() only handles the normal write in which src is buffer
and dest is fd. Specially if src buffer represents guest physical
address that can't be mapped by the daemon process, IO must be bounced
back to the VMM to do it by fuse_buf_copy().

Signed-off-by: Jun Piao <piaojun@huawei.com>
Suggested-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-01-23 16:41:37 +00:00
piaojun 9ceaaa15cf virtiofsd: add definition of fuse_buf_writev()
Define fuse_buf_writev() which use pwritev and writev to improve io
bandwidth. Especially, the src bufs with 0 size should be skipped as
their mems are not *block_size* aligned which will cause writev failed
in direct io mode.

Signed-off-by: Jun Piao <piaojun@huawei.com>
Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-01-23 16:41:37 +00:00
Misono Tomohiro 9b610b09b4 virtiofsd: passthrough_ll: Use cache_readdir for directory open
Since keep_cache(FOPEN_KEEP_CACHE) has no effect for directory as
described in fuse_common.h, use cache_readdir(FOPNE_CACHE_DIR) for
diretory open when cache=always mode.

Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-01-23 16:41:37 +00:00
Misono Tomohiro 8e4e41e39e virtiofsd: Fix data corruption with O_APPEND write in writeback mode
When writeback mode is enabled (-o writeback), O_APPEND handling is
done in kernel. Therefore virtiofsd clears O_APPEND flag when open.
Otherwise O_APPEND flag takes precedence over pwrite() and write
data may corrupt.

Currently clearing O_APPEND flag is done in lo_open(), but we also
need the same operation in lo_create(). So, factor out the flag
update operation in lo_open() to update_open_flags() and call it
in both lo_open() and lo_create().

This fixes the failure of xfstest generic/069 in writeback mode
(which tests O_APPEND write data integrity).

Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-01-23 16:41:37 +00:00
Vivek Goyal 65da453980 virtiofsd: Reset O_DIRECT flag during file open
If an application wants to do direct IO and opens a file with O_DIRECT
in guest, that does not necessarily mean that we need to bypass page
cache on host as well. So reset this flag on host.

If somebody needs to bypass page cache on host as well (and it is safe to
do so), we can add a knob in daemon later to control this behavior.

I check virtio-9p and they do reset O_DIRECT flag.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-01-23 16:41:37 +00:00
Eryu Guan fc1aed0bf9 virtiofsd: convert more fprintf and perror to use fuse log infra
Signed-off-by: Eryu Guan <eguan@linux.alibaba.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-01-23 16:41:37 +00:00
Peng Tao e468d4af5f virtiofsd: do not always set FUSE_FLOCK_LOCKS
Right now we always enable it regardless of given commandlines.
Fix it by setting the flag relying on the lo->flock bit.

Signed-off-by: Peng Tao <tao.peng@linux.alibaba.com>
Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-01-23 16:41:37 +00:00
Stefan Hajnoczi c241aa9457 virtiofsd: introduce inode refcount to prevent use-after-free
If thread A is using an inode it must not be deleted by thread B when
processing a FUSE_FORGET request.

The FUSE protocol itself already has a counter called nlookup that is
used in FUSE_FORGET messages.  We cannot trust this counter since the
untrusted client can manipulate it via FUSE_FORGET messages.

Introduce a new refcount to keep inodes alive for the required lifespan.
lo_inode_put() must be called to release a reference.  FUSE's nlookup
counter holds exactly one reference so that the inode stays alive as
long as the client still wants to remember it.

Note that the lo_inode->is_symlink field is moved to avoid creating a
hole in the struct due to struct field alignment.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-01-23 16:41:37 +00:00
Miklos Szeredi 9257e514d8 virtiofsd: passthrough_ll: fix refcounting on remove/rename
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-01-23 16:41:37 +00:00
Dr. David Alan Gilbert 49e9ec749d libvhost-user: Fix some memtable remap cases
If a new setmemtable command comes in once the vhost threads are
running, it will remap the guests address space and the threads
will now be looking in the wrong place.

Fortunately we're running this command under lock, so we can
update the queue mappings so that threads will look in the new-right
place.

Note: This doesn't fix things that the threads might be doing
without a lock (e.g. a readv/writev!)  That's for another time.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-01-23 16:41:37 +00:00
Stefan Hajnoczi 1222f01555 virtiofsd: rename inode->refcount to inode->nlookup
This reference counter plays a specific role in the FUSE protocol.  It's
not a generic object reference counter and the FUSE kernel code calls it
"nlookup".

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-01-23 16:41:37 +00:00
Stefan Hajnoczi acefdde73b virtiofsd: prevent races with lo_dirp_put()
Introduce lo_dirp_put() so that FUSE_RELEASEDIR does not cause
use-after-free races with other threads that are accessing lo_dirp.

Also make lo_releasedir() atomic to prevent FUSE_RELEASEDIR racing with
itself.  This prevents double-frees.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-01-23 16:41:37 +00:00
Stefan Hajnoczi baed65c060 virtiofsd: make lo_release() atomic
Hold the lock across both lo_map_get() and lo_map_remove() to prevent
races between two FUSE_RELEASE requests.  In this case I don't see a
serious bug but it's safer to do things atomically.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-01-23 16:41:37 +00:00
Stefan Hajnoczi e7b337326d virtiofsd: prevent fv_queue_thread() vs virtio_loop() races
We call into libvhost-user from the virtqueue handler thread and the
vhost-user message processing thread without a lock.  There is nothing
protecting the virtqueue handler thread if the vhost-user message
processing thread changes the virtqueue or memory table while it is
running.

This patch introduces a read-write lock.  Virtqueue handler threads are
readers.  The vhost-user message processing thread is a writer.  This
will allow concurrency for multiqueue in the future while protecting
against fv_queue_thread() vs virtio_loop() races.

Note that the critical sections could be made smaller but it would be
more invasive and require libvhost-user changes.  Let's start simple and
improve performance later, if necessary.  Another option would be an
RCU-style approach with lighter-weight primitives.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-01-23 16:41:37 +00:00
Stefan Hajnoczi 620e9d8d9c virtiofsd: use fuse_lowlevel_is_virtio() in fuse_session_destroy()
vu_socket_path is NULL when --fd=FDNUM was used.  Use
fuse_lowlevel_is_virtio() instead.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-01-23 16:41:37 +00:00
Vivek Goyal 0e81414c54 virtiofsd: Support remote posix locks
Doing posix locks with-in guest kernel are not sufficient if a file/dir
is being shared by multiple guests. So we need the notion of daemon doing
the locks which are visible to rest of the guests.

Given posix locks are per process, one can not call posix lock API on host,
otherwise bunch of basic posix locks properties are broken. For example,
If two processes (A and B) in guest open the file and take locks on different
sections of file, if one of the processes closes the fd, it will close
fd on virtiofsd and all posix locks on file will go away. This means if
process A closes the fd, then locks of process B will go away too.

Similar other problems exist too.

This patch set tries to emulate posix locks while using open file
description locks provided on Linux.

Daemon provides two options (-o posix_lock, -o no_posix_lock) to enable
or disable posix locking in daemon. By default it is enabled.

There are few issues though.

- GETLK() returns pid of process holding lock. As we are emulating locks
  using OFD, and these locks are not per process and don't return pid
  of process, so GETLK() in guest does not reuturn process pid.

- As of now only F_SETLK is supported and not F_SETLKW. We can't block
  the thread in virtiofsd for arbitrary long duration as there is only
  one thread serving the queue. That means unlock request will not make
  it to daemon and F_SETLKW will block infinitely and bring virtio-fs
  to a halt. This is a solvable problem though and will require significant
  changes in virtiofsd and kernel. Left as a TODO item for now.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-01-23 16:41:37 +00:00
Liu Bo 740b0b700a Virtiofsd: fix memory leak on fuse queueinfo
For fuse's queueinfo, both queueinfo array and queueinfos are allocated in
fv_queue_set_started() but not cleaned up when the daemon process quits.

This fixes the leak in proper places.

Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Signed-off-by: Eric Ren <renzhen@linux.alibaba.com>
Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-01-23 16:41:37 +00:00
Eric Ren fc3f0041b4 virtiofsd: fix incorrect error handling in lo_do_lookup
Signed-off-by: Eric Ren <renzhen@linux.alibaba.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-01-23 16:41:37 +00:00
Liu Bo b7ed733a38 virtiofsd: enable PARALLEL_DIROPS during INIT
lookup is a RO operations, PARALLEL_DIROPS can be enabled.

Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-01-23 16:41:37 +00:00
Masayoshi Mizuma 96814800d2 virtiofsd: Prevent multiply running with same vhost_user_socket
virtiofsd can run multiply even if the vhost_user_socket is same path.

  ]# ./virtiofsd -o vhost_user_socket=/tmp/vhostqemu -o source=/tmp/share &
  [1] 244965
  virtio_session_mount: Waiting for vhost-user socket connection...
  ]# ./virtiofsd -o vhost_user_socket=/tmp/vhostqemu -o source=/tmp/share &
  [2] 244966
  virtio_session_mount: Waiting for vhost-user socket connection...
  ]#

The user will get confused about the situation and maybe the cause of the
unexpected problem. So it's better to prevent the multiple running.

Create a regular file under localstatedir directory to exclude the
vhost_user_socket. To create and lock the file, use qemu_write_pidfile()
because the API has some sanity checks and file lock.

Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
  Applied fixes from Stefan's review and moved osdep include
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-01-23 16:41:37 +00:00
Liu Bo 18a69cbbb6 virtiofsd: add helper for lo_data cleanup
This offers an helper function for lo_data's cleanup.

Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-01-23 16:41:37 +00:00
Liu Bo eb68a33b5f virtiofsd: fix memory leak on lo.source
valgrind reported that lo.source is leaked on quiting, but it was defined
as (const char*) as it may point to a const string "/".

Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-01-23 16:41:37 +00:00
Liu Bo 61cfc44982 virtiofsd: cleanup allocated resource in se
This cleans up unfreed resources in se on quiting, including
se->virtio_dev, se->vu_socket_path, se->vu_socketfd.

Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-01-23 16:41:37 +00:00
Liu Bo c6de804670 virtiofsd: fix error handling in main()
Neither fuse_parse_cmdline() nor fuse_opt_parse() goes to the right place
to do cleanup.

Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-01-23 16:41:37 +00:00
Jiufei Xue 8a792b034d virtiofsd: support nanosecond resolution for file timestamp
Define HAVE_STRUCT_STAT_ST_ATIM to 1 if `st_atim' is member of `struct
stat' which means support nanosecond resolution for the file timestamp
fields.

Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-01-23 16:41:37 +00:00
Dr. David Alan Gilbert 771b01eb76 virtiofsd: Clean up inodes on destroy
Clear out our inodes and fd's on a 'destroy' - so we get rid
of them if we reboot the guest.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-01-23 16:41:37 +00:00
Miklos Szeredi bfc50a6e06 virtiofsd: passthrough_ll: use hashtable
Improve performance of inode lookup by using a hash table.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-01-23 16:41:37 +00:00
Miklos Szeredi 230e777b5e virtiofsd: passthrough_ll: clean up cache related options
- Rename "cache=never" to "cache=none" to match 9p's similar option.

 - Rename CACHE_NORMAL constant to CACHE_AUTO to match the "cache=auto"
   option.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-01-23 16:41:37 +00:00
Miklos Szeredi 3ca8a2b1c8 virtiofsd: extract root inode init into setup_root()
Inititialize the root inode in a single place.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
dgilbert:
with fix suggested by Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-01-23 16:41:37 +00:00
Miklos Szeredi 9de4fab599 virtiofsd: fail when parent inode isn't known in lo_do_lookup()
The Linux file handle APIs (struct export_operations) can access inodes
that are not attached to parents because path name traversal is not
performed.  Refuse if there is no parent in lo_do_lookup().

Also clean up lo_do_lookup() while we're here.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-01-23 16:41:37 +00:00
Miklos Szeredi 95d2715791 virtiofsd: rename unref_inode() to unref_inode_lolocked()
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-01-23 16:41:37 +00:00
Miklos Szeredi 59aef494be virtiofsd: passthrough_ll: control readdirplus
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-01-23 16:41:37 +00:00
Miklos Szeredi ddcbabcb0e virtiofsd: passthrough_ll: disable readdirplus on cache=never
...because the attributes sent in the READDIRPLUS reply would be discarded
anyway.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-01-23 16:41:37 +00:00
Miklos Szeredi f0ab7d6f78 virtiofsd: passthrough_ll: add renameat2 support
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-01-23 16:41:37 +00:00
Dr. David Alan Gilbert c25c02b9e6 contrib/libvhost-user: Protect slave fd with mutex
In future patches we'll be performing commands on the slave-fd driven
by commands on queues, since those queues will be driven by individual
threads we need to make sure they don't attempt to use the slave-fd
for multiple commands in parallel.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-01-23 16:41:37 +00:00
Dr. David Alan Gilbert 0fdc465d7d vhost-user: Print unexpected slave message types
When we receive an unexpected message type on the slave fd, print
the type.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-01-23 16:41:37 +00:00