Commit Graph

1063 Commits

Author SHA1 Message Date
Steve Sistare
d9cda21303 migration: simplify notifiers
Pass the callback function to add_migration_state_change_notifier so
that migration can initialize the notifier on add and clear it on
delete, which simplifies the call sites.  Shorten the function names
so the extra arg can be added more legibly.  Hide the global notifier
list in a new function migration_call_notifiers, and make it externally
visible so future live update code can call it.

No functional change.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Tested-by: Michael Galaxy <mgalaxy@akamai.com>
Reviewed-by: Michael Galaxy <mgalaxy@akamai.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <1686148954-250144-1-git-send-email-steven.sistare@oracle.com>
2023-10-20 08:51:41 +02:00
Philippe Mathieu-Daudé
73071f1923 net/net: Clean up global variable shadowing
Fix:

  net/net.c:1680:35: error: declaration shadows a variable in the global scope [-Werror,-Wshadow]
  bool netdev_is_modern(const char *optarg)
                                    ^
  net/net.c:1714:38: error: declaration shadows a variable in the global scope [-Werror,-Wshadow]
  void netdev_parse_modern(const char *optarg)
                                       ^
  net/net.c:1728:60: error: declaration shadows a variable in the global scope [-Werror,-Wshadow]
  void net_client_parse(QemuOptsList *opts_list, const char *optarg)
                                                             ^
  /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/getopt.h:77:14: note: previous declaration is here
  extern char *optarg;                    /* getopt(3) external variables */
               ^

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20231004120019.93101-4-philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2023-10-06 13:27:43 +02:00
Stefan Hajnoczi
2f3913f4b2 virtio,pci: features, cleanups
vdpa:
       shadow vq vlan support
       net migration with cvq
 cxl:
      support emulating 4 HDM decoders
      serial number extended capability
 virtio:
       hared dma-buf
 
 Fixes, cleanups all over the place.
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmUd4/YPHG1zdEByZWRo
 YXQuY29tAAoJECgfDbjSjVRpyM8H/02cRbJcQOjYt7j68zPW6GaDXxBI/UmdWDyG
 15LZZbGNOPjyjNd3Vz1M7stQ5rhoKcgo/RdI+0E60a78svgW5JvpXoXR3pksc3Dx
 v28B/akXwHUErYFSZQ+2VHNc8OhCd0v2ehxZxbwPEAYIOAj3hcCIVoPGXTnKJmAJ
 imr5hjH0wZUc0+xdsmn8Vfdv5NTzpwfVObbGiMZejeJsaoh0y6Rt8RANBMY67KQD
 S7/HPlVuDYf/y43t4ZEHNYuV9RaCdZZYlLWwV1scdKaYcofgmtJOKbOdCjHRXgj+
 004Afb3rggIoCfnCzOFzhGx+MLDtLjvEn2N4oLEWCLi+k/3huaA=
 =GAvH
 -----END PGP SIGNATURE-----

Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu into staging

virtio,pci: features, cleanups

vdpa:
      shadow vq vlan support
      net migration with cvq
cxl:
     support emulating 4 HDM decoders
     serial number extended capability
virtio:
      hared dma-buf

Fixes, cleanups all over the place.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

* tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu: (53 commits)
  libvhost-user: handle shared_object msg
  vhost-user: add shared_object msg
  hw/display: introduce virtio-dmabuf
  util/uuid: add a hash function
  virtio: remove unused next argument from virtqueue_split_read_next_desc()
  virtio: remove unnecessary thread fence while reading next descriptor
  virtio: use shadow_avail_idx while checking number of heads
  libvhost-user.c: add assertion to vu_message_read_default
  pcie_sriov: unregister_vfs(): fix error path
  hw/i386/pc: improve physical address space bound check for 32-bit x86 systems
  amd_iommu: Fix APIC address check
  vdpa net: follow VirtIO initialization properly at cvq isolation probing
  vdpa net: stop probing if cannot set features
  vdpa net: fix error message setting virtio status
  hw/pci-bridge/cxl-upstream: Add serial number extended capability support
  hw/cxl: Support 4 HDM decoders at all levels of topology
  hw/cxl: Fix and use same calculation for HDM decoder block size everywhere
  hw/cxl: Add utility functions decoder interleave ways and target count.
  hw/cxl: Push cxl_decoder_count_enc() and cxl_decode_ig() into .c
  vdpa net: zero vhost_vdpa iova_tree pointer at cleanup
  ...

Conflicts:
  hw/core/machine.c
  Context conflict with commit 314e0a84cd ("hw/core: remove needless
  includes") because it removed an adjacent #include.
2023-10-05 09:01:01 -04:00
Eugenio Pérez
845ec38ae1 vdpa net: follow VirtIO initialization properly at cvq isolation probing
This patch solves a few issues.  The most obvious is that the feature
set was done previous to ACKNOWLEDGE | DRIVER status bit set.  Current
vdpa devices are permissive with this, but it is better to follow the
standard.

Fixes: 152128d646 ("vdpa: move CVQ isolation check to net_init_vhost_vdpa")
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <20230915170836.3078172-4-eperezma@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-10-04 18:15:06 -04:00
Eugenio Pérez
f1085882d0 vdpa net: stop probing if cannot set features
Otherwise it continues the CVQ isolation probing.

Fixes: 152128d646 ("vdpa: move CVQ isolation check to net_init_vhost_vdpa")
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <20230915170836.3078172-3-eperezma@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2023-10-04 18:15:06 -04:00
Eugenio Pérez
cbc9ae87b5 vdpa net: fix error message setting virtio status
It incorrectly prints "error setting features", probably because a copy
paste miss.

Fixes: 152128d646 ("vdpa: move CVQ isolation check to net_init_vhost_vdpa")
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <20230915170836.3078172-2-eperezma@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2023-10-04 18:15:06 -04:00
Eugenio Pérez
0a7a164bc3 vdpa net: zero vhost_vdpa iova_tree pointer at cleanup
Not zeroing it causes a SIGSEGV if the live migration is cancelled, at
net device restart.

This is caused because CVQ tries to reuse the iova_tree that is present
in the first vhost_vdpa device at the end of vhost_vdpa_net_cvq_start.
As a consequence, it tries to access an iova_tree that has been already
free.

Fixes: 00ef422e9f ("vdpa net: move iova tree creation from init to start")
Reported-by: Yanhui Ma <yama@redhat.com>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <20230913123408.2819185-1-eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-10-04 18:15:06 -04:00
Stefan Hajnoczi
e77db790d1 vdpa: fix gcc cvq_isolated uninitialized variable warning
gcc 13.2.1 emits the following warning:

  net/vhost-vdpa.c: In function ‘net_vhost_vdpa_init.constprop’:
  net/vhost-vdpa.c:1394:25: error: ‘cvq_isolated’ may be used uninitialized [-Werror=maybe-uninitialized]
   1394 |         s->cvq_isolated = cvq_isolated;
        |         ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~
  net/vhost-vdpa.c:1355:9: note: ‘cvq_isolated’ was declared here
   1355 |     int cvq_isolated;
        |         ^~~~~~~~~~~~
  cc1: all warnings being treated as errors

Cc: Eugenio Pérez <eperezma@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20230911215435.4156314-1-stefanha@redhat.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-10-04 18:15:06 -04:00
Hawkins Jiawei
b0de17a2e2 vhost: Add count argument to vhost_svq_poll()
Next patches in this series will no longer perform an
immediate poll and check of the device's used buffers
for each CVQ state load command. Instead, they will
send CVQ state load commands in parallel by polling
multiple pending buffers at once.

To achieve this, this patch refactoring vhost_svq_poll()
to accept a new argument `num`, which allows vhost_svq_poll()
to wait for the device to use multiple elements,
rather than polling for a single element.

Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <950b3bfcfc5d446168b9d6a249d554a013a691d4.1693287885.git.yin31149@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-10-04 04:54:23 -04:00
Eugenio Pérez
f13f5f6412 vdpa: remove net cvq migration blocker
Now that we have add migration blockers if the device does not support
all the needed features, remove the general blocker applied to all net
devices with CVQ.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20230822085330.3978829-6-eperezma@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-10-04 04:54:22 -04:00
Eugenio Pérez
6c4825476a vdpa: move vhost_vdpa_set_vring_ready to the caller
Doing that way allows CVQ to be enabled before the dataplane vqs,
restoring the state as MQ or MAC addresses properly in the case of a
migration.

The patch does it by defining a ->load NetClientInfo callback also for
dataplane.  Ideally, this should be done by an independent patch, but
the function is already static so it would only add an empty
vhost_vdpa_net_data_load stub.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <20230822085330.3978829-5-eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-10-04 04:54:21 -04:00
Eugenio Pérez
f3fada598c vdpa: rename vhost_vdpa_net_load to vhost_vdpa_net_cvq_load
Next patches will add the corresponding data load.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20230822085330.3978829-4-eperezma@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-10-04 04:54:19 -04:00
Eugenio Pérez
b40eba9cdd vdpa: use first queue SVQ state for CVQ default
Previous to this patch the only way CVQ would be shadowed is if it does
support to isolate CVQ group or if all vqs were shadowed from the
beginning.  The second condition was checked at the beginning, and no
more configuration was done.

After this series we need to check if data queues are shadowed because
they are in the middle of the migration.  As checking if they are
shadowed already covers the previous case, let's just mimic it.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20230822085330.3978829-2-eperezma@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-10-04 04:54:17 -04:00
Hawkins Jiawei
e213c45a04 vdpa: Allow VIRTIO_NET_F_CTRL_VLAN in SVQ
Enable SVQ with VIRTIO_NET_F_CTRL_VLAN feature.

Co-developed-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
Message-Id: <38dc63102a42c31c72fd293d0e6e2828fd54c86e.1690106284.git.yin31149@gmail.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-10-04 04:54:13 -04:00
Hawkins Jiawei
8f7e996748 vdpa: Restore vlan filtering state
This patch introduces vhost_vdpa_net_load_single_vlan()
and vhost_vdpa_net_load_vlan() to restore the vlan
filtering state at device's startup.

Co-developed-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
Message-Id: <e76a29f77bb3f386e4a643c8af94b77b775d1752.1690106284.git.yin31149@gmail.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-10-04 04:54:10 -04:00
Philippe Mathieu-Daudé
1728593a82 net/eth: Clean up local variable shadowing
Fix:

  net/eth.c:435:20: error: declaration shadows a local variable [-Werror,-Wshadow]
            size_t input_size = iov_size(pkt, pkt_frags);
                   ^
  net/eth.c:413:16: note: previous declaration is here
        size_t input_size = iov_size(pkt, pkt_frags);
               ^

Suggested-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20230904161235.84651-16-philmd@linaro.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2023-09-29 10:07:16 +02:00
Peter Maydell
6d7a53e9f1 net/tap: Avoid variable-length array
Use a heap allocation instead of a variable length array in
tap_receive_iov().

The codebase has very few VLAs, and if we can get rid of them all we
can make the compiler error on new additions.  This is a defensive
measure against security bugs where an on-stack dynamic allocation
isn't correctly size-checked (e.g.  CVE-2021-3527).

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Francisco Iglesias <frasse.iglesias@gmail.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2023-09-18 14:36:13 +08:00
Peter Maydell
c4cf68198e net/dump: Avoid variable length array
Use a g_autofree heap allocation instead of a variable length
array in dump_receive_iov().

The codebase has very few VLAs, and if we can get rid of them all we
can make the compiler error on new additions.  This is a defensive
measure against security bugs where an on-stack dynamic allocation
isn't correctly size-checked (e.g.  CVE-2021-3527).

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Francisco Iglesias <frasse.iglesias@gmail.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2023-09-18 14:36:13 +08:00
Ilya Maximets
cb039ef3d9 net: add initial support for AF_XDP network backend
AF_XDP is a network socket family that allows communication directly
with the network device driver in the kernel, bypassing most or all
of the kernel networking stack.  In the essence, the technology is
pretty similar to netmap.  But, unlike netmap, AF_XDP is Linux-native
and works with any network interfaces without driver modifications.
Unlike vhost-based backends (kernel, user, vdpa), AF_XDP doesn't
require access to character devices or unix sockets.  Only access to
the network interface itself is necessary.

This patch implements a network backend that communicates with the
kernel by creating an AF_XDP socket.  A chunk of userspace memory
is shared between QEMU and the host kernel.  4 ring buffers (Tx, Rx,
Fill and Completion) are placed in that memory along with a pool of
memory buffers for the packet data.  Data transmission is done by
allocating one of the buffers, copying packet data into it and
placing the pointer into Tx ring.  After transmission, device will
return the buffer via Completion ring.  On Rx, device will take
a buffer form a pre-populated Fill ring, write the packet data into
it and place the buffer into Rx ring.

AF_XDP network backend takes on the communication with the host
kernel and the network interface and forwards packets to/from the
peer device in QEMU.

Usage example:

  -device virtio-net-pci,netdev=guest1,mac=00:16:35:AF:AA:5C
  -netdev af-xdp,ifname=ens6f1np1,id=guest1,mode=native,queues=1

XDP program bridges the socket with a network interface.  It can be
attached to the interface in 2 different modes:

1. skb - this mode should work for any interface and doesn't require
         driver support.  With a caveat of lower performance.

2. native - this does require support from the driver and allows to
            bypass skb allocation in the kernel and potentially use
            zero-copy while getting packets in/out userspace.

By default, QEMU will try to use native mode and fall back to skb.
Mode can be forced via 'mode' option.  To force 'copy' even in native
mode, use 'force-copy=on' option.  This might be useful if there is
some issue with the driver.

Option 'queues=N' allows to specify how many device queues should
be open.  Note that all the queues that are not open are still
functional and can receive traffic, but it will not be delivered to
QEMU.  So, the number of device queues should generally match the
QEMU configuration, unless the device is shared with something
else and the traffic re-direction to appropriate queues is correctly
configured on a device level (e.g. with ethtool -N).
'start-queue=M' option can be used to specify from which queue id
QEMU should start configuring 'N' queues.  It might also be necessary
to use this option with certain NICs, e.g. MLX5 NICs.  See the docs
for examples.

In a general case QEMU will need CAP_NET_ADMIN and CAP_SYS_ADMIN
or CAP_BPF capabilities in order to load default XSK/XDP programs to
the network interface and configure BPF maps.  It is possible, however,
to run with no capabilities.  For that to work, an external process
with enough capabilities will need to pre-load default XSK program,
create AF_XDP sockets and pass their file descriptors to QEMU process
on startup via 'sock-fds' option.  Network backend will need to be
configured with 'inhibit=on' to avoid loading of the program.
QEMU will need 32 MB of locked memory (RLIMIT_MEMLOCK) per queue
or CAP_IPC_LOCK.

There are few performance challenges with the current network backends.

First is that they do not support IO threads.  This means that data
path is handled by the main thread in QEMU and may slow down other
work or may be slowed down by some other work.  This also means that
taking advantage of multi-queue is generally not possible today.

Another thing is that data path is going through the device emulation
code, which is not really optimized for performance.  The fastest
"frontend" device is virtio-net.  But it's not optimized for heavy
traffic either, because it expects such use-cases to be handled via
some implementation of vhost (user, kernel, vdpa).  In practice, we
have virtio notifications and rcu lock/unlock on a per-packet basis
and not very efficient accesses to the guest memory.  Communication
channels between backend and frontend devices do not allow passing
more than one packet at a time as well.

Some of these challenges can be avoided in the future by adding better
batching into device emulation or by implementing vhost-af-xdp variant.

There are also a few kernel limitations.  AF_XDP sockets do not
support any kinds of checksum or segmentation offloading.  Buffers
are limited to a page size (4K), i.e. MTU is limited.  Multi-buffer
support implementation for AF_XDP is in progress, but not ready yet.
Also, transmission in all non-zero-copy modes is synchronous, i.e.
done in a syscall.  That doesn't allow high packet rates on virtual
interfaces.

However, keeping in mind all of these challenges, current implementation
of the AF_XDP backend shows a decent performance while running on top
of a physical NIC with zero-copy support.

Test setup:

2 VMs running on 2 physical hosts connected via ConnectX6-Dx card.
Network backend is configured to open the NIC directly in native mode.
The driver supports zero-copy.  NIC is configured to use 1 queue.

Inside a VM - iperf3 for basic TCP performance testing and dpdk-testpmd
for PPS testing.

iperf3 result:
 TCP stream      : 19.1 Gbps

dpdk-testpmd (single queue, single CPU core, 64 B packets) results:
 Tx only         : 3.4 Mpps
 Rx only         : 2.0 Mpps
 L2 FWD Loopback : 1.5 Mpps

In skb mode the same setup shows much lower performance, similar to
the setup where pair of physical NICs is replaced with veth pair:

iperf3 result:
  TCP stream      : 9 Gbps

dpdk-testpmd (single queue, single CPU core, 64 B packets) results:
  Tx only         : 1.2 Mpps
  Rx only         : 1.0 Mpps
  L2 FWD Loopback : 0.7 Mpps

Results in skb mode or over the veth are close to results of a tap
backend with vhost=on and disabled segmentation offloading bridged
with a NIC.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> (docker/lcitool)
Signed-off-by: Jason Wang <jasowang@redhat.com>
2023-09-18 14:36:13 +08:00
Andrew Melnychenko
9da1684954 virtio-net: Add USO flags to vhost support.
New features are subject to check with vhost-user and vdpa.

Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
Signed-off-by: Andrew Melnychenko <andrew@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2023-09-18 14:36:13 +08:00
Yuri Benditovich
f03e0cf63b tap: Add check for USO features
Tap indicates support for USO features according to
capabilities of current kernel module.

Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
Signed-off-by: Andrew Melnychecnko <andrew@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2023-09-18 14:36:13 +08:00
Andrew Melnychenko
2ab0ec3121 tap: Add USO support to tap device.
Passing additional parameters (USOv4 and USOv6 offloads) when
setting TAP offloads

Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
Signed-off-by: Andrew Melnychenko <andrew@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2023-09-18 14:36:13 +08:00
Jonathan Perkin
fb0a8b0e23 meson: Fix targetos match for illumos and Solaris.
qemu 8.1.0 breaks on illumos platforms due to _XOPEN_SOURCE and others no longer being set correctly, leading to breakage such as:

  https://us-central.manta.mnx.io/pkgsrc/public/reports/trunk/tools/20230908.1404/qemu-8.1.0/build.log

This is a result of meson conversion which incorrectly matches against 'solaris' instead of 'sunos' for uname.

First time submitting a patch here, hope I did it correctly.  Thanks.

Signed-off-by: Jonathan Perkin <jonathan@perkin.org.uk>
Message-ID: <ZPtdxtum9UVPy58J@perkin.org.uk>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-09-13 09:33:51 +02:00
Michael Tokarev
0a19d87995 misc/other: spelling fixes
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Reviewed-by: Eric Blake <eblake@redhat.com>
2023-09-08 13:08:52 +03:00
Stefan Hajnoczi
03a3a62fbd * only build util/async-teardown.c when system build is requested
* target/i386: fix BQL handling of the legacy FERR interrupts
 * target/i386: fix memory operand size for CVTPS2PD
 * target/i386: Add support for AMX-COMPLEX in CPUID enumeration
 * compile plugins on Darwin
 * configure and meson cleanups
 * drop mkvenv support for Python 3.7 and Debian10
 * add wrap file for libblkio
 * tweak KVM stubs
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmT5t6UUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroMmjwf+MpvVuq+nn+3PqGUXgnzJx5ccA5ne
 O9Xy8+1GdlQPzBw/tPovxXDSKn3HQtBfxObn2CCE1tu/4uHWpBA1Vksn++NHdUf2
 P0yoHxGskJu5iYYTtIcNw5cH2i+AizdiXuEjhfNjqD5Y234cFoHnUApt9e3zBvVO
 cwGD7WpPuSb4g38hHkV6nKcx72o7b4ejDToqUVZJ2N+RkddSqB03fSdrOru0hR7x
 V+lay0DYdFszNDFm05LJzfDbcrHuSryGA91wtty7Fzj6QhR/HBHQCUZJxMB5PI7F
 Zy4Zdpu60zxtSxUqeKgIi7UhNFgMcax2Hf9QEqdc/B4ARoBbboh4q4u8kQ==
 =dH7/
 -----END PGP SIGNATURE-----

Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging

* only build util/async-teardown.c when system build is requested
* target/i386: fix BQL handling of the legacy FERR interrupts
* target/i386: fix memory operand size for CVTPS2PD
* target/i386: Add support for AMX-COMPLEX in CPUID enumeration
* compile plugins on Darwin
* configure and meson cleanups
* drop mkvenv support for Python 3.7 and Debian10
* add wrap file for libblkio
* tweak KVM stubs

# -----BEGIN PGP SIGNATURE-----
#
# iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmT5t6UUHHBib256aW5p
# QHJlZGhhdC5jb20ACgkQv/vSX3jHroMmjwf+MpvVuq+nn+3PqGUXgnzJx5ccA5ne
# O9Xy8+1GdlQPzBw/tPovxXDSKn3HQtBfxObn2CCE1tu/4uHWpBA1Vksn++NHdUf2
# P0yoHxGskJu5iYYTtIcNw5cH2i+AizdiXuEjhfNjqD5Y234cFoHnUApt9e3zBvVO
# cwGD7WpPuSb4g38hHkV6nKcx72o7b4ejDToqUVZJ2N+RkddSqB03fSdrOru0hR7x
# V+lay0DYdFszNDFm05LJzfDbcrHuSryGA91wtty7Fzj6QhR/HBHQCUZJxMB5PI7F
# Zy4Zdpu60zxtSxUqeKgIi7UhNFgMcax2Hf9QEqdc/B4ARoBbboh4q4u8kQ==
# =dH7/
# -----END PGP SIGNATURE-----
# gpg: Signature made Thu 07 Sep 2023 07:44:37 EDT
# gpg:                using RSA key F13338574B662389866C7682BFFBD25F78C7AE83
# gpg:                issuer "pbonzini@redhat.com"
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full]
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>" [full]
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* tag 'for-upstream' of https://gitlab.com/bonzini/qemu: (51 commits)
  docs/system/replay: do not show removed command line option
  subprojects: add wrap file for libblkio
  sysemu/kvm: Restrict kvm_pc_setup_irq_routing() to x86 targets
  sysemu/kvm: Restrict kvm_has_pit_state2() to x86 targets
  sysemu/kvm: Restrict kvm_get_apic_state() to x86 targets
  sysemu/kvm: Restrict kvm_arch_get_supported_cpuid/msr() to x86 targets
  target/i386: Restrict declarations specific to CONFIG_KVM
  target/i386: Allow elision of kvm_hv_vpindex_settable()
  target/i386: Allow elision of kvm_enable_x2apic()
  target/i386: Remove unused KVM stubs
  target/i386/cpu-sysemu: Inline kvm_apic_in_kernel()
  target/i386/helper: Restrict KVM declarations to system emulation
  hw/i386/fw_cfg: Include missing 'cpu.h' header
  hw/i386/pc: Include missing 'cpu.h' header
  hw/i386/pc: Include missing 'sysemu/tcg.h' header
  Revert "mkvenv: work around broken pip installations on Debian 10"
  mkvenv: assume presence of importlib.metadata
  Python: Drop support for Python 3.7
  configure: remove dead code
  meson: list leftover CONFIG_* symbols
  ...

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-09-07 10:29:06 -04:00
Paolo Bonzini
73258b3864 configure, meson: remove CONFIG_SOLARIS from config-host.mak
CONFIG_SOLARIS is only used to pick tap implementations.  But the
target OS is invariant and does not depend on the configuration, so move
away from config_host and just use unconditional rules in softmmu_ss.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-09-07 13:32:37 +02:00
Philippe Mathieu-Daudé
53c7c92422 hw/char: Have FEWatchFunc handlers return G_SOURCE_CONTINUE/REMOVE
GLib recommend to use G_SOURCE_REMOVE / G_SOURCE_CONTINUE
for GSourceFunc callbacks. Our FEWatchFunc is a GSourceFunc
returning such value. Use such definitions which are
"more memorable" [*].

[*] https://docs.gtk.org/glib/callback.SourceFunc.html#return-value

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20230705133139.54419-5-philmd@linaro.org>
2023-08-31 19:47:43 +02:00
Hawkins Jiawei
d669b7bba2 vdpa: Allow VIRTIO_NET_F_CTRL_RX_EXTRA in SVQ
Enable SVQ with VIRTIO_NET_F_CTRL_RX_EXTRA feature.

Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <15ecc49975f9b8d1316ed4296879564a18abf31e.1688797728.git.yin31149@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-07-10 18:59:32 -04:00
Hawkins Jiawei
4fd180c7bb vdpa: Restore packet receive filtering state relative with _F_CTRL_RX_EXTRA feature
This patch refactors vhost_vdpa_net_load_rx() to
restore the packet receive filtering state in relation to
VIRTIO_NET_F_CTRL_RX_EXTRA feature at device's startup.

Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
Message-Id: <abddc477a476f756de6e3d24c0e9f7b21c99a4c1.1688797728.git.yin31149@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-07-10 18:59:32 -04:00
Hawkins Jiawei
ea6eec4979 vdpa: Allow VIRTIO_NET_F_CTRL_RX in SVQ
Enable SVQ with VIRTIO_NET_F_CTRL_RX feature.

Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <5d6173a6d7c4c514c98362b404c019f52d73b06c.1688743107.git.yin31149@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-07-10 18:59:32 -04:00
Hawkins Jiawei
fee364e4b1 vdpa: Avoid forwarding large CVQ command failures
Due to the size limitation of the out buffer sent to the vdpa device,
which is determined by vhost_vdpa_net_cvq_cmd_len(), excessive CVQ
command is truncated in QEMU. As a result, the vdpa device rejects
this flawd CVQ command.

However, the problem is that, the VIRTIO_NET_CTRL_MAC_TABLE_SET
CVQ command has a variable length, which may exceed
vhost_vdpa_net_cvq_cmd_len() if the guest sets more than
`MAC_TABLE_ENTRIES` MAC addresses for the filter table.

This patch solves this problem by following steps:

  * Increase the out buffer size to vhost_vdpa_net_cvq_cmd_page_len(),
which represents the size of the buffer that is allocated and mmaped.
This ensures that everything works correctly as long as the guest
sets fewer than `(vhost_vdpa_net_cvq_cmd_page_len() -
sizeof(struct virtio_net_ctrl_hdr)
- 2 * sizeof(struct virtio_net_ctrl_mac)) / ETH_ALEN` MAC addresses.
    Considering the highly unlikely scenario for the guest setting
more than that number of MAC addresses for the filter table, this
should work fine for the majority of cases.

  * If the CVQ command exceeds vhost_vdpa_net_cvq_cmd_page_len(),
instead of directly sending this CVQ command, QEMU should send
a VIRTIO_NET_CTRL_RX_PROMISC CVQ command to vdpa device. Addtionally,
a fake VIRTIO_NET_CTRL_MAC_TABLE_SET command including
(`MAC_TABLE_ENTRIES` + 1) non-multicast MAC addresses and
(`MAC_TABLE_ENTRIES` + 1) multicast MAC addresses should be provided
to the device model.
    By doing so, the vdpa device turns promiscuous mode on, aligning
with the VirtIO standard. The device model marks
`n->mac_table.uni_overflow` and `n->mac_table.multi_overflow`,
which aligns with the state of the vdpa device.

Note that the bug cannot be triggered at the moment, since
VIRTIO_NET_F_CTRL_RX feature is not enabled for SVQ.

Fixes: 7a7f87e94c ("vdpa: Move command buffers map to start of net device")
Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
Message-Id: <267e15e4eed2d7aeb9887f193da99a13d22a2f1d.1688743107.git.yin31149@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-07-10 18:59:32 -04:00
Hawkins Jiawei
45c4101828 vdpa: Accessing CVQ header through its structure
We can access the CVQ header through `struct virtio_net_ctrl_hdr`,
instead of accessing it through a `uint8_t` pointer,
which improves the code's readability and maintainability.

Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
Message-Id: <cd522e06a4371e9d6b8a1c1a86f90a92401d56e8.1688743107.git.yin31149@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-07-10 18:59:32 -04:00
Hawkins Jiawei
b12f907eea vdpa: Restore packet receive filtering state relative with _F_CTRL_RX feature
This patch introduces vhost_vdpa_net_load_rx_mode()
and vhost_vdpa_net_load_rx() to restore the packet
receive filtering state in relation to
VIRTIO_NET_F_CTRL_RX feature at device's startup.

Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
Message-Id: <804cedac93e19ba3b810d52b274ca5ec11469f09.1688743107.git.yin31149@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-07-10 18:59:32 -04:00
Hawkins Jiawei
0ddcecb8f2 vdpa: Restore MAC address filtering state
This patch refactors vhost_vdpa_net_load_mac() to
restore the MAC address filtering state at device's startup.

Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
Message-Id: <4b9550c14bc8c98c8f48e04dbf3d3ac41489d3fd.1688743107.git.yin31149@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-07-10 18:59:32 -04:00
Hawkins Jiawei
2848c6aa75 vdpa: Use iovec for vhost_vdpa_net_load_cmd()
According to VirtIO standard, "The driver MUST follow
the VIRTIO_NET_CTRL_MAC_TABLE_SET command by a le32 number,
followed by that number of non-multicast MAC addresses,
followed by another le32 number, followed by that number
of multicast addresses."

Considering that these data is not stored in contiguous memory,
this patch refactors vhost_vdpa_net_load_cmd() to accept
scattered data, eliminating the need for an addtional data copy or
packing the data into s->cvq_cmd_out_buffer outside of
vhost_vdpa_net_load_cmd().

Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
Message-Id: <3482cc50eebd13db4140b8b5dec9d0cc25b20b1b.1688743107.git.yin31149@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-07-10 18:59:32 -04:00
Hawkins Jiawei
031b1abacb vdpa: Fix possible use-after-free for VirtQueueElement
QEMU uses vhost_handle_guest_kick() to forward guest's available
buffers to the vdpa device in SVQ avail ring.

In vhost_handle_guest_kick(), a `g_autofree` `elem` is used to
iterate through the available VirtQueueElements. This `elem` is
then passed to `svq->ops->avail_handler`, specifically to the
vhost_vdpa_net_handle_ctrl_avail(). If this handler fails to
process the CVQ command, vhost_handle_guest_kick() regains
ownership of the `elem`, and either frees it or requeues it.

Yet the problem is that, vhost_vdpa_net_handle_ctrl_avail()
mistakenly frees the `elem`, even if it fails to forward the
CVQ command to vdpa device. This can result in a use-after-free
for the `elem` in vhost_handle_guest_kick().

This patch solves this problem by refactoring
vhost_vdpa_net_handle_ctrl_avail() to only freeing the `elem` if
it owns it.

Fixes: bd907ae4b0 ("vdpa: manual forward CVQ buffers")
Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
Message-Id: <e3f2d7db477734afe5c6a5ab3fa8b8317514ea34.1688746840.git.yin31149@gmail.com>
Reviewed-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-07-10 18:59:32 -04:00
Hawkins Jiawei
6f34807116 vdpa: Return -EIO if device ack is VIRTIO_NET_ERR in _load_offloads()
According to VirtIO standard, "The class, command and
command-specific-data are set by the driver,
and the device sets the ack byte.
There is little it can do except issue a diagnostic
if ack is not VIRTIO_NET_OK."

Therefore, QEMU should stop sending the queued SVQ commands and
cancel the device startup if the device's ack is not VIRTIO_NET_OK.

Yet the problem is that, vhost_vdpa_net_load_offloads() returns 1 based on
`*s->status != VIRTIO_NET_OK` when the device's ack is VIRTIO_NET_ERR.
As a result, net->nc->info->load() also returns 1, this makes
vhost_net_start_one() incorrectly assume the device state is
successfully loaded by vhost_vdpa_net_load() and return 0, instead of
goto `fail` label to cancel the device startup, as vhost_net_start_one()
only cancels the device startup when net->nc->info->load() returns a
negative value.

This patch fixes this problem by returning -EIO when the device's
ack is not VIRTIO_NET_OK.

Fixes: 0b58d3686a ("vdpa: Add vhost_vdpa_net_load_offloads()")
Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <b0396b80e96322b86f1a0b10c098fc1edd947d72.1688438055.git.yin31149@gmail.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-07-10 18:59:32 -04:00
Hawkins Jiawei
f45fd95ec9 vdpa: Return -EIO if device ack is VIRTIO_NET_ERR in _load_mq()
According to VirtIO standard, "The class, command and
command-specific-data are set by the driver,
and the device sets the ack byte.
There is little it can do except issue a diagnostic
if ack is not VIRTIO_NET_OK."

Therefore, QEMU should stop sending the queued SVQ commands and
cancel the device startup if the device's ack is not VIRTIO_NET_OK.

Yet the problem is that, vhost_vdpa_net_load_mq() returns 1 based on
`*s->status != VIRTIO_NET_OK` when the device's ack is VIRTIO_NET_ERR.
As a result, net->nc->info->load() also returns 1, this makes
vhost_net_start_one() incorrectly assume the device state is
successfully loaded by vhost_vdpa_net_load() and return 0, instead of
goto `fail` label to cancel the device startup, as vhost_net_start_one()
only cancels the device startup when net->nc->info->load() returns a
negative value.

This patch fixes this problem by returning -EIO when the device's
ack is not VIRTIO_NET_OK.

Fixes: f64c7cda69 ("vdpa: Add vhost_vdpa_net_load_mq")
Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <ec515ebb0b4f56368751b9e318e245a5d994fa72.1688438055.git.yin31149@gmail.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-07-10 18:59:32 -04:00
Hawkins Jiawei
b479bc3c9d vdpa: Return -EIO if device ack is VIRTIO_NET_ERR in _load_mac()
According to VirtIO standard, "The class, command and
command-specific-data are set by the driver,
and the device sets the ack byte.
There is little it can do except issue a diagnostic
if ack is not VIRTIO_NET_OK."

Therefore, QEMU should stop sending the queued SVQ commands and
cancel the device startup if the device's ack is not VIRTIO_NET_OK.

Yet the problem is that, vhost_vdpa_net_load_mac() returns 1 based on
`*s->status != VIRTIO_NET_OK` when the device's ack is VIRTIO_NET_ERR.
As a result, net->nc->info->load() also returns 1, this makes
vhost_net_start_one() incorrectly assume the device state is
successfully loaded by vhost_vdpa_net_load() and return 0, instead of
goto `fail` label to cancel the device startup, as vhost_net_start_one()
only cancels the device startup when net->nc->info->load() returns a
negative value.

This patch fixes this problem by returning -EIO when the device's
ack is not VIRTIO_NET_OK.

Fixes: f73c0c43ac ("vdpa: extract vhost_vdpa_net_load_mac from vhost_vdpa_net_load")
Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <a21731518644abbd0c495c5b7960527c5911f80d.1688438055.git.yin31149@gmail.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-07-10 18:59:32 -04:00
Hawkins Jiawei
2875a0ca02 vdpa: Sort vdpa_feature_bits array alphabetically
This patch sorts the vdpa_feature_bits array
alphabetically in ascending order to avoid future duplicates.

Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-07-08 07:24:38 +03:00
Hawkins Jiawei
aee9701729 vdpa: Delete duplicated VIRTIO_NET_F_RSS in vdpa_feature_bits
This entry was duplicated on referenced commit. Removing it.

Fixes: 402378407d ("vhost-vdpa: multiqueue support")
Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-07-08 07:24:38 +03:00
Laurent Vivier
b6aeee0298 net: socket: remove net_init_socket()
Move the file descriptor type checking before doing anything with it.
If it's not usable, don't close it as it could be in use by another
part of QEMU, only fail and report an error.

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2023-07-07 16:35:12 +08:00
Laurent Vivier
23455ae341 net: socket: move fd type checking to its own function
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2023-07-07 16:35:12 +08:00
Laurent Vivier
006c3fa74c net: socket: prepare to cleanup net_init_socket()
Use directly net_socket_fd_init_stream() and net_socket_fd_init_dgram()
when the socket type is already known.

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2023-07-07 16:35:12 +08:00
Ani Sinha
a0d7215e33 vhost-vdpa: do not cleanup the vdpa/vhost-net structures if peer nic is present
When a peer nic is still attached to the vdpa backend, it is too early to free
up the vhost-net and vdpa structures. If these structures are freed here, then
QEMU crashes when the guest is being shut down. The following call chain
would result in an assertion failure since the pointer returned from
vhost_vdpa_get_vhost_net() would be NULL:

do_vm_stop() -> vm_state_notify() -> virtio_set_status() ->
virtio_net_vhost_status() -> get_vhost_net().

Therefore, we defer freeing up the structures until at guest shutdown
time when qemu_cleanup() calls net_cleanup() which then calls
qemu_del_net_client() which would eventually call vhost_vdpa_cleanup()
again to free up the structures. This time, the loop in net_cleanup()
ensures that vhost_vdpa_cleanup() will be called one last time when
all the peer nics are detached and freed.

All unit tests pass with this change.

CC: imammedo@redhat.com
CC: jusual@redhat.com
CC: mst@redhat.com
Fixes: CVE-2023-3301
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2128929
Signed-off-by: Ani Sinha <anisinha@redhat.com>
Message-Id: <20230619065209.442185-1-anisinha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-06-26 09:50:00 -04:00
Eugenio Pérez
d45243bcfc vdpa: fix not using CVQ buffer in case of error
Bug introducing when refactoring.  Otherway, the guest never received
the used buffer.

Fixes: be4278b65f ("vdpa: extract vhost_vdpa_net_cvq_add from vhost_vdpa_net_handle_ctrl_avail")
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <20230602173451.1917999-1-eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
2023-06-26 09:50:00 -04:00
Eugenio Pérez
51e84244a7 vdpa: mask _F_CTRL_GUEST_OFFLOADS for vhost vdpa devices
QEMU does not emulate it so it must be disabled as long as the backend
does not support it.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <20230602173328.1917385-1-eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
2023-06-26 09:50:00 -04:00
Hawkins Jiawei
4b4a1378b9 vdpa: Allow VIRTIO_NET_F_CTRL_GUEST_OFFLOADS in SVQ
Enable SVQ with VIRTIO_NET_F_CTRL_GUEST_OFFLOADS feature.

Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <778d642ecae6deed8a218b0e6232e4d7bb96b439.1685704856.git.yin31149@gmail.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Eugenio Pérez <eperezma@redhat.com>
Tested-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-06-26 09:50:00 -04:00
Hawkins Jiawei
0b58d3686a vdpa: Add vhost_vdpa_net_load_offloads()
This patch introduces vhost_vdpa_net_load_offloads() to
restore offloads state at device's startup.

Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
Message-Id: <7e2b5cad9c48c917df53d80dec27dbfeb513e1a3.1685704856.git.yin31149@gmail.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Eugenio Pérez <eperezma@redhat.com>
Tested-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-06-26 09:50:00 -04:00
Hawkins Jiawei
02d3bf099b vdpa: reuse virtio_vdev_has_feature()
We can use virtio_vdev_has_feature() instead of manually
accessing the features.

Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
Acked-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <ff838d30206209fd865511b16ffb34cc0d5e8d8f.1685704856.git.yin31149@gmail.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Eugenio Pérez <eperezma@redhat.com>
Tested-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-06-26 09:50:00 -04:00