qemu-e2k

Author	SHA1	Message	Date
Steve Sistare	d9cda21303	migration: simplify notifiers Pass the callback function to add_migration_state_change_notifier so that migration can initialize the notifier on add and clear it on delete, which simplifies the call sites. Shorten the function names so the extra arg can be added more legibly. Hide the global notifier list in a new function migration_call_notifiers, and make it externally visible so future live update code can call it. No functional change. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Peter Xu <peterx@redhat.com> Tested-by: Michael Galaxy <mgalaxy@akamai.com> Reviewed-by: Michael Galaxy <mgalaxy@akamai.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <1686148954-250144-1-git-send-email-steven.sistare@oracle.com>	2023-10-20 08:51:41 +02:00
Hawkins Jiawei	acec5f685c	vdpa: Send cvq state load commands in parallel This patch enables sending CVQ state load commands in parallel at device startup by the following steps: * Refactor vhost_vdpa_net_load_cmd() to iterate through the control commands shadow buffers. This allows different CVQ state load commands to use their own unique buffers. * Delay the polling and checking of buffers until either the SVQ is full or control commands shadow buffers are full. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1578 Signed-off-by: Hawkins Jiawei <yin31149@gmail.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Message-Id: <9350f32278e39f7bce297b8f2d82dac27c6f8c9a.1697165821.git.yin31149@gmail.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-10-18 10:41:50 -04:00
Hawkins Jiawei	1d7e2a8fd4	vdpa: Introduce cursors to vhost_vdpa_net_loadx() This patch introduces two new arguments, `out_cursor` and `in_cursor`, to vhost_vdpa_net_loadx(). Additionally, it includes a helper function vhost_vdpa_net_load_cursor_reset() for resetting these cursors. Furthermore, this patch refactors vhost_vdpa_net_load_cmd() so that vhost_vdpa_net_load_cmd() prepares buffers for the device using the cursor arguments, instead of directly accessing the `s->cvq_cmd_out_buffer` and `s->status` fields. By making this change, next patches in this series can refactor vhost_vdpa_net_load_cmd() directly to iterate through the control commands shadow buffers, allowing QEMU to send CVQ state load commands in parallel at device startup. Signed-off-by: Hawkins Jiawei <yin31149@gmail.com> Message-Id: <1c6516e233a14cc222f0884e148e4e1adceda78d.1697165821.git.yin31149@gmail.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-10-18 10:41:50 -04:00
Hawkins Jiawei	a864a3219d	vdpa: Move vhost_svq_poll() to the caller of vhost_vdpa_net_cvq_add() This patch moves vhost_svq_poll() to the caller of vhost_vdpa_net_cvq_add() and introduces a helper function. By making this change, next patches in this series are able to refactor vhost_vdpa_net_load_x() only to delay the polling and checking process until either the SVQ is full or control commands shadow buffers are full. Signed-off-by: Hawkins Jiawei <yin31149@gmail.com> Message-Id: <196cadb55175a75275660c6634a538289f027ae3.1697165821.git.yin31149@gmail.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-10-18 10:41:50 -04:00
Hawkins Jiawei	24e59cfe0c	vdpa: Check device ack in vhost_vdpa_net_load_rx_mode() Considering that vhost_vdpa_net_load_rx_mode() is only called within vhost_vdpa_net_load_rx() now, this patch refactors vhost_vdpa_net_load_rx_mode() to include a check for the device's ack, simplifying the code and improving its maintainability. Signed-off-by: Hawkins Jiawei <yin31149@gmail.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Message-Id: <68811d52f96ae12d68f0d67d996ac1642a623943.1697165821.git.yin31149@gmail.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-10-18 10:41:49 -04:00
Hawkins Jiawei	327dedb8df	vdpa: Avoid using vhost_vdpa_net_load_*() outside vhost_vdpa_net_load() Next patches in this series will refactor vhost_vdpa_net_load_cmd() to iterate through the control commands shadow buffers, allowing QEMU to send CVQ state load commands in parallel at device startup. Considering that QEMU always forwards the CVQ command serialized outside of vhost_vdpa_net_load(), it is more elegant to send the CVQ commands directly without invoking vhost_vdpa_net_load_*() helpers. Signed-off-by: Hawkins Jiawei <yin31149@gmail.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Message-Id: <254f0618efde7af7229ba4fdada667bb9d318991.1697165821.git.yin31149@gmail.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-10-18 10:41:49 -04:00
Hawkins Jiawei	0e6bff0d43	vdpa: Use iovec for vhost_vdpa_net_cvq_add() Next patches in this series will no longer perform an immediate poll and check of the device's used buffers for each CVQ state load command. Consequently, there will be multiple pending buffers in the shadow VirtQueue, making it a must for every control command to have its own buffer. To achieve this, this patch refactors vhost_vdpa_net_cvq_add() to accept `struct iovec`, which eliminates the coupling of control commands to `s->cvq_cmd_out_buffer` and `s->status`, allowing them to use their own buffer. Signed-off-by: Hawkins Jiawei <yin31149@gmail.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Message-Id: <8a328f146fb043f34edb75ba6d043d2d6de88f99.1697165821.git.yin31149@gmail.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-10-18 10:41:49 -04:00
Philippe Mathieu-Daudé	73071f1923	net/net: Clean up global variable shadowing Fix: net/net.c:1680:35: error: declaration shadows a variable in the global scope [-Werror,-Wshadow] bool netdev_is_modern(const char *optarg) ^ net/net.c:1714:38: error: declaration shadows a variable in the global scope [-Werror,-Wshadow] void netdev_parse_modern(const char *optarg) ^ net/net.c:1728:60: error: declaration shadows a variable in the global scope [-Werror,-Wshadow] void net_client_parse(QemuOptsList *opts_list, const char *optarg) ^ /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/getopt.h:77:14: note: previous declaration is here extern char *optarg; /* getopt(3) external variables */ ^ Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-ID: <20231004120019.93101-4-philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2023-10-06 13:27:43 +02:00
Stefan Hajnoczi	2f3913f4b2	Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu into staging virtio,pci: features, cleanups. vdpa: shadow vq vlan support, net migration with cvq; cxl: support emulating 4 HDM decoders, serial number extended capability; virtio: shared dma-buf. Fixes, cleanups all over the place. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> * tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu: (53 commits) libvhost-user: handle shared_object msg; vhost-user: add shared_object msg; hw/display: introduce virtio-dmabuf; util/uuid: add a hash function; virtio: remove unused next argument from virtqueue_split_read_next_desc(); virtio: remove unnecessary thread fence while reading next descriptor; virtio: use shadow_avail_idx while checking number of heads; libvhost-user.c: add assertion to vu_message_read_default; pcie_sriov: unregister_vfs(): fix error path; hw/i386/pc: improve physical address space bound check for 32-bit x86 systems; amd_iommu: Fix APIC address check; vdpa net: follow VirtIO initialization properly at cvq isolation probing; vdpa net: stop probing if cannot set features; vdpa net: fix error message setting virtio status; hw/pci-bridge/cxl-upstream: Add serial number extended capability support; hw/cxl: Support 4 HDM decoders at all levels of topology; hw/cxl: Fix and use same calculation for HDM decoder block size everywhere; hw/cxl: Add utility functions decoder interleave ways and target count.; hw/cxl: Push cxl_decoder_count_enc() and cxl_decode_ig() into .c; vdpa net: zero vhost_vdpa iova_tree pointer at cleanup; ... Conflicts: hw/core/machine.c Context conflict with commit `314e0a84cd` ("hw/core: remove needless includes") because it removed an adjacent #include.	2023-10-05 09:01:01 -04:00
Eugenio Pérez	845ec38ae1	vdpa net: follow VirtIO initialization properly at cvq isolation probing This patch solves a few issues. The most obvious is that the features were set before the ACKNOWLEDGE | DRIVER status bits were set. Current vdpa devices are permissive with this, but it is better to follow the standard. Fixes: `152128d646` ("vdpa: move CVQ isolation check to net_init_vhost_vdpa") Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Message-Id: <20230915170836.3078172-4-eperezma@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-10-04 18:15:06 -04:00
Eugenio Pérez	f1085882d0	vdpa net: stop probing if cannot set features Otherwise it continues the CVQ isolation probing. Fixes: `152128d646` ("vdpa: move CVQ isolation check to net_init_vhost_vdpa") Reported-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Message-Id: <20230915170836.3078172-3-eperezma@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>	2023-10-04 18:15:06 -04:00
Eugenio Pérez	cbc9ae87b5	vdpa net: fix error message setting virtio status It incorrectly prints "error setting features", probably because of a copy-paste mistake. Fixes: `152128d646` ("vdpa: move CVQ isolation check to net_init_vhost_vdpa") Reported-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Message-Id: <20230915170836.3078172-2-eperezma@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>	2023-10-04 18:15:06 -04:00
Eugenio Pérez	0a7a164bc3	vdpa net: zero vhost_vdpa iova_tree pointer at cleanup Not zeroing it causes a SIGSEGV at net device restart if the live migration is cancelled. This happens because CVQ tries to reuse the iova_tree that is present in the first vhost_vdpa device at the end of vhost_vdpa_net_cvq_start. As a consequence, it tries to access an iova_tree that has already been freed. Fixes: `00ef422e9f` ("vdpa net: move iova tree creation from init to start") Reported-by: Yanhui Ma <yama@redhat.com> Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Message-Id: <20230913123408.2819185-1-eperezma@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-10-04 18:15:06 -04:00
Stefan Hajnoczi	e77db790d1	vdpa: fix gcc cvq_isolated uninitialized variable warning gcc 13.2.1 emits the following warning: net/vhost-vdpa.c: In function ‘net_vhost_vdpa_init.constprop’: net/vhost-vdpa.c:1394:25: error: ‘cvq_isolated’ may be used uninitialized [-Werror=maybe-uninitialized] 1394 | s->cvq_isolated = cvq_isolated; | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~ net/vhost-vdpa.c:1355:9: note: ‘cvq_isolated’ was declared here 1355 | int cvq_isolated; | ^~~~~~~~~~~~ cc1: all warnings being treated as errors Cc: Eugenio Pérez <eperezma@redhat.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20230911215435.4156314-1-stefanha@redhat.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-10-04 18:15:06 -04:00
Hawkins Jiawei	b0de17a2e2	vhost: Add count argument to vhost_svq_poll() Next patches in this series will no longer perform an immediate poll and check of the device's used buffers for each CVQ state load command. Instead, they will send CVQ state load commands in parallel by polling multiple pending buffers at once. To achieve this, this patch refactors vhost_svq_poll() to accept a new argument `num`, which allows vhost_svq_poll() to wait for the device to use multiple elements, rather than polling for a single element. Signed-off-by: Hawkins Jiawei <yin31149@gmail.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Message-Id: <950b3bfcfc5d446168b9d6a249d554a013a691d4.1693287885.git.yin31149@gmail.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-10-04 04:54:23 -04:00
Eugenio Pérez	f13f5f6412	vdpa: remove net cvq migration blocker Now that we have added migration blockers if the device does not support all the needed features, remove the general blocker applied to all net devices with CVQ. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230822085330.3978829-6-eperezma@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-10-04 04:54:22 -04:00
Eugenio Pérez	6c4825476a	vdpa: move vhost_vdpa_set_vring_ready to the caller Doing that way allows CVQ to be enabled before the dataplane vqs, restoring the state as MQ or MAC addresses properly in the case of a migration. The patch does it by defining a ->load NetClientInfo callback also for dataplane. Ideally, this should be done by an independent patch, but the function is already static so it would only add an empty vhost_vdpa_net_data_load stub. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Message-Id: <20230822085330.3978829-5-eperezma@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-10-04 04:54:21 -04:00
Eugenio Pérez	f3fada598c	vdpa: rename vhost_vdpa_net_load to vhost_vdpa_net_cvq_load Next patches will add the corresponding data load. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230822085330.3978829-4-eperezma@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-10-04 04:54:19 -04:00
Eugenio Pérez	b40eba9cdd	vdpa: use first queue SVQ state for CVQ default Previous to this patch the only way CVQ would be shadowed is if the device supports isolating the CVQ group or if all vqs were shadowed from the beginning. The second condition was checked at the beginning, and no more configuration was done. After this series we need to check if data queues are shadowed because they are in the middle of the migration. As checking if they are shadowed already covers the previous case, let's just mimic it. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230822085330.3978829-2-eperezma@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-10-04 04:54:17 -04:00
Hawkins Jiawei	e213c45a04	vdpa: Allow VIRTIO_NET_F_CTRL_VLAN in SVQ Enable SVQ with VIRTIO_NET_F_CTRL_VLAN feature. Co-developed-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Hawkins Jiawei <yin31149@gmail.com> Message-Id: <38dc63102a42c31c72fd293d0e6e2828fd54c86e.1690106284.git.yin31149@gmail.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-10-04 04:54:13 -04:00
Hawkins Jiawei	8f7e996748	vdpa: Restore vlan filtering state This patch introduces vhost_vdpa_net_load_single_vlan() and vhost_vdpa_net_load_vlan() to restore the vlan filtering state at device's startup. Co-developed-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Hawkins Jiawei <yin31149@gmail.com> Message-Id: <e76a29f77bb3f386e4a643c8af94b77b775d1752.1690106284.git.yin31149@gmail.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-10-04 04:54:10 -04:00
Philippe Mathieu-Daudé	1728593a82	net/eth: Clean up local variable shadowing Fix: net/eth.c:435:20: error: declaration shadows a local variable [-Werror,-Wshadow] size_t input_size = iov_size(pkt, pkt_frags); ^ net/eth.c:413:16: note: previous declaration is here size_t input_size = iov_size(pkt, pkt_frags); ^ Suggested-by: Akihiko Odaki <akihiko.odaki@daynix.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-ID: <20230904161235.84651-16-philmd@linaro.org> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2023-09-29 10:07:16 +02:00
Peter Maydell	6d7a53e9f1	net/tap: Avoid variable-length array Use a heap allocation instead of a variable length array in tap_receive_iov(). The codebase has very few VLAs, and if we can get rid of them all we can make the compiler error on new additions. This is a defensive measure against security bugs where an on-stack dynamic allocation isn't correctly size-checked (e.g. CVE-2021-3527). Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Francisco Iglesias <frasse.iglesias@gmail.com> Signed-off-by: Jason Wang <jasowang@redhat.com>	2023-09-18 14:36:13 +08:00
Peter Maydell	c4cf68198e	net/dump: Avoid variable length array Use a g_autofree heap allocation instead of a variable length array in dump_receive_iov(). The codebase has very few VLAs, and if we can get rid of them all we can make the compiler error on new additions. This is a defensive measure against security bugs where an on-stack dynamic allocation isn't correctly size-checked (e.g. CVE-2021-3527). Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Francisco Iglesias <frasse.iglesias@gmail.com> Signed-off-by: Jason Wang <jasowang@redhat.com>	2023-09-18 14:36:13 +08:00
Ilya Maximets	cb039ef3d9	net: add initial support for AF_XDP network backend AF_XDP is a network socket family that allows communication directly with the network device driver in the kernel, bypassing most or all of the kernel networking stack. In essence, the technology is pretty similar to netmap. But, unlike netmap, AF_XDP is Linux-native and works with any network interfaces without driver modifications. Unlike vhost-based backends (kernel, user, vdpa), AF_XDP doesn't require access to character devices or unix sockets. Only access to the network interface itself is necessary. This patch implements a network backend that communicates with the kernel by creating an AF_XDP socket. A chunk of userspace memory is shared between QEMU and the host kernel. 4 ring buffers (Tx, Rx, Fill and Completion) are placed in that memory along with a pool of memory buffers for the packet data. Data transmission is done by allocating one of the buffers, copying packet data into it and placing the pointer into Tx ring. After transmission, device will return the buffer via Completion ring. On Rx, device will take a buffer from a pre-populated Fill ring, write the packet data into it and place the buffer into Rx ring. AF_XDP network backend takes on the communication with the host kernel and the network interface and forwards packets to/from the peer device in QEMU. Usage example: -device virtio-net-pci,netdev=guest1,mac=00:16:35:AF:AA:5C -netdev af-xdp,ifname=ens6f1np1,id=guest1,mode=native,queues=1 XDP program bridges the socket with a network interface. It can be attached to the interface in 2 different modes: 1. skb - this mode should work for any interface and doesn't require driver support. With a caveat of lower performance. 2. native - this does require support from the driver and allows bypassing skb allocation in the kernel and potentially using zero-copy while getting packets in/out of userspace. By default, QEMU will try to use native mode and fall back to skb. 
Mode can be forced via 'mode' option. To force 'copy' even in native mode, use 'force-copy=on' option. This might be useful if there is some issue with the driver. Option 'queues=N' allows specifying how many device queues should be open. Note that all the queues that are not open are still functional and can receive traffic, but it will not be delivered to QEMU. So, the number of device queues should generally match the QEMU configuration, unless the device is shared with something else and the traffic re-direction to appropriate queues is correctly configured on a device level (e.g. with ethtool -N). 'start-queue=M' option can be used to specify from which queue id QEMU should start configuring 'N' queues. It might also be necessary to use this option with certain NICs, e.g. MLX5 NICs. See the docs for examples. In a general case QEMU will need CAP_NET_ADMIN and CAP_SYS_ADMIN or CAP_BPF capabilities in order to load default XSK/XDP programs to the network interface and configure BPF maps. It is possible, however, to run with no capabilities. For that to work, an external process with enough capabilities will need to pre-load default XSK program, create AF_XDP sockets and pass their file descriptors to QEMU process on startup via 'sock-fds' option. Network backend will need to be configured with 'inhibit=on' to avoid loading of the program. QEMU will need 32 MB of locked memory (RLIMIT_MEMLOCK) per queue or CAP_IPC_LOCK. There are a few performance challenges with the current network backends. First is that they do not support IO threads. This means that data path is handled by the main thread in QEMU and may slow down other work or may be slowed down by some other work. This also means that taking advantage of multi-queue is generally not possible today. Another thing is that data path is going through the device emulation code, which is not really optimized for performance. The fastest "frontend" device is virtio-net. 
But it's not optimized for heavy traffic either, because it expects such use-cases to be handled via some implementation of vhost (user, kernel, vdpa). In practice, we have virtio notifications and rcu lock/unlock on a per-packet basis and not very efficient accesses to the guest memory. Communication channels between backend and frontend devices do not allow passing more than one packet at a time as well. Some of these challenges can be avoided in the future by adding better batching into device emulation or by implementing vhost-af-xdp variant. There are also a few kernel limitations. AF_XDP sockets do not support any kinds of checksum or segmentation offloading. Buffers are limited to a page size (4K), i.e. MTU is limited. Multi-buffer support implementation for AF_XDP is in progress, but not ready yet. Also, transmission in all non-zero-copy modes is synchronous, i.e. done in a syscall. That doesn't allow high packet rates on virtual interfaces. However, keeping in mind all of these challenges, current implementation of the AF_XDP backend shows a decent performance while running on top of a physical NIC with zero-copy support. Test setup: 2 VMs running on 2 physical hosts connected via ConnectX6-Dx card. Network backend is configured to open the NIC directly in native mode. The driver supports zero-copy. NIC is configured to use 1 queue. Inside a VM - iperf3 for basic TCP performance testing and dpdk-testpmd for PPS testing. 
iperf3 result: TCP stream : 19.1 Gbps dpdk-testpmd (single queue, single CPU core, 64 B packets) results: Tx only : 3.4 Mpps Rx only : 2.0 Mpps L2 FWD Loopback : 1.5 Mpps In skb mode the same setup shows much lower performance, similar to the setup where pair of physical NICs is replaced with veth pair: iperf3 result: TCP stream : 9 Gbps dpdk-testpmd (single queue, single CPU core, 64 B packets) results: Tx only : 1.2 Mpps Rx only : 1.0 Mpps L2 FWD Loopback : 0.7 Mpps Results in skb mode or over the veth are close to results of a tap backend with vhost=on and disabled segmentation offloading bridged with a NIC. Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> (docker/lcitool) Signed-off-by: Jason Wang <jasowang@redhat.com>	2023-09-18 14:36:13 +08:00
Andrew Melnychenko	9da1684954	virtio-net: Add USO flags to vhost support. New features are subject to check with vhost-user and vdpa. Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com> Signed-off-by: Andrew Melnychenko <andrew@daynix.com> Signed-off-by: Jason Wang <jasowang@redhat.com>	2023-09-18 14:36:13 +08:00
Yuri Benditovich	f03e0cf63b	tap: Add check for USO features Tap indicates support for USO features according to capabilities of current kernel module. Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com> Signed-off-by: Andrew Melnychenko <andrew@daynix.com> Signed-off-by: Jason Wang <jasowang@redhat.com>	2023-09-18 14:36:13 +08:00
Andrew Melnychenko	2ab0ec3121	tap: Add USO support to tap device. Passing additional parameters (USOv4 and USOv6 offloads) when setting TAP offloads Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com> Signed-off-by: Andrew Melnychenko <andrew@daynix.com> Signed-off-by: Jason Wang <jasowang@redhat.com>	2023-09-18 14:36:13 +08:00
Jonathan Perkin	fb0a8b0e23	meson: Fix targetos match for illumos and Solaris. qemu 8.1.0 breaks on illumos platforms due to _XOPEN_SOURCE and others no longer being set correctly, leading to breakage such as: https://us-central.manta.mnx.io/pkgsrc/public/reports/trunk/tools/20230908.1404/qemu-8.1.0/build.log This is a result of meson conversion which incorrectly matches against 'solaris' instead of 'sunos' for uname. First time submitting a patch here, hope I did it correctly. Thanks. Signed-off-by: Jonathan Perkin <jonathan@perkin.org.uk> Message-ID: <ZPtdxtum9UVPy58J@perkin.org.uk> Cc: qemu-stable@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2023-09-13 09:33:51 +02:00
Michael Tokarev	0a19d87995	misc/other: spelling fixes Signed-off-by: Michael Tokarev <mjt@tls.msk.ru> Reviewed-by: Eric Blake <eblake@redhat.com>	2023-09-08 13:08:52 +03:00
Stefan Hajnoczi	03a3a62fbd	Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging * only build util/async-teardown.c when system build is requested * target/i386: fix BQL handling of the legacy FERR interrupts * target/i386: fix memory operand size for CVTPS2PD * target/i386: Add support for AMX-COMPLEX in CPUID enumeration * compile plugins on Darwin * configure and meson cleanups * drop mkvenv support for Python 3.7 and Debian10 * add wrap file for libblkio * tweak KVM stubs # gpg: Signature made Thu 07 Sep 2023 07:44:37 EDT # gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83 # gpg: issuer "pbonzini@redhat.com" # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full] # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full] # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * tag 'for-upstream' of https://gitlab.com/bonzini/qemu: (51 commits) docs/system/replay: do not show removed command line option; subprojects: add wrap file for libblkio; sysemu/kvm: Restrict kvm_pc_setup_irq_routing() to x86 targets; sysemu/kvm: Restrict kvm_has_pit_state2() to x86 targets; sysemu/kvm: Restrict kvm_get_apic_state() to x86 targets; sysemu/kvm: Restrict kvm_arch_get_supported_cpuid/msr() to x86 targets; target/i386: Restrict declarations specific to CONFIG_KVM; target/i386: Allow elision of kvm_hv_vpindex_settable(); target/i386: Allow elision of kvm_enable_x2apic(); target/i386: Remove unused KVM stubs; target/i386/cpu-sysemu: Inline kvm_apic_in_kernel(); target/i386/helper: Restrict KVM declarations to system emulation; hw/i386/fw_cfg: Include missing 'cpu.h' header; hw/i386/pc: Include missing 'cpu.h' header; hw/i386/pc: Include missing 'sysemu/tcg.h' header; Revert "mkvenv: work around broken pip installations on Debian 10"; mkvenv: assume presence of importlib.metadata; Python: Drop support for Python 3.7; configure: remove dead code; meson: list leftover CONFIG_* symbols; ... Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2023-09-07 10:29:06 -04:00
Paolo Bonzini	73258b3864	configure, meson: remove CONFIG_SOLARIS from config-host.mak CONFIG_SOLARIS is only used to pick tap implementations. But the target OS is invariant and does not depend on the configuration, so move away from config_host and just use unconditional rules in softmmu_ss. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2023-09-07 13:32:37 +02:00
Philippe Mathieu-Daudé	53c7c92422	hw/char: Have FEWatchFunc handlers return G_SOURCE_CONTINUE/REMOVE GLib recommends using G_SOURCE_REMOVE / G_SOURCE_CONTINUE for GSourceFunc callbacks. Our FEWatchFunc is a GSourceFunc returning such a value. Use these definitions, which are "more memorable" [*]. [*] https://docs.gtk.org/glib/callback.SourceFunc.html#return-value Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20230705133139.54419-5-philmd@linaro.org>	2023-08-31 19:47:43 +02:00
Hawkins Jiawei	d669b7bba2	vdpa: Allow VIRTIO_NET_F_CTRL_RX_EXTRA in SVQ Enable SVQ with VIRTIO_NET_F_CTRL_RX_EXTRA feature. Signed-off-by: Hawkins Jiawei <yin31149@gmail.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Message-Id: <15ecc49975f9b8d1316ed4296879564a18abf31e.1688797728.git.yin31149@gmail.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-07-10 18:59:32 -04:00
Hawkins Jiawei	4fd180c7bb	vdpa: Restore packet receive filtering state relative with _F_CTRL_RX_EXTRA feature This patch refactors vhost_vdpa_net_load_rx() to restore the packet receive filtering state in relation to VIRTIO_NET_F_CTRL_RX_EXTRA feature at device's startup. Signed-off-by: Hawkins Jiawei <yin31149@gmail.com> Message-Id: <abddc477a476f756de6e3d24c0e9f7b21c99a4c1.1688797728.git.yin31149@gmail.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-07-10 18:59:32 -04:00
Hawkins Jiawei	ea6eec4979	vdpa: Allow VIRTIO_NET_F_CTRL_RX in SVQ Enable SVQ with VIRTIO_NET_F_CTRL_RX feature. Signed-off-by: Hawkins Jiawei <yin31149@gmail.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Message-Id: <5d6173a6d7c4c514c98362b404c019f52d73b06c.1688743107.git.yin31149@gmail.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-07-10 18:59:32 -04:00
Hawkins Jiawei	fee364e4b1	vdpa: Avoid forwarding large CVQ command failures Due to the size limitation of the out buffer sent to the vdpa device, which is determined by vhost_vdpa_net_cvq_cmd_len(), an excessive CVQ command is truncated in QEMU. As a result, the vdpa device rejects this flawed CVQ command. However, the problem is that the VIRTIO_NET_CTRL_MAC_TABLE_SET CVQ command has a variable length, which may exceed vhost_vdpa_net_cvq_cmd_len() if the guest sets more than `MAC_TABLE_ENTRIES` MAC addresses for the filter table. This patch solves this problem by the following steps: * Increase the out buffer size to vhost_vdpa_net_cvq_cmd_page_len(), which represents the size of the buffer that is allocated and mmaped. This ensures that everything works correctly as long as the guest sets fewer than `(vhost_vdpa_net_cvq_cmd_page_len() - sizeof(struct virtio_net_ctrl_hdr) - 2 * sizeof(struct virtio_net_ctrl_mac)) / ETH_ALEN` MAC addresses. Considering the highly unlikely scenario for the guest setting more than that number of MAC addresses for the filter table, this should work fine for the majority of cases. * If the CVQ command exceeds vhost_vdpa_net_cvq_cmd_page_len(), instead of directly sending this CVQ command, QEMU should send a VIRTIO_NET_CTRL_RX_PROMISC CVQ command to the vdpa device. Additionally, a fake VIRTIO_NET_CTRL_MAC_TABLE_SET command including (`MAC_TABLE_ENTRIES` + 1) non-multicast MAC addresses and (`MAC_TABLE_ENTRIES` + 1) multicast MAC addresses should be provided to the device model. By doing so, the vdpa device turns promiscuous mode on, aligning with the VirtIO standard. The device model marks `n->mac_table.uni_overflow` and `n->mac_table.multi_overflow`, which aligns with the state of the vdpa device. Note that the bug cannot be triggered at the moment, since the VIRTIO_NET_F_CTRL_RX feature is not enabled for SVQ. 
Fixes: `7a7f87e94c` ("vdpa: Move command buffers map to start of net device") Signed-off-by: Hawkins Jiawei <yin31149@gmail.com> Message-Id: <267e15e4eed2d7aeb9887f193da99a13d22a2f1d.1688743107.git.yin31149@gmail.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-07-10 18:59:32 -04:00
Hawkins Jiawei	45c4101828	vdpa: Accessing CVQ header through its structure We can access the CVQ header through `struct virtio_net_ctrl_hdr`, instead of accessing it through a `uint8_t` pointer, which improves the code's readability and maintainability. Signed-off-by: Hawkins Jiawei <yin31149@gmail.com> Message-Id: <cd522e06a4371e9d6b8a1c1a86f90a92401d56e8.1688743107.git.yin31149@gmail.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-07-10 18:59:32 -04:00
Hawkins Jiawei	b12f907eea	vdpa: Restore packet receive filtering state relative with _F_CTRL_RX feature This patch introduces vhost_vdpa_net_load_rx_mode() and vhost_vdpa_net_load_rx() to restore the packet receive filtering state in relation to VIRTIO_NET_F_CTRL_RX feature at device's startup. Signed-off-by: Hawkins Jiawei <yin31149@gmail.com> Message-Id: <804cedac93e19ba3b810d52b274ca5ec11469f09.1688743107.git.yin31149@gmail.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-07-10 18:59:32 -04:00
Hawkins Jiawei	0ddcecb8f2	vdpa: Restore MAC address filtering state This patch refactors vhost_vdpa_net_load_mac() to restore the MAC address filtering state at device's startup. Signed-off-by: Hawkins Jiawei <yin31149@gmail.com> Message-Id: <4b9550c14bc8c98c8f48e04dbf3d3ac41489d3fd.1688743107.git.yin31149@gmail.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-07-10 18:59:32 -04:00
Hawkins Jiawei	2848c6aa75	vdpa: Use iovec for vhost_vdpa_net_load_cmd() According to the VirtIO standard, "The driver MUST follow the VIRTIO_NET_CTRL_MAC_TABLE_SET command by a le32 number, followed by that number of non-multicast MAC addresses, followed by another le32 number, followed by that number of multicast addresses." Considering that this data is not stored in contiguous memory, this patch refactors vhost_vdpa_net_load_cmd() to accept scattered data, eliminating the need for an additional data copy or packing the data into s->cvq_cmd_out_buffer outside of vhost_vdpa_net_load_cmd(). Signed-off-by: Hawkins Jiawei <yin31149@gmail.com> Message-Id: <3482cc50eebd13db4140b8b5dec9d0cc25b20b1b.1688743107.git.yin31149@gmail.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-07-10 18:59:32 -04:00
Hawkins Jiawei	031b1abacb	vdpa: Fix possible use-after-free for VirtQueueElement QEMU uses vhost_handle_guest_kick() to forward guest's available buffers to the vdpa device in SVQ avail ring. In vhost_handle_guest_kick(), a `g_autofree` `elem` is used to iterate through the available VirtQueueElements. This `elem` is then passed to `svq->ops->avail_handler`, specifically to vhost_vdpa_net_handle_ctrl_avail(). If this handler fails to process the CVQ command, vhost_handle_guest_kick() regains ownership of the `elem`, and either frees it or requeues it. The problem is that vhost_vdpa_net_handle_ctrl_avail() mistakenly frees the `elem`, even if it fails to forward the CVQ command to the vdpa device. This can result in a use-after-free for the `elem` in vhost_handle_guest_kick(). This patch solves this problem by refactoring vhost_vdpa_net_handle_ctrl_avail() to free the `elem` only if it owns it. Fixes: `bd907ae4b0` ("vdpa: manual forward CVQ buffers") Signed-off-by: Hawkins Jiawei <yin31149@gmail.com> Message-Id: <e3f2d7db477734afe5c6a5ab3fa8b8317514ea34.1688746840.git.yin31149@gmail.com> Reviewed-by: Eugenio Pérez <eperezma@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-07-10 18:59:32 -04:00
Hawkins Jiawei	6f34807116	vdpa: Return -EIO if device ack is VIRTIO_NET_ERR in _load_offloads() According to the VirtIO standard, "The class, command and command-specific-data are set by the driver, and the device sets the ack byte. There is little it can do except issue a diagnostic if ack is not VIRTIO_NET_OK." Therefore, QEMU should stop sending the queued SVQ commands and cancel the device startup if the device's ack is not VIRTIO_NET_OK. The problem is that vhost_vdpa_net_load_offloads() returns 1 based on `*s->status != VIRTIO_NET_OK` when the device's ack is VIRTIO_NET_ERR. As a result, net->nc->info->load() also returns 1, which makes vhost_net_start_one() incorrectly assume the device state is successfully loaded by vhost_vdpa_net_load() and return 0, instead of jumping to the `fail` label to cancel the device startup, as vhost_net_start_one() only cancels the device startup when net->nc->info->load() returns a negative value. This patch fixes this problem by returning -EIO when the device's ack is not VIRTIO_NET_OK. Fixes: `0b58d3686a` ("vdpa: Add vhost_vdpa_net_load_offloads()") Signed-off-by: Hawkins Jiawei <yin31149@gmail.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Message-Id: <b0396b80e96322b86f1a0b10c098fc1edd947d72.1688438055.git.yin31149@gmail.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-07-10 18:59:32 -04:00
Hawkins Jiawei	f45fd95ec9	vdpa: Return -EIO if device ack is VIRTIO_NET_ERR in _load_mq() According to the VirtIO standard, "The class, command and command-specific-data are set by the driver, and the device sets the ack byte. There is little it can do except issue a diagnostic if ack is not VIRTIO_NET_OK." Therefore, QEMU should stop sending the queued SVQ commands and cancel the device startup if the device's ack is not VIRTIO_NET_OK. The problem is that vhost_vdpa_net_load_mq() returns 1 based on `*s->status != VIRTIO_NET_OK` when the device's ack is VIRTIO_NET_ERR. As a result, net->nc->info->load() also returns 1, which makes vhost_net_start_one() incorrectly assume the device state is successfully loaded by vhost_vdpa_net_load() and return 0, instead of jumping to the `fail` label to cancel the device startup, as vhost_net_start_one() only cancels the device startup when net->nc->info->load() returns a negative value. This patch fixes this problem by returning -EIO when the device's ack is not VIRTIO_NET_OK. Fixes: `f64c7cda69` ("vdpa: Add vhost_vdpa_net_load_mq") Signed-off-by: Hawkins Jiawei <yin31149@gmail.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Message-Id: <ec515ebb0b4f56368751b9e318e245a5d994fa72.1688438055.git.yin31149@gmail.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-07-10 18:59:32 -04:00
Hawkins Jiawei	b479bc3c9d	vdpa: Return -EIO if device ack is VIRTIO_NET_ERR in _load_mac() According to the VirtIO standard, "The class, command and command-specific-data are set by the driver, and the device sets the ack byte. There is little it can do except issue a diagnostic if ack is not VIRTIO_NET_OK." Therefore, QEMU should stop sending the queued SVQ commands and cancel the device startup if the device's ack is not VIRTIO_NET_OK. The problem is that vhost_vdpa_net_load_mac() returns 1 based on `*s->status != VIRTIO_NET_OK` when the device's ack is VIRTIO_NET_ERR. As a result, net->nc->info->load() also returns 1, which makes vhost_net_start_one() incorrectly assume the device state is successfully loaded by vhost_vdpa_net_load() and return 0, instead of jumping to the `fail` label to cancel the device startup, as vhost_net_start_one() only cancels the device startup when net->nc->info->load() returns a negative value. This patch fixes this problem by returning -EIO when the device's ack is not VIRTIO_NET_OK. Fixes: `f73c0c43ac` ("vdpa: extract vhost_vdpa_net_load_mac from vhost_vdpa_net_load") Signed-off-by: Hawkins Jiawei <yin31149@gmail.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Message-Id: <a21731518644abbd0c495c5b7960527c5911f80d.1688438055.git.yin31149@gmail.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-07-10 18:59:32 -04:00
Hawkins Jiawei	2875a0ca02	vdpa: Sort vdpa_feature_bits array alphabetically This patch sorts the vdpa_feature_bits array alphabetically in ascending order to avoid future duplicates. Signed-off-by: Hawkins Jiawei <yin31149@gmail.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2023-07-08 07:24:38 +03:00
Hawkins Jiawei	aee9701729	vdpa: Delete duplicated VIRTIO_NET_F_RSS in vdpa_feature_bits This entry was duplicated on referenced commit. Removing it. Fixes: `402378407d` ("vhost-vdpa: multiqueue support") Signed-off-by: Hawkins Jiawei <yin31149@gmail.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2023-07-08 07:24:38 +03:00
Laurent Vivier	b6aeee0298	net: socket: remove net_init_socket() Move the file descriptor type checking before doing anything with it. If it's not usable, don't close it as it could be in use by another part of QEMU, only fail and report an error. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>	2023-07-07 16:35:12 +08:00
Laurent Vivier	23455ae341	net: socket: move fd type checking to its own function Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>	2023-07-07 16:35:12 +08:00
Laurent Vivier	006c3fa74c	net: socket: prepare to cleanup net_init_socket() Use directly net_socket_fd_init_stream() and net_socket_fd_init_dgram() when the socket type is already known. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>	2023-07-07 16:35:12 +08:00
Ani Sinha	a0d7215e33	vhost-vdpa: do not cleanup the vdpa/vhost-net structures if peer nic is present When a peer nic is still attached to the vdpa backend, it is too early to free up the vhost-net and vdpa structures. If these structures are freed here, then QEMU crashes when the guest is being shut down. The following call chain would result in an assertion failure since the pointer returned from vhost_vdpa_get_vhost_net() would be NULL: do_vm_stop() -> vm_state_notify() -> virtio_set_status() -> virtio_net_vhost_status() -> get_vhost_net(). Therefore, we defer freeing up the structures until at guest shutdown time when qemu_cleanup() calls net_cleanup() which then calls qemu_del_net_client() which would eventually call vhost_vdpa_cleanup() again to free up the structures. This time, the loop in net_cleanup() ensures that vhost_vdpa_cleanup() will be called one last time when all the peer nics are detached and freed. All unit tests pass with this change. CC: imammedo@redhat.com CC: jusual@redhat.com CC: mst@redhat.com Fixes: CVE-2023-3301 Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2128929 Signed-off-by: Ani Sinha <anisinha@redhat.com> Message-Id: <20230619065209.442185-1-anisinha@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-06-26 09:50:00 -04:00
Eugenio Pérez	d45243bcfc	vdpa: fix not using CVQ buffer in case of error Bug introduced when refactoring. Otherwise, the guest never receives the used buffer. Fixes: `be4278b65f` ("vdpa: extract vhost_vdpa_net_cvq_add from vhost_vdpa_net_handle_ctrl_avail") Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Message-Id: <20230602173451.1917999-1-eperezma@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com>	2023-06-26 09:50:00 -04:00
Eugenio Pérez	51e84244a7	vdpa: mask _F_CTRL_GUEST_OFFLOADS for vhost vdpa devices QEMU does not emulate it so it must be disabled as long as the backend does not support it. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Message-Id: <20230602173328.1917385-1-eperezma@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com>	2023-06-26 09:50:00 -04:00
Hawkins Jiawei	4b4a1378b9	vdpa: Allow VIRTIO_NET_F_CTRL_GUEST_OFFLOADS in SVQ Enable SVQ with VIRTIO_NET_F_CTRL_GUEST_OFFLOADS feature. Signed-off-by: Hawkins Jiawei <yin31149@gmail.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <778d642ecae6deed8a218b0e6232e4d7bb96b439.1685704856.git.yin31149@gmail.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Eugenio Pérez <eperezma@redhat.com> Tested-by: Eugenio Pérez <eperezma@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-06-26 09:50:00 -04:00
Hawkins Jiawei	0b58d3686a	vdpa: Add vhost_vdpa_net_load_offloads() This patch introduces vhost_vdpa_net_load_offloads() to restore offloads state at device's startup. Signed-off-by: Hawkins Jiawei <yin31149@gmail.com> Message-Id: <7e2b5cad9c48c917df53d80dec27dbfeb513e1a3.1685704856.git.yin31149@gmail.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Eugenio Pérez <eperezma@redhat.com> Tested-by: Eugenio Pérez <eperezma@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-06-26 09:50:00 -04:00
Hawkins Jiawei	02d3bf099b	vdpa: reuse virtio_vdev_has_feature() We can use virtio_vdev_has_feature() instead of manually accessing the features. Signed-off-by: Hawkins Jiawei <yin31149@gmail.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Message-Id: <ff838d30206209fd865511b16ffb34cc0d5e8d8f.1685704856.git.yin31149@gmail.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Eugenio Pérez <eperezma@redhat.com> Tested-by: Eugenio Pérez <eperezma@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-06-26 09:50:00 -04:00
Eugenio Pérez	babf8b8712	vdpa: map shadow vrings with MAP_SHARED vdpa devices that use VA addresses need these maps to be shared. Otherwise, vhost_vdpa checks will refuse to accept the maps. The mmap call always returns a page-aligned address, so remove the qemu_memalign call. Keep the ROUND_UP for the size, as we still need to DMA-map them in full. No Fixes tag is applied, as this never worked with VA devices. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Message-Id: <20230602143854.1879091-4-eperezma@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-06-26 09:50:00 -04:00
Eugenio Pérez	915bf6ccd7	vdpa: reorder vhost_vdpa_net_cvq_cmd_page_len function We need to call it from resource cleanup context, as munmap needs the size of the mappings. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-Id: <20230602143854.1879091-3-eperezma@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-06-26 09:50:00 -04:00
Eugenio Pérez	8bc0049ead	vdpa: do not block migration if device has cvq and x-svq=on It was a mistake to forbid it in all cases, as SVQ is already able to send all the CVQ messages before it starts forwarding data vqs. It actually caused a regression, making it impossible to migrate devices that were previously migratable. Fixes: `36e4647247` ("vdpa: add vhost_vdpa_net_valid_svq_features") Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Message-Id: <20230602143854.1879091-2-eperezma@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com>	2023-06-26 09:50:00 -04:00
Eugenio Pérez	152128d646	vdpa: move CVQ isolation check to net_init_vhost_vdpa Evaluating it at start time instead of initialization time may make the guest capable of dynamically adding or removing migration blockers. Also, moving to initialization reduces the number of ioctls in the migration, reducing failure possibilities. As a drawback we need to check for CVQ isolation twice: one time with no MQ negotiated and another one acking it, as long as the device supports it. This is because Vring ASID / group management is based on vq indexes, but we don't know the index of CVQ before negotiating MQ. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Message-Id: <20230526153143.470745-3-eperezma@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>	2023-06-23 03:09:45 -04:00
Eugenio Pérez	0f2bb0bf38	vdpa: return errno in vhost_vdpa_get_vring_group error The caller needs to tell errors apart, as some are expected in a normal workflow. In particular, parent drivers in recent kernels with VHOST_BACKEND_F_IOTLB_ASID may not support vring groups. In that case, -ENOTSUP is returned. This is the case of vp_vdpa in Linux 6.2. The next patches in this series will use that information to know whether it must abort or not. Also, the next patches properly return an errp instead of printing with error_report. Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Message-Id: <20230526153143.470745-2-eperezma@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-06-23 02:54:44 -04:00
Philippe Mathieu-Daudé	de6cd7599b	meson: Replace softmmu_ss -> system_ss We use the user_ss[] array to hold the user emulation sources, and the softmmu_ss[] array to hold the system emulation ones. Hold the latter in the 'system_ss[]' array for parity with user emulation. Mechanical change doing: $ sed -i -e s/softmmu_ss/system_ss/g $(git grep -l softmmu_ss) Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20230613133347.82210-10-philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2023-06-20 10:01:30 +02:00
Philippe Mathieu-Daudé	f975033d56	cocoa: Fix warnings about invalid prototype declarations Fix the following Cocoa trivial warnings: C compiler for the host machine: cc (clang 14.0.0 "Apple clang version 14.0.0 (clang-1400.0.29.202)") Objective-C compiler for the host machine: clang (clang 14.0.0) [100/334] Compiling Objective-C object libcommon.fa.p/net_vmnet-bridged.m.o net/vmnet-bridged.m:40:31: warning: a function declaration without a prototype is deprecated in all versions of C [-Wstrict-prototypes] static char* get_valid_ifnames() ^ void [742/1436] Compiling Objective-C object libcommon.fa.p/ui_cocoa.m.o ui/cocoa.m:1937:22: warning: a function declaration without a prototype is deprecated in all versions of C [-Wstrict-prototypes] static int cocoa_main() ^ void Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com> Message-Id: <20230425192820.34063-1-philmd@linaro.org>	2023-06-13 11:28:58 +02:00
Akihiko Odaki	7e64a9cabb	igb: Strip the second VLAN tag for extended VLAN Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Signed-off-by: Jason Wang <jasowang@redhat.com>	2023-05-23 15:20:15 +08:00
Akihiko Odaki	907209e311	igb: Implement Rx SCTP CSO Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech> Signed-off-by: Jason Wang <jasowang@redhat.com>	2023-05-23 15:20:15 +08:00
Akihiko Odaki	aaa8a15c96	net/eth: Always add VLAN tag It is possible to have another VLAN tag even if the packet is already tagged. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Signed-off-by: Jason Wang <jasowang@redhat.com>	2023-05-23 15:20:15 +08:00
Akihiko Odaki	85427bf388	net/eth: Use void pointers The uses of uint8_t pointers were misleading, as they are never accessed as an array of octets and they even require stricter alignment to be accessed as struct eth_header. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Jason Wang <jasowang@redhat.com>	2023-05-23 15:20:15 +08:00
Akihiko Odaki	0b11783014	net/eth: Rename eth_setup_vlan_headers_ex The old eth_setup_vlan_headers has no user so remove it and rename eth_setup_vlan_headers_ex. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Jason Wang <jasowang@redhat.com>	2023-05-23 15:20:15 +08:00
Akihiko Odaki	2f0fa232b8	net/net_rx_pkt: Use iovec for net_rx_pkt_set_protocols() igb does not properly ensure the buffer passed to net_rx_pkt_set_protocols() is contiguous for the entire L2/L3/L4 header. Allow it to pass scattered data to net_rx_pkt_set_protocols(). Fixes: `3a977deebe` ("Intrdocue igb device emulation") Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech> Signed-off-by: Jason Wang <jasowang@redhat.com>	2023-05-23 15:20:15 +08:00
Vladimir Sementsov-Ogievskiy	6c1e3906ce	configure: add --disable-colo-proxy option Add option to not build filter-rewriter and colo-compare when they are not needed. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Zhang Chen <chen.zhang@intel.com> Message-Id: <20230515130640.46035-2-vsementsov@yandex-team.ru> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-05-18 18:40:50 +02:00
Eugenio Pérez	0d74e2b785	vdpa: accept VIRTIO_NET_F_SPEED_DUPLEX in SVQ There is no reason to block it as it has nothing to do with the vrings. All the support of the feature comes via config space. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Suggested-by: Alvaro Karsz <alvaro.karsz@solid-run.com> Message-Id: <20230307170018.260557-1-eperezma@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-04-21 03:08:21 -04:00
Marc-André Lureau	25657fc6c1	win32: replace closesocket() with close() wrapper Use a close() wrapper instead, so that we don't need to worry about closesocket() vs close() anymore, let's hope. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Stefan Berger <stefanb@linux.ibm.com> Message-Id: <20230221124802.4103554-17-marcandre.lureau@redhat.com>	2023-03-13 15:39:31 +04:00
Marc-André Lureau	fd3c333315	slirp: open-code qemu_socket_(un)select() We are about to make the QEMU socket API use file-descriptor space only, but libslirp gives us SOCKET as fd, still. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Stefan Berger <stefanb@linux.ibm.com> Message-Id: <20230221124802.4103554-14-marcandre.lureau@redhat.com>	2023-03-13 15:39:31 +04:00
Marc-André Lureau	21ac728498	slirp: unregister the win32 SOCKET Presumably, this is what should happen when the SOCKET is to be removed. (It probably worked until now because closesocket() does it implicitly, but we never know how the slirp library could use the SOCKET later.) Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Stefan Berger <stefanb@linux.ibm.com> Message-Id: <20230221124802.4103554-13-marcandre.lureau@redhat.com>	2023-03-13 15:39:31 +04:00
Marc-André Lureau	faa4ec1641	main-loop: remove qemu_fd_register(), win32/slirp/socket specific Open-code the socket registration where it's needed, to avoid an artificially used or unclear generic interface. Furthermore, the following patches are going to make socket handling use FD-only inside QEMU, but we need to handle win32 SOCKET from libslirp. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Stefan Berger <stefanb@linux.ibm.com> Message-Id: <20230221124802.4103554-12-marcandre.lureau@redhat.com>	2023-03-13 15:39:31 +04:00
Peter Maydell	7284d53f6f	-----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJkCvgFAAoJEO8Ells5jWIRHiUH/jhydpJHIqnAPxHQAwGtmyhb 9Z52UOzW5V6KxfZJ+bQ4RPFkS2UwcxmeadPHY4zvvJTVBLAgG3QVgP4igj8CXKCI xRnwMgTNeu655kZQ5P/elTwdBTCJFODk7Egg/bH3H1ZiUhXBhVRhK7q/wMgtlZkZ Kexo6txCK4d941RNzEh45ZaGhdELE+B+D7cRuQgBs/DXZtJpsyEzBbP8KYSMHuER AXfWo0YIBYj7X3ek9D6j0pbOkB61vqtYd7W6xV4iDrJCcFBIOspJbbBb1tGCHola AXo5/OhRmiQnp/c/HTbJIDbrj0sq/r7LxYK4zY1x7UPbewHS9R+wz+FfqSmoBF0= =056y -----END PGP SIGNATURE----- Merge tag 'net-pull-request' of https://github.com/jasowang/qemu into staging # -----BEGIN PGP SIGNATURE----- # Version: GnuPG v1 # # iQEcBAABAgAGBQJkCvgFAAoJEO8Ells5jWIRHiUH/jhydpJHIqnAPxHQAwGtmyhb # 9Z52UOzW5V6KxfZJ+bQ4RPFkS2UwcxmeadPHY4zvvJTVBLAgG3QVgP4igj8CXKCI # xRnwMgTNeu655kZQ5P/elTwdBTCJFODk7Egg/bH3H1ZiUhXBhVRhK7q/wMgtlZkZ # Kexo6txCK4d941RNzEh45ZaGhdELE+B+D7cRuQgBs/DXZtJpsyEzBbP8KYSMHuER # AXfWo0YIBYj7X3ek9D6j0pbOkB61vqtYd7W6xV4iDrJCcFBIOspJbbBb1tGCHola # AXo5/OhRmiQnp/c/HTbJIDbrj0sq/r7LxYK4zY1x7UPbewHS9R+wz+FfqSmoBF0= # =056y # -----END PGP SIGNATURE----- # gpg: Signature made Fri 10 Mar 2023 09:27:33 GMT # gpg: using RSA key EF04965B398D6211 # gpg: Good signature from "Jason Wang (Jason Wang on RedHat) <jasowang@redhat.com>" [marginal] # gpg: WARNING: This key is not certified with sufficiently trusted signatures! # gpg: It is not certain that the signature belongs to the owner. 
# Primary key fingerprint: 215D 46F4 8246 689E C77F 3562 EF04 965B 398D 6211 * tag 'net-pull-request' of https://github.com/jasowang/qemu: (44 commits) ebpf: fix compatibility with libbpf 1.0+ docs/system/devices/igb: Add igb documentation tests/avocado: Add igb test igb: Introduce qtest for igb device tests/qtest/libqos/e1000e: Export macreg functions tests/qtest/e1000e-test: Fabricate ethernet header Intrdocue igb device emulation e1000: Split header files pcie: Introduce pcie_sriov_num_vfs net/eth: Introduce EthL4HdrProto e1000e: Implement system clock net/eth: Report if headers are actually present e1000e: Count CRC in Tx statistics e1000: Count CRC in Tx statistics e1000e: Combine rx traces MAINTAINERS: Add e1000e test files MAINTAINERS: Add Akihiko Odaki as a e1000e reviewer e1000e: Do not assert when MSI-X is disabled later hw/net/net_tx_pkt: Check the payload length hw/net/net_tx_pkt: Implement TCP segmentation ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2023-03-11 17:17:18 +00:00
Akihiko Odaki	65f474bbae	net/eth: Introduce EthL4HdrProto igb, a new network device emulation, will need SCTP checksum offloading. Currently eth_get_protocols() has a bool parameter for each protocol currently it supports, but there will be a bit too many parameters if we add yet another protocol. Introduce an enum type, EthL4HdrProto to represent all L4 protocols eth_get_protocols() support with one parameter. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Signed-off-by: Jason Wang <jasowang@redhat.com>	2023-03-10 15:35:38 +08:00
Akihiko Odaki	69ff5ef847	net/eth: Report if headers are actually present The values returned by eth_get_protocols() are used to perform RSS, checksumming and segmentation. Even when a packet signals the use of the protocols which these operations can be applied to, the headers for them may not be present, for example because the packet is too short or fragmented. In such a case, the operations cannot be applied safely. Report the presence of headers instead of whether the use of the protocols is indicated with eth_get_protocols(). This also makes corresponding changes to the callers of eth_get_protocols() to match its new signature and to remove redundant checks for fragmentation. Fixes: `75020a7021` ("Common definitions for VMWARE devices") Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Signed-off-by: Jason Wang <jasowang@redhat.com>	2023-03-10 15:35:38 +08:00
Akihiko Odaki	02ef5fdc09	hw/net/net_tx_pkt: Implement TCP segmentation There was no proper implementation of TCP segmentation before this change, and net_tx_pkt relied solely on IPv4 fragmentation. Not only is this not aligned with the specification, but it also resulted in corrupted IPv6 packets. This is particularly problematic for igb, a newly proposed device implementation; igb provides a loopback feature for VMDq, and the feature relies on software segmentation. Implement proper TCP segmentation in net_tx_pkt to fix such a scenario. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Signed-off-by: Jason Wang <jasowang@redhat.com>	2023-03-10 15:35:38 +08:00
Akihiko Odaki	481c52320a	net: Strip virtio-net header when dumping filter-dump specifies Ethernet as PCAP LinkType, which does not expect virtio-net header. Having virtio-net header in such PCAP file makes the PCAP unconsumable. Unfortunately currently there is no LinkType for virtio-net so for now strip virtio-net header to convert the output to Ethernet. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Signed-off-by: Jason Wang <jasowang@redhat.com>	2023-03-10 15:35:38 +08:00
Eugenio Pérez	609ab4c3ed	vdpa net: allow VHOST_F_LOG_ALL Since some actions move to the start function instead of init, the device features may not be the parent vdpa device's, but the one returned by vhost backend. If transition to SVQ is supported, the vhost backend will return _F_LOG_ALL to signal the device is migratable. Add VHOST_F_LOG_ALL. HW dirty page tracking can be added on top of this change if the device supports it in the future. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230303172445.1089785-14-eperezma@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-03-07 12:38:59 -05:00
Eugenio Pérez	5c1ebd4c43	vdpa: block migration if device has unsupported features A vdpa net device must initialize with SVQ in order to be migratable at this moment, and initialization code verifies some conditions. If the device is not initialized with the x-svq parameter, it will not expose _F_LOG so the vhost subsystem will block VM migration from its initialization. Next patches change this, so we need to verify migration conditions differently. QEMU only supports a subset of net features in SVQ, and it cannot migrate state that cannot track or restore in the destination. Add a migration blocker if the device offers an unsupported feature. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Message-Id: <20230303172445.1089785-12-eperezma@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-03-07 12:38:59 -05:00
Eugenio Pérez	9c363cf6d5	vdpa net: block migration if the device has CVQ Devices with CVQ need to migrate state beyond vq state. Leaving this to future series. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Message-Id: <20230303172445.1089785-11-eperezma@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-03-07 12:38:59 -05:00
Eugenio Pérez	6949843046	vdpa: add vdpa net migration state notifier This allows net to restart the device backend to configure SVQ on it. Ideally, these changes should not be net specific and they could be done in: * vhost_vdpa_set_features (with VHOST_F_LOG_ALL) * vhost_vdpa_set_vring_addr (with .enable_log) * vhost_vdpa_set_log_base. However, the vdpa net backend is the one with enough knowledge to configure everything because of some reasons: * Queues might need to be shadowed or not depending on its kind (control vs data). * Queues need to share the same map translations (iova tree). Also, there are other problems that may have solutions but complicates the implementation at this stage: * We're basically duplicating vhost_dev_start and vhost_dev_stop, and they could go out of sync. If we want to reuse them, we need a way to skip some function calls to avoid recursiveness (either vhost_ops -> vhost_set_features, vhost_set_vring_addr, ...). * We need to traverse all vhost_dev of a given net device twice: one to stop and get the vq state and another one after the reset to configure properties like address, fd, etc. Because of that it is cleaner to restart the whole net backend and configure again as expected, similar to how vhost-kernel moves between userspace and passthrough. If more kinds of devices need dynamic switching to SVQ we can: * Create a callback struct like VhostOps and move most of the code there. VhostOps cannot be reused since all vdpa backend share them, and to personalize just for networking would be too heavy. * Add a parent struct or link all the vhost_vdpa or vhost_dev structs so we can traverse them. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Message-Id: <20230303172445.1089785-9-eperezma@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-03-07 12:38:59 -05:00
Eugenio Pérez	00ef422e9f	vdpa net: move iova tree creation from init to start Only create iova_tree if and when it is needed. The cleanup keeps being responsible for the last VQ but this change allows it to merge both cleanup functions. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230303172445.1089785-2-eperezma@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2023-03-07 12:38:59 -05:00
Eugenio Pérez	525ae11522	vdpa: fix VHOST_BACKEND_F_IOTLB_ASID flag check VHOST_BACKEND_F_IOTLB_ASID is the feature bit, not the bitmask. Since the device under test also provided VHOST_BACKEND_F_IOTLB_MSG_V2 and VHOST_BACKEND_F_IOTLB_BATCH, this went unnoticed. Fixes: `c1a1008685` ("vdpa: always start CVQ in SVQ mode if possible") Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>	2023-02-17 13:31:33 +08:00
Laurent Vivier	148fbf0d58	net: stream: add a new option to automatically reconnect In stream mode, if the server shuts down there is currently no way to reconnect the client to a new server without removing the NIC device and the netdev backend (or to reboot). This patch introduces a reconnect option that specifies a delay to try to reconnect with the same parameters. Add a new test in qtest to test the reconnect option and the connect/disconnect events. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>	2023-02-17 13:31:33 +08:00
Joelle van Dyne	993f71ee33	vmnet: stop receiving events when VM is stopped When the VM is stopped using the HMP command "stop", soon the handler will stop reading from the vmnet interface. This causes a flood of `VMNET_INTERFACE_PACKETS_AVAILABLE` events to arrive and puts the host CPU at 100%. We fix this by removing the event handler from vmnet when the VM is no longer in a running state and restore it when we return to a running state. Signed-off-by: Joelle van Dyne <j@getutm.app> Signed-off-by: Jason Wang <jasowang@redhat.com>	2023-02-17 13:31:33 +08:00
Christian Svensson	0c65ef4fbb	net: Increase L2TPv3 buffer to fit jumboframes Increase the allocated buffer size to fit larger packets. Given that jumboframes can commonly be up to 9000 bytes the closest suitable value seems to be 16 KiB. Tested by running qemu towards a Linux L2TPv3 endpoint and pushing jumboframe traffic through the interfaces. Signed-off-by: Christian Svensson <blue@cmd.nu> Signed-off-by: Jason Wang <jasowang@redhat.com>	2023-02-17 13:31:33 +08:00
Thomas Huth	3b0cca8e4e	net: Replace "Supported NIC models" with "Available NIC models" Just because a NIC model is compiled into the QEMU binary does not necessarily mean that it can be used with each and every machine. So let's rather talk about "available" models instead of "supported" models, just to avoid confusion. Reviewed-by: Claudio Fontana <cfontana@suse.de> Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>	2023-02-17 13:31:33 +08:00
Thomas Huth	27c819244b	net: Restore printing of the help text with "-nic help" Running QEMU with "-nic help" used to work in QEMU 5.2 and earlier versions (it showed the available netdev backends), but this feature got broken during some refactoring in version 6.0. Let's restore the old behavior, and while we're at it, let's also print the available NIC models here now since this option can be used to configure both, netdev backend and model in one go. Fixes: `ad6f932fe8` ("net: do not exit on "netdev_add help" monitor command") Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>	2023-02-17 13:31:33 +08:00
Thomas Huth	c6941b3b9b	net: Move the code to collect available NIC models to a separate function The code that collects the available NIC models is not really specific to PCI anymore and will be required in the next patch, too, so let's move this into a new separate function in net.c instead. Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>	2023-02-17 13:31:33 +08:00
Markus Armbruster	e02e085c8b	net: Clean up includes This commit was created with scripts/clean-includes. All .c should include qemu/osdep.h first. The script performs three related cleanups: * Ensure .c files include qemu/osdep.h first. * Including it in a .h is redundant, since the .c already includes it. Drop such inclusions. * Likewise, including headers qemu/osdep.h includes is redundant. Drop these, too. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-Id: <20230202133830.2152150-13-armbru@redhat.com>	2023-02-08 07:28:05 +01:00
Markus Armbruster	ae71d13d4e	net: Move hmp_info_network() to net-hmp-cmds.c Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20230124121946.1139465-17-armbru@redhat.com>	2023-02-04 07:56:54 +01:00
Markus Armbruster	2030ca36bf	net: Move HMP commands from monitor to net/ This moves these commands from MAINTAINERS sections "Human Monitor (HMP)" and "QMP" to "Network device backends". Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20230124121946.1139465-16-armbru@redhat.com>	2023-02-04 07:56:54 +01:00
Peter Maydell	aa96ab7c9d	* s390x header clean-ups from Philippe * Rework and improvements of the EINTR handling by Nikita * Deprecate the -no-hpet command line option * Disable the qtests in the 32-bit Windows CI job again * Some other misc fixes here and there Merge tag 'pull-request-2023-01-09' of https://gitlab.com/thuth/qemu into staging # gpg: Signature made Mon 09 Jan 2023 14:21:19 GMT # gpg: using RSA key 27B88847EEE0250118F3EAB92ED9D774FE702DB5 # gpg: issuer "thuth@redhat.com" # gpg: Good signature from "Thomas Huth <th.huth@gmx.de>" [full] # gpg: aka "Thomas Huth <thuth@redhat.com>" [full] # gpg: aka "Thomas Huth <huth@tuxfamily.org>" [full] # gpg: aka "Thomas Huth <th.huth@posteo.de>" [unknown] # Primary key fingerprint: 27B8 8847 EEE0 2501 18F3 EAB9 2ED9 D774 FE70 2DB5 * tag 'pull-request-2023-01-09' of https://gitlab.com/thuth/qemu: .gitlab-ci.d/windows: Do not run the qtests in the msys2-32bit job error handling: Use RETRY_ON_EINTR() macro where applicable Refactoring: refactor TFR() macro to RETRY_ON_EINTR() docs/interop: Change the vnc-ledstate-Pseudo-encoding doc into .rst i386: Deprecate the -no-hpet QEMU command line option tests/qtest/bios-tables-test: Replace -no-hpet with hpet=off machine parameter tests/readconfig: spice doesn't support unix socket on windows yet target/s390x: Restrict sysemu/reset.h to system emulation target/s390x/tcg/excp_helper: Restrict system headers to sysemu target/s390x/tcg/misc_helper: Remove unused "memory.h" include hw/s390x/pv: Restrict Protected Virtualization to sysemu exec/memory: Expose memory_region_access_valid() MAINTAINERS: Add MIPS-related docs and configs to the MIPS architecture section tests/vm: Update get_default_jobs() to work on non-x86_64 non-KVM hosts qemu-iotests/stream-under-throttle: do not shutdown QEMU Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2023-01-09 15:54:31 +00:00
Nikita Ivanov	37b0b24e93	error handling: Use RETRY_ON_EINTR() macro where applicable There is a defined RETRY_ON_EINTR() macro in qemu/osdep.h which handles the same while loop. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/415 Signed-off-by: Nikita Ivanov <nivanov@cloudlinux.com> Message-Id: <20221023090422.242617-3-nivanov@cloudlinux.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> [thuth: Dropped the hunk that changed socket_accept() in libqtest.c] Signed-off-by: Thomas Huth <thuth@redhat.com>	2023-01-09 13:50:47 +01:00
Nikita Ivanov	8b6aa69365	Refactoring: refactor TFR() macro to RETRY_ON_EINTR() Rename macro name to more transparent one and refactor it to expression. Signed-off-by: Nikita Ivanov <nivanov@cloudlinux.com> Message-Id: <20221023090422.242617-2-nivanov@cloudlinux.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Bin Meng <bmeng.cn@gmail.com> Reviewed-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Signed-off-by: Thomas Huth <thuth@redhat.com>	2023-01-09 13:50:47 +01:00
Longpeng	bf7a2ad8b6	vdpa: harden the error path if get_iova_range failed We should stop if the GET_IOVA_RANGE ioctl failed. Signed-off-by: Longpeng <longpeng2@huawei.com> Message-Id: <20221224114848.3062-3-longpeng2@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>	2023-01-08 01:54:22 -05:00
Longpeng	c672f348cb	vdpa-dev: get iova range explicitly In commit `a585fad26b` ("vdpa: request iova_range only once") we removed GET_IOVA_RANGE from vhost_vdpa_init, so the generic vdpa device will start without iova_range populated and the device won't work. Let's call GET_IOVA_RANGE ioctl explicitly. Fixes: `a585fad26b` ("vdpa: request iova_range only once") Signed-off-by: Longpeng <longpeng2@huawei.com> Message-Id: <20221224114848.3062-2-longpeng2@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>	2023-01-08 01:54:22 -05:00

1119 Commits