When enabled the virtio-net-pci, guest network packet will
load the vnet_hdr. In COLO status, the primary VM's network
packet maybe redirect to another VM, it needs filter-redirect
enable the vnet_hdr flag at the same time, COLO-proxy will
correctly parse the original network packet. If have any
misconfiguration here, the vnet_hdr_len is wrong for parse
the packet, the data+offset will point to wrong place.
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
We can restore the device state in the destination via CVQ now. Remove
the migration blocker.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
This is needed so the destination vdpa device see the same state a the
guest set in the source.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
So we can reuse it to inject state messages.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
--
v7:
* Remove double free error
v6:
* Do not assume in buffer sent to the device is sizeof(virtio_net_ctrl_ack)
v5:
* Do not use an artificial !NULL VirtQueueElement
* Use only out size instead of iovec dev_buffers for these functions.
Signed-off-by: Jason Wang <jasowang@redhat.com>
As this series will reuse them to restore the device state at the end of
a migration (or a device start), let's allocate only once at the device
start so we don't duplicate their map and unmap.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Next patches will add a new info callback to restore NIC status through
CVQ. Since only the CVQ vhost device is needed, create it with a new
NetClientInfo.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
It's convenient to call iova_tree_remove from a map returned from
iova_tree_find or iova_tree_find_iova. With the current code this is not
possible, since we will free it, and then we will try to search for it
again.
Fix it making accepting the map by value, forcing a copy of the
argument. Not applying a fixes tag, since there is no use like that at
the moment.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
File descriptor vdpa_device_fd is not free in the case of returning
error from vhost_vdpa_get_features. Fixing it by making all errors go to
the same error path.
Resolves: Coverity CID 1490785
Fixes: 8170ab3f43 ("vdpa: Extract get features part from vhost_vdpa_get_max_queue_pairs")
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20220802112447.249436-2-eperezma@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
When COLO use only one vnet_hdr_support parameter between
filter-redirector and filter-mirror(or colo-compare), COLO will crash
with segmentation fault. Back track as follow:
Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
0x0000555555cb200b in eth_get_l2_hdr_length (p=0x0)
at /home/tao/project/COLO/colo-qemu/include/net/eth.h:296
296 uint16_t proto = be16_to_cpu(PKT_GET_ETH_HDR(p)->h_proto);
(gdb) bt
0 0x0000555555cb200b in eth_get_l2_hdr_length (p=0x0)
at /home/tao/project/COLO/colo-qemu/include/net/eth.h:296
1 0x0000555555cb22b4 in parse_packet_early (pkt=0x555556a44840) at
net/colo.c:49
2 0x0000555555cb2b91 in is_tcp_packet (pkt=0x555556a44840) at
net/filter-rewriter.c:63
So wrong vnet_hdr_len will cause pkt->data become NULL. Add check to
raise error and add trace-events to track vnet_hdr_len.
Signed-off-by: Tao Xu <tao3.xu@intel.com>
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Filter-rewriter no need to track connection in conn_list.
This patch fix the glib g_queue_is_empty assertion when COLO guest
keep a lot of network connection.
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
We notice the QEMU may crash when the guest has too many
incoming network connections with the following log:
15197@1593578622.668573:colo_proxy_main : colo proxy connection hashtable full, clear it
free(): invalid pointer
[1] 15195 abort (core dumped) qemu-system-x86_64 ....
This is because we create the s->connection_track_table with
g_hash_table_new_full() which is defined as:
GHashTable * g_hash_table_new_full (GHashFunc hash_func,
GEqualFunc key_equal_func,
GDestroyNotify key_destroy_func,
GDestroyNotify value_destroy_func);
The fourth parameter connection_destroy() will be called to free the
memory allocated for all 'Connection' values in the hashtable when
we call g_hash_table_remove_all() in the connection_hashtable_reset().
But both connection_track_table and conn_list reference to the same
conn instance. It will trigger double free in conn_list clear. So this
patch remove free action on hash table side to avoid double free the
conn.
Signed-off-by: Like Xu <like.xu@linux.intel.com>
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Finally offering the possibility to enable SVQ from the command line.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
To know the device features is needed for CVQ SVQ, so SVQ knows if it
can handle all commands or not. Extract from
vhost_vdpa_get_max_queue_pairs so we can reuse it.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Introduce the control virtqueue support for vDPA shadow virtqueue. This
is needed for advanced networking features like rx filtering.
Virtio-net control VQ copies the descriptors to qemu's VA, so we avoid
TOCTOU with the guest's or device's memory every time there is a device
model change. Otherwise, the guest could change the memory content in
the time between qemu and the device read it.
To demonstrate command handling, VIRTIO_NET_F_CTRL_MACADDR is
implemented. If the virtio-net driver changes MAC the virtio-net device
model will be updated with the new one, and a rx filtering change event
will be raised.
More cvq commands could be added here straightforwardly but they have
not been tested.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Do a simple forwarding of CVQ buffers, the same work SVQ could do but
through callbacks. No functional change intended.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
net/vhost-vdpa.c will need functions that are declared in
vhost-shadow-virtqueue.c, that needs functions of virtio-net.c.
Copy the vhost-vdpa-stub.c code so
only the constructor net_init_vhost_vdpa needs to be defined.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Interaction with vmnet.framework in different modes
differs only on configuration stage, so we can create
common `send`, `receive`, etc. procedures and reuse them.
Reviewed-by: Akihiko Odaki <akihiko.odaki@gmail.com>
Tested-by: Akihiko Odaki <akihiko.odaki@gmail.com>
Signed-off-by: Phillip Tennen <phillip@axleos.com>
Signed-off-by: Vladislav Yaroshchuk <Vladislav.Yaroshchuk@jetbrains.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
... such that no memory leaks on dangling net clients in case of
error.
Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <1651890498-24478-4-git-send-email-si-wei.liu@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
libslirp 4.7 introduces a CFI-friendly version of the .timer_new callback.
The new callback replaces the function pointer with an enum; invoking the
callback is done with a new function slirp_handle_timer.
Support the new API so that CFI can be made compatible with using a system
libslirp.
Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Reviewed-by: Marc-André Lureau <malureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Replace slirp_init with slirp_new, so that a more recent cfg.version
can be specified. The function appeared in version 4.1.0.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This struct will be extended in the next few patches to support the
new slirp_handle_timer() call. For that we need to store an additional
"int" for each SLIRP timer, in addition to the cb_opaque.
Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Reviewed-by: Marc-André Lureau <malureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The qemu_*block() functions are meant to be be used with sockets (the
win32 implementation expects SOCKET)
Over time, those functions where used with Win32 SOCKET or
file-descriptors interchangeably. But for portability, they must only be
used with socket-like file-descriptors. FDs can use
g_unix_set_fd_nonblocking() instead.
Rename the functions with "socket" in the name to prevent bad usages.
This is effectively reverting commit f9e8cacc55 ("oslib-posix:
rename socket_set_nonblock() to qemu_set_nonblock()").
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Those calls are POSIX-specific. Use the dedicated GLib
API. (qemu_set_nonblock() is for socket-like)
(this is a preliminary patch before renaming qemu_set_nonblock())
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
All uses flush output immediately before or after qemu_log_unlock.
Instead of a separate call, move the flush into qemu_log_unlock.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220417183019.755276-20-richard.henderson@linaro.org>
Inside qemu_log, we perform qemu_log_trylock/unlock, which need
not be done if we have already performed the lock beforehand.
Always check the result of qemu_log_trylock -- only checking
qemu_loglevel_mask races with the acquisition of the lock on
the logfile.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220417183019.755276-10-richard.henderson@linaro.org>
This function can fail, which makes it more like ftrylockfile
or pthread_mutex_trylock than flockfile or pthread_mutex_lock,
so rename it.
To closer match the other trylock functions, release rcu_read_lock
along the failure path, so that qemu_log_unlock need not be called
on failure.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220417183019.755276-8-richard.henderson@linaro.org>
Replace a config-time define with a compile time condition
define (compatible with clang and gcc) that must be declared prior to
its usage. This avoids having a global configure time define, but also
prevents from bad usage, if the config header wasn't included before.
This can help to make some code independent from qemu too.
gcc supports __BYTE_ORDER__ from about 4.6 and clang from 3.2.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
[ For the s390x parts I'm involved in ]
Acked-by: Halil Pasic <pasic@linux.ibm.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220323155743.1585078-7-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Move qemu_ether_ntoa() which is only needed in net/.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
The socket API wrappers were initially introduced in commit
00aa0040 ("Wrap recv to avoid warnings"), but made redundant with
commit a2d96af4 ("osdep: add wrappers for socket functions") which fixes
the win32 declarations and thus removed the earlier warnings.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Move the various memalign-related functions out of osdep.h and into
their own header, which we include only where they are used.
While we're doing this, add some brief documentation comments.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20220226180723.1706285-10-peter.maydell@linaro.org
The IPv6 option headers all have in common that they start with some
common fields, in particular the type of the next header followed by the
extention header length. This is used to traverse the list of the
options. The ESP header does not follow that format, which can break the
IPv6 option header traversal code in eth_parse_ipv6_hdr().
The effect of that is that network interfaces such as vmxnet3 that use
the following call chain
eth_is_ip6_extension_header_type
eth_parse_ipv6_hdr
net_tx_pkt_parse_headers
net_tx_pkt_parse
vmxnet3_process_tx_queue
to send packets from the VM out to the host will drop packets of the
following structure:
Ethernet-Header(IPv6-Header(ESP(encrypted data)))
Note that not all types of network interfaces use the net_tx_pkt_parse
function though, leading to inconsistent behavior regarding sending
those packets. The e1000 network interface for example does not suffer
from this limitation.
By not considering ESP to be an IPv6 header we can allow sending those
packets out to the host on all types of network interfaces.
Fixes: 75020a7021 ("Common definitions for VMWARE devices")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/149
Buglink: https://bugs.launchpad.net/qemu/+bug/1758091
Signed-off-by: Thomas Jansen <mithi@mithi.net>
Signed-off-by: Jason Wang <jasowang@redhat.com>
This patch is to improve the logic of QEMU main thread sleep code in
qemu_chr_write_buffer() where it can be blocked and can't run other
coroutines during COLO IO stress test.
Our approach is to put filter_send() in a coroutine. In this way,
filter_send() will call qemu_coroutine_yield() in qemu_co_sleep_ns(),
so that it can be scheduled out and QEMU main thread has opportunity to
run other tasks.
Signed-off-by: Lei Rao <lei.rao@intel.com>
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Zhang Chen <chen.zhang@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Make the comments consistent with the REGULAR_PACKET_CHECK_MS.
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
COLO-compare use the glib function g_queue_find_custom to dump
another VM's networking packet to compare. But this function always
start find from the queue->head(here is the newest packet), It will
reduce the success rate of comparison. So this patch reversed
the order of the queues for performance.
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Reported-by: leirao <lei.rao@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Match the other error handling in this function.
Fixes: e7b347d0bf ("net: detect errors from probing vnet hdr flag for TAP devices")
Reviewed-by: Patrick Venture <venture@google.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Peter Foley <pefoley@google.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
softmmu/rtc.c defines two public functions: qemu_get_timedate() and
qemu_timedate_diff(). Currently we keep the prototypes for these in
qemu-common.h, but most files don't need them. Move them to their
own header, a new include/sysemu/rtc.h.
Since the C files using these two functions did not need to include
qemu-common.h for any other reason, we can remove those include lines
when we add the include of the new rtc.h.
The license for the .h file follows that of the softmmu/rtc.c
where both the functions are defined.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Qemu falls back on userland handlers even if vhost-user and vhost-vdpa
cases. These assumes a tap device can handle the packets.
If a vdpa device fail to start, it can trigger a sigsegv because of
that. Add dummy receiver that returns no progress so it can keep
running.
Fixes: 1e0a84ea49 ("vhost-vdpa: introduce vhost-vdpa net client")
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Message-Id: <20211125101614.76927-2-eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
The TCP protocol ACK maybe bigger than uint32_t MAX.
At this time, the ACK will reverse to 0. This patch
fix the max_ack and min_ack track issue.
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>