This commit adds implementation of RX packets
coalescing, compatible with requirements of Windows
Hardware compatibility kit.
The device enables feature VIRTIO_NET_F_RSC_EXT in
host features if it supports extended RSC functionality
as defined in the specification.
This feature requires at least one of VIRTIO_NET_F_GUEST_TSO4,
VIRTIO_NET_F_GUEST_TSO6. Windows guest driver acks
this feature only if VIRTIO_NET_F_CTRL_GUEST_OFFLOADS
is also present.
If the guest driver acks VIRTIO_NET_F_RSC_EXT feature,
the device coalesces TCPv4 and TCPv6 packets (if
respective VIRTIO_NET_F_GUEST_TSO feature is on,
populates extended RSC information in virtio header
and sets VIRTIO_NET_HDR_F_RSC_INFO bit in header flags.
The device does not recalculate checksums in the coalesced
packet, so they are not valid.
In this case:
All the data packets in a tcp connection are cached
to a single buffer in every receive interval, and will
be sent out via a timer, the 'virtio_net_rsc_timeout'
controls the interval, this value may impact the
performance and response time of tcp connection,
50000(50us) is an experience value to gain a performance
improvement, since the whql test sends packets every 100us,
so '300000(300us)' passes the test case, it is the default
value as well, tune it via the command line parameter
'rsc_interval' within 'virtio-net-pci' device, for example,
to launch a guest with interval set as '500000':
'virtio-net-pci,netdev=hostnet1,bus=pci.0,id=net1,mac=00,
guest_rsc_ext=on,rsc_interval=500000'
The timer will only be triggered if the packets pool is not empty,
and it'll drain off all the cached packets.
'NetRscChain' is used to save the segments of IPv4/6 in a
VirtIONet device.
A new segment becomes a 'Candidate' as well as it passed sanity check,
the main handler of TCP includes TCP window update, duplicated
ACK check and the real data coalescing.
An 'Candidate' segment means:
1. Segment is within current window and the sequence is the expected one.
2. 'ACK' of the segment is in the valid window.
Sanity check includes:
1. Incorrect version in IP header
2. An IP options or IP fragment
3. Not a TCP packet
4. Sanity size check to prevent buffer overflow attack.
5. An ECN packet
Even though, there might more cases should be considered such as
ip identification other flags, while it breaks the test because
windows set it to the same even it's not a fragment.
Normally it includes 2 typical ways to handle a TCP control flag,
'bypass' and 'finalize', 'bypass' means should be sent out directly,
while 'finalize' means the packets should also be bypassed, but this
should be done after search for the same connection packets in the
pool and drain all of them out, this is to avoid out of order fragment.
All the 'SYN' packets will be bypassed since this always begin a new'
connection, other flags such 'URG/FIN/RST/CWR/ECE' will trigger a
finalization, because this normally happens upon a connection is going
to be closed, an 'URG' packet also finalize current coalescing unit.
Statistics can be used to monitor the basic coalescing status, the
'out of order' and 'out of window' means how many retransmitting packets,
thus describe the performance intuitively.
Difference between ip v4 and v6 processing:
Fragment length in ipv4 header includes itself, while it's not
included for ipv6, thus means ipv6 can carry a real 65535 payload.
Note that main goal of implementing this feature in software
is to create reference setup for certification tests. In such
setups guest migration is not required, so the coalesced packets
not yet delivered to the guest will be lost in case of migration.
Signed-off-by: Wei Xu <wexu@redhat.com>
Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
We try to detect and drop too large packet (>INT_MAX) in 1592a99470
("net: ignore packet size greater than INT_MAX") during packet
delivering. Unfortunately, this is not sufficient as we may hit
another integer overflow when trying to queue such large packet in
qemu_net_queue_append_iov():
- size of the allocation may overflow on 32bit
- packet->size is integer which may overflow even on 64bit
Fixing this by moving the check to qemu_sendv_packet_async() which is
the entrance of all networking codes and reduce the limit to
NET_BUFSIZE to be more conservative. This works since:
- For the callers that call qemu_sendv_packet_async() directly, they
only care about if zero is returned to determine whether to prevent
the source from producing more packets. A callback will be triggered
if peer can accept more then source could be enabled. This is
usually used by high speed networking implementation like virtio-net
or netmap.
- For the callers that call qemu_sendv_packet() that calls
qemu_sendv_packet_async() indirectly, they often ignore the return
value. In this case qemu will just the drop packets if peer can't
receive.
Qemu will copy the packet if it was queued. So it was safe for both
kinds of the callers to assume the packet was sent.
Since we move the check from qemu_deliver_packet_iov() to
qemu_sendv_packet_async(), it would be safer to make
qemu_deliver_packet_iov() static to prevent any external user in the
future.
This is a revised patch of CVE-2018-17963.
Cc: qemu-stable@nongnu.org
Cc: Li Qiang <liq3ea@163.com>
Fixes: 1592a99470 ("net: ignore packet size greater than INT_MAX")
Reported-by: Li Qiang <liq3ea@gmail.com>
Reviewed-by: Li Qiang <liq3ea@gmail.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-id: 20181204035347.6148-2-jasowang@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Filter needs to process the event of checkpoint/failover or
other event passed by COLO frame.
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Zhang Chen <zhangckid@gmail.com>
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
These options likely do not work as expected as soon as the user
tries to use more than one network interface at once. The parameters
have been marked as deprecated since QEMU v2.6, so users had plenty
of time to move their scripts to the new syntax. Time to remove the
old parameters now.
Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Acked-by: Peter Krempa <pkrempa@redhat.com>
Acked-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
It's been marked as deprecated since QEMU v2.9.0, so that should have
been enough time for everybody to either just drop unnecessary "vlan=0"
parameters, to switch to the modern -device + -netdev syntax for connecting
guest NICs with host network backends, or to switch to the "hubport" netdev
in case hubs are really wanted instead.
Buglink: https://bugs.launchpad.net/qemu/+bug/658904
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
If the backend could not transmit a packet right away for some reason,
the packet is queued for asynchronous sending. The corresponding vq
element is tracked in the async_tx.elem field of the VirtIONetQueue,
for later freeing when the transmission is complete.
If a reset happens before completion, virtio_net_tx_complete() will push
async_tx.elem back to the guest anyway, and we end up with the inuse flag
of the vq being equal to -1. The next call to virtqueue_pop() is then
likely to fail with "Virtqueue size exceeded".
This can be reproduced easily by starting a guest with an hubport backend
that is not connected to a functional network, eg,
-device virtio-net-pci,netdev=hub0 -netdev hubport,id=hub0,hubid=0
and no other -netdev hubport,hubid=0 on the command line.
The appropriate fix is to ensure that such an asynchronous transmission
cannot survive a device reset. So for all queues, we first try to send
the packet again, and eventually we purge it if the backend still could
not deliver it.
CC: qemu-stable@nongnu.org
Reported-by: R. Nageswara Sastry <nasastry@in.ibm.com>
Buglink: https://github.com/open-power-host-os/qemu/issues/37
Signed-off-by: Greg Kurz <groug@kaod.org>
Tested-by: R. Nageswara Sastry <nasastry@in.ibm.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Version: GnuPG v1
iQEcBAABAgAGBQJanLRTAAoJEO8Ells5jWIRAxsH/3wX62o+msLjJHakjcu2OTMG
vdhnB8GfjC5HgMYbovG7TJ95KXg7VRodwru9zgJheTK7DG8fG0nFRuzr8L2tSAph
3s0YTFYDXJ6MBYD//ubdX+jNnchIvMlTX6yheAzcXvQb+nCcN2efN0XpSlGR+g4D
wGi1lWKurGEJE6RUfYPpbbUkjXjbbKyclE0RL+WBmmyruerXI8OxuXQ3GuHK75fb
cZLiToyP9+qnnDyT4lceG5vGRjYLtL8t1nB01M4UTr+tlCkMMsoOjufzGB/CDUdm
oGi1OuKRw06xTLroUJ/uiwSKQH6dMbrv6sLvk92dHnL9k2ZAjv2bVAgz8eAG7o0=
=pPIZ
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/jasowang/tags/net-pull-request' into staging
# gpg: Signature made Mon 05 Mar 2018 03:06:59 GMT
# gpg: using RSA key EF04965B398D6211
# gpg: Good signature from "Jason Wang (Jason Wang on RedHat) <jasowang@redhat.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 215D 46F4 8246 689E C77F 3562 EF04 965B 398D 6211
* remotes/jasowang/tags/net-pull-request:
tap: setting error appropriately when calling net_init_tap_one()
hw/net: Remove unnecessary header includes
net: Add a new convenience option "--nic" to configure default/on-board NICs
net: Remove the deprecated 'host_net_add' and 'host_net_remove' HMP commands
net: Remove the deprecated way of dumping network packets
net: Make net_client_init() static
net: Only show vhost-user in the help text if CONFIG_POSIX is defined
net: List available netdevs with "-netdev help"
net: Move error reporting from net_init_client/netdev to the calling site
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
- Markus Armbruster: Modularize generated QAPI code
-----BEGIN PGP SIGNATURE-----
Comment: Public key at http://people.redhat.com/eblake/eblake.gpg
iQEcBAABCAAGBQJamar4AAoJEKeha0olJ0NqaOcH/js0SXjzfny/qNziZ69I4diF
+hYXpO3MB6XkTME32nFgWpvosG11YaWHkfYbhEgGRIBR8OfmdbnLI5k/1jo9xl/5
OWA/PxVvZ50kK7oVFg1MfX+wDrYOL3XRrwv52LVp9l/QqSSMbcHmt8xYhF1Yb7Ij
JAmcGBVZIeY4k4rXvq1r6TR5f4ItyzkJBPUBtaj2PruSmQPOy+LlOCAjMaO3a+pk
aiRcjbIRWodcC5p5eT7GEdbfEpzxB2iVkyI1SUvDuJKAJfDkBvCE3B/vY7SOdw7I
5rZ4wRSxNHeTX/cjtJSftTrrG/DC4nPTMFGgxlv6ElcP70HhWQFAxoJjdjf6HGw=
=oaW2
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/ericb/tags/pull-qapi-2018-03-01-v4' into staging
qapi patches for 2018-03-01
- Markus Armbruster: Modularize generated QAPI code
# gpg: Signature made Fri 02 Mar 2018 19:50:16 GMT
# gpg: using RSA key A7A16B4A2527436A
# gpg: Good signature from "Eric Blake <eblake@redhat.com>"
# gpg: aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>"
# gpg: aka "[jpeg image of size 6874]"
# Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2 F3AA A7A1 6B4A 2527 436A
* remotes/ericb/tags/pull-qapi-2018-03-01-v4: (30 commits)
qapi: Don't create useless directory qapi-generated
Fix up dangling references to qmp-commands.* in comment and doc
qapi: Move qapi-schema.json to qapi/, rename generated files
docs: Correct outdated information on QAPI
docs/devel/writing-qmp-commands: Update for modular QAPI
qapi: Empty out qapi-schema.json
Include less of the generated modular QAPI headers
qapi: Generate separate .h, .c for each module
watchdog: Consolidate QAPI into single file
qapi/common: Fix guardname() for funny filenames
qapi/types qapi/visit: Generate built-in stuff into separate files
qapi: Make code-generating visitors use QAPIGen more
qapi: Rename generated qmp-marshal.c to qmp-commands.c
qapi: Record 'include' directives in intermediate representation
qapi: Generate in source order
qapi: Record 'include' directives in parse tree
qapi: Concentrate QAPISchemaParser.exprs updates in .__init__()
qapi: Lift error reporting from QAPISchema.__init__() to callers
qapi/common: Eliminate QAPISchema.exprs
qapi: Improve include file name reporting in error messages
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
If netdev_add tap,id=net0,...,vhost=on failed in net_init_tap_one(),
the followed up device_add virtio-net-pci,netdev=net0 will fail
too, prints:
TUNSETOFFLOAD ioctl() failed: Bad file descriptor TUNSETOFFLOAD
ioctl() failed: Bad file descriptor
The reason is that the fd of tap is closed when error occured after
calling net_init_tap_one().
The fd should be closed when calling net_init_tap_one failed:
- if tap_set_sndbuf() failed
- if tap_set_sndbuf() succeeded but vhost failed to open or
initialize with vhostforce flag on
- with wrong vhost command line parameter
The fd should not be closed just because vhost failed to open or
initialize but without vhostforce flag. So the followed up
device_add can fall back to userspace virtio successfully.
Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Suggested-by: Igor Mammedov <imammedo@redhat.com>
Suggested-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jay Zhou <jianjay.zhou@huawei.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
The function is only used within net.c, so there's no need that
this is a global function.
While we're at it, also remove the unused prototype compute_mcast_idx()
(the function has been removed in commit d9caeb09b1).
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
It looks strange that net_init_client() and net_init_netdev() both
take an "Error **errp" parameter, but then do the error reporting
with "error_report_err(local_err)" on their own. Let's move the
error reporting to the calling site instead to simplify this code
a little bit.
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
In my "build everything" tree, a change to the types in
qapi-schema.json triggers a recompile of about 4800 out of 5100
objects.
The previous commit split up qmp-commands.h, qmp-event.h, qmp-visit.h,
qapi-types.h. Each of these headers still includes all its shards.
Reduce compile time by including just the shards we actually need.
To illustrate the benefits: adding a type to qapi/migration.json now
recompiles some 2300 instead of 4800 objects. The next commit will
improve it further.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20180211093607.27351-24-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
[eblake: rebase to master]
Signed-off-by: Eric Blake <eblake@redhat.com>
The CanBusState state structure is created for each
emulated CAN channel. Individual clients/emulated
CAN interfaces or host interface connection registers
to the bus by CanBusClientState structure.
The CAN core is prepared to support connection to the
real host CAN bus network. The commit with such support
for Linux SocketCAN follows.
Implementation is as simple as possible. There is no state to be
migrated, and messages prioritization and queuing are not considered
for now. But it is intended to be extended when need arises.
Development repository and more documentation at
https://gitlab.fel.cvut.cz/canbus/qemu-canbus
The work is based on Jin Yang GSoC 2013 work funded
by Google and mentored in frame of RTEMS project GSoC
slot donated to QEMU.
Rewritten for QEMU-2.0+ versions and architecture cleanup
by Pavel Pisa (Czech Technical University in Prague).
Signed-off-by: Pavel Pisa <pisa@cmp.felk.cvut.cz>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
qemu-common.h includes qemu/option.h, but most places that include the
former don't actually need the latter. Drop the include, and add it
to the places that actually need it.
While there, drop superfluous includes of both headers, and
separate #include from file comment with a blank line.
This cleanup makes the number of objects depending on qemu/option.h
drop from 4545 (out of 4743) to 284 in my "build everything" tree.
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20180201111846.21846-20-armbru@redhat.com>
[Semantic conflict with commit bdd6a90a9e in block/nvme.c resolved]
This cleanup makes the number of objects depending on qapi/qmp/qdict.h
drop from 4550 (out of 4743) to 368 in my "build everything" tree.
For qapi/qmp/qobject.h, the number drops from 4552 to 390.
While there, separate #include from file comment with a blank line.
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20180201111846.21846-13-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20180201111846.21846-4-armbru@redhat.com>
It has never been documented, so hardly anybody knows about this
parameter, and it is marked as deprecated since QEMU v2.6.
Time to let it go now.
Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
This provides a standard ethernet CRC32 little-endian implementation.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Separate out the standard ethernet CRC32 calculation into a new net_crc32()
function, renaming the constant POLYNOMIAL to POLYNOMIAL_BE to make it clear
that this is a big-endian CRC32 calculation.
As part of the constant rename, remove the duplicate definition of POLYNOMIAL
from eepro100.c and use the new POLYNOMIAL_BE constant instead.
Once this is complete remove the existing CRC32 implementation from
compute_mcast_idx() and call the new net_crc32() function in its place.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
The checksum algorithm used by IPv4, TCP and UDP allows a zero value
to be represented by either 0x0000 and 0xFFFF. But per RFC 768, a zero
UDP checksum must be transmitted as 0xFFFF because 0x0000 is a special
value meaning no checksum.
Substitute 0xFFFF whenever a checksum is computed as zero when
modifying a UDP datagram header. Doing this on IPv4 and TCP checksums
is unnecessary but legal. Add a wrapper for net_checksum_finish() that
makes the substitution.
(We can't just change net_checksum_finish(), as that function is also
used by receivers to verify checksums, and in that case the expected
value is always 0x0000.)
Signed-off-by: Ed Swierk <eswierk@skyportsystems.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
We add a flag to decide whether net_fill_rstate() need read
the vnet_hdr_len or not.
Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
Suggested-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Add vnet_hdr_len arguments in NetClientState
that make other module get real vnet_hdr_len easily.
Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
NC-SI (Network Controller Sideband Interface) enables a BMC to manage
a set of NICs on a system. This model takes the simplest approach and
reverses the NC-SI packets to pretend a NIC is present and exercise
the Linux driver.
The NCSI header file <ncsi-pkt.h> comes from mainline Linux and was
untabified.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Acked-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Make VLAN stripping functions return number of bytes
copied to given Ethernet header buffer.
This information should be used to re-compose
packet IOV after VLAN stripping.
Cc: qemu-stable@nongnu.org
Signed-off-by: Dmitry Fleytman <dmitry@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
This patch provides a way for virtio-net to notify the
backend about the host MTU set by the user.
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Aaron Conole <aconole@redhat.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Unused function declarations were found using a simple gcc plugin and
manually verified by grepping the sources.
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
is_netdev is only used as a bool, so make it one.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1468468228-27827-14-git-send-email-eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
This is a mostly-mechanical conversion that creates a new flat
union 'Netdev' QAPI type that covers all the branches of the
former 'NetClientOptions' simple union, where the branches are
now listed in a new 'NetClientDriver' enum rather than generated
from the simple union. The existence of a flat union has no
change to the command line syntax accepted for new code, and
will make it possible for a future patch to switch the QMP
command to parse a boxed union for no change to valid QMP; but
it does have some ripple effect on the C code when dealing with
the new types.
While making the conversion, note that the 'NetLegacy' type
remains unchanged: it applies only to legacy command line options,
and will not be ported to QMP, so it should remain a wrapper
around a simple union; to avoid confusion, the type named
'NetClientOptions' is now gone, and we introduce 'NetLegacyOptions'
in its place. Then, in the C code, we convert from NetLegacy to
Netdev as soon as possible, so that the bulk of the net stack
only has to deal with one QAPI type, not two. Note that since
the old legacy code always rejected 'hubport', we can just omit
that branch from the new 'NetLegacyOptions' simple union.
Based on an idea originally by Zoltán Kővágó <DirtY.iCE.hu@gmail.com>:
Message-Id: <01a527fbf1a5de880091f98cf011616a78adeeee.1441627176.git.DirtY.iCE.hu@gmail.com>
although the sed script in that patch no longer applies due to
other changes in the tree since then, and I also did some manual
cleanups (such as fixing whitespace to keep checkpatch happy).
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1468468228-27827-13-git-send-email-eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
[Fixup from Eric squashed in]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
This patch add the capability of basic vhost net busy polling which is
supported by recent kernel. User could configure the maximum number of
us that could be spent on busy polling through a new property of tap
"poll-us".
Cc: Greg Kurz <groug@kaod.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
A driver may change the vring enable state at run time but vhost-user
backend may not be present (a contrived example is when the backend is
disconnected and the device is reconfigured after driver rebinding)
Restore the vring state when the vhost-user backend is started, so it
can process the ring.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Tested-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Victor Kaplansky <victork@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The initial vhost-user connection sets the features to be negotiated
with the driver. Renegotiation isn't possible without device reset.
To handle reconnection of vhost-user backend, ensure the same set of
features are provided, and reuse already acked features.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Tested-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Victor Kaplansky <victork@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This patch extends the TX/RX packet abstractions with features that will
be used by the e1000e device implementation.
Changes are:
1. Support iovec lists for RX buffers
2. Deeper RX packets parsing
3. Loopback option for TX packets
4. Extended VLAN headers handling
5. RSS processing for RX packets
Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
These macros will be used by future commits introducing
e1000e device emulation and by vmxnet3 tracing code.
Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Dmitry Fleytman <dmitry.fleytman@ravellosystems.com>
Signed-off-by: Leonid Bloch <leonid.bloch@ravellosystems.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
This function is from net/socket.c, move it to net.c and net.h.
Add SocketReadState to make others reuse net_fill_rstate().
suggestion from jason.
v4:
- move 'rs->finalize = finalize' to rs_init()
v3:
- remove SocketReadState init callback
- put finalize callback to net_fill_rstate()
v2:
- rename ReadState to SocketReadState
- add SocketReadState init and finalize callback
v1:
- init patch
Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
All handling of defaults (default_* variables) is inside vl.c,
move default_net there too, so we can more easily refactor that
code later.
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Re-run scripts/clean-includes to apply the previous commit's
corrections and updates. Besides redundant qemu/typedefs.h, this only
finds a redundant config-host.h include in ui/egl-helpers.c. No idea
how that escaped the previous runs.
Some manual whitespace trimming around dropped includes squashed in.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
With this property, users can control if this filter is 'on'
or 'off'. The default behavior for filter is 'on'.
For some types of filters, they may need to react to status changing,
So here, we introduced status changing callback/notifier for filter class.
We will skip the disabled ('off') filter when delivering packets in net layer.
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Clean up includes so that osdep.h is included first and headers
which it implies are not included manually.
This commit was created with scripts/clean-includes.
NB: If this commit breaks compilation for your out-of-tree
patchseries or fork, then you need to make sure you add
#include "qemu/osdep.h" to any new .c files that you have.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
The properties of netfilter object could be changed by 'qom-set'
command, but the output of 'info network' command is not updated,
because it got the old information through nf->info_str, it will
not be updated while we change the value of netfilter's property.
Here we split a helper function that could collect the output
information for filter, and also remove the useless member
'info_str' from struct NetFilterState.
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Eric Blake <eblake@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Cc: Yang Hongyang <hongyang.yang@easystack.cn>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Previously, if we attach more than one filters for a single netdev,
both ingress and egress traffic will go through net filters in same
order like:
ingress: netdev ->filter1 ->filter2 ->...filter[n] ->emulated device
egress: emulated device ->filter1 ->filter2 ->...filter[n] ->netdev.
This is against the natural feeling and will complicate filters
configuration since in some scenes, we hope filters handle the egress
traffic in a reverse order. For example, in colo-proxy (will be
implemented later), we have a redirector filter and a colo-rewriter
filter, we need the filter behave like:
ingress(->)/egress(<-): chardev<->redirector<->colo-rewriter<->emulated device
Since both buffer filter and dump do not require strict order of
filters, this patch switches to always let egress traffic walk through
net filters in reverse to simplify the possible filters configuration
in the future.
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Reviewed-by: Yang Hongyang <hongyang.yang@easystack.cn>
Signed-off-by: Jason Wang <jasowang@redhat.com>
eth.h and slirp.h both define ETH_ALEN and ETH_P_IP
rtl8139.c and eth.h both define ETH_HLEN
Move the related constant (ETH_P_ARP) from slirp.h to eth.h, and
remove the duplicates; make slirp.h include eth.h
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
A new vhost user message is added to allow QEMU to ask to vhost user backend to
broadcast a fake RARP after live migration for guest without GUEST_ANNOUNCE
capability.
This new message is sent only if the backend supports the new
VHOST_USER_PROTOCOL_F_RARP protocol feature.
The payload of this new message is the MAC address of the guest (not known by
the backend). The MAC address is copied in the first 6 bytes of a u64 to avoid
to create a new payload message type.
This new message has no equivalent ioctl so a new callback is added in the
userOps structure to send the request.
Upon reception of this new message the vhost user backend must generate and
broadcast a fake RARP request to notify the migration is terminated.
Signed-off-by: Thibaut Collet <thibaut.collet@6wind.com>
[Rebased and fixed checkpatch errors - Marc-André]
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Thibaut Collet <thibaut.collet@6wind.com>
This will be used by buffer filter implementation later to
queue packets.
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
When execute "info network", print filter info also.
add a info_str member to NetFilterState, store specific filters
info.
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
add an API qemu_netfilter_pass_to_next() to pass the packet
to next filter.
Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>