We want to support memory devices that have a memory region container as
device memory region that maps multiple RAM memory regions. Let's start
by supporting memory devices that statically map multiple RAM memory
regions and, thereby, consume multiple memslots.
We already have one device that uses a container as device memory region:
NVDIMMs. However, a NVDIMM always ends up consuming exactly one memslot.
Let's add support for that by asking the memory device via a new
callback how many memslots it requires.
Message-ID: <20230926185738.277351-7-david@redhat.com>
Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Let's return the number of free slots instead of only checking if there
is a free slot. Required to support memory devices that consume multiple
memslots.
This is a preparation for memory devices that consume multiple memslots.
Message-ID: <20230926185738.277351-6-david@redhat.com>
Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Let's return the number of free slots instead of only checking if there
is a free slot. While at it, check all address spaces, which will also
consider SMM under x86 correctly.
This is a preparation for memory devices that consume multiple memslots.
Message-ID: <20230926185738.277351-5-david@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Let's fixup the documentation (e.g., removing traces of the ram_addr
parameter that no longer exists) and move it to the header file while at
it.
Message-ID: <20230926185738.277351-4-david@redhat.com>
Suggested-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Checking whether the memory regions are equal is sufficient: if they are
equal, then most certainly the contained fd is equal.
The whole vhost-user memslot handling is suboptimal and overly
complicated. We shouldn't have to lookup a RAM memory regions we got
notified about in vhost_user_get_mr_data() using a host pointer. But that
requires a bigger rework -- especially an alternative vhost_set_mem_table()
backend call that simply consumes MemoryRegionSections.
For now, let's just drop vhost_backend_can_merge().
Message-ID: <20230926185738.277351-3-david@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Having multiple vhost devices, some filtering out fd-less memslots and
some not, can mess up the "used_memslot" accounting. Consequently our
"free memslot" checks become unreliable and we might run out of free
memslots at runtime later.
An example sequence which can trigger a potential issue that involves
different vhost backends (vhost-kernel and vhost-user) and hotplugged
memory devices can be found at [1].
Let's make the filtering mechanism less generic and distinguish between
backends that support private memslots (without a fd) and ones that only
support shared memslots (with a fd). Track the used_memslots for both
cases separately and use the corresponding value when required.
Note: Most probably we should filter out MAP_PRIVATE fd-based RAM regions
(for example, via memory-backend-memfd,...,shared=off or as default with
memory-backend-file) as well. When not using MAP_SHARED, it might not work
as expected. Add a TODO for now.
[1] https://lkml.kernel.org/r/fad9136f-08d3-3fd9-71a1-502069c000cf@redhat.com
Message-ID: <20230926185738.277351-2-david@redhat.com>
Fixes: 988a27754b ("vhost: allow backends to filter memory sections")
Cc: Tiwei Bie <tiwei.bie@intel.com>
Acked-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
- enable more sbsa-ref tests in avocado
- add swtpm to the package lists
- reduce avocado noise in gitlab by limiting tests
- make docker engine choice driven by configure and enable override
- remove unneeded gcc suffix on some cross compilers
- fix some NULL returns in gdbstub
- improve locking in execlog plugin
- introduce the GDBFeature structure
- consistently set gdb_core_xml_file
- use cleaner escaping for gdb xml
- drop ancient gdb_has_xml() test
- disable multi-instruction GUSA emulation when plugins enabled
- fix some coverity issues in plugins
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEZoWumedRZ7yvyN81+9DbCVqeKkQFAmUmVJkACgkQ+9DbCVqe
KkTfNggAiS02FcL3faGjAN9+60xhvEQ3DJjI473hjvFWu0bSkQTjObcQqGc+V7Cw
9yNtnxOOWB6KdAU8At7HlVqiUXeyTCJB7Att5/UgNUZj63j+cs7PXb4p7cVCcJOc
17zni22tnmCBcC8wZaz0yj68jaftL3hz1QNUZOmv6CBt42q0+/4g1WKfaJ+w+SbK
T7cJEiMDObm8qeNAAXpDLB+9v3bRDxMZ8hFJ3p3CatQC8jbDrkuH7RrVPHDWiWQx
w0uXpUHlZEOVX23v6+iIoeb8YQW2bZI9UsfeyIHJlENaVgyL200LHgLvvAE4Qd63
dCtfQUZzj4t9sfoL4XgxaB7G4qtXTg==
=7PLI
-----END PGP SIGNATURE-----
Merge tag 'pull-omnibus-111023-1' of https://gitlab.com/stsquad/qemu into staging
testing, gdbstub and plugin updates
- enable more sbsa-ref tests in avocado
- add swtpm to the package lists
- reduce avocado noise in gitlab by limiting tests
- make docker engine choice driven by configure and enable override
- remove unneeded gcc suffix on some cross compilers
- fix some NULL returns in gdbstub
- improve locking in execlog plugin
- introduce the GDBFeature structure
- consistently set gdb_core_xml_file
- use cleaner escaping for gdb xml
- drop ancient gdb_has_xml() test
- disable multi-instruction GUSA emulation when plugins enabled
- fix some coverity issues in plugins
# -----BEGIN PGP SIGNATURE-----
#
# iQEzBAABCgAdFiEEZoWumedRZ7yvyN81+9DbCVqeKkQFAmUmVJkACgkQ+9DbCVqe
# KkTfNggAiS02FcL3faGjAN9+60xhvEQ3DJjI473hjvFWu0bSkQTjObcQqGc+V7Cw
# 9yNtnxOOWB6KdAU8At7HlVqiUXeyTCJB7Att5/UgNUZj63j+cs7PXb4p7cVCcJOc
# 17zni22tnmCBcC8wZaz0yj68jaftL3hz1QNUZOmv6CBt42q0+/4g1WKfaJ+w+SbK
# T7cJEiMDObm8qeNAAXpDLB+9v3bRDxMZ8hFJ3p3CatQC8jbDrkuH7RrVPHDWiWQx
# w0uXpUHlZEOVX23v6+iIoeb8YQW2bZI9UsfeyIHJlENaVgyL200LHgLvvAE4Qd63
# dCtfQUZzj4t9sfoL4XgxaB7G4qtXTg==
# =7PLI
# -----END PGP SIGNATURE-----
# gpg: Signature made Wed 11 Oct 2023 03:54:01 EDT
# gpg: using RSA key 6685AE99E75167BCAFC8DF35FBD0DB095A9E2A44
# gpg: Good signature from "Alex Bennée (Master Work Key) <alex.bennee@linaro.org>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 6685 AE99 E751 67BC AFC8 DF35 FBD0 DB09 5A9E 2A44
* tag 'pull-omnibus-111023-1' of https://gitlab.com/stsquad/qemu: (25 commits)
contrib/plugins: fix coverity warning in hotblocks
contrib/plugins: fix coverity warning in lockstep
contrib/plugins: fix coverity warning in cache
plugins: Set final instruction count in plugin_gen_tb_end
target/sh4: Disable decode_gusa when plugins enabled
accel/tcg: Add plugin_enabled to DisasContextBase
gdbstub: Replace gdb_regs with an array
gdbstub: Remove gdb_has_xml variable
target/ppc: Remove references to gdb_has_xml
target/arm: Remove references to gdb_has_xml
gdbstub: Use g_markup_printf_escaped()
hw/core/cpu: Return static value with gdb_arch_name()
target/arm: Move the reference to arm-core.xml
gdbstub: Introduce GDBFeature structure
contrib/plugins: Use GRWLock in execlog
plugins: Check if vCPU is realized
gdbstub: Fix target.xml response
gdbstub: Fix target_xml initialization
configure: remove gcc version suffixes
configure: allow user to override docker engine
...
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
It's just a simple wrapper for rp_sem on either wait() or kick(), make it
even clearer on how it is used. Prepared to be used even for other things.
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-ID: <20231004220240.167175-8-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Instead of only relying on the count of rp_sem, make the counter be part of
RAMState so it can be used in both threads to synchronize on the process.
rp_sem will be further reused in follow up patches, as a way to kick the
main thread, e.g., on recovery failures.
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231004220240.167175-7-peterx@redhat.com>
There're a lot of cases where we only have an errno set in last_error but
without a detailed error description. When this happens, try to generate
an error contains the errno as a descriptive error.
This will be helpful in cases where one relies on the Error*. E.g.,
migration state only caches Error* in MigrationState.error. With this,
we'll display correct error messages in e.g. query-migrate when the error
was only set by qemu_file_set_error().
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231004220240.167175-6-peterx@redhat.com>
Introduce a helper to detect whether MigrationState.error is set for
whatever reason.
This is preparation work for any thread (e.g. source return path thread) to
setup errors in an unified way to MigrationState, rather than relying on
its own way to set errors (mark_source_rp_bad()).
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231004220240.167175-3-peterx@redhat.com>
Display it as long as being set, irrelevant of FAILED status. E.g., it may
also be applicable to PAUSED stage of postcopy, to provide hint on what has
gone wrong.
The error_mutex seems to be overlooked when referencing the error, add it
to be very safe.
This will change QAPI behavior by showing up error message outside !FAILED
status, but it's intended and doesn't expect to break anyone.
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2018404
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231004220240.167175-2-peterx@redhat.com>
qemu_rdma_dump_id() dumps RDMA device details to stdout.
rdma_start_outgoing_migration() calls it via qemu_rdma_source_init()
and qemu_rdma_resolve_host() to show source device details.
rdma_start_incoming_migration() arranges its call via
rdma_accept_incoming_migration() and qemu_rdma_accept() to show
destination device details.
Two issues:
1. rdma_start_outgoing_migration() can run in HMP context. The
information should arguably go the monitor, not stdout.
2. ibv_query_port() failure is reported as error. Its callers remain
unaware of this failure (qemu_rdma_dump_id() can't fail), so
reporting this to the user as an error is problematic.
Fixable, but the device detail dump is noise, except when
troubleshooting. Tracing is a better fit. Similar function
qemu_rdma_dump_id() was converted to tracing in commit
733252deb8 (Tracify migration/rdma.c).
Convert qemu_rdma_dump_id(), too.
While there, touch up qemu_rdma_dump_gid()'s outdated comment.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-54-armbru@redhat.com>
error_report() obeys -msg, reports the current error location if any,
and reports to the current monitor if any. Reporting to stderr
directly with fprintf() or perror() is wrong, because it loses all
this.
Fix the offenders. Bonus: resolves a FIXME about problematic use of
errno.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-53-armbru@redhat.com>
Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job. When the caller does, the error is reported twice. When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.
qemu_rdma_source_init(), qemu_rdma_connect(),
rdma_start_incoming_migration(), and rdma_start_outgoing_migration()
violate this principle: they call error_report() via
qemu_rdma_cleanup().
Moreover, qemu_rdma_cleanup() can't fail. It is called on error
paths, and QIOChannel close and finalization. Are the conditions it
reports really errors? I doubt it.
Downgrade qemu_rdma_cleanup()'s errors to warnings.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-52-armbru@redhat.com>
Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job. When the caller does, the error is reported twice. When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.
qemu_rdma_write_one() violates this principle: it reports errors to
stderr via qemu_rdma_register_and_get_keys(). I elected not to
investigate how callers handle the error, i.e. precise impact is not
known.
Clean this up: silence qemu_rdma_register_and_get_keys(). I believe
the caller's error reports suffice. If they don't, we need to convert
to Error instead.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-51-armbru@redhat.com>
Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job. When the caller does, the error is reported twice. When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.
qemu_rdma_post_send_control(), qemu_rdma_exchange_get_response(), and
qemu_rdma_write_one() violate this principle: they call
error_report(), fprintf(stderr, ...), and perror() via
qemu_rdma_block_for_wrid(), qemu_rdma_poll(), and
qemu_rdma_wait_comp_channel(). I elected not to investigate how
callers handle the error, i.e. precise impact is not known.
Clean this up by dropping the error reporting from qemu_rdma_poll(),
qemu_rdma_wait_comp_channel(), and qemu_rdma_block_for_wrid(). I
believe the callers' error reports suffice. If they don't, we need to
convert to Error instead.
Bonus: resolves a FIXME about problematic use of errno.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-50-armbru@redhat.com>
When qemu_rdma_wait_comp_channel() receives an event from the
completion channel, it reports an error "receive cm event while wait
comp channel,cm event is T", where T is the numeric event type.
However, the function fails only when T is a disconnect or device
removal. Events other than these two are not actually an error, and
reporting them as an error is wrong. If we need to report them to the
user, we should use something else, and what to use depends on why we
need to report them to the user.
For now, report this error only when the function actually fails.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-49-armbru@redhat.com>
Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job. When the caller does, the error is reported twice. When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.
qemu_rdma_source_init() and qemu_rdma_accept() violate this principle:
they call error_report() via qemu_rdma_reg_control(). I elected not
to investigate how callers handle the error, i.e. precise impact is
not known.
Clean this up by dropping the error reporting from
qemu_rdma_reg_control(). I believe the callers' error reports
suffice. If they don't, we need to convert to Error instead.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-48-armbru@redhat.com>
Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job. When the caller does, the error is reported twice. When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.
qemu_rdma_connect() violates this principle: it calls error_report()
and perror(). I elected not to investigate how callers handle the
error, i.e. precise impact is not known.
Clean this up: replace perror() by changing error_setg() to
error_setg_errno(), and drop error_report(). I believe the callers'
error reports suffice then. If they don't, we need to convert to
Error instead.
Bonus: resolves a FIXME about problematic use of errno.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-47-armbru@redhat.com>
Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job. When the caller does, the error is reported twice. When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.
qemu_rdma_resolve_host() violates this principle: it calls
error_report().
Clean this up: drop error_report().
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-46-armbru@redhat.com>
Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job. When the caller does, the error is reported twice. When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.
qemu_rdma_source_init() violates this principle: it calls
error_report() via qemu_rdma_alloc_pd_cq(). I elected not to
investigate how callers handle the error, i.e. precise impact is not
known.
Clean this up by converting qemu_rdma_alloc_pd_cq() to Error.
The conversion loses a piece of advice on one of two failure paths:
Your mlock() limits may be too low. Please check $ ulimit -a # and search for 'ulimit -l' in the output
Not worth retaining.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-45-armbru@redhat.com>
Just for symmetry with qemu_rdma_post_send_control(). Error messages
lose detail I consider of no use to users.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-44-armbru@redhat.com>
Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job. When the caller does, the error is reported twice. When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.
qemu_rdma_exchange_send() violates this principle: it calls
error_report() via qemu_rdma_post_send_control(). I elected not to
investigate how callers handle the error, i.e. precise impact is not
known.
Clean this up by converting qemu_rdma_post_send_control() to Error.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-43-armbru@redhat.com>
Just for consistency with qemu_rdma_write_one() and
qemu_rdma_write_flush(), and for slightly simpler code.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-42-armbru@redhat.com>
Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job. When the caller does, the error is reported twice. When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.
qemu_rdma_write_flush() violates this principle: it calls
error_report() via qemu_rdma_write_one(). I elected not to
investigate how callers handle the error, i.e. precise impact is not
known.
Clean this up by converting qemu_rdma_write_one() to Error. Bonus:
resolves a FIXME about problematic use of errno.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-41-armbru@redhat.com>
Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job. When the caller does, the error is reported twice. When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.
qio_channel_rdma_writev() violates this principle: it calls
error_report() via qemu_rdma_write_flush(). I elected not to
investigate how callers handle the error, i.e. precise impact is not
known.
Clean this up by converting qemu_rdma_write_flush() to Error.
Necessitates setting an error when qemu_rdma_write_one() failed.
Since this error will go away later in this series, simply use "FIXME
temporary error message" there.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-40-armbru@redhat.com>
Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job. When the caller does, the error is reported twice. When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.
qemu_rdma_exchange_send() violates this principle: it calls
error_report() via callback qemu_rdma_reg_whole_ram_blocks(). I
elected not to investigate how callers handle the error, i.e. precise
impact is not known.
Clean this up by converting the callback to Error.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-39-armbru@redhat.com>
Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job. When the caller does, the error is reported twice. When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.
qemu_rdma_exchange_send() and qemu_rdma_exchange_recv() violate this
principle: they call error_report() via
qemu_rdma_exchange_get_response(). I elected not to investigate how
callers handle the error, i.e. precise impact is not known.
Clean this up by converting qemu_rdma_exchange_get_response() to
Error.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-38-armbru@redhat.com>
Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job. When the caller does, the error is reported twice. When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.
qio_channel_rdma_writev() violates this principle: it calls
error_report() via qemu_rdma_exchange_send(). I elected not to
investigate how callers handle the error, i.e. precise impact is not
known.
Clean this up by converting qemu_rdma_exchange_send() to Error.
Necessitates setting an error when qemu_rdma_post_recv_control(),
callback(), or qemu_rdma_exchange_get_response() failed. Since these
errors will go away later in this series, simply use "FIXME temporary
error message" there.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-37-armbru@redhat.com>
Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job. When the caller does, the error is reported twice. When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.
qio_channel_rdma_readv() violates this principle: it calls
error_report() via qemu_rdma_exchange_recv(). I elected not to
investigate how callers handle the error, i.e. precise impact is not
known.
Clean this up by converting qemu_rdma_exchange_recv() to Error.
Necessitates setting an error when qemu_rdma_exchange_get_response()
failed. Since this error will go away later in this series, simply
use "FIXME temporary error message" there.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-36-armbru@redhat.com>
These guards are all redundant now.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-35-armbru@redhat.com>
qemu_rdma_resolve_host() and qemu_rdma_dest_init() iterate over
addresses to find one that works, holding onto the first Error from
qemu_rdma_broken_ipv6_kernel() for use when no address works. Issues:
1. If @errp was &error_abort or &error_fatal, we'd terminate instead
of trying the next address. Can't actually happen, since no caller
passes these arguments.
2. When @errp is a pointer to a variable containing NULL, and
qemu_rdma_broken_ipv6_kernel() fails, the variable no longer
contains NULL. Subsequent iterations pass it again, violating
Error usage rules. Dangerous, as setting an error would then trip
error_setv()'s assertion. Works only because
qemu_rdma_broken_ipv6_kernel() and the code following the loops
carefully avoids setting a second error.
3. If qemu_rdma_broken_ipv6_kernel() fails, and then a later iteration
finds a working address, @errp still holds the first error from
qemu_rdma_broken_ipv6_kernel(). If we then run into another error,
we report the qemu_rdma_broken_ipv6_kernel() failure instead.
4. If we don't run into another error, we leak the Error object.
Use a local error variable, and propagate to @errp. This fixes 3. and
also cleans up 1 and partly 2.
Free this error when we have a working address. This fixes 4.
Pass the local error variable to qemu_rdma_broken_ipv6_kernel() only
until it fails. Pass null on any later iterations. This cleans up
the remainder of 2.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-34-armbru@redhat.com>
ERROR() has become "error_setg() unless an error has been set
already". Hiding the conditional in the macro is in the way of
further work. Replace the macro uses by their expansion, and delete
the macro.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-33-armbru@redhat.com>
Functions that use an Error **errp parameter to return errors should
not also report them to the user, because reporting is the caller's
job. When the caller does, the error is reported twice. When it
doesn't (because it recovered from the error), there is no error to
report, i.e. the report is bogus.
Macro ERROR() violates this principle. Delete the error_report()
there.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Tested-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-32-armbru@redhat.com>
When migration capability @rdma-pin-all is true, but the server cannot
honor it, qemu_rdma_connect() calls macro ERROR(), then returns
success.
ERROR() sets an error. Since qemu_rdma_connect() returns success, its
caller rdma_start_outgoing_migration() duly assumes @errp is still
clear. The Error object leaks.
ERROR() additionally reports the situation to the user as an error:
RDMA ERROR: Server cannot support pinning all memory. Will register memory dynamically.
Is this an error or not? It actually isn't; we disable @rdma-pin-all
and carry on. "Correcting" the user's configuration decisions that
way feels problematic, but that's a topic for another day.
Replace ERROR() by warn_report(). This plugs the memory leak, and
emits a clearer message to the user.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-31-armbru@redhat.com>
When a function returns 0 on success, negative value on error,
checking for non-zero suffices, but checking for negative is clearer.
So do that.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-30-armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-29-armbru@redhat.com>
All we do with the value of RDMAContext member @error_state is test
whether it's zero. Change to bool and rename to @errored.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-28-armbru@redhat.com>
This is just to make the error value more obvious. Callers don't
mind.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-27-armbru@redhat.com>
Several functions return negative errno codes on failure. Callers
check for specific codes exactly never. For some of the functions,
callers couldn't check even if they wanted to, because the functions
also return negative values that aren't errno codes, leaving readers
confused on what the function actually returns.
Clean up and simplify: return -1 instead of negative errno code.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-26-armbru@redhat.com>
rdma_getaddrinfo() returns 0 on success. On error, it returns one of
the EAI_ error codes like getaddrinfo() does, or -1 with errno set.
This is broken by design: POSIX implicitly specifies the EAI_ error
codes to be non-zero, no more. They could clash with -1. Nothing we
can do about this design flaw.
Both callers of rdma_getaddrinfo() only recognize negative values as
error. Works only because systems elect to make the EAI_ error codes
negative.
Best not to rely on that: change the callers to treat any non-zero
value as failure. Also change them to return -1 instead of the value
received from getaddrinfo() on failure, to avoid positive error
values.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-25-armbru@redhat.com>
The QEMUFileHooks methods don't come with a written contract. Digging
through the code calling them, we find:
* save_page():
Negative values RAM_SAVE_CONTROL_DELAYED and
RAM_SAVE_CONTROL_NOT_SUPP are special. Any other negative value is
an unspecified error.
qemu_rdma_save_page() returns -EIO or rdma->error_state on error. I
believe the latter is always negative. Nothing stops either of them
to clash with the special values, though. Feels unlikely, but fix
it anyway to return only the special values and -1.
* before_ram_iterate(), after_ram_iterate():
Negative value means error. qemu_rdma_registration_start() and
qemu_rdma_registration_stop() comply as far as I can tell. Make
them comply *obviously*, by returning -1 on error.
* hook_ram_load:
Negative value means error. rdma_load_hook() already returns -1 on
error. Leave it alone.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-24-armbru@redhat.com>
qemu_rdma_data_init() neglects to set an Error when it fails because
@host_port is null. Fortunately, no caller passes null, so this is
merely a latent bug. Drop the flawed code handling null argument.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-23-armbru@redhat.com>
qemu_get_cm_event_timeout() neglects to set an error when it fails
because rdma_get_cm_event() fails. Harmless, as its caller
qemu_rdma_connect() substitutes a generic error then. Fix it anyway.
qemu_rdma_connect() also sets the generic error when its own call of
rdma_get_cm_event() fails. Make the error handling more obvious: set
a specific error right after rdma_get_cm_event() fails. Delete the
generic error.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-22-armbru@redhat.com>
qemu_rdma_resolve_host() and qemu_rdma_dest_init() try addresses until
they find on that works. If none works, they return the first Error
set by qemu_rdma_broken_ipv6_kernel(), or else return a generic one.
qemu_rdma_broken_ipv6_kernel() neglects to set an Error when
ibv_open_device() fails. If a later address fails differently, we use
that Error instead, or else the generic one. Harmless enough, but
needs fixing all the same.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-21-armbru@redhat.com>
Hiding return statements in macros is a bad idea. Use a function
instead, and open code the return part.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-20-armbru@redhat.com>
QIOChannelClass methods qio_channel_rdma_readv() and
qio_channel_rdma_writev() violate their method contract when
rdma->error_state is non-zero:
1. They return whatever is in rdma->error_state then. Only -1 will be
fine. -2 will be misinterpreted as "would block". Anything less
than -2 isn't defined in the contract. A positive value would be
misinterpreted as success, but I believe that's not actually
possible.
2. They neglect to set an error then. If something up the call stack
dereferences the error when failure is returned, it will crash. If
it ignores the return value and checks the error instead, it will
miss the error.
Crap like this happens when return statements hide in macros,
especially when their uses are far away from the definition.
I elected not to investigate how callers are impacted.
Expand the two bad macro uses, so we can set an error and return -1.
The next commit will then get rid of the macro altogether.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20230928132019.2544702-19-armbru@redhat.com>