migration/multifd: Don't fsync when closing QIOChannelFile

Commit bc38feddeb ("io: fsync before closing a file channel") added an
fsync/fdatasync call when closing a QIOChannelFile, to ensure the
integrity of the migration stream in case of a QEMU crash.

Doing the sync in qio_channel_close() was not the best choice: that
function runs in the main thread, so the fsync can cause QEMU to hang
for several minutes, depending on the migration size and disk speed.

To fix the hang, remove the fsync from qio_channel_file_close().

At this moment, the migration code is the only user of the fsync and
we're taking the tradeoff of not having a sync at all, leaving the
responsibility to the upper layers.
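
As a rough illustration of what "the upper layers" could do, the sketch
below shows a management application flushing the migration file itself
once QEMU reports that the migration has completed. The helper name
flush_migration_file() and the file path are hypothetical and not part
of QEMU; only the POSIX calls open(), fsync() and close() are real. The
file is opened for writing because POSIX allows fsync-style calls to
fail on descriptors that are not open for writing.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper for a management application (not a QEMU API).
 * Flushes cached data and metadata of an already written migration
 * file to stable storage. Returns 0 on success or a negative errno.
 */
static int flush_migration_file(const char *path)
{
    int fd, err;

    assert(path != NULL);

    fd = open(path, O_WRONLY);
    if (fd < 0) {
        return -errno;
    }

    /* This may block for a long time, so call it off any latency-sensitive thread. */
    if (fsync(fd) < 0) {
        err = errno;
        close(fd);
        return -err;
    }

    if (close(fd) < 0) {
        return -errno;
    }
    return 0;
}
```

A caller would invoke flush_migration_file("/path/to/migration.file")
after the migration is reported as completed and treat a negative
return value as "the file may not be durable yet".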

Fixes: bc38feddeb ("io: fsync before closing a file channel")
Reviewed-by: "Daniel P. Berrangé" <berrange@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240305195629.9922-1-farosas@suse.de
Link: https://lore.kernel.org/r/20240305174332.2553-1-farosas@suse.de
[peterx: add more comment to the qio_channel_close()]
Signed-off-by: Peter Xu <peterx@redhat.com>
Fabiano Rosas 2024-03-05 16:56:29 -03:00 committed by Peter Xu
parent e6e08e8323
commit 61dec06082
3 changed files with 21 additions and 15 deletions


@@ -44,7 +44,8 @@ over any transport.
 - file migration: do the migration using a file that is passed to QEMU
   by path. A file offset option is supported to allow a management
   application to add its own metadata to the start of the file without
-  QEMU interference.
+  QEMU interference. Note that QEMU does not flush cached file
+  data/metadata at the end of migration.
 
 In addition, support is included for migration using RDMA, which
 transports the page data using ``RDMA``, where the hardware takes care of


@@ -242,11 +242,6 @@ static int qio_channel_file_close(QIOChannel *ioc,
 {
     QIOChannelFile *fioc = QIO_CHANNEL_FILE(ioc);
 
-    if (qemu_fdatasync(fioc->fd) < 0) {
-        error_setg_errno(errp, errno,
-                         "Unable to synchronize file data with storage device");
-        return -1;
-    }
     if (qemu_close(fioc->fd) < 0) {
         error_setg_errno(errp, errno,
                          "Unable to close file");


@@ -710,16 +710,26 @@ static bool multifd_send_cleanup_channel(MultiFDSendParams *p, Error **errp)
     if (p->c) {
         migration_ioc_unregister_yank(p->c);
         /*
-         * An explicit close() on the channel here is normally not
-         * required, but can be helpful for "file:" iochannels, where it
-         * will include fdatasync() to make sure the data is flushed to the
-         * disk backend.
+         * The object_unref() cannot guarantee the fd will always be
+         * released because finalize() of the iochannel is only
+         * triggered on the last reference and it's not guaranteed
+         * that we always hold the last refcount when reaching here.
          *
-         * The object_unref() cannot guarantee that because: (1) finalize()
-         * of the iochannel is only triggered on the last reference, and
-         * it's not guaranteed that we always hold the last refcount when
-         * reaching here, and, (2) even if finalize() is invoked, it only
-         * does a close(fd) without data flush.
+         * Closing the fd explicitly has the benefit that if there is any
+         * registered I/O handler callbacks on such fd, that will get a
+         * POLLNVAL event and will further trigger the cleanup to finally
+         * release the IOC.
+         *
+         * FIXME: It should logically be guaranteed that all multifd
+         * channels have no I/O handler callback registered when reaching
+         * here, because migration thread will wait for all multifd channel
+         * establishments to complete during setup. Since
+         * migrate_fd_cleanup() will be scheduled in main thread too, all
+         * previous callbacks should guarantee to be completed when
+         * reaching here. See multifd_send_state.channels_created and its
+         * usage. In the future, we could replace this with an assert
+         * making sure we're the last reference, or simply drop it if above
+         * is more clear to be justified.
          */
         qio_channel_close(p->c, &error_abort);
         object_unref(OBJECT(p->c));