test-bdrv-drain: avoid race with BH in IOThread drain test

This patch fixes a race condition in test-bdrv-drain that is difficult
to reproduce. test-bdrv-drain sometimes fails without an error message
on the block pull request sent by Kevin Wolf on Sep 4, 2023. I was able
to reproduce it locally and found that "block-backend: process I/O in
the current AioContext" (in this patch series) is the first commit where
it reproduces.

I do not know why "block-backend: process I/O in the current AioContext"
exposes this bug. It might be related to the fact that the test's preadv
request runs in the main thread instead of IOThread a after my commit.
That might simply change the timing of the test.

Now on to the race condition in test-bdrv-drain. The main thread
schedules a BH in IOThread a and then drains the BDS:

  aio_bh_schedule_oneshot(ctx_a, test_iothread_main_thread_bh, &data);

  /* The request is running on the IOThread a. Draining its block device
   * will make sure that it has completed as far as the BDS is concerned,
   * but the drain in this thread can continue immediately after
   * bdrv_dec_in_flight() and aio_ret might be assigned only slightly
   * later. */
  do_drain_begin(drain_type, bs);

If the BH completes before do_drain_begin() then there is nothing to
worry about.

If the BH invokes bdrv_flush() before do_drain_begin(), then
do_drain_begin() waits for it to complete.

The problematic case is when do_drain_begin() runs before the BH enters
bdrv_flush(). Then do_drain_begin() misses the BH and the drain
mechanism has failed in quiescing I/O.

Fix this by incrementing the in_flight counter so that do_drain_begin()
waits for test_iothread_main_thread_bh().

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230912231037.826804-3-stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This commit is contained in:
Stefan Hajnoczi 2023-09-12 19:10:34 -04:00 committed by Kevin Wolf
parent 652b0dd808
commit c8bf923d5e

View File

@ -512,6 +512,7 @@ static void test_iothread_main_thread_bh(void *opaque)
* executed during drain, otherwise this would deadlock. */
aio_context_acquire(bdrv_get_aio_context(data->bs));
bdrv_flush(data->bs);
bdrv_dec_in_flight(data->bs); /* incremented by test_iothread_common() */
aio_context_release(bdrv_get_aio_context(data->bs));
}
@ -583,6 +584,13 @@ static void test_iothread_common(enum drain_type drain_type, int drain_thread)
aio_context_acquire(ctx_a);
}
/*
* Increment in_flight so that do_drain_begin() waits for
* test_iothread_main_thread_bh(). This prevents the race between
* test_iothread_main_thread_bh() in IOThread a and do_drain_begin() in
* this thread. test_iothread_main_thread_bh() decrements in_flight.
*/
bdrv_inc_in_flight(bs);
aio_bh_schedule_oneshot(ctx_a, test_iothread_main_thread_bh, &data);
/* The request is running on the IOThread a. Draining its block device