2012-05-09 16:09:46 +02:00
|
|
|
/*
|
|
|
|
* Ratelimiting calculations
|
|
|
|
*
|
|
|
|
* Copyright IBM, Corp. 2011
|
|
|
|
*
|
|
|
|
* Authors:
|
|
|
|
* Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
|
|
|
|
*
|
|
|
|
* This work is licensed under the terms of the GNU LGPL, version 2 or later.
|
|
|
|
* See the COPYING.LIB file in the top-level directory.
|
|
|
|
*
|
|
|
|
*/
|
|
|
|
|
|
|
|
#ifndef QEMU_RATELIMIT_H
|
2016-06-29 15:29:06 +02:00
|
|
|
#define QEMU_RATELIMIT_H
|
2012-05-09 16:09:46 +02:00
|
|
|
|
2021-04-13 10:20:32 +02:00
|
|
|
#include "qemu/lockable.h"
|
2019-08-12 07:23:31 +02:00
|
|
|
#include "qemu/timer.h"
|
|
|
|
|
2012-05-09 16:09:46 +02:00
|
|
|
typedef struct {
|
2021-04-13 10:20:32 +02:00
|
|
|
QemuMutex lock;
|
Improve block job rate limiting for small bandwidth values
ratelimit_calculate_delay() previously reset the accounting every time
slice, no matter how much data had been processed before. This had (at
least) two consequences:
1. The minimum speed is rather large, e.g. 5 MiB/s for commit and stream.
Not sure if there are real-world use cases where this would be a
problem. Mirroring and backup over a slow link (e.g. DSL) would
come to mind, though.
2. Tests for block job operations (e.g. cancel) were rather racy
All block jobs currently use a time slice of 100ms. That's a
reasonable value to get smooth output during regular
operation. However this also meant that the state of block jobs
changed every 100ms, no matter how low the configured limit was. On
busy hosts, qemu often transferred additional chunks until the test
case had a chance to cancel the job.
Fix the block job rate limit code to delay for more than one time
slice to address the above issues. To make it easier to handle
oversized chunks we switch the semantics from returning a delay
_before_ the current request to a delay _after_ the current
request. If necessary, this delay consists of multiple time slice
units.
Since the mirror job sends multiple chunks in one go even if the rate
limit was exceeded in between, we need to keep track of the start of
the current time slice so we can correctly re-compute the delay for
the updated amount of data.
The minimum bandwidth now is 1 data unit per time slice. The block
jobs are currently passing the amount of data transferred in sectors
and using 100ms time slices, so this translates to 5120
bytes/second. With chunk sizes usually being O(512KiB), tests have
plenty of time (O(100s)) to operate on block jobs. The chance of a
race condition now is fairly remote, except possibly on insanely
loaded systems.
Signed-off-by: Sascha Silbe <silbe@linux.vnet.ibm.com>
Message-id: 1467127721-9564-2-git-send-email-silbe@linux.vnet.ibm.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-06-28 17:28:41 +02:00
|
|
|
int64_t slice_start_time;
|
|
|
|
int64_t slice_end_time;
|
2012-05-09 16:09:46 +02:00
|
|
|
uint64_t slice_quota;
|
|
|
|
uint64_t slice_ns;
|
|
|
|
uint64_t dispatched;
|
|
|
|
} RateLimit;
|
|
|
|
|
Improve block job rate limiting for small bandwidth values
ratelimit_calculate_delay() previously reset the accounting every time
slice, no matter how much data had been processed before. This had (at
least) two consequences:
1. The minimum speed is rather large, e.g. 5 MiB/s for commit and stream.
Not sure if there are real-world use cases where this would be a
problem. Mirroring and backup over a slow link (e.g. DSL) would
come to mind, though.
2. Tests for block job operations (e.g. cancel) were rather racy
All block jobs currently use a time slice of 100ms. That's a
reasonable value to get smooth output during regular
operation. However this also meant that the state of block jobs
changed every 100ms, no matter how low the configured limit was. On
busy hosts, qemu often transferred additional chunks until the test
case had a chance to cancel the job.
Fix the block job rate limit code to delay for more than one time
slice to address the above issues. To make it easier to handle
oversized chunks we switch the semantics from returning a delay
_before_ the current request to a delay _after_ the current
request. If necessary, this delay consists of multiple time slice
units.
Since the mirror job sends multiple chunks in one go even if the rate
limit was exceeded in between, we need to keep track of the start of
the current time slice so we can correctly re-compute the delay for
the updated amount of data.
The minimum bandwidth now is 1 data unit per time slice. The block
jobs are currently passing the amount of data transferred in sectors
and using 100ms time slices, so this translates to 5120
bytes/second. With chunk sizes usually being O(512KiB), tests have
plenty of time (O(100s)) to operate on block jobs. The chance of a
race condition now is fairly remote, except possibly on insanely
loaded systems.
Signed-off-by: Sascha Silbe <silbe@linux.vnet.ibm.com>
Message-id: 1467127721-9564-2-git-send-email-silbe@linux.vnet.ibm.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-06-28 17:28:41 +02:00
|
|
|
/** Calculate and return delay for next request in ns
|
|
|
|
*
|
2017-07-07 14:44:39 +02:00
|
|
|
* Record that we sent @n data units (where @n matches the scale chosen
|
|
|
|
* during ratelimit_set_speed). If we may send more data units
|
Improve block job rate limiting for small bandwidth values
ratelimit_calculate_delay() previously reset the accounting every time
slice, no matter how much data had been processed before. This had (at
least) two consequences:
1. The minimum speed is rather large, e.g. 5 MiB/s for commit and stream.
Not sure if there are real-world use cases where this would be a
problem. Mirroring and backup over a slow link (e.g. DSL) would
come to mind, though.
2. Tests for block job operations (e.g. cancel) were rather racy
All block jobs currently use a time slice of 100ms. That's a
reasonable value to get smooth output during regular
operation. However this also meant that the state of block jobs
changed every 100ms, no matter how low the configured limit was. On
busy hosts, qemu often transferred additional chunks until the test
case had a chance to cancel the job.
Fix the block job rate limit code to delay for more than one time
slice to address the above issues. To make it easier to handle
oversized chunks we switch the semantics from returning a delay
_before_ the current request to a delay _after_ the current
request. If necessary, this delay consists of multiple time slice
units.
Since the mirror job sends multiple chunks in one go even if the rate
limit was exceeded in between, we need to keep track of the start of
the current time slice so we can correctly re-compute the delay for
the updated amount of data.
The minimum bandwidth now is 1 data unit per time slice. The block
jobs are currently passing the amount of data transferred in sectors
and using 100ms time slices, so this translates to 5120
bytes/second. With chunk sizes usually being O(512KiB), tests have
plenty of time (O(100s)) to operate on block jobs. The chance of a
race condition now is fairly remote, except possibly on insanely
loaded systems.
Signed-off-by: Sascha Silbe <silbe@linux.vnet.ibm.com>
Message-id: 1467127721-9564-2-git-send-email-silbe@linux.vnet.ibm.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-06-28 17:28:41 +02:00
|
|
|
* in the current time slice, return 0 (i.e. no delay). Otherwise
|
|
|
|
* return the amount of time (in ns) until the start of the next time
|
|
|
|
* slice that will permit sending the next chunk of data.
|
|
|
|
*
|
|
|
|
* Recording sent data units even after exceeding the quota is
|
|
|
|
* permitted; the time slice will be extended accordingly.
|
|
|
|
*/
|
2012-05-09 16:09:46 +02:00
|
|
|
static inline int64_t ratelimit_calculate_delay(RateLimit *limit, uint64_t n)
|
|
|
|
{
|
2013-08-21 17:03:08 +02:00
|
|
|
int64_t now = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
|
2018-02-07 08:17:58 +01:00
|
|
|
double delay_slices;
|
2012-05-09 16:09:46 +02:00
|
|
|
|
2021-04-13 10:20:32 +02:00
|
|
|
QEMU_LOCK_GUARD(&limit->lock);
|
2021-06-14 10:11:26 +02:00
|
|
|
if (!limit->slice_quota) {
|
|
|
|
/* Throttling disabled. */
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
assert(limit->slice_ns);
|
Improve block job rate limiting for small bandwidth values
ratelimit_calculate_delay() previously reset the accounting every time
slice, no matter how much data had been processed before. This had (at
least) two consequences:
1. The minimum speed is rather large, e.g. 5 MiB/s for commit and stream.
Not sure if there are real-world use cases where this would be a
problem. Mirroring and backup over a slow link (e.g. DSL) would
come to mind, though.
2. Tests for block job operations (e.g. cancel) were rather racy
All block jobs currently use a time slice of 100ms. That's a
reasonable value to get smooth output during regular
operation. However this also meant that the state of block jobs
changed every 100ms, no matter how low the configured limit was. On
busy hosts, qemu often transferred additional chunks until the test
case had a chance to cancel the job.
Fix the block job rate limit code to delay for more than one time
slice to address the above issues. To make it easier to handle
oversized chunks we switch the semantics from returning a delay
_before_ the current request to a delay _after_ the current
request. If necessary, this delay consists of multiple time slice
units.
Since the mirror job sends multiple chunks in one go even if the rate
limit was exceeded in between, we need to keep track of the start of
the current time slice so we can correctly re-compute the delay for
the updated amount of data.
The minimum bandwidth now is 1 data unit per time slice. The block
jobs are currently passing the amount of data transferred in sectors
and using 100ms time slices, so this translates to 5120
bytes/second. With chunk sizes usually being O(512KiB), tests have
plenty of time (O(100s)) to operate on block jobs. The chance of a
race condition now is fairly remote, except possibly on insanely
loaded systems.
Signed-off-by: Sascha Silbe <silbe@linux.vnet.ibm.com>
Message-id: 1467127721-9564-2-git-send-email-silbe@linux.vnet.ibm.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-06-28 17:28:41 +02:00
|
|
|
|
|
|
|
if (limit->slice_end_time < now) {
|
|
|
|
/* Previous, possibly extended, time slice finished; reset the
|
|
|
|
* accounting. */
|
|
|
|
limit->slice_start_time = now;
|
|
|
|
limit->slice_end_time = now + limit->slice_ns;
|
2012-05-09 16:09:46 +02:00
|
|
|
limit->dispatched = 0;
|
|
|
|
}
|
Improve block job rate limiting for small bandwidth values
ratelimit_calculate_delay() previously reset the accounting every time
slice, no matter how much data had been processed before. This had (at
least) two consequences:
1. The minimum speed is rather large, e.g. 5 MiB/s for commit and stream.
Not sure if there are real-world use cases where this would be a
problem. Mirroring and backup over a slow link (e.g. DSL) would
come to mind, though.
2. Tests for block job operations (e.g. cancel) were rather racy
All block jobs currently use a time slice of 100ms. That's a
reasonable value to get smooth output during regular
operation. However this also meant that the state of block jobs
changed every 100ms, no matter how low the configured limit was. On
busy hosts, qemu often transferred additional chunks until the test
case had a chance to cancel the job.
Fix the block job rate limit code to delay for more than one time
slice to address the above issues. To make it easier to handle
oversized chunks we switch the semantics from returning a delay
_before_ the current request to a delay _after_ the current
request. If necessary, this delay consists of multiple time slice
units.
Since the mirror job sends multiple chunks in one go even if the rate
limit was exceeded in between, we need to keep track of the start of
the current time slice so we can correctly re-compute the delay for
the updated amount of data.
The minimum bandwidth now is 1 data unit per time slice. The block
jobs are currently passing the amount of data transferred in sectors
and using 100ms time slices, so this translates to 5120
bytes/second. With chunk sizes usually being O(512KiB), tests have
plenty of time (O(100s)) to operate on block jobs. The chance of a
race condition now is fairly remote, except possibly on insanely
loaded systems.
Signed-off-by: Sascha Silbe <silbe@linux.vnet.ibm.com>
Message-id: 1467127721-9564-2-git-send-email-silbe@linux.vnet.ibm.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-06-28 17:28:41 +02:00
|
|
|
|
|
|
|
limit->dispatched += n;
|
|
|
|
if (limit->dispatched < limit->slice_quota) {
|
|
|
|
/* We may send further data within the current time slice, no
|
|
|
|
* need to delay the next request. */
|
2012-05-09 16:09:46 +02:00
|
|
|
return 0;
|
|
|
|
}
|
Improve block job rate limiting for small bandwidth values
ratelimit_calculate_delay() previously reset the accounting every time
slice, no matter how much data had been processed before. This had (at
least) two consequences:
1. The minimum speed is rather large, e.g. 5 MiB/s for commit and stream.
Not sure if there are real-world use cases where this would be a
problem. Mirroring and backup over a slow link (e.g. DSL) would
come to mind, though.
2. Tests for block job operations (e.g. cancel) were rather racy
All block jobs currently use a time slice of 100ms. That's a
reasonable value to get smooth output during regular
operation. However this also meant that the state of block jobs
changed every 100ms, no matter how low the configured limit was. On
busy hosts, qemu often transferred additional chunks until the test
case had a chance to cancel the job.
Fix the block job rate limit code to delay for more than one time
slice to address the above issues. To make it easier to handle
oversized chunks we switch the semantics from returning a delay
_before_ the current request to a delay _after_ the current
request. If necessary, this delay consists of multiple time slice
units.
Since the mirror job sends multiple chunks in one go even if the rate
limit was exceeded in between, we need to keep track of the start of
the current time slice so we can correctly re-compute the delay for
the updated amount of data.
The minimum bandwidth now is 1 data unit per time slice. The block
jobs are currently passing the amount of data transferred in sectors
and using 100ms time slices, so this translates to 5120
bytes/second. With chunk sizes usually being O(512KiB), tests have
plenty of time (O(100s)) to operate on block jobs. The chance of a
race condition now is fairly remote, except possibly on insanely
loaded systems.
Signed-off-by: Sascha Silbe <silbe@linux.vnet.ibm.com>
Message-id: 1467127721-9564-2-git-send-email-silbe@linux.vnet.ibm.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-06-28 17:28:41 +02:00
|
|
|
|
2018-02-07 08:17:58 +01:00
|
|
|
/* Quota exceeded. Wait based on the excess amount and then start a new
|
|
|
|
* slice. */
|
|
|
|
delay_slices = (double)limit->dispatched / limit->slice_quota;
|
Improve block job rate limiting for small bandwidth values
ratelimit_calculate_delay() previously reset the accounting every time
slice, no matter how much data had been processed before. This had (at
least) two consequences:
1. The minimum speed is rather large, e.g. 5 MiB/s for commit and stream.
Not sure if there are real-world use cases where this would be a
problem. Mirroring and backup over a slow link (e.g. DSL) would
come to mind, though.
2. Tests for block job operations (e.g. cancel) were rather racy
All block jobs currently use a time slice of 100ms. That's a
reasonable value to get smooth output during regular
operation. However this also meant that the state of block jobs
changed every 100ms, no matter how low the configured limit was. On
busy hosts, qemu often transferred additional chunks until the test
case had a chance to cancel the job.
Fix the block job rate limit code to delay for more than one time
slice to address the above issues. To make it easier to handle
oversized chunks we switch the semantics from returning a delay
_before_ the current request to a delay _after_ the current
request. If necessary, this delay consists of multiple time slice
units.
Since the mirror job sends multiple chunks in one go even if the rate
limit was exceeded in between, we need to keep track of the start of
the current time slice so we can correctly re-compute the delay for
the updated amount of data.
The minimum bandwidth now is 1 data unit per time slice. The block
jobs are currently passing the amount of data transferred in sectors
and using 100ms time slices, so this translates to 5120
bytes/second. With chunk sizes usually being O(512KiB), tests have
plenty of time (O(100s)) to operate on block jobs. The chance of a
race condition now is fairly remote, except possibly on insanely
loaded systems.
Signed-off-by: Sascha Silbe <silbe@linux.vnet.ibm.com>
Message-id: 1467127721-9564-2-git-send-email-silbe@linux.vnet.ibm.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-06-28 17:28:41 +02:00
|
|
|
limit->slice_end_time = limit->slice_start_time +
|
2018-02-07 08:17:58 +01:00
|
|
|
(uint64_t)(delay_slices * limit->slice_ns);
|
Improve block job rate limiting for small bandwidth values
ratelimit_calculate_delay() previously reset the accounting every time
slice, no matter how much data had been processed before. This had (at
least) two consequences:
1. The minimum speed is rather large, e.g. 5 MiB/s for commit and stream.
Not sure if there are real-world use cases where this would be a
problem. Mirroring and backup over a slow link (e.g. DSL) would
come to mind, though.
2. Tests for block job operations (e.g. cancel) were rather racy
All block jobs currently use a time slice of 100ms. That's a
reasonable value to get smooth output during regular
operation. However this also meant that the state of block jobs
changed every 100ms, no matter how low the configured limit was. On
busy hosts, qemu often transferred additional chunks until the test
case had a chance to cancel the job.
Fix the block job rate limit code to delay for more than one time
slice to address the above issues. To make it easier to handle
oversized chunks we switch the semantics from returning a delay
_before_ the current request to a delay _after_ the current
request. If necessary, this delay consists of multiple time slice
units.
Since the mirror job sends multiple chunks in one go even if the rate
limit was exceeded in between, we need to keep track of the start of
the current time slice so we can correctly re-compute the delay for
the updated amount of data.
The minimum bandwidth now is 1 data unit per time slice. The block
jobs are currently passing the amount of data transferred in sectors
and using 100ms time slices, so this translates to 5120
bytes/second. With chunk sizes usually being O(512KiB), tests have
plenty of time (O(100s)) to operate on block jobs. The chance of a
race condition now is fairly remote, except possibly on insanely
loaded systems.
Signed-off-by: Sascha Silbe <silbe@linux.vnet.ibm.com>
Message-id: 1467127721-9564-2-git-send-email-silbe@linux.vnet.ibm.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-06-28 17:28:41 +02:00
|
|
|
return limit->slice_end_time - now;
|
2012-05-09 16:09:46 +02:00
|
|
|
}
|
|
|
|
|
2021-04-13 10:20:32 +02:00
|
|
|
static inline void ratelimit_init(RateLimit *limit)
|
|
|
|
{
|
|
|
|
qemu_mutex_init(&limit->lock);
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline void ratelimit_destroy(RateLimit *limit)
|
|
|
|
{
|
|
|
|
qemu_mutex_destroy(&limit->lock);
|
|
|
|
}
|
|
|
|
|
2012-05-09 16:09:46 +02:00
|
|
|
static inline void ratelimit_set_speed(RateLimit *limit, uint64_t speed,
|
|
|
|
uint64_t slice_ns)
|
|
|
|
{
|
2021-04-13 10:20:32 +02:00
|
|
|
QEMU_LOCK_GUARD(&limit->lock);
|
2012-05-09 16:09:46 +02:00
|
|
|
limit->slice_ns = slice_ns;
|
2021-06-14 10:11:26 +02:00
|
|
|
if (speed == 0) {
|
|
|
|
limit->slice_quota = 0;
|
|
|
|
} else {
|
|
|
|
limit->slice_quota = MAX(((double)speed * slice_ns) / 1000000000ULL, 1);
|
|
|
|
}
|
2012-05-09 16:09:46 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
#endif
|