qemu-e2k

History

Emilio G. Cota 0ac20318ce tcg: remove tb_lock

Use mmap_lock in user-mode to protect TCG state and the page descriptors.
In !user-mode, each vCPU has its own TCG state, so no locks needed.
Per-page locks are used to protect the page descriptors.

Per-TB locks are used in both modes to protect TB jumps.

Some notes:

- tb_lock is removed from notdirty_mem_write by passing a
  locked page_collection to tb_invalidate_phys_page_fast.

- tcg_tb_lookup/remove/insert/etc have their own internal lock(s),
  so there is no need to further serialize access to them.

- do_tb_flush is run in a safe async context, meaning no other
  vCPU threads are running. Therefore acquiring mmap_lock there
  is just to please tools such as thread sanitizer.

- Not visible in the diff, but tb_invalidate_phys_page already
  has an assert_memory_lock.

- cpu_io_recompile is !user-only, so no mmap_lock there.

- Added mmap_unlock()'s before all siglongjmp's that could
  be called in user-mode while mmap_lock is held.
  + Added an assert for !have_mmap_lock() after returning from
    the longjmp in cpu_exec, just like we do in cpu_exec_step_atomic.

Performance numbers before/after:

Host: AMD Opteron(tm) Processor 6376

[Plot: ubuntu 17.04 ppc64 bootup+shutdown time vs. Guest CPUs (x-axis ticks 1, 8, 16, 48, 64; y-axis 100 to 700), series "before" and "tb lock removal"]
png: https://imgur.com/HwmBHXe

[Plot: debian jessie aarch64 bootup+shutdown time vs. Guest CPUs (x-axis ticks 1, 8, 16, 48, 64; y-axis 10 to 90), series "before" and "tb lock removal"]
png: https://imgur.com/iGpGFtv

The gains are high for 4-8 CPUs. Beyond that point, however, unrelated
lock contention significantly hurts scalability.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

2018-06-15 08:18:48 -10:00
..
atomics.txt	docs: document atomic_load_acquire and atomic_store_release	2018-03-12 16:12:47 +01:00
blkdebug.txt
blkverify.txt
build-system.txt	build-sys: add a rule to print a variable	2018-01-12 13:22:02 +01:00
loads-stores.rst	bswap: Add new stn__p() and ldn__p() memory access functions	2018-06-15 15:23:34 +01:00
lockcnt.txt	docs: fix broken paths to docs/devel/atomics.txt	2017-07-31 13:12:47 +03:00
memory.txt	memory: get rid of memory_region_init_reservation	2018-06-01 14:15:10 +02:00
migration.rst	migration: update docs	2018-05-15 22:13:08 +02:00
multi-thread-tcg.txt	tcg: remove tb_lock	2018-06-15 08:18:48 -10:00
multiple-iothreads.txt	docs: mark nested AioContext locking as a legacy API	2017-12-19 10:25:09 +00:00
qapi-code-gen.txt	qapi: introduce new cmd option "allow-preconfig"	2018-05-30 13:19:09 -03:00
rcu.txt
stable-process.rst	docs: document our stable process	2018-02-19 10:51:16 +01:00
testing.rst	docs: Add docs/devel/testing.rst	2018-02-08 09:23:07 +08:00
tracing.txt	trace: add trace_event_get_state_backends()	2017-08-01 12:13:07 +01:00
virtio-migration.txt
writing-qmp-commands.txt	qapi: Move qapi-schema.json to qapi/, rename generated files	2018-03-02 13:45:57 -06:00