qemu-e2k

Author	SHA1	Message	Date
Richard Henderson	984fdcee34	tcg/aarch64: Split up is_fimm There are several sub-classes of vector immediate, and only MOVI can use them all. This will enable usage of MVNI and ORRI, which use progressively fewer sub-classes. This patch adds no new functionality, merely splits the function and moves part of the logic into tcg_out_dupi_vec. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-05-22 15:09:43 -04:00
Richard Henderson	a9e434a5dc	tcg/aarch64: Support vector bitwise select value The instruction set has 3 insns that perform the same operation, only varying in which operand must overlap the destination. We can represent the operation without overlap and choose based on the operands seen. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-05-22 15:09:43 -04:00
Richard Henderson	ebcfb91abe	tcg/i386: Use umin/umax in expanding unsigned compare Using umin(a, b) == a as an expansion for TCG_COND_LEU is a better alternative to (a - INT_MIN) <= (b - INT_MIN). Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-05-22 15:09:43 -04:00
Richard Henderson	3ec3538a45	tcg/i386: Remove expansion for missing minmax This is now handled by code within tcg-op-vec.c. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-05-22 15:09:43 -04:00
Richard Henderson	904c5e1967	tcg/i386: Support vector comparison select value We already had backend support for this feature. Expand the new cmpsel opcode using vpblendb. The combination allows us to avoid an extra NOT for some comparison codes. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-05-22 15:09:43 -04:00
Richard Henderson	25c012b400	tcg: Add TCG_OPF_NOT_PRESENT if TCG_TARGET_HAS_foo is negative If INDEX_op_foo is always expanded by tcg_expand_vec_op, then there may be no reasonable set of constraints to return from tcg_target_op_def for that opcode. Let TCG_TARGET_HAS_foo be specified as -1 in that case. Thus a boolean test for TCG_TARGET_HAS_foo is true, but we will not assert within process_op_defs when no constraints are specified. Compare this with tcg_can_emit_vec_op, which already uses this tri-state indication. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-05-22 15:09:43 -04:00
Richard Henderson	72b4c792c7	tcg: Expand vector minmax using cmp+cmpsel Provide a generic fallback for the min/max operations. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-05-22 15:09:43 -04:00
Richard Henderson	17f79944eb	tcg: Introduce do_op3_nofail for vector expansion This makes do_op3 match do_op2 in allowing for failure, and thus fall back expansions. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-05-22 15:09:43 -04:00
Richard Henderson	f75da2988e	tcg: Add support for vector compare select Perform a per-element conditional move. This combination operation is easier to implement on some host vector units than plain cmp+bitsel. Omit the usual gvec interface, as this is intended to be used by target-specific gvec expansion call-backs. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-05-22 15:09:43 -04:00
Richard Henderson	38dc12947e	tcg: Add support for vector bitwise select This operation performs d = (b & a) \| (c & ~a), and is present on a majority of host vector units. Include gvec expanders. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-05-22 15:09:43 -04:00
Richard Henderson	532ba368a1	tcg: Fix missing checks and clears in tcg_gen_gvec_dup_mem The paths through tcg_gen_dup_mem_vec and through MO_128 were missing the check_size_align. The path through MO_128 was also missing the expand_clr. This last was not visible because the only user is ARM SVE, which would set oprsz == maxsz, and not require the clear. Fix by adding the check_size_align and using do_dup directly instead of duplicating the check in tcg_gen_gvec_dup_{i32,i64}. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-05-22 15:09:43 -04:00
Richard Henderson	7b60ef3264	tcg/i386: Fix dupi/dupm for avx1 and 32-bit hosts The VBROADCASTSD instruction only allows %ymm registers as destination. Rather than forcing VEX.L and writing to the entire 256-bit register, revert to using MOVDDUP with an %xmm register. This is sufficient for an avx1 host since we do not support TCG_TYPE_V256 for that case. Also fix the 32-bit avx2, which should have used VPBROADCASTW. Fixes: `1e262b49b5` Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Reported-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-05-22 15:09:43 -04:00
Richard Henderson	a7b6d286cf	tcg/aarch64: Do not advertise minmax for MO_64 The min/max instructions are not available for 64-bit elements. Fixes: `93f332a503` Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-05-13 22:52:08 +00:00
Richard Henderson	a456394ae5	tcg/aarch64: Support vector absolute value Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-05-13 22:52:08 +00:00
Richard Henderson	18f9b65f1a	tcg/i386: Support vector absolute value Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-05-13 22:52:08 +00:00
Richard Henderson	bcefc90208	tcg: Add support for vector absolute value Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-05-13 22:52:08 +00:00
Richard Henderson	ff1f11f7f8	tcg: Add support for integer absolute value Remove a function of the same name from target/arm/. Use a branchless implementation of abs gleaned from gcc. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-05-13 22:52:08 +00:00
Richard Henderson	0a8d7a3bf5	tcg/i386: Support vector scalar shift opcodes Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-05-13 22:52:08 +00:00
Richard Henderson	b4578cd91c	tcg: Add gvec expanders for vector shift by scalar Allow expansion either via shift by scalar or by replicating the scalar for shift by vector. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> --- v3: Use a private structure for do_gvec_shifts.	2019-05-13 22:52:08 +00:00
Richard Henderson	79525dfd08	tcg/aarch64: Support vector variable shift opcodes Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-05-13 22:52:08 +00:00
Richard Henderson	a2ce146a06	tcg/i386: Support vector variable shift opcodes Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-05-13 22:52:08 +00:00
Richard Henderson	5ee5c14cac	tcg: Add gvec expanders for variable shift The gvec expanders perform a modulo on the shift count. If the target requires alternate behaviour, then it cannot use the generic gvec expanders anyway, and will have to have its own custom code. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-05-13 22:52:08 +00:00
Richard Henderson	37ee55a081	tcg: Add INDEX_op_dupm_vec Allow the backend to expand dup from memory directly, instead of forcing the value into a temp first. This is especially important if integer/vector register moves do not exist. Note that officially tcg_out_dupm_vec is allowed to fail. If it did, we could fix this up relatively easily: VECE == 32/64: Load the value into a vector register, then dup. Both of these must work. VECE == 8/16: If the value happens to be at an offset such that an aligned load would place the desired value in the least significant end of the register, go ahead and load w/garbage in high bits. Load the value w/INDEX_op_ld{8,16}_i32. Attempt a move directly to vector reg, which may fail. Store the value into the backing store for OTS. Load the value into the vector reg w/TCG_TYPE_I32, which must work. Duplicate from the vector reg into itself, which must work. All of which is well and good, except that all supported hosts can support dupm for all vece, so all of the failure paths would be dead code and untestable. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-05-13 22:52:08 +00:00
Richard Henderson	f23e5e15ed	tcg/aarch64: Implement tcg_out_dupm_vec The LD1R instruction does all the work. Note that the only useful addressing mode is a base register with no offset. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-05-13 22:50:35 +00:00
Richard Henderson	1e262b49b5	tcg/i386: Implement tcg_out_dupm_vec At the same time, improve tcg_out_dupi_vec wrt broadcast from the constant pool. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-05-13 14:44:03 -07:00
Richard Henderson	d6ecb4a978	tcg: Add tcg_out_dupm_vec to the backend interface Currently stubbed out in all backends that support vectors. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-05-13 14:44:03 -07:00
Richard Henderson	bab1671f0f	tcg: Manually expand INDEX_op_dup_vec This case is similar to INDEX_op_mov_* in that we need to do different things depending on the current location of the source. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> --- v3: Added some commentary to the tcg_reg_alloc_* functions.	2019-05-13 14:44:03 -07:00
Richard Henderson	e7632cfa8b	tcg: Promote tcg_out_{dup,dupi}_vec to backend interface The i386 backend already has these functions, and the aarch64 backend could easily split out one. Nothing is done with these functions yet, but this will aid register allocation of INDEX_op_dup_vec in a later patch. Adjust the aarch64 tcg_out_dupi_vec signature to match the new interface. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-05-13 14:44:03 -07:00
Richard Henderson	240c08d099	tcg: Support cross-class moves without instruction support PowerPC Altivec does not support direct moves between vector registers and general registers. So when tcg_out_mov fails, we can use the backing memory for the temporary to perform the move. Acked-by: David Hildenbrand <david@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-05-13 14:44:03 -07:00
Richard Henderson	78113e83e0	tcg: Return bool success from tcg_out_mov This patch merely changes the interface, aborting on all failures, of which there are currently none. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-05-13 14:44:03 -07:00
Richard Henderson	c16f52b2c5	tcg/arm: Use tcg_out_mov_reg in tcg_out_mov We have a function that takes an additional condition parameter over the standard backend interface. It already takes care of eliding no-op moves. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-05-13 14:44:03 -07:00
Richard Henderson	d63e3b6e69	tcg: Assert fixed_reg is read-only The only fixed_reg is cpu_env, and it should not be modified during any TB. Therefore code that tries to special-case moves into a fixed_reg is dead. Remove it. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-05-13 14:44:03 -07:00
Richard Henderson	53229a7703	tcg: Specify optional vector requirements with a list Replace the single opcode in .opc with a null-terminated array in .opt_opc. We still require that all opcodes be used with the same .vece. Validate the contents of this list with CONFIG_DEBUG_TCG. All tcg_gen_*_vec functions will check any list active during .fniv expansion. Swap the active list in and out as we expand other opcodes, or take control away from the front-end function. Convert all existing vector aware front ends. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-05-13 14:44:03 -07:00
Richard Henderson	ce27c5d1a3	tcg: Allow add_vec, sub_vec, neg_vec, not_vec to be expanded PowerPC Altivec does not support add and subtract of 64-bit elements. Prepare for that configuration by not assuming the operation is universally supported. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-05-13 14:44:03 -07:00
Richard Henderson	ac383dde33	tcg: Do not recreate INDEX_op_neg_vec unless supported Use tcg_can_emit_vec_op instead of just TCG_TARGET_HAS_neg_vec, so that we check the type and vece for the actual operation. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-05-13 14:26:28 -07:00
David Hildenbrand	e1227bb6e5	tcg: Implement tcg_gen_gvec_3i() Let's add tcg_gen_gvec_3i(), similar to tcg_gen_gvec_2i(), however without introducing "gen_helper_gvec_3i *fnoi", as it isn't needed for now. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20190416185301.25344-2-david@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-05-13 14:26:28 -07:00
Richard Henderson	b4b82d7e9c	tcg/arm: Restrict constant pool displacement to 12 bits This will not necessarily restrict the size of the TB, since for v7 the majority of constant pool usage is for calls from the out-of-line ldst code, which is already at the end of the TB. But this does allow us to save one insn per reference on the off-chance. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-04-25 10:39:39 -07:00
Richard Henderson	a7cdaf710f	tcg/ppc: Allow the constant pool to overflow at 32k There is no point in coding for a 2GB offset when the max TB size is already limited to 64k. If we further restrict to 32k then we can eliminate the extra ADDIS instruction. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-04-24 13:05:28 -07:00
Richard Henderson	aeee05f53a	tcg: Restart TB generation after out-of-line ldst overflow This is part c of relocation overflow handling. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-04-24 13:05:28 -07:00
Richard Henderson	1768987b73	tcg: Restart TB generation after constant pool overflow This is part b of relocation overflow handling. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-04-24 13:05:28 -07:00
Richard Henderson	7ecd02a06f	tcg: Restart TB generation after relocation overflow If the TB generates too much code, such that backend relocations overflow, try again with a smaller TB. In support of this, move relocation processing from a random place within tcg_out_op, in the handling of branch opcodes, to a new function at the end of tcg_gen_code. This is not a complete solution, as there are additional relocs generated for out-of-line ldst handling and constant pools. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-04-24 13:05:21 -07:00
Richard Henderson	6e6c4efed9	tcg: Restart after TB code generation overflow If a TB generates too much code, try again with fewer insns. Fixes: https://bugs.launchpad.net/bugs/1824853 Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-04-24 13:04:33 -07:00
Richard Henderson	464c2969d5	tcg/aarch64: Support INDEX_op_extract2_{i32,i64} Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-04-24 13:04:33 -07:00
Richard Henderson	3b832d67a9	tcg/arm: Support INDEX_op_extract2_i32 Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-04-24 13:04:33 -07:00
Richard Henderson	c6fb8c0cf7	tcg/i386: Support INDEX_op_extract2_{i32,i64} Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-04-24 13:04:33 -07:00
Richard Henderson	b0a6056719	tcg: Use extract2 in tcg_gen_deposit_{i32,i64} Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-04-24 13:04:33 -07:00
Richard Henderson	02616bad6f	tcg: Use deposit and extract2 in tcg_gen_shifti_i64 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-04-24 13:04:33 -07:00
Richard Henderson	fce1296f13	tcg: Add INDEX_op_extract2_{i32,i64} This will let backends implement the double-word shift operation. Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-04-24 13:04:33 -07:00
David Hildenbrand	2089fcc9e7	tcg: Implement tcg_gen_extract2_{i32,i64} Will be helpful for s390x. Input 128 bit and output 64 bit only, which is sufficient for now. Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20190225154204.26751-1-david@redhat.com> [rth: Add matching tcg_gen_extract2_i32.] Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-04-24 13:04:33 -07:00
Markus Armbruster	3de2faa9a8	tcg: Simplify how dump_exec_info() prints dump_exec_info() takes an fprintf()-like callback and a FILE * to pass to it. Its only caller hmp_info_jit() passes monitor_fprintf() and the current monitor cast to FILE *. monitor_fprintf() casts it right back, and is otherwise identical to monitor_printf(). The type-punning is ugly. Drop the callback, and call qemu_printf() instead. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20190417191805.28198-5-armbru@redhat.com>	2019-04-18 22:18:59 +02:00
Markus Armbruster	d4c51a0af3	tcg: Simplify how dump_opcount_info() prints dump_opcount_info() takes an fprintf()-like callback and a FILE * to pass to it. Its only caller hmp_info_opcount() passes monitor_fprintf() and the current monitor cast to FILE *. monitor_fprintf() casts it right back, and is otherwise identical to monitor_printf(). The type-punning is ugly. Drop the callback, and call qemu_printf() instead. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20190417191805.28198-4-armbru@redhat.com>	2019-04-18 22:18:59 +02:00
Richard Henderson	9e564a1dde	tcg: Remove TODO file The last update to this file was 9 years ago. In the meantime, 4 of the 6 ideas have actually been completed. The lat two do not actually make sense anymore. Suggested-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-02-21 10:22:24 -08:00
Mark Cave-Ayland	3115584d39	tcg/i386: fix unsigned vector saturating arithmetic Due to a cut/paste error in the original implementation, the unsigned vector saturating arithmetic was erroneously being calculated as signed vector saturating arithmetic. Fixes: `8ffafbcec2` ("tcg/i386: Implement vector saturating arithmetic") Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Message-Id: <20190207224258.426-1-mark.cave-ayland@ilande.co.uk> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-02-11 08:52:44 -08:00
Richard Henderson	bef16ab4e6	tcg: Diagnose referenced labels that have not been emitted Currently, a jump to a label that is not defined anywhere will be emitted not be relocated. This results in a jump to a random jump target. With tcg debugging, print a diagnostic to the -d op file and abort. This could help debug or detect errors like `c2d9644e6d` ("target/arm: Fix crash on conditional instruction in an IT block") Reported-by: Roman Kapl <code@rkapl.cz> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-02-11 08:52:44 -08:00
Thomas Huth	fb0343d5b4	tcg: Fix LGPL version number It's either "GNU Library General Public version 2" or "GNU Lesser General Public version 2.1", but there was no "version 2.0" of the "Lesser" library. So assume that version 2.1 is meant here. Cc: Richard Henderson <rth@twiddle.net> Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <1548252536-6242-5-git-send-email-thuth@redhat.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2019-01-30 11:01:52 +01:00
Richard Henderson	e77c89fb08	cputlb: Remove static tlb sizing Now that all tcg backends support TCG_TARGET_IMPLEMENTS_DYN_TLB, remove the define and the old code. Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-01-28 07:04:35 -08:00
Richard Henderson	0a9a83d6bf	tcg/tci: enable dynamic TLB sizing This is automatic due to TCI using the other softtlb macros. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-01-28 07:04:10 -08:00
Richard Henderson	ac33373e0e	tcg/mips: enable dynamic TLB sizing Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-01-28 07:04:10 -08:00
Richard Henderson	a31aa4ce00	tcg/mips: Fix tcg_out_qemu_ld_slow_path Patch the branch after it has been emitted rather than before it exists. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-01-28 07:04:10 -08:00
Richard Henderson	cd7d3cb7a2	tcg/arm: enable dynamic TLB sizing Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-01-28 07:04:10 -08:00
Richard Henderson	41b70f220b	tcg/riscv: enable dynamic TLB sizing Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-01-28 07:04:24 -08:00
Richard Henderson	4f47e338f6	tcg/s390: enable dynamic TLB sizing Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-01-28 07:04:10 -08:00
Richard Henderson	17ff9f7801	tcg/sparc: enable dynamic TLB sizing Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-01-28 07:04:10 -08:00
Richard Henderson	644f591ab0	tcg/ppc: enable dynamic TLB sizing Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-01-28 07:04:10 -08:00
Richard Henderson	f7bcd96669	tcg/aarch64: enable dynamic TLB sizing Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Tested-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-01-28 07:04:10 -08:00
Emilio G. Cota	54eaf40b8f	tcg/i386: enable dynamic TLB sizing As the following experiments show, this series is a net perf gain, particularly for memory-heavy workloads. Experiments are run on an Intel(R) Xeon(R) Gold 6142 CPU @ 2.60GHz. 1. System boot + shudown, debian aarch64: - Before (v3.1.0): Performance counter stats for './die.sh v3.1.0' (10 runs): 9019.797015 task-clock (msec) # 0.993 CPUs utilized ( +- 0.23% ) 29,910,312,379 cycles # 3.316 GHz ( +- 0.14% ) 54,699,252,014 instructions # 1.83 insn per cycle ( +- 0.08% ) 10,061,951,686 branches # 1115.541 M/sec ( +- 0.08% ) 172,966,530 branch-misses # 1.72% of all branches ( +- 0.07% ) 9.084039051 seconds time elapsed ( +- 0.23% ) - After: Performance counter stats for './die.sh tlb-dyn-v5' (10 runs): 8624.084842 task-clock (msec) # 0.993 CPUs utilized ( +- 0.23% ) 28,556,123,404 cycles # 3.311 GHz ( +- 0.13% ) 51,755,089,512 instructions # 1.81 insn per cycle ( +- 0.05% ) 9,526,513,946 branches # 1104.641 M/sec ( +- 0.05% ) 166,578,509 branch-misses # 1.75% of all branches ( +- 0.19% ) 8.680540350 seconds time elapsed ( +- 0.24% ) That is, a 4.4% perf increase. 2. System boot + shutdown, ubuntu 18.04 x86_64: - Before (v3.1.0): 56100.574751 task-clock (msec) # 1.016 CPUs utilized ( +- 4.81% ) 200,745,466,128 cycles # 3.578 GHz ( +- 5.24% ) 431,949,100,608 instructions # 2.15 insn per cycle ( +- 5.65% ) 77,502,383,330 branches # 1381.490 M/sec ( +- 6.18% ) 844,681,191 branch-misses # 1.09% of all branches ( +- 3.82% ) 55.221556378 seconds time elapsed ( +- 5.01% ) - After: 56603.419540 task-clock (msec) # 1.019 CPUs utilized ( +- 10.19% ) 202,217,930,479 cycles # 3.573 GHz ( +- 10.69% ) 439,336,291,626 instructions # 2.17 insn per cycle ( +- 14.14% ) 80,538,357,447 branches # 1422.853 M/sec ( +- 16.09% ) 776,321,622 branch-misses # 0.96% of all branches ( +- 3.77% ) 55.549661409 seconds time elapsed ( +- 10.44% ) No improvement (within noise range). Note that for this workload, increasing the time window too much can lead to perf degradation, since it flushes the TLB very frequently. 3. x86_64 SPEC06int: x86_64-softmmu speedup vs. v3.1.0 for SPEC06int (test set) Host: Intel(R) Xeon(R) Gold 6142 CPU @ 2.60GHz (Skylake) 5.5 +------------------------------------------------------------------------+ \| +-+ \| 5 \|-+.................+-+...............................tlb-dyn-v5.......+-\| \| * * \| 4.5 \|-+..................................................................+-\| \| * * \| 4 \|-+..................................................................+-\| \| * * \| 3.5 \|-+..................................................................+-\| \| * * \| 3 \|-+......+-+........................................................+-\| \| * * * \| 2.5 \|-+.................................................+-+...........+-\| \| * * * * * \| 2 \|-+..............................................................+-\| \| * * * * * * +-+ \| 1.5 \|-+....................................................+-+.+-+.+-\| \| * * +-+ * +-+ +-+ +-+ +-+ * * * * * \| 1 \|++++-++++++++++++++++-+++-+++-++++-++++-++++++++++++++\| \| * * * * * * * * * * * * * * * * * * * * * * * * * * \| 0.5 +------------------------------------------------------------------------+ 400.perlb401.bzip403.g429445.g456.hm462.libq464.h471.omn47483.xalancbgeomean png: https://imgur.com/YRF90f7 That is, a 1.51x average speedup over the baseline, with a max speedup of 5.17x. Here's a different look at the SPEC06int results, using KVM as the baseline: x86_64-softmmu slowdown vs. KVM for SPEC06int (test set) Host: Intel(R) Xeon(R) Gold 6142 CPU @ 2.60GHz (Skylake) 25 +---------------------------------------------------------------------------+ \| +-+ +-+ \| \| * * +-+ v3.1.0 \| \| * * +-+ tlb-dyn-v5 \| \| * * * * +-+ \| 20 \|-+................................................+-+...............+-\| \| * * # # * * \| \| +-+ * * * # # * * \| \| * * * * * # # * * \| 15 \|-+..............................................#.#.......+-+......+-\| \| * * * * * # # * #\|# \| \| * * * * +-+ * # # * +-+ \| \| * * +-+ * * ++-+ +-+ * # # * # # +-+ \| \| * * +-+ * * * ## \| +-+ # # * # # +-+ \| 10 \|-+..........+-+...........##.......++-+..+-+.#.#.......#.#....+-\| \| * * +-+ * * * ## +-+ # # # #* # # +-+ * # # * * \| \| * * * # # * * +-+ * ## * +-+ # # # #* # # * * * # # +-+ \| \| * * # # * * * +-+ * ## * # # # # # #* # # * * * # # * ## \| 5 \|-+.......+-+.#.#.....#.#..##..#.#.#.#..#.#.#.#.....#.#..##.+-\| \| * # #* # # * +-+* # # * ## * # # # # # #* # # * * * # # * ## \| \| * # #* # # * # #* # # * ## * # # # # # #* # # * +-+* # # * ## \| \| ++-+ * # #* # # * # #* # # * ## * # # # # # #* # # * # #* # # * ## \| \|+++#+#++#+#+#+#++#+#+#+#++##++#+#+#+#++#+#+#+#++#+#+#+#+*+##+++\| 0 +---------------------------------------------------------------------------+ 400.perlbe401.bzi403.gc429445.go456.h462.libqu464.h471.omne4483.xalancbmgeomean png: https://imgur.com/YzAMNEV After this series, we bring down the average SPEC06int slowdown vs KVM from 11.47x to 7.58x. Tested-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <20190116170114.26802-4-cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-01-28 07:03:34 -08:00
Emilio G. Cota	86e1eff8bc	tcg: introduce dynamic TLB sizing Disabled in all TCG backends for now. Tested-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <20190116170114.26802-3-cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-01-28 07:03:34 -08:00
Richard Henderson	93f332a503	tcg/aarch64: Implement vector minmax arithmetic Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-01-28 07:03:34 -08:00
Richard Henderson	d32648d445	tcg/aarch64: Implement vector saturating arithmetic Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-01-28 07:03:34 -08:00
Richard Henderson	bc37faf4cb	tcg/i386: Implement vector minmax arithmetic The avx instruction set does not directly provide MO_64. We can still implement 64-bit with comparison and vpblendvb. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-01-28 07:03:34 -08:00
Richard Henderson	8ffafbcec2	tcg/i386: Implement vector saturating arithmetic Only MO_8 and MO_16 are implemented, since that's all the instruction set provides. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-01-28 07:03:34 -08:00
Richard Henderson	44f1441dbe	tcg/i386: Split subroutines out of tcg_expand_vec_op This routine was becoming too large. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-01-28 07:03:34 -08:00
Richard Henderson	dd0a0fcdd8	tcg: Add opcodes for vector minmax arithmetic Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-01-28 07:03:34 -08:00
Richard Henderson	8afaf05066	tcg: Add opcodes for vector saturated arithmetic Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-01-28 07:03:34 -08:00
Richard Henderson	5d6acdd4a4	tcg: Add write_aofs to GVecGen4 This allows writing 2 output, 3 input operations. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-01-28 07:03:34 -08:00
Richard Henderson	f550805d83	tcg: Add gvec expanders for nand, nor, eqv Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-01-28 07:03:34 -08:00
Richard Henderson	9a9eda78e4	tcg: Add logical simplifications during gvec expand We handle many of these during integer expansion, and the rest of them during integer optimization. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-01-28 07:03:34 -08:00
Paolo Bonzini	7d37435bd5	avoid TABs in files that only contain a few Most files that have TABs only contain a handful of them. Change them to spaces so that we don't confuse people. disas, standard-headers, linux-headers and libdecnumber are imported from other projects and probably should be exempted from the check. Outside those, after this patch the following files still contain both 8-space and TAB sequences at the beginning of the line. Many of them have a majority of TABs, or were initially committed with all tabs. bsd-user/i386/target_syscall.h bsd-user/x86_64/target_syscall.h crypto/aes.c hw/audio/fmopl.c hw/audio/fmopl.h hw/block/tc58128.c hw/display/cirrus_vga.c hw/display/xenfb.c hw/dma/etraxfs_dma.c hw/intc/sh_intc.c hw/misc/mst_fpga.c hw/net/pcnet.c hw/sh4/sh7750.c hw/timer/m48t59.c hw/timer/sh_timer.c include/crypto/aes.h include/disas/bfd.h include/hw/sh4/sh.h libdecnumber/decNumber.c linux-headers/asm-generic/unistd.h linux-headers/linux/kvm.h linux-user/alpha/target_syscall.h linux-user/arm/nwfpe/double_cpdo.c linux-user/arm/nwfpe/fpa11_cpdt.c linux-user/arm/nwfpe/fpa11_cprt.c linux-user/arm/nwfpe/fpa11.h linux-user/flat.h linux-user/flatload.c linux-user/i386/target_syscall.h linux-user/ppc/target_syscall.h linux-user/sparc/target_syscall.h linux-user/syscall.c linux-user/syscall_defs.h linux-user/x86_64/target_syscall.h slirp/cksum.c slirp/if.c slirp/ip.h slirp/ip_icmp.c slirp/ip_icmp.h slirp/ip_input.c slirp/ip_output.c slirp/mbuf.c slirp/misc.c slirp/sbuf.c slirp/socket.c slirp/socket.h slirp/tcp_input.c slirp/tcpip.h slirp/tcp_output.c slirp/tcp_subr.c slirp/tcp_timer.c slirp/tftp.c slirp/udp.c slirp/udp.h target/cris/cpu.h target/cris/mmu.c target/cris/op_helper.c target/sh4/helper.c target/sh4/op_helper.c target/sh4/translate.c tcg/sparc/tcg-target.inc.c tests/tcg/cris/check_addo.c tests/tcg/cris/check_moveq.c tests/tcg/cris/check_swap.c tests/tcg/multiarch/test-mmap.c ui/vnc-enc-hextile-template.h ui/vnc-enc-zywrle.h util/envlist.c util/readline.c The following have only TABs: bsd-user/i386/target_signal.h bsd-user/sparc64/target_signal.h bsd-user/sparc64/target_syscall.h bsd-user/sparc/target_signal.h bsd-user/sparc/target_syscall.h bsd-user/x86_64/target_signal.h crypto/desrfb.c hw/audio/intel-hda-defs.h hw/core/uboot_image.h hw/sh4/sh7750_regnames.c hw/sh4/sh7750_regs.h include/hw/cris/etraxfs_dma.h linux-user/alpha/termbits.h linux-user/arm/nwfpe/fpopcode.h linux-user/arm/nwfpe/fpsr.h linux-user/arm/syscall_nr.h linux-user/arm/target_signal.h linux-user/cris/target_signal.h linux-user/i386/target_signal.h linux-user/linux_loop.h linux-user/m68k/target_signal.h linux-user/microblaze/target_signal.h linux-user/mips64/target_signal.h linux-user/mips/target_signal.h linux-user/mips/target_syscall.h linux-user/mips/termbits.h linux-user/ppc/target_signal.h linux-user/sh4/target_signal.h linux-user/sh4/termbits.h linux-user/sparc64/target_syscall.h linux-user/sparc/target_signal.h linux-user/x86_64/target_signal.h linux-user/x86_64/termbits.h pc-bios/optionrom/optionrom.h slirp/mbuf.h slirp/misc.h slirp/sbuf.h slirp/tcp.h slirp/tcp_timer.h slirp/tcp_var.h target/i386/svm.h target/sparc/asi.h target/xtensa/core-dc232b/xtensa-modules.inc.c target/xtensa/core-dc233c/xtensa-modules.inc.c target/xtensa/core-de212/core-isa.h target/xtensa/core-de212/xtensa-modules.inc.c target/xtensa/core-fsf/xtensa-modules.inc.c target/xtensa/core-sample_controller/core-isa.h target/xtensa/core-sample_controller/xtensa-modules.inc.c target/xtensa/core-test_kc705_be/core-isa.h target/xtensa/core-test_kc705_be/xtensa-modules.inc.c tests/tcg/cris/check_abs.c tests/tcg/cris/check_addc.c tests/tcg/cris/check_addcm.c tests/tcg/cris/check_addoq.c tests/tcg/cris/check_bound.c tests/tcg/cris/check_ftag.c tests/tcg/cris/check_int64.c tests/tcg/cris/check_lz.c tests/tcg/cris/check_openpf5.c tests/tcg/cris/check_sigalrm.c tests/tcg/cris/crisutils.h tests/tcg/cris/sys.c tests/tcg/i386/test-i386-ssse3.c ui/vgafont.h Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20181213223737.11793-3-pbonzini@redhat.com> Reviewed-by: Aleksandar Markovic <amarkovic@wavecomp.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Wainer dos Santos Moschetta <wainersm@redhat.com> Acked-by: Richard Henderson <richard.henderson@linaro.org> Acked-by: Eric Blake <eblake@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Stefan Markovic <smarkovic@wavecomp.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-01-11 15:46:56 +01:00
Paolo Bonzini	eae3eb3e18	qemu/queue.h: simplify reverse access to QTAILQ The new definition of QTAILQ does not require passing the headname, remove it. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-01-11 15:46:55 +01:00
Paolo Bonzini	b58deb344d	qemu/queue.h: leave head structs anonymous unless necessary Most list head structs need not be given a name. In most cases the name is given just in case one is going to use QTAILQ_LAST, QTAILQ_PREV or reverse iteration, but this does not apply to lists of other kinds, and even for QTAILQ in practice this is only rarely needed. In addition, we will soon reimplement those macros completely so that they do not need a name for the head struct. So clean up everything, not giving a name except in the rare case where it is necessary. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-01-11 15:46:55 +01:00
Richard Henderson	4250da1092	tcg: Improve call argument loading Free the argument register only after we have verified that the temporary is not already in that register. This case is likely now that we are back propagating the preferred register. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-26 06:58:43 +11:00
Richard Henderson	25f49c5f15	tcg: Record register preferences during liveness With these preferences, we can arrange for function call arguments to be computed into the proper registers instead of requiring extra moves. Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-26 06:58:35 +11:00
Richard Henderson	ae36a246ed	tcg: Add TCG_OPF_BB_EXIT Use this to notice the opcodes that exit the TB, which implies that local temps are really dead and need not be synced. Previously we so marked the true end of the TB, but that was immediately overwritten by the la_bb_end invoked by any TCG_OPF_BB_END opcode, like exit_tb. Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-26 06:58:20 +11:00
Richard Henderson	f65a061c39	tcg: Split out more subroutines from liveness_pass_1 Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-26 06:58:17 +11:00
Richard Henderson	2616c80821	tcg: Rename and adjust liveness_pass_1 helpers No need for a "tcg_" prefix for a static function; we already have another "la_" prefix for indicating liveness analysis. Pass in nb_globals and nb_temps, as we will already have them in registers for other loops within the parent function. Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-26 06:58:09 +11:00
Richard Henderson	152c35aab4	tcg: Reindent parts of liveness_pass_1 There are two blocks of the form if (foo) { stuff1; goto bar; } else { baz: stuff2; } which have unnecessary and confusing indentation. Remove the else and unindent stuff2. Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-26 06:57:57 +11:00
Richard Henderson	1894f69a61	tcg: Dump register preference info with liveness Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-26 06:57:48 +11:00
Richard Henderson	d62816f2db	tcg: Improve register allocation for matching constraints Try harder to honor the output_pref. When we're forced to allocate a second register for the input, it does not need to use the input constraint; that will be honored by the register we allocate for the output and a move is already required. Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-26 06:57:40 +11:00
Richard Henderson	69e3706d2b	tcg: Add output_pref to TCGOp Allocate storage for, but do not yet fill in, per-opcode preferences for the output operands. Pass it in to the register allocation routines for output operands. Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-26 06:57:37 +11:00
Richard Henderson	ba87719cd2	tcg: Add preferred_reg argument to tcg_reg_alloc_do_movi Pass this through to temp_sync. Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-26 06:57:34 +11:00
Richard Henderson	98b4e186c1	tcg: Add preferred_reg argument to temp_sync Pass this through to tcg_reg_alloc. Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-26 06:57:32 +11:00
Richard Henderson	b722452aef	tcg: Add preferred_reg argument to temp_load Pass this through to tcg_reg_alloc. Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-26 06:57:29 +11:00
Richard Henderson	b016486e7b	tcg: Add preferred_reg argument to tcg_reg_alloc This new argument will aid register allocation by indicating how the temporary will be used in future. If the preference cannot be satisfied, fall back to the constraints of the current insn. Short circuit the preference when it cannot be satisfied or if it does not further constrain the operation. With an eye toward optimizing function call sequences, optimize for the preferred_reg set containing a single register. For the moment, all users pass 0 for preference. Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-26 06:57:06 +11:00
Richard Henderson	b4fc67c7af	tcg: Add reachable_code_pass Delete trivially dead code that follows unconditional branches and noreturn helpers. These can occur either via optimization or via the structure of a target's translator following an exception. Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-26 06:40:30 +11:00
Richard Henderson	d88a117eaa	tcg: Reference count labels Increment when adding branches, and decrement when removing them. Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-26 06:40:27 +11:00
Richard Henderson	15d7409260	tcg: Add TCG_CALL_NO_RETURN Remember which helpers have been marked noreturn. Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-26 06:40:24 +11:00
Richard Henderson	3b50352b05	tcg: Renumber TCG_CALL_* flags Previously, the low 4 bits were used for TCG_CALL_TYPE_MASK, which was removed in `6a18ae2d29`. Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-26 06:40:15 +11:00
Alistair Francis	7a5549f2ae	tcg/riscv: Add the target init code Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Michael Clark <mjc@sifive.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <dd6e439ab81883974b8fd91f904f6de26ab5d697.1545246859.git.alistair.francis@wdc.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-26 06:40:02 +11:00
Alistair Francis	92c041c59b	tcg/riscv: Add the prologue generation and register the JIT Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Michael Clark <mjc@sifive.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <c4d023127967a0217d8d1eabdf5de6c0e8f8c228.1545246859.git.alistair.francis@wdc.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-26 06:40:02 +11:00
Alistair Francis	bdf503819e	tcg/riscv: Add the out op decoder Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Michael Clark <mjc@sifive.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <7c47f00cb4a9a777120456e0704b4076a5d943ab.1545246859.git.alistair.francis@wdc.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-26 06:40:02 +11:00
Alistair Francis	03a7d0213d	tcg/riscv: Add direct load and store instructions Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Michael Clark <mjc@sifive.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <2e047a95c39c007c66cda024c095e29b0ac4c43e.1545246859.git.alistair.francis@wdc.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-26 06:40:02 +11:00
Alistair Francis	efbea94c76	tcg/riscv: Add slowpath load and store instructions Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Michael Clark <mjc@sifive.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <1a0a7e8f3347764f212c5efa5c07c9be17efdec6.1545246859.git.alistair.francis@wdc.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-26 06:40:02 +11:00
Alistair Francis	15840069e1	tcg/riscv: Add branch and jump instructions Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Michael Clark <mjc@sifive.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <c356657e627168d89cb5b012b7e21e4efbbe83f3.1545246859.git.alistair.francis@wdc.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-26 06:40:02 +11:00
Alistair Francis	28ca738e9d	tcg/riscv: Add the add2 and sub2 instructions Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <5665a57809e32b35775e8e98fdab898853af37b8.1545246859.git.alistair.francis@wdc.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-26 06:40:02 +11:00
Alistair Francis	61535d4988	tcg/riscv: Add the out load and store instructions Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Michael Clark <mjc@sifive.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <d5d88ff29163788938368bbdbd18815d59cef6a0.1545246859.git.alistair.francis@wdc.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-26 06:40:02 +11:00
Alistair Francis	27fd64144b	tcg/riscv: Add the extract instructions Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Michael Clark <mjc@sifive.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <c4d2afba46efefa9388cf3205fcedbb9a5fa411f.1545246859.git.alistair.francis@wdc.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-26 06:40:02 +11:00
Alistair Francis	6cd2eda39f	tcg/riscv: Add the mov and movi instruction Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Michael Clark <mjc@sifive.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <bd6a45c73a67b77ddaa2fe590a6bb8ee422b9683.1545246859.git.alistair.francis@wdc.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-26 06:40:02 +11:00
Alistair Francis	dfa8e74f94	tcg/riscv: Add the relocation functions Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Michael Clark <mjc@sifive.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <6ac4f4b0d5ea93cb0ee9a3b8b47ee9f7b3711494.1545246859.git.alistair.francis@wdc.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-26 06:40:02 +11:00
Alistair Francis	bedf14e335	tcg/riscv: Add the instruction emitters Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Michael Clark <mjc@sifive.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <c740aca183675625bb9cf3ce7b9e8b9d431ca694.1545246859.git.alistair.francis@wdc.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-26 06:40:02 +11:00
Alistair Francis	54a9ce0f68	tcg/riscv: Add the immediate encoders Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Michael Clark <mjc@sifive.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <d54dc56303fd1b0d7ed53869de2dbb59b111c7ca.1545246859.git.alistair.francis@wdc.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-26 06:40:02 +11:00
Alistair Francis	8ce23a1312	tcg/riscv: Add support for the constraints Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Michael Clark <mjc@sifive.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <dba7315e4e20e879933f72d47ccf98f1cc612b8a.1545246859.git.alistair.francis@wdc.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-26 06:40:02 +11:00
Alistair Francis	505e75c592	tcg/riscv: Add the tcg target registers Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Michael Clark <mjc@sifive.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <6e43abaa64361d57b9bc9439820d0e7701f2d47e.1545246859.git.alistair.francis@wdc.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-26 06:40:02 +11:00
Alistair Francis	fb1f70f368	tcg/riscv: Add the tcg-target.h file Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Michael Clark <mjc@sifive.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <a135ee1a88cd7bd08993a519d4d654da27785254.1545246859.git.alistair.francis@wdc.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-26 06:40:02 +11:00
Emilio G. Cota	ac1043f6d6	tcg: Drop nargs from tcg_op_insert_{before,after} It's unused since `75e8b9b7aa`. Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <20181209193749.12277-9-cota@braap.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-17 06:04:44 +03:00
Alistair Francis	161dec9d1b	tcg/mips: Improve the add2/sub2 command to use TCG_TARGET_REG_BITS Instead of hard coding 31 for the shift right use TCG_TARGET_REG_BITS - 1. Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Message-Id: <7dfbddf7014a595150aa79011ddb342c3cc17ec3.1544648105.git.alistair.francis@wdc.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-17 06:04:44 +03:00
Richard Henderson	e1dcf3529d	tcg: Add TCG_TARGET_HAS_MEMORY_BSWAP For now, defined universally as true, since we previously required backends to implement swapped memory operations. Future patches may now remove that support where it is onerous. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-17 06:04:44 +03:00
Richard Henderson	6498594c8e	tcg/optimize: Optimize bswap Somehow we forgot these operations, once upon a time. This will allow immediate stores to have their bswap optimized away. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-17 06:04:44 +03:00
Richard Henderson	9e821eab0a	tcg: Clean up generic bswap64 Based on the only current user, Sparc: New code uses 2 constants that take 2 insns to load from constant pool, plus 13. Old code used 6 constants that took 1 or 2 insns to create, plus 21. The result is a new total of 17 vs an old total of 29. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-17 06:04:44 +03:00
Richard Henderson	a686dc71d8	tcg: Clean up generic bswap32 Based on the only current user, Sparc: New code uses 1 constant that takes 2 insns to create, plus 8. Old code used 2 constants that took 2 insns to create, plus 9. The result is a new total of 10 vs an old total of 13. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-17 06:04:44 +03:00
Richard Henderson	5785c17f31	tcg/i386: Add setup_guest_base_seg for FreeBSD Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-17 06:04:44 +03:00
Richard Henderson	913c2bddc2	tcg/i386: Precompute all guest_base parameters These values are constant between all qemu_ld/st invocations; there is no need to figure this out each time. If we cannot use a segment or an offset directly for guest_base, load the value into a register in the prologue. Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-17 06:04:44 +03:00
Richard Henderson	4810d96f03	tcg/i386: Assume 32-bit values are zero-extended We now have an invariant that all TCG_TYPE_I32 values are zero-extended, which means that we do not need to extend them again during qemu_ld/st, either explicitly via a separate tcg_out_ext32u or implicitly via P_ADDR32. Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-17 06:04:44 +03:00
Richard Henderson	75478279a0	tcg/i386: Implement INDEX_op_extr{lh}_i64_i32 for 32-bit guests This preserves the invariant that all TCG_TYPE_I32 values are zero-extended in the 64-bit host register. Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-17 06:04:44 +03:00
Richard Henderson	3dbc8c61de	tcg/i386: Propagate is64 to tcg_out_qemu_ld_slow_path This helps preserve the invariant that all TCG_TYPE_I32 values are stored zero-extended in the 64-bit host registers. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-17 06:04:44 +03:00
Richard Henderson	1d21d95b61	tcg/i386: Propagate is64 to tcg_out_qemu_ld_direct This helps preserve the invariant that all TCG_TYPE_I32 values are stored zero-extended in the 64-bit host registers. Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-17 06:04:43 +03:00
Richard Henderson	55dfd8fedc	tcg/s390x: Return false on failure from patch_reloc This does require an extra two checks within the slow paths to replace the assert that we're moving. Also add two checks within existing functions that lacked any kind of assert for out of range branch. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-17 06:04:43 +03:00
Richard Henderson	d513290351	tcg/ppc: Return false on failure from patch_reloc The reloc_pc{14,24}_val routines retain their asserts. Use these directly within the slow paths. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-17 06:04:43 +03:00
Richard Henderson	43fabd30e2	tcg/arm: Return false on failure from patch_reloc This does require an extra two checks within the slow paths to replace the assert that we're moving. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-17 06:04:43 +03:00
Richard Henderson	214bfe83d5	tcg/aarch64: Return false on failure from patch_reloc This does require an extra two checks within the slow paths to replace the assert that we're moving. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-17 06:04:43 +03:00
Richard Henderson	bec3afd5fc	tcg/i386: Return false on failure from patch_reloc Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-17 06:04:43 +03:00
Richard Henderson	6ac1778676	tcg: Return success from patch_reloc This will move the assert for success from within (subroutines of) patch_reloc into the callers. It will also let new code do something different when a relocation is out of range. For the moment, all backends are trivially converted to return true. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-17 06:04:43 +03:00
Richard Henderson	8c1b079279	tcg/mips: Remove retranslation code There is no longer a need for preserving branch offset operands, as we no longer re-translate. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-17 06:04:43 +03:00
Richard Henderson	791645f022	tcg/sparc: Remove retranslation code There is no longer a need for preserving branch offset operands, as we no longer re-translate. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-17 06:04:43 +03:00
Richard Henderson	3661612fc3	tcg/s390: Remove retranslation code There is no longer a need for preserving branch offset operands, as we no longer re-translate. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-17 06:04:43 +03:00
Richard Henderson	f9c7246faa	tcg/ppc: Fold away "noaddr" branch routines There is no longer a need for preserving branch offset operands, as we no longer re-translate. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-17 06:04:43 +03:00
Richard Henderson	37ee93a974	tcg/arm: Fold away "noaddr" branch routines There are one use apiece for these. There is no longer a need for preserving branch offset operands, as we no longer re-translate. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-17 06:04:43 +03:00
Richard Henderson	2672ccc7ee	tcg/arm: Remove reloc_pc24_atomic It is unused since `3fb53fb4d1`. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-17 06:04:43 +03:00
Richard Henderson	733589b338	tcg/aarch64: Fold away "noaddr" branch routines There are one use apiece for these. There is no longer a need for preserving branch offset operands, as we no longer re-translate. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-17 06:04:43 +03:00
Richard Henderson	90d6cb7811	tcg/aarch64: Remove reloc_pc26_atomic It is unused since `b68686bd4b`. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-17 06:04:43 +03:00
Richard Henderson	66c0285df4	tcg/i386: Move TCG_REG_CALL_STACK from define to enum Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-17 06:04:43 +03:00
Richard Henderson	5740d9f714	tcg/i386: Always use %ebp for TCG_AREG0 For x86_64, this can remove a REX prefix resulting in smaller code when manipulating globals of type i32, as we move them between backing store via cpu_env, aka TCG_AREG0. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-17 06:04:43 +03:00
Richard Henderson	f6823cbe37	target/sparc: Remove the constant pool Partially reverts `ab20bdc116`. The 14-bit displacement that we allowed to reach the constant pool is not always sufficient. Retain the tb-relative addressing, as that is how most return values from the tb are computed. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-17 06:04:42 +03:00
Thomas Huth	6fa2cef205	tcg/tcg.h: Remove GCC check for tcg_debug_assert() macro Both GCC v4.8 and Clang v3.4 (our minimum versions) support __builtin_unreachable(), so we can remove the version check here now. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Thomas Huth <thuth@redhat.com>	2018-12-12 10:01:12 +01:00
Peter Maydell	a7ce790a02	tcg/tcg-op.h: Add multiple include guard The tcg-op.h header was missing the usual guard against multiple inclusion; add it. (Spotted by lgtm.com's static analyzer.) Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20181108125256.30986-1-peter.maydell@linaro.org	2018-11-08 15:15:32 +00:00
Richard Henderson	e6cd4bb59b	tcg: Split CONFIG_ATOMIC128 GCC7+ will no longer advertise support for 16-byte __atomic operations if only cmpxchg is supported, as for x86_64. Fortunately, x86_64 still has support for __sync_compare_and_swap_16 and we can make use of that. AArch64 does not have, nor ever has had such support, so open-code it. Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-10-18 19:46:36 -07:00
Emilio G. Cota	72fd2efbbd	tcg: distribute tcg_time into TCG contexts When we implemented per-vCPU TCG contexts, we forgot to also distribute the tcg_time counter, which has remained as a global accessed without any serialization, leading to potentially missed counts. Fix it by distributing the field over the TCG contexts, embedding it into TCGProfile with a field called "cpu_exec_time", which is more descriptive than "tcg_time". Add a function to query this value directly, and for completeness, fill in the field in tcg_profile_snapshot, even though its callers do not use it. Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <20181010144853.13005-5-cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-10-18 18:58:10 -07:00
Emilio G. Cota	dd1d7da23b	tcg: plug holes in struct TCGProfile This plugs two 4-byte holes in 64-bit. Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <20181010144853.13005-4-cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-10-18 18:58:10 -07:00
Emilio G. Cota	c1f543b739	tcg: fix use of uninitialized variable under CONFIG_PROFILER We forgot to initialize n in commit `15fa08f845` ("tcg: Dynamically allocate TCGOps", 2017-12-29). Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <20181010144853.13005-3-cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-10-18 18:58:10 -07:00
Richard Henderson	d7f425fdea	tcg: Implement CPU_LOG_TB_NOCHAIN during expansion Rather than test NOCHAIN before linking, do not emit the goto_tb opcode at all. We already do this for goto_ptr. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-10-18 18:58:10 -07:00
Roman Kapl	93bf9a4273	tcg/i386: fix vector operations on 32-bit hosts The TCG backend uses LOWREGMASK to get the low 3 bits of register numbers. This was defined as no-op for 32-bit x86, with the assumption that we have eight registers anyway. This assumption is not true once we have xmm regs. Since LOWREGMASK was a no-op, xmm register indidices were wrong in opcodes and have overflown into other opcode fields, wreaking havoc. To trigger these problems, you can try running the "movi d8, #0x0" AArch64 instruction on 32-bit x86. "vpxor %xmm0, %xmm0, %xmm0" should be generated, but instead TCG generated "vpxor %xmm0, %xmm0, %xmm2". Fixes: `770c2fc7bb` ("Add vector operations") Signed-off-by: Roman Kapl <rka@sysgo.com> Message-Id: <20180824131734.18557-1-rka@sysgo.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-09-26 09:02:51 -07:00
Richard Henderson	1fb57da72a	tcg/optimize: Do not skip default processing of dup_vec If we do not opimize away dup_vec, we must mark its output as changed. Fixes: `170ba88f45` Reported-by: Laurent Desnogues <laurent.desnogues@gmail.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Laurent Desnogues <laurent.desnogues@gmail.com> Tested-by: Laurent Desnogues <laurent.desnogues@gmail.com> Message-id: 20180805233258.31892-1-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2018-08-06 14:57:48 +01:00
Richard Henderson	672189cd58	tcg/i386: Mark xmm registers call-clobbered When host vector registers and operations were introduced, I failed to mark the registers call clobbered as required by the ABI. Fixes: `770c2fc7bb` Cc: qemu-stable@nongnu.org Reported-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-07-23 09:21:14 -07:00
Alex Bennée	e65a5f227d	tcg/aarch64: limit mul_vec size In AdvSIMD we can only do 32x32 integer multiples although SVE is capable of larger 64 bit multiples. As a result we can end up generating invalid opcodes. Fix this by only reprting we can emit mul vector ops if the size is small enough. Fixes a crash on: sve-all-short-v8.3+sve@vq3/insn_mul_z_zi___INC.risu.bin When running on AArch64 hardware. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20180719154248.29669-1-alex.bennee@linaro.org> [rth: Removed the tcg_debug_assert -- there are plenty of other cases that we do not diagnose within the insn encoding helpers.] Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-07-19 09:07:31 -07:00
Richard Henderson	499748d768	tcg: Restrict check_size_impl to multiples of the line size Normally this is automatic in the size restrictions that are placed on vector sizes coming from the implementation. However, for the legitimate size tuple [oprsz=8, maxsz=32], we need to clear the final 24 bytes of the vector register. Without this check, do_dup selects TCG_TYPE_V128 and clears only 16 bytes. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Tested-by: Alex Bennée <alex.bennee@linaro.org> Message-id: 20180705191929.30773-2-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2018-07-09 14:51:34 +01:00
Richard Henderson	9f75462065	tcg: Reduce max TB opcode count Also, assert that we don't overflow any of two different offsets into the TB. Both unwind and goto_tb both record a uint16_t for later use. This fixes an arm-softmmu test case utilizing NEON in which there is a TB generated that runs to 7800 opcodes, and compiles to 96k on an x86_64 host. This overflows the 16-bit offset in which we record the goto_tb reset offset. Because of that overflow, we install a jump destination that goes to neverland. Boom. With this reduced op count, the same TB compiles to about 48k for aarch64, ppc64le, and x86_64 hosts, and neither assertion fires. Cc: qemu-stable@nongnu.org Reported-by: "Jason A. Donenfeld" <Jason@zx2c4.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-06-15 09:39:53 -10:00
Emilio G. Cota	0ac20318ce	tcg: remove tb_lock Use mmap_lock in user-mode to protect TCG state and the page descriptors. In !user-mode, each vCPU has its own TCG state, so no locks needed. Per-page locks are used to protect the page descriptors. Per-TB locks are used in both modes to protect TB jumps. Some notes: - tb_lock is removed from notdirty_mem_write by passing a locked page_collection to tb_invalidate_phys_page_fast. - tcg_tb_lookup/remove/insert/etc have their own internal lock(s), so there is no need to further serialize access to them. - do_tb_flush is run in a safe async context, meaning no other vCPU threads are running. Therefore acquiring mmap_lock there is just to please tools such as thread sanitizer. - Not visible in the diff, but tb_invalidate_phys_page already has an assert_memory_lock. - cpu_io_recompile is !user-only, so no mmap_lock there. - Added mmap_unlock()'s before all siglongjmp's that could be called in user-mode while mmap_lock is held. + Added an assert for !have_mmap_lock() after returning from the longjmp in cpu_exec, just like we do in cpu_exec_step_atomic. Performance numbers before/after: Host: AMD Opteron(tm) Processor 6376 ubuntu 17.04 ppc64 bootup+shutdown time 700 +-+--+----+------+------------+-----------+--------------+-+ \| + + + + + B \| \| before *B* ** * \| \|tb lock removal ###D### * \| 600 +-+ * +-+ \| ** # \| \| B #D \| \| *** * ## \| 500 +-+ *** ### +-+ \| * *** ### \| \| B # ## \| \| ** * #D# \| 400 +-+ ## +-+ \| ### \| \| ## \| \| # ## \| 300 +-+ * B* #D# +-+ \| B *** ### \| \| * ** #### \| \| * *** ### \| 200 +-+ B B #D# +-+ \| #B * ## # \| \| #* ## \| \| + D##D# + + + + \| 100 +-+--+----+------+------------+-----------+------------+--+-+ 1 8 16 Guest CPUs 48 64 png: https://imgur.com/HwmBHXe debian jessie aarch64 bootup+shutdown time 90 +-+--+-----+-----+------------+------------+------------+--+-+ \| + + + + + + \| \| before *B* B \| 80 +tb lock removal ###D### D +-+ \| ### \| \| ## \| 70 +-+ # +-+ \| ## \| \| # \| 60 +-+ B ## +-+ \| * ## \| \| * #D \| 50 +-+ * ## +-+ \| * ### \| \| B* ### \| 40 +-+ ** # ## +-+ \| #D# \| \| B* ### \| 30 +-+ B*B #### +-+ \| B * * # ### \| \| B ###D# \| 20 +-+ D ##D## +-+ \| D# \| \| + + + + + + \| 10 +-+--+-----+-----+------------+------------+------------+--+-+ 1 8 16 Guest CPUs 48 64 png: https://imgur.com/iGpGFtv The gains are high for 4-8 CPUs. Beyond that point, however, unrelated lock contention significantly hurts scalability. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-06-15 08:18:48 -10:00
Emilio G. Cota	128ed2278c	tcg: move tb_ctx.tb_phys_invalidate_count to tcg_ctx Thereby making it per-TCGContext. Once we remove tb_lock, this will avoid an atomic increment every time a TB is invalidated. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-06-15 07:42:55 -10:00
Emilio G. Cota	be2cdc5e35	tcg: track TBs with per-region BST's This paves the way for enabling scalable parallel generation of TCG code. Instead of tracking TBs with a single binary search tree (BST), use a BST for each TCG region, protecting it with a lock. This is as scalable as it gets, since each TCG thread operates on a separate region. The core of this change is the introduction of struct tcg_region_tree, which contains a pointer to a GTree and an associated lock to serialize accesses to it. We then allocate an array of tcg_region_tree's, adding the appropriate padding to avoid false sharing based on qemu_dcache_linesize. Given a tc_ptr, we first find the corresponding region_tree. This is done by special-casing the first and last regions first, since they might be of size != region.size; otherwise we just divide the offset by region.stride. I was worried about this division (several dozen cycles of latency), but profiling shows that this is not a fast path. Note that region.stride is not required to be a power of two; it is only required to be a multiple of the host's page size. Note that with this design we can also provide consistent snapshots about all region trees at once; for instance, tcg_tb_foreach acquires/releases all region_tree locks before/after iterating over them. For this reason we now drop tb_lock in dump_exec_info(). As an alternative I considered implementing a concurrent BST, but this can be tricky to get right, offers no consistent snapshots of the BST, and performance and scalability-wise I don't think it could ever beat having separate GTrees, given that our workload is insert-mostly (all concurrent BST designs I've seen focus, understandably, on making lookups fast, which comes at the expense of convoluted, non-wait-free insertions/removals). Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-06-15 07:42:55 -10:00
John Arbuckle	1019242af1	tcg/i386: Use byte form of xgetbv instruction The assembler in most versions of Mac OS X is pretty old and does not support the xgetbv instruction. To go around this problem, the raw encoding of the instruction is used instead. Signed-off-by: John Arbuckle <programmingkidx@gmail.com> Message-Id: <20180604215102.11002-1-programmingkidx@gmail.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-06-15 07:42:55 -10:00
Peter Maydell	163670542f	tcg-next queue -----BEGIN PGP SIGNATURE----- iQEcBAABAgAGBQJbEdLqAAoJEGTfOOivfiFfQaEH/Rq96S5bo94495KmRJY9e/jw lV321YYI7nx7sHtViG/B3iTkvnxzZPWcc7XbBMxyV5xmMQ/5zjS/ynZPFyy/cYRn zLM4W0SJ38EqhHTZpkkvw9Nle8UbNWKm5PgND2TyE4hmeuQ98OrQ6Y1GvP4MFpXs uQErbmMjYHMq7thbfCO6ulJjjEliRy3AJ2C3fCCCUgBQrJt6JeqbGr/Zzi2y88M9 IhoK8RbJiWT2O5Tl95q2NOQvr11WbFlu/K0nuaVgbfTwd2tp3ygmRKPpeZ24qA52 qtwgcIjWHHkkC5s1qaP8oW4FtoMQZdsaOwSOPw0ZBnG+VA7P/h33fWr9f5SistA= =UVdE -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/rth/tags/tcg-next-pull-request' into staging tcg-next queue # gpg: Signature made Sat 02 Jun 2018 00:12:42 BST # gpg: using RSA key 64DF38E8AF7E215F # gpg: Good signature from "Richard Henderson <richard.henderson@linaro.org>" # Primary key fingerprint: 7A48 1E78 868B 4DB6 A85A 05C0 64DF 38E8 AF7E 215F * remotes/rth/tags/tcg-next-pull-request: tcg: Pass tb and index to tcg_gen_exit_tb separately Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2018-06-04 11:28:31 +01:00
Richard Henderson	07ea28b418	tcg: Pass tb and index to tcg_gen_exit_tb separately Do the cast to uintptr_t within the helper, so that the compiler can type check the pointer argument. We can also do some more sanity checking of the index argument. Reviewed-by: Laurent Vivier <laurent@vivier.eu> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-06-01 15:15:27 -07:00
Philippe Mathieu-Daudé	23c11b04dc	target: Do not include "exec/exec-all.h" if it is not necessary Code change produced with: $ git grep '#include "exec/exec-all.h"' \| \ cut -d: -f-1 \| \ xargs egrep -L "(cpu_address_space_init\|cpu_loop_\|tlb_\|tb_\|GETPC\|singlestep\|TranslationBlock)" \| \ xargs sed -i.bak '/#include "exec\/exec-all.h"/d' Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <20180528232719.4721-10-f4bug@amsat.org> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-06-01 14:15:10 +02:00
Emilio G. Cota	1d34982155	tcg: fix s/compliment/complement/ typos Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2018-05-20 08:25:23 +03:00
Peter Maydell	f5583c527f	target-arm queue: * hw/arm/iotkit.c: fix minor memory leak * softfloat: fix wrong-exception-flags bug for multiply-add corner case * arm: isolate and clean up DTB generation * implement Arm v8.1-Atomics extension * Fix some bugs and missing instructions in the v8.2-FP16 extension -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABCAAGBQJa9IUCAAoJEDwlJe0UNgzeEGMQAKKjVRzZ7MBgvxQj0FJSWhSP BZkATf3ktid255PRpIssBZiY9oM+uY6n+/IRozAGvfDBp9eQOkrZczZjfW5hpe0B YsQadtk5cUOXqQzRTegSMPOoMmz8f5GaGOk4R6AEXJEX+Rug/zbOn9Q8Yx7JTd7o yBvU1+fys3galSiB88cffA95B9fwGfLsM7rP6OC4yNdUBYwjHf3wtY53WsxtWqX9 oX4keEiROQkrOfbSy9wYPZzu/0iRo8v35+7wIZhvNSlf02k6yJ7a+w0C4EQIRhWm 5zciE+aMYr7nOGpj7AEJLrRekhwnD6Ppje6aUd15yrxfNRZkpk/FeECWnaOPDis7 QNijx5Zqg6+GyItQKi5U4vFVReMj09OB7xDyAq77xDeBj4l3lg2DNkRfRhqQZAcv 2r4EW+pfLNj76Ah1qtQ410fprw462Sopb6bHmeuFbf1QFbQvJ4CL1+7Jl3ExrDX4 2+iQb4sQghWDxhDLfRSLxQ7K+bX+mNfGdFW8h+jPShD/+JY42dTKkFZEl4ghNgMD mpj8FrQuIkSMqnDmPfoTG5MVTMERacqPU7GGM7/fxudIkByO3zTiLxJ/E+Iy8HvX 29xKoOBjKT5FJrwJABsN6VpA3EuyAARgQIZ/dd6N5GZdgn2KAIHuaI+RHFOesKFd dJGM6sdksnsAAz28aUEJ =uXY+ -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20180510' into staging target-arm queue: * hw/arm/iotkit.c: fix minor memory leak * softfloat: fix wrong-exception-flags bug for multiply-add corner case * arm: isolate and clean up DTB generation * implement Arm v8.1-Atomics extension * Fix some bugs and missing instructions in the v8.2-FP16 extension # gpg: Signature made Thu 10 May 2018 18:44:34 BST # gpg: using RSA key 3C2525ED14360CDE # gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>" # gpg: aka "Peter Maydell <pmaydell@gmail.com>" # gpg: aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>" # Primary key fingerprint: E1A5 C593 CD41 9DE2 8E83 15CF 3C25 25ED 1436 0CDE * remotes/pmaydell/tags/pull-target-arm-20180510: (21 commits) target/arm: Clear SVE high bits for FMOV target/arm: Fix float16 to/from int16 target/arm: Implement vector shifted FCVT for fp16 target/arm: Implement vector shifted SCVF/UCVF for fp16 target/arm: Enable ARM_FEATURE_V8_ATOMICS for user-only target/arm: Implement CAS and CASP target/arm: Fill in disas_ldst_atomic target/arm: Introduce ARM_FEATURE_V8_ATOMICS and initial decode target/riscv: Use new atomic min/max expanders tcg: Use GEN_ATOMIC_HELPER_FN for opposite endian atomic add tcg: Introduce atomic helpers for integer min/max target/xtensa: Use new min/max expanders target/arm: Use new min/max expanders tcg: Introduce helpers for integer min/max atomic.h: Work around gcc spurious "unused value" warning make sure that we aren't overwriting mc->get_hotplug_handler by accident arm/boot: split load_dtb() from arm_load_kernel() platform-bus-device: use device plug callback instead of machine_done notifier pc: simplify MachineClass::get_hotplug_handler handling softfloat: Handle default NaN mode after pickNaNMulAdd, not before ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org> # Conflicts: # target/riscv/translate.c	2018-05-11 17:41:54 +01:00
Richard Henderson	5507c2bf35	tcg: Introduce atomic helpers for integer min/max Given that this atomic operation will be used by both risc-v and aarch64, let's not duplicate code across the two targets. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20180508151437.4232-5-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2018-05-10 18:10:57 +01:00
Richard Henderson	b87fb8cd9f	tcg: Introduce helpers for integer min/max These operations are re-invented by several targets so far. Several supported hosts have insns for these, so place the expanders out-of-line for a future introduction of tcg opcodes. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20180508151437.4232-2-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2018-05-10 18:10:57 +01:00
Richard Henderson	abebf92597	tcg: Limit the number of ops in a TB In `6001f7729e` we partially attempt to address the branch displacement overflow caused by `15fa08f845`. However, gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqtbX.c is a testcase that contains a TB so large as to overflow anyway. The limit here of 8000 ops produces a maximum output TB size of 24112 bytes on a ppc64le host with that test case. This is still much less than the maximum forward branch distance of 32764 bytes. Cc: qemu-stable@nongnu.org Fixes: `15fa08f845` ("tcg: Dynamically allocate TCGOps") Reviewed-by: Laurent Vivier <laurent@vivier.eu> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-05-09 08:30:57 -07:00
Peter Maydell	7eb30ef0ba	tcg/i386: Fix dup_vec in non-AVX2 codepath The VPUNPCKLD* instructions are all "non-destructive source", indicated by "NDS" in the encoding string in the x86 ISA manual. This means that they take two source operands, one of which is encoded in the VEX.vvvv field. We were incorrectly treating them as if they were destructive-source and passing 0 as the 'v' argument of tcg_out_vex_modrm(). This meant we were always using %xmm0 as one of the source operands, causing incorrect results if the register allocator happened to want to use something else. For instance the input AArch64 insn: DUP v26.16b, w21 which becomes TCG IR ops: dup_vec v128,e8,tmp2,x21 st_vec v128,e8,tmp2,env,$0xa40 was assembled to: 0x607c568c: c4 c1 7a 7e 86 e8 00 00 vmovq 0xe8(%r14), %xmm0 0x607c5694: 00 0x607c5695: c5 f9 60 c8 vpunpcklbw %xmm0, %xmm0, %xmm1 0x607c5699: c5 f9 61 c9 vpunpcklwd %xmm1, %xmm0, %xmm1 0x607c569d: c5 f9 70 c9 00 vpshufd $0, %xmm1, %xmm1 0x607c56a2: c4 c1 7a 7f 8e 40 0a 00 vmovdqu %xmm1, 0xa40(%r14) 0x607c56aa: 00 when the vpunpcklwd insn should be "%xmm1, %xmm1, %xmm1". This resulted in our incorrectly setting the output vector to q26=0000320000003200:0000320000003200 when given an input of x21 == 0000000002803200 rather than the expected all-zeroes. Pass the correct source register number to tcg_out_vex_modrm() for these insns. Fixes: `770c2fc7bb` Cc: qemu-stable@nongnu.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-Id: <20180504153431.5169-1-peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-05-09 08:30:57 -07:00
Laurent Vivier	6001f7729e	tcg: workaround branch instruction overflow in tcg_out_qemu_ld/st ppc64 uses a BC instruction to call the tcg_out_qemu_ld/st slow path. BC instruction uses a relative address encoded on 14 bits. The slow path functions are added at the end of the generated instructions buffer, in the reverse order of the callers. So more we have slow path functions more the distance between the caller (BC) and the function increases. This patch changes the behavior to generate the functions in the same order of the callers. Cc: qemu-stable@nongnu.org Fixes: `15fa08f845` ("tcg: Dynamically allocate TCGOps") Signed-off-by: Laurent Vivier <lvivier@redhat.com> Message-Id: <20180429235840.16659-1-lvivier@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-05-01 11:56:55 -07:00
Richard Henderson	5bfa803448	tcg: Improve TCGv_ptr support Drop TCGV_PTR_TO_NAT and TCGV_NAT_TO_PTR internal macros. Add tcg_temp_local_new_ptr, tcg_gen_brcondi_ptr, tcg_gen_ext_i32_ptr, tcg_gen_trunc_i64_ptr, tcg_gen_extu_ptr_i64, tcg_gen_trunc_ptr_i32. Use inlines instead of macros where possible. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-05-01 11:56:16 -07:00
Richard Henderson	9a938d86b0	tcg: Allow wider vectors for cmp and mul In `db432672`, we allow wide inputs for operations such as add. However, in `212be173` and `3774030a` we didn't do the same for compare and multiply. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-05-01 11:56:16 -07:00
Henry Wertz	3f814b8037	tcg/arm: Fix memory barrier encoding I found with qemu 2.11.x or newer that I would get an illegal instruction error running some Intel binaries on my ARM chromebook. On investigation, I found it was quitting on memory barriers. qemu instruction: mb $0x31 was translating as: 0x604050cc: 5bf07ff5 blpl #0x600250a8 After patch it gives: 0x604050cc: f57ff05b dmb ish In short, I found INSN_DMB_ISH (memory barrier for ARMv7) appeared to be correct based on online docs, but due to some endian-related shenanigans it had to be byte-swapped to suit qemu; it appears INSN_DMB_MCR (memory barrier for ARMv6) also should be byte swapped (and this patch does so). I have not checked for correctness of aarch64's barrier instruction. Cc: qemu-stable@nongnu.org Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Henry Wertz <hwertz10@gmail.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-05-01 11:56:07 -07:00
Richard Henderson	d103021269	tcg: Document INDEX_mul[us]h_* Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-05-01 11:41:56 -07:00
Peter Maydell	161dfd1e7f	tcg/mips: Handle large offsets from target env to tlb_table The MIPS TCG target makes the assumption that the offset from the target env pointer to the tlb_table is less than about 64K. This used to be true, but gradual addition of features to the Arm target means that it's no longer true there. This results in the build-time assertion failing: In file included from /home/pm215/qemu/include/qemu/osdep.h:36:0, from /home/pm215/qemu/tcg/tcg.c:28: /home/pm215/qemu/tcg/mips/tcg-target.inc.c: In function ‘tcg_out_tlb_load’: /home/pm215/qemu/include/qemu/compiler.h:90:36: error: static assertion failed: "not expecting: offsetof(CPUArchState, tlb_table[NB_MMU_MODES - 1][1]) > 0x7ff0 + 0x7fff" #define QEMU_BUILD_BUG_MSG(x, msg) _Static_assert(!(x), msg) ^ /home/pm215/qemu/include/qemu/compiler.h:98:30: note: in expansion of macro ‘QEMU_BUILD_BUG_MSG’ #define QEMU_BUILD_BUG_ON(x) QEMU_BUILD_BUG_MSG(x, "not expecting: " #x) ^ /home/pm215/qemu/tcg/mips/tcg-target.inc.c:1236:9: note: in expansion of macro ‘QEMU_BUILD_BUG_ON’ QEMU_BUILD_BUG_ON(offsetof(CPUArchState, ^ /home/pm215/qemu/rules.mak:66: recipe for target 'tcg/tcg.o' failed An ideal long term approach would be to rearrange the CPU state so that the tlb_table was not so far along it, but this is tricky because it would move it from the "not cleared on CPU reset" part of the struct to the "cleared on CPU reset" part. As a simple fix for the 2.12 release, make the MIPS TCG target handle an arbitrary offset by emitting more add instructions. This will mean an extra instruction in the fastpath for TCG loads and stores for the affected guests (currently just aarch64-softmmu). Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Acked-by: Michael S. Tsirkin <mst@redhat.com> Message-id: 20180413142336.32163-1-peter.maydell@linaro.org	2018-04-16 11:51:37 +01:00
Richard Henderson	9743cd5736	tcg: Introduce tcg_set_insn_start_param The parameters for tcg_gen_insn_start are target_ulong, which may be split into two TCGArg parameters for storage in the opcode on 32-bit hosts. Fixes the ARM target and its direct use of tcg_set_insn_param, which would set the wrong argument in the 64-on-32 case. Cc: qemu-stable@nongnu.org Reported-by: alarson@ddci.com Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20180410003558.2470-1-richard.henderson@linaro.org Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2018-04-10 13:02:26 +01:00
Richard Henderson	f2f1dde751	tcg: Mark muluh_i64 and mulsh_i64 as 64-bit ops Failure to do so results in the tcg optimizer sign-extending any constant fold from 32-bits. This turns out to be visible in the RISC-V testsuite using a host that emits these opcodes (e.g. any non-x86_64). Reported-by: Michael Clark <mjc@sifive.com> Reviewed-by: Emilio G. Cota <cota@braap.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-03-28 12:45:16 +08:00
Richard Henderson	adb196cbd5	tcg: Add choose_vector_size This unifies 5 copies of checks for supported vector size, and in the process fixes a missing check in tcg_gen_gvec_2s. This lead to an assertion failure for 64-bit vector multiply, which is not available in the AVX instruction set. Suggested-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-03-16 00:55:04 +08:00
Richard Henderson	7f34ed4bcd	tcg/i386: Support INDEX_op_dup2_vec for -m32 Unknown why -m32 was passing with gcc but not clang; it should have failed for both. This would be used for tcg_gen_dup_i64_vec, and visible with the right TB and an aarch64 guest. Reported-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-03-16 00:55:04 +08:00
Richard Henderson	b2e3ae9452	tcg: Improve tcg_gen_muli_i32/i64 Convert multiplication by power of two to left shift. Reviewed-by: Emilio G. Cota <cota@braap.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-03-16 00:55:04 +08:00
Richard Henderson	14e4c1e235	tcg/aarch64: Add vector operations Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-02-08 15:54:08 +00:00
Richard Henderson	770c2fc7bb	tcg/i386: Add vector operations The x86 vector instruction set is extremely irregular. With newer editions, Intel has filled in some of the blanks. However, we don't get many 64-bit operations until SSE4.2, introduced in 2009. The subsequent edition was for AVX1, introduced in 2011, which added three-operand addressing, and adjusts how all instructions should be encoded. Given the relatively narrow 2 year window between possible to support and desirable to support, and to vastly simplify code maintainence, I am only planning to support AVX1 and later cpus. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-02-08 15:54:08 +00:00
Richard Henderson	170ba88f45	tcg/optimize: Handle vector opcodes during optimize Trivial move and constant propagation. Some identity and constant function folding, but nothing that requires knowledge of the size of the vector element. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-02-08 15:54:06 +00:00
Richard Henderson	22fc352703	tcg: Add generic vector helpers with a scalar operand Use dup to convert a non-constant scalar to a third vector. Add addition, multiplication, and logical operations with an immediate. Add addition, subtraction, multiplication, and logical operations with a non-constant scalar. Allow for the front-end to build operations in which the scalar operand comes first. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-02-08 15:54:06 +00:00
Richard Henderson	f49b12c6e6	tcg: Add generic helpers for saturating arithmetic No vector ops as yet. SSE only has direct support for 8- and 16-bit saturation; handling 32- and 64-bit saturation is much more expensive. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-02-08 15:54:06 +00:00
Richard Henderson	3774030a3e	tcg: Add generic vector ops for multiplication Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-02-08 15:54:06 +00:00
Richard Henderson	212be173f0	tcg: Add generic vector ops for comparisons Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-02-08 15:54:05 +00:00
Richard Henderson	d0ec97967f	tcg: Add generic vector ops for constant shifts Opcodes are added for scalar and vector shifts, but considering the varied semantics of these do not expose them to the front ends. Do go ahead and provide them in case they are needed for backend expansion. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-02-08 15:54:05 +00:00
Richard Henderson	db432672dc	tcg: Add generic vector expanders Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-02-08 15:54:05 +00:00
Richard Henderson	474b2e8f0f	tcg: Standardize integral arguments to expanders Some functions use intN_t arguments, some use uintN_t, some just used "unsigned". To aid putting function pointers in tables, we need consistency. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-02-08 15:54:05 +00:00
Richard Henderson	d2fd745fe8	tcg: Add types and basic operations for host vectors Nothing uses or enables them yet. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-02-08 15:54:04 +00:00
Richard Henderson	da73a4abca	tcg: Allow multiple word entries into the constant pool This will be required for storing vector constants. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-02-08 15:53:34 +00:00
Richard Henderson	030ffe39dd	tcg/ppc: Allow a 32-bit offset to the constant pool We recently relaxed the limit of the number of opcodes that can appear in a TranslationBlock. In certain cases this has resulted in relocation overflow. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-01-16 08:21:56 -08:00
Richard Henderson	4a64e0fd68	tcg/ppc: Support tlb offsets larger than 64k AArch64 with SVE has an offset of 80k to the 8th TLB. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-01-16 08:21:56 -08:00
Richard Henderson	71f9cee9d0	tcg/arm: Support tlb offsets larger than 64k AArch64 with SVE has an offset of 80k to the 8th TLB. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-01-16 08:21:56 -08:00
Richard Henderson	7170ac3313	tcg/arm: Fix double-word comparisons The code sequence we were generating was only good for unsigned comparisons. For signed comparisions, use the sequence from gcc. Fixes booting of ppc64 firmware, with a patch changing the code sequence for ppc comparisons. Tested-by: Michael Roth <mdroth@linux.vnet.ibm.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-01-16 08:20:39 -08:00
Richard Henderson	1df3caa946	tcg: Allow 6 arguments to TCG helpers We already handle this in the backends, and the lifetime datum for the TCGOp is already large enough. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-12-29 12:43:40 -08:00
Richard Henderson	923ed17501	tcg: Add tcg_signed_cond Complimenting the existing tcg_unsigned_cond. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-12-29 12:43:40 -08:00
Richard Henderson	cd9090aa9d	tcg: Generalize TCGOp parameters We had two fields specific to INDEX_op_call. Rename these and add some macros so that the fields may be reused for other opcodes. Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-12-29 12:43:39 -08:00
Richard Henderson	15fa08f845	tcg: Dynamically allocate TCGOps With no fixed array allocation, we can't overflow a buffer. This will be important as optimizations related to host vectors may expand the number of ops used. Use QTAILQ to link the ops together. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-12-29 12:43:39 -08:00
Richard Henderson	f764718d0c	tcg: Remove TCGV_UNUSED* and TCGV_IS_UNUSED* These are now trivial sets and tests against NULL. Unwrap. Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-12-29 12:43:39 -08:00
Richard Henderson	ba2c747992	tcg/s390x: Use constant pool for prologue Rather than have separate code only used for guest_base, rely on a recent change to handle constant pool entries. Cc: qemu-s390x@nongnu.org Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-11-03 09:33:45 +01:00
Richard Henderson	5b38ee3161	tcg: Allow constant pool entries in the prologue Both ARMv6 and AArch64 currently may drop complex guest_base values into the constant pool. But generic code wasn't expecting that, and the pool is not emitted. Correct that. Tested-by: Emilio G. Cota <cota@braap.org> Tested-by: Laurent Desnogues <laurent.desnogues@gmail.com> Reported-by: Laurent Desnogues <laurent.desnogues@gmail.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-11-03 09:33:45 +01:00
Richard Henderson	1c2adb958f	tcg: Initialize cpu_env generically This is identical for each target. So, move the initialization to common code. Move the variable itself out of tcg_ctx and name it cpu_env to minimize changes within targets. This also means we can remove tcg_global_reg_new_{ptr,i32,i64}, since there are no longer global-register temps created by targets. Reviewed-by: Emilio G. Cota <cota@braap.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-10-24 13:53:42 -07:00
Emilio G. Cota	3468b59e18	tcg: enable multiple TCG contexts in softmmu This enables parallel TCG code generation. However, we do not take advantage of it yet since tb_lock is still held during tb_gen_code. In user-mode we use a single TCG context; see the documentation added to tcg_region_init for the rationale. Note that targets do not need any conversion: targets initialize a TCGContext (e.g. defining TCG globals), and after this initialization has finished, the context is cloned by the vCPU threads, each of them keeping a separate copy. TCG threads claim one entry in tcg_ctxs[] by atomically increasing n_tcg_ctxs. Do not be too annoyed by the subsequent atomic_read's of that variable and tcg_ctxs; they are there just to play nice with analysis tools such as thread sanitizer. Note that we do not allocate an array of contexts (we allocate an array of pointers instead) because when tcg_context_init is called, we do not know yet how many contexts we'll use since the bool behind qemu_tcg_mttcg_enabled() isn't set yet. Previous patches folded some TCG globals into TCGContext. The non-const globals remaining are only set at init time, i.e. before the TCG threads are spawned. Here is a list of these set-at-init-time globals under tcg/: Only written by tcg_context_init: - indirect_reg_alloc_order - tcg_op_defs Only written by tcg_target_init (called from tcg_context_init): - tcg_target_available_regs - tcg_target_call_clobber_regs - arm: arm_arch, use_idiv_instructions - i386: have_cmov, have_bmi1, have_bmi2, have_lzcnt, have_movbe, have_popcnt - mips: use_movnz_instructions, use_mips32_instructions, use_mips32r2_instructions, got_sigill (tcg_target_detect_isa) - ppc: have_isa_2_06, have_isa_3_00, tb_ret_addr - s390: tb_ret_addr, s390_facilities - sparc: qemu_ld_trampoline, qemu_st_trampoline (build_trampolines), use_vis3_instructions Only written by tcg_prologue_init: - 'struct jit_code_entry one_entry' - aarch64: tb_ret_addr - arm: tb_ret_addr - i386: tb_ret_addr, guest_base_flags - ia64: tb_ret_addr - mips: tb_ret_addr, bswap32_addr, bswap32u_addr, bswap64_addr Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-10-24 13:53:42 -07:00
Emilio G. Cota	e8feb96fcc	tcg: introduce regions to split code_gen_buffer This is groundwork for supporting multiple TCG contexts. The naive solution here is to split code_gen_buffer statically among the TCG threads; this however results in poor utilization if translation needs are different across TCG threads. What we do here is to add an extra layer of indirection, assigning regions that act just like pages do in virtual memory allocation. (BTW if you are wondering about the chosen naming, I did not want to use blocks or pages because those are already heavily used in QEMU). We use a global lock to serialize allocations as well as statistics reporting (we now export the size of the used code_gen_buffer with tcg_code_size()). Note that for the allocator we could just use a counter and atomic_inc; however, that would complicate the gathering of tcg_code_size()-like stats. So given that the region operations are not a fast path, a lock seems the most reasonable choice. The effectiveness of this approach is clear after seeing some numbers. I used the bootup+shutdown of debian-arm with '-tb-size 80' as a benchmark. Note that I'm evaluating this after enabling per-thread TCG (which is done by a subsequent commit). * -smp 1, 1 region (entire buffer): qemu: flush code_size=83885014 nb_tbs=154739 avg_tb_size=357 qemu: flush code_size=83884902 nb_tbs=153136 avg_tb_size=363 qemu: flush code_size=83885014 nb_tbs=152777 avg_tb_size=364 qemu: flush code_size=83884950 nb_tbs=150057 avg_tb_size=373 qemu: flush code_size=83884998 nb_tbs=150234 avg_tb_size=373 qemu: flush code_size=83885014 nb_tbs=154009 avg_tb_size=360 qemu: flush code_size=83885014 nb_tbs=151007 avg_tb_size=370 qemu: flush code_size=83885014 nb_tbs=151816 avg_tb_size=367 That is, 8 flushes. * -smp 8, 32 regions (80/32 MB per region) [i.e. this patch]: qemu: flush code_size=76328008 nb_tbs=141040 avg_tb_size=356 qemu: flush code_size=75366534 nb_tbs=138000 avg_tb_size=361 qemu: flush code_size=76864546 nb_tbs=140653 avg_tb_size=361 qemu: flush code_size=76309084 nb_tbs=135945 avg_tb_size=375 qemu: flush code_size=74581856 nb_tbs=132909 avg_tb_size=375 qemu: flush code_size=73927256 nb_tbs=135616 avg_tb_size=360 qemu: flush code_size=78629426 nb_tbs=142896 avg_tb_size=365 qemu: flush code_size=76667052 nb_tbs=138508 avg_tb_size=368 Again, 8 flushes. Note how buffer utilization is not 100%, but it is close. Smaller region sizes would yield higher utilization, but we want region allocation to be rare (it acquires a lock), so we do not want to go too small. * -smp 8, static partitioning of 8 regions (10 MB per region): qemu: flush code_size=21936504 nb_tbs=40570 avg_tb_size=354 qemu: flush code_size=11472174 nb_tbs=20633 avg_tb_size=370 qemu: flush code_size=11603976 nb_tbs=21059 avg_tb_size=365 qemu: flush code_size=23254872 nb_tbs=41243 avg_tb_size=377 qemu: flush code_size=28289496 nb_tbs=52057 avg_tb_size=358 qemu: flush code_size=43605160 nb_tbs=78896 avg_tb_size=367 qemu: flush code_size=45166552 nb_tbs=82158 avg_tb_size=364 qemu: flush code_size=63289640 nb_tbs=116494 avg_tb_size=358 qemu: flush code_size=51389960 nb_tbs=93937 avg_tb_size=362 qemu: flush code_size=59665928 nb_tbs=107063 avg_tb_size=372 qemu: flush code_size=38380824 nb_tbs=68597 avg_tb_size=374 qemu: flush code_size=44884568 nb_tbs=79901 avg_tb_size=376 qemu: flush code_size=50782632 nb_tbs=90681 avg_tb_size=374 qemu: flush code_size=39848888 nb_tbs=71433 avg_tb_size=372 qemu: flush code_size=64708840 nb_tbs=119052 avg_tb_size=359 qemu: flush code_size=49830008 nb_tbs=90992 avg_tb_size=362 qemu: flush code_size=68372408 nb_tbs=123442 avg_tb_size=368 qemu: flush code_size=33555560 nb_tbs=59514 avg_tb_size=378 qemu: flush code_size=44748344 nb_tbs=80974 avg_tb_size=367 qemu: flush code_size=37104248 nb_tbs=67609 avg_tb_size=364 That is, 20 flushes. Note how a static partitioning approach uses the code buffer poorly, leading to many unnecessary flushes. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-10-24 13:53:42 -07:00
Emilio G. Cota	34184b0718	tcg: allocate optimizer temps with tcg_malloc Groundwork for supporting multiple TCG contexts. While at it, also allocate temps_used directly as a bitmap of the required size, instead of using a bitmap of TCG_MAX_TEMPS via TCGTempSet. Performance-wise we lose about 1.12% in a translation-heavy workload such as booting+shutting down debian-arm: Performance counter stats for 'taskset -c 0 arm-softmmu/qemu-system-arm \ -machine type=virt -nographic -smp 1 -m 4096 \ -netdev user,id=unet,hostfwd=tcp::2222-:22 \ -device virtio-net-device,netdev=unet \ -drive file=die-on-boot.qcow2,id=myblock,index=0,if=none \ -device virtio-blk-device,drive=myblock \ -kernel kernel.img -append console=ttyAMA0 root=/dev/vda1 \ -name arm,debug-threads=on -smp 1' (10 runs): exec time (s) Relative slowdown wrt original (%) --------------------------------------------------------------- original 20.213321616 0. tcg_malloc 20.441130078 1.1270214 TCGContext 20.477846517 1.3086662 g_malloc 20.780527895 2.8061013 The other two alternatives shown in the table are: - TCGContext: embed temps[TCG_MAX_TEMPS] and TCGTempSet used_temps in TCGContext. This is simple enough but it isn't faster than using tcg_malloc; moreover, it wastes memory. - g_malloc: allocate/deallocate both temps and used_temps every time tcg_optimize is executed. Suggested-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-10-24 13:53:42 -07:00
Emilio G. Cota	c3fac1138e	tcg: distribute profiling counters across TCGContext's This is groundwork for supporting multiple TCG contexts. To avoid scalability issues when profiling info is enabled, this patch makes the profiling info counters distributed via the following changes: 1) Consolidate profile info into its own struct, TCGProfile, which TCGContext also includes. Note that tcg_table_op_count is brought into TCGProfile after dropping the tcg_ prefix. 2) Iterate over the TCG contexts in the system to obtain the total counts. This change also requires updating the accessors to TCGProfile fields to use atomic_read/set whenever there may be conflicting accesses (as defined in C11) to them. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-10-24 13:53:42 -07:00
Emilio G. Cota	df2cce2968	tcg: introduce **tcg_ctxs to keep track of all TCGContext's Groundwork for supporting multiple TCG contexts. Note that having n_tcg_ctxs is unnecessary. However, it is convenient to have it, since it will simplify iterating over the array: we'll have just a for loop instead of having to iterate over a NULL-terminated array (which would require n+1 elems) or having to check with ifdef's for usermode/softmmu. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-10-24 13:53:42 -07:00
Emilio G. Cota	26689780f8	gen-icount: fold exitreq_label into TCGContext Groundwork for supporting multiple TCG contexts. Reviewed-by: Richard Henderson <rth@twiddle.net> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-10-24 13:53:42 -07:00
Emilio G. Cota	b1311c4acf	tcg: define tcg_init_ctx and make tcg_ctx a pointer Groundwork for supporting multiple TCG contexts. The core of this patch is this change to tcg/tcg.h: > -extern TCGContext tcg_ctx; > +extern TCGContext tcg_init_ctx; > +extern TCGContext tcg_ctx; Note that for now we set tcg_ctx to whatever TCGContext is passed to tcg_context_init -- in this case &tcg_init_ctx. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-10-24 13:53:42 -07:00
Emilio G. Cota	44ded3d048	tcg: take tb_ctx out of TCGContext Groundwork for supporting multiple TCG contexts. Reviewed-by: Richard Henderson <rth@twiddle.net> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-10-24 13:53:42 -07:00
Emilio G. Cota	e82d5a2460	tcg: check CF_PARALLEL instead of parallel_cpus Thereby decoupling the resulting translated code from the current state of the system. The tb->cflags field is not passed to tcg generation functions. So we add a field to TCGContext, storing there a copy of tb->cflags. Most architectures have <= 32 registers, which results in a 4-byte hole in TCGContext. Use this hole for the new field. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-10-24 13:53:42 -07:00
Emilio G. Cota	4e2ca83e71	tcg: define CF_PARALLEL and use it for TB hashing along with CF_COUNT_MASK This will enable us to decouple code translation from the value of parallel_cpus at any given time. It will also help us minimize TB flushes when generating code via EXCP_ATOMIC. Note that the declaration of parallel_cpus is brought to exec-all.h to be able to define there the "curr_cflags" inline. Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-10-24 13:53:41 -07:00
Richard Henderson	e89b28a635	tcg: Use offsets not indices for TCGv_* Using the offset of a temporary, relative to TCGContext, rather than its index means that we don't use 0. That leaves offset 0 free for a NULL representation without having to leave index 0 unused. Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-10-24 13:53:36 -07:00
Richard Henderson	11f4e8f8bf	tcg: Remove TCGV_EQUAL* When we used structures for TCGv_*, we needed a macro in order to perform a comparison. Now that we use pointers, this is just clutter. Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-10-24 21:50:15 +02:00
Richard Henderson	dc41aa7d34	tcg: Remove GET_TCGV_* and MAKE_TCGV_* The GET and MAKE functions weren't really specific enough. We now have a full complement of functions that convert exactly between temporaries, arguments, tcgv pointers, and indices. The target/sparc change is also a bug fix, which would have affected a host that defines TCG_TARGET_HAS_extr[lh]_i64_i32, i.e. MIPS64. Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-10-24 21:49:30 +02:00
Richard Henderson	085272b35e	tcg: Introduce temp_tcgv_{i32,i64,ptr} Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-10-24 21:48:59 +02:00
Richard Henderson	ae8b75dc6e	tcg: Introduce tcgv_{i32,i64,ptr}_{arg,temp} Transform TCGv_* to an "argument" or a temporary. For now, an argument is simply the temporary index. Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-10-24 21:47:46 +02:00
Richard Henderson	960c50e077	tcg: Push tcg_ctx into tcg_gen_callN Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-10-24 21:47:29 +02:00
Richard Henderson	b7e8b17a77	tcg: Push tcg_ctx into generator functions Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-10-24 21:45:07 +02:00
Richard Henderson	6349039d0b	tcg: Use per-temp state data in optimize While we're touching many of the lines anyway, adjust the naming of the functions to better distinguish when "TCGArg" vs "TCGTemp" should be used. Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <rth@twiddle.net>	2017-10-24 21:45:07 +02:00
Richard Henderson	54534d7cfd	tcg: Remove unused TCG_CALL_DUMMY_TCGV Reviewed-by: Emilio G. Cota <cota@braap.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <rth@twiddle.net>	2017-10-24 21:45:07 +02:00
Richard Henderson	2272e4a791	tcg: Change temp_allocate_frame arg to TCGTemp Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <rth@twiddle.net>	2017-10-24 21:44:52 +02:00
Richard Henderson	ac3b88911e	tcg: Avoid loops against variable bounds Copy s->nb_globals or s->nb_temps to a local variable for the purposes of iteration. This should allow the compiler to use low-overhead looping constructs on some hosts. Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Emilio G. Cota <cota@braap.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <rth@twiddle.net>	2017-10-24 21:44:34 +02:00
Richard Henderson	b83eabeac0	tcg: Use per-temp state data in liveness This avoids having to allocate external memory for each temporary. Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <rth@twiddle.net>	2017-10-24 21:44:34 +02:00
Richard Henderson	1807f4c400	tcg: Introduce temp_arg, export temp_idx At the same time, drop the TCGContext argument and use tcg_ctx instead. Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <rth@twiddle.net>	2017-10-24 21:44:12 +02:00
Richard Henderson	c6c7d84df8	tcg: Return NULL temp for TCG_CALL_DUMMY_ARG Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <rth@twiddle.net>	2017-10-24 21:44:02 +02:00
Richard Henderson	fa477d2547	tcg: Add temp_global bit to TCGTemp This avoids needing to test the index of a temp against nb_globals. Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <rth@twiddle.net>	2017-10-24 21:43:50 +02:00
Richard Henderson	434391390b	tcg: Introduce arg_temp Reviewed-by: Emilio G. Cota <cota@braap.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <rth@twiddle.net>	2017-10-24 21:43:36 +02:00
Richard Henderson	dd18629201	tcg: Propagate TCGOp down to allocators Reviewed-by: Emilio G. Cota <cota@braap.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <rth@twiddle.net>	2017-10-24 21:34:47 +02:00
Richard Henderson	efee3746fa	tcg: Propagate args to op->args in tcg.c Reviewed-by: Emilio G. Cota <cota@braap.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <rth@twiddle.net>	2017-10-24 21:34:47 +02:00
Richard Henderson	acd937019b	tcg: Propagate args to op->args in optimizer Reviewed-by: Emilio G. Cota <cota@braap.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <rth@twiddle.net>	2017-10-24 21:34:47 +02:00
Richard Henderson	75e8b9b7aa	tcg: Merge opcode arguments into TCGOp Rather than have a separate buffer of 10*max_ops entries, give each opcode 10 entries. The result is actually a bit smaller and should have slightly more cache locality. Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <rth@twiddle.net>	2017-10-24 21:34:47 +02:00
Jiang Biao	8df8d529ed	tcg/mips: delete commented out extern keyword. Delete commented out extern keyword on link_error(). Signed-off-by: Jiang Biao <jiang.biao2@zte.com.cn> Message-Id: <1506762042-32145-1-git-send-email-jiang.biao2@zte.com.cn> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-10-10 09:45:01 -07:00
Emilio G. Cota	a505785cd2	tcg: define TCG_HIGHWATER Will come in handy very soon. Reviewed-by: Richard Henderson <rth@twiddle.net> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-10-10 09:45:00 -07:00
Emilio G. Cota	619205fd1f	tcg: take .helpers out of TCGContext Groundwork for supporting multiple TCG contexts. The hash table becomes read-only after it is filled in, so we can save space by keeping just a global pointer to it. Reviewed-by: Richard Henderson <rth@twiddle.net> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-10-10 07:37:10 -07:00
Emilio G. Cota	5e75150cdf	tci: move tci_regs to tcg_qemu_tb_exec's stack Groundwork for supporting multiple TCG contexts. Compile-tested for all targets on an x86_64 host. Suggested-by: Richard Henderson <rth@twiddle.net> Acked-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-10-10 07:37:10 -07:00
Emilio G. Cota	e7e168f413	exec-all: extract tb->tc_* into a separate struct tc_tb In preparation for adding tc.size to be able to keep track of TB's using the binary search tree implementation from glib. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-10-10 07:37:10 -07:00
Emilio G. Cota	7f11636dbe	tcg: remove addr argument from lookup_tb_ptr It is unlikely that we will ever want to call this helper passing an argument other than the current PC. So just remove the argument, and use the pc we already get from cpu_get_tb_cpu_state. This change paves the way to having a common "tb_lookup" function. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-10-10 07:37:10 -07:00
Emilio G. Cota	d453ec7825	tcg/mips: constify tcg_target_callee_save_regs Reviewed-by: Richard Henderson <rth@twiddle.net> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-10-10 07:37:10 -07:00
Emilio G. Cota	e268f4c036	tcg/i386: constify tcg_target_callee_save_regs Reviewed-by: Richard Henderson <rth@twiddle.net> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-10-10 07:37:10 -07:00
Richard Henderson	89b2e37e65	tcg/mips: Fully convert tcg_target_op_def Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-09-17 06:52:19 -07:00
Richard Henderson	9be44a16c2	tcg/sparc: Fully convert tcg_target_op_def Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-09-17 06:52:19 -07:00
Richard Henderson	6cb3658a04	tcg/ppc: Fully convert tcg_target_op_def Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-09-17 06:52:19 -07:00
Richard Henderson	7536b82d28	tcg/arm: Fully convert tcg_target_op_def Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-09-17 06:52:19 -07:00
Richard Henderson	1897cc2eb8	tcg/aarch64: Fully convert tcg_target_op_def Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-09-17 06:52:19 -07:00
Richard Henderson	80a8b9a910	tcg: Fix types in tcg_regset_{set,reset}_reg There was a potential problem here with an ILP32 host with 64 host registers. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-09-17 06:52:19 -07:00
Richard Henderson	f46934df66	tcg: Remove tcg_regset_set32 It's not even clear what the interface REG and VAL32 were supposed to mean. All uses had REG = 0 and VAL32 was the bitset assigned to the destination. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-09-17 06:52:19 -07:00
Richard Henderson	07ddf036fa	tcg: Remove tcg_regset_{or,and,andnot,not} Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-09-17 06:52:19 -07:00
Richard Henderson	d21369f5fb	tcg: Remove tcg_regset_set Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-09-17 06:52:19 -07:00

... 3 4 5 6 7 ...

1998 Commits