qemu-e2k

Author	SHA1	Message	Date
Richard Henderson	bcefc90208	tcg: Add support for vector absolute value Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-05-13 22:52:08 +00:00
Richard Henderson	0a8d7a3bf5	tcg/i386: Support vector scalar shift opcodes Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-05-13 22:52:08 +00:00
Richard Henderson	a2ce146a06	tcg/i386: Support vector variable shift opcodes Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-05-13 22:52:08 +00:00
Richard Henderson	37ee55a081	tcg: Add INDEX_op_dupm_vec Allow the backend to expand dup from memory directly, instead of forcing the value into a temp first. This is especially important if integer/vector register moves do not exist. Note that officially tcg_out_dupm_vec is allowed to fail. If it did, we could fix this up relatively easily: VECE == 32/64: Load the value into a vector register, then dup. Both of these must work. VECE == 8/16: If the value happens to be at an offset such that an aligned load would place the desired value in the least significant end of the register, go ahead and load w/garbage in high bits. Load the value w/INDEX_op_ld{8,16}_i32. Attempt a move directly to vector reg, which may fail. Store the value into the backing store for OTS. Load the value into the vector reg w/TCG_TYPE_I32, which must work. Duplicate from the vector reg into itself, which must work. All of which is well and good, except that all supported hosts can support dupm for all vece, so all of the failure paths would be dead code and untestable. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-05-13 22:52:08 +00:00
Richard Henderson	1e262b49b5	tcg/i386: Implement tcg_out_dupm_vec At the same time, improve tcg_out_dupi_vec wrt broadcast from the constant pool. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-05-13 14:44:03 -07:00
Richard Henderson	d6ecb4a978	tcg: Add tcg_out_dupm_vec to the backend interface Currently stubbed out in all backends that support vectors. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-05-13 14:44:03 -07:00
Richard Henderson	bab1671f0f	tcg: Manually expand INDEX_op_dup_vec This case is similar to INDEX_op_mov_* in that we need to do different things depending on the current location of the source. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> --- v3: Added some commentary to the tcg_reg_alloc_* functions.	2019-05-13 14:44:03 -07:00
Richard Henderson	e7632cfa8b	tcg: Promote tcg_out_{dup,dupi}_vec to backend interface The i386 backend already has these functions, and the aarch64 backend could easily split out one. Nothing is done with these functions yet, but this will aid register allocation of INDEX_op_dup_vec in a later patch. Adjust the aarch64 tcg_out_dupi_vec signature to match the new interface. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-05-13 14:44:03 -07:00
Richard Henderson	78113e83e0	tcg: Return bool success from tcg_out_mov This patch merely changes the interface, aborting on all failures, of which there are currently none. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-05-13 14:44:03 -07:00
Richard Henderson	aeee05f53a	tcg: Restart TB generation after out-of-line ldst overflow This is part c of relocation overflow handling. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-04-24 13:05:28 -07:00
Richard Henderson	c6fb8c0cf7	tcg/i386: Support INDEX_op_extract2_{i32,i64} Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-04-24 13:04:33 -07:00
Richard Henderson	fce1296f13	tcg: Add INDEX_op_extract2_{i32,i64} This will let backends implement the double-word shift operation. Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-04-24 13:04:33 -07:00
Mark Cave-Ayland	3115584d39	tcg/i386: fix unsigned vector saturating arithmetic Due to a cut/paste error in the original implementation, the unsigned vector saturating arithmetic was erroneously being calculated as signed vector saturating arithmetic. Fixes: `8ffafbcec2` ("tcg/i386: Implement vector saturating arithmetic") Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Message-Id: <20190207224258.426-1-mark.cave-ayland@ilande.co.uk> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-02-11 08:52:44 -08:00
Richard Henderson	e77c89fb08	cputlb: Remove static tlb sizing Now that all tcg backends support TCG_TARGET_IMPLEMENTS_DYN_TLB, remove the define and the old code. Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-01-28 07:04:35 -08:00
Emilio G. Cota	54eaf40b8f	tcg/i386: enable dynamic TLB sizing As the following experiments show, this series is a net perf gain, particularly for memory-heavy workloads. Experiments are run on an Intel(R) Xeon(R) Gold 6142 CPU @ 2.60GHz. 1. System boot + shudown, debian aarch64: - Before (v3.1.0): Performance counter stats for './die.sh v3.1.0' (10 runs): 9019.797015 task-clock (msec) # 0.993 CPUs utilized ( +- 0.23% ) 29,910,312,379 cycles # 3.316 GHz ( +- 0.14% ) 54,699,252,014 instructions # 1.83 insn per cycle ( +- 0.08% ) 10,061,951,686 branches # 1115.541 M/sec ( +- 0.08% ) 172,966,530 branch-misses # 1.72% of all branches ( +- 0.07% ) 9.084039051 seconds time elapsed ( +- 0.23% ) - After: Performance counter stats for './die.sh tlb-dyn-v5' (10 runs): 8624.084842 task-clock (msec) # 0.993 CPUs utilized ( +- 0.23% ) 28,556,123,404 cycles # 3.311 GHz ( +- 0.13% ) 51,755,089,512 instructions # 1.81 insn per cycle ( +- 0.05% ) 9,526,513,946 branches # 1104.641 M/sec ( +- 0.05% ) 166,578,509 branch-misses # 1.75% of all branches ( +- 0.19% ) 8.680540350 seconds time elapsed ( +- 0.24% ) That is, a 4.4% perf increase. 2. System boot + shutdown, ubuntu 18.04 x86_64: - Before (v3.1.0): 56100.574751 task-clock (msec) # 1.016 CPUs utilized ( +- 4.81% ) 200,745,466,128 cycles # 3.578 GHz ( +- 5.24% ) 431,949,100,608 instructions # 2.15 insn per cycle ( +- 5.65% ) 77,502,383,330 branches # 1381.490 M/sec ( +- 6.18% ) 844,681,191 branch-misses # 1.09% of all branches ( +- 3.82% ) 55.221556378 seconds time elapsed ( +- 5.01% ) - After: 56603.419540 task-clock (msec) # 1.019 CPUs utilized ( +- 10.19% ) 202,217,930,479 cycles # 3.573 GHz ( +- 10.69% ) 439,336,291,626 instructions # 2.17 insn per cycle ( +- 14.14% ) 80,538,357,447 branches # 1422.853 M/sec ( +- 16.09% ) 776,321,622 branch-misses # 0.96% of all branches ( +- 3.77% ) 55.549661409 seconds time elapsed ( +- 10.44% ) No improvement (within noise range). Note that for this workload, increasing the time window too much can lead to perf degradation, since it flushes the TLB very frequently. 3. x86_64 SPEC06int: x86_64-softmmu speedup vs. v3.1.0 for SPEC06int (test set) Host: Intel(R) Xeon(R) Gold 6142 CPU @ 2.60GHz (Skylake) 5.5 +------------------------------------------------------------------------+ \| +-+ \| 5 \|-+.................+-+...............................tlb-dyn-v5.......+-\| \| * * \| 4.5 \|-+..................................................................+-\| \| * * \| 4 \|-+..................................................................+-\| \| * * \| 3.5 \|-+..................................................................+-\| \| * * \| 3 \|-+......+-+........................................................+-\| \| * * * \| 2.5 \|-+.................................................+-+...........+-\| \| * * * * * \| 2 \|-+..............................................................+-\| \| * * * * * * +-+ \| 1.5 \|-+....................................................+-+.+-+.+-\| \| * * +-+ * +-+ +-+ +-+ +-+ * * * * * \| 1 \|++++-++++++++++++++++-+++-+++-++++-++++-++++++++++++++\| \| * * * * * * * * * * * * * * * * * * * * * * * * * * \| 0.5 +------------------------------------------------------------------------+ 400.perlb401.bzip403.g429445.g456.hm462.libq464.h471.omn47483.xalancbgeomean png: https://imgur.com/YRF90f7 That is, a 1.51x average speedup over the baseline, with a max speedup of 5.17x. Here's a different look at the SPEC06int results, using KVM as the baseline: x86_64-softmmu slowdown vs. KVM for SPEC06int (test set) Host: Intel(R) Xeon(R) Gold 6142 CPU @ 2.60GHz (Skylake) 25 +---------------------------------------------------------------------------+ \| +-+ +-+ \| \| * * +-+ v3.1.0 \| \| * * +-+ tlb-dyn-v5 \| \| * * * * +-+ \| 20 \|-+................................................+-+...............+-\| \| * * # # * * \| \| +-+ * * * # # * * \| \| * * * * * # # * * \| 15 \|-+..............................................#.#.......+-+......+-\| \| * * * * * # # * #\|# \| \| * * * * +-+ * # # * +-+ \| \| * * +-+ * * ++-+ +-+ * # # * # # +-+ \| \| * * +-+ * * * ## \| +-+ # # * # # +-+ \| 10 \|-+..........+-+...........##.......++-+..+-+.#.#.......#.#....+-\| \| * * +-+ * * * ## +-+ # # # #* # # +-+ * # # * * \| \| * * * # # * * +-+ * ## * +-+ # # # #* # # * * * # # +-+ \| \| * * # # * * * +-+ * ## * # # # # # #* # # * * * # # * ## \| 5 \|-+.......+-+.#.#.....#.#..##..#.#.#.#..#.#.#.#.....#.#..##.+-\| \| * # #* # # * +-+* # # * ## * # # # # # #* # # * * * # # * ## \| \| * # #* # # * # #* # # * ## * # # # # # #* # # * +-+* # # * ## \| \| ++-+ * # #* # # * # #* # # * ## * # # # # # #* # # * # #* # # * ## \| \|+++#+#++#+#+#+#++#+#+#+#++##++#+#+#+#++#+#+#+#++#+#+#+#+*+##+++\| 0 +---------------------------------------------------------------------------+ 400.perlbe401.bzi403.gc429445.go456.h462.libqu464.h471.omne4483.xalancbmgeomean png: https://imgur.com/YzAMNEV After this series, we bring down the average SPEC06int slowdown vs KVM from 11.47x to 7.58x. Tested-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <20190116170114.26802-4-cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-01-28 07:03:34 -08:00
Emilio G. Cota	86e1eff8bc	tcg: introduce dynamic TLB sizing Disabled in all TCG backends for now. Tested-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <20190116170114.26802-3-cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-01-28 07:03:34 -08:00
Richard Henderson	bc37faf4cb	tcg/i386: Implement vector minmax arithmetic The avx instruction set does not directly provide MO_64. We can still implement 64-bit with comparison and vpblendvb. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-01-28 07:03:34 -08:00
Richard Henderson	8ffafbcec2	tcg/i386: Implement vector saturating arithmetic Only MO_8 and MO_16 are implemented, since that's all the instruction set provides. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-01-28 07:03:34 -08:00
Richard Henderson	44f1441dbe	tcg/i386: Split subroutines out of tcg_expand_vec_op This routine was becoming too large. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-01-28 07:03:34 -08:00
Richard Henderson	dd0a0fcdd8	tcg: Add opcodes for vector minmax arithmetic Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-01-28 07:03:34 -08:00
Richard Henderson	8afaf05066	tcg: Add opcodes for vector saturated arithmetic Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2019-01-28 07:03:34 -08:00
Paolo Bonzini	7d37435bd5	avoid TABs in files that only contain a few Most files that have TABs only contain a handful of them. Change them to spaces so that we don't confuse people. disas, standard-headers, linux-headers and libdecnumber are imported from other projects and probably should be exempted from the check. Outside those, after this patch the following files still contain both 8-space and TAB sequences at the beginning of the line. Many of them have a majority of TABs, or were initially committed with all tabs. bsd-user/i386/target_syscall.h bsd-user/x86_64/target_syscall.h crypto/aes.c hw/audio/fmopl.c hw/audio/fmopl.h hw/block/tc58128.c hw/display/cirrus_vga.c hw/display/xenfb.c hw/dma/etraxfs_dma.c hw/intc/sh_intc.c hw/misc/mst_fpga.c hw/net/pcnet.c hw/sh4/sh7750.c hw/timer/m48t59.c hw/timer/sh_timer.c include/crypto/aes.h include/disas/bfd.h include/hw/sh4/sh.h libdecnumber/decNumber.c linux-headers/asm-generic/unistd.h linux-headers/linux/kvm.h linux-user/alpha/target_syscall.h linux-user/arm/nwfpe/double_cpdo.c linux-user/arm/nwfpe/fpa11_cpdt.c linux-user/arm/nwfpe/fpa11_cprt.c linux-user/arm/nwfpe/fpa11.h linux-user/flat.h linux-user/flatload.c linux-user/i386/target_syscall.h linux-user/ppc/target_syscall.h linux-user/sparc/target_syscall.h linux-user/syscall.c linux-user/syscall_defs.h linux-user/x86_64/target_syscall.h slirp/cksum.c slirp/if.c slirp/ip.h slirp/ip_icmp.c slirp/ip_icmp.h slirp/ip_input.c slirp/ip_output.c slirp/mbuf.c slirp/misc.c slirp/sbuf.c slirp/socket.c slirp/socket.h slirp/tcp_input.c slirp/tcpip.h slirp/tcp_output.c slirp/tcp_subr.c slirp/tcp_timer.c slirp/tftp.c slirp/udp.c slirp/udp.h target/cris/cpu.h target/cris/mmu.c target/cris/op_helper.c target/sh4/helper.c target/sh4/op_helper.c target/sh4/translate.c tcg/sparc/tcg-target.inc.c tests/tcg/cris/check_addo.c tests/tcg/cris/check_moveq.c tests/tcg/cris/check_swap.c tests/tcg/multiarch/test-mmap.c ui/vnc-enc-hextile-template.h ui/vnc-enc-zywrle.h util/envlist.c util/readline.c The following have only TABs: bsd-user/i386/target_signal.h bsd-user/sparc64/target_signal.h bsd-user/sparc64/target_syscall.h bsd-user/sparc/target_signal.h bsd-user/sparc/target_syscall.h bsd-user/x86_64/target_signal.h crypto/desrfb.c hw/audio/intel-hda-defs.h hw/core/uboot_image.h hw/sh4/sh7750_regnames.c hw/sh4/sh7750_regs.h include/hw/cris/etraxfs_dma.h linux-user/alpha/termbits.h linux-user/arm/nwfpe/fpopcode.h linux-user/arm/nwfpe/fpsr.h linux-user/arm/syscall_nr.h linux-user/arm/target_signal.h linux-user/cris/target_signal.h linux-user/i386/target_signal.h linux-user/linux_loop.h linux-user/m68k/target_signal.h linux-user/microblaze/target_signal.h linux-user/mips64/target_signal.h linux-user/mips/target_signal.h linux-user/mips/target_syscall.h linux-user/mips/termbits.h linux-user/ppc/target_signal.h linux-user/sh4/target_signal.h linux-user/sh4/termbits.h linux-user/sparc64/target_syscall.h linux-user/sparc/target_signal.h linux-user/x86_64/target_signal.h linux-user/x86_64/termbits.h pc-bios/optionrom/optionrom.h slirp/mbuf.h slirp/misc.h slirp/sbuf.h slirp/tcp.h slirp/tcp_timer.h slirp/tcp_var.h target/i386/svm.h target/sparc/asi.h target/xtensa/core-dc232b/xtensa-modules.inc.c target/xtensa/core-dc233c/xtensa-modules.inc.c target/xtensa/core-de212/core-isa.h target/xtensa/core-de212/xtensa-modules.inc.c target/xtensa/core-fsf/xtensa-modules.inc.c target/xtensa/core-sample_controller/core-isa.h target/xtensa/core-sample_controller/xtensa-modules.inc.c target/xtensa/core-test_kc705_be/core-isa.h target/xtensa/core-test_kc705_be/xtensa-modules.inc.c tests/tcg/cris/check_abs.c tests/tcg/cris/check_addc.c tests/tcg/cris/check_addcm.c tests/tcg/cris/check_addoq.c tests/tcg/cris/check_bound.c tests/tcg/cris/check_ftag.c tests/tcg/cris/check_int64.c tests/tcg/cris/check_lz.c tests/tcg/cris/check_openpf5.c tests/tcg/cris/check_sigalrm.c tests/tcg/cris/crisutils.h tests/tcg/cris/sys.c tests/tcg/i386/test-i386-ssse3.c ui/vgafont.h Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20181213223737.11793-3-pbonzini@redhat.com> Reviewed-by: Aleksandar Markovic <amarkovic@wavecomp.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Wainer dos Santos Moschetta <wainersm@redhat.com> Acked-by: Richard Henderson <richard.henderson@linaro.org> Acked-by: Eric Blake <eblake@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Stefan Markovic <smarkovic@wavecomp.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-01-11 15:46:56 +01:00
Richard Henderson	e1dcf3529d	tcg: Add TCG_TARGET_HAS_MEMORY_BSWAP For now, defined universally as true, since we previously required backends to implement swapped memory operations. Future patches may now remove that support where it is onerous. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-17 06:04:44 +03:00
Richard Henderson	5785c17f31	tcg/i386: Add setup_guest_base_seg for FreeBSD Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-17 06:04:44 +03:00
Richard Henderson	913c2bddc2	tcg/i386: Precompute all guest_base parameters These values are constant between all qemu_ld/st invocations; there is no need to figure this out each time. If we cannot use a segment or an offset directly for guest_base, load the value into a register in the prologue. Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-17 06:04:44 +03:00
Richard Henderson	4810d96f03	tcg/i386: Assume 32-bit values are zero-extended We now have an invariant that all TCG_TYPE_I32 values are zero-extended, which means that we do not need to extend them again during qemu_ld/st, either explicitly via a separate tcg_out_ext32u or implicitly via P_ADDR32. Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-17 06:04:44 +03:00
Richard Henderson	75478279a0	tcg/i386: Implement INDEX_op_extr{lh}_i64_i32 for 32-bit guests This preserves the invariant that all TCG_TYPE_I32 values are zero-extended in the 64-bit host register. Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-17 06:04:44 +03:00
Richard Henderson	3dbc8c61de	tcg/i386: Propagate is64 to tcg_out_qemu_ld_slow_path This helps preserve the invariant that all TCG_TYPE_I32 values are stored zero-extended in the 64-bit host registers. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-17 06:04:44 +03:00
Richard Henderson	1d21d95b61	tcg/i386: Propagate is64 to tcg_out_qemu_ld_direct This helps preserve the invariant that all TCG_TYPE_I32 values are stored zero-extended in the 64-bit host registers. Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-17 06:04:43 +03:00
Richard Henderson	bec3afd5fc	tcg/i386: Return false on failure from patch_reloc Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-17 06:04:43 +03:00
Richard Henderson	6ac1778676	tcg: Return success from patch_reloc This will move the assert for success from within (subroutines of) patch_reloc into the callers. It will also let new code do something different when a relocation is out of range. For the moment, all backends are trivially converted to return true. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-17 06:04:43 +03:00
Richard Henderson	66c0285df4	tcg/i386: Move TCG_REG_CALL_STACK from define to enum Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-17 06:04:43 +03:00
Richard Henderson	5740d9f714	tcg/i386: Always use %ebp for TCG_AREG0 For x86_64, this can remove a REX prefix resulting in smaller code when manipulating globals of type i32, as we move them between backing store via cpu_env, aka TCG_AREG0. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-17 06:04:43 +03:00
Roman Kapl	93bf9a4273	tcg/i386: fix vector operations on 32-bit hosts The TCG backend uses LOWREGMASK to get the low 3 bits of register numbers. This was defined as no-op for 32-bit x86, with the assumption that we have eight registers anyway. This assumption is not true once we have xmm regs. Since LOWREGMASK was a no-op, xmm register indidices were wrong in opcodes and have overflown into other opcode fields, wreaking havoc. To trigger these problems, you can try running the "movi d8, #0x0" AArch64 instruction on 32-bit x86. "vpxor %xmm0, %xmm0, %xmm0" should be generated, but instead TCG generated "vpxor %xmm0, %xmm0, %xmm2". Fixes: `770c2fc7bb` ("Add vector operations") Signed-off-by: Roman Kapl <rka@sysgo.com> Message-Id: <20180824131734.18557-1-rka@sysgo.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-09-26 09:02:51 -07:00
Richard Henderson	672189cd58	tcg/i386: Mark xmm registers call-clobbered When host vector registers and operations were introduced, I failed to mark the registers call clobbered as required by the ABI. Fixes: `770c2fc7bb` Cc: qemu-stable@nongnu.org Reported-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-07-23 09:21:14 -07:00
Richard Henderson	9f75462065	tcg: Reduce max TB opcode count Also, assert that we don't overflow any of two different offsets into the TB. Both unwind and goto_tb both record a uint16_t for later use. This fixes an arm-softmmu test case utilizing NEON in which there is a TB generated that runs to 7800 opcodes, and compiles to 96k on an x86_64 host. This overflows the 16-bit offset in which we record the goto_tb reset offset. Because of that overflow, we install a jump destination that goes to neverland. Boom. With this reduced op count, the same TB compiles to about 48k for aarch64, ppc64le, and x86_64 hosts, and neither assertion fires. Cc: qemu-stable@nongnu.org Reported-by: "Jason A. Donenfeld" <Jason@zx2c4.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-06-15 09:39:53 -10:00
John Arbuckle	1019242af1	tcg/i386: Use byte form of xgetbv instruction The assembler in most versions of Mac OS X is pretty old and does not support the xgetbv instruction. To go around this problem, the raw encoding of the instruction is used instead. Signed-off-by: John Arbuckle <programmingkidx@gmail.com> Message-Id: <20180604215102.11002-1-programmingkidx@gmail.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-06-15 07:42:55 -10:00
Peter Maydell	7eb30ef0ba	tcg/i386: Fix dup_vec in non-AVX2 codepath The VPUNPCKLD* instructions are all "non-destructive source", indicated by "NDS" in the encoding string in the x86 ISA manual. This means that they take two source operands, one of which is encoded in the VEX.vvvv field. We were incorrectly treating them as if they were destructive-source and passing 0 as the 'v' argument of tcg_out_vex_modrm(). This meant we were always using %xmm0 as one of the source operands, causing incorrect results if the register allocator happened to want to use something else. For instance the input AArch64 insn: DUP v26.16b, w21 which becomes TCG IR ops: dup_vec v128,e8,tmp2,x21 st_vec v128,e8,tmp2,env,$0xa40 was assembled to: 0x607c568c: c4 c1 7a 7e 86 e8 00 00 vmovq 0xe8(%r14), %xmm0 0x607c5694: 00 0x607c5695: c5 f9 60 c8 vpunpcklbw %xmm0, %xmm0, %xmm1 0x607c5699: c5 f9 61 c9 vpunpcklwd %xmm1, %xmm0, %xmm1 0x607c569d: c5 f9 70 c9 00 vpshufd $0, %xmm1, %xmm1 0x607c56a2: c4 c1 7a 7f 8e 40 0a 00 vmovdqu %xmm1, 0xa40(%r14) 0x607c56aa: 00 when the vpunpcklwd insn should be "%xmm1, %xmm1, %xmm1". This resulted in our incorrectly setting the output vector to q26=0000320000003200:0000320000003200 when given an input of x21 == 0000000002803200 rather than the expected all-zeroes. Pass the correct source register number to tcg_out_vex_modrm() for these insns. Fixes: `770c2fc7bb` Cc: qemu-stable@nongnu.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-Id: <20180504153431.5169-1-peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-05-09 08:30:57 -07:00
Richard Henderson	7f34ed4bcd	tcg/i386: Support INDEX_op_dup2_vec for -m32 Unknown why -m32 was passing with gcc but not clang; it should have failed for both. This would be used for tcg_gen_dup_i64_vec, and visible with the right TB and an aarch64 guest. Reported-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-03-16 00:55:04 +08:00
Richard Henderson	770c2fc7bb	tcg/i386: Add vector operations The x86 vector instruction set is extremely irregular. With newer editions, Intel has filled in some of the blanks. However, we don't get many 64-bit operations until SSE4.2, introduced in 2009. The subsequent edition was for AVX1, introduced in 2011, which added three-operand addressing, and adjusts how all instructions should be encoded. Given the relatively narrow 2 year window between possible to support and desirable to support, and to vastly simplify code maintainence, I am only planning to support AVX1 and later cpus. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-02-08 15:54:08 +00:00
Emilio G. Cota	e268f4c036	tcg/i386: constify tcg_target_callee_save_regs Reviewed-by: Richard Henderson <rth@twiddle.net> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-10-10 07:37:10 -07:00
Richard Henderson	f46934df66	tcg: Remove tcg_regset_set32 It's not even clear what the interface REG and VAL32 were supposed to mean. All uses had REG = 0 and VAL32 was the bitset assigned to the destination. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-09-17 06:52:19 -07:00
Richard Henderson	ccb1bb66ea	tcg: Remove tcg_regset_clear Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-09-17 06:52:19 -07:00
Richard Henderson	4e45f23943	tcg/i386: Store out-of-range call targets in constant pool Already it saves 2 bytes per call, but also the constant pool entry may well be shared across multiple calls. Signed-off-by: Richard Henderson <rth@twiddle.net>	2017-09-07 11:57:35 -07:00
Richard Henderson	659ef5cbb8	tcg: Rearrange ldst label tracking Dispense with TCGBackendData, as it has never been used for more than holding a single pointer. Use a define in the cpu/tcg-target.h to signal requirement for TCGLabelQemuLdst, so that we can drop the no-op tcg-be-null.h stubs. Rename tcg-be-ldst.h to tcg-ldst.inc.c. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Richard Henderson <rth@twiddle.net>	2017-09-07 11:57:35 -07:00
Richard Henderson	a858339336	tcg: Move USE_DIRECT_JUMP discriminator to tcg/cpu/tcg-target.h Replace the USE_DIRECT_JUMP ifdef with a TCG_TARGET_HAS_direct_jump boolean test. Replace the tb_set_jmp_target1 ifdef with an unconditional function tb_target_set_jmp_target. While we're touching all backends, add a parameter for tb->tc_ptr; we're going to need it shortly for some backends. Move tb_set_jmp_target and tb_add_jump from exec-all.h to cpu-exec.c. This opens the possibility for TCG_TARGET_HAS_direct_jump to be a runtime decision -- based on host cpu capabilities, the size of code_gen_buffer, or a future debugging switch. Signed-off-by: Richard Henderson <rth@twiddle.net>	2017-09-07 11:57:34 -07:00
Richard Henderson	5dd8990841	util: Introduce include/qemu/cpuid.h Clang 3.9 passes the CONFIG_AVX2_OPT configure test. However, the supplied <cpuid.h> does not contain the bit_AVX2 define that we use when detecting whether the routine can be enabled. Introduce a qemu-specific header that uses the compiler's definition of __cpuid et al, but supplies any missing bit_* definitions needed. This avoids introducing any extra ifdefs to util/bufferiszero.c, and allows quite a few to be removed from tcg/i386/tcg-target.inc.c. Signed-off-by: Richard Henderson <rth@twiddle.net> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 20170719044018.18063-1-rth@twiddle.net Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2017-07-24 12:42:55 +01:00
Emilio G. Cota	5cb4ef80f6	tcg/i386: implement goto_ptr Suggested-by: Richard Henderson <rth@twiddle.net> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <1493263764-18657-6-git-send-email-cota@braap.org> [rth: Reuse goto_ptr epilogue for exit_tb 0.] Signed-off-by: Richard Henderson <rth@twiddle.net>	2017-06-05 09:25:42 -07:00
Emilio G. Cota	cedbcb0152	tcg: Introduce goto_ptr opcode and tcg_gen_lookup_and_goto_ptr Instead of exporting goto_ptr directly to TCG frontends, export tcg_gen_lookup_and_goto_ptr(), which calls goto_ptr with the pointer returned by the lookup_tb_ptr() helper. This is the only use case we have for goto_ptr and lookup_tb_ptr, so having this function is very convenient. Furthermore, it trivially allows us to avoid calling the lookup helper if goto_ptr is not implemented by the backend. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <1493263764-18657-2-git-send-email-cota@braap.org> Message-Id: <1493263764-18657-3-git-send-email-cota@braap.org> Message-Id: <1493263764-18657-4-git-send-email-cota@braap.org> Message-Id: <1493263764-18657-5-git-send-email-cota@braap.org> [rth: Squashed 4 related commits.] Signed-off-by: Richard Henderson <rth@twiddle.net>	2017-06-05 09:25:42 -07:00
Alex Bennée	ca759f9e38	tcg: enable MTTCG by default for ARM on x86 hosts This enables the multi-threaded system emulation by default for ARMv7 and ARMv8 guests using the x86_64 TCG backend. This is because on the guest side: - The ARM translate.c/translate-64.c have been converted to - use MTTCG safe atomic primitives - emit the appropriate barrier ops - The ARM machine has been updated to - hold the BQL when modifying shared cross-vCPU state - defer powerctl changes to async safe work All the host backends support the barrier and atomic primitives but need to provide same-or-better support for normal load/store operations. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net> Acked-by: Peter Maydell <peter.maydell@linaro.org> Tested-by: Pranith Kumar <bobby.prani@gmail.com> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>	2017-02-24 10:32:46 +00:00
Richard Henderson	39f099ec9d	tcg/i386: Always use TZCNT when available I think this is cleaner than sometimes using BSF. Signed-off-by: Richard Henderson <rth@twiddle.net>	2017-01-17 12:02:08 -08:00
Richard Henderson	9bf38308f6	Revert "tcg/i386: Rely on undefined/undocumented behaviour of BSF/BSR" This reverts commit `4ac7691073`. This fixes http://lists.nongnu.org/archive/html/qemu-devel/2017-01/msg03062.html While I think we could get away with relying on the undocumented behaviour, the tcg constraint system isn't powerful enough to properly describe the required (non-)overlap conditions. Reported-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Richard Henderson <rth@twiddle.net>	2017-01-17 11:59:13 -08:00
Richard Henderson	993508e43e	tcg/i386: Handle ctpop opcode Signed-off-by: Richard Henderson <rth@twiddle.net>	2017-01-10 08:49:59 -08:00
Richard Henderson	a768e4e992	tcg: Add opcode for ctpop The number of actual invocations of ctpop itself does not warrent an opcode, but it is very helpful for POWER7 to use in generating an expansion for ctz. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <rth@twiddle.net>	2017-01-10 08:48:56 -08:00
Richard Henderson	4ac7691073	tcg/i386: Rely on undefined/undocumented behaviour of BSF/BSR The ISA manual documents the output is undefined if the input was zero. However, we document in target-i386 that the behavior of real silicon is to preserve the contents of the output register. We also mention that there are real applications that depend on this. That this is baked into silicon is mentioned as a potential cause for some false sharing behaviour wrt lzcnt/tzcnt. Taking advantage of this allows us to save 2 insns in the normal case, and 4 insns for i686 emulating a 64-bit clz. Signed-off-by: Richard Henderson <rth@twiddle.net>	2017-01-10 08:47:48 -08:00
Richard Henderson	bbf25f90ba	tcg/i386: Handle ctz and clz opcodes Signed-off-by: Richard Henderson <rth@twiddle.net>	2017-01-10 08:47:48 -08:00
Richard Henderson	6a5aed4bdc	tcg/i386: Allow bmi2 shiftx to have non-matching operands Previously we could not have different constraints for different ISA levels, which prevented us from eliding the matching constraint for shifts. We do now have to make sure that the operands match for constant shifts. We can also handle some small left shifts via lea. Signed-off-by: Richard Henderson <rth@twiddle.net>	2017-01-10 08:47:48 -08:00
Richard Henderson	42d5b51492	tcg/i386: Hoist common arguments in tcg_out_op Signed-off-by: Richard Henderson <rth@twiddle.net>	2017-01-10 08:47:48 -08:00
Richard Henderson	cd26449a50	tcg/i386: Fuly convert tcg_target_op_def Use a switch instead of searching a table. Share constraints between 32-bit and 64-bit, when at all possible. Signed-off-by: Richard Henderson <rth@twiddle.net>	2017-01-10 08:47:48 -08:00
Richard Henderson	0e28d0063b	tcg: Add clz and ctz opcodes Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <rth@twiddle.net>	2017-01-10 08:06:11 -08:00
Richard Henderson	069ea736b5	tcg: Pass the opcode width to target_parse_constraint This will let us choose how to interpret a given constraint depending on whether the opcode is 32- or 64-bit. Which will let us share more constraint combinations between opcodes. At the same time, change the interface to return the advanced pointer instead of passing it in/out by reference. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <rth@twiddle.net>	2017-01-10 08:06:11 -08:00
Richard Henderson	f69d277ece	tcg: Transition flat op_defs array to a target callback This will allow the target to tailor the constraints to the auto-detected ISA extensions. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <rth@twiddle.net>	2017-01-10 08:06:11 -08:00
Richard Henderson	78fdbfb946	tcg/i386: Implement field extraction opcodes Signed-off-by: Richard Henderson <rth@twiddle.net>	2017-01-10 07:59:11 -08:00
Richard Henderson	7ec8bab3de	tcg: Add field extraction primitives Adds tcg_gen_extract_* and tcg_gen_sextract_* for extraction of fixed position bitfields, much like we already have for deposit. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <rth@twiddle.net>	2017-01-10 07:59:11 -08:00
Richard Henderson	ebb90a005d	tcg/i386: Extend TARGET_PAGE_MASK to the proper type TARGET_PAGE_MASK, as defined, has type "int". We need to extend that to the proper target width before oring in an "unsigned". Signed-off-by: Richard Henderson <rth@twiddle.net>	2016-09-20 11:45:30 -07:00
Pranith Kumar	a7d00d4eff	tcg/i386: Add support for fence Generate a 'lock orl $0,0(%esp)' instruction for ordering instead of mfence which has similar ordering semantics. Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> Message-Id: <20160714202026.9727-3-bobby.prani@gmail.com> Signed-off-by: Richard Henderson <rth@twiddle.net>	2016-09-16 08:12:11 -07:00
Richard Henderson	85aa80813d	tcg: Support arbitrary size + alignment Previously we allowed fully unaligned operations, but not operations that are aligned but with less alignment than the operation size. In addition, arm32, ia64, mips, and sparc had been omitted from the previous overalignment patch, which would have led to that alignment being enforced. Signed-off-by: Richard Henderson <rth@twiddle.net>	2016-09-16 08:12:06 -07:00
Markus Armbruster	14e54f8ecf	tcg: Clean up tcg-target.h header guards These use guard symbols like TCG_TARGET_$target. scripts/clean-header-guards.pl doesn't like them because they don't match their file name (they should, to make guard collisions less likely). Clean them up: use guard symbol $target_TCG_TARGET_H for tcg/$target/tcg-target.h. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Richard Henderson <rth@twiddle.net>	2016-07-12 16:19:16 +02:00
Sergey Sorokin	1f00b27f17	tcg: Improve the alignment check infrastructure Some architectures (e.g. ARMv8) need the address which is aligned to a size more than the size of the memory access. To support such check it's enough the current costless alignment check implementation in QEMU, but we need to support an alignment size specifying. Signed-off-by: Sergey Sorokin <afarallax@yandex.ru> Message-Id: <1466705806-679898-1-git-send-email-afarallax@yandex.ru> Signed-off-by: Richard Henderson <rth@twiddle.net> [rth: Assert in tcg_canonicalize_memop. Leave get_alignment_bits available for, though unused by, user-mode. Retain logging difference based on ALIGNED_ONLY.]	2016-07-05 20:50:13 -07:00
Richard Henderson	59d7c14eef	tcg: Optimize spills of constants While we can store constants via constrants on INDEX_op_st_i32 et al, we weren't able to spill constants to backing store. Add a new backend interface, tcg_out_sti, which may store the constant (and is allowed to fail). Rearrange the temp_* helpers so that we only attempt to directly store a constant when the temp is becoming dead/free. Signed-off-by: Richard Henderson <rth@twiddle.net>	2016-07-05 20:50:13 -07:00
Sergey Fedorov	f309101c26	tcg: Clean up direct block chaining data fields Briefly describe in a comment how direct block chaining is done. It should help in understanding of the following data fields. Rename some fields in TranslationBlock and TCGContext structures to better reflect their purpose (dropping excessive 'tb_' prefix in TranslationBlock but keeping it in TCGContext): tb_next_offset => jmp_reset_offset tb_jmp_offset => jmp_insn_offset tb_next => jmp_target_addr jmp_next => jmp_list_next jmp_first => jmp_list_first Avoid using a magic constant as an invalid offset which is used to indicate that there's no n-th jump generated. Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com> Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Richard Henderson <rth@twiddle.net>	2016-05-12 14:06:41 -10:00
Sergey Fedorov	0d07abf05e	tcg/i386: Make direct jump patching thread-safe Ensure direct jump patching in i386 is atomic by: * naturally aligning a location of direct jump address; * using atomic_read()/atomic_set() for code patching. tcg_out_nopn() implementation: Suggested-by: Richard Henderson <rth@twiddle.net>. Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com> Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org> Message-Id: <1461341333-19646-6-git-send-email-sergey.fedorov@linaro.org> Signed-off-by: Richard Henderson <rth@twiddle.net>	2016-05-12 14:06:41 -10:00
Aurelien Jarno	8d8fdbae01	tcg: check for CONFIG_DEBUG_TCG instead of NDEBUG Check for CONFIG_DEBUG_TCG instead of NDEBUG, drop now useless code. Cc: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Message-id: 1461228530-14852-2-git-send-email-aurelien@aurel32.net Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2016-04-21 15:43:20 +01:00
Aurelien Jarno	eabb7b91b3	tcg: use tcg_debug_assert instead of assert (fix performance regression) The TCG code is quite performance sensitive, but at the same time can also be quite tricky. That is why asserts that can be enabled with the --enable-debug-tcg configure option. This used to work the following way: \| #include "config.h" \| \| ... \| \| #if !defined(CONFIG_DEBUG_TCG) && !defined(NDEBUG) \| /* define it to suppress various consistency checks (faster) */ \| #define NDEBUG \| #endif \| \| ... \| \| #include <assert.h> Since commit `757e725b` (tcg: Clean up includes) "config.h" as been replaced by "qemu/osdep.h" which itself includes <assert.h>. As a consequence the assertions are always enabled, even when using --disable-debug-tcg, causing a performance regression, especially on targets with many registers. For instance on qemu-system-ppc the speed difference is about 15%. tcg_debug_assert is controlled directly by CONFIG_DEBUG_TCG and already uses in some places. This patch replaces all the calls to assert into calss to tcg_debug_assert. Cc: Peter Maydell <peter.maydell@linaro.org> Cc: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Message-id: 1461228530-14852-1-git-send-email-aurelien@aurel32.net Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2016-04-21 15:41:47 +01:00
Peter Maydell	c3b7f66800	tcg: Remove unnecessary osdep.h includes from tcg-target.inc.c Commit `757e725b58` added a number of #include "qemu/osdep.h" files to the tcg-target.c files (as they were named at the time). These are unnecessary because these files are not standalone C files, and the tcg/tcg.c file which includes them will have already included osdep.h on their behalf. Remove the unneeded include directives. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-Id: <1456238983-10160-4-git-send-email-peter.maydell@linaro.org> Signed-off-by: Richard Henderson <rth@twiddle.net>	2016-02-23 08:31:03 -08:00
Peter Maydell	ce15110981	tcg: Rename tcg-target.c to tcg-target.inc.c Rename the per-architecture tcg-target.c files to tcg-target.inc.c. This makes it clearer that they are not intended to be standalone C files, but are instead #included into another source file. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-Id: <1456238983-10160-2-git-send-email-peter.maydell@linaro.org> Signed-off-by: Richard Henderson <rth@twiddle.net>	2016-02-23 08:30:38 -08:00
Peter Maydell	757e725b58	tcg: Clean up includes Clean up includes so that osdep.h is included first and headers which it implies are not included manually. This commit was created with scripts/clean-includes. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 1453832250-766-16-git-send-email-peter.maydell@linaro.org	2016-01-29 15:07:23 +00:00
Aurelien Jarno	08b0b23be6	tcg/i386: omit a few REXW prefixes in softmmu code When computing the TLB address we are likely to mask out the high 32-bits by using shr + and. We can use 32-bit instructions in that case. This saves 2 bytes per TLB access. Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Message-Id: <1437306632-20655-1-git-send-email-aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>	2015-09-02 14:24:10 -07:00
Laurent Vivier	b76f21a707	linux-user: remove useless macros GUEST_BASE and RESERVED_VA As we have removed CONFIG_USE_GUEST_BASE, we always use a guest base and the macros GUEST_BASE and RESERVED_VA become useless: replace them by their values. Reviewed-by: Alexander Graf <agraf@suse.de> Signed-off-by: Laurent Vivier <laurent@vivier.eu> Message-Id: <1440420834-8388-1-git-send-email-laurent@vivier.eu> Signed-off-by: Richard Henderson <rth@twiddle.net>	2015-08-24 11:14:30 -07:00
Aurelien Jarno	8cc580f6a0	tcg/i386: use softmmu fast path for unaligned accesses Softmmu unaligned load/stores currently goes through through the slow path for two reasons: - to support unaligned access on host with strict alignement - to correctly handle accesses crossing pages x86 is only concerned by the second reason. Unaligned accesses are avoided by compilers, but are not uncommon. We therefore would like to see them going through the fast path, if they don't cross pages. For that we can use the fact that two adjacent TLB entries can't contain the same page. Therefore accessing the TLB entry corresponding to the first byte, but comparing its content to page address of the last byte ensures that we don't cross pages. We can do this check without adding more instructions in the TLB code (but increasing its length by one byte) by using the LEA instruction to combine the existing move with the size addition. On an x86-64 host, this gives a 3% boot time improvement for a powerpc guest and 4% for an x86-64 guest. [rth: Tidied calculation of the offset mask] Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Message-Id: <1436467197-2183-1-git-send-email-aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>	2015-08-24 11:10:54 -07:00
Richard Henderson	609ad70562	tcg: Split trunc_shr_i32 opcode into extr[lh]_i64_i32 Rather than allow arbitrary shift+trunc, only concern ourselves with low and high parts. This is all that was being used anyway. Signed-off-by: Richard Henderson <rth@twiddle.net>	2015-08-24 11:10:54 -07:00
Aurelien Jarno	4f2331e5b6	tcg: implement real ext_i32_i64 and extu_i32_i64 ops Implement real ext_i32_i64 and extu_i32_i64 ops. They ensure that a 32-bit value is always converted to a 64-bit value and not propagated through the register allocator or the optimizer. Cc: Andrzej Zaborowski <balrogg@gmail.com> Cc: Alexander Graf <agraf@suse.de> Cc: Blue Swirl <blauwirbel@gmail.com> Cc: Stefan Weil <sw@weilnetz.de> Acked-by: Claudio Fontana <claudio.fontana@huawei.com> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>	2015-08-24 11:10:54 -07:00
Aurelien Jarno	0632e555fc	tcg: rename trunc_shr_i32 into trunc_shr_i64_i32 The op is sometimes named trunc_shr_i32 and sometimes trunc_shr_i64_i32, and the name in the README doesn't match the name offered to the frontends. Always use the long name to make it clear it is a size changing op. Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net>	2015-08-24 11:10:54 -07:00
Richard Henderson	ee8ba9e4d8	tcg/i386: Extend addresses for 32-bit guests Removing the ??? comment explaining why it (mostly) worked. Reviewed-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net> Message-Id: <1437081950-7206-2-git-send-email-rth@twiddle.net>	2015-07-23 15:09:04 -07:00
Richard Henderson	2b7ec66f02	tcg: Mask TCGMemOp appropriately for indexing The addition of MO_AMASK means that places that used inverted masks need to be changed to use positive masks, and places that failed to mask the intended bits need updating. Reviewed-by: Yongbok Kim <yongbok.kim@imgtec.com> Tested-by: Yongbok Kim <yongbok.kim@imgtec.com> Signed-off-by: Richard Henderson <rth@twiddle.net>	2015-06-09 06:35:29 -07:00
Paolo Bonzini	006f8638c6	tcg: add TCG_TARGET_TLB_DISPLACEMENT_BITS This will be used to size the TLB when more than 8 MMU modes are used by the target. Limitations come from the limited size of the immediate fields (which sometimes, as in the case of Aarch64, extend to instructions that shift the immediate). Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <1424436345-37924-2-git-send-email-pbonzini@redhat.com> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Alexander Graf <agraf@suse.de>	2015-06-03 23:56:56 +02:00
Richard Henderson	3972ef6f83	tcg: Push merged memop+mmu_idx parameter to softmmu routines The extra information is not yet used but it is now available. This requires minor changes through all of the tcg backends. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <rth@twiddle.net>	2015-05-14 12:15:14 -07:00
Richard Henderson	59227d5d45	tcg: Merge memop and mmu_idx parameters to qemu_ld/st At the tcg opcode level, not at the tcg-op.h generator level. This requires minor changes through all of the tcg backends, but none of the cpu translators. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <rth@twiddle.net>	2015-05-14 12:14:55 -07:00
Richard Henderson	bec1631100	tcg: Change generator-side labels to a pointer This is less about improved type checking than enabling a subsequent change to the representation of labels. Acked-by: Claudio Fontana <claudio.fontana@huawei.com> Tested-by: Claudio Fontana <claudio.fontana@huawei.com> Cc: Andrzej Zaborowski <balrogg@gmail.com> Cc: Peter Maydell <peter.maydell@linaro.org> Cc: Aurelien Jarno <aurelien@aurel32.net> Cc: Blue Swirl <blauwirbel@gmail.com> Cc: Stefan Weil <sw@weilnetz.de> Reviewed-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de> Signed-off-by: Richard Henderson <rth@twiddle.net>	2015-03-13 12:28:18 -07:00
Richard Henderson	42a268c241	tcg: Change translator-side labels to a pointer This is improved type checking for the translators -- it's no longer possible to accidentally swap arguments to the branch functions. Note that the code generating backends still manipulate labels as int. With notable exceptions, the scope of the change is just a few lines for each target, so it's not worth building extra machinery to do this change in per-target increments. Cc: Peter Maydell <peter.maydell@linaro.org> Cc: Edgar E. Iglesias <edgar.iglesias@gmail.com> Cc: Michael Walle <michael@walle.cc> Cc: Leon Alrae <leon.alrae@imgtec.com> Cc: Anthony Green <green@moxielogic.com> Cc: Jia Liu <proljc@gmail.com> Cc: Alexander Graf <agraf@suse.de> Cc: Aurelien Jarno <aurelien@aurel32.net> Cc: Blue Swirl <blauwirbel@gmail.com> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Reviewed-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de> Signed-off-by: Richard Henderson <rth@twiddle.net>	2015-03-13 12:28:18 -07:00
Richard Henderson	3d1b2ff62c	tcg: Remove TCG_TARGET_HAS_new_ldst Since all backends have been converted, remove the compatibility code. Acked-by: Claudio Fontana <claudio.fontana@huawei.com> Signed-off-by: Richard Henderson <rth@twiddle.net>	2014-06-04 14:10:26 -07:00
Richard Henderson	0b91966730	tcg-i386: Fix win64 qemu store The first non-register argument isn't placed at offset 0. Cc: qemu-stable@nongnu.org Reviewed-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Richard Henderson <rth@twiddle.net>	2014-06-04 13:58:39 -07:00
Richard Henderson	e9a9a5b605	tcg-i386: Make debug_frame const Signed-off-by: Richard Henderson <rth@twiddle.net>	2014-05-28 09:33:55 -07:00
Richard Henderson	96d0ee7f09	tcg: Remove unreachable code in tcg_out_op and op_defs The INDEX_op_call case has just been obsoleted; the mov and movi cases have not been reachable for years. Attempt to document this both in each tcg_out_op switch, and via TCG_OPF_NOT_PRESENT. Because of the TCG_OPF_NOT_PRESENT change, this must be done for all targets in a single commit. Signed-off-by: Richard Henderson <rth@twiddle.net>	2014-05-12 11:13:13 -07:00
Richard Henderson	6bf3e99747	tcg-i386: Rename tcg_out_calli to tcg_out_call Signed-off-by: Richard Henderson <rth@twiddle.net>	2014-05-12 11:13:11 -07:00
Richard Henderson	f6bff89d06	tcg-i386: Define TCG_TARGET_INSN_UNIT_SIZE And use tcg pointer differencing functions as appropriate. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <rth@twiddle.net>	2014-05-12 10:03:04 -07:00
Peter Maydell	5c53bb8121	tcg: Avoid undefined behaviour patching code at unaligned addresses To avoid C undefined behaviour when patching generated code, provide wrappers tcg_patch8/16/32/64 which use the usual memcpy trick, and use them in the i386 backend. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <rth@twiddle.net>	2014-05-12 10:03:04 -07:00
Richard Henderson	4bb7a41ed6	tcg: Add INDEX_op_trunc_shr_i32 Let the backend do something special for truncation. Signed-off-by: Richard Henderson <rth@twiddle.net>	2014-04-28 11:06:34 -07:00
Richard Henderson	02eb19d0ec	tcg: Use HOST_WORDS_BIGENDIAN Instead of rolling a local TCG_TARGET_WORDS_BIGENDIAN. Signed-off-by: Richard Henderson <rth@twiddle.net>	2014-04-18 16:57:37 -07:00
Richard Henderson	f6c6afc1d4	tcg: Add TCGType parameter to tcg_target_const_match Most 64-bit targets need to be able to ignore the high bits of a TCG_TYPE_I32 value. Suggested-by: Stuart Brady <sdb@zubnet.me.uk> Signed-off-by: Richard Henderson <rth@twiddle.net>	2014-04-18 16:57:36 -07:00

1 2 3 4 5 ...

298 Commits