git-svn-id: svn://svn.savannah.nongnu.org/qemu/trunk@4581 c046a42c-6fe2-441c-8c8c-71466251a162
bellard 2008-05-25 18:24:40 +00:00
parent b314f2706b
commit 0a6b7b7813
2 changed files with 64 additions and 86 deletions

@@ -16,14 +16,18 @@ from the host, although it is never the case for QEMU.
 A TCG "function" corresponds to a QEMU Translated Block (TB).
-A TCG "temporary" is a variable only live in a given
-function. Temporaries are allocated explicitly in each function.
+A TCG "temporary" is a variable only live in a basic
+block. Temporaries are allocated explicitly in each function.
-A TCG "global" is a variable which is live in all the functions. They
-are defined before the functions defined. A TCG global can be a memory
-location (e.g. a QEMU CPU register), a fixed host register (e.g. the
-QEMU CPU state pointer) or a memory location which is stored in a
-register outside QEMU TBs (not implemented yet).
+A TCG "local temporary" is a variable only live in a function. Local
+temporaries are allocated explicitly in each function.
+A TCG "global" is a variable which is live in all the functions
+(equivalent of a C global variable). They are defined before the
+functions are defined. A TCG global can be a memory location (e.g. a QEMU
+CPU register), a fixed host register (e.g. the QEMU CPU state pointer)
+or a memory location which is stored in a register outside QEMU TBs
+(not implemented yet).
 A TCG "basic block" corresponds to a list of instructions terminated
 by a branch instruction.
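The three kinds of variables defined in this hunk differ in how long a value stored in them stays valid. The standalone C sketch below encodes that rule under two assumptions. First, a function ends at the end of its TB. Second, the end of a TB also ends its last basic block. The enum and function names are hypothetical and are not QEMU identifiers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the TCG variable kinds; not QEMU identifiers. */
typedef enum {
    VAR_TEMP,       /* live only in one basic block */
    VAR_LOCAL_TEMP, /* live in one function, i.e. one TB */
    VAR_GLOBAL      /* live in all functions */
} var_kind;

typedef enum {
    END_OF_BASIC_BLOCK, /* e.g. after a branch instruction */
    END_OF_FUNCTION     /* end of the TB (also ends its last basic block) */
} boundary;

/* Returns true if a value stored in a variable of kind 'kind' is still
   valid after crossing boundary 'b'. */
static bool value_survives(var_kind kind, boundary b)
{
    switch (kind) {
    case VAR_TEMP:
        return false;                   /* lost at any boundary */
    case VAR_LOCAL_TEMP:
        return b == END_OF_BASIC_BLOCK; /* lost when the function ends */
    case VAR_GLOBAL:
        return true;                    /* kept across functions */
    }
    assert(0 && "unknown variable kind");
    return false;
}
```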
@@ -32,11 +36,11 @@ by a branch instruction.
 3.1) Introduction
-TCG instructions operate on variables which are temporaries or
-globals. TCG instructions and variables are strongly typed. Two types
-are supported: 32 bit integers and 64 bit integers. Pointers are
-defined as an alias to 32 bit or 64 bit integers depending on the TCG
-target word size.
+TCG instructions operate on variables which are temporaries, local
+temporaries or globals. TCG instructions and variables are strongly
+typed. Two types are supported: 32 bit integers and 64 bit
+integers. Pointers are defined as an alias to 32 bit or 64 bit
+integers depending on the TCG target word size.
 Each instruction has a fixed number of output variable operands, input
 variable operands and always constant operands.
@@ -44,14 +48,12 @@ variable operands and always constant operands.
 The notable exception is the call instruction which has a variable
 number of outputs and inputs.
-In the textual form, output operands come first, followed by input
-operands, followed by constant operands. The output type is included
-in the instruction name. Constants are prefixed with a '$'.
+In the textual form, output operands usually come first, followed by
+input operands, followed by constant operands. The output type is
+included in the instruction name. Constants are prefixed with a '$'.
 add_i32 t0, t1, t2 (t0 <- t1 + t2)
 sub_i64 t2, t3, $4 (t2 <- t3 - 4)
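The operand layout described above can be illustrated with a small formatter. Each instruction has a fixed number of outputs, inputs and constants, printed in the usual order with constants prefixed by '$'. This is a sketch, not QEMU code. The `op_def` structure, the `format_op` helper and the operand counts are hypothetical, chosen to reproduce textual forms such as `add_i32 t0, t1, t2` and `br $1`. The call instruction is left out because its operand count is variable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical descriptor with fixed operand counts per instruction. */
typedef struct {
    const char *name; /* the output type is part of the name, e.g. "_i32" */
    int nb_oargs;     /* output variables */
    int nb_iargs;     /* input variables */
    int nb_cargs;     /* always-constant operands */
} op_def;

static const op_def ADD_I32 = { "add_i32", 1, 2, 0 };
static const op_def NOT_I32 = { "not_i32", 1, 1, 0 };
static const op_def BR      = { "br",      0, 0, 1 };

/* Writes the textual form into buf: outputs, then inputs (both taken
   from 'vars' in that order), then constants prefixed with '$'.
   Returns false if buf is too small to hold the whole string. */
static bool format_op(char *buf, size_t size, const op_def *def,
                      const char *const *vars, const long *consts)
{
    assert(buf != NULL && def != NULL);
    int nvars = def->nb_oargs + def->nb_iargs;
    int nargs = nvars + def->nb_cargs;
    int n = snprintf(buf, size, "%s", def->name);
    if (n < 0 || (size_t)n >= size)
        return false;
    size_t len = (size_t)n;
    for (int i = 0; i < nargs; i++) {
        const char *sep = (i == 0) ? " " : ", ";
        if (i < nvars)
            n = snprintf(buf + len, size - len, "%s%s", sep, vars[i]);
        else
            n = snprintf(buf + len, size - len, "%s$%ld", sep,
                         consts[i - nvars]);
        if (n < 0 || (size_t)n >= size - len)
            return false; /* output would be truncated */
        len += (size_t)n;
    }
    return true;
}
```

Keeping the counts in a descriptor means a printer or parser never has to special-case individual opcodes. Only `call` would need extra information.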
 3.2) Assumptions
 * Basic blocks
@@ -62,9 +64,8 @@ sub_i64 t2, t3, $4 (t2 <- t3 - 4)
 - Basic blocks start after the end of a previous basic block, at a
 set_label instruction or after a legacy dyngen operation.
-After the end of a basic block, temporaries at destroyed and globals
-are stored at their initial storage (register or memory place
-depending on their declarations).
+After the end of a basic block, the content of temporaries is
+destroyed, but local temporaries and globals are preserved.
 * Floating point types are not supported yet
@@ -100,7 +101,7 @@ optimizations:
 is suppressed.
 - A liveness analysis is done at the basic block level. The
-information is used to suppress moves from a dead temporary to
+information is used to suppress moves from a dead variable to
 another one. It is also used to remove instructions which compute
 dead results. The latter is especially useful for condition code
 optimization in QEMU.
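The removal of instructions computing dead results can be sketched as a single backward liveness pass over one basic block. The sketch makes three assumptions:

- Every instruction has exactly one output and no side effects. Real stores, calls and branches must never be removed.
- The caller states which variables are live at the end of the block, for example globals and local temporaries.
- Move suppression is out of scope; the sketch does not model it.

The `insn` type and `remove_dead_code` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_VARS 16

/* Hypothetical side-effect-free instruction: out = f(in[0], in[1]).
   An input of -1 means "unused". */
typedef struct {
    int out;
    int in[2];
    bool removed;
} insn;

/* Marks instructions whose result is never read as removed and returns
   how many were removed. live_at_end[v] is true if variable v is still
   needed after the block. */
static int remove_dead_code(insn *insns, int n,
                            const bool live_at_end[MAX_VARS])
{
    bool live[MAX_VARS];
    int removed = 0;

    for (int v = 0; v < MAX_VARS; v++)
        live[v] = live_at_end[v];

    /* Walk the block backwards, tracking which variables are live. */
    for (int i = n - 1; i >= 0; i--) {
        insn *op = &insns[i];
        assert(op->out >= 0 && op->out < MAX_VARS);
        if (!live[op->out]) {
            op->removed = true;          /* dead result */
            removed++;
            continue;
        }
        op->removed = false;
        live[op->out] = false;           /* this definition kills it */
        for (int k = 0; k < 2; k++) {
            if (op->in[k] >= 0) {
                assert(op->in[k] < MAX_VARS);
                live[op->in[k]] = true;  /* read here, so live before */
            }
        }
    }
    return removed;
}
```

Because the pass walks backwards, removing one instruction can leave its inputs dead in the same pass. A chain of unused computations therefore disappears at once.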
@@ -113,47 +114,6 @@ optimizations:
 only the last instruction is kept.
-- A macro system is supported (may get closer to function inlining
-some day). It is useful if the liveness analysis is likely to prove
-that some results of a computation are indeed not useful. With the
-macro system, the user can provide several alternative
-implementations which are used depending on the used results. It is
-especially useful for condition code optimization in QEMU.
-Here is an example:
-macro_2 t0, t1, $1
-mov_i32 t0, $0x1234
-The macro identified by the ID "$1" normally returns the values t0
-and t1. Suppose its implementation is:
-macro_start
-brcond_i32 t2, $0, $TCG_COND_EQ, $1
-mov_i32 t0, $2
-br $2
-set_label $1
-mov_i32 t0, $3
-set_label $2
-add_i32 t1, t3, t4
-macro_end
-If t0 is not used after the macro, the user can provide a simpler
-implementation:
-macro_start
-add_i32 t1, t2, t4
-macro_end
-TCG automatically chooses the right implementation depending on
-which macro outputs are used after it.
-Note that if TCG did more expensive optimizations, macros would be
-less useful. In the previous example a macro is useful because the
-liveness analysis is done on each basic block separately. Hence TCG
-cannot remove the code computing 't0' even if it is not used after
-the first macro implementation.
 3.4) Instruction Reference
 ********* Function call
@@ -241,6 +201,10 @@ t0=t1|t2
 t0=t1^t2
 * not_i32/i64 t0, t1
 t0=~t1
 ********* Shifts
 * shl_i32/i64 t0, t1, t2
@@ -428,3 +392,34 @@ to apply more optimizations because more registers will be free for
 the generated code.
 The exception model is the same as the dyngen one.
+6) Recommended coding rules for best performance
+- Use globals to represent the parts of the QEMU CPU state which are
+often modified, e.g. the integer registers and the condition
+codes. TCG will be able to use host registers to store them.
+- Avoid globals stored in fixed registers. They must be used only to
+store the pointer to the CPU state and possibly to store a pointer
+to a register window. The other uses are to ensure backward
+compatibility with dyngen during the porting of a new target to TCG.
+- Use temporaries. Use local temporaries only when really needed,
+e.g. when you need to use a value after a jump. Local temporaries
+introduce a performance hit in the current TCG implementation: their
+content is saved to memory at the end of each basic block.
+- Free temporaries and local temporaries when they are no longer used
+(tcg_temp_free). Since tcg_const_x() also creates a temporary, you
+should free it after it is used. Freeing temporaries does not yield
+better generated code, but it reduces the memory usage of TCG and
+improves the speed of the translation.
+- Don't hesitate to use helpers for complicated or seldom used target
+instructions. There is little performance advantage in using TCG to
+implement target instructions taking more than about twenty TCG
+instructions.
+- Use the 'discard' instruction if you know that TCG won't be able to
+prove that a given global is "dead" at a given program point. The
+x86 target uses it to improve the condition codes optimisation.
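The advice to free temporaries can be illustrated with a toy slot allocator that always hands out the lowest free slot. `temp_pool`, `temp_new` and `temp_free` are hypothetical stand-ins for the idea behind `tcg_temp_free`, not QEMU's implementation. Code that frees each temporary after use keeps reusing one slot. Code that never frees needs a new slot per allocation.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_TEMPS 8

/* Hypothetical pool of temporary slots. */
typedef struct {
    bool used[MAX_TEMPS];
    int high_water; /* number of distinct slots ever handed out */
} temp_pool;

/* Returns the lowest free slot, or -1 if the pool is exhausted. */
static int temp_new(temp_pool *p)
{
    for (int i = 0; i < MAX_TEMPS; i++) {
        if (!p->used[i]) {
            p->used[i] = true;
            if (i + 1 > p->high_water)
                p->high_water = i + 1;
            return i;
        }
    }
    return -1;
}

/* Returns slot t to the pool so a later temp_new can reuse it. */
static void temp_free(temp_pool *p, int t)
{
    assert(t >= 0 && t < MAX_TEMPS && p->used[t]);
    p->used[t] = false;
}
```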

@@ -1,32 +1,15 @@
-- test macro system
+- Add new instructions such as: andnot, ror, rol, setcond, clz, ctz,
+popcnt.
-- test conditional jumps
+- See if it is worth exporting mul2, mulu2, div2, divu2.
-- test mul, div, ext8s, ext16s, bswap
-- generate a global TB prologue and epilogue to save/restore registers
-to/from the CPU state and to reserve a stack frame to optimize
-helper calls. Modify cpu-exec.c so that it does not use global
-register variables (except maybe for 'env').
-- fully convert the x86 target. The minimal amount of work includes:
-- add cc_src, cc_dst and cc_op as globals
-- disable its eflags optimization (the liveness analysis should
-suffice)
-- move complicated operations to helpers (in particular FPU, SSE, MMX).
-- optimize the x86 target:
-- move some or all the registers as globals
-- use the TB prologue and epilogue to have QEMU target registers in
-pre assigned host registers.
+- Support of globals saved in fixed registers between TBs.
 Ideas:
 - Move the slow part of the qemu_ld/st ops after the end of the TB.
-- Experiment: change instruction storage to simplify macro handling
-and to handle dynamic allocation and see if the translation speed is
-OK.
-- change exception syntax to get closer to QOP system (exception
+- Change exception syntax to get closer to QOP system (exception
 parameters given with a specific instruction).
+- Add float and vector support.