Hexagon is Qualcomm's very long instruction word (VLIW) digital signal
processor (DSP). We also support Hexagon Vector eXtensions (HVX). HVX
is a wide vector coprocessor designed for high performance computer vision,
image processing, machine learning, and other workloads.

The following versions of the Hexagon core are supported
    Scalar core: v67
    https://developer.qualcomm.com/downloads/qualcomm-hexagon-v67-programmer-s-reference-manual
    HVX extension: v66
    https://developer.qualcomm.com/downloads/qualcomm-hexagon-v66-hvx-programmer-s-reference-manual

We presented an overview of the project at the 2019 KVM Forum.
    https://kvmforum2019.sched.com/event/Tmwc/qemu-hexagon-automatic-translation-of-the-isa-manual-pseudcode-to-tiny-code-instructions-of-a-vliw-architecture-niccolo-izzo-revng-taylor-simpson-qualcomm-innovation-center

*** Tour of the code ***

The qemu-hexagon implementation is a combination of qemu and the Hexagon
architecture library (aka archlib). The four primary directories with
Hexagon-specific code are

    qemu/target/hexagon
        This has all the instruction and packet semantics
    qemu/target/hexagon/imported
        These files are imported with very little modification from archlib
        *.idef                  Instruction semantics definition
        macros.def              Mapping of macros to instruction attributes
        encode*.def             Encoding patterns for each instruction
        iclass.def              Instruction class definitions used to determine
                                legal VLIW slots for each instruction
    qemu/target/hexagon/idef-parser
        Parser that, given the high-level definitions of an instruction,
        produces a C function generating equivalent tiny code instructions.
        See README.rst.
    qemu/linux-user/hexagon
        Helpers for loading the ELF file and making Linux system calls,
        signals, etc

We start with scripts that generate a bunch of include files. This
is a two-step process. The first step is to use the C preprocessor to expand
macros inside the architecture definition files. This is done in
target/hexagon/gen_semantics.c. This step produces
    <BUILD_DIR>/target/hexagon/semantics_generated.pyinc
That file is consumed by the following python scripts to produce the indicated
include files in <BUILD_DIR>/target/hexagon
        gen_opcodes_def.py              -> opcodes_def_generated.h.inc
        gen_op_regs.py                  -> op_regs_generated.h.inc
        gen_printinsn.py                -> printinsn_generated.h.inc
        gen_op_attribs.py               -> op_attribs_generated.h.inc
        gen_helper_protos.py            -> helper_protos_generated.h.inc
        gen_shortcode.py                -> shortcode_generated.h.inc
        gen_tcg_funcs.py                -> tcg_funcs_generated.c.inc
        gen_tcg_func_table.py           -> tcg_func_table_generated.c.inc
        gen_helper_funcs.py             -> helper_funcs_generated.c.inc
        gen_idef_parser_funcs.py        -> idef_parser_input.h

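The first step above (letting the C preprocessor expand the macros and then
writing the result out as text) can be illustrated in miniature. This is only
a sketch under stated assumptions: the instruction list macro ALL_INSNS, the
archlib-style macro fADD, the emit function and the output format are all
hypothetical stand-ins, and the real gen_semantics.c works on the imported
definition files and writes a different format.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Two-level stringification: the argument is macro-expanded first. */
#define STRINGIZE_(X) #X
#define STRINGIZE(X) STRINGIZE_(X)

/* Hypothetical archlib-style macro used inside the semantics text. */
#define fADD(A, B) A+B

/* Hypothetical stand-in for the imported instruction definitions. */
#define ALL_INSNS(X) \
    X(A2_add, { RdV=fADD(RsV,RtV); }) \
    X(A2_sub, { RdV=RtV-RsV; })

/* Append one record to buf and return the new length. */
static size_t emit(char *buf, size_t cap, size_t len,
                   const char *tag, const char *sem)
{
    int n = snprintf(buf + len, cap - len,
                     "SEMANTICS(\"%s\", \"%s\")\n", tag, sem);
    assert(n >= 0 && (size_t)n < cap - len);
    return len + (size_t)n;
}

/* The tag is stringized as written; the semantics are expanded first. */
#define EMIT(TAG, SEM) len = emit(buf, cap, len, #TAG, STRINGIZE(SEM));

/* Write one record per instruction into buf (cap must be non-zero). */
static size_t gen_semantics(char *buf, size_t cap)
{
    size_t len = 0;
    buf[0] = '\0';
    ALL_INSNS(EMIT)
    return len;
}
```

Calling gen_semantics(out, sizeof(out)) on a 512-byte buffer fills it with one
SEMANTICS record per instruction, with fADD already expanded away. A script
can then read those records without having to understand C macros.
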
Qemu helper functions have 3 parts
    DEF_HELPER declaration indicates the signature of the helper
    gen_helper_<NAME> will generate a TCG call to the helper function
    The helper implementation

Here's an example of the A2_add instruction.
    Instruction tag        A2_add
    Assembly syntax        "Rd32=add(Rs32,Rt32)"
    Instruction semantics  "{ RdV=RsV+RtV;}"

By convention, the operands are identified by letter
    RdV is the destination register
    RsV, RtV are source registers

The generator uses the operand naming conventions (see large comment in
hex_common.py) to determine the signature of the helper function. Here are the
results for A2_add

helper_protos_generated.h.inc
    DEF_HELPER_3(A2_add, s32, env, s32, s32)

tcg_funcs_generated.c.inc
    static void generate_A2_add(
                    CPUHexagonState *env,
                    DisasContext *ctx,
                    Insn *insn,
                    Packet *pkt)
    {
        TCGv RdV = tcg_temp_local_new();
        const int RdN = insn->regno[0];
        TCGv RsV = hex_gpr[insn->regno[1]];
        TCGv RtV = hex_gpr[insn->regno[2]];
        gen_helper_A2_add(RdV, cpu_env, RsV, RtV);
        gen_log_reg_write(RdN, RdV);
        ctx_log_reg_write(ctx, RdN);
        tcg_temp_free(RdV);
    }

helper_funcs_generated.c.inc
    int32_t HELPER(A2_add)(CPUHexagonState *env, int32_t RsV, int32_t RtV)
    {
        uint32_t slot __attribute__((unused)) = 4;
        int32_t RdV = 0;
        { RdV=RsV+RtV;}
        return RdV;
    }

Note that generate_A2_add updates the disassembly context to be processed
when the packet commits (see "Packet Semantics" below).

The generator checks for the fGEN_TCG_<tag> macro. This allows us to generate
TCG code instead of a call to the helper. If defined, the macro takes 1
argument.
    C semantics (aka short code)

This allows the code generator to override the auto-generated code. In some
cases this is necessary for correct execution. We can also override for
faster emulation. For example, calling a helper for add is more expensive
than generating a TCG add operation.

The gen_tcg.h file holds the overrides. For example, we could write
    #define fGEN_TCG_A2_add(SHORTCODE) \
        tcg_gen_add_tl(RdV, RsV, RtV)

The instruction semantics C code relies heavily on macros. In cases where the
C semantics are specified only with macros, we can override the default with
the short semantics option and #define the macros to generate TCG code. One
example is L2_loadw_locked:
    Instruction tag        L2_loadw_locked
    Assembly syntax        "Rd32=memw_locked(Rs32)"
    Instruction semantics  "{ fEA_REG(RsV); fLOAD_LOCKED(1,4,u,EA,RdV) }"

In gen_tcg.h, we use the shortcode
    #define fGEN_TCG_L2_loadw_locked(SHORTCODE) \
        SHORTCODE

There are also cases where we brute force the TCG code generation.
Instructions that write more than one destination register are examples.
These require special handling because qemu helpers can only return a single
value.

For HVX vectors, the generator behaves slightly differently. The wide vectors
won't fit in a TCGv or TCGv_i64, so we use TCGv_ptr variables to pass the
address to helper functions. Here's an example for an HVX vector-add-word
instruction.
    static void generate_V6_vaddw(
                    CPUHexagonState *env,
                    DisasContext *ctx,
                    Insn *insn,
                    Packet *pkt)
    {
        const int VdN = insn->regno[0];
        const intptr_t VdV_off =
            ctx_future_vreg_off(ctx, VdN, 1, true);
        TCGv_ptr VdV = tcg_temp_local_new_ptr();
        tcg_gen_addi_ptr(VdV, cpu_env, VdV_off);
        const int VuN = insn->regno[1];
        const intptr_t VuV_off =
            vreg_src_off(ctx, VuN);
        TCGv_ptr VuV = tcg_temp_local_new_ptr();
        const int VvN = insn->regno[2];
        const intptr_t VvV_off =
            vreg_src_off(ctx, VvN);
        TCGv_ptr VvV = tcg_temp_local_new_ptr();
        tcg_gen_addi_ptr(VuV, cpu_env, VuV_off);
        tcg_gen_addi_ptr(VvV, cpu_env, VvV_off);
        TCGv slot = tcg_constant_tl(insn->slot);
        gen_helper_V6_vaddw(cpu_env, VdV, VuV, VvV, slot);
        tcg_temp_free(slot);
        gen_log_vreg_write(ctx, VdV_off, VdN, EXT_DFL, insn->slot, false);
        ctx_log_vreg_write(ctx, VdN, EXT_DFL, false);
        tcg_temp_free_ptr(VdV);
        tcg_temp_free_ptr(VuV);
        tcg_temp_free_ptr(VvV);
    }

Notice that we also generate a variable named <operand>_off for each operand of
the instruction. This makes it easy to override the instruction semantics with
functions from tcg-op-gvec.h. Here's the override for this instruction.
    #define fGEN_TCG_V6_vaddw(SHORTCODE) \
        tcg_gen_gvec_add(MO_32, VdV_off, VuV_off, VvV_off, \
                         sizeof(MMVector), sizeof(MMVector))

Finally, we notice that the override doesn't use the TCGv_ptr variables, so
we don't generate them when an override is present. Here is what we generate
when the override is present.
    static void generate_V6_vaddw(
                    CPUHexagonState *env,
                    DisasContext *ctx,
                    Insn *insn,
                    Packet *pkt)
    {
        const int VdN = insn->regno[0];
        const intptr_t VdV_off =
            ctx_future_vreg_off(ctx, VdN, 1, true);
        const int VuN = insn->regno[1];
        const intptr_t VuV_off =
            vreg_src_off(ctx, VuN);
        const int VvN = insn->regno[2];
        const intptr_t VvV_off =
            vreg_src_off(ctx, VvN);
        fGEN_TCG_V6_vaddw({ fHIDE(int i;) fVFOREACH(32, i) { VdV.w[i] = VuV.w[i] + VvV.w[i] ; } });
        gen_log_vreg_write(ctx, VdV_off, VdN, EXT_DFL, insn->slot, false);
        ctx_log_vreg_write(ctx, VdN, EXT_DFL, false);
    }

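For readers unfamiliar with the short code passed to fGEN_TCG_V6_vaddw, the
loop it describes is a plain lane-by-lane add over 32-bit words. The sketch
below spells that out in standalone C. It assumes 128-byte vectors (32 word
lanes); ToyVector and toy_vaddw are hypothetical names, not the MMVector type
or the helper from the source tree. Operands are passed by pointer because the
vectors are too wide to pass by value, which mirrors the reason given above
for using TCGv_ptr.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 128-byte vectors, i.e. 32 lanes of 32 bits each. */
#define VEC_BYTES 128
#define VEC_WORDS (VEC_BYTES / sizeof(int32_t))

/* Toy vector type with byte and word views of the same storage. */
typedef union {
    uint8_t ub[VEC_BYTES];
    int32_t w[VEC_WORDS];
} ToyVector;

/* Lane-wise word add: VdV.w[i] = VuV.w[i] + VvV.w[i] for every lane. */
static void toy_vaddw(ToyVector *VdV, const ToyVector *VuV,
                      const ToyVector *VvV)
{
    assert(VdV != NULL && VuV != NULL && VvV != NULL);
    for (size_t i = 0; i < VEC_WORDS; i++) {
        /* Add as unsigned so that lane overflow wraps around. */
        VdV->w[i] = (int32_t)((uint32_t)VuV->w[i] + (uint32_t)VvV->w[i]);
    }
}
```

A single gvec add over the three register offsets expresses the same
computation without a helper call, which is why the override only needs the
<operand>_off variables.
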
In addition to instruction semantics, we use a generator to create the decode
tree. This generation is also a two-step process. The first step is to run
target/hexagon/gen_dectree_import.c to produce
    <BUILD_DIR>/target/hexagon/iset.py
This file is imported by target/hexagon/dectree.py to produce
    <BUILD_DIR>/target/hexagon/dectree_generated.h.inc

*** Key Files ***

cpu.h

This file contains the definition of the CPUHexagonState struct. It holds the
runtime information for each thread, such as the GPRs and the predicate
registers.

macros.h
mmvec/macros.h

The Hexagon archlib relies heavily on macros for the instruction semantics.
This is a great advantage for qemu because we can override them for different
purposes. You will also notice there are sometimes two definitions of a macro.
The QEMU_GENERATE variable determines whether we want the macro to generate TCG
code. If QEMU_GENERATE is not defined, we want the macro to generate vanilla
C code that will work in the helper implementation.

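The idea of one semantics text with two macro definitions can be shown with a
toy example. This is a sketch only: fADD, ADD_SEMANTICS, ToyCodeBuf and the
toy_* functions are hypothetical, the "generated code" is just text in a
buffer rather than TCG, and the two flavors are selected with #define/#undef
in one file instead of the QEMU_GENERATE switch described above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* One semantics text, written only in terms of a (hypothetical) macro. */
#define ADD_SEMANTICS { RdV = fADD(RsV, RtV); }

/* Flavor 1: vanilla C, as it would run inside a helper. */
#define fADD(A, B) ((A) + (B))
static int32_t toy_helper_add(int32_t RsV, int32_t RtV)
{
    int32_t RdV = 0;
    ADD_SEMANTICS
    return RdV;
}
#undef fADD

/* Flavor 2: code generation; operands are temp numbers, not values. */
typedef struct {
    char text[256];   /* emitted pseudo-instructions */
    size_t len;       /* bytes used in text */
    int next_temp;    /* next free temp number */
} ToyCodeBuf;

/* Emit "add tN, tA, tB" and return the number of the result temp. */
static int toy_emit_add(ToyCodeBuf *cb, int a, int b)
{
    int t = cb->next_temp++;
    int n = snprintf(cb->text + cb->len, sizeof(cb->text) - cb->len,
                     "add t%d, t%d, t%d\n", t, a, b);
    assert(n > 0 && (size_t)n < sizeof(cb->text) - cb->len);
    cb->len += (size_t)n;
    return t;
}

#define fADD(A, B) toy_emit_add(cb, (A), (B))
static int toy_generate_add(ToyCodeBuf *cb, int RsV, int RtV)
{
    int RdV = -1;
    ADD_SEMANTICS
    return RdV;
}
#undef fADD
```

With these definitions, toy_helper_add(2, 3) computes 5 directly, while
toy_generate_add on a buffer whose next_temp is 2 emits "add t2, t0, t1" for
operands 0 and 1 and returns temp 2. The semantics text itself is never
edited; only the macro it is written in terms of changes.
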
translate.c

The functions in this file generate TCG code for a translation block. Some
important functions in this file are

    gen_start_packet - initialize the data structures for packet semantics
    gen_commit_packet - commit the register writes, stores, etc for a packet
    decode_and_translate_packet - disassemble a packet and generate code

genptr.c
gen_tcg.h

These files create a function for each instruction. They are mostly composed
of fGEN_TCG_<tag> definitions followed by the inclusion of
tcg_funcs_generated.c.inc.

op_helper.c

This file contains the implementations of all the helpers. There are a few
general purpose helpers, but most of them are generated by including
helper_funcs_generated.c.inc. There are also several helpers used for debugging.

*** Packet Semantics ***

VLIW packet semantics differ from serial semantics in that all input operands
are read, then the operations are performed, then all the results are written.
For example, this packet performs a swap of registers r0 and r1
    { r0 = r1; r1 = r0 }
Note that the result is different if the instructions are executed serially.

Packet semantics dictate that we defer any changes of state until the entire
packet is committed. We record the results of each instruction in a side data
structure, and update the visible processor state when we commit the packet.

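The deferred-commit idea can be sketched in a few lines of standalone C. This
is a simplified model, not the qemu code: the field names new_value and
reg_written are borrowed from CPUHexagonState, while ToyState, toy_mov,
toy_commit_packet and the 4-register file are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_REGS 4   /* toy register file; the size is arbitrary */

typedef struct {
    uint32_t gpr[NUM_REGS];        /* architecturally visible registers */
    uint32_t new_value[NUM_REGS];  /* results held back until commit */
    bool reg_written[NUM_REGS];    /* which registers have a pending write */
} ToyState;

/* Rd = Rs: read the visible state, log the result instead of writing it. */
static void toy_mov(ToyState *s, int rd, int rs)
{
    assert(rd >= 0 && rd < NUM_REGS && rs >= 0 && rs < NUM_REGS);
    s->new_value[rd] = s->gpr[rs];
    s->reg_written[rd] = true;
}

/* End of packet: make all logged writes visible and clear the log. */
static void toy_commit_packet(ToyState *s)
{
    for (int i = 0; i < NUM_REGS; i++) {
        if (s->reg_written[i]) {
            s->gpr[i] = s->new_value[i];
            s->reg_written[i] = false;
        }
    }
}
```

Running toy_mov(&s, 0, 1) and toy_mov(&s, 1, 0) followed by
toy_commit_packet(&s) swaps r0 and r1, because both moves read the old
register values. Writing straight into gpr would instead leave both registers
holding the old value of r1.
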
The data structures are divided between the runtime state and the translation
context.

During the TCG generation (see translate.[ch]), we use the DisasContext to
track what needs to be done during packet commit. Here are the relevant
fields

    reg_log            list of registers written
    reg_log_idx        index into reg_log
    pred_log           list of predicates written
    pred_log_idx       index into pred_log
    store_width        width of stores (indexed by slot)

During runtime, the following fields in CPUHexagonState (see cpu.h) are used

    new_value             new value of a given register
    reg_written           boolean indicating if register was written
    new_pred_value        new value of a predicate register
    pred_written          boolean indicating if predicate was written
    mem_log_stores        record of the stores (indexed by slot)

For Hexagon Vector eXtensions (HVX), the following fields are used
    VRegs                       Vector registers
    future_VRegs                Registers to be stored during packet commit
    tmp_VRegs                   Temporary registers *not* stored during commit
    VRegs_updated               Mask of predicated vector writes
    QRegs                       Q (vector predicate) registers
    future_QRegs                Registers to be stored during packet commit
    QRegs_updated               Mask of predicated vector writes

*** Debugging ***

You can turn on a lot of debugging by changing the HEX_DEBUG macro to 1 in
internal.h. This will stream a lot of information as it generates TCG and
executes the code.

To track down nasty issues with Hexagon->TCG generation, we compare the
execution results with actual hardware running on a Hexagon Linux target.
Run qemu with the "-d cpu" option. Then, we can diff the results and figure
out where qemu and hardware behave differently.

The stacks are at different addresses under qemu and on hardware. We handle
this by changing env->stack_adjust in translate.c. First, set this to zero and
run qemu. Then, change env->stack_adjust to the difference between the two
stack locations. Then rebuild qemu and run again. That will produce a very
clean diff.

Here are some handy places to set breakpoints

    At the call to gen_start_packet for a given PC (note that the line number
        might change in the future)
        br translate.c:602 if ctx->base.pc_next == 0xdeadbeef
    The helper function for each instruction is named helper_<TAG>, so here's
        an example that will set a breakpoint at the start
        br helper_A2_add
    If you have the HEX_DEBUG macro set, the following will be useful
        At the start of execution of a packet for a given PC
            br helper_debug_start_packet if env->gpr[41] == 0xdeadbeef
        At the end of execution of a packet for a given PC
            br helper_debug_commit_end if env->this_PC == 0xdeadbeef