Document the named field syntax that we want to implement for the decodetree script. This allows a field to be defined in terms of some other field that the instruction pattern has already set, for example: %sz_imm 10:3 sz:3 !function=expand_sz_imm to allow a function to be passed both an immediate field from the instruction and also a sz value which might have been specified by the instruction pattern directly (sz=1, etc) rather than being a simple field within the instruction. Note that the restriction on not having the format referring to the pattern and the pattern referring to the format simultaneously is a restriction of the decoder generator rather than inherently being a silly thing to do. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20230523120447.728365-3-peter.maydell@linaro.org>
========================
Decodetree Specification
========================

A *decodetree* is built from instruction *patterns*.  A pattern may
represent a single architectural instruction or a group of such
instructions, depending on what is convenient for further processing.

Each pattern has both *fixedbits* and *fixedmask*, the combination of which
describes the condition under which the pattern is matched::

  (insn & fixedmask) == fixedbits

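The match condition can be written as a small C predicate. This is only a
sketch: the ``PatternBits`` type and the ``pattern_matches`` name are
hypothetical and are not produced by the generator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical descriptor for one pattern; not a decodetree type. */
typedef struct {
    uint32_t fixedmask;  /* bits that the pattern constrains */
    uint32_t fixedbits;  /* required values of those bits */
} PatternBits;

static bool pattern_matches(PatternBits p, uint32_t insn)
{
    /* fixedbits may only set bits that fixedmask covers. */
    assert((p.fixedbits & ~p.fixedmask) == 0);
    return (insn & p.fixedmask) == p.fixedbits;
}
```

Bits outside *fixedmask* are free to vary, which is what leaves room for
fields.
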
Each pattern may have *fields*, which are extracted from the insn and
passed along to the translator.  Examples of such are registers,
immediates, and sub-opcodes.

In support of patterns, one may declare *fields*, *argument sets*, and
*formats*, each of which may be re-used to simplify further definitions.

Fields
======

Syntax::

  field_def     := '%' identifier ( field )* ( !function=identifier )?
  field         := unnamed_field | named_field
  unnamed_field := number ':' ( 's' )? number
  named_field   := identifier ':' ( 's' )? number

For *unnamed_field*, the first number is the least-significant bit position
of the field and the second number is the length of the field.  If the 's' is
present, the field is considered signed.

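As an illustration of the position, length and signedness rules, the two
helpers below are local stand-ins for the ``extract`` and ``sextract``
operations used in the field examples. They assume a 32-bit instruction word
and are a sketch, not the helpers that generated code actually calls.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned field: 'len' bits of 'insn' starting at bit 'pos'. */
static uint32_t extract(uint32_t insn, int pos, int len)
{
    assert(pos >= 0 && len > 0 && len <= 32 - pos);
    return (insn >> pos) & (~0u >> (32 - len));
}

/* Signed field: the same bits, sign-extended from the field's top bit. */
static int32_t sextract(uint32_t insn, int pos, int len)
{
    uint32_t v = extract(insn, pos, len);
    uint32_t sign = 1u << (len - 1);

    /* Flipping and then subtracting the sign bit propagates it upwards. */
    return (int32_t)((v ^ sign) - sign);
}
```

With these definitions, ``0:s16`` applied to an instruction whose low 16 bits
are all ones yields -1, whereas ``0:16`` yields 65535.
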
A *named_field* refers to some other field in the instruction pattern
or format.  Regardless of the length of the other field where it is
defined, it will be inserted into this field with the specified
signedness and bit width.

Field definitions that involve loops (i.e. where a field is defined
directly or indirectly in terms of itself) are errors.

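One way to picture the loop rule is as cycle detection over a "refers to"
graph of fields. The sketch below uses a depth-first search; the
``FieldGraph`` type and both function names are hypothetical, and nothing
here claims to mirror how the decodetree script itself performs the check.

```c
#include <assert.h>
#include <stdbool.h>

enum { MAX_FIELDS = 8 };

/* refs[a][b] is true when field a is defined in terms of named field b. */
typedef struct {
    int n;
    bool refs[MAX_FIELDS][MAX_FIELDS];
} FieldGraph;

/* state: 0 = unvisited, 1 = on the current path, 2 = fully checked */
static bool visit(const FieldGraph *g, int f, int *state)
{
    if (state[f] == 1) {
        return true;    /* reached a field that is already on the path */
    }
    if (state[f] == 2) {
        return false;
    }
    state[f] = 1;
    for (int b = 0; b < g->n; b++) {
        if (g->refs[f][b] && visit(g, b, state)) {
            return true;
        }
    }
    state[f] = 2;
    return false;
}

static bool has_field_loop(const FieldGraph *g)
{
    int state[MAX_FIELDS] = { 0 };

    assert(g->n >= 0 && g->n <= MAX_FIELDS);
    for (int f = 0; f < g->n; f++) {
        if (visit(g, f, state)) {
            return true;
        }
    }
    return false;
}
```

A field that names itself is the shortest possible loop; a chain such as
A to B to C only becomes an error once C refers back to A.
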
A format can include fields that refer to named fields that are
defined in the instruction pattern(s) that use the format.
Conversely, an instruction pattern can include fields that refer to
named fields that are defined in the format it uses.  However, you
cannot currently do both at once (i.e. pattern P uses format F; F has
a field A that refers to a named field B that is defined in P, and P
has a field C that refers to a named field D that is defined in F).

If multiple ``fields`` are present, they are concatenated, with the
first field listed becoming the most significant part of the result.
In this way one can define disjoint fields.

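Concatenation can be sketched in C using the ``%imm9 16:6 10:3`` definition
from the field examples. The ``extract`` helper and the ``field_imm9`` name
are local to this illustration, and a 32-bit instruction word is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in: 'len' bits of 'insn' starting at bit 'pos'. */
static uint32_t extract(uint32_t insn, int pos, int len)
{
    assert(pos >= 0 && len > 0 && len <= 32 - pos);
    return (insn >> pos) & (~0u >> (32 - len));
}

/* %imm9 16:6 10:3 : two disjoint pieces joined into one 9-bit value. */
static uint32_t field_imm9(uint32_t insn)
{
    uint32_t hi = extract(insn, 16, 6);   /* listed first, most significant */
    uint32_t lo = extract(insn, 10, 3);

    /* Each piece is shifted left by the total width of those after it. */
    return (hi << 3) | lo;
}
```

Bits 13 to 15 of the instruction lie between the two pieces and do not
contribute to the result.
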
If ``!function`` is specified, the concatenated result is passed through the
named function, taking and returning an integral value.

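The sketch below follows the ``%shimm8 5:s8 13:1 !function=expand_shimm8``
definition from the field examples. The body of ``expand_shimm8`` is invented
for illustration (it scales the value by 4), and its single-argument
signature simply mirrors the form shown in the examples table; a real
translator supplies its own function.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t extract(uint32_t insn, int pos, int len)
{
    assert(pos >= 0 && len > 0 && len <= 32 - pos);
    return (insn >> pos) & (~0u >> (32 - len));
}

static int32_t sextract(uint32_t insn, int pos, int len)
{
    uint32_t v = extract(insn, pos, len);
    uint32_t sign = 1u << (len - 1);

    return (int32_t)((v ^ sign) - sign);
}

/* Hypothetical post-processing: scale the decoded value by 4. */
static int expand_shimm8(int x)
{
    return x * 4;
}

/* %shimm8 5:s8 13:1 !function=expand_shimm8 */
static int field_shimm8(uint32_t insn)
{
    /*
     * Same value as sextract(...) << 1 | extract(...), written with
     * arithmetic so that a negative value is never shifted left.
     */
    int raw = sextract(insn, 5, 8) * 2 + (int)extract(insn, 13, 1);

    return expand_shimm8(raw);
}
```

The function sees the fully assembled value, so it does not need to know how
the bits were laid out in the instruction.
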
One may use ``!function`` with zero ``fields``.  This case is called
a *parameter*, and the named function is only passed the ``DisasContext``
and returns an integral value extracted from there.

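A parameter function might look like the sketch below. The ``DisasContext``
layout, the ``compressed`` member and the ``insn_size`` name are all
assumptions made for this example; each translator defines its own context
structure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical context; a real translator defines its own DisasContext. */
typedef struct DisasContext {
    bool compressed;   /* assumed translation state, not an insn field */
} DisasContext;

/* %isize !function=insn_size   (hypothetical parameter definition) */
static int insn_size(DisasContext *ctx)
{
    assert(ctx != NULL);
    /* No instruction bits are consulted, only the context. */
    return ctx->compressed ? 2 : 4;
}
```
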
A field with no ``fields`` and no ``!function`` is in error.

Field examples:

+---------------------------+---------------------------------------------+
| Input                     | Generated code                              |
+===========================+=============================================+
| %disp   0:s16             | sextract(i, 0, 16)                          |
+---------------------------+---------------------------------------------+
| %imm9   16:6 10:3         | extract(i, 16, 6) << 3 | extract(i, 10, 3)  |
+---------------------------+---------------------------------------------+
| %disp12 0:s1 1:1 2:10     | sextract(i, 0, 1) << 11 |                   |
|                           |    extract(i, 1, 1) << 10 |                 |
|                           |    extract(i, 2, 10)                        |
+---------------------------+---------------------------------------------+
| %shimm8 5:s8 13:1         | expand_shimm8(sextract(i, 5, 8) << 1 |      |
|   !function=expand_shimm8 |               extract(i, 13, 1))            |
+---------------------------+---------------------------------------------+
| %sz_imm 10:2 sz:3         | expand_sz_imm(extract(i, 10, 2) << 3 |      |
|   !function=expand_sz_imm |               extract(a->sz, 0, 3))         |
+---------------------------+---------------------------------------------+

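The last row of the table combines an instruction field with the named field
``sz``. The sketch below works through it in C. The ``arg_sz_example``
structure, the ``set_sz_imm`` wrapper and the body of ``expand_sz_imm`` are
hypothetical; only the expression passed to ``expand_sz_imm`` is taken from
the table.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t extract(uint32_t v, int pos, int len)
{
    assert(pos >= 0 && len > 0 && len <= 32 - pos);
    return (v >> pos) & (~0u >> (32 - len));
}

/* Hypothetical argument set holding sz and the derived immediate. */
typedef struct {
    int sz;
    int imm;
} arg_sz_example;

/* Hypothetical body: the low 3 bits are sz, the bits above are the imm. */
static int expand_sz_imm(int x)
{
    return (x >> 3) << (x & 7);
}

/* %sz_imm 10:2 sz:3 !function=expand_sz_imm */
static void set_sz_imm(arg_sz_example *a, uint32_t insn)
{
    /* a->sz must already be set, e.g. by sz=1 in the pattern. */
    a->imm = expand_sz_imm((int)((extract(insn, 10, 2) << 3) |
                                 extract((uint32_t)a->sz, 0, 3)));
}
```

Note that ``sz`` is re-extracted with the width given in the field
definition (3 bits), whatever width it had where it was originally defined.
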
Argument Sets
=============

Syntax::

  args_def    := '&' identifier ( args_elt )+ ( !extern )?
  args_elt    := identifier (':' identifier)?

Each *args_elt* defines an argument within the argument set.
If the form of the *args_elt* contains a colon, the first
identifier is the argument name and the second identifier is
the argument type.  If the colon is missing, the argument
type will be ``int``.

Each argument set will be rendered as a C structure ``arg_$name``
with each of the fields being one of the member arguments.

If ``!extern`` is specified, the backing structure is assumed
to have been already declared, typically via a second decoder.

Argument sets are useful when one wants to define helper functions
for the translator functions that can perform operations on a common
set of arguments.  This can ensure, for instance, that the ``AND``
pattern and the ``OR`` pattern put their operands into the same named
structure, so that a common ``gen_logic_insn`` may be able to handle
the operations common between the two.

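The ``AND``/``OR`` scenario might be sketched as follows. Everything here is
illustrative: the ``arg_reg3`` layout, the toy ``DisasContext`` with a
register array, and the ``trans_AND``/``trans_OR`` bodies are assumptions,
and the helper computes the result directly rather than emitting code as a
real translator would.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed rendering of an argument set with members ra, rb, rc. */
typedef struct {
    int ra;
    int rb;
    int rc;
} arg_reg3;

/* Hypothetical context: a tiny register file instead of real codegen. */
typedef struct DisasContext {
    uint32_t reg[32];
} DisasContext;

typedef uint32_t LogicOp(uint32_t, uint32_t);

static uint32_t op_and(uint32_t x, uint32_t y) { return x & y; }
static uint32_t op_or(uint32_t x, uint32_t y) { return x | y; }

/* Shared helper: both patterns hand it the same structure type. */
static bool gen_logic_insn(DisasContext *ctx, arg_reg3 *a, LogicOp *op)
{
    assert(a->ra >= 0 && a->ra < 32);
    assert(a->rb >= 0 && a->rb < 32);
    assert(a->rc >= 0 && a->rc < 32);
    ctx->reg[a->rc] = op(ctx->reg[a->ra], ctx->reg[a->rb]);
    return true;
}

static bool trans_AND(DisasContext *ctx, arg_reg3 *a)
{
    return gen_logic_insn(ctx, a, op_and);
}

static bool trans_OR(DisasContext *ctx, arg_reg3 *a)
{
    return gen_logic_insn(ctx, a, op_or);
}
```

Because both translator functions receive an ``arg_reg3``, the helper needs
no per-instruction knowledge of where the operands came from.
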
Argument set examples::

  &reg3       ra rb rc
  &loadstore  reg base offset
  &longldst   reg base offset:int64_t

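Following the ``arg_$name`` rule, the three examples above would correspond
to structures along these lines. This is a hand-written sketch of the shape
to expect, not verbatim generator output.

```c
#include <assert.h>
#include <stdint.h>

/* &reg3 ra rb rc */
typedef struct {
    int ra;
    int rb;
    int rc;
} arg_reg3;

/* &loadstore reg base offset */
typedef struct {
    int reg;
    int base;
    int offset;
} arg_loadstore;

/* &longldst reg base offset:int64_t */
typedef struct {
    int reg;
    int base;
    int64_t offset;   /* the explicit type replaces the default int */
} arg_longldst;

/* The typed member must be able to hold a full 64-bit offset. */
static_assert(sizeof(((arg_longldst *)0)->offset) == 8,
              "offset:int64_t should give a 64-bit member");
```
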
Formats
=======

Syntax::

  fmt_def      := '@' identifier ( fmt_elt )+
  fmt_elt      := fixedbit_elt | field_elt | field_ref | args_ref
  fixedbit_elt := [01.-]+
  field_elt    := identifier ':' 's'? number
  field_ref    := '%' identifier | identifier '=' '%' identifier
  args_ref     := '&' identifier

Defining a format is a handy way to avoid replicating groups of fields
across many instruction patterns.

A *fixedbit_elt* describes a contiguous sequence of bits that must
be 1, 0, or don't care.  The difference between '.' and '-'
is that '.' means that the bit will be covered with a field or a
final 0 or 1 from the pattern, and '-' means that the bit is really
ignored by the cpu and will not be specified.

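To make the link between *fixedbit_elt* strings and the *fixedmask* and
*fixedbits* values concrete, the sketch below folds a 32-character bit string
into the two words. ``parse_fixedbits`` is a hypothetical helper written for
this document; it accepts only the characters ``0 1 . -`` plus spaces,
assumes a 32-bit instruction, and does not attempt to model *field_elt*
entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * '0' and '1' are fixed bits; '.' and '-' both leave the mask bit clear.
 * The leftmost character is the most significant bit.
 * Returns false unless exactly 32 bit characters are seen.
 */
static bool parse_fixedbits(const char *s, uint32_t *mask, uint32_t *bits)
{
    uint32_t m = 0, b = 0;
    int n = 0;

    assert(s != NULL && mask != NULL && bits != NULL);
    for (; *s; s++) {
        if (*s == ' ') {
            continue;
        }
        if (*s != '0' && *s != '1' && *s != '.' && *s != '-') {
            return false;
        }
        if (n == 32) {
            return false;   /* more than 32 bits given */
        }
        m = (m << 1) | ((*s == '0' || *s == '1') ? 1u : 0u);
        b = (b << 1) | ((*s == '1') ? 1u : 0u);
        n++;
    }
    if (n != 32) {
        return false;
    }
    *mask = m;
    *bits = b;
    return true;
}
```

In this sketch the distinction between '.' and '-' is deliberately ignored,
since neither contributes to the mask; the generator uses the distinction to
check that every '.' bit is eventually covered.
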
A *field_elt* describes a simple field only given a width; the position of
the field is implied by its position with respect to other *fixedbit_elt*
and *field_elt*.

If any *fixedbit_elt* or *field_elt* appear, then all bits must be defined.
Padding with a *fixedbit_elt* of all '.' is an easy way to accomplish that.

A *field_ref* incorporates a field by reference.  This is the only way to
add a complex field to a format.  A field may be renamed in the process
via assignment to another identifier.  This is intended to allow the
same argument set to be used with disjoint named fields.

A single *args_ref* may specify an argument set to use for the format.
The set of fields in the format must be a subset of the arguments in
the argument set.  If an argument set is not specified, one will be
inferred from the set of fields.

It is recommended, but not required, that all *field_ref* and *args_ref*
appear at the end of the line, not interleaving with *fixedbit_elt* or
*field_elt*.

Format examples::

  @opr    ...... ra:5 rb:5 ... 0 ....... rc:5
  @opi    ...... ra:5 lit:8    1 ....... rc:5

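Reading the two example formats left to right from bit 31 gives each
*field_elt* its position. The sketch below spells that out in C, assuming a
32-bit instruction word; the ``arg_opr``/``arg_opi`` layouts and the
``extract_opr``/``extract_opi`` names are chosen for this illustration and
are not the names the generator emits.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t extract(uint32_t insn, int pos, int len)
{
    assert(pos >= 0 && len > 0 && len <= 32 - pos);
    return (insn >> pos) & (~0u >> (32 - len));
}

/* Assumed argument structures, with one member per field name. */
typedef struct { int ra; int rb; int rc; } arg_opr;
typedef struct { int ra; int lit; int rc; } arg_opi;

/* @opr ...... ra:5 rb:5 ... 0 ....... rc:5 */
static void extract_opr(arg_opr *a, uint32_t insn)
{
    a->ra = (int)extract(insn, 21, 5);   /* just below the 6 leading bits */
    a->rb = (int)extract(insn, 16, 5);
    a->rc = (int)extract(insn, 0, 5);    /* the 5 lowest bits */
}

/* @opi ...... ra:5 lit:8 1 ....... rc:5 */
static void extract_opi(arg_opi *a, uint32_t insn)
{
    a->ra = (int)extract(insn, 21, 5);
    a->lit = (int)extract(insn, 13, 8);  /* bit 12 below it is the fixed 1 */
    a->rc = (int)extract(insn, 0, 5);
}
```

The two formats differ only in bits 20 to 12: ``@opr`` has a register and a
fixed 0 in bit 12, ``@opi`` has an 8-bit literal and a fixed 1.
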
Patterns
========

Syntax::

  pat_def      := identifier ( pat_elt )+
  pat_elt      := fixedbit_elt | field_elt | field_ref | args_ref | fmt_ref | const_elt
  fmt_ref      := '@' identifier
  const_elt    := identifier '=' number

The *fixedbit_elt* and *field_elt* specifiers are unchanged from formats.
A pattern that does not specify a named format will have one inferred
from a referenced argument set (if present) and the set of fields.

A *const_elt* allows an argument to be set to a constant value.  This may
come in handy when fields overlap between patterns and one has to
include the values in the *fixedbit_elt* instead.

The decoder will call a translator function for each pattern matched.

Pattern examples::

  addl_r   010000 ..... ..... .... 0000000 ..... @opr
  addl_i   010000 ..... ..... .... 0000000 ..... @opi

which will, in part, invoke::

  trans_addl_r(ctx, &arg_opr, insn)

and::

  trans_addl_i(ctx, &arg_opi, insn)

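Putting the two patterns together with their formats, a hand-written decoder
might look like the sketch below. The mask ``0xfc001fe0`` combines the bits
fixed by the patterns (31 to 26 and 11 to 5) with bit 12 from the formats.
The ``DisasContext`` contents, the ``decode`` function and the bodies of the
translator functions are assumptions for illustration; the three-argument
call shape follows the invocations shown above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical context that just records what was decoded. */
typedef struct DisasContext {
    char kind;     /* 'r' or 'i' */
    int operand;   /* rb for addl_r, lit for addl_i */
} DisasContext;

typedef struct { int ra; int rb; int rc; } arg_opr;
typedef struct { int ra; int lit; int rc; } arg_opi;

static bool trans_addl_r(DisasContext *ctx, arg_opr *a, uint32_t insn)
{
    (void)insn;
    ctx->kind = 'r';
    ctx->operand = a->rb;
    return true;
}

static bool trans_addl_i(DisasContext *ctx, arg_opi *a, uint32_t insn)
{
    (void)insn;
    ctx->kind = 'i';
    ctx->operand = a->lit;
    return true;
}

static bool decode(DisasContext *ctx, uint32_t insn)
{
    assert(ctx != NULL);
    switch (insn & 0xfc001fe0u) {
    case 0x40000000u: {   /* addl_r: bit 12 is 0 */
        arg_opr a = {
            .ra = (int)((insn >> 21) & 0x1f),
            .rb = (int)((insn >> 16) & 0x1f),
            .rc = (int)(insn & 0x1f),
        };
        return trans_addl_r(ctx, &a, insn);
    }
    case 0x40001000u: {   /* addl_i: bit 12 is 1 */
        arg_opi a = {
            .ra = (int)((insn >> 21) & 0x1f),
            .lit = (int)((insn >> 13) & 0xff),
            .rc = (int)(insn & 0x1f),
        };
        return trans_addl_i(ctx, &a, insn);
    }
    }
    return false;   /* no pattern matched */
}
```
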
Pattern Groups
==============

Syntax::

  group            := overlap_group | no_overlap_group
  overlap_group    := '{' ( pat_def | group )+ '}'
  no_overlap_group := '[' ( pat_def | group )+ ']'

A *group* begins with a lone open-brace or open-bracket, with all
subsequent lines indented two spaces, and ending with a lone
close-brace or close-bracket.  Groups may be nested, increasing the
required indentation of the lines within the nested group to two
spaces per nesting level.

Patterns within overlap groups are allowed to overlap.  Conflicts are
resolved by selecting the patterns in order.  If all of the fixedbits
for a pattern match, its translate function will be called.  If the
translate function returns false, then subsequent patterns within the
group will be matched.

Patterns within no-overlap groups are not allowed to overlap, just
the same as ungrouped patterns.  Thus no-overlap groups are intended
to be nested inside overlap groups.

The following example from PA-RISC shows specialization of the *or*
instruction::

  {
    {
      nop   000010 ----- ----- 0000 001001 0 00000
      copy  000010 00000 r1:5  0000 001001 0 rt:5
    }
    or      000010 rt2:5 r1:5  cf:4 001001 0 rt:5
  }

When the *cf* field is zero, the instruction has no side effects,
and may be specialized.  When the *rt* field is zero, the output
is discarded and so the instruction has no effect.  When the *rt2*
field is zero, the operation is ``reg[r1] | 0`` and so encodes
the canonical register copy operation.

The output from the generator might look like::

  switch (insn & 0xfc000fe0) {
  case 0x08000240:
    /* 000010.. ........ ....0010 010..... */
    if ((insn & 0x0000f000) == 0x00000000) {
        /* 000010.. ........ 00000010 010..... */
        if ((insn & 0x0000001f) == 0x00000000) {
            /* 000010.. ........ 00000010 01000000 */
            extract_decode_Fmt_0(&u.f_decode0, insn);
            if (trans_nop(ctx, &u.f_decode0)) return true;
        }
        if ((insn & 0x03e00000) == 0x00000000) {
            /* 00001000 000..... 00000010 010..... */
            extract_decode_Fmt_1(&u.f_decode1, insn);
            if (trans_copy(ctx, &u.f_decode1)) return true;
        }
    }
    extract_decode_Fmt_2(&u.f_decode2, insn);
    if (trans_or(ctx, &u.f_decode2)) return true;
    return false;
  }

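The fall-through behaviour of the overlap group can be exercised with a
self-contained variant of the code above. The masks and constants are the
ones from the generated output; the ``DisasContext`` layout, the ``Kind``
enum, the ``reject_copy`` switch and the single-argument translator functions
are simplifications invented for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { INSN_NONE, INSN_NOP, INSN_COPY, INSN_OR } Kind;

/* Hypothetical context: records which handler accepted the insn. */
typedef struct DisasContext {
    bool reject_copy;   /* make trans_copy decline, to show fall-through */
    Kind accepted;
} DisasContext;

static bool trans_nop(DisasContext *ctx)
{
    ctx->accepted = INSN_NOP;
    return true;
}

static bool trans_copy(DisasContext *ctx)
{
    if (ctx->reject_copy) {
        return false;   /* the decoder then tries the next pattern */
    }
    ctx->accepted = INSN_COPY;
    return true;
}

static bool trans_or(DisasContext *ctx)
{
    ctx->accepted = INSN_OR;
    return true;
}

static bool decode(DisasContext *ctx, uint32_t insn)
{
    assert(ctx != NULL);
    ctx->accepted = INSN_NONE;
    if ((insn & 0xfc000fe0u) != 0x08000240u) {
        return false;
    }
    if ((insn & 0x0000f000u) == 0) {                    /* cf == 0 */
        if ((insn & 0x0000001fu) == 0 && trans_nop(ctx)) {    /* rt == 0 */
            return true;
        }
        if ((insn & 0x03e00000u) == 0 && trans_copy(ctx)) {   /* rt2 == 0 */
            return true;
        }
    }
    return trans_or(ctx);
}
```

With ``reject_copy`` set, an instruction that matches the *copy* pattern is
still handled, but by ``trans_or``, which is the ordering rule that overlap
groups provide.
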