rs6000.c (analyze_swaps commentary): Add discussion of permutes and why we don't handle them.

2014-10-06  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	* config/rs6000/rs6000.c (analyze_swaps commentary): Add
	discussion of permutes and why we don't handle them.

From-SVN: r215951
Bill Schmidt 2014-10-06 15:27:32 +00:00 committed by William Schmidt
parent 63b9f71bb3
commit cec5d8be55
2 changed files with 52 additions and 0 deletions


@@ -1,3 +1,8 @@
2014-10-06 Bill Schmidt <wschmidt@linux.vnet.ibm.com>
* config/rs6000/rs6000.c (analyze_swaps commentary): Add
discussion of permutes and why we don't handle them.
2014-10-06 Eric Botcazou <ebotcazou@adacore.com>
* config/sparc/predicates.md (int_register_operand): Delete.


@@ -33431,6 +33431,53 @@ emit_fusion_gpr_load (rtx target, rtx mem)
than deleting a swap, we convert the load/store into a permuting
load/store (which effectively removes the swap). */
/* Notes on Permutes
We do not currently handle computations that contain permutes. There
is a general transformation that can be performed correctly, but it
may introduce more expensive code than it replaces. To handle these
would require a cost model to determine when to perform the optimization.
This commentary records how this could be done if desired.
The most general permute is something like this (example for V16QI):
(vec_select:V16QI (vec_concat:V32QI (op1:V16QI) (op2:V16QI))
(parallel [(const_int a0) (const_int a1)
...
(const_int a14) (const_int a15)]))
where a0,...,a15 are in [0,31] and select elements from op1 and op2
to produce the result.
Regardless of mode, we can convert the PARALLEL to a mask of 16
byte-element selectors. Let's call this M, with M[i] representing
the ith byte-element selector value. Then if we swap doublewords
throughout the computation, we can get correct behavior by replacing
M with M' as follows:
        { M[i+8]+8 : i < 8,  M[i+8] in [0,7] U [16,23]
M'[i] = { M[i+8]-8 : i < 8,  M[i+8] in [8,15] U [24,31]
        { M[i-8]+8 : i >= 8, M[i-8] in [0,7] U [16,23]
        { M[i-8]-8 : i >= 8, M[i-8] in [8,15] U [24,31]
This seems promising at first, since we are just replacing one mask
with another. But certain masks are preferable to others. If M
is a mask that matches a vmrghh pattern, for example, M' certainly
will not. Instead of a single vmrghh, we would generate a load of
M' and a vperm. So we would need to know how many xxswapd's we can
remove as a result of this transformation to determine if it's
profitable; and preferably the logic would need to be aware of all
the special preferable masks.
Another form of permute is an UNSPEC_VPERM, in which the mask is
already in a register. In some cases, this mask may be a constant
that we can discover with ud-chains, in which case the above
transformation is ok. However, the common usage here is for the
mask to be produced by an UNSPEC_LVSL, in which case the mask
cannot be known at compile time. In such a case we would have to
generate several instructions to compute M' as above at run time,
and a cost model is needed again. */
/* This is based on the union-find logic in web.c. web_entry_base is
defined in df.h. */
class swap_web_entry : public web_entry_base
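
The mask rewrite described in the "Notes on Permutes" commentary can be sketched in standalone C. This is an illustration only, not code from rs6000.c: the function names expand_to_byte_mask and swap_adjust_mask are hypothetical, and the sketch assumes selectors are plain byte indices in [0,31] as in the V16QI example. The first helper expands a PARALLEL of element indices into 16 byte-element selectors M. The second computes M' from M using the piecewise definition.

```c
#include <assert.h>

/* Hypothetical helper (not in rs6000.c): expand NELTS element
   selectors, each in [0, 2*NELTS-1], into 16 byte selectors in [0,31].
   Element e of size S bytes covers bytes e*S .. e*S+S-1, so elements
   of op2 land in [16,31].  */
static void
expand_to_byte_mask (const unsigned char *elts, int nelts,
                     unsigned char m[16])
{
  int size = 16 / nelts;   /* bytes per element: 1, 2, 4 or 8 */
  for (int i = 0; i < nelts; i++)
    for (int j = 0; j < size; j++)
      m[i * size + j] = (unsigned char) (elts[i] * size + j);
}

/* Hypothetical helper (not in rs6000.c): compute M' from M.
   Result byte i takes its selector from the opposite doubleword of M
   (i+8 when i < 8, i-8 otherwise), and that selector is moved to the
   opposite doubleword of the operand it names: +8 when it lies in
   [0,7] or [16,23], -8 when it lies in [8,15] or [24,31].  */
static void
swap_adjust_mask (const unsigned char m[16], unsigned char m_prime[16])
{
  for (int i = 0; i < 16; i++)
    {
      unsigned char sel = m[i ^ 8];   /* i+8 for i < 8, i-8 for i >= 8 */
      assert (sel < 32);
      /* Bit 3 set means the selector is in [8,15] or [24,31].  */
      m_prime[i] = (unsigned char) ((sel & 8) ? sel - 8 : sel + 8);
    }
}
```

As a usage example, take the V8HI element selectors {0,8,1,9,2,10,3,11}, assumed here to be the shape of a merge-high-halfword mask. They expand to the byte mask {0,1,16,17,2,3,18,19,4,5,20,21,6,7,22,23}, and swap_adjust_mask turns that into {12,13,28,29,14,15,30,31,8,9,24,25,10,11,26,27}. This is neither a merge-high nor a merge-low pattern, which is the cost problem the commentary describes. Applying swap_adjust_mask twice returns the original mask.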