Support for the SPARC M8 cpu.

This patch serie adds support for the SPARC M8 processor to GCC. The SPARC M8 processor implements the Oracle SPARC Architecture 2017. - bmask* instructions are put in their own instruction type. It makes little sense to have them in the same category than array instructions. - Similarly, VIS compare instructions are put in their own instruction type. This is to better accommodate subtypes, which are not quite the same than the subtypes of `visl' instructions. - The introduction of a new `subtype' insn attribute in sparc.md avoids the need for adjusting the instruction scheduler DFAs for previous cpu models every time a new cpu is introduced. - The full set of SPARC instructions used in sparc.md, and their position in the type/subtype hierarchy, is documented in a comment. This eases the modification of the DFA schedulers, and the addition of new cpus. - The M7 DFA scheduler is reworked: + To use the new type/subtype hierarchy. + The v3pipe insn attribute is no longer needed. + More accurate latencies for instructions. + The S4 core pipeline is documented in a comment in niagara7.md. - Support for -mcpu=m8 (we are thus suggesting to abandon the niagaraN denomination for M8 and later processors.) - Support for a new VIS level, VIS4B, covering the new VIS instructions introduced in OSA2017 and implemented in the M8. Also built-ins. - A M8 DFA scheduler: + Also based on the new type/subtype hierarchy. + The functional units in the S5 core are explicitly documented in a comment in m8.md. gcc/ChangeLog: * config/sparc/m8.md: New file. * config/sparc/sparc.md: Include m8.md. * config/sparc/sparc.opt: New option -mvis4b. * config/sparc/sparc.c (dump_target_flag_bits): Handle MASK_VIS4B. (sparc_option_override): Handle VIS4B. (enum sparc_builtins): Define SPARC_BUILTIN_DICTUNPACK{8,16,32}, SPARC_BUILTIN_FPCMP{LE,GT,EQ,NE}{8,16,32}SHL, SPARC_BUILTIN_FPCMPU{LE,GT}{8,16,32}SHL, SPARC_BUILTIN_FPCMPDE{8,16,32}SHL and SPARC_BUILTIN_FPCMPUR{8,16,32}SHL. (check_constant_argument): New function. (sparc_vis_init_builtins): Define builtins __builtin_vis_dictunpack{8,16,32}, __builtin_vis_fpcmp{le,gt,eq,ne}{8,16,32}shl, __builtin_vis_fpcmpu{le,gt}{8,16,32}shl, __builtin_vis_fpcmpde{8,16,32}shl and __builtin_vis_fpcmpur{8,16,32}shl. (sparc_expand_builtin): Check that the constant operands to __builtin_vis_fpcmp*shl and _builtin_vis_dictunpack* are indeed constant and in range. * config/sparc/sparc-c.c (sparc_target_macros): Handle TARGET_VIS4B. * config/sparc/sparc.h (SPARC_IMM2_P): Define. (SPARC_IMM5_P): Likewise. * config/sparc/sparc.md (cpu_feature): Add new feagure "vis4b". (enabled): Handle vis4b. (UNSPEC_DICTUNPACK): New unspec. (UNSPEC_FPCMPSHL): Likewise. (UNSPEC_FPUCMPSHL): Likewise. (UNSPEC_FPCMPDESHL): Likewise. (UNSPEC_FPCMPURSHL): Likewise. (cpu_feature): New CPU feature `vis4b'. (dictunpack{8,16,32}): New insns. (FPCSMODE): New mode iterator. (fpcscond): New code iterator. (fpcsucond): Likewise. (fpcmp{le,gt,eq,ne}{8,16,32}{si,di}shl): New insns. (fpcmpu{le,gt}{8,16,32}{si,di}shl): Likewise. (fpcmpde{8,16,32}{si,di}shl): Likewise. (fpcmpur{8,16,32}{si,di}shl): Likewise. * config/sparc/constraints.md: Define constraints `q' for unsigned 2-bit integer constants and `t' for unsigned 5-bit integer constants. * config/sparc/predicates.md (imm5_operand_dictunpack8): New predicate. (imm5_operand_dictunpack16): Likewise. (imm5_operand_dictunpack32): Likewise. (imm2_operand): Likewise. * doc/invoke.texi (SPARC Options): Document -mvis4b. * doc/extend.texi (SPARC VIS Built-in Functions): Document the ditunpack* and fpcmp*shl builtins. * config.gcc: Handle m8 in --with-{cpu,tune} options. * config.in: Add HAVE_AS_SPARC6 define. * config/sparc/driver-sparc.c (cpu_names): Add entry for the SPARC M8. * config/sparc/sol2.h (CPP_CPU64_DEFAULT_SPEC): Define for TARGET_CPU_m8. (ASM_CPU32_DEFAUILT_SPEC): Likewise. (CPP_CPU_SPEC): Handle m8. (ASM_CPU_SPEC): Likewise. * config/sparc/sparc-opts.h (enum processor_type): Add PROCESSOR_M8. * config/sparc/sparc.c (m8_costs): New struct. (sparc_option_override): Handle TARGET_CPU_m8. (sparc32_initialize_trampoline): Likewise. (sparc64_initialize_trampoline): Likewise. (sparc_issue_rate): Likewise. (sparc_register_move_cost): Likewise. * config/sparc/sparc.h (TARGET_CPU_m8): Define. (CPP_CPU64_DEFAULT_SPEC): Define for M8. (ASM_CPU64_DEFAULT_SPEC): Likewise. (CPP_CPU_SPEC): Handle M8. (ASM_CPU_SPEC): Likewise. (AS_M8_FLAG): Define. * config/sparc/sparc.md: Add m8 to the cpu attribute. * config/sparc/sparc.opt: New option -mcpu=m8 for sparc targets. * configure.ac (HAVE_AS_SPARC6): Check for assembler support for M8 instructions. * configure: Regenerate. * doc/invoke.texi (SPARC Options): Document -mcpu=m8 and -mtune=m8. * config/sparc/niagara7.md: Rework the DFA scheduler to use insn subtypes. * config/sparc/sparc.md: Remove the `v3pipe' insn attribute. ("*movdi_insn_sp32"): Do not set v3pipe. ("*movsi_insn"): Likewise. ("*movdi_insn_sp64"): Likewise. ("*movsf_insn"): Likewise. ("*movdf_insn_sp32"): Likewise. ("*movdf_insn_sp64"): Likewise. ("*zero_extendsidi2_insn_sp64"): Likewise. ("*sign_extendsidi2_insn"): Likewise. ("*mov<VM32:mode>_insn"): Likewise. ("*mov<VM64:mode>_insn_sp64"): Likewise. ("*mov<VM64:mode>_insn_sp32"): Likewise. ("<plusminus_insn><VADDSUB:mode>3"): Likewise. ("<vlop:code><VL:mode>3"): Likewise. ("*not_<vlop:code><VL:mode>3"): Likewise. ("*nand<VL:mode>_vis"): Likewise. ("*<vlnotop:code>_not1<VL:mode>_vis"): Likewise. ("*<vlnotop:code>_not2<VL:mode>_vis"): Likewise. ("one_cmpl<VL:mode>2"): Likewise. ("faligndata<VM64:mode>_vis"): Likewise. ("alignaddrsi_vis"): Likewise. ("alignaddrdi_vis"): Likweise. ("alignaddrlsi_vis"): Likewise. ("alignaddrldi_vis"): Likewise. ("fcmp<gcond:code><GCM:gcm_name><P:mode>_vis"): Likewise. ("bmaskdi_vis"): Likewise. ("bmasksi_vis"): Likewise. ("bshuffle<VM64:mode>_vis"): Likewise. ("cmask8<P:mode>_vis"): Likewise. ("cmask16<P:mode>_vis"): Likewise. ("cmask32<P:mode>_vis"): Likewise. ("pdistn<P:mode>_vis"): Likewise. ("<vis3_addsub_ss_patname><VASS:mode>3"): Likewise. * config/sparc/sparc.md ("subtype"): New insn attribute. ("*wrgsr_sp64"): Set insn subtype. ("*rdgsr_sp64"): Likewise. ("alignaddrsi_vis"): Likewise. ("alignaddrdi_vis"): Likewise. ("alignaddrlsi_vis"): Likewise. ("alignaddrldi_vis"): Likewise. ("<plusminus_insn><VADDSUB:mode>3"): Likewise. ("fexpand_vis"): Likewise. ("fpmerge_vis"): Likewise. ("faligndata<VM64:mode>_vis"): Likewise. ("bshuffle<VM64:mode>_vis"): Likewise. ("cmask8<P:mode>_vis"): Likewise. ("cmask16<P:mode>_vis"): Likewise. ("cmask32<P:mode>_vis"): Likewise. ("fchksm16_vis"): Likewise. ("v<vis3_shift_patname><GCM:mode>3"): Likewise. ("fmean16_vis"): Likewise. ("fp<plusminus_insn>64_vis"): Likewise. ("<plusminus_insn>v8qi3"): Likewise. ("<vis3_addsub_ss_patname><VASS:mode>3"): Likewise. ("<vis4_minmax_patname><VMMAX:mode>3"): Likewise. ("<vis4_uminmax_patname><VMMAX:mode>3"): Likewise. ("<vis3_addsub_ss_patname>v8qi3"): Likewise. ("<vis4_addsub_us_patname><VAUS:mode>3"): Likewise. ("*movqi_insn"): Likewise. ("*movhi_insn"): Likewise. ("*movsi_insn"): Likewise. ("movsi_pic_gotdata_op"): Likewise. ("*movdi_insn_sp32"): Likewise. ("*movdi_insn_sp64"): Likewise. ("movdi_pic_gotdata_op"): Likewise. ("*movsf_insn"): Likewise. ("*movdf_insn_sp32"): Likewise. ("*movdf_insn_sp64"): Likewise. ("*zero_extendhisi2_insn"): Likewise. ("*zero_extendqihi2_insn"): Likewise. ("*zero_extendqisi2_insn"): Likewise. ("*zero_extendqidi2_insn"): Likewise. ("*zero_extendhidi2_insn"): Likewise. ("*zero_extendsidi2_insn_sp64"): Likewise. ("ldfsr"): Likewise. ("prefetch_64"): Likewise. ("prefetch_32"): Likewise. ("tie_ld32"): Likewise. ("tie_ld64"): Likewise. ("*tldo_ldub_sp32"): Likewise. ("*tldo_ldub1_sp32"): Likewise. ("*tldo_ldub2_sp32"): Likewise. ("*tldo_ldub_sp64"): Likewise. ("*tldo_ldub1_sp64"): Likewise. ("*tldo_ldub2_sp64"): Likewise. ("*tldo_ldub3_sp64"): Likewise. ("*tldo_lduh_sp32"): Likewise. ("*tldo_lduh1_sp32"): Likewise. ("*tldo_lduh_sp64"): Likewise. ("*tldo_lduh1_sp64"): Likewise. ("*tldo_lduh2_sp64"): Likewise. ("*tldo_lduw_sp32"): Likewise. ("*tldo_lduw_sp64"): Likewise. ("*tldo_lduw1_sp64"): Likewise. ("*tldo_ldx_sp64"): Likewise. ("*mov<VM32:mode>_insn"): Likewise. ("*mov<VM64:mode>_insn_sp64"): Likewise. ("*mov<VM64:mode>_insn_sp32"): Likewise. * config/sparc/sparc.md ("type"): New insn type viscmp. ("fcmp<gcond:code><GCM:gcm_name><P:mode>_vis"): Set insn type to viscmp. ("fpcmp<gcond:code>8<P:mode>_vis"): Likewise. ("fucmp<gcond:code>8<P:mode>_vis"): Likewise. ("fpcmpu<gcond:code><GCM:gcm_name><P:mode>_vis"): Likewise. * config/sparc/niagara7.md ("n7_vis_logical_v3pipe"): Handle viscmp. ("n7_vis_logical_11cycle"): Likewise. * config/sparc/niagara4.md ("n4_vis_logical"): Likewise. * config/sparc/niagara2.md ("niag3_vis": Likewise. * config/sparc/niagara.md ("niag_vis"): Likewise. * config/sparc/ultra3.md ("us3_fga"): Likewise. * config/sparc/ultra1_2.md ("us1_fga_double"): Likewise. * config/sparc/sparc.md: New instruction type `bmask'. (bmaskdi_vis): Use the `bmask' type. (bmasksi_vis): Likewise. * config/sparc/ultra3.md (us3_array): Likewise. * config/sparc/niagara7.md (n7_array): Likewise. * config/sparc/niagara4.md (n4_array): Likewise. * config/sparc/niagara2.md (niag2_vis): Likewise. (niag3_vis): Likewise. * config/sparc/niagara.md (niag_vis): Likewise. gcc/testsuite/ChangeLog: * gcc.target/sparc/dictunpack.c: New file. * gcc.target/sparc/fpcmpdeshl.c: Likewise. * gcc.target/sparc/fpcmpshl.c: Likewise. * gcc.target/sparc/fpcmpurshl.c: Likewise. * gcc.target/sparc/fpcmpushl.c: Likewise. From-SVN: r250050
2017-07-07 17:42:43 +02:00 · 2017-07-07 17:42:43 +02:00 · 0316d24f7a
parent e604883c8f
commit 0316d24f7a
30 changed files with 1583 additions and 187 deletions
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@ -1,3 +1,229 @@
+2017-07-07  Jose E. Marchesi  <jose.marchesi@oracle.com>
+
+	* config/sparc/m8.md: New file.
+	* config/sparc/sparc.md: Include m8.md.
+
+2017-07-07  Jose E. Marchesi  <jose.marchesi@oracle.com>
+
+	* config/sparc/sparc.opt: New option -mvis4b.
+	* config/sparc/sparc.c (dump_target_flag_bits): Handle MASK_VIS4B.
+	(sparc_option_override): Handle VIS4B.
+	(enum sparc_builtins): Define
+	SPARC_BUILTIN_DICTUNPACK{8,16,32},
+	SPARC_BUILTIN_FPCMP{LE,GT,EQ,NE}{8,16,32}SHL,
+	SPARC_BUILTIN_FPCMPU{LE,GT}{8,16,32}SHL,
+	SPARC_BUILTIN_FPCMPDE{8,16,32}SHL and
+	SPARC_BUILTIN_FPCMPUR{8,16,32}SHL.
+	(check_constant_argument): New function.
+	(sparc_vis_init_builtins): Define builtins
+	__builtin_vis_dictunpack{8,16,32},
+	__builtin_vis_fpcmp{le,gt,eq,ne}{8,16,32}shl,
+	__builtin_vis_fpcmpu{le,gt}{8,16,32}shl,
+	__builtin_vis_fpcmpde{8,16,32}shl and
+	__builtin_vis_fpcmpur{8,16,32}shl.
+	(sparc_expand_builtin): Check that the constant operands to
+	__builtin_vis_fpcmp*shl and _builtin_vis_dictunpack* are indeed
+	constant and in range.
+	* config/sparc/sparc-c.c (sparc_target_macros): Handle
+	TARGET_VIS4B.
+	* config/sparc/sparc.h (SPARC_IMM2_P): Define.
+	(SPARC_IMM5_P): Likewise.
+	* config/sparc/sparc.md (cpu_feature): Add new feagure "vis4b".
+	(enabled): Handle vis4b.
+	(UNSPEC_DICTUNPACK): New unspec.
+	(UNSPEC_FPCMPSHL): Likewise.
+	(UNSPEC_FPUCMPSHL): Likewise.
+	(UNSPEC_FPCMPDESHL): Likewise.
+	(UNSPEC_FPCMPURSHL): Likewise.
+	(cpu_feature): New CPU feature `vis4b'.
+	(dictunpack{8,16,32}): New insns.
+	(FPCSMODE): New mode iterator.
+	(fpcscond): New code iterator.
+	(fpcsucond): Likewise.
+	(fpcmp{le,gt,eq,ne}{8,16,32}{si,di}shl): New insns.
+	(fpcmpu{le,gt}{8,16,32}{si,di}shl): Likewise.
+	(fpcmpde{8,16,32}{si,di}shl): Likewise.
+	(fpcmpur{8,16,32}{si,di}shl): Likewise.
+	* config/sparc/constraints.md: Define constraints `q' for unsigned
+	2-bit integer constants and `t' for unsigned 5-bit integer
+	constants.
+	* config/sparc/predicates.md (imm5_operand_dictunpack8): New
+	predicate.
+	(imm5_operand_dictunpack16): Likewise.
+	(imm5_operand_dictunpack32): Likewise.
+	(imm2_operand): Likewise.
+	* doc/invoke.texi (SPARC Options): Document -mvis4b.
+	* doc/extend.texi (SPARC VIS Built-in Functions): Document the
+	ditunpack* and fpcmp*shl builtins.
+
+2017-07-07  Jose E. Marchesi  <jose.marchesi@oracle.com>
+
+	* config.gcc: Handle m8 in --with-{cpu,tune} options.
+	* config.in: Add HAVE_AS_SPARC6 define.
+	* config/sparc/driver-sparc.c (cpu_names): Add entry for the SPARC
+	M8.
+	* config/sparc/sol2.h (CPP_CPU64_DEFAULT_SPEC): Define for
+	TARGET_CPU_m8.
+	(ASM_CPU32_DEFAUILT_SPEC): Likewise.
+	(CPP_CPU_SPEC): Handle m8.
+	(ASM_CPU_SPEC): Likewise.
+	* config/sparc/sparc-opts.h (enum processor_type): Add
+	PROCESSOR_M8.
+	* config/sparc/sparc.c (m8_costs): New struct.
+	(sparc_option_override): Handle TARGET_CPU_m8.
+	(sparc32_initialize_trampoline): Likewise.
+	(sparc64_initialize_trampoline): Likewise.
+	(sparc_issue_rate): Likewise.
+	(sparc_register_move_cost): Likewise.
+	* config/sparc/sparc.h (TARGET_CPU_m8): Define.
+	(CPP_CPU64_DEFAULT_SPEC): Define for M8.
+	(ASM_CPU64_DEFAULT_SPEC): Likewise.
+	(CPP_CPU_SPEC): Handle M8.
+	(ASM_CPU_SPEC): Likewise.
+	(AS_M8_FLAG): Define.
+	* config/sparc/sparc.md: Add m8 to the cpu attribute.
+	* config/sparc/sparc.opt: New option -mcpu=m8 for sparc targets.
+	* configure.ac (HAVE_AS_SPARC6): Check for assembler support for
+	M8 instructions.
+	* configure: Regenerate.
+	* doc/invoke.texi (SPARC Options): Document -mcpu=m8 and
+	-mtune=m8.
+
+2017-07-07  Jose E. Marchesi  <jose.marchesi@oracle.com>
+
+	* config/sparc/niagara7.md: Rework the DFA scheduler to use insn
+	subtypes.
+	* config/sparc/sparc.md: Remove the `v3pipe' insn attribute.
+	("*movdi_insn_sp32"): Do not set v3pipe.
+	("*movsi_insn"): Likewise.
+	("*movdi_insn_sp64"): Likewise.
+	("*movsf_insn"): Likewise.
+	("*movdf_insn_sp32"): Likewise.
+	("*movdf_insn_sp64"): Likewise.
+	("*zero_extendsidi2_insn_sp64"): Likewise.
+	("*sign_extendsidi2_insn"): Likewise.
+	("*mov<VM32:mode>_insn"): Likewise.
+	("*mov<VM64:mode>_insn_sp64"): Likewise.
+	("*mov<VM64:mode>_insn_sp32"): Likewise.
+	("<plusminus_insn><VADDSUB:mode>3"): Likewise.
+	("<vlop:code><VL:mode>3"): Likewise.
+	("*not_<vlop:code><VL:mode>3"): Likewise.
+	("*nand<VL:mode>_vis"): Likewise.
+	("*<vlnotop:code>_not1<VL:mode>_vis"): Likewise.
+	("*<vlnotop:code>_not2<VL:mode>_vis"): Likewise.
+	("one_cmpl<VL:mode>2"): Likewise.
+	("faligndata<VM64:mode>_vis"): Likewise.
+	("alignaddrsi_vis"): Likewise.
+	("alignaddrdi_vis"): Likweise.
+	("alignaddrlsi_vis"): Likewise.
+	("alignaddrldi_vis"): Likewise.
+	("fcmp<gcond:code><GCM:gcm_name><P:mode>_vis"): Likewise.
+	("bmaskdi_vis"): Likewise.
+	("bmasksi_vis"): Likewise.
+	("bshuffle<VM64:mode>_vis"): Likewise.
+	("cmask8<P:mode>_vis"): Likewise.
+	("cmask16<P:mode>_vis"): Likewise.
+	("cmask32<P:mode>_vis"): Likewise.
+	("pdistn<P:mode>_vis"): Likewise.
+	("<vis3_addsub_ss_patname><VASS:mode>3"): Likewise.
+
+2017-07-07  Jose E. Marchesi  <jose.marchesi@oracle.com>
+
+	* config/sparc/sparc.md ("subtype"): New insn attribute.
+	("*wrgsr_sp64"): Set insn subtype.
+	("*rdgsr_sp64"): Likewise.
+	("alignaddrsi_vis"): Likewise.
+	("alignaddrdi_vis"): Likewise.
+	("alignaddrlsi_vis"): Likewise.
+	("alignaddrldi_vis"): Likewise.
+	("<plusminus_insn><VADDSUB:mode>3"): Likewise.
+	("fexpand_vis"): Likewise.
+	("fpmerge_vis"): Likewise.
+	("faligndata<VM64:mode>_vis"): Likewise.
+	("bshuffle<VM64:mode>_vis"): Likewise.
+	("cmask8<P:mode>_vis"): Likewise.
+	("cmask16<P:mode>_vis"): Likewise.
+	("cmask32<P:mode>_vis"): Likewise.
+	("fchksm16_vis"): Likewise.
+	("v<vis3_shift_patname><GCM:mode>3"): Likewise.
+	("fmean16_vis"): Likewise.
+	("fp<plusminus_insn>64_vis"): Likewise.
+	("<plusminus_insn>v8qi3"): Likewise.
+	("<vis3_addsub_ss_patname><VASS:mode>3"): Likewise.
+	("<vis4_minmax_patname><VMMAX:mode>3"): Likewise.
+	("<vis4_uminmax_patname><VMMAX:mode>3"): Likewise.
+	("<vis3_addsub_ss_patname>v8qi3"): Likewise.
+	("<vis4_addsub_us_patname><VAUS:mode>3"): Likewise.
+	("*movqi_insn"): Likewise.
+	("*movhi_insn"): Likewise.
+	("*movsi_insn"): Likewise.
+	("movsi_pic_gotdata_op"): Likewise.
+	("*movdi_insn_sp32"): Likewise.
+	("*movdi_insn_sp64"): Likewise.
+	("movdi_pic_gotdata_op"): Likewise.
+	("*movsf_insn"): Likewise.
+	("*movdf_insn_sp32"): Likewise.
+	("*movdf_insn_sp64"): Likewise.
+	("*zero_extendhisi2_insn"): Likewise.
+	("*zero_extendqihi2_insn"): Likewise.
+	("*zero_extendqisi2_insn"): Likewise.
+	("*zero_extendqidi2_insn"): Likewise.
+	("*zero_extendhidi2_insn"): Likewise.
+	("*zero_extendsidi2_insn_sp64"): Likewise.
+	("ldfsr"): Likewise.
+	("prefetch_64"): Likewise.
+	("prefetch_32"): Likewise.
+	("tie_ld32"): Likewise.
+	("tie_ld64"): Likewise.
+	("*tldo_ldub_sp32"): Likewise.
+	("*tldo_ldub1_sp32"): Likewise.
+	("*tldo_ldub2_sp32"): Likewise.
+	("*tldo_ldub_sp64"): Likewise.
+	("*tldo_ldub1_sp64"): Likewise.
+	("*tldo_ldub2_sp64"): Likewise.
+	("*tldo_ldub3_sp64"): Likewise.
+	("*tldo_lduh_sp32"): Likewise.
+	("*tldo_lduh1_sp32"): Likewise.
+	("*tldo_lduh_sp64"): Likewise.
+	("*tldo_lduh1_sp64"): Likewise.
+	("*tldo_lduh2_sp64"): Likewise.
+	("*tldo_lduw_sp32"): Likewise.
+	("*tldo_lduw_sp64"): Likewise.
+	("*tldo_lduw1_sp64"): Likewise.
+	("*tldo_ldx_sp64"): Likewise.
+	("*mov<VM32:mode>_insn"): Likewise.
+	("*mov<VM64:mode>_insn_sp64"): Likewise.
+	("*mov<VM64:mode>_insn_sp32"): Likewise.
+
+2017-07-07  Jose E. Marchesi  <jose.marchesi@oracle.com>
+
+	* config/sparc/sparc.md ("type"): New insn type viscmp.
+	("fcmp<gcond:code><GCM:gcm_name><P:mode>_vis"): Set insn type to
+	viscmp.
+	("fpcmp<gcond:code>8<P:mode>_vis"): Likewise.
+	("fucmp<gcond:code>8<P:mode>_vis"): Likewise.
+	("fpcmpu<gcond:code><GCM:gcm_name><P:mode>_vis"): Likewise.
+	* config/sparc/niagara7.md ("n7_vis_logical_v3pipe"): Handle
+	viscmp.
+	("n7_vis_logical_11cycle"): Likewise.
+	* config/sparc/niagara4.md ("n4_vis_logical"): Likewise.
+	* config/sparc/niagara2.md ("niag3_vis": Likewise.
+	* config/sparc/niagara.md ("niag_vis"): Likewise.
+	* config/sparc/ultra3.md ("us3_fga"): Likewise.
+	* config/sparc/ultra1_2.md ("us1_fga_double"): Likewise.
+
+2017-07-07  Jose E. Marchesi  <jose.marchesi@oracle.com>
+
+	* config/sparc/sparc.md: New instruction type `bmask'.
+	(bmaskdi_vis): Use the `bmask' type.
+	(bmasksi_vis): Likewise.
+	* config/sparc/ultra3.md (us3_array): Likewise.
+	* config/sparc/niagara7.md (n7_array): Likewise.
+	* config/sparc/niagara4.md (n4_array): Likewise.
+	* config/sparc/niagara2.md (niag2_vis): Likewise.
+	(niag3_vis): Likewise.
+	* config/sparc/niagara.md (niag_vis): Likewise.
+
 2017-07-05  Georg-Johann Lay  <avr@gjlay.de>

 	Backport from 2017-07-05 trunk r249995.
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@ -4383,7 +4383,7 @@ case "${target}" in
 			| sparclite | f930 | f934 | sparclite86x \
 			| sparclet | tsc701 \
 			| v9 | ultrasparc | ultrasparc3 | niagara | niagara2 \
-			| niagara3 | niagara4 | niagara7)
+			| niagara3 | niagara4 | niagara7 | m8)
 				# OK
 				;;
 			*)
--- a/gcc/config.in
+++ b/gcc/config.in
@ -660,6 +660,10 @@
 #undef HAVE_AS_SPARC5_VIS4
 #endif

+/* Define if your assembler supports SPARC6 instructions. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_AS_SPARC6
+#endif

 /* Define if your assembler and linker support GOTDATA_OP relocs. */
 #ifndef USED_FOR_TARGET
--- a/gcc/config/sparc/constraints.md
+++ b/gcc/config/sparc/constraints.md
@ -19,7 +19,7 @@

 ;;; Unused letters:
 ;;;     B
-;;;    a        jkl    q  tuv xyz
+;;;    a        jkl        uv xyz


 ;; Register constraints
@ -58,6 +58,16 @@

 ;; Integer constant constraints

+(define_constraint "q"
+ "Unsigned 2-bit integer constant"
+  (and (match_code "const_int")
+       (match_test "SPARC_IMM2_P (ival)")))
+
+(define_constraint "t"
+ "Unsigned 5-bit integer constant"
+ (and (match_code "const_int")
+      (match_test "SPARC_IMM5_P (ival)")))
+
 (define_constraint "A"
 "Signed 5-bit integer constant"
 (and (match_code "const_int")
--- a/gcc/config/sparc/driver-sparc.c
+++ b/gcc/config/sparc/driver-sparc.c
@ -79,6 +79,7 @@ static const struct cpu_names {
 #endif
  { "SPARC-M7",		"niagara7" },
  { "SPARC-S7",		"niagara7" },
+  { "SPARC-M8",		"m8" },
  { NULL,	NULL }
  };

--- a/gcc/config/sparc/m8.md
+++ b/gcc/config/sparc/m8.md
@ -0,0 +1,242 @@
+;; Scheduling description for the SPARC M8.
+;;   Copyright (C) 2017 Free Software Foundation, Inc.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+;; Thigs to improve:
+;;
+;; - Store instructions are implemented by micro-ops, one of which
+;;   generates the store address and is executed in the store address
+;;   generation unit in the slot0.  We need to model that.
+;;
+;; - There are two V3 pipes connected to different slots.  The current
+;;   implementation assumes that all the instructions executing in a
+;;   V3 pipe are issued to the unit in slot3.
+;;
+;; - Single-issue ALU operations incur an additional cycle of latency to
+;;   slot 0 and slot 1 instructions.  This is not currently reflected
+;;   in the DFA.
+
+(define_automaton "m8_0")
+
+;; The S5 core has two dual-issue queues, PQLS and PQEX.  Each queue
+;; is divided into two slots: PQLS corresponds to slots 0 and 1, and
+;; PQEX corresponds to slots 2 and 3.  The core can issue 4
+;; instructions per-cycle, and up to 4 instructions are committed each
+;; cycle.
+;;
+;;                            
+;;                   m8_slot0  - Load Unit.
+;;                             - Store address gen. Unit.
+;;                                                       
+;;                            
+;;   === PQLS ==>    m8_slot1  - Store data unit.
+;;                             - Branch unit.
+;;                                            
+;;                             
+;;   === PQEX ==>    m8_slot2  - Integer Unit (EXU2).                     
+;;                             - 3-cycles Crypto Unit (SPU2).
+;;                                                     
+;;                   m8_slot3  - Integer Unit (EXU3).
+;;                             - 3-cycles Crypto Unit (SPU3).
+;;                             - Floating-point and graphics unit (FPG).
+;;                             - Long-latency Crypto Unit.
+;;                             - Oracle Numbers Unit (ONU).
+
+(define_cpu_unit "m8_slot0,m8_slot1,m8_slot2,m8_slot3" "m8_0")
+
+;; Some instructions stall the pipeline and avoid any other
+;; instruction to be issued in the same cycle.  We assume the same for
+;; multi-instruction insns.
+
+(define_reservation "m8_single_issue" "m8_slot0 + m8_slot1 + m8_slot2 + m8_slot3")
+
+(define_insn_reservation "m8_single" 1
+  (and (eq_attr "cpu" "m8")
+       (eq_attr "type" "multi,savew,flushw,trap,bmask"))
+  "m8_single_issue")
+
+;; Most of the instructions executing in the integer units have a
+;; latency of 1.
+
+(define_insn_reservation "m8_integer" 1
+  (and (eq_attr "cpu" "m8")
+       (eq_attr "type" "ialu,ialuX,shift,cmove,compare,bmask"))
+  "(m8_slot2 | m8_slot3)")
+
+;; Flushing the instruction memory takes 27 cycles.
+
+
+(define_insn_reservation "m8_iflush" 27
+  (and (eq_attr "cpu" "m8")
+       (eq_attr "type" "iflush"))
+  "(m8_slot2 | m8_slot3), nothing*26")
+
+;; The integer multiplication instructions have a latency of 10 cycles
+;; and execute in integer units.
+;;
+;; Likewise for array*, edge* and pdistn instructions.
+;;
+;; However, the latency is only 9 cycles if the consumer of the
+;; operation is also capable of 9 cycles latency.  We model this with
+;; a bypass.
+
+(define_insn_reservation "m8_imul" 10
+  (and (eq_attr "cpu" "m8")
+       (eq_attr "type" "imul,array,edge,edgen,pdistn"))
+  "(m8_slot2 | m8_slot3), nothing*12")
+
+(define_bypass 9 "m8_imul" "m8_imul")
+
+;; The integer division instructions `sdiv' and `udivx' have a latency
+;; of 30 cycles and execute in integer units.
+
+(define_insn_reservation "m8_idiv" 30
+  (and (eq_attr "cpu" "m8")
+       (eq_attr "type" "idiv"))
+  "(m8_slot2 | m8_slot3), nothing*29")
+
+;; Both integer and floating-point load instructions have a latency of
+;; only 3 cycles,and execute in the slot0.
+;;
+;; Misaligned load instructions feature a latency of 11 cycles.
+;;
+;; The prefetch instruction also executes in the load unit, but it's
+;; latency is only 1 cycle.
+
+(define_insn_reservation "m8_load" 3
+  (and (eq_attr "cpu" "m8")
+       (ior (eq_attr "type" "fpload,sload")
+            (and (eq_attr "type" "load")
+                 (eq_attr "subtype" "regular"))))
+  "m8_slot0, nothing*2")
+
+;; (define_insn_reservation "m8_load_misalign" 11
+;;  (and (eq_attr "cpu" "m8")
+;;       (eq_attr "type" "load_mis,fpload_mis"))
+;;  "m8_slot0, nothing*10")
+
+(define_insn_reservation "m8_prefetch" 1
+  (and (eq_attr "cpu" "m8")
+       (eq_attr "type" "load")
+       (eq_attr "subtype" "prefetch"))
+  "m8_slot0")
+
+;; Both integer and floating-point store instructions have a latency
+;; of 1 cycle, and execute in the store data unit in slot1.
+;;
+;; However, misaligned store instructions feature a latency of 3
+;; cycles.
+
+(define_insn_reservation "m8_store" 1
+  (and (eq_attr "cpu" "m8")
+       (eq_attr "type" "store,fpstore"))
+  "m8_slot1")
+
+;; (define_insn_reservation "m8_store_misalign" 3
+;;   (and (eq_attr "cpu" "m8")
+;;        (eq_attr "type" "store_mis,fpstore_mis"))
+;;   "m8_slot1, nothing*2")
+
+;; Control-transfer instructions execute in the Branch Unit in the
+;; slot1.
+
+(define_insn_reservation "m8_cti" 1
+  (and (eq_attr "cpu" "m8")
+       (eq_attr "type" "cbcond,uncond_cbcond,branch,call,sibcall,call_no_delay_slot,uncond_branch,return"))
+  "m8_slot1")
+
+;; Many instructions executing in the Floating-point and Graphics Unit
+;; (FGU) serving slot3 feature a default latency of 9 cycles.
+
+(define_insn_reservation "m8_fp" 9
+  (and (eq_attr "cpu" "m8")
+       (ior (eq_attr "type" "fpmove,fpcmove,fpcrmove,fp,fpcmp,fpmul,fgm_pack,fgm_mul,pdist")
+            (and (eq_attr "type" "fga")
+                 (eq_attr "subtype" "fpu"))))
+  "m8_slot3, nothing*8")
+
+;; Floating-point division and floating-point square-root instructions
+;; have high latencies.  They execute in the FGU.
+
+(define_insn_reservation "m8_fpdivs" 26
+  (and (eq_attr "cpu" "m8")
+       (eq_attr "type" "fpdivs"))
+  "m8_slot3, nothing*25")
+
+(define_insn_reservation "m8_fpsqrts" 33
+  (and (eq_attr "cpu" "m8")
+       (eq_attr "type" "fpsqrts"))
+  "m8_slot3, nothing*32")
+
+(define_insn_reservation "m8_fpdivd" 30
+  (and (eq_attr "cpu" "m8")
+       (eq_attr "type" "fpdivd"))
+  "m8_slot3, nothing*29")
+
+(define_insn_reservation "m8_fpsqrtd" 41
+  (and (eq_attr "cpu" "m8")
+       (eq_attr "type" "fpsqrtd"))
+  "m8_slot3, nothing*40")
+
+;; SIMD VIS instructions executing in the Floating-point and graphics
+;; unit (FPG) in slot3 usually have a latency of 5 cycles.
+;;
+;; However, the latency for many instructions is only 3 cycles if the
+;; consumer can also be executed in 3 cycles.  We model this with a
+;; bypass.  In these cases the instructions are executed in one of the
+;; two 3-cycle crypto units (SPU, also known as "v3-pipes") in slots 2
+;; and 3.
+
+(define_insn_reservation "m8_vis" 5
+  (and (eq_attr "cpu" "m8")
+       (ior (eq_attr "type" "viscmp,lzd")
+            (and (eq_attr "type" "fga")
+                 (eq_attr "subtype" "maxmin,cmask,other"))
+            (and (eq_attr "type" "vismv")
+                 (eq_attr "subtype" "single,movstouw"))
+            (and (eq_attr "type" "visl")
+                 (eq_attr "subtype" "single"))))
+  "m8_slot3, nothing*4")
+
+(define_bypass 3 "m8_vis" "m8_vis")
+
+(define_insn_reservation "m8_gsr" 5
+  (and (eq_attr "cpu" "m8")
+       (eq_attr "type" "gsr")
+       (eq_attr "subtype" "alignaddr"))
+  "m8_slot3, nothing*4")
+
+;; A few VIS instructions have a latency of 1.
+
+(define_insn_reservation "m8_vis_1cycle" 1
+  (and (eq_attr "cpu" "m8")
+       (ior (and (eq_attr "type" "vismv")
+                 (eq_attr "subtype" "double,movxtod,movdtox"))
+            (and (eq_attr "type" "visl")
+                 (eq_attr "subtype" "double"))
+            (and (eq_attr "type" "fga")
+                 (eq_attr "subtype" "addsub64"))))
+  "m8_slot3")
+
+;; Reading and writing to the gsr register takes more than 70 cycles.
+
+(define_insn_reservation "m8_gsr_reg" 70
+  (and (eq_attr "cpu" "m8")
+       (eq_attr "type" "gsr")
+       (eq_attr "subtype" "reg"))
+  "m8_slot3, nothing*69")
--- a/gcc/config/sparc/niagara.md
+++ b/gcc/config/sparc/niagara.md
@ -114,5 +114,5 @@
 */
 (define_insn_reservation "niag_vis" 8
  (and (eq_attr "cpu" "niagara")
-    (eq_attr "type" "fga,visl,vismv,fgm_pack,fgm_mul,pdist,edge,edgen,gsr,array"))
+    (eq_attr "type" "fga,visl,viscmp,vismv,fgm_pack,fgm_mul,pdist,edge,edgen,gsr,array,bmask"))
  "niag_pipe*8")
--- a/gcc/config/sparc/niagara2.md
+++ b/gcc/config/sparc/niagara2.md
@ -111,10 +111,10 @@

 (define_insn_reservation "niag2_vis" 6
  (and (eq_attr "cpu" "niagara2")
-    (eq_attr "type" "fga,vismv,visl,fgm_pack,fgm_mul,pdist,edge,edgen,array,gsr"))
+    (eq_attr "type" "fga,vismv,visl,viscmp,fgm_pack,fgm_mul,pdist,edge,edgen,array,bmask,gsr"))
  "niag2_pipe*6")

 (define_insn_reservation "niag3_vis" 9
  (and (eq_attr "cpu" "niagara3")
-    (eq_attr "type" "fga,vismv,visl,fgm_pack,fgm_mul,pdist,pdistn,edge,edgen,array,gsr"))
+    (eq_attr "type" "fga,vismv,visl,viscmp,fgm_pack,fgm_mul,pdist,pdistn,edge,edgen,array,bmask,gsr"))
  "niag2_pipe*9")
--- a/gcc/config/sparc/niagara4.md
+++ b/gcc/config/sparc/niagara4.md
@ -66,7 +66,7 @@

 (define_insn_reservation "n4_array" 12
  (and (eq_attr "cpu" "niagara4")
-    (eq_attr "type" "array,edge,edgen"))
+    (eq_attr "type" "array,bmask,edge,edgen"))
  "n4_slot1, nothing*11")

 (define_insn_reservation "n4_vis_move_1cycle" 1
@ -90,8 +90,9 @@

 (define_insn_reservation "n4_vis_logical" 3
  (and (eq_attr "cpu" "niagara4")
-    (and (eq_attr "type" "visl,pdistn")
-      (eq_attr "fptype" "double")))
+       (ior (and (eq_attr "type" "visl,pdistn")
+                 (eq_attr "fptype" "double"))
+            (eq_attr "type" "viscmp")))
  "n4_slot1, nothing*2")

 (define_insn_reservation "n4_vis_logical_11cycle" 11
--- a/gcc/config/sparc/niagara7.md
+++ b/gcc/config/sparc/niagara7.md
@ -19,64 +19,120 @@

 (define_automaton "niagara7_0")

-(define_cpu_unit "n7_slot0,n7_slot1,n7_slot2" "niagara7_0")
-(define_reservation "n7_single_issue" "n7_slot0 + n7_slot1 + n7_slot2")
+;; The S4 core has a dual-issue queue.  This queue is divided into two
+;; slots.  One instruction can be issued each cycle to each slot, and
+;; up to 2 instructions are committed each cycle.  Each slot serves
+;; several execution units, as depicted below:
+;;
+;;
+;;                 m7_slot0 - Integer unit.
+;;                          - Load/Store unit.
+;; === QUEUE ==>
+;;
+;;                 m7_slot1 - Integer unit.
+;;                          - Branch unit.
+;;                          - Floating-point and graphics unit.
+;;                          - 3-cycles crypto unit.

-(define_cpu_unit "n7_load_store" "niagara7_0")
+(define_cpu_unit "n7_slot0,n7_slot1" "niagara7_0")
+
+;; Some instructions stall the pipeline and avoid any other
+;; instruction to be issued in the same cycle.  We assume the same for
+;; multi-instruction insns.
+
+(define_reservation "n7_single_issue" "n7_slot0 + n7_slot1")

 (define_insn_reservation "n7_single" 1
  (and (eq_attr "cpu" "niagara7")
    (eq_attr "type" "multi,savew,flushw,trap"))
  "n7_single_issue")

-(define_insn_reservation "n7_iflush" 27
-  (and (eq_attr "cpu" "niagara7")
-       (eq_attr "type" "iflush"))
-  "(n7_slot0 | n7_slot1), nothing*26")
+;; Most of the instructions executing in the integer unit have a
+;; latency of 1.

 (define_insn_reservation "n7_integer" 1
  (and (eq_attr "cpu" "niagara7")
    (eq_attr "type" "ialu,ialuX,shift,cmove,compare"))
  "(n7_slot0 | n7_slot1)")

+;; Flushing the instruction memory takes 27 cycles.
+
+(define_insn_reservation "n7_iflush" 27
+  (and (eq_attr "cpu" "niagara7")
+       (eq_attr "type" "iflush"))
+  "(n7_slot0 | n7_slot1), nothing*26")
+
+;; The integer multiplication instructions have a latency of 12 cycles
+;; and execute in the integer unit.
+;;
+;; Likewise for array*, edge* and pdistn instructions.
+
 (define_insn_reservation "n7_imul" 12
  (and (eq_attr "cpu" "niagara7")
-    (eq_attr "type" "imul"))
-  "n7_slot1, nothing*11")
+    (eq_attr "type" "imul,array,edge,edgen,pdistn"))
+  "(n7_slot0 | n7_slot1), nothing*11")
+
+;; The integer division instructions have a latency of 35 cycles and
+;; execute in the integer unit.

 (define_insn_reservation "n7_idiv" 35
  (and (eq_attr "cpu" "niagara7")
    (eq_attr "type" "idiv"))
-  "n7_slot1, nothing*34")
+  "(n7_slot0 | n7_slot1), nothing*34")
+
+;; Both integer and floating-point load instructions have a latency of
+;; 5 cycles, and execute in the slot0.
+;;
+;; The prefetch instruction also executes in the load/store unit, but
+;; its latency is only 1 cycle.

 (define_insn_reservation "n7_load" 5
  (and (eq_attr "cpu" "niagara7")
-    (eq_attr "type" "load,fpload,sload"))
-  "(n7_slot0 + n7_load_store), nothing*4")
+       (ior (eq_attr "type" "fpload,sload")
+            (and (eq_attr "type" "load")
+                 (eq_attr "subtype" "regular"))))
+  "n7_slot0, nothing*4")
+
+(define_insn_reservation "n7_prefetch" 1
+  (and (eq_attr "cpu" "niagara7")
+       (eq_attr "type" "load")
+       (eq_attr "subtype" "prefetch"))
+  "n7_slot0")
+
+;; Both integer and floating-point store instructions have a latency
+;; of 1 cycle, and execute in the load/store unit in slot0.

 (define_insn_reservation "n7_store" 1
  (and (eq_attr "cpu" "niagara7")
    (eq_attr "type" "store,fpstore"))
-  "(n7_slot0 | n7_slot2) + n7_load_store")
+  "n7_slot0")
+
+;; Control-transfer instructions execute in the Branch Unit in the
+;; slot1.

 (define_insn_reservation "n7_cti" 1
  (and (eq_attr "cpu" "niagara7")
    (eq_attr "type" "cbcond,uncond_cbcond,branch,call,sibcall,call_no_delay_slot,uncond_branch,return"))
  "n7_slot1")

+;; Many instructions executing in the Floating-point and Graphics unit
+;; in the slot1 feature a latency of 11 cycles.
+
 (define_insn_reservation "n7_fp" 11
  (and (eq_attr "cpu" "niagara7")
-    (eq_attr "type" "fpmove,fpcmove,fpcrmove,fp,fpcmp,fpmul"))
+       (ior (eq_attr "type" "fpmove,fpcmove,fpcrmove,fp,fpcmp,fpmul,fgm_pack,fgm_mul,pdist")
+            (and (eq_attr "type" "fga")
+                 (eq_attr "subtype" "fpu,maxmin"))))
  "n7_slot1, nothing*10")

-(define_insn_reservation "n7_array" 12
-  (and (eq_attr "cpu" "niagara7")
-    (eq_attr "type" "array,edge,edgen"))
-  "n7_slot1, nothing*11")
+;; Floating-point division and floating-point square-root instructions
+;; have high latencies.  They execute in the floating-point and
+;; graphics unit in the slot1.
+

 (define_insn_reservation "n7_fpdivs" 24
  (and (eq_attr "cpu" "niagara7")
-    (eq_attr "type" "fpdivs,fpsqrts"))
+       (eq_attr "type" "fpdivs,fpsqrts"))
  "n7_slot1, nothing*23")

 (define_insn_reservation "n7_fpdivd" 37
@ -84,53 +140,66 @@
    (eq_attr "type" "fpdivd,fpsqrtd"))
  "n7_slot1, nothing*36")

+;; SIMD VIS instructions executing in the Floating-point and graphics
+;; unit (FPG) in slot1 usually have a latency of either 11 or 12
+;; cycles.
+;;
+;; However, the latency for many instructions is only 3 cycles if the
+;; consumer can also be executed in 3 cycles.  We model this with a
+;; bypass.  In these cases the instructions are executed in the
+;; 3-cycle crypto unit which also serves slot1.
+
+(define_insn_reservation "n7_vis_11cycles" 11
+  (and (eq_attr "cpu" "niagara7")
+       (ior (and (eq_attr "type" "fga")
+                 (eq_attr "subtype" "addsub64,other"))
+            (and (eq_attr "type" "vismv")
+                 (eq_attr "subtype" "double,single"))
+            (and (eq_attr "type" "visl")
+                 (eq_attr "subtype" "double,single"))))
+  "n7_slot1, nothing*10")
+
+(define_insn_reservation "n7_vis_12cycles" 12
+  (and (eq_attr "cpu" "niagara7")
+       (ior (eq_attr "type" "bmask,viscmp")
+            (and (eq_attr "type" "fga")
+                 (eq_attr "subtype" "cmask"))
+            (and (eq_attr "type" "vismv")
+                 (eq_attr "subtype" "movstouw"))))
+  "n7_slot1, nothing*11")
+
+(define_bypass 3 "n7_vis_*" "n7_vis_*")
+
+;; Some other VIS instructions have a latency of 12 cycles, and won't
+;; be executed in the 3-cycle crypto pipe.
+
 (define_insn_reservation "n7_lzd" 12
  (and (eq_attr "cpu" "niagara7")
-       (eq_attr "type" "lzd"))
-  "(n7_slot0 | n7_slot1), nothing*11")
+       (ior (eq_attr "type" "lzd,")
+            (and (eq_attr "type" "gsr")
+                 (eq_attr "subtype" "alignaddr"))))
+  "n7_slot1, nothing*11")

-;; There is an internal unit called the "V3 pipe", that was originally
-;; intended to process some of the short cryptographic instructions.
-;; However, as soon as in the T4 several of the VIS instructions
-;; (notably non-FP instructions) have been moved to the V3 pipe.
-;; Consequently, these instructions feature a latency of 3 instead of
-;; 11 or 12 cycles, provided their consumers also execute in the V3
-;; pipe.
-;;
-;; This is modelled here with a bypass.
+;; A couple of VIS instructions feature very low latencies in the M7.

-(define_insn_reservation "n7_vis_fga" 11
+(define_insn_reservation "n7_single_vis" 1
  (and (eq_attr "cpu" "niagara7")
-    (eq_attr "type" "fga,gsr"))
-  "n7_slot1, nothing*10")
-
-(define_insn_reservation "n7_vis_fgm" 11
-  (and (eq_attr "cpu" "niagara7")
-    (eq_attr "type" "fgm_pack,fgm_mul,pdist"))
-  "n7_slot1, nothing*10")
-
-(define_insn_reservation "n7_vis_move_v3pipe" 11
-  (and (eq_attr "cpu" "niagara7")
-    (and (eq_attr "type" "vismv")
-         (eq_attr "v3pipe" "true")))
+       (eq_attr "type" "vismv")
+       (eq_attr "subtype" "movxtod"))
  "n7_slot1")

-(define_insn_reservation "n7_vis_move_11cycle" 11
+(define_insn_reservation "n7_double_vis" 2
  (and (eq_attr "cpu" "niagara7")
-    (and (eq_attr "type" "vismv")
-         (eq_attr "v3pipe" "false")))
-  "n7_slot1, nothing*10")
+       (eq_attr "type" "vismv")
+       (eq_attr "subtype" "movdtox"))
+  "n7_slot1, nothing")

-(define_insn_reservation "n7_vis_logical_v3pipe" 11
+;; Reading and writing to the gsr register takes a high number of
+;; cycles that is not documented in the PRM.  Let's use the same value
+;; than the M8.
+
+(define_insn_reservation "n7_gsr_reg" 70
  (and (eq_attr "cpu" "niagara7")
-    (and (eq_attr "type" "visl,pdistn")
-         (eq_attr "v3pipe" "true")))
-  "n7_slot1, nothing*2")
-
-(define_insn_reservation "n7_vis_logical_11cycle" 11
-  (and (eq_attr "cpu" "niagara7")
-    (and (eq_attr "type" "visl")
-      (eq_attr "v3pipe" "false")))
-  "n7_slot1, nothing*10")
-
-(define_bypass 3 "*_v3pipe" "*_v3pipe")
+       (eq_attr "type" "gsr")
+       (eq_attr "subtype" "reg"))
+  "n7_slot1, nothing*70")
--- a/gcc/config/sparc/predicates.md
+++ b/gcc/config/sparc/predicates.md
@ -328,6 +328,33 @@
       (and (match_code "const_int")
            (match_test "SPARC_SIMM5_P (INTVAL (op))"))))

+;; Return true if OP is a constant in the range 0..7.  This is an
+;; acceptable second operand for dictunpack instructions setting a
+;; V8QI mode in the destination register.
+(define_predicate "imm5_operand_dictunpack8"
+  (and (match_code "const_int")
+       (match_test "(INTVAL (op) >= 0 && INTVAL (op) < 8)")))
+
+;; Return true if OP is a constant in the range 7..15.  This is an
+;; acceptable second operand for dictunpack instructions setting a
+;; V4HI mode in the destination register.
+(define_predicate "imm5_operand_dictunpack16"
+  (and (match_code "const_int")
+       (match_test "(INTVAL (op) >= 8 && INTVAL (op) < 16)")))
+
+;; Return true if OP is a constant in the range 15..31.  This is an
+;; acceptable second operand for dictunpack instructions setting a
+;; V2SI mode in the destination register.
+(define_predicate "imm5_operand_dictunpack32"
+  (and (match_code "const_int")
+       (match_test "(INTVAL (op) >= 16 && INTVAL (op) < 32)")))
+
+;; Return true if OP is a constant that is representable by a 2-bit
+;; unsigned field.  This is an acceptable third operand for
+;; fpcmp*shl instructions.
+(define_predicate "imm2_operand"
+  (and (match_code "const_int")
+       (match_test "SPARC_IMM2_P (INTVAL (op))")))

 ;; Predicates for miscellaneous instructions.

--- a/gcc/config/sparc/sol2.h
+++ b/gcc/config/sparc/sol2.h
@ -174,13 +174,22 @@ along with GCC; see the file COPYING3.  If not see
 #define ASM_CPU64_DEFAULT_SPEC AS_SPARC64_FLAG AS_NIAGARA7_FLAG
 #endif

+#if TARGET_CPU_DEFAULT == TARGET_CPU_m8
+#undef CPP_CPU64_DEFAULT_SPEC
+#define CPP_CPU64_DEFAULT_SPEC ""
+#undef ASM_CPU32_DEFAULT_SPEC
+#define ASM_CPU32_DEFAULT_SPEC AS_SPARC32_FLAG AS_M8_FLAG
+#undef ASM_CPU64_DEFAULT_SPEC
+#define ASM_CPU64_DEFAULT_SPEC AS_SPARC64_FLAG AS_M8_FLAG
+#endif
+
 #undef CPP_CPU_SPEC
 #define CPP_CPU_SPEC "\
 %{mcpu=sparclet|mcpu=tsc701:-D__sparclet__} \
 %{mcpu=sparclite|mcpu-f930|mcpu=f934:-D__sparclite__} \
 %{mcpu=v8:" DEF_ARCH32_SPEC("-D__sparcv8") "} \
 %{mcpu=supersparc:-D__supersparc__ " DEF_ARCH32_SPEC("-D__sparcv8") "} \
-%{mcpu=v9|mcpu=ultrasparc|mcpu=ultrasparc3|mcpu=niagara|mcpu=niagara2|mcpu=niagara3|mcpu=niagara4|mcpu=niagara7:" DEF_ARCH32_SPEC("-D__sparcv8") "} \
+%{mcpu=v9|mcpu=ultrasparc|mcpu=ultrasparc3|mcpu=niagara|mcpu=niagara2|mcpu=niagara3|mcpu=niagara4|mcpu=niagara7|mcpu=m8:" DEF_ARCH32_SPEC("-D__sparcv8") "} \
 %{!mcpu*:%(cpp_cpu_default)} \
 "

@ -290,7 +299,8 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
 %{mcpu=niagara3:" DEF_ARCH32_SPEC("-xarch=v8plus" AS_NIAGARA3_FLAG) DEF_ARCH64_SPEC("-xarch=v9" AS_NIAGARA3_FLAG) "} \
 %{mcpu=niagara4:" DEF_ARCH32_SPEC(AS_SPARC32_FLAG AS_NIAGARA4_FLAG) DEF_ARCH64_SPEC(AS_SPARC64_FLAG AS_NIAGARA4_FLAG) "} \
 %{mcpu=niagara7:" DEF_ARCH32_SPEC(AS_SPARC32_FLAG AS_NIAGARA7_FLAG) DEF_ARCH64_SPEC(AS_SPARC64_FLAG AS_NIAGARA7_FLAG) "} \
-%{!mcpu=niagara7:%{!mcpu=niagara4:%{!mcpu=niagara3:%{!mcpu=niagara2:%{!mcpu=niagara:%{!mcpu=ultrasparc3:%{!mcpu=ultrasparc:%{!mcpu=v9:%{mcpu*:" DEF_ARCH32_SPEC("-xarch=v8") DEF_ARCH64_SPEC("-xarch=v9") "}}}}}}}}} \
+%{mcpu=m8:" DEF_ARCH32_SPEC(AS_SPARC32_FLAG AS_M8_FLAG) DEF_ARCH64_SPEC(AS_SPARC64_FLAG AS_M8_FLAG) "} \
+%{!mcpu=m8:%{!mcpu=niagara7:%{!mcpu=niagara4:%{!mcpu=niagara3:%{!mcpu=niagara2:%{!mcpu=niagara:%{!mcpu=ultrasparc3:%{!mcpu=ultrasparc:%{!mcpu=v9:%{mcpu*:" DEF_ARCH32_SPEC("-xarch=v8") DEF_ARCH64_SPEC("-xarch=v9") "}}}}}}}}}} \
 %{!mcpu*:%(asm_cpu_default)} \
 "

--- a/gcc/config/sparc/sparc-c.c
+++ b/gcc/config/sparc/sparc-c.c
@ -40,7 +40,12 @@ sparc_target_macros (void)
      cpp_assert (parse_in, "machine=sparc");
    }

-  if (TARGET_VIS4)
+  if (TARGET_VIS4B)
+    {
+      cpp_define (parse_in, "__VIS__=0x410");
+      cpp_define (parse_in, "__VIS=0x410");
+    }
+  else if (TARGET_VIS4)
    {
      cpp_define (parse_in, "__VIS__=0x400");
      cpp_define (parse_in, "__VIS=0x400");
--- a/gcc/config/sparc/sparc-opts.h
+++ b/gcc/config/sparc/sparc-opts.h
@ -46,6 +46,7 @@ enum processor_type {
  PROCESSOR_NIAGARA3,
  PROCESSOR_NIAGARA4,
  PROCESSOR_NIAGARA7,
+  PROCESSOR_M8,
  PROCESSOR_NATIVE
 };

--- a/gcc/config/sparc/sparc.c
+++ b/gcc/config/sparc/sparc.c
@ -448,6 +448,30 @@ struct processor_costs niagara7_costs = {
  0, /* shift penalty */
 };

+static const
+struct processor_costs m8_costs = {
+  COSTS_N_INSNS (3), /* int load */
+  COSTS_N_INSNS (3), /* int signed load */
+  COSTS_N_INSNS (3), /* int zeroed load */
+  COSTS_N_INSNS (3), /* float load */
+  COSTS_N_INSNS (9), /* fmov, fneg, fabs */
+  COSTS_N_INSNS (9), /* fadd, fsub */
+  COSTS_N_INSNS (9), /* fcmp */
+  COSTS_N_INSNS (9), /* fmov, fmovr */
+  COSTS_N_INSNS (9), /* fmul */
+  COSTS_N_INSNS (26), /* fdivs */
+  COSTS_N_INSNS (30), /* fdivd */
+  COSTS_N_INSNS (33), /* fsqrts */
+  COSTS_N_INSNS (41), /* fsqrtd */
+  COSTS_N_INSNS (12), /* imul */
+  COSTS_N_INSNS (10), /* imulX */
+  0, /* imul bit factor */
+  COSTS_N_INSNS (57), /* udiv/sdiv */
+  COSTS_N_INSNS (30), /* udivx/sdivx */
+  COSTS_N_INSNS (1), /* movcc/movr */
+  0, /* shift penalty */
+};
+
 static const struct processor_costs *sparc_costs = &cypress_costs;

 #ifdef HAVE_AS_RELAX_OPTION
@ -1222,6 +1246,8 @@ dump_target_flag_bits (const int flags)
    fprintf (stderr, "VIS3 ");
  if (flags & MASK_VIS4)
    fprintf (stderr, "VIS4 ");
+  if (flags & MASK_VIS4B)
+    fprintf (stderr, "VIS4B ");
  if (flags & MASK_CBCOND)
    fprintf (stderr, "CBCOND ");
  if (flags & MASK_DEPRECATED_V8_INSNS)
@ -1286,6 +1312,7 @@ sparc_option_override (void)
    { TARGET_CPU_niagara3, PROCESSOR_NIAGARA3 },
    { TARGET_CPU_niagara4, PROCESSOR_NIAGARA4 },
    { TARGET_CPU_niagara7, PROCESSOR_NIAGARA7 },
+    { TARGET_CPU_m8, PROCESSOR_M8 },
    { -1, PROCESSOR_V7 }
  };
  const struct cpu_default *def;
@ -1337,7 +1364,11 @@ sparc_option_override (void)
      MASK_V9|MASK_POPC|MASK_VIS3|MASK_FMAF|MASK_CBCOND },
    /* UltraSPARC M7 */
    { "niagara7",	MASK_ISA,
-      MASK_V9|MASK_POPC|MASK_VIS4|MASK_FMAF|MASK_CBCOND|MASK_SUBXC }
+      MASK_V9|MASK_POPC|MASK_VIS4|MASK_FMAF|MASK_CBCOND|MASK_SUBXC },
+    /* UltraSPARC M8 */
+    { "m8",		MASK_ISA,
+      MASK_V9|MASK_POPC|MASK_VIS4|MASK_FMAF|MASK_CBCOND|MASK_SUBXC
+      |MASK_VIS4B }
  };
  const struct cpu_table *cpu;
  unsigned int i;
@ -1467,6 +1498,9 @@ sparc_option_override (void)
 #ifndef HAVE_AS_SPARC5_VIS4
 		   & ~(MASK_VIS4 | MASK_SUBXC)
 #endif
+#ifndef HAVE_AS_SPARC6
+		   & ~(MASK_VIS4B)
+#endif
 #ifndef HAVE_AS_LEON
 		   & ~(MASK_LEON | MASK_LEON3)
 #endif
@ -1485,11 +1519,15 @@ sparc_option_override (void)
  if (TARGET_VIS4)
    target_flags |= MASK_VIS3 | MASK_VIS2 | MASK_VIS;

-  /* Don't allow -mvis, -mvis2, -mvis3, -mvis4 or -mfmaf if FPU is
-     disabled.  */
+  /* -mvis4b implies -mvis4, -mvis3, -mvis2 and -mvis */
+  if (TARGET_VIS4B)
+    target_flags |= MASK_VIS4 | MASK_VIS3 | MASK_VIS2 | MASK_VIS;
+
+  /* Don't allow -mvis, -mvis2, -mvis3, -mvis4, -mvis4b and -mfmaf if
+     FPU is disabled.  */
  if (! TARGET_FPU)
    target_flags &= ~(MASK_VIS | MASK_VIS2 | MASK_VIS3 | MASK_VIS4
-		      | MASK_FMAF);
+		      | MASK_VIS4B | MASK_FMAF);

  /* -mvis assumes UltraSPARC+, so we are sure v9 instructions
     are available; -m64 also implies v9.  */
@ -1529,7 +1567,8 @@ sparc_option_override (void)
 	  || sparc_cpu == PROCESSOR_NIAGARA3
 	  || sparc_cpu == PROCESSOR_NIAGARA4)
 	align_functions = 32;
-      else if (sparc_cpu == PROCESSOR_NIAGARA7)
+      else if (sparc_cpu == PROCESSOR_NIAGARA7
+	       || sparc_cpu == PROCESSOR_M8)
 	align_functions = 64;
    }

@ -1597,6 +1636,9 @@ sparc_option_override (void)
    case PROCESSOR_NIAGARA7:
      sparc_costs = &niagara7_costs;
      break;
+    case PROCESSOR_M8:
+      sparc_costs = &m8_costs;
+      break;
    case PROCESSOR_NATIVE:
      gcc_unreachable ();
    };
@ -1659,13 +1701,14 @@ sparc_option_override (void)
 			   || sparc_cpu == PROCESSOR_NIAGARA4)
 			  ? 2
 			  : (sparc_cpu == PROCESSOR_ULTRASPARC3
-			     ? 8 : (sparc_cpu == PROCESSOR_NIAGARA7
+			     ? 8 : ((sparc_cpu == PROCESSOR_NIAGARA7
+				     || sparc_cpu == PROCESSOR_M8)
 				    ? 32 : 3))),
 			 global_options.x_param_values,
 			 global_options_set.x_param_values);

-  /* For PARAM_L1_CACHE_LINE_SIZE we use the default 32 bytes (see
-     params.def), so no maybe_set_param_value is needed.
+  /* PARAM_L1_CACHE_LINE_SIZE is the size of the L1 cache line, in
+     bytes.

     The Oracle SPARC Architecture (previously the UltraSPARC
     Architecture) specification states that when a PREFETCH[A]
@ -1681,6 +1724,11 @@ sparc_option_override (void)
     L2 and L3, but only 32B are brought into the L1D$. (Assuming it
     is a read_n prefetch, which is the only type which allocates to
     the L1.)  */
+  maybe_set_param_value (PARAM_L1_CACHE_LINE_SIZE,
+			 (sparc_cpu == PROCESSOR_M8
+			  ? 64 : 32),
+			 global_options.x_param_values,
+			 global_options_set.x_param_values);

  /* PARAM_L1_CACHE_SIZE is the size of the L1D$ (most SPARC chips use
     Hardvard level-1 caches) in kilobytes.  Both UltraSPARC and
@ -1692,7 +1740,8 @@ sparc_option_override (void)
 			   || sparc_cpu == PROCESSOR_NIAGARA2
 			   || sparc_cpu == PROCESSOR_NIAGARA3
 			   || sparc_cpu == PROCESSOR_NIAGARA4
-			   || sparc_cpu == PROCESSOR_NIAGARA7)
+			   || sparc_cpu == PROCESSOR_NIAGARA7
+			   || sparc_cpu == PROCESSOR_M8)
 			  ? 16 : 64),
 			 global_options.x_param_values,
 			 global_options_set.x_param_values);
@ -1701,7 +1750,8 @@ sparc_option_override (void)
  /* PARAM_L2_CACHE_SIZE is the size fo the L2 in kilobytes.  Note
     that 512 is the default in params.def.  */
  maybe_set_param_value (PARAM_L2_CACHE_SIZE,
-			 (sparc_cpu == PROCESSOR_NIAGARA4
+			 ((sparc_cpu == PROCESSOR_NIAGARA4
+			   || sparc_cpu == PROCESSOR_M8)
 			  ? 128 : (sparc_cpu == PROCESSOR_NIAGARA7
 				   ? 256 : 512)),
 			 global_options.x_param_values,
@ -9478,7 +9528,8 @@ sparc32_initialize_trampoline (rtx m_tramp, rtx fnaddr, rtx cxt)
      && sparc_cpu != PROCESSOR_NIAGARA2
      && sparc_cpu != PROCESSOR_NIAGARA3
      && sparc_cpu != PROCESSOR_NIAGARA4
-      && sparc_cpu != PROCESSOR_NIAGARA7)
+      && sparc_cpu != PROCESSOR_NIAGARA7
+      && sparc_cpu != PROCESSOR_M8)
    emit_insn (gen_flushsi (validize_mem (adjust_address (m_tramp, SImode, 8))));

  /* Call __enable_execute_stack after writing onto the stack to make sure
@ -9524,7 +9575,8 @@ sparc64_initialize_trampoline (rtx m_tramp, rtx fnaddr, rtx cxt)
      && sparc_cpu != PROCESSOR_NIAGARA2
      && sparc_cpu != PROCESSOR_NIAGARA3
      && sparc_cpu != PROCESSOR_NIAGARA4
-      && sparc_cpu != PROCESSOR_NIAGARA7)
+      && sparc_cpu != PROCESSOR_NIAGARA7
+      && sparc_cpu != PROCESSOR_M8)
    emit_insn (gen_flushdi (validize_mem (adjust_address (m_tramp, DImode, 8))));

  /* Call __enable_execute_stack after writing onto the stack to make sure
@ -9724,7 +9776,8 @@ sparc_use_sched_lookahead (void)
      || sparc_cpu == PROCESSOR_NIAGARA3)
    return 0;
  if (sparc_cpu == PROCESSOR_NIAGARA4
-      || sparc_cpu == PROCESSOR_NIAGARA7)
+      || sparc_cpu == PROCESSOR_NIAGARA7
+      || sparc_cpu == PROCESSOR_M8)
    return 2;
  if (sparc_cpu == PROCESSOR_ULTRASPARC
      || sparc_cpu == PROCESSOR_ULTRASPARC3)
@ -9758,6 +9811,7 @@ sparc_issue_rate (void)
      return 2;
    case PROCESSOR_ULTRASPARC:
    case PROCESSOR_ULTRASPARC3:
+    case PROCESSOR_M8:
      return 4;
    }
 }
@ -10340,6 +10394,45 @@ enum sparc_builtins
  SPARC_BUILTIN_FPSUBS8,
  SPARC_BUILTIN_FPSUBUS8,
  SPARC_BUILTIN_FPSUBUS16,
+
+  /* VIS 4.0B builtins.  */
+
+  /* Note that all the DICTUNPACK* entries should be kept
+     contiguous.  */
+  SPARC_BUILTIN_FIRST_DICTUNPACK,
+  SPARC_BUILTIN_DICTUNPACK8 = SPARC_BUILTIN_FIRST_DICTUNPACK,
+  SPARC_BUILTIN_DICTUNPACK16,
+  SPARC_BUILTIN_DICTUNPACK32,
+  SPARC_BUILTIN_LAST_DICTUNPACK = SPARC_BUILTIN_DICTUNPACK32,
+
+  /* Note that all the FPCMP*SHL entries should be kept
+     contiguous.  */
+  SPARC_BUILTIN_FIRST_FPCMPSHL,
+  SPARC_BUILTIN_FPCMPLE8SHL = SPARC_BUILTIN_FIRST_FPCMPSHL,
+  SPARC_BUILTIN_FPCMPGT8SHL,
+  SPARC_BUILTIN_FPCMPEQ8SHL,
+  SPARC_BUILTIN_FPCMPNE8SHL,
+  SPARC_BUILTIN_FPCMPLE16SHL,
+  SPARC_BUILTIN_FPCMPGT16SHL,
+  SPARC_BUILTIN_FPCMPEQ16SHL,
+  SPARC_BUILTIN_FPCMPNE16SHL,
+  SPARC_BUILTIN_FPCMPLE32SHL,
+  SPARC_BUILTIN_FPCMPGT32SHL,
+  SPARC_BUILTIN_FPCMPEQ32SHL,
+  SPARC_BUILTIN_FPCMPNE32SHL,
+  SPARC_BUILTIN_FPCMPULE8SHL,
+  SPARC_BUILTIN_FPCMPUGT8SHL,
+  SPARC_BUILTIN_FPCMPULE16SHL,
+  SPARC_BUILTIN_FPCMPUGT16SHL,
+  SPARC_BUILTIN_FPCMPULE32SHL,
+  SPARC_BUILTIN_FPCMPUGT32SHL,
+  SPARC_BUILTIN_FPCMPDE8SHL,
+  SPARC_BUILTIN_FPCMPDE16SHL,
+  SPARC_BUILTIN_FPCMPDE32SHL,
+  SPARC_BUILTIN_FPCMPUR8SHL,
+  SPARC_BUILTIN_FPCMPUR16SHL,
+  SPARC_BUILTIN_FPCMPUR32SHL,
+  SPARC_BUILTIN_LAST_FPCMPSHL = SPARC_BUILTIN_FPCMPUR32SHL,
  
  SPARC_BUILTIN_MAX
 };
@ -10347,6 +10440,27 @@ enum sparc_builtins
 static GTY (()) tree sparc_builtins[(int) SPARC_BUILTIN_MAX];
 static enum insn_code sparc_builtins_icode[(int) SPARC_BUILTIN_MAX];

+/* Return true if OPVAL can be used for operand OPNUM of instruction ICODE.
+   The instruction should require a constant operand of some sort.  The
+   function prints an error if OPVAL is not valid.  */
+
+static int
+check_constant_argument (enum insn_code icode, int opnum, rtx opval)
+{
+  if (GET_CODE (opval) != CONST_INT)
+    {
+      error ("%qs expects a constant argument", insn_data[icode].name);
+      return false;
+    }
+
+  if (!(*insn_data[icode].operand[opnum].predicate) (opval, VOIDmode))
+    {
+      error ("constant argument out of range for %qs", insn_data[icode].name);
+      return false;
+    }
+  return true;
+}
+
 /* Add a SPARC builtin function with NAME, ICODE, CODE and TYPE.  Return the
   function decl or NULL_TREE if the builtin was not added.  */

@ -10440,6 +10554,12 @@ sparc_vis_init_builtins (void)
 						      v8qi, v8qi, 0);
  tree si_ftype_v8qi_v8qi = build_function_type_list (intSI_type_node,
 						      v8qi, v8qi, 0);
+  tree v8qi_ftype_df_si = build_function_type_list (v8qi, double_type_node,
+						    intSI_type_node, 0);
+  tree v4hi_ftype_df_si = build_function_type_list (v4hi, double_type_node,
+						    intSI_type_node, 0);
+  tree v2si_ftype_df_si = build_function_type_list (v2si, double_type_node,
+						    intDI_type_node, 0);
  tree di_ftype_di_di = build_function_type_list (intDI_type_node,
 						  intDI_type_node,
 						  intDI_type_node, 0);
@ -10894,6 +11014,156 @@ sparc_vis_init_builtins (void)
      def_builtin_const ("__builtin_vis_fpsubus16", CODE_FOR_ussubv4hi3,
 			 SPARC_BUILTIN_FPSUBUS16, v4hi_ftype_v4hi_v4hi);
    }
+
+  if (TARGET_VIS4B)
+    {
+      def_builtin_const ("__builtin_vis_dictunpack8", CODE_FOR_dictunpack8,
+			 SPARC_BUILTIN_DICTUNPACK8, v8qi_ftype_df_si);
+      def_builtin_const ("__builtin_vis_dictunpack16", CODE_FOR_dictunpack16,
+			 SPARC_BUILTIN_DICTUNPACK16, v4hi_ftype_df_si);
+      def_builtin_const ("__builtin_vis_dictunpack32", CODE_FOR_dictunpack32,
+			 SPARC_BUILTIN_DICTUNPACK32, v2si_ftype_df_si);
+
+      if (TARGET_ARCH64)
+	{
+	  tree di_ftype_v8qi_v8qi_si = build_function_type_list (intDI_type_node,
+								 v8qi, v8qi,
+								 intSI_type_node, 0);
+	  tree di_ftype_v4hi_v4hi_si = build_function_type_list (intDI_type_node,
+								 v4hi, v4hi,
+								 intSI_type_node, 0);
+	  tree di_ftype_v2si_v2si_si = build_function_type_list (intDI_type_node,
+								 v2si, v2si,
+								 intSI_type_node, 0);
+	  
+	  def_builtin_const ("__builtin_vis_fpcmple8shl", CODE_FOR_fpcmple8dishl,
+			     SPARC_BUILTIN_FPCMPLE8SHL, di_ftype_v8qi_v8qi_si);
+	  def_builtin_const ("__builtin_vis_fpcmpgt8shl", CODE_FOR_fpcmpgt8dishl,
+			     SPARC_BUILTIN_FPCMPGT8SHL, di_ftype_v8qi_v8qi_si);
+	  def_builtin_const ("__builtin_vis_fpcmpeq8shl", CODE_FOR_fpcmpeq8dishl,
+			     SPARC_BUILTIN_FPCMPEQ8SHL, di_ftype_v8qi_v8qi_si);
+	  def_builtin_const ("__builtin_vis_fpcmpne8shl", CODE_FOR_fpcmpne8dishl,
+			     SPARC_BUILTIN_FPCMPNE8SHL, di_ftype_v8qi_v8qi_si);
+
+	  def_builtin_const ("__builtin_vis_fpcmple16shl", CODE_FOR_fpcmple16dishl,
+			     SPARC_BUILTIN_FPCMPLE16SHL, di_ftype_v4hi_v4hi_si);
+	  def_builtin_const ("__builtin_vis_fpcmpgt16shl", CODE_FOR_fpcmpgt16dishl,
+			     SPARC_BUILTIN_FPCMPGT16SHL, di_ftype_v4hi_v4hi_si);
+	  def_builtin_const ("__builtin_vis_fpcmpeq16shl", CODE_FOR_fpcmpeq16dishl,
+			     SPARC_BUILTIN_FPCMPEQ16SHL, di_ftype_v4hi_v4hi_si);
+	  def_builtin_const ("__builtin_vis_fpcmpne16shl", CODE_FOR_fpcmpne16dishl,
+			     SPARC_BUILTIN_FPCMPNE16SHL, di_ftype_v4hi_v4hi_si);
+
+	  def_builtin_const ("__builtin_vis_fpcmple32shl", CODE_FOR_fpcmple32dishl,
+			     SPARC_BUILTIN_FPCMPLE32SHL, di_ftype_v2si_v2si_si);
+	  def_builtin_const ("__builtin_vis_fpcmpgt32shl", CODE_FOR_fpcmpgt32dishl,
+			     SPARC_BUILTIN_FPCMPGT32SHL, di_ftype_v2si_v2si_si);
+	  def_builtin_const ("__builtin_vis_fpcmpeq32shl", CODE_FOR_fpcmpeq32dishl,
+			     SPARC_BUILTIN_FPCMPEQ32SHL, di_ftype_v2si_v2si_si);
+	  def_builtin_const ("__builtin_vis_fpcmpne32shl", CODE_FOR_fpcmpne32dishl,
+			     SPARC_BUILTIN_FPCMPNE32SHL, di_ftype_v2si_v2si_si);
+
+
+	  def_builtin_const ("__builtin_vis_fpcmpule8shl", CODE_FOR_fpcmpule8dishl,
+			     SPARC_BUILTIN_FPCMPULE8SHL, di_ftype_v8qi_v8qi_si);
+	  def_builtin_const ("__builtin_vis_fpcmpugt8shl", CODE_FOR_fpcmpugt8dishl,
+			     SPARC_BUILTIN_FPCMPUGT8SHL, di_ftype_v8qi_v8qi_si);
+
+	  def_builtin_const ("__builtin_vis_fpcmpule16shl", CODE_FOR_fpcmpule16dishl,
+			     SPARC_BUILTIN_FPCMPULE16SHL, di_ftype_v4hi_v4hi_si);
+	  def_builtin_const ("__builtin_vis_fpcmpugt16shl", CODE_FOR_fpcmpugt16dishl,
+			     SPARC_BUILTIN_FPCMPUGT16SHL, di_ftype_v4hi_v4hi_si);
+
+	  def_builtin_const ("__builtin_vis_fpcmpule32shl", CODE_FOR_fpcmpule32dishl,
+			     SPARC_BUILTIN_FPCMPULE32SHL, di_ftype_v2si_v2si_si);
+	  def_builtin_const ("__builtin_vis_fpcmpugt32shl", CODE_FOR_fpcmpugt32dishl,
+			     SPARC_BUILTIN_FPCMPUGT32SHL, di_ftype_v2si_v2si_si);
+
+	  def_builtin_const ("__builtin_vis_fpcmpde8shl", CODE_FOR_fpcmpde8dishl,
+			     SPARC_BUILTIN_FPCMPDE8SHL, di_ftype_v8qi_v8qi_si);
+	  def_builtin_const ("__builtin_vis_fpcmpde16shl", CODE_FOR_fpcmpde16dishl,
+			     SPARC_BUILTIN_FPCMPDE16SHL, di_ftype_v4hi_v4hi_si);
+	  def_builtin_const ("__builtin_vis_fpcmpde32shl", CODE_FOR_fpcmpde32dishl,
+			     SPARC_BUILTIN_FPCMPDE32SHL, di_ftype_v2si_v2si_si);
+
+	  def_builtin_const ("__builtin_vis_fpcmpur8shl", CODE_FOR_fpcmpur8dishl,
+			     SPARC_BUILTIN_FPCMPUR8SHL, di_ftype_v8qi_v8qi_si);
+	  def_builtin_const ("__builtin_vis_fpcmpur16shl", CODE_FOR_fpcmpur16dishl,
+			     SPARC_BUILTIN_FPCMPUR16SHL, di_ftype_v4hi_v4hi_si);
+	  def_builtin_const ("__builtin_vis_fpcmpur32shl", CODE_FOR_fpcmpur32dishl,
+			     SPARC_BUILTIN_FPCMPUR32SHL, di_ftype_v2si_v2si_si);
+
+	}
+      else
+	{
+	  tree si_ftype_v8qi_v8qi_si = build_function_type_list (intSI_type_node,
+								 v8qi, v8qi,
+								 intSI_type_node, 0);
+	  tree si_ftype_v4hi_v4hi_si = build_function_type_list (intSI_type_node,
+								 v4hi, v4hi,
+								 intSI_type_node, 0);
+	  tree si_ftype_v2si_v2si_si = build_function_type_list (intSI_type_node,
+								 v2si, v2si,
+								 intSI_type_node, 0);
+	  
+	  def_builtin_const ("__builtin_vis_fpcmple8shl", CODE_FOR_fpcmple8sishl,
+			     SPARC_BUILTIN_FPCMPLE8SHL, si_ftype_v8qi_v8qi_si);
+	  def_builtin_const ("__builtin_vis_fpcmpgt8shl", CODE_FOR_fpcmpgt8sishl,
+			     SPARC_BUILTIN_FPCMPGT8SHL, si_ftype_v8qi_v8qi_si);
+	  def_builtin_const ("__builtin_vis_fpcmpeq8shl", CODE_FOR_fpcmpeq8sishl,
+			     SPARC_BUILTIN_FPCMPEQ8SHL, si_ftype_v8qi_v8qi_si);
+	  def_builtin_const ("__builtin_vis_fpcmpne8shl", CODE_FOR_fpcmpne8sishl,
+			     SPARC_BUILTIN_FPCMPNE8SHL, si_ftype_v8qi_v8qi_si);
+
+	  def_builtin_const ("__builtin_vis_fpcmple16shl", CODE_FOR_fpcmple16sishl,
+			     SPARC_BUILTIN_FPCMPLE16SHL, si_ftype_v4hi_v4hi_si);
+	  def_builtin_const ("__builtin_vis_fpcmpgt16shl", CODE_FOR_fpcmpgt16sishl,
+			     SPARC_BUILTIN_FPCMPGT16SHL, si_ftype_v4hi_v4hi_si);
+	  def_builtin_const ("__builtin_vis_fpcmpeq16shl", CODE_FOR_fpcmpeq16sishl,
+			     SPARC_BUILTIN_FPCMPEQ16SHL, si_ftype_v4hi_v4hi_si);
+	  def_builtin_const ("__builtin_vis_fpcmpne16shl", CODE_FOR_fpcmpne16sishl,
+			     SPARC_BUILTIN_FPCMPNE16SHL, si_ftype_v4hi_v4hi_si);
+
+	  def_builtin_const ("__builtin_vis_fpcmple32shl", CODE_FOR_fpcmple32sishl,
+			     SPARC_BUILTIN_FPCMPLE32SHL, si_ftype_v2si_v2si_si);
+	  def_builtin_const ("__builtin_vis_fpcmpgt32shl", CODE_FOR_fpcmpgt32sishl,
+			     SPARC_BUILTIN_FPCMPGT32SHL, si_ftype_v2si_v2si_si);
+	  def_builtin_const ("__builtin_vis_fpcmpeq32shl", CODE_FOR_fpcmpeq32sishl,
+			     SPARC_BUILTIN_FPCMPEQ32SHL, si_ftype_v2si_v2si_si);
+	  def_builtin_const ("__builtin_vis_fpcmpne32shl", CODE_FOR_fpcmpne32sishl,
+			     SPARC_BUILTIN_FPCMPNE32SHL, si_ftype_v2si_v2si_si);
+
+
+	  def_builtin_const ("__builtin_vis_fpcmpule8shl", CODE_FOR_fpcmpule8sishl,
+			     SPARC_BUILTIN_FPCMPULE8SHL, si_ftype_v8qi_v8qi_si);
+	  def_builtin_const ("__builtin_vis_fpcmpugt8shl", CODE_FOR_fpcmpugt8sishl,
+			     SPARC_BUILTIN_FPCMPUGT8SHL, si_ftype_v8qi_v8qi_si);
+
+	  def_builtin_const ("__builtin_vis_fpcmpule16shl", CODE_FOR_fpcmpule16sishl,
+			     SPARC_BUILTIN_FPCMPULE16SHL, si_ftype_v4hi_v4hi_si);
+	  def_builtin_const ("__builtin_vis_fpcmpugt16shl", CODE_FOR_fpcmpugt16sishl,
+			     SPARC_BUILTIN_FPCMPUGT16SHL, si_ftype_v4hi_v4hi_si);
+
+	  def_builtin_const ("__builtin_vis_fpcmpule32shl", CODE_FOR_fpcmpule32sishl,
+			     SPARC_BUILTIN_FPCMPULE32SHL, si_ftype_v2si_v2si_si);
+	  def_builtin_const ("__builtin_vis_fpcmpugt32shl", CODE_FOR_fpcmpugt32sishl,
+			     SPARC_BUILTIN_FPCMPUGT32SHL, si_ftype_v2si_v2si_si);
+
+	  def_builtin_const ("__builtin_vis_fpcmpde8shl", CODE_FOR_fpcmpde8sishl,
+			     SPARC_BUILTIN_FPCMPDE8SHL, si_ftype_v8qi_v8qi_si);
+	  def_builtin_const ("__builtin_vis_fpcmpde16shl", CODE_FOR_fpcmpde16sishl,
+			     SPARC_BUILTIN_FPCMPDE16SHL, si_ftype_v4hi_v4hi_si);
+	  def_builtin_const ("__builtin_vis_fpcmpde32shl", CODE_FOR_fpcmpde32sishl,
+			     SPARC_BUILTIN_FPCMPDE32SHL, si_ftype_v2si_v2si_si);
+
+	  def_builtin_const ("__builtin_vis_fpcmpur8shl", CODE_FOR_fpcmpur8sishl,
+			     SPARC_BUILTIN_FPCMPUR8SHL, si_ftype_v8qi_v8qi_si);
+	  def_builtin_const ("__builtin_vis_fpcmpur16shl", CODE_FOR_fpcmpur16sishl,
+			     SPARC_BUILTIN_FPCMPUR16SHL, si_ftype_v4hi_v4hi_si);
+	  def_builtin_const ("__builtin_vis_fpcmpur32shl", CODE_FOR_fpcmpur32sishl,
+			     SPARC_BUILTIN_FPCMPUR32SHL, si_ftype_v2si_v2si_si);
+	}
+    }
 }

 /* Implement TARGET_BUILTIN_DECL hook.  */
@ -10948,6 +11218,19 @@ sparc_expand_builtin (tree exp, rtx target,
      insn_op = &insn_data[icode].operand[idx];
      op[arg_count] = expand_normal (arg);

+      /* Some of the builtins require constant arguments.  We check
+	 for this here.  */
+      if ((code >= SPARC_BUILTIN_FIRST_FPCMPSHL
+	   && code <= SPARC_BUILTIN_LAST_FPCMPSHL
+	   && arg_count == 3)
+	  || (code >= SPARC_BUILTIN_FIRST_DICTUNPACK
+	      && code <= SPARC_BUILTIN_LAST_DICTUNPACK
+	      && arg_count == 2))
+	{
+	  if (!check_constant_argument (icode, idx, op[arg_count]))
+	    return const0_rtx;
+	}
+
      if (code == SPARC_BUILTIN_LDFSR || code == SPARC_BUILTIN_STFSR)
 	{
 	  if (!address_operand (op[arg_count], SImode))
@ -11458,7 +11741,8 @@ sparc_register_move_cost (machine_mode mode ATTRIBUTE_UNUSED,
 	  || sparc_cpu == PROCESSOR_NIAGARA2
 	  || sparc_cpu == PROCESSOR_NIAGARA3
 	  || sparc_cpu == PROCESSOR_NIAGARA4
-	  || sparc_cpu == PROCESSOR_NIAGARA7)
+	  || sparc_cpu == PROCESSOR_NIAGARA7
+	  || sparc_cpu == PROCESSOR_M8)
 	return 12;

      return 6;
--- a/gcc/config/sparc/sparc.h
+++ b/gcc/config/sparc/sparc.h
@ -143,6 +143,7 @@ extern enum cmodel sparc_cmodel;
 #define TARGET_CPU_niagara3	15
 #define TARGET_CPU_niagara4	16
 #define TARGET_CPU_niagara7	19
+#define TARGET_CPU_m8		20

 #if TARGET_CPU_DEFAULT == TARGET_CPU_v9 \
 || TARGET_CPU_DEFAULT == TARGET_CPU_ultrasparc \
@ -151,7 +152,8 @@ extern enum cmodel sparc_cmodel;
 || TARGET_CPU_DEFAULT == TARGET_CPU_niagara2 \
 || TARGET_CPU_DEFAULT == TARGET_CPU_niagara3 \
 || TARGET_CPU_DEFAULT == TARGET_CPU_niagara4 \
- || TARGET_CPU_DEFAULT == TARGET_CPU_niagara7
+ || TARGET_CPU_DEFAULT == TARGET_CPU_niagara7 \
+ || TARGET_CPU_DEFAULT == TARGET_CPU_m8

 #define CPP_CPU32_DEFAULT_SPEC ""
 #define ASM_CPU32_DEFAULT_SPEC ""
@ -192,6 +194,10 @@ extern enum cmodel sparc_cmodel;
 #define CPP_CPU64_DEFAULT_SPEC "-D__sparc_v9__"
 #define ASM_CPU64_DEFAULT_SPEC AS_NIAGARA7_FLAG
 #endif
+#if TARGET_CPU_DEFAULT == TARGET_CPU_m8
+#define CPP_CPU64_DEFAULT_SPEC "-D__sparc_v9__"
+#define ASM_CPU64_DEFAULT_SPEC AS_M8_FLAG
+#endif

 #else

@ -295,6 +301,7 @@ extern enum cmodel sparc_cmodel;
 %{mcpu=niagara3:-D__sparc_v9__} \
 %{mcpu=niagara4:-D__sparc_v9__} \
 %{mcpu=niagara7:-D__sparc_v9__} \
+%{mcpu=m8:-D__sparc_v9__} \
 %{!mcpu*:%(cpp_cpu_default)} \
 "
 #define CPP_ARCH32_SPEC ""
@ -347,6 +354,7 @@ extern enum cmodel sparc_cmodel;
 %{mcpu=niagara3:%{!mv8plus:-Av9" AS_NIAGARA3_FLAG "}} \
 %{mcpu=niagara4:%{!mv8plus:" AS_NIAGARA4_FLAG "}} \
 %{mcpu=niagara7:%{!mv8plus:" AS_NIAGARA7_FLAG "}} \
+%{mcpu=m8:%{!mv8plus:" AS_M8_FLAG "}} \
 %{!mcpu*:%(asm_cpu_default)} \
 "

@ -1039,6 +1047,10 @@ extern char leaf_reg_remap[];
 /* Local macro to handle the two v9 classes of FP regs.  */
 #define FP_REG_CLASS_P(CLASS) ((CLASS) == FP_REGS || (CLASS) == EXTRA_FP_REGS)

+/* Predicate for 2-bit and 5-bit unsigned constants.  */
+#define SPARC_IMM2_P(X) (((unsigned HOST_WIDE_INT) (X) & ~0x3) == 0)
+#define SPARC_IMM5_P(X) (((unsigned HOST_WIDE_INT) (X) & ~0x1F)	== 0)
+
 /* Predicates for 5-bit, 10-bit, 11-bit and 13-bit signed constants.  */
 #define SPARC_SIMM5_P(X)  ((unsigned HOST_WIDE_INT) (X) + 0x10 < 0x20)
 #define SPARC_SIMM10_P(X) ((unsigned HOST_WIDE_INT) (X) + 0x200 < 0x400)
@ -1799,6 +1811,12 @@ extern int sparc_indent_opcode;
 #define AS_NIAGARA7_FLAG AS_NIAGARA4_FLAG
 #endif

+#ifdef HAVE_AS_SPARC6
+#define AS_M8_FLAG "-xarch=sparc6"
+#else
+#define AS_M8_FLAG AS_NIAGARA7_FLAG
+#endif
+
 #ifdef HAVE_AS_LEON
 #define AS_LEON_FLAG "-Aleon"
 #define AS_LEONV7_FLAG "-Aleon"
--- a/gcc/config/sparc/sparc.md
+++ b/gcc/config/sparc/sparc.md
@ -94,6 +94,12 @@
  UNSPEC_ADDV
  UNSPEC_SUBV
  UNSPEC_NEGV
+
+  UNSPEC_DICTUNPACK
+  UNSPEC_FPCMPSHL
+  UNSPEC_FPUCMPSHL
+  UNSPEC_FPCMPDESHL
+  UNSPEC_FPCMPURSHL
 ])

 (define_c_enum "unspecv" [
@ -238,7 +244,8 @@
   niagara2,
   niagara3,
   niagara4,
-   niagara7"
+   niagara7,
+   m8"
  (const (symbol_ref "sparc_cpu_attr")))

 ;; Attribute for the instruction set.
@ -251,7 +258,7 @@
 	 (symbol_ref "TARGET_SPARCLET") (const_string "sparclet")]
 	(const_string "v7"))))

-(define_attr "cpu_feature" "none,fpu,fpunotv9,v9,vis,vis3,vis4"
+(define_attr "cpu_feature" "none,fpu,fpunotv9,v9,vis,vis3,vis4,vis4b"
  (const_string "none"))

 (define_attr "lra" "disabled,enabled"
@ -265,10 +272,92 @@
         (eq_attr "cpu_feature" "v9") (symbol_ref "TARGET_V9")
         (eq_attr "cpu_feature" "vis") (symbol_ref "TARGET_VIS")
         (eq_attr "cpu_feature" "vis3") (symbol_ref "TARGET_VIS3")
-         (eq_attr "cpu_feature" "vis4") (symbol_ref "TARGET_VIS4")]
+         (eq_attr "cpu_feature" "vis4") (symbol_ref "TARGET_VIS4")
+         (eq_attr "cpu_feature" "vis4b") (symbol_ref "TARGET_VIS4B")]
        (const_int 0)))

-;; Insn type.
+;; The SPARC instructions used by the backend are organized into a
+;; hierarchy using the insn attributes "type" and "subtype".
+;;
+;; The mnemonics used in the list below are the architectural names
+;; used in the Oracle SPARC Architecture specs.  A / character
+;; separates the type from the subtype where appropriate.  For
+;; brevity, text enclosed in {} denotes alternatives, while text
+;; enclosed in [] is optional.
+;;
+;; Please keep this list updated.  It is of great help for keeping the
+;; correctness and coherence of the DFA schedulers.
+;;
+;; ialu:  <empty>
+;; ialuX: ADD[X]C SUB[X]C
+;; shift: SLL[X] SRL[X] SRA[X]
+;; cmove: MOV{A,N,NE,E,G,LE,GE,L,GU,LEU,CC,CS,POS,NEG,VC,VS}
+;;        MOVF{A,N,U,G,UG,L,UL,LG,NE,E,UE,GE,UGE,LE,ULE,O}
+;;        MOVR{Z,LEZ,LZ,NZ,GZ,GEZ}
+;; compare: ADDcc ADDCcc ANDcc ORcc SUBcc SUBCcc XORcc XNORcc
+;; imul: MULX SMUL[cc] UMUL UMULXHI XMULX XMULXHI
+;; idiv: UDIVX SDIVX
+;; flush: FLUSH
+;; load/regular: LD{UB,UH,UW} LDFSR
+;; load/prefetch: PREFETCH
+;; fpload: LDF LDDF LDQF
+;; sload: LD{SB,SH,SW}
+;; store: ST{B,H,W,X} STFSR
+;; fpstore: STF STDF STQF
+;; cbcond: CWB{NE,E,G,LE,GE,L,GU,LEU,CC,CS,POS,NEG,VC,VS}
+;;         CXB{NE,E,G,LE,GE,L,GU,LEU,CC,CS,POS,NEG,VC,VS}
+;; uncond_branch: BA BPA JMPL
+;; branch: B{NE,E,G,LE,GE,L,GU,LEU,CC,CS,POS,NEG,VC,VS}
+;;         BP{NE,E,G,LE,GE,L,GU,LEU,CC,CS,POS,NEG,VC,VS}
+;;         FB{U,G,UG,L,UL,LG,NE,BE,UE,GE,UGE,LE,ULE,O}
+;; call: CALL
+;; return: RESTORE RETURN
+;; fpmove: FABS{s,d,q} FMOV{s,d,q} FNEG{s,d,q}
+;; fpcmove: FMOV{S,D,Q}{icc,xcc,fcc}
+;; fpcrmove: FMOVR{s,d,q}{Z,LEZ,LZ,NZ,GZ,GEZ}
+;; fp: FADD{s,d,q} FSUB{s,d,q} FHSUB{s,d} FNHADD{s,d} FNADD{s,d}
+;;     FiTO{s,d,q} FsTO{i,x,d,q} FdTO{i,x,s,q} FxTO{d,s,q} FqTO{i,x,s,d}
+;; fpcmp: FCMP{s,d,q} FCMPE{s,d,q}
+;; fpmul: FMADD{s,d}  FMSUB{s,d} FMUL{s,d,q} FNMADD{s,d}
+;;        FNMSUB{s,d} FNMUL{s,d} FNsMULd FsMULd
+;;        FdMULq
+;; array: ARRAY{8,16,32}
+;; bmask: BMASK
+;; edge: EDGE{8,16,32}[L]cc
+;; edgen: EDGE{8,16,32}[L]n
+;; fpdivs: FDIV{s,q}
+;; fpsqrts: FSQRT{s,q}
+;; fpdivd: FDIVd
+;; fpsqrtd: FSQRTd
+;; lzd: LZCNT
+;; fga/addsub64: FP{ADD,SUB}64
+;; fga/fpu: FCHKSM16 FEXPANd FMEAN16 FPMERGE
+;;          FS{LL,RA,RL}{16,32}
+;; fga/maxmin: FP{MAX,MIN}[U]{8,16,32}
+;; fga/cmask: CMASK{8,16,32}
+;; fga/other: BSHUFFLE FALIGNDATAg FP{ADD,SUB}[S]{8,16,32}
+;;            FP{ADD,SUB}US{8,16} DICTUNPACK
+;; gsr/reg: RDGSR WRGSR
+;; gsr/alignaddr: ALIGNADDRESS[_LITTLE]
+;; vismv/double:  FSRC2d
+;; vismv/single:  MOVwTOs FSRC2s
+;; vismv/movstouw: MOVsTOuw
+;; vismv/movxtod: MOVxTOd
+;; vismv/movdtox: MOVdTOx
+;; visl/single: F{AND,NAND,NOR,OR,NOT1}s
+;;              F{AND,OR}NOT{1,2}s
+;;              FONEs F{ZERO,XNOR,XOR}s FNOT2s
+;; visl/double: FONEd FZEROd FNOT1d F{OR,AND,XOR}d F{NOR,NAND,XNOR}d
+;;              F{OR,AND}NOT1d F{OR,AND}NOT2d
+;; viscmp: FPCMP{LE,GT,NE,EQ}{8,16,32} FPCMPU{LE,GT,NE,EQ}{8,16,32}
+;;         FPCMP{LE,GT,EQ,NE}{8,16,32}SHL FPCMPU{LE,GT,EQ,NE}{8,16,32}SHL
+;;         FPCMPDE{8,16,32}SHL FPCMPUR{8,16,32}SHL
+;; fgm_pack: FPACKFIX FPACK{8,16,32}
+;; fgm_mul: FMUL8SUx16 FMUL8ULx16 FMUL8x16 FMUL8x16AL
+;;          FMUL8x16AU FMULD8SUx16 FMULD8ULx16
+;; pdist: PDIST
+;; pdistn: PDISTN
+
 (define_attr "type"
  "ialu,compare,shift,
   load,sload,store,
@ -281,12 +370,20 @@
   fpcmp,
   fpmul,fpdivs,fpdivd,
   fpsqrts,fpsqrtd,
-   fga,visl,vismv,fgm_pack,fgm_mul,pdist,pdistn,edge,edgen,gsr,array,
+   fga,visl,vismv,viscmp,
+   fgm_pack,fgm_mul,pdist,pdistn,edge,edgen,gsr,array,bmask,
   cmove,
   ialuX,
   multi,savew,flushw,iflush,trap,lzd"
  (const_string "ialu"))

+(define_attr "subtype"
+  "single,double,movstouw,movxtod,movdtox,
+   addsub64,cmask,fpu,maxmin,other,
+   reg,alignaddr,
+   prefetch,regular"
+  (const_string "single"))
+
 ;; True if branch/call has empty delay slot and will emit a nop in it
 (define_attr "empty_delay_slot" "false,true"
  (symbol_ref "(empty_delay_slot (insn)
@ -487,9 +584,6 @@
 	   (const_string "true")
 	] (const_string "false")))

-;; True if the instruction executes in the V3 pipeline, in M7 and later processors.
-(define_attr "v3pipe" "false,true" (const_string "false"))
-
 (define_delay (eq_attr "type" "call")
  [(eq_attr "in_call_delay" "true") (nil) (nil)])

@ -519,6 +613,7 @@
 (include "niagara2.md")
 (include "niagara4.md")
 (include "niagara7.md")
+(include "m8.md")


 ;; Operand and operator predicates and constraints
@ -1507,6 +1602,7 @@
   ldub\t%1, %0
   stb\t%r1, %0"
  [(set_attr "type" "*,load,store")
+   (set_attr "subtype" "*,regular,*")
   (set_attr "us3load_type" "*,3cycle,*")])

 (define_expand "movhi"
@ -1529,6 +1625,7 @@
   lduh\t%1, %0
   sth\t%r1, %0"
  [(set_attr "type" "*,*,load,store")
+   (set_attr "subtype" "*,*,regular,*")
   (set_attr "us3load_type" "*,*,3cycle,*")])

 ;; We always work with constants here.
@ -1566,8 +1663,8 @@
   fzeros\t%0
   fones\t%0"
  [(set_attr "type" "*,*,load,store,vismv,vismv,fpmove,fpload,fpstore,visl,visl")
-   (set_attr "cpu_feature" "*,*,*,*,vis3,vis3,*,*,*,vis,vis")
-   (set_attr "v3pipe" "*,*,*,*,true,true,*,*,*,true,true")])
+   (set_attr "subtype" "*,*,regular,*,movstouw,single,*,*,*,single,single")
+   (set_attr "cpu_feature" "*,*,*,*,vis3,vis3,*,*,*,vis,vis")])

 (define_insn "*movsi_lo_sum"
  [(set (match_operand:SI 0 "register_operand" "=r")
@ -1624,7 +1721,8 @@
  return "ld\t[%1 + %2], %0";
 #endif
 }
-  [(set_attr "type" "load")])
+  [(set_attr "type" "load")
+   (set_attr "subtype" "regular")])

 (define_expand "movsi_pic_label_ref"
  [(set (match_dup 3) (high:SI
@ -1733,11 +1831,12 @@
   std\t%1, %0
   fzero\t%0
   fone\t%0"
-  [(set_attr "type" "store,*,load,store,load,store,*,*,fpload,fpstore,*,*,fpmove,*,*,*,fpload,fpstore,visl,visl")
+  [(set_attr "type" "store,*,load,store,load,store,*,*,fpload,fpstore,*,*,fpmove,*,*,*,fpload,fpstore,visl,
+visl")
+   (set_attr "subtype" "*,*,regular,*,regular,*,*,*,*,*,*,*,*,*,*,*,*,*,double,double")
   (set_attr "length" "*,2,*,*,*,*,2,2,*,*,2,2,*,2,2,2,*,*,*,*")
   (set_attr "fptype" "*,*,*,*,*,*,*,*,*,*,*,*,double,*,*,*,*,*,double,double")
   (set_attr "cpu_feature" "v9,*,*,*,*,*,*,*,fpu,fpu,fpu,fpu,v9,fpunotv9,vis3,vis3,fpu,fpu,vis,vis")
-   (set_attr "v3pipe" "*,*,*,*,*,*,*,*,*,*,*,*,*,*,*,*,*,*,true,true")
   (set_attr "lra" "*,*,disabled,disabled,*,*,*,*,*,*,*,*,*,*,*,*,*,*,*,*")])

 (define_insn "*movdi_insn_sp64"
@ -1759,9 +1858,9 @@
   fzero\t%0
   fone\t%0"
  [(set_attr "type" "*,*,load,store,vismv,vismv,fpmove,fpload,fpstore,visl,visl")
+   (set_attr "subtype" "*,*,regular,*,movdtox,movxtod,*,*,*,double,double")
   (set_attr "fptype" "*,*,*,*,*,*,double,*,*,double,double")
-   (set_attr "cpu_feature" "*,*,*,*,vis3,vis3,*,*,*,vis,vis")
-   (set_attr "v3pipe" "*,*,*,*,*,*,*,*,*,true,true")])
+   (set_attr "cpu_feature" "*,*,*,*,vis3,vis3,*,*,*,vis,vis")])

 (define_expand "movdi_pic_label_ref"
  [(set (match_dup 3) (high:DI
@ -1847,7 +1946,8 @@
  return "ldx\t[%1 + %2], %0";
 #endif
 }
-  [(set_attr "type" "load")])
+  [(set_attr "type" "load")
+   (set_attr "subtype" "regular")])

 (define_insn "*sethi_di_medlow_embmedany_pic"
  [(set (match_operand:DI 0 "register_operand" "=r")
@ -2289,8 +2389,8 @@
    }
 }
  [(set_attr "type" "visl,visl,fpmove,*,*,*,vismv,vismv,fpload,load,fpstore,store")
-   (set_attr "cpu_feature" "vis,vis,fpu,*,*,*,vis3,vis3,fpu,*,fpu,*")
-   (set_attr "v3pipe" "true,true,*,*,*,*,true,true,*,*,*,*")])
+   (set_attr "subtype" "single,single,*,*,*,*,movstouw,single,*,regular,*,*")
+   (set_attr "cpu_feature" "vis,vis,fpu,*,*,*,vis3,vis3,fpu,*,fpu,*")])

 ;; The following 3 patterns build SFmode constants in integer registers.

@ -2362,10 +2462,10 @@
  ldd\t%1, %0
  std\t%1, %0"
  [(set_attr "type" "store,*,visl,visl,fpmove,*,*,*,fpload,fpstore,load,store,*,*,*,load,store")
+   (set_attr "subtype" "*,*,double,double,*,*,*,*,*,*,regular,*,*,*,*,regular,*")
   (set_attr "length" "*,2,*,*,*,2,2,2,*,*,*,*,2,2,2,*,*")
   (set_attr "fptype" "*,*,double,double,double,*,*,*,*,*,*,*,*,*,*,*,*")
   (set_attr "cpu_feature" "v9,*,vis,vis,v9,fpunotv9,vis3,vis3,fpu,fpu,*,*,fpu,fpu,*,*,*")
-   (set_attr "v3pipe" "*,*,true,true,*,*,*,*,*,*,*,*,*,*,*,*,*")
   (set_attr "lra" "*,*,*,*,*,*,*,*,*,*,disabled,disabled,*,*,*,*,*")])

 (define_insn "*movdf_insn_sp64"
@ -2387,10 +2487,10 @@
  stx\t%r1, %0
  #"
  [(set_attr "type" "visl,visl,fpmove,vismv,vismv,load,store,*,load,store,*")
+   (set_attr "subtype" "double,double,*,movdtox,movxtod,regular,*,*,regular,*,*")
   (set_attr "length" "*,*,*,*,*,*,*,*,*,*,2")
   (set_attr "fptype" "double,double,double,double,double,*,*,*,*,*,*")
-   (set_attr "cpu_feature" "vis,vis,fpu,vis3,vis3,fpu,fpu,*,*,*,*")
-   (set_attr "v3pipe" "true,true,*,*,*,*,*,*,*,*,*")])
+   (set_attr "cpu_feature" "vis,vis,fpu,vis3,vis3,fpu,fpu,*,*,*,*")])

 ;; This pattern builds DFmode constants in integer registers.
 (define_split
@ -2916,6 +3016,7 @@
  ""
  "lduh\t%1, %0"
  [(set_attr "type" "load")
+   (set_attr "subtype" "regular")
   (set_attr "us3load_type" "3cycle")])

 (define_expand "zero_extendqihi2"
@ -2932,6 +3033,7 @@
   and\t%1, 0xff, %0
   ldub\t%1, %0"
  [(set_attr "type" "*,load")
+   (set_attr "subtype" "*,regular")
   (set_attr "us3load_type" "*,3cycle")])

 (define_expand "zero_extendqisi2"
@ -2948,6 +3050,7 @@
   and\t%1, 0xff, %0
   ldub\t%1, %0"
  [(set_attr "type" "*,load")
+   (set_attr "subtype" "*,regular")
   (set_attr "us3load_type" "*,3cycle")])

 (define_expand "zero_extendqidi2"
@ -2964,6 +3067,7 @@
   and\t%1, 0xff, %0
   ldub\t%1, %0"
  [(set_attr "type" "*,load")
+   (set_attr "subtype" "*,regular")
   (set_attr "us3load_type" "*,3cycle")])

 (define_expand "zero_extendhidi2"
@ -2995,6 +3099,7 @@
  "TARGET_ARCH64"
  "lduh\t%1, %0"
  [(set_attr "type" "load")
+   (set_attr "subtype" "regular")
   (set_attr "us3load_type" "3cycle")])

 ;; ??? Write truncdisi pattern using sra?
@ -3015,8 +3120,9 @@
   lduw\t%1, %0
   movstouw\t%1, %0"
  [(set_attr "type" "shift,load,*")
-   (set_attr "cpu_feature" "*,*,vis3")
-   (set_attr "v3pipe" "*,*,true")])
+   (set_attr "subtype" "*,regular,movstouw")
+   (set_attr "cpu_feature" "*,*,vis3")])
+

 (define_insn_and_split "*zero_extendsidi2_insn_sp32"
  [(set (match_operand:DI 0 "register_operand" "=r")
@ -3331,8 +3437,7 @@
  movstosw\t%1, %0"
  [(set_attr "type" "shift,sload,*")
   (set_attr "us3load_type" "*,3cycle,*")
-   (set_attr "cpu_feature" "*,*,vis3")
-   (set_attr "v3pipe" "*,*,true")])
+   (set_attr "cpu_feature" "*,*,vis3")])


 ;; Special pattern for optimizing bit-field compares.  This is needed
@ -7356,7 +7461,8 @@
  [(unspec_volatile [(match_operand:SI 0 "memory_operand" "m")] UNSPECV_LDFSR)]
  "TARGET_FPU"
  "ld\t%0, %%fsr"
-  [(set_attr "type" "load")])
+  [(set_attr "type" "load")
+   (set_attr "subtype" "regular")])

 (define_insn "stfsr"
  [(set (match_operand:SI 0 "memory_operand" "=m")
@ -7720,7 +7826,8 @@
  gcc_assert (locality >= 0 && locality < 4);
  return prefetch_instr [read_or_write][locality == 0 ? 0 : 1];
 }
-  [(set_attr "type" "load")])
+  [(set_attr "type" "load")
+   (set_attr "subtype" "prefetch")])

 (define_insn "prefetch_32"
  [(prefetch (match_operand:SI 0 "address_operand" "p")
@ -7745,7 +7852,8 @@
  gcc_assert (locality >= 0 && locality < 4);
  return prefetch_instr [read_or_write][locality == 0 ? 0 : 1];
 }
-  [(set_attr "type" "load")])
+  [(set_attr "type" "load")
+   (set_attr "subtype" "prefetch")])


 ;; Trap instructions.
@ -7966,7 +8074,8 @@
 		   UNSPEC_TLSIE))]
  "TARGET_TLS && TARGET_ARCH32"
  "ld\\t[%1 + %2], %0, %%tie_ld(%a3)"
-  [(set_attr "type" "load")])
+  [(set_attr "type" "load")
+   (set_attr "subtype" "regular")])

 (define_insn "tie_ld64"
  [(set (match_operand:DI 0 "register_operand" "=r")
@ -7976,7 +8085,8 @@
 		   UNSPEC_TLSIE))]
  "TARGET_TLS && TARGET_ARCH64"
  "ldx\\t[%1 + %2], %0, %%tie_ldx(%a3)"
-  [(set_attr "type" "load")])
+  [(set_attr "type" "load")
+   (set_attr "subtype" "regular")])

 (define_insn "tie_add32"
  [(set (match_operand:SI 0 "register_operand" "=r")
@ -8036,6 +8146,7 @@
  "TARGET_TLS && TARGET_ARCH32"
  "ldub\t[%1 + %2], %0, %%tldo_add(%3)"
  [(set_attr "type" "load")
+   (set_attr "subtype" "regular")
   (set_attr "us3load_type" "3cycle")])

 (define_insn "*tldo_ldub1_sp32"
@ -8048,6 +8159,7 @@
  "TARGET_TLS && TARGET_ARCH32"
  "ldub\t[%1 + %2], %0, %%tldo_add(%3)"
  [(set_attr "type" "load")
+   (set_attr "subtype" "regular")
   (set_attr "us3load_type" "3cycle")])

 (define_insn "*tldo_ldub2_sp32"
@ -8060,6 +8172,7 @@
  "TARGET_TLS && TARGET_ARCH32"
  "ldub\t[%1 + %2], %0, %%tldo_add(%3)"
  [(set_attr "type" "load")
+   (set_attr "subtype" "regular")
   (set_attr "us3load_type" "3cycle")])

 (define_insn "*tldo_ldsb1_sp32"
@ -8095,6 +8208,7 @@
  "TARGET_TLS && TARGET_ARCH64"
  "ldub\t[%1 + %2], %0, %%tldo_add(%3)"
  [(set_attr "type" "load")
+   (set_attr "subtype" "regular")
   (set_attr "us3load_type" "3cycle")])

 (define_insn "*tldo_ldub1_sp64"
@ -8107,6 +8221,7 @@
  "TARGET_TLS && TARGET_ARCH64"
  "ldub\t[%1 + %2], %0, %%tldo_add(%3)"
  [(set_attr "type" "load")
+   (set_attr "subtype" "regular")
   (set_attr "us3load_type" "3cycle")])

 (define_insn "*tldo_ldub2_sp64"
@ -8119,6 +8234,7 @@
  "TARGET_TLS && TARGET_ARCH64"
  "ldub\t[%1 + %2], %0, %%tldo_add(%3)"
  [(set_attr "type" "load")
+   (set_attr "subtype" "regular")
   (set_attr "us3load_type" "3cycle")])

 (define_insn "*tldo_ldub3_sp64"
@ -8131,6 +8247,7 @@
  "TARGET_TLS && TARGET_ARCH64"
  "ldub\t[%1 + %2], %0, %%tldo_add(%3)"
  [(set_attr "type" "load")
+   (set_attr "subtype" "regular")
   (set_attr "us3load_type" "3cycle")])

 (define_insn "*tldo_ldsb1_sp64"
@ -8178,6 +8295,7 @@
  "TARGET_TLS && TARGET_ARCH32"
  "lduh\t[%1 + %2], %0, %%tldo_add(%3)"
  [(set_attr "type" "load")
+   (set_attr "subtype" "regular")
   (set_attr "us3load_type" "3cycle")])

 (define_insn "*tldo_lduh1_sp32"
@ -8190,6 +8308,7 @@
  "TARGET_TLS && TARGET_ARCH32"
  "lduh\t[%1 + %2], %0, %%tldo_add(%3)"
  [(set_attr "type" "load")
+   (set_attr "subtype" "regular")
   (set_attr "us3load_type" "3cycle")])

 (define_insn "*tldo_ldsh1_sp32"
@ -8213,6 +8332,7 @@
  "TARGET_TLS && TARGET_ARCH64"
  "lduh\t[%1 + %2], %0, %%tldo_add(%3)"
  [(set_attr "type" "load")
+   (set_attr "subtype" "regular")
   (set_attr "us3load_type" "3cycle")])

 (define_insn "*tldo_lduh1_sp64"
@ -8225,6 +8345,7 @@
  "TARGET_TLS && TARGET_ARCH64"
  "lduh\t[%1 + %2], %0, %%tldo_add(%3)"
  [(set_attr "type" "load")
+   (set_attr "subtype" "regular")
   (set_attr "us3load_type" "3cycle")])

 (define_insn "*tldo_lduh2_sp64"
@ -8237,6 +8358,7 @@
  "TARGET_TLS && TARGET_ARCH64"
  "lduh\t[%1 + %2], %0, %%tldo_add(%3)"
  [(set_attr "type" "load")
+   (set_attr "subtype" "regular")
   (set_attr "us3load_type" "3cycle")])

 (define_insn "*tldo_ldsh1_sp64"
@ -8271,7 +8393,8 @@
 			 (match_operand:SI 1 "register_operand" "r"))))]
  "TARGET_TLS && TARGET_ARCH32"
  "ld\t[%1 + %2], %0, %%tldo_add(%3)"
-  [(set_attr "type" "load")])
+  [(set_attr "type" "load")
+   (set_attr "subtype" "regular")])

 (define_insn "*tldo_lduw_sp64"
  [(set (match_operand:SI 0 "register_operand" "=r")
@ -8281,7 +8404,8 @@
 			 (match_operand:DI 1 "register_operand" "r"))))]
  "TARGET_TLS && TARGET_ARCH64"
  "lduw\t[%1 + %2], %0, %%tldo_add(%3)"
-  [(set_attr "type" "load")])
+  [(set_attr "type" "load")
+   (set_attr "subtype" "regular")])

 (define_insn "*tldo_lduw1_sp64"
  [(set (match_operand:DI 0 "register_operand" "=r")
@ -8292,7 +8416,8 @@
 			   (match_operand:DI 1 "register_operand" "r")))))]
  "TARGET_TLS && TARGET_ARCH64"
  "lduw\t[%1 + %2], %0, %%tldo_add(%3)"
-  [(set_attr "type" "load")])
+  [(set_attr "type" "load")
+   (set_attr "subtype" "regular")])

 (define_insn "*tldo_ldsw1_sp64"
  [(set (match_operand:DI 0 "register_operand" "=r")
@ -8314,7 +8439,8 @@
 			 (match_operand:DI 1 "register_operand" "r"))))]
  "TARGET_TLS && TARGET_ARCH64"
  "ldx\t[%1 + %2], %0, %%tldo_add(%3)"
-  [(set_attr "type" "load")])
+  [(set_attr "type" "load")
+   (set_attr "subtype" "regular")])

 (define_insn "*tldo_stb_sp32"
  [(set (mem:QI (plus:SI (unspec:SI [(match_operand:SI 2 "register_operand" "r")
@ -8519,8 +8645,8 @@
  movstouw\t%1, %0
  movwtos\t%1, %0"
  [(set_attr "type" "visl,visl,vismv,fpload,fpstore,store,load,store,*,vismv,vismv")
-   (set_attr "cpu_feature" "vis,vis,vis,*,*,*,*,*,*,vis3,vis3")
-   (set_attr "v3pipe" "true,true,true,*,*,*,*,*,*,true,true")])
+   (set_attr "subtype" "single,single,single,*,*,*,regular,*,*,movstouw,single")
+   (set_attr "cpu_feature" "vis,vis,vis,*,*,*,*,*,*,vis3,vis3")])

 (define_insn "*mov<VM64:mode>_insn_sp64"
  [(set (match_operand:VM64 0 "nonimmediate_operand" "=e,e,e,e,W,m,*r, m,*r, e,*r")
@ -8542,8 +8668,8 @@
  movxtod\t%1, %0
  mov\t%1, %0"
  [(set_attr "type" "visl,visl,vismv,fpload,fpstore,store,load,store,vismv,vismv,*")
-   (set_attr "cpu_feature" "vis,vis,vis,*,*,*,*,*,vis3,vis3,*")
-   (set_attr "v3pipe" "true,true,true,*,*,*,*,*,*,*,*")])
+   (set_attr "subtype" "double,double,double,*,*,*,regular,*,movdtox,movxtod,*")
+   (set_attr "cpu_feature" "vis,vis,vis,*,*,*,*,*,vis3,vis3,*")])

 (define_insn "*mov<VM64:mode>_insn_sp32"
  [(set (match_operand:VM64 0 "nonimmediate_operand"
@ -8572,9 +8698,9 @@
  ldd\t%1, %0
  std\t%1, %0"
  [(set_attr "type" "store,*,visl,visl,vismv,*,*,fpload,fpstore,load,store,*,*,*,load,store")
+   (set_attr "subtype" "*,*,double,double,double,*,*,*,*,regular,*,*,*,*,regular,*")
   (set_attr "length" "*,2,*,*,*,2,2,*,*,*,*,2,2,2,*,*")
   (set_attr "cpu_feature" "*,*,vis,vis,vis,vis3,vis3,*,*,*,*,*,*,*,*,*")
-   (set_attr "v3pipe" "*,*,true,true,true,*,*,*,*,*,*,*,*,*,*,*")
   (set_attr "lra" "*,*,*,*,*,*,*,*,*,disabled,disabled,*,*,*,*,*")])

 (define_split
@ -8652,8 +8778,8 @@
  "TARGET_VIS"
  "fp<plusminus_insn><vbits>\t%1, %2, %0"
  [(set_attr "type" "fga")
-   (set_attr "fptype" "<vfptype>")
-   (set_attr "v3pipe" "true")])
+   (set_attr "subtype" "other")
+   (set_attr "fptype" "<vfptype>")])

 (define_mode_iterator VL [V1SI V2HI V4QI V1DI V2SI V4HI V8QI])
 (define_mode_attr vlsuf [(V1SI "s") (V2HI "s") (V4QI "s")
@ -8669,8 +8795,7 @@
  "TARGET_VIS"
  "f<vlinsn><vlsuf>\t%1, %2, %0"
  [(set_attr "type" "visl")
-   (set_attr "fptype" "<vfptype>")
-   (set_attr "v3pipe" "true")])
+   (set_attr "fptype" "<vfptype>")])

 (define_insn "*not_<vlop:code><VL:mode>3"
  [(set (match_operand:VL 0 "register_operand" "=<vconstr>")
@ -8679,8 +8804,7 @@
  "TARGET_VIS"
  "f<vlninsn><vlsuf>\t%1, %2, %0"
  [(set_attr "type" "visl")
-   (set_attr "fptype" "<vfptype>")
-   (set_attr "v3pipe" "true")])
+   (set_attr "fptype" "<vfptype>")])

 ;; (ior (not (op1)) (not (op2))) is the canonical form of NAND.
 (define_insn "*nand<VL:mode>_vis"
@ -8690,8 +8814,7 @@
  "TARGET_VIS"
  "fnand<vlsuf>\t%1, %2, %0"
  [(set_attr "type" "visl")
-   (set_attr "fptype" "<vfptype>")
-   (set_attr "v3pipe" "true")])
+   (set_attr "fptype" "<vfptype>")])

 (define_code_iterator vlnotop [ior and])

@ -8702,8 +8825,7 @@
  "TARGET_VIS"
  "f<vlinsn>not1<vlsuf>\t%1, %2, %0"
  [(set_attr "type" "visl")
-   (set_attr "fptype" "<vfptype>")
-   (set_attr "v3pipe" "true")])
+   (set_attr "fptype" "<vfptype>")])

 (define_insn "*<vlnotop:code>_not2<VL:mode>_vis"
  [(set (match_operand:VL 0 "register_operand" "=<vconstr>")
@ -8712,8 +8834,7 @@
  "TARGET_VIS"
  "f<vlinsn>not2<vlsuf>\t%1, %2, %0"
  [(set_attr "type" "visl")
-   (set_attr "fptype" "<vfptype>")
-   (set_attr "v3pipe" "true")])
+   (set_attr "fptype" "<vfptype>")])

 (define_insn "one_cmpl<VL:mode>2"
  [(set (match_operand:VL 0 "register_operand" "=<vconstr>")
@ -8721,8 +8842,7 @@
  "TARGET_VIS"
  "fnot1<vlsuf>\t%1, %0"
  [(set_attr "type" "visl")
-   (set_attr "fptype" "<vfptype>")
-   (set_attr "v3pipe" "true")])
+   (set_attr "fptype" "<vfptype>")])

 ;; Hard to generate VIS instructions.  We have builtins for these.

@ -8764,6 +8884,7 @@
 "TARGET_VIS"
 "fexpand\t%1, %0"
 [(set_attr "type" "fga")
+  (set_attr "subtype" "fpu")
  (set_attr "fptype" "double")])

 (define_insn "fpmerge_vis"
@ -8778,6 +8899,7 @@
 "TARGET_VIS"
 "fpmerge\t%1, %2, %0"
 [(set_attr "type" "fga")
+  (set_attr "subtype" "fpu")
  (set_attr "fptype" "double")])

 ;; Partitioned multiply instructions
@ -8866,7 +8988,8 @@
  [(set (reg:DI GSR_REG) (match_operand:DI 0 "arith_operand" "rI"))]
  "TARGET_VIS && TARGET_ARCH64"
  "wr\t%%g0, %0, %%gsr"
-  [(set_attr "type" "gsr")])
+  [(set_attr "type" "gsr")
+   (set_attr "subtype" "reg")])

 (define_insn "wrgsr_v8plus"
  [(set (reg:DI GSR_REG) (match_operand:DI 0 "arith_operand" "I,r"))
@ -8897,7 +9020,8 @@
  [(set (match_operand:DI 0 "register_operand" "=r") (reg:DI GSR_REG))]
  "TARGET_VIS && TARGET_ARCH64"
  "rd\t%%gsr, %0"
-  [(set_attr "type" "gsr")])
+  [(set_attr "type" "gsr")
+   (set_attr "subtype" "reg")])

 (define_insn "rdgsr_v8plus"
  [(set (match_operand:DI 0 "register_operand" "=r") (reg:DI GSR_REG))
@ -8920,8 +9044,8 @@
  "TARGET_VIS"
  "faligndata\t%1, %2, %0"
  [(set_attr "type" "fga")
-   (set_attr "fptype" "double")
-   (set_attr "v3pipe" "true")])
+   (set_attr "subtype" "other")
+   (set_attr "fptype" "double")])

 (define_insn "alignaddrsi_vis"
  [(set (match_operand:SI 0 "register_operand" "=r")
@ -8932,7 +9056,7 @@
  "TARGET_VIS"
  "alignaddr\t%r1, %r2, %0"
  [(set_attr "type" "gsr")
-   (set_attr "v3pipe" "true")])
+   (set_attr "subtype" "alignaddr")])

 (define_insn "alignaddrdi_vis"
  [(set (match_operand:DI 0 "register_operand" "=r")
@ -8943,7 +9067,7 @@
  "TARGET_VIS"
  "alignaddr\t%r1, %r2, %0"
  [(set_attr "type" "gsr")
-   (set_attr "v3pipe" "true")])
+   (set_attr "subtype" "alignaddr")])

 (define_insn "alignaddrlsi_vis"
  [(set (match_operand:SI 0 "register_operand" "=r")
@ -8955,7 +9079,7 @@
  "TARGET_VIS"
  "alignaddrl\t%r1, %r2, %0"
  [(set_attr "type" "gsr")
-   (set_attr "v3pipe" "true")])
+   (set_attr "subtype" "alignaddr")])

 (define_insn "alignaddrldi_vis"
  [(set (match_operand:DI 0 "register_operand" "=r")
@ -8967,7 +9091,7 @@
  "TARGET_VIS"
  "alignaddrl\t%r1, %r2, %0"
  [(set_attr "type" "gsr")
-   (set_attr "v3pipe" "true")])
+   (set_attr "subtype" "alignaddr")])

 (define_insn "pdist_vis"
  [(set (match_operand:DI 0 "register_operand" "=e")
@ -9059,9 +9183,7 @@
 	 UNSPEC_FCMP))]
  "TARGET_VIS"
  "fcmp<gcond:code><GCM:gcm_name>\t%1, %2, %0"
-  [(set_attr "type" "visl")
-   (set_attr "fptype" "double")
-   (set_attr "v3pipe" "true")])
+  [(set_attr "type" "viscmp")])

 (define_insn "fpcmp<gcond:code>8<P:mode>_vis"
  [(set (match_operand:P 0 "register_operand" "=r")
@ -9070,8 +9192,7 @@
 	 UNSPEC_FCMP))]
  "TARGET_VIS4"
  "fpcmp<gcond:code>8\t%1, %2, %0"
-  [(set_attr "type" "visl")
-   (set_attr "fptype" "double")])
+  [(set_attr "type" "viscmp")])

 (define_expand "vcond<GCM:mode><GCM:mode>"
  [(match_operand:GCM 0 "register_operand" "")
@ -9134,8 +9255,7 @@
        (plus:DI (match_dup 1) (match_dup 2)))]
  "TARGET_VIS2 && TARGET_ARCH64"
  "bmask\t%r1, %r2, %0"
-  [(set_attr "type" "array")
-   (set_attr "v3pipe" "true")])
+  [(set_attr "type" "bmask")])

 (define_insn "bmasksi_vis"
  [(set (match_operand:SI 0 "register_operand" "=r")
@ -9145,8 +9265,7 @@
        (zero_extend:DI (plus:SI (match_dup 1) (match_dup 2))))]
  "TARGET_VIS2"
  "bmask\t%r1, %r2, %0"
-  [(set_attr "type" "array")
-   (set_attr "v3pipe" "true")])
+  [(set_attr "type" "bmask")])

 (define_insn "bshuffle<VM64:mode>_vis"
  [(set (match_operand:VM64 0 "register_operand" "=e")
@ -9157,8 +9276,8 @@
  "TARGET_VIS2"
  "bshuffle\t%1, %2, %0"
  [(set_attr "type" "fga")
-   (set_attr "fptype" "double")
-   (set_attr "v3pipe" "true")])
+   (set_attr "subtype" "other")
+   (set_attr "fptype" "double")])

 ;; The rtl expanders will happily convert constant permutations on other
 ;; modes down to V8QI.  Rely on this to avoid the complexity of the byte
@ -9261,7 +9380,7 @@
  "TARGET_VIS3"
  "cmask8\t%r0"
  [(set_attr "type" "fga")
-   (set_attr "v3pipe" "true")])
+   (set_attr "subtype" "cmask")])

 (define_insn "cmask16<P:mode>_vis"
  [(set (reg:DI GSR_REG)
@ -9271,7 +9390,7 @@
  "TARGET_VIS3"
  "cmask16\t%r0"
  [(set_attr "type" "fga")
-   (set_attr "v3pipe" "true")])
+   (set_attr "subtype" "cmask")])

 (define_insn "cmask32<P:mode>_vis"
  [(set (reg:DI GSR_REG)
@ -9281,7 +9400,7 @@
  "TARGET_VIS3"
  "cmask32\t%r0"
  [(set_attr "type" "fga")
-   (set_attr "v3pipe" "true")])
+   (set_attr "subtype" "cmask")])

 (define_insn "fchksm16_vis"
  [(set (match_operand:V4HI 0 "register_operand" "=e")
@ -9290,7 +9409,8 @@
                     UNSPEC_FCHKSM16))]
  "TARGET_VIS3"
  "fchksm16\t%1, %2, %0"
-  [(set_attr "type" "fga")])
+  [(set_attr "type" "fga")
+   (set_attr "subtype" "fpu")])

 (define_code_iterator vis3_shift [ashift ss_ashift lshiftrt ashiftrt])
 (define_code_attr vis3_shift_insn
@ -9304,7 +9424,8 @@
 			(match_operand:GCM 2 "register_operand" "<vconstr>")))]
  "TARGET_VIS3"
  "<vis3_shift_insn><vbits>\t%1, %2, %0"
-  [(set_attr "type" "fga")])
+  [(set_attr "type" "fga")
+   (set_attr "subtype" "fpu")])

 (define_insn "pdistn<P:mode>_vis"
  [(set (match_operand:P 0 "register_operand" "=r")
@ -9314,8 +9435,7 @@
  "TARGET_VIS3"
  "pdistn\t%1, %2, %0"
  [(set_attr "type" "pdistn")
-   (set_attr "fptype" "double")
-   (set_attr "v3pipe" "true")])
+   (set_attr "fptype" "double")])

 (define_insn "fmean16_vis"
  [(set (match_operand:V4HI 0 "register_operand" "=e")
@ -9332,7 +9452,8 @@
          (const_int 1))))]
  "TARGET_VIS3"
  "fmean16\t%1, %2, %0"
-  [(set_attr "type" "fga")])
+  [(set_attr "type" "fga")
+   (set_attr "subtype" "fpu")])

 (define_insn "fp<plusminus_insn>64_vis"
  [(set (match_operand:V1DI 0 "register_operand" "=e")
@ -9340,7 +9461,8 @@
 			(match_operand:V1DI 2 "register_operand" "e")))]
  "TARGET_VIS3"
  "fp<plusminus_insn>64\t%1, %2, %0"
-  [(set_attr "type" "fga")])
+  [(set_attr "type" "fga")
+   (set_attr "subtype" "addsub64")])

 (define_insn "<plusminus_insn>v8qi3"
  [(set (match_operand:V8QI 0 "register_operand" "=e")
@ -9348,7 +9470,8 @@
                        (match_operand:V8QI 2 "register_operand" "e")))]
  "TARGET_VIS4"
  "fp<plusminus_insn>8\t%1, %2, %0"
-  [(set_attr "type" "fga")])
+  [(set_attr "type" "fga")
+   (set_attr "subtype" "other")])

 (define_mode_iterator VASS [V4HI V2SI V2HI V1SI])
 (define_code_iterator vis3_addsub_ss [ss_plus ss_minus])
@ -9364,7 +9487,7 @@
  "TARGET_VIS3"
  "<vis3_addsub_ss_insn><vbits>\t%1, %2, %0"
  [(set_attr "type" "fga")
-   (set_attr "v3pipe" "true")])
+   (set_attr "subtype" "other")])

 (define_mode_iterator VMMAX [V8QI V4HI V2SI])
 (define_code_iterator vis4_minmax [smin smax])
@ -9379,7 +9502,8 @@
                           (match_operand:VMMAX 2 "register_operand" "<vconstr>")))]
  "TARGET_VIS4"
  "<vis4_minmax_insn><vbits>\t%1, %2, %0"
-  [(set_attr "type" "fga")])
+  [(set_attr "type" "fga")
+   (set_attr "subtype" "maxmin")])

 (define_code_iterator vis4_uminmax [umin umax])
 (define_code_attr vis4_uminmax_insn
@ -9393,7 +9517,8 @@
                            (match_operand:VMMAX 2 "register_operand" "<vconstr>")))]
  "TARGET_VIS4"
  "<vis4_uminmax_insn><vbits>\t%1, %2, %0"
-  [(set_attr "type" "fga")])
+  [(set_attr "type" "fga")
+   (set_attr "subtype" "maxmin")])

 ;; The use of vis3_addsub_ss_patname in the VIS4 instruction below is
 ;; intended.
@ -9403,7 +9528,8 @@
                             (match_operand:V8QI 2 "register_operand" "e")))]
  "TARGET_VIS4"
  "<vis3_addsub_ss_insn>8\t%1, %2, %0"
-  [(set_attr "type" "fga")])
+  [(set_attr "type" "fga")
+   (set_attr "subtype" "other")])

 (define_mode_iterator VAUS [V4HI V8QI])
 (define_code_iterator vis4_addsub_us [us_plus us_minus])
@ -9418,7 +9544,8 @@
                            (match_operand:VAUS 2 "register_operand" "<vconstr>")))]
 "TARGET_VIS4"
 "<vis4_addsub_us_insn><vbits>\t%1, %2, %0"
- [(set_attr "type" "fga")])
+ [(set_attr "type" "fga")
+  (set_attr "subtype" "other")])

 (define_insn "fucmp<gcond:code>8<P:mode>_vis"
  [(set (match_operand:P 0 "register_operand" "=r")
@ -9427,8 +9554,7 @@
 	 UNSPEC_FUCMP))]
  "TARGET_VIS3"
  "fucmp<gcond:code>8\t%1, %2, %0"
-  [(set_attr "type" "visl")
-   (set_attr "v3pipe" "true")])
+  [(set_attr "type" "viscmp")])

 (define_insn "fpcmpu<gcond:code><GCM:gcm_name><P:mode>_vis"
  [(set (match_operand:P 0 "register_operand" "=r")
@ -9437,8 +9563,7 @@
 	 UNSPEC_FUCMP))]
  "TARGET_VIS4"
  "fpcmpu<gcond:code><GCM:gcm_name>\t%1, %2, %0"
-  [(set_attr "type" "visl")
-   (set_attr "fptype" "double")])
+  [(set_attr "type" "viscmp")])

 (define_insn "*naddsf3"
  [(set (match_operand:SF 0 "register_operand" "=f")
@ -9542,4 +9667,62 @@
  [(set_attr "type" "fp")
   (set_attr "fptype" "double")])

+;; VIS4B instructions.
+
+(define_mode_iterator DUMODE [V2SI V4HI V8QI])
+
+(define_insn "dictunpack<DUMODE:vbits>"
+  [(set (match_operand:DUMODE 0 "register_operand" "=e")
+        (unspec:DUMODE [(match_operand:DF 1 "register_operand" "e")
+                        (match_operand:SI 2 "imm5_operand_dictunpack<DUMODE:vbits>" "t")]
+         UNSPEC_DICTUNPACK))]
+  "TARGET_VIS4B"
+  "dictunpack\t%1, %2, %0"
+  [(set_attr "type" "fga")
+   (set_attr "subtype" "other")])
+
+(define_mode_iterator FPCSMODE [V2SI V4HI V8QI])
+(define_code_iterator fpcscond [le gt eq ne])
+(define_code_iterator fpcsucond [le gt])
+
+(define_insn "fpcmp<fpcscond:code><FPCSMODE:vbits><P:mode>shl"
+  [(set (match_operand:P 0 "register_operand" "=r")
+        (unspec:P [(fpcscond:FPCSMODE (match_operand:FPCSMODE 1 "register_operand" "e")
+                                      (match_operand:FPCSMODE 2 "register_operand" "e"))
+                   (match_operand:SI 3 "imm2_operand" "q")]
+         UNSPEC_FPCMPSHL))]
+   "TARGET_VIS4B"
+   "fpcmp<fpcscond:code><FPCSMODE:vbits>shl\t%1, %2, %3, %0"
+   [(set_attr "type" "viscmp")])
+
+(define_insn "fpcmpu<fpcsucond:code><FPCSMODE:vbits><P:mode>shl"
+  [(set (match_operand:P 0 "register_operand" "=r")
+        (unspec:P [(fpcsucond:FPCSMODE (match_operand:FPCSMODE 1 "register_operand" "e")
+                                       (match_operand:FPCSMODE 2 "register_operand" "e"))
+                   (match_operand:SI 3 "imm2_operand" "q")]
+         UNSPEC_FPUCMPSHL))]
+   "TARGET_VIS4B"
+   "fpcmpu<fpcsucond:code><FPCSMODE:vbits>shl\t%1, %2, %3, %0"
+   [(set_attr "type" "viscmp")])
+
+(define_insn "fpcmpde<FPCSMODE:vbits><P:mode>shl"
+  [(set (match_operand:P 0 "register_operand" "=r")
+        (unspec:P [(match_operand:FPCSMODE 1 "register_operand" "e")
+                   (match_operand:FPCSMODE 2 "register_operand" "e")
+                   (match_operand:SI 3 "imm2_operand" "q")]
+         UNSPEC_FPCMPDESHL))]
+   "TARGET_VIS4B"
+   "fpcmpde<FPCSMODE:vbits>shl\t%1, %2, %3, %0"
+   [(set_attr "type" "viscmp")])
+
+(define_insn "fpcmpur<FPCSMODE:vbits><P:mode>shl"
+  [(set (match_operand:P 0 "register_operand" "=r")
+        (unspec:P [(match_operand:FPCSMODE 1 "register_operand" "e")
+                   (match_operand:FPCSMODE 2 "register_operand" "e")
+                   (match_operand:SI 3 "imm2_operand" "q")]
+         UNSPEC_FPCMPURSHL))]
+   "TARGET_VIS4B"
+   "fpcmpur<FPCSMODE:vbits>shl\t%1, %2, %3, %0"
+   [(set_attr "type" "viscmp")])
+
 (include "sync.md")
--- a/gcc/config/sparc/sparc.opt
+++ b/gcc/config/sparc/sparc.opt
@ -81,6 +81,10 @@ mvis4
 Target Report Mask(VIS4)
 Use UltraSPARC Visual Instruction Set version 4.0 extensions.

+mvis4b
+Target Report Mask(VIS4B)
+Use additional VIS instructions introduced in OSA2017.
+
 mcbcond
 Target Report Mask(CBCOND)
 Use UltraSPARC Compare-and-Branch extensions.
@ -209,6 +213,9 @@ Enum(sparc_processor_type) String(niagara4) Value(PROCESSOR_NIAGARA4)
 EnumValue
 Enum(sparc_processor_type) String(niagara7) Value(PROCESSOR_NIAGARA7)

+EnumValue
+Enum(sparc_processor_type) String(m8) Value(PROCESSOR_M8)
+
 mcmodel=
 Target RejectNegative Joined Var(sparc_cmodel_string)
 Use given SPARC-V9 code model.
--- a/gcc/config/sparc/ultra1_2.md
+++ b/gcc/config/sparc/ultra1_2.md
@ -263,10 +263,10 @@

 (define_insn_reservation "us1_fga_double"
  2
-  (and (and
-         (eq_attr "cpu" "ultrasparc")
-         (eq_attr "type" "fga,visl,vismv"))
-       (eq_attr "fptype" "double"))
+  (and (eq_attr "cpu" "ultrasparc")
+       (ior (and (eq_attr "type" "fga,visl,vismv")
+                 (eq_attr "fptype" "double"))
+            (eq_attr "type" "viscmp")))
  "us1_fpa + us1_fp_double + us1_slotany, nothing")

 (define_bypass 1 "us1_fga_double" "us1_fga_double")
--- a/gcc/config/sparc/ultra3.md
+++ b/gcc/config/sparc/ultra3.md
@ -56,7 +56,7 @@

 (define_insn_reservation "us3_array" 2
  (and (eq_attr "cpu" "ultrasparc3")
-    (eq_attr "type" "array,edgen"))
+    (eq_attr "type" "array,edgen,bmask"))
  "us3_ms + us3_slotany, nothing")

 ;; ??? Not entirely accurate.
@ -176,7 +176,7 @@
 (define_insn_reservation "us3_fga"
  3
  (and (eq_attr "cpu" "ultrasparc3")
-       (eq_attr "type" "fga,visl,vismv"))
+       (eq_attr "type" "fga,visl,viscmp,vismv"))
  "us3_fpa + us3_slotany, nothing*2")

 (define_insn_reservation "us3_fgm"
--- a/gcc/configure
+++ b/gcc/configure
@ -25217,6 +25217,41 @@ $as_echo "#define HAVE_AS_SPARC5_VIS4 1" >>confdefs.h

 fi

+    { $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler for SPARC6 instructions" >&5
+$as_echo_n "checking assembler for SPARC6 instructions... " >&6; }
+if test "${gcc_cv_as_sparc_sparc6+set}" = set; then :
+  $as_echo_n "(cached) " >&6
+else
+  gcc_cv_as_sparc_sparc6=no
+  if test x$gcc_cv_as != x; then
+    $as_echo '.text
+       .register %g2, #scratch
+       .register %g3, #scratch
+       .align 4
+       rd %entropy, %g1
+       fpsll64x %f0, %f2, %f4' > conftest.s
+    if { ac_try='$gcc_cv_as $gcc_cv_as_flags -xarch=sparc6 -o conftest.o conftest.s >&5'
+  { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_try\""; } >&5
+  (eval $ac_try) 2>&5
+  ac_status=$?
+  $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+  test $ac_status = 0; }; }
+    then
+	gcc_cv_as_sparc_sparc6=yes
+    else
+      echo "configure: failed program was" >&5
+      cat conftest.s >&5
+    fi
+    rm -f conftest.o conftest.s
+  fi
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $gcc_cv_as_sparc_sparc6" >&5
+$as_echo "$gcc_cv_as_sparc_sparc6" >&6; }
+if test $gcc_cv_as_sparc_sparc6 = yes; then
+
+$as_echo "#define HAVE_AS_SPARC6 1" >>confdefs.h
+
+fi

    { $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler for LEON instructions" >&5
 $as_echo_n "checking assembler for LEON instructions... " >&6; }
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@ -3969,6 +3969,18 @@ foo:
      [AC_DEFINE(HAVE_AS_SPARC5_VIS4, 1,
                [Define if your assembler supports SPARC5 and VIS 4.0 instructions.])])

+    gcc_GAS_CHECK_FEATURE([SPARC6 instructions],
+      gcc_cv_as_sparc_sparc6,,
+      [-xarch=sparc6],
+      [.text
+       .register %g2, #scratch
+       .register %g3, #scratch
+       .align 4
+       rd %entropy, %g1
+       fpsll64x %f0, %f2, %f4],,
+      [AC_DEFINE(HAVE_AS_SPARC6, 1,
+                [Define if your assembler supports SPARC6 instructions.])])
+
    gcc_GAS_CHECK_FEATURE([LEON instructions],
      gcc_cv_as_sparc_leon,,
      [-Aleon],
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@ -19074,6 +19074,45 @@ v4hi __builtin_vis_fpminu16 (v4hi, v4hi);
 v2si __builtin_vis_fpminu32 (v2si, v2si);
@end smallexample

+When you use the @option{-mvis4b} switch, the VIS version 4.0B
+built-in functions also become available:
+
+@smallexample
+v8qi __builtin_vis_dictunpack8 (double, int);
+v4hi __builtin_vis_dictunpack16 (double, int);
+v2si __builtin_vis_dictunpack32 (double, int);
+
+long __builtin_vis_fpcmple8shl (v8qi, v8qi, int);
+long __builtin_vis_fpcmpgt8shl (v8qi, v8qi, int);
+long __builtin_vis_fpcmpeq8shl (v8qi, v8qi, int);
+long __builtin_vis_fpcmpne8shl (v8qi, v8qi, int);
+
+long __builtin_vis_fpcmple16shl (v4hi, v4hi, int);
+long __builtin_vis_fpcmpgt16shl (v4hi, v4hi, int);
+long __builtin_vis_fpcmpeq16shl (v4hi, v4hi, int);
+long __builtin_vis_fpcmpne16shl (v4hi, v4hi, int);
+
+long __builtin_vis_fpcmple32shl (v2si, v2si, int);
+long __builtin_vis_fpcmpgt32shl (v2si, v2si, int);
+long __builtin_vis_fpcmpeq32shl (v2si, v2si, int);
+long __builtin_vis_fpcmpne32shl (v2si, v2si, int);
+
+long __builtin_vis_fpcmpule8shl (v8qi, v8qi, int);
+long __builtin_vis_fpcmpugt8shl (v8qi, v8qi, int);
+long __builtin_vis_fpcmpule16shl (v4hi, v4hi, int);
+long __builtin_vis_fpcmpugt16shl (v4hi, v4hi, int);
+long __builtin_vis_fpcmpule32shl (v2si, v2si, int);
+long __builtin_vis_fpcmpugt32shl (v2si, v2si, int);
+
+long __builtin_vis_fpcmpde8shl (v8qi, v8qi, int);
+long __builtin_vis_fpcmpde16shl (v4hi, v4hi, int);
+long __builtin_vis_fpcmpde32shl (v2si, v2si, int);
+
+long __builtin_vis_fpcmpur8shl (v8qi, v8qi, int);
+long __builtin_vis_fpcmpur16shl (v4hi, v4hi, int);
+long __builtin_vis_fpcmpur32shl (v2si, v2si, int);
+@end smallexample
+
@node SPU Built-in Functions
@subsection SPU Built-in Functions

--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@ -1117,6 +1117,7 @@ See RS/6000 and PowerPC Options.
 -muser-mode  -mno-user-mode @gol
 -mv8plus  -mno-v8plus  -mvis  -mno-vis @gol
 -mvis2  -mno-vis2  -mvis3  -mno-vis3 @gol
+-mvis4 -mno-vis4 -mvis4b -mno-vis4b @gol
 -mcbcond  -mno-cbcond  -mfmaf  -mno-fmaf  @gol
 -mpopc  -mno-popc  -msubxc  -mno-subxc@gol
 -mfix-at697f  -mfix-ut699 @gol
@ -23395,7 +23396,7 @@ for machine type @var{cpu_type}.  Supported values for @var{cpu_type} are
@samp{leon}, @samp{leon3}, @samp{leon3v7}, @samp{sparclite}, @samp{f930},
@samp{f934}, @samp{sparclite86x}, @samp{sparclet}, @samp{tsc701}, @samp{v9},
@samp{ultrasparc}, @samp{ultrasparc3}, @samp{niagara}, @samp{niagara2},
-@samp{niagara3}, @samp{niagara4} and @samp{niagara7}.
+@samp{niagara3}, @samp{niagara4}, @samp{niagara7} and @samp{m8}.

 Native Solaris and GNU/Linux toolchains also support the value @samp{native},
 which selects the best architecture option for the host processor.
@ -23423,7 +23424,8 @@ f930, f934, sparclite86x
 tsc701

@item v9
-ultrasparc, ultrasparc3, niagara, niagara2, niagara3, niagara4, niagara7
+ultrasparc, ultrasparc3, niagara, niagara2, niagara3, niagara4,
+niagara7, m8
@end table

 By default (unless configured otherwise), GCC generates code for the V7
@ -23467,7 +23469,8 @@ additionally optimizes it for Sun UltraSPARC T2 chips. With
 UltraSPARC T3 chips.  With @option{-mcpu=niagara4}, the compiler
 additionally optimizes it for Sun UltraSPARC T4 chips.  With
@option{-mcpu=niagara7}, the compiler additionally optimizes it for
-Oracle SPARC M7 chips.
+Oracle SPARC M7 chips.  With @option{-mcpu=m8}, the compiler
+additionally optimizes it for Oracle M8 chips.

@item -mtune=@var{cpu_type}
@opindex mtune
@ -23482,8 +23485,8 @@ that select a particular CPU implementation.  Those are
@samp{leon3}, @samp{leon3v7}, @samp{f930}, @samp{f934},
@samp{sparclite86x}, @samp{tsc701}, @samp{ultrasparc},
@samp{ultrasparc3}, @samp{niagara}, @samp{niagara2}, @samp{niagara3},
-@samp{niagara4} and @samp{niagara7}.  With native Solaris and
-GNU/Linux toolchains, @samp{native} can also be used.
+@samp{niagara4}, @samp{niagara7} and @samp{m8}.  With native Solaris
+and GNU/Linux toolchains, @samp{native} can also be used.

@item -mv8plus
@itemx -mno-v8plus
@ -23531,6 +23534,18 @@ default is @option{-mvis4} when targeting a cpu that supports such
 instructions, such as niagara-7 and later.  Setting @option{-mvis4}
 also sets @option{-mvis3}, @option{-mvis2} and @option{-mvis}.

+@item -mvis4b
+@itemx -mno-vis4b
+@opindex mvis4b
+@opindex mno-vis4b
+With @option{-mvis4b}, GCC generates code that takes advantage of
+version 4.0 of the UltraSPARC Visual Instruction Set extensions, plus
+the additional VIS instructions introduced in the Oracle SPARC
+Architecture 2017.  The default is @option{-mvis4b} when targeting a
+cpu that supports such instructions, such as m8 and later.  Setting
+@option{-mvis4b} also sets @option{-mvis4}, @option{-mvis3},
+@option{-mvis2} and @option{-mvis}.
+
@item -mcbcond
@itemx -mno-cbcond
@opindex mcbcond
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@ -1,3 +1,11 @@
+2017-07-07  Jose E. Marchesi  <jose.marchesi@oracle.com>
+
+	* gcc.target/sparc/dictunpack.c: New file.
+	* gcc.target/sparc/fpcmpdeshl.c: Likewise.
+	* gcc.target/sparc/fpcmpshl.c: Likewise.
+	* gcc.target/sparc/fpcmpurshl.c: Likewise.
+	* gcc.target/sparc/fpcmpushl.c: Likewise.
+
 2017-07-05  Georg-Johann Lay  <avr@gjlay.de>

 	Backport from 2017-07-05 trunk r249995, r249996.
--- a/gcc/testsuite/gcc.target/sparc/dictunpack.c
+++ b/gcc/testsuite/gcc.target/sparc/dictunpack.c
@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-mvis4b" } */
+
+typedef unsigned char vec8 __attribute__((vector_size(8)));
+typedef short vec16 __attribute__((vector_size(8)));
+typedef int vec32 __attribute__((vector_size(8)));
+
+vec8 test_dictunpack8 (double a)
+{
+  return __builtin_vis_dictunpack8 (a, 6);
+}
+
+vec16 test_dictunpack16 (double a)
+{
+  return __builtin_vis_dictunpack16 (a, 14);
+}
+
+vec32 test_dictunpack32 (double a)
+{
+  return __builtin_vis_dictunpack32 (a, 30);
+}
+
+/* { dg-final { scan-assembler "dictunpack\t%" } } */
+/* { dg-final { scan-assembler "dictunpack\t%" } } */
+/* { dg-final { scan-assembler "dictunpack\t%" } } */
--- a/gcc/testsuite/gcc.target/sparc/fpcmpdeshl.c
+++ b/gcc/testsuite/gcc.target/sparc/fpcmpdeshl.c
@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-mvis4b" } */
+
+typedef unsigned char vec8 __attribute__((vector_size(8)));
+typedef short vec16 __attribute__((vector_size(8)));
+typedef int vec32 __attribute__((vector_size(8)));
+
+long test_fpcmpde8shl (vec8 a, vec8 b)
+{
+  return __builtin_vis_fpcmpde8shl (a, b, 2);
+}
+
+long test_fpcmpde16shl (vec16 a, vec16 b)
+{
+  return __builtin_vis_fpcmpde16shl (a, b, 2);
+}
+
+long test_fpcmpde32shl (vec32 a, vec32 b)
+{
+  return __builtin_vis_fpcmpde32shl (a, b, 2);
+}
+
+/* { dg-final { scan-assembler "fpcmpde8shl\t%" } } */
+/* { dg-final { scan-assembler "fpcmpde16shl\t%" } } */
+/* { dg-final { scan-assembler "fpcmpde32shl\t%" } } */
--- a/gcc/testsuite/gcc.target/sparc/fpcmpshl.c
+++ b/gcc/testsuite/gcc.target/sparc/fpcmpshl.c
@ -0,0 +1,81 @@
+/* { dg-do compile } */
+/* { dg-options "-mvis4b" } */
+
+typedef unsigned char vec8 __attribute__((vector_size(8)));
+typedef short vec16 __attribute__((vector_size(8)));
+typedef int vec32 __attribute__((vector_size(8)));
+
+long test_fpcmple8shl (vec8 a, vec8 b)
+{
+  return __builtin_vis_fpcmple8shl (a, b, 2);
+}
+
+long test_fpcmpgt8shl (vec8 a, vec8 b)
+{
+  return __builtin_vis_fpcmpgt8shl (a, b, 2);
+}
+
+long test_fpcmpeq8shl (vec8 a, vec8 b)
+{
+  return __builtin_vis_fpcmpeq8shl (a, b, 2);
+}
+
+long test_fpcmpne8shl (vec8 a, vec8 b)
+{
+  return __builtin_vis_fpcmpne8shl (a, b, 2);
+}
+
+long test_fpcmple16shl (vec16 a, vec16 b)
+{
+  return __builtin_vis_fpcmple16shl (a, b, 2);
+}
+
+long test_fpcmpgt16shl (vec16 a, vec16 b)
+{
+  return __builtin_vis_fpcmpgt16shl (a, b, 2);
+}
+
+long test_fpcmpeq16shl (vec16 a, vec16 b)
+{
+  return __builtin_vis_fpcmpeq16shl (a, b, 2);
+}
+
+long test_fpcmpne16shl (vec16 a, vec16 b)
+{
+  return __builtin_vis_fpcmpne16shl (a, b, 2);
+}
+
+long test_fpcmple32shl (vec32 a, vec32 b)
+{
+  return __builtin_vis_fpcmple32shl (a, b, 2);
+}
+
+long test_fpcmpgt32shl (vec32 a, vec32 b)
+{
+  return __builtin_vis_fpcmpgt32shl (a, b, 2);
+}
+
+long test_fpcmpeq32shl (vec32 a, vec32 b)
+{
+  return __builtin_vis_fpcmpeq32shl (a, b, 2);
+}
+
+long test_fpcmpne32shl (vec32 a, vec32 b)
+{
+  return __builtin_vis_fpcmpne32shl (a, b, 2);
+}
+
+/* { dg-final { scan-assembler "fpcmple8shl\t%" } } */
+/* { dg-final { scan-assembler "fpcmpgt8shl\t%" } } */
+/* { dg-final { scan-assembler "fpcmpeq8shl\t%" } } */
+/* { dg-final { scan-assembler "fpcmpne8shl\t%" } } */
+
+/* { dg-final { scan-assembler "fpcmple16shl\t%" } } */
+/* { dg-final { scan-assembler "fpcmpgt16shl\t%" } } */
+/* { dg-final { scan-assembler "fpcmpeq16shl\t%" } } */
+/* { dg-final { scan-assembler "fpcmpne16shl\t%" } } */
+
+/* { dg-final { scan-assembler "fpcmple32shl\t%" } } */
+/* { dg-final { scan-assembler "fpcmpgt32shl\t%" } } */
+/* { dg-final { scan-assembler "fpcmpeq32shl\t%" } } */
+/* { dg-final { scan-assembler "fpcmpne32shl\t%" } } */
--- a/gcc/testsuite/gcc.target/sparc/fpcmpurshl.c
+++ b/gcc/testsuite/gcc.target/sparc/fpcmpurshl.c
@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-mvis4b" } */
+
+typedef unsigned char vec8 __attribute__((vector_size(8)));
+typedef short vec16 __attribute__((vector_size(8)));
+typedef int vec32 __attribute__((vector_size(8)));
+
+long test_fpcmpur8shl (vec8 a, vec8 b)
+{
+  return __builtin_vis_fpcmpur8shl (a, b, 2);
+}
+
+long test_fpcmpur16shl (vec16 a, vec16 b)
+{
+  return __builtin_vis_fpcmpur16shl (a, b, 2);
+}
+
+long test_fpcmpur32shl (vec32 a, vec32 b)
+{
+  return __builtin_vis_fpcmpur32shl (a, b, 2);
+}
+
+/* { dg-final { scan-assembler "fpcmpur8shl\t%" } } */
+/* { dg-final { scan-assembler "fpcmpur16shl\t%" } } */
+/* { dg-final { scan-assembler "fpcmpur32shl\t%" } } */
--- a/gcc/testsuite/gcc.target/sparc/fpcmpushl.c
+++ b/gcc/testsuite/gcc.target/sparc/fpcmpushl.c
@ -0,0 +1,43 @@
+/* { dg-do compile } */
+/* { dg-options "-mvis4b" } */
+
+typedef unsigned char vec8 __attribute__((vector_size(8)));
+typedef short vec16 __attribute__((vector_size(8)));
+typedef int vec32 __attribute__((vector_size(8)));
+
+long test_fpcmpule8shl (vec8 a, vec8 b)
+{
+  return __builtin_vis_fpcmpule8shl (a, b, 2);
+}
+
+long test_fpcmpugt8shl (vec8 a, vec8 b)
+{
+  return __builtin_vis_fpcmpugt8shl (a, b, 2);
+}
+
+long test_fpcmpule16shl (vec16 a, vec16 b)
+{
+  return __builtin_vis_fpcmpule16shl (a, b, 2);
+}
+
+long test_fpcmpugt16shl (vec16 a, vec16 b)
+{
+  return __builtin_vis_fpcmpugt16shl (a, b, 2);
+}
+
+long test_fpcmpule32shl (vec32 a, vec32 b)
+{
+  return __builtin_vis_fpcmpule32shl (a, b, 2);
+}
+
+long test_fpcmpugt32shl (vec32 a, vec32 b)
+{
+  return __builtin_vis_fpcmpugt32shl (a, b, 2);
+}
+
+/* { dg-final { scan-assembler "fpcmpule8shl\t%" } } */
+/* { dg-final { scan-assembler "fpcmpugt8shl\t%" } } */
+/* { dg-final { scan-assembler "fpcmpule16shl\t%" } } */
+/* { dg-final { scan-assembler "fpcmpugt16shl\t%" } } */
+/* { dg-final { scan-assembler "fpcmpule32shl\t%" } } */
+/* { dg-final { scan-assembler "fpcmpugt32shl\t%" } } */