2002-04-29 Vladimir Makarov <vmakarov@redhat.com>

Merging code from dfa-branch: From-SVN: r52915
2002-04-29 22:34:36 +00:00 · 2002-04-29 22:34:36 +00:00 · fae15c9379
parent a2ff290c02
commit fae15c9379
27 changed files with 12626 additions and 1574 deletions
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@ -1,3 +1,365 @@
+2002-04-29  Vladimir Makarov  <vmakarov@redhat.com>
+
+	Merging code from dfa-branch:
+	
+	2002-04-24  Vladimir Makarov  <vmakarov@redhat.com>
+
+	* genautomata.c (output_reserv_sets): Fix typo.
+
+	2002-04-23  Vladimir Makarov  <vmakarov@redhat.com>
+
+	* genautomata.c (output_reserv_sets): Remove
+	next_cycle_output_flag.
+
+	Thu Apr 18 08:57:06 2002  Jeffrey A Law  (law@redhat.com)
+
+        * sched-rgn.c (init_ready_list): Make the DFA code handle
+        USE/CLOBBER insns in the same way as the traditional
+        scheduler.
+        (new_ready): Similarly..
+
+	2002-04-17  Vladimir Makarov  <vmakarov@redhat.com>
+
+	* haifa-sched.c (schedule_block): Change the DFA state only after
+	issuing insn.
+
+	Wed Apr 17 15:38:36 2002  Jeffrey A Law  (law@redhat.com)
+
+	* pa.c (hppa_use_dfa_pipeline_interface): New function.
+	(TARGET_SCHED_USE_DFA_PIPELINE_INTERFACE): Define.
+	(override_options): Add PA7300 scheduling support.
+	(pa_adjust_cost): Update various comments.  Properly
+	handle anti and output dependencies when using the
+	DFA scheduler.
+	(pa_issue_rate): Add PA7300 scheduling support.
+	(pa_can_combine_p): Call extract_insn before calling
+	constrain_operands (taken from mainline tree).
+	* pa.h (enum processor_type): Add PROCESSOR_PA7300.
+	* pa.md (cpu attr): Add 7300.  Rewrite pipeline
+	descriptions using DFA descriptions.  Add PA7300
+	scheduling support.
+
+	2002-03-30  David S. Miller  <davem@redhat.com>
+
+	Add UltraSPARC-III DFA scheduling support.
+	* config/sparc/sparc.md (define_attr type): Add fpcrmove.
+	Update FP conditional move on register insn patterns to use it, as
+	appropriate.
+	(define_attr cpu): Add ultrasparc3.
+	(define_attr us3load_type): New, update integer load patterns to
+	set it, as appropriate.
+	(define_automaton): Add ultrasparc3_0 and ultrasparc3_1.
+	(rest): Add UltraSPARC3 scheduling description.
+	* config/sparc/sparc.h (TARGET_CPU_ultrasparc3): New.
+	(PROCESSOR_ULTRASPARC3): New.
+	({ASM,CPP}_CPU64_DEFAULT_SPEC): Handle ultrasparc3.
+	({ASM,CPP}_CPU_SPEC): Likewise.
+	(REGISTER_MOVE_COST): Likewise.
+	(RTX_COSTS): Likewise.
+	* config/sparc/sparc.c (sparc_override_options,
+	sparc_initialize_trampoline, sparc64_initialize_trampoline,
+	sparc_use_dfa_pipeline_interface, sparc_use_sched_lookahead,
+	sparc_issue_rate): Likewise.
+	* config/sparc/sol2.h: Likewise.
+	* config/sparc/sol2-sld-64.h: Likewise.
+	* config/sparc/linux64.h: Likewise.
+	
+	2002-03-22  Vladimir Makarov  <vmakarov@redhat.com>
+
+	* doc/md.texi: Add comments about usage the latency time for the
+	different dependencies and about case when two or more conditions
+	in different define_insn_reservations returns TRUE for an insn.
+
+	* doc/md.texi: Add reference for automaton based pipeline
+	description.
+	
+	2002-03-04  Vladimir Makarov  <vmakarov@redhat.com>
+
+	* doc/passes.texi: Add missed information about genattrtab.
+
+	2002-03-01  Vladimir Makarov  <vmakarov@redhat.com>
+
+	* genautomata.c (output_automata_list_transition_code): Check
+	automata_list on NULL.
+
+	2002-02-28  Vladimir Makarov  <vmakarov@redhat.com>
+
+	* genautomata.c (output_insn_code_cases,
+	output_automata_list_min_issue_delay_code,
+	output_automata_list_transition_code,
+	output_automata_list_state_alts_code): Comment the functions.
+
+	2002-02-22  Vladimir Makarov  <vmakarov@redhat.com>
+
+	* genautomata.c (automata_list_el_t): New typedef.
+	(get_free_automata_list_el,free_automata_list_el,
+	free_automata_list, automata_list_hash, automata_list_eq_p,
+	initiate_automata_lists, automata_list_start, automata_list_add,
+	automata_list_finish, finish_automata_lists,
+	output_insn_code_cases, output_automata_list_min_issue_delay_code,
+	output_automata_list_transition_code,
+	output_automata_list_state_alts_code, add_automaton_state,
+	form_important_insn_automata_lists): New functions and prototypes.
+	(insn_reserv_decl): Add members important_automata_list and
+	processed_p.
+	(ainsn): Add members important_p.
+	(automata_list_el): New structure.
+	(first_free_automata_list_el, current_automata_list,
+	automata_list_table): New global variables.
+	(create_ainsns): Initiate member important_p.
+	(output_internal_min_issue_delay_func): Generate the switch and
+	call output_insn_code_cases.
+	(output_internal_trans_func, output_internal_state_alts_func):
+	Ditto.
+	(generate): Call initiate_automata_lists.
+	(automaton_states): New global variable.
+	(expand_automata): Call form_important_insn_automata_lists.
+	(write_automata): Call finish_automata_lists.
+	
+	2002-02-21  Vladimir Makarov  <vmakarov@redhat.com>
+
+	* genautomata.c (add_excls, add_presence_absence): Check that
+        cpu units in the sets belong the same automaton.
+
+	* rtl.def (EXCLUSION_SET, PRESENCE_SET, ABSENCE_SET): Add comment
+	about that cpu units in the sets belong the same automaton.
+
+	* doc/md.texi: Ditto.
+	
+	2001-12-20  Naveen Sharma,Nitin Gupta     <naveens@noida.hcltech.com,niting@noida.hcltech.com>
+
+	* config/sh/sh.c (sh_use_dfa_interface): New function.
+
+	(sh_issue_rate): New Function.
+	TARGET_SCHED_USE_DFA_PIPELINE_INTERFACE: define.
+	TARGET_SCHED_ISSUE_RATE: define.
+
+	* config/sh/sh.md: Add DFA based pipeline description for SH4.
+
+	(define_attr insn_class): New attribute used for DFA 
+         scheduling.
+	(define_insn cmpgtsi_t): Set attribute insn_class mt_group.
+	(cmpgesi_t,cmpgtusi_t,cmpgeusi_t,cmpeqsi_t,
+	 cmpeqdi_t): Likewise.
+
+	(add,addc1,addsi3,subc,subc1,*subsi3_internal,
+	 negc,negsi2,ashldi3_k,lshrdi3_k,ashrdi3_k): Set insn_class 
+	 ex_group.
+	(iorsi3,rotlsi3_1,rotlsi3_31,rotlsi3_16): Likewise.
+	   
+	2001-10-03  Vladimir Makarov  <vmakarov@toke.toronto.redhat.com>
+
+	* haifa-sched.c (queue_to_ready): Remove unnecessary condition for
+	break.
+
+	2001-10-03  Vladimir Makarov  <vmakarov@toke.toronto.redhat.com>
+
+	* genautomata.c (DFA_INSN_CODES_LENGTH_VARIABLE_NAME): New macro.
+	(output_dfa_insn_code_func): Expand dfa_insn_codes if it is
+	necessary.
+	(output_dfa_start_func): Initiate new variable insn_codes_length,
+	(write_automata): Output definition of the new variable.
+	
+	2001-10-02  David S. Miller  <davem@redhat.com>
+
+	* haifa-sched.c (advance_one_cycle): New function.
+	(schedule_block): Use it.
+	(queue_to_ready): Use it, and also make sure to advance the DFA
+	state on all stall cycles, not just those where insn_queue links
+	are found.
+
+	2001-10-02  Richard Sandiford  <rsandifo@redhat.com>
+
+	* haifa-sched.c (max_issue): Remove last_p argument.  Only return
+	non-zero if the highest-priority instruction could be scheduled.
+	(choose_ready): Remove last argument from max_issue call.
+
+	2001-09-28  David S. Miller  <davem@redhat.com>
+
+	* config/sparc/sparc.c (sparc_use_sched_lookahead): Use 4 for
+	ultrasparc and 3 for other multi-issue sparcs.
+
+	2001-09-27  David S. Miller  <davem@redhat.com>
+
+	* config/sparc/sparc.md (cycle_display): New pattern.
+	* config/sparc/sparc.c (sparc_cycle_display): New.
+	(TARGET_SCHED_CYCLE_DISPLAY): Set it.
+
+	2001-09-25  David S. Miller  <davem@redhat.com>
+
+	Convert all of Sparc scheduling to DFA
+	* config/sparc/sparc.md: Kill all define_function_unit
+	directives and replace with DFA equivalent.
+	* config/sparc/sparc.c (ultrasparc_adjust_cost,
+	mark_ultrasparc_pipeline_state, ultra_cmove_results_ready_p,
+	ultra_fpmode_conflict_exists, ultra_find_type,
+	ultra_build_types_avail, ultra_flush_pipeline,
+	ultra_rescan_pipeline_state, ultrasparc_sched_reorder,
+	ultrasparc_variable_issue, ultrasparc_sched_init,
+	sparc_variable_issue, sparc_sched_reorder, ultra_code_from_mask,
+	ultra_schedule_insn, ultra_code_names, ultra_pipe_hist,
+	ultra_cur_hist, ultra_cycles_elapsed): Kill.
+	(sparc_use_dfa_pipeline_interface, sparc_use_sched_lookahead,
+	ultrasparc_store_bypass_p): New.
+	* config/sparc/sparc-protos.h (ultrasparc_store_bypass_p):
+	Declare.
+
+	2001-09-24  David S. Miller  <davem@redhat.com>
+
+	* haifa-sched.c (ready_remove): Fix thinko, we want to copy around
+	ready->vec[foo] not ready[foo].
+
+	2001-09-07  Vladimir Makarov  <vmakarov@redhat.com>
+
+	* doc/md.texi: Correct examples for define_insn_reservations
+	`mult' and `div'.
+
+	2001-09-07  Vladimir Makarov  <vmakarov@redhat.com>
+
+	* genautomata.c (create_automata): Print message about creation of
+	each automaton.
+	(generate): Remove printing meease about creation of
+	automata.
+
+	2001-09-05  David S. Miller  <davem@redhat.com>
+
+	* config/sparc/linux.h: Set CPLUSPLUS_CPP_SPEC.
+	* config/sparc/linux64.h: Likewise.
+
+	2001-08-31  Vladimir Makarov  <vmakarov@redhat.com>
+
+	* haifa-sched.c (insn_cost, schedule_insn, queue_to_ready,
+	schedule_block, sched_init, sched_finish): Add missed calls of
+	use_dfa_pipeline_interface.
+
+	* sched-rgn.c (init_ready_list, new_ready, debug_dependencies):
+	Ditto.
+
+	* sched-vis.c (get_visual_tbl_length): Ditto.
+	
+	2001-08-27  Richard Henderson  <rth@redhat.com>
+
+	* genattr.c (main): Emit state_t even when not doing scheduling.
+
+	2001-08-27  Richard Henderson  <rth@redhat.com>
+
+	* genautomata.c (expand_automata): Always create a description.
+
+	2001-08-27  Vladimir Makarov  <vmakarov@touchme.toronto.redhat.com>
+
+        * rtl.def (DEFINE_CPU_UNIT, DEFINE_QUERY_CPU_UNIT, EXCLUSION_SET,
+	PRESENCE_SET, ABSENCE_SET, DEFINE_BYPASS, DEFINE_AUTOMATON,
+	AUTOMATA_OPTION, DEFINE_RESERVATION, DEFINE_INSN_RESERVATION): New
+	RTL constructions.
+
+	* genattr.c (main): New variable num_insn_reservations.  Increase
+	it if there is DEFINE_INSN_RESERVATION.  Output automaton based
+	pipeline hazard recognizer interface.
+
+	* genattrtab.h: New file.
+
+	* genattrtab.c: Include genattrtab.h.
+	(attr_printf, check_attr_test, make_internal_attr,
+	make_numeric_value): Move protypes into genattrtab.h.  Define them
+	as external.
+	(num_dfa_decls): New global variable.
+	(main): Process DEFINE_CPU_UNIT, DEFINE_QUERY_CPU_UNIT,
+	DEFINE_BYPASS, EXCLUSION_SET, PRESENCE_SET, ABSENCE_SET,
+	DEFINE_AUTOMATON, AUTOMATA_OPTION, DEFINE_RESERVATION,
+	DEFINE_INSN_RESERVATION.  Call expand_automata and write_automata.
+
+	* genautomata.c: New file.
+
+	* rtl.h (LINK_COST_ZERO, LINK_COST_FREE): Remove them.
+
+        * sched-int.h: (curr_state): Add the external definition for
+	automaton pipeline interface.
+	(haifa_insn_data): Add comments for members blockage and units.
+
+	* target-def.h (TARGET_SCHED_USE_DFA_PIPELINE_INTERFACE,
+	TARGET_SCHED_INIT_DFA_PRE_CYCLE_INSN,
+	TARGET_SCHED_DFA_PRE_CYCLE_INSN,
+	TARGET_SCHED_INIT_DFA_POST_CYCLE_INSN,
+	TARGET_SCHED_DFA_POST_CYCLE_INSN,
+	TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD,
+	TARGET_SCHED_INIT_DFA_BUBBLES, TARGET_SCHED_DFA_BUBBLE): New
+	macros.
+	(TARGET_SCHED): Use the new macros.
+
+	* target.h (use_dfa_pipeline_interface, init_dfa_pre_cycle_insn,
+	dfa_pre_cycle_insn, init_dfa_post_cycle_insn, dfa_post_cycle_insn,
+	first_cycle_multipass_dfa_lookahead, init_dfa_bubbles,
+	dfa_bubble): New members in gcc_target.sched.
+
+        * haifa-sched.c (insert_schedule_bubbles_p): New variable.
+	(MAX_INSN_QUEUE_INDEX): New macro for automaton interface.
+	(insn_queue): Redefine it as pointer to array.
+	(NEXT_Q, NEXT_Q_AFTER): Use MAX_INSN_QUEUE_INDEX instead of
+	INSN_QUEUE_SIZE.
+	(max_insn_queue_index_macro_value): New variable.
+	(curr_state, dfa_state_size, ready_try): New varaibles for
+	automaton interface.
+	(ready_element, ready_remove, max_issue): New function prototypes
+	for automaton interface.
+	(choose_ready): New function prototype.
+	(insn_unit, blockage_range): Add comments.
+	(unit_last_insn, unit_tick, unit_n_insns): Define them for case
+	FUNCTION_UNITS_SIZE == 0.
+	(insn_issue_delay, actual_hazard_this_instance, schedule_unit,
+	actual_hazard, potential_hazard): Add comments.
+	(insn_cost): Use cost -1 as undefined value.  Remove
+	LINK_COST_ZERO and LINK_COST_FREE.  Add new code for automaton
+	pipeline interface.
+	(ready_element, ready_remove): New functions for automaton
+	interface.
+	(schedule_insn): Add new code for automaton pipeline interface.
+	(queue_to_ready): Add new code for automaton pipeline interface.
+	Use MAX_INSN_QUEUE_INDEX instead of INSN_QUEUE_SIZE.
+	(debug_ready_list): Print newline when the queue is empty.
+	(max_issue): New function for automaton pipeline interface.
+	(choose_ready): New function.
+	(schedule_block): Add new code for automaton pipeline interface.
+	Print ready list before scheduling each insn.
+	(sched_init): Add new code for automaton pipeline interface.
+	Initiate insn cost by -1.
+	(sched_finish): Free the current automaton state and finalize
+	automaton pipeline interface.
+
+	* sched-rgn.c: Include target.h.
+	(init_ready_list, new_ready, debug_dependencies): Add new code for
+	automaton pipeline interface.
+
+	* sched-vis.c: Include target.h.
+	(get_visual_tbl_length): Add code for automaton interface.
+	(target_units, print_block_visualization):  Add comments.
+
+        * Makefile.in (GETRUNTIME, HASHTAB, HOST_GETRUNTIME, HOST_HASHTAB,
+	USE_HOST_GETRUNTIME, USE_HOST_HASHTAB, HOST_VARRAY): New variables.
+	(sched-rgn.o, sched-vis.o): Add new dependency file target.h.
+	(getruntime.o, genautomata.o): New entries.
+	(genattrtab.o): Add new dependency file genattrtab.h.
+	(genattrtab): Add new dependencies.  Link it with `libm.a'.
+	(getruntime.o, hashtab.o): New entries for canadian cross.
+
+	* doc/md.texi: Description of automaton based model.
+
+	* doc/tm.texi (TARGET_SCHED_ISSUE_RATE, TARGET_SCHED_ADJUST_COST):
+	Add comments.
+	(TARGET_SCHED_USE_DFA_PIPELINE_INTERFACE,
+	TARGET_SCHED_DFA_PRE_CYCLE_INSN,
+	TARGET_SCHED_INIT_DFA_PRE_CYCLE_INSN,
+	TARGET_SCHED_DFA_POST_CYCLE_INSN,
+	TARGET_SCHED_INIT_DFA_POST_CYCLE_INSN,
+	TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD,
+	TARGET_SCHED_INIT_DFA_BUBBLES, TARGET_SCHED_DFA_BUBBLE): The new
+	hook descriptions.
+	(TRADITIONAL_PIPELINE_INTERFACE, DFA_PIPELINE_INTERFACE,
+	MAX_DFA_ISSUE_RATE): New macro descriptions.
+
+	* doc/contrib.texi: Add dfa based scheduler contribution.
+
+	* doc/gcc.texi: Add more information about genattrtab.
+
 Mon Apr 29 17:19:10 2002  Richard Kenner  <kenner@vlsi1.ultra.nyu.edu>

 	* reload1.c (eliminate_regs, case SUBREG): Fix typo in
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@ -344,6 +344,14 @@ LIBICONV = @LIBICONV@
 # List of internationalization subdirectories.
 INTL_SUBDIRS = intl

+# Change this to a null string if obstacks are installed in the
+# system library.
+OBSTACK=obstack.o
+
+# The following object files is used by genautomata.
+GETRUNTIME = getruntime.o
+HASHTAB = hashtab.o
+
 # The GC method to be used on this system.
 GGC=@GGC@.o

@ -475,6 +483,12 @@ HOST_CFLAGS= @HOST_CFLAGS@ -DGENERATOR_FILE
 # Native linker and preprocessor flags.  For x-fragment overrides.
 HOST_LDFLAGS=$(LDFLAGS)
 HOST_CPPFLAGS=$(ALL_CPPFLAGS)
+HOST_OBSTACK=$(OBSTACK)
+HOST_VFPRINTF=$(VFPRINTF)
+HOST_DOPRINT=$(DOPRINT)
+HOST_GETRUNTIME=$(GETRUNTIME)
+HOST_HASHTAB=$(HASHTAB)
+HOST_STRSTR=$(STRSTR)

 # Actual name to use when installing a native compiler.
 GCC_INSTALL_NAME = `echo gcc|sed '$(program_transform_name)'`
@ -598,8 +612,17 @@ ALL_CPPFLAGS = $(CPPFLAGS) $(X_CPPFLAGS) $(T_CPPFLAGS)
 LIBIBERTY = ../libiberty/libiberty.a
 BUILD_LIBIBERTY = @FORBUILD@/libiberty/libiberty.a

-# Dependencies on the intl and portability libraries.
-LIBDEPS= $(INTLDEPS) $(LIBIBERTY)
+USE_HOST_OBSTACK= ` case "${HOST_OBSTACK}" in ?*) echo ${HOST_PREFIX}${HOST_OBSTACK} ;; esac `
+USE_HOST_VFPRINTF= ` case "${HOST_VFPRINTF}" in ?*) echo ${HOST_PREFIX}${HOST_VFPRINTF} ;; esac `
+USE_HOST_DOPRINT= ` case "${HOST_DOPRINT}" in ?*) echo ${HOST_PREFIX}${HOST_DOPRINT} ;; esac `
+USE_HOST_GETRUNTIME= ` case "${HOST_GETRUNTIME}" in ?*) echo ${HOST_PREFIX}${HOST_GETRUNTIME} ;; esac `
+USE_HOST_HASHTAB= ` case "${HOST_HASHTAB}" in ?*) echo ${HOST_PREFIX}${HOST_HASHTAB} ;; esac `
+USE_HOST_STRSTR= ` case "${HOST_STRSTR}" in ?*) echo ${HOST_PREFIX}${HOST_STRSTR} ;; esac `
+
+# Dependency on the intl, portability libraries, obstack or whatever
+# library facilities are not installed in the system libraries.
+# We don't use USE_* because backquote expansion doesn't work in deps.
+LIBDEPS= $(INTLLIBS) $(LIBIBERTY) $(OBSTACK) $(VFPRINTF) $(DOPRINT) $(STRSTR)

 # Likewise, for use in the tools that must run on this machine
 # even if we are cross-building GCC.
@ -618,6 +641,7 @@ HOST_RTL = $(HOST_PREFIX)rtl.o read-rtl.o $(HOST_PREFIX)bitmap.o \

 HOST_PRINT = $(HOST_PREFIX)print-rtl.o
 HOST_ERRORS = $(HOST_PREFIX)errors.o
+HOST_VARRAY = $(HOST_PREFIX)varray.o

 # Specify the directories to be searched for header files.
 # Both . and srcdir are used, in that order,
@ -1319,6 +1343,17 @@ line-map.o: line-map.c line-map.h intl.h $(CONFIG_H) $(SYSTEM_H)
 ggc-none.o: ggc-none.c $(GCONFIG_H) $(SYSTEM_H) $(GGC_H)
 	$(CC) -c $(ALL_CFLAGS) -DGENERATOR_FILE $(ALL_CPPFLAGS) $(INCLUDES) $< $(OUTPUT_OPTION)

+obstack.o: $(srcdir)/../libiberty/obstack.c $(GCONFIG_H)
+	rm -f obstack.c
+	$(LN_S) $(srcdir)/../libiberty/obstack.c obstack.c
+	$(CC) -c $(ALL_CFLAGS) -DGENERATOR_FILE $(ALL_CPPFLAGS) $(INCLUDES) \
+		obstack.c $(OUTPUT_OPTION)
+
+getruntime.o: $(srcdir)/../libiberty/getruntime.c $(CONFIG_H)
+	rm -f getruntime.c
+	$(LN_S) $(srcdir)/../libiberty/getruntime.c getruntime.c
+	$(CC) -c $(ALL_CFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) getruntime.c
+
 prefix.o: prefix.c $(CONFIG_H) $(SYSTEM_H) Makefile prefix.h
 	$(CC) $(ALL_CFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
 	-DPREFIX=\"$(prefix)\" \
@ -1558,12 +1593,13 @@ sched-deps.o : sched-deps.c $(CONFIG_H) $(SYSTEM_H) $(RTL_H) sched-int.h \
   $(INSN_ATTR_H) toplev.h $(RECOG_H) except.h cselib.h $(PARAMS_H) $(TM_P_H)
 sched-rgn.o : sched-rgn.c $(CONFIG_H) $(SYSTEM_H) $(RTL_H) sched-int.h \
   $(BASIC_BLOCK_H) $(REGS_H) hard-reg-set.h flags.h insn-config.h function.h \
-   $(INSN_ATTR_H) toplev.h $(RECOG_H) except.h $(TM_P_H)
+   $(INSN_ATTR_H) toplev.h $(RECOG_H) except.h $(TM_P_H) $(TARGET_H)
 sched-ebb.o : sched-ebb.c $(CONFIG_H) $(SYSTEM_H) $(RTL_H) sched-int.h \
   $(BASIC_BLOCK_H) $(REGS_H) hard-reg-set.h flags.h insn-config.h function.h \
   $(INSN_ATTR_H) toplev.h $(RECOG_H) except.h $(TM_P_H)
 sched-vis.o : sched-vis.c $(CONFIG_H) $(SYSTEM_H) $(RTL_H) sched-int.h \
-   hard-reg-set.h $(BASIC_BLOCK_H) $(INSN_ATTR_H) $(REGS_H) $(TM_P_H)
+   hard-reg-set.h $(BASIC_BLOCK_H) $(INSN_ATTR_H) $(REGS_H) $(TM_P_H) \
+   $(TARGET_H)
 final.o : final.c $(CONFIG_H) $(SYSTEM_H) $(RTL_H) $(TREE_H) flags.h intl.h \
   $(REGS_H) $(RECOG_H) conditions.h insn-config.h $(INSN_ATTR_H) function.h \
   real.h output.h hard-reg-set.h except.h debug.h xcoffout.h \
@ -1868,14 +1904,18 @@ genattr$(build_exeext) : genattr.o $(HOST_RTL) $(HOST_PRINT) $(HOST_ERRORS) $(HO
 genattr.o : genattr.c $(RTL_H) $(HCONFIG_H) $(SYSTEM_H) errors.h gensupport.h
 	$(HOST_CC) -c $(HOST_CFLAGS) $(HOST_CPPFLAGS) $(INCLUDES) $(srcdir)/genattr.c $(OUTPUT_OPTION)

-genattrtab$(build_exeext) : genattrtab.o $(HOST_RTL) $(HOST_PRINT) $(HOST_ERRORS) $(HOST_LIBDEPS)
+genattrtab$(build_exeext) : genattrtab.o genautomata.o $(HOST_RTL) $(HOST_PRINT) $(HOST_ERRORS) $(HOST_VARRAY) $(HOST_PREFIX)$(HOST_GETRUNTIME) $(HOST_LIBDEPS)
 	$(HOST_CC) $(HOST_CFLAGS) $(HOST_LDFLAGS) -o $@ \
-	 genattrtab.o $(HOST_RTL) $(HOST_PRINT) $(HOST_ERRORS) $(HOST_LIBS)
+	 genattrtab.o genautomata.o $(HOST_RTL) $(HOST_PRINT) $(HOST_ERRORS) $(HOST_VARRAY) $(USE_HOST_GETRUNTIME) $(HOST_LIBS) -lm

 genattrtab.o : genattrtab.c $(RTL_H) $(OBSTACK_H) $(HCONFIG_H) \
-  $(SYSTEM_H) errors.h $(GGC_H) gensupport.h
+  $(SYSTEM_H) errors.h $(GGC_H) gensupport.h genattrtab.h
 	$(HOST_CC) -c $(HOST_CFLAGS) $(HOST_CPPFLAGS) $(INCLUDES) $(srcdir)/genattrtab.c $(OUTPUT_OPTION)

+genautomata.o : genautomata.c $(RTL_H) $(OBSTACK_H) $(HCONFIG_H) \
+  $(SYSTEM_H) errors.h varray.h hash.h genattrtab.h
+	$(HOST_CC) -c $(HOST_CFLAGS) $(HOST_CPPFLAGS) $(INCLUDES) $(srcdir)/genautomata.c $(OUTPUT_OPTION)
+
 genoutput$(build_exeext) : genoutput.o $(HOST_RTL) $(HOST_PRINT) $(HOST_ERRORS) $(HOST_LIBDEPS)
 	$(HOST_CC) $(HOST_CFLAGS) $(HOST_LDFLAGS) -o $@ \
 	 genoutput.o $(HOST_RTL) $(HOST_PRINT) $(HOST_ERRORS) $(HOST_LIBS)
@ -1921,11 +1961,47 @@ $(HOST_PREFIX_1)bitmap.o: $(srcdir)/bitmap.c $(HCONFIG_H) $(SYSTEM_H) $(RTL_H) \
 	sed -e 's/config[.]h/hconfig.h/' $(srcdir)/bitmap.c > $(HOST_PREFIX)bitmap.c
 	$(HOST_CC) -c $(HOST_CFLAGS) $(HOST_CPPFLAGS) $(INCLUDES) $(HOST_PREFIX)bitmap.c $(OUTPUT_OPTION)

+$(HOST_PREFIX_1)obstack.o: $(srcdir)/../libiberty/obstack.c $(HCONFIG_H)
+	rm -f $(HOST_PREFIX)obstack.c
+	sed -e 's/config[.]h/hconfig.h/' $(srcdir)/../libiberty/obstack.c > $(HOST_PREFIX)obstack.c
+	$(HOST_CC) -c $(HOST_CFLAGS) $(HOST_CPPFLAGS) $(INCLUDES) $(HOST_PREFIX)obstack.c $(OUTPUT_OPTION)
+
+$(HOST_PREFIX_1)getruntime.o: $(srcdir)/../libiberty/getruntime.c
+	rm -f $(HOST_PREFIX)getruntime.c
+	sed -e 's/config[.]h/hconfig.h/' $(srcdir)/../libiberty/getruntime.c > $(HOST_PREFIX)getruntime.c
+	$(HOST_CC) -c $(HOST_CFLAGS) $(HOST_CPPFLAGS) $(INCLUDES) $(HOST_PREFIX)getruntime.c $(OUTPUT_OPTION)
+
+$(HOST_PREFIX_1)hashtab.o: $(srcdir)/../libiberty/hashtab.c
+	rm -f $(HOST_PREFIX)hashtab.c
+	sed -e 's/config[.]h/hconfig.h/' $(srcdir)/../libiberty/hashtab.c > $(HOST_PREFIX)hashtab.c
+	$(HOST_CC) -c $(HOST_CFLAGS) $(HOST_CPPFLAGS) $(INCLUDES) $(HOST_PREFIX)hashtab.c $(OUTPUT_OPTION)
+
+$(HOST_PREFIX_1)vfprintf.o: $(srcdir)/../libiberty/vfprintf.c $(HCONFIG_H)
+	rm -f $(HOST_PREFIX)vfprintf.c
+	sed -e 's/config[.]h/hconfig.h/' $(srcdir)/../libiberty/vfprintf.c > $(HOST_PREFIX)vfprintf.c
+	$(HOST_CC) -c $(HOST_CFLAGS) $(HOST_CPPFLAGS) $(INCLUDES) $(HOST_PREFIX)vfprintf.c $(OUTPUT_OPTION)
+
+$(HOST_PREFIX_1)doprint.o: doprint.c $(HCONFIG_H)
+	rm -f $(HOST_PREFIX)doprint.c
+	sed -e 's/config[.]h/hconfig.h/' $(srcdir)/doprint.c > $(HOST_PREFIX)doprint.c
+	$(HOST_CC) -c $(HOST_CFLAGS) $(HOST_CPPFLAGS) $(INCLUDES) $(HOST_PREFIX)doprint.c $(OUTPUT_OPTION)
+
+$(HOST_PREFIX_1)strstr.o: $(srcdir)/../libiberty/strstr.c $(HCONFIG_H)
+	rm -f $(HOST_PREFIX)strstr.c
+	sed -e 's/config[.]h/hconfig.h/' $(srcdir)/../libiberty/strstr.c > $(HOST_PREFIX)strstr.c
+	$(HOST_CC) -c $(HOST_CFLAGS) $(HOST_CPPFLAGS) $(INCLUDES) $(HOST_PREFIX)strstr.c $(OUTPUT_OPTION)
+
 $(HOST_PREFIX_1)errors.o: errors.c $(HCONFIG_H) $(SYSTEM_H) errors.h
 	rm -f $(HOST_PREFIX)errors.c
 	sed -e 's/config[.]h/hconfig.h/' $(srcdir)/errors.c > $(HOST_PREFIX)errors.c
 	$(HOST_CC) -c $(HOST_CFLAGS) $(HOST_CPPFLAGS) $(INCLUDES) $(HOST_PREFIX)errors.c $(OUTPUT_OPTION)

+
+# This satisfies the dependency that we get if you cross-compile a compiler
+# that does not need to compile doprint or whatever.
+$(HOST_PREFIX_1):
+	$(STAMP) $(HOST_PREFIX_1)
+
 $(HOST_PREFIX_1)ggc-none.o: ggc-none.c $(HCONFIG_H) $(SYSTEM_H) $(GCC_H)
 	rm -f $(HOST_PREFIX)ggc-none.c
 	sed -e 's/config[.]h/hconfig.h/' $(srcdir)/ggc-none.c > $(HOST_PREFIX)ggc-none.c
--- a/gcc/config/pa/pa.c
+++ b/gcc/config/pa/pa.c
@ -49,6 +49,17 @@ Boston, MA 02111-1307, USA.  */
 #include "target.h"
 #include "target-def.h"

+static int hppa_use_dfa_pipeline_interface PARAMS ((void));
+
+#undef TARGET_SCHED_USE_DFA_PIPELINE_INTERFACE 
+#define TARGET_SCHED_USE_DFA_PIPELINE_INTERFACE hppa_use_dfa_pipeline_interface
+
+static int
+hppa_use_dfa_pipeline_interface ()
+{
+  return 1;
+}
+
 #ifndef DO_FRAME_NOTES
 #ifdef INCOMING_RETURN_ADDR_RTX
 #define DO_FRAME_NOTES 1
@ -172,6 +183,11 @@ override_options ()
      pa_cpu_string = "7200";
      pa_cpu = PROCESSOR_7200;
    }
+  else if (pa_cpu_string && ! strcmp (pa_cpu_string, "7300"))
+    {
+      pa_cpu_string = "7300";
+      pa_cpu = PROCESSOR_7300;
+    }
  else if (pa_cpu_string && ! strcmp (pa_cpu_string, "8000"))
    {
      pa_cpu_string = "8000";
@ -179,7 +195,7 @@ override_options ()
    }
  else
    {
-      warning ("unknown -mschedule= option (%s).\nValid options are 700, 7100, 7100LC, 7200, and 8000\n", pa_cpu_string);
+      warning ("unknown -mschedule= option (%s).\nValid options are 700, 7100, 7100LC, 7200, 7300, and 8000\n", pa_cpu_string);
    }

  /* Set the instruction set architecture.  */
@ -3929,7 +3945,7 @@ pa_adjust_cost (insn, link, dep_insn, cost)
 		{
 		case TYPE_FPLOAD:
 		  /* This cost 3 cycles, not 2 as the md says for the
-		     700 and 7100.  */
+		     700 and 7100, 7100lc, 7200 and 7300.  */
 		  return cost + 1;

 		case TYPE_FPALU:
@ -3947,6 +3963,11 @@ pa_adjust_cost (insn, link, dep_insn, cost)
 		  return cost;
 		}
 	    }
+
+	  /* A flop-flop true depenendency where the sizes of the operand
+	     carrying the dependency is difference causes an additional
+	     cycle stall on the 7100lc, 7200, and 7300.   Similarly for
+	     a fpload-flop true dependency.  */
 	}

      /* For other data dependencies, the default cost specified in the
@ -3989,7 +4010,10 @@ pa_adjust_cost (insn, link, dep_insn, cost)
 		     preceding arithmetic operation has finished if
 		     the target of the fpload is any of the sources
 		     (or destination) of the arithmetic operation.  */
-		  return cost - 1;
+		  if (hppa_use_dfa_pipeline_interface ())
+		    return insn_default_latency (dep_insn) - 1;
+		  else
+		    return cost - 1;

 		default:
 		  return 0;
@ -4024,7 +4048,10 @@ pa_adjust_cost (insn, link, dep_insn, cost)
 		     preceding divide or sqrt operation has finished if
 		     the target of the ALU flop is any of the sources
 		     (or destination) of the divide or sqrt operation.  */
-		  return cost - 2;
+		  if (hppa_use_dfa_pipeline_interface ())
+		    return insn_default_latency (dep_insn) - 2;
+		  else
+		    return cost - 2;

 		default:
 		  return 0;
@ -4069,8 +4096,15 @@ pa_adjust_cost (insn, link, dep_insn, cost)
 		  /* A fpload can't be issued until one cycle before a
 		     preceding arithmetic operation has finished if
 		     the target of the fpload is the destination of the
-		     arithmetic operation.  */
-		  return cost - 1;
+		     arithmetic operation. 
+
+		     Exception: For PA7100LC, PA7200 and PA7300, the cost
+		     is 3 cycles, unless they bundle together.   We also
+		     pay the penalty if the second insn is a fpload.  */
+		  if (hppa_use_dfa_pipeline_interface ())
+		    return insn_default_latency (dep_insn) - 1;
+		  else
+		    return cost - 1;

 		default:
 		  return 0;
@ -4105,7 +4139,10 @@ pa_adjust_cost (insn, link, dep_insn, cost)
 		     preceding divide or sqrt operation has finished if
 		     the target of the ALU flop is also the target of
 		     the divide or sqrt operation.  */
-		  return cost - 2;
+		  if (hppa_use_dfa_pipeline_interface ())
+		    return insn_default_latency (dep_insn) - 2;
+		  else
+		    return cost - 2;

 		default:
 		  return 0;
@ -4165,6 +4202,7 @@ pa_issue_rate ()
    case PROCESSOR_7100:	return 2;
    case PROCESSOR_7100LC:	return 2;
    case PROCESSOR_7200:	return 2;
+    case PROCESSOR_7300:	return 2;
    case PROCESSOR_8000:	return 4;

    default:
--- a/gcc/config/pa/pa.h
+++ b/gcc/config/pa/pa.h
@ -41,6 +41,7 @@ enum processor_type
  PROCESSOR_7100,
  PROCESSOR_7100LC,
  PROCESSOR_7200,
+  PROCESSOR_7300,
  PROCESSOR_8000
 };

--- a/gcc/config/pa/pa.md
+++ b/gcc/config/pa/pa.md
@ -44,7 +44,7 @@
 ;;
 ;; FIXME: Add 800 scheduling for completeness?

-(define_attr "cpu" "700,7100,7100LC,7200,8000" (const (symbol_ref "pa_cpu_attr")))
+(define_attr "cpu" "700,7100,7100LC,7200,7300,8000" (const (symbol_ref "pa_cpu_attr")))

 ;; Length (in # of bytes).
 (define_attr "length" ""
@ -139,35 +139,10 @@
 		       (const_int 0)))
  [(eq_attr "in_branch_delay" "true") (nil) (nil)])

-;; Function units of the HPPA. The following data is for the 700 CPUs
-;; (Mustang CPU + Timex FPU aka PA-89) because that's what I have the docs for.
-;; Scheduling instructions for PA-83 machines according to the Snake
-;; constraints shouldn't hurt.
-
-;; (define_function_unit {name} {num-units} {n-users} {test}
-;;                       {ready-delay} {issue-delay} [{conflict-list}])
-
-;; The integer ALU.
-;; (Noted only for documentation; units that take one cycle do not need to
-;; be specified.)
-
-;; (define_function_unit "alu" 1 0
-;;  (and (eq_attr "type" "unary,shift,nullshift,binary,move,address")
-;;	 (eq_attr "cpu" "700"))
-;;  1 0)
-
-
 ;; Memory. Disregarding Cache misses, the Mustang memory times are:
 ;; load: 2, fpload: 3
 ;; store, fpstore: 3, no D-cache operations should be scheduled.

-(define_function_unit "pa700memory" 1 0
-  (and (eq_attr "type" "load,fpload")
-       (eq_attr "cpu" "700")) 2 0)
-(define_function_unit "pa700memory" 1 0 
-  (and (eq_attr "type" "store,fpstore")
-       (eq_attr "cpu" "700")) 3 3)
-
 ;; The Timex (aka 700) has two floating-point units: ALU, and MUL/DIV/SQRT.
 ;; Timings:
 ;; Instruction	Time	Unit	Minimum Distance (unit contention)
@ -186,44 +161,73 @@
 ;; fdiv,dbl	12	MPY	12
 ;; fsqrt,sgl	14	MPY	14
 ;; fsqrt,dbl	18	MPY	18
+;;
+;; We don't model fmpyadd/fmpysub properly as those instructions
+;; keep both the FP ALU and MPY units busy.  Given that these
+;; processors are obsolete, I'm not going to spend the time to
+;; model those instructions correctly.

-(define_function_unit "pa700fp_alu" 1 0
+(define_automaton "pa700")
+(define_cpu_unit "dummy_700,mem_700,fpalu_700,fpmpy_700" "pa700")
+
+(define_insn_reservation "W0" 4
  (and (eq_attr "type" "fpcc")
-       (eq_attr "cpu" "700")) 4 2)
-(define_function_unit "pa700fp_alu" 1 0
+       (eq_attr "cpu" "700"))
+  "fpalu_700*2")
+
+(define_insn_reservation "W1" 3
  (and (eq_attr "type" "fpalu")
-       (eq_attr "cpu" "700")) 3 2)
-(define_function_unit "pa700fp_mpy" 1 0
+       (eq_attr "cpu" "700"))
+  "fpalu_700*2")
+
+(define_insn_reservation "W2" 3
  (and (eq_attr "type" "fpmulsgl,fpmuldbl")
-       (eq_attr "cpu" "700")) 3 2)
-(define_function_unit "pa700fp_mpy" 1 0
+       (eq_attr "cpu" "700"))
+  "fpmpy_700*2")
+
+(define_insn_reservation "W3" 10
  (and (eq_attr "type" "fpdivsgl")
-       (eq_attr "cpu" "700")) 10 10)
-(define_function_unit "pa700fp_mpy" 1 0
+       (eq_attr "cpu" "700"))
+  "fpmpy_700*10")
+
+(define_insn_reservation "W4" 12
  (and (eq_attr "type" "fpdivdbl")
-       (eq_attr "cpu" "700")) 12 12)
-(define_function_unit "pa700fp_mpy" 1 0
+       (eq_attr "cpu" "700"))
+  "fpmpy_700*12")
+
+(define_insn_reservation "W5" 14
  (and (eq_attr "type" "fpsqrtsgl")
-       (eq_attr "cpu" "700")) 14 14)
-(define_function_unit "pa700fp_mpy" 1 0
+       (eq_attr "cpu" "700"))
+  "fpmpy_700*14")
+
+(define_insn_reservation "W6" 18
  (and (eq_attr "type" "fpsqrtdbl")
-       (eq_attr "cpu" "700")) 18 18)
+       (eq_attr "cpu" "700"))
+  "fpmpy_700*18")
+
+(define_insn_reservation "W7" 2
+  (and (eq_attr "type" "load,fpload")
+       (eq_attr "cpu" "700"))
+  "mem_700")
+
+(define_insn_reservation "W8" 3
+  (and (eq_attr "type" "store,fpstore")
+       (eq_attr "cpu" "700"))
+  "mem_700*3")
+
+(define_insn_reservation "W9" 1
+  (and (eq_attr "type" "!fpcc,fpalu,fpmulsgl,fpmuldbl,fpdivsgl,fpdivdbl,fpsqrtsgl,fpsqrtdbl,load,fpload,store,fpstore")
+       (eq_attr "cpu" "700"))
+  "dummy_700")

 ;; Function units for the 7100 and 7150.  The 7100/7150 can dual-issue
 ;; floating point computations with non-floating point computations (fp loads
 ;; and stores are not fp computations).
 ;;
-
 ;; Memory. Disregarding Cache misses, memory loads take two cycles; stores also
 ;; take two cycles, during which no Dcache operations should be scheduled.
 ;; Any special cases are handled in pa_adjust_cost.  The 7100, 7150 and 7100LC
 ;; all have the same memory characteristics if one disregards cache misses.
-(define_function_unit "pa7100memory" 1 0
-  (and (eq_attr "type" "load,fpload")
-       (eq_attr "cpu" "7100,7100LC")) 2 0)
-(define_function_unit "pa7100memory" 1 0 
-  (and (eq_attr "type" "store,fpstore")
-       (eq_attr "cpu" "7100,7100LC")) 2 2)

 ;; The 7100/7150 has three floating-point units: ALU, MUL, and DIV.
 ;; Timings:
@ -243,41 +247,46 @@
 ;; fdiv,dbl	15	DIV	15
 ;; fsqrt,sgl	8	DIV	8
 ;; fsqrt,dbl	15	DIV	15
+;;
+;; We don't really model the FP ALU/MPY units properly (they are
+;; distinct subunits in the FP unit).  However, there can never be
+;; a functional unit; conflict given the latency and issue rates
+;; for those units.

-(define_function_unit "pa7100fp_alu" 1 0
-  (and (eq_attr "type" "fpcc,fpalu")
-       (eq_attr "cpu" "7100")) 2 1)
-(define_function_unit "pa7100fp_mpy" 1 0
-  (and (eq_attr "type" "fpmulsgl,fpmuldbl")
-       (eq_attr "cpu" "7100")) 2 1)
-(define_function_unit "pa7100fp_div" 1 0
+(define_automaton "pa7100")
+(define_cpu_unit "i_7100, f_7100,fpmac_7100,fpdivsqrt_7100,mem_7100" "pa7100")
+
+(define_insn_reservation "X0" 2
+  (and (eq_attr "type" "fpcc,fpalu,fpmulsgl,fpmuldbl")
+       (eq_attr "cpu" "7100"))
+  "f_7100,fpmac_7100")
+
+(define_insn_reservation "X1" 8
  (and (eq_attr "type" "fpdivsgl,fpsqrtsgl")
-       (eq_attr "cpu" "7100")) 8 8)
-(define_function_unit "pa7100fp_div" 1 0
+       (eq_attr "cpu" "7100"))
+  "f_7100+fpdivsqrt_7100,fpdivsqrt_7100*7")
+
+(define_insn_reservation "X2" 15
  (and (eq_attr "type" "fpdivdbl,fpsqrtdbl")
-       (eq_attr "cpu" "7100")) 15 15)
+       (eq_attr "cpu" "7100"))
+  "f_7100+fpdivsqrt_7100,fpdivsqrt_7100*14")

-;; To encourage dual issue we define function units corresponding to
-;; the instructions which can be dual issued.    This is a rather crude
-;; approximation, the "pa7100nonflop" test in particular could be refined.
-(define_function_unit "pa7100flop" 1 1
-  (and
-    (eq_attr "type" "fpcc,fpalu,fpmulsgl,fpmuldbl,fpdivsgl,fpsqrtsgl,fpdivdbl,fpsqrtdbl")
-    (eq_attr "cpu" "7100")) 1 1)
+(define_insn_reservation "X3" 2
+  (and (eq_attr "type" "load,fpload")
+       (eq_attr "cpu" "7100"))
+  "i_7100+mem_7100")

-(define_function_unit "pa7100nonflop" 1 1
-  (and
-    (eq_attr "type" "!fpcc,fpalu,fpmulsgl,fpmuldbl,fpdivsgl,fpsqrtsgl,fpdivdbl,fpsqrtdbl")
-    (eq_attr "cpu" "7100")) 1 1)
+(define_insn_reservation "X4" 2
+  (and (eq_attr "type" "store,fpstore")
+       (eq_attr "cpu" "7100"))
+  "i_7100+mem_7100,mem_7100")

-
-;; Memory subsystem works just like 7100/7150 (except for cache miss times which
-;; we don't model here).  
+(define_insn_reservation "X5" 1
+  (and (eq_attr "type" "!fpcc,fpalu,fpmulsgl,fpmuldbl,fpdivsgl,fpsqrtsgl,fpdivdbl,fpsqrtdbl,load,fpload,store,fpstore")
+       (eq_attr "cpu" "7100"))
+  "i_7100")

 ;; The 7100LC has three floating-point units: ALU, MUL, and DIV.
-;; Note divides and sqrt flops lock the cpu until the flop is
-;; finished.  fmpy and xmpyu (fmpyi) lock the cpu for one cycle.
-;; There's no way to avoid the penalty.
 ;; Timings:
 ;; Instruction	Time	Unit	Minimum Distance (unit contention)
 ;; fcpy		2	ALU	1
@ -299,106 +308,179 @@
 ;; fdiv,dbl	15	DIV	15
 ;; fsqrt,sgl	8	DIV	8
 ;; fsqrt,dbl	15	DIV	15
+;;
+;; The PA7200 is just like the PA7100LC except that there is
+;; no store-store penalty.
+;;
+;; The PA7300 is just like the PA7200 except that there is
+;; no store-load penalty.
+;;
+;; Note there are some aspects of the 7100LC we are not modeling
+;; at the moment.  I'll be reviewing the 7100LC scheduling info
+;; shortly and updating this description.
+;;
+;;   load-load pairs
+;;   store-store pairs
+;;   fmpyadd,dbl
+;;   fmpysub,dbl
+;;   other issue modeling

-(define_function_unit "pa7100LCfp_alu" 1 0
+(define_automaton "pa7100lc")
+(define_cpu_unit "i0_7100lc, i1_7100lc, f_7100lc" "pa7100lc")
+(define_cpu_unit "fpalu_7100lc,fpdivsqrt_7100lc,fpmul_7100lc" "pa7100lc")
+(define_cpu_unit "mem_7100lc" "pa7100lc")
+
+(define_insn_reservation "Y0" 2
  (and (eq_attr "type" "fpcc,fpalu")
-       (eq_attr "cpu" "7100LC,7200")) 2 1)
-(define_function_unit "pa7100LCfp_mpy" 1 0
+       (eq_attr "cpu" "7100LC,7200,7300"))
+  "f_7100lc,fpalu_7100lc")
+
+(define_insn_reservation "Y1" 2
  (and (eq_attr "type" "fpmulsgl")
-       (eq_attr "cpu" "7100LC,7200")) 2 1)
-(define_function_unit "pa7100LCfp_mpy" 1 0
+       (eq_attr "cpu" "7100LC,7200,7300"))
+  "f_7100lc,fpmul_7100lc")
+
+(define_insn_reservation "Y2" 3
  (and (eq_attr "type" "fpmuldbl")
-       (eq_attr "cpu" "7100LC,7200")) 3 2)
-(define_function_unit "pa7100LCfp_div" 1 0
+       (eq_attr "cpu" "7100LC,7200,7300"))
+  "f_7100lc,fpmul_7100lc,fpmul_7100lc")
+
+(define_insn_reservation "Y3" 8
  (and (eq_attr "type" "fpdivsgl,fpsqrtsgl")
-       (eq_attr "cpu" "7100LC,7200")) 8 8)
-(define_function_unit "pa7100LCfp_div" 1 0
+       (eq_attr "cpu" "7100LC,7200,7300"))
+  "f_7100lc+fpdivsqrt_7100lc,fpdivsqrt_7100lc*7")
+
+(define_insn_reservation "Y4" 15
  (and (eq_attr "type" "fpdivdbl,fpsqrtdbl")
-       (eq_attr "cpu" "7100LC,7200")) 15 15)
+       (eq_attr "cpu" "7100LC,7200,7300"))
+  "f_7100lc+fpdivsqrt_7100lc,fpdivsqrt_7100lc*14")

-;; Define the various functional units for dual-issue.
+(define_insn_reservation "Y5" 2
+  (and (eq_attr "type" "load,fpload")
+       (eq_attr "cpu" "7100LC,7200,7300"))
+  "i1_7100lc+mem_7100lc")

-;; There's only one floating point unit.
-(define_function_unit "pa7100LCflop" 1 1
-  (and
-    (eq_attr "type" "fpcc,fpalu,fpmulsgl,fpmuldbl,fpdivsgl,fpsqrtsgl,fpdivdbl,fpsqrtdbl")
-    (eq_attr "cpu" "7100LC,7200")) 1 1)
+(define_insn_reservation "Y6" 2
+  (and (eq_attr "type" "store,fpstore")
+       (eq_attr "cpu" "7100LC"))
+  "i1_7100lc+mem_7100lc,mem_7100lc")

-;; Shifts and memory ops execute in only one of the integer ALUs
-(define_function_unit "pa7100LCshiftmem" 1 1
-  (and
-    (eq_attr "type" "shift,nullshift,load,fpload,store,fpstore")
-    (eq_attr "cpu" "7100LC,7200")) 1 1)
+(define_insn_reservation "Y7" 1
+  (and (eq_attr "type" "shift,nullshift")
+       (eq_attr "cpu" "7100LC,7200,7300"))
+  "i1_7100lc")

-;; We have two basic ALUs.
-(define_function_unit "pa7100LCalu" 2 1
-  (and
-    (eq_attr "type" "!fpcc,fpalu,fpmulsgl,fpmuldbl,fpdivsgl,fpsqrtsgl,fpdivdbl,fpsqrtdbl")
-   (eq_attr "cpu" "7100LC,7200")) 1 1)
+(define_insn_reservation "Y8" 1
+  (and (eq_attr "type" "!fpcc,fpalu,fpmulsgl,fpmuldbl,fpdivsgl,fpsqrtsgl,fpdivdbl,fpsqrtdbl,load,fpload,store,fpstore,shift,nullshift")
+       (eq_attr "cpu" "7100LC,7200,7300"))
+  "(i0_7100lc|i1_7100lc)")

-;; I don't have complete information on the PA7200; however, most of
-;; what I've heard makes it look like a 7100LC without the store-store
-;; penalty.  So that's how we'll model it.
+;; The 7200 has a store-load penalty
+(define_insn_reservation "Y9" 2
+  (and (eq_attr "type" "store,fpstore")
+       (eq_attr "cpu" "7200"))
+  "i0_7100lc,mem_7100lc")

-;; Memory. Disregarding Cache misses, memory loads and stores take
-;; two cycles.  Any special cases are handled in pa_adjust_cost.
-(define_function_unit "pa7200memory" 1 0
-  (and (eq_attr "type" "load,fpload,store,fpstore")
-       (eq_attr "cpu" "7200")) 2 0)
+;; The 7300 has no penalty for store-store or store-load
+(define_insn_reservation "YA" 2
+  (and (eq_attr "type" "store,fpstore")
+       (eq_attr "cpu" "7300"))
+  "i0_7100lc")

-;; I don't have detailed information on the PA7200 FP pipeline, so I
-;; treat it just like the 7100LC pipeline.
-;; Similarly for the multi-issue fake units.
-
-;; 
 ;; Scheduling for the PA8000 is somewhat different than scheduling for a
 ;; traditional architecture.
 ;;
 ;; The PA8000 has a large (56) entry reorder buffer that is split between
 ;; memory and non-memory operations.
 ;;
-;; The PA800 can issue two memory and two non-memory operations per cycle to
-;; the function units.  Similarly, the PA8000 can retire two memory and two
-;; non-memory operations per cycle.
+;; The PA8000 can issue two memory and two non-memory operations per cycle to
+;; the function units, with the exception of branches and multi-output
+;; instructions.  The PA8000 can retire two non-memory operations per cycle
+;; and two memory operations per cycle, only one of which may be a store.
 ;;
 ;; Given the large reorder buffer, the processor can hide most latencies.
 ;; According to HP, they've got the best results by scheduling for retirement
 ;; bandwidth with limited latency scheduling for floating point operations.
 ;; Latency for integer operations and memory references is ignored.
 ;;
+;;
 ;; We claim floating point operations have a 2 cycle latency and are
-;; fully pipelined, except for div and sqrt which are not pipelined.
+;; fully pipelined, except for div and sqrt which are not pipelined and
+;; take from 17 to 31 cycles to complete.
 ;;
-;; It is not necessary to define the shifter and integer alu units.
-;;
-;; These first two define_unit_unit descriptions model retirement from
-;; the reorder buffer.
-(define_function_unit "pa8000lsu" 2 1
-  (and
-    (eq_attr "type" "load,fpload,store,fpstore")
-    (eq_attr "cpu" "8000")) 1 1)
+;; It's worth noting that there is no way to saturate all the functional
+;; units on the PA8000 as there is not enough issue bandwidth.

-(define_function_unit "pa8000alu" 2 1
-  (and
-    (eq_attr "type" "!load,fpload,store,fpstore")
-    (eq_attr "cpu" "8000")) 1 1)
+(define_automaton "pa8000")
+(define_cpu_unit "inm0_8000, inm1_8000, im0_8000, im1_8000" "pa8000")
+(define_cpu_unit "rnm0_8000, rnm1_8000, rm0_8000, rm1_8000" "pa8000")
+(define_cpu_unit "store_8000" "pa8000")
+(define_cpu_unit "f0_8000, f1_8000" "pa8000")
+(define_cpu_unit "fdivsqrt0_8000, fdivsqrt1_8000" "pa8000")
+(define_reservation "inm_8000" "inm0_8000 | inm1_8000")
+(define_reservation "im_8000" "im0_8000 | im1_8000")
+(define_reservation "rnm_8000" "rnm0_8000 | rnm1_8000")
+(define_reservation "rm_8000" "rm0_8000 | rm1_8000")
+(define_reservation "f_8000" "f0_8000 | f1_8000")
+(define_reservation "fdivsqrt_8000" "fdivsqrt0_8000 | fdivsqrt1_8000")

-;; Claim floating point ops have a 2 cycle latency, excluding div and
-;; sqrt, which are not pipelined and issue to different units.
-(define_function_unit "pa8000fmac" 2 0
+;; We can issue any two memops per cycle, but we can only retire
+;; one memory store per cycle.  We assume that the reorder buffer
+;; will hide any memory latencies per HP's recommendation.
+(define_insn_reservation "Z0" 0
  (and
-    (eq_attr "type" "fpcc,fpalu,fpmulsgl,fpmuldbl")
-    (eq_attr "cpu" "8000")) 2 1)
+    (eq_attr "type" "load,fpload")
+    (eq_attr "cpu" "8000"))
+  "im_8000,rm_8000")

-(define_function_unit "pa8000fdiv" 2 1
+(define_insn_reservation "Z1" 0
  (and
-    (eq_attr "type" "fpdivsgl,fpsqrtsgl")
-    (eq_attr "cpu" "8000")) 17 17)
+    (eq_attr "type" "store,fpstore")
+    (eq_attr "cpu" "8000"))
+  "im_8000,rm_8000+store_8000")

-(define_function_unit "pa8000fdiv" 2 1
+;; We can issue and retire two non-memory operations per cycle with
+;; a few exceptions (branches).  This group catches those we want
+;; to assume have zero latency.
+(define_insn_reservation "Z2" 0
  (and
-    (eq_attr "type" "fpdivdbl,fpsqrtdbl")
-    (eq_attr "cpu" "8000")) 31 31)
+    (eq_attr "type" "!load,fpload,store,fpstore,uncond_branch,branch,cbranch,fbranch,call,dyncall,multi,milli,parallel_branch,fpcc,fpalu,fpmulsgl,fpmuldbl,fpsqrtsgl,fpsqrtdbl,fpdivsgl,fpdivdbl")
+    (eq_attr "cpu" "8000"))
+  "inm_8000,rnm_8000")
+
+;; Branches use both slots in the non-memory issue and
+;; retirement unit.
+(define_insn_reservation "Z3" 0
+  (and
+    (eq_attr "type" "uncond_branch,branch,cbranch,fbranch,call,dyncall,multi,milli,parallel_branch")
+    (eq_attr "cpu" "8000"))
+  "inm0_8000+inm1_8000,rnm0_8000+rnm1_8000")
+
+;; We partial latency schedule the floating point units.
+;; They can issue/retire two at a time in the non-memory
+;; units.  We fix their latency at 2 cycles and they
+;; are fully pipelined.
+(define_insn_reservation "Z4" 1
+ (and
+   (eq_attr "type" "fpcc,fpalu,fpmulsgl,fpmuldbl")
+   (eq_attr "cpu" "8000"))
+ "inm_8000,f_8000,rnm_8000")
+
+;; The fdivsqrt units are not pipelined and have a very long latency.  
+;; To keep the DFA from exploding, we do not show all the
+;; reservations for the divsqrt unit.
+(define_insn_reservation "Z5" 17
+ (and
+   (eq_attr "type" "fpdivsgl,fpsqrtsgl")
+   (eq_attr "cpu" "8000"))
+ "inm_8000,fdivsqrt_8000*6,rnm_8000")
+
+(define_insn_reservation "Z6" 31
+ (and
+   (eq_attr "type" "fpdivdbl,fpsqrtdbl")
+   (eq_attr "cpu" "8000"))
+ "inm_8000,fdivsqrt_8000*6,rnm_8000")
+


 ;; Compare instructions.
--- a/gcc/config/sh/sh.c
+++ b/gcc/config/sh/sh.c
@ -197,6 +197,9 @@ static void sh_insert_attributes PARAMS ((tree, tree *));
 static void sh_asm_named_section PARAMS ((const char *, unsigned int));
 #endif
 static int sh_adjust_cost PARAMS ((rtx, rtx, rtx, int));
+static int sh_use_dfa_interface PARAMS ((void));
+static int sh_issue_rate PARAMS ((void));
+
 static bool sh_cannot_modify_jumps_p PARAMS ((void));

 static bool sh_ms_bitfield_layout_p PARAMS ((tree));
@ -226,6 +229,12 @@ static bool sh_ms_bitfield_layout_p PARAMS ((tree));
 #undef TARGET_SCHED_ADJUST_COST
 #define TARGET_SCHED_ADJUST_COST sh_adjust_cost

+#undef TARGET_SCHED_USE_DFA_PIPELINE_INTERFACE 
+#define TARGET_SCHED_USE_DFA_PIPELINE_INTERFACE \
+				sh_use_dfa_interface
+#undef TARGET_SCHED_ISSUE_RATE
+#define TARGET_SCHED_ISSUE_RATE sh_issue_rate
+
 #undef TARGET_CANNOT_MODIFY_JUMPS_P
 #define TARGET_CANNOT_MODIFY_JUMPS_P sh_cannot_modify_jumps_p

@ -6726,6 +6735,29 @@ sh_pr_n_sets ()
  return REG_N_SETS (TARGET_SHMEDIA ? PR_MEDIA_REG : PR_REG);
 }

+/* This Function Returns non zero if DFA based scheduler
+   interface is to be used.At present supported only for
+   SH4.  */
+static int
+sh_use_dfa_interface()
+{
+        if (TARGET_SH4)
+                return 1;
+        else
+                return 0;
+}
+
+/* This function returns "2" that signifies dual issue 
+   for SH4 processor.To be used by DFA pipeline description.  */
+static int
+sh_issue_rate()
+{
+	if(TARGET_SH4)
+		return 2;
+	else
+		return 1;
+}
+
 /* SHmedia requires registers for branches, so we can't generate new
   branches past reload.  */
 static bool
--- a/gcc/config/sh/sh.md
+++ b/gcc/config/sh/sh.md
@ -195,6 +195,28 @@
 "cbranch,jump,jump_ind,arith,arith3,arith3b,dyn_shift,other,load,load_si,store,move,fmove,smpy,dmpy,return,pload,prset,pstore,prget,pcload,pcload_si,pt,ptabs,rte,sfunc,call,fp,fdiv,dfp_arith,dfp_cmp,dfp_conv,dfdiv,gp_fpul,nil"
  (const_string "other"))

+;; We define a new attribute namely "insn_class".We use
+;; this for DFA based pipeline description.
+;; Although the "type" attribute covers almost all insn 
+;; classes,it is more convenient to define new attribute
+;; for certain reservations.
+;;
+;; mt_group      SH4 "mt" group instructions.
+;;
+;; ex_group      SH4 "ex" group instructions.They mostly
+;;               overlap with arithmetic instructions but
+;;               new attribute defined to distinguish from
+;;	         mt group instructions.
+;;
+;; lds_to_fpscr  The "type" attribute couldn't sufficiently
+;;               distinguish it from others.It is part of 
+;;               new attribute.Similar case with ldsmem_to_fpscr
+;;	  	 and cwb. 
+
+(define_attr "insn_class"
+	     "mt_group,ex_group,lds_to_fpscr,ldsmem_to_fpscr,cwb,none"
+	     (const_string "none"))
+
 ;; Indicate what precision must be selected in fpscr for this insn, if any.

 (define_attr "fp_mode" "single,double,none" (const_string "none"))
@ -631,7 +653,8 @@
 		       (match_operand:SI 1 "arith_operand" "L,r"))
 	       (const_int 0)))]
  "TARGET_SH1"
-  "tst	%1,%0")
+  "tst	%1,%0"
+  [(set_attr "insn_class" "mt_group")])

 ;; ??? Perhaps should only accept reg/constant if the register is reg 0.
 ;; That would still allow reload to create cmpi instructions, but would
@ -647,7 +670,8 @@
  "@
 	tst	%0,%0
 	cmp/eq	%1,%0
-	cmp/eq	%1,%0")
+	cmp/eq	%1,%0"
+   [(set_attr "insn_class" "mt_group,mt_group,mt_group")])

 (define_insn "cmpgtsi_t"
  [(set (reg:SI T_REG)
@ -656,7 +680,8 @@
  "TARGET_SH1"
  "@
 	cmp/gt	%1,%0
-	cmp/pl	%0")
+	cmp/pl	%0"
+   [(set_attr "insn_class" "mt_group,mt_group")])

 (define_insn "cmpgesi_t"
  [(set (reg:SI T_REG)
@ -665,8 +690,9 @@
  "TARGET_SH1"
  "@
 	cmp/ge	%1,%0
-	cmp/pz	%0")
-
+	cmp/pz	%0"
+   [(set_attr "insn_class" "mt_group,mt_group")])
+
 ;; -------------------------------------------------------------------------
 ;; SImode unsigned integer comparisons
 ;; -------------------------------------------------------------------------
@ -676,14 +702,16 @@
 	(geu:SI (match_operand:SI 0 "arith_reg_operand" "r")
 		(match_operand:SI 1 "arith_reg_operand" "r")))]
  "TARGET_SH1"
-  "cmp/hs	%1,%0")
+  "cmp/hs	%1,%0"
+   [(set_attr "insn_class" "mt_group")])

 (define_insn "cmpgtusi_t"
  [(set (reg:SI T_REG)
 	(gtu:SI (match_operand:SI 0 "arith_reg_operand" "r")
 		(match_operand:SI 1 "arith_reg_operand" "r")))]
  "TARGET_SH1"
-  "cmp/hi	%1,%0")
+  "cmp/hi	%1,%0"
+   [(set_attr "insn_class" "mt_group")])

 ;; We save the compare operands in the cmpxx patterns and use them when
 ;; we generate the branch.
@ -1050,7 +1078,8 @@
 	(ltu:SI (plus:SI (match_dup 1) (match_dup 2)) (match_dup 1)))]
  "TARGET_SH1"
  "addc	%2,%0"
-  [(set_attr "type" "arith")])
+  [(set_attr "type" "arith")
+   (set_attr "insn_class" "ex_group")])

 (define_insn "addc1"
  [(set (match_operand:SI 0 "arith_reg_operand" "=r")
@ -1060,7 +1089,8 @@
   (clobber (reg:SI T_REG))]
  "TARGET_SH1"
  "addc	%2,%0"
-  [(set_attr "type" "arith")])
+  [(set_attr "type" "arith")
+   (set_attr "insn_class" "ex_group")])

 (define_expand "addsi3"
  [(set (match_operand:SI 0 "arith_reg_operand" "")
@ -1088,8 +1118,9 @@
 		 (match_operand:SI 2 "arith_operand" "rI")))]
  "TARGET_SH1"
  "add	%2,%0"
-  [(set_attr "type" "arith")])
-
+  [(set_attr "type" "arith")
+   (set_attr "insn_class" "ex_group")])
+
 ;; -------------------------------------------------------------------------
 ;; Subtraction instructions
 ;; -------------------------------------------------------------------------
@ -1155,7 +1186,8 @@
 	(gtu:SI (minus:SI (match_dup 1) (match_dup 2)) (match_dup 1)))]
  "TARGET_SH1"
  "subc	%2,%0"
-  [(set_attr "type" "arith")])
+  [(set_attr "type" "arith")
+   (set_attr "insn_class" "ex_group")])

 (define_insn "subc1"
  [(set (match_operand:SI 0 "arith_reg_operand" "=r")
@ -1165,7 +1197,8 @@
   (clobber (reg:SI T_REG))]
  "TARGET_SH1"
  "subc	%2,%0"
-  [(set_attr "type" "arith")])
+  [(set_attr "type" "arith")
+   (set_attr "insn_class" "ex_group")])

 (define_insn "*subsi3_internal"
  [(set (match_operand:SI 0 "arith_reg_operand" "=r")
@ -1173,7 +1206,8 @@
 		  (match_operand:SI 2 "arith_reg_operand" "r")))]
  "TARGET_SH1"
  "sub	%2,%0"
-  [(set_attr "type" "arith")])
+  [(set_attr "type" "arith")
+   (set_attr "insn_class" "ex_group")])

 (define_insn "*subsi3_media"
  [(set (match_operand:SI 0 "arith_reg_operand" "=r")
@ -1909,7 +1943,8 @@
 		(match_operand:SI 2 "logical_operand" "r,L")))]
  "TARGET_SH1"
  "and	%2,%0"
-  [(set_attr "type" "arith")])
+  [(set_attr "type" "arith")
+   (set_attr "insn_class" "ex_group")])

 ;; If the constant is 255, then emit a extu.b instruction instead of an
 ;; and, since that will give better code.
@ -1951,7 +1986,8 @@
 		(match_operand:SI 2 "logical_operand" "r,L")))]
  "TARGET_SH1"
  "or	%2,%0"
-  [(set_attr "type" "arith")])
+  [(set_attr "type" "arith")
+   (set_attr "insn_class" "ex_group")])

 (define_insn "iordi3"
  [(set (match_operand:DI 0 "arith_reg_operand" "=r,r")
@ -1968,7 +2004,8 @@
 		(match_operand:SI 2 "logical_operand" "L,r")))]
  "TARGET_SH1"
  "xor	%2,%0"
-  [(set_attr "type" "arith")])
+  [(set_attr "type" "arith")
+   (set_attr "insn_class" "ex_group")])

 (define_insn "xordi3"
  [(set (match_operand:DI 0 "arith_reg_operand" "=r,r")
@ -1991,7 +2028,8 @@
 	(lshiftrt:SI (match_dup 1) (const_int 31)))]
  "TARGET_SH1"
  "rotl	%0"
-  [(set_attr "type" "arith")])
+  [(set_attr "type" "arith")
+   (set_attr "insn_class" "ex_group")])

 (define_insn "rotlsi3_31"
  [(set (match_operand:SI 0 "arith_reg_operand" "=r")
@ -2000,7 +2038,8 @@
   (clobber (reg:SI T_REG))]
  "TARGET_SH1"
  "rotr	%0"
-  [(set_attr "type" "arith")])
+  [(set_attr "type" "arith")
+   (set_attr "insn_class" "ex_group")])

 (define_insn "rotlsi3_16"
  [(set (match_operand:SI 0 "arith_reg_operand" "=r")
@ -2008,7 +2047,8 @@
 		   (const_int 16)))]
  "TARGET_SH1"
  "swap.w	%1,%0"
-  [(set_attr "type" "arith")])
+  [(set_attr "type" "arith")
+  (set_attr "insn_class" "ex_group")])

 (define_expand "rotlsi3"
  [(set (match_operand:SI 0 "arith_reg_operand" "")
@ -2072,7 +2112,8 @@
 		   (const_int 8)))]
  "TARGET_SH1"
  "swap.b	%1,%0"
-  [(set_attr "type" "arith")])
+  [(set_attr "type" "arith")
+   (set_attr "insn_class" "ex_group")])

 (define_expand "rotlhi3"
  [(set (match_operand:HI 0 "arith_reg_operand" "")
@ -2114,7 +2155,8 @@
     (clobber (match_dup 4))])]
  "operands[4] = gen_rtx_SCRATCH (SImode);"
  [(set_attr "length" "*,*,*,4")
-   (set_attr "type" "dyn_shift,arith,arith,arith")])
+   (set_attr "type" "dyn_shift,arith,arith,arith")
+   (set_attr "insn_class" "ex_group,ex_group,ex_group,ex_group")])

 (define_insn "ashlhi3_k"
  [(set (match_operand:HI 0 "arith_reg_operand" "=r,r")
@ -2124,7 +2166,8 @@
  "@
 	add	%0,%0
 	shll%O2	%0"
-  [(set_attr "type" "arith")])
+  [(set_attr "type" "arith")
+   (set_attr "insn_class" "ex_group")])

 (define_insn "ashlsi3_n"
  [(set (match_operand:SI 0 "arith_reg_operand" "=r")
@ -2141,7 +2184,8 @@
 	       (eq (symbol_ref "shift_insns_rtx (insn)") (const_int 3))
 	       (const_string "6")]
 	      (const_string "8")))
-   (set_attr "type" "arith")])
+   (set_attr "type" "arith")
+   (set_attr "insn_class" "ex_group")])

 (define_split
  [(set (match_operand:SI 0 "arith_reg_operand" "")
@ -2229,7 +2273,8 @@
   (clobber (reg:SI T_REG))]
  "TARGET_SH1 && INTVAL (operands[2]) == 1"
  "shar	%0"
-  [(set_attr "type" "arith")])
+  [(set_attr "type" "arith")
+   (set_attr "insn_class" "ex_group")])

 ;; We can't do HImode right shifts correctly unless we start out with an
 ;; explicit zero / sign extension; doing that would result in worse overall
@ -2288,7 +2333,8 @@
 	(lt:SI (match_dup 1) (const_int 0)))]
  "TARGET_SH1"
  "shll	%0"
-  [(set_attr "type" "arith")])
+  [(set_attr "type" "arith")
+   (set_attr "insn_class" "ex_group")])

 (define_insn "ashrsi3_d"
  [(set (match_operand:SI 0 "arith_reg_operand" "=r")
@ -2296,7 +2342,8 @@
 		     (neg:SI (match_operand:SI 2 "arith_reg_operand" "r"))))]
  "TARGET_SH3"
  "shad	%2,%0"
-  [(set_attr "type" "dyn_shift")])
+  [(set_attr "type" "dyn_shift")
+   (set_attr "insn_class" "ex_group")])

 (define_insn "ashrsi3_n"
  [(set (reg:SI R4_REG)
@ -2346,7 +2393,8 @@
 		     (neg:SI (match_operand:SI 2 "arith_reg_operand" "r"))))]
  "TARGET_SH3"
  "shld	%2,%0"
-  [(set_attr "type" "dyn_shift")])
+  [(set_attr "type" "dyn_shift")
+   (set_attr "insn_class" "ex_group")])

 ;;  Only the single bit shift clobbers the T bit.

@ -2357,7 +2405,8 @@
   (clobber (reg:SI T_REG))]
  "TARGET_SH1 && CONST_OK_FOR_M (INTVAL (operands[2]))"
  "shlr	%0"
-  [(set_attr "type" "arith")])
+  [(set_attr "type" "arith")
+   (set_attr "insn_class" "ex_group")])

 (define_insn "lshrsi3_k"
  [(set (match_operand:SI 0 "arith_reg_operand" "=r")
@ -2366,7 +2415,8 @@
  "TARGET_SH1 && CONST_OK_FOR_K (INTVAL (operands[2]))
   && ! CONST_OK_FOR_M (INTVAL (operands[2]))"
  "shlr%O2	%0"
-  [(set_attr "type" "arith")])
+  [(set_attr "type" "arith")
+   (set_attr "insn_class" "ex_group")])

 (define_insn "lshrsi3_n"
  [(set (match_operand:SI 0 "arith_reg_operand" "=r")
@ -2444,7 +2494,8 @@
  "TARGET_SH1"
  "shll	%R0\;rotcl	%S0"
  [(set_attr "length" "4")
-   (set_attr "type" "arith")])
+   (set_attr "type" "arith")
+   (set_attr "insn_class" "ex_group")])

 (define_insn "ashldi3_media"
  [(set (match_operand:DI 0 "arith_reg_operand" "=r,r")
@ -2483,7 +2534,8 @@
  "TARGET_SH1"
  "shlr	%S0\;rotcr	%R0"
  [(set_attr "length" "4")
-   (set_attr "type" "arith")])
+   (set_attr "type" "arith")
+   (set_attr "insn_class" "ex_group")])

 (define_insn "lshrdi3_media"
  [(set (match_operand:DI 0 "arith_reg_operand" "=r,r")
@ -2522,7 +2574,8 @@
  "TARGET_SH1"
  "shar	%S0\;rotcr	%R0"
  [(set_attr "length" "4")
-   (set_attr "type" "arith")])
+   (set_attr "type" "arith")
+   (set_attr "insn_class" "ex_group")])

 (define_insn "ashrdi3_media"
  [(set (match_operand:DI 0 "arith_reg_operand" "=r,r")
@ -2756,7 +2809,8 @@
 			     (const_int 16))))]
  "TARGET_SH1"
  "xtrct	%1,%0"
-  [(set_attr "type" "arith")])
+  [(set_attr "type" "arith")
+   (set_attr "insn_class" "ex_group")])

 (define_insn "xtrct_right"
  [(set (match_operand:SI 0 "arith_reg_operand" "=r")
@ -2766,8 +2820,9 @@
 			   (const_int 16))))]
  "TARGET_SH1"
  "xtrct	%2,%0"
-  [(set_attr "type" "arith")])
-
+  [(set_attr "type" "arith")
+   (set_attr "insn_class" "ex_group")])
+
 ;; -------------------------------------------------------------------------
 ;; Unary arithmetic
 ;; -------------------------------------------------------------------------
@ -2781,7 +2836,8 @@
 	       (const_int 0)))]
  "TARGET_SH1"
  "negc	%1,%0"
-  [(set_attr "type" "arith")])
+  [(set_attr "type" "arith")
+   (set_attr "insn_class" "ex_group")])

 (define_insn "*negdi_media"
  [(set (match_operand:DI 0 "arith_reg_operand" "=r")
@ -2819,14 +2875,16 @@
 	(neg:SI (match_operand:SI 1 "arith_reg_operand" "r")))]
  "TARGET_SH1"
  "neg	%1,%0"
-  [(set_attr "type" "arith")])
+  [(set_attr "type" "arith")
+   (set_attr "insn_class" "ex_group")])

 (define_insn "one_cmplsi2"
  [(set (match_operand:SI 0 "arith_reg_operand" "=r")
 	(not:SI (match_operand:SI 1 "arith_reg_operand" "r")))]
  "TARGET_SH1"
  "not	%1,%0"
-  [(set_attr "type" "arith")])
+  [(set_attr "type" "arith")
+   (set_attr "insn_class" "ex_group")])

 (define_expand "one_cmpldi2"
  [(set (match_operand:DI 0 "arith_reg_operand" "")
@ -2872,22 +2930,25 @@
 	(zero_extend:SI (match_operand:HI 1 "arith_reg_operand" "r")))]
  "TARGET_SH1"
  "extu.w	%1,%0"
-  [(set_attr "type" "arith")])
+  [(set_attr "type" "arith")
+   (set_attr "insn_class" "ex_group")])

 (define_insn "zero_extendqisi2"
  [(set (match_operand:SI 0 "arith_reg_operand" "=r")
 	(zero_extend:SI (match_operand:QI 1 "arith_reg_operand" "r")))]
  "TARGET_SH1"
  "extu.b	%1,%0"
-  [(set_attr "type" "arith")])
+  [(set_attr "type" "arith")
+   (set_attr "insn_class" "ex_group")])

 (define_insn "zero_extendqihi2"
  [(set (match_operand:HI 0 "arith_reg_operand" "=r")
 	(zero_extend:HI (match_operand:QI 1 "arith_reg_operand" "r")))]
  "TARGET_SH1"
  "extu.b	%1,%0"
-  [(set_attr "type" "arith")])
-
+  [(set_attr "type" "arith")
+   (set_attr "insn_class" "ex_group")])
+
 ;; -------------------------------------------------------------------------
 ;; Sign extension instructions
 ;; -------------------------------------------------------------------------
@ -2941,7 +3002,8 @@
  "@
 	exts.w	%1,%0
   	mov.w	%1,%0"
-  [(set_attr "type" "arith,load")])
+  [(set_attr "type" "arith,load")
+   (set_attr "insn_class" "ex_group,*")])

 (define_insn "extendqisi2"
  [(set (match_operand:SI 0 "arith_reg_operand" "=r,r")
@ -2950,7 +3012,8 @@
  "@
 	exts.b	%1,%0
 	mov.b	%1,%0"
-  [(set_attr "type" "arith,load")])
+  [(set_attr "type" "arith,load")
+   (set_attr "insn_class" "ex_group,*")])

 (define_insn "extendqihi2"
  [(set (match_operand:HI 0 "arith_reg_operand" "=r,r")
@ -2959,8 +3022,9 @@
  "@
 	exts.b	%1,%0
 	mov.b	%1,%0"
-  [(set_attr "type" "arith,load")])
-
+  [(set_attr "type" "arith,load")
+   (set_attr "insn_class" "ex_group,*")])
+
 ;; -------------------------------------------------------------------------
 ;; Move instructions
 ;; -------------------------------------------------------------------------
@ -3070,6 +3134,7 @@
 	lds.l	%1,%0
 	fake	%1,%0"
  [(set_attr "type" "pcload_si,move,*,load_si,move,prget,move,store,store,pstore,move,prset,load,pload,pcload_si")
+   (set_attr "insn_class"  "*,*,mt_group,*,*,*,*,*,*,*,*,*,*,*,*")
   (set_attr "length" "*,*,*,*,*,*,*,*,*,*,*,*,*,*,*")])

 ;; t/r must come after r/r, lest reload will try to reload stuff like
@ -3227,7 +3292,8 @@
   (clobber (match_scratch:SI 2 "=&r"))]
  "TARGET_HARD_SH4"
  "ocbwb\\t@%0\;extu.w\\t%0,%2\;or\\t%1,%2\;mov.l\\t%0,@%2"
-  [(set_attr "length" "8")])
+  [(set_attr "length" "8")
+   (set_attr "insn_class" "cwb")])

 (define_insn "ic_invalidate_line_media"
  [(unspec_volatile [(match_operand 0 "register_operand" "r")]
@ -7415,7 +7481,8 @@
 	mov.l	%1,%0
 	sts	fpscr,%0"
  [(set_attr "length" "0,2,2,4,2,2,2,2")
-   (set_attr "type" "dfp_conv,dfp_conv,load,dfp_conv,dfp_conv,move,store,gp_fpul")])
+   (set_attr "type" "dfp_conv,dfp_conv,load,dfp_conv,dfp_conv,move,store,gp_fpul")
+   (set_attr "insn_class" "ldsmem_to_fpscr,*,*,lds_to_fpscr,*,*,*,*")])

 (define_split
  [(set (reg:PSI FPSCR_REG)
@ -8557,3 +8624,292 @@
  "TARGET_SH1"
  "mov.l @r15+,r15\;mov.l @r15+,r0"
  [(set_attr "length" "4")])
+
+;; The following description  models the
+;; SH4 pipeline using the DFA based scheduler. 
+;; The DFA based description is better way to model 
+;; a superscalar pipeline as compared to function unit
+;; reservation model.   
+;; 1. The function unit based model is oriented to describe at most one 
+;;    unit reservation by each insn. It is difficult to model unit reservations in multiple 
+;;    pipeline units by same insn. This can be done using DFA based description.
+;; 2. The execution performance of DFA based scheduler does not depend on processor complexity.
+;; 3. Writing all unit reservations for an instruction class is more natural description 
+;;    of the pipeline and makes interface of the hazard recognizer simpler than the 
+;;    old function unit based model.
+;; 4. The DFA model is richer and is a part of greater overall framework of RCSP.
+
+
+;; Two automata are defined to reduce number of states
+;; which a single large automaton will have.(Factoring)
+
+(define_automaton "inst_pipeline,fpu_pipe")
+
+;; This unit is basically the decode unit of the processor.
+;; Since SH4 is a dual issue machine,it is as if there are two 
+;; units so that any insn can be processed by either one
+;; of the decoding unit.
+
+(define_cpu_unit "pipe_01,pipe_02" "inst_pipeline")
+
+
+;; The fixed point arithmetic calculator(?? EX Unit).
+
+(define_cpu_unit  "int" "inst_pipeline")
+
+;; f1_1 and f1_2 are floating point units.Actually there is
+;; a f1 unit which can overlap with other f1 unit but
+;; not another F1 unit.It is as though there were two
+;; f1 units.
+
+(define_cpu_unit "f1_1,f1_2" "fpu_pipe")
+
+;; The floating point units.
+
+(define_cpu_unit "F1,F2,F3,FS" "fpu_pipe")
+
+;; This is basically the MA unit of SH4
+;; used in LOAD/STORE pipeline.
+
+(define_cpu_unit "memory" "inst_pipeline")
+
+;; The address calculator used for branch instructions.
+;; This will be reserved with "issue" of branch instructions
+;; and this is to make sure that  no two branch instructions 
+;; can be issued in parallel. 
+
+(define_cpu_unit "pcr_addrcalc" "inst_pipeline")
+
+;; ----------------------------------------------------
+;; This reservation is to simplify the dual issue description.
+
+(define_reservation  "issue"  "pipe_01|pipe_02")
+
+;; This is to express the locking of D stage.
+
+(define_reservation  "d_lock" "pipe_01+pipe_02")
+
+;; This is to simplify description where F1,F2,FS
+;; are used simultaneously.
+
+(define_reservation "fpu" "F1+F2+FS")
+
+;; This is to highlight the fact that f1 
+;; cannot overlap with F1.
+
+(exclusion_set  "f1_1,f1_2" "F1")
+
+;; Although reg moves have a latency of zero 
+;; we need to highlight that they use D stage
+;; for one cycle.
+
+(define_insn_reservation "reg_mov" 0
+               (eq_attr "type" "move,fmove")
+              "issue")
+
+;; Other MT  group intructions(1 step operations)
+;; Group:	MT
+;; Latency: 	1
+;; Issue Rate: 	1
+
+(define_insn_reservation "mt" 1
+                      (eq_attr "insn_class" "mt_group")
+                      "issue,nothing")
+
+;; Fixed Point Arithmetic Instructions(1 step operations)
+;; Group:	EX
+;; Latency: 	1
+;; Issue Rate: 	1
+
+(define_insn_reservation "simple_arith" 1 
+            (eq_attr "insn_class" "ex_group")
+            "issue,int")
+
+;; Load Store instructions. (MOV.[BWL]@(d,GBR)
+;; Group:	LS
+;; Latency: 	2
+;; Issue Rate: 	1
+
+(define_insn_reservation "load_store" 2
+       (eq_attr "type" "load,load_si,pcload,pcload_si,store")
+       "issue,memory*2")
+
+;; Branch (BF,BF/S,BT,BT/S,BRA)
+;; Group:	BR
+;; Latency: 	2 (or 1) Actually Observed to be 5/7
+;; Issue Rate: 	1
+;; The latency is 1 when displacement is 0.
+;; This reservation can be further broken into 2
+;;    1. branch_zero : One with latency 1 and in the TEST 
+;;       part it also checks for 0 (ZERO) displacement 
+;;    2. branch: Latency 2.
+
+(define_insn_reservation "branch_zero"  5
+             (and (eq_attr "type" "cbranch")
+		  (eq_attr "length" "2"))
+             "(issue+pcr_addrcalc),pcr_addrcalc,nothing")
+
+(define_insn_reservation "branch"  7
+             (eq_attr "type" "cbranch")
+             "(issue+pcr_addrcalc),pcr_addrcalc,nothing")
+
+;; Branch Far (JMP,RTS,BRAF)
+;; Group:	CO
+;; Latency: 	3
+;; Issue Rate: 	2
+;;    Since issue stage (D stage) is blocked for 2nd cycle, 
+;;    cpu_unit  int  is reserved since it might be required for far
+;;    address calculation.
+
+(define_insn_reservation "branch_far" 12
+         (and (eq_attr "type" "jump,return")
+	      (eq_attr "length" "6"))
+         "d_lock*2,int+pcr_addrcalc,pcr_addrcalc")
+
+;; RTE
+;; Group:	CO
+;; atency: 	5
+;; Issue Rate: 	5
+;; this instruction can be executed in any of the pipelines 
+;; and blocks the pipeline for next 4 stages.
+
+(define_insn_reservation "return_from_exp" 5
+          (eq_attr "type" "rte")
+         "(issue+pcr_addrcalc),d_lock*4,int+pcr_addrcalc,nothing")
+
+;; OCBP, OCBWB
+;; Group:	CO
+;; Latency: 	5
+;; Issue Rate: 	1
+
+(define_insn_reservation "ocbwb"  5
+          (eq_attr "insn_class" "cwb") 
+ 	   "issue,(int+memory),memory*5")
+		
+;; LDS to PR,JSR
+;; Group:	CO
+;; Latency: 	3
+;; Issue Rate: 	2
+;; The SX stage is blocked for last 2 cycles.
+
+(define_insn_reservation "lds_to_pr" 3 
+          (eq_attr "type" "prset,call,sfunc") 
+          "(issue+pcr_addrcalc),(issue+int+pcr_addrcalc),(int+pcr_addrcalc)*2")
+
+;; LDS.L to PR 
+;; Group:	CO
+;; Latency: 	3
+;; Issue Rate: 	2
+;; The SX unit is blocked for last 2 cycles.
+ 
+(define_insn_reservation "ldsmem_to_pr"  3
+      (eq_attr "type" "pload") 
+     "(issue+pcr_addrcalc),(issue+int+pcr_addrcalc),(int+memory+pcr_addrcalc),(int+pcr_addrcalc)")
+
+;; STS from PR
+;; Group:	CO
+;; Latency: 	2
+;; Issue Rate: 	2
+;; The SX unit in second and third cycles.
+
+(define_insn_reservation "sts_from_pr" 2
+        (eq_attr "type" "prget")
+       "(issue+pcr_addrcalc),(pipe_01+int+pcr_addrcalc),(int+pcr_addrcalc),nothing")
+
+;; STS.L from PR
+;; Group:	CO
+;; Latency: 	2
+;; Issue Rate: 	2
+
+(define_insn_reservation "prload_mem" 2 
+          (eq_attr "type" "pstore")
+           "(issue+pcr_addrcalc),(pipe_01+int+pcr_addrcalc),(int+memory+pcr_addrcalc),memory")
+
+;; LDS to FPSCR
+;; Group:	CO
+;; Latency: 	4
+;; Issue Rate: 	1
+;; F1 is blocked for last three cycles. 
+
+(define_insn_reservation "fpscr_store" 4
+        (eq_attr "insn_class" "lds_to_fpscr")
+       "issue,int,F1*3")
+
+;; LDS.L to FPSCR
+;; Group:	CO
+;; Latency: 	1 / 4
+;; Latency to update Rn is 1 and latency to update FPSCR is 4
+;; Issue Rate: 	1
+;; F1 is blocked for last three cycles.
+
+(define_insn_reservation "fpscr_store_mem" 4
+        (eq_attr "insn_class"  "ldsmem_to_fpscr") 
+        "issue,(int+memory),(F1+memory),F1*2")
+
+
+;; Fixed point multiplication (DMULS.L DMULU.L MUL.L MULS.W,MULU.W)
+;; Group:	CO
+;; Latency: 	4 / 4
+;; Issue Rate: 	1
+
+(define_insn_reservation "multi" 4
+	(eq_attr "type" "smpy,dmpy")
+ 	"issue,(issue+int+f1_1),(int+f1_1),(f1_1|f1_2)*2,F2,FS")
+
+
+;; Single precision floating point computation FCMP/EQ,
+;; FCP/GT, FADD, FLOAT, FMAC, FMUL, FSUB, FTRC, FRVHG, FSCHG
+;; Group:	FE
+;; Latency: 	4
+;; Issue Rate: 	1
+
+(define_insn_reservation "fp_arith"  4
+              (eq_attr "type" "fp")
+ 	      "issue,F1,F2,FS")
+
+;; Single Precision FDIV/SQRT
+;; Group:	FE
+;; Latency: 	12/13
+;; Issue Rate: 	1
+
+(define_insn_reservation "fp_div" 13
+               (eq_attr "type" "fdiv")
+               "issue,F1+F3,F1+F2+F3,F3*7,F1+F3,F2,FS")
+
+;; Double Precision floating point computation
+;; (FCNVDS, FCNVSD, FLOAT, FTRC)
+;; Group:	FE
+;; Latency: 	(3,4)/5
+;; Issue Rate: 	1
+
+(define_insn_reservation "dp_float" 5
+         (eq_attr "type" "dfp_conv")
+	 "issue,F1,F1+F2,F2+FS,FS")
+
+;; Double-precision floating-point (FADD ,FMUL,FSUB) 
+;; Group:	FE
+;; Latency: 	(7,8)/9
+;; Issue Rate: 	1
+
+(define_insn_reservation "fp_double_arith" 9
+        (eq_attr "type" "dfp_arith")
+	"issue,F1,F1+F2,fpu*4,F2+FS,FS")
+
+;; Double-precision FCMP (FCMP/EQ,FCMP/GT) 
+;; Group:	FE
+;; Latency: 	3/5
+;; Issue Rate: 	2
+
+(define_insn_reservation "fp_double_cmp" 5 
+        (eq_attr "type" "dfp_cmp")
+	"issue,(issue+F1),F1+F2,F2+FS,FS")
+
+;; Double precision FDIV/SQRT
+;; Group:	FE
+;; Latency: 	(24,25)/26
+;; Issue Rate: 	1
+
+(define_insn_reservation "dp_div" 26
+        (eq_attr "type" "dfdiv")
+ 	"issue,F1+F3,F1+F2+F3,F2+F3+FS,F3*16,F1+F3,F1+F2+F3,fpu+F3,F2+FS,FS")
+
--- a/gcc/config/sparc/sparc-protos.h
+++ b/gcc/config/sparc/sparc-protos.h
@ -117,6 +117,8 @@ extern char *sparc_v8plus_shift PARAMS ((rtx *, rtx, const char *));
   32 bits of REG are 0 before INSN.  */   
 extern int sparc_check_64 PARAMS ((rtx, rtx));
 extern rtx gen_df_reg PARAMS ((rtx, int));
+/* Used for DFA scheduling when cpu is ultrasparc.  */
+extern int ultrasparc_store_bypass_p PARAMS ((rtx, rtx));
 extern int sparc_extra_constraint_check PARAMS ((rtx, int, int));
 #endif /* RTX_CODE */

--- a/gcc/config/sparc/sparc.c
+++ b/gcc/config/sparc/sparc.c
--- a/gcc/config/sparc/sparc.h
+++ b/gcc/config/sparc/sparc.h
@ -122,9 +122,11 @@ extern enum cmodel sparc_cmodel;
 #define TARGET_CPU_sparcv9	7	/* alias */
 #define TARGET_CPU_sparc64	7	/* alias */
 #define TARGET_CPU_ultrasparc	8
+#define TARGET_CPU_ultrasparc3	9

 #if TARGET_CPU_DEFAULT == TARGET_CPU_v9 \
- || TARGET_CPU_DEFAULT == TARGET_CPU_ultrasparc
+ || TARGET_CPU_DEFAULT == TARGET_CPU_ultrasparc \
+ || TARGET_CPU_DEFAULT == TARGET_CPU_ultrasparc3

 #define CPP_CPU32_DEFAULT_SPEC ""
 #define ASM_CPU32_DEFAULT_SPEC ""
@ -141,6 +143,10 @@ extern enum cmodel sparc_cmodel;
 #define CPP_CPU64_DEFAULT_SPEC "-D__sparc_v9__"
 #define ASM_CPU64_DEFAULT_SPEC "-Av9a"
 #endif
+#if TARGET_CPU_DEFAULT == TARGET_CPU_ultrasparc3
+#define CPP_CPU64_DEFAULT_SPEC "-D__sparc_v9__"
+#define ASM_CPU64_DEFAULT_SPEC "-Av9b"
+#endif

 #else

@ -230,6 +236,7 @@ Unrecognized value in TARGET_CPU_DEFAULT.
 %{mcpu=sparclite86x:-D__sparclite86x__} \
 %{mcpu=v9:-D__sparc_v9__} \
 %{mcpu=ultrasparc:-D__sparc_v9__} \
+%{mcpu=ultrasparc3:-D__sparc_v9__} \
 %{!mcpu*:%{!mcypress:%{!msparclite:%{!mf930:%{!mf934:%{!mv8:%{!msupersparc:%(cpp_cpu_default)}}}}}}} \
 "

@ -296,6 +303,7 @@ Unrecognized value in TARGET_CPU_DEFAULT.
 %{mv8plus:-Av8plus} \
 %{mcpu=v9:-Av9} \
 %{mcpu=ultrasparc:%{!mv8plus:-Av9a}} \
+%{mcpu=ultrasparc3:%{!mv8plus:-Av9b}} \
 %{!mcpu*:%{!mcypress:%{!msparclite:%{!mf930:%{!mf934:%{!mv8:%{!msupersparc:%(asm_cpu_default)}}}}}}} \
 "

@ -623,7 +631,8 @@ enum processor_type {
  PROCESSOR_SPARCLET,
  PROCESSOR_TSC701,
  PROCESSOR_V9,
-  PROCESSOR_ULTRASPARC
+  PROCESSOR_ULTRASPARC,
+  PROCESSOR_ULTRASPARC3
 };

 /* This is set from -m{cpu,tune}=xxx.  */
@ -2622,7 +2631,8 @@ do {                                                                    \
  (((FP_REG_CLASS_P (CLASS1) && GENERAL_OR_I64 (CLASS2)) \
    || (GENERAL_OR_I64 (CLASS1) && FP_REG_CLASS_P (CLASS2)) \
    || (CLASS1) == FPCC_REGS || (CLASS2) == FPCC_REGS)		\
-   ? (sparc_cpu == PROCESSOR_ULTRASPARC ? 12 : 6) : 2)
+   ? ((sparc_cpu == PROCESSOR_ULTRASPARC \
+       || sparc_cpu == PROCESSOR_ULTRASPARC3) ? 12 : 6) : 2)

 /* Provide the cost of a branch.  For pre-v9 processors we use
   a value of 3 to take into account the potential annulling of
@ -2653,6 +2663,8 @@ do {                                                                    \
    if (sparc_cpu == PROCESSOR_ULTRASPARC)		\
      return (GET_MODE (X) == DImode ?			\
              COSTS_N_INSNS (34) : COSTS_N_INSNS (19));	\
+    if (sparc_cpu == PROCESSOR_ULTRASPARC3)		\
+      return COSTS_N_INSNS (6);				\
    return TARGET_HARD_MUL ? COSTS_N_INSNS (5) : COSTS_N_INSNS (25); \
  case DIV:						\
  case UDIV:						\
@ -2661,6 +2673,9 @@ do {                                                                    \
    if (sparc_cpu == PROCESSOR_ULTRASPARC)		\
      return (GET_MODE (X) == DImode ?			\
              COSTS_N_INSNS (68) : COSTS_N_INSNS (37));	\
+    if (sparc_cpu == PROCESSOR_ULTRASPARC3)		\
+      return (GET_MODE (X) == DImode ?			\
+              COSTS_N_INSNS (71) : COSTS_N_INSNS (40));	\
    return COSTS_N_INSNS (25);				\
  /* Make FLOAT and FIX more expensive than CONST_DOUBLE,\
     so that cse will favor the latter.  */		\
--- a/gcc/config/sparc/sparc.md
+++ b/gcc/config/sparc/sparc.md
@ -44,6 +44,7 @@
 ;;			16	embmedany_textlo
 ;;			18	sethm
 ;;			19	setlo
+;;			20	cycle_display
 ;;
 ;; UNSPEC_VOLATILE:	0	blockage
 ;;			1	flush_register_windows
@ -61,7 +62,7 @@

 ;; Attribute for cpu type.
 ;; These must match the values for enum processor_type in sparc.h.
-(define_attr "cpu" "v7,cypress,v8,supersparc,sparclite,f930,f934,hypersparc,sparclite86x,sparclet,tsc701,v9,ultrasparc"
+(define_attr "cpu" "v7,cypress,v8,supersparc,sparclite,f930,f934,hypersparc,sparclite86x,sparclet,tsc701,v9,ultrasparc,ultrasparc3"
  (const (symbol_ref "sparc_cpu_attr")))

 ;; Attribute for the instruction set.
@ -82,9 +83,8 @@

 ;; Insn type.

-;; If you add any new type here, please update ultrasparc_sched_reorder too.
 (define_attr "type"
-  "ialu,compare,shift,load,sload,store,uncond_branch,branch,call,sibcall,call_no_delay_slot,return,imul,idiv,fpload,fpstore,fp,fpmove,fpcmove,fpcmp,fpmul,fpdivs,fpdivd,fpsqrts,fpsqrtd,cmove,multi,misc"
+  "ialu,compare,shift,load,sload,store,uncond_branch,branch,call,sibcall,call_no_delay_slot,return,imul,idiv,fpload,fpstore,fp,fpmove,fpcmove,fpcrmove,fpcmp,fpmul,fpdivs,fpdivd,fpsqrts,fpsqrtd,cmove,multi,misc"
  (const_string "ialu"))

 ;; true if branch/call has empty delay slot and will emit a nop in it
@ -163,6 +163,9 @@
 ;; FP precision.
 (define_attr "fptype" "single,double" (const_string "single"))

+;; UltraSPARC-III integer load type.
+(define_attr "us3load_type" "2cycle,3cycle" (const_string "2cycle"))
+
 (define_asm_attributes
  [(set_attr "length" "2")
   (set_attr "type" "multi")])
@ -245,344 +248,525 @@
  [(eq_attr "in_uncond_branch_delay" "true")
   (nil) (nil)])
   
-;; Function units of the SPARC
+;; DFA scheduling on the SPARC

-;; (define_function_unit {name} {num-units} {n-users} {test}
-;;                       {ready-delay} {issue-delay} [{conflict-list}])
+(define_automaton "cypress_0,cypress_1,supersparc_0,supersparc_1,hypersparc_0,hypersparc_1,sparclet,ultrasparc_0,ultrasparc_1,ultrasparc3_0,ultrasparc3_1")

-;; The integer ALU.
-;; (Noted only for documentation; units that take one cycle do not need to
-;; be specified.)
+;; Cypress scheduling

-;; On the sparclite, integer multiply takes 1, 3, or 5 cycles depending on
-;; the inputs.
+(define_cpu_unit "cyp_memory, cyp_fpalu" "cypress_0")
+(define_cpu_unit "cyp_fpmds" "cypress_1")

-;; ---- cypress CY7C602 scheduling:
-;; Memory with load-delay of 1 (i.e., 2 cycle load).
-
-(define_function_unit "memory" 1 0 
+(define_insn_reservation "cyp_load" 2
  (and (eq_attr "cpu" "cypress")
    (eq_attr "type" "load,sload,fpload"))
-  2 2)
+  "cyp_memory, nothing")

-;; SPARC has two floating-point units: the FP ALU,
-;; and the FP MUL/DIV/SQRT unit.
-;; Instruction timings on the CY7C602 are as follows
-;; FABSs	4
-;; FADDs/d	5/5
-;; FCMPs/d	4/4
-;; FDIVs/d	23/37
-;; FMOVs	4
-;; FMULs/d	5/7
-;; FNEGs	4
-;; FSQRTs/d	34/63
-;; FSUBs/d	5/5
-;; FdTOi/s	5/5
-;; FsTOi/d	5/5
-;; FiTOs/d	9/5
-
-;; The CY7C602 can only support 2 fp isnsn simultaneously.
-;; More insns cause the chip to stall.
-
-(define_function_unit "fp_alu" 1 0
+(define_insn_reservation "cyp_fp_alu" 5
  (and (eq_attr "cpu" "cypress")
    (eq_attr "type" "fp,fpmove"))
-  5 5)
+  "cyp_fpalu, nothing*3")

-(define_function_unit "fp_mds" 1 0
+(define_insn_reservation "cyp_fp_mult" 7
  (and (eq_attr "cpu" "cypress")
    (eq_attr "type" "fpmul"))
-  7 7)
+  "cyp_fpmds, nothing*5")

-(define_function_unit "fp_mds" 1 0
+(define_insn_reservation "cyp_fp_div" 37
  (and (eq_attr "cpu" "cypress")
    (eq_attr "type" "fpdivs,fpdivd"))
-  37 37)
+  "cyp_fpmds, nothing*35")

-(define_function_unit "fp_mds" 1 0
+(define_insn_reservation "cyp_fp_sqrt" 63
  (and (eq_attr "cpu" "cypress")
    (eq_attr "type" "fpsqrts,fpsqrtd"))
-  63 63)
+  "cyp_fpmds, nothing*61")

-;; ----- The TMS390Z55 scheduling
-;; The Supersparc can issue 1 - 3 insns per cycle: up to two integer,
-;; one ld/st, one fp.
-;; Memory delivers its result in one cycle to IU, zero cycles to FP
+;; SuperSPARC scheduling

-(define_function_unit "memory" 1 0
+(define_cpu_unit "ss_memory, ss_shift, ss_iwport0, ss_iwport1" "supersparc_0")
+(define_cpu_unit "ss_fpalu" "supersparc_0")
+(define_cpu_unit "ss_fpmds" "supersparc_1")
+
+(define_reservation "ss_iwport" "(ss_iwport0 | ss_iwport1)")
+
+(define_insn_reservation "ss_iuload" 1
  (and (eq_attr "cpu" "supersparc")
    (eq_attr "type" "load,sload"))
-  1 1)
+  "ss_memory")

-(define_function_unit "memory" 1 0
+;; Ok, fpu loads deliver the result in zero cycles.  But we
+;; have to show the ss_memory reservation somehow, thus...
+(define_insn_reservation "ss_fpload" 0
  (and (eq_attr "cpu" "supersparc")
    (eq_attr "type" "fpload"))
-  0 1)
+  "ss_memory")

-(define_function_unit "memory" 1 0
+(define_bypass 0 "ss_fpload" "ss_fp_alu,ss_fp_mult,ss_fp_divs,ss_fp_divd,ss_fp_sqrt")
+
+(define_insn_reservation "ss_store" 1
  (and (eq_attr "cpu" "supersparc")
    (eq_attr "type" "store,fpstore"))
-  1 1)
+  "ss_memory")

-(define_function_unit "shift" 1 0
+(define_insn_reservation "ss_ialu_shift" 1
  (and (eq_attr "cpu" "supersparc")
    (eq_attr "type" "shift"))
-  1 1)
+  "ss_shift + ss_iwport")

-;; There are only two write ports to the integer register file
-;; A store also uses a write port
-
-(define_function_unit "iwport" 2 0
+(define_insn_reservation "ss_ialu_any" 1
  (and (eq_attr "cpu" "supersparc")
    (eq_attr "type" "load,sload,store,shift,ialu"))
-  1 1)
+  "ss_iwport")

-;; Timings; throughput/latency
-;; FADD     1/3    add/sub, format conv, compar, abs, neg
-;; FMUL     1/3
-;; FDIVs    4/6
-;; FDIVd    7/9
-;; FSQRTs   6/8
-;; FSQRTd  10/12
-;; IMUL     4/4
-
-(define_function_unit "fp_alu" 1 0
+(define_insn_reservation "ss_fp_alu" 3
  (and (eq_attr "cpu" "supersparc")
    (eq_attr "type" "fp,fpmove,fpcmp"))
-  3 1)
+  "ss_fpalu, nothing*2")

-(define_function_unit "fp_mds" 1 0
+(define_insn_reservation "ss_fp_mult" 3
  (and (eq_attr "cpu" "supersparc")
    (eq_attr "type" "fpmul"))
-  3 1)
+  "ss_fpmds, nothing*2")

-(define_function_unit "fp_mds" 1 0
+(define_insn_reservation "ss_fp_divs" 6
  (and (eq_attr "cpu" "supersparc")
    (eq_attr "type" "fpdivs"))
-  6 4)
+  "ss_fpmds*4, nothing*2")

-(define_function_unit "fp_mds" 1 0
+(define_insn_reservation "ss_fp_divd" 9
  (and (eq_attr "cpu" "supersparc")
    (eq_attr "type" "fpdivd"))
-  9 7)
+  "ss_fpmds*7, nothing*2")

-(define_function_unit "fp_mds" 1 0
+(define_insn_reservation "ss_fp_sqrt" 12
  (and (eq_attr "cpu" "supersparc")
    (eq_attr "type" "fpsqrts,fpsqrtd"))
-  12 10)
+  "ss_fpmds*10, nothing*2")

-(define_function_unit "fp_mds" 1 0
+(define_insn_reservation "ss_imul" 4
  (and (eq_attr "cpu" "supersparc")
    (eq_attr "type" "imul"))
-  4 4)
+  "ss_fpmds*4")

-;; ----- hypersparc/sparclite86x scheduling
-;; The Hypersparc can issue 1 - 2 insns per cycle.  The dual issue cases are:
-;; L-Ld/St I-Int F-Float B-Branch LI/LF/LB/II/IF/IB/FF/FB
-;; II/FF case is only when loading a 32 bit hi/lo constant
-;; Single issue insns include call, jmpl, u/smul, u/sdiv, lda, sta, fcmp
-;; Memory delivers its result in one cycle to IU
+;; HyperSPARC/sparclite86x scheduling

-(define_function_unit "memory" 1 0
+(define_cpu_unit "hs_memory,hs_branch,hs_shift,hs_fpalu" "hypersparc_0")
+(define_cpu_unit "hs_fpmds" "hypersparc_1")
+
+(define_insn_reservation "hs_load" 1
  (and (ior (eq_attr "cpu" "hypersparc") (eq_attr "cpu" "sparclite86x"))
    (eq_attr "type" "load,sload,fpload"))
-  1 1)
+  "hs_memory")

-(define_function_unit "memory" 1 0
+(define_insn_reservation "hs_store" 2
  (and (ior (eq_attr "cpu" "hypersparc") (eq_attr "cpu" "sparclite86x"))
    (eq_attr "type" "store,fpstore"))
-  2 1)
+  "hs_memory, nothing")

-(define_function_unit "sparclite86x_branch" 1 0
+(define_insn_reservation "hs_slbranch" 1
  (and (eq_attr "cpu" "sparclite86x")
    (eq_attr "type" "branch"))
-  1 1)
+  "hs_branch")

-;; integer multiply insns 
-(define_function_unit "sparclite86x_shift" 1 0
+(define_insn_reservation "hs_slshift" 1
  (and (eq_attr "cpu" "sparclite86x")
    (eq_attr "type" "shift"))
-  1 1)
+  "hs_shift")

-(define_function_unit "fp_alu" 1 0
+(define_insn_reservation "hs_fp_alu" 1
  (and (ior (eq_attr "cpu" "hypersparc") (eq_attr "cpu" "sparclite86x"))
    (eq_attr "type" "fp,fpmove,fpcmp"))
-  1 1)
+  "hs_fpalu")

-(define_function_unit "fp_mds" 1 0
+(define_insn_reservation "hs_fp_mult" 1
  (and (ior (eq_attr "cpu" "hypersparc") (eq_attr "cpu" "sparclite86x"))
    (eq_attr "type" "fpmul"))
-  1 1)
+  "hs_fpmds")

-(define_function_unit "fp_mds" 1 0
+(define_insn_reservation "hs_fp_divs" 8
  (and (ior (eq_attr "cpu" "hypersparc") (eq_attr "cpu" "sparclite86x"))
    (eq_attr "type" "fpdivs"))
-  8 6)
+  "hs_fpmds*6, nothing*2")

-(define_function_unit "fp_mds" 1 0
+(define_insn_reservation "hs_fp_divd" 12
  (and (ior (eq_attr "cpu" "hypersparc") (eq_attr "cpu" "sparclite86x"))
    (eq_attr "type" "fpdivd"))
-  12 10)
+  "hs_fpmds*10, nothing*2")

-(define_function_unit "fp_mds" 1 0
+(define_insn_reservation "hs_fp_sqrt" 17
  (and (ior (eq_attr "cpu" "hypersparc") (eq_attr "cpu" "sparclite86x"))
    (eq_attr "type" "fpsqrts,fpsqrtd"))
-  17 15)
+  "hs_fpmds*15, nothing*2")

-(define_function_unit "fp_mds" 1 0
+(define_insn_reservation "hs_imul" 17
  (and (ior (eq_attr "cpu" "hypersparc") (eq_attr "cpu" "sparclite86x"))
    (eq_attr "type" "imul"))
-  17 15)
+  "hs_fpmds*15, nothing*2")

-;; ----- sparclet tsc701 scheduling
-;; The tsc701 issues 1 insn per cycle.
-;; Results may be written back out of order.
+;; Sparclet tsc701 scheduling

-;; Loads take 2 extra cycles to complete and 4 can be buffered at a time.
+(define_cpu_unit "sl_load0,sl_load1,sl_load2,sl_load3" "sparclet")
+(define_cpu_unit "sl_store,sl_imul" "sparclet")

-(define_function_unit "tsc701_load" 4 1
+(define_reservation "sl_load_any" "(sl_load0 | sl_load1 | sl_load2 | sl_load3)")
+(define_reservation "sl_load_all" "(sl_load0 + sl_load1 + sl_load2 + sl_load3)")
+
+(define_insn_reservation "sl_ld" 3
  (and (eq_attr "cpu" "tsc701")
-    (eq_attr "type" "load,sload"))
-  3 1)
+   (eq_attr "type" "load,sload"))
+  "sl_load_any, sl_load_any, sl_load_any")

-;; Stores take 2(?) extra cycles to complete.
-;; It is desirable to not have any memory operation in the following 2 cycles.
-;; (??? or 2 memory ops in the case of std).
-
-(define_function_unit "tsc701_store" 1 0
+(define_insn_reservation "sl_st" 3
  (and (eq_attr "cpu" "tsc701")
    (eq_attr "type" "store"))
-  3 3
-  [(eq_attr "type" "load,sload,store")])
+  "(sl_store+sl_load_all)*3")

-;; The multiply unit has a latency of 5.
-(define_function_unit "tsc701_mul" 1 0
+(define_insn_reservation "sl_imul" 5
  (and (eq_attr "cpu" "tsc701")
    (eq_attr "type" "imul"))
-  5 5)
+  "sl_imul*5")

-;; ----- The UltraSPARC-1 scheduling
-;; UltraSPARC has two integer units.  Shift instructions can only execute
-;; on IE0.  Condition code setting instructions, call, and jmpl (including
-;; the ret and retl pseudo-instructions) can only execute on IE1.
-;; Branch on register uses IE1, but branch on condition code does not.
-;; Conditional moves take 2 cycles.  No other instruction can issue in the
-;; same cycle as a conditional move.
-;; Multiply and divide take many cycles during which no other instructions
-;; can issue.
-;; Memory delivers its result in two cycles (except for signed loads,
-;; which take one cycle more).  One memory instruction can be issued per
-;; cycle.
+;; UltraSPARC-I/II scheduling

-(define_function_unit "memory" 1 0
+(define_cpu_unit "us1_fdivider,us1_fpm" "ultrasparc_0");
+(define_cpu_unit "us1_fpa,us1_load_writeback" "ultrasparc_1")
+(define_cpu_unit "us1_fps_0,us1_fps_1,us1_fpd_0,us1_fpd_1" "ultrasparc_1")
+(define_cpu_unit "us1_slot0,us1_slot1,us1_slot2,us1_slot3" "ultrasparc_1")
+(define_cpu_unit "us1_ieu0,us1_ieu1,us1_cti,us1_lsu" "ultrasparc_1")
+
+(define_reservation "us1_slot012" "(us1_slot0 | us1_slot1 | us1_slot2)")
+(define_reservation "us1_slotany" "(us1_slot0 | us1_slot1 | us1_slot2 | us1_slot3)")
+(define_reservation "us1_single_issue" "us1_slot0 + us1_slot1 + us1_slot2 + us1_slot3")
+
+(define_reservation "us1_fp_single" "(us1_fps_0 | us1_fps_1)")
+(define_reservation "us1_fp_double" "(us1_fpd_0 | us1_fpd_1)")
+;; This is a simplified representation of the issue at hand.
+;; For most cases, going from one FP precision type insn to another
+;; just breaks up the insn group.  However for some cases, such
+;; a situation causes the second insn to stall 2 more cycles.
+(exclusion_set "us1_fps_0,us1_fps_1" "us1_fpd_0,us1_fpd_1")
+
+;; If we have to schedule an ieu1 specific instruction and we want
+;; to reserve the ieu0 unit as well, we must reserve it first.  So for
+;; example we could not schedule this sequence:
+;;	COMPARE		IEU1
+;;	IALU		IEU0
+;; but we could schedule them together like this:
+;;	IALU		IEU0
+;;	COMPARE		IEU1
+;; This basically requires that ieu0 is reserved before ieu1 when
+;; it is required that both be reserved.
+(absence_set "us1_ieu0" "us1_ieu1")
+
+;; This defines the slotting order.  Most IEU instructions can only
+;; execute in the first three slots, FPU and branches can go into
+;; any slot.  We represent instructions which "break the group"
+;; as requiring reservation of us1_slot0.
+(absence_set "us1_slot0" "us1_slot1,us1_slot2,us1_slot3")
+(absence_set "us1_slot1" "us1_slot2,us1_slot3")
+(absence_set "us1_slot2" "us1_slot3")
+
+(define_insn_reservation "us1_simple_ieuN" 1
  (and (eq_attr "cpu" "ultrasparc")
-    (eq_attr "type" "load,fpload"))
-  2 1)
+    (eq_attr "type" "ialu"))
+  "(us1_ieu0 | us1_ieu1) + us1_slot012")

-(define_function_unit "memory" 1 0
-  (and (eq_attr "cpu" "ultrasparc")
-    (eq_attr "type" "sload"))
-  3 1)
-
-(define_function_unit "memory" 1 0
-  (and (eq_attr "cpu" "ultrasparc")
-    (eq_attr "type" "store,fpstore"))
-  1 1)
-
-(define_function_unit "ieuN" 2 0
-  (and (eq_attr "cpu" "ultrasparc")
-    (eq_attr "type" "ialu,shift,compare,call,sibcall,call_no_delay_slot,uncond_branch"))
-  1 1)
-
-(define_function_unit "ieu0" 1 0
+(define_insn_reservation "us1_simple_ieu0" 1
  (and (eq_attr "cpu" "ultrasparc")
    (eq_attr "type" "shift"))
-  1 1)
+  "us1_ieu0 + us1_slot012")

-(define_function_unit "ieu0" 1 0
+(define_insn_reservation "us1_simple_ieu1" 1
+  (and (eq_attr "cpu" "ultrasparc")
+    (eq_attr "type" "compare"))
+  "us1_ieu1 + us1_slot012")
+
+(define_insn_reservation "us1_cmove" 2
  (and (eq_attr "cpu" "ultrasparc")
    (eq_attr "type" "cmove"))
-  2 1)
+  "us1_single_issue, nothing")

-(define_function_unit "ieu1" 1 0
+(define_insn_reservation "us1_imul" 1
  (and (eq_attr "cpu" "ultrasparc")
-    (eq_attr "type" "compare,call,sibcall,call_no_delay_slot,uncond_branch"))
-  1 1)
+    (eq_attr "type" "imul"))
+  "us1_single_issue")

-(define_function_unit "cti" 1 0
+(define_insn_reservation "us1_idiv" 1
+  (and (eq_attr "cpu" "ultrasparc")
+    (eq_attr "type" "idiv"))
+  "us1_single_issue")
+
+;; For loads, the "delayed return mode" behavior of the chip
+;; is represented using the us1_load_writeback resource.
+(define_insn_reservation "us1_load" 2
+  (and (eq_attr "cpu" "ultrasparc")
+    (eq_attr "type" "load,fpload"))
+  "us1_lsu + us1_slot012, us1_load_writeback")
+
+(define_insn_reservation "us1_load_signed" 3
+  (and (eq_attr "cpu" "ultrasparc")
+    (eq_attr "type" "sload"))
+  "us1_lsu + us1_slot012, nothing, us1_load_writeback")
+
+(define_insn_reservation "us1_store" 1
+  (and (eq_attr "cpu" "ultrasparc")
+    (eq_attr "type" "store,fpstore"))
+  "us1_lsu + us1_slot012")
+
+(define_insn_reservation "us1_branch" 1
  (and (eq_attr "cpu" "ultrasparc")
    (eq_attr "type" "branch"))
-  1 1)
+  "us1_cti + us1_slotany")

-;; Timings; throughput/latency
-;; FMOV     1/1    fmov, fabs, fneg
-;; FMOVcc   1/2
-;; FADD     1/3    add/sub, format conv, compar
-;; FMUL     1/3
-;; FDIVs    12/12
-;; FDIVd    22/22
-;; FSQRTs   12/12
-;; FSQRTd   22/22
-;; FCMP takes 1 cycle to branch, 2 cycles to conditional move.
-;;
-;; FDIV{s,d}/FSQRT{s,d} are given their own unit since they only
-;; use the FPM multiplier for final rounding 3 cycles before the
-;; end of their latency and we have no real way to model that.
-;;
-;; ??? This is really bogus because the timings really depend upon
-;; who uses the result.  We should record who the user is with
-;; more descriptive 'type' attribute names and account for these
-;; issues in ultrasparc_adjust_cost. 
-
-(define_function_unit "fadd" 1 0
+(define_insn_reservation "us1_call_jmpl" 1
  (and (eq_attr "cpu" "ultrasparc")
+    (eq_attr "type" "call,sibcall,call_no_delay_slot,uncond_branch"))
+  "us1_cti + us1_ieu1 + us1_slot0")
+
+(define_insn_reservation "us1_fmov_single" 1
+  (and (and (eq_attr "cpu" "ultrasparc")
+            (eq_attr "type" "fpmove"))
+       (eq_attr "fptype" "single"))
+  "us1_fpa + us1_fp_single + us1_slotany")
+
+(define_insn_reservation "us1_fmov_double" 1
+  (and (and (eq_attr "cpu" "ultrasparc")
+            (eq_attr "type" "fpmove"))
+       (eq_attr "fptype" "double"))
+  "us1_fpa + us1_fp_double + us1_slotany")
+
+(define_insn_reservation "us1_fcmov_single" 2
+  (and (and (eq_attr "cpu" "ultrasparc")
+            (eq_attr "type" "fpcmove,fpcrmove"))
+       (eq_attr "fptype" "single"))
+  "us1_fpa + us1_fp_single + us1_slotany, nothing")
+
+(define_insn_reservation "us1_fcmov_double" 2
+  (and (and (eq_attr "cpu" "ultrasparc")
+            (eq_attr "type" "fpcmove,fpcrmove"))
+       (eq_attr "fptype" "double"))
+  "us1_fpa + us1_fp_double + us1_slotany, nothing")
+
+(define_insn_reservation "us1_faddsub_single" 4
+  (and (and (eq_attr "cpu" "ultrasparc")
+            (eq_attr "type" "fp"))
+       (eq_attr "fptype" "single"))
+  "us1_fpa + us1_fp_single + us1_slotany, nothing*3")
+
+(define_insn_reservation "us1_faddsub_double" 4
+  (and (and (eq_attr "cpu" "ultrasparc")
+            (eq_attr "type" "fp"))
+       (eq_attr "fptype" "double"))
+  "us1_fpa + us1_fp_double + us1_slotany, nothing*3")
+
+(define_insn_reservation "us1_fpcmp_single" 1
+  (and (and (eq_attr "cpu" "ultrasparc")
+            (eq_attr "type" "fpcmp"))
+       (eq_attr "fptype" "single"))
+  "us1_fpa + us1_fp_single + us1_slotany")
+
+(define_insn_reservation "us1_fpcmp_double" 1
+  (and (and (eq_attr "cpu" "ultrasparc")
+            (eq_attr "type" "fpcmp"))
+       (eq_attr "fptype" "double"))
+  "us1_fpa + us1_fp_double + us1_slotany")
+
+(define_insn_reservation "us1_fmult_single" 4
+  (and (and (eq_attr "cpu" "ultrasparc")
+            (eq_attr "type" "fpmul"))
+       (eq_attr "fptype" "single"))
+  "us1_fpm + us1_fp_single + us1_slotany, nothing*3")
+
+(define_insn_reservation "us1_fmult_double" 4
+  (and (and (eq_attr "cpu" "ultrasparc")
+            (eq_attr "type" "fpmul"))
+       (eq_attr "fptype" "double"))
+  "us1_fpm + us1_fp_double + us1_slotany, nothing*3")
+
+;; This is actually in theory dangerous, because it is possible
+;; for the chip to prematurely dispatch the dependant instruction
+;; in the G stage, resulting in a 9 cycle stall.  However I have never
+;; been able to trigger this case myself even with hand written code,
+;; so it must require some rare complicated pipeline state.
+(define_bypass 3
+   "us1_faddsub_single,us1_faddsub_double,us1_fmult_single,us1_fmult_double"
+   "us1_faddsub_single,us1_faddsub_double,us1_fmult_single,us1_fmult_double")
+
+;; Floating point divide and square root use the multiplier unit
+;; for final rounding 3 cycles before the divide/sqrt is complete.
+
+(define_insn_reservation "us1_fdivs"
+  13
+  (and (eq_attr "cpu" "ultrasparc")
+    (eq_attr "type" "fpdivs,fpsqrts"))
+  "(us1_fpm + us1_fdivider + us1_slot0), us1_fdivider*8, (us1_fpm + us1_fdivider), us1_fdivider*2"
+  )
+
+(define_bypass
+  12
+  "us1_fdivs"
+  "us1_faddsub_single,us1_faddsub_double,us1_fmult_single,us1_fmult_double")
+
+(define_insn_reservation "us1_fdivd"
+  23
+  (and (eq_attr "cpu" "ultrasparc")
+    (eq_attr "type" "fpdivd,fpsqrtd"))
+  "(us1_fpm + us1_fdivider + us1_slot0), us1_fdivider*18, (us1_fpm + us1_fdivider), us1_fdivider*2"
+  )
+(define_bypass
+  22
+  "us1_fdivd"
+  "us1_faddsub_single,us1_faddsub_double,us1_fmult_single,us1_fmult_double")
+
+;; Any store may multi issue with the insn creating the source
+;; data as long as that creating insn is not an FPU div/sqrt.
+;; We need a special guard function because this bypass does
+;; not apply to the address inputs of the store.
+(define_bypass 0 "us1_simple_ieuN,us1_simple_ieu1,us1_simple_ieu0,us1_faddsub_single,us1_faddsub_double,us1_fmov_single,us1_fmov_double,us1_fcmov_single,us1_fcmov_double,us1_fmult_single,us1_fmult_double" "us1_store"
+   "ultrasparc_store_bypass_p")
+
+;; An integer branch may execute in the same cycle as the compare
+;; creating the condition codes.
+(define_bypass 0 "us1_simple_ieu1" "us1_branch")
+
+;; UltraSPARC-III scheduling
+;;
+;; A much simpler beast, no silly slotting rules and both
+;; integer units are fully symmetric.  It does still have
+;; single-issue instructions though.
+
+(define_cpu_unit "us3_a0,us3_a1,us3_ms,us3_br,us3_fpm" "ultrasparc3_0")
+(define_cpu_unit "us3_slot0,us3_slot1,us3_slot2,us3_slot3,us3_fpa" "ultrasparc3_1")
+(define_cpu_unit "us3_load_writeback" "ultrasparc3_1")
+
+(define_reservation "us3_slotany" "(us3_slot0 | us3_slot1 | us3_slot2 | us3_slot3)")
+(define_reservation "us3_single_issue" "us3_slot0 + us3_slot1 + us3_slot2 + us3_slot3")
+(define_reservation "us3_ax" "(us3_a0 | us3_a1)")
+
+(define_insn_reservation "us3_integer" 1
+  (and (eq_attr "cpu" "ultrasparc3")
+    (eq_attr "type" "ialu,shift,compare"))
+  "us3_ax + us3_slotany")
+
+(define_insn_reservation "us3_cmove" 2
+  (and (eq_attr "cpu" "ultrasparc3")
+    (eq_attr "type" "cmove"))
+  "us3_ms + us3_br + us3_slotany, nothing")
+
+;; ??? Not entirely accurate.
+;; ??? It can run from 6 to 9 cycles.  The first cycle the MS pipe
+;; ??? is needed, and the instruction group is broken right after
+;; ??? the imul.  Then 'helper' instructions are generated to perform
+;; ??? each further stage of the multiplication, each such 'helper' is
+;; ??? single group.  So, the reservation aspect is represented accurately
+;; ??? here, but the variable cycles are not.
+;; ??? Currently I have no idea how to determine the variability, but once
+;; ??? known we can simply add a define_bypass or similar to model it.
+(define_insn_reservation "us3_imul" 6
+  (and (eq_attr "cpu" "ultrasparc3")
+    (eq_attr "type" "imul"))
+  "us3_ms + us3_slotany, us3_single_issue*5")
+
+(define_insn_reservation "us3_idiv" 71
+  (and (eq_attr "cpu" "ultrasparc3")
+    (eq_attr "type" "idiv"))
+  "us3_ms + us3_slotany, us3_single_issue*70")
+
+;; UltraSPARC-III has a similar load delay as UltraSPARC-I/II except
+;; that all loads except 32-bit/64-bit unsigned loads take the extra
+;; delay for sign/zero extension.
+(define_insn_reservation "us3_2cycle_load" 2
+  (and (eq_attr "cpu" "ultrasparc3")
+    (and (eq_attr "type" "load,fpload")
+      (eq_attr "us3load_type" "2cycle")))
+  "us3_ms + us3_slotany, us3_load_writeback")
+
+(define_insn_reservation "us3_load_delayed" 3
+  (and (eq_attr "cpu" "ultrasparc3")
+    (and (eq_attr "type" "load,sload")
+      (eq_attr "us3load_type" "3cycle")))
+  "us3_ms + us3_slotany, nothing, us3_load_writeback")
+
+(define_insn_reservation "us3_store" 1
+  (and (eq_attr "cpu" "ultrasparc3")
+    (eq_attr "type" "store,fpstore"))
+  "us3_ms + us3_slotany")
+
+(define_insn_reservation "us3_branch" 1
+  (and (eq_attr "cpu" "ultrasparc3")
+    (eq_attr "type" "branch"))
+  "us3_br + us3_slotany")
+
+(define_insn_reservation "us3_call_jmpl" 1
+  (and (eq_attr "cpu" "ultrasparc3")
+    (eq_attr "type" "call,sibcall,call_no_delay_slot,uncond_branch"))
+  "us3_br + us3_ms + us3_slotany")
+
+(define_insn_reservation "us3_fmov" 3
+  (and (eq_attr "cpu" "ultrasparc3")
    (eq_attr "type" "fpmove"))
-  1 1)
+  "us3_fpa + us3_slotany, nothing*2")

-(define_function_unit "fadd" 1 0
-  (and (eq_attr "cpu" "ultrasparc")
+(define_insn_reservation "us3_fcmov" 3
+  (and (eq_attr "cpu" "ultrasparc3")
    (eq_attr "type" "fpcmove"))
-  2 1)
+  "us3_fpa + us3_br + us3_slotany, nothing*2")

-(define_function_unit "fadd" 1 0
-  (and (eq_attr "cpu" "ultrasparc")
+(define_insn_reservation "us3_fcrmov" 3
+  (and (eq_attr "cpu" "ultrasparc3")
+    (eq_attr "type" "fpcrmove"))
+  "us3_fpa + us3_ms + us3_slotany, nothing*2")
+
+(define_insn_reservation "us3_faddsub" 4
+  (and (eq_attr "cpu" "ultrasparc3")
    (eq_attr "type" "fp"))
-  3 1)
+  "us3_fpa + us3_slotany, nothing*3")

-(define_function_unit "fadd" 1 0
-  (and (eq_attr "cpu" "ultrasparc")
+(define_insn_reservation "us3_fpcmp" 5
+  (and (eq_attr "cpu" "ultrasparc3")
    (eq_attr "type" "fpcmp"))
-  2 1)
+  "us3_fpa + us3_slotany, nothing*4")

-(define_function_unit "fmul" 1 0
-  (and (eq_attr "cpu" "ultrasparc")
+(define_insn_reservation "us3_fmult" 4
+ (and (eq_attr "cpu" "ultrasparc3")
    (eq_attr "type" "fpmul"))
-  3 1)
+  "us3_fpm + us3_slotany, nothing*3")

-(define_function_unit "fadd" 1 0
-  (and (eq_attr "cpu" "ultrasparc")
-    (eq_attr "type" "fpcmove"))
-  2 1)
-
-(define_function_unit "fdiv" 1 0
-  (and (eq_attr "cpu" "ultrasparc")
+(define_insn_reservation "us3_fdivs" 17
+  (and (eq_attr "cpu" "ultrasparc3")
    (eq_attr "type" "fpdivs"))
-  12 12)
+  "(us3_fpm + us3_slotany), us3_fpm*14, nothing*2")

-(define_function_unit "fdiv" 1 0
-  (and (eq_attr "cpu" "ultrasparc")
-    (eq_attr "type" "fpdivd"))
-  22 22)
-
-(define_function_unit "fdiv" 1 0
-  (and (eq_attr "cpu" "ultrasparc")
+(define_insn_reservation "us3_fsqrts" 20
+  (and (eq_attr "cpu" "ultrasparc3")
    (eq_attr "type" "fpsqrts"))
-  12 12)
+  "(us3_fpm + us3_slotany), us3_fpm*17, nothing*2")

-(define_function_unit "fdiv" 1 0
-  (and (eq_attr "cpu" "ultrasparc")
+(define_insn_reservation "us3_fdivd" 20
+  (and (eq_attr "cpu" "ultrasparc3")
+    (eq_attr "type" "fpdivd"))
+  "(us3_fpm + us3_slotany), us3_fpm*17, nothing*2")
+
+(define_insn_reservation "us3_fsqrtd" 29
+  (and (eq_attr "cpu" "ultrasparc3")
    (eq_attr "type" "fpsqrtd"))
-  22 22)
+  "(us3_fpm + us3_slotany), us3_fpm*26, nothing*2")
+
+;; Any store may multi issue with the insn creating the source
+;; data as long as that creating insn is not an FPU div/sqrt.
+;; We need a special guard function because this bypass does
+;; not apply to the address inputs of the store.
+(define_bypass 0 "us3_integer,us3_faddsub,us3_fmov,us3_fcmov,us3_fmult" "us3_store"
+   "ultrasparc_store_bypass_p")
+
+;; An integer branch may execute in the same cycle as the compare
+;; creating the condition codes.
+(define_bypass 0 "us3_integer" "us3_branch")
+
+;; If FMOVfcc is user of FPCMP, latency is only 1 cycle.
+(define_bypass 1 "us3_fpcmp" "us3_fcmov")
+

 ;; Compare instructions.
 ;; This controls RTL generation and register allocation.
@ -2181,7 +2365,8 @@
   mov\\t%1, %0
   ldub\\t%1, %0
   stb\\t%r1, %0"
-  [(set_attr "type" "*,load,store")])
+  [(set_attr "type" "*,load,store")
+   (set_attr "us3load_type" "*,3cycle,*")])

 (define_expand "movhi"
  [(set (match_operand:HI 0 "general_operand" "")
@ -2255,7 +2440,8 @@
   sethi\\t%%hi(%a1), %0
   lduh\\t%1, %0
   sth\\t%r1, %0"
-  [(set_attr "type" "*,*,load,store")])
+  [(set_attr "type" "*,*,load,store")
+   (set_attr "us3load_type" "*,*,3cycle,*")])

 ;; We always work with constants here.
 (define_insn "*movhi_lo_sum"
@ -4404,7 +4590,7 @@
  "@
   fmovrs%D1\\t%2, %3, %0
   fmovrs%d1\\t%2, %4, %0"
-  [(set_attr "type" "fpcmove")])
+  [(set_attr "type" "fpcrmove")])

 (define_insn "movdf_cc_reg_sp64"
  [(set (match_operand:DF 0 "register_operand" "=e,e")
@ -4417,7 +4603,7 @@
  "@
   fmovrd%D1\\t%2, %3, %0
   fmovrd%d1\\t%2, %4, %0"
-  [(set_attr "type" "fpcmove")
+  [(set_attr "type" "fpcrmove")
   (set_attr "fptype" "double")])

 (define_insn "*movtf_cc_reg_hq_sp64"
@ -4431,7 +4617,7 @@
  "@
   fmovrq%D1\\t%2, %3, %0
   fmovrq%d1\\t%2, %4, %0"
-  [(set_attr "type" "fpcmove")])
+  [(set_attr "type" "fpcrmove")])

 (define_insn "*movtf_cc_reg_sp64"
  [(set (match_operand:TF 0 "register_operand" "=e,e")
@ -4521,7 +4707,8 @@
 	(zero_extend:SI (match_operand:HI 1 "memory_operand" "m")))]
  ""
  "lduh\\t%1, %0"
-  [(set_attr "type" "load")])
+  [(set_attr "type" "load")
+   (set_attr "us3load_type" "3cycle")])

 (define_expand "zero_extendqihi2"
  [(set (match_operand:HI 0 "register_operand" "")
@ -4536,7 +4723,8 @@
  "@
   and\\t%1, 0xff, %0
   ldub\\t%1, %0"
-  [(set_attr "type" "*,load")])
+  [(set_attr "type" "*,load")
+   (set_attr "us3load_type" "*,3cycle")])

 (define_expand "zero_extendqisi2"
  [(set (match_operand:SI 0 "register_operand" "")
@ -4551,7 +4739,8 @@
  "@
   and\\t%1, 0xff, %0
   ldub\\t%1, %0"
-  [(set_attr "type" "*,load")])
+  [(set_attr "type" "*,load")
+   (set_attr "us3load_type" "*,3cycle")])

 (define_expand "zero_extendqidi2"
  [(set (match_operand:DI 0 "register_operand" "")
@ -4566,7 +4755,8 @@
  "@
   and\\t%1, 0xff, %0
   ldub\\t%1, %0"
-  [(set_attr "type" "*,load")])
+  [(set_attr "type" "*,load")
+   (set_attr "us3load_type" "*,3cycle")])

 (define_expand "zero_extendhidi2"
  [(set (match_operand:DI 0 "register_operand" "")
@ -4597,7 +4787,8 @@
 	(zero_extend:DI (match_operand:HI 1 "memory_operand" "m")))]
  "TARGET_ARCH64"
  "lduh\\t%1, %0"
-  [(set_attr "type" "load")])
+  [(set_attr "type" "load")
+   (set_attr "us3load_type" "3cycle")])


 ;; ??? Write truncdisi pattern using sra?
@ -4803,7 +4994,8 @@
 	(sign_extend:SI (match_operand:HI 1 "memory_operand" "m")))]
  ""
  "ldsh\\t%1, %0"
-  [(set_attr "type" "sload")])
+  [(set_attr "type" "sload")
+   (set_attr "us3load_type" "3cycle")])

 (define_expand "extendqihi2"
  [(set (match_operand:HI 0 "register_operand" "")
@ -4843,7 +5035,8 @@
 	(sign_extend:HI (match_operand:QI 1 "memory_operand" "m")))]
  ""
  "ldsb\\t%1, %0"
-  [(set_attr "type" "sload")])
+  [(set_attr "type" "sload")
+   (set_attr "us3load_type" "3cycle")])

 (define_expand "extendqisi2"
  [(set (match_operand:SI 0 "register_operand" "")
@ -4874,7 +5067,8 @@
 	(sign_extend:SI (match_operand:QI 1 "memory_operand" "m")))]
  ""
  "ldsb\\t%1, %0"
-  [(set_attr "type" "sload")])
+  [(set_attr "type" "sload")
+   (set_attr "us3load_type" "3cycle")])

 (define_expand "extendqidi2"
  [(set (match_operand:DI 0 "register_operand" "")
@ -4905,7 +5099,8 @@
 	(sign_extend:DI (match_operand:QI 1 "memory_operand" "m")))]
  "TARGET_ARCH64"
  "ldsb\\t%1, %0"
-  [(set_attr "type" "sload")])
+  [(set_attr "type" "sload")
+   (set_attr "us3load_type" "3cycle")])

 (define_expand "extendhidi2"
  [(set (match_operand:DI 0 "register_operand" "")
@ -4936,7 +5131,8 @@
 	(sign_extend:DI (match_operand:HI 1 "memory_operand" "m")))]
  "TARGET_ARCH64"
  "ldsh\\t%1, %0"
-  [(set_attr "type" "sload")])
+  [(set_attr "type" "sload")
+   (set_attr "us3load_type" "3cycle")])

 (define_expand "extendsidi2"
  [(set (match_operand:DI 0 "register_operand" "")
@ -4951,7 +5147,8 @@
  "@
  sra\\t%1, 0, %0
  ldsw\\t%1, %0"
-  [(set_attr "type" "shift,sload")])
+  [(set_attr "type" "shift,sload")
+   (set_attr "us3load_type" "*,3cycle")])

 ;; Special pattern for optimizing bit-field compares.  This is needed
 ;; because combine uses this as a canonical form.
@ -9507,3 +9704,9 @@
  "TARGET_V9"
  "t%C0\\t%%xcc, %1"
  [(set_attr "type" "misc")])
+
+(define_insn "cycle_display"
+  [(unspec [(match_operand 0 "const_int_operand" "")] 20)]
+  ""
+  "! cycle %0"
+  [(set_attr "length" "0")])
--- a/gcc/doc/contrib.texi
+++ b/gcc/doc/contrib.texi
@ -330,9 +330,10 @@ Andrew MacLeod for his ongoing work in building a real EH system,
 various code generation improvements, work on the global optimizer, etc.

@item
-Vladimir Makarov for hacking some ugly i960 problems, PowerPC
-hacking improvements to compile-time performance and overall knowledge
-and direction in the area of instruction scheduling.
+Vladimir Makarov for hacking some ugly i960 problems, PowerPC hacking
+improvements to compile-time performance, overall knowledge and
+direction in the area of instruction scheduling, and design and
+implementation of the automaton based instruction scheduler.

@item
 Bob Manson for his behind the scenes work on dejagnu.
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@ -3871,13 +3871,14 @@ in the compiler.
@cindex instruction splitting
@cindex splitting instructions

-There are two cases where you should specify how to split a pattern into
-multiple insns.  On machines that have instructions requiring delay
-slots (@pxref{Delay Slots}) or that have instructions whose output is
-not available for multiple cycles (@pxref{Function Units}), the compiler
-phases that optimize these cases need to be able to move insns into
-one-instruction delay slots.  However, some insns may generate more than one
-machine instruction.  These insns cannot be placed into a delay slot.
+There are two cases where you should specify how to split a pattern
+into multiple insns.  On machines that have instructions requiring
+delay slots (@pxref{Delay Slots}) or that have instructions whose
+output is not available for multiple cycles (@pxref{Processor pipeline
+description}), the compiler phases that optimize these cases need to
+be able to move insns into one-instruction delay slots.  However, some
+insns may generate more than one machine instruction.  These insns
+cannot be placed into a delay slot.

 Often you can rewrite the single insn as a list of individual insns,
 each corresponding to one machine instruction.  The disadvantage of
@ -4497,7 +4498,7 @@ to track the condition codes.
 * Insn Lengths::        Computing the length of insns.
 * Constant Attributes:: Defining attributes that are constant.
 * Delay Slots::         Defining delay slots required for a machine.
-* Function Units::      Specifying information for insn scheduling.
+* Processor pipeline description:: Specifying information for insn scheduling.
@end menu

@node Defining Attributes
@ -5127,14 +5128,101 @@ branch is true, we might represent this as follows:
@end smallexample
@c the above is *still* too long.  --mew 4feb93

-@node Function Units
-@subsection Specifying Function Units
+@node Processor pipeline description
+@subsection Specifying processor pipeline description
+@cindex processor pipeline description
+@cindex processor functional units
+@cindex instruction latency time
+@cindex interlock delays
+@cindex data dependence delays
+@cindex reservation delays
+@cindex pipeline hazard recognizer
+@cindex automaton based pipeline description
+@cindex regular expressions
+@cindex deterministic finite state automaton
+@cindex automaton based scheduler
+@cindex RISC
+@cindex VLIW
+
+To achieve better productivity most modern processors
+(super-pipelined, superscalar @acronym{RISC}, and @acronym{VLIW}
+processors) have many @dfn{functional units} on which several
+instructions can be executed simultaneously.  An instruction starts
+execution if its issue conditions are satisfied.  If not, the
+instruction is interlocked until its conditions are satisfied.  Such
+@dfn{interlock (pipeline) delay} causes interruption of the fetching
+of successor instructions (or demands nop instructions, e.g. for some
+MIPS processors).
+
+There are two major kinds of interlock delays in modern processors.
+The first one is a data dependence delay determining @dfn{instruction
+latency time}.  The instruction execution is not started until all
+source data have been evaluated by prior instructions (there are more
+complex cases when the instruction execution starts even when the data
+are not availaible but will be ready in given time after the
+instruction execution start).  Taking the data dependence delays into
+account is simple.  The data dependence (true, output, and
+anti-dependence) delay between two instructions is given by a
+constant.  In most cases this approach is adequate.  The second kind
+of interlock delays is a reservation delay.  The reservation delay
+means that two instructions under execution will be in need of shared
+processors resources, i.e. buses, internal registers, and/or
+functional units, which are reserved for some time.  Taking this kind
+of delay into account is complex especially for modern @acronym{RISC}
+processors.
+
+The task of exploiting more processor parallelism is solved by an
+instruction scheduler.  For better solution of this problem, the
+instruction scheduler has to have an adequate description of the
+processor parallelism (or @dfn{pipeline description}).  Currently GCC
+has two ways to describe processor parallelism.  The first one is old
+and originated from instruction scheduler written by Michael Tiemann
+and described in the first subsequent section.  The second one was
+created later.  It is based on description of functional unit
+reservations by processor instructions with the aid of @dfn{regular
+expressions}.  This is so called @dfn{automaton based description}.
+
+Gcc instruction scheduler uses a @dfn{pipeline hazard recognizer} to
+figure out the possibility of the instruction issue by the processor
+on given simulated processor cycle.  The pipeline hazard recognizer is
+a code generated from the processor pipeline description.  The
+pipeline hazard recognizer generated from the automaton based
+description is more sophisticated and based on deterministic finite
+state automaton (@acronym{DFA}) and therefore faster than one
+generated from the old description.  Also its speed is not depended on
+processor complexity.  The instruction issue is possible if there is
+a transition from one automaton state to another one.
+
+You can use any model to describe processor pipeline characteristics
+or even a mix of them.  You could use the old description for some
+processor submodels and the @acronym{DFA}-based one for the rest
+processor submodels.
+
+In general, the usage of the automaton based description is more
+preferable.  Its model is more rich.  It permits to describe more
+accurately pipeline characteristics of processors which results in
+improving code quality (although sometimes only on several percent
+fractions).  It will be also used as an infrastructure to implement
+sophisticated and practical insn scheduling which will try many
+instruction sequences to choose the best one.
+
+
+@menu
+* Old pipeline description:: Specifying information for insn scheduling.
+* Automaton pipeline description:: Describing insn pipeline characteristics.
+* Comparison of the two descriptions:: Drawbacks of the old pipeline description
+@end menu
+
+@node Old pipeline description
+@subsubsection Specifying Function Units
+@cindex old pipeline description
@cindex function units, for scheduling

-On most RISC machines, there are instructions whose results are not
-available for a specific number of cycles.  Common cases are instructions
-that load data from memory.  On many machines, a pipeline stall will result
-if the data is referenced too soon after the load instruction.
+On most @acronym{RISC} machines, there are instructions whose results
+are not available for a specific number of cycles.  Common cases are
+instructions that load data from memory.  On many machines, a pipeline
+stall will result if the data is referenced too soon after the load
+instruction.

 In addition, many newer microprocessors have multiple function units, usually
 one for integer and one for floating point, and often will incur pipeline
@ -5148,13 +5236,14 @@ due to function unit conflicts.

 For the purposes of the specifications in this section, a machine is
 divided into @dfn{function units}, each of which execute a specific
-class of instructions in first-in-first-out order.  Function units that
-accept one instruction each cycle and allow a result to be used in the
-succeeding instruction (usually via forwarding) need not be specified.
-Classic RISC microprocessors will normally have a single function unit,
-which we can call @samp{memory}.  The newer ``superscalar'' processors
-will often have function units for floating point operations, usually at
-least a floating point adder and multiplier.
+class of instructions in first-in-first-out order.  Function units
+that accept one instruction each cycle and allow a result to be used
+in the succeeding instruction (usually via forwarding) need not be
+specified.  Classic @acronym{RISC} microprocessors will normally have
+a single function unit, which we can call @samp{memory}.  The newer
+``superscalar'' processors will often have function units for floating
+point operations, usually at least a floating point adder and
+multiplier.

@findex define_function_unit
 Each usage of a function units by a class of insns is specified with a
@ -5217,10 +5306,10 @@ Typical uses of this vector are where a floating point function unit can
 pipeline either single- or double-precision operations, but not both, or
 where a memory unit can pipeline loads, but not stores, etc.

-As an example, consider a classic RISC machine where the result of a
-load instruction is not available for two cycles (a single ``delay''
-instruction is required) and where only one load instruction can be executed
-simultaneously.  This would be specified as:
+As an example, consider a classic @acronym{RISC} machine where the
+result of a load instruction is not available for two cycles (a single
+``delay'' instruction is required) and where only one load instruction
+can be executed simultaneously.  This would be specified as:

@smallexample
 (define_function_unit "memory" 1 1 (eq_attr "type" "load") 2 0)
@ -5246,6 +5335,395 @@ used during their execution and there is no way of representing that
 conflict.  We welcome any examples of how function unit conflicts work
 in such processors and suggestions for their representation.

+@node Automaton pipeline description
+@subsubsection Describing instruction pipeline characteristics
+@cindex automaton based pipeline description
+
+This section describes constructions of the automaton based processor
+pipeline description.  The order of all mentioned below constructions
+in the machine description file is not important.
+
+@findex define_automaton
+@cindex pipeline hazard recognizer
+The following optional construction describes names of automata
+generated and used for the pipeline hazards recognition.  Sometimes
+the generated finite state automaton used by the pipeline hazard
+recognizer is large.  If we use more one automaton and bind functional
+units to the automata, the summary size of the automata usually is
+less than the size of the single automaton.  If there is no one such
+construction, only one finite state automaton is generated.
+
+@smallexample
+(define_automaton @var{automata-names})
+@end smallexample
+
+@var{automata-names} is a string giving names of the automata.  The
+names are separated by commas.  All the automata should have unique names.
+The automaton name is used in construction @code{define_cpu_unit} and
+@code{define_query_cpu_unit}.
+
+@findex define_cpu_unit
+@cindex processor functional units
+Each processor functional unit used in description of instruction
+reservations should be described by the following construction.
+
+@smallexample
+(define_cpu_unit @var{unit-names} [@var{automaton-name}])
+@end smallexample
+
+@var{unit-names} is a string giving the names of the functional units
+separated by commas.  Don't use name @samp{nothing}, it is reserved
+for other goals.
+
+@var{automaton-name} is a string giving the name of automaton with
+which the unit is bound.  The automaton should be described in
+construction @code{define_automaton}.  You should give
+@dfn{automaton-name}, if there is a defined automaton.
+
+@findex define_query_cpu_unit
+@cindex querying function unit reservations
+The following construction describes CPU functional units analogously
+to @code{define_cpu_unit}.  If we use automata without their
+minimization, the reservation of such units can be queried for an
+automaton state.  The instruction scheduler never queries reservation
+of functional units for given automaton state.  So as a rule, you
+don't need this construction.  This construction could be used for
+future code generation goals (e.g. to generate @acronym{VLIW} insn
+templates).
+
+@smallexample
+(define_query_cpu_unit @var{unit-names} [@var{automaton-name}])
+@end smallexample
+
+@var{unit-names} is a string giving names of the functional units
+separated by commas.
+
+@var{automaton-name} is a string giving name of the automaton with
+which the unit is bound.
+
+@findex define_insn_reservation
+@cindex instruction latency time
+@cindex regular expressions
+@cindex data bypass
+The following construction is major one to describe pipeline
+characteristics of an instruction.
+
+@smallexample
+(define_insn_reservation @var{insn-name} @var{default_latency}
+                         @var{condition} @var{regexp})
+@end smallexample
+
+@var{default_latency} is a number giving latency time of the
+instruction.  There is an important difference between the old
+description and the automaton based pipeline description.  The latency
+time is used for all dependencies when we use the old description.  In
+the automaton based pipeline description, given latency time is used
+only for true dependencies.  The cost of anti-dependencies is always
+zero and the cost of output dependencies is the difference between
+latency times of the producing and consuming insns (if the difference
+is negative, the cost is considered to be zero).  You always can
+change the default costs for any description by using target hook
+@code{TARGET_SCHED_ADJUST_COST} (@pxref{Scheduling}).
+
+@var{insn-names} is a string giving internal name of the insn.  The
+internal names are used in constructions @code{define_bypass} and in
+the automaton description file generated for debugging.  The internal
+name has nothing common with the names in @code{define_insn}.  It is a
+good practice to use insn classes described in the processor manual.
+
+@var{condition} defines what RTL insns are described by this
+construction.  You should remember that you will be in trouble if
+@var{condition} for two or more different
+@code{define_insn_reservation} constructions is TRUE for an insn.  In
+this case what reservation will be used for the insn is not defined.
+Such cases are not checked during generation of the pipeline hazards
+recognizer because in general recognizing that two conditions may have
+the same value is quite difficult (especially if the conditions
+contain @code{symbol_ref}).  It is also not checked during the
+pipeline hazard recognizer work because it would slow down the
+recognizer considerably.
+
+@var{regexp} is a string describing reservation of the cpu functional
+units by the instruction.  The reservations are described by a regular
+expression according to the following syntax:
+
+@smallexample
+       regexp = regexp "," oneof
+              | oneof
+
+       oneof = oneof "|" allof
+             | allof
+
+       allof = allof "+" repeat
+             | repeat
+ 
+       repeat = element "*" number
+              | element
+
+       element = cpu_function_unit_name
+               | reservation_name
+               | result_name
+               | "nothing"
+               | "(" regexp ")"
+@end smallexample
+
+@itemize @bullet
+@item
+@samp{,} is used for describing the start of the next cycle in
+the reservation.
+
+@item
+@samp{|} is used for describing a reservation described by the first
+regular expression @strong{or} a reservation described by the second
+regular expression @strong{or} etc.
+
+@item
+@samp{+} is used for describing a reservation described by the first
+regular expression @strong{and} a reservation described by the
+second regular expression @strong{and} etc.
+
+@item
+@samp{*} is used for convenience and simply means a sequence in which
+the regular expression are repeated @var{number} times with cycle
+advancing (see @samp{,}).
+
+@item
+@samp{cpu_function_unit_name} denotes reservation of the named
+functional unit.
+
+@item
+@samp{reservation_name} --- see description of construction
+@samp{define_reservation}.
+
+@item
+@samp{nothing} denotes no unit reservations.
+@end itemize
+
+@findex define_reservation
+Sometimes unit reservations for different insns contain common parts.
+In such case, you can simplify the pipeline description by describing
+the common part by the following construction
+
+@smallexample
+(define_reservation @var{reservation-name} @var{regexp})
+@end smallexample
+
+@var{reservation-name} is a string giving name of @var{regexp}.
+Functional unit names and reservation names are in the same name
+space.  So the reservation names should be different from the
+functional unit names and can not be reserved name @samp{nothing}.
+
+@findex define_bypass
+@cindex instruction latency time
+@cindex data bypass
+The following construction is used to describe exceptions in the
+latency time for given instruction pair.  This is so called bypasses.
+
+@smallexample
+(define_bypass @var{number} @var{out_insn_names} @var{in_insn_names}
+               [@var{guard}])
+@end smallexample
+
+@var{number} defines when the result generated by the instructions
+given in string @var{out_insn_names} will be ready for the
+instructions given in string @var{in_insn_names}.  The instructions in
+the string are separated by commas.
+
+@var{guard} is an optional string giving name of a C function which
+defines an additional guard for the bypass.  The function will get the
+two insns as parameters.  If the function returns zero the bypass will
+be ignored for this case.  The additional guard is necessary to
+recognize complicated bypasses, e.g. when consumer is only an address
+of insn @samp{store} (not a stored value).
+
+@findex exclusion_set
+@findex presence_set
+@findex absence_set
+@cindex VLIW
+@cindex RISC
+Usually the following three constructions are used to describe
+@acronym{VLIW} processors (more correctly to describe a placement of
+small insns into @acronym{VLIW} insn slots).  Although they can be
+used for @acronym{RISC} processors too.
+
+@smallexample
+(exclusion_set @var{unit-names} @var{unit-names})
+(presence_set @var{unit-names} @var{unit-names})
+(absence_set @var{unit-names} @var{unit-names})
+@end smallexample
+
+@var{unit-names} is a string giving names of functional units
+separated by commas.
+
+The first construction (@samp{exclusion_set}) means that each
+functional unit in the first string can not be reserved simultaneously
+with a unit whose name is in the second string and vice versa.  For
+example, the construction is useful for describing processors
+(e.g. some SPARC processors) with a fully pipelined floating point
+functional unit which can execute simultaneously only single floating
+point insns or only double floating point insns.
+
+The second construction (@samp{presence_set}) means that each
+functional unit in the first string can not be reserved unless at
+least one of units whose names are in the second string is reserved.
+This is an asymmetric relation.  For example, it is useful for
+description that @acronym{VLIW} @samp{slot1} is reserved after
+@samp{slot0} reservation.
+
+The third construction (@samp{absence_set}) means that each functional
+unit in the first string can be reserved only if each unit whose name
+is in the second string is not reserved.  This is an asymmetric
+relation (actually @samp{exclusion_set} is analogous to this one but
+it is symmetric).  For example, it is useful for description that
+@acronym{VLIW} @samp{slot0} can not be reserved after @samp{slot1} or
+@samp{slot2} reservation.
+
+All functional units mentioned in a set should belong the same
+automaton.
+
+@findex automata_option
+@cindex deterministic finite state automaton
+@cindex nondeterministic finite state automaton
+@cindex finite state automaton minimization
+You can control the generator of the pipeline hazard recognizer with
+the following construction.
+
+@smallexample
+(automata_option @var{options})
+@end smallexample
+
+@var{options} is a string giving options which affect the generated
+code.  Currently there are the following options:
+
+@itemize @bullet
+@item
+@dfn{no-minimization} makes no minimization of the automaton.  This is
+only worth to do when we are going to query CPU functional unit
+reservations in an automaton state.
+
+@item
+@dfn{w} means a generation of the file describing the result
+automaton.  The file can be used to verify the description.
+
+@item
+@dfn{ndfa} makes nondeterministic finite state automata.  This affects
+the treatment of operator @samp{|} in the regular expressions.  The
+usual treatment of the operator is to try the first alternative and,
+if the reservation is not possible, the second alternative.  The
+nondeterministic treatment means trying all alternatives, some of them
+may be rejected by reservations in the subsequent insns.  You can not
+query functional unit reservations in nondeterministic automaton
+states.
+@end itemize
+
+As an example, consider a superscalar @acronym{RISC} machine which can
+issue three insns (two integer insns and one floating point insn) on
+the cycle but can finish only two insns.  To describe this, we define
+the following functional units.
+
+@smallexample
+(define_cpu_unit "i0_pipeline, i1_pipeline, f_pipeline")
+(define_cpu_unit "port_0, port1")
+@end smallexample
+
+All simple integer insns can be executed in any integer pipeline and
+their result is ready in two cycles.  The simple integer insns are
+issued into the first pipeline unless it is reserved, otherwise they
+are issued into the second pipeline.  Integer division and
+multiplication insns can be executed only in the second integer
+pipeline and their results are ready correspondingly in 8 and 4
+cycles.  The integer division is not pipelined, i.e. the subsequent
+integer division insn can not be issued until the current division
+insn finished.  Floating point insns are fully pipelined and their
+results are ready in 3 cycles.  There is also additional one cycle
+delay in the usage by integer insns of result produced by floating
+point insns.  To describe all of this we could specify
+
+@smallexample
+(define_cpu_unit "div")
+
+(define_insn_reservation "simple" 2 (eq_attr "cpu" "int")
+                         "(i0_pipeline | i1_pipeline), (port_0 | port1)")
+
+(define_insn_reservation "mult" 4 (eq_attr "cpu" "mult")
+                         "i1_pipeline, nothing*2, (port_0 | port1)")
+
+(define_insn_reservation "div" 8 (eq_attr "cpu" "div")
+                         "i1_pipeline, div*7, div + (port_0 | port1)")
+
+(define_insn_reservation "float" 3 (eq_attr "cpu" "float")
+                         "f_pipeline, nothing, (port_0 | port1))
+
+(define_bypass 4 "float" "simple,mut,div")
+@end smallexample
+
+To simplify the description we could describe the following reservation
+
+@smallexample
+(define_reservation "finish" "port0|port1")
+@end smallexample
+
+and use it in all @code{define_insn_reservation} as in the following
+construction
+
+@smallexample
+(define_insn_reservation "simple" 2 (eq_attr "cpu" "int")
+                         "(i0_pipeline | i1_pipeline), finish")
+@end smallexample
+
+
+@node Comparison of the two descriptions
+@subsubsection Drawbacks of the old pipeline description
+@cindex old pipeline description
+@cindex automaton based pipeline description
+@cindex processor functional units
+@cindex interlock delays
+@cindex instruction latency time
+@cindex pipeline hazard recognizer
+@cindex data bypass
+
+The old instruction level parallelism description and the pipeline
+hazards recognizer based on it have the following drawbacks in
+comparison with the @acronym{DFA}-based ones:
+  
+@itemize @bullet
+@item
+Each functional unit is believed to be reserved at the instruction
+execution start.  This is a very inaccurate model for modern
+processors.
+
+@item
+An inadequate description of instruction latency times.  The latency
+time is bound with a functional unit reserved by an instruction not
+with the instruction itself.  In other words, the description is
+oriented to describe at most one unit reservation by each instruction.
+It also does not permit to describe special bypasses between
+instruction pairs.
+
+@item
+The implementation of the pipeline hazard recognizer interface has
+constraints on number of functional units.  This is a number of bits
+in integer on the host machine.
+
+@item
+The interface to the pipeline hazard recognizer is more complex than
+one to the automaton based pipeline recognizer.
+
+@item
+An unnatural description when you write an unit and a condition which
+selects instructions using the unit.  Writing all unit reservations
+for an instruction (an instruction class) is more natural.
+
+@item
+The recognition of the interlock delays has slow implementation.  GCC
+scheduler supports structures which describe the unit reservations.
+The more processor has functional units, the slower pipeline hazard
+recognizer.  Such implementation would become slower when we enable to
+reserve functional units not only at the instruction execution start.
+The automaton based pipeline hazard recognizer speed is not depended
+on processor complexity.
+@end itemize
+
@node Conditional Execution
@section Conditional Execution
@cindex conditional execution
--- a/gcc/doc/passes.texi
+++ b/gcc/doc/passes.texi
@ -654,6 +654,8 @@ Several passes use instruction attributes.  A definition of the
 attributes defined for a particular machine is in file
@file{insn-attr.h}, which is generated from the machine description by
 the program @file{genattr}.  The file @file{insn-attrtab.c} contains
-subroutines to obtain the attribute values for insns.  It is generated
-from the machine description by the program @file{genattrtab}.
+subroutines to obtain the attribute values for insns and information
+about processor pipeline characteristics for the instruction
+scheduler.  It is generated from the machine description by the
+program @file{genattrtab}.
@end itemize
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@ -5401,11 +5401,19 @@ hooks for this purpose.  It is usually enough to define just a few of
 them: try the first ones in this list first.

@deftypefn {Target Hook} int TARGET_SCHED_ISSUE_RATE (void)
-This hook returns the maximum number of instructions that can ever issue
-at the same time on the target machine.  The default is one.  This value
-must be constant over the entire compilation.  If you need it to vary
-depending on what the instructions are, you must use
+This hook returns the maximum number of instructions that can ever
+issue at the same time on the target machine.  The default is one.
+Although the insn scheduler can define itself the possibility of issue
+an insn on the same cycle, the value can serve as an additional
+constraint to issue insns on the same simulated processor cycle (see
+hooks @samp{TARGET_SCHED_REORDER} and @samp{TARGET_SCHED_REORDER2}).
+This value must be constant over the entire compilation.  If you need
+it to vary depending on what the instructions are, you must use
@samp{TARGET_SCHED_VARIABLE_ISSUE}.
+
+You could use the value of macro @samp{MAX_DFA_ISSUE_RATE} to return
+the value of the hook @samp{TARGET_SCHED_ISSUE_RATE} for the automaton
+based pipeline interface.
@end deftypefn

@deftypefn {Target Hook} int TARGET_SCHED_VARIABLE_ISSUE (FILE *@var{file}, int @var{verbose}, rtx @var{insn}, int @var{more})
@ -5421,12 +5429,18 @@ instruction that was scheduled.
@end deftypefn

@deftypefn {Target Hook} int TARGET_SCHED_ADJUST_COST (rtx @var{insn}, rtx @var{link}, rtx @var{dep_insn}, int @var{cost})
-This function corrects the value of @var{cost} based on the relationship
-between @var{insn} and @var{dep_insn} through the dependence @var{link}.
-It should return the new value.  The default is to make no adjustment to
-@var{cost}.  This can be used for example to specify to the scheduler
+This function corrects the value of @var{cost} based on the
+relationship between @var{insn} and @var{dep_insn} through the
+dependence @var{link}.  It should return the new value.  The default
+is to make no adjustment to @var{cost}.  This can be used for example
+to specify to the scheduler using the traditional pipeline description
 that an output- or anti-dependence does not incur the same cost as a
-data-dependence.
+data-dependence.  If the scheduler using the automaton based pipeline
+description, the cost of anti-dependence is zero and the cost of
+output-dependence is maximum of one and the difference of latency
+times of the first and the second insns.  If these values are not
+acceptable, you could use the hook to modify them too.  See also
+@pxref{Automaton pipeline description}.
@end deftypefn

@deftypefn {Target Hook} int TARGET_SCHED_ADJUST_PRIORITY (rtx @var{insn}, int @var{priority})
@ -5492,6 +5506,140 @@ RTL dumps and assembly output.  Define this hook only if you need this
 level of detail about what the scheduler is doing.
@end deftypefn

+@deftypefn {Target Hook} int TARGET_SCHED_USE_DFA_PIPELINE_INTERFACE (void)
+This hook is called many times during insn scheduling.  If the hook
+returns nonzero, the automaton based pipeline description is used for
+insn scheduling.  Otherwise the traditional pipeline description is
+used.  The default is usage of the traditional pipeline description.
+
+You should also remember that to simplify the insn scheduler sources
+an empty traditional pipeline description interface is generated even
+if there is no a traditional pipeline description in the @file{.md}
+file.  The same is true for the automaton based pipeline description.
+That means that you should be accurate in defining the hook.
+@end deftypefn
+
+@deftypefn {Target Hook} int TARGET_SCHED_DFA_PRE_CYCLE_INSN (void)
+The hook returns an RTL insn.  The automaton state used in the
+pipeline hazard recognizer is changed as if the insn were scheduled
+when the new simulated processor cycle starts.  Usage of the hook may
+simplify the automaton pipeline description for some @acronym{VLIW}
+processors.  If the hook is defined, it is used only for the automaton
+based pipeline description.  The default is not to change the state
+when the new simulated processor cycle starts.
+@end deftypefn
+
+@deftypefn {Target Hook} void TARGET_SCHED_INIT_DFA_PRE_CYCLE_INSN (void)
+The hook can be used to initialize data used by the previous hook.
+@end deftypefn
+
+@deftypefn {Target Hook} int TARGET_SCHED_DFA_POST_CYCLE_INSN (void)
+The hook is analogous to @samp{TARGET_SCHED_DFA_PRE_CYCLE_INSN} but used
+to changed the state as if the insn were scheduled when the new
+simulated processor cycle finishes.
+@end deftypefn
+
+@deftypefn {Target Hook} void TARGET_SCHED_INIT_DFA_POST_CYCLE_INSN (void)
+The hook is analogous to @samp{TARGET_SCHED_INIT_DFA_PRE_CYCLE_INSN} but
+used to initialize data used by the previous hook.
+@end deftypefn
+
+@deftypefn {Target Hook} int TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD (void)
+This hook controls better choosing an insn from the ready insn queue
+for the @acronym{DFA}-based insn scheduler.  Usually the scheduler
+chooses the first insn from the queue.  If the hook returns a positive
+value, an additional scheduler code tries all permutations of
+@samp{TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD ()}
+subsequent ready insns to choose an insn whose issue will result in
+maximal number of issued insns on the same cycle.  For the
+@acronym{VLIW} processor, the code could actually solve the problem of
+packing simple insns into the @acronym{VLIW} insn.  Of course, if the
+rules of @acronym{VLIW} packing are described in the automaton.
+
+This code also could be used for superscalar @acronym{RISC}
+processors.  Let us consider a superscalar @acronym{RISC} processor
+with 3 pipelines.  Some insns can be executed in pipelines @var{A} or
+@var{B}, some insns can be executed only in pipelines @var{B} or
+@var{C}, and one insn can be executed in pipeline @var{B}.  The
+processor may issue the 1st insn into @var{A} and the 2nd one into
+@var{B}.  In this case, the 3rd insn will wait for freeing @var{B}
+until the next cycle.  If the scheduler issues the 3rd insn the first,
+the processor could issue all 3 insns per cycle.
+
+Actually this code demonstrates advantages of the automaton based
+pipeline hazard recognizer.  We try quickly and easy many insn
+schedules to choose the best one.
+
+The default is no multipass scheduling.
+@end deftypefn
+
+@deftypefn {Target Hook} void TARGET_SCHED_INIT_DFA_BUBBLES (void)
+The @acronym{DFA}-based scheduler could take the insertion of nop
+operations for better insn scheduling into account.  It can be done
+only if the multi-pass insn scheduling works (see hook
+@samp{TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD}).
+
+Let us consider a @acronym{VLIW} processor insn with 3 slots.  Each
+insn can be placed only in one of the three slots.  We have 3 ready
+insns @var{A}, @var{B}, and @var{C}.  @var{A} and @var{C} can be
+placed only in the 1st slot, @var{B} can be placed only in the 3rd
+slot.  We described the automaton which does not permit empty slot
+gaps between insns (usually such description is simpler).  Without
+this code the scheduler would place each insn in 3 separate
+@acronym{VLIW} insns.  If the scheduler places a nop insn into the 2nd
+slot, it could place the 3 insns into 2 @acronym{VLIW} insns.  What is
+the nop insn is returned by hook @samp{TARGET_SCHED_DFA_BUBBLE}.  Hook
+@samp{TARGET_SCHED_INIT_DFA_BUBBLES} can be used to initialize or
+create the nop insns.
+
+You should remember that the scheduler does not insert the nop insns.
+It is not wise because of the following optimizations.  The scheduler
+only considers such possibility to improve the result schedule.  The
+nop insns should be inserted lately, e.g. on the final phase.
+@end deftypefn
+
+@deftypefn {Target Hook} rtx TARGET_SCHED_DFA_BUBBLE (int @var{index})
+This hook @samp{FIRST_CYCLE_MULTIPASS_SCHEDULING} is used to insert
+nop operations for better insn scheduling when @acronym{DFA}-based
+scheduler makes multipass insn scheduling (see also description of
+hook @samp{TARGET_SCHED_INIT_DFA_BUBBLES}).  This hook
+returns a nop insn with given @var{index}.  The indexes start with
+zero.  The hook should return @code{NULL} if there are no more nop
+insns with indexes greater than given index.
+@end deftypefn
+
+Macros in the following table are generated by the program
+@file{genattr} and can be useful for writing the hooks.
+
+@table @code
+@findex TRADITIONAL_PIPELINE_INTERFACE
+@item TRADITIONAL_PIPELINE_INTERFACE
+The macro definition is generated if there is a traditional pipeline
+description in @file{.md} file. You should also remember that to
+simplify the insn scheduler sources an empty traditional pipeline
+description interface is generated even if there is no a traditional
+pipeline description in the @file{.md} file.  The macro can be used to
+distinguish the two types of the traditional interface.
+
+@findex DFA_PIPELINE_INTERFACE
+@item DFA_PIPELINE_INTERFACE
+The macro definition is generated if there is an automaton pipeline
+description in @file{.md} file.  You should also remember that to
+simplify the insn scheduler sources an empty automaton pipeline
+description interface is generated even if there is no an automaton
+pipeline description in the @file{.md} file.  The macro can be used to
+distinguish the two types of the automaton interface.
+
+@findex MAX_DFA_ISSUE_RATE
+@item MAX_DFA_ISSUE_RATE
+The macro definition is generated in the automaton based pipeline
+description interface.  Its value is calculated from the automaton
+based pipeline description and is equal to maximal number of all insns
+described in constructions @samp{define_insn_reservation} which can be
+issued on the same processor cycle.
+
+@end table
+
@node Sections
@section Dividing the Output into Sections (Texts, Data, @dots{})
@c the above section title is WAY too long.  maybe cut the part between
--- a/gcc/genattr.c
+++ b/gcc/genattr.c
@ -193,6 +193,7 @@ main (argc, argv)
  int have_delay = 0;
  int have_annul_true = 0;
  int have_annul_false = 0;
+  int num_insn_reservations = 0;
  int num_units = 0;
  struct range all_simultaneity, all_multiplicity;
  struct range all_ready_cost, all_issue_delay, all_blockage;
@ -308,10 +309,18 @@ main (argc, argv)
 	  extend_range (&all_issue_delay,
 			unit->issue_delay.min, unit->issue_delay.max);
 	}
+      else if (GET_CODE (desc) == DEFINE_INSN_RESERVATION)
+	num_insn_reservations++;
    }

-  if (num_units > 0)
+  if (num_units > 0 || num_insn_reservations > 0)
    {
+      if (num_units > 0)
+	printf ("#define TRADITIONAL_PIPELINE_INTERFACE 1\n");
+
+      if (num_insn_reservations > 0)
+	printf ("#define DFA_PIPELINE_INTERFACE 1\n");
+
      /* Compute the range of blockage cost values.  See genattrtab.c
 	 for the derivation.  BLOCKAGE (E,C) when SIMULTANEITY is zero is

@ -348,6 +357,102 @@ main (argc, argv)

      write_units (num_units, &all_multiplicity, &all_simultaneity,
 		   &all_ready_cost, &all_issue_delay, &all_blockage);
+
+      /* Output interface for pipeline hazards recognition based on
+	 DFA (deterministic finite state automata.  */
+      printf ("\n/* DFA based pipeline interface.  */");
+      printf ("\n#ifndef AUTOMATON_STATE_ALTS\n");
+      printf ("#define AUTOMATON_STATE_ALTS 0\n");
+      printf ("#endif\n\n");
+      printf ("#ifndef CPU_UNITS_QUERY\n");
+      printf ("#define CPU_UNITS_QUERY 0\n");
+      printf ("#endif\n\n");
+      /* Interface itself: */
+      printf ("extern int max_dfa_issue_rate;\n\n");
+      printf ("/* The following macro value is calculated from the\n");
+      printf ("   automaton based pipeline description and is equal to\n");
+      printf ("   maximal number of all insns described in constructions\n");
+      printf ("   `define_insn_reservation' which can be issued on the\n");
+      printf ("   same processor cycle. */\n");
+      printf ("#define MAX_DFA_ISSUE_RATE max_dfa_issue_rate\n\n");
+      printf ("/* Insn latency time defined in define_insn_reservation. */\n");
+      printf ("extern int insn_default_latency PARAMS ((rtx));\n\n");
+      printf ("/* Return nonzero if there is a bypass for given insn\n");
+      printf ("   which is a data producer.  */\n");
+      printf ("extern int bypass_p PARAMS ((rtx));\n\n");
+      printf ("/* Insn latency time on data consumed by the 2nd insn.\n");
+      printf ("   Use the function if bypass_p returns nonzero for\n");
+      printf ("   the 1st insn. */\n");
+      printf ("extern int insn_latency PARAMS ((rtx, rtx));\n\n");
+      printf ("/* The following function returns number of alternative\n");
+      printf ("   reservations of given insn.  It may be used for better\n");
+      printf ("   insns scheduling heuristics. */\n");
+      printf ("extern int insn_alts PARAMS ((rtx));\n\n");
+      printf ("/* Maximal possible number of insns waiting results being\n");
+      printf ("   produced by insns whose execution is not finished. */\n");
+      printf ("extern int max_insn_queue_index;\n\n");
+      printf ("/* Pointer to data describing current state of DFA.  */\n");
+      printf ("typedef void *state_t;\n\n");
+      printf ("/* Size of the data in bytes.  */\n");
+      printf ("extern int state_size PARAMS ((void));\n\n");
+      printf ("/* Initiate given DFA state, i.e. Set up the state\n");
+      printf ("   as all functional units were not reserved.  */\n");
+      printf ("extern void state_reset PARAMS ((state_t));\n");
+      printf ("/* The following function returns negative value if given\n");
+      printf ("   insn can be issued in processor state described by given\n");
+      printf ("   DFA state.  In this case, the DFA state is changed to\n");
+      printf ("   reflect the current and future reservations by given\n");
+      printf ("   insn.  Otherwise the function returns minimal time\n");
+      printf ("   delay to issue the insn.  This delay may be zero\n");
+      printf ("   for superscalar or VLIW processors.  If the second\n");
+      printf ("   parameter is NULL the function changes given DFA state\n");
+      printf ("   as new processor cycle started.  */\n");
+      printf ("extern int state_transition PARAMS ((state_t, rtx));\n");
+      printf ("\n#if AUTOMATON_STATE_ALTS\n");
+      printf ("/* The following function returns number of possible\n");
+      printf ("   alternative reservations of given insn in given\n");
+      printf ("   DFA state.  It may be used for better insns scheduling\n");
+      printf ("   heuristics.  By default the function is defined if\n");
+      printf ("   macro AUTOMATON_STATE_ALTS is defined because its\n");
+      printf ("   implementation may require much memory.  */\n");
+      printf ("extern int state_alts PARAMS ((state_t, rtx));\n");
+      printf ("#endif\n\n");
+      printf ("extern int min_issue_delay PARAMS ((state_t, rtx));\n");
+      printf ("/* The following function returns nonzero if no one insn\n");
+      printf ("   can be issued in current DFA state. */\n");
+      printf ("extern int state_dead_lock_p PARAMS ((state_t));\n");
+      printf ("/* The function returns minimal delay of issue of the 2nd\n");
+      printf ("   insn after issuing the 1st insn in given DFA state.\n");
+      printf ("   The 1st insn should be issued in given state (i.e.\n");
+      printf ("    state_transition should return negative value for\n");
+      printf ("    the insn and the state).  Data dependencies between\n");
+      printf ("    the insns are ignored by the function.  */\n");
+      printf
+	("extern int min_insn_conflict_delay PARAMS ((state_t, rtx, rtx));\n");
+      printf ("/* The following function outputs reservations for given\n");
+      printf ("   insn as they are described in the corresponding\n");
+      printf ("   define_insn_reservation.  */\n");
+      printf ("extern void print_reservation PARAMS ((FILE *, rtx));\n");
+      printf ("\n#if CPU_UNITS_QUERY\n");
+      printf ("/* The following function returns code of functional unit\n");
+      printf ("   with given name (see define_cpu_unit). */\n");
+      printf ("extern int get_cpu_unit_code PARAMS ((const char *));\n");
+      printf ("/* The following function returns nonzero if functional\n");
+      printf ("   unit with given code is currently reserved in given\n");
+      printf ("   DFA state.  */\n");
+      printf ("extern int cpu_unit_reservation_p PARAMS ((state_t, int));\n");
+      printf ("#endif\n\n");
+      printf ("/* Initiate and finish work with DFA.  They should be\n");
+      printf ("   called as the first and the last interface\n");
+      printf ("   functions.  */\n");
+      printf ("extern void dfa_start PARAMS ((void));\n");
+      printf ("extern void dfa_finish PARAMS ((void));\n");
+    }
+  else
+    {
+      /* Otherwise we do no scheduling, but we need these typedefs
+	 in order to avoid uglifying other code with more ifdefs.  */
+      printf ("typedef void *state_t;\n\n");
    }

  /* Output flag masks for use by reorg.  
--- a/gcc/genattrtab.c
+++ b/gcc/genattrtab.c
@ -115,6 +115,8 @@ Software Foundation, 59 Temple Place - Suite 330, Boston, MA
 #include "obstack.h"
 #include "errors.h"

+#include "genattrtab.h"
+
 static struct obstack obstack1, obstack2;
 struct obstack *hash_obstack = &obstack1;
 struct obstack *temp_obstack = &obstack2;
@ -309,6 +311,8 @@ static int have_annul_true, have_annul_false;
 static int num_units, num_unit_opclasses;
 static int num_insn_ents;

+int num_dfa_decls;
+
 /* Used as operand to `operate_exp':  */

 enum operator {PLUS_OP, MINUS_OP, POS_MINUS_OP, EQ_OP, OR_OP, ORX_OP, MAX_OP, MIN_OP, RANGE_OP};
@ -371,10 +375,7 @@ static void attr_hash_add_rtx	PARAMS ((int, rtx));
 static void attr_hash_add_string PARAMS ((int, char *));
 static rtx attr_rtx		PARAMS ((enum rtx_code, ...));
 static rtx attr_rtx_1		PARAMS ((enum rtx_code, va_list));
-static char *attr_printf	PARAMS ((unsigned int, const char *, ...))
-  ATTRIBUTE_PRINTF_2;
 static char *attr_string        PARAMS ((const char *, int));
-static rtx check_attr_test	PARAMS ((rtx, int, int));
 static rtx check_attr_value	PARAMS ((rtx, struct attr_desc *));
 static rtx convert_set_attr_alternative PARAMS ((rtx, struct insn_def *));
 static rtx convert_set_attr	PARAMS ((rtx, struct insn_def *));
@ -458,10 +459,8 @@ static void write_const_num_delay_slots PARAMS ((void));
 static int n_comma_elts		PARAMS ((const char *));
 static char *next_comma_elt	PARAMS ((const char **));
 static struct attr_desc *find_attr PARAMS ((const char *, int));
-static void make_internal_attr	PARAMS ((const char *, rtx, int));
 static struct attr_value *find_most_used  PARAMS ((struct attr_desc *));
 static rtx find_single_value	PARAMS ((struct attr_desc *));
-static rtx make_numeric_value	PARAMS ((int));
 static void extend_range	PARAMS ((struct range *, int, int));
 static rtx attr_eq		PARAMS ((const char *, const char *));
 static const char *attr_numeral	PARAMS ((int));
@ -739,7 +738,7 @@ attr_rtx VPARAMS ((enum rtx_code code, ...))

   rtx attr_printf (len, format, [arg1, ..., argn])  */

-static char *
+char *
 attr_printf VPARAMS ((unsigned int len, const char *fmt, ...))
 {
  char str[256];
@ -920,7 +919,7 @@ attr_copy_rtx (orig)

   Return the new expression, if any.  */

-static rtx
+rtx
 check_attr_test (exp, is_const, lineno)
     rtx exp;
     int is_const;
@ -5880,7 +5879,7 @@ find_attr (name, create)

 /* Create internal attribute with the given default value.  */

-static void
+void
 make_internal_attr (name, value, special)
     const char *name;
     rtx value;
@ -5947,7 +5946,7 @@ find_single_value (attr)

 /* Return (attr_value "n") */

-static rtx
+rtx
 make_numeric_value (n)
     int n;
 {
@ -6097,6 +6096,7 @@ from the machine description file `md'.  */\n\n");

  /* Read the machine description.  */

+  initiate_automaton_gen (argc, argv);
  while (1)
    {
      int lineno;
@ -6125,6 +6125,46 @@ from the machine description file `md'.  */\n\n");
 	  gen_unit (desc, lineno);
 	  break;

+	case DEFINE_CPU_UNIT:
+	  gen_cpu_unit (desc);
+	  break;
+	  
+	case DEFINE_QUERY_CPU_UNIT:
+	  gen_query_cpu_unit (desc);
+	  break;
+	  
+	case DEFINE_BYPASS:
+	  gen_bypass (desc);
+	  break;
+	  
+	case EXCLUSION_SET:
+	  gen_excl_set (desc);
+	  break;
+	  
+	case PRESENCE_SET:
+	  gen_presence_set (desc);
+	  break;
+	  
+	case ABSENCE_SET:
+	  gen_absence_set (desc);
+	  break;
+	  
+	case DEFINE_AUTOMATON:
+	  gen_automaton (desc);
+	  break;
+	  
+	case AUTOMATA_OPTION:
+	  gen_automata_option (desc);
+	  break;
+	  
+	case DEFINE_RESERVATION:
+	  gen_reserv (desc);
+	  break;
+	  
+	case DEFINE_INSN_RESERVATION:
+	  gen_insn_reserv (desc);
+	  break;
+
 	default:
 	  break;
 	}
@ -6149,9 +6189,14 @@ from the machine description file `md'.  */\n\n");
  if (num_delays)
    expand_delays ();

-  /* Expand DEFINE_FUNCTION_UNIT information into new attributes.  */
-  if (num_units)
-    expand_units ();
+  if (num_units || num_dfa_decls)
+    {
+      /* Expand DEFINE_FUNCTION_UNIT information into new attributes.  */
+      expand_units ();
+      /* Build DFA, output some functions and expand DFA information
+	 into new attributes.  */
+      expand_automata ();
+    }

  printf ("#include \"config.h\"\n");
  printf ("#include \"system.h\"\n");
@ -6226,9 +6271,14 @@ from the machine description file `md'.  */\n\n");
 	write_eligible_delay ("annul_false");
    }

-  /* Write out information about function units.  */
-  if (num_units)
-    write_function_unit_info ();
+  if (num_units || num_dfa_decls)
+    {
+      /* Write out information about function units.  */
+      write_function_unit_info ();
+      /* Output code for pipeline hazards recognition based on DFA
+	 (deterministic finite state automata. */
+      write_automata ();
+    }

  /* Write out constant delay slot info */
  write_const_num_delay_slots ();
--- a/gcc/genattrtab.h
+++ b/gcc/genattrtab.h
@ -0,0 +1,43 @@
+/* External definitions of source files of genattrtab.
+   Copyright (C)  2001 Free Software Foundation, Inc.
+
+This file is part of GNU CC.
+
+GNU CC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 2, or (at your option)
+any later version.
+
+GNU CC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GNU CC; see the file COPYING.  If not, write to
+the Free Software Foundation, 59 Temple Place - Suite 330,
+Boston, MA 02111-1307, USA.  */
+
+/* Defined in genattrtab.c: */
+extern rtx check_attr_test	PARAMS ((rtx, int, int));
+extern rtx make_numeric_value	PARAMS ((int));
+extern void make_internal_attr	PARAMS ((const char *, rtx, int));
+extern char *attr_printf	PARAMS ((unsigned int, const char *, ...))
+  ATTRIBUTE_PRINTF_2;
+
+extern int num_dfa_decls;
+
+/* Defined in genautomata.c: */
+extern void gen_cpu_unit		PARAMS ((rtx));
+extern void gen_query_cpu_unit		PARAMS ((rtx));
+extern void gen_bypass			PARAMS ((rtx));
+extern void gen_excl_set		PARAMS ((rtx));
+extern void gen_presence_set		PARAMS ((rtx));
+extern void gen_absence_set		PARAMS ((rtx));
+extern void gen_automaton		PARAMS ((rtx));
+extern void gen_automata_option		PARAMS ((rtx));
+extern void gen_reserv   		PARAMS ((rtx));
+extern void gen_insn_reserv     	PARAMS ((rtx));
+extern void initiate_automaton_gen	PARAMS ((int, char **));
+extern void expand_automata             PARAMS ((void));
+extern void write_automata              PARAMS ((void));
--- a/gcc/genautomata.c
+++ b/gcc/genautomata.c
--- a/gcc/haifa-sched.c
+++ b/gcc/haifa-sched.c
@ -158,6 +158,12 @@ Software Foundation, 59 Temple Place - Suite 330, Boston, MA

 static int issue_rate;

+/* If the following variable value is non zero, the scheduler inserts
+   bubbles (nop insns).  The value of variable affects on scheduler
+   behavior only if automaton pipeline interface with multipass
+   scheduling is used and hook dfa_bubble is defined.  */
+int insert_schedule_bubbles_p = 0;
+
 /* sched-verbose controls the amount of debugging output the
   scheduler prints.  It is controlled by -fsched-verbose=N:
   N>0 and no -DSR : the output is directed to stderr.
@ -254,14 +260,39 @@ static rtx note_list;
   passes or stalls are introduced.  */

 /* Implement a circular buffer to delay instructions until sufficient
-   time has passed.  INSN_QUEUE_SIZE is a power of two larger than
-   MAX_BLOCKAGE and MAX_READY_COST computed by genattr.c.  This is the
-   longest time an isnsn may be queued.  */
-static rtx insn_queue[INSN_QUEUE_SIZE];
+   time has passed.  For the old pipeline description interface,
+   INSN_QUEUE_SIZE is a power of two larger than MAX_BLOCKAGE and
+   MAX_READY_COST computed by genattr.c.  For the new pipeline
+   description interface, MAX_INSN_QUEUE_INDEX is a power of two minus
+   one which is larger than maximal time of instruction execution
+   computed by genattr.c on the base maximal time of functional unit
+   reservations and geting a result.  This is the longest time an
+   insn may be queued.  */
+
+#define MAX_INSN_QUEUE_INDEX max_insn_queue_index_macro_value
+
+static rtx *insn_queue;
 static int q_ptr = 0;
 static int q_size = 0;
-#define NEXT_Q(X) (((X)+1) & (INSN_QUEUE_SIZE-1))
-#define NEXT_Q_AFTER(X, C) (((X)+C) & (INSN_QUEUE_SIZE-1))
+#define NEXT_Q(X) (((X)+1) & MAX_INSN_QUEUE_INDEX)
+#define NEXT_Q_AFTER(X, C) (((X)+C) & MAX_INSN_QUEUE_INDEX)
+
+/* The following variable defines value for macro
+   MAX_INSN_QUEUE_INDEX.  */
+static int max_insn_queue_index_macro_value;
+
+/* The following variable value refers for all current and future
+   reservations of the processor units.  */
+state_t curr_state;
+
+/* The following variable value is size of memory representing all
+   current and future reservations of the processor units.  It is used
+   only by DFA based scheduler.  */
+static size_t dfa_state_size;
+
+/* The following array is used to find the best insn from ready when
+   the automaton pipeline interface is used.  */
+static char *ready_try;

 /* Describe the ready list of the scheduler.
   VEC holds space enough for all insns in the current region.  VECLEN
@ -280,11 +311,15 @@ struct ready_list
 };

 /* Forward declarations.  */
+
+/* The scheduler using only DFA description should never use the
+   following five functions:  */
 static unsigned int blockage_range PARAMS ((int, rtx));
 static void clear_units PARAMS ((void));
 static void schedule_unit PARAMS ((int, rtx, int));
 static int actual_hazard PARAMS ((int, rtx, int, int));
 static int potential_hazard PARAMS ((int, rtx, int));
+
 static int priority PARAMS ((rtx));
 static int rank_for_schedule PARAMS ((const PTR, const PTR));
 static void swap_sort PARAMS ((rtx *, int));
@ -292,6 +327,7 @@ static void queue_insn PARAMS ((rtx, int));
 static void schedule_insn PARAMS ((rtx, struct ready_list *, int));
 static void find_insn_reg_weight PARAMS ((int));
 static void adjust_priority PARAMS ((rtx));
+static void advance_one_cycle PARAMS ((void));

 /* Notes handling mechanism:
   =========================
@ -331,6 +367,14 @@ static void debug_ready_list PARAMS ((struct ready_list *));
 static rtx move_insn1 PARAMS ((rtx, rtx));
 static rtx move_insn PARAMS ((rtx, rtx));

+/* The following functions are used to implement multi-pass scheduling
+   on the first cycle.  It is used only for DFA based scheduler.  */
+static rtx ready_element PARAMS ((struct ready_list *, int));
+static rtx ready_remove PARAMS ((struct ready_list *, int));
+static int max_issue PARAMS ((struct ready_list *, state_t, int *));
+
+static rtx choose_ready PARAMS ((struct ready_list *));
+
 #endif /* INSN_SCHEDULING */

 /* Point to state used for the current scheduling pass.  */
@ -354,7 +398,8 @@ static rtx last_scheduled_insn;
   returned by function_units_used.  A function unit is encoded as the
   unit number if the value is non-negative and the compliment of a
   mask if the value is negative.  A function unit index is the
-   non-negative encoding.  */
+   non-negative encoding.  The scheduler using only DFA description
+   should never use the following function.  */

 HAIFA_INLINE int
 insn_unit (insn)
@ -391,7 +436,9 @@ insn_unit (insn)
 /* Compute the blockage range for executing INSN on UNIT.  This caches
   the value returned by the blockage_range_function for the unit.
   These values are encoded in an int where the upper half gives the
-   minimum value and the lower half gives the maximum value.  */
+   minimum value and the lower half gives the maximum value.  The
+   scheduler using only DFA description should never use the following
+   function.  */

 HAIFA_INLINE static unsigned int
 blockage_range (unit, insn)
@ -415,20 +462,38 @@ blockage_range (unit, insn)
  return range;
 }

-/* A vector indexed by function unit instance giving the last insn to use
-   the unit.  The value of the function unit instance index for unit U
-   instance I is (U + I * FUNCTION_UNITS_SIZE).  */
+/* A vector indexed by function unit instance giving the last insn to
+   use the unit.  The value of the function unit instance index for
+   unit U instance I is (U + I * FUNCTION_UNITS_SIZE).  The scheduler
+   using only DFA description should never use the following variable.  */
+#if FUNCTION_UNITS_SIZE
 static rtx unit_last_insn[FUNCTION_UNITS_SIZE * MAX_MULTIPLICITY];
+#else
+static rtx unit_last_insn[1];
+#endif

-/* A vector indexed by function unit instance giving the minimum time when
-   the unit will unblock based on the maximum blockage cost.  */
+/* A vector indexed by function unit instance giving the minimum time
+   when the unit will unblock based on the maximum blockage cost.  The
+   scheduler using only DFA description should never use the following
+   variable.  */
+#if FUNCTION_UNITS_SIZE
 static int unit_tick[FUNCTION_UNITS_SIZE * MAX_MULTIPLICITY];
+#else
+static int unit_tick[1];
+#endif

 /* A vector indexed by function unit number giving the number of insns
-   that remain to use the unit.  */
+   that remain to use the unit.  The scheduler using only DFA
+   description should never use the following variable.  */
+#if FUNCTION_UNITS_SIZE
 static int unit_n_insns[FUNCTION_UNITS_SIZE];
+#else
+static int unit_n_insns[1];
+#endif

-/* Access the unit_last_insn array.  Used by the visualization code.  */
+/* Access the unit_last_insn array.  Used by the visualization code.
+   The scheduler using only DFA description should never use the
+   following function.  */

 rtx
 get_unit_last_insn (instance)
@ -447,7 +512,8 @@ clear_units ()
  memset ((char *) unit_n_insns, 0, sizeof (unit_n_insns));
 }

-/* Return the issue-delay of an insn.  */
+/* Return the issue-delay of an insn.  The scheduler using only DFA
+   description should never use the following function.  */

 HAIFA_INLINE int
 insn_issue_delay (insn)
@ -477,7 +543,8 @@ insn_issue_delay (insn)

 /* Return the actual hazard cost of executing INSN on the unit UNIT,
   instance INSTANCE at time CLOCK if the previous actual hazard cost
-   was COST.  */
+   was COST.  The scheduler using only DFA description should never
+   use the following function.  */

 HAIFA_INLINE int
 actual_hazard_this_instance (unit, instance, insn, clock, cost)
@ -513,8 +580,9 @@ actual_hazard_this_instance (unit, instance, insn, clock, cost)
  return cost;
 }

-/* Record INSN as having begun execution on the units encoded by UNIT at
-   time CLOCK.  */
+/* Record INSN as having begun execution on the units encoded by UNIT
+   at time CLOCK.  The scheduler using only DFA description should
+   never use the following function.  */

 HAIFA_INLINE static void
 schedule_unit (unit, insn, clock)
@ -545,8 +613,10 @@ schedule_unit (unit, insn, clock)
 	schedule_unit (i, insn, clock);
 }

-/* Return the actual hazard cost of executing INSN on the units encoded by
-   UNIT at time CLOCK if the previous actual hazard cost was COST.  */
+/* Return the actual hazard cost of executing INSN on the units
+   encoded by UNIT at time CLOCK if the previous actual hazard cost
+   was COST.  The scheduler using only DFA description should never
+   use the following function.  */

 HAIFA_INLINE static int
 actual_hazard (unit, insn, clock, cost)
@ -591,11 +661,13 @@ actual_hazard (unit, insn, clock, cost)
 }

 /* Return the potential hazard cost of executing an instruction on the
-   units encoded by UNIT if the previous potential hazard cost was COST.
-   An insn with a large blockage time is chosen in preference to one
-   with a smaller time; an insn that uses a unit that is more likely
-   to be used is chosen in preference to one with a unit that is less
-   used.  We are trying to minimize a subsequent actual hazard.  */
+   units encoded by UNIT if the previous potential hazard cost was
+   COST.  An insn with a large blockage time is chosen in preference
+   to one with a smaller time; an insn that uses a unit that is more
+   likely to be used is chosen in preference to one with a unit that
+   is less used.  We are trying to minimize a subsequent actual
+   hazard.  The scheduler using only DFA description should never use
+   the following function.  */

 HAIFA_INLINE static int
 potential_hazard (unit, insn, cost)
@ -648,62 +720,69 @@ insn_cost (insn, link, used)
 {
  int cost = INSN_COST (insn);

-  if (cost == 0)
+  if (cost < 0)
    {
-      recog_memoized (insn);
-
-      /* A USE insn, or something else we don't need to understand.
-         We can't pass these directly to result_ready_cost because it will
-         trigger a fatal error for unrecognizable insns.  */
-      if (INSN_CODE (insn) < 0)
+      /* A USE insn, or something else we don't need to
+	 understand.  We can't pass these directly to
+	 result_ready_cost or insn_default_latency because it will
+	 trigger a fatal error for unrecognizable insns.  */
+      if (recog_memoized (insn) < 0)
 	{
-	  INSN_COST (insn) = 1;
-	  return 1;
+	  INSN_COST (insn) = 0;
+	  return 0;
 	}
      else
 	{
-	  cost = result_ready_cost (insn);
-
-	  if (cost < 1)
-	    cost = 1;
-
+	  if (targetm.sched.use_dfa_pipeline_interface
+	      && (*targetm.sched.use_dfa_pipeline_interface) ())
+	    cost = insn_default_latency (insn);
+	  else
+	    cost = result_ready_cost (insn);
+	  
+	  if (cost < 0)
+	    cost = 0;
+	  
 	  INSN_COST (insn) = cost;
 	}
    }

  /* In this case estimate cost without caring how insn is used.  */
-  if (link == 0 && used == 0)
+  if (link == 0 || used == 0)
    return cost;

-  /* A USE insn should never require the value used to be computed.  This
-     allows the computation of a function's result and parameter values to
-     overlap the return and call.  */
-  recog_memoized (used);
-  if (INSN_CODE (used) < 0)
-    LINK_COST_FREE (link) = 1;
-
-  /* If some dependencies vary the cost, compute the adjustment.  Most
-     commonly, the adjustment is complete: either the cost is ignored
-     (in the case of an output- or anti-dependence), or the cost is
-     unchanged.  These values are cached in the link as LINK_COST_FREE
-     and LINK_COST_ZERO.  */
-
-  if (LINK_COST_FREE (link))
+  /* A USE insn should never require the value used to be computed.
+     This allows the computation of a function's result and parameter
+     values to overlap the return and call.  */
+  if (recog_memoized (used) < 0)
    cost = 0;
-  else if (!LINK_COST_ZERO (link) && targetm.sched.adjust_cost)
+  else
    {
-      int ncost = (*targetm.sched.adjust_cost) (used, link, insn, cost);
-
-      if (ncost < 1)
+      if (targetm.sched.use_dfa_pipeline_interface
+	  && (*targetm.sched.use_dfa_pipeline_interface) ())
 	{
-	  LINK_COST_FREE (link) = 1;
-	  ncost = 0;
+	  if (INSN_CODE (insn) >= 0)
+	    {
+	      if (REG_NOTE_KIND (link) == REG_DEP_ANTI)
+		cost = 0;
+	      else if (REG_NOTE_KIND (link) == REG_DEP_OUTPUT)
+		{
+		  cost = (insn_default_latency (insn)
+			  - insn_default_latency (used));
+		  if (cost <= 0)
+		    cost = 1;
+		}
+	      else if (bypass_p (insn))
+		cost = insn_latency (insn, used);
+	    }
 	}
-      if (cost == ncost)
-	LINK_COST_ZERO (link) = 1;
-      cost = ncost;
-    }

+      if (targetm.sched.adjust_cost)
+	cost = (*targetm.sched.adjust_cost) (used, link, insn, cost);
+
+      if (cost < 0)
+	cost = 0;
+    }
+  
  return cost;
 }

@ -930,6 +1009,48 @@ ready_remove_first (ready)
  return t;
 }

+/* The following code implements multi-pass scheduling for the first
+   cycle.  In other words, we will try to choose ready insn which
+   permits to start maximum number of insns on the same cycle.  */
+
+/* Return a pointer to the element INDEX from the ready.  INDEX for
+   insn with the highest priority is 0, and the lowest priority has
+   N_READY - 1.  */
+
+HAIFA_INLINE static rtx
+ready_element (ready, index)
+     struct ready_list *ready;
+     int index;
+{
+  if (ready->n_ready == 0 || index >= ready->n_ready)
+    abort ();
+  return ready->vec[ready->first - index];
+}
+
+/* Remove the element INDEX from the ready list and return it.  INDEX
+   for insn with the highest priority is 0, and the lowest priority
+   has N_READY - 1.  */
+
+HAIFA_INLINE static rtx
+ready_remove (ready, index)
+     struct ready_list *ready;
+     int index;
+{
+  rtx t;
+  int i;
+
+  if (index == 0)
+    return ready_remove_first (ready);
+  if (ready->n_ready == 0 || index >= ready->n_ready)
+    abort ();
+  t = ready->vec[ready->first - index];
+  ready->n_ready--;
+  for (i = index; i < ready->n_ready; i++)
+    ready->vec[ready->first - i] = ready->vec[ready->first - i - 1];
+  return t;
+}
+
+
 /* Sort the ready list READY by ascending priority, using the SCHED_SORT
   macro.  */

@ -961,6 +1082,25 @@ adjust_priority (prev)
      (*targetm.sched.adjust_priority) (prev, INSN_PRIORITY (prev));
 }

+/* Advance time on one cycle.  */
+HAIFA_INLINE static void
+advance_one_cycle ()
+{
+  if (targetm.sched.use_dfa_pipeline_interface
+      && (*targetm.sched.use_dfa_pipeline_interface) ())
+    {
+      if (targetm.sched.dfa_pre_cycle_insn)
+	state_transition (curr_state,
+			  (*targetm.sched.dfa_pre_cycle_insn) ());
+
+      state_transition (curr_state, NULL);
+
+      if (targetm.sched.dfa_post_cycle_insn)
+	state_transition (curr_state,
+			  (*targetm.sched.dfa_post_cycle_insn) ());
+    }
+}
+
 /* Clock at which the previous instruction was issued.  */
 static int last_clock_var;

@ -976,26 +1116,50 @@ schedule_insn (insn, ready, clock)
     int clock;
 {
  rtx link;
-  int unit;
+  int unit = 0;

-  unit = insn_unit (insn);
+  if (!targetm.sched.use_dfa_pipeline_interface
+      || !(*targetm.sched.use_dfa_pipeline_interface) ())
+    unit = insn_unit (insn);

  if (sched_verbose >= 2)
    {
-      fprintf (sched_dump, ";;\t\t--> scheduling insn <<<%d>>> on unit ",
-	       INSN_UID (insn));
-      insn_print_units (insn);
+
+      if (targetm.sched.use_dfa_pipeline_interface
+	  && (*targetm.sched.use_dfa_pipeline_interface) ())
+	{
+	  fprintf (sched_dump,
+		   ";;\t\t--> scheduling insn <<<%d>>>:reservation ",
+		   INSN_UID (insn));
+	  
+	  if (recog_memoized (insn) < 0)
+	    fprintf (sched_dump, "nothing");
+	  else
+	    print_reservation (sched_dump, insn);
+	}
+      else
+	{
+	  fprintf (sched_dump, ";;\t\t--> scheduling insn <<<%d>>> on unit ",
+		   INSN_UID (insn));
+	  insn_print_units (insn);
+	}
+
      fprintf (sched_dump, "\n");
    }

-  if (sched_verbose && unit == -1)
-    visualize_no_unit (insn);
+  if (!targetm.sched.use_dfa_pipeline_interface
+      || !(*targetm.sched.use_dfa_pipeline_interface) ())
+    {
+      if (sched_verbose && unit == -1)
+	visualize_no_unit (insn);

-  if (MAX_BLOCKAGE > 1 || issue_rate > 1 || sched_verbose)
-    schedule_unit (unit, insn, clock);

-  if (INSN_DEPEND (insn) == 0)
-    return;
+      if (MAX_BLOCKAGE > 1 || issue_rate > 1 || sched_verbose)
+	schedule_unit (unit, insn, clock);
+      
+      if (INSN_DEPEND (insn) == 0)
+	return;
+    }

  for (link = INSN_DEPEND (insn); link != 0; link = XEXP (link, 1))
    {
@ -1037,7 +1201,9 @@ schedule_insn (insn, ready, clock)
     to issue on the same cycle as the previous insn.  A machine
     may use this information to decide how the instruction should
     be aligned.  */
-  if (reload_completed && issue_rate > 1)
+  if (reload_completed && issue_rate > 1
+      && GET_CODE (PATTERN (insn)) != USE
+      && GET_CODE (PATTERN (insn)) != CLOBBER)
    {
      PUT_MODE (insn, clock > last_clock_var ? TImode : VOIDmode);
      last_clock_var = clock;
@ -1464,7 +1630,7 @@ queue_to_ready (ready)
    {
      int stalls;

-      for (stalls = 1; stalls < INSN_QUEUE_SIZE; stalls++)
+      for (stalls = 1; stalls <= MAX_INSN_QUEUE_INDEX; stalls++)
 	{
 	  if ((link = insn_queue[NEXT_Q_AFTER (q_ptr, stalls)]))
 	    {
@ -1483,13 +1649,19 @@ queue_to_ready (ready)
 		}
 	      insn_queue[NEXT_Q_AFTER (q_ptr, stalls)] = 0;

-	      if (ready->n_ready)
-		break;
+	      advance_one_cycle ();
+
+	      break;
 	    }
+
+	  advance_one_cycle ();
 	}

-      if (sched_verbose && stalls)
+      if ((!targetm.sched.use_dfa_pipeline_interface
+	   || !(*targetm.sched.use_dfa_pipeline_interface) ())
+	  && sched_verbose && stalls)
 	visualize_stall_cycles (stalls);
+
      q_ptr = NEXT_Q_AFTER (q_ptr, stalls);
      clock_var += stalls;
    }
@ -1505,7 +1677,10 @@ debug_ready_list (ready)
  int i;

  if (ready->n_ready == 0)
-    return;
+    {
+      fprintf (sched_dump, "\n");
+      return;
+    }

  p = ready_lastpos (ready);
  for (i = 0; i < ready->n_ready; i++)
@ -1617,6 +1792,113 @@ move_insn (insn, last)
  return retval;
 }

+/* The following function returns maximal (or close to maximal) number
+   of insns which can be issued on the same cycle and one of which
+   insns is insns with the best rank (the last insn in READY).  To
+   make this function tries different samples of ready insns.  READY
+   is current queue `ready'.  Global array READY_TRY reflects what
+   insns are already issued in this try.  STATE is current processor
+   state.  If the function returns nonzero, INDEX will contain index
+   of the best insn in READY.  The following function is used only for
+   first cycle multipass scheduling.  */
+
+static int
+max_issue (ready, state, index)
+     struct ready_list *ready;
+     state_t state;
+     int *index;
+{
+  int i, best, n, temp_index, delay;
+  state_t temp_state;
+  rtx insn;
+  int max_lookahead = (*targetm.sched.first_cycle_multipass_dfa_lookahead) ();
+
+  if (state_dead_lock_p (state))
+    return 0;
+
+  temp_state = alloca (dfa_state_size);
+  best = 0;
+  
+  for (i = 0; i < ready->n_ready; i++)
+    if (!ready_try [i])
+      {
+	insn = ready_element (ready, i);
+	
+	if (INSN_CODE (insn) < 0)
+	  continue;
+	
+	memcpy (temp_state, state, dfa_state_size);
+	
+	delay = state_transition (temp_state, insn);
+	
+	if (delay == 0)
+	  {
+	    if (!targetm.sched.dfa_bubble)
+	      continue;
+	    else
+	      {
+		int j;
+		rtx bubble;
+		
+		for (j = 0;
+		     (bubble = (*targetm.sched.dfa_bubble) (j)) != NULL_RTX;
+		     j++)
+		  if (state_transition (temp_state, bubble) < 0
+		      && state_transition (temp_state, insn) < 0)
+		    break;
+		
+		if (bubble == NULL_RTX)
+		  continue;
+	      }
+	  }
+	else if (delay > 0)
+	  continue;
+	
+	--max_lookahead;
+	
+	if (max_lookahead < 0)
+	  break;
+	
+	ready_try [i] = 1;
+
+	n = max_issue (ready, temp_state, &temp_index);
+	if (n > 0 || ready_try[0])
+	  n += 1;
+
+	if (best < n)
+	  {
+	    best = n;
+	    *index = i;
+	  }
+	ready_try [i] = 0;
+      }
+  
+  return best;
+}
+
+/* The following function chooses insn from READY and modifies
+   *N_READY and READY.  The following function is used only for first
+   cycle multipass scheduling.  */
+
+static rtx
+choose_ready (ready)
+     struct ready_list *ready;
+{
+  if (!targetm.sched.first_cycle_multipass_dfa_lookahead
+      || (*targetm.sched.first_cycle_multipass_dfa_lookahead) () <= 0)
+    return ready_remove_first (ready);
+  else
+    {
+      /* Try to choose the better insn.  */
+      int index;
+
+      if (max_issue (ready, curr_state, &index) == 0)
+	return ready_remove_first (ready);
+      else
+	return ready_remove (ready, index);
+    }
+}
+
 /* Called from backends from targetm.sched.reorder to emit stuff into
   the instruction stream.  */

@ -1638,7 +1920,9 @@ schedule_block (b, rgn_n_insns)
     int rgn_n_insns;
 {
  struct ready_list ready;
+  int first_cycle_insn_p;
  int can_issue_more;
+  state_t temp_state = NULL;  /* It is used for multipass scheduling.  */

  /* Head/tail info for this block.  */
  rtx prev_head = current_sched_info->prev_head;
@ -1671,7 +1955,11 @@ schedule_block (b, rgn_n_insns)
      init_block_visualization ();
    }

-  clear_units ();
+  if (targetm.sched.use_dfa_pipeline_interface
+      && (*targetm.sched.use_dfa_pipeline_interface) ())
+    state_reset (curr_state);
+  else
+    clear_units ();

  /* Allocate the ready list.  */
  ready.veclen = rgn_n_insns + 1 + issue_rate;
@ -1679,6 +1967,15 @@ schedule_block (b, rgn_n_insns)
  ready.vec = (rtx *) xmalloc (ready.veclen * sizeof (rtx));
  ready.n_ready = 0;

+  if (targetm.sched.use_dfa_pipeline_interface
+      && (*targetm.sched.use_dfa_pipeline_interface) ())
+    {
+      /* It is used for first cycle multipass scheduling.  */
+      temp_state = alloca (dfa_state_size);
+      ready_try = (char *) xmalloc ((rgn_n_insns + 1) * sizeof (char));
+      memset (ready_try, 0, (rgn_n_insns + 1) * sizeof (char));
+    }
+
  (*current_sched_info->init_ready_list) (&ready);

  if (targetm.sched.md_init)
@ -1691,8 +1988,16 @@ schedule_block (b, rgn_n_insns)
     queue.  */
  q_ptr = 0;
  q_size = 0;
-  last_clock_var = 0;
-  memset ((char *) insn_queue, 0, sizeof (insn_queue));
+
+  if (!targetm.sched.use_dfa_pipeline_interface
+      || !(*targetm.sched.use_dfa_pipeline_interface) ())
+    max_insn_queue_index_macro_value = INSN_QUEUE_SIZE - 1;
+  else
+    max_insn_queue_index_macro_value = max_insn_queue_index;
+
+  insn_queue = (rtx *) alloca ((MAX_INSN_QUEUE_INDEX + 1) * sizeof (rtx));
+  memset ((char *) insn_queue, 0, (MAX_INSN_QUEUE_INDEX + 1) * sizeof (rtx));
+  last_clock_var = -1;

  /* Start just before the beginning of time.  */
  clock_var = -1;
@ -1702,12 +2007,18 @@ schedule_block (b, rgn_n_insns)
    {
      clock_var++;

+      advance_one_cycle ();
+
      /* Add to the ready list all pending insns that can be issued now.
         If there are no ready insns, increment clock until one
         is ready and add all pending insns at that point to the ready
         list.  */
      queue_to_ready (&ready);

+      if (sched_verbose && targetm.sched.cycle_display)
+	last_scheduled_insn
+	  = (*targetm.sched.cycle_display) (clock_var, last_scheduled_insn);
+
      if (ready.n_ready == 0)
 	abort ();

@ -1730,24 +2041,127 @@ schedule_block (b, rgn_n_insns)
      else
 	can_issue_more = issue_rate;

-      if (sched_verbose && targetm.sched.cycle_display)
-	last_scheduled_insn
-	  = (*targetm.sched.cycle_display) (clock_var, last_scheduled_insn);
-
-      if (sched_verbose)
+      first_cycle_insn_p = 1;
+      for (;;)
 	{
-	  fprintf (sched_dump, "\n;;\tReady list (t =%3d):  ", clock_var);
-	  debug_ready_list (&ready);
-	}
+	  rtx insn;
+	  int cost;
+
+	  if (sched_verbose)
+	    {
+	      fprintf (sched_dump, ";;\tReady list (t =%3d):  ",
+		       clock_var);
+	      debug_ready_list (&ready);
+	    }
+
+	  if (!targetm.sched.use_dfa_pipeline_interface
+	      || !(*targetm.sched.use_dfa_pipeline_interface) ())
+	    {
+	      if (ready.n_ready == 0 || !can_issue_more
+		  || !(*current_sched_info->schedule_more_p) ())
+		break;
+	      insn = choose_ready (&ready);
+	      cost = actual_hazard (insn_unit (insn), insn, clock_var, 0);
+	    }
+	  else
+	    {
+	      if (ready.n_ready == 0 || !can_issue_more
+		  || state_dead_lock_p (curr_state)
+		  || !(*current_sched_info->schedule_more_p) ())
+		break;
+	      
+	      /* Select and remove the insn from the ready list.  */
+	      insn = choose_ready (&ready);
+	      
+	      memcpy (temp_state, curr_state, dfa_state_size);
+	      if (recog_memoized (insn) < 0)
+		{
+		  if (!first_cycle_insn_p
+		      && (GET_CODE (PATTERN (insn)) == ASM_INPUT
+			  || asm_noperands (PATTERN (insn)) >= 0))
+		    /* This is asm insn which is tryed to be issued on the
+		       cycle not first.  Issue it on the next cycle.  */
+		    cost = 1;
+		  else
+		    /* A USE insn, or something else we don't need to
+		       understand.  We can't pass these directly to
+		       state_transition because it will trigger a
+		       fatal error for unrecognizable insns.  */
+		    cost = 0;
+		}
+	      else
+		{
+		  cost = state_transition (temp_state, insn);
+
+		  if (targetm.sched.first_cycle_multipass_dfa_lookahead
+		      && targetm.sched.dfa_bubble)
+		    {
+		      if (cost == 0)
+			{
+			  int j;
+			  rtx bubble;
+			  
+			  for (j = 0;
+			       (bubble = (*targetm.sched.dfa_bubble) (j))
+				 != NULL_RTX;
+			       j++)
+			    {
+			      memcpy (temp_state, curr_state, dfa_state_size);
+			      
+			      if (state_transition (temp_state, bubble) < 0
+				  && state_transition (temp_state, insn) < 0)
+				break;
+			    }
+			  
+			  if (bubble != NULL_RTX)
+			    {
+			      if (insert_schedule_bubbles_p)
+				{
+				  rtx copy;
+				  
+				  copy = copy_rtx (PATTERN (bubble));
+				  emit_insn_after (copy, last_scheduled_insn);
+				  last_scheduled_insn
+				    = NEXT_INSN (last_scheduled_insn);
+				  INSN_CODE (last_scheduled_insn)
+				    = INSN_CODE (bubble);
+				  
+				  /* Annotate the same for the first insns
+				     scheduling by using mode.  */
+				  PUT_MODE (last_scheduled_insn,
+					    (clock_var > last_clock_var
+					     ? clock_var - last_clock_var
+					     : VOIDmode));
+				  last_clock_var = clock_var;
+				  
+				  if (sched_verbose >= 2)
+				    {
+				      fprintf (sched_dump,
+					       ";;\t\t--> scheduling bubble insn <<<%d>>>:reservation ",
+					       INSN_UID (last_scheduled_insn));
+				      
+				      if (recog_memoized (last_scheduled_insn)
+					  < 0)
+					fprintf (sched_dump, "nothing");
+				      else
+					print_reservation
+					  (sched_dump, last_scheduled_insn);
+				      
+				      fprintf (sched_dump, "\n");
+				    }
+				}
+			      cost = -1;
+			    }
+			}
+		    }
+
+		  if (cost < 0)
+		    cost = 0;
+		  else if (cost == 0)
+		    cost = 1;
+		}
+	    }

-      /* Issue insns from ready list.  */
-      while (ready.n_ready != 0
-	     && can_issue_more
-	     && (*current_sched_info->schedule_more_p) ())
-	{
-	  /* Select and remove the insn from the ready list.  */
-	  rtx insn = ready_remove_first (&ready);
-	  int cost = actual_hazard (insn_unit (insn), insn, clock_var, 0);

 	  if (cost >= 1)
 	    {
@ -1760,6 +2174,10 @@ schedule_block (b, rgn_n_insns)

 	  last_scheduled_insn = move_insn (insn, last_scheduled_insn);

+	  if (targetm.sched.use_dfa_pipeline_interface
+	      && (*targetm.sched.use_dfa_pipeline_interface) ())
+	    memcpy (curr_state, temp_state, dfa_state_size);
+	    
 	  if (targetm.sched.variable_issue)
 	    can_issue_more =
 	      (*targetm.sched.variable_issue) (sched_dump, sched_verbose,
@ -1770,6 +2188,8 @@ schedule_block (b, rgn_n_insns)
 	  schedule_insn (insn, &ready, clock_var);

 	next:
+	  first_cycle_insn_p = 0;
+
 	  if (targetm.sched.reorder2)
 	    {
 	      /* Sort the ready list based on priority.  */
@ -1783,8 +2203,10 @@ schedule_block (b, rgn_n_insns)
 	    }
 	}

-      /* Debug info.  */
-      if (sched_verbose)
+      if ((!targetm.sched.use_dfa_pipeline_interface
+	   || !(*targetm.sched.use_dfa_pipeline_interface) ())
+	  && sched_verbose)
+	/* Debug info.  */
 	visualize_scheduled_insns (clock_var);
    }

@ -1796,7 +2218,9 @@ schedule_block (b, rgn_n_insns)
    {
      fprintf (sched_dump, ";;\tReady list (final):  ");
      debug_ready_list (&ready);
-      print_block_visualization ("");
+      if (!targetm.sched.use_dfa_pipeline_interface
+	  || !(*targetm.sched.use_dfa_pipeline_interface) ())
+	print_block_visualization ("");
    }

  /* Sanity check -- queue must be empty now.  Meaningless if region has
@ -1841,6 +2265,10 @@ schedule_block (b, rgn_n_insns)
  current_sched_info->tail = tail;

  free (ready.vec);
+
+  if (targetm.sched.use_dfa_pipeline_interface
+      && (*targetm.sched.use_dfa_pipeline_interface) ())
+    free (ready_try);
 }

 /* Set_priorities: compute priority of each insn in the block.  */
@ -1882,6 +2310,7 @@ sched_init (dump_file)
 {
  int luid, b;
  rtx insn;
+  int i;

  /* Disable speculative loads in their presence if cc0 defined.  */
 #ifdef HAVE_cc0
@ -1909,6 +2338,27 @@ sched_init (dump_file)

  h_i_d = (struct haifa_insn_data *) xcalloc (old_max_uid, sizeof (*h_i_d));

+  for (i = 0; i < old_max_uid; i++)
+    h_i_d [i].cost = -1;
+
+  if (targetm.sched.use_dfa_pipeline_interface
+      && (*targetm.sched.use_dfa_pipeline_interface) ())
+    {
+      if (targetm.sched.init_dfa_pre_cycle_insn)
+	(*targetm.sched.init_dfa_pre_cycle_insn) ();
+      
+      if (targetm.sched.init_dfa_post_cycle_insn)
+	(*targetm.sched.init_dfa_post_cycle_insn) ();
+      
+      if (targetm.sched.first_cycle_multipass_dfa_lookahead
+	  && targetm.sched.init_dfa_bubbles)
+	(*targetm.sched.init_dfa_bubbles) ();
+      
+      dfa_start ();
+      dfa_state_size = state_size ();
+      curr_state = xmalloc (dfa_state_size);
+    }
+
  h_i_d[0].luid = 0;
  luid = 1;
  for (b = 0; b < n_basic_blocks; b++)
@ -1966,8 +2416,10 @@ sched_init (dump_file)
 	}
    }

-  /* Find units used in this function, for visualization.  */
-  if (sched_verbose)
+  if ((!targetm.sched.use_dfa_pipeline_interface
+       || !(*targetm.sched.use_dfa_pipeline_interface) ())
+      && sched_verbose)
+    /* Find units used in this function, for visualization.  */
    init_target_units ();

  /* ??? Add a NOTE after the last insn of the last basic block.  It is not
@ -1997,6 +2449,13 @@ void
 sched_finish ()
 {
  free (h_i_d);
+
+  if (targetm.sched.use_dfa_pipeline_interface
+      && (*targetm.sched.use_dfa_pipeline_interface) ())
+    {
+      free (curr_state);
+      dfa_finish ();
+    }
  free_dependency_caches ();
  end_alias_analysis ();
  if (write_symbols != NO_DEBUG)
--- a/gcc/rtl.def
+++ b/gcc/rtl.def
@ -341,6 +341,147 @@ DEF_RTL_EXPR(SEQUENCE, "sequence", "E", 'x')
 /* Refers to the address of its argument.  This is only used in alias.c.  */
 DEF_RTL_EXPR(ADDRESS, "address", "e", 'm')

+/* ----------------------------------------------------------------------
+   Constructions for CPU pipeline description described by NDFAs.
+   These do not appear in actual rtl code in the compiler.
+   ---------------------------------------------------------------------- */
+
+/* (define_cpu_unit string [string]) describes cpu functional
+   units (separated by comma).
+
+   1st operand: Names of cpu functional units.
+   2nd operand: Name of automaton (see comments for DEFINE_AUTOMATON).
+
+   All define_reservations, define_cpu_units, and
+   define_query_cpu_units should have unique names which may not be
+   "nothing".  */
+DEF_RTL_EXPR(DEFINE_CPU_UNIT, "define_cpu_unit", "sS", 'x')
+
+/* (define_query_cpu_unit string [string]) describes cpu functional
+   units analogously to define_cpu_unit.  If we use automaton without
+   minimization, the reservation of such units can be queried for
+   automaton state.  */
+DEF_RTL_EXPR(DEFINE_QUERY_CPU_UNIT, "define_query_cpu_unit", "sS", 'x')
+
+/* (exclusion_set string string) means that each CPU functional unit
+   in the first string can not be reserved simultaneously with any
+   unit whose name is in the second string and vise versa.  CPU units
+   in the string are separated by commas.  For example, it is useful
+   for description CPU with fully pipelined floating point functional
+   unit which can execute simultaneously only single floating point
+   insns or only double floating point insns.  All CPU functional
+   units in a set should belong the same automaton.  */
+DEF_RTL_EXPR(EXCLUSION_SET, "exclusion_set", "ss", 'x')
+
+/* (presence_set string string) means that each CPU functional unit in
+   the first string can not be reserved unless at least one of units
+   whose names are in the second string is reserved.  This is an
+   asymmetric relation.  CPU units in the string are separated by
+   commas.  For example, it is useful for description that slot1 is
+   reserved after slot0 reservation for VLIW processor.  All CPU
+   functional units in a set should belong the same automaton.  */
+DEF_RTL_EXPR(PRESENCE_SET, "presence_set", "ss", 'x')
+
+/* (absence_set string string) means that each CPU functional unit in
+   the first string can not be reserved only if each unit whose name
+   is in the second string is not reserved.  This is an asymmetric
+   relation (actually exclusion set is analogous to this one but it is
+   symmetric).  CPU units in the string are separated by commas.  For
+   example, it is useful for description that slot0 can not be
+   reserved after slot1 or slot2 reservation for VLIW processor.  All
+   CPU functional units in a set should belong the same automaton.  */
+DEF_RTL_EXPR(ABSENCE_SET, "absence_set", "ss", 'x')
+
+/* (define_bypass number out_insn_names in_insn_names) names bypass
+   with given latency (the first number) from insns given by the first
+   string (see define_insn_reservation) into insns given by the second
+   string.  Insn names in the strings are separated by commas.  The
+   third operand is optional name of function which is additional
+   guard for the bypass.  The function will get the two insns as
+   parameters.  If the function returns zero the bypass will be
+   ignored for this case.  Additional guard is necessary to recognize
+   complicated bypasses, e.g. when consumer is load address.  */
+DEF_RTL_EXPR(DEFINE_BYPASS, "define_bypass", "issS", 'x')
+
+/* (define_automaton string) describes names of automata generated and
+   used for pipeline hazards recognition.  The names are separated by
+   comma.  Actually it is possibly to generate the single automaton
+   but unfortunately it can be very large.  If we use more one
+   automata, the summary size of the automata usually is less than the
+   single one.  The automaton name is used in define_cpu_unit and
+   define_query_cpu_unit.  All automata should have unique names.  */
+DEF_RTL_EXPR(DEFINE_AUTOMATON, "define_automaton", "s", 'x')
+
+/* (automata_option string) describes option for generation of
+   automata.  Currently there are the following options:
+
+   o "no-minimization" which makes no minimization of automata.  This
+     is only worth to do when we are going to query CPU functional
+     unit reservations in an automaton state.
+
+   o "w" which means generation of file describing the result
+     automaton.  The file can be used for the description verification.
+
+   o "ndfa" which makes nondeterministic finite state automata.  */
+DEF_RTL_EXPR(AUTOMATA_OPTION, "automata_option", "s", 'x')
+
+/* (define_reservation string string) names reservation (the first
+   string) of cpu functional units (the 2nd string).  Sometimes unit
+   reservations for different insns contain common parts.  In such
+   case, you can describe common part and use its name (the 1st
+   parameter) in regular expression in define_insn_reservation.  All
+   define_reservations, define_cpu_units, and define_query_cpu_units
+   should have unique names which may not be "nothing".  */
+DEF_RTL_EXPR(DEFINE_RESERVATION, "define_reservation", "ss", 'x')
+
+/* (define_insn_reservation name default_latency condition regexpr)
+   describes reservation of cpu functional units (the 3nd operand) for
+   instruction which is selected by the condition (the 2nd parameter).
+   The first parameter is used for output of debugging information.
+   The reservations are described by a regular expression according
+   the following syntax:
+
+       regexp = regexp "," oneof
+              | oneof
+
+       oneof = oneof "|" allof
+             | allof
+
+       allof = allof "+" repeat
+             | repeat
+ 
+       repeat = element "*" number
+              | element
+
+       element = cpu_function_unit_name
+               | reservation_name
+               | result_name
+               | "nothing"
+               | "(" regexp ")"
+
+       1. "," is used for describing start of the next cycle in
+       reservation.
+
+       2. "|" is used for describing the reservation described by the
+       first regular expression *or* the reservation described by the
+       second regular expression *or* etc.
+
+       3. "+" is used for describing the reservation described by the
+       first regular expression *and* the reservation described by the
+       second regular expression *and* etc.
+
+       4. "*" is used for convinience and simply means sequence in
+       which the regular expression are repeated NUMBER times with
+       cycle advancing (see ",").
+
+       5. cpu functional unit name which means its reservation.
+
+       6. reservation name -- see define_reservation.
+
+       7. string "nothing" means no units reservation.  */
+
+DEF_RTL_EXPR(DEFINE_INSN_RESERVATION, "define_insn_reservation", "sies", 'x')
+
 /* ----------------------------------------------------------------------
   Expressions used for insn attributes.  These also do not appear in
   actual rtl code in the compiler.
--- a/gcc/rtl.h
+++ b/gcc/rtl.h
@ -130,11 +130,9 @@ struct rtx_def
  /* 1 in an INSN if it can alter flow of control
     within this function.
     MEM_KEEP_ALIAS_SET_P in a MEM.
-     LINK_COST_ZERO in an INSN_LIST.
     SET_IS_RETURN_P in a SET.  */
  unsigned int jump : 1;
-  /* 1 in an INSN if it can call another function.
-     LINK_COST_FREE in an INSN_LIST.  */
+  /* 1 in an INSN if it can call another function.  */
  unsigned int call : 1;
  /* 1 in a REG if value of this expression will never change during
     the current function, even though it is not manifestly constant.
@ -983,16 +981,6 @@ do {						\
   with the preceding insn.  */
 #define SCHED_GROUP_P(INSN) ((INSN)->in_struct)

-/* During sched, for the LOG_LINKS of an insn, these cache the adjusted
-   cost of the dependence link.  The cost of executing an instruction
-   may vary based on how the results are used.  LINK_COST_ZERO is 1 when
-   the cost through the link varies and is unchanged (i.e., the link has
-   zero additional cost).  LINK_COST_FREE is 1 when the cost through the
-   link is zero (i.e., the link makes the cost free).  In other cases,
-   the adjustment to the cost is recomputed each time it is needed.  */
-#define LINK_COST_ZERO(X) ((X)->jump)
-#define LINK_COST_FREE(X) ((X)->call)
-
 /* For a SET rtx, SET_DEST is the place that is set
   and SET_SRC is the value it is set to.  */
 #define SET_DEST(RTX) XC2EXP(RTX, 0, SET, CLOBBER)
--- a/gcc/sched-int.h
+++ b/gcc/sched-int.h
@ -20,6 +20,9 @@ along with GCC; see the file COPYING.  If not, write to the Free
 Software Foundation, 59 Temple Place - Suite 330, Boston, MA
 02111-1307, USA.  */

+/* Pointer to data describing the current DFA state.  */
+extern state_t curr_state;
+
 /* Forward declaration.  */
 struct ready_list;

@ -184,7 +187,7 @@ struct haifa_insn_data
  int dep_count;

  /* An encoding of the blockage range function.  Both unit and range
-     are coded.  */
+     are coded.  This member is used only for old pipeline interface.  */
  unsigned int blockage;

  /* Number of instructions referring to this insn.  */
@ -196,7 +199,8 @@ struct haifa_insn_data

  short cost;

-  /* An encoding of the function units used.  */
+  /* An encoding of the function units used.  This member is used only
+     for old pipeline interface.  */
  short units;

  /* This weight is an estimation of the insn's contribution to
--- a/gcc/sched-rgn.c
+++ b/gcc/sched-rgn.c
@ -62,6 +62,7 @@ Software Foundation, 59 Temple Place - Suite 330, Boston, MA
 #include "recog.h"
 #include "cfglayout.h"
 #include "sched-int.h"
+#include "target.h"

 /* Define when we want to do count REG_DEAD notes before and after scheduling
   for sanity checking.  We can't do that when conditional execution is used,
@ -2057,7 +2058,14 @@ init_ready_list (ready)

 	    if (!CANT_MOVE (insn)
 		&& (!IS_SPECULATIVE_INSN (insn)
-		    || (insn_issue_delay (insn) <= 3
+		    || ((((!targetm.sched.use_dfa_pipeline_interface
+			   || !(*targetm.sched.use_dfa_pipeline_interface) ())
+			  && insn_issue_delay (insn) <= 3)
+			 || (targetm.sched.use_dfa_pipeline_interface
+			     && (*targetm.sched.use_dfa_pipeline_interface) ()
+			     && (recog_memoized (insn) < 0
+			         || min_insn_conflict_delay (curr_state,
+							     insn, insn) <= 3)))
 			&& check_live (insn, bb_src)
 			&& is_exception_free (insn, bb_src, target_bb))))
 	      {
@ -2165,7 +2173,15 @@ new_ready (next)
      && (!IS_VALID (INSN_BB (next))
 	  || CANT_MOVE (next)
 	  || (IS_SPECULATIVE_INSN (next)
-	      && (insn_issue_delay (next) > 3
+	      && (0
+		  || (targetm.sched.use_dfa_pipeline_interface
+		      && (*targetm.sched.use_dfa_pipeline_interface) ()
+		      && recog_memoized (next) >= 0
+		      && min_insn_conflict_delay (curr_state, next,
+						  next) > 3)
+		  || ((!targetm.sched.use_dfa_pipeline_interface
+		       || !(*targetm.sched.use_dfa_pipeline_interface) ())
+		      && insn_issue_delay (next) > 3)
 		  || !check_live (next, INSN_BB (next))
 		  || !is_exception_free (next, INSN_BB (next), target_bb)))))
    return 0;
@ -2589,14 +2605,27 @@ debug_dependencies ()
 	  fprintf (sched_dump, "\n;;   --- Region Dependences --- b %d bb %d \n",
 		   BB_TO_BLOCK (bb), bb);

-	  fprintf (sched_dump, ";;   %7s%6s%6s%6s%6s%6s%11s%6s\n",
-	  "insn", "code", "bb", "dep", "prio", "cost", "blockage", "units");
-	  fprintf (sched_dump, ";;   %7s%6s%6s%6s%6s%6s%11s%6s\n",
-	  "----", "----", "--", "---", "----", "----", "--------", "-----");
+	  if (targetm.sched.use_dfa_pipeline_interface
+	      && (*targetm.sched.use_dfa_pipeline_interface) ())
+	    {
+	      fprintf (sched_dump, ";;   %7s%6s%6s%6s%6s%6s%14s\n",
+		       "insn", "code", "bb", "dep", "prio", "cost",
+		       "reservation");
+	      fprintf (sched_dump, ";;   %7s%6s%6s%6s%6s%6s%14s\n",
+		       "----", "----", "--", "---", "----", "----",
+		       "-----------");
+	    }
+	  else
+	    {
+	      fprintf (sched_dump, ";;   %7s%6s%6s%6s%6s%6s%11s%6s\n",
+	      "insn", "code", "bb", "dep", "prio", "cost", "blockage", "units");
+	      fprintf (sched_dump, ";;   %7s%6s%6s%6s%6s%6s%11s%6s\n",
+	      "----", "----", "--", "---", "----", "----", "--------", "-----");
+	    }
+
 	  for (insn = head; insn != next_tail; insn = NEXT_INSN (insn))
 	    {
 	      rtx link;
-	      int unit, range;

 	      if (! INSN_P (insn))
 		{
@ -2616,22 +2645,46 @@ debug_dependencies ()
 		  continue;
 		}

-	      unit = insn_unit (insn);
-	      range = (unit < 0
-		 || function_units[unit].blockage_range_function == 0) ? 0 :
-		function_units[unit].blockage_range_function (insn);
-	      fprintf (sched_dump,
-		       ";;   %s%5d%6d%6d%6d%6d%6d  %3d -%3d   ",
-		       (SCHED_GROUP_P (insn) ? "+" : " "),
-		       INSN_UID (insn),
-		       INSN_CODE (insn),
-		       INSN_BB (insn),
-		       INSN_DEP_COUNT (insn),
-		       INSN_PRIORITY (insn),
-		       insn_cost (insn, 0, 0),
-		       (int) MIN_BLOCKAGE_COST (range),
-		       (int) MAX_BLOCKAGE_COST (range));
-	      insn_print_units (insn);
+	      if (targetm.sched.use_dfa_pipeline_interface
+		  && (*targetm.sched.use_dfa_pipeline_interface) ())
+		{
+		  fprintf (sched_dump,
+			   ";;   %s%5d%6d%6d%6d%6d%6d   ",
+			   (SCHED_GROUP_P (insn) ? "+" : " "),
+			   INSN_UID (insn),
+			   INSN_CODE (insn),
+			   INSN_BB (insn),
+			   INSN_DEP_COUNT (insn),
+			   INSN_PRIORITY (insn),
+			   insn_cost (insn, 0, 0));
+		  
+		  if (recog_memoized (insn) < 0)
+		    fprintf (sched_dump, "nothing");
+		  else
+		    print_reservation (sched_dump, insn);
+		}
+	      else
+		{
+		  int unit = insn_unit (insn);
+		  int range
+		    = (unit < 0
+		       || function_units[unit].blockage_range_function == 0
+		       ? 0
+		       : function_units[unit].blockage_range_function (insn));
+		  fprintf (sched_dump,
+			   ";;   %s%5d%6d%6d%6d%6d%6d  %3d -%3d   ",
+			   (SCHED_GROUP_P (insn) ? "+" : " "),
+			   INSN_UID (insn),
+			   INSN_CODE (insn),
+			   INSN_BB (insn),
+			   INSN_DEP_COUNT (insn),
+			   INSN_PRIORITY (insn),
+			   insn_cost (insn, 0, 0),
+			   (int) MIN_BLOCKAGE_COST (range),
+			   (int) MAX_BLOCKAGE_COST (range));
+		  insn_print_units (insn);
+		}
+
 	      fprintf (sched_dump, "\t: ");
 	      for (link = INSN_DEPEND (insn); link; link = XEXP (link, 1))
 		fprintf (sched_dump, "%d ", INSN_UID (XEXP (link, 0)));
--- a/gcc/sched-vis.c
+++ b/gcc/sched-vis.c
@ -31,6 +31,7 @@ Software Foundation, 59 Temple Place - Suite 330, Boston, MA
 #include "basic-block.h"
 #include "insn-attr.h"
 #include "sched-int.h"
+#include "target.h"

 #ifdef INSN_SCHEDULING
 /* target_units bitmask has 1 for each unit in the cpu.  It should be
@ -38,7 +39,8 @@ Software Foundation, 59 Temple Place - Suite 330, Boston, MA
   But currently it is computed by examining the insn list.  Since
   this is only needed for visualization, it seems an acceptable
   solution.  (For understanding the mapping of bits to units, see
-   definition of function_units[] in "insn-attrtab.c".)  */
+   definition of function_units[] in "insn-attrtab.c".)  The scheduler
+   using only DFA description should never use the following variable.  */

 static int target_units = 0;

@ -122,6 +124,14 @@ get_visual_tbl_length ()
  int n, n1;
  char *s;

+  if (targetm.sched.use_dfa_pipeline_interface
+      && (*targetm.sched.use_dfa_pipeline_interface) ())
+    {
+      visual_tbl_line_length = 1;
+      return 1; /* Can't return 0 because that will cause problems
+                   with alloca.  */
+    }
+
  /* Compute length of one field in line.  */
  s = (char *) alloca (INSN_LEN + 6);
  sprintf (s, "  %33s", "uname");
@ -815,7 +825,8 @@ print_insn (buf, x, verbose)
    }
 }				/* print_insn */

-/* Print visualization debugging info.  */
+/* Print visualization debugging info.  The scheduler using only DFA
+   description should never use the following function.  */

 void
 print_block_visualization (s)
--- a/gcc/target-def.h
+++ b/gcc/target-def.h
@ -145,16 +145,33 @@ Foundation, 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
 #define TARGET_SCHED_REORDER 0
 #define TARGET_SCHED_REORDER2 0
 #define TARGET_SCHED_CYCLE_DISPLAY 0
+#define TARGET_SCHED_USE_DFA_PIPELINE_INTERFACE 0
+#define TARGET_SCHED_INIT_DFA_PRE_CYCLE_INSN 0
+#define TARGET_SCHED_DFA_PRE_CYCLE_INSN 0
+#define TARGET_SCHED_INIT_DFA_POST_CYCLE_INSN 0
+#define TARGET_SCHED_DFA_POST_CYCLE_INSN 0
+#define TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD 0
+#define TARGET_SCHED_INIT_DFA_BUBBLES 0
+#define TARGET_SCHED_DFA_BUBBLE 0

-#define TARGET_SCHED	{TARGET_SCHED_ADJUST_COST,	\
-			 TARGET_SCHED_ADJUST_PRIORITY,	\
-			 TARGET_SCHED_ISSUE_RATE,	\
-			 TARGET_SCHED_VARIABLE_ISSUE,	\
-			 TARGET_SCHED_INIT,		\
-			 TARGET_SCHED_FINISH,		\
-			 TARGET_SCHED_REORDER,		\
-			 TARGET_SCHED_REORDER2,		\
-			 TARGET_SCHED_CYCLE_DISPLAY}
+#define TARGET_SCHED						\
+  {TARGET_SCHED_ADJUST_COST,					\
+   TARGET_SCHED_ADJUST_PRIORITY,				\
+   TARGET_SCHED_ISSUE_RATE,					\
+   TARGET_SCHED_VARIABLE_ISSUE,					\
+   TARGET_SCHED_INIT,						\
+   TARGET_SCHED_FINISH,						\
+   TARGET_SCHED_REORDER,					\
+   TARGET_SCHED_REORDER2,					\
+   TARGET_SCHED_CYCLE_DISPLAY,					\
+   TARGET_SCHED_USE_DFA_PIPELINE_INTERFACE,			\
+   TARGET_SCHED_INIT_DFA_PRE_CYCLE_INSN,			\
+   TARGET_SCHED_DFA_PRE_CYCLE_INSN,				\
+   TARGET_SCHED_INIT_DFA_POST_CYCLE_INSN,			\
+   TARGET_SCHED_DFA_POST_CYCLE_INSN,				\
+   TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD,		\
+   TARGET_SCHED_INIT_DFA_BUBBLES,				\
+   TARGET_SCHED_DFA_BUBBLE}

 /* All in tree.c.  */
 #define TARGET_MERGE_DECL_ATTRIBUTES merge_decl_attributes
--- a/gcc/target.h
+++ b/gcc/target.h
@ -136,6 +136,47 @@ struct gcc_target
       insn in the new chain we're building.  Returns a new LAST.
       The default is to do nothing.  */
    rtx (* cycle_display) PARAMS ((int clock, rtx last));
+    /* The following member value is a pointer to a function returning
+       nonzero if we should use DFA based scheduling.  The default is
+       to use the old pipeline scheduler.  */
+    int (* use_dfa_pipeline_interface) PARAMS ((void));
+    /* The values of all the following members are used only for the
+       DFA based scheduler: */
+    /* The values of the following four members are pointers to
+       functions used to simplify the automaton descriptions.
+       dfa_pre_cycle_insn and dfa_post_cycle_insn give functions
+       returning insns which are used to change the pipeline hazard
+       recognizer state when the new simulated processor cycle
+       correspondingly starts and finishes.  The function defined by
+       init_dfa_pre_cycle_insn and init_dfa_post_cycle_insn are used
+       to initialize the corresponding insns.  The default values of
+       the memebers result in not changing the automaton state when
+       the new simulated processor cycle correspondingly starts and
+       finishes.  */
+    void (* init_dfa_pre_cycle_insn) PARAMS ((void));
+    rtx (* dfa_pre_cycle_insn) PARAMS ((void));
+    void (* init_dfa_post_cycle_insn) PARAMS ((void));
+    rtx (* dfa_post_cycle_insn) PARAMS ((void));
+    /* The following member value is a pointer to a function returning value
+       which defines how many insns in queue `ready' will we try for
+       multi-pass scheduling.  if the member value is nonzero and the
+       function returns positive value, the DFA based scheduler will make
+       multi-pass scheduling for the first cycle.  In other words, we will
+       try to choose ready insn which permits to start maximum number of
+       insns on the same cycle.  */
+    int (* first_cycle_multipass_dfa_lookahead) PARAMS ((void));
+    /* The values of the following members are pointers to functions
+       used to improve the first cycle multipass scheduling by
+       inserting nop insns.  dfa_scheduler_bubble gives a function
+       returning a nop insn with given index.  The indexes start with
+       zero.  The function should return NULL if there are no more nop
+       insns with indexes greater than given index.  To initialize the
+       nop insn the function given by member
+       init_dfa_scheduler_bubbles is used.  The default values of the
+       members result in not inserting nop insns during the multipass
+       scheduling.  */
+    void (* init_dfa_bubbles) PARAMS ((void));
+    rtx (* dfa_bubble) PARAMS ((int));
  } sched;

  /* Given two decls, merge their attributes and return the result.  */