extend.texi: Add fvect-cost-model flag.

gcc/ChangeLog: 2007-06-08 Harsha Jagasia <harsha.jagasia@amd.com> Tony Linthicum <tony.linthicum@amd.com> * doc/extend.texi: Add fvect-cost-model flag. * common.opt (fvect-cost-model): New flag. * tree-vectorizer.c (new_stmt_vec_info): Initialize inside and outside cost fields in stmt_vec_info struct for STMT. * tree-vectorizer.h (stmt_vec_info): Define inside and outside cost fields in stmt_vec_info struct and access functions for the same. (TARG_COND_BRANCH_COST): Define cost of conditional branch. (TARG_VEC_STMT_COST): Define cost of any vector operation, excluding load, store and vector to scalar operation. (TARG_VEC_TO_SCALAR_COST): Define cost of vector to scalar operation. (TARG_VEC_LOAD_COST): Define cost of aligned vector load. (TARG_VEC_UNALIGNED_LOAD_COST): Define cost of misasligned vector load. (TARG_VEC_STORE_COST): Define cost of vector store. (vect_estimate_min_profitable_iters): Define new function. * tree-vect-analyze.c (vect_analyze_operations): Add a compile-time check to evaluate if loop iterations are less than minimum profitable iterations determined by cost model or minimum vect loop bound defined by user, whichever is more conservative. * tree-vect-transform.c (vect_do_peeling_for_loop_bound): Add a run-time check to evaluate if loop iterations are less than minimum profitable iterations determined by cost model or minimum vect loop bound defined by user, whichever is more conservative. (vect_estimate_min_profitable_iterations): New function to estimate mimimimum iterartions required for vector version of loop to be profitable over scalar version. (vect_model_reduction_cost): New function. (vect_model_induction_cost): New function. (vect_model_simple_cost): New function. (vect_cost_strided_group_size): New function. (vect_model_store_cost): New function. (vect_model_load_cost): New function. (vectorizable_reduction): Call vect_model_reduction_cost during analysis phase. (vectorizable_induction): Call vect_model_induction_cost during analysis phase. (vectorizable_load): Call vect_model_load_cost during analysis phase. (vectorizable_store): Call vect_model_store_cost during analysis phase. (vectorizable_call, vectorizable_assignment, vectorizable_operation, vectorizable_promotion, vectorizable_demotion): Call vect_model_simple_cost during analysis phase. gcc/testsuite/ChangeLog: 2007-06-08 Harsha Jagasia <harsha.jagasia@amd.com> * gcc.dg/vect/costmodel: New directory. * gcc.dg/vect/costmodel/i386: New directory. * gcc.dg/vect/costmodel/i386/i386-costmodel-vect.exp: New testsuite. * gcc.dg/vect/costmodel/i386/costmodel-fast-math-vect-pr29925.c: New test. * gcc.dg/vect/costmodel/i386/costmodel-vect-31.c: New test. * gcc.dg/vect/costmodel/i386/costmodel-vect-33.c: New test. * gcc.dg/vect/costmodel/i386/costmodel-vect-68.c: New test. * gcc.dg/vect/costmodel/i386/costmodel-vect-reduc-1char.c: New test. * gcc.dg/vect/costmodel/x86_64: New directory. * gcc.dg/vect/costmodel/x86_64/x86_64-costmodel-vect.exp: New testsuite. * gcc.dg/vect/costmodel/x86_64/costmodel-fast-math-vect-pr29925.c: New test. * gcc.dg/vect/costmodel/x86_64/costmodel-vect-31.c: New test. * gcc.dg/vect/costmodel/x86_64/costmodel-vect-33.c: New test. * gcc.dg/vect/costmodel/x86_64/costmodel-vect-68.c: New test. * gcc.dg/vect/costmodel/x86_64/costmodel-vect-reduc-1char.c: New test. * gcc.dg/vect/costmodel/x86_64/costmodel-pr30843.c: New test. Co-Authored-By: Tony Linthicum <tony.linthicum@amd.com> From-SVN: r125575
2007-06-08 16:30:49 +00:00 · 2007-06-08 16:30:49 +00:00 · 792ed98bb7
commit 792ed98bb7
parent c8e2516ccf
21 changed files with 1486 additions and 21 deletions
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@ -1,3 +1,47 @@
+2007-06-08  Harsha Jagasia <harsha.jagasia@amd.com>
+            Tony Linthicum <tony.linthicum@amd.com>
+
+	* doc/extend.texi: Add fvect-cost-model flag.
+	* common.opt (fvect-cost-model): New flag.
+	* tree-vectorizer.c (new_stmt_vec_info): Initialize inside and outside
+	cost fields in stmt_vec_info struct for STMT.
+	* tree-vectorizer.h (stmt_vec_info): Define inside and outside cost
+	fields in stmt_vec_info struct and access functions for the same.
+	(TARG_COND_BRANCH_COST): Define cost of conditional branch.
+	(TARG_VEC_STMT_COST): Define cost of any vector operation, excluding
+	load, store and vector to scalar operation.
+	(TARG_VEC_TO_SCALAR_COST): Define cost of vector to scalar operation.
+	(TARG_VEC_LOAD_COST): Define cost of aligned vector load.
+	(TARG_VEC_UNALIGNED_LOAD_COST): Define cost of misasligned vector load.
+	(TARG_VEC_STORE_COST): Define cost of vector store.
+	(vect_estimate_min_profitable_iters): Define new function.
+	* tree-vect-analyze.c (vect_analyze_operations): Add a compile-time
+	check to evaluate if loop iterations are less than minimum profitable
+	iterations determined by cost model or minimum vect loop bound defined
+	by user, whichever is more conservative.
+	* tree-vect-transform.c (vect_do_peeling_for_loop_bound): Add a
+	run-time check to evaluate if loop iterations are less than minimum
+	profitable iterations determined by cost model or minimum vect loop
+	bound defined by user, whichever is more conservative.
+	(vect_estimate_min_profitable_iterations): New function to estimate
+	mimimimum iterartions required for vector version of loop to be
+	profitable over scalar version.
+        (vect_model_reduction_cost): New function.
+	(vect_model_induction_cost): New function.
+	(vect_model_simple_cost): New function.
+	(vect_cost_strided_group_size): New function.
+	(vect_model_store_cost): New function.
+	(vect_model_load_cost): New function.
+	(vectorizable_reduction): Call vect_model_reduction_cost during
+	analysis phase.
+	(vectorizable_induction): Call vect_model_induction_cost during
+	analysis phase.
+	(vectorizable_load): Call vect_model_load_cost during analysis phase.
+	(vectorizable_store): Call vect_model_store_cost during analysis phase.
+	(vectorizable_call, vectorizable_assignment, vectorizable_operation,
+	vectorizable_promotion, vectorizable_demotion): Call 
+	vect_model_simple_cost during analysis phase.
+
 2007-06-08  Simon Baldwin  <simonb@google.com>

 	* reg-stack.c (get_true_reg): Readability change.  Moved default case
--- a/gcc/common.opt
+++ b/gcc/common.opt
@ -1110,6 +1110,10 @@ ftree-vectorize
 Common Report Var(flag_tree_vectorize) Optimization
 Enable loop vectorization on trees

+fvect-cost-model
+Common Report Var(flag_vect_cost_model) Optimization
+Enable use of cost model in vectorization
+
 ftree-vect-loop-version
 Common Report Var(flag_tree_vect_loop_version) Init(1) Optimization
 Enable loop versioning when doing loop vectorization on trees
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@ -357,7 +357,7 @@ Objective-C and Objective-C++ Dialects}.
 -fcheck-data-deps @gol
 -ftree-dominator-opts -ftree-dse -ftree-copyrename -ftree-sink @gol
 -ftree-ch -ftree-sra -ftree-ter -ftree-fre -ftree-vectorize @gol
-ftree-vect-loop-version -ftree-salias -fipa-pta -fweb @gol
+-ftree-vect-loop-version -fvect-cost-model -ftree-salias -fipa-pta -fweb @gol
 -ftree-copy-prop -ftree-store-ccp -ftree-store-copy-prop -fwhole-program @gol
 --param @var{name}=@var{value}
 -O  -O0  -O1  -O2  -O3  -Os}
@ -5666,6 +5666,9 @@ the loop are generated along with runtime checks for alignment or dependence
 to control which version is executed.  This option is enabled by default
 except at level @option{-Os} where it is disabled.

+@item -fvect-cost-model
+Enable cost model for vectorization.
+
@item -ftree-vrp
 Perform Value Range Propagation on trees.  This is similar to the
 constant propagation pass, but instead of values, ranges of values are
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@ -1,3 +1,25 @@
+2007-06-08  Harsha Jagasia <harsha.jagasia@amd.com>
+
+	* gcc.dg/vect/costmodel: New directory.
+	* gcc.dg/vect/costmodel/i386: New directory.
+	* gcc.dg/vect/costmodel/i386/i386-costmodel-vect.exp: New testsuite.
+	* gcc.dg/vect/costmodel/i386/costmodel-fast-math-vect-pr29925.c:
+	New test.
+	* gcc.dg/vect/costmodel/i386/costmodel-vect-31.c: New test.
+	* gcc.dg/vect/costmodel/i386/costmodel-vect-33.c: New test.
+	* gcc.dg/vect/costmodel/i386/costmodel-vect-68.c: New test.
+	* gcc.dg/vect/costmodel/i386/costmodel-vect-reduc-1char.c: New test.
+	* gcc.dg/vect/costmodel/x86_64: New directory.
+	* gcc.dg/vect/costmodel/x86_64/x86_64-costmodel-vect.exp:
+	New testsuite.	
+	* gcc.dg/vect/costmodel/x86_64/costmodel-fast-math-vect-pr29925.c:
+	New test.
+	* gcc.dg/vect/costmodel/x86_64/costmodel-vect-31.c: New test.
+	* gcc.dg/vect/costmodel/x86_64/costmodel-vect-33.c: New test.
+	* gcc.dg/vect/costmodel/x86_64/costmodel-vect-68.c: New test.
+	* gcc.dg/vect/costmodel/x86_64/costmodel-vect-reduc-1char.c: New test.
+	* gcc.dg/vect/costmodel/x86_64/costmodel-pr30843.c: New test.
+
 2007-06-08  Uros Bizjak  <ubizjak@gmail.com>

 	PR tree-optimization/32243
--- a/gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-fast-math-vect-pr29925.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-fast-math-vect-pr29925.c
@ -0,0 +1,39 @@
+/* { dg-require-effective-target vect_float } */
+
+#include <stdlib.h>
+#include "../../tree-vect.h"
+
+void interp_pitch(float *exc, float *interp, int pitch, int len)
+{
+   int i,k;
+   int maxj;
+
+   maxj=3;
+   for (i=0;i<len;i++)
+   {
+      float tmp = 0;
+      for (k=0;k<7;k++)
+      {
+         tmp += exc[i-pitch+k+maxj-6];
+      }
+      interp[i] = tmp;
+   }
+}
+
+int main()
+{
+   float *exc = calloc(126,sizeof(float));
+   float *interp = calloc(80,sizeof(float));
+   int pitch = -35;
+
+   check_vect ();
+
+   interp_pitch(exc, interp, pitch, 80);
+   free(exc);
+   free(interp);
+   return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorization not profitable" 1 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+
--- a/gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-31.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-31.c
@ -0,0 +1,91 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "../../tree-vect.h"
+
+#define N 32
+
+struct t{
+  int k[N];
+  int l; 
+};
+  
+struct s{
+  char a;	/* aligned */
+  char b[N-1];  /* unaligned (offset 1B) */
+  char c[N];    /* aligned (offset NB) */
+  struct t d;   /* aligned (offset 2NB) */
+  struct t e;   /* unaligned (offset 2N+4N+4 B) */
+};
+ 
+int main1 ()
+{  
+  int i;
+  struct s tmp;
+
+  /* unaligned */
+  for (i = 0; i < N/2; i++)
+    {
+      tmp.b[i] = 5;
+    }
+
+  /* check results:  */
+  for (i = 0; i <N/2; i++)
+    {
+      if (tmp.b[i] != 5)
+        abort ();
+    }
+
+  /* aligned */
+  for (i = 0; i < N/2; i++)
+    {
+      tmp.c[i] = 6;
+    }
+
+  /* check results:  */
+  for (i = 0; i <N/2; i++)
+    {
+      if (tmp.c[i] != 6)
+        abort ();
+    }
+
+  /* aligned */
+  for (i = 0; i < N/2; i++)
+    {
+      tmp.d.k[i] = 7;
+    }
+
+  /* check results:  */
+  for (i = 0; i <N/2; i++)
+    {
+      if (tmp.d.k[i] != 7)
+        abort ();
+    }
+
+  /* unaligned */
+  for (i = 0; i < N/2; i++)
+    {
+      tmp.e.k[i] = 8;
+    }
+
+  /* check results:  */
+  for (i = 0; i <N/2; i++)
+    {
+      if (tmp.e.k[i] != 8)
+        abort ();
+    }
+
+  return 0;
+}
+
+int main (void)
+{ 
+  check_vect ();
+  
+  return main1 ();
+} 
+
+/* { dg-final { scan-tree-dump-times "vectorization not profitable" 2 "vect" } }
+ */
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
--- a/gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-33.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-33.c
@ -0,0 +1,39 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "../../tree-vect.h"
+
+#define N 16
+struct test {
+  char ca[N];
+};
+
+extern struct test s;
+ 
+int main1 ()
+{  
+  int i;
+
+  for (i = 0; i < N; i++)
+    {
+      s.ca[i] = 5;
+    }
+
+  /* check results:  */
+  for (i = 0; i < N; i++)
+    {
+      if (s.ca[i] != 5)
+        abort ();
+    }
+
+  return 0;
+}
+
+int main (void)
+{ 
+  return main1 ();
+} 
+
+/* { dg-final { scan-tree-dump-times "vectorization not profitable" 1 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
--- a/gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-68.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-68.c
@ -0,0 +1,89 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "../../tree-vect.h"
+
+#define N 32
+
+struct s{
+  int m;
+  int n[N][N][N];
+};
+
+struct test1{
+  struct s a; /* array a.n is unaligned */
+  int b;
+  int c;
+  struct s e; /* array e.n is aligned */
+};
+
+int main1 ()
+{  
+  int i,j;
+  struct test1 tmp1;
+
+  /* 1. unaligned */
+  for (i = 0; i < N; i++)
+    {
+      tmp1.a.n[1][2][i] = 5;
+    }
+
+  /* check results:  */
+  for (i = 0; i <N; i++)
+    {
+      if (tmp1.a.n[1][2][i] != 5)
+        abort ();
+    }
+
+  /* 2. aligned */
+  for (i = 3; i < N-1; i++)
+    {
+      tmp1.a.n[1][2][i] = 6;
+    }
+
+  /* check results:  */
+  for (i = 3; i < N-1; i++)
+    {
+      if (tmp1.a.n[1][2][i] != 6)
+        abort ();
+    }
+
+  /* 3. aligned */
+  for (i = 0; i < N; i++)
+    {
+      tmp1.e.n[1][2][i] = 7;
+    }
+
+  /* check results:  */
+  for (i = 0; i < N; i++)
+    {
+      if (tmp1.e.n[1][2][i] != 7)
+        abort ();
+    }
+
+  /* 4. unaligned */
+  for (i = 3; i < N-3; i++)
+    {
+      tmp1.e.n[1][2][i] = 8;
+    }
+ 
+  /* check results:  */
+  for (i = 3; i <N-3; i++)
+    {
+      if (tmp1.e.n[1][2][i] != 8)
+        abort ();
+    }
+
+  return 0;
+}
+
+int main (void)
+{ 
+  check_vect ();
+  
+  return main1 ();
+} 
+
+/* { dg-final { scan-tree-dump-times "vectorization not profitable" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
--- a/gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-reduc-1char.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-reduc-1char.c
@ -0,0 +1,51 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "../../tree-vect.h"
+
+#define N 16
+#define DIFF 242
+
+void
+main1 (unsigned char x, unsigned char max_result, unsigned char min_result)
+{
+  int i;
+  unsigned char ub[N] = {1,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45};
+  unsigned char uc[N] = {1,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
+  unsigned char udiff = 2;
+  unsigned char umax = x;
+  unsigned char umin = x;
+
+  for (i = 0; i < N; i++) {
+    udiff += (unsigned char)(ub[i] - uc[i]);
+  }
+
+  for (i = 0; i < N; i++) {
+    umax = umax < uc[i] ? uc[i] : umax;
+  }
+
+  for (i = 0; i < N; i++) {
+    umin = umin > uc[i] ? uc[i] : umin;
+  }
+
+  /* check results:  */
+  if (udiff != DIFF)
+    abort ();
+  if (umax != max_result)
+    abort ();
+  if (umin != min_result)
+    abort ();
+}
+
+int main (void)
+{ 
+  check_vect ();
+  
+  main1 (100, 100, 1);
+  main1 (0, 15, 0);
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail vect_no_int_max } } } */
+/* { dg-final { scan-tree-dump-times "vectorization not profitable" 2 "vect" { xfail vect_no_int_max } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
--- a/gcc/testsuite/gcc.dg/vect/costmodel/i386/i386-costmodel-vect.exp
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/i386/i386-costmodel-vect.exp
@ -0,0 +1,67 @@
+# Copyright (C) 1997, 2004, 2005, 2006 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+# 
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+# 
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.  
+
+# GCC testsuite that uses the `dg.exp' driver.
+
+# Load support procs.
+load_lib gcc-dg.exp
+
+# Exit immediately if this isn't a x86 target.
+if { ![istarget i?86*-*-*] && ![istarget x86_64-*-*] } then {
+  return
+}
+
+# Set up flags used for tests that don't specify options.
+set DEFAULT_VECTCFLAGS ""
+
+# These flags are used for all targets.
+lappend DEFAULT_VECTCFLAGS "-O2" "-ftree-vectorize" "-fvect-cost-model"
+
+# If the target system supports vector instructions, the default action
+# for a test is 'run', otherwise it's 'compile'.  Save current default.
+# Executing vector instructions on a system without hardware vector support
+# is also disabled by a call to check_vect, but disabling execution here is
+# more efficient.
+global dg-do-what-default
+set save-dg-do-what-default ${dg-do-what-default}
+
+lappend DEFAULT_VECTCFLAGS "-msse2"
+set dg-do-what-default run
+
+# Initialize `dg'.
+dg-init
+
+lappend DEFAULT_VECTCFLAGS "-fdump-tree-vect-details"
+
+# Main loop.
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/costmodel-vect-*.\[cS\]]]  \
+	"" $DEFAULT_VECTCFLAGS
+
+#### Tests with special options
+global SAVED_DEFAULT_VECTCFLAGS
+set SAVED_DEFAULT_VECTCFLAGS $DEFAULT_VECTCFLAGS
+
+# -ffast-math tests
+set DEFAULT_VECTCFLAGS $SAVED_DEFAULT_VECTCFLAGS
+lappend DEFAULT_VECTCFLAGS "-ffast-math"
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/costmodel-fast-math-vect*.\[cS\]]]  \
+	"" $DEFAULT_VECTCFLAGS
+
+# Clean up.
+set dg-do-what-default ${save-dg-do-what-default}
+
+# All done.
+dg-finish
--- a/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-fast-math-vect-pr29925.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-fast-math-vect-pr29925.c
@ -0,0 +1,39 @@
+/* { dg-require-effective-target vect_float } */
+
+#include <stdlib.h>
+#include "../../tree-vect.h"
+
+void interp_pitch(float *exc, float *interp, int pitch, int len)
+{
+   int i,k;
+   int maxj;
+
+   maxj=3;
+   for (i=0;i<len;i++)
+   {
+      float tmp = 0;
+      for (k=0;k<7;k++)
+      {
+         tmp += exc[i-pitch+k+maxj-6];
+      }
+      interp[i] = tmp;
+   }
+}
+
+int main()
+{
+   float *exc = calloc(126,sizeof(float));
+   float *interp = calloc(80,sizeof(float));
+   int pitch = -35;
+
+   check_vect ();
+
+   interp_pitch(exc, interp, pitch, 80);
+   free(exc);
+   free(interp);
+   return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorization not profitable" 1 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+
--- a/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr30843.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr30843.c
@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_long } */
+
+#include <stdarg.h>
+#include "../../tree-vect.h"
+
+#define N 16
+
+void dacP98FillRGBMap (unsigned char *pBuffer)
+{
+    unsigned long dw, dw1;
+    unsigned long *pdw = (unsigned long *)(pBuffer);
+
+    for( dw = 256, dw1 = 0; dw; dw--, dw1 += 0x01010101) 
+    {
+       *pdw++ = dw1;
+       *pdw++ = dw1;
+       *pdw++ = dw1;
+       *pdw++ = dw1;
+    }
+}
+
+/* { dg-final { scan-tree-dump-times "vectorization not profitable" 1 "vect" { target vect_interleave
+} } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+
--- a/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-31.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-31.c
@ -0,0 +1,91 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "../../tree-vect.h"
+
+#define N 32
+
+struct t{
+  int k[N];
+  int l; 
+};
+  
+struct s{
+  char a;	/* aligned */
+  char b[N-1];  /* unaligned (offset 1B) */
+  char c[N];    /* aligned (offset NB) */
+  struct t d;   /* aligned (offset 2NB) */
+  struct t e;   /* unaligned (offset 2N+4N+4 B) */
+};
+ 
+int main1 ()
+{  
+  int i;
+  struct s tmp;
+
+  /* unaligned */
+  for (i = 0; i < N/2; i++)
+    {
+      tmp.b[i] = 5;
+    }
+
+  /* check results:  */
+  for (i = 0; i <N/2; i++)
+    {
+      if (tmp.b[i] != 5)
+        abort ();
+    }
+
+  /* aligned */
+  for (i = 0; i < N/2; i++)
+    {
+      tmp.c[i] = 6;
+    }
+
+  /* check results:  */
+  for (i = 0; i <N/2; i++)
+    {
+      if (tmp.c[i] != 6)
+        abort ();
+    }
+
+  /* aligned */
+  for (i = 0; i < N/2; i++)
+    {
+      tmp.d.k[i] = 7;
+    }
+
+  /* check results:  */
+  for (i = 0; i <N/2; i++)
+    {
+      if (tmp.d.k[i] != 7)
+        abort ();
+    }
+
+  /* unaligned */
+  for (i = 0; i < N/2; i++)
+    {
+      tmp.e.k[i] = 8;
+    }
+
+  /* check results:  */
+  for (i = 0; i <N/2; i++)
+    {
+      if (tmp.e.k[i] != 8)
+        abort ();
+    }
+
+  return 0;
+}
+
+int main (void)
+{ 
+  check_vect ();
+  
+  return main1 ();
+} 
+
+/* { dg-final { scan-tree-dump-times "vectorization not profitable" 2 "vect" } }
+ */
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
--- a/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-33.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-33.c
@ -0,0 +1,39 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "../../tree-vect.h"
+
+#define N 16
+struct test {
+  char ca[N];
+};
+
+extern struct test s;
+ 
+int main1 ()
+{  
+  int i;
+
+  for (i = 0; i < N; i++)
+    {
+      s.ca[i] = 5;
+    }
+
+  /* check results:  */
+  for (i = 0; i < N; i++)
+    {
+      if (s.ca[i] != 5)
+        abort ();
+    }
+
+  return 0;
+}
+
+int main (void)
+{ 
+  return main1 ();
+} 
+
+/* { dg-final { scan-tree-dump-times "vectorization not profitable" 1 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
--- a/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-68.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-68.c
@ -0,0 +1,89 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "../../tree-vect.h"
+
+#define N 32
+
+struct s{
+  int m;
+  int n[N][N][N];
+};
+
+struct test1{
+  struct s a; /* array a.n is unaligned */
+  int b;
+  int c;
+  struct s e; /* array e.n is aligned */
+};
+
+int main1 ()
+{  
+  int i,j;
+  struct test1 tmp1;
+
+  /* 1. unaligned */
+  for (i = 0; i < N; i++)
+    {
+      tmp1.a.n[1][2][i] = 5;
+    }
+
+  /* check results:  */
+  for (i = 0; i <N; i++)
+    {
+      if (tmp1.a.n[1][2][i] != 5)
+        abort ();
+    }
+
+  /* 2. aligned */
+  for (i = 3; i < N-1; i++)
+    {
+      tmp1.a.n[1][2][i] = 6;
+    }
+
+  /* check results:  */
+  for (i = 3; i < N-1; i++)
+    {
+      if (tmp1.a.n[1][2][i] != 6)
+        abort ();
+    }
+
+  /* 3. aligned */
+  for (i = 0; i < N; i++)
+    {
+      tmp1.e.n[1][2][i] = 7;
+    }
+
+  /* check results:  */
+  for (i = 0; i < N; i++)
+    {
+      if (tmp1.e.n[1][2][i] != 7)
+        abort ();
+    }
+
+  /* 4. unaligned */
+  for (i = 3; i < N-3; i++)
+    {
+      tmp1.e.n[1][2][i] = 8;
+    }
+ 
+  /* check results:  */
+  for (i = 3; i <N-3; i++)
+    {
+      if (tmp1.e.n[1][2][i] != 8)
+        abort ();
+    }
+
+  return 0;
+}
+
+int main (void)
+{ 
+  check_vect ();
+  
+  return main1 ();
+} 
+
+/* { dg-final { scan-tree-dump-times "vectorization not profitable" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
--- a/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-reduc-1char.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-reduc-1char.c
@ -0,0 +1,51 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "../../tree-vect.h"
+
+#define N 16
+#define DIFF 242
+
+void
+main1 (unsigned char x, unsigned char max_result, unsigned char min_result)
+{
+  int i;
+  unsigned char ub[N] = {1,3,6,9,12,15,18,21,24,27,30,33,36,39,42,45};
+  unsigned char uc[N] = {1,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
+  unsigned char udiff = 2;
+  unsigned char umax = x;
+  unsigned char umin = x;
+
+  for (i = 0; i < N; i++) {
+    udiff += (unsigned char)(ub[i] - uc[i]);
+  }
+
+  for (i = 0; i < N; i++) {
+    umax = umax < uc[i] ? uc[i] : umax;
+  }
+
+  for (i = 0; i < N; i++) {
+    umin = umin > uc[i] ? uc[i] : umin;
+  }
+
+  /* check results:  */
+  if (udiff != DIFF)
+    abort ();
+  if (umax != max_result)
+    abort ();
+  if (umin != min_result)
+    abort ();
+}
+
+int main (void)
+{ 
+  check_vect ();
+  
+  main1 (100, 100, 1);
+  main1 (0, 15, 0);
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail vect_no_int_max } } } */
+/* { dg-final { scan-tree-dump-times "vectorization not profitable" 2 "vect" { xfail vect_no_int_max } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
--- a/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/x86_64-costmodel-vect.exp
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/x86_64-costmodel-vect.exp
@ -0,0 +1,70 @@
+# Copyright (C) 1997, 2004, 2005, 2006 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+# 
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+# 
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.  
+
+# GCC testsuite that uses the `dg.exp' driver.
+
+# Load support procs.
+load_lib gcc-dg.exp
+
+# Exit immediately if this isn't a x86 target.
+if { (![istarget x86_64-*-*] && ![istarget i?86-*-*])
+     || ![is-effective-target lp64] } then {
+  return
+}
+
+# Set up flags used for tests that don't specify options.
+set DEFAULT_VECTCFLAGS ""
+
+# These flags are used for all targets.
+lappend DEFAULT_VECTCFLAGS "-O2" "-ftree-vectorize" "-fvect-cost-model"
+
+# If the target system supports vector instructions, the default action
+# for a test is 'run', otherwise it's 'compile'.  Save current default.
+# Executing vector instructions on a system without hardware vector support
+# is also disabled by a call to check_vect, but disabling execution here is
+# more efficient.
+global dg-do-what-default
+set save-dg-do-what-default ${dg-do-what-default}
+
+lappend DEFAULT_VECTCFLAGS "-msse2"
+set dg-do-what-default run
+
+# Initialize `dg'.
+dg-init
+
+lappend DEFAULT_VECTCFLAGS "-fdump-tree-vect-details"
+
+# Main loop.
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/costmodel-pr*.\[cS\]]]  \
+	"" $DEFAULT_VECTCFLAGS
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/costmodel-vect-*.\[cS\]]]  \
+	"" $DEFAULT_VECTCFLAGS
+
+#### Tests with special options
+global SAVED_DEFAULT_VECTCFLAGS
+set SAVED_DEFAULT_VECTCFLAGS $DEFAULT_VECTCFLAGS
+
+# -ffast-math tests
+set DEFAULT_VECTCFLAGS $SAVED_DEFAULT_VECTCFLAGS
+lappend DEFAULT_VECTCFLAGS "-ffast-math"
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/costmodel-fast-math-vect*.\[cS\]]]  \
+	"" $DEFAULT_VECTCFLAGS
+
+# Clean up.
+set dg-do-what-default ${save-dg-do-what-default}
+
+# All done.
+dg-finish
--- a/gcc/tree-vect-analyze.c
+++ b/gcc/tree-vect-analyze.c
@ -1,5 +1,5 @@
 /* Analysis Utilities for Loop Vectorization.
-   Copyright (C) 2003,2004,2005,2006 Free Software Foundation, Inc.
+   Copyright (C) 2003, 2004, 2005, 2006, 2007 Free Software Foundation, Inc.
   Contributed by Dorit Naishlos <dorit@il.ibm.com>

 This file is part of GCC.
@ -300,6 +300,9 @@ vect_analyze_operations (loop_vec_info loop_vinfo)
  tree phi;
  stmt_vec_info stmt_info;
  bool need_to_vectorize = false;
+  int min_profitable_iters;
+  int min_scalar_loop_bound;
+  unsigned int th;

  if (vect_print_dump_info (REPORT_DETAILS))
    fprintf (vect_dump, "=== vect_analyze_operations ===");
@ -443,8 +446,6 @@ vect_analyze_operations (loop_vec_info loop_vinfo)
 	} /* stmts in bb */
    } /* bbs */

-  /* TODO: Analyze cost. Decide if worth while to vectorize.  */
-
  /* All operations in the loop are either irrelevant (deal with loop
     control, or dead), or only used outside the loop and can be moved
     out of the loop (e.g. invariants, inductions).  The loop can be 
@ -468,16 +469,55 @@ vect_analyze_operations (loop_vec_info loop_vinfo)
        vectorization_factor, LOOP_VINFO_INT_NITERS (loop_vinfo));

  if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
-      && ((LOOP_VINFO_INT_NITERS (loop_vinfo) < vectorization_factor)
-	  || (LOOP_VINFO_INT_NITERS (loop_vinfo) <=
-		((unsigned) (PARAM_VALUE (PARAM_MIN_VECT_LOOP_BOUND)) 
-					   * vectorization_factor))))
+      && (LOOP_VINFO_INT_NITERS (loop_vinfo) < vectorization_factor))
    {
      if (vect_print_dump_info (REPORT_UNVECTORIZED_LOOPS))
-	fprintf (vect_dump, "not vectorized: iteration count too small.");
+        fprintf (vect_dump, "not vectorized: iteration count too small.");
+      if (vect_print_dump_info (REPORT_DETAILS))
+        fprintf (vect_dump,"not vectorized: iteration count smaller than "
+                 "vectorization factor.");
      return false;
    }

+  /* Analyze cost. Decide if worth while to vectorize.  */
+
+  min_profitable_iters = vect_estimate_min_profitable_iters (loop_vinfo);
+
+  if (min_profitable_iters < 0)
+    {
+      if (vect_print_dump_info (REPORT_UNVECTORIZED_LOOPS))
+        fprintf (vect_dump, "not vectorized: vectorization not profitable.");
+      if (vect_print_dump_info (REPORT_DETAILS))
+        fprintf (vect_dump, "not vectorized: vector version will never be "
+                 "profitable.");
+      return false;
+    }
+
+  min_scalar_loop_bound = (PARAM_VALUE (PARAM_MIN_VECT_LOOP_BOUND))
+                          * vectorization_factor;
+
+  /* Use the cost model only if it is more conservative than user specified
+     threshold.  */
+
+  th = (unsigned) min_scalar_loop_bound;
+  if (min_profitable_iters 
+      && (!min_scalar_loop_bound
+          || min_profitable_iters > min_scalar_loop_bound))
+    th = (unsigned) min_profitable_iters;
+
+  if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
+      && LOOP_VINFO_INT_NITERS (loop_vinfo) < th)
+    {
+      if (vect_print_dump_info (REPORT_UNVECTORIZED_LOOPS))	      
+        fprintf (vect_dump, "not vectorized: vectorization not "
+                 "profitable.");
+      if (vect_print_dump_info (REPORT_DETAILS))	      
+        fprintf (vect_dump, "not vectorized: iteration count smaller than "
+                 "user specified loop bound parameter or minimum "
+                 "profitable iterations (whichever is more conservative).");
+      return false;
+    }  
+
  if (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
      || LOOP_VINFO_INT_NITERS (loop_vinfo) % vectorization_factor != 0
      || LOOP_PEELING_FOR_ALIGNMENT (loop_vinfo))
--- a/gcc/tree-vect-transform.c
+++ b/gcc/tree-vect-transform.c
@ -74,6 +74,490 @@ static void vect_update_inits_of_drs (loop_vec_info, tree);
 static int vect_min_worthwhile_factor (enum tree_code);


+/* Function vect_estimate_min_profitable_iters
+
+   Return the number of iterations required for the vector version of the
+   loop to be profitable relative to the cost of the scalar version of the
+   loop.
+
+   TODO: Take profile info into account before making vectorization
+   decisions, if available.  */
+
+int
+vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo)
+{
+  int i;
+  int min_profitable_iters;
+  int peel_iters_prologue;
+  int peel_iters_epilogue;
+  int vec_inside_cost = 0;
+  int vec_outside_cost = 0;
+  int scalar_single_iter_cost = 0;
+  int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
+  struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+  basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
+  int nbbs = loop->num_nodes;
+
+  /* Cost model disabled.  */
+  if (!flag_vect_cost_model)
+    {
+      if (vect_print_dump_info (REPORT_DETAILS))
+        fprintf (vect_dump, "cost model disabled.");      
+      return 0;
+    }
+
+  /* Requires loop versioning tests to handle misalignment.
+     FIXME: Make cost depend on number of stmts in may_misalign list.  */
+
+  if (LOOP_VINFO_MAY_MISALIGN_STMTS (loop_vinfo))
+    {
+      vec_outside_cost += TARG_COND_BRANCH_COST;
+      if (vect_print_dump_info (REPORT_DETAILS))
+        fprintf (vect_dump, "cost model: Adding cost of checks for loop "
+                 "versioning.\n");
+    }
+
+  /* Requires a prologue loop when peeling to handle misalignment. Add cost of
+     two guards, one for the peeled loop and one for the vector loop.  */
+
+  peel_iters_prologue = LOOP_PEELING_FOR_ALIGNMENT (loop_vinfo);
+  if (peel_iters_prologue)
+    {
+      vec_outside_cost += 2 * TARG_COND_BRANCH_COST;
+      if (vect_print_dump_info (REPORT_DETAILS))
+        fprintf (vect_dump, "cost model: Adding cost of checks for "
+                 "prologue.\n");
+    }
+
+ /* Requires an epilogue loop to finish up remaining iterations after vector
+    loop. Add cost of two guards, one for the peeled loop and one for the
+    vector loop.  */
+
+  if ((peel_iters_prologue < 0)
+      || !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
+      || LOOP_VINFO_INT_NITERS (loop_vinfo) % vf)
+    {
+      vec_outside_cost += 2 * TARG_COND_BRANCH_COST;
+      if (vect_print_dump_info (REPORT_DETAILS))
+        fprintf (vect_dump, "cost model : Adding cost of checks for "
+                 "epilogue.\n");
+    }
+
+  /* Count statements in scalar loop.  Using this as scalar cost for a single
+     iteration for now.
+
+     TODO: Add outer loop support.
+
+     TODO: Consider assigning different costs to different scalar
+     statements.  */
+
+  for (i = 0; i < nbbs; i++)
+    {
+      block_stmt_iterator si;
+      basic_block bb = bbs[i];
+
+      for (si = bsi_start (bb); !bsi_end_p (si); bsi_next (&si))
+        {
+          tree stmt = bsi_stmt (si);
+          stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
+          if (!STMT_VINFO_RELEVANT_P (stmt_info)
+              && !STMT_VINFO_LIVE_P (stmt_info))
+            continue;
+          scalar_single_iter_cost++;
+          vec_inside_cost += STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info);
+          vec_outside_cost += STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info);
+        }
+    }
+
+  /* Add additional cost for the peeled instructions in prologue and epilogue
+     loop.
+
+     FORNOW: If we dont know the value of peel_iters for prologue or epilogue
+     at compile-time - we assume the worst.  
+
+     TODO: Build an expression that represents peel_iters for prologue and
+     epilogue to be used in a run-time test.  */
+
+  peel_iters_prologue = LOOP_PEELING_FOR_ALIGNMENT (loop_vinfo);
+
+  if (peel_iters_prologue < 0)
+    {
+      peel_iters_prologue = vf - 1;
+      if (vect_print_dump_info (REPORT_DETAILS))
+        fprintf (vect_dump, "cost model: "
+                 "prologue peel iters set conservatively.");
+
+      /* If peeling for alignment is unknown, loop bound of main loop becomes
+         unkown.  */
+      peel_iters_epilogue = vf - 1;
+      if (vect_print_dump_info (REPORT_DETAILS))
+        fprintf (vect_dump, "cost model: "
+                 "epilogue peel iters set conservatively because "
+                 "peeling for alignment is unknown .");
+    }
+  else 
+    {
+      if (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
+        {
+          peel_iters_epilogue = vf - 1;
+          if (vect_print_dump_info (REPORT_DETAILS))
+            fprintf (vect_dump, "cost model: "
+                     "epilogue peel iters set conservatively because "
+                     "loop iterations are unknown .");
+        }
+      else      
+        peel_iters_epilogue = 
+                     (LOOP_VINFO_INT_NITERS (loop_vinfo) - peel_iters_prologue)
+                     % vf;
+    }
+
+  vec_outside_cost += (peel_iters_prologue * scalar_single_iter_cost)
+                      + (peel_iters_epilogue * scalar_single_iter_cost);
+
+  /* Calculate number of iterations required to make the vector version 
+     profitable, relative to the loop bodies only. The following condition
+     must hold true: ((SIC*VF)-VIC)*niters > VOC*VF, where
+     SIC = scalar iteration cost, VIC = vector iteration cost,
+     VOC = vector outside cost and VF = vectorization factor.  */
+
+  if ((scalar_single_iter_cost * vf) > vec_inside_cost)
+    {
+      if (vec_outside_cost == 0)
+        min_profitable_iters = 1;
+      else
+        {
+          min_profitable_iters = (vec_outside_cost * vf)
+                                 / ((scalar_single_iter_cost * vf)
+                                    - vec_inside_cost);
+
+          if ((scalar_single_iter_cost * vf * min_profitable_iters)
+              <= ((vec_inside_cost * min_profitable_iters)
+                  + (vec_outside_cost * vf)))
+            min_profitable_iters++;
+        }
+    }
+  /* vector version will never be profitable.  */
+  else
+    {
+      if (vect_print_dump_info (REPORT_DETAILS))
+        fprintf (vect_dump, "cost model: vector iteration cost = %d "
+                 "is divisible by scalar iteration cost = %d by a factor "
+                 "greater than or equal to the vectorization factor = %d .",
+                 vec_inside_cost, scalar_single_iter_cost, vf);
+      return -1;
+    }
+
+  if (vect_print_dump_info (REPORT_DETAILS))
+    {
+      fprintf (vect_dump, "Cost model analysis: \n");
+      fprintf (vect_dump, "  Vector inside of loop cost: %d\n",
+	       vec_inside_cost);
+      fprintf (vect_dump, "  Vector outside of loop cost: %d\n",
+	       vec_outside_cost);
+      fprintf (vect_dump, "  Scalar cost: %d\n", scalar_single_iter_cost);
+      fprintf (vect_dump, "  prologue iterations: %d\n",
+               peel_iters_prologue);
+      fprintf (vect_dump, "  epilogue iterations: %d\n",
+               peel_iters_epilogue);
+      fprintf (vect_dump, "  Calculated minimum iters for profitability: %d\n",
+	       min_profitable_iters);
+      fprintf (vect_dump, "  Actual minimum iters for profitability: %d\n",
+	       min_profitable_iters < vf ? vf : min_profitable_iters);
+    }
+
+  return min_profitable_iters < vf ? vf : min_profitable_iters;
+}
+
+
+/* TODO: Close dependency between vect_model_*_cost and vectorizable_* 
+   functions. Design better to avoid maintainence issues.  */
+    
+/* Function vect_model_reduction_cost.  
+
+   Models cost for a reduction operation, including the vector ops 
+   generated within the strip-mine loop, the initial definition before
+   the loop, and the epilogue code that must be generated.  */
+
+static void
+vect_model_reduction_cost (stmt_vec_info stmt_info, enum tree_code reduc_code,
+			   int ncopies)
+{
+  int outer_cost = 0;
+  enum tree_code code;
+  optab optab;
+  tree vectype;
+  tree orig_stmt;
+  tree reduction_op;
+  enum machine_mode mode;
+  tree operation = GIMPLE_STMT_OPERAND (STMT_VINFO_STMT (stmt_info), 1);
+  int op_type = TREE_CODE_LENGTH (TREE_CODE (operation));
+
+  /* Cost of reduction op inside loop.  */
+  STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) += ncopies * TARG_VEC_STMT_COST;
+
+  reduction_op = TREE_OPERAND (operation, op_type-1);
+  vectype = get_vectype_for_scalar_type (TREE_TYPE (reduction_op));
+  mode = TYPE_MODE (vectype);
+  orig_stmt = STMT_VINFO_RELATED_STMT (stmt_info);
+
+  if (!orig_stmt) 
+    orig_stmt = STMT_VINFO_STMT (stmt_info);
+
+  code = TREE_CODE (GIMPLE_STMT_OPERAND (orig_stmt, 1));
+
+  /* Add in cost for initial definition.  */
+  outer_cost += TARG_VEC_STMT_COST;
+
+  /* Determine cost of epilogue code.
+
+     We have a reduction operator that will reduce the vector in one statement.
+     Also requires scalar extract.  */
+
+  if (reduc_code < NUM_TREE_CODES) 
+    outer_cost += TARG_VEC_STMT_COST + TARG_VEC_TO_SCALAR_COST;
+  else 
+    {
+      int vec_size_in_bits = tree_low_cst (TYPE_SIZE (vectype), 1);
+      tree bitsize =
+	TYPE_SIZE (TREE_TYPE ( GIMPLE_STMT_OPERAND (orig_stmt, 0)));
+      int element_bitsize = tree_low_cst (bitsize, 1);
+      int nelements = vec_size_in_bits / element_bitsize;
+
+      optab = optab_for_tree_code (code, vectype);
+
+      /* We have a whole vector shift available.  */
+      if (!VECTOR_MODE_P (mode) 
+          || optab->handlers[mode].insn_code == CODE_FOR_nothing)
+        /* Final reduction via vector shifts and the reduction operator. Also
+           requires scalar extract.  */
+	outer_cost += ((exact_log2(nelements) * 2 + 1) * TARG_VEC_STMT_COST); 
+      else
+	/* Use extracts and reduction op for final reduction.  For N elements,
+           we have N extracts and N-1 reduction ops.  */
+	outer_cost += ((nelements + nelements - 1) * TARG_VEC_STMT_COST);
+    }
+
+  STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info) = outer_cost;
+
+  if (vect_print_dump_info (REPORT_DETAILS))
+    fprintf (vect_dump, "vect_model_reduction_cost: inside_cost = %d, "
+             "outside_cost = %d .", STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info),
+             STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info));
+}
+
+
+/* Function vect_model_induction_cost.
+
+   Models cost for induction operations.  */
+
+static void
+vect_model_induction_cost (stmt_vec_info stmt_info, int ncopies)
+{
+  /* loop cost for vec_loop.  */
+  STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) = ncopies * TARG_VEC_STMT_COST;
+  /* prologue cost for vec_init and vec_step.  */
+  STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info) = 2 * TARG_VEC_STMT_COST;
+  
+  if (vect_print_dump_info (REPORT_DETAILS))
+    fprintf (vect_dump, "vect_model_induction_cost: inside_cost = %d, "
+             "outside_cost = %d .", STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info),
+             STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info));
+}
+
+
+/* Function vect_model_simple_cost.  
+
+   Models cost for simple operations, i.e. those that only emit ncopies of a 
+   single op.  Right now, this does not account for multiple insns that could
+   be generated for the single vector op.  We will handle that shortly.  */
+
+static void
+vect_model_simple_cost (stmt_vec_info stmt_info, int ncopies)
+{
+  STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) = ncopies * TARG_VEC_STMT_COST;
+
+  if (vect_print_dump_info (REPORT_DETAILS))
+    fprintf (vect_dump, "vect_model_simple_cost: inside_cost = %d, "
+             "outside_cost = %d .", STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info),
+             STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info));
+}
+
+
+/* Function vect_cost_strided_group_size 
+ 
+   For strided load or store, return the group_size only if it is the first
+   load or store of a group, else return 1.  This ensures that group size is
+   only returned once per group.  */
+
+static int
+vect_cost_strided_group_size (stmt_vec_info stmt_info)
+{
+  tree first_stmt = DR_GROUP_FIRST_DR (stmt_info);
+
+  if (first_stmt == STMT_VINFO_STMT (stmt_info))
+    return DR_GROUP_SIZE (stmt_info);
+
+  return 1;
+}
+
+
+/* Function vect_model_store_cost
+
+   Models cost for stores.  In the case of strided accesses, one access
+   has the overhead of the strided access attributed to it.  */
+
+static void
+vect_model_store_cost (stmt_vec_info stmt_info, int ncopies)
+{
+  int cost = 0;
+  int group_size;
+
+  /* Strided access?  */
+  if (DR_GROUP_FIRST_DR (stmt_info)) 
+    group_size = vect_cost_strided_group_size (stmt_info);
+  /* Not a strided access.  */
+  else
+    group_size = 1;
+
+  /* Is this an access in a group of stores, which provide strided access?  
+     If so, add in the cost of the permutes.  */
+  if (group_size > 1) 
+    {
+      /* Uses a high and low interleave operation for each needed permute.  */
+      cost = ncopies * exact_log2(group_size) * group_size 
+             * TARG_VEC_STMT_COST;
+
+      if (vect_print_dump_info (REPORT_DETAILS))
+        fprintf (vect_dump, "vect_model_store_cost: strided group_size = %d .",
+                 group_size);
+
+    }
+
+  /* Costs of the stores.  */
+  cost += ncopies * TARG_VEC_STORE_COST;
+
+  STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) = cost;
+
+  if (vect_print_dump_info (REPORT_DETAILS))
+    fprintf (vect_dump, "vect_model_store_cost: inside_cost = %d, "
+             "outside_cost = %d .", STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info),
+             STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info));
+}
+
+
+/* Function vect_model_load_cost
+
+   Models cost for loads.  In the case of strided accesses, the last access
+   has the overhead of the strided access attributed to it.  Since unaligned
+   accesses are supported for loads, we also account for the costs of the 
+   access scheme chosen.  */
+
+static void
+vect_model_load_cost (stmt_vec_info stmt_info, int ncopies)
+		 
+{
+  int inner_cost = 0;
+  int group_size;
+  int alignment_support_cheme;
+  tree first_stmt;
+  struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info), *first_dr;
+
+  /* Strided accesses?  */
+  first_stmt = DR_GROUP_FIRST_DR (stmt_info);
+  if (first_stmt)
+    {
+      group_size = vect_cost_strided_group_size (stmt_info);
+      first_dr = STMT_VINFO_DATA_REF (vinfo_for_stmt (first_stmt));
+    }
+  /* Not a strided access.  */
+  else
+    {
+      group_size = 1;
+      first_dr = dr;
+    }
+
+  alignment_support_cheme = vect_supportable_dr_alignment (first_dr);
+
+  /* Is this an access in a group of loads providing strided access?  
+     If so, add in the cost of the permutes.  */
+  if (group_size > 1) 
+    {
+      /* Uses an even and odd extract operations for each needed permute.  */
+      inner_cost = ncopies * exact_log2(group_size) * group_size
+                   * TARG_VEC_STMT_COST;
+
+      if (vect_print_dump_info (REPORT_DETAILS))
+        fprintf (vect_dump, "vect_model_load_cost: strided group_size = %d .",
+                 group_size);
+
+    }
+
+  /* The loads themselves.  */
+  switch (alignment_support_cheme)
+    {
+    case dr_aligned:
+      {
+        inner_cost += ncopies * TARG_VEC_LOAD_COST;
+
+        if (vect_print_dump_info (REPORT_DETAILS))
+          fprintf (vect_dump, "vect_model_load_cost: aligned.");
+
+        break;
+      }
+    case dr_unaligned_supported:
+      {
+        /* Here, we assign an additional cost for the unaligned load.  */
+        inner_cost += ncopies * TARG_VEC_UNALIGNED_LOAD_COST;
+
+        if (vect_print_dump_info (REPORT_DETAILS))
+          fprintf (vect_dump, "vect_model_load_cost: unaligned supported by "
+                   "hardware.");
+
+        break;
+      }
+    case dr_unaligned_software_pipeline:
+      {
+        int outer_cost = 0;
+
+        if (vect_print_dump_info (REPORT_DETAILS))
+          fprintf (vect_dump, "vect_model_load_cost: unaligned software "
+                   "pipelined.");
+
+        /* Unaligned software pipeline has a load of an address, an initial
+           load, and possibly a mask operation to "prime" the loop. However,
+           if this is an access in a group of loads, which provide strided
+           acccess, then the above cost should only be considered for one
+           access in the group. Inside the loop, there is a load op
+           and a realignment op.  */
+
+        if ((!DR_GROUP_FIRST_DR (stmt_info)) || group_size > 1)
+          {
+            outer_cost = 2*TARG_VEC_STMT_COST;
+            if (targetm.vectorize.builtin_mask_for_load)
+              outer_cost += TARG_VEC_STMT_COST;
+          }
+        
+        STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info) = outer_cost;
+
+        inner_cost += ncopies * (TARG_VEC_LOAD_COST + TARG_VEC_STMT_COST);
+
+        break;
+      }
+
+    default:
+      gcc_unreachable ();
+    }
+
+  STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) = inner_cost;
+
+  if (vect_print_dump_info (REPORT_DETAILS))
+    fprintf (vect_dump, "vect_model_load_cost: inside_cost = %d, "
+             "outside_cost = %d .", STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info),
+             STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info));
+
+}
+
+
 /* Function vect_get_new_vect_var.

   Returns a name for a new variable. The current naming scheme appends the 
@ -1655,6 +2139,7 @@ vectorizable_reduction (tree stmt, block_stmt_iterator *bsi, tree *vec_stmt)
  if (!vec_stmt) /* transformation not required.  */
    {
      STMT_VINFO_TYPE (stmt_info) = reduc_vec_info_type;
+      vect_model_reduction_cost (stmt_info, epilog_reduc_code, ncopies);
      return true;
    }

@ -1862,9 +2347,15 @@ vectorizable_call (tree stmt, block_stmt_iterator *bsi, tree *vec_stmt)

  gcc_assert (ZERO_SSA_OPERANDS (stmt, SSA_OP_ALL_VIRTUALS));

+  ncopies = (LOOP_VINFO_VECT_FACTOR (loop_vinfo)
+	     / TYPE_VECTOR_SUBPARTS (vectype_out));
+
  if (!vec_stmt) /* transformation not required.  */
    {
      STMT_VINFO_TYPE (stmt_info) = call_vec_info_type;
+      if (vect_print_dump_info (REPORT_DETAILS))
+        fprintf (vect_dump, "=== vectorizable_call ===");
+      vect_model_simple_cost (stmt_info, ncopies);
      return true;
    }

@ -1873,8 +2364,6 @@ vectorizable_call (tree stmt, block_stmt_iterator *bsi, tree *vec_stmt)
  if (vect_print_dump_info (REPORT_DETAILS))
    fprintf (vect_dump, "transform operation.");

-  ncopies = (LOOP_VINFO_VECT_FACTOR (loop_vinfo)
-	     / TYPE_VECTOR_SUBPARTS (vectype_out));
  gcc_assert (ncopies >= 1);

  /* Handle def.  */
@ -2302,6 +2791,9 @@ vectorizable_assignment (tree stmt, block_stmt_iterator *bsi, tree *vec_stmt)
  if (!vec_stmt) /* transformation not required.  */
    {
      STMT_VINFO_TYPE (stmt_info) = assignment_vec_info_type;
+      if (vect_print_dump_info (REPORT_DETAILS))
+        fprintf (vect_dump, "=== vectorizable_assignment ===");
+      vect_model_simple_cost (stmt_info, ncopies);
      return true;
    }

@ -2392,6 +2884,9 @@ vectorizable_induction (tree phi, block_stmt_iterator *bsi ATTRIBUTE_UNUSED,
  if (!vec_stmt) /* transformation not required.  */
    {
      STMT_VINFO_TYPE (stmt_info) = induc_vec_info_type;
+      if (vect_print_dump_info (REPORT_DETAILS))
+        fprintf (vect_dump, "=== vectorizable_induction ===");
+      vect_model_induction_cost (stmt_info, ncopies);
      return true;
    }

@ -2555,6 +3050,9 @@ vectorizable_operation (tree stmt, block_stmt_iterator *bsi, tree *vec_stmt)
  if (!vec_stmt) /* transformation not required.  */
    {
      STMT_VINFO_TYPE (stmt_info) = op_vec_info_type;
+      if (vect_print_dump_info (REPORT_DETAILS))
+        fprintf (vect_dump, "=== vectorizable_operation ===");
+      vect_model_simple_cost (stmt_info, ncopies);
      return true;
    }

@ -2772,6 +3270,9 @@ vectorizable_type_demotion (tree stmt, block_stmt_iterator *bsi,
  if (!vec_stmt) /* transformation not required.  */
    {
      STMT_VINFO_TYPE (stmt_info) = type_demotion_vec_info_type;
+      if (vect_print_dump_info (REPORT_DETAILS))
+        fprintf (vect_dump, "=== vectorizable_demotion ===");
+      vect_model_simple_cost (stmt_info, ncopies);
      return true;
    }

@ -2932,6 +3433,9 @@ vectorizable_type_promotion (tree stmt, block_stmt_iterator *bsi,
  if (!vec_stmt) /* transformation not required.  */
    {
      STMT_VINFO_TYPE (stmt_info) = type_promotion_vec_info_type;
+      if (vect_print_dump_info (REPORT_DETAILS))
+        fprintf (vect_dump, "=== vectorizable_promotion ===");
+      vect_model_simple_cost (stmt_info, 2*ncopies);
      return true;
    }

@ -3252,14 +3756,12 @@ vectorizable_store (tree stmt, block_stmt_iterator *bsi, tree *vec_stmt)
  if (!vec_stmt) /* transformation not required.  */
    {
      STMT_VINFO_TYPE (stmt_info) = store_vec_info_type;
+      vect_model_store_cost (stmt_info, ncopies);
      return true;
    }

  /** Transform.  **/

-  if (vect_print_dump_info (REPORT_DETAILS))
-    fprintf (vect_dump, "transform store. ncopies = %d",ncopies);
-
  if (strided_store)
    {
      first_stmt = DR_GROUP_FIRST_DR (stmt_info);
@ -3284,6 +3786,9 @@ vectorizable_store (tree stmt, block_stmt_iterator *bsi, tree *vec_stmt)
      group_size = 1;
    }
  
+  if (vect_print_dump_info (REPORT_DETAILS))
+    fprintf (vect_dump, "transform store. ncopies = %d",ncopies);
+
  dr_chain = VEC_alloc (tree, heap, group_size);
  oprnds = VEC_alloc (tree, heap, group_size);

@ -3915,14 +4420,15 @@ vectorizable_load (tree stmt, block_stmt_iterator *bsi, tree *vec_stmt)
  if (!vec_stmt) /* transformation not required.  */
    {
      STMT_VINFO_TYPE (stmt_info) = load_vec_info_type;
+      vect_model_load_cost (stmt_info, ncopies);
      return true;
    }

-  /** Transform.  **/
-
  if (vect_print_dump_info (REPORT_DETAILS))
    fprintf (vect_dump, "transform load.");

+  /** Transform.  **/
+
  if (strided_load)
    {
      first_stmt = DR_GROUP_FIRST_DR (stmt_info);
@ -4807,6 +5313,8 @@ vect_do_peeling_for_loop_bound (loop_vec_info loop_vinfo, tree *ratio)
  basic_block preheader;
  int loop_num;
  unsigned int th;
+  int min_scalar_loop_bound;
+  int min_profitable_iters;

  if (vect_print_dump_info (REPORT_DETAILS))
    fprintf (vect_dump, "=== vect_do_peeling_for_loop_bound ===");
@ -4822,11 +5330,28 @@ vect_do_peeling_for_loop_bound (loop_vec_info loop_vinfo, tree *ratio)
 				   &ratio_mult_vf_name, ratio);

  loop_num  = loop->num; 
-  /* Threshold for vectorized loop.  */
-  th = (PARAM_VALUE (PARAM_MIN_VECT_LOOP_BOUND)) * 
-			LOOP_VINFO_VECT_FACTOR (loop_vinfo);
+
+  /* Analyze cost to set threshhold for vectorized loop.  */
+  min_profitable_iters = vect_estimate_min_profitable_iters (loop_vinfo);
+
+  min_scalar_loop_bound = (PARAM_VALUE (PARAM_MIN_VECT_LOOP_BOUND))
+                          * LOOP_VINFO_VECT_FACTOR (loop_vinfo);
+
+  /* Use the cost model only if it is more conservative than user specified
+     threshold.  */
+
+  th = (unsigned) min_scalar_loop_bound;
+  if (min_profitable_iters
+      && (!min_scalar_loop_bound
+          || min_profitable_iters > min_scalar_loop_bound))
+    th = (unsigned) min_profitable_iters;
+
+  if (vect_print_dump_info (REPORT_DETAILS))
+    fprintf (vect_dump, "vectorization may not be profitable.");
+
  new_loop = slpeel_tree_peel_loop_to_edge (loop, single_exit (loop),
-					    ratio_mult_vf_name, ni_name, false, th);
+                                            ratio_mult_vf_name, ni_name, false,
+                                            th);
  gcc_assert (new_loop);
  gcc_assert (loop_num == loop->num);
 #ifdef ENABLE_CHECKING
--- a/gcc/tree-vectorizer.c
+++ b/gcc/tree-vectorizer.c
@ -1351,6 +1351,8 @@ new_stmt_vec_info (tree stmt, loop_vec_info loop_vinfo)
  else
    STMT_VINFO_DEF_TYPE (res) = vect_loop_def;
  STMT_VINFO_SAME_ALIGN_REFS (res) = VEC_alloc (dr_p, heap, 5);
+  STMT_VINFO_INSIDE_OF_LOOP_COST (res) = 0;
+  STMT_VINFO_OUTSIDE_OF_LOOP_COST (res) = 0;
  DR_GROUP_FIRST_DR (res) = NULL_TREE;
  DR_GROUP_NEXT_DR (res) = NULL_TREE;
  DR_GROUP_SIZE (res) = 0;
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@ -1,5 +1,5 @@
 /* Loop Vectorization
-   Copyright (C) 2003, 2004, 2005, 2006 Free Software Foundation, Inc.
+   Copyright (C) 2003, 2004, 2005, 2006, 2007 Free Software Foundation, Inc.
   Contributed by Dorit Naishlos <dorit@il.ibm.com>

 This file is part of GCC.
@ -268,6 +268,13 @@ typedef struct _stmt_vec_info {
  /* For loads only, if there is a store with the same location, this field is
     TRUE.  */
  bool read_write_dep;
+
+  /* Vectorization costs associated with statement.  */
+  struct  
+  {
+    int outside_of_loop;     /* Statements generated outside loop.  */
+    int inside_of_loop;      /* Statements generated inside loop.  */
+  } cost;
 } *stmt_vec_info;

 /* Access Functions.  */
@ -300,6 +307,42 @@ typedef struct _stmt_vec_info {
 #define DR_GROUP_READ_WRITE_DEPENDENCE(S)  (S)->read_write_dep

 #define STMT_VINFO_RELEVANT_P(S)          ((S)->relevant != vect_unused_in_loop)
+#define STMT_VINFO_OUTSIDE_OF_LOOP_COST(S) (S)->cost.outside_of_loop
+#define STMT_VINFO_INSIDE_OF_LOOP_COST(S)  (S)->cost.inside_of_loop
+
+/* These are some defines for the initial implementation of the vectorizer's
+   cost model.  These will later be target specific hooks.  */
+
+/* Cost of conditional branch.  */
+#ifndef TARG_COND_BRANCH_COST
+#define TARG_COND_BRANCH_COST        3
+#endif
+
+/* Cost of any vector operation, excluding load, store or vector to scalar
+   operation.  */ 
+#ifndef TARG_VEC_STMT_COST
+#define TARG_VEC_STMT_COST           1
+#endif
+
+/* Cost of vector to scalar operation.  */
+#ifndef TARG_VEC_TO_SCALAR_COST
+#define TARG_VEC_TO_SCALAR_COST      1
+#endif
+
+/* Cost of aligned vector load.  */
+#ifndef TARG_VEC_LOAD_COST
+#define TARG_VEC_LOAD_COST           1
+#endif
+
+/* Cost of misaligned vector load.  */
+#ifndef TARG_VEC_UNALIGNED_LOAD_COST
+#define TARG_VEC_UNALIGNED_LOAD_COST 2
+#endif
+
+/* Cost of vector store.  */
+#ifndef TARG_VEC_STORE_COST
+#define TARG_VEC_STORE_COST          1
+#endif

 static inline void set_stmt_info (stmt_ann_t ann, stmt_vec_info stmt_info);
 static inline stmt_vec_info vinfo_for_stmt (tree stmt);
@ -437,6 +480,7 @@ extern bool vectorizable_condition (tree, block_stmt_iterator *, tree *);
 extern bool vectorizable_live_operation (tree, block_stmt_iterator *, tree *);
 extern bool vectorizable_reduction (tree, block_stmt_iterator *, tree *);
 extern bool vectorizable_induction (tree, block_stmt_iterator *, tree *);
+extern int  vect_estimate_min_profitable_iters (loop_vec_info);
 /* Driver for transformation stage.  */
 extern void vect_transform_loop (loop_vec_info);