The following testcase fails -fcompare-debug, because expand_vector_comparison
since r11-1786-g1ac9258cca8030745d3c0b8f63186f0adf0ebc27 sets
vec_cond_expr_only when it sees some use other than VEC_COND_EXPR that uses
the lhs in its condition.
Obviously we should ignore debug stmts when doing so, e.g. by not pushing
them to uses.
That would be a 2 liner change, but while looking at it, I'm also worried
about VEC_COND_EXPRs that would use the lhs in more than one operand,
like VEC_COND_EXPR <lhs, lhs, something> or VEC_COND_EXPR <lhs, something, lhs>
(sure, they ought to be folded, but what if they weren't). Because if
something like that happens, then FOR_EACH_IMM_USE_FAST would push the same
stmt multiple times and expand_vector_condition can return true even when
it modifies it (for vector bool masking).
And lastly, it seems quite wasteful to safe_push statements that will just
cause vec_cond_expr_only = false; and break; in the second loop, both for
cases like 1000 immediate non-VEC_COND_EXPR uses and for cases like
999 VEC_COND_EXPRs with lhs in cond followed by a single non-VEC_COND_EXPR
use. So this patch only pushes VEC_COND_EXPRs there.
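The resulting policy can be modelled outside GCC. What follows is only a minimal sketch, not the code in tree-vect-generic.cc: use_kind, use_stmt and collect_uses are hypothetical names standing in for gimple statements and the immediate-use walk, and each array element models one using statement rather than one use operand.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a statement that uses the comparison's lhs.  */
enum use_kind { USE_DEBUG, USE_VEC_COND, USE_OTHER };

struct use_stmt
{
  enum use_kind kind;
  bool in_cond;   /* lhs is the VEC_COND_EXPR condition (rhs1).  */
  bool in_rhs2;   /* lhs also appears as rhs2.  */
  bool in_rhs3;   /* lhs also appears as rhs3.  */
};

/* Store in OUT only the VEC_COND users that use the lhs solely in the
   condition and set *N_OUT to their count.  Debug statements are
   skipped, every other use just clears the returned flag.  */
bool
collect_uses (const struct use_stmt *uses, size_t n,
              const struct use_stmt **out, size_t *n_out)
{
  assert (uses != NULL || n == 0);
  bool vec_cond_expr_only = true;
  *n_out = 0;
  for (size_t i = 0; i < n; i++)
    {
      const struct use_stmt *u = &uses[i];
      if (u->kind == USE_DEBUG)
        continue;  /* Must not influence code generation.  */
      if (u->kind == USE_VEC_COND
          && u->in_cond && !u->in_rhs2 && !u->in_rhs3)
        out[(*n_out)++] = u;
      else
        vec_cond_expr_only = false;
    }
  return vec_cond_expr_only;
}
```

In this model a trailing debug statement changes neither the flag nor the collected uses, which is the property -fcompare-debug checks.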
2022-02-01 Jakub Jelinek <jakub@redhat.com>
PR middle-end/104307
* tree-vect-generic.cc (expand_vector_comparison): Don't push debug
stmts to uses vector, just set vec_cond_expr_only to false for
non-VEC_COND_EXPRs instead of pushing them into uses. Treat
VEC_COND_EXPRs that use lhs not just in rhs1, but rhs2 or rhs3 too
like non-VEC_COND_EXPRs.
* gcc.target/i386/pr104307.c: New test.
When propagating a multi-word register into an access with a smaller
mode the can_change_mode backend hook is already consulted for the
original register. This however is also required for the intermediate
copy in copy_regno which might use a different register class.
gcc/ChangeLog:
PR rtl-optimization/101260
* regcprop.cc (maybe_mode_change): Invoke mode_change_ok also for
copy_regno.
gcc/testsuite/ChangeLog:
PR rtl-optimization/101260
* gcc.target/s390/pr101260.c: New testcase.
These operations should raise an invalid operation exception at runtime.
So they should not be folded during compilation unless -fno-trapping-math
is used.
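For illustration, the operations in question are those that yield a NaN from non-NaN operands, such as 0.0 / 0.0 or Inf - Inf. The sketch below assumes IEEE 754 arithmetic and no -ffast-math; the helper names are hypothetical, and volatile is only there so that the operations happen at run time. On such targets the invalid flag should then be observable with fetestexcept (FE_INVALID) from <fenv.h>, which the sketch does not check.

```c
#include <assert.h>
#include <stdbool.h>

/* NaN is the only value that compares unequal to itself.  */
bool
is_nan (double x)
{
  return x != x;
}

/* 0.0 / 0.0 evaluated at run time.  */
double
zero_div_zero (void)
{
  volatile double zero = 0.0;
  return zero / zero;
}

/* Inf - Inf evaluated at run time.  */
double
inf_minus_inf (void)
{
  volatile double big = 1e308;
  volatile double inf = big * 10.0;  /* Overflows to +Inf.  */
  return inf - inf;
}

/* Both results are NaN although no operand was a NaN.  */
void
check_invalid_operations (void)
{
  assert (is_nan (zero_div_zero ()));
  assert (is_nan (inf_minus_inf ()));
  assert (!is_nan (1.0));
}
```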
gcc/
PR middle-end/95115
* fold-const.cc (const_binop): Do not fold NaN result from
non-NaN operands.
gcc/testsuite
* gcc.dg/pr95115.c: New test.
When running libgomp test-case broadcast-many.c on an nvptx accelerator
(T400, driver version 470.86), I run into:
...
libgomp: The Nvidia accelerator has insufficient resources to launch \
'main$_omp_fn$0' with num_workers = 32 and vector_length = 32; \
recompile the program with 'num_workers = x and vector_length = y' on \
that offloaded region or '-fopenacc-dim=:x:y' where x * y <= 896.
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/broadcast-many.c \
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none \
-O0 execution test
...
The error does not occur when using GOMP_NVPTX_JIT=-O0.
Fix this by using 896 / 32 == 28 workers for ACC_DEVICE_TYPE_nvidia.
Likewise for some other test-cases.
Tested libgomp on x86_64 with nvptx accelerator.
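The worker count follows directly from the limit in the error message. As a small sketch (max_num_workers is a hypothetical helper, not part of libgomp):

```c
#include <assert.h>

/* Largest num_workers such that
   num_workers * vector_length <= limit.  */
int
max_num_workers (int limit, int vector_length)
{
  assert (limit >= 0 && vector_length > 0);
  return limit / vector_length;
}
```

With limit 896 and vector_length 32 this gives 28, the value used in the test-cases.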
libgomp/ChangeLog:
2022-01-27 Tom de Vries <tdevries@suse.de>
* testsuite/libgomp.oacc-c-c++-common/broadcast-many.c: Reduce
num_workers for nvidia accelerator to fix libgomp error 'insufficient
resources'.
* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-4.c:
Same.
* testsuite/libgomp.oacc-c-c++-common/reduction-7.c: Same.
When running the libgomp testsuite with GOMP_NVPTX_JIT=-O0 using an nvptx
accelerator (Nvidia T400, 2GB), I run into:
...
libgomp: cuCtxSynchronize error: unspecified launch failure \
(perhaps abort was called)
libgomp: cuMemFree_v2 error: unspecified launch failure
libgomp: device finalization failed
FAIL: libgomp.fortran/examples-4/declare_target-1.f90 -O0 execution test
...
The test-case contains:
...
! Reduced from 25 to 23, otherwise execution runs out of thread stack on
! Nvidia Titan V.
if (fib (23) /= fib_wrapper (23)) stop 2
...
Fix this by reducing the fib/fib_wrapper argument from 23 to 22.
Same for declare_target-2.f90.
Tested on x86_64 with nvptx accelerator.
libgomp/ChangeLog:
2022-01-27 Tom de Vries <tdevries@suse.de>
* testsuite/libgomp.fortran/examples-4/declare_target-1.f90: Reduce
recursion depth.
* testsuite/libgomp.fortran/examples-4/declare_target-2.f90: Same.
As mentioned in PR56888 comment 21:
...
-fno-tree-loop-distribute-patterns is the reliable way to not
transform loops into library calls.
...
However, since commit 6f966f0614 ("ldist: Recognize strlen and rawmemchr like
loops") a strlen or rawmemchr library call may be introduced by ldist.
This caused regressions in testcases
gcc.c-torture/execute/builtins/strlen{,-2,-3}.c for nvptx.
Fix this by not calling transform_reduction_loop from
loop_distribution::execute for -fno-tree-loop-distribute-patterns.
Tested regressed test-cases as well as gcc.dg/tree-ssa/ldist-*.c on
nvptx.
gcc/ChangeLog:
2022-01-31 Tom de Vries <tdevries@suse.de>
* tree-loop-distribution.cc (generate_reduction_builtin_1): Check for
-ftree-loop-distribute-patterns.
(loop_distribution::execute): Don't call transform_reduction_loop for
-fno-tree-loop-distribute-patterns.
gcc/testsuite/ChangeLog:
2022-01-31 Tom de Vries <tdevries@suse.de>
* gcc.dg/tree-ssa/ldist-strlen-4.c: New test.
The OEP_* enums were moved to tree-core.h in
r0-124973-g5e351e960763 but the comment was correct
when it was added to fold-const.h in
r10-4231-g7f4a8ee03d40. This fixes the reference
to the OEP_* enum to reference tree-core.
Committed as obvious after a bootstrap/test on x86_64-linux.
gcc/ChangeLog:
* fold-const.h (operand_compare::operand_equal_p):
Fix comment about OEP_* flags.
Here we ICE in unify_array_domain when we're trying to deduce the type
of an array, as in
auto(*p)[i] = (int(*)[i])0;
but unify_array_domain doesn't handle arbitrarily complex bounds. Another
test is, e.g.,
auto (*b)[0/0] = &a;
where the type of the array is
<<< Unknown tree: template_type_parm >>>[0:(sizetype) ((ssizetype) (0 / 0) - 1)]
It seems to me that we need not handle these.
PR c++/102414
PR c++/101874
gcc/cp/ChangeLog:
* decl.cc (create_array_type_for_decl): Use template_placeholder_p.
Sorry on a variable-length array of auto.
gcc/testsuite/ChangeLog:
* g++.dg/cpp23/auto-array3.C: New test.
* g++.dg/cpp23/auto-array4.C: New test.
Weird things are going to happen if you define your std::initializer_list
as a union. In this case, we crash in output_constructor_regular_field.
Let's not allow such a definition in the first place.
PR c++/102434
gcc/cp/ChangeLog:
* class.cc (finish_struct): Don't allow union initializer_list.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/initlist128.C: New test.
Here during deduction guide generation for the nested class template
B<char(int)>::C, the computation of outer_args yields the template
arguments relative to the primary template for B (i.e. {char(int)})
but what we really want is those relative to C's enclosing scope, the
partial specialization of B (i.e. {char, int}).
PR c++/104294
gcc/cp/ChangeLog:
* pt.cc (ctor_deduction_guides_for): Correct computation of
outer_args.
gcc/testsuite/ChangeLog:
* g++.dg/cpp1z/class-deduction106.C: New test.
As reported by Martin, while David has added OPTION_GLIBC define to aix
and Iain to darwin, all the other non-linux targets now fail because
the OPTION_GLIBC macro used in rs6000.md isn't defined.
One possibility is to define this macro in option-defaults.h which on rs6000
targets is included last, then we don't need to define it in aix/darwin
headers and for targets using linux.h or linux64.h it will DTRT too.
The other option is the first 2 hunks + changing the 3
if (!OPTION_GLIBC)
FAIL;
cases in rs6000.md to e.g.
#ifdef OPTION_GLIBC
if (!OPTION_GLIBC)
#endif
FAIL;
or to:
#ifdef OPTION_GLIBC
if (!OPTION_GLIBC)
#else
if (true)
#endif
FAIL;
(the latter case if Richi wants to push the -Wunreachable-code changes for
GCC 13).
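The first possibility amounts to a guarded fallback definition. A minimal standalone sketch of the idea, where expander_succeeds is a hypothetical stand-in for one of the rs6000.md expanders and the exact wording in option-defaults.h is not reproduced:

```c
#include <assert.h>
#include <stdbool.h>

/* Targets whose headers do not provide OPTION_GLIBC get 0.  */
#ifndef OPTION_GLIBC
#define OPTION_GLIBC 0
#endif

/* Hypothetical stand-in for an expander that FAILs without glibc.  */
bool
expander_succeeds (void)
{
  if (!OPTION_GLIBC)
    return false;  /* Corresponds to FAIL; in the .md file.  */
  return true;
}

/* With the fallback in effect the expander always fails.  */
void
check_fallback (void)
{
  assert (OPTION_GLIBC == 0);
  assert (!expander_succeeds ());
}
```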
2022-01-31 Jakub Jelinek <jakub@redhat.com>
PR target/104298
* config/rs6000/aix.h (OPTION_GLIBC): Remove.
* config/rs6000/darwin.h (OPTION_GLIBC): Likewise.
* config/rs6000/option-defaults.h (OPTION_GLIBC): Define to 0
if not already defined.
libiberty/
PR demangler/98886
PR demangler/99935
* rust-demangle.c (struct rust_demangler): Add a recursion
counter.
(demangle_path): Increment/decrement the recursion counter upon
entry and exit. Fail if the counter exceeds a fixed limit.
(demangle_type): Likewise.
(rust_demangle_callback): Initialise the recursion counter,
disabling if requested by the option flags.
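The general shape of such a guard, sketched on a toy recursive-descent parser for nested parentheses. This is not the rust-demangle.c code: the struct, the function names and the limit of 1024 are all assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RECURSION_LIMIT 1024  /* Hypothetical value.  */

struct parser
{
  const char *p;       /* Next character to consume.  */
  unsigned recursion;  /* Current nesting depth.  */
  bool limit_enabled;  /* Cleared to disable the check.  */
  bool errored;
};

/* group := '(' group ')' | empty.  */
void
parse_group (struct parser *ps)
{
  if (ps->errored)
    return;
  if (ps->limit_enabled)
    {
      if (ps->recursion >= RECURSION_LIMIT)
        {
          ps->errored = true;  /* Too deep: fail, don't overflow.  */
          return;
        }
      ps->recursion++;  /* Increment on entry.  */
    }
  if (*ps->p == '(')
    {
      ps->p++;
      parse_group (ps);
      if (!ps->errored)
        {
          if (*ps->p == ')')
            ps->p++;
          else
            ps->errored = true;
        }
    }
  if (ps->limit_enabled)
    ps->recursion--;  /* Decrement on exit.  */
}

/* Return true if S is a well-formed nest within the limit.  */
bool
parse_nested (const char *s, bool limit_enabled)
{
  assert (s != NULL);
  struct parser ps = { s, 0, limit_enabled, false };
  parse_group (&ps);
  return !ps.errored && *ps.p == '\0';
}
```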
> > PR tree-optimization/103514
> > * match.pd (a & b) ^ (a == b) -> !(a | b): New optimization.
> > * match.pd (a & b) == (a ^ b) -> !(a | b): New optimization.
> > * gcc.dg/tree-ssa/pr103514.c: Testcase for this optimization.
> >
> > 1) https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103514
> Note the bug was filed and fixed during stage3, review just didn't happen in
> a reasonable timeframe.
>
> I'm going to ACK this for the trunk and go ahead and commit it for you.
The testcase FAILs on short-circuit targets like powerpc64le-linux.
While the first 2 functions are identical, the last two look like:
<bb 2> :
if (a_5(D) != 0)
goto <bb 3>; [INV]
else
goto <bb 4>; [INV]
<bb 3> :
if (b_6(D) != 0)
goto <bb 5>; [INV]
else
goto <bb 4>; [INV]
<bb 4> :
<bb 5> :
# iftmp.1_4 = PHI <1(3), 0(4)>
_1 = a_5(D) == b_6(D);
_2 = (int) _1;
_3 = _2 ^ iftmp.1_4;
_9 = _2 != iftmp.1_4;
return _9;
instead of the expected:
<bb 2> :
_3 = a_8(D) & b_9(D);
_4 = (int) _3;
_5 = a_8(D) == b_9(D);
_6 = (int) _5;
_1 = a_8(D) | b_9(D);
_2 = ~_1;
_7 = (int) _2;
_10 = ~_1;
return _10;
so no wonder it doesn't match. E.g. x86_64-linux will also use jumps
if it isn't just a && b but a && b && c && d (will do
a & b and c & d tests and jump based on those).
As it is too late in the release cycle to implement this optimization for
the short-circuiting targets too (not even sure which pass would be best),
this patch just forces non-short-circuiting for the test.
2022-01-31 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/103514
* gcc.dg/tree-ssa/pr103514.c: Add
--param logical-op-non-short-circuit=1 to dg-options.
libatomic/ChangeLog:
* acinclude.m4: Detect *_ld_is_mold and use it.
* configure: Regenerate.
libgomp/ChangeLog:
* acinclude.m4: Detect *_ld_is_mold and use it.
* configure: Regenerate.
libitm/ChangeLog:
* acinclude.m4: Detect *_ld_is_mold and use it.
* configure: Regenerate.
libstdc++-v3/ChangeLog:
* acinclude.m4: Detect *_ld_is_mold and use it.
* configure: Regenerate.
We were passing down the original type to recursive invocations
of multiple_of_p for say (int)(unsigned * unsigned).
2022-01-24 Richard Biener <rguenther@suse.de>
PR tree-optimization/100499
* fold-const.cc (multiple_of_p): Pass the correct type of
the expression to the recursive invocation of multiple_of_p
for conversions and use CASE_CONVERT.
This is what has been done for ages on SPARC/Solaris and makes it possible
to use 64-bit atomic instructions even in 32-bit mode.
gcc/
PR target/104189
* config/sparc/linux64.h (TARGET_DEFAULT): Add MASK_V8PLUS.
There are a few cases where we know we're dealing with (poly-)integer
constants, so remove the use of multiple_of_p in those cases to make
the PR100499 fix less impactful.
2022-01-24 Richard Biener <rguenther@suse.de>
PR tree-optimization/100499
* tree-cfg.cc (verify_gimple_assign_ternary): Use multiple_p
on poly-ints instead of multiple_of_p.
* tree-ssa.cc (maybe_rewrite_mem_ref_base): Likewise.
(non_rewritable_mem_ref_base): Likewise.
(non_rewritable_lvalue_p): Likewise.
(execute_update_addresses_taken): Likewise.
These tests have always been failing for my autotester running a
cris-elf simulator; when unrestrained they take about 20 minutes each,
compared to the (doubled) timeout of 720 seconds, of a total 2h40min
for the whole of the libstdc++-v3 testsuite. The tests cover counter
overflow and are already disabled for LP64 targets.
* testsuite/27_io/basic_istream/get/char/lwg3464.cc: Don't run on
simulator targets.
* testsuite/27_io/basic_istream/get/wchar_t/lwg3464.cc: Likewise.
This test fails everywhere, because ? doesn't match literal ?.
It should use \\? instead. I've also changed those .s in there.
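The same pitfall can be reproduced with POSIX extended regular expressions. This is only an analogy, since the dg-final scans use Tcl regexps rather than regcomp; regex_matches is a hypothetical helper. As in the test directives, the backslash has to be doubled in the string literal.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if PATTERN (a POSIX ERE) matches anywhere in TEXT.  */
bool
regex_matches (const char *pattern, const char *text)
{
  regex_t re;
  int rc = regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB);
  assert (rc == 0);
  bool found = regexec (&re, text, 0, NULL, 0) == 0;
  regfree (&re);
  return found;
}
```

Here regex_matches ("x ? y", "x ? y") is false because ? makes the preceding space optional instead of matching a literal question mark, while regex_matches ("x \\? y", "x ? y") is true. Likewise "a.b" matches "aXb", whereas "a\\.b" only matches a literal dot.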
2022-01-29 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/95424
* gcc.dg/tree-ssa/divide-7.c: Fix up regexps in scan-tree-dump{,-not}.
On Fri, Jan 28, 2022 at 11:38:23AM -0700, Jeff Law wrote:
> Thanks. Given the original submission and most of the review work was done
> prior to stage3 closing, I went ahead and installed this on the trunk.
Unfortunately this breaks quite a lot of things.
The main problem is that GIMPLE allows EQ_EXPR etc. only with BOOLEAN_TYPE
or with TYPE_PRECISION == 1 integral type (or vector boolean).
Violating this causes verification failures in tree-cfg.cc in some cases,
in other cases wrong-code issues because before it is verified we e.g.
transform
1U / x
into
x == 1U
and later into
x (because we assume that == type must be one of the above cases and
when it is the same type as the type of the first operand, for boolean-ish
cases it should be equivalent).
Fixed by changing that
(eq @1 { build_one_cst (type); })
into
(convert (eq:boolean_type_node @1 { build_one_cst (type); }))
Note, I'm not 100% sure if :boolean_type_node is required in that case,
I see some spots in match.pd that look exactly like this, while there is
e.g. (convert (le ...)) that supposedly does the right thing too.
The signed integer 1/X case doesn't need changes, for
(cond (le ...) ...)
le gets correctly boolean_type_node and cond should use type.
I've also reformatted it, some lines were too long, match.pd uses
indentation by 1 column instead of 2 etc.
2022-01-29 Jakub Jelinek <jakub@redhat.com>
Andrew Pinski <apinski@marvell.com>
PR tree-optimization/104279
PR tree-optimization/104280
PR tree-optimization/104281
* match.pd (1 / X -> X == 1 for unsigned X): Build eq with
boolean_type_node and convert to type. Formatting fixes.
* gcc.dg/torture/pr104279.c: New test.
* gcc.dg/torture/pr104280.c: New test.
* gcc.dg/torture/pr104281.c: New test.
This patch adds the missed pattern described in bug 103514 [1] to match.pd.
[1] also includes a proof of correctness for the patch.
1) https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103514
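Independently of the proof in the bug, both identities can be checked exhaustively for boolean operands. A small sketch, assuming a and b are _Bool (the first identity does not hold for wider integer types); the function names are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>

bool xor_form (bool a, bool b) { return (a & b) ^ (a == b); }
bool eq_form (bool a, bool b) { return (a & b) == (a ^ b); }
bool simplified (bool a, bool b) { return !(a | b); }

/* Compare both source forms against !(a | b) on all four inputs.  */
void
check_patterns (void)
{
  for (int a = 0; a <= 1; a++)
    for (int b = 0; b <= 1; b++)
      {
        assert (xor_form (a, b) == simplified (a, b));
        assert (eq_form (a, b) == simplified (a, b));
      }
}
```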
gcc/
PR tree-optimization/103514
* match.pd (a & b) ^ (a == b) -> !(a | b): New optimization.
(a & b) == (a ^ b) -> !(a | b): New optimization.
gcc/testsuite
* gcc.dg/tree-ssa/pr103514.c: Testcase for this optimization.
Here we're emitting a -Wignored-qualifiers warning for an intermediate
compiler-generated cast of nullptr to 'method-type* const' as part of
value initialization of a const pmf. This patch suppresses the warning
by instead casting to the corresponding unqualified type.
PR c++/92752
gcc/cp/ChangeLog:
* typeck.cc (build_ptrmemfunc): Cast a nullptr constant to the
unqualified pointer type not the qualified one.
gcc/testsuite/ChangeLog:
* g++.dg/warn/Wignored-qualifiers2.C: New test.
Co-authored-by: Jason Merrill <jason@redhat.com>
A recent patch added tests for OPTION_GLIBC that is defined in
linux.h and linux64.h. This broke bootstrap for powerpc Darwin.
Fixed by adding a definition to 0 for OPTION_GLIBC.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
gcc/ChangeLog:
* config/rs6000/darwin.h (OPTION_GLIBC): Define to 0.
This patch implements an optimization for the following C++ code:
int f(int x) {
return 1 / x;
}
int f(unsigned int x) {
return 1 / x;
}
Before this patch, x86-64 gcc -std=c++20 -O3 produces the following assembly:
f(int):
xor edx, edx
mov eax, 1
idiv edi
ret
f(unsigned int):
xor edx, edx
mov eax, 1
div edi
ret
In comparison, clang++ -std=c++20 -O3 produces the following assembly:
f(int):
lea ecx, [rdi + 1]
xor eax, eax
cmp ecx, 3
cmovb eax, edi
ret
f(unsigned int):
xor eax, eax
cmp edi, 1
sete al
ret
Clang's output is more efficient as it avoids expensive div operations.
With this patch, GCC now produces the following assembly:
f(int):
lea eax, [rdi + 1]
cmp eax, 2
mov eax, 0
cmovbe eax, edi
ret
f(unsigned int):
xor eax, eax
cmp edi, 1
sete al
ret
which is virtually identical to Clang's assembly output. Any slight differences
in the output for f(int) are possibly related to a different missed optimization.
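The arithmetic behind the new assembly can be written out in C. This is a sketch of the identities only, not of the match.pd pattern; the function names are hypothetical, and x == 0 is skipped because the division is undefined there.

```c
#include <assert.h>
#include <limits.h>

/* 1 / x for unsigned x != 0, written as a comparison.  */
unsigned
recip_unsigned (unsigned x)
{
  return x == 1U;
}

/* 1 / x for signed x != 0: x itself for -1 and 1, otherwise 0.
   x + 1 in unsigned arithmetic maps {-1, 0, 1} to {0, 1, 2}.  */
int
recip_signed (int x)
{
  return (unsigned) x + 1U <= 2U ? x : 0;
}

/* Compare against real division on a range and at the extremes.  */
void
check_reciprocals (void)
{
  for (int x = -1000; x <= 1000; x++)
    {
      if (x == 0)
        continue;
      assert (recip_signed (x) == 1 / x);
      if (x > 0)
        assert (recip_unsigned ((unsigned) x) == 1U / (unsigned) x);
    }
  assert (recip_signed (INT_MIN) == 1 / INT_MIN);
  assert (recip_signed (INT_MAX) == 1 / INT_MAX);
  assert (recip_unsigned (UINT_MAX) == 1U / UINT_MAX);
}
```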
v2: https://gcc.gnu.org/pipermail/gcc-patches/2022-January/587751.html
Changes from v2:
1. Refactor from using a switch statement to using the built-in
if-else statement.
v1: https://gcc.gnu.org/pipermail/gcc-patches/2022-January/587634.html
Changes from v1:
1. Refactor common if conditions.
2. Use build_[minus_]one_cst (type) to get -1/1 of the correct type.
3. Match only for TRUNC_DIV_EXPR and TYPE_PRECISION (type) > 1.
gcc/ChangeLog:
PR tree-optimization/95424
* match.pd: Simplify 1 / X where X is an integer.
As mentioned in the PR, the following testcase fails, because the last
stmt of a bb with -g is a debug stmt and get_status_for_store_merging
uses gimple_seq_last_stmt (bb_seq (bb)) when testing if it is valid
for store merging. The debug stmt isn't valid, while a stmt at that
position with -g0 is valid and so the divergence.
As we walk the whole bb already, this patch just remembers the last
non-debug stmt, so that we don't need to skip backwards debug stmts at the
end of the bb to find last real stmt.
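The idea, reduced to a standalone model. The struct and function names below are hypothetical, and the real function also tracks other state while walking the bb.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a statement in a basic block.  */
struct stmt
{
  bool is_debug;
  bool valid_for_merging;
};

/* Forward walk that remembers the last non-debug statement;
   returns NULL if the block has none.  */
const struct stmt *
last_nondebug_stmt (const struct stmt *bb, size_t n)
{
  assert (bb != NULL || n == 0);
  const struct stmt *last = NULL;
  for (size_t i = 0; i < n; i++)
    if (!bb[i].is_debug)
      last = &bb[i];
  return last;
}

/* The answer must not depend on trailing debug statements.  */
bool
bb_ends_with_valid_store (const struct stmt *bb, size_t n)
{
  const struct stmt *last = last_nondebug_stmt (bb, n);
  return last != NULL && last->valid_for_merging;
}
```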
2022-01-28 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/104263
* gimple-ssa-store-merging.cc (get_status_for_store_merging): For
cfun->can_throw_non_call_exceptions && cfun->eh test whether
last non-debug stmt in the bb is store_valid_for_store_merging_p
rather than last stmt.
* gcc.dg/pr104263.c: New test.
Revert partially what I did in g:76ef38e3178a11e76a66b4d4c0e10e85fe186a45.
gcc/ChangeLog:
* diagnostic.cc (diagnostic_action_after_output): Remove extra
newline.
gcc/ChangeLog:
* config/rs6000/host-darwin.cc (segv_crash_handler):
Do not use leading capital letter.
(segv_handler): Likewise.
* ipa-sra.cc (verify_splitting_accesses): Likewise.
* varasm.cc (get_section): Likewise.
gcc/d/ChangeLog:
* decl.cc (d_finish_decl): Do not use leading capital letter.
When deducing the type of a variable template (or templated static data
member) with a constrained auto type, we might need its template
arguments for satisfaction since the constraint could depend on them.
PR c++/103341
gcc/cp/ChangeLog:
* decl.cc (cp_finish_decl): Pass the template arguments of a
variable template specialization or a templated static data
member to do_auto_deduction when the auto is constrained.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/concepts-class4.C: New test.
* g++.dg/cpp2a/concepts-var-templ2.C: New test.
The following fixes the vector type registered for external defs
in call arguments when vectorizing with SLP. We assumed uniform
vectype_in types here but with calls like .COND_MUL we also have
mask arguments which, when invariant or external, need to have
a proper mask vector type.
2022-01-28 Richard Biener <rguenther@suse.de>
PR tree-optimization/104267
* tree-vect-stmts.cc (vectorizable_call): Properly use the
per-argument determined vector type for externals and
invariants.
This removes a premature optimization from
gimple_purge_dead_abnormal_call_edges which, after eliding the
last setjmp (or computed goto) statement from a function and
thus clearing cfun->calls_setjmp, leaves us with the abnormal
edges from other calls that are elided for example via inlining
or DCE. That's a CFG / IL combination that should be impossible
(not addressing the fact that with cfun->calls_setjmp and
cfun->has_nonlocal_label cleared we should not have any abnormal
edge at all).
For the testcase in the PR this means that IPA inlining will
remove the abnormal edges from the block after inlining the call
the edge was coming from.
2022-01-28 Richard Biener <rguenther@suse.de>
PR tree-optimization/104263
* tree-cfg.cc (gimple_purge_dead_abnormal_call_edges):
Purge edges also when !cfun->has_nonlocal_label
and !cfun->calls_setjmp.
* gcc.dg/tree-ssa/inline-13.c: New testcase.
Document new `auipc' and `bitmanip' `type' attributes added respectively
with commit 88108b27dd ("RISC-V: Add sifive-7 pipeline description.")
and commit 283b1707f2 ("RISC-V: Implement instruction patterns for ZBA
extension.") but not listed so far.
gcc/
* config/riscv/riscv.md: Document `auipc' and `bitmanip' `type'
attributes.
The testcase in the PR (not included for the testsuite because we don't
have an (easy) way to -fcompare-debug LTO, we'd need 2 compilations/linking,
one with -g and one with -g0 and -fdump-rtl-final= at the end of lto1
and compare that) has different code generation for -g vs. -g0.
The difference appears during expansion, where we have a goto_locus
that is at -O0 compared to the INSN_LOCATION of the previous and next insn
across an edge. With -g0 the locations are equal and so no nop is added.
With -g the locations aren't equal and so a nop is added holding that
location.
The reason for the different location is in the way how we stream in
locations by lto1.
We have lto_location_cache::apply_location_cache that is called with some
set of expanded locations, qsorts them, creates location_t's for those
and remembers the last expanded location.
lto_location_cache::input_location_and_block, when the read-in
expanded_location is equal to the last expanded location, just reuses the
last location_t (or adds/changes/removes LOCATION_BLOCK in it); when it is
not, it queues it for the next apply_location_cache. Now, when streaming in
-g input, we can
see extra locations that don't appear with -g0, and if we are unlucky
enough, those can be sorted last during apply_location_cache and affect
what locations are used from the single entry cache next.
In particular, second apply_location_cache with non-empty loc_cache in
the testcase has 14 locations with -g0 and 16 with -g and those 2 extra
ones sort both last (they are the same). The last one from -g0 then
appears to be input_location_and_block sourced again, for -g0 triggers
the single entry cache, while for -g it doesn't and so apply_location_cache
will create for it another location_t with the same content.
The following patch fixes it by comparing everything we care about in the
location instead of (well, better, in addition to) a simple location_t ==
location_t check. I think we don't care about the sysp flag for debug
info...
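A rough model of such a comparison. struct loc and loc_equal_model are hypothetical; the real loc_equal in cfgrtl.cc works on location_t values and their expansions, and the exact set of fields it compares is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model: an id (like location_t) plus what it
   expands to.  */
struct loc
{
  unsigned id;
  const char *file;
  int line;
  int column;
};

/* Equal ids are trivially equal; otherwise compare the expanded
   content, so two ids with the same content count as equal.  */
bool
loc_equal_model (const struct loc *a, const struct loc *b)
{
  assert (a && b && a->file && b->file);
  if (a->id == b->id)
    return true;
  return (a->line == b->line
          && a->column == b->column
          && strcmp (a->file, b->file) == 0);
}
```

With this, a duplicate location_t created only in the -g compilation still compares equal to the original, so no extra nop would be needed in that model.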
2022-01-28 Jakub Jelinek <jakub@redhat.com>
PR lto/104237
* cfgrtl.cc (loc_equal): New function.
(unique_locus_on_edge_between_p): Use it.
The following makes dumping of a function as graph work as intended
when specifying a function other than cfun. Unfortunately the loop
and the dominance APIs are not set up to work for other functions
than cfun so you won't get any fancy loop dumps but the non-loop
dump works up to reaching mark_dfs_back_edges which I trivially made
function aware and adjusted current callers with a wrapper.
With all this, doing dot-fn id->src_cfun from the debugger when
debugging inlining works. Previously you got a strange mix of
the src and dest functions visualized ;)
2022-01-28 Richard Biener <rguenther@suse.de>
* cfganal.h (mark_dfs_back_edges): Provide API with struct
function argument.
* cfganal.cc (mark_dfs_back_edges): Take a struct function
to work on, add a wrapper passing cfun.
* graph.cc (draw_cfg_nodes_no_loops): Replace stray cfun
uses with fun which is already passed.
(draw_cfg_edges): Likewise.
(draw_cfg_nodes_for_loop): Do not use draw_cfg_nodes_for_loop
for fun != cfun.