The name `waiting_cache` sounds like it is related to the states
`NodeState::Waiting`, but it's not; the cache holds nodes of various
states.
This commit changes it to `active_state`.
Now that all indices have type `usize`, it makes sense to be more
consistent about their naming. This commit removes all uses of `i` in
favour of `index`.
The size of the indices doesn't matter much here, and having a
`newtype_index!` index type without also using `IndexVec` requires lots
of conversions. So this commit removes `NodeIndex` in favour of uniform
use of `usize` as the index type. As well as making the code slightly
more concise, it also slightly speeds things up.
`Node` has an optional parent and a list of other descendents. Most of
the time the parent is treated the same as the other descendents --
error-handling is the exception -- and chaining them together for
iteration has a non-trivial cost.
This commit changes the representation. There is now a single list of
descendants, and a boolean flag that indicates if there is a parent (in
which case it is first descendent). This representation encodes the same
information, in a way that is less idiomatic but cheaper to iterate over
for the common case where the parent doesn't need special treatment.
As a result, some benchmark workloads are up to 2% faster.
This makes the code a little faster, presumably because bounds checks
aren't needed on `nodes` accesses. It requires making `scratch` a
`RefCell`, which is not unreasonable.
This commit removes the custom index implementation of `NodeIndex`,
which probably predates `newtype_index!`.
As well as eliminating code, it improves the debugging experience,
because the custom implementation had the property of being incremented
by 1 (so it could use `NonZeroU32`), which was incredibly confusing if
you didn't expect it.
For some reason, I also had to remove an `unsafe` block marker from
`from_u32_unchecked()` that the compiler said was now unnecessary.
This function is very hot, doesn't get inlined because it's recursive,
and the function calls are significant.
This commit splits it into inlined and uninlined variants, and uses the
inlined variant for the hot call site. This wins several percent on a
few benchmarks.
Speed up obligation forest code
Here are the rustc-perf benchmarks that get at least a 1% speedup on one or more of their runs with these patches applied:
```
inflate-check
avg: -8.7% min: -12.1% max: 0.0%
inflate
avg: -5.9% min: -8.6% max: 1.1%
inflate-opt
avg: -1.5% min: -2.0% max: -0.3%
clap-rs-check
avg: -0.6% min: -1.9% max: 0.5%
coercions
avg: -0.2%? min: -1.3%? max: 0.6%?
serde-opt
avg: -0.6% min: -1.0% max: 0.1%
coercions-check
avg: -0.4%? min: -1.0%? max: -0.0%?
```
The only instance of `ObligationForest` in use has an obligation type of
`PendingPredicateObligation`, which contains a `PredicateObligation` and a
`Vec<Ty>`.
`FulfillmentContext::pending_obligations()` calls
`ObligationForest::pending_obligations()`, which clones all the
`PendingPredicateObligation`s. But the `Vec<Ty>` field of those cloned
obligations is never touched.
This patch changes `ObligationForest::pending_obligations()` to
`map_pending_obligations` -- which gives callers control about which part
of the obligation to clone -- and takes advantage of the change to avoid
cloning the `Vec<Ty>`. The change speeds up runs of a few rustc-perf
benchmarks, the best by 1%.
This commit partially inlines two functions, `find_cycles_from_node` and
`mark_as_waiting_from`, at two call sites in order to avoid function
unnecessary function calls on hot paths.
It also fully inlines and removes `is_popped`.
These changes speeds up rustc-benchmarks/inflate-0.1.0 by about 2% when
doing debug builds with a stage1 compiler.