Rollup merge of #57652 - mark-i-m:remove-old, r=nikomatsakis

Update/remove some old readmes

r? @nikomatsakis

cc #48478

There are a bunch of READMEs with content that I would like to see a final decision made on:
- https://github.com/rust-lang/rust/tree/master/src/librustc/ty/query
- https://github.com/rust-lang/rust/tree/master/src/librustc/dep_graph
- https://github.com/rust-lang/rust/blob/master/src/librustc/infer/region_constraints
- https://github.com/rust-lang/rust/tree/master/src/librustc/infer/higher_ranked
- https://github.com/rust-lang/rust/tree/master/src/librustc/infer/lexical_region_resolve
- https://github.com/rust-lang/rust/blob/master/src/librustc_borrowck/borrowck

It's not clear how useful or obsolete any of these are. I would really appreciate it if the appropriate domain experts for each of these could respond with one of (a) delete it, (b) wait for the system to be removed, or (c) move it to the rustc-guide. @nikomatsakis do you know who to ping for any of these (sorry, I suspect many of them are you)?
Mazdak Farrokhzad 2019-01-25 01:37:00 +01:00 committed by GitHub
commit 2876801d18
7 changed files with 93 additions and 864 deletions


@@ -8,7 +8,6 @@ For more information on how various parts of the compiler work, see the [rustc g
There is also useful content in the following READMEs, which are gradually being moved over to the guide:
- https://github.com/rust-lang/rust/tree/master/src/librustc/ty/query
- https://github.com/rust-lang/rust/tree/master/src/librustc/dep_graph
- https://github.com/rust-lang/rust/blob/master/src/librustc/infer/region_constraints
- https://github.com/rust-lang/rust/tree/master/src/librustc/infer/higher_ranked
- https://github.com/rust-lang/rust/tree/master/src/librustc/infer/lexical_region_resolve


@@ -1,403 +1,8 @@
# Skolemization and functions

To learn more about how higher-ranked trait bounds work in the _old_ trait
solver, see [this chapter][oldhrtb] of the rustc-guide. To learn more about
how they work in the _new_ trait solver, see [this chapter][newhrtb].

One of the trickiest and most subtle aspects of regions is dealing
with higher-ranked things, which include bound region variables, such
as function types. If you want to understand the situation in depth, I
strongly suggest reading this paper (which is, admittedly, very long,
but you don't have to read the whole thing):

http://research.microsoft.com/en-us/um/people/simonpj/papers/higher-rank/

Although my explanation will never compete with SPJ's (for one thing,
his is approximately 100 pages), I will attempt to explain the basic
problem and how we solve it. Note that the paper only discusses
subtyping, not the computation of LUB/GLB.
The problem we are addressing is that there is a kind of subtyping
between functions with bound region parameters. Consider, for
example, whether the following relation holds:

    for<'a> fn(&'a isize) <: for<'b> fn(&'b isize)? (Yes, a => b)

The answer is that of course it does. These two types are basically
the same, except that in one we used the name `a` and in the other we
used the name `b`.
In the examples that follow, it becomes very important to know whether
a lifetime is bound in a function type (that is, is a lifetime
parameter) or appears free (is defined in some outer scope).
Therefore, from now on I will always write the bindings explicitly,
using the Rust syntax `for<'a> fn(&'a isize)` to indicate that `a` is a
lifetime parameter.
Now let's consider two more function types. Here, we assume that the
`'b` lifetime is defined somewhere outside and hence is not a lifetime
parameter bound by the function type (it "appears free"):

    for<'a> fn(&'a isize) <: fn(&'b isize)? (Yes, a => b)

This subtyping relation does in fact hold. To see why, you have to
consider what subtyping means. One way to look at `T1 <: T2` is to
say that it means that it is always ok to treat an instance of `T1` as
if it had the type `T2`. So, with our functions, it is always ok to
treat a function that can take pointers with any lifetime as if it
were a function that can only take a pointer with the specific
lifetime `'b`. After all, `'b` is a lifetime, and the function can
take values of any lifetime.
You can also look at subtyping as the *is a* relationship. This amounts
to the same thing: a function that accepts pointers with any lifetime
*is a* function that accepts pointers with some specific lifetime.
So, what if we reverse the order of the two function types, like this:

    fn(&'b isize) <: for<'a> fn(&'a isize)? (No)

Does the subtyping relationship still hold? The answer of course is
no. In this case, the function accepts *only the lifetime `'b`*,
so it is not reasonable to treat it as if it were a function that
accepted any lifetime.
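
To make this concrete, here is a minimal sketch in present-day Rust (using
`i32` in place of `isize`); the commented line shows the direction the
compiler rejects:

```rust
fn any_lifetime(_: &i32) {}  // as a fn pointer: for<'a> fn(&'a i32)

fn demo<'b>(x: &'b i32) {
    // for<'a> fn(&'a i32) <: fn(&'b i32) -- holds, so this compiles:
    let f: fn(&'b i32) = any_lifetime;
    f(x);

    // fn(&'b i32) <: for<'a> fn(&'a i32) -- does not hold;
    // uncommenting the next line is a compile error:
    // let g: for<'a> fn(&'a i32) = f;
}
```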
What about these two examples:

    for<'a,'b> fn(&'a isize, &'b isize) <: for<'a>    fn(&'a isize, &'a isize)? (Yes)
    for<'a>    fn(&'a isize, &'a isize) <: for<'a,'b> fn(&'a isize, &'b isize)? (No)

Here, it is true that functions which take two pointers with any two
lifetimes can be treated as if they only accepted two pointers with
the same lifetime, but not the reverse.
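
The same point, as a small compilable sketch:

```rust
fn two_lifetimes<'a, 'b>(_: &'a i32, _: &'b i32) {}

fn demo() {
    // for<'a,'b> fn(&'a _, &'b _) <: for<'a> fn(&'a _, &'a _) -- holds:
    let f: for<'a> fn(&'a i32, &'a i32) = two_lifetimes;
    let (x, y) = (1, 2);
    f(&x, &y);

    // The reverse direction does not hold and would be rejected:
    // let g: for<'a, 'b> fn(&'a i32, &'b i32) = f;
}
```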
## The algorithm
Here is the algorithm we use to perform the subtyping check:
1. Replace all bound regions in the subtype with new variables
2. Replace all bound regions in the supertype with placeholder
equivalents. A "placeholder" region is just a new fresh region
name.
3. Check that the parameter and return types match as normal
4. Ensure that no placeholder regions 'leak' into region variables
visible from "the outside"
Let's walk through some examples and see how this algorithm plays out.
#### First example
We'll start with the first example, which was:

    1. for<'a> fn(&'a T) <: for<'b> fn(&'b T)? Yes: a -> b

After steps 1 and 2 of the algorithm we will have replaced the types
like so:

    1. fn(&'A T) <: fn(&'x T)?

Here the upper case `&A` indicates a *region variable*, that is, a
region whose value is being inferred by the system. I also replaced
`&b` with `&x`; I'll use letters late in the alphabet (`x`, `y`, `z`)
to indicate placeholder region names. We can assume they don't appear
elsewhere. Note that neither the sub- nor the supertype binds any
region names anymore (as indicated by the absence of a `for<...>` binder).
The next step is to check that the parameter types match. Because
parameters are contravariant, this means that we check whether:

    &'x T <: &'A T

Region pointers are contravariant, so this implies that

    &A <= &x

must hold, where `<=` is the subregion relationship. Processing
*this* constraint simply adds a constraint into our graph that `&A <=
&x`, and is considered successful (it can, for example, be satisfied by
choosing the value `&x` for `&A`).
So far we have encountered no error, so the subtype check succeeds.
#### The third example
Now let's look at the third example, which was:

    3. fn(&'a T) <: for<'b> fn(&'b T)? No!

After steps 1 and 2 of the algorithm we will have replaced the types
like so:

    3. fn(&'a T) <: fn(&'x T)?

This looks pretty much the same as before, except that on the LHS
`'a` was not bound, and hence was left as-is and not replaced with
a variable. The next step is again to check that the parameter types
match. This will ultimately require (as before) that `'a <= &x`
must hold: but this does not hold. `'a` and `x` are both distinct
free regions. So the subtype check fails.
#### Checking for placeholder leaks
You may be wondering about that mysterious last step in the algorithm.
So far it has not been relevant. The purpose of that last step is to
catch something like *this*:

    for<'a> fn() -> fn(&'a T) <: fn() -> for<'b> fn(&'b T)? No.

Here the function types are the same but for where the binding occurs.
The subtype returns a function that expects a value in precisely one
region. The supertype returns a function that expects a value in any
region. If we allow an instance of the subtype to be used where the
supertype is expected, then someone could call the fn and think that
the return value has type `for<'b> fn(&'b T)` when it really has type
`fn(&'a T)` (this is case #3, above). Bad.
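
For the record, rustc does reject this coercion today; here is a minimal
sketch of the case above (the function names are hypothetical):

```rust
// Returns a function that only accepts references with the one lifetime 'a.
fn make<'a>() -> fn(&'a i32) {
    |_| {}
}

fn demo() {
    // Claiming the *returned* function accepts any lifetime is rejected;
    // uncommenting the next line is a compile error (the leak is caught):
    // let f: fn() -> for<'b> fn(&'b i32) = make;
}
```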
So let's step through what happens when we perform this subtype check.
We first replace the bound regions in the subtype (the supertype has
no bound regions). This gives us:

    fn() -> fn(&'A T) <: fn() -> for<'b> fn(&'b T)?

Now we compare the return types, which are covariant, and hence we have:

    fn(&'A T) <: for<'b> fn(&'b T)?

Here we replace the bound region in the supertype with a placeholder to yield:

    fn(&'A T) <: fn(&'x T)?

And then proceed to compare the argument types:

    &'x T <: &'A T
    'A <= 'x

Finally, this is where it gets interesting! This is where an error
*should* be reported. But in fact this will not happen. The reason why
is that `A` is a variable: we will infer that its value is the fresh
region `x` and think that everything is happy. In fact, this behavior
is *necessary*: it was key to the first example we walked through.
The difference between this example and the first one is that the variable
`A` already existed at the point where the placeholders were added. In
the first example, you had two functions:

    for<'a> fn(&'a T) <: for<'b> fn(&'b T)

and hence `&A` and `&x` were created "together". In general, the
intention of the placeholder names is that they are supposed to be
fresh names that could never be equal to anything from the outside.
But when inference comes into play, we might not be respecting this
rule.
So the way we solve this is to add a fourth step that examines the
constraints that refer to placeholder names. Basically, consider a
non-directed version of the constraint graph. Let `Tainted(x)` be the
set of all things reachable from a placeholder variable `x`.
`Tainted(x)` should not contain any regions that existed before the
step at which the placeholders were created. So this case here
would fail because `&x` was created alone, but is relatable to `&A`.
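
As a toy illustration (this is not rustc's code), `Tainted(x)` amounts to
reachability in the undirected constraint graph:

```rust
use std::collections::{HashMap, HashSet};

// A toy sketch of `Tainted(r)`: everything reachable from `r` when the
// constraint edges are treated as undirected. Regions are plain `u32` ids.
fn tainted(edges: &[(u32, u32)], r: u32) -> HashSet<u32> {
    let mut adj: HashMap<u32, Vec<u32>> = HashMap::new();
    for &(a, b) in edges {
        adj.entry(a).or_default().push(b);
        adj.entry(b).or_default().push(a);
    }
    let mut seen: HashSet<u32> = HashSet::new();
    let mut stack = vec![r];
    seen.insert(r);
    while let Some(n) = stack.pop() {
        for &m in adj.get(&n).into_iter().flatten() {
            if seen.insert(m) {
                stack.push(m);
            }
        }
    }
    seen
}
```

The leak check then fails if `tainted` for a placeholder contains any
region that existed before the placeholders were created.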
## Computing the LUB and GLB
The paper I pointed you at is written for Haskell. It does not
therefore consider subtyping, and in particular does not consider
LUB or GLB computation. We have to consider this. Here is the
algorithm I implemented.
First though, let's discuss what we are trying to compute in more
detail. The LUB is basically the "common supertype" and the GLB is
the "common subtype"; one catch is that the LUB should be the
*most specific* common supertype and the GLB should be the *most
general* common subtype (as opposed to any common supertype or any
common subtype).
Anyway, to help clarify, here is a table containing some function
pairs and their LUB/GLB (for conciseness, in this table, I'm just
including the lifetimes here, not the rest of the types, and I'm
writing `fn<>` instead of `for<> fn`):
```
Type 1                 Type 2                 LUB                GLB
fn<'a>('a)             fn('X)                 fn('X)             fn<'a>('a)
fn('a)                 fn('X)                 --                 fn<'a>('a)
fn<'a,'b>('a, 'b)      fn<'x>('x, 'x)         fn<'a>('a, 'a)     fn<'a,'b>('a, 'b)
fn<'a,'b>('a, 'b, 'a)  fn<'x,'y>('x, 'y, 'y)  fn<'a>('a, 'a, 'a) fn<'a,'b,'c>('a,'b,'c)
```
### Conventions
I use lower-case letters (e.g., `&a`) for bound regions and upper-case
letters for free regions (`&A`). Region variables are written with a
dollar sign (e.g., `$a`). I will try to remember to enumerate the
bound regions on the fn type as well (e.g., `for<'a> fn(&a)`).
### High-level summary
Both the LUB and the GLB algorithms work in a similar fashion. They
begin by replacing all bound regions (on both sides) with fresh region
inference variables. Therefore, both functions are converted to types
that contain only free regions. We can then compute the LUB/GLB in a
straightforward way, as described in `combine.rs`. This results in an
interim type T. The algorithms then examine the regions that appear
in T and try to, in some cases, replace them with bound regions to
yield the final result.
To decide whether to replace a region `R` that appears in `T` with
a bound region, the algorithms make use of two bits of
information. First is a set `V` that contains all region
variables created as part of the LUB/GLB computation (roughly; see
`region_vars_confined_to_snapshot()` for full details). `V` will
contain the region variables created to replace the bound regions
in the input types, but it also contains 'intermediate' variables
created to represent the LUB/GLB of individual regions.
Basically, when asked to compute the LUB/GLB of a region variable
with another region, the inferencer cannot oblige immediately
since the values of those variables are not known. Therefore, it
creates a new variable that is related to the two regions. For
example, the LUB of two variables `$x` and `$y` is a fresh
variable `$z` that is constrained such that `$x <= $z` and `$y <=
$z`. So `V` will contain these intermediate variables as well.
The other important factor in deciding how to replace a region in T is
the function `Tainted($r)` which, for a region variable, identifies
all regions that the region variable is related to in some way
(`Tainted()` made an appearance in the subtype computation as well).
### LUB
The LUB algorithm proceeds in three steps:
1. Replace all bound regions (on both sides) with fresh region
inference variables.
2. Compute the LUB "as normal", meaning compute the GLB of each
pair of argument types and the LUB of the return types and
so forth. Combine those to a new function type `F`.
3. Replace each region `R` that appears in `F` as follows:
   - Let `V` be the set of variables created during steps 1 and 2
     of the LUB computation, as described in the previous section.
   - If `R` is not in `V`, replace `R` with itself.
   - If `Tainted(R)` contains a region that is not in `V`,
     replace `R` with itself.
   - Otherwise, select the earliest variable in `Tainted(R)` that originates
     from the left-hand side and replace `R` with the bound region that
     this variable was a replacement for.
So, let's work through the simplest example: `fn(&A)` and `for<'a> fn(&a)`.
In this case, `&a` will be replaced with `$a` and the interim LUB type
`fn($b)` will be computed, where `$b=GLB(&A,$a)`. Therefore, `V =
{$a, $b}` and `Tainted($b) = { $b, $a, &A }`. When we go to replace
`$b`, we find that `Tainted($b)` contains `&A`, which is not a member
of `V`, so we leave `$b` as is. When region inference happens, `$b`
will be resolved to `&A`, as we wanted.
Let's look at a more complex one: `fn(&a, &b)` and `fn(&x, &x)`. In
this case, we'll end up with a (pre-replacement) LUB type of `fn($g,
$h)` and a graph that looks like:
```
     $a        $b     *--$x
       \        \    /  /
        \        $h-*  /
         $g-----------*
```
Here `$g` and `$h` are fresh variables that are created to represent
the LUB/GLB of things requiring inference. This means that `V` and
`Tainted` will look like:
```
V = {$a, $b, $g, $h, $x}
Tainted($g) = Tainted($h) = { $a, $b, $h, $g, $x }
```
Therefore we replace both `$g` and `$h` with `$a`, and end up
with the type `fn(&a, &a)`.
### GLB
The procedure for computing the GLB is similar. The difference lies
in computing the replacements for the various variables. For each
region `R` that appears in the type `F`, we again compute `Tainted(R)`
and examine the results:
1. If `R` is not in `V`, it is not replaced.
2. Else, if `Tainted(R)` contains only variables in `V`, and it
contains exactly one variable from the LHS and one variable from
the RHS, then `R` can be mapped to the bound version of the
variable from the LHS.
3. Else, if `Tainted(R)` contains no variable from the LHS and no
variable from the RHS, then `R` can be mapped to itself.
4. Else, `R` is mapped to a fresh bound variable.
These rules are pretty complex. Let's look at some examples to see
how they play out.
Our first example was `fn(&a)` and `fn(&X)`. In this case, `&a` will
be replaced with `$a` and we will ultimately compute a
(pre-replacement) GLB type of `fn($g)` where `$g=LUB($a,&X)`.
Therefore, `V={$a,$g}` and `Tainted($g)={$g,$a,&X}`. To find the
replacement for `$g` we consult the rules above:
- Rule (1) does not apply because `$g \in V`
- Rule (2) does not apply because `&X \in Tainted($g)`
- Rule (3) does not apply because `$a \in Tainted($g)`
- Hence, by rule (4), we replace `$g` with a fresh bound variable `&z`.
So our final result is `fn(&z)`, which is correct.
The next example is `fn(&A)` and `fn(&Z)`. In this case, we will again
have a (pre-replacement) GLB of `fn($g)`, where `$g = LUB(&A,&Z)`.
Therefore, `V={$g}` and `Tainted($g) = {$g, &A, &Z}`. In this case,
by rule (3), `$g` is mapped to itself, and hence the result is
`fn($g)`. This result is correct (in this case, at least), but it is
indicative of a case that *can* lead us into concluding that there is
no GLB when in fact a GLB does exist. See the section "Questionable
Results" below for more details.
The next example is `fn(&a, &b)` and `fn(&c, &c)`. In this case, as
before, we'll end up with `F=fn($g, $h)` where `Tainted($g) =
Tainted($h) = {$g, $h, $a, $b, $c}`. Only rule (4) applies and hence
we'll select fresh bound variables `y` and `z` and wind up with
`fn(&y, &z)`.
For the last example, let's consider what may seem trivial, but is
not: `fn(&a, &a)` and `fn(&x, &x)`. In this case, we'll get `F=fn($g,
$h)` where `Tainted($g) = {$g, $a, $x}` and `Tainted($h) = {$h, $a,
$x}`. Both of these sets contain exactly one bound variable from each
side, so we'll map them both to `&a`, resulting in `fn(&a, &a)`, which
is the desired result.
### Shortcomings and correctness
You may be wondering whether these algorithms are correct. The answer
is "sort of". There are definitely cases where they fail to compute a
result even though a correct result exists. I believe, though, that
if they succeed, then the result is valid, and I will attempt to
convince you. The basic argument is that the "pre-replacement" step
computes a set of constraints. The replacements, then, attempt to
satisfy those constraints, using bound identifiers where needed.
For now I will briefly go over the cases for LUB/GLB and identify
their intent:
- LUB:
  - The region variables that are substituted in place of bound regions
    are intended to collect constraints on those bound regions.
  - If `Tainted(R)` contains only values in `V`, then this region is
    unconstrained and can therefore be generalized; otherwise, it cannot.
- GLB:
  - The region variables that are substituted in place of bound regions
    are intended to collect constraints on those bound regions.
  - If `Tainted(R)` contains exactly one variable from each side, and
    only variables in `V`, that indicates that those two bound regions
    must be equated.
  - Otherwise, if `Tainted(R)` references any variables from the left or
    right side, then it is trying to combine a bound region with a free
    one or multiple bound regions, so we need to select fresh bound regions.
Sorry this is more of a shorthand to myself. I will try to write up something
more convincing in the future.
#### Where are the algorithms wrong?

- The pre-replacement computation can fail even though using a
  bound region would have succeeded.
- We will compute `GLB(fn(fn($a)), fn(fn($b)))` as `fn($c)` where `$c` is the
  GLB of `$a` and `$b`. But if inference finds that `$a` and `$b` must be mapped
  to regions without a GLB, then this is effectively a failure to compute
  the GLB. However, the result `fn<$c>(fn($c))` is a valid GLB.
[oldhrtb]: https://rust-lang.github.io/rustc-guide/traits/hrtb.html
[newhrtb]: https://rust-lang.github.io/rustc-guide/borrow_check/region_inference.html#placeholders-and-universes


@@ -2,8 +2,12 @@
> WARNING: This README is obsolete and will be removed soon! For
> more info on how the current borrowck works, see the [rustc guide].
>
> As of edition 2018, region inference is done using non-lexical lifetimes
> (NLL), which is described in the guide and in [this RFC].
[rustc guide]: https://rust-lang.github.io/rustc-guide/mir/borrowck.html
[this RFC]: https://github.com/rust-lang/rfcs/blob/master/text/2094-nll.md
## Terminology


@@ -1,77 +1,3 @@
# Region constraint collection
For info on how the current borrowck works, see the [rustc guide].
[rustc guide]: https://rust-lang.github.io/rustc-guide/mir/borrowck.html
## Terminology
Note that we use the terms region and lifetime interchangeably.
## Introduction
As described in the rustc guide [chapter on type inference][ti], region
inference, unlike normal type inference (which is similar in spirit to
H-M and thus works progressively), works by accumulating constraints
over the course of a function. Then, at the end of processing a
function, we process and solve the constraints all at once.
[ti]: https://rust-lang.github.io/rustc-guide/type-inference.html
The constraints are always of one of three possible forms:
- `ConstrainVarSubVar(Ri, Rj)` states that region variable Ri must be
a subregion of Rj
- `ConstrainRegSubVar(R, Ri)` states that the concrete region R (which
must not be a variable) must be a subregion of the variable Ri
- `ConstrainVarSubReg(Ri, R)` states the variable Ri should be less
than the concrete region R. This is kind of deprecated and ought to
be replaced with a verify (they essentially play the same role).
In addition to constraints, we also gather up a set of "verifys"
(what, you don't think Verify is a noun? Get used to it, my
friend!). These represent relations that must hold but which don't
influence inference proper. They take the form of:
- `VerifyRegSubReg(Ri, Rj)` indicates that Ri <= Rj must hold,
where Rj is not an inference variable (and Ri may or may not contain
one). This doesn't influence inference because we will already have
inferred Ri to be as small as possible, so then we just test whether
that result was less than Rj or not.
- `VerifyGenericBound(R, Vb)` is a more complex expression which tests
  that the region R must satisfy the bound `Vb`. The bounds themselves
  may have structure like "must outlive one of the following regions"
  or "must outlive ALL of the following regions". These bounds arise
  from constraints like `T: 'a` -- if we know that `T: 'b` and `T: 'c`
  (say, from where clauses), then we can conclude that `T: 'a` if `'b:
  'a` *or* `'c: 'a`.
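
As a rough sketch with stand-in types (not rustc's actual definitions),
the constraint and verify forms described above amount to:

```rust
type RegionVid = u32;       // a region inference variable
type Region = &'static str; // stand-in for a concrete region, e.g. "'static"

enum Constraint {
    VarSubVar(RegionVid, RegionVid), // Ri <= Rj
    RegSubVar(Region, RegionVid),    // R  <= Ri (R is concrete)
    VarSubReg(RegionVid, Region),    // Ri <= R  (R is concrete)
}

enum Verify {
    RegSubReg(Region, Region),         // Ri <= Rj, checked after inference
    GenericBound(Region, Vec<Region>), // R must satisfy the listed bounds
}
```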
## Building up the constraints
Variables and constraints are created using the following methods:
- `new_region_var()` creates a new, unconstrained region variable;
- `make_subregion(Ri, Rj)` states that Ri is a subregion of Rj;
- `lub_regions(Ri, Rj) -> Rk` returns a region Rk which is
  the smallest region that is greater than both Ri and Rj;
- `glb_regions(Ri, Rj) -> Rk` returns a region Rk which is
  the greatest region that is smaller than both Ri and Rj.
The actual region resolution algorithm is not entirely
obvious, though it is also not overly complex.
## Snapshotting
It is also permitted to try (and rollback) changes to the graph. This
is done by invoking `start_snapshot()`, which returns a value. Then
later you can call `rollback_to()` which undoes the work.
Alternatively, you can call `commit()` which ends all snapshots.
Snapshots can be recursive---so you can start a snapshot when another
is in progress, but only the root snapshot can "commit".
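
A self-contained toy version of this snapshot scheme (an undo log with
truncate-on-rollback; not the real implementation) might look like:

```rust
struct Constraints {
    log: Vec<(u32, u32)>, // (sub, sup) constraints, in insertion order
}

struct Snapshot {
    len: usize, // length of the log when the snapshot started
}

impl Constraints {
    fn make_subregion(&mut self, sub: u32, sup: u32) {
        self.log.push((sub, sup));
    }
    fn start_snapshot(&self) -> Snapshot {
        Snapshot { len: self.log.len() }
    }
    fn rollback_to(&mut self, snapshot: Snapshot) {
        self.log.truncate(snapshot.len); // undo everything since the snapshot
    }
}

fn main() {
    let mut cx = Constraints { log: Vec::new() };
    cx.make_subregion(0, 1);
    let snap = cx.start_snapshot();
    cx.make_subregion(1, 2); // speculative work...
    cx.rollback_to(snap);    // ...undone
    assert_eq!(cx.log.len(), 1);
}
```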
## Skolemization
For a discussion on skolemization and higher-ranked subtyping, please
see the module `middle::infer::higher_ranked::doc`.


@@ -1,302 +1,3 @@
# The Rust Compiler Query System
The Compiler Query System is the key to our new demand-driven
organization. The idea is pretty simple. You have various queries
that compute things about the input -- for example, there is a query
called `type_of(def_id)` that, given the def-id of some item, will
compute the type of that item and return it to you.
Query execution is **memoized** -- so the first time you invoke a
query, it will go do the computation, but the next time, the result is
returned from a hashtable. Moreover, query execution fits nicely into
**incremental computation**; the idea is roughly that, when you do a
query, the result **may** be returned to you by loading stored data
from disk (but that's a separate topic we won't discuss further here).
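
As a tiny sketch of the memoization idea (plain `HashMap`, with `u32`
standing in for `DefId` and `String` for the computed type; nothing like
rustc's actual machinery):

```rust
use std::collections::HashMap;

struct Queries {
    type_of_cache: HashMap<u32, String>,
}

impl Queries {
    fn type_of(&mut self, def_id: u32) -> String {
        if let Some(ty) = self.type_of_cache.get(&def_id) {
            return ty.clone(); // later calls: clone the result out of the cache
        }
        let ty = type_of_provider(def_id); // first call: run the provider
        self.type_of_cache.insert(def_id, ty.clone());
        ty
    }
}

// Stand-in for a real provider function.
fn type_of_provider(_def_id: u32) -> String {
    "i32".to_string()
}
```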
The overall vision is that, eventually, the entire compiler
control-flow will be query driven. There will effectively be one
top-level query ("compile") that will run compilation on a crate; this
will in turn demand information about that crate, starting from the
*end*. For example:
- This "compile" query might demand to get a list of codegen-units
(i.e., modules that need to be compiled by LLVM).
- But computing the list of codegen-units would invoke some subquery
that returns the list of all modules defined in the Rust source.
- That query in turn would invoke something asking for the HIR.
- This keeps going further and further back until we wind up doing the
actual parsing.
However, that vision is not fully realized. Still, big chunks of the
compiler (for example, generating MIR) work exactly like this.
### Invoking queries
To invoke a query is simple. The tcx ("type context") offers a method
for each defined query. So, for example, to invoke the `type_of`
query, you would just do this:
```rust
let ty = tcx.type_of(some_def_id);
```
### Cycles between queries
Currently, cycles during query execution should always result in a
compilation error. Typically, they arise because of illegal programs
that contain cyclic references they shouldn't (though sometimes they
arise because of compiler bugs, in which case we need to factor our
queries in a more fine-grained fashion to avoid them).
However, it is nonetheless often useful to *recover* from a cycle
(after reporting an error, say) and try to soldier on, so as to give a
better user experience. In order to recover from a cycle, you don't
get to use the nice method-call-style syntax. Instead, you invoke
using the `try_get` method, which looks roughly like this:
```rust
use ty::query::queries;
...
match queries::type_of::try_get(tcx, DUMMY_SP, self.did) {
    Ok(result) => {
        // no cycle occurred! You can use `result`
    }
    Err(err) => {
        // A cycle occurred! The error value `err` is a `DiagnosticBuilder`,
        // meaning essentially an "in-progress", not-yet-reported error message.
        // See below for more details on what to do here.
    }
}
```
So, if you get back an `Err` from `try_get`, then a cycle *did* occur. This means that
you must ensure that a compiler error message is reported. You can do that in two ways:
The simplest is to invoke `err.emit()`. This will emit the cycle error to the user.
However, often cycles happen because of an illegal program, and you
know at that point that an error either already has been reported or
will be reported due to this cycle by some other bit of code. In that
case, you can invoke `err.cancel()` to not emit any error. It is
traditional to then invoke:
```
tcx.sess.delay_span_bug(some_span, "some message")
```
`delay_span_bug()` is a helper that says: we expect a compilation
error to have happened or to happen in the future; so, if compilation
ultimately succeeds, make an ICE with the message `"some
message"`. This is basically just a precaution in case you are wrong.
### How the compiler executes a query
So you may be wondering what happens when you invoke a query
method. The answer is that, for each query, the compiler maintains a
cache -- if your query has already been executed, then the answer is
simple: we clone the return value out of the cache and return it
(therefore, you should try to ensure that the return types of queries
are cheaply cloneable; insert an `Rc` if necessary).
#### Providers
If, however, the query is *not* in the cache, then the compiler will
try to find a suitable **provider**. A provider is a function that has
been defined and linked into the compiler somewhere that contains the
code to compute the result of the query.
**Providers are defined per-crate.** The compiler maintains,
internally, a table of providers for every crate, at least
conceptually. Right now, there are really two sets: the providers for
queries about the **local crate** (that is, the one being compiled)
and providers for queries about **external crates** (that is,
dependencies of the local crate). Note that what determines the crate
that a query is targeting is not the *kind* of query, but the *key*.
For example, when you invoke `tcx.type_of(def_id)`, that could be a
local query or an external query, depending on what crate the `def_id`
is referring to (see the `self::keys::Key` trait for more information
on how that works).
Providers always have the same signature:
```rust
fn provider<'cx, 'tcx>(tcx: TyCtxt<'cx, 'tcx, 'tcx>,
                       key: QUERY_KEY)
                       -> QUERY_RESULT
{
    ...
}
```
Providers take two arguments: the `tcx` and the query key. Note also
that they take the *global* tcx (i.e., they use the `'tcx` lifetime
twice), rather than taking a tcx with some active inference context.
They return the result of the query.
#### How providers are set up

When the tcx is created, it is given the providers by its creator using
the `Providers` struct. This struct is generated by the macros here, but it
is basically a big list of function pointers:
```rust
struct Providers {
    type_of: for<'cx, 'tcx> fn(TyCtxt<'cx, 'tcx, 'tcx>, DefId) -> Ty<'tcx>,
    ...
}
```
At present, we have one copy of the struct for local crates, and one
for external crates, though the plan is that we may eventually have
one per crate.
These `Provider` structs are ultimately created and populated by
`librustc_driver`, but it does this by distributing the work
throughout the other `rustc_*` crates. This is done by invoking
various `provide` functions. These functions tend to look something
like this:
```rust
pub fn provide(providers: &mut Providers) {
    *providers = Providers {
        type_of,
        ..*providers
    };
}
```
That is, they take an `&mut Providers` and mutate it in place. Usually
we use the formulation above just because it looks nice, but you could
just as well write `providers.type_of = type_of`, which would be equivalent.
(Here, `type_of` would be a top-level function, defined as we saw
before.) So, if we want to add a provider for some other query,
let's call it `fubar`, into the crate above, we might modify the `provide()`
function like so:
```rust
pub fn provide(providers: &mut Providers) {
    *providers = Providers {
        type_of,
        fubar,
        ..*providers
    };
}

fn fubar<'cx, 'tcx>(tcx: TyCtxt<'cx, 'tcx, 'tcx>, key: DefId) -> Fubar<'tcx> { ... }
```
NB. Most of the `rustc_*` crates only provide **local
providers**. Almost all **extern providers** wind up going through the
`rustc_metadata` crate, which loads the information from the crate
metadata. But in some cases there are crates that provide queries for
*both* local and external crates, in which case they define both a
`provide` and a `provide_extern` function that `rustc_driver` can
invoke.
### Adding a new kind of query
So suppose you want to add a new kind of query; how do you do so?
Well, defining a query takes place in two steps:

1. first, you have to specify the query name and arguments; and then,
2. you have to supply query providers where needed.
To specify the query name and arguments, you simply add an entry
to the big macro invocation in `mod.rs`. This will probably have changed
by the time you read this README, but at present it looks something
like:
```
define_queries! { <'tcx>
    /// Records the type of every item.
    [] fn type_of: TypeOfItem(DefId) -> Ty<'tcx>,

    ...
}
```
Each line of the macro defines one query. The line is broken up like this:

```
[] fn type_of: TypeOfItem(DefId) -> Ty<'tcx>,
^^    ^^^^^^^  ^^^^^^^^^^ ^^^^^     ^^^^^^^^
|     |        |          |         |
|     |        |          |         result type of query
|     |        |          query key type
|     |        dep-node constructor
|     name of query
query flags
```
Let's go over them one by one:
- **Query flags:** these are largely unused right now, but the intention
is that we'll be able to customize various aspects of how the query is
processed.
- **Name of query:** the name of the query method
(`tcx.type_of(..)`). Also used as the name of a struct
(`ty::query::queries::type_of`) that will be generated to represent
this query.
- **Dep-node constructor:** indicates the constructor function that
connects this query to incremental compilation. Typically, this is a
`DepNode` variant, which can be added by modifying the
`define_dep_nodes!` macro invocation in
`librustc/dep_graph/dep_node.rs`.
- However, sometimes we use a custom function, in which case the
name will be in snake case and the function will be defined at the
bottom of the file. This is typically used when the query key is
not a def-id, or just not the type that the dep-node expects.
- **Query key type:** the type of the argument to this query.
This type must implement the `ty::query::keys::Key` trait, which
defines (for example) how to map it to a crate, and so forth.
- **Result type of query:** the type produced by this query. This type
should (a) not use `RefCell` or other interior mutability and (b) be
cheaply cloneable. Interning or using `Rc` or `Arc` is recommended for
non-trivial data types.
- The one exception to those rules is the `ty::steal::Steal` type,
which is used to cheaply modify MIR in place. See the definition
of `Steal` for more details. New uses of `Steal` should **not** be
added without alerting `@rust-lang/compiler`.
So, to add a query:
- Add an entry to `define_queries!` using the format above.
- Possibly add a corresponding entry to the dep-node macro.
- Link the provider by modifying the appropriate `provide` method;
or add a new one if needed and ensure that `rustc_driver` is invoking it.
#### Query structs and descriptions
For each kind of query, the `define_queries` macro will generate a
"query struct" named after the query. This struct is a kind of
placeholder describing the query. Each such struct implements the
`self::config::QueryConfig` trait, which has associated types for the
key/value of that particular query. Basically the code generated looks
something like this:
```rust
// Dummy struct representing a particular kind of query:
pub struct type_of<'tcx> { phantom: PhantomData<&'tcx ()> }

impl<'tcx> QueryConfig for type_of<'tcx> {
    type Key = DefId;
    type Value = Ty<'tcx>;
}
```
There is an additional trait that you may wish to implement called
`self::config::QueryDescription`. This trait is used during cycle
errors to give a "human readable" name for the query, so that we can
summarize what was happening when the cycle occurred. Implementing
this trait is optional if the query key is `DefId`, but if you *don't*
implement it, you get a pretty generic error ("processing `foo`...").
You can put new impls into the `config` module. They look something like this:
```rust
impl<'tcx> QueryDescription for queries::type_of<'tcx> {
    fn describe(tcx: TyCtxt<'_, '_, '_>, key: DefId) -> String {
        format!("computing the type of `{}`", tcx.item_path_str(key))
    }
}
```
For more information about how the query system works, see the [rustc guide].
[rustc guide]: https://rust-lang.github.io/rustc-guide/query.html


@@ -1,81 +0,0 @@
The `ObligationForest` is a utility data structure used in trait
matching to track the set of outstanding obligations (those not yet
resolved to success or error). It also tracks the "backtrace" of each
pending obligation (why we are trying to figure this out in the first
place).
### External view
`ObligationForest` supports two main public operations (there are a
few others not discussed here):
1. Add a new root obligation (`push_tree`).
2. Process the pending obligations (`process_obligations`).
When a new obligation `N` is added, it becomes the root of an
obligation tree. This tree can also carry some per-tree state `T`,
which is given at the same time. This tree is a singleton to start, so
`N` is both the root and the only leaf. Each time the
`process_obligations` method is called, it will invoke its callback
with every pending obligation (so that will include `N`, the first
time). The callback also receives a (mutable) reference to the
per-tree state `T`. The callback should process the obligation `O`
that it is given and return one of three results:
- `Ok(None)` -> ambiguous result. Obligation was neither a success
nor a failure. It is assumed that further attempts to process the
obligation will yield the same result unless something in the
surrounding environment changes.
- `Ok(Some(C))` -> the obligation was *shallowly successful*. The
vector `C` is a list of subobligations. The meaning of this is that
`O` was successful on the assumption that all the obligations in `C`
are also successful. Therefore, `O` is only considered a "true"
success if `C` is empty. Otherwise, `O` is put into a suspended
state and the obligations in `C` become the new pending
obligations. They will be processed the next time you call
`process_obligations`.
- `Err(E)` -> obligation failed with error `E`. We will collect this
error and return it from `process_obligations`, along with the
"backtrace" of obligations (that is, the list of obligations up to
and including the root of the failed obligation). No further
obligations from that same tree will be processed, since the tree is
now considered to be in error.
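
In terms of plain std types, the callback contract described above is
roughly the following sketch (not the exact signature):

```rust
// O = obligation type, E = error type. The three cases map to the bullets
// above: Ok(None) ambiguous, Ok(Some(C)) shallow success with
// subobligations C, Err(E) failure.
type ProcessResult<O, E> = Result<Option<Vec<O>>, E>;

fn process_one<O, E>(_obligation: &O) -> ProcessResult<O, E> {
    Ok(None) // ambiguous: neither success nor failure yet
}
```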
When the call to `process_obligations` completes, you get back an `Outcome`,
which includes three bits of information:
- `completed`: a list of obligations where processing was fully
completed without error (meaning that all transitive subobligations
have also been completed). So, for example, if the callback from
`process_obligations` returns `Ok(Some(C))` for some obligation `O`,
then `O` will be considered completed right away if `C` is the
empty vector. Otherwise it will only be considered completed once
all the obligations in `C` have been found completed.
- `errors`: a list of errors that occurred and associated backtraces
at the time of error, which can be used to give context to the user.
- `stalled`: if true, then none of the existing obligations were
*shallowly successful* (that is, no callback returned `Ok(Some(_))`).
This implies that all obligations were either errors or returned an
ambiguous result, which means that any further calls to
`process_obligations` would simply yield back further ambiguous
results. This is used by the `FulfillmentContext` to decide when it
has reached a steady state.
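
Schematically, with stand-in types and the field names from the list above:

```rust
struct Outcome<O, E> {
    completed: Vec<O>,        // obligations (transitively) completed without error
    errors: Vec<(E, Vec<O>)>, // each error with its obligation backtrace
    stalled: bool,            // true if no callback returned Ok(Some(_))
}
```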
#### Snapshots
The `ObligationForest` supports a limited form of snapshots; see
`start_snapshot`, `commit_snapshot`, and `rollback_snapshot`. In
particular, you can use a snapshot to roll back new root
obligations. However, it is an error to attempt to
`process_obligations` during a snapshot.
### Implementation details
For the most part, comments specific to the implementation are in the
code. This file only contains a very high-level overview. Basically,
the forest is stored in a vector. Each element of the vector is a node
in some tree. Each node in the vector has the index of an (optional)
parent and (for convenience) its root (which may be itself). It also
has a current state, described by `NodeState`. After each
processing step, we compress the vector to remove completed and error
nodes, which aren't needed anymore.
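
A minimal sketch of that vector representation (simplified; the real
`NodeState` has more states):

```rust
enum NodeState {
    Pending,
    Success,
    Error,
}

struct Node<O> {
    obligation: O,
    parent: Option<usize>, // index of the parent node in the vector, if any
    root: usize,           // index of this tree's root (may be the node itself)
    state: NodeState,
}

struct ObligationForest<O> {
    nodes: Vec<Node<O>>, // compressed after each processing step
}
```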


@@ -1,9 +1,84 @@
//! The `ObligationForest` is a utility data structure used in trait
//! matching to track the set of outstanding obligations (those not yet
//! resolved to success or error). It also tracks the "backtrace" of each
//! pending obligation (why we are trying to figure this out in the first
//! place).
//!
//! ### External view
//!
//! `ObligationForest` supports two main public operations (there are a
//! few others not discussed here):
//!
//! 1. Add a new root obligation (`push_tree`).
//! 2. Process the pending obligations (`process_obligations`).
//!
//! When a new obligation `N` is added, it becomes the root of an
//! obligation tree. This tree can also carry some per-tree state `T`,
//! which is given at the same time. This tree is a singleton to start, so
//! `N` is both the root and the only leaf. Each time the
//! `process_obligations` method is called, it will invoke its callback
//! with every pending obligation (so that will include `N`, the first
//! time). The callback also receives a (mutable) reference to the
//! per-tree state `T`. The callback should process the obligation `O`
//! that it is given and return one of three results:
//!
//! - `Ok(None)` -> ambiguous result. Obligation was neither a success
//! nor a failure. It is assumed that further attempts to process the
//! obligation will yield the same result unless something in the
//! surrounding environment changes.
//! - `Ok(Some(C))` -> the obligation was *shallowly successful*. The
//! vector `C` is a list of subobligations. The meaning of this is that
//! `O` was successful on the assumption that all the obligations in `C`
//! are also successful. Therefore, `O` is only considered a "true"
//! success if `C` is empty. Otherwise, `O` is put into a suspended
//! state and the obligations in `C` become the new pending
//! obligations. They will be processed the next time you call
//! `process_obligations`.
//! - `Err(E)` -> obligation failed with error `E`. We will collect this
//! error and return it from `process_obligations`, along with the
//! "backtrace" of obligations (that is, the list of obligations up to
//! and including the root of the failed obligation). No further
//! obligations from that same tree will be processed, since the tree is
//! now considered to be in error.
//!
//! When the call to `process_obligations` completes, you get back an `Outcome`,
//! which includes three bits of information:
//!
//! - `completed`: a list of obligations where processing was fully
//! completed without error (meaning that all transitive subobligations
//! have also been completed). So, for example, if the callback from
//! `process_obligations` returns `Ok(Some(C))` for some obligation `O`,
//! then `O` will be considered completed right away if `C` is the
//! empty vector. Otherwise it will only be considered completed once
//! all the obligations in `C` have been found completed.
//! - `errors`: a list of errors that occurred and associated backtraces
//! at the time of error, which can be used to give context to the user.
//! - `stalled`: if true, then none of the existing obligations were
//! *shallowly successful* (that is, no callback returned `Ok(Some(_))`).
//! This implies that all obligations were either errors or returned an
//! ambiguous result, which means that any further calls to
//! `process_obligations` would simply yield back further ambiguous
//! results. This is used by the `FulfillmentContext` to decide when it
//! has reached a steady state.
//!
//! #### Snapshots
//!
//! The `ObligationForest` supports a limited form of snapshots; see
//! `start_snapshot`, `commit_snapshot`, and `rollback_snapshot`. In
//! particular, you can use a snapshot to roll back new root
//! obligations. However, it is an error to attempt to
//! `process_obligations` during a snapshot.
//!
//! ### Implementation details
//!
//! For the most part, comments specific to the implementation are in the
//! code. This file only contains a very high-level overview. Basically,
//! the forest is stored in a vector. Each element of the vector is a node
//! in some tree. Each node in the vector has the index of an (optional)
//! parent and (for convenience) its root (which may be itself). It also
//! has a current state, described by `NodeState`. After each
//! processing step, we compress the vector to remove completed and error
//! nodes, which aren't needed anymore.
use fx::{FxHashMap, FxHashSet};