From a6294b762859b3bb86b2a94c6857ae39beea9aeb Mon Sep 17 00:00:00 2001 From: mark Date: Tue, 15 Jan 2019 19:38:02 -0600 Subject: [PATCH 1/3] update/remove some old readmes --- .../infer/lexical_region_resolve/README.md | 4 + .../infer/region_constraints/README.md | 76 +---------------- .../obligation_forest/README.md | 81 ------------------ .../obligation_forest/mod.rs | 85 +++++++++++++++++-- 4 files changed, 85 insertions(+), 161 deletions(-) delete mode 100644 src/librustc_data_structures/obligation_forest/README.md diff --git a/src/librustc/infer/lexical_region_resolve/README.md b/src/librustc/infer/lexical_region_resolve/README.md index 4483e522f3a..56320636a67 100644 --- a/src/librustc/infer/lexical_region_resolve/README.md +++ b/src/librustc/infer/lexical_region_resolve/README.md @@ -2,8 +2,12 @@ > WARNING: This README is obsolete and will be removed soon! For > more info on how the current borrowck works, see the [rustc guide]. +> +> As of edition 2018, region inference is done using Non-lexical lifetimes, +> which is described in the guide and [this RFC]. [rustc guide]: https://rust-lang.github.io/rustc-guide/mir/borrowck.html +[this RFC]: https://github.com/rust-lang/rfcs/blob/master/text/2094-nll.md ## Terminology diff --git a/src/librustc/infer/region_constraints/README.md b/src/librustc/infer/region_constraints/README.md index 775bbf955b8..ea7fffe9dc1 100644 --- a/src/librustc/infer/region_constraints/README.md +++ b/src/librustc/infer/region_constraints/README.md @@ -1,77 +1,3 @@ -# Region constraint collection - -> WARNING: This README is obsolete and will be removed soon! For -> more info on how the current borrowck works, see the [rustc guide]. +For info on how the current borrowck works, see the [rustc guide]. [rustc guide]: https://rust-lang.github.io/rustc-guide/mir/borrowck.html - -## Terminology - -Note that we use the terms region and lifetime interchangeably. - -## Introduction - -As described in the rustc guide [chapter on type inference][ti], and unlike -normal type inference, which is similar in spirit to H-M and thus -works progressively, the region type inference works by accumulating -constraints over the course of a function. Finally, at the end of -processing a function, we process and solve the constraints all at -once. - -[ti]: https://rust-lang.github.io/rustc-guide/type-inference.html - -The constraints are always of one of three possible forms: - -- `ConstrainVarSubVar(Ri, Rj)` states that region variable Ri must be - a subregion of Rj -- `ConstrainRegSubVar(R, Ri)` states that the concrete region R (which - must not be a variable) must be a subregion of the variable Ri -- `ConstrainVarSubReg(Ri, R)` states the variable Ri should be less - than the concrete region R. This is kind of deprecated and ought to - be replaced with a verify (they essentially play the same role). - -In addition to constraints, we also gather up a set of "verifys" -(what, you don't think Verify is a noun? Get used to it my -friend!). These represent relations that must hold but which don't -influence inference proper. These take the form of: - -- `VerifyRegSubReg(Ri, Rj)` indicates that Ri <= Rj must hold, - where Rj is not an inference variable (and Ri may or may not contain - one). This doesn't influence inference because we will already have - inferred Ri to be as small as possible, so then we just test whether - that result was less than Rj or not. -- `VerifyGenericBound(R, Vb)` is a more complex expression which tests - that the region R must satisfy the bound `Vb`. The bounds themselves - may have structure like "must outlive one of the following regions" - or "must outlive ALL of the following regions. These bounds arise - from constraints like `T: 'a` -- if we know that `T: 'b` and `T: 'c` - (say, from where clauses), then we can conclude that `T: 'a` if `'b: - 'a` *or* `'c: 'a`. - -## Building up the constraints - -Variables and constraints are created using the following methods: - -- `new_region_var()` creates a new, unconstrained region variable; -- `make_subregion(Ri, Rj)` states that Ri is a subregion of Rj -- `lub_regions(Ri, Rj) -> Rk` returns a region Rk which is - the smallest region that is greater than both Ri and Rj -- `glb_regions(Ri, Rj) -> Rk` returns a region Rk which is - the greatest region that is smaller than both Ri and Rj - -The actual region resolution algorithm is not entirely -obvious, though it is also not overly complex. - -## Snapshotting - -It is also permitted to try (and rollback) changes to the graph. This -is done by invoking `start_snapshot()`, which returns a value. Then -later you can call `rollback_to()` which undoes the work. -Alternatively, you can call `commit()` which ends all snapshots. -Snapshots can be recursive---so you can start a snapshot when another -is in progress, but only the root snapshot can "commit". - -## Skolemization - -For a discussion on skolemization and higher-ranked subtyping, please -see the module `middle::infer::higher_ranked::doc`. diff --git a/src/librustc_data_structures/obligation_forest/README.md b/src/librustc_data_structures/obligation_forest/README.md deleted file mode 100644 index 982a2bacce1..00000000000 --- a/src/librustc_data_structures/obligation_forest/README.md +++ /dev/null @@ -1,81 +0,0 @@ -The `ObligationForest` is a utility data structure used in trait -matching to track the set of outstanding obligations (those not yet -resolved to success or error). It also tracks the "backtrace" of each -pending obligation (why we are trying to figure this out in the first -place). - -### External view - -`ObligationForest` supports two main public operations (there are a -few others not discussed here): - -1. Add a new root obligations (`push_tree`). -2. Process the pending obligations (`process_obligations`). - -When a new obligation `N` is added, it becomes the root of an -obligation tree. This tree can also carry some per-tree state `T`, -which is given at the same time. This tree is a singleton to start, so -`N` is both the root and the only leaf. Each time the -`process_obligations` method is called, it will invoke its callback -with every pending obligation (so that will include `N`, the first -time). The callback also receives a (mutable) reference to the -per-tree state `T`. The callback should process the obligation `O` -that it is given and return one of three results: - -- `Ok(None)` -> ambiguous result. Obligation was neither a success - nor a failure. It is assumed that further attempts to process the - obligation will yield the same result unless something in the - surrounding environment changes. -- `Ok(Some(C))` - the obligation was *shallowly successful*. The - vector `C` is a list of subobligations. The meaning of this is that - `O` was successful on the assumption that all the obligations in `C` - are also successful. Therefore, `O` is only considered a "true" - success if `C` is empty. Otherwise, `O` is put into a suspended - state and the obligations in `C` become the new pending - obligations. They will be processed the next time you call - `process_obligations`. -- `Err(E)` -> obligation failed with error `E`. We will collect this - error and return it from `process_obligations`, along with the - "backtrace" of obligations (that is, the list of obligations up to - and including the root of the failed obligation). No further - obligations from that same tree will be processed, since the tree is - now considered to be in error. - -When the call to `process_obligations` completes, you get back an `Outcome`, -which includes three bits of information: - -- `completed`: a list of obligations where processing was fully - completed without error (meaning that all transitive subobligations - have also been completed). So, for example, if the callback from - `process_obligations` returns `Ok(Some(C))` for some obligation `O`, - then `O` will be considered completed right away if `C` is the - empty vector. Otherwise it will only be considered completed once - all the obligations in `C` have been found completed. -- `errors`: a list of errors that occurred and associated backtraces - at the time of error, which can be used to give context to the user. -- `stalled`: if true, then none of the existing obligations were - *shallowly successful* (that is, no callback returned `Ok(Some(_))`). - This implies that all obligations were either errors or returned an - ambiguous result, which means that any further calls to - `process_obligations` would simply yield back further ambiguous - results. This is used by the `FulfillmentContext` to decide when it - has reached a steady state. - -#### Snapshots - -The `ObligationForest` supports a limited form of snapshots; see -`start_snapshot`; `commit_snapshot`; and `rollback_snapshot`. In -particular, you can use a snapshot to roll back new root -obligations. However, it is an error to attempt to -`process_obligations` during a snapshot. - -### Implementation details - -For the most part, comments specific to the implementation are in the -code. This file only contains a very high-level overview. Basically, -the forest is stored in a vector. Each element of the vector is a node -in some tree. Each node in the vector has the index of an (optional) -parent and (for convenience) its root (which may be itself). It also -has a current state, described by `NodeState`. After each -processing step, we compress the vector to remove completed and error -nodes, which aren't needed anymore. diff --git a/src/librustc_data_structures/obligation_forest/mod.rs b/src/librustc_data_structures/obligation_forest/mod.rs index 59497f0df18..9dd7d204f03 100644 --- a/src/librustc_data_structures/obligation_forest/mod.rs +++ b/src/librustc_data_structures/obligation_forest/mod.rs @@ -1,9 +1,84 @@ //! The `ObligationForest` is a utility data structure used in trait -//! matching to track the set of outstanding obligations (those not -//! yet resolved to success or error). It also tracks the "backtrace" -//! of each pending obligation (why we are trying to figure this out -//! in the first place). See README.md for a general overview of how -//! to use this class. +//! matching to track the set of outstanding obligations (those not yet +//! resolved to success or error). It also tracks the "backtrace" of each +//! pending obligation (why we are trying to figure this out in the first +//! place). +//! +//! ### External view +//! +//! `ObligationForest` supports two main public operations (there are a +//! few others not discussed here): +//! +//! 1. Add a new root obligations (`push_tree`). +//! 2. Process the pending obligations (`process_obligations`). +//! +//! When a new obligation `N` is added, it becomes the root of an +//! obligation tree. This tree can also carry some per-tree state `T`, +//! which is given at the same time. This tree is a singleton to start, so +//! `N` is both the root and the only leaf. Each time the +//! `process_obligations` method is called, it will invoke its callback +//! with every pending obligation (so that will include `N`, the first +//! time). The callback also receives a (mutable) reference to the +//! per-tree state `T`. The callback should process the obligation `O` +//! that it is given and return one of three results: +//! +//! - `Ok(None)` -> ambiguous result. Obligation was neither a success +//! nor a failure. It is assumed that further attempts to process the +//! obligation will yield the same result unless something in the +//! surrounding environment changes. +//! - `Ok(Some(C))` - the obligation was *shallowly successful*. The +//! vector `C` is a list of subobligations. The meaning of this is that +//! `O` was successful on the assumption that all the obligations in `C` +//! are also successful. Therefore, `O` is only considered a "true" +//! success if `C` is empty. Otherwise, `O` is put into a suspended +//! state and the obligations in `C` become the new pending +//! obligations. They will be processed the next time you call +//! `process_obligations`. +//! - `Err(E)` -> obligation failed with error `E`. We will collect this +//! error and return it from `process_obligations`, along with the +//! "backtrace" of obligations (that is, the list of obligations up to +//! and including the root of the failed obligation). No further +//! obligations from that same tree will be processed, since the tree is +//! now considered to be in error. +//! +//! When the call to `process_obligations` completes, you get back an `Outcome`, +//! which includes three bits of information: +//! +//! - `completed`: a list of obligations where processing was fully +//! completed without error (meaning that all transitive subobligations +//! have also been completed). So, for example, if the callback from +//! `process_obligations` returns `Ok(Some(C))` for some obligation `O`, +//! then `O` will be considered completed right away if `C` is the +//! empty vector. Otherwise it will only be considered completed once +//! all the obligations in `C` have been found completed. +//! - `errors`: a list of errors that occurred and associated backtraces +//! at the time of error, which can be used to give context to the user. +//! - `stalled`: if true, then none of the existing obligations were +//! *shallowly successful* (that is, no callback returned `Ok(Some(_))`). +//! This implies that all obligations were either errors or returned an +//! ambiguous result, which means that any further calls to +//! `process_obligations` would simply yield back further ambiguous +//! results. This is used by the `FulfillmentContext` to decide when it +//! has reached a steady state. +//! +//! #### Snapshots +//! +//! The `ObligationForest` supports a limited form of snapshots; see +//! `start_snapshot`; `commit_snapshot`; and `rollback_snapshot`. In +//! particular, you can use a snapshot to roll back new root +//! obligations. However, it is an error to attempt to +//! `process_obligations` during a snapshot. +//! +//! ### Implementation details +//! +//! For the most part, comments specific to the implementation are in the +//! code. This file only contains a very high-level overview. Basically, +//! the forest is stored in a vector. Each element of the vector is a node +//! in some tree. Each node in the vector has the index of an (optional) +//! parent and (for convenience) its root (which may be itself). It also +//! has a current state, described by `NodeState`. After each +//! processing step, we compress the vector to remove completed and error +//! nodes, which aren't needed anymore. use fx::{FxHashMap, FxHashSet}; From 228969c696b380716ad7547e87c4b52f3df50c96 Mon Sep 17 00:00:00 2001 From: mark Date: Tue, 15 Jan 2019 19:48:37 -0600 Subject: [PATCH 2/3] remove link to removed readme --- src/README.md | 1 - 1 file changed, 1 deletion(-) diff --git a/src/README.md b/src/README.md index 65228915866..14e773286bc 100644 --- a/src/README.md +++ b/src/README.md @@ -8,7 +8,6 @@ For more information on how various parts of the compiler work, see the [rustc g There is also useful content in the following READMEs, which are gradually being moved over to the guide: - https://github.com/rust-lang/rust/tree/master/src/librustc/ty/query - https://github.com/rust-lang/rust/tree/master/src/librustc/dep_graph -- https://github.com/rust-lang/rust/blob/master/src/librustc/infer/region_constraints - https://github.com/rust-lang/rust/tree/master/src/librustc/infer/higher_ranked - https://github.com/rust-lang/rust/tree/master/src/librustc/infer/lexical_region_resolve From dcd56d11da6e1d7cb8874d27d1fbb4f6a94a71de Mon Sep 17 00:00:00 2001 From: mark Date: Fri, 18 Jan 2019 14:10:45 -0600 Subject: [PATCH 3/3] remove more old readmes --- src/librustc/infer/higher_ranked/README.md | 407 +-------------------- src/librustc/ty/query/README.md | 303 +-------------- 2 files changed, 8 insertions(+), 702 deletions(-) diff --git a/src/librustc/infer/higher_ranked/README.md b/src/librustc/infer/higher_ranked/README.md index b1ac8bae4fb..e7afaa5beb0 100644 --- a/src/librustc/infer/higher_ranked/README.md +++ b/src/librustc/infer/higher_ranked/README.md @@ -1,403 +1,8 @@ -# Skolemization and functions +To learn more about how Higher-ranked trait bounds work in the _old_ trait +solver, see [this chapter][oldhrtb] of the rustc-guide. -One of the trickiest and most subtle aspects of regions is dealing -with higher-ranked things which include bound region variables, such -as function types. I strongly suggest that if you want to understand -the situation, you read this paper (which is, admittedly, very long, -but you don't have to read the whole thing): +To learn more about how they work in the _new_ trait solver, see [this +chapter][newhrtb]. -http://research.microsoft.com/en-us/um/people/simonpj/papers/higher-rank/ - -Although my explanation will never compete with SPJ's (for one thing, -his is approximately 100 pages), I will attempt to explain the basic -problem and also how we solve it. Note that the paper only discusses -subtyping, not the computation of LUB/GLB. - -The problem we are addressing is that there is a kind of subtyping -between functions with bound region parameters. Consider, for -example, whether the following relation holds: - - for<'a> fn(&'a isize) <: for<'b> fn(&'b isize)? (Yes, a => b) - -The answer is that of course it does. These two types are basically -the same, except that in one we used the name `a` and one we used -the name `b`. - -In the examples that follow, it becomes very important to know whether -a lifetime is bound in a function type (that is, is a lifetime -parameter) or appears free (is defined in some outer scope). -Therefore, from now on I will always write the bindings explicitly, -using the Rust syntax `for<'a> fn(&'a isize)` to indicate that `a` is a -lifetime parameter. - -Now let's consider two more function types. Here, we assume that the -`'b` lifetime is defined somewhere outside and hence is not a lifetime -parameter bound by the function type (it "appears free"): - - for<'a> fn(&'a isize) <: fn(&'b isize)? (Yes, a => b) - -This subtyping relation does in fact hold. To see why, you have to -consider what subtyping means. One way to look at `T1 <: T2` is to -say that it means that it is always ok to treat an instance of `T1` as -if it had the type `T2`. So, with our functions, it is always ok to -treat a function that can take pointers with any lifetime as if it -were a function that can only take a pointer with the specific -lifetime `'b`. After all, `'b` is a lifetime, after all, and -the function can take values of any lifetime. - -You can also look at subtyping as the *is a* relationship. This amounts -to the same thing: a function that accepts pointers with any lifetime -*is a* function that accepts pointers with some specific lifetime. - -So, what if we reverse the order of the two function types, like this: - - fn(&'b isize) <: for<'a> fn(&'a isize)? (No) - -Does the subtyping relationship still hold? The answer of course is -no. In this case, the function accepts *only the lifetime `'b`*, -so it is not reasonable to treat it as if it were a function that -accepted any lifetime. - -What about these two examples: - - for<'a,'b> fn(&'a isize, &'b isize) <: for<'a> fn(&'a isize, &'a isize)? (Yes) - for<'a> fn(&'a isize, &'a isize) <: for<'a,'b> fn(&'a isize, &'b isize)? (No) - -Here, it is true that functions which take two pointers with any two -lifetimes can be treated as if they only accepted two pointers with -the same lifetime, but not the reverse. - -## The algorithm - -Here is the algorithm we use to perform the subtyping check: - -1. Replace all bound regions in the subtype with new variables -2. Replace all bound regions in the supertype with placeholder - equivalents. A "placeholder" region is just a new fresh region - name. -3. Check that the parameter and return types match as normal -4. Ensure that no placeholder regions 'leak' into region variables - visible from "the outside" - -Let's walk through some examples and see how this algorithm plays out. - -#### First example - -We'll start with the first example, which was: - - 1. for<'a> fn(&'a T) <: for<'b> fn(&'b T)? Yes: a -> b - -After steps 1 and 2 of the algorithm we will have replaced the types -like so: - - 1. fn(&'A T) <: fn(&'x T)? - -Here the upper case `&A` indicates a *region variable*, that is, a -region whose value is being inferred by the system. I also replaced -`&b` with `&x`---I'll use letters late in the alphabet (`x`, `y`, `z`) -to indicate placeholder region names. We can assume they don't appear -elsewhere. Note that neither the sub- nor the supertype bind any -region names anymore (as indicated by the absence of `<` and `>`). - -The next step is to check that the parameter types match. Because -parameters are contravariant, this means that we check whether: - - &'x T <: &'A T - -Region pointers are contravariant so this implies that - - &A <= &x - -must hold, where `<=` is the subregion relationship. Processing -*this* constrain simply adds a constraint into our graph that `&A <= -&x` and is considered successful (it can, for example, be satisfied by -choosing the value `&x` for `&A`). - -So far we have encountered no error, so the subtype check succeeds. - -#### The third example - -Now let's look first at the third example, which was: - - 3. fn(&'a T) <: for<'b> fn(&'b T)? No! - -After steps 1 and 2 of the algorithm we will have replaced the types -like so: - - 3. fn(&'a T) <: fn(&'x T)? - -This looks pretty much the same as before, except that on the LHS -`'a` was not bound, and hence was left as-is and not replaced with -a variable. The next step is again to check that the parameter types -match. This will ultimately require (as before) that `'a` <= `&x` -must hold: but this does not hold. `self` and `x` are both distinct -free regions. So the subtype check fails. - -#### Checking for placeholder leaks - -You may be wondering about that mysterious last step in the algorithm. -So far it has not been relevant. The purpose of that last step is to -catch something like *this*: - - for<'a> fn() -> fn(&'a T) <: fn() -> for<'b> fn(&'b T)? No. - -Here the function types are the same but for where the binding occurs. -The subtype returns a function that expects a value in precisely one -region. The supertype returns a function that expects a value in any -region. If we allow an instance of the subtype to be used where the -supertype is expected, then, someone could call the fn and think that -the return value has type `fn(&'b T)` when it really has type -`fn(&'a T)` (this is case #3, above). Bad. - -So let's step through what happens when we perform this subtype check. -We first replace the bound regions in the subtype (the supertype has -no bound regions). This gives us: - - fn() -> fn(&'A T) <: fn() -> for<'b> fn(&'b T)? - -Now we compare the return types, which are covariant, and hence we have: - - fn(&'A T) <: for<'b> fn(&'b T)? - -Here we replace the bound region in the supertype with a placeholder to yield: - - fn(&'A T) <: fn(&'x T)? - -And then proceed to compare the argument types: - - &'x T <: &'A T - 'A <= 'x - -Finally, this is where it gets interesting! This is where an error -*should* be reported. But in fact this will not happen. The reason why -is that `A` is a variable: we will infer that its value is the fresh -region `x` and think that everything is happy. In fact, this behavior -is *necessary*, it was key to the first example we walked through. - -The difference between this example and the first one is that the variable -`A` already existed at the point where the placeholders were added. In -the first example, you had two functions: - - for<'a> fn(&'a T) <: for<'b> fn(&'b T) - -and hence `&A` and `&x` were created "together". In general, the -intention of the placeholder names is that they are supposed to be -fresh names that could never be equal to anything from the outside. -But when inference comes into play, we might not be respecting this -rule. - -So the way we solve this is to add a fourth step that examines the -constraints that refer to placeholder names. Basically, consider a -non-directed version of the constraint graph. Let `Tainted(x)` be the -set of all things reachable from a placeholder variable `x`. -`Tainted(x)` should not contain any regions that existed before the -step at which the placeholders were created. So this case here -would fail because `&x` was created alone, but is relatable to `&A`. - -## Computing the LUB and GLB - -The paper I pointed you at is written for Haskell. It does not -therefore considering subtyping and in particular does not consider -LUB or GLB computation. We have to consider this. Here is the -algorithm I implemented. - -First though, let's discuss what we are trying to compute in more -detail. The LUB is basically the "common supertype" and the GLB is -"common subtype"; one catch is that the LUB should be the -*most-specific* common supertype and the GLB should be *most general* -common subtype (as opposed to any common supertype or any common -subtype). - -Anyway, to help clarify, here is a table containing some function -pairs and their LUB/GLB (for conciseness, in this table, I'm just -including the lifetimes here, not the rest of the types, and I'm -writing `fn<>` instead of `for<> fn`): - -``` -Type 1 Type 2 LUB GLB -fn<'a>('a) fn('X) fn('X) fn<'a>('a) -fn('a) fn('X) -- fn<'a>('a) -fn<'a,'b>('a, 'b) fn<'x>('x, 'x) fn<'a>('a, 'a) fn<'a,'b>('a, 'b) -fn<'a,'b>('a, 'b, 'a) fn<'x,'y>('x, 'y, 'y) fn<'a>('a, 'a, 'a) fn<'a,'b,'c>('a,'b,'c) -``` - -### Conventions - -I use lower-case letters (e.g., `&a`) for bound regions and upper-case -letters for free regions (`&A`). Region variables written with a -dollar-sign (e.g., `$a`). I will try to remember to enumerate the -bound-regions on the fn type as well (e.g., `for<'a> fn(&a)`). - -### High-level summary - -Both the LUB and the GLB algorithms work in a similar fashion. They -begin by replacing all bound regions (on both sides) with fresh region -inference variables. Therefore, both functions are converted to types -that contain only free regions. We can then compute the LUB/GLB in a -straightforward way, as described in `combine.rs`. This results in an -interim type T. The algorithms then examine the regions that appear -in T and try to, in some cases, replace them with bound regions to -yield the final result. - -To decide whether to replace a region `R` that appears in `T` with -a bound region, the algorithms make use of two bits of -information. First is a set `V` that contains all region -variables created as part of the LUB/GLB computation (roughly; see -`region_vars_confined_to_snapshot()` for full details). `V` will -contain the region variables created to replace the bound regions -in the input types, but it also contains 'intermediate' variables -created to represent the LUB/GLB of individual regions. -Basically, when asked to compute the LUB/GLB of a region variable -with another region, the inferencer cannot oblige immediately -since the values of that variables are not known. Therefore, it -creates a new variable that is related to the two regions. For -example, the LUB of two variables `$x` and `$y` is a fresh -variable `$z` that is constrained such that `$x <= $z` and `$y <= -$z`. So `V` will contain these intermediate variables as well. - -The other important factor in deciding how to replace a region in T is -the function `Tainted($r)` which, for a region variable, identifies -all regions that the region variable is related to in some way -(`Tainted()` made an appearance in the subtype computation as well). - -### LUB - -The LUB algorithm proceeds in three steps: - -1. Replace all bound regions (on both sides) with fresh region - inference variables. -2. Compute the LUB "as normal", meaning compute the GLB of each - pair of argument types and the LUB of the return types and - so forth. Combine those to a new function type `F`. -3. Replace each region `R` that appears in `F` as follows: - - Let `V` be the set of variables created during the LUB - computational steps 1 and 2, as described in the previous section. - - If `R` is not in `V`, replace `R` with itself. - - If `Tainted(R)` contains a region that is not in `V`, - replace `R` with itself. - - Otherwise, select the earliest variable in `Tainted(R)` that originates - from the left-hand side and replace `R` with the bound region that - this variable was a replacement for. - -So, let's work through the simplest example: `fn(&A)` and `for<'a> fn(&a)`. -In this case, `&a` will be replaced with `$a` and the interim LUB type -`fn($b)` will be computed, where `$b=GLB(&A,$a)`. Therefore, `V = -{$a, $b}` and `Tainted($b) = { $b, $a, &A }`. When we go to replace -`$b`, we find that since `&A \in Tainted($b)` is not a member of `V`, -we leave `$b` as is. When region inference happens, `$b` will be -resolved to `&A`, as we wanted. - -Let's look at a more complex one: `fn(&a, &b)` and `fn(&x, &x)`. In -this case, we'll end up with a (pre-replacement) LUB type of `fn(&g, -&h)` and a graph that looks like: - -``` - $a $b *--$x - \ \ / / - \ $h-* / - $g-----------* -``` - -Here `$g` and `$h` are fresh variables that are created to represent -the LUB/GLB of things requiring inference. This means that `V` and -`Tainted` will look like: - -``` -V = {$a, $b, $g, $h, $x} -Tainted($g) = Tainted($h) = { $a, $b, $h, $g, $x } -``` - -Therefore we replace both `$g` and `$h` with `$a`, and end up -with the type `fn(&a, &a)`. - -### GLB - -The procedure for computing the GLB is similar. The difference lies -in computing the replacements for the various variables. For each -region `R` that appears in the type `F`, we again compute `Tainted(R)` -and examine the results: - -1. If `R` is not in `V`, it is not replaced. -2. Else, if `Tainted(R)` contains only variables in `V`, and it - contains exactly one variable from the LHS and one variable from - the RHS, then `R` can be mapped to the bound version of the - variable from the LHS. -3. Else, if `Tainted(R)` contains no variable from the LHS and no - variable from the RHS, then `R` can be mapped to itself. -4. Else, `R` is mapped to a fresh bound variable. - -These rules are pretty complex. Let's look at some examples to see -how they play out. - -Out first example was `fn(&a)` and `fn(&X)`. In this case, `&a` will -be replaced with `$a` and we will ultimately compute a -(pre-replacement) GLB type of `fn($g)` where `$g=LUB($a,&X)`. -Therefore, `V={$a,$g}` and `Tainted($g)={$g,$a,&X}. To find the -replacement for `$g` we consult the rules above: -- Rule (1) does not apply because `$g \in V` -- Rule (2) does not apply because `&X \in Tainted($g)` -- Rule (3) does not apply because `$a \in Tainted($g)` -- Hence, by rule (4), we replace `$g` with a fresh bound variable `&z`. -So our final result is `fn(&z)`, which is correct. - -The next example is `fn(&A)` and `fn(&Z)`. In this case, we will again -have a (pre-replacement) GLB of `fn(&g)`, where `$g = LUB(&A,&Z)`. -Therefore, `V={$g}` and `Tainted($g) = {$g, &A, &Z}`. In this case, -by rule (3), `$g` is mapped to itself, and hence the result is -`fn($g)`. This result is correct (in this case, at least), but it is -indicative of a case that *can* lead us into concluding that there is -no GLB when in fact a GLB does exist. See the section "Questionable -Results" below for more details. - -The next example is `fn(&a, &b)` and `fn(&c, &c)`. In this case, as -before, we'll end up with `F=fn($g, $h)` where `Tainted($g) = -Tainted($h) = {$g, $h, $a, $b, $c}`. Only rule (4) applies and hence -we'll select fresh bound variables `y` and `z` and wind up with -`fn(&y, &z)`. - -For the last example, let's consider what may seem trivial, but is -not: `fn(&a, &a)` and `fn(&b, &b)`. In this case, we'll get `F=fn($g, -$h)` where `Tainted($g) = {$g, $a, $x}` and `Tainted($h) = {$h, $a, -$x}`. Both of these sets contain exactly one bound variable from each -side, so we'll map them both to `&a`, resulting in `fn(&a, &a)`, which -is the desired result. - -### Shortcomings and correctness - -You may be wondering whether this algorithm is correct. The answer is -"sort of". There are definitely cases where they fail to compute a -result even though a correct result exists. I believe, though, that -if they succeed, then the result is valid, and I will attempt to -convince you. The basic argument is that the "pre-replacement" step -computes a set of constraints. The replacements, then, attempt to -satisfy those constraints, using bound identifiers where needed. - -For now I will briefly go over the cases for LUB/GLB and identify -their intent: - -- LUB: - - The region variables that are substituted in place of bound regions - are intended to collect constraints on those bound regions. - - If Tainted(R) contains only values in V, then this region is unconstrained - and can therefore be generalized, otherwise it cannot. -- GLB: - - The region variables that are substituted in place of bound regions - are intended to collect constraints on those bound regions. - - If Tainted(R) contains exactly one variable from each side, and - only variables in V, that indicates that those two bound regions - must be equated. - - Otherwise, if Tainted(R) references any variables from left or right - side, then it is trying to combine a bound region with a free one or - multiple bound regions, so we need to select fresh bound regions. - -Sorry this is more of a shorthand to myself. I will try to write up something -more convincing in the future. - -#### Where are the algorithms wrong? - -- The pre-replacement computation can fail even though using a - bound-region would have succeeded. -- We will compute GLB(fn(fn($a)), fn(fn($b))) as fn($c) where $c is the - GLB of $a and $b. But if inference finds that $a and $b must be mapped - to regions without a GLB, then this is effectively a failure to compute - the GLB. However, the result `fn<$c>(fn($c))` is a valid GLB. +[oldhrtb]: https://rust-lang.github.io/rustc-guide/traits/hrtb.html +[newhrtb]: https://rust-lang.github.io/rustc-guide/borrow_check/region_inference.html#placeholders-and-universes diff --git a/src/librustc/ty/query/README.md b/src/librustc/ty/query/README.md index 0fcaef5de54..4b5e08cecd9 100644 --- a/src/librustc/ty/query/README.md +++ b/src/librustc/ty/query/README.md @@ -1,302 +1,3 @@ -# The Rust Compiler Query System - -The Compiler Query System is the key to our new demand-driven -organization. The idea is pretty simple. You have various queries -that compute things about the input -- for example, there is a query -called `type_of(def_id)` that, given the def-id of some item, will -compute the type of that item and return it to you. - -Query execution is **memoized** -- so the first time you invoke a -query, it will go do the computation, but the next time, the result is -returned from a hashtable. Moreover, query execution fits nicely into -**incremental computation**; the idea is roughly that, when you do a -query, the result **may** be returned to you by loading stored data -from disk (but that's a separate topic we won't discuss further here). - -The overall vision is that, eventually, the entire compiler -control-flow will be query driven. There will effectively be one -top-level query ("compile") that will run compilation on a crate; this -will in turn demand information about that crate, starting from the -*end*. For example: - -- This "compile" query might demand to get a list of codegen-units - (i.e., modules that need to be compiled by LLVM). -- But computing the list of codegen-units would invoke some subquery - that returns the list of all modules defined in the Rust source. -- That query in turn would invoke something asking for the HIR. -- This keeps going further and further back until we wind up doing the - actual parsing. - -However, that vision is not fully realized. Still, big chunks of the -compiler (for example, generating MIR) work exactly like this. - -### Invoking queries - -To invoke a query is simple. The tcx ("type context") offers a method -for each defined query. So, for example, to invoke the `type_of` -query, you would just do this: - -```rust -let ty = tcx.type_of(some_def_id); -``` - -### Cycles between queries - -Currently, cycles during query execution should always result in a -compilation error. Typically, they arise because of illegal programs -that contain cyclic references they shouldn't (though sometimes they -arise because of compiler bugs, in which case we need to factor our -queries in a more fine-grained fashion to avoid them). - -However, it is nonetheless often useful to *recover* from a cycle -(after reporting an error, say) and try to soldier on, so as to give a -better user experience. In order to recover from a cycle, you don't -get to use the nice method-call-style syntax. Instead, you invoke -using the `try_get` method, which looks roughly like this: - -```rust -use ty::query::queries; -... -match queries::type_of::try_get(tcx, DUMMY_SP, self.did) { - Ok(result) => { - // no cycle occurred! You can use `result` - } - Err(err) => { - // A cycle occurred! The error value `err` is a `DiagnosticBuilder`, - // meaning essentially an "in-progress", not-yet-reported error message. - // See below for more details on what to do here. - } -} -``` - -So, if you get back an `Err` from `try_get`, then a cycle *did* occur. This means that -you must ensure that a compiler error message is reported. You can do that in two ways: - -The simplest is to invoke `err.emit()`. This will emit the cycle error to the user. - -However, often cycles happen because of an illegal program, and you -know at that point that an error either already has been reported or -will be reported due to this cycle by some other bit of code. In that -case, you can invoke `err.cancel()` to not emit any error. It is -traditional to then invoke: - -``` -tcx.sess.delay_span_bug(some_span, "some message") -``` - -`delay_span_bug()` is a helper that says: we expect a compilation -error to have happened or to happen in the future; so, if compilation -ultimately succeeds, make an ICE with the message `"some -message"`. This is basically just a precaution in case you are wrong. - -### How the compiler executes a query - -So you may be wondering what happens when you invoke a query -method. The answer is that, for each query, the compiler maintains a -cache -- if your query has already been executed, then, the answer is -simple: we clone the return value out of the cache and return it -(therefore, you should try to ensure that the return types of queries -are cheaply cloneable; insert a `Rc` if necessary). - -#### Providers - -If, however, the query is *not* in the cache, then the compiler will -try to find a suitable **provider**. A provider is a function that has -been defined and linked into the compiler somewhere that contains the -code to compute the result of the query. - -**Providers are defined per-crate.** The compiler maintains, -internally, a table of providers for every crate, at least -conceptually. Right now, there are really two sets: the providers for -queries about the **local crate** (that is, the one being compiled) -and providers for queries about **external crates** (that is, -dependencies of the local crate). Note that what determines the crate -that a query is targeting is not the *kind* of query, but the *key*. -For example, when you invoke `tcx.type_of(def_id)`, that could be a -local query or an external query, depending on what crate the `def_id` -is referring to (see the `self::keys::Key` trait for more information -on how that works). - -Providers always have the same signature: - -```rust -fn provider<'cx, 'tcx>(tcx: TyCtxt<'cx, 'tcx, 'tcx>, - key: QUERY_KEY) - -> QUERY_RESULT -{ - ... -} -``` - -Providers take two arguments: the `tcx` and the query key. Note also -that they take the *global* tcx (i.e., they use the `'tcx` lifetime -twice), rather than taking a tcx with some active inference context. -They return the result of the query. - -#### How providers are setup - -When the tcx is created, it is given the providers by its creator using -the `Providers` struct. This struct is generate by the macros here, but it -is basically a big list of function pointers: - -```rust -struct Providers { - type_of: for<'cx, 'tcx> fn(TyCtxt<'cx, 'tcx, 'tcx>, DefId) -> Ty<'tcx>, - ... -} -``` - -At present, we have one copy of the struct for local crates, and one -for external crates, though the plan is that we may eventually have -one per crate. - -These `Provider` structs are ultimately created and populated by -`librustc_driver`, but it does this by distributing the work -throughout the other `rustc_*` crates. This is done by invoking -various `provide` functions. These functions tend to look something -like this: - -```rust -pub fn provide(providers: &mut Providers) { - *providers = Providers { - type_of, - ..*providers - }; -} -``` - -That is, they take an `&mut Providers` and mutate it in place. Usually -we use the formulation above just because it looks nice, but you could -as well do `providers.type_of = type_of`, which would be equivalent. -(Here, `type_of` would be a top-level function, defined as we saw -before.) So, if we want to add a provider for some other query, -let's call it `fubar`, into the crate above, we might modify the `provide()` -function like so: - -```rust -pub fn provide(providers: &mut Providers) { - *providers = Providers { - type_of, - fubar, - ..*providers - }; -} - -fn fubar<'cx, 'tcx>(tcx: TyCtxt<'cx, 'tcx>, key: DefId) -> Fubar<'tcx> { .. } -``` - -NB. Most of the `rustc_*` crates only provide **local -providers**. Almost all **extern providers** wind up going through the -`rustc_metadata` crate, which loads the information from the crate -metadata. But in some cases there are crates that provide queries for -*both* local and external crates, in which case they define both a -`provide` and a `provide_extern` function that `rustc_driver` can -invoke. - -### Adding a new kind of query - -So suppose you want to add a new kind of query, how do you do so? -Well, defining a query takes place in two steps: - -1. first, you have to specify the query name and arguments; and then, -2. you have to supply query providers where needed. - -To specify the query name and arguments, you simply add an entry -to the big macro invocation in `mod.rs`. This will probably have changed -by the time you read this README, but at present it looks something -like: - -``` -define_queries! { <'tcx> - /// Records the type of every item. - [] fn type_of: TypeOfItem(DefId) -> Ty<'tcx>, - - ... -} -``` - -Each line of the macro defines one query. The name is broken up like this: - -``` -[] fn type_of: TypeOfItem(DefId) -> Ty<'tcx>, -^^ ^^^^^^^ ^^^^^^^^^^ ^^^^^ ^^^^^^^^ -| | | | | -| | | | result type of query -| | | query key type -| | dep-node constructor -| name of query -query flags -``` - -Let's go over them one by one: - -- **Query flags:** these are largely unused right now, but the intention - is that we'll be able to customize various aspects of how the query is - processed. -- **Name of query:** the name of the query method - (`tcx.type_of(..)`). Also used as the name of a struct - (`ty::query::queries::type_of`) that will be generated to represent - this query. -- **Dep-node constructor:** indicates the constructor function that - connects this query to incremental compilation. Typically, this is a - `DepNode` variant, which can be added by modifying the - `define_dep_nodes!` macro invocation in - `librustc/dep_graph/dep_node.rs`. - - However, sometimes we use a custom function, in which case the - name will be in snake case and the function will be defined at the - bottom of the file. This is typically used when the query key is - not a def-id, or just not the type that the dep-node expects. -- **Query key type:** the type of the argument to this query. - This type must implement the `ty::query::keys::Key` trait, which - defines (for example) how to map it to a crate, and so forth. -- **Result type of query:** the type produced by this query. This type - should (a) not use `RefCell` or other interior mutability and (b) be - cheaply cloneable. Interning or using `Rc` or `Arc` is recommended for - non-trivial data types. - - The one exception to those rules is the `ty::steal::Steal` type, - which is used to cheaply modify MIR in place. See the definition - of `Steal` for more details. New uses of `Steal` should **not** be - added without alerting `@rust-lang/compiler`. - -So, to add a query: - -- Add an entry to `define_queries!` using the format above. -- Possibly add a corresponding entry to the dep-node macro. -- Link the provider by modifying the appropriate `provide` method; - or add a new one if needed and ensure that `rustc_driver` is invoking it. - -#### Query structs and descriptions - -For each kind, the `define_queries` macro will generate a "query struct" -named after the query. This struct is a kind of a place-holder -describing the query. Each such struct implements the -`self::config::QueryConfig` trait, which has associated types for the -key/value of that particular query. Basically the code generated looks something -like this: - -```rust -// Dummy struct representing a particular kind of query: -pub struct type_of<'tcx> { phantom: PhantomData<&'tcx ()> } - -impl<'tcx> QueryConfig for type_of<'tcx> { - type Key = DefId; - type Value = Ty<'tcx>; -} -``` - -There is an additional trait that you may wish to implement called -`self::config::QueryDescription`. This trait is used during cycle -errors to give a "human readable" name for the query, so that we can -summarize what was happening when the cycle occurred. Implementing -this trait is optional if the query key is `DefId`, but if you *don't* -implement it, you get a pretty generic error ("processing `foo`..."). -You can put new impls into the `config` module. They look something like this: - -```rust -impl<'tcx> QueryDescription for queries::type_of<'tcx> { - fn describe(tcx: TyCtxt<'_, '_, '_>, key: DefId) -> String { - format!("computing the type of `{}`", tcx.item_path_str(key)) - } -} -``` +For more information about how the query system works, see the [rustc guide]. +[rustc guide]: https://rust-lang.github.io/rustc-guide/query.html