From 64dad2cb0333e12d0e5b4bb4e9b013af68f14156 Mon Sep 17 00:00:00 2001 From: Alan Andrade Date: Mon, 12 May 2014 10:31:13 -0700 Subject: [PATCH 1/3] Cleanup lifetime guide Clean pointers guide --- src/doc/guide-lifetimes.md | 223 +++++++++++-------------------------- src/doc/guide-pointers.md | 171 ++++------------------------ 2 files changed, 91 insertions(+), 303 deletions(-) diff --git a/src/doc/guide-lifetimes.md b/src/doc/guide-lifetimes.md index 48c50471c25..e0e33337c9a 100644 --- a/src/doc/guide-lifetimes.md +++ b/src/doc/guide-lifetimes.md @@ -3,14 +3,12 @@ # Introduction References are one of the more flexible and powerful tools available in -Rust. A reference can point anywhere: into the managed or exchange -heap, into the stack, and even into the interior of another data structure. A -reference is as flexible as a C pointer or C++ reference. However, -unlike C and C++ compilers, the Rust compiler includes special static checks -that ensure that programs use references safely. Another advantage of -references is that they are invisible to the garbage collector, so -working with references helps reduce the overhead of automatic memory -management. +Rust. They can point anywhere: into the heap, stack, and even into the +interior of another data structure. A reference is as flexible as a C pointer +or C++ reference. + +Unlike C and C++ compilers, the Rust compiler includes special static +checks that ensure that programs use references safely. Despite their complete safety, a reference's representation at runtime is the same as that of an ordinary pointer in a C program. They introduce zero @@ -26,7 +24,7 @@ through several examples. References, sometimes known as *borrowed pointers*, are only valid for a limited duration. References never claim any kind of ownership -over the data that they point to: instead, they are used for cases +over the data that they point to, instead, they are used for cases where you would like to use data for a short time. 
As an example, consider a simple struct type `Point`: @@ -36,27 +34,23 @@ struct Point {x: f64, y: f64} ~~~ We can use this simple definition to allocate points in many different ways. For -example, in this code, each of these three local variables contains a -point, but allocated in a different place: +example, in this code, each of these local variables contains a point, +but allocated in a different place: ~~~ # struct Point {x: f64, y: f64} -let on_the_stack : Point = Point {x: 3.0, y: 4.0}; -let managed_box : @Point = @Point {x: 5.0, y: 1.0}; -let owned_box : Box<Point> = box Point {x: 7.0, y: 9.0}; +let on_the_stack : Point = Point {x: 3.0, y: 4.0}; +let on_the_heap : Box<Point> = box Point {x: 7.0, y: 9.0}; ~~~ Suppose we wanted to write a procedure that computed the distance between any -two points, no matter where they were stored. For example, we might like to -compute the distance between `on_the_stack` and `managed_box`, or between -`managed_box` and `owned_box`. One option is to define a function that takes -two arguments of type `Point`—that is, it takes the points by value. But if we -define it this way, calling the function will cause the points to be -copied. For points, this is probably not so bad, but often copies are +two points, no matter where they were stored. One option is to define a function +that takes two arguments of type `Point`—that is, it takes the points __by value__. +But if we define it this way, calling the function will cause the points __to be +copied__. For points, this is probably not so bad, but often copies are expensive. Worse, if the data type contains mutable fields, copying can change -the semantics of your program in unexpected ways. So we'd like to define a -function that takes the points by pointer. We can use references to do -this: +the semantics of your program in unexpected ways. So we'd like to define +a function that takes the points just as a __reference__/__borrowed pointer__. 
~~~ # struct Point {x: f64, y: f64} @@ -68,30 +62,27 @@ fn compute_distance(p1: &Point, p2: &Point) -> f64 { } ~~~ -Now we can call `compute_distance()` in various ways: +Now we can call `compute_distance()` ~~~ # struct Point {x: f64, y: f64} # let on_the_stack : Point = Point{x: 3.0, y: 4.0}; -# let managed_box : @Point = @Point{x: 5.0, y: 1.0}; -# let owned_box : Box<Point> = box Point{x: 7.0, y: 9.0}; +# let on_the_heap : Box<Point> = box Point{x: 7.0, y: 9.0}; # fn compute_distance(p1: &Point, p2: &Point) -> f64 { 0.0 } -compute_distance(&on_the_stack, managed_box); -compute_distance(managed_box, owned_box); +compute_distance(&on_the_stack, on_the_heap); ~~~ Here, the `&` operator takes the address of the variable `on_the_stack`; this is because `on_the_stack` has the type `Point` (that is, a struct value) and we have to take its address to get a value. We also call this _borrowing_ the local variable -`on_the_stack`, because we have created an alias: that is, another +`on_the_stack`, because we have created __an alias__: that is, another name for the same data. -In contrast, we can pass the boxes `managed_box` and `owned_box` to -`compute_distance` directly. The compiler automatically converts a box like -`@Point` or `~Point` to a reference like `&Point`. This is another form -of borrowing: in this case, the caller lends the contents of the managed or -owned box to the callee. +In contrast, we can pass `on_the_heap` to `compute_distance` directly. +The compiler automatically converts a box like `Box<Point>` to a reference like +`&Point`. This is another form of borrowing: in this case, the caller lends +the contents of the box to the callee. Whenever a caller lends data to a callee, there are some limitations on what the caller can do with the original. 
For example, if the contents of a @@ -134,10 +125,10 @@ let on_the_stack2 : &Point = &tmp; # Taking the address of fields -As in C, the `&` operator is not limited to taking the address of +The `&` operator is not limited to taking the address of local variables. It can also take the address of fields or individual array elements. For example, consider this type definition -for `rectangle`: +for `Rectangle`: ~~~ struct Point {x: f64, y: f64} // as before @@ -153,9 +144,7 @@ Now, as before, we can define rectangles in a few different ways: # struct Rectangle {origin: Point, size: Size} let rect_stack = &Rectangle {origin: Point {x: 1.0, y: 2.0}, size: Size {w: 3.0, h: 4.0}}; -let rect_managed = @Rectangle {origin: Point {x: 3.0, y: 4.0}, - size: Size {w: 3.0, h: 4.0}}; -let rect_owned = box Rectangle {origin: Point {x: 5.0, y: 6.0}, +let rect_heap = box Rectangle {origin: Point {x: 5.0, y: 6.0}, size: Size {w: 3.0, h: 4.0}}; ~~~ @@ -167,109 +156,29 @@ operator. For example, I could write: # struct Size {w: f64, h: f64} // as before # struct Rectangle {origin: Point, size: Size} # let rect_stack = &Rectangle {origin: Point {x: 1.0, y: 2.0}, size: Size {w: 3.0, h: 4.0}}; -# let rect_managed = @Rectangle {origin: Point {x: 3.0, y: 4.0}, size: Size {w: 3.0, h: 4.0}}; -# let rect_owned = box Rectangle {origin: Point {x: 5.0, y: 6.0}, size: Size {w: 3.0, h: 4.0}}; +# let rect_heap = box Rectangle {origin: Point {x: 5.0, y: 6.0}, size: Size {w: 3.0, h: 4.0}}; # fn compute_distance(p1: &Point, p2: &Point) -> f64 { 0.0 } -compute_distance(&rect_stack.origin, &rect_managed.origin); +compute_distance(&rect_stack.origin, &rect_heap.origin); ~~~ which would borrow the field `origin` from the rectangle on the stack -as well as from the managed box, and then compute the distance between them. +as well as from the owned box, and then compute the distance between them. 
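For readers trying the field-borrowing call with a current toolchain, a minimal sketch follows. It is an illustration and not part of the patch: it assumes present-day Rust, where `Box::new` replaces the `box` keyword, and it assumes a Euclidean body for `compute_distance` (the guide's hidden stub just returns `0.0`). The helper `origin_distance` is a hypothetical name introduced only for this sketch.

```rust
struct Point { x: f64, y: f64 }
struct Size { w: f64, h: f64 }
struct Rectangle { origin: Point, size: Size }

// Assumed body: plain Euclidean distance between two borrowed points.
fn compute_distance(p1: &Point, p2: &Point) -> f64 {
    let dx = p1.x - p2.x;
    let dy = p1.y - p2.y;
    (dx * dx + dy * dy).sqrt()
}

// Hypothetical helper: borrows the `origin` field of a rectangle on the
// stack and of a rectangle inside a Box, without copying either Point.
fn origin_distance() -> f64 {
    let rect_stack = Rectangle {
        origin: Point { x: 1.0, y: 2.0 },
        size: Size { w: 3.0, h: 4.0 },
    };
    let rect_heap = Box::new(Rectangle {
        origin: Point { x: 5.0, y: 6.0 },
        size: Size { w: 3.0, h: 4.0 },
    });
    // Read the size fields once so every field of the sketch is used.
    let _area = rect_stack.size.w * rect_heap.size.h;
    compute_distance(&rect_stack.origin, &rect_heap.origin)
}

fn main() {
    println!("{}", origin_distance());
}
```

Both arguments have type `&Point`; the callee cannot tell which one lives on the stack and which lives in a box.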
-# Borrowing managed boxes and rooting +# Lifetimes -We’ve seen a few examples so far of borrowing heap boxes, both managed -and owned. Up till this point, we’ve glossed over issues of -safety. As stated in the introduction, at runtime a reference -is simply a pointer, nothing more. Therefore, avoiding C's problems -with dangling pointers requires a compile-time safety check. +We’ve seen a few examples of borrowing data. Up till this point, we’ve glossed +over issues of safety. As stated in the introduction, at runtime a reference +is simply a pointer, nothing more. Therefore, avoiding C's problems with +dangling pointers requires a compile-time safety check. -The basis for the check is the notion of _lifetimes_. A lifetime is a +The basis for the check is the notion of __lifetimes__. A lifetime is a static approximation of the span of execution during which the pointer is valid: it always corresponds to some expression or block within the -program. Code inside that expression can use the pointer without -restrictions. But if the pointer escapes from that expression (for -example, if the expression contains an assignment expression that -assigns the pointer to a mutable field of a data structure with a -broader scope than the pointer itself), the compiler reports an -error. We'll be discussing lifetimes more in the examples to come, and -a more thorough introduction is also available. +program. -When the `&` operator creates a reference, the compiler must -ensure that the pointer remains valid for its entire -lifetime. Sometimes this is relatively easy, such as when taking the -address of a local variable or a field that is stored on the stack: - -~~~ -struct X { f: int } -fn example1() { - let mut x = X { f: 3 }; - let y = &mut x.f; // -+ L - // ... // | -} // -+ -~~~ - -Here, the lifetime of the reference `y` is simply L, the -remainder of the function body. The compiler need not do any other -work to prove that code will not free `x.f`. 
This is true even if the -code mutates `x`. - -The situation gets more complex when borrowing data inside heap boxes: - -~~~ -# struct X { f: int } -fn example2() { - let mut x = @X { f: 3 }; - let y = &x.f; // -+ L - // ... // | -} // -+ -~~~ - -In this example, the value `x` is a heap box, and `y` is therefore a -pointer into that heap box. Again the lifetime of `y` is L, the -remainder of the function body. But there is a crucial difference: -suppose `x` were to be reassigned during the lifetime L? If the -compiler isn't careful, the managed box could become *unrooted*, and -would therefore be subject to garbage collection. A heap box that is -unrooted is one such that no pointer values in the heap point to -it. It would violate memory safety for the box that was originally -assigned to `x` to be garbage-collected, since a non-heap -pointer *`y`* still points into it. - -> *Note:* Our current implementation implements the garbage collector -> using reference counting and cycle detection. - -For this reason, whenever an `&` expression borrows the interior of a -managed box stored in a mutable location, the compiler inserts a -temporary that ensures that the managed box remains live for the -entire lifetime. So, the above example would be compiled as if it were -written - -~~~ -# struct X { f: int } -fn example2() { - let mut x = @X {f: 3}; - let x1 = x; - let y = &x1.f; // -+ L - // ... // | -} // -+ -~~~ - -Now if `x` is reassigned, the pointer `y` will still remain valid. This -process is called *rooting*. - -# Borrowing owned boxes - -The previous example demonstrated *rooting*, the process by which the -compiler ensures that managed boxes remain live for the duration of a -borrow. Unfortunately, rooting does not work for borrows of owned -boxes, because it is not possible to have two references to an owned -box. 
- -For owned boxes, therefore, the compiler will only allow a borrow *if -the compiler can guarantee that the owned box will not be reassigned -or moved for the lifetime of the pointer*. This does not necessarily -mean that the owned box is stored in immutable memory. For example, +The compiler will only allow a borrow *if it can guarantee that the data will +not be reassigned or moved for the lifetime of the pointer*. This does not +necessarily mean that the data is stored in immutable memory. For example, the following function is legal: ~~~ @@ -294,7 +203,7 @@ and `x` is declared as mutable. However, the compiler can prove that and in fact is mutated later in the function. It may not be clear why we are so concerned about mutating a borrowed -variable. The reason is that the runtime system frees any owned box +variable. The reason is that the runtime system frees any box _as soon as its owning reference changes or goes out of scope_. Therefore, a program like this is illegal (and would be rejected by the compiler): @@ -337,31 +246,34 @@ Once the reassignment occurs, the memory will look like this: +---------+ ~~~ -Here you can see that the variable `y` still points at the old box, -which has been freed. +Here you can see that the variable `y` still points at the old `f` +property of Foo, which has been freed. In fact, the compiler can apply the same kind of reasoning to any -memory that is _(uniquely) owned by the stack frame_. So we could +memory that is (uniquely) owned by the stack frame. So we could modify the previous example to introduce additional owned pointers and structs, and the compiler will still be able to detect possible -mutations: +mutations. This time, we'll use an analogy to illustrate the concept. 
~~~ {.ignore} fn example3() -> int { - struct R { g: int } - struct S { f: Box<R> } + struct House { owner: Box<Person> } + struct Person { age: int } - let mut x = box S {f: box R {g: 3}}; - let y = &x.f.g; - x = box S {f: box R {g: 4}}; // Error reported here. - x.f = box R {g: 5}; // Error reported here. - *y + let mut house = box House { + owner: box Person {age: 30} + }; + + let owner_age = &house.owner.age; + house = box House {owner: box Person {age: 40}}; // Error reported here. + house.owner = box Person {age: 50}; // Error reported here. + *owner_age } ~~~ -In this case, two errors are reported, one when the variable `x` is -modified and another when `x.f` is modified. Either modification would -invalidate the pointer `y`. +In this case, two errors are reported, one when the variable `house` is +modified and another when `house.owner` is modified. Either modification would +invalidate the pointer `owner_age`. # Borrowing and enums @@ -412,7 +324,7 @@ circle constant][tau] and not that dreadfully outdated notion of pi). The second match is more interesting. Here we match against a rectangle and extract its size: but rather than copy the `size` -struct, we use a by-reference binding to create a pointer to it. In +struct, we use a __by-reference binding__ to create a pointer to it. In other words, a pattern binding like `ref size` binds the name `size` to a pointer of type `&size` into the _interior of the enum_. @@ -526,12 +438,12 @@ time one that does not compile: ~~~ {.ignore} struct Point {x: f64, y: f64} -fn get_x_sh(p: @Point) -> &f64 { +fn get_x_sh(p: &Point) -> &f64 { &p.x // Error reported here } ~~~ -Here, the function `get_x_sh()` takes a managed box as input and +Here, the function `get_x_sh()` takes a reference as input and returns a reference. As before, the lifetime of the reference that will be returned is a parameter (specified by the caller). 
That means that `get_x_sh()` promises to return a reference @@ -540,17 +452,18 @@ subtly different from the first example, which promised to return a pointer that was valid for as long as its pointer argument was valid. Within `get_x_sh()`, we see the expression `&p.x` which takes the -address of a field of a managed box. The presence of this expression -implies that the compiler must guarantee that, so long as the -resulting pointer is valid, the managed box will not be reclaimed by -the garbage collector. But recall that `get_x_sh()` also promised to +address of a field of a Point. The presence of this expression +implies that the compiler must guarantee that, so long as the +resulting pointer is valid, the original Point won't be moved or changed. + +But recall that `get_x_sh()` also promised to return a pointer that was valid for as long as the caller wanted it to be. Clearly, `get_x_sh()` is not in a position to make both of these guarantees; in fact, it cannot guarantee that the pointer will remain valid at all once it returns, as the parameter `p` may or may not be live in the caller. Therefore, the compiler will report an error here. -In general, if you borrow a managed (or owned) box to create a +In general, if you borrow a struct or box to create a reference, it will only be valid within the function and cannot be returned. This is why the typical way to return references is to take references as input (the only other case in diff --git a/src/doc/guide-pointers.md b/src/doc/guide-pointers.md index 948d033e06c..bee1dbcd2ce 100644 --- a/src/doc/guide-pointers.md +++ b/src/doc/guide-pointers.md @@ -5,7 +5,7 @@ are also one of the more confusing topics for newcomers to Rust. They can also be confusing for people coming from other languages that support pointers, such as C++. This guide will help you understand this important topic. 
-# You don't actually need pointers +# You don't actually need pointers, use references I have good news for you: you probably don't need to care about pointers, especially as you're getting started. Think of it this way: Rust is a language @@ -37,7 +37,7 @@ error: mismatched types: expected `&int` but found `` (expec What gives? It needs a pointer! Therefore I have to use pointers!" -Turns out, you don't. All you need is a reference. Try this on for size: +Turns out, you don't. __All you need is a reference__. Try this on for size: ~~~rust # fn succ(x: &int) -> int { *x + 1 } @@ -74,22 +74,22 @@ Here are the use-cases for pointers. I've prefixed them with the name of the pointer that satisfies that use-case: 1. Owned: `Box` must be a pointer, because you don't know the size of the -object, so indirection is mandatory. +object, so indirection is mandatory. Notation might change once Rust +support DST fully so we recommend you stay tuned. + 2. Owned: You need a recursive data structure. These can be infinite sized, so indirection is mandatory. + 3. Owned: A very, very, very rare situation in which you have a *huge* chunk of data that you wish to pass to many methods. Passing a pointer will make this more efficient. If you're coming from another language where this technique is common, such as C++, please read "A note..." below. -4. Managed: Having only a single owner to a piece of data would be inconvenient -or impossible. This is only often useful when a program is very large or very -complicated. Using a managed pointer will activate Rust's garbage collection -mechanism. -5. Reference: You're writing a function, and you need a pointer, but you don't + +4. Reference: You're writing a function, and you need a pointer, but you don't care about its ownership. If you make the argument a reference, callers can send in whatever kind they want. -Five exceptions. That's it. Otherwise, you shouldn't need them. Be sceptical +Four exceptions. That's it. 
Otherwise, you shouldn't need them. Be sceptical of pointers in Rust: use them for a deliberate purpose, not just to make the compiler happy. @@ -165,6 +165,7 @@ approximation of owned pointers follows: 1. Only one owned pointer may exist to a particular place in memory. It may be borrowed from that owner, however. + 2. The Rust compiler uses static analysis to determine where the pointer is in scope, and handles allocating and de-allocating that memory. Owned pointers are not garbage collected. @@ -204,6 +205,10 @@ The inner lists _must_ be an owned pointer, because we can't know how many elements are in the list. Without knowing the length, we don't know the size, and therefore require the indirection that pointers offer. +> Note: Nil is just part of the List enum and even though is being used +> to represent the concept of "nothing", you shouldn't think of it as +> NULL. Rust doesn't have NULL. + ## Efficiency This should almost never be a concern, but because creating an owned pointer @@ -248,81 +253,6 @@ fn main() { Now it'll be copying a pointer-sized chunk of memory rather than the whole struct. -# Managed Pointers - -> **Note**: the `@` form of managed pointers is deprecated and behind a -> feature gate (it requires a `#![feature(managed_pointers)]` attribute on -> the crate root). There are replacements, currently -> there is `std::rc::Rc` and `std::gc::Gc` for shared ownership via reference -> counting and garbage collection respectively. - -Managed pointers, notated by an `@`, are used when having a single owner for -some data isn't convenient or possible. This generally happens when your -program is very large and complicated. 
-For example, let's say you're using an owned pointer, and you want to do this: - -~~~rust{.ignore} -struct Point { - x: int, - y: int, -} - -fn main() { - let a = box Point { x: 10, y: 20 }; - let b = a; - println!("{}", b.x); - println!("{}", a.x); -} -~~~ - -You'll get this error: - -~~~ {.notrust} -test.rs:10:20: 10:21 error: use of moved value: `a` -test.rs:10 println!("{}", a.x); - ^ -note: in expansion of format_args! -:158:27: 158:81 note: expansion site -:157:5: 159:6 note: in expansion of println! -test.rs:10:5: 10:25 note: expansion site -test.rs:8:9: 8:10 note: `a` moved here because it has type `Box<Point>`, which is moved by default (use `ref` to override) -test.rs:8 let b = a; - ^ -~~~ - -As the message says, owned pointers only allow for one owner at a time. When you assign `a` to `b`, `a` becomes invalid. Change your code to this, however: - -~~~rust -struct Point { - x: int, - y: int, -} - -fn main() { - let a = @Point { x: 10, y: 20 }; - let b = a; - println!("{}", b.x); - println!("{}", a.x); -} -~~~ - -And it works: - -~~~ {.notrust} -10 -10 -~~~ - -So why not just use managed pointers everywhere? There are two big drawbacks to -managed pointers: - -1. They activate Rust's garbage collector. Other pointer types don't share this -drawback. -2. You cannot pass this data to another task. Shared ownership across -concurrency boundaries is the source of endless pain in other languages, so -Rust does not let you do this. - # References References are the third major kind of pointer Rust supports. They are @@ -346,7 +276,7 @@ fn compute_distance(p1: &Point, p2: &Point) -> f32 { } fn main() { - let origin = @Point { x: 0.0, y: 0.0 }; + let origin = &Point { x: 0.0, y: 0.0 }; let p1 = box Point { x: 5.0, y: 3.0 }; println!("{:?}", compute_distance(origin, p1)); @@ -354,8 +284,9 @@ fn main() { ~~~ This prints `5.83095189`. You can see that the `compute_distance` function -takes in two references, but we give it a managed and unique pointer. 
Of -course, if this were a real program, we wouldn't have any of these pointers, +takes in two references, but we give it a stack allocated reference and an +owned box reference. +Of course, if this were a real program, we wouldn't have any of these pointers, they're just there to demonstrate the concepts. So how is this hard? Well, because we're ignoring ownership, the compiler needs @@ -364,9 +295,11 @@ safety, a reference's representation at runtime is the same as that of an ordinary pointer in a C program. They introduce zero overhead. The compiler does all safety checks at compile time. -This theory is called 'region pointers,' and involve a concept called -'lifetimes'. Here's the simple explanation: would you expect this code to -compile? +This theory is called 'region pointers' and you can read more about it +[here](http://www.cs.umd.edu/projects/cyclone/papers/cyclone-regions.pdf). +Region pointers evolved into what we know today as 'lifetimes'. + +Here's the simple explanation: would you expect this code to compile? ~~~rust{.ignore} fn main() { @@ -428,64 +361,6 @@ hard for a computer, too! There is an entire [guide devoted to references and lifetimes](guide-lifetimes.html) that goes into lifetimes in great detail, so if you want the full details, check that out. -# Returning Pointers - -We've talked a lot about functions that accept various kinds of pointers, but -what about returning them? In general, it is better to let the caller decide -how to use a function's output, instead of assuming a certain type of pointer -is best. - -What does that mean? Don't do this: - -~~~rust -fn foo(x: Box<int>) -> Box<int> { - return box *x; -} - -fn main() { - let x = box 5; - let y = foo(x); -} -~~~ - -Do this: - -~~~rust -fn foo(x: Box<int>) -> int { - return *x; -} - -fn main() { - let x = box 5; - let y = box foo(x); -} -~~~ - -This gives you flexibility, without sacrificing performance. 
For example, this will -also work: - -~~~rust -fn foo(x: Box<int>) -> int { - return *x; -} - -fn main() { - let x = box 5; - let y = @foo(x); -} -~~~ - -You may think that this gives us terrible performance: return a value and then -immediately box it up?!?! Isn't that the worst of both worlds? Rust is smarter -than that. There is no copy in this code. `main` allocates enough room for the -`@int`, passes a pointer to that memory into `foo` as `x`, and then `foo` writes -the value straight into that pointer. This writes the return value directly into -the allocated box. - -This is important enough that it bears repeating: pointers are not for optimizing -returning values from your code. Allow the caller to choose how they want to -use your output. - # Related Resources From 99744653d5abb949e446daf0732be79c76aa6f79 Mon Sep 17 00:00:00 2001 From: Alan Andrade Date: Sat, 24 May 2014 13:15:48 -0700 Subject: [PATCH 2/3] get over bold text madness, changes per PR, brought the "returning pointers" section back to pointers guide --- src/doc/guide-lifetimes.md | 30 +++++++++--------- src/doc/guide-pointers.md | 63 ++++++++++++++++++++++++++++++++------ 2 files changed, 67 insertions(+), 26 deletions(-) diff --git a/src/doc/guide-lifetimes.md b/src/doc/guide-lifetimes.md index e0e33337c9a..3c0d8c4797c 100644 --- a/src/doc/guide-lifetimes.md +++ b/src/doc/guide-lifetimes.md @@ -45,12 +45,11 @@ let on_the_heap : Box<Point> = box Point {x: 7.0, y: 9.0}; Suppose we wanted to write a procedure that computed the distance between any two points, no matter where they were stored. One option is to define a function -that takes two arguments of type `Point`—that is, it takes the points __by value__. -But if we define it this way, calling the function will cause the points __to be -copied__. For points, this is probably not so bad, but often copies are -expensive. Worse, if the data type contains mutable fields, copying can change -the semantics of your program in unexpected ways. 
So we'd like to define -a function that takes the points just as a __reference__/__borrowed pointer__. +that takes two arguments of type `Point`—that is, it takes the points by value. +But if we define it this way, calling the function will cause the points to be +copied. For points, this is probably not so bad, but often copies are +expensive. So we'd like to define a function that takes the points just as +a reference. ~~~ # struct Point {x: f64, y: f64} @@ -62,27 +61,26 @@ fn compute_distance(p1: &Point, p2: &Point) -> f64 { } ~~~ -Now we can call `compute_distance()` +Now we can call `compute_distance()`: ~~~ # struct Point {x: f64, y: f64} # let on_the_stack : Point = Point{x: 3.0, y: 4.0}; # let on_the_heap : Box<Point> = box Point{x: 7.0, y: 9.0}; # fn compute_distance(p1: &Point, p2: &Point) -> f64 { 0.0 } -compute_distance(&on_the_stack, on_the_heap); +compute_distance(&on_the_stack, &*on_the_heap); ~~~ Here, the `&` operator takes the address of the variable `on_the_stack`; this is because `on_the_stack` has the type `Point` (that is, a struct value) and we have to take its address to get a value. We also call this _borrowing_ the local variable -`on_the_stack`, because we have created __an alias__: that is, another +`on_the_stack`, because we have created an alias: that is, another name for the same data. -In contrast, we can pass `on_the_heap` to `compute_distance` directly. -The compiler automatically converts a box like `Box<Point>` to a reference like -`&Point`. This is another form of borrowing: in this case, the caller lends -the contents of the box to the callee. +For the second argument, we need to grab the contents of `on_the_heap` +by using the `*` operator, and then get a reference to that data. In +order to convert `Box<T>` into a `&T`, we need to use `&*`. Whenever a caller lends data to a callee, there are some limitations on what the caller can do with the original. 
For example, if the contents of a @@ -166,12 +164,12 @@ as well as from the owned box, and then compute the distance between them. # Lifetimes -We’ve seen a few examples of borrowing data. Up till this point, we’ve glossed +We’ve seen a few examples of borrowing data. To this point, we’ve glossed over issues of safety. As stated in the introduction, at runtime a reference is simply a pointer, nothing more. Therefore, avoiding C's problems with dangling pointers requires a compile-time safety check. -The basis for the check is the notion of __lifetimes__. A lifetime is a +The basis for the check is the notion of _lifetimes_. A lifetime is a static approximation of the span of execution during which the pointer is valid: it always corresponds to some expression or block within the program. @@ -324,7 +322,7 @@ circle constant][tau] and not that dreadfully outdated notion of pi). The second match is more interesting. Here we match against a rectangle and extract its size: but rather than copy the `size` -struct, we use a __by-reference binding__ to create a pointer to it. In +struct, we use a by-reference binding to create a pointer to it. In other words, a pattern binding like `ref size` binds the name `size` to a pointer of type `&size` into the _interior of the enum_. @@ -37,7 +37,7 @@ error: mismatched types: expected `&int` but found `` (expec What gives? It needs a pointer! Therefore I have to use pointers!" -Turns out, you don't. __All you need is a reference__. Try this on for size: +Turns out, you don't. All you need is a reference. Try this on for size: ~~~rust # fn succ(x: &int) -> int { *x + 1 } @@ -74,8 +74,7 @@ Here are the use-cases for pointers. I've prefixed them with the name of the pointer that satisfies that use-case: 1. 
Owned: `Box` must be a pointer, because you don't know the size of the -object, so indirection is mandatory. Notation might change once Rust -support DST fully so we recommend you stay tuned. +object, so indirection is mandatory. 2. Owned: You need a recursive data structure. These can be infinite sized, so indirection is mandatory. @@ -89,7 +88,10 @@ common, such as C++, please read "A note..." below. care about its ownership. If you make the argument a reference, callers can send in whatever kind they want. -Four exceptions. That's it. Otherwise, you shouldn't need them. Be sceptical +5. Shared: You need to share data among tasks. You can achieve that via the +`Rc` and `Arc` types. + +Five exceptions. That's it. Otherwise, you shouldn't need them. Be sceptical of pointers in Rust: use them for a deliberate purpose, not just to make the compiler happy. @@ -205,10 +207,6 @@ The inner lists _must_ be an owned pointer, because we can't know how many elements are in the list. Without knowing the length, we don't know the size, and therefore require the indirection that pointers offer. -> Note: Nil is just part of the List enum and even though is being used -> to represent the concept of "nothing", you shouldn't think of it as -> NULL. Rust doesn't have NULL. - ## Efficiency This should almost never be a concern, but because creating an owned pointer @@ -284,8 +282,8 @@ fn main() { ~~~ This prints `5.83095189`. You can see that the `compute_distance` function -takes in two references, but we give it a stack allocated reference and an -owned box reference. +takes in two references, a reference to a value on the stack, and a reference +to a value in a box. Of course, if this were a real program, we wouldn't have any of these pointers, they're just there to demonstrate the concepts. @@ -361,6 +359,51 @@ hard for a computer, too! 
There is an entire [guide devoted to references and lifetimes](guide-lifetimes.html) that goes into lifetimes in great detail, so if you want the full details, check that out. +# Returning Pointers + +We've talked a lot about functions that accept various kinds of pointers, but +what about returning them? In general, it is better to let the caller decide +how to use a function's output, instead of assuming a certain type of pointer +is best. + +What does that mean? Don't do this: + +~~~rust +fn foo(x: Box<int>) -> Box<int> { + return box *x; +} + +fn main() { + let x = box 5; + let y = foo(x); +} +~~~ + +Do this: + +~~~rust +fn foo(x: Box<int>) -> int { + return *x; +} + +fn main() { + let x = box 5; + let y = box foo(x); +} +~~~ + +This gives you flexibility, without sacrificing performance. + +You may think that this gives us terrible performance: return a value and then +immediately box it up?! Isn't that the worst of both worlds? Rust is smarter +than that. There is no copy in this code. `main` allocates enough room for the +`box int`, passes a pointer to that memory into `foo` as `x`, and then `foo` writes +the value straight into that pointer. This writes the return value directly into +the allocated box. + +This is important enough that it bears repeating: pointers are not for optimizing +returning values from your code. Allow the caller to choose how they want to +use your output. 
# Related Resources From 0cae84959568859f946dffb1e9d9e1d43e05ae6b Mon Sep 17 00:00:00 2001 From: Alan Andrade Date: Sat, 24 May 2014 17:08:00 -0700 Subject: [PATCH 3/3] fix mostly grammar per PR comments --- src/doc/guide-lifetimes.md | 38 ++++++++++++++++++-------------------- 1 file changed, 18 insertions(+), 20 deletions(-) diff --git a/src/doc/guide-lifetimes.md b/src/doc/guide-lifetimes.md index 3c0d8c4797c..40070c4dd4b 100644 --- a/src/doc/guide-lifetimes.md +++ b/src/doc/guide-lifetimes.md @@ -14,20 +14,19 @@ Despite their complete safety, a reference's representation at runtime is the same as that of an ordinary pointer in a C program. They introduce zero overhead. The compiler does all safety checks at compile time. -Although references have rather elaborate theoretical -underpinnings (region pointers), the core concepts will be familiar to -anyone who has worked with C or C++. Therefore, the best way to explain -how they are used—and their limitations—is probably just to work -through several examples. +Although references have rather elaborate theoretical underpinnings (usually +introduced as region pointers), the core concepts will be familiar to +anyone who has worked with C or C++. The best way to explain how they are +used—and their limitations—is probably just to work through several examples. # By example References, sometimes known as *borrowed pointers*, are only valid for a limited duration. References never claim any kind of ownership -over the data that they point to, instead, they are used for cases +over the data that they point to. Instead, they are used for cases where you would like to use data for a short time. -As an example, consider a simple struct type `Point`: +Consider a simple struct type `Point`: ~~~ struct Point {x: f64, y: f64} @@ -78,9 +77,9 @@ value. We also call this _borrowing_ the local variable `on_the_stack`, because we have created an alias: that is, another name for the same data. 
-For the second argument, we need to grab the contents of `on_the_heap` -by using the `*` operator, and then get a reference to that data. In -order to convert `Box<T>` into a `&T`, we need to use `&*`. +For the second argument, we need to extract the contents of `on_the_heap` +by dereferencing with the `*` symbol. Now that we have the data, we need +to create a reference with the `&` symbol. Whenever a caller lends data to a callee, there are some limitations on what the caller can do with the original. For example, if the contents of a @@ -194,7 +193,7 @@ fn example3() -> int { } ~~~ -Here, as before, the interior of the variable `x` is being borrowed +Here, the interior of the variable `x` is being borrowed and `x` is declared as mutable. However, the compiler can prove that `x` is not assigned anywhere in the lifetime L of the variable `y`. Therefore, it accepts the function, even though `x` is mutable @@ -281,8 +280,8 @@ prevents pointers from pointing into freed memory. There is one other case where the compiler must be very careful to ensure that pointers remain valid: pointers into the interior of an `enum`. -As an example, let’s look at the following `shape` type that can -represent both rectangles and circles: +Let’s look at the following `shape` type that can represent both rectangles +and circles: ~~~ struct Point {x: f64, y: f64}; // as before @@ -391,7 +390,7 @@ reference, then uses it within the same scope. It is also possible to return references as the result of a function, but as we'll see, doing so requires some explicit annotation. -For example, we could write a subroutine like this: +We could write a subroutine like this: ~~~ struct Point {x: f64, y: f64} @@ -412,11 +411,10 @@ pointer result will always have the same lifetime as one of the parameters; named lifetimes indicate which parameter that is. -In the previous examples, function parameter types did not include a -lifetime name. 
In those examples, the compiler simply creates a fresh -name for the lifetime automatically: that is, the lifetime name is -guaranteed to refer to a distinct lifetime from the lifetimes of all -other parameters. +In the previous code samples, function parameter types did not include a +lifetime name. The compiler simply creates a fresh name for the lifetime +automatically: that is, the lifetime name is guaranteed to refer to a distinct +lifetime from the lifetimes of all other parameters. Named lifetimes that appear in function signatures are conceptually the same as the other lifetimes we have seen before, but they are a bit @@ -461,7 +459,7 @@ guarantees; in fact, it cannot guarantee that the pointer will remain valid at all once it returns, as the parameter `p` may or may not be live in the caller. Therefore, the compiler will report an error here. -In general, if you borrow a structs or boxes to create a +In general, if you borrow a struct or box to create a reference, it will only be valid within the function and cannot be returned. This is why the typical way to return references is to take references as input (the only other case in