Revamped testing guide

This commit is contained in:
Steve Klabnik 2014-12-07 16:44:01 -05:00
parent c7a9b49d1b
commit d4ea71dbc1

View File

@ -1,283 +1,515 @@
% The Rust Testing Guide
# Quick start
> Program testing can be a very effective way to show the presence of bugs, but
> it is hopelessly inadequate for showing their absence.
>
> Edsger W. Dijkstra, "The Humble Programmer" (1972)
To create test functions, add a `#[test]` attribute like this:
Let's talk about how to test Rust code. What we will not be talking about is
the right way to test Rust code. There are many schools of thought regarding
the right and wrong way to write tests. All of these approaches use the same
basic tools, and so we'll show you the syntax for using them.
~~~test_harness
fn return_two() -> int {
2
}
# The `test` attribute
At its simplest, a test in Rust is a function that's annotated with the `test`
attribute. Let's make a new project with Cargo called `adder`:
```bash
$ cargo new adder
$ cd adder
```
Cargo will automatically generate a simple test when you make a new project.
Here's the contents of `src/lib.rs`:
```rust
#[test]
fn return_two_test() {
let x = return_two();
assert!(x == 2);
fn it_works() {
}
~~~
```
To run these tests, compile with `rustc --test` and run the resulting
binary:
Note the `#[test]`. This attribute indicates that this is a test function. It
currently has no body. That's good enough to pass! We can run the tests with
`cargo test`:
```bash
$ cargo test
Compiling adder v0.0.1 (file:///home/you/projects/adder)
Running target/adder-91b3e234d4ed382a
~~~console
$ rustc --test foo.rs
$ ./foo
running 1 test
test return_two_test ... ok
test it_works ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured
~~~
`rustc foo.rs` will *not* compile the tests, since `#[test]` implies
`#[cfg(test)]`. The `--test` flag to `rustc` implies `--cfg test`.
Doc-tests adder
running 0 tests
# Unit testing in Rust
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured
```
Rust has built in support for simple unit testing. Functions can be
marked as unit tests using the `test` attribute.
Cargo compiled and ran our tests. There are two sets of output here: one
for the test we wrote, and another for documentation tests. We'll talk about
those later. For now, see this line:
~~~test_harness
```text
test it_works ... ok
```
Note the `it_works`. This comes from the name of our function:
```rust
fn it_works() {
# }
```
We also get a summary line:
```text
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured
```
So why does our do-nothing test pass? Any test which doesn't `panic!` passes,
and any test that does `panic!` fails. Let's make our test fail:
```rust
#[test]
fn return_none_if_empty() {
// ... test code ...
fn it_works() {
assert!(false);
}
~~~
```
A test function's signature must have no arguments and no return
value. To run the tests in a crate, it must be compiled with the
`--test` flag: `rustc myprogram.rs --test -o myprogram-tests`. Running
the resulting executable will run all the tests in the crate. A test
is considered successful if its function returns; if the task running
the test fails, through a call to `panic!`, a failed `assert`, or some
other (`assert_eq`, ...) means, then the test fails.
`assert!` is a macro provided by Rust which takes one argument: if the argument
is `true`, nothing happens. If the argument is false, it `panic!`s. Let's run
our tests again:
When compiling a crate with the `--test` flag `--cfg test` is also
implied, so that tests can be conditionally compiled.
```bash
$ cargo test
Compiling adder v0.0.1 (file:///home/you/projects/adder)
Running target/adder-91b3e234d4ed382a
~~~test_harness
#[cfg(test)]
mod tests {
#[test]
fn return_none_if_empty() {
// ... test code ...
}
}
~~~
running 1 test
test it_works ... FAILED
Additionally `#[test]` items behave as if they also have the
`#[cfg(test)]` attribute, and will not be compiled when the `--test` flag
is not used.
failures:
Tests that should not be run can be annotated with the `ignore`
attribute. The existence of these tests will be noted in the test
runner output, but the test will not be run. Tests can also be ignored
by configuration using the `cfg_attr` attribute so, for example, to ignore a
test on windows you can write `#[cfg_attr(windows, ignore)]`.
---- it_works stdout ----
task 'it_works' panicked at 'assertion failed: false', /home/steve/tmp/adder/src/lib.rs:3
Tests that are intended to fail can be annotated with the
`should_fail` attribute. The test will be run, and if it causes its
task to panic then the test will be counted as successful; otherwise it
will be counted as a failure. For example:
~~~test_harness
failures:
it_works
test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured
task '<main>' panicked at 'Some tests failed', /home/steve/src/rust/src/libtest/lib.rs:247
```
Rust indicates that our test failed:
```text
test it_works ... FAILED
```
And that's reflected in the summary line:
```text
test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured
```
We also get a non-zero status code:
```bash
$ echo $?
101
```
This is useful if you want to integrate `cargo test` into other tooling.
We can invert our test's failure with another attribute: `should_fail`:
```rust
#[test]
#[should_fail]
fn test_out_of_bounds_failure() {
let v: &[int] = &[];
v[0];
fn it_works() {
assert!(false);
}
~~~
```
`#[should_fail]` tests can be fragile as it's hard to guarantee that the test
This test will now succeed if we `panic!` and fail if we complete. Let's try it:
```bash
$ cargo test
Compiling adder v0.0.1 (file:///home/you/projects/adder)
Running target/adder-91b3e234d4ed382a
running 1 test
test it_works ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured
Doc-tests adder
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured
```
Rust provides another macro, `assert_eq!`, that compares two arguments for
equality:
```rust
#[test]
#[should_fail]
fn it_works() {
assert_eq!("Hello", "world");
}
```
Does this test pass or fail? Because of the `should_fail` attribute, it
passes:
```bash
$ cargo test
Compiling adder v0.0.1 (file:///home/you/projects/adder)
Running target/adder-91b3e234d4ed382a
running 1 test
test it_works ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured
Doc-tests adder
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured
```
`should_fail` tests can be fragile, as it's hard to guarantee that the test
didn't fail for an unexpected reason. To help with this, an optional `expected`
parameter can be added to the `should_fail` attribute. The test harness will
make sure that the failure message contains the provided text. A safer version
of the example above would be:
~~~test_harness
```
#[test]
#[should_fail(expected = "index out of bounds")]
fn test_out_of_bounds_failure() {
let v: &[int] = &[];
v[0];
#[should_fail(expected = "assertion failed")]
fn it_works() {
assert_eq!("Hello", "world");
}
~~~
```
A test runner built with the `--test` flag supports a limited set of
arguments to control which tests are run:
That's all there is to the basics! Let's write one 'real' test:
- the first free argument passed to a test runner is interpreted as a
regular expression
([syntax reference](regex/index.html#syntax))
and is used to narrow down the set of tests being run. Note: a plain
string is a valid regular expression that matches itself.
- the `--ignored` flag tells the test runner to run only tests with the
`ignore` attribute.
```{rust,ignore}
pub fn add_two(a: i32) -> i32 {
a + 2
}
## Parallelism
#[test]
fn it_works() {
assert_eq!(4, add_two(2));
}
```
By default, tests are run in parallel, which can make interpreting
failure output difficult. In these cases you can set the
`RUST_TEST_TASKS` environment variable to 1 to make the tests run
sequentially.
This is a very common use of `assert_eq!`: call some function with
some known arguments and compare it to the expected output.
## Examples
# The `test` module
### Typical test run
There is one way in which our existing example is not idiomatic: it's
missing the test module. The idiomatic way of writing our example
looks like this:
~~~console
$ mytests
```{rust,ignore}
pub fn add_two(a: i32) -> i32 {
a + 2
}
running 30 tests
running driver::tests::mytest1 ... ok
running driver::tests::mytest2 ... ignored
... snip ...
running driver::tests::mytest30 ... ok
#[cfg(test)]
mod tests {
use super::add_two;
result: ok. 28 passed; 0 failed; 2 ignored
~~~
#[test]
fn it_works() {
assert_eq!(4, add_two(2));
}
}
```
### Test run with failures
There's a few changes here. The first is the introduction of a `mod tests` with
a `cfg` attribute. The module allows us to group all of our tests together, and
to also define helper functions if needed, that don't become a part of the rest
of our crate. The `cfg` attribute only compiles our test code if we're
currently trying to run the tests. This can save compile time, and also ensures
that our tests are entirely left out of a normal build.
~~~console
$ mytests
The second change is the `use` declaration. Because we're in an inner module,
we need to bring our test function into scope. This can be annoying if you have
a large module, and so this is a common use of the `glob` feature. Let's change
our `src/lib.rs` to make use of it:
running 30 tests
running driver::tests::mytest1 ... ok
running driver::tests::mytest2 ... ignored
... snip ...
running driver::tests::mytest30 ... FAILED
```{rust,ignore}
#![feature(globs)]
result: FAILED. 27 passed; 1 failed; 2 ignored
~~~
pub fn add_two(a: i32) -> i32 {
a + 2
}
### Running ignored tests
#[cfg(test)]
mod tests {
use super::*;
~~~console
$ mytests --ignored
#[test]
fn it_works() {
assert_eq!(4, add_two(2));
}
}
```
Note the `feature` attribute, as well as the different `use` line. Now we run
our tests:
```bash
$ cargo test
Updating registry `https://github.com/rust-lang/crates.io-index`
Compiling adder v0.0.1 (file:///home/you/projects/adder)
Running target/adder-91b3e234d4ed382a
running 1 test
test test::it_works ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured
Doc-tests adder
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured
```
It works!
The current convention is to use the `test` module to hold your "unit"-style
tests. Anything that just tests one small bit of functionality makes sense to
go here. But what about "integration"-style tests instead? For that, we have
the `tests` directory
# The `tests` directory
To write an integration test, let's make a `tests` directory, and
put a `tests/lib.rs` file inside, with this as its contents:
```{rust,ignore}
extern crate adder;
#[test]
fn it_works() {
assert_eq(4, adder::add_two(2));
}
```
This looks similar to our previous tests, but slightly different. We now have
an `extern crate adder` at the top. This is because the tests in the `tests`
directory are an entirely separate crate, and so we need to import our library.
This is also why `tests` is a suitable place to write integration-style tests:
they use the library like any other consumer of it would.
Let's run them:
```bash
$ cargo test
Compiling adder v0.0.1 (file:///home/you/projects/adder)
Running target/adder-91b3e234d4ed382a
running 1 test
test test::it_works ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured
Running target/lib-c18e7d3494509e74
running 1 test
test it_works ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured
Doc-tests adder
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured
```
Now we have three sections: our previous test is also run, as well as our new
one.
That's all there is to the `tests` directory. The `test` module isn't needed
here, since the whole thing is focused on tests.
Let's finally check out that third section: documentation tests.
# Documentation tests
Nothing is better than documentation with examples. Nothing is worse than
examples that don't actually work, because the code has changed since the
documentation has been written. To this end, Rust supports automaticaly
running examples in your documentation. Here's a fleshed-out `src/lib.rs`
with examples:
```{rust,ignore}
//! The `adder` crate provides functions that add numbers to other numbers.
//!
//! # Examples
//!
//! ```
//! assert_eq!(4, adder::add_two(2));
//! ```
#![feature(globs)]
/// This function adds two to its argument.
///
/// # Examples
///
/// ```
/// use adder::add_two;
///
/// assert_eq!(4, add_two(2));
/// ```
pub fn add_two(a: i32) -> i32 {
a + 2
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn it_works() {
assert_eq!(4, add_two(2));
}
}
```
Note the module-level documentation with `//!` and the function-level
documentation with `///`. Rust's documentation supports Markdown in comments,
and so triple graves mark code blocks. It is conventional to include the
`# Examples` section, exactly like that, with examples following.
Let's run the tests again:
```bash
$ cargo test
Compiling adder v0.0.1 (file:///home/steve/tmp/adder)
Running target/adder-91b3e234d4ed382a
running 1 test
test test::it_works ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured
Running target/lib-c18e7d3494509e74
running 1 test
test it_works ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured
Doc-tests adder
running 2 tests
running driver::tests::mytest2 ... failed
running driver::tests::mytest10 ... ok
test add_two_0 ... ok
test _0 ... ok
result: FAILED. 1 passed; 1 failed; 0 ignored
~~~
test result: ok. 2 passed; 0 failed; 0 ignored; 0 measured
```
### Running a subset of tests
Now we have all three kinds of tests running! Note the names of the
documentation tests: the `_0` is generated for the module test, and `add_two_0`
for the function test. These will auto increment with names like `add_two_1` as
you add more examples.
Using a plain string:
# Benchmark tests
~~~console
$ mytests mytest23
Rust also supports benchmark tests, which can test the performance of your
code. Let's make our `src/lib.rs` look like this (comments elided):
running 1 tests
running driver::tests::mytest23 ... ok
```{rust,ignore}
#![feature(globs)]
result: ok. 1 passed; 0 failed; 0 ignored
~~~
Using some regular expression features:
~~~console
$ mytests 'mytest[145]'
running 13 tests
running driver::tests::mytest1 ... ok
running driver::tests::mytest4 ... ok
running driver::tests::mytest5 ... ok
running driver::tests::mytest10 ... ignored
... snip ...
running driver::tests::mytest19 ... ok
result: ok. 13 passed; 0 failed; 1 ignored
~~~
# Microbenchmarking
The test runner also understands a simple form of benchmark execution.
Benchmark functions are marked with the `#[bench]` attribute, rather
than `#[test]`, and have a different form and meaning. They are
compiled along with `#[test]` functions when a crate is compiled with
`--test`, but they are not run by default. To run the benchmark
component of your testsuite, pass `--bench` to the compiled test
runner.
The type signature of a benchmark function differs from a unit test:
it takes a mutable reference to type
`test::Bencher`. Inside the benchmark function, any
time-variable or "setup" code should execute first, followed by a call
to `iter` on the benchmark harness, passing a closure that contains
the portion of the benchmark you wish to actually measure the
per-iteration speed of.
For benchmarks relating to processing/generating data, one can set the
`bytes` field to the number of bytes consumed/produced in each
iteration; this will be used to show the throughput of the benchmark.
This must be the amount used in each iteration, *not* the total
amount.
For example:
~~~test_harness
extern crate test;
use test::Bencher;
#[bench]
fn bench_sum_1024_ints(b: &mut Bencher) {
let v = Vec::from_fn(1024, |n| n);
b.iter(|| v.iter().fold(0, |old, new| old + *new));
pub fn add_two(a: i32) -> i32 {
a + 2
}
#[bench]
fn initialise_a_vector(b: &mut Bencher) {
b.iter(|| Vec::from_elem(1024, 0u64));
b.bytes = 1024 * 8;
}
~~~
#[cfg(test)]
mod tests {
use super::*;
use test::Bencher;
The benchmark runner will calibrate measurement of the benchmark
function to run the `iter` block "enough" times to get a reliable
measure of the per-iteration speed.
#[test]
fn it_works() {
assert_eq!(4, add_two(2));
}
#[bench]
fn bench_add_two(b: &mut Bencher) {
b.iter(|| add_two(2));
}
}
```
We've imported the `test` crate, which contains our benchmarking support.
We have a new function as well, with the `bench` attribute. Unlike regular
tests, which take no arguments, benchmark tests take a `&mut Bencher`. This
`Bencher` provides an `iter` method, which takes a closure. This closure
contains the code we'd like to benchmark.
We can run benchmark tests with `cargo bench`:
```bash
$ cargo bench
Compiling adder v0.0.1 (file:///home/steve/tmp/adder)
Running target/release/adder-91b3e234d4ed382a
running 2 tests
test tests::it_works ... ignored
test tests::bench_add_two ... bench: 1 ns/iter (+/- 0)
test result: ok. 0 passed; 0 failed; 1 ignored; 1 measured
```
Our non-benchmark test was ignored. You may have noticed that `cargo bench`
takes a bit longer than `cargo test`. This is because Rust runs our benchmark
a number of times, and then takes the average. Because we're doing so little
work in this example, we have a `1 ns/iter (+/- 0)`, but this would show
the variance if there was one.
Advice on writing benchmarks:
- Move setup code outside the `iter` loop; only put the part you
want to measure inside
- Make the code do "the same thing" on each iteration; do not
accumulate or change state
- Make the outer function idempotent too; the benchmark runner is
likely to run it many times
- Make the inner `iter` loop short and fast so benchmark runs are
fast and the calibrator can adjust the run-length at fine
resolution
- Make the code in the `iter` loop do something simple, to assist in
pinpointing performance improvements (or regressions)
To run benchmarks, pass the `--bench` flag to the compiled
test-runner. Benchmarks are compiled-in but not executed by default.
* Move setup code outside the `iter` loop; only put the part you want to measure inside
* Make the code do "the same thing" on each iteration; do not accumulate or change state
* Make the outer function idempotent too; the benchmark runner is likely to run
it many times
* Make the inner `iter` loop short and fast so benchmark runs are fast and the
calibrator can adjust the run-length at fine resolution
* Make the code in the `iter` loop do something simple, to assist in pinpointing
performance improvements (or regressions)
~~~console
$ rustc mytests.rs -O --test
$ mytests --bench
There's another tricky part to writing benchmarks: benchmarks compiled with
optimizations activated can be dramatically changed by the optimizer so that
the benchmark is no longer benchmarking what one expects. For example, the
compiler might recognize that some calculation has no external effects and
remove it entirely.
running 2 tests
test bench_sum_1024_ints ... bench: 709 ns/iter (+/- 82)
test initialise_a_vector ... bench: 424 ns/iter (+/- 99) = 19320 MB/s
test result: ok. 0 passed; 0 failed; 0 ignored; 2 measured
~~~
## Benchmarks and the optimizer
Benchmarks compiled with optimizations activated can be dramatically
changed by the optimizer so that the benchmark is no longer
benchmarking what one expects. For example, the compiler might
recognize that some calculation has no external effects and remove
it entirely.
~~~test_harness
```{rust,ignore}
extern crate test;
use test::Bencher;
@ -287,36 +519,36 @@ fn bench_xor_1000_ints(b: &mut Bencher) {
range(0u, 1000).fold(0, |old, new| old ^ new);
});
}
~~~
```
gives the following results
~~~console
```text
running 1 test
test bench_xor_1000_ints ... bench: 0 ns/iter (+/- 0)
test result: ok. 0 passed; 0 failed; 0 ignored; 1 measured
~~~
```
The benchmarking runner offers two ways to avoid this. Either, the
closure that the `iter` method receives can return an arbitrary value
which forces the optimizer to consider the result used and ensures it
cannot remove the computation entirely. This could be done for the
example above by adjusting the `b.iter` call to
The benchmarking runner offers two ways to avoid this. Either, the closure that
the `iter` method receives can return an arbitrary value which forces the
optimizer to consider the result used and ensures it cannot remove the
computation entirely. This could be done for the example above by adjusting the
`b.iter` call to
~~~
```rust
# struct X; impl X { fn iter<T>(&self, _: || -> T) {} } let b = X;
b.iter(|| {
// note lack of `;` (could also use an explicit `return`).
range(0u, 1000).fold(0, |old, new| old ^ new)
});
~~~
```
Or, the other option is to call the generic `test::black_box`
function, which is an opaque "black box" to the optimizer and so
forces it to consider any argument as used.
Or, the other option is to call the generic `test::black_box` function, which
is an opaque "black box" to the optimizer and so forces it to consider any
argument as used.
~~~
```rust
extern crate test;
# fn main() {
@ -325,54 +557,17 @@ b.iter(|| {
test::black_box(range(0u, 1000).fold(0, |old, new| old ^ new));
});
# }
~~~
```
Neither of these read or modify the value, and are very cheap for
small values. Larger values can be passed indirectly to reduce
overhead (e.g. `black_box(&huge_struct)`).
Neither of these read or modify the value, and are very cheap for small values.
Larger values can be passed indirectly to reduce overhead (e.g.
`black_box(&huge_struct)`).
Performing either of the above changes gives the following
benchmarking results
Performing either of the above changes gives the following benchmarking results
~~~console
```text
running 1 test
test bench_xor_1000_ints ... bench: 375 ns/iter (+/- 148)
test bench_xor_1000_ints ... bench: 1 ns/iter (+/- 0)
test result: ok. 0 passed; 0 failed; 0 ignored; 1 measured
~~~
However, the optimizer can still modify a testcase in an undesirable
manner even when using either of the above. Benchmarks can be checked
by hand by looking at the output of the compiler using the `--emit=ir`
(for LLVM IR), `--emit=asm` (for assembly) or compiling normally and
using any method for examining object code.
## Saving and ratcheting metrics
When running benchmarks or other tests, the test runner can record
per-test "metrics". Each metric is a scalar `f64` value, plus a noise
value which represents uncertainty in the measurement. By default, all
`#[bench]` benchmarks are recorded as metrics, which can be saved as
JSON in an external file for further reporting.
In addition, the test runner supports _ratcheting_ against a metrics
file. Ratcheting is like saving metrics, except that after each run,
if the output file already exists the results of the current run are
compared against the contents of the existing file, and any regression
_causes the testsuite to fail_. If the comparison passes -- if all
metrics stayed the same (within noise) or improved -- then the metrics
file is overwritten with the new values. In this way, a metrics file
in your workspace can be used to ensure your work does not regress
performance.
Test runners take 3 options that are relevant to metrics:
- `--save-metrics=<file.json>` will save the metrics from a test run
to `file.json`
- `--ratchet-metrics=<file.json>` will ratchet the metrics against
the `file.json`
- `--ratchet-noise-percent=N` will override the noise measurements
in `file.json`, and consider a metric change less than `N%` to be
noise. This can be helpful if you are testing in a noisy
environment where the benchmark calibration loop cannot acquire a
clear enough signal.
```