# Rationale
When dealing with strings, many functions deal with either a `char` (unicode
codepoint) or a byte (utf-8 encoding related). There is often an inconsistent
way in which methods are referred to as to whether they contain "byte", "char",
or nothing in their name. There are also issues open to rename *all* methods to
reflect that they operate on utf8 encodings or bytes (e.g. utf8_len() or
byte_len()).
The current state of String seems to largely be what is desired, so this PR
proposes the following rationale for methods dealing with bytes or characters:
> When constructing a string, the input encoding *must* be mentioned (e.g.
> from_utf8). This makes it clear what exactly the input type is expected to be
> in terms of encoding.
>
> When a method operates on anything related to an *index* within the string
> such as length, capacity, position, etc, the method *implicitly* operates on
> bytes. It is an understood fact that String is a utf-8 encoded string, and
> burdening all methods with "bytes" would be redundant.
>
> When a method operates on the *contents* of a string, such as push() or pop(),
> then "char" is the default type. A String can loosely be thought of as being a
> collection of unicode codepoints, but not all collection-related operations
> make sense because some can be woefully inefficient.
# Method stabilization
The following methods have been marked #[stable]
* The String type itself
* String::new
* String::with_capacity
* String::from_utf16_lossy
* String::into_bytes
* String::as_bytes
* String::len
* String::clear
* String::as_slice
The following methods have been marked #[unstable]
* String::from_utf8 - The error type in the returned `Result` may change to
provide a nicer message when it's `unwrap()`'d
* String::from_utf8_lossy - The returned `MaybeOwned` type still needs
stabilization
* String::from_utf16 - The return type may change to become a `Result` which
includes more contextual information like where the error
occurred.
* String::from_chars - This is equivalent to iter().collect(), but currently not
as ergonomic.
* String::from_char - This method is the equivalent of Vec::from_elem, and has
been marked #[unstable] becuase it can be seen as a
duplicate of iterator-based functionality as well as
possibly being renamed.
* String::push_str - This *can* be emulated with .extend(foo.chars()), but is
less efficient because of decoding/encoding. Due to the
desire to minimize API surface this may be able to be
removed in the future for something possibly generic with
no loss in performance.
* String::grow - This is a duplicate of iterator-based functionality, which may
become more ergonomic in the future.
* String::capacity - This function was just added.
* String::push - This function was just added.
* String::pop - This function was just added.
* String::truncate - The failure conventions around String methods and byte
indices isn't totally clear at this time, so the failure
semantics and return value of this method are subject to
change.
* String::as_mut_vec - the naming of this method may change.
* string::raw::* - these functions are all waiting on [an RFC][2]
[2]: rust-lang/rfcs#240
The following method have been marked #[experimental]
* String::from_str - This function only exists as it's more efficient than
to_string(), but having a less ergonomic function for
performance reasons isn't the greatest reason to keep it
around. Like Vec::push_all, this has been marked
experimental for now.
The following methods have been #[deprecated]
* String::append - This method has been deprecated to remain consistent with the
deprecation of Vec::append. While convenient, it is one of
the only functional-style apis on String, and requires more
though as to whether it belongs as a first-class method or
now (and how it relates to other collections).
* String::from_byte - This is fairly rare functionality and can be emulated with
str::from_utf8 plus an assert plus a call to to_string().
Additionally, String::from_char could possibly be used.
* String::byte_capacity - Renamed to String::capacity due to the rationale
above.
* String::push_char - Renamed to String::push due to the rationale above.
* String::pop_char - Renamed to String::pop due to the rationale above.
* String::push_bytes - There are a number of `unsafe` functions on the `String`
type which allow bypassing utf-8 checks. These have all
been deprecated in favor of calling `.as_mut_vec()` and
then operating directly on the vector returned. These
methods were deprecated because naming them with relation
to other methods was difficult to rationalize and it's
arguably more composable to call .as_mut_vec().
* String::as_mut_bytes - See push_bytes
* String::push_byte - See push_bytes
* String::pop_byte - See push_bytes
* String::shift_byte - See push_bytes
# Reservation methods
This commit does not yet touch the methods for reserving bytes. The methods on
Vec have also not yet been modified. These methods are discussed in the upcoming
[Collections reform RFC][1]
[1]: https://github.com/aturon/rfcs/blob/collections-conventions/active/0000-collections-conventions.md#implicit-growth
I don't know if anything else was relying on the old behavior, this seems more correct.
Fixes#16348
If '-F' is allowed to have an optional argument, with the previous version '-FF' would be translated to '-F -F'. In this new version '-FF' translates to '-F' with argument 'F'
declared with the same name in the same scope.
This breaks several common patterns. First are unused imports:
use foo::bar;
use baz::bar;
Change this code to the following:
use baz::bar;
Second, this patch breaks globs that import names that are shadowed by
subsequent imports. For example:
use foo::*; // including `bar`
use baz::bar;
Change this code to remove the glob:
use foo::{boo, quux};
use baz::bar;
Or qualify all uses of `bar`:
use foo::{boo, quux};
use baz;
... baz::bar ...
Finally, this patch breaks code that, at top level, explicitly imports
`std` and doesn't disable the prelude.
extern crate std;
Because the prelude imports `std` implicitly, there is no need to
explicitly import it; just remove such directives.
The old behavior can be opted into via the `import_shadowing` feature
gate. Use of this feature gate is discouraged.
This implements RFC #116.
Closes#16464.
[breaking-change]
As soon as an option is found that takes an argument, consume the rest
of the string and store it into i_arg. Previously this would only happen
if the character after the option was not a recognized option.
Addresses issue #16348
Being able to index into the bytes of a string encourages
poor UTF-8 hygiene. To get a view of `&[u8]` from either
a `String` or `&str` slice, use the `as_bytes()` method.
Closes#12710.
[breaking-change]
This obsoletes the old `to_err_msg` method. Replace
println!("Error: {}", failure.to_err_msg())
let string = failure.to_err_msg();
with
println!("Error: {}", failure)
let string = failure.to_str();
[breaking-change]
The following features have been removed
* box [a, b, c]
* ~[a, b, c]
* box [a, ..N]
* ~[a, ..N]
* ~[T] (as a type)
* deprecated_owned_vector lint
All users of ~[T] should move to using Vec<T> instead.
This grows a new option inside of rustdoc to add the ability to submit examples
to an external website. If the `--markdown-playground-url` command line option
or crate doc attribute `html_playground_url` is present, then examples will have
a button on hover to submit the code to the playground specified.
This commit enables submission of example code to play.rust-lang.org. The code
submitted is that which is tested by rustdoc, not necessarily the exact code
shown in the example.
Closes#14654
This is part of the ongoing renaming of the equality traits. See #12517 for more
details. All code using Eq/Ord will temporarily need to move to Partial{Eq,Ord}
or the Total{Eq,Ord} traits. The Total traits will soon be renamed to {Eq,Ord}.
cc #12517
[breaking-change]
This commit moves reflection (as well as the {:?} format modifier) to a new
libdebug crate, all of which is marked experimental.
This is a breaking change because it now requires the debug crate to be
explicitly linked if the :? format qualifier is used. This means that any code
using this feature will have to add `extern crate debug;` to the top of the
crate. Any code relying on reflection will also need to do this.
Closes#12019
[breaking-change]
The span on a inner doc-comment would point to the next token, e.g. the span for the `a` line points to the `b` line, and the span of `b` points to the `fn`.
```rust
//! a
//! b
fn bar() {}
```