Data Types - The Rust Programming Language
Every value in Rust is of a certain data type, which tells Rust what kind of data is being specified so it knows how to work with that data. We’ll look at two data type subsets: scalar and compound.
Keep in mind that Rust is a statically typed language, which means that it must know the types of all variables at compile time. The compiler can usually infer what type we want to use based on the value and how we use it. In cases when many types are possible, such as when we converted a `String` to a numeric type using `parse` in the “Comparing the Guess to the Secret Number” section in Chapter 2, we must add a type annotation, like this:
fn main() {
    let guess: u32 = "42".parse().expect("Not a number!");
}
If we don’t add the type annotation here, Rust will display the following error, which means the compiler needs more information from us to know which type we want to use:
error[E0282]: type annotations needed
 --> src/main.rs:2:9
  |
2 |     let guess = "42".parse().expect("Not a number!");
  |         ^^^^^
  |         |
  |         cannot infer type for `_`
  |         consider giving `guess` a type
You’ll see different type annotations for other data types.
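As a small sketch of the same idea, the type can be supplied in more than one place: on the variable, on the call itself with the `::<>` syntax, or through a function's return type. The helper name `parse_guess` is hypothetical and exists only for this illustration.

```rust
// Hypothetical helper: parses text into a u32, returning None on bad input.
fn parse_guess(text: &str) -> Option<u32> {
    // The declared return type tells `parse` which numeric type to produce.
    text.trim().parse().ok()
}

fn main() {
    // Annotation on the variable.
    let a: u32 = "42".parse().expect("Not a number!");
    // Annotation on the call itself.
    let b = "42".parse::<u32>().expect("Not a number!");
    assert_eq!(a, b);

    println!("{:?}", parse_guess("42"));
    println!("{:?}", parse_guess("forty-two"));
}
```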
A scalar type represents a single value. Rust has four primary scalar types: integers, floating-point numbers, Booleans, and characters. You may recognize these from other programming languages. Let’s jump into how they work in Rust.
An integer is a number without a fractional component. We used one integer type in Chapter 2, the `u32` type. This type declaration indicates that the value it’s associated with should be an unsigned integer (signed integer types start with `i` instead of `u`) that takes up 32 bits of space. Table 3-1 shows the built-in integer types in Rust. Each variant in the Signed and Unsigned columns (for example, `i16`) can be used to declare the type of an integer value.
Table 3-1: Integer Types in Rust
Length | Signed | Unsigned |
---|---|---|
8-bit | i8 | u8 |
16-bit | i16 | u16 |
32-bit | i32 | u32 |
64-bit | i64 | u64 |
128-bit | i128 | u128 |
arch | isize | usize |
Each variant can be either signed or unsigned and has an explicit size. Signed and unsigned refer to whether it’s possible for the number to be negative; in other words, whether the number needs to have a sign with it (signed) or whether it will only ever be zero or positive and can therefore be represented without a sign (unsigned). It’s like writing numbers on paper: when the sign matters, a number is shown with a plus sign or a minus sign; however, when it’s safe to assume the number is positive, it’s shown with no sign. Signed numbers are stored using two’s complement representation (if you’re unsure what this is, you can search for it online; an explanation is outside the scope of this book).
Each signed variant can store numbers from $-(2^{n-1})$ to $2^{n-1} - 1$ inclusive, where $n$ is the number of bits that variant uses. So an `i8` can store numbers from $-(2^7)$ to $2^7 - 1$, which equals -128 to 127. Unsigned variants can store numbers from 0 to $2^n - 1$, so a `u8` can store numbers from 0 to $2^8 - 1$, which equals 0 to 255.
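The range formulas can be checked against the `MIN` and `MAX` constants that each integer type provides. The helpers `signed_range` and `unsigned_max` below are hypothetical names for this sketch; they simply evaluate the formulas for a given bit width.

```rust
// Hypothetical helper: evaluates -(2^(n-1)) and 2^(n-1) - 1 for n bits.
// This sketch is meant for bit widths up to 64.
fn signed_range(bits: u32) -> (i128, i128) {
    let half = 1i128 << (bits - 1);
    (-half, half - 1)
}

// Hypothetical helper: evaluates 2^n - 1 for n bits (n below 128).
fn unsigned_max(bits: u32) -> u128 {
    (1u128 << bits) - 1
}

fn main() {
    // The formulas agree with the constants defined on the types.
    assert_eq!(signed_range(8), (i8::MIN as i128, i8::MAX as i128));
    assert_eq!(unsigned_max(8), u8::MAX as u128);

    println!("i8:  {} to {}", i8::MIN, i8::MAX);
    println!("u8:  {} to {}", u8::MIN, u8::MAX);
    println!("i32: {} to {}", i32::MIN, i32::MAX);
}
```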
Additionally, the `isize` and `usize` types depend on the kind of computer your program is running on: 64 bits if you’re on a 64-bit architecture and 32 bits if you’re on a 32-bit architecture.
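One way to see which width applies on a given machine is to ask for the size of `usize` at run time. This is a minimal sketch; `pointer_width_bits` is a hypothetical helper name, and the printed number depends on the target the program is compiled for.

```rust
use std::mem::size_of;

// Hypothetical helper: width of usize in bits on the current target.
fn pointer_width_bits() -> usize {
    size_of::<usize>() * 8
}

fn main() {
    println!("usize is {} bits wide on this target", pointer_width_bits());

    // isize and usize always share a size, which matches the pointer size.
    assert_eq!(size_of::<usize>(), size_of::<isize>());
    assert_eq!(size_of::<usize>(), size_of::<*const u8>());
}
```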
You can write integer literals in any of the forms shown in Table 3-2. Note that all number literals except the byte literal allow a type suffix, such as `57u8`, and `_` as a visual separator, such as `1_000`.
Table 3-2: Integer Literals in Rust
Number literals | Example |
---|---|
Decimal | 98_222 |
Hex | 0xff |
Octal | 0o77 |
Binary | 0b1111_0000 |
Byte (u8 only) | b'A' |
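The literal forms in Table 3-2 are different spellings of ordinary integers. The sketch below collects the table's examples and compares them with their decimal values; `literal_values` is a hypothetical helper used only to group them.

```rust
// Hypothetical helper: the example literals from Table 3-2, widened to i64.
fn literal_values() -> [i64; 5] {
    [
        98_222,        // decimal with a visual separator
        0xff,          // hex
        0o77,          // octal
        0b1111_0000,   // binary
        b'A' as i64,   // byte literal (a u8), cast for the array
    ]
}

fn main() {
    let typed = 57u8;      // type suffix selects u8
    let separated = 1_000; // same value as 1000
    assert_eq!(separated, 1000);

    println!("{} {:?}", typed, literal_values());
}
```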
So how do you know which type of integer to use? If you’re unsure, Rust’s defaults are generally good choices, and integer types default to `i32`: this type is generally the fastest, even on 64-bit systems. The primary situation in which you’d use `isize` or `usize` is when indexing some sort of collection.
Let’s say that you have a `u8`, which can hold values between 0 and 255. What happens if you try to change it to 256? This is called “integer overflow,” and Rust has some interesting rules around this behavior. When compiling in debug mode, Rust checks for this kind of issue and will cause your program to panic, which is the term Rust uses when a program exits with an error. We’ll discuss panics more in Chapter 9.
In release builds, Rust does not check for overflow, and instead performs something called “two’s complement wrapping.” In short, for a `u8`, 256 becomes 0, 257 becomes 1, and so on. Relying on this wrapping is considered an error, even though the program keeps running. If you want wrapping behavior deliberately, the standard library has a type, `Wrapping`, that provides it explicitly.
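A short sketch of opting in to a specific overflow behavior follows. Besides `Wrapping`, it uses the standard library methods `wrapping_add`, `checked_add`, and `overflowing_add` on `u8`, which behave the same in debug and release builds. The two `add_one_*` helpers are hypothetical names for this illustration.

```rust
use std::num::Wrapping;

// Hypothetical helper: adds one, wrapping around at the type's maximum.
fn add_one_wrapping(x: u8) -> u8 {
    x.wrapping_add(1)
}

// Hypothetical helper: adds one, returning None if the result does not fit.
fn add_one_checked(x: u8) -> Option<u8> {
    x.checked_add(1)
}

fn main() {
    assert_eq!(add_one_wrapping(255), 0);
    assert_eq!(add_one_checked(255), None);

    // The Wrapping type makes ordinary operators wrap.
    let w = Wrapping(255u8) + Wrapping(1u8);
    assert_eq!(w.0, 0);

    // overflowing_add returns the wrapped value and a flag.
    let (value, overflowed) = 255u8.overflowing_add(1);
    assert_eq!((value, overflowed), (0, true));
}
```

Choosing among these is a design decision: `checked_add` suits cases where overflow is a bug to report, while the wrapping forms suit arithmetic that is meant to be modular.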
Rust also has two primitive types for floating-point numbers, which are numbers with decimal points. Rust’s floating-point types are `f32` and `f64`, which are 32 bits and 64 bits in size, respectively. The default type is `f64` because on modern CPUs it’s roughly the same speed as `f32` but is capable of more precision.
Here’s an example that shows floating-point numbers in action:
Filename: src/main.rs
fn main() {
    let x = 2.0; // f64

    let y: f32 = 3.0; // f32
}
Floating-point numbers are represented according to the IEEE-754 standard. The `f32` type is a single-precision float, and `f64` has double precision.
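The difference in precision can be observed with an integer that `f64` stores exactly but `f32` cannot. This sketch assumes the usual IEEE-754 layouts (24 significant bits for `f32`, 53 for `f64`); the `survives_*` helpers are hypothetical names.

```rust
// Hypothetical helper: does n come back unchanged after a trip through f32?
fn survives_f32(n: u32) -> bool {
    (n as f32) as u32 == n
}

// Hypothetical helper: the same round trip through f64.
fn survives_f64(n: u32) -> bool {
    (n as f64) as u32 == n
}

fn main() {
    let n: u32 = 16_777_217; // 2^24 + 1, one past what f32 can count exactly

    println!("as f32: {}", n as f32);
    println!("as f64: {}", n as f64);

    assert!(!survives_f32(n)); // rounded to a neighboring value
    assert!(survives_f64(n));  // stored exactly
}
```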
Rust supports the basic mathematical operations you’d expect for all of the number types: addition, subtraction, multiplication, division, and remainder. The following code shows how you’d use each one in a `let` statement:
Filename: src/main.rs
fn main() {
    // addition
    let sum = 5 + 10;

    // subtraction
    let difference = 95.5 - 4.3;

    // multiplication
    let product = 4 * 30;

    // division
    let quotient = 56.7 / 32.2;

    // remainder
    let remainder = 43 % 5;
}
Each expression in these statements uses a mathematical operator and evaluates to a single value, which is then bound to a variable. Appendix B contains a list of all operators that Rust provides.
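Two details of these operators are easy to miss: division between integers discards the fractional part, and integers and floats cannot be mixed in one expression without a cast. The sketch below illustrates both; `div_rem` and `average` are hypothetical helper names.

```rust
// Hypothetical helper: integer quotient and remainder together.
fn div_rem(a: i32, b: i32) -> (i32, i32) {
    (a / b, a % b)
}

// Hypothetical helper: converts to f64 before dividing to keep the fraction.
fn average(a: i32, b: i32) -> f64 {
    (a as f64 + b as f64) / 2.0
}

fn main() {
    // Integer division truncates toward zero.
    assert_eq!(div_rem(43, 5), (8, 3));
    // The remainder takes the sign of the left operand.
    assert_eq!(div_rem(-7, 2), (-3, -1));

    // Writing `3 + 4.0` would not compile; the cast makes the types agree.
    assert_eq!(average(3, 4), 3.5);
}
```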
As in most other programming languages, a Boolean type in Rust has two possible values: `true` and `false`. The Boolean type in Rust is specified using `bool`. For example:
Filename: src/main.rs
fn main() {
    let t = true;

    let f: bool = false; // with explicit type annotation
}
The main way to consume Boolean values is through conditionals, such as an `if` expression. We’ll cover how `if` expressions work in Rust in the “Control Flow” section.
Booleans are one byte in size.
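As a brief sketch of both points, the code below consumes a `bool` in an `if` expression and checks its size with `std::mem::size_of`. The function name `describe` is hypothetical.

```rust
use std::mem::size_of;

// Hypothetical helper: picks a label based on a Boolean value.
fn describe(is_ready: bool) -> &'static str {
    if is_ready { "ready" } else { "waiting" }
}

fn main() {
    let t = true;
    let f: bool = false;
    println!("{} / {}", describe(t), describe(f));

    // A bool occupies one byte; casting gives 1 for true and 0 for false.
    assert_eq!(size_of::<bool>(), 1);
    assert_eq!(t as u8, 1);
    assert_eq!(f as u8, 0);
}
```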
So far we’ve worked only with numbers, but Rust supports letters too. Rust’s `char` type is the language’s most primitive alphabetic type, and the following code shows one way to use it. (Note that the `char` literal is specified with single quotes, as opposed to string literals, which use double quotes.)
Filename: src/main.rs
fn main() {
    let c = 'z';
    let z = 'ℤ';
    let heart_eyed_cat = '😻';
}
Rust’s `char` type represents a Unicode Scalar Value, which means it can represent a lot more than just ASCII. Accented letters; Chinese, Japanese, and Korean characters; emoji; and zero-width spaces are all valid `char` values in Rust. Unicode Scalar Values range from `U+0000` to `U+D7FF` and `U+E000` to `U+10FFFF` inclusive. However, a “character” isn’t really a concept in Unicode, so your human intuition for what a “character” is may not match up with what a `char` is in Rust. We’ll discuss this topic in detail in “Strings” in Chapter 8.
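The following sketch pokes at those claims: it reads code points with `as u32`, shows that values in the gap between `U+D7FF` and `U+E000` are rejected by `char::from_u32`, and shows one case where a single visible "character" is made of two `char` values. The helper name `code_point` is hypothetical.

```rust
// Hypothetical helper: the Unicode code point of a char as a number.
fn code_point(c: char) -> u32 {
    c as u32
}

fn main() {
    // Every char is four bytes in memory, even if its UTF-8 form is shorter.
    assert_eq!(std::mem::size_of::<char>(), 4);
    assert_eq!('z'.len_utf8(), 1);
    assert_eq!('ℤ'.len_utf8(), 3);
    assert_eq!('😻'.len_utf8(), 4);

    println!("U+{:04X}", code_point('ℤ'));

    // Surrogate code points are not Unicode Scalar Values.
    assert!(char::from_u32(0xD800).is_none());

    // "e" followed by a combining accent displays as one letter
    // but consists of two chars.
    assert_eq!("e\u{301}".chars().count(), 2);
}
```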
Compound types can group multiple values into one type. Rust has two primitive compound types: tuples and arrays.
A tuple is a general way of grouping together some number of other values with a variety of types into one compound type. Tuples have a fixed length: once declared, they cannot grow or shrink in size.
We create a tuple by writing a comma-separated list of values inside parentheses. Each position in the tuple has a type, and the types of the different values in the tuple don’t have to be the same. We’ve added optional type annotations in this example:
Filename: src/main.rs
fn main() {
    let tup: (i32, f64, u8) = (500, 6.4, 1);
}
The variable `tup` binds to the entire tuple, because a tuple is considered a single compound element. To get the individual values out of a tuple, we can use pattern matching to destructure a tuple value, like this:
Filename: src/main.rs
fn main() {
    let tup = (500, 6.4, 1);

    let (x, y, z) = tup;

    println!("The value of y is: {}", y);
}
This program first creates a tuple and binds it to the variable `tup`. It then uses a pattern with `let` to take `tup` and turn it into three separate variables, `x`, `y`, and `z`. This is called destructuring, because it breaks the single tuple into three parts. Finally, the program prints the value of `y`, which is `6.4`.
In addition to destructuring through pattern matching, we can access a tuple element directly by using a period (`.`) followed by the index of the value we want to access. For example:
Filename: src/main.rs
fn main() {
    let x: (i32, f64, u8) = (500, 6.4, 1);

    let five_hundred = x.0;

    let six_point_four = x.1;

    let one = x.2;
}
This program creates a tuple, `x`, and then makes a new variable for each element by using its index. As with most programming languages, the first index in a tuple is 0.
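A common use of tuples is returning several values from one function. The sketch below combines both access styles, destructuring and index access; `split_seconds` is a hypothetical function invented for this example.

```rust
// Hypothetical helper: splits a duration in seconds into
// (hours, minutes, seconds).
fn split_seconds(total: u32) -> (u32, u32, u32) {
    (total / 3600, (total % 3600) / 60, total % 60)
}

fn main() {
    // Destructuring with a pattern.
    let (hours, minutes, seconds) = split_seconds(3725);
    println!("{}h {}m {}s", hours, minutes, seconds);

    // Direct access by index on the same kind of value.
    let parts = split_seconds(3725);
    assert_eq!(parts.0, hours);
    assert_eq!(parts.2, seconds);
}
```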
Another way to have a collection of multiple values is with an array. Unlike a tuple, every element of an array must have the same type. Arrays in Rust are different from arrays in some other languages because arrays in Rust have a fixed length, like tuples.
In Rust, the values going into an array are written as a comma-separated list inside square brackets:
Filename: src/main.rs
fn main() {
    let a = [1, 2, 3, 4, 5];
}
Arrays are useful when you want your data allocated on the stack rather than the heap (we will discuss the stack and the heap more in Chapter 4), or when you want to ensure you always have a fixed number of elements. An array isn’t as flexible as the vector type, though. A vector is a similar collection type provided by the standard library that is allowed to grow or shrink in size. If you’re unsure whether to use an array or a vector, you should probably use a vector. Chapter 8 discusses vectors in more detail.
An example of when you might want to use an array rather than a vector is in a program that needs to know the names of the months of the year. It’s very unlikely that such a program will need to add or remove months, so you can use an array because you know it will always contain 12 items:
fn main() {
    let months = ["January", "February", "March", "April", "May", "June", "July",
                  "August", "September", "October", "November", "December"];
}
Arrays have an interesting type; it looks like this: `[type; number]`. For example:
fn main() {
    let a: [i32; 5] = [1, 2, 3, 4, 5];
}
First, there are square brackets; they look like the syntax for creating an array. Inside, there are two pieces of information, separated by a semicolon. The first is the type of each element of the array. Since all elements have the same type, we only need to list it once. After the semicolon, there’s a number that indicates the length of the array. Since an array has a fixed size, this number is always the same: even if the array’s elements are modified, the array cannot grow or shrink.
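Because the length is part of the type, it can be checked with `len()` and it determines the array's size in memory. The sketch below also uses the related `[value; number]` expression form, which fills an array with copies of one value. The helper name `zeros` is hypothetical.

```rust
// Hypothetical helper: an array of five zeros built with [value; number].
fn zeros() -> [i32; 5] {
    [0; 5]
}

fn main() {
    let a: [i32; 5] = [1, 2, 3, 4, 5];
    assert_eq!(a.len(), 5);

    // Elements can change in a mutable array, but the length cannot.
    let mut b = zeros();
    b[0] = 10;
    assert_eq!(b.len(), 5);

    // Five i32 values of four bytes each.
    assert_eq!(std::mem::size_of::<[i32; 5]>(), 20);
    println!("{:?} {:?}", a, b);
}
```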
An array is a single chunk of memory allocated on the stack. You can access elements of an array using indexing, like this:
Filename: src/main.rs
fn main() {
    let a = [1, 2, 3, 4, 5];

    let first = a[0];
    let second = a[1];
}
In this example, the variable named `first` will get the value `1`, because that is the value at index `[0]` in the array. The variable named `second` will get the value `2` from index `[1]` in the array.
What happens if you try to access an element of an array that is past the end of the array? Say you change the example to the following code, which will compile but exit with an error when it runs:
Filename: src/main.rs
fn main() {
    let a = [1, 2, 3, 4, 5];
    let index = 10;

    let element = a[index];

    println!("The value of element is: {}", element);
}
Running this code using `cargo run` produces the following result:
$ cargo run
Compiling arrays v0.1.0 (file:///projects/arrays)
Finished dev [unoptimized + debuginfo] target(s) in 0.31 secs
Running `target/debug/arrays`
thread '<main>' panicked at 'index out of bounds: the len is 5 but the index is
10', src/main.rs:6
note: Run with `RUST_BACKTRACE=1` for a backtrace.
The compilation didn’t produce any errors, but the program resulted in a *runtime* error and didn’t exit successfully. When you attempt to access an element using indexing, Rust will check that the index you’ve specified is less than the array length. If the index is greater than or equal to the length, Rust will panic.
This is the first example of Rust’s safety principles in action. In many low-level languages, this kind of check is not done, and when you provide an incorrect index, invalid memory can be accessed. Rust protects you against this kind of error by immediately exiting instead of allowing the memory access and continuing. Chapter 9 discusses more of Rust’s error handling.
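When an index may legitimately be out of range, the panic can be avoided with the `get` method, which returns an `Option` instead of indexing directly. This is a minimal sketch of that alternative, not something the example above requires; `element_at` is a hypothetical helper name.

```rust
// Hypothetical helper: a bounds-checked lookup that never panics.
fn element_at(values: &[i32], index: usize) -> Option<i32> {
    // `get` yields Some(&value) for a valid index and None otherwise.
    values.get(index).copied()
}

fn main() {
    let a = [1, 2, 3, 4, 5];

    match element_at(&a, 10) {
        Some(element) => println!("The value of element is: {}", element),
        None => println!("Index 10 is out of bounds for length {}", a.len()),
    }

    // The last valid index is len - 1; an index equal to len is rejected.
    assert_eq!(element_at(&a, 4), Some(5));
    assert_eq!(element_at(&a, 5), None);
}
```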