DataFrame in polars::frame - Rust
pub struct DataFrame { /* private fields */ }
A contiguous growable collection of Series that have the same length.
§Use declarations
All the common tools can be found in crate::prelude (or in polars::prelude).
use polars_core::prelude::*; // if the crate polars-core is used directly
// use polars::prelude::*; if the crate polars is used
§Initialization
§Default
A DataFrame can be initialized empty:
let df = DataFrame::default();
assert!(df.is_empty());
§Wrapping a Vec<Series>
A DataFrame is built upon a Vec<Series> where the Series have the same length.
let s1 = Column::new("Fruit".into(), ["Apple", "Apple", "Pear"]);
let s2 = Column::new("Color".into(), ["Red", "Yellow", "Green"]);
let df: PolarsResult<DataFrame> = DataFrame::new(vec![s1, s2]);
§Using a macro
The df! macro is a convenient method:
let df: PolarsResult<DataFrame> = df!("Fruit" => ["Apple", "Apple", "Pear"],
"Color" => ["Red", "Yellow", "Green"]);
§Using a CSV file
See polars_io::csv::CsvReader.
§Indexing
§By a number
Index<usize> is implemented for the DataFrame.
let df = df!("Fruit" => ["Apple", "Apple", "Pear"],
"Color" => ["Red", "Yellow", "Green"])?;
assert_eq!(df[0], Column::new("Fruit".into(), &["Apple", "Apple", "Pear"]));
assert_eq!(df[1], Column::new("Color".into(), &["Red", "Yellow", "Green"]));
§By a Series name
let df = df!("Fruit" => ["Apple", "Apple", "Pear"],
"Color" => ["Red", "Yellow", "Green"])?;
assert_eq!(df["Fruit"], Column::new("Fruit".into(), &["Apple", "Apple", "Pear"]));
assert_eq!(df["Color"], Column::new("Color".into(), &["Red", "Yellow", "Green"]));
Create a 2D ndarray::Array from this DataFrame. This requires all columns in the DataFrame to be non-null and numeric. They will be cast to the same data type (if they aren’t already).
For floating point data we implicitly convert None to NaN without failure.
use polars_core::prelude::*;
let a = UInt32Chunked::new("a".into(), &[1, 2, 3]).into_column();
let b = Float64Chunked::new("b".into(), &[10., 8., 6.]).into_column();
let df = DataFrame::new(vec![a, b]).unwrap();
let ndarray = df.to_ndarray::<Float64Type>(IndexOrder::Fortran).unwrap();
println!("{:?}", ndarray);
Outputs:
[[1.0, 10.0],
[2.0, 8.0],
[3.0, 6.0]], shape=[3, 2], strides=[1, 3], layout=Ff (0xa), const ndim=2
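The strides=[1, 3] in the output above reflect column-major (Fortran) order: stepping down one row advances one element in memory, while stepping across one column advances by the number of rows. The following is a minimal std-only sketch of that index arithmetic. The helpers `fortran_offset` and `to_fortran_buffer` are hypothetical names introduced for illustration and are not part of polars or ndarray.

```rust
/// Offset of element (row, col) in a column-major buffer with `n_rows` rows.
/// Hypothetical helper for illustration; not a polars or ndarray function.
fn fortran_offset(row: usize, col: usize, n_rows: usize) -> usize {
    // Row stride is 1, column stride is `n_rows`.
    row + col * n_rows
}

/// Flatten equal-length columns into one column-major buffer:
/// all values of the first column, then all values of the second, and so on.
fn to_fortran_buffer(columns: &[Vec<f64>]) -> Vec<f64> {
    columns.iter().flat_map(|c| c.iter().copied()).collect()
}
```

With the two columns from the example, element (1, 1) sits at offset 1 + 1 * 3 = 4 in the flattened buffer.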
Explode DataFrame to long format by exploding a column with Lists.
§Example
let s0 = Series::new("a".into(), &[1i64, 2, 3]);
let s1 = Series::new("b".into(), &[1i64, 1, 1]);
let s2 = Series::new("c".into(), &[2i64, 2, 2]);
let list = Column::new("foo".into(), &[s0, s1, s2]);
let s0 = Column::new("B".into(), [1, 2, 3]);
let s1 = Column::new("C".into(), [1, 1, 1]);
let df = DataFrame::new(vec![list, s0, s1])?;
let exploded = df.explode(["foo"])?;
println!("{:?}", df);
println!("{:?}", exploded);
Outputs:
+-------------+-----+-----+
| foo | B | C |
| --- | --- | --- |
| list [i64] | i32 | i32 |
+=============+=====+=====+
| "[1, 2, 3]" | 1 | 1 |
+-------------+-----+-----+
| "[1, 1, 1]" | 2 | 1 |
+-------------+-----+-----+
| "[2, 2, 2]" | 3 | 1 |
+-------------+-----+-----+
+-----+-----+-----+
| foo | B | C |
| --- | --- | --- |
| i64 | i32 | i32 |
+=====+=====+=====+
| 1 | 1 | 1 |
+-----+-----+-----+
| 2 | 1 | 1 |
+-----+-----+-----+
| 3 | 1 | 1 |
+-----+-----+-----+
| 1 | 2 | 1 |
+-----+-----+-----+
| 1 | 2 | 1 |
+-----+-----+-----+
| 1 | 2 | 1 |
+-----+-----+-----+
| 2 | 3 | 1 |
+-----+-----+-----+
| 2 | 3 | 1 |
+-----+-----+-----+
| 2 | 3 | 1 |
+-----+-----+-----+
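The row duplication seen in the second table can be modelled without polars. The sketch below is a toy version over plain vectors, assuming one list column and one ordinary column; the function name `explode` here is a free function written for illustration, not the polars method. Empty lists are simply skipped in this toy, which may differ from how polars treats them.

```rust
/// Explode a list column: each inner element becomes its own row, and the
/// value of the other column is repeated once per inner element.
/// Toy model over plain vectors; not the polars implementation.
fn explode(lists: &[Vec<i64>], other: &[i32]) -> (Vec<i64>, Vec<i32>) {
    let mut exploded = Vec::new();
    let mut repeated = Vec::new();
    for (list, &value) in lists.iter().zip(other) {
        for &item in list {
            exploded.push(item);
            repeated.push(value);
        }
    }
    (exploded, repeated)
}
```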
Group DataFrame using a Series column.
§Example
use polars_core::prelude::*;
fn group_by_sum(df: &DataFrame) -> PolarsResult<DataFrame> {
df.group_by(["column_name"])?
.select(["agg_column_name"])
.sum()
}
Group DataFrame using a Series column. The groups are ordered by their smallest row index.
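To make "ordered by their smallest row index" concrete, here is a small std-only sketch of a grouped sum whose output order follows the first appearance of each key. The function `group_by_sum_stable` is a hypothetical helper for illustration and says nothing about how polars implements grouping internally.

```rust
use std::collections::HashMap;

/// Sum `values` per key, returning groups ordered by the first row index at
/// which each key appears. Toy model; not the polars implementation.
fn group_by_sum_stable(keys: &[&str], values: &[i64]) -> Vec<(String, i64)> {
    // Maps each key to its position in `groups`.
    let mut slot_of: HashMap<&str, usize> = HashMap::new();
    let mut groups: Vec<(String, i64)> = Vec::new();
    for (&key, &value) in keys.iter().zip(values) {
        let slot = *slot_of.entry(key).or_insert_with(|| {
            // First time this key is seen: open a new group at the end.
            groups.push((key.to_string(), 0));
            groups.len() - 1
        });
        groups[slot].1 += value;
    }
    groups
}
```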
Add columns horizontally.
§Safety
The caller must ensure:
- the length of every column is equal to the height of this DataFrame
- the column names are unique
Note: If self is empty, self.height will always be overridden by the height of the first column in columns.
Note that on a debug build this will panic on duplicates / height mismatch.
Add multiple Column to a DataFrame. Errors if the resulting DataFrame columns have duplicate names or unequal heights.
Note: If self is empty, self.height will always be overridden by the height of the first column in columns.
§Example
fn stack(df: &mut DataFrame, columns: &[Column]) {
df.hstack_mut(columns);
}
Get a row from a DataFrame. Use of this is discouraged as it will likely be slow.
Amortize allocations by reusing a row. The caller is responsible for making sure that the row has at least the capacity for the number of columns in the DataFrame.
Amortize allocations by reusing a row. The caller is responsible for making sure that the row has at least the capacity for the number of columns in the DataFrame.
§Safety
Does not do any bounds checking.
Create a new DataFrame from rows.
This should only be used when you have row-wise data, as this is a lot slower than creating the Series in a columnar fashion.
Create a new DataFrame from an iterator over rows.
This should only be used when you have row-wise data, as this is a lot slower than creating the Series in a columnar fashion.
Transpose a DataFrame. This is a very expensive operation.
Ensure all columns have equal height and that names are unique.
An Ok() result indicates columns is a valid state for a DataFrame.
Returns an estimation of the total (heap) allocated size of the DataFrame in bytes.
§Implementation
This estimation is the sum of the size of its buffers, validity, including nested arrays. Multiple arrays may share buffers and bitmaps. Therefore, the size of 2 arrays is not the sum of the sizes computed from this function. In particular, StructArray’s size is an upper bound.
When an array is sliced, its allocated size remains constant because the buffer is unchanged. However, this function will yield a smaller number. This is because this function returns the visible size of the buffer, not its total capacity.
FFI buffers are included in this estimation.
Create a DataFrame from a Vector of Series.
Errors if the column names are not unique, or if heights are not all equal.
§Example
let s0 = Column::new("days".into(), [0, 1, 2].as_ref());
let s1 = Column::new("temp".into(), [22.1, 19.9, 7.].as_ref());
let df = DataFrame::new(vec![s0, s1])?;
Converts a sequence of columns into a DataFrame, broadcasting length-1 columns to match the other columns.
Converts a sequence of columns into a DataFrame, broadcasting length-1 columns to broadcast_len.
Converts a sequence of columns into a DataFrame, broadcasting length-1 columns to match the other columns.
§Safety
Does not check that the column names are unique (which they must be).
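Broadcasting length-1 columns can be pictured with a small std-only sketch. It assumes the target length is taken from the first column whose length is not 1 and that any other mismatch is an error; `broadcast_columns` is a hypothetical helper, and the exact rules and error type used by polars may differ.

```rust
/// Broadcast every length-1 column to the common length of the other columns.
/// Returns an error message if a column is neither length 1 nor the target length.
/// Toy model over Vec<i64>; not the polars implementation.
fn broadcast_columns(columns: Vec<Vec<i64>>) -> Result<Vec<Vec<i64>>, String> {
    // Target length: the first length that is not 1. If every column has
    // length 1 (or there are no columns), nothing needs to change.
    let target = columns.iter().map(Vec::len).find(|&len| len != 1).unwrap_or(1);
    columns
        .into_iter()
        .map(|col| {
            if col.len() == target {
                Ok(col)
            } else if col.len() == 1 {
                // Repeat the single value `target` times.
                Ok(vec![col[0]; target])
            } else {
                Err(format!("column length {} does not match {}", col.len(), target))
            }
        })
        .collect()
}
```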
Creates an empty DataFrame usable in a compile time context (such as static initializers).
§Example
use polars_core::prelude::DataFrame;
static EMPTY: DataFrame = DataFrame::empty();
Creates an empty DataFrame with a specific height.
Create an empty DataFrame with empty columns as per the schema.
Create an empty DataFrame with empty columns as per the schema.
Create a new DataFrame with the given schema, only containing nulls.
Removes the last Series from the DataFrame and returns it, or None if it is empty.
§Example
let s1 = Column::new("Ocean".into(), ["Atlantic", "Indian"]);
let s2 = Column::new("Area (km²)".into(), [106_460_000, 70_560_000]);
let mut df = DataFrame::new(vec![s1.clone(), s2.clone()])?;
assert_eq!(df.pop(), Some(s2));
assert_eq!(df.pop(), Some(s1));
assert_eq!(df.pop(), None);
assert!(df.is_empty());
Add a new column at index 0 that counts the rows.
§Example
let df1: DataFrame = df!("Name" => ["James", "Mary", "John", "Patricia"])?;
assert_eq!(df1.shape(), (4, 1));
let df2: DataFrame = df1.with_row_index("Id".into(), None)?;
assert_eq!(df2.shape(), (4, 2));
println!("{}", df2);
Output:
shape: (4, 2)
+-----+----------+
| Id | Name |
| --- | --- |
| u32 | str |
+=====+==========+
| 0 | James |
+-----+----------+
| 1 | Mary |
+-----+----------+
| 2 | John |
+-----+----------+
| 3 | Patricia |
+-----+----------+
Add a row index column in place.
§Safety
The caller should ensure the DataFrame does not already contain a column with the given name.
§Panics
Panics if the resulting column would reach or overflow IdxSize::MAX.
Create a new DataFrame but does not check the length or duplicate occurrence of the Series.
Calculates the height from the first column or 0 if no columns are given.
§Safety
It is the caller's responsibility to uphold the contract of all Series having an equal length and a unique name; if not, this may panic down the line.
Create a new DataFrame but does not check the length or duplicate occurrence of the Series.
It is advised to use DataFrame::new instead of this method.
§Safety
It is the caller's responsibility to uphold the contract of all Series having an equal length and a unique name; if not, this may panic down the line.
This will not panic even in debug mode - there are some (rare) use cases where a DataFrame is temporarily constructed containing duplicates for dispatching to functions. A DataFrame constructed with this method is generally highly unsafe and should not be long-lived.
Shrink the capacity of this DataFrame to fit its length.
Aggregate all the chunks in the DataFrame to a single chunk.
Aggregate all the chunks in the DataFrame to a single chunk in parallel. This may lead to more peak memory consumption.
Rechunks all columns to only have a single chunk.
Rechunks all columns to only have a single chunk and turns it into a RecordBatchT.
Returns true if the chunks of the columns do not align and re-chunking should be done.
Ensure all the chunks in the DataFrame are aligned.
Get the DataFrame schema.
§Example
let df: DataFrame = df!("Thing" => ["Observable universe", "Human stupidity"],
"Diameter (m)" => [8.8e26, f64::INFINITY])?;
let f1: Field = Field::new("Thing".into(), DataType::String);
let f2: Field = Field::new("Diameter (m)".into(), DataType::Float64);
let sc: Schema = Schema::from_iter(vec![f1, f2]);
assert_eq!(&**df.schema(), &sc);
Get a reference to the DataFrame columns.
§Example
let df: DataFrame = df!("Name" => ["Adenine", "Cytosine", "Guanine", "Thymine"],
"Symbol" => ["A", "C", "G", "T"])?;
let columns: &[Column] = df.get_columns();
assert_eq!(columns[0].name(), "Name");
assert_eq!(columns[1].name(), "Symbol");
Get mutable access to the underlying columns.
§Safety
The caller must ensure the length of all Series remains equal to height or DataFrame::set_height is called afterwards with the appropriate height. The caller must ensure that the cached schema is cleared if it modifies the schema by calling DataFrame::clear_schema.
Remove all the columns in the DataFrame but keep the height.
Extend the columns without checking for name collisions or height.
§Safety
The caller needs to ensure that:
- Column names are unique within the resulting DataFrame.
- The length of each appended column matches the height of the DataFrame. For DataFrames with no columns (ZCDFs), it is important that the height is set afterwards with DataFrame::set_height.
Take ownership of the underlying columns vec.
Iterator over the columns as Series.
§Example
let s1 = Column::new("Name".into(), ["Pythagoras' theorem", "Shannon entropy"]);
let s2 = Column::new("Formula".into(), ["a²+b²=c²", "H=-Σ[P(x)log|P(x)|]"]);
let df: DataFrame = DataFrame::new(vec![s1.clone(), s2.clone()])?;
let mut iterator = df.iter();
assert_eq!(iterator.next(), Some(s1.as_materialized_series()));
assert_eq!(iterator.next(), Some(s2.as_materialized_series()));
assert_eq!(iterator.next(), None);
§Example
let df: DataFrame = df!("Language" => ["Rust", "Python"],
"Designer" => ["Graydon Hoare", "Guido van Rossum"])?;
assert_eq!(df.get_column_names(), &["Language", "Designer"]);
Set the column names.
§Example
let mut df: DataFrame = df!("Mathematical set" => ["ℕ", "ℤ", "𝔻", "ℚ", "ℝ", "ℂ"])?;
df.set_column_names(["Set"])?;
assert_eq!(df.get_column_names(), &["Set"]);
Get the data types of the columns in the DataFrame.
§Example
let venus_air: DataFrame = df!("Element" => ["Carbon dioxide", "Nitrogen"],
"Fraction" => [0.965, 0.035])?;
assert_eq!(venus_air.dtypes(), &[DataType::String, DataType::Float64]);
The number of chunks for the first column.
The highest number of chunks for any column.
Get a reference to the schema fields of the DataFrame.
§Example
let earth: DataFrame = df!("Surface type" => ["Water", "Land"],
"Fraction" => [0.708, 0.292])?;
let f1: Field = Field::new("Surface type".into(), DataType::String);
let f2: Field = Field::new("Fraction".into(), DataType::Float64);
assert_eq!(earth.fields(), &[f1, f2]);
Get (height, width) of the DataFrame.
§Example
let df0: DataFrame = DataFrame::default();
let df1: DataFrame = df!("1" => [1, 2, 3, 4, 5])?;
let df2: DataFrame = df!("1" => [1, 2, 3, 4, 5],
"2" => [1, 2, 3, 4, 5])?;
assert_eq!(df0.shape(), (0, 0));
assert_eq!(df1.shape(), (5, 1));
assert_eq!(df2.shape(), (5, 2));
Get the width of the DataFrame which is the number of columns.
§Example
let df0: DataFrame = DataFrame::default();
let df1: DataFrame = df!("Series 1" => [0; 0])?;
let df2: DataFrame = df!("Series 1" => [0; 0],
"Series 2" => [0; 0])?;
assert_eq!(df0.width(), 0);
assert_eq!(df1.width(), 1);
assert_eq!(df2.width(), 2);
Get the height of the DataFrame which is the number of rows.
§Example
let df0: DataFrame = DataFrame::default();
let df1: DataFrame = df!("Currency" => ["€", "$"])?;
let df2: DataFrame = df!("Currency" => ["€", "$", "¥", "£", "₿"])?;
assert_eq!(df0.height(), 0);
assert_eq!(df1.height(), 2);
assert_eq!(df2.height(), 5);
Returns the size as number of rows * number of columns.
Returns true if the DataFrame contains no rows.
§Example
let df1: DataFrame = DataFrame::default();
assert!(df1.is_empty());
let df2: DataFrame = df!("First name" => ["Forever"],
"Last name" => ["Alone"])?;
assert!(!df2.is_empty());
Set the height (i.e. number of rows) of this DataFrame.
§Safety
This needs to be equal to the length of all the columns.
Add multiple Series to a DataFrame. The added Series are required to have the same length.
§Example
let df1: DataFrame = df!("Element" => ["Copper", "Silver", "Gold"])?;
let s1 = Column::new("Proton".into(), [29, 47, 79]);
let s2 = Column::new("Electron".into(), [29, 47, 79]);
let df2: DataFrame = df1.hstack(&[s1, s2])?;
assert_eq!(df2.shape(), (3, 3));
println!("{}", df2);
Output:
shape: (3, 3)
+---------+--------+----------+
| Element | Proton | Electron |
| --- | --- | --- |
| str | i32 | i32 |
+=========+========+==========+
| Copper | 29 | 29 |
+---------+--------+----------+
| Silver | 47 | 47 |
+---------+--------+----------+
| Gold | 79 | 79 |
+---------+--------+----------+
Concatenate a DataFrame to this DataFrame and return the result as a newly allocated DataFrame.
If many vstack operations are done, it is recommended to call DataFrame::align_chunks_par.
§Example
let df1: DataFrame = df!("Element" => ["Copper", "Silver", "Gold"],
"Melting Point (K)" => [1357.77, 1234.93, 1337.33])?;
let df2: DataFrame = df!("Element" => ["Platinum", "Palladium"],
"Melting Point (K)" => [2041.4, 1828.05])?;
let df3: DataFrame = df1.vstack(&df2)?;
assert_eq!(df3.shape(), (5, 2));
println!("{}", df3);
Output:
shape: (5, 2)
+-----------+-------------------+
| Element | Melting Point (K) |
| --- | --- |
| str | f64 |
+===========+===================+
| Copper | 1357.77 |
+-----------+-------------------+
| Silver | 1234.93 |
+-----------+-------------------+
| Gold | 1337.33 |
+-----------+-------------------+
| Platinum | 2041.4 |
+-----------+-------------------+
| Palladium | 1828.05 |
+-----------+-------------------+
Concatenate a DataFrame to this DataFrame.
If many vstack operations are done, it is recommended to call DataFrame::align_chunks_par.
§Example
let mut df1: DataFrame = df!("Element" => ["Copper", "Silver", "Gold"],
"Melting Point (K)" => [1357.77, 1234.93, 1337.33])?;
let df2: DataFrame = df!("Element" => ["Platinum", "Palladium"],
"Melting Point (K)" => [2041.4, 1828.05])?;
df1.vstack_mut(&df2)?;
assert_eq!(df1.shape(), (5, 2));
println!("{}", df1);
Output:
shape: (5, 2)
+-----------+-------------------+
| Element | Melting Point (K) |
| --- | --- |
| str | f64 |
+===========+===================+
| Copper | 1357.77 |
+-----------+-------------------+
| Silver | 1234.93 |
+-----------+-------------------+
| Gold | 1337.33 |
+-----------+-------------------+
| Platinum | 2041.4 |
+-----------+-------------------+
| Palladium | 1828.05 |
+-----------+-------------------+
Extend the memory backed by this DataFrame with the values from other.
Unlike vstack, which adds the chunks from other to the chunks of this DataFrame, extend appends the data from other to the underlying memory locations and thus may cause a reallocation.
If this does not cause a reallocation, the resulting data structure will not have any extra chunks and thus will yield faster queries.
Prefer extend over vstack when you want to do a query after a single append. For instance, during online operations where you add n rows and rerun a query.
Prefer vstack over extend when you want to append many times before doing a query. For instance, when you read in multiple files and want to store them in a single DataFrame. In the latter case, finish the sequence of append operations with a rechunk.
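The trade-off between the two append styles can be sketched with a toy column that stores its data as a list of chunks. `ChunkedColumn` and its methods are hypothetical stand-ins written for illustration; they only mimic the chunk bookkeeping described above and are not the polars data structures.

```rust
/// A column stored as a list of chunks, loosely mirroring a chunked array.
/// Toy model; not the polars data structure.
#[derive(Debug, Default)]
struct ChunkedColumn {
    chunks: Vec<Vec<i64>>,
}

impl ChunkedColumn {
    /// vstack-style append: cheap, but adds one more chunk.
    fn vstack(&mut self, other: &[i64]) {
        self.chunks.push(other.to_vec());
    }

    /// extend-style append: copies into the last chunk, which may reallocate,
    /// but keeps the number of chunks unchanged.
    fn extend(&mut self, other: &[i64]) {
        match self.chunks.last_mut() {
            Some(last) => last.extend_from_slice(other),
            None => self.chunks.push(other.to_vec()),
        }
    }

    /// Merge all chunks into a single chunk, as a rechunk would.
    fn rechunk(&mut self) {
        let merged: Vec<i64> = self.chunks.drain(..).flatten().collect();
        self.chunks.push(merged);
    }
}
```

In this model, three vstack calls leave three chunks until rechunk merges them, while repeated extend calls keep a single chunk throughout.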
Remove a column by name and return the column removed.
§Example
let mut df: DataFrame = df!("Animal" => ["Tiger", "Lion", "Great auk"],
"IUCN" => ["Endangered", "Vulnerable", "Extinct"])?;
let s1: PolarsResult<Column> = df.drop_in_place("Average weight");
assert!(s1.is_err());
let s2: Column = df.drop_in_place("Animal")?;
assert_eq!(s2, Column::new("Animal".into(), &["Tiger", "Lion", "Great auk"]));
Return a new DataFrame where all null values are dropped.
§Example
let df1: DataFrame = df!("Country" => ["Malta", "Liechtenstein", "North Korea"],
"Tax revenue (% GDP)" => [Some(32.7), None, None])?;
assert_eq!(df1.shape(), (3, 2));
let df2: DataFrame = df1.drop_nulls::<String>(None)?;
assert_eq!(df2.shape(), (1, 2));
println!("{}", df2);
Output:
shape: (1, 2)
+---------+---------------------+
| Country | Tax revenue (% GDP) |
| --- | --- |
| str | f64 |
+=========+=====================+
| Malta | 32.7 |
+---------+---------------------+
Drop a column by name. This is a pure method and will return a new DataFrame instead of modifying the current one in place.
§Example
let df1: DataFrame = df!("Ray type" => ["α", "β", "X", "γ"])?;
let df2: DataFrame = df1.drop("Ray type")?;
assert!(df2.is_empty());
Drop columns that are in names.
Drop columns that are in names without allocating a HashSet.
Insert a new column at a given index.
Add a new column to this DataFrame or replace an existing one.
Adds a column to the DataFrame without doing any checks on length or duplicates.
§Safety
The caller must ensure self.width() == 0 || column.len() == self.height().
Add a new column to this DataFrame or replace an existing one. Uses an existing schema to amortize lookups. If the schema is incorrect, we will fall back to linear search.
Note: Schema can be either the input or the output schema.
Get a row in the DataFrame. Beware this is slow.
§Example
fn example(df: &mut DataFrame, idx: usize) -> Option<Vec<AnyValue>> {
df.get(idx)
}
Select a Series by index.
§Example
let df: DataFrame = df!("Star" => ["Sun", "Betelgeuse", "Sirius A", "Sirius B"],
"Absolute magnitude" => [4.83, -5.85, 1.42, 11.18])?;
let s1: Option<&Column> = df.select_at_idx(0);
let s2 = Column::new("Star".into(), ["Sun", "Betelgeuse", "Sirius A", "Sirius B"]);
assert_eq!(s1, Some(&s2));
Select column(s) from this DataFrame by range and return a new DataFrame.
§Examples
let df = df! {
"0" => [0, 0, 0],
"1" => [1, 1, 1],
"2" => [2, 2, 2]
}?;
assert!(df.select(["0", "1"])?.equals(&df.select_by_range(0..=1)?));
assert!(df.equals(&df.select_by_range(..)?));
Get column index of a Series by name.
§Example
let df: DataFrame = df!("Name" => ["Player 1", "Player 2", "Player 3"],
"Health" => [100, 200, 500],
"Mana" => [250, 100, 0],
"Strength" => [30, 150, 300])?;
assert_eq!(df.get_column_index("Name"), Some(0));
assert_eq!(df.get_column_index("Health"), Some(1));
assert_eq!(df.get_column_index("Mana"), Some(2));
assert_eq!(df.get_column_index("Strength"), Some(3));
assert_eq!(df.get_column_index("Haste"), None);
Get column index of a Series by name.
Select a single column by name.
§Example
let s1 = Column::new("Password".into(), ["123456", "[]B$u$g$s$B#u#n#n#y[]{}"]);
let s2 = Column::new("Robustness".into(), ["Weak", "Strong"]);
let df: DataFrame = DataFrame::new(vec![s1.clone(), s2])?;
assert_eq!(df.column("Password")?, &s1);
Select multiple columns by name.
§Example
let df: DataFrame = df!("Latin name" => ["Oncorhynchus kisutch", "Salmo salar"],
"Max weight (kg)" => [16.0, 35.89])?;
let sv: Vec<&Column> = df.columns(["Latin name", "Max weight (kg)"])?;
assert_eq!(&df[0], sv[0]);
assert_eq!(&df[1], sv[1]);
Select column(s) from this DataFrame and return a new DataFrame.
§Examples
fn example(df: &DataFrame) -> PolarsResult<DataFrame> {
df.select(["foo", "bar"])
}
Select with a known schema. The schema names must match the column names of this DataFrame.
Select with a known schema without checking for duplicates in selection. The schema names must match the column names of this DataFrame.
- The schema names must match the column names of this DataFrame.
Select column(s) from this DataFrame and return them into a Vec.
§Example
let df: DataFrame = df!("Name" => ["Methane", "Ethane", "Propane"],
"Carbon" => [1, 2, 3],
"Hydrogen" => [4, 6, 8])?;
let sv: Vec<Column> = df.select_columns(["Carbon", "Hydrogen"])?;
assert_eq!(df["Carbon"], sv[0]);
assert_eq!(df["Hydrogen"], sv[1]);
Take the DataFrame rows by a boolean mask.
§Example
fn example(df: &DataFrame) -> PolarsResult<DataFrame> {
let mask = df.column("sepal_width")?.is_not_null();
df.filter(&mask)
}
Same as filter but does not parallelize.
Take DataFrame rows by index values.
§Example
fn example(df: &DataFrame) -> PolarsResult<DataFrame> {
let idx = IdxCa::new("idx".into(), [0, 1, 9]);
df.take(&idx)
}
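Index-based row selection, and why out-of-bounds indices matter, can be shown with a bounds-checked gather over a plain slice. This is a sketch only: the free function `take` below is a hypothetical helper that returns None on a bad index, whereas the polars method reports failure through its own result type.

```rust
/// Gather values by index, returning None if any index is out of bounds.
/// Toy model over a slice; not the polars implementation.
fn take(column: &[i64], indices: &[usize]) -> Option<Vec<i64>> {
    // `get` performs the bounds check; collecting into Option stops at the
    // first missing element.
    indices.iter().map(|&i| column.get(i).copied()).collect()
}
```

An unchecked variant would skip the `get` test, which is why such variants require the caller to guarantee in-bounds indices.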
§Safety
The indices must be in-bounds.
Rename a column in the DataFrame.
§Example
fn example(df: &mut DataFrame) -> PolarsResult<&mut DataFrame> {
let original_name = "foo";
let new_name = "bar";
df.rename(original_name, new_name.into())
}
Create a DataFrame that has fields for all the known runtime metadata for each column.
This DataFrame does not necessarily have a specified schema and may be changed at any point. It is primarily used for debugging.
Return a sorted clone of this DataFrame.
In many cases the output chunks will be continuous in memory but this is not guaranteed.
§Example
Sort by a single column with default options:
fn sort_by_sepal_width(df: &DataFrame) -> PolarsResult<DataFrame> {
df.sort(["sepal_width"], Default::default())
}
Sort by a single column with specific order:
fn sort_with_specific_order(df: &DataFrame, descending: bool) -> PolarsResult<DataFrame> {
df.sort(
["sepal_width"],
SortMultipleOptions::new()
.with_order_descending(descending)
)
}
Sort by multiple columns with specifying order for each column:
fn sort_by_multiple_columns_with_specific_order(df: &DataFrame) -> PolarsResult<DataFrame> {
df.sort(
["sepal_width", "sepal_length"],
SortMultipleOptions::new()
.with_order_descending_multi([false, true])
)
}
See SortMultipleOptions for more options.
Also see DataFrame::sort_in_place.
Replace a column with a Series.
§Example
let mut df: DataFrame = df!("Country" => ["United States", "China"],
"Area (km²)" => [9_833_520, 9_596_961])?;
let s: Series = Series::new("Country".into(), ["USA", "PRC"]);
assert!(df.replace("Nation", s.clone()).is_err());
assert!(df.replace("Country", s).is_ok());
Replace or update a column. The difference between this method and DataFrame::with_column is that now the value of column: &str determines the name of the column and not the name of the Series passed to this method.
Replace column at index idx with a Series.
§Example
use polars_core::prelude::*;
let s0 = Series::new("foo".into(), ["ham", "spam", "egg"]);
let s1 = Series::new("ascii".into(), [70, 79, 79]);
let mut df = DataFrame::new(vec![s0, s1])?;
// Add 32 to get lowercase ascii values
df.replace_column(1, df.select_at_idx(1).unwrap() + 32);
Apply a closure to a column. This is the recommended way to do in place modification.
§Example
let s0 = Column::new("foo".into(), ["ham", "spam", "egg"]);
let s1 = Column::new("names".into(), ["Jean", "Claude", "van"]);
let mut df = DataFrame::new(vec![s0, s1])?;
fn str_to_len(str_val: &Column) -> Column {
str_val.str()
.unwrap()
.into_iter()
.map(|opt_name: Option<&str>| {
opt_name.map(|name: &str| name.len() as u32)
})
.collect::<UInt32Chunked>()
.into_column()
}
// Replace the names column by the length of the names.
df.apply("names", str_to_len);
Results in:
+--------+-------+
| foo | names |
| --- | --- |
| str | u32 |
+========+=======+
| "ham" | 4 |
+--------+-------+
| "spam" | 6 |
+--------+-------+
| "egg" | 3 |
+--------+-------+
Apply a closure to a column at index idx. This is the recommended way to do in place modification.
§Example
let s0 = Column::new("foo".into(), ["ham", "spam", "egg"]);
let s1 = Column::new("ascii".into(), [70, 79, 79]);
let mut df = DataFrame::new(vec![s0, s1])?;
// Add 32 to get lowercase ascii values
df.apply_at_idx(1, |s| s + 32);
Results in:
+--------+-------+
| foo | ascii |
| --- | --- |
| str | i32 |
+========+=======+
| "ham" | 102 |
+--------+-------+
| "spam" | 111 |
+--------+-------+
| "egg" | 111 |
+--------+-------+
Apply a closure that may fail to a column at index idx. This is the recommended way to do in place modification.
§Example
This is the idiomatic way to replace some values in a column of a DataFrame given a range of indexes.
let s0 = Column::new("foo".into(), ["ham", "spam", "egg", "bacon", "quack"]);
let s1 = Column::new("values".into(), [1, 2, 3, 4, 5]);
let mut df = DataFrame::new(vec![s0, s1])?;
let idx = vec![0, 1, 4];
df.try_apply("foo", |c| {
c.str()?
.scatter_with(idx, |opt_val| opt_val.map(|string| format!("{}-is-modified", string)))
});
Results in:
+---------------------+--------+
| foo | values |
| --- | --- |
| str | i32 |
+=====================+========+
| "ham-is-modified" | 1 |
+---------------------+--------+
| "spam-is-modified" | 2 |
+---------------------+--------+
| "egg" | 3 |
+---------------------+--------+
| "bacon" | 4 |
+---------------------+--------+
| "quack-is-modified" | 5 |
+---------------------+--------+
Apply a closure that may fail to a column. This is the recommended way to do in place modification.
§Example
This is the idiomatic way to replace some values in a column of a DataFrame given a boolean mask.
let s0 = Column::new("foo".into(), ["ham", "spam", "egg", "bacon", "quack"]);
let s1 = Column::new("values".into(), [1, 2, 3, 4, 5]);
let mut df = DataFrame::new(vec![s0, s1])?;
// create a mask
let values = df.column("values")?.as_materialized_series();
let mask = values.lt_eq(1)? | values.gt_eq(5_i32)?;
df.try_apply("foo", |c| {
c.str()?
.set(&mask, Some("not_within_bounds"))
});
Results in:
+---------------------+--------+
| foo | values |
| --- | --- |
| str | i32 |
+=====================+========+
| "not_within_bounds" | 1 |
+---------------------+--------+
| "spam" | 2 |
+---------------------+--------+
| "egg" | 3 |
+---------------------+--------+
| "bacon" | 4 |
+---------------------+--------+
| "not_within_bounds" | 5 |
+---------------------+--------+
Slice the DataFrame along the rows.
§Example
let df: DataFrame = df!("Fruit" => ["Apple", "Grape", "Grape", "Fig", "Fig"],
"Color" => ["Green", "Red", "White", "White", "Red"])?;
let sl: DataFrame = df.slice(2, 3);
assert_eq!(sl.shape(), (3, 2));
println!("{}", sl);
Output:
shape: (3, 2)
+-------+-------+
| Fruit | Color |
| --- | --- |
| str | str |
+=======+=======+
| Grape | White |
+-------+-------+
| Fig | White |
+-------+-------+
| Fig | Red |
+-------+-------+
Get the head of the DataFrame.
§Example
let countries: DataFrame =
df!("Rank by GDP (2021)" => [1, 2, 3, 4, 5],
"Continent" => ["North America", "Asia", "Asia", "Europe", "Europe"],
"Country" => ["United States", "China", "Japan", "Germany", "United Kingdom"],
"Capital" => ["Washington", "Beijing", "Tokyo", "Berlin", "London"])?;
assert_eq!(countries.shape(), (5, 4));
println!("{}", countries.head(Some(3)));
Output:
shape: (3, 4)
+--------------------+---------------+---------------+------------+
| Rank by GDP (2021) | Continent | Country | Capital |
| --- | --- | --- | --- |
| i32 | str | str | str |
+====================+===============+===============+============+
| 1 | North America | United States | Washington |
+--------------------+---------------+---------------+------------+
| 2 | Asia | China | Beijing |
+--------------------+---------------+---------------+------------+
| 3 | Asia | Japan | Tokyo |
+--------------------+---------------+---------------+------------+
Get the tail of the DataFrame.
§Example
let countries: DataFrame =
df!("Rank (2021)" => [105, 106, 107, 108, 109],
"Apple Price (€/kg)" => [0.75, 0.70, 0.70, 0.65, 0.52],
"Country" => ["Kosovo", "Moldova", "North Macedonia", "Syria", "Turkey"])?;
assert_eq!(countries.shape(), (5, 3));
println!("{}", countries.tail(Some(2)));
Output:
shape: (2, 3)
+-------------+--------------------+---------+
| Rank (2021) | Apple Price (€/kg) | Country |
| --- | --- | --- |
| i32 | f64 | str |
+=============+====================+=========+
| 108 | 0.65 | Syria |
+-------------+--------------------+---------+
| 109 | 0.52 | Turkey |
+-------------+--------------------+---------+
Iterator over the rows in this DataFrame as Arrow RecordBatches.
§Panics
Panics if the DataFrame that is passed is not rechunked.
This responsibility is left to the caller as we don’t want to take mutable references here, but we also don’t want to rechunk here, as this operation is costly and would benefit the caller as well.
Iterator over the rows in this DataFrame as Arrow RecordBatches as physical values.
§Panics
Panics if the DataFrame that is passed is not rechunked.
This responsibility is left to the caller as we don’t want to take mutable references here, but we also don’t want to rechunk here, as this operation is costly and would benefit the caller as well.
Get a DataFrame with all the columns in reversed order.
Shift the values by a given period and fill the parts that will be empty due to this operation with Nones.
See the method on Series for more info on the shift operation.
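A shift with None fill can be sketched over a single column of optional values. The sketch assumes that a positive period moves values towards the end and a negative period towards the start; the free function `shift` is a hypothetical helper for illustration, not the polars method.

```rust
/// Shift values by `periods`, filling the vacated slots with None.
/// Positive periods move values towards the end, negative towards the start.
/// Toy model over one column; not the polars implementation.
fn shift(values: &[Option<i64>], periods: i64) -> Vec<Option<i64>> {
    let len = values.len();
    // Shifting by more than the length leaves only Nones.
    let n = (periods.unsigned_abs() as usize).min(len);
    let mut out = Vec::with_capacity(len);
    if periods >= 0 {
        out.extend(std::iter::repeat(None).take(n));
        out.extend_from_slice(&values[..len - n]);
    } else {
        out.extend_from_slice(&values[n..]);
        out.extend(std::iter::repeat(None).take(n));
    }
    out
}
```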
Replace None values with one of the following strategies:
- Forward fill (replace None with the previous value)
- Backward fill (replace None with the next value)
- Mean fill (replace None with the mean of the whole array)
- Min fill (replace None with the minimum of the whole array)
- Max fill (replace None with the maximum of the whole array)
See the method on Series for more info on the fill_null operation.
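Three of the strategies listed above can be sketched on a single column of optional floats. The functions `forward_fill`, `backward_fill` and `mean_fill` are hypothetical helpers written for illustration; in particular, leaving leading or trailing Nones untouched when no neighbouring value exists is an assumption of this sketch.

```rust
/// Forward fill: replace each None with the most recent preceding value.
fn forward_fill(values: &[Option<f64>]) -> Vec<Option<f64>> {
    let mut last = None;
    values
        .iter()
        .map(|&v| {
            if v.is_some() {
                last = v;
            }
            last
        })
        .collect()
}

/// Backward fill: replace each None with the next following value,
/// implemented as a forward fill over the reversed column.
fn backward_fill(values: &[Option<f64>]) -> Vec<Option<f64>> {
    let reversed: Vec<Option<f64>> = values.iter().rev().copied().collect();
    let mut filled = forward_fill(&reversed);
    filled.reverse();
    filled
}

/// Mean fill: replace each None with the mean of the non-null values.
fn mean_fill(values: &[Option<f64>]) -> Vec<Option<f64>> {
    let present: Vec<f64> = values.iter().flatten().copied().collect();
    if present.is_empty() {
        // Nothing to average; leave the column unchanged.
        return values.to_vec();
    }
    let mean = present.iter().sum::<f64>() / present.len() as f64;
    values.iter().map(|&v| v.or(Some(mean))).collect()
}
```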
Pipe different function or closure operations that work on a DataFrame together.
Drop duplicate rows from a DataFrame. This fails when there is a column of type List in the DataFrame.
Stable means that the order is maintained. This has a higher cost than an unstable distinct.
§Example
let df = df! {
"flt" => [1., 1., 2., 2., 3., 3.],
"int" => [1, 1, 2, 2, 3, 3, ],
"str" => ["a", "a", "b", "b", "c", "c"]
}?;
println!("{}", df.unique_stable(None, UniqueKeepStrategy::First, None)?);
Returns
+-----+-----+-----+
| flt | int | str |
| --- | --- | --- |
| f64 | i32 | str |
+=====+=====+=====+
| 1 | 1 | "a" |
+-----+-----+-----+
| 2 | 2 | "b" |
+-----+-----+-----+
| 3 | 3 | "c" |
+-----+-----+-----+
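The "stable, keep first" behaviour shown above can be modelled as an order-preserving de-duplication over rows. This is a std-only sketch in which each row is any hashable value; the free function `unique_stable` is a hypothetical helper and not the polars method.

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Keep the first occurrence of each row and preserve the original row order.
/// Toy model; not the polars implementation.
fn unique_stable<T: Hash + Eq + Clone>(rows: &[T]) -> Vec<T> {
    let mut seen = HashSet::new();
    rows.iter()
        // `insert` returns false when the row was already seen.
        .filter(|row| seen.insert(T::clone(row)))
        .cloned()
        .collect()
}
```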
Get a mask of all the unique rows in the DataFrame.
§Example
let df: DataFrame = df!("Company" => ["Apple", "Microsoft"],
"ISIN" => ["US0378331005", "US5949181045"])?;
let ca: ChunkedArray<BooleanType> = df.is_unique()?;
assert!(ca.all());
Get a mask of all the duplicated rows in the DataFrame.
§Example
let df: DataFrame = df!("Company" => ["Alphabet", "Alphabet"],
"ISIN" => ["US02079K3059", "US02079K1079"])?;
let ca: ChunkedArray<BooleanType> = df.is_duplicated()?;
assert!(!ca.all());
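The two masks are complements of each other: a row is unique when it occurs exactly once and duplicated otherwise. A minimal std-only sketch, with hypothetical helper names and rows modelled as any hashable value:

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Mask that is true for rows occurring exactly once.
/// Toy model; not the polars implementation.
fn is_unique_mask<T: Hash + Eq>(rows: &[T]) -> Vec<bool> {
    // First pass: count how often each row occurs.
    let mut counts: HashMap<&T, usize> = HashMap::new();
    for row in rows {
        *counts.entry(row).or_insert(0) += 1;
    }
    // Second pass: a row is unique when its count is 1.
    rows.iter().map(|row| counts[row] == 1).collect()
}

/// Mask that is true for rows occurring more than once.
fn is_duplicated_mask<T: Hash + Eq>(rows: &[T]) -> Vec<bool> {
    is_unique_mask(rows).into_iter().map(|unique| !unique).collect()
}
```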
Create a new DataFrame that shows the null counts per column.
Get the supertype of the columns in this DataFrame.
Split into multiple DataFrames partitioned by groups.
Split into multiple DataFrames partitioned by groups. The order of the groups is maintained.
Unnest the given Struct columns. This means that the fields of the Struct type will be inserted as columns.
Check if DataFrames' schemas are equal.
Check if DataFrames are equal. Note that None == None evaluates to false.
§Example
let df1: DataFrame = df!("Atomic number" => &[1, 51, 300],
"Element" => &[Some("Hydrogen"), Some("Antimony"), None])?;
let df2: DataFrame = df!("Atomic number" => &[1, 51, 300],
"Element" => &[Some("Hydrogen"), Some("Antimony"), None])?;
assert!(!df1.equals(&df2));
Check if all values in DataFrames are equal where None == None evaluates to true.
§Example
let df1: DataFrame = df!("Atomic number" => &[1, 51, 300],
"Element" => &[Some("Hydrogen"), Some("Antimony"), None])?;
let df2: DataFrame = df!("Atomic number" => &[1, 51, 300],
"Element" => &[Some("Hydrogen"), Some("Antimony"), None])?;
assert!(df1.equals_missing(&df2));
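The difference between the two comparisons comes down to how a pair of missing values is treated. The sketch below models one column as a slice of optional strings; the free functions `equals` and `equals_missing` are hypothetical helpers that mirror the semantics described above, not the polars methods.

```rust
/// Element-wise equality in which a missing value never equals anything,
/// including another missing value.
fn equals(a: &[Option<&str>], b: &[Option<&str>]) -> bool {
    a.len() == b.len()
        && a.iter()
            .zip(b)
            .all(|(x, y)| matches!((x, y), (Some(l), Some(r)) if l == r))
}

/// Element-wise equality in which two missing values compare equal.
fn equals_missing(a: &[Option<&str>], b: &[Option<&str>]) -> bool {
    a == b
}
```

With a column such as [Some("Hydrogen"), Some("Antimony"), None] compared against itself, the first function returns false because of the trailing None, while the second returns true.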