DataFrame in polars::frame - Rust

pub struct DataFrame { /* private fields */ }

A contiguous growable collection of Series that have the same length.

§Use declarations

All the common tools can be found in crate::prelude (or in polars::prelude).

use polars_core::prelude::*; // if the crate polars-core is used directly
// use polars::prelude::*;      if the crate polars is used

§Initialization

§Default

A DataFrame can be initialized empty:

let df = DataFrame::default();
assert!(df.is_empty());

§Wrapping a `Vec<Series>`

A DataFrame is built upon a Vec<Series> where the Series have the same length.

let s1 = Column::new("Fruit".into(), ["Apple", "Apple", "Pear"]);
let s2 = Column::new("Color".into(), ["Red", "Yellow", "Green"]);

let df: PolarsResult<DataFrame> = DataFrame::new(vec![s1, s2]);

§Using a macro

The df! macro is a convenient method:

let df: PolarsResult<DataFrame> = df!("Fruit" => ["Apple", "Apple", "Pear"],
                                      "Color" => ["Red", "Yellow", "Green"]);

§Using a CSV file

See the polars_io::csv::CsvReader.

§Indexing

§By a number

The Index<usize> is implemented for the DataFrame.

let df = df!("Fruit" => ["Apple", "Apple", "Pear"],
             "Color" => ["Red", "Yellow", "Green"])?;

assert_eq!(df[0], Column::new("Fruit".into(), &["Apple", "Apple", "Pear"]));
assert_eq!(df[1], Column::new("Color".into(), &["Red", "Yellow", "Green"]));

§By a `Series` name

let df = df!("Fruit" => ["Apple", "Apple", "Pear"],
             "Color" => ["Red", "Yellow", "Green"])?;

assert_eq!(df["Fruit"], Column::new("Fruit".into(), &["Apple", "Apple", "Pear"]));
assert_eq!(df["Color"], Column::new("Color".into(), &["Red", "Yellow", "Green"]));

Source §

Source

Create a 2D ndarray::Array from this DataFrame. This requires all columns in the DataFrame to be non-null and numeric. They will be cast to the same data type (if they aren’t already).

For floating point data we implicitly convert None to NaN without failure.

use polars_core::prelude::*;
let a = UInt32Chunked::new("a".into(), &[1, 2, 3]).into_column();
let b = Float64Chunked::new("b".into(), &[10., 8., 6.]).into_column();

let df = DataFrame::new(vec![a, b]).unwrap();
let ndarray = df.to_ndarray::<Float64Type>(IndexOrder::Fortran).unwrap();
println!("{:?}", ndarray);

Outputs:

[[1.0, 10.0],
 [2.0, 8.0],
 [3.0, 6.0]], shape=[3, 2], strides=[1, 3], layout=Ff (0xa), const ndim=2
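
The strides=[1, 3] in the output above reflect column-major (Fortran) order: moving one row advances one element, and moving one column advances three. The following std-only sketch models that layout together with the None to NaN conversion. It does not use Polars or ndarray, and ColMajor and to_col_major are hypothetical names introduced only for illustration.

```rust
/// Toy column-major ("Fortran order") matrix; not the ndarray type.
struct ColMajor {
    data: Vec<f64>,
    height: usize,
    width: usize,
}

impl ColMajor {
    /// Element at (row, col): the row stride is 1, the column stride is `height`.
    fn get(&self, row: usize, col: usize) -> f64 {
        assert!(row < self.height && col < self.width, "index out of bounds");
        self.data[row + col * self.height]
    }
}

/// Lay out equally long columns back to back, mapping `None` to NaN.
/// Returns `None` if the column lengths differ.
fn to_col_major(columns: &[Vec<Option<f64>>]) -> Option<ColMajor> {
    let height = columns.first().map_or(0, |c| c.len());
    if columns.iter().any(|c| c.len() != height) {
        return None;
    }
    let data = columns
        .iter()
        .flat_map(|c| c.iter().map(|v| v.unwrap_or(f64::NAN)))
        .collect();
    Some(ColMajor { data, height, width: columns.len() })
}
```

With two columns of height 3, the element at row 2 of column 1 sits at offset 2 + 1 * 3 = 5 in the flat buffer.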

Explode DataFrame to long format by exploding a column with Lists.

§Example

let s0 = Series::new("a".into(), &[1i64, 2, 3]);
let s1 = Series::new("b".into(), &[1i64, 1, 1]);
let s2 = Series::new("c".into(), &[2i64, 2, 2]);
let list = Series::new("foo".into(), &[s0, s1, s2]);

let s0 = Series::new("B".into(), [1, 2, 3]);
let s1 = Series::new("C".into(), [1, 1, 1]);
let df = DataFrame::new(vec![list, s0, s1])?;
let exploded = df.explode(["foo"])?;

println!("{:?}", df);
println!("{:?}", exploded);

Outputs:

 +-------------+-----+-----+
 | foo         | B   | C   |
 | ---         | --- | --- |
 | list [i64]  | i32 | i32 |
 +=============+=====+=====+
 | "[1, 2, 3]" | 1   | 1   |
 +-------------+-----+-----+
 | "[1, 1, 1]" | 2   | 1   |
 +-------------+-----+-----+
 | "[2, 2, 2]" | 3   | 1   |
 +-------------+-----+-----+

 +-----+-----+-----+
 | foo | B   | C   |
 | --- | --- | --- |
 | i64 | i32 | i32 |
 +=====+=====+=====+
 | 1   | 1   | 1   |
 +-----+-----+-----+
 | 2   | 1   | 1   |
 +-----+-----+-----+
 | 3   | 1   | 1   |
 +-----+-----+-----+
 | 1   | 2   | 1   |
 +-----+-----+-----+
 | 1   | 2   | 1   |
 +-----+-----+-----+
 | 1   | 2   | 1   |
 +-----+-----+-----+
 | 2   | 3   | 1   |
 +-----+-----+-----+
 | 2   | 3   | 1   |
 +-----+-----+-----+
 | 2   | 3   | 1   |
 +-----+-----+-----+
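
The row duplication seen in the second table can be pictured with a std-only sketch. This is a toy model under simplifying assumptions, not the Polars implementation: it handles one list column and one companion column, and it simply skips empty lists rather than modelling how Polars treats them. The function name explode here is a free function written for illustration.

```rust
/// Explode a list column: every list element becomes its own row, and the
/// value of the companion column is repeated alongside it.
fn explode<T: Clone, U: Clone>(lists: &[Vec<T>], other: &[U]) -> (Vec<T>, Vec<U>) {
    let mut exploded = Vec::new();
    let mut repeated = Vec::new();
    for (list, value) in lists.iter().zip(other) {
        for item in list {
            exploded.push(item.clone());
            repeated.push(value.clone());
        }
    }
    (exploded, repeated)
}
```

Applied to the lists [1, 2, 3], [1, 1, 1], [2, 2, 2] and the companion values 1, 2, 3, the sketch reproduces the foo and B columns of the exploded table.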

Source §

Source

Group DataFrame using a Series column.

§Example

use polars_core::prelude::*;
fn group_by_sum(df: &DataFrame) -> PolarsResult<DataFrame> {
    df.group_by(["column_name"])?
    .select(["agg_column_name"])
    .sum()
}

Source

Group DataFrame using a Series column. The groups are ordered by their smallest row index.
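
To illustrate what "ordered by their smallest row index" means, here is a std-only sketch that collects row indices per key in order of first appearance. It is a toy model and makes no claim about how Polars builds its groups; group_indices_stable is a hypothetical helper name.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Group row indices by key. Groups appear in the order in which their key
/// is first seen, i.e. ordered by their smallest row index.
fn group_indices_stable<K: Hash + Eq + Clone>(keys: &[K]) -> Vec<(K, Vec<usize>)> {
    let mut slot_of: HashMap<K, usize> = HashMap::new();
    let mut groups: Vec<(K, Vec<usize>)> = Vec::new();
    for (row, key) in keys.iter().enumerate() {
        // Look up the group's position, creating the group on first sight.
        let slot = *slot_of.entry(key.clone()).or_insert_with(|| {
            groups.push((key.clone(), Vec::new()));
            groups.len() - 1
        });
        groups[slot].1.push(row);
    }
    groups
}
```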

Source §

Source

Add columns horizontally.

§Safety

The caller must ensure:

the length of every Column is equal to the height of this DataFrame
the column names are unique

Note: If self is empty, self.height will always be overridden by the height of the first column in columns.

Note that on a debug build this will panic on duplicates / height mismatch.

Source

Add multiple Column to a DataFrame. Errors if the resulting DataFrame columns have duplicate names or unequal heights.

Note: If self is empty, self.height will always be overridden by the height of the first column in columns.

§Example

fn stack(df: &mut DataFrame, columns: &[Column]) -> PolarsResult<()> {
    // Propagate the error on duplicate names or unequal heights.
    df.hstack_mut(columns)?;
    Ok(())
}

Source §

Source

Get a row from a DataFrame. Use of this is discouraged as it will likely be slow.

Source

Amortize allocations by reusing a row. The caller is responsible for ensuring that the row has at least the capacity for the number of columns in the DataFrame.
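
The idea of amortizing allocations with a reusable row can be sketched without Polars. In this toy model a "frame" is a slice of equally long Vec columns, and fill_row is a hypothetical helper; it only shows that clearing and refilling one buffer avoids allocating a new Vec per row.

```rust
/// Overwrite `row` with the values at position `idx` of every column.
/// The buffer is cleared, not reallocated, so its capacity is reused.
/// Panics if `idx` is out of bounds for any column.
fn fill_row<T: Clone>(columns: &[Vec<T>], idx: usize, row: &mut Vec<T>) {
    row.clear();
    row.extend(columns.iter().map(|column| column[idx].clone()));
}
```

A caller would create the buffer once with Vec::with_capacity(columns.len()) and pass it to fill_row for each row index of interest.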

Source

Amortize allocations by reusing a row. The caller is responsible for ensuring that the row has at least the capacity for the number of columns in the DataFrame.

§Safety

Does not do any bounds checking.

Source

Create a new DataFrame from rows.

This should only be used when you have row-wise data, as this is a lot slower than creating the Series in a columnar fashion.

Source

Create a new DataFrame from an iterator over rows.

This should only be used when you have row-wise data, as this is a lot slower than creating the Series in a columnar fashion.

Source

Create a new DataFrame from an iterator over rows. This should only be used when you have row-wise data, as this is a lot slower than creating the Series in a columnar fashion.

Source

Create a new DataFrame from rows. This should only be used when you have row-wise data, as this is a lot slower than creating the Series in a columnar fashion.

Source §

Source

Transpose a DataFrame. This is a very expensive operation.

Source §

Source

Ensure all columns have equal height and unique names.

An Ok() result indicates columns is a valid state for a DataFrame.
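
The two invariants checked here (equal heights, unique names) can be expressed in a few lines of std-only Rust. This is a sketch of the rule, not the Polars code: columns are reduced to hypothetical (name, length) pairs, and validate_columns is an illustrative name.

```rust
use std::collections::HashSet;

/// Check that all columns share one height and that names are unique.
/// Returns the common height (0 when there are no columns).
fn validate_columns(columns: &[(&str, usize)]) -> Result<usize, String> {
    let height = columns.first().map_or(0, |(_, len)| *len);
    let mut seen = HashSet::new();
    for (name, len) in columns {
        if *len != height {
            return Err(format!("column '{name}' has length {len}, expected {height}"));
        }
        // `insert` returns false when the name was already present.
        if !seen.insert(*name) {
            return Err(format!("duplicate column name '{name}'"));
        }
    }
    Ok(height)
}
```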

Source §

Source

Returns an estimation of the total (heap) allocated size of the DataFrame in bytes.

§Implementation

This estimation is the sum of the size of its buffers, validity, including nested arrays. Multiple arrays may share buffers and bitmaps. Therefore, the size of 2 arrays is not the sum of the sizes computed from this function. In particular, StructArray’s size is an upper bound.

When an array is sliced, its allocated size remains constant because the buffer is unchanged. However, this function will yield a smaller number. This is because this function returns the visible size of the buffer, not its total capacity.

FFI buffers are included in this estimation.
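
The difference between visible size and allocated size can be illustrated with a std-only toy model. ArrayView below is a hypothetical type, not an Arrow or Polars array: it is a window into a reference-counted buffer, so slicing shares the allocation while the reported size shrinks.

```rust
use std::mem::size_of;
use std::sync::Arc;

/// Toy array: a window (`offset`, `len`) into a shared buffer.
struct ArrayView {
    buffer: Arc<Vec<i64>>,
    offset: usize,
    len: usize,
}

impl ArrayView {
    fn new(values: Vec<i64>) -> Self {
        let len = values.len();
        ArrayView { buffer: Arc::new(values), offset: 0, len }
    }

    /// Slicing narrows the window; the buffer is shared, not copied.
    fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        ArrayView {
            buffer: Arc::clone(&self.buffer),
            offset: self.offset + offset,
            len,
        }
    }

    /// The values visible through the window.
    fn values(&self) -> &[i64] {
        &self.buffer[self.offset..self.offset + self.len]
    }

    /// Bytes visible through the window (what an estimate would report).
    fn estimated_size(&self) -> usize {
        self.len * size_of::<i64>()
    }

    /// Bytes held by the whole shared buffer's elements.
    fn buffer_size(&self) -> usize {
        self.buffer.len() * size_of::<i64>()
    }
}
```

In this model, a view over four values and a two-element slice of it report 32 and 16 bytes respectively, yet both keep the same 32-byte buffer alive, which is why summing per-array estimates can overcount shared memory.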

Source

Create a DataFrame from a Vector of Series.

Errors if the column names are not unique, or if heights are not all equal.

§Example

let s0 = Column::new("days".into(), [0, 1, 2].as_ref());
let s1 = Column::new("temp".into(), [22.1, 19.9, 7.].as_ref());

let df = DataFrame::new(vec![s0, s1])?;

Source

Converts a sequence of columns into a DataFrame, broadcasting length-1 columns to match the other columns.
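
Broadcasting length-1 columns can be sketched in std-only Rust. This is a toy model under stated assumptions, not the Polars implementation: columns are plain Vecs, the target height is taken from the first column whose length is not 1, and broadcast_columns is a hypothetical name.

```rust
/// Repeat every length-1 column so that all columns share one height.
/// Errors if a column is neither length 1 nor the target height.
fn broadcast_columns<T: Clone>(columns: Vec<Vec<T>>) -> Result<Vec<Vec<T>>, String> {
    // Assumption of this sketch: the first non-unit length sets the height.
    let target = columns
        .iter()
        .map(|c| c.len())
        .find(|&len| len != 1)
        .unwrap_or(1);
    columns
        .into_iter()
        .map(|c| match c.len() {
            len if len == target => Ok(c),
            1 => Ok(vec![c[0].clone(); target]),
            len => Err(format!("cannot broadcast length {len} to {target}")),
        })
        .collect()
}
```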

Source

Converts a sequence of columns into a DataFrame, broadcasting length-1 columns to broadcast_len.

Source

Converts a sequence of columns into a DataFrame, broadcasting length-1 columns to match the other columns.

§Safety

Does not check that the column names are unique (which they must be).

Source

Creates an empty DataFrame usable in a compile time context (such as static initializers).

§Example

use polars_core::prelude::DataFrame;
static EMPTY: DataFrame = DataFrame::empty();

Source

Creates an empty DataFrame with a specific height.

Source

Create an empty DataFrame with empty columns as per the schema.

Source

Create an empty DataFrame with empty columns as per the schema.

Source

Create a new DataFrame with the given schema, only containing nulls.

Source

Removes the last Series from the DataFrame and returns it, or None if it is empty.

§Example

let s1 = Column::new("Ocean".into(), ["Atlantic", "Indian"]);
let s2 = Column::new("Area (km²)".into(), [106_460_000, 70_560_000]);
let mut df = DataFrame::new(vec![s1.clone(), s2.clone()])?;

assert_eq!(df.pop(), Some(s2));
assert_eq!(df.pop(), Some(s1));
assert_eq!(df.pop(), None);
assert!(df.is_empty());

Source

Add a new column at index 0 that counts the rows.

§Example

let df1: DataFrame = df!("Name" => ["James", "Mary", "John", "Patricia"])?;
assert_eq!(df1.shape(), (4, 1));

let df2: DataFrame = df1.with_row_index("Id".into(), None)?;
assert_eq!(df2.shape(), (4, 2));
println!("{}", df2);

Output:

 shape: (4, 2)
 +-----+----------+
 | Id  | Name     |
 | --- | ---      |
 | u32 | str      |
 +=====+==========+
 | 0   | James    |
 +-----+----------+
 | 1   | Mary     |
 +-----+----------+
 | 2   | John     |
 +-----+----------+
 | 3   | Patricia |
 +-----+----------+

Source

Add a row index column in place.

§Safety

The caller should ensure the DataFrame does not already contain a column with the given name.

§Panics

Panics if the resulting column would reach or overflow IdxSize::MAX.

Source

Create a new DataFrame but does not check the length or duplicate occurrence of the Series.

Calculates the height from the first column or 0 if no columns are given.

§Safety

It is the caller's responsibility to uphold the contract that all Series have an equal length and a unique name; if not, this may panic down the line.

Source

Create a new DataFrame but does not check the length or duplicate occurrence of the Series.

It is advised to use DataFrame::new in favor of this method.

§Safety

It is the caller's responsibility to uphold the contract that all Series have an equal length and a unique name; if not, this may panic down the line.

Source

This will not panic even in debug mode - there are some (rare) use cases where a DataFrame is temporarily constructed containing duplicates for dispatching to functions. A DataFrame constructed with this method is generally highly unsafe and should not be long-lived.

Source

Shrink the capacity of this DataFrame to fit its length.

Source

Aggregate all the chunks in the DataFrame to a single chunk.

Source

Aggregate all the chunks in the DataFrame to a single chunk in parallel. This may lead to more peak memory consumption.

Source

Rechunks all columns to only have a single chunk.

Source

Rechunks all columns to only have a single chunk and turns it into a RecordBatchT.

Source

Returns true if the chunks of the columns do not align and re-chunking should be done.
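
What "chunks do not align" means can be shown with a small std-only check. The sketch assumes that alignment means every column is split into chunks of the same lengths in the same order; the exact criterion used by Polars may differ, and chunks_misaligned is a hypothetical name.

```rust
/// Each inner Vec holds the chunk lengths of one column.
/// Columns are misaligned if any column's chunk lengths differ from the first's.
fn chunks_misaligned(chunk_lengths: &[Vec<usize>]) -> bool {
    match chunk_lengths.split_first() {
        Some((first, rest)) => rest.iter().any(|lens| lens != first),
        None => false,
    }
}
```

For example, two columns chunked as [2, 3] and [2, 3] are aligned, while [2, 3] and [5] hold the same number of rows but are not.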

Source

Ensure all the chunks in the DataFrame are aligned.

Source

Get the DataFrame schema.

§Example

let df: DataFrame = df!("Thing" => ["Observable universe", "Human stupidity"],
                        "Diameter (m)" => [8.8e26, f64::INFINITY])?;

let f1: Field = Field::new("Thing".into(), DataType::String);
let f2: Field = Field::new("Diameter (m)".into(), DataType::Float64);
let sc: Schema = Schema::from_iter(vec![f1, f2]);

assert_eq!(&**df.schema(), &sc);

Source

Get a reference to the DataFrame columns.

§Example

let df: DataFrame = df!("Name" => ["Adenine", "Cytosine", "Guanine", "Thymine"],
                        "Symbol" => ["A", "C", "G", "T"])?;
let columns: &[Column] = df.get_columns();

assert_eq!(columns[0].name(), "Name");
assert_eq!(columns[1].name(), "Symbol");

Source

Get mutable access to the underlying columns.

§Safety

The caller must ensure the length of all Series remains equal to height, or that DataFrame::set_height is called afterwards with the appropriate height. If the caller modifies the schema, it must also clear the cached schema by calling DataFrame::clear_schema.

Source

Remove all the columns in the DataFrame but keep the height.

Source

Extend the columns without checking for name collisions or height.

§Safety

The caller needs to ensure that:

Column names are unique within the resulting DataFrame.
The length of each appended column matches the height of the DataFrame. For DataFrames with no columns (ZCDFs), it is important that the height is set afterwards with DataFrame::set_height.

Source

Take ownership of the underlying columns vec.

Source

Iterator over the columns as Series.

§Example

let s1 = Column::new("Name".into(), ["Pythagoras' theorem", "Shannon entropy"]);
let s2 = Column::new("Formula".into(), ["a²+b²=c²", "H=-Σ[P(x)log|P(x)|]"]);
let df: DataFrame = DataFrame::new(vec![s1.clone(), s2.clone()])?;

let mut iterator = df.iter();

assert_eq!(iterator.next(), Some(s1.as_materialized_series()));
assert_eq!(iterator.next(), Some(s2.as_materialized_series()));
assert_eq!(iterator.next(), None);

Source

§Example

let df: DataFrame = df!("Language" => ["Rust", "Python"],
                        "Designer" => ["Graydon Hoare", "Guido van Rossum"])?;

assert_eq!(df.get_column_names(), &["Language", "Designer"]);

Source

Set the column names.

§Example

let mut df: DataFrame = df!("Mathematical set" => ["ℕ", "ℤ", "𝔻", "ℚ", "ℝ", "ℂ"])?;
df.set_column_names(["Set"])?;

assert_eq!(df.get_column_names(), &["Set"]);

Source

Get the data types of the columns in the DataFrame.

§Example

let venus_air: DataFrame = df!("Element" => ["Carbon dioxide", "Nitrogen"],
                               "Fraction" => [0.965, 0.035])?;

assert_eq!(venus_air.dtypes(), &[DataType::String, DataType::Float64]);

Source

The number of chunks for the first column.

Source

The highest number of chunks for any column.

Source

Get a reference to the schema fields of the DataFrame.

§Example

let earth: DataFrame = df!("Surface type" => ["Water", "Land"],
                           "Fraction" => [0.708, 0.292])?;

let f1: Field = Field::new("Surface type".into(), DataType::String);
let f2: Field = Field::new("Fraction".into(), DataType::Float64);

assert_eq!(earth.fields(), &[f1, f2]);

Source

Get (height, width) of the DataFrame.

§Example

let df0: DataFrame = DataFrame::default();
let df1: DataFrame = df!("1" => [1, 2, 3, 4, 5])?;
let df2: DataFrame = df!("1" => [1, 2, 3, 4, 5],
                         "2" => [1, 2, 3, 4, 5])?;

assert_eq!(df0.shape(), (0, 0));
assert_eq!(df1.shape(), (5, 1));
assert_eq!(df2.shape(), (5, 2));

Source

Get the width of the DataFrame which is the number of columns.

§Example

let df0: DataFrame = DataFrame::default();
let df1: DataFrame = df!("Series 1" => [0; 0])?;
let df2: DataFrame = df!("Series 1" => [0; 0],
                         "Series 2" => [0; 0])?;

assert_eq!(df0.width(), 0);
assert_eq!(df1.width(), 1);
assert_eq!(df2.width(), 2);

Source

Get the height of the DataFrame which is the number of rows.

§Example

let df0: DataFrame = DataFrame::default();
let df1: DataFrame = df!("Currency" => ["€", "$"])?;
let df2: DataFrame = df!("Currency" => ["€", "$", "¥", "£", "₿"])?;

assert_eq!(df0.height(), 0);
assert_eq!(df1.height(), 2);
assert_eq!(df2.height(), 5);

Source

Returns the size as number of rows * number of columns.

Source

Returns true if the DataFrame contains no rows.

§Example

let df1: DataFrame = DataFrame::default();
assert!(df1.is_empty());

let df2: DataFrame = df!("First name" => ["Forever"],
                         "Last name" => ["Alone"])?;
assert!(!df2.is_empty());

Source

Set the height (i.e. number of rows) of this DataFrame.

§Safety

This needs to be equal to the length of all the columns.

Source

Add multiple Series to a DataFrame. The added Series are required to have the same length.

§Example

let df1: DataFrame = df!("Element" => ["Copper", "Silver", "Gold"])?;
let s1 = Column::new("Proton".into(), [29, 47, 79]);
let s2 = Column::new("Electron".into(), [29, 47, 79]);

let df2: DataFrame = df1.hstack(&[s1, s2])?;
assert_eq!(df2.shape(), (3, 3));
println!("{}", df2);

Output:

shape: (3, 3)
+---------+--------+----------+
| Element | Proton | Electron |
| ---     | ---    | ---      |
| str     | i32    | i32      |
+=========+========+==========+
| Copper  | 29     | 29       |
+---------+--------+----------+
| Silver  | 47     | 47       |
+---------+--------+----------+
| Gold    | 79     | 79       |
+---------+--------+----------+

Source

Concatenate a DataFrame to this DataFrame and return the result as a newly allocated DataFrame.

If many vstack operations are done, it is recommended to call DataFrame::align_chunks_par.

§Example

let df1: DataFrame = df!("Element" => ["Copper", "Silver", "Gold"],
                         "Melting Point (K)" => [1357.77, 1234.93, 1337.33])?;
let df2: DataFrame = df!("Element" => ["Platinum", "Palladium"],
                         "Melting Point (K)" => [2041.4, 1828.05])?;

let df3: DataFrame = df1.vstack(&df2)?;

assert_eq!(df3.shape(), (5, 2));
println!("{}", df3);

Output:

shape: (5, 2)
+-----------+-------------------+
| Element   | Melting Point (K) |
| ---       | ---               |
| str       | f64               |
+===========+===================+
| Copper    | 1357.77           |
+-----------+-------------------+
| Silver    | 1234.93           |
+-----------+-------------------+
| Gold      | 1337.33           |
+-----------+-------------------+
| Platinum  | 2041.4            |
+-----------+-------------------+
| Palladium | 1828.05           |
+-----------+-------------------+

Source

Concatenate a DataFrame to this DataFrame in place.

If many vstack operations are done, it is recommended to call DataFrame::align_chunks_par.

§Example

let mut df1: DataFrame = df!("Element" => ["Copper", "Silver", "Gold"],
                             "Melting Point (K)" => [1357.77, 1234.93, 1337.33])?;
let df2: DataFrame = df!("Element" => ["Platinum", "Palladium"],
                         "Melting Point (K)" => [2041.4, 1828.05])?;

df1.vstack_mut(&df2)?;

assert_eq!(df1.shape(), (5, 2));
println!("{}", df1);

Output:

shape: (5, 2)
+-----------+-------------------+
| Element   | Melting Point (K) |
| ---       | ---               |
| str       | f64               |
+===========+===================+
| Copper    | 1357.77           |
+-----------+-------------------+
| Silver    | 1234.93           |
+-----------+-------------------+
| Gold      | 1337.33           |
+-----------+-------------------+
| Platinum  | 2041.4            |
+-----------+-------------------+
| Palladium | 1828.05           |
+-----------+-------------------+

Source

Extend the memory backed by this DataFrame with the values from other.

Unlike vstack, which adds the chunks from other to the chunks of this DataFrame, extend appends the data from other to the underlying memory locations and thus may cause a reallocation.

If this does not cause a reallocation, the resulting data structure will not have any extra chunks and thus will yield faster queries.

Prefer extend over vstack when you want to do a query after a single append. For instance during online operations where you add n rows and rerun a query.

Prefer vstack over extend when you want to append many times before doing a query. For instance when you read in multiple files and want to store them in a single DataFrame. In the latter case, finish the sequence of append operations with a rechunk.
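
The trade-off between the two append strategies can be seen in a std-only toy model of a chunked column. ChunkedColumn is a hypothetical type for illustration: its vstack copies chunks where a real implementation could share them, and its extend writes into the last existing chunk, which is a simplification of "appending to the underlying memory".

```rust
/// Toy chunked column: a list of separately allocated chunks.
struct ChunkedColumn {
    chunks: Vec<Vec<i64>>,
}

impl ChunkedColumn {
    /// vstack-style append: cheap per call, but adds one chunk per appended chunk.
    fn vstack(&mut self, other: &ChunkedColumn) {
        self.chunks.extend(other.chunks.iter().cloned());
    }

    /// extend-style append: copies values into existing memory (which may
    /// reallocate), so the number of chunks does not grow.
    fn extend(&mut self, other: &ChunkedColumn) {
        if self.chunks.is_empty() {
            self.chunks.push(Vec::new());
        }
        let last = self.chunks.last_mut().expect("at least one chunk");
        for chunk in &other.chunks {
            last.extend_from_slice(chunk);
        }
    }

    /// Merge all chunks into one contiguous chunk.
    fn rechunk(&mut self) {
        let merged = self.chunks.concat();
        self.chunks = vec![merged];
    }

    fn n_chunks(&self) -> usize {
        self.chunks.len()
    }

    fn len(&self) -> usize {
        self.chunks.iter().map(|c| c.len()).sum()
    }
}
```

Many vstack calls followed by a single rechunk pay the copying cost once, whereas extend pays a copy on every call but leaves the data ready to query.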

Source

Remove a column by name and return the column removed.

§Example

let mut df: DataFrame = df!("Animal" => ["Tiger", "Lion", "Great auk"],
                            "IUCN" => ["Endangered", "Vulnerable", "Extinct"])?;

let s1: PolarsResult<Column> = df.drop_in_place("Average weight");
assert!(s1.is_err());

let s2: Column = df.drop_in_place("Animal")?;
assert_eq!(s2, Column::new("Animal".into(), &["Tiger", "Lion", "Great auk"]));

Source

Return a new DataFrame where all null values are dropped.

§Example

let df1: DataFrame = df!("Country" => ["Malta", "Liechtenstein", "North Korea"],
                         "Tax revenue (% GDP)" => [Some(32.7), None, None])?;
assert_eq!(df1.shape(), (3, 2));

let df2: DataFrame = df1.drop_nulls::<String>(None)?;
assert_eq!(df2.shape(), (1, 2));
println!("{}", df2);

Output:

shape: (1, 2)
+---------+---------------------+
| Country | Tax revenue (% GDP) |
| ---     | ---                 |
| str     | f64                 |
+=========+=====================+
| Malta   | 32.7                |
+---------+---------------------+

Source

Drop a column by name. This is a pure method and will return a new DataFrame instead of modifying the current one in place.

§Example

let df1: DataFrame = df!("Ray type" => ["α", "β", "X", "γ"])?;
let df2: DataFrame = df1.drop("Ray type")?;

assert!(df2.is_empty());

Source

Drop columns that are in names.

Source

Drop columns that are in names without allocating a HashSet.

Source

Insert a new column at a given index.

Source

Add a new column to this DataFrame or replace an existing one.

Source

Adds a column to the DataFrame without doing any checks on length or duplicates.

§Safety

The caller must ensure self.width() == 0 || column.len() == self.height().

Source

Add a new column to this DataFrame or replace an existing one. Uses an existing schema to amortize lookups. If the schema is incorrect, we will fall back to linear search.

Note: the schema can be either the input schema or the output schema.

Source

Get a row in the DataFrame. Beware this is slow.

§Example

fn example(df: &mut DataFrame, idx: usize) -> Option<Vec<AnyValue>> {
    df.get(idx)
}

Source

Select a Series by index.

§Example

let df: DataFrame = df!("Star" => ["Sun", "Betelgeuse", "Sirius A", "Sirius B"],
                        "Absolute magnitude" => [4.83, -5.85, 1.42, 11.18])?;

let s1: Option<&Column> = df.select_at_idx(0);
let s2 = Column::new("Star".into(), ["Sun", "Betelgeuse", "Sirius A", "Sirius B"]);

assert_eq!(s1, Some(&s2));

Source

Select column(s) from this DataFrame by range and return a new DataFrame.

§Examples

let df = df! {
    "0" => [0, 0, 0],
    "1" => [1, 1, 1],
    "2" => [2, 2, 2]
}?;

assert!(df.select(["0", "1"])?.equals(&df.select_by_range(0..=1)?));
assert!(df.equals(&df.select_by_range(..)?));

Source

Get column index of a Series by name.

§Example

let df: DataFrame = df!("Name" => ["Player 1", "Player 2", "Player 3"],
                        "Health" => [100, 200, 500],
                        "Mana" => [250, 100, 0],
                        "Strength" => [30, 150, 300])?;

assert_eq!(df.get_column_index("Name"), Some(0));
assert_eq!(df.get_column_index("Health"), Some(1));
assert_eq!(df.get_column_index("Mana"), Some(2));
assert_eq!(df.get_column_index("Strength"), Some(3));
assert_eq!(df.get_column_index("Haste"), None);

Source

Get column index of a Series by name.

Source

Select a single column by name.

§Example

let s1 = Column::new("Password".into(), ["123456", "[]B$u$g$s$B#u#n#n#y[]{}"]);
let s2 = Column::new("Robustness".into(), ["Weak", "Strong"]);
let df: DataFrame = DataFrame::new(vec![s1.clone(), s2])?;

assert_eq!(df.column("Password")?, &s1);

Source

Select multiple columns by name.

§Example

let df: DataFrame = df!("Latin name" => ["Oncorhynchus kisutch", "Salmo salar"],
                        "Max weight (kg)" => [16.0, 35.89])?;
let sv: Vec<&Column> = df.columns(["Latin name", "Max weight (kg)"])?;

assert_eq!(&df[0], sv[0]);
assert_eq!(&df[1], sv[1]);

Source

Select column(s) from this DataFrame and return a new DataFrame.

§Examples

fn example(df: &DataFrame) -> PolarsResult<DataFrame> {
    df.select(["foo", "bar"])
}

Source

Select with a known schema. The schema names must match the column names of this DataFrame.

Source

Select with a known schema without checking for duplicates in selection. The schema names must match the column names of this DataFrame.

Source

The schema names must match the column names of this DataFrame.

Source

Select column(s) from this DataFrame and return them into a Vec.

§Example

let df: DataFrame = df!("Name" => ["Methane", "Ethane", "Propane"],
                        "Carbon" => [1, 2, 3],
                        "Hydrogen" => [4, 6, 8])?;
let sv: Vec<Column> = df.select_columns(["Carbon", "Hydrogen"])?;

assert_eq!(df["Carbon"], sv[0]);
assert_eq!(df["Hydrogen"], sv[1]);

Source

Take the DataFrame rows by a boolean mask.

§Example

fn example(df: &DataFrame) -> PolarsResult<DataFrame> {
    let mask = df.column("sepal_width")?.is_not_null();
    df.filter(&mask)
}

Source

Same as filter but does not parallelize.

Source

Take DataFrame rows by index values.

§Example

fn example(df: &DataFrame) -> PolarsResult<DataFrame> {
    let idx = IdxCa::new("idx".into(), [0, 1, 9]);
    df.take(&idx)
}

Source

§Safety

The indices must be in-bounds.

Source

§Safety

The indices must be in-bounds.

Source

§Safety

The indices must be in-bounds.

Source

§Safety

The indices must be in-bounds.

Source

Rename a column in the DataFrame.

§Example

fn example(df: &mut DataFrame) -> PolarsResult<&mut DataFrame> {
    let original_name = "foo";
    let new_name = "bar";
    df.rename(original_name, new_name.into())
}

Source

Create a DataFrame that has fields for all the known runtime metadata for each column.

This dataframe does not necessarily have a specified schema and may be changed at any point. It is primarily used for debugging.

Source

Return a sorted clone of this DataFrame.

In many cases the output chunks will be contiguous in memory, but this is not guaranteed.

§Example

Sort by a single column with default options:

fn sort_by_sepal_width(df: &DataFrame) -> PolarsResult<DataFrame> {
    df.sort(["sepal_width"], Default::default())
}

Sort by a single column with specific order:

fn sort_with_specific_order(df: &DataFrame, descending: bool) -> PolarsResult<DataFrame> {
    df.sort(
        ["sepal_width"],
        SortMultipleOptions::new()
            .with_order_descending(descending)
    )
}

Sort by multiple columns, specifying the order for each column:

fn sort_by_multiple_columns_with_specific_order(df: &DataFrame) -> PolarsResult<DataFrame> {
    df.sort(
        ["sepal_width", "sepal_length"],
        SortMultipleOptions::new()
            .with_order_descending_multi([false, true])
    )
}

See SortMultipleOptions for more options.

Also see DataFrame::sort_in_place.

Source

Replace a column with a Series.

§Example

let mut df: DataFrame = df!("Country" => ["United States", "China"],
                            "Area (km²)" => [9_833_520, 9_596_961])?;
let s: Series = Series::new("Country".into(), ["USA", "PRC"]);

assert!(df.replace("Nation", s.clone()).is_err());
assert!(df.replace("Country", s).is_ok());

Source

Replace or update a column. Unlike DataFrame::with_column, the value of column: &str determines the name of the column, not the name of the Series passed to this method.

Source

Replace column at index idx with a Series.

§Example

let s0 = Series::new("foo".into(), ["ham", "spam", "egg"]);
let s1 = Series::new("ascii".into(), [70, 79, 79]);
let mut df = DataFrame::new(vec![s0, s1])?;

// Add 32 to get lowercase ascii values
df.replace_column(1, df.select_at_idx(1).unwrap() + 32);

Source

Apply a closure to a column. This is the recommended way to do in place modification.

§Example

let s0 = Column::new("foo".into(), ["ham", "spam", "egg"]);
let s1 = Column::new("names".into(), ["Jean", "Claude", "van"]);
let mut df = DataFrame::new(vec![s0, s1])?;

fn str_to_len(str_val: &Column) -> Column {
    str_val.str()
        .unwrap()
        .into_iter()
        .map(|opt_name: Option<&str>| {
            opt_name.map(|name: &str| name.len() as u32)
         })
        .collect::<UInt32Chunked>()
        .into_column()
}

// Replace the names column by the length of the names.
df.apply("names", str_to_len);

Results in:

+--------+-------+
| foo    | names |
| ---    | ---   |
| str    | u32   |
+========+=======+
| "ham"  | 4     |
+--------+-------+
| "spam" | 6     |
+--------+-------+
| "egg"  | 3     |
+--------+-------+

Source

Apply a closure to a column at index idx. This is the recommended way to do in place modification.

§Example

let s0 = Column::new("foo".into(), ["ham", "spam", "egg"]);
let s1 = Column::new("ascii".into(), [70, 79, 79]);
let mut df = DataFrame::new(vec![s0, s1])?;

// Add 32 to get lowercase ascii values
df.apply_at_idx(1, |s| s + 32);

Results in:

+--------+-------+
| foo    | ascii |
| ---    | ---   |
| str    | i32   |
+========+=======+
| "ham"  | 102   |
+--------+-------+
| "spam" | 111   |
+--------+-------+
| "egg"  | 111   |
+--------+-------+

Source

Apply a closure that may fail to a column at index idx. This is the recommended way to do in place modification.

§Example

This is the idiomatic way to replace some values in a column of a DataFrame given a range of indexes.

let s0 = Column::new("foo".into(), ["ham", "spam", "egg", "bacon", "quack"]);
let s1 = Column::new("values".into(), [1, 2, 3, 4, 5]);
let mut df = DataFrame::new(vec![s0, s1])?;

let idx = vec![0, 1, 4];

df.try_apply("foo", |c| {
    c.str()?
    .scatter_with(idx, |opt_val| opt_val.map(|string| format!("{}-is-modified", string)))
});

Results in:

+---------------------+--------+
| foo                 | values |
| ---                 | ---    |
| str                 | i32    |
+=====================+========+
| "ham-is-modified"   | 1      |
+---------------------+--------+
| "spam-is-modified"  | 2      |
+---------------------+--------+
| "egg"               | 3      |
+---------------------+--------+
| "bacon"             | 4      |
+---------------------+--------+
| "quack-is-modified" | 5      |
+---------------------+--------+

Source

Apply a closure that may fail to a column. This is the recommended way to do in place modification.

§Example

This is the idiomatic way to replace some values in a column of a DataFrame given a boolean mask.

let s0 = Column::new("foo".into(), ["ham", "spam", "egg", "bacon", "quack"]);
let s1 = Column::new("values".into(), [1, 2, 3, 4, 5]);
let mut df = DataFrame::new(vec![s0, s1])?;

// create a mask
let values = df.column("values")?.as_materialized_series();
let mask = values.lt_eq(1)? | values.gt_eq(5_i32)?;

df.try_apply("foo", |c| {
    c.str()?
    .set(&mask, Some("not_within_bounds"))
});

Results in:

+---------------------+--------+
| foo                 | values |
| ---                 | ---    |
| str                 | i32    |
+=====================+========+
| "not_within_bounds" | 1      |
+---------------------+--------+
| "spam"              | 2      |
+---------------------+--------+
| "egg"               | 3      |
+---------------------+--------+
| "bacon"             | 4      |
+---------------------+--------+
| "not_within_bounds" | 5      |
+---------------------+--------+

Source

Slice the DataFrame along the rows.

§Example

let df: DataFrame = df!("Fruit" => ["Apple", "Grape", "Grape", "Fig", "Fig"],
                        "Color" => ["Green", "Red", "White", "White", "Red"])?;
let sl: DataFrame = df.slice(2, 3);

assert_eq!(sl.shape(), (3, 2));
println!("{}", sl);

Output:

shape: (3, 2)
+-------+-------+
| Fruit | Color |
| ---   | ---   |
| str   | str   |
+=======+=======+
| Grape | White |
+-------+-------+
| Fig   | White |
+-------+-------+
| Fig   | Red   |
+-------+-------+

Source

Get the head of the DataFrame.

§Example

let countries: DataFrame =
    df!("Rank by GDP (2021)" => [1, 2, 3, 4, 5],
        "Continent" => ["North America", "Asia", "Asia", "Europe", "Europe"],
        "Country" => ["United States", "China", "Japan", "Germany", "United Kingdom"],
        "Capital" => ["Washington", "Beijing", "Tokyo", "Berlin", "London"])?;
assert_eq!(countries.shape(), (5, 4));

println!("{}", countries.head(Some(3)));

Output:

shape: (3, 4)
+--------------------+---------------+---------------+------------+
| Rank by GDP (2021) | Continent     | Country       | Capital    |
| ---                | ---           | ---           | ---        |
| i32                | str           | str           | str        |
+====================+===============+===============+============+
| 1                  | North America | United States | Washington |
+--------------------+---------------+---------------+------------+
| 2                  | Asia          | China         | Beijing    |
+--------------------+---------------+---------------+------------+
| 3                  | Asia          | Japan         | Tokyo      |
+--------------------+---------------+---------------+------------+

Source

Get the tail of the DataFrame.

§Example

let countries: DataFrame =
    df!("Rank (2021)" => [105, 106, 107, 108, 109],
        "Apple Price (€/kg)" => [0.75, 0.70, 0.70, 0.65, 0.52],
        "Country" => ["Kosovo", "Moldova", "North Macedonia", "Syria", "Turkey"])?;
assert_eq!(countries.shape(), (5, 3));

println!("{}", countries.tail(Some(2)));

Output:

shape: (2, 3)
+-------------+--------------------+---------+
| Rank (2021) | Apple Price (€/kg) | Country |
| ---         | ---                | ---     |
| i32         | f64                | str     |
+=============+====================+=========+
| 108         | 0.65               | Syria   |
+-------------+--------------------+---------+
| 109         | 0.52               | Turkey  |
+-------------+--------------------+---------+

Source

Iterator over the rows in this DataFrame as Arrow RecordBatches.

§Panics

Panics if the DataFrame that is passed is not rechunked.

This responsibility is left to the caller as we don’t want to take mutable references here, but we also don’t want to rechunk here, as this operation is costly and would benefit the caller as well.

Source

Iterator over the rows in this DataFrame as Arrow RecordBatches as physical values.

§Panics

Panics if the DataFrame that is passed is not rechunked.

Source

Get a DataFrame in which every column has its values in reversed order, i.e. the rows are reversed.

Source

Shift the values by a given period and fill the parts that will be empty due to this operation with Nones.

See the method on Series for more info on the shift operation.
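
To make the shift semantics concrete, here is a minimal sketch in plain Rust that models one nullable column as a slice of `Option` values. It is not the polars implementation, and `shift_model` is a hypothetical helper name. It assumes that a positive period moves values towards higher row indices and a negative period towards lower ones.

```rust
/// Plain-Rust model of shifting one nullable column by `periods` rows.
/// `shift_model` is a hypothetical name, not part of polars.
fn shift_model<T: Clone>(values: &[Option<T>], periods: i64) -> Vec<Option<T>> {
    let len = values.len();
    // Clamp the shift so that an over-long shift yields an all-None column.
    let k = (periods.unsigned_abs() as usize).min(len);
    let mut out = Vec::with_capacity(len);
    if periods >= 0 {
        // Positive shift: the first `k` slots become None, the last `k` values fall off.
        out.extend(std::iter::repeat(None).take(k));
        out.extend_from_slice(&values[..len - k]);
    } else {
        // Negative shift: the first `k` values fall off, the last `k` slots become None.
        out.extend_from_slice(&values[k..]);
        out.extend(std::iter::repeat(None).take(k));
    }
    out
}
```

The output always has the same length as the input, which is why the vacated slots must be filled with `None`.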

Source

Replace None values with one of the following strategies:

Forward fill (replace None with the previous value)
Backward fill (replace None with the next value)
Mean fill (replace None with the mean of the whole array)
Min fill (replace None with the minimum of the whole array)
Max fill (replace None with the maximum of the whole array)

See the method on Series for more info on the fill_null operation.
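
The strategies listed above can be sketched on a single column in plain Rust. This is a model of the described behaviour only, not the polars implementation. `FillModel` and `fill_null_model` are hypothetical names, only three of the strategies are covered, and the mean is assumed to be taken over the non-null values.

```rust
/// Hypothetical enum covering three of the strategies listed above.
enum FillModel {
    Forward,
    Backward,
    Mean,
}

/// Plain-Rust model of filling the `None` entries of one `f64` column.
fn fill_null_model(values: &[Option<f64>], strategy: FillModel) -> Vec<Option<f64>> {
    match strategy {
        FillModel::Forward => {
            // Carry the most recent non-null value forward.
            let mut last = None;
            values
                .iter()
                .map(|v| {
                    if v.is_some() {
                        last = *v;
                    }
                    last
                })
                .collect()
        }
        FillModel::Backward => {
            // Same idea, walking from the end towards the start.
            let mut next = None;
            let mut out: Vec<Option<f64>> = values
                .iter()
                .rev()
                .map(|v| {
                    if v.is_some() {
                        next = *v;
                    }
                    next
                })
                .collect();
            out.reverse();
            out
        }
        FillModel::Mean => {
            // Mean of the non-null values (an assumption of this sketch).
            let present: Vec<f64> = values.iter().flatten().copied().collect();
            if present.is_empty() {
                return values.to_vec();
            }
            let mean = present.iter().sum::<f64>() / present.len() as f64;
            values.iter().map(|v| v.or(Some(mean))).collect()
        }
    }
}
```

In this model a leading `None` survives a forward fill and a trailing `None` survives a backward fill, because there is no neighbouring value to copy from.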

Source

Pipe different functions/closure operations that work on a DataFrame together.

Source

Pipe different functions/closure operations that work on a DataFrame together.

Source

Pipe different functions/closure operations that work on a DataFrame together.
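
The piping pattern itself can be sketched in plain Rust. The trait `PipeModel`, its method `pipe_model`, and the stage functions are all hypothetical, and a `Vec<i32>` stands in for the DataFrame. The signatures of the real methods are not shown in this section, so this sketch does not reproduce them (for example, it ignores error handling).

```rust
/// Hypothetical extension trait modelling the piping pattern on any value.
trait PipeModel: Sized {
    /// Pass `self` by value to `f` and return whatever `f` produces.
    fn pipe_model<F, B>(self, f: F) -> B
    where
        F: FnOnce(Self) -> B,
    {
        f(self)
    }
}

// Blanket implementation: every sized type can be piped.
impl<T> PipeModel for T {}

/// Example stage: drop negative values from the stand-in "frame".
fn drop_negatives(rows: Vec<i32>) -> Vec<i32> {
    rows.into_iter().filter(|v| *v >= 0).collect()
}

/// Example stage: double every value.
fn double(rows: Vec<i32>) -> Vec<i32> {
    rows.into_iter().map(|v| v * 2).collect()
}
```

With this in place, `vec![-1, 2, 3].pipe_model(drop_negatives).pipe_model(double)` reads left to right in the order the stages run, instead of as nested calls `double(drop_negatives(...))`.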

Source

Drop duplicate rows from a DataFrame. This fails when there is a column of type List in the DataFrame.

Stable means that the order is maintained. This has a higher cost than an unstable distinct.

§Example

let df = df! {
              "flt" => [1., 1., 2., 2., 3., 3.],
              "int" => [1, 1, 2, 2, 3, 3, ],
              "str" => ["a", "a", "b", "b", "c", "c"]
          }?;

println!("{}", df.unique_stable(None, UniqueKeepStrategy::First, None)?);

Returns

+-----+-----+-----+
| flt | int | str |
| --- | --- | --- |
| f64 | i32 | str |
+=====+=====+=====+
| 1   | 1   | "a" |
+-----+-----+-----+
| 2   | 2   | "b" |
+-----+-----+-----+
| 3   | 3   | "c" |
+-----+-----+-----+
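
What "stable" means here can be sketched in plain Rust, with rows modelled as hashable tuples. This is a model of the keep-first behaviour only, not the polars implementation, and `unique_stable_first_model` is a hypothetical name.

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Keep the first occurrence of every row and preserve the input order.
/// Hypothetical helper; rows are modelled as any hashable value.
fn unique_stable_first_model<R: Eq + Hash + Clone>(rows: &[R]) -> Vec<R> {
    let mut seen = HashSet::new();
    rows.iter()
        // `insert` returns false when the row was already seen, so repeats are dropped.
        .filter(|r| seen.insert((*r).clone()))
        .cloned()
        .collect()
}
```

The output order follows the first appearance of each row, so `[(2, "b"), (1, "a"), (2, "b")]` becomes `[(2, "b"), (1, "a")]`.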

Source

Get a mask of all the unique rows in the DataFrame.

§Example

let df: DataFrame = df!("Company" => ["Apple", "Microsoft"],
                        "ISIN" => ["US0378331005", "US5949181045"])?;
let ca: ChunkedArray<BooleanType> = df.is_unique()?;

assert!(ca.all());

Source

Get a mask of all the duplicated rows in the DataFrame.

§Example

let df: DataFrame = df!("Company" => ["Alphabet", "Alphabet"],
                        "ISIN" => ["US02079K3059", "US02079K1079"])?;
let ca: ChunkedArray<BooleanType> = df.is_duplicated()?;

assert!(!ca.all());
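
The two masks are complementary, which a small plain-Rust model makes visible. The helpers below are hypothetical and do not use polars. They assume a row counts as duplicated when it occurs more than once, so every occurrence is flagged, including the first.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Count how often each row occurs.
fn occurrence_counts<R: Eq + Hash>(rows: &[R]) -> HashMap<&R, usize> {
    let mut counts = HashMap::new();
    for row in rows {
        *counts.entry(row).or_insert(0) += 1;
    }
    counts
}

/// `true` for rows that occur exactly once.
fn is_unique_model<R: Eq + Hash>(rows: &[R]) -> Vec<bool> {
    let counts = occurrence_counts(rows);
    rows.iter().map(|r| counts[r] == 1).collect()
}

/// `true` for rows that occur more than once (all occurrences are flagged).
fn is_duplicated_model<R: Eq + Hash>(rows: &[R]) -> Vec<bool> {
    let counts = occurrence_counts(rows);
    rows.iter().map(|r| counts[r] > 1).collect()
}
```

Rows are compared across all columns, so the two "Alphabet" rows in the example above are not duplicates of each other: their ISIN values differ.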

Source

Create a new DataFrame that shows the null counts per column.

Source

Get the supertype of the columns in this DataFrame

Source

Split into multiple DataFrames partitioned by groups

Source

Split into multiple DataFrames partitioned by groups. The order of the groups is maintained.
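
A minimal plain-Rust sketch of order-preserving partitioning, with rows modelled as tuples and the group key extracted by a closure. The name `partition_by_stable_model` is hypothetical. The sketch assumes that "maintained" means groups are emitted in order of first appearance, and it is not the polars implementation.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Split `rows` into groups by `key`, emitting groups in first-appearance order.
fn partition_by_stable_model<K, R, F>(rows: &[R], key: F) -> Vec<Vec<R>>
where
    K: Eq + Hash,
    R: Clone,
    F: Fn(&R) -> K,
{
    // Maps each key to the position of its group in `groups`.
    let mut index: HashMap<K, usize> = HashMap::new();
    let mut groups: Vec<Vec<R>> = Vec::new();
    for row in rows {
        let slot = *index.entry(key(row)).or_insert_with(|| {
            // First time this key is seen: open a new group at the end.
            groups.push(Vec::new());
            groups.len() - 1
        });
        groups[slot].push(row.clone());
    }
    groups
}
```

Within each group the rows also keep their original relative order, since they are appended as they are encountered.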

Source

Unnest the given Struct columns. This means that the fields of the Struct type will be inserted as columns.

Check if DataFrames' schemas are equal.

Source

Check if DataFrames are equal. Note that None == None evaluates to false, so two frames that contain a None in the same position are not considered equal.

§Example

let df1: DataFrame = df!("Atomic number" => &[1, 51, 300],
                        "Element" => &[Some("Hydrogen"), Some("Antimony"), None])?;
let df2: DataFrame = df!("Atomic number" => &[1, 51, 300],
                        "Element" => &[Some("Hydrogen"), Some("Antimony"), None])?;

assert!(!df1.equals(&df2));

Source

Check if all values in DataFrames are equal where None == None evaluates to true.

§Example

let df1: DataFrame = df!("Atomic number" => &[1, 51, 300],
                        "Element" => &[Some("Hydrogen"), Some("Antimony"), None])?;
let df2: DataFrame = df!("Atomic number" => &[1, 51, 300],
                        "Element" => &[Some("Hydrogen"), Some("Antimony"), None])?;

assert!(df1.equals_missing(&df2));
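
The difference between the two comparisons comes down to how a pair of missing values is treated. The sketch below models it for a single column in plain Rust. `column_equals_model` and its `nulls_equal` flag are hypothetical and are not part of polars.

```rust
/// Compare two nullable columns element by element.
/// `nulls_equal = false` models `equals`, `nulls_equal = true` models `equals_missing`.
fn column_equals_model<T: PartialEq>(
    a: &[Option<T>],
    b: &[Option<T>],
    nulls_equal: bool,
) -> bool {
    a.len() == b.len()
        && a.iter().zip(b).all(|(x, y)| match (x, y) {
            (Some(x), Some(y)) => x == y,
            // The only case where the two comparisons disagree.
            (None, None) => nulls_equal,
            // A value never equals a missing value.
            _ => false,
        })
}
```

For columns without any `None`, both modes give the same answer.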
