DataFrame — pandas 2.3.3 documentation (original) (raw)
Constructor#
| DataFrame([data, index, columns, dtype, copy]) | Two-dimensional, size-mutable, potentially heterogeneous tabular data. |
|---|
Attributes and underlying data#
Axes
| DataFrame.dtypes | Return the dtypes in the DataFrame. |
|---|---|
| DataFrame.info([verbose, buf, max_cols, ...]) | Print a concise summary of a DataFrame. |
| DataFrame.select_dtypes([include, exclude]) | Return a subset of the DataFrame's columns based on the column dtypes. |
| DataFrame.values | Return a Numpy representation of the DataFrame. |
| DataFrame.axes | Return a list representing the axes of the DataFrame. |
| DataFrame.ndim | Return an int representing the number of axes / array dimensions. |
| DataFrame.size | Return an int representing the number of elements in this object. |
| DataFrame.shape | Return a tuple representing the dimensionality of the DataFrame. |
| DataFrame.memory_usage([index, deep]) | Return the memory usage of each column in bytes. |
| DataFrame.empty | Indicator whether Series/DataFrame is empty. |
| DataFrame.set_flags(*[, copy, ...]) | Return a new object with updated flags. |
Conversion#
| DataFrame.astype(dtype[, copy, errors]) | Cast a pandas object to a specified dtype dtype. |
|---|---|
| DataFrame.convert_dtypes([infer_objects, ...]) | Convert columns to the best possible dtypes using dtypes supporting pd.NA. |
| DataFrame.infer_objects([copy]) | Attempt to infer better dtypes for object columns. |
| DataFrame.copy([deep]) | Make a copy of this object's indices and data. |
| DataFrame.bool() | (DEPRECATED) Return the bool of a single element Series or DataFrame. |
| DataFrame.to_numpy([dtype, copy, na_value]) | Convert the DataFrame to a NumPy array. |
Indexing, iteration#
| DataFrame.head([n]) | Return the first n rows. |
|---|---|
| DataFrame.at | Access a single value for a row/column label pair. |
| DataFrame.iat | Access a single value for a row/column pair by integer position. |
| DataFrame.loc | Access a group of rows and columns by label(s) or a boolean array. |
| DataFrame.iloc | (DEPRECATED) Purely integer-location based indexing for selection by position. |
| DataFrame.insert(loc, column, value[, ...]) | Insert column into DataFrame at specified location. |
| DataFrame.__iter__() | Iterate over info axis. |
| DataFrame.items() | Iterate over (column name, Series) pairs. |
| DataFrame.keys() | Get the 'info axis' (see Indexing for more). |
| DataFrame.iterrows() | Iterate over DataFrame rows as (index, Series) pairs. |
| DataFrame.itertuples([index, name]) | Iterate over DataFrame rows as namedtuples. |
| DataFrame.pop(item) | Return item and drop from frame. |
| DataFrame.tail([n]) | Return the last n rows. |
| DataFrame.xs(key[, axis, level, drop_level]) | Return cross-section from the Series/DataFrame. |
| DataFrame.get(key[, default]) | Get item from object for given key (ex: DataFrame column). |
| DataFrame.isin(values) | Whether each element in the DataFrame is contained in values. |
| DataFrame.where(cond[, other, inplace, ...]) | Replace values where the condition is False. |
| DataFrame.mask(cond[, other, inplace, axis, ...]) | Replace values where the condition is True. |
| DataFrame.query(expr, *[, inplace]) | Query the columns of a DataFrame with a boolean expression. |
For more information on .at, .iat, .loc, and.iloc, see the indexing documentation.
Binary operator functions#
| DataFrame.__add__(other) | Get Addition of DataFrame and other, column-wise. |
|---|---|
| DataFrame.add(other[, axis, level, fill_value]) | Get Addition of dataframe and other, element-wise (binary operator add). |
| DataFrame.sub(other[, axis, level, fill_value]) | Get Subtraction of dataframe and other, element-wise (binary operator sub). |
| DataFrame.mul(other[, axis, level, fill_value]) | Get Multiplication of dataframe and other, element-wise (binary operator mul). |
| DataFrame.div(other[, axis, level, fill_value]) | Get Floating division of dataframe and other, element-wise (binary operator truediv). |
| DataFrame.truediv(other[, axis, level, ...]) | Get Floating division of dataframe and other, element-wise (binary operator truediv). |
| DataFrame.floordiv(other[, axis, level, ...]) | Get Integer division of dataframe and other, element-wise (binary operator floordiv). |
| DataFrame.mod(other[, axis, level, fill_value]) | Get Modulo of dataframe and other, element-wise (binary operator mod). |
| DataFrame.pow(other[, axis, level, fill_value]) | Get Exponential power of dataframe and other, element-wise (binary operator pow). |
| DataFrame.dot(other) | Compute the matrix multiplication between the DataFrame and other. |
| DataFrame.radd(other[, axis, level, fill_value]) | Get Addition of dataframe and other, element-wise (binary operator radd). |
| DataFrame.rsub(other[, axis, level, fill_value]) | Get Subtraction of dataframe and other, element-wise (binary operator rsub). |
| DataFrame.rmul(other[, axis, level, fill_value]) | Get Multiplication of dataframe and other, element-wise (binary operator rmul). |
| DataFrame.rdiv(other[, axis, level, fill_value]) | Get Floating division of dataframe and other, element-wise (binary operator rtruediv). |
| DataFrame.rtruediv(other[, axis, level, ...]) | Get Floating division of dataframe and other, element-wise (binary operator rtruediv). |
| DataFrame.rfloordiv(other[, axis, level, ...]) | Get Integer division of dataframe and other, element-wise (binary operator rfloordiv). |
| DataFrame.rmod(other[, axis, level, fill_value]) | Get Modulo of dataframe and other, element-wise (binary operator rmod). |
| DataFrame.rpow(other[, axis, level, fill_value]) | Get Exponential power of dataframe and other, element-wise (binary operator rpow). |
| DataFrame.lt(other[, axis, level]) | Get Less than of dataframe and other, element-wise (binary operator lt). |
| DataFrame.gt(other[, axis, level]) | Get Greater than of dataframe and other, element-wise (binary operator gt). |
| DataFrame.le(other[, axis, level]) | Get Less than or equal to of dataframe and other, element-wise (binary operator le). |
| DataFrame.ge(other[, axis, level]) | Get Greater than or equal to of dataframe and other, element-wise (binary operator ge). |
| DataFrame.ne(other[, axis, level]) | Get Not equal to of dataframe and other, element-wise (binary operator ne). |
| DataFrame.eq(other[, axis, level]) | Get Equal to of dataframe and other, element-wise (binary operator eq). |
| DataFrame.combine(other, func[, fill_value, ...]) | Perform column-wise combine with another DataFrame. |
| DataFrame.combine_first(other) | Update null elements with value in the same location in other. |
Function application, GroupBy & window#
| DataFrame.apply(func[, axis, raw, ...]) | Apply a function along an axis of the DataFrame. |
|---|---|
| DataFrame.map(func[, na_action]) | Apply a function to a Dataframe elementwise. |
| DataFrame.applymap(func[, na_action]) | (DEPRECATED) Apply a function to a Dataframe elementwise. |
| DataFrame.pipe(func, *args, **kwargs) | Apply chainable functions that expect Series or DataFrames. |
| DataFrame.agg([func, axis]) | Aggregate using one or more operations over the specified axis. |
| DataFrame.aggregate([func, axis]) | Aggregate using one or more operations over the specified axis. |
| DataFrame.transform(func[, axis]) | Call func on self producing a DataFrame with the same axis shape as self. |
| DataFrame.groupby([by, axis, level, ...]) | Group DataFrame using a mapper or by a Series of columns. |
| DataFrame.rolling(window[, min_periods, ...]) | Provide rolling window calculations. |
| DataFrame.expanding([min_periods, axis, method]) | Provide expanding window calculations. |
| DataFrame.ewm([com, span, halflife, alpha, ...]) | Provide exponentially weighted (EW) calculations. |
Computations / descriptive stats#
| DataFrame.abs() | Return a Series/DataFrame with absolute numeric value of each element. |
|---|---|
| DataFrame.all([axis, bool_only, skipna]) | Return whether all elements are True, potentially over an axis. |
| DataFrame.any(*[, axis, bool_only, skipna]) | Return whether any element is True, potentially over an axis. |
| DataFrame.clip([lower, upper, axis, inplace]) | Trim values at input threshold(s). |
| DataFrame.corr([method, min_periods, ...]) | Compute pairwise correlation of columns, excluding NA/null values. |
| DataFrame.corrwith(other[, axis, drop, ...]) | Compute pairwise correlation. |
| DataFrame.count([axis, numeric_only]) | Count non-NA cells for each column or row. |
| DataFrame.cov([min_periods, ddof, numeric_only]) | Compute pairwise covariance of columns, excluding NA/null values. |
| DataFrame.cummax([axis, skipna]) | Return cumulative maximum over a DataFrame or Series axis. |
| DataFrame.cummin([axis, skipna]) | Return cumulative minimum over a DataFrame or Series axis. |
| DataFrame.cumprod([axis, skipna]) | Return cumulative product over a DataFrame or Series axis. |
| DataFrame.cumsum([axis, skipna]) | Return cumulative sum over a DataFrame or Series axis. |
| DataFrame.describe([percentiles, include, ...]) | Generate descriptive statistics. |
| DataFrame.diff([periods, axis]) | First discrete difference of element. |
| DataFrame.eval(expr, *[, inplace]) | Evaluate a string describing operations on DataFrame columns. |
| DataFrame.kurt([axis, skipna, numeric_only]) | Return unbiased kurtosis over requested axis. |
| DataFrame.kurtosis([axis, skipna, numeric_only]) | Return unbiased kurtosis over requested axis. |
| DataFrame.max([axis, skipna, numeric_only]) | Return the maximum of the values over the requested axis. |
| DataFrame.mean([axis, skipna, numeric_only]) | Return the mean of the values over the requested axis. |
| DataFrame.median([axis, skipna, numeric_only]) | Return the median of the values over the requested axis. |
| DataFrame.min([axis, skipna, numeric_only]) | Return the minimum of the values over the requested axis. |
| DataFrame.mode([axis, numeric_only, dropna]) | Get the mode(s) of each element along the selected axis. |
| DataFrame.pct_change([periods, fill_method, ...]) | Fractional change between the current and a prior element. |
| DataFrame.prod([axis, skipna, numeric_only, ...]) | Return the product of the values over the requested axis. |
| DataFrame.product([axis, skipna, ...]) | Return the product of the values over the requested axis. |
| DataFrame.quantile([q, axis, numeric_only, ...]) | Return values at the given quantile over requested axis. |
| DataFrame.rank([axis, method, numeric_only, ...]) | Compute numerical data ranks (1 through n) along axis. |
| DataFrame.round([decimals]) | Round a DataFrame to a variable number of decimal places. |
| DataFrame.sem([axis, skipna, ddof, numeric_only]) | Return unbiased standard error of the mean over requested axis. |
| DataFrame.skew([axis, skipna, numeric_only]) | Return unbiased skew over requested axis. |
| DataFrame.sum([axis, skipna, numeric_only, ...]) | Return the sum of the values over the requested axis. |
| DataFrame.std([axis, skipna, ddof, numeric_only]) | Return sample standard deviation over requested axis. |
| DataFrame.var([axis, skipna, ddof, numeric_only]) | Return unbiased variance over requested axis. |
| DataFrame.nunique([axis, dropna]) | Count number of distinct elements in specified axis. |
| DataFrame.value_counts([subset, normalize, ...]) | Return a Series containing the frequency of each distinct row in the Dataframe. |
Reindexing / selection / label manipulation#
| DataFrame.add_prefix(prefix[, axis]) | Prefix labels with string prefix. |
|---|---|
| DataFrame.add_suffix(suffix[, axis]) | Suffix labels with string suffix. |
| DataFrame.align(other[, join, axis, level, ...]) | Align two objects on their axes with the specified join method. |
| DataFrame.at_time(time[, asof, axis]) | Select values at particular time of day (e.g., 9:30AM). |
| DataFrame.between_time(start_time, end_time) | Select values between particular times of the day (e.g., 9:00-9:30 AM). |
| DataFrame.drop([labels, axis, index, ...]) | Drop specified labels from rows or columns. |
| DataFrame.drop_duplicates([subset, keep, ...]) | Return DataFrame with duplicate rows removed. |
| DataFrame.duplicated([subset, keep]) | Return boolean Series denoting duplicate rows. |
| DataFrame.equals(other) | Test whether two objects contain the same elements. |
| DataFrame.filter([items, like, regex, axis]) | Subset the dataframe rows or columns according to the specified index labels. |
| DataFrame.first(offset) | (DEPRECATED) Select initial periods of time series data based on a date offset. |
| DataFrame.head([n]) | Return the first n rows. |
| DataFrame.idxmax([axis, skipna, numeric_only]) | Return index of first occurrence of maximum over requested axis. |
| DataFrame.idxmin([axis, skipna, numeric_only]) | Return index of first occurrence of minimum over requested axis. |
| DataFrame.last(offset) | (DEPRECATED) Select final periods of time series data based on a date offset. |
| DataFrame.reindex([labels, index, columns, ...]) | Conform DataFrame to new index with optional filling logic. |
| DataFrame.reindex_like(other[, method, ...]) | Return an object with matching indices as other object. |
| DataFrame.rename([mapper, index, columns, ...]) | Rename columns or index labels. |
| DataFrame.rename_axis([mapper, index, ...]) | Set the name of the axis for the index or columns. |
| DataFrame.reset_index([level, drop, ...]) | Reset the index, or a level of it. |
| DataFrame.sample([n, frac, replace, ...]) | Return a random sample of items from an axis of object. |
| DataFrame.set_axis(labels, *[, axis, copy]) | Assign desired index to given axis. |
| DataFrame.set_index(keys, *[, drop, append, ...]) | Set the DataFrame index using existing columns. |
| DataFrame.tail([n]) | Return the last n rows. |
| DataFrame.take(indices[, axis]) | Return the elements in the given positional indices along an axis. |
| DataFrame.truncate([before, after, axis, copy]) | Truncate a Series or DataFrame before and after some index value. |
Missing data handling#
| DataFrame.backfill(*[, axis, inplace, ...]) | (DEPRECATED) Fill NA/NaN values by using the next valid observation to fill the gap. |
|---|---|
| DataFrame.bfill(*[, axis, inplace, limit, ...]) | Fill NA/NaN values by using the next valid observation to fill the gap. |
| DataFrame.dropna(*[, axis, how, thresh, ...]) | Remove missing values. |
| DataFrame.ffill(*[, axis, inplace, limit, ...]) | Fill NA/NaN values by propagating the last valid observation to next valid. |
| DataFrame.fillna([value, method, axis, ...]) | Fill NA/NaN values using the specified method. |
| DataFrame.interpolate([method, axis, limit, ...]) | Fill NaN values using an interpolation method. |
| DataFrame.isna() | Detect missing values. |
| DataFrame.isnull() | DataFrame.isnull is an alias for DataFrame.isna. |
| DataFrame.notna() | Detect existing (non-missing) values. |
| DataFrame.notnull() | DataFrame.notnull is an alias for DataFrame.notna. |
| DataFrame.pad(*[, axis, inplace, limit, ...]) | (DEPRECATED) Fill NA/NaN values by propagating the last valid observation to next valid. |
| DataFrame.replace([to_replace, value, ...]) | Replace values given in to_replace with value. |
Reshaping, sorting, transposing#
| DataFrame.droplevel(level[, axis]) | Return Series/DataFrame with requested index / column level(s) removed. |
|---|---|
| DataFrame.pivot(*, columns[, index, values]) | Return reshaped DataFrame organized by given index / column values. |
| DataFrame.pivot_table([values, index, ...]) | Create a spreadsheet-style pivot table as a DataFrame. |
| DataFrame.reorder_levels(order[, axis]) | Rearrange index levels using input order. |
| DataFrame.sort_values(by, *[, axis, ...]) | Sort by the values along either axis. |
| DataFrame.sort_index(*[, axis, level, ...]) | Sort object by labels (along an axis). |
| DataFrame.nlargest(n, columns[, keep]) | Return the first n rows ordered by columns in descending order. |
| DataFrame.nsmallest(n, columns[, keep]) | Return the first n rows ordered by columns in ascending order. |
| DataFrame.swaplevel([i, j, axis]) | Swap levels i and j in a MultiIndex. |
| DataFrame.stack([level, dropna, sort, ...]) | Stack the prescribed level(s) from columns to index. |
| DataFrame.unstack([level, fill_value, sort]) | Pivot a level of the (necessarily hierarchical) index labels. |
| DataFrame.swapaxes(axis1, axis2[, copy]) | (DEPRECATED) Interchange axes and swap values axes appropriately. |
| DataFrame.melt([id_vars, value_vars, ...]) | Unpivot a DataFrame from wide to long format, optionally leaving identifiers set. |
| DataFrame.explode(column[, ignore_index]) | Transform each element of a list-like to a row, replicating index values. |
| DataFrame.squeeze([axis]) | Squeeze 1 dimensional axis objects into scalars. |
| DataFrame.to_xarray() | Return an xarray object from the pandas object. |
| DataFrame.T | The transpose of the DataFrame. |
| DataFrame.transpose(*args[, copy]) | Transpose index and columns. |
Combining / comparing / joining / merging#
| DataFrame.assign(**kwargs) | Assign new columns to a DataFrame. |
|---|---|
| DataFrame.compare(other[, align_axis, ...]) | Compare to another DataFrame and show the differences. |
| DataFrame.join(other[, on, how, lsuffix, ...]) | Join columns of another DataFrame. |
| DataFrame.merge(right[, how, on, left_on, ...]) | Merge DataFrame or named Series objects with a database-style join. |
| DataFrame.update(other[, join, overwrite, ...]) | Modify in place using non-NA values from another DataFrame. |
Flags#
Flags refer to attributes of the pandas object. Properties of the dataset (like the date is was recorded, the URL it was accessed from, etc.) should be stored in DataFrame.attrs.
| Flags(obj, *, allows_duplicate_labels) | Flags that apply to pandas objects. |
|---|
Metadata#
DataFrame.attrs is a dictionary for storing global metadata for this DataFrame.
Warning
DataFrame.attrs is considered experimental and may change without warning.
Plotting#
DataFrame.plot is both a callable method and a namespace attribute for specific plotting methods of the form DataFrame.plot.<kind>.
| DataFrame.plot([x, y, kind, ax, ....]) | DataFrame plotting accessor and method |
|---|
| DataFrame.plot.area([x, y, stacked]) | Draw a stacked area plot. |
|---|---|
| DataFrame.plot.bar([x, y]) | Vertical bar plot. |
| DataFrame.plot.barh([x, y]) | Make a horizontal bar plot. |
| DataFrame.plot.box([by]) | Make a box plot of the DataFrame columns. |
| DataFrame.plot.density([bw_method, ind]) | Generate Kernel Density Estimate plot using Gaussian kernels. |
| DataFrame.plot.hexbin(x, y[, C, ...]) | Generate a hexagonal binning plot. |
| DataFrame.plot.hist([by, bins]) | Draw one histogram of the DataFrame's columns. |
| DataFrame.plot.kde([bw_method, ind]) | Generate Kernel Density Estimate plot using Gaussian kernels. |
| DataFrame.plot.line([x, y]) | Plot Series or DataFrame as lines. |
| DataFrame.plot.pie(**kwargs) | Generate a pie plot. |
| DataFrame.plot.scatter(x, y[, s, c]) | Create a scatter plot with varying marker point size and color. |
| DataFrame.boxplot([column, by, ax, ...]) | Make a box plot from DataFrame columns. |
|---|---|
| DataFrame.hist([column, by, grid, ...]) | Make a histogram of the DataFrame's columns. |
Sparse accessor#
Sparse-dtype specific methods and attributes are provided under theDataFrame.sparse accessor.
Serialization / IO / conversion#
| DataFrame.from_dict(data[, orient, dtype, ...]) | Construct DataFrame from dict of array-like or dicts. |
|---|---|
| DataFrame.from_records(data[, index, ...]) | Convert structured or record ndarray to DataFrame. |
| DataFrame.to_orc([path, engine, index, ...]) | Write a DataFrame to the ORC format. |
| DataFrame.to_parquet([path, engine, ...]) | Write a DataFrame to the binary parquet format. |
| DataFrame.to_pickle(path, *[, compression, ...]) | Pickle (serialize) object to file. |
| DataFrame.to_csv([path_or_buf, sep, na_rep, ...]) | Write object to a comma-separated values (csv) file. |
| DataFrame.to_hdf(path_or_buf, *, key[, ...]) | Write the contained data to an HDF5 file using HDFStore. |
| DataFrame.to_sql(name, con, *[, schema, ...]) | Write records stored in a DataFrame to a SQL database. |
| DataFrame.to_dict([orient, into, index]) | Convert the DataFrame to a dictionary. |
| DataFrame.to_excel(excel_writer, *[, ...]) | Write object to an Excel sheet. |
| DataFrame.to_json([path_or_buf, orient, ...]) | Convert the object to a JSON string. |
| DataFrame.to_html([buf, columns, col_space, ...]) | Render a DataFrame as an HTML table. |
| DataFrame.to_feather(path, **kwargs) | Write a DataFrame to the binary Feather format. |
| DataFrame.to_latex([buf, columns, header, ...]) | Render object to a LaTeX tabular, longtable, or nested table. |
| DataFrame.to_stata(path, *[, convert_dates, ...]) | Export DataFrame object to Stata dta format. |
| DataFrame.to_gbq(destination_table, *[, ...]) | (DEPRECATED) Write a DataFrame to a Google BigQuery table. |
| DataFrame.to_records([index, column_dtypes, ...]) | Convert DataFrame to a NumPy record array. |
| DataFrame.to_string([buf, columns, ...]) | Render a DataFrame to a console-friendly tabular output. |
| DataFrame.to_clipboard(*[, excel, sep]) | Copy object to the system clipboard. |
| DataFrame.to_markdown([buf, mode, index, ...]) | Print DataFrame in Markdown-friendly format. |
| DataFrame.style | Returns a Styler object. |
| DataFrame.__dataframe__([nan_as_null, ...]) | Return the dataframe interchange object implementing the interchange protocol. |