pandas.DataFrame.update — pandas 2.3.0 documentation

DataFrame.update(other, join='left', overwrite=True, filter_func=None, errors='ignore')

Modify in place using non-NA values from another DataFrame.

Aligns on indices. There is no return value.

Parameters:

other : DataFrame, or object coercible into a DataFrame

Should have at least one matching index/column label with the original DataFrame. If a Series is passed, its name attribute must be set, and that will be used as the column name to align with the original DataFrame.

join : {'left'}, default 'left'

Only left join is implemented, keeping the index and columns of the original object.

overwrite : bool, default True

How to handle non-NA values for overlapping keys:

True: overwrite original DataFrame’s values with values from other.
False: only update values that are NA in the original DataFrame.
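The two settings can be contrasted with a minimal sketch. The frames and values below are made up for illustration; with `overwrite=False`, only the cells that are NA in the original are filled from `other`.

```python
import numpy as np
import pandas as pd

# Illustrative data: each column of `df` has exactly one missing value.
df = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": [np.nan, 500.0, 600.0]})
other = pd.DataFrame({"A": [10.0, 20.0, 30.0], "B": [40.0, 50.0, 60.0]})

# Only NA cells in `df` are filled; existing non-NA values are kept.
df.update(other, overwrite=False)
print(df)
```

Here column A ends up as 1.0, 20.0, 3.0 and column B as 40.0, 500.0, 600.0. With the default `overwrite=True`, every value would instead be taken from `other`.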

filter_func : callable(1d-array) -> bool 1d-array, optional

Can choose to replace values other than NA. Return True for values that should be updated.
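A small sketch of `filter_func` in use, with made-up data. It assumes, based on the pandas implementation and test suite rather than on the text above, that the callable receives the values of the calling DataFrame's column (not those of `other`), one column at a time.

```python
import pandas as pd

df = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [400.0, 500.0, 600.0]})
other = pd.DataFrame({"B": [4.0, 5.0, 6.0]})

# Replace only those original values that are greater than 450.
# Assumption: `values` is the 1d array of the original column's values.
df.update(other, filter_func=lambda values: values > 450)
print(df)
```

Under that assumption, 400.0 stays in place because the filter returns False for it, while 500.0 and 600.0 are replaced by 5.0 and 6.0.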

errors : {'raise', 'ignore'}, default 'ignore'

If 'raise', will raise a ValueError if the DataFrame and other both contain non-NA data in the same place.

Returns:

None

This method directly changes the calling object.
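Because the method returns None, assigning its result back (for example `df = df.update(other)`) would discard the DataFrame. One possible pattern for keeping the original untouched, sketched here with illustrative data, is to update a copy:

```python
import pandas as pd

df = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [400.0, 500.0, 600.0]})
other = pd.DataFrame({"B": [4.0, 5.0, 6.0]})

# Update a copy so that `df` itself is left unchanged.
updated = df.copy()
result = updated.update(other)  # the return value is None, not a DataFrame

print(result is None)
print(df["B"].tolist())
print(updated["B"].tolist())
```

The copy is only a convenience for this sketch; whether it is worth the extra memory depends on the use case.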

Raises:

ValueError

When errors='raise' and there's overlapping non-NA data.
When errors is neither 'ignore' nor 'raise'.

NotImplementedError

If join != 'left'.
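The conditions listed above can be exercised with a short sketch. The helper `try_update` is hypothetical (it is not part of pandas) and the data is illustrative; it runs the update on a copy and reports the name of the exception raised, if any.

```python
import numpy as np
import pandas as pd


def try_update(df, other, **kwargs):
    """Hypothetical helper: update a copy and return the exception name, or None."""
    target = df.copy()
    try:
        target.update(other, **kwargs)
    except (ValueError, NotImplementedError) as exc:
        return type(exc).__name__
    return None


df = pd.DataFrame({"B": [np.nan, 500.0]})
overlapping = pd.DataFrame({"B": [4.0, 5.0]})  # row 1 is non-NA in both frames
disjoint = pd.DataFrame({"B": [4.0, np.nan]})  # no cell is non-NA in both frames

overlap_result = try_update(df, overlapping, errors="raise")
disjoint_result = try_update(df, disjoint, errors="raise")
bad_errors_result = try_update(df, overlapping, errors="warn")
bad_join_result = try_update(df, overlapping, join="inner")

print(overlap_result, disjoint_result, bad_errors_result, bad_join_result)
```

With errors='raise', the overlapping frame triggers a ValueError while the disjoint one updates cleanly. An unsupported errors value also gives a ValueError, and any join other than 'left' gives a NotImplementedError.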

Examples

>>> df = pd.DataFrame({'A': [1, 2, 3],
...                    'B': [400, 500, 600]})
>>> new_df = pd.DataFrame({'B': [4, 5, 6],
...                        'C': [7, 8, 9]})
>>> df.update(new_df)
>>> df
   A  B
0  1  4
1  2  5
2  3  6

The DataFrame's length does not increase as a result of the update; only values at matching index/column labels are updated.

>>> df = pd.DataFrame({'A': ['a', 'b', 'c'],
...                    'B': ['x', 'y', 'z']})
>>> new_df = pd.DataFrame({'B': ['d', 'e', 'f', 'g', 'h', 'i']})
>>> df.update(new_df)
>>> df
   A  B
0  a  d
1  b  e
2  c  f

>>> df = pd.DataFrame({'A': ['a', 'b', 'c'],
...                    'B': ['x', 'y', 'z']})
>>> new_df = pd.DataFrame({'B': ['d', 'f']}, index=[0, 2])
>>> df.update(new_df)
>>> df
   A  B
0  a  d
1  b  y
2  c  f
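Since the update aligns on indices, values are matched by label rather than by position. The following sketch uses illustrative string labels, with `other` listing its rows in a different order from the original:

```python
import pandas as pd

df = pd.DataFrame({"B": [1.0, 2.0, 3.0]}, index=["x", "y", "z"])

# `other` lists its rows in a different order and omits label "y".
other = pd.DataFrame({"B": [30.0, 10.0]}, index=["z", "x"])

df.update(other)
print(df)
```

Row "x" receives 10.0 and row "z" receives 30.0, regardless of where they appear in `other`; row "y" has no match and keeps 2.0.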

If other is a Series, its name attribute must be set.

>>> df = pd.DataFrame({'A': ['a', 'b', 'c'],
...                    'B': ['x', 'y', 'z']})
>>> new_column = pd.Series(['d', 'e', 'f'], name='B')
>>> df.update(new_column)
>>> df
   A  B
0  a  d
1  b  e
2  c  f

If other contains NaNs, the corresponding values are not updated in the original DataFrame.

>>> df = pd.DataFrame({'A': [1, 2, 3],
...                    'B': [400., 500., 600.]})
>>> new_df = pd.DataFrame({'B': [4, np.nan, 6]})
>>> df.update(new_df)
>>> df
   A      B
0  1    4.0
1  2  500.0
2  3    6.0