pandas arrays, scalars, and data types (pandas 2.2.3 documentation)

Objects#

For most data types, pandas uses NumPy arrays as the concrete objects contained within an Index, Series, or DataFrame.

For some data types, pandas extends NumPy’s type system. String aliases for these types can be found at dtypes.

Kind of Data | pandas Data Type | Scalar | Array
TZ-aware datetime | DatetimeTZDtype | Timestamp | Datetimes
Timedeltas | (none) | Timedelta | Timedeltas
Period (time spans) | PeriodDtype | Period | Periods
Intervals | IntervalDtype | Interval | Intervals
Nullable Integer | Int64Dtype, … | (none) | Nullable integer
Nullable Float | Float64Dtype, … | (none) | Nullable float
Categorical | CategoricalDtype | (none) | Categoricals
Sparse | SparseDtype | (none) | Sparse
Strings | StringDtype | str | Strings
Nullable Boolean | BooleanDtype | bool | Nullable Boolean
PyArrow | ArrowDtype | Python Scalars or NA | PyArrow

pandas and third-party libraries can extend NumPy’s type system (see Extension types). The top-level array() function can be used to create a new array, which may be stored in a Series, an Index, or as a column in a DataFrame.

array(data[, dtype, copy]) Create an array.
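
As a minimal sketch of how array() behaves (the input values and variable names are illustrative and not taken from this reference), the following creates two extension arrays and stores them in a Series and a DataFrame:

```python
import pandas as pd

# Illustrative data. With no dtype given, pd.array infers a nullable
# extension dtype where it can.
ints = pd.array([1, 2, None])
print(type(ints).__name__, ints.dtype)

# A dtype, given as an object or a string alias, overrides the inference.
labels = pd.array(["a", None, "c"], dtype="string")

# Either array can back a Series or a DataFrame column.
ser = pd.Series(ints)
df = pd.DataFrame({"label": labels})
print(ser.dtype, df["label"].dtype)
```

The missing entry in ints is stored as pd.NA, so the integer values are not cast to float.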

PyArrow#

Warning

This feature is experimental, and the API can change in a future release without warning.

The arrays.ArrowExtensionArray is backed by a pyarrow.ChunkedArray with a pyarrow.DataType instead of a NumPy array and data type. The .dtype of an arrays.ArrowExtensionArray is an ArrowDtype.

PyArrow provides array and data type support similar to NumPy's, including first-class nullability support for all data types, immutability, and more.

The table below shows the equivalent PyArrow-backed (pa), pandas extension, and NumPy (np) types that are recognized by pandas. PyArrow-backed types must be passed into ArrowDtype to be recognized by pandas, e.g. pd.ArrowDtype(pa.bool_()).

PyArrow type | pandas extension type | NumPy type
pyarrow.bool_() | BooleanDtype | np.bool_
pyarrow.int8() | Int8Dtype | np.int8
pyarrow.int16() | Int16Dtype | np.int16
pyarrow.int32() | Int32Dtype | np.int32
pyarrow.int64() | Int64Dtype | np.int64
pyarrow.uint8() | UInt8Dtype | np.uint8
pyarrow.uint16() | UInt16Dtype | np.uint16
pyarrow.uint32() | UInt32Dtype | np.uint32
pyarrow.uint64() | UInt64Dtype | np.uint64
pyarrow.float32() | Float32Dtype | np.float32
pyarrow.float64() | Float64Dtype | np.float64
pyarrow.time32() | (none) | (none)
pyarrow.time64() | (none) | (none)
pyarrow.timestamp() | DatetimeTZDtype | np.datetime64
pyarrow.date32() | (none) | (none)
pyarrow.date64() | (none) | (none)
pyarrow.duration() | (none) | np.timedelta64
pyarrow.binary() | (none) | (none)
pyarrow.string() | StringDtype | np.str_
pyarrow.decimal128() | (none) | (none)
pyarrow.list_() | (none) | (none)
pyarrow.map_() | (none) | (none)
pyarrow.dictionary() | CategoricalDtype | (none)

Note

PyArrow-backed string support is provided by both pd.StringDtype("pyarrow") and pd.ArrowDtype(pa.string()). pd.StringDtype("pyarrow") is described below in the string section and will be returned if the string alias "string[pyarrow]" is specified. pd.ArrowDtype(pa.string()) generally has better interoperability with ArrowDtype of different types.

While individual values in an arrays.ArrowExtensionArray are stored as PyArrow objects, scalars are returned as Python scalars corresponding to the data type, e.g. a PyArrow int64 will be returned as a Python int, or NA for missing values.

ArrowDtype(pyarrow_dtype) An ExtensionDtype for PyArrow data types.

For more information, please see the PyArrow user guide
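
The scalar behaviour described above can be sketched as follows. This assumes the optional pyarrow package is installed (the guard skips the example otherwise); the sample values and variable names are illustrative:

```python
import pandas as pd

try:
    import pyarrow as pa
except ImportError:  # pyarrow is an optional dependency of pandas
    pa = None

if pa is not None:
    # Wrap the PyArrow type in ArrowDtype so that pandas recognizes it.
    ser = pd.Series([1, 2, None], dtype=pd.ArrowDtype(pa.int64()))
    print(ser.dtype)

    # Values come back as Python scalars, or pd.NA when missing.
    first = ser[0]
    missing = ser[2]
    print(type(first).__name__, missing is pd.NA)
```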

Datetimes#

NumPy cannot natively represent timezone-aware datetimes. pandas supports this with the arrays.DatetimeArray extension array, which can hold timezone-naive or timezone-aware values.

Timestamp, a subclass of datetime.datetime, is pandas’ scalar type for timezone-naive or timezone-aware datetime data. NaT is the missing value for datetime data.

Timestamp([ts_input, year, month, day, ...]) Pandas replacement for python datetime.datetime object.
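
A short sketch of the points above, using illustrative dates: a Timestamp can be timezone-naive or timezone-aware, it is a datetime.datetime, and NaT marks a missing value.

```python
import datetime

import pandas as pd

# Illustrative values: one tz-naive and one tz-aware timestamp.
naive = pd.Timestamp("2024-03-10 12:00")
aware = pd.Timestamp("2024-03-10 12:00", tz="UTC")

print(isinstance(naive, datetime.datetime))  # Timestamp subclasses datetime
print(naive.tz, aware.tz)                    # None for the naive value

# NaT is the missing-value marker for datetime data.
print(pd.isna(pd.NaT))
```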

Properties#

Timestamp.asm8 Return numpy datetime64 format in nanoseconds.
Timestamp.day
Timestamp.dayofweek Return day of the week.
Timestamp.day_of_week Return day of the week.
Timestamp.dayofyear Return the day of the year.
Timestamp.day_of_year Return the day of the year.
Timestamp.days_in_month Return the number of days in the month.
Timestamp.daysinmonth Return the number of days in the month.
Timestamp.fold
Timestamp.hour
Timestamp.is_leap_year Return True if year is a leap year.
Timestamp.is_month_end Check if the date is the last day of the month.
Timestamp.is_month_start Check if the date is the first day of the month.
Timestamp.is_quarter_end Check if date is last day of the quarter.
Timestamp.is_quarter_start Check if the date is the first day of the quarter.
Timestamp.is_year_end Return True if date is last day of the year.
Timestamp.is_year_start Return True if date is first day of the year.
Timestamp.max
Timestamp.microsecond
Timestamp.min
Timestamp.minute
Timestamp.month
Timestamp.nanosecond
Timestamp.quarter Return the quarter of the year.
Timestamp.resolution
Timestamp.second
Timestamp.tz Alias for tzinfo.
Timestamp.tzinfo
Timestamp.unit The abbreviation associated with self._creso.
Timestamp.value
Timestamp.week Return the week number of the year.
Timestamp.weekofyear Return the week number of the year.
Timestamp.year

Methods#

Timestamp.as_unit(unit[, round_ok]) Convert the underlying int64 representation to the given unit.
Timestamp.astimezone(tz) Convert timezone-aware Timestamp to another time zone.
Timestamp.ceil(freq[, ambiguous, nonexistent]) Return a new Timestamp ceiled to this resolution.
Timestamp.combine(date, time) Combine date, time into datetime with same date and time fields.
Timestamp.ctime() Return ctime() style string.
Timestamp.date() Return date object with same year, month and day.
Timestamp.day_name([locale]) Return the day name of the Timestamp with specified locale.
Timestamp.dst() Return the daylight saving time (DST) adjustment.
Timestamp.floor(freq[, ambiguous, nonexistent]) Return a new Timestamp floored to this resolution.
Timestamp.fromordinal(ordinal[, tz]) Construct a timestamp from a proleptic Gregorian ordinal.
Timestamp.fromtimestamp(ts) Transform a POSIX timestamp (optionally with tz) to tz's local time.
Timestamp.isocalendar() Return a named tuple containing ISO year, week number, and weekday.
Timestamp.isoformat([sep, timespec]) Return the time formatted according to ISO 8601.
Timestamp.isoweekday() Return the day of the week represented by the date.
Timestamp.month_name([locale]) Return the month name of the Timestamp with specified locale.
Timestamp.normalize() Normalize Timestamp to midnight, preserving tz information.
Timestamp.now([tz]) Return new Timestamp object representing current time local to tz.
Timestamp.replace([year, month, day, hour, ...]) Implements datetime.replace, handles nanoseconds.
Timestamp.round(freq[, ambiguous, nonexistent]) Round the Timestamp to the specified resolution.
Timestamp.strftime(format) Return a formatted string of the Timestamp.
Timestamp.strptime(string, format) Function is not implemented.
Timestamp.time() Return time object with same time but with tzinfo=None.
Timestamp.timestamp() Return POSIX timestamp as float.
Timestamp.timetuple() Return time tuple, compatible with time.localtime().
Timestamp.timetz() Return time object with same time and tzinfo.
Timestamp.to_datetime64() Return a numpy.datetime64 object with same precision.
Timestamp.to_numpy([dtype, copy]) Convert the Timestamp to a NumPy datetime64.
Timestamp.to_julian_date() Convert Timestamp to a Julian Date.
Timestamp.to_period([freq]) Return a period of which this timestamp is an observation.
Timestamp.to_pydatetime([warn]) Convert a Timestamp object to a native Python datetime object.
Timestamp.today([tz]) Return the current time in the local timezone.
Timestamp.toordinal() Return proleptic Gregorian ordinal.
Timestamp.tz_convert(tz) Convert timezone-aware Timestamp to another time zone.
Timestamp.tz_localize(tz[, ambiguous, ...]) Localize the Timestamp to a timezone.
Timestamp.tzname() Return time zone name.
Timestamp.utcfromtimestamp(ts) Construct a timezone-aware UTC datetime from a POSIX timestamp.
Timestamp.utcnow() Return a new Timestamp representing UTC day and time.
Timestamp.utcoffset() Return utc offset.
Timestamp.utctimetuple() Return UTC time tuple, compatible with time.localtime().
Timestamp.weekday() Return the day of the week represented by the date.

A collection of timestamps may be stored in an arrays.DatetimeArray. For timezone-aware data, the .dtype of an arrays.DatetimeArray is a DatetimeTZDtype. For timezone-naive data, np.dtype("datetime64[ns]") is used.

If the data are timezone-aware, then every value in the array must have the same timezone.

arrays.DatetimeArray(values[, dtype, freq, copy]) Pandas ExtensionArray for tz-naive or tz-aware datetime data.
DatetimeTZDtype([unit, tz]) An ExtensionDtype for timezone-aware datetime data.
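
The dtype difference between timezone-naive and timezone-aware arrays can be checked with a small sketch like this one. The dates are illustrative, and the arrays are obtained through the .array attribute of a DatetimeIndex rather than by calling the DatetimeArray constructor directly:

```python
import numpy as np
import pandas as pd

stamps = pd.to_datetime(["2024-01-01", "2024-01-02"])  # illustrative dates

naive = stamps.array                     # tz-naive DatetimeArray
aware = stamps.tz_localize("UTC").array  # tz-aware DatetimeArray

print(type(naive).__name__, naive.dtype)
print(type(aware).__name__, aware.dtype)

# The naive array carries a NumPy dtype, the aware one a DatetimeTZDtype.
print(isinstance(naive.dtype, np.dtype))
print(isinstance(aware.dtype, pd.DatetimeTZDtype))
```

Because the timezone is part of the dtype, it applies to every value in the array.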

Timedeltas#

NumPy can natively represent timedeltas. pandas provides Timedelta for symmetry with Timestamp. NaT is the missing value for timedelta data.

Timedelta([value, unit]) Represents a duration, the difference between two dates or times.
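
As a brief illustration (the durations are arbitrary examples), a Timedelta can be built from keyword arguments or a string, and it is also what subtracting two Timestamp objects produces:

```python
import datetime

import pandas as pd

delta = pd.Timedelta(days=1, hours=6)
same = pd.Timedelta("1 days 06:00:00")
print(delta == same)

# days and seconds are separate components; total_seconds() combines them.
print(delta.days, delta.seconds)
print(delta.total_seconds())  # 108000.0

# The difference between two timestamps is a Timedelta.
diff = pd.Timestamp("2024-01-02") - pd.Timestamp("2024-01-01")
print(diff)
```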

Properties#

Timedelta.asm8 Return a numpy timedelta64 array scalar view.
Timedelta.components Return a components namedtuple-like.
Timedelta.days Returns the days of the timedelta.
Timedelta.max
Timedelta.microseconds
Timedelta.min
Timedelta.nanoseconds Return the number of nanoseconds (n), where 0 <= n < 1 microsecond.
Timedelta.resolution
Timedelta.seconds Return the total hours, minutes, and seconds of the timedelta as seconds.
Timedelta.unit
Timedelta.value
Timedelta.view(dtype) Array view compatibility.

Methods#

Timedelta.as_unit(unit[, round_ok]) Convert the underlying int64 representation to the given unit.
Timedelta.ceil(freq) Return a new Timedelta ceiled to this resolution.
Timedelta.floor(freq) Return a new Timedelta floored to this resolution.
Timedelta.isoformat() Format the Timedelta as ISO 8601 Duration.
Timedelta.round(freq) Round the Timedelta to the specified resolution.
Timedelta.to_pytimedelta() Convert a pandas Timedelta object into a python datetime.timedelta object.
Timedelta.to_timedelta64() Return a numpy.timedelta64 object with 'ns' precision.
Timedelta.to_numpy([dtype, copy]) Convert the Timedelta to a NumPy timedelta64.
Timedelta.total_seconds() Total seconds in the duration.

A collection of Timedelta objects may be stored in a TimedeltaArray.

Periods#

pandas represents spans of times as Period objects.

Period#

Period([value, freq, ordinal, year, month, ...]) Represents a period of time.

Properties#

Period.day Get day of the month that a Period falls on.
Period.dayofweek Day of the week the period lies in, with Monday=0 and Sunday=6.
Period.day_of_week Day of the week the period lies in, with Monday=0 and Sunday=6.
Period.dayofyear Return the day of the year.
Period.day_of_year Return the day of the year.
Period.days_in_month Get the total number of days in the month that this period falls on.
Period.daysinmonth Get the total number of days of the month that this period falls on.
Period.end_time Get the Timestamp for the end of the period.
Period.freq
Period.freqstr Return a string representation of the frequency.
Period.hour Get the hour of the day component of the Period.
Period.is_leap_year Return True if the period's year is in a leap year.
Period.minute Get minute of the hour component of the Period.
Period.month Return the month this Period falls on.
Period.ordinal
Period.quarter Return the quarter this Period falls on.
Period.qyear Fiscal year the Period lies in according to its starting-quarter.
Period.second Get the second component of the Period.
Period.start_time Get the Timestamp for the start of the period.
Period.week Get the week of the year on the given Period.
Period.weekday Day of the week the period lies in, with Monday=0 and Sunday=6.
Period.weekofyear Get the week of the year on the given Period.
Period.year Return the year this Period falls on.

Methods#

A collection of Period objects may be stored in an arrays.PeriodArray. Every period in an arrays.PeriodArray must have the same freq.

arrays.PeriodArray(values[, dtype, freq, copy]) Pandas ExtensionArray for storing Period data.
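
The following sketch, with an illustrative monthly period, shows a Period scalar and a PeriodArray built from periods that share one freq. The array is created through the top-level array() function, which is an assumption about convenient usage rather than the only way to construct it:

```python
import pandas as pd

# A Period covers a span of time; here, the whole of February 2024.
month = pd.Period("2024-02", freq="M")
print(month.start_time, month.end_time)
print(month.days_in_month)  # 29, because 2024 is a leap year

# Adding an integer moves by whole periods of the same freq.
print(month + 1)

# Periods sharing a freq are inferred as a PeriodArray.
periods = pd.array([month, month + 1, month + 2])
print(type(periods).__name__, periods.dtype)
```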

Intervals#

Arbitrary intervals can be represented as Interval objects.

Interval Immutable object implementing an Interval, a bounded slice-like interval.

Properties#

Interval.closed String describing the inclusive side of the interval.
Interval.closed_left Check if the interval is closed on the left side.
Interval.closed_right Check if the interval is closed on the right side.
Interval.is_empty Indicates if an interval is empty, meaning it contains no points.
Interval.left Left bound for the interval.
Interval.length Return the length of the Interval.
Interval.mid Return the midpoint of the Interval.
Interval.open_left Check if the interval is open on the left side.
Interval.open_right Check if the interval is open on the right side.
Interval.overlaps(other) Check whether two Interval objects overlap.
Interval.right Right bound for the interval.

A collection of intervals may be stored in an arrays.IntervalArray.

arrays.IntervalArray(data[, closed, dtype, ...]) Pandas array for interval data that are closed on the same side.
IntervalDtype([subtype, closed]) An ExtensionDtype for Interval data.
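
A small sketch of the scalar and array types above, using made-up bounds. The closed argument decides which endpoints belong to the interval:

```python
import pandas as pd

# (0, 5]: open on the left, closed on the right.
iv = pd.Interval(0, 5, closed="right")
print(0 in iv, 3 in iv, 5 in iv)  # False True True
print(iv.length, iv.mid)          # 5 2.5

# [5, 8) shares the closed endpoint 5 with iv, so the two overlap.
print(iv.overlaps(pd.Interval(5, 8, closed="left")))

# from_breaks builds consecutive intervals that are closed on the same side.
arr = pd.arrays.IntervalArray.from_breaks([0, 1, 2, 3])
print(len(arr), arr.closed, arr.dtype)
```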

Nullable integer#

numpy.ndarray cannot natively represent integer data with missing values. pandas provides this through arrays.IntegerArray.

Int8Dtype() An ExtensionDtype for int8 integer data.
Int16Dtype() An ExtensionDtype for int16 integer data.
Int32Dtype() An ExtensionDtype for int32 integer data.
Int64Dtype() An ExtensionDtype for int64 integer data.
UInt8Dtype() An ExtensionDtype for uint8 integer data.
UInt16Dtype() An ExtensionDtype for uint16 integer data.
UInt32Dtype() An ExtensionDtype for uint32 integer data.
UInt64Dtype() An ExtensionDtype for uint64 integer data.
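
To illustrate the difference (with arbitrary sample values), compare a Series that uses the nullable "Int64" dtype with one that relies on the default NumPy-backed inference:

```python
import pandas as pd

# Nullable integer dtype: the missing value is pd.NA, the rest stay integers.
nullable = pd.Series([1, None, 3], dtype="Int64")
print(nullable.dtype, nullable.isna().tolist())
print(nullable.sum())  # 4, missing values are skipped by default

# Default inference falls back to float64 so that NaN can mark the gap.
default = pd.Series([1, None, 3])
print(default.dtype)
```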

Nullable float#

Categoricals#

pandas defines a custom data type for representing data that can take only a limited, fixed set of values. The dtype of a Categorical can be described by a CategoricalDtype.

CategoricalDtype([categories, ordered]) Type for categorical data with the categories and orderedness.

Categorical data can be stored in a pandas.Categorical.

Categorical(values[, categories, ordered, ...]) Represent a categorical variable in classic R / S-plus fashion.

The alternative Categorical.from_codes() constructor can be used when you have the categories and integer codes already.

The dtype information is available on the Categorical.

np.asarray(categorical) works by implementing the array interface. Be aware that this converts the Categorical back to a NumPy array, so category and order information is not preserved!

A Categorical can be stored in a Series or DataFrame. To create a Series of dtype category, use cat = s.astype(dtype) or Series(..., dtype=dtype) where dtype is either the string "category" or an instance of CategoricalDtype.

If the Series is of dtype CategoricalDtype, Series.cat can be used to change the categorical data. See Categorical accessor for more.
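
The pieces described in this section fit together roughly as sketched here. The category names and values are invented for illustration:

```python
import numpy as np
import pandas as pd

# Illustrative ordered categories.
dtype = pd.CategoricalDtype(categories=["low", "mid", "high"], ordered=True)

# The same data, built from values and from integer codes.
cat = pd.Categorical(["low", "high", "low"], dtype=dtype)
same = pd.Categorical.from_codes([0, 2, 0], dtype=dtype)
print(cat.dtype == dtype, list(cat) == list(same))

# Converting to NumPy drops the categories and the ordering.
plain = np.asarray(cat)
print(type(plain).__name__, plain.tolist())

# A Series of this dtype exposes the .cat accessor.
ser = pd.Series(["low", "high", "low"]).astype(dtype)
print(ser.cat.codes.tolist())  # [0, 2, 0]
print(ser.cat.ordered)
```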

Sparse#

Data where a single value is repeated many times (e.g. 0 or NaN) may be stored efficiently as an arrays.SparseArray.

SparseDtype([dtype, fill_value]) Dtype for data stored in SparseArray.

The Series.sparse accessor may be used to access sparse-specific attributes and methods if the Series contains sparse values. See Sparse accessor and the user guide for more.
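
A minimal sketch of sparse storage, assuming a mostly-zero integer array as input (the data are illustrative):

```python
import numpy as np
import pandas as pd

# Mostly zeros: only two of the eight values need to be stored explicitly.
dense = np.array([0, 0, 0, 5, 0, 0, 7, 0], dtype="int64")
sparse = pd.arrays.SparseArray(dense, fill_value=0)
print(sparse.dtype)
print(sparse.density)  # 0.25, the fraction of explicitly stored values

# In a Series, the .sparse accessor exposes sparse-specific attributes.
ser = pd.Series(sparse)
print(ser.sparse.npoints)  # 2
print(ser.sparse.to_dense().tolist())
```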

Strings#

When working with text data, where each valid element is a string or missing, we recommend using StringDtype (with the alias "string").

StringDtype([storage]) Extension dtype for string data.

The Series.str accessor is available for Series backed by an arrays.StringArray. See String handling for more.
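
As an illustration with made-up text values, a Series created with the "string" alias keeps missing entries as missing when string methods are applied through .str:

```python
import pandas as pd

ser = pd.Series(["apple", None, "cherry"], dtype="string")
print(ser.dtype)

# String methods skip over the missing element and propagate it.
upper = ser.str.upper()
print(upper.tolist())

# Numeric results come back with a nullable dtype rather than float.
lengths = ser.str.len()
print(lengths.dtype)
```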

Nullable Boolean#

The boolean dtype (with the alias "boolean") provides support for storing boolean data (True, False) with missing values, which is not possible with a bool numpy.ndarray.

arrays.BooleanArray(values, mask[, copy]) Array of boolean (True/False) data with missing values.
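
The contrast with NumPy can be seen in a short sketch (values are illustrative): the same input becomes a BooleanArray under the "boolean" alias, while NumPy falls back to an object array.

```python
import numpy as np
import pandas as pd

# Nullable boolean array: True, False, and a missing value.
flags = pd.array([True, False, None], dtype="boolean")
print(type(flags).__name__, flags.dtype)
print(flags.isna().tolist())

# NumPy has no missing-value slot for bool, so it stores generic objects.
as_numpy = np.array([True, False, None])
print(as_numpy.dtype)
```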

Utilities#

Constructors#

Data type introspection#

api.types.is_any_real_numeric_dtype(arr_or_dtype) Check whether the provided array or dtype is of a real number dtype.
api.types.is_bool_dtype(arr_or_dtype) Check whether the provided array or dtype is of a boolean dtype.
api.types.is_categorical_dtype(arr_or_dtype) (DEPRECATED) Check whether an array-like or dtype is of the Categorical dtype.
api.types.is_complex_dtype(arr_or_dtype) Check whether the provided array or dtype is of a complex dtype.
api.types.is_datetime64_any_dtype(arr_or_dtype) Check whether the provided array or dtype is of the datetime64 dtype.
api.types.is_datetime64_dtype(arr_or_dtype) Check whether an array-like or dtype is of the datetime64 dtype.
api.types.is_datetime64_ns_dtype(arr_or_dtype) Check whether the provided array or dtype is of the datetime64[ns] dtype.
api.types.is_datetime64tz_dtype(arr_or_dtype) (DEPRECATED) Check whether an array-like or dtype is of a DatetimeTZDtype dtype.
api.types.is_extension_array_dtype(arr_or_dtype) Check if an object is a pandas extension array type.
api.types.is_float_dtype(arr_or_dtype) Check whether the provided array or dtype is of a float dtype.
api.types.is_int64_dtype(arr_or_dtype) (DEPRECATED) Check whether the provided array or dtype is of the int64 dtype.
api.types.is_integer_dtype(arr_or_dtype) Check whether the provided array or dtype is of an integer dtype.
api.types.is_interval_dtype(arr_or_dtype) (DEPRECATED) Check whether an array-like or dtype is of the Interval dtype.
api.types.is_numeric_dtype(arr_or_dtype) Check whether the provided array or dtype is of a numeric dtype.
api.types.is_object_dtype(arr_or_dtype) Check whether an array-like or dtype is of the object dtype.
api.types.is_period_dtype(arr_or_dtype) (DEPRECATED) Check whether an array-like or dtype is of the Period dtype.
api.types.is_signed_integer_dtype(arr_or_dtype) Check whether the provided array or dtype is of a signed integer dtype.
api.types.is_string_dtype(arr_or_dtype) Check whether the provided array or dtype is of the string dtype.
api.types.is_timedelta64_dtype(arr_or_dtype) Check whether an array-like or dtype is of the timedelta64 dtype.
api.types.is_timedelta64_ns_dtype(arr_or_dtype) Check whether the provided array or dtype is of the timedelta64[ns] dtype.
api.types.is_unsigned_integer_dtype(arr_or_dtype) Check whether the provided array or dtype is of an unsigned integer dtype.
api.types.is_sparse(arr) (DEPRECATED) Check whether an array-like is a 1-D pandas sparse array.
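
A few of these checks in use, as a hedged sketch with illustrative Series. For the deprecated functions, an isinstance check against the dtype class is shown as the replacement for the categorical case:

```python
import pandas as pd
from pandas.api import types as ptypes

ints = pd.Series([1, 2, 3])
nullable = pd.Series([1, None], dtype="Int64")
aware = pd.Series(pd.to_datetime(["2024-01-01"]).tz_localize("UTC"))
cats = pd.Series(["a", "b"], dtype="category")

# Integer checks accept both NumPy and nullable extension dtypes.
print(ptypes.is_integer_dtype(ints), ptypes.is_integer_dtype(nullable))
print(ptypes.is_extension_array_dtype(ints),
      ptypes.is_extension_array_dtype(nullable))

# The "any" variant also matches tz-aware data; the plain one does not.
print(ptypes.is_datetime64_any_dtype(aware), ptypes.is_datetime64_dtype(aware))

# Instead of the deprecated is_categorical_dtype, test the dtype class.
print(isinstance(cats.dtype, pd.CategoricalDtype))
```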

Iterable introspection#

Scalar introspection#

api.types.is_bool(obj) Return True if given object is boolean.
api.types.is_complex(obj) Return True if given object is complex.
api.types.is_float(obj) Return True if given object is float.
api.types.is_hashable(obj) Return True if hash(obj) will succeed, False otherwise.
api.types.is_integer(obj) Return True if given object is integer.
api.types.is_interval(obj)
api.types.is_number(obj) Check if the object is a number.
api.types.is_re(obj) Check if the object is a regex pattern instance.
api.types.is_re_compilable(obj) Check if the object can be compiled into a regex pattern instance.
api.types.is_scalar(val) Return True if given object is scalar.
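
The scalar checks operate on single objects rather than arrays or dtypes. A brief sketch with illustrative inputs:

```python
import re

import numpy as np
from pandas.api import types as ptypes

# Scalars versus containers.
print(ptypes.is_scalar(5), ptypes.is_scalar("abc"), ptypes.is_scalar([5]))

# NumPy scalar types count; floats are not integers.
print(ptypes.is_integer(np.int64(1)), ptypes.is_integer(1.0))
print(ptypes.is_float(1.0), ptypes.is_bool(True), ptypes.is_number(1 + 2j))

# Hashability and regular-expression helpers.
print(ptypes.is_hashable(()), ptypes.is_hashable([]))
print(ptypes.is_re(re.compile("a+")), ptypes.is_re_compilable("a+"))
```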