pandas arrays, scalars, and data types — pandas 2.2.3 documentation
Objects#
For most data types, pandas uses NumPy arrays as the concrete objects contained within an Index, Series, or DataFrame.
For some data types, pandas extends NumPy’s type system. String aliases for these types can be found at dtypes.
Kind of Data | pandas Data Type | Scalar | Array |
---|---|---|---|
TZ-aware datetime | DatetimeTZDtype | Timestamp | Datetimes |
Timedeltas | (none) | Timedelta | Timedeltas |
Period (time spans) | PeriodDtype | Period | Periods |
Intervals | IntervalDtype | Interval | Intervals |
Nullable Integer | Int64Dtype, … | (none) | Nullable integer |
Nullable Float | Float64Dtype, … | (none) | Nullable float |
Categorical | CategoricalDtype | (none) | Categoricals |
Sparse | SparseDtype | (none) | Sparse |
Strings | StringDtype | str | Strings |
Nullable Boolean | BooleanDtype | bool | Nullable Boolean |
PyArrow | ArrowDtype | Python Scalars or NA | PyArrow |
pandas and third-party libraries can extend NumPy’s type system (see Extension types). The top-level array() method can be used to create a new array, which may be stored in a Series, Index, or as a column in a DataFrame.
array(data[, dtype, copy]) | Create an array. |
---|
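As a minimal sketch of the array() constructor described above (the variable names are illustrative only), the snippet below builds two extension arrays and stores them in a Series and a DataFrame:

```python
import pandas as pd

# With no dtype given, pd.array infers a pandas extension dtype where it can.
ints = pd.array([1, 2, None])
print(ints.dtype)  # Int64

# An explicit dtype alias selects a specific extension type.
strings = pd.array(["a", None, "c"], dtype="string")

# The resulting arrays can back a Series or a DataFrame column.
s = pd.Series(ints)
df = pd.DataFrame({"n": ints, "s": strings})
print(df.dtypes)
```

Missing entries show up as NA in both arrays rather than forcing a cast to float or object.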
PyArrow#
Warning
This feature is experimental, and the API can change in a future release without warning.
The arrays.ArrowExtensionArray is backed by a pyarrow.ChunkedArray with a pyarrow.DataType instead of a NumPy array and data type. The .dtype of an arrays.ArrowExtensionArray is an ArrowDtype.
Pyarrow provides array and data type support similar to NumPy's, including first-class nullability support for all data types, immutability, and more.
The table below shows the equivalent pyarrow-backed (pa), pandas extension, and numpy (np) types that are recognized by pandas. Pyarrow-backed types below need to be passed into ArrowDtype to be recognized by pandas, e.g. pd.ArrowDtype(pa.bool_()).
PyArrow type | pandas extension type | NumPy type |
---|---|---|
pyarrow.bool_() | BooleanDtype | np.bool_ |
pyarrow.int8() | Int8Dtype | np.int8 |
pyarrow.int16() | Int16Dtype | np.int16 |
pyarrow.int32() | Int32Dtype | np.int32 |
pyarrow.int64() | Int64Dtype | np.int64 |
pyarrow.uint8() | UInt8Dtype | np.uint8 |
pyarrow.uint16() | UInt16Dtype | np.uint16 |
pyarrow.uint32() | UInt32Dtype | np.uint32 |
pyarrow.uint64() | UInt64Dtype | np.uint64 |
pyarrow.float32() | Float32Dtype | np.float32 |
pyarrow.float64() | Float64Dtype | np.float64 |
pyarrow.time32() | (none) | (none) |
pyarrow.time64() | (none) | (none) |
pyarrow.timestamp() | DatetimeTZDtype | np.datetime64 |
pyarrow.date32() | (none) | (none) |
pyarrow.date64() | (none) | (none) |
pyarrow.duration() | (none) | np.timedelta64 |
pyarrow.binary() | (none) | (none) |
pyarrow.string() | StringDtype | np.str_ |
pyarrow.decimal128() | (none) | (none) |
pyarrow.list_() | (none) | (none) |
pyarrow.map_() | (none) | (none) |
pyarrow.dictionary() | CategoricalDtype | (none) |
Note
Pyarrow-backed string support is provided by both pd.StringDtype("pyarrow") and pd.ArrowDtype(pa.string()). pd.StringDtype("pyarrow") is described below in the string section and will be returned if the string alias "string[pyarrow]" is specified. pd.ArrowDtype(pa.string()) generally has better interoperability with ArrowDtype of different types.
While individual values in an arrays.ArrowExtensionArray are stored as PyArrow objects, scalars are returned as Python scalars corresponding to the data type, e.g. a PyArrow int64 will be returned as a Python int, or NA for missing values.
ArrowDtype(pyarrow_dtype) | An ExtensionDtype for PyArrow data types. |
---|
For more information, please see the PyArrow user guide.
Datetimes#
NumPy cannot natively represent timezone-aware datetimes. pandas supports this with the arrays.DatetimeArray extension array, which can hold timezone-naive or timezone-aware values.
Timestamp, a subclass of datetime.datetime, is pandas’ scalar type for timezone-naive or timezone-aware datetime data. NaT is the missing value for datetime data.
Timestamp([ts_input, year, month, day, ...]) | Pandas replacement for python datetime.datetime object. |
---|
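The following sketch illustrates the points above: a Timestamp is a datetime.datetime, it may or may not carry a timezone, and NaT marks a missing value. The dates are arbitrary examples.

```python
import datetime

import pandas as pd

naive = pd.Timestamp("2024-03-01 12:00")            # no timezone attached
aware = pd.Timestamp("2024-03-01 12:00", tz="UTC")  # timezone-aware

print(isinstance(naive, datetime.datetime))  # True
print(naive.tz is None)                      # True
print(naive.day_name())                      # Friday

# Localizing the naive value to UTC yields the same instant as `aware`.
print(naive.tz_localize("UTC") == aware)     # True

# NaT plays the role of NaN/None for datetime data.
print(pd.isna(pd.NaT))                       # True
```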
Properties#
Timestamp.asm8 | Return numpy datetime64 format in nanoseconds. |
---|---|
Timestamp.day | |
Timestamp.dayofweek | Return day of the week. |
Timestamp.day_of_week | Return day of the week. |
Timestamp.dayofyear | Return the day of the year. |
Timestamp.day_of_year | Return the day of the year. |
Timestamp.days_in_month | Return the number of days in the month. |
Timestamp.daysinmonth | Return the number of days in the month. |
Timestamp.fold | |
Timestamp.hour | |
Timestamp.is_leap_year | Return True if year is a leap year. |
Timestamp.is_month_end | Check if the date is the last day of the month. |
Timestamp.is_month_start | Check if the date is the first day of the month. |
Timestamp.is_quarter_end | Check if date is last day of the quarter. |
Timestamp.is_quarter_start | Check if the date is the first day of the quarter. |
Timestamp.is_year_end | Return True if date is last day of the year. |
Timestamp.is_year_start | Return True if date is first day of the year. |
Timestamp.max | |
Timestamp.microsecond | |
Timestamp.min | |
Timestamp.minute | |
Timestamp.month | |
Timestamp.nanosecond | |
Timestamp.quarter | Return the quarter of the year. |
Timestamp.resolution | |
Timestamp.second | |
Timestamp.tz | Alias for tzinfo. |
Timestamp.tzinfo | |
Timestamp.unit | The abbreviation associated with self._creso. |
Timestamp.value | |
Timestamp.week | Return the week number of the year. |
Timestamp.weekofyear | Return the week number of the year. |
Timestamp.year | |
Methods#
Timestamp.as_unit(unit[, round_ok]) | Convert the underlying int64 representation to the given unit. |
---|---|
Timestamp.astimezone(tz) | Convert timezone-aware Timestamp to another time zone. |
Timestamp.ceil(freq[, ambiguous, nonexistent]) | Return a new Timestamp ceiled to this resolution. |
Timestamp.combine(date, time) | Combine date, time into datetime with same date and time fields. |
Timestamp.ctime() | Return ctime() style string. |
Timestamp.date() | Return date object with same year, month and day. |
Timestamp.day_name([locale]) | Return the day name of the Timestamp with specified locale. |
Timestamp.dst() | Return the daylight saving time (DST) adjustment. |
Timestamp.floor(freq[, ambiguous, nonexistent]) | Return a new Timestamp floored to this resolution. |
Timestamp.fromordinal(ordinal[, tz]) | Construct a timestamp from a proleptic Gregorian ordinal. |
Timestamp.fromtimestamp(ts) | Transform timestamp[, tz] to tz's local time from POSIX timestamp. |
Timestamp.isocalendar() | Return a named tuple containing ISO year, week number, and weekday. |
Timestamp.isoformat([sep, timespec]) | Return the time formatted according to ISO 8601. |
Timestamp.isoweekday() | Return the day of the week represented by the date. |
Timestamp.month_name([locale]) | Return the month name of the Timestamp with specified locale. |
Timestamp.normalize() | Normalize Timestamp to midnight, preserving tz information. |
Timestamp.now([tz]) | Return new Timestamp object representing current time local to tz. |
Timestamp.replace([year, month, day, hour, ...]) | Implements datetime.replace, handles nanoseconds. |
Timestamp.round(freq[, ambiguous, nonexistent]) | Round the Timestamp to the specified resolution. |
Timestamp.strftime(format) | Return a formatted string of the Timestamp. |
Timestamp.strptime(string, format) | Function is not implemented. |
Timestamp.time() | Return time object with same time but with tzinfo=None. |
Timestamp.timestamp() | Return POSIX timestamp as float. |
Timestamp.timetuple() | Return time tuple, compatible with time.localtime(). |
Timestamp.timetz() | Return time object with same time and tzinfo. |
Timestamp.to_datetime64() | Return a numpy.datetime64 object with same precision. |
Timestamp.to_numpy([dtype, copy]) | Convert the Timestamp to a NumPy datetime64. |
Timestamp.to_julian_date() | Convert Timestamp to a Julian Date. |
Timestamp.to_period([freq]) | Return a period of which this timestamp is an observation. |
Timestamp.to_pydatetime([warn]) | Convert a Timestamp object to a native Python datetime object. |
Timestamp.today([tz]) | Return the current time in the local timezone. |
Timestamp.toordinal() | Return proleptic Gregorian ordinal. |
Timestamp.tz_convert(tz) | Convert timezone-aware Timestamp to another time zone. |
Timestamp.tz_localize(tz[, ambiguous, ...]) | Localize the Timestamp to a timezone. |
Timestamp.tzname() | Return time zone name. |
Timestamp.utcfromtimestamp(ts) | Construct a timezone-aware UTC datetime from a POSIX timestamp. |
Timestamp.utcnow() | Return a new Timestamp representing UTC day and time. |
Timestamp.utcoffset() | Return utc offset. |
Timestamp.utctimetuple() | Return UTC time tuple, compatible with time.localtime(). |
Timestamp.weekday() | Return the day of the week represented by the date. |
A collection of timestamps may be stored in an arrays.DatetimeArray. For timezone-aware data, the .dtype of an arrays.DatetimeArray is a DatetimeTZDtype. For timezone-naive data, np.dtype("datetime64[ns]") is used.
If the data are timezone-aware, then every value in the array must have the same timezone.
arrays.DatetimeArray(values[, dtype, freq, copy]) | Pandas ExtensionArray for tz-naive or tz-aware datetime data. |
---|
DatetimeTZDtype([unit, tz]) | An ExtensionDtype for timezone-aware datetime data. |
---|
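A short sketch of the dtype difference described above, assuming the arrays are built through pd.array with an explicit dtype (one of several possible construction routes):

```python
import numpy as np
import pandas as pd

dates = ["2024-01-01", "2024-01-02"]

# Timezone-naive data uses a plain NumPy datetime64 dtype.
naive = pd.array(dates, dtype="datetime64[ns]")
print(naive.dtype)

# Timezone-aware data uses the DatetimeTZDtype extension dtype instead.
aware = pd.array(dates, dtype=pd.DatetimeTZDtype(tz="UTC"))
print(aware.dtype)

# Both are stored in the same array class.
print(type(naive).__name__, type(aware).__name__)
```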
Timedeltas#
NumPy can natively represent timedeltas. pandas provides Timedelta for symmetry with Timestamp. NaT is the missing value for timedelta data.
Timedelta([value, unit]) | Represents a duration, the difference between two dates or times. |
---|
Properties#
Timedelta.asm8 | Return a numpy timedelta64 array scalar view. |
---|---|
Timedelta.components | Return a components namedtuple-like. |
Timedelta.days | Returns the days of the timedelta. |
Timedelta.max | |
Timedelta.microseconds | |
Timedelta.min | |
Timedelta.nanoseconds | Return the number of nanoseconds (n), where 0 <= n < 1 microsecond. |
Timedelta.resolution | |
Timedelta.seconds | Return the total hours, minutes, and seconds of the timedelta as seconds. |
Timedelta.unit | |
Timedelta.value | |
Timedelta.view(dtype) | Array view compatibility. |
Methods#
Timedelta.as_unit(unit[, round_ok]) | Convert the underlying int64 representation to the given unit. |
---|---|
Timedelta.ceil(freq) | Return a new Timedelta ceiled to this resolution. |
Timedelta.floor(freq) | Return a new Timedelta floored to this resolution. |
Timedelta.isoformat() | Format the Timedelta as ISO 8601 Duration. |
Timedelta.round(freq) | Round the Timedelta to the specified resolution. |
Timedelta.to_pytimedelta() | Convert a pandas Timedelta object into a python datetime.timedelta object. |
Timedelta.to_timedelta64() | Return a numpy.timedelta64 object with 'ns' precision. |
Timedelta.to_numpy([dtype, copy]) | Convert the Timedelta to a NumPy timedelta64. |
Timedelta.total_seconds() | Total seconds in the duration. |
A collection of Timedelta may be stored in a TimedeltaArray.
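As an illustrative sketch, the snippet below creates Timedelta scalars, reads a few of the properties listed above, and collects them into a TimedeltaArray (built here via pd.array, which is one possible route):

```python
import datetime

import pandas as pd

delta = pd.Timedelta(days=1, hours=2, minutes=30)
print(delta.total_seconds())    # 95400.0
print(delta.components.hours)   # 2
print(delta.to_pytimedelta() == datetime.timedelta(days=1, hours=2, minutes=30))  # True

# Subtracting two Timestamps produces a Timedelta.
diff = pd.Timestamp("2024-01-03") - pd.Timestamp("2024-01-01")
print(diff == pd.Timedelta(days=2))  # True

# NaT marks the missing entry in the array.
arr = pd.array([delta, diff, pd.NaT], dtype="timedelta64[ns]")
print(type(arr).__name__)
```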
Periods#
pandas represents spans of time as Period objects.
Period#
Period([value, freq, ordinal, year, month, ...]) | Represents a period of time. |
---|
Properties#
Period.day | Get day of the month that a Period falls on. |
---|---|
Period.dayofweek | Day of the week the period lies in, with Monday=0 and Sunday=6. |
Period.day_of_week | Day of the week the period lies in, with Monday=0 and Sunday=6. |
Period.dayofyear | Return the day of the year. |
Period.day_of_year | Return the day of the year. |
Period.days_in_month | Get the total number of days in the month that this period falls on. |
Period.daysinmonth | Get the total number of days of the month that this period falls on. |
Period.end_time | Get the Timestamp for the end of the period. |
Period.freq | |
Period.freqstr | Return a string representation of the frequency. |
Period.hour | Get the hour of the day component of the Period. |
Period.is_leap_year | Return True if the period's year is in a leap year. |
Period.minute | Get minute of the hour component of the Period. |
Period.month | Return the month this Period falls on. |
Period.ordinal | |
Period.quarter | Return the quarter this Period falls on. |
Period.qyear | Fiscal year the Period lies in according to its starting-quarter. |
Period.second | Get the second component of the Period. |
Period.start_time | Get the Timestamp for the start of the period. |
Period.week | Get the week of the year on the given Period. |
Period.weekday | Day of the week the period lies in, with Monday=0 and Sunday=6. |
Period.weekofyear | Get the week of the year on the given Period. |
Period.year | Return the year this Period falls on. |
Methods#
A collection of Period may be stored in an arrays.PeriodArray. Every period in an arrays.PeriodArray must have the same freq.
arrays.PeriodArray(values[, dtype, freq, copy]) | Pandas ExtensionArray for storing Period data. |
---|
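A minimal sketch of a monthly Period and a PeriodArray sharing one freq; the month chosen is arbitrary:

```python
import pandas as pd

p = pd.Period("2024-03", freq="M")
print(p.start_time)      # 2024-03-01 00:00:00
print(p.days_in_month)   # 31
print(p + 1)             # 2024-04

# Every element of the array shares the same frequency ("M" here).
periods = pd.array([p, p + 1], dtype="period[M]")
print(periods.dtype)
```

Arithmetic such as p + 1 moves by whole periods of the given freq, which is why the result is the following month rather than the following day.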
Intervals#
Arbitrary intervals can be represented as Interval objects.
Interval | Immutable object implementing an Interval, a bounded slice-like interval. |
---|
Properties#
Interval.closed | String describing the inclusive side of the intervals. |
---|---|
Interval.closed_left | Check if the interval is closed on the left side. |
Interval.closed_right | Check if the interval is closed on the right side. |
Interval.is_empty | Indicates if an interval is empty, meaning it contains no points. |
Interval.left | Left bound for the interval. |
Interval.length | Return the length of the Interval. |
Interval.mid | Return the midpoint of the Interval. |
Interval.open_left | Check if the interval is open on the left side. |
Interval.open_right | Check if the interval is open on the right side. |
Interval.overlaps(other) | Check whether two Interval objects overlap. |
Interval.right | Right bound for the interval. |
A collection of intervals may be stored in an arrays.IntervalArray.
arrays.IntervalArray(data[, closed, dtype, ...]) | Pandas array for interval data that are closed on the same side. |
---|
IntervalDtype([subtype, closed]) | An ExtensionDtype for Interval data. |
---|
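The sketch below shows how the closed side affects membership on a single Interval, then builds an IntervalArray with the from_breaks constructor. The bounds are arbitrary examples.

```python
import pandas as pd

iv = pd.Interval(0, 5, closed="right")  # represents (0, 5]
print(0 in iv)             # False, the left side is open
print(5 in iv)             # True, the right side is closed
print(iv.length, iv.mid)   # 5 2.5
print(iv.overlaps(pd.Interval(4, 6)))  # True

# All intervals in an IntervalArray are closed on the same side.
intervals = pd.arrays.IntervalArray.from_breaks([0, 1, 2, 3])
print(intervals.closed)    # right
print(len(intervals))      # 3
```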
Nullable integer#
numpy.ndarray cannot natively represent integer data with missing values. pandas provides this through arrays.IntegerArray.
Int8Dtype() | An ExtensionDtype for int8 integer data. |
---|---|
Int16Dtype() | An ExtensionDtype for int16 integer data. |
Int32Dtype() | An ExtensionDtype for int32 integer data. |
Int64Dtype() | An ExtensionDtype for int64 integer data. |
UInt8Dtype() | An ExtensionDtype for uint8 integer data. |
UInt16Dtype() | An ExtensionDtype for uint16 integer data. |
UInt32Dtype() | An ExtensionDtype for uint32 integer data. |
UInt64Dtype() | An ExtensionDtype for uint64 integer data. |
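To illustrate the difference, this sketch compares default inference, which falls back to float64 when a value is missing, with the nullable "Int64" dtype:

```python
import pandas as pd

# Default inference: the missing value forces a float64 column with NaN.
as_float = pd.Series([1, 2, None])
print(as_float.dtype)  # float64

# The nullable dtype keeps integers and stores the gap as NA.
as_int = pd.Series([1, 2, None], dtype="Int64")
print(as_int.dtype)    # Int64
print(as_int.sum())    # 3, missing values are skipped by default
print(type(as_int.array).__name__)  # IntegerArray
```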
Nullable float#
Categoricals#
pandas defines a custom data type for representing data that can take only a limited, fixed set of values. The dtype of a Categorical can be described by a CategoricalDtype.
CategoricalDtype([categories, ordered]) | Type for categorical data with the categories and orderedness. |
---|
Categorical data can be stored in a pandas.Categorical.
Categorical(values[, categories, ordered, ...]) | Represent a categorical variable in classic R / S-plus fashion. |
---|
The alternative Categorical.from_codes() constructor can be used when you have the categories and integer codes already:
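A small sketch of that constructor, with made-up categories; a code of -1 marks a missing value:

```python
import pandas as pd

cat = pd.Categorical.from_codes(
    codes=[0, 1, 0, -1],          # positions into `categories`; -1 means missing
    categories=["low", "high"],
    ordered=True,
)
print(cat)
print(cat.codes.tolist())  # [0, 1, 0, -1]
print(cat.dtype)
```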
The dtype information is available on the Categorical.
np.asarray(categorical) works by implementing the array interface. Be aware that this converts the Categorical back to a NumPy array, so categories and order information are not preserved!
A Categorical can be stored in a Series or DataFrame. To create a Series of dtype category, use cat = s.astype(dtype) or Series(..., dtype=dtype) where dtype is either
- the string 'category'
- an instance of CategoricalDtype.
If the Series is of dtype CategoricalDtype, Series.cat can be used to change the categorical data. See Categorical accessor for more.
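The following sketch, using invented category labels, creates a categorical Series with astype, modifies it through Series.cat, and shows that converting to a NumPy array drops the category metadata:

```python
import numpy as np
import pandas as pd

dtype = pd.CategoricalDtype(categories=["low", "mid", "high"], ordered=True)
s = pd.Series(["low", "high", "low"]).astype(dtype)

print(s.cat.codes.tolist())  # [0, 2, 0]

# Series.cat exposes methods that change the categorical data.
renamed = s.cat.rename_categories({"low": "L", "mid": "M", "high": "H"})
print(renamed.tolist())      # ['L', 'H', 'L']

# Converting to NumPy yields a plain array without categories or ordering.
plain = np.asarray(s.array)
print(type(plain).__name__)  # ndarray
```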
Sparse#
Data where a single value is repeated many times (e.g. 0 or NaN) may be stored efficiently as an arrays.SparseArray.
SparseDtype([dtype, fill_value]) | Dtype for data stored in SparseArray. |
---|
The Series.sparse accessor may be used to access sparse-specific attributes and methods if the Series contains sparse values. See Sparse accessor and the user guide for more.
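As a rough illustration with made-up data, the snippet below stores a mostly-zero sequence as a SparseArray and reads a few attributes through the Series.sparse accessor:

```python
import pandas as pd

# Only the two non-fill values are physically stored.
arr = pd.arrays.SparseArray([0, 0, 1, 0, 0, 2, 0], fill_value=0)
print(arr.dtype)

s = pd.Series(arr)
print(s.sparse.npoints)     # 2, the number of stored (non-fill) values
print(s.sparse.fill_value)  # 0

# Convert back to an ordinary dense Series when needed.
dense = s.sparse.to_dense()
print(dense.tolist())       # [0, 0, 1, 0, 0, 2, 0]
```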
Strings#
When working with text data, where each valid element is a string or missing, we recommend using StringDtype (with the alias "string").
StringDtype([storage]) | Extension dtype for string data. |
---|
The Series.str accessor is available for Series backed by an arrays.StringArray. See String handling for more.
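A brief sketch of the "string" alias together with the Series.str accessor; the sample values are arbitrary:

```python
import pandas as pd

s = pd.Series(["apple", None, "cherry"], dtype="string")
print(s.dtype)

# String methods propagate the missing value instead of raising.
upper = s.str.upper()
lengths = s.str.len()
print(upper.tolist())
print(lengths.tolist())
```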
Nullable Boolean#
The boolean dtype (with the alias "boolean") provides support for storing boolean data (True, False) with missing values, which is not possible with a bool numpy.ndarray.
arrays.BooleanArray(values, mask[, copy]) | Array of boolean (True/False) data with missing values. |
---|
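The sketch below builds a nullable boolean array in two ways: through the "boolean" alias, and directly from a values array and a mask as in the BooleanArray signature above. It also shows that logical operations treat NA as "unknown", so the result is NA only when the answer depends on the missing value.

```python
import numpy as np
import pandas as pd

b = pd.array([True, False, None], dtype="boolean")

print((b | True).tolist())   # [True, True, True]
print((b & False).tolist())  # [False, False, False]
print(b & True)              # last element stays <NA>

# Direct construction: mask is True where the value is missing.
values = np.array([True, False, True])
mask = np.array([False, False, True])
direct = pd.arrays.BooleanArray(values, mask)
print(direct)
```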
Utilities#
Constructors#
Data type introspection#
api.types.is_any_real_numeric_dtype(arr_or_dtype) | Check whether the provided array or dtype is of a real number dtype. |
---|---|
api.types.is_bool_dtype(arr_or_dtype) | Check whether the provided array or dtype is of a boolean dtype. |
api.types.is_categorical_dtype(arr_or_dtype) | (DEPRECATED) Check whether an array-like or dtype is of the Categorical dtype. |
api.types.is_complex_dtype(arr_or_dtype) | Check whether the provided array or dtype is of a complex dtype. |
api.types.is_datetime64_any_dtype(arr_or_dtype) | Check whether the provided array or dtype is of the datetime64 dtype. |
api.types.is_datetime64_dtype(arr_or_dtype) | Check whether an array-like or dtype is of the datetime64 dtype. |
api.types.is_datetime64_ns_dtype(arr_or_dtype) | Check whether the provided array or dtype is of the datetime64[ns] dtype. |
api.types.is_datetime64tz_dtype(arr_or_dtype) | (DEPRECATED) Check whether an array-like or dtype is of a DatetimeTZDtype dtype. |
api.types.is_extension_array_dtype(arr_or_dtype) | Check if an object is a pandas extension array type. |
api.types.is_float_dtype(arr_or_dtype) | Check whether the provided array or dtype is of a float dtype. |
api.types.is_int64_dtype(arr_or_dtype) | (DEPRECATED) Check whether the provided array or dtype is of the int64 dtype. |
api.types.is_integer_dtype(arr_or_dtype) | Check whether the provided array or dtype is of an integer dtype. |
api.types.is_interval_dtype(arr_or_dtype) | (DEPRECATED) Check whether an array-like or dtype is of the Interval dtype. |
api.types.is_numeric_dtype(arr_or_dtype) | Check whether the provided array or dtype is of a numeric dtype. |
api.types.is_object_dtype(arr_or_dtype) | Check whether an array-like or dtype is of the object dtype. |
api.types.is_period_dtype(arr_or_dtype) | (DEPRECATED) Check whether an array-like or dtype is of the Period dtype. |
api.types.is_signed_integer_dtype(arr_or_dtype) | Check whether the provided array or dtype is of a signed integer dtype. |
api.types.is_string_dtype(arr_or_dtype) | Check whether the provided array or dtype is of the string dtype. |
api.types.is_timedelta64_dtype(arr_or_dtype) | Check whether an array-like or dtype is of the timedelta64 dtype. |
api.types.is_timedelta64_ns_dtype(arr_or_dtype) | Check whether the provided array or dtype is of the timedelta64[ns] dtype. |
api.types.is_unsigned_integer_dtype(arr_or_dtype) | Check whether the provided array or dtype is of an unsigned integer dtype. |
api.types.is_sparse(arr) | (DEPRECATED) Check whether an array-like is a 1-D pandas sparse array. |
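As an illustrative sketch, the dtype checks above can be applied to the columns of a small, made-up DataFrame. For the functions marked DEPRECATED, an isinstance check against the dtype class is a common substitute and is what the last line uses.

```python
import pandas as pd
from pandas.api import types as ptypes

df = pd.DataFrame(
    {
        "n": pd.array([1, None], dtype="Int64"),
        "x": [0.5, 1.5],
        "t": pd.to_datetime(["2024-01-01", "2024-01-02"]),
        "c": pd.Categorical(["a", "b"]),
    }
)

print(ptypes.is_integer_dtype(df["n"]))          # True, nullable integers count
print(ptypes.is_extension_array_dtype(df["n"]))  # True
print(ptypes.is_float_dtype(df["x"]))            # True
print(ptypes.is_datetime64_any_dtype(df["t"]))   # True
print(ptypes.is_numeric_dtype(df["c"]))          # False

# Substitute for the deprecated is_categorical_dtype check.
print(isinstance(df["c"].dtype, pd.CategoricalDtype))  # True
```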
Iterable introspection#
Scalar introspection#
api.types.is_bool(obj) | Return True if given object is boolean. |
---|---|
api.types.is_complex(obj) | Return True if given object is complex. |
api.types.is_float(obj) | Return True if given object is float. |
api.types.is_hashable(obj) | Return True if hash(obj) will succeed, False otherwise. |
api.types.is_integer(obj) | Return True if given object is integer. |
api.types.is_interval(obj) | |
api.types.is_number(obj) | Check if the object is a number. |
api.types.is_re(obj) | Check if the object is a regex pattern instance. |
api.types.is_re_compilable(obj) | Check if the object can be compiled into a regex pattern instance. |
api.types.is_scalar(val) | Return True if given object is scalar. |
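A few of the scalar checks above, sketched on arbitrary sample values. Note that these functions inspect the object's type and do not try to parse strings.

```python
import re

import numpy as np
from pandas.api import types as ptypes

print(ptypes.is_scalar(1), ptypes.is_scalar("abc"))  # True True
print(ptypes.is_scalar([1]))                         # False

print(ptypes.is_integer(np.int64(1)))  # True, NumPy integers are accepted
print(ptypes.is_integer(1.0))          # False
print(ptypes.is_float(1.0))            # True
print(ptypes.is_number("1"))           # False, strings are not numbers

print(ptypes.is_hashable(()), ptypes.is_hashable([]))  # True False

print(ptypes.is_re(re.compile("a+")))  # True
print(ptypes.is_re_compilable("a+"))   # True
print(ptypes.is_re_compilable(1))      # False
```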