pandas arrays, scalars, and data types — pandas 2.3.3 documentation
Objects#
For most data types, pandas uses NumPy arrays as the concrete objects contained within an Index, Series, or DataFrame.
For some data types, pandas extends NumPy’s type system. String aliases for these types can be found at dtypes.
| Kind of Data | pandas Data Type | Scalar | Array |
|---|---|---|---|
| TZ-aware datetime | DatetimeTZDtype | Timestamp | Datetimes |
| Timedeltas | (none) | Timedelta | Timedeltas |
| Period (time spans) | PeriodDtype | Period | Periods |
| Intervals | IntervalDtype | Interval | Intervals |
| Nullable Integer | Int64Dtype, … | (none) | Nullable integer |
| Nullable Float | Float64Dtype, … | (none) | Nullable float |
| Categorical | CategoricalDtype | (none) | Categoricals |
| Sparse | SparseDtype | (none) | Sparse |
| Strings | StringDtype | str | Strings |
| Nullable Boolean | BooleanDtype | bool | Nullable Boolean |
| PyArrow | ArrowDtype | Python Scalars or NA | PyArrow |
pandas and third-party libraries can extend NumPy’s type system (see Extension types). The top-level array() function can be used to create a new array, which may be stored in a Series, Index, or as a column in a DataFrame.
| array(data[, dtype, copy]) | Create an array. |
|---|---|
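As a minimal sketch of the top-level array() constructor described above, the following creates a nullable integer array and stores it in a Series and in a DataFrame column. The sample values and the column name "count" are illustrative assumptions, not part of the reference.

```python
import pandas as pd

# Build an extension array; the "Int64" alias selects the nullable integer dtype.
arr = pd.array([1, 2, None], dtype="Int64")

# The same array can back a Series or a DataFrame column.
ser = pd.Series(arr)
df = pd.DataFrame({"count": arr})  # "count" is an illustrative column name

print(type(arr).__name__)  # IntegerArray
print(ser.dtype)           # Int64
print(df["count"].dtype)   # Int64
```

Passing dtype explicitly avoids relying on type inference, which matters when the data contains missing values.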
PyArrow#
Warning
This feature is experimental, and the API can change in a future release without warning.
The arrays.ArrowExtensionArray is backed by a pyarrow.ChunkedArray with a pyarrow.DataType instead of a NumPy array and data type. The .dtype of an arrays.ArrowExtensionArray is an ArrowDtype.
PyArrow provides array and data type support similar to NumPy's, including first-class nullability support for all data types, immutability, and more.
The table below shows the equivalent PyArrow-backed (pa), pandas extension, and NumPy (np) types that are recognized by pandas. The PyArrow-backed types below need to be passed into ArrowDtype to be recognized by pandas, e.g. pd.ArrowDtype(pa.bool_()).
| PyArrow type | pandas extension type | NumPy type |
|---|---|---|
| pyarrow.bool_() | BooleanDtype | np.bool_ |
| pyarrow.int8() | Int8Dtype | np.int8 |
| pyarrow.int16() | Int16Dtype | np.int16 |
| pyarrow.int32() | Int32Dtype | np.int32 |
| pyarrow.int64() | Int64Dtype | np.int64 |
| pyarrow.uint8() | UInt8Dtype | np.uint8 |
| pyarrow.uint16() | UInt16Dtype | np.uint16 |
| pyarrow.uint32() | UInt32Dtype | np.uint32 |
| pyarrow.uint64() | UInt64Dtype | np.uint64 |
| pyarrow.float32() | Float32Dtype | np.float32 |
| pyarrow.float64() | Float64Dtype | np.float64 |
| pyarrow.time32() | (none) | (none) |
| pyarrow.time64() | (none) | (none) |
| pyarrow.timestamp() | DatetimeTZDtype | np.datetime64 |
| pyarrow.date32() | (none) | (none) |
| pyarrow.date64() | (none) | (none) |
| pyarrow.duration() | (none) | np.timedelta64 |
| pyarrow.binary() | (none) | (none) |
| pyarrow.string() | StringDtype | np.str_ |
| pyarrow.decimal128() | (none) | (none) |
| pyarrow.list_() | (none) | (none) |
| pyarrow.map_() | (none) | (none) |
| pyarrow.dictionary() | CategoricalDtype | (none) |
Note
PyArrow-backed string support is provided by both pd.StringDtype("pyarrow") and pd.ArrowDtype(pa.string()). pd.StringDtype("pyarrow") is described below in the string section and will be returned if the string alias "string[pyarrow]" is specified. pd.ArrowDtype(pa.string()) generally has better interoperability with ArrowDtype of different types.
While individual values in an arrays.ArrowExtensionArray are stored as PyArrow objects, scalars are returned as Python scalars corresponding to the data type, e.g. a PyArrow int64 will be returned as a Python int, or NA for missing values.
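The scalar behaviour described above can be sketched as follows. This assumes the optional pyarrow package is installed. The import guard and the sample values are illustrative additions, and the example does nothing when pyarrow is missing.

```python
import pandas as pd

try:
    import pyarrow as pa
except ImportError:  # pyarrow is an optional dependency of pandas
    pa = None

if pa is not None:
    # Wrap the PyArrow type in ArrowDtype so that pandas recognizes it.
    arr = pd.array([1, None, 3], dtype=pd.ArrowDtype(pa.int64()))

    first = arr[0]    # returned as a plain Python int
    missing = arr[1]  # returned as pd.NA

    print(type(arr).__name__, arr.dtype)
    print(type(first).__name__, missing is pd.NA)
```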
| ArrowDtype(pyarrow_dtype) | An ExtensionDtype for PyArrow data types. |
|---|---|
For more information, please see the PyArrow user guide.
Datetimes#
NumPy cannot natively represent timezone-aware datetimes. pandas supports this with the arrays.DatetimeArray extension array, which can hold timezone-naive or timezone-aware values.
Timestamp, a subclass of datetime.datetime, is pandas’ scalar type for timezone-naive or timezone-aware datetime data. NaT is the missing value for datetime data.
| Timestamp([ts_input, year, month, day, ...]) | Pandas replacement for python datetime.datetime object. |
|---|---|
Properties#
| Timestamp.asm8 | Return numpy datetime64 format in nanoseconds. |
|---|---|
| Timestamp.day | |
| Timestamp.dayofweek | Return day of the week. |
| Timestamp.day_of_week | Return day of the week. |
| Timestamp.dayofyear | Return the day of the year. |
| Timestamp.day_of_year | Return the day of the year. |
| Timestamp.days_in_month | Return the number of days in the month. |
| Timestamp.daysinmonth | Return the number of days in the month. |
| Timestamp.fold | |
| Timestamp.hour | |
| Timestamp.is_leap_year | Return True if year is a leap year. |
| Timestamp.is_month_end | Check if the date is the last day of the month. |
| Timestamp.is_month_start | Check if the date is the first day of the month. |
| Timestamp.is_quarter_end | Check if date is last day of the quarter. |
| Timestamp.is_quarter_start | Check if the date is the first day of the quarter. |
| Timestamp.is_year_end | Return True if date is last day of the year. |
| Timestamp.is_year_start | Return True if date is first day of the year. |
| Timestamp.max | |
| Timestamp.microsecond | |
| Timestamp.min | |
| Timestamp.minute | |
| Timestamp.month | |
| Timestamp.nanosecond | |
| Timestamp.quarter | Return the quarter of the year. |
| Timestamp.resolution | |
| Timestamp.second | |
| Timestamp.tz | Alias for tzinfo. |
| Timestamp.tzinfo | |
| Timestamp.unit | The abbreviation associated with self._creso. |
| Timestamp.value | |
| Timestamp.week | Return the week number of the year. |
| Timestamp.weekofyear | Return the week number of the year. |
| Timestamp.year | |
Methods#
| Timestamp.as_unit(unit[, round_ok]) | Convert the underlying int64 representation to the given unit. |
|---|---|
| Timestamp.astimezone(tz) | Convert timezone-aware Timestamp to another time zone. |
| Timestamp.ceil(freq[, ambiguous, nonexistent]) | Return a new Timestamp ceiled to this resolution. |
| Timestamp.combine(date, time) | Combine date, time into datetime with same date and time fields. |
| Timestamp.ctime() | Return ctime() style string. |
| Timestamp.date() | Return date object with same year, month and day. |
| Timestamp.day_name([locale]) | Return the day name of the Timestamp with specified locale. |
| Timestamp.dst() | Return the daylight saving time (DST) adjustment. |
| Timestamp.floor(freq[, ambiguous, nonexistent]) | Return a new Timestamp floored to this resolution. |
| Timestamp.fromordinal(ordinal[, tz]) | Construct a timestamp from a proleptic Gregorian ordinal. |
| Timestamp.fromtimestamp(ts) | Create a Timestamp in local time (or in tz's local time, if tz is given) from a POSIX timestamp. |
| Timestamp.isocalendar() | Return a named tuple containing ISO year, week number, and weekday. |
| Timestamp.isoformat([sep, timespec]) | Return the time formatted according to ISO 8601. |
| Timestamp.isoweekday() | Return the day of the week represented by the date. |
| Timestamp.month_name([locale]) | Return the month name of the Timestamp with specified locale. |
| Timestamp.normalize() | Normalize Timestamp to midnight, preserving tz information. |
| Timestamp.now([tz]) | Return new Timestamp object representing current time local to tz. |
| Timestamp.replace([year, month, day, hour, ...]) | Implements datetime.replace, handles nanoseconds. |
| Timestamp.round(freq[, ambiguous, nonexistent]) | Round the Timestamp to the specified resolution. |
| Timestamp.strftime(format) | Return a formatted string of the Timestamp. |
| Timestamp.strptime(string, format) | Function is not implemented. |
| Timestamp.time() | Return time object with same time but with tzinfo=None. |
| Timestamp.timestamp() | Return POSIX timestamp as float. |
| Timestamp.timetuple() | Return time tuple, compatible with time.localtime(). |
| Timestamp.timetz() | Return time object with same time and tzinfo. |
| Timestamp.to_datetime64() | Return a numpy.datetime64 object with same precision. |
| Timestamp.to_numpy([dtype, copy]) | Convert the Timestamp to a NumPy datetime64. |
| Timestamp.to_julian_date() | Convert Timestamp to a Julian Date. |
| Timestamp.to_period([freq]) | Return a period of which this timestamp is an observation. |
| Timestamp.to_pydatetime([warn]) | Convert a Timestamp object to a native Python datetime object. |
| Timestamp.today([tz]) | Return the current time in the local timezone. |
| Timestamp.toordinal() | Return proleptic Gregorian ordinal. |
| Timestamp.tz_convert(tz) | Convert timezone-aware Timestamp to another time zone. |
| Timestamp.tz_localize(tz[, ambiguous, ...]) | Localize the Timestamp to a timezone. |
| Timestamp.tzname() | Return time zone name. |
| Timestamp.utcfromtimestamp(ts) | Construct a timezone-aware UTC datetime from a POSIX timestamp. |
| Timestamp.utcnow() | Return a new Timestamp representing UTC day and time. |
| Timestamp.utcoffset() | Return utc offset. |
| Timestamp.utctimetuple() | Return UTC time tuple, compatible with time.localtime(). |
| Timestamp.weekday() | Return the day of the week represented by the date. |
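A short sketch of a few of the Timestamp properties and methods listed above. The date and the "Asia/Tokyo" time zone are arbitrary choices for illustration.

```python
import datetime

import pandas as pd

ts = pd.Timestamp("2024-03-15 13:45:30")

# Timestamp subclasses datetime.datetime, so it works wherever a datetime does.
is_datetime = isinstance(ts, datetime.datetime)

# A few properties (2024-03-15 is a Friday; Monday=0).
weekday = ts.day_of_week
quarter = ts.quarter
leap = ts.is_leap_year

# Attach a time zone to the naive value, then convert to another zone.
utc = ts.tz_localize("UTC")
tokyo = utc.tz_convert("Asia/Tokyo")

# Truncate to whole minutes.
floored = ts.floor("min")

print(is_datetime, weekday, quarter, leap)  # True 4 1 True
print(tokyo.hour)                           # 22
print(floored)                              # 2024-03-15 13:45:00
```

tz_localize attaches a zone without shifting the wall-clock time, while tz_convert keeps the instant and changes the wall-clock representation.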
A collection of timestamps may be stored in an arrays.DatetimeArray. For timezone-aware data, the .dtype of an arrays.DatetimeArray is a DatetimeTZDtype. For timezone-naive data, np.dtype("datetime64[ns]") is used.
If the data are timezone-aware, then every value in the array must have the same timezone.
| arrays.DatetimeArray(values[, dtype, freq, copy]) | Pandas ExtensionArray for tz-naive or tz-aware datetime data. |
|---|---|

| DatetimeTZDtype([unit, tz]) | An ExtensionDtype for timezone-aware datetime data. |
|---|---|
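To illustrate the dtype difference between timezone-naive and timezone-aware arrays, here is a small sketch. It obtains each DatetimeArray through the .array attribute of a date_range result, which is one convenient route and not the only one.

```python
import numpy as np
import pandas as pd

# .array exposes the DatetimeArray that backs a DatetimeIndex.
naive = pd.date_range("2024-01-01", periods=3, freq="D").array
aware = pd.date_range("2024-01-01", periods=3, freq="D", tz="UTC").array

# Naive data uses a plain NumPy datetime64 dtype; aware data uses DatetimeTZDtype.
naive_is_numpy = isinstance(naive.dtype, np.dtype)
aware_is_tz = isinstance(aware.dtype, pd.DatetimeTZDtype)

print(type(naive).__name__, naive.dtype, naive_is_numpy)
print(type(aware).__name__, aware.dtype, aware_is_tz)
```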
Timedeltas#
NumPy can natively represent timedeltas. pandas provides Timedelta for symmetry with Timestamp. NaT is the missing value for timedelta data.
| Timedelta([value, unit]) | Represents a duration, the difference between two dates or times. |
|---|---|
Properties#
| Timedelta.asm8 | Return a numpy timedelta64 array scalar view. |
|---|---|
| Timedelta.components | Return a components namedtuple-like. |
| Timedelta.days | Returns the days of the timedelta. |
| Timedelta.max | |
| Timedelta.microseconds | |
| Timedelta.min | |
| Timedelta.nanoseconds | Return the number of nanoseconds (n), where 0 <= n < 1 microsecond. |
| Timedelta.resolution | |
| Timedelta.seconds | Return the total hours, minutes, and seconds of the timedelta as seconds. |
| Timedelta.unit | |
| Timedelta.value | |
| Timedelta.view(dtype) | Array view compatibility. |
Methods#
| Timedelta.as_unit(unit[, round_ok]) | Convert the underlying int64 representation to the given unit. |
|---|---|
| Timedelta.ceil(freq) | Return a new Timedelta ceiled to this resolution. |
| Timedelta.floor(freq) | Return a new Timedelta floored to this resolution. |
| Timedelta.isoformat() | Format the Timedelta as ISO 8601 Duration. |
| Timedelta.round(freq) | Round the Timedelta to the specified resolution. |
| Timedelta.to_pytimedelta() | Convert a pandas Timedelta object into a python datetime.timedelta object. |
| Timedelta.to_timedelta64() | Return a numpy.timedelta64 object with 'ns' precision. |
| Timedelta.to_numpy([dtype, copy]) | Convert the Timedelta to a NumPy timedelta64. |
| Timedelta.total_seconds() | Total seconds in the duration. |
A collection of Timedelta may be stored in a TimedeltaArray.
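A brief sketch of Timedelta as a scalar and of a TimedeltaArray as its container. The durations are arbitrary sample values.

```python
import datetime

import pandas as pd

td = pd.Timedelta(days=1, hours=2, minutes=30)

total = td.total_seconds()   # 86400 + 7200 + 1800 seconds
parts = td.components        # namedtuple-like breakdown (days, hours, minutes, ...)
py_td = td.to_pytimedelta()  # standard-library equivalent

# Subtracting two Timestamps yields a Timedelta.
diff = pd.Timestamp("2024-01-03") - pd.Timestamp("2024-01-01")

# .array exposes the TimedeltaArray that backs a TimedeltaIndex.
tds = pd.to_timedelta(["1 day", "36 hours"]).array

print(total)                      # 95400.0
print(parts.days, parts.hours, parts.minutes)
print(diff == pd.Timedelta(days=2))
print(type(tds).__name__)
```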
Periods#
pandas represents spans of times as Period objects.
Period#
| Period([value, freq, ordinal, year, month, ...]) | Represents a period of time. |
|---|---|
Properties#
| Period.day | Get day of the month that a Period falls on. |
|---|---|
| Period.dayofweek | Day of the week the period lies in, with Monday=0 and Sunday=6. |
| Period.day_of_week | Day of the week the period lies in, with Monday=0 and Sunday=6. |
| Period.dayofyear | Return the day of the year. |
| Period.day_of_year | Return the day of the year. |
| Period.days_in_month | Get the total number of days in the month that this period falls on. |
| Period.daysinmonth | Get the total number of days of the month that this period falls on. |
| Period.end_time | Get the Timestamp for the end of the period. |
| Period.freq | |
| Period.freqstr | Return a string representation of the frequency. |
| Period.hour | Get the hour of the day component of the Period. |
| Period.is_leap_year | Return True if the period's year is in a leap year. |
| Period.minute | Get minute of the hour component of the Period. |
| Period.month | Return the month this Period falls on. |
| Period.ordinal | |
| Period.quarter | Return the quarter this Period falls on. |
| Period.qyear | Fiscal year the Period lies in according to its starting-quarter. |
| Period.second | Get the second component of the Period. |
| Period.start_time | Get the Timestamp for the start of the period. |
| Period.week | Get the week of the year on the given Period. |
| Period.weekday | Day of the week the period lies in, with Monday=0 and Sunday=6. |
| Period.weekofyear | Get the week of the year on the given Period. |
| Period.year | Return the year this Period falls on. |
Methods#
A collection of Period may be stored in an arrays.PeriodArray. Every period in an arrays.PeriodArray must have the same freq.
| arrays.PeriodArray(values[, dtype, freq, copy]) | Pandas ExtensionArray for storing Period data. |
|---|---|
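The following sketch shows a monthly Period and a PeriodArray whose elements share one frequency. The dates are arbitrary, and the array is obtained through period_range(...).array for brevity.

```python
import pandas as pd

p = pd.Period("2024-03", freq="M")

# A Period is a span: it has a start and an end rather than a single instant.
start, end = p.start_time, p.end_time
next_p = p + 1  # arithmetic moves in units of the period's frequency

# .array exposes the PeriodArray that backs a PeriodIndex.
periods = pd.period_range("2024-01", periods=3, freq="M").array

print(start, end.day, p.days_in_month)
print(next_p)
print(type(periods).__name__, periods.dtype)
```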
Intervals#
Arbitrary intervals can be represented as Interval objects.
| Interval | Immutable object implementing an Interval, a bounded slice-like interval. |
|---|---|
Properties#
| Interval.closed | String describing the inclusive side of the interval. |
|---|---|
| Interval.closed_left | Check if the interval is closed on the left side. |
| Interval.closed_right | Check if the interval is closed on the right side. |
| Interval.is_empty | Indicates if an interval is empty, meaning it contains no points. |
| Interval.left | Left bound for the interval. |
| Interval.length | Return the length of the Interval. |
| Interval.mid | Return the midpoint of the Interval. |
| Interval.open_left | Check if the interval is open on the left side. |
| Interval.open_right | Check if the interval is open on the right side. |
| Interval.overlaps(other) | Check whether two Interval objects overlap. |
| Interval.right | Right bound for the interval. |
A collection of intervals may be stored in an arrays.IntervalArray.
| arrays.IntervalArray(data[, closed, dtype, ...]) | Pandas array for interval data that are closed on the same side. |
|---|---|

| IntervalDtype([subtype, closed]) | An ExtensionDtype for Interval data. |
|---|---|
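A minimal sketch of Interval membership, overlap, and an IntervalArray built from break points. The bounds are arbitrary sample values.

```python
import pandas as pd

iv = pd.Interval(0, 5, closed="right")  # the half-open interval (0, 5]

# Membership respects which side is closed.
has_left = 0 in iv   # False: the left end is open
has_right = 5 in iv  # True: the right end is closed

# Intervals overlap only if they share a point that both of them contain.
touches_closed = iv.overlaps(pd.Interval(5, 10, closed="left"))  # both contain 5
touches_open = iv.overlaps(pd.Interval(5, 10, closed="right"))   # 5 is excluded on the right-hand interval

# Every element of an IntervalArray is closed on the same side.
arr = pd.arrays.IntervalArray.from_breaks([0, 1, 2, 3])

print(has_left, has_right, iv.length, iv.mid)
print(touches_closed, touches_open)
print(len(arr), arr.closed)
```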
Nullable integer#
numpy.ndarray cannot natively represent integer data with missing values. pandas provides this through arrays.IntegerArray.
| Int8Dtype() | An ExtensionDtype for int8 integer data. |
|---|---|
| Int16Dtype() | An ExtensionDtype for int16 integer data. |
| Int32Dtype() | An ExtensionDtype for int32 integer data. |
| Int64Dtype() | An ExtensionDtype for int64 integer data. |
| UInt8Dtype() | An ExtensionDtype for uint8 integer data. |
| UInt16Dtype() | An ExtensionDtype for uint16 integer data. |
| UInt32Dtype() | An ExtensionDtype for uint32 integer data. |
| UInt64Dtype() | An ExtensionDtype for uint64 integer data. |
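As a sketch of why the nullable integer dtypes exist, the example below compares the default NumPy-backed result with the "Int64" extension dtype for the same input. The values are illustrative.

```python
import numpy as np
import pandas as pd

# NumPy-backed: the missing value forces an upcast to float64 (NaN marks the gap).
plain = pd.Series([1, 2, None])

# Nullable integer: values stay integers and the gap is stored as pd.NA.
nullable = pd.Series([1, 2, None], dtype="Int64")

total = nullable.sum()  # missing values are skipped by default

print(plain.dtype)     # float64
print(nullable.dtype)  # Int64
print(total)           # 3
```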
Nullable float#
Categoricals#
pandas defines a custom data type for representing data that can take only a limited, fixed set of values. The dtype of a Categorical can be described by a CategoricalDtype.
| CategoricalDtype([categories, ordered]) | Type for categorical data with the categories and orderedness. |
|---|---|
Categorical data can be stored in a pandas.Categorical.
| Categorical(values[, categories, ordered, ...]) | Represent a categorical variable in classic R / S-plus fashion. |
|---|---|
The alternative Categorical.from_codes() constructor can be used when you already have the categories and integer codes.
The dtype information is available on the Categorical.
np.asarray(categorical) works by implementing the array interface. Be aware that this converts the Categorical back to a NumPy array, so the categories and order information are not preserved!
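The three points above (building from codes, reading the dtype, and converting to NumPy) can be sketched together. The category labels are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Each code is a position in `categories`.
cat = pd.Categorical.from_codes(
    [0, 1, 0, 2], categories=["low", "mid", "high"], ordered=True
)

# The dtype carries the categories and the orderedness.
dtype = cat.dtype

# Converting to NumPy keeps the values but drops the categorical metadata.
plain = np.asarray(cat)

print(list(cat))                          # ['low', 'mid', 'low', 'high']
print(list(dtype.categories), dtype.ordered)
print(type(plain).__name__, hasattr(plain, "categories"))
```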
A Categorical can be stored in a Series or DataFrame. To create a Series of dtype category, use cat = s.astype(dtype) or Series(..., dtype=dtype), where dtype is either
- the string 'category'
- an instance of CategoricalDtype.
If the Series is of dtype CategoricalDtype, Series.cat can be used to change the categorical data. See Categorical accessor for more.
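A short sketch of both ways to obtain a categorical Series, followed by the Series.cat accessor. The labels and category order are illustrative.

```python
import pandas as pd

# Option 1: the string alias infers the categories from the data.
inferred = pd.Series(["a", "b", "a"], dtype="category")

# Option 2: an explicit CategoricalDtype fixes the categories and their order.
dtype = pd.CategoricalDtype(categories=["b", "a", "c"], ordered=True)
converted = pd.Series(["a", "b", "a"]).astype(dtype)

# The .cat accessor exposes and modifies the categorical metadata.
codes = converted.cat.codes.tolist()
extended = converted.cat.add_categories(["d"])

print(list(inferred.cat.categories))   # ['a', 'b']
print(codes)                           # [1, 0, 1]
print(list(extended.cat.categories))   # ['b', 'a', 'c', 'd']
```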
Sparse#
Data where a single value is repeated many times (e.g. 0 or NaN) may be stored efficiently as an arrays.SparseArray.
| SparseDtype([dtype, fill_value]) | Dtype for data stored in SparseArray. |
|---|---|
The Series.sparse accessor may be used to access sparse-specific attributes and methods if the Series contains sparse values. See Sparse accessor and the user guide for more.
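A minimal sketch of a SparseArray and the Series.sparse accessor, using 0 as the repeated fill value. The data is illustrative.

```python
import pandas as pd

# Only the two non-zero entries are physically stored.
arr = pd.arrays.SparseArray([0, 0, 1, 0, 0, 2, 0], fill_value=0)
ser = pd.Series(arr)

stored = ser.sparse.npoints    # number of stored (non-fill) values
density = ser.sparse.density   # stored values divided by total length
dense = ser.sparse.to_dense()  # back to an ordinary dense Series

print(arr.dtype)
print(stored, ser.sparse.fill_value)
print(dense.tolist())          # [0, 0, 1, 0, 0, 2, 0]
```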
Strings#
When working with text data, where each valid element is a string or missing, we recommend using StringDtype (with the alias "string").
| StringDtype([storage, na_value]) | Extension dtype for string data. |
|---|---|
The Series.str accessor is available for Series backed by an arrays.StringArray. See String handling for more.
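The recommendation above can be sketched as follows, using the "string" alias and the Series.str accessor. The sample words are illustrative.

```python
import pandas as pd

# The "string" alias selects StringDtype; None is stored as a missing value.
ser = pd.Series(["apple", None, "cherry"], dtype="string")

upper = ser.str.upper()   # missing values propagate through string methods
lengths = ser.str.len()

print(ser.dtype)
print(upper[0], upper[1] is pd.NA)
print(lengths[2])         # 6
```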
Nullable Boolean#
The boolean dtype (with the alias "boolean") provides support for storing boolean data (True, False) with missing values, which is not possible with a bool numpy.ndarray.
| arrays.BooleanArray(values, mask[, copy]) | Array of boolean (True/False) data with missing values. |
|---|---|
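A sketch of the nullable boolean dtype, built once through the "boolean" alias and once through the arrays.BooleanArray(values, mask) constructor listed above. In the second form, a True entry in mask marks a missing value. The data is illustrative.

```python
import numpy as np
import pandas as pd

# Via the alias: None becomes pd.NA.
flags = pd.array([True, False, None], dtype="boolean")

# Via the constructor: `values` holds the data, `mask` marks the missing slots.
direct = pd.arrays.BooleanArray(
    np.array([True, False, False]),  # the value under a masked slot is ignored
    np.array([False, False, True]),
)

# Logical operations follow three-valued (Kleene) logic.
or_true = flags | True     # NA | True is True
and_true = flags & True    # NA & True stays NA

print(flags.dtype)
print(list(or_true))       # [True, True, True]
print(and_true[2] is pd.NA)
```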
Utilities#
Constructors#
Data type introspection#
| api.types.is_any_real_numeric_dtype(arr_or_dtype) | Check whether the provided array or dtype is of a real number dtype. |
|---|---|
| api.types.is_bool_dtype(arr_or_dtype) | Check whether the provided array or dtype is of a boolean dtype. |
| api.types.is_categorical_dtype(arr_or_dtype) | (DEPRECATED) Check whether an array-like or dtype is of the Categorical dtype. |
| api.types.is_complex_dtype(arr_or_dtype) | Check whether the provided array or dtype is of a complex dtype. |
| api.types.is_datetime64_any_dtype(arr_or_dtype) | Check whether the provided array or dtype is of the datetime64 dtype. |
| api.types.is_datetime64_dtype(arr_or_dtype) | Check whether an array-like or dtype is of the datetime64 dtype. |
| api.types.is_datetime64_ns_dtype(arr_or_dtype) | Check whether the provided array or dtype is of the datetime64[ns] dtype. |
| api.types.is_datetime64tz_dtype(arr_or_dtype) | (DEPRECATED) Check whether an array-like or dtype is of a DatetimeTZDtype dtype. |
| api.types.is_extension_array_dtype(arr_or_dtype) | Check if an object is a pandas extension array type. |
| api.types.is_float_dtype(arr_or_dtype) | Check whether the provided array or dtype is of a float dtype. |
| api.types.is_int64_dtype(arr_or_dtype) | (DEPRECATED) Check whether the provided array or dtype is of the int64 dtype. |
| api.types.is_integer_dtype(arr_or_dtype) | Check whether the provided array or dtype is of an integer dtype. |
| api.types.is_interval_dtype(arr_or_dtype) | (DEPRECATED) Check whether an array-like or dtype is of the Interval dtype. |
| api.types.is_numeric_dtype(arr_or_dtype) | Check whether the provided array or dtype is of a numeric dtype. |
| api.types.is_object_dtype(arr_or_dtype) | Check whether an array-like or dtype is of the object dtype. |
| api.types.is_period_dtype(arr_or_dtype) | (DEPRECATED) Check whether an array-like or dtype is of the Period dtype. |
| api.types.is_signed_integer_dtype(arr_or_dtype) | Check whether the provided array or dtype is of a signed integer dtype. |
| api.types.is_string_dtype(arr_or_dtype) | Check whether the provided array or dtype is of the string dtype. |
| api.types.is_timedelta64_dtype(arr_or_dtype) | Check whether an array-like or dtype is of the timedelta64 dtype. |
| api.types.is_timedelta64_ns_dtype(arr_or_dtype) | Check whether the provided array or dtype is of the timedelta64[ns] dtype. |
| api.types.is_unsigned_integer_dtype(arr_or_dtype) | Check whether the provided array or dtype is of an unsigned integer dtype. |
| api.types.is_sparse(arr) | (DEPRECATED) Check whether an array-like is a 1-D pandas sparse array. |
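A few of the dtype checks listed above, sketched on small sample Series. For the checks marked DEPRECATED, the example uses an isinstance test against the dtype class instead. The alias ptypes is an arbitrary local name.

```python
import pandas as pd
from pandas.api import types as ptypes

ints = pd.Series([1, 2, 3])
nullable = pd.Series([1, None], dtype="Int64")
stamps = pd.Series(pd.date_range("2024-01-01", periods=2, tz="UTC"))
cats = pd.Series(["a", "b"], dtype="category")

checks = {
    "ints is integer": ptypes.is_integer_dtype(ints),
    "ints is float": ptypes.is_float_dtype(ints),
    "nullable is integer": ptypes.is_integer_dtype(nullable),
    "nullable is extension": ptypes.is_extension_array_dtype(nullable),
    "stamps is datetime64 (any)": ptypes.is_datetime64_any_dtype(stamps),
    "cats is numeric": ptypes.is_numeric_dtype(cats),
    # Replacement for the deprecated is_categorical_dtype check.
    "cats is categorical": isinstance(cats.dtype, pd.CategoricalDtype),
}

for label, result in checks.items():
    print(f"{label}: {result}")
```

The functions accept either an array-like or a dtype, so ptypes.is_integer_dtype("int64") is also valid.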
Iterable introspection#
Scalar introspection#
| api.types.is_bool(obj) | Return True if given object is boolean. |
|---|---|
| api.types.is_complex(obj) | Return True if given object is complex. |
| api.types.is_float(obj) | Return True if given object is float. |
| api.types.is_hashable(obj) | Return True if hash(obj) will succeed, False otherwise. |
| api.types.is_integer(obj) | Return True if given object is integer. |
| api.types.is_interval(obj) | |
| api.types.is_number(obj) | Check if the object is a number. |
| api.types.is_re(obj) | Check if the object is a regex pattern instance. |
| api.types.is_re_compilable(obj) | Check if the object can be compiled into a regex pattern instance. |
| api.types.is_scalar(val) | Return True if given object is scalar. |
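The scalar checks above operate on single objects rather than on arrays or dtypes. A minimal sketch with illustrative inputs (the alias ptypes is an arbitrary local name):

```python
import re

import pandas as pd
from pandas.api import types as ptypes

results = {
    "is_scalar(3)": ptypes.is_scalar(3),
    "is_scalar('abc')": ptypes.is_scalar("abc"),
    "is_scalar([1, 2])": ptypes.is_scalar([1, 2]),
    "is_scalar(Timestamp)": ptypes.is_scalar(pd.Timestamp("2024-01-01")),
    "is_integer(1)": ptypes.is_integer(1),
    "is_integer(1.0)": ptypes.is_integer(1.0),
    "is_integer(True)": ptypes.is_integer(True),  # bool is not treated as integer
    "is_float(1.0)": ptypes.is_float(1.0),
    "is_bool(True)": ptypes.is_bool(True),
    "is_hashable([])": ptypes.is_hashable([]),
    "is_re(compiled)": ptypes.is_re(re.compile("a+")),
    "is_re_compilable('.*')": ptypes.is_re_compilable(".*"),
    "is_re_compilable(1)": ptypes.is_re_compilable(1),
}

for label, result in results.items():
    print(f"{label}: {result}")
```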