turicreate.SFrame.pack_columns — Turi Create API 6.4.1 documentation
SFrame.pack_columns(column_names=None, column_name_prefix=None, dtype=list, fill_na=None, remove_prefix=True, new_column_name=None)
Pack columns of the current SFrame into one single column. The result is a new SFrame with the unaffected columns from the original SFrame plus the newly created column.
The columns to pack are chosen through either the column_names or the column_name_prefix parameter; only one of the two may be provided. column_names explicitly lists the columns to pack, while column_name_prefix packs every column whose name starts with the given prefix.
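The selection rule can be modelled in plain Python. This is only a sketch of the documented behaviour, not Turi Create's implementation; `select_columns` is a hypothetical helper, and the exception type is an assumption.

```python
def select_columns(all_columns, column_names=None, column_name_prefix=None):
    """Return the columns that would be packed (hypothetical helper)."""
    if column_names is not None and column_name_prefix is not None:
        # The two parameters are mutually exclusive; ValueError is an assumption.
        raise ValueError("Provide column_names or column_name_prefix, not both.")
    if column_name_prefix is not None:
        # Every column whose name starts with the prefix is selected.
        return [name for name in all_columns if name.startswith(column_name_prefix)]
    if column_names is not None:
        return list(column_names)
    # Neither argument given: all columns are packed.
    return list(all_columns)


columns = ["business", "category.retail", "category.food"]
by_prefix = select_columns(columns, column_name_prefix="category")
by_name = select_columns(columns, column_names=["business", "category.food"])
print(by_prefix)  # ['category.retail', 'category.food']
print(by_name)    # ['business', 'category.food']
```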
The type of the resulting column is decided by the dtype parameter. Allowed values for dtype are dict, array.array and list:
- dict: pack to a dictionary SArray, where each column name becomes a dictionary key and each column value becomes the corresponding dictionary value.
- array.array: pack all values from the packing columns into an array.
- list: pack all values from the packing columns into a list.
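To make the three target types concrete, the sketch below packs the values of a single row in plain Python. It only models the description above and is not the library's code: `pack_row` is a hypothetical helper, and the `"d"` (double) type code is an assumption suggested by the float values in the array example on this page.

```python
import array


def pack_row(names, values, dtype=list):
    """Pack one row's values into the requested container (hypothetical helper)."""
    if dtype is dict:
        # Column name -> column value.
        return dict(zip(names, values))
    if dtype is array.array:
        # Assumption: values are stored as doubles.
        return array.array("d", values)
    if dtype is list:
        return list(values)
    raise TypeError("dtype must be dict, array.array or list")


names = ["category.retail", "category.food"]
row = [1, 2]
as_dict = pack_row(names, row, dict)
as_array = pack_row(names, row, array.array)
as_list = pack_row(names, row, list)
print(as_dict)   # {'category.retail': 1, 'category.food': 2}
print(as_array)  # array('d', [1.0, 2.0])
print(as_list)   # [1, 2]
```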
Parameters:
    column_names : list[str], optional
        A list of column names to be packed. If omitted and column_name_prefix is not specified, all columns from the current SFrame are packed. This parameter is mutually exclusive with the column_name_prefix parameter.
    column_name_prefix : str, optional
        Pack all columns with the given column_name_prefix. This parameter is mutually exclusive with the column_names parameter.
    dtype : dict | array.array | list, optional
        The resulting packed column type. If not provided, dtype is list.
    fill_na : value, optional
        Value to fill into the packed column if a missing value is encountered. If packing to dictionary, fill_na is only applicable to dictionary values; missing keys are not replaced.
    remove_prefix : bool, optional
        If True and column_name_prefix is specified, the dictionary key is constructed by removing the prefix from the column name. This option is only applicable when packing to dict type.
    new_column_name : str, optional
        Packed column name. If not given and column_name_prefix is given, the prefix is used as the new column name; otherwise a name is generated automatically.
Returns:
    out : SFrame
        An SFrame that contains the columns that are not packed, plus the newly packed column.
Notes
- When packing to a dictionary, a missing key is always dropped. Missing values are dropped if fill_na is not provided; otherwise each missing value is replaced by fill_na. When packing to a list or array, missing values are kept unless fill_na is provided, in which case each missing value is replaced by fill_na.
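The missing-value rules in this note can be sketched in plain Python, with None standing in for a missing value. This models only the dict and list cases and is not Turi Create's implementation; `pack_with_missing` is a hypothetical helper.

```python
def pack_with_missing(names, values, dtype=list, fill_na=None):
    """Model how missing values (None) are handled when packing one row."""
    if dtype is dict:
        packed = {}
        for name, value in zip(names, values):
            if value is None:
                if fill_na is None:
                    continue  # no fill_na: the missing value is dropped
                value = fill_na  # fill_na given: the missing value is replaced
            packed[name] = value
        return packed
    # list: missing values are kept unless fill_na is given
    return [fill_na if value is None else value for value in values]


names = ["retail", "food"]
row = [1, None]
dict_dropped = pack_with_missing(names, row, dict)
dict_filled = pack_with_missing(names, row, dict, fill_na=0)
list_kept = pack_with_missing(names, row, list)
list_filled = pack_with_missing(names, row, list, fill_na=0)
print(dict_dropped)  # {'retail': 1}
print(dict_filled)   # {'retail': 1, 'food': 0}
print(list_kept)     # [1, None]
print(list_filled)   # [1, 0]
```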
Examples
Suppose ‘sf’ is an SFrame that maintains business category information:
>>> sf = turicreate.SFrame({'business': range(1, 5),
...                         'category.retail': [1, None, 1, None],
...                         'category.food': [1, 1, None, None],
...                         'category.service': [None, 1, 1, None],
...                         'category.shop': [1, 1, None, 1]})
>>> sf
+----------+-----------------+---------------+------------------+---------------+
| business | category.retail | category.food | category.service | category.shop |
+----------+-----------------+---------------+------------------+---------------+
|    1     |        1        |       1       |       None       |       1       |
|    2     |      None       |       1       |        1         |       1       |
|    3     |        1        |     None      |        1         |     None      |
|    4     |      None       |     None      |       None       |       1       |
+----------+-----------------+---------------+------------------+---------------+
[4 rows x 5 columns]
To pack all category columns into a list:
>>> sf.pack_columns(column_name_prefix='category')
+----------+-----------------------+
| business |       category        |
+----------+-----------------------+
|    1     |    [1, 1, None, 1]    |
|    2     |    [1, None, 1, 1]    |
|    3     |  [None, 1, 1, None]   |
|    4     | [None, None, None, 1] |
+----------+-----------------------+
[4 rows x 2 columns]
To pack all category columns into a dictionary, with new column name:
>>> sf.pack_columns(column_name_prefix='category', dtype=dict,
...                 new_column_name='new name')
+----------+-------------------------------+
| business |           new name            |
+----------+-------------------------------+
|    1     | {'food': 1, 'shop': 1, 're... |
|    2     | {'food': 1, 'shop': 1, 'se... |
|    3     |  {'retail': 1, 'service': 1}  |
|    4     |          {'shop': 1}          |
+----------+-------------------------------+
[4 rows x 2 columns]
To keep column prefix in the resulting dict key:
>>> sf.pack_columns(column_name_prefix='category', dtype=dict, remove_prefix=False)
+----------+-------------------------------+
| business |           category            |
+----------+-------------------------------+
|    1     | {'category.retail': 1, 'ca... |
|    2     | {'category.food': 1, 'cate... |
|    3     | {'category.retail': 1, 'ca... |
|    4     |     {'category.shop': 1}      |
+----------+-------------------------------+
[4 rows x 2 columns]
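The difference between the two dictionary examples above comes down to how keys are derived from column names. The sketch below is a plain-Python model, not the library's code. `dict_keys` is a hypothetical helper, and dropping a single '.' after the prefix is an assumption inferred from the example output, where 'category.retail' becomes 'retail'.

```python
def dict_keys(column_names, prefix, remove_prefix=True):
    """Derive dictionary keys from packed column names (hypothetical helper)."""
    if not remove_prefix:
        # Keys are the full column names.
        return list(column_names)
    keys = []
    for name in column_names:
        key = name[len(prefix):]
        # Assumption: one '.' separator directly after the prefix is dropped too.
        if key.startswith("."):
            key = key[1:]
        keys.append(key)
    return keys


packed = ["category.retail", "category.food"]
short_keys = dict_keys(packed, "category")
full_keys = dict_keys(packed, "category", remove_prefix=False)
print(short_keys)  # ['retail', 'food']
print(full_keys)   # ['category.retail', 'category.food']
```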
To explicitly pack a set of columns:
>>> sf.pack_columns(column_names=['business', 'category.retail',
...                               'category.food', 'category.service',
...                               'category.shop'])
+--------------------------+
|            X1            |
+--------------------------+
|    [1, 1, 1, None, 1]    |
|    [2, None, 1, 1, 1]    |
|  [3, 1, None, 1, None]   |
| [4, None, None, None, 1] |
+--------------------------+
[4 rows x 1 columns]
To pack all columns whose names start with ‘category’ into an array, replacing missing values with 0:
>>> import array
>>> sf.pack_columns(column_name_prefix="category", dtype=array.array,
...                 fill_na=0)
+----------+----------------------+
| business |       category       |
+----------+----------------------+
|    1     | [1.0, 1.0, 0.0, 1.0] |
|    2     | [1.0, 0.0, 1.0, 1.0] |
|    3     | [0.0, 1.0, 1.0, 0.0] |
|    4     | [0.0, 0.0, 0.0, 1.0] |
+----------+----------------------+
[4 rows x 2 columns]