torch_frame.transforms — pytorch-frame documentation (original) (raw)
Contents
Transforms
PyTorch Frame allows for data transformation across different stype
’s or within the same stype
. Transforms takes in both TensorFrame
and column stats.
Let’s look an example, where we apply CatToNumTransform to transform the categorical features into numerical features.
from torch_frame.datasets import Yandex from torch_frame.transforms import CatToNumTransform from torch_frame import stype
dataset = Yandex(root='/tmp/adult', name='adult') dataset.materialize() transform = CatToNumTransform() train_dataset = dataset.get_split('train')
train_dataset.tensor_frame.col_names_dict[stype.categorical]
['C_feature_0', 'C_feature_1', 'C_feature_2', 'C_feature_3', 'C_feature_4', 'C_feature_5', 'C_feature_6', 'C_feature_7']
test_dataset = dataset.get_split('test') transform.fit(train_dataset.tensor_frame, dataset.col_stats)
transformed_col_stats = transform.transformed_stats
transformed_col_stats.keys()
dict_keys(['C_feature_0_0', 'C_feature_1_0', 'C_feature_2_0', 'C_feature_3_0', 'C_feature_4_0', 'C_feature_5_0', 'C_feature_6_0', 'C_feature_7_0'])
transformed_col_stats['C_feature_0_0']
{<StatType.MEAN: 'MEAN'>: 0.6984029484029484, <StatType.STD: 'STD'>: 0.45895127199411595, <StatType.QUANTILES: 'QUANTILES'>: [0.0, 0.0, 1.0, 1.0, 1.0]}
transform(test_dataset.tensor_frame)
TensorFrame( num_cols=14, num_rows=16281, numerical (14): ['N_feature_0', 'N_feature_1', 'N_feature_2', 'N_feature_3', 'N_feature_4', 'N_feature_5', 'C_feature_0_0', 'C_feature_1_0', 'C_feature_2_0', 'C_feature_3_0', 'C_feature_4_0', 'C_feature_5_0', 'C_feature_6_0', 'C_feature_7_0'], has_target=True, device=cpu, )
You can see that after the transform, the column names of the categorical features changes and the categorical features are transformed into numerical features.