turicreate.activity_classifier.util.random_split_by_session - Turi Create API 6.4.1 documentation
turicreate.activity_classifier.util.random_split_by_session(dataset, session_id, fraction=0.9, seed=None)
Randomly split an SFrame into two SFrames based on the session_id such that one split contains all data for a fraction of the sessions while the second split contains all data for the rest of the sessions.
Parameters:
dataset : SFrame
    Dataset to split. It must contain a column of session ids.
session_id : string, optional
    The name of the column in dataset that corresponds to a unique identifier for each session.
fraction : float, optional
    Fraction of the sessions to fetch for the first returned SFrame. Must be between 0 and 1. Once the sessions are split, all data from a single session is in the same SFrame.
seed : int, optional
    Seed for the random number generator used to split.
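The session-wise behaviour described above can be illustrated with a minimal pure-Python sketch. This is not Turi Create's implementation: the helper name split_rows_by_session, the list-of-dicts input, and the shuffle-then-cut strategy are all assumptions chosen for illustration, and the library may select sessions differently.

```python
import random


def split_rows_by_session(rows, session_key, fraction=0.9, seed=None):
    """Hypothetical helper (not part of Turi Create): split a list of dict
    rows so that every session lands entirely in one of the two outputs."""
    if not 0 <= fraction <= 1:
        raise ValueError("fraction must be between 0 and 1")

    # Collect the unique session ids in a deterministic order, then shuffle
    # them with a seeded generator so the split is reproducible.
    session_ids = sorted({row[session_key] for row in rows})
    rng = random.Random(seed)
    rng.shuffle(session_ids)

    # Assumption: take an exact share of the shuffled sessions for split one.
    cut = round(len(session_ids) * fraction)
    first_ids = set(session_ids[:cut])

    # Route each row by its session id, so a session is never divided.
    first = [r for r in rows if r[session_key] in first_ids]
    second = [r for r in rows if r[session_key] not in first_ids]
    return first, second


# Ten sessions with three rows each.
rows = [{"session_id": s, "x": i} for s in range(10) for i in range(3)]
train, valid = split_rows_by_session(rows, "session_id", fraction=0.9, seed=0)
train_sessions = {r["session_id"] for r in train}
valid_sessions = {r["session_id"] for r in valid}
```

The key point is that the random choice is made per session rather than per row, which is why fraction refers to a share of sessions and not a share of rows.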
Examples
Split the data so that train has 90% of the sessions.
>>> import turicreate as tc
>>> train, valid = tc.activity_classifier.util.random_split_by_session(
...     dataset, session_id='session_id', fraction=0.9)
For example, if dataset has 2055 sessions:
>>> len(dataset['session_id'].unique())
2055
The training set now has 90% of the sessions:
>>> len(train['session_id'].unique())
1850
The validation set has the remaining 10% of the sessions:
>>> len(valid['session_id'].unique())
205
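One way to sanity-check a split like the one above is to confirm that no session id appears in both outputs and to compute the share of sessions that ended up in the first one. The helper below is a sketch only: summarize_session_split is a hypothetical name, not a Turi Create function, and it is exercised here with stand-in integer ids that match the counts in the example.

```python
def summarize_session_split(train_ids, valid_ids):
    """Hypothetical helper: compare the session ids found in two splits."""
    train_set, valid_set = set(train_ids), set(valid_ids)
    total = len(train_set | valid_set)
    return {
        "train_sessions": len(train_set),
        "valid_sessions": len(valid_set),
        # Should be 0 when every session is kept whole in one split.
        "shared_sessions": len(train_set & valid_set),
        "train_fraction": len(train_set) / total if total else 0.0,
    }


# Stand-in ids: 1850 sessions in one split, 205 in the other.
report = summarize_session_split(range(1850), range(1850, 2055))
print(report)
```

With real data, the same helper could presumably be called as summarize_session_split(train['session_id'], valid['session_id']), assuming the column values can be iterated over like any Python sequence.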