The implementation is based on tf.data.Dataset.from_tensor_slices. This class is intended only for constructing toy federated datasets, especially to support simulation tests. Using this for large datasets is _not_ recommended, as it requires putting all client data into the underlying TensorFlow graph (which is memory intensive).
Args
tensor_slices_dict
A dictionary keyed by client_id, where values are lists, tuples, or dicts for passing to tf.data.Dataset.from_tensor_slices. Note that namedtuples and attrs classes are not explicitly supported, but a user can convert their data from those formats to a dict, and then use this class. The leaves of this dictionary must not be tf.Tensors, in order to avoid putting eager tensors into graphs.
Raises
ValueError
If a client with no data is found.
TypeError
If tensor_slices_dict is not a dictionary, or its value structures are namedtuples, or its value structures are not either strictly lists, strictly (standard, non-named) tuples, or strictly dictionaries.
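The structural rules above can be sketched in plain Python. The helper `check_tensor_slices_dict` below is hypothetical and not part of the library; it only mirrors the documented checks, and it assumes that an empty container is what counts as "a client with no data". The snippet also shows converting namedtuple values to dicts with `_asdict()` before use.

```python
import collections


def _is_namedtuple(value):
    # Namedtuples are tuple subclasses that expose a `_fields` attribute.
    return isinstance(value, tuple) and hasattr(value, "_fields")


def check_tensor_slices_dict(tensor_slices_dict):
    """Hypothetical helper mirroring the documented constructor checks."""
    if not isinstance(tensor_slices_dict, dict):
        raise TypeError("tensor_slices_dict must be a dictionary.")
    for client_id, value in tensor_slices_dict.items():
        if _is_namedtuple(value):
            raise TypeError(f"Client {client_id!r}: namedtuples are not supported.")
        if not isinstance(value, (list, tuple, dict)):
            raise TypeError(f"Client {client_id!r}: expected a list, tuple, or dict.")
        # Assumption: an empty container means the client has no data.
        if len(value) == 0:
            raise ValueError(f"Client {client_id!r} has no data.")


Example = collections.namedtuple("Example", ["x", "y"])
raw = {"client_a": Example(x=[1, 2], y=[0, 1])}

# Convert namedtuple values to plain dicts before building the dataset.
converted = {cid: dict(value._asdict()) for cid, value in raw.items()}
check_tensor_slices_dict(converted)  # passes without raising
```

Passing `raw` directly to the helper raises TypeError, which matches the documented behavior for namedtuple value structures.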
A list of string identifiers for clients in this dataset.
dataset_computation
A tff.Computation accepting a client ID, returning a dataset.
element_type_structure
The element type information of the client datasets, that is, the type of the elements returned by datasets in this ClientData object.
serializable_dataset_fn
A callable accepting a client ID and returning a tf.data.Dataset. Note that this callable must be traceable by TF, as it will be used in the context of a tf.function.
Creates a new tf.data.Dataset containing the client training examples.
This function will create a dataset for a given client, given that client_id is contained in the client_ids property of the ClientData. Unlike create_dataset, this method need not be serializable.
Creates a new tf.data.Dataset containing all client examples.
This function is intended for use in training centralized, non-distributed models (num_clients=1). This can be useful as a point of comparison against federated models.
Currently, the implementation produces a dataset that contains all examples from a single client in order, and so generally additional shuffling should be performed.
Args
seed
Optional, a seed to determine the order in which clients are processed in the joined dataset. The seed can be any nonnegative 32-bit integer, an array of such integers, or None.
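The ordering behavior described above can be illustrated with a minimal pure-Python sketch. The function `join_all_clients` is hypothetical, plain lists stand in for tf.data.Dataset objects, and Python's `random` module stands in for whatever random number generator the library actually uses. The point is only that examples from one client stay contiguous in the joined result, which is why an additional example-level shuffle is usually needed.

```python
import random


def join_all_clients(client_examples, seed=None):
    """Hypothetical stand-in: concatenates per-client examples in a seeded client order."""
    client_ids = sorted(client_examples)
    # Assumption: Python's `random` stands in for the library's RNG.
    random.Random(seed).shuffle(client_ids)
    joined = []
    for client_id in client_ids:
        # Each client's examples stay contiguous and in their original order.
        joined.extend(client_examples[client_id])
    return joined


data = {"a": [1, 2, 3], "b": [10, 20], "c": [100]}
joined = join_all_clients(data, seed=0)

# An extra example-level shuffle breaks up the per-client runs.
shuffled = list(joined)
random.Random(1).shuffle(shuffled)
```

With a real tf.data.Dataset, the analogous extra step would be a call to the dataset's own `shuffle` method with a suitable buffer size.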
This function is intended for use building a static array of client data to be provided to the top-level federated computation.
Args
limit_count
Optional, a maximum number of datasets to return.
seed
Optional, a seed to determine the order in which clients are processed in the joined dataset. The seed can be any nonnegative 32-bit integer, an array of such integers, or None.
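As a rough illustration of how limit_count and seed interact, the sketch below yields per-client data in a seeded order and stops after at most limit_count clients. The name `iter_client_datasets` is hypothetical, lists stand in for datasets, and the use of Python's `random` module is an assumption rather than the library's actual mechanism.

```python
import itertools
import random


def iter_client_datasets(client_examples, limit_count=None, seed=None):
    """Hypothetical stand-in: yields per-client data in a seeded order."""
    client_ids = sorted(client_examples)
    # Assumption: Python's `random` stands in for the library's RNG.
    random.Random(seed).shuffle(client_ids)
    per_client = (client_examples[cid] for cid in client_ids)
    # islice with a stop of None yields every client.
    return itertools.islice(per_client, limit_count)


data = {"a": [1, 2, 3], "b": [10, 20], "c": [100]}

# A static list of at most two client datasets, fixed by the seed.
sampled = list(iter_client_datasets(data, limit_count=2, seed=42))
```

Materializing the iterator into a list, as in the last line, gives the kind of static array of client data that the description above refers to.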
Constructs a ClientData based on the given function.
Args
client_ids
A non-empty list of strings to use as input to serializable_dataset_fn.
serializable_dataset_fn
A function that takes a client_id from the above list, and returns a tf.data.Dataset. This function must be serializable and usable within the context of a tf.function and tff.Computation.
This method partitions the clients of client_data into two ClientData objects with disjoint sets of ClientData.client_ids. All clients in the test ClientData are guaranteed to have non-empty datasets, but the training ClientData may have clients with no data.
Args
client_data
The base ClientData to split.
num_test_clients
How many clients to hold out for testing. This can be at most len(client_data.client_ids) - 1, since we don't want to produce empty ClientData.
seed
Optional seed to fix shuffling of clients before splitting. The seed can be any nonnegative 32-bit integer, an array of such integers, or None.
Returns
A pair (train_client_data, test_client_data), where test_client_data has num_test_clients clients selected at random, subject to the constraint that they each have at least 1 batch in their dataset.
Raises
ValueError
If num_test_clients cannot be satisfied by client_data, or too many clients have empty datasets.
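The splitting contract described above can be sketched in pure Python. The function `split_clients` is hypothetical and works on client IDs with lists standing in for datasets. The seeded shuffle via Python's `random` module and the lower bound of 1 on num_test_clients are assumptions. Test clients are drawn only from clients with data, and a ValueError is raised when the request cannot be met.

```python
import random


def split_clients(client_examples, num_test_clients, seed=None):
    """Hypothetical stand-in for the documented train/test client split."""
    client_ids = sorted(client_examples)
    # The upper bound is documented; the lower bound of 1 is an assumption.
    if num_test_clients < 1 or num_test_clients > len(client_ids) - 1:
        raise ValueError("num_test_clients must be in [1, len(client_ids) - 1].")
    # Assumption: Python's `random` stands in for the library's RNG.
    random.Random(seed).shuffle(client_ids)

    test_ids = []
    for client_id in client_ids:
        if len(test_ids) == num_test_clients:
            break
        # Only clients with at least one example may be held out for testing.
        if client_examples[client_id]:
            test_ids.append(client_id)
    if len(test_ids) < num_test_clients:
        raise ValueError("Too many clients have empty datasets.")

    held_out = set(test_ids)
    train_ids = [cid for cid in client_ids if cid not in held_out]
    return train_ids, test_ids


data = {"a": [1], "b": [], "c": [2, 3], "d": []}
train_ids, test_ids = split_clients(data, num_test_clients=2, seed=3)
# Only "a" and "c" have data, so they are the test clients; "b" and "d",
# which are empty, remain in the training split.
```

Requesting three test clients from `data` would raise ValueError in this sketch, since only two clients have any examples.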