tff.simulation.datasets.TestClientData | TensorFlow Federated
tff.simulation.datasets.TestClientData
A tff.simulation.datasets.ClientData intended for test purposes.
Inherits From: ClientData
tff.simulation.datasets.TestClientData(
tensor_slices_dict
)
The implementation is based on tf.data.Dataset.from_tensor_slices.
This class is intended only for constructing toy federated datasets, especially to support simulation tests. Using this for large datasets is _not_ recommended, as it requires putting all client data into the underlying TensorFlow graph (which is memory intensive).
Args | |
---|---|
tensor_slices_dict | A dictionary keyed by client_id, where values are lists, tuples, or dicts for passing to tf.data.Dataset.from_tensor_slices. Note that namedtuples and attrs classes are not explicitly supported, but a user can convert their data from those formats to a dict, and then use this class. The leaves of this dictionary must not be tf.Tensors, in order to avoid putting eager tensors into graphs. |
Raises | |
---|---|
ValueError | If a client with no data is found. |
TypeError | If tensor_slices_dict is not a dictionary, or its value structures are namedtuples, or its value structures are not either strictly lists, strictly (standard, non-named) tuples, or strictly dictionaries. |
TypeError | If any leaf of tensor_slices_dict is a tf.Tensor. |
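
Because namedtuple values are rejected, client records held in that form need converting before they are passed in. The following is a minimal sketch in plain Python, assuming a hypothetical `Example` namedtuple and made-up client IDs. The constructor call appears only as a comment, since it requires TensorFlow Federated to be installed.

```python
import collections

# Hypothetical record type used for illustration; it is not part of TFF.
Example = collections.namedtuple("Example", ["x", "y"])

raw_client_data = {
    "client_a": Example(x=[[1.0], [2.0]], y=[0, 1]),
    "client_b": Example(x=[[3.0]], y=[1]),
}

# Convert each namedtuple to a plain dict so that every value structure is
# strictly a dictionary. The leaves stay Python lists, not tf.Tensors.
tensor_slices_dict = {
    client_id: dict(example._asdict())
    for client_id, example in raw_client_data.items()
}

# With TensorFlow Federated available, the dict could then be passed on:
# client_data = tff.simulation.datasets.TestClientData(tensor_slices_dict)
```

Every client in the dict has at least one example, which avoids the ValueError raised for clients with no data.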
Attributes | |
---|---|
client_ids | A list of string identifiers for clients in this dataset. |
dataset_computation | A tff.Computation accepting a client ID, returning a dataset. |
element_type_structure | The element type information of the client datasets, that is, of the elements returned by datasets in this ClientData object. |
serializable_dataset_fn | A callable accepting a client ID and returning a tf.data.Dataset. Note that this callable must be traceable by TF, as it will be used in the context of a tf.function. |
Methods
create_tf_dataset_for_client
create_tf_dataset_for_client(
client_id
)
Creates a new tf.data.Dataset containing the client training examples.
This function will create a dataset for a given client, given that client_id is contained in the client_ids property of the ClientData. Unlike create_dataset, this method need not be serializable.
Args | |
---|---|
client_id | The string client_id for the desired client. |
Returns |
---|
A tf.data.Dataset object. |
create_tf_dataset_from_all_clients
create_tf_dataset_from_all_clients(
seed: Optional[Union[int, Sequence[int]]] = None
) -> tf.data.Dataset
Creates a new tf.data.Dataset containing all client examples.
This function is intended for training centralized, non-distributed models (num_clients=1). This can be useful as a point of comparison against federated models.
Currently, the implementation produces a dataset in which all examples from a single client appear together and in order, so additional shuffling should generally be performed.
Args | |
---|---|
seed | Optional, a seed to determine the order in which clients are processed in the joined dataset. The seed can be any nonnegative 32-bit integer, an array of such integers, or None. |
Returns |
---|
A tf.data.Dataset object. |
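
To see why additional shuffling is usually wanted, the sketch below models the joined dataset with plain Python lists. It is an illustration only, not the TFF implementation: `join_all_clients` and the sample data are hypothetical, and `random.Random` stands in for whatever seeding TFF uses internally, so only an integer or `None` seed is handled here.

```python
import random


def join_all_clients(examples_by_client, seed=None):
    """Concatenates every client's examples, visiting clients in seeded order."""
    client_ids = sorted(examples_by_client)
    # The seed fixes the order in which clients are processed.
    random.Random(seed).shuffle(client_ids)
    joined = []
    for client_id in client_ids:
        # Each client's examples stay contiguous and in their original order.
        joined.extend(examples_by_client[client_id])
    return joined


examples_by_client = {"a": ["a0", "a1", "a2"], "b": ["b0", "b1"]}
joined = join_all_clients(examples_by_client, seed=0)
```

Whatever client order the seed selects, the examples of one client are never interleaved with those of another. On a real tf.data.Dataset, a subsequent `shuffle` call with a sufficiently large buffer would mix examples across clients.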
datasets
datasets(
limit_count: Optional[int] = None,
seed: Optional[Union[int, Sequence[int]]] = None
) -> Iterable[tf.data.Dataset]
Yields the tf.data.Dataset for each client in random order.
This function is intended for use building a static array of client data to be provided to the top-level federated computation.
Args | |
---|---|
limit_count | Optional, a maximum number of datasets to return. |
seed | Optional, a seed to determine the order in which clients are processed in the joined dataset. The seed can be any nonnegative 32-bit integer, an array of such integers, or None. |
from_clients_and_tf_fn
@classmethod
from_clients_and_tf_fn(
client_ids: Iterable[str],
serializable_dataset_fn: Callable[[str], tf.data.Dataset]
) -> 'ClientData'
Constructs a ClientData based on the given function.
Args | |
---|---|
client_ids | A non-empty list of strings to use as input to serializable_dataset_fn. |
serializable_dataset_fn | A function that takes a client_id from the above list, and returns a tf.data.Dataset. This function must be serializable and usable within the context of a tf.function and tff.Computation. |
Raises | |
---|---|
TypeError | If serializable_dataset_fn is a tff.Computation. |
Returns |
---|
A ClientData object. |
preprocess
preprocess(
preprocess_fn: Callable[[tf.data.Dataset], tf.data.Dataset]
) -> 'ClientData'
Applies preprocess_fn to each client's data.
Args | |
---|---|
preprocess_fn | A callable accepting a tf.data.Dataset and returning a preprocessed tf.data.Dataset. This function must be traceable by TF. |
Returns |
---|
A tff.simulation.datasets.ClientData. |
Raises | |
---|---|
IncompatiblePreprocessFnError | If preprocess_fn is a tff.Computation. |
train_test_client_split
@classmethod
train_test_client_split(
client_data: 'ClientData',
num_test_clients: int,
seed: Optional[Union[int, Sequence[int]]] = None
) -> tuple['ClientData', 'ClientData']
Returns a pair of (train, test) ClientData.
This method partitions the clients of client_data into two ClientData objects with disjoint sets of ClientData.client_ids. All clients in the test ClientData are guaranteed to have non-empty datasets, but the training ClientData may have clients with no data.
Args | |
---|---|
client_data | The base ClientData to split. |
num_test_clients | How many clients to hold out for testing. This can be at most len(client_data.client_ids) - 1, since we don't want to produce empty ClientData. |
seed | Optional seed to fix shuffling of clients before splitting. The seed can be any nonnegative 32-bit integer, an array of such integers, or None. |
Returns |
---|
A pair (train_client_data, test_client_data), where test_client_data has num_test_clients selected at random, subject to the constraint they each have at least 1 batch in their dataset. |
Raises | |
---|---|
ValueError | If num_test_clients cannot be satisfied by client_data, or too many clients have empty datasets. |
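
The partitioning described above can be sketched on client IDs alone. This is a simplified illustration under stated assumptions, not the library's code: `split_client_ids` is a hypothetical helper, `random.Random` replaces TFF's own seeding (integer or `None` seeds only), and the check that test clients have non-empty datasets is left out.

```python
import random


def split_client_ids(client_ids, num_test_clients, seed=None):
    """Returns (train_ids, test_ids) such that no client appears in both."""
    client_ids = list(client_ids)
    # At least one client must remain on the training side.
    if num_test_clients > len(client_ids) - 1:
        raise ValueError(
            f"Cannot hold out {num_test_clients} of {len(client_ids)} clients."
        )
    # Shuffle a copy of the IDs so the seed determines who is held out.
    random.Random(seed).shuffle(client_ids)
    test_ids = client_ids[:num_test_clients]
    train_ids = client_ids[num_test_clients:]
    return train_ids, test_ids


train_ids, test_ids = split_client_ids(["a", "b", "c", "d"], 1, seed=42)
```

Reusing the same seed reproduces the same split, which is useful when a test needs a stable held-out set.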