Main classes

EvaluationModuleInfo

The base class EvaluationModuleInfo implements the logic for the subclasses MetricInfo, ComparisonInfo, and MeasurementInfo.

class evaluate.EvaluationModuleInfo

( description: str citation: str features: typing.Union[datasets.features.features.Features, typing.List[datasets.features.features.Features]] inputs_description: str = homepage: str = license: str = codebase_urls: typing.List[str] = reference_urls: typing.List[str] = streamable: bool = False format: typing.Optional[str] = None module_type: str = 'metric' module_name: typing.Optional[str] = None config_name: typing.Optional[str] = None experiment_id: typing.Optional[str] = None )

Base class to store information about an evaluation used for MetricInfo, ComparisonInfo, and MeasurementInfo.

EvaluationModuleInfo documents an evaluation, including its name, version, and features. See the constructor arguments and properties for a full list.

Note: Not all fields are known on construction and may be updated later.
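
As a quick illustration, the info object attached to a loaded evaluation module can be inspected directly. A minimal sketch, assuming the accuracy metric can be loaded:

import evaluate

# evaluate.load() returns an EvaluationModule whose .info attribute
# holds the corresponding EvaluationModuleInfo instance.
accuracy = evaluate.load("accuracy")
info = accuracy.info
print(info.module_type)  # "metric"
print(info.description)  # free-text description of the module
print(info.features)     # the input features (e.g. predictions/references) it expects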

from_directory

( metric_info_dir )

Create EvaluationModuleInfo from the JSON file in metric_info_dir.

Example:

my_metric = EvaluationModuleInfo.from_directory("/path/to/directory/")

write_to_directory

( metric_info_dir )

Write EvaluationModuleInfo as JSON to metric_info_dir. Also save the license separately in LICENSE.

Example:

my_metric.info.write_to_directory("/path/to/directory/")

class evaluate.MetricInfo

( description: str citation: str features: typing.Union[datasets.features.features.Features, typing.List[datasets.features.features.Features]] inputs_description: str = homepage: str = license: str = codebase_urls: typing.List[str] = reference_urls: typing.List[str] = streamable: bool = False format: typing.Optional[str] = None module_type: str = 'metric' module_name: typing.Optional[str] = None config_name: typing.Optional[str] = None experiment_id: typing.Optional[str] = None )

Information about a metric.

MetricInfo documents a metric, including its name, version, and features. See the constructor arguments and properties for a full list.

Note: Not all fields are known on construction and may be updated later.
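
In practice, a MetricInfo is usually constructed inside a custom metric's _info() method. The following is a minimal, hypothetical sketch of that pattern (the class name, description, and feature types are placeholders):

import datasets
import evaluate

class MyExactMatch(evaluate.Metric):
    def _info(self):
        # Declare what the metric expects as inputs via `features`.
        return evaluate.MetricInfo(
            description="Fraction of predictions that exactly match the references.",
            citation="",
            features=datasets.Features(
                {
                    "predictions": datasets.Value("int32"),
                    "references": datasets.Value("int32"),
                }
            ),
        )

    def _compute(self, predictions, references):
        matches = sum(p == r for p, r in zip(predictions, references))
        return {"exact_match": matches / len(predictions)}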

class evaluate.ComparisonInfo

( description: str citation: str features: typing.Union[datasets.features.features.Features, typing.List[datasets.features.features.Features]] inputs_description: str = homepage: str = license: str = codebase_urls: typing.List[str] = reference_urls: typing.List[str] = streamable: bool = False format: typing.Optional[str] = None module_type: str = 'comparison' module_name: typing.Optional[str] = None config_name: typing.Optional[str] = None experiment_id: typing.Optional[str] = None )

Information about a comparison.

ComparisonInfo documents a comparison, including its name, version, and features. See the constructor arguments and properties for a full list.

Note: Not all fields are known on construction and may be updated later.

class evaluate.MeasurementInfo

( description: str citation: str features: typing.Union[datasets.features.features.Features, typing.List[datasets.features.features.Features]] inputs_description: str = homepage: str = license: str = codebase_urls: typing.List[str] = reference_urls: typing.List[str] = streamable: bool = False format: typing.Optional[str] = None module_type: str = 'measurement' module_name: typing.Optional[str] = None config_name: typing.Optional[str] = None experiment_id: typing.Optional[str] = None )

Information about a measurement.

MeasurementInfo documents a measurement, including its name, version, and features. See the constructor arguments and properties for a full list.

Note: Not all fields are known on construction and may be updated later.

EvaluationModule

The base class EvaluationModule implements the logic for the subclasses Metric, Comparison, and Measurement.

class evaluate.EvaluationModule

( config_name: typing.Optional[str] = None keep_in_memory: bool = False cache_dir: typing.Optional[str] = None num_process: int = 1 process_id: int = 0 seed: typing.Optional[int] = None experiment_id: typing.Optional[str] = None hash: str = None max_concurrent_cache_files: int = 10000 timeout: typing.Union[int, float] = 100 **kwargs )

An EvaluationModule is the base class and common API for metrics, comparisons, and measurements.
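
These constructor parameters are normally passed through evaluate.load() rather than set directly. As a sketch of distributed usage (the module name, process count, and experiment id below are illustrative):

import evaluate

# Each of num_process workers loads the same module with its own process_id;
# a shared experiment_id lets the workers find each other's cache files.
metric = evaluate.load(
    "accuracy",
    num_process=8,
    process_id=3,            # rank of the current worker
    experiment_id="run-42",  # any string shared by all workers
)
# After add()/add_batch() on every worker, compute() returns the aggregated
# result on process_id 0; the other processes receive None.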

add

( prediction = None reference = None **kwargs )

Add one prediction and reference to the evaluation module's stack.

Example:

>>> import evaluate
>>> accuracy = evaluate.load("accuracy")
>>> accuracy.add(references=1, predictions=1)

add_batch

( predictions = None references = None **kwargs )

Add a batch of predictions and references to the evaluation module's stack.

Example:

>>> import evaluate
>>> accuracy = evaluate.load("accuracy")
>>> for refs, preds in zip([[0,1],[0,1]], [[1,0],[0,1]]):
...     accuracy.add_batch(references=refs, predictions=preds)

compute

( predictions = None references = None **kwargs ) → dict or None

Compute the evaluation module.

Positional arguments are not allowed, in order to prevent mistakes.

>>> import evaluate
>>> accuracy = evaluate.load("accuracy")
>>> accuracy.compute(predictions=[0, 1, 1, 0], references=[0, 1, 0, 1])

download_and_prepare

( download_config: typing.Optional[datasets.download.download_config.DownloadConfig] = None dl_manager: typing.Optional[datasets.download.download_manager.DownloadManager] = None )

Downloads and prepares the evaluation module for reading.
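
This step is normally handled when the module is loaded, so an explicit call is mostly useful with a custom download configuration. A minimal sketch (the module name and configuration are illustrative):

import evaluate
from datasets import DownloadConfig

bleu = evaluate.load("bleu")
# Re-run the preparation step with a custom download configuration.
bleu.download_and_prepare(download_config=DownloadConfig(force_download=True))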

class evaluate.Metric

( config_name: typing.Optional[str] = None keep_in_memory: bool = False cache_dir: typing.Optional[str] = None num_process: int = 1 process_id: int = 0 seed: typing.Optional[int] = None experiment_id: typing.Optional[str] = None hash: str = None max_concurrent_cache_files: int = 10000 timeout: typing.Union[int, float] = 100 **kwargs )

A Metric is the base class and common API for all metrics.

class evaluate.Comparison

( config_name: typing.Optional[str] = None keep_in_memory: bool = False cache_dir: typing.Optional[str] = None num_process: int = 1 process_id: int = 0 seed: typing.Optional[int] = None experiment_id: typing.Optional[str] = None hash: str = None max_concurrent_cache_files: int = 10000 timeout: typing.Union[int, float] = 100 **kwargs )

A Comparison is the base class and common API for all comparisons.
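
Concrete comparisons are loaded with evaluate.load() by passing module_type="comparison". A minimal sketch, assuming the mcnemar comparison is available; the exact input names are defined by the module itself:

import evaluate

# Comparisons typically take two sets of predictions (and, depending on the
# module, the references) and return a statistic comparing the two models.
mcnemar = evaluate.load("mcnemar", module_type="comparison")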

class evaluate.Measurement

( config_name: typing.Optional[str] = None keep_in_memory: bool = False cache_dir: typing.Optional[str] = None num_process: int = 1 process_id: int = 0 seed: typing.Optional[int] = None experiment_id: typing.Optional[str] = None hash: str = None max_concurrent_cache_files: int = 10000 timeout: typing.Union[int, float] = 100 **kwargs )

A Measurement is the base class and common API for all measurements.
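
Concrete measurements are loaded the same way, with module_type="measurement". A minimal sketch, assuming the word_length measurement is available:

import evaluate

# Measurements inspect properties of a dataset rather than model predictions,
# so they usually take a `data` argument instead of predictions/references.
word_length = evaluate.load("word_length", module_type="measurement")
results = word_length.compute(data=["hello world", "this is a test"])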

CombinedEvaluations

The combine function lets you combine multiple EvaluationModules into a single CombinedEvaluations object.

evaluate.combine

( evaluations force_prefix = False )

Combines several metrics, comparisons, or measurements into a single CombinedEvaluations object that can be used like a single evaluation module.

If two scores have the same name, they are prefixed with their module names. If two modules also share the same name, use a dictionary to give them distinct names; otherwise an integer id is appended to the prefix.

Examples:

>>> import evaluate
>>> accuracy = evaluate.load("accuracy")
>>> f1 = evaluate.load("f1")
>>> clf_metrics = evaluate.combine([accuracy, f1])
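
To illustrate the naming rules above: a dictionary assigns explicit names when the same module is combined more than once, and force_prefix=True prefixes every score with its module name. A sketch (the dictionary keys are arbitrary labels):

import evaluate

# Same module twice, distinguished by the dictionary keys.
dual_f1 = evaluate.combine(
    {"f1_macro": evaluate.load("f1"), "f1_micro": evaluate.load("f1")}
)

# Keep a list, but prefix every score key with its module name.
clf_metrics = evaluate.combine(["accuracy", "f1"], force_prefix=True)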

class evaluate.CombinedEvaluations

( evaluation_modules force_prefix = False )

add

( prediction = None reference = None **kwargs )

Add one prediction and reference to each evaluation module's stack.

Example:

>>> import evaluate
>>> accuracy = evaluate.load("accuracy")
>>> f1 = evaluate.load("f1")
>>> clf_metrics = evaluate.combine([accuracy, f1])
>>> for ref, pred in zip([0,1,0,1], [1,0,0,1]):
...     clf_metrics.add(references=ref, predictions=pred)

add_batch

( predictions = None references = None **kwargs )

Add a batch of predictions and references to each evaluation module's stack.

Example:

>>> import evaluate
>>> accuracy = evaluate.load("accuracy")
>>> f1 = evaluate.load("f1")
>>> clf_metrics = evaluate.combine([accuracy, f1])
>>> for refs, preds in zip([[0,1],[0,1]], [[1,0],[0,1]]):
...     clf_metrics.add_batch(references=refs, predictions=preds)

compute

( predictions = None references = None **kwargs ) → dict or None

Compute each evaluation module.

Positional arguments are not allowed, in order to prevent mistakes.

Example:

>>> import evaluate
>>> accuracy = evaluate.load("accuracy")
>>> f1 = evaluate.load("f1")
>>> clf_metrics = evaluate.combine([accuracy, f1])
>>> clf_metrics.compute(predictions=[0,1], references=[1,1])
{'accuracy': 0.5, 'f1': 0.6666666666666666}
