Lighteval documentation


EvaluationTracker

class lighteval.logging.evaluation_tracker.EvaluationTracker

( output_dir: str results_path_template: str | None = None save_details: bool = True push_to_hub: bool = False push_to_tensorboard: bool = False hub_results_org: str | None = '' tensorboard_metric_prefix: str = 'eval' public: bool = False nanotron_run_info: GeneralArgs = None use_wandb: bool = False )

Parameters

  • output_dir (str) — Local directory to save evaluation results and logs
  • results_path_template (str, optional) — Template for results directory structure. Example: “{output_dir}/results/{org}_{model}”
  • save_details (bool, defaults to True) — Whether to save detailed evaluation records
  • push_to_hub (bool, defaults to False) — Whether to push results to HF Hub
  • push_to_tensorboard (bool, defaults to False) — Whether to push metrics to TensorBoard
  • hub_results_org (str, optional) — HF Hub organization to push results to
  • tensorboard_metric_prefix (str, defaults to “eval”) — Prefix for TensorBoard metrics
  • public (bool, defaults to False) — Whether to make Hub datasets public
  • nanotron_run_info (GeneralArgs, optional) — Nanotron model run information
  • use_wandb (bool, defaults to False) — Whether to log to Weights & Biases or Trackio if available

Tracks and manages evaluation results, metrics, and logging for model evaluations.

The EvaluationTracker coordinates multiple specialized loggers to track different aspects of model evaluation:

  • Details Logger (DetailsLogger): Records per-sample evaluation details and predictions
  • Metrics Logger (MetricsLogger): Tracks aggregate evaluation metrics and scores
  • Versions Logger (VersionsLogger): Records task and dataset versions
  • General Config Logger (GeneralConfigLogger): Stores overall evaluation configuration
  • Task Config Logger (TaskConfigLogger): Maintains per-task configuration details

The tracker can save results locally and optionally push them to:

  • Hugging Face Hub as datasets
  • TensorBoard for visualization
  • Trackio or Weights & Biases for experiment tracking

Example:

from lighteval.logging.evaluation_tracker import EvaluationTracker

tracker = EvaluationTracker(
    output_dir="./eval_results",
    push_to_hub=True,
    hub_results_org="my-org",
    save_details=True
)

# Log evaluation results
tracker.metrics_logger.add_metric("accuracy", 0.85)
tracker.details_logger.add_detail(task_name="qa", prediction="Paris")

# Save all results
tracker.save()
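
Building on the example above, a constructor configured for all three optional backends might look like the following sketch; only arguments from the signature above are used, and the directory, organization, and template values are placeholders:

tracker = EvaluationTracker(
    output_dir="./eval_results",
    results_path_template="{output_dir}/results/{org}_{model}",
    save_details=True,
    push_to_hub=True,
    hub_results_org="my-org",
    public=False,                      # keep the Hub datasets private
    push_to_tensorboard=True,
    tensorboard_metric_prefix="eval",
    use_wandb=True,                    # logs to Weights & Biases or Trackio if available
)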

generate_final_dict

( ) → dict

Returns

dict

Dictionary containing all experiment information including config, results, versions, and summaries

Aggregates and returns all the logger’s experiment information in a dictionary.

Use this method to gather and display all experiment information at the end of an evaluation run.
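
A minimal sketch of how this could be called at the end of a run; the listed keys come from the return description above:

final_dict = tracker.generate_final_dict()
# Expected to contain the experiment config, results, versions, and summaries
print(final_dict)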

push_to_hub

( date_id: str details: dict results_dict: dict )

Pushes the experiment details (all the model predictions for every step) to the hub.
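
In normal use the tracker calls this itself when saving, but a direct call might look like the sketch below. The date_id format and the shape of details are assumptions, and reusing generate_final_dict() for results_dict is also an assumption:

tracker.push_to_hub(
    date_id="2024-01-01T00-00-00.000000",  # assumed run identifier
    details={"qa": [{"prediction": "Paris", "gold": "Paris"}]},  # assumed per-task sample records
    results_dict=tracker.generate_final_dict(),
)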

recreate_metadata_card

( repo_id: str )

Parameters

  • repo_id (str) — Details dataset repository path on the hub (org/dataset)

Fully updates the details repository metadata card for the currently evaluated model.
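
For example, to rebuild the metadata card of an existing details dataset (the repository name below is a placeholder following the org/dataset format described above):

tracker.recreate_metadata_card(repo_id="my-org/details_my-model")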

save

( )

Saves the experiment information and results to files, and to the hub if requested.
