
Create a leaderboard

In H2O Eval Studio, leaderboards compare LLMs based on metrics calculated by a single evaluator. (For more information on evaluators in H2O Eval Studio, see Evaluators.) This page describes how to create, view, and delete a leaderboard in H2O Eval Studio.

  1. In the main navigation, click Leaderboards.

  2. Click the New Leaderboard button.

  3. Enter a name for the leaderboard.

  4. Enter a description of the leaderboard.

  5. Select a connection to the model host of the LLMs you want to evaluate. Note that there are two types of connections when creating leaderboards: LLM and RAG. The list of available evaluators and tests depends on the type of connection you select. (For example, the RAGAs evaluator does not apply to a pure LLM.) For more information on adding a connection, see Add connection.

  6. Select the evaluator you want to use. For more information on the available evaluators, see Evaluators.

  7. Select the tests that you want to use. For more information on tests in H2O Eval Studio, see Tests.

  8. Select the LLMs you want to use for the evaluation.

  9. (Optional) Select an existing collection to use for the leaderboard. If you do not select a collection, the evaluation creates new collections for the leaderboard as defined by the selected tests.

  10. (Optional) Set advanced settings for the leaderboard. For more information on advanced settings, see the model-host-specific Advanced settings.

  11. Click the Create button.
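
The fields collected in the steps above, and the rule that the connection type limits which evaluators can be used, can be sketched as a small data model. This is only an illustrative sketch and not the H2O Eval Studio API: the `LeaderboardForm` class, the `validate` function, and the `EVALUATOR_CONNECTION_TYPES` table are hypothetical names, and the only compatibility rule taken from this page is that the RAGAs evaluator does not apply to an LLM connection.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical compatibility table (not from H2O Eval Studio).
# Only the RAGAs restriction is stated on this page; the second entry is illustrative.
EVALUATOR_CONNECTION_TYPES = {
    "ragas": {"RAG"},
    "example_llm_evaluator": {"LLM", "RAG"},
}


@dataclass
class LeaderboardForm:
    """Hypothetical container for the values entered in steps 3 to 9."""
    name: str
    description: str
    connection_type: str               # "LLM" or "RAG"
    evaluator: str
    tests: List[str]
    models: List[str]
    collection: Optional[str] = None   # None: new collections are created


def validate(form: LeaderboardForm) -> List[str]:
    """Return a list of human-readable problems; an empty list means the form is consistent."""
    errors = []
    if not form.name.strip():
        errors.append("name is required")
    if form.connection_type not in {"LLM", "RAG"}:
        errors.append("connection type must be LLM or RAG")
    allowed = EVALUATOR_CONNECTION_TYPES.get(form.evaluator)
    if allowed is None:
        errors.append(f"unknown evaluator: {form.evaluator}")
    elif form.connection_type not in allowed:
        errors.append(
            f"evaluator {form.evaluator} is not applicable to a "
            f"{form.connection_type} connection"
        )
    if not form.tests:
        errors.append("select at least one test")
    if not form.models:
        errors.append("select at least one model")
    return errors
```

Checking the form before submission mirrors how the list of evaluators and tests is narrowed once a connection is selected.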

View a leaderboard

The table on the Leaderboards page lists all of the leaderboards that you have created. To view a leaderboard, click the name of the leaderboard you want to view.

When viewing a specific leaderboard, you can view a visualization of the leaderboard, obtain an HTML report, and download a zip archive with the evaluation results.

View a visualization of a leaderboard

The leaderboard page features a visualization of evaluator result metrics as a radar plot (in cases where more than three metrics are produced by the evaluator) or a bar chart (in cases where three or fewer metrics are produced by the evaluator). Leaderboard visualizations can help you understand the evaluation results for a given metric and the LLMs being compared.
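
As a rough illustration of the chart selection rule, the following sketch assumes a bar chart is used when the evaluator produces three or fewer metrics and a radar plot otherwise. The `choose_chart` function and the metric names are hypothetical and are not part of H2O Eval Studio.

```python
def choose_chart(metric_names):
    """Pick a chart type from the number of metrics an evaluator produced.

    Assumption: three or fewer metrics -> bar chart, more than three -> radar plot.
    """
    count = len(metric_names)
    if count == 0:
        raise ValueError("an evaluator must produce at least one metric")
    return "bar" if count <= 3 else "radar"


# Hypothetical metric names, used only to exercise the rule.
print(choose_chart(["accuracy"]))
print(choose_chart(["m1", "m2", "m3", "m4"]))
```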

Obtain an HTML report of a leaderboard

To view an HTML report of a leaderboard, click the Show Report button. This report provides in-depth information about potential problems with the model, evaluation parameters, the evaluated models, and more.

Download a zip archive with evaluation results

To download a zip archive with evaluation results, click the Download Report button.
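
Once the archive is downloaded, its contents can be inspected with Python's standard `zipfile` module. This is a minimal sketch: the `list_report_files` helper is a hypothetical name, and this page does not specify which files the archive contains, so the code only lists whatever is there.

```python
import zipfile


def list_report_files(archive):
    """Return the sorted names of all files in a zip archive.

    `archive` can be a path to the downloaded file or a binary file-like object.
    """
    with zipfile.ZipFile(archive) as zf:
        # testzip() returns the name of the first corrupt member, or None.
        corrupt = zf.testzip()
        if corrupt is not None:
            raise zipfile.BadZipFile(f"corrupt member: {corrupt}")
        return sorted(
            info.filename for info in zf.infolist() if not info.is_dir()
        )
```

For example, `list_report_files("results.zip")` would return the file names in a downloaded archive saved under that (hypothetical) name.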

Delete a leaderboard

To delete a leaderboard from the Leaderboards page, select the checkbox next to the leaderboard you want to delete, and then click the Delete button.

