Release notes

v1.3.0 | Jan 14, 2026

New features

New Test Case Import Workflow - Added a new workflow type that allows importing a golden set of test cases and performing risk management analysis using common tools like failure clustering, weakness detection, or robustness testing.
Test Case Review & Annotation System - Introduced comprehensive test case annotation capabilities with human decision tracking, comments, and CSV/JSON export for further post-processing of the results.
Evaluation Results Comparison Tool - Added a tool that compares the evaluation results of different RAG/LLM models, suggests the best model, and generates comparison reports in HTML and JSON formats.
New Evaluators Suite - Added six new evaluators: Agent Sanity Check (with 8 metrics), Mean Reciprocal Rank (MRR), JSON Schema validator, Answer Accuracy/Semantic Similarity, Self-consistency, and BERTScore.
Agentic Evaluation & Visualization - Added new agentic evaluation type with specialized visualization for agent-based LLM workflows and improved agent run tracking.
Topic Modeling Enhancements - Added custom topics support and interactive add/remove features.
Job Notification System - Introduced a jobs tray for tracking and managing long-running operations, including aborting them.
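
For readers unfamiliar with the Mean Reciprocal Rank (MRR) metric named in the New Evaluators Suite entry above, the following is a minimal sketch of the textbook definition: the average, over queries, of 1/rank of the first relevant retrieved item. It is not the H2O Eval Studio implementation; the function names, the input shape, and the chunk IDs are hypothetical and chosen only for illustration.

```python
from typing import Hashable, Iterable, Sequence, Set, Tuple


def reciprocal_rank(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """Return 1/rank of the first relevant item, or 0.0 if none was retrieved."""
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(
    queries: Iterable[Tuple[Sequence[Hashable], Set[Hashable]]]
) -> float:
    """Average the reciprocal rank over (ranked_results, relevant_set) pairs."""
    scores = [reciprocal_rank(ranked, relevant) for ranked, relevant in queries]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical retrieval results: chunk IDs in ranked order, plus the relevant IDs.
example_queries = [
    (["c3", "c1", "c7"], {"c1"}),  # first relevant hit at rank 2
    (["c2", "c5", "c9"], {"c2"}),  # first relevant hit at rank 1
    (["c4", "c6", "c8"], {"c0"}),  # no relevant hit retrieved
]
mrr = mean_reciprocal_rank(example_queries)
print(f"MRR = {mrr:.3f}")
```

A score near 1.0 means the first relevant chunk usually appears at the top of the retrieved list; queries with no relevant chunk contribute 0 to the average.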

Improvements

Worker Scalability Rearchitecture - Major refactoring significantly improves worker scalability, stability, and job management. Long-running jobs now autoscale on available cluster resources and GPUs.

v1.2.5 | Dec 19, 2025

Fixes

Fixes an AWS region validation issue in workers.

v1.2.4 | Dec 17, 2025

Maintenance

Upgrades h2oGPTe to version 1.6.51.
Resolves security vulnerabilities.

v1.2.3 | Nov 18, 2025

Maintenance

Upgrades h2oGPTe to version 1.6.45.

v1.2.2 | Oct 16, 2025

Fixes

Fixes document extension handling.
Includes Python client fixes and features backported from later versions.

v1.2.1 | Sept 29, 2025

Fixes

Fixes critical vulnerabilities.
Resolves several bugs to improve overall stability.

v1.2.0 | Aug 26, 2025

New features

RBAC (Sharing)

Share resources such as workflows, evaluations, or tests directly in the UI through integration with the Authz service.

MRM artifacts and actionability

MRM workflows now produce the following output artifacts:

Summary and report: Generate a comprehensive workflow report to ingest and explore in h2oGPTe.
Corpus patch: A collection of FAQs based on problematic prompts and their expected answers, which can be added to h2oGPTe collections to improve RAG performance.
Guardrails configuration: A downloadable configuration (JSON file) for custom guardrails to embed into applications using the H2O Eval Studio Python client.

New MRM features

Introduced the following MRM workflow features:
- Generated test validation: Validate the quality of the generated test suite using various metrics.
- Failure clustering: Automatically detect common failure patterns of the model host.
- Workflow statuses: Every workflow step can now report color-coded statuses, such as failure, warning, or stale, indicating the need to verify outputs and re-run certain steps for up-to-date results.
- MRM workflows now support other model hosts, including Bedrock and OpenAI-compatible APIs.
- MRM workflows are now accessible directly through the h2oGPTe collection UI.
- Improved UX for workflows and their individual steps, including enhancements for robustness testing.
