Key concepts
This section introduces foundational terms and ideas used throughout H2O Enterprise LLM Studio. These concepts help you understand how the platform works and why it’s designed this way.
Large Language Model (LLM)
A type of neural network trained on large text datasets to perform a variety of natural language tasks. LLMs power many of the features in Enterprise LLM Studio. In most enterprise workflows, LLMs are not directly fine-tuned due to their large size and computational requirements. Instead, LLMs are often used in the Data Generation section to generate labeled training data or annotations.
Small Language Model (SLM)
A compact neural network model derived from a larger LLM, typically through techniques like distillation, quantization, or pruning. In the Enterprise LLM Studio workflow, you generate or annotate data using an LLM, then use that data to fine-tune an SLM for your specific domain or application. SLMs are efficient enough to be fine-tuned and deployed in production environments using your organization’s own data.
Fine-tuning
The process of adapting a pre-trained model to a new dataset or task. The platform supports supervised fine-tuning using your own labeled data or data generated via prompt-based workflows.
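Conceptually, fine-tuning means continuing training from a pretrained model's weights rather than from a random initialization. The following is a toy, pure-Python sketch of that idea using a one-parameter linear model; the function names and values are illustrative only and are not the platform's API.

```python
# Toy sketch of fine-tuning: continue gradient descent on a
# task-specific dataset, starting from a "pretrained" weight.
# Model: y = w * x; loss: squared error.

def fine_tune(w_pretrained, data, lr=0.01, epochs=100):
    """Run plain gradient descent, initialized from the pretrained weight."""
    w = w_pretrained
    for _ in range(epochs):
        for x, y in data:
            grad = 2 * (w * x - y) * x   # d/dw of (w*x - y)^2
            w -= lr * grad
    return w

# Weight "learned" on a generic corpus (assumed value for illustration):
w0 = 1.0
# New labeled data from the target domain, where y ≈ 3 * x:
task_data = [(1.0, 3.0), (2.0, 6.0), (0.5, 1.5)]
w_tuned = fine_tune(w0, task_data)   # converges toward 3.0
```

The same principle applies at scale: the pretrained weights encode general knowledge, and the new data nudges them toward the target task.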
LLM Backbone
The base model you're fine-tuning. H2O Enterprise LLM Studio supports a wide range of popular open-source models, including Meta Llama, Qwen, Google Gemma, Mistral, DeepSeek, and H2O Danube. This choice impacts performance and resource requirements.
Parameters vs. Hyperparameters
- Parameters are learned during training (like weights in the model).
- Hyperparameters are settings you choose before training (like learning rate, batch size, or number of epochs).
Enterprise users can set these manually or let AutoML or Ask KGM tune them automatically.
LoRA and Quantization
- LoRA (Low-Rank Adaptation) is a memory-efficient way to fine-tune large models by training small low-rank adapter matrices while keeping the original weights frozen, instead of updating every weight in the model.
- Quantization reduces model precision (e.g., from 16-bit to 8-bit) to lower memory usage and speed up training and inference.
These techniques are used behind the scenes in the “Advanced Configuration” section of the Experiments UI.
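A minimal sketch of both ideas, in pure Python with illustrative shapes and values (real implementations operate on large tensors): LoRA replaces a full update of a weight matrix `W` with a low-rank correction `B @ A`, and quantization maps floats to small integers via a scale factor.

```python
# --- LoRA sketch: effective weight is W + B @ A, with B and A low-rank ---

def matmul(X, Y):
    """Plain nested-list matrix multiply."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)]
            for row in X]

d, r = 4, 1                        # model dim 4, LoRA rank 1 (illustrative)
W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]  # frozen
B = [[0.5], [0.0], [0.0], [0.0]]   # d x r, trainable
A = [[0.0, 1.0, 0.0, 0.0]]         # r x d, trainable

delta = matmul(B, A)               # rank-r update
W_eff = [[W[i][j] + delta[i][j] for j in range(d)] for i in range(d)]
# Only d*r + r*d = 8 numbers are trained instead of d*d = 16.

# --- Quantization sketch: floats -> small ints with a shared scale ---
scale = 0.01
weights = [0.12, -0.46]
quantized = [round(x / scale) for x in weights]    # int values: [12, -46]
dequantized = [q * scale for q in quantized]       # approximate originals
```

At low rank, the savings from LoRA grow quadratically with model dimension, which is why it makes fine-tuning very large models tractable.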
Evaluation Metrics
After fine-tuning, your models are evaluated using standard metrics. The available metrics depend on the problem type:
- Perplexity: Lower is better; measures how confidently a model predicts text. Used in language modeling and multimodal tasks.
- BLEU: Used in text generation; higher is better. Measures n-gram overlap between the generated output and a reference text.
- LLM-as-a-Judge: Uses a large language model to evaluate the quality of generated responses. Higher is better.
- QA_Accuracy: Measures accuracy of question-answering outputs. Higher is better.
- Accuracy: Measures classification correctness. Used for text and image classification.
- AUC: Area Under the Curve; measures a classifier's ability to distinguish between classes. Used for text and image classification.
- LogLoss: Measures the uncertainty of classification predictions. Lower is better. Used for text and image classification.
- mAP (Mean Average Precision): Overall detection accuracy across all IoU thresholds. Used for object detection.
- mAP@50 / mAP@75: Precision at specific IoU thresholds (0.50 and 0.75). Used for object detection.
- mAR (Mean Average Recall): Overall detection recall. Used for object detection.
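To make a few of these metrics concrete, here are simplified reference implementations of perplexity, accuracy, and LogLoss; the platform's exact formulas may differ, so treat these as illustrative definitions.

```python
import math

def perplexity(token_probs):
    """Exp of the average negative log-probability per token; lower is better."""
    n = len(token_probs)
    return math.exp(-sum(math.log(p) for p in token_probs) / n)

def accuracy(y_true, y_pred):
    """Fraction of correct predictions; higher is better."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def log_loss(y_true, p_pred, eps=1e-15):
    """Mean binary cross-entropy; lower is better."""
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)   # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

# A model assigning probability 0.25 to each of 4 tokens is as uncertain
# as a uniform choice among 4 options, so its perplexity is exactly 4.0:
ppl = perplexity([0.25, 0.25, 0.25, 0.25])   # 4.0
acc = accuracy([1, 0, 1], [1, 0, 0])         # 2/3
```

The perplexity example shows the intuition behind "lower is better": perplexity is the effective number of equally likely choices the model is hedging between at each step.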
Prompt-Based Data Generation
In cases where you don’t have labeled data, you can use a large model (like GPT-4) to generate rows or annotate existing data. This is managed through the Data Generation section and is useful for bootstrapping training sets.
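The workflow amounts to wrapping each unlabeled row in a labeling prompt and sending it to a large model. The sketch below illustrates the pattern; `call_llm` is a hypothetical stand-in for whatever LLM client you use and is stubbed here so the example is self-contained.

```python
# Prompt-based annotation sketch: generate labels for unlabeled rows
# by prompting a large model. Everything here is illustrative, not the
# platform's Data Generation API.

PROMPT_TEMPLATE = (
    "Classify the sentiment of the following review as "
    "'positive' or 'negative'.\nReview: {text}\nLabel:"
)

def call_llm(prompt):
    """Stub: a real implementation would call an LLM API here."""
    return "positive" if "great" in prompt else "negative"

def annotate(rows):
    """Return (text, generated_label) pairs for unlabeled rows."""
    return [(text, call_llm(PROMPT_TEMPLATE.format(text=text)))
            for text in rows]

labeled = annotate(["A great phone.", "Battery died in a day."])
```

The resulting labeled pairs can then serve as a bootstrapped training set for fine-tuning a smaller model.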
AutoML and Ask KGM
Enterprise LLM Studio includes intelligent agents that can:
- Automatically tune hyperparameters
- Choose the best backbone model
- Iterate on experiments to improve performance
You can start these from the AutoML tab or invoke Ask KGM during experiment setup.
- Send feedback about H2O Enterprise LLM Studio to cloud-feedback@h2o.ai