Supported problem types
H2O LLM DataStudio offers support for various problem types and workflows, providing users with the necessary tools to prepare datasets and train models for specific tasks. This page serves as a comprehensive guide to the supported problem types, highlights their importance, and explains how the application can assist in dataset preparation and model training.
Question and Answer
Description: H2O LLM DataStudio simplifies dataset preparation for question answering models. The datasets consist of contextual information, questions, and their respective answers. Its features facilitate the creation of well-structured datasets essential for training models to accurately respond to user queries based on the provided context.
Expected Columns: 'Question', 'Answer', and 'Context'.
Description: The Text Summarization workflow is designed for datasets consisting of articles and their corresponding summaries. Using H2O LLM DataStudio tools, this workflow simplifies the process of extracting vital information from articles, allowing you to create concise summaries that capture the main points. The resulting datasets are valuable for training text summarization models that can produce concise and informative summaries from lengthy text.
Expected Columns: 'Article' and 'Summary'.
article summary Sally Forrest, an actress-dancer who graced the silver screen throughout the '40s and '50s in MGM musicals and films such as the 1956 noir While the City Sleeps died on March 15 at her home in Beverly Hills, California. Forrest, whose birth name was Katherine Feeney, was 86 and had long battled cancer. Her publicist, Judith Goffin, announced the news Thursday....Forrest married writer-producer Milo Frank in 1951. He died in 2004. She is survived by her niece, Sharon Durham, and nephews, Michael and Mark Feeney. Career: A San Diego native, Forrest became a protege of Hollywood trailblazer Ida Lupino, who cast her in starring roles in films "Sally Forrest, an actress-dancer who graced the silver screen throughout the '40s and '50s in MGM musicals and films died on March 15 . Forrest, whose birth name was Katherine Feeney, had long battled cancer . A San Diego native, Forrest became a protege of Hollywood trailblazer Ida Lupino, who cast her in starring roles in films ."
Description: H2O LLM DataStudio assists in preparing datasets that include prompts or instructions along with their corresponding responses. These datasets are essential for training models to understand and follow provided instructions, enabling accurate responses to user prompts.
Expected Columns: 'Prompt' and 'Response'.
prompt response Translate the phrase "Good Morning" to French Bonjour
Human - Bot Conversations
Description: This workflow deals with datasets containing dialogues between human users and chatbots. These datasets are crucial for training models to comprehend user intents and deliver appropriate responses, thereby improving conversational experiences. H2O LLM DataStudio aids in efficiently structuring and organizing the conversational data, including user queries, and bot responses.
Expected Columns: 'Message_id', 'Parent_id', 'Text', and 'Role'.
message_id parent_id text role 384ad8e0-8fc2-4dfd-bf48-0c417f6c5f0f 7d05acb7-9360-458c-8a1d-c0b6492b8f8a "What are your thoughts on the censorship of ChatGPT's output and its liberal biases?" prompter
Description: In this workflow, H2O LLM DataStudio helps prepare datasets containing extensive texts for further pretraining of language models. The dataset preparation process focuses on organizing long text data, allowing language models to learn from a diverse range of linguistic patterns. This enhances their language understanding and generation capabilities.
Expected Column: 'Text'.
text Chrysaethe amoena Chrysaethe amoena is a species of beetle in the family Cerambycidae. It was described by Gounelle in 1911.