Tutorial: Analyzing LLM Trace Data
In this tutorial, we'll walk through the process of analyzing LLM trace data using Probably. We'll focus on answering the question: "Which model has the most stable latency distribution?" This analysis will demonstrate Probably's power in handling complex datasets and its ability to provide quick insights into multi-dimensional data.
Dataset Overview
For this tutorial, we'll use a dataset of LLM traces. Each row in our dataset represents a single interaction with an LLM and includes the following columns:
- `input_variables`: Variables used in the prompt
- `rendered_prompt`: The full prompt sent to the model
- `model`: The specific LLM used (e.g., GPT-3.5-turbo, GPT-4, DALL-E)
- `sampling_params`: Parameters used for text generation (e.g., temperature, top_p)
- `output`: The generated text from the LLM
- `token_count`: Number of tokens in the interaction
- `cost`: The cost of the API call
- `latency`: Time taken for the LLM to respond (in milliseconds)
Step 1: Loading the Dataset
- Download the LLM trace dataset from [link to dataset].
- Open Probably and click on "Add Dataset" in the top right corner.
- Upload the CSV file containing the LLM trace data.
- Review the auto-detected column types and adjust if necessary.
- Click "Confirm" to load the dataset.
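Before uploading, it can help to confirm that the CSV header matches the columns listed in the dataset overview. The following is a minimal sketch in plain Python, independent of Probably; the helper name `missing_columns` and the sample row are hypothetical and only illustrate the check.

```python
import csv
import io

# Column names taken from the dataset overview above.
EXPECTED_COLUMNS = [
    "input_variables", "rendered_prompt", "model", "sampling_params",
    "output", "token_count", "cost", "latency",
]

def missing_columns(csv_file):
    """Return the expected columns that are absent from the CSV header."""
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []
    return [name for name in EXPECTED_COLUMNS if name not in header]

# Tiny in-memory stand-in for the real file (hypothetical values).
sample = io.StringIO(
    "input_variables,rendered_prompt,model,sampling_params,"
    "output,token_count,cost,latency\n"
    '{},Hello,GPT-4,"{""temperature"": 0.7}",Hi there,12,0.0004,850\n'
)
print(missing_columns(sample))  # an empty list means every column is present
```

To run the same check on the downloaded file, pass an open file handle (for example `open(path, newline="")`) instead of the in-memory sample.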
Step 2: Initial Data Exploration
- Create a frequency plot of the `model` column to see the distribution of traces across different models.
- Create a box plot with `model` on the X-axis and `latency` on the Y-axis for an initial view of latency distributions.
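The two plots above correspond to a per-model trace count and a per-model quartile summary. As a rough cross-check outside Probably, those numbers could be computed as follows. This is a sketch under assumptions: the `traces` list is made-up data, `summarize` is a hypothetical helper, and `statistics.quantiles` uses Python's default "exclusive" method, which may differ slightly from the quartile rule a plotting tool applies.

```python
from collections import defaultdict
import statistics

# Hypothetical (model, latency_ms) pairs standing in for the real traces.
traces = [
    ("GPT-4", 900), ("GPT-4", 1100), ("GPT-4", 1000), ("GPT-4", 1500),
    ("GPT-3.5-turbo", 300), ("GPT-3.5-turbo", 320),
    ("GPT-3.5-turbo", 310), ("GPT-3.5-turbo", 330),
]

def summarize(rows):
    """Return count, min, quartiles, and max of latency for each model."""
    by_model = defaultdict(list)
    for model, latency in rows:
        by_model[model].append(latency)
    summary = {}
    for model, values in by_model.items():
        # quantiles() needs at least two data points per model.
        q1, median, q3 = statistics.quantiles(values, n=4)
        summary[model] = {
            "count": len(values),
            "min": min(values),
            "q1": q1,
            "median": median,
            "q3": q3,
            "max": max(values),
        }
    return summary

for model, stats in summarize(traces).items():
    print(model, stats)
```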
Step 3: Analyzing Latency Distributions
- Create a histogram of `latency` for each model:
- Set X-axis to `latency`
- Set Y-axis to frequency
- Use `model` as the Z-axis variable
- Observe the shape and spread of latency distributions for each model.
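A per-model histogram amounts to counting latencies in bins, grouped by model. The sketch below shows that idea with fixed-width bins; the 100 ms bin width, the helper name `latency_histograms`, and the sample rows are assumptions for illustration, and Probably may choose bins differently.

```python
from collections import Counter, defaultdict

def latency_histograms(rows, bin_width_ms=100):
    """Count latencies per model in fixed-width bins keyed by bin start."""
    histograms = defaultdict(Counter)
    for model, latency in rows:
        bin_start = int(latency // bin_width_ms) * bin_width_ms
        histograms[model][bin_start] += 1
    return histograms

# Hypothetical (model, latency_ms) pairs.
rows = [("model-a", 120), ("model-a", 180), ("model-a", 450),
        ("model-b", 310), ("model-b", 330), ("model-b", 340)]

for model, bins in latency_histograms(rows).items():
    print(model)
    for bin_start in sorted(bins):
        # One '#' per trace that fell into this bin.
        print(f"  {bin_start:>5} ms  {'#' * bins[bin_start]}")
```

A distribution concentrated in one or two adjacent bins suggests stable latency, while counts scattered across distant bins suggest the opposite.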
Step 4: Calculating Stability Metrics
- Use Probably's statistical functions to calculate the coefficient of variation (CV) for each model's latency:
- CV = (Standard Deviation / Mean) * 100
- A lower CV indicates a more stable distribution
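The CV formula above can be sketched in a few lines of Python to verify the values you get in Probably. Two assumptions are worth flagging: the tutorial does not say whether the standard deviation is the sample or population form, so this sketch uses the sample form (`statistics.stdev`), and the helper name `latency_cv` and the sample rows are hypothetical.

```python
import statistics
from collections import defaultdict

def latency_cv(rows):
    """Return CV = (standard deviation / mean) * 100 per model, in percent."""
    by_model = defaultdict(list)
    for model, latency in rows:
        by_model[model].append(latency)
    return {
        # Assumption: sample standard deviation. Latencies are positive,
        # so the mean is never zero for real traces.
        model: statistics.stdev(values) / statistics.mean(values) * 100
        for model, values in by_model.items()
        if len(values) >= 2  # stdev() is undefined for a single value
    }

# Hypothetical (model, latency_ms) pairs.
rows = [("model-a", 90), ("model-a", 110),
        ("model-b", 500), ("model-b", 1500)]

for model, cv in latency_cv(rows).items():
    print(f"{model}: CV = {cv:.1f}%")
```

Because CV divides by the mean, it compares spread on a relative scale: a slow model with proportionally small variation can score as more stable than a fast model whose latency swings widely.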
Step 5: Visualizing Stability Metrics
- Create a bar plot with `model` on the X-axis and the calculated CV on the Y-axis.
- Sort the bars in ascending order of CV to easily identify the most stable model.
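The ascending sort is what puts the most stable model first. A small sketch of the same ordering, with a text rendering of the bars, might look like this; the CV values and model names are invented for illustration and `rank_by_cv` is a hypothetical helper.

```python
def rank_by_cv(cv_by_model):
    """Return (model, cv) pairs sorted from most to least stable."""
    return sorted(cv_by_model.items(), key=lambda item: item[1])

# Hypothetical CV values in percent.
cvs = {"model-a": 22.5, "model-b": 8.1, "model-c": 15.0}

for model, cv in rank_by_cv(cvs):
    # One '#' per rounded percentage point of CV.
    print(f"{model:<10} {'#' * round(cv)} {cv:.1f}%")
```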
Step 6: Investigating Factors Affecting Latency
- Explore relationships between latency and other variables:
- Create a scatter plot of `token_count` vs `latency`, colored by `model`
- Create a box plot of `sampling_params` vs `latency` for the most stable model
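To put a number on what the `token_count` vs `latency` scatter plot shows, one option is a Pearson correlation coefficient per model. This is an addition for illustration rather than a step the tutorial prescribes; the helper name `pearson` and the sample values are hypothetical.

```python
import math

def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical traces for a single model.
token_counts = [120, 340, 560, 800, 1020]
latencies_ms = [410, 690, 1150, 1480, 2100]

print(f"r = {pearson(token_counts, latencies_ms):.3f}")
```

A coefficient near 1 would suggest that latency variation is largely explained by response length, in which case a high CV may reflect a varied workload rather than an unstable model.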
Conclusion
Summarize the findings:
- Identify the model with the most stable latency distribution
- Discuss any patterns or insights discovered during the analysis
- Suggest potential next steps or areas for further investigation
By following this tutorial, you've learned how to use Probably to analyze LLM trace data and answer a specific question about model performance: loading a dataset, exploring latency distributions, calculating a stability metric, and investigating the factors that affect latency.