X vs Y Plotting in Probably

X vs Y plotting is a core feature of Probably: it lets you visualize relationships between variables in your data. This guide walks you through creating your first plot, interpreting the results, and using the Z-axis, filtering, and Cluster View features.

Creating Your First Plot

In the right sidebar, you'll see a list of your variables. Find the "X-axis" dropdown and select a variable you want to explore.
Probably will automatically generate a frequency plot based on the variable you've selected. This plot shows the distribution of values for your chosen variable.
Take a moment to examine this plot:
- The x-axis shows the different categories or values of your chosen variable.
- The y-axis shows the frequency or count of each category.
- A frequency plot is a useful starting point for an exploratory analysis because it gives you a quick overview of your data's distribution.
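To make the idea of a frequency plot concrete, here is a minimal sketch in plain Python of the counting behind it. The variable name and sample values are made up for illustration, and this is not how Probably computes the plot internally.

```python
from collections import Counter

# Hypothetical sample column; the values are made up for illustration.
plan = ["basic", "pro", "basic", "free", "pro", "basic"]

# Count how often each category occurs (the y-axis of a frequency plot).
counts = Counter(plan)

# Print a rough text version of the plot, most frequent category first.
for category, count in counts.most_common():
    print(f"{category:<6} {'#' * count} ({count})")
```

Each printed row corresponds to one bar: the label is the x-axis category and the number in parentheses is the y-axis count.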
Next, choose a variable for the Y-axis to explore the relationship between two variables.
Probably will update the plot based on the types of variables you've selected. For example:
- If both X and Y are categorical, you'll see a grouped bar chart.
- If X is categorical and Y is continuous, you'll see a box plot.
- If both X and Y are continuous, you'll see a distribution plot with regression lines.
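The rules above can be summarized as a small lookup. The sketch below is only a restatement of this guide in Python; the function name `choose_plot` and the type labels are hypothetical and are not part of Probably.

```python
from typing import Optional


def choose_plot(x_type: str, y_type: Optional[str] = None) -> str:
    """Return the plot type this guide describes for a pair of variable types."""
    # With only an X variable selected, the guide describes a frequency plot.
    if y_type is None:
        return "frequency plot"
    rules = {
        ("categorical", "categorical"): "grouped bar chart",
        ("categorical", "continuous"): "box plot",
        ("continuous", "continuous"): "distribution plot with regression lines",
    }
    # Combinations the guide does not list are reported, not guessed.
    return rules.get((x_type, y_type), "not covered in this guide")


print(choose_plot("categorical", "continuous"))
```

Any combination the guide does not mention, such as a continuous X with a categorical Y, is deliberately left unresolved here.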

Understanding the Z-Axis Feature

The Z-axis in Probably runs a statistical test on your plot; it is not a third plotted dimension as in 3D plots.

Select either the X or Y variable for Z-axis analysis.
For variables with multiple categories, Probably performs a z-test of proportions for each category.
The Z-axis shows:
- A global mean for the entire category (black dashed line)
- Category-specific means from each z-test (red dashed lines)

Interpreting Z-Axis Results:

The Z-axis scale ranges from 0 to 1:
- 0 represents the ABSENCE of the category value
- 1 represents the PRESENCE of the category value
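One way to read the 0 to 1 scale is that each row is encoded as 1 when the category value is present and 0 when it is absent, so a mean on this scale is a proportion. The sketch below illustrates that reading with made-up data. The encoding shown is an assumption about how the scale works, not a description of Probably's internals.

```python
from collections import defaultdict

# Hypothetical rows of (X category, Y category); the values are made up.
rows = [
    ("north", "churned"), ("north", "stayed"), ("north", "stayed"), ("north", "stayed"),
    ("south", "churned"), ("south", "churned"), ("south", "churned"), ("south", "stayed"),
]
target = "churned"  # the category value being encoded as 0 or 1

# 1 marks presence of the target value in a row, 0 marks absence.
indicators = [(x, 1 if y == target else 0) for x, y in rows]

# Global mean: share of all rows where the target value is present.
global_mean = sum(flag for _, flag in indicators) / len(indicators)

# Category-specific means: the same share within each X category.
by_category = defaultdict(list)
for x, flag in indicators:
    by_category[x].append(flag)
category_means = {x: sum(flags) / len(flags) for x, flags in by_category.items()}

print(global_mean)     # 0.5
print(category_means)  # {'north': 0.25, 'south': 0.75}
```

Under this reading, the global mean plays the role of the black dashed line and the per-category values play the role of the red dashed lines.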
Bars are shaded red or green when a category mean is significantly different from others:
- Green: Category mean is significantly above other category means
- Red: Category mean is significantly below other category means

You can customize the Z-axis analysis by adjusting the minimum p-value threshold (default is 0.005) in the "Advanced Controls" panel.
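To show how a p-value threshold affects shading, here is a minimal sketch of a pooled two-proportion z-test that compares one category against all other rows. This guide does not specify the exact test Probably runs, so the formula, the "category versus the rest" comparison, and the function names are all assumptions made for illustration.

```python
import math


def two_proportion_z_test(hits_a, n_a, hits_b, n_b):
    """Return (z, two-sided p-value) for the difference between two proportions."""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Both groups are all 0s or all 1s, so there is no difference to test.
        return 0.0, 1.0
    z = (p_a - p_b) / se
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def shade(hits, n, hits_rest, n_rest, threshold=0.005):
    """Return 'green', 'red', or 'none' for one category compared with the rest."""
    z, p_value = two_proportion_z_test(hits, n, hits_rest, n_rest)
    if p_value >= threshold:
        return "none"
    return "green" if z > 0 else "red"


# 60 of 100 rows in the category versus 45 of 100 rows elsewhere.
print(shade(60, 100, 45, 100))                  # stricter default threshold
print(shade(60, 100, 45, 100, threshold=0.05))  # looser threshold
```

With the same data, a looser threshold can turn an unshaded bar into a shaded one, which is why the threshold setting controls how strict the significance test is.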

Exploring Your Data

Now that you have your first plot, let's explore some basic interactions:

Hover over data points: This will display detailed information about that specific data point.
Use the zoom controls: In the top right of the plot, you can zoom in or out to focus on specific areas of interest.
Apply a filter: Click on a category in your frequency plot and select "Filter". This will create a new plot showing only the data for that specific category.
Add a Y-axis: After filtering, add a Y-axis variable to see how it's distributed within your filtered category.
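The filter-then-add-a-Y-axis workflow amounts to restricting the rows to one category and then summarizing a second variable within that subset. A minimal sketch with made-up column names and values, independent of how Probably implements filtering:

```python
from collections import Counter

# Hypothetical rows; the column names and values are made up.
rows = [
    {"region": "north", "plan": "basic"},
    {"region": "north", "plan": "pro"},
    {"region": "south", "plan": "basic"},
    {"region": "north", "plan": "basic"},
]

# "Filter" on one category of the X variable.
filtered = [row for row in rows if row["region"] == "north"]

# Distribution of the Y variable within the filtered category.
plan_counts = Counter(row["plan"] for row in filtered)
print(plan_counts)
```

The counts describe only the filtered subset, so they should be compared with the unfiltered distribution before drawing conclusions.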

Transitioning to Cluster View

If you have a text variable in your dataset, select it as your X-axis.
Probably will automatically switch to Cluster View, showing a 2D representation of your text data.
You can color-code the clusters using a categorical variable by setting it as the Z-axis.
Use the cluster summarization feature to get an overview of the main themes in each cluster.

Best Practices for X vs Y Plotting

Start with frequency plots for individual variables to understand their distributions.
Progress to bivariate plots to explore relationships between variables.
Use the Z-axis feature to identify statistically significant differences between categories.
Adjust the p-value threshold in the "Advanced Controls" panel to control the strictness of significance testing.
Always consider the practical significance of results, not just statistical significance.

Next Steps

Congratulations! You've now created your first visualizations in Probably. From here, you can:

Explore different combinations of variables to uncover interesting relationships in your data.
Dive deeper into Cluster View to explore semantic relationships in text data.
Learn about Filtering and Navigation to refine your analysis.
Experiment with Advanced Controls to customize your analysis further.

Remember, the key to effective data analysis is asking good questions. Don't be afraid to experiment with different visualizations and explore your data from multiple angles. Happy analyzing!