Quick Start

This guide walks you through making your first plot and introduces the fundamentals of the tool.

Step 1: Loading Your First Dataset

Probably accepts tabular data with any number of columns, as long as every column holds simple, non-nested values. Compatible data types include any scalar SQL type or primitive type from a typical spreadsheet: numbers, text, dates, and so on. The tool does not support arrays, objects, or other nested data structures.
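If you are unsure whether a JSON file meets this requirement, you can check it before uploading. The following is a minimal sketch, not part of Probably itself: the function name find_nested_columns and the sample records are hypothetical, and it assumes the file is a JSON array of flat row objects.

```python
import json


def find_nested_columns(rows):
    """Return the sorted names of columns that hold a list or dict in any row."""
    nested = set()
    for row in rows:
        for column, value in row.items():
            if isinstance(value, (list, dict)):
                nested.add(column)
    return sorted(nested)


# Hypothetical records, shaped like a JSON upload with one nested column.
records = json.loads("""
[
  {"id": 1, "name": "Ada", "signup": "2024-01-05", "tags": ["a", "b"]},
  {"id": 2, "name": "Lin", "signup": "2024-02-11", "tags": []}
]
""")

print(find_nested_columns(records))  # ['tags']
```

Any column reported here would need to be flattened or dropped before the data fits the supported types.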

  1. From the main dashboard, click the "Add Dataset" button in the top right corner.

  2. Choose your data source:

    • Upload a File: For CSV or JSON files, simply drag and drop your file into the designated area or click to browse your computer.
    • Connect to a Snowflake Database:
      a. Create a connections.toml file in the sources/ directory at the top level of your local Probably installation directory.
      b. Add the following fields to the file:
        account = "your_account"
        user = "your_username"
        password = "your_password"
        warehouse = "your_warehouse"
        database = "your_database"
        schema = "your_schema"
        role = "your_role"
      c. If these credentials are valid, you'll be able to select your schema and table to load as a new dataset.

    Don't have a dataset ready? Try one of these example datasets from HuggingFace:

  3. Once your data is loaded, Probably will automatically detect the data types for each column. You'll see a preview of your data with the detected types.

  4. Review the auto-detected variable types:

    • Categorical: For discrete variables with a limited number of possible values (e.g., product categories, user types).
    • Continuous: For numeric variables that can take any value within a range (e.g., price, age).
    • DateTime: For date and time information.
    • Text: For longer text fields that might be used for semantic analysis.

  5. If any variables are misclassified, you can adjust them by clicking on the type and selecting the correct one from the dropdown menu.

  6. Once you're satisfied with the variable classifications, click "Confirm" to proceed to the plotting view.
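To build intuition for what automatic type detection involves, here is a rough sketch of one possible heuristic. It is not Probably's actual detection logic, which this guide does not describe: the function guess_variable_type, the two thresholds, and the order of the checks are all assumptions made for illustration.

```python
from datetime import datetime

# Assumed thresholds, chosen only for this illustration.
MAX_CATEGORIES = 20    # more distinct values than this is treated as free text
LONG_TEXT_CHARS = 50   # average length above this is treated as free text


def _is_number(value):
    try:
        float(value)
        return True
    except ValueError:
        return False


def _is_datetime(value):
    try:
        datetime.fromisoformat(value)
        return True
    except ValueError:
        return False


def guess_variable_type(values):
    """Guess one of the four variable types from a column of raw strings."""
    values = [v for v in values if v != ""]  # ignore empty cells
    if not values:
        return "Categorical"  # nothing to inspect; arbitrary default
    if all(_is_number(v) for v in values):
        return "Continuous"
    if all(_is_datetime(v) for v in values):
        return "DateTime"
    average_length = sum(len(v) for v in values) / len(values)
    if average_length > LONG_TEXT_CHARS or len(set(values)) > MAX_CATEGORIES:
        return "Text"
    return "Categorical"


print(guess_variable_type(["19.99", "5", "42.5"]))       # Continuous
print(guess_variable_type(["2024-01-05", "2024-02-11"]))  # DateTime
print(guess_variable_type(["basic", "pro", "basic"]))     # Categorical
```

A heuristic like this misreads some columns. A 1-to-5 rating stored as numbers, for example, would be guessed as Continuous, which is why it is worth reviewing the detected types before confirming.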

Step 2: Creating Your First Plot

  1. In the right sidebar, you'll see a list of your variables. Find the "X-axis" dropdown and select a variable you want to explore.

X-axis Selection

  2. Probably will automatically generate a frequency plot based on the variable you've selected. This plot shows the distribution of values for your chosen variable.

Frequency Plot

Take a moment to examine this plot:

  • The x-axis shows the different categories or values of your chosen variable.
  • The y-axis shows the frequency or count of each category.
  • This type of plot is crucial at the beginning of an exploratory analysis as it gives you a quick overview of your data's distribution.
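The counts behind a frequency plot are simple to reproduce by hand. The sketch below uses only the Python standard library; the function frequency_table and the sample product_category column are hypothetical and are not taken from Probably.

```python
from collections import Counter


def frequency_table(values):
    """Count how often each distinct value occurs, most frequent first."""
    return Counter(values).most_common()


# Hypothetical categorical column.
product_category = ["books", "toys", "books", "games", "books", "toys"]

# Print a crude text version of the plot: one row of '#' marks per category.
for category, count in frequency_table(product_category):
    print(f"{category:<6} {'#' * count} {count}")
```

Each row corresponds to one bar: the label is the x-axis category and the number of marks is the y-axis count.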

  3. Next, choose a variable for the Y-axis to explore the relationship between two variables.

Y-axis Selection

  4. Probably will update the plot based on the types of variables you've selected. For example:
    • If both X and Y are categorical, you'll see a grouped bar chart.
    • If X is categorical and Y is continuous, you'll see a box plot.
    • If both X and Y are continuous, you'll see a distribution plot with regression lines.

X vs Y Plot

This plot helps answer questions about how one variable relates to another. For instance, "How does customer satisfaction vary across different product categories?"
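The pairing of variable types to plot types described above can be summarized as a lookup table. This is only a sketch of the documented combinations: the function choose_plot is hypothetical, and it returns None for any combination this guide does not mention, since the behavior in those cases is not stated here.

```python
# Plot types for the (X type, Y type) pairs listed in this guide.
PLOT_BY_TYPES = {
    ("Categorical", "Categorical"): "grouped bar chart",
    ("Categorical", "Continuous"): "box plot",
    ("Continuous", "Continuous"): "distribution plot with regression lines",
}


def choose_plot(x_type, y_type=None):
    """Return the plot type for a pair of variable types, or None if unlisted."""
    if y_type is None:
        # With only an X variable selected, the guide describes a frequency plot.
        return "frequency plot"
    return PLOT_BY_TYPES.get((x_type, y_type))


print(choose_plot("Categorical"))                # frequency plot
print(choose_plot("Categorical", "Continuous"))  # box plot
print(choose_plot("DateTime", "Text"))           # None
```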

  5. The Z-axis represents a statistical test computed over either the X or the Y variable (you choose which). For multi-category variables, it defaults to a z-test of proportions. The Z-axis shows:
    • A global mean for the entire category (black dashed line)
    • Category-specific means from each z-test (red dashed lines)
    • Shaded areas (red or green) when a category mean is significantly different from others, based on the configured p-value threshold.
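For readers who want to see the arithmetic behind a z-test of proportions, the sketch below compares one category against all other rows using the standard pooled two-proportion formula. It illustrates the general technique and is not Probably's implementation: the function name, the sample counts, and the 0.05 threshold are assumptions, and the guide does not specify exactly which groups the tool compares.

```python
import math

ALPHA = 0.05  # assumed p-value threshold; configurable in the tool


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Return (z, two-sided p-value) for the difference of two proportions."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_error == 0:
        # Both groups are all-success or all-failure: no detectable difference.
        return 0.0, 1.0
    z = (p_a - p_b) / std_error
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 60 of 100 rows in one category versus 200 of 500 elsewhere.
z, p_value = two_proportion_z_test(60, 100, 200, 500)
print(f"z = {z:.2f}, significant = {p_value < ALPHA}")
```

A category whose p-value falls below the threshold is the kind of result the shaded areas are meant to highlight.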

Step 3: Exploring Your Data

Now that you have your first plot, let's explore some basic interactions:

  1. Hover over data points: This will display detailed information about that specific data point.

  2. Use the zoom controls: In the top right of the plot, you can zoom in or out to focus on specific areas of interest.

  3. Apply a filter: Click on a category in your frequency plot and select "Filter". This will create a new plot showing only the data for that specific category.

Filtering

  4. Add a Y-axis: After filtering, add a Y-axis variable to see how it's distributed within your filtered category.

Filtered Y Plot

Notice how some bars are shaded green or red. This indicates that these categories are significantly different from the overall mean, helping you quickly identify important patterns in your data.
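Conceptually, filtering and then adding a Y-axis variable amounts to keeping the rows for one category and counting the values of a second column within them. Here is a minimal sketch with the standard library; the helper filter_rows and the sample rows are hypothetical.

```python
from collections import Counter


def filter_rows(rows, column, value):
    """Keep only the rows whose `column` equals `value`."""
    return [row for row in rows if row[column] == value]


# Hypothetical dataset with two categorical columns.
rows = [
    {"product": "books", "rating": "high"},
    {"product": "toys", "rating": "low"},
    {"product": "books", "rating": "low"},
    {"product": "books", "rating": "high"},
]

# Filter to one category, then look at the Y variable within it.
books = filter_rows(rows, "product", "books")
rating_counts = Counter(row["rating"] for row in books)

for rating, count in rating_counts.most_common():
    print(rating, count)
```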

Next Steps

Congratulations! You've now created your first visualizations in Probably. From here, you can:

  • Explore different combinations of variables to uncover interesting relationships in your data.
  • Dive deeper into X vs Y Plotting to learn about more advanced plotting features.
  • Check out the Cluster View to explore semantic relationships in text data.
  • Learn about Filtering and Navigation to refine your analysis.

Remember, the key to effective data analysis is asking good questions. Don't be afraid to experiment with different visualizations and explore your data from multiple angles. Happy analyzing!
