AI promises to revolutionize data analysis, from automating data pipelines to finding meaningful patterns in massive, unstructured datasets. Businesses and data teams are rapidly adopting AI, realizing its potential to streamline their entire business analytics pipeline. But what specific tasks are a good starting point for data analysts looking to incorporate AI into their data workflow?
In this post, we highlight how AI can improve a critical phase in the data workflow: exploratory data analysis. We focus on using AI to streamline data exploration to:
Quickly profile data
Fast-track data wrangling using natural language queries
Instantly draft exploratory charts
For each of the use cases above, we share example prompts to use when analyzing data with the help of AI, then show a real-world example of how it’s executed in an Observable Canvas.
Use AI for fast data profiling
Data profiling is the process of gaining high-level familiarity with the structure, content, and quality of data in a database or data warehouse. In other words, it helps give a quick early answer to the question: “What’s in the data, and what can I do with it?” It is an important step in data exploration because it helps data analysts understand the data and uncover data quality issues, so they can do any necessary cleaning and decide on appropriate analysis methods.
A large part of data profiling is finding summary statistics. These often include measures of central tendency (like mean, median, and mode), data spread (variance or standard deviation), and extrema (minimum and maximum values) for individual variables. Data profiling can also involve investigations of missingness, duplication, and other indicators of data quality. Table-level information like table dimensions and variable types might also be explored as part of data profiling.
As one-off calculations, the statistics listed above are trivial, but they can be time-consuming to find manually when working with a large database and many fields. AI allows analysts to automate data profiling to get a surface-level look at their data in a fraction of the time.
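To make the profiling steps above concrete, here is a minimal Python sketch of the kind of summary an AI-generated profile might compute. It is an illustration only, not how Observable Canvases work internally. The sample rows, the column names, and the `profile` helper are all hypothetical.

```python
import statistics
from collections import Counter

# Hypothetical sample rows (not from the original post); None marks a missing value.
rows = [
    {"order_id": 1, "category": "Clothing", "price": 20.0},
    {"order_id": 2, "category": "Food", "price": None},
    {"order_id": 3, "category": "Clothing", "price": 35.0},
    {"order_id": 3, "category": "Clothing", "price": 35.0},  # exact duplicate row
    {"order_id": 4, "category": None, "price": 5.0},
]


def profile(rows):
    """Return table dimensions, per-column summaries, and a duplicate-row count."""
    columns = list(rows[0].keys())
    report = {"n_rows": len(rows), "n_columns": len(columns), "columns": {}}
    for col in columns:
        values = [r[col] for r in rows]
        present = [v for v in values if v is not None]
        info = {
            "types": sorted({type(v).__name__ for v in present}),
            "missing": len(values) - len(present),
            "missing_share": (len(values) - len(present)) / len(values),
        }
        numeric = [
            v for v in present
            if isinstance(v, (int, float)) and not isinstance(v, bool)
        ]
        if numeric and len(numeric) == len(present):
            # Numeric column: central tendency, spread, and extrema.
            info.update(
                mean=statistics.mean(numeric),
                median=statistics.median(numeric),
                min=min(numeric),
                max=max(numeric),
                stdev=statistics.stdev(numeric) if len(numeric) > 1 else 0.0,
            )
        else:
            # Non-numeric column: report the most common value.
            info["mode"] = Counter(present).most_common(1)[0][0] if present else None
        report["columns"][col] = info
    # Count every repeat of a row that has already appeared once.
    counts = Counter(tuple(r[c] for c in columns) for r in rows)
    report["duplicate_rows"] = sum(n - 1 for n in counts.values())
    return report


report = profile(rows)
print(report["n_rows"], "rows x", report["n_columns"], "columns")
for name, info in report["columns"].items():
    print(name, info)
print("duplicate rows:", report["duplicate_rows"])
```

Even this small version has to handle types, missing values, and duplicates separately for every column, which is the repetitive work a profiling prompt is meant to take off an analyst's plate.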
Examples of AI prompts to help with initial data profiling:
Summarize this data including table dimensions, column names, data types, and summary statistics for each variable.
What number and proportion of values are missing for each column?
Are there any duplicate rows?
Here is the final output of the first prompt above in an Observable Canvas:
Use AI to automate data wrangling
No matter how careful your data collection and quality assurance processes, data wrangling isn’t going anywhere. There will always be a need to clean and reshape data to get it into a better format for downstream data visualizations and analyses.
AI helps analysts speed through time-consuming data wrangling by translating natural language prompts into a series of data manipulations. A tool might do this using UI options that filter, sort, select, or derive values in sequence to complete the task. Or, the prompt might generate a SQL query that returns data in a more usable format. The latter is often called text-to-SQL or natural language to SQL (“NL2SQL” for short).
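As a rough illustration of text-to-SQL, the sketch below shows one query that a prompt such as "Show me all orders from the Electronics & Media overall category shipped to California in 2022" could plausibly translate into. The `orders` table, its column names, and the sample rows are assumptions made for this example, and Python's built-in `sqlite3` module stands in for whatever database you actually use.

```python
import sqlite3

# In-memory database with a hypothetical schema (assumed for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        overall_category TEXT,
        ship_state TEXT,
        order_date TEXT  -- ISO 8601 date, e.g. '2022-03-14'
    )
    """
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "Electronics & Media", "California", "2022-03-14"),
        (2, "Electronics & Media", "Florida", "2022-05-02"),
        (3, "Clothing", "California", "2022-07-19"),
        (4, "Electronics & Media", "California", "2021-11-30"),
        (5, "Electronics & Media", "California", "2022-12-31"),
    ],
)

# One possible SQL translation of the natural language prompt.
# ISO 8601 date strings sort chronologically, so string comparison works here.
query = """
    SELECT order_id, overall_category, ship_state, order_date
    FROM orders
    WHERE overall_category = ?
      AND ship_state = ?
      AND order_date >= '2022-01-01'
      AND order_date < '2023-01-01'
    ORDER BY order_date
"""
matching_orders = conn.execute(
    query, ("Electronics & Media", "California")
).fetchall()

for row in matching_orders:
    print(row)
```

Because the generated query is ordinary SQL, an analyst can read each filter condition and confirm it matches the intent of the prompt before trusting the result.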
Here are some examples of AI prompts for data wrangling:
Show me all orders from the Electronics & Media overall category shipped to California in 2022.
How many clothing orders were placed monthly from 2021 to 2023?
Find the top 10 food products shipped to Florida, by total revenue.
Below we show how the first prompt above is completed step by step, and all out in the open, in an Observable Canvas:
In Observable Canvases, users can choose between built-in UI options for data wrangling, or activate text-to-SQL to return an editable query. For example, when our AI is prompted to do the same data wrangling steps as above using SQL, an editable SQL node is returned:
Use AI to draft exploratory data visualizations
Data visualization is a critical part of data exploration because it helps analysts to uncover patterns, anomalies, and relationships between variables that can highlight new questions and inform subsequent analysis.
But creating visualizations can be tiresome and frustrating, especially when it involves big data and tedious pre-processing to get the data into a shape that is compatible with your chart type.
It can also be intimidating. When you’ve got a bunch of tables, a blank slate starting point, and any number of chart types to choose from, sometimes it’s hard to know where to begin.
AI can automatically generate charts, helping data analysts get up and running with quick data displays instead of starting from a blank slate.
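To give a sense of the pre-processing that chart drafting can involve, here is a small Python sketch that reshapes order data for a stacked bar chart of top products by quantity, split by customer education level. The sample data and the `stacked_bar_series` helper are hypothetical. This is not the code an Observable Canvas generates, just one way the reshaping step could look.

```python
from collections import defaultdict

# Hypothetical order lines: (product, customer_education, quantity).
order_lines = [
    ("Headphones", "Bachelor's", 3),
    ("Headphones", "High school", 2),
    ("Coffee", "Bachelor's", 5),
    ("Coffee", "Graduate", 4),
    ("T-shirt", "High school", 1),
    ("Headphones", "Graduate", 1),
]


def stacked_bar_series(lines, top_n):
    """Aggregate long-format rows into one quantity list per stack segment."""
    totals = defaultdict(int)  # product -> total quantity
    cells = defaultdict(int)   # (product, education) -> quantity
    for product, education, quantity in lines:
        totals[product] += quantity
        cells[(product, education)] += quantity
    # Keep the top_n products, largest total first (ties broken by name).
    products = sorted(totals, key=lambda p: (-totals[p], p))[:top_n]
    levels = sorted({education for _, education, _ in lines})
    # Each level gets one value per kept product, with 0 where there were no orders.
    series = {
        level: [cells.get((p, level), 0) for p in products] for level in levels
    }
    return products, series


products, series = stacked_bar_series(order_lines, top_n=2)
print(products)
for level, quantities in series.items():
    print(level, quantities)
```

Each list in `series` lines up with `products`, so a plotting library can draw one colored layer per education level on top of the previous one.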
Here are example prompts to create exploratory data visualizations:
Draft a line chart of monthly revenue over time, with a different line color for each customer income level.
Create a histogram of product prices for all items in the Electronics & Media category.
Make a stacked bar chart of top products by quantity purchased, with fill color based on customer education level.
The live response from the third prompt is shown below in an Observable Canvas:
Ideally, you can keep tweaking the charts generated by AI. In canvases, for example, users can prompt the AI to make a chart in code using Observable Plot, then edit the code directly to customize their data visualizations.
Learn about using AI for data analysis in Observable Canvases
We recently shared how we’re making AI transparent and verifiable in Observable Canvases, so you know exactly what’s happening with your data at every step in your analysis. By overcoming one of the biggest risks of using AI for data analysis (black-box outputs that are difficult to validate), we let you confidently use AI for essential steps in data workflows like initial data profiling, data wrangling, and drafting visualizations.
Learn how we’re approaching AI differently in canvases.