AI promises to revolutionize data analysis, from automating data pipelines to finding meaningful patterns in massive, unstructured datasets. Businesses and data teams are rapidly adopting AI, realizing its potential to streamline their entire business analytics pipeline. But what specific tasks are a good starting point for data analysts looking to incorporate AI into their data workflow?
In this post, we highlight how AI can improve a critical phase in the data workflow: exploratory data analysis. Learn how AI streamlines data exploration through quick data profiling, automated data wrangling with natural language queries, and AI-drafted exploratory charts.
Use AI for fast data profiling
Data profiling is the process of gaining high-level familiarity with the structure, content, and quality of data in a database or data warehouse. In other words, it helps give a quick early answer to the question: “What’s in the data, and what can I do with it?” It is an important step in data exploration because it helps data analysts understand the data and uncover data quality issues, so they can do any necessary cleaning and decide on appropriate analysis methods.
A large part of data profiling is finding summary statistics. These often include measures of central tendency (like mean, median, and mode), data spread (variance or standard deviation), and extrema (minimum and maximum values) for individual variables. Data profiling can also involve investigations of missingness, duplication, and other indicators of data quality. Table-level information like table dimensions and variable types might also be explored as part of data profiling.
As one-off calculations, the statistics listed above are trivial, but they can be time-consuming to find manually when working with a large database and many fields. AI allows analysts to automate data profiling to get a surface-level look at their data in a fraction of the time.
Here are three example AI prompts that can accelerate initial data profiling:
Summarize this data including table dimensions, column names, data types, and summary statistics for each variable.
What number and proportion of values are missing for each column?
Are there any duplicate rows?
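To make these prompts concrete, here is a minimal sketch of the checks they ask for, written with only the Python standard library. The sample rows, the column names, and the `profile` helper are hypothetical and exist only for illustration; an AI tool or a dataframe library would likely compute the same things more concisely.

```python
import statistics
from collections import Counter

# Hypothetical sample rows; None marks a missing value.
rows = [
    {"order_id": 1, "category": "Clothing", "price": 20.0},
    {"order_id": 2, "category": "Food", "price": None},
    {"order_id": 3, "category": "Clothing", "price": 35.0},
    {"order_id": 3, "category": "Clothing", "price": 35.0},  # exact duplicate
    {"order_id": 4, "category": None, "price": 50.0},
]


def is_number(value):
    """True for ints and floats, but not booleans."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def profile(rows):
    """Return table dimensions, per-column summaries, and a duplicate row count."""
    columns = list(rows[0].keys())
    report = {"n_rows": len(rows), "n_columns": len(columns), "columns": {}}
    for col in columns:
        values = [row[col] for row in rows]
        present = [v for v in values if v is not None]
        missing = len(values) - len(present)
        summary = {
            "types": sorted({type(v).__name__ for v in present}),
            "missing": missing,
            "missing_pct": missing / len(values),
        }
        if present and all(is_number(v) for v in present):
            # Numeric column: central tendency, spread, and extrema.
            summary.update(
                mean=statistics.mean(present),
                median=statistics.median(present),
                stdev=statistics.stdev(present) if len(present) > 1 else None,
                min=min(present),
                max=max(present),
            )
        else:
            # Non-numeric column: report the most common value instead.
            summary["mode"] = Counter(present).most_common(1)[0][0] if present else None
        report["columns"][col] = summary
    # Count rows that exactly repeat an earlier row.
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    report["duplicate_rows"] = sum(n - 1 for n in counts.values())
    return report


report = profile(rows)
```

Comparing an AI-generated profile against a small hand-rolled check like this, on a sample of the data, is one way to confirm the summary is describing the table you think it is.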
With a better big-picture understanding of the data, you're less likely to prematurely dive into analyses that are inaccurate, misinformed, or unused. This can save you from costly mistakes and wasted effort down the line.
Use AI to automate data wrangling
No matter how careful your data collection and quality assurance processes, data wrangling isn’t going anywhere. There will always be a need to clean and reshape data to get it into a better format for downstream data visualizations and analyses.
AI helps analysts speed through time-consuming data wrangling by translating natural language prompts into a series of data manipulations. A tool might do this using UI options that filter, sort, select, or derive values in sequence to complete the task. Or, the prompt might generate a SQL query that returns data in a more usable format. The latter is often called text-to-SQL or natural language to SQL (“NL2SQL” for short).
Here are example AI prompts to draft new data wrangling sequences:
Show me all orders from the Electronics & Media overall category shipped to California in 2022.
How many clothing orders were placed monthly from 2021 to 2023?
Find the top 10 food products shipped to Florida, by total revenue.
Keep in mind that AI hallucinates, misinterprets requests, and makes mistakes, so always inspect and verify each data wrangling step.
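As a rough illustration, the sketch below shows the kind of SQL a text-to-SQL tool might draft for the Florida prompt above, followed by one simple way to verify it. The `orders` table, its columns, and the sample rows are all assumptions invented for this example, and the query is run against an in-memory SQLite database through Python's built-in `sqlite3` module.

```python
import sqlite3

# Hypothetical schema and rows, invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (product TEXT, category TEXT, ship_state TEXT, revenue REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        ("Granola", "Food", "FL", 12.0),
        ("Granola", "Food", "FL", 8.0),
        ("Coffee", "Food", "FL", 15.0),
        ("Coffee", "Food", "CA", 30.0),       # other state: must be excluded
        ("T-shirt", "Clothing", "FL", 25.0),  # other category: must be excluded
    ],
)

# The kind of query a tool might generate for:
# "Find the top 10 food products shipped to Florida, by total revenue."
query = """
SELECT product, SUM(revenue) AS total_revenue
FROM orders
WHERE category = 'Food' AND ship_state = 'FL'
GROUP BY product
ORDER BY total_revenue DESC
LIMIT 10
"""
top_products = conn.execute(query).fetchall()

# Independent check: recompute the totals in plain Python and compare.
expected = {}
for product, category, state, revenue in conn.execute("SELECT * FROM orders"):
    if category == "Food" and state == "FL":
        expected[product] = expected.get(product, 0.0) + revenue

# Valid here because fewer than 10 products match, so LIMIT drops nothing.
results_match = dict(top_products) == expected
```

Reading the generated filters against the prompt (is "Florida" stored as `FL`? is "food" a category or a product name?) catches many misinterpretations before any numbers are trusted.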
Use AI to draft exploratory data visualizations
Data visualization is a critical part of data exploration because it helps analysts to uncover patterns, anomalies, and relationships between variables that can highlight new questions and inform subsequent analysis.
But creating visualizations can be tiresome and frustrating, especially when it involves big data and tedious pre-processing to get the data into a shape compatible with your chart type. It can also be intimidating. When you’ve got a bunch of tables, a blank slate starting point, and any number of chart types to choose from, sometimes it’s hard to know where to begin.
AI can automatically generate charts, helping data analysts get off and running with quick data displays that avoid a blank slate. Here are three example prompts to generate exploratory data visualizations:
Draft a line chart of monthly revenue over time, with a different line color for each customer income level.
Create a histogram of product prices for all items in the Electronics & Media category.
Make a stacked bar chart of top products by quantity purchased, with fill color based on customer education level.
Ideally, AI "shows its work," exposing any code used to generate outputs and allowing you to make manual chart edits after the fact.
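As an example of what such exposed code might contain, here is a minimal sketch of the pre-processing behind the line chart prompt: turning raw order rows into one monthly revenue series per income level. The row layout, the sample values, and the `monthly_series` helper are hypothetical, and the plotting call itself is left out so the sketch depends only on the Python standard library.

```python
from collections import defaultdict

# Hypothetical order rows: (order_date as ISO string, income_level, revenue).
orders = [
    ("2022-01-15", "Low", 40.0),
    ("2022-01-20", "High", 100.0),
    ("2022-01-28", "Low", 10.0),
    ("2022-02-03", "High", 60.0),
    ("2022-03-11", "Low", 25.0),
]


def monthly_series(orders):
    """Return sorted month labels and one revenue series per income level."""
    totals = defaultdict(float)  # (month, level) -> summed revenue
    for date, level, revenue in orders:
        totals[(date[:7], level)] += revenue  # "YYYY-MM" prefix of an ISO date
    # Only months that appear in the data are included.
    months = sorted({month for month, _ in totals})
    levels = sorted({level for _, level in totals})
    # Use 0.0 where a level has no orders in a month, so all lines share one x-axis.
    series = {
        level: [totals.get((month, level), 0.0) for month in months]
        for level in levels
    }
    return months, series


months, series = monthly_series(orders)
```

Each entry in `series` can then be passed to a plotting library as one line, with `months` as the shared x-axis. Seeing this step spelled out makes it easier to spot choices you may want to change, such as treating months without orders as zero revenue.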
Learn more
AI has the potential to revolutionize how we work with data. From initial data profiling to automated data wrangling and drafting new charts, AI can help analysts quickly build a deep understanding of their data before diving into more formal analyses. Learn more best practices for exploratory data analysis in our recent posts: