AI Tools for Data Analysis: Data Workflows with LLMs
Traditional data analysis is often a slow, manual process: analysts clean data, build models, and generate reports through time-consuming workflows. This limits how quickly organizations can act on their data, especially when dealing with large or complex datasets.
Enter generative AI (GenAI), a new class of tools powered by large language models (LLMs) that dramatically accelerate and simplify data analysis. These tools can interpret structured, semi-structured, and unstructured data, automate repetitive tasks, and generate insights that might otherwise go unnoticed.
This article explores the key features that make LLMs effective for data analysis, offers best-practice recommendations, and highlights real-world examples of how these tools reshape modern data workflows.
Summary of key concepts related to AI tools for data analysis
Data analysis types
Data analysis progresses from understanding what happened in the past to determining the best action to take in the future. The four essential types of data analysis each serve a different purpose and rely on specialized tools and techniques.
In the rest of this article, we’ll examine how this evolution applies to a fictitious supermarket, “SuperMart,” and how it might use various types of data analysis to derive business value.

Descriptive analytics (“what happened?”)
Descriptive analytics helps companies understand past and current data. SuperMart uses business intelligence tools, like Power BI or Tableau, to report on total daily, weekly, and monthly sales revenue. The data analysts have created separate dashboards to enable management to drill down into specific categories like dairy, bakery, produce, and beverages.
The marketing department closely monitors the company’s website traffic, particularly for the Weekly Specials page, tracking key metrics like:
- Page views
- Visitor counts
- Bounce rates
These metrics highlight the effectiveness of online promotions and help refine content strategy for better user engagement.

SuperMart's current implementation has several limitations:
- Performing descriptive analytics involves reviewing 25 different dashboards, and it is not always clear to the teams whether each dashboard is current or where its data is sourced from.
- Updating dashboards to show inventory levels for a new product takes too long to implement. The consumers of the data are not SQL query experts, so they have to ask a data analyst to update the dashboard to integrate new data.
- Combining unstructured data with structured data in dashboards is challenging when performing diagnostic analysis.
- Onboarding new team members is time-consuming. They spend too much time finding the right dashboard rather than providing recommendations or acting on insights.
Descriptive analytics with LLMs
The SuperMart CTO recognizes these issues and sees that AI can help with descriptive analytics. LLMs are capable of generating accurate SQL queries from natural language questions, so instead of navigating multiple dashboards, SuperMart employees can ask questions in natural language. For example, the produce department manager can make the following request:
“Give me a summary of last week's sales performance, highlighting key trends and underperforming categories.”
The LLM-based system recognizes that the user is asking about sales, so it generates a SQL statement that queries the sales table and limits records to the last week. The system then produces the relevant chart and a textual summary like this one:
“Last week, SuperMart achieved total sales of $150,000. The beverage department was a strong performer, contributing $30,000, a 15% increase week over week, likely driven by the recent heatwave. However, the bakery department saw a 10% decline, which warrants further investigation."
The bar chart below shows each department’s weekly sales with the week-over-week percentage change annotated above each bar.
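As a minimal sketch of the text-to-SQL step under the hood, the snippet below uses the OpenAI Python SDK to translate a question into SQL and run it against a local SQLite database. The model name, schema, and database file are assumptions for illustration, not SuperMart's actual stack.

```python
import sqlite3
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical schema metadata passed to the model as context
SCHEMA = "Table sales(sale_date DATE, department TEXT, revenue REAL)"

def question_to_sql(question: str) -> str:
    """Ask the LLM to translate a natural-language question into SQL."""
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model choice
        messages=[
            {"role": "system",
             "content": "Translate the user's question into a single SQLite "
                        f"query. Schema: {SCHEMA}. Reply with SQL only."},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content.strip()

sql = question_to_sql("What were last week's sales by department?")
rows = sqlite3.connect("supermart.db").execute(sql).fetchall()
print(rows)
```

In production, generated queries would be validated and executed with read-only permissions before the results are summarized back into natural language.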

{{banner-large-3="/banners"}}
Diagnostic analytics (“why did it happen?”)
After identifying what occurred, diagnostic analytics investigates the reasons behind it. For example, suppose that last week, a SuperMart data analyst observed that ice cream sales dropped by 20% on the weekly sales dashboard. From investigating previous sales drops, the analyst thinks the drop could be caused by any of the following:
- Competitor promotions
- Local events
- Weather conditions
- Poor stock control
The SuperMart head of sales recognizes that investigating sales drops takes her team a lot of time, as they have to investigate each possible cause to compile a report.
Diagnostic analytics with LLMs
Upon noticing the drop, an LLM could analyze various data streams and suggest potential reasons for the 20% decline in ice cream sales (a sketch of how this might be wired up follows the list):
- “Local weather reports show a significant temperature decrease (average 5°C lower than the previous week).”
- “Competitor 'ValueMart' launched a 'BOGO free' on frozen desserts (source: competitor promotions database).”
- “Social media sentiment around ice cream shows a slight dip locally.”
- “No stockouts were reported for popular ice cream brands.”
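Here is a minimal sketch of how such a diagnostic pass might be assembled, reusing the signals listed above. The data fetchers are placeholders standing in for SuperMart's internal systems, and the model name is illustrative.

```python
from openai import OpenAI

client = OpenAI()

# Placeholder fetchers; in practice these would query internal systems.
def fetch_weather_summary() -> str:
    return "Average temperature 5°C lower than the previous week."

def fetch_competitor_promos() -> str:
    return "ValueMart launched a BOGO free promotion on frozen desserts."

def fetch_stock_events() -> str:
    return "No stockouts reported for popular ice cream brands."

signals = {
    "weather": fetch_weather_summary(),
    "competitor promotions": fetch_competitor_promos(),
    "stock": fetch_stock_events(),
}

prompt = (
    "Ice cream sales dropped 20% week over week. "
    "Given these signals, list the most likely causes, ranked:\n"
    + "\n".join(f"- {name}: {value}" for name, value in signals.items())
)

answer = client.chat.completions.create(
    model="gpt-4o",  # illustrative model choice
    messages=[{"role": "user", "content": prompt}],
)
print(answer.choices[0].message.content)
```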
Predictive analytics (“what is likely to happen?”)
Predictive analytics applies probabilistic models to historical and real-time data to forecast future outcomes. At SuperMart, the buying team uses demand forecasting to predict how much of a product to order in the coming days, weeks, or months. For example, the team may combine past sales data and upcoming weather forecasts to predict a 30% increase in demand for ice cream. To develop these forecasts, the SuperMart team has been using traditional machine learning tools (like scikit-learn or TensorFlow) and forecasting tools (such as ARIMA or Prophet).
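As a rough illustration of the traditional side of this workflow, the snippet below produces a short-horizon demand forecast with Prophet. The CSV file is an assumption; Prophet expects a date column named ds and a value column named y.

```python
import pandas as pd
from prophet import Prophet

# Assumed file: daily ice cream sales with columns "ds" (date) and "y" (units)
df = pd.read_csv("ice_cream_sales.csv")

model = Prophet(weekly_seasonality=True)
model.fit(df)

# Forecast the next 14 days of demand, with uncertainty bounds
future = model.make_future_dataframe(periods=14)
forecast = model.predict(future)
print(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail(14))
```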
Predictive analytics with LLMs
Some of the outputs from the existing tools are complex for non-technical users to understand.
For instance, a traditional predictive model might simply present a churn probability of 75% for a specific customer, along with a list of raw contributing features like:
- shopping_frequency_change: -3
- avg_basket_value_change: -55
- recent_feedback_sentiment: negative
The team integrated an LLM to improve these outputs. Now, instead of just seeing the raw data, users get a plain-language explanation of the reasoning behind the prediction:
“Customer John Doe has a 75% probability of churning. This is based on his reduced shopping frequency (from once a week to once a month), a significant decrease in basket value (average $80 down to $25), and his recent negative feedback regarding product availability.”
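One common pattern for producing such explanations is to hand the model's raw output to an LLM with a short instruction. A minimal sketch, reusing the illustrative values above:

```python
from openai import OpenAI

client = OpenAI()

# Raw churn-model output (illustrative values from the example above)
churn_output = {
    "customer": "John Doe",
    "churn_probability": 0.75,
    "shopping_frequency_change": -3,
    "avg_basket_value_change": -55,
    "recent_feedback_sentiment": "negative",
}

explanation = client.chat.completions.create(
    model="gpt-4o",  # illustrative model choice
    messages=[{
        "role": "user",
        "content": "Explain this churn prediction to a non-technical "
                   f"marketer in two sentences: {churn_output}",
    }],
)
print(explanation.choices[0].message.content)
```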
The marketing team at SuperMart reviews food-related social media to identify trends. LLMs have been valuable for analyzing online food blogs, social media posts, and competitor product descriptions. Every week, the team receives a summary from the LLM, such as “rising interest in plant-based diets.” The outputs from the LLM are used as inputs to the forecasting tools.
Prescriptive analytics (“what should we do?”)
The prescriptive analytics stage provides specific recommendations that help organizations maximize their outcomes. SuperMart has introduced a dynamic pricing model that recommends optimal prices based on demand, competitor pricing, stock levels, and desired profit margins. For example, last week, the dynamic pricing model recommended a 15% price reduction on avocados for the next 48 hours because stock levels were too high.
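Dynamic pricing models vary widely, but as a toy illustration of the inputs involved (not SuperMart's actual model), a simple heuristic might discount perishables whenever stock exceeds what forecast demand can clear before expiry:

```python
def recommend_discount(stock_units: int, daily_demand_units: int,
                       shelf_life_days: int) -> float:
    """Toy heuristic: discount when stock exceeds what forecast demand
    can clear before expiry. Returns a fraction, e.g., 0.15 = 15% off."""
    clearable = daily_demand_units * shelf_life_days
    if stock_units <= clearable:
        return 0.0
    overstock_ratio = (stock_units - clearable) / stock_units
    return round(min(0.30, overstock_ratio), 2)  # cap discounts at 30%

# Avocados: 400 units in stock, ~68 units/day demand, 5-day shelf life
print(recommend_discount(400, 68, 5))  # -> 0.15, i.e., the 15% cut above
```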
Prescriptive analytics with LLMs
LLMs can help brainstorm, draft, and refine action plans based on the dynamic pricing model's output. An LLM cannot replace SuperMart’s dynamic pricing model; instead, SuperMart uses LLMs to provide recommendations in natural language.
For stock optimization, if a supplier suddenly announces a delay for a popular item, an LLM integrated with the predictive demand model and current inventory system could immediately suggest the following:
“Supplier X has a 2-day delay on 'Brand A' pasta. Recommendations:
- Temporarily increase promotion of 'Brand B' pasta (similar product) by 10%.
- Notify online shoppers of potential delay if they order Brand A.
- Generate a list of stores with critically low stock of Brand A to prioritize the next shipment.”
Making data AI-ready
SuperMart stores its wide range of data types in what it describes as a data lakehouse architecture. This approach enables structured, semi-structured, and unstructured data to be stored in raw formats and provides queryable tables to support the company’s business intelligence tools.

Data generally falls into three broad types, each requiring different handling and techniques. Here’s a summary of the three; read our article AI Readiness Framework for Enterprise AI Deployment to learn more about preparing data for LLMs.
Structured data
The main challenge when integrating LLMs with SuperMart’s existing data sources was providing sufficient context to the LLM. SuperMart had greater success by simplifying database schemas and providing the LLM with metadata describing the different columns and data types.
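As a minimal sketch of that idea, the snippet below builds a plain-text schema description from a SQLite database that can then be included in prompts. The database file is an assumption; in practice, column descriptions would also be pulled from a data dictionary.

```python
import sqlite3

def describe_schema(db_path: str) -> str:
    """Build a plain-text schema description to include in LLM prompts."""
    conn = sqlite3.connect(db_path)
    lines = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table'").fetchall()
    for (table,) in tables:
        # PRAGMA table_info returns (cid, name, type, notnull, default, pk)
        cols = conn.execute(f"PRAGMA table_info({table})").fetchall()
        col_desc = ", ".join(f"{c[1]} {c[2]}" for c in cols)
        lines.append(f"Table {table}({col_desc})")
    return "\n".join(lines)

print(describe_schema("supermart.db"))
# e.g. -> Table sales(sale_date DATE, department TEXT, revenue REAL)
```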
Semi-structured data
A hybrid approach combines traditional ML techniques with LLMs for semi-structured data like JSON, XML, and logs. This approach is necessary because semi-structured data typically mixes well-defined fields with free-form text: deterministic parsing and ML methods extract the predictable elements, while LLMs interpret and summarize the looser content.
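Here is a hedged sketch of that split on a hypothetical support-ticket event: the structured fields are parsed deterministically, and only the free-text message goes to the LLM. The event format, field names, and model choice are illustrative.

```python
import json
from openai import OpenAI

client = OpenAI()

# A hypothetical customer-support event: structured fields plus free text
event = json.loads("""{
  "ticket_id": 812,
  "store": "Riverside",
  "category": "delivery",
  "message": "Order arrived a day late and the frozen items had thawed."
}""")

# Deterministic extraction for the predictable fields...
record = {key: event[key] for key in ("ticket_id", "store", "category")}

# ...and an LLM pass only for the free-form text
summary = client.chat.completions.create(
    model="gpt-4o",  # illustrative model choice
    messages=[{"role": "user",
               "content": "Summarize this complaint in 5 words or fewer: "
                          f"{event['message']}"}],
)
record["summary"] = summary.choices[0].message.content
print(record)
```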
Unstructured data
Analysis of unstructured data like text (e.g., PDFs or chat transcripts), images, and audio starts with embedding models and vector databases that transform the raw content into semantically searchable representations. These embeddings enable semantic search, summarization, sentiment analysis, and Q&A tasks.
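A small sketch of the semantic-search piece, using the sentence-transformers library with plain cosine similarity standing in for a real vector database; the documents, query, and model choice are all illustrative.

```python
from sentence_transformers import SentenceTransformer, util

# Assumed corpus: snippets extracted from PDFs and chat transcripts
docs = [
    "Customer praised the new plant-based range at the Riverside store.",
    "Several shoppers complained about queues at self-checkout.",
    "Chat transcript: delivery driver could not find the loading bay.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice
doc_vectors = model.encode(docs, convert_to_tensor=True)

query = "What do customers say about vegan products?"
query_vector = model.encode(query, convert_to_tensor=True)

# Cosine similarity as a stand-in for a vector database lookup
scores = util.cos_sim(query_vector, doc_vectors)[0]
print(docs[int(scores.argmax())])
```

At scale, the document vectors would live in a vector database rather than in memory, but the retrieval idea is the same.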
Required features of AI-based tools in data workflows
The main advantage of AI tools in modern data workflows is their ability to handle a wide range of data types, including structured, semi-structured, and unstructured sources. Traditional BI tools have effectively analyzed structured data like spreadsheets or databases. Unfortunately, they require time-consuming preparation to handle formats such as PDFs, emails, images, or natural language text.
In contrast, AI tools can now directly interpret and analyze these diverse sources. Users can ask questions about a customer feedback dataset composed of open-ended survey responses or extract insights from unstructured documents without first transforming the data into a tabular format. This unlocks new possibilities for generating insights from previously inaccessible or underutilized data.
Enterprise-ready AI-based data analysis tools should have the following essential features.
Natural-language interface
A natural-language interface accepts natural language queries and returns intuitive, meaningful responses, letting users control data systems through everyday language. This feature makes data analysis accessible to everyone because it lets users retrieve database information, create reports, and find insights through simple language instead of programming code.
Insight-driven assistant functionality
Acting as a “thinking buddy,” the AI tool provides suggestions, surfaces relevant insights, and points out patterns and trends in the data. AI tools with assistant capabilities actively deliver insights in addition to answering queries. These systems help analysts by detecting anomalies, suggesting trends, and making recommendations.
In situ data processing
A good AI tool works directly within existing data environments rather than requiring data consolidation from multiple sources. With in situ data processing, analysis happens at the data's original location without moving it, which minimizes delays, improves security, and enables immediate analytics.
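As one concrete illustration of the idea (not any specific product's implementation), an engine like DuckDB can query a Parquet file where it sits, with no load or ETL step; the file name and columns below are assumptions.

```python
import duckdb

# Query the Parquet file in place; no ingestion pipeline required.
result = duckdb.sql("""
    SELECT department, SUM(revenue) AS weekly_revenue
    FROM 'warehouse/sales_2024.parquet'
    WHERE sale_date >= CURRENT_DATE - INTERVAL 7 DAY
    GROUP BY department
    ORDER BY weekly_revenue DESC
""").df()
print(result)
```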
Flexible context layer
A context layer is an ever-evolving semantic brain for your AI. It ingests and learns from enterprise-specific content: database schemas, data dictionaries, business glossaries, documentation, or anything else that describes what your data means and how your business talks about it. From this, it builds a rich map of business semantics. When a user poses a question, the context layer provides the AI with the necessary hints and facts to understand that question in the correct business context. A context layer overcomes the limitations of traditional semantic layers found in BI tools.
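To make the idea tangible, here is a toy context layer in Python: it matches business terms mentioned in a question against a glossary and prepends their definitions to the prompt. Real context layers learn these semantics from schemas, dictionaries, and documentation; the glossary entries here are invented for illustration.

```python
# Toy glossary; a real context layer would learn these from enterprise content.
GLOSSARY = {
    "bounce rate": "share of sessions that leave after viewing one page",
    "basket value": "total revenue of a single checkout transaction",
    "weekly specials": "promotional items discounted for one week",
}

def add_context(question: str) -> str:
    """Prepend definitions of any glossary terms found in the question."""
    hints = [f"{term}: {definition}"
             for term, definition in GLOSSARY.items()
             if term in question.lower()]
    context = "\n".join(hints) or "(no matching glossary terms)"
    return f"Business context:\n{context}\n\nQuestion: {question}"

print(add_context("Why did the bounce rate on Weekly Specials rise?"))
```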
Comparative overview
Implementing these features in AI tools speeds up data workflows and enables quick, informed decision-making. Organizations that adopt these capabilities stand to gain the most from AI-driven data analytics.
WisdomAI provides these essential features to enable AI-powered analytics across structured and unstructured data. The natural language layer translates plain English queries into SQL, and the context layer learns enterprise-specific terminology. WisdomAI proactively follows up with further insights and connects with your entire data ecosystem to unify analysis across teams and tools.

{{banner-small-3="/banners"}}
Last thoughts
Over time, AI will continue to become more commonplace in data analytics due to the benefits it offers in areas such as analysis speed, data validation, data democratization, and automation. The future of AI in data analytics looks exciting, with many new tools and applications being developed constantly. These include coding for data analysis, explaining findings, creating synthetic data, crafting dashboards, and automating data entry.
Applying these techniques and tools will help you stay relevant as a data professional. With AI, you can tackle data analysis tasks more efficiently and accurately. AI will play an essential role in data analysis by identifying patterns and extracting meaningful insights from extensive datasets.