AI Tools for Data Analysis: Data Workflows with LLMs
Traditional data analysis is often a slow, manual process: analysts clean data, build models, and generate reports through time-consuming workflows. This limits how quickly organizations can act on their data, especially when dealing with large or complex datasets.
Enter generative AI (GenAI), a new class of tools powered by large language models (LLMs) that dramatically accelerate and simplify data analysis. These tools can interpret structured, semi-structured, and unstructured data, automate repetitive tasks, and generate insights that might otherwise go unnoticed.
This article explores the key features that make LLMs effective for data analysis, offers best-practice recommendations, and highlights real-world examples of how these tools reshape modern data workflows.
Summary of key concepts related to AI tools for data analysis
Data analysis types
Data analysis progresses from understanding what happened in the past to determining the best action to take in the future. The four essential types of data analysis each serve a different purpose and rely on specialized tools and techniques.
In the rest of this article, we’ll examine how this evolution applies to a fictitious supermarket, “SuperMart,” and how it might use various types of data analysis to derive business value.

Descriptive analytics (“what happened?”)
Descriptive analytics helps companies understand past and current data. SuperMart uses business intelligence tools, like Power BI or Tableau, to report on total daily, weekly, and monthly sales revenue. The data analysts have created separate dashboards to enable management to drill down into specific categories like dairy, bakery, produce, and beverages.
The marketing department closely monitors the company’s website traffic, particularly for the Weekly Specials page, tracking key metrics like:
- Page views
- Visitor counts
- Bounce rates
These metrics highlight the effectiveness of online promotions and help refine content strategy for better user engagement.

SuperMart's current implementation has several limitations:
- Performing descriptive analytics involves reviewing 25 different dashboards, and it is not always clear to the teams whether each dashboard is current or where its data is sourced from.
- Updating dashboards to show inventory levels for a new product takes too long to implement. The consumers of the data are not SQL query experts, so they have to ask a data analyst to update the dashboard to integrate new data.
- Combining unstructured data with structured data in dashboards is challenging when performing diagnostic analysis.
- Onboarding new team members is time-consuming. They spend too much time finding the right dashboard rather than providing recommendations or acting on insights.
Descriptive analytics with LLMs
The SuperMart CTO recognizes these issues and sees that AI can help with descriptive analytics. LLMs are capable of generating accurate SQL queries from natural language questions, so instead of navigating multiple dashboards, SuperMart employees can ask questions in natural language. For example, the produce department manager can make the following request:
“Give me a summary of last week's sales performance, highlighting key trends and underperforming categories.”
The LLM-based system recognizes that the user is asking about sales, so it generates a SQL statement that queries the sales table and limits records to the last week. The system then produces the relevant chart and a textual summary like this one:
“Last week, SuperMart achieved total sales of $150,000. The beverage department was a strong performer, contributing $30,000, a 15% increase week over week, likely driven by the recent heatwave. However, the bakery department saw a 10% decline, which warrants further investigation."
The bar chart below shows each department’s weekly sales with the week-over-week percentage change annotated above each bar.
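As a minimal sketch of the text-to-SQL step under the hood, the snippet below uses the OpenAI Python SDK to translate a question into SQL and run it against a local SQLite database. The model name, schema, and database file are assumptions for illustration, not SuperMart's actual stack.

```python
import sqlite3
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical schema metadata passed to the model as context
SCHEMA = "Table sales(sale_date DATE, department TEXT, revenue REAL)"

def question_to_sql(question: str) -> str:
    """Ask the LLM to translate a natural-language question into SQL."""
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model choice
        messages=[
            {"role": "system",
             "content": "Translate the user's question into a single SQLite "
                        f"query. Schema: {SCHEMA}. Reply with SQL only."},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content.strip()

sql = question_to_sql("What were last week's sales by department?")
rows = sqlite3.connect("supermart.db").execute(sql).fetchall()
print(rows)
```

In production, generated queries would be validated and executed with read-only permissions before the results are summarized back into natural language.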

{{banner-large-3="/banners"}}
Diagnostic analytics (“why did it happen?”)
After identifying what occurred, diagnostic analytics investigates the reasons behind it. For example, suppose that last week, a SuperMart data analyst observed that ice cream sales dropped by 20% on the weekly sales dashboard. From investigating previous sales drops, the analyst thinks the drop could be caused by any of the following:
- Competitor promotions
- Local events
- Weather conditions
- Poor stock control
The SuperMart head of sales recognizes that investigating sales drops takes her team a lot of time, as they have to investigate each possible cause to compile a report.
Diagnostic analytics with LLMs
Upon noticing the drop, an LLM could analyze various data streams and suggest potential reasons for the 20% decline in ice cream sales (a sketch of how this might be wired up follows the list):
- “Local weather reports show a significant temperature decrease (average 5°C lower than the previous week).”
- “Competitor 'ValueMart' launched a 'BOGO free' on frozen desserts (source: competitor promotions database).”
- “Social media sentiment around ice cream shows a slight dip locally.”
- “No stockouts were reported for popular ice cream brands.”
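Here is a minimal sketch of how such a diagnostic pass might be assembled, reusing the signals listed above. The data fetchers are placeholders standing in for SuperMart's internal systems, and the model name is illustrative.

```python
from openai import OpenAI

client = OpenAI()

# Placeholder fetchers; in practice these would query internal systems.
def fetch_weather_summary() -> str:
    return "Average temperature 5°C lower than the previous week."

def fetch_competitor_promos() -> str:
    return "ValueMart launched a BOGO free promotion on frozen desserts."

def fetch_stock_events() -> str:
    return "No stockouts reported for popular ice cream brands."

signals = {
    "weather": fetch_weather_summary(),
    "competitor promotions": fetch_competitor_promos(),
    "stock": fetch_stock_events(),
}

prompt = (
    "Ice cream sales dropped 20% week over week. "
    "Given these signals, list the most likely causes, ranked:\n"
    + "\n".join(f"- {name}: {value}" for name, value in signals.items())
)

answer = client.chat.completions.create(
    model="gpt-4o",  # illustrative model choice
    messages=[{"role": "user", "content": prompt}],
)
print(answer.choices[0].message.content)
```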
Predictive analytics (“what is likely to happen?”)
Predictive analytics applies probabilistic models to historical and real-time data to forecast future outcomes. At SuperMart, the buying team uses demand forecasting to predict how much of a product to order in the coming days, weeks, or months. For example, the team may combine past sales data and upcoming weather forecasts to predict a 30% increase in demand for ice cream. To develop these forecasts, the SuperMart team has been using traditional machine learning tools (like scikit-learn or TensorFlow) and forecasting tools (such as ARIMA or Prophet).
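As a rough illustration of the traditional side of this workflow, the snippet below produces a short-horizon demand forecast with Prophet. The CSV file is an assumption; Prophet expects a date column named ds and a value column named y.

```python
import pandas as pd
from prophet import Prophet

# Assumed file: daily ice cream sales with columns "ds" (date) and "y" (units)
df = pd.read_csv("ice_cream_sales.csv")

model = Prophet(weekly_seasonality=True)
model.fit(df)

# Forecast the next 14 days of demand, with uncertainty bounds
future = model.make_future_dataframe(periods=14)
forecast = model.predict(future)
print(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail(14))
```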
Predictive analytics with LLMs
Some of the outputs from the existing tools are complex for non-technical users to understand.
For instance, a traditional predictive model might simply present a churn probability of 75% for a specific customer, along with a list of raw contributing features like:
- shopping_frequency_change: -3
- avg_basket_value_change: -55
- recent_feedback_sentiment: negative
The team integrated an LLM to improve these outputs. Now, instead of just seeing the raw data, users get a plain-language explanation of the reasoning behind the prediction:
“Customer John Doe has a 75% probability of churning. This is based on his reduced shopping frequency (from once a week to once a month), a significant decrease in basket value (average $80 down to $25), and his recent negative feedback regarding product availability.”
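One common pattern for producing such explanations is to hand the model's raw output to an LLM with a short instruction. A minimal sketch, reusing the illustrative values above:

```python
from openai import OpenAI

client = OpenAI()

# Raw churn-model output (illustrative values from the example above)
churn_output = {
    "customer": "John Doe",
    "churn_probability": 0.75,
    "shopping_frequency_change": -3,
    "avg_basket_value_change": -55,
    "recent_feedback_sentiment": "negative",
}

explanation = client.chat.completions.create(
    model="gpt-4o",  # illustrative model choice
    messages=[{
        "role": "user",
        "content": "Explain this churn prediction to a non-technical "
                   f"marketer in two sentences: {churn_output}",
    }],
)
print(explanation.choices[0].message.content)
```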
The marketing team at SuperMart reviews food-related social media to identify trends. LLMs have been valuable for analyzing online food blogs, social media posts, and competitor product descriptions. Every week, the team receives a summary from the LLM, such as “rising interest in plant-based diets.” The outputs from the LLM are used as inputs to the forecasting tools.
Prescriptive analytics (“what should we do?”)
The prescriptive analytics stage provides specific recommendations that help organizations maximize their outcomes. SuperMart has introduced a dynamic pricing model that recommends optimal prices based on demand, competitor pricing, stock levels, and desired profit margins. For example, last week, the dynamic pricing model recommended a 15% price reduction on avocados for the next 48 hours because stock levels were too high.
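Dynamic pricing models vary widely, but as a toy illustration of the inputs involved (not SuperMart's actual model), a simple heuristic might discount perishables whenever stock exceeds what forecast demand can clear before expiry:

```python
def recommend_discount(stock_units: int, daily_demand_units: int,
                       shelf_life_days: int) -> float:
    """Toy heuristic: discount when stock exceeds what forecast demand
    can clear before expiry. Returns a fraction, e.g., 0.15 = 15% off."""
    clearable = daily_demand_units * shelf_life_days
    if stock_units <= clearable:
        return 0.0
    overstock_ratio = (stock_units - clearable) / stock_units
    return round(min(0.30, overstock_ratio), 2)  # cap discounts at 30%

# Avocados: 400 units in stock, ~68 units/day demand, 5-day shelf life
print(recommend_discount(400, 68, 5))  # -> 0.15, i.e., the 15% cut above
```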
Prescriptive analytics with LLMs
LLMs can help brainstorm, draft, and refine action plans based on the dynamic pricing model's output. An LLM cannot replace SuperMart’s dynamic pricing model; instead, SuperMart uses LLMs to provide recommendations in natural language.
For stock optimization, if a supplier suddenly announces a delay for a popular item, an LLM integrated with the predictive demand model and current inventory system could immediately suggest the following:
“Supplier X has a 2-day delay on 'Brand A' pasta. Recommendations:
- Temporarily increase promotion of 'Brand B' pasta (similar product) by 10%.
- Notify online shoppers of potential delay if they order Brand A.
- Generate a list of stores with critically low stock of Brand A to prioritize the next shipment.”
Making data AI-ready
SuperMart stores its wide range of data types in what it describes as a data lakehouse architecture. This approach enables structured, semi-structured, and unstructured data to be stored in raw formats and provides queryable tables to support the company’s business intelligence tools.

Data generally falls into three broad types, each requiring different handling and techniques. Here’s a summary of the three; read our article AI Readiness Framework for Enterprise AI Deployment to learn more about preparing data for LLMs.
Structured data
The main challenge when integrating LLMs with SuperMart’s existing data sources was providing sufficient context to the LLM. SuperMart had greater success by simplifying database schemas and providing the LLM with metadata describing the different columns and data types.
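As a minimal sketch of that idea, the snippet below builds a plain-text schema description from a SQLite database that can then be included in prompts. The database file is an assumption; in practice, column descriptions would also be pulled from a data dictionary.

```python
import sqlite3

def describe_schema(db_path: str) -> str:
    """Build a plain-text schema description to include in LLM prompts."""
    conn = sqlite3.connect(db_path)
    lines = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table'").fetchall()
    for (table,) in tables:
        # PRAGMA table_info returns (cid, name, type, notnull, default, pk)
        cols = conn.execute(f"PRAGMA table_info({table})").fetchall()
        col_desc = ", ".join(f"{c[1]} {c[2]}" for c in cols)
        lines.append(f"Table {table}({col_desc})")
    return "\n".join(lines)

print(describe_schema("supermart.db"))
# e.g. -> Table sales(sale_date DATE, department TEXT, revenue REAL)
```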
Semi-structured data
A hybrid approach combines traditional ML techniques with LLMs for semi-structured data like JSON, XML, and logs. This approach is necessary because semi-structured data typically mixes well-defined fields with free-form text: deterministic parsing and ML methods extract the predictable elements, while LLMs interpret and summarize the looser content.
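Here is a hedged sketch of that split on a hypothetical support-ticket event: the structured fields are parsed deterministically, and only the free-text message goes to the LLM. The event format, field names, and model choice are illustrative.

```python
import json
from openai import OpenAI

client = OpenAI()

# A hypothetical customer-support event: structured fields plus free text
event = json.loads("""{
  "ticket_id": 812,
  "store": "Riverside",
  "category": "delivery",
  "message": "Order arrived a day late and the frozen items had thawed."
}""")

# Deterministic extraction for the predictable fields...
record = {key: event[key] for key in ("ticket_id", "store", "category")}

# ...and an LLM pass only for the free-form text
summary = client.chat.completions.create(
    model="gpt-4o",  # illustrative model choice
    messages=[{"role": "user",
               "content": "Summarize this complaint in 5 words or fewer: "
                          f"{event['message']}"}],
)
record["summary"] = summary.choices[0].message.content
print(record)
```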
Unstructured data
Analysis of unstructured data like text (e.g., PDFs or chat transcripts), images, and audio starts with embedding models and vector databases that transform the raw content into semantically searchable representations. These embeddings enable semantic search, summarization, sentiment analysis, and Q&A tasks.
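A small sketch of the semantic-search piece, using the sentence-transformers library with plain cosine similarity standing in for a real vector database; the documents, query, and model choice are all illustrative.

```python
from sentence_transformers import SentenceTransformer, util

# Assumed corpus: snippets extracted from PDFs and chat transcripts
docs = [
    "Customer praised the new plant-based range at the Riverside store.",
    "Several shoppers complained about queues at self-checkout.",
    "Chat transcript: delivery driver could not find the loading bay.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice
doc_vectors = model.encode(docs, convert_to_tensor=True)

query = "What do customers say about vegan products?"
query_vector = model.encode(query, convert_to_tensor=True)

# Cosine similarity as a stand-in for a vector database lookup
scores = util.cos_sim(query_vector, doc_vectors)[0]
print(docs[int(scores.argmax())])
```

At scale, the document vectors would live in a vector database rather than in memory, but the retrieval idea is the same.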
Required features of AI-based tools in data workflows
The main advantage of AI tools in modern data workflows is their ability to handle a wide range of data types, including structured, semi-structured, and unstructured sources. Traditional BI tools have effectively analyzed structured data like spreadsheets or databases. Unfortunately, they require time-consuming preparation to handle formats such as PDFs, emails, images, or natural language text.
In contrast, AI tools can now directly interpret and analyze these diverse sources. Users can ask questions about a customer feedback dataset composed of open-ended survey responses or extract insights from unstructured documents without first transforming the data into a tabular format. This unlocks new possibilities for generating insights from previously inaccessible or underutilized data.
Enterprise-ready AI-based data analysis tools should have the following essential features.
Natural-language interface
A natural-language interface accepts natural language queries and returns intuitive, meaningful responses, letting users control data systems through everyday language. This feature makes data analysis accessible to everyone because it lets users retrieve database information, create reports, and find insights through simple language instead of programming code.
Insight-driven assistant functionality
Acting as a “thinking buddy,” the AI tool provides suggestions, surfaces relevant insights, and points out patterns and trends in the data. AI tools with assistant capabilities actively deliver insights in addition to answering queries. These systems help analysts by detecting anomalies, suggesting trends, and making recommendations.
In situ data processing
A good AI tool works directly within existing data environments rather than requiring data consolidation from multiple sources. With in situ data processing, analysis happens at the data's original location without moving it, which minimizes delays, improves security, and enables immediate analytics.
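As one concrete illustration of the idea (not any specific product's implementation), an engine like DuckDB can query a Parquet file where it sits, with no load or ETL step; the file name and columns below are assumptions.

```python
import duckdb

# Query the Parquet file in place; no ingestion pipeline required.
result = duckdb.sql("""
    SELECT department, SUM(revenue) AS weekly_revenue
    FROM 'warehouse/sales_2024.parquet'
    WHERE sale_date >= CURRENT_DATE - INTERVAL 7 DAY
    GROUP BY department
    ORDER BY weekly_revenue DESC
""").df()
print(result)
```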
Flexible context layer
A context layer is an ever-evolving semantic brain for your AI. It ingests and learns from enterprise-specific content: database schemas, data dictionaries, business glossaries, documentation, or anything else that describes what your data means and how your business talks about it. From this, it builds a rich map of business semantics. When a user poses a question, the context layer provides the AI with the necessary hints and facts to understand that question in the correct business context. A context layer overcomes the limitations of traditional semantic layers found in BI tools.
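To make the idea tangible, here is a toy context layer in Python: it matches business terms mentioned in a question against a glossary and prepends their definitions to the prompt. Real context layers learn these semantics from schemas, dictionaries, and documentation; the glossary entries here are invented for illustration.

```python
# Toy glossary; a real context layer would learn these from enterprise content.
GLOSSARY = {
    "bounce rate": "share of sessions that leave after viewing one page",
    "basket value": "total revenue of a single checkout transaction",
    "weekly specials": "promotional items discounted for one week",
}

def add_context(question: str) -> str:
    """Prepend definitions of any glossary terms found in the question."""
    hints = [f"{term}: {definition}"
             for term, definition in GLOSSARY.items()
             if term in question.lower()]
    context = "\n".join(hints) or "(no matching glossary terms)"
    return f"Business context:\n{context}\n\nQuestion: {question}"

print(add_context("Why did the bounce rate on Weekly Specials rise?"))
```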
Comparative overview
Implementing these features in AI tools speeds up data workflows and enables quick, informed decision-making. Organizations that adopt these capabilities stand to gain the most from AI-driven data analytics.
WisdomAI provides these essential features to enable AI-powered analytics across structured and unstructured data. The natural language layer translates plain English queries into SQL, and the context layer learns enterprise-specific terminology. WisdomAI proactively follows up with further insights and connects with your entire data ecosystem to unify analysis across teams and tools.

{{banner-small-3="/banners"}}
Last thoughts
Over time, AI will continue to become more commonplace in data analytics due to the benefits it offers in areas such as analysis speed, data validation, data democratization, and automation. The future of AI in data analytics looks exciting, with many new tools and applications being developed constantly. These include coding for data analysis, explaining findings, creating synthetic data, crafting dashboards, and automating data entry.
Applying these techniques and tools will help you stay relevant as a data professional. With AI, you can tackle data analysis tasks more efficiently and accurately. AI will play an essential role in data analysis by identifying patterns and extracting meaningful insights from extensive datasets.