AI Agents Processing Time Series and Large Dataframes

Build from Scratch using only Python & Ollama (no GPU, no API key)

Intro

Agents are AI systems, powered by LLMs, that can reason about their objectives and take actions to achieve a final goal. They are designed not just to respond to queries, but to orchestrate a sequence of operations, including processing data (e.g. dataframes and time series). This ability unlocks numerous real-world applications that democratize access to data analysis, such as automated reporting, no-code queries, and support with data cleaning and manipulation.

Agents can interact with dataframes in two different ways: 

  • with natural language: the LLM reads the table as a string and tries to make sense of it based on its knowledge base;
  • by generating and executing code: the Agent activates tools to process the dataset as an object.

So, by combining the power of NLP with the precision of code execution, AI Agents enable a broader range of users to interact with complex datasets and derive insights.
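To make the second approach concrete, here is a minimal sketch of what a code-executing tool could look like (just an illustration, not the setup used later in this tutorial; the function name and the toy dataframe are placeholders): the LLM produces a pandas expression as text, and the tool evaluates it against the real dataframe.

import pandas as pd

def run_pandas_code(df: pd.DataFrame, expression: str) -> str:
    ## hypothetical tool: evaluate a pandas expression written by the LLM
    ## against the real dataframe and return the result as plain text
    ## (a real Agent should sandbox this instead of calling eval directly)
    return str(eval(expression, {"df": df, "pd": pd}))

## quick check with a toy dataframe
df = pd.DataFrame({"y": [3, 7, 2, 9]})
print(run_pandas_code(df, "df['y'].mean()"))  #--> 5.25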

In this tutorial, I’m going to show how to process dataframes and time series with AI Agents. I will present some useful Python code that can be easily applied in other similar cases (just copy, paste, run) and walk through every line of code with comments so that you can replicate this example (link to full code at the end of the article).

Setup

Let’s start by setting up Ollama (pip install ollama==0.4.7), a library that lets you run open-source LLMs locally, without needing cloud-based services. Since everything runs on your machine, conversation data never leaves it, giving you more control over data privacy and performance.

First of all, you need to download Ollama from the website. 

Then, in a terminal on your laptop, use the following command to download the selected LLM. I’m going with Alibaba’s Qwen, as it’s both smart and light.
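ollama pull qwen2.5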

After the download is completed, you can move on to Python and start writing code.

import ollama
llm = "qwen2.5"

Let’s test the LLM:

stream = ollama.generate(model=llm, prompt='''what time is it?''', stream=True)
for chunk in stream:
    print(chunk['response'], end='', flush=True)

Time Series

A time series is a sequence of data points measured over time, often used for analysis and forecasting. It allows us to see how variables change over time, and it’s used to identify trends and seasonal patterns.

I’m going to generate a fake time series dataset to use as an example.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

## create data
np.random.seed(1) #<--for reproducibility
length = 30
ts = pd.DataFrame(data=np.random.randint(low=0, high=15, size=length),
                  columns=['y'],
                  index=pd.date_range(start='2023-01-01', freq='MS', periods=length).strftime('%Y-%m'))

## plot
ts.plot(kind="bar", figsize=(10,3), legend=False, color="black").grid(axis='y')

Usually, time series datasets have a really simple structure with the main variable as a column and the time as the index.

Before transforming it into a string, I want to make sure that everything is placed under a column, so that we don’t lose any information.

dtf = ts.reset_index().rename(columns={"index":"date"})
dtf.head()

Then, I’ll convert the dataframe into a list of dictionaries, one per row.

data = dtf.to_dict(orient='records')
data[0:5]

Finally, from the list of dictionaries to a single string.

str_data = "\n".join([str(row) for row in data])
str_data

Now that we have a string, it can be included in a prompt that any language model is able to process. When you paste a dataset into a prompt, the LLM reads the data as plain text, but can still understand the structure and meaning based on patterns seen during training.

prompt = f'''
Analyze this dataset, it contains monthly sales data of an online retail product:
{str_data}
'''

We can easily start a chat with the LLM. Please note that, right now, this is not an Agent, as it doesn’t have any Tools; we’re just using the language model. While it doesn’t process numbers like a computer, the LLM can recognize column names, time-based patterns, trends, and outliers, especially with smaller datasets. It can simulate analysis and explain findings, but it won’t perform precise calculations on its own, as it’s not executing code like an Agent would.

messages = [{"role":"system", "content":prompt}]

while True:
    ## User
    q = input('> ')
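    ## what follows is a minimal sketch of an assumed continuation of the loop:
    ## append the question to the history, stream the model's reply with
    ## ollama.chat(), and store the answer so the conversation keeps its memory
    if q == "quit":
        break
    messages.append({"role":"user", "content":q})

    ## Model
    answer = ""
    for chunk in ollama.chat(model=llm, messages=messages, stream=True):
        answer += chunk["message"]["content"]
        print(chunk["message"]["content"], end="", flush=True)
    print()
    messages.append({"role":"assistant", "content":answer})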