Google’s Data Science Agent: Can It Really Do Your Job?
I tested Google’s Data Science Agent in Colab. Here’s what it got right (and where it failed).

On March 3rd, Google officially rolled out its Data Science Agent to most Colab users for free. The agent itself is not brand new: it was first announced in December 2024, but it is now integrated directly into Colab and made widely accessible.
Google bills it as “the future of data analysis with Gemini”, stating: “Simply describe your analysis goals in plain language, and watch your notebook take shape automatically, helping accelerate your ability to conduct research and data analysis.” But is it a real game-changer for data science? What can it actually do, and what can’t it do? Is it ready to replace data analysts and data scientists? And what does it tell us about the future of data science careers?
In this article, I will explore these questions with real-world examples.
What It Can Do
The Data Science Agent is straightforward to use:
- Open a new notebook in Google Colab — you just need a Google Account and can use Google Colab for free;
- Click “Analyze files with Gemini” — this will open the Gemini chat window on the right;
- Upload your data file and describe your goal in the chat. The agent will generate a series of tasks accordingly;
- Click “Execute Plan”, and Gemini will start to write the Jupyter Notebook automatically.

Data Science Agent UI (image by author)
Let’s look at a real example. Here, I used the dataset from the Regression with an Insurance Dataset Kaggle Playground Prediction Competition (Apache 2.0 license). This dataset has 20 features, and the goal is to predict the insurance premium amount. It includes both continuous and categorical variables, along with realistic issues such as missing values and outliers, making it a good example dataset for machine learning practice.
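Before handing the file to the agent, it is worth a quick look at the data yourself. Here is a minimal sketch, assuming the competition’s train.csv downloaded from Kaggle (the file name and columns may differ in your setup):

```python
import pandas as pd

# Assumed file name; I kept only the first 50k rows for a quick test
df = pd.read_csv("train.csv", nrows=50_000)

print(df.shape)                          # rows x columns
print(df.dtypes.value_counts())          # mix of numeric and object columns
print(df.isna().mean().sort_values(ascending=False).head())  # missing-value rates
```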

Jupyter Notebook generated by the Data Science Agent (image by author)
After running my experiment, here are the highlights I observed in the Data Science Agent’s performance:
- Customizable execution plan: Based on my prompt of “Can you help me analyze how the factors impact insurance premium amount?”, the Data Science Agent first came up with a series of 10 tasks: data loading, data exploration, data cleaning, data wrangling, feature engineering, data splitting, model training, model optimization, model evaluation, and data visualization. This is a standard and reasonable process for conducting exploratory data analysis and building a machine learning model (a rough code sketch of what this plan maps to follows the screenshots below). It then asked for my confirmation and feedback before executing the plan. I asked it to focus on exploratory data analysis first, and it was able to adjust the execution plan accordingly. This provides the flexibility to customize the plan based on your needs.

Initial tasks the agent generated (image by author)

Plan adjustment based on feedback (image by author)
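To make the plan concrete, here is a minimal sketch of the kind of pipeline those ten tasks map to. This is my own illustration rather than the agent’s generated code; the feature names and the RandomForestRegressor choice are assumptions:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

df = pd.read_csv("train.csv", nrows=50_000)  # data loading

# Hypothetical feature names, for illustration only
num_cols = ["Age", "Annual Income"]
cat_cols = ["Gender", "Smoking Status"]

# Cleaning / wrangling / feature engineering, condensed into transformers
preprocess = ColumnTransformer([
    ("num", SimpleImputer(strategy="mean"), num_cols),
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]), cat_cols),
])

# Data splitting
X_train, X_test, y_train, y_test = train_test_split(
    df[num_cols + cat_cols], df["Premium Amount"], test_size=0.2, random_state=42
)

# Model training and evaluation
model = Pipeline([("prep", preprocess),
                  ("rf", RandomForestRegressor(random_state=42))])
model.fit(X_train, y_train)
rmse = mean_squared_error(y_test, model.predict(X_test)) ** 0.5
print(f"RMSE: {rmse:,.2f}")
```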
- End-to-end execution and autocorrection: After confirming the plan, the Data Science Agent executed it end-to-end autonomously. Whenever it encountered an error while running Python code, it diagnosed what was wrong and attempted to correct the error by itself. For example, at the model training step, it first ran into a `DTypePromotionError` because a datetime column was included in the training data. It decided to drop that column in the next try, but then got `ValueError: Input X contains NaN`. In its third attempt, it added a `SimpleImputer` to impute all missing values with the mean of each column and eventually got the step to work (a reconstructed sketch of that fix follows the screenshot below).

The agent ran into an error and auto-corrected it (image by author)
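For reference, here is roughly the fix the agent converged on after its three attempts. This is my reconstruction, not its exact code, and the datetime column name is an assumption:

```python
import pandas as pd
from sklearn.impute import SimpleImputer

# "Policy Start Date" is an assumed name for the datetime column
df = pd.read_csv("train.csv", parse_dates=["Policy Start Date"])

# Attempt 2: drop the datetime column that triggered DTypePromotionError
X = df.drop(columns=df.select_dtypes(include="datetime").columns)

# Attempt 3: impute NaNs in numeric columns with the column mean to clear
# "ValueError: Input X contains NaN" before model fitting
num_cols = X.select_dtypes(include="number").columns
X[num_cols] = SimpleImputer(strategy="mean").fit_transform(X[num_cols])
```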
- Interactive and iterative notebook: Since the Data Science Agent is built into Google Colab, it populates a Jupyter Notebook as it executes. This comes with several advantages:
- Real-time visibility: Firstly, you can actually watch the Python code running in real time, including the error messages and warnings. The dataset I provided was a bit large (even though I kept only the first 50,000 rows for the sake of a quick test), and it took about 20 minutes to finish the model optimization step in the Jupyter Notebook. The notebook kept running without timing out, and I received a notification once it finished.
- Editable code: Secondly, you can edit the code on top of what the agent has built for you. This is a clear advantage over the official Data Analyst GPT in ChatGPT, which also runs code and shows the results but makes you copy and paste the code elsewhere for manual iteration.
- Seamless collaboration: Lastly, having a Jupyter Notebook makes it very easy to share your work with others — now you can collaborate with both AI and your teammates in the same environment. The agent also drafted step-by-step explanations and key findings, making it much more presentation-friendly.

Summary section generated by the Agent (image by author)
What It Cannot Do
We’ve talked about its advantages; now let’s discuss the missing pieces that keep the Data Science Agent from being a truly autonomous data scientist.
- It does not modify the notebook based on follow-up prompts. I mentioned that the Jupyter Notebook environment makes it easy to iterate. In this example, after its initial execution, I noticed the feature importance charts did not have feature labels, so I asked the agent to add them. I assumed it would update the Python code directly, or at least add a new cell with the refined code. However, it merely provided the revised code in the chat window, leaving the actual notebook update to me (a sketch of that kind of manual fix is shown below). Similarly, when I asked it to add a new section with recommendations for lowering insurance premium costs, it added a markdown response with its recommendations in the chat window rather than in the notebook.
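For illustration, this is the kind of labeled feature-importance plot I ended up adding to the notebook myself. The sketch assumes a RandomForestRegressor `rf` already fitted on a training frame `X_train`:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Assumes `rf` is a fitted RandomForestRegressor and `X_train` its training frame
importances = pd.Series(rf.feature_importances_, index=X_train.columns)
importances.sort_values().plot.barh()  # feature names now label each bar
plt.xlabel("Feature importance")
plt.tight_layout()
plt.show()
```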