LangChain Agent with Bright Data Provider

Introduction
The future of automation lies with agents, yet business automation still lacks an easy way to get the job done. Building capable platforms and tools is therefore crucial, and Bright Data, LangChain, and Google Gemini make a powerful combination. Bright Data handles web-scale data extraction, LangChain provides a framework for building applications and chains around language models, and Google Gemini provides strong summarization capabilities.
This blog post walks through a real-world use case that integrates these technologies into an intelligent agent that can run Google search queries via Bright Data, scrape Airbnb listings, and return summarized insights from the results. You will see how to use LangChain to organize the workflow and Google Gemini to summarize, so that the output is actionable, meaningful, and concise.
Who is this for?
This solution is ideal for:
• Data Engineers and Data Scientists: Who want to create an intelligent agent that can gather, process, and summarize data from diverse sources.
• Developers: Who want to integrate APIs and build sophisticated applications with LangChain and third-party platforms such as Bright Data and Google Gemini.
• Business Analysts and Product Managers: Who want to derive insights from two platforms (Google and Airbnb) and summarize the data for faster decision-making.
What problem is this workflow solving?
Getting actionable information quickly is always critical. Traditional data extraction methods are often slow, error-prone, and require significant manual effort.
This workflow solves the following problems:
- Web scraping complexity: Automating the extraction of data from websites like Google and Airbnb in a structured and scalable manner.
- Search optimization: Refining Google search results and presenting them in a meaningful way, specifically for business applications like competitive analysis or market research.
- Summarization: Aggregating data and providing concise summaries using advanced AI techniques to ensure that key insights are easily consumable.
What this workflow does
The core of this workflow is a LangChain agent that:
- Performs a Google search using the Bright Data SERP API.
- Scrapes Airbnb listings for a specific location using the Bright Data Web Unlocker.
- Summarizes the results using Google Gemini.
Detailed Breakdown of the Process
- Google Search via Bright Data SERP API: The workflow first sends a search query to Google using Bright Data’s SERP API. This API handles search engine restrictions for us and returns organic search results (titles, snippets, URLs) for a given query.
- Scraping Airbnb Listings: Given a location, the agent then scrapes Airbnb listings using the Bright Data Web Unlocker, which provides access to dynamic content such as property details (price, location, amenities).
- Summarization with Google Gemini: Once the data is retrieved, Google Gemini is used to summarize the results. The model condenses the large set of information into a few concise points, allowing the user to quickly understand the key insights without having to read through every detail.
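The two Bright Data requests described above might be assembled along the following lines. This is a minimal sketch, not the repository's implementation: the endpoint URL, the payload fields (`zone`, `url`, `format`), the zone names, the Airbnb URL pattern, and all function names are assumptions for illustration, so check them against your Bright Data dashboard before relying on them.

```python
from urllib.parse import quote, urlencode

# Assumed endpoint for Bright Data's direct API access; verify in your dashboard.
BRIGHTDATA_ENDPOINT = "https://api.brightdata.com/request"


def build_request(target_url: str, zone: str, token: str) -> dict:
    """Describe one POST request that fetches target_url through a zone."""
    return {
        "url": BRIGHTDATA_ENDPOINT,
        "headers": {
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        # Assumed payload shape: which zone to use, what to fetch, raw output.
        "json": {"zone": zone, "url": target_url, "format": "raw"},
    }


def google_search_url(query: str) -> str:
    """Build the Google results URL for a query."""
    return "https://www.google.com/search?" + urlencode({"q": query})


def airbnb_search_url(location: str) -> str:
    """Build an Airbnb search URL for a location (hypothetical URL pattern)."""
    return "https://www.airbnb.com/s/" + quote(location) + "/homes"
```

Because the returned dictionary uses the keys `url`, `headers`, and `json`, it can be passed directly to an HTTP client, for example `requests.post(**build_request(google_search_url("airbnb new york"), "my_serp_zone", token))`, where `my_serp_zone` is a placeholder zone name.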
Setup
To set up this workflow, follow the steps below:
1. Install Required Libraries
Before getting started, ensure that you have all necessary libraries installed. You can use the following requirements.txt file to manage dependencies:
2. Set Up API Keys
- Bright Data: You will need an API key for the Bright Data SERP API and Web Unlocker. Sign up for an account on Bright Data and retrieve your API keys from the dashboard.
- Google Gemini: You need an API key to access the Google Gemini model. Set up the necessary authentication and obtain an API key from the Google Cloud Console.
3. Environment Variables
To securely store your API keys, use a .env file. This keeps your credentials out of your codebase, provided the .env file itself is excluded from version control.
BRIGHTDATA_SERP_API_KEY=your_brightdata_api_key
BRIGHTDATA_BEARER_TOKEN=your_brightdata_bearer_token
GOOGLE_API_KEY=your_google_api_key
GOOGLE_GEMINI_MODEL_NAME=your_google_gemini_model_name
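It can help to check these variables once at startup rather than fail later in the middle of an agent run. The sketch below uses only the standard library and the four variable names listed above; the `load_settings` function and `REQUIRED_KEYS` tuple are hypothetical helpers, not part of the original project.

```python
import os

# The four variables from the .env file above.
REQUIRED_KEYS = (
    "BRIGHTDATA_SERP_API_KEY",
    "BRIGHTDATA_BEARER_TOKEN",
    "GOOGLE_API_KEY",
    "GOOGLE_GEMINI_MODEL_NAME",
)


def load_settings(env=None) -> dict:
    """Return the required settings, or raise if any is missing or empty."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {key: env[key] for key in REQUIRED_KEYS}
```

In the agent script, you would call `load_dotenv()` first so that the values from the .env file are present in `os.environ`, then call `load_settings()` with no arguments.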
4. Write the LangChain Agent
Now that all dependencies are in place, let's write the LangChain agent. This agent will interact with Bright Data, scrape the necessary information, and pass it through Google Gemini for summarization.
Code for the LangChain Agent:
Source code: LangChain-BrightData-Agent
Here is the core agent implementation, which uses the Bright Data tools (Google Search and Airbnb) and Google Gemini.
from dotenv import load_dotenv
from langchain.agents import initialize_agent, Tool

from tools.google_search import GoogleSearchTool
from tools.airbnb import AirbnbTool
from gemini_summary import summarize_with_gemini
from llm import GeminiLLM

# Load the API keys from the .env file into the environment.
load_dotenv()

# Project-specific LLM wrapper around Google Gemini.
llm = GeminiLLM()

# Each tool wraps a callable; the agent picks a tool based on its description.
tools = [
    Tool.from_function(func=GoogleSearchTool(), name="Google Search", description="Search Google for answers"),
    Tool.from_function(func=AirbnbTool(), name="Airbnb Search", description="Search Airbnb for listings"),
]

# Note: initialize_agent is a legacy API in newer LangChain releases.
agent = initialize_agent(tools, llm, agent="zero-shot-react-description", verbose=True)

if __name__ == "__main__":
    query = "Find Airbnb listings in New York and summarize Google reviews about staying there."
    try:
        # Run the agent, then condense its raw answer with Gemini.
        result = agent.run(query)
        summary = summarize_with_gemini(result)
        print("\n===== RAW RESULT =====")
        print(result)
        print("\n===== SUMMARY =====")
        print(summary)
    except Exception as e:
        print(f"[Agent Error] {e}")
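The tool modules imported above are not shown in this post. As a rough illustration of the shape such a tool could take, the sketch below defines a callable class that accepts a query string and returns text, which is what `Tool.from_function` expects of its `func` argument. The class name `CallableSearchTool`, the injected `fetch` function, and the `max_chars` limit are all assumptions for this example and do not reflect the repository's actual `GoogleSearchTool`.

```python
from typing import Callable


class CallableSearchTool:
    """Hypothetical callable tool: takes a query string, returns text."""

    def __init__(self, fetch: Callable[[str], str], max_chars: int = 4000):
        self._fetch = fetch          # function that performs the actual request
        self._max_chars = max_chars  # cap on the text handed back to the agent

    def __call__(self, query: str) -> str:
        query = query.strip()
        if not query:
            return "No query provided."
        try:
            text = self._fetch(query)
        except Exception as exc:
            # Report failures as text so the agent can react instead of crashing.
            return f"Search failed: {exc}"
        return text[: self._max_chars]
```

Injecting `fetch` keeps the network call separate from the tool logic, so the tool can be exercised with a stub function before any Bright Data credentials are configured.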
How to Customize this Workflow to Your Needs
- Modify Search Queries: You can change the search query by modifying the query variable. This allows the agent to search for different topics, products, or services.
- Summarization Settings: If you want a different style or level of detail for the summaries, edit the summarization prompt template in gemini_summary.py.
- Add More Tools: LangChain allows for easily adding new tools. You can integrate other web scraping tools, APIs, or services to extend this agent's capabilities.
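As an example of the summarization setting mentioned above, a prompt template with adjustable style and length could look like the sketch below. The contents of gemini_summary.py are not shown in this post, so the function name `build_summary_prompt`, its parameters, and the prompt wording are hypothetical.

```python
def build_summary_prompt(text: str, style: str = "bullet points", max_points: int = 5) -> str:
    """Assemble a summarization prompt with an adjustable style and length."""
    if max_points < 1:
        raise ValueError("max_points must be at least 1")
    return (
        f"Summarize the following content as {style}, "
        f"using at most {max_points} key points.\n\n"
        f"Content:\n{text.strip()}"
    )
```

Changing `style` to something like "a short paragraph" or lowering `max_points` adjusts the summary without touching the rest of the agent.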
Source Code
The full source code is available here: LangChain-BrightData-Agent
Conclusion
By integrating Bright Data, LangChain, and Google Gemini, you can build an intelligent agent that efficiently scrapes data, processes it, and gives insightful answers in the form of summaries. This process can be further tailored to adapt to various applications, including competitive research, market analysis, or vacation planning.
Content Credits - This blog post was formatted with ChatGPT to make it more professional and polished for its target audience.