Build an AI agent for trip planning with Gemini 1.5 Pro: A step-by-step guide

Gemini 1.5 Pro is creating new possibilities for developers to build AI agents that streamline the customer experience. In this post, we'll focus on a practical application that has emerged in the travel industry – building an AI-powered trip planning agent. You'll learn how to connect your agent to external data sources like event APIs, enabling it to generate personalized travel itineraries based on real-time information.

Understanding the core concepts

  • Function calling: Allows developers to connect Gemini models (all Gemini models except Gemini 1.0 Pro Vision) with external systems, APIs, and data sources. This enables the AI to retrieve real-time information and perform actions, making it more dynamic and versatile.
  • Grounding: Enhances the Gemini model’s ability to access and process information from external sources like documents, knowledge bases, and the web, leading to more accurate and up-to-date responses.

By combining these features, we can create an AI agent that can understand user requests, retrieve relevant information from the web, and provide personalized recommendations.

Step-by-step: Function calling with grounding

Let’s run through a scenario:

Let’s say you’re an AI engineer tasked with creating an AI agent that helps users plan trips by finding local events and potential hotels to stay at. Your company has given you full creative freedom to build a minimum viable product using Google’s generative AI products, so you’ve chosen to use Gemini 1.5 Pro and loop in other external APIs.

The first step is to define potential queries that any user might enter into the Gemini chat. This will help clarify development requirements and ensure the final product meets the standards of both users and stakeholders. Here are some examples:

  • “I’m bored, what is there to do today?”
  • “I would like to take me and my two kids somewhere warm because spring break starts next week. Where should I take them?”
  • “My friend will be moving to Atlanta soon for a job. What fun events do they have going on during the weekends?”

From these sample queries, it looks like we’ll need to use an events API and a hotels API for localized information. Next, let’s set up our development environment.

Notebook setup

To use Gemini 1.5 Pro for development, you’ll need to create a project in Google Cloud or use an existing one. Follow the official setup instructions before continuing. Working in a Jupyter notebook environment is one of the easiest ways to get started developing with Gemini 1.5 Pro. You can either use Google Colab or follow along in your own local environment.

First, you’ll need to install the latest version of the Vertex AI SDK for Python, import the necessary modules, and initialize the Gemini model: 

1. Add a code cell to install the necessary libraries. This demo notebook requires version 1.52 or later of the google-cloud-aiplatform Python package (google-cloud-aiplatform>=1.52).

2. Add another code cell to import the necessary Python packages.

3. Now we can initialize Vertex AI with your project ID. Enter your project details between the quotes of the corresponding variables so you can reuse them. Uncomment the gcloud authentication commands if necessary.
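
The three setup steps might look like the following in a notebook. This is a minimal sketch: the project ID and region are placeholders, and `init_vertex_ai` is a hypothetical helper (not part of the SDK) that refuses to run until the placeholder has been replaced. `vertexai.init` is the SDK call that does the actual initialization.

```python
# Step 1 (run once in its own notebook cell):
#   %pip install --upgrade --quiet "google-cloud-aiplatform>=1.52"

# Placeholders: replace with your own Google Cloud project and region.
PROJECT_ID = "your-project-id"
LOCATION = "us-central1"

# Uncomment if you still need to authenticate from a local environment:
#   !gcloud auth application-default login


def init_vertex_ai(project_id: str = PROJECT_ID, location: str = LOCATION) -> None:
    """Initialize the Vertex AI SDK for one project (hypothetical helper)."""
    if not project_id or project_id == "your-project-id":
        raise ValueError("Set PROJECT_ID to your Google Cloud project ID first.")
    # Step 2: imported here so the placeholder check runs before the SDK is needed.
    import vertexai

    # Step 3: every later model call uses this project and region.
    vertexai.init(project=project_id, location=location)
```

Calling `init_vertex_ai()` once near the top of the notebook is enough; later cells reuse the same project and region.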

API key configuration

For this demo, we will also use an additional API to retrieve information about events and hotels. We'll use SerpApi, a third-party service that returns Google search results as JSON, for both, so be sure to create an account and select a subscription plan that fits your needs. This demo can be completed using the free tier. Once that’s done, you'll find your unique API key in your account dashboard.

If you use the Gemini API through the google.generativeai SDK instead of Vertex AI, you can pass your Gemini API key to the SDK in one of two ways:

  • Put the key in the GOOGLE_API_KEY environment variable, where the SDK will pick it up automatically
  • Pass the key using genai.configure(api_key=...)

Navigate to https://serpapi.com and replace the contents of the variable below between the quotes with your specific API key:
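
One way to hold the SerpApi key is sketched below. It assumes the key lives in an environment variable named `SERPAPI_API_KEY`; that name and the `load_serpapi_key` helper are illustrative choices, not something SerpApi or the Vertex AI SDK requires. Pasting the key directly between the quotes of a variable works as well for a quick demo.

```python
import os

# Assumption: the key is exported as SERPAPI_API_KEY before the notebook starts.
SERPAPI_KEY_ENV = "SERPAPI_API_KEY"


def load_serpapi_key(fallback: str = "") -> str:
    """Return the SerpApi key from the environment, or the pasted fallback."""
    key = os.environ.get(SERPAPI_KEY_ENV, fallback).strip()
    if not key:
        raise RuntimeError(
            f"No SerpApi key found. Set {SERPAPI_KEY_ENV} or pass your key as fallback."
        )
    return key


# Example: SERP_API_KEY = load_serpapi_key("paste-your-key-here")
```

Reading the key from the environment keeps it out of the notebook file, which matters if the notebook is shared or committed to version control.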

Defining custom functions for function calling

In this step, you'll define custom functions in order to pass them to Gemini 1.5 Pro and incorporate the API outputs back into the model for more accurate responses. We'll first define a function for the events API.

To use function calling, pass a list of functions to the tools parameter when creating a generative model. The model uses the function name, docstring, parameters, and parameter type annotations to decide if it needs the function to best answer a prompt.
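
A custom events function could look roughly like this. It is a sketch under several assumptions: it calls SerpApi's JSON search endpoint with the `google_events` engine, uses the `htichips` filter parameter, and reads results from an `events_results` field, all of which should be checked against SerpApi's current documentation. The names `build_event_params`, `event_api`, and `SERP_API_KEY` are illustrative.

```python
import json
import urllib.parse
import urllib.request

SERPAPI_ENDPOINT = "https://serpapi.com/search.json"
SERP_API_KEY = "your-serpapi-key"  # placeholder: use your own key


def build_event_params(query: str, htichips: str = "date:today") -> dict:
    """Build the query parameters for a Google Events search via SerpApi."""
    return {
        "engine": "google_events",  # assumed engine name
        "q": query,
        "htichips": htichips,
        "hl": "en",
        "gl": "us",
        "api_key": SERP_API_KEY,
    }


def event_api(query: str, htichips: str = "date:today") -> list:
    """Retrieve events based on a query and optional date filters.

    Args:
        query: Search text, for example "Events in Atlanta".
        htichips: Filter string such as "date:today" or "date:weekend".

    Returns:
        A list of event dictionaries (empty if nothing was found).
    """
    params = build_event_params(query, htichips)
    url = SERPAPI_ENDPOINT + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url, timeout=30) as resp:
        payload = json.load(resp)
    # Assumed response field; fall back to an empty list if it is missing.
    return payload.get("events_results", [])
```

The docstring and type annotations are worth keeping detailed, since they are the description of the function that the model sees.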

Now we will follow the same format to define a function for the hotels API.
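
The hotels function can mirror the events one. As before, this is a sketch: the `google_hotels` engine name, the `check_in_date` and `check_out_date` parameters, and the `properties` response field are assumptions to verify against SerpApi's documentation, and `hotel_api` is an illustrative name.

```python
import json
import urllib.parse
import urllib.request

SERP_API_KEY = "your-serpapi-key"  # placeholder: use your own key


def hotel_api(query: str, check_in_date: str, check_out_date: str, adults: int = 2) -> list:
    """Retrieve hotel listings for a destination and stay dates.

    Args:
        query: Destination text, for example "Hotels in Miami".
        check_in_date: Check-in date as YYYY-MM-DD.
        check_out_date: Check-out date as YYYY-MM-DD.
        adults: Number of adult guests.

    Returns:
        Up to five hotel dictionaries (empty if nothing was found).
    """
    params = {
        "engine": "google_hotels",  # assumed engine name
        "q": query,
        "check_in_date": check_in_date,
        "check_out_date": check_out_date,
        # Numeric arguments from the model can arrive as floats (2.0), so coerce.
        "adults": int(adults),
        "currency": "USD",
        "hl": "en",
        "gl": "us",
        "api_key": SERP_API_KEY,
    }
    url = "https://serpapi.com/search.json?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url, timeout=30) as resp:
        payload = json.load(resp)
    # Keep only the first few results so the model's context stays small.
    return payload.get("properties", [])[:5]
```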

Declare the custom function as a tool

The function declaration below describes the function for the events API. It lets the Gemini model know this API retrieves event information based on a query and optional filters.
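
A declaration for the events function might be written as follows. The schema is an OpenAPI-style dictionary, and `FunctionDeclaration` comes from `vertexai.generative_models`. The parameter names match the hypothetical `event_api(query, htichips)` function assumed in this walkthrough, and `make_event_declaration` is an illustrative helper.

```python
# Plain-dict description of the function, in OpenAPI schema style.
EVENT_FUNCTION_SPEC = {
    "name": "event_api",
    "description": "Retrieves event information based on a query and optional filters.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "What to search for, e.g. 'Events in Atlanta'.",
            },
            "htichips": {
                "type": "string",
                "description": "Optional date filter, e.g. 'date:today' or 'date:weekend'.",
            },
        },
        "required": ["query"],
    },
}


def make_event_declaration():
    """Wrap the spec in a Vertex AI FunctionDeclaration."""
    # Imported here so the spec can be inspected without the SDK installed.
    from vertexai.generative_models import FunctionDeclaration

    return FunctionDeclaration(**EVENT_FUNCTION_SPEC)
```

Only `query` is marked as required, which lets the model omit the filter when the user does not mention a date.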

Again, we will follow the same format for the hotels API.
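
The matching hotels declaration could look like this. It is again a sketch: the parameter names assume a hypothetical `hotel_api(query, check_in_date, check_out_date, adults)` function, and `make_hotel_declaration` is an illustrative helper.

```python
HOTEL_FUNCTION_SPEC = {
    "name": "hotel_api",
    "description": "Retrieves hotel information for a destination and stay dates.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "Destination to search, e.g. 'Hotels in Miami'.",
            },
            "check_in_date": {
                "type": "string",
                "description": "Check-in date in YYYY-MM-DD format.",
            },
            "check_out_date": {
                "type": "string",
                "description": "Check-out date in YYYY-MM-DD format.",
            },
            "adults": {
                "type": "integer",
                "description": "Number of adult guests.",
            },
        },
        # The dates are required because a hotel search is meaningless without them.
        "required": ["query", "check_in_date", "check_out_date"],
    },
}


def make_hotel_declaration():
    """Wrap the spec in a Vertex AI FunctionDeclaration."""
    from vertexai.generative_models import FunctionDeclaration

    return FunctionDeclaration(**HOTEL_FUNCTION_SPEC)
```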

Consider configuring safety settings for the model

Safety settings in Gemini exist to prevent the generation of harmful or unsafe content. They act as filters that analyze the generated output and block or flag anything that might be considered inappropriate, offensive, or dangerous. Configuring them is good practice whenever you build applications on generative AI.
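
A possible safety configuration is sketched below. The category and threshold names are members of the SDK's `HarmCategory` and `HarmBlockThreshold` enums; the specific thresholds chosen here and the `build_safety_settings` helper are illustrative assumptions, so pick values that suit your application.

```python
# Category name -> threshold name, kept as strings so the policy is easy to review.
SAFETY_CONFIG = {
    "HARM_CATEGORY_HARASSMENT": "BLOCK_MEDIUM_AND_ABOVE",
    "HARM_CATEGORY_HATE_SPEECH": "BLOCK_MEDIUM_AND_ABOVE",
    "HARM_CATEGORY_SEXUALLY_EXPLICIT": "BLOCK_MEDIUM_AND_ABOVE",
    "HARM_CATEGORY_DANGEROUS_CONTENT": "BLOCK_MEDIUM_AND_ABOVE",
}


def build_safety_settings(config: dict = SAFETY_CONFIG) -> dict:
    """Translate the string policy into the enum mapping the SDK accepts."""
    from vertexai.generative_models import HarmBlockThreshold, HarmCategory

    return {
        getattr(HarmCategory, category): getattr(HarmBlockThreshold, threshold)
        for category, threshold in config.items()
    }
```

The resulting dictionary can be passed as the `safety_settings` argument when the model is created.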

Pass the tool and start a chat

Here we’ll pass the tools to the model as function declarations and start a chat with Gemini. Using chat.send_message("..."), you can send messages to the model in a conversation-like structure.
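
Putting the pieces together might look like this. It is a sketch that assumes the function declarations and safety settings have already been built; `start_trip_chat` is a hypothetical helper, the model name is a placeholder for whichever Gemini 1.5 Pro version your project can access, and the temperature of 0 is an illustrative choice for more repeatable tool selection.

```python
MODEL_NAME = "gemini-1.5-pro"  # placeholder: use a model version available to you


def start_trip_chat(declarations, safety_settings=None):
    """Create a Gemini model that knows about the tools and open a chat session."""
    from vertexai.generative_models import GenerationConfig, GenerativeModel, Tool

    # A Tool groups one or more function declarations.
    trip_tool = Tool(function_declarations=list(declarations))
    model = GenerativeModel(
        MODEL_NAME,
        generation_config=GenerationConfig(temperature=0),
        safety_settings=safety_settings,
        tools=[trip_tool],
    )
    return model.start_chat()


# Example (after vertexai.init has been called):
#   chat = start_trip_chat([event_declaration, hotel_declaration])
#   response = chat.send_message("What events are happening in Atlanta today?")
```

At this point the model can only request a function call; your own code still has to run the function and send the result back.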

Build the agent

Next, we will create a dictionary that maps each tool name to its Python function so the agent function can call whichever tool the model requests. We will also use prompt engineering (a mission prompt) to guide how the model handles user input and to give the model the current date and time.
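
The agent loop could be sketched as follows. The names `mission_prompt`, `run_agent`, and `max_tool_turns` are hypothetical, the prompt wording is only an example, and the code assumes an SDK version in which response candidates expose a `function_calls` list. `Part.from_function_response` is the SDK call used to hand a tool result back to the model.

```python
from datetime import datetime
from typing import Callable, Dict, Optional


def mission_prompt(user_prompt: str, now: Optional[datetime] = None) -> str:
    """Wrap the user's request with instructions and the current date and time."""
    now = now or datetime.now()
    return (
        "You are a trip planning assistant. Decide whether a tool is needed, "
        "call it with sensible arguments, and answer from the tool results.\n"
        f"[QUERY] {user_prompt}\n"
        f"[DATETIME] {now.isoformat(timespec='minutes')}"
    )


def run_agent(chat, user_prompt: str, callable_functions: Dict[str, Callable],
              max_tool_turns: int = 5) -> str:
    """Send a prompt, run any tools the model requests, and return its final text."""
    from vertexai.generative_models import Part

    response = chat.send_message(mission_prompt(user_prompt))
    turns = 0
    # Keep going while the model answers with function calls instead of text.
    while response.candidates[0].function_calls:
        if turns >= max_tool_turns:
            raise RuntimeError("The model kept requesting tools; giving up.")
        turns += 1
        parts = []
        for call in response.candidates[0].function_calls:
            # Look up the Python function by the name the model asked for.
            result = callable_functions[call.name](**dict(call.args))
            parts.append(
                Part.from_function_response(name=call.name, response={"content": result})
            )
        # Return every tool result to the model in a single message.
        response = chat.send_message(parts)
    return response.text
```

A call might look like `run_agent(chat, "What is there to do in Atlanta this weekend?", {"event_api": event_api, "hotel_api": hotel_api})`. The turn limit is a guard against the model requesting tools indefinitely.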

Test the agent

Below are some sample queries you can try to test the chat capabilities of the agent. Don’t forget to test out a query of your own!

Wrapping up

That’s all! Gemini 1.5 Pro’s function calling and grounding features enhance its capabilities, enabling developers to connect to external tools and improve model results. This integration enables Gemini models to provide up-to-date information while minimizing hallucinations.

If you’re looking for more hands-on tutorials and code examples, check out some of Google’s Codelabs (such as How to Interact with APIs Using Function Calling in Gemini) to guide you through examples of building a beginner function calling application.
