Build gen AI apps quickly with LangChain VectorStore in Cloud SQL for PostgreSQL


We recently announced a suite of LangChain packages for the Google Cloud database portfolio. Each package will have up to three LangChain integrations: 

  1. Vector stores to enable semantic search for our databases that support vectors

  2. Document loaders for loading and saving documents to/from your database

  3. Chat Message Memory to enable chains to recall previous conversations

In this blog post, we take a deep dive into the benefits of the VectorStore in our Cloud SQL for PostgreSQL LangChain package and see how it makes generative AI application development easy, secure, and flexible.

Security

The Cloud SQL for PostgreSQL LangChain packages come embedded with the Cloud SQL Python connector, which makes connecting securely to your database easy. Developers get the following benefits out of the box: 

  • IAM authorization: Uses IAM permissions to control who or what can connect to your Cloud SQL instances

  • Convenience: Removes the requirement to manage SSL certificates, configure firewall rules, or enable authorized networks

  • IAM database authentication: Provides support for Cloud SQL's automatic IAM database authentication feature

Ease of use

Connect with just instance name

Now, you no longer need to construct a connection string with your IP address or pass in a myriad of arguments to connect to your PostgreSQL instance. Instead, the instance name alone will suffice as shown below:

```python
from langchain_google_cloud_sql_pg import PostgresVectorStore, PostgresEngine
from langchain.embeddings import VertexAIEmbeddings

engine = PostgresEngine.from_instance(
    "project-id", "region", "my-instance", "my-database")
embedding_service = VertexAIEmbeddings(model_name="textembedding-gecko@003")
vectorstore = PostgresVectorStore.create_sync(
    engine,
    table_name="my-table",
    embedding_service=embedding_service,
)
```

Connection pooling by default

Connection management is an important part of scaling PostgreSQL. Cloud SQL for PostgreSQL’s LangChain packages come automatically configured with an SQLAlchemy connection pool. Our package supports custom configurations as well as reusing the pool in other parts of your application:

```python
from sqlalchemy.ext.asyncio import create_async_engine
from langchain_google_cloud_sql_pg import PostgresEngine

engine = create_async_engine(
    "postgresql+asyncpg://langchain:langchain@...",
)
engine = PostgresEngine.from_engine(engine)
```
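To build intuition for what a pool buys you, here is a toy, standard-library-only sketch of the idea (this is not how SQLAlchemy implements its pool, and `FakeConnection` is a made-up stand-in for a real connection): opening a connection is expensive, so the pool hands out and reclaims already-open connections instead of creating a new one per request.

```python
# Toy illustration of connection pooling, NOT the SQLAlchemy implementation.
import queue


class FakeConnection:
    """Hypothetical stand-in for an expensive-to-open database connection."""
    opened = 0  # counts how many "expensive" opens actually happened

    def __init__(self):
        FakeConnection.opened += 1


class SimplePool:
    def __init__(self, size):
        self._pool = queue.Queue()
        for _ in range(size):
            self._pool.put(FakeConnection())  # pay the open cost up front, once

    def acquire(self):
        return self._pool.get()  # blocks until a connection is free

    def release(self, conn):
        self._pool.put(conn)  # return the connection for reuse


pool = SimplePool(size=2)
for _ in range(100):  # serve 100 requests...
    conn = pool.acquire()
    pool.release(conn)

print(FakeConnection.opened)  # ...while only 2 connections were ever opened
```

A real pool adds time-outs, health checks, and overflow connections, but the core trade is the same: amortize connection setup across many requests.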

Schema flexibility

While the existing langchain-postgres package offers a VectorStore, it supports only a limited, fixed schema: all vector stores initialized in a database share two tables with fixed names and schemas. Any schema change requires dropping and recreating those tables, losing the existing data. It also does not currently support indexing, so it can only perform exact nearest-neighbor (KNN) search.

Table per collection

In contrast, Cloud SQL for PostgreSQL’s LangChain packages use a different table for each collection of vectors, meaning that schemas can vary as shown below.

```python
# First store
engine.init_vectorstore_table(
    table_name="my-table1",
    vector_size=768,  # VertexAI model: textembedding-gecko@003
)
vectorstore = PostgresVectorStore.create_sync(
    engine,
    table_name="my-table1",
    embedding_service=embedding_service,
)

# Second store
engine.init_vectorstore_table(
    table_name="my-table2",
    vector_size=768,  # VertexAI model: textembedding-gecko@003
)
vectorstore = PostgresVectorStore.create_sync(
    engine,
    table_name="my-table2",
    embedding_service=embedding_service,
)
```

Support indexing

The Cloud SQL for PostgreSQL LangChain packages support approximate nearest-neighbor (ANN) indexes to speed up vector search. Below are simple code snippets to create, refresh, and drop indexes using the package.

Create index

```python
from langchain_google_cloud_sql_pg.indexes import IVFFlatIndex

index = IVFFlatIndex()
await vectorstore.aapply_vector_index(index)
```

Re-index

```python
await vectorstore.areindex()  # Re-index using default index name
```

Drop indexes

```python
await vectorstore.adrop_vector_index()  # Delete index using default name
```
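To build intuition for what an IVFFlat index does under the hood, here is a toy, standard-library-only sketch of the idea (not the pgvector implementation, and the centroids here are hand-picked rather than learned by k-means): vectors are partitioned into inverted lists around centroids, and a query scans only the lists whose centroids are closest, trading a little recall for far less work than an exact scan.

```python
# Toy sketch of the IVFFlat idea; real indexes learn centroids via
# k-means and tune nlist/probes for the recall/speed trade-off.
import math
from collections import defaultdict


def build_ivf(vectors, centroids):
    # Assign every vector to the inverted list of its nearest centroid.
    lists = defaultdict(list)
    for v in vectors:
        c = min(range(len(centroids)), key=lambda i: math.dist(v, centroids[i]))
        lists[c].append(v)
    return lists


def search_ivf(query, centroids, lists, probes=1):
    # Scan only the `probes` nearest lists instead of every vector.
    nearest = sorted(range(len(centroids)), key=lambda i: math.dist(query, centroids[i]))
    candidates = [v for i in nearest[:probes] for v in lists[i]]
    return min(candidates, key=lambda v: math.dist(query, v))


centroids = [(0.0, 0.0), (10.0, 10.0)]
vectors = [(0.5, 0.2), (9.8, 10.1), (0.1, 0.9), (10.2, 9.7)]
lists = build_ivf(vectors, centroids)

print(search_ivf((10.0, 10.0), centroids, lists))  # (9.8, 10.1)
```

With `probes=1` the query above inspects only the two vectors in the nearest list, not all four; increasing `probes` widens the search toward exact KNN.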

Custom schemas 

The Cloud SQL for PostgreSQL LangChain packages allow you to use different schemas, which means you can both reuse any existing table, as well as more easily migrate from other implementations (such as langchain-postgres package).

Use preexisting table

When initializing a PostgresVectorStore, you can optionally specify the names of the columns used to store content, IDs, and metadata fields from your LangChain documents. This lets you leverage existing tables, including tables created by other integrations (such as langchain-postgres).

```python
# Initialize PostgresVectorStore
custom_store = await PostgresVectorStore.create(
    engine=engine,
    # Connect to the table the langchain-postgres package uses
    table_name="langchain_pg_embedding",
    embedding_service=embedding_service,
    # Connect to an existing VectorStore by customizing the table schema:
    id_column="id",
    content_column="document",
    metadata_json_column="cmetadata",
)
```

Extract metadata to column

Specifying metadata columns causes the integration to pull that field from the document metadata and store it in its own, properly typed column.

```python
from langchain_google_cloud_sql_pg import Column, PostgresVectorStore

# Set table name
TABLE_NAME = "vectorstore_custom"

await engine.ainit_vectorstore_table(
    table_name=TABLE_NAME,
    vector_size=768,  # VertexAI model: textembedding-gecko@003
    metadata_columns=[Column("length", "INTEGER")],
)

# Initialize PostgresVectorStore
custom_store = await PostgresVectorStore.create(
    engine=engine,
    table_name=TABLE_NAME,
    embedding_service=embedding_service,
    metadata_columns=["length"],
)
```

Filter on metadata

Storing a metadata field in its own column lets you leverage PostgreSQL’s strengths as a relational database and filter efficiently.

```python
import uuid

# Add texts to the vector store
all_texts = ["Apples and oranges", "Cars and airplanes", "Pineapple", "Train", "Banana"]
metadatas = [{"length": len(t)} for t in all_texts]
ids = [str(uuid.uuid4()) for _ in all_texts]
await custom_store.aadd_texts(all_texts, metadatas=metadatas, ids=ids)

# Use a filter on search
docs = await custom_store.asimilarity_search_by_vector(
    query_vector,
    filter="length >= 6",
)

print(docs)
```
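The benefit of a typed metadata column can be sketched with plain SQL. In this illustration SQLite stands in for PostgreSQL purely for portability, and the table is a made-up miniature of the one above (no vectors): because `length` is a real `INTEGER` column rather than a value buried in a JSON blob, the database can evaluate the filter itself and use an index on it.

```python
# Sketch of filtering on an extracted, typed metadata column.
# SQLite here is only a stand-in for PostgreSQL for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vectorstore_custom (content TEXT, length INTEGER)")

texts = ["Apples and oranges", "Cars and airplanes", "Pineapple", "Train", "Banana"]
conn.executemany(
    "INSERT INTO vectorstore_custom VALUES (?, ?)",
    [(t, len(t)) for t in texts],  # the "length" metadata lands in its own column
)

# The same predicate as filter="length >= 6" above, evaluated by the database.
rows = conn.execute(
    "SELECT content FROM vectorstore_custom WHERE length >= 6 ORDER BY length, content"
).fetchall()
print([r[0] for r in rows])
# ['Banana', 'Pineapple', 'Apples and oranges', 'Cars and airplanes']
```

Only "Train" (5 characters) is filtered out; everything else satisfies `length >= 6`.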

In summary, it’s very easy to get started with Cloud SQL for PostgreSQL as a vector database, and our native LangChain packages make gen AI development more flexible and powerful.

You can play around with the Cloud SQL for PostgreSQL VectorStore with this VectorStore Notebook, or you can check out the package on GitHub, including filing bugs or other feedback by opening an issue.
