Data is the fuel for AI, and its role in driving innovation is uncontested. However, because so much data today is unstructured and unmanaged, poor data accessibility can stand in the way of AI adoption. Our goal is to make data accessible, actionable and transformative for enterprises. Today, we are delivering new capabilities to help you realize this goal with a data cloud that is based on open standards, connects data to AI in real time, and pushes the boundaries of gen AI with conversational data agents.
An open ecosystem to work with data in real time
Earlier this year, we announced plans to unify BigQuery into a single platform for data and AI use cases, incorporating all formats and types of data, multiple engines, governance, ML, and business intelligence. To support customers working with open formats, we are excited to announce the general availability of a managed experience for the Iceberg, Hudi, and Delta table formats. To make it easier to prepare multimodal data, we are also adding data processing capabilities for a variety of formats, including documents, audio, images, and video.
Volkswagen uses BigQuery to ground AI models on multiple data sources, including vehicle owner’s manuals, frequently asked questions from customers, help center articles, and official Volkswagen YouTube videos.
“We’re driven to introduce new technologies and features that enhance the ownership experience for all of our Volkswagen customers and create love for our vehicles. AI is emerging as a utility tool for Volkswagen owners to better understand their vehicles and get answers to questions faster and easier.” - Abdallah Shanti, Chief Information Officer, Volkswagen Group of America.
To support the many ways customers ingest data, we are announcing new managed services for Apache Flink and Apache Kafka that help you configure, tune, scale, monitor, and upgrade real-time ingestion workloads. BigQuery workflows, now in preview, let data engineers build data pipelines within a unified platform and execute them manually, via an API, or on a schedule.
Another recent innovation, BigQuery continuous queries, helps customers go beyond real-time data analysis to real-time activation of insights. Historically, "real-time" meant analyzing data that was minutes or even hours old. However, the landscape of data ingestion and analysis is rapidly evolving. A surge in generated data, customer engagement, and AI-driven automation has drastically reduced the acceptable latency for decision-making: the path from insight to activation needs to be seamless, and it is now measured in seconds rather than minutes or hours. We’ve also extended the Analytics Hub data marketplace to support real-time data sharing, in preview.
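To make the difference between batch analysis and continuous activation concrete, here is a minimal Python sketch. It assumes a hypothetical stream of order events and a hypothetical threshold rule; it is not BigQuery's continuous query interface (which is expressed in SQL), only an illustration of evaluating each event as it arrives instead of waiting for a completed batch.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Event:
    """Hypothetical order event; field names are illustrative only."""
    event_id: int
    amount: float


def batch_alerts(events: list[Event], threshold: float) -> list[int]:
    """Batch style: evaluated only once the whole batch has landed."""
    return [e.event_id for e in events if e.amount > threshold]


def continuous_alerts(events: Iterable[Event], threshold: float) -> Iterator[int]:
    """Continuous style: each event is checked as it arrives, and an action
    (here, just yielding the event ID) is emitted immediately."""
    for e in events:
        if e.amount > threshold:
            yield e.event_id


arrived: list[int] = []


def arriving(events: list[Event]) -> Iterator[Event]:
    """Simulates events landing one at a time, recording each arrival."""
    for e in events:
        arrived.append(e.event_id)
        yield e


stream = [Event(1, 20.0), Event(2, 950.0), Event(3, 15.0), Event(4, 1200.0)]

alerts = continuous_alerts(arriving(stream), threshold=500.0)
first = next(alerts)
arrived_at_first = list(arrived)
print(first, arrived_at_first)  # 2 [1, 2]: acted on before events 3 and 4 arrived
rest = list(alerts)
print(rest)  # [4]
print(batch_alerts(stream, threshold=500.0))  # [2, 4]
```

Both approaches find the same events; the continuous version simply acts on event 2 before events 3 and 4 exist.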
To help customers extract meaningful insights from log data, we are thrilled to announce BigQuery pipe syntax, designed to improve the way you manage, analyze, and derive value from your logs. It gives data teams a SQL syntax suited to the semi-structured nature of log data and a simpler approach to data transformations.
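Pipe syntax writes a query as a linear sequence of operators, each introduced by `|>` and applied to the output of the previous step, so the query reads top to bottom in the order it runs. The Python sketch below mirrors that idea over a few in-memory log records. The table name, fields, and helper functions are hypothetical, and the SQL in the comment is an illustrative query of the same shape rather than one taken from this announcement.

```python
from collections import Counter
from functools import reduce
from typing import Callable

# Illustrative pipe-syntax query with the same shape (names are hypothetical):
#
#   FROM mydataset.app_logs
#   |> WHERE severity = 'ERROR'
#   |> AGGREGATE COUNT(*) AS errors GROUP BY service
#   |> ORDER BY errors DESC

Rows = list[dict]
Step = Callable[[Rows], Rows]


def where(pred: Callable[[dict], bool]) -> Step:
    """Keep only the rows that satisfy the predicate."""
    return lambda rows: [r for r in rows if pred(r)]


def aggregate_count(group_by: str, alias: str) -> Step:
    """Count rows per value of one grouping column."""
    def step(rows: Rows) -> Rows:
        counts = Counter(r[group_by] for r in rows)
        return [{group_by: k, alias: v} for k, v in counts.items()]
    return step


def order_by(key: str, desc: bool = False) -> Step:
    """Sort rows by one column."""
    return lambda rows: sorted(rows, key=lambda r: r[key], reverse=desc)


def pipe(rows: Rows, *steps: Step) -> Rows:
    """Apply each step to the previous step's output, like chained |> operators."""
    return reduce(lambda acc, step: step(acc), steps, rows)


logs = [
    {"service": "auth", "severity": "ERROR"},
    {"service": "api", "severity": "INFO"},
    {"service": "api", "severity": "ERROR"},
    {"service": "api", "severity": "ERROR"},
    {"service": "auth", "severity": "WARNING"},
]

result = pipe(
    logs,
    where(lambda r: r["severity"] == "ERROR"),
    aggregate_count(group_by="service", alias="errors"),
    order_by("errors", desc=True),
)
print(result)  # [{'service': 'api', 'errors': 2}, {'service': 'auth', 'errors': 1}]
```

Because each step only sees the previous step's output, steps can be added, removed, or reordered without rewriting nested subqueries.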
Connect all your data to AI
Today, BigQuery customers can generate and search embeddings at scale, enabling a wide range of use cases such as semantic search, entity resolution, similarity detection, retrieval-augmented generation (RAG), and recommendations. Thanks to integration with Vertex AI, users can easily generate embeddings over text, images, video, and multimodal data, as well as over structured data. BigQuery’s LangChain integration, now generally available, makes it simple to preprocess your data, generate and store embeddings, and run vector search.
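Vector search ranks stored embeddings by how similar they are to a query embedding. The minimal Python sketch below does this exhaustively with cosine similarity over a few toy three-dimensional vectors. The document IDs and vector values are made up (real model embeddings have hundreds of dimensions), and the code illustrates the computation only, not how BigQuery's `VECTOR_SEARCH` function is implemented.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def nearest_neighbors(query: list[float], corpus: dict[str, list[float]], k: int):
    """Exact (brute-force) k-NN: score every stored vector and keep the top k."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical embeddings for a few support documents.
corpus = {
    "manual_brakes": [0.9, 0.1, 0.0],
    "manual_tires": [0.8, 0.3, 0.1],
    "faq_warranty": [0.1, 0.9, 0.2],
    "video_infotainment": [0.0, 0.2, 0.95],
}
query = [0.9, 0.15, 0.0]  # hypothetical embedding of a question about brakes

top = nearest_neighbors(query, corpus, k=2)
print([doc for doc, _ in top])  # ['manual_brakes', 'manual_tires']
```

This exact approach compares the query with every vector, which is simple and precise but grows linearly with the size of the corpus.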
To enhance our vector search capabilities, we are introducing ScaNN-based search for large queries, in preview. This is the same technology that powers popular Google services such as Google Search and YouTube. The ScaNN index can support more than one billion vectors while maintaining state-of-the-art query performance, enabling high-scale workloads for every enterprise.
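Comparing a query with every vector does not scale to a billion vectors, so large-scale indexes avoid scanning most of the data. The sketch below shows one common idea, partitioning: each vector is assigned to its nearest centroid ahead of time, and a query scans only the few closest partitions. This is a deliberately simplified, IVF-style illustration with hand-picked centroids and toy 2-D points; ScaNN itself also uses techniques such as anisotropic vector quantization to score candidates, none of which is reproduced here.

```python
def sq_dist(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_partitions(vectors: dict[str, list[float]], centroids: list[list[float]]):
    """Offline step: assign every vector to its nearest centroid."""
    partitions: dict[int, list[str]] = {i: [] for i in range(len(centroids))}
    for vec_id, vec in vectors.items():
        nearest = min(range(len(centroids)), key=lambda i: sq_dist(vec, centroids[i]))
        partitions[nearest].append(vec_id)
    return partitions


def approximate_search(query, vectors, centroids, partitions, k, n_probe):
    """Query step: scan only the n_probe partitions whose centroids are closest.
    Returns the top-k IDs and how many vectors were actually compared."""
    order = sorted(range(len(centroids)), key=lambda i: sq_dist(query, centroids[i]))
    candidates = [vid for i in order[:n_probe] for vid in partitions[i]]
    candidates.sort(key=lambda vid: sq_dist(query, vectors[vid]))
    return candidates[:k], len(candidates)


vectors = {
    "a": [0.1, 0.2], "b": [0.2, 0.1], "c": [0.15, 0.15],
    "d": [5.0, 5.1], "e": [5.2, 4.9],
    "f": [9.8, 0.1], "g": [10.1, 0.3],
}
centroids = [[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]]  # hand-picked for the example

partitions = build_partitions(vectors, centroids)
ids, scanned = approximate_search([0.12, 0.18], vectors, centroids, partitions, k=2, n_probe=1)
print(ids, f"scanned {scanned} of {len(vectors)} vectors")  # ['a', 'c'] scanned 3 of 7 vectors
```

The trade-off is recall: a true neighbor that falls in an unprobed partition is missed, which is why `n_probe` is typically tuned against accuracy.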
We’re also making it easy to process data with familiar Python APIs using BigQuery DataFrames. This includes generating synthetic data as an alternative to real data for training ML models and testing systems. To accelerate this kind of AI experimentation, we are partnering with Gretel AI for synthetic data generation in BigQuery, so you can use data that closely resembles your actual data but doesn’t contain any sensitive information.
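As a rough illustration of the idea, the sketch below produces synthetic rows by dropping a sensitive identifier column and sampling every remaining column independently from its observed values. The column names and data are hypothetical, and this is a naive stand-in, not Gretel AI's method, which relies on trained generative models. Independent per-column sampling preserves each column's values but not the correlations between columns, and it can still reproduce rare real values, so it offers no privacy guarantee on its own.

```python
import random


def synthesize(rows: list[dict], drop_columns: set[str], n: int, seed: int = 0) -> list[dict]:
    """Generate n synthetic rows: sensitive columns are removed, and every
    remaining column is sampled independently from its observed values."""
    rng = random.Random(seed)  # seeded so the output is reproducible
    columns = [c for c in rows[0] if c not in drop_columns]
    observed = {c: [r[c] for r in rows] for c in columns}
    return [{c: rng.choice(observed[c]) for c in columns} for _ in range(n)]


# Hypothetical "real" table containing a sensitive email column.
real = [
    {"email": "ana@example.com", "region": "EU", "plan": "pro", "monthly_spend": 120},
    {"email": "bo@example.com", "region": "US", "plan": "free", "monthly_spend": 0},
    {"email": "cy@example.com", "region": "US", "plan": "pro", "monthly_spend": 95},
]

synthetic = synthesize(real, drop_columns={"email"}, n=5, seed=42)
for row in synthetic:
    print(row)
```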
Unify data with fine-grained governance
Tens of thousands of organizations already use BigQuery and its integrated AI capabilities to power their data clouds. But in a data-driven AI era, organizations need to govern new data types and an ever-growing variety of workloads.
For example, Box Inc. handles billions of files and serves millions of users globally. BigQuery’s serverless architecture makes it easier for Box to process hundreds of thousands of events per second and manage petabyte-scale storage. Using BigQuery, the company has also strengthened security through fine-grained access control, reliably identifying, classifying, and protecting sensitive data fields.
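Fine-grained access control means deciding, per column and per reader, what a query may return. The sketch below models that idea in plain Python: sensitive columns map to a hypothetical required role, and readers without that role receive a SHA-256 hash of the value instead of the raw value. The policy structure, column names, and role names are illustrative assumptions, not BigQuery's policy tag or data masking API, although hashing is one of the masking approaches BigQuery offers.

```python
import hashlib

# Hypothetical policy: column -> role required to see the raw value.
COLUMN_POLICY = {"email": "pii_reader", "ssn": "pii_reader"}


def mask(value: str) -> str:
    """Replace a value with its SHA-256 hex digest."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def read_row(row: dict, reader_roles: set[str]) -> dict:
    """Return the row as this reader is allowed to see it."""
    out = {}
    for column, value in row.items():
        required = COLUMN_POLICY.get(column)
        if required is None or required in reader_roles:
            out[column] = value  # unrestricted column, or reader holds the role
        else:
            out[column] = mask(str(value))  # restricted column: return masked value
    return out


row = {"file_id": "f-001", "owner": "team-a", "email": "ana@example.com"}
print(read_row(row, reader_roles={"analyst"}))                # email is hashed
print(read_row(row, reader_roles={"analyst", "pii_reader"}))  # email is visible
```

Deterministic hashing keeps masked values consistent across rows and tables, so analysts can still count or join on them without seeing the raw data.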
As data-access and AI use cases multiply, data management and governance become top of mind. To discover your data and AI assets in a unified way, we are excited to announce the general availability of BigQuery’s unified catalog, which automatically harvests, ingests, and indexes metadata from across your data estate, including data sources, AI models, and BI assets. To easily discover and query all those assets, regardless of their type or location, we are also introducing BigQuery catalog semantic search, in preview. You can ask questions in natural language, and BigQuery understands your intent and retrieves the most relevant results, making it easier to find what you are looking for.
To make your data accessible to multiple execution engines, you can use our new BigQuery metastore. Available in preview next month, it allows multiple engines to run on a single copy of data across both structured tables and unstructured object tables, providing a single view for policy and performance management as well as workload orchestration.
You can also use new governance capabilities in BigQuery for BI use cases with Looker. This is a fully managed, self-service experience for connecting to Looker and ingesting its metadata, with no need to set up, maintain, and operate your own connector. Instead, you can use catalog metadata from your Looker instances to capture Looker dashboards, Explores, and dimensions.
Finally, to ensure business continuity, we’ve added disaster recovery capabilities to BigQuery, providing failover and redundant compute capacity with a service level agreement (SLA) for your business-critical workloads. These capabilities are not limited to your data: BigQuery also supports failover of analytics workloads.
Conversational data agents with Gemini
Organizations across the globe want to build LLM-powered data agents that perform both internal and customer-facing tasks, broaden access to data, provide novel insights, and spur action. To help, we are working on a new set of conversational APIs that let developers create their own data agents, both to increase self-service data access and to monetize their data and differentiate their products in the market.
Conversational Analytics
In fact, we used these APIs to build a conversational analytics experience with Gemini in Looker, combined with the business logic models available in Looker’s enterprise-scale semantic layer. The semantic layer provides a single source of truth for grounding AI in your data, with consistent metrics across the organization. You can then explore your data in natural language through a familiar, Google Search-like experience.
Your data agents can be built on semantic data models in LookML, which lets you define governed metrics and the semantic relationships between data models. These models don’t just describe your data; you can also query your LookML models to access the data directly.
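To illustrate why governed metrics matter for a data agent, here is a toy semantic layer in Python: dimensions and measures are defined once, and every request for them is compiled into SQL from those same definitions, so a dashboard and an agent asking for "total revenue" get an identical calculation. This is not LookML or Looker's query generator; the model, table, and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SemanticModel:
    table: str
    dimensions: dict[str, str]  # field name -> SQL expression
    measures: dict[str, str]    # field name -> governed aggregate expression


def compile_query(model: SemanticModel, dimensions: list[str], measures: list[str]) -> str:
    """Turn a request for named fields into SQL using only governed definitions."""
    unknown = [f for f in dimensions if f not in model.dimensions] + [
        f for f in measures if f not in model.measures
    ]
    if unknown:
        raise KeyError(f"Fields not defined in the model: {unknown}")
    select = [f"{model.dimensions[d]} AS {d}" for d in dimensions]
    select += [f"{model.measures[m]} AS {m}" for m in measures]
    sql = f"SELECT {', '.join(select)} FROM {model.table}"
    if dimensions:
        # Group by the ordinal position of each dimension in the SELECT list.
        sql += " GROUP BY " + ", ".join(str(i + 1) for i in range(len(dimensions)))
    return sql


orders = SemanticModel(
    table="sales.orders",
    dimensions={"region": "region", "order_month": "DATE_TRUNC(order_date, MONTH)"},
    measures={"total_revenue": "SUM(amount)", "order_count": "COUNT(DISTINCT order_id)"},
)

print(compile_query(orders, ["region"], ["total_revenue"]))
# SELECT region AS region, SUM(amount) AS total_revenue FROM sales.orders GROUP BY 1
```

Rejecting undefined fields is the design choice that keeps an agent inside the governed model instead of improvising its own metric definitions.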
Under the hood, data agents are powered by a dynamic knowledge graph of data. With BigQuery at the core, the dynamic knowledge graph goes beyond simple semantics, weaving together usage patterns, metadata, historical trends, and more to build a network of data, activities, and relationships.
Last but not least, Gemini in BigQuery is now generally available, helping boost data teams’ productivity with data migration, data preparation, code assist, and data insights. Now your entire company, including business and analyst teams, can chat with your data and obtain insights in seconds, fueling a data-driven, decision-making culture. New data insights capabilities reduce guesswork by offering ready-to-run queries that surface immediate insights, and AI-assisted data preparation provides a natural language interface for building data pipelines in BigQuery Studio.
It’s time to connect all your data with AI, starting by bringing it into BigQuery with the data migration program. You can learn about the latest BigQuery platform innovations in this product roadmap webcast. We can’t wait to hear how you apply these data analytics innovations to your business.