Real-time in no time: Introducing BigQuery continuous queries for up-to-the-minute insights

The world operates in real time: Customers make purchases, financial transactions are made, goods are shipped, sensors generate data, and security threats emerge. The sheer volume of data generated by these real-time events is staggering. Yet, many businesses still rely on traditional, batch-oriented analysis that struggles to keep pace.

Among data analysts and engineers, BigQuery is a favorite for its ability to handle massive datasets and complex queries with ease. However, users are increasingly demanding expanded real-time capabilities to manage continuous data streams for both input and output. To address this challenge for customers, we have transformed BigQuery into a real-time, event-driven analytical platform. So today, we’re excited to launch BigQuery continuous queries, now available in preview. 

BigQuery continuous queries is our answer to the inherent cost and complexity of true real-time data analysis. Historically, "real-time" meant analyzing data that was minutes or even hours old. However, the landscape of data ingestion and analysis is rapidly evolving. The surge in data generation, customer engagement, and AI-driven automation has drastically reduced the acceptable latency for decision-making. Insights are no longer needed within minutes or hours; they are needed within seconds.

Customer expectations have shifted dramatically, too. Today, they want real-time, personalized interactions across their online experiences. Businesses are under pressure to respond instantly and with all the relevant context, a feat that batch-oriented analysis simply cannot achieve.

Meeting these demands is hard. Even an enterprise data platform like BigQuery, while capable of high-throughput real-time data ingestion, was originally designed to perform analysis in a batch-oriented manner, where data is "pulled" from the system through ad-hoc or scheduled jobs, rather than "pushed" in an event-driven way. And while you could integrate additional technologies with BigQuery to enable streaming analysis, this often added architectural complexity, required diverse programming skills, and addressed a limited number of use cases. 

Introducing BigQuery continuous queries

BigQuery continuous queries changes all that. With BigQuery continuous queries, you can run SQL statements continuously to process, analyze, and transform data as new events arrive in BigQuery, ensuring your insights are always up to date. The feature's native integration with the Google Cloud ecosystem unlocks even more potential. You can harness the power of Vertex AI and Gemini to perform machine learning (ML) inference on incoming data in real time. You can also replicate the results of a continuous query to Pub/Sub topics, Bigtable instances, or other BigQuery tables for further processing or analysis. It's like having an always-on analyst at your disposal, constantly monitoring your data streams and triggering actions the moment something noteworthy occurs.

With BigQuery continuous queries, we're dramatically expanding BigQuery's abilities, empowering you with new dynamic and event-driven data processing capabilities, alongside its existing unified data platform strengths. This feature allows you to build applications that respond instantly to changes in your data, opening a new realm of possibilities. Craft personalized customer experiences on the fly, detect anomalies before they escalate, and automate decision-making processes, all with unprecedented agility.

Figure 1: Continuous Queries Overview

The game-changing potential of BigQuery continuous queries

The introduction of BigQuery continuous queries is a significant step forward in real-time and event-driven data analysis right where your data resides. It empowers you to:

  • Simplify real-time pipelines: Express complex, real-time data transformations and analysis using the familiar language of SQL, removing the need for additional technologies or specialized programming skills. 

  • Unlock real-time AI use cases: Combine real-time data transformation with Google’s AI offerings, Vertex AI and Gemini, to enable a wide range of real-time AI-powered applications, such as generating personalized content, enriching data and extracting entities, detecting anomalies instantly, and powering event-driven architectures.

  • Streamline reverse ETL: BigQuery continuous queries integrates with other Google Cloud services like Pub/Sub and Bigtable, so you can send the results of a continuous query to Pub/Sub topics to build event-driven data pipelines, or to Bigtable instances for real-time application serving. Alternatively, the results of a continuous query can be written into another BigQuery table for further analysis.

  • Provide scalability and performance: Backed by BigQuery's robust serverless infrastructure, continuous queries can handle massive volumes of data with high throughput and low latency.

In short, BigQuery continuous queries democratizes real-time event processing, making it accessible to a broader audience and enabling businesses to unlock the full potential of their data using SQL. 

Customers like Bayer, one of the largest pharmaceutical and biomedical companies in the world, see value in leveraging BigQuery continuous queries to make new real-time use cases possible.

“At Bayer, we are under more pressure to deliver real-time analytics – which has historically proven difficult. Now that we’ve had an opportunity to evaluate BigQuery continuous queries, we are incredibly excited about the future possibilities this capability will unlock. From real-time integration of ERP, CRM, IOT data to real-time monitoring and alerting use-cases, we believe continuous queries will be a game-changer that will significantly expand the types of business challenges we can address within our data warehouse.”  -  Anthony Savio, Data Warehouse Engineering Lead, Bayer

Building with an example

The best way to learn is often by seeing it in action. So, let's explore how BigQuery continuous queries can tackle a common ecommerce challenge: shopping cart abandonment.

Imagine this: You've poured your heart into creating a fantastic product, attracted potential customers to your website, and they've even added items to their cart. But then, they vanish without completing the purchase. Frustrating, right? Shopping cart abandonment is a widespread issue; the average cart abandonment rate hovers around a disheartening 70% according to the Baymard Institute. One solution? Real-time engagement that rekindles customer interest with a BigQuery continuous query.

To demonstrate, we’ll use a BigQuery table that logs our website’s abandoned cart events and captures the customer’s contact information, the abandoned cart contents, and the abandonment time. We’ll run a BigQuery continuous query that constantly monitors this abandoned cart table for new events. The query sends each new abandoned cart through Vertex AI to generate a tailored promotional email for the customer, complete with product suggestions and perhaps a limited-time discount, and then publishes the personalized email content to a Pub/Sub topic. Lastly, we’ll use a simple Application Integration platform trigger to send an email for each Pub/Sub message received.

Figure 2: Architectural Overview

You can follow along with step-by-step instructions and build your own end-to-end continuous queries demo using this GitHub repository.

Once our demo environment is set up and we have real-time events being streamed into our BigQuery abandoned carts table, we can write a SQL query like the one below to process each event, use generative AI to craft a personalized email, and write this message to our Pub/Sub topic.

```sql
-- Publish each result row as a message to the Pub/Sub topic.
EXPORT DATA
  OPTIONS (
    format = CLOUD_PUBSUB,
    uri = "https://pubsub.googleapis.com/projects/production-242320/topics/recapture_customer")
AS (
  SELECT
    TO_JSON_STRING(
      STRUCT(
        customer_name AS customer_name,
        customer_email AS customer_email,
        -- Keep only the HTML document from the model response and
        -- replace the "[your name]" placeholder with the sender name.
        REGEXP_REPLACE(
          REGEXP_EXTRACT(ml_generate_text_llm_result, r"(?im)\<html\>(?s:.)*\<\/html\>"),
          r"(?i)\[your name\]",
          "Your friends at AI Megastore") AS customer_message))
  FROM ML.GENERATE_TEXT(
    MODEL `Continuous_Queries_Demo.gemini_1_5_pro`,
    -- Build one prompt per abandoned cart.
    (SELECT
       customer_name,
       customer_email,
       CONCAT(
         "Write an email to customer ", customer_name,
         ", explaining the benefits and encouraging them to complete their purchase of: ", products,
         ". Show other items the customer might be interested in and include details about our rewards program. Provide the response email in HTML format and include a hyperlink to the customer's shopping cart.") AS prompt
     FROM `Continuous_Queries_Demo.abandoned_carts`),
    STRUCT(
      1024 AS max_output_tokens,
      0.2 AS temperature,
      1 AS candidate_count,
      TRUE AS flatten_json_output)))
```
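The post-processing step in the query above keeps only the HTML document from the model's response and replaces the `[your name]` placeholder. It can be handy to prototype that step locally before putting it into a continuous query. The following is a minimal Python sketch of the same logic using the standard `re` module. The function name and the sample text are hypothetical, and Python's regex engine is not identical to the RE2 engine BigQuery uses, so treat this as an approximation rather than an exact equivalent.

```python
import re
from typing import Optional

# Approximates the REGEXP_EXTRACT pattern in the query: from the first <html>
# to the last </html>, case-insensitive, with "." also matching newlines.
HTML_DOC = re.compile(r"<html>.*</html>", re.IGNORECASE | re.DOTALL)
# Approximates the REGEXP_REPLACE pattern: the literal text "[your name]" in any case.
SIGNATURE = re.compile(r"\[your name\]", re.IGNORECASE)


def clean_llm_email(llm_result: str,
                    sender: str = "Your friends at AI Megastore") -> Optional[str]:
    """Return the HTML part of a model response with the signature filled in.

    Returns None when no HTML document is found, similar to how
    REGEXP_EXTRACT yields NULL when there is no match.
    """
    match = HTML_DOC.search(llm_result)
    if match is None:
        return None
    # A function replacement avoids interpreting backslashes in `sender`.
    return SIGNATURE.sub(lambda _m: sender, match.group(0))


# Hypothetical model output with chatter before and after the HTML.
sample = (
    "Here is your email:\n"
    "<html><body>Hi Ana,\nBest,\n[Your Name]</body></html>\n"
    "Anything else?"
)
print(clean_llm_email(sample))
```

Because the pattern is greedy, any text between the first `<html>` and the last `</html>` is kept, which matches the behavior of the pattern in the query.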

Then, with the simple act of enabling BigQuery continuous query mode in the BigQuery Editor, we’ll configure this query to run persistently and process data as new events arrive.

Figure 3: Enabling Continuous Query Mode

Now when a shopping cart is abandoned, within seconds, the customer receives a personalized email like the example below, increasing the chances of recovering the sale. All of this is orchestrated using the simplicity and power of SQL within BigQuery continuous queries.

Figure 4: Resulting Email
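The demo uses an Application Integration trigger for the final step, but in principle any Pub/Sub subscriber could play that role. As a rough illustration, the sketch below decodes a message body shaped like the JSON that `TO_JSON_STRING` produces in the query (fields `customer_name`, `customer_email`, and `customer_message`) and builds an email with Python's standard library. It assumes the continuous query's output string arrives as the raw message data. The function name, sender address, and subject line are hypothetical, and actually sending the message (for example with `smtplib`) is left out.

```python
import json
from email.message import EmailMessage


def build_recovery_email(data: bytes,
                         sender: str = "offers@example.com") -> EmailMessage:
    """Turn one Pub/Sub message body into an HTML email (hypothetical helper)."""
    payload = json.loads(data.decode("utf-8"))
    msg = EmailMessage()
    msg["From"] = sender  # assumed sender address, not from the demo
    msg["To"] = payload["customer_email"]
    msg["Subject"] = f"{payload['customer_name']}, your cart is waiting"
    # customer_message holds the HTML generated by the model.
    msg.set_content(payload["customer_message"], subtype="html")
    return msg


# Hypothetical message body with the same field names as the query output.
example = json.dumps({
    "customer_name": "Ana",
    "customer_email": "ana@example.com",
    "customer_message": "<html><body>Hi Ana!</body></html>",
}).encode("utf-8")

message = build_recovery_email(example)
print(str(message["To"]), message.get_content_type())
```

Keeping the parsing separate from the delivery step makes it easy to test the message format without a live subscription.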

Empowering everybody

Replicating the results of a BigQuery continuous query into Pub/Sub unlocks very exciting possibilities for building event-driven data architectures — especially when you consider partner integrations. In fact, several Google Cloud ISV partners have already validated that their offerings support Pub/Sub messages generated from a continuous query, including (but not limited to) Aiven, Census, Confluent, Estuary, Hightouch, Keboola, Lytics, Nexla, Qlik, and Redpanda.

Figure 5: Supported Partners

“With the integration of Confluent Cloud and BigQuery continuous queries, we're empowering organizations to unlock the full potential of their data in real-time. This collaboration streamlines data pipelines, enhances decision-making, and enables the creation of innovative, data-driven applications, giving businesses a significant competitive edge.”  -  Mike Agnich, VP/GM of Data Streaming Platform, Confluent

For data scientists seeking a more familiar and interactive experience, BigQuery continuous queries integrates with BigQuery DataFrames. Now, you can harness the power of real-time data processing directly within your Python notebooks, for streamlined experimentation, rapid prototyping, and seamless integration of continuous queries into your existing machine learning workflows. Check out the Python streaming DataFrames notebook example to learn more!

Ready to get started?

In today's fast-paced world, businesses that rely solely on historical or even near-real-time data are at a distinct disadvantage. Real-time insights are no longer a luxury, but a necessity for making informed decisions, delivering exceptional customer experiences, and staying ahead of the competition. Whether you're a data scientist uncovering hidden patterns, a business analyst driving strategic initiatives, or a C-level executive charting the course for your organization, understanding the power of real-time data is paramount.

BigQuery continuous queries offers a transformative solution, empowering you to harness the full potential of your data as it flows in, all using the democratized language of SQL. The future of event-driven data analysis has arrived. Sign up for the public preview today and start exploring the possibilities that BigQuery continuous queries unlocks. You can also watch our Google Cloud Next 2024 session dedicated to BigQuery continuous queries.

The future is real-time, and it's powered by BigQuery!
