Storage Insights datasets: Enabling org-wide operational discovery with activity insights


As enterprise storage footprints scale to billions of objects, AI applications and agentic workloads are fundamentally shifting the role of storage from a passive repository to the foundation of the data platform. This shift is driven by a surge in unstructured data, such as model data, session logs, and audit trails, and by the billions of actions performed on those objects. To manage this and answer questions about cost, operations, and security, storage and platform admins need to go beyond knowing what data they have to understanding exactly how it is being accessed, moved, and modified.

To help, we're excited to announce activity insights within Storage Insights datasets. Now generally available, these new views provide visibility into the operational details of your Google Cloud Storage assets, enabling data-driven cost optimization and faster troubleshooting. For example, with activity insights, you can answer questions like:

Are my objects located in the right storage classes within my buckets?
Which regions is my bucket interacting with most, and is it optimally located?
Where are errors occurring across operations in my storage estate, and why?

Answering these questions confidently is the key to unlocking cost optimizations and reclaiming engineering time. Storage Insights datasets, a feature of Storage Intelligence for Cloud Storage, provides daily metadata and frequent activity insights (typically within four hours of the activity) so you have better visibility into your storage estate. While Storage Intelligence is a unified management product with capabilities like Bucket relocation, Batch operations and Gemini Cloud Assist, this blog focuses on how you can leverage Storage Insights datasets for operational optimization.

What are Storage Insights datasets?

Storage Insights datasets deliver an automated, query-ready BigQuery index of your entire storage estate, complete with raw metadata and activity insights, replacing manual, error-prone data collection. You can customize a dataset's scope: create it for your entire org, a specific folder, a project or a set of projects, or even specific buckets. The dataset is then refreshed regularly, giving you a comprehensive view of your storage.

From static metadata to live intelligence

Storage Insights datasets are your go-to tool for understanding your storage metadata. Acting as an inventory management tool, they scan object metadata (storage class, location, age, custom metadata) and organize it into a powerful, queryable BigQuery-linked dataset. This is crucial for knowing what data you have (learn more about how to optimize storage spend with Storage Insights datasets here).

But what if you also knew how and when that data is being used?

Storage Insights datasets now offers a set of new views that capture:

Object-level activity, including writes, updates, deletes, and errors
Bucket-level aggregate activity, including total object operations, a breakdown by type of operations, total errors and most active prefixes
Bucket-level regional traffic activity, including ingress and egress bytes per region that interact with your bucket
Project-level aggregate activity, including total object operations, a breakdown by type of operations and total errors

This data flows directly into new BigQuery views within your dataset so you can run analytics queries for specific insights, interact with the data via Gemini or simply connect it to powerful Looker dashboards for visualization.

This moves you from a static snapshot to a dynamic, queryable analysis of your data's entire lifecycle. It's the difference between knowing what's in your warehouse and knowing what’s used and when.

Three ways to use activity insights immediately

Here’s what you can do, starting today, with activity insights in Storage Insights datasets.

1. Right-size your storage estate

The challenge: You have terabytes of data in Standard or Nearline class storage that you believe is cold. But without proof, moving it to Coldline or Archive class is risky. What if a critical process still needs to read it once per quarter?
The solution: The new Storage Insights datasets views surface activity insights. You can now use them to identify buckets that have had minimal read/write activity over the last 30, 60, or 90 days.
The outcome: Apply or fine-tune lifecycle policies to transition this data to more cost-effective storage classes.
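Here is a minimal Python sketch of the logic behind such a check. It is not the product's query. It assumes you have already pulled daily bucket-level operation counts from the activity views into rows. The row type `BucketDailyActivity`, its fields, and the function `find_cold_buckets` are hypothetical names chosen for illustration. A completely idle bucket may produce no activity rows at all, so the sketch also takes the full list of bucket names (for example, from the metadata side of the dataset) so that those buckets are not missed.

```python
from collections import defaultdict
from datetime import date, timedelta
from typing import Iterable, NamedTuple, Optional


class BucketDailyActivity(NamedTuple):
    """Hypothetical row: one bucket's total object operations on one day."""
    bucket: str
    day: date
    total_ops: int


def find_cold_buckets(
    rows: Iterable[BucketDailyActivity],
    all_buckets: Iterable[str],
    today: date,
    window_days: int = 180,
    max_ops: int = 0,
) -> list[tuple[str, int, Optional[date]]]:
    """Return (bucket, ops_in_window, last_active_day) for every bucket with at
    most `max_ops` operations in the trailing window, quietest first."""
    window_start = today - timedelta(days=window_days)
    ops_in_window: dict[str, int] = defaultdict(int)
    last_active: dict[str, date] = {}
    for row in rows:
        # Track the most recent day with any activity, inside or outside the window.
        if row.total_ops > 0:
            previous = last_active.get(row.bucket)
            if previous is None or row.day > previous:
                last_active[row.bucket] = row.day
        if window_start <= row.day <= today:
            ops_in_window[row.bucket] += row.total_ops
    # Buckets with no activity rows at all still appear, with zero operations.
    candidates = [
        (bucket, ops_in_window.get(bucket, 0), last_active.get(bucket))
        for bucket in all_buckets
        if ops_in_window.get(bucket, 0) <= max_ops
    ]
    # Fewest operations first; ties go to never-active buckets (None), then to
    # buckets whose last activity is oldest, then alphabetically.
    candidates.sort(key=lambda c: (c[1], c[2] or date.min, c[0]))
    return candidates
```

For example, `find_cold_buckets(rows, bucket_names, today=date.today())` returns buckets with no recorded operations in roughly the last six months (180 days). Raising `max_ops` widens the net to buckets with only a trickle of activity, which is closer to the "minimal activity" check described above.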

For example, here’s a SQL query to order all the buckets in your estate with little to no activity in the last six months:

2. Architect for global performance with data-driven bucket placement

The challenge: Your team set up a multi-region bucket to serve a global application. But a year later, is that still the right architecture? What if 99% of your traffic is now coming from a single region?
The solution: Analyze the access patterns in your new bucket_region_activity_view view. You can easily pinpoint which regions are driving read and write activity for the bucket.
The outcome: Make data-driven decisions to co-locate your bucket with your compute. You might find that changing a multi-region bucket to a single-region one (or vice versa) can lead to significant cost savings and even improve performance.
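To make the placement decision concrete, the following minimal Python sketch ranks the regions that exchange traffic with one bucket. It reports each region's share of total ingress plus egress bytes. It assumes rows shaped like the per-region ingress and egress bytes described for `bucket_region_activity_view`. The `RegionTraffic` type, its field names, the helper functions, and the 90% threshold are all illustrative assumptions, not the view's actual schema.

```python
from typing import Iterable, NamedTuple, Optional


class RegionTraffic(NamedTuple):
    """Hypothetical row: bytes exchanged between one bucket and one region."""
    bucket: str
    region: str
    ingress_bytes: int
    egress_bytes: int


def regional_breakdown(
    rows: Iterable[RegionTraffic], bucket: str
) -> list[tuple[str, int, float]]:
    """Return (region, total_bytes, share) for `bucket`, busiest region first."""
    totals: dict[str, int] = {}
    for row in rows:
        if row.bucket == bucket:
            totals[row.region] = (
                totals.get(row.region, 0) + row.ingress_bytes + row.egress_bytes
            )
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []  # no recorded traffic for this bucket
    breakdown = [(region, b, b / grand_total) for region, b in totals.items()]
    breakdown.sort(key=lambda t: (-t[1], t[0]))
    return breakdown


def dominant_region(
    breakdown: list[tuple[str, int, float]], threshold: float = 0.9
) -> Optional[str]:
    """Return the busiest region if it carries at least `threshold` of traffic."""
    if breakdown and breakdown[0][2] >= threshold:
        return breakdown[0][0]
    return None
```

A single region may carry nearly all of a multi-region bucket's traffic. That is a signal, not proof, that a regional bucket co-located with that compute may be a better fit. Latency, availability, and cost requirements still need to be weighed.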

For example, here’s a SQL query to break down the egress and ingress traffic pattern for a bucket across regions:

Shipt, a retail technology platform and same-day delivery service, has been using Storage Intelligence capabilities to inform their data location decisions:

“Storage Intelligence enables us to efficiently manage over 2 billion objects, delivering cost and performance optimization. With Insights datasets, we detected and analyzed egress charges from multi-region buckets, identifying opportunities to improve efficiency by co-locating compute and storage. By leveraging the Bucket Relocate capability, we seamlessly moved 1.3 Petabytes of data from multi-region to regional storage, achieving substantial cost savings while maintaining uninterrupted application performance and data pipeline continuity.” - Ron Cuirle, Director of Engineering - Cloud Platforms, Shipt

3. Demystify and resolve operational hotspots

The challenge: Your team sees a spike in 429 (Too Many Requests) errors. In a massive environment, this is rarely just a performance hiccup; it’s expensive! These errors trigger automatic retries, which often lead to a cycle of high-frequency, billable operations that drive up your Class A costs. Pinpointing exactly which object or prefix is causing this can be a time-consuming troubleshooting nightmare.
The solution: The new Storage Insights datasets views provide granular details on these errors, right in BigQuery. You can query for 429 errors and see exactly which objects and prefixes are under pressure.
The outcome: Pinpoint the cause of your 429 errors and move your team from troubleshooting to resolution.
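Here is a minimal Python sketch of the kind of analysis described here. It groups throttled (HTTP 429) requests by bucket and object prefix and reports what fraction of each prefix's requests were throttled. It also names the operation type most often throttled there, which hints at the cause. The `ObjectEvent` row type, its field names, and the prefix depth are hypothetical. Adapt them to the columns your object-level activity view actually exposes.

```python
from collections import Counter, defaultdict
from typing import Iterable, NamedTuple


class ObjectEvent(NamedTuple):
    """Hypothetical object-level activity row; field names are assumptions."""
    bucket: str
    object_name: str
    operation: str    # e.g. "write", "update", "delete"
    status_code: int  # HTTP status returned for the request


def prefix_of(object_name: str, depth: int = 1) -> str:
    """Leading `depth` folder segments, e.g. "logs/2024/a.json" -> "logs/" at depth 1."""
    folders = object_name.split("/")[:-1][:depth]
    return "/".join(folders) + "/" if folders else ""


def throttling_hotspots(
    events: Iterable[ObjectEvent], depth: int = 1, top_n: int = 5
) -> list[tuple[str, str, int, float, str]]:
    """Rank (bucket, prefix) pairs by number of 429 responses.

    Each result is (bucket, prefix, count_429, rate_429, top_operation), where
    rate_429 is the fraction of that prefix's requests that were throttled and
    top_operation is the operation type most often throttled there.
    """
    requests: Counter = Counter()
    throttled: Counter = Counter()
    throttled_ops: dict = defaultdict(Counter)
    for e in events:
        key = (e.bucket, prefix_of(e.object_name, depth))
        requests[key] += 1
        if e.status_code == 429:
            throttled[key] += 1
            throttled_ops[key][e.operation] += 1
    # Most-throttled prefixes first; ties broken by (bucket, prefix).
    ranked = sorted(throttled.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
    return [
        (bucket, prefix, n, n / requests[(bucket, prefix)],
         throttled_ops[(bucket, prefix)].most_common(1)[0][0])
        for (bucket, prefix), n in ranked
    ]
```

A high throttle rate concentrated on one prefix and one operation type, such as repeated updates under a single folder, narrows the search far faster than scanning raw error logs.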

For example, here’s a SQL query to analyze 429s occurring across your estate, where they are happening and why:

Getting started

As your organization grows with Google Cloud, the scale of your data will only increase. Stop relying on stale data and start optimizing your organization’s storage estate. Storage Insights datasets for Cloud Storage, with activity insights, turn massive data estates from complex operational challenges into clearly understood, highly optimized assets.

To get started, use our pre-configured Looker Studio template here to connect to your dataset for quick analysis and value:

For example, view the trend for Total Reads on your bucket over time:

Or, analyze the ingress and egress traffic patterns for your bucket:

