When an incident disrupts a cloud service, it’s critical to understand the cause and impact, so you can chart a course of action and mount an effective response. In August 2023, we introduced Personalized Service Health, offering fast, transparent, and actionable communication about Google Cloud disruptions to help you respond more effectively to incidents.
Today, we are excited to announce the general availability of Personalized Service Health for 50+ Google Cloud products and services including Compute Engine, Cloud Storage, all Cloud Networking offerings, BigQuery, and Google Kubernetes Engine.
Personalized Service Health is enabled and managed per Google Cloud project. It is aware of the Google Cloud services in use in the selected project and uses that information to determine which incidents are most relevant. You can enable it for individual projects or for every project across your organization.
Once enabled, Personalized Service Health begins processing and publishing relevant incidents to your Service Health dashboard in the Google Cloud console. In the dashboard, you can view active disruptions, evaluate the impact to your project, and track updates.
How to use Personalized Service Health
When faced with a service degradation, Personalized Service Health should be the first step in your incident response journey. It's the go-to Google Cloud destination to check on an emerging or active disruption, providing a variety of integration options (e.g., logs or alerts) to simplify your incident management workflow.
While Personalized Service Health offers the most extensive coverage of incidents relevant to you, we recommend that you use the public status dashboard, Google Cloud Service Health, as a backup, where we post about large incidents that affect a broad set of customers.
Discover incidents through proactive alerts
Personalized Service Health emits logs and can send alerts to a variety of destinations when a Google Cloud service disruption is posted or updated. You can choose which of these events you would like to be alerted on and customize the alert content to include critical information about the incident, such as the affected Google services and locations, current relevance to your project, observable symptoms, and known mitigations.
This journey starts with setting up an alert. From there, you can configure the alert to be sent to one or more destinations, including email, SMS, Pub/Sub, webhook, and PagerDuty, or configure custom conditions to filter the incidents you wish to be alerted on. Alerts can be created directly from Personalized Service Health, in Cloud Monitoring, or via Terraform.
Control which service disruptions are relevant to you
Personalized Service Health is designed to communicate incidents that may affect you and are relevant to your projects. It provides multiple interaction points: dashboard, API, logs, and alerts. Each interaction point offers configurable filters to help you narrow down the set of incidents you wish to track or be alerted on. For example, you may want to receive alerts for a specific Google Cloud service or region, or for an incident that you’ve confirmed is impacting your project. To achieve this, you can define filters when you view incidents in our dashboard or use an example alert policy when creating alerts.
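The kind of narrowing described above (by service, by region, or by confirmed impact) can be illustrated with a small client-side sketch in Python. This is not how the dashboard or alert policies implement filtering. The field names (`eventImpacts`, `productName`, `locationName`, `relevance`) and the relevance values are assumptions modeled on the Service Health API's event resource, the sample incidents are made up, and treating `UNKNOWN` as the lowest rank is a choice made for this example. Check all of these against the API reference before relying on them.

```python
from typing import Iterable, List, Optional, Set

# Relevance values ordered from least to most relevant.
# ASSUMPTION: names modeled on the Service Health API; verify before use.
RELEVANCE_ORDER = ["UNKNOWN", "NOT_IMPACTED", "PARTIALLY_RELATED", "RELATED", "IMPACTED"]

# Hypothetical incidents, shaped like the assumed event resource.
SAMPLE_EVENTS = [
    {
        "title": "Elevated VM creation errors",
        "relevance": "IMPACTED",
        "eventImpacts": [
            {"product": {"productName": "Compute Engine"},
             "location": {"locationName": "us-central1"}},
        ],
    },
    {
        "title": "Increased query latency",
        "relevance": "PARTIALLY_RELATED",
        "eventImpacts": [
            {"product": {"productName": "BigQuery"},
             "location": {"locationName": "europe-west1"}},
        ],
    },
    {
        "title": "Disk attach delays",
        "relevance": "NOT_IMPACTED",
        "eventImpacts": [
            {"product": {"productName": "Compute Engine"},
             "location": {"locationName": "asia-east1"}},
        ],
    },
]


def select_incidents(
    events: Iterable[dict],
    products: Optional[Set[str]] = None,
    locations: Optional[Set[str]] = None,
    min_relevance: str = "UNKNOWN",
) -> List[dict]:
    """Return the events that satisfy every filter that was supplied."""
    threshold = RELEVANCE_ORDER.index(min_relevance)
    selected = []
    for event in events:
        impacts = event.get("eventImpacts", [])
        event_products = {i.get("product", {}).get("productName") for i in impacts}
        event_locations = {i.get("location", {}).get("locationName") for i in impacts}
        relevance = event.get("relevance", "UNKNOWN")
        # Unrecognized relevance values are ranked lowest.
        rank = RELEVANCE_ORDER.index(relevance) if relevance in RELEVANCE_ORDER else 0

        if products and not (products & event_products):
            continue  # no impacted product matches the requested set
        if locations and not (locations & event_locations):
            continue  # no impacted location matches the requested set
        if rank < threshold:
            continue  # below the requested relevance level
        selected.append(event)
    return selected


if __name__ == "__main__":
    # Only Compute Engine incidents that are at least RELATED to the project.
    for incident in select_incidents(
        SAMPLE_EVENTS, products={"Compute Engine"}, min_relevance="RELATED"
    ):
        print(incident["title"])
```

Filters combine with AND, so passing both a product set and a location set keeps only incidents that match both. Omitting a filter leaves that dimension unrestricted.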
Integrate with your incident management workflow
Personalized Service Health offers multiple integration options with your preferred incident management tools and workflows. For example, you can integrate alerts with PagerDuty to alert the appropriate incident responders when a service disruption begins, or use the Service Health API to integrate with an incident response dashboard.
The Service Health API offers programmatic access to all incidents relevant to a specific project or to all projects across your organization, including updates from Google Cloud and a description of impact. You can request incidents using the Service Health API and use the output of that request in your incident management workflow.
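As a rough illustration of requesting incidents for one project, the Python sketch below builds a REST call using only the standard library. The endpoint root, the `projects/{project}/locations/global/events` path, the `filter` query parameter with its `state=ACTIVE` syntax, and the `events` key in the response are assumptions based on common Google Cloud REST conventions, not details taken from this post. The helper names are hypothetical, and obtaining the OAuth access token is left out. Confirm the details in the Service Health API reference.

```python
import json
import urllib.parse
import urllib.request

# ASSUMPTION: REST root for the Service Health API; verify in the API reference.
API_ROOT = "https://servicehealth.googleapis.com/v1"


def build_list_events_request(project_id, access_token, location="global", event_filter=None):
    """Build (but do not send) a GET request that lists events for one project."""
    parent = f"projects/{project_id}/locations/{location}"
    url = f"{API_ROOT}/{parent}/events"
    if event_filter:
        # URL-encode the filter expression so it is safe in a query string.
        url += "?" + urllib.parse.urlencode({"filter": event_filter})
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"Bearer {access_token}",
            "Accept": "application/json",
        },
    )


def fetch_events(request):
    """Send the request and return the list under the assumed 'events' key.

    Pagination is ignored in this sketch; a real client would follow any
    page token returned by the API.
    """
    with urllib.request.urlopen(request) as response:
        return json.load(response).get("events", [])


if __name__ == "__main__":
    # "my-project" and the token value are placeholders.
    req = build_list_events_request(
        "my-project", "ACCESS_TOKEN", event_filter="state=ACTIVE"
    )
    print(req.get_method(), req.full_url)
```

Keeping request construction separate from the network call makes it easy to inspect or log the exact URL before wiring the output into an incident response dashboard.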
From our customers
“In a large organization with critical applications, it is essential to quickly and easily identify incidents that are impacting applications. Deployments can be complex and having insight into Cloud service disruptions relevant to our organization can lead to a quicker recovery. We are excited to see Google Cloud offering fast, personalized visibility of service disruptions and have enabled Personalized Service Health (PSH) in our Google Cloud Platform tenants.”
- T.J. Brandon, Google Cloud Operations Supervisor, Ford
“The instinct for cloud providers is to be overly cautious about sharing outages too quickly. I’d rather proactively move a workload and learn there was no issue than the workload go down unknowingly. We’re happy to see Google Cloud make this step to be more transparent and look forward to leveraging PSH.”
- Justin Watts, Director of Information Services & Technology Strategy, Telus
“We are currently adopting Personalized Service Health to support our collaborative major Incident handling program with Google Cloud. It adds an important asset to our partnership in the context of outage responses. The ability to determine the impact of impaired Google Cloud services in real-time will help us to manage the situations more effectively.”
- Steffen Germersdorf, Head of Global Cloud Service Risk- & Incident De-Escalation, SAP SE
Ready to start using Personalized Service Health? Get started today!