The story of Google Nest’s migration to Cloud SQL on Google Cloud

Editor’s note: Google Nest — a leader in smart home products — migrated its legacy services to Google Cloud with minimal downtime, no outages, and no lost or corrupted data. Here’s how the team reduced infrastructure costs, improved performance, and increased system reliability.


Google Nest makes a variety of smart home products, including smart speakers, smart doorbells, and smart thermostats. Our products work beautifully together to help customers stay informed, feel safer in their homes, and be more connected to friends and family.

A key legacy system struggles with cost, complexity, and risk

When Google acquired Nest Labs in 2014, we also inherited the smart home company's technology infrastructure. This included critical subscription services deployed on AWS, whose databases ran on Amazon EC2 instances.

Maintaining the infrastructure behind these subscriptions was a big job. On the application side, our team managed many different services, including an SAP Hybris instance used for subscription management. On the data side, the legacy infrastructure held terabytes of subscription data across multiple MySQL databases, all running on Amazon EC2. Few of these workloads used managed services, which meant that our own site reliability engineers (SREs) had the tough job of monitoring and managing these systems around the clock.

Wanted: a sustainable future for MySQL workloads

We knew our approach to running this legacy infrastructure wasn’t sustainable. We wanted a cloud-based architecture that would overcome the limitations of our legacy MySQL environment. First and foremost, that meant using managed services to reduce maintenance overhead. We also wanted true horizontal scalability with rapid elasticity, tangible gains in availability and resiliency, and robust monitoring tools to show exactly where we stood on these and other post-migration KPIs.

Picking a winner: Cloud SQL stands out

Cloud SQL was an ideal solution for managing our legacy subscription data. As a secure, fully managed and automated service, Cloud SQL let us hand off database management, cut our operational costs, and maintain the reliability and performance our critical subscription services need. Cloud SQL also integrates seamlessly with other Google Cloud services, including a wide range of monitoring and analytics tools, and it gave us a simple, near-zero-downtime migration experience.

Once we had decided on our target state, we broke the migration into two main parts: first, we migrated the subscription services running on this infrastructure to Google Kubernetes Engine (GKE); then we moved the legacy MySQL databases onto Cloud SQL for MySQL. We looked for tools that would help us avoid downtime and outages, minimize the risk of lost or corrupted data, and validate the migrated data. Two capabilities made a big difference during the migration:

  1. The Database Migration Service (DMS) dramatically reduced the operational burden we had expected a database migration to carry. This fully managed service handled almost every aspect of the migration, from provisioning servers to automated monitoring to seamless auto-scaling. DMS also uses a change data capture (CDC) model: after an initial full copy, it continuously replicates ongoing changes from the source, which keeps the load on system resources much lower and the replication delay close to zero.
  2. The Data Validation Tool (DVT) validated terabytes of migrated subscription data, helping us keep total migration downtime under one hour.
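To make the CDC model and the validation step concrete, here is a minimal in-memory Python sketch. It is not how DMS or DVT are implemented: the `Table` and `ChangeEvent` types and every function name are hypothetical, and real tools typically read the MySQL binary log and push validation queries down to each database. It only shows the two ideas: a full snapshot followed by replay of changes captured after the snapshot, then a comparison of row counts plus a checksum on source and target.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical table model: primary key -> tuple of column values.
Table = Dict[int, Tuple[str, ...]]


@dataclass
class ChangeEvent:
    """One change captured from the source, such as a decoded binlog entry."""
    op: str  # "insert", "update", or "delete"
    pk: int
    row: Optional[Tuple[str, ...]] = None  # unused for deletes


def initial_snapshot(source: Table) -> Table:
    """Full-load phase: copy every row that exists at snapshot time."""
    return dict(source)


def apply_changes(table: Table, events: List[ChangeEvent]) -> None:
    """CDC phase: replay captured changes, in commit order, onto a table."""
    for ev in events:
        if ev.op == "delete":
            table.pop(ev.pk, None)
        else:
            # Inserts and updates are both applied as upserts by primary key.
            assert ev.row is not None
            table[ev.pk] = ev.row


def table_digest(table: Table) -> Tuple[int, str]:
    """Row count plus a SHA-256 checksum computed in primary-key order."""
    h = hashlib.sha256()
    for pk in sorted(table):
        h.update(repr((pk, table[pk])).encode())
    return len(table), h.hexdigest()


# Demo with made-up subscription rows.
source: Table = {1: ("alice", "monthly"), 2: ("bob", "annual")}
target = initial_snapshot(source)

# Writes keep arriving at the source after the snapshot is taken.
log = [
    ChangeEvent("insert", 3, ("carol", "monthly")),
    ChangeEvent("update", 1, ("alice", "annual")),
    ChangeEvent("delete", 2),
]
apply_changes(source, log)
digest_before_replay = table_digest(target)  # target is now stale

apply_changes(target, log)  # continuous replication catches the target up
assert table_digest(source) == table_digest(target)
```

In the demo, the stale target and the updated source both have two rows, so a row count alone would not catch the drift; the checksum does. That is one reason validation tools typically compare aggregates or row hashes in addition to counts.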

When we started migrating our legacy subscription infrastructure, we knew it was important to move quickly. But we also knew that choosing the wrong migration approach for an environment with multiple MySQL instances and terabytes of data could leave us stuck with the same cost, complexity, and risk. Working with the Cloud SQL team helped us complete a swift, successful migration away from AWS by:

  • Leveraging a continuous migration process and secure connectivity to all but eliminate the performance, reliability, and data integrity risks often associated with migrations of this size
  • Ensuring no lost or corrupted user data
  • Setting up our legacy infrastructure to deliver consistent, long-term cost savings
  • Setting the standard for future migration projects
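A continuous migration keeps downtime short because the destination keeps replicating while the source stays live, and writes are paused only for the final switchover. The sketch below shows one way to gate that switchover, assuming replication lag can be polled periodically. The function name, the 1-second threshold, and the three-poll stability rule are hypothetical choices for illustration, not DMS settings.

```python
from typing import Iterable, Optional


def ready_for_cutover(lag_seconds: Iterable[float],
                      max_lag: float = 1.0,
                      stable_polls: int = 3) -> Optional[int]:
    """Return the index of the first poll at which replication lag has stayed
    at or below `max_lag` seconds for `stable_polls` consecutive readings,
    or None if that never happens. Thresholds are illustrative assumptions."""
    streak = 0
    for i, lag in enumerate(lag_seconds):
        # Any spike above the threshold resets the streak.
        streak = streak + 1 if lag <= max_lag else 0
        if streak >= stable_polls:
            return i
    return None


# Made-up lag readings (seconds) as the replica catches up after the full load.
samples = [30.0, 12.0, 4.0, 0.8, 0.5, 2.0, 0.4, 0.3, 0.2]
print(ready_for_cutover(samples))  # 8
# From here, a cutover would typically: stop writes to the source, wait for
# lag to reach zero, validate, promote the destination, and repoint clients.
# Only those final steps fall inside the downtime window.
```

Requiring several consecutive low readings, rather than a single one, avoids starting the write freeze during a brief lull that is followed by another burst of changes, as at index 5 in the sample data.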

The big payback: Cloud SQL delivers on its potential

Cloud SQL continues to meet and exceed our expectations across the board. For example, moving our legacy databases to a fully managed cloud environment freed up at least 10% of an SRE’s time, which shifted from low-value maintenance and administrative tasks to more strategic projects. As expected, Cloud SQL also cut our operational costs by all but eliminating the time spent managing infrastructure, handling backups and replication, adding storage capacity, and performing other day-to-day tasks. Finally, unifying our legacy subscription infrastructure on Google Cloud has already unlocked significant performance gains, including a 25% improvement in our key p50 (median) latency metric.
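p50 is the median latency: half of requests finish at or below that value. Assuming a "25% improvement" means the median dropped by a quarter, the calculation is (before - after) / before. The sketch below uses invented sample latencies, not Nest's measurements, and the function names are hypothetical.

```python
import statistics
from typing import Sequence


def p50(latencies_ms: Sequence[float]) -> float:
    """Median latency: half of the requests completed at or below this value.
    statistics.median averages the two middle values for even-length input."""
    return statistics.median(latencies_ms)


def relative_improvement(before: float, after: float) -> float:
    """Fractional reduction in latency; 0.25 means 25% lower."""
    return (before - after) / before


# Made-up request latencies in milliseconds.
before_ms = [100, 110, 120, 130, 140]
after_ms = [80, 85, 90, 95, 100]
print(p50(before_ms), p50(after_ms))  # 120 90
print(relative_improvement(p50(before_ms), p50(after_ms)))  # 0.25
```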

Our experience working with Cloud SQL is sure to be valuable as we plan future migrations, and we’re looking forward to our next opportunity to take advantage of everything Cloud SQL has to offer.
