Dynamic Workload Scheduler: Optimizing resource access and economics for AI/ML workloads


We are in the midst of an exciting era of AI-driven innovation and transformation. Today we announced AI Hypercomputer, a groundbreaking architecture that employs an integrated system of AI-optimized hardware, software, and consumption models. With AI Hypercomputer, enterprises everywhere can run on the same cutting-edge infrastructure that is already the backbone of Google’s internal AI/ML research, development, training, and serving.

But the overwhelming demand for TPUs and NVIDIA GPUs makes effective resource management more crucial than ever.

To address this, today we are excited to announce Dynamic Workload Scheduler, a new, simple, and powerful approach to get access to GPUs and TPUs. This blog is for technical audiences to deep-dive into what it is, how it works, and how you can use it today.

What is Dynamic Workload Scheduler?

Dynamic Workload Scheduler is a resource management and job scheduling platform designed for AI Hypercomputer. Dynamic Workload Scheduler improves your access to AI/ML resources, helps you optimize your spend, and can improve the experience of workloads such as training and fine-tuning jobs by scheduling all the accelerators needed simultaneously. Dynamic Workload Scheduler supports TPUs and NVIDIA GPUs, and brings scheduling advancements from Google's ML fleet to Google Cloud customers. Dynamic Workload Scheduler is also integrated with many of your preferred Google Cloud AI/ML services: Compute Engine Managed Instance Groups, Google Kubernetes Engine, Vertex AI, and Batch, with more planned.

Two modes: Flex Start and Calendar

Dynamic Workload Scheduler introduces two modes: Flex Start mode for enhanced obtainability and optimized economics, and Calendar mode for high predictability on job start times.

1. Flex Start mode: Efficient GPU and TPU access with better economics

Flex Start mode is designed for fine-tuning models, experimentation, shorter training jobs, distillation, offline inference, and batch jobs. With Flex Start mode, you can request GPU and TPU capacity as your jobs are ready to run.

With Dynamic Workload Scheduler in Flex Start mode, you submit a capacity request for your AI/ML jobs by indicating how many accelerators you need, a duration, and your preferred region. Dynamic Workload Scheduler intelligently persists the request; once the capacity becomes available, it automatically provisions your VMs, enabling your workloads to run continuously for the entire duration of the capacity allocation. Dynamic Workload Scheduler supports capacity requests for up to seven days, with no minimum duration requirement. You can request capacity for as little as a few minutes or hours; typically, the scheduler can fulfill shorter requests more quickly than longer ones.
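
On GKE, a Flex Start capacity request is ultimately expressed as a ProvisioningRequest object (the API referenced in the "Get started" section below); in the Kueue flow described below, that object is created for you automatically. The sketch here is illustrative only: the apiVersion, the queued-provisioning.gke.io class name, the maxRunDurationSeconds parameter, and all resource names are assumptions rather than a definitive example, so check the current GKE documentation before relying on them.

# Illustrative sketch: a PodTemplate describing the pods the capacity is for, plus a
# ProvisioningRequest asking for all of that capacity as one atomic unit.
apiVersion: v1
kind: PodTemplate
metadata:
  name: gpu-pod-template                              # hypothetical name
  namespace: default
template:
  spec:
    nodeSelector:
      cloud.google.com/gke-nodepool: NODEPOOL_NAME    # the queued-provisioning node pool from Step 1 below
    containers:
    - name: trainer
      image: IMAGE                                    # your training image
      resources:
        requests:
          nvidia.com/gpu: 8
        limits:
          nvidia.com/gpu: 8
    restartPolicy: Never
---
apiVersion: autoscaling.x-k8s.io/v1beta1              # assumed API group/version
kind: ProvisioningRequest
metadata:
  name: flex-start-request                            # hypothetical name
  namespace: default
spec:
  provisioningClassName: queued-provisioning.gke.io   # Dynamic Workload Scheduler class on GKE
  parameters:
    maxRunDurationSeconds: "86400"                    # "a duration": run for up to 24 hours
  podSets:
  - count: 4                                          # "how many": 4 pods x 8 GPUs, provisioned together
    podTemplateRef:
      name: gpu-pod-template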

If your training job finishes early, you can simply terminate the VMs to free up the resources and only pay for what your workload actually consumed. You no longer need to hold onto idle resources just to use them later.
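
For example, if you provisioned standalone VMs, releasing them early is an ordinary instance deletion (the instance names and zone below are placeholders); with Managed Instance Groups or GKE you would instead scale down or let the orchestrator reclaim the nodes.

# Hypothetical instance names and zone; --quiet skips the confirmation prompt.
gcloud compute instances delete dws-vm-0 dws-vm-1 \
    --zone=us-central1-a \
    --quiet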

If you’re using GKE node pools for your AI/ML workloads, an easy way to use Dynamic Workload Scheduler is through orchestrators such as Kueue. Popular ML frameworks such as Ray, Kubeflow, Flux, PyTorch and other training operators are supported out of the box. Here are the steps to enable this:

Step 1: Create a node pool with the “enable-queued-provisioning” option enabled.

gcloud beta container node-pools create NODEPOOL_NAME \
    --cluster=CLUSTER_NAME \
    --region=CLUSTER_REGION \
    --enable-queued-provisioning \
    --accelerator type=GPU_TYPE,count=AMOUNT,gpu-driver-version=DRIVER_VERSION \
    --enable-autoscaling \
    --num-nodes=0 \
    --total-max-nodes=NODES_MAX \
    --reservation-affinity=none \
    --no-enable-autorepair \
    --impersonate-service-account SERVICE_ACCOUNT

Step 2: When you create your GKE Job, label it to indicate that Kueue and Dynamic Workload Scheduler should run it. Kueue does the rest: it automatically creates a capacity request and handles the job start orchestration.

apiVersion: batch/v1
kind: Job
metadata:
  name: sample-job
  namespace: default
  labels:
    kueue.x-k8s.io/queue-name: dws-local-queue
spec:
  parallelism: 1
  completions: 1
  suspend: true
  template:
  . . . . .
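
For completeness, the dws-local-queue referenced in the Job above has to exist before the Job is submitted. Below is a minimal sketch of the Kueue objects involved; the queue names and quota values are illustrative, and the AdmissionCheck and ProvisioningRequestConfig wiring is a sketch of the Kueue v1beta1 API, so verify the exact fields against the Kueue and GKE documentation you are using.

# Illustrative sketch: names and quotas are placeholders.
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: default-flavor
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ProvisioningRequestConfig
metadata:
  name: dws-config
spec:
  provisioningClassName: queued-provisioning.gke.io   # Dynamic Workload Scheduler class on GKE
  managedResources:
  - nvidia.com/gpu
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: AdmissionCheck
metadata:
  name: dws-prov
spec:
  controllerName: kueue.x-k8s.io/provisioning-request
  parameters:
    apiGroup: kueue.x-k8s.io
    kind: ProvisioningRequestConfig
    name: dws-config
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: dws-cluster-queue
spec:
  namespaceSelector: {}          # admit workloads from all namespaces
  resourceGroups:
  - coveredResources: ["cpu", "memory", "nvidia.com/gpu"]
    flavors:
    - name: default-flavor
      resources:
      - name: "cpu"
        nominalQuota: 10000
      - name: "memory"
        nominalQuota: 10000Gi
      - name: "nvidia.com/gpu"
        nominalQuota: 10000
  admissionChecks:
  - dws-prov
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: dws-local-queue          # the queue name used in the Job label above
  namespace: default
spec:
  clusterQueue: dws-cluster-queue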

2. Calendar mode: Reserved start times for your AI workloads [Preview Q1 '24]

Calendar mode caters to training and experimentation workloads that demand precise start times and have a defined duration. This mode extends the future reservation capabilities announced back in September.

With Calendar mode, you will be able to request GPU capacity in fixed-duration capacity blocks. Calendar mode will initially support future reservations with durations of 7 or 14 days, purchasable up to 8 weeks in advance. Your reservation will be confirmed based on availability, and the capacity will be delivered to your project on your requested start date. Your VMs can then target this reservation to consume the capacity block. At the end of the defined duration, the VMs will be terminated and the reservation will be deleted.

Step 1: Create a calendar mode future reservation.

gcloud compute future-reservations create my-calendarblock \
    --zone=us-central1-a \
    --machine-type=a3-highgpu-8g \
    --vm_count=VM_COUNT \
    --start-time=START_TIME \
    --end-time=END_TIME \
    --auto-delete=true \
    --specific-reservation-required=true

Step 2: Once the capacity is delivered on your requested start date, run VMs that target the specific reservation by setting the reservation affinity via the Compute Engine, Managed Instance Groups, or GKE APIs available today.
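
For example, with the Compute Engine API this might look like the following; the instance name is a placeholder, the reservation name assumes the reservation delivered to your project carries the name requested in Step 1, and flags such as the boot image are omitted for brevity.

# Hypothetical instance name; target the delivered reservation explicitly.
gcloud compute instances create my-training-vm \
    --zone=us-central1-a \
    --machine-type=a3-highgpu-8g \
    --reservation-affinity=specific \
    --reservation=my-calendarblock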

What customers are saying about Dynamic Workload Scheduler

Dynamic Workload Scheduler is built on Google Borg technology, which is responsible for real-time scheduling of millions of jobs on the Google ML Fleet, including one of the largest distributed LLM training jobs in the world (as of November 2023). With Flex Start and Calendar modes, Dynamic Workload Scheduler can provide you with more flexibility, improved access to GPUs and TPUs, better resource utilization, and lower costs. Customers and partners are already seeing the benefits of Dynamic Workload Scheduler.

Here is what Linum AI, a text-to-video generative AI company, had to say:

“The new Dynamic Workload Scheduler scheduling capabilities have been a game-changer in procuring sufficient GPU capacity for our training runs. We didn’t have to worry about wasting money on idle GPUs while refreshing the page hoping for sufficient compute resources to become available.” - Sahil Chopra, Co-Founder & CEO, Linum AI

sudoAI, a 3D generative AI company, trained its latest generative model using APIs enabled by Dynamic Workload Scheduler.

“We really like the convenience of finding capacity each time we need it without needing to worry about it. It enabled us to test new ideas, iterate, and also run longer training runs. We were able to fully train our latest 3D Gen AI model using the new Dynamic Workload Scheduler functionality and meet our internal deadlines to launch.” - Robin Han, Co-Founder and CEO, sudoAI

Get started today

Get started today with Dynamic Workload Scheduler-enabled APIs. Learn how to deploy GPUs using the Google Kubernetes Engine ProvisioningRequest API and Kueue. For TPUs, you can use Google Kubernetes Engine or request capacity directly through Queued Resources.
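
For example, a TPU capacity request through Queued Resources looks roughly like this; the identifiers below are placeholders, and the accelerator types and runtime versions available to you are listed in the Cloud TPU documentation.

# Placeholders throughout; see the Cloud TPU queued resources docs for valid values.
gcloud compute tpus queued-resources create QUEUED_RESOURCE_ID \
    --node-id=NODE_ID \
    --zone=ZONE \
    --accelerator-type=ACCELERATOR_TYPE \
    --runtime-version=RUNTIME_VERSION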
