Hibernating EC2 Instances in Response to a CloudWatch Alarm

This blog post is written by Jose Guay, Technical Account Manager, Enterprise Support.

A common way to reduce the cost of running Amazon Elastic Compute Cloud (Amazon EC2) instances is to stop them when they are idle. However, there are scenarios where stopping an idle instance is not practical. For example, instances hosting development environments that take time to prepare and run benefit from not repeating that process every day. For these instances, hibernation is a better alternative.

This blog post explores a solution that will find idle instances using an Amazon CloudWatch alarm that monitors the instance’s CPU usage. When the CPU usage consistently drops below the alarm’s threshold, the alarm enters the ALARM state and raises an event used to identify the instance and trigger hibernation.

With this solution, a hibernated instance no longer incurs compute costs and only accrues storage costs for its Amazon Elastic Block Store (Amazon EBS) volumes.

Overview

To hibernate an EC2 instance, there are prerequisites and required preparation. The instance must be configured to hibernate, and this is done when first launching it. This configuration cannot be changed after launching the instance.
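
As a minimal sketch of that launch-time setting, the following Python builds the arguments for the EC2 RunInstances call with hibernation enabled. The helper names, the AMI ID passed in, the default instance type, and the root device name are placeholders chosen for illustration. The sketch assumes an AMI and instance type that support hibernation and a root volume large enough to hold the instance memory. It marks the root volume as encrypted because hibernation requires an encrypted root EBS volume.

```python
def build_launch_args(ami_id, instance_type="t3.medium", root_device="/dev/xvda"):
    """Return keyword arguments for ec2.run_instances with hibernation enabled."""
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,  # assumed type, adjust as needed
        "MinCount": 1,
        "MaxCount": 1,
        # Hibernation can only be enabled at launch.
        "HibernationOptions": {"Configured": True},
        # Memory contents are saved to the root volume, which must be encrypted.
        # The device name depends on the AMI; "/dev/xvda" is an assumption.
        "BlockDeviceMappings": [
            {"DeviceName": root_device, "Ebs": {"Encrypted": True}}
        ],
    }


def launch_hibernation_ready_instance(ami_id, region="us-east-1"):
    """Launch one instance; requires boto3 and AWS credentials."""
    import boto3  # imported here so the helper above works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.run_instances(**build_launch_args(ami_id))
    return response["Instances"][0]["InstanceId"]
```

Keeping the argument builder separate from the API call makes it possible to review the launch settings without creating an instance.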

One way to trigger instance hibernation is to use an AWS Lambda function. The Lambda function needs specific permissions configured with AWS Identity and Access Management (IAM). To connect the function with the alarm that detects the idle instance, use an Amazon EventBridge bus.

The following architecture diagram shows a solution.

Figure 1 – Solution architecture

  • An EC2 instance sends metrics to CloudWatch.
  • A CloudWatch alarm detects an idle instance and sends the event to EventBridge.
  • EventBridge triggers a Lambda function.
  • The Lambda function evaluates the execution role permissions.
  • The Lambda function identifies the instance and sends the hibernation signal.

To implement the solution, follow these steps:

  1. Configure permissions with IAM
  2. Create the Lambda function
  3. Configure the EC2 instance to send metrics to CloudWatch
  4. Configure EventBridge

a. Configure permissions with IAM

Create an IAM role with permissions to stop an EC2 instance. The Lambda function uses it as its execution role. The IAM role also needs permissions to save logs in CloudWatch. This is useful to log when an instance is entering hibernation.

  1. Open the IAM console.
  2. In the navigation pane, choose Policies.
  3. Select Create policy.
  4. For Select a service, search and select CloudWatch Logs.
  5. In Actions allowed, search “createlog” and select CreateLogStream and CreateLogGroup.
  6. Repeat the search, this time for “putlog”, and select PutLogEvents.
  7. In Resources, choose All.
  8. Select + Add more permissions.
  9. For Select a service, select EC2.
  10. In Actions allowed, search “stop” and select StopInstances from the results.
  11. In Resources, choose Specific, and select the Add ARNs link.
  12. From the pop-up window select Resource in this account, type the region where the instance is, and the instance ID. This forms the ARN of the instances to monitor.
  13. Select Add ARNs.
  14. Select Next.
  15. Name the policy AllowHibernateEC2InstancePolicy.

Figure 2 – IAM policy to access EC2 instances and CloudWatch logs

Figure 3 – Viewing the IAM policy in JSON format
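
For reference, the policy that these console steps produce can be approximated in code. The sketch below builds an equivalent policy document as a Python dictionary. The function name, Region, account ID, and instance ID are placeholders, and the JSON that the console generates may differ in details such as statement IDs.

```python
import json


def build_hibernate_policy(region, account_id, instance_ids):
    """Return an IAM policy document (as a dict) for the Lambda function."""
    instance_arns = [
        f"arn:aws:ec2:{region}:{account_id}:instance/{instance_id}"
        for instance_id in instance_ids
    ]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Lets the function write its log output to CloudWatch Logs.
                "Effect": "Allow",
                "Action": [
                    "logs:CreateLogGroup",
                    "logs:CreateLogStream",
                    "logs:PutLogEvents",
                ],
                "Resource": "*",
            },
            {
                # Hibernation is requested through the StopInstances action,
                # limited here to the monitored instances.
                "Effect": "Allow",
                "Action": "ec2:StopInstances",
                "Resource": instance_arns,
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder values for illustration only.
    policy = build_hibernate_policy(
        "us-east-1", "111122223333", ["i-0123456789abcdef0"]
    )
    print(json.dumps(policy, indent=2))
```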

  1. In the navigation pane, select Roles.
  2. Select Create role.
  3. For Trusted entity type, select AWS Service.
  4. For Use case, select Lambda.
  5. Select Next.
  6. In the Permissions policies list, search and select AllowHibernateEC2InstancePolicy.
  7. Select Next.
  8. Name the role AllowHibernateEC2InstanceFromLambdaRole.
  9. Select Create role.

Figure 4 – IAM role implementing the IAM policy
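
Choosing Lambda as the use case gives the role a trust policy that lets the Lambda service assume it. The following sketch shows what that trust policy looks like and how the role could be created with boto3 instead of the console. The function name and its arguments are illustrative, and the API calls need boto3 and credentials with IAM permissions.

```python
import json

# Trust policy that allows the Lambda service to assume the role.
LAMBDA_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "lambda.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}


def create_execution_role(role_name, policy_arn):
    """Create the role and attach the policy; requires boto3 and credentials."""
    import boto3  # imported here so the constant above is usable without boto3

    iam = boto3.client("iam")
    role = iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(LAMBDA_TRUST_POLICY),
    )
    iam.attach_role_policy(RoleName=role_name, PolicyArn=policy_arn)
    return role["Role"]["Arn"]
```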

b. Create the Lambda function

Create a Lambda function that reads the ID of the idle instance from the CloudWatch alarm's event data and hibernates that instance. The event data arrives in the function's event parameter.

The event data is in JSON format. The following is an example of what this data looks like.

{
  "version": "0",
  "id": "77b0f9cf-ebe3-3893-f60e-1950d2b8ef26",
  "detail-type": "CloudWatch Alarm State Change",
  "source": "aws.cloudwatch",
  "account": "<account>",
  "time": "2023-08-10T21:27:58Z",
  "region": "us-east-1",
  "resources": [
    "arn:aws:cloudwatch:<region>:<account>:alarm:alarm-name"
  ],
  "detail": {
    "alarmName": "alarm-name",
    "state": {
      "value": "ALARM",
      "reason": "TEST",
      "timestamp": "2023-07-05T21:27:58.659+0000"
    },
    "previousState": {
      "value": "OK",
      "reason": "Unchecked: Initial alarm creation",
      "timestamp": "2023-07-05T21:13:51.658+0000"
    },
    "configuration": {
      "metrics": [
        {
          "id": "26c493f3-c295-4454-ff19-70ce482dca64",
          "metricStat": {
            "metric": {
              "namespace": "AWS/EC2",
              "name": "CPUUtilization",
              "dimensions": {
                "InstanceId": "<instance id>"
              }
            },
            "period": 300,
            "stat": "Average"
          },
          "returnData": true
        }
      ],
      "description": "Created from EC2 Console"
    }
  }
}

Follow these steps to create the Lambda function.

  1. Open the Functions page of the Lambda console.
  2. Choose Create function.
  3. Select Author from scratch.
  4. Name the function HibernateEC2InstanceFunction.
  5. For the Runtime, select Python 3.10 (or the latest Python version).
  6. For Architecture, choose arm64.
  7. Expand Change default execution role and select Use an existing role.
  8. Select AllowHibernateEC2InstanceFromLambdaRole from the list of existing roles.
  9. Select Create function at the bottom of the page.

In the Lambda function page, scroll down to view the Code tab at the bottom. Copy the following code into the editor for the lambda_function.py file.

import boto3

def lambda_handler(event, context):
    instancesToHibernate = []
    region = getRegion(event)
    ec2Client = boto3.client('ec2', region_name=region)
    id = getInstanceId(event)
    if id is not None:
        instancesToHibernate.append(id)
        ec2Client.stop_instances(InstanceIds=instancesToHibernate, Hibernate=True)
        print('stopped instances: ' + str(instancesToHibernate) + ' in region ' + region)
    else:
        print('No instance id found')

def getRegion(payload):
    if 'region' in payload:
        region = payload['region']
        return region
    #default to N. Virginia
    return 'us-east-1'

def getInstanceId(payload):
    if 'detail' in payload:
        detail = payload['detail']
        if 'configuration' in detail:
            configuration = detail['configuration']
            if 'metrics' in configuration:
                if len(configuration['metrics']) > 0:
                    firstMetric = configuration['metrics'][0]
                    if 'metricStat' in firstMetric:
                        metricStat = firstMetric['metricStat']
                        if 'metric' in metricStat:
                            metric = metricStat['metric']
                            if 'dimensions' in metric:
                                dimensions = metric['dimensions']
                                if 'InstanceId' in dimensions:
                                    id = dimensions['InstanceId']
                                    return id
    return None

Figure 5 – Lambda function code editor

The code has the following contents:

  1. Imports section. This section imports the libraries that the function uses. In our case, that is the boto3 library.
  2. The main method, called lambda_handler, is the execution entry point. This is the method called whenever the Lambda function runs.
    1. It defines an array to store the IDs of the instances that enter hibernation. This is necessary because the method stop_instances expects an array as opposed to a single value.
    2. Using the event data, it finds the AWS Region and instance ID of the instance to hibernate.
    3. It initializes the Amazon EC2 client by calling the client method.
    4. If it finds an instance ID, then it adds it to the instances array.
    5. It calls stop_instances, passing the instances array and Hibernate=True to request hibernation.
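
One way to check this logic without touching a real instance is to run it locally against a stand-in client. The sketch below is a variation written for illustration, not the function deployed in this post. It takes the client as a parameter, reads the nested keys with dict.get, and uses a hypothetical FakeEc2Client class that only records the call it receives. The instance ID is a placeholder.

```python
def get_instance_id(event):
    """Return the InstanceId dimension of the alarm's first metric, or None."""
    metrics = event.get("detail", {}).get("configuration", {}).get("metrics", [])
    if not metrics:
        return None
    metric = metrics[0].get("metricStat", {}).get("metric", {})
    return metric.get("dimensions", {}).get("InstanceId")


def hibernate_from_event(event, ec2_client):
    """Request hibernation for the instance named in the event, if any."""
    instance_id = get_instance_id(event)
    if instance_id is None:
        return []
    ec2_client.stop_instances(InstanceIds=[instance_id], Hibernate=True)
    return [instance_id]


class FakeEc2Client:
    """Hypothetical stand-in that records stop_instances calls."""

    def __init__(self):
        self.calls = []

    def stop_instances(self, **kwargs):
        self.calls.append(kwargs)


# Trimmed-down event with only the fields the function reads.
sample_event = {
    "region": "us-east-1",
    "detail": {
        "configuration": {
            "metrics": [
                {
                    "metricStat": {
                        "metric": {
                            "namespace": "AWS/EC2",
                            "name": "CPUUtilization",
                            "dimensions": {"InstanceId": "i-0123456789abcdef0"},
                        }
                    }
                }
            ]
        }
    },
}

fake_client = FakeEc2Client()
print(hibernate_from_event(sample_event, fake_client))
print(fake_client.calls)
```

Passing the client in as an argument is a design choice that makes the function testable. In Lambda, the same function could receive a real boto3 EC2 client.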

c. Configure the EC2 instance to send metrics to CloudWatch

In this scenario, an EC2 instance is considered idle when its average CPU utilization stays under 10% for a 15-minute period. Adjust the utilization percentage and the period to meet your needs. To enable alarm tracking, the EC2 instance must send the CPUUtilization metric to CloudWatch.

  1. Open the Amazon EC2 console.
  2. In the navigation pane, choose Instances.
  3. Select an instance to monitor with the checkbox on the left.
  4. Find the Alarm status column, and select the plus sign to add a new alarm.

Figure 6 – Creating a new CloudWatch alarm from the EC2 console

  1. In the Manage CloudWatch alarms page, select Create an alarm. Then, turn off Alarm action. Keep Alarm notification on if you want to be notified when an instance hibernates; otherwise, turn it off.

Figure 7 – CloudWatch alarm notification and action settings

  1. In the Alarm thresholds section, select:
    1. Group samples by: Average.
    2. Type of data to sample: CPU utilization.
    3. Alarm when: less than (<).
    4. Percent: 10.
    5. Consecutive periods: 1.
    6. Period: 15 Minutes.
    7. Alarm name: Idle-EC2-Instance-LessThan10Pct-CPUUtilization-15Min.

Figure 8 – CloudWatch alarm thresholds
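
The same thresholds can be expressed as arguments to the CloudWatch PutMetricAlarm API. The following is a rough equivalent of the console settings above, under the assumption that the alarm is created with boto3 rather than from the EC2 console. The function names and default values are illustrative.

```python
def build_idle_alarm_args(
    instance_id,
    alarm_name="Idle-EC2-Instance-LessThan10Pct-CPUUtilization-15Min",
    threshold_percent=10.0,
    period_seconds=900,
):
    """Return keyword arguments for cloudwatch.put_metric_alarm."""
    return {
        "AlarmName": alarm_name,
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",               # Group samples by Average
        "Period": period_seconds,             # 15 minutes
        "EvaluationPeriods": 1,               # Consecutive periods
        "Threshold": threshold_percent,       # Percent
        "ComparisonOperator": "LessThanThreshold",
    }


def create_idle_alarm(instance_id, region="us-east-1"):
    """Create the alarm; requires boto3 and AWS credentials."""
    import boto3  # imported here so the helper above works without boto3

    cloudwatch = boto3.client("cloudwatch", region_name=region)
    cloudwatch.put_metric_alarm(**build_idle_alarm_args(instance_id))
```

If you change the threshold or period, consider changing the alarm name as well, because the default name describes the 10% and 15-minute values.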

  1. Select Create at the bottom of the page.
  2. A successful creation shows a green banner at the top of the page.
  3. Select the Alarm status column for the instance, then select the link that shows in the pop-up window to go to the new CloudWatch alarm details.

Figure 9 – Accessing the CloudWatch alarm from the EC2 console

  1. Scroll down to view the alarm details and copy its ARN, which shows in the lower right corner. The EventBridge rule needs this.

Figure 10 – Finding the CloudWatch alarm ARN

d. Configure EventBridge to consume events from CloudWatch

When the alarm enters the ALARM state, it means it has detected an idle EC2 instance. It will then generate an event that EventBridge can consume and act upon. For this, EventBridge uses rules. EventBridge rules rely on patterns to identify the events and trigger the appropriate actions.

  1. Open the Amazon EventBridge console.
  2. In the navigation pane, choose Rules.
  3. Choose Create rule.
  4. Enter a name and description for the rule. A rule cannot have the same name as another rule in the same Region and on the same event bus.
  5. For Event bus, choose an event bus to associate with this rule. To match events that come from the same account, select AWS default event bus. When an AWS service in the account emits an event, it always goes to the account’s default event bus.
  6. For Rule type, choose Rule with an event pattern.
  7. Select Next.
  8. For Event source, choose AWS services.
  9. Scroll down to Creation method and select Custom pattern (JSON editor).
  10. Enter the following pattern in the Event pattern editor.
{
  "source": ["aws.cloudwatch"],
  "detail-type": ["CloudWatch Alarm State Change"],
  "resources": [
    "<ARN of CW alarms to respond to>"
  ],
  "detail": {
    "state": {
      "value": ["ALARM"]
    }
  }
}
  1. In the resources element of the pattern, add the ARN of the CloudWatch alarm created for the EC2 instance. The resources element is an array, so add the ARN of every alarm that this rule monitors and responds to. This allows a single rule to handle the same action for multiple alarms.
  2. Select Next.
  3. Select a target. This is the action that EventBridge executes once it has identified an event. Choose AWS service and select Lambda function.
  4. Select HibernateEC2InstanceFunction.
  5. Select Next.
  6. Add tags to the rule as needed.
  7. Select Next.
  8. Review the rule configuration, and select Create rule.

Figure 11 – EventBridge rule event pattern

Figure 12 – EventBridge rule targets
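
To build intuition for how the rule selects events, the following sketch implements a simplified matcher in Python. It covers only exact-value matching on nested fields, which is all this rule uses. It is not a full implementation of EventBridge pattern semantics. The function names, account ID, and alarm ARN are placeholders. The pattern lists the alarm ARN in the top-level resources field, which is where the sample event shown earlier carries it.

```python
def matches(pattern, event):
    """Simplified check: every field in the pattern must match the event."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: descend into the same key of the event.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        else:
            # A pattern leaf is a list of accepted values. An event array
            # matches when any of its elements is accepted.
            actual_values = actual if isinstance(actual, list) else [actual]
            if not any(value in expected for value in actual_values):
                return False
    return True


# Placeholder ARN for illustration only.
ALARM_ARN = (
    "arn:aws:cloudwatch:us-east-1:111122223333:alarm:"
    "Idle-EC2-Instance-LessThan10Pct-CPUUtilization-15Min"
)

pattern = {
    "source": ["aws.cloudwatch"],
    "detail-type": ["CloudWatch Alarm State Change"],
    "resources": [ALARM_ARN],
    "detail": {"state": {"value": ["ALARM"]}},
}


def make_event(state_value):
    """Build a trimmed-down alarm state change event for local testing."""
    return {
        "source": "aws.cloudwatch",
        "detail-type": "CloudWatch Alarm State Change",
        "resources": [ALARM_ARN],
        "detail": {"state": {"value": state_value}},
    }


print(matches(pattern, make_event("ALARM")))
print(matches(pattern, make_event("OK")))
```

Because a pattern field must exist at the same position in the event, a pattern that nested resources under detail would not match this event, which has no resources field inside detail.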

Testing the implementation

To test the solution, wait for the instance’s CPU utilization to fall below the 10% threshold for 15 minutes. Alternatively, force the alarm to enter the ALARM state with the following AWS CLI command.

aws cloudwatch set-alarm-state \
  --alarm-name "Idle-EC2-Instance-LessThan10Pct-CPUUtilization-15Min" \
  --state-value ALARM \
  --state-reason "testing"
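
If you prefer to trigger the test from Python, the same operation is available in boto3 as set_alarm_state. The sketch below mirrors the CLI command. The function names and default Region are assumptions, and the call needs boto3 and AWS credentials.

```python
def build_test_state_args(alarm_name, reason="testing"):
    """Return keyword arguments for cloudwatch.set_alarm_state."""
    return {
        "AlarmName": alarm_name,
        "StateValue": "ALARM",
        "StateReason": reason,
    }


def force_alarm(alarm_name, region="us-east-1"):
    """Temporarily set the alarm to ALARM; requires boto3 and credentials."""
    import boto3  # imported here so the helper above works without boto3

    cloudwatch = boto3.client("cloudwatch", region_name=region)
    cloudwatch.set_alarm_state(**build_test_state_args(alarm_name))
```

The forced state is temporary. The alarm returns to its actual state the next time CloudWatch evaluates it, so the state change event is emitted but the alarm may not stay in ALARM for long.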

Conclusion

Hibernating EC2 instances brings savings during periods of low utilization. Another benefit is that when they start again, they continue their work from where they left off. To hibernate the instance, set the hibernation configuration when launching it. Detect the idle instance with a CloudWatch alarm, and use EventBridge to capture the alarms and trigger a Lambda function to call the Amazon EC2 stop API with the hibernate parameter.
