AWS Systems Manager Patch Manager for orchestrating patching at scale

Overview

Patch Manager is a feature of AWS Systems Manager that automates the patching of managed nodes with security-related and other types of updates for operating systems and applications. Patch Manager currently supports patching operations for Amazon Linux, Amazon Linux 2, CentOS, Debian Server, macOS, Oracle Linux, Raspberry Pi OS, Red Hat Enterprise Linux (RHEL), SUSE Linux Enterprise Server (SLES), Ubuntu Server and Windows Server. You can scan fleets of managed nodes to report missing patches, install missing patches manually or automatically on a schedule, or combine the two into a single automated scan-and-install workflow, all from a unified user interface. This provides a simple yet powerful mechanism for handling large, distributed patching operations.

Patch Manager uses either resource groups and tags or patch groups to determine which virtual machines to run scan or install operations against. Predefined or customisable patch baselines define the desired patching operation. Patching can be scheduled with Systems Manager Maintenance Windows so that operations run automatically on a predefined schedule that meets business requirements. The workflow for a patch operation is controlled by Systems Manager Run Command, which performs automated configuration changes on managed machines using SSM documents. You can create your own documents or use AWS owned and managed documents that have been developed for specific tasks. The document most relevant to Patch Manager is the AWS-RunPatchBaseline command document.

Patch Groups

Patch Manager uses patch groups as a way of organising virtual machines for patching. Patch groups can be created for different operating systems, different operating system versions, different environments or different virtual machine functions. They provide a separation of concerns when running patch operations across fleets of virtual machines. Additionally, patch groups allow a canary-like rollout of patches through successive environments by targeting environment-specific patch groups with varying approval delays.

A patch group requires a specific resource tag on each virtual machine to associate the machine with the group. The tag key must be Patch Group (or PatchGroup, without the space, when tags are exposed through EC2 instance metadata) and is case sensitive; the tag value is the name of the required patch group. Note that a virtual machine can belong to only one patch group at a time. For example, an EC2 instance running Ubuntu could be assigned to an Ubuntu patch group using the following tag:

Patch Group: custom-ubuntu-patch-group
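The tag can also be applied programmatically. The following is a minimal sketch, assuming the boto3 SDK is used and that the caller passes in an EC2 client; the helper names and the instance ID in the usage note are hypothetical and only for illustration.

```python
from typing import Any

# Tag key described above; it is case sensitive.
PATCH_GROUP_TAG_KEY = "Patch Group"


def patch_group_tag_request(instance_ids: list[str], patch_group: str) -> dict[str, Any]:
    """Build keyword arguments for the EC2 CreateTags API call."""
    if not patch_group:
        raise ValueError("patch_group must be a non-empty string")
    return {
        "Resources": list(instance_ids),
        "Tags": [{"Key": PATCH_GROUP_TAG_KEY, "Value": patch_group}],
    }


def assign_patch_group(ec2_client: Any, instance_ids: list[str], patch_group: str) -> None:
    """Apply the tag. ec2_client is assumed to be boto3.client("ec2")."""
    ec2_client.create_tags(**patch_group_tag_request(instance_ids, patch_group))
```

With boto3 installed and credentials configured, assign_patch_group(boto3.client("ec2"), ["i-0123456789abcdef0"], "custom-ubuntu-patch-group") would tag the (hypothetical) instance. Because a resource holds a single value per tag key, tagging it again replaces the previous patch group rather than adding a second one.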

Patch groups only handle the organisation of virtual machines. When a patch scan or install is triggered, either manually or through Maintenance Windows, Patch Manager uses the patch baseline associated with the patch group to determine which patches to install and from what source. This ensures that only approved patches are applied during a patch operation, in line with organisational patch compliance standards.

Patch Baselines

Patch baselines provide a template for consistent patching of the virtual machines that belong to the patch group associated with a given baseline. Patch baselines include rules for auto-approving patches either immediately or a set number of days after release. This is useful for orchestrating a gradual, automated patch rollout across multiple environments by staggering the approval delay. For example, a development environment may auto-approve patches immediately, a pre-production environment may auto-approve after 7 days, and a production environment may require manual approval or auto-approve a further 7 days after pre-production. Patch baselines also support customisable lists of approved and rejected patches, which gives fine-grained control over which patches are installed on specific machines. Additionally, patch baselines expose parameters that control how a patch operation behaves when the AWS-RunPatchBaseline SSM command document is invoked.
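The staggered approval described above can be sketched as follows. This is an illustration only, assuming boto3 and an SSM client supplied by the caller; the environment names, baseline names, Ubuntu priority filter and the 0/7/14 day delays are assumptions taken from or modelled on the example in the text, not a recommended policy.

```python
from typing import Any

# Hypothetical environments and approval delays (in days) from the example above.
APPROVAL_DELAY_DAYS = {"development": 0, "pre-production": 7, "production": 14}


def baseline_request(environment: str, operating_system: str = "UBUNTU") -> dict[str, Any]:
    """Build keyword arguments for the SSM CreatePatchBaseline API call."""
    return {
        "Name": f"custom-ubuntu-{environment}-baseline",  # hypothetical naming scheme
        "OperatingSystem": operating_system,
        "ApprovalRules": {
            "PatchRules": [
                {
                    # Assumed filter: auto-approve only higher-priority Ubuntu packages.
                    "PatchFilterGroup": {
                        "PatchFilters": [
                            {"Key": "PRIORITY", "Values": ["Required", "Important"]}
                        ]
                    },
                    "ApproveAfterDays": APPROVAL_DELAY_DAYS[environment],
                }
            ]
        },
    }


def create_and_register(ssm_client: Any, environment: str, patch_group: str) -> str:
    """Create the baseline and register it for a patch group.

    ssm_client is assumed to be boto3.client("ssm").
    """
    baseline_id = ssm_client.create_patch_baseline(**baseline_request(environment))["BaselineId"]
    ssm_client.register_patch_baseline_for_patch_group(
        BaselineId=baseline_id, PatchGroup=patch_group
    )
    return baseline_id
```

Each environment gets its own baseline and its own patch group, so the same patch becomes eligible for installation in development first and in production last.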

AWS provides several predefined default patch baselines for supported operating systems, which can be used in the absence of any specific patch requirements.

Maintenance Windows

The primary purpose of AWS Systems Manager Maintenance Windows is to define a schedule for performing potentially disruptive operations, such as patching, on managed instances. Each maintenance window has a schedule, a maximum duration and a set of targets. You can specify dates on which a maintenance window should and shouldn’t run, and set a time zone to ensure consistent scheduling of tasks.

Maintenance window schedules are defined using cron or rate expressions. Concurrency settings ensure that Maintenance Window tasks are carried out as a phased rollout rather than all at once, and error thresholds can cancel a task once a set number of operations have failed. Logs can be stored in Amazon CloudWatch Logs or Amazon S3 buckets to provide a clear, auditable trail of the configuration changes applied to machines, and Amazon Simple Notification Service can be configured to send progress notifications during scheduled executions.
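As a rough illustration of the scheduling options, the sketch below builds the arguments for creating a maintenance window. It assumes boto3 is used for the actual call; the window name, schedule, time zone and durations are hypothetical values, and the validation is a local sanity check of this sketch rather than the service's own rules.

```python
from typing import Any


def maintenance_window_request(
    name: str,
    schedule: str,
    timezone: str = "UTC",
    duration_hours: int = 3,
    cutoff_hours: int = 1,
) -> dict[str, Any]:
    """Build keyword arguments for the SSM CreateMaintenanceWindow API call."""
    # Schedules are written as cron(...) or rate(...) expressions.
    if not (schedule.startswith(("cron(", "rate(")) and schedule.endswith(")")):
        raise ValueError("schedule must be a cron(...) or rate(...) expression")
    # Cutoff: hours before the end of the window after which no new tasks start.
    if not (1 <= duration_hours <= 24 and 0 <= cutoff_hours < duration_hours):
        raise ValueError("cutoff must be shorter than a duration of 1 to 24 hours")
    return {
        "Name": name,
        "Schedule": schedule,
        "ScheduleTimezone": timezone,
        "Duration": duration_hours,
        "Cutoff": cutoff_hours,
        "AllowUnassociatedTargets": False,
    }


def create_window(ssm_client: Any, **kwargs: Any) -> str:
    """Create the window. ssm_client is assumed to be boto3.client("ssm")."""
    return ssm_client.create_maintenance_window(**maintenance_window_request(**kwargs))["WindowId"]
```

For example, maintenance_window_request("ubuntu-weekly-patching", "cron(0 2 ? * SUN *)", timezone="Europe/London") describes a window that opens at 02:00 every Sunday, London time, while a schedule of "rate(7 days)" would run every seven days.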

Maintenance Windows support running four different types of tasks:

  1. Run Command commands, such as applying patches using the AWS-RunPatchBaseline command document.
  2. Automation workflows, such as executing scripts or making remote configuration changes on managed resources.
  3. AWS Lambda functions for any programmatic operations.
  4. AWS Step Functions for complex orchestration of serverless workflows.

Maintenance Window Tasks

When a maintenance window is invoked, it uses its registered tasks to determine what actions to take. A common task is applying patch baselines to registered patch groups using the Run Command feature of Systems Manager.

Maintenance Window Targets

Maintenance Window Targets are the resources to which the actions defined in Maintenance Window Tasks are applied. In a patching scenario, Run Command runs the AWS-RunPatchBaseline command document, applying the specified patch baseline, against the nodes registered in the Maintenance Window Target definition.
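The relationship between targets and tasks can be sketched as follows. This is a minimal example under stated assumptions: boto3 is used, the caller supplies an SSM client and an existing window ID, targets are selected by the Patch Group tag, and the 10% concurrency and 5% error threshold are arbitrary illustrative values.

```python
from typing import Any


def window_target_request(window_id: str, patch_group: str) -> dict[str, Any]:
    """Arguments for RegisterTargetWithMaintenanceWindow, selecting nodes by tag."""
    return {
        "WindowId": window_id,
        "ResourceType": "INSTANCE",
        "Targets": [{"Key": "tag:Patch Group", "Values": [patch_group]}],
    }


def patch_task_request(
    window_id: str,
    window_target_id: str,
    operation: str = "Install",
    max_concurrency: str = "10%",
    max_errors: str = "5%",
) -> dict[str, Any]:
    """Arguments for RegisterTaskWithMaintenanceWindow running AWS-RunPatchBaseline."""
    if operation not in ("Scan", "Install"):
        raise ValueError("operation must be 'Scan' or 'Install'")
    return {
        "WindowId": window_id,
        "Targets": [{"Key": "WindowTargetIds", "Values": [window_target_id]}],
        "TaskArn": "AWS-RunPatchBaseline",
        "TaskType": "RUN_COMMAND",
        "MaxConcurrency": max_concurrency,  # phased rollout
        "MaxErrors": max_errors,  # stop once this share of nodes has failed
        "TaskInvocationParameters": {
            "RunCommand": {"Parameters": {"Operation": [operation]}}
        },
    }


def register_patching(ssm_client: Any, window_id: str, patch_group: str) -> str:
    """Register the target, then a patch task bound to it; returns the task ID.

    ssm_client is assumed to be boto3.client("ssm").
    """
    target_id = ssm_client.register_target_with_maintenance_window(
        **window_target_request(window_id, patch_group)
    )["WindowTargetId"]
    return ssm_client.register_task_with_maintenance_window(
        **patch_task_request(window_id, target_id)
    )["WindowTaskId"]
```

Keeping the target and the task as separate registrations means the same set of nodes can be reused by several tasks, for example a Scan task early in the window followed by an Install task.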

Patching Workflow

 

The first step is to invoke the AWS-RunPatchBaseline document. This can be done manually using the Patch Now feature of Patch Manager, or through a Maintenance Window for a more automated workflow. When using Maintenance Windows, the task definition states what to do (AWS-RunPatchBaseline) and the registered targets define which resources to apply the task to.

The AWS-RunPatchBaseline document loads parameters from the task invocation, including the operation type (Scan or Install) and any overriding parameters that take precedence over what is declared in the patch baseline. It then follows a predetermined set of steps: identifying the machines, the patch group each belongs to and the patch baseline to apply, and finally scanning for or installing missing patches.
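Outside the console and Maintenance Windows, the same document can be invoked on demand through Run Command. The sketch below assumes boto3 and an SSM client supplied by the caller; the concurrency values are illustrative, and RebootOption is an additional document parameter that the text above does not discuss, set here so that a scan never reboots a node.

```python
from typing import Any


def patch_command_request(
    patch_group: str,
    operation: str = "Scan",
    reboot_option: str = "NoReboot",
) -> dict[str, Any]:
    """Build keyword arguments for the SSM SendCommand API call."""
    if operation not in ("Scan", "Install"):
        raise ValueError("operation must be 'Scan' or 'Install'")
    if reboot_option not in ("RebootIfNeeded", "NoReboot"):
        raise ValueError("reboot_option must be 'RebootIfNeeded' or 'NoReboot'")
    return {
        "DocumentName": "AWS-RunPatchBaseline",
        # Select nodes by their patch group tag.
        "Targets": [{"Key": "tag:Patch Group", "Values": [patch_group]}],
        "Parameters": {"Operation": [operation], "RebootOption": [reboot_option]},
        "MaxConcurrency": "10%",  # illustrative values
        "MaxErrors": "5%",
    }


def run_patch_operation(ssm_client: Any, patch_group: str, operation: str = "Scan") -> str:
    """Send the command and return its ID. ssm_client is assumed to be boto3.client("ssm")."""
    response = ssm_client.send_command(**patch_command_request(patch_group, operation))
    return response["Command"]["CommandId"]
```

Running a Scan first and reviewing the reported missing patches before switching the operation to Install is one cautious way to use this.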

First, it checks whether the target has the required Patch Group tag. If it does, it identifies which patch group the managed node belongs to. If it does not, the workflow does not stop: the node is patched against the default patch baseline for its operating system.

If the patch group has a registered patch baseline, the document carries out the operation defined in the task invocation (Scan or Install), either scanning for missing patches or installing them based on the criteria defined in that baseline. If the patch group has no patch baseline associated with it, the same steps are carried out using the default patch baseline for the operating system; if no custom default has been defined, this is an AWS owned and managed patch baseline.
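The baseline selection order for a patch group can be modelled in a few lines. This is a plain-Python illustration of the fallback logic described above, not how the service is implemented; the function name, the dictionaries and the custom baseline names are hypothetical.

```python
def resolve_baseline(
    patch_group: str,
    operating_system: str,
    group_baselines: dict[str, str],
    custom_defaults: dict[str, str],
    aws_defaults: dict[str, str],
) -> str:
    """Return the name of the baseline that applies to a patch group.

    Order of precedence:
      1. the baseline registered for the patch group,
      2. a custom default baseline for the operating system,
      3. the AWS owned and managed baseline for the operating system.
    """
    if patch_group in group_baselines:
        return group_baselines[patch_group]
    if operating_system in custom_defaults:
        return custom_defaults[operating_system]
    return aws_defaults[operating_system]


# Illustrative data only.
GROUP_BASELINES = {"custom-ubuntu-patch-group": "custom-ubuntu-baseline"}
AWS_DEFAULTS = {"UBUNTU": "AWS-UbuntuDefaultPatchBaseline"}

if __name__ == "__main__":
    print(resolve_baseline("custom-ubuntu-patch-group", "UBUNTU", GROUP_BASELINES, {}, AWS_DEFAULTS))
    print(resolve_baseline("unregistered-group", "UBUNTU", GROUP_BASELINES, {}, AWS_DEFAULTS))
```

In a real account there is no need to reimplement this lookup: the SSM GetPatchBaselineForPatchGroup API (get_patch_baseline_for_patch_group in boto3) reports which baseline would be used for a given patch group and operating system.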

Summary

AWS Systems Manager Patch Manager greatly simplifies patch operations, helping organisations keep virtual machines compliant and protected against common exploits. It allows distributed patching to be handled in a safe, automated and secure manner that scales with business needs.