CloudWatch Metric Guide

Amazon RDS

DatabaseConnections

DatabaseConnections counts the number of client network connections open to the RDS instance at the time of sampling. It reflects both active and idle connections held by your application's connection pool.

Threshold

≥ 80% of your RDS instance's max_connections parameter value
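
Here is a minimal Python sketch that turns this threshold into alarm settings. It assumes you read max_connections yourself (for example `SHOW max_connections;` on PostgreSQL or `SELECT @@max_connections;` on MySQL) and pass the returned dictionary to boto3 as `boto3.client("cloudwatch").put_metric_alarm(**kwargs)`. The helper name, the alarm name, and the 1-minute period with three evaluation periods are illustrative assumptions, not AWS defaults.

```python
def connection_alarm(db_instance_id: str, max_connections: int, percent: int = 80) -> dict:
    """Hypothetical helper: put_metric_alarm kwargs that fire at >= percent% of max_connections."""
    # Integer ceiling division avoids float rounding and never places the
    # threshold below the intended percentage.
    threshold = (max_connections * percent + 99) // 100
    return {
        "AlarmName": f"{db_instance_id}-database-connections-high",
        "Namespace": "AWS/RDS",
        "MetricName": "DatabaseConnections",
        "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": db_instance_id}],
        "Statistic": "Maximum",
        "Period": 60,
        "EvaluationPeriods": 3,
        "Threshold": float(threshold),
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
    }


kwargs = connection_alarm("orders-db", max_connections=500)
print(kwargs["Threshold"])  # 400.0
```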

FreeableMemory

FreeableMemory reports the amount of available random access memory on the RDS instance, in bytes. It includes memory in the OS free pool plus reclaimable cached and buffered memory.

Threshold

< 25% of total instance RAM (or a specific byte floor for your instance class)
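
FreeableMemory is reported in bytes, so you have to convert a percentage threshold using the RAM of the instance class, which you look up yourself. The sketch below assumes a hypothetical helper that returns a payload for boto3's `put_metric_alarm`. The 5-minute period and two evaluation periods are illustrative choices.

```python
GIB = 1024 ** 3  # FreeableMemory is reported in bytes


def freeable_memory_alarm(db_instance_id: str, total_ram_bytes: int, percent: int = 25) -> dict:
    """Hypothetical helper: alarm when FreeableMemory drops below percent% of RAM."""
    # Integer arithmetic keeps the byte threshold exact.
    threshold = total_ram_bytes * percent // 100
    return {
        "AlarmName": f"{db_instance_id}-freeable-memory-low",
        "Namespace": "AWS/RDS",
        "MetricName": "FreeableMemory",
        "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": db_instance_id}],
        "Statistic": "Average",
        "Period": 300,
        "EvaluationPeriods": 2,
        "Threshold": float(threshold),
        "ComparisonOperator": "LessThanThreshold",
    }


# An instance class with 16 GiB of RAM gives a 4 GiB floor.
print(freeable_memory_alarm("orders-db", 16 * GIB)["Threshold"] == 4 * GIB)  # True
```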

CPUUtilization

CPUUtilization measures the percentage of the RDS instance's total CPU capacity in use, averaged across all of its vCPUs. A value of 100% means every vCPU is fully busy.

Threshold

> 80% sustained for 5 minutes or more
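
In CloudWatch, "sustained for 5 minutes" becomes several consecutive evaluation periods that must all breach. The sketch below assumes 1-minute datapoints (RDS publishes its metrics every minute) and uses hypothetical helper names. The dictionary is meant for boto3's `put_metric_alarm`.

```python
def evaluation_periods(sustained_seconds: int, period_seconds: int) -> int:
    """Number of consecutive periods that cover the sustained window."""
    if sustained_seconds % period_seconds:
        raise ValueError("sustained window must be a whole number of periods")
    return sustained_seconds // period_seconds


def rds_cpu_alarm(db_instance_id: str, threshold_pct: float = 80.0,
                  sustained_seconds: int = 300, period_seconds: int = 60) -> dict:
    """Hypothetical helper: alarm when CPU stays above threshold_pct for the whole window."""
    n = evaluation_periods(sustained_seconds, period_seconds)
    return {
        "AlarmName": f"{db_instance_id}-cpu-high",
        "Namespace": "AWS/RDS",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": db_instance_id}],
        "Statistic": "Average",
        "Period": period_seconds,
        # All n datapoints must breach, so a single short spike does not alarm.
        "EvaluationPeriods": n,
        "DatapointsToAlarm": n,
        "Threshold": threshold_pct,
        "ComparisonOperator": "GreaterThanThreshold",
    }
```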

FreeStorageSpace

FreeStorageSpace reports the available storage capacity on the RDS instance's EBS volume, in bytes. When this reaches zero, the database stops accepting writes.

Threshold

< 20% of total allocated storage, or < 5 GB absolute floor (whichever is larger)
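
The "whichever is larger" rule is a `max` over two byte values. Here is a sketch with a hypothetical helper. It assumes allocated storage is given in GiB (the unit RDS uses for `AllocatedStorage`) and treats the 5 GB floor as 5 GiB.

```python
GIB = 1024 ** 3


def free_storage_threshold_bytes(allocated_gib: int, percent: int = 20, floor_gib: int = 5) -> int:
    """Hypothetical helper: larger of percent% of allocated storage and a fixed floor, in bytes."""
    relative = allocated_gib * GIB * percent // 100  # integer math, no float rounding
    absolute = floor_gib * GIB
    return max(relative, absolute)


print(free_storage_threshold_bytes(100) // GIB)  # 20 (percentage wins)
print(free_storage_threshold_bytes(10) // GIB)   # 5 (floor wins)
```

Use the result as the `Threshold` of a `LessThanThreshold` alarm on FreeStorageSpace.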

AWS Lambda

Errors

Errors counts the number of Lambda invocations that resulted in a function error — including exceptions thrown by the function code and runtime errors (timeout, out-of-memory, handler not found).

Threshold

> 0 in any 5-minute window for critical functions; > N errors/minute for high-volume functions (set N based on your acceptable error rate)
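
For high-volume functions, an error rate is often easier to tune than a raw count. The sketch below uses CloudWatch metric math and assumes the standard `FunctionName` dimension. The helper name and the 5-minute period are illustrative. When you use the `Metrics` parameter, `put_metric_alarm` must not also receive `MetricName`, `Namespace`, or `Statistic`.

```python
def lambda_error_rate_alarm(function_name: str, max_error_pct: float, period: int = 300) -> dict:
    """Hypothetical helper: alarm when 100 * Errors / Invocations exceeds max_error_pct."""
    dimensions = [{"Name": "FunctionName", "Value": function_name}]

    def summed(metric_id: str, metric_name: str) -> dict:
        # Each input metric is summed over the period and hidden from the alarm.
        return {
            "Id": metric_id,
            "MetricStat": {
                "Metric": {"Namespace": "AWS/Lambda", "MetricName": metric_name,
                           "Dimensions": dimensions},
                "Period": period,
                "Stat": "Sum",
            },
            "ReturnData": False,
        }

    return {
        "AlarmName": f"{function_name}-error-rate",
        "Metrics": [
            summed("errors", "Errors"),
            summed("invocations", "Invocations"),
            # Only the expression returns data, so the alarm evaluates the rate.
            {"Id": "error_rate", "Expression": "100 * errors / invocations",
             "Label": "Error rate (%)", "ReturnData": True},
        ],
        "EvaluationPeriods": 1,
        "Threshold": max_error_pct,
        "ComparisonOperator": "GreaterThanThreshold",
        # No invocations means no datapoints; do not treat that as a failure.
        "TreatMissingData": "notBreaching",
    }
```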

Duration

Duration measures the elapsed wall-clock time, in milliseconds, from when the Lambda function handler begins executing to when it returns or times out. CloudWatch publishes minimum, maximum, average, and percentile statistics.

Threshold

p99 Duration > 80% of the function's configured timeout
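
The timeout is configured in seconds, but Duration is published in milliseconds, so the threshold needs a unit conversion. Here is a sketch with a hypothetical helper that produces a boto3 `put_metric_alarm` payload. The 5-minute period is an illustrative choice.

```python
def lambda_duration_alarm(function_name: str, timeout_seconds: int, percent: int = 80) -> dict:
    """Hypothetical helper: alarm when p99 Duration exceeds percent% of the timeout."""
    # Convert the timeout from seconds to milliseconds before scaling.
    threshold_ms = timeout_seconds * 1000 * percent / 100
    return {
        "AlarmName": f"{function_name}-p99-duration-high",
        "Namespace": "AWS/Lambda",
        "MetricName": "Duration",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        # Percentiles go in ExtendedStatistic; Statistic must then be omitted.
        "ExtendedStatistic": "p99",
        "Period": 300,
        "EvaluationPeriods": 1,
        "Threshold": threshold_ms,
        "ComparisonOperator": "GreaterThanThreshold",
    }


print(lambda_duration_alarm("checkout", timeout_seconds=30)["Threshold"])  # 24000.0
```

You can read the configured timeout from `boto3.client("lambda").get_function_configuration(FunctionName=...)["Timeout"]`, which is expressed in seconds.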

Throttles

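Throttles counts invocation requests that Lambda rejected because a concurrency limit was reached: either the account's concurrency quota or the function's reserved concurrency. A throttled invocation is not executed. For synchronous invocations, the caller receives an HTTP 429 TooManyRequestsException. For asynchronous invocations and event source mappings, Lambda retries the event instead of returning the error to the original caller.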

Threshold

> 0 for critical functions; for high-volume functions, a low absolute count threshold based on your acceptable drop rate

ConcurrentExecutions

ConcurrentExecutions reports the number of Lambda function instances actively processing events at any given time, across the entire account or per function when filtered by function name.

Threshold

> 800 concurrent executions at the account level (for accounts at the default 1,000 limit)
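
Instead of hard-coding 800, you can derive the threshold from the account's actual quota, which may differ from the default. The sketch below assumes you pass in the response of `boto3.client("lambda").get_account_settings()`. The helper name and alarm settings are illustrative. Concurrency quotas apply per Region, so you would create one alarm per Region you use.

```python
def account_concurrency_alarm(account_settings: dict, percent: int = 80) -> dict:
    """Hypothetical helper: alarm at percent% of the account concurrency quota.

    account_settings is the response of boto3's Lambda get_account_settings().
    """
    limit = account_settings["AccountLimit"]["ConcurrentExecutions"]
    return {
        "AlarmName": "lambda-account-concurrency-high",
        "Namespace": "AWS/Lambda",
        # No Dimensions: the undimensioned metric is the account-wide total.
        "MetricName": "ConcurrentExecutions",
        "Statistic": "Maximum",
        "Period": 60,
        "EvaluationPeriods": 3,
        "Threshold": float(limit * percent // 100),
        "ComparisonOperator": "GreaterThanThreshold",
    }


print(account_concurrency_alarm({"AccountLimit": {"ConcurrentExecutions": 1000}})["Threshold"])  # 800.0
```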

Amazon ECS

AWS/ECS

CPUUtilization

CPUUtilization for ECS measures the percentage of CPU units reserved by the tasks in a service that are in use, averaged across the tasks in the service.

Threshold

> 85% sustained for 3 minutes

AWS/ECS

MemoryUtilization

MemoryUtilization for ECS measures the percentage of memory reserved by the tasks in a service that is in use, averaged across the running tasks.

Threshold

> 85% averaged across the service
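
You can express both ECS thresholds as a pair of alarms on the ClusterName and ServiceName dimensions. Here is a sketch with a hypothetical helper. Applying the 3-minute window to memory as well is an assumption, since the memory threshold does not state a duration.

```python
def ecs_service_alarms(cluster: str, service: str, threshold_pct: float = 85.0,
                       sustained_minutes: int = 3) -> list:
    """Hypothetical helper: one CPU and one memory alarm for an ECS service."""
    dimensions = [
        {"Name": "ClusterName", "Value": cluster},
        {"Name": "ServiceName", "Value": service},
    ]
    alarms = []
    for metric_name in ("CPUUtilization", "MemoryUtilization"):
        alarms.append({
            "AlarmName": f"{cluster}-{service}-{metric_name}-high",
            "Namespace": "AWS/ECS",
            "MetricName": metric_name,
            "Dimensions": dimensions,
            "Statistic": "Average",  # average of the service-level datapoints in each period
            "Period": 60,
            # Consecutive 1-minute breaches approximate "sustained for N minutes".
            "EvaluationPeriods": sustained_minutes,
            "DatapointsToAlarm": sustained_minutes,
            "Threshold": threshold_pct,
            "ComparisonOperator": "GreaterThanThreshold",
        })
    return alarms
```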

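ECS/ContainerInsights

RunningTaskCount

RunningTaskCount reports the number of tasks in the RUNNING state for an ECS service. CloudWatch Container Insights publishes it under the ECS/ContainerInsights namespace, not AWS/ECS, so Container Insights must be enabled on the cluster. Tasks that are not RUNNING are in another lifecycle state, such as PROVISIONING, PENDING, ACTIVATING, DEACTIVATING, STOPPING, DEPROVISIONING, or STOPPED.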

Threshold

< desired task count for the service
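
Metric math is the simplest way to compare the running count against the desired count. The sketch below assumes Container Insights is enabled on the cluster, which publishes both RunningTaskCount and DesiredTaskCount under the ECS/ContainerInsights namespace. The helper name and the five-period window are illustrative.

```python
def ecs_task_shortfall_alarm(cluster: str, service: str, evaluation_periods: int = 5) -> dict:
    """Hypothetical helper: alarm when DesiredTaskCount - RunningTaskCount > 0."""
    dimensions = [
        {"Name": "ClusterName", "Value": cluster},
        {"Name": "ServiceName", "Value": service},
    ]

    def container_insights(metric_id: str, metric_name: str) -> dict:
        return {
            "Id": metric_id,
            "MetricStat": {
                "Metric": {"Namespace": "ECS/ContainerInsights",
                           "MetricName": metric_name, "Dimensions": dimensions},
                "Period": 60,
                "Stat": "Average",
            },
            "ReturnData": False,
        }

    return {
        "AlarmName": f"{cluster}-{service}-task-shortfall",
        "Metrics": [
            container_insights("running", "RunningTaskCount"),
            container_insights("desired", "DesiredTaskCount"),
            {"Id": "shortfall", "Expression": "desired - running",
             "Label": "Tasks below desired", "ReturnData": True},
        ],
        # Require several consecutive breaches so that deployments and
        # scale-outs, where running briefly lags desired, do not alarm.
        "EvaluationPeriods": evaluation_periods,
        "DatapointsToAlarm": evaluation_periods,
        "Threshold": 0.0,
        "ComparisonOperator": "GreaterThanThreshold",
    }
```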

Application Load Balancer

AWS/ApplicationELB

HTTPCode_ELB_5XX_Count

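HTTPCode_ELB_5XX_Count counts HTTP 5XX response codes generated by the load balancer itself, not by the registered targets. These codes indicate that the ALB could not obtain a valid response from a target. Examples include 503 when the target group has no registered targets, 502 when a target closes the connection or returns a malformed response, and 504 when a target does not respond before the idle timeout.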

Threshold

> 0 in any 5-minute window
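
The ALB publishes count metrics only for periods that have data, so an alarm on `> 0` should treat missing data as not breaching. Here is a sketch with hypothetical helper names. It assumes you start from the load balancer's ARN, and the ARN below is a placeholder.

```python
def lb_dimension_from_arn(load_balancer_arn: str) -> str:
    # The LoadBalancer dimension is the ARN suffix after "loadbalancer/",
    # e.g. "app/my-load-balancer/50dc6c495c0c9188".
    return load_balancer_arn.split(":loadbalancer/", 1)[1]


def alb_elb_5xx_alarm(load_balancer_arn: str) -> dict:
    """Hypothetical helper: alarm on any ALB-generated 5XX in a 5-minute window."""
    return {
        "AlarmName": "alb-elb-5xx",
        "Namespace": "AWS/ApplicationELB",
        "MetricName": "HTTPCode_ELB_5XX_Count",
        "Dimensions": [{"Name": "LoadBalancer",
                        "Value": lb_dimension_from_arn(load_balancer_arn)}],
        "Statistic": "Sum",
        "Period": 300,
        "EvaluationPeriods": 1,
        "Threshold": 0.0,
        "ComparisonOperator": "GreaterThanThreshold",
        # A missing datapoint means no 5XX responses were reported.
        "TreatMissingData": "notBreaching",
    }


arn = "arn:aws:elasticloadbalancing:us-east-2:123456789012:loadbalancer/app/my-load-balancer/50dc6c495c0c9188"
print(alb_elb_5xx_alarm(arn)["Dimensions"][0]["Value"])  # app/my-load-balancer/50dc6c495c0c9188
```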

AWS/ApplicationELB

TargetResponseTime

TargetResponseTime measures the time elapsed, in seconds, from when the request leaves the load balancer until the target starts sending the response headers. Besides Average and Maximum, CloudWatch supports percentile statistics for this metric, such as p50, p90, p95, and p99.

Threshold

p99 TargetResponseTime > your defined SLA threshold (typically 1s for APIs, 3s for pages)

AWS/ApplicationELB

UnHealthyHostCount

UnHealthyHostCount reports the number of targets (EC2 instances, ECS tasks, Lambda functions) registered with the ALB target group that are currently failing health checks.

Threshold

> 0 (any unhealthy host in a production target group)
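
UnHealthyHostCount is reported per target group, so the alarm needs the TargetGroup dimension together with the LoadBalancer dimension. The sketch below derives both dimension values from ARNs. The helper names and placeholder ARNs are hypothetical.

```python
def elb_target_dimensions(load_balancer_arn: str, target_group_arn: str) -> list:
    # An ARN is arn:partition:service:region:account:resource; keep the resource.
    lb_resource = load_balancer_arn.split(":", 5)[5]  # loadbalancer/app/<name>/<id>
    tg_resource = target_group_arn.split(":", 5)[5]   # targetgroup/<name>/<id>
    return [
        {"Name": "TargetGroup", "Value": tg_resource},
        # The LoadBalancer dimension drops the leading "loadbalancer/".
        {"Name": "LoadBalancer", "Value": lb_resource.split("/", 1)[1]},
    ]


def unhealthy_host_alarm(load_balancer_arn: str, target_group_arn: str) -> dict:
    """Hypothetical helper: alarm as soon as any target fails health checks."""
    return {
        "AlarmName": "alb-unhealthy-hosts",
        "Namespace": "AWS/ApplicationELB",
        "MetricName": "UnHealthyHostCount",
        "Dimensions": elb_target_dimensions(load_balancer_arn, target_group_arn),
        "Statistic": "Maximum",
        "Period": 60,
        "EvaluationPeriods": 1,
        "Threshold": 0.0,
        "ComparisonOperator": "GreaterThanThreshold",
    }
```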

Amazon EC2

CPUUtilization

CPUUtilization measures the percentage of allocated EC2 compute units (vCPUs) that are in use on the instance, as reported by the hypervisor.

Threshold

> 80% for 15 consecutive minutes
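
The number of evaluation periods that make up 15 minutes depends on the instance's monitoring level. Basic monitoring publishes EC2 metrics every 5 minutes, and detailed monitoring publishes them every minute. Here is a sketch with a hypothetical helper that builds a boto3 `put_metric_alarm` payload.

```python
def ec2_cpu_alarm(instance_id: str, detailed_monitoring: bool,
                  threshold_pct: float = 80.0, sustained_seconds: int = 900) -> dict:
    """Hypothetical helper: CPU above threshold_pct for the whole sustained window."""
    # Match the period to the rate at which the instance publishes metrics.
    period = 60 if detailed_monitoring else 300
    n = sustained_seconds // period
    return {
        "AlarmName": f"{instance_id}-cpu-high",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": period,
        "EvaluationPeriods": n,
        "DatapointsToAlarm": n,
        "Threshold": threshold_pct,
        "ComparisonOperator": "GreaterThanThreshold",
    }


print(ec2_cpu_alarm("i-0123456789abcdef0", detailed_monitoring=False)["EvaluationPeriods"])  # 3
print(ec2_cpu_alarm("i-0123456789abcdef0", detailed_monitoring=True)["EvaluationPeriods"])   # 15
```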

NetworkIn

NetworkIn measures the number of bytes received by the instance on all network interfaces during the CloudWatch measurement period.

Threshold

Anomaly detection recommended over fixed threshold — alert when NetworkIn exceeds 3 standard deviations above the historical baseline for the same time of day

NetworkOut

NetworkOut measures the number of bytes sent by the instance on all network interfaces during the CloudWatch measurement period.

Threshold

Anomaly detection recommended — alert when NetworkOut exceeds 3 standard deviations above baseline for the same time of day and day of week
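
CloudWatch implements this with the ANOMALY_DETECTION_BAND metric math function. Its second argument is the band width in standard deviations, and the model learns from the metric's history, including daily and weekly patterns. The sketch below works for either NetworkIn or NetworkOut. The helper name and period settings are illustrative assumptions.

```python
def ec2_network_anomaly_alarm(instance_id: str, metric_name: str = "NetworkIn",
                              band_stddevs: int = 3, period: int = 300) -> dict:
    """Hypothetical helper: alarm when traffic rises above the anomaly band."""
    return {
        "AlarmName": f"{instance_id}-{metric_name.lower()}-anomaly",
        "Metrics": [
            {
                "Id": "m1",
                "MetricStat": {
                    "Metric": {"Namespace": "AWS/EC2", "MetricName": metric_name,
                               "Dimensions": [{"Name": "InstanceId", "Value": instance_id}]},
                    "Period": period,
                    "Stat": "Sum",  # bytes transferred per period
                },
                "ReturnData": True,
            },
            {
                "Id": "ad1",
                "Expression": f"ANOMALY_DETECTION_BAND(m1, {band_stddevs})",
                "Label": f"{metric_name} expected range",
                "ReturnData": True,
            },
        ],
        # The band, not a fixed number, is the threshold; only the upper edge alarms.
        "ThresholdMetricId": "ad1",
        "ComparisonOperator": "GreaterThanUpperThreshold",
        "EvaluationPeriods": 2,
    }
```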

StatusCheckFailed

StatusCheckFailed combines the results of both the instance status check (instance software and network configuration) and the system status check (underlying AWS host hardware). A value of 1 means at least one of these checks has failed.

Threshold

> 0 — alarm immediately

Amazon DynamoDB

AWS/DynamoDB

ConsumedReadCapacityUnits

ConsumedReadCapacityUnits reports the number of read capacity units consumed over the specified time period for a DynamoDB table or global secondary index.

Threshold

> 80% of provisioned read capacity (for provisioned mode tables)
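
Provisioned capacity is a per-second rate, while the Sum of ConsumedReadCapacityUnits totals the whole period, so the comparison has to account for the period length. The sketch below shows two options with hypothetical helper names. The first is a fixed Sum threshold. The second is a metric math utilization percentage that uses the PERIOD function and the ProvisionedReadCapacityUnits metric.

```python
def consumed_read_sum_threshold(provisioned_rcu: int, period_seconds: int, percent: int = 80) -> float:
    # Scale the per-second provisioned capacity up to a total for the period.
    return provisioned_rcu * period_seconds * percent / 100


def read_utilization_alarm(table_name: str, percent: float = 80.0, period: int = 300) -> dict:
    """Hypothetical helper: alarm on consumed reads as a percentage of provisioned."""
    dimensions = [{"Name": "TableName", "Value": table_name}]

    def metric(metric_id: str, metric_name: str, stat: str) -> dict:
        return {
            "Id": metric_id,
            "MetricStat": {
                "Metric": {"Namespace": "AWS/DynamoDB", "MetricName": metric_name,
                           "Dimensions": dimensions},
                "Period": period,
                "Stat": stat,
            },
            "ReturnData": False,
        }

    return {
        "AlarmName": f"{table_name}-read-capacity-high",
        "Metrics": [
            metric("consumed", "ConsumedReadCapacityUnits", "Sum"),
            metric("provisioned", "ProvisionedReadCapacityUnits", "Average"),
            # PERIOD(consumed) is the period in seconds, turning the Sum into a per-second rate.
            {"Id": "utilization",
             "Expression": "100 * (consumed / PERIOD(consumed)) / provisioned",
             "Label": "Read capacity utilization (%)", "ReturnData": True},
        ],
        "EvaluationPeriods": 1,
        "Threshold": percent,
        "ComparisonOperator": "GreaterThanThreshold",
    }


print(consumed_read_sum_threshold(100, 60))  # 4800.0
```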

AWS/DynamoDB

ThrottledRequests