Nuberio Diagnose · Root cause when alarms fire
Your phone buzzes. The diagnosis is already there.
The moment a CloudWatch alarm fires, Nuberio Diagnose reads your logs, CloudTrail, and resource state — then sends a plain-English root cause and numbered fix options to WhatsApp or Slack. Reply with a number. Go back to sleep. If you want to catch anomalies before they alarm, see Nuberio Watch.
Included in every plan, including the free tier for 1 AWS account. First diagnosis in under 60 seconds from alarm fire.
Nuberio
online
🚨 PROD-Lambda-Function-Errors fired
prod-api-order-processor · us-east-1 · production
Root cause (91% confidence) Memory exhaustion — Lambda hitting 512 MB limit under burst load. 340+ invocations timed out.
[ERROR] Task timed out after 900.00s
MemorySize: 512 MB · Used: 510 MB
Errors/min: 47 ↑ (threshold: 5)
💥 Impact: ~340 orders unprocessed in last 15 min
Fix steps:
1 → Increase Lambda memory to 1024 MB
2 → Enable SQS dead-letter queue
3 → Reduce batch size: 100 → 25
Reply: ACKNOWLEDGE · RESOLVE · SNOOZE 30
ACKNOWLEDGE
✓ Acknowledged · 03:15 AM
Auto-escalating to on-call in 20 min if not resolved. Reply RESOLVE when fixed.
The 3am ritual
CloudWatch sends you a wall of JSON. You're not awake yet.
A CloudWatch alarm tells you something crossed a threshold. It doesn't tell you why. So you open the AWS console on your phone, squint at four dashboards, scroll through CloudTrail, dig for log groups, and try to reconstruct what happened — all while half-asleep and increasingly aware that whatever you do next, you'll be doing it groggy.
First diagnosis
under 60 seconds from alarm fire
Reads automatically
logs, CloudTrail, and resource state
Delivered to
WhatsApp or Slack — your choice
One alarm. One answer.
Here's what CloudWatch sends — and what Nuberio sends instead.
What CloudWatch sends you
Raw noise. No context.
⚠ ALARM · us-east-1
Source: AWS CloudWatch
Time: 02:17 AM
Alert: CPUUtilization > 80%
Service: api-service (us-east-1)
Value: 96.4%
That's it: a metric, a threshold, and a number. No context. No fix. You still have to open CloudWatch, correlate four dashboards, and guess at the root cause, all at 3am.
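To make the gap concrete, here is a minimal sketch of what can be pulled out of the notification behind an alert like the one above. The field names follow the JSON that CloudWatch alarms publish through SNS. The alarm name, date, namespace, and dimension are made-up values chosen to match the example, not anything taken from a real account.

```python
import json

# Illustrative payload. Field names follow the CloudWatch alarm notification
# delivered through SNS; every value here is a hypothetical stand-in.
RAW_ALARM = json.dumps({
    "AlarmName": "api-service-cpu-high",  # hypothetical alarm name
    "NewStateValue": "ALARM",
    "NewStateReason": "Threshold Crossed: 1 datapoint [96.4] was greater than the threshold (80.0).",
    "StateChangeTime": "2024-01-01T02:17:00.000+0000",  # hypothetical date
    "Region": "US East (N. Virginia)",
    "Trigger": {
        "MetricName": "CPUUtilization",
        "Namespace": "AWS/ECS",  # assumption: the page does not say which service type
        "Statistic": "Average",
        "Dimensions": [{"name": "ServiceName", "value": "api-service"}],
        "Period": 60,
        "EvaluationPeriods": 1,
        "ComparisonOperator": "GreaterThanThreshold",
        "Threshold": 80.0,
    },
})

# Map CloudWatch comparison operators to short symbols for display.
OPERATORS = {
    "GreaterThanThreshold": ">",
    "GreaterThanOrEqualToThreshold": ">=",
    "LessThanThreshold": "<",
    "LessThanOrEqualToThreshold": "<=",
}


def summarize_alarm(raw: str) -> str:
    """Condense an alarm notification into one line: state, metric, threshold, resource."""
    alarm = json.loads(raw)
    trigger = alarm["Trigger"]
    op = OPERATORS.get(trigger["ComparisonOperator"], trigger["ComparisonOperator"])
    dims = ", ".join(f'{d["name"]}={d["value"]}' for d in trigger.get("Dimensions", []))
    return (
        f'{alarm["NewStateValue"]}: {trigger["MetricName"]} {op} '
        f'{trigger["Threshold"]:g} on {dims or "unknown resource"}'
    )


print(summarize_alarm(RAW_ALARM))
```

Everything the payload carries fits on that one line: which metric, which threshold, which resource. Nothing in it says what changed or why.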
What Nuberio sends you
Diagnosis. Next step. Done.
Nuberio
online
🚨 PROD-Lambda-Function-Errors fired
prod-api-order-processor · us-east-1 · production
Root cause (91% confidence) Memory exhaustion — Lambda hitting 512 MB limit under burst load. 340+ invocations timed out.
[ERROR] Task timed out after 900.00s
MemorySize: 512 MB · Used: 510 MB
Errors/min: 47 ↑ (threshold: 5)
💥 Impact: ~340 orders unprocessed in last 15 min
Fix steps:
1 → Increase Lambda memory to 1024 MB
2 → Enable SQS dead-letter queue
3 → Reduce batch size: 100 → 25
Reply: ACKNOWLEDGE · RESOLVE · SNOOZE 30
ACKNOWLEDGE
✓ Acknowledged · 03:15 AM
Auto-escalating to on-call in 20 min if not resolved. Reply RESOLVE when fixed.
Root cause already identified. Suggested fix already written. Reply 1 and go back to sleep.
How it works
From alert to fix in four steps.
Nuberio Diagnose runs automatically the moment a CloudWatch alarm fires — no manual trigger, no dashboard to open. Deploy a read-only CloudFormation template once, and every future alarm arrives with root cause already attached.
Connect AWS
Drop in a CloudFormation template. 2 minutes. No CLI required. Read-only access by default.
Alert fires
CloudWatch triggers Nuberio the moment a threshold is breached. Not after you've been paged.
AI investigates
While you're still waking up, Nuberio has already read the logs, checked what changed, and worked out why it broke.
You reply
Get the diagnosis on WhatsApp or Slack. Reply with a number. Nuberio confirms before acting.
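For readers wondering what "read-only access" could look like in practice, the sketch below builds an IAM policy document from a list of read-style actions and rejects anything that does not look like a read. The action names are real IAM actions, but the selection is an assumption about what a diagnosis tool might need. It is not Nuberio's actual CloudFormation template, and the verb-prefix check is a naming heuristic rather than a security guarantee.

```python
import json

# Hypothetical action list: real IAM action names, chosen for illustration only.
READ_ONLY_ACTIONS = [
    "logs:FilterLogEvents",
    "cloudtrail:LookupEvents",
    "cloudwatch:DescribeAlarms",
    "cloudwatch:GetMetricData",
    "ecs:DescribeServices",
    "rds:DescribeDBInstances",
    "lambda:GetFunctionConfiguration",
    "ec2:DescribeInstances",
    "elasticloadbalancing:DescribeTargetHealth",
    "sqs:GetQueueAttributes",
]

# API-name prefixes treated as read operations (a heuristic, not an AWS rule).
READ_VERBS = ("Describe", "Get", "List", "Filter", "Lookup")


def is_read_only(action: str) -> bool:
    """True when the API name after the service prefix starts with a read verb."""
    _, _, api = action.partition(":")
    return api.startswith(READ_VERBS)


def build_policy(actions):
    """Return an IAM policy document, refusing any action that is not read-style."""
    rejected = [a for a in actions if not is_read_only(a)]
    if rejected:
        raise ValueError(f"not read-only: {rejected}")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": sorted(actions), "Resource": "*"}
        ],
    }


print(json.dumps(build_policy(READ_ONLY_ACTIONS), indent=2))
```

Passing a mutating action such as ec2:TerminateInstances raises ValueError before any policy is produced, which is the property a read-only integration is meant to preserve.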
Your alarm, your words.
Reply to any alert with plain English. No dashboards, no runbooks.
ACKNOWLEDGE
You're on it. Stops reminder notifications.
INVESTIGATE
Triggers deep-dive analysis with full log context.
RESOLVE
Marks it resolved and logs time to fix.
ADJUST
Gets an AI-recommended threshold change. Reply YES to apply.
SILENCE
Suppresses a noisy alarm until you turn it back on. Reply WATCH to re-enable.
WATCH
Re-enables a silenced alarm.
STATUS
Returns a live summary of active alarms and diagnoses.
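As a rough illustration of how replies like these could be interpreted, the sketch below parses a message into an action and an optional number. The keyword set comes from the commands listed above, plus SNOOZE with a minute count as shown in the sample alert. The Reply type, the "FIX" action name for numbered replies, and the rule that unrecognized text returns None are all assumptions made for the example, not a description of Nuberio's parser.

```python
from dataclasses import dataclass
from typing import Optional

# Keywords taken from the command list above; YES confirms an ADJUST suggestion.
KEYWORDS = {
    "ACKNOWLEDGE", "INVESTIGATE", "RESOLVE", "ADJUST",
    "SILENCE", "WATCH", "STATUS", "YES",
}


@dataclass(frozen=True)
class Reply:
    action: str                     # keyword, "SNOOZE", or "FIX" (hypothetical name)
    argument: Optional[int] = None  # minutes for SNOOZE, option number for FIX


def parse_reply(text: str) -> Optional[Reply]:
    """Parse a chat reply; return None when the text is not a known command."""
    parts = text.strip().upper().split()
    if not parts:
        return None
    head, rest = parts[0], parts[1:]
    if head.isdecimal() and not rest:
        # A bare number picks one of the numbered fix options.
        return Reply("FIX", int(head))
    if head == "SNOOZE" and len(rest) == 1 and rest[0].isdecimal():
        return Reply("SNOOZE", int(rest[0]))
    if head in KEYWORDS and not rest:
        return Reply(head)
    return None


for message in ["acknowledge", "SNOOZE 30", "1", "reboot it"]:
    print(repr(message), "->", parse_reply(message))
```

Returning None for anything unrecognized keeps the confirm-before-acting behavior simple: a reply that does not parse never triggers an action.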
What Nuberio Diagnose reads
Every signal your alarm needs.
When an alarm fires, Nuberio Diagnose investigates in parallel — pulling whatever signals are relevant to the metric that triggered it. Most diagnoses complete in under 60 seconds.
CloudWatch Logs
Error, warning, timeout, OOM, and panic log lines from the affected resource in the last 15 minutes. Covers ECS, Lambda, RDS, EC2, ALB, EKS, API Gateway, and SNS log groups.
CloudTrail
Recent API calls that may have caused the issue — deploys, config changes, IAM modifications — in the last 2 hours.
Resource state
Current state of the resource at the moment of diagnosis:
- ECS: running vs desired vs pending task count, and active deployments
- RDS: status and Multi-AZ
- Lambda: state, concurrency, and timeout
- EC2: state and instance type
- ALB: target health per target group
- EKS: cluster status and node groups
- SQS: messages visible, in-flight, and delayed
Related metrics
Other CloudWatch metrics on the same resource in the last 30 minutes — to surface correlated signals the triggering alarm didn't capture.
GuardDuty, Security Hub, and Inspector
Active findings from all three services, filtered to the specific resource that triggered the alarm wherever its ARN can be matched.
AWS Health
Open regional or service-wide events from AWS Health — to distinguish your incident from an AWS-side outage before waking anyone up.
Service quota utilization
How close each AWS service in your account is to its limit — identifies quota exhaustion as a potential cause of throttling, rejections, or request failures.
Certificate expiry
Any TLS certificates approaching expiry — correlated against the alarm when the failure looks like a connection, SSL handshake, or certificate error.
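The lookback windows described above (15 minutes of logs, 2 hours of CloudTrail, 30 minutes of related metrics) can be sketched as time ranges anchored at the moment the alarm fired. The log filter uses the categories the text names: error, warning, timeout, OOM, and panic. The exact regular expression, the function names, and the sample log lines are assumptions for illustration only.

```python
import re
from datetime import datetime, timedelta, timezone

# Lookback windows as stated in the text.
LOOKBACK = {
    "cloudwatch_logs": timedelta(minutes=15),
    "cloudtrail": timedelta(hours=2),
    "related_metrics": timedelta(minutes=30),
}

# Assumed pattern for the log categories named in the text.
LOG_PATTERN = re.compile(
    r"\b(?:error|warn(?:ing)?|timed? ?out|oom\w*|out of memory|panic)\b",
    re.IGNORECASE,
)


def query_windows(alarm_time: datetime) -> dict:
    """Return a (start, end) range per signal source, ending at the alarm time."""
    return {
        source: (alarm_time - span, alarm_time)
        for source, span in LOOKBACK.items()
    }


def relevant_lines(lines):
    """Keep only log lines that mention one of the failure categories."""
    return [line for line in lines if LOG_PATTERN.search(line)]


alarm_time = datetime(2024, 1, 1, 3, 0, tzinfo=timezone.utc)  # hypothetical alarm time
for source, (start, end) in query_windows(alarm_time).items():
    print(f"{source}: {start.isoformat()} to {end.isoformat()}")

sample = [
    "[ERROR] Task timed out after 900.00s",
    "INFO request completed in 12ms",
    "WARNING: connection pool at 90%",
    "INFO zoom level changed",
]
print(relevant_lines(sample))
```

Anchoring every window at the same alarm timestamp keeps the sources comparable: a deploy seen in CloudTrail and an error burst in the logs can be placed on one timeline.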
Coverage
What Nuberio Diagnose handles.
Nuberio Diagnose works with CloudWatch alarms on any of these services. Connect your AWS account once — every alarm on every resource is automatically eligible.
Compute and containers
- EC2
- ECS (Fargate and EC2-backed)
- Lambda
- EKS
Database and cache
- RDS (all engines)
- Aurora
- DynamoDB
- ElastiCache
Network and edge
- ALB / NLB
- API Gateway
- CloudFront
Storage and messaging
- S3
- EBS
- SQS
- SNS
Plus security findings from GuardDuty, Security Hub, Inspector, and Trusted Advisor.
Pricing
Diagnose is included free.
Nuberio Diagnose is free for 1 AWS account, with no per-investigation pricing and no trial expiry. Need more accounts? Growth is $49/mo for up to 5 accounts. WhatsApp and Slack are both supported on all tiers.
Common questions about Nuberio Diagnose.
Answers about diagnosis speed, accuracy, remediation permissions, and how Diagnose works alongside PagerDuty, Slack, and WhatsApp.
Alarm guides
Common alarms Nuberio diagnoses
Not sure what threshold to set, or what it means when one of these fires? Each guide covers the recommended alarm configuration, common root causes, and how Nuberio investigates it.
ECS RunningTaskCount
Detects tasks below desired count — crash loops, bad deploys, OOM kills.
RDS DatabaseConnections
Connection saturation before max_connections is reached.
EC2 StatusCheckFailed
Hardware and OS-level failures requiring immediate action.
ALB UnHealthyHostCount
Targets failing health checks — capacity cascade risk.
Lambda Errors
Function-level failures, timeouts, and OOM kills.
DynamoDB ConsumedReadCapacityUnits
Throttle risk and cost spikes from hot partitions or scans.
Connect AWS. Get your first diagnosis within 10 minutes of setup.
Read-only CloudFormation template. No CLI. No agent. Revoke access from your IAM console in 30 seconds.
Free for 1 AWS account. Growth is $49/mo for up to 5 accounts. Cancel any time.