Self-service actions & Day 2 Ops
Not live: Governed self-service actions for Day 2 operations are not available in the product yet. This page describes the intended direction only; timelines and scope will be announced when features ship.
In the Day 2 Ops sense, self-service actions let engineers and operators trigger or request production changes through governed paths (policy checks, approvals where required, time bounds, and audit history) instead of relying only on ticket-driven handoffs.
That work sits alongside monitoring, incidents, status communication, and the service catalog so the same platform that shows health can eventually authorize and record safe change.
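As a rough illustration of what a governed path could involve, the sketch below combines an approval check, a time bound, and an audit record in one decision step. Every name here (`ActionRequest`, `Policy`, `authorize`, `audit_log`) is hypothetical and invented for this example; none of it reflects a shipped or planned API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class ActionRequest:
    # Hypothetical schema; the product does not define these fields.
    action: str
    requester: str
    target: str
    approvals: set[str] = field(default_factory=set)


@dataclass
class Policy:
    required_approvals: int  # 0 means no approval is needed
    max_duration: timedelta  # time bound on the authorized change


audit_log: list[dict] = []  # in-memory stand-in for durable audit history


def authorize(request: ActionRequest, policy: Policy, now: datetime) -> dict:
    """Decide whether a request may run, and record the decision either way."""
    # Approvals from the requester do not count toward the requirement.
    independent = request.approvals - {request.requester}
    allowed = len(independent) >= policy.required_approvals
    decision = {
        "action": request.action,
        "target": request.target,
        "requester": request.requester,
        "allowed": allowed,
        "expires_at": now + policy.max_duration if allowed else None,
        "decided_at": now,
    }
    audit_log.append(decision)  # denied requests are audited too
    return decision
```

The point of the sketch is that the denial path and the approval path both leave an audit entry, so the history covers what was attempted as well as what ran.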
Examples (illustrative)
The list below is not a commitment to ship every item; it shows the kind of work teams often want on rails after launch.
Deployments, runtime, and capacity
- Roll forward to a newer release or roll back to a prior artifact, optionally with canary or progressive delivery and automatic promotion gates.
- Restart a service, job worker, or sidecar (for example after a bad config cache) without a full redeploy.
- Scale replica count, queue consumers, or autoscaling bounds within policy (min/max, cost caps).
- Drain or cordon a host or availability zone before maintenance, then uncordon when checks pass.
- Run a one-off task in the right cluster and namespace (migration runner, reprocessor) from an approved image and command template.
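To make "within policy" concrete for the scaling example above, here is a minimal sketch of a guardrail check on a requested replica count. The `ScalingPolicy` fields and the `check_scale_request` function are assumptions made for illustration, not a product schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingPolicy:
    # Hypothetical guardrails; names and units are assumptions.
    min_replicas: int
    max_replicas: int
    hourly_cost_per_replica: float
    hourly_cost_cap: float


def check_scale_request(requested: int, policy: ScalingPolicy) -> list[str]:
    """Return policy violations; an empty list means the request passes."""
    violations = []
    if requested < policy.min_replicas:
        violations.append(f"below minimum of {policy.min_replicas} replicas")
    if requested > policy.max_replicas:
        violations.append(f"above maximum of {policy.max_replicas} replicas")
    if requested * policy.hourly_cost_per_replica > policy.hourly_cost_cap:
        violations.append("projected hourly cost exceeds the cap")
    return violations
```

Returning every violation at once, rather than failing on the first, would let a requester fix the request in one pass.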
Data, caches, and messaging
- Kick off a backfill or reindex with row limits, rate limits, and a kill switch.
- Invalidate or warm caches (CDN, application cache) for a key prefix or named bucket.
- Replay messages from a dead-letter queue to a target topic with a max batch size.
- Pause or resume a consumer group when upstream is unhealthy (with alerting tied to the action).
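The dead-letter replay item above can be sketched as a bounded loop with a kill switch. This is an in-memory illustration only: the `deque` stands in for a real queue, and `publish` and `should_stop` are hypothetical callables supplied by the caller, not part of any broker client.

```python
from collections import deque
from typing import Callable


def replay_dead_letters(
    dead_letters: deque,
    publish: Callable[[object], None],
    max_batch_size: int,
    should_stop: Callable[[], bool] = lambda: False,
) -> int:
    """Move up to max_batch_size messages to the target, oldest first."""
    replayed = 0
    while dead_letters and replayed < max_batch_size:
        if should_stop():  # kill switch, checked before each message
            break
        message = dead_letters[0]
        publish(message)  # if publish raises, the message stays queued
        dead_letters.popleft()
        replayed += 1
    return replayed
```

Removing a message only after `publish` returns means a failed publish leaves it in the dead-letter queue for a later attempt.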
Access, break-glass, and operations hygiene
- Request time-bound production access (SSH, cloud shell, database read-only session) with approvers, TTL, and automatic revocation.
- Rotate an API key or database credential through the secret store’s workflow instead of pasting values in chat.
- Open or close a maintenance window and tie it to status page messaging in one flow.
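For the time-bound access item above, a minimal sketch of a grant with a TTL and a revocation sweep might look like the following. `AccessGrant`, `grant_access`, and `revoke_expired` are hypothetical names; a real system would also need to tear down the underlying session or credential.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class AccessGrant:
    # Hypothetical record; field names are assumptions.
    user: str
    resource: str
    approver: str
    expires_at: datetime


def grant_access(
    user: str, resource: str, approver: str, ttl: timedelta, now: datetime
) -> AccessGrant:
    """Create a grant that expires ttl after now; self-approval is rejected."""
    if approver == user:
        raise ValueError("requester cannot approve their own access")
    return AccessGrant(user, resource, approver, now + ttl)


def revoke_expired(grants: list[AccessGrant], now: datetime) -> list[AccessGrant]:
    """Remove expired grants in place and return the ones that were revoked."""
    expired = [g for g in grants if g.expires_at <= now]
    grants[:] = [g for g in grants if g.expires_at > now]
    return expired
```

Running `revoke_expired` on a schedule is one way to get automatic revocation without relying on the requester to hand access back.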
Configuration, flags, and integrations
- Change a feature flag or dynamic config in a controlled environment order (staging → canary → full) with required reviewers.
- Update an integration webhook URL, OAuth client, or third-party connector with validation and audit.
- Adjust rate limits or circuit breaker thresholds when traffic patterns change, within guardrails.
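The staging → canary → full order in the flag example above could be enforced with a small gate like the one below. The stage names come from that example; the function name and the reviewer count parameter are assumptions for illustration.

```python
ROLLOUT_ORDER = ["staging", "canary", "full"]  # order from the example above


def can_apply(
    stage: str,
    completed: set[str],
    reviewers: set[str],
    required_reviewers: int,
) -> bool:
    """Allow a change in a stage only after earlier stages and enough reviews."""
    if stage not in ROLLOUT_ORDER:
        raise ValueError(f"unknown stage: {stage}")
    earlier = ROLLOUT_ORDER[: ROLLOUT_ORDER.index(stage)]
    earlier_done = all(s in completed for s in earlier)
    return earlier_done and len(reviewers) >= required_reviewers
```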
Network, edge, and certificates
- Request a certificate renewal or attach a renewed cert to a load balancer or ingress.
- Toggle or stage WAF rules, IP allow lists, or geofencing when security approves.
- Coordinate a DNS or traffic shift (weighted records, service mesh route) with pre-checks and rollback.
Catalog and ownership
- Propose updates to service metadata (owners, tiers, dependencies, runbooks) so incidents and routing stay accurate.
- Link a runbook or on-call rotation to a component so self-service actions inherit the right context and escalation paths.
For background on why this class of work dominates after launch, see "Developer autonomy and the work that repeats after ship" on the Exemplar blog.