Course Outline

Foundations of Cloud Operations on AWS

  • Operational roles and responsibilities in the cloud
  • AWS account structure, organizations, and multi-account strategy
  • Core operational services: CloudWatch, CloudTrail, AWS Config

Infrastructure as Code and Provisioning

  • Principles of IaC and immutable infrastructure
  • Provisioning with Terraform and AWS CloudFormation
  • Managing state, modules, and environment promotion

CI/CD and Deployment Strategies

  • Designing CI/CD pipelines for cloud-native apps
  • Blue/green, canary, and rolling deployments
  • Automating rollback, health checks, and release validation

Monitoring, Observability, and Alerting

  • Metrics, logs, and traces: shipping, storing, and analyzing telemetry
  • Using CloudWatch, X-Ray, and third-party observability tools
  • Defining SLOs/SLIs, alerting policies, and on-call practices

Security Operations and Identity Management

  • IAM best practices, least privilege, and cross-account access
  • Secrets management, KMS, and secure parameter stores
  • Operational security: patching strategies, vulnerability scanning, and audit trails

Resilience, Backup, and Disaster Recovery

  • Designing for fault tolerance and high availability
  • Backup strategies, snapshot automation, and restore procedures
  • Disaster recovery planning and runbook creation

Cost Optimization and Governance

  • Cost visibility: billing, tagging, and cost allocation strategies
  • Rightsizing, Reserved Instances and Savings Plans, and budgeting controls
  • Governance: policies, guardrails, and automation for compliance

Containers, Serverless, and Runtime Operations

  • Operational considerations for ECS, EKS, and Lambda
  • Service discovery, autoscaling, and resource limits
  • Logging, tracing, and debugging containerized workloads

Incident Response, Playbooks, and Chaos Engineering

  • Runbook-driven incident response and postmortem practices
  • Automating remediation and self-healing patterns
  • Introduction to chaos experiments for validating resilience

Hands-on Workshop: Operate a Sample Workload

  • Deploy a sample application using IaC and a CI/CD pipeline
  • Implement monitoring, alerts, and an automated remediation script
  • Simulate incidents and practice runbook-based response

Summary and Next Steps

Requirements

  • A basic understanding of cloud concepts and networking
  • Familiarity with the Linux command line and scripting
  • Experience with source control (Git) and basic CI/CD concepts

Audience

  • Cloud operations engineers
  • SREs and platform engineers
  • DevOps engineers and technical team leads
Duration: 21 hours
