
SOA-C03 Deep Dive: Monitoring, Logging and Automated Deployment (Domains 1 & 3)

Master the two heaviest SOA-C03 domains: Monitoring, Logging, Analysis & Remediation (22%) and Deployment, Provisioning & Automation (22%). Covers CloudWatch, Container Insights, X-Ray, CDK, Systems Manager, and CodeDeploy.

Domains 1 and 3 of the SOA-C03 (AWS Certified CloudOps Engineer -- Associate) exam each carry 22% of the total score. Together, they represent 44% of the exam -- nearly half your grade. If you master these two domains, you have a massive advantage.

This deep dive covers every critical service, concept, and comparison you need. We include code examples, decision tables, and the specific details AWS loves to test. Bookmark this page and return to it during your final review.

Domain 1: Monitoring, Logging, Analysis & Remediation (22%)

This domain tests your ability to collect metrics and logs, analyze them, and automatically remediate issues. The keyword is "remediation" -- AWS wants to see that you do not just detect problems, you fix them with automation.

CloudWatch: Metrics, Alarms, and Logs Insights

Metrics fundamentals: CloudWatch collects metrics from AWS services automatically. EC2 sends metrics every 5 minutes (basic monitoring) or every 1 minute (detailed monitoring, extra cost). Custom metrics can be published at resolutions down to 1 second (high-resolution metrics). The exam often tests whether you know that memory utilization and disk space are NOT default EC2 metrics -- you need the CloudWatch Agent to collect them.
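As a rough sketch of the custom-metric path described above, the snippet below builds a PutMetricData request for a memory metric at 1-second resolution. The namespace "Custom/App" and the helper names are hypothetical; in practice the CloudWatch Agent normally publishes memory and disk metrics for you, and the client would be something like boto3.client("cloudwatch").

```python
from datetime import datetime, timezone


def build_memory_metric(instance_id: str, used_percent: float) -> dict:
    """Build keyword arguments for the CloudWatch PutMetricData call."""
    return {
        "Namespace": "Custom/App",  # hypothetical namespace, chosen for illustration
        "MetricData": [
            {
                "MetricName": "MemoryUtilization",
                "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
                "Timestamp": datetime.now(timezone.utc),
                "Value": used_percent,
                "Unit": "Percent",
                # 1 = high-resolution (1-second) metric; 60 = standard resolution
                "StorageResolution": 1,
            }
        ],
    }


def publish(client, instance_id: str, used_percent: float) -> None:
    """Send the metric through any object exposing put_metric_data."""
    client.put_metric_data(**build_memory_metric(instance_id, used_percent))
```

Keeping the payload builder separate from the call makes it easy to inspect what would be sent before any request leaves the machine.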

Alarms: A CloudWatch alarm watches a single metric or a metric math expression and changes state when the value crosses a threshold. Key states: OK, ALARM, INSUFFICIENT_DATA. Alarms can trigger SNS notifications, Auto Scaling actions, or EC2 actions (stop, terminate, reboot, recover).

Composite alarms: Combine multiple alarms with AND/OR logic. For the exam, remember that composite alarms reduce alarm noise -- instead of getting paged for every individual metric, you can create a composite that only fires when CPU is high AND request latency is high AND error rate exceeds 5%.
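To make the AND logic concrete, here is a minimal sketch that assembles the rule expression a composite alarm uses and the arguments for PutCompositeAlarm. The alarm names and the SNS topic ARN in the usage lines are hypothetical, and the child alarms are assumed to exist already.

```python
from __future__ import annotations


def composite_rule(alarm_names: list[str], operator: str = "AND") -> str:
    """Join child alarms into a composite alarm rule expression."""
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    if not alarm_names:
        raise ValueError("at least one child alarm is required")
    return f" {operator} ".join(f'ALARM("{name}")' for name in alarm_names)


def build_composite_alarm(name: str, alarm_names: list[str], sns_topic_arn: str) -> dict:
    """Build keyword arguments for the CloudWatch PutCompositeAlarm call."""
    return {
        "AlarmName": name,
        "AlarmRule": composite_rule(alarm_names, "AND"),
        "AlarmActions": [sns_topic_arn],  # notify only when every child is in ALARM
    }


# Hypothetical child alarm names, for illustration only.
rule = composite_rule(["cpu-high", "latency-high", "error-rate-high"])
print(rule)
```

With a client such as boto3.client("cloudwatch"), the dictionary from build_composite_alarm could be passed as client.put_composite_alarm(**kwargs).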

Logs Insights: A query language for CloudWatch Logs. The exam may show you a Logs Insights query and ask what it does, or ask you which query would solve a problem. Key syntax:

fields @timestamp, @message
| filter @message like /ERROR/
| stats count(*) as errorCount by bin(5m)
| sort errorCount desc
| limit 20

This query keeps only messages containing ERROR, groups them into 5-minute buckets, counts the matches in each bucket, and returns the 20 buckets with the highest error counts. You do not need to memorize syntax, but you should be able to read a query and understand its output.
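If the pipeline is hard to picture, the following plain-Python sketch mimics what each stage of that query computes on a handful of in-memory log events. It is only an emulation for study purposes, not how Logs Insights is implemented, and the sample events are made up.

```python
from collections import Counter

BIN_SECONDS = 5 * 60  # bin(5m)


def error_counts_by_bin(events, limit=20):
    """events: iterable of (epoch_seconds, message) pairs."""
    counts = Counter()
    for timestamp, message in events:
        if "ERROR" in message:  # filter @message like /ERROR/
            bucket_start = timestamp - timestamp % BIN_SECONDS
            counts[bucket_start] += 1  # stats count(*) ... by bin(5m)
    # sort errorCount desc, then limit 20
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]


# Made-up sample events: (seconds since epoch, log line)
sample = [
    (0, "ERROR db timeout"),
    (10, "INFO request ok"),
    (299, "ERROR db timeout"),
    (300, "ERROR cache miss storm"),
    (900, "INFO request ok"),
]
print(error_counts_by_bin(sample))  # → [(0, 2), (300, 1)]
```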

Container Insights for ECS and EKS

CloudWatch Container Insights collects, aggregates, and summarizes metrics and logs from containerized applications. This is a major addition to the SOA-C03 exam.

For ECS: Container Insights is enabled at the cluster level (account setting or cluster setting). It collects CPU, memory, network, and storage metrics at the task, service, and cluster levels. For Fargate tasks, it collects metrics automatically once enabled.
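The two ways of enabling it can be sketched as request builders for the ECS API: one for a single cluster (UpdateClusterSettings) and one for the account default (PutAccountSettingDefault). The cluster name in the usage line is hypothetical, and the client is assumed to be something like boto3.client("ecs").

```python
def cluster_insights_request(cluster_name: str, enabled: bool = True) -> dict:
    """Arguments for ecs.update_cluster_settings on one existing cluster."""
    return {
        "cluster": cluster_name,
        "settings": [
            {"name": "containerInsights", "value": "enabled" if enabled else "disabled"}
        ],
    }


def account_default_request(enabled: bool = True) -> dict:
    """Arguments for ecs.put_account_setting_default (applies to new clusters)."""
    return {"name": "containerInsights", "value": "enabled" if enabled else "disabled"}


def enable_for_cluster(ecs_client, cluster_name: str):
    """Turn on Container Insights for one cluster via the supplied client."""
    return ecs_client.update_cluster_settings(**cluster_insights_request(cluster_name))


print(cluster_insights_request("prod-cluster"))  # "prod-cluster" is a placeholder
```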

For EKS: Requires the CloudWatch Agent or the AWS Distro for OpenTelemetry (ADOT) collector running as a DaemonSet on your nodes. It provides pod-level, node-level, and cluster-level metrics.

Exam Tip: The exam may ask: "How do you monitor memory utilization for ECS tasks running on Fargate?" The answer is Container Insights. The standard CloudWatch metrics for ECS report CPU and memory at the cluster and service level only, so task-level memory metrics for Fargate require Container Insights.

X-Ray, Application Signals, and DevOps Guru

AWS X-Ray: Distributed tracing for microservices. X-Ray creates a service map showing how requests flow through your application. It uses the X-Ray SDK (or ADOT) to instrument your code and the X-Ray daemon to send trace segments to the X-Ray API. Key concepts: traces, segments, subsegments, annotations (indexed, searchable), and metadata (not indexed).
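To show where annotations and metadata sit, here is a hand-built sketch of a segment document. The X-Ray SDK or ADOT normally generates this for you; the service name, annotation key, and durations below are hypothetical and chosen only for illustration.

```python
import json
import os
import time


def new_trace_id(now: float) -> str:
    """Trace ID layout: version 1, 8 hex digits of epoch time, 24 random hex digits."""
    return f"1-{int(now):08x}-{os.urandom(12).hex()}"


def build_segment(service: str, order_id: str, debug_payload: dict) -> dict:
    start = time.time()
    return {
        "name": service,
        "id": os.urandom(8).hex(),  # 16 hex digit segment ID
        "trace_id": new_trace_id(start),
        "start_time": start,
        "end_time": start + 0.25,  # hypothetical 250 ms request
        # Annotations are indexed, so traces can be searched by these values.
        "annotations": {"order_id": order_id},
        # Metadata is stored with the trace but is not indexed.
        "metadata": {"debug": debug_payload},
        "subsegments": [
            {
                "name": "DynamoDB",  # a downstream call made while handling the request
                "id": os.urandom(8).hex(),
                "start_time": start + 0.05,
                "end_time": start + 0.15,
                "namespace": "aws",
            }
        ],
    }


segment = build_segment("checkout-service", "A-1001", {"cart_items": 3})
print(json.dumps(segment, indent=2))
```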

Application Signals: A newer CloudWatch feature that provides SLO (Service Level Objective) monitoring. It automatically discovers application services and creates dashboards showing latency, error rate, and availability against your defined SLOs. Know that it works with applications instrumented with ADOT or the CloudWatch Agent.

DevOps Guru: Uses machine learning to detect anomalous operational behavior. It analyzes CloudWatch metrics, Config changes, and CloudTrail events to proactively identify issues. DevOps Guru generates "insights" -- either reactive (something already went wrong) or proactive (something is likely to go wrong). For the exam, remember that DevOps Guru can be scoped to specific CloudFormation stacks or tags.

CloudTrail: Records API calls (management events by default, data events optionally). Event History lets you view the last 90 days of management events at no charge. For the exam, know the difference: management events capture control-plane actions (CreateBucket, RunInstances), while data events capture data-plane actions (GetObject, PutItem) and cost extra.
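The management/data split shows up directly in a trail's event selectors. The sketch below builds a basic selector that keeps management events and adds S3 object-level data events for one bucket; the bucket and trail names are hypothetical, and the client is assumed to be something like boto3.client("cloudtrail").

```python
def data_event_selectors(bucket_name: str) -> list:
    """Basic event selectors: management events plus S3 data events for one bucket."""
    return [
        {
            "ReadWriteType": "All",
            "IncludeManagementEvents": True,  # control-plane calls such as CreateBucket
            "DataResources": [
                {
                    "Type": "AWS::S3::Object",  # data-plane calls such as GetObject
                    "Values": [f"arn:aws:s3:::{bucket_name}/"],
                }
            ],
        }
    ]


def enable_s3_data_events(cloudtrail_client, trail_name: str, bucket_name: str):
    """Apply the selectors to an existing trail via the supplied client."""
    return cloudtrail_client.put_event_selectors(
        TrailName=trail_name,
        EventSelectors=data_event_selectors(bucket_name),
    )


print(data_event_selectors("example-uploads"))  # placeholder bucket name
```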

Domain 3: Deployment, Provisioning & Automation (22%)

This domain focuses on how you deploy, update, and manage AWS infrastructure and applications. The 2026 exam places heavy emphasis on CDK and Systems Manager.

CDK vs CloudFormation: The Full Picture

A critical concept for the exam: CDK does NOT replace CloudFormation. CDK is a framework that lets you define infrastructure using programming languages (TypeScript, Python, Java, C#, Go). When you run cdk synth, CDK generates a CloudFormation template. When you run cdk deploy, it submits that template to CloudFormation for execution.

CDK Construct Levels:

  • L1 (Cfn resources): Direct 1:1 mapping to CloudFormation resources. Named CfnXxx. No defaults, you configure everything manually.
  • L2 (Curated): Higher-level abstractions with sensible defaults. For example, s3.Bucket() creates a bucket with encryption enabled by default. Most commonly used on the exam.
  • L3 (Patterns): Complete architectures. For example, ecs_patterns.ApplicationLoadBalancedFargateService() creates an ECS Fargate service behind an ALB with all networking configured.

CDK L2 Construct Example (Python):

from aws_cdk import (
    Stack,
    aws_s3 as s3,
    aws_lambda as _lambda,
    aws_s3_notifications as s3n,
    RemovalPolicy,
    Duration,
)
from constructs import Construct

class ImageProcessingStack(Stack):
    def __init__(self, scope: Construct, id: str, **kwargs):
        super().__init__(scope, id, **kwargs)

        # L2 construct: bucket with encryption + lifecycle
        bucket = s3.Bucket(self, "UploadBucket",
            encryption=s3.BucketEncryption.S3_MANAGED,
            removal_policy=RemovalPolicy.DESTROY,
            lifecycle_rules=[
                s3.LifecycleRule(
                    transitions=[
                        s3.Transition(
                            storage_class=s3.StorageClass.GLACIER,
                            transition_after=Duration.days(90),
                        )
                    ]
                )
            ],
        )

        # L2 construct: Lambda function
        processor = _lambda.Function(self, "Processor",
            runtime=_lambda.Runtime.PYTHON_3_12,
            handler="index.handler",
            code=_lambda.Code.from_asset("lambda"),
        )

        # Event notification: S3 triggers Lambda
        bucket.add_event_notification(
            s3.EventType.OBJECT_CREATED,
            s3n.LambdaDestination(processor),
        )

Notice how the L2 constructs handle encryption, lifecycle, and permissions with minimal code. The add_event_notification call also adds the resource-based permission that allows S3 to invoke the Lambda function, so you do not write that policy yourself. This is the power of CDK, and this is what the exam tests.

Systems Manager: The Operational Swiss Army Knife

AWS Systems Manager (SSM) is an enormous service with many sub-features. Here are the ones most tested on SOA-C03:

  • Run Command: Execute commands on managed instances (EC2 or on-premises) without SSH/RDP. Uses SSM Agent (pre-installed on Amazon Linux 2/2023 and Windows AMIs). Commands are defined in SSM Documents.
  • Patch Manager: Automate OS and application patching. Define patch baselines (which patches to approve), patch groups (which instances to patch), and maintenance windows (when to patch).
  • State Manager: Ensure instances maintain a desired configuration over time. Associates an SSM Document with target instances and runs it on a schedule.
  • Fleet Manager: A unified UI to manage remote instances. View file systems, performance counters, Windows Registry, and more.
  • Automation: Run multi-step operational runbooks. Common use case: create a golden AMI by launching an instance, installing software, running tests, and creating an AMI -- all automated.
  • Parameter Store: Secure, hierarchical key-value storage for configuration data and secrets. Free for standard parameters (up to 10,000). Supports encryption with KMS.
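As one illustration of the hierarchical storage mentioned for Parameter Store, the sketch below loads every parameter under a path with GetParametersByPath and follows NextToken until the pages run out. The path "/myapp/prod" is hypothetical, and the client is assumed to be something like boto3.client("ssm").

```python
def load_config(ssm_client, path: str) -> dict:
    """Return {relative_name: value} for all parameters under the given path."""
    config = {}
    kwargs = {
        "Path": path,
        "Recursive": True,        # include nested levels such as /myapp/prod/db/url
        "WithDecryption": True,   # decrypt SecureString values with KMS
    }
    while True:
        page = ssm_client.get_parameters_by_path(**kwargs)
        for parameter in page["Parameters"]:
            relative_name = parameter["Name"][len(path):].lstrip("/")
            config[relative_name] = parameter["Value"]
        token = page.get("NextToken")
        if not token:
            return config
        kwargs["NextToken"] = token  # request the next page
```

A call such as load_config(boto3.client("ssm"), "/myapp/prod") would then return keys like "db/url", assuming parameters with those names exist.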

Systems Manager Run Command Example:

aws ssm send-command \
  --document-name "AWS-RunShellScript" \
  --targets "Key=tag:Environment,Values=Production" \
  --parameters 'commands=["sudo yum update -y","sudo systemctl restart httpd"]' \
  --comment "Patch and restart Apache on all prod servers" \
  --timeout-seconds 600 \
  --max-concurrency "50%" \
  --max-errors "10%"

Key details the exam tests: --max-concurrency controls how many instances run the command simultaneously. --max-errors defines the failure threshold before the command stops rolling out. These parameters are critical for safe operational changes.
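For comparison, the same rollout controls can be expressed through the SendCommand API from Python. This is a sketch that mirrors the CLI call above; the helper names are hypothetical, and the client is assumed to be something like boto3.client("ssm").

```python
def build_send_command(commands: list, environment: str) -> dict:
    """Build keyword arguments for the SSM SendCommand call."""
    return {
        "DocumentName": "AWS-RunShellScript",
        "Targets": [{"Key": "tag:Environment", "Values": [environment]}],
        "Parameters": {"commands": commands},
        "Comment": "Patch and restart Apache on all prod servers",
        "TimeoutSeconds": 600,
        "MaxConcurrency": "50%",  # at most half of the targets run at once
        "MaxErrors": "10%",       # stop sending to new targets past this failure rate
    }


def run(ssm_client, commands: list, environment: str) -> str:
    """Send the command and return its CommandId for later status checks."""
    response = ssm_client.send_command(**build_send_command(commands, environment))
    return response["Command"]["CommandId"]
```

Both rate controls accept either an absolute number or a percentage string, which is why they are passed as strings here.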

Deployment Strategies: CodeDeploy, Beanstalk, and More

CodeDeploy: Automates application deployments to EC2 and on-premises instances, Lambda, and ECS. The exam frequently tests blue/green deployments for ECS:

  • CodeDeploy creates a new task set (green) alongside the existing one (blue)
  • Routes a percentage of traffic to green for testing (canary or linear)
  • If healthy, shifts all traffic to green and terminates blue
  • If unhealthy, rolls back to blue automatically

Elastic Beanstalk Deployment Policies:

| Policy | Downtime? | Rollback Speed | Cost |
|---|---|---|---|
| All at once | Yes | Manual redeploy | No extra cost |
| Rolling | Partial (reduced capacity) | Manual redeploy | No extra cost |
| Rolling with additional batch | No | Manual redeploy | Temporary extra instances |
| Immutable | No | Fast (terminate new ASG) | Double capacity temporarily |
| Blue/Green (swap URLs) | No | Fast (swap back) | Full duplicate environment |

EC2 Image Builder: Automates the creation of custom AMIs. You define a pipeline with a base image, build components (install software), test components (validate), and a distribution configuration (which regions and accounts receive the AMI). The exam tests Image Builder in the context of "golden AMI" strategies.

Lambda Deployment Automation: CodeDeploy supports Lambda traffic shifting using aliases. Canary deployments send a percentage (e.g., 10%) to the new version, wait, then shift all traffic. Linear deployments shift traffic in equal increments over time.
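The difference between canary and linear shifting is easiest to see as a timeline. The sketch below computes the percentage of traffic on the new version over time, modeled loosely on predefined configurations such as CodeDeployDefault.LambdaCanary10Percent5Minutes and CodeDeployDefault.LambdaLinear10PercentEvery1Minute. It assumes the first increment is applied at minute 0, and it is a study aid rather than a description of CodeDeploy internals.

```python
def canary_schedule(first_percent: int, wait_minutes: int) -> list:
    """Two steps: a small first slice, then everything after the wait."""
    return [(0, first_percent), (wait_minutes, 100)]


def linear_schedule(step_percent: int, interval_minutes: int) -> list:
    """Equal increments at a fixed interval until 100% is reached."""
    schedule = []
    minute, shifted = 0, 0
    while shifted < 100:
        shifted = min(100, shifted + step_percent)
        schedule.append((minute, shifted))
        minute += interval_minutes
    return schedule


# Each tuple is (minute, percent of traffic on the new version).
print(canary_schedule(10, 5))   # → [(0, 10), (5, 100)]
print(linear_schedule(25, 2))   # → [(0, 25), (2, 50), (4, 75), (6, 100)]
```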

Decision Table: Which Tool for Which Operational Task

| Operational Task | Best Tool | Why |
|---|---|---|
| Run a shell command on 500 EC2 instances | SSM Run Command | No SSH, rate-controlled, auditable |
| Patch OS on a schedule | SSM Patch Manager | Baselines, groups, maintenance windows |
| Ensure instances always have agent installed | SSM State Manager | Scheduled association enforces desired state |
| Define infra using Python/TypeScript | AWS CDK | Generates CloudFormation, supports loops/conditions natively |
| Deploy with zero downtime to ECS | CodeDeploy Blue/Green | Traffic shifting with automatic rollback |
| Build a golden AMI pipeline | EC2 Image Builder | Automated build, test, distribute |
| Detect configuration drift | CloudFormation Drift Detection | Compares actual state to template |
| Store database connection string securely | SSM Parameter Store (SecureString) | Free, KMS-encrypted, integrated with SSM |
| Auto-rotate database credentials | Secrets Manager | Built-in rotation with Lambda |
| Query error patterns in logs | CloudWatch Logs Insights | SQL-like queries across log groups |

Practice These Domains: CertLand offers 380 SOA-C03 practice questions with detailed explanations. Filter by Domain 1 or Domain 3 to drill into monitoring and deployment scenarios specifically. Each explanation breaks down why the correct answer is right and why every distractor is wrong.

Domains 1 and 3 reward candidates who understand the operational "how" -- not just which service to use, but how to configure it, what metrics to watch, and what automation to put in place. Master these two domains and you are nearly halfway to passing the SOA-C03.

Ready to test your knowledge? Start practicing with CertLand's SOA-C03 CloudOps question bank.
