spicy-automation/docs/ECS_SERVICE.md
Ryan Wilson fa1e865f50 Fix ALB listener default action, auto-import numberOfAzs, and correct docs
- Fix HTTP listener in spicy-alb.ts missing default action when no certificate
  is provided, which would cause CDK synth to fail
- Auto-import numberOfAzs from VPC stack exports (NumberOfAZs) in cluster,
  service, and ALB stacks when not provided via context
- Fix CDK_SYNTH_EXAMPLES.md ALB examples using raw vpcId/subnetIds that don't
  match the actual fromContext() implementation (requires clusterName)
- Fix docs overstating "only clusterName required" to list actual required params
- Remove package-lock.json and add to .gitignore (project uses pnpm)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 17:18:18 -08:00

ECS Service Deployment

Deploy ECS services with mixed capacity provider strategies for EC2 + Fargate burst.

Features

  • Blue/Green Deployments - Zero-downtime deploys with instant rollback via hostname swap
  • Mixed Capacity Provider Strategy - EC2 base + Fargate burst for cost optimization
  • Auto-scaling on CPU, Memory, and ALB request count
  • ALB Integration with host/path routing
  • Deployment Circuit Breaker with automatic rollback
  • ECS Exec for interactive debugging
  • Secrets Manager integration
  • CloudWatch Logs with configurable retention
  • Pipeline Hooks - Custom behavior at every stage

Quick Start

Minimal Jenkinsfile (EC2 Only) - Using CloudFormation Imports

Minimal required props: clusterName, numberOfAzs, serviceName, image, containerPort, and tags (ownerTag, productTag). VPC info auto-imports from cluster/VPC stack exports.

@Library(["spicy-automation@main"]) _

spicyECSService(
    jenkinsAwsCredentialsId: "aws-credentials",
    region: "ca-central-1",
    stackName: "my-api-dev",
    serviceName: "my-api",

    // Cluster info - VPC ID auto-imports from ${clusterName}-VPC
    clusterName: "my-cluster-dev",
    numberOfAzs: 2,  // Required by the Jenkins pipeline; set to the VPC's AZ count

    // Container
    image: "nexus.kodeniks.com/docker-hosted/my-api:latest",
    containerPort: 3000,

    // Tags
    ownerTag: "MyTeam",
    productTag: "my-product",
    componentTag: "api",
    environment: "dev"
)

What auto-imports:

  • VPC stack name from ${clusterStackName}-VPCStackName (cluster stack export)
  • VPC ID from ${clusterStackName}-VPC (cluster stack export)
  • VPC CIDR from ${vpcStackName}-VPCCIDR (VPC stack export, using imported VPC stack name)
  • Number of AZs from ${vpcStackName}-NumberOfAZs (VPC stack export, or override with numberOfAzs parameter)
  • Private subnets from ${vpcStackName}-PrivateSubnetA1ID, etc. (VPC stack exports)
  • Logs bucket from ${clusterStackName}-logs-s3-bucket (cluster stack export)

Note: The service stack imports the VPC stack name from the cluster stack export ${clusterStackName}-VPCStackName, then uses that to import all VPC details (CIDR, subnets, AZs) from the VPC stack exports. numberOfAzs is required by the Jenkins pipeline but auto-imports from VPC stack exports when using CDK directly.
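To check that the exports listed above exist before deploying, the names can be generated and looked up with the AWS CLI. This is a minimal sketch: the helper names expected_exports and export_value are hypothetical, the stack names are placeholders, and only the first private subnet export is included because the full subnet list depends on the VPC layout.

```shell
#!/bin/sh
# Print the CloudFormation export names the service stack is documented to import.
# $1 = cluster stack name, $2 = VPC stack name
expected_exports() {
    printf '%s\n' \
        "$1-VPCStackName" \
        "$1-VPC" \
        "$1-logs-s3-bucket" \
        "$2-VPCCIDR" \
        "$2-NumberOfAZs" \
        "$2-PrivateSubnetA1ID"
}

# Look up the value of one export by name (requires AWS credentials).
export_value() {
    aws cloudformation list-exports \
        --query "Exports[?Name=='$1'].Value" --output text
}

# Usage (placeholder stack names):
#   for name in $(expected_exports my-cluster-dev my-vpc); do
#       printf '%s = %s\n' "$name" "$(export_value "$name")"
#   done
```

An empty value for any name suggests the corresponding stack has not exported it, in which case the parameter would need to be passed explicitly.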

Mixed Capacity Strategy (EC2 + Fargate Burst)

@Library(["spicy-automation@main"]) _

spicyECSService(
    jenkinsAwsCredentialsId: "aws-credentials",
    region: "ca-central-1",
    stackName: "my-api-prod",
    serviceName: "my-api",

    // Cluster info - VPC details auto-import from cluster/VPC stack exports
    clusterName: "my-cluster-prod",
    vpcStackName: "my-vpc",  // Optional: auto-imported from cluster stack if not provided

    // Container
    image: "nexus.kodeniks.com/docker-hosted/my-api:v1.2.3",
    containerPort: 3000,
    cpu: 512,
    memory: 1024,

    // Mixed capacity strategy
    capacityProviderStrategy: [
        [capacityProvider: "my-cluster-prod-ec2", base: 2, weight: 3],  // First 2 on EC2, then 75%
        [capacityProvider: "FARGATE_SPOT", weight: 1],                   // 25% burst to Fargate Spot
    ],

    // Scaling
    desiredCount: 2,
    minCapacity: 2,
    maxCapacity: 20,
    targetCpuUtilization: 70,
    targetMemoryUtilization: 80,

    // Routing - Use cluster ALB (default)
    useClusterAlb: true,  // Use cluster ALB (default)
    albScheme: "internet-facing",  // or "internal"
    hostHeader: "api.example.com",
    priority: 100,
    healthCheckPath: "/health",

    // Tags
    ownerTag: "Platform",
    productTag: "my-product",
    componentTag: "api",
    environment: "prod",

    // Approval for prod
    approvers: "admin,platform-team"
)

Parameters Reference

Required Parameters

| Parameter | Description | Example |
| --- | --- | --- |
| jenkinsAwsCredentialsId | Jenkins credential ID for AWS | "aws-credentials" |
| region | AWS region | "ca-central-1" |
| stackName | CloudFormation stack name | "my-api-dev" |
| serviceName | ECS service name | "my-api" |
| clusterName | ECS cluster name (required) - used to auto-import VPC stack name, VPC ID, and other details from cluster stack exports | "my-cluster-dev" |
| clusterStackName | Cluster stack name (defaults to clusterName) - used for CloudFormation imports | "my-cluster-dev" |
| vpcStackName | VPC stack name (auto-imported from ${clusterStackName}-VPCStackName, or provide explicitly) | "my-vpc" |
| albStackName | ALB stack name (for individual ALB) - used to auto-import ALB ARN and listener ARNs from ALB stack | "my-service-alb" |
| image | Docker image URI | "nexus.kodeniks.com/docker-hosted/my-api:latest" |
| containerPort | Container port | 3000 |
| ownerTag | Owner tag | "MyTeam" |
| productTag | Product tag | "my-product" |

Container Configuration

| Parameter | Default | Description |
| --- | --- | --- |
| containerPort | 3000 | Container port |
| cpu | 256 | CPU units (256 = 0.25 vCPU) |
| memory | 512 | Memory in MiB |
| environment_vars | - | Map of environment variables |
| secrets | - | Map of secret ARNs (env var name → ARN) |

Capacity Provider Strategy

| Parameter | Default | Description |
| --- | --- | --- |
| capacityProviderStrategy | Cluster default | Array of capacity provider configs |
| desiredCount | 2 | Initial task count |

Strategy Item Format:

[
    capacityProvider: "provider-name",  // EC2 provider name, "FARGATE", or "FARGATE_SPOT"
    base: 0,                            // Minimum tasks on this provider
    weight: 1                           // Distribution weight
]

Auto-Scaling

| Parameter | Default | Description |
| --- | --- | --- |
| minCapacity | - | Minimum tasks (required for scaling) |
| maxCapacity | - | Maximum tasks (required for scaling) |
| targetCpuUtilization | - | Target CPU % (0-100) |
| targetMemoryUtilization | - | Target memory % (0-100) |
| targetRequestsPerTarget | - | Target requests per task per minute |

ALB Routing

| Parameter | Default | Description |
| --- | --- | --- |
| useClusterAlb | true | Use cluster ALB (default). Set to false to use individual ALB per service |
| albScheme | - | ALB scheme: "internet-facing" or "internal" (required when useClusterAlb=true) |
| albStackName | - | ALB stack name for individual ALB (auto-derived from {stackName}-alb when useClusterAlb=false) |
| certificateArn | - | ACM certificate ARN (required for individual ALB when useClusterAlb=false) |
| redirectHttpToHttps | - | Redirect HTTP to HTTPS (for individual ALB) |
| hostHeader | - | Host header for routing |
| pathPatterns | - | Comma-separated path patterns |
| priority | 100 | Listener rule priority |
| stickiness | false | Enable session stickiness |
| stickinessDuration | 86400 | Stickiness duration (seconds) |

Health Check

| Parameter | Default | Description |
| --- | --- | --- |
| healthCheckPath | /health | Health check URL path |

DNS (Route53) - Blue/Green Only

| Parameter | Default | Description |
| --- | --- | --- |
| bgHostedZoneId | - | Route53 hosted zone ID for blue/green DNS records (optional) |
| hostName | - | Simple hostname (e.g., "api.example.com") - auto-generates active/inactive |
| activeHostname | - | Active hostname (e.g., "api.example.com") - explicit hostname |
| inactiveHostname | - | Inactive hostname (e.g., "inactive-api.example.com") - explicit hostname |

Deployment

| Parameter | Default | Description |
| --- | --- | --- |
| circuitBreaker | true | Enable deployment circuit breaker |
| enableExecuteCommand | true | Enable ECS Exec |

Mixed Capacity Strategy Explained

How Task Distribution Works

capacityProviderStrategy: [
    [capacityProvider: "my-cluster-ec2", base: 2, weight: 3],
    [capacityProvider: "FARGATE_SPOT", weight: 1],
]

Distribution:

  1. Base tasks go to their designated provider first
  2. Additional tasks are distributed by weight ratio

Example with 10 tasks:

Tasks 1-2:  EC2 (base)
Tasks 3-8:  EC2 (weight 3 = 75%)
Tasks 9-10: FARGATE_SPOT (weight 1 = 25%)
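The arithmetic behind this example can be sketched as a small shell function. It is an illustration only: distribute_tasks is a hypothetical helper, it handles exactly two providers, and the integer division used for uneven splits is an assumption rather than a statement of how ECS rounds.

```shell
#!/bin/sh
# Split a task count between an EC2 provider (with a base) and FARGATE_SPOT.
# Usage: distribute_tasks TOTAL EC2_BASE EC2_WEIGHT SPOT_WEIGHT
distribute_tasks() {
    total=$1; base=$2; w_ec2=$3; w_spot=$4

    # Base tasks are placed first; never place more than the total.
    if [ "$base" -gt "$total" ]; then
        base=$total
    fi
    remaining=$((total - base))

    # Remaining tasks are split by weight ratio (integer division is an assumption).
    ec2_weighted=$((remaining * w_ec2 / (w_ec2 + w_spot)))
    ec2=$((base + ec2_weighted))
    spot=$((total - ec2))

    echo "EC2=$ec2 FARGATE_SPOT=$spot"
}

distribute_tasks 10 2 3 1   # prints: EC2=8 FARGATE_SPOT=2
```

With 10 tasks, 2 go to the EC2 base, and the remaining 8 split 3:1, giving 6 more on EC2 and 2 on FARGATE_SPOT.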

Development (Cost Optimized):

capacityProviderStrategy: [
    [capacityProvider: "FARGATE_SPOT", weight: 1],
]

Staging (Balanced):

capacityProviderStrategy: [
    [capacityProvider: "my-cluster-staging-ec2", base: 1, weight: 2],
    [capacityProvider: "FARGATE_SPOT", weight: 1],
]

Production (Reliability + Burst):

capacityProviderStrategy: [
    [capacityProvider: "my-cluster-prod-ec2", base: 2, weight: 3],
    [capacityProvider: "FARGATE_SPOT", weight: 1],
]

Building Docker Images

The pipeline automatically builds and pushes Docker images if a Dockerfile exists:

spicyECSService(
    // ... other params ...

    // Image will be built and pushed automatically
    serviceName: "my-api",  // Used as image name
    imageTag: env.BUILD_NUMBER,  // Or defaults to Git SHA

    // Optional: customize build
    dockerfile: "Dockerfile.prod",
    dockerContext: "./app",
    dockerBuildArgs: [
        NODE_ENV: "production",
        BUILD_DATE: new Date().format("yyyy-MM-dd")
    ],

    // Or skip build and use existing image
    buildImage: false,
    image: "nexus.kodeniks.com/docker-hosted/my-api:v1.2.3",
)

Environment Variables and Secrets

Plain Environment Variables

spicyECSService(
    // ...
    environment_vars: [
        NODE_ENV: "production",
        LOG_LEVEL: "info",
        API_URL: "https://api.example.com"
    ],
)

Secrets from Secrets Manager

spicyECSService(
    // ...
    secrets: [
        DATABASE_URL: "arn:aws:secretsmanager:ca-central-1:123456789:secret:my-db-creds",
        API_KEY: "arn:aws:secretsmanager:ca-central-1:123456789:secret:api-keys::api_key",
    ],
)

Format: "ENV_VAR_NAME": "secret-arn" or "ENV_VAR_NAME": "secret-arn::json-key"
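To make the two accepted forms concrete, the sketch below splits a secrets value into its ARN and optional JSON key. It only illustrates the format described above; split_secret_ref is a hypothetical helper, not the pipeline's own parser.

```shell
#!/bin/sh
# Split a secrets value of the form "secret-arn" or "secret-arn::json-key".
# Prints the ARN on the first line and the JSON key (possibly empty) on the second.
split_secret_ref() {
    case "$1" in
        *::*) arn=${1%%::*}; key=${1#*::} ;;   # text before and after the first "::"
        *)    arn=$1;        key="" ;;         # whole secret, no JSON key
    esac
    printf '%s\n%s\n' "$arn" "$key"
}

split_secret_ref "arn:aws:secretsmanager:ca-central-1:123456789:secret:api-keys::api_key"
```

For the API_KEY entry above, this prints the ARN ending in secret:api-keys followed by api_key; for DATABASE_URL the second line would be empty.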

Auto-Scaling Behavior

CPU-Based Scaling

spicyECSService(
    // ...
    minCapacity: 2,
    maxCapacity: 20,
    targetCpuUtilization: 70,
)

  • Scales out (adds tasks) when average CPU rises above 70%
  • Scales in (removes tasks) when average CPU stays below 70%
  • Cooldown: 60s scale-out, 300s scale-in

Request-Based Scaling

spicyECSService(
    // ...
    minCapacity: 2,
    maxCapacity: 20,
    targetRequestsPerTarget: 1000,  // 1000 requests/minute per task
)

Combined Scaling

spicyECSService(
    // ...
    minCapacity: 2,
    maxCapacity: 20,
    targetCpuUtilization: 70,
    targetMemoryUtilization: 80,
    targetRequestsPerTarget: 1000,
)

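All policies are evaluated independently. Under Application Auto Scaling target tracking, the service scales out as soon as any one policy calls for more tasks and scales in only when every policy allows it.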
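To see which scaling policies are actually attached to a service, Application Auto Scaling can be queried directly. A minimal sketch, assuming the service is registered under the standard ECS resource ID format service/<cluster>/<service>; the helper names and the cluster and service names are placeholders.

```shell
#!/bin/sh
# Build the Application Auto Scaling resource ID for an ECS service.
# $1 = cluster name, $2 = service name
ecs_resource_id() {
    echo "service/$1/$2"
}

# List the scaling policies attached to a service (requires AWS credentials).
list_scaling_policies() {
    aws application-autoscaling describe-scaling-policies \
        --service-namespace ecs \
        --resource-id "$(ecs_resource_id "$1" "$2")" \
        --query 'ScalingPolicies[].{Name:PolicyName,Type:PolicyType}' \
        --output table
}

# Usage (placeholder names):
#   list_scaling_policies my-cluster-dev my-api
```

With CPU, memory, and request targets all set, one policy per metric would be expected in the output.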

Deployment Circuit Breaker

Enabled by default. If a deployment fails:

  1. ECS detects unhealthy tasks
  2. After threshold failures, deployment stops
  3. Automatic rollback to previous version
  4. CloudWatch alarm triggered

Disable if needed:

spicyECSService(
    // ...
    circuitBreaker: false,
)

ECS Exec (Debugging)

Enabled by default. Connect to running containers:

aws ecs execute-command \
    --cluster my-cluster-dev \
    --task <task-id> \
    --container my-api \
    --interactive \
    --command "/bin/sh"

Stack Outputs

| Output | Export Name | Description |
| --- | --- | --- |
| ServiceName | {stackName}-service-name | ECS service name |
| ServiceArn | {stackName}-service-arn | ECS service ARN |
| TaskDefinitionArn | {stackName}-task-definition-arn | Task definition ARN |
| LogGroupName | {stackName}-log-group | CloudWatch log group |
| TargetGroupArn | {stackName}-target-group-arn | ALB target group ARN |

Troubleshooting

Tasks Stuck in PENDING

Cause: No capacity available

Solutions:

  1. Check that the EC2 instances have room for the tasks
  2. Verify that a Fargate provider is in the capacity provider strategy
  3. Check that the task CPU/memory fits the available resources

Tasks Failing Health Checks

# Check container logs
aws logs tail /ecs/my-api --follow

# Check target group health
aws elbv2 describe-target-health --target-group-arn <arn>

Deployment Rolling Back

# Check deployment events
aws ecs describe-services --cluster my-cluster --services my-api

# Check task stopped reasons
aws ecs describe-tasks --cluster my-cluster --tasks <task-id>

ECS Exec Not Working

  1. Verify enableExecuteCommand: true
  2. Check task role has SSM permissions
  3. Ensure SSM agent is running in container

Blue/Green Deployments

Blue/Green deployments provide zero-downtime releases with instant rollback capability.

How It Works

  1. Two Services: myapp-blue and myapp-green run simultaneously
  2. DNS Records: Both active and inactive hostnames point to the same ALB (DNS never changes)
  3. ALB Listener Rules: Traffic routing is controlled by listener rule priorities, not DNS
  4. Deploy to Inactive: New version deploys to inactive service, whose rule has the higher priority number (evaluated after the active rule)
  5. Test Inactive: Run integration tests against inactive hostname (same ALB, different rule)
  6. Swap Hostnames: Update listener rule priorities to swap active/inactive (no DNS changes)
  7. Keep for Rollback: Old version stays running for rollback window

Important: DNS records are created once and never change. Both api.example.com and inactive-api.example.com always resolve to the same ALB. Traffic routing is controlled entirely by ALB listener rule priorities:

  • Active service: Lower priority number (e.g., 100) for active hostname; ALB evaluates lower numbers first
  • Inactive service: Higher priority number (e.g., 200) for inactive hostname
  • When swapping: Only the listener rule priorities are updated via CDK deployment

Architecture

                    ┌─────────────────────────────────────┐
                    │              ALB                     │
                    │  ┌─────────────┐  ┌─────────────┐   │
                    │  │ api.example │  │ inactive-api│   │
                    │  │  Priority:  │  │  Priority:  │   │
                    │  │    100      │  │    200      │   │
                    │  └──────┬──────┘  └──────┬──────┘   │
                    └─────────┼────────────────┼──────────┘
                              │                │
              ┌───────────────▼──┐      ┌──────▼───────────────┐
              │   BLUE Service   │      │   GREEN Service      │
              │   (v1 - Active)  │      │   (v2 - Inactive)    │
              │   ████████████   │      │   ████████████       │
              │   ████████████   │      │   ████████████       │
              └──────────────────┘      └──────────────────────┘

Enable Blue/Green

@Library(["spicy-automation@main"]) _

spicyECSService(
    jenkinsAwsCredentialsId: "aws-credentials",
    region: "ca-central-1",
    stackName: "my-api-prod",
    serviceName: "my-api",

    // Cluster info - VPC details auto-import from cluster/VPC stack exports
    clusterName: "my-cluster-prod",
    vpcStackName: "my-vpc",  // Optional: auto-imported from cluster stack if not provided

    // Container
    image: "nexus.kodeniks.com/docker-hosted/my-api:latest",
    containerPort: 3000,

    // Enable Blue/Green
    blueGreen: true,
    hostName: "api.example.com",  // Auto-generates active/inactive hostnames
    bgHostedZoneId: "Z1234567890",  // Route53 hosted zone (optional, for automatic DNS)

    // ALB Configuration
    useClusterAlb: false,  // Use an individual ALB for this service (certificateArn required)
    albScheme: "internet-facing",
    certificateArn: "arn:aws:acm:ca-central-1:123456789:certificate/xxx",
    redirectHttpToHttps: true,
    healthCheckPath: "/health",
    priority: 100,

    // Tags
    ownerTag: "Platform",
    productTag: "my-product",
    componentTag: "api",
    environment: "prod",

    // Test inactive before swap
    blueGreenTest: { args, buildInfo ->
        sh "curl -f https://${buildInfo.inactiveHostname}/health"
        sh "./run-integration-tests.sh ${buildInfo.inactiveHostname}"
    },

    // Test active after swap
    smokeTest: { args, buildInfo ->
        sh "curl -f https://${buildInfo.activeHostname}/health"
    },
)

Blue/Green Parameters

| Parameter | Default | Description |
| --- | --- | --- |
| blueGreen | false | Enable blue/green deployment |
| hostName | - | Simple hostname (e.g., "api.example.com") - auto-generates active/inactive |
| activeHostname | - | Active hostname (e.g., "api.example.com") - explicit hostname |
| inactiveHostname | - | Inactive hostname (e.g., "inactive-api.example.com") - explicit hostname |
| bgHostedZoneId | - | Route53 hosted zone ID for automatic DNS records (optional) |
| rollbackWindowHours | 2 | Hours to keep old version for rollback |

Rollback

Instant rollback by swapping hostnames - no new deployment needed:

@Library(["spicy-automation@main"]) _

spicyRollback(
    jenkinsAwsCredentialsId: "aws-credentials",
    region: "ca-central-1",
    stackName: "my-api-prod",
    serviceName: "my-api",

    // Same params as deploy
    clusterName: "my-cluster-prod",
    vpcStackName: "my-vpc",  // Optional: auto-imported from cluster stack if not provided

    hostName: "api.example.com",  // Or use activeHostname/inactiveHostname
    bgHostedZoneId: "Z1234567890",  // If using Route53

    ownerTag: "Platform",
    productTag: "my-product",
    componentTag: "api",
    environment: "prod",
)

Rollback time: ~30 seconds (just ALB rule updates, no DNS propagation delays)

How it works:

  • DNS records for both active and inactive hostnames already exist and point to the same ALB
  • Rollback simply updates ALB listener rule priorities (e.g., swap priority 100 ↔ 200)
  • No DNS changes are needed, so rollback is instant

State Management

Active color is stored in SSM Parameter Store:

/spicy/{serviceName}/active-color = "blue" | "green"

Check current state:

aws ssm get-parameter --name /spicy/my-api/active-color --query 'Parameter.Value'
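A script that needs the deployment target can derive it from this parameter. The sketch below assumes the parameter holds exactly "blue" or "green" as shown above; inactive_color and active_color are hypothetical helper names.

```shell
#!/bin/sh
# Return the color that is NOT currently active.
inactive_color() {
    case "$1" in
        blue)  echo green ;;
        green) echo blue ;;
        *)     echo "unknown color: $1" >&2; return 1 ;;
    esac
}

# Read the active color for a service from SSM (requires AWS credentials).
# $1 = service name
active_color() {
    aws ssm get-parameter --name "/spicy/$1/active-color" \
        --query 'Parameter.Value' --output text
}

# Usage (placeholder service name):
#   inactive_color "$(active_color my-api)"
```

Rejecting unknown values keeps a missing or mistyped parameter from silently selecting a deployment target.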

Testing Blue/Green DNS

To verify blue/green DNS configuration is working correctly:

  1. Verify DNS Records:

    # Both hostnames should resolve to the same ALB
    dig +short api.example.com
    dig +short inactive-api.example.com
    # Both should return the same records: the ALB DNS name for a CNAME,
    # or the ALB's IP addresses for a Route53 alias record
    
  2. Verify ALB Listener Rules:

    # Get ALB ARN from stack outputs
    ALB_ARN=$(aws cloudformation describe-stacks --stack-name my-api-prod-alb \
      --query 'Stacks[0].Outputs[?OutputKey==`LoadBalancerArn`].OutputValue' --output text)
    
    # Get listener ARN
    LISTENER_ARN=$(aws cloudformation describe-stacks --stack-name my-api-prod-alb \
      --query 'Stacks[0].Outputs[?OutputKey==`HTTPSListenerArn`].OutputValue' --output text)
    
    # Check listener rules
    aws elbv2 describe-rules --listener-arn $LISTENER_ARN
    # Active service should have priority 100 for api.example.com
    # Inactive service should have priority 200 for inactive-api.example.com
    
  3. Test Traffic Routing:

    # Get ALB DNS name
    ALB_DNS=$(aws cloudformation describe-stacks --stack-name my-api-prod-alb \
      --query 'Stacks[0].Outputs[?OutputKey==`LoadBalancerDNS`].OutputValue' --output text)
    
    # Test active hostname (should hit active service)
    # -k skips certificate verification: the certificate is issued for the hostname, not the ALB DNS name
    curl -k -H "Host: api.example.com" https://$ALB_DNS/health
    
    # Test inactive hostname (should hit inactive service)
    curl -k -H "Host: inactive-api.example.com" https://$ALB_DNS/health
    
  4. Verify After Swap:

    # After swapping, priorities should be reversed
    # New active should have priority 100
    # Old active should have priority 200
    aws elbv2 describe-rules --listener-arn $LISTENER_ARN
    

Important: DNS records never change - they always point to the same ALB. Only listener rule priorities change during swaps.


Pipeline Hooks

Customize pipeline behavior at every stage with Groovy closures.

Hook Execution Order

┌─────────────┐
│  Checkout   │
└──────┬──────┘
       │
       ▼
┌─────────────┐     ┌──────────────┐
│ Build Image │────▶│ onPostBuild  │  ← Unit tests, linting
└──────┬──────┘     └──────────────┘
       │
       ▼
┌─────────────┐
│ onPreDeploy │  ← Setup, integration test prep
└──────┬──────┘
       │
       ▼
┌─────────────┐
│   Deploy    │
└──────┬──────┘
       │
       ▼ (Blue/Green only)
┌───────────────┐
│ blueGreenTest │  ← Test inactive stack
└───────┬───────┘
        │
        ▼
┌─────────────┐
│ Swap (B/G)  │
└──────┬──────┘
       │
       ▼
┌──────────────┐
│ onPostDeploy │  ← Cleanup, notifications
└──────┬───────┘
       │
       ▼
┌─────────────┐
│  smokeTest  │  ← Smoke tests
└─────────────┘

Available Hooks

buildCommand

Custom build command (replaces default docker build):

spicyECSService(
    // ...
    buildCommand: { args, buildInfo ->
        sh "make build"
        sh "docker tag my-app:latest ${args.image}"
    },
)

onPostBuild

After build completes (linting, unit tests):

spicyECSService(
    // ...
    onPostBuild: { args, buildInfo ->
        sh "npm run lint"
        sh "npm run test:unit"
        junit 'coverage/junit.xml'
        publishHTML(target: [
            reportDir: 'coverage/lcov-report',
            reportFiles: 'index.html',
            reportName: 'Coverage'
        ])
    },
)

onPreDeploy

Before deployment (setup, test prep):

spicyECSService(
    // ...
    onPreDeploy: { args, buildInfo ->
        sh "docker-compose -f docker-compose.test.yml up -d"
        sh "./wait-for-services.sh"
    },
)

blueGreenTest

After inactive stack is up (integration tests):

spicyECSService(
    // ...
    blueGreen: true,
    blueGreenTest: { args, buildInfo ->
        echo "Testing inactive: ${buildInfo.inactiveHostname}"
        sh "curl -f https://${buildInfo.inactiveHostname}/health"
        sh "./run-integration-tests.sh ${buildInfo.inactiveHostname}"
    },
)

Available in buildInfo:

  • inactiveHostname - Inactive service hostname
  • activeHostname - Active service hostname
  • currentActive - Current active color (blue/green)
  • targetColor - Deployment target color

onPostDeploy

After deployment succeeds (cleanup, notifications):

spicyECSService(
    // ...
    onPostDeploy: { args, buildInfo ->
        sh "docker-compose -f docker-compose.test.yml down"
        slackSend(channel: '#deploys', message: "Deployed ${args.serviceName}")
    },
)

smokeTest

After deployment or swap (smoke tests):

spicyECSService(
    // ...
    smokeTest: { args, buildInfo ->
        sh "curl -f https://${buildInfo.activeHostname}/health"
        sh "curl -f https://${buildInfo.activeHostname}/api/status"
    },
)

Hook Parameters

All hooks receive:

| Parameter | Type | Description |
| --- | --- | --- |
| args | Map | All pipeline arguments |
| buildInfo | Map | Build context (see below) |

buildInfo contents:

| Key | Description |
| --- | --- |
| commitSha | Git commit SHA |
| branch | Git branch name |
| image | Built Docker image URI |
| imageTag | Docker image tag |
| activeHostname | Active service hostname |
| inactiveHostname | Inactive service hostname (B/G) |
| currentActive | Current active color (B/G) |
| targetColor | Deployment target color (B/G) |
| healthCheckPath | Health check path |

Log Streaming

The pipeline automatically streams ECS logs during deployment:

spicyECSService(
    // ...
    streamLogs: true,  // Default: true
)

Logs show:

  • Container startup logs
  • Health check results
  • Recent ECS service events

Disable if not needed:

spicyECSService(
    // ...
    streamLogs: false,
)

Migration from Legacy

| Legacy Parameter | New Parameter |
| --- | --- |
| desiredCount | desiredCount |
| minCapacity | minCapacity |
| maxCapacity | maxCapacity |
| cpu | cpu |
| memory | memory |
| containerPort | containerPort |
| healthCheckUrl | healthCheckPath |
| albPriority | priority |
| albScheme: internet-facing | useClusterAlb: true, albScheme: "internet-facing" |
| albScheme: internal | useClusterAlb: true, albScheme: "internal" |
| targetGroupStickinessEnabled | stickiness |
| targetGroupLBCookieDurationSecs | stickinessDuration |
| vpcId, vpcCidrBlock, etc. | clusterName, vpcStackName (auto-imports) |

Key Differences

  1. CloudFormation imports - VPC details auto-import from cluster/VPC stack exports (no explicit IDs needed)
  2. Individual ALB support - Set useClusterAlb: false for dedicated ALB per service
  3. No Ansible - Pure CDK deployment
  4. Capacity providers - Native mixed EC2/Fargate support
  5. Circuit breaker - Native deployment rollback
  6. ECS Exec - Built-in debugging support
  7. Blue/Green hostnames - Use hostName for simple setup or activeHostname/inactiveHostname for explicit control