Ryan Wilson 68684df471 Initial commit: Spicy CDK automation framework

Jenkins shared library and CDK constructs for AWS infrastructure.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2025-11-18 22:21:00 -08:00

ECS Cluster Deployment

Deploy production-ready ECS clusters with AWS CDK.

Features

EC2 Capacity Provider with managed scaling (replaces custom SchedulableContainers metric)
Mixed Instances Policy for Spot support (replaces Autospotting)
Launch Templates with IMDSv2 and gp3 EBS volumes
Instance Draining via lifecycle hooks for graceful task migration
Optional Fargate capacity providers for serverless workloads
Internal/External ALBs with HTTPS support
Container Insights for monitoring
Automatic instance refresh via max instance lifetime

Quick Start

Minimal Jenkinsfile - Using CloudFormation Imports

Minimal props: vpcStackName is the only VPC setting you need to supply. All other VPC details are imported automatically from the VPC stack's CloudFormation exports.

@Library(["spicy-automation@main"]) _

spicyECSCluster(
    jenkinsAwsCredentialsId: "aws-credentials",
    region: "ca-central-1",
    stackName: "my-ecs-cluster",
    vpcStackName: "my-vpc",  // Auto-imports ALL VPC details (VPC ID, CIDR, subnets, AZs)
    ownerTag: "MyTeam",
    productTag: "my-product",
    componentTag: "ecs-cluster",
    environment: "dev"
)

Values imported automatically from the VPC stack:

VPC ID from ${vpcStackName}-VPCID
VPC CIDR from ${vpcStackName}-VPCCIDR
Number of AZs from ${vpcStackName}-NumberOfAZs
Private subnet IDs from ${vpcStackName}-PrivateSubnetA1ID, ${vpcStackName}-PrivateSubnetB1ID, etc.
Public subnet IDs from ${vpcStackName}-PublicSubnetAID, ${vpcStackName}-PublicSubnetBID, etc. (if createExternalLoadBalancer: true)
Availability zones auto-derived from region and number of AZs
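
Before deploying, it can help to confirm that the VPC stack actually publishes the exports listed above. The following is a minimal sketch, not part of spicy-automation: the helper names `vpc_export_names` and `missing_exports` are made up for illustration, and the subnet letters are passed in explicitly rather than guessed from the AZ count.

```shell
# Print the export names the cluster stack expects, one per line.
# Usage: vpc_export_names VPC_STACK_NAME SUBNET_LETTER...
# Hypothetical helper; the name patterns come from the list above.
vpc_export_names() {
    local stack="$1" letter
    shift
    printf '%s\n' "${stack}-VPCID" "${stack}-VPCCIDR" "${stack}-NumberOfAZs"
    for letter in "$@"; do
        printf '%s\n' "${stack}-PrivateSubnet${letter}1ID"
        # Public subnets only matter when createExternalLoadBalancer is true.
        printf '%s\n' "${stack}-PublicSubnet${letter}ID"
    done
}

# Read the available export names on stdin (whitespace separated) and
# print every expected name (given as arguments) that is not among them.
missing_exports() {
    local available name
    available=$(tr -s ' \t' '\n\n')
    for name in "$@"; do
        printf '%s\n' "$available" | grep -qxF -- "$name" || printf '%s\n' "$name"
    done
}

# Example (needs a configured AWS CLI, so it is not run here):
#   aws cloudformation list-exports --region ca-central-1 \
#       --query 'Exports[].Name' --output text |
#       missing_exports $(vpc_export_names my-vpc A B)
```

An empty result means every expected export is present. Any name printed is one the cluster stack would fail to import.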

Production Jenkinsfile with All Options

@Library(["spicy-automation@main"]) _

spicyECSCluster(
    // AWS Configuration
    jenkinsAwsCredentialsId: "aws-credentials",
    region: "ca-central-1",
    accountId: "123456789012",
    stackName: "prod-ecs-cluster",

    // VPC Configuration - only vpcStackName required, all VPC details auto-import
    vpcStackName: "production-vpc",
    // VPC ID, CIDR, subnets, AZs, and numberOfAzs all auto-import from VPC stack exports

    // Tags
    ownerTag: "Platform",
    productTag: "spicy",
    componentTag: "ecs-cluster",
    environment: "prod",

    // Instance Configuration
    instanceType: "m5a.xlarge",
    additionalInstanceTypes: "m5.xlarge,m5d.xlarge,m5n.xlarge",
    keyName: "my-keypair",
    ebsVolumeSize: 100,

    // Scaling
    minClusterSize: 3,
    maxClusterSize: 10,
    targetCapacityPercent: 100,

    // Spot Configuration (for cost savings)
    spotEnabled: true,
    onDemandPercentage: 50,  // 50% On-Demand, 50% Spot
    spotAllocationStrategy: "capacity-optimized",

    // Load Balancers
    createExternalLoadBalancer: true,
    createInternalLoadBalancer: true,
    certificateArn: "arn:aws:acm:ca-central-1:123456789012:certificate/xxx",

    // Fargate (optional hybrid - enables both FARGATE and FARGATE_SPOT)
    enableFargate: false,

    // Timeouts
    drainingTimeout: 900,        // 15 minutes for task draining
    maxInstanceLifetime: 604800, // 7 days for instance refresh

    // Container Insights
    containerInsights: true,

    // Approval for production
    approvers: "admin,platform-team"
)

Parameters Reference

Required Parameters

Parameter	Description	Example
`jenkinsAwsCredentialsId`	Jenkins credential ID for AWS	`"aws-credentials"`
`region`	AWS region	`"ca-central-1"`
`stackName`	CloudFormation stack name	`"my-ecs-cluster"`
`vpcStackName`	VPC stack name - required. All VPC details (VPC ID, CIDR, subnets, AZs) auto-import from VPC stack exports	`"my-vpc"`
`ownerTag`	Owner tag value	`"MyTeam"`
`productTag`	Product tag value	`"my-product"`

Instance Configuration

Parameter	Default	Description
`instanceType`	`m5a.large`	Primary EC2 instance type
`additionalInstanceTypes`	-	Additional types for Spot diversity
`keyName`	-	EC2 key pair for SSH access
`ebsVolumeSize`	`100`	EBS volume size in GiB
`containerInsights`	`true`	Enable Container Insights

Scaling Configuration

Parameter	Default	Description
`minClusterSize`	`2`	Minimum number of instances
`maxClusterSize`	`4`	Maximum number of instances
`targetCapacityPercent`	`100`	Target capacity utilization (%) for managed scaling

Spot Configuration

Parameter	Default	Description
`spotEnabled`	`false`	Enable Spot instances
`onDemandPercentage`	`100`	Percentage of capacity launched as On-Demand; the remainder is Spot
`spotAllocationStrategy`	`capacity-optimized`	Spot allocation strategy

Spot Allocation Strategies:

capacity-optimized - Best for interruption avoidance (recommended)
lowest-price - Best for cost, higher interruption risk
capacity-optimized-prioritized - Prioritizes instance types you specify

Load Balancer Configuration

Parameter	Default	Description
`createExternalLoadBalancer`	`false`	Create internet-facing ALB (public subnets auto-imported from VPC stack if enabled)
`createInternalLoadBalancer`	`false`	Create internal ALB
`certificateArn`	-	ACM certificate for HTTPS

Fargate Configuration

Parameter	Default	Description
`enableFargate`	`false`	Enable Fargate capacity providers (adds both FARGATE and FARGATE_SPOT)

Lifecycle Configuration

Parameter	Default	Description
`drainingTimeout`	`900`	Seconds to wait for task draining
`maxInstanceLifetime`	`604800`	Max instance age (7 days)
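
Both lifecycle values are plain seconds, which are easy to mistype. The sketch below converts from friendlier units and checks the lifetime against the one-day (86400 second) minimum that EC2 Auto Scaling enforces for a maximum instance lifetime. The function names are hypothetical, and whether the library performs a similar check itself is not stated here.

```shell
# Convert whole days or minutes to the seconds the parameters expect.
days_to_seconds()    { echo $(( $1 * 86400 )); }
minutes_to_seconds() { echo $(( $1 * 60 )); }

# Succeeds when the value is at least one day (86400 seconds), the
# minimum EC2 Auto Scaling accepts for a maximum instance lifetime.
valid_max_instance_lifetime() {
    [ "$1" -ge 86400 ]
}
```

For example, `days_to_seconds 7` gives the default `maxInstanceLifetime` and `minutes_to_seconds 15` gives the default `drainingTimeout`.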

Environment-Specific Configuration

Development/Sandbox

spicyECSCluster(
    // ... base config ...
    environment: "dev",
    minClusterSize: 1,
    maxClusterSize: 2,
    spotEnabled: true,
    onDemandPercentage: 0   // 100% Spot for max savings
)

Staging

spicyECSCluster(
    // ... base config ...
    environment: "staging",
    minClusterSize: 2,
    maxClusterSize: 4,
    spotEnabled: true,
    onDemandPercentage: 20  // 80% Spot
)

Production

spicyECSCluster(
    // ... base config ...
    environment: "prod",
    minClusterSize: 3,
    maxClusterSize: 10,
    spotEnabled: true,
    onDemandPercentage: 50,  // 50% On-Demand baseline
    approvers: "admin,platform-team"
)

Stack Outputs

The stack exports these values for use by ECS services:

Output	Export Name	Description
`ClusterName`	`{stackName}-cluster-name`	ECS cluster name
`ClusterArn`	`{stackName}-cluster-arn`	ECS cluster ARN
`VPC`	`{stackName}-VPC`	VPC ID
`ECSHostSecurityGroup`	`{stackName}-ecs-host-security-group`	EC2 security group
`AutoScalingGroupName`	`{stackName}-auto-scaling-group`	ASG name
`ExternalLoadBalancerDNS`	`{stackName}-internet-facing-url`	External ALB DNS
`ExternalLoadBalancerArn`	`{stackName}-internet-facing-arn`	External ALB ARN
`ExternalHTTPListenerArn`	`{stackName}-internet-facing-http-listener`	HTTP listener ARN
`ExternalHTTPSListenerArn`	`{stackName}-internet-facing-https-listener`	HTTPS listener ARN
`InternalLoadBalancerDNS`	`{stackName}-internal-url`	Internal ALB DNS
`InternalLoadBalancerArn`	`{stackName}-internal-arn`	Internal ALB ARN
`InternalHTTPListenerArn`	`{stackName}-internal-http-listener`	HTTP listener ARN
`InternalHTTPSListenerArn`	`{stackName}-internal-https-listener`	HTTPS listener ARN
`LogsBucketName`	`{stackName}-logs-s3-bucket`	ALB access logs bucket
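
Service stacks and scripts can read these exports by name. As a rough sketch, assuming a configured AWS CLI, the helpers below build an export name from the stack name and a suffix from the table, then look up its value. `cluster_export_name` and `get_export` are illustrative names, not part of the library.

```shell
# Build the export name for one suffix from the table above,
# e.g. "cluster-name" or "internal-https-listener".
cluster_export_name() {
    printf '%s-%s\n' "$1" "$2"
}

# Look up an export value with the AWS CLI (requires credentials).
# Usage: get_export REGION EXPORT_NAME
get_export() {
    aws cloudformation list-exports --region "$1" \
        --query "Exports[?Name=='$2'].Value" --output text
}

# Example (not run here):
#   get_export ca-central-1 "$(cluster_export_name my-ecs-cluster cluster-name)"
```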

How It Works

Capacity Providers (Replaces Custom Scaling)

The cluster uses ECS Managed Scaling via Capacity Providers:

┌─────────────────────────────────────────────────────────┐
│                    ECS Cluster                          │
├─────────────────────────────────────────────────────────┤
│  Capacity Providers:                                    │
│  ┌─────────────────────────────────────────────────┐   │
│  │ EC2 Capacity Provider                           │   │
│  │ - Managed Scaling: ON                           │   │
│  │ - Target Capacity: 100%                         │   │
│  │ - Min Scaling Step: 1                           │   │
│  │ - Max Scaling Step: 10000                       │   │
│  └─────────────────────────────────────────────────┘   │
│  ┌─────────────────────────────────────────────────┐   │
│  │ FARGATE (optional)                              │   │
│  └─────────────────────────────────────────────────┘   │
│  ┌─────────────────────────────────────────────────┐   │
│  │ FARGATE_SPOT (optional)                         │   │
│  └─────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────┘

This replaces the legacy SchedulableContainers Lambda metric with AWS-native scaling.
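
Managed scaling works by tracking a CapacityProviderReservation metric against the target capacity. AWS describes the metric as M / N × 100, where M is the number of instances needed to run all tasks and N is the number currently running. The sketch below only illustrates that arithmetic. The function name is hypothetical, integer division is a simplification, and the empty-group special cases follow AWS's description of cluster auto scaling and should be treated as an assumption.

```shell
# capacity_provider_reservation NEEDED RUNNING
#   NEEDED  = instances required to place all tasks (M)
#   RUNNING = instances currently in the Auto Scaling group (N)
capacity_provider_reservation() {
    local needed="$1" running="$2"
    if [ "$running" -eq 0 ]; then
        # Special cases for an empty group (assumed, see text above).
        if [ "$needed" -eq 0 ]; then echo 100; else echo 200; fi
        return
    fi
    echo $(( needed * 100 / running ))
}
```

With a target of 100, a result above 100 (for example 3 needed, 2 running) signals scale-out. A result below 100 means there is spare capacity and the group can scale in.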

Mixed Instances Policy (Replaces Autospotting)

When spotEnabled: true, the Auto Scaling group uses a Mixed Instances Policy:

┌─────────────────────────────────────────────────────────┐
│              Auto Scaling Group                         │
├─────────────────────────────────────────────────────────┤
│  Mixed Instances Policy:                                │
│  ┌─────────────────────────────────────────────────┐   │
│  │ On-Demand Base Capacity: 0                      │   │
│  │ On-Demand % Above Base: 50%                     │   │
│  │ Spot Allocation: capacity-optimized             │   │
│  └─────────────────────────────────────────────────┘   │
│                                                         │
│  Instance Type Overrides:                               │
│  - m5a.large (primary)                                  │
│  - m5.large                                             │
│  - m5d.large                                            │
│  - m5n.large                                            │
└─────────────────────────────────────────────────────────┘
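
To see what the base capacity and percentage in the diagram mean for a given group size, the split can be worked out by hand. This is an illustrative sketch only: the function names are hypothetical, and rounding the On-Demand share up is an assumption of the sketch rather than documented behaviour of this library.

```shell
# on_demand_count TOTAL BASE PERCENT_ABOVE_BASE
# Prints how many of TOTAL instances would be On-Demand.
on_demand_count() {
    local total="$1" base="$2" pct="$3" above
    if [ "$total" -le "$base" ]; then
        # Everything up to the base capacity is On-Demand.
        echo "$total"
        return
    fi
    above=$(( total - base ))
    # Round up so the On-Demand share never falls below the percentage
    # (rounding direction is an assumption of this sketch).
    echo $(( base + (above * pct + 99) / 100 ))
}

# spot_count TOTAL BASE PERCENT_ABOVE_BASE
spot_count() {
    echo $(( $1 - $(on_demand_count "$1" "$2" "$3") ))
}
```

With the values in the diagram (base 0, 50%), a group of 10 instances splits evenly. A group of 3 would come out as 2 On-Demand and 1 Spot under the rounding assumption.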

Benefits over Autospotting:

No Lambda to maintain
Faster response (no polling delay)
Better capacity data (AWS-native)
Simpler architecture

Instance Draining

Two layers of draining provide zero-downtime instance replacement:

Spot Interruption Draining (native ECS):
```
ECS_ENABLE_SPOT_INSTANCE_DRAINING=true
```
The ECS agent sets the instance to DRAINING when it receives the two-minute Spot interruption notice.

Lifecycle Hook Draining (Lambda):
- ASG sends termination event to SNS
- Lambda sets instance to DRAINING
- Waits for running tasks to migrate
- Completes lifecycle action
Launch Template Features

IMDSv2 Required: Enhanced metadata security
gp3 EBS Volumes: Better performance, lower cost than gp2
Encrypted Volumes: EBS encryption enabled
SSM Agent: Pre-installed for Session Manager access
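
Because IMDSv2 is required, scripts on the instance must request a session token before reading metadata. Plain unauthenticated requests are rejected. A minimal sketch of the token flow follows. The `imds_get` name and the 60-second TTL are choices made for this example.

```shell
# imds_get PATH
# Fetch one metadata value using an IMDSv2 session token.
# Hypothetical helper; run it on the EC2 instance itself.
imds_get() {
    local path="$1" token
    # Step 1: request a short-lived session token.
    token=$(curl -sf -X PUT "http://169.254.169.254/latest/api/token" \
        -H "X-aws-ec2-metadata-token-ttl-seconds: 60") || return 1
    # Step 2: present the token when reading metadata.
    curl -sf -H "X-aws-ec2-metadata-token: $token" \
        "http://169.254.169.254/latest/meta-data/$path"
}

# Example (on an instance): imds_get instance-id
```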

Migrating from Legacy Automation

Parameter Mapping

Legacy (Ansible)	New (CDK)
`stackName`	`stackName`
`instanceType`	`instanceType`
`minClusterSize`	`minClusterSize`
`maxClusterSize`	`maxClusterSize`
`spotEnabled`	`spotEnabled`
`minOnDemandPercentage`	`onDemandPercentage`
`largestContainerCPUReservation`	(not needed - managed scaling)
`largestContainerMemoryReservation`	(not needed - managed scaling)
`clusterScaleUpAdjustment`	(not needed - managed scaling)
`clusterScaleDownAdjustment`	(not needed - managed scaling)

Removed Features

These legacy features are no longer needed:

SchedulableContainers Lambda: Replaced by Capacity Provider managed scaling
Autospotting: Replaced by Mixed Instances Policy
Launch Configurations: Replaced by Launch Templates
gp2 volumes: Upgraded to gp3
IMDSv1: Now requires IMDSv2

Troubleshooting

Instances Not Joining Cluster

Check the ECS agent logs:

docker logs ecs-agent
cat /var/log/ecs/ecs-agent.log

Verify cluster name in user data:

cat /etc/ecs/ecs.config
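
To compare the configured cluster against the name you expect (for example the value of the `{stackName}-cluster-name` export), the check can be scripted. This is a small sketch with hypothetical helper names. It assumes the file uses the standard `ECS_CLUSTER=<name>` line.

```shell
# Print the cluster name configured in an ecs.config-style file.
# Usage: ecs_config_cluster [FILE]   (defaults to /etc/ecs/ecs.config)
ecs_config_cluster() {
    sed -n 's/^ECS_CLUSTER=//p' "${1:-/etc/ecs/ecs.config}" | tail -n 1
}

# Succeed only when the configured cluster matches the expected name.
# Usage: check_ecs_cluster EXPECTED_NAME [FILE]
check_ecs_cluster() {
    [ "$(ecs_config_cluster "$2")" = "$1" ]
}
```

On an instance, `check_ecs_cluster my-cluster-name || echo "cluster name mismatch"` flags an instance that is trying to register with the wrong cluster.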

Tasks Not Draining

Check Lambda logs in CloudWatch:

/aws/lambda/{stackName}-DrainingLambda
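
With AWS CLI v2 the log group can be followed directly from a terminal. The sketch below assumes the log group naming shown above. The helper names and the one-hour default window are illustrative.

```shell
# Build the draining Lambda's log group name for a given stack.
draining_log_group() {
    printf '/aws/lambda/%s-DrainingLambda\n' "$1"
}

# Follow recent draining logs (requires AWS CLI v2 and credentials).
# Usage: tail_draining_logs STACK_NAME [SINCE]   (SINCE defaults to 1h)
tail_draining_logs() {
    aws logs tail "$(draining_log_group "$1")" --since "${2:-1h}" --follow
}
```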

Spot Interruptions

Watch for interruption notices and cluster headroom:

EventBridge → EC2 Spot Instance Interruption Warning events
AWS/ECS (CloudWatch) → CPUReservation, MemoryReservation

Consider increasing onDemandPercentage for critical workloads.

Cost Optimization Tips

Use Spot in non-prod: onDemandPercentage: 0
Multiple instance types: Better Spot availability
Right-size instances: Match to your container sizes
Enable Fargate Spot: For batch/background tasks
Set max instance lifetime: Force instance refresh for patches
