# ECS Cluster Deployment

Deploy production-ready ECS clusters with AWS CDK.

## Features

- **EC2 Capacity Provider** with managed scaling (replaces custom SchedulableContainers metric)
- **Mixed Instances Policy** for Spot support (replaces Autospotting)
- **Launch Templates** with IMDSv2 and gp3 EBS volumes
- **Instance Draining** via lifecycle hooks for graceful task migration
- **Optional Fargate** capacity providers for serverless workloads
- **Internal/External ALBs** with HTTPS support
- **Container Insights** for monitoring
- **Automatic instance refresh** via max instance lifetime

## Quick Start

### Minimal Jenkinsfile - Using CloudFormation Imports

**Minimal props:** For VPC configuration, only `vpcStackName` is required. All other VPC details are imported automatically from the VPC stack's exports.

```groovy
@Library(["spicy-automation@main"]) _

spicyECSCluster(
    jenkinsAwsCredentialsId: "aws-credentials",
    region: "ca-central-1",
    stackName: "my-ecs-cluster",
    vpcStackName: "my-vpc", // Auto-imports ALL VPC details (VPC ID, CIDR, subnets, AZs)
    ownerTag: "MyTeam",
    productTag: "my-product",
    componentTag: "ecs-cluster",
    environment: "dev"
)
```

**What auto-imports from VPC stack:**

- VPC ID from `${vpcStackName}-VPCID`
- VPC CIDR from `${vpcStackName}-VPCCIDR`
- Number of AZs from `${vpcStackName}-NumberOfAZs`
- Private subnet IDs from `${vpcStackName}-PrivateSubnetA1ID`, `${vpcStackName}-PrivateSubnetB1ID`, etc.
- Public subnet IDs from `${vpcStackName}-PublicSubnetAID`, `${vpcStackName}-PublicSubnetBID`, etc. (if `createExternalLoadBalancer: true`)
- Availability zones auto-derived from region and number of AZs

### Production Jenkinsfile with All Options

```groovy
@Library(["spicy-automation@main"]) _

spicyECSCluster(
    // AWS Configuration
    jenkinsAwsCredentialsId: "aws-credentials",
    region: "ca-central-1",
    accountId: "123456789012",
    stackName: "prod-ecs-cluster",

    // VPC Configuration - only vpcStackName required, all VPC details auto-import
    vpcStackName: "production-vpc",
    // VPC ID, CIDR, subnets, AZs, and numberOfAzs all auto-import from VPC stack exports

    // Tags
    ownerTag: "Platform",
    productTag: "spicy",
    componentTag: "ecs-cluster",
    environment: "prod",

    // Instance Configuration
    instanceType: "m5a.xlarge",
    additionalInstanceTypes: "m5.xlarge,m5d.xlarge,m5n.xlarge",
    keyName: "my-keypair",
    ebsVolumeSize: 100,

    // Scaling
    minClusterSize: 3,
    maxClusterSize: 10,
    targetCapacityPercent: 100,

    // Spot Configuration (for cost savings)
    spotEnabled: true,
    onDemandPercentage: 50, // 50% On-Demand, 50% Spot
    spotAllocationStrategy: "capacity-optimized",

    // Load Balancers
    createExternalLoadBalancer: true,
    createInternalLoadBalancer: true,
    certificateArn: "arn:aws:acm:ca-central-1:123456789012:certificate/xxx",

    // Fargate (optional hybrid - enables both FARGATE and FARGATE_SPOT)
    enableFargate: false,

    // Timeouts
    drainingTimeout: 900, // 15 minutes for task draining
    maxInstanceLifetime: 604800, // 7 days for instance refresh

    // Container Insights
    containerInsights: true,

    // Approval for production
    approvers: "admin,platform-team"
)
```

## Parameters Reference

### Required Parameters

| Parameter                 | Description                                                                                                    | Example             |
| ------------------------- | -------------------------------------------------------------------------------------------------------------- | ------------------- |
| `jenkinsAwsCredentialsId` | Jenkins credential ID for AWS                                                                                  | `"aws-credentials"` |
| `region`                  | AWS region                                                                                                     | `"ca-central-1"`    |
| `stackName`               | CloudFormation stack name                                                                                      | `"my-ecs-cluster"`  |
| `vpcStackName`            | VPC stack name - **required**. All VPC details (VPC ID, CIDR, subnets, AZs) auto-import from VPC stack exports | `"my-vpc"`          |
| `ownerTag`                | Owner tag value                                                                                                | `"MyTeam"`          |
| `productTag`              | Product tag value                                                                                              | `"my-product"`      |

### Instance Configuration

| Parameter                 | Default     | Description                         |
| ------------------------- | ----------- | ----------------------------------- |
| `instanceType`            | `m5a.large` | Primary EC2 instance type           |
| `additionalInstanceTypes` | -           | Additional types for Spot diversity |
| `keyName`                 | -           | EC2 key pair for SSH access         |
| `ebsVolumeSize`           | `100`       | EBS volume size in GB               |
| `containerInsights`       | `true`      | Enable Container Insights           |

### Scaling Configuration

| Parameter               | Default | Description                            |
| ----------------------- | ------- | -------------------------------------- |
| `minClusterSize`        | `2`     | Minimum number of instances            |
| `maxClusterSize`        | `4`     | Maximum number of instances            |
| `targetCapacityPercent` | `100`   | Target utilization for managed scaling |

### Spot Configuration

| Parameter                | Default              | Description                            |
| ------------------------ | -------------------- | -------------------------------------- |
| `spotEnabled`            | `false`              | Enable Spot instances                  |
| `onDemandPercentage`     | `100`                | Percentage of On-Demand (rest is Spot) |
| `spotAllocationStrategy` | `capacity-optimized` | Spot allocation strategy               |

**Spot Allocation Strategies:**

- `capacity-optimized` - Best for interruption avoidance (recommended)
- `lowest-price` - Best for cost, higher interruption risk
- `capacity-optimized-prioritized` - Prioritizes instance types you specify

### Load Balancer Configuration

| Parameter                    | Default | Description                                                                         |
| ---------------------------- | ------- | ----------------------------------------------------------------------------------- |
| `createExternalLoadBalancer` | `false` | Create internet-facing ALB (public subnets auto-imported from VPC stack if enabled) |
| `createInternalLoadBalancer` | `false` | Create internal ALB                                                                 |
| `certificateArn`             | -       | ACM certificate for HTTPS                                                           |

### Fargate Configuration

| Parameter       | Default | Description                                                            |
| --------------- | ------- | ---------------------------------------------------------------------- |
| `enableFargate` | `false` | Enable Fargate capacity providers (adds both FARGATE and FARGATE_SPOT) |

### Lifecycle Configuration

| Parameter             | Default  | Description                       |
| --------------------- | -------- | --------------------------------- |
| `drainingTimeout`     | `900`    | Seconds to wait for task draining |
| `maxInstanceLifetime` | `604800` | Max instance age (7 days)         |

## Environment-Specific Configuration

### Development/Sandbox

```groovy
spicyECSCluster(
    // ... base config ...
    environment: "dev",
    minClusterSize: 1,
    maxClusterSize: 2,
    spotEnabled: true,
    onDemandPercentage: 0 // 100% Spot for max savings
)
```

### Staging

```groovy
spicyECSCluster(
    // ... base config ...
    environment: "staging",
    minClusterSize: 2,
    maxClusterSize: 4,
    spotEnabled: true,
    onDemandPercentage: 20 // 20% On-Demand, 80% Spot
)
```

### Production

```groovy
spicyECSCluster(
    // ... base config ...
    environment: "prod",
    minClusterSize: 3,
    maxClusterSize: 10,
    spotEnabled: true,
    onDemandPercentage: 50, // 50% On-Demand, 50% Spot
    approvers: "admin,platform-team"
)
```

## Stack Outputs

The stack exports these values for use by ECS services:

| Output                     | Export Name                                  | Description            |
| -------------------------- | -------------------------------------------- | ---------------------- |
| `ClusterName`              | `{stackName}-cluster-name`                   | ECS cluster name       |
| `ClusterArn`               | `{stackName}-cluster-arn`                    | ECS cluster ARN        |
| `VPC`                      | `{stackName}-VPC`                            | VPC ID                 |
| `ECSHostSecurityGroup`     | `{stackName}-ecs-host-security-group`        | EC2 security group     |
| `AutoScalingGroupName`     | `{stackName}-auto-scaling-group`             | ASG name               |
| `ExternalLoadBalancerDNS`  | `{stackName}-internet-facing-url`            | External ALB DNS       |
| `ExternalLoadBalancerArn`  | `{stackName}-internet-facing-arn`            | External ALB ARN       |
| `ExternalHTTPListenerArn`  | `{stackName}-internet-facing-http-listener`  | HTTP listener ARN      |
| `ExternalHTTPSListenerArn` | `{stackName}-internet-facing-https-listener` | HTTPS listener ARN     |
| `InternalLoadBalancerDNS`  | `{stackName}-internal-url`                   | Internal ALB DNS       |
| `InternalLoadBalancerArn`  | `{stackName}-internal-arn`                   | Internal ALB ARN       |
| `InternalHTTPListenerArn`  | `{stackName}-internal-http-listener`         | HTTP listener ARN      |
| `InternalHTTPSListenerArn` | `{stackName}-internal-https-listener`        | HTTPS listener ARN     |
| `LogsBucketName`           | `{stackName}-logs-s3-bucket`                 | ALB access logs bucket |

## How It Works

### Capacity Providers (Replaces Custom Scaling)

The cluster uses **ECS Managed Scaling** via Capacity Providers:

```
┌─────────────────────────────────────────────────────────┐
│                       ECS Cluster                       │
├─────────────────────────────────────────────────────────┤
│  Capacity Providers:                                    │
│  ┌─────────────────────────────────────────────────┐    │
│  │ EC2 Capacity Provider                           │    │
│  │ - Managed Scaling: ON                           │    │
│  │ - Target Capacity: 100%                         │    │
│  │ - Min Scaling Step: 1                           │    │
│  │ - Max Scaling Step: 10000                       │    │
│  └─────────────────────────────────────────────────┘    │
│  ┌─────────────────────────────────────────────────┐    │
│  │ FARGATE (optional)                              │    │
│  └─────────────────────────────────────────────────┘    │
│  ┌─────────────────────────────────────────────────┐    │
│  │ FARGATE_SPOT (optional)                         │    │
│  └─────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────┘
```

This replaces the legacy `SchedulableContainers` Lambda metric with AWS-native scaling.

### Mixed Instances Policy (Replaces Autospotting)

When `spotEnabled: true`:

```
┌─────────────────────────────────────────────────────────┐
│                   Auto Scaling Group                    │
├─────────────────────────────────────────────────────────┤
│  Mixed Instances Policy:                                │
│  ┌─────────────────────────────────────────────────┐    │
│  │ On-Demand Base Capacity: 0                      │    │
│  │ On-Demand % Above Base: 50%                     │    │
│  │ Spot Allocation: capacity-optimized             │    │
│  └─────────────────────────────────────────────────┘    │
│                                                         │
│  Instance Type Overrides:                               │
│  - m5a.large (primary)                                  │
│  - m5.large                                             │
│  - m5d.large                                            │
│  - m5n.large                                            │
└─────────────────────────────────────────────────────────┘
```

Benefits over Autospotting:

- No Lambda to maintain
- Faster response (no polling delay)
- Better capacity data (AWS-native)
- Simpler architecture

### Instance Draining

Two-layer draining for zero-downtime:

1. **Spot Interruption Draining** (native ECS):

   ```bash
   ECS_ENABLE_SPOT_INSTANCE_DRAINING=true
   ```

   The ECS agent drains tasks when it receives the 2-minute Spot termination notice.

2. **Lifecycle Hook Draining** (Lambda):
   - ASG sends termination event to SNS
   - Lambda sets instance to DRAINING
   - Waits for running tasks to migrate
   - Completes lifecycle action

### Launch Template Features

- **IMDSv2 Required**: Enhanced metadata security
- **gp3 EBS Volumes**: Better performance, lower cost than gp2
- **Encrypted Volumes**: EBS encryption enabled
- **SSM Agent**: Pre-installed for Session Manager access

## Migrating from Legacy Automation

### Parameter Mapping

| Legacy (Ansible)                    | New (CDK)                      |
| ----------------------------------- | ------------------------------ |
| `stackName`                         | `stackName`                    |
| `instanceType`                      | `instanceType`                 |
| `minClusterSize`                    | `minClusterSize`               |
| `maxClusterSize`                    | `maxClusterSize`               |
| `spotEnabled`                       | `spotEnabled`                  |
| `minOnDemandPercentage`             | `onDemandPercentage`           |
| `largestContainerCPUReservation`    | (not needed - managed scaling) |
| `largestContainerMemoryReservation` | (not needed - managed scaling) |
| `clusterScaleUpAdjustment`          | (not needed - managed scaling) |
| `clusterScaleDownAdjustment`        | (not needed - managed scaling) |

### Removed Features

These legacy features are no longer needed:

- **SchedulableContainers Lambda**: Replaced by Capacity Provider managed scaling
- **Autospotting**: Replaced by Mixed Instances Policy
- **Launch Configurations**: Replaced by Launch Templates
- **gp2 volumes**: Upgraded to gp3
- **IMDSv1**: Now requires IMDSv2

## Troubleshooting

### Instances Not Joining Cluster

Check the ECS agent logs:

```bash
docker logs ecs-agent
cat /var/log/ecs/ecs-agent.log
```

Verify cluster name in user data:

```bash
cat /etc/ecs/ecs.config
```

### Tasks Not Draining

Check Lambda logs in CloudWatch:

```
/aws/lambda/{stackName}-DrainingLambda
```

### Spot Interruptions

Monitor with CloudWatch metrics:

- `AWS/EC2Spot` → `InterruptionRate`
- `AWS/ECS` → `CPUReservation`, `MemoryReservation`

Consider increasing `onDemandPercentage` for critical workloads.

## Cost Optimization Tips

1. **Use Spot in non-prod**: `onDemandPercentage: 0`
2. **Multiple instance types**: Better Spot availability
3. **Right-size instances**: Match to your container sizes
4. **Enable Fargate Spot**: For batch/background tasks
5. **Set max instance lifetime**: Force instance refresh for patches
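
As one way to combine the tips above, the following Jenkinsfile sketch applies them to a non-production cluster. It uses only parameters documented in this README. The credential ID, stack names, tag values, and instance types are placeholders borrowed from the earlier examples, and the instance sizes in particular are an assumption that should be matched to your own container reservations.

```groovy
@Library(["spicy-automation@main"]) _

spicyECSCluster(
    // Placeholder values reused from the Quick Start example
    jenkinsAwsCredentialsId: "aws-credentials",
    region: "ca-central-1",
    stackName: "my-ecs-cluster",
    vpcStackName: "my-vpc",
    ownerTag: "MyTeam",
    productTag: "my-product",
    componentTag: "ecs-cluster",
    environment: "dev",

    // Tip 1: run non-prod entirely on Spot
    spotEnabled: true,
    onDemandPercentage: 0,

    // Tips 2 and 3: several similarly sized instance types
    // (assumed sizes; pick ones that fit your containers)
    instanceType: "m5a.large",
    additionalInstanceTypes: "m5.large,m5d.large,m5n.large",

    // Tip 4: adds both FARGATE and FARGATE_SPOT capacity providers
    enableFargate: true,

    // Tip 5: replace instances after 7 days (604800 seconds)
    maxInstanceLifetime: 604800
)
```

Note that `enableFargate` only makes the Fargate capacity providers available on the cluster. Whether a given batch or background task actually runs on `FARGATE_SPOT` would still depend on the capacity provider strategy of the service or task that uses it.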