# ECS Cluster Deployment

Deploy production-ready ECS clusters with AWS CDK.

## Features

- **EC2 Capacity Provider** with managed scaling (replaces the custom SchedulableContainers metric)
- **Mixed Instances Policy** for Spot support (replaces Autospotting)
- **Launch Templates** with IMDSv2 and gp3 EBS volumes
- **Instance Draining** via lifecycle hooks for graceful task migration
- **Optional Fargate** capacity providers for serverless workloads
- **Internal/External ALBs** with HTTPS support
- **Container Insights** for monitoring
- **Automatic instance replacement** via maximum instance lifetime

## Quick Start

### Minimal Jenkinsfile - Using CloudFormation Imports

**Minimal props:** Apart from the required parameters, only `vpcStackName` is needed for networking. All VPC details are imported automatically from the VPC stack's exports.

```groovy
@Library(["spicy-automation@main"]) _

spicyECSCluster(
    jenkinsAwsCredentialsId: "aws-credentials",
    region: "ca-central-1",
    stackName: "my-ecs-cluster",
    vpcStackName: "my-vpc",   // Auto-imports ALL VPC details (VPC ID, CIDR, subnets, AZs)
    ownerTag: "MyTeam",
    productTag: "my-product",
    componentTag: "ecs-cluster",
    environment: "dev"
)
```

**What is imported from the VPC stack:**

- VPC ID from `${vpcStackName}-VPCID`
- VPC CIDR from `${vpcStackName}-VPCCIDR`
- Number of AZs from `${vpcStackName}-NumberOfAZs`
- Private subnet IDs from `${vpcStackName}-PrivateSubnetA1ID`, `${vpcStackName}-PrivateSubnetB1ID`, etc.
- Public subnet IDs from `${vpcStackName}-PublicSubnetAID`, `${vpcStackName}-PublicSubnetBID`, etc. (if `createExternalLoadBalancer: true`)
- Availability zones auto-derived from region and number of AZs
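
To confirm that a VPC stack publishes the exports listed above before running the pipeline, here is a minimal bash sketch. It assumes subnet letters `A`, `B`, `C`, ... map to AZ order and that only the first private subnet tier (`...A1ID`) is imported; both are inferences from the examples above, not confirmed by the library. The function names `vpc_export_names` and `check_vpc_exports` are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the export names the ECS stack is expected to
# import from a VPC stack.
# Assumption: AZ n uses subnet letter A, B, C, ... and private subnets use tier 1.
vpc_export_names() {
  local vpc_stack="$1" num_azs="$2" include_public="${3:-false}"
  local letters=(A B C D E F) i
  echo "${vpc_stack}-VPCID"
  echo "${vpc_stack}-VPCCIDR"
  echo "${vpc_stack}-NumberOfAZs"
  for (( i = 0; i < num_azs; i++ )); do
    echo "${vpc_stack}-PrivateSubnet${letters[i]}1ID"
    if [[ "$include_public" == "true" ]]; then
      echo "${vpc_stack}-PublicSubnet${letters[i]}ID"
    fi
  done
}

# Report expected exports that are missing in the current account/region.
# Requires the AWS CLI and credentials. Returns 1 if anything is missing.
check_vpc_exports() {
  local existing name missing=0
  existing="$(aws cloudformation list-exports --query 'Exports[].Name' --output text | tr '\t' '\n')"
  while read -r name; do
    if ! grep -qxF "$name" <<< "$existing"; then
      echo "MISSING: $name"
      missing=1
    fi
  done < <(vpc_export_names "$@")
  return "$missing"
}
```

For example, `check_vpc_exports my-vpc 2 true` checks a two-AZ VPC stack including the public subnets needed by an external ALB.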

### Production Jenkinsfile with All Options

```groovy
@Library(["spicy-automation@main"]) _

spicyECSCluster(
    // AWS Configuration
    jenkinsAwsCredentialsId: "aws-credentials",
    region: "ca-central-1",
    accountId: "123456789012",
    stackName: "prod-ecs-cluster",

    // VPC Configuration: only vpcStackName is required. VPC ID, CIDR,
    // subnets, AZs, and numberOfAzs all auto-import from VPC stack exports.
    vpcStackName: "production-vpc",

    // Tags
    ownerTag: "Platform",
    productTag: "spicy",
    componentTag: "ecs-cluster",
    environment: "prod",

    // Instance Configuration
    instanceType: "m5a.xlarge",
    additionalInstanceTypes: "m5.xlarge,m5d.xlarge,m5n.xlarge",
    keyName: "my-keypair",
    ebsVolumeSize: 100,

    // Scaling
    minClusterSize: 3,
    maxClusterSize: 10,
    targetCapacityPercent: 100,

    // Spot Configuration (for cost savings)
    spotEnabled: true,
    onDemandPercentage: 50,            // 50% On-Demand, 50% Spot
    spotAllocationStrategy: "capacity-optimized",

    // Load Balancers
    createExternalLoadBalancer: true,
    createInternalLoadBalancer: true,
    certificateArn: "arn:aws:acm:ca-central-1:123456789012:certificate/xxx",

    // Fargate (optional hybrid: enables both FARGATE and FARGATE_SPOT)
    enableFargate: false,

    // Timeouts
    drainingTimeout: 900,              // 15 minutes for task draining
    maxInstanceLifetime: 604800,       // 7 days before instance replacement

    // Container Insights
    containerInsights: true,

    // Approval for production
    approvers: "admin,platform-team"
)
```

## Parameters Reference

### Required Parameters

| Parameter | Description | Example |
| ------------------------- | -------------------------------------------------------------------------------------------------------------- | ------------------- |
| `jenkinsAwsCredentialsId` | Jenkins credential ID for AWS | `"aws-credentials"` |
| `region` | AWS region | `"ca-central-1"` |
| `stackName` | CloudFormation stack name | `"my-ecs-cluster"` |
| `vpcStackName` | VPC stack name (**required**). All VPC details (VPC ID, CIDR, subnets, AZs) auto-import from VPC stack exports | `"my-vpc"` |
| `ownerTag` | Owner tag value | `"MyTeam"` |
| `productTag` | Product tag value | `"my-product"` |

### Instance Configuration

| Parameter | Default | Description |
| ------------------------- | ----------- | ----------------------------------- |
| `instanceType` | `m5a.large` | Primary EC2 instance type |
| `additionalInstanceTypes` | - | Additional types for Spot diversity (comma-separated) |
| `keyName` | - | EC2 key pair for SSH access |
| `ebsVolumeSize` | `100` | EBS volume size in GB |
| `containerInsights` | `true` | Enable Container Insights |

### Scaling Configuration

| Parameter | Default | Description |
| ----------------------- | ------- | -------------------------------------- |
| `minClusterSize` | `2` | Minimum number of instances |
| `maxClusterSize` | `4` | Maximum number of instances |
| `targetCapacityPercent` | `100` | Target utilization for managed scaling |

### Spot Configuration

| Parameter | Default | Description |
| ------------------------ | -------------------- | -------------------------------------- |
| `spotEnabled` | `false` | Enable Spot instances |
| `onDemandPercentage` | `100` | Percentage of On-Demand instances (the rest is Spot) |
| `spotAllocationStrategy` | `capacity-optimized` | Spot allocation strategy |

**Spot Allocation Strategies:**

- `capacity-optimized` - Best for interruption avoidance (recommended)
- `lowest-price` - Best for cost, higher interruption risk
- `capacity-optimized-prioritized` - Prioritizes instance types you specify
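
These Spot parameters correspond to the `InstancesDistribution` and `LaunchTemplate.Overrides` sections of a CloudFormation `AWS::AutoScaling::AutoScalingGroup` `MixedInstancesPolicy`. The bash sketch below prints that fragment as JSON for a given set of Jenkinsfile values, to make the mapping concrete. It is an illustration, not the construct's code: it assumes `OnDemandBaseCapacity` is fixed at 0 (as in the Mixed Instances diagram under How It Works) and that the primary type is listed first, and `mixed_instances_json` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: render the MixedInstancesPolicy fragment implied by
# instanceType, additionalInstanceTypes, onDemandPercentage, spotAllocationStrategy.
# Assumes OnDemandBaseCapacity = 0; LaunchTemplateSpecification is omitted.
mixed_instances_json() {
  local primary="$1" additional="$2" on_demand_pct="${3:-100}" strategy="${4:-capacity-optimized}"
  local overrides="{\"InstanceType\": \"${primary}\"}" type
  local -a extra=()
  # additionalInstanceTypes is a comma-separated string, e.g. "m5.large,m5d.large"
  IFS=',' read -r -a extra <<< "$additional"
  for type in "${extra[@]}"; do
    type="${type// /}"   # tolerate "a, b" spacing
    [[ -n "$type" ]] && overrides+=", {\"InstanceType\": \"${type}\"}"
  done
  cat <<EOF
{
  "InstancesDistribution": {
    "OnDemandBaseCapacity": 0,
    "OnDemandPercentageAboveBaseCapacity": ${on_demand_pct},
    "SpotAllocationStrategy": "${strategy}"
  },
  "LaunchTemplate": {
    "Overrides": [${overrides}]
  }
}
EOF
}
```

For example, `mixed_instances_json m5a.large "m5.large,m5d.large,m5n.large" 50` produces the instance list and 50% split shown in the Mixed Instances diagram.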

### Load Balancer Configuration

| Parameter | Default | Description |
| ---------------------------- | ------- | ----------------------------------------------------------------------------------- |
| `createExternalLoadBalancer` | `false` | Create internet-facing ALB (public subnets auto-imported from VPC stack if enabled) |
| `createInternalLoadBalancer` | `false` | Create internal ALB |
| `certificateArn` | - | ACM certificate ARN for HTTPS |

### Fargate Configuration

| Parameter | Default | Description |
| --------------- | ------- | ---------------------------------------------------------------------- |
| `enableFargate` | `false` | Enable Fargate capacity providers (adds both FARGATE and FARGATE_SPOT) |

### Lifecycle Configuration

| Parameter | Default | Description |
| --------------------- | -------- | --------------------------------- |
| `drainingTimeout` | `900` | Seconds to wait for task draining |
| `maxInstanceLifetime` | `604800` | Max instance age (7 days) |
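
EC2 Auto Scaling bounds both values: a lifecycle hook's heartbeat timeout must be between 30 and 7200 seconds, and a maximum instance lifetime must be either 0 (disabled) or at least 86,400 seconds (1 day). The bash sketch below is a pre-flight check under the assumption that `drainingTimeout` is passed to the termination hook as its heartbeat timeout; the library may wire it differently, and `validate_lifecycle` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check for the lifecycle parameters.
# Assumption: drainingTimeout becomes the termination lifecycle hook's
# HeartbeatTimeout (allowed range 30-7200 seconds).
# MaxInstanceLifetime must be 0 (disabled) or >= 86400 seconds (1 day).
validate_lifecycle() {
  local draining_timeout="$1" max_lifetime="$2" status=0
  if (( draining_timeout < 30 || draining_timeout > 7200 )); then
    echo "drainingTimeout=${draining_timeout}: must be 30-7200 seconds" >&2
    status=1
  fi
  if (( max_lifetime != 0 && max_lifetime < 86400 )); then
    echo "maxInstanceLifetime=${max_lifetime}: must be 0 or >= 86400 seconds" >&2
    status=1
  fi
  return "$status"
}
```

Both defaults pass: `validate_lifecycle 900 604800` returns 0.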

## Environment-Specific Configuration

### Development/Sandbox

```groovy
spicyECSCluster(
    // ... base config ...
    environment: "dev",
    minClusterSize: 1,
    maxClusterSize: 2,
    spotEnabled: true,
    onDemandPercentage: 0      // 100% Spot for maximum savings
)
```

### Staging

```groovy
spicyECSCluster(
    // ... base config ...
    environment: "staging",
    minClusterSize: 2,
    maxClusterSize: 4,
    spotEnabled: true,
    onDemandPercentage: 20     // 20% On-Demand, 80% Spot
)
```

### Production

```groovy
spicyECSCluster(
    // ... base config ...
    environment: "prod",
    minClusterSize: 3,
    maxClusterSize: 10,
    spotEnabled: true,
    onDemandPercentage: 50,    // 50% On-Demand, 50% Spot
    approvers: "admin,platform-team"
)
```

## Stack Outputs

The stack exports these values for use by ECS services:

| Output | Export Name | Description |
| -------------------------- | -------------------------------------------- | ---------------------- |
| `ClusterName` | `{stackName}-cluster-name` | ECS cluster name |
| `ClusterArn` | `{stackName}-cluster-arn` | ECS cluster ARN |
| `VPC` | `{stackName}-VPC` | VPC ID |
| `ECSHostSecurityGroup` | `{stackName}-ecs-host-security-group` | EC2 security group |
| `AutoScalingGroupName` | `{stackName}-auto-scaling-group` | ASG name |
| `ExternalLoadBalancerDNS` | `{stackName}-internet-facing-url` | External ALB DNS |
| `ExternalLoadBalancerArn` | `{stackName}-internet-facing-arn` | External ALB ARN |
| `ExternalHTTPListenerArn` | `{stackName}-internet-facing-http-listener` | HTTP listener ARN |
| `ExternalHTTPSListenerArn` | `{stackName}-internet-facing-https-listener` | HTTPS listener ARN |
| `InternalLoadBalancerDNS` | `{stackName}-internal-url` | Internal ALB DNS |
| `InternalLoadBalancerArn` | `{stackName}-internal-arn` | Internal ALB ARN |
| `InternalHTTPListenerArn` | `{stackName}-internal-http-listener` | HTTP listener ARN |
| `InternalHTTPSListenerArn` | `{stackName}-internal-https-listener` | HTTPS listener ARN |
| `LogsBucketName` | `{stackName}-logs-s3-bucket` | ALB access logs bucket |
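
CloudFormation stacks can consume these values directly with `Fn::ImportValue`. For scripts and pipeline steps, the bash sketch below resolves an export by name with the AWS CLI's `cloudformation list-exports`; `get_stack_export` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: resolve "<stackName>-<suffix>" to its exported value.
# Usage: get_stack_export my-ecs-cluster cluster-name
get_stack_export() {
  local stack_name="$1" suffix="$2" value
  value="$(aws cloudformation list-exports \
    --query "Exports[?Name=='${stack_name}-${suffix}'].Value" \
    --output text)"
  if [[ -z "$value" || "$value" == "None" ]]; then
    echo "export ${stack_name}-${suffix} not found" >&2
    return 1
  fi
  echo "$value"
}

# Example:
# CLUSTER="$(get_stack_export my-ecs-cluster cluster-name)"
# LISTENER="$(get_stack_export my-ecs-cluster internal-https-listener)"
```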

## How It Works

### Capacity Providers (Replaces Custom Scaling)

The cluster uses **ECS Managed Scaling** via Capacity Providers:

```
┌─────────────────────────────────────────────────────────┐
│ ECS Cluster                                             │
├─────────────────────────────────────────────────────────┤
│ Capacity Providers:                                     │
│   ┌─────────────────────────────────────────────────┐   │
│   │ EC2 Capacity Provider                           │   │
│   │ - Managed Scaling: ON                           │   │
│   │ - Target Capacity: 100%                         │   │
│   │ - Min Scaling Step: 1                           │   │
│   │ - Max Scaling Step: 10000                       │   │
│   └─────────────────────────────────────────────────┘   │
│   ┌─────────────────────────────────────────────────┐   │
│   │ FARGATE (optional)                              │   │
│   └─────────────────────────────────────────────────┘   │
│   ┌─────────────────────────────────────────────────┐   │
│   │ FARGATE_SPOT (optional)                         │   │
│   └─────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────┘
```

This replaces the legacy `SchedulableContainers` Lambda metric with AWS-native scaling.
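
Managed scaling target-tracks the `CapacityProviderReservation` metric, which is (instances needed / instances running) x 100. Holding that metric at `targetCapacityPercent` means the group converges on roughly ceil(needed x 100 / target) instances, clamped to the ASG's min and max. The bash sketch below shows only that arithmetic; ECS's real calculation also accounts for tasks in the PROVISIONING state, step sizes, and instance warm-up, and `desired_instances` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Approximate the instance count ECS managed scaling converges to.
# Args: instances_needed target_capacity_percent min_size max_size
desired_instances() {
  local needed="$1" target="$2" min="$3" max="$4" desired
  # Integer ceiling of needed * 100 / target
  desired=$(( (needed * 100 + target - 1) / target ))
  (( desired < min )) && desired="$min"
  (( desired > max )) && desired="$max"
  echo "$desired"
}
```

With `targetCapacityPercent: 100` the cluster runs exactly as many instances as its tasks need; a lower target leaves headroom. For example, `desired_instances 4 80 2 10` prints `5`.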

### Mixed Instances Policy (Replaces Autospotting)

When `spotEnabled: true`:

```
┌─────────────────────────────────────────────────────────┐
│ Auto Scaling Group                                      │
├─────────────────────────────────────────────────────────┤
│ Mixed Instances Policy:                                 │
│   ┌─────────────────────────────────────────────────┐   │
│   │ On-Demand Base Capacity: 0                      │   │
│   │ On-Demand % Above Base: 50%                     │   │
│   │ Spot Allocation: capacity-optimized             │   │
│   └─────────────────────────────────────────────────┘   │
│                                                         │
│ Instance Type Overrides:                                │
│   - m5a.large (primary)                                 │
│   - m5.large                                            │
│   - m5d.large                                           │
│   - m5n.large                                           │
└─────────────────────────────────────────────────────────┘
```
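
EC2 Auto Scaling converts the On-Demand percentage into a whole number of instances and rounds fractional results up in favor of On-Demand. The bash sketch below reproduces that split for a base capacity of 0, as in the diagram; `capacity_split` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Split an ASG's capacity into On-Demand and Spot instances, assuming
# OnDemandBaseCapacity = 0. Fractions round up in favor of On-Demand.
# Args: total_instances on_demand_percentage
capacity_split() {
  local total="$1" pct="$2" on_demand spot
  on_demand=$(( (total * pct + 99) / 100 ))
  spot=$(( total - on_demand ))
  echo "on-demand=${on_demand} spot=${spot}"
}
```

For the production example (`minClusterSize: 3`, `onDemandPercentage: 50`), `capacity_split 3 50` prints `on-demand=2 spot=1`, so at minimum size the cluster runs more On-Demand than Spot capacity.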

Benefits over Autospotting:

- No Lambda to maintain
- Faster response (no polling delay)
- Better capacity data (AWS-native)
- Simpler architecture

### Instance Draining

Draining works in two layers to avoid downtime:

1. **Spot Interruption Draining** (native ECS), enabled in the ECS agent configuration (`/etc/ecs/ecs.config`):

   ```bash
   ECS_ENABLE_SPOT_INSTANCE_DRAINING=true
   ```

   When the two-minute Spot interruption notice arrives, the ECS agent sets the instance to DRAINING so its tasks are rescheduled elsewhere.

2. **Lifecycle Hook Draining** (Lambda), for terminations initiated by the ASG:
   - The ASG lifecycle hook sends the termination event to SNS
   - The Lambda sets the container instance to DRAINING
   - It waits for running tasks to migrate
   - It completes the lifecycle action
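
The lifecycle-hook steps map onto a few ECS and Auto Scaling API calls. The bash sketch below runs the same sequence with the AWS CLI for one instance, which can help when draining by hand or debugging the Lambda; it is not the Lambda's code. The instance lookup uses an ECS cluster query language filter on `ec2InstanceId`. The cluster, ASG, and hook names are arguments you supply (this document does not state the hook name), and `drain_and_release` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical manual equivalent of the lifecycle-hook draining steps.
# Args: cluster asg_name hook_name ec2_instance_id [timeout_seconds] [poll_seconds]
drain_and_release() {
  local cluster="$1" asg="$2" hook="$3" instance_id="$4"
  local timeout="${5:-900}" poll="${6:-15}" arn running waited=0

  # Map the EC2 instance ID to its ECS container instance ARN.
  arn="$(aws ecs list-container-instances --cluster "$cluster" \
    --filter "ec2InstanceId in ['${instance_id}']" \
    --query 'containerInstanceArns[0]' --output text)"
  if [[ -z "$arn" || "$arn" == "None" ]]; then
    echo "no container instance for ${instance_id}" >&2
    return 1
  fi

  # Stop new placements; services start replacement tasks elsewhere.
  aws ecs update-container-instances-state --cluster "$cluster" \
    --container-instances "$arn" --status DRAINING > /dev/null

  # Poll until no tasks remain or the timeout elapses.
  while :; do
    running="$(aws ecs describe-container-instances --cluster "$cluster" \
      --container-instances "$arn" \
      --query 'containerInstances[0].runningTasksCount' --output text)"
    [[ "$running" == "0" ]] && break
    if (( waited >= timeout )); then
      echo "timed out with ${running} task(s) still running" >&2
      break
    fi
    sleep "$poll"
    waited=$(( waited + poll ))
  done

  # Let the ASG proceed with termination.
  aws autoscaling complete-lifecycle-action \
    --lifecycle-hook-name "$hook" --auto-scaling-group-name "$asg" \
    --lifecycle-action-result CONTINUE --instance-id "$instance_id"
}
```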

### Launch Template Features

- **IMDSv2 Required**: Enhanced metadata security
- **gp3 EBS Volumes**: Better performance, lower cost than gp2
- **Encrypted Volumes**: EBS encryption enabled
- **SSM Agent**: Pre-installed for Session Manager access
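
With IMDSv2 required, every metadata request must carry a session token obtained with a `PUT` request; token-less IMDSv1 requests are rejected with HTTP 401. The bash sketch below, intended to run on an instance (for example through Session Manager), shows the token flow with `curl`; `imds_get` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Fetch an instance metadata path using the IMDSv2 session-token flow.
# Usage (on an EC2 instance): imds_get meta-data/instance-id
IMDS_ENDPOINT="${IMDS_ENDPOINT:-http://169.254.169.254}"

imds_get() {
  local path="$1" token
  # Step 1: request a session token (TTL in seconds, maximum 21600).
  token="$(curl -sf -X PUT "${IMDS_ENDPOINT}/latest/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 300")" || return 1
  # Step 2: present the token on the metadata request.
  curl -sf -H "X-aws-ec2-metadata-token: ${token}" "${IMDS_ENDPOINT}/latest/${path}"
}

# A token-less request should return 401 on an IMDSv2-only instance:
# curl -s -o /dev/null -w '%{http_code}\n' "${IMDS_ENDPOINT}/latest/meta-data/"
```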

## Migrating from Legacy Automation

### Parameter Mapping

| Legacy (Ansible) | New (CDK) |
| ----------------------------------- | ------------------------------ |
| `stackName` | `stackName` |
| `instanceType` | `instanceType` |
| `minClusterSize` | `minClusterSize` |
| `maxClusterSize` | `maxClusterSize` |
| `spotEnabled` | `spotEnabled` |
| `minOnDemandPercentage` | `onDemandPercentage` |
| `largestContainerCPUReservation` | (not needed - managed scaling) |
| `largestContainerMemoryReservation` | (not needed - managed scaling) |
| `clusterScaleUpAdjustment` | (not needed - managed scaling) |
| `clusterScaleDownAdjustment` | (not needed - managed scaling) |

### Removed Features

These legacy features are no longer needed:

- **SchedulableContainers Lambda**: Replaced by Capacity Provider managed scaling
- **Autospotting**: Replaced by Mixed Instances Policy
- **Launch Configurations**: Replaced by Launch Templates
- **gp2 volumes**: Upgraded to gp3
- **IMDSv1**: Disabled; instances now require IMDSv2

## Troubleshooting

### Instances Not Joining Cluster

Check the ECS agent logs on the instance:

```bash
# Output of the ECS agent container
sudo docker logs ecs-agent

# Agent log file
sudo cat /var/log/ecs/ecs-agent.log
```

Verify that user data wrote the expected cluster name (`ECS_CLUSTER`) to the agent configuration:

```bash
cat /etc/ecs/ecs.config
```
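
To check both agent settings this document relies on, the cluster name and Spot instance draining, the bash sketch below reads an `ecs.config` file and reports anything unexpected. `check_ecs_config` is a hypothetical name; the file path defaults to `/etc/ecs/ecs.config`.

```bash
#!/usr/bin/env bash
# Check ECS agent settings in ecs.config.
# Usage: check_ecs_config <expected-cluster-name> [config-file]
check_ecs_config() {
  local expected="$1" file="${2:-/etc/ecs/ecs.config}" actual status=0
  if [[ ! -r "$file" ]]; then
    echo "cannot read ${file}" >&2
    return 1
  fi
  # If ECS_CLUSTER appears more than once, use the last occurrence.
  actual="$(grep -E '^ECS_CLUSTER=' "$file" | tail -n 1 | cut -d= -f2-)"
  if [[ "$actual" != "$expected" ]]; then
    echo "ECS_CLUSTER is '${actual}', expected '${expected}'"
    status=1
  fi
  if ! grep -qE '^ECS_ENABLE_SPOT_INSTANCE_DRAINING=true$' "$file"; then
    echo "ECS_ENABLE_SPOT_INSTANCE_DRAINING=true not set"
    status=1
  fi
  return "$status"
}
```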

### Tasks Not Draining

Check the draining Lambda's logs in CloudWatch:

```
/aws/lambda/{stackName}-DrainingLambda
```
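
With AWS CLI v2, `aws logs tail` streams that log group from a terminal. The bash sketch below combines it with a list of container instances still in the DRAINING state and their running task counts. It assumes the log group name follows the pattern above, and `draining_status` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Inspect draining: recent Lambda logs plus container instances still DRAINING.
# Requires AWS CLI v2 (for `aws logs tail`). Args: stack_name cluster_name
draining_status() {
  local stack_name="$1" cluster="$2" arns
  aws logs tail "/aws/lambda/${stack_name}-DrainingLambda" --since 1h

  # Only container instances currently in the DRAINING state.
  arns="$(aws ecs list-container-instances --cluster "$cluster" \
    --status DRAINING --query 'containerInstanceArns' --output text)"
  if [[ -z "$arns" || "$arns" == "None" ]]; then
    echo "no container instances are DRAINING"
    return 0
  fi
  # $arns is intentionally unquoted so each ARN becomes its own argument.
  aws ecs describe-container-instances --cluster "$cluster" --container-instances $arns \
    --query 'containerInstances[].[ec2InstanceId,status,runningTasksCount]' --output table
}
```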

### Spot Interruptions

Monitor with:

- EventBridge `EC2 Spot Instance Interruption Warning` events (source `aws.ec2`) for interruption notices
- CloudWatch `AWS/ECS` → `CPUReservation`, `MemoryReservation`

Consider increasing `onDemandPercentage` for critical workloads.
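
The `AWS/ECS` reservation metrics are published per cluster under the `ClusterName` dimension and can be pulled with `aws cloudwatch get-metric-statistics`. The bash sketch below prints the hourly maximum of one metric over the past day. The function name is hypothetical, and `date -d` assumes GNU coreutils (on macOS, `date -v-1d` is the equivalent).

```bash
#!/usr/bin/env bash
# Print hourly maximum CPUReservation (or MemoryReservation) for a cluster
# over the last 24 hours. Args: cluster_name [metric_name]
cluster_reservation() {
  local cluster="$1" metric="${2:-CPUReservation}" start end
  start="$(date -u -d '1 day ago' +%Y-%m-%dT%H:%M:%SZ)"   # GNU date
  end="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
  aws cloudwatch get-metric-statistics \
    --namespace AWS/ECS --metric-name "$metric" \
    --dimensions "Name=ClusterName,Value=${cluster}" \
    --start-time "$start" --end-time "$end" \
    --period 3600 --statistics Maximum \
    --query 'sort_by(Datapoints, &Timestamp)[].[Timestamp, Maximum]' \
    --output text
}
```

Sustained reservation near 100% with frequent interruptions suggests the cluster has little slack to absorb replaced Spot capacity.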

## Cost Optimization Tips

1. **Use Spot in non-prod**: `onDemandPercentage: 0`
2. **Multiple instance types**: Better Spot availability
3. **Right-size instances**: Match to your container sizes
4. **Enable Fargate Spot**: For batch/background tasks
|