Back to list
itsmostafa

ecs

by itsmostafa

AWS Skills for Agents

965🍴 417📅 Jan 23, 2026

SKILL.md


name: ecs description: AWS ECS container orchestration for running Docker containers. Use when deploying containerized applications, configuring task definitions, setting up services, managing clusters, or troubleshooting container issues. last_updated: "2026-01-07" doc_source: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/

AWS ECS

Amazon Elastic Container Service (ECS) is a fully managed container orchestration service. Run containers on AWS Fargate (serverless) or EC2 instances.

Table of Contents

Core Concepts

Cluster

Logical grouping of tasks or services. Can contain Fargate tasks, EC2 instances, or both.

Task Definition

Blueprint for your application. Defines containers, resources, networking, and IAM roles.

Task

Running instance of a task definition. Can run standalone or as part of a service.

Service

Maintains desired count of tasks. Handles deployments, load balancing, and auto scaling.

Launch Types

TypeDescriptionUse Case
FargateServerless, pay per taskMost workloads
EC2Self-managed instancesGPU, Windows, specific requirements

Common Patterns

Create a Fargate Cluster

AWS CLI:

# Create cluster
aws ecs create-cluster --cluster-name my-cluster

# With capacity providers
aws ecs create-cluster \
  --cluster-name my-cluster \
  --capacity-providers FARGATE FARGATE_SPOT \
  --default-capacity-provider-strategy \
    capacityProvider=FARGATE,weight=1 \
    capacityProvider=FARGATE_SPOT,weight=1

Register Task Definition

cat > task-definition.json << 'EOF'
{
  "family": "web-app",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "256",
  "memory": "512",
  "executionRoleArn": "arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
  "taskRoleArn": "arn:aws:iam::123456789012:role/ecsTaskRole",
  "containerDefinitions": [
    {
      "name": "web",
      "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:latest",
      "portMappings": [
        {
          "containerPort": 8080,
          "protocol": "tcp"
        }
      ],
      "environment": [
        {"name": "NODE_ENV", "value": "production"}
      ],
      "secrets": [
        {
          "name": "DB_PASSWORD",
          "valueFrom": "arn:aws:secretsmanager:us-east-1:123456789012:secret:db-password"
        }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/web-app",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "ecs"
        }
      },
      "healthCheck": {
        "command": ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"],
        "interval": 30,
        "timeout": 5,
        "retries": 3,
        "startPeriod": 60
      }
    }
  ]
}
EOF

aws ecs register-task-definition --cli-input-json file://task-definition.json

Create Service with Load Balancer

aws ecs create-service \
  --cluster my-cluster \
  --service-name web-service \
  --task-definition web-app:1 \
  --desired-count 2 \
  --launch-type FARGATE \
  --network-configuration "awsvpcConfiguration={
    subnets=[subnet-12345678,subnet-87654321],
    securityGroups=[sg-12345678],
    assignPublicIp=DISABLED
  }" \
  --load-balancers "targetGroupArn=arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/web-tg/1234567890123456,containerName=web,containerPort=8080" \
  --health-check-grace-period-seconds 60

Run Standalone Task

aws ecs run-task \
  --cluster my-cluster \
  --task-definition my-batch-job:1 \
  --launch-type FARGATE \
  --network-configuration "awsvpcConfiguration={
    subnets=[subnet-12345678],
    securityGroups=[sg-12345678],
    assignPublicIp=ENABLED
  }"

Update Service (Deploy New Image)

# Register new task definition with updated image
aws ecs register-task-definition --cli-input-json file://task-definition.json

# Update service to use new version
aws ecs update-service \
  --cluster my-cluster \
  --service web-service \
  --task-definition web-app:2 \
  --force-new-deployment

Auto Scaling

# Register scalable target
aws application-autoscaling register-scalable-target \
  --service-namespace ecs \
  --resource-id service/my-cluster/web-service \
  --scalable-dimension ecs:service:DesiredCount \
  --min-capacity 2 \
  --max-capacity 10

# Target tracking policy
aws application-autoscaling put-scaling-policy \
  --service-namespace ecs \
  --resource-id service/my-cluster/web-service \
  --scalable-dimension ecs:service:DesiredCount \
  --policy-name cpu-target-tracking \
  --policy-type TargetTrackingScaling \
  --target-tracking-scaling-policy-configuration '{
    "TargetValue": 70.0,
    "PredefinedMetricSpecification": {
      "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
    },
    "ScaleOutCooldown": 60,
    "ScaleInCooldown": 120
  }'

CLI Reference

Cluster Management

CommandDescription
aws ecs create-clusterCreate cluster
aws ecs describe-clustersGet cluster details
aws ecs list-clustersList clusters
aws ecs delete-clusterDelete cluster

Task Definitions

CommandDescription
aws ecs register-task-definitionCreate task definition
aws ecs describe-task-definitionGet task definition
aws ecs list-task-definitionsList task definitions
aws ecs deregister-task-definitionDeregister version

Services

CommandDescription
aws ecs create-serviceCreate service
aws ecs update-serviceUpdate service
aws ecs describe-servicesGet service details
aws ecs delete-serviceDelete service

Tasks

CommandDescription
aws ecs run-taskRun standalone task
aws ecs stop-taskStop running task
aws ecs describe-tasksGet task details
aws ecs list-tasksList tasks

Best Practices

Security

  • Use task roles for AWS API access (not access keys)
  • Use execution roles for ECR/Secrets access
  • Store secrets in Secrets Manager or Parameter Store
  • Use private subnets with NAT gateway
  • Enable CloudTrail for API auditing

Performance

  • Right-size CPU/memory — monitor and adjust
  • Use Fargate Spot for fault-tolerant workloads (70% savings)
  • Enable container insights for monitoring
  • Use service discovery for internal communication

Reliability

  • Deploy across multiple AZs
  • Configure health checks properly
  • Set appropriate deregistration delay
  • Use circuit breaker for deployments
aws ecs update-service \
  --cluster my-cluster \
  --service web-service \
  --deployment-configuration '{
    "deploymentCircuitBreaker": {
      "enable": true,
      "rollback": true
    }
  }'

Cost Optimization

  • Use Fargate Spot for batch workloads
  • Right-size task resources
  • Scale to zero when not needed
  • Use capacity providers for mixed Fargate/Spot

Troubleshooting

Task Fails to Start

Check:

# View stopped tasks
aws ecs describe-tasks \
  --cluster my-cluster \
  --tasks $(aws ecs list-tasks --cluster my-cluster --desired-status STOPPED --query 'taskArns[0]' --output text)

Common causes:

  • Image not found (ECR permissions)
  • Secrets access denied
  • Network configuration (subnets, security groups)
  • Resource limits exceeded

Container Keeps Restarting

Debug:

# Check CloudWatch logs
aws logs get-log-events \
  --log-group-name /ecs/web-app \
  --log-stream-name "ecs/web/abc123"

# Check task details
aws ecs describe-tasks \
  --cluster my-cluster \
  --tasks task-arn \
  --query 'tasks[0].containers[0].{reason:reason,exitCode:exitCode}'

Causes:

  • Health check failing
  • Application crashing
  • Out of memory

Service Stuck Deploying

# Check deployment status
aws ecs describe-services \
  --cluster my-cluster \
  --services web-service \
  --query 'services[0].deployments'

# Check events
aws ecs describe-services \
  --cluster my-cluster \
  --services web-service \
  --query 'services[0].events[:5]'

Causes:

  • Health check failing on new tasks
  • Not enough capacity
  • Target group health checks failing

Cannot Pull Image from ECR

Check execution role has:

{
  "Effect": "Allow",
  "Action": [
    "ecr:GetAuthorizationToken",
    "ecr:BatchCheckLayerAvailability",
    "ecr:GetDownloadUrlForLayer",
    "ecr:BatchGetImage"
  ],
  "Resource": "*"
}

Also check:

  • VPC endpoint for ECR (if private subnet)
  • NAT gateway (if private subnet)
  • Security group allows HTTPS outbound

References

Score

Total Score

80/100

Based on repository quality metrics

SKILL.md

SKILL.mdファイルが含まれている

+20
LICENSE

ライセンスが設定されている

+10
説明文

100文字以上の説明がある

0/10
人気

GitHub Stars 500以上

+10
最近の活動

1ヶ月以内に更新

+10
フォーク

10回以上フォークされている

+5
Issue管理

オープンIssueが50未満

+5
言語

プログラミング言語が設定されている

+5
タグ

1つ以上のタグが設定されている

+5

Reviews

💬

Reviews coming soon