AWS ECS Troubleshooting

While working with ECS, I ran into a number of pitfalls, so I’ve summarized them here.

“has started 1 tasks” repeats, but the container never starts

$ ecs-cli compose service up ...

level=info msg="(service hogehoge) has started 1 tasks ..."
level=info msg="(service hogehoge) has started 1 tasks ..."
level=info msg="(service hogehoge) has started 1 tasks ..."

In this state, ecs-cli compose service up keeps trying to launch the task during deployment, but the launch never succeeds.
A common cause is an error in the process that runs when the container starts: the container exits right away, and the service starts another task.

  • Check the container logs around the time the container failed to start.
  • For example, there may be a typo or syntax error in the Nginx configuration file or in the Rails code.
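
Besides the container logs, the task's stop reason often points at the failing container. A minimal sketch, assuming you saved the output of aws ecs describe-tasks --cluster CLUSTER --tasks TASK_ID: summarize_stopped_tasks is a hypothetical helper that reads only the stoppedReason, containers, name, exitCode, and reason fields of that response, and the sample data is made up.

```python
import json
from typing import List


def summarize_stopped_tasks(describe_tasks_output: str) -> List[str]:
    """Build one summary line per container from a saved
    'aws ecs describe-tasks' JSON response (hypothetical helper)."""
    data = json.loads(describe_tasks_output)
    lines = []
    for task in data.get("tasks", []):
        task_reason = task.get("stoppedReason", "(no stoppedReason)")
        for container in task.get("containers", []):
            lines.append(
                f"{container.get('name')}: "
                f"exitCode={container.get('exitCode')} "
                f"reason={container.get('reason', '')!r} "
                f"task={task_reason!r}"
            )
    return lines


if __name__ == "__main__":
    # Made-up sample shaped like a describe-tasks response.
    sample = json.dumps({
        "tasks": [{
            "stoppedReason": "Essential container in task exited",
            "containers": [{"name": "nginx", "exitCode": 1}],
        }]
    })
    for line in summarize_stopped_tasks(sample):
        print(line)
```

A non-zero exitCode on the web container usually means the startup process itself failed, which is the moment to look at that container's logs.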

already using a port required by your task

service hogehoge was unable to place a task because no container instance met all of its requirements.
The closest matching container-instance a1b2c3d4-e5f6-g7h8-j9k0-l1m2n3o4p5q6 is already using a port required by your task

The port mapping had been configured as follows.

"portMappings": [
  {
    "hostPort": 0,
    "protocol": "tcp",
    "containerPort": 80
  }
],

In my case, the new task tried to use the same 0:80 port mapping as the running task, and this resulted in the error.
Reducing the mapping to containerPort only, as follows, avoided the problem.

"portMappings": [
  {
    "containerPort": 80
  }
],
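
To confirm a clash like this, you can compare the host ports a task pins with the ports the instance already has reserved. A sketch, assuming the instance dict is one entry of containerInstances from aws ecs describe-container-instances and that its remainingResources entry named PORTS lists the host ports already reserved; reserved_host_ports and find_port_conflicts are hypothetical helpers, and the sample instance is made up.

```python
from typing import Iterable, Set


def reserved_host_ports(container_instance: dict) -> Set[int]:
    """Ports listed under remainingResources -> PORTS, assumed here to be
    the host ports already reserved on the container instance."""
    for resource in container_instance.get("remainingResources", []):
        if resource.get("name") == "PORTS":
            return {int(p) for p in resource.get("stringSetValue", [])}
    return set()


def find_port_conflicts(required_host_ports: Iterable[int],
                        container_instance: dict) -> Set[int]:
    """Return the requested host ports that are already taken."""
    return set(required_host_ports) & reserved_host_ports(container_instance)


if __name__ == "__main__":
    # Made-up instance shaped like one entry of 'containerInstances'.
    instance = {
        "remainingResources": [
            {"name": "PORTS", "type": "STRINGSET",
             "stringSetValue": ["22", "80", "51678"]},
        ]
    }
    print(find_port_conflicts([80, 443], instance))
```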

insufficient memory available

INFO[0031] (service hogehoge) was unable to place a task because no container instance met all of its requirements. The closest matching (container-instance a1b2c3d4-e5f6-g7h8-j9k0-l1m2n3o4p5q6) has insufficient memory available. For more information, see the Troubleshooting section of the Amazon ECS Developer Guide.  timestamp=2018-03-09 15:45:24 +0000 UTC

If a memory shortage like the one above appears during a task update (ecs-cli compose service up),
the container instance needs more free memory: upgrade to a larger instance type, or stop other tasks running on it.
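
The check behind this message can be approximated by comparing the memory the task reserves with the free memory ECS reports for the instance. A simplified sketch, assuming the instance dict is one entry of containerInstances from aws ecs describe-container-instances, whose remainingResources entry named MEMORY holds free memory in MiB; it ignores CPU, ports, and placement constraints, and remaining_resource and memory_shortfall_mib are hypothetical helpers.

```python
def remaining_resource(container_instance: dict, name: str) -> int:
    """integerValue of a remainingResources entry such as CPU or MEMORY."""
    for resource in container_instance.get("remainingResources", []):
        if resource.get("name") == name:
            return int(resource.get("integerValue", 0))
    return 0


def memory_shortfall_mib(task_memory_mib: int, container_instance: dict) -> int:
    """How many MiB are missing for the task to fit (0 means it fits).
    Simplified: ignores CPU, ports, and placement constraints."""
    free_mib = remaining_resource(container_instance, "MEMORY")
    return max(0, task_memory_mib - free_mib)


if __name__ == "__main__":
    # Made-up instance with 300 MiB of memory left.
    instance = {"remainingResources": [
        {"name": "MEMORY", "type": "INTEGER", "integerValue": 300},
    ]}
    print(memory_shortfall_mib(512, instance))
```

A positive result is roughly the amount you would need to free by stopping tasks, or to gain by moving to a larger instance type.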

no space on device

The image cannot be pulled because there is no space left on the device.

Check disk usage with the df -hT command.

Free up space by removing stopped containers and unused images, networks, and volumes, without a confirmation prompt.

docker system prune -af --volumes
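
To check space from a script instead of reading df -hT by eye, Python's shutil.disk_usage reports the same filesystem totals. A sketch, assuming Docker uses its default data directory /var/lib/docker (it falls back to / if that path does not exist); the 90% threshold and disk_usage_percent are illustrative choices, not part of Docker.

```python
import os
import shutil

DOCKER_DATA_ROOT = "/var/lib/docker"  # Docker's default data-root on Linux


def disk_usage_percent(path: str) -> float:
    """Used space of the filesystem holding 'path', as a percentage of
    used + available space, roughly like the Use% column of df."""
    usage = shutil.disk_usage(path)
    return usage.used / (usage.used + usage.free) * 100


if __name__ == "__main__":
    path = DOCKER_DATA_ROOT if os.path.exists(DOCKER_DATA_ROOT) else "/"
    percent = disk_usage_percent(path)
    print(f"{path}: {percent:.1f}% used")
    if percent >= 90:  # arbitrary threshold for this sketch
        print("Low on space; consider: docker system prune -af --volumes")
```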

msg="Couldn't run containers" reason="RESOURCE:CPU"

msg="Couldn't run containers" reason="RESOURCE:CPU"

The container instance does not have enough free CPU for the cpu value the task requests.
Add CPU capacity, for example by upgrading to a larger instance type, or free some up by stopping other tasks.
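
In ECS, cpu is measured in CPU units, where 1024 units equal one vCPU, and the task-level cpu can also be written as a string such as "1 vCPU". A sketch of the comparison this error implies, assuming remaining_cpu_units is the integerValue of the CPU entry in the instance's remainingResources; to_cpu_units and cpu_shortfall_units are hypothetical helpers.

```python
from typing import Union


def to_cpu_units(value: Union[int, str]) -> int:
    """Convert a cpu value to ECS CPU units (1 vCPU = 1024 units).
    Accepts units (512 or "512") or a vCPU string ("1 vCPU")."""
    text = str(value).strip().lower()
    if text.endswith("vcpu"):
        return int(float(text[: -len("vcpu")]) * 1024)
    return int(text)


def cpu_shortfall_units(task_cpu: Union[int, str], remaining_cpu_units: int) -> int:
    """CPU units missing for the task to fit (0 means it fits)."""
    return max(0, to_cpu_units(task_cpu) - remaining_cpu_units)


if __name__ == "__main__":
    # remaining_cpu_units would come from remainingResources -> CPU.
    print(cpu_shortfall_units("1 vCPU", 512))
```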

Fargate - Port Mapping Error

level=error msg="Create task definition failed" error="ClientException: When networkMode=awsvpc, the host ports and container ports in port mappings must match.\n\tstatus code: 400, request id: a1b2c3d4-e5f6-g7h8-j9k0-l1m2n3o4p5q6"

With the Fargate launch type, a configuration like the following fails.

ports:
  - "80"

This one works.

ports:
  - "80:80"

Both the host port and the container port must be specified, and they must be the same.
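
The rule in the error message can be checked before deploying. A sketch over compose-style port strings; parse_compose_port and awsvpc_port_errors are hypothetical helpers, and treating a bare "80" as host port 0 is an assumption made only to model the mismatch the error reports, not documented ecs-cli behavior.

```python
from typing import List, Tuple


def parse_compose_port(spec: str) -> Tuple[int, int]:
    """Split a compose port string into (host_port, container_port).
    'H:C' and 'IP:H:C' give both ports explicitly. A bare 'C' has no
    host port; this sketch represents that as 0 (an assumption)."""
    spec = spec.split("/")[0]  # drop a '/tcp' or '/udp' suffix
    parts = spec.split(":")
    if len(parts) == 1:
        return 0, int(parts[0])
    return int(parts[-2]), int(parts[-1])


def awsvpc_port_errors(specs: List[str]) -> List[str]:
    """List the port entries that break the awsvpc rule that host and
    container ports must match."""
    errors = []
    for spec in specs:
        host, container = parse_compose_port(spec)
        if host != container:
            errors.append(f"{spec}: host port {host} != container port {container}")
    return errors


if __name__ == "__main__":
    print(awsvpc_port_errors(["80"]))
    print(awsvpc_port_errors(["80:80"]))
```

Port ranges such as "8000-8010:8000-8010" are not handled; the sketch only covers single ports.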

Fargate - volumes_from cannot be used

volumes_from cannot be used with Fargate. Registering the task definition fails with the following error.

level=error msg="Create task definition failed" error="ClientException: host.sourcePath should not be set for volumes in Fargate.\n\tstatus code: 400, request id: a1b2c3d4-e5f6-g7h8-j9k0-l1m2n3o4p5q6"
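
Before moving a task to Fargate, you can scan its task definition for volumes that set host.sourcePath, the field rejected in the error above. A sketch, assuming the dict is the taskDefinition object from aws ecs describe-task-definition; host_path_volumes is a hypothetical helper and the sample data is made up.

```python
from typing import Any, Dict, List


def host_path_volumes(task_definition: Dict[str, Any]) -> List[str]:
    """Names of volumes that set host.sourcePath (rejected on Fargate)."""
    return [
        volume.get("name", "?")
        for volume in task_definition.get("volumes", [])
        if (volume.get("host") or {}).get("sourcePath")
    ]


if __name__ == "__main__":
    # Made-up task definition fragment.
    task_definition = {
        "volumes": [
            {"name": "data", "host": {"sourcePath": "/data"}},
            {"name": "scratch"},
        ]
    }
    print(host_path_volumes(task_definition))
```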

The specified IAM Role has not been granted the proper permissions

Grant the appropriate permissions to the IAM Role.

level=info msg="(service hogehoge) failed to launch a task with (error ECS was unable to assume the role 'arn:aws:iam::123456789012:role/ecsTaskExecutionRole' that was provided for this task. Please verify that the role being passed has the proper trust relationship and permissions and that your IAM user has permissions to pass this role.)." timestamp=2018-06-21 08:15:43 +0000 UTC
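
The message points at the role's trust relationship. For a task execution role, the trust policy normally allows the ecs-tasks.amazonaws.com service principal to call sts:AssumeRole. A simplified sketch that checks for such a statement, assuming the document comes from the Role.AssumeRolePolicyDocument field of aws iam get-role output; allows_ecs_tasks_to_assume is a hypothetical helper that ignores Deny statements, conditions, and wildcards.

```python
from typing import Any, Dict, List


def _as_list(value: Any) -> List[Any]:
    """IAM accepts either a single string or a list in many fields."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def allows_ecs_tasks_to_assume(trust_policy: Dict[str, Any]) -> bool:
    """True if an Allow statement lets ecs-tasks.amazonaws.com call
    sts:AssumeRole. Simplified: ignores Deny, Condition, and wildcards."""
    for statement in _as_list(trust_policy.get("Statement")):
        principal = statement.get("Principal")
        if statement.get("Effect") != "Allow" or not isinstance(principal, dict):
            continue
        actions = _as_list(statement.get("Action"))
        services = _as_list(principal.get("Service"))
        if "sts:AssumeRole" in actions and "ecs-tasks.amazonaws.com" in services:
            return True
    return False


if __name__ == "__main__":
    # The usual trust policy for a task execution role.
    trust = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "ecs-tasks.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    }
    print(allows_ecs_tasks_to_assume(trust))
```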

In my experience, an error saying the image cannot be pulled is also usually caused by missing permissions.

CannotPullContainerError: API error (500): Get https://123456789012.dkr.ecr.ap-northeast-1.amazonaws.com/v2/: net/http: request canceled while waiting for connection

For reference, these are the permissions of the IAM role used by my currently running ECS setup. Required permissions can change, so treat this only as a reference and check the latest information before applying it.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Action": [
        "logs:PutLogEvents",
        "logs:CreateLogStream",
        "logs:CreateLogGroup",
        "elasticloadbalancing:RegisterTargets",
        "elasticloadbalancing:Describe*",
        "elasticloadbalancing:DeregisterTargets",
        "ecs:UpdateService",
        "ecs:Submit*",
        "ecs:StartTelemetrySession",
        "ecs:StartTask",
        "ecs:RunTask",
        "ecs:RegisterTaskDefinition",
        "ecs:RegisterContainerInstance",
        "ecs:Poll",
        "ecs:ListTasks",
        "ecs:DiscoverPollEndpoint",
        "ecs:DescribeTasks",
        "ecs:DescribeServices",
        "ecs:DescribeContainerInstances",
        "ecs:DeregisterContainerInstance",
        "ecs:CreateService",
        "ecr:UploadLayerPart",
        "ecr:PutImage",
        "ecr:InitiateLayerUpload",
        "ecr:GetDownloadUrlForLayer",
        "ecr:GetAuthorizationToken",
        "ecr:CompleteLayerUpload",
        "ecr:BatchGetImage",
        "ecr:BatchCheckLayerAvailability",
        "ec2:Describe*"
      ],
      "Resource": "*"
    }
  ]
}
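
Since the pull error above usually comes down to missing permissions, one quick check is whether a role's policy covers the actions in AWS's managed AmazonECSTaskExecutionRolePolicy, which lets a task pull images from ECR and write to CloudWatch Logs. A simplified sketch; missing_actions and PULL_AND_LOG_ACTIONS are hypothetical names, and the policy passed in could be the JSON above.

```python
import fnmatch
from typing import Any, Dict, List

# The actions granted by the AWS managed policy
# AmazonECSTaskExecutionRolePolicy: ECR image pulls and CloudWatch Logs writes.
PULL_AND_LOG_ACTIONS = [
    "ecr:GetAuthorizationToken",
    "ecr:BatchCheckLayerAvailability",
    "ecr:GetDownloadUrlForLayer",
    "ecr:BatchGetImage",
    "logs:CreateLogStream",
    "logs:PutLogEvents",
]


def _as_list(value: Any) -> List[Any]:
    """IAM accepts either a single string or a list in many fields."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def missing_actions(policy: Dict[str, Any], required: List[str]) -> List[str]:
    """Required actions not matched by any Allow statement's Action patterns.
    Simplified: ignores Deny statements, NotAction, Resource, and Condition."""
    patterns = [
        pattern.lower()
        for statement in _as_list(policy.get("Statement"))
        if statement.get("Effect") == "Allow"
        for pattern in _as_list(statement.get("Action"))
    ]
    # IAM action names are case-insensitive and support '*' and '?'
    # wildcards, which fnmatchcase also understands.
    return [
        action for action in required
        if not any(fnmatch.fnmatchcase(action.lower(), p) for p in patterns)
    ]


if __name__ == "__main__":
    # Made-up policy that only allows ECR actions.
    policy = {"Statement": [{"Effect": "Allow", "Action": "ecr:*", "Resource": "*"}]}
    print(missing_actions(policy, PULL_AND_LOG_ACTIONS))
```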

That’s all.

I’ll keep adding to this whenever something new comes up.

kenzo0107
