We have important news about your account (AWS Account ID: xxxxxxxxxxxx). EC2 has detected degradation of the underlying hardware hosting your Amazon EC2 instance (instance-ID: i-xxxxxxxx) in the ap-northeast-1 region. Due to this degradation, your instance could already be unreachable. After 2017-04-25 04:00 UTC your instance, which has an EBS volume as the root device, will be stopped.
You can see more information on your instances that are scheduled for retirement in the AWS Management Console (https://console.aws.amazon.com/ec2/v2/home?region=ap-northeast-1#Events)
* How does this affect you? Your instance's root device is an EBS volume and the instance will be stopped after the specified retirement date. You can start it again at any time. Note that if you have EC2 instance store volumes attached to the instance, any data on these volumes will be lost when the instance is stopped or terminated as these volumes are physically attached to the host computer
* What do you need to do? You may still be able to access the instance. We recommend that you replace the instance by creating an AMI of your instance and launch a new instance from the AMI. For more information please see Amazon Machine Images (http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AMIs.html) in the EC2 User Guide. In case of difficulties stopping your EBS-backed instance, please see the Instance FAQ (http://aws.amazon.com/instance-help/#ebs-stuck-stopping).
* Why retirement? AWS may schedule instances for retirement in cases where there is an unrecoverable issue with the underlying hardware. For more information about scheduled retirement events please see the EC2 user guide (http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-retirement.html). To avoid single points of failure within critical applications, please refer to our architecture center for more information on implementing fault-tolerant architectures: http://aws.amazon.com/architecture
If you have any questions or concerns, you can contact the AWS Support Team on the community forums and via AWS Premium Support at: http://aws.amazon.com/support
Sincerely, Amazon Web Services
The Events page in the AWS Console lists the affected instances.
The instance details view in the AWS Console also shows a notice.
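To do
What to do depends on the volume type.
EBS volume
Stop the instance, then start it again (a reboot is not enough, because only a stop and start moves the instance to different host hardware)
Instance store volume
Recreate the instance from an AMI and migrate the data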
This post covers the EBS volume case.
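To decide which of the two procedures listed above (EBS: stop and start; instance store: recreate from an AMI) applies to an instance, its root device type can be read from the EC2 API. The following is a minimal sketch, assuming awscli is installed and configured; the region default comes from the notice, and the function names are hypothetical. It only inspects the root device: as the AWS notice says, data on any attached instance store volumes is lost on stop even for EBS-backed instances.

```shell
#!/usr/bin/env bash
# Sketch: report which retirement procedure applies to each instance ID.
# Assumptions: awscli is installed and configured; REGION defaults to
# ap-northeast-1 (from the notice). Function names are hypothetical.
set -euo pipefail

REGION="${REGION:-ap-northeast-1}"

# Prints "ebs" or "instance-store" for one instance.
root_device_type() {
  local id="$1"
  aws ec2 describe-instances \
    --region "$REGION" \
    --instance-ids "$id" \
    --query 'Reservations[0].Instances[0].RootDeviceType' \
    --output text
}

# Maps the root device type to the procedure from the to-do list.
choose_procedure() {
  local id="$1"
  case "$(root_device_type "$id")" in
    ebs)            echo "${id}: stop, then start (EBS-backed)" ;;
    instance-store) echo "${id}: recreate from an AMI and migrate data" ;;
    *)              echo "${id}: unknown root device type" >&2; return 1 ;;
  esac
}

# Usage: ./check-root-device.sh i-xxxxxxxx [i-yyyyyyyy ...]
if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  for instance_id in "$@"; do
    choose_procedure "$instance_id"
  done
fi
```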
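What I did
Because many instances were affected, I wrote a shell script on my local PC (macOS) that uses awscli to stop and then start the instances. Some of them were production instances, so I decided to process them one at a time.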
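The script itself is not included in this post. A minimal sketch of the same idea, assuming awscli is installed and configured and the region is ap-northeast-1, could look like this: it asks for confirmation before each instance and uses the awscli waiters so the next step only starts once the instance has actually stopped or is running again. The script and function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: stop and then start EBS-backed EC2 instances one at a time,
# asking for confirmation before each one.
# Assumptions: awscli is installed and configured; REGION defaults to
# ap-northeast-1 (from the notice). Names are hypothetical.
set -euo pipefail

REGION="${REGION:-ap-northeast-1}"

restart_instance() {
  local id="$1"

  echo "Stopping ${id}..."
  aws ec2 stop-instances --region "$REGION" --instance-ids "$id" > /dev/null
  # Waiter: polls until the instance reaches the "stopped" state.
  aws ec2 wait instance-stopped --region "$REGION" --instance-ids "$id"

  echo "Starting ${id}..."
  aws ec2 start-instances --region "$REGION" --instance-ids "$id" > /dev/null
  # Waiter: polls until the instance reaches the "running" state.
  aws ec2 wait instance-running --region "$REGION" --instance-ids "$id"

  echo "${id} is running again."
}

restart_one_by_one() {
  local id answer
  for id in "$@"; do
    # The prompt goes to stderr and is shown only when reading from a terminal.
    read -r -p "Stop and start ${id}? [y/N] " answer || answer=""
    if [[ "$answer" == "y" ]]; then
      restart_instance "$id"
    else
      echo "Skipped ${id}."
    fi
  done
}

# Usage: ./restart-instances.sh i-xxxxxxxx [i-yyyyyyyy ...]
if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  restart_one_by_one "$@"
fi
```

Because of `set -e`, the script aborts on the first failed awscli call instead of moving on to the next instance, which is the safer default when production instances are in the list.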
Enter file in which to save the key (/Users/kenzo_tanaka/.ssh/id_rsa): /Users/kenzo_tanaka/.ssh/terraform-test
Enter passphrase (empty for no passphrase): (leave empty and press Enter)
Enter same passphrase again: (leave empty and press Enter)
...
...
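The session above answers ssh-keygen's prompts interactively. A non-interactive equivalent can be sketched as follows; the key type and length (RSA, 4096 bits) are assumptions because the original command line is not shown, and generate_terraform_key is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: non-interactive equivalent of the ssh-keygen session above.
# Assumptions: RSA with 4096 bits (the original flags are not shown);
# the helper name is hypothetical.
set -euo pipefail

generate_terraform_key() {
  local key_path="$1"
  # -f sets the output file, -N "" sets an empty passphrase (the same as
  # pressing Enter twice), -q suppresses the progress output.
  ssh-keygen -t rsa -b 4096 -f "$key_path" -N "" -q
  # The private key is written to $key_path and the public key
  # to ${key_path}.pub.
}
```

For example, `generate_terraform_key ~/.ssh/terraform-test` produces ~/.ssh/terraform-test.pub, the kind of file a variable such as public_key_path would point to. If the target file already exists, ssh-keygen asks before overwriting it.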
variable "key_name" {
  description = "Desired name of AWS key pair"
}

variable "public_key_path" {
  description = <<DESCRIPTION
Path to the SSH public key to be used for authentication.
Ensure this keypair is added to your local SSH agent so provisioners can
connect.
DESCRIPTION
}
[Instance A ]# pcs cluster auth ip-10-0-0-20.ap-northeast-1.compute.internal ip-10-0-1-20.ap-northeast-1.compute.internal -u hacluster -p ruby2015
Error: Unable to communicate with ip-10-0-0-20.ap-northeast-1.compute.internal
Error: Unable to communicate with ip-10-0-1-20.ap-northeast-1.compute.internal
Shutting down pacemaker/corosync services...
Redirecting to /bin/systemctl stop pacemaker.service
Redirecting to /bin/systemctl stop corosync.service
Killing any remaining services...
Removing all cluster configuration files...
ip-10-0-0-20.ap-northeast-1.compute.internal: Succeeded
ip-10-0-1-20.ap-northeast-1.compute.internal: Succeeded
Synchronizing pcsd certificates on nodes ip-10-0-0-20.ap-northeast-1.compute.internal, ip-10-0-1-20.ap-northeast-1.compute.internal...
ip-10-0-0-20.ap-northeast-1.compute.internal: Success
ip-10-0-1-20.ap-northeast-1.compute.internal: Success
Restaring pcsd on the nodes in order to reload the certificates...
ip-10-0-0-20.ap-northeast-1.compute.internal: Success
ip-10-0-1-20.ap-northeast-1.compute.internal: Success
[Instance A & B ]# rpm -iUvh http://dl.fedoraproject.org/pub/epel/7/x86_64/e/epel-release-7-5.noarch.rpm
[Instance A & B ]# yum -y install python-pip
[Instance A & B ]# pip --version
pip 7.1.0 from /usr/lib/python2.7/site-packages (python 2.7)
[Instance A & B ]# pip install awscli
[Instance A & B ]# aws configure
AWS Access Key ID [None]: *********************
AWS Secret Access Key [None]: **************************************
Default region name [None]: ap-northeast-1
Default output format [None]: json
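Creating the EIP reassignment resource
Register it as a cluster resource that is started when a problem is detected (the agent is installed as an OCF resource agent under the heartbeat provider directory).
The script refers to OCF_ROOT as a constant, but that variable is not defined in this environment, so the reference is replaced with the literal path /usr/lib/ocf.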
[Instance A & B ]# cd /tmp
[Instance A & B ]# git clone https://github.com/moomindani/aws-eip-resource-agent.git
[Instance A & B ]# cd aws-eip-resource-agent
[Instance A & B ]# sed -i 's/\${OCF_ROOT}/\/usr\/lib\/ocf/' eip
[Instance A & B ]# mv eip /usr/lib/ocf/resource.d/heartbeat/
[Instance A & B ]# chown root:root /usr/lib/ocf/resource.d/heartbeat/eip
[Instance A & B ]# chmod 0755 /usr/lib/ocf/resource.d/heartbeat/eip
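Before registering the resource, it can help to confirm on both instances that the commands above had the intended effect. The sed expression without the g flag replaces only the first ${OCF_ROOT} on each line, so a check for leftovers is cheap insurance. This is a minimal sketch; check_eip_agent is a hypothetical name and its default path mirrors the mv target above.

```shell
#!/usr/bin/env bash
# Sketch: verify the installed eip agent is executable and has no
# ${OCF_ROOT} reference left after the sed replacement.
# The function name is hypothetical; the default path matches the mv above.
set -euo pipefail

check_eip_agent() {
  local agent="${1:-/usr/lib/ocf/resource.d/heartbeat/eip}"
  if [[ ! -x "$agent" ]]; then
    echo "NG: ${agent} is missing or not executable"
    return 1
  fi
  # -F matches the text literally, so ${OCF_ROOT} is not treated as a regex.
  if grep -qF '${OCF_ROOT}' "$agent"; then
    echo "NG: ${agent} still refers to \${OCF_ROOT}"
    return 1
  fi
  echo "OK: ${agent}"
}
```

Running check_eip_agent with no argument checks /usr/lib/ocf/resource.d/heartbeat/eip and returns a non-zero status if either condition fails.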
Pacemaker configuration
Disable STONITH
[Instance A ]# pcs property set stonith-enabled=false
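Set no-quorum-policy so that the cluster takes no special action when quorum is lost, for example after a split brain. In a two-node cluster, a single surviving node cannot hold a majority of votes, so with the default policy (stop) it would stop all of its resources.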
[Instance A ]# pcs property set no-quorum-policy=ignore
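Both properties are set only on Instance A because cluster properties are stored in the CIB, which Pacemaker replicates to the other node. A minimal sketch that applies the same two settings in one go is shown below; the array and function names are hypothetical, and it assumes pcs is installed and the cluster is already running.

```shell
#!/usr/bin/env bash
# Sketch: apply the two cluster properties above from a single node.
# Assumptions: pcs is installed and the cluster is running; the names
# CLUSTER_PROPERTIES and apply_cluster_properties are hypothetical.
set -euo pipefail

# stonith-enabled=false: fencing is disabled, as above.
# no-quorum-policy=ignore: keep managing resources without quorum.
CLUSTER_PROPERTIES=(
  "stonith-enabled=false"
  "no-quorum-policy=ignore"
)

apply_cluster_properties() {
  local prop
  for prop in "${CLUSTER_PROPERTIES[@]}"; do
    echo "Setting ${prop}"
    pcs property set "$prop"
  done
}
```

Run apply_cluster_properties on one node only; the values then appear on both nodes once the CIB has been synchronized.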