Dealing with a Flood of Alerts from Datadog NTP Monitoring

Overview

When monitoring server time with Datadog, we ran into unnecessary alerts that fired because Datadog and the servers themselves were referencing different time sources.

By default, Datadog's NTP check references pool.ntp.org.

Chrony on our AWS EC2 instances, however, was configured to reference ntp.nict.jp, and one day this mismatch suddenly hit us with a flood of alerts.

As a countermeasure, we configured Datadog and Chrony to use a unified reference source.
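To make the reasoning concrete, here is a minimal sketch of why two reference sources can produce alerts even when the local clock is well synchronized. The function `check_status` and all numbers are hypothetical and deliberately exaggerated; this is not Datadog's implementation, only an illustration of comparing an offset against a threshold such as `offset_threshold`.

```python
def check_status(local_clock_s, reference_clock_s, offset_threshold_s=60.0):
    """Hypothetical offset check: compare the local clock with a reference clock."""
    offset = reference_clock_s - local_clock_s
    return "CRITICAL" if abs(offset) > offset_threshold_s else "OK"


# Illustrative values only, not measurements from the incident.
true_time = 1_000_000.0
chrony_source = true_time + 0.0    # source the local clock is disciplined to
other_source = true_time + 75.0    # a second source that disagrees by 75 s

local_clock = chrony_source        # the local clock follows Chrony's source

print(check_status(local_clock, chrony_source))  # same source      -> OK
print(check_status(local_clock, other_source))   # different source -> CRITICAL
```

When the check and the local clock follow the same source, any disagreement between sources drops out of the comparison.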

Unifying the Time Server Host

Because we were running on AWS, which provides its own NTP server, we decided to reference that.

The AWS Time Sync Service host is 169.254.169.123.

Because this is a link-local IP address, it can be reached even from a private subnet.
Depending on a fixed IP address is a bit nerve-wracking, since it would be painful if it suddenly changed one day, but so far that hasn't happened.
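A quick way to confirm that the endpoint answers from inside an instance is a bare SNTP query. The sketch below is an illustration under assumptions, not something from the original setup: the function names are hypothetical, only the Python standard library is used, and the offset is a rough estimate that ignores asymmetric network delay.

```python
import socket
import struct
import time

NTP_EPOCH_DELTA = 2208988800  # seconds between 1900-01-01 (NTP) and 1970-01-01 (Unix)


def build_request():
    """Build a 48-byte SNTP client request (LI=0, VN=3, Mode=3)."""
    return b"\x1b" + 47 * b"\0"


def parse_transmit_time(packet):
    """Return the server's transmit timestamp as Unix seconds."""
    if len(packet) < 48:
        raise ValueError("NTP packet too short")
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_EPOCH_DELTA + fraction / 2**32


def query_offset(host="169.254.169.123", timeout=2.0):
    """Roughly estimate (server time - local time) in seconds."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        t_send = time.time()
        sock.sendto(build_request(), (host, 123))
        packet, _ = sock.recvfrom(512)
        t_recv = time.time()
    # Compare the server timestamp with the midpoint of the round trip.
    return parse_transmit_time(packet) - (t_send + t_recv) / 2
```

Calling `query_offset()` on an EC2 instance should return a small number of seconds if the service is reachable; outside AWS the call is expected to time out, since the address is link-local.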

  • /etc/datadog-agent/conf.d/ntp.d/conf.yaml
init_config:

instances:
  - offset_threshold: 60
    host: 169.254.169.123  # added
  • /etc/chrony/chrony.conf
# server ntp.nict.jp minpoll 4 maxpoll 4  (commented out)
# added:
server 169.254.169.123 prefer iburst
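As a sanity check that only the intended source remains active in chrony.conf, something like the following sketch could be used. The helper `active_time_sources` is hypothetical; it assumes that chrony ignores lines whose first non-blank character is `!`, `;`, `#`, or `%`, and it only looks at `server`, `pool`, and `peer` directives.

```python
COMMENT_CHARS = "!;#%"  # assumed comment markers at the start of a chrony.conf line
SOURCE_DIRECTIVES = {"server", "pool", "peer"}


def active_time_sources(conf_text):
    """Return (directive, host) pairs for every uncommented time source line."""
    sources = []
    for line in conf_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped[0] in COMMENT_CHARS:
            continue  # skip blank lines and comment lines
        fields = stripped.split()
        if fields[0].lower() in SOURCE_DIRECTIVES and len(fields) >= 2:
            sources.append((fields[0].lower(), fields[1]))
    return sources


sample = """\
# server ntp.nict.jp minpoll 4 maxpoll 4
server 169.254.169.123 prefer iburst
"""
print(active_time_sources(sample))  # [('server', '169.254.169.123')]
```

In practice the text would be read from /etc/chrony/chrony.conf, and a result with exactly one entry pointing at 169.254.169.123 would match the configuration described here.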

After applying the settings above, restart the services.

$ sudo systemctl restart chrony
$ sudo systemctl restart datadog-agent

The steps above resolved the alerts.

kenzo0107
