High latency and error rate for RIPEstat

Incident Report for RIPE NCC

Resolved

The service recovered after our latest mitigations. We will make some of these changes permanent.
Posted Sep 12, 2025 - 11:46 CEST

Update

We are continuing to monitor for any further issues.
Posted Sep 12, 2025 - 08:03 CEST

Monitoring

A fix has been implemented and we are monitoring the results.
Posted Sep 12, 2025 - 07:52 CEST

Update

We are continuing to work on a fix for this issue.
Posted Sep 12, 2025 - 07:52 CEST

Update

We are continuing to work on the issue. As part of our mitigation, we are starting to enforce our API rate limits on specific endpoints.

We do not have contact details for the affected API user.
Posted Sep 12, 2025 - 07:51 CEST
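
For illustration only, per-endpoint rate limiting of this kind can be sketched with a token bucket. This is a minimal sketch and not the RIPEstat implementation: the class names, the endpoint path "/heavy", and the limit values are all hypothetical.

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity` and refills at `rate` tokens per second."""

    def __init__(self, rate, capacity, now):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start with a full bucket
        self.updated = now

    def allow(self, now):
        # Refill according to the time elapsed since the last request.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class EndpointRateLimiter:
    """Enforces limits only on the listed endpoints (hypothetical names)."""

    def __init__(self, limits):
        self.limits = limits  # endpoint -> (rate per second, burst capacity)
        self.buckets = {}     # (client, endpoint) -> TokenBucket

    def allow(self, client, endpoint, now=None):
        if endpoint not in self.limits:
            return True  # endpoints without a configured limit are not throttled
        now = time.monotonic() if now is None else now
        key = (client, endpoint)
        bucket = self.buckets.get(key)
        if bucket is None:
            rate, capacity = self.limits[endpoint]
            bucket = self.buckets[key] = TokenBucket(rate, capacity, now)
        return bucket.allow(now)


# Assumed example limit: 1 request per second with a burst of 2 on one endpoint.
limiter = EndpointRateLimiter({"/heavy": (1.0, 2.0)})
```

Keeping one bucket per client and endpoint means a single heavy user is throttled without affecting other users or other endpoints.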

Identified

Due to a bug in the Salt version we are using, which causes excessive memory usage, we cannot currently run our configuration management on machines that are already swapping. Manually restarting processes did not work because the load concentrates on the first machines that become healthy again. We are working on a workaround.
Posted Sep 12, 2025 - 06:37 CEST

Monitoring

We have intervened and are monitoring the results.
Posted Sep 12, 2025 - 05:56 CEST

Identified

We have seen a significant increase in error rate and latency since ~03:15 UTC. We are working on a solution.
Posted Sep 12, 2025 - 05:55 CEST
This incident affected: Non-Critical Services (RIPEstat).