Resolved -
The issue has recovered following our last mitigations. We will make some of these changes permanent.
Sep 12, 11:46 CEST
Update -
We are continuing to monitor for any further issues.
Sep 12, 08:03 CEST
Monitoring -
A fix has been implemented and we are monitoring the results.
Sep 12, 07:52 CEST
Update -
We are continuing to work on a fix for this issue.
Sep 12, 07:52 CEST
Update -
We are continuing to work on the issue. As part of our mitigation, we are starting to enforce our API rate limits on specific endpoints.
We do not have contact details for the affected API user.
Sep 12, 07:51 CEST
Identified -
Due to a bug in the Salt version we are using that causes excessive memory usage, we cannot currently run our configuration management on machines that are already swapping. Manually restarting processes did not work either, because load piles onto the first machines that become healthy again. We are working on a workaround.
Sep 12, 06:37 CEST
Monitoring -
We have intervened and are monitoring the results.
Sep 12, 05:56 CEST
Identified -
We have seen a significant increase in error rates and latency since ~3:15 UTC. We are working on a solution.
Sep 12, 05:55 CEST