Inaccurate performance measurement
-
Investigating
As a side effect of the monitoring infrastructure migration, reported performance statistics are inaccurate, mostly in the Europe and America regions. We are actively working on a fix.
Some monitoring checks are not executed
-
Resolved
The infrastructure migration was successful, and uptime monitoring is no longer affected.
-
Monitoring
The new infrastructure is working as expected. A progressive migration is now in place, with a 100% target set for November 18 at 00:00 UTC.
-
Monitoring
Phare's edge monitoring infrastructure has been replicated on a new provider, Bunny.net. Ten percent of monitoring requests will now be routed to this new infrastructure to make sure the quality of the monitoring stays consistent between platforms.
-
Identified
The issue has been traced to an update in Cloudflare routing rules between workers. The update was released as a rolling deployment that affected only a few monitor checks starting November 11 at 19:00 UTC and reached 100% of checks on November 12 around 17:00 UTC. For the moment, all requests are routed to Germany to keep the system working at best capacity until a workaround can be deployed to allow accurate multi-region checks.
-
Investigating
A workaround has been implemented to re-route all requests through the Germany region until other regions are working properly again.
-
Investigating
Uptime monitoring is affected in every region. We are currently investigating the issue.
Documentation is down
-
Resolved
Phare's documentation is back online after some unexpected downtime from our hosting provider.
Monitoring service interruption
-
Resolved
The service in charge of scheduling uptime monitoring experienced a complete outage for 3 hours and 11 minutes. During this period, no uptime monitoring could be performed, affecting all users relying on our service between 16:34 UTC and 19:45 UTC. The issue was resolved quickly once the root cause was identified. A misconfigured step in our deployment process did not correctly restart the service in charge of scheduling uptime monitoring, and our current observability stack did not catch this particular case as an issue. As this incident highlighted areas for improvement, an upgrade of our internal observability stack has been scheduled so that we are alerted quickly if a similar problem arises in the future. Thank you for your continued trust in our uptime monitoring service.
Infrastructure maintenance
-
Resolved
An infrastructure migration between hosting providers resulted in 6 minutes of uptime monitoring downtime and about 11 minutes without dashboard access. We apologize for any inconvenience and thank you for your patience and support.