A previously reported automated failure on health-check
has been resolved.
At 8:50 PM, we received a health-check
failure notice (and internal incident alarms) that our blocks and users APIs were experiencing higher than expected internal error rates and latencies.
This was due to the slow rollout of underlying GCP infrastructure undergoing routine maintenance upgrades. Our infrastructure self-healed at 9:00 PM once the slow rollout completed; subsequently, latencies returned to normal levels, and internal error rates dropped back to ~0.
For the ten minutes between 8:50 PM and 9:00 PM, we identified that roughly 0.43% of requests experienced higher than expected latencies (and potentially timeouts). Our clients and SDK exponentially retry these requests, and we have not recorded any errors after 9:00 PM.