Dopt - blocks and users APIs experiencing higher-than-expected latencies and error rates – Incident details

Resolved
Partial outage
Started 5 months ago
Lasted 4 minutes

Affected

APIs

Partial outage from 6:57 PM to 7:01 PM, Operational from 7:01 PM to 7:01 PM, Partial outage from 7:17 PM to 7:21 PM

blocks.dopt.com

Operational from 6:57 PM to 7:01 PM, Partial outage from 7:17 PM to 7:21 PM

users.dopt.com

Partial outage from 6:57 PM to 7:01 PM, Operational from 7:01 PM to 7:01 PM, Partial outage from 7:17 PM to 7:21 PM

Updates
  • Resolved

    At 10:59 AM, we redeployed the services that had deadlocked while waiting for other services to come up. This resolved all issues with higher-than-expected latencies and errors.

    At the peak (~10:58 AM), fewer than 3% of requests had to be retried. All systems returned to normal after the redeploy.

    We're actively working on mitigating these deadlocks and improving coordination between services on k8s.

  • Investigating

    We're investigating an incident that was automatically triggered by a health-check failure.

    Starting at 10:57 AM, we received reports of higher-than-expected p99 latency and 500s on both the blocks and users APIs.
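
The Resolved update describes services that deadlocked while waiting for other services to come up. One common way to avoid that kind of indefinite wait is to bound it with a deadline and exit on timeout, so the orchestrator can restart the process. The sketch below is illustrative only and is not Dopt's actual mitigation; `wait_for_dependency`, `is_ready`, and the timeout values are hypothetical names and defaults chosen for this example.

```python
import time
from typing import Callable


def wait_for_dependency(
    is_ready: Callable[[], bool],
    timeout_s: float = 30.0,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll is_ready until it returns True or timeout_s elapses.

    Returns True if the dependency became ready, False on timeout.
    sleep and clock are injectable so the loop can be tested
    without real delays.
    """
    deadline = clock() + timeout_s
    while True:
        if is_ready():
            return True
        if clock() >= deadline:
            # Give up instead of waiting forever; the caller decides
            # what to do next (for example, exit non-zero).
            return False
        sleep(interval_s)
```

A service could call this at startup with a readiness probe for each upstream dependency and exit with a non-zero status when it returns False, rather than blocking indefinitely.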