Service performance issues
Incident Report for Uscreen
Postmortem

This is a detailed postmortem of the service outage that occurred on Saturday, June 15th, from 16:11 UTC to 20:11 UTC.

Summary of the Issue:

The outage was caused by issues with our Load Balancer. While not every store was impacted, those with older DNS configurations experienced interruptions.

Background:

Last year, we upgraded our Load Balancer Services to utilize Google Cloud Services, offering enhanced reliability and security. This upgrade ensures faster and more reliable service for our customers’ catalogs. However, stores still using our older DNS configuration were affected during this outage.

Steps Taken:

  1. Immediate Response: Our team quickly identified the Load Balancer as the root cause and worked to restore service.
  2. Resolution: Service was fully restored by 20:11 UTC.
  3. Preventive Measures: We added another layer of WAF at the edge using Fastly, in addition to the Google Cloud Armor security layer.

Next Steps for Affected Stores:

Starting this week, to prevent future issues, stores using the older DNS configuration will see a new banner on their dashboard. The banner provides steps to confirm domain ownership and transfer DNS configurations to our new servers. This process takes several hours but will not result in downtime for your members.

For those who do not manage their DNS settings, please coordinate with the team member responsible for this task.
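
For whoever manages your DNS, a quick check of where a domain currently resolves can help confirm whether it still uses the older configuration. The following is only a minimal sketch under stated assumptions: the addresses in `LEGACY_ADDRESSES` are documentation-range placeholders, not our real servers, and the function names are hypothetical. The dashboard banner remains the authoritative source for the actual values and steps.

```python
import ipaddress
import socket

# ASSUMPTION: placeholder addresses from the TEST-NET-1 documentation range.
# They are NOT real server addresses; take the real values from the dashboard banner.
LEGACY_ADDRESSES = {"192.0.2.10", "192.0.2.11"}


def _normalize(address):
    """Return the canonical string form of an IPv4 or IPv6 address."""
    return str(ipaddress.ip_address(address))


def resolve_addresses(hostname):
    """Return the set of IP addresses that hostname currently resolves to."""
    infos = socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {info[4][0] for info in infos}


def uses_legacy_dns(resolved, legacy=LEGACY_ADDRESSES):
    """Return True if any resolved address belongs to the legacy set."""
    resolved_set = {_normalize(address) for address in resolved}
    legacy_set = {_normalize(address) for address in legacy}
    return bool(resolved_set & legacy_set)
```

For example, `uses_legacy_dns(resolve_addresses("videos.example.com"))` would report whether the hypothetical domain `videos.example.com` still points at one of the placeholder legacy addresses. DNS caches can return stale answers for a while after a change, so a result taken shortly after updating records may not reflect the new configuration yet.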

Attached is an example of the banner you will see.

We apologize for the inconvenience caused and appreciate your understanding and cooperation in making these necessary updates.

Posted Jun 17, 2024 - 14:53 UTC

Resolved
This incident has been resolved.
Posted Jun 15, 2024 - 20:11 UTC
Update
A fix has been applied and we are currently monitoring. You might receive a message in the Dashboard asking you to update your domain's DNS settings; please follow the step-by-step instructions there.
Posted Jun 15, 2024 - 19:51 UTC
Update
Users with outdated DNS settings are still getting the error. If your domain uses the old DNS configuration, please update your domain's DNS settings.
Posted Jun 15, 2024 - 18:57 UTC
Update
Service responses have been stable for the past 32 minutes. We will keep monitoring.
Posted Jun 15, 2024 - 18:42 UTC
Update
We are back online. We are continuing to monitor.
Posted Jun 15, 2024 - 18:15 UTC
Update
We are still working on the issue.
Posted Jun 15, 2024 - 17:50 UTC
Update
We saw no issues in the past 25 minutes. We will continue monitoring.
Posted Jun 15, 2024 - 17:28 UTC
Update
We are continuing to monitor for any further issues.
Posted Jun 15, 2024 - 17:09 UTC
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Jun 15, 2024 - 17:09 UTC
Investigating
We are currently investigating an issue with performance & availability of our services. We'll post updates as soon as they are available. Thank you for your patience.
Posted Jun 15, 2024 - 17:01 UTC
This incident affected: API (API V1, API V2), Admin Portal, and Storefront.