Website / Misc. Network Connectivity Intermittent Behavior
Incident Report for CubedHost, LLC
Postmortem

The Problem

Today, June 24th, 2019, our website and control panel were intermittently unavailable for approximately 1 hour and 45 minutes. Because our hosting and CDN providers were affected by this issue, it caused unexpected behavior and limited access from various parts of the world.

Our main website infrastructure runs on Amazon Web Services (AWS) and sits behind Cloudflare, which reduces the amount of data needed to load the control panel and speeds it up in regions where we do not operate a Point of Presence (PoP).

The Cause

Unfortunately, this issue was caused by a BGP (Border Gateway Protocol, the protocol that powers the backbones of the Internet) route leak, which sent traffic to the wrong destinations or upstreams. According to various other network administrators, the leak appears to have been caused by a customer of Verizon's using the Noction BGP optimization platform improperly. BGP is, unfortunately, an unauthenticated protocol by default, and work is under way to improve this so that BGP leaks can be contained or eliminated entirely.
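The report does not say which improvements are under way. One widely deployed example is RPKI route origin validation (ROV, RFC 6811), in which routers compare each announcement against signed records (ROAs) stating which AS may originate a prefix and up to what prefix length. The following is a minimal Python sketch of that classification under stated assumptions, not any router's implementation: the AS number is a hypothetical documentation ASN and the ROA for 209.182.104.0/21 is invented for illustration. ROV checks only the origin AS and prefix length, so it would flag a leaked more-specific route only when that route is longer than the ROA's maximum length.

```python
import ipaddress
from dataclasses import dataclass
from enum import Enum
from typing import Sequence


class Validity(Enum):
    """The three route-origin validation states defined in RFC 6811."""
    VALID = "valid"
    INVALID = "invalid"
    NOT_FOUND = "not-found"


@dataclass(frozen=True)
class Roa:
    """A validated ROA payload: covered prefix, maximum length, authorized origin AS."""
    prefix: ipaddress.IPv4Network
    max_length: int
    origin_asn: int


def validate_origin(route: ipaddress.IPv4Network, origin_asn: int,
                    roas: Sequence[Roa]) -> Validity:
    """Classify one announcement (prefix + origin AS) against a set of ROAs."""
    covered = False
    for roa in roas:
        if route.subnet_of(roa.prefix):
            covered = True  # at least one ROA speaks for this address space
            if roa.origin_asn == origin_asn and route.prefixlen <= roa.max_length:
                return Validity.VALID
    # Covered but never matched -> Invalid; no covering ROA at all -> NotFound.
    return Validity.INVALID if covered else Validity.NOT_FOUND


# Hypothetical ROA: AS64496 (a documentation ASN) may originate the /21, nothing longer.
roas = [Roa(ipaddress.ip_network("209.182.104.0/21"), 21, 64496)]

print(validate_origin(ipaddress.ip_network("209.182.104.0/21"), 64496, roas).value)  # valid
print(validate_origin(ipaddress.ip_network("209.182.104.0/24"), 64496, roas).value)  # invalid
print(validate_origin(ipaddress.ip_network("198.51.100.0/24"), 64496, roas).value)   # not-found
```

A route covered by some ROA but matching none is Invalid rather than NotFound; that distinction is what lets a strict maximum length reject unexpected more-specific announcements.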

Noction's platform allows its customers to route smaller, more specific IP prefixes over preferred routes (for example, shorter or lower-latency paths). To provide an example: the range 209.182.104.0 - 209.182.111.255 is 209.182.104.0/21. If the platform detected that 209.182.104.0 - 209.182.104.255 (209.182.104.0/24) had lower latency over a specific route, it would tell the customer's equipment to send outbound traffic for that /24 over the new preferred route, in most cases internally only. Because routers always prefer the most specific matching prefix, such a /24 takes precedence over the /21 wherever it is seen. In this case, the more-specific routes were not kept internal; they propagated beyond the customer's network and caused an outage on a much larger, global scale due to what appears to have been a misconfiguration of the Noction platform.
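The prefix arithmetic and the "most specific route wins" rule above can be checked with Python's standard `ipaddress` module. This is a minimal sketch of longest-prefix matching, assuming a hypothetical two-entry routing table (the legitimate /21 plus a leaked /24); real routers use specialized lookup structures, not a linear scan.

```python
import ipaddress

# The aggregate from the example: 209.182.104.0 - 209.182.111.255.
aggregate = ipaddress.ip_network("209.182.104.0/21")
print(aggregate.network_address, aggregate.broadcast_address, aggregate.num_addresses)
# 209.182.104.0 209.182.111.255 2048
print(len(list(aggregate.subnets(new_prefix=24))))  # 8 (the /21 holds eight /24s)

# Hypothetical routing table: the legitimate /21 plus a leaked, more-specific /24.
routes = {
    ipaddress.ip_network("209.182.104.0/21"): "legitimate path",
    ipaddress.ip_network("209.182.104.0/24"): "leaked path",
}


def lookup(destination: str) -> str:
    """Return the path of the longest (most specific) prefix containing destination."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in routes if addr in net]
    if not matches:
        raise LookupError(f"no route to {destination}")
    best = max(matches, key=lambda net: net.prefixlen)
    return routes[best]


print(lookup("209.182.104.10"))  # leaked path (the /24 wins over the /21)
print(lookup("209.182.110.10"))  # legitimate path (only the /21 matches)
```

Only destinations inside the leaked /24 are diverted; the rest of the /21 still follows the legitimate route, which is why this kind of leak tends to cause partial, intermittent outages.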

Fortunately, our Minecraft hosting platform at all of our US locations would not have been affected by this; however, it's definitely possible that our Canada, Europe, Singapore, or Australia locations were affected. In the US, we announce the most specific prefix generally allowed (a /24, or 256 IPs) from our own IP address space. Because most networks do not accept IPv4 routes more specific than a /24, a leaked route cannot be more specific than our own announcement. We do this both for that reason and for portability, so that address space can be moved between datacenters based on the needs of each datacenter.
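The protection described above follows from prefix lengths alone. The sketch below checks whether a leaked route could be strictly more specific than a given announcement and still pass a prefix-length filter; the /24 cutoff is an assumption reflecting common practice rather than any particular provider's policy, and the prefixes are the example ones from this report. It only rules out more-specific leaks: an equal-length leaked /24 could still compete through ordinary BGP path selection, which this sketch does not model.

```python
import ipaddress

# Assumption: upstream networks drop IPv4 routes longer than /24, a common
# filtering convention; this is not taken from any specific provider's policy.
MAX_ACCEPTED_PREFIXLEN = 24


def leak_can_be_more_specific(announced: str,
                              max_len: int = MAX_ACCEPTED_PREFIXLEN) -> bool:
    """True if some strictly more specific route than `announced` would still
    pass the assumed prefix-length filter (and so could win longest-prefix match)."""
    return ipaddress.ip_network(announced).prefixlen < max_len


for prefix in ("209.182.104.0/21", "209.182.104.0/24"):
    net = ipaddress.ip_network(prefix)
    print(prefix, net.num_addresses, leak_can_be_more_specific(prefix))
# 209.182.104.0/21 2048 True
# 209.182.104.0/24 256 False
```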

The Solution

Unfortunately, resolving an issue like this is up to the responsible party. Cloudflare worked with the appropriate parties and, from the looks of it, resolved the matter successfully. The incident lasted up to 1 hour and 45 minutes and was outside of our control. Because BGP is the backbone of the Internet, the leak affected Internet giants Cloudflare and AWS as well as everyday services like Discord.

Posted Jun 24, 2019 - 19:15 UTC

Resolved
This incident has been resolved.
Posted Jun 24, 2019 - 15:39 UTC
Update
Cloudflare has posted an update stating that this issue is still ongoing and they're working with the network operator that is causing the BGP leak.
Posted Jun 24, 2019 - 12:39 UTC
Investigating
We've been alerted that our website Points of Presence (PoPs) are unavailable. Upon further investigation, this appears to be due to a BGP route leak affecting Cloudflare and Amazon Web Services (AWS).

For more information, please watch the status pages for the appropriate services:
Amazon AWS: https://status.aws.amazon.com/
Cloudflare: https://www.cloudflarestatus.com/

Unfortunately, at this time, there's nothing we can do on our end to improve the behavior and resolve the matter.
Posted Jun 24, 2019 - 11:58 UTC
This incident affected: Main Website (cubedhost.com, Legacy Billing Area, Help Center) and Prisma Control Panel (POP - Ashburn, VA, POP - Dublin, Ireland, POP - Sydney, Australia).