Today, June 24th, 2019, our website and control panel were intermittently unavailable for approximately 1 hour and 45 minutes. Our hosting and CDN providers were affected by the underlying issue, which caused unexpected behavior and limited access from various parts of the world.
Our main website infrastructure runs on Amazon Web Services (AWS) and sits behind Cloudflare, which reduces the amount of data needed to load the control panel and speeds it up in regions where we do not operate a Point of Presence (PoP).
Unfortunately, this issue arose from a BGP (Border Gateway Protocol, which powers the backbone of the Internet) route leak that improperly routed traffic to other destinations or upstreams. According to various other network administrators, the leak appears to have been caused by a Verizon customer using the Noction BGP optimization platform improperly. BGP is, unfortunately, an unauthenticated protocol by default, though work is underway to improve this in order to contain BGP leaks or eliminate them entirely.
Noction’s platform allows its customers to route smaller, more specific IP prefixes over preferred routes (shorter or lower-latency ones, for example). To provide an example: 126.96.36.199 - 188.8.131.52 is 184.108.40.206/21; if the platform detected that 220.127.116.11 - 18.104.22.168 (22.214.171.124/24) had lower latency over a specific route, it would tell the customer’s equipment to route outbound traffic over the new preferred route (in most cases, internally only). In this case, the routes were not kept internal, and what appears to have been a misconfiguration of the Noction platform caused an outage on a much larger, global scale.
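To illustrate why a leaked smaller prefix pulls traffic away from its normal path, here is a minimal Python sketch of longest-prefix matching, the rule routers use to choose between overlapping routes. The prefixes (`10.20.0.0/21`, `10.20.4.0/24`), the next-hop labels, and the `best_route` helper are hypothetical and chosen only for illustration; they are not the addresses involved in the incident, and this is not how Noction or any router is actually implemented.

```python
import ipaddress


def best_route(routes, destination):
    """Return the (prefix, next_hop) pair whose prefix is the longest match."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in routes if addr in net]
    if not matches:
        return None
    # The most specific (longest) matching prefix wins.
    return max(matches, key=lambda item: item[0].prefixlen)


# Hypothetical covering prefix announced by its legitimate origin.
covering = ipaddress.ip_network("10.20.0.0/21")
# Hypothetical more-specific prefix that was meant to stay internal.
more_specific = ipaddress.ip_network("10.20.4.0/24")

routes = [(covering, "legitimate-origin")]
before = best_route(routes, "10.20.4.25")

# Once the more-specific route leaks, it is preferred for addresses inside it.
routes.append((more_specific, "leaked-path"))
after = best_route(routes, "10.20.4.25")
unaffected = best_route(routes, "10.20.1.25")

print(before[1], after[1], unaffected[1])
```

Only destinations inside the leaked /24 change path; the rest of the /21 keeps following the original route, which is why a leak of many small prefixes can redirect large amounts of traffic at once.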
Fortunately, our Minecraft hosting platform in all of our US locations would not have been affected by this. However, it’s definitely possible that our Canada, Europe, Singapore, or Australia locations were affected. In the US, we make sure to announce the most specific prefix allowed (/24, or 256 addresses) from our own IP address space, both for this reason and for portability between datacenters based on the needs of each datacenter.
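As a rough sketch of why announcing a /24 helps, the Python snippet below counts the addresses in a /24 and lists which smaller prefixes could still be announced underneath a given prefix. It assumes the common convention that networks filter out IPv4 announcements longer than /24; that limit, the documentation prefix `198.51.100.0/24`, the private `10.20.0.0/21`, and the helper name are assumptions for illustration, not our actual configuration.

```python
import ipaddress

# Assumption: most networks reject IPv4 announcements longer than /24.
MAX_ACCEPTED_PREFIXLEN = 24


def accepted_more_specifics(network, max_len=MAX_ACCEPTED_PREFIXLEN):
    """List strictly more-specific subnets that would pass the length filter."""
    result = []
    for new_len in range(network.prefixlen + 1, max_len + 1):
        result.extend(network.subnets(new_prefix=new_len))
    return result


announced_24 = ipaddress.ip_network("198.51.100.0/24")  # documentation prefix
announced_21 = ipaddress.ip_network("10.20.0.0/21")     # hypothetical larger block

print(announced_24.num_addresses)                  # 256
print(len(accepted_more_specifics(announced_24)))  # 0
print(len(accepted_more_specifics(announced_21)))  # 14
```

Under that assumption, nothing more specific than a /24 is widely accepted, so a leaked smaller prefix has no room to override it, whereas a /21 leaves 14 possible more-specific prefixes.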
Unfortunately, it is up to the responsible party to resolve a matter like this. In this case, Cloudflare attempted to work with the appropriate parties and, from the looks of it, resolved the matter successfully. The incident lasted up to 1 hour and 45 minutes and was out of our control. Because BGP is the backbone of the Internet, it affected Internet giants such as Cloudflare and AWS, as well as common day-to-day services like Discord.