Backup System Outage
Incident Report for CubedHost, LLC
Resolved
We've confirmed that the system is operational and all data is intact. We're working to consolidate the data to the original system until our new backup solution is in place. This may temporarily slow restoration times.

If you wish to obtain a copy of files from on or before April 6th, please reach out to us as soon as possible. In the interest of resuming our 30-day backup window, the recovered backups will be available for up to 14 days; any backups from March 15th or older may be pruned in the next 72 hours.

At this time, we are considering this incident resolved. Once we've completed some upcoming maintenance, we will schedule a window for the migration of our backups to the new solution.
Posted Apr 30, 2021 - 22:14 UTC
Update
We've received word that the system is online and operational. We will confirm this during business hours on Friday (today) and evaluate the actions necessary moving forward. Any pending backup restorations or download requests for backups from before April 6th will be handled after our evaluation later today.
Posted Apr 30, 2021 - 06:59 UTC
Update
Unfortunately, there are still no updates to this incident. We've reached out over social media today to try to prompt some sort of response, so that we can understand the scope of what's happening with the system.

Regarding the new backup system - backups have now been limited to 10 days due to the amount of data that our customer servers require. We're looking at rolling out yet another new solution in the coming weeks to replace both of these systems, opting for a decentralized backup solution. This solution is on our backlog until Prisma v1.24.0 is released, which is likely to be this coming week. Until then, we may opt for twice-daily backups kept for a longer period rather than thrice-daily backups kept for 10 days. We will post more information on this status page once we have it.
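As a rough illustration of the trade-off between backup frequency and retention, the sketch below counts restore points under each schedule. It assumes that storage use scales with the number of backups kept, which is a simplification; the function names are hypothetical and do not reflect how Prisma or the backup system is implemented.

```python
def retained_backups(per_day: int, retention_days: int) -> int:
    """Number of restore points kept under a given schedule."""
    return per_day * retention_days


def retention_days_for_budget(per_day: int, backup_budget: int) -> int:
    """Whole days of retention that fit in a fixed number of kept backups."""
    return backup_budget // per_day


# Current interim schedule: thrice daily, kept for 10 days.
current_budget = retained_backups(per_day=3, retention_days=10)

# Same number of kept backups, taken twice daily instead (assumed equal size).
longer_window = retention_days_for_budget(per_day=2, backup_budget=current_budget)

print(f"Thrice daily for 10 days keeps {current_budget} backups")
print(f"Twice daily with the same budget covers {longer_window} days")
```

Under that assumption, dropping to twice-daily backups would extend the window without keeping more backups in total.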

We apologize for the inconvenience caused by both the unavailability of the old backup system and the shorter retention of the backups currently being kept. Thank you for your patience while we continue to work to resolve the matter.
Posted Apr 24, 2021 - 17:14 UTC
Update
We've received word that the system is online, but the server is not reachable via ping. We've escalated the issue with the datacenter to resolve the matter. If all goes well, backups on this system should be reachable within the next 24 hours.
Posted Apr 14, 2021 - 06:07 UTC
Update
We're still awaiting additional information from the datacenter. Unfortunately, the system has yet to come back online.

Restorations may be slow for the next few hours - we are running optimization tools on the system to improve our ability to store a larger number of backups.
Posted Apr 08, 2021 - 20:34 UTC
Update
We've received word that power has been restored and the datacenter is now working on restoring all of their networks. We've asked for an update and for confirmation of whether the system was affected by water damage. Once we know more, we'll update you here.
Posted Apr 06, 2021 - 23:58 UTC
Update
Several reports are coming in that the datacenter is back online and individual servers are coming up slowly, one by one. At present, the backup system has not come back online. As soon as it does, we will copy over backups taken in the past 48 hours to the old system to restore all access to backups.
Posted Apr 06, 2021 - 13:14 UTC
Update
We've received new information from the datacenter:
- An estimated 90-95% of their servers are entirely unharmed.
- Inspection has taken place and they currently have electricians on-site to resolve issues discovered during the inspection.
- Current ETR is 3:00PM MDT on April 6th, 2021.

It's unclear whether the backup system is among the small percentage of servers that were damaged. Because some components may be damaged, the estimated restoration time may not be accurate.
Posted Apr 05, 2021 - 22:10 UTC
Update
We are now restoring automatic backups. These will occur at a reduced rate of 3 backups per day and will be retained for up to 15 days (down from 30) until we're able to restore service to the previous backup system or deploy the next iteration of our backup solution. The new interim system has a reduced capacity of 20TB, compared to our previous 36TB.
Posted Apr 05, 2021 - 19:48 UTC
Monitoring
Backups are completing successfully; it will take several hours for the initial backup process to complete in full. We will resume automatic backups April 5th, 2021 between 12PM - 4PM CST.
Posted Apr 05, 2021 - 05:56 UTC
Update
We are now beginning a test of the backup system; if all goes well, this will be the first new backup available via Prisma.
Until the previous system is back online and checked for data consistency, any backups prior to April 5th, 2021 at ~12AM UTC should be considered potentially unusable.
Posted Apr 05, 2021 - 05:45 UTC
Identified
We have received confirmation that the power outage was due to a fire at the datacenter during load testing of a UPS system. This led to power being cut to the core network and triggered the datacenter's fire suppression. The local fire department chose to cut all power as a precaution. The facility is now awaiting an emergency inspection to receive the all-clear to resume services as normal.

From what we're being told, it's likely that there has not been any data loss and the system should be recoverable. If the system needs any components replaced due to water damage, it's likely it will take several days to recover the data on the existing system.

Regarding the replacement interim system, we are confirming that all security access has been rolled out to all of our nodes before beginning a new backup run. We'll update here once we have any more information. ETA for new backups starting: < 1h.
Posted Apr 05, 2021 - 05:34 UTC
Update
We've identified third-party claims of potential power loss and potential damage to unspecified equipment; it is unclear whether this refers to the electrical/UPS systems or to something else. We are actively deploying a new system to replace the existing system until further notice.
Because we retain backups in an off-site facility that is entirely separate from all production services for service continuity reasons, this incident does not impact any services; only the ability to restore backups is affected. The new deployment will also be in an off-site facility. Currently, no ETA is available for the completion of the deployment.

Until the existing system becomes available, no backups from before today will be available for restoration. We apologize for the inconvenience this may cause and thank you for your patience while we work to resolve this matter.
Posted Apr 05, 2021 - 02:43 UTC
Investigating
We are looking into an issue affecting our off-site backup solution. Until this incident is resolved, backups will not be available for restoration from within Prisma.
Posted Apr 05, 2021 - 00:30 UTC