A tale of two outages: How prepared are you for the next disruption?

August 20 2024, by Phillip Wallace | Category: Cloud Services

The dust has settled from last month’s CrowdStrike incident, and it has already been labeled the most significant IT outage in history. As the post-incident reviews and analysis continue, at least two things seem undeniable at this stage:

Businesses, governments, and essential services and infrastructure around the world can indeed be brought to a standstill by a single error in a single part of the technology ecosystem; and
It doesn’t take a determined cyber adversary to cause maximum disruption and chaos.

As it happens, resolving this particular issue was a relatively straightforward (albeit time-consuming) fix for most IT teams. And for now, the immediate focus will be on recovery for most. However, for many of us, it’s also going to serve as a timely reminder to revisit business continuity management and disaster recovery plans.

How ready are you in case something similar (or worse) happens again?

Not an isolated incident.

Of course, the CrowdStrike incident is not the only disruptive event of the past few months. In fact, it’s the second major recent event that can be attributed to an accident, rather than malicious actors.

Picture this: you’re the CEO of a multi-million dollar financial institution. It’s 7.30am, the start of a seemingly normal work day. You’ve just arrived at the office and fired up your laptop, when you realise there’s something wrong with your internal systems and website.

Then the phone rings. It’s your head of IT with some out-of-the-blue news: overnight, your frontline cloud provider has deleted both your primary account and your backup account. Just like that, the data of more than 700,000 of your members around Australia has vanished – without reason and without warning.

For company leaders, this scenario is the stuff of nightmares – and yet, it’s a true story. You may have already recognised it as the 2024 UniSuper incident. As the funds manager was thrown head-first into an existential crisis (through no fault of their own), onlookers were left to shake their heads in disbelief. How could this happen? What could UniSuper have done differently?

Here’s the thing: far from being a cautionary tale, the UniSuper story is one of an organisation that did everything right in the circumstances. As time passed, it became clear that UniSuper had been rigorous in its business continuity planning and disaster recovery. The management team had thought ahead and had core principles in place, as well as a well-tested, documented plan.

In the end, this preparation was the difference between a couple of weeks without access to member data – a moderate but manageable inconvenience – and a catastrophic, months-long outage that could have been game over and lights out for UniSuper.

Key takeaways from UniSuper.

The UniSuper incident has been analysed heavily elsewhere, so I won’t pull it apart in detail here. There are three key points that stand out to me:

Multi-cloud saved the day. While they had a primary cloud provider in place, their data backups and services were spread across multiple clouds. Different data sets were backed up separately. While there was some immediate crisis management work to communicate with members who were concerned about their savings, other parts of the organisation were able to continue operating – including critical trading and portfolio management functions.
Never assume when it comes to cloud security. It’s tempting to assume that data in public clouds is secure by nature; however, UniSuper has underlined the danger in this thinking. When you read the fine print, most public cloud providers promote a shared responsibility matrix, under which customers are required to protect their own data.
Cyber security is about more than bad actors. Without labouring the point, this has been a timely reminder that while bad actors and cyber criminals are a significant concern (the most recent ASD Cyber Threat Report recorded nearly 94,000 cybercrime reports, up 23% on the previous period), they are not the only concern.

Humans make mistakes and systems can (and do) fail without notice. Both these scenarios can cause just as much disruption as any cyber criminal. The key message? Be prepared for anything (and everything).

Evaluating the true cost of downtime.

For company executives and other leaders, now is the time to consider a fundamental question: what is the true cost to my business likely to be if we’re down for 33 (or more) days? (Statista lists 33 days as the average time to complete a forensic investigation of an attack.) You’re no doubt already considering the possible impacts from a financial, operational and reputational perspective.

Meanwhile, the hidden costs of downtime are harder to quantify, yet also worth considering. For example, a negative impact on employee morale and turnover is likely to be one immediate outcome. Stagnation of innovation and continuous improvement is another risk, particularly if critical resources are diverted to address downtime issues.

To ensure your business continuity plan is cyber-ready, and determine the true costs to the business, we encourage you to ask your IT team about your current Recovery Point Objective (RPO – the maximum amount of data loss you can tolerate without significant impact on operations) and Recovery Time Objective (RTO – the maximum acceptable amount of time that a business process or system can be down without significant impact).

RPO and RTO are the two critical metrics that will help you plan and prepare for potential disruption. Essentially, this information will help you establish how much data you would lose in the case of an outage, how much time it would take to restore it, and ultimately what it would cost the business.
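To make the relationship between these metrics and cost more concrete, here is a minimal Python sketch of a first-pass estimate. Everything in it is an assumption for illustration: the RecoveryProfile fields, the dollar figures, and the simple linear cost model are hypothetical and are not drawn from UniSuper, CrowdStrike, or any real system. It treats the backup interval as the worst-case data loss (a failure just before the next backup) and the restore time measured in a drill as the expected downtime.

```python
from dataclasses import dataclass


@dataclass
class RecoveryProfile:
    """Hypothetical inputs for one business system (illustrative only)."""
    backup_interval_hours: float   # time between successful backups
    restore_duration_hours: float  # restore time measured in a recovery drill
    revenue_per_hour: float        # estimated revenue lost per hour of downtime
    rework_cost_per_hour: float    # estimated cost to re-create one hour of lost data


def worst_case_data_loss_hours(profile: RecoveryProfile) -> float:
    # Worst case: the failure happens just before the next backup runs,
    # so a full backup interval of data is lost.
    return profile.backup_interval_hours


def meets_objectives(profile: RecoveryProfile, rpo_hours: float, rto_hours: float):
    """Return (meets_rpo, meets_rto) for the stated objectives."""
    meets_rpo = worst_case_data_loss_hours(profile) <= rpo_hours
    meets_rto = profile.restore_duration_hours <= rto_hours
    return meets_rpo, meets_rto


def estimated_outage_cost(profile: RecoveryProfile) -> float:
    # Simplified linear model: downtime cost plus cost of redoing lost work.
    downtime_cost = profile.restore_duration_hours * profile.revenue_per_hour
    rework_cost = worst_case_data_loss_hours(profile) * profile.rework_cost_per_hour
    return downtime_cost + rework_cost


if __name__ == "__main__":
    # Illustrative figures only: nightly backups, an 8-hour restore.
    profile = RecoveryProfile(
        backup_interval_hours=24.0,
        restore_duration_hours=8.0,
        revenue_per_hour=10_000.0,
        rework_cost_per_hour=1_500.0,
    )
    print(meets_objectives(profile, rpo_hours=4.0, rto_hours=12.0))
    print(estimated_outage_cost(profile))
```

In this made-up example the nightly backup would miss a 4-hour RPO even though the 8-hour restore sits inside a 12-hour RTO. A real assessment would also need to account for the reputational and hidden costs discussed above, which a linear model like this does not capture.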

A defined and documented RTO and RPO – well-tested with regular drills and training – will help you be more resilient, with quicker recovery and less impact on operations if disaster strikes. Without these objectives, your next 7.30am phone call from IT may not be a pleasant one.

Be prepared for anything with Macquarie Cloud Services.

In today’s landscape, the question of operational disruption is not so much “if” but “when”. With nearly 94,000 cybercrime reports made to the Australian Signals Directorate in 2022-2023, it always pays to be prepared.

At Macquarie Cloud Services, our end-to-end business continuity planning services provide you with the right advice, support and infrastructure to ensure your operations are resilient in the event of an emergency. We’ll ensure you’re back up and running with minimal downtime, even if the unthinkable occurs.

Reach out to us today at 1800 004 943 or drop us an email at enquiries@macquariecloudservices.com to explore how we can help you.
