What We Call Security: Recovery & Regulation (5/7)

In this instalment I’d like to look at recovery from a different perspective: How it relates to corporate liability and other impacts of present and upcoming regulations around Operational Resilience and even Privacy like GDPR.

We know from GDPR that companies often fear the associated fines more than they fear the potential breach itself. It’s bad enough to have a breach and experience weeks of chaos to recover (or hopefully just hours with a top-notch recovery capability); it’s a whole other thing when you’re later publicly investigated and eventually fined a noticeable percentage of your earnings.

Regulation around Operational Resilience, with the potential for eventual penalties, is now starting to appear in multiple countries. This means you may now be penalised for incidents that caused business downtime, in addition to Privacy fines from GDPR, CCPA, etc.

In a nutshell, you’ll experience downtime (losses), have to shoulder recovery costs (more losses), and then get hit by a fine for good measure (you guessed it, even more losses). That stings.

To my knowledge, the US, Europe, UK, Australia, and multiple countries in Asia, Africa, and the Middle East are currently working on regulations around Operational Resilience. Most are, for now, focused on critical industries such as financial services and manufacturing.

Fortunately, recovery capabilities can help reduce or prevent the three areas of business and financial losses mentioned above.

Heck, it can help potentially prevent the breach from happening in the first place by empowering the Security as Quality principles mentioned in previous instalments. But in this instalment, I wanted to provide a generalised summary around the salient points of most Operational Resilience regulation, what it means to us, and how fast and effective recovery plays a role:

1. Who is responsible for my organisation’s resilience?

Most regulators are looking at resilience from the standpoint of the organisation and its services. It is not looked at from an IT-specific lens.

It’s therefore no surprise that most see Operational Resilience as the ultimate responsibility of the COO or equivalent.

For example, the UK’s FCA (Financial Conduct Authority) gives responsibility to what they refer to as “SMF (Senior Management Function) 24”, which maps to the “Chief Operations Function”. This is defined as follows:

The chief operations function is the function of having overall responsibility for managing all or substantially all the internal operations or technology of the firm or of a part of the firm.

There are however several exemptions to accountability if designated individuals have had part of the above scope delegated to them.

In most scenarios this can make the CIO accountable for the IT-relevant portions of resilience targets based on how the roles are defined to the regulator. In some cases, the CISO may also bear some responsibility, again depending on how the roles are reported to the regulator.

2. How do regulators look at Operational Resilience?

Most regulators I’ve seen use a model where acceptable thresholds are defined. These limits can vary based on industry vertical or even be self-defined. The idea is that organisations should define the point at which an outage causes unreasonable disruption to the parties dependent on their services, as well as to themselves (because that would inevitably impact the consumers of their services).

For example, it may be tolerable for a banking customer to be unable to access funds for a few hours, but a few days would be considered excessively disruptive. Some real-world examples of excessive disruption could include the relatively recent US FAA outage, the UK Royal Mail outage, and several UK and US bank outages (some of which resulted in significant fines). Considering the impact these incidents had, it’s no wonder regulators are keen to prevent incidents of that scale from happening.

Organisations must be prepared to meet the defined criteria for their various business activities. These must be reasonable self-defined values that in some cases need to be registered with the regulator. In some cases, the regulators themselves may define and impose these thresholds.
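To make the threshold idea concrete, here is a minimal sketch of how a tolerance could be recorded per business service and compared against an actual outage. The class name, function name, and the four-hour figure are all hypothetical, chosen for illustration only; real thresholds come from your own analysis or from your regulator.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class BusinessService:
    """A customer-facing service with a self-defined tolerance (hypothetical model)."""
    name: str
    impact_tolerance: timedelta  # longest disruption considered acceptable


def within_tolerance(service: BusinessService, outage: timedelta) -> bool:
    """Return True if the outage stayed within the service's tolerance."""
    return outage <= service.impact_tolerance


# Illustrative value only, not taken from any regulation.
payments = BusinessService("access to funds", impact_tolerance=timedelta(hours=4))

print(within_tolerance(payments, timedelta(hours=3)))  # True: a few hours is tolerable
print(within_tolerance(payments, timedelta(days=2)))   # False: a few days is excessive
```

The point of the sketch is that the comparison is made per business service rather than per IT system, which matches how the regulators frame the question.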

3. How does Organisational Resilience differ from (and complement) IT resilience?

When it comes to Operational Resilience, regulators are looking at organisational resilience and not just IT resilience.

While IT resilience typically supports many business processes, it’s important to consider that disruption may come from non-IT failures, that the recovery of IT services may not fully restore the business process by itself, and that some business processes may not involve any IT at all.

Operational Resilience regulations’ primary concern is not your IT, it is your ability to deliver services.

Conversely, switching operations to pen and paper can be perfectly acceptable from the regulator’s standpoint if service levels are reasonable.

All this means that your approach to resilience, in the context of these customer-focused regulations, cannot be siloed. It must consider the full business process and all departments involved in the delivery of the service to the customer.

4. How does Cyber Insurance relate to Operational Resilience?

Cyber Insurance is a common component of an organisation's resilience strategy. It helps an organisation mitigate the financial impact of an incident by providing some coverage for expenses such as business interruption, legal fees, and data recovery costs. Its other important value proposition is that it typically gives access to expertise brought in by the insurer after an incident that can help accelerate recovery to some degree.

However, this assistance cannot be used to establish defined recovery times before an incident and can therefore not be relied on for defining or meeting the recovery targets looked at by regulators.

It’s best to think of the insurer’s post-incident services as something to mitigate unexpected situations beyond your standard recovery plan. Don’t forget that there will likely also be limits in the scope of coverage and the potential risk of denied claims. In other words: do not rely on insurance as the basis for your recovery capability; it will not tick the box.

Conversely, an organisation’s existing recovery capabilities are likely to be a key factor in its ability to secure coverage, the amount of coverage the insurer will be willing to extend, and the associated premiums.

We have moved from a footing where many organisations relied on insurance to provide them with the desired resilience, to one where resilience is a prerequisite to obtaining the insurance.

In short, the better your recovery capability is, the more likely insurers are to cover you, and for less.

5. What’s the best way to beat resilience targets?

In a word: Speed. Your recovery must be effective and complete, meaning it must be well planned and consider all elements of the business process (IT or otherwise) necessary to resume or continue operations and service delivery. This can entail restoring normal service or bringing up some temporary way of working, potentially with reduced but acceptable capacity.

Once you are confident that you have ways of restoring or maintaining your business functions, what matters most is simply how fast you can execute your recovery process. Operational resilience targets, as seen by regulators, are defined by the speed with which you can bring back service.

This means speed is the decisive factor in meeting targets to achieve and maintain compliance with these new regulations.

Preparation, planning, and testing are important pre-requisites. No complex recovery has a good chance of success without them. But they should be in place before the incident. Once a recovery needs to be triggered, it’s all about how fast you can run your recovery process.

Now feels like a good time to repeat something I mentioned in the last instalment: Hitachi Vantara’s recovery solutions, when used with VM2020’s CyberVR, have been tested as the fastest recovery solution on the market today.

That speed is going to help beat those regulatory targets, not to mention lower the financial impact to the business. Speaking for myself, it’s an advantage I want.

6. What about forensics?

Quick story: My first encounter with Hitachi Vantara solutions was through VM2020’s CyberVR solution. It had nothing to do with recovery at the time; I was more interested in the capabilities of VM2020’s Thin Digital Twins around security remediation and testing.

But I then saw that Digital Twins can have powerful uses in recovery scenarios as well.

I will discuss the possibilities Thin Digital Twins give us more fully in a future instalment, but I want to quickly mention one of them in the scope of this article: Forensics.

One of the biggest delays to recovery is the need for forensics after an incident. Traditionally, recovery cannot take place until forensics are completed, which can delay recovery by days, even weeks.

Due to their ability to replicate environments using only a fraction of the computing and storage resources, Thin Digital Twins allow for forensics to be done largely in parallel to recovery activities without the need to make a full copy of your environment (something most organisations do not have the capacity for).

This can save significant amounts of time and make the difference in meeting recovery targets, getting the business back up, and avoiding fines.
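The arithmetic behind that saving can be sketched in a few lines. This is a deliberately simplified model, not a description of how any product works: it assumes forensics either fully blocks recovery or does not block it at all, and the function name and durations are hypothetical.

```python
from datetime import timedelta


def time_to_service(forensics: timedelta, recovery: timedelta, parallel: bool) -> timedelta:
    """Elapsed time until service is restored.

    Sequential: recovery waits for forensics to finish.
    Parallel (simplified): forensics runs alongside and no longer delays recovery.
    """
    return recovery if parallel else forensics + recovery


# Illustrative durations only.
forensics = timedelta(days=5)
recovery = timedelta(hours=8)

print(time_to_service(forensics, recovery, parallel=False))  # 5 days, 8:00:00
print(time_to_service(forensics, recovery, parallel=True))   # 8:00:00
```

In practice the overlap is rarely total, so the real figure will sit somewhere between the two results, but the gap shows why forensics is worth taking off the critical path.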

More on Digital Twins in a future instalment of this series.

To summarise, while we often look at recovery capability from the perspective of whether we can recover, how quickly recovery can be achieved is often not as well considered. Careful selection of recovery capabilities (technologies), and maximising their effectiveness through correct implementation and planning, is key not just to minimising downtime but, in particular, to meeting regulatory targets.

And that concludes this instalment, which I hope helped frame what factors, both now and in the future, matter to regulators and how best to meet them.

Join us next time as we take a closer look at what else Thin Digital Twins can do for us.

Next article: What We Call Security: Simulation for Remediation and Design (6/7)
