SOC Mistake #1: You're Over-reliant on Protective and Detective Controls (Part 1)
It has taken me a few years to get from starting this blog post series to the number one mistake I’ve seen in Security Operations Centres, but it’s finally here. It is by far the biggest mistake I’ve seen organisations make, and the pandemic of ransomware attacks over the past couple of years is the evidence.
As anyone who’s read my blog for a while knows, I like frameworks and taxonomies - they give us a common language and structure to share our best practices, gaps in capability and our common experience across different organisation structures and vertical markets. Today I’m going to focus quite a lot on the NIST Cyber Security Framework (CSF), as I personally think it’s one of the best overall frameworks for cyber risk management and operations, and it provides solid grounding for my observations.
NIST CSF spans the gamut from the five strategic Functions at the top right down to pragmatic implementation guidance through its Informative References. While originally designed for Critical National Infrastructure, its adoption and use have spread well beyond that, with 70% of senior security leaders seeing it as representing best practice in a recent report on security frameworks.
For those unfamiliar with NIST CSF: recognising the United States’ dependence on technology, and the increasing threats against it that could impact national and economic security, President Obama issued Executive Order 13636, Improving Critical Infrastructure Cybersecurity, in February 2013. The Order directed the National Institute of Standards & Technology (NIST) to work with stakeholders to develop a voluntary framework - based on existing standards, guidelines and practices - for reducing cyber risks to critical infrastructure. The resulting framework helps organisations to better manage and reduce cybersecurity risk. It was designed to foster better risk and cybersecurity management communications amongst both internal and external organisational stakeholders, something often lacking in today’s complex, interdependent and outsourced supply chains, and in the often adversarial relationships between IT and security functions within organisations.
At the top level, NIST CSF is organised around five key Functions that largely represent the chronological stages of a security incident and of cyber risk management:
Identify - develop an organisational understanding to manage cyber security risk to systems, assets, data and capabilities.
Protect - develop and implement the appropriate safeguards to ensure delivery of services.
Detect - develop and implement the appropriate activities to identify the occurrence of a cyber security event.
Respond - develop and implement appropriate activities to take action regarding a detected cybersecurity incident.
Recover - develop and implement the appropriate activities to maintain plans for resilience and to restore any capabilities or services that were impaired due to a cyber security event.
This all makes sense, except that in almost all the organisations I’ve been embedded within, or whose operational capability I’ve evaluated, these steps are either missing or severely broken. I will deep dive on the specific problems I’ve seen in each of the Functions, and on some pragmatic steps that can be taken, in further blog posts.
Just as an example, in the Identify Function most organisations don’t know what they’re protecting. Configuration Management Databases are a well-known inadequacy in most organisations, and when they do exist they normally focus on hardware and software components - exactly the types of components that have become commoditised due to virtualisation, cloud adoption and orchestration. These things can be stood up and configured in seconds. It is the data that is important, and it is the data that an adversary is after, whether to steal, delete or corrupt it. Knowing your data is critical to modelling an adversary’s likely motivations, which you need in order to estimate the likelihood component of a risk calculation. It’s also critical to understanding the regulatory and business impact of an event. In other words: if you don’t know your data, your entire risk calculation and your entire controls catalogue are built on a house of cards. As I said, I will dive deeper into each of the CSF Functions in later posts, but I wanted to give a flavour of what the discussion points will be like.
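To make the point about risk calculation concrete, here is a minimal sketch in Python. It assumes the common simplification that risk is likelihood multiplied by impact, and the classifications and 1-5 ratings are hypothetical values of my own choosing, not anything defined by NIST CSF. The thing to notice is what happens to an asset whose data nobody has classified.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Asset:
    name: str
    # None means nobody knows what data this asset actually holds.
    data_classification: Optional[str]


# Hypothetical 1-5 ratings keyed by data classification (illustrative only).
LIKELIHOOD = {"public": 1, "internal": 2, "customer PII": 4}
IMPACT = {"public": 1, "internal": 3, "customer PII": 5}


def risk_score(asset: Asset) -> Optional[int]:
    """Return likelihood x impact, or None when the data is unknown."""
    classification = asset.data_classification
    if classification not in LIKELIHOOD:
        # Without knowing the data we can rate neither likelihood nor impact.
        return None
    return LIKELIHOOD[classification] * IMPACT[classification]


for asset in (Asset("crm-db", "customer PII"), Asset("vm-123", None)):
    score = risk_score(asset)
    print(asset.name, "unknown risk" if score is None else score)
```

An inventory that lists "vm-123" as a server with a given operating system still leaves its risk undefined, which is the gap a hardware-focused CMDB cannot close.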
Going back to the broader discussion of SOC Mistake #1: in order to be Cyber Resilient, organisations need to take a balanced approach to the five CSF Functions. The Cyber Resilience of an organisation will only be as good as its weakest CSF Function.
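The weakest-link argument can be sketched in a few lines of Python. The 0-5 maturity scale and the example scores are made up for illustration and are not part of NIST CSF; taking the minimum is simply one way of expressing the claim above.

```python
# The five CSF Functions, in the order the framework lists them.
CSF_FUNCTIONS = ("Identify", "Protect", "Detect", "Respond", "Recover")


def resilience_score(scores):
    """Return (weakest Function, its score); overall resilience is capped there."""
    missing = [f for f in CSF_FUNCTIONS if f not in scores]
    if missing:
        raise ValueError("no score for: " + ", ".join(missing))
    # On a tie, the Function that comes first in CSF order is reported.
    weakest = min(CSF_FUNCTIONS, key=lambda f: scores[f])
    return weakest, scores[weakest]


# Hypothetical self-assessment: heavy investment in Protect and Detect only.
example_scores = {"Identify": 1, "Protect": 4, "Detect": 4, "Respond": 2, "Recover": 1}

weakest, score = resilience_score(example_scores)
print(f"Overall resilience is capped at {score} by {weakest}")
```

Averaging the scores instead would give 2.4 and hide the problem, which is why the sketch uses the minimum.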
As discussed above, most organisations fail to Identify the criticality of data to the delivery of their mission, as well as the regulatory and compliance obligations attached to that data. Instead they focus on the repositories of that data, an increasingly thankless task with the advent of micro-services architectures, where the repositories are a myriad of on-premises, cloud and third-party stores linked together in a mesh.
On top of that, most organisations’ information security teams still possess a “moat-and-wall” mentality, believing that they can build impermeable defences against cyber attacks. This belief is fuelled by security vendors selling Silver-Bullet solutions and seeding fear, uncertainty and doubt in their customers’ minds while, I have to admit, often taking us out for some very nice lunches and on hospitality jaunts.
The challenge with a purely protectionist approach is that we face a human adversary that will continually adapt and our defences will always lag behind - more of which I’ll discuss in later posts as I deep dive on the challenges in the Protect and Detect CSF Functions.
While we can stop a large number of attacks, we can’t stop or even detect some of them. In the past, cyber security incidents largely had secondary losses only: regulatory fines, litigation and loss of reputation. Now a single cyber security incident such as a wiper or ransomware attack has primary losses: the inability of an organisation to execute its mission. That brings the need to address our inadequacies in Respond and Recover to the fore.
In a survey conducted by Sophos - a company that makes and sells end-point and network security solutions - they said that 75% of organisations that have paid ransoms in order to recover their data had up-to-date network and end-point security solutions in place. The reality is that many of the “ransomware protection” solutions on the market today are the same products that have been providing malware protection for decades just rebadged to appeal to CISOs desperate to show their leadership that they are “doing something about ransomware”.
As we know, ransomware shares many of the same attack vectors as traditional malware; it’s just the actions-on-objectives that differ (or overlap, in the case of double extortion ransomware). And none of us have experienced any malware incidents while we’ve had those solutions in place, right? Right? **tumbleweed rolls across the dusty road**. So while these solutions have been preventing a large number of malware infection attempts for traditional data exfiltration attacks, they haven’t been infallible. In the past CISOs haven’t been too concerned: even if an incident happens, yes, they have egg on their faces, but business continues during a data exfiltration incident, and the impact is on the data subject rather than on the organisation. There could be some regulatory fines, litigation and loss of reputation, but these aren’t show stoppers, as the breaches at JPMorgan, Sony and TalkTalk show.
Traditional malware attacks required human effort to execute, and average dwell times would sit above 200 days. That gave enterprises plenty of time for their protective and detective security controls (over 130 different ones on average, according to Deloitte) to catch up with what adversaries were doing. The ransomware action-on-objectives is simple: encrypt. It doesn’t need human intervention and can be fully automated, leading to dwell times of hours or days.
Then our current approach to ransomware recovery is to rely on systems and platforms that themselves may not be available because of the ransomware incident. Have they been wiped? Has the backup itself been encrypted? How will communications be conducted between IT and Security when our ticketing or case management system is encrypted? Will the telephones work if they are based on VoIP? Will we be able to log into systems that depend on an IDAM server that is unavailable? Will we even be able to investigate, understand the attack and remove the vulnerabilities before we put systems back online if our platforms are down? Organisations that believe their protective and detective controls are infallible often don’t ask themselves those questions - and they’re the ones that we’re seeing suffer the most in the current ransomware pandemic. The impact of ransomware on organisations is a symptom; the actual disease is a lack of focus on cyber resiliency.
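One way to start asking those questions is to write the dependencies down and test them against an assumed outage. The Python sketch below is only an illustration: the activity names, the dependency map and the list of encrypted systems are all hypothetical, and a real exercise would draw them from your own recovery plans.

```python
# Hypothetical map of recovery activities to the systems each one relies on.
RECOVERY_DEPENDENCIES = {
    "restore from backup": {"backup server", "IDAM server"},
    "coordinate IT and Security": {"ticketing system", "VoIP telephony"},
    "investigate the attack": {"case management system", "IDAM server"},
    "out-of-band call tree": set(),  # printed contact list and mobile phones
}


def blocked_activities(unavailable):
    """Map each blocked activity to the unavailable systems it depends on."""
    blocked = {}
    for activity, dependencies in RECOVERY_DEPENDENCIES.items():
        hit = dependencies & unavailable
        if hit:
            blocked[activity] = sorted(hit)
    return blocked


# Assumed scenario: these systems were encrypted along with everything else.
encrypted = {"IDAM server", "ticketing system", "backup server"}

for activity, causes in blocked_activities(encrypted).items():
    print(f"{activity}: blocked by {', '.join(causes)}")
```

In this made-up scenario only the out-of-band call tree survives, because it is the only activity with no dependency on the systems the attack took out.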
So we have this perfect storm of not understanding what we’re protecting, overly relying on platforms that cannot provide infallible protection, and then not being ready when the incident hits the fan. In the remainder of this series on SOC Mistake #1 I’m going to dive deeper into each of the NIST CSF Functions, discussing the current state in most organisations and the pragmatic steps that can be taken to move to a posture of resilience.