SOC Mistake #6: You don’t focus on the big picture
This is a mistake we see a lot in Security Operations Centres whose SIEM Use Cases have been built using a bottom-up approach. I discussed this in my post SOC Mistake #7: On Use Cases, You Model Your Defences, Not Your Attackers, where SIEM Use Cases are arrived at by looking at what event sources are easy to obtain or are already available, rather than what is needed to maximise the efficiency and effectiveness of the SIEM’s capability to detect attacks against your key line-of-business infrastructure.
Often in Security Operations Centres that have been built this way, rules cannot be made granular enough to give the analyst the context to determine whether something is a false positive or a real attack without significant digging around. To deal with the event volume, the Security Operations Centre has to spend significantly on hiring additional Level 1 Analysts to perform event triage. Another problem with bottom-up rules is that they can be extremely tricky to tune: they usually rely on simple correlations across two or three log sources with simple logic, so tuning them for one scenario often detunes the detection capability for another.
In contrast, a top-down approach should provide multiple opportunities to detect the attack along the attack chain, so if one component of a staged rule is causing misfires into the SIEM, it is possible to tune anywhere along the staged rules that make up the chain. With this approach you can start to tune out false positives (alarms where there is no real event) without introducing excessive false negatives (missing a real event). The business impact and threat assessment you undertook as part of your Use Case Workshop should drive what the tolerable level of false negatives is: you compare the operational cost of the additional staffing needed to handle the false positives you have to keep (because you can’t tune them out without introducing false negatives) against the likelihood and impact of the event if you miss it. Of course you can’t make these kinds of judgements if you haven’t taken a top-down approach. I’ll talk more in a later post about collecting and analysing the metrics that tell you when to stop tuning, retire a Use Case and start again with a different approach.
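To make that trade-off concrete, here is a minimal back-of-the-envelope sketch, using entirely made-up numbers, of the kind of comparison the workshop output should let you do: the cost of triaging the residual false positives you keep, versus the expected loss from the false negatives that harder tuning would introduce.

```python
# Hypothetical, illustrative numbers only -- plug in figures from your own
# Use Case Workshop's business impact and threat assessment.

ANALYST_COST_PER_HOUR = 45.0        # fully loaded Level 1 Analyst cost
TRIAGE_MINUTES_PER_FP = 15          # average time to close a false positive

false_positives_per_day = 120       # residual FPs you can't safely tune out
annual_fp_cost = (false_positives_per_day * 365
                  * TRIAGE_MINUTES_PER_FP / 60
                  * ANALYST_COST_PER_HOUR)

# Expected annual loss if tighter tuning introduces false negatives
incident_likelihood_per_year = 0.2    # chance the attack this rule covers occurs
loss_if_missed = 2_000_000            # impact estimate from the workshop
added_miss_probability = 0.3          # extra miss rate the tighter tuning adds
expected_fn_loss = (incident_likelihood_per_year
                    * added_miss_probability
                    * loss_if_missed)

print(f"Cost of triaging residual false positives: £{annual_fp_cost:,.0f}/yr")
print(f"Expected loss from added false negatives:  £{expected_fn_loss:,.0f}/yr")
print("Keep the noisy rule" if annual_fp_cost < expected_fn_loss
      else "Tune harder / redesign the Use Case")
```

The point isn't the arithmetic, which is trivial; it's that without the top-down assessment you have no values to put into it.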
Going back to the “Big Picture”: in these bottom-up Security Operations Centres you often see SIEM events in the triage console that resemble the raw events from the event sources, i.e. they don’t resemble ‘actionable intelligence’. In fact, in one multinational company that paid a significant amount for its SIEM infrastructure, we saw the SIEM platform receiving events from only a single device type, from a single manufacturer – they might as well have just sat the Analyst in front of the management console of that device and forgotten about the SIEM altogether. What constitutes ‘actionable intelligence’ will differ depending on which SIEM vendor’s marketing glossies you’re reading, but to me it is enough information for a Level 1 Analyst to conduct initial triage without having to reach for a large number of investigatory tools: to weed out false positives, determine the likely impact of the event on the organisation, and determine the level of skill and possible motivations of the attacker.
A Use Case built using the top-down approach will provide this information. Building these kinds of Use Cases involves modelling the vulnerabilities, threats and controls in the people, processes, applications, data, networks, compute and storage for each line of business. Armed with that model, with the information about where in the attack chain the attack has been detected, and with all of the event information up to the point of detection (or beyond, if the rule also triggered a higher level of proactive monitoring, such as full packet capture, logging of keystrokes, or even redirecting the attacker to a tarpit to gather further information on their intent, tools, techniques and procedures), the analyst conducting the triage can, at a glance, make an initial determination of the impact, capability and scope of the attack.
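To illustrate what staged rules along the attack chain might look like, here is a minimal sketch, assuming a generic event with host and attack-chain-stage fields. A real SIEM expresses this in its own correlation rule language; this only shows the shape of the idea – and why tuning one noisy stage doesn’t blind the whole Use Case.

```python
from collections import defaultdict

# Ordered stages of the modelled attack chain for this Use Case
ATTACK_CHAIN = ["recon", "delivery", "exploitation", "c2", "exfiltration"]

progress = defaultdict(set)   # host -> set of stages already observed

def on_event(event):
    """Record the stage an event maps to and fire when the chain deepens."""
    host, stage = event["host"], event["stage"]
    progress[host].add(stage)
    reached = [s for s in ATTACK_CHAIN if s in progress[host]]
    if len(reached) >= 2:                     # threshold: two or more stages seen
        alert(host, reached, event)

def alert(host, stages, triggering_event):
    # Everything observed up to the point of detection travels with the alert,
    # so the Level 1 Analyst gets context rather than a raw device event.
    print(f"[ALERT] {host}: attack chain stages observed: {stages}")
    print(f"        latest event: {triggering_event['details']}")

# Even if noisy recon detection is tuned right down, exploitation followed
# by command-and-control still trips the rule.
on_event({"host": "db01", "stage": "exploitation", "details": "CVE exploit attempt"})
on_event({"host": "db01", "stage": "c2", "details": "beacon to known C2 domain"})
```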
The SIEM platform, ideally, should provide integrated tools for further analysis, such as the retrieval and visualisation of related historical logs to look for anomalies, correlations, affinity groups and context; the ability to look up source IPs, packet captures or executables against threat intelligence sources; and, beyond that, the ability to query configuration management or identity management servers to understand the use of, and recent configuration changes to, the machines involved, as well as the rights of the users involved. In fact, in HP ArcSight this data can be brought in automatically to enrich the event before it is even opened by the Level 1 Analyst, making them more operationally efficient.
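As a rough sketch of that kind of enrichment – the lookup helpers here (threat_intel_lookup, cmdb_lookup, idm_lookup) are hypothetical placeholders, not any particular product’s API – the idea is simply that the context arrives attached to the event, not after a manual hunt:

```python
def enrich(event, threat_intel_lookup, cmdb_lookup, idm_lookup):
    """Attach context so a Level 1 Analyst can triage at a glance."""
    enriched = dict(event)

    # Reputation of the external source address
    enriched["src_reputation"] = threat_intel_lookup(event["src_ip"])

    # Business context and recent change history of the target machine
    asset = cmdb_lookup(event["dst_host"])
    enriched["asset_owner"] = asset.get("owner")
    enriched["asset_criticality"] = asset.get("criticality")
    enriched["recent_changes"] = asset.get("changes_last_7_days", [])

    # Rights and role of the user account involved
    user = idm_lookup(event.get("username"))
    enriched["user_role"] = user.get("role")
    enriched["user_privileged"] = user.get("is_admin", False)

    return enriched
```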
So what is the “Big Picture”? The answer is understanding the Who? What? When? How? and, most difficult, Why? of the attack. Faced with a huge deluge of rule fires that require significant effort to investigate, a large proportion of which end up being false positives that you “never seem to be able to tune out”, the Analyst will often run around with their hair on fire when something that looks like a real attack is found. Often they’ll escalate without answering these basic questions, and when a C-level exec has been got out of bed they’ll ask exactly those questions – Who? What? When? How? and Why? – and often the SOC can’t answer them.
Before an incident is declared, the function of a Security Operations Centre in a large organisation is to answer those questions – to prepare, to detect and to investigate. It should be able to prioritise the incidents to be dealt with by understanding the capability of the adversary, the impact of the incident and the scope of the systems involved. This is the information it should be passing to the incident responders to allow them to contain and eradicate; IT operations then works with the SOC to eliminate the vulnerabilities or apply additional controls, and finally to recover (increasing logging to detect if the machine is attacked again). Throughout the whole process the Security Operations Centre should be working iteratively with the incident response team and IT operations.
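A purely illustrative way to turn capability, impact and scope into a triage order might look like the sketch below – the weights and 1–5 scales are placeholders and should come out of your own business impact and threat assessment, not from this post.

```python
def priority(capability: int, impact: int, scope: int) -> int:
    """Score 1-5 inputs: adversary capability, business impact, and the
    number of affected systems (bucketed 1-5). Higher score = handled first."""
    return capability * 2 + impact * 3 + scope

incidents = [
    {"name": "Commodity malware on a kiosk PC", "capability": 1, "impact": 1, "scope": 1},
    {"name": "Targeted access to finance DB",   "capability": 4, "impact": 5, "scope": 2},
]

for inc in sorted(incidents,
                  key=lambda i: priority(i["capability"], i["impact"], i["scope"]),
                  reverse=True):
    print(priority(inc["capability"], inc["impact"], inc["scope"]), inc["name"])
```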
Bad Use Cases that provide no context for the attack, poor integration of intrusion detection tools, and a lack of knowledge about the context of systems and users, coupled with a lack of analytical skills in Analysts, result in a focus on individual events rather than on the scope and impact of a potential incident or breach.
One story we frequently tell is of a SOC we knew of where the infosec team hadn’t reached out to the IT Operations department to win their buy-in for obtaining logs. Because of the adversarial relationship between IT Ops and the infosec department, the infosec team relied on the logs they could obtain easily, i.e. the ones from the systems they owned – namely intrusion detection, firewall and anti-virus. Everyone who works in information security reading this blog knows just how effective those technologies are in 2015, so nothing was triggering the correlations on the SIEM platform (the customer had also just deployed the default content from the vendor, without tuning it to the resources available). Funnily enough, the SIEM didn’t detect the large, very public breach that the customer went on to suffer.
Questions were asked by the CEO about why he wasn’t notified, and then why the SIEM product they’d spent so much money on had “failed”. At least as a result of the incident the information security team got carte blanche to access whatever logs they wanted – great, right? Well, no. The small SOC then on-boarded every single log source they could lay their hands on, using a bottom-up approach. The result was chaos: masses of events bleeding into the console, providing no answers to the contextual questions, and in overcompensation for not notifying the CEO of the original incident, the SOC team called him out-of-hours over a dozen times in one month over incidents they had panicked about because they couldn’t truly understand what was happening.