SOC Mistake #9: You don’t tier your SOC staff

Security Information and Event Management (SIEM) platforms are all about turning the mass of raw events that occur in your organisation’s infrastructure into intelligence that can be assessed by analysts and incident responders to identify and react to information security incidents.

SIEMs, despite what vendors will tell you, are not magic.  It will take you months to tune your ruleset to eliminate the bulk of false positives, and you’re probably working against a moving target: the number of event sources keeps increasing, and the rules need continual adjustment to detect the new threats you’re facing.

To make the best use of your highly skilled, trained analysts, it is common to tier them into at least two layers.

The initial layer is solely responsible (at least to start with) for the triage of incoming events.  That is, identifying false positives and ensuring appropriate prioritisation and escalation.

In an effective SOC, however, these level 1 analysts are not simply “click-monkeys”.  As well as triaging false positives, they should be doing some form of initial assessment so they can evaluate the potential scope of the incident and its potential impact on the business.  They should also be performing some form of adversary characterisation by evaluating where in the attack chain the event was detected.  Detection further down the chain, such as at the command and control or lateral movement stage, may imply that the adversary has conducted significant reconnaissance and has crafted a specific exploit to be undetectable to your host or network Intrusion Detection/Prevention System, which in turn implies a motivated and fairly skilled adversary.
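As a rough illustration of that attack-chain reasoning, here is a minimal Python sketch.  The stage names, their ordering and the threshold are my own simplification for illustration, not a standard model or anything your SIEM provides, and the result is a prompt for the analyst rather than a verdict.

```python
from enum import IntEnum


class AttackStage(IntEnum):
    """Simplified attack-chain stages, earliest to latest (assumed naming)."""
    RECONNAISSANCE = 1
    DELIVERY = 2
    EXPLOITATION = 3
    INSTALLATION = 4
    COMMAND_AND_CONTROL = 5
    LATERAL_MOVEMENT = 6
    ACTIONS_ON_OBJECTIVES = 7


def detected_late(stage: AttackStage,
                  threshold: AttackStage = AttackStage.COMMAND_AND_CONTROL) -> bool:
    """True if the first detection happened at or after the threshold stage."""
    return stage >= threshold


def characterise(stage: AttackStage) -> str:
    """Return a hint for the level 1 analyst's notes (hypothetical wording)."""
    if detected_late(stage):
        # Earlier controls did not fire, so the adversary probably evaded them.
        return "late-stage detection: possibly a motivated, fairly skilled adversary"
    return "early-stage detection: no inference about skill from stage alone"
```

For example, `characterise(AttackStage.LATERAL_MOVEMENT)` returns the late-stage hint, while `characterise(AttackStage.DELIVERY)` returns the early-stage one.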

Often the SIEM will have some form of prioritisation algorithm based on a number of factors, but only a human analyst can take all of the context into consideration.  What is the skill level of the attacker?  Does the attacker exhibit known behaviour in their Tactics, Techniques and Procedures (TTP) that can assist with attribution?  What is the apparent intent of the attacker (disruption, theft, espionage)?  Is this a one-off event or part of a sustained campaign?  Does the attack demonstrate investment of a lot of time or funds (use of zero days, for instance)?  What systems are affected and what line-of-business do they support?

Only events that the level 1 analyst deems to be real are escalated to the next level of more skilled analysts, who conduct a deeper level of investigation.  You can create specialisations at the Level 2, or above, layers to allow workflows to be created that direct events of a certain category to specific analysts, or groups of analysts.  Some organisations have as many as three or four tiers of analysts, gradually becoming more skilled and specialised as you move up the chain.
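The escalation workflow can be sketched in a few lines of Python.  This is only a sketch under my own assumptions: the category names, the queue names and the `TriagedEvent` type are hypothetical, and in practice the routing would live in your SIEM or case management tool rather than in a script.

```python
from dataclasses import dataclass


@dataclass
class TriagedEvent:
    """Outcome of level 1 triage for a single event (hypothetical fields)."""
    event_id: str
    category: str
    is_false_positive: bool


# Hypothetical mapping of event categories to specialist Level 2 queues.
LEVEL2_QUEUES = {
    "malware": "l2-malware",
    "data-exfiltration": "l2-data-loss",
    "web-attack": "l2-web",
}
DEFAULT_QUEUE = "l2-general"      # categories with no specialist group
TUNING_QUEUE = "content-authors"  # false positives go back for rule tuning


def route(event: TriagedEvent) -> str:
    """Return the name of the queue that should receive the triaged event."""
    if event.is_false_positive:
        return TUNING_QUEUE
    return LEVEL2_QUEUES.get(event.category, DEFAULT_QUEUE)
```

Keeping the mapping in one table means that adding a new specialisation is a one-line change, and every event always lands in exactly one queue.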

Any false positives discovered by the analysts can be routed to content authors, who can further tune the SIEM rules to try to prevent the false positive from occurring in the future.

The focus should be on making this process as efficient and repeatable as possible, while allowing the collection of metrics to support continual improvement.  For instance, in HP ArcSight, we create ActiveLists for a ‘triage channel’ and for ‘content needs tuning’.  As we’re largely automating this workflow, we can collect metrics on key operational performance indicators (KPIs) such as time-to-triage, time-to-investigate, number of false positives per use case category, number of events escalated per analyst, and number of incorrectly categorised false positives per analyst.  These metrics, when combined, can help you achieve the right balance of efficiency and effectiveness.
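To make a few of these KPIs concrete, here is a minimal Python sketch that computes them from exported triage records.  The `TriageRecord` fields are assumptions of mine about what such an export might contain; this is not ArcSight's data model or API.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class TriageRecord:
    """One triaged event, as it might be exported from the workflow (assumed fields)."""
    analyst: str
    use_case: str
    received: datetime
    triaged: datetime
    false_positive: bool
    escalated: bool


def mean_time_to_triage(records: list[TriageRecord]) -> timedelta:
    """Average delay between an event arriving and its triage decision.

    Raises statistics.StatisticsError if records is empty.
    """
    seconds = mean((r.triaged - r.received).total_seconds() for r in records)
    return timedelta(seconds=seconds)


def false_positives_per_use_case(records: list[TriageRecord]) -> Counter:
    """Number of false positives for each use case category."""
    return Counter(r.use_case for r in records if r.false_positive)


def escalations_per_analyst(records: list[TriageRecord]) -> Counter:
    """Number of events each analyst escalated to the next tier."""
    return Counter(r.analyst for r in records if r.escalated)
```

No single figure here is meaningful in isolation.  A very low time-to-triage combined with a high count of wrongly dismissed events, for example, suggests speed is being bought at the cost of accuracy.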

In my practice we’ve evaluated hundreds of Security Operations Centres where all of the analysts are highly trained and all operate at a single tier.  They all randomly pick the events they wish to work on off the console and do their typical ‘deep dive’ investigation.  This causes several problems:

  1. It’s hard to maintain both the broad spectrum of investigatory skills needed for triage of all event types and the deep level of specialisation needed to do a full investigation;

  2. Analysts may prefer to investigate specific categories of events, meaning that some event types may remain in the triage channel for extended periods of time;

  3. Having your highly-skilled analysts conduct the initial triage of false-positives is a bad use of their time; and

  4. Often Security Operations Centres find it really difficult to produce meaningful metrics on the overall performance of the capability, or individual analysts.

Implementing at least a two-tier system of triage/prioritisation and investigation can dramatically increase the performance of your Security Operations Centre.
