SOC Automation Best Practices

Organizations use SOC automation to improve the efficiency and effectiveness of their security operations centers (SOCs). This allows organizations to scale their SOC using technology rather than human resources, enhance the consistency of their SOC processes, and enable humans to focus on other important tasks.

When done effectively, SOC automation:

Improves the consistency of security incident response processes and investigations
Improves mean-time-to-respond and mean-time-to-resolution metrics
Enhances an organization's resilience to threats
Frees up human resources to focus on more strategic tasks

This article describes the best practices for implementing SOC automation. Each best practice includes practical examples and specific guidelines for any organization to enhance SOC automation.

Define the level of autonomous decision-making

Adopting automated tooling within SOC operations can be a double-edged sword for many organizations. On the one hand, automating manual, repetitive processes reduces the load on human operators, allowing organizations to become more efficient and streamlined. At the same time, there can be valid concerns about what actions an automated process can take and how these processes can be controlled and managed.

To address these concerns, invest time with decision-makers and stakeholders to establish the right amount of automation for your organization. Follow the three processes below to define the limits of autonomous decision-making within your organization:

Discuss three hypothetical scenarios: Identify three practical scenarios where automation can be implemented. Then map out the current processes and identify areas where automation could add value. From these discussions, identify the comfort level of autonomous decision-making and where humans should be included in the loop.
Conduct parallel sandbox testing on real alerts: Convert the discussions above into three actionable events, then deploy the actions into a sandbox area and observe what happens. Once enough data has been gathered, return to the discussions in the first step and reconfirm any concerns or challenges.
Deploy limited automation first, then build: Once the decision has been made to deploy live automation, start with simple, automated actions. This allows the stakeholders involved to get used to the presence and actions of automation in the environment.
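One way to make the outcome of these discussions concrete is to write the agreed limits down as a small policy that the automation consults before acting. The sketch below is illustrative only: the autonomy levels, action names, and the `decide` helper are assumptions made for this example, not part of any particular tool.

```python
from enum import Enum


class Autonomy(Enum):
    """Hypothetical autonomy levels agreed with stakeholders."""
    MANUAL = 0          # automation only notifies; a human acts
    HUMAN_APPROVAL = 1  # automation prepares the action; a human approves
    AUTONOMOUS = 2      # automation acts on its own


# Illustrative policy; the action names and levels are assumptions.
POLICY = {
    "enrich_alert": Autonomy.AUTONOMOUS,
    "close_false_positive": Autonomy.HUMAN_APPROVAL,
    "isolate_endpoint": Autonomy.MANUAL,
}


def decide(action: str, sandbox: bool = True) -> str:
    """Return what the automation is allowed to do for a given action."""
    # Actions missing from the policy fall back to the most restrictive level.
    level = POLICY.get(action, Autonomy.MANUAL)
    if sandbox:
        return "log_only"  # parallel sandbox testing: observe, never act
    if level is Autonomy.AUTONOMOUS:
        return "execute"
    if level is Autonomy.HUMAN_APPROVAL:
        return "request_approval"
    return "notify_analyst"
```

Keeping `sandbox=True` as the default mirrors the second process above: nothing acts on live systems until someone deliberately switches it on.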

Orchestration vs. automation

When defining the limits of autonomous decision-making, consider the difference between automation and orchestration. Although these two terms are often used interchangeably, they have distinct differences that can impact the level of decision-making allowed.

Automation considers the linear relationship between two tasks. For instance, automating a process means taking that process and removing the need for human involvement so that a machine can do it.

Orchestration adds layers of coordination and decision-making over standard automated tasks.

Take, for instance, the need to capture a memory dump on a laptop. A typical automation would simply specify the laptop asset and then direct the automated process to act. In contrast, an orchestration could enhance the automated flow by:

Checking to ensure that the laptop has enough disk space to cache the memory dump
Initiating the memory dump
Providing updates during the action
Branching based on different scenarios

Even though the end result is the same, an orchestration can consider more variables to achieve the outcome; as a result, it has more autonomy in performing actions. In turn, this means delegating more decision-making authority to orchestration.
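The difference can be sketched in a few lines of Python. Everything here is hypothetical: `start_dump` and `notify` stand in for whatever endpoint and messaging integrations an organization uses, and treating "enough disk space" as "at least as much free disk as RAM" is an assumption made for illustration.

```python
from typing import Callable


def automate_memory_dump(asset: str, start_dump: Callable[[str], bool]) -> bool:
    """Plain automation: run the task against the asset and nothing more."""
    return start_dump(asset)


def orchestrate_memory_dump(
    asset: str,
    free_disk_bytes: int,
    ram_bytes: int,
    start_dump: Callable[[str], bool],
    notify: Callable[[str], None],
) -> str:
    """Orchestration: check preconditions, report progress, branch on outcome."""
    # Precondition: the dump needs somewhere to be cached.
    if free_disk_bytes < ram_bytes:
        notify(f"{asset}: not enough disk space, escalating")
        return "escalated"
    notify(f"{asset}: starting memory dump")
    if start_dump(asset):
        notify(f"{asset}: memory dump complete")
        return "completed"
    # Branch: the task itself failed, so hand over to a human.
    notify(f"{asset}: memory dump failed, escalating")
    return "escalated"
```

The orchestrated version makes three decisions the plain automation never faces, which is exactly the extra authority stakeholders need to be comfortable delegating.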

Exploring the nuances of these two approaches should be a focus of the three processes mentioned earlier.

Choose no-code tooling with built-in integrations

Once an organization decides to implement SOC automation, the next step is to select the appropriate tools, either new or existing. While this is often an engineering decision, security professionals should ensure that the following requirements are included in the tool specifications:

No-code capability: Choose tools that have extensive support for no-code solutions. This reduces the barriers to entry for tooling use, speeds up tooling adoption, and reduces workload. No-code tools typically include easy-to-use workflow orchestration and automation builders, further simplifying their adoption.
Extensive integrations: Select tools that offer a wide range of prebuilt integrations right out of the box in addition to the ability to integrate with any API. Don’t just verify that the tool provides REST APIs; ensure that it includes integrations already designed to connect with other REST APIs in a user-friendly format. This approach allows you to focus on developing workflows instead of spending time establishing connections between different tools.
Scalability: Choose tooling that autoscales as your organization grows and develops. This helps future-proof your tooling selection.

Enhance alert triage

One of the most challenging areas of SOC activity is alert management and triage. Many SOCs report being overwhelmed by the quantity and variety of incoming alerts, leading to alert fatigue and decreased alert triage quality.

Common causes of alert fatigue

When organizations investigate the root causes of alert fatigue and decreased alert triage quality, four things are often identified.

First, a significant amount of time is wasted responding to incorrect or inaccurate alerts. This can include external notifications about malicious IP addresses the organization's infrastructure has never accessed or security alerts regarding file-sharing activity that ultimately turn out to be legitimate. These unverified alerts consume the cognitive resources of staff and reduce their overall sense of urgency, resulting in missed alerts or slow response times.

Second, manually gathering necessary contextual information consumes initial response times and reduces the quality of information for valid alerts. For instance, SOC operators might need to perform manual searches for compromised asset details and their associated user accounts. When done manually, each task requires human resources to divert attention from incoming alert triage activities to gather the information needed to progress the alert. The time lost to these activities tends to compound as human operators become exhausted from constant context-switching and high-focus activities. The end result is that alerts may be ineffectively triaged, further slowing down response times.

Third, organizations often struggle to consistently record and capture critical information effectively. Human operators, faced with an overwhelming influx of alerts and pressured to reduce triage time, may forget or incorrectly document key details about the alerts they are handling. This oversight can lead to significant downstream consequences, including repetitive triage work by subsequent SOC operators, flawed decision-making based on inaccurate data, and missed opportunities for correlation.

Finally, organizations often find it impossible to consistently identify complex activities that require multiple pieces of information to correlate. Human operators become so task-focused on solving the next alert that they lose the situational awareness to consider the current alert in the context of all of the other alerts that have been generated. This challenge becomes even more acute when correlation activity spans multiple days and weeks.

SOC automation can be used to address each of these challenges.

Qualify incoming alerts

Use SOC automation to enhance alert triage by automatically qualifying and verifying incoming alerts. In this approach, an automated workflow orchestration or automation tool performs a predefined set of alert verification activities. Once an actionable decision is reached, the alert branches into actions such as notifying, escalating, or resolving.

Typically, the automated checks mirror SOC operator actions. However, when done using SOC automation, the checks are done automatically before the alert arrives in a SOC operator's queue. As a result, SOC operators can focus on actionable alerts.

For instance, consider a workflow that receives an IP address and then searches Cribl to see whether it has been observed. Depending on the organization's autonomous decision-making levels, this workflow can be extended with actions to resolve the alert if the IP address has not been seen or to escalate the alert if it has.
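In code form, a qualification check of this kind might look like the following minimal sketch. The `seen_in_logs` callable is a hypothetical stand-in for the log search (it is not a real Cribl API), and the returned action names are assumptions.

```python
import ipaddress
from typing import Callable


def qualify_ip_alert(
    ip: str,
    seen_in_logs: Callable[[str], bool],  # hypothetical log-search lookup
    auto_resolve: bool = False,
) -> str:
    """Qualify an IP-based alert before it reaches an operator's queue."""
    try:
        ipaddress.ip_address(ip)
    except ValueError:
        return "escalate"  # malformed indicator: let a human look at it
    if seen_in_logs(ip):
        return "escalate"  # the environment has communicated with this IP
    # Not observed: close automatically only if the organization allows it.
    return "resolve" if auto_resolve else "queue_for_review"
```

The `auto_resolve` flag is where the organization's agreed level of autonomous decision-making enters the workflow.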

Enrich qualified alerts

Use SOC automation to enhance alert triage by enriching qualified alerts with contextual information. In this approach, automation attaches additional information to an alert, ensuring that it has all the necessary details to advance. This could mean including something as straightforward as the user ID of an affected user or something more complex, such as a network snapshot from the past 10 minutes.

As with alert qualification, this activity typically automates existing triage and alert processing steps.

To see this in action, consider this workflow orchestration, where Elastic alerts are triaged through a workflow orchestration and automation tool. The workflow triages alerts while updating its progress in an Elastic SIEM case.
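As a rough illustration of the enrichment step on its own, the sketch below attaches context from several lookups to an alert. The enricher names and alert fields are assumptions; in practice each enricher would call a directory, CMDB, or SIEM integration.

```python
from typing import Any, Callable, Dict

Enricher = Callable[[Dict[str, Any]], Any]


def enrich_alert(alert: Dict[str, Any], enrichers: Dict[str, Enricher]) -> Dict[str, Any]:
    """Return a copy of the alert with context attached by each enricher."""
    enriched = dict(alert)  # never mutate the original alert
    context: Dict[str, Any] = {}
    errors: Dict[str, str] = {}
    for name, lookup in enrichers.items():
        try:
            context[name] = lookup(alert)
        except Exception as exc:
            # A failed lookup should not block the alert; record it instead.
            errors[name] = str(exc)
    enriched["context"] = context
    enriched["enrichment_errors"] = errors
    return enriched
```

Recording failed lookups alongside the successful ones lets the operator see at a glance which context is missing rather than assuming it does not exist.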

Automate information capture

Use SOC automation to enhance alert triage by automating information capture. In this approach, automation captures specific information and ensures that it is recorded appropriately. This might include updating Jira tickets, creating new cases in an event management system, or simply backing up information to a different location.

Leveraging SOC automation allows organizations to ensure that critical data is continuously captured, which, in turn, helps with downstream analysis processes. For example, the previous workflow continuously updated an associated SIEM case.

Alert correlation

Consider this scenario: Over the course of several months, an advanced persistent threat (APT) actor undertakes a series of actions to infiltrate a targeted network. While some of these actions may trigger an alert, each individual alert is minor enough to be discarded by the SOC. It is only when the alerts are correlated over time that a clearer picture can be seen.

This kind of activity is almost impossible for SOC operators to identify. Tracking multiple, low-priority alerts over an extended timeframe is beyond the scope of most SOC operations teams, especially when they are already under pressure from new alerts coming in. Even in the case of advanced threat-hunting teams, this kind of activity is notoriously difficult to detect without an initial tip-off or alert.

Automation can be leveraged here, enabling the development of workflows that can perform advanced correlation activities on incoming alerts without burdening existing human resources. For example, SOCs may choose to automate some of the more complex attack detections from the MITRE ATT&CK framework or develop their own in-house correlation approaches.

These longer-running, more complex orchestrations still require initial human involvement to set up and design; however, once implemented, they will work constantly and consistently, improving the overall efficacy of alert triage.
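A very simple in-house correlation approach is to flag any entity (a host or user, for example) that accumulates several different kinds of low-priority alerts within a time window. The sketch below assumes each alert is a dictionary with `entity`, `time`, and `type` keys; the field names, window, and threshold are all illustrative.

```python
from collections import defaultdict
from datetime import timedelta
from typing import Any, Dict, Iterable, List


def correlate(
    alerts: Iterable[Dict[str, Any]],
    window: timedelta = timedelta(days=30),
    min_distinct: int = 3,
) -> Dict[str, List[str]]:
    """Flag entities with at least min_distinct alert types inside one window."""
    by_entity = defaultdict(list)
    for alert in alerts:
        by_entity[alert["entity"]].append((alert["time"], alert["type"]))

    flagged: Dict[str, List[str]] = {}
    for entity, events in by_entity.items():
        events.sort()  # oldest first
        start = 0
        for end in range(len(events)):
            # Slide the window start forward until it spans at most `window`.
            while events[end][0] - events[start][0] > window:
                start += 1
            types = {kind for _, kind in events[start:end + 1]}
            if len(types) >= min_distinct:
                flagged[entity] = sorted(types)
                break
    return flagged
```

Each alert on its own stays low priority; only the combination is surfaced, which is the part human operators rarely have the capacity to track.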

Automate playbooks

Many SOCs utilize templated playbooks when responding to legitimate alerts. While playbooks can range from high-level workflows to detailed step-by-step instructions, their main goal is to strike a balance: They should provide sufficient information to complete the task without being so detailed that they require constant updates as technology evolves and tech stacks change.

The problem for many SOCs is that playbooks eventually start to hinder operations. Playbooks tend to increase in quantity and complexity over time until more time is spent ensuring that the playbook is accurate than maintaining active situational awareness of SOC activities. As overwhelmed SOC operators struggle to balance the competing priorities of time and accuracy, playbooks may be discarded or followed loosely, leading to many of the same problems they were initially created to address.

To solve this challenge, use SOC automation to automate SOC playbooks, leveraging the steps described below. Note that this section assumes that playbooks have already been prepared for automation.

Automate the most manual tasks first

Start playbook automation by automating simple, manual tasks. Ideally, these tasks should offer real value to SOC operators while still possessing clear and discrete outcomes.

For example, a common investigative action for a SOC is establishing the number of endpoints impacted by a defined malware event with an initial network vector. This is a great candidate for initial automation as it offers clear value to SOC operators, requires relatively straightforward integrations with a SIEM, and has clearly defined outcomes for success.

In contrast, attempting to establish a malware mutex from a laptop's memory dump would not be an effective candidate for initial automation. This complex task requires multiple processes to work effectively, along with some human involvement and decision-making.

Build linear workflows

Next, start implementing straightforward, multistep workflows. Choose workflows that are linear in nature but require a series of automated actions. Ideally, these workflows mirror playbooks that add value to SOC operators without being overwhelmingly complex.

For instance, building on the previous malware event example, a more complex workflow could involve analyzing the asset for evidence of program execution on Windows machines. This might include auto-analyzing artifacts such as the Background Activity Moderator (BAM), the Prefetch folder, and ShimCache on the impacted machines. This would require the automation to:

Identify which assets have been impacted (as described above)
Determine which assets are Windows-based
Connect to each impacted asset and extract or analyze the specified artifacts
Report the results back to the SOC operator

Each step would typically take a human operator a significant amount of time to perform manually, making it a good candidate for workflow automation. It offers clear benefits for SOC operators by simplifying a time-consuming process, but it is a linear task with clear if-then actions.

Block diagram of an automated workflow (source)
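The four steps above translate naturally into a straight-line function. In this sketch, `find_impacted_assets`, `get_os`, and `collect_artifacts` are hypothetical integration callables (for a SIEM, an asset inventory, and an endpoint tool, respectively), and the artifact names are shorthand chosen for the example.

```python
from typing import Callable, Dict, Iterable, Tuple

ARTIFACTS: Tuple[str, ...] = ("bam", "prefetch", "shimcache")


def run_linear_workflow(
    malware_event: str,
    find_impacted_assets: Callable[[str], Iterable[str]],
    get_os: Callable[[str], str],
    collect_artifacts: Callable[[str, Tuple[str, ...]], Dict[str, bool]],
) -> Dict[str, Dict[str, bool]]:
    """Run each playbook step in order and return a per-asset report."""
    # Step 1: identify which assets have been impacted.
    impacted = find_impacted_assets(malware_event)
    # Step 2: keep only the Windows-based assets.
    windows_assets = [a for a in impacted if get_os(a).lower().startswith("windows")]
    # Step 3: connect to each asset and analyze the specified artifacts.
    report = {asset: collect_artifacts(asset, ARTIFACTS) for asset in windows_assets}
    # Step 4: hand the results back to the SOC operator.
    return report
```

There is no branching beyond the operating system filter, which is what keeps this a linear workflow rather than an orchestration.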

Integrate automated response actions

For most organizations, the next step in playbook automation is adding specific response actions to their automation and orchestration workflows. In this step, start defining actions that the automated tooling can take. These actions should be discrete in nature and use existing tooling. Note that in some cases, this step may not be required. If this is the case, continue to the next step.

For example, continuing the malware response example, an automated action could involve locking the impacted Windows laptops from further use until they have been analyzed. This action quarantines the assets, giving the SOC team time to respond further.

Block diagram of an automated workflow that includes a response action (source)
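A response action like this can be kept discrete and easy to reason about. The sketch below assumes a hypothetical `lock_asset` callable supplied by existing endpoint tooling; the `dry_run` default is a design choice for this example, not a feature of any specific product.

```python
from typing import Callable, Dict, Iterable


def quarantine_assets(
    assets: Iterable[str],
    lock_asset: Callable[[str], None],  # hypothetical endpoint-tool integration
    dry_run: bool = True,
) -> Dict[str, str]:
    """Lock each impacted asset and report the outcome per asset."""
    results: Dict[str, str] = {}
    for asset in assets:
        if dry_run:
            results[asset] = "would_lock"  # report intent without acting
            continue
        try:
            lock_asset(asset)
            results[asset] = "locked"
        except Exception as exc:
            # One failure should not stop the remaining assets being locked.
            results[asset] = f"failed: {exc}"
    return results
```

Running with `dry_run=True` first gives stakeholders a list of what would have been locked before the action is allowed to run for real.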

Add orchestration to workflows

In this step, start introducing more complexity to the workflows built so far. This enables the automation to become more dynamic and responsive and continues to offload decision-making from SOC operators, enabling them to focus on higher-level challenges.

For instance, building on the malware response example, two decision points could be introduced:

What should happen if the asset has network evidence of the malware being introduced, but the artifact evidence returns false?
What happens if only one of the three artifacts returns true?

Each of these questions could have a number of answers:

Escalate to the SOC team.
Perform further investigation.
Record and resolve.
Take action.

Here’s a flow diagram of what this might look like:

Block diagram of an automated workflow with multiple decision points (source)
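The two decision points can also be expressed as a small function. The mapping from evidence to outcome below is only one possible set of answers, chosen for illustration; each organization would substitute the choices it agreed on when defining its autonomy limits.

```python
from typing import Dict


def decide_next_step(network_evidence: bool, artifact_hits: Dict[str, bool]) -> str:
    """Choose the next step from network evidence and per-artifact results."""
    hits = sum(1 for found in artifact_hits.values() if found)
    if network_evidence and hits == 0:
        # Network says the malware arrived, but no artifact confirms execution.
        return "perform_further_investigation"
    if hits == 1:
        # A single artifact is weak evidence: ask a human to judge.
        return "escalate_to_soc"
    if hits >= 2:
        return "take_action"
    return "record_and_resolve"
```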

Enforce routine security processes

SOC automation should be utilized to enforce routine security processes. This approach enables organizations to use workflow orchestration and automation tools, ensuring that standard cyber hygiene tasks are consistently performed. By doing so, they can reduce their attack surface and maintain a more secure baseline within their environment.

Two areas where SOC automation can be used to enforce routine security processes are described below.

Cyber hygiene

Cyber hygiene activities help maintain a secure environment. Other organizational events often influence these activities and can significantly impact the overall cybersecurity posture. Unfortunately, because non-cybersecurity-related events drive them, it can be difficult to ensure they are executed correctly.

For example, when an employee leaves an organization, their credentials across various systems should be removed and invalidated. This practice prevents the exploitation of unused credentials and supports the principle of least privilege. However, since this process is tied to human resource events, it can be challenging for SOC teams to verify that it has been completed accurately and in a timely manner.

SOC automation enables organizations to automate these types of processes using the same tooling that their SOCs already use. This can be seen in this workflow orchestration, which automatically removes a user account from multiple different platforms at once.
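A minimal sketch of such an offboarding workflow is shown below. The `platforms` mapping is hypothetical: each value stands in for an integration that disables the account on one system, and the status strings are assumptions.

```python
from typing import Callable, Dict, List


def offboard_user(username: str, platforms: Dict[str, Callable[[str], None]]) -> Dict[str, str]:
    """Disable the user's account on every platform and report each outcome."""
    results: Dict[str, str] = {}
    for name, disable in platforms.items():
        try:
            disable(username)
            results[name] = "disabled"
        except Exception as exc:
            # Keep going so one unreachable platform does not block the rest.
            results[name] = f"failed: {exc}"
    return results


def needs_follow_up(results: Dict[str, str]) -> List[str]:
    """List the platforms where the account could not be disabled."""
    return sorted(name for name, status in results.items() if status != "disabled")
```

Returning a per-platform result gives the SOC the verification it otherwise lacks: any platform listed by `needs_follow_up` still holds a live credential.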

Routine environment scanning

Regularly scanning an organization's IT infrastructure is essential for maintaining a secure environment. This process offers independent confirmation of other security activities and can help uncover issues that may have been overlooked.

Examples of routine environment scanning include:

SOC automation streamlines these scans and alerts affected teams when problems are detected.

Record all automated actions

When using SOC automation, ensure that every action is recorded in a case management system. This provides accountability and helps reconstruct events if necessary.
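As a rough sketch of what recording an action could look like, the example below builds one structured record per automated action and appends it to a sink. The field names are illustrative assumptions rather than a prescribed schema, and the in-memory list stands in for a real case management integration.

```python
import json
from datetime import datetime, timezone
from typing import List, Optional


def build_audit_record(
    workflow: str,
    action: str,
    target: str,
    outcome: str,
    now: Optional[datetime] = None,
) -> dict:
    """Build one structured record describing an automated action."""
    timestamp = now or datetime.now(timezone.utc)
    return {
        "timestamp": timestamp.isoformat(),
        "workflow": workflow,   # which automation ran
        "action": action,       # what it did
        "target": target,       # what it acted on
        "outcome": outcome,     # what happened
    }


def record_action(sink: List[str], **fields) -> str:
    """Serialize the record as one JSON line and append it to the sink."""
    line = json.dumps(build_audit_record(**fields), sort_keys=True)
    sink.append(line)
    return line
```

One JSON line per action keeps the records easy to search later when events need to be reconstructed.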

Define the information to record

Every organization has unique documentation requirements for specific business, regulatory, and cybersecurity needs. However, the following information should always be recorded as a baseline:

Define where the information will be recorded

Many organizations find it challenging to capture and record the information defined above seamlessly. There are often competing opinions on the relative benefits of using existing case management systems versus having entirely separate and parallel information recording systems. Each option will have various levels of integration with other systems being used in the SOC, along with upsides and downsides for the teams consuming the recorded information.

To help navigate these discussions, take into account the following considerations:

Automation integration: The storage location should easily integrate with the chosen SOC automation tooling. This ensures that information can be added without disrupting orchestration flows.
Security: The storage location and transmission of data should be secure, following the principles of least privilege and confidentiality. The stored data is extremely sensitive because it contains operational and investigative details. Many organizations take this one step further and classify the security data's sensitivity, then store more sensitive data in entirely separate logical or physical locations.
Availability: The storage location should be highly available. This ensures that it is always accessible to automation tools and to humans when review is required.

Many organizations succeed by using specialized tools to record automated SOC activities while relying on their existing task management tools for general task-related activities. For example, the Tines workflow orchestration and automation platform includes a fully integrated case management system called Tines Cases. This system tracks and records all information related to Tines workflows while also allowing these workflows to input information into other task management systems, such as Jira, when necessary.

Integrating systems in this manner allows organizations to ensure that their security teams have all the necessary resources to operate effectively while also enabling other teams to utilize task management systems that may better suit their needs.

Last thoughts

This article described the best practices for integrating SOC automation into an organization's SOC operations. It covered items such as defining the limits of automated decision-making, the tooling that should be used, and practical ways to improve alert handling and playbook automation.

Organizations can use these best practices to uplift their SOC capabilities with SOC automation. This improves the consistency and effectiveness of their SOC operations, leading to a reduction in mean-time-to-respond and mean-time-to-resolution metrics.

Organizations that effectively implement SOC automation find they can scale their SOC capability using technology rather than human resources. This enables the organization to become more resilient to cybersecurity threats and to create a more secure environment in which to conduct its business operations.
