Insider Threat Hunting with Datadog and CrowdStrike
Security is Built on Trust
Rarely are all the elements for confident decision making located in a single silo, repository, or team. Equally, security patterns that are mandated for accessing disparate or remote data sources may be a mixture of 'push' or 'pull' depending upon the boundaries, zones, or enclaves to be crossed. So, when signals can be gathered from multiple sources such as Datadog and CrowdStrike across assets, our decision making and defenses become better, faster, and smarter.
At its core, security is about trust, and in particular transitive trust. If A trusts B and B trusts C, then A effectively trusts C. Very quickly we realize that there is a web of trust among systems and endpoints: some relationships form linear or hierarchical patterns, some form stars, and others form vastly more complex network graphs. As complexity and dependencies increase, so too does risk. Nefarious entities look to pivot through and exploit these transitive trusts.
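To make the idea of transitive trust concrete, here is a minimal sketch that walks a set of direct trust relationships and lists everything a given system ends up trusting. The system names and the `direct_trust` mapping are hypothetical and purely illustrative.

```python
from collections import deque

def reachable_trust(direct_trust, start):
    """Return every system that `start` trusts, directly or transitively."""
    seen = set()
    queue = deque(direct_trust.get(start, ()))
    while queue:
        node = queue.popleft()
        if node in seen or node == start:
            continue  # already visited, or a trust cycle back to the start
        seen.add(node)
        queue.extend(direct_trust.get(node, ()))
    return seen

# Hypothetical direct trust relationships: A trusts B, B trusts C, C trusts D.
direct_trust = {"A": {"B"}, "B": {"C"}, "C": {"D"}}
trusted_by_a = reachable_trust(direct_trust, "A")
print(sorted(trusted_by_a))
```

Even though A only trusts B directly, the walk shows that a compromise of C or D also sits inside A's effective trust boundary.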
Whether we focus on increased observability or greater controllability we face a constantly morphing attack surface, one for which the related risk must be managed. This is an especially acute challenge when continuous deployment accelerates the rate of change. The potential for new misconfigurations and vulnerabilities introduces a greater need to defend against novel threat vectors, and faster than before.
In the context of the OODA loop (a term borrowed from military parlance that supports decision making and action, i.e. 'Observe, Orient, Decide, and Act'), the depth, breadth, and complexity of an installed fleet challenge both real-time and retrospective human capabilities. So we look to instrument and aggregate our data back to centralized locations for it to be efficiently parsed, processed, and prioritized. With basic pattern matching, we try to separate signal from noise, yet to detect security signals such as IoAs (Indicators of Attack) or IoCs (Indicators of Compromise) we require multiple forms of enrichment, including up-to-date threat intelligence. This then gives us the confidence we need to identify and classify malicious actors, agents, and intent. Many of us have jumped to conclusions early in an investigation when in fact we were dealing with a benign outlier. So, how do you constantly decrease your MTTR (Mean Time to Respond) and reduce Dwell Time with confidence?
Datadog Security Monitoring
Datadog recently launched Security Monitoring with Detection Rules. If you already have trusted Datadog agents installed on hosts gathering telemetry and logs, then it's one of the logical choices for data mining and subsequent threat hunting for security signals. Datadog provides some default Detection Rules 'out-of-the-box' which look for common TTPs (Tactics, Techniques, and Procedures). Datadog conveniently maps and links each rule to the relevant MITRE ATT&CK framework tactic and technique IDs.
Datadog's Detection Rules can surface a range of suspicious low-level atomic or pattern-based attributes across cloud providers, host operating systems, orchestration platforms, and even containers, middleware, or applications. This then provides a great jumping-off point for further threat hunting, analysis, and automation.
However, the question is, after this first pass over high-risk assets, can you find better signals in the noise? What series of consistent and repeatable steps do you take next and how? Time is of the essence, so what further enrichment, hunting, and decision points (or logic) will determine the resources you expend and workflows or playbooks you use across other platforms to mitigate risk (or neutralize threats)?
Tines, A Multi-Pronged Approach
Security workflows require a high degree of confidence and speed. They require consistent and repeatable actions that are impervious to human error and resistant to alert fatigue. A simple yet powerful workflow automation engine reduces these risks, and not just for security-related outcomes: it also helps foster collaborative cultures. Teams that trust one another's decisions move faster as a whole, and when errors are minimized (and reaction speed bolsters confidence rather than undermining it), moving quickly and confidently helps to form better bonds throughout an organization.
This is where Tines excels with a flexible automation engine that matches your security posture and needs. Teams that need to securely draw from multiple data sources and use a range of patterns can use Tines to carry out their defensive work at scale and across multiple types of secure channels.
Let The Hunt Begin
We will use Datadog Security Monitoring as a means to identify scans and classify risks originating from potential insider threats, i.e. those sourced from other managed assets such as user endpoints, be they inside or outside our network footprint. Endpoints that use EDR (Endpoint Detection and Response) platforms like CrowdStrike (or similarly SentinelOne or Carbon Black) may be used to surreptitiously probe our higher-value assets. By combining the two sources we strengthen the signal. The example below highlights a mixed approach where Datadog and CrowdStrike’s native agents may not intersect on all asset types, but can provide complementary sources for security teams to enrich events and automate detection and response workflows. There are many possible jumping-off points within the logic below, but we will focus on a somewhat linear story for simplicity's sake (albeit we're sure you could quickly augment the story with additional integrations, branches, and further actions).
We start with the Tines storyboard and some of the simple prebuilt actions which will leverage the Datadog API (one of our featured integrations). It's a simple 'drag and drop' to form these actions into a story for each of your team’s workflow needs.
Our primary goal at the start is to extract the 'scanning' source IP addresses (and some other TTP information which we may use later). For demonstration purposes, we show how you could also treat IPv4 and IPv6 addresses differently. We also run the addresses through AbuseIPDB and could take additional steps based upon the returned abuse confidence score (such as sending an immediate alert to PagerDuty). For now though, we've built a list of suspicious scanning source IPs and deduplicated the Datadog events so we don't double-handle them across a search window.
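Outside of Tines, the same extraction logic could be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `source_ip` field name, the sample events, and the alert threshold of 90 are all hypothetical and are not taken from the Datadog event schema or from the story itself. The reputation score is assumed to be a value from 0 to 100.

```python
import ipaddress

def split_scanner_ips(events, ip_field="source_ip"):
    """Collect unique, valid source IPs from events, grouped by IP version."""
    seen = set()
    grouped = {4: [], 6: []}
    for event in events:
        raw = event.get(ip_field)
        if raw is None:
            continue  # event carries no source address
        try:
            ip = ipaddress.ip_address(raw)
        except ValueError:
            continue  # skip values that are not valid IP addresses
        if ip in seen:
            continue  # already handled within this search window
        seen.add(ip)
        grouped[ip.version].append(str(ip))
    return grouped

def needs_immediate_alert(abuse_score, threshold=90):
    """Decide whether a reputation score (0-100) warrants paging someone."""
    return abuse_score >= threshold

# Hypothetical events; "source_ip" is an assumed field name.
events = [
    {"source_ip": "203.0.113.7"},
    {"source_ip": "2001:db8::1"},
    {"source_ip": "203.0.113.7"},   # duplicate, dropped
    {"source_ip": "not-an-ip"},     # invalid, dropped
    {"title": "no address"},        # missing field, dropped
]
grouped = split_scanner_ips(events)
print(grouped)
```

Grouping by IP version keeps the option open to route IPv4 and IPv6 addresses down different branches, as the story does.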
Note: You can see the first raw event received by the first step from the Datadog event stream in the partial JSON below:
We then move on to checking with CrowdStrike whether any of those scanning source IPs (be they from internal or external netblocks) existed on any of our managed endpoints. Users' endpoints may have private RFC 1918 addresses, public addresses, or represent themselves with public addresses (including those from NAT gateways). An attacker with a foothold on a user’s machine may be using it to do further reconnaissance inside the organization’s network (from either inside or outside). CrowdStrike constantly records the directly assigned IP and the perceived 'remote IP', i.e. the public-facing IP (from CrowdStrike's cloud perspective!), on all hosts with its agent installed. This enables us to begin to marry up information about any low, medium, or high-grade detections from the user fleet that may have been used to scan or attack our other assets. This approach enables us to take low-level operational data and security signals from assets that may not have more complex EDR capabilities and then correlate those events with a range of actions taken from user endpoints.
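The correlation step can be pictured as matching each scanning IP against both addresses recorded for every managed endpoint. The sketch below works on in-memory sample records rather than calling any API; the `hostname`, `local_ip`, and `external_ip` keys and all sample values are assumptions made for illustration.

```python
import ipaddress

# The three private IPv4 ranges defined by RFC 1918.
RFC1918_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]

def is_rfc1918(ip):
    """Return True if `ip` is an IPv4 address inside an RFC 1918 range."""
    addr = ipaddress.ip_address(ip)
    return addr.version == 4 and any(addr in net for net in RFC1918_NETWORKS)

def match_hosts(scanner_ips, hosts):
    """Return (ip, hostname, field) for hosts whose local or external IP scanned us."""
    wanted = {ipaddress.ip_address(ip) for ip in scanner_ips}
    matches = []
    for host in hosts:
        for field in ("local_ip", "external_ip"):
            value = host.get(field)
            if value and ipaddress.ip_address(value) in wanted:
                matches.append((value, host["hostname"], field))
    return matches

# Hypothetical endpoint records; the key names are assumed.
hosts = [
    {"hostname": "laptop-01", "local_ip": "10.20.30.40", "external_ip": "198.51.100.23"},
    {"hostname": "laptop-02", "local_ip": "192.168.1.5", "external_ip": "198.51.100.99"},
]
scanner_ips = ["10.20.30.40", "198.51.100.99"]
matches = match_hosts(scanner_ips, hosts)
for ip, hostname, field in matches:
    scope = "private" if is_rfc1918(ip) else "public"
    print(f"{ip} ({scope}) matched {field} on {hostname}")
```

Checking both fields matters because a scan launched from inside the network shows up with the endpoint's directly assigned address, while one launched from a home or café network shows up with the public-facing address.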
Here we deduplicate the IP addresses (per story/workflow run only as we may see the same IPs become active again over time). We want to be able to query CrowdStrike for any new detections rather than arbitrarily ignore all IPs in the future. We also use triggers to create a temporary IP 'type' for use in any future queries or stories.
Towards the end, we choose to pursue only detections with a 'max_severity_display' of 'high' and then send those events on to another automated and modular remediation story, one which contacts and prompts the asset owner to acknowledge a security message, and then asks them to contact support immediately (while also containing the endpoint and automatically opening a support ticket for remediation activities).
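The severity gate is the simplest decision point in the story. A rough Python equivalent, using the 'max_severity_display' field name as written above, might look like the following; the detection records and their IDs are made up for illustration, and the case-insensitive comparison is an assumption about how the value may be capitalized.

```python
def select_high_severity(detections, field="max_severity_display"):
    """Keep only detections whose severity label is 'high' (case-insensitive)."""
    return [d for d in detections if str(d.get(field, "")).lower() == "high"]

# Hypothetical detection records for illustration only.
detections = [
    {"detection_id": "det-001", "hostname": "laptop-01", "max_severity_display": "High"},
    {"detection_id": "det-002", "hostname": "laptop-02", "max_severity_display": "Low"},
    {"detection_id": "det-003", "hostname": "laptop-03", "max_severity_display": "high"},
]
escalated = select_high_severity(detections)
print([d["detection_id"] for d in escalated])
```

Only the escalated detections would then be handed to the remediation story; everything else stays available for later review.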
This is just one simple but powerful (and somewhat linear) story but we hope you can see how fast and easy it is to branch and build your own stories. You can even download the above story here to try out yourself. This type of automation empowers your team, reduces toil, and increases consistency and effectiveness. The automation and integration possibilities are endless and you don't need to be a programmer to automate these workflows. Consistency, clarity, and correctness build trust. Automating toil enables you to scale further, move faster, and spend less time in the weeds! Check out some of the other hundreds of prebuilt actions and integrations here.
*Please note we recently updated our terminology. Our "agents" are now known as "actions," but some visuals might not reflect this.*