Data leaks and information disclosure caused by employees are issues security teams regularly contend with. Committing credentials to GitHub is one of the better-known ways this happens. Recently, posting sensitive data on public Trello boards has also made headlines. In this post, we explore how security teams themselves often unintentionally expose sensitive company information.
Introduction
Cyber security teams analyze URLs and files to determine whether they represent a threat to their organization. This need might arise while investigating a suspicious email sent to an executive staff member, or while reviewing web traffic from an infected endpoint. File and URL investigations can be time-consuming if performed manually, so creative security engineers have developed a number of solutions to automate and streamline this process in a safe way. We call these tools “sandboxes”.
Many of the most popular sandboxes (see five common examples below) are free and publicly available to security teams and researchers. These sandboxes are incredibly useful resources, and all security teams should be aware of them. However, like every tool, when misused they may actually cause data leaks in your company.
Although the exact mechanics of each sandbox vary, broadly they operate something like this:
User (usually a security analyst) submits a suspicious file or URL to a sandbox.
Sandbox analyzes the behavior of the submission (by opening the file or visiting the URL) and provides the user with analysis results, allowing them to determine whether the URL or file represents a threat.
Sandbox stores the results of the analysis and makes them publicly searchable, so other companies may inform and protect themselves.
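To make that flow concrete, here is a minimal sketch of what a programmatic submission might look like. It assumes a hypothetical sandbox with a REST API; the base URL, endpoint path, parameters, and response fields are placeholders for illustration, not the API of any particular service.

```python
import requests

SANDBOX_API = "https://sandbox.example.com/api/v1"  # hypothetical sandbox API
API_KEY = "YOUR_API_KEY"                            # placeholder credential

def submit_url(url: str) -> dict:
    """Submit a suspicious URL for analysis and return the sandbox's response."""
    response = requests.post(
        f"{SANDBOX_API}/scan",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"url": url},
        timeout=30,
    )
    response.raise_for_status()
    # Many public sandboxes default to public visibility, so the submitted URL
    # and the resulting report become searchable by anyone.
    return response.json()

result = submit_url("https://suspicious.example.net/invoice.html")
print(result.get("report_url"))
```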
How security teams can cause data leaks
The problem of leaked data arises when a user submits to a sandbox a legitimate URL or file that leads to, or contains, sensitive information. By design, sandboxes record this sensitive information and make it public. Additionally, because many public sandboxes provide APIs allowing programmatic submissions, the likelihood of sensitive information being “sandboxed” inadvertently by security teams increases. For example, at Tines we regularly see security teams sandboxing every URL in every email sent to an employee from an external source. This is fantastic from a threat detection perspective, but unless filtering and redaction occur before sandbox submission, it’s almost certain that sensitive content will also be sandboxed.
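As an illustration, a simple pre-submission filter might look something like the sketch below. The domain list and the helper name are assumptions made for the example; real workflows would use richer heuristics, allow-lists maintained by the organization, and redaction of query strings or tokens before anything is submitted.

```python
from urllib.parse import urlparse

# Domains that frequently host legitimate, potentially sensitive company content.
# Illustrative only; each organization would maintain its own list.
SENSITIVE_DOMAINS = {
    "docs.google.com",
    "drive.google.com",
    "dropbox.com",
    "sharepoint.com",
}

def should_sandbox(url: str) -> bool:
    """Return False for URLs that likely point at legitimate, sensitive content."""
    host = (urlparse(url).hostname or "").lower()
    return not any(host == d or host.endswith("." + d) for d in SENSITIVE_DOMAINS)

urls_from_email = [
    "https://docs.google.com/document/d/abc123/edit",  # likely sensitive, skip
    "http://suspicious.example.net/invoice.html",       # unknown, sandbox it
]

to_submit = [u for u in urls_from_email if should_sandbox(u)]
# 'to_submit' would then be passed to the sandbox submission step
# (e.g. the hypothetical submit_url helper from the earlier sketch).
print(to_submit)
```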
To understand how widespread this subtle form of data leakage was, I spent a little time searching sandboxes for sensitive content. It’s important to point out that the services hosting the exposed content (Dropbox, Google Docs, etc.) are not at fault here. What happens to the URLs/emails after they are correctly sent to their intended recipients is largely out of their control. (The argument that some of this content should be behind additional authN/Z is outside the scope of this post.)
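To give a sense of how such a search works: most public sandboxes expose a search interface (web and/or API) over their stored analyses. The sketch below continues with the same hypothetical API as the earlier examples; the endpoint, query syntax, and response fields are all assumptions, and the query terms are purely illustrative.

```python
import requests

SANDBOX_API = "https://sandbox.example.com/api/v1"  # hypothetical sandbox API

def search_reports(query: str, limit: int = 50) -> list:
    """Search previously submitted analyses for URLs matching a query."""
    response = requests.get(
        f"{SANDBOX_API}/search",
        params={"q": query, "limit": limit},
        timeout=30,
    )
    response.raise_for_status()
    return response.json().get("results", [])

# Illustrative queries: URLs on file-sharing services that were submitted by
# someone's automation and are now publicly searchable.
for query in ("docs.google.com", "dropbox.com"):
    for report in search_reports(query):
        print(report.get("url"), report.get("report_url"))
```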