People have long looked at automation as a great way to overcome the limitations of human bandwidth, and get more tasks done in a shorter amount of time. Does the same principle apply to automating OSINT data collection? Let’s look into it. But first, let’s unpack some of the terms.
If you are not familiar with OSINT, it’s a catch-all term for investigations that use intelligence gathered from publicly available sources. Those include the regular surface web as well as the deep, or hidden, web, where data is not indexed by search engines but is still accessible, although it may require registration or sit behind a paywall. Naturally, OSINT also involves a variety of non-digital elements, but today we are going to focus on online sources.
Data collection is the first stage of an investigative process that also includes data storage, processing, analysis and distribution.
And of course, automation means that we are going to be doing it automatically, with minimal manual input.
Well… and here comes the first catch: collecting data for investigations can quickly result in information overload. The firehose approach of gathering anything and everything that might be useful doesn’t work here. The last thing investigators need is to waste precious time sorting and filtering copious amounts of data while they are trying to stop a cyber threat from spreading or quickly get to the root of a problem. So, for data collection, automation actually requires a fair amount of human involvement, at least at the start of the process.
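To make the contrast with the firehose approach concrete, here is a minimal sketch of filtering collected items against a scope that an analyst defines up front. The names `CollectionScope`, `is_relevant` and `filter_items` are hypothetical, and simple keyword matching is an assumption made for illustration; real relevance criteria are usually richer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CollectionScope:
    """Analyst-defined parameters (hypothetical names, for illustration only)."""
    keywords: frozenset
    max_items: int = 100


def is_relevant(text, scope):
    """Keep an item only if it mentions at least one scoped keyword."""
    lowered = text.lower()
    return any(keyword.lower() in lowered for keyword in scope.keywords)


def filter_items(items, scope):
    """Return relevant items in their original order, capped at scope.max_items."""
    kept = []
    for item in items:
        if is_relevant(item, scope):
            kept.append(item)
            if len(kept) >= scope.max_items:
                break
    return kept
```

The point of the sketch is the division of labor: a person decides what belongs in the scope, and the code applies that decision consistently to everything that comes in.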
Analysts are always under pressure. Especially when they are investigating a fast-moving incident or impending threat, they can’t afford to waste any time – researchers need to process as many data sources as possible in the shortest amount of time. And this is where automation is most valuable.
Automation can’t help you fine-tune your collection strategy, or decide how and where to collect data that’s accessible, usable and easy to understand. What it can do is help you target more sources in less time, removing the human bandwidth limitation, increasing output and productivity, and saving valuable time to remediate issues faster. So, yes, you still need humans to help decide what data to collect, but once the parameters are set, automation can be far more productive, getting to more sources faster (possibly before they are taken down) than people ever could.
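As a rough illustration of removing the human bandwidth limitation, the sketch below fans a list of sources out to a pool of worker threads once the list has been set by a person. The function name `collect_all` and the `fetch` callable are hypothetical: `fetch` stands in for whatever retrieval step your tooling provides, and it is passed in rather than implemented here.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def collect_all(sources, fetch, max_workers=8):
    """Fetch every source concurrently.

    Returns (results, errors): two dicts keyed by source. `fetch` is a
    caller-supplied function (an assumption of this sketch) that takes one
    source and returns its content or raises on failure.
    """
    results, errors = {}, {}
    if not sources:
        return results, errors
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, source): source for source in sources}
        for future in as_completed(futures):
            source = futures[future]
            try:
                results[source] = future.result()
            except Exception as exc:
                # One unreachable source should not stop the rest of the run.
                errors[source] = str(exc)
    return results, errors
```

Recording failures per source, instead of aborting, matters when sources can disappear mid-run: the analyst sees what was collected and what was already gone.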
OSINT is a fast-growing, multi-faceted discipline, and an increasing number of organizations, even beyond financial corporations and federal and law enforcement agencies, are investing in tools that can help make their analysts’ jobs easier and accelerate issue resolution times.
Important considerations for OSINT automation tools include:
- How they manage footprint and attribution;
- Whether they can rotate IP addresses and imitate various locations and time zones;
- How effectively they can protect networks from accidental exposure to malware;
- The ease of storing and sharing sensitive data;
- Whether they can comply with industry and company audit requirements.
The more sophisticated your adversary, the more time and effort it takes to set up a successful OSINT strategy. With data constantly changing, the number of sites analysts need to investigate grows every single day. Automation, especially with the right tools and techniques, can help ensure that teams gather the most relevant data as quickly and efficiently as possible, while keeping both investigations and investigators secure.
If you are struggling with OSINT data collection or want to learn more about best practices and techniques for automation, join me for a webinar on March 24, Out in the Open: Automation of OSINT for CTI. In this webinar, I will discuss:
- Existing methods of OSINT collection and how analysts can leverage them for secure data gathering, analysis and storage
- Best practices and techniques that can help you meet mission requirements while keeping your enterprise secure
- How to use automation to gather the most relevant data, with minimal manual intervention
March 24, 2021 | 10 a.m. PT