OSINT Tools and Automation

From Manual Search to Automation

The techniques covered so far (WHOIS, DNS, dorking, username correlation) can all be performed by hand, but they do not scale. Investigating a single target can involve dozens of queries to different sources, and repeating that process manually for multiple targets is slow and error-prone. This is where automation tools and frameworks come in: they orchestrate many sources into reproducible workflows.

These tools do not replace the analyst's judgment; they amplify it. They automate collection and correlation, but interpretation, verification, and decisions remain human. A common beginner mistake is to blindly trust a tool's output without validating it. Automation is a powerful starting point, not a conclusion.

A caveat about scope is also in order: many of these tools can generate active traffic toward the target (DNS queries, subdomain resolution). In passive mode they work only with third-party sources. In active mode they touch the target's infrastructure, which is acceptable only within an authorized engagement.
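One way to make that distinction hard to overlook in a custom script is to tag every collection step as passive or active and filter on an explicit authorization flag. The sketch below is only an illustration: the Module class, the module names, and the select_modules helper are hypothetical and do not correspond to any tool's real API, and the placeholder functions make no network calls.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Module:
    """A hypothetical collection step; `active` means it contacts the target."""
    name: str
    active: bool
    run: Callable[[str], List[str]]


def select_modules(modules: List[Module], authorized: bool) -> List[Module]:
    """Always keep passive modules; keep active ones only when authorized."""
    return [m for m in modules if authorized or not m.active]


# Placeholder modules: the lambdas return no data and make no network calls.
MODULES = [
    Module("certificate_logs", active=False, run=lambda target: []),
    Module("passive_dns", active=False, run=lambda target: []),
    Module("subdomain_bruteforce", active=True, run=lambda target: []),
]

if __name__ == "__main__":
    # Without authorization, only the passive modules are selected.
    for module in select_modules(MODULES, authorized=False):
        print(module.name)
```

Defaulting to passive and requiring an explicit flag for anything active keeps an accidental run from touching the target.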

Maltego: Link Analysis

Maltego is probably the best-known OSINT visualization tool. Its working model revolves around "entities" (domains, people, emails, IPs, social networks) and "transforms," which are operations that take an entity and return related entities. For example, a transform on a domain can return its subdomains; another on an email can return associated profiles. The result is represented as an interactive graph.

Maltego's strength is precisely that visual representation of relationships. In a complex investigation, seeing the graph of connections between entities reveals patterns that would go unnoticed in a flat list: an email address linking three domains, a person connecting two organizations. This ability to "pivot" visually from one data point to another is what makes it so valuable for link analysis.
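The idea of entities linked in a graph, and of pivot points that connect several of them, can be sketched in a few lines of Python. This is not Maltego's API or data model; the LinkGraph class, the (type, value) entity tuples, and the sample data on reserved example domains are assumptions made purely for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Entity = Tuple[str, str]  # (entity type, value), e.g. ("domain", "example.com")


class LinkGraph:
    """A minimal undirected graph of entities (illustrative only)."""

    def __init__(self) -> None:
        self.neighbors: Dict[Entity, Set[Entity]] = defaultdict(set)

    def link(self, a: Entity, b: Entity) -> None:
        """Record a relationship between two entities in both directions."""
        self.neighbors[a].add(b)
        self.neighbors[b].add(a)

    def pivots(self, via_type: str, target_type: str,
               minimum: int = 2) -> Dict[Entity, List[Entity]]:
        """Entities of `via_type` linked to at least `minimum` `target_type` entities."""
        found: Dict[Entity, List[Entity]] = {}
        for entity, linked in self.neighbors.items():
            if entity[0] != via_type:
                continue
            targets = sorted(n for n in linked if n[0] == target_type)
            if len(targets) >= minimum:
                found[entity] = targets
        return found


def build_sample_graph() -> LinkGraph:
    """Made-up relationships on reserved example domains."""
    graph = LinkGraph()
    admin = ("email", "admin@example.com")
    for domain in ("example.com", "example.org", "example.net"):
        graph.link(admin, ("domain", domain))
    graph.link(("email", "sales@example.com"), ("domain", "example.com"))
    return graph


if __name__ == "__main__":
    for email, domains in build_sample_graph().pivots("email", "domain").items():
        print(email[1], "links", [d[1] for d in domains])
```

In a flat list of four email-to-domain rows the shared address is easy to miss. Grouping by entity makes it stand out, which is the effect the visual graph achieves at a glance.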

Maltego offers a free community edition with limitations and commercial editions with access to more data sources. Its ecosystem of transforms, many contributed by third parties, allows integrating very diverse sources. It is especially useful in the analysis and dissemination phases of the intelligence cycle, where the visual clarity of findings is decisive.

theHarvester, SpiderFoot, and recon-ng

theHarvester is a classic and lightweight tool focused on gathering emails, subdomains, hosts, employee names, and open ports from public sources. It is ideal for a quick initial reconnaissance phase: with a single command it queries multiple search engines and sources to build a first map of an organization's footprint. Its simplicity and speed make it a common starting point.

SpiderFoot takes automation much further. It is a framework with more than 200 modules that integrate with a wide range of sources (passive DNS, Shodan, Have I Been Pwned, certificate logs, and many more). It runs as a web application: the analyst launches a scan on a target, and SpiderFoot automatically correlates the findings into a consolidated report. It can be configured to run in passive mode so as not to touch the target. It is one of the most complete tools for automating an investigation end to end.

recon-ng follows a different philosophy: it offers a modular console interface inspired by Metasploit, with modules for each type of reconnaissance task and a built-in database that stores results. Its design makes it easy to chain modules and build reproducible workflows, which suits those who prefer granular control and the command line. Alongside these, specialized tools like Amass (subdomains), Sherlock (usernames), or Photon (web crawling) round out the investigator's arsenal.
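The pattern of modules that read their input from a shared database and write their results back to it can be illustrated with Python's built-in sqlite3 module. This is a conceptual sketch, not recon-ng's code or schema: the two-table layout, the run_module helper, and the canned_subdomains function (which returns fixed data and performs no lookups) are all hypothetical.

```python
import sqlite3
from typing import Callable, List


def open_workspace() -> sqlite3.Connection:
    """Create an in-memory store; the two-table schema is a simplification."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE domains (domain TEXT PRIMARY KEY)")
    db.execute("CREATE TABLE hosts (host TEXT PRIMARY KEY, found_by TEXT)")
    return db


def count_hosts(db: sqlite3.Connection) -> int:
    """Number of hosts currently stored."""
    return db.execute("SELECT COUNT(*) FROM hosts").fetchone()[0]


def run_module(db: sqlite3.Connection, name: str,
               lookup: Callable[[str], List[str]]) -> int:
    """Feed every stored domain to `lookup`, store new hosts, return how many were added."""
    domains = [row[0] for row in db.execute("SELECT domain FROM domains")]
    before = count_hosts(db)
    for domain in domains:
        for host in lookup(domain):
            # The primary key plus OR IGNORE keeps repeated runs from duplicating rows.
            db.execute(
                "INSERT OR IGNORE INTO hosts (host, found_by) VALUES (?, ?)",
                (host, name),
            )
    db.commit()
    return count_hosts(db) - before


def canned_subdomains(domain: str) -> List[str]:
    """Stand-in for a real source: returns fixed names, performs no lookups."""
    return [f"www.{domain}", f"mail.{domain}"]


if __name__ == "__main__":
    db = open_workspace()
    db.execute("INSERT INTO domains (domain) VALUES (?)", ("example.com",))
    print(run_module(db, "canned_subdomains", canned_subdomains))  # first run adds rows
    print(run_module(db, "canned_subdomains", canned_subdomains))  # rerun adds nothing new
```

Because each module only reads from and writes to the store, a later module can consume the hosts table without knowing which module filled it, and rerunning a step does not duplicate results.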

Building a Workflow

No tool does everything, so the real art lies in combining them into a coherent flow. A common pattern begins with broad automated reconnaissance (theHarvester or SpiderFoot in passive mode) to get a first overview, continues with specialized tools to dig into specific areas (Amass for subdomains, crt.sh for certificates, Sherlock for usernames), and culminates with Maltego to visualize and analyze the relationships among all the findings.
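When several tools feed the same investigation, their outputs overlap and differ in formatting, so a small merge step helps before the analysis phase. The sketch below assumes each tool's hostnames have already been parsed into plain lists. The sample values are invented, not real tool output, and normalize_host and merge_findings are hypothetical helper names.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def normalize_host(raw: str) -> str:
    """Lowercase, trim whitespace and a trailing dot, drop a leading wildcard label."""
    host = raw.strip().lower().rstrip(".")
    if host.startswith("*."):
        host = host[2:]
    return host


def merge_findings(results: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Map each normalized hostname to the set of sources that reported it."""
    merged: Dict[str, Set[str]] = defaultdict(set)
    for source, hosts in results.items():
        for raw in hosts:
            host = normalize_host(raw)
            if host:
                merged[host].add(source)
    return dict(merged)


# Invented sample data standing in for already-parsed tool output.
SAMPLE_RESULTS = {
    "theHarvester": ["www.example.com", "Mail.Example.com"],
    "crt.sh": ["*.example.com", "www.example.com."],
    "Amass": ["mail.example.com", "dev.example.com"],
}

if __name__ == "__main__":
    for host, sources in sorted(merge_findings(SAMPLE_RESULTS).items()):
        print(host, sorted(sources))
```

Keeping the set of sources per host is a deliberate choice: a finding confirmed by two independent sources deserves more confidence than one reported by a single tool, and the attribution is preserved for later documentation.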

API key management is an important practical detail. Many sources (Shodan, Censys, Hunter.io, VirusTotal) offer programmatic access via API, often with limited free quotas. Configuring these keys in the tools multiplies their reach, but the keys must be protected: they must never be uploaded to public repositories or shared, an OPSEC concern covered in the next lesson.
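For custom scripts, a common way to keep keys out of source code (and therefore out of repositories) is to read them from environment variables at run time. The following is a minimal sketch under that assumption. The variable names are hypothetical, and each tool mentioned above has its own configuration mechanism that this does not replace.

```python
import os
from typing import Dict, Iterable, Mapping, Optional

# Hypothetical variable names chosen for this example.
KEY_NAMES = ("SHODAN_API_KEY", "CENSYS_API_KEY", "HUNTER_API_KEY", "VT_API_KEY")


def load_api_keys(names: Iterable[str],
                  env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return the keys that are set and non-empty; unset ones are simply omitted."""
    env = os.environ if env is None else env
    keys: Dict[str, str] = {}
    for name in names:
        value = env.get(name, "").strip()
        if value:
            keys[name] = value
    return keys


def mask(secret: str) -> str:
    """Hide all but the last four characters so a key can be mentioned in logs."""
    if len(secret) <= 4:
        return "*" * len(secret)
    return "*" * (len(secret) - 4) + secret[-4:]


if __name__ == "__main__":
    keys = load_api_keys(KEY_NAMES)
    for name in KEY_NAMES:
        status = mask(keys[name]) if name in keys else "not configured"
        print(f"{name}: {status}")
```

Masking the value before printing means that run logs and screenshots shared with colleagues do not leak the key either.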

Documentation matters even more than the choice of tools. Every automated workflow must record what was queried, when, and with what result, to guarantee the traceability and reproducibility of the investigation. A good practice is to standardize the output format and centralize findings in a single knowledge base, so that subsequent analysis relies on clean, dated data attributed to its source.
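One simple standardized format is a JSON Lines log in which every query produces one record with a UTC timestamp, the source, the query, and the result. The sketch below shows the idea. The field names and the make_record and append_record helpers are assumptions for illustration, not a prescribed schema.

```python
import json
import sys
from datetime import datetime, timezone
from typing import Any, Dict, Optional, TextIO


def make_record(source: str, query: str, result: Any,
                when: Optional[datetime] = None) -> Dict[str, Any]:
    """Build one finding: what was queried, where, when (UTC), and what came back."""
    when = when or datetime.now(timezone.utc)
    return {
        "timestamp": when.isoformat(),
        "source": source,
        "query": query,
        "result": result,
    }


def append_record(stream: TextIO, record: Dict[str, Any]) -> None:
    """Write the record as a single JSON line so logs can be appended and merged."""
    stream.write(json.dumps(record, sort_keys=True) + "\n")


if __name__ == "__main__":
    # Printed to standard output here; in practice the stream would be a file
    # opened in append mode.
    record = make_record("crt.sh", "%.example.com", ["www.example.com"])
    append_record(sys.stdout, record)
```

Because each line is an independent JSON object, logs from different tools and sessions can be concatenated into one knowledge base and filtered later by source or date.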