Advanced Search and Google Dorking
The Power of Advanced Search
Search engines are probably the most underrated OSINT tool. Most people just type a few loose keywords, but search engines offer a language of operators that turns a generic search into a surgical query. Mastering these operators lets you find exactly what you are looking for and discard the noise, which multiplies an investigator's effectiveness.
The set of techniques that combines these operators to discover specific information is known as Google Dorking (or Google Hacking, a term coined by Johnny Long). Despite the name, the technique applies to almost any engine: Bing, DuckDuckGo, and Yandex each have their own operators, although operator names and support differ from one engine to another. The core idea is the same: instruct the engine to return only pages that meet very precise conditions.
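As a minimal sketch of sending one query to several engines, the snippet below builds search URLs without making any network request. The URL patterns are the ones commonly seen in each engine's address bar at the time of writing and should be treated as assumptions; the function name search_urls is hypothetical, and an operator accepted by one engine may be ignored by another.

```python
from urllib.parse import urlencode

# Assumed URL patterns: (base URL, name of the query parameter).
ENGINES = {
    "google": ("https://www.google.com/search", "q"),
    "bing": ("https://www.bing.com/search", "q"),
    "duckduckgo": ("https://duckduckgo.com/", "q"),
    "yandex": ("https://yandex.com/search/", "text"),
}


def search_urls(query: str) -> dict:
    """Return one URL per engine with the query percent-encoded."""
    return {
        name: base + "?" + urlencode({param: query})
        for name, (base, param) in ENGINES.items()
    }


if __name__ == "__main__":
    for engine, url in search_urls("site:example.com filetype:pdf").items():
        print(f"{engine}: {url}")
```

Opening the generated URLs by hand in a browser keeps the analyst in control and avoids automated querying, which search engines usually restrict in their terms of service.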
It is essential to understand that Google Dorking does not access anything that is not already indexed and publicly accessible. It finds what is exposed, not what is protected. That is why it is such a valuable tool for defensive audits: it reveals which of an organization's sensitive information has unintentionally ended up within reach of any search engine.
Fundamental Operators
A few operators form the basis of all dorking. site: limits results to a specific domain: site:example.com shows only pages from that site, ideal for mapping everything indexed about a target. filetype: (or ext:) searches for specific file types: filetype:pdf or filetype:xlsx help find documents that often contain internal information and metadata.
The operators intitle: and inurl: search for terms in the page title or in the URL respectively, very useful for locating admin panels or specific pages (intitle:"index of" reveals open directory listings). Quotation marks force an exact phrase match, the hyphen - excludes terms, and the OR operator (in uppercase) broadens the search to alternatives.
The real power appears when you combine them. A query like site:example.com filetype:pdf "confidential" searches for PDF documents marked confidential on a specific domain. In an audit, this kind of query can reveal real leaks: internal documents, exposed backups, or credentials in configuration files indexed by mistake.
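The way these operators combine can be sketched as a small query builder. This is an illustrative helper, not a standard tool: build_dork and its parameter names are hypothetical, and the parenthesised OR group assumes the engine supports grouping, as Google does.

```python
from typing import Iterable, List, Optional


def quote(term: str) -> str:
    """Wrap a term in double quotes to request an exact phrase match."""
    return '"' + term.replace('"', "") + '"'


def build_dork(
    site: Optional[str] = None,
    filetype: Optional[str] = None,
    intitle: Optional[str] = None,
    inurl: Optional[str] = None,
    phrases: Iterable[str] = (),
    any_of: Iterable[str] = (),
    exclude: Iterable[str] = (),
) -> str:
    """Combine the basic operators into a single query string."""
    parts: List[str] = []
    if site:
        parts.append(f"site:{site}")
    if filetype:
        parts.append(f"filetype:{filetype}")
    if intitle:
        parts.append(f"intitle:{quote(intitle)}")
    if inurl:
        parts.append(f"inurl:{inurl}")
    # Exact phrases that must all appear.
    parts.extend(quote(p) for p in phrases)
    # Alternatives joined with an uppercase OR.
    alternatives = [quote(a) for a in any_of]
    if len(alternatives) > 1:
        parts.append("(" + " OR ".join(alternatives) + ")")
    elif alternatives:
        parts.append(alternatives[0])
    # Excluded terms are prefixed with a hyphen.
    for term in exclude:
        parts.append("-" + (quote(term) if " " in term else term))
    return " ".join(parts)


if __name__ == "__main__":
    print(build_dork(site="example.com", filetype="pdf", phrases=["confidential"]))
    print(build_dork(site="example.com", intitle="index of", exclude=["demo"]))
```

The first call reproduces the query from the text; the second looks for open directory listings on the same domain while excluding pages that mention "demo".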
Responsible Dorking and the Google Hacking Database
There is a public repository, the Google Hacking Database (GHDB), hosted on Exploit-DB, that catalogs thousands of known dorks: queries that reveal exposed configuration files, accessible cameras, revealing error messages, or leaked credentials. It is an excellent learning resource for understanding what kinds of exposure exist and how to formulate effective queries.
Responsible use of these dorks is key. Searching for exposed information about your own organization, or about a target within the scope of an authorized pentest, is legitimate and very useful OSINT. However, using the information found (for example, leaked credentials or exposed panels) to access third-party systems without permission is illegal in most jurisdictions, regardless of how easy it was to find.
Dorking is, in this sense, a double-edged tool that the ethical investigator uses primarily to defend. Finding that a sensitive document is indexed lets you request its removal; discovering an exposed panel lets you close it before an attacker finds it. The value lies in prevention, not exploitation.
Specialized Engines and Sources
Beyond Google, there is an ecosystem of specialized search engines covering specific niches. We already mentioned Shodan, Censys, and crt.sh for infrastructure. For people and professional data, platforms like LinkedIn or email finders like Hunter.io provide different angles. For source code, GitHub and GitLab have their own search engines, where it is common to find secrets accidentally leaked in commits (API keys, passwords in configuration files).
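To illustrate what "secrets leaked in commits" look like in practice, here is a minimal sketch that scans text from your own repository for a few well-known patterns. The rule names and the find_secrets function are hypothetical, and the three regular expressions are illustrative only; dedicated secret scanners use far larger and more precise rule sets.

```python
import re
from typing import Dict, List, Tuple

# Illustrative patterns only; expect both false positives and misses.
SECRET_PATTERNS: Dict[str, re.Pattern] = {
    # AWS access key IDs commonly start with AKIA followed by 16 characters.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # Header line of a PEM-encoded private key.
    "private_key_header": re.compile(
        r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"
    ),
    # Assignments such as password = "value" in configuration files.
    "password_assignment": re.compile(
        r"\b(?:password|passwd|pwd)\s*[:=]\s*['\"]?[^\s'\"]{4,}",
        re.IGNORECASE,
    ),
}


def find_secrets(text: str) -> List[Tuple[int, str]]:
    """Return (line number, rule name) for every line matching a pattern."""
    hits: List[Tuple[int, str]] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits


if __name__ == "__main__":
    sample = (
        "DB_HOST=localhost\n"
        'password = "hunter2secret"\n'
        "key = AKIAIOSFODNN7EXAMPLE\n"
    )
    for lineno, rule in find_secrets(sample):
        print(f"line {lineno}: {rule}")
```

Reporting only the line number and rule name, rather than the matched value, avoids copying the secret into logs or reports.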
Web archives like the Wayback Machine (archive.org) are indispensable: they let you view old versions of a web page and recover content that the target has since deleted but that is still relevant to the investigation. A contact email, a directory structure, or a document that no longer exists on the current site is often still accessible in the historical archive.
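As a hedged sketch of how archived versions can be located, the helpers below only build URLs for the Wayback Machine's public endpoints (the availability API, the CDX capture index, and the snapshot viewer); they make no network request. The function names are hypothetical, and the endpoints and parameters reflect the Internet Archive's documentation at the time of writing, so verify them before relying on the output.

```python
from typing import Optional
from urllib.parse import urlencode


def availability_url(url: str, timestamp: Optional[str] = None) -> str:
    """URL asking for the archived snapshot closest to an optional timestamp.

    The timestamp uses the YYYYMMDD[hhmmss] format.
    """
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return "https://archive.org/wayback/available?" + urlencode(params)


def cdx_url(url: str, limit: int = 10) -> str:
    """URL listing up to `limit` captures of a page as JSON."""
    params = {"url": url, "output": "json", "limit": limit}
    return "https://web.archive.org/cdx/search/cdx?" + urlencode(params)


def snapshot_url(url: str, timestamp: str) -> str:
    """URL of the archived page as captured at (or near) the timestamp."""
    return f"https://web.archive.org/web/{timestamp}/{url}"


if __name__ == "__main__":
    print(availability_url("example.com", "20200101"))
    print(cdx_url("example.com", limit=5))
    print(snapshot_url("http://example.com/", "20200101000000"))
```

A typical workflow is to list captures first, then open the snapshots around the date when the content of interest was still online.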
Finally, metasearch engines and aggregators combine multiple sources into a single query. The key to success is not knowing a single magic tool, but knowing which engine is right for each type of data and combining several to corroborate findings. Advanced search is the backbone of OSINT: almost every investigation starts at, and returns again and again to, a well-used search box.