Open Source Intelligence Gathering
Open Source Intelligence (OSINT) is data collected from publicly available sources.
12 additional techniques for doing OSINT
1. SSL/TLS certificates contain a wealth of information that is significant during security assessments. An SSL/TLS certificate usually contains domain names, subdomain names and email addresses, which makes certificates a treasure trove of information for attackers.
Certificate Transparency (CT) is a project under which a Certificate Authority (CA) must publish every SSL/TLS certificate it issues to a public log. Almost every major CA logs every SSL/TLS certificate they issue in a CT log. These logs are publicly available, and anyone can look through them. We wrote a script to extract subdomains from SSL/TLS certificates found in CT logs for a given domain.
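As a minimal sketch of the technique (not necessarily the script mentioned above), we can query crt.sh, a public CT log search engine that exposes a JSON interface:

```python
import json
import urllib.request

def ct_subdomains(domain):
    """Query crt.sh for certificates matching *.domain and collect
    the (sub)domain names they contain."""
    url = f"https://crt.sh/?q=%25.{domain}&output=json"  # %25 is an encoded '%' wildcard
    with urllib.request.urlopen(url) as resp:
        entries = json.load(resp)
    names = set()
    for entry in entries:
        # name_value may hold several names separated by newlines
        for name in entry["name_value"].split("\n"):
            name = name.lower().lstrip("*.")
            if name.endswith("." + domain):
                names.add(name)
    return sorted(names)

if __name__ == "__main__":
    for sub in ct_subdomains("icann.org"):
        print(sub)
```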
2. The WHOIS service is generally used during a penetration test to query information related to the registered users of an Internet resource, such as a domain name or an IP address block. WHOIS enumeration is especially effective against target organisations that have a large presence on the Internet.
Let’s look at some advanced WHOIS queries to gather information:
We can query the ARIN WHOIS server to return all the entries that contain an email address at a given domain name, in this case icann.org, and extract only the email addresses from the results.
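A sketch of that query in Python, talking to ARIN's WHOIS server directly over port 43 (the "e @ domain" syntax searches ARIN point-of-contact records by email address):

```python
import re
import socket

def arin_emails(domain):
    """Ask ARIN's WHOIS server for point-of-contact entries with an
    email address at the given domain, then extract the addresses."""
    with socket.create_connection(("whois.arin.net", 43)) as sock:
        sock.sendall(f"e @ {domain}\r\n".encode())
        response = b""
        # Read until the server closes the connection
        while chunk := sock.recv(4096):
            response += chunk
    text = response.decode(errors="replace")
    return sorted(set(re.findall(r"[\w.+-]+@[\w.-]+\.\w+", text)))

if __name__ == "__main__":
    for email in arin_emails("icann.org"):
        print(email)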
3. Finding Autonomous System (AS) numbers will help us identify netblocks belonging to an organisation, which in turn may lead to discovering services running on the hosts in those netblocks.
First, resolve the IP address of a given domain using dig or host; the AS number for that IP address can then be looked up, for example via Team Cymru's IP-to-ASN WHOIS service.
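A minimal sketch combining both steps, assuming Team Cymru's whois.cymru.com service (one of several ways to map an IP address to an ASN):

```python
import socket

def asn_for_domain(domain):
    """Resolve a domain to an IP address, then look up its AS number
    via Team Cymru's IP-to-ASN WHOIS service on port 43."""
    ip = socket.gethostbyname(domain)
    with socket.create_connection(("whois.cymru.com", 43)) as sock:
        # "-v" asks for verbose columns: AS | IP | BGP prefix | CC | registry | AS name
        sock.sendall(f"-v {ip}\r\n".encode())
        response = b""
        while chunk := sock.recv(4096):
            response += chunk
    return response.decode(errors="replace")

if __name__ == "__main__":
    print(asn_for_domain("icann.org"))
```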
4. Usage of cloud storage has become common, especially object storage services like Amazon S3, DigitalOcean Spaces and Azure Blob Storage. In the last couple of years, there have been high-profile data breaches that occurred due to misconfigured S3 buckets.
In our experience, we have seen people storing all sorts of data on poorly secured third-party services, from their credentials in plain text files to pictures of their pets.
There are tools like Slurp, AWSBucketDump and Spaces Finder to hunt for service-specific, publicly accessible object storage instances. Tools like Slurp and Bucket Stream combine Certificate Transparency log data with permutation-based discovery to identify publicly accessible S3 buckets.
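The permutation-based approach is easy to sketch: generate candidate bucket names around a target keyword and probe them over HTTP. The patterns and keyword below are illustrative; real tools use much larger wordlists:

```python
import urllib.error
import urllib.request

# A few common naming patterns; real tools use much larger wordlists.
PATTERNS = ["{name}", "{name}-backup", "{name}-dev", "{name}-staging", "backup-{name}"]

def check_bucket(bucket):
    """Probe an S3 bucket name: HTTP 200 means it exists and is listable,
    403 means it exists but denies anonymous access, 404 means no bucket."""
    url = f"https://{bucket}.s3.amazonaws.com/"
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code
    except urllib.error.URLError:
        return None  # DNS or connection failure

def hunt(name):
    for pattern in PATTERNS:
        bucket = pattern.format(name=name)
        status = check_bucket(bucket)
        if status in (200, 403):
            print(f"{bucket}: HTTP {status}")

if __name__ == "__main__":
    hunt("example")  # hypothetical target keyword
```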
5. The Wayback Machine is a massive digital archive of the World Wide Web and other information on the Internet, including historical snapshots of websites. The Wayback CDX Server API makes it easy to search through the archives, and waybackurls is a neat tool to search for data related to a site of interest.
Digging through the Wayback Machine archive is quite useful for identifying subdomains of a given domain, as well as sensitive directories, sensitive files and parameters in an application.
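A minimal sketch against the Wayback CDX Server API, listing every archived URL under a domain and its subdomains:

```python
import json
import urllib.parse
import urllib.request

def wayback_urls(domain):
    """Query the Wayback CDX Server API for every archived URL under
    a domain (and its subdomains), collapsing duplicate URLs."""
    params = urllib.parse.urlencode({
        "url": f"*.{domain}/*",
        "output": "json",
        "fl": "original",      # only return the original URL field
        "collapse": "urlkey",  # de-duplicate identical URLs
    })
    url = f"https://web.archive.org/cdx/search/cdx?{params}"
    with urllib.request.urlopen(url) as resp:
        rows = json.load(resp)
    # The first row is the header ("original"); the rest are URLs.
    return [row[0] for row in rows[1:]]

if __name__ == "__main__":
    for archived in wayback_urls("icann.org"):
        print(archived)
```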
6. Common Crawl is a project that builds and maintains a repository of web crawl data that anyone can access and analyse. Common Crawl contains historical snapshots of websites along with metadata about each website and the services providing it. We can use the Common Crawl API to search their indexed crawl data for sites of interest; cc.py is a neat little tool for doing exactly that.
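A sketch against the Common Crawl index API; note that index names change with every crawl (the current list lives at https://index.commoncrawl.org/collinfo.json), so the one below is just an example:

```python
import json
import urllib.parse
import urllib.request

# Index names change with every crawl; this one is an example.
INDEX = "CC-MAIN-2018-13-index"

def common_crawl(domain):
    """Search a Common Crawl index for every captured URL under a domain."""
    params = urllib.parse.urlencode({"url": f"*.{domain}", "output": "json"})
    url = f"https://index.commoncrawl.org/{INDEX}?{params}"
    with urllib.request.urlopen(url) as resp:
        # The API returns one JSON object per line (NDJSON).
        for line in resp:
            record = json.loads(line)
            print(record["url"])

if __name__ == "__main__":
    common_crawl("icann.org")
```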
7. Censys is a platform that aggregates massive Internet-wide scan data and provides an interface to search through the datasets. Censys categorises the datasets into three types: IPv4 hosts, websites, and SSL/TLS certificates. Censys has a treasure trove of information on par with Shodan, if we know what to look for and how to look for it.
8. The Censys project collects SSL/TLS certificates from multiple sources. One of the techniques used is to probe all machines in the public IPv4 address space on port 443 and aggregate the SSL/TLS certificates they return. Censys provides a way to correlate an SSL/TLS certificate with the IPv4 hosts that presented it.
Using this correlation between SSL/TLS certificates and the IPv4 hosts that presented them, it is possible to expose the origin servers of domains that are protected by services like Cloudflare.
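A sketch of the correlation using Censys' legacy v1 search API (since superseded by search.censys.io, so treat the endpoints and field names as assumptions; the API ID and secret are placeholders):

```python
import base64
import json
import urllib.request

API_URL = "https://censys.io/api/v1"        # legacy v1 API (since superseded)
UID, SECRET = "YOUR_API_ID", "YOUR_SECRET"  # placeholder credentials

def censys_search(index, query, fields):
    """POST a search against a Censys v1 index ("certificates" or "ipv4")."""
    req = urllib.request.Request(
        f"{API_URL}/search/{index}",
        data=json.dumps({"query": query, "fields": fields}).encode(),
        headers={"Content-Type": "application/json"},
    )
    token = base64.b64encode(f"{UID}:{SECRET}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp).get("results", [])

def origin_candidates(domain):
    # 1. Find certificates issued for the domain...
    certs = censys_search("certificates", f"parsed.names: {domain}",
                          ["parsed.fingerprint_sha256"])
    # 2. ...then find IPv4 hosts presenting those certificates on port 443.
    for cert in certs:
        fingerprint = cert["parsed.fingerprint_sha256"]
        hosts = censys_search(
            "ipv4",
            f"443.https.tls.certificate.parsed.fingerprint_sha256: {fingerprint}",
            ["ip"],
        )
        for host in hosts:
            print(host["ip"])

if __name__ == "__main__":
    origin_candidates("example.com")  # hypothetical target domain
```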
9. Source code repositories are a treasure trove of information during security assessments. Source code can reveal a lot of information, ranging from credentials and potential vulnerabilities to infrastructure details. GitHub is an extremely popular version control and collaboration platform that you should look at. GitLab and Bitbucket are also popular services where you might find a target organisation's source code.
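As a starting point, GitHub's public search API can surface repositories that mention a target keyword (a hypothetical one below); unauthenticated requests are heavily rate-limited, and dedicated tools go much further by searching file contents and commit history:

```python
import json
import urllib.parse
import urllib.request

def github_repos(keyword):
    """Search GitHub's repository search API for repos matching a
    keyword, e.g. an organisation or product name."""
    params = urllib.parse.urlencode({"q": keyword, "per_page": 20})
    req = urllib.request.Request(
        f"https://api.github.com/search/repositories?{params}",
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(req) as resp:
        for item in json.load(resp)["items"]:
            print(item["full_name"], item["html_url"])

if __name__ == "__main__":
    github_repos("icann")  # hypothetical target keyword
```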
10. The Forward DNS dataset is published as part of Rapid7's Open Data project. This dataset is a collection of responses to DNS requests for all forward DNS names known by Rapid7's Project Sonar. The data format is a gzip-compressed file of newline-delimited JSON. We can parse the dataset to find subdomains for a given domain. The dataset is massive though (20+ GB compressed, 300+ GB uncompressed). In recent times, the dataset has been broken into multiple files based on the type of DNS records the data contains.
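A sketch that streams one of the dumps (the file name is a placeholder) without loading it into memory; each line is a JSON object with name, type and value fields:

```python
import gzip
import json

def fdns_subdomains(path, domain):
    """Stream a Rapid7 FDNS dump (gzip-compressed, one JSON object per
    line) and yield every DNS name under the target domain. The file
    is huge, so never load it whole."""
    suffix = "." + domain
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            record = json.loads(line)
            if record.get("name", "").endswith(suffix):
                yield record["name"]

if __name__ == "__main__":
    # Placeholder path; download the dataset from Rapid7 Open Data first.
    for name in sorted(set(fdns_subdomains("fdns_a.json.gz", "icann.org"))):
        print(name)
```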
11. Content Security Policy (CSP) is delivered via the Content-Security-Policy HTTP header, which lets a site declare an allowlist of trusted content sources and instructs the browser to only execute or render resources from those sources. Because that allowlist often names subdomains and third-party domains, CSP headers are a good place to discover new domains.
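A minimal sketch that fetches a page and pulls host names out of its CSP header:

```python
import urllib.request
from urllib.parse import urlparse

def csp_domains(url):
    """Fetch a URL and extract host names from its Content-Security-Policy
    header; the allowlisted sources often reveal subdomains and
    third-party services."""
    with urllib.request.urlopen(url) as resp:
        csp = resp.headers.get("Content-Security-Policy", "")
    hosts = set()
    for directive in csp.split(";"):
        for token in directive.split()[1:]:  # skip the directive name
            # Keep source expressions that look like URLs or host names,
            # skipping keywords such as 'self' and 'unsafe-inline'.
            if "." in token and not token.startswith("'"):
                host = urlparse(token if "//" in token else f"//{token}").hostname
                if host:
                    hosts.add(host)
    return sorted(hosts)

if __name__ == "__main__":
    print(csp_domains("https://www.icann.org/"))
```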
12. A Sender Policy Framework (SPF) record is used to indicate to receiving mail exchangers which hosts are authorised to send mail for a given domain.
Simply put, an SPF record lists all the hosts that are authorised to send emails on behalf of a domain. Sometimes SPF records leak internal netblocks and domain names.
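SPF records are just DNS TXT records, so they are trivial to retrieve and parse. A sketch using Google's DNS-over-HTTPS JSON API (chosen to avoid a third-party DNS library; dig icann.org txt works just as well):

```python
import json
import urllib.request

def spf_record(domain):
    """Fetch a domain's TXT records via Google's DNS-over-HTTPS API
    and return the SPF record, if any."""
    url = f"https://dns.google/resolve?name={domain}&type=TXT"
    with urllib.request.urlopen(url) as resp:
        answers = json.load(resp).get("Answer", [])
    for record in answers:
        data = record["data"].strip('"')
        if data.startswith("v=spf1"):
            return data
    return None

def spf_hosts(domain):
    """List the mechanisms that name netblocks and domains authorised
    to send mail, which sometimes leak internal infrastructure."""
    record = spf_record(domain)
    if not record:
        return []
    return [m for m in record.split()
            if m.startswith(("ip4:", "ip6:", "include:", "a:", "mx:"))]

if __name__ == "__main__":
    print(spf_hosts("icann.org"))
```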