
Origin discovery
Bachelor Thesis - Origin discovery of DPS Provider

In my bachelor thesis, I assessed two DDoS Protection Service (DPS) providers, DOSArrest and Akamai. The objective was to explore potential vulnerabilities in their protective mechanisms. The study covered the one million most frequently visited domains.
The first phase classified each domain as belonging to one of the two DPS providers or to another entity. This identified approximately 5500 domains associated with Akamai and around 260 domains linked to DOSArrest. The next task was to bypass the DPS provider and reach each domain's origin server directly. To this end, common attack vectors such as subdomain scanning and DNS probing were used to collect candidate IP addresses. Each candidate was evaluated by requesting the page both from the IP address directly and through the DPS provider, then measuring the similarity of the two pages with the Levenshtein distance. However, telling DPS providers, hosting providers, and proxy servers apart proved difficult, because any of them may redirect traffic to the domain's DPS provider.
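The page comparison described above can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the normalization by the longer page length, and the 0.9 threshold are assumptions chosen for illustration only.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, keeping two rows in memory."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner loop over the shorter string
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def similarity(page_a: str, page_b: str) -> float:
    """Map the distance to [0, 1]; 1.0 means the pages are identical."""
    longest = max(len(page_a), len(page_b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(page_a, page_b) / longest


def looks_like_origin(page_via_dps: str, page_via_ip: str,
                      threshold: float = 0.9) -> bool:
    """Hypothetical decision rule; the threshold value is an assumption."""
    return similarity(page_via_dps, page_via_ip) >= threshold
```

The quadratic cost of this algorithm is noticeable on full HTML pages, so a real pipeline would likely truncate or normalize the pages first, or use an optimized library.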
For Akamai, approximately 1200 origin IP addresses were detected, with DNS entries proving to be the more susceptible component.
In addition, a new attack vector was introduced: analyzing historical snapshots of each domain's website stored in the Internet Archive. Yearly and monthly snapshots of every domain were examined for hardcoded IP addresses and referenced subdomains, which in turn revealed origin IP addresses. For Akamai, this uncovered around 180 vulnerable domains, approximately 3.5% of the total.
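As a rough sketch of the snapshot analysis, the following extracts hardcoded IPv4 addresses and referenced subdomains from the HTML of a single archived page. Fetching snapshots from the Internet Archive is left out, and the regular expressions and function names are assumptions rather than the tooling used in the thesis.

```python
import ipaddress
import re
from typing import Set

# Loose candidate pattern; each hit is validated below.
IPV4_CANDIDATE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def extract_ips(html: str) -> Set[str]:
    """Return syntactically valid IPv4 addresses found in the page source."""
    found = set()
    for candidate in IPV4_CANDIDATE.findall(html):
        try:
            found.add(str(ipaddress.IPv4Address(candidate)))
        except ValueError:
            continue  # e.g. 999.1.1.1 is not a valid address
    return found


def extract_subdomains(html: str, domain: str) -> Set[str]:
    """Return hostnames in the page that end in the given domain."""
    pattern = re.compile(
        r"\b((?:[a-z0-9-]+\.)+" + re.escape(domain) + r")\b",
        re.IGNORECASE,
    )
    return {match.lower() for match in pattern.findall(html)}
```

Four-part version strings such as 1.2.3.4 also match the IP pattern, so each extracted address would still need to be verified, for example with the page comparison used for the other vectors.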
Finally, five large-scale scans were carried out for domains whose origin IP addresses had not been found by the previous methods. Each scan sent requests with the domain set in the HTTP Host header to every possible IPv4 address and compared the responses, evaluating nearly 14 million successfully retrieved pages per domain. Despite an evaluation time of roughly 150 hours, the origin remained elusive in all but one case. For that domain, the margin was exceptionally narrow, with only 28 IP addresses remaining to be probed.
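A single probe of this kind could look roughly like the sketch below, which requests the root page from a given IP address while presenting the target domain in the Host header. This assumes that "domain-as-header" refers to the HTTP Host header; the function names, plain HTTP on port 80, and the timeout are illustrative assumptions, and the scanning infrastructure itself is not shown.

```python
import http.client
from typing import Dict, Optional


def probe_headers(domain: str) -> Dict[str, str]:
    """Headers for one probe: the target domain goes into the Host header."""
    return {"Host": domain, "Connection": "close"}


def fetch_with_host(ip: str, domain: str, port: int = 80,
                    timeout: float = 5.0) -> Optional[str]:
    """Request '/' from the IP directly; return the body, or None on failure."""
    conn = http.client.HTTPConnection(ip, port, timeout=timeout)
    try:
        # Passing an explicit Host header stops http.client from adding its own.
        conn.request("GET", "/", headers=probe_headers(domain))
        response = conn.getresponse()
        return response.read().decode("utf-8", errors="replace")
    except (OSError, http.client.HTTPException):
        return None  # unreachable host, timeout, or malformed response
    finally:
        conn.close()
```

The returned body can then be compared against the page served through the DPS provider to decide whether the probed address behaves like the origin.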