BreachSight: an Engine for Securing Data Leaks

This post was originally published here by UpGuard.

When we began building a Cyber Risk Research team at UpGuard, we knew there were unavoidable risks. We would be finding and publishing reports on sensitive, exposed data in order to stanch the flow of such private information onto the public internet. It seemed likely the entities involved would not always be pleased, particularly as the majority of the exposures we discovered would be attributable to human error and/or internal process failures. As a team, however, we believed those were risks worth taking. By performing the public good of securing exposures and raising awareness of the growing data security epidemic, UpGuard could help to spur long-term improvements.

There is, we have learned, far more exposed data than we can secure using the methods with which we started this mission. As our team’s capabilities and methods have expanded, more and more exposures have been discovered, more than we can secure using a process based on outreach to the responsible parties. For that reason, we are launching a new product and responsible disclosure program to give concerned organizations visibility into their own data exposures and, ultimately, the ability to secure those exposures before they find their way into the wrong hands.

Process Controls

We began the Cyber Risk Research team in 2017 with a set of process governance documents to ensure that everything we did would be ethical, even as we also continued to sell our enterprise software products. Following a thorough legal review, we received approval from our legal counsel of process guidelines that addressed both our concerns and our counsel’s. Among these mandates were measures ensuring data integrity, avoiding any real or perceived threat of extortion, and preventing the dissemination of any of our information to potentially malicious actors. But there remained many unknowns relating to the responses we would face from third parties. How would companies respond to our notifications? Would they try to buy us off? Given that we are only able to report on publicly accessible data, would we simply run out of available data to write about?

The Good

In only a year of work, our Cyber Risk Team has achieved some remarkable results. We have secured hundreds of millions of exposed records and built close working relationships with legal, law enforcement, and policy-making organizations across the world, helping to educate stakeholders about the real dangers of cyber risk as they exist today. When we talk with people outside of our company, we hear them taking seriously the issues our research has raised and, promisingly, taking action to address the real sources of risk.

Security By Design

We have even seen changes from technology providers like Amazon Web Services as they make the security features of their products more apparent, helping operators use their services securely. While it is no secret that many of our reports deal with exposures in Amazon’s S3 service, due in particular to the way in which users can reconfigure default bucket settings to allow public access, Amazon has not attempted to solve this problem by stifling researchers. Instead, Amazon has done the more admirable and sustainable work of educating their users and improving their products. The results speak for themselves. While the UpGuard Cyber Risk Team still finds open S3 buckets improperly configured for public access, we can confirm there are far fewer of those today than before Amazon’s corrective measures were effected.

Growing Cyber Resilience

We have been heartened to receive many positive responses when notifying affected entities of data exposures related to their organization. Many have responded quickly to our notifications, working efficiently to secure the data, inform affected parties, and act with transparency and integrity throughout. The realization the Cyber Risk Team confronts every day is that data is everywhere. Modern businesses simply generate too much data for it to be easily controlled. The result: every organization has suffered or will suffer some kind of data exposure – and when it does, it will be just a matter of time before someone finds it. What differentiates organizations with strong security practices from their less diligent peers is whether the enterprise is prepared to respond effectively once it becomes aware of a leak. While there are many discouraging trends in cybersecurity, there are also businesses excelling in the pursuit of cyber resilience.

The Bad

There have also been some less than ideal outcomes. Learning from those has allowed us to improve our processes in order to ensure the continued operational capability and safety of our team, of affected entities, and of consumers. There are several factors that constrain the number of breach notifications we can responsibly manage in an outbound communication model. First, there are the challenges of attribution and significance. It takes time to fully examine a data set and determine to whom it belongs and how serious it is. Second, there is the time involved in contacting that party and getting them to acknowledge and secure the exposed data. We often must reach out over multiple channels to multiple people before finding someone who will respond, and even longer before that person can escalate the issue to get it resolved.

Initial Contact

Many businesses are unprepared for the possibility of a third party needing to disclose a security incident to them, and may react slowly or fearfully when contacted. In a world where data breach notifications are among the most effective lures used in phishing attacks, such caution is understandable. We often must pursue every avenue of contact to reach someone to whom we can responsibly disclose a finding – emails, phone calls, and messages, often sent repeatedly. Proposed standards like a security.txt file aim to streamline this part of the notification process, but many businesses lack staff members prepared for the day they receive a security disclosure.
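To make the security.txt idea concrete, here is a minimal sketch of how a researcher's tooling might read such a file to find a disclosure contact. It assumes the plain "Name: value" line format used by security.txt, with "#" marking comments. The function name, the example.com addresses, and the sample contents are hypothetical and chosen only for illustration. The sketch does not verify signatures or check expiry dates, and it is not a description of UpGuard's own tooling.

```python
from typing import Dict, List


def parse_security_txt(text: str) -> Dict[str, List[str]]:
    """Collect field values from security.txt content, keyed by lowercase field name."""
    fields: Dict[str, List[str]] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments, which start with "#".
        if not line or line.startswith("#"):
            continue
        # Each field is "Name: value". Split on the first colon only, so
        # values such as "mailto:..." or "https://..." stay intact.
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a field line; ignore it
        fields.setdefault(name.strip().lower(), []).append(value.strip())
    return fields


# Hypothetical file contents, for illustration only.
EXAMPLE = """\
# Security contact for example.com
Contact: mailto:security@example.com
Contact: https://example.com/report
Expires: 2030-01-01T00:00:00.000Z
Preferred-Languages: en
"""

parsed = parse_security_txt(EXAMPLE)
contacts = parsed.get("contact", [])
```

A file like this gives a notifier one or more vetted contact channels up front, rather than leaving them to guess at email addresses and phone numbers.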

Time to Repair

After making contact with an affected entity, finding someone within the organization who can and will secure the exposure sometimes takes much longer than we would expect. While some enterprises have been able to remove public access within hours or even minutes, others have spent weeks failing to fully remediate the issue – despite knowing sensitive customer data remains exposed. As the technology for cloud storage has grown more powerful and easy to use, it has become more accessible to people who may lack expertise in disciplines like access control management. Similarly, third- and fourth-party risks arise when the data is exposed by a third party, and that third party’s IT is in turn managed by another third party.

The Ugly

We have faced litigation in the past and are familiar with its role in achieving business outcomes. Our goal in all of these dealings has been not just to reach commercially viable outcomes, but to maintain our integrity while doing work that we believe in. We understand that legal discussion has a place in security incident response, but also know that you can’t litigate your way into information security. In addition, the costs involved in defending lawsuits can have a prohibitive effect on businesses of our size, even if the contentions at hand are baseless. Our goal at UpGuard is to expend resources on improving our ability to detect and report data exposures, not on legal counsel.

To that end, we must avoid legal entanglements when we know we can do so. If a company has a documented history of launching legal action against a security researcher, we cannot disclose an exposure to them. While it is our goal to help all organizations, we cannot in good faith knowingly report a data exposure if doing so could lead to a protracted legal battle that could severely impact UpGuard’s operations. When we find sensitive data exposures related to these organizations, we will attempt to find alternate means of communicating the information to them, such as notifying relevant regulatory bodies, so as to continue to protect the best interests of their business and their customers.

As always, we will use our discretion regarding which findings result in disclosures. There is simply too much exposed data for us to clean it all up. We appreciate the cooperation that so many businesses have shown us, and hope such collaboration will provide the model for breach response in the future.


Rather than finding that the amount of exposed data was diminishing as we reported it to responsible parties and drove best practices, we found that as we improved our methods, the mountain of exposed data needing to be secured instead grew larger. Our major bottleneck was the outreach process: identifying the data’s owner, finding an appropriate person to whom we could disclose it, and rate limiting our disclosures to maintain an acceptable level of business risk from potential litigation. When we did not have these impediments, that is, when we had trusted contacts at affected entities, data was consistently secured faster.

At the same time, we were regularly fielding inquiries from organizations asking us to audit them for data exposures. As unpleasant as it might be to learn that your sensitive data has been exposed, it is undoubtedly better to know sooner than later, and for it to be reported by UpGuard than for it to be silently replicated by people seeking to exploit that information.

As these patterns repeated themselves, we concluded we had to do something differently. The logical result is BreachSight: an offering whereby organizations can come to us, circumventing the negative factors that prolong and complicate the notification process. The research team will continue to look for vectors for data leaks and to notify as many affected entities as possible within the constraints imposed by their process. We are hopeful, however, that through BreachSight we will have a vehicle to accelerate the rate at which we remove sensitive data from the public internet.

Most importantly, we have supplemented our outbound notification method with an inbound disclosure program. Organizations that want to understand their exposure can approach us to receive an assessment, rather than waiting and wondering whether their data has leaked by some unknown side channel.


With this addition to our product portfolio, and some slight changes to our notification process, we hope to minimize the time spent on administrative overhead and do more of the work we love: identifying the sources of cyber risk and helping the companies eager to remediate them. As regulations like GDPR and the Australian Privacy Amendment come into effect, we believe that companies will take more seriously the risks posed by exposed data, from PII to system credentials to source code. With BreachSight, CyberRisk, and Core, we aim to offer a set of products that address the real problems in information security now and in the future.


