Exact Match Cures Healthcare’s DLP Woes


This post was originally published here by Rich Campagna.

We recently hosted a Bitglass customer roundtable during one of our training classes. The roundtable included several large healthcare organizations, and the topic quickly veered toward Data Leakage Prevention (DLP) and the use of standard keyword and regular expression patterns to detect Protected Health Information (PHI). I was shocked to hear one participant say that 99% of the alerts from their premises-based network DLP system were false positives, and sadly, several people around the table nodded in agreement!

The challenge? Most endpoint, network, and cloud access security broker (CASB) DLP systems include pre-built patterns for PHI. These patterns are built from keywords and regular expressions. Keyword datasets typically revolve around drug names, diagnoses and conditions, and specific terms used in patient data, such as "Medical Record Number" or "MRN". Regular expressions are typically built to identify the specific structure of an organization’s MRN.

While these patterns offer a starting point for PHI detection, they are problematic for a number of reasons. First, many of the keywords used in such a pattern appear regularly across a healthcare organization, regardless of whether specific patients are being discussed. Second, MRNs may take many different forms (for example, many hospital systems have grown via M&A, with each acquired entity bringing in its own MRN format), and they may resemble other types of data (for example, a 10-digit MRN might be difficult to distinguish from a US phone number). Techniques such as proximity- and occurrence-based matching can be used to reduce false positives, but the problem may still persist.
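To make the phone-number collision concrete, here is a minimal Python sketch. It assumes a hypothetical organization whose MRNs are exactly 10 digits; the pattern, the context keywords, and the 40-character window are all illustrative choices, not taken from any particular DLP product.

```python
import re

# Assumption: MRNs at this hypothetical organization are exactly 10 digits.
MRN_PATTERN = re.compile(r"\b\d{10}\b")

# Assumption: these context keywords signal that a number is an MRN.
CONTEXT = re.compile(r"\b(?:MRN|medical record number)\b", re.IGNORECASE)

WINDOW = 40  # characters of preceding text to inspect (illustrative value)


def naive_matches(text):
    """Flag every 10-digit number, whether or not it is an MRN."""
    return MRN_PATTERN.findall(text)


def proximity_matches(text, window=WINDOW):
    """Flag a 10-digit number only if a context keyword precedes it closely."""
    hits = []
    for m in MRN_PATTERN.finditer(text):
        start = max(0, m.start() - window)
        if CONTEXT.search(text[start:m.start()]):
            hits.append(m.group())
    return hits


sample = (
    "Call the front desk at 4085550123 to reschedule. "
    "Patient MRN: 0012345678 was discharged today."
)

print(naive_matches(sample))      # flags the phone number and the MRN
print(proximity_matches(sample))  # flags only the number near "MRN"
```

Proximity matching removes the obvious false positive here, but a sentence like "MRN on file; call 4085550123" would still be flagged, which is why pattern tuning alone tends not to eliminate the problem.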

For these reasons, we have noticed that our healthcare customers are increasingly turning to exact match DLP policies. Exact match works by matching on actual data from a database, such as real patient names or the specific medical record numbers assigned to patients. Such systems take data out of an electronic medical record (EMR) system like Epic, tokenize it, and upload it into the DLP system. Because only values that actually exist in the patient database can trigger a match, this approach minimizes or even eliminates false positives.
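The extract, tokenize, and match flow can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of how any specific product works: "tokenizing" is modeled here as keyed hashing (HMAC-SHA256), the key and the sample MRNs are made up, and candidates are assumed to be 10-digit numbers.

```python
import hashlib
import hmac
import re

# Assumption: a secret key shared by the export job and the DLP engine.
SECRET_KEY = b"example-key-not-for-production"

# Assumption: candidate values are 10-digit numbers, as in a 10-digit MRN.
CANDIDATE = re.compile(r"\b\d{10}\b")


def tokenize(value):
    """Replace a raw value with a keyed hash so raw PHI is never uploaded."""
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()


def build_index(mrns):
    """Tokenize every MRN from the (hypothetical) EMR export into a set."""
    return {tokenize(m.strip()) for m in mrns}


def exact_matches(text, index):
    """Flag only candidates whose token exists in the uploaded index."""
    return [c for c in CANDIDATE.findall(text) if tokenize(c) in index]


# Made-up MRNs standing in for an EMR export.
index = build_index(["0012345678", "0098765432"])

sample = (
    "Call the front desk at 4085550123 to reschedule. "
    "Patient MRN: 0012345678 was discharged today."
)

print(exact_matches(sample, index))  # only the number that is a real MRN
```

The phone number is ignored even though it has the same shape as an MRN, because its token is not in the index. Set lookups are constant time on average, but holding and distributing an index of this kind for very large record counts is where capacity becomes the limiting factor.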

Unfortunately, with millions or even billions of records at a large healthcare provider, most network and endpoint DLP systems are unable to scale to the data volumes that the exact match technique requires. With the move to cloud-native, auto-scaling CASBs, such limitations are no longer an issue, and techniques like exact match can be leveraged to provide accurate, useful detection and control of PHI.



