It’s not ‘See you later.’ It’s ‘Goodbye’: Moving on from Tokenization in the age of Ransomware

    By Arti Raman, CEO, Titaniam

    Encryption-in-use, a.k.a. data-in-use encryption, is changing the data protection landscape and could spark a cybersecurity movement that dwarfs tokenization in both usage and magnitude of impact.

    Tokenization was invented a little over twenty years ago in 2001 to address the risk of losing cardholder data from eCommerce platforms. When created, it did an excellent job of meeting its single goal–enabling payment card transactions on e-commerce platforms without storing payment card data on the platforms themselves.

    The concept was simple: swap payment card numbers for substitute numbers, i.e., tokens, with a one-to-one correspondence between a token and its underlying card. The token itself bears no monetary value and is only a stand-in for the actual payment card number. Transactions could now flow through entire financial workflows without risking payment card compromise. Financial institutions could “clear” these by matching tokens with the original payment cards in highly secure back-end environments.
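
    The swap described above can be illustrated with a minimal sketch. It is an illustration only, not a production or PCI-compliant design: the class name `TokenVault` and its methods are hypothetical, and the in-memory dictionaries stand in for the highly secure back-end environment where real mappings would live.

```python
import secrets


class TokenVault:
    """Hypothetical in-memory vault mapping card numbers to random tokens."""

    def __init__(self):
        self._token_to_pan = {}  # token -> original card number
        self._pan_to_token = {}  # card number -> token (keeps the mapping 1-to-1)

    def tokenize(self, pan: str) -> str:
        # Reuse the existing token so each card maps to exactly one token.
        if pan in self._pan_to_token:
            return self._pan_to_token[pan]
        while True:
            # Random digits of the same length; the token has no
            # mathematical relationship to the card number.
            token = "".join(secrets.choice("0123456789") for _ in range(len(pan)))
            if token != pan and token not in self._token_to_pan:
                break
        self._token_to_pan[token] = pan
        self._pan_to_token[pan] = token
        return token

    def detokenize(self, token: str) -> str:
        # Only the vault can recover the card number ("clearing").
        return self._token_to_pan[token]


vault = TokenVault()
token = vault.tokenize("4111111111111111")   # a well-known test card number
original = vault.detokenize(token)           # performed only in the back end
```

    Note what the sketch supports: exact lookup of one token at a time. Nothing in the token allows a partial, fuzzy, or range search over the underlying value.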

    This design was so secure and successful that it became, and remains, the gold standard of how the financial services industry protects its most sensitive data.

    In recent years, cyber attackers began to go after a wide variety of data beyond just payment cards. As these attacks started to play out and organizations began to experience the loss of other types of Personally Identifiable Information (PII) and Protected Health Information (PHI), the cybersecurity industry naturally responded with one of its most successful data protection technologies, i.e., tokenization.

    The market became filled with tokenization solutions that could be applied to much more than payment card data. Financial institutions, healthcare organizations, government agencies, and many others, who wished to protect more data than just payment cards, all bought into the idea that they should swap out sensitive data for tokens as data flowed through their systems. Hundreds of millions of dollars were spent implementing tokenization and integrating it into complex enterprise workflows.

    However, billions of PII records continue to be lost to cyberattacks from the same enterprises that have spent hundreds of millions on tokenization. According to BigID, an alarming 80% of compromised data contains PII — which is often highly sensitive, vulnerable, and regulated.

    This mess has a rational explanation and could have easily been predicted if one thought about what tokenization was designed to do versus how it was applied.

    Tokenization was invented for payment card data. Payment card data never needed to be subject to complex searches or analytics. It never needed to be indexed for things like wildcard searches. It was not required to be broken down into rich analytics functions or otherwise manipulated. All one needed to do with payment card numbers was let them flow through transactions and track how much was spent against them. The card number was important and valuable, not in its essence but in the purchasing power it represented.

    This is not true about other types of sensitive data.

    Let’s take names and addresses, for instance. Names and addresses must be populated in enormous databases and indexed for search. They can be misspelled and should still be retrievable. We expect to retrieve them by partial word searches and sometimes like to retrieve similar names and run them against fraud algorithms. Addresses have the same characteristics. We might be interested in looking at names at similar-looking addresses. When we get to more general types of valuable data like intellectual property, it exhibits even fewer similarities to payment card data. Applying tokenization concepts to all this makes very little sense.

    But enterprises did it anyway and ended up with massive, complex implementations where tokens were swapped in and out of systems and reconciled in token vaults. Whenever an enterprise faced sensitive data that needed to be richly searched and analyzed, it did one of two things: it either processed the data in clear text and accepted the risk of losing it in a data breach, or it implemented large-scale batch detokenization processes and placed enormous volumes of detokenized sensitive data in “walled gardens,” hoping the data would be securely handled and ultimately deleted. This process was and remains slow, cumbersome, and not very secure. It also defeats the purpose of tokenization in the first place.

    Today, these cleartext analytic stores, whether never tokenized or formed after detokenization, represent the largest concentration of sensitive data risk in enterprises with otherwise strong security practices.

    Every time ransomware attackers make their way into enterprises and steal privileged credentials, they look for these large repositories where valuable data is queried and analyzed as it supports business processes. They access these as admins would and leave with millions of records of sensitive data in cleartext. This data is then used to extort victim organizations, customers, and partners. It does not matter that the victims have their backup and recovery systems in order. The stolen data becomes the primary reason for extortion, and it represents a very long tail of impact as this data ultimately makes its way to the dark web for sale. In fact, the average cost of a ransomware attack in 2021 is $1.85 million, which is almost twice what it was the previous year.

    With organizations collecting more sensitive data than ever before and utilizing it for business insight, it is no surprise that the average organization can protect less than 10% of its sensitive data using tokenization. In light of the significant threat of ransomware and extortion looming over organizations, the extent of coverage needs to improve!

    What about encryption? Do these enterprises not encrypt this data, and why does this not help?

    The answer is that many victim enterprises do encrypt. Not all encryption, however, is equal. Data exists in three states: at-rest, in-transit, and in-use. Until recently, the only viable types of encryption have been encryption for data-at-rest (aka encryption-at-rest) and encryption for data-in-transit (encryption-in-transit). These types of encryption do not address the risk of losing data to hackers with credentials. Once hackers have access to highly privileged datastore credentials, encryption-at-rest and in-transit fall away, because the datastore decrypts data for any authenticated user, and it is easy for hackers to query datastores or dump their contents en masse.

    So tokenization provides insufficient coverage, and traditional encryption does not stop hackers with credentials. What can enterprises do?

    The more relevant option is encryption for data-in-use, also known as encryption-in-use.

    Encryption-in-use is a new type of encryption that stays on even when data is being actively utilized. Datastores utilizing encryption-in-use can keep data encrypted even while running rich searches such as prefixes, suffixes, wildcards, and ranges. Queries and analytics can be supported without data decryption, and result sets come back in encrypted form. Hackers accessing memory encounter only encrypted data. Even if the most privileged credentials are applied to the datastore or application and all of its data is dumped, it is only available in encrypted form.
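
    To make the idea of searching without decryption concrete, here is a deliberately simplified sketch of one publicly known technique, a keyed "blind index" for prefix search. It is not a description of how any particular encryption-in-use product works, and it is much weaker than a real scheme (for example, it reveals which records share a prefix). All names (`blind_token`, `prefix_tokens`, `BlindPrefixIndex`) are hypothetical, and encryption of the record payload itself is out of scope here.

```python
import hashlib
import hmac


def blind_token(key: bytes, text: str) -> str:
    """Keyed HMAC-SHA256 of the lowercased text; opaque without the key."""
    return hmac.new(key, text.lower().encode("utf-8"), hashlib.sha256).hexdigest()


def prefix_tokens(key: bytes, value: str, max_len: int = 8) -> list:
    """Blind tokens for every prefix of value, up to max_len characters."""
    v = value.lower()
    return [blind_token(key, v[:n]) for n in range(1, min(len(v), max_len) + 1)]


class BlindPrefixIndex:
    """Datastore-side index. It stores only HMAC tokens and record IDs,
    never plaintext and never the key."""

    def __init__(self):
        self._index = {}  # token -> set of record IDs

    def add(self, record_id: str, tokens: list) -> None:
        for token in tokens:
            self._index.setdefault(token, set()).add(record_id)

    def lookup(self, token: str) -> set:
        return set(self._index.get(token, set()))


# The key is held by the client, outside the datastore (assumed setup).
key = b"\x01" * 32  # placeholder; a real key would come from a key manager

index = BlindPrefixIndex()
index.add("rec-1", prefix_tokens(key, "Johnson"))
index.add("rec-2", prefix_tokens(key, "Johansson"))
index.add("rec-3", prefix_tokens(key, "Smith"))

# Prefix query: the client sends only a token; the datastore matches tokens.
matches = index.lookup(blind_token(key, "joh"))
```

    In this sketch, someone who dumps the index with full datastore privileges sees only HMAC outputs and record IDs. Without the externally held key, they cannot form a valid query token or read the indexed names.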

    Good security practices require encryption keys to be securely housed outside the datastore. With encryption-in-use, sensitive data that could otherwise not be tokenized due to analytics needs can be secured appropriately while still being available for rich business use.

    Encryption-in-use supports various architectures. It can be used directly in datastores/apps, or in tokenization vaults, where tokens are provided to primary datastores/apps while the original data resides in the vault. With encryption-in-use, the original data can be kept encrypted and utilized in complex search and analytics regardless of the architecture.

    Encryption-in-use has advanced to a stage where performance and scale are no longer a concern, and organizations finally have the means to expand sensitive data protection from 10% to 100% and, in doing so, close the large extortion gap that exists today.

    In the last two years, as ransomware has risen to become the dominant cybersecurity threat for enterprises and governments, we are seeing a steady realization that tokenization is simply not enough. The handful of encryption-in-use providers is seeing a strong influx of customers, keen attention from analysts, and a free flow of investment. With its far larger data coverage relative to tokenization and the ease with which it can be deployed and integrated, will encryption-in-use make traditional tokenization obsolete? We will see.

