Discovery Series: Bridge the Gap Between "Where You Think" Your Sensitive Data is and "Where it Really is"

Accurate and effective data discovery is the first step to efficient data protection and compliance. Unfortunately, in the real world, this is not easy. Even identifying and locating all of your sensitive data is a challenge, due to the dynamic nature and complexities of data sources.

Traditional sensitive data discovery methods fall into two major categories: Expert Determination and Data Discovery Tools (such as dictionary matching and regular expressions). But these methods are not good enough. Here’s why!

Expert Determination is the method in which an application expert pinpoints the locations of sensitive data. In reality, application experts rarely have the time for this exercise. Even when they do find the time, their knowledge of sensitive data locations may not be comprehensive, because an application changes over time. In one instance, even an application expert who had spent two decades with the application wasn’t able to find all locations of sensitive data.

The other method relies on rudimentary Data Discovery Tools. A dictionary search can find sensitive data only in columns whose names follow a pattern or match known names such as ‘National Identifier’, ‘First Name’, and so on. In multiple cases, however, we have seen users enter sensitive data in ‘Value’ and ‘Description’ fields. How do you find these locations with just a dictionary search?
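To make the limitation concrete, here is a minimal sketch of a dictionary search over column names. The keyword list, the schema, and the function names are hypothetical and chosen only for illustration; they do not come from any particular tool.

```python
# Hypothetical dictionary of column-name keywords (illustrative only).
SENSITIVE_NAME_KEYWORDS = {"national identifier", "first name", "last name"}


def normalize(column_name: str) -> str:
    """Lower-case a column name and turn underscores into spaces."""
    return column_name.replace("_", " ").strip().lower()


def dictionary_match(column_names):
    """Return the columns whose normalized name contains a dictionary keyword."""
    return [
        name
        for name in column_names
        if any(keyword in normalize(name) for keyword in SENSITIVE_NAME_KEYWORDS)
    ]


# Hypothetical schema: VALUE and DESCRIPTION hold free text typed by users,
# which may include sensitive data the column name gives no hint of.
columns = ["NATIONAL_IDENTIFIER", "FIRST_NAME", "VALUE", "DESCRIPTION"]
flagged = dictionary_match(columns)
missed = [name for name in columns if name not in flagged]

print("Flagged:", flagged)
print("Never inspected:", missed)
```

Because the search looks only at names, the ‘Value’ and ‘Description’ columns are never inspected, whatever users have typed into them.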

With Data Discovery Tools based on regular expressions (regex), one cannot be confident that a matched pattern is indeed sensitive data rather than a random string that happens to fit the pattern. The risk of false positives is too high. Regular expressions and other pattern-based searches are also typically unable to find sensitive data in complex columns, composite columns, BLOBs, CLOBs, key-value pairs, and phantom tables that may not be populated.
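The false-positive problem is easy to reproduce. The sketch below uses an illustrative pattern for identifiers written as NNN-NN-NNNN and a handful of made-up sample strings; both the pattern and the samples are assumptions for the sake of the example.

```python
import re

# Illustrative pattern for an identifier written as NNN-NN-NNNN.
SSN_LIKE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Hypothetical free-text values; only the first is meant to be sensitive.
samples = [
    "Employee SSN 123-45-6789 on file",
    "Replacement part 555-12-0042 shipped",
    "Invoice ref 900-10-2024",
]

# Collect every string that fits the pattern, sensitive or not.
matches = []
for text in samples:
    found = SSN_LIKE.search(text)
    if found:
        matches.append(found.group())

print(matches)
```

All three strings match, although only one of them is meant to be an identifier. The pattern alone cannot tell a part number or an invoice reference from sensitive data.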

Additionally, these traditional methods do not identify the reasons or attributes for classifying data as sensitive, and they cannot distinguish between data classifications that share the same pattern. This is because these methods are not data-classification centric. And because some of these methods are configured to identify only the locations of sensitive data, they cannot generate metadata such as which users and programs have access to that data.

But what if there were a better way to locate ALL of your sensitive data with minimal error? Such a solution would find sensitive data in hard-to-find locations: free-text fields and complex columns (using patterns and validations), files stored in database columns, and even temporary tables that can be found only through code scanning.
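As a rough illustration of what pairing a pattern with a validation can look like, the sketch below scans free text for card-number candidates and keeps only those that pass the Luhn checksum. This is one well-known validation technique chosen for the example, not a description of how any specific product works; the pattern, the function names, and the sample text are assumptions.

```python
import re

# Candidate pattern: 13 to 19 digits, optionally separated by spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:  # double every second digit, counting from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def find_card_numbers(text: str):
    """Return candidate numbers in free text that also pass the Luhn check."""
    found = []
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            found.append(digits)
    return found


# Hypothetical 'Description' field value: a well-known test card number
# next to an order id that merely looks like one.
note = "Customer paid with 4111 1111 1111 1111; order id 1234567890123456."
print(find_card_numbers(note))
```

Both numbers fit the pattern, but only the test card number passes the checksum, so the order id is discarded instead of being reported as a false positive.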

What your enterprise needs is a comprehensive, configurable sensitive data discovery solution that can find all locations of sensitive data across structured and unstructured data sources with minimal false positives. MENTIS has a demonstrable ability to find more sensitive data locations than the ones you already know. This will allow you to bridge the gap between “where you think” your sensitive data is and “where it really is.”

To find out more, download the research paper featuring the MENTIS data and application security platform, which covers data discovery, data masking, data monitoring, and data retirement. You can find the report here.