Gartner defines big data in terms of the three Vs: “…big data is high-volume, high-velocity and/or high-variety information assets.” The enterprise big data landscape continues to evolve, and the pace of change is accelerating. Many organizations aren’t prepared for the complexities of protecting their big data, especially as it becomes an integral part of the overall data environment.
In this post, we’ll review the unique challenges of securing big data and discuss four items to consider for reducing your risk.
Big Data Security Complexities
Big data databases are holding more sensitive data than ever before. But securing big data is not “more of the same” – big data is very different from your typical RDBMS databases. Big data keeps changing and evolving, and security measures must keep up with these changes. Let’s take a closer look at the unique differences:
High Variety
Enterprises with complex, heterogeneous environments cannot simply bolt a security add-on onto their traditional databases. NoSQL has no common standard query language comparable to SQL for RDBMS, and the technologies vary widely from vendor to vendor.
Security is not going to be one-size-fits-all. Your landscape may include several implementations side by side: traditional RDBMS databases alongside structured, semi-structured and unstructured data stores. Enterprise application teams are also developing in-house applications that connect to big data repositories, increasing the variety of access methods to the data. Those apps will also require security.
High Velocity
Big data solutions are evolving much faster than traditional databases. If your enterprise is accustomed to upgrading database releases every year or so, you may find yourself adopting big data releases more frequently to pick up new features. For example, in 2016 Cloudera had more than 12 releases.
High Volume
A few years ago, it was thought that big data held only non-sensitive data, but it’s becoming apparent that this is not the case. These big data databases hold more and more data, with a growing portion of it sensitive, across all the different technologies both on-premises and in the cloud.
A Layered Approach to Big Data Security
You may already have some security measures in place for your big data environment, but like any other data security strategy it’s important to think in terms of a layered approach.
Here are four things you want to consider when it comes to securing big data:
1. Database Discovery and Data Classification
Big data can hold sensitive data and you need to identify where it resides. Discovery can be seen as a two-phase process: first, discover where all your big data databases reside; then second, understand if they hold sensitive data (and if so, what kinds of sensitive data).
Discovery of all your big data databases is not a one-shot activity, but rather something you should do periodically as the volume and variety of big data databases can increase over time.
Also, identifying sensitive data is a complex task. Big data stores can be semi-structured (for example, MongoDB), so performing this task manually is tedious and your assessment can quickly become out of date. It’s worth investing in a discovery solution.
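To make the second discovery phase concrete, here is a minimal Python sketch of classifying fields in semi-structured documents. It assumes the documents have already been sampled from a store and loaded as plain dictionaries. The `PATTERNS` table and the `walk` and `classify` helpers are hypothetical names invented for this illustration, and the two regular expressions are deliberately simplistic; a real discovery solution would use validated classifiers with much broader coverage.

```python
import re
from collections import defaultdict

# Hypothetical, intentionally simple patterns for illustration only.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def walk(value, path=""):
    """Yield (field_path, leaf_value) for every leaf in a nested document."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from walk(child, f"{path}.{key}" if path else key)
    elif isinstance(value, list):
        for child in value:
            yield from walk(child, f"{path}[]")
    else:
        yield path, value


def classify(documents):
    """Map each field path to the set of sensitive-data labels found there."""
    findings = defaultdict(set)
    for doc in documents:
        for path, leaf in walk(doc):
            if not isinstance(leaf, str):
                continue  # this sketch only inspects string values
            for label, pattern in PATTERNS.items():
                if pattern.search(leaf):
                    findings[path].add(label)
    return dict(findings)


# Two documents with different shapes, as is common in semi-structured stores.
sample = [
    {"name": "Ann", "contact": {"email": "ann@example.com"}},
    {"name": "Bob", "notes": ["SSN 123-45-6789 on file"], "age": 41},
]
report = classify(sample)
```

Because the sketch reports field paths rather than whole collections, it can be rerun periodically on fresh samples to show where sensitive data has newly appeared.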
2. Insider Threats
You may have set security measures to deny physical access to the database machine or restricted access to the database, but this will not protect against malicious or compromised insiders. Threats from within require a different set of security measures. For example, you’ll want a solution that looks for abnormal data access, including from privileged users like DBAs. A behavior model based simply on who typically logs in to which resource (e.g., login/logout of a database) at what time is not enough. The real need is to identify potentially malicious data abuse early and at scale, which requires a deeper look at the exact data being accessed (e.g., after login, which records are and are not accessed).
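As a rough illustration of looking past the login event, the Python sketch below compares one user's data access against that user's own history. The `Profile` class, the `assess` function, the reason labels and the threshold of three standard deviations are all assumptions made for this example. Production behavior analytics rely on far richer models than a mean and a standard deviation.

```python
from dataclasses import dataclass, field
from statistics import mean, pstdev


@dataclass
class Profile:
    """Hypothetical per-user baseline learned from past activity."""
    collections: set = field(default_factory=set)      # collections normally read
    daily_counts: list = field(default_factory=list)   # records read per day


def assess(profile, collection, records_read, threshold=3.0):
    """Return the reasons (possibly none) why this access looks abnormal."""
    reasons = []
    if collection not in profile.collections:
        reasons.append("new_collection")
    if profile.daily_counts:
        mu = mean(profile.daily_counts)
        # Floor the deviation so a perfectly flat history does not divide by zero.
        sigma = max(pstdev(profile.daily_counts), 1.0)
        if (records_read - mu) / sigma > threshold:
            reasons.append("volume_spike")
    return reasons


# A privileged user who normally reads about 100 order records per day.
dba = Profile(collections={"orders"}, daily_counts=[100, 120, 90, 110, 80])

normal = assess(dba, "orders", 130)
suspicious = assess(dba, "payroll", 5000)
```

Here `normal` is empty while `suspicious` carries both reasons, even though the login itself would look identical in both cases.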
3. Vulnerability Testing
Currently there are only a few security benchmarks for big data databases. For instance, a MongoDB security benchmark (registration required) was recently released by the CIS.
You should run regular tests to discover vulnerabilities and take action to remediate any gaps. Some can be fixed simply by applying patches, while others will require configuration changes. Taking such actions will close the security holes that make you vulnerable. This domain is still relatively immature compared to RDBMS; it will take time for the applicable benchmarks to catch up.
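The configuration side of such testing can be sketched as a small audit over a parsed configuration. The Python example below assumes a MongoDB-style configuration file that has already been loaded into a dictionary. The three checks are illustrative picks for this sketch and are not taken from the CIS benchmark, and the `CHECKS`, `get` and `audit` names are hypothetical.

```python
def get(config, dotted_key, default=None):
    """Look up a dotted key such as 'net.tls.mode' in a nested dict."""
    node = config
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node


# (setting, predicate that must hold, description). Illustrative checks only.
CHECKS = [
    ("security.authorization", lambda v: v == "enabled",
     "access control is enabled"),
    ("net.bindIp", lambda v: "0.0.0.0" not in str(v),
     "server is not bound to all IPv4 interfaces"),
    ("net.tls.mode", lambda v: v == "requireTLS",
     "TLS is required for connections"),
]


def audit(config):
    """Return (setting, description) for every check that fails."""
    return [(key, desc) for key, passes, desc in CHECKS
            if not passes(get(config, key))]


weak = {"net": {"bindIp": "0.0.0.0"}}
hardened = {
    "security": {"authorization": "enabled"},
    "net": {"bindIp": "127.0.0.1", "tls": {"mode": "requireTLS"}},
}

gaps = audit(weak)
clean = audit(hardened)
```

Running an audit like this on a schedule turns configuration drift into a list of concrete gaps to remediate.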
4. Database Activity Monitoring
You should monitor the day-to-day activity of users and applications against your big data databases to discover any suspicious activity. You can set alerts (to trigger manual action) or set policies that automatically deny access to users or applications when inappropriate access is discovered. If your big data database contains sensitive data, you may also need to monitor (and/or limit) access for compliance reasons.
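The difference between alerting and automatically denying access can be sketched as a small policy evaluator. In the Python example below, the `Event` and `Rule` types, the rule names and the user and collection names are all hypothetical. A real monitoring product would capture events from audit logs or network traffic instead of taking them as function arguments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    user: str
    collection: str
    operation: str  # e.g. "read", "write", "drop"


@dataclass(frozen=True)
class Rule:
    name: str
    action: str               # "alert" (manual follow-up) or "deny" (automatic)
    collections: frozenset    # collections the rule protects
    operations: frozenset     # operations the rule watches
    allowed_users: frozenset  # users exempt from the rule

    def matches(self, event):
        return (event.collection in self.collections
                and event.operation in self.operations
                and event.user not in self.allowed_users)


def evaluate(rules, event):
    """Return (decision, matched rule names); 'deny' outranks 'alert'."""
    matched = [rule for rule in rules if rule.matches(event)]
    names = [rule.name for rule in matched]
    if any(rule.action == "deny" for rule in matched):
        return "deny", names
    if matched:
        return "alert", names
    return "allow", names


rules = [
    Rule("payroll-read", "alert", frozenset({"payroll"}),
         frozenset({"read"}), frozenset({"hr_app"})),
    Rule("payroll-drop", "deny", frozenset({"payroll"}),
         frozenset({"drop"}), frozenset()),
]

allowed = evaluate(rules, Event("hr_app", "payroll", "read"))
alerted = evaluate(rules, Event("dba_alice", "payroll", "read"))
denied = evaluate(rules, Event("dba_alice", "payroll", "drop"))
```

Keeping the matched rule names alongside the decision also gives you an audit trail, which helps when the monitoring is driven by compliance requirements.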
Next Steps
Protection of big data is not trivial. The variety of big data databases, architectures, applications and technologies means that you’ll need a comprehensive solution to secure your big data environment. Look for a solution that offers database discovery and assessment, breach prevention against insider threats, and vulnerability testing, along with ongoing monitoring and auditing to identify inappropriate or suspicious activity.
Once you take steps in this direction you’ll be better equipped for the ever-evolving world of big data.