Databricks, the company founded by the team that created Apache Spark, today announced the completion of the first phase of the Databricks Enterprise Security (DBES) framework.
In making the announcement at Spark Summit 2016 in San Francisco, Databricks said this move makes it the first company to provide end-to-end enterprise security for Apache Spark.
DBES combines encryption, integrated identity management, role-based access control, data governance and compliance standards to secure Apache Spark workloads.
“ESG research shows the number one attribute sought in evaluating a big data/analytics solution is now security,” Nik Rouda, senior analyst at Enterprise Strategy Group (ESG), said in a statement. “As Apache Spark grows rapidly in production environments, satisfying the stringent operational requirements of the enterprise becomes critical. Databricks is accelerating the maturity of their just-in-time data platform built on top of open-source Apache Spark in important ways.”
DBES builds on Databricks' existing access management and encryption functionality, Dave Wang, director of product marketing at Databricks, said in a blog post. "With the completion of DBES Phase One today, enterprises gain the ability to control access to Apache Spark clusters on an individual basis, manage user identity with a SAML 2.0 compatible identity management provider service, and end-to-end auditability," he said.
The new security framework provides strong encryption for data at rest and in flight, with support for standards such as Secure Sockets Layer (SSL) and for keys stored in the AWS Key Management Service (KMS). It is also designed to provide integrated identity management, with seamless integration with enterprise identity providers via SAML 2.0 and Active Directory. In addition, DBES provides role-based access control, enabling fine-grained access management for every component of the enterprise data infrastructure, including files, clusters, code, application deployments, dashboards and reports.
Regarding data governance, DBES guarantees the ability to monitor and audit every action taken across the enterprise data infrastructure. On compliance, the company said that, as part of its ongoing DBES strategy, it aims to achieve security compliance that exceeds the high standards of FedRAMP, as well as compliance with HIPAA (the Health Insurance Portability and Accountability Act) and Sarbanes-Oxley.
“End-to-end security requirements are top-of-mind for today’s enterprises that are building advanced analytics solutions,” Ali Ghodsi, CEO at Databricks, said in a statement. “Yet building a truly secure, multi-tenant, and cloud-based enterprise data platform proves to be an impossible undertaking for most. We’re delighted to be the first vendor to solve this problem comprehensively for Apache Spark, allowing enterprises to maximize the value from their data without compromising compliance and security.”
DBES also features cluster access control lists, single sign-on support and audit logs to monitor usage patterns.
“Databricks’ vision is to empower anyone to easily build and deploy advanced analytics solutions,” Wang said. “With the Databricks Enterprise Security Framework, Databricks can satisfy the diverse (and sometimes competing) needs to secure big data in the modern enterprise, end-to-end. Phase One is only the beginning.”
Databricks Secures Apache Spark, Launches Community Edition
Also at Spark Summit, Databricks announced the general availability of Databricks Community Edition (DCE), a free version of the company’s data platform. The announcement comes just four months after the beta launch of Databricks Community Edition at Spark Summit East in New York.
“This year we’ve seen explosive growth for the Apache Spark project and all signs indicate the pace will only accelerate as the community expands even more,” Matei Zaharia, co-founder and CTO at Databricks, said in a statement. “Databricks Community Edition has created an ideal environment for learning Apache Spark. Developers of all backgrounds can now use Databricks Community Edition to learn Spark and mitigate the acute Spark skills gap.”
In a blog post on the GA release, Ion Stoica, executive chairman of Databricks, said more than 8,000 users have signed up for DCE, many of them using the service heavily. The top 10 most active users are averaging more than six hours per week with the platform and are executing more than 10,000 commands on average.
Moreover, DCE is attracting a wide user base, Stoica said. According to a recent survey, 25 percent of DCE users have never used Spark before, and 60 percent of the users are neither data scientists nor data engineers. “This demonstrates the effectiveness of DCE to grow the open-source Apache Spark user community by bringing new users into the fold, as well as its ability to train new data scientists and engineers,” he said.
The same survey also indicates that 90 percent of users employ DCE to learn Apache Spark, "which establishes the role of DCE as a learning platform for Spark. Indeed, since its launch, tens of universities have already used DCE for teaching, including UC Berkeley and Stanford," Stoica said. The GA release also comes with new introductory materials and sample applications to make learning Apache Spark even easier.
“Today’s enterprises have an insatiable demand for data skills, which is exacerbated by the scarcity of qualified talent,” Stoica said in a statement. “Education is in Databricks’ DNA, and our birthplace at the UC Berkeley AMPLab gives the company significant experience in educating students and users. Databricks Community Edition augments that effort as a learning platform. More than 2,200 students have already taken courses using Databricks Community Edition since its beta release, and with its general availability, we expect widespread adoption by universities across the world.”
Databricks Community Edition users will have access to a 6 GB micro-cluster, as well as a cluster manager and the notebook environment for prototyping simple applications. As a learning tool, DCE comes with a portfolio of Apache Spark learning resources, including a set of Massive Open Online Courses (MOOCs) and sample notebooks.