“ESG research shows Apache Spark adoption is poised to grow quickly, with 16% of businesses already in production and another 47% very interested in implementing Spark,” said Nik Rouda, senior analyst, ESG. “As such, Spark will power the next wave of big data. Yet enterprises will demand a robust platform to meet their operational requirements. MapR is helping to accelerate Spark by addressing this need.”
The new distribution enables all advanced analytics, including batch processing, machine learning, procedural SQL, and graph computation. Because Spark runs seamlessly on MapR, it benefits from the platform’s patented enterprise-grade features such as web-scale storage, high availability, mirroring, snapshots, NFS, integrated security, and a global namespace. This native integration makes it the only reliable and production-ready platform for Spark workloads on-premises and in the cloud. Product extensions of the distribution could include real-time streaming and operational analytic capabilities, with MapR-Streams, MapR-DB, and Hadoop as add-ons.
“This is a great example of MapR’s continued commitment to open source Apache Spark,” said John Tripier, senior director of business development, Databricks. “MapR was early to recognise the impact Spark would have on the big data landscape, and we are excited to see them extending the power of Spark for their enterprise customers with this announcement.”
“We’ve built this new distribution to make it easier for customers to leverage the power of Spark for their big data initiatives,” said Anoop Dawar, vice president of product management, MapR Technologies. “We’ve seen significant growth in customers deploying Spark as their primary compute engine. We believe this gives our customers a converged compute and storage engine for batch, analytics, and real-time processing that helps them build and deploy applications rapidly.”