DataOps is an emerging practice adopted by large organisations whose teams of data scientists, developers, and other data professionals train machine learning models and deploy them to production. The goal of a DataOps methodology is to create an agile, self-service workflow that fosters collaboration and creativity while respecting data governance policies. A DataOps practice supports cross-functional collaboration and fast time-to-value. It is characterised both by processes and by enabling technologies, such as the MapR Platform.
“In 6.0, our platform’s unique capabilities focus on three key areas in support of DataOps: automated cluster health and administration, security and data governance, and faster time to machine learning and analytics,” said Anoop Dawar, vice president product management and marketing, MapR. “DataOps is an important movement, ultimately letting organisations turn their data into value as quickly as possible. We continue to evolve the MapR Platform to accommodate the needs of everyone involved with data: data scientists, operations personnel, and security practitioners.”
The new features and updates in Version 6.0 of the MapR Platform deliver the following benefits:
· Automatic Platform Health and Security. To simplify cluster health management and continuous operations, the MapR Platform now includes:
  · A new MapR Control System that administers all data (volumes, tables, and streams) and monitors cluster health, correlating metrics in a single pane of glass. It also includes extensible dashboards for volume metrics such as capacity, throughput, latency, and IOPS. MapR monitoring metrics are now pushed automatically to MapR Event Streams for easy integration with enterprise systems.
  · The recently announced database indexing in MapR-DB, which provides automatic propagation, scaling, and management of indexes.
· Real-time Data Integration. MapR Change Data Capture (CDC) now integrates MapR-DB with MapR-ES out of the box. MapR-ES is a global event streaming system that enables real-time data ingestion and continuous stream processing. Used together, MapR-DB and MapR-ES simplify data integration and allow multiple applications, including advanced machine learning models and deployments, to share information and stay synchronised in real time.
· Secure, Discoverable Data. Users across business lines should be able to quickly find the data they need, or data that could be useful in their analysis, but only if they have appropriate rights to it. Version 6.0 offers new single-click security enhancements, such as enforced authentication and more comprehensive wire encryption, while taking much of the guesswork out of configuring security. MapR is simpler to secure out of the box, which helps lower the probability of a security breach.
· Self-Service Data Science and Artificial Intelligence. Data analysis is increasingly driven by machine learning and artificial intelligence to gain quick, accurate, and actionable insights, and data scientists are a driving force behind the DataOps movement. MapR is making its recently announced Data Science Refinery available for complete, self-service access to all data from within the same cluster.
· Updated MapR Expansion Pack (MEP). MEP 4.0 includes a new MapR Container for Developers, a single-node MapR deployment intended for developers who want to build new applications and services or simply learn more about MapR. It also adds support for the new Apache Myriad 0.2 release, with security improvements and the ability to handle Mesos GPU bids, and enhanced support for Hive on MapR-DB JSON tables.
“Customers adopting the StreamSets Data Operations Platform regularly implement change data capture, or CDC, to efficiently enable event-driven architectures,” said Kirit Basu, director, product management, StreamSets. “Continuing our close collaboration with MapR, we’re excited to integrate StreamSets Data Collector with the new CDC capabilities in the MapR Converged Data Platform, giving our joint customers the utmost flexibility to build real-time pipelines for data-intensive applications.”
Saket Saurabh, CEO of Nexla, a DataOps platform for inter-company data, commented, “DataOps is the backbone of any data-driven enterprise, but too often it lacks the tooling to make it a scalable and repeatable process. In fact, in our recent industry-wide survey of over 300 data professionals, respondents reported that integration, troubleshooting, building data pipelines, and ETL take up 47% of their time. That’s valuable time that could be applied toward analytics that leverage new frameworks like machine learning.”
MapR Converged Data Platform 6.0 is one of the most comprehensive releases MapR has delivered, with technology advancements engineered throughout the platform. Other new features shipping in this release include:
· MapR-DB, a database for global, data-intensive applications, with rich OJAI (Open JSON Application Interface) 2.0 APIs, native secondary indexes, deep integration and optimisation of Apache Drill for SQL analytics and business intelligence, and advanced analytics using native Apache Spark and Apache Hive.
· MapR Orbit Cloud Suite enhancements, offering cloud-scale multi-tenancy, a MapR OpenStack Manila plug-in for tenant self-service provisioning of file shares, and edge-to-cloud file migration for real-time, automatic movement of files from the edge to the cloud (S3).