Talend’s integrated platform now includes new data preparation features for big data that enable all employees to access, cleanse and collaborate on the analysis of massive data sets, as well as an intuitive, self-service Data Stewardship app that helps companies avoid the costly fines and penalties that can result from data integrity issues. The latest version of Talend Data Fabric also includes Spark 2.0 innovations for Talend Big Data and Talend Integration Cloud that allow customers to accelerate business processes and easily upgrade their environments to keep pace with the rapidly changing technology landscape.
Gartner research indicates that “Through 2018, 90 percent of deployed data lakes will become useless as they are overwhelmed with information assets captured for uncertain use cases.”[1] While data lakes have numerous benefits and can often serve as the first step in a company’s digital transformation, they also present new challenges in terms of governance, data quality, lineage, and ubiquitous access.
“Companies need to fundamentally change how they use and share data across their organization to advance their digitization efforts. The beauty of a data lake is that regardless of whether it’s housed in Hadoop, on premises, or in the cloud, you have a centralized repository that allows you to store significantly more information at a lower cost, and extract more insight,” said Ashley Stirrup, chief marketing officer for Talend. “The new version of Talend Data Fabric propels customers to the next phase of their digital evolution by fostering collaboration between IT and the business to scale and transform their data lakes into qualified, trusted data that employees can use to make more informed decisions, faster.”
Data Preparation for Big Data
The latest version of Talend Data Fabric lets IT give business users direct access to data preparation and cleansing, so they can get more value out of corporate data lakes faster. The new data preparation capabilities for Talend Big Data allow customers to:
• Access any data source, whether it’s housed in Hadoop, the cloud or traditional databases, and share it across users and groups to encourage collaboration
• Run preparations at scale using the power of Spark 2.0 and Hadoop
• Utilize a pre-configured data dictionary to auto-recognize the meaning of the raw data from the data lake, as well as augment the dictionary with their own vocabulary, such as product codes or names
• Crowdsource new data definitions from open data and/or the Talend Community
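As a rough illustration of the dictionary-based recognition described in the list above, the sketch below guesses the semantic type of a column by matching its values against a small pattern dictionary, then extends that dictionary with a custom product-code entry. It is not Talend's implementation: the `DICTIONARY` entries, the `recognize` function, the 80 percent threshold and the `PRD-` code format are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical dictionary: semantic type name -> pattern.
# Illustrative only; not Talend's actual data dictionary.
DICTIONARY = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[a-z]{2,}", re.IGNORECASE),
    "us_zip_code": re.compile(r"\d{5}(-\d{4})?"),
    "iso_date": re.compile(r"\d{4}-\d{2}-\d{2}"),
}


def recognize(values, dictionary, threshold=0.8):
    """Return the semantic type matched by at least `threshold` of the
    non-empty values, or None if no type reaches that share."""
    cleaned = [v.strip() for v in values if v and v.strip()]
    if not cleaned:
        return None
    counts = Counter()
    for value in cleaned:
        for name, pattern in dictionary.items():
            if pattern.fullmatch(value):
                counts[name] += 1
    if not counts:
        return None
    name, hits = counts.most_common(1)[0]
    return name if hits / len(cleaned) >= threshold else None


# Augment the dictionary with company-specific vocabulary
# (an assumed product-code format).
custom = dict(DICTIONARY)
custom["product_code"] = re.compile(r"PRD-\d{4}")
```

Calling `recognize(column_values, custom)` on a column of raw strings returns a type name such as `"email"` or `"product_code"`, or `None` when the column is too mixed to label with confidence.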
Data Stewardship: Getting to Good, Clean Data
In today’s increasingly competitive marketplace, the difference between digital leaders and laggards lies in how companies put their data to use. Talend’s new Data Stewardship app is one of the first self-service tools that allow IT and business users to curate and manage data efficiently throughout its lifecycle. With this app, users can quickly resolve many data integrity issues to ensure data in the lake is clean, governed and compliant. It can also help companies stay compliant and avoid the costly fines that can be incurred from a breach of regulatory mandates such as the General Data Protection Regulation or Sarbanes-Oxley. By extending data governance tasks to line-of-business stewards who are most familiar with the data, the new app creates a collaborative environment in which data in the lake is ‘trusted’, spurring broader use.
Using the Data Stewardship app, employees can embed governance into any data integration flow, and isolate subsets of data that require manual curation, arbitration or certification. The app then organizes those tasks as workflows, assigns each one to the business worker best equipped to perform the quality check, and sets rules for which data should be cleansed and validated. The new version of Talend Data Fabric also uses machine learning to discover best practices for data curation from the line-of-business experts and to automate the matching of massive data sets, so that matching jobs can be completed faster and with greater intelligence. Additionally, new support for Apache Atlas gives customers a better understanding of data lineage across Hadoop, helping them manage risk and compliance.
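The curation workflow described above (isolating records that need manual review, then assigning each to the best-placed steward) can be pictured with a minimal sketch. Everything here is hypothetical: the `validate` rules, the `STEWARDS` mapping, the team names and the `Task` structure are assumptions for illustration and do not reflect the Data Stewardship app's actual model or API.

```python
from dataclasses import dataclass

# Hypothetical mapping from a data-quality issue to the team best
# equipped to resolve it.
STEWARDS = {
    "missing email": "sales-ops",
    "unknown country": "finance",
}

KNOWN_COUNTRIES = {"US", "FR", "DE"}


@dataclass
class Task:
    record: dict
    reason: str
    assignee: str


def validate(record):
    """Return a reason string if the record needs manual curation,
    or None if it passes the (assumed) rules."""
    if not record.get("email"):
        return "missing email"
    if record.get("country") not in KNOWN_COUNTRIES:
        return "unknown country"
    return None


def route(records):
    """Split records into clean rows and stewardship tasks."""
    clean, tasks = [], []
    for record in records:
        reason = validate(record)
        if reason is None:
            clean.append(record)
        else:
            tasks.append(Task(record, reason, STEWARDS[reason]))
    return clean, tasks
```

In this sketch, clean records continue through the integration flow untouched, while each flagged record becomes a task carrying both the reason it was flagged and the team assigned to resolve it.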
“Many organizations start data governance initiatives either due to an embarrassing or regulatory incident, or because the line of business workers feel they can’t trust the data. Some organizations also see data governance as an IT problem and not a business problem,” said Stewart Bond, research director of IDC's Data Integration Software service. “The best way to manage data governance is to engage line of business workers in the data stewardship process. Having intimate knowledge of the data empowers users to improve data trust and value through enrichment, cleansing, standardization and certification, increasing confidence in data-driven business decisions.”
Adaptable Investments Provide Peace of Mind
Big data and cloud technologies are evolving rapidly, leaving some customers concerned that the platform purchases they make today may be outdated in a matter of months. Built on open source and adhering to industry standards, Talend Data Fabric can adapt to change more easily than proprietary software solutions. The continuous innovation provided by the open source developer community, as well as multiple big data and cloud partners, helps Talend Data Fabric keep pace with emerging technology advances. Additionally, Talend Data Fabric is a model-driven code generator, which makes it easy to retarget existing work to emerging technologies. For example, generating the code to transition a job or application from Spark 1.6 to Spark 2.0 can be done in just a few clicks. All of these features give customers peace of mind that their technology investments are secure for the long term and won’t need to be replaced every two years.