There are, of course, two sides to getting the best results out of big data resources. One, as the CSW 2014 predictions earlier this month suggested, is the growing need for data scientists. These are the people best placed to identify the real questions a business needs answered if it is to get the most value from its data. The other is to ensure that the application code, the management services and the hardware itself are fully optimised to deliver the best results.
This latter requirement has become the focus of attention at Application Performance Management (APM) specialist Compuware, which has recently spent a good deal of time examining the optimisation issues involved in running Hadoop on Amazon's Elastic MapReduce service.
The result of this work is the Compuware APM solution for Hadoop on Amazon Elastic MapReduce (EMR), now available in the AWS Marketplace. Its goal is to help organisations manage big data at scale, so they gain business value faster and at lower cost. It has been designed to work across the whole application lifecycle, from development, through testing and into production.
At the heart of the solution is PurePath, which uses dynaTrace's patented PurePath Technology to capture timing and code-level context for all transactions, end-to-end: from the user's click, across all tiers, to the database of record and back. With this level of detail, PurePath allows for more accurate reporting, finer business transaction grouping, precise SLA management and what Compuware claims is the fastest path to root cause available on the market.
Compuware APM profiles Amazon EMR jobs, providing drill-down dashboards that can pinpoint the root cause of failed jobs or performance hotspots with a single click. Operations teams gain full visibility into cluster usage by user or job type, enabling them to monitor service level agreements (SLAs) and apply charge-back models to consumers.
By profiling Hadoop jobs in production, operations teams can quickly identify the cause of an issue, whether it is a misconfigured or unbalanced cluster, a poorly coded workload or an under-performing host. Developers, armed with the exact details shared by operations and QA, no longer have to guess how their code will perform when running at massive scale.
“Compuware APM combined with the AWS cloud provides customers with the technical capabilities they need to allow them to focus on their business,” said Terry Hanold, Vice President, Cloud Commerce, AWS.
The Amazon EMR APM solution incorporates a number of useful functions. For example, it profiles Hadoop jobs in production clusters to show which teams are using the cluster and exactly why, down to code level, a job takes minutes or hours to run. It also automatically identifies performance hotspots in Amazon EMR, determining whether a problem stems from poor configuration, over-utilised or failed infrastructure, or inefficient code.
By identifying the root cause of a job failure in one click, the solution removes the need to scour log files. PurePath Technology automatically identifies exceptions, stack traces and logged data, showing both where and why failures occur. These details can then be shared with development teams so that issues are fixed quickly.
“Compuware APM makes application performance management for big data both simple and straightforward across the lifecycle from development to test to production,” said Steve Tack, Vice President of Product Management for Compuware’s APM business unit. “To help customers grappling with this maturing technology where expertise is scarce, Compuware APM for Amazon EMR adds visibility and helps organisations better manage their big data workloads and transactions.”