Whenever the words ‘Big Data’ come up, the discussion turns to how data can be used, managed and stored as a strategic advantage for companies. What is often forgotten is that most organisations do not need the special Big Data applications promoted under this hype. What is useful, and in many cases a necessary prerequisite for the efficient use and analysis of a company’s data, is the virtualisation of that data across the enterprise. The idea rests on the same concept as server and network virtualisation, which have already contributed significantly to business efficiency. By taking this essential step of data virtualisation, businesses are well equipped to handle the petabyte-scale data loads that Big Data can be expected to bring.
So what exactly is Big Data?
The key to understanding Big Data is to accept that it is not a class or type of data. The term describes the analysis of large volumes of varied data, and a broader trend covering new approaches and technologies for storing, processing and analysing it. Such analysis can be useful for businesses looking to understand what people are buying, when, where and how.
Its popularity is such that many see it as the Holy Grail for business today: the means to understand what customers want and to target them in ways that drive profitable sales and growth. The Big Data trend has the potential to revolutionise the IT industry by offering new business insights based on previously ignored and underused data. The UN predicts that over half the world’s population will be connected to the Internet by 2016. That is some 3 billion people potentially connected to social networks such as Facebook and Twitter, providing a wealth of potentially valuable data on customer interests and buying behaviour.
This trend has stimulated an intense debate about how Big Data can help organisations improve customer targeting and drive revenue. Amid the excitement, Big Data is often over-hyped and discussed in a context that overlooks the fact that data is meaningless without intelligent insight. The challenge for users is to steer toward a successful outcome without falling for the hype.
Insight is important. Even though organisations now have access to vast amounts of information, they still need to understand and draw conclusions from complex and unwieldy data. Many fall into the trap of believing that a correlation between data sets is all that is needed. For instance, if you identified a correlation between rising ice cream consumption and an increase in the murder rate during the summer months, you might conclude that one caused the other. A third variable, hotter summer temperatures, is the more likely cause of both. So it is not just about spotting trends between data sets. Whatever data you analyse, you still need to understand cause and effect; otherwise you simply end up with a series of false positives, as the sketch below shows with simulated data.
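To make the point concrete, here is a minimal Python sketch, using entirely made-up numbers, in which a hidden third variable (temperature) drives two otherwise unrelated series and makes them correlate:

```python
import random
import statistics

# Toy model: daily temperature drives both ice cream sales and, in this
# simulation only, the murder rate. The two series correlate even though
# neither causes the other.
random.seed(42)
temperatures = [random.uniform(0, 35) for _ in range(365)]              # degrees C
ice_cream_sales = [50 + 10 * t + random.gauss(0, 30) for t in temperatures]
murders = [2 + 0.1 * t + random.gauss(0, 1) for t in temperatures]

def correlation(xs, ys):
    """Pearson correlation coefficient."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Clearly positive, yet purely an artefact of the shared driver (temperature).
print(correlation(ice_cream_sales, murders))
```

The correlation is real, but the causal story behind it is not, which is exactly the trap described above.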
But let’s go back to basics. Above all else, Big Data is about storing, processing and analysing data that was previously discarded as being too expensive to store and process using traditional database technologies. That includes existing data sources such as web, network and server log data, as well as new data sources such as sensor and other machine-generated data and social media data.
For IT professionals, the opportunity to lead the way in helping organisations store and manage data is key. IDC estimates* that 60% of what is stored in data centres is actually copy data: multiple copies of the same thing, or outdated versions of it. The bulk of this is extra copies of production data created by disparate data protection and management tools for backup, disaster recovery, development and testing, and analytics, as the simple sketch below illustrates.
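This is not how IDC arrived at its figure; the sketch below merely illustrates the basic idea of spotting copy data, by hashing file contents and treating every byte-identical copy beyond the first as redundant. The directory path and the approach are illustrative assumptions only:

```python
import hashlib
import os
from collections import defaultdict

def find_copy_data(root):
    """Group files under `root` by content hash to estimate how much
    stored data is simply a duplicate of something else."""
    by_hash = defaultdict(list)
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    digest = hashlib.sha256(f.read()).hexdigest()
            except OSError:
                continue  # skip unreadable files
            by_hash[digest].append(path)

    total = duplicate = 0
    for paths in by_hash.values():
        sizes = [os.path.getsize(p) for p in paths]
        total += sum(sizes)
        duplicate += sum(sizes[1:])  # every copy beyond the first is redundant
    return total, duplicate

total_bytes, copy_bytes = find_copy_data("/data")   # path is illustrative
if total_bytes:
    print(f"{copy_bytes / total_bytes:.0%} of stored bytes are copies")
```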
It is not marketing that drives Big Data. It is the ability of IT to deliver quality data quickly and at low cost for the business to analyse and interpret. Big Data has arrived, but big insights have not yet followed. The challenge now is to solve the underlying problem and get answers, without repeating the same statistical mistakes on a grander scale than ever.
While many IT experts are focused on how to deal with the mountains of data that are produced by this intentional and unintentional copying, far fewer are addressing the root cause of copy data. In the same way that prevention is better than cure, reducing this weed-like data proliferation should be a priority for businesses.
Enterprise IT heads tend to have similar key strategic priorities: improving resiliency, increasing agility, and moving toward the cloud to make their systems more distributed and scalable. Often they are held back by traditional software and hardware.
Copy data virtualisation, which frees organisations’ data from their legacy physical infrastructure just as virtualisation did for servers a decade ago, is likely to be the way forward. If business divisions work from a single physical ‘golden’ copy that can spawn innumerable virtual copies, those copies take up almost no additional storage space. The sooner companies reduce the creation of physical copies, the less they will have to spend on storage and the quicker they can get to the analysis. A simplified model of the golden-copy approach is sketched below.
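Real copy data virtualisation platforms do this at the storage layer; the toy Python model below only illustrates why virtual copies built on a single golden copy cost almost nothing until something is changed (copy-on-write):

```python
class GoldenCopy:
    """The single physical 'golden' copy of a dataset (toy model)."""
    def __init__(self, blocks):
        self.blocks = list(blocks)       # the only full physical copy

class VirtualCopy:
    """A virtual copy: stores nothing until a block is modified
    (copy-on-write), so many copies add almost no extra storage."""
    def __init__(self, golden):
        self.golden = golden
        self.overrides = {}              # block index -> modified block

    def read(self, i):
        return self.overrides.get(i, self.golden.blocks[i])

    def write(self, i, data):
        self.overrides[i] = data         # only the delta is stored

golden = GoldenCopy(["block-%d" % i for i in range(1000)])
test_env = VirtualCopy(golden)           # e.g. a test/dev copy
analytics = VirtualCopy(golden)          # e.g. an analytics copy
test_env.write(7, "patched")
print(test_env.read(7), analytics.read(7))   # 'patched' vs 'block-7'
```

Each virtual copy behaves like a full dataset of its own, yet only the changed blocks consume new space.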
*http://www.datatrend.com/library/IDC-Mar-2013-Copy-Data-Problem-Analysis.pdf