Delphix has revealed that, since deploying its powerful Data as a Service platform, the European Bioinformatics Institute (EMBL-EBI) has been able to share data from publicly funded scientific research more quickly than ever.
EMBL-EBI manages well over 50 petabytes of data (about 100,000 laptops' worth), and that volume roughly doubles every year. Researchers in medicine, agriculture and environmental science make over 12 million requests per month for this freely available life-science data, which is managed jointly with collaborators in the US and Japan.
Data from genome sequencing uses most of the storage available at EMBL-EBI, and demand in this area of science is growing rapidly as the price of the supporting technology continues to fall. Scientists at EMBL-EBI regularly add information about genome sequencing and other data types, and need to find innovative ways to improve database efficiency and scalability.
EMBL-EBI uses Delphix technology to virtualise databases so that production teams can prepare and release research data faster and more frequently than ever before.
“The collection, curation and release of reference genome data is vital for research activities worldwide – especially in the area of personalised medicine, which will drive future healthcare. However, the sheer size and complexity of the data we host makes it increasingly difficult to move, both internally and externally,” says Steven Newhouse, Head of Technical Services at EMBL-EBI.
It used to take up to three months to prepare a data release. Much of this time was spent passing copies of databases from one team to another, adding extra information about different molecules and interactions along the way. Now, the process is many times faster.
EMBL-EBI initially deployed Delphix Data as a Service three years ago, and now hosts over 50 virtual data environments supporting test and development operations. Engineers can get the data they want on demand, and retrieve it from any point in time without resorting to archives and the long, painful process of reviving historical data.
“Delphix enables each developer to deploy his or her own temporary environment on demand, to do independent exploratory work, benchmarking or development. This kind of agility was just not possible without Delphix,” says Manuela Menchi of EMBL-EBI’s Database Team. “Now, our DBA resources are freed up so they can focus on higher-value tasks.”
The public data at EMBL-EBI is distributed over three data centres, with databases and files replicated regularly across these locations. Individual datasets can be as large as seven petabytes, and the associated metadata is stored in over 500 databases held across Oracle, MySQL, PostgreSQL and NoSQL systems.
When fully deployed, Delphix is expected to reduce the total database storage footprint by 70%.
Future plans for Delphix include increased consolidation of data infrastructure, replication between data centres and backup-and-recovery enhancement.
“Our projection is that Delphix will reduce the data release timeframe by 20%, allowing some data resources to make an extra release – or more – every year with no additional development or curation staff. Reputation is the currency of research, and our users demand reliable and fast delivery. Delphix Data as a Service allows us to release our data faster and more frequently,” concluded Newhouse.
“We are extremely proud to be helping EMBL-EBI provide invaluable services to scientists everywhere,” said Iain Chidgey, EMEA General Manager of Delphix. “The massive growth in data is a major bottleneck to an organisation’s output and productivity. We’re delighted that EMBL-EBI can demonstrate how our Data as a Service removes this constraint and increases agility.”