SciNet relies on Excelero

Burst buffer application at Canada’s largest supercomputer centre uses NVMesh® to achieve unheard-of bandwidth and cost efficiency via pooled NVMe within the GPFS shared parallel file system.

Excelero customer SciNet has deployed Excelero’s NVMesh™ server SAN as the highly efficient, cost-effective storage behind a new supercomputer at the University of Toronto. By using NVMesh for a burst buffer, a storage architecture that helps ensure high availability and high ROI, SciNet created a unified pool of distributed high-performance NVMe flash that retains the speed and latency of directly attached storage media while meeting the demanding service level agreements (SLAs) for the new supercomputer.
 
“For SciNet, NVMesh is an extremely cost-effective method of achieving unheard-of burst buffer bandwidth,” said Dr. Daniel Gruner, chief technical officer, SciNet High Performance Computing Consortium. “By adding commodity flash drives and NVMesh software to compute nodes, and to a low-latency network fabric that was already provided for the supercomputer itself, NVMesh provides redundancy without impacting target CPUs. This enables standard servers to go beyond their usual role in acting as block targets – the servers now can also act as file servers.”
 
Based in Toronto, SciNet, Canada’s largest supercomputer centre, serves thousands of researchers in biomedical, aerospace, climate sciences, and more. Their large-scale modelling, simulation, analysis and visualisation applications sometimes run for weeks, and an interruption can destroy the result of an entire job. To limit the cost of such interruptions, SciNet implemented a burst buffer – a fast intermediate layer between the non-persistent memory of the compute nodes and the storage – to enable fast checkpointing, so that computing jobs can be easily restarted. SciNet had deployed the Spectrum Scale (GPFS) shared parallel file system on their spinning disk system, but at scale, as individual jobs become larger, checkpointing may take too long to complete, making the calculation difficult or even impossible to carry out.
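The checkpoint-and-restart pattern described above can be illustrated with a minimal Python sketch. It is not SciNet's implementation: the function names, the JSON state format and the simulated failure are all assumptions made for illustration, and a real HPC job would write far larger state through the parallel file system.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state atomically: temp file in the same directory, fsync, rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())
        # os.replace is atomic, so a crash never leaves a half-written checkpoint.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_checkpoint(path):
    """Return the last saved state, or None if no checkpoint exists yet."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return json.load(f)


def run_job(path, total_steps, checkpoint_every, fail_at=None):
    """Toy long-running job: sums step indices, checkpointing periodically.

    fail_at is a hypothetical knob that simulates an interruption.
    """
    state = load_checkpoint(path) or {"step": 0, "total": 0}
    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated interruption")
        state["total"] += state["step"]
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    return state["total"]
```

If the job is interrupted, calling `run_job` again with the same path resumes from the last completed checkpoint rather than from step 0. The faster the checkpoint tier, the more often a job can afford to save its state, and the less work is lost when it has to restart.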
 
Using Excelero’s NVMesh in a burst buffer implementation, SciNet created a petascale storage system that leverages the full performance of NVMe SSDs at scale, over the network – easily meeting SLA requirements for completing checkpoints in 15 minutes, without needing costly proprietary arrays. With NVMesh, SciNet created a unified, distributed pool of NVMe flash storage comprising 80 NVMe devices in just 10 NSD protocol-supporting servers. This provided approximately 148 GB/s of write burst (device limited) and 230 GB/s of read throughput (network limited) – in addition to well over 20M random 4K IOPS.
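To put the quoted figures in perspective, the following back-of-the-envelope Python sketch derives per-device and per-server rates and the amount of data that could be written within the 15-minute checkpoint window. It assumes decimal units (1 TB = 1000 GB) and that the peak burst rate is sustained for the whole window, neither of which the source states, so the results are rough upper bounds only.

```python
# Figures quoted in the article.
DEVICES = 80                  # NVMe devices in the pool
SERVERS = 10                  # NSD servers hosting them
WRITE_GBPS = 148.0            # write burst, GB/s (device limited)
READ_GBPS = 230.0             # read throughput, GB/s (network limited)
CHECKPOINT_WINDOW_S = 15 * 60 # SLA window for a checkpoint, in seconds


def per_device_write_gbps():
    """Average write bandwidth each NVMe device contributes."""
    return WRITE_GBPS / DEVICES


def per_server_read_gbps():
    """Average read bandwidth each server delivers over the network."""
    return READ_GBPS / SERVERS


def max_checkpoint_tb(window_s=CHECKPOINT_WINDOW_S, write_gbps=WRITE_GBPS):
    """Upper bound on checkpoint size (TB) writable within the window.

    Assumes the burst rate is sustained and 1 TB = 1000 GB.
    """
    return write_gbps * window_s / 1000.0


def checkpoint_seconds(size_tb, write_gbps=WRITE_GBPS):
    """Time to write a checkpoint of size_tb terabytes at the given rate."""
    return size_tb * 1000.0 / write_gbps


if __name__ == "__main__":
    print(f"devices per server:    {DEVICES // SERVERS}")
    print(f"write per device:      {per_device_write_gbps():.2f} GB/s")
    print(f"read per server:       {per_server_read_gbps():.1f} GB/s")
    print(f"max 15-min checkpoint: {max_checkpoint_tb():.1f} TB")
```

Under these assumptions each device contributes about 1.85 GB/s of write bandwidth, each server delivers about 23 GB/s of reads, and roughly 133 TB could be checkpointed inside the 15-minute window.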
 
Emulating the “shared nothing” architectures of the tech giants, SciNet’s NVMesh deployment allows the centre to use hardware from any storage, server and networking vendor, eliminating vendor lock-in. Integration with SciNet’s parallel file system is straightforward, and the system enables SciNet to scale both capacity and performance linearly as its research load grows.
 
“Mellanox interconnect solutions include smart and scalable NVMe accelerations that enable users to maximise their storage performance and efficiency,” said Gilad Shainer, vice president of marketing at Mellanox Technologies. “Leveraging the advantages of InfiniBand, Excelero delivers world leading NVMe platforms, accelerating the next generations of supercomputers.”
 
“In supercomputing any unavailability wastes time, reduces the availability score of the system and impedes the progress of scientific exploration. We’re delighted to provide SciNet and its researchers with important storage functionality that achieves the highest performance available in the industry at a significantly reduced price – while assuring vital scientific research can progress swiftly,” said Lior Gal, CEO and co-founder at Excelero.
