Saving Time and Money with Flywheel HPC Integration
By Nathaniel Kofalt
Flywheel, a comprehensive research data platform for medical imaging, machine learning, and clinical trials, supports computing on High Performance Computing (HPC) clusters, including Slurm and SGE, in addition to more traditional virtual machine (VM)-based deployments in the cloud or on-premises.
As capital investments (frequently in the millions of dollars), HPC systems constitute an enormous opportunity as a local, shared resource for an organization. At the same time, these systems are often difficult or confusing to use, due to their specialized nature and older technology base. Access is frequently restricted, and the workflow for running software on an HPC cluster differs significantly from that of a traditional machine, due to the “drop off & pick up” nature of the interaction.
Furthermore, debugging tends to have an extremely long turnaround time, due to the system’s fluctuating and inscrutable job queue. This tends to result in idle cluster capacity. Flywheel can increase HPC utilization by making the system more accessible to a larger user community.
During beta testing alone, one Flywheel customer estimated savings in excess of $7,000 over a period of one month, by moving some of their compute-intensive workloads from cloud hosting to their university-sponsored HPC. In that single month, they ran more than four months’ worth of single-machine, eight-core compute, using capacity that would otherwise have sat idle on their cluster.
With Flywheel, scientific algorithms run in OCI-compliant (Docker, etc.) containers, called Gears. When using Flywheel with the new HPC integration, customers work directly with us to whitelist specific Gears for this feature, but still access the same point-and-click experience available to all Gears. The Flywheel system translates the request into the cluster’s system-specific format, submits the HPC job using the Singularity container runtime, waits for the HPC queue to pick up the work, and marshals input and output data to and from the system.
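To make the translation step concrete, a generated job for a Slurm cluster might look roughly like the following batch script. This is an illustrative sketch, not Flywheel's actual output: the gear image name, resource requests, scratch paths, and bind mounts shown here are assumptions.

```shell
#!/bin/bash
# Hypothetical Slurm submission script for a containerized Gear.
# Resource requests below are example values, not Flywheel defaults.
#SBATCH --job-name=flywheel-gear
#SBATCH --cpus-per-task=8
#SBATCH --mem=16G
#SBATCH --time=04:00:00
#SBATCH --output=gear-%j.log

# Run the Gear's container image under Singularity, binding staged
# input and output directories into the container. The image URI and
# bind paths are placeholders for this example.
singularity run \
  --bind "$SCRATCH/gear-input:/flywheel/v0/input" \
  --bind "$SCRATCH/gear-output:/flywheel/v0/output" \
  docker://example/gear-image:latest
```

Because Singularity can pull and run OCI/Docker images without root privileges, the same Gear container that runs in the cloud can run unmodified on the cluster, with the scheduler handling queueing and resource allocation.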
The result is that all of Flywheel’s computation management features – such as batch jobs, SDK integration, and Gear Rules – work out of the box on HPC systems or local hardware, with great potential for improving productivity and reducing costs.
“The Flywheel integration with the HPC at Penn has been a total game-changer. It allows us to leverage the complementary advantages of two powerful systems. By launching compute jobs as containerized Gears through Flywheel, we can ensure total reproducibility. Furthermore, by integrating Flywheel with the massive computational resources provided by the Penn HPC, run by Christos Davatzikos, we can run computationally demanding jobs at scale across large samples without worrying about cloud compute charges. Throughout, the Flywheel engineering team was incredibly responsive; it was really a model for successful collaboration.”
– Ted Satterthwaite, MD, Assistant Professor in the Department of Psychiatry at the University of Pennsylvania Perelman School of Medicine