Computing with Flywheel

I am often asked to explain how Flywheel supports a broad range of computational workflows, including:

  • Working with existing pipelines
  • Exploratory development and analysis
  • Automating routine processing

Flywheel offers an open and extensible approach that gives you the flexibility to work in the manner that makes sense for your lab or project.

Working with Existing Processing Pipelines

The simplest approach for working with existing pipelines is to download the required data from Flywheel and process it as usual. Flywheel provides several download options, including the web-based UI and command-line tools. For more control over selecting and formatting data, Flywheel provides easy-to-use programming interfaces for leading scientific languages, including Python, MATLAB, and R. These can be used to access, format, and download any data or metadata in the Flywheel database.
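
As a sketch of what the SDK route looks like, the snippet below uses the Python SDK to walk a project and download its NIfTI files to disk. The API key, group and project labels, and output directory are placeholders, and exact method names can vary slightly between SDK versions.

    # Minimal, illustrative download script using the Flywheel Python SDK.
    import os
    import flywheel

    fw = flywheel.Client(os.environ["FW_API_KEY"])   # authenticate with a site API key
    project = fw.lookup("my-group/my-project")       # resolve a project by group/label

    out_dir = "downloads"
    os.makedirs(out_dir, exist_ok=True)

    # Walk sessions and acquisitions, downloading every NIfTI file.
    for session in project.sessions.iter():
        for acq in session.acquisitions.iter():
            for f in acq.files:
                if f.name.endswith(".nii.gz"):
                    acq.download_file(f.name, os.path.join(out_dir, f.name))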

Exploratory Development and Analysis

For developing new algorithms or pipelines, Flywheel’s Python and MATLAB SDKs provide a powerful alternative to downloading to disk. Using the SDKs, a Python or MATLAB user may work with data in Flywheel directly from their preferred scripting language. Full search is available along with simple commands for reading and writing data and metadata.
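
The sketch below illustrates this style of work: filtering sessions in a project, reading labels and timestamps, and writing custom metadata back, all without downloading files. The labels, filter string, and metadata keys are illustrative.

    # Exploratory work with the Flywheel Python SDK: query, read, and write metadata.
    import flywheel

    fw = flywheel.Client("my-api-key")
    project = fw.lookup("my-group/my-project")

    # Select sessions with a filter instead of pulling the whole project.
    recent = project.sessions.find("created>2023-01-01")

    for session in recent:
        print(session.label, session.subject.label, session.timestamp)

        # Write structured custom metadata back to the session record.
        session.update_info({"qc": {"reviewed": True, "reviewer": "analyst-1"}})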

Routine Processing with Plug-In Applications (Gears)

Gears are plug-in applications that automate routine tasks, including metadata extraction, classification, quality assurance, format conversion, and full analytic pipelines. Here’s how Gears work:

Leveraging Standard OCI-Compliant Containers

From a technical perspective, Gears are applications running in standard OCI-compliant containers (Docker, Singularity, etc.) that are managed by Flywheel. A container typically packages the application code and all of its dependencies, creating a portable, reproducible unit of processing. Containers can easily be made into Gears by adding metadata that tells Flywheel how to use the containerized application. This metadata is expressed in a simple JSON file that includes descriptive information, such as links to source code and authors, along with instructions for passing in data, setting configuration options, and executing commands in the container.
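
An abbreviated manifest might look like the following. The field names follow the published Gear specification, but the values here are placeholders rather than a real Gear.

    {
      "name": "example-dicom-converter",
      "label": "Example DICOM Converter",
      "description": "Illustrative Gear manifest (abbreviated).",
      "version": "0.1.0",
      "author": "Example Lab",
      "license": "MIT",
      "source": "https://example.org/source",
      "url": "https://example.org/docs",
      "inputs": {
        "dicom": {
          "base": "file",
          "description": "Input DICOM archive",
          "type": { "enum": ["dicom"] }
        }
      },
      "config": {
        "compress": {
          "type": "boolean",
          "default": true,
          "description": "Write compressed NIfTI output"
        }
      },
      "command": "python /flywheel/v0/run.py"
    }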

Automating and Scaling Gear Execution

Gears may be run in a variety of ways. They may be executed on demand for a given dataset or in batch mode across a selected collection of datasets; in these cases, the user is prompted for inputs prior to execution. Gears may also be run automatically by rules configured for a project. For example, a rule can automatically classify a newly uploaded DICOM series and convert it to NIfTI. Gear rules may be used to automate routine preprocessing as well as trigger complex pipelines. Gears may also be scheduled by tasks outside of Flywheel using the command-line tool (CLI) or the programming interfaces. Finally, when deployed in cloud or private cloud infrastructures, Flywheel can dynamically scale resources to maximize parallel processing and save you time.
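
For example, the following sketch queues a converter Gear against each DICOM acquisition in a session from the Python SDK instead of the web UI. The Gear name, input key, and lookup paths are illustrative, and the run() helper’s exact signature may vary by SDK version.

    # Launch a Gear programmatically (illustrative names and paths).
    import flywheel

    fw = flywheel.Client("my-api-key")
    gear = fw.lookup("gears/dcm2niix")                         # a converter Gear installed on the site
    session = fw.lookup("my-group/my-project/sub-01/ses-01")

    for acq in session.acquisitions.iter():
        dicoms = [f for f in acq.files if f.type == "dicom"]
        if dicoms:
            # Queue one job per DICOM acquisition; outputs return to the acquisition.
            gear.run(inputs={"dcm2niix_input": dicoms[0]}, destination=acq)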

Process Any Level of Data in Your Project

Gears may be designed to process data at different levels of the Flywheel project hierarchy. They may process individual sessions (exams/DICOM studies). For longitudinal studies, Gears may be run at the subject (participant/patient) level, with the ability to process data from multiple sessions. Finally, project-level Gears may be used to perform group or cohort analyses across all subjects.
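
In SDK terms, the hierarchy level is simply the destination container the Gear is launched against, as in this hedged sketch (the analysis Gear, labels, and paths are hypothetical):

    # Run a hypothetical analysis Gear at different levels of the hierarchy.
    import flywheel

    fw = flywheel.Client("my-api-key")
    gear = fw.lookup("gears/my-analysis-gear")          # hypothetical analysis Gear

    subject = fw.lookup("my-group/my-project/sub-01")   # subject level (longitudinal)
    project = fw.lookup("my-group/my-project")          # project level (group/cohort)

    gear.run(destination=subject, analysis_label="longitudinal-model-sub-01")
    gear.run(destination=project, analysis_label="group-analysis-all-subjects")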

Automated Provenance

A key advantage of using Gears to manage routine processing is the documentation that results. Every time a Gear is run, Flywheel records detailed provenance information that supports the consistency and reproducibility of your project. These “Analysis” documents record the Gear version, who ran it, when it ran, success/fail status, inputs, configuration options used, and outputs produced. Further, they may be annotated with notes or structured JSON metadata to meet your project’s needs. This provenance makes it easy to verify that all necessary processing steps were performed, and performed consistently.
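
A short sketch of reading this provenance and annotating it through the Python SDK follows; attribute names reflect recent SDK versions and may differ slightly on your site, and the labels are placeholders.

    # Inspect analysis provenance on a session and attach structured notes.
    import flywheel

    fw = flywheel.Client("my-api-key")
    session = fw.lookup("my-group/my-project/sub-01/ses-01").reload()

    for analysis in session.analyses or []:
        gear_info = analysis.gear_info   # records which Gear (and version) produced it
        print(analysis.label, gear_info.name, gear_info.version, analysis.created)
        print("  outputs:", [f.name for f in (analysis.files or [])])

        # Add project-specific structured metadata to the analysis record.
        analysis.update_info({"review": {"approved": True, "by": "pi"}})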

Flywheel Gear Exchange

To speed project deployment, Flywheel provides a library of commonly used algorithms as Gears via the Flywheel Gear Exchange. The Gear Exchange currently contains roughly 70 Gears contributed by Flywheel or Flywheel users. Examples include DICOM-to-NIfTI conversion, FreeSurfer recon-all, the Human Connectome Project pipelines, and commonly used BIDS applications such as MRIQC and fMRIPrep. The Gear Exchange provides a powerful way to share reproducible units of code that may be used as building blocks for new projects.

User-Developed Custom Gears

Users may easily create their own Gears as well. Gear developers simply get their code running in an OCI-compliant container and provide the Gear metadata. Applications may be developed in any language. Flywheel’s APIs and SDKs may be used inside a Gear if needed; otherwise, the containerized application need not be Flywheel-aware.
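
For concreteness, here is a minimal, illustrative entrypoint for a Python-based Gear. It assumes the standard /flywheel/v0 runtime layout, in which Flywheel mounts inputs and the resolved configuration into the container and collects whatever is written to the output directory; the input name "dicom" matches the illustrative manifest shown earlier.

    # run.py -- minimal illustrative Gear entrypoint (assumes the /flywheel/v0 layout).
    import json
    import shutil
    from pathlib import Path

    BASE = Path("/flywheel/v0")

    # Flywheel writes the resolved config values and input file details here.
    context = json.loads((BASE / "config.json").read_text())
    config = context["config"]
    dicom_path = Path(context["inputs"]["dicom"]["location"]["path"])

    # Real processing would happen here; this sketch just copies the input through.
    out_dir = BASE / "output"
    out_dir.mkdir(exist_ok=True)
    shutil.copy(dicom_path, out_dir / dicom_path.name)

    print("done; compress =", config.get("compress"))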

Flywheel streamlines the creation of Gear metadata via the CLI Gear Builder tool, which walks the user through the required information and generates most of the metadata automatically. The resulting Gears may be shared with other Flywheel sites via the Flywheel Gear Exchange, or kept private by uploading them only to the user’s site. Flywheel does not make any claim on the intellectual property in customer Gears.

Conclusion

Flywheel makes it easy to work the way you want. Our open CLI, APIs, and SDKs let you download data and keep using your existing processes, while our Gears framework automates routine processing consistently, with extensive documentation to support quality and reproducibility.

Read more about our scientific collaborations or send us your questions!


Why a Research-First Platform for Imaging Informatics and Machine Learning?

It's no secret that researchers face many challenges that impede the research and development of artificial intelligence (AI) solutions in clinical settings. Machine learning requires large volumes of data for accuracy in most applications. Institutions often have a wealth of data but lack the systems needed to get it into the hands of researchers cost-effectively.

Those data must be of high quality and labeled correctly. Imaging projects often involve complex preprocessing to identify and extract features and biomarkers. To further complicate matters, security and privacy are critical, particularly when involving collaboration outside of the context of clinical care.

Unfortunately, established clinical solutions fail to address six critical needs of researchers, impeding research productivity and slowing innovation.

Multimodality

Imaging offers significant opportunities for machine learning, but imaging is often not enough. Given that so much of today's research is centered on precision medicine and opportunities to revolutionize the cost and quality of care, researchers often require a 360° view of patients, including EMR, digital pathology, EEG, -omics, and other data. Clinical imaging systems such as PACS and vendor-neutral archives (VNAs) are designed specifically for imaging and typically don't deal well with nonimaging data, particularly in the context of research workflows.

Cohorts, projects, and IRB compliance

Researchers require the ability to organize and analyze data in cohorts while enabling collaboration with others outside of the context of clinical care. Clinical imaging systems are designed for individual patient care, not for cohort or population health studies, and often lack the organizational structures required for research applications such as machine learning. Institutional review boards (IRBs) typically define, for a given project, the scope of allowed data as well as the people authorized to work with that data. Modern research informatics systems must enable productive workflows while enforcing these IRB constraints.

Quality assurance

Machine learning can be highly sensitive to the quality of the data. Researchers must be able to confirm the quality of data, including completeness and consistency with the protocol defined for the study. Quality control and supporting documentation are required for scientific reproducibility and for processes such as U.S. Food and Drug Administration (FDA) approval. Consequently, modern informatics systems must incorporate comprehensive support for quality assurance as part of the workflow.

Integrated labeling and annotation workflows

Machine learning depends on accurately labeled sample datasets in order to effectively train AI models. Real-world data, often originating from multiple sources, generally lack the structure and consistent labels required to directly support training. Modern imaging informatics solutions must provide the ability to efficiently organize and classify data for search and selection into the appropriate projects or machine-learning applications. Labeling workflows must be supported, including the ability to normalize classification of images and other factors such as disease indications. In the context of imaging, this may involve image annotations collected from radiologists or other experts in a consistent, machine-readable manner via blind multireader studies or similar workflows.

Automated computational workflows

Imaging and machine learning are computationally intensive activities. Research informatics platforms must automate and scale computational workflows ranging from basic image preprocessing to analytic pipelines and training AI models. The ability to rapidly define and integrate new processes using modern tools and technologies is critical for productivity and sustainability. These systems must also provide the ability to leverage diverse private cloud, public cloud, and high-performance computing (HPC) infrastructures to achieve the performance required to process large cohorts cost-effectively.

Integrated data privacy

Data privacy is critical. Compliance with regulations such as HIPAA and GDPR is a must, given the potential financial and ethical risks. However, the lack of scalable systems for ensuring data privacy is impeding researcher access to data and, therefore, slowing innovation and the related benefits. Modern research informatics solutions must systematically address data privacy. Regulations require that protected health information be deidentified to the minimum level of identification needed for the intended use. However, that minimum may differ by project. Consequently, informatics solutions must integrate deidentification and related data privacy measures in a way that meets the needs of projects with different requirements while maintaining compliance.

Data as a strategic asset with FAIR

Data is the key to clinical research and machine learning. A scalable, systematic approach to research data management should be the foundation of research strategies aimed at machine learning and precision care. Cost-effectively scaling access to clinical data in a manner that supports research workflows while ensuring security and data privacy can improve research productivity, accelerate innovation, and enable research organizations to realize their strategic potential.

Implementing the FAIR principles in your organization helps maximize the strategic value of data that exists in your institution. These principles, developed by academics, agency professionals, and industry members, amplify the value of data by making it Findable, Accessible, Interoperable, and Reusable (FAIR).

  • Findable data are labeled and annotated with rich metadata, and the metadata are searchable.
  • Accessible data are open to researchers with the correct authorization, and the metadata persist even after data are gone.
  • Interoperable data follow standards for storing information and can operate with other metadata and systems.
  • Reusable data are well-described and well-tracked with provenance for computation and processing.

Modern informatics systems should deliver on the FAIR principles while supporting the workflow needs of researchers as described above.

A clinical research platform designed to enhance productivity and accelerate innovation

Flywheel is a new class of informatics platform that addresses the unique needs of researchers involved in imaging and machine learning. Deployed at leading research institutions around the world, Flywheel supports the entire research workflow including capture, curation, computation, and collaboration, plus compliance at each step.

Capture

Flywheel is designed for true multimodality research. While the system specializes in the unique data types and workflows associated with imaging, the platform is capable of managing nonimaging data such as EMR, digital pathology, EEG, genomics, or any other file-based data. Further, Flywheel can automate data capture from imaging modalities as well as from clinical PACS and VNAs, streamlining research workflows and translational testing scenarios.

Curate

Flywheel is unique in its ability to organize and curate research data in cohort-centric projects. The platform provides extensive tools for managing metadata including classification and labeling. Quality assurance is supported through project templates and automation rules. Integrated viewers with image annotation and persistent regions of interest (ROIs) are provided to support blind multireader studies and related machine-learning workflows. Powerful search options with access to all standard or custom metadata are provided to support the FAIR principles.

Compute

Flywheel provides comprehensive tools to automate routine processing, ranging from simple preprocessing to full analytic pipelines and training machine-learning models. The platform scales computational workloads using industry-standard "containerized" applications referred to as "Gears." Gears may originate from Flywheel's Gear Exchange containing ready-to-use applications for common workflows or may be user-provided custom applications. The platform supports elastic scaling of workloads to maximize performance and productivity. Gears automate capture of provenance to support scientific reproducibility and regulatory approvals. Further, Flywheel helps you work with existing pipelines external to the system with powerful APIs and tools for leading scientific programming languages, including Python, MATLAB, and R.

Collaborate

Collaboration is enabled through secure, IRB-compliant projects. Collaboration may be within an institution or across the globe for applications such as clinical trials or multicenter studies. Flywheel projects provide role-based access controls to authorize access and control sharing of data and algorithms. Data may be reused across project boundaries for applications such as machine learning, which require as much data as possible.

Compliance

Flywheel helps reduce security and data privacy risks by providing a secure, regulatory-compliant infrastructure for systematically scaling research data management according to HIPAA and GDPR requirements. The platform provides integrated tools for deidentifying research data to protect personal health information.

A research-first platform answers the challenges to implementing AI

Flywheel's innovative research informatics platform helps you maximize the value of your data and serves as the backbone of your imaging research and machine learning strategy. Flywheel overcomes the limitations of systems designed for clinical operations to meet the unique needs of researchers. The result is improved collaboration and data sharing and reuse. Ultimately, Flywheel improves research productivity and accelerates innovation.

The original article can be found on Aunt Minnie.