Improved Collaborative Workflows with Custom Roles and Permissions

Flywheel is committed to providing customization tools for a secure, collaborative workflow. Previously, Flywheel offered fixed, predefined roles and permissions that administrators matched to site users and project collaborators. Now, administrators have complete control over user permissions and can define project roles through a simple interface.

Tailor Your Workflows With Custom Roles and Permissions

The Custom Roles and Permissions interface enables you to: 

Align the Flywheel system with specific responsibilities of the users. Select user capabilities for project management, access to files and metadata, and computational permissions.

Ensure your workflow is consistent with your organization’s policies. Define roles that ensure your research process follows organizational policies for viewing, modifying, and deleting data.

Implement fine-grained control to prevent unauthorized use and reduce risk. Ensure data integrity by entrusting only specific users with the ability to modify data.

Easily coordinate responsibilities in multi-site collaboration. Grant collaborators the permissions they need while observing each institution’s procedures.

Flexible Controls Enable a Variety of Applications

For example, here’s how you might use custom roles and permissions:

  • Data Managers in clinical trials can be restricted from viewing or modifying analyses.
  • A statistician role can be created with permissions to run gears and perform analyses, but without the ability to delete or modify the underlying data.
  • A compliance coordinator role can be created with view-only access to data and metadata, sufficient to confirm that project contents are valid and complete.

Powerful, Easy-to-Use Controls

Custom roles are defined at the site level, enabling consistent permission sets across the site. Flywheel permission controls include a user’s level of access to data, which data those permissions apply to, and other key operations such as running analyses or downloading data.

Creating an "Analyst” role with limited project permissions but has the ability to work with analyses

Research groups may then select the site-defined roles that fit their workflow. Users are assigned one or more roles at the project level.

Setting roles at the project level - Note that users can be assigned multiple roles

 

You may find additional information about setting User Roles & Permissions in our documentation.


Computing with Flywheel

I am often asked to explain how Flywheel supports a broad range of computational workflows, including:

  • Working with existing pipelines
  • Exploratory development and analysis
  • Automating routine processing

Flywheel offers an open and extensible approach that provides you the flexibility to work in the manner that makes sense for your lab or project.

Working with existing processing pipelines

The simplest approach for working with existing pipelines involves downloading the required data from Flywheel and processing it as usual. Flywheel provides several download options including the web-based UI and command line tools. For more control over selecting and formatting data, Flywheel provides easy-to-use programming interfaces for use with leading scientific languages including Python, MATLAB, and R. These may be used to access, format, and download any data or metadata in the Flywheel database.  
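
As a rough illustration, a bulk download script using the Flywheel Python SDK might look like the sketch below; the project path, file suffix, and destination directory are hypothetical, and exact method names may vary by SDK version.

    import flywheel

    # Connect to a Flywheel site with an API key (placeholder value).
    fw = flywheel.Client('my-site.flywheel.io:API_KEY')

    # Resolve a project by its group/project path (hypothetical labels).
    project = fw.lookup('mylab/stroke-study')

    # Walk the hierarchy and download every NIfTI file for local processing.
    for session in project.sessions():
        for acquisition in session.acquisitions():
            for file_entry in acquisition.files:
                if file_entry.name.endswith('.nii.gz'):
                    dest = f'/data/{session.label}_{file_entry.name}'
                    acquisition.download_file(file_entry.name, dest)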

Exploratory Development and Analysis

For developing new algorithms or pipelines, Flywheel’s Python and MATLAB SDKs provide a powerful alternative to downloading to disk. Using the SDKs, a Python or MATLAB user may work with data in Flywheel directly from their preferred scripting language. Full search is available along with simple commands for reading and writing data and metadata.
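
For example, an exploratory session in Python might inspect metadata and pull a single image into memory, roughly as sketched below; the acquisition path is hypothetical, and nibabel is used only for illustration.

    import tempfile

    import flywheel
    import nibabel as nib  # illustrative choice for reading NIfTI data

    fw = flywheel.Client('my-site.flywheel.io:API_KEY')  # placeholder credentials

    # Hypothetical path to an acquisition of interest.
    acquisition = fw.lookup('mylab/stroke-study/sub-01/baseline/T1w')

    # Inspect metadata directly, without downloading anything.
    print(acquisition.label, acquisition.timestamp)

    # Pull one NIfTI file into a temporary location and load it for analysis.
    nifti = next(f for f in acquisition.files if f.name.endswith('.nii.gz'))
    with tempfile.TemporaryDirectory() as tmp:
        local_path = f'{tmp}/{nifti.name}'
        acquisition.download_file(nifti.name, local_path)
        data = nib.load(local_path).get_fdata()
        print(data.shape, data.mean())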

Routine Processing with Plug-In Applications (Gears)

Gears are plug-in applications that automate routine tasks, including metadata extraction, classification, quality assurance, format conversion, and full analytic pipelines.  Here’s how gears work:

Leveraging Standard OCI-Compliant Containers

From a technical perspective, Gears are applications running in standard OCI-compliant (Docker, Singularity, etc.) containers that are managed by Flywheel. A container packages application code and all of its dependencies into a portable, reproducible unit of processing. Containers can easily be made into Gears by adding metadata that explains to Flywheel how to use the containerized application. This metadata is expressed in a simple JSON file that includes descriptive information, such as links to source code and authors, along with instructions for passing in data, setting configuration options, and executing commands in the container.
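
As a rough sketch, that metadata might look like the following, written here from Python; the field values are hypothetical, and the authoritative schema is defined by the Flywheel gear specification.

    import json

    # A minimal sketch of a Gear manifest; values are hypothetical and the
    # authoritative field list comes from the Flywheel gear specification.
    manifest = {
        "name": "example-nifti-converter",         # unique gear identifier
        "label": "Example NIfTI Converter",        # human-readable name
        "version": "0.1.0",
        "description": "Converts an uploaded DICOM archive to NIfTI.",
        "author": "Example Lab",
        "url": "https://example.org/source-code",  # descriptive metadata, e.g. a source link
        "license": "MIT",
        "inputs": {                                # how data are passed in
            "dicom": {"base": "file", "type": {"enum": ["dicom"]}}
        },
        "config": {                                # user-adjustable options
            "compress": {"type": "boolean", "default": True}
        },
        "command": "python /flywheel/v0/run.py"    # how to execute the container
    }

    with open("manifest.json", "w") as handle:
        json.dump(manifest, handle, indent=2)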

Automating and Scaling Gear Execution

Gears may be run in a variety of ways. They may be executed on demand for a given data set, or in batch mode for a selected collection of data sets; in these cases, the user is prompted for inputs prior to execution. Gears may also be run automatically by rules configured for the project. For example, when a DICOM series is uploaded, it can be classified automatically and, if it is imaging data, converted to NIfTI. Gear rules may be used to automate routine pre-processing as well as trigger complex pipelines. Gears may also be scheduled by tasks outside of Flywheel using the command-line tool (CLI) or programming interfaces. Finally, when deployed in cloud or private cloud infrastructures, Flywheel can dynamically scale resources to maximize parallel processing and save you time.
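
For instance, launching a Gear on demand from the Python SDK might look roughly like the sketch below; the gear name, project path, and input name are hypothetical, and the run() signature may differ by SDK version.

    import flywheel

    fw = flywheel.Client('my-site.flywheel.io:API_KEY')  # placeholder credentials

    # Hypothetical gear and destination session.
    gear = fw.lookup('gears/dicom-mr-classifier')
    session = fw.lookup('mylab/stroke-study/sub-01/baseline')
    dicom_file = session.acquisitions()[0].files[0]

    # Run the gear on demand; calling this in a loop over sessions gives a
    # simple form of batch execution.
    job_id = gear.run(
        inputs={'dicom': dicom_file},
        config={'debug': False},
        destination=session,
    )
    print('queued job', job_id)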

Process Any Level of Data in Your Project

Gears may be designed to process data at different levels of the Flywheel project hierarchy.  Gears may process individual sessions (exams/DICOM studies). For longitudinal studies, Gears may be used to process at the subject (participant/patient) level with the ability to process data from multiple sessions. Finally, project-level Gears may be used to perform group/cohort analyses across all subjects.  

Automated Provenance

A key advantage of using Gears to manage routine processing is the documentation that results. Every time a Gear is run, Flywheel records a great deal of derivative information that supports the consistency and reproducibility of your project. These “Analysis” documents record the Gear version, who ran it, when it ran, success/fail status, inputs, configuration options used, and outputs produced. Further, they may be annotated with notes or structured JSON metadata to meet your project needs. This provenance makes it easy to ensure that all necessary processing steps were performed, and performed consistently.
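
As a sketch of what this looks like programmatically, the Python SDK can list the Analysis records attached to a session; the attribute names below are illustrative of what the SDK exposes and may differ by version.

    import flywheel

    fw = flywheel.Client('my-site.flywheel.io:API_KEY')  # placeholder credentials

    # Hypothetical session; reload() to make sure its analyses are populated.
    session = fw.lookup('mylab/stroke-study/sub-01/baseline').reload()

    for analysis in session.analyses:
        job = analysis.job  # the gear execution behind this analysis
        print(
            analysis.label,
            job.gear_info.name, job.gear_info.version,  # which gear, which version
            job.state,                                   # success/fail status
            [f.name for f in (analysis.files or [])],    # outputs produced
        )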

Flywheel Gear Exchange

To speed project deployment, Flywheel provides a library of commonly used algorithms as Gears via the Flywheel Gear Exchange. The Gear Exchange currently contains roughly 70 Gears contributed by Flywheel or Flywheel users. Examples include DICOM-to-NIfTI conversion, Freesurfer Recon-All, the Human Connectome Pipelines, and commonly used BIDS applications, such as MRIQC and FMRIPrep. The Gear Exchange provides a powerful way to share reproducible units of code that may be used as building blocks for new projects.

User-Developed Custom Gears

Users may easily create their own Gears as well. Gear developers simply get their code running in an OCI-compliant container and provide the Gear metadata. Applications may be developed in any language. Flywheel’s APIs and SDKs may be used in a Gear if needed; otherwise, the containerized application need not be Flywheel-aware.

Flywheel streamlines the process of creating the Gear metadata via the CLI Gear Builder tool, which prompts the user for the required information and generates most of the metadata automatically. The resulting Gears may be shared with other Flywheel sites via the Flywheel Gear Exchange, or kept private by uploading them only to the user’s site. Flywheel does not make any claim on any of the intellectual property in customer Gears.

Conclusion

Flywheel makes it easy to work the way you want. Our open CLI, APIs, and SDKs make it easy to download data and use existing processes. Our Gears framework allows you to automate routine processing consistently with extensive documentation to support quality and reproducibility.

Read more about our scientific collaborations or send us your questions!


Saving Time and Money with Flywheel HPC Integration 

Flywheel, a comprehensive research data platform for medical imaging, machine learning, and clinical trials, recently rolled out a beta integration feature for High Performance Computing (HPC) clusters, including those scheduled by Slurm and SGE. With this new feature, Flywheel supports computing on HPC clusters, in addition to more traditional virtual machine (VM)-based deployments in the cloud or on-premises.

As capital investments (frequently in the millions of dollars), HPC systems constitute an enormous opportunity as a local, shared resource for an organization. At the same time, these systems are often difficult or confusing to use, due to their specialized nature and older technology base. Access is frequently restrictive, and the workflow for running software on an HPC cluster is significantly different from a traditional machine, due to the "drop off & pick up" nature of the interaction. 

Furthermore, debugging tends to have an extremely long turnaround time, due to the system's fluctuating and inscrutable job queue. This tends to result in idle cluster capacity. Flywheel can increase HPC utilization, by making the system more accessible to a large user community.

During beta testing alone, one Flywheel customer estimated savings in excess of $7,000 over a period of one month by moving some of their compute-intensive workloads from cloud hosting to their university-sponsored HPC. In that time, over four months of single-machine, eight-core work was completed. Much of that capacity would otherwise have sat idle on their cluster.

With Flywheel, scientific algorithms run in OCI-compliant (Docker, etc.) containers, called Gears. When using Flywheel with the new HPC integration, customers work directly with us to whitelist specific Gears for this feature, but still get the same point-and-click experience available for all Gears. The Flywheel system translates the request into the system-specific format, submits the HPC job using the Singularity container runtime, waits for the HPC queue to pick up the work, and marshals input and output data to and from the system.

The result is that all of Flywheel’s computation management features - such as batch jobs, SDK integration, and Gear Rules - work out of the box on HPC systems or local hardware, with great potential for improving productivity and reducing costs.


“The Flywheel integration with the HPC at Penn has been a total game-changer. It allows us to leverage the complementary advantages of two powerful systems. By launching compute jobs as containerized Gears through Flywheel, we can ensure total reproducibility. Furthermore, by integrating Flywheel with the massive computational resources provided by the Penn HPC, run by Christos Davatzikos, we can run computationally demanding jobs at scale across large samples without worrying about cloud compute charges. Throughout, the Flywheel engineering team was incredibly responsive; it was really a model for successful collaboration.”

– Ted Satterthwaite, MD, Assistant Professor in the Department of Psychiatry at the University of Pennsylvania Perelman School of Medicine


Leveraging Flywheel for Deep Learning Model Prediction

Since 2012, the Medical Image Computing and Computer Assisted Intervention Society (MICCAI) has put on the Brain Tumor Segmentation (BraTS) challenge with the Center for Biomedical Image Computing and Analytics (CBICA) at the Perelman School of Medicine at the University of Pennsylvania. The past eight competitions have seen rapid improvements in the automated segmentation of gliomas. This automation promises to address the most labor-intensive process required to accurately assess both the progression and effective treatment of brain tumors.

In this article, we demonstrate the power and potential of coupling the results of this competition with a FAIR (Findable, Accessible, Interoperable, Reusable) framework. Because constructing a well-labeled dataset is the most labor-intensive component of processing raw data, it is essential to automate this process as much as possible. We use Flywheel as our FAIR framework to demonstrate this process.

Flywheel (flywheel.io) is a FAIR framework that leverages the proprietary core infrastructure with open-source extensions (gears) to collect, curate, compute on, and collaborate on clinical research data. The core infrastructure of a Flywheel instance manages the collection, curation, and collaboration aspects, enabling multi-modal data to be quickly searched across an enterprise-scale collection. Each “gear” of the Flywheel ecosystem is a container-encapsulated open-source algorithm with a standardized interface. This interface enables consistent stand-alone execution or coupling with the Flywheel core infrastructure—complete with provenance of raw data, derived results, and usage records.

For the purposes of this illustration, we wrap the second-place winner of the MICCAI 2017 BraTS Challenge into a gear. This team’s entry is one of the few that has both a Docker Hub image and a well-documented GitHub repository available. Their algorithm is built on the TensorFlow and NiftyNet frameworks for training and testing their deep learning model. As illustrated in our GitHub repository, this “wrapping” consists of providing the data configuration expected by their algorithm and launching it for model prediction (*).
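
In rough outline, the gear’s entry-point script translates the Flywheel gear contract into the configuration the model expects, along the lines sketched below; the input names, paths, and inference command are illustrative rather than the exact wrapper in our repository.

    import json
    import subprocess
    from pathlib import Path

    GEAR_ROOT = Path('/flywheel/v0')           # standard gear working directory
    OUTPUT_DIR = GEAR_ROOT / 'output'

    # Flywheel mounts the selected inputs and configuration into config.json.
    with open(GEAR_ROOT / 'config.json') as handle:
        context = json.load(handle)

    # Collect the four co-registered, skull-stripped modalities (names illustrative).
    modalities = {
        name: context['inputs'][name]['location']['path']
        for name in ('t1', 't1ce', 't2', 'flair')
    }

    # Write the data configuration the NiftyNet-based model expects (simplified).
    with open(GEAR_ROOT / 'inference.ini', 'w') as handle:
        for name, path in modalities.items():
            handle.write(f'[{name}]\npath_to_search = {Path(path).parent}\n\n')

    # Launch model prediction; outputs land in OUTPUT_DIR for Flywheel to collect.
    subprocess.run(
        ['net_segment', 'inference',
         '-c', str(GEAR_ROOT / 'inference.ini'),
         '--save_seg_dir', str(OUTPUT_DIR)],
        check=True,
    )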

As shown in the figure above, Flywheel provides a user-friendly interface for navigating to the MRI images expected for execution. With the required co-registered and skull-stripped MRI modalities (T1-weighted, T1-weighted with contrast, T2-weighted, and Fluid Attenuation Inversion Recovery), segmentation into distinct tissues (normal, edema, contrast-enhancing, and necrosis) takes twelve minutes on our team’s Flywheel instance (see figure below). Segmenting the same tumor by hand can take a person over an hour. When performed on a Graphics Processing Unit (GPU), the task takes less than three minutes to complete.

Segmentation into normal, edema, contrast enhancing, and necrosis tissues with the Flywheel-wrapped second place winner of the 2017 BraTS Challenge.

Although this example predictively segments the tumor of a single patient, modifications to this gear can allow tumor segmentation of multiple patients for multiple imaging sessions over the course of their care. Furthermore, with scalable cloud architecture, these tasks can be deployed in parallel, significantly reducing the overall time required to iterate inference over an entire image repository. Enacting this as a pre-curation strategy could significantly reduce the time necessary for manual labeling of clinical imaging data. 

Therein lies the vast potential benefit of using a strong FAIR framework in an AI-mediated workflow: the ability to pre-curate new data, optimize human input, and retrain on well-labeled data over accelerated time scales. These model design, train, and test cycles are greatly facilitated by a FAIR framework, which curates the data, the results, and their provenance in a searchable interface.

As with this brain tumor challenge example, there are many other similar challenge events that make their algorithms and pretrained models publicly available for the research community.  One nexus of these is the Grand Challenges in Biomedical Image Analysis, hosting over 21,000 submissions in 179 challenges (56 public, 123 hidden).  Flywheel’s capacity to quickly package these algorithms to be interoperable with its framework makes it a powerful foundation for a data-driven research enterprise.

Two more useful deep learning, GPU-enabled algorithms have recently been incorporated into Flywheel gears. First, quickNAT uses default or user-supplied pre-trained deep learning models to segment neuroanatomy within thirty seconds when deployed on sufficient GPU hardware. We have wrapped a PyTorch implementation of quickNAT in a Flywheel gear. Prediction of brain regions on CPU hardware requires two hours. Although much longer than the thirty seconds needed on a GPU, this is still a fraction of the nearly twelve hours needed for FreeSurfer’s recon-all. Next, Nobrainer is a deep learning framework for 3D image processing. The derived Flywheel gear uses a default (or user-supplied) pre-trained model to create a whole-brain mask within two minutes on a CPU. Utilizing a GPU brings this time down to under thirty seconds.

The previous paragraph raises two questions. First, with GPU model prediction significantly faster than CPU, when will GPU-enabled Flywheel instances be available? Second, how can Flywheel be effectively leveraged in training deep learning models? Flywheel is actively developing GPU-deployable gears and the architecture to deliver them. We briefly explore the second question next, leaving a more thorough investigation for another article.

Training on an extensive and diverse dataset is needed for Deep Learning models to generalize effectively and accurately across unseen data. With uncommon conditions, such as gliomas, finding enough high-quality data at a single institution can be daunting. Furthermore, sharing these data across institutional boundaries incurs the risk of exposing protected health information (PHI). With Federated Training, Deep Learning models (and their updates) are communicated across institutional boundaries to acquire the abstracted insight of distributed annotation. This eliminates the risk and requirement of transferring large data repositories while still allowing model access to a diverse dataset. With Federated Search across institutional instances of Flywheel firmly on the roadmap, this type of Federated Training of Deep Learning models will be possible within the Flywheel ecosystem.

(*) The authors of this repository and the University College London do not explicitly promote or endorse the use of Flywheel as a FAIR framework. 


Why a Research-First Platform for Imaging Informatics and Machine Learning?

It's no secret that researchers face many challenges that impede the research and development of artificial intelligence (AI) solutions in clinical settings. Machine learning requires large volumes of data for accuracy in most applications. Institutions often have a wealth of data but lack the systems needed to get it into the hands of researchers cost-effectively.

Those data must be of high quality and labeled correctly. Imaging projects often involve complex preprocessing to identify and extract features and biomarkers. To further complicate matters, security and privacy are critical, particularly when involving collaboration outside of the context of clinical care.

Unfortunately, established clinical solutions fail to address six critical needs of researchers, impeding research productivity and slowing innovation.

Multimodality

Imaging offers significant opportunities for machine learning, but imaging alone is often not enough. Given that so much of today's research centers on precision medicine and opportunities to revolutionize the cost and quality of care, researchers often require a 360° view of patients including EMR, digital pathology, EEG, -omics, and other data. Clinical imaging systems such as PACS and vendor-neutral archives (VNAs) are designed specifically for imaging and typically don't handle nonimaging data well, particularly in the context of research workflows.

Cohorts, projects, and IRB compliance

Researchers require the ability to organize and analyze data in cohorts while enabling collaboration with others outside of the context of clinical care. Clinical imaging systems are designed for individual patient care, not for cohort or population health studies, and often lack the organizational structures required for research applications such as machine learning. Institutional review boards (IRBs) typically define for a project the scope of allowed data as well as the people authorized to work with that data. Modern research informatics systems must enable productive workflows while enforcing these IRB constraints.

Quality assurance

Machine learning can be highly sensitive to the quality of the data. Researchers must be able to confirm the quality of data, including completeness and consistency with the protocol defined for the study. Quality control and supporting documentation are required for scientific reproducibility and for processes such as U.S. Food and Drug Administration (FDA) approval. Consequently, modern informatics systems must incorporate comprehensive support for quality assurance as part of the workflow.

Integrated labeling and annotation workflows

Machine learning depends on accurately labeled sample datasets in order to effectively train AI models. Real-world data, often originating from multiple sources, generally lack the structure and consistent labels required to directly support training. Modern imaging informatics solutions must provide the ability to efficiently organize and classify data for search and selection into the appropriate projects or machine-learning applications. Labeling workflows must be supported, including the ability to normalize classification of images and other factors such as disease indications. In the context of imaging, this may involve image annotations collected from radiologists or other experts in a consistent, machine-readable manner via blind multireader studies or similar workflows.

Automated computational workflows

Imaging and machine learning are computationally intensive activities. Research informatics platforms must automate and scale computational workflows ranging from basic image preprocessing to analytic pipelines and training AI models. The ability to rapidly define and integrate new processes using modern tools and technologies is critical for productivity and sustainability. These systems must also provide the ability to leverage diverse private cloud, public cloud, and high-performance computing (HPC) infrastructures to achieve the performance required to process large cohorts cost-effectively.

Integrated data privacy

Data privacy is critical. Compliance with regulations such as HIPAA and GDPR is a must, given the potential financial and ethical risks. However, the lack of scalable systems for ensuring data privacy impedes researcher access to data and, therefore, slows innovation and its related benefits. Modern research informatics solutions must systematically address data privacy. Regulations require deidentification of protected health information to the minimum level required for the intended use; however, that minimum level may differ by project. Consequently, informatics solutions must integrate deidentification and related data privacy measures in a way that can meet the needs of projects with different requirements while maintaining compliance.

Data as a strategic asset with FAIR

Data is the key to clinical research and machine learning. A scalable, systematic approach to research data management should be the foundation of research strategies aimed at machine learning and precision care. Cost-effectively scaling access to clinical data in a manner that supports research workflows while ensuring security and data privacy can improve research productivity, accelerate innovation, and enable research organizations to realize their strategic potential.

Implementing the FAIR principles in your organization helps maximize the strategic value of data that exists in your institution. These principles, developed by academics, agency professionals, and industry members, amplify the value of data by making it Findable, Accessible, Interoperable, and Reusable (FAIR).

  • Findable data are labeled and annotated with rich metadata, and the metadata are searchable.
  • Accessible data are open to researchers with the correct authorization, and the metadata persist even after data are gone.
  • Interoperable data follow standards for storing information and can operate with other metadata and systems.
  • Reusable data are well-described and well-tracked with provenance for computation and processing.

Modern informatics systems should deliver on the FAIR principles while supporting the workflow needs of researchers as described above.

A clinical research platform designed to enhance productivity and accelerate innovation

Flywheel is a new class of informatics platform that addresses the unique needs of researchers involved in imaging and machine learning. Deployed at leading research institutions around the world, Flywheel supports the entire research workflow including capture, curation, computation, and collaboration, plus compliance at each step.

Capture

Flywheel is designed for true multimodality research. While the system specializes in the unique data types and workflows associated with imaging, the platform is capable of managing nonimaging data such as EMR, digital pathology, EEG, genomics, or any other file-based data. Further, Flywheel can automate data capture from imaging modalities and also clinical PACS and VNAs to streamline research workflows as well as translational testing scenarios.

Curate

Flywheel is unique in its ability to organize and curate research data in cohort-centric projects. The platform provides extensive tools for managing metadata including classification and labeling. Quality assurance is supported through project templates and automation rules. Integrated viewers with image annotation and persistent regions of interest (ROIs) are provided to support blind multireader studies and related machine-learning workflows. Powerful search options with access to all standard or custom metadata are provided to support the FAIR principles.

Compute

Flywheel provides comprehensive tools to automate routine processing, ranging from simple preprocessing to full analytic pipelines and training machine-learning models. The platform scales computational workloads using industry-standard "containerized" applications referred to as "Gears." Gears may originate from Flywheel's Gear Exchange containing ready-to-use applications for common workflows or may be user-provided custom applications. The platform supports elastic scaling of workloads to maximize performance and productivity. Gears automate capture of provenance to support scientific reproducibility and regulatory approvals. Further, Flywheel helps you work with existing pipelines external to the system with powerful APIs and tools for leading scientific programming languages, including Python, MATLAB, and R.

Collaborate

Collaboration is enabled through secure, IRB-compliant projects. Collaboration may be within an institution or across the globe for applications such as clinical trials or multicenter studies. Flywheel projects provide role-based access controls to authorize access and control sharing of data and algorithms. Data may be reused across project boundaries for applications such as machine learning, which require as much data as possible.

Compliance

Flywheel helps reduce security and data privacy risks by providing a secure, regulatory-compliant infrastructure for systematically scaling research data management according to HIPAA and GDPR requirements. The platform provides integrated tools for deidentification of research data to ensure the protection of personal healthcare information.

A research-first platform answers the challenges to implementing AI

Flywheel's innovative research informatics platform helps you maximize the value of your data and serves as the backbone of your imaging research and machine learning strategy. Flywheel overcomes the limitations of systems designed for clinical operations to meet the unique needs of researchers. The result is improved collaboration and data sharing and reuse. Ultimately, Flywheel improves research productivity and accelerates innovation.

Original article can be found on Aunt Minnie

 


Four AI Workflow Trends from RSNA 2019

The Biggest Trend: Maturing Implementation of AI

Attendees who visited our booth last year were interested in learning about AI capabilities. This year they brought questions about implementing the infrastructure needed for AI and scaling AI research across their organizations. Scaling access to clinical data and interoperability appear to be rising concerns this year. Organizations are also gradually accepting cloud scaling as a secure option.

Radiologists are beginning to plan for AI in their standard workflows. Many radiologists visited our booth with questions about bringing AI research into their current clinical workflows.

Data Curation for Research Still Falls Short

The focus of many workshops and presentations from radiologists was “data wrangling” and data set quality. We received many questions from attendees regarding metadata management and labeling tools. At the same time, there is growing recognition that clinical systems don’t meet the needs of the research and AI development communities, and that an entirely new class of solution supporting the research workflow is needed.

We recommend Dr. Paul Chang’s (University of Chicago) AuntMinnie interview during RSNA: “AI is like a great car … Most cars still need gas and roads. In the context of this analogy, gas is vetted data and the road is workflow orchestration that is AI-enabled... The only way to make a transformative technology real is to do the boring stuff, the infrastructure stuff.”

Everyone Noticed the Busy AI Showcase

The AI Showcase was very active this year. In 2018, there were roughly 70 vendors in the AI Showcase, but this year there were 129, including many international AI vendors. We noticed growth in AI development for cardiac and brain imaging.

It’s Imminent: Equipment Vendors are Integrating AI Workflows

AI is moving beyond the desktop as imaging equipment manufacturers set their sights on supporting research workflows. Leading equipment manufacturers like Philips and Canon displayed developments in their interfaces to support AI and analysis tools in disease-specific applications. Flywheel is expanding partnerships with AI vendors and equipment vendors in addition to supporting clients performing imaging and clinical research.

CEO Travis Richardson presenting at the Google Cloud Booth about Flywheel’s scalable infrastructure for machine learning.

Flywheel Delivers Reproducibility

Flywheel is committed to supporting reproducible research computations.  We make many software design decisions guided by this commitment. This document explains some key reproducibility challenges and our decisions. 

Reproducibility challenges

Flywheel’s scientific advisory board member, Victoria Stodden, writes that reproducible research must enable people to check each other's work. In simpler times, research articles could provide enough information so that scientists skilled in the art could check published results by repeating the experiments and computations. But the increased complexity of modern research and software makes the methods section of a published article insufficient to support such checking. The recognition of this problem has motivated the development of many tools.

Reproducibility and data

A first requirement of reproducibility is a clear and well-defined system for sharing data and critical metadata. Data management tools are a strength of the Flywheel software. The tools go far beyond file formats and directory trees, advancing data management for reproducible research and the FAIR principles.

Through experience working with many labs, Flywheel recognized the limitations of modern tools and what new technologies might help. Many customers wanted to begin managing data the moment they were acquired rather than waiting until they were ready to upload fully analyzed results. Flywheel built tools that acquire data directly from imaging instruments - from the scanner to the database. In some MRI sites, Flywheel even acquires the raw scanner data and implements site-specific image reconstruction. The system can also store and search through an enormous range of metadata including DICOM tags as well as project-specific custom annotations and tags.

Reproducibility and containers

A second requirement of reproducibility is sharing open-source software in a repository, such as GitHub or BitBucket. Researchers, or reviewers, can read the source code and in some cases they can download, install and run it. 

Based on customer feedback, Flywheel learned that (a) downloading and installing software - even from freely available open-source code on GitHub! - can be daunting, (b) customers often had difficulty versioning and maintaining software, as students and postdocs come and go, and (c) they would run the software many times, often changing key parameters, and have difficulty keeping track of the work they had done and the work that remained to be done. 

To respond to these challenges, Flywheel implemented computational tools based on container technology (Docker and Singularity). Implementing mature algorithms in a container nearly eliminates the burden of downloading, compiling, and installing critical pieces of software. Containers bundle the compiled code along with all of its dependencies, such as libraries, into lightweight, virtual-machine-like units that can run on many operating systems (PC, Mac, Linux, each with different variants). These containers can be run on a local machine or on a cloud system, eliminating the burden of finding the code, updating all the dependencies, and compiling.

Reproducibility and analyses: Introducing Gears

Once an algorithm is implemented in a container, Flywheel users run it. A lot. They wanted ways to record the precise input data, as well as the algorithm version and parameters that were used, as they explored the data. The outputs also needed to be recorded. Such a complete record is difficult for individuals to maintain by hand, yet having one is necessary for reproducibility.

Flywheel solves these problems by creating a computational system for managed application containers, which we call Gears. The Gear is structured to record every parameter needed to perform an analysis. When the user runs a Gear, the input data, specific version of the container, all the parameters needed to run the container, and the output data are all recorded in the database. This is called an ‘Analysis’ and users perform and store hundreds of Analyses on a data set.

Because all the information about an Analysis is stored in the database associated with the study, people can re-run precisely the same Gear. It is also straightforward to run the same Gear using different data, or to explore the consequences of re-running the Gear after selecting slightly different parameters. Making Analyses searchable also helps people keep track of which Gears were run and which still need to be run. 

Reproducibility and documentation

Clear writing is vitally important to making scientific work reproducible. Tools that support clear and organized notes during the experiments are also very valuable. During its initial development, Flywheel partnered with Fernando Perez and the Jupyter (then IPython) team to implement tools that built on shared software. Flywheel continues to find ways to support these tools. Flywheel tools permit users to link their data to published papers, write documentation about projects and sessions, and add notes. This documentation is part of the searchable database, and Flywheel will continue to support users in incorporating clean and thorough documentation.

 


Flywheel Delivers Data Management

Persistently storing data is the critical first step in planning for reproducible science. Defining file formats and organizing directories is a good start; in our experience, this is where most researchers focus their efforts. But modern computer science provides many technologies that improve data storage, making data FAIR, i.e., findable, accessible, interoperable, and reusable (see Flywheel Delivers FAIR). Flywheel uses these tools to support reproducible science.

Metadata are important

The value of raw data, for example the numerical data of an image, is vastly increased when we know more about the data. This information - called the metadata - can tell us many important things: the instrument parameters used to acquire the data, information about the subject (demographics, medical conditions, etc.), time and place of the acquisition, and facts about the experimental context; for example, that the subject fell asleep during the resting state MR scan.  

The biomedical imaging community recognizes the importance of metadata in two ways. First, standard file formats (DICOM or NIfTI) embed metadata in the file header. Second, the BIDS system stores useful metadata in the file name or in an accompanying ‘sidecar’ file.

Storing metadata within a file header, or an accompanying file, is a good start. But using an extensible database offers many advantages. Here is why:

Databases are efficient

Nearly all modern computer operating systems use databases to store files and their metadata. For example, on Apple systems the Get Info command (Cmd-I) returns metadata about a file from the operating system’s database (comments, preview, kind of file) as well as standard POSIX information such as file size and date of access. The Apple Spotlight search uses the database to identify files.

There are many advantages to storing information about a file in a database rather than in the file header or an accompanying file. For example, we have seen many cases in which people fail to keep the two files together, or rename one of the files and lose the association between the data and metadata files. Putting the information in the file header avoids these problems but has others. Files are distributed across the disk, making searches through file headers very inefficient. Also, files arise from many different sources, and it is virtually impossible to guarantee that vendors keep up to date with changes. Headers are most useful for a particular type of file, but not for a large system.

Databases solve these problems by having the user interact with files through a unified interface that includes the name of the raw data file on disk as well as the associated metadata. To read the raw data, one consults the database for the location of the file containing the raw data. To read the metadata, one consults only the database. Typically, the database itself is small, and updates to its format or additions to its content are possible. 

Flywheel uses a document database (MongoDB) to manage user interactions with data and metadata. In the Flywheel system, you can read metadata via the web-browser interface. When programming, you can access metadata using the software development kits (SDKs) or REST API. 

Metadata can be attached to any object in the system hierarchy

The Flywheel data are organized in a hierarchy: Group, Project, Subject, Session, Acquisition, Files and Analyses. This hierarchy can incorporate virtually any file type and associated metadata. Most of our customers store files containing medical imaging data in the hierarchy, including MRI, PET, CT, OCT, and pathology images.  But some customers store other types of files, such as computer graphics files that are useful for machine learning. All of the objects, the files and the organizational containers (Project, Subject, Session, Acquisition, Analyses) are described in the database, each with its own metadata. Users can search, annotate and reuse the files and containers from any level in the Flywheel system.

Metadata are flexible

By using a general database, Flywheel can be complete and flexible. For MRI DICOM files, the database includes all of the header information in the file, such as TR, TE, voxel size, and diffusion directions. In addition, the Flywheel database includes fields for users to place searchable notes, say, about the experiment. The database can also include links to additional experimental information about the subject and auxiliary measures (often behavioral data).

The Flywheel database can add fields without needing to rebuild the entire database. For example, as new MRI technologies developed, we were able to add additional fields that describe the new acquisition parameters. Similarly, Flywheel regularly expands to manage new types of data; as we do so, we add new database fields.
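
A small Python SDK sketch illustrates both sides of this flexibility: reading standard acquisition parameters and attaching a project-specific field. The path and key names are hypothetical, and exact method names may differ by SDK version.

    import flywheel

    fw = flywheel.Client('my-site.flywheel.io:API_KEY')  # placeholder credentials

    # Hypothetical path; reload() to make sure file metadata are populated.
    acquisition = fw.lookup('mylab/stroke-study/sub-01/baseline/DWI').reload()
    dicom = acquisition.files[0]

    # Standard parameters extracted from the DICOM header live in the file's info
    # (key names illustrative; they follow the DICOM keywords).
    print(dicom.info.get('RepetitionTime'), dicom.info.get('EchoTime'))

    # Custom, searchable fields can be added without any schema change.
    acquisition.update_info({'behavioral': {'task': 'rest', 'subject_fell_asleep': True}})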

Data reuse

Flywheel helps users to reuse data by (a) helping them find data sets and (b) using the search results to create a new project in their database. Adding a database entry eliminates the need for data copying - we simply copy database entries to specify the new project’s sessions, acquisitions, and files.  Flywheel calls such a virtual project a 'Collection'. 
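
As a sketch, building such a Collection from the Python SDK might look like the following; the labels and selection criterion are hypothetical, and the collection helpers shown (add_collection, add_sessions) should be checked against the SDK documentation for your version.

    import flywheel

    fw = flywheel.Client('my-site.flywheel.io:API_KEY')  # placeholder credentials

    project = fw.lookup('mylab/stroke-study')  # hypothetical source project

    # Create a virtual project (Collection); no files are copied, only
    # database references to the selected sessions are added.
    collection = fw.add_collection({'label': 'baseline-visits'})
    for session in project.sessions():
        if session.label == 'baseline':        # illustrative selection criterion
            collection.add_sessions(session.id)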

Reproducible science 

Data management and the ability to search across all types of objects enhance the value of the data. Carefully storing and managing metadata supports finding and reusing data, two pillars of FAIR and reproducible research.

Contact us here for a demonstration to see how Flywheel’s database and further computing features can be the backbone of your research.


Four Takeaways from BioData World West 2019

BioData World West wrapped up its third year! A mix of experts from industry, academia, and government mingled and mused on the data management supporting the healthcare industry.

Below are insights from our own Chief Technology Officer, Gunnar Schaefer, and Director of Sales, Marco Comianos, who attended.

Gunnar Schaefer, Co-Founder and CTO of @Flywheel_io presents on scaling medical imaging and machine learning in clinical research

Share quality data within your organization

The main focus of conversations at BioData this year was making data accessible across departments and organizations. Letting data flow freely between labs in life sciences organizations creates a feedback loop from health network partners and previously unprofitable drug trials. In health networks, data scientists can highlight opportunities where patients are underserved, creating better experiences and identifying processes that can be streamlined to cut costs.

When these different sources of data are merged, unconventional combinations of biomedical data can point to obscure patterns of disease. Scientists from organizations like GenomeAsia, Sidra Medicine, and AstraZeneca presented their findings from blending microbiome and genetic research, genotypic and phenotypic data, and imaging and text data. 

In order for machine learning to power artificial intelligence applications, data must be routed, organized, cleaned, and standardized from the moment of creation. More important than proper data storage is the ability to query a system over and over for renewed insight. Genentech highlighted the need to store data so it is FAIR: findable, accessible, interoperable, and reusable. That way, data are ripe for query and can be integrated for analysis.

However, it’s important to remember that no matter how well sources are linked together, data must be high-quality and machine learning investigations must be ethically supervised. As Faisal Khan of AstraZeneca put it: “Tortured data will confess to anything.” 

Looking forward, expect life sciences companies to adopt better data principles in their data strategies, refine what’s working already, and search for software that bridges the gaps.

Being precise about requirements for precision medicine

Much of the groundwork for precision medicine is now being laid, though mostly in oncology. At BioData, speakers gave direction for its high-value applications. 

Today’s genomics research can treat previously untreated rare diseases. A panel addressed how data sharing must accompany public genomic projects to optimize therapeutic development for rare diseases. Presenters also reported on diversifying the pools for large genome projects. On the treatment side, analysts explained methods to match an individual’s genomic profile with one of many pre-existing drugs, saving time for patients facing debilitating diseases.

These advancements require access to large amounts of data with well-defined interoperability. Looking forward, expect the general hype around precision medicine to fade, making way for discussions about the infrastructure that enables answers to disease-specific precision questions.

Machine learning shortens both ends of drug trials

Beyond the potential for drug discovery using genetic markers, presenters showcased algorithms that had correctly predicted the pharmacokinetics and effectiveness of drug compounds. Not only does this technology assist researchers and cut costs for developing compounds or finding targets; once therapies are in clinical trials, AI can also predict the likelihood of certain subpopulations having an adverse reaction to a drug. Clinical trial pools normally miss these portions of the population, which can result in a public perception crisis.

Looking forward, expect AI use with historical clinical data and patient data to become a competitive factor in shortening the time horizon for successful drug launches. We’ll also see which AI vendors become the most productive partners for life sciences organizations.

AI specialists come ready to partner

If data scientists hold some healthy skepticism of practically applying machine learning, AI specialists showed up with the energy to compensate. AI specialists are drawing talent from universities to specialize in anatomical regions. Companies in this vertical are also starting to partner with each other to complement their deep expertise in one region.

Many AI companies at BioData specialize in genomics and digital slide pathology, so look forward to development and consolidation in this field. Fewer imaging analysis companies were present at BioData - stay tuned for the imaging market insights yet to come out of RSNA!

At RSNA’s Annual Meeting, Flywheel will be exhibiting from December 1st to December 5th in the AI Showcase. Schedule a demo and find us at booth #11618.


Flywheel Delivers FAIR Principles

The FAIR acronym is a nice way to summarize four important aspirations of modern research practice: scholarly data should be Findable, Accessible, Interoperable, and Reusable. The article describing the FAIR aspirations is excellent, and we recommend reading it. Some limitations of current practice are described here. Our company was founded to advance research and we embrace these principles.

Flywheel, software used by thousands of researchers, embodies tools and technology that deliver on the FAIR principles.

About Flywheel

Flywheel is an integrated suite of software tools that (a) stores data and metadata in a searchable database, (b) includes computational tools to analyze the data, and (c) provides users with both browser-based and command line tools to manage data and perform analyses. Our customers use these tools on a range of hardware platforms: cloud systems, on-premise clusters and servers, and laptops.

Flywheel supports users throughout a project’s life cycle. The software can import data directly from the instrument (such as an MR scanner), extract metadata from the instrument files, and store it in the database. Auxiliary data from other sources can also be imported into the database. The user can view, annotate, and analyze the data, keeping track of all the scientific activities. Finally, the data and analyses can be shared widely when it is time to publish the results.

FAIR Data Principles Implemented

Findable

Flywheel makes data ‘Findable’ by search and browsing. The Flywheel search tools address the entire site’s dataset, looking for data with particular features. It is straightforward, for example, to find the diffusion-weighted imaging data for female subjects between the ages of 30 and 45. The user can contact the owners of the data for access, and the data returned by a search can be placed in a virtual project (Collection) for reuse and further analysis.

Search is most effective when high-quality metadata are associated with the data and analyses. Flywheel creates a deep set of metadata by scanning and classifying the image data. Users can attach searchable keywords and add data-specific notes at many levels: the overall project, the session, the specific data file, or the analysis. Users can find data by searching these descriptions.
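
As a sketch of the diffusion-imaging example above, a client-side approximation with the Python SDK might look like the following; the project path is hypothetical, the age and classification fields are assumptions about how the metadata are stored, and the built-in search expresses the same query more directly.

    import flywheel

    fw = flywheel.Client('my-site.flywheel.io:API_KEY')  # placeholder credentials
    project = fw.lookup('mylab/stroke-study')             # hypothetical project

    SECONDS_PER_YEAR = 365.25 * 24 * 3600
    matches = []
    for session in project.sessions():
        age_years = (session.age or 0) / SECONDS_PER_YEAR  # age assumed stored in seconds
        if session.subject.sex == 'female' and 30 <= age_years <= 45:
            for acquisition in session.acquisitions():
                # Classification keys are illustrative; Flywheel assigns them
                # during its automatic classification step.
                for f in acquisition.files:
                    if 'Diffusion' in (f.classification or {}).get('Measurement', []):
                        matches.append(acquisition)
                        break

    print(f'found {len(matches)} diffusion acquisitions')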

Accessible

Our customers frequently observe that there is a conflict between making data accessible (sharing) while complying with health privacy rules. We live in a world with privacy officers on the one hand and open data advocates on the other.

Flywheel delivers an accessible solution that respects both principles. We implemented a rigorous user-rights management system that is easy to use. Access to the data and analyses is controlled through a simple web-based interface. The system implements the different roles that are needed during a project’s life cycle. At first, perhaps only the principal investigator and close collaborators have access; later, additional people (reviewers, other scientists) might be granted access to check the data and analyses. When ready, the anonymized data and full descriptions of the analyses can be made publicly viewable. An effective system that manages a project through these stages is complicated to write, but Flywheel makes it easy to use through its browser interface.

Interoperable

Most scientists have felt the frustration of learning that a dataset is available, but the file format or organization of the data files requires substantial effort to decode and use. The medical imaging community has worked to reduce this burden by defining standardized file and directory organizations. Flywheel is committed to using and promoting these standards.

Our experience teaches us that well-intentioned file formats and directory organizations are not enough. Flywheel stores far more information than what one finds in the header of a DICOM or NIfTI file or in the BIDS directory structure. Our commitment to interoperability includes reading files and directories in these standards and even writing Flywheel data out into these formats. Beyond this, we are committed to tools that import and export data and metadata between Flywheel and other database systems.

Flywheel is further committed to supporting the interoperability of computational tools. We have opened our infrastructure so that users can analyze data using Flywheel-defined containerized algorithms, their own containers, or their own custom software. The Flywheel standards are clearly defined based on industry-standard formats (e.g., JSON, Docker, Singularity) so that other groups can use them and in this way support computational interoperability.

Reusable

From its inception, Flywheel was designed to make data reusable. Users at a center can share data within their group or across groups, reuse data by combining it from different groups, and create and share computational tools. The user can select data from any project and merge it into a new project. Such reused data is called a Collection in Flywheel. The original data remain securely in place, and the user can analyze the collection as a new virtual project. All the analyses, notes, and metadata of the original data remain attached to the data as they are reused.

Equally important, the computational methods are carefully managed and reusable. Each container for algorithms is accompanied by a precise definition of its control parameters and how they were set at execution time. This combination of container and parameters is called a Flywheel Gear, and the specific Gear that was executed can be reused and shared.

More

The FAIR principles are an important part of the Flywheel system. We have also been able to design in additional functionality that supports these principles.

  • Security and data backup are very important and fundamental. The ability to import older data into the modern technology has been valuable to many of our customers.
  • The visualization tools built into Flywheel help our customers check for accuracy and data quality as soon as the data are part of the system.
  • The programming interface, supported by endpoints accessible in three different scientific programming languages, permits users to test their ideas in a way that gracefully leads to shared data and code.