Leveraging Flywheel for Deep Learning Model Prediction

Since 2012, the Medical Image Computing and Computer Assisted Intervention Society (MICCAI) has put on the Brain Tumor Segmentation (BraTS) challenge with the Center for Biomedical Image Computing and Analytics (CBICA) at the Perelman School of Medicine at the University of Pennsylvania. The past eight competitions have seen rapid improvements in the automated segmentation of gliomas. This automation promises to address the most labor-intensive process required to accurately assess both the progression and effective treatment of brain tumors.

In this article, we demonstrate the power and potential of coupling the results of this competition with a FAIR (Findable, Accessible, Interoperable, Reusable) framework. Because constructing a well-labeled dataset is the most labor-intensive component of processing raw data, it is essential to automate this step as much as possible. We use Flywheel as our FAIR framework to demonstrate this process.

Flywheel (flywheel.io) is a FAIR framework that couples a proprietary core infrastructure with open-source extensions (gears) to collect, curate, compute on, and collaborate on clinical research data. The core infrastructure of a Flywheel instance manages the collection, curation, and collaboration aspects, enabling multi-modal data to be quickly searched across an enterprise-scale collection. Each “gear” of the Flywheel ecosystem is a container-encapsulated open-source algorithm with a standardized interface. This interface enables consistent stand-alone execution or coupling with the Flywheel core infrastructure—complete with provenance of raw data, derived results, and usage records.

For the purposes of this illustration, we wrap the second-place winner of the MICCAI 2017 BraTS Challenge into a gear. This team’s entry is one of the few with both a Docker Hub image and a well-documented GitHub repository. Their algorithm is built on the TensorFlow and NiftyNet frameworks for training and testing their Deep Learning model. As illustrated in our GitHub repository, this “wrapping” consists of providing the data configuration the algorithm expects and launching it for model prediction (*).
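As a concrete, simplified illustration of what such a wrapper does, the sketch below stages a gear’s named inputs into the directory layout a containerized model expects and then launches inference. The staging file names and the launch command are placeholders, not the BraTS team’s actual interface; only the /flywheel/v0 working directory and config/input/output layout follow the standard gear convention.

```python
#!/usr/bin/env python3
"""Illustrative gear entry point: stage the gear's named inputs into the
file layout a containerized model expects, then launch inference.
Staging names and the launch command are placeholders."""
import shutil
import subprocess
from pathlib import Path

FLYWHEEL = Path("/flywheel/v0")   # standard working directory inside a gear

# Map Flywheel's named inputs onto the layout the algorithm expects.
staging = Path("/tmp/subject1")
staging.mkdir(parents=True, exist_ok=True)
for gear_input, model_name in [("t1", "T1.nii.gz"), ("t1ce", "T1CE.nii.gz"),
                               ("t2", "T2.nii.gz"), ("flair", "FLAIR.nii.gz")]:
    src_dir = FLYWHEEL / "input" / gear_input      # gear inputs arrive here
    shutil.copy(next(src_dir.glob("*.nii*")), staging / model_name)

# Launch the wrapped algorithm and let it write to the gear's output folder.
subprocess.run(["python", "test.py", "--input", str(staging),
                "--output", str(FLYWHEEL / "output")], check=True)
```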

As shown in the figure above, Flywheel provides a user-friendly interface for navigating to the MRI images required for execution. Given the required co-registered and skull-stripped MRI modalities (T1-weighted, T1-weighted with contrast, T2-weighted, and Fluid-Attenuated Inversion Recovery), segmentation into distinct tissues (normal, edema, contrast-enhancing, and necrosis) takes twelve minutes on our team’s Flywheel instance (see figure below). Manually segmenting the same tumor can take a person over an hour. When performed on a Graphics Processing Unit (GPU), the task completes in under three minutes.

Segmentation into normal, edema, contrast enhancing, and necrosis tissues with the Flywheel-wrapped second place winner of the 2017 BraTS Challenge.

Although this example predictively segments the tumor of a single patient, modifications to this gear could allow tumor segmentation for multiple patients across multiple imaging sessions over the course of their care. Furthermore, with scalable cloud architecture, these tasks can be deployed in parallel, significantly reducing the time required to run inference over an entire image repository. Enacting this as a pre-curation strategy could significantly reduce the time needed for manual labeling of clinical imaging data.
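A sketch of what that batch extension might look like with the Flywheel Python SDK follows. The gear and project names are placeholders, and the gear.run call reflects our reading of the SDK rather than a verified recipe.

```python
"""Sketch of batching the segmentation gear over every session in a
project. Gear and project names are placeholders; gear.run reflects our
reading of the Flywheel Python SDK."""
import flywheel

fw = flywheel.Client("my-api-key")                 # user-generated API key
gear = fw.lookup("gears/brats17-segmentation")     # hypothetical gear name
project = fw.lookup("my-group/glioma-study")       # hypothetical project

for session in project.sessions.iter():
    # Collect the four co-registered, skull-stripped modalities by label.
    inputs = {}
    for acq in session.acquisitions.iter():
        if acq.label.lower() in ("t1", "t1ce", "t2", "flair"):
            for f in acq.files:
                if f.name.endswith(".nii.gz"):
                    inputs[acq.label.lower()] = f
    if len(inputs) == 4:
        # Each job runs in its own container, so submitted jobs can
        # execute in parallel up to the engine's capacity.
        gear.run(inputs=inputs, destination=session)
```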

Therein lies the vast potential of using a strong FAIR framework in an AI-mediated workflow: the ability to pre-curate new data, optimize human input, and retrain on well-labeled data over accelerated time scales. These model design, train, and test cycles are greatly facilitated by a FAIR framework that curates the data, results, and their provenance in a searchable interface.

As with this brain tumor challenge, many similar challenge events make their algorithms and pretrained models publicly available to the research community. One nexus of these is Grand Challenges in Biomedical Image Analysis, which hosts over 21,000 submissions across 179 challenges (56 public, 123 hidden). Flywheel’s capacity to quickly package these algorithms to interoperate with its framework makes it a powerful foundation for a data-driven research enterprise.

Two more useful deep learning, GPU-enabled algorithms have recently been incorporated into Flywheel gears. First, quickNAT uses default or user-supplied pre-trained deep learning models to segment neuroanatomy within thirty seconds when deployed on sufficient GPU hardware. We have wrapped a PyTorch implementation of quickNAT in a Flywheel gear. Prediction of brain regions on CPU hardware requires two hours; although much longer than the thirty seconds needed on a GPU, this is still a fraction of the nearly twelve hours needed by FreeSurfer’s recon-all. Second, Nobrainer is a deep learning framework for 3D image processing. The derived Flywheel gear uses a default (or user-supplied) pre-trained model to create a whole-brain mask within two minutes on a CPU; a GPU brings this time down to under thirty seconds.

The previous paragraph raises two questions. First, with GPU model prediction significantly faster than CPU, when will GPU-enabled Flywheel instances be available? Second, how can Flywheel be effectively leveraged in training deep learning models? Flywheel is actively developing GPU-deployable gears and the architecture to deliver them. We briefly explore the second question next, leaving a more thorough investigation for another article.

Deep Learning models need training on extensive and diverse datasets to generalize effectively and accurately to unseen data. For uncommon conditions, such as gliomas, finding enough high-quality data at a single institution can be daunting. Furthermore, sharing these data across institutional boundaries incurs the risk of exposing protected health information (PHI). With Federated Training, Deep Learning models (and their updates) are communicated across institutional boundaries to acquire the abstracted insight of distributed annotation. This eliminates the risk and requirement of transferring large data repositories while still allowing model access to a diverse dataset. With Federated Search across institutional instances of Flywheel firmly on the roadmap, this type of Federated Training of Deep Learning models will be possible within the Flywheel ecosystem.
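To make the idea concrete, here is a minimal NumPy sketch of federated averaging, the simplest form of this weight-sharing scheme. It illustrates the concept only; it is not Flywheel’s implementation.

```python
"""Minimal sketch of federated averaging: model weights cross
institutional boundaries, patient data never do. Conceptual
illustration only, not Flywheel's implementation."""
from typing import Dict, List
import numpy as np

def federated_average(site_weights: List[Dict[str, np.ndarray]]) -> Dict[str, np.ndarray]:
    """Average each named parameter tensor across participating sites."""
    return {name: np.mean([w[name] for w in site_weights], axis=0)
            for name in site_weights[0]}

# Each institution trains locally and shares only its weights (no images,
# no PHI); a coordinator averages them into the next global model.
site_a = {"conv1": np.array([0.2, 0.4]), "fc": np.array([1.0])}
site_b = {"conv1": np.array([0.6, 0.0]), "fc": np.array([3.0])}
global_model = federated_average([site_a, site_b])
print(global_model)  # {'conv1': array([0.4, 0.2]), 'fc': array([2.])}
```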

(*) The authors of this repository and the University College London do not explicitly promote or endorse the use of Flywheel as a FAIR framework. 


Why a Research-First Platform for Imaging Informatics and Machine Learning?

It's no secret that researchers face many challenges that impede the research and development of artificial intelligence (AI) solutions in clinical settings. Machine learning requires large volumes of data for accuracy in most applications. Institutions often have a wealth of data but lack the systems needed to get it into the hands of researchers cost-effectively.

Those data must be of high quality and labeled correctly. Imaging projects often involve complex preprocessing to identify and extract features and biomarkers. To further complicate matters, security and privacy are critical, particularly when involving collaboration outside of the context of clinical care.

Unfortunately, established clinical solutions fail to address six critical needs of researchers, impeding research productivity and slowing innovation.

Multimodality

Imaging offers significant opportunities for machine learning, but imaging is often not enough. Given that so much of today's research is centered around precision medicine and opportunities to revolutionize cost and quality of care, researchers often require a 360° view of patients including EMR, digital pathology, EEG, -omics, and other data. Clinical imaging systems such as PACS and vendor-neutral archives (VNAs) are designed specifically for imaging and typically don't deal well with nonimaging data, particularly in the context of research workflows.

Cohorts, projects, and IRB compliance

Researchers require the ability to organize and analyze data in cohorts while enabling collaboration with others outside of the context of clinical care. Clinical imaging systems are designed for individual patient care, not for cohort or population health studies, and often lack the organizational structures required for research applications such as machine learning. Institutional review boards (IRBs) typically define for a project the scope of allowed data as well as the people authorized to work with that data. Modern research informatics systems must enable productive workflows while enforcing these IRB constraints.

Quality assurance

Machine learning can be highly sensitive to the quality of the data. Researchers must be able to confirm the quality of data, including completeness and consistency with the protocol defined for the study. Quality control and supporting documentation are required for scientific reproducibility and for processes such as U.S. Food and Drug Administration (FDA) approval. Consequently, modern informatics systems must incorporate comprehensive support for quality assurance as part of the workflow.

Integrated labeling and annotation workflows

Machine learning depends on accurately labeled sample datasets in order to effectively train AI models. Real-world data, often originating from multiple sources, generally lack the structure and consistent labels required to directly support training. Modern imaging informatics solutions must provide the ability to efficiently organize and classify data for search and selection into the appropriate projects or machine-learning applications. Labeling workflows must be supported, including the ability to normalize classification of images and other factors such as disease indications. In the context of imaging, this may involve image annotations collected from radiologists or other experts in a consistent, machine-readable manner via blind multireader studies or similar workflows.
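To make the label-normalization step concrete, the sketch below maps free-text disease indications from multiple sources onto a controlled vocabulary before training. It is a generic illustration of the workflow described above, not a Flywheel feature, and the vocabulary is invented.

```python
"""Generic sketch of label normalization for training data: free-text
disease indications from multiple sources are mapped onto a controlled
vocabulary. Illustrative only - not a Flywheel feature."""

# Controlled vocabulary and the free-text variants seen in source systems.
CANONICAL = {
    "glioblastoma": {"gbm", "glioblastoma multiforme", "glioblastoma"},
    "meningioma": {"meningioma", "meningeal tumor"},
}

def normalize_label(raw: str) -> str:
    """Return the canonical label, or flag the value for human review."""
    text = raw.strip().lower()
    for canonical, variants in CANONICAL.items():
        if text in variants:
            return canonical
    return "NEEDS_REVIEW"  # route unrecognized labels to an expert

assert normalize_label("GBM ") == "glioblastoma"
assert normalize_label("astrocytoma") == "NEEDS_REVIEW"
```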

Automated computational workflows

Imaging and machine learning are computationally intensive activities. Research informatics platforms must automate and scale computational workflows ranging from basic image preprocessing to analytic pipelines and training AI models. The ability to rapidly define and integrate new processes using modern tools and technologies is critical for productivity and sustainability. These systems must also provide the ability to leverage diverse private cloud, public cloud, and high-performance computing (HPC) infrastructures to achieve the performance required to process large cohorts cost-effectively.

Integrated data privacy

Data privacy is critical. Compliance with regulations such as HIPAA and GDPR is a must, given the potential financial and ethical risks. However, the lack of scalable systems for ensuring data privacy is impeding researcher access to data and, therefore, slowing innovation and the related benefits. Modern research informatics solutions must systematically address data privacy. Regulations require deidentification of protected health information to the minimum level required for the intended use; however, that minimum level may differ by project. Consequently, informatics solutions must integrate deidentification and related data privacy measures in a way that can meet the needs of projects with different requirements while maintaining compliance.
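As one concrete, deliberately incomplete illustration, the sketch below uses pydicom to strip a handful of direct identifiers from a DICOM file. A real project-specific profile would be far more thorough; the tag list here is illustrative, not a complete HIPAA Safe Harbor profile.

```python
"""Minimal deidentification sketch using pydicom. The tag list is
illustrative, not a complete deidentification profile."""
import pydicom

PHI_TAGS = ["PatientName", "PatientBirthDate", "PatientAddress",
            "ReferringPhysicianName", "InstitutionName"]

def deidentify(path_in: str, path_out: str, subject_code: str) -> None:
    ds = pydicom.dcmread(path_in)
    for tag in PHI_TAGS:
        if tag in ds:
            delattr(ds, tag)            # drop direct identifiers
    ds.PatientID = subject_code         # replace with a study-specific code
    ds.remove_private_tags()            # private tags often hide PHI
    ds.save_as(path_out)

deidentify("scan.dcm", "scan_deid.dcm", subject_code="SUBJ-001")
```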

Data as a strategic asset with FAIR

Data is the key to clinical research and machine learning. A scalable, systematic approach to research data management should be the foundation of research strategies aimed at machine learning and precision care. Cost-effectively scaling access to clinical data in a manner that supports research workflows while ensuring security and data privacy can improve research productivity, accelerate innovation, and enable research organizations to realize their strategic potential.

Implementing the FAIR principles in your organization helps maximize the strategic value of data that exists in your institution. These principles, developed by academics, agency professionals, and industry members, amplify the value of data by making it Findable, Accessible, Interoperable, and Reusable (FAIR).

  • Findable data are labeled and annotated with rich metadata, and the metadata are searchable.
  • Accessible data are open to researchers with the correct authorization, and the metadata persist even after data are gone.
  • Interoperable data follow standards for storing information and can operate with other metadata and systems.
  • Reusable data are well-described and well-tracked with provenance for computation and processing.

Modern informatics systems should deliver on the FAIR principles while supporting the workflow needs of researchers as described above.

A clinical research platform designed to enhance productivity and accelerate innovation

Flywheel is a new class of informatics platform that addresses the unique needs of researchers involved in imaging and machine learning. Deployed at leading research institutions around the world, Flywheel supports the entire research workflow including capture, curation, computation, and collaboration, plus compliance at each step.

Capture

Flywheel is designed for true multimodality research. While the system specializes in the unique data types and workflows associated with imaging, the platform is capable of managing nonimaging data such as EMR, digital pathology, EEG, genomics, or any other file-based data. Further, Flywheel can automate data capture from imaging modalities and also clinical PACS and VNAs to streamline research workflows as well as translational testing scenarios.

Curate

Flywheel is unique in its ability to organize and curate research data in cohort-centric projects. The platform provides extensive tools for managing metadata including classification and labeling. Quality assurance is supported through project templates and automation rules. Integrated viewers with image annotation and persistent regions of interest (ROIs) are provided to support blind multireader studies and related machine-learning workflows. Powerful search options with access to all standard or custom metadata are provided to support the FAIR principles.

Compute

Flywheel provides comprehensive tools to automate routine processing, ranging from simple preprocessing to full analytic pipelines and training machine-learning models. The platform scales computational workloads using industry-standard "containerized" applications referred to as "Gears." Gears may originate from Flywheel's Gear Exchange containing ready-to-use applications for common workflows or may be user-provided custom applications. The platform supports elastic scaling of workloads to maximize performance and productivity. Gears automate capture of provenance to support scientific reproducibility and regulatory approvals. Further, Flywheel helps you work with existing pipelines external to the system with powerful APIs and tools for leading scientific programming languages, including Python, MATLAB, and R.

Collaborate

Collaboration is enabled through secure, IRB-compliant projects. Collaboration may be within an institution or across the globe for applications such as clinical trials or multicenter studies. Flywheel projects provide role-based access controls to authorize access and control sharing of data and algorithms. Data may be reused across project boundaries for applications such as machine learning, which require as much data as possible.

Compliance

Flywheel helps reduce security and data privacy risks by providing a secure, regulatory-compliant infrastructure for systematically scaling research data management according to HIPAA and GDPR requirements. The platform provides integrated tools for deidentification of research data to ensure the protection of protected health information.

A research-first platform answers the challenges to implementing AI

Flywheel's innovative research informatics platform helps you maximize the value of your data and serves as the backbone of your imaging research and machine learning strategy. Flywheel overcomes the limitations of systems designed for clinical operations to meet the unique needs of researchers. The result is improved collaboration and data sharing and reuse. Ultimately, Flywheel improves research productivity and accelerates innovation.

Original article can be found on Aunt Minnie

 


Four AI Workflow Trends from RSNA 2019

The Biggest Trend: Maturing Implementation of AI

Attendees who visited our booth last year were interested in learning about AI capabilities. This year, they brought questions about implementing the infrastructure needed for AI and how to scale AI research in their organizations. Scaling access to clinical data and interoperability emerged as rising concerns this year. Organizations are also gradually accepting cloud scaling as a secure option.

Radiologists are beginning to plan for AI in their standard workflows. Many radiologists visited our booth with questions about bringing AI research into their current clinical workflows.

Data Curation for Research Still Falls Short

The focus of many workshops and presentations from radiologists was “data wrangling” and dataset quality. We received many questions from attendees about metadata management and labeling tools. At the same time, there is growing recognition that clinical systems don’t meet the needs of the research and AI development communities, and that an entirely new class of solution supporting the research workflow is needed.

We recommend Dr. Paul Chang’s (University of Chicago) AuntMinnie interview during RSNA: “AI is like a great car … Most cars still need gas and roads. In the context of this analogy, gas is vetted data and the road is workflow orchestration that is AI-enabled... The only way to make a transformative technology real is to do the boring stuff, the infrastructure stuff.”

Everyone Noticed the Busy AI Showcase

The AI Showcase was very active this year. In 2018, there were roughly 70 vendors in the AI Showcase, but this year there were 129, including many international AI vendors. We noticed growth in AI development for cardiac and brain imaging.

It’s Imminent: Equipment Vendors are Integrating AI Workflows

AI is moving beyond the desktop as imaging equipment manufacturers have their eye on supporting research workflows. Leading equipment manufacturers like Philips and Canon displayed developments in their interfaces to support AI and analysis tools in disease-specific applications. Flywheel is expanding partnerships with AI vendors and equipment vendors in addition to supporting clients performing imaging and clinical research.

CEO Travis Richardson presenting at the Google Cloud Booth about Flywheel’s scalable infrastructure for machine learning.

Flywheel Delivers Reproducibility

Flywheel is committed to supporting reproducible research computations.  We make many software design decisions guided by this commitment. This document explains some key reproducibility challenges and our decisions. 

Reproducibility challenges

Flywheel’s scientific advisory board member, Victoria Stodden, writes that reproducible research must enable people to check each other's work. In simpler times, research articles could provide enough information so that scientists skilled in the art could check published results by repeating the experiments and computations. But the increased complexity of modern research and software makes the methods section of a published article insufficient to support such checking. The recognition of this problem has motivated the development of many tools.

Reproducibility and data

A first requirement of reproducibility is a clear and well-defined system for sharing data and critical metadata. Data management tools are a strength of the Flywheel software. The tools go far beyond file formats and directory trees, advancing data management for reproducible research and the FAIR principles.

Through experience working with many labs, Flywheel recognized the limitations of modern tools and what new technologies might help. Many customers wanted to begin managing data the moment they were acquired rather than waiting until they were ready to upload fully analyzed results. Flywheel built tools that acquire data directly from imaging instruments - from the scanner to the database. In some MRI sites, Flywheel even acquires the raw scanner data and implements site-specific image reconstruction. The system can also store and search through an enormous range of metadata including DICOM tags as well as project-specific custom annotations and tags.

Reproducibility and containers

A second requirement of reproducibility is sharing open-source software in a repository, such as GitHub or Bitbucket. Researchers, or reviewers, can read the source code and in some cases download, install, and run it.

Based on customer feedback, Flywheel learned that (a) downloading and installing software - even from freely available open-source code on GitHub! - can be daunting, (b) customers often had difficulty versioning and maintaining software, as students and postdocs come and go, and (c) they would run the software many times, often changing key parameters, and have difficulty keeping track of the work they had done and the work that remained to be done. 

To respond to these challenges, Flywheel implemented computational tools based on container technology (Docker and Singularity). Implementing mature algorithms in a container nearly eliminates the burden of finding, downloading, compiling, and installing critical pieces of software. Containers package the compiled code along with all of its dependencies, such as libraries, into small virtual-machine-like units that run on many operating systems (PC, Mac, Linux, each with different variants), whether on a local machine or on a cloud system.

Reproducibility and analyses: Introducing Gears

Once an algorithm is implemented in a container, Flywheel users run it. A lot. They wanted ways to record the precise input data as well as the algorithm version and parameters used as they explored the data. The outputs also needed to be recorded. Such a complete record is difficult for individuals to maintain, yet it is necessary for reproducibility.

Flywheel solves these problems with a computational system for managed application containers, which we call Gears. A Gear is structured to record every parameter needed to perform an analysis. When the user runs a Gear, the input data, the specific version of the container, all the parameters needed to run it, and the output data are recorded in the database. This record is called an ‘Analysis’, and users perform and store hundreds of Analyses on a data set.

Because all the information about an Analysis is stored in the database associated with the study, people can re-run precisely the same Gear. It is also straightforward to run the same Gear using different data, or to explore the consequences of re-running the Gear after selecting slightly different parameters. Making Analyses searchable also helps people keep track of which Gears were run and which still need to be run. 
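To make the Analysis record concrete, here is a sketch of inspecting one through the Flywheel Python SDK. The lookup path is a placeholder, and the attribute names are assumptions based on our reading of the SDK rather than a verified recipe.

```python
"""Sketch of reading the provenance stored with each Analysis via the
Flywheel Python SDK. The lookup path is a placeholder and attribute
names are assumptions based on our reading of the SDK."""
import flywheel

fw = flywheel.Client("my-api-key")
session = fw.lookup("my-group/glioma-study/subj-001/session-1")
session = session.reload()                     # populate analyses

for analysis in session.analyses:
    job = analysis.job                         # the recorded Gear execution
    print(analysis.label)
    print("  gear version:", job.gear_info.name, job.gear_info.version)
    print("  parameters:  ", job.config.get("config"))
    print("  outputs:     ", [f.name for f in (analysis.files or [])])
```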

Reproducibility and documentation

Clear writing is vitally important to making scientific work reproducible. Tools that support clear and organized notes during the experiments are also very valuable. During the initial development, Flywheel partnered with Fernando Perez and the Jupyter (then IPython) team to implement tools that built on shared software. Flywheel continues to find ways to support these tools. Flywheel tools permit users to link their data to published papers, write documentation about projects and sessions, and add notes. This documentation is part of the searchable database, and Flywheel will continue helping users incorporate clean and thorough documentation.

 


Flywheel Delivers Data Management

Persistently storing data is the critical first step in planning for reproducible science. Defining file formats and organizing directories is a good start; in our experience this is where most researchers focus their efforts. But modern computer science provides many technologies that improve data storage, making data FAIR, i.e., findable, accessible, interoperable, and reusable (see Flywheel Delivers FAIR Principles). Flywheel uses these tools to support reproducible science.

Metadata are important

The value of raw data, for example the numerical data of an image, is vastly increased when we know more about the data. This information - called the metadata - can tell us many important things: the instrument parameters used to acquire the data, information about the subject (demographics, medical conditions, etc.), time and place of the acquisition, and facts about the experimental context; for example, that the subject fell asleep during the resting state MR scan.  

The biomedical imaging community recognizes the importance of metadata in two important ways. First, it defines standard file formats (DICOM, NIfTI) that embed metadata in the file header. Second, the BIDS standard stores useful metadata in the file name or in an accompanying ‘sidecar’ file.
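As a concrete illustration of these two conventions, the snippet below reads acquisition metadata from a NIfTI header with nibabel and from a BIDS JSON sidecar. File names are placeholders.

```python
"""Reading embedded metadata from a NIfTI header (via nibabel) and from
a BIDS JSON sidecar. File names are placeholders."""
import json
import nibabel as nib

img = nib.load("sub-01_T1w.nii.gz")
print(img.header.get_zooms())        # voxel sizes stored in the file header

with open("sub-01_T1w.json") as f:   # the BIDS 'sidecar' next to the image
    sidecar = json.load(f)
print(sidecar.get("RepetitionTime")) # acquisition metadata, e.g. TR in seconds
```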

Storing metadata within a file header, or an accompanying file, is a good start. But using an extensible database offers many advantages. Here is why:

Databases are efficient

Nearly all modern computer operating systems use databases to store files and their metadata. For example, on Apple systems the Get Info command (Cmd-I) returns metadata about the file from the operating system’s database (comments, preview, kind of file) as well as standard POSIX information like file size and date of access. Apple’s Spotlight search uses the same database to identify files.

There are many advantages to storing information about a file in a database compared to putting the information in the file header or accompanying file. For example, we have seen many cases in which people fail to keep the two files together; and sometimes they rename one of the files and lose the association between the data and metadata files. Putting the information in the file header avoids these problems but has others. Files are distributed across the disk making searches through file headers very inefficient. Also, files arise from many different sources and it is virtually impossible to guarantee that vendors keep up-to-date with changes. Headers are most useful for a particular type of file, but not for a large system.

Databases solve these problems by having the user interact with files through a unified interface that includes the name of the raw data file on disk as well as the associated metadata. To read the raw data, one consults the database for the location of the file containing the raw data. To read the metadata, one consults only the database. Typically, the database itself is small, and updates to its format or additions to its content are possible. 

Flywheel uses a document database (MongoDB) to manage user interactions with data and metadata. In the Flywheel system, you can read metadata via the web-browser interface. When programming, you can access metadata using the software development kits (SDKs) or REST API. 
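A short sketch of that programmatic access follows, using the Flywheel Python SDK. Paths and the note text are placeholders, and the calls reflect our reading of the SDK.

```python
"""Reading and writing metadata through the Flywheel Python SDK rather
than file headers. Paths are placeholders; calls reflect our reading of
the SDK."""
import flywheel

fw = flywheel.Client("my-api-key")
acq = fw.lookup("my-group/glioma-study/subj-001/session-1/T1w")  # placeholder
acq = acq.reload()                       # populate file-level metadata

# Header fields extracted at ingest live in the file's info blob...
for f in acq.files:
    print(f.name, f.info.get("RepetitionTime"))

# ...and custom, searchable metadata can be attached to any container.
acq.update_info({"notes": "subject fell asleep during resting-state scan"})
```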

Metadata can be attached to any object in the system hierarchy

The Flywheel data are organized in a hierarchy: Group, Project, Subject, Session, Acquisition, Files and Analyses. This hierarchy can incorporate virtually any file type and associated metadata. Most of our customers store files containing medical imaging data in the hierarchy, including MRI, PET, CT, OCT, and pathology images.  But some customers store other types of files, such as computer graphics files that are useful for machine learning. All of the objects, the files and the organizational containers (Project, Subject, Session, Acquisition, Analyses) are described in the database, each with its own metadata. Users can search, annotate and reuse the files and containers from any level in the Flywheel system.

Metadata are flexible

By using a general database, Flywheel can be complete and flexible. For MRI DICOM files, the database includes all of the header information in the file, such as TR, TE, voxel size, and diffusion directions. In addition, the Flywheel database includes fields for users to place searchable notes, say, about the experiment. The database can also include links to additional experimental information about the subject and auxiliary measures (often behavioral data).

The Flywheel database can add fields without needing to rebuild the entire database. For example, as new MRI technologies developed, we were able to add additional fields that describe the new acquisition parameters. Similarly, Flywheel regularly expands to manage new types of data; as we do so, we add new database fields.
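That flexibility is a general property of document databases. The generic MongoDB sketch below (not Flywheel’s actual schema; collection and field names are invented) shows how a new field can be added to one document without migrating the rest:

```python
"""Generic MongoDB sketch of schemaless flexibility: add a new
acquisition parameter to one document without rebuilding the database.
Collection and field names are illustrative, not Flywheel's schema."""
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["demo"]

db.acquisitions.insert_one({"label": "T1w", "TR": 2.3, "TE": 0.03})

# A new MRI technology arrives: set a field only this document needs.
db.acquisitions.update_one(
    {"label": "T1w"},
    {"$set": {"MultibandAccelerationFactor": 4}},
)
```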

Data reuse

Flywheel helps users to reuse data by (a) helping them find data sets and (b) using the search results to create a new project in their database. Adding a database entry eliminates the need for data copying - we simply copy database entries to specify the new project’s sessions, acquisitions, and files.  Flywheel calls such a virtual project a 'Collection'. 
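A sketch of building such a Collection programmatically follows. The method names here are assumptions based on our reading of the Flywheel Python SDK, and the project path and labels are placeholders.

```python
"""Sketch of building a Collection (virtual project) from existing
sessions. Method names are assumptions based on our reading of the
Flywheel Python SDK; paths and labels are placeholders."""
import flywheel

fw = flywheel.Client("my-api-key")
project = fw.lookup("my-group/glioma-study")          # placeholder project

# Create the virtual project; only database references are added, no copies.
collection_id = fw.add_collection({"label": "dwi-female-subjects"})
collection = fw.get_collection(collection_id)

for session in project.sessions.iter():
    if session.subject.sex == "female":
        collection.add_sessions(session.id)   # original data stay in place
```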

Reproducible science 

Data management and the ability to search across all types of objects enhance the value of the data. Carefully storing and managing metadata supports finding and reusing data, two pillars of FAIR and reproducible research.

Contact us here for a demonstration to see how Flywheel’s database and further computing features can be the backbone of your research.


Four Takeaways from BioData World West 2019

BioData World West wrapped up its third year! A mix of experts from industry, academia, and government mingled and mused on the data management supporting the healthcare industry.

Below are insights from our Chief Technology Officer, Gunnar Schaefer, and our Director of Sales, Marco Comianos, who attended.

Gunnar Schaefer, Co-Founder and CTO of @Flywheel_io presents on scaling medical imaging and machine learning in clinical research

Share quality data within your organization

The main focus among conversations at BioData this year was making data accessible across departments and organizations. Letting data flow freely between labs in life sciences organizations creates a feedback loop from health network partners and previously unprofitable drug trials. In health networks, data scientists can highlight opportunities where patients are underserved to create better experiences and processes that can be streamlined to cut costs.

When these different sources of data are merged, unconventional combinations of biomedical data can point to obscure patterns of disease. Scientists from organizations like GenomeAsia, Sidra Medicine, and AstraZeneca presented their findings from blending microbiome and genetic research, genotypic and phenotypic data, and imaging and text data. 

In order for machine learning to power artificial intelligence applications, data must be routed, organized, cleaned, and standardized from the moment of creation. More important than proper data storage is the ability to query a system repeatedly for renewed insight. Genentech underscored the need to store data so it is FAIR: findable, accessible, interoperable, and reusable. That way, data are ripe for query and can be integrated for analysis.

However, it’s important to remember that no matter how well sources are linked together, data must be high-quality and machine learning investigations must be ethically supervised. As Faisal Khan of AstraZeneca put it: “Tortured data will confess to anything.” 

Looking forward, expect life sciences companies to adopt better data principles in their data strategies, refine what’s working already, and search for software that bridges the gaps.

Being precise about requirements for precision medicine

Much of the groundwork for precision medicine is now being laid, though mostly in oncology. At BioData, speakers gave direction for its high-value applications. 

Today’s genomics research can address previously untreatable rare diseases. A panel addressed how data sharing must accompany public genomic projects to optimize therapeutic development for rare diseases. Presenters also reported on diversifying the pools for large genome projects. On the treatment side, analysts explained methods to match an individual’s genomic profile with one out of many pre-existing drugs, saving time for patients facing debilitating diseases.

These advancements require access to large amounts of data with well-defined interoperability. Looking forward, expect the general hype around precision medicine to fade, making way for discussions about the infrastructure that enables answers to disease-specific precision questions.

Machine learning shortens both ends of drug trials

Beyond the potential for drug discovery using genetic markers, algorithms were showcased which had correctly predicted the pharmacokinetics and effectiveness of drug compounds. Not only does this technology assist researchers and cut costs for developing compounds or finding targets; once therapies are in clinical trials, AI can also predict the likelihood of certain subpopulations having an adverse reaction to a drug. Clinical trial pools normally miss these portions of the population, which can result in a public perception crisis.

Looking forward, expect to see AI use with historical clinical data and patient data becoming a competitive factor in shortening the time horizon for successful drug launches. We’ll also see  which AI vendors become the most productive partners for life sciences organizations.

AI specialists come ready to partner

If data scientists hold some healthy skepticism of practically applying machine learning, AI specialists showed up with the energy to compensate. AI specialists are drawing talent from universities to specialize in anatomical regions. Companies in this vertical are also starting to partner with each other to complement their deep expertise in one region.

Many AI companies at BioData specialize in genomics and digital slide pathology, so look forward to development and consolidation in this field. Fewer imaging analysis companies were present at BioData - stay tuned for the imaging market insights yet to come out of RSNA!

At RSNA’s Annual Meeting, Flywheel will be exhibiting from December 1st to December 5th in the AI Showcase. Schedule a demo and find us at booth #11618.


Flywheel Delivers FAIR Principles

The FAIR acronym is a nice way to summarize four important aspirations of modern research practice: scholarly data should be Findable, Accessible, Interoperable, and Reusable. The article describing the FAIR aspirations is excellent, and we recommend reading it. Some limitations of current practice are described here. Our company was founded to advance research and we embrace these principles.

Flywheel, software used by thousands of researchers, embodies tools and technology that deliver on the FAIR principles.

About Flywheel

Flywheel is an integrated suite of software tools that (a) stores data and metadata in a searchable database, (b) includes computational tools to analyze the data, and (c) provides users with both browser-based and command line tools to manage data and perform analyses. Our customers use these tools on a range of hardware platforms: cloud systems, on-premise clusters and servers, and laptops.

Flywheel supports users throughout a project’s life cycle. The software can import data directly from the instrument (such as an MR scanner) and extract metadata from the instrument files into the database. Auxiliary data from other sources can also be imported. The user can view, annotate, and analyze the data, keeping track of all the scientific activities. Finally, the data and analyses can be shared widely when it is time to publish the results.

FAIR Data Principles Implemented

Findable

Flywheel makes data ‘Findable’ by search and browsing. The Flywheel search tools address the entire site’s dataset, looking for data with particular features. It is straightforward, for example, to find the diffusion-weighted imaging data for female subjects between the ages of 30 and 45. The user can contact the owners of the data for access, and the data returned by a search can be placed in a virtual project (Collection) for reuse and further analysis.

Search is most effective when high-quality metadata are associated with the data and analyses. Flywheel creates a deep set of metadata by scanning and classifying the image data. Users can attach specific searchable keywords and add data-specific notes at many levels - the overall project, the session, the specific data file, or the analysis. Users can then find data by searching these descriptions.

Accessible

Our customers frequently observe that there is a conflict between making data accessible (sharing) while complying with health privacy rules. We live in a world with privacy officers on the one hand and open data advocates on the other.

Flywheel delivers an accessible solution that is respectful of both principles. We implemented a rigorous user-rights management system that is easy to use. Access to the data and analyses is controlled through a simple web-based interface. The system implements the different roles that are needed during a project’s life cycle. At first, perhaps only the principal investigator and close collaborators have access; later, additional people (reviewers, other scientists) might be granted access to check the data and analyses. When ready, the anonymized data and full descriptions of the analyses can be made publicly viewable. An effective system that manages a project through these stages is complicated to write, but Flywheel makes it easy to use through its browser interface.

Interoperable

Most scientists have felt the frustration of learning that a dataset is available, but the file format or organization of the data files requires substantial effort to decode and use. The medical imaging community has worked to reduce this burden by defining standardized file and directory organizations. Flywheel is committed to using and promoting these standards.

Our experience teaches us that well-intentioned file formats and directory organizations are not enough. Flywheel stores far more information than one finds in the header of a DICOM or NIfTI file or in the BIDS directory structure. Our commitment to interoperability includes reading files and directories in these standards and even writing Flywheel data out into these formats. Beyond this, we are committed to tools that import and export data and metadata between Flywheel and other database systems.

Flywheel is further committed to supporting the interoperability of computational tools. We have opened our infrastructure so that users can analyze data using Flywheel-defined containerized algorithms, their own containers, or their own custom software. The Flywheel standards are clearly defined based on industry-standard formats (e.g., JSON, Docker, Singularity) so that other groups can use them and in this way support computational interoperability.

Reusable

From its inception, Flywheel was designed to make data reusable. Users at a center can share data within their group or across groups, reuse data by combining it from different groups, and create and share computational tools. The user can select data from any project and merge it into a new project. Such reused data is called a Collection in Flywheel. The original data remain securely in place, and the user can analyze the collection as a new virtual project. All the analyses, notes, and metadata of the original data remain attached to the data as they are reused.

Equally important, the computational methods are carefully managed and reusable. Each container for algorithms is accompanied by a precise definition of its control parameters and how they were set at execution time. This combination of container and parameters is called a Flywheel Gear, and the specific Gear that was executed can be reused and shared.

More

The FAIR principles are an important part of the Flywheel system. We have also been able to design in additional functionality that supports these principles.

  • Security and data backup are very important and fundamental. The ability to import older data into the modern technology has been valuable to many of our customers.
  • The visualization tools built into Flywheel help our customers check for accuracy and data quality as soon as the data are part of the system.
  • The programming interface, supported by endpoints accessible in three different scientific programming languages, permits users to test their ideas in a way that gracefully leads to shared data and code.

Flywheel-Connect 3D Slicer Extension

Modern scientific workflows require a diverse (and sometimes disparate) set of tools to turn data into insight. Because each of these tools is designed to excel at a specific set of tasks, shepherding results between them can require significant effort and expense. Fortunately, Flywheel is an integrated framework that facilitates the use of these tools and the transfer of data between them. The flywheel-connect tool described here is a 3D Slicer extension that further extends the Flywheel platform’s integrated functions.

Extending 3D Slicer with the Flywheel-Connect Extension

In this article we demonstrate Flywheel’s capacity to extend the utility of 3D Slicer, an open-source software platform for medical image informatics, image processing, and three-dimensional visualization. While 3D Slicer excels at desktop visualization and local image analysis, it lacks the capacity to orchestrate cloud-centric computing and institutional data resources -- the specific domain in which the flywheel-connect extension shines.

We have leveraged 3D Slicer’s extension architecture and Flywheel’s Python SDK to give 3D Slicer access to NIfTI images stored in a remote Flywheel instance running on Google Cloud Platform. Called “flywheel-connect”, this 3D Slicer extension is a proof-of-concept utility for downloading and displaying NIfTI images from selected acquisitions within a specific Flywheel instance. flywheel-connect makes it possible for a 3D Slicer user to directly access and load data stored in any Flywheel instance they have access to.

Demonstration

We demonstrate flywheel-connect in the video below using data from the 2017 MICCAI Multimodal Brain Tumor Segmentation Challenge. First, the user enters an API key generated from a Flywheel instance. Then, after the “Connect Flywheel” button is pressed, combo boxes are populated with all of the groups, projects, sessions, and acquisitions the user has permission to access. Pressing “Retrieve Acquisition” loads an image into 3D Slicer. By default, images are cached in a user-specific directory for later use. These directories and images can be deleted directly from the filesystem or flushed with the “Cache Images” checkbox.
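For readers who prefer the SDK directly, the following condenses what flywheel-connect does under the hood. Paths and labels are placeholders, and the snippet assumes it is run from 3D Slicer’s Python console.

```python
"""Condensed sketch of flywheel-connect's core steps: download a NIfTI
file from a Flywheel acquisition and load it into 3D Slicer. Paths and
labels are placeholders; run inside Slicer's Python console."""
import os
import flywheel
import slicer

fw = flywheel.Client("my-api-key")    # user-generated API key
acq = fw.lookup("my-group/brats-2017/subj-001/session-1/flair")  # placeholder

cache = os.path.expanduser("~/flywheelIO")           # user-specific cache
os.makedirs(cache, exist_ok=True)
for f in acq.files:
    if f.name.endswith(".nii.gz"):
        dest = os.path.join(cache, f.name)
        acq.download_file(f.name, dest)              # fetch from the instance
        slicer.util.loadVolume(dest)                 # display in 3D Slicer
```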

Once the data are in 3D Slicer, Slicer-specific operations can be performed on them. As portrayed in the video, the discrete-valued volume representing the tumor segmentation across imaging modalities (FLAIR, T1W, T1CE, T2W) is converted into a label map. In turn, this label map is converted to a three-dimensional representation with Slicer-provided tools.

Although flywheel-connect demonstrates a usable solution, additional features would greatly increase its utility for Slicer-centric imaging workflows. First, the capacity to view the results of specific Flywheel analyses; this would, for example, allow visual inspection of an image registration to an atlas or template performed in Flywheel. Second, saving an entire 3D Slicer workspace to a Flywheel instance; along with instantiating a workspace from a Flywheel instance, this feature would provide functionality entirely lacking in 3D Slicer: management of multiple workspaces relating to the same project.

To try out flywheel-connect for yourself, see the github page for this project. Suggestions and proposed improvements are welcome. https://github.com/flywheel-apps/flywheel-connect

https://youtu.be/vGoLwRiKV3s


Building Blocks of Imaging AI Use Case: Flywheel delivers presentation at BioData World West 2019

Oct. 10-11, 2019, Hilton San Diego Resort and Spa, San Diego, California – Flywheel Exchange is sponsoring a demonstration in booth 14 and an imaging AI use case presentation, “Scaling Medical Imaging and Machine Learning in Clinical Research: Data Management, Curation, Computational Workflows,” at BioData World West 2019. To arrange a private meeting onsite, connect with us here!

The imaging AI use case presentation by Flywheel CEO Travis Richardson describes a framework for a scientific workflow that manages imaging data. The presentation addresses standardizing imaging data and metadata from multiple, disparate data repositories.

Specifically, the presentation walks through managing historical clinical trial imaging data sets located in a variety of repositories, including vendor-neutral archives (VNAs), picture archiving and communication systems (PACS), file servers, cloud servers, thumb drives, and even DVDs. Richardson reviews the challenge of bulk ingest of imaging data and metadata, as well as automated validation of an organization’s unique DICOM imaging data and metadata files, structures, and formatting. He also shows how the ingested, standardized, and validated imaging data and metadata become searchable, enabling easy construction of new data sets and training sets for future imaging AI models and research projects.

Imaging AI use case: compliance, automation and reproducibility

Key to the case study is the Flywheel imaging infrastructure platform. Flywheel Exchange provides a collaborative workflow compliant with the requirements of Institutional Review Boards (IRBs), the Health Insurance Portability and Accountability Act (HIPAA), and the General Data Protection Regulation (GDPR). Flywheel also automates the reproducibility required for funding by the National Institutes of Health (NIH).

The presentation is of interest to scientific workflow researchers, principal investigators (PIs), imaging lab and center directors, and life science teams seeking to avoid IT bottlenecks and improve the efficiency and speed of imaging AI and scientific discovery.

As an expo that brings together biomedical imaging, data, clinical, and research professionals, as well as AI and big data start-ups, growth firms, and Fortune 100 life science organizations, BioData World West 2019 will facilitate conversations across disciplines.

Flywheel Exchange team members look forward to hearing about projects and initiatives, and to sharing their recent insights into biomedical imaging, infrastructure, and solving unique imaging AI challenges.


Flywheel's Neuroinformatics Platform: Translating scientific findings into clinical applications

 

Lerma-Usabiaga led a group from Stanford University and the University of California, San Francisco (UCSF) in developing a framework for translating magnetic resonance imaging (MRI) scientific findings into clinical practice.  Their system is based on the Flywheel neuroinformatics platform, including both the data and computational management tools. The methods framework explores replication, which is essential for valid science, and generalization, which is essential for clinical applications. 

The authors gathered nine data sets into the Flywheel neuroinformatics platform, grouping  them into three categories. Variations in data characteristics begin during acquisition, the authors note, due to calibration differences in MRI instruments between competing MRI vendors and other factors.

Neuroinformatics platform reproducibility

Lerma-Usabiaga et al. (2019) use Flywheel to support their goal of computational reproducibility. They note that computational reproducibility is supported by using open-source containerized methods whose inputs, outputs, parameters, and versioning (provenance) are stored in the Flywheel neuroinformatics platform. Other scientists can reproduce the analysis by accessing the Flywheel system if they are authorized by the Institutional Review Board (IRB).

The containers execute the largest and most complex jobs, while Flywheel’s software development kit (SDK) facilitates data preparation, further statistical analysis, and visualization (2019, p. 4). Script reproducibility is supported via storage and versioning in a GitHub repository (https://github.com/garikoitz/paper-reproducibility), while the input data and the executed versions are stored in the Flywheel neuroinformatics platform.

Neuroinformatics platform for scientific reproducibility

The specific application the authors explore is a potential biomarker in the white matter that can be used to assess individual subjects, following the “Precision Medicine” approach emerging in neuroimaging. The authors point to the need for quantitative and objective method frameworks for biomarker measurement validation that provide precision as well as replication and reproducibility.

The paper addresses increasing variation in the clinical setting - neuroimaging instruments, populations, and measurement protocols - as well as the problem of algorithmic complexity. Lerma-Usabiaga et al. (2019) point out that, by using containers and storing the analysis history, the Flywheel-based system they implemented “closely aligns” (2019, p. 8) with the Poldrack et al. (2017) description of tools for reproducible neuroimaging research:

“…entire analysis workflow…completely automated in a workflow engine and packaged in a software container or virtual machine to ensure computational reproducibility.” (2017, p. 124)

Finally, Lerma-Usabiaga et al. (2019) note that the Flywheel platform is extensible, simplifying the analysis of new datasets using identical computational methods and measuring how compliance ranges fluctuate. A process of continuous data aggregation should allow continuing improvement of the methods and better definition of the compliance range, helping translate scientific research from the lab to the clinic.

References

Lerma-Usabiaga, G., et al., 2019. Replication and generalization in applied neuroimaging. NeuroImage, 202, 116048.

Marcus, D., et al., 2011. Informatics and data mining tools and strategies for the human connectome project. Frontiers in Neuroinformatics, 5, 4.

Poldrack, R.A., et al., 2017. Scanning the horizon: towards transparent and reproducible neuroimaging research. Nature Reviews Neuroscience, 18, 115-126.