Four AI Workflow Trends from RSNA 2019

The Biggest Trend: Maturing Implementation of AI

Attendees who visited our booth last year were interested in learning about AI capabilities. This year, they brought questions about implementing the infrastructure needed for AI and about scaling AI research in their organizations. Scaling access to clinical data and interoperability emerged as rising concerns, and organizations are gradually accepting cloud scaling as a secure option.

Radiologists are beginning to plan for AI in their standard workflows. Many radiologists visited our booth asking how to incorporate AI research into their current clinical workflows.

Data Curation for Research Still Falls Short

Many workshops and presentations from radiologists focused on “data wrangling” and data set quality. We received many questions from attendees about metadata management and labeling tools. At the same time, there is growing recognition that clinical systems don’t meet the needs of the research and AI development communities, and that an entirely new class of solution is needed to support the research workflow.

We recommend Dr. Paul Chang’s (University of Chicago) AuntMinnie interview during RSNA: “AI is like a great car … Most cars still need gas and roads. In the context of this analogy, gas is vetted data and the road is workflow orchestration that is AI-enabled... The only way to make a transformative technology real is to do the boring stuff, the infrastructure stuff.”

Everyone Noticed the Busy AI Showcase

The AI Showcase was very active this year. In 2018, there were roughly 70 vendors in the AI Showcase, but this year there were 129, including many international AI vendors. We noticed growth in AI development for cardiac and brain imaging.

It’s Imminent: Equipment Vendors are Integrating AI Workflows

AI is moving beyond the desktop as imaging equipment manufacturers turn their attention to supporting research workflows. Leading equipment manufacturers like Philips and Canon displayed developments in their interfaces to support AI and analysis tools in disease-specific applications. Flywheel is expanding partnerships with AI vendors and equipment vendors, in addition to supporting clients performing imaging and clinical research.

CEO Travis Richardson presenting at the Google Cloud Booth about Flywheel’s scalable infrastructure for machine learning.

Flywheel Delivers Reproducibility

Flywheel is committed to supporting reproducible research computations, and many of our software design decisions are guided by this commitment. This post explains some key reproducibility challenges and the decisions we have made.

Reproducibility challenges

Flywheel’s scientific advisory board member, Victoria Stodden, writes that reproducible research must enable people to check each other's work. In simpler times, research articles could provide enough information so that scientists skilled in the art could check published results by repeating the experiments and computations. But the increased complexity of modern research and software makes the methods section of a published article insufficient to support such checking. The recognition of this problem has motivated the development of many tools.

Reproducibility and data

A first requirement of reproducibility is a clear and well-defined system for sharing data and critical metadata. Data management tools are a strength of the Flywheel software. The tools go far beyond file formats and directory trees, advancing data management for reproducible research and the FAIR principles.

Through experience working with many labs, Flywheel recognized the limitations of modern tools and what new technologies might help. Many customers wanted to begin managing data the moment they were acquired rather than waiting until they were ready to upload fully analyzed results. Flywheel built tools that acquire data directly from imaging instruments - from the scanner to the database. In some MRI sites, Flywheel even acquires the raw scanner data and implements site-specific image reconstruction. The system can also store and search through an enormous range of metadata including DICOM tags as well as project-specific custom annotations and tags.

Reproducibility and containers

A second requirement of reproducibility is sharing open-source software in a repository such as GitHub or Bitbucket. Researchers or reviewers can read the source code and, in some cases, download, install, and run it.

Based on customer feedback, Flywheel learned that (a) downloading and installing software - even from freely available open-source code on GitHub! - can be daunting, (b) customers often had difficulty versioning and maintaining software, as students and postdocs come and go, and (c) they would run the software many times, often changing key parameters, and have difficulty keeping track of the work they had done and the work that remained to be done. 

To respond to these challenges, Flywheel implemented computational tools based on container technology (Docker and Singularity). Packaging a mature algorithm in a container nearly eliminates the burden of downloading, compiling, and installing critical pieces of software. A container bundles the compiled code with all of its dependencies, such as libraries, into a small virtual machine that can run on many operating systems (PC, Mac, Linux, each with different variants), whether on a local machine or on a cloud system.
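The version pinning that makes containers reproducible can be sketched in a few lines. The image name, tag, and paths below are illustrative, not real Flywheel artifacts; the point is that a pinned image tag guarantees the same code and dependencies on every run.

```python
# Hypothetical sketch: running a containerized analysis instead of
# compiling the tool locally. All names here are made up for illustration.

def container_command(image, tag, input_dir, output_dir, args):
    """Build a `docker run` invocation that pins an exact image version."""
    return [
        "docker", "run", "--rm",
        "-v", f"{input_dir}:/input:ro",   # mount input data read-only
        "-v", f"{output_dir}:/output",    # mount a writable results folder
        f"{image}:{tag}",                 # pinned tag => identical code each run
        *args,
    ]

cmd = container_command(
    "example/recon-tool", "1.2.0",
    "/data/sub-01", "/results/sub-01",
    ["--subject", "sub-01"],
)
print(" ".join(cmd))
```

Because the tag (`1.2.0`) is recorded rather than `latest`, anyone can reproduce the run years later with exactly the same binaries.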

Reproducibility and analyses: Introducing Gears

Once an algorithm is implemented in a container, Flywheel users run it. A lot. They want ways to record the precise input data as well as the algorithm version and the parameters used as they explore the data. The outputs also need to be recorded. Such a complete record is difficult for individuals to maintain by hand, yet it is necessary for reproducibility.

Flywheel solves these problems by creating a computational system for managed application containers, which we call Gears. The Gear is structured to record every parameter needed to perform an analysis. When the user runs a Gear, the input data, specific version of the container, all the parameters needed to run the container, and the output data are all recorded in the database. This is called an ‘Analysis’ and users perform and store hundreds of Analyses on a data set.
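The kind of provenance record an Analysis captures can be sketched as a small document. The field names and the fingerprint idea below are illustrative, not Flywheel's actual schema; they show how recording the gear version, inputs, and parameters together makes a run exactly repeatable and easy to compare.

```python
# Illustrative sketch (not the real Flywheel schema) of an Analysis record.
import hashlib
import json
from datetime import datetime, timezone

def make_analysis_record(gear_name, gear_version, inputs, config, outputs):
    """Bundle everything needed to re-run the computation exactly."""
    record = {
        "gear": {"name": gear_name, "version": gear_version},
        "inputs": inputs,     # references to file IDs, not copies of the data
        "config": config,     # every parameter used for this run
        "outputs": outputs,
        "created": datetime.now(timezone.utc).isoformat(),
    }
    # A deterministic hash over gear+inputs+config identifies identical runs.
    payload = json.dumps({k: record[k] for k in ("gear", "inputs", "config")},
                         sort_keys=True)
    record["fingerprint"] = hashlib.sha256(payload.encode()).hexdigest()
    return record

run = make_analysis_record(
    "fsl-bet", "2.1.0",
    inputs={"t1": "file-5f3a"},
    config={"frac": 0.5},
    outputs=["brain_mask.nii.gz"],
)
print(run["fingerprint"][:12])
```

Two runs with the same gear, inputs, and parameters produce the same fingerprint, while changing any parameter yields a new one, which is what lets users tell at a glance which explorations they have already done.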

Because all the information about an Analysis is stored in the database associated with the study, people can re-run precisely the same Gear. It is also straightforward to run the same Gear using different data, or to explore the consequences of re-running the Gear after selecting slightly different parameters. Making Analyses searchable also helps people keep track of which Gears were run and which still need to be run. 

Reproducibility and documentation

Clear writing is vitally important to making scientific work reproducible. Tools that support clear and organized notes during the experiments are also very valuable. During the initial development, Flywheel partnered with Fernando Perez and the Jupyter (then iPython) team to implement tools that built on shared software. Flywheel continues to find ways to support these tools. Flywheel tools permit users to link their data to published papers, write documentation about projects and sessions, and add notes. This documentation is part of the searchable database, and Flywheel will continue to support users to incorporate clean and thorough documentation.


Flywheel Delivers Data Management

Persistently storing data is the critical first step in planning for reproducible science. Defining file formats and organizing directories is a good start; in our experience this is where most researchers focus their efforts. But modern computer science provides many technologies that improve data storage, making data FAIR: findable, accessible, interoperable, and reusable (see Flywheel delivers FAIR). Flywheel uses these tools to support reproducible science.

Metadata are important

The value of raw data, for example the numerical data of an image, is vastly increased when we know more about the data. This information - called the metadata - can tell us many important things: the instrument parameters used to acquire the data, information about the subject (demographics, medical conditions, etc.), time and place of the acquisition, and facts about the experimental context; for example, that the subject fell asleep during the resting state MR scan.  

The biomedical imaging community recognizes the importance of metadata in two main ways. First, standard file formats (DICOM or NIfTI) embed metadata into the file header. Second, the BIDS system uses the file name or an accompanying ‘sidecar’ file to store useful metadata.
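The BIDS sidecar convention pairs each image with a JSON file of the same name. A minimal sketch of reading one (the file names and the `RepetitionTime` value are just a demo pair created on the fly):

```python
# Minimal sketch of the BIDS sidecar convention: metadata for
# sub-01_T1w.nii.gz lives in the matching sub-01_T1w.json file.
import json
import tempfile
from pathlib import Path

def read_sidecar(image_path):
    """Return metadata from the JSON sidecar paired with a NIfTI image."""
    p = Path(image_path)
    stem = p.name.removesuffix(".nii.gz").removesuffix(".nii")
    return json.loads(p.with_name(stem + ".json").read_text())

# Demo: create a matching image/sidecar pair in a temporary directory.
d = Path(tempfile.mkdtemp())
(d / "sub-01_T1w.nii.gz").write_bytes(b"")  # placeholder image file
(d / "sub-01_T1w.json").write_text(json.dumps({"RepetitionTime": 2.0}))

meta = read_sidecar(d / "sub-01_T1w.nii.gz")
print(meta["RepetitionTime"])
```

The fragility is visible in the code itself: rename or move either file and the pairing silently breaks, which is exactly the failure mode a database avoids.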

Storing metadata within a file header, or an accompanying file, is a good start. But using an extensible database offers many advantages. Here is why:

Databases are efficient

Nearly all modern computer operating systems use databases to store files and their metadata. For example, on Apple systems the Get Info command (Cmd-I) returns metadata about the file from the operating system’s database (comments, preview, kind of file) as well as standard POSIX information like file size and date of access. Apple’s Spotlight search uses the same database to identify files.

There are many advantages to storing information about a file in a database compared to putting the information in the file header or an accompanying file. For example, we have seen many cases in which people fail to keep the two files together; sometimes they rename one of the files and lose the association between the data and metadata. Putting the information in the file header avoids these problems but introduces others: files are distributed across the disk, making searches through file headers very inefficient, and because files arise from many different sources, it is virtually impossible to guarantee that every vendor keeps headers up to date as formats change. Headers are most useful for a particular type of file, but not for a large system.

Databases solve these problems by having the user interact with files through a unified interface that includes the name of the raw data file on disk as well as the associated metadata. To read the raw data, one consults the database for the location of the file containing the raw data. To read the metadata, one consults only the database. Typically, the database itself is small, and updates to its format or additions to its content are possible. 

Flywheel uses a document database (MongoDB) to manage user interactions with data and metadata. In the Flywheel system, you can read metadata via the web-browser interface. When programming, you can access metadata using the software development kits (SDKs) or REST API. 
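The document-database idea can be illustrated without a running server. In the toy records below (the field names are illustrative, not Flywheel's schema), each document holds the raw file's location plus its metadata, so a query scans small database records and never opens the files themselves:

```python
# Conceptual sketch of document-database metadata storage. The records
# and field names are made up for illustration, not Flywheel's schema.
files = [
    {"path": "/data/s01/t1.nii.gz",
     "metadata": {"TR": 2.0, "TE": 0.03, "modality": "MR"}},
    {"path": "/data/s01/dwi.nii.gz",
     "metadata": {"TR": 8.3, "TE": 0.09, "modality": "MR",
                  "note": "subject fell asleep"}},
]

def find(docs, **criteria):
    """MongoDB-style equality query over metadata fields."""
    return [d for d in docs
            if all(d["metadata"].get(k) == v for k, v in criteria.items())]

hits = find(files, TR=8.3)
print(hits[0]["path"])
```

Note that the second record carries a free-form `note` field the first lacks; a document database tolerates this kind of per-record flexibility, which is what lets new acquisition parameters or custom annotations be added without rebuilding anything.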

Metadata can be attached to any object in the system hierarchy

The Flywheel data are organized in a hierarchy: Group, Project, Subject, Session, Acquisition, Files and Analyses. This hierarchy can incorporate virtually any file type and associated metadata. Most of our customers store files containing medical imaging data in the hierarchy, including MRI, PET, CT, OCT, and pathology images.  But some customers store other types of files, such as computer graphics files that are useful for machine learning. All of the objects, the files and the organizational containers (Project, Subject, Session, Acquisition, Analyses) are described in the database, each with its own metadata. Users can search, annotate and reuse the files and containers from any level in the Flywheel system.

Metadata are flexible

By using a general database, Flywheel can be complete and flexible. For MRI DICOM files, the database includes all of the header information in the file, such as TR, TE, voxel size, and diffusion directions. In addition, the Flywheel database includes fields for users to place searchable notes, say, about the experiment. The database can also include links to additional experimental information about the subject and auxiliary measures (often behavioral data).

The Flywheel database can add fields without needing to rebuild the entire database. For example, as new MRI technologies developed, we were able to add additional fields that describe the new acquisition parameters. Similarly, Flywheel regularly expands to manage new types of data; as we do so, we add new database fields.

Data reuse

Flywheel helps users to reuse data by (a) helping them find data sets and (b) using the search results to create a new project in their database. Adding database entries eliminates the need to copy the data itself: we simply copy the entries that specify the new project’s sessions, acquisitions, and files. Flywheel calls such a virtual project a ‘Collection’.
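The virtual-project idea can be sketched in a few lines. The session records and IDs below are hypothetical; the point is that a collection stores only references to existing data, so reuse costs one small database entry per item instead of a copy of each file:

```python
# Illustrative sketch of a 'Collection': references instead of copies.
# Session IDs and contents are hypothetical.
sessions = {
    "s1": {"subject": "01", "files": ["t1.nii.gz"]},
    "s2": {"subject": "02", "files": ["t1.nii.gz", "dwi.nii.gz"]},
    "s3": {"subject": "03", "files": ["t1.nii.gz"]},
}

def make_collection(name, session_ids):
    """A collection is just a name plus references; no data is duplicated."""
    return {"name": name, "sessions": list(session_ids)}

aging_study = make_collection("aging-reanalysis", ["s1", "s3"])

# Resolving a reference looks the data up in place:
first = sessions[aging_study["sessions"][0]]
print(first["subject"])
```

Because the collection never duplicates the underlying sessions, corrections to the original data are immediately visible in every collection that references them.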

Reproducible science 

Data management and the ability to search across all types of objects enhance the value of the data. Carefully storing and managing metadata supports finding and reusing data, two pillars of FAIR and reproducible research.

Contact us here for a demonstration to see how Flywheel’s database and further computing features can be the backbone of your research.

Flywheel Wins Minnesota High Tech Association 2019 Tekne Award for Cloud Computing 

Minneapolis, November 25, 2019 — Flywheel, a cloud-based biomedical imaging research platform, was awarded the 2019 Tekne Award in the Cloud Computing category by the Minnesota High Tech Association.  The Tekne Awards, announced on Wednesday, September 20th, recognize companies bringing innovation to Minnesota’s science and technology industry. 

Flywheel is a comprehensive research data platform for medical imaging, machine learning, and clinical trials.  The company offers a range of solutions for life sciences, clinical, and academic research applications. Flywheel streamlines the entire research workflow including data capture, curation, computation, and collaboration.  Flywheel’s platform runs on all the leading cloud platforms including Google Cloud Platform, AWS, and Azure, as well as private cloud infrastructures. By leveraging cloud scalability and automating research workflows, Flywheel helps organizations scale research data and analysis, improve scientific collaboration and accelerate discoveries.

“We are excited to be named the winner of the Cloud Computing 2019 Tekne Award.  It is an honor to be recognized among a group of innovative organizations driving forward incredible advancements in science and technology.  Flywheel is privileged to help the world’s leading life sciences, clinical, and academic researchers collaborate to solve healthcare challenges that impact the lives of so many people.  Flywheel’s cloud-based research platform helps researchers do more science and less IT in their pursuit of healthcare discoveries,” said Flywheel CEO, Travis Richardson.

Four Takeaways from BioData World West 2019

BioData World West wrapped up its third year! A mix of experts from industry, academia, and government mingled and mused on the data management supporting the healthcare industry.

Below are the insights from our own Chief Technology Officer, Gunnar Schaefer, and Director of Sales, Marco Comianos, who attended.

Gunnar Schaefer, Co-Founder and CTO of @Flywheel_io presents on scaling medical imaging and machine learning in clinical research

Share quality data within your organization

The main focus of conversations at BioData this year was making data accessible across departments and organizations. Letting data flow freely between labs in life sciences organizations creates feedback loops with health network partners and can recover value from previously unprofitable drug trials. In health networks, data scientists can highlight opportunities where patients are underserved, creating better experiences and processes that can be streamlined to cut costs.

When these different sources of data are merged, unconventional combinations of biomedical data can point to obscure patterns of disease. Scientists from organizations like GenomeAsia, Sidra Medicine, and AstraZeneca presented their findings from blending microbiome and genetic research, genotypic and phenotypic data, and imaging and text data. 

In order for machine learning to power artificial intelligence applications, data must be routed, organized, cleaned, and standardized from the moment of creation. More important than proper data storage is the ability to query a system over and over for renewed insight. Genentech emphasized the need to store data so it is FAIR: findable, accessible, interoperable, and reusable. That way, data are ripe for query and can be integrated for analysis.

However, it’s important to remember that no matter how well sources are linked together, data must be high-quality and machine learning investigations must be ethically supervised. As Faisal Khan of AstraZeneca put it: “Tortured data will confess to anything.” 

Looking forward, expect life sciences companies to adopt better data principles in their data strategies, refine what’s working already, and search for software that bridges the gaps.

Being precise about requirements for precision medicine

Much of the groundwork for precision medicine is now being laid, though mostly in oncology. At BioData, speakers gave direction for its high-value applications. 

Today’s genomics research can treat previously-untreated rare diseases. A panel addressed how data sharing must accompany public genomic projects to optimize therapeutic development for rare diseases. Presenters also reported on diversifying the pools for large genome projects. On the treatment side, analysts explained methods to match an individual’s genomic profile with one out of many pre-existing drugs, saving time for patients facing debilitating diseases. 

These advancements require access to large amounts of data with well-defined interoperability. Looking forward, expect the general hype around precision medicine to fade, making way for discussions about infrastructure which enable answers to disease-specific precision questions.

Machine learning shortens both ends of drug trials

Beyond the potential for drug discovery using genetic markers, algorithms were showcased that had correctly predicted the pharmacokinetics and effectiveness of drug compounds. This technology not only assists researchers and cuts the cost of developing compounds or finding targets; once therapies are in clinical trials, AI can also predict the likelihood of certain subpopulations having an adverse reaction to a drug. Clinical trial pools normally miss these portions of the population, which can result in a public perception crisis.

Looking forward, expect to see AI used with historical clinical data and patient data becoming a competitive factor in shortening the time horizon for successful drug launches. We’ll also see which AI vendors become the most productive partners for life sciences organizations.

AI specialists come ready to partner

If data scientists hold some healthy skepticism of practically applying machine learning, AI specialists showed up with the energy to compensate. AI specialists are drawing talent from universities to specialize in anatomical regions. Companies in this vertical are also starting to partner with each other to complement their deep expertise in one region.

Many AI companies at BioData specialize in genomics and digital slide pathology, so look forward to development and consolidation in this field. Fewer imaging analysis companies were present at BioData - stay tuned for the imaging market insights yet to come out of RSNA!

At RSNA’s Annual Meeting, Flywheel will be exhibiting from December 1st to December 5th in the AI Showcase. Schedule a demo and find us at booth #11618.