Incorporating R&D Workflows into Life Science Digital Transformation

Digital Transformation in the Age of AI Requires New Infrastructure

A digital transformation is underway as life science organizations work to reduce costs, increase operational efficiency, and accelerate drug development. Data is the key. Consequently, these organizations are focused on integrating scalable analytics, adopting artificial intelligence (AI) and machine learning (ML), and fully migrating their operations to the cloud. The ultimate objective of these initiatives is to create a culture of collaboration and experimentation that drives innovation and meets the demands of a rapidly evolving healthcare landscape, especially in the face of unpredictable events such as the COVID-19 pandemic.

Medical imaging is an important component of this vision because it is a rich source of patient information. It can accelerate drug discovery and development by helping researchers diagnose and assess disease, define and quantify biomarkers, and optimize the clinical trial process. With megapixel upon megapixel of sub-millimeter-resolution data packed into the outputs of X-rays, CAT scans, MRIs, and other modalities, medical imaging is ripe for artificial intelligence applications, especially when optimizing drug development and clinical trials is the ultimate goal.

However, incorporating medical imaging workflows into a digital transformation R&D ecosystem is not trivial. Domain-specific tools are needed to access and curate large volumes of imaging data and to manage complex computational algorithms and AI workflows, all while maintaining data quality, privacy, and compliance. Standardization of data and analytical workflows is also critical to enable collaboration. If integrated effectively, this powerful technology can greatly accelerate innovation and help teams meet their R&D objectives.

In our experience working with life science organizations, we have seen a common set of challenges arise when they take on this type of digital transformation. What follows is not an exhaustive list of problems and solutions, but rather a few infrastructure-related guidelines that matter as life science companies incorporate medical imaging into their digital transformation ecosystems.

Data Management is the Key to Life Sciences R&D

Problem: Consolidating data from disparate sources into a single repository

High-quality data can drive better patient recruitment and engagement, leading to more efficient trials and higher-quality results. With AI and modern image processing techniques, there are new opportunities to gain insights from medical imaging data. Imaging data (mostly DICOM) originates from many disparate sources and partners, including CROs, research institutions, internal data stores, and external real-world data providers, each hosted on its own system. Life science companies need not only to bring together data from these sources, but also to make it easily accessible for consolidation, labeling, and conversion to the desired formats. In fact, a recent IBM study reported that more than 80% of the effort in AI and big data projects is tied to data preparation [1]. The diversity and complexity of medical data types add further difficulty and expense to data management.

Solution: A robust database and workflow to handle large volumes of data

Organizations must first validate their data to confirm that it was completely ingested and that what was received is appropriate for the research purpose. Next, the data needs to be checked for sufficient quality to support processing and analysis. Because healthcare data tends to be large, complex, and diverse, enterprise-level scaling requires significant stress testing to ensure that the platform can onboard large numbers of active researchers. Additionally, every data access, curation, and processing action, whether manual or automated, needs to be logged and tracked to establish reproducibility and audit readiness.
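The two checks described above, confirming complete ingestion and logging every action, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not any platform's actual implementation: it assumes (hypothetically) that each partner ships a manifest of SHA-256 checksums alongside its files, and it keeps the audit trail in memory, whereas a production system would persist it to durable, append-only storage. The names `verify_ingest` and `AuditLog` are hypothetical, and hash chaining is simply one common tamper-evidence technique.

```python
import hashlib
import json
from datetime import datetime, timezone


def sha256_bytes(data: bytes) -> str:
    """Return the hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def verify_ingest(manifest: dict[str, str], received: dict[str, bytes]) -> dict[str, list[str]]:
    """Compare what a partner said it sent (file name -> expected SHA-256)
    with what actually arrived (file name -> raw bytes)."""
    missing = sorted(name for name in manifest if name not in received)
    unexpected = sorted(name for name in received if name not in manifest)
    corrupted = sorted(
        name for name, expected in manifest.items()
        if name in received and sha256_bytes(received[name]) != expected
    )
    return {"missing": missing, "corrupted": corrupted, "unexpected": unexpected}


class AuditLog:
    """Hypothetical append-only, hash-chained log: each entry stores the hash
    of the previous entry, so editing an earlier entry breaks the chain."""

    def __init__(self) -> None:
        self.entries: list[dict] = []

    def record(self, actor: str, action: str, target: str) -> dict:
        prev_hash = self.entries[-1]["hash"] if self.entries else "0" * 64
        entry = {
            "time": datetime.now(timezone.utc).isoformat(),
            "actor": actor,      # a user or an automated process
            "action": action,    # e.g. "access", "curate", "process"
            "target": target,
            "prev": prev_hash,
        }
        entry["hash"] = sha256_bytes(json.dumps(entry, sort_keys=True).encode())
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every hash and check each link in the chain."""
        prev_hash = "0" * 64
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev"] != prev_hash:
                return False
            if sha256_bytes(json.dumps(body, sort_keys=True).encode()) != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True
```

For example, if a manifest lists `scan1.dcm` and `scan2.dcm` but only `scan1.dcm` and an unlisted `scan3.dcm` arrive, `verify_ingest` reports `scan2.dcm` as missing and `scan3.dcm` as unexpected, which is exactly the kind of discrepancy that should be resolved before any analysis begins.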

Automated workflows are also mandatory, since ingested data is on the order of terabytes or petabytes and manual processes are inefficient, time-consuming, and prone to human error. At the point of entry, data (and metadata) needs to be de-identified and classified, and quality control algorithms need to be triggered to “prep” the data for larger-scale, complex analysis. Associated non-DICOM (non-imaging) data also needs to be handled with care, because it is needed for analysis alongside the images. All ingested data ultimately requires a flexible, robust, searchable framework in which all metadata and processes are automatically indexed and immediately available for search.
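The point-of-entry steps above (de-identify, classify, run quality control, index for search) can be chained into a single automated ingest function. The sketch below is illustrative only and rests on several assumptions: headers are plain Python dicts standing in for parsed DICOM attributes (a real system would use a DICOM parser), the `PHI_FIELDS` set is a small hypothetical subset of what a full de-identification profile such as DICOM PS3.15 Annex E removes, salted hashing is just one way to pseudonymize patient IDs, and the QC thresholds are arbitrary. All function and class names are hypothetical.

```python
import hashlib
from collections import defaultdict

# Illustrative subset of identifying attributes; a real de-identification
# profile covers far more fields than this.
PHI_FIELDS = {"PatientName", "PatientBirthDate", "PatientAddress", "InstitutionName"}


def deidentify(header: dict, salt: str) -> dict:
    """Drop direct identifiers and replace PatientID with a salted pseudonym,
    so scans from the same subject can still be linked to each other."""
    clean = {k: v for k, v in header.items() if k not in PHI_FIELDS}
    if "PatientID" in clean:
        digest = hashlib.sha256((salt + str(clean["PatientID"])).encode()).hexdigest()
        clean["PatientID"] = "SUBJ-" + digest[:12]
    return clean


def classify(header: dict) -> str:
    """Assign a coarse acquisition class from Modality and SeriesDescription."""
    modality = header.get("Modality", "")
    desc = header.get("SeriesDescription", "").lower()
    if modality == "MR":
        if "t1" in desc:
            return "MR-T1"
        if "t2" in desc:
            return "MR-T2"
        return "MR-other"
    return modality or "unknown"


def quality_check(header: dict) -> list[str]:
    """Return a list of QC failures; an empty list means the record passes."""
    problems = [f"missing {f}" for f in ("Modality", "PatientID", "SliceThickness")
                if f not in header]
    thickness = header.get("SliceThickness")
    if thickness is not None and not 0 < float(thickness) <= 10:  # arbitrary bound (mm)
        problems.append(f"implausible SliceThickness {thickness}")
    return problems


class MetadataIndex:
    """Inverted index mapping (field, value) -> set of record IDs."""

    def __init__(self) -> None:
        self._index: dict[tuple, set[str]] = defaultdict(set)

    def add(self, record_id: str, metadata: dict) -> None:
        for key, value in metadata.items():
            self._index[(key, str(value))].add(record_id)

    def search(self, **criteria) -> set[str]:
        """Return IDs of records matching every field=value criterion."""
        sets = [self._index.get((k, str(v)), set()) for k, v in criteria.items()]
        return set.intersection(*sets) if sets else set()


def ingest(record_id: str, header: dict, index: MetadataIndex, salt: str) -> dict:
    """Run every point-of-entry step automatically, then index the result."""
    clean = deidentify(header, salt)
    clean["Class"] = classify(clean)
    clean["QC"] = "pass" if not quality_check(clean) else "fail"
    index.add(record_id, clean)
    return clean
```

Because indexing happens inside `ingest`, a query such as `index.search(Class="MR-T1", QC="pass")` can return matching records the moment they arrive, with no separate cataloging step.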

Cloud Scale Computing Enables AI and Complex Analysis

Problem: Medical image processing and AI place high demand on resources 

Large-scale data analysis in medical imaging often relies on chaining multiple complex algorithms into “pipelines”, i.e., data processing elements connected in series, where the output of one element is the input of the next. These pipelines are necessary for image segmentation, biomarker quantification, and synthetic data creation, and in most cases they are applied to hundreds or thousands of datasets. Inevitably, local IT infrastructures struggle to maintain the many algorithms and associated processing workflows, especially when developers want to make full use of many CPUs, GPUs, and TPUs.
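The series structure of a pipeline, where each element's output becomes the next element's input, maps directly onto function composition. The following is a toy sketch, not a real imaging workflow: a short list of intensities stands in for an image volume, and `normalize`, `segment`, and `quantify` are hypothetical placeholder steps with arbitrary parameters (a 0.5 threshold and a 0.5 mm³ voxel volume).

```python
from functools import reduce
from typing import Any, Callable

Step = Callable[[Any], Any]


def normalize(image: list[float]) -> list[float]:
    """Min-max scale intensities to [0, 1]."""
    lo, hi = min(image), max(image)
    span = (hi - lo) or 1.0  # avoid division by zero on a flat image
    return [(v - lo) / span for v in image]


def segment(image: list[float], threshold: float = 0.5) -> list[int]:
    """Binary mask: 1 where the normalized intensity >= threshold."""
    return [1 if v >= threshold else 0 for v in image]


def quantify(mask: list[int], voxel_volume_mm3: float = 0.5) -> float:
    """Stand-in biomarker: segmented volume in mm^3."""
    return sum(mask) * voxel_volume_mm3


def run_pipeline(steps: list[Step], data: Any) -> Any:
    """Apply steps in series; each step's output is the next step's input."""
    return reduce(lambda acc, step: step(acc), steps, data)


PIPELINE = [normalize, segment, quantify]

# The same pipeline is applied, unchanged, to every dataset.
datasets = {"subj-01": [0, 50, 100, 150, 200], "subj-02": [10, 10, 40, 90]}
results = {name: run_pipeline(PIPELINE, img) for name, img in datasets.items()}
```

Here `results` evaluates to `{"subj-01": 1.5, "subj-02": 0.5}`. Because each dataset is processed independently, the loop over datasets is the natural place to fan work out across many CPUs or GPUs, which is where local infrastructure tends to run out of headroom.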

Solution: A cloud-scale processing infrastructure integrated with a curated database

Flexible deployment of pipelines can greatly ease the strain of development. Containerizing these pipelines (or their components) reduces the IT burden of maintaining these algorithms over time and promotes reproducible practices. A processing infrastructure that uses local compute resources for low-volume processing, combined with elastic cloud scaling for large-scale processing, is a strategy that optimizes for both cost and capacity.
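The hybrid strategy above can be expressed as a simple placement rule: fill local capacity first, then burst the overflow to the cloud. This is a minimal greedy sketch under assumptions, not a production scheduler: each job is assumed to carry an estimated GPU-hour cost, local capacity is a single GPU-hour budget, and the container image reference (including the `registry.example.com` host) is hypothetical. Real schedulers would also weigh queue times, data location, and cloud pricing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    image: str             # container image reference (hypothetical registry)
    est_gpu_hours: float   # estimated compute cost of the job


def dispatch(jobs: list[Job], local_gpu_hours: float) -> dict[str, list[str]]:
    """Greedy placement in submission order: run a job locally if it still fits
    in the remaining local budget, otherwise burst it to elastic cloud capacity."""
    remaining = local_gpu_hours
    plan: dict[str, list[str]] = {"local": [], "cloud": []}
    for job in jobs:
        if job.est_gpu_hours <= remaining:
            plan["local"].append(job.name)
            remaining -= job.est_gpu_hours
        else:
            plan["cloud"].append(job.name)
    return plan


IMAGE = "registry.example.com/segmentation:1.2"  # hypothetical image tag
jobs = [Job("a", IMAGE, 2), Job("b", IMAGE, 3), Job("c", IMAGE, 5), Job("d", IMAGE, 1)]
plan = dispatch(jobs, local_gpu_hours=6)
```

With a 6 GPU-hour local budget, jobs `a`, `b`, and `d` run locally and job `c` bursts to the cloud. Because every job references the same container image, it runs the same code regardless of where it lands, which is what makes this split compatible with reproducibility.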

As life science companies look to machine learning to guide the future of their product development, they need machine learning workflows with comprehensive provenance to support reproducibility and regulatory approval. Ideally, organizations want to easily search for and locate cohorts of data, train AI models, and run data conversion and quality assurance locally, and then scale the models in the cloud. This workflow has benefited many life science companies working with medical imaging and other associated datasets.
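One way to make provenance concrete is to fingerprint exactly what went into a training run: the cohort, the parameters, and the code version. The sketch below is a hypothetical illustration, not a regulatory-grade provenance system. It assumes metadata records are dicts with an `id` field, that the code version is something like a version-control commit identifier, and that a SHA-256 hash over canonical JSON is an adequate fingerprint. `select_cohort` and `provenance_record` are hypothetical names.

```python
import hashlib
import json


def select_cohort(records: list[dict], **criteria) -> list[str]:
    """Return IDs of records whose metadata match every field=value criterion."""
    return [r["id"] for r in records
            if all(r.get(k) == v for k, v in criteria.items())]


def provenance_record(cohort_ids: list[str], params: dict, code_version: str) -> dict:
    """Capture what went into a training run. Cohort IDs are sorted and JSON keys
    are ordered, so identical inputs always yield the same fingerprint."""
    body = {
        "cohort": sorted(cohort_ids),
        "params": params,
        "code_version": code_version,  # e.g. a commit hash (assumption)
    }
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    body["fingerprint"] = hashlib.sha256(canonical.encode()).hexdigest()
    return body


records = [
    {"id": "s1", "Modality": "MR", "QC": "pass"},
    {"id": "s2", "Modality": "MR", "QC": "fail"},
    {"id": "s3", "Modality": "CT", "QC": "pass"},
]
cohort = select_cohort(records, Modality="MR", QC="pass")
prov = provenance_record(cohort, {"learning_rate": 0.001, "epochs": 20}, "abc123")
```

Storing the fingerprint alongside the trained model lets a reviewer later confirm that a model was trained on a specific cohort with specific settings: rerunning `provenance_record` on the same inputs must reproduce the same hash, and any change to the cohort, parameters, or code version produces a different one.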

Collaboration Across the Enterprise Drives Innovation

Problem: Not only are data and algorithms siloed, so are the people

Many large-scale life science companies employ a vast array of scientists and engineers across many geographies. In many cases, these professionals need to collaborate with internal and external partners to advance an R&D initiative. Inevitably, their ability to collaborate is closely tied to their ability to share large-scale data and complex processing pipelines.

Solution: Data and processing pipelines should be closely linked and in the cloud

Migrating data in the life science industry from one location to another has its share of complexities, ranging from large data transfer bottlenecks to regulatory compliance. Moreover, many of these companies have teams located around the world, which requires observing the regulatory requirements of each country or region. Federating databases across disparate regions, with computation resources tied closely to data locality, gives researchers a seamless resource through which data and algorithms can be accessed, eliminating the need to manage multiple databases. Through web interfaces and software development kits, users can securely access the platform, upload data to it, and process that data. Access privileges to data and algorithms can also be defined with secure controls that comply with regulatory constraints.
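The core idea of federation with data locality is to move the computation to the data rather than the data to the computation, after first checking that the requester is allowed to touch it. The following sketch illustrates that routing decision under simplifying assumptions: each site is labeled with a hypothetical region name and the set of dataset IDs it holds, and access control is a deny-by-default set of (user, dataset) grants. A real deployment would use role-based policies and region-specific compliance rules; `Site`, `AccessPolicy`, and `route_job` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    region: str                                   # e.g. "eu-west" (hypothetical label)
    datasets: set[str] = field(default_factory=set)


@dataclass
class AccessPolicy:
    # Explicit (user, dataset) grants; anything not listed is denied.
    grants: set[tuple[str, str]] = field(default_factory=set)

    def allows(self, user: str, dataset: str) -> bool:
        return (user, dataset) in self.grants


def route_job(user: str, dataset: str, sites: list[Site], policy: AccessPolicy) -> str:
    """Return the region where a job should run: the site that already holds
    the dataset, so the data does not leave its jurisdiction."""
    if not policy.allows(user, dataset):
        raise PermissionError(f"{user} may not access {dataset}")
    for site in sites:
        if dataset in site.datasets:
            return site.region
    raise LookupError(f"no site holds {dataset}")


sites = [Site("us-east", {"trial-A"}), Site("eu-west", {"trial-B"})]
policy = AccessPolicy({("alice", "trial-B")})
region = route_job("alice", "trial-B", sites, policy)  # job runs in "eu-west"
```

Checking permissions before locating the data means an unauthorized request fails without revealing where a dataset lives, and a researcher anywhere in the world can submit work against a single logical interface while each job executes next to the data it needs.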

The Way Forward in Life Sciences R&D

The modern life science company is moving toward a “data-driven” operational model. Medical imaging plays an important role in this new paradigm, as the power of diagnostic tools can greatly enhance R&D discovery and clinical trial outcomes. Additional data types such as digital pathology, microscopy, and genomics are becoming complementary components of multi-modal research, adding significant value for the diagnosis of complicated diseases while also adding complexity to the data management process. Integrating all of these data types as part of a digital transformation initiative requires an all-encompassing solution that can curate and organize large volumes of these data types (and related data), enable complex processing and AI pipelines, and provide the tools necessary to enhance collaboration across many teams and partners.

Author: Jim Olson, CEO, Flywheel Exchange, LLC.

To learn more about Flywheel’s enterprise-scale research data management platform and how it enables digital transformation in the life sciences, please email info@flywheel.io.

[1] https://www.ibm.com/cloud/blog/ibm-data-catalog-data-scientists-productivity