Breaking Down Research Data Silos for Accelerated Innovation

May 25, 2022

Accelerating pharma innovation depends on maximizing the value of biomedical data assets, including imaging data. Within research organizations, however, data is often underleveraged, siloed, and disorganized. This adds to R&D timelines and costs, hinders AI development, and prevents collaboration that can lead to breakthroughs. How can companies address these data challenges with maximum buy-in and minimal disruption?

Our panelists discuss:

The barriers to breaking down data silos in pharma
Cues life sciences organizations can take from the FAIR data principles
Tools for standardizing how data is captured, curated and shared
How standardizing curation can speed AI development
Examples of pharma successes in breaking down data silos

Presenters

Dan Marcus, PhD; Chief Scientific Officer, Flywheel
Costas Tsougarakis; Vice President Life Sciences Solutions, Flywheel
Oliver Keown, MD; Managing Director, Intuitive Ventures

Transcript:

Andrea:

Hello everyone. Thank you for attending today's webinar, "Breaking Down Research Data Silos for Accelerated Innovation," presented by Flywheel. I'm Andrea Anderson and I'll be moderating this webinar. Our speakers today are Dan Marcus, PhD, Chief Scientific Officer at Flywheel; Costas Tsougarakis, Vice President of Life Sciences Solutions at Flywheel; and Oliver Keown, MD, Managing Director of Intuitive Ventures. You can read their full bios on the left side of your window by selecting the speakers tab. Just a few technical notes before we begin: the webcast is being streamed through your computer, so there is no dial-in number. For the best audio quality, please make sure your volume is up. This webinar is being recorded and will be available on demand within 24 hours after the event. Time permitting, we will follow the presentations with a Q&A session. Please submit your questions using the questions and answers tab on the left side of your screen. All right, let's begin. Oliver, please go ahead.

Oliver Keown:

Thanks so much. It's great to be on here with everyone, and I am excited to help moderate a pretty exciting conversation here with Dan and Costas on the future of clinical research and AI development in pursuit of innovation in pharma and life sciences. A bit about me: I'm Managing Director of Intuitive Ventures, which is the corporate venture investing arm of Intuitive Surgical, a pioneer in the field of surgical robotics over the last 25 years. At Intuitive Ventures, we invest in early-stage companies that are pioneering the field and moving the digital therapeutic diagnostic opportunities for impact on patient outcomes forward. We're very pleased and excited that one of those investments is Flywheel, a company really at the forefront of enabling collaboration and clinical research. That's gonna be the thrust of the discussion today as we hear more from Dan and Costas. So without further ado, I'll hand it over to Dan and Costas to introduce themselves and then we'll jump into the conversation.

Dan Marcus:

Thank you, Oliver. I'm Dan Marcus, Chief Scientific Officer at Flywheel where I oversee our product development. Our products are focused on biomedical data management, really with a goal towards supporting collaborative research. Our customers range from large academic medical centers out to life science and medical device customers. I'm also a professor of Radiology at Washington University School of Medicine, where I am the Director of the Computational Imaging Research Center. One of the main developments out of our lab there at the University is an informatics system called XNAT, which is an open source tool for managing medical imaging and related data, and now very happily is part of the Flywheel family where we're building a number of our products around the platform. Also new to the Flywheel family is Costas who I'll ask to introduce himself.

Costas Tsougarakis:

Thank you, Dan. Good afternoon, all, my name is Costas Tsougarakis. I'm a VP of Life Sciences Solutions at Flywheel, and in my current role, I help our customers solve big data problems, such as the one that we are talking about today. And prior to joining Flywheel, I was at Genentech/Roche for the last seven years. Over the last three years, I established a global imaging platform, which was motivated in turn by the idea of breaking data silos and performing data FAIR-ification. Glad to be here.

Oliver Keown:

Yeah, thank you. Great breadth, obviously, with the Flywheel perspective today, from the academic world to the life science players here. And the topic for consideration, as we said, is about breaking down silos. It's about how to innovate across those traditional groupings of data, but also institutions. But maybe we can start, Costas, with a question to you: what do we mean when we're talking about data silos? What does that look like in the life science context? And what can some of the challenges be as we think about the opportunity of data?

Costas Tsougarakis:

Sure. To me, at the core, the data silo is a mindset. It's thinking about data as "my data" or "your data" and not "ours." And that manifests in data being stored locally, let's say in the department file share, and not centralized, or even worse, being archived on media and locked in a closet. And the data as a result stay within teams, are used internally for a specific purpose, and are not broadly accessible. And that is especially true when we're talking about global enterprises that are geographically dispersed. Part of the dangers and pitfalls: researchers typically will be focused on immediate needs and not thinking about the broader picture and how this data could be reused. That means that you're missing out on the untapped potential of reusing the data and not realizing its full value. Increased development costs, missed opportunities by not linking data modalities with each other, even duplication, redundancy, increased cost in infrastructure, and even compliance issues. If you have multiple copies of data, it's very hard to maintain audit trails about who's accessing and using the data.

Oliver Keown:

And hearing you discuss that, and obviously I come from a med device world and a surgical robotics company, these are issues that span, I think, many data-rich organizations, not just traditional pharma and life science or biotech, but certainly others in the med tech world. Dan, I'm curious, as you've seen both within Flywheel and Radiologics and the academic world, what are organizations doing to address some of these challenges in your experience?

Dan Marcus:

Yeah, I think there's a number of things happening. Perhaps the most common approach is to build a data lake or data lakehouse where raw data in particular can be aggregated in a way where it's independent of the source modality. And this is a great start. It's important and essential to have a place to source your raw data. There are limitations; in particular, without some modality-specific functionality, you start to lose the ability to extract rich features from the data. You lose the ability to have curation processes that are specific to those modalities to extract image quality and details about how the images or other types of data have been acquired. And so there is a need to have some link into your data lakes and data warehouses to help to extract those source-specific types of features. And that's been a big emphasis of what we've been doing within the XNAT system and within Flywheel: to be able to work closely with complex biomedical data, very close to the source raw data, and to extract features and to help curate that data and to build cohorts that link closely with related modalities.

Oliver Keown:

And Costas, from the life science perspective, what have you seen that's worked and maybe not worked so well regarding addressing some of these challenges?

Costas Tsougarakis:

Maybe I'll start with what has not worked: having a piecemeal approach and not thinking of the big picture. Doing this FAIR-ification, this breaking of silos, with a lot of manual preparation and a lot of curation that happens by hand. That can be a very time-consuming task. A lack of standards. When we talk about data FAIR-ification, you need to know what you are striving towards: the nomenclature, the classification of the data, and coming up with an ontology that serves all needs, or all organizations. Or even trying to do retrospective FAIR-ification, when you talk about legacy data that have been acquired through the years and trying to apply FAIR-ification and data best practices to old data. What is working, I think, is the opposite: having a more concerted effort that involves everyone from data owners to people in clinical operations, biostatisticians, and even informatics groups, working together towards a common goal. Developing automation and a scalable architecture to address these large data problems. And trying to get in on the action of breaking the data silos and doing the FAIR-ification as early as possible, and as close to the data collection as can be.

Oliver Keown:

And I heard you talk about FAIR-ification here, and increasingly the communities in academia and clinical research are talking about FAIR standards for how to use and manage data. Dan, could you tell us a bit more about that, in terms of the principles and where you're seeing this trend heading?

Dan Marcus:

Yeah, for sure. So FAIR, for those who aren't familiar with it, refers to Findable, Accessible, Interoperable, Reusable. So very key principles, tenets towards how to make data valuable, rich, something that can be collaborated around. And I'll say that FAIR is not like a state, it's like a thing you work towards, continually improving your processes to get closer and closer to it. On the academic side, we have maybe a lot of latitude towards how we think about FAIR because collaboration is just baked into how we work on the academic side, particularly across labs in different institutions. I'm involved in a number of consortia and collaborations where FAIR and data sharing are just kind of prebuilt into how we work. For example, the Human Connectome Project is a large imaging study that we've been doing for the past decade.

And from the very beginning, as an NIH-funded study, one of our mandates was to make the data as FAIR as possible, which meant that the data were collected in a way where we knew upfront that it was gonna be shared with the community. It allowed us to build our patient consents in ways that data sharing was part of the process. We were annotating the data from the very beginning in ways to make sure that it would be very reusable. We built our imaging protocols so that the protocols themselves were shared. And so anybody could deploy those same protocols on their scanners around the world to collect data in the same way that we were collecting the data. And then we shared it. As we were collecting it, after we did curation and standard processing pipelines, all that data was shared with the community in ways where it was very easily downloaded and could be brought into whatever the consumer's computing environment is.

We've worked very closely with the NIH on that, working with what's now called the NIMH Data Archive, which is where you go to get Human Connectome data. And importantly, making all these data available has spawned what's now been literally dozens of additional studies that have added to that connectome data set. Some of those have been very deliberate connectome projects that were secondary to the initial connectome study and literally followed the same subjects, but there have also been a lot of investigator-driven studies where they've created their own projects doing connectome-style work. And so at this point, there are literally petabytes of connectome data that are being shared around the world. Our own original data set has been downloaded, I think, about 25,000 times. And there are literally hundreds of studies that are using that data to do novel research. So that's been an example where, by doing FAIR from the outset and really thinking ahead towards that as a goal, it was really comprehensive across the full study. And I think it is a great example of how to do FAIR in an academic setting. Costas probably has some ideas about how those same kinds of approaches can be taken from the life science side.

Costas Tsougarakis:

Yeah, I think it's very similar, Dan, and I think the key is addressing these types of issues as early as possible and having business buy-in on data FAIR-ification, because, especially at the start, it requires some investment from the business.

Oliver Keown:

And, one of the historic challenges has been kind of applying some of these best practices to the range and increasingly complex data sets, and specifically complex object data sets that Flywheel and Radiologics and others are uniquely positioned to manipulate. As we think about the broad range of data sets leveraged today in real world evidence and clinical trials, from imaging to surgical video, something that we think about a lot at Intuitive, to digital pathology, are there unique challenges or unique kind of facets that life science organizations or other stakeholders should consider when it comes to FAIR? Is it still practical to pursue that type of framework?

Dan Marcus:

Yeah. In medical imaging, which is kind of my native habitat, we have the advantage that all of the data collection systems use a data format called DICOM, which is very rich in metadata. So we are able to have a lot of the details that are needed to support data discovery and understanding the provenance of the data. So it's all there. I think one of the challenges is that it's embedded inside of the actual DICOM files, which are these kind of complex binary files, and so a really important part of how to get medical imaging data into a FAIR state is to have standard data curation and metadata extraction processes. So you can make that metadata discoverable and searchable and interoperable across studies. That's a big part of what we've been doing in Flywheel: building these sorts of data extraction tools and curation tools.

You can get that kind of rich information out of the DICOM files and the related data and to link it with the associated clinical outcomes and other clinical metadata. And that's key, making all that kind of stuff searchable in a way where it's associated with the common ontology that's shared across different cohorts and different studies. So DICOM is a great start for that on the medical imaging side, but it really does take additional tooling to get that metadata in the state where it's findable.
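As a rough illustration of the extraction and curation step described here, the sketch below maps header values onto a shared vocabulary and builds a searchable index. It is not Flywheel's or XNAT's implementation. It assumes the header fields have already been read out of the binary DICOM files by some reader; the attribute names (Modality, BodyPartExamined, SeriesDescription) are standard DICOM keywords, while the file names, the example values, and the BODY_PART_ONTOLOGY mapping are hypothetical.

```python
from collections import defaultdict

# Hypothetical header fields, assumed to be already parsed from DICOM files.
HEADERS = [
    {"file": "a.dcm", "Modality": "MR", "BodyPartExamined": "BRAIN",
     "SeriesDescription": "t1_mprage_sag"},
    {"file": "b.dcm", "Modality": "MR", "BodyPartExamined": "HEAD",
     "SeriesDescription": "T2 FLAIR"},
    {"file": "c.dcm", "Modality": "CT", "BodyPartExamined": "CHEST",
     "SeriesDescription": "Lung 1.25mm"},
]

# Hypothetical mapping from site-specific values to shared ontology terms.
# Treating HEAD as "brain" is a simplification made only for this example.
BODY_PART_ONTOLOGY = {"BRAIN": "brain", "HEAD": "brain", "CHEST": "thorax"}


def curate(header):
    """Reduce one raw header to harmonized, searchable fields."""
    body_part = header.get("BodyPartExamined", "").upper()
    return {
        "file": header["file"],
        "modality": header["Modality"],
        "anatomy": BODY_PART_ONTOLOGY.get(body_part, "unknown"),
    }


def build_index(headers):
    """Group files by (modality, anatomy) so cohorts can be looked up directly."""
    index = defaultdict(list)
    for header in headers:
        record = curate(header)
        index[(record["modality"], record["anatomy"])].append(record["file"])
    return dict(index)


INDEX = build_index(HEADERS)
print(INDEX[("MR", "brain")])  # files labeled BRAIN and HEAD end up in one cohort
```

The point of the mapping table is that two sites labeling the same anatomy differently still land in the same cohort, which is the "common ontology shared across studies" idea in miniature.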

Oliver Keown:

And do you see much variation as you think about kind of broader unique data sets that might be applicable for certain segments of the industry or between industries?

Dan Marcus:

Yeah, I mean, I think it's a common pattern. In medical imaging, we do have this good starting point with DICOM. Some of the other modalities tend to be a little less standardized and that introduces some challenges, but it's still the same principles, right? Organize the raw data, extract the metadata that you can, spend the time curating it, and add the additional features and characterization to the data to make it most reusable. So the patterns are definitely there. Some fields are further along than others, but I think we have some good best practices that are fairly well established. And many of those are baked into how the Flywheel platforms operate.

Oliver Keown:

That's great. Well, just before we jump into some more questions here, I wanna remind folks that are dialing in that we do have a question feature and we'll save some time at the end to go through some of those. So we've got a few and please keep them coming. Kind of slightly shifting gear to the application, with all this data, we talked about the opportunity and how to curate and organize it, but we're seeing a lot of AI applied to some of these sectors and see opportunity for that. Maybe we could talk through how some of those organizations specifically, rather than just maybe kind of running studies and other things, how they're innovating around the data to drive AI. Are there unique requirements for optimizing their data towards that goal versus some of the more historic use cases of data management?

Dan Marcus:

Yeah, I mean, I think that a lot of the same principles that are part of the FAIR tenets apply to how to prepare data for AI, right? Like it's about standardized curation, labeling the data in standardized ways, enriching the data with associated metadata and, in the case of imaging, related clinical attributes. So it's a lot of the same stuff. I'll add that for AI, a big thing is that it's just very hungry for data. So you're looking for rich data sets, large data sets. But those data need to be curated and managed in a way where you can actually link them together. And that means understanding: do they have common image quality characteristics?

If you're looking at two different organizations, data coming from two different sources, the image characteristics may be very different. And that makes building an algorithm challenging if you don't really characterize those quality characteristics. And you may choose, like, "my algorithm's only gonna work within certain limited quality constraints," but until you characterize it, you don't know what those constraints are. And so you go to deploy it out in the real world and you see a new data set. How does it fit into the quality constraints that you built into the algorithm? You may choose to make those constraints as broad as possible, which means having really large data sets that span the whole quality range, or you may choose to keep it fairly discrete, maybe tied more closely, if you're running a clinical trial, to the kinds of data you see in a nice controlled trial. You may be dealing with real-world data where you need to take all comers, and then you need to really be working hard to make your algorithm as broad as possible.

But the starting point is understanding: what is the image quality? What are the acquisition constraints on how that source data was initially collected?

Oliver Keown:

And maybe, even a step further, what are you trying to achieve, right? What is the goal, and kind of working back from that. And Costas, you've been deep in that environment, looking at the application of data, the opportunity for AI, what have you seen in terms of kind of taking these unique assets and developing some type of application or ROI from it? Maybe you could talk to some of the use cases and then how you've seen this play out within the life sciences industry.

Costas Tsougarakis:

I think we have seen a lot of turmoil or turbulence in that space. A lot of layoffs within the digital transformation groups, and companies now are in their second or third wave of implementation and how they're handling AI development. And I think the main issue behind all that is that the initial approach has not been very focused. People get very excited about AI and the promise, and they try to solve everything under the sun, throwing everything at the wall and seeing what sticks. So I think to address that, you have to understand and articulate the ROI very early on. Start with the market research to understand the problem that you're trying to solve, starting with scientific questions and using that as your platform to drive your innovation and to drive your AI development.

Oliver Keown:

Makes sense. I'm curious, Dan, as you've kind of looked at it from the academic setting and into some of your industrial partners, from a Flywheel perspective, are there applications or opportunities you're seeing that are increasingly driving some of the utilization here of a Flywheel type platform or that collaborative platform?

Dan Marcus:

Yeah. One of the really interesting developments that's happening is around federated learning, which is the concept that, instead of having all of your data in a place where you have direct access to it and training on that singular data set, you build your training and validation processes to work with distributed data. So the data can stay in place, which helps from a security perspective and a patient privacy perspective, and allows you to collaborate with a lot of groups without having to move data and figure out how to centralize it. And it makes the whole training process quite efficient because you aren't moving lots of data around. The only thing that moves around is the model weights, which is pretty cool. But it introduces all kinds of challenges of its own.

So, the data essentially sits as a black box. You don't have the access to individual patients and understanding at the individual patient level, how that patient's contributing to your model. So if you see some sort of weird artifact, it can be hard to trace back to the data how a case may be impacting your model.

So, it's a really exciting area and there's all kinds of development happening around federated learning and federated analysis more generally. And what we're building at Flywheel, I think, is really well suited towards a federated learning approach where we do make the data well organized. We can run computational tools to extract these kinds of image quality and other metadata and features about the images, so that we can build federated data sets in ways that kind of break down that black box problem. Because we can, at least at the group level, characterize the data well.
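To make the "only the model weights move" idea concrete, here is a minimal sketch of federated averaging, one common federated learning scheme. It is an illustration only, not a description of what Flywheel builds. The two sites, their toy data (points on the line y = 3x), the one-parameter model, and the learning rate are all assumptions chosen to keep the example small.

```python
def local_update(w, data, lr=0.02, epochs=5):
    """Run a few gradient-descent steps for y ≈ w * x on one site's private data."""
    n = len(data)
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in data) / n
        w -= lr * grad
    return w


def federated_average(site_results):
    """Average site weights, weighting each site by its number of samples."""
    total = sum(n for _, n in site_results)
    return sum(w * n for w, n in site_results) / total


# Hypothetical private data sets; in a real deployment these never leave the site.
SITE_DATA = {
    "site_a": [(1.0, 3.0), (2.0, 6.0)],
    "site_b": [(3.0, 9.0), (4.0, 12.0)],
}


def train(rounds=10):
    """Coordinator loop: send the global weight out, get updated weights back."""
    global_w = 0.0
    for _ in range(rounds):
        # Each site trains locally and returns only (weight, sample count).
        results = [
            (local_update(global_w, data), len(data))
            for data in SITE_DATA.values()
        ]
        global_w = federated_average(results)
    return global_w


FINAL_W = train()
print(f"global weight after training: {FINAL_W:.4f}")
```

The coordinator never sees an individual (x, y) pair, which is also why the "black box" concern arises: a strange result can only be traced back to a site's update, not to a single case, unless the sites also share group-level characterizations of their data.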

Oliver Keown:

And how would you characterize the kind of collaboration that happens between organizations? Is that where federated learning's gonna have the most opportunity? I think there's a lot of opportunity even within institutions: when I see AMCs, or academic medical centers, and other hospitals, often the departments can be their own kind of silos, you know, kind of stakeholders. I'm curious, federation on top of federation: where is it today and where do you ultimately see that opportunity going for that trend?

Dan Marcus:

It's a really good point that these kinds of collaborations really happen at all levels, even within my own lab, where I have two grad students, one ready to leave the lab and a new one coming in. How are they collaborating to make sure there's continuity across their data sets and their algorithms and the work they're doing? And that goes all the way out to academic and industry partnerships, which have their own unique challenges. So certainly collaborations at all levels. Some of that can be done within a single data instance of Flywheel. Some of it's done across instances, and we're working very hard now to build what we call the Flywheel Exchange, which essentially serves as a fabric for connecting Flywheel instances and XNAT instances, to connect collaborators across different organizations. So that is a really exciting opportunity, I think, to really take collaboration up to increasing levels across organizations.

Oliver Keown:

And Costas, from your experience as you've seen, there hasn't traditionally been this opportunity for collaboration and data sharing. But with tools like Flywheel and new infrastructure to facilitate it, what are the challenges within institutions around some of these new trends and utilizing the opportunity?

Costas Tsougarakis:

I think challenges have been around, let's say, infrastructure: trying to build up the infrastructure needed, the personnel needed, the resources to deal with large data. I think there are some common issues we talked about earlier: people changing mindsets and being more collaborative with their own data, thinking about FAIR-ification of the data very early on and not at the end of their process, defining and harmonizing metadata, and building aggregates of data and data lakes for algorithms to train on.

Oliver Keown:

Makes sense. I think I see a lot of early stage companies, trying to facilitate, certainly more on the clinical side, that exchange of data as a value proposition in and of itself. And I sometimes struggle that stakeholders don't necessarily know that they want to do the sharing. They want it to be an output of a broader goal, right. Some type of endpoint, some type of clinical trial, where it's the broader vision than that single standpoint.

I'm curious as you think about change management and the need to take this data, and often it's coming from a point of maybe disorganization or not necessarily the level of organization required, to reap the opportunities that we're describing here. Costas, how does one go from zero to one within an institution to get the ship in order to be able to realize some of the opportunities we've been discussing today?

Costas Tsougarakis:

Yeah, I would say data FAIR-ification needs to be a mandate from the business to start with, because there are challenges to that. You have to have the appetite to bring different groups together to work on a common goal. When we're dealing with these types of issues, I think we mentioned earlier, data harmonization and curation are crucial. So that means investing in tools and solutions that can automate that process wherever possible and lift the weight off data scientists who just want to actually work on their algorithms. And yeah, those are some of the key initiatives.

Oliver Keown:

That makes sense. Dan, anything else you've seen as customers or collaborators go on a journey with their data strategies?

Dan Marcus:

I think to me the biggest change that I've been seeing is just that this is happening, right? Like the organizations are getting onboard. We're working with a number of the large academic medical centers where there's been some struggle over the past decade or so to get kind of consensus on how to approach data sharing and collaboration. And I feel like we're kind of past that. Like, it's almost a given now that data is such an asset and so valuable and so reusable. In my lab, when we collect some data, we have an idea in mind of what we're gonna do with it. But once we share it, now you have hundreds of thousands of people with ideas of how they might use it. And so it's just a given that proper stewardship of the sharing should be part of it.

But that also means having the governance in place to do it properly. So you have to have data use agreements and data tracking and provenance, and this is all stuff that is a challenge. There's IP issues, there's legal issues, there's HIPAA kinds of issues. So it's not a thing where you can just go start throwing data on an FTP site, if you want to do it right, with proper respect to your patients and the people who've collected the data.

So that means a lot of organizations are really starting to get their minds around, 'What are the systems we need in place? What are the processes we need to have in place? And how are we approaching this as an institution?'

On the academic side, one of the things that's really starting to get people moving faster is that the NIH is introducing a new data sharing mandate. So all NIH grants, which basically fund the academic medical enterprise, will be required to share their data and have a very clear data sharing plan as part of the granting mechanism starting in January next year. So that's definitely got people starting to move faster in really wanting to think through the governance issues.

Privacy issues are the other thing: as soon as you make this commitment, then you start thinking about risk. And most of that risk is really around respecting patient privacy. And patient privacy can be really tricky, particularly with complex data like images, because there can be patient-identifying information buried inside of those images, both in the metadata that's part of those data files and, literally, in patient names that can be burned into an image that's acquired in a clinic. And so tooling is needed both to discover where that information is and to remove it in a way that doesn't impact downstream use of the data.

So these are all things that are happening. And I think, with the new NIH data sharing mandate it's happening and accelerating and people are starting to really feel urgency.

Oliver Keown:

That privacy piece, or the kind of data piece, again, in the world that I'm from in surgical video, taking endoscopy feed, right? The endoscope starts outside of the clinical environment and it might be in the room. You might see the patient, you might see the staff. There are elements of privacy there, to your point, that need to be worked through at an individual file level, but also privacy more broadly, which is uniquely challenging as you think about the different jurisdictions within which international trials and international consortiums around some of these clinical areas may fall, with different rules around data sharing. Does that pose unique challenges to the opportunity for AI development and others, and do you see organizations (academic or, Costas, ones like your former employer) taking strategies for how to deal with some of those cross-boundary kinds of issues?

Costas Tsougarakis:

Well, I would say the number one thing for data sharing and dealing with these types of issues is data anonymization. Strip all direct or indirect identifiers that may be contained within, let's say, image data so they can be completely separated from individuals. You are right. There are definitely considerations to be made in terms of data locality. So whatever solution you come up with has to respect local governance. So, yeah, we have seen this issue before, and it comes down to making sure that there are data repositories across geographic regions that actually comply with local laws.
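A minimal sketch of the metadata side of that stripping step follows, under stated assumptions. It drops a small set of direct identifiers and replaces the patient ID with a keyed-hash pseudonym so records from the same patient can still be linked. The attribute names are standard DICOM keywords, but the identifier list is deliberately incomplete, the example header and key are hypothetical, and this does not address indirect identifiers or names burned into the pixel data, which need separate tooling. Strictly speaking, a keyed pseudonym is pseudonymization rather than full anonymization.

```python
import hashlib
import hmac

# Illustrative, incomplete list of direct identifiers to drop outright.
DIRECT_IDENTIFIERS = {
    "PatientName",
    "PatientBirthDate",
    "PatientAddress",
    "ReferringPhysicianName",
}


def deidentify(header, secret_key):
    """Return a copy of the header without direct identifiers.

    PatientID is replaced by a keyed hash so the same patient maps to the
    same pseudonym, but the original ID cannot be read back without the key.
    """
    clean = {k: v for k, v in header.items() if k not in DIRECT_IDENTIFIERS}
    if "PatientID" in clean:
        digest = hmac.new(
            secret_key, clean["PatientID"].encode("utf-8"), hashlib.sha256
        ).hexdigest()
        clean["PatientID"] = "anon-" + digest[:12]
    return clean


# Hypothetical example header and key; a real key would be stored securely.
EXAMPLE = {
    "PatientName": "DOE^JANE",
    "PatientID": "12345",
    "PatientBirthDate": "19700101",
    "Modality": "MR",
    "SeriesDescription": "T1 MPRAGE",
}
KEY = b"replace-with-a-secret-kept-at-the-source-site"

CLEAN = deidentify(EXAMPLE, KEY)
print(sorted(CLEAN))
```

Keeping the key at the source site fits the data-locality point above: the pseudonyms can travel while the means of re-linking them to individuals stays under local governance.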

Oliver Keown:

Makes sense. Well, Costas, maybe back to you. You've gone through the journey, being a leader within an organization. I'm sure you've seen it, maybe within your own department or among your peers in the industry. Can you describe what you've seen in terms of organizations that have done this journey and what the impact has been, beginning to end, maybe within your organization or others?

Costas Tsougarakis:

Yeah. I'll just talk a little bit about that journey. I think the main hurdle to overcome was actually more political: trying to get multiple stakeholders across a global organization to come together and to have a common vision, a common goal, a common platform and implementation to drive toward. Once that is established, I think the next step is bringing a huge amount of legacy data into this platform, going through the whole FAIR-ification transformation. But at the end of the day, you create a huge data lake of discoverable data that are all harmonized, that have driven quite a bit of AI innovation. They have driven exploratory analysis, accelerating drug development. So the potential is huge when your data has been FAIR-ified and you break down those silos.

Oliver Keown:

That's great. Dan, any kind of best-in-class examples you've seen in your experience, beginning to end? And what kind of outcomes have been achieved through those efforts, in the end research or the impact on patients or otherwise?

Dan Marcus:

I'll use The Cancer Imaging Archive, actually, as a really successful example of data sharing and one that we've been engaged in. So it's an NIH, NCI-funded data repository that has worked very closely with dozens of NCI-funded studies to help them, from initiation all the way through data collection and sharing, to make their data as shareable as possible. And then to really have a flexible but pretty progressive approach towards data use terms, where many of the data sets you can just go download. For many of the data sets, there are no blocks, no red tape, nothing that keeps groups from just getting started with Cancer Imaging Archive data. Other data sets have come with more governance and data use restrictions.

And so I think that's an important principle, and a risk-based principle, of like, if you can share it completely openly, sure, please do. If you have restrictions due to how the data were collected, or a particular vulnerability of a patient population, you may not be able to share it as openly, but there's probably some level of sharing that you can do. And that may be only in a federated mode where the data's gonna stay in place and your algorithms go to it and you get aggregated results back.

But there really is that spectrum, and anywhere your data fits in that spectrum, it still has value. And The Cancer Imaging Archive has done a really good job of appreciating that spectrum. And as a result there are hundreds of studies that have been published from TCIA data sets.

We've worked with one of their glioma datasets a lot in my lab. And because that dataset has served almost as a reference for a lot of AI development, you can really compare across different models that groups have built from that data set. And so it serves as a really nice kind of reference for comparing models, and that just kind of moves the field along faster. And then we're often enriching it with our own data, right? Like, so you start with that data set and then you can fine-tune in your institution. We've shared models. I work very closely with a group at MD Anderson. We've shared models back and forth where we've both started from the TCIA glioma datasets, enriched them with data from our own institutions, and then continued fine-tuning as we share our algorithms back and forth. And that's, to me, the secret to getting towards generalizable AI models: it really takes that diversity of data and a willingness to continue to develop and improve your models.
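Editor's note: the federated mode Dan describes, where the data stays in place, the algorithm goes to it, and only aggregated results come back, can be illustrated with a minimal Python sketch. Everything here is hypothetical and for illustration only: the `Site` class, the `local_summary` function, and the sample values are assumptions, not part of TCIA, Flywheel, or any real federated framework.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Site:
    """A hypothetical data holder; `records` are never handed to the caller."""
    name: str
    records: List[float]  # made-up per-subject measurements

    def run(self, algorithm: Callable[[List[float]], Dict[str, float]]) -> Dict[str, float]:
        # The algorithm travels to the data; only its summary is returned.
        return algorithm(self.records)


def local_summary(values: List[float]) -> Dict[str, float]:
    """Runs inside a site and returns aggregates only, never raw rows."""
    return {"n": float(len(values)), "total": float(sum(values))}


def federated_mean(sites: List[Site]) -> float:
    """Combine per-site summaries into a pooled mean at the coordinator."""
    summaries = [site.run(local_summary) for site in sites]
    n = sum(s["n"] for s in summaries)
    total = sum(s["total"] for s in summaries)
    if n == 0:
        raise ValueError("no records at any site")
    return total / n


# Two hypothetical institutions with made-up values.
sites = [
    Site("site_a", [2.0, 4.0]),
    Site("site_b", [6.0, 8.0, 10.0]),
]
pooled = federated_mean(sites)
```

The coordinator only ever sees counts and sums, which is the simplest form of the "aggregated results" idea; a real deployment would add governance, access control, and privacy safeguards that this sketch leaves out.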

Oliver Keown:

Yeah. Well, nice to wrap up the discussion before jumping into Q&A on that kind of inspirational front— this is all the pursuit, right, of getting the bricks and mortar and the infrastructure in place, to unlock that value and the collaboration and what it can achieve.

We've had some great questions here in the chat. So we might flip over and pose some of these to both of you. One of them: are there platforms out there, kind of infrastructure backbones, that are more amenable to life science data analysis? I think this is the kind of question around cloud distributors and networks like AWS and others. What's the Flywheel perspective, or your own kind of unique perspectives, on who are good partners for this type of work more broadly?

Costas Tsougarakis:

I would say Flywheel is cloud agnostic and works with multiple cloud providers. Being cloud enabled is important because it allows you the scalability of the infrastructure, the solution, and distribution of the data across areas. So you're not tied down to a specific, let's say, data center, the way it's typically been done in the past, because that infrastructure is not flexible enough or doesn't move fast enough to respond to the data and consumption needs of AI models.

Oliver Keown:

Dan, any experience on your side?

Dan Marcus:

Yeah, I'll add to what Costas said just by saying that it's not so much about the platform; at the cloud level there's kind of comparability at the core level across most clouds. It's really about how to utilize them in ways where you're using common APIs that are easily codeable against, so that you're not locked into a particular vendor. And to be thinking about data organization and curation in standard ways that can then be implemented across platforms. So doing things like, for example, containerized apps, right? Like, make it very shareable: as you build an algorithm, containerize it, so that it can be deployed on any cloud, it can be deployed on-prem. It can be deployed in ways where it's now reproducible, right? Like if I run that container in my lab and share it with another lab, I can be assured largely that they'll get the same results.

And that portability and reproducibility are just key to getting all this stuff done. And it's largely baked into how we've thought about building the Flywheel platform. Our name for containerized apps is Gears. We have a Gear library with literally hundreds of different applications and algorithms that are available on the platform, and our customers can easily add their own Gears. One of the things we've been doing across the Flywheel and XNAT platforms is to make that Gear specification shareable across systems. So if you're on an open source XNAT instance and then want to share an algorithm with a partner in industry who's on Flywheel, it's very easy to share those apps and to have a very low bar towards collaboration.

Oliver Keown:

Another question that we've had, and something I spend a lot of time thinking about, again, on the clinical side: within a large organization, where they may have a universal set of challenges here, or they may have different groups within an organization that have their own unique take on the same challenge, do you typically find best practice is having a group that kind of mandates a strategy for the broader organization to adopt certain technologies and best practices? Or, again, maybe this is from the institutional perspective, do you see smaller groups within that develop their own bespoke strategies to solve their version of the problem? Maybe Costas, you could speak to your strategy within your former organization, and then Dan, how you've seen it in terms of best practice and impact in other organizations?

Costas Tsougarakis:

Right. I can speak from experience for an overall strategy for the entire organization. You do need a single source, a leadership mandate that cascades down to various groups. For solving some specific issues around data FAIRification, though, we did then delegate or separate out specific questions to experts in their fields. Like defining, let's say, this harmonized nomenclature: we reached out to specific experts and specialists in every therapeutic area or data modality to define their own standards. And so it wasn't forced down on them. And that's the best way of driving adoption.

Dan Marcus:

Costas is exactly right. I think that you need that top-down encouragement and principled approach: that we are gonna collaborate, we are gonna make data available across the organization and between organizations where possible. So a top-down mandate there. And then also putting some meat behind that and providing resources to actually get this kind of work done. But then letting specific groups kind of work out the details of their domain. Because every domain is different, has its own different risks, has its own processes. So letting them kind of figure it out, but in that principled environment.

Oliver Keown:

What about organizations that maybe don't have that leadership or don't have the kind of runway to getting the leadership in place? Is it still possible to find internal champions and build to a bigger strategy over time? Maybe just thinking of the range of folks that might be on this call and different strategies they could invoke here.

Dan Marcus:

Yeah, I mean, I think so. Certainly you can help compel leadership to get on board if they see ROI from specific example projects, see that value, see what kinds of innovation it fosters. Getting some small wins, I think can make a huge difference. So yeah, I would certainly encourage small groups to work together and figure out how to solve it at their level and make sure that that has visibility to leadership in the organization.

Oliver Keown:

Makes sense. We've got another question here: in improving accuracy and speed, as well as providing real-time diagnostic analysis, are there any specific strategies or approaches you might recommend for companies in the research data space looking to platforms such as Flywheel to solve some of those very specific accuracy and speed problems from a real-time diagnostic perspective? Dan, is that something you see from users of a platform today on the research side?

Dan Marcus:

Yeah, I mean, to me, the big chasm is going from a model that works really well in a research context, and you're really excited and happy about it and you've published and it all looks cool, to actually putting that into a diagnostic workflow. Because often that work is done by a graduate student on his desktop computer, and reproducing that in a way where it can be validated to the level of what an FDA type of algorithm might be expected to have, following all the GxP sorts of compliance requirements, like, graduate students aren't thinking about that stuff when they're just trying to do cool stuff to get through school. But then they do in fact create really cool stuff that could have diagnostic utility. So that gap can be quite broad. And to me that's the biggest challenge. And that's why where we are in kind of the AI/ML space is all kinds of cool innovation happening, but not a lot of it making its way into actual clinical practice and having patient impact.

It's the thing we're thinking a lot about at Flywheel: having the ability to do ML development on the platform so that you do have that provenance and reproducibility around models, even for your graduate students who are used to just kind of hacking stuff up in a Python notebook or something like that, so that you have enough of the provenance and history and understanding of the data and how those models were built that you can move it along the chain. Having that kind of compliance infrastructure in place so that it can be deployed diagnostically. That last mile, getting it actually into the clinic, has not been a focus of Flywheel. There are a lot of platforms out there that are doing that last mile of taking in models and putting them into app-store sorts of environments. And I think we go up to that last mile and provide a lot of the core infrastructure that's necessary so that you have the confidence that once you throw that model over into the diagnostic side, it's actually gonna work and be valuable to clinicians.

Oliver Keown:

Makes sense. Another question here, maybe for you, Costas: for an institution with huge historical legacy clinical study data, I imagine potentially of different sorts and different kinds of use cases, where does one start if they're developing or implementing a strategy to get to that harmonized data lake vision that we outlined here?

Costas Tsougarakis:

I know I've dealt with this also quite a bit. I think just biting the bullet and doing so expecting that you can't boil the ocean, right? So you can't solve every data problem or every data anomaly that you may encounter; focus on creating analysis-ready data sets. So focusing on key things: building automation that will help you get all the legacy data into your common platform, and building QC around your data, so you can scale up as you tackle historical data. But kind of be accepting that you may not be able to solve for all of it, and focus on what is important.
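Editor's note: as a rough illustration of the automated QC Costas mentions, the Python sketch below triages legacy records into analysis-ready and flagged sets without trying to repair every anomaly. The field names in `REQUIRED_FIELDS`, the record layout, and the sample records are all hypothetical assumptions, not a description of any actual pipeline.

```python
from typing import Dict, List, Tuple

# Hypothetical minimum metadata for a record to count as analysis-ready.
REQUIRED_FIELDS = ("subject_id", "modality", "acquisition_date")

Record = Dict[str, str]


def qc_record(record: Record) -> List[str]:
    """Return the QC problems found in one record (an empty list means pass)."""
    return [f"missing {field}" for field in REQUIRED_FIELDS if not record.get(field)]


def triage(records: List[Record]) -> Tuple[List[Record], List[Tuple[Record, List[str]]]]:
    """Split records into analysis-ready ones and ones flagged for later review."""
    ready: List[Record] = []
    flagged: List[Tuple[Record, List[str]]] = []
    for record in records:
        problems = qc_record(record)
        if problems:
            flagged.append((record, problems))
        else:
            ready.append(record)
    return ready, flagged


# Made-up legacy records; the second one lacks an acquisition date.
legacy = [
    {"subject_id": "S001", "modality": "MR", "acquisition_date": "2015-03-02"},
    {"subject_id": "S002", "modality": "CT", "acquisition_date": ""},
]
ready, flagged = triage(legacy)
```

Flagged records are set aside with their reasons rather than blocking the ingest, which matches the advice to focus on what is important and accept that not everything can be solved at once.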

Oliver Keown:

Right. Start somewhere, right? There's a question here that might be our last question. Maybe a question for you, Dan, specifically: Flywheel has worked with a lot of academic institutions. The question is, do you have enterprise customers in big pharma? I think you've outlined that that is certainly the case, but could you maybe describe the business model, the sets of value propositions, and maybe how that differs between the different types of stakeholders that Flywheel serves?

Dan Marcus:

Yeah, so we certainly work very closely with the academic medical centers. For both Flywheel and XNAT, our DNA and origin is in academic environments, but the problems that we were trying to solve in the academic context are the same ones that are within life science organizations, as far as data curation, data management, building rich cohorts, doing automation around all of your workflows. It's the same problems, probably just bigger and requiring more of the compliance around doing it. So very much Flywheel is engaging with life science and med device environments generally. We have a number of customers there; Costas certainly had an experience with Flywheel in the Genentech/Roche environment.

And to me, the really exciting part of this story is the collaboration across AMCs and industry. With the Flywheel Exchange I was referring to before, I think this is gonna provide this really interesting set of capabilities to work closely across different academic organizations, but also between academics and industry. Costas, do you want to add anything to that, just from having been more on the life science side?

Costas Tsougarakis:

No, I think you covered that all. I think I just saw a question here suggesting that Flywheel is more geared towards academia and not life sciences. Just to correct that notion: Flywheel has large-scale enterprise customers such as Roche, and finding a platform that can scale up and handle the volume of data coming out of a company such as Roche, I think, is key.

Oliver Keown:

Absolutely. Well, I think that's a nice point here to finish on. This discussion has ranged across the barriers here and silos of data and some of the opportunities of getting it right. I think we heard the opportunity to really have leadership align in this space and have a vision for what good data governance and data strategy could look like. And ultimately, it being about getting that infrastructure in place, something that can scale and support the various use cases and applications that different teams are looking to achieve.

I think that idea of collaboration, Dan, and ultimately federated learning, is something that's certainly, as I see on the investing side, a huge opportunity going forward broadly for the ecosystem. And I think Flywheel has some fantastic solutions in that space that you guys are leveraging. So I think with that, we might wrap up the webinar here, but I wanna say a huge thanks to Dan and to Costas for the rich discussion, and to the folks here and the organizers of the WebEx for the opportunity to have this discussion. So I'll maybe hand back over to Andrea here, but thanks, Dan, thanks, Costas.

Dan Marcus:

Thank you, Oliver.