Artificial Intelligence Meets Disease Research: How A New Partnership is Streamlining NYSCF’s Science


Artificial intelligence (AI) is helping scientists take biological research to a whole new level. A finely-tuned algorithm can help identify features of cells imperceptible to the human eye, analyze large datasets in unprecedented detail, and even provide insights into how to improve experimental processes.

Over the past several years, data science and AI have played an increasingly pivotal role at the NYSCF Research Institute, and their applications continue to expand across NYSCF’s many areas of research. Now, thanks to a partnership with Two Sigma’s Data Clinic, NYSCF researchers and Two Sigma data scientists are collaborating to further improve the workflow of the NYSCF Global Stem Cell Array®, our automated system for creating stem cells. This work will enhance the quality of the cells the Array generates and help scientists get back a bit of their most valuable asset: their time.

Why does biological research need artificial intelligence?

A lot of biological research, including NYSCF’s, generates enormous datasets. AI can help researchers dive into this data in ways that used to be impossible, illuminating new biological insights and enabling more powerful experiments.

“The data sizes that we’re making are just too big to be analyzed by hand,” remarked Daniel Paull, PhD, NYSCF’s Senior Vice President of Discovery and Platform Development. “They’re so complex that even traditional approaches of data analysis just don’t cut it. AI allows insights that just can’t be generated otherwise. And it can impact a lot: everything from improving the process development in making our disease models to drawing conclusions about a given disease.”

What is Two Sigma’s Data Clinic?

Data Clinic brings people, data science skills, and technological know-how to help organizations that serve the public good use data and technology more effectively.

“Data Clinic, founded in 2014, is the pro bono data science and tech-for-good arm of Two Sigma,” explained Rachael Weiss Riley, PhD, Director of Data Clinic. “We partner with social impact organizations including nonprofits, academia, and government agencies to provide data and engineering support on a project basis. Our Data Clinic personnel and Two Sigma volunteers work closely alongside nonprofits for about 4-6 months to deliver some sort of bespoke solution: so that could be a model, a set of research insights, or a small engineering build. We also contribute to the broader ‘data for good’ movement through our own research and the development of open source tools that aim to enable more people and more organizations to use data as a resource for impact.”

“We understand that nonprofit organizations often have limited resources and employees have to make various priorities in what they spend their time on,” noted Erin Stein, Head of Operations at Data Clinic. “Our goal is to provide data and tech support that allows organizations to spend more time on meaningful activities, decisions, and innovations that are going to help them further their mission and have greater impact on the work that they’re doing.”

How did the collaboration start?

According to Alfred Spector, Two Sigma’s CTO when the project began, “Contributing to the NYSCF Global Stem Cell Array was a perfect project for Two Sigma’s Data Clinic as it combined opportunities in both data science and programming, and it held promise that great scientists and great technology could speed our understanding of disease and ultimately its treatment.” Mr. Spector also recently began working with the NYSCF Research Institute as a Senior Scientific Advisor on the technical and strategic aspects of NYSCF’s work in drug discovery and machine learning.

“We are thrilled to be working with Two Sigma on this exciting project,” added NYSCF CEO Susan L. Solomon. “NYSCF has always pioneered technology and data-driven research, and partnering with Data Clinic is helping us accelerate solutions for the major diseases of our time.”

“We have already learned so much from this collaboration,” said Rick Monsma, PhD, Senior Vice President of Scientific Operations at NYSCF. “The Data Clinic team is doing outstanding work with our researchers to optimize our studies, and this work is a true testament to the power of data science for improving research, and in turn, outcomes for patients.”

How is Data Clinic helping NYSCF?

Right now, NYSCF researchers have to examine plates of cells by hand to determine whether the cells are ready to be advanced on the Array, and this process can be laborious and tricky: advancing cells at an inopportune time can affect their viability, rendering them less suitable (or unusable) for research. With Data Clinic’s help, this workflow could be aided by a mathematical model that analyzes image-based data and recommends how to proceed, cutting down on our scientists’ decision-making time.

NYSCF scientists working on the Array

“The focus here is to optimize the workflow that we’ve had in place since 2015,” explained Dr. Paull. “It’s a good target because we can have very large numbers of samples in culture at any given time. For example, in 2020 we created over 8,000 unique samples, and in July alone, we had created close to 1,000 different plates. Anything that can speed up the process of deciding what needs to happen next in a workflow allows us to get back our own time and the robot’s time.”

The model could also help further the standardization of the cells produced, increasing their quality for research.

“If I am working, I am going to make one set of decisions. Then the next weekend someone else will be here and they’ll make another set of decisions,” said Dr. Paull. “If we can have a tool that guides decision-making in a very standardized way, that will reduce the variation and perhaps increase our efficiencies as well as the quality of all of the cell lines that we make.”

“The end goal is also that all of these plates that have cells growing in them are viable for research,” added Kaushik Mohan, a data scientist at Data Clinic.

Importantly, the data scientists do not aim to replace a scientist’s role in this process, but rather to give scientists more information to optimize it.

“We don’t see our work as geared toward getting rid of the human, but rather augmenting the human in the loop,” said Dr. Weiss Riley. “We are thinking through ways to support human operations and decision-making, but ultimately it’s the experienced technician coming in and deciding whether or not they should indeed transfer those plates — at least at this early stage. AI and machine learning and deep learning and all these things are great, but they need to exist in concert with the folks on the ground.”

What’s the latest for the collaboration?

“Right now, we’ve developed a couple of different models that would predict the optimal time to passage cells [advance them on the Array],” said Mohan. “We’re now giving the relevant software code to the team at NYSCF to implement on their end. And we’re hoping to do a pilot study to figure out which model works best so that the team can use it to make these decisions about advancing cells.”

Then, our scientists will start testing the model so the collaborative team can work out any kinks.

“Once we hand off this model then we’re in validation mode: trying to figure out if it’s working the way we expect,” added Dr. Weiss Riley. “This is a really important step in our collaboration. We need to make sure that there’s capacity on both ends to implement it and integrate it into the organization. We look forward to that stage because it’s fun for us. We get to see if what we built is actually doing what we hope it’s doing. We sure hope so, but if it’s not, then we can iterate, and we’ll keep tweaking and fixing things.”

What is the most challenging part of bringing AI into biology?

“None of us on the Data Clinic team come from a biological background,” noted Mohan. “The first time we spoke to the team at NYSCF, our heads were spinning a bit because we don’t know what all of the terms meant. So I think the context and the framing of the problem is a big challenge. Once we can wrap our heads around the process that is currently in place and understand what the data actually represents, then the data science-y part of the problem, which is prediction or anything else, becomes much simpler.”

On the researchers’ side, the collaboration is teaching them a lot about what kinds of data can be useful for AI applications and how to bring a data-driven lens to more of their experiments and processes.

“We’re learning that there’s so much data that gets thrown away yet could ultimately be useful, either now or even a couple of years down the line,” said Dr. Paull. “It has made me, and the wider team, think more about the types of data we collect, and has been very educational. We made decisions early on to capture as much data as possible without really knowing what all of it would be ultimately used for – it’s projects like this that help validate those decisions but also make us question what other data we need to start capturing.”

Why are multidisciplinary collaborations like this important for science?

The teams stressed that these kinds of partnerships bring together different types of thinking to tackle a shared goal.

“What we love about the Data Clinic is that bringing these diverse perspectives together really allows people to consider problems and potential solutions in a whole new light,” noted Stein. “As you saw, we’re learning about all this for the very first time. And because of that, we were able to think about it with a really fresh perspective. And then on the flip side, our partners are able to hear about the kinds of stuff that we can do and maybe think outside the box in terms of solutions that maybe they haven’t attempted yet or had time to try yet. It’s really a ripe environment for innovation.”

“Access to expertise and resources like this as a smallish non-profit, especially on the data science side, is just phenomenal,” added Dr. Monsma. “We are incredibly grateful for all of the amazing pro bono help they’ve provided for us. It’s been really exciting so far, and I’m looking forward to what we will accomplish together.”