Dead End Discovery Reinforcement Learning A.I. in Healthcare Policy
Developed by Microsoft, Adobe, MIT, and Vector Institute
While Microsoft powers innovation behind AI at Scale, it is also applying A.I. to healthcare. DeD, or Dead-end Discovery, uses reinforcement learning to identify high-risk states and treatments in healthcare.
In this research project, Microsoft built a machine learning (ML) model that works with scenarios where data is limited, such as healthcare. This model was developed to recognize treatment protocols that could contribute to negative outcomes and to alert clinicians when a patient’s health could decline to a dangerous level.
Microsoft Becoming a Juggernaut of A.I. Research
I like this project because it takes a real-world crisis like the pandemic and uses it to improve how A.I. could be applied to it and to similar situations. Microsoft is showing a commitment to A.I. projects where every organization in the world could benefit from the power of these models, which is why its AI at Scale initiative is making these large models, along with the systems and infrastructure needed to train and use them, available as a platform.
Aether, Microsoft’s cross-company initiative on AI Ethics and Effects in Engineering and Research, is also hard at work on improving AI trustworthiness as part of the company’s commitment to advancing the practice of human-centered responsible AI.
Microsoft is also partnering more closely with Nvidia on large-scale generative language models. According to Microsoft, thanks to self-supervised learning and few-shot, zero-shot, and fine-tuning techniques, language models are growing significantly in size, calling for high-performance hardware, software, and algorithms to train them.
Microsoft and Nvidia recently claimed to have established state-of-the-art (SOTA) accuracy in natural language processing (NLP) by adapting to downstream tasks via few-shot, zero-shot, and fine-tuning techniques.
While these large-scale pretrained language models have made significant breakthroughs in language understanding, they still struggle with the commonsense knowledge we gather in our daily lives. Microsoft’s KEAR made a breakthrough here, surpassing human parity on the CommonsenseQA benchmark in December 2021.
The potential impact and implications of Microsoft’s research in medicine, health and genomics are above average.
If you enjoy A.I. articles at the intersection of breaking news, then help me continue to write on the subject.
What is DeD
Off-policy Reinforcement Learning (RL) separates the behavior policy, which generates experience, from the target policy, which seeks optimality. It also allows for learning several target policies with distinct aims using the same data stream or prior experience.
In the medical field, RL has been used to determine the best treatment plans based on the outcomes of previous treatments.
Given a patient’s condition, these policies amount to recommending which therapies to deliver.
RL estimates of optimal policies are usually unreliable in healthcare, and most clinical environments prohibit the investigation of different treatment courses due to legal and ethical concerns.
Read the blog post here: https://www.microsoft.com/en-us/research/blog/using-reinforcement-learning-to-identify-high-risk-states-and-treatments-in-healthcare/
Read the full paper here: https://www.microsoft.com/en-us/research/publication/medical-dead-ends-and-learning-to-identify-high-risk-states-and-treatments/
Summary of Best Policy with RL
Researchers from Microsoft, Adobe, MIT, and Vector Institute have developed Dead-end Discovery (DeD), a new Reinforcement Learning (RL) based technology that identifies therapies to avoid rather than which treatment to choose.
The lead researcher in this study is Mehdi Fatemi, a Senior Researcher at Microsoft Research. Other contributors include MIT Assistant Professor Marzyeh Ghassemi and PhD student Taylor W. Killian. You can read more of Mehdi’s work on Google Scholar.
Can AI be Implicated in Future Health Policy at Scale in Emergency Situations with Limited Data?
This paradigm shift eliminates the difficulties that can occur when policies are constrained to stay close to potentially suboptimal recorded behavior.
Studies show that current techniques fail to produce a trustworthy policy when the data contains too little exploratory behavior. That is why the researchers instead use the data to narrow the scope of the policy, through retrospective analysis of observed outcomes. This approach has been shown to be more tractable when data is scarce.
DeD uses two complementary Markov Decision Processes (MDPs) with a specialized reward design to identify dead-ends, which gives the underlying value functions a distinct meaning. These value functions are estimated independently using Deep Q-Networks (DQN) to infer the probability of a negative outcome and the reachability of a positive outcome. Overall, DeD learns directly from offline data and establishes a formal link between the concept of value functions and the dead-end problem.
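To make the two-MDP idea a bit more concrete, here is a minimal sketch on a toy problem. It is my own illustration, not the researchers' code: I swap the Deep Q-Networks for a plain lookup table, I assume a reward of -1 on death for the first value function and +1 on recovery for the second (zero everywhere else, no discounting), and the states, actions and transitions are all made up.

```python
from collections import defaultdict

# Hypothetical offline dataset of (state, action, next_state) transitions.
# The toy world is deterministic, so a full backup per transition is enough.
DATA = [
    ("stable", "a", "recovered"),
    ("stable", "b", "critical"),
    ("critical", "a", "stable"),
    ("critical", "b", "dead_end"),
    ("dead_end", "a", "died"),
    ("dead_end", "b", "died"),
]
ACTIONS = ("a", "b")
TERMINAL = {"recovered", "died"}


def reward_d(next_state):
    # Assumed reward for the "negative outcome" MDP: -1 only on death.
    return -1.0 if next_state == "died" else 0.0


def reward_r(next_state):
    # Assumed reward for the "positive outcome" MDP: +1 only on recovery.
    return 1.0 if next_state == "recovered" else 0.0


def fit_q(data, reward_fn, sweeps=50):
    """Tabular Q-iteration over a fixed dataset (no new data is collected)."""
    q = defaultdict(float)
    for _ in range(sweeps):
        for s, a, s2 in data:
            future = 0.0 if s2 in TERMINAL else max(q[(s2, b)] for b in ACTIONS)
            q[(s, a)] = reward_fn(s2) + future
    return dict(q)


q_d = fit_q(DATA, reward_d)  # values lie in [-1, 0]
q_r = fit_q(DATA, reward_r)  # values lie in [0, 1]

for s, a, _ in DATA:
    print(f"{s:>9} / {a}: Q_D = {q_d[(s, a)]:+.1f}, Q_R = {q_r[(s, a)]:+.1f}")
```

In this toy setup, every action in the dead-end state ends up with Q_D = -1 and Q_R = 0, and the action that leads into the dead end inherits the same values, which is the kind of signal the two value functions are meant to carry.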
The system could help physicians select the least risky treatments in urgent situations, such as treating sepsis. However, I can think of much more dangerous scenarios where this could be useful.
To help clinicians avoid remedies that may potentially contribute to a patient’s death, researchers at MIT and elsewhere have developed a machine-learning model that could be used to identify treatments that pose a higher risk than other options.
This DeD model was developed to recognize treatment protocols that could contribute to negative outcomes and to alert clinicians when a patient’s health could decline to a dangerous level.
“One core idea here is to decrease the probability of selecting each treatment in proportion to its chance of forcing the patient to enter a medical dead-end — a property that is called treatment security. This is a hard problem to solve as the data do not directly give us such an insight. Our theoretical results allowed us to recast this core idea as a reinforcement learning problem,” Fatemi says.
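One way to read the idea in this quote is as a cap on how much probability a policy may put on each treatment. The sketch below is only my interpretation: it assumes a dead-end value Q_D between -1 and 0 for each treatment (closer to -1 meaning more likely to lead to a dead end) and caps the treatment's probability at 1 + Q_D. The treatment names and numbers are hypothetical.

```python
def security_caps(q_d_values):
    """Assumed cap per treatment: 1 + Q_D, clipped to the range [0, 1]."""
    return {a: min(1.0, max(0.0, 1.0 + q)) for a, q in q_d_values.items()}


def is_secure(policy, q_d_values, tol=1e-9):
    """True if no treatment gets more probability than its cap allows."""
    caps = security_caps(q_d_values)
    return all(policy.get(a, 0.0) <= caps[a] + tol for a in caps)


# Hypothetical dead-end values for two treatments in one patient state.
q_d_state = {"treatment_x": 0.0, "treatment_y": -0.75}

cautious = {"treatment_x": 0.8, "treatment_y": 0.2}
risky = {"treatment_x": 0.5, "treatment_y": 0.5}

print(security_caps(q_d_state))
print(is_secure(cautious, q_d_state), is_secure(risky, q_d_state))
```

Here the risky policy fails the check because it gives treatment_y a probability of 0.5 while its cap is 0.25.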
To develop their approach, called Dead-end Discovery (DeD), they created two copies of a neural network.
The first neural network focuses only on negative outcomes — when a patient died — and the second network only focuses on positive outcomes — when a patient survived.
Using two neural networks separately enabled the researchers to detect a risky treatment in one and then confirm it using the other. They fed each neural network patient health statistics and a proposed treatment.
The networks output an estimated value of that treatment and also evaluate the probability the patient will enter a medical dead end. The researchers compared those estimates to set thresholds to see if the situation raises any flags.
A yellow flag means that a patient is entering an area of concern while a red flag identifies a situation where it is very likely the patient will not recover.
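The flagging step can be sketched in a few lines. This is an illustration under my own assumptions: `q_d` is the negative-outcome network's estimate (between -1 and 0), `q_r` is the positive-outcome network's estimate (between 0 and 1), a flag is raised only when both estimates cross their thresholds, and the threshold numbers are placeholders I picked rather than values confirmed by the researchers.

```python
def raise_flag(q_d, q_r, yellow=(-0.5, 0.5), red=(-0.75, 0.25)):
    """Return "red", "yellow" or "none" for one state/treatment pair.

    Each threshold pair is (max Q_D, max Q_R); both are hypothetical.
    """
    def both_agree(thresholds):
        d_max, r_max = thresholds
        # One estimate detects the risk, the other has to confirm it.
        return q_d <= d_max and q_r <= r_max

    if both_agree(red):
        return "red"
    if both_agree(yellow):
        return "yellow"
    return "none"


print(raise_flag(-0.9, 0.1))  # both estimates look very bad
print(raise_flag(-0.6, 0.4))  # area of concern
print(raise_flag(-0.9, 0.9))  # estimates disagree, so no flag
```

Requiring both estimates to agree is a deliberately conservative choice in this sketch: a single pessimistic estimate on its own does not raise a flag.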
AI Will Improve Treatment Because it Will Recognize Flags that Indicate Risk of Mortality Earlier
You can read the details of this sepsis example here.
Flagging medical remedies that could raise mortality or reduce good outcomes could be one way treatments are improved in the future, with A.I. significantly improving patient outcomes and speeding up interventions at critical windows of patient care.
Honestly, this is Google-level AI for healthcare. I’m not sure even Microsoft realizes how important this is.
By using the limited data from a hospital ICU to train a reinforcement learning model to identify treatments to avoid, with the goal of keeping a patient from entering a medical dead end, DeD significantly contributes to the field of A.I.’s impact on future treatment policy and the speed of interventions to avoid poor outcomes.
Typical Reinforcement Learning
Reinforcement learning is widely used in gaming, for example, to determine the best sequence of chess moves and maximize an AI system’s chances of winning. Over time, through trial-and-error experimentation, the desired actions are reinforced and the undesired ones are discouraged until the optimal solution is identified.
DeD is basically reverse-psychology RL: instead of learning which actions to take, it learns which ones to avoid. It is essentially an A.I. model that describes how algorithms could help reduce deaths in hospital and clinical settings.
You can explore the details of this research project in the research paper, “Medical Dead-ends and Learning to Identify High-risk States and Treatments,” which was presented at the 2021 Conference on Neural Information Processing Systems (NeurIPS 2021).
DeD illustrates how effective A.I. could be to impact treatment protocols and policy decisions to reduce mortality. This is because healthcare is a sequential decision-making domain, and reinforcement learning is the formal paradigm for modeling and solving problems in such domains.
Offline Reinforcement Learning (ORL)
In healthcare, clinicians base their treatment decisions on an overall understanding of a patient’s health; they observe how the patient responds to this treatment, and the process repeats.
However, unlike in gaming, exploratory data collection and experimentation are not possible in healthcare, and our only option in this realm is to work with previously collected datasets, providing very limited opportunities to explore alternative choices. This is where offline reinforcement learning comes into focus.
A subarea of reinforcement learning, offline reinforcement learning works only with data that already exists—instead of proactively taking in new data, we’re using a fixed dataset. Even so, to propose the best course of action, an offline reinforcement learning algorithm still requires sufficient trial-and-error with alternatives, and this necessitates a very large dataset, something not feasible in safety-critical domains with limited data, like healthcare.
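A quick way to see why a fixed dataset is so limiting is to count which state and treatment combinations were ever recorded. The sketch below is a toy illustration with made-up states, treatments and logged transitions; any pair that never appears in the log is one the algorithm has no direct evidence about.

```python
from collections import Counter


def coverage(dataset, states, actions):
    """Count logged (state, action) pairs and list the ones never observed."""
    counts = Counter((s, a) for s, a, *_ in dataset)
    missing = [(s, a) for s in states for a in actions if counts[(s, a)] == 0]
    return counts, missing


# Hypothetical logged transitions: (state, treatment, next_state).
logged = [
    ("stable", "fluids", "stable"),
    ("stable", "fluids", "critical"),
    ("critical", "vasopressor", "stable"),
]

counts, missing = coverage(
    logged,
    states=["stable", "critical"],
    actions=["fluids", "vasopressor"],
)
print("observed:", dict(counts))
print("never tried:", missing)
```

In this tiny log, half of the possible state and treatment pairs were never tried, and unlike in a game, nobody can go back and try them on real patients.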
I hope this gave you a worthwhile introduction to the kind of research work BigTech companies like Microsoft are doing to impact the future of healthcare. I’m covering Microsoft Research as best I can in addition to the breaking news around A.I.
A.I. Could be a Star Pupil in the ICU
So finally, let’s review what DeD is and why it could save lives in the future across a number of conditions, scenarios or even unexpected collective situations where limited data is involved.
The team developed their methodology called Dead-end Discovery (DeD), which identifies treatments to avoid in order to prevent a medical dead-end—the point at which the patient is most likely to die regardless of future treatment.
DeD provably requires exponentially less data than the standard methods, making it significantly more reliable in limited-data situations.
By identifying known high-risk treatments, DeD could assist clinicians in making trustworthy decisions in highly stressful situations, where minutes count.
Moreover, this methodology could also raise an early warning flag and alert clinicians when a patient’s condition reveals outstanding risk, often before it becomes obvious.
Off-policy RL really could be used in multiple ways in the future. Nobody wants to be in a medical dead-end, and if A.I. can help us improve patient outcomes in the ICU, in treatment policy and in rapid interventions (for instance, where nurse-to-patient ratios are low), clearly A.I. is going to help in the future of healthcare.
Final Note on DeD and Other Applications
The researchers involved view DeD as a powerful tool that could magnify human expertise in healthcare by supporting clinicians with predictive models as they make critical decisions.
There is significant potential for researchers to use the DeD method to expand on this research and look at other measures, such as the relationship between patient demographics and sepsis treatment, with the goal of preventing certain treatment profiles for particular subgroups of patients.
The principles of offline reinforcement learning and the DeD methodology can also be applied to other clinical conditions, as well as to safety-critical areas beyond healthcare that also rely on sequential decision-making. For example, the domain of finance entails similar core concepts as it is analogously based on sequential decision-making processes.
I believe this model could also be used to lower costs in healthcare and improve healthcare accessibility. You can learn more about the research and access the code here. More immediately, it could be applied to stocks. The researchers even speculate that DeD could be used to alert financial professionals when specific actions, such as buying or selling certain assets, are likely to result in unavoidable future loss, or a financial dead-end.
This brings to mind how A.I. could be used to model dead-ends for human extinction, say, for instance, with the impacts of climate change. DeD is an elegant methodology with countless potential applications, though it is likely years if not decades away from impacting how ICUs actually operate.
You can view the research video on DeD here.
If you enjoy my articles here you might enjoy a new Newsletter I’m starting around Quantum computing, innovation and genomics called Quantum Foundry.
I cannot continue to write without tips, patronage and community support from you, my readers and audience. I want to keep my articles free for the majority of my readers.
So by subscribing you are essentially helping fund a network of Newsletters whose aim is to inspire and inform.
AiSupremacy is the fastest Substack Newsletter in AI at the intersection of breaking news. It’s ranked #1 in Machine Learning as of January 22nd, 2022.
Thanks for reading!