Insilico Medicine, a clinical-stage AI drug-discovery company, announced a groundbreaking medical milestone: It successfully predicted Phase II to Phase III clinical trial outcomes using its proprietary generative transformer-based AI tool, inClinico.
The clinical stage accounts for approximately 90% of drug-development failures, which are attributed to issues such as lack of efficacy, safety concerns and the intricacies of diseases and data. These failures lead to trillions of dollars lost and years of effort wasted. In response to this high failure rate, Insilico developed the generative AI software platform inClinico to forecast the outcomes of Phase II clinical trials.
The platform incorporates various engines that harness the power of gen AI and multimodal data, encompassing text, omics, clinical trial design and small molecule properties. Its training data includes more than 55,600 unique Phase II clinical trials from the past seven years.
The subsequent clinical trial probability model, developed by Insilico researchers, demonstrated an impressive 79% accuracy when validated against real-world trials in the prospective validation set where measurable outcomes were available.
AI revolutionizing drug development
The research, published in the Clinical Pharmacology and Therapeutics journal, showcases the potential of AI to revolutionize drug development and investment decision-making.
The company said that AI engines used in this study have been integrated into the inClinico system, designed to predict clinical trial outcomes. This integration is a key component of the Medicine42 clinical trials analysis and planning platform.
“AI offers an enormous advantage when it comes to processing and analyzing complex data and recognizing patterns,” Alex Zhavoronkov, founder and CEO of Insilico Medicine, told TechForgePulse. “Using machine learning and AI, we built models based on various data points related to successfully launched and failed drugs. We then combined these models into our prediction engine inClinico. For every evaluated Phase II trial, inClinico generates a probability of success for proceeding to Phase III.”
Zhavoronkov said the validation studies were conducted internally and in collaboration with pharmaceutical companies and financial institutions, demonstrating the robustness of the inClinico platform. On a quasi-prospective validation dataset, the platform achieved a ROC AUC score of 0.88, a measure of its capability to discriminate between success and failure in clinical trial transitions.
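For readers less familiar with the metric, ROC AUC can be read as the probability that a randomly chosen successful trial receives a higher predicted score than a randomly chosen failed one. The sketch below illustrates that definition and contrasts it with plain accuracy at a fixed cutoff. The labels, the scores and the 0.5 threshold are made up for illustration; they are not data from the study, and this is not Insilico's evaluation code.

```python
def roc_auc(labels, scores):
    """Pairwise (Mann-Whitney) estimate of ROC AUC.

    labels: 1 for a trial that advanced to Phase III, 0 for one that did not.
    scores: predicted probability of success for each trial.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one success and one failure")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # success ranked above failure
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Share of trials classified correctly at a fixed cutoff (assumed 0.5)."""
    correct = sum((s >= threshold) == (y == 1) for y, s in zip(labels, scores))
    return correct / len(labels)


# Hypothetical example data, NOT taken from the inClinico study.
labels = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.91, 0.35, 0.78, 0.42, 0.55, 0.12, 0.66, 0.40]

auc = roc_auc(labels, scores)
acc = accuracy(labels, scores)
print(f"ROC AUC: {auc:.4f}, accuracy at 0.5: {acc:.2f}")
```

On this toy data the two metrics differ, which is why a ROC AUC figure and an accuracy figure for the same model are not interchangeable: accuracy depends on the chosen cutoff, while ROC AUC depends only on how the trials are ranked.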
The company claims that the platform’s accurate predictions were tested with a date-stamped virtual trading portfolio, resulting in a 35% return on investment (ROI) over nine months, making it a valuable tool for investors seeking critical technical due diligence insights.
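To put the nine-month figure in context, the arithmetic behind a return on investment, and a rough annualized equivalent, can be sketched as follows. The starting and ending portfolio values are hypothetical placeholders chosen to reproduce a 35% return; the annualization assumes the same rate compounds over a full year, which is an illustrative assumption and not a claim made by the company.

```python
def roi(start_value, end_value):
    """Simple return on investment over the holding period."""
    return (end_value - start_value) / start_value


def annualized(period_return, months):
    """Annualized equivalent, ASSUMING the period return compounds
    at the same rate for a full 12 months (illustrative only)."""
    return (1.0 + period_return) ** (12.0 / months) - 1.0


# Hypothetical portfolio values, chosen only to match the reported 35%.
start_value = 100.0
end_value = 135.0

nine_month_roi = roi(start_value, end_value)
yearly_equivalent = annualized(nine_month_roi, months=9)
print(f"Nine-month ROI: {nine_month_roi:.0%}")
print(f"Annualized equivalent: {yearly_equivalent:.1%}")
```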
Leveraging generative AI for drug development and discovery
Insilico’s Zhavoronkov said that his research group created the starting dataset of Phase II clinical trial data from 55,653 trials pulled from clinicaltrials.gov and various other public sources, including pharma press releases and publications.
This data had to be properly labeled, annotated and linked together, a task performed by biomedical experts, a discriminative transformer and a generative large language model.
A transformer system then mapped these trials to drugs and diseases using a natural language processing (NLP) pipeline based on the state-of-the-art Drug and Disease Interpretation Learning with Biomedical Entity Representation Transformer (DILBERT), which was published at the ECIR 2021 conference.
Zhavoronkov stated that the pharma industry traditionally relied on fundamental academic research and serendipity to generate new ideas and hypotheses. However, the high failure rate indicates that the complexity of diseases and biological mechanisms makes it exceedingly challenging to identify successful targets for treating diseases, especially novel targets.
Revealing insights, potential treatments
Zhavoronkov asserts that incorporating AI into analyzing large, diverse datasets can reveal insights about disease mechanisms and potential treatments that may not be evident to humans. PandaOmics is part of inClinico and assimilates vast amounts of data from clinical trials, drugs and disease information to predict the likelihood of success or failure during the Phase II to Phase III transition.
PandaOmics utilizes various data types such as omics data, grants, clinical trials, compounds and publications to analyze and produce a ranked list of potential targets specific to a disease of interest.
“PandaOmics is a knowledge graph for target identification through which our generative AI platform can find connections between clinical trial success or failure, disease conditions and drug attributes that might elude human scientists,” Zhavoronkov told TechForgePulse. “Using this data, we built our model for predicting the Phase II clinical trial probability of success, defined as the transition of drug-condition pair from Phase II to Phase III.”
Enhanced predictive capabilities
Insilico Medicine has been training inClinico on clinical trials, drugs and diseases since 2014, said Zhavoronkov, who emphasized that by combining multimodal LLMs and other gen AI technologies, the company has significantly enhanced its predictive capabilities.
As a result, inClinico now serves as a tool to guide companies in directing their research funds and expertise toward programs with the highest likelihood of success while enabling them to capture and utilize valuable information from programs that have faced setbacks.
“The ability of inClinico to predict the successful Phase II to Phase III transition drugs, even without prior information related to the clinical relevance of the drug’s action of disease, validates the generative AI models and their ability to build on existing data to predict outcomes for diseases where fewer data is available,” Zhavoronkov explained. “The more data it has, and the more successful outcomes, the better AI becomes at accurate prediction.”
What’s next for Insilico?
Zhavoronkov said he was strongly encouraged by the findings, while acknowledging that they rest on a limited dataset. He firmly believes that the system’s sophistication and precision will continue to improve over time, driven by a surge in data and reinforcement, including insights from Insilico’s internal pipeline programs, three of which (for idiopathic pulmonary fibrosis, cancer and COVID-19) have successfully advanced to clinical trials.
Insilico projects that the inClinico tool can assess approximately 20 to 25% of trials with meaningful accuracy. The company aspires to expand its capabilities further, leveraging new laboratory robotics advancements to predict success rates for combination therapies and facilitate the selection of the most effective combinations for targeted therapies.
“We integrate cutting-edge technological breakthroughs into our platform, incorporating AI-powered robotics, AlphaFold and quantum computing,” Zhavoronkov explained. “My grand goal is to see this tool deployed extensively because broader usage will drive further improvement. We employ an approach called Reinforcement Learning from Expert Feedback (RLEF), where the tool’s accuracy improves with the insights we receive from analysts using it for predictions. Currently, we can only predict small molecule first-in-class single-agent targeted therapeutics.”