Generative AI in drug discovery

Following a breakout year of rapid growth, generative AI has been widely, and justifiably, described as a game-changer for almost every industry. A recent McKinsey Global Survey lists the healthcare, pharma, and medical products sectors among the top regular users of generative AI. The report also highlights that organizations that have successfully maximized the value derived from their traditional AI capabilities tend to be more ardent adopters of generative AI tools.
The AI revolution in the life sciences industry continues at an accelerated pace, reflected partly in the increasing number of partnerships, mergers, and acquisitions centered around the transformative potential of AI. For the life sciences industry, therefore, generative AI represents the logical next step beyond conventional predictive AI methods, opening new horizons in computational drug discovery.
Here, then, is a quick overview of generative AI and its potential and challenges in the context of in silico drug discovery and development.
What is generative AI?
Where traditional AI systems make predictions based on large volumes of data, generative AI refers to a class of AI models that are capable of generating entirely new output based on a variety of inputs including text, images, audio, video, 3D models, and more.
Based solely on input and output modality, generative AI models can be categorized as text-to-text (ChatGPT-4, Bard), text-to-speech (Vertex AI), text-to-video (Emu Video), text-to-audio (Voicebox), and text-to-image (Adobe Firefly); image-to-text (Pix2Struct), image-to-image (SinCode AI), and image-to-video (LeiaPix); video-to-video (Runway AI); and more.
Currently, the most prominent types of generative AI models include Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), Recurrent Neural Networks (RNNs), diffusion models, flow-based models, autoregressive models, transformer-based models, and style transfer models.
What is the role of generative AI in drug discovery?
It is estimated that generative AI technologies could yield as much as $110 billion a year in economic value for the life sciences industry. These technologies can play a transformative role across the drug discovery pipeline.
Generative AI can boost the precision, productivity, and efficiency of target identification and help accelerate the drug discovery process. These technologies will provide drug discovery teams with the capabilities to generate or design novel molecules with the desired properties and curate a set of drug candidates with the highest probability of success. This in turn would free up valuable R&D resources to focus on orphan, rare, and untreatable diseases.
These technologies will enable life sciences R&D to cope with the explosion in digital data, in diverse formats such as unstructured text, images, patient records, PDFs, and emails, and ingest and process multimodal data at scale. The ability to extract patterns from vast volumes of patient data can empower more personalized treatments and improved patient outcomes.
AI systems played an instrumental role in accelerating the development of an effective mRNA vaccine for COVID-19, where they were put in place to speed up the research process. Generative AI technologies are now being leveraged to address some of the challenges associated with designing RNA therapeutics and to design mRNA medicines with optimal safety and performance.
As with traditional AI systems, generative AI will help complement experimental drug discovery processes to further enhance the speed and accuracy of drug discovery and development while reducing the time and costs involved.
How do different generative models compare for molecule design?
Generative models like VAEs (Variational Autoencoders) and GANs (Generative Adversarial Networks) are increasingly applied to de novo drug design. VAEs are particularly effective for exploring latent chemical space, offering structured representations that capture chemical relationships. GANs, on the other hand, excel at generating structurally novel molecules, often producing higher diversity in candidate structures.
Combining both models in a generative pipeline helps balance molecular novelty with drug-like properties.
Model comparison
Model | Strengths | Weaknesses | Use Case |
---|---|---|---|
VAE | Explores latent space; captures structure–property relationships | Lower novelty | Scaffold hopping |
GAN | High novelty; structurally diverse outputs | Training instability | De novo design |
Combined Use | Balance between control and diversity | May increase complexity | Balanced candidate profiles |
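One common way to explore a VAE's latent space is to walk in a straight line between the latent codes of two known molecules and decode each intermediate point. The sketch below shows only the interpolation step, under stated assumptions: the latent vectors are made-up three-dimensional values, and the encoder and decoder that would produce and consume them are not shown.

```python
from typing import List


def interpolate_latent(z_start: List[float], z_end: List[float], steps: int) -> List[List[float]]:
    """Return `steps` evenly spaced points on the line from z_start to z_end, inclusive."""
    if len(z_start) != len(z_end):
        raise ValueError("latent vectors must have the same dimension")
    if steps < 2:
        raise ValueError("steps must be at least 2")
    path = []
    for i in range(steps):
        t = i / (steps - 1)  # runs from 0.0 to 1.0
        path.append([(1 - t) * a + t * b for a, b in zip(z_start, z_end)])
    return path


# Hypothetical latent codes; a trained encoder would normally produce these.
z_a = [0.0, 1.0, -2.0]
z_b = [1.0, 3.0, 2.0]
path = interpolate_latent(z_a, z_b, steps=5)
```

Each point in `path` would then be passed to the trained decoder, and the decoded molecules checked for validity, since not every latent point decodes to a chemically sensible structure.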
How is generative AI used for compound screening in drug discovery?
Pharma and biotech companies are increasingly turning to generative AI for in silico screening of novel compounds. These models are trained on datasets of molecular representations (e.g., SMILES strings, molecular graphs, or 3D conformers), and their outputs are validated using drug-likeness metrics such as QED scores, docking simulations, and ADMET predictions.
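To make the QED metric concrete: it combines per-property desirability scores, each between 0 and 1, into a single number using a geometric mean. The sketch below shows only that aggregation step. The desirability values are invented for illustration; the full metric derives them from fitted functions over molecular descriptors and also has a weighted variant, neither of which is reproduced here.

```python
import math
from typing import Sequence


def geometric_mean_desirability(desirabilities: Sequence[float]) -> float:
    """Combine per-property desirability scores in (0, 1] into one score via a geometric mean."""
    if not desirabilities:
        raise ValueError("at least one desirability score is required")
    if any(d <= 0.0 or d > 1.0 for d in desirabilities):
        raise ValueError("desirability scores must lie in (0, 1]")
    log_sum = sum(math.log(d) for d in desirabilities)
    return math.exp(log_sum / len(desirabilities))


# Hypothetical desirabilities for one candidate (e.g. weight, lipophilicity, H-bond donors).
score = geometric_mean_desirability([0.9, 0.8, 0.7])
```

Because the mean is geometric, a single very poor property pulls the overall score down sharply, which is why the metric rewards balanced profiles.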
To build a generative AI model for molecules, most researchers:
- Use a curated SMILES-based dataset
- Train a VAE or GAN on molecular representations
- Validate outputs using metrics such as QED, synthesizability, and binding affinity predictions
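Before a VAE or GAN can be trained on SMILES, each string is usually split into tokens and mapped to integer ids. The following is a minimal sketch of that preprocessing step, assuming a simplified tokenizer: it treats bracket atoms, `Cl`, `Br`, and two-digit ring closures as single tokens and does not check chemical validity. The four-molecule dataset is illustrative only.

```python
import re
from typing import Dict, List

# Bracket atoms, two-letter halogens, two-digit ring closures, then any single character.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize(smiles: str) -> List[str]:
    """Split a SMILES string into tokens without validating its chemistry."""
    return SMILES_TOKEN.findall(smiles)


def build_vocab(dataset: List[str]) -> Dict[str, int]:
    """Assign an integer id to every distinct token seen in the dataset."""
    tokens = sorted({tok for smiles in dataset for tok in tokenize(smiles)})
    return {tok: idx for idx, tok in enumerate(tokens)}


# Illustrative dataset: ethanol, aspirin, bromochloromethane, methylammonium.
dataset = ["CCO", "CC(=O)Oc1ccccc1C(=O)O", "ClCBr", "C[NH3+]"]
vocab = build_vocab(dataset)
id_to_token = {idx: tok for tok, idx in vocab.items()}

encoded = [vocab[tok] for tok in tokenize("CCO")]
decoded = "".join(id_to_token[i] for i in encoded)
```

In practice a cheminformatics toolkit would be used to canonicalize and validate the SMILES first; that step is left out here to keep the example dependency-free.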
These workflows can be combined with retrieval-augmented generation (RAG) pipelines to further refine candidate selection using up-to-date biomedical literature.
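The retrieval half of a RAG pipeline can be illustrated with a toy example. This sketch ranks passages against a query using bag-of-words cosine similarity and then assembles a prompt from the top hits. Everything here is an assumption for illustration: the passages are invented placeholders rather than real literature, the `retrieve` and `build_prompt` helpers are hypothetical, and production systems typically use learned embeddings and a vector index instead.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count its alphanumeric words."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, documents: List[str], top_k: int = 2) -> List[Tuple[float, str]]:
    """Return the top_k documents most similar to the query, best first."""
    q = bag_of_words(query)
    scored = [(cosine(q, bag_of_words(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


def build_prompt(query: str, passages: List[str]) -> str:
    """Prepend retrieved passages to the question for a downstream generative model."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {query}"


# Invented placeholder passages, not real publications.
docs = [
    "placeholder abstract about kinase inhibitor selectivity",
    "placeholder abstract about antibody stability",
    "placeholder note on kinase inhibitor toxicity in liver cells",
]
query = "kinase inhibitor toxicity"
top = retrieve(query, docs, top_k=2)
prompt = build_prompt(query, [doc for _, doc in top])
```

The generative model itself is deliberately left out; the point is that candidate refinement is grounded in whatever the retrieval step surfaces.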
What are the key generative AI applications in drug discovery?
Overall, generative AI offers a transformative approach to drug discovery, significantly accelerating the identification and optimization of promising drug candidates while reducing costs and experimental uncertainty.
Molecule generation
Generative AI models represent a more efficient approach to navigating the vast chemical space and creating novel molecular structures with desired properties. Currently, a range of techniques, such as VAEs, GANs, RNNs, genetic algorithms, and reinforcement learning, are being used to generate molecules with desirable ADMET properties. One approach synergistically combines generative AI, predictive modeling, and reinforcement learning to generate valid molecules with desired properties. With their ability to simultaneously optimize multiple properties of a molecule, generative AI systems can help identify candidates with the most balanced profile in terms of efficacy, safety, and other pharmacological parameters.
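One simple way to pick out candidates with a balanced profile is Pareto filtering: keep every molecule that no other molecule beats on all objectives at once. The sketch below assumes higher is better for every score; the molecule names and score values are hypothetical and would normally come from predictive models.

```python
from typing import Dict, List

Scores = Dict[str, float]


def dominates(a: Scores, b: Scores, objectives: List[str]) -> bool:
    """True if a is at least as good as b on every objective and strictly better on one."""
    at_least_as_good = all(a[k] >= b[k] for k in objectives)
    strictly_better = any(a[k] > b[k] for k in objectives)
    return at_least_as_good and strictly_better


def pareto_front(candidates: Dict[str, Scores], objectives: List[str]) -> List[str]:
    """Return the names of candidates that no other candidate dominates."""
    return [
        name
        for name, scores in candidates.items()
        if not any(dominates(other, scores, objectives) for other in candidates.values())
    ]


# Hypothetical predicted scores, all scaled so that higher is better.
candidates = {
    "mol_1": {"potency": 0.9, "safety": 0.4, "qed": 0.6},
    "mol_2": {"potency": 0.7, "safety": 0.8, "qed": 0.7},
    "mol_3": {"potency": 0.6, "safety": 0.7, "qed": 0.5},
    "mol_4": {"potency": 0.9, "safety": 0.4, "qed": 0.5},
}
front = pareto_front(candidates, ["potency", "safety", "qed"])
```

Here `mol_3` is dominated by `mol_2` and `mol_4` by `mol_1`, so only `mol_1` and `mol_2` remain as trade-off candidates. Reinforcement learning approaches often collapse such objectives into a single weighted reward instead; Pareto filtering is shown because it avoids choosing weights up front.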
Antibody design & development
The continuing evolution of artificial intelligence (AI), machine learning (ML), and deep learning (DL) techniques has helped significantly advance computational antibody discovery as a complement to traditional lab-based processes. The advent of protein language models (PLMs), generative AI models trained on protein sequences, has the potential to unlock further innovations in in silico antibody design and development. Generative antibody design can significantly enhance the speed, quality, and efficiency of antibody design, help create more targeted and potent treatment modalities, and generate novel target-specific antibodies beyond the scope of conventional design techniques. Recent developments in this field have demonstrated the ability of zero-shot generative AI, that is, models that design antibodies against a target without having been trained on antibodies known to bind that target, to generate novel antibody designs that were tested and functionally validated in the wet lab without the need for any further optimization.
De novo drug design
The power of generative AI models is also being harnessed to create entirely new drug candidates by predicting molecular structures that interact favorably with biological targets. The increasing popularity of generative techniques has created a new approach to generative chemistry that has been successfully applied across atom-based, fragment-based, and reaction-based approaches for generating novel structures. Generative models have helped extend the capabilities of rule-based de novo molecule generation with recent research highlighting the potential of “rule-free” generative deep learning for de novo molecular design. The continuing evolution of generative AI towards multimodality will help further advance de novo design using complementary insights derived from diverse data modalities.
Drug repurposing
Generative AI can expedite the discovery of new uses for approved drugs, thereby circumventing the development time and costs associated with traditional drug discovery. One study demonstrated the power of generative AI technologies like ChatGPT models to accelerate the review of existing scientific knowledge in an extensive Internet-based search space to prioritize drug repurposing candidates. New research also demonstrates how generative AI can rapidly model clinical trials to identify new uses for existing drugs and therapeutics. These technologies are already being applied successfully to the critical task of repurposing existing medicines for the treatment of rare diseases.
Precision drug discovery
By analyzing large-scale multimodal datasets, including multiomics data, genome-wide association studies (GWAS), disease-specific repositories, biobank-scale studies, patient data, genetic evidence, clinical data, imaging data, etc., generative AI models can help design drug candidates with the highest likelihood of efficacy and minimal side effects for specific patient populations.
What are the generative AI challenges in drug discovery?
Despite their immense potential, there are still several challenges that need to be addressed before generative AI technologies can be successfully integrated into drug discovery workflows.
Generative models require large, high-quality, diverse datasets for training. In drug discovery, experimental data is often sparse and noisy, with errors and outliers. The availability of large volumes of high-quality data, especially for rare diseases or novel drug targets, remains a challenge.
Generative models trained on biased or limited datasets may produce biased or unrealistic outputs. It is therefore crucial to ensure that these models are trained on unbiased, diverse datasets and that they generalize across the vast chemical space and range of biological targets. These technologies also raise significant ethical and regulatory considerations, including concerns about patient safety, data privacy, and intellectual property rights.
Finally, and most importantly, generative models are inherently black boxes, raising further questions about interpretability and explainability.
These challenges notwithstanding, generative AI has the potential to usher in the next generation of AI-driven drug discovery.
Glossary tooltips
- VAE (Variational Autoencoder): A generative model that learns to encode data into a latent space and decode it back, enabling the generation of new, similar data samples. In drug discovery, VAEs are used to generate novel molecular structures by exploring chemical space.
- GAN (Generative Adversarial Network): A generative model consisting of two neural networks, the generator and the discriminator, that are trained simultaneously. The generator creates new data instances, while the discriminator evaluates them. GANs are employed in drug discovery to generate realistic molecular structures.
- SMILES (Simplified Molecular Input Line Entry System): A notation that encodes molecular structures into linear strings of text, facilitating the input of chemical structures into computational models for analysis and generation.
- QED Score (Quantitative Estimate of Drug-likeness): A metric that evaluates the drug-likeness of a molecule based on properties like molecular weight, lipophilicity, and hydrogen bond donors/acceptors. Scores range from 0 to 1, with higher scores indicating greater drug-likeness.
- ADMET: An acronym for Absorption, Distribution, Metabolism, Excretion, and Toxicity, the key pharmacokinetic and toxicity properties that determine a compound's suitability as a drug.
- Multimodal AI: Artificial intelligence systems capable of processing and integrating multiple types of data inputs, such as text, images, and molecular structures, to enhance predictive modeling and decision-making in drug discovery.
- Retrieval-Augmented Generation (RAG): A framework that combines information retrieval with generative models, allowing AI systems to access external knowledge sources during generation, thereby improving the relevance and accuracy of outputs in tasks like literature-based drug discovery.
- Protein Language Model (PLM): A type of AI model trained on protein sequences to understand and generate protein-related data. PLMs are used in predicting protein structures and functions, aiding in tasks like antibody design.
- De novo design: The process of designing new molecules from scratch, without relying on existing compounds, often using computational models to predict structures with desired biological activity.
- Latent space: A compressed, abstract representation of data learned by models like VAEs, where similar data points are positioned closely, enabling efficient exploration and generation of new data instances.