Genesis Molecular AI: Advancing Drug Discovery with PEARL and Diffusion Models

Diffusion Models as the Primitive for 3D Structure Prediction

Generative Adversarial Networks (GANs) proved ineffective for protein and ligand systems, but diffusion models have emerged as the critical primitive for 3D structure prediction. While much of the current AI focus is on large language models (LLMs), some of the most innovative diffusion research is now occurring in the field of molecular biology, specifically in predicting how proteins and small molecules interact in 3D space.

PEARL: Achieving Sub-Angstrom Resolution

Genesis Molecular AI has developed PEARL (Place Every Atom at the Right Location), a structure prediction model that takes a protein sequence and a ligand representation to predict their combined 3D structure.

Moving Beyond "Slop"

Traditional benchmarks in the field often use a 2Å RMSD (Root Mean Square Deviation) threshold for accuracy. Genesis argues that 2Å is insufficient for drug discovery because it allows for significant physical errors, such as flipped aromatic rings, which can completely invalidate a structural hypothesis for a medicinal chemist.
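To make the threshold argument concrete, here is a minimal sketch of the RMSD calculation in plain Python. It assumes the two poses are already in the same reference frame and list their atoms in the same order; real benchmarks typically also handle alignment and symmetry-equivalent atoms, which this sketch omits. The function names and coordinates are illustrative, not taken from Genesis or any benchmark.

```python
import math

def rmsd(coords_a, coords_b):
    """Root mean square deviation (in Å) between two equally ordered atom lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and the same length")
    total = 0.0
    for a, b in zip(coords_a, coords_b):
        # Squared Euclidean distance between the matched pair of atoms
        total += sum((x - y) ** 2 for x, y in zip(a, b))
    return math.sqrt(total / len(coords_a))

def passes_threshold(coords_pred, coords_ref, threshold=2.0):
    """True if the predicted pose is within `threshold` Å RMSD of the reference."""
    return rmsd(coords_pred, coords_ref) <= threshold

# Hypothetical 4-atom ligand; every predicted atom is displaced by 1.5 Å.
reference = [(0.0, 0.0, 0.0), (1.4, 0.0, 0.0), (2.8, 0.0, 0.0), (4.2, 0.0, 0.0)]
predicted = [(x, y + 1.5, z) for x, y, z in reference]

pose_rmsd = rmsd(predicted, reference)                # 1.5
ok_at_2A = passes_threshold(predicted, reference, 2.0)  # True
ok_at_1A = passes_threshold(predicted, reference, 1.0)  # False
```

The same pose counts as a success at 2Å and a failure at 1Å, even though every atom is 1.5Å away from where it should be.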

PEARL aims for sub-angstrom (below 1Å) resolution. This level of precision is necessary because critical molecular interactions, such as hydrogen bonds, occur within a very narrow distance range (typically 2.7Å to 3.3Å). An error of just 0.6Å can be the difference between a strong bond and a physical clash or a complete lack of interaction.
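The arithmetic behind the 0.6Å figure can be sketched as a simple distance check. The 2.7Å to 3.3Å window is the one quoted above; the labels and the hard cutoffs are a deliberate simplification for illustration (real hydrogen-bond criteria also consider angles and atom types).

```python
import math

HBOND_MIN = 2.7  # Å, lower end of the window quoted in the text
HBOND_MAX = 3.3  # Å, upper end of the window quoted in the text

def classify_contact(distance):
    """Label a donor-acceptor distance using only the distance window."""
    if distance < HBOND_MIN:
        return "too close"
    if distance <= HBOND_MAX:
        return "hydrogen bond"
    return "too far"

def classify_atoms(donor_xyz, acceptor_xyz):
    """Same check, starting from two 3D coordinates in Å."""
    return classify_contact(math.dist(donor_xyz, acceptor_xyz))

ideal = 3.0                                  # mid-window distance
label_ideal = classify_contact(ideal)        # "hydrogen bond"
label_far = classify_contact(ideal + 0.6)    # "too far"
label_close = classify_contact(ideal - 0.6)  # "too close"
```

Starting from the middle of the window, a 0.6Å error in either direction moves the contact out of the hydrogen-bond range.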

Modeling Induced Fit

Unlike static models, PEARL is designed to model how a protein flexes to accommodate a ligand, a process known as induced fit. In recent tests on the OpenBind benchmark (specifically the EV-A71 2A protease target), PEARL demonstrated a superior ability to predict the movement of flexible loops in the protein, outperforming other co-folding models on targets it had not seen during training.

The Architecture of PEARL and Training Strategies

Scaling and Synthetic Data

Because the public Protein Data Bank (PDB) of experimentally determined structures is relatively small (approximately 200,000 structures) and grows slowly, Genesis uses physics-based simulations to generate synthetic training data. This allows the model to learn from a much larger set of molecular behaviors than would be possible using only experimental data.

Inference-Time Scaling

Similar to "thinking tokens" in advanced LLMs, Genesis employs inference-time scaling. The model uses a diffusion-based head that iteratively refines the predicted structure. During this process, physics-based guidance is used to steer the model toward physically valid outputs, improving overall performance.
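The idea of iterative refinement with physics-based guidance can be illustrated with a toy loop. This is not PEARL's architecture: the "denoiser" below is a hypothetical stand-in that always proposes the same pose, the system is a single ligand atom next to a single protein atom, and the guidance term is a simple clash penalty chosen for illustration. All names and constants are assumptions.

```python
import math
import random

PROTEIN_ATOM = (0.0, 0.0, 0.0)  # fixed protein atom (toy system)
MIN_DIST = 2.7                  # Å; anything closer counts as a clash in this toy

def toy_denoiser(x):
    """Hypothetical stand-in for a trained network's clean-pose estimate.
    It deliberately proposes a pose 2.0 Å from the protein atom, i.e. a clash."""
    return (2.0, 0.0, 0.0)

def clash_gradient(x):
    """Gradient of the penalty 0.5 * (MIN_DIST - d)**2, active only when d < MIN_DIST."""
    d = math.dist(x, PROTEIN_ATOM)
    if d >= MIN_DIST or d == 0.0:
        return (0.0, 0.0, 0.0)
    factor = -(MIN_DIST - d) / d
    return tuple(factor * (xi - pi) for xi, pi in zip(x, PROTEIN_ATOM))

def refine(num_steps, guidance_scale, seed=0, step_size=0.2):
    """Start from random noise and repeatedly step toward the guided estimate."""
    rng = random.Random(seed)
    x = tuple(rng.gauss(0.0, 1.0) for _ in range(3))
    for _ in range(num_steps):
        x0_hat = toy_denoiser(x)
        grad = clash_gradient(x0_hat)
        # Physics guidance: nudge the estimate down the penalty gradient
        x0_guided = tuple(c - guidance_scale * g for c, g in zip(x0_hat, grad))
        # Refinement step: move part of the way toward the guided estimate
        x = tuple(xi + step_size * (gi - xi) for xi, gi in zip(x, x0_guided))
    return x

unguided = refine(num_steps=100, guidance_scale=0.0)
guided = refine(num_steps=100, guidance_scale=1.0)
d_unguided = math.dist(unguided, PROTEIN_ATOM)  # about 2.0 Å: the clash survives
d_guided = math.dist(guided, PROTEIN_ATOM)      # about 2.7 Å: the clash is resolved
```

In this toy, running more refinement steps brings the sample closer to the converged pose, and turning guidance on moves that pose from a clashing 2.0Å to the 2.7Å limit. Both effects are properties of the sketch, not measurements of PEARL.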

SAPPHIRE: Agentic Drug Discovery

Genesis is developing SAPPHIRE, an agentic platform designed to automate the drudgery of drug discovery.

  • Orchestration: SAPPHIRE uses an LLM to orchestrate a suite of specialized tools (including PEARL and ADMET prediction models).
  • Hypothesis Generation: The agent can analyze predicted crystal structures, form hypotheses about binding, and propose new molecular candidates.
  • Strategic Direction: The goal is not to replace human scientists but to allow medicinal chemists and CADD (computer-aided drug design) scientists to act as grand strategists, providing direction while the agent executes the iterative design-make-test-analyze cycles.

Beyond Structure: ADMET Prediction

Predicting a 3D pose is only one part of drug discovery. A viable drug must also satisfy ADMET properties: Absorption, Distribution, Metabolism, Excretion, and Toxicity.

Genesis uses multitask graph neural networks to predict over 30 different properties, such as solubility and oral bioavailability. The company emphasizes that these properties often anti-correlate (e.g., increasing binding affinity often decreases solubility), making the search for "Pareto optimal" compounds a complex optimization problem that requires high-resolution modeling.
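As a small illustration of what "Pareto optimal" means here, the sketch below filters a set of candidates down to those that no other candidate beats on every property at once. The compound names and scores are invented for the example, only two properties are shown, and both are treated as higher-is-better; none of this reflects Genesis's actual models or data.

```python
def dominates(a, b):
    """True if `a` is at least as good as `b` everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(candidates):
    """Return the names of candidates that no other candidate dominates.

    `candidates` maps a name to a tuple of scores where higher is better.
    """
    front = []
    for name, scores in candidates.items():
        beaten = any(
            dominates(other_scores, scores)
            for other_name, other_scores in candidates.items()
            if other_name != name
        )
        if not beaten:
            front.append(name)
    return sorted(front)

# Hypothetical (binding affinity score, solubility score) pairs
compounds = {
    "cpd_A": (9.0, 1.0),  # strongest binder, poorly soluble
    "cpd_B": (7.5, 3.0),
    "cpd_C": (6.0, 4.5),  # weakest binder on the front, most soluble
    "cpd_D": (7.0, 2.5),  # worse than cpd_B on both properties
    "cpd_E": (5.5, 4.0),  # worse than cpd_C on both properties
}

front = pareto_front(compounds)  # ["cpd_A", "cpd_B", "cpd_C"]
```

The front contains three compounds rather than a single winner, which is why anti-correlated properties turn candidate selection into a trade-off. With 30 or more predicted properties the same definition applies, but far more candidates end up non-dominated.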

Integration with Wet Lab Data

Genesis partners with companies like Insight to create a tight feedback loop between AI predictions and physical synthesis.

"We want to have design, make, test, analyze cycles that are as rapid as possible and continuously fine-tune... the models based on what we see in the lab."

This partnership allows Genesis to use reinforcement learning (RL) to improve models based on actual biochemical and cellular assay results, reducing the reliance on high-throughput screens, which often suffer from high false-positive rates.
