Humanized antibodies and computational screening: a keynote presentation by Prof Charlotte Deane at Oxford University
Prof Charlotte Deane from Oxford University opened her talk on machine learning in antibody and protein research by describing the data and databases developed in her lab: Observed Antibody Space (Olsen et al., 2021); the Structural Antibody Database (Schneider et al., 2021); Thera-SAbDab (Raybould et al., 2020), a self-updating database of immunotherapeutic variable domain sequences; and CoV-AbDab, the Coronavirus Antibody Database, a database of all antibodies shown to bind to SARS-CoV-2.
Turning to humanization, Prof Deane outlined the issues with current therapeutics, 50% of which are derived from non-human sources.
Non-human antibodies can provoke a harmful immune response in patients (immunogenicity), so humanizing antibody therapeutics is important for both safety and efficacy. Currently, humanization is typically carried out experimentally, largely by trial and error.
Prof Deane presented a potential solution for finding humanized antibodies: Hu-mAb, which uses random forest machine learning (ML) models built with over 65 million human and non-human sequences from the Observed Antibody Space database.
Hu-mAb has a separate model for each human V gene type. Prof Deane explained that predicting whether a given sequence is human or not is not a difficult task for the ML models, and the predictions achieve a very high AUC-ROC (area under the receiver operating characteristic curve), indicating very good performance.
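To make the AUC-ROC figure concrete, here is a minimal sketch of how the metric can be computed for a human versus non-human classifier. It uses the pairwise definition: the probability that a randomly chosen human sequence receives a higher score than a randomly chosen non-human one. The labels and scores are invented toy values, not Hu-mAb outputs, and the function name `auc_roc` is an assumption made for illustration.

```python
def auc_roc(labels, scores):
    """AUC-ROC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = human sequence, 0 = non-human sequence.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.95, 0.80, 0.40, 0.55, 0.20, 0.10]

example_auc = auc_roc(labels, scores)
print(f"AUC-ROC on toy data: {example_auc:.3f}")
```

A value of 1.0 means every human sequence outranks every non-human one, while 0.5 corresponds to random ranking.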
Compared to other published models, the Hu-mAb model has a surprisingly good performance.
The next step was to test whether Hu-mAb can distinguish between known human, humanized, chimeric, and mouse therapeutics. Hu-mAb performed well on human and humanized antibodies, though not as well on the chimeric antibodies.
Therapeutic sequences classified as human by the Hu-mAb model tend to have low immunogenicity levels, while sequences classified as not human are more immunogenic.
The ultimate goal was to convert Hu-mAb into a humanization database so that the models could suggest the optimal humanized sequence with the lowest immunogenicity.
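One way to picture how a humanness classifier could drive humanization is a greedy search: repeatedly apply the single point mutation that most increases the humanness score until a threshold is reached, while leaving binding-critical positions untouched. The sketch below illustrates that idea only. The scoring function `toy_humanness`, the frozen position set, the threshold, and the short sequences are all hypothetical stand-ins; the talk summary does not describe the actual procedure at this level of detail.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_humanness(seq, human_reference):
    """Hypothetical score: fraction of positions identical to a human reference.
    A real system would use a trained classifier score instead."""
    matches = sum(a == b for a, b in zip(seq, human_reference))
    return matches / len(human_reference)


def greedy_humanize(seq, score_fn, frozen_positions, threshold, max_mutations=50):
    """Apply the best-scoring single mutation per round until the threshold is met."""
    seq = list(seq)
    mutations = []  # entries are (old residue, position, new residue)
    current = score_fn("".join(seq))
    while current < threshold and len(mutations) < max_mutations:
        best = None  # (score, position, residue)
        for pos in range(len(seq)):
            if pos in frozen_positions:
                continue  # e.g. positions assumed to be important for binding
            original = seq[pos]
            for aa in AMINO_ACIDS:
                if aa == original:
                    continue
                seq[pos] = aa
                score = score_fn("".join(seq))
                if best is None or score > best[0]:
                    best = (score, pos, aa)
            seq[pos] = original  # restore before trying the next position
        if best is None or best[0] <= current:
            break  # no single mutation improves the score
        score, pos, aa = best
        mutations.append((seq[pos], pos, aa))
        seq[pos] = aa
        current = score
    return "".join(seq), mutations, current


# Toy example (hypothetical sequences; position 4 is treated as frozen).
human_reference = "EVQLVESGG"
precursor = "QVKLQESGA"
humanized, mutations, final_score = greedy_humanize(
    precursor,
    lambda s: toy_humanness(s, human_reference),
    frozen_positions={4},
    threshold=0.85,
)
print(humanized, mutations, round(final_score, 3))
```

Because the search stops as soon as the threshold is met, it tends to propose few mutations, which is the property that matters when the goal is to preserve the original antibody's behaviour.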
Prof Deane tested Hu-mAb on 25 humanized sequences that demonstrated low immunogenicity and for which the precursor sequences were available (murine, rat or rabbit).
The results suggested that 77-85% of mutations suggested by Hu-mAb to humanize the antibodies were similar to those made experimentally, and 58-59% of the mutations suggested were indeed made experimentally.
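The two percentages above are overlap statistics between computationally suggested and experimentally made mutations. A minimal sketch of how such fractions could be computed follows. The definition of "similar" used here (same position, new residue in the same physicochemical group) and the grouping itself are assumptions made for illustration, and the mutation sets are invented.

```python
# Hypothetical physicochemical grouping, used only to define "similar" here.
RESIDUE_GROUPS = {
    "aliphatic": "AVLIM",
    "aromatic": "FWY",
    "polar": "STNQ",
    "positive": "KRH",
    "negative": "DE",
    "special": "CGP",
}
GROUP_OF = {aa: name for name, members in RESIDUE_GROUPS.items() for aa in members}


def overlap_fractions(suggested, experimental):
    """Both arguments map a sequence position to the new residue at that position.
    Returns (exact fraction, similar fraction) relative to the suggested set."""
    if not suggested:
        raise ValueError("no suggested mutations to evaluate")
    exact = sum(1 for pos, aa in suggested.items() if experimental.get(pos) == aa)
    similar = sum(
        1
        for pos, aa in suggested.items()
        if pos in experimental and GROUP_OF[aa] == GROUP_OF[experimental[pos]]
    )
    n = len(suggested)
    return exact / n, similar / n


# Toy mutation sets (hypothetical positions and residues).
suggested = {5: "V", 12: "K", 20: "S", 48: "L"}
experimental = {5: "V", 12: "R", 20: "S", 71: "A"}

exact_frac, similar_frac = overlap_fractions(suggested, experimental)
print(f"exact: {exact_frac:.0%}, similar: {similar_frac:.0%}")
```

Every exact match also counts as similar, so the similar fraction is always at least as large as the exact one, consistent with the ordering of the reported figures.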
The comparison with experimental humanization therefore shows good agreement, while Hu-mAb is the more efficient approach.
Hu-mAb proposes fewer mutations to the VH-VL interface, making it more likely that the domain orientation, and with it the antibody's structure, binding properties and function, is preserved.
In summary, in addition to accurately predicting whether an antibody is human or not, Hu-mAb can also be used to evaluate and improve the immunogenicity profiles of antibody sequences.
Prof Deane was next interested in adding structural information to the Hu-mAb database to improve its predictive ability. Full structural annotation of BCR data led to the Human Antibody Model Library, which describes the structural variability of human antibody space.
The library contains BCR data from approximately 500 healthy individuals, with sequences from unpaired naive and memory IgM molecules sourced from peripheral blood, bone marrow, and spleen. Using computational models, the sequences were reduced to ~20,000 structurally diverse antibodies that could be modeled accurately.
To enable better design of therapeutic antibodies in conjunction with the Human Antibody Model Library, Prof Deane’s group developed the Therapeutic Antibody Profiler to help reduce developability issues such as poor stability or high levels of aggregation.
The Profiler was built using the variable domain structures of 137 post-Phase I clinical-stage antibody therapeutics and validated on two datasets of MedImmune developability failures (Jain et al., 2017 PNAS). The Profiler is automatically updated with source data, enabling continuous prediction improvements.
Building on the success of the Profiler, Prof Deane presented Dlab, a deep learning tool for virtual screening of a model antibody library (Schneider et al., 2021, Bioinformatics). Dlab uses convolutional neural networks to predict whether or not an antibody will bind to the target peptide.
When tested on identifying binders among 50 non-binders, Dlab enriched binders in the top 20% of its ranked predictions. As the availability of data increases, virtual screening of libraries may become more and more prominent, identifying potential starting points for antibody therapeutics.
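Enrichment in a top-ranked fraction is a standard way to judge virtual screening. The sketch below shows one common formulation, the enrichment factor: the hit rate among the top-scoring fraction divided by the hit rate in the whole library. The labels and scores are invented, and the function name `top_fraction_enrichment` is an assumption made for illustration rather than anything taken from Dlab.

```python
import math


def top_fraction_enrichment(labels, scores, fraction=0.2):
    """Enrichment factor: binder rate in the top-scoring fraction / overall binder rate."""
    if len(labels) != len(scores) or not labels:
        raise ValueError("labels and scores must be non-empty and the same length")
    total_binders = sum(labels)
    if total_binders == 0:
        raise ValueError("no binders in the library")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, math.ceil(fraction * len(ranked)))
    hits_in_top = sum(label for _, label in ranked[:n_top])
    hit_rate_top = hits_in_top / n_top
    base_rate = total_binders / len(labels)
    return hit_rate_top / base_rate


# Toy screen (hypothetical): 1 = binder, 0 = non-binder.
labels = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1, 0.4, 0.5, 0.6, 0.05]

enrichment = top_fraction_enrichment(labels, scores, fraction=0.2)
print(f"enrichment factor in top 20%: {enrichment:.2f}")
```

An enrichment factor above 1 means the top of the ranking contains more binders than random selection would, which is what makes a screen useful for picking candidates to test experimentally.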
In the last part of her talk, Prof Deane introduced ABlooper, a model improving the speed and quality of structural models of antibodies (Abanades et al., 2021). ABlooper uses equivariant graph neural networks to predict the structure of complementarity-determining regions, and it also provides an estimate of the accuracy of each prediction.
ABlooper is very fast, taking under five seconds to perform predictions on hundreds of structures. Compared to the benchmark tools, ABlooper does not perform significantly better than other prediction models. However, it offers an estimate of how accurate each prediction is likely to be.
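One general way a structure predictor can estimate its own accuracy is to make several independent predictions of the same loop and measure how much they disagree: a tight ensemble suggests a confident prediction, a scattered one suggests uncertainty. The sketch below illustrates that idea with the mean pairwise RMSD between predicted coordinate sets. It is an assumption made for illustration, since the talk summary does not say how ABlooper derives its estimate, and the coordinates are invented.

```python
import math
from itertools import combinations


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally sized lists of (x, y, z) points.
    No superposition is performed; the predictions are assumed to share one frame."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and the same length")
    squared = sum(
        (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
        for (x1, y1, z1), (x2, y2, z2) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def ensemble_spread(predictions):
    """Mean pairwise RMSD across predictions; lower spread suggests higher confidence."""
    pairs = list(combinations(predictions, 2))
    if not pairs:
        raise ValueError("need at least two predictions")
    return sum(rmsd(a, b) for a, b in pairs) / len(pairs)


# Toy ensembles (hypothetical two-atom loops, three predictions each).
tight = [[(0, 0, 0), (1, 0, 0)], [(0, 0, 0), (1, 0, 0)], [(0, 0, 0), (1, 0, 0)]]
loose = [[(0, 0, 0), (1, 0, 0)], [(0, 0, 2), (1, 0, 2)], [(0, 0, 0), (1, 0, 0)]]

tight_spread = ensemble_spread(tight)
loose_spread = ensemble_spread(loose)
print(f"tight: {tight_spread:.2f}, loose: {loose_spread:.2f}")
```

In practice such a spread would be used as a filter, for example discarding loops whose predictions disagree by more than a chosen cutoff before using the model downstream.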