We recently published a manuscript entitled “Automatic Recognition of Ligands in Electron Density by Machine Learning” showing that automatic identification of small-molecule ligands in macromolecular complexes by machine learning is now possible. CheckMyBlob, our machine learning algorithm, identifies and interprets ligands solely from electron density maps. In benchmark tests on over 200,000 ligand-binding sites found in the Protein Data Bank (PDB), the correct compound was the top-ranked ligand roughly 60% of the time and was among the top ten predictions 91% of the time.
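For readers unfamiliar with the metric, the "top-ranked" and "top ten" figures above are top-k accuracies: the fraction of binding sites whose true ligand appears among the first k ranked predictions. The sketch below shows one way this could be computed. The function name, the ligand codes, and the toy data are illustrative assumptions and are not taken from CheckMyBlob's code or benchmark.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_predictions: Sequence[Sequence[str]],
    true_labels: Sequence[str],
    k: int,
) -> float:
    """Fraction of sites whose true ligand is among the first k ranked predictions."""
    if len(ranked_predictions) != len(true_labels):
        raise ValueError("predictions and labels must have the same length")
    if not true_labels:
        raise ValueError("at least one binding site is required")
    if k < 1:
        raise ValueError("k must be a positive integer")
    hits = sum(
        1
        for ranked, truth in zip(ranked_predictions, true_labels)
        if truth in ranked[:k]
    )
    return hits / len(true_labels)


# Toy data (hypothetical): one ranked list of ligand codes per binding site.
ranked = [
    ["SO4", "PO4", "GOL"],  # true ligand SO4: hit at rank 1
    ["GOL", "EDO", "PEG"],  # true ligand EDO: hit at rank 2
    ["HEM", "NAD", "FAD"],  # true ligand ATP: never predicted
    ["ZN", "MG", "CA"],     # true ligand MG: hit at rank 2
]
truth = ["SO4", "EDO", "ATP", "MG"]

top1 = top_k_accuracy(ranked, truth, k=1)
top3 = top_k_accuracy(ranked, truth, k=3)
print(f"top-1: {top1:.2f}, top-3: {top3:.2f}")
```

On this toy set only one of four sites is correct at rank 1, while three of four are recovered within the top three, which mirrors how the gap between the 60% and 91% figures should be read.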
Correct ligand identification and refinement are pivotal for understanding the function of many macromolecules. Currently, each structure in the PDB is downloaded ~30,000 times on average. Because these structures are used in myriad ways, inaccuracies and errors in deposited models propagate and affect other areas of science. In our paper we introduce an algorithm that helps researchers keep human bias from contaminating structural models during data interpretation. We also present four structures that illustrate some of the problems that CheckMyBlob tackles: mislabeling in the PDB (A), wrong chirality (B), incorrectly identified ligands combined with poor refinement (C), and buffer molecules occupying the active site (D). We re-refined all four sample structures, offered the original authors a joint deposition, and deposited the corrected models in the PDB. Two points are worth noting: (i) the improvement in model quality was sometimes dramatic (e.g., a 5% drop in Rfree); (ii) structure-remediation servers, such as PDB_REDO, do not handle ligand misinterpretation, and for all our examples they produced “better” structures that still contained incorrect ligands.
You can read more about CheckMyBlob in Bioinformatics.