Abstract

Advancements in neural machinery have led to a wide range of algorithmic solutions for molecular property prediction. Two classes of models in particular have yielded promising results: neural networks applied to computed molecular fingerprints or expert-crafted descriptors and graph convolutional neural networks that construct a learned molecular representation by operating on the graph structure of the molecule. However, recent literature has yet to clearly determine which of these two methods is superior when generalizing to new chemical space. Furthermore, prior research has rarely examined these new models in industry research settings in comparison to existing employed models. In this paper, we benchmark models extensively on 19 public and 16 proprietary industrial data sets spanning a wide variety of chemical end points. In addition, we introduce a graph convolutional model that consistently matches or outperforms models using fixed molecular descriptors as well as previous graph neural architectures on both public and proprietary data sets. Our empirical findings indicate that while approaches based on these representations have yet to reach the level of experimental reproducibility, our proposed model nevertheless offers significant improvements over models currently used in industrial workflows.

Keywords

Computer science; Graph; Molecular graph; Benchmark (surveying); Artificial intelligence; Chemical space; Convolutional neural network; Workflow; Machine learning; Property (philosophy); Artificial neural network; Data mining; Construct (python library); Representation (politics); Theoretical computer science; Drug discovery; Chemistry; Database

MeSH Terms

Computer Graphics; Neural Networks, Computer

Publication Info

Year: 2019
Type: article
Volume: 59
Issue: 8
Pages: 3370-3388
Citations: 1528
Access: Closed

Citation Metrics

OpenAlex: 1528
Influential: 125
CrossRef: 1345

Cite This

Kevin Yang, Kyle Swanson, Wengong Jin et al. (2019). Analyzing Learned Molecular Representations for Property Prediction. Journal of Chemical Information and Modeling, 59(8), 3370-3388. https://doi.org/10.1021/acs.jcim.9b00237

Identifiers

DOI: 10.1021/acs.jcim.9b00237
PMID: 31361484
PMCID: PMC6727618
arXiv: 1904.01561
