Abstract

We report a method to convert discrete representations of molecules to and from a multidimensional continuous representation. This model allows us to generate new molecules for efficient exploration and optimization through open-ended spaces of chemical compounds. A deep neural network was trained on hundreds of thousands of existing chemical structures to construct three coupled functions: an encoder, a decoder and a predictor. The encoder converts the discrete representation of a molecule into a real-valued continuous vector, and the decoder converts these continuous vectors back to discrete molecular representations. The predictor estimates chemical properties from the latent continuous vector representation of the molecule. Continuous representations allow us to automatically generate novel chemical structures by performing simple operations in the latent space, such as decoding random vectors, perturbing known chemical structures, or interpolating between molecules. Continuous representations also allow the use of powerful gradient-based optimization to efficiently guide the search for optimized functional compounds. We demonstrate our method in the domain of drug-like molecules and also in the set of molecules with fewer than nine heavy atoms.
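
The latent-space operations named in the abstract (interpolating between molecules, perturbing a known structure, and gradient-based optimization of a predicted property) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the latent vectors are plain Python lists, `toy_predictor` is a hypothetical quadratic stand-in for the trained property predictor, and the encoder and decoder networks are omitted entirely. It is not the authors' implementation.

```python
import random


def interpolate(z_a, z_b, steps):
    """Return `steps` evenly spaced latent points from z_a to z_b, inclusive."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    return [
        [a + (b - a) * i / (steps - 1) for a, b in zip(z_a, z_b)]
        for i in range(steps)
    ]


def perturb(z, scale, rng):
    """Add independent Gaussian noise of standard deviation `scale` to z."""
    return [x + rng.gauss(0.0, scale) for x in z]


def toy_predictor(z, target):
    """Hypothetical property predictor: highest (zero) when z equals target."""
    return -sum((x - t) ** 2 for x, t in zip(z, target))


def toy_predictor_grad(z, target):
    """Analytic gradient of toy_predictor with respect to z."""
    return [-2.0 * (x - t) for x, t in zip(z, target)]


def gradient_ascent(z, target, lr=0.1, n_steps=50):
    """Move z uphill on the toy predictor with fixed-step gradient ascent."""
    for _ in range(n_steps):
        grad = toy_predictor_grad(z, target)
        z = [x + lr * g for x, g in zip(z, grad)]
    return z


if __name__ == "__main__":
    rng = random.Random(0)
    z_known = [0.0, 0.0]    # stand-in for an encoded known molecule
    z_other = [1.0, 2.0]    # stand-in for a second encoded molecule
    best = [3.0, -1.0]      # assumed optimum of the toy property

    print(interpolate(z_known, z_other, 3))
    print(perturb(z_known, 0.1, rng))
    print(gradient_ascent(z_known, best))
```

In the setting the abstract describes, each resulting vector would be passed to the decoder to obtain a discrete molecular representation; that step is left out here because it requires the trained network.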

Related Publications

Skip-Thought Vectors

We describe an approach for unsupervised learning of a generic, distributed sentence encoder. Using the continuity of text from books, we train an encoder-decoder model that tri...

2015 arXiv (Cornell University) 723 citations

Holographic reduced representations

Associative memories are conventionally used to represent data with very simple structure: sets of pairs of vectors. This paper describes a method for representing more complex ...

1995 IEEE Transactions on Neural Networks 652 citations

Publication Info

Year: 2018
Type: article
Volume: 4
Issue: 2
Pages: 268-276
Citations: 2745
Access: Closed

Citation Metrics

OpenAlex: 2745
Influential: 183
CrossRef: 2419

Cite This

Rafael Gómez-Bombarelli, Jennifer N Wei, David Duvenaud et al. (2018). Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules. ACS Central Science, 4(2), 268-276. https://doi.org/10.1021/acscentsci.7b00572

Identifiers

DOI: 10.1021/acscentsci.7b00572
PMID: 29532027
PMCID: PMC5833007
arXiv: 1610.02415
