Abstract

Recent advances in machine learning have leveraged evolutionary information in multiple sequence alignments to predict protein structure. We demonstrate direct inference of full atomic-level protein structure from primary sequence using a large language model. As language models of protein sequences are scaled up to 15 billion parameters, an atomic-resolution picture of protein structure emerges in the learned representations. This results in an order-of-magnitude acceleration of high-resolution structure prediction, which enables large-scale structural characterization of metagenomic proteins. We apply this capability to construct the ESM Metagenomic Atlas by predicting structures for >617 million metagenomic protein sequences, including >225 million that are predicted with high confidence, which gives a view into the vast breadth and diversity of natural proteins.

Keywords

Metagenomics, Computer science, Inference, Protein structure prediction, Construct (python library), Sequence (biology), Protein structure, Scale (ratio), Artificial intelligence, Machine learning, Computational biology, Biology, Genetics, Geography, Cartography

Publication Info

Year: 2023
Type: article
Volume: 379
Issue: 6637
Pages: 1123-1130
Citations: 3635 (OpenAlex)
Access: Closed

Cite This

Zeming Lin, Halil Akin, Roshan Rao et al. (2023). Evolutionary-scale prediction of atomic-level protein structure with a language model. Science, 379(6637), 1123-1130. https://doi.org/10.1126/science.ade2574

Identifiers

DOI: 10.1126/science.ade2574