Abstract

The crystallographically determined bond length, valence angle, and torsion angle information in the Cambridge Structural Database (CSD) has many uses. However, accessing it by means of conventional substructure searching requires nontrivial user intervention. In consequence, these valuable data have been underutilized and have not been directly accessible to client applications. The situation has been remedied by development of a new program (Mogul) for automated retrieval of molecular geometry data from the CSD. The program uses a system of keys to encode the chemical environments of fragments (bonds, valence angles, and acyclic torsions) from CSD structures. Fragments with identical keys are deemed to be chemically identical and are grouped together, and the distribution of the appropriate geometrical parameter (bond length, valence angle, or torsion angle) is computed and stored. Use of a search tree indexed on key values, together with a novel similarity calculation, then enables the distribution matching any given query fragment (or the distributions most closely matching, if an adequate exact match is unavailable) to be found easily and with no user intervention. Validation experiments indicate that, with rare exceptions, search results afford precise and unbiased estimates of molecular geometrical preferences. Such estimates may be used, for example, to validate the geometries of libraries of modeled molecules or of newly determined crystal structures or to assist structure solution from low-resolution (e.g. powder diffraction) X-ray data.
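The grouping step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Mogul's implementation: the key strings, the sample bond lengths, the `min_hits` threshold, and the function names `build_distributions`, `summarize`, and `lookup` are all hypothetical, and a plain dictionary stands in for the search tree. The similarity-based fallback is omitted because the abstract does not specify how it is calculated.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical observations: (environment key, bond length in angstroms).
# The key strings and values are invented for illustration only;
# they are not Mogul's actual key encoding or CSD data.
observations = [
    ("C.ar-C.ar", 1.39),
    ("C.ar-C.ar", 1.38),
    ("C.ar-C.ar", 1.40),
    ("C.sp3-C.sp3", 1.53),
    ("C.sp3-C.sp3", 1.54),
]


def build_distributions(obs):
    """Group geometric values by key: identical keys share one distribution."""
    groups = defaultdict(list)
    for key, value in obs:
        groups[key].append(value)
    return dict(groups)


def summarize(values):
    """Return simple summary statistics for one distribution."""
    sd = stdev(values) if len(values) > 1 else 0.0
    return {"n": len(values), "mean": mean(values), "sd": sd}


def lookup(distributions, query_key, min_hits=2):
    """Exact-match lookup; returns None when too few observations exist.

    A real system would fall back to the most similar keys at this point;
    that step is deliberately left out of this sketch.
    """
    values = distributions.get(query_key, [])
    if len(values) >= min_hits:
        return summarize(values)
    return None
```

Calling `lookup(build_distributions(observations), "C.ar-C.ar")` returns the summary for the three matching fragments, while a key with no stored fragments returns `None`, which is where a similarity search would take over.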

Keywords

Computer science, Torsion (gastropod), Molecular geometry, Valence (chemistry), Substructure, ENCODE, Matching (statistics), Algorithm, Crystallography, Geometry, Molecule, Data mining, Chemistry, Mathematics

Publication Info

Year: 2004
Type: article
Volume: 44
Issue: 6
Pages: 2133-2144
Citations: 953
Access: Closed

Citation Metrics

953 citations (OpenAlex)

Cite This

Ian Bruno, Jason C. Cole, M. Kessler et al. (2004). Retrieval of Crystallographically-Derived Molecular Geometry Information. Journal of Chemical Information and Computer Sciences, 44(6), 2133-2144. https://doi.org/10.1021/ci049780b

Identifiers

DOI: 10.1021/ci049780b