EST_GENOME: a program to align spliced DNA sequences to unspliced genomic DNA

1997 Computer applications in the biosciences 275 citations

Abstract

This note describes the program EST_GENOME for aligning spliced DNA to unspliced genomic DNA. It is written in ANSI C and has been tested under Digital OSF3.2. The source code and documentation are available from ftp://www.sanger.ac.uk/ftp/pub/badger/est_genome.2.tar.Z.

The prediction of genes in uncharacterized genomic DNA sequence is currently one of the main problems facing sequence annotators. Methods based on de novo prediction, such as searching for motifs like the splice-site consensus or exploiting statistical properties such as biased codon usage (Solovyev et al., 1994; Hebsgaard et al., 1996), have been only partially successful. Investigators have often found that the surest way of predicting a gene is by alignment with a homologous protein sequence (Birney et al., 1996; Gelfand et al., 1996; Huang and Zhang, 1996) or with a spliced gene product [an expressed sequence tag (EST), mRNA or cDNA], particularly now that a large number of ESTs are available (Hillier et al., 1996).

Standard alignment tools are not ideal for finding the correct alignment of a spliced product to genomic DNA, for two reasons: the genomic sequence can contain large introns, and the programs ignore the conserved sequences found at donor/acceptor splice sites (intron/exon boundaries). In addition, very large genomic DNA sequences can be hard to align using quadratic-space dynamic programming, because it requires too much memory.

The program EST_GENOME addresses these problems. It allows large introns, can recognize splice sites and uses limited memory; this combination of features makes it a powerful and useful tool. EST_GENOME is used routinely at the Sanger Centre to help annotate human genomic sequence. Because it is slow compared with search methods such as BLAST (Altschul et al., 1990), we first screen genomic DNA against dbEST using BLASTN and then realign any matching ESTs using EST_GENOME. The algorithm is a modification of the Smith-Waterman local alignment algorithm (Smith and Waterman, 1981).
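The note does not give the recursion at this point, so the following is only a minimal Python sketch of the general idea under stated assumptions, not EST_GENOME's actual code. It is a Smith-Waterman style local alignment in which an intron is a single move that skips any number of genomic bases for a flat penalty (so long introns are not punished for their length), with a cheaper penalty when the skipped stretch starts with GT and ends with AG, the canonical splice-site consensus. Match and mismatch use the defaults of 1 quoted for EST_GENOME; the function name `spliced_local_score` and the `gap` (2), `intron` (40), `splice` (20) and `min_intron` (4) values are hypothetical placeholders. Only a few genome columns are held at once, so memory grows with the EST length rather than the genome length; the sketch returns the best score only and does no traceback.

```python
from collections import deque

NEG_INF = float("-inf")


def spliced_local_score(est, genome, match=1, mismatch=1, gap=2,
                        intron=40, splice=20, min_intron=4):
    """Best local alignment score of `est` against `genome`, allowing introns.

    Illustrative sketch: match/mismatch follow the quoted EST_GENOME defaults,
    but gap, intron, splice and min_intron are hypothetical placeholders.
    Cell (i, j) scores an alignment ending after est[:i] and genome[:j].
    """
    m = len(est)
    prev = [0] * (m + 1)              # column 0: empty genome prefix
    history = deque([(0, prev)])      # recent columns, not yet legal intron starts
    best_any = [NEG_INF] * (m + 1)    # per EST row: best score at any legal intron start
    best_donor = [NEG_INF] * (m + 1)  # same, restricted to starts where genome[k:k+2] == "GT"
    best = 0
    for j in range(1, len(genome) + 1):
        # Column k may start an intron genome[k:j] once it is >= min_intron long.
        # Only the last min_intron columns are ever stored, so memory is
        # O(min_intron * len(est)) and independent of the genome length.
        while history and history[0][0] <= j - min_intron:
            k, col = history.popleft()
            donor = genome[k:k + 2] == "GT"
            for i in range(m + 1):
                best_any[i] = max(best_any[i], col[i])
                if donor:
                    best_donor[i] = max(best_donor[i], col[i])
        acceptor = j >= 2 and genome[j - 2:j] == "AG"
        cur = [0] * (m + 1)
        for i in range(1, m + 1):
            pair = match if est[i - 1] == genome[j - 1] else -mismatch
            score = max(0,                     # start a new local alignment
                        prev[i - 1] + pair,    # align est[i-1] with genome[j-1]
                        prev[i] - gap,         # genome base left unmatched
                        cur[i - 1] - gap,      # EST base left unmatched
                        best_any[i] - intron)  # intron with any boundaries
            if acceptor:
                # Intron that starts with GT and ends with AG: cheaper penalty.
                score = max(score, best_donor[i] - splice)
            cur[i] = score
            best = max(best, score)
        history.append((j, cur))
        prev = cur
    return best


if __name__ == "__main__":
    exon1, exon2 = "GCA" * 10, "AAGCC" * 6
    est = exon1 + exon2
    spliced = exon1 + "GT" + "T" * 46 + "AG" + exon2
    unspliced = exon1 + "T" * 50 + exon2
    print(spliced_local_score(est, spliced))    # 40
    print(spliced_local_score(est, unspliced))  # 30
```

The two genomic sequences differ only in their 50-base intron. With GT...AG boundaries, aligning all 60 EST bases across the intron scores 60 - 20 = 40, which beats the first exon alone (30); with an all-T intron only the flat intron penalty applies, so the spliced alignment is worth 60 - 40 = 20 and the single exon wins.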
The penalty structure used to score an alignment is as follows (defaults are in parentheses). Aligned bases score +match (1) or cost -mismatch (1), as appropriate. An indel in

Keywords

Genetics, Biology, Genome, genomic DNA, Gene prediction, Intron, Computational biology, Gene

Publication Info

Year: 1997
Type: article
Volume: 13
Issue: 4
Pages: 477-478
Citations: 275
Access: Closed

Cite This

Richard Mott (1997). EST_GENOME: a program to align spliced DNA sequences to unspliced genomic DNA. Computer applications in the biosciences, 13(4), 477-478. https://doi.org/10.1093/bioinformatics/13.4.477

Identifiers

DOI
10.1093/bioinformatics/13.4.477