Abstract

Motivation: The annotation of the Arabidopsis thaliana genome remains a problem in terms of time and quality. To improve the annotation process, we want to choose the most appropriate tools to use inside a computer-assisted annotation platform. We therefore need evaluation of prediction programs with Arabidopsis sequences containing multiple genes.

Results: We have developed AraSet, a data set of contigs of validated genes, enabling the evaluation of multi-gene models for the Arabidopsis genome. Besides conventional metrics to evaluate gene prediction at the site and the exon levels, new measures were introduced for the prediction at the protein sequence level as well as for the evaluation of gene models. This evaluation method is of general interest and could apply to any new gene prediction software and to any eukaryotic genome. The GeneMark.hmm program appears to be the most accurate software at all three levels for the Arabidopsis genomic sequences. Gene modeling could be further improved by combination of prediction software.

Availability: The AraSet sequence set, the Perl programs and complementary results and notes are available at http://sphinx.rug.ac.be:8080/biocomp/napav/.

Contact: Pierre.Rouze@gengenp.rug.ac.be
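The "conventional metrics" at the exon level are not spelled out in the abstract. As a minimal sketch, assuming the definitions commonly used in gene-finder benchmarks (sensitivity as the fraction of annotated exons predicted with both boundaries exact, specificity as the fraction of predicted exons that are exactly correct), the calculation might look as follows. The function name, the coordinate convention, and the example exons are hypothetical and are not taken from the paper or from its Perl programs; Python is used here only for illustration.

```python
from typing import Iterable, Tuple

# Hypothetical representation: an exon is (start, end) with inclusive
# coordinates, all exons assumed to lie on the same strand of one contig.
Exon = Tuple[int, int]


def exon_level_accuracy(
    annotated: Iterable[Exon], predicted: Iterable[Exon]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) at the exon level.

    Assumed definitions (not confirmed by the abstract):
      sensitivity = exactly predicted exons / annotated exons
      specificity = exactly predicted exons / predicted exons
    An exon counts as correct only if both boundaries match exactly.
    """
    annotated_set = set(annotated)
    predicted_set = set(predicted)
    correct = len(annotated_set & predicted_set)
    sensitivity = correct / len(annotated_set) if annotated_set else 0.0
    specificity = correct / len(predicted_set) if predicted_set else 0.0
    return sensitivity, specificity


# Made-up example: three annotated exons, four predicted exons,
# two of which match an annotated exon exactly.
annotated = [(100, 200), (300, 400), (500, 650)]
predicted = [(100, 200), (300, 410), (500, 650), (700, 800)]
sn, sp = exon_level_accuracy(annotated, predicted)
print(f"exon sensitivity = {sn:.2f}, exon specificity = {sp:.2f}")
```

Requiring exact boundaries is the strictest choice; a site-level evaluation would instead score each splice site or start/stop codon separately, so a prediction such as (300, 410) above would still earn credit for its correct left boundary.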

Keywords

Annotation, Gene prediction, Arabidopsis, Perl, Computer science, Software, Genome, Computational biology, Gene Annotation, Genome project, Set (abstract data type), Genomics, Arabidopsis thaliana, Gene, Data mining, Genetics, Biology, Artificial intelligence, Programming language

Related Publications

NetAffx: Affymetrix probesets and annotations

NetAffx (http://www.affymetrix.com) details and annotates probesets on Affymetrix GeneChip microarrays. These annotations include (i) static information specific to the probeset...

2003, Nucleic Acids Research, 486 citations

Publication Info

Year: 1999
Type: article
Volume: 15
Issue: 11
Pages: 887-899
Citations: 107
Access: Closed

Citation Metrics

107 (OpenAlex)

Cite This

Nathalie Pavy, Stéphane Rombauts, Patrice Déhais et al. (1999). Evaluation of gene prediction software using a genomic data set: application to Arabidopsis thaliana sequences. Bioinformatics, 15(11), 887-899. https://doi.org/10.1093/bioinformatics/15.11.887

Identifiers

DOI
10.1093/bioinformatics/15.11.887